ABBAS JAMALIPOUR
The University of Sydney
Australia
MARINA RUGGIERI
University of Rome Tor Vergata
Italy
Editors
Ovidiu Vermesan
SINTEF, Norway
Franz Wotawa
TU Graz, Austria
Björn Debaillie
imec, Belgium
River Publishers
Published 2022 by River Publishers
River Publishers
Alsbjergvej 10, 9260 Gistrup, Denmark
www.riverpublishers.com
© Ovidiu Vermesan, Franz Wotawa, Mario Diaz Nava, Björn Debaillie, 2022. This book is
published open access.
Open Access
This book is distributed under the terms of the Creative Commons Attribution
Non-Commercial 4.0 International License (CC BY-NC 4.0) (http://creativecommons.org/
licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction
in any medium or format, as long as you give appropriate credit to the original author(s)
and the source, a link is provided to the Creative Commons license and any changes made
are indicated. The images or other third party material in this book are included in the work’s
Creative Commons license, unless indicated otherwise in the credit line; if such material is not
included in the work’s Creative Commons license and the respective action is not permitted by
statutory regulation, users will need to obtain permission from the license holder to duplicate,
adapt, or reproduce the material.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher
nor the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made.
Dedication
“Without change there is no innovation, creativity, or incentive for improvement.
Those who initiate change will have a better opportunity to manage the
change that is inevitable.”
- William Pollard
“The brain is like a muscle. When it is in use we feel very good. Understanding
is joyous.”
- Carl Sagan
“By far, the greatest danger of Artificial Intelligence is that people conclude
too early that they understand it.”
- Eliezer Yudkowsky
Acknowledgement
The editors would like to thank all the contributors for their support in the
planning and preparation of this book. The recommendations and opinions
expressed in the book are those of the editors, authors, and contributors
and do not necessarily represent those of any organizations, employers, or
companies.
Ovidiu Vermesan
Franz Wotawa
Mario Diaz Nava
Björn Debaillie
Contents
Preface
List of Figures
List of Tables
List of Contributors
Index
Abstract
In the last decade, there has been significant progress in the IoT domain due
to the advances in the accuracy of neural networks and the industrialization
of efficient neural network accelerator ASICs. However, intelligent devices
will need to be omnipresent to create a seamless consumer experience. To
make this a reality, further progress is still needed in the low-power embedded
machine learning domain. Neuromorphic computing is a technology suited
to such low-power intelligent sensing. However, neuromorphic computing is
hampered today by the fragmentation of the hardware providers and the
difficulty of embedding the algorithms and comparing their performance. The
lack of standard key performance indicators spanning the hardware and
software domains makes it difficult to benchmark different solutions for a
given application on a fair basis. In this paper, we summarize the current benchmarking
solutions used in both hardware and software for neuromorphic systems,
which are in general applicable to low-power systems. We then discuss the
challenges in creating a fair and user-friendly method to benchmark such
systems, before suggesting a clear methodology that includes possible key
performance indicators.
DOI: 10.1201/9781003377382-1
This chapter has been made available under a CC BY-NC 4.0 license.
Benchmarking Neuromorphic Computing for Inference
1.1 Introduction
The performance necessary for consumer uptake of IoT devices has not been
achieved yet. Intelligent always-on edge devices and sensors powered by AI
and running on ultra-low-power hardware require outstanding energy efficiencies,
low latency (real-time operation), high throughput, and uncompromised accuracy.
Neuromorphic computing rises to the challenge; however, the neuromorphic
computing landscape is fragmented, with no universal Key Performance
Indicators (KPIs), and comparison on a fair basis remains elusive [1]. The
landscape is complex: comparisons should consider various aspects such as
industrial maturity, CMOS technology implications, arithmetic precision, silicon
area, power consumption, and the accuracy obtained from neural networks
running on the devices. Comparing target use-cases has the advantage of
capturing system-wide requirements but adds complexity. For
example, the inference frequency affects both the leakage
and the active power, significantly impacting the mean power
consumption of the system.
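To illustrate this duty-cycle effect, the following is a minimal sketch of how the inference rate shifts mean power between the leakage and active terms. The function name and all power numbers are hypothetical, chosen only for illustration:

```python
def mean_power(p_active_mw, p_sleep_mw, t_inference_ms, inferences_per_s):
    """Estimate the mean power of a duty-cycled inference system.

    The device is active for t_inference_ms per inference and otherwise
    sleeps, burning only leakage (sleep) power.
    """
    duty_cycle = inferences_per_s * t_inference_ms / 1000.0  # fraction of time active
    assert duty_cycle <= 1.0, "device cannot keep up with the requested rate"
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw

# At 1 inference/s the leakage term dominates; at 100 inferences/s the
# active term does -- the same chip yields very different mean power.
low_rate = mean_power(p_active_mw=50.0, p_sleep_mw=0.1, t_inference_ms=2.0, inferences_per_s=1)
high_rate = mean_power(p_active_mw=50.0, p_sleep_mw=0.1, t_inference_ms=2.0, inferences_per_s=100)
```

With these (made-up) numbers, the low-rate case is dominated by leakage while the high-rate case is dominated by active power, which is why use-case parameters cannot be separated from hardware KPIs.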
The most commonly accepted quantitative metrics for benchmarking neuromorphic
hardware are TOPS (Tera Operations Per Second) for throughput,
TOPS/W for energy efficiency, and TOPS/mm2 for area efficiency. Hardware
metrics rarely take the algorithmic structure into account. For software, the
performance of Machine Learning (ML) algorithms is usually defined for a
given task. Their KPIs generally target the prediction performance in terms of
the objective reached (often accuracy). Until recently, the KPIs rarely accounted
for algorithm complexity, computational cost, or the structure that
impacts performance on a given hardware platform.
Moreover, these metrics are only applicable to traditional neural networks,
such as Deep Neural Networks (DNNs), while for Spiking Neural
Networks (SNNs), other metrics, such as energy per synaptic operation for
energy efficiency, are used. Indeed, the very nature of these DNNs and SNNs
prohibits a comparison based on standard NN parameters.
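These KPIs are simple ratios over measured workload figures. The following sketch shows how they are typically derived; the function names and the numbers in the usage lines are illustrative, not taken from any specific chip:

```python
def tops(ops_per_inference, latency_s):
    """Throughput in Tera Operations Per Second."""
    return ops_per_inference / latency_s / 1e12

def tops_per_watt(ops_per_inference, latency_s, power_w):
    """Energy efficiency: throughput normalised by power."""
    return tops(ops_per_inference, latency_s) / power_w

def tops_per_mm2(ops_per_inference, latency_s, area_mm2):
    """Area efficiency: throughput normalised by silicon area."""
    return tops(ops_per_inference, latency_s) / area_mm2

def energy_per_synop_pj(energy_per_inference_j, synaptic_ops):
    """SNN-style metric: picojoules per synaptic operation."""
    return energy_per_inference_j / synaptic_ops * 1e12

# Hypothetical accelerator: 2e9 ops per inference in 1 ms at 0.5 W on 10 mm2.
throughput = tops(2e9, 1e-3)                 # TOPS
efficiency = tops_per_watt(2e9, 1e-3, 0.5)   # TOPS/W
density = tops_per_mm2(2e9, 1e-3, 10.0)      # TOPS/mm2
snn_metric = energy_per_synop_pj(1e-6, 1e6)  # pJ per synaptic op
```

Note that the DNN-style and SNN-style metrics count different things (arithmetic operations vs. synaptic events), which is exactly why they cannot be converted into one another without workload knowledge.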
The main questions asked by end-users, system integrators, and sensor
manufacturers are: what is the best solution for the application, and
does a given neuromorphic processor provide an advantage over
state-of-the-art microcontrollers? The inability to answer these questions
thwarts industrial interest. This white paper provides a brief guide to
relevant metrics for fair benchmarking of neuromorphic inference accelerator
ASICs, aiming to help compare different hardware approaches for various
use-cases.
1.2 State-of-the-art in Benchmarking
Table 1.2 Accuracy (Acc) for different object detection settings on COCO test-dev. Adapted
from [9].

Model             Acc    Acc50   Acc75   AccS   AccM   AccL
YOLOv2            21.6   44.0    19.2    5.0    22.4   35.5
SSD513            31.2   50.4    33.3    10.2   34.5   49.8
DSSD513           33.2   53.3    35.2    13.0   35.4   51.1
RetinaNet (ours)  39.1   59.1    42.3    21.8   42.7   50.2
1.2.2 Hardware
An increasing number of hardware evaluation tools aim at benchmarking
ML applications directly on the hardware. For example, QuTiBench [37]
presents a benchmarking tool that takes algorithmic optimization and co-design
into account. The MLMark [27] benchmark targets ML applications
running on MCUs at the edge. However, both QuTiBench and MLMark
models are too large for tiny applications and require large memories,
which are not available on tiny edge devices. TinyMLPerf [28] provides
benchmarks for tiny systems based on imposed models and tasks, yielding
latency and speed-related KPIs. Submission of results using other
network architectures is allowed in its open division. Further tools, like
SMAUG [29], MAESTRO [30], and Aladdin [31], provide software solutions
to emulate workloads on deep-learning accelerators using varying
topologies.
The power consumption of edge ML processing hardware is of utmost
interest as it directly impacts the battery lifetime of a system. Dynamic
power dominates in most high-throughput applications, while leakage power
is only significant in low-duty-cycle modes [32], where power gating, body
biasing, and voltage scaling techniques are employed to reduce leakage.
Peak power consumption corresponds to the maximum power consumption
1.3 Guidelines
Benchmarking of ML applications cannot be tackled as a standalone problem
at the level of either only hardware or algorithms. A holistic view requires
a wide range of expertise and domains. It requires a multidisciplinary and
multidimensional approach considering, among other things, the hardware
platform, the NN (model), and the use-case under evaluation. In order to make
the right choices for building blocks, the system integrator needs to know
the KPIs for a given use-case that different NNs will be able to deliver on
different hardware platforms.
This section explains why a multidisciplinary approach combining both
algorithms and hardware is needed to avoid drawing unfair and misleading
conclusions and comparisons. In the following, we first describe
what constitutes unfair and fair benchmarking, and then present a
combined KPI approach and guidelines for benchmarking.
Figure 1.1 Benchmarking fairness. (a) Unfair benchmarking: the KPIs are comparable,
but the benchmarked hardware platforms are not exploited to their full potential. (b) Fair
benchmarking: the hardware platforms are exploited to their full potential, but the resulting
combined KPIs (KPICB) are not comparable.
the hardware system on which the application is deployed; see Figure 1.2.
Because of the large number of KPIs that can be reported, it is difficult to
make an objective comparison between different platforms, as a platform can
perform well on certain KPIs and poorly on others (e.g., when simulating an SNN
on a CNN accelerator). Furthermore, not all platforms report the same set of
metrics, and the metrics are not usually convertible into each other (e.g., energy
consumption does not always depend only on MAC operations).
Figure 1.3 Benchmarking pipeline based on use-cases. An automated search finds the best
possible model exploiting the performance offered by each target hardware platform. The
resulting combined KPIs are comparable.
1.4 Conclusion
In this paper, we have summarized the standard techniques for benchmarking
NN accelerator hardware and ML software, and we have specified the
KPIs that are most relevant for resource-aware inference. We have shown
through examples that, in ultra-low-power or neuromorphic systems, separating
hardware, ML algorithms, and use-case parameters leads to an ineffective
means of comparison. Only when these three are considered holistically
can systems be benchmarked. Integrating KPIs that allow benchmarking at the
system level in this way is complex, but it is important, as the inability to
benchmark IoT systems today is reducing uptake by industry. In this
paper, we have proposed a benchmarking methodology based on use-cases,
where the ML algorithm is adapted to the hardware to allow fair comparison.
Finally, we provide guidelines on which aspects are important to take into
account while developing such a benchmarking tool to ensure that the resulting
KPIs are comparable.
Acknowledgements
This work is supported through the project ANDANTE. ANDANTE has
received funding from the ECSEL Joint Undertaking (JU) under grant
agreement No 876925. The JU receives support from the European Union’s
Horizon 2020 research and innovation programme and France, Belgium,
Germany, the Netherlands, Portugal, Spain, and Switzerland. ANDANTE has also
received funding from the German Federal Ministry of Education and
Research (BMBF) under Grant No. 16MEE0116. The authors are responsible
for the content of this publication.
References
[1] M. Davies. Benchmarks for progress in neuromorphic computing.
Nature Machine Intelligence, 1(9):386–388, 2019.
[2] B. J. Erickson and F. Kitamura. Magician’s corner: 9. Performance
metrics for machine learning models. Radiology: Artificial Intelligence,
3(3), 2021.
[3] A. Rácz, D. Bajusz, and K. Héberger. Multi-level comparison of
machine learning classifiers and their performance metrics. Molecules,
24(15), 2019.
[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O.
Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al.
Scikit-learn: Machine learning in Python. Journal of Machine Learning
Research, 12:2825–2830, 2011.
[5] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P.
Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context.
In European Conference on Computer Vision, pages 740–755. Springer,
2014.
[6] https://paperswithcode.com. Website, 2021.
[7] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features
from tiny images. 2009.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet:
A large-scale hierarchical image database. In 2009 IEEE Conference
on Computer Vision and Pattern Recognition, pages 248–255. IEEE,
2009.
[9] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss
for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision, pages 2980–2988, 2017.
[32] F. Fallah and M. Pedram. Standby and active leakage current control and
minimization in CMOS VLSI circuits. IEICE Transactions on Electronics,
88(4):509–519, 2005.
[33] J. Hanhirova, T. Kämäräinen, S. Seppälä, M. Siekkinen, V. Hirvisalo,
and A. Ylä-Jääski. Latency and throughput characterization of convolutional
neural networks for mobile computer vision. In Proceedings of
the 9th ACM Multimedia Systems Conference, pages 204–215, 2018.
[34] M. Breiling, R. Struharik, and L. Mateu. Machine learning: Elektronenhirn
4.0. 2019.
[35] Y.-H. Chen, T.-J. Yang, J. Emer, and V. Sze. Eyeriss v2: A flexible
accelerator for emerging deep neural networks on mobile devices. IEEE
Journal on Emerging and Selected Topics in Circuits and Systems,
9(2):292–308, 2019.
[36] P. Jokic, E. Azarkhish, A. Bonetti, M. Pons, S. Emery, and L. Benini.
A construction kit for efficient low power neural network accelerator
designs. arXiv preprint arXiv:2106.12810, 2021.
[37] M. Blott, L. Halder, M. Leeser, and L. Doyle. QuTiBench: Benchmarking
neural networks on heterogeneous hardware. ACM Journal on Emerging
Technologies in Computing Systems (JETC), 15(4):1–38, 2019.
[38] EEMBC ULPMark: https://www.eembc.org/ulpmark/. Website, 2021.
[39] EEMBC CoreMark: https://www.eembc.org/coremark/. Website, 2021.
[40] V. Fra, E. Forno, R. Pignari, T. Stewart, E. Macii, and G. Urgese. Human
activity recognition: suitability of a neuromorphic approach for on-edge
AIoT applications. Neuromorphic Computing and Engineering, 2022.
2
Benchmarking the Epiphany Processor
as a Reference Neuromorphic Architecture
Abstract
This short article explains why the Epiphany architecture is a proper reference
for digital large-scale neuromorphic design. We compare the Epiphany
architecture with several modern digital neuromorphic processors. We show
the result of mapping the binary LeNet-5 neural network onto a few modern
neuromorphic architectures and demonstrate the efficient use of memory in
Epiphany. Finally, we show the results of our benchmarking experiments
with Epiphany and propose a few suggestions to improve the architecture
for neuromorphic applications. Epiphany can update a neuron in 120 ns on
average, which is fast enough for many real-time neuromorphic applications.
DOI: 10.1201/9781003377382-2
This chapter has been made available under a CC BY-NC 4.0 license.
tasks are done without considering one of the main restrictions in biological
evolution: energy consumption. Biological restrictions pushed
evolution toward power-efficient algorithms and architectures. The human
brain is an extreme example: it consumes a considerable portion (around
20%) of the human body’s energy while accounting for less than 3% of its
weight.
Even though the elements of the biological fabric in the brain are not
as fast, and arguably not as power efficient, as our modern silicon technologies,
no computing platform comes close to the compute efficiency of the biological
brain for processing natural signals. The brain is a perfect example
of algorithm-hardware co-optimization. As mentioned, the ultimate goal of
bio-inspired processing is to process raw sensory data with the minimum
amount of power consumption.
The Epiphany architecture was first introduced back in 2009 [1] as a high-performance,
energy-efficient many-core architecture suitable for real-time
embedded systems. The Epiphany architecture contains many RISC processor
cores connected by a packet-based mesh Network-on-Chip (NoC).
Figure 2.1 shows the big picture of the Epiphany architecture. This architecture
differs from mainstream von Neumann multi-core
processors since, in Epiphany, the cores are connected directly via a NoC
Figure 2.2 Adapteva launched a $99 Epiphany-III based single-board computer as their
first product.
and power/area performance than using a larger integer format (like int32).
Therefore, it is possible to trade off the memory footprint against the complexity
of the operations.
Another method to use the memory space efficiently is to store a compressed
form of the parameters when there is a high amount of sparsity in the
synaptic weight tensor [10]. Weight sharing is another method to efficiently
use the memory for spiking Convolutional Neural Networks (sCNNs) [5] [7].
Epiphany contains 256kb of memory per core and is the most flexible
architecture in Table 2.1. In the table, N/A means we could not find the data
publicly. Axons are the destination core addresses used to route spikes from a
neuron. All the numbers in this table are for a single processing core inside
the mentioned neuromorphic chip. All the above-mentioned schemes can be
implemented in Epiphany to optimally use the memory space. To demonstrate
Table 2.2 Mapping the LeNet-5 neural network (with binary weights) onto different
neuromorphic architectures.

Architecture        Number of    Average number   Number of           Number of    Total
                    used         of synapses      individual          used         memory
                    neurons      per neuron       stored weights      cores        used
LeNet-5 (before     6518         144.5            61k                 -            -
deployment)
TrueNorth [4]       40k          256              941k (144.5×6518)   155          17Mb
Loihi [5]           6518         1024             61k                 7            14Mb
NeuronFlow [7]      6518         1024             61k                 7            840kb
SpiNNaker [3]       6518         144.5            61k                 1            768kb
Epiphany            6518         144.5            61k                 1            256kb
the value of flexibility for efficient use of memory, in Table 2.2 we show
the result of mapping the binary LeNet-5 [11] onto the above-mentioned
neuromorphic architectures. The average pooling layers are optimized out in
the mapping (as average pooling is a linear operation and does not consume
stateful neurons). The mappings are hand-optimized with only the memory
constraint. In TrueNorth, several neurons need to be combined to make a
single neuron with enough synapses and axons. Also, since weight sharing
is not used, the weight for each synapse needs to be stored individually. In
the flexible architectures, the neuron states are assumed to be 16b, without
a refractory mechanism and with a single threshold per channel. Mapping onto
SpiNNaker is done with the “Convnet Optimized Implementation”
described in [12]. The total memory used is (number of cores × memory per core).
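The table's accounting can be reproduced with a short sketch. The function names and the weight-sharing flag are illustrative, not part of any mapping tool:

```python
def total_memory(used_cores, memory_per_core):
    """Table 2.2's note: total memory used = number of cores x memory per
    core (units follow whatever memory_per_core is given in)."""
    return used_cores * memory_per_core

def stored_weights(neurons, synapses_per_neuron, weight_sharing, unique_weights):
    """Number of individually stored weights: with weight sharing only the
    unique weights are kept; without it, every synapse stores its own copy."""
    if weight_sharing:
        return unique_weights
    return int(neurons * synapses_per_neuron)

# TrueNorth-style mapping (no weight sharing): 6518 neurons x 144.5 average
# synapses each, so every synapse stores its own weight copy (~941k).
no_sharing = stored_weights(6518, 144.5, weight_sharing=False, unique_weights=61_000)
# A flexible mapping with weight sharing keeps only LeNet-5's 61k unique weights.
with_sharing = stored_weights(6518, 144.5, weight_sharing=True, unique_weights=61_000)
```

This makes the table's main point explicit: the 15x gap in stored weights comes purely from whether the architecture allows weight sharing, not from the network itself.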
Figure 2.3 Flow chart of processing a LIF neuron with processing time measured in
Epiphany.
a software process). Thereafter, the target neurons get updated. After the
updates, the threshold of each neuron is checked, and the refractory check is
executed for each firing. If both checks pass, the firing process starts, and
the RISC core commands the DMA to transmit a spike packet. Membrane
leakage is an independent process that starts with a timer interrupt.
Each cycle takes 1 ns at a 1 GHz clock frequency. For example,
processing a single spike from the first convolutional layer of LeNet-5 to the
second convolutional layer requires updating 16×5×5 neurons. When the
second layer is implemented in a core and 1% of the updated neurons fire, the
processing takes around 46 µs. The leak process on all these 400 neurons
takes around 12 µs. Our measurements are averaged over many experiments,
and therefore the numbers in this figure are reasonable estimations. Since
the neuron model is programmable, one may decide to remove some of
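This arithmetic can be captured in a rough timing model. The per-neuron costs below are assumptions chosen only to be consistent with the averaged measurements quoted above (~115 ns per update, ~30 ns per leak); they are not datasheet figures:

```python
def spike_processing_time_us(n_targets, firing_fraction,
                             t_update_ns=115.0, t_fire_ns=500.0, t_leak_ns=30.0):
    """Rough processing-time model for one incoming spike on a core.

    Returns (busy_us, leak_us): time spent updating targets and handling
    fires, and time spent on the periodic leak pass over the same neurons.
    All per-neuron costs are assumed, not measured values.
    """
    update = n_targets * t_update_ns
    fire = n_targets * firing_fraction * t_fire_ns
    leak = n_targets * t_leak_ns
    return (update + fire) / 1000.0, leak / 1000.0

# One spike from LeNet-5's first conv layer updates 16 x 5 x 5 = 400 neurons,
# of which roughly 1% fire:
busy_us, leak_us = spike_processing_time_us(n_targets=16 * 5 * 5, firing_fraction=0.01)
```

Under these assumptions the model lands in the same ballpark as the quoted ~46 µs update/fire time and ~12 µs leak time for 400 neurons.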
2.5 Conclusion
This article demonstrates that the Epiphany processor is compatible with neuromorphic
computing. Overall, it has a similar architecture to the well-known
neuromorphic processors and is flexible enough for the implementation of
new ideas. Unlike Epiphany, all the mentioned neuromorphic processors
contain optimized elements that add complexity to the architecture and make
them less flexible as a reference benchmarking architecture (the flexibility vs.
efficiency trade-off). For example, having a fixed number of neurons per core
(in TrueNorth, Loihi, and NeuronFlow) does not allow optimized resource
management during mapping. Also, an accelerated learning mechanism
(as in Loihi) may be unnecessary for many applications. Additionally,
suppose one wants to know the performance improvement the SpiNNaker
processor gains from its optimized NoC. In that case, Epiphany is an excellent
platform to compare against, due to its simplicity and flexibility.
As mentioned, not having any accelerator makes Epiphany less efficient
than the accelerated architectures (like Loihi), but it increases
its value for benchmarking the performance improvement of any accelerator.
We have implemented a neural network system and measured the processing
time for the different components of the LIF neuron model. It is already
visible that some small improvements (like having a hardware FIFO) can
improve the performance of the system. Increasing the size of the core results
in better memory savings, but the designer should scale the performance of the
cores as well (by implementing schemes like multi-threading [5]
and SIMD, as in the forthcoming SpiNNaker 2.0 platform
[18]). Other improvements (like adding a more suitable interconnect) can
be examined and are a topic for our future research. All source code used to
benchmark the system and perform hands-on experiments is freely available
upon request ({amirreza.yousefzadeh, gert-jan.vanschaik}@imec.nl).
Acknowledgements
This technology is partially funded and initiated by the Netherlands
and the European Union’s Horizon 2020 research and innovation
projects TEMPO (ECSEL Joint Undertaking under grant agreement No
826655) and ANDANTE (ECSEL Joint Undertaking under grant agreement
No 876925).
References
[1] A. Olofsson, et al. Kickstarting high-performance energy-efficient
manycore architectures with Epiphany. In 2014 48th Asilomar Conference
on Signals, Systems and Computers, IEEE, 2014, pp. 1719–1726.
[2] A. Olofsson. Epiphany-V: A 1024 processor 64-bit RISC system-on-chip.
arXiv preprint arXiv:1610.01832.
[3] E. Painkras, et al. SpiNNaker: A 1-W 18-core system-on-chip for
massively-parallel neural network simulation. IEEE Journal of Solid-State
Circuits 48 (8) (2013) 1943–1953.
[4] F. Akopyan, et al. TrueNorth: Design and tool flow of a 65 mW 1
million neuron programmable neurosynaptic chip. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 34 (10) (2015)
1537–1557.
[5] M. Davies, et al. Loihi: A neuromorphic manycore processor with on-chip
learning. IEEE Micro 38 (1) (2018) 82–99.
[6] M. Demler. BrainChip Akida is a fast learner: spiking-neural-network
processor identifies patterns in unlabeled data. Microprocessor Report
(2019).
[7] O. Moreira, et al. NeuronFlow: a neuromorphic processor architecture
for live AI applications. In 2020 Design, Automation & Test in Europe
Conference & Exhibition (DATE), IEEE, 2020, pp. 840–845.
[8] E. Miranda, J. Suñé. Memristors for neuromorphic circuits and artificial
intelligence applications (2020).
[9] N. P. Jouppi, et al. In-datacenter performance analysis of a tensor
processing unit. In Proceedings of the 44th Annual International
Symposium on Computer Architecture, 2017, pp. 1–12.
[10] V. Sze, Y.-H. Chen, T.-J. Yang, J. S. Emer. Efficient processing of deep
neural networks. Synthesis Lectures on Computer Architecture 15 (2)
(2020) 1–341.
Abstract
Real-time video processing using state-of-the-art deep neural networks
(DNN) has managed to achieve human-like accuracy in the recent past but
at the cost of considerable energy consumption, rendering them infeasible
for deployment on edge devices. The energy consumed by running DNNs on
hardware accelerators is dominated by the number of memory read/writes
and multiply-accumulate (MAC) operations required. This work explores
the role of activation sparsity in efficient DNN inference as a potential
solution. As matrix-vector multiplication of weights with activations is the
most predominant operation in DNNs, skipping operations and memory
fetches where (at least) one of them is a zero can make inference more
energy efficient. Although spatial sparsification of activations is researched
extensively, introducing and exploiting temporal sparsity has received far less
attention in the DNN literature. This work introduces a new DNN layer (called
the temporal delta layer) whose primary objective is to induce temporal activation
sparsity during training. The temporal delta layer promotes activation sparsity
by performing a delta operation, aided by activation quantization and an
l1-norm-based penalty added to the cost function. As a result, the final model behaves
like a conventional quantized DNN with high temporal activation sparsity
during inference. The new layer was incorporated into the standard ResNet50
architecture to be trained and tested on the popular human action recognition
DOI: 10.1201/9781003377382-3
This chapter has been made available under a CC BY-NC 4.0 license.
Temporal Delta Layer: Exploiting Temporal Sparsity
3.1 Introduction
DNNs have lately managed to successfully analyze video data to perform
action recognition [1], object tracking [2], object detection [3], etc., with
human-like accuracy and robustness. Unfortunately, DNNs’ high accuracy
comes at considerable cost in terms of computation and memory consumption,
resulting in high energy consumption. This makes them unsuitable
for always-on edge devices.
Techniques such as network pruning, quantization, regularization, and
knowledge distillation [4] [5] have helped reduce model size over time,
resulting in less compute and memory consumption overall. Sparsity is a
prominent aspect of all of the aforementioned methods. This is significant
because sparse tensors allow computations involving multiplication by zero to
be skipped. They are also easy to store and retrieve in memory. In the DNN
literature, structural sparsity (of weights) and spatial sparsity (of activations)
are well-studied topics [6]. However, while being a popular concept in neuromorphic
computing, temporal activation sparsity has received less attention
in the context of DNNs.
This work applies the concept of change-based (or delta-based) processing to
the training and inference phases of deep neural networks, drawing inspiration
from the human retina [7]. DNN inference, which processes each
frame independently with no regard to temporal correlation, is dense and
extremely wasteful. In contrast, processing only the changes in the network
leads to zero-skipping in sparse tensor operations, minimizing redundant
operations and memory accesses.
Therefore, the methodology proposed in this work induces temporal sparsity
in, theoretically, any DNN by incorporating a new layer (named the temporal
delta layer), which can be introduced into a DNN at any phase (training,
refinement, or inference only). This new layer can be integrated into an existing
architecture by positioning it after all or some of the ReLU activation layers,
as deemed beneficial (see Figure 3.1). The inclusion of this layer does not
necessitate any changes to the preceding or following layers. Furthermore,
the new layer adds a novel sparsity penalty to the overall cost function of
the DNN during the training phase. This l1-norm-based penalty minimizes
the activation density of the delta maps (i.e., the temporal difference between
two consecutive feature maps). Apart from that, the new layer is compared
3.2 Related Works
Figure 3.1 (a) Standard DNN, and (b) DNN with the proposed temporal delta layer.
Figure 3.2 Sparsity in the activations (Δx) drastically reduces the memory fetches and
multiplications between Δx and the columns of the weight matrix, W, that correspond to
zeros [10].
3.3 Methodology
In video-based applications, traditional deep neural networks rely on frame-based
processing. That is, each frame is processed entirely through all the
layers of the model. However, there is very little change from one
frame to the next through time, which is called temporal locality. Therefore,
it is wasteful to perform computations to extract the features of the non-changing
parts of each individual frame. Taking that concept deeper into the
network, if the feature maps of two consecutive frames are inspected after every
activation layer throughout the model, this temporal overlap can be observed.
Therefore, this work postulates that temporal sparsity can be significantly
increased by focusing the inference of the model only on the changing pixels
of the feature maps (the deltas).
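A minimal NumPy sketch of the delta operation this postulate implies; the function names and the optional threshold argument are illustrative, not the chapter's exact layer:

```python
import numpy as np

def temporal_delta(prev_map, curr_map, threshold=0.0):
    """Delta between two consecutive activation maps; changes whose
    magnitude is at or below `threshold` are zeroed so downstream layers
    can skip them."""
    delta = curr_map - prev_map
    if threshold > 0.0:
        delta[np.abs(delta) <= threshold] = 0.0
    return delta

def activation_density(x):
    """Fraction of non-zero activations (lower means sparser)."""
    return np.count_nonzero(x) / x.size

rng = np.random.default_rng(0)
frame_t = rng.random((8, 8, 4)).astype(np.float32)
frame_t1 = frame_t.copy()
frame_t1[:2, :2, 0] += 0.5  # only a small region changes between frames
delta = temporal_delta(frame_t, frame_t1)
```

While the raw activation map is almost fully dense, the delta map is non-zero only where the frame actually changed, which is the sparsity the layer exploits.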
Figure 3.3 Demonstration of two temporally consecutive activation maps leading to
near-zero values (rather than absolute zeros) after the delta operation.
Method:
First, the bitwidth BW to which the 32-bit floating-point parameter is to be
quantized is defined. Then, the number of bits I required to represent the
unsigned integer part of the parameter x is calculated as shown in Eq. 3.6,
and the number of fractional bits F follows as

F = BW − I − 1 (3.7)

The parameter is then quantized as

Q(x) = C(R(x · 2^F), −t, t) / 2^F (3.8)

where R(·) is the round function, C(x, a, b) is the clipping function, and t is
defined as

t = 2^(BW−S) if BW > 1, and t = 0 if BW ≤ 1.
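A small Python sketch of Eqs. 3.7–3.8. Taking t = 2^(BW−1), i.e. reading S as the single sign bit, is an assumption here, as is the function name:

```python
def fixed_point_quantize(x, bw, int_bits):
    """Quantize x to signed fixed point with `bw` total bits and `int_bits`
    integer bits (Eqs. 3.7-3.8). We assume t = 2**(bw - 1), i.e. S = 1 for
    the sign bit -- an assumption about Eq. 3.8's S."""
    frac_bits = bw - int_bits - 1            # F = BW - I - 1 (Eq. 3.7)
    t = 2 ** (bw - 1) if bw > 1 else 0
    code = round(x * 2 ** frac_bits)         # R(x * 2^F)
    code = max(-t, min(t, code))             # C(., -t, t)
    return code / 2 ** frac_bits             # Q(x) back at the original scale

# 8-bit quantization with 2 integer bits leaves F = 5 fractional bits,
# i.e. a resolution of 1/32; out-of-range values saturate at the clip limit.
q_in_range = fixed_point_quantize(0.30, bw=8, int_bits=2)
q_clipped = fixed_point_quantize(100.0, bw=8, int_bits=2)
```

The two usage lines show the two regimes: in-range values snap to the nearest 1/32 step, while out-of-range values are clipped by C(·, −t, t).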
Figure 3.4 Importance of step size in quantization: on the right side, in all three cases, the
data is quantized to five bins with different uniform step sizes. However, without an optimal
step size, quantization can detrimentally alter the range and resolution of the original data.
data being a poor representation of the raw data; (b) as the step size is a model
parameter, it also directly seeks to improve the metric of interest, i.e.,
accuracy.
Method:
Given: x, the parameter to be quantized; s, the step size; Q_N and Q_P, the number of negative and positive quantization levels respectively; and q(x; s), the quantized representation at the same scale as x:

            ⌊x/s⌉ · s,   if −Q_N ≤ x/s ≤ Q_P
q(x; s) =   −Q_N · s,    if x/s < −Q_N          (3.9)
            Q_P · s,     if x/s > Q_P

where ⌊·⌉ rounds the value to the nearest integer. Considering the number of bits, b, to which the data is to be quantized, Q_N = 0 for unsigned and Q_N = 2^(b−1) for signed data. Similarly, Q_P = 2^b − 1 for unsigned and 2^(b−1) − 1 for signed data.
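The forward pass of Eq. 3.9 can be sketched as below. This covers inference only; during training, LSQ learns s through a straight-through gradient estimator, which is omitted here, and the function name is illustrative.

```python
import numpy as np

def lsq_quantize(x, s, b, signed=True):
    """Eq. 3.9 forward: scale by 1/s, clip to [-Q_N, Q_P], round, rescale by s."""
    if signed:
        qn, qp = 2 ** (b - 1), 2 ** (b - 1) - 1   # e.g. b=8 -> levels [-128, 127]
    else:
        qn, qp = 0, 2 ** b - 1                    # e.g. b=8 -> levels [0, 255]
    v = np.clip(x / s, -qn, qp)                   # saturate out-of-range values
    return np.round(v) * s                        # nearest level, back at x's scale

acts = np.array([0.33, -5.0, 0.04])
q = lsq_quantize(acts, s=0.1, b=3)                # ~ [0.3, -0.4, 0.0]
```

Note how the step size determines both resolution (0.33 rounds to 0.3) and range (−5.0 saturates at −Q_N · s = −0.4).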
Modified LSQ:
In this work, the original LSQ method is slightly modified to remove the clipping function from the equations as (a) the bitwidth, b, required to calculate
density. The λ mentioned in Eq. 3.12 refers to the penalty coefficient of the cost function. If λ is too small, the sparsity penalty takes little effect and model accuracy is given more priority; if λ is too large, sparsity becomes the priority, leading to very sparse models with unacceptable accuracy. The key is to find the balance between task loss and sparsity penalty.
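The balance described above can be sketched as a composite loss. Eq. 3.12 itself is not reproduced in this excerpt, so the L1 form and the function name below are our generic assumptions.

```python
import numpy as np

def sparsity_regularised_loss(task_loss, delta_maps, lam):
    """Cost = task loss + lam * summed mean absolute value of the delta maps."""
    penalty = sum(np.abs(d).mean() for d in delta_maps)  # L1 pushes deltas towards 0
    return task_loss + lam * penalty

deltas = [np.array([0.0, 0.0, 2.0]), np.array([1.0, -1.0])]
loss = sparsity_regularised_loss(task_loss=0.9, delta_maps=deltas, lam=0.5)
```

Sweeping lam then traces out exactly the accuracy/sparsity trade-off discussed in the text.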
3.4.1 Baseline
As the baseline, the two-stream architecture [24] was used, with ResNet50 as the feature extractor on both the spatial and temporal streams. The dataset used was UCF101, a widely used human action recognition dataset of ‘in-the-wild’ action videos obtained from YouTube, with 101 action categories [25]. The spatial stream used single-frame RGB images of size (224, 224, 3) as input, while the temporal stream used stacks of 10 RGB difference frames of size (224, 224, 10 × 3) as input. Both of these inputs were time-distributed to apply the same layer to multiple frames simultaneously and produce output with time as the fourth dimension. Both streams were initialized with pre-trained ImageNet weights and fine-tuned with an SGD optimizer.
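The temporal-stream input described above — 10 RGB difference frames stacked along the channel axis — can be constructed as in this sketch (shapes from the text; the function name is ours).

```python
import numpy as np

def rgb_difference_stack(frames):
    """frames: (T, 224, 224, 3) clip -> (224, 224, (T-1)*3) stacked RGB diffs."""
    deltas = frames[1:] - frames[:-1]              # consecutive-frame differences
    return np.concatenate(list(deltas), axis=-1)   # stack diffs along channels

clip = np.random.rand(11, 224, 224, 3).astype(np.float32)  # 11 frames -> 10 diffs
x_temporal = rgb_difference_stack(clip)            # shape (224, 224, 30)
```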
Under the above-mentioned setup, the spatial and temporal streams achieved accuracies of 75% and 70%, respectively. The two streams were then average-fused to achieve a final classification accuracy of 82%. In this scenario, both streams were found to have an activation sparsity of ∼47%.
3.4.2 Experiments
Scenario 1: The setup consecutively places the fixed-point quantization layer and the temporal delta layer after every activation layer in the network. The temporal delta layer here also includes an L1-norm-based penalty. The baseline weights were used as a starting point, and all the layers, including the temporal delta layer, are fine-tuned until acceptable convergence. The hyper-parameters specifically required for this setup were the bitwidth (to which the activations were to be quantized) and the penalty coefficient balancing the tussle between task loss and sparsity penalty.
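A minimal stateful sketch of what such a temporal delta layer could look like — quantize the incoming activation, emit its difference from the previous quantized activation — is given below. The class and the toy quantizer are illustrative, not the authors' implementation.

```python
import numpy as np

class TemporalDeltaLayer:
    """Emit quantized activation deltas between consecutive frames."""
    def __init__(self, quantize):
        self.quantize = quantize   # any quantizer, e.g. fixed-point or LSQ
        self.prev = None           # previous quantized activation (state)

    def __call__(self, x):
        q = self.quantize(x)
        delta = q if self.prev is None else q - self.prev
        self.prev = q
        return delta               # mostly zeros for slowly changing inputs

quant = lambda x: np.round(x * 16) / 16          # toy 4-fractional-bit quantizer
layer = TemporalDeltaLayer(quant)
d0 = layer(np.array([0.50, 0.25]))               # first frame: full activation
d1 = layer(np.array([0.50, 0.30]))               # second frame: sparse delta
```

The quantization step is what turns near-zero deltas into exact zeros, which is the sparsity this chapter exploits.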
Scenario 2: The setup is similar to the previous scenario except for the activation quantization method used. The previous experiment used fixed-precision quantization, where all the activation layers in the network were quantized to the same bitwidth. This experiment instead uses learnable step-size quantization (LSQ), which performs channel-wise quantization depending on the activation distribution, resulting in mixed-precision quantization of the activation maps.
The layer also introduces a hyperparameter during training (apart from
the penalty coefficient mentioned earlier) for the step size initialization.
Then, during training, the step size increases or decreases depending on the
activation distribution in each channel.
Table 3.1 Spatial stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the baseline. In the case of fixed point quantization, the reported results are for a bitwidth of 6 bits.

Model setup (Spatial stream)                              Accuracy   Activation sparsity
Baseline                                                  75%        48%
Temporal delta layer with fixed point quantization        73%        74%
Temporal delta layer with learned step-size quantization  69%        86%
Table 3.2 Temporal stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the benchmark. In the case of fixed point quantization, the reported results are for a bitwidth of 7 bits.

Model setup (Temporal stream)                             Accuracy   Activation sparsity
Baseline                                                  70%        47%
Temporal delta layer with fixed point quantization        68%        67%
Temporal delta layer with learned step-size quantization  65%        89%
This is because lowering the precision from 32 bits to 8 bits (or less) leads to
temporal differences of activations going to absolute zero.
Additionally, the close-to-baseline accuracy of the method involving fixed-point quantization can be attributed to fractional bit allocation flexibility. That is, as the bitwidth is fixed, the number of integer bits required is decided depending on the activation distribution within the layer, and the rest of the bits are assigned as fractional bits. This ensures that the activation range is covered while compromising precision as little as possible. Another contributing factor to the sustained accuracy is that the first and last layers of the model are not quantized, similar to works like [26][27]. These layers have high information density: they are where input pixels turn into features and features turn into output probabilities, respectively, which makes them more sensitive to quantization.
Although the activation sparsity gain of the temporal delta layer with fixed-point quantization is better than the baseline, it is still not as high as required. In this effort, the bitwidth of the activations is decreased in the expectation of increasing sparsity. However, as the bitwidth goes below a certain value (6 bits for the spatial and 7 bits for the temporal stream), sparsity increases, but accuracy starts to deteriorate beyond recovery, as shown in Table 3.3. This is because quantizing all layers of a network to the same bitwidth can mean that the inter-channel variations of the feature maps are not fully accounted for. Since the number of fractional bits is usually selected to cover the maximum activation value in a layer, fixed-bitwidth quantization tends to cause excessive information loss in channels with a smaller dynamic range. Therefore, it can be inferred that mixed-precision quantization of activations is a better approach to obtain good sparsity without compromising accuracy.
Table 3.3 Result of decreasing activation bitwidth in the fixed point quantization method. For the spatial stream, decreasing below 6 bits caused the accuracy to drop considerably. For the temporal stream, the same happened below 7 bits.

Activation   Spatial stream                           Temporal stream
bitwidth     Accuracy (%)  Activation sparsity (%)    Accuracy (%)  Activation sparsity (%)
32           75            50                         70            47
8            75            68                         70            65
7            75            71                         68            70
6            73            75                         61            73
5            65            80                         -             -
Figure 3.5 Evolution of quantization step size from initialization to convergence in LSQ.
As step-size is a learnable parameter, it gets re-adjusted during training to cause minimum
information loss in each layer.
Finally, using the temporal delta layer, where incoming activations are quantized using learnable step-size quantization (LSQ), gives the best results for both spatial and temporal streams. As the step size is a learnable parameter, it gives the model enough flexibility to arrive at a mixed-precision model, where each channel in a layer has a bitwidth that suits its activation distribution. This kind of channel-wise quantization minimizes the impact of low-precision rounding. It is also evident in Figure 3.5 that, as training nears convergence, the step-size values differ according to the activation distribution and the bitwidth required to represent each layer. Moreover, consistent with expectation, the first and last layers opt for smaller step sizes during training, implying that they need more bits for their representation.
Table 3.4 Final results from the two-stream network after average-fusing the spatial and temporal stream weights. With 5% accuracy loss, the proposed method almost doubles the available activation sparsity in comparison to the baseline.

                             Baseline                                 Proposed method
Model type                   Accuracy (%)  Activation sparsity (%)    Accuracy (%)  Activation sparsity (%)
Spatial stream               75            50                         69            86
Temporal stream              70            46                         65            89
Two-stream (Average fused)   82            47                         77            88
The weights generated using this method were then average-fused to find the final two-stream network accuracy and activation sparsity (Table 3.4). Overall, the proposed method achieves 88% activation sparsity at the cost of a 5% accuracy loss.
3.5 Conclusion
Intuitively, the proposed temporal delta layer projects the temporal activation sparsity between two consecutive feature maps onto the spatial activation sparsity of their delta map. When executing sparse tensor multiplications in hardware, this spatial sparsity can be used to decrease the computations and memory accesses. As shown in Table 3.4, the proposed method resulted in 88% overall activation sparsity with a trade-off of a 5% accuracy drop on the UCF-101 dataset.
A collateral benefit of the obtained temporal sparsity is that the computations do not increase linearly with the frame rate. In typical DNNs, doubling the frame rate would automatically necessitate doubling the computations. In a temporal delta layer based model, however, increasing the frame rate will not only improve the temporal precision of the network but also increase its temporal sparsity, limiting the computations required [28].
The downside of using the temporal delta layer is that it requires keeping
track of previous activations in order to perform delta operations. As a
result, the overall memory footprint grows, putting more reliance on off-chip
memory. However, the rising popularity of novel memory technologies (like
resistive RAM [29], embedded Flash memory [30], etc.) may improve the
cost calculations in the near future.
Disclaimer: This paper is a distillation of the research done by one of the authors as part of her master's thesis and is partially published in chapter 3 of [32]. The complete thesis, along with the results and analysis, is available online [31].
Acknowledgment
This work is partially funded by the research and innovation projects TEMPO (ECSEL JU under grant agreement No 826655), ANDANTE (ECSEL JU under grant agreement No 876925), DAIS (KDT JU under grant agreement No 101007273), SunRISE (EUREKA cluster PENTA2018e-17004 SunRISE) and Comp4Drones (ECSEL JU grant agreement No 826610).
The JU receives support from the European Union’s Horizon 2020 research
and innovation programme and Sweden, Spain, Portugal, Belgium, Germany,
Slovenia, Czech Republic, Netherlands, Denmark, Norway and Turkey.
References
[1] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool,
“Temporal segment networks: Towards good practices for deep action
recognition,” in European conference on computer vision, pp. 20–36,
Springer, 2016.
[2] K. Chen and W. Tao, “Once for all: a two-flow convolutional neural
network for visual tracking,” IEEE Transactions on Circuits and Systems
for Video Technology, vol. 28, no. 12, pp. 3377–3386, 2017.
[3] K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang,
R. Wang, X. Wang, et al., “T-cnn: Tubelets with convolutional neural
networks for object detection from videos,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2896–2907,
2017.
[4] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing
deep neural networks with pruning, trained quantization and huffman
coding,” arXiv preprint arXiv:1510.00149, 2015.
[5] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural
network,” arXiv preprint arXiv:1503.02531, 2015.
[6] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, “Learning structured
sparsity in deep neural networks,” arXiv preprint arXiv:1608.03665,
2016.
[7] M. Mahowald, “The silicon retina,” in An Analog VLSI System for
Stereoscopic Vision, pp. 4–65, Springer, 1994.
[8] J. W. Mink, R. J. Blumenschine, and D. B. Adams, “Ratio of central
nervous system to body metabolism in vertebrates: its constancy and
functional basis,” American Journal of Physiology-Regulatory, Integrative and Comparative Physiology, vol. 241, no. 3, pp. R203–R212,
1981.
[9] A. Yousefzadeh, M. A. Khoei, S. Hosseini, P. Holanda, S. Leroux,
O. Moreira, J. Tapson, B. Dhoedt, P. Simoens, T. Serrano-Gotarredona,
et al., “Asynchronous spiking neurons, the natural key to exploit temporal sparsity,” IEEE Journal on Emerging and Selected Topics in Circuits
and Systems, vol. 9, no. 4, pp. 668–678, 2019.
Abstract
In this work, we present an automated AI-supported end-to-end technology
validation pipeline aiming to increase trust in semiconductor devices by
enabling a check of their authenticity. The high revenue associated with
the semiconductor industry makes it vulnerable to counterfeiting activities
potentially endangering safety, reliability and trust of critical systems such as
highly automated cars, cloud, Internet of Things, connectivity, space, defence
and supercomputers [7]. The proposed approach combines semiconductor
device-intrinsic features extracted by artificial neural networks with domain
expert knowledge in a pipeline of two stages: (i) a semantic segmentation
stage based on a modular cascaded U-Net architecture to extract spatial and
geometric information, and (ii) a parameter extraction stage to identify the
technology fingerprint using a clustering approach. An in-depth evaluation and comparison of several artificial neural network architectures has been performed to find the most suitable solution for this task. The final results validate the approach taken, with deviations close to acceptable levels as defined by existing standards within the industry.
DOI: 10.1201/9781003377382-4
This chapter has been made available under a CC BY-NC 4.0 license.
54 An End-to-End AI-based Automated Process
4.1 Introduction
Automation is one of the key parameters industries can approach to
strengthen quality and lower overall costs. The improved availability of data
and the mainstream application of approaches relying on artificial intelli
gence (AI) pushes industries towards the adaption of these AI methods.
Nonetheless, practical implementations of these often seem to fail due to
inflated expectations. Via a use-case from the semiconductor industry, we
show various practical ways to overcome these potential pitfalls.
The recently introduced European Chips Act recognises the paramount importance of the semiconductor industry within the global economy. The market for integrated electronics was at $452.25B in 2021 and is expected to grow to $803.15B in 2028 [8]. The high revenue potential causes extreme cost pressure and a highly competitive market. Consequently, for decades the semiconductor industry has been driven towards automation along the complete value chain. One way to differentiate from competitors is through AI-powered manufacturing enhancements, which have the potential to gain $35B - $40B annually over the entire industry [10]. Yet not only manufacturing stands to benefit from the industry's push towards AI. The methods also offer the chance to be used for trust generation. In the aforementioned staggering market, rogues also aim to catch their share through counterfeiting, i.e. cloning, remarking, overproducing, or simply reselling used parts [9]. This leads to the use case discussed throughout this work: via physical inspection and a fully integrated AI flow, we
present a fully automated assessment of the technological properties of a device. The idea for such a pipeline has already been introduced in [15], where it is argued that through a subsequent analysis of the cross-sections, the authenticity of the manufacturing technology can be validated. Relevant features in this case include geometric shapes and dimensions of the constituent structures, as well as material-related properties. Each technology can be interpreted as an individual fingerprint, such that deviations from specifications can be reported as suspicious. This work focuses on the end-to-end application aspects of the use case and includes the following contributions:
• We introduce an end-to-end, fully automated flow for semiconductor device technological parameter extraction by image segmentation and pattern recognition as an exemplary industrial use-case.
• We introduce our methodology, tailored to the requirements of the use case. This includes an image segmentation approach which is
Figure 4.3 Examples of labelled data showcasing the different ROIs: green – VIA; yellow – metal; teal – lateral isolation; red – poly; blue – deep trench isolation
map is then fed into a Siamese network containing two encoders (as shown in Fig. 4.9), with the original high-resolution image going through the other encoder in patches. Finally, the decoder stitches the patches together, obtaining a segmentation map at the same resolution as the input image. The Siamese network reached an averaged Dice score of 0.78 on the test subset.
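The Dice score quoted above compares a predicted mask with its ground truth. A standard per-class computation might look like the following sketch (not the authors' exact evaluation code):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(a, b)   # 2*2 / (3+3) ≈ 0.667
```

The averaged score reported in the text is this quantity computed per image (or per class) and then averaged over the test subset.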
Figure 4.10 Average Dice Scores (blue) and spread (green) per investigated network
architecture, along with the final chosen architecture (red)
Table 4.1 Obtained Dice Scores for each showcased network architecture
Architecture   U-net         PSPNet        FPN           GSCNN         Siamese
Average DSC    0.76          0.69          0.71          0.74          0.78
DSC range      0.71 - 0.80   0.63 - 0.72   0.65 - 0.77   0.69 - 0.79   0.74 - 0.81
Figure 4.11 An overview of the U-net cascade architecture, consisting of a 2D U-net (top)
and a 3D U-net (bottom) which takes as input the high resolution input image stacked with the
output segmentation of the first stage
Table 4.3 Utilised cluster evaluation techniques [14]. Notation: n: number of objects in the data-set; c: centre of the data-set; NC: number of clusters; C_i: the i-th cluster; n_i: number of objects in C_i; c_i: centre of C_i; W_k: the within-cluster sum of squared distances from the cluster mean; W*_k: appropriate null reference; B: reference data-sets

Method     Definition                                                                                       Value
CH [4]     [Σ_i n_i · d²(c_i, c) / (NC − 1)] / [Σ_i Σ_{x∈C_i} d²(x, c_i) / (n − NC)]                         Elbow
Gap [28]   log[(Π_b W*_kb)^(1/B) / W_k]                                                                     Elbow
DB [5]     (1/NC) · Σ_i max_{j≠i} {[(1/n_i) Σ_{x∈C_i} d(x, c_i) + (1/n_j) Σ_{x∈C_j} d(x, c_j)] / d(c_i, c_j)}   Min
DB2 [5]    (1/NC) · Σ_i [max_{j≠i} {(1/n_i) Σ_{x∈C_i} d(x, c_i) + (1/n_j) Σ_{x∈C_j} d(x, c_j)} / min_{j≠i} d(c_i, c_j)]   Min
Sil. [23]  (1/NC) · Σ_i (1/n_i) Σ_{x∈C_i} [b(x) − a(x)] / max[b(x), a(x)]                                   Max
Figure 4.13 Example cross-section image with annotated metal and contact/VIA features
Since the polygon objects are now vertically assigned, clustering in the horizontal dimension is the next step. The procedure is the same as previously discussed for the vertical clustering. For the vertically and horizontally clustered elements, the technological and geometrical parameters can be inferred. These are illustrated via Figure 4.13 for the metal and VIA classes. The vertical height is determined for metallisation layers, and height, width, and pitches for the interconnecting contact and VIA layers. After the polygon objects are assigned to classes, these attributes can be calculated through trivial mathematical operations. Height is the difference between the bounding box maximum and minimum in the vertical dimension. Width is the difference of the bounding box in the horizontal dimension, and pitches are the differences of the x-coordinates of the centroids of two adjacent polygon objects. The values are respectively averaged within all classes.
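The height, width, and pitch computations described above reduce to bounding-box arithmetic. A sketch with plain coordinate tuples (the representation and names are illustrative):

```python
def height(bbox):
    """bbox = (xmin, ymin, xmax, ymax): vertical extent."""
    return bbox[3] - bbox[1]

def width(bbox):
    """Horizontal extent of the bounding box."""
    return bbox[2] - bbox[0]

def pitches(bboxes):
    """Centroid-to-centroid x-distances of horizontally adjacent objects."""
    cx = sorted((b[0] + b[2]) / 2.0 for b in bboxes)
    return [right - left for left, right in zip(cx, cx[1:])]

vias = [(0, 0, 2, 3), (5, 0, 7, 3), (10, 0, 12, 3)]
print(pitches(vias))   # [5.0, 5.0]
```

Averaging these per-object values within each cluster yields the per-layer technology parameters.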
An example is shown for the VIA class in Figure 4.14. After segmentation of the grey-scale image, the individual segmented classes are converted into polygon objects; here the VIA class is exemplified. The vertical clustering process is shown in the two right images. The dendrogram visualises the linkage of the different clusters, which are subsequently optimised via the discussed approach. The optimum number of clusters is shown in the bottom right figure: the evaluation techniques report an optimum of four clusters (different values may constitute the optimum). Following this, these four clusters are subsequently clustered in the horizontal dimension and their geometry inferred. The following results were obtained for this example (besides the absolute values, the relative deviation to a manual measurement is given):
Figure 4.14 Example cross-section image (upper left). The polygonised VIA objects are
shown (lower left). A dendrogram is shown for the relative distances of the y-coordinates of
the single objects (upper right). Finally, the results of the utilised cluster evaluation techniques
are presented (lower right).
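The vertical clustering step — hierarchical linkage over object y-coordinates, followed by choosing the cluster count with an internal index — can be sketched with SciPy and scikit-learn. The silhouette score stands in here for the indices of Table 4.3, and all names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

def vertical_clusters(y_coords, max_k=6):
    """Cluster object y-centroids hierarchically; pick k by silhouette score."""
    y = np.asarray(y_coords, dtype=float).reshape(-1, 1)
    z = linkage(y, method="average")              # dendrogram linkage
    best_k, best_labels, best_score = 2, None, -1.0
    for k in range(2, min(max_k, len(y_coords) - 1) + 1):
        labels = fcluster(z, t=k, criterion="maxclust")
        score = silhouette_score(y, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels

ys = [10, 11, 12, 50, 51, 52, 90, 91, 92, 130, 131, 132]
k, labels = vertical_clusters(ys)                 # four clear vertical levels
```

In practice several indices would be evaluated and their (possibly differing) optima reconciled, as the chapter notes for Figure 4.14.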
that the current automated end-to-end flow reaches 75% accuracy for previously known Al-Tu technologies. Improvement is necessary for copper (Cu) technologies, which are more complex to segment. According to existing procedures within the industry, deviations of less than 5% for pitches and less than 25% for all other geometrical measurements compared to a ground truth, i.e. the designed technology parameters, are acceptable. The same requirements have been used as a benchmark for the validation of this application. The high deviations are attributed to process variances during device manufacturing and de-processing. The presented image shows a single frame which was acquired at a sub-optimal zoom level for measuring the discussed features. Yet almost all requirements were achieved. In summary, the proof-of-concept presented in this work displays strong potential to satisfy existing industrial requirements, especially when adequate zoom levels are chosen for the particular technological parameters.
4.4 Conclusion
The settings for AI implementation in an industrial setting are often completely different from consumer applications. With data being scarce, the design of a productive AI application is necessarily data-driven, or more specifically data-adapted. Industrial parameters are manifold, and the requirements typically impose the need to automate, improve, or even enable new processes. To make an AI-based solution viable, these requirements must be met. In this work, we have presented an end-to-end technology demonstrator – incorporating deep learning and cluster evaluation – showcasing the automation of semiconductor technology identification based on SEM cross-section analysis. A comparison of different convolutional neural network architectures was presented, and a candidate best suited for the SEM segmentation task was drafted. The proposed candidate architecture represents a cascade of 2D and 3D U-Nets, arranged in branches each dedicated to a single label of interest. Following a pragmatic perspective, a modular design is proposed, ensuring scalability and ease of maintenance. Trained on a custom-created data set of 2192 images, the proposed architecture obtained Dice scores in the range of 0.76-0.93 for labels of different complexity, arguing in favour of the employment of supervised deep learning-based methods even in applications with strongly limited amounts of available labelled data. Based on the obtained results, a parameter extraction algorithm is proposed, aimed at exploiting the obtained segmentation maps with the purpose of identifying
Acknowledgment
This work is conducted under the framework of the ECSEL AI4DI “Artificial
Intelligence for Digitising Industry” project. The project has received funding
from the ECSEL Joint Undertaking (JU) under grant agreement No 826060.
The JU receives support from the European Union’s Horizon 2020 research
and innovation programme and Germany, Austria, Czech Republic, Italy,
Latvia, Belgium, Lithuania, France, Greece, Finland, Norway.
References
[1] N. Abraham and N. Mefraz Khan. A novel focal tversky loss function
with improved attention u-net for lesion segmentation. 2019 IEEE 16th
International Symposium on Biomedical Imaging (ISBI 2019), pages
683–687, 2019.
[2] F. Altaf, S. M. S. Islam, N. Akhtar, and N. Khalid Janjua. Going deep
in medical image analysis: Concepts, methods, challenges, and future
directions. IEEE Access, 7:99540–99572, 2019.
Abstract
Surface defects generated during semiconductor wafer processing are among the main challenges in micro- and nanofabrication. The wafers are typically scanned using optical microscopy, and the images are then inspected by human experts, which tends to be a slow and tiring process. The development of a reliable machine vision-based system for the correct identification and classification of wafer defect types, to replace manual inspection, is a challenging task due to the variety of possible defects. In this work we developed a machine vision system for the inspection of semiconductor wafers and the detection of surface defects. The system integrates an optical scanning microscopy system and an AI algorithm based on the Mask R-CNN architecture. The system was trained using a dataset of microscopic images of wafers with Micro Electro-Mechanical Systems (MEMS), silicon photonics and superconductor devices at different fabrication stages, including surface defects. The achieved accuracy and detection speed make the system promising for cleanroom applications.
DOI: 10.1201/9781003377382-5
This chapter has been made available under a CC BY-NC 4.0 license.
74 AI Machine Vision System for Wafer Defect Detection
schematically shown in Figure 5.3. The dataset was split into training and validation sets, containing 935 and 165 images, respectively.
Here we used a Convolutional Neural Network (CNN): a special type of deep learning algorithm, used primarily for image recognition and processing. CNNs are inspired by the organization of the animal visual cortex [4][5] and are designed to learn spatial hierarchies of features, from low- to high-level patterns. We developed an algorithm based on the Mask R-CNN architecture [6], which is a state-of-the-art algorithm for object detection – a computer vision technique that enables the identification and localization of objects in an image or video. Mask R-CNN is the latest stage of the evolution of CNNs, providing high detection accuracy. At the same time, it requires more computational resources compared to faster algorithms, such as YOLO [7]. Mask R-CNN consists of two stages. The first stage, called a Region
Figure 5.3 A scheme of the image dataset preparation, including labelling, cropping and
data augmentation
demonstrated 86% accuracy with a detection time of 1–2 seconds per image. The accuracy of the system is approximately on the same level as that of a human operator, although the latter also depends heavily on the experience and tiredness of the operators. The experts estimated 86% accuracy as sufficient for applications in the VTT cleanroom, but mentioned that only about 15% of the detected defects were critical for wafer processing. Unfortunately, the criterion for a defect being critical or non-critical is very device-specific and cannot be easily generalized. After the system provides the detection results, the final decision on the importance of the defects for processing still has to be made by the cleanroom experts.
Regarding system scalability, in the current work we did not aim at moving towards smaller technology nodes, although such scaling might require the utilization of faster neural networks, like one-stage YOLO detectors. In general, the main expected impact of the system is the reduction of the overall working time required for wafer defect inspection. We believe that the system will help save valuable working time of cleanroom experts, improve fabrication yield and reduce fabrication cost.
5.3 Conclusion
We developed a system for the detection of wafer surface defects. The system
integrates an optical scanning microscopy system and an AI algorithm based
on the Mask R-CNN architecture. The image dataset used for training and
testing the system included microscopic images of wafers with MEMS,
silicon photonics and superconductor devices at different fabrication stages
Acknowledgements
This work is conducted under the framework of the ECSEL AI4DI “Artificial
Intelligence for Digitising Industry” project. The project has received funding
from the ECSEL Joint Undertaking (JU) under grant agreement No 826060.
The JU receives support from the European Union’s Horizon 2020 research
and innovation programme and Germany, Austria, Czech Republic, Italy,
Latvia, Belgium, Lithuania, France, Greece, Finland, Norway.
References
[1] H. J. Queisser, E. E. Haller, “Defects in Semiconductors: Some Fatal,
Some Vital”, Science, 281, 945– 950, 1998.
[2] T. Yuan, W. Kuo, and S. J. Bae, “Detection of spatial defect patterns gen
erated in semiconductor fabrication processes”, IEEE Trans. Semicond.
Manuf., vol. 24, no. 3, pp. 392–403, Aug. 2011.
[3] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, A. A. Kalinin, “Albumentations: Fast and Flexible Image Augmentations”, Information 11, 2, 2020. https://www.mdpi.com/2078-2489/11/2/125.
[4] S. Albawi, T. A. Mohammed, S. Al-Zawi, “Understanding of a convo
lutional neural network”, International Conference on Engineering and
Technology (ICET), IEEE, pp. 1-6, 2017.
[5] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F.E. Alsaadi, “A survey of
deep neural network architectures and their applications”, Neurocom
puting, 234, pp. 11-26, 2017.
[6] K. He, G. Gkioxari, P. Dollár, R. Girshick, “Mask R-CNN”.
arXiv:1703.06870, 2018.
[7] M. Carranza-García, J. Torres-Mateo, P. Lara-Benítez, J. García-
Gutiérrez, “On the Performance of One-Stage and Two-Stage Object
Detectors in Autonomous Vehicles Using Camera Data”, Remote Sens.
13, 89, 2021.
[8] K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image
Recognition”, Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 770-778, 2016.
[9] Z. Zhao, P. Zheng, S. Xu, X. Wu, “Object Detection With Deep Learn
ing: A Review”, IEEE Transactions on Neural Networks and Learning
Systems, 30, 11, 2019.
[10] U. Batool, M. I. Shapiai, M. Tahir, Z. H. Ismail, N. J. Zakaria, A.
Elfakharany, “A Systematic Review of Deep Learning for Silicon Wafer
Defect Recognition”, IEEE Access, 9, 116573, 2021.
6
Failure Detection in Silicon Package
Abstract
In an ever more connected world, semiconductor devices represent the core of every technically sophisticated system. Achieving the desired quality and effectiveness of such a system through assembly and packaging processes is highly demanding. In order to achieve the expected quality, the output of each process must be inspected either manually or rule-based. The latter leads to high over-reject rates, which require a lot of additional manual effort. Moreover, such an inspection is handcrafted by engineers, who can only extract shallow features. As a result, either yield losses increase due to a higher rejection rate, or more low-quality products are shipped. Therefore, the demand for advanced image inspection techniques is constantly increasing. Recently, machine learning and deep learning algorithms have been playing an increasingly critical role in fulfilling this demand and have therefore been introduced in multiple applications. In this paper, the potential use of advanced machine learning techniques is explored by showcasing image and wire-bonding inspection in semiconductor manufacturing. The results are very promising and show that AI models can find failures accurately in a complex environment.
DOI: 10.1201/9781003377382-6
This chapter has been made available under a CC BY-NC 4.0 license.
Figure 6.1 Left: Curve with abnormal minimum position (red) in comparison to normal ones (white) in recorded sensor data from the wirebonding process. Right: an example of an abnormal OOI image with a visible crack on the surface.
context, trained personnel took care of the heatsink inspection and were used to label the image data (roughly 300 images) for supervised learning.
Figure 6.2 Flow chart of the development and deployment life cycle for the AI solution at IFX. In the development phase, data scientists can use different programming languages, as the final model can be converted to ONNX. In the deployment phase, the vision framework simply accesses the ONNX model and runs it at inference time.
Figure 6.3 Process flow integration of the developed AD solution into an existing IFX
infrastructure.
Figure 6.4 shows the process flow during silicon packaging; the blue backward arrow shows the position of transfer learning from OOI back to images taken after the molding process, see Figure 6.5.
Figure 6.5 shows an example of an OOI image on the left side (this image is taken after electrical test and before shipping) and an example of an image after the molding process on the right side.
of the used anomaly detection was that the result is an anomaly score, indicating how much the raw data differs from normal, rather than a Boolean anomaly / no-anomaly indication. Thus, it is necessary to find an optimal threshold at which the difference in the raw data starts to influence the quality of the product.
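One common way to derive such a threshold is a mean-plus-k-sigma rule fitted on scores of known-good units; the following is a minimal sketch, with illustrative function names and scores rather than the production implementation:

```python
from statistics import mean, stdev

def fit_threshold(normal_scores, k=3.0):
    """Derive an anomaly threshold from scores of known-good units.

    Assumes roughly Gaussian scores; anything beyond k standard
    deviations above the mean is flagged (k is a tunable choice).
    """
    return mean(normal_scores) + k * stdev(normal_scores)

def is_anomalous(score, threshold):
    return score > threshold

# Hypothetical scores: low values are normal, high values abnormal.
normal = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.11]
thr = fit_threshold(normal)
print(is_anomalous(0.95, thr))  # a clearly deviating unit
print(is_anomalous(0.11, thr))  # a unit within the normal band
```

In practice the factor k (and hence the trade-off between escapes and over-rejects) would be tuned against the observed product quality.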
An important impact of the work was also the adaptation of the approach to a performant data management infrastructure, i.e., the development of automatable methods for the detection of conspicuous parameter behaviour and for its marking and storage. The evaluation was based on sample data and statistical analysis of standard deviations considering Nelson's rules. The work carried out covers the familiarization with the various technologies and their variants, the adaptation of the methods to the subject area, and the prototypical implementation and testing of the algorithms by embedding them in automated analysis pipelines. Currently, the anomaly detection for wirebonding is running on over 40 machines at 3 different IFX sites. During a runtime of 4 months, several misadjusted bonders, random errors, and contaminated devices were detected. However, a big focus is currently set on fully integrating the model not only into the infrastructure but also into the day-to-day workflow of the operators; this also includes a clear definition of action plans for found deviations and the training of operators.
For the OOI use case, after collecting images, the labeled images are first pre-processed by cropping the region of interest and normalizing the intensity values between 0 and 1.
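The described pre-processing can be sketched as follows (a pure-Python stand-in for the actual pipeline; the ROI coordinates and example image are illustrative):

```python
def preprocess(image, roi):
    """Crop the region of interest and scale intensities to [0, 1].

    `image` is a 2-D list of raw intensities; `roi` = (top, bottom, left, right)
    with exclusive bottom/right bounds. A stand-in for the actual pipeline.
    """
    top, bottom, left, right = roi
    crop = [row[left:right] for row in image[top:bottom]]
    lo = min(min(r) for r in crop)
    hi = max(max(r) for r in crop)
    span = (hi - lo) or 1  # avoid division by zero on flat crops
    return [[(v - lo) / span for v in r] for r in crop]

# Tiny hypothetical image with a 2x2 region of interest.
img = [[0, 50, 100, 150],
       [10, 60, 110, 160],
       [20, 70, 120, 170]]
print(preprocess(img, (0, 2, 1, 3)))
```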
These images are fed into a CNN for training. The CNN consists of 100 layers, organized in different blocks; each block contains a convolutional, a pooling, and a ReLU layer. Also, before the last, fully connected layer, a strict regularization factor is added in order to avoid overfitting, by adding a dropout layer with value 0.6. The data was split into 80% training and 20% validation data. The model reported an accuracy higher than 99%. Afterwards, the model was tested on productive data with roughly 25k images. Table 6.1 shows the confusion matrix with the important measures: sensitivity, specificity, and accuracy. As one can see, the model follows the zero-defect philosophy, as the sensitivity value is 100%. The rate of falsely rejected devices is also less than 1%; hence, only these have to be reviewed by an expert. Moreover, the performance of the model after scaling to a new process is still very robust, as one can see in Table 6.2, which shows the results reported by the model when run on productive data of the new process. Although there is one escapee on the bottom surface (BOT), the accuracy is still higher than 99%.
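The measures in Tables 6.1 and 6.2 follow directly from the confusion-matrix counts; a minimal sketch with hypothetical counts (not the actual numbers from the tables):

```python
def metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from a binary confusion matrix.

    Here "positive" means a defective device, so sensitivity = 1.0
    corresponds to the zero-defect goal of letting no defect escape.
    """
    sensitivity = tp / (tp + fn)            # defects caught among all defects
    specificity = tn / (tn + fp)            # good parts passed among all good parts
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts in the spirit of Table 6.1 (not the actual numbers):
sens, spec, acc = metrics(tp=50, fn=0, fp=200, tn=24750)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```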
Table 6.1 Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of OOI images.
Table 6.2 Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of the new process.
Acknowledgements
AI4DI receives funding within the Electronic Components and Systems for European Leadership Joint Undertaking (ECSEL JU) in collaboration with the European Union's Horizon 2020 Framework Programme and National Authorities, under grant agreement No 826060.
References
[1] S. Al-Baddai, M. Juhrisch, J. Papadoudis, A. Renner, L. Bernhard, C. Luca, F. Haas, and W. Schober. Automated Anomaly Detection through Assembly and Packaging Process, pages 161–176, September 2021.
7
S2ORC-SemiCause: Annotating and Analysing Causality
Abstract
For semiconductor manufacturing, easy access to causal knowledge documented in free texts facilitates timely Failure Modes and Effects Analysis (FMEA), which plays an important role in reducing failures and decreasing production cost. Causal relation extraction is the task of identifying causal knowledge in natural text and providing a higher level of structure. However, the lack of publicly available benchmark causality datasets remains a bottleneck in the semiconductor domain. This work addresses this issue and presents the S2ORC-SemiCause benchmark dataset. It is based on the S2ORC corpus, which has been filtered for literature on semiconductor research and subsequently annotated by humans for causal relations. The resulting dataset differs from existing causality datasets of other domains in the long spans of causes and effects, as well as causal cue phrases exclusive to the domain of semiconductor research. As a consequence, this novel dataset poses challenges even for state-of-the-art token classification models such as S2ORC-SciBERT. Thus, this dataset serves as a benchmark for causal relation extraction in the semiconductor domain.
DOI: 10.1201/9781003377382-7
This chapter has been made available under a CC BY-NC 4.0 license.
7.1 Introduction
Although causality represents a simple logical idea, it becomes a complex phenomenon when appearing in textual form. Natural language provides a wide variety of structures to represent causal relationships that can obfuscate the causal relations expressed via cause and effect. The task of causal relation extraction aims at extracting sentences containing causal language and identifying causal constituents and their relations [17].
In recent years, significant progress has been made in automating the identification of causal cues and the extraction of causal relations in natural language, by defining it as a multi-way classification problem of semantic relationships [6], designing a lexicon of causal constructions [2, 3], and providing insights on how to achieve high inter-rater agreement [13]. Approaches have been developed in scientific domains traditionally dominated by textual information, such as the biomedical sciences. Here, models to process causal relations are facilitated and accelerated by the development of benchmark datasets such as BioCause [10]. Such datasets not only allow for comparison and automatic evaluation of custom causal extractors, but also allow for training high-performing supervised models.
For semiconductor manufacturing, much of the existing knowledge can be considered causal, highlighted by approaches like Ishikawa causal diagrams as well as the Failure Modes and Effects Analysis (FMEA) tool, which captures root causes of potential failures. Even though such FMEA documents provide more structure than natural language text, dedicated pre-processing is required before further processing [12]. A significant amount of such causal knowledge is captured in textual documents, such as reports and knowledge bases. However, there is no publicly available annotated dataset for causal relation extraction yet. As a consequence, in this work we propose such a dataset, named S2ORC-SemiCause. The source for the documents of this novel dataset is the S2ORC academic corpus, which has been filtered for documents of relevance for the semiconductor domain. Human annotators identified causal cues and causal relations in the documents of the corpus. To achieve consistent and reproducible results, an annotation guideline was created and the annotation process was conducted in multiple phases. To provide baseline performance, the pre-trained language model BERT [1], which is currently considered state of the art for many natural language processing (NLP) tasks, was adapted for the task. An error analysis gives insights on the challenges for future causal relation extraction methods.
7.2 Dataset Creation
1 The annotation guideline will be made public at https://github.com/tugraz-isds/kd.
Table 7.1 Inter-annotator agreement for the first two iterations. Arg1 (cause) refers to the
span of the arguments that lead to Arg2 (effect) for the respective relation type.
Iteration 1 Iteration 2
Relation classification Cohen’s κ 0.65 0.80
Consequence Arg1 F1 0.55 0.71
Consequence Arg2 F1 0.60 0.81
Purpose Arg1 F1 0.00 0.92
Purpose Arg2 F1 0.00 0.80
F1 micro average 0.49 0.78
Table 7.2 Comparison of labels generated by both annotators for Iteration 2. Examples and total counts (in number of arguments) for each type are also given. Arg1 and Arg2 are highlighted with blue and yellow background, respectively. Partially overlapping texts are highlighted with green background.
Type                 #    Example sentence
Exact match          54   In fact, and for the soil in question, the capillary rise process is low, so the indirectly loss by evaporative loss is low too.
Partial overlap      8    This result suggests a possible dynamical influence of the mesospheric layers on the lower atmospheric levels.
Only one annotator   14   The wing displaces away from the ground, as a result of the reduction in (-ve) lift.
in Table 7.2. For 54 arguments, both annotators agree in both span and argument type. The remaining disagreements stem from (1) one annotator missing a relation (14 occurrences); (2) only partial overlap of the spans annotated by both annotators (8 occurrences).
Based on the insights from the updated baseline, the first set of documents was revisited and both sets of annotations from the first two iterations were then merged manually. In addition, for the 3rd iteration, two extra sets of 250 sentences were annotated by each annotator. As a result, our dataset consists of 600 sentences annotated with Consequence and Purpose relations.
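Span-level agreement numbers like those in Table 7.1 can be computed as an exact-match F1 between the two annotators' spans; a minimal sketch with hypothetical spans (the tuple format and labels are illustrative):

```python
def span_f1(spans_a, spans_b):
    """Exact-match F1 between two annotators' argument spans.

    Spans are (start, end, label) tuples; annotator A is treated as
    "gold" and B as "prediction" (F1 is symmetric under swapping).
    """
    tp = len(set(spans_a) & set(spans_b))
    if tp == 0:
        return 0.0
    precision = tp / len(spans_b)
    recall = tp / len(spans_a)
    return 2 * precision * recall / (precision + recall)

# Hypothetical spans: (token_start, token_end, argument_type)
a = [(0, 4, "Consequence-Arg1"), (6, 12, "Consequence-Arg2"), (14, 18, "Purpose-Arg1")]
b = [(0, 4, "Consequence-Arg1"), (6, 11, "Consequence-Arg2")]
print(round(span_f1(a, b), 2))
```

Note that under exact matching the second pair of spans counts as a miss because the end offsets differ; a partial-overlap variant would credit it, which is exactly the distinction drawn in Table 7.2.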
Figure 7.1 Causal cue phrases ranked by frequency for all sentences in the S2ORC-SemiCause dataset.
2 We release all data for future studies at https://github.com/tugraz-isds/kd.
7.3 Baseline Performance
Table 7.4 Descriptive statistics of the S2ORC-SemiCause dataset. #-sent: total number of annotated sentences, #-sent no relations: number of sentences without causality, Argument: total count and mean length (token span) of all annotated arguments, Consequence/Purpose: count and mean length of cause and effect arguments for the respective relation types.
         #-sent  #-sent no    Argument     Consequence              Purpose
                 relations                 cause       effect       cause       effect
                              count mean   count mean  count mean   count mean  count mean
overall 600 291 670 9.4 258 8.4 290 9.2 58 10.8 64 12.9
train 360 174 405 9.5 155 8.5 178 9.1 34 11.1 38 13.5
dev 120 55 122 9.3 49 8.1 52 8.8 10 9.7 11 16.1
test 120 62 143 9.3 54 8.3 60 9.9 14 10.9 15 8.9
Table 7.5 Baseline performance using BERT with a token classification head. Both the F1 scores and the standard deviation over 7 different runs are shown. Despite the small sample size, the standard deviation remains low, similar to previous work [14].
Relation Argument # F1 F1 -filter F1 -filter partial
Consequence Arg1 54 0.43 ± 0.03 0.48 ± 0.02 0.59 ± 0.01
Consequence Arg2 60 0.45 ± 0.03 0.50 ± 0.03 0.62 ± 0.02
Purpose Arg1 14 0.20 ± 0.07 0.25 ± 0.10 0.50 ± 0.05
Purpose Arg2 15 0.31 ± 0.06 0.36 ± 0.08 0.57 ± 0.07
micro average 143 0.39 ± 0.02 0.45 ± 0.02 0.59 ± 0.01
The resulting F1 scores3 are shown in Table 7.5 and are remarkably lower than for other benchmark NER datasets when down-sampled to a similar size [14].
3 The best performance is found using learning rate 1.5e-4, batch size 8, 10 warm-up steps, and 10 epochs.
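A token classification head predicts one tag per token, so argument spans are typically encoded as per-token BIO tags before training. A minimal sketch of such an encoding (the tag names are illustrative, not necessarily the paper's exact scheme):

```python
def spans_to_bio(n_tokens, spans):
    """Encode (start, end, label) spans as per-token BIO tags.

    `end` is exclusive and spans must not overlap. Labels like
    "Consequence-Arg1" are illustrative placeholders.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

print(spans_to_bio(6, [(1, 3, "Consequence-Arg1"), (4, 6, "Consequence-Arg2")]))
```

The long argument spans noted above mean that a single wrong token tag breaks an otherwise correct span under exact-match scoring, which is one plausible reason for the comparatively low F1.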
Table 7.6 Comparison of predicted and annotated argument spans for the test split. Examples and total counts (in number of arguments) for correct predictions and for each error source are also given. Arg1 and Arg2 are highlighted with blue and yellow background, respectively. Partially overlapping texts are highlighted with green background.
Missed examples (false negatives) are cases that the annotators have labelled but the model fails to predict. For example, the missed example shown in Table 7.6 uses the rare causal trigger derived from, which might be the reason why the model failed to recognize it.
7.4 Conclusions
Causality is critical knowledge in semiconductor manufacturing. In order to enable automatic causality recognition, we created the S2ORC-SemiCause dataset by annotating 600 sentences with 670 arguments for causal relation extraction from a subset of the semiconductor literature taken from the S2ORC dataset. This unique dataset challenges established state-of-the-art techniques because of its long spans for each argument. This benchmark dataset is intended to spur further research, fuel the development of machine learning models, and benefit NLP research in the semiconductor domain.
Acknowledgements
The research was conducted under the framework of the ECSEL AI4DI "Artificial Intelligence for Digitising Industry" project. The project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826060. The Know-Center is funded within the Austrian COMET Program (Competence Centers for Excellent Technologies) under the auspices of the Austrian Federal Ministry of Transport, Innovation and Technology, the Austrian Federal Ministry of Economy, Family and Youth, and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG. We acknowledge useful comments and assistance from our colleagues at Know-Center and at Infineon.
References
[1] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019, pages 4171–4186, 2019.
[12] H. Razouk and R. Kern. Improving the consistency of the failure mode effect analysis (FMEA) documents in semiconductor manufacturing. Applied Sciences, 12(4), 2022.
[13] I. Rehbein and J. Ruppenhofer. A new resource for German causal lan
guage. In Proceedings of the 12th Language Resources and Evaluation
Conference, pages 5968–5977, Marseille, France, May 2020. European
Language Resources Association.
[14] E. Salhofer, X. L. Liu, and R. Kern. Impact of training instance selection
on domain-specific entity extraction using bert. In NAACL SRW, 2022.
[15] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo. SemEval-2013
task 9 : Extraction of drug-drug interactions from biomedical texts
(DDIExtraction 2013). In Proc. of the 7th Int. Workshop on Semantic
Evaluation (SemEval 2013), 2013.
[16] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P.
Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. v.
Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame,
Q. Lhoest, and A. Rush. Transformers: State-of-the-art natural language
processing. In Proc. of the 2020 Conf. on Empirical Methods in NLP:
System Demonstrations, 2020.
[17] J. Yang, S. C. Han, and J. Poon. A survey on extraction of causal rela
tions from natural language text. Knowledge and Information Systems,
pages 1–26, 2022.
8
Feasibility of Wafer Exchange for European
Edge AI Pilot Lines
Abstract
This paper compares the contamination monitoring of the three largest microelectronics research organizations in Europe: CEA-Leti, imec, and Fraunhofer. The aim is to align the semiconductor infrastructure of the three research institutes to accelerate the supply to European industry for disruptive chip processing. To offer advanced edge AI systems with novel non-volatile memory components, integration into a state-of-the-art semiconductor fabrication production flow must be validated. For this, contamination monitoring is an essential aspect. Metallic impurities can have a major impact on expensive and complex microelectronic process flows. Knowing this, it is important to avoid contamination of process lines. In order to benefit from the combined infrastructure, expertise, and individual competences, the feasibility of wafer loops needs to be investigated.
Through a technical comparison and a practical analysis of potential cross-contaminations, the correlation of the contamination measurement
DOI: 10.1201/9781003377382-8
This chapter has been made available under a CC BY-NC 4.0 license.
8.1 Introduction
The aim is to align the semiconductor infrastructure of the three largest microelectronics research institutions in Europe, CEA-Leti, imec and Fraunhofer, in order to accelerate supply to European industry for disruptive chip processing. Contamination monitoring is an essential aspect of this alignment. Metallic impurities can have a major impact on expensive and complex microelectronic process flows. Therefore, it is important to avoid contamination of the process lines. To benefit from the semiconductor infrastructure, expertise and individual skills, the feasibility of wafer loops needs to be investigated. Additionally, to offer advanced edge AI systems with novel non-volatile memory components, integration into a state-of-the-art semiconductor fabrication production flow must be validated. Metallic contamination can have a major impact within the microelectronic process flow, whereby the different chemical elements have various effects. Therefore, contamination of the process lines must be avoided (Bigot, Danel, & Thevenin, 2005; Borde, Danel, Roche, Grouillet, & Veillerot, 2007). To simplify the future exchange of wafers between research institutes and between institutes and semiconductor fabs, it is necessary to find out more about contamination monitoring and possible cross-contamination. For this purpose, a technical comparison and a practical analysis of the possible cross-contaminations is carried out, in order to further investigate the correlation of the contamination measurement results of the three institutes.
8.2 Technical Details and Comparison
Table 8.2 Overview of VPD-ICPMS LLD determination and technical details for LETI / IMEC / FhG

Aligned Data           LETI                   IMEC                     FhG IPMS CNT
Determination of LLD   LLD VPD-ICPMS =        Calculated from 3x       For complete process
(VPD-ICPMS)            3x Sigma for           standard deviation of    VPD-ICPMS permanent
                       each element           calibration blank and    blank method.
                                              slope of calibration
                                              curve.
VPD brand and type     Rigaku VPD300A,        IAS Expert™ VPD          External source: no data.
                       stand-alone            system                   CNT: TePla system,
                                                                       stand-alone
ICP-MS brand and type  Agilent 8800,          Perkin-Elmer Nexion™     External source: no data.
                       three quadrupoles      ICP-MS                   CNT: Thermo Fisher RQ,
                                                                       single quadrupole
Exclusion size VPD     7 mm                   1 mm                     External source: no data.
                                                                       CNT: 5 mm (planned)
Figure 8.2 shows that the VPD-ICPMS LLDs of each institute are between 1E+6 and 5E+9 at/cm², roughly three decades lower than the TXRF ones.
The differences observed across the LLDs of each institute are due to the different techniques used and the different environments. The collection system at CEA-Leti is not fully automatic: technicians have to transfer a small container holding the chemical droplet from the VPD to the ICPMS. This container has to be cleaned manually between collections, and all these manual steps contribute to the increased Na, Mg and Ca levels of contamination. However, these specific LLDs are still lower than 1E+10 at/cm², and these elements are usually not critical for microelectronic device performance. For imec, the high values of Ti and V seem to be due to specific detector settings that favour minimal peak interference for Ti and V. For the other elements, all imec LLDs are lower, as imec uses a fully automatic tool without manual steps. Fraunhofer has a system comparable to CEA-Leti's, but it is still in the method development process and the current analyses are done externally on an automated system.
Figure 8.3 Schematic of the VPD bevel collection at (a) IMEC, (b) CEA-LETI and (c) FhG IPMS.
Overall, the VPD-ICPMS LLDs of each institute are very low, comparable to industry standards, and thus sufficient for metallic contamination control in the microelectronic environment. One other important parameter is the recovery rate, which has to be more than 95% for each of the elements. As each institute uses the same chemical solution for the collection step, the recovery rates are nearly the same and very good (>95%).
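For illustration, the recovery rate simply relates the collected amount to a known, intentionally spiked amount; the numbers below are hypothetical:

```python
def recovery_rate(measured, spiked):
    """Recovery rate in percent: amount collected vs. amount intentionally spiked."""
    return 100.0 * measured / spiked

# Hypothetical spike of 1.0E+11 at/cm², of which 9.7E+10 is recovered:
rate = recovery_rate(9.7e10, 1.0e11)
print(rate, rate >= 95.0)
```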
Figure 8.4 Comparison of LLDs of CEA LETI / IMEC for VPD-ICPMS bevel.
Figure 8.5 Comparison of TXRF results of CEA LETI / IMEC for the IMEC inspection tool and the edge of 300 mm wafers. The tool supports the inspection of patterned and unpatterned wafers.
Figure 8.5 shows the comparison of the TXRF measurements obtained by CEA-Leti and imec for the inspection tool. There is a high agreement between the values, demonstrating the comparability of the measurement results. The Ti measured by imec is assumed to be handling contamination introduced during the measurement. Nevertheless, the concentration is low.
Figure 8.6 shows the comparison of the VPD-ICPMS data for the backside surface of the wafers. For the VPD-ICPMS, the results show noticeable differences. In Figure 8.6, only elements detected at concentrations higher than the LLD are reported; i.e., if an element is not detected by one of the institutes, it is not shown in the graph. The first conclusion is that more elements are detected by VPD-ICPMS due to its lower LLDs. All the concentrations are lower than 1E+11 at/cm² and are in accordance with the TXRF results. The second conclusion is that the three analysed wafers do not carry the same contamination. While CEA-Leti and imec found Ga, Ge and Sb, Fraunhofer did not detect these elements. Imec and Fraunhofer quantified Al, Fe, Ti and W, whereas CEA-Leti did not find these elements. The analysed wafers are not twins, because the cross-contamination process does not contaminate each wafer at the same concentration. Moreover, some wafers were handled and shipped more than others, and these differences impact the metallic contamination.
Figure 8.7 shows the results obtained on the bevel. Contamination levels on the bevel are higher than those measured on the surface. In this example, the results obtained by CEA-Leti and imec are in agreement when the elements are detected by both institutes. The concentrations measured by imec are almost always higher than those of CEA-Leti, probably due to different influencing factors. First, the collection techniques are different and the droplet-scanned areas are not the same. Moreover, the bevel of each wafer was probably contaminated by handling and shipping. That is why the concentrations
Figure 8.6 Comparison of VPD-ICPMS results of CEA LETI / IMEC / FhG for the IMEC inspection tool.
Figure 8.7 Comparison of VPD-ICPMS bevel results of CEA LETI / IMEC for the IMEC inspection tool.
obtained on the bevel were always higher than those obtained on the surface. The study of the bevel is very challenging, and these results show the metallic contamination due to the process in the selected equipment, but also that brought in by handling and shipping.
8.4 Conclusion
This study confirms that the three institutes are able to analyse metallic contamination either by TXRF or VPD-ICPMS with comparable LLDs. This result is very promising for the exchange of wafers in the future. TXRF, with its higher LLDs, did not show metallic contamination above 1E+11 at/cm². On the other side, due to its very low limits of detection, VPD-ICPMS makes the different concentrations obtained by the different institutes observable. Nevertheless, these concentrations are very low. The cross-contamination in a tool does not contaminate wafers at the same level. Hence, in the future, in order to compare the capabilities of the different institutes more reliably, an inter-laboratory test with intentionally contaminated, standardised wafers would be necessary. Moreover, all the measurements were done on “witness wafers” and not on product wafers. In the future, it will be necessary to develop techniques able to analyse the metallic contamination on real wafers
during their flow. In this way, CEA-Leti has developed a new system allowing the metallic contamination control of the bevel of product wafers (Boulard, et al., 2022) (FR Patent No. U.S. Patent No 20200203190 A1, 2020).
Although some additional improvement is required to create a smooth loop between the research institutes, this work makes the wafer exchange flow much easier thanks to these first experiences and contributes to strengthening the collaboration in current and future projects. Moreover, the conclusions of this study broaden the capabilities in terms of tool, process and expertise access for potential industrial partners. Thus, an important milestone has been reached in aligning the three research institutes to offer advanced AI systems with novel non-volatile memory components.
Acknowledgements
This study was fully financed by the TEMPO project. The TEMPO project has received funding from the Electronic Components and Systems for European Leadership Joint Undertaking under grant agreement No 826655. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and Belgium, France, Germany, Switzerland, and The Netherlands.
References
[1] C. Bigot, A. Danel, S. Thevenin (2005). Influence of Metal Contamination in the Measurement of p-Type Cz Silicon Wafer Lifetime and Impact on the Oxide Growth. Solid State Phenomena (Vols. 108-109), pp. 297–302. doi:10.4028/www.scientific.net/SSP.108-109.297
[2] Y. Borde, A. Danel, A. Roche, A. Grouillet, M. Veillerot (2007). Estimation of Detrimental Impact of New Metal Candidates in Advanced Microelectronics. Solid State Phenomena (Vol. 134), pp. 247–250. doi:10.4028/www.scientific.net/SSP.134.247
[3] F. Boulard, V. Gros, C. Porzier, L. Brunet, V. Lapras, F. Fournel, N. Posseme (21 May 2022). Bevel contamination management in 3D integration by localized SiO2 deposition. SSRN Electronic Journal.
[4] D. Autillo, et al. (June 2020). FR Patent No. U.S. Patent No 20200203190 A1.
9
A Framework for Integrating Automated
Diagnosis into Simulation
Abstract
Automatically detecting and locating faults in systems is of particular interest for mitigating undesired effects during operation. Many diagnosis approaches have been proposed, including model-based diagnosis, which allows deriving diagnoses from system models directly. In this paper, we present a framework bringing together simulation models with diagnosis, allowing for evaluating and testing diagnosis models close to their real-world application. The framework makes use of functional mock-up units for bringing together simulation models and enables their integration with ordinary programs written in either Python or Java. We present the integration of simulation and diagnosis using a two-lamp example model.
9.1 Introduction
To keep systems operational, we need to carry out diagnoses regularly. Diagnosis includes the detection of failures, the localization of corresponding root causes, and repair. We carry out regular maintenance activities that include diagnosis and predictions regarding the remaining lifetime of components to prevent systems from breaking during use. However, there is no guarantee
DOI: 10.1201/9781003377382-9
This chapter has been made available under a CC BY-NC 4.0 license.
that system components do not break during operation, even when carrying out maintenance as requested. In some cases, it is sufficient to indicate such a failure, i.e., by presenting a warning or error message and passing mitigation measures to someone else. Unfortunately, there are systems, like autonomous systems, where we can hardly achieve such a mitigation process. For example, in fully autonomous driving, there is no driver anymore to pass control to. Therefore, there is a need for coming up with advanced diagnosis solutions that cover detection, localization, and repair. A practical real-world problem demonstration of an on-board control agent was validated in the year 1999, within the scope of Deep Space One, a space exploration mission carried out by NASA. In this regard, the authors of the paper [4] describe methods developed around model-based programming principles, including the area of model-based diagnosis. The methods were applied to autonomous systems designed for high reliability, operating as part of a spacecraft system.
When we want to integrate advanced diagnosis into systems, we need means that allow us to easily couple monitoring with diagnosis. As stated by the authors in [3], this coupling enables the diagnosis method to detect and localize faults based on observations obtained by monitoring a cyber-physical system (CPS). Furthermore, we require close integration into today's development processes, which rely on system simulation. The latter aspect is of utmost importance for showing early that diagnosis based on monitoring can improve the overall behaviour of a system even when it is not working as expected. We contribute to this challenge and present a framework for integrating different simulation models and diagnoses. The framework combines functional mock-up units (FMUs), which may originate from modeling environments like OpenModelica1, with ordinary programming languages like Java or Python. We use these language capabilities to integrate diagnosis functionality. The architecture of our framework is based on the client-server pattern and is implemented using Docker containers.
Using our framework, we can easily add diagnoses to systems. In addition, we can use this framework for carrying out verification and validation of the system functionality enhanced with diagnosis capabilities. In this manuscript, we present the framework and show the integration of diagnosis. For the latter purpose, we make use of a simple example. We will make the framework and the underlying diagnosis engine available for free and as open source. The framework contributes to the research area of Edge Artificial
1 see https://openmodelica.org
9.2 Model-based Diagnosis
Figure 9.1 A simple electric circuit comprising bulbs, a switch and a battery.
2 We are using Prolog syntax because recent solvers like Clingo (see https://potassco.org/clingo/) rely on it.
To use a model for diagnosis, we only need to define the structure of the system making use of the component models. For the two-bulb example, we define a battery, a switch, and two bulbs that are connected according to Figure 9.1.
type(b, bat).
type(s, sw).
type(l1, lamp).
type(l2, lamp).
conn(in_pow(s), pow(b)).
conn(out_pow(s), in_pow(l1)).
conn(out_pow(s), in_pow(l2)).
on(s).
val(light(l1),off).
val(light(l2),on).
When using a diagnosis engine like the one described in [8], we obtain one single-fault diagnosis {l1}. But how does this work? The diagnosis engine makes use of a simple mechanism: it searches for a truth assignment to the nab/1 predicates such that the model together with these assumptions does not lead to a contradiction. When assuming l1 to be not working, the fact that lamp l2 is on can be derived. However, we cannot derive anything else that would lead to a contradiction.
118 A Framework for Integrating Automated Diagnosis into Simulation
Note that this simple model also works in other, more interesting
cases. Let us assume that the switch is on but no light is on. For this case,
the diagnosis engine delivers three diagnoses, {b}, {s}, and {l1, l2}, stating
that either the battery is empty, the switch is broken, or both lamps are not
working at the same time. Another interesting case that might occur is setting
the switch to off while one lamp, i.e., l1, is still on. In this case we only obtain
a double fault diagnosis {s, l2}, stating that the switch is not working as
expected and that lamp l2 is faulty as well.
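The behaviour of such a consistency-based engine can be illustrated with a small brute-force sketch in Python. This is only an illustration, not the ASP-based engine of [8]; the assumed fault modes follow Table 9.1: an abnormal battery is empty (no power), an abnormal switch may be stuck open or shorted closed, and an abnormal bulb (broken or shorted) emits no light.

```python
from itertools import combinations

COMPONENTS = ("b", "s", "l1", "l2")

def consistent(ab, switch_on, obs):
    """True if assuming exactly the components in `ab` to be abnormal
    is consistent with the observed lamp states."""
    # An abnormal switch may conduct or not; a healthy switch follows its mode.
    conduct_choices = (True, False) if "s" in ab else (switch_on,)
    for conducts in conduct_choices:
        power = ("b" not in ab) and conducts  # empty battery gives no power
        if all((False if lamp in ab else power) == obs[lamp]
               for lamp in ("l1", "l2")):     # abnormal bulbs emit no light
            return True
    return False

def diagnoses(switch_on, obs):
    """Enumerate subset-minimal diagnoses by increasing cardinality."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for cand in combinations(COMPONENTS, size):
            if any(set(d) <= set(cand) for d in found):
                continue  # supersets of known diagnoses are not minimal
            if consistent(set(cand), switch_on, obs):
                found.append(cand)
    return found
```

Running the three scenarios from the text reproduces the reported results: {l1} for the first, then {b}, {s}, and {l1, l2}, and finally {s, l2}.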
3 See https://fmi-standard.org
4 See https://potassco.org/clingo/
9.3 Simulation and Diagnosis Framework
Figure 9.2 Illustration of the simulation and diagnosis environment as well as the overall
operating principles. The framework of the FMU simulation tool provides an interface to
enable the integration of a diagnosis tool and/or other methods. The models can be substituted
by any others in the provided framework.
5 See https://de.mathworks.com/products/matlab.html
6 See https://github.com/INTO-CPS-Association/unifmu
(for a given time step) simulation. To enable that feature, it is essential that
the FMU is generated as a co-simulation model. Within a co-simulation setup,
the numerical solver is embedded in and supplied by the generated FMU. Through the
provided interface methods, the FMU can be controlled by setting the inputs
and parameters, computing the next simulation time step, and reading the
resulting observations. This setup makes it possible to execute tools and methods
while the simulation is paused after a simulated time step.
Besides the main diagnosis algorithm, the tool offers different output
options to simplify the evaluation of the received diagnoses. The received
data can be exported to a JSON or CSV file or printed directly in the
terminal during run-time. The output results are the detailed computed
diagnoses, the total number of diagnoses found for each fault size, an indicator
for strong faults, and the diagnosis time, both separated by fault size and in
total. As input, the tool requires the Prolog model representing the CPS as an
abstract model (see Section 9.2) and the related observation/constraint file
with all necessary input information to execute the diagnosis process.
In reference to Figure 9.2, we show the simulation tool update loop,
where an update is triggered and the observations are received. The
observations are then passed via the method interface as input to the implemented
diagnosis tool. Before calling the diagnosis, some configuration is specified:
the abstract model, the maximum number of answer sets to compute, the
maximum fault size of interest, and the observations, which are generated
from the simulation output information. In addition, the diagnosis output
format, e.g., JSON or CSV, can be selected. Finally, the ASP solver
is executed with the given model, configuration, and simulation observations.
After the diagnosis result of the current time frame is received, it is stored in
the defined format structure, and the simulation is continued with the next
time step in the loop.
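The update loop just described can be sketched in Python. The method names below (do_step, read_outputs) are illustrative placeholders for the actual FMU interface, and `diagnose` stands for the call into the diagnosis tool; a real implementation would access the FMU through a library such as FMPy.

```python
import json

def run_with_diagnosis(fmu, diagnose, t_end, dt, out_format="json"):
    """After every simulated time step, pause the simulation, pass the
    observations to the diagnosis engine, store the result in the chosen
    output format, and continue with the next step."""
    results = []
    t = 0.0
    while t < t_end:
        fmu.do_step(t, dt)          # advance the co-simulation FMU
        obs = fmu.read_outputs()    # observations for this time step
        results.append({"time": round(t + dt, 6),
                        "diagnoses": diagnose(obs)})
        t += dt
    return json.dumps(results) if out_format == "json" else results
```

Because the FMU is only advanced by explicit do_step calls, the time spent in the diagnosis has no effect on the simulated results, exactly as argued in Section 9.4.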
9.4 Experiment
To show the applicability of the framework, we make use of the
two-lamps-model concept shown in Figure 9.1. For the simulation, a
model of the two-lamps circuit (see Listing 1) is generated in OpenModelica,
comprising a battery (5.0 V), a closing switch, and two light bulbs (100 Ω each).
Besides the connection of the components, the model also describes inputs
which can be set during the simulation. These inputs cover the fault
type of each component and the operational switch logic. To give an example
of the component programming, the switch model is shown in more detail
in Listing 2. Besides the component mode, the equations also represent the
behaviour under different fault states; e.g., a broken switch results in
an infinitely high internal resistance, equivalent to an open electrical circuit.
An equivalent fault state is implemented for each component, as shown in
Table 9.1.
Moreover, the OpenModelica model is converted into a co-simulation
FMU, which enables using the model in the described FMU simulation tool.
1: Let DS be {}.
2: Let Mf be M.
3: for i = 0 to n do
4:   Mf_i = Mf ∪ { :- not numABs(i). }
5:   S = ASPSolver(Mf_i)
6:   if i is 0 and S is {{}} then
7:     return S
8:   end if
9:   Let DS be DS ∪ S.
10:  for Δ in S do
11:    Let C = AB(Δ) be the set {c1, ..., ci}
12:    Mf = Mf ∪ { :- ab(c1), ..., ab(ci). }
13:  end for
14: end for
15: return DS
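In Python terms, the listing can be mimicked with the solver abstracted behind a callable. Here `solve_at_size(i)` is an illustrative name, not part of the authors' tool; it stands for running the ASP solver with the constraint `:- not numABs(i).`, and the blocking constraints of lines 10-13 are emulated by filtering out supersets of already-found diagnoses.

```python
def iterative_diagnosis(solve_at_size, n):
    """Collect minimal diagnoses by increasing fault size, blocking
    supersets of diagnoses found at smaller sizes."""
    ds, blocked = [], []
    for i in range(n + 1):
        answers = [s for s in solve_at_size(i)
                   if not any(b <= s for b in blocked)]
        if i == 0 and answers == [set()]:
            return answers  # system consistent as-is: no fault present
        ds.extend(answers)
        blocked.extend(answers)
    return ds
```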
model Two_Lamp_Circuit
  PhysicalFaultModeling.PFM_Bulb bulb1(r = 100.0);
  PhysicalFaultModeling.PFM_Bulb bulb2(r = 100.0);
  PhysicalFaultModeling.PFM_Switch sw;
  PhysicalFaultModeling.PFM_Ground gnd;
  PhysicalFaultModeling.PFM_Battery bat(vn = 5.0);
equation
  connect(gnd.p, bat.m);
  connect(bat.p, sw.p);
  connect(sw.m, bulb1.p);
  connect(sw.m, bulb2.p);
  connect(bulb1.m, gnd.p);
  connect(bulb2.m, gnd.p);
end Two_Lamp_Circuit;
model Two_Lamp_Circuit_Testbench
  PhysicalFaultModeling.Two_Lamp_Circuit sut;
  input FaultType bat_state(start = FaultType.ok);
  input OperationalMode switch_mode(start = OperationalMode.close);
  input FaultType switch_state(start = FaultType.ok);
  input FaultType bulb1_state(start = FaultType.ok);
  input FaultType bulb2_state(start = FaultType.ok);
equation
  sut.sw.mode = switch_mode;
  sut.bat.state = bat_state;
  sut.sw.state = switch_state;
  sut.bulb1.state = bulb1_state;
  sut.bulb2.state = bulb2_state;
end Two_Lamp_Circuit_Testbench;
model PFM_Switch
  extends PhysicalFaultModeling.PFM_Component;
  PhysicalFaultModeling.OperationalMode mode(start = OperationalMode.open);
  Modelica.Units.SI.Resistance r_int;
equation
  v = r_int * i;
  if state == FaultType.ok then
    if mode == OperationalMode.open then
      r_int = 1e9;
    else
      r_int = 1e-9;
    end if;
  elseif state == FaultType.broken then
    r_int = 1e9;
  else
    r_int = 1e-9;
  end if;
end PFM_Switch;
Table 9.1 CPS model component state description for the light bulb, switch, and battery. All
used states, including fault states of the components, are shown.

Component                        State    Description
light bulb (bulb), switch (sw)   ok       ordinary behaviour
                                 broken   open connection in the electrical circuit
                                 short    short in the electrical circuit
battery (bat)                    ok       ordinary behaviour
                                 empty    empty battery fault
In order to simulate the model behaviour in detail, the update time step is set
to 0.01 seconds. In addition, the fault injection during run-time is configured
to trigger a single light bulb fault at 0.2 seconds and a switch fault after 0.3
seconds, as described in detail in the simulation part of Figure 9.4.
For the diagnosis part, we make use of the described abstract model of the
electrical two-lamps circuit (see Section 9.2). The overall framework is built
up such that a diagnosis is computed after each simulated time step, based
on the actual observations (simulation outputs, parameters, and inputs).
The use of a co-simulation FMU allows a step-by-step simulation, which
makes it possible to pause the simulation during the diagnosis process and continue
afterwards. Therefore, the diagnosis time effort has no impact on the overall
simulation results.
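A run-time fault-injection schedule of this kind can be expressed as a simple mapping from injection time to testbench input. The input names below mirror the testbench inputs of the Modelica listing but are used here only for illustration.

```python
# time (s) -> (testbench input, injected fault mode)
FAULT_SCHEDULE = {
    0.2: ("bulb1_state", "broken"),
    0.3: ("switch_state", "broken"),
}

def inputs_at(t, schedule=FAULT_SCHEDULE):
    """Return the fault inputs active at simulation time t; all other
    components keep their default state FaultType.ok."""
    return {name: mode for when, (name, mode) in schedule.items()
            if t >= when}
```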
Figure 9.3 shows the observed signals for the current flow in the battery
and light bulbs 1 and 2, as well as the actual switch mode. The injected faults
are highlighted at the corresponding time points. In Figure 9.4, a table presents the
observations for the three interesting time sections: normal behaviour,
a broken light bulb, and a broken switch.
Figure 9.3 Simulation showing the measured signal output of the two bulbs, the switch, and the
battery. For this example, a fault injection (broken) in bulb 1 after 0.2 seconds (red indicator)
and a fault injection (broken) in the switch after 0.3 seconds (orange indicator) is initiated.
After reaching simulation time 0.05 seconds, the switch mode is changed from open to closed,
and the model shows the expected ordinary behaviour without any abnormal components.
Both light bulbs operate at an expected current consumption of 0.05 A.
These observations are translated into a readable input format for the diagnosis
tool, which is shown in the corresponding status row "Observation" (see
Figure 9.4). Given the abstract model and the observation input, the
diagnosis tool computes a satisfiable model at fault size zero, which indicates
the expected ordinary behaviour of all considered components.
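The translation from measured signals to abstract diagnosis observations can be sketched as follows. The 0.01 A threshold is an assumption chosen for illustration; a bulb drawing its nominal 0.05 A is observed as on, and one drawing 0.0 A as off.

```python
def to_observations(currents, threshold=0.01):
    """Map measured bulb currents (in A) to the on/off observations
    consumed by the abstract diagnosis model of Section 9.2."""
    return {lamp: ("on" if amps > threshold else "off")
            for lamp, amps in currents.items()}
```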
The time section at 0.2 seconds shows the behaviour with a broken
light bulb. The current consumption of bulb 1 immediately drops to
0.0 A, and the diagnosis observation changes from mode on to off. Since
the main power switch is still closed and bulb 2 is still in active mode on, the
diagnosis model concludes that component bulb 1 is abnormal, ab(l1). The
next investigated fault (broken) is injected into the closed switch. Since the
power supply for both light bulbs is interrupted, the current consumption
drops to 0.0 A. Based on the given observations, the diagnosis model concludes,
as expected, an abnormal switch (ab(s)) or battery (ab(b)) for
single faults. Under consideration of double faults, the computed diagnosis
Figure 9.4 Simulation and diagnosis output results based on the electrical two-lamps circuit
with a broken bulb after 0.2 seconds and a broken switch at 0.3 seconds. The upper tables
illustrate the simulation input/output signals, which are used as observations for the diagnosis
part (lower tables). Based on the given observations for the three selected time steps, different
diagnosis results are obtained.
9.5 Conclusion
In this paper, we have shown how to use an automated diagnosis method
within a simulation framework for a CPS (cyber-physical system). For this
purpose, we introduced the foundations behind the model-based diagnosis
method based on a simple electric circuit model comprising two light bulbs,
a switch, and a battery. Next, we described a framework for simulating the
developed CPS model with the ability of fault injection during run-time. In
order to run the model in the given framework, it is essential to generate a
functional mock-up unit (FMU) based on the developed electrical two-lamp
circuit model. By providing the FMU in co-simulation configuration, the
simulation can run in a step-by-step mode (time steps), which makes it possible
to call other functions, for example the diagnosis method, while the simulation is
paused, and then to continue with the next time step.
Acknowledgments
The research was supported by ECSEL JU under the project H2020 826060
AI4DI - Artificial Intelligence for Digitising Industry. AI4DI is funded by the
Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT)
under the program "ICT of the Future" between May 2019 and April 2022.
More information can be retrieved from https://iktderzukunft.at/en/.
References 127
References
[1] R. Davis, H. Shrobe, W. Hamscher, K. Wieckert, M. Shirley, and S. Polit.
Diagnosis based on structure and function. In Proceedings AAAI, pages
137–142, Pittsburgh, August 1982. AAAI Press.
[2] R. Davis. Diagnostic reasoning based on structure and behavior. Artificial
Intelligence, 24:347–410, 1984.
[3] D. Kaufmann, I. Nica, and F. Wotawa. Intelligent agents diagnostics
enhancing cyber-physical systems with self-diagnostic capabilities. Adv.
Intell. Syst., 3(5):2000218, 2021.
[4] N. Muscettola, P. Pandurang Nayak, B. Pell, and B. C. Williams. Remote
agent: to boldly go where no ai system has gone before. Artificial
Intelligence, 103(1):5–47, 1998. Artificial Intelligence 40 years later.
[5] R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence,
32(1):57–95, 1987.
[6] F. Wotawa. Reasoning from first principles for self-adaptive and
autonomous systems. In E. Lughofer and M. Sayed-Mouchaweh, editors,
Predictive Maintenance in Dynamic Systems – Advanced Methods, Decision
Support Tools and Real-World Applications. Springer, 2019.
[7] F. Wotawa. Using model-based reasoning for self-adaptive control of
smart battery systems. In Moamar Sayed-Mouchaweh, editor, Artificial
Intelligence Techniques for a Scalable Energy Transition – Advanced
Methods, Digital Technologies, Decision Support Tools, and Applications.
Springer, 2020.
[8] F. Wotawa and D. Kaufmann. Model-based reasoning using answer set
programming. Applied Intelligence, 2022.
[9] F. Wotawa, O. A. Tazl, and D. Kaufmann. Automated diagnosis of
cyber-physical systems. In IEA/AIE (2), volume 12799 of Lecture Notes
in Computer Science, pages 441–452. Springer, 2021.
10
Deploying a Convolutional Neural Network
on Edge MCU and Neuromorphic Hardware
Platforms
Abstract
The rapid development of embedded technologies in recent decades has led
to the advent of dedicated inference platforms for deep learning. However,
unlike the development libraries for the algorithms, hardware deployment is
highly fragmented in technology, tools, and usability. Moreover, emerging
paradigms such as spiking neural networks do not use the same prediction
process, making comparisons between platforms difficult. In this paper,
we deploy a convolutional neural network model on different platforms,
comprising microcontrollers with and without deep learning accelerators as
well as an event-based accelerator, and compare their performance. We also report
the perceived effort of deployment for each platform.
10.1 Introduction
Edge computing is a key tool in harnessing the possibilities of artificial
intelligence. Some advantages of edge over cloud processing are low latency,
allowing real-time application and connectivity independence, i.e., no need
DOI: 10.1201/9781003377382-10
This chapter has been made available under a CC BY-NC 4.0 license.
processing units, the technology of the hardware, and the frameworks and
tools used during the deployment of the models to benchmark. To harmonize
the performance assessment, benchmarking suites such as TinyMLPerf [3]
have been created. Recently, a benchmarking suite has also been developed for
event-based neuromorphic hardware [4]. However, both of these solutions still
need manual adaptation of the code to run on new platforms. While
benchmarking gives good insights into which platform to select and why,
the question remains how to use the benchmarking tools themselves.
Each platform comes with its own SDK, conversion tools, and
constraints of utilization, which in turn limits the possibility of comparing the
platforms with each other.
Today, many benchmarks are therefore performed on just a few hardware
platforms and compare only a single use-case, as alternatives are more
cumbersome. Furthermore, it is easier to benchmark and compare platforms
from the same constructor, as the deployment pipelines are usually similar
between devices. In this regard, the standard architectures LeNet-5 and ResNet-20
have been benchmarked on a few STM32 boards [5]. Machine learning
algorithms have also been compared on Cortex-M processors [6][7]. Some
efforts at cross-constructor benchmarking have also been made. For example,
a recent work deployed a gesture recognition and wake-up words application
on an Arduino Nano BLE and an STM32 NUCLEO-F401RE [8] using a
convolutional neural network.
While the above research focuses on the established STM32 Cortex-M-based
MCUs, some emerging processors have also been explored [9], but
research in this domain remains scarce. Furthermore, the deployment
pipelines are not documented, which limits the reproducibility of the results.
In our research, we deploy a single neural network on three different
platforms and observe their performance. We also highlight the differences
between the deployment pipelines of each constructor, and we perform a
qualitative study of the ease of deployment on each system.
10.3 Methods
In this section, we present the selected task and associated experimental setup,
and a method to evaluate the effort of the deployment.
edge devices today. These sample devices are a very small subset of the large
variety of available devices, but they show that with only three different board
manufacturers, an extensive adaptation of the deployment pipeline is necessary.
The three devices selected for our experiments are the following: a Kendryte
K210 from Canaan, a dual-core RISC-V processor with floating-point units;
an STM32L4R9 from STMicroelectronics (ST) with an ARM Cortex-M4
core, also including a floating-point unit; and a SynSense DynapCNN, an event-based
processor. Table 10.1 summarizes the major differences between these
platforms.
Kendryte K210
The Kendryte K210 is used with the Sipeed MaixDock M1. The neural
networks embedded in this device were converted from the Keras H5 file format,
using Tensorflow 2.9.1 and associated TFLite. The firmware version of the
Kendryte is 0.6.2, and the version of the NNCase package used for conversion
is 0.2.
STM32L4R9
The STM32L4R9 board with an Arm Cortex-M4 core processor from ST
is programmed in C. Due to the complexity of hardware initialization, ST
provides a tool, STM32CubeMX 6.5.0, which automatically generates an
initial C project for a specific board. The tool X-CUBE-AI 7.1.0 converts
TFLite models into C files which are, alongside the X-CUBE-AI inference
library, added to the project. The Keras H5 file network is converted to
TFLite format using Tensorflow 2.8.2 and Python 3.6. Gcc-arm-none-eabi
15:10.3-2021.07-4 and Make 4.2.1 are used to compile the whole project,
and STM32CubeProgrammer 2.10.0 is used to upload the binaries on the
device.
DynapCNN
The SynSense DynapCNN processor was programmed using Python 3.7.13
with PyTorch 1.11.0, Sinabs 0.3.3 (and underlying Sinabs-DynapCNN
0.3.1.dev3), and Samna 0.14.33.0 libraries. The neural network is written
in PyTorch and converted to a spiking version using Sinabs, while Samna
is used to map the network to the hardware. The inputs are presented to the
network using a preprocessing function that generates spikes1 by randomly
sampling the image; tWindow denotes the duration of the spiking frame and
img has shape [channels, width, height].
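A rate-coding preprocessing step of this kind could look as follows. This is a sketch, not the authors' original function; pixel intensities are assumed to be normalized to [0, 1], and plain Python lists stand in for the PyTorch tensors of the original setup.

```python
import random

def to_spikes(img, t_window, rng=None):
    """Rate-code an image into binary spike frames: each pixel's
    intensity is its per-time-step probability of emitting a spike.
    img is nested as [channels][width][height]; the result holds
    t_window binary frames of the same shape."""
    rng = rng or random.Random()
    return [[[[1 if rng.random() < p else 0 for p in row]
              for row in chan] for chan in img]
            for _ in range(t_window)]
```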
10.3.1.3 Deployment
For standalone platforms, the network was converted and uploaded to the
platform. For Kendryte, the inference script was written such that the model
1 Spikes are binary events (on or off) distributed in input space and time.
Table 10.1 Relevant technical specifications of the devices (from constructor websites).

Board                Kendryte K210          STM32L4R9       DynapCNN
Processor ISA        Dual-core RISC-V 64b   ARM Cortex-M4   Event-based
Power Consumption    300 mW                 66 mW           1 mW
Max Frequency (MHz)  900                    120             -
TOPS/W               3.3                    -               -
Standalone           Yes                    Yes             No
Event-based          No                     No              Yes
Language             MicroPython            C               Python
Figure 10.2 Deployment pipelines for all platforms. From left to right: STM32L4R9,
Kendryte K210, and DynapCNN. For DynapCNN, the pipeline is contained in a single Python
script, while the others rely on external languages and tools.
is loaded at the beginning of the script and processes images one by one.
The images are transmitted via serial communication and processed by the
inference script. In X-CUBE-AI, this is done automatically, while the Kendryte
requires a script that sends batches of images and collects the predictions. For
DynapCNN, the images are predicted by sending the corresponding events to
the device and reading the output events from the buffer of the board.
The prediction time is provided automatically by the X-CUBE-AI platform,
while the Kendryte requires timing the prediction manually. In the
MicroPython script used for inference on the Kendryte, we put a counter around
the line performing the inference. For DynapCNN, the reported times correspond
to the timestamps of the first output event and the final output event,
respectively. Both times are averaged over the test samples. The computation
of the key performance indicators (accuracy, mean time) is performed offline.
Figure 10.2 illustrates the pipelines for all platforms.
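The manual timing on the Kendryte amounts to a counter around the inference call. In MicroPython one would use time.ticks_us(); here time.perf_counter() stands in so the sketch runs under standard Python, and `infer` is a placeholder for the actual model call.

```python
import time

def timed_inference(infer, sample):
    """Run one inference and return (prediction, elapsed seconds)."""
    start = time.perf_counter()
    prediction = infer(sample)
    elapsed = time.perf_counter() - start
    return prediction, elapsed

def mean_time(infer, samples):
    """Average the per-sample inference time over the test set."""
    times = [timed_inference(infer, s)[1] for s in samples]
    return sum(times) / len(times)
```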
10.4 Results
In this section, we present the results and metrics recorded for each platform,
and the effort perceived by the team to perform the experiments.
2 Some samples (with indices [18, 247, 493, 495, 717, 894, 904, 947] in the test set) did not
produce any spikes for an unknown reason. In that case, we removed the associated labels and
computed the balanced accuracy on the 992 remaining samples.
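The balanced accuracy used here, i.e., the mean of the per-class recalls, can be computed as follows (a generic sketch, independent of any platform):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; robust to class imbalance, e.g., when a
    few samples of some classes are dropped from the test set."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```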
Table 10.3 Perceived effort for each stage of the inference. 1: small, 5: large.
Board A S G M I Total
Kendryte 1 3 2 3 3 12
STM32L4R9 1 2 4 3 2 12
DynapCNN 3 1 3 1 1 9
10.5 Conclusion
Although the development of embedded machine learning holds great
promise, the lack of consistency and standardization across devices makes
development extremely platform-dependent. Deploying a model on these
devices requires the use of low-level tools, such as the C language. However, most
models are developed using (high-level) Python-based tools. The deployment
process therefore requires adaptation of the model from Python
to C, which is time-consuming and prone to errors and artifacts in the
final implementation. Platform providers are aware of this problem and have
started putting effort into facilitating deployment by providing automated
tools and interfaces with DL frameworks. Specifically, for the platforms used
in these experiments, Sipeed has ported MicroPython to the Maix Dock,
allowing developers to write code close to that used to train the model; SynSense
provides a library that allows interaction with the DynapCNN directly from a
Python script and enables simulation of the model before deployment, to get a
quick idea of performance. Finally, the well-established STMicroelectronics
provides the X-CUBE-AI tool, which, in addition to analyzing the model
before deployment, offers the possibility of validating the model on the target
and retrieving relevant metrics without writing a single line of code.
However, these tools are recent, and standards are not yet established. To
promote and accelerate the development of machine learning on embedded
interfaces, it is necessary to provide standardized tools accessible to model
Acknowledgements
This work is supported through the project ANDANTE. ANDANTE has
received funding from the ECSEL Joint Undertaking (JU) under grant agreement
No 876925. The JU receives support from the European Union’s
Horizon 2020 research and innovation programme and France, Belgium, Germany,
Netherlands, Portugal, Spain, Switzerland. The authors are responsible
for the content of this publication.
References
[1] Q. Liu, O. Richter, C. Nielsen, S. Sheik, G. Indiveri, and N. Qiao.
Live demonstration: face recognition on an ultra-low power event-driven
Abstract
The recent advancements towards Artificial Intelligence (AI) at the edge
resonate with an impression of a dichotomy between resource-intensive,
highly abstracted Machine Learning (ML) research and strongly optimized,
low-level embedded design. Overcoming such opposing mindsets is imperative
for enabling desirable future scenarios such as autonomous driving
and smart cities. Edge AI must incorporate both straightforward, streamlined
deployment and resource-efficient execution to achieve general
acceptance. This research aims to exemplify how such an endeavour could be
realized, utilizing a novel low-power AI accelerator together with a state-of-the-art
object detection algorithm. Different considerations regarding model
structure and efficient hardware acceleration are presented for deploying
Deep Learning (DL) applications in resource-restricted environments while
maintaining the comfort of operating at a high degree of abstraction. The
goal is to demonstrate what is possible in the field of edge AI once software
and hardware are optimally matched.
11.1 Introduction
With AI shifting from a simple research subject towards end-user applications,
the issue of efficient deployment moves into focus. ML workloads
DOI: 10.1201/9781003377382-11
This chapter has been made available under a CC BY-NC 4.0 license.
Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU
are decidedly different from average computing tasks; hence, GPUs became
the common solution for such undertakings. Realizing mobile intelligent
appliances requires even more specialized, low-power accelerators that
can be integrated into embedded environments. Such edge solutions have attracted
increasing interest within the last years. The European Strategic Research and
Innovation Agenda (SRIA) [1] concretizes the term even further by introducing
the terms Micro-, Deep-, and Meta-edge. Several different
solutions are available which target this new frontier. Most prominent are the
NVIDIA Jetson family, which utilizes optimized embedded GPUs; the Intel
Neural Compute Stick 2, which comprises a specialized Vision Processing
Unit (VPU); and the Google Coral edge Tensor Processing Unit (TPU),
which is the focus of this work. As such, its impact on related research
is presented in the following section. The task of object detection was chosen
as part of the experimental test setup for evaluating the accelerator. You
Only Look Once (YOLO) version 5 [2] serves as delegate for this class
of networks in the upcoming section. We evaluate how models can be
modified to exploit edge TPU characteristics. Furthermore, we show how
this optimized solution compares to models provided by Google. With a focus
on deployment, a lightweight software stack is introduced which enables
efficient AI solutions without sacrificing high-level development. Finally, a
conclusion is provided, giving a synopsis of the key findings and offering
points of interest for future work.
Figure 11.1 Raspberry Pi 4 with Google Coral edge TPU USB accelerator.
11.3.2 YOLOv5
The original You Only Look Once (YOLO) architecture was proposed by
Joseph Redmon in 2016 [16]. It performs both object detection and classification
in a single model. This resulted in a significant performance increase
compared to classical two-stage designs (e.g., Region-Based Convolutional
Neural Networks (R-CNNs) [17]). Since the original design, many improvements
have been made. YOLOv5 [2] is based on the YOLOv3 [18] architecture. It
is under constant open-source development by Ultralytics, who shifted the
focus from academic research to accessible deployment. They provide an
end-to-end solution which allows for training, testing, and exporting models to
a variety of deployment frameworks. This includes the integration
of the previously described pipeline for generating edge TPU models from
version 6.1 onward.
propagated through the network. Hence, reducing the input size results in
smaller intermediate tensors. Further reduction can be induced by limiting
the number of output classes. If graph modifications are viable, a divide-and-conquer
strategy can be used to split tensors before the operation and merge them
afterwards. Moving these operations to the bottom of the graph can also be an
option, as the instructions are fast on CPU. A last option is using mathematical
transformations to change the graph beneficially.
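The divide-and-conquer idea — split an oversized tensor before the operation and merge the partial results afterwards — can be illustrated with a small example. Plain Python lists stand in for the graph tensors, and a simple ReLU stands in for the operation being split; the point is that the chunked form is numerically identical while each intermediate tensor is smaller.

```python
def relu(xs):
    """Elementwise ReLU over a flat tensor."""
    return [max(x, 0.0) for x in xs]

def chunked_relu(xs, parts=2):
    """Apply the op per chunk and concatenate the results: same output
    as the single large op, but with smaller intermediates."""
    step = (len(xs) + parts - 1) // parts  # ceil(len / parts)
    out = []
    for i in range(0, len(xs), step):
        out.extend(relu(xs[i:i + step]))
    return out
```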
Some of these strategies were used to optimize the YOLOv5 models
evaluated further in this research. All changes were committed to
the open-source project in a pull request [22] and are part of the next major
release (6.2). Table 11.1 shows the performance impact for the demonstrator
setup. Both model variants experienced a significant speedup in inference
time, with the variant with the larger input size improving the most.
Figure 11.3 USB3 speed-accuracy comparison of different model types and configurations
for edge TPU deployment.
works best at lower input sizes, while larger inputs cause a disproportionate
slowdown compared to the benefit in accuracy. Of particular interest are the nano and
small models with 320 px input. They have an almost identical inference
time, while the accuracy of the s-model is significantly better. They share the
same vertical graph structure, while the larger one is scaled horizontally by
a factor of two. Hence, the small variant has twice as many weights for each
convolutional layer. This aligns with the insights from [12] that horizontal
scaling is preferable. The model should be very close to a sweet spot for
which all weights are cached within the 8 MB device memory. Sacrificing
some vertical model depth for more width could theoretically improve the
accuracy even further.
In general, YOLOv5 performs better than the other models. Only the
nano model has issues, which is probably caused by its particularly small
file size. If speed is the deciding factor, SSDLite MobileDet [26][27] is
still the preferable solution. The classical SSD MobileNetV2 [26][28] does
not seem to be competitive anymore. The EfficientDet models perform reasonably;
however, considering the additional overhead of a particularly slow
postprocessing operation, YOLOv5 should be considered the better solution.
All models share a low accuracy for small objects, which could be an issue
inflicted by quantization.
Figure 11.4 YOLOv5s inference speed comparison between USB2 and USB3.
Figure 11.5 Micro software stack for fast and lightweight edge deployment.
Acknowledgements
This work has been financially supported by the AI4DI project. AI4DI
receives funding within the Electronic Components and Systems for European
Leadership Joint Undertaking (ECSEL JU) in collaboration with
the European Union’s Horizon 2020 Framework Programme and National
Authorities, under grant agreement n◦ 826060.
References
[1] AENEAS, Inside Industry Association, and EPOSS. ECS – Strategic
Research and Innovation Agenda 2022. en. Jan. 2022. URL:
https://ecscollaborationtool.eu/publication/download/slides-ovidiu-vermesan.pdf
(visited on 03/31/2022).
[2] G. Jocher et al. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow
Edge TPU and OpenVINO Export and Inference. Feb. 2022. URL:
https://zenodo.org/record/6222936 (visited on 03/30/2022).
[3] A. Boschi et al. “A Cost-Effective Person-Following System for Assistive
Unmanned Vehicles with Deep Learning at the Edge”. en. In:
Machines 8.3 (Aug. 2020), p. 49.
[31] C. R. Harris et al. “Array programming with NumPy”. en. In: Nature
585.7825 (Sept. 2020), pp. 357–362.
[32] Q. Wang et al. “AUGEM: automatically generate high performance
dense linear algebra kernels on x86 CPUs”. en. In: Proceedings of the
International Conference on High Performance Computing, Networking,
Storage and Analysis. Denver, Colorado: ACM, Nov. 2013, pp. 1–12.
12
Embedded Edge Intelligent Processing for
End-To-End Predictive Maintenance in
Industrial Applications
Abstract
This article advances innovative approaches to the design and implementation
of an embedded intelligent system for predictive maintenance (PdM) in
industrial applications. It is based on the integration of advanced artificial
intelligence (AI) techniques into micro-edge Industrial Internet of Things
(IIoT) devices running on Arm® Cortex® microcontrollers (MCUs) and
addresses the impact of a) adapting to the constraints of MCUs, b) analysing
sensor patterns in the time and frequency domains and c) optimising the
AI model architecture and hyperparameter tuning, stressing that hardware–software
co-exploration is the key ingredient to converting micro-edge IIoT
devices into intelligent PdM systems. Moreover, this article highlights the
importance of end-to-end AI development solutions by employing existing
frameworks and inference engines that permit the integration of complex AI
mechanisms within MCUs, such as NanoEdge™ AI Studio, Edge Impulse
and STM32 Cube.AI. Both quantitative and qualitative insights are presented
in complementary workflows with different design and learning components,
as well as in the backend flow for deployment onto IIoT devices with a
common inference platform based on Arm® Cortex®-M-based MCUs. The
use case is an n-class classification based on the vibration of generic motor
rotating equipment. The results have been used to lay down the foundation
DOI: 10.1201/9781003377382-12
This chapter has been made available under a CC BY-NC 4.0 license.
of the PdM strategy, which will be extended in future work with insights
derived from anomaly detection, regression and forecasting applications.
With PdM, the motors are serviced according to the actual wear and tear
and service needs, reducing unexpected outages, requiring fewer scheduled
maintenance repairs or replacements, and using fewer maintenance resources
(including spare parts and supplies) while simultaneously decreasing failures.
PdM provides the prerequisite foundation for prescriptive maintenance (PsM)
and autonomous maintenance (executing actions automatically, without human
intervention). PsM builds on the infrastructure and data collected for PdM,
following the various corrective actions taken by maintenance personnel and
the resulting outcomes.
Figure 12.1 illustrates a typical industrial motor with a rotor, stator,
bearings, and shaft as essential components for the engine's normal operation.
The conditions and operations of these various components are possible causes
of anomalous behaviour, thus defining various abnormal states (classes). A
large amount of historical and real-time information is required to identify,
classify, and predict a motor's possible failures. AI-based ML and DL
algorithms are suitable for dealing with these types of tasks.
This paper focuses on AI-based PdM approaches, which learn from
historical and real-time data and recommend the best timing and course of
action for a given set of conditions and sub-conditions, employing ML and
DL models implemented on micro-edge embedded devices. For example,
the implementation of an ML solution in a PdM application includes
several steps: data preparation, feature engineering, algorithm selection and
parameter tuning.
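These four steps can be sketched end-to-end with scikit-learn; the dataset below is a synthetic stand-in for labelled vibration features, and the model and parameter grid are illustrative choices, not the chapter's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for labelled vibration features (real data would come
# from the IIoT datalogger): 500 windows, 8 features, 3 motor states.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = rng.integers(0, 3, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data preparation / feature scaling feeding the candidate algorithm.
pipe = Pipeline([
    ("scale", StandardScaler()),                      # data preparation
    ("clf", RandomForestClassifier(random_state=0)),  # algorithm selection
])

# Parameter tuning via cross-validated grid search.
grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [4, None]}
search = GridSearchCV(pipe, grid, cv=3).fit(X_train, y_train)
print(search.best_params_)
```

On real data, the feature-engineering step would insert the spectral features discussed later in the chapter before the classifier.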
The interaction between edge IIoT devices, ML and DL has opened
opportunities for new data-driven approaches to PdM solutions in industrial
processes. In this paper, different techniques and tools were successfully
tested using various ML- and DL-based methods to predict the state
of industrial motors and to detect and classify motor conditions based
on training data. The PdM monitoring has been tested on measurements
Table 12.1 Frameworks and inference engines for integrating AI mechanisms within MCUs
the accelerometer spectral features (e.g., root mean square (RMS), frequency
and amplitude of spectral power peaks, etc.) and optimise the performance.
In the end, the three models were deployed and integrated with the firmware
using STM32CubeIDE. Finally, inference classifications were run to assess
the performance of the implementations and deployments.
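The spectral features named above (RMS plus the frequency and amplitude of the dominant spectral peak) can be computed from a raw vibration window with NumPy; the sampling rate and test signal here are illustrative, not the chapter's measurement setup:

```python
import numpy as np

def spectral_features(signal, fs):
    """RMS plus frequency and amplitude of the dominant spectral peak."""
    rms = np.sqrt(np.mean(signal ** 2))
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    peak = np.argmax(spectrum[1:]) + 1  # skip the DC component
    return rms, freqs[peak], spectrum[peak]

# 1 s of a pure 50 Hz vibration sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 50 * t)
rms, f_peak, a_peak = spectral_features(sig, fs)
print(round(rms, 3), f_peak)  # 0.707 50.0
```

In practice, several such peaks (and band-limited RMS values) would be stacked into the feature vector fed to the classifier.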
Figure 12.3 Visualisation of two selected classes' signals in both the temporal and frequency
domains with NEAI.
Several iterations were performed until acceptable quality-labelled datasets
were obtained; these included recording new signals without background
noise, collecting/recording longer signals and even changing the
categorisation of classes.
Figure 12.5 Snapshots of Feature Explorer in EI based on the pre-processing block early in
the process.
At the end of the training, the model's performance and the confusion
matrix on the validation data can be evaluated. Figure 12.6 shows an accuracy
and a loss on the training and validation datasets comparable with the results
obtained with NEAI using a different model architecture. To avoid overfitting,
the learning rate was reduced, more data was collected, and the model
was re-trained.
Figure 12.6 Confusion Matrix and Data Explorer based on full training set: Correctly
Classified (Green) and Misclassified (Red).
Figure 12.7 A comparison between int8 quantized and unoptimized versions of the same
model, showing the difference in performance and results.
hyperparameters were exchanged back and forth between the EI and Python
frameworks.
The improvements consisted of making the model deeper by adding more
layers and wider by increasing the number of hidden units, changing the
activation and optimisation functions and the learning rate, and fitting more data.
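As a rough sketch of this improvement loop — using scikit-learn's MLPClassifier on synthetic data as a stand-in for the chapter's Keras model — deepening and widening the network while adjusting the activation and learning rate looks like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class dataset standing in for the vibration features.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: a single narrow hidden layer.
base = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
# "Deeper and wider": more layers and hidden units, a different activation
# and a smaller learning rate -- mirroring the manual improvement loop.
improved = MLPClassifier(hidden_layer_sizes=(64, 64, 32), activation="tanh",
                         learning_rate_init=1e-3, max_iter=1000,
                         random_state=0)

for name, model in [("base", base), ("improved", improved)]:
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 2))
```

The same knobs (depth, width, activation, optimiser, learning rate) map one-to-one onto a Keras model definition.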
While the improvement process was run manually in Python, EI's
Edge Optimized Neural (EON™) Compiler [9] can be used to find the
best solution for Arm® Cortex®-M-based MCUs, i.e., the most optimal
combination of processing block and ML model for a given set of constraints,
including latency, RAM usage, and accuracy. Currently, only a
limited number of MCUs are supported, not including the MCU
of the STWIN IIoT device (the Arm® Cortex®-M4 MCU STM32L4R9), which
operates at frequencies of up to 120 MHz. Nevertheless, the estimated on-device
performance could be evaluated for a Cortex-M4F at 80 MHz to determine
the impact of optimisations such as quantisation across different slices of the
datasets (Figure 12.7).
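The effect of int8 quantisation (Figure 12.7 compares the quantized and unoptimized versions of the same model) can be illustrated with a minimal post-training weight-quantisation sketch; the symmetric per-tensor scheme below is one common choice, not necessarily the exact scheme EI applies:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantisation of a weight array."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=10_000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale  # dequantised approximation

# float32 -> int8 gives a 4x reduction in weight storage; the price is a
# rounding error bounded by half the quantisation step.
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```

This memory/accuracy trade-off is exactly what the on-device performance estimate in Figure 12.7 quantifies at model level.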
12.4.4 Testing
ML/DL model testing usually refers to the evaluation of the trained model
on the testing dataset to analyse how well the model performs on unseen
data. However, model testing in NEAI and EI provides more than that. Both
platforms provide a microcontroller emulator to test and debug the generated
model prior to its deployment on the device.
As part of the NEAI toolkit, a microcontroller emulator is provided for
each library to test and debug the generated model without the need to
download, link or compile. Test signals can be imported from a file; however,
Figure 12.8 Evaluation of trained model using NEAI Emulator with live streaming.
the signals were imported live from the same datalogger application through
the serial port, thereby ensuring completely new, previously unseen signals.
The classification is run automatically on the live signals while changing
motor speeds and triggering shaft disturbances, switching between classes to
cover all five states (classes).
The results are presented in Figure 12.8, showing that the classifier
managed to properly reproduce and detect all classes with reasonable
certainty percentages.
In EI, the trained model was evaluated by assessing the accuracy on
the test dataset. To ensure an unbiased evaluation of model effectiveness, the
test samples were not used directly or indirectly during training. The EI
emulator took care of extracting the features from the test set, running the
trained model, and reporting the performance in the confusion matrix. The
results are shown in Figure 12.9.
12.4.5 Deployment
In the context of micro-edge embedded systems, model deployment is depen
dent on the hardware/software platform and is more or less automated, and
in essence comprises three steps: the first is a format conversion of the
fully trained model; the second is a weight/model compression to reduce the
amount of memory to store the weights in the target hardware platform and
to simplify the computation so it can run efficiently on target processors. The
third entails compiling the model and generating the code to be integrated
with the MCUs firmware.
Figure 12.10 Live classification streaming with detected state and confidence (with Tera
Term).
The third flow was branched out from EI and further developed in a
Python framework using TensorFlow's Keras API. The resulting model was
converted into optimised C code with STM32 Cube.AI, an extension of the
CubeMX tool, which offers simple and efficient interoperability with other
ML frameworks.
12.4.6 Inference
Inference classifications have been conducted with all applications running
directly on the target hardware platform on the micro-edge IIoT devices,
producing classifications in real time.
The state machine consists mainly of two states with two functions, “init”
and “inferencing”, respectively, with the former initialising the deep NN
model and the latter being a continuously running function that collects
raw data from the sensors on the micro-edge IIoT device and makes
classifications in real time. A snapshot of the classification based on the NEAI
model is shown in Figure 12.10.
The “?” indicates state switching, which happens after several consecutive
confirmations of the inference result are encountered; this number is
programmable.
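The confirmation logic can be sketched as a small debouncing state machine; the firmware implements this in C on the MCU, so the Python class and its names below are only an illustration of the idea:

```python
class DebouncedState:
    """Switch the reported state only after `n_confirm` consecutive
    identical inference results (the programmable number in the text)."""

    def __init__(self, n_confirm=3, initial="unknown"):
        self.n_confirm = n_confirm
        self.state = initial
        self._candidate = None
        self._count = 0

    def update(self, inferred):
        if inferred == self.state:          # agreement resets the candidate
            self._candidate, self._count = None, 0
            return self.state
        if inferred == self._candidate:     # another confirmation
            self._count += 1
        else:                               # new candidate state
            self._candidate, self._count = inferred, 1
        if self._count >= self.n_confirm:   # enough confirmations: switch
            self.state = inferred
            self._candidate, self._count = None, 0
        return self.state

sm = DebouncedState(n_confirm=3, initial="idle")
for result in ["fault", "fault", "idle", "fault", "fault", "fault"]:
    sm.update(result)
print(sm.state)  # fault
```

A single contradicting inference (the "idle" in the middle) resets the count, which is what suppresses the spurious "?" transitions.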
Acknowledgements
This work is conducted under the framework of the ECSEL AI4DI “Artificial
Intelligence for Digitising Industry” project. The project has received funding
from the ECSEL Joint Undertaking (JU) under grant agreement No 826060.
The JU receives support from the European Union’s Horizon 2020 research
References
[1] R. Sanchez-Iborra and A.F. Skarmeta, “TinyML-Enabled Frugal Smart Objects: Challenges and Opportunities,” in IEEE Circuits and Systems Magazine, vol. 20, no. 3, pp. 4-18, third quarter 2020. https://doi.org/10.1109/MCAS.2020.3005467
[2] T. Hafeez, L. Xu and G. Mcardle, “Edge Intelligence for Data Handling and Predictive Maintenance in IIoT,” in IEEE Access, vol. 9, pp. 49355-49371, 2021. https://doi.org/10.1109/ACCESS.2021.3069137
[3] Y. Liu, W. Yu, T. Dillon, W. Rahayu and M. Li, “Empowering IoT Predictive Maintenance Solutions With AI: A Distributed System for Manufacturing Plant-Wide Monitoring,” in IEEE Transactions on Industrial Informatics, vol. 18, no. 2, pp. 1345-1354, Feb. 2022. https://doi.org/10.1109/TII.2021.3091774
[4] H. Wang, H. Sayadi, S.M. Pudukotai Dinakarrao, A. Sasan, S. Rafatirad and H. Homayoun, “Enabling Micro AI for Securing Edge Devices at Hardware Level,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 11, no. 4, pp. 803-815, Dec. 2021. https://doi.org/10.1109/JETCAS.2021.3126816
[5] F. Cipollini, L. Oneto, A. Coraddu, et al. “Unsupervised Deep Learning for Induction Motor Bearings Monitoring”. Data-Enabled Discov. Appl. 3, 1, 2019. https://doi.org/10.1007/s41688-018-0025-2
[6] M. Guenther. 6 Ways to Improve Electric Motor Lubrication for Better Bearing Reliability. Available online at: https://blog.chesterton.com/lubrication-maintenance/improving-electric-motor-lubricaiton/
[7] C. Kammerer, M. Gaust, M. Küstner, P. Starke, R. Radtke, and A. Jesser, “Motor Classification with Machine Learning Methods for Predictive Maintenance,” IFAC-PapersOnLine, vol. 54, no. 1, pp. 1059–1064, 2021. https://doi.org/10.1016/j.ifacol.2021.08.126
[8] Edge Impulse. Available online at: https://www.edgeimpulse.com
[9] EON Tuner. Available online at: https://docs.edgeimpulse.com/docs/eon-tuner
[10] J. Jongboom, 2020. “Machine learning for all STM32 developers with STM32Cube.AI and Edge Impulse”. Available online at: https://www.edgeimpulse.com/blog/machine-learning-for-all-stm32-developers-with-stm32cube-ai-and-edge-impulse
Abstract
In this paper, we assess the usage of machine learning techniques to predict
the infection events of Downy Mildew. Every year, Champagne vineyards
are exposed to grapevine diseases that affect the plants and fruits, most of
them caused by fungi. Using data from an agro-meteorological station, we compare
machine learning performances against traditional prediction methods for
Downy Mildew (Plasmopara viticola) infections. Indeed, depending on the
year, we obtain 82 to 97% accuracy for primary infections and 98% for
secondary infections. These results may guide the development of Edge AI
applications integrated into meteorological stations and agricultural sensors, and
help winegrowers rationalise the vine's treatment, limiting the damage
and the usage of fungicides or chemical products.
13.1 Introduction
Every year, Champagne vineyards are exposed to grapevine fungal diseases
that affect the plants and fruits. Black rot (Guignardia bidwellii), Downy
DOI: 10.1201/9781003377382-13
This chapter has been made available under a CC BY-NC 4.0 license.
178 AI-Driven Strategies to Implement a Grapevine Downy Mildew
and robust methods and to prepare the path to their implementation on Edge
AI devices deployed directly on the vineyards.
The remainder of this paper is organized as follows: Section 13.2 presents
the datasets and research methodology used in this work. Section 13.3
introduces the different machine learning techniques used in this work, as well as
their implementation specifications. In Section 13.4, we present a comparative
study of machine learning strategies, targeting their accuracy as well as their
robustness over the years. Section 13.5 goes beyond the raw results by
discussing the impact of AI-based algorithms on the monitoring of crops.
Finally, Section 13.6 concludes this work.
¹ Data could be provided upon request.
“alert”/“not alert” labels. We decided to split it into two binary classification
problems instead of a multi-class classification problem to favour each alert
type's accuracy. Hence, we chose to compare five well-known binary
classification techniques:
• Decision trees
• Random forest
• Support Vector Machines (SVM)
• Dense Neural Networks (DNN)
• Convolutional Neural Networks (CNN)
Decision Trees and Support Vector Machine predictors use the basic
scikit-learn implementation (DecisionTreeClassifier and SVC, respectively)
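Under the stated setup, the comparison can be sketched with scikit-learn: DecisionTreeClassifier and SVC are the implementations named in the text, while RandomForestClassifier and MLPClassifier are plausible stand-ins for the Random Forest and DNN models; the CNN is omitted here since it needs a DL framework, and the dataset is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the weather-feature dataset with binary alert labels.
X, y = make_classification(n_samples=400, n_features=12, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),  # basic SVC, as named in the text
    "dense NN": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f}")
```

Cross-validating each candidate on per-year splits, as done in the paper, is the same loop with the year-specific datasets swapped in.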
13.4 Results
13.4.1 Primary Mildew Infection Alerts
As stated above, we create three different training-validation datasets, one for
each year. Table 13.1 compares the accuracy scores of the 2019 model
when applied to 2020 and 2021. The best scores are presented in bold,
showing that two techniques stand out from the others: CNN and SVM. CNN
shows slightly better scores on the 2021 dataset but is closely followed by SVM.
For the 2020 model, Random Forest and SVM perform well
on the 2019 data, and almost all techniques (except the simple Decision Tree)
present similar results for the 2021 case (see Table 13.2). Finally, for the 2021
model, Random Forest seems the best technique on the 2019 dataset, while
SVM is better on the 2020 dataset (Table 13.3). We can, however,
point out that Random Forest achieves good results in this latter case, even if
not as good as the SVM scores. While the “best” technique varies from year to
year, both SVM and CNNs show robust results, closely followed by Random
Forest. The choice therefore depends on the computing capabilities available
to the devices.
We can also see that 2021 was different from the previous years. While
models from 2019 or 2020 achieve lower scores when predicting 2021 alerts,
models trained with 2021 data are among the best when predicting alerts
for the previous years. This was somewhat expected, as 2021 was rich in
events favourable to the spread of diseases in the vineyard.
seem much easier to identify, with higher accuracy scores. Unfortunately, the
absence of a 2021 dataset does not allow a broader comparison under different
weather conditions (2021 presented the lowest accuracy in the Primary Alert
experiments).
Once again, CNN presents the highest accuracy scores, closely followed
by SVM and Random Forest. Indeed, we shall point out that SVM and
Random Forest are good candidates when considering implementation in
environments with performance restrictions, such as IoT /
Edge AI.
13.5 Discussion
The results obtained here are encouraging but should be considered in the
context of the reduced span of the dataset, gathered from a single
agro-meteorological station installed in 2019. A deeper analysis would require
several years of data, as performed by [3] or [9].
However, our main objective was to conceive a proof of concept within
the efforts of the European project AI4DI to develop and disseminate an
environmental monitoring system based on different industrial sensors (e.g.,
TEROS, Bosch BME68x, STMicroelectronics) connected to an STM32WL
enhanced by a machine learning core. These sensors are expected to enable
continuous monitoring of the environment, the soil, meteorological
conditions, and/or plant performance.
Besides implementing AI models on the STM32WL, some sensors can
also be enriched with a machine learning core. This is the case of the
LSM6DSOX sensor from STMicroelectronics, which comprises a set of configurable
parameters and decision trees able to run AI algorithms in the sensor itself.
Hence, this environment would benefit from simpler models such as Random
Forest and SVM, rather than CNN.
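To illustrate why small decision trees suit such a sensor-side machine learning core, a fitted scikit-learn tree can be flattened into plain threshold rules; this is only a sketch of the idea — the actual LSM6DSOX configuration format is different and generated by ST's tooling:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# A tiny tree on a standard dataset, standing in for weather features.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def export_rules(t, node=0, depth=0):
    """Walk the fitted tree and print nested threshold rules -- roughly the
    kind of compact logic a sensor-side machine learning core evaluates."""
    pad = "  " * depth
    left = t.tree_.children_left[node]
    if left == -1:  # leaf node: emit the majority class
        print(f"{pad}class {int(t.tree_.value[node].argmax())}")
        return
    feat, thr = t.tree_.feature[node], t.tree_.threshold[node]
    print(f"{pad}if x[{feat}] <= {thr:.2f}:")
    export_rules(t, left, depth + 1)
    print(f"{pad}else:")
    export_rules(t, t.tree_.children_right[node], depth + 1)

export_rules(tree)
```

A depth-2 tree reduces inference to at most two comparisons per sample, which is why such models fit in a sensor while a CNN does not.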
13.6 Conclusion
Every year, Champagne vineyards are exposed to grapevine diseases that
affect the plants and fruits, and Downy Mildew, caused by Plasmopara
viticola, is a common one. Forecasting the infection events of Downy
Mildew may help vine growers rationalise the treatment of the vine,
limiting the damage and the usage of fungicides or chemical products.
In this paper, we compare the accuracy of several machine learning
techniques when applied to datasets from the Champagne region. By creating
multiple models and using cross-validation across different years, we were
able to identify three candidate techniques with close results, namely
Convolutional Neural Networks, Support Vector Machines and Random Forest.
While CNN seems to be more robust across different years, the accuracy
difference is minimal, and the other techniques are of interest in the case
of deployment over an Edge AI infrastructure. Indeed, we aim to prepare
the path for the implementation of Downy Mildew forecast models on Edge
AI sensing devices that will be deployed directly in the vineyards to closely
monitor the crops.
Acknowledgements
This work has been performed in the project AI4DI: Artificial Intelligence
for Digitizing Industry, under grant agreement No 826060. The project
is co-funded by grants from Germany, Austria, Finland, France, Norway,
Latvia, Belgium, Italy, Switzerland, and the Czech Republic, and by the
Electronic Components and Systems for European Leadership Joint Undertaking
(ECSEL JU).
We want to thank Vranken-Pommery Monopole for providing the datasets
for the training. We also thank the ROMEO Computing Center2 of Université
de Reims Champagne Ardenne, whose Nvidia DGX-1 server allowed us to
accelerate the training steps and compare several model approaches.
² https://romeo.univ-reims.fr
References
[1] J. Abdulridha, Y. Ampatzidis, J. Qureshi, and P. Roberts. Identification and classification of downy mildew severity stages in watermelon utilizing aerial and ground remote sensing and machine learning. Frontiers in Plant Science, 13, 2022.
[2] J. Abdulridha, Y. Ampatzidis, P. Roberts, and S. C. Kakarla. Detecting powdery mildew disease in squash at different stages using UAV-based hyperspectral imaging and artificial intelligence. Biosystems Engineering, 197:135–148, 2020.
[3] M. Chen, F. Brun, M. Raynal, and D. Makowski. Forecasting severe grape downy mildew attacks using machine learning. PLOS ONE, 15:1–20, 03 2020.
[4] C. Gessler, I. Pertot, and M. Perazzolli. Plasmopara viticola: A review of knowledge on downy mildew of grapevine and effective disease management. Phytopathologia Mediterranea, 50:3–44, 04 2011.
[5] E. Gonzalez-Domínguez, T. Caffi, N. Ciliberti, and V. Rossi. A mechanistic model of Botrytis cinerea on grapevines that includes weather, vine growth stage, and the main infection pathways. PLOS ONE, 10(10):1–23, 10 2015.
[6] I. Hernández, S. Gutiérrez, S. Ceballos, R. Iñiguez, I. Barrio, and J. Tardaguila. Artificial intelligence and novel sensing technologies for assessing downy mildew in grapevine. Horticulturae, 7(5), 2021.
[7] I. Mezei, M. Lukic, L. Berbakov, B. Pavkovic, and B. Radovanovic. Grapevine downy mildew warning system based on NB-IoT and energy harvesting technology. Electronics, 11(3), 2022.
[8] V. Rossi, T. Caffi, S. Giosue, and R. Bugiani. A mechanistic model simulating primary infections of downy mildew in grapevine. Ecological Modelling, 212(3):480–491, 2008.
[9] I. Volpi, D. Guidotti, M. Mammini, and S. Marchi. Predicting symptoms of downy mildew, powdery mildew, and gray mold diseases of grapevine through machine learning. Italian Journal of Agrometeorology, (2):57–69, Dec. 2021.
14
On the Verification of Diagnosis Models
Abstract
Enhancing systems with advanced diagnostic capabilities for detecting,
locating, and compensating for faults during operation increases autonomy and
reliability. To assure that a diagnosis-enhanced system really has improved
reliability, we need – besides other means – to check the correctness of the
diagnosis functionality. In this paper, we contribute to this challenge and
discuss the application of testing to the case of model-based diagnosis, where
we focus on testing the system models used for fault detection and
localization. We present a simple use case and provide a step-by-step discussion
of introducing testing, its capabilities, and arising issues. We identify
several challenges that should be tackled in future research.
14.1 Introduction
Every system comprising hardware faces the problem of degradation under
operation, which impacts its behavior over time. To prevent unwanted
behavior that may lead to harm, we have to carry out regular maintenance tasks.
Maintenance includes preventive activities, like changing the tires of cars
when their surfaces no longer meet regulations, and dealing with errors
occurring during operation. The latter requires root cause identification, i.e.,
searching for the components we have to repair for failure recovery. There is
no doubt that the maintenance and diagnosis of engineered systems are of
practical importance and, therefore, worth being considered in research.
DOI: 10.1201/9781003377382-14
This chapter has been made available under a CC BY-NC 4.0 license.
190 On the Verification of Diagnosis Models
Figure 14.1 A simple electric circuit comprising bulbs, a switch and a battery.
can be part of a root cause. It is also worth noting that we can use uncertainty
in model-based diagnosis. De Kleer and Williams [4] formalized the use
of fault probabilities of components for searching for the most probable
diagnosis. In addition, de Kleer and Williams introduced an algorithm for
selecting the optimal probing locations for minimizing probing steps for
identifying a single diagnosis.
In this manuscript, we do not focus on the diagnosis methods and
processes themselves. Instead, we provide a discussion of how to verify
diagnosis models. The challenge of model verification is of utmost
importance for assuring that systems equipped with diagnosis functionality work
correctly. Although we may use some of the presented results for verifying
diagnosis models generated by machine learning, we consider models for
model-based reasoning in the context of this paper. For testing machine
learning, we refer the interested reader to a recent survey [24].
The challenge of testing model-based diagnosis and other logic-based reasoning
systems is not new. Wotawa [17] introduced the use of combinatorial
testing and fault injection for testing self-adaptive systems based on models.
The same author also discussed the use of combinatorial testing and
metamorphic testing for theorem provers in [18] and the general challenge in [19].
In all of these papers, the focus is on testing the implementation and not
the underlying models. Koroglu and Wotawa [10] also contributed to the
challenge of verifying the reasoning system but focused on the underlying
compiler that allows reading in logic theories, i.e., system models. Hence,
testing the system models used for diagnosis is still an open challenge worth
tackling for quality assurance.
We organize this paper as follows: In Section 14.2, we introduce the
testing challenge in detail, including a first solution. Afterward, we present
the results of using the provided solution in a small case study. Finally, we
discuss open issues and further challenges, and conclude the paper.
Figure 14.2 The model-based diagnosis principle and information needed for testing.
the diagnosis engine, we can use models and observations together with
the corresponding expected diagnoses to define a test case. However, when
we want to test the models, which are usually divided into two parts, the
component models, and the structure of the system, we have to further think
about underlying assumptions and prerequisites.
First, we have to assume that the diagnosis engine itself is correct. This
means that the diagnosis engine is delivering the right diagnoses for a given
model and observations. Testing the implementation of the diagnosis engine
might also comprise testing the underlying theorem prover or constraint
solver, the implementation of the diagnosis algorithm, and the compiler that
is used to load a model and the observations into the diagnosis engine.
Second, the observations themselves describe the data that have been
observed from the system. Usually, we do not use the raw data obtained from
the system directly; the data is mapped to logical representations.
Faults might therefore also originate from the mapping of data to their
logical representations. However, because we are only focusing on the
verification of the models used for diagnosis, we do not need to deal with this
topic and can stay with the abstract representation of real observations for
testing.
Finally, we assume that models can be divided into component models
and structural models. We further assume that the component models are
generally valid and can be used in several systems. This assumption is of
particular importance because one argument in favor of model-based
diagnosis is its flexibility in adapting to different systems and its model re-use
capabilities.
Let us now define the challenge of testing diagnosis models, where we
have the following information given:
1. A model M for components of given types and their connections.
For testing, we want to have the following:
1. A set of systems Σ and, for each system S ∈ Σ, a model MS representing
the structure, i.e., its components and connections.
2. For each system S, a set of inputs, i.e., possible observations, and a set
of expected diagnoses. Note that observations include inputs and outputs
of a system, as well as control commands (like opening or closing a switch).
Note that the systems, as well as their inputs, must be chosen such that
they may lead the diagnosis engine to compute different values. This principle
¹ See https://potassco.org
two bulbs regarding light emission (on, off) serve as the inputs. It is worth
noting that the power supply of the battery might also be observed. However,
for the initial testing, we only consider those observations that do not
require additional equipment for measurement in practice. Nevertheless, for
testing, we may also consider more observations.
With 3 observations, each having a domain comprising 2 values,
we finally obtain 8 test cases covering all combinations. We depict these test
cases in Table 14.1. Note that the first two test cases (which are highlighted
in gray) cover the correct behavior of the system, where the switch is used
to turn the lamps on and off. Therefore, we see the empty set as the expected
diagnosis in the corresponding column. The other test cases formalize
incorrect behavior of the two-bulb circuit.
For testing the model, we run our diagnosis engine model_diagnose
using the observations of a test case. In Clingo, adding observations to a model
can simply be done by linking the model into a file where we state the
Table 14.1 All eight test cases used to verify the two-bulb example, comprising the used
observations and the expected diagnoses. The P/F column indicates whether the original
model passes (✓) or fails (×) the test.

 #  Observations                                      Expected diagnoses     P/F
 1  on(s). val(light(l1),on). val(light(l2),on).      {{}}                   ✓
 2  off(s). val(light(l1),off). val(light(l2),off).   {{}}                   ✓
 3  off(s). val(light(l1),on). val(light(l2),off).    {{s, l2}}              ✓
 4  off(s). val(light(l1),off). val(light(l2),on).    {{s, l1}}              ✓
 5  off(s). val(light(l1),on). val(light(l2),on).     {{s}}                  ✓
 6  on(s). val(light(l1),on). val(light(l2),off).     {{l2}}                 ✓
 7  on(s). val(light(l1),off). val(light(l2),on).     {{l1}}                 ✓
 8  on(s). val(light(l1),off). val(light(l2),off).    {{b}, {s}, {l1, l2}}   ✓
observations. For the first test case, the file tle_obs1.pl comprises the
following statements:

#include "two_lamps_example.pl".
on(s).
val(light(l1),on).
val(light(l2),on).
The first line includes the model shown in Figure 14.3, which we
store in the file two_lamps_example.pl. For executing a test case,
we run the diagnosis engine in a shell using the following command:
./model_diagnose -f tle_obs1.pl -fault 2. In this call, we
ask for diagnoses comprising up to two components, which we do by setting
the parameter -fault to 2. Finally, we used a shell script to carry out all
test cases. We see the outcome of testing in column P/F of Table 14.1. The
model passes all tests successfully.
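Such a test suite can be automated with a small harness that runs each observation file through the diagnosis engine and compares the diagnoses order-independently; the Python sketch below stubs the engine invocation (in our setup it would shell out to ./model_diagnose as shown above and parse its output):

```python
def diagnoses_equal(expected, computed):
    """Compare two collections of diagnoses order-independently; each
    diagnosis is itself a set of component names."""
    as_sets = lambda ds: {frozenset(d) for d in ds}
    return as_sets(expected) == as_sets(computed)

# Expected diagnoses for test cases 5-8 of Table 14.1.
EXPECTED = {
    5: [{"s"}],
    6: [{"l2"}],
    7: [{"l1"}],
    8: [{"b"}, {"s"}, {"l1", "l2"}],
}

def run_case(i):
    # In the real harness this would invoke the engine, e.g.
    #   subprocess.run(["./model_diagnose", "-f", f"tle_obs{i}.pl",
    #                   "-fault", "2"], capture_output=True)
    # and parse the reported diagnoses; here we stub the known results.
    return EXPECTED[i]

for i, expected in EXPECTED.items():
    verdict = "pass" if diagnoses_equal(expected, run_case(i)) else "FAIL"
    print(i, verdict)
```

The set-of-frozensets comparison matters because the engine is free to report diagnoses in any order.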
After checking the correctness of the diagnosis results obtained using
the model, we wanted to evaluate the quality of the test suite. In software
engineering, measures like code coverage or the mutation score are used
for this purpose. Estimating code coverage, i.e., the number of rules used
to derive a contradiction for diagnosis, is difficult because theorem provers
Table 14.2 Running 7 model mutations Mi, where we removed line i in the original model
of Figure 14.3, using the 8 test cases from Table 14.1. Test cases 1 and 2 pass (✓) for all
seven mutations; test cases 3, 4 and 8 each fail (×) for three of the seven mutations, and test
cases 5, 6 and 7 each fail for two.
Figure 14.4 Another simple electric circuit comprising bulbs, switches and a battery. This
circuit is an extended version of the circuit from Figure 14.1. On the right, we have the
structural model of this circuit in Prolog notation.

type(b, bat).
type(s1, sw).
type(s2, sw).
type(l1, lamp).
type(l2, lamp).
conn(in_pow(s1), pow(b)).
conn(out_pow(s1), in_pow(l1)).
conn(out_pow(s1), in_pow(l2)).
conn(in_pow(s2), pow(b)).
conn(out_pow(s2), in_pow(l1)).
conn(out_pow(s2), in_pow(l2)).
off only if both switches are open, i.e., in their off state. See Figure 14.4 for
the schematics of the extended two-bulb circuit.
For testing the extended two-bulb circuit, we have to introduce new test
cases. As with the original circuit, we use all combinations of input values and
manually compute the expected diagnoses. We depict the whole test suite in
Table 14.3, together with the results obtained after automating the test
execution using shell scripts. For many test cases, the computed diagnoses
are not equivalent to the expected ones. We conclude that the provided model
is not generally applicable.
After carefully analyzing the root cause of this divergence, we identified
the rule in Line 4 of the component model (from Figure 14.3) as
problematic. This rule states that an open switch assures that there is no
power on the output of the switch. Unfortunately, power might still be
available because another power-supplying component feeds the same
connection, as in the extended two-bulb example. We are also not able to
simply remove this rule, because otherwise the behavior of the original
two-bulb example would change (see Table 14.2). A solution would be to
introduce a specific or-component that takes the outputs of the two switches
as inputs and provides power whenever at least one of them has a nominal value.
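A quick way to check that such an or-component would repair the divergence is to re-encode the extended circuit's consistency check directly, with the lamps' input taken as the disjunction of the two switch outputs. The component names follow Figure 14.4; the fault modes (empty battery, dark faulty lamp, stuck faulty switch) are our assumptions, not the exact models of Figure 14.3:

```python
from itertools import combinations, product

def consistent2(ab, pos, obs):
    """Consistency of mode assignment `ab` with switch positions `pos`
    and lamp observations `obs` in the extended two-bulb circuit."""
    power = 'b' not in ab                       # assumed: a faulty battery is empty
    c1 = [True, False] if 's1' in ab else [pos['s1'] == 'on']
    c2 = [True, False] if 's2' in ab else [pos['s2'] == 'on']
    for closed1, closed2 in product(c1, c2):
        # or-component: the lamps' input carries power whenever at least
        # one switch output carries power
        lamp_in = (power and closed1) or (power and closed2)
        if all(obs[l] == lamp_in if l not in ab else not obs[l]
               for l in ('l1', 'l2')):          # assumed: faulty lamps stay dark
            return True
    return False

def diagnoses2(pos, obs, max_faults=2):
    """Subset-minimal diagnoses of the extended circuit."""
    minimal = []
    for size in range(max_faults + 1):
        for cand in combinations(('b', 's1', 's2', 'l1', 'l2'), size):
            c = frozenset(cand)
            if not any(d <= c for d in minimal) and consistent2(c, pos, obs):
                minimal.append(c)
    return minimal

# Test case 2 of Table 14.3: s1 off, s2 on, both lamps lit.
# With the or-semantics this is fault-free, i.e., the empty diagnosis.
print(diagnoses2({'s1': 'off', 's2': 'on'}, {'l1': True, 'l2': True}))
```

With the disjunctive connection, the previously failing observation sets (e.g., one closed switch lighting both lamps) become consistent with the fault-free model, while multiple-fault cases such as test case 7 still yield the expected diagnoses.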
Table 14.3 Test cases for the extended two-bulb example from Figure 14.4 and their test
execution results. In gray we indicate tests that check the expected (fault-free) behavior of the
circuit.
    Observations                                               Expected diagnoses          P/F
1   on(s1). on(s2). val(light(l1),on). val(light(l2),on).      {{}}                        √
2   off(s1). on(s2). val(light(l1),on). val(light(l2),on).     {{}}                        ×
3   on(s1). off(s2). val(light(l1),on). val(light(l2),on).     {{}}                        ×
4   off(s1). off(s2). val(light(l1),off). val(light(l2),off).  {{}}                        √
5   on(s1). on(s2). val(light(l1),off). val(light(l2),on).     {{l1}}                      √
6   on(s1). on(s2). val(light(l1),on). val(light(l2),off).     {{l2}}                      √
7   on(s1). on(s2). val(light(l1),off). val(light(l2),off).    {{b}, {s1, s2}, {l1, l2}}   √
8   on(s1). off(s2). val(light(l1),off). val(light(l2),on).    {{l1}}                      ×
9   on(s1). off(s2). val(light(l1),on). val(light(l2),off).    {{l2}}                      ×
10  on(s1). off(s2). val(light(l1),off). val(light(l2),off).   {{b}, {s1}, {l1, l2}}       ×
11  off(s1). on(s2). val(light(l1),off). val(light(l2),on).    {{l1}}                      ×
12  off(s1). on(s2). val(light(l1),on). val(light(l2),off).    {{l2}}                      ×
13  off(s1). on(s2). val(light(l1),off). val(light(l2),off).   {{b}, {s2}, {l1, l2}}       ×
14  off(s1). off(s2). val(light(l1),on). val(light(l2),off).   {{s1, s2, l2}}              √
15  off(s1). off(s2). val(light(l1),off). val(light(l2),on).   {{s1, s2, l1}}              √
16  off(s1). off(s2). val(light(l1),on). val(light(l2),on).    {{s1, s2}}                  √
14.5 Conclusion
In this paper, we discussed the use of testing for model-based diagnosis. We
focused on assuring the quality of system models used for fault detection
and localization. We discussed how to test models and identified
shortcomings as well as future research directions. Testing a system model comes
in two flavors: (i) testing a model of a particular system and (ii) testing
component models used in different system models. For both, we need to
define test cases comprising observations and expected diagnoses. For testing
component models, in addition, we need to come up with different systems.
Issues and challenges include providing means for answering the question of
when to stop testing, giving quality guarantees, and the automation of test
case generation.
Acknowledgments
The research was supported by ECSEL JU under the project H2020
826060 AI4DI - Artificial Intelligence for Digitising Industry. AI4DI is
funded by the Austrian Federal Ministry of Transport, Innovation, and
Technology (BMVIT) under the program "ICT of the Future" between
May 2019 and April 2022. More information can be retrieved from
https://iktderzukunft.at/en/ .
Index
K
kendryte 130, 132, 134, 139
key Performance Indicators 2, 5, 6, 7, 138

L
labelling 76, 84, 180
low power 3, 130, 141

M
machine learning 5, 75, 81, 158, 177, 180, 191
machine vision 73, 76
manufacturing AI solutions 81
Mask R-CNN 76, 77, 78
ML 2, 55, 141, 161, 183
model-based diagnosis 113, 114, 125, 189, 192

N
neuromorphic 1, 9, 21, 24, 129
neuromorphic computing 1, 32, 36
neuromorphic processor 2, 23, 25, 32

O
object detection 5, 36, 77, 141, 151

P
performance 2, 6, 9, 130, 145, 168, 185
physical inspection of electronics 53
physical simulation 113
predictive maintenance 157, 159, 161

R
random forest 165, 178, 182, 183, 186
relation extraction 91

S
semantic segmentation 53, 56
semiconductor wafer 73
smart sensors systems 158
spiking neural network 26, 38, 129
STM32 129, 130, 157, 171
supervised learning 82, 84, 161, 180
surface 73, 74, 105, 189
SVM 161, 177, 182, 185

T
tensor processing unit 142
testing 77, 113, 169, 171, 189, 201
transfer learning and scalability 83, 85, 86
TXRF 105, 110, 111

V
verification and validation 114, 189, 203
vibration analysis 158
VPD-ICPMS 105, 106, 107, 111

W
wafer loops 103, 104

Y
YOLO 76, 78, 141, 145
About the Editors
closing the gap between research and practice. Starting from October 2017,
Franz Wotawa is the head of the Christian Doppler Laboratory for Quality
Assurance Methodologies for Autonomous Cyber-Physical Systems. During
his career, Franz Wotawa has written more than 430 peer-reviewed papers
for journals, books, conferences, and workshops. He supervised 100 master’s
and 38 Ph.D. students. For his work on diagnosis, he received the Lifetime
Achievement Award of the Intl. Diagnosis Community in 2016. Franz
Wotawa has been a member of numerous program committees and
organized several workshops and special issues of journals. He is a member
of the Academia Europaea, the IEEE Computer Society, ACM, the Austrian
Computer Society (OCG), and the Austrian Society for Artificial Intelligence
and a Senior Member of the AAAI.
Mario Diaz Nava holds a Ph.D. and an M.Sc., both in computer science, from
the Institut National Polytechnique de Grenoble, France, and a B.S. in
communications and electronics engineering from the Instituto Politecnico
Nacional, Mexico. He has worked at STMicroelectronics since 1990. He has occupied
different positions (Designer, Architect, Design Manager, Project Leader,
Program Manager) in various STMicroelectronics research and development
organisations. His selected project experience is related to the specifications
and design of communication circuits (ATM, VDSL, Ultra-wideband), digital
and analogue design methodologies, system architecture and program
management. He currently holds the position of ST Grenoble R&D Cooperative
Programs Manager, and he has actively participated, for the last five years,
in several H2020 IoT projects (ACTIVAGE, IoF2020, Brain-IoT), working
in key areas such as Security and Privacy, Smart Farming, IoT System
modelling, and edge computing. He is currently leading the ANDANTE project
devoted to developing neuromorphic ASICS for efficient AI/ML solutions at
the edge. He has published more than 35 articles in these areas. He is currently
a member of the Technical Expert Group of the PENTA/Xecs European
Eureka cluster and a Chapter chair member of the ECSEL/KDT Strategic
Research Innovation Agenda. He is an IEEE member. He participated in the
standardisation of several communication technologies in the ATM Forum,
ETSI, ANSI and ITU-T standardisation bodies.
Björn Debaillie leads imec’s collaborative R&D activities on cutting-edge
IoT technologies. As program manager, he is responsible for
the operational management across programs and projects, and focusses
on strategic collaborations and partnerships, innovation management, and
public funding policies. As chief of staff, he is responsible for executive