Smart Embedded Systems and Applications
RIVER PUBLISHERS SERIES IN ELECTRONIC
MATERIALS, CIRCUITS AND DEVICES
Series Editors
Indexing: all books published in this series are submitted to the Web of Science Book Citation Index
(BkCI), to SCOPUS, to CrossRef and to Google Scholar for evaluation and indexing. All River Publishers
books in the area are available on the IEEE Xplore platform.
The “River Publishers Series in Electronic Materials, Circuits and Devices” is a series of comprehensive
academic and professional books which focus on theory and applications of advanced electronic
materials, circuits and devices. This includes analog and digital integrated circuits, memory technologies,
system-on-chip and processor design. It also covers theory and modeling of devices, performance and
reliability of electron and ion integrated circuit devices and interconnects, insulators, metals, organic
materials, micro-plasmas, semiconductors, quantum-effect structures, vacuum devices, and emerging
materials. The series also includes books on electronic design automation and design methodology, as
well as computer-aided design tools.
Books published in the series include research monographs, edited volumes, handbooks and
textbooks. The books provide professionals, researchers, educators, and advanced students in the field
with invaluable insight into the latest research and developments.
Topics covered in this series include:-
Editor
Saad Motahhir
École Nationale des Sciences Appliquées,
Sidi Mohamed Ben Abdellah University, Morocco
River Publishers
Published, sold and distributed by:
River Publishers
Alsbjergvej 10
9260 Gistrup
Denmark
www.riverpublishers.com
Preface
Acknowledgments
6.3.2 Implementation of the ADTF technique using multi-CPU architectures
6.3.3 HLS implementation of ADTF technique using FPGA architecture
6.3.4 VHDL implementation of ADTF technique using FPGA architecture
6.3.5 VHDL implementation of hybrid DWT-ADTF technique using FPGA architecture
6.4 Conclusions
References
Acknowledgments
This book would not have been as successful without the efforts of the authors and
reviewers. Therefore, I would like to express my sincere appreciation to all of
you who generously supported this book.
S. Motahhir
List of Reviewers
List of Figures
Figure 3.8 The inputs and outputs of the engine plant model in Simulink.
Figure 3.9 Simplified system components for open-loop mode overview.
Figure 3.10 Simplified system components for closed-loop mode overview.
Figure 3.11 The system layout shows the used hardware and overall occupied space.
Figure 3.12 Connection layout between vehicle and laptop.
Figure 3.13 AVL Cruise™ M engine model MAF accuracy results.
Figure 3.14 AVL Cruise™ M engine model MAP accuracy results.
Figure 3.15 Normalized transient real vehicle throttle position and pedal position, which were used to analyze the transient results.
Figure 3.16 Normalized transient real vehicle engine speed and injection time, which were used to analyze the transient results.
Figure 3.17 A transient cycle comparison of the engine speed from a vehicle measurement and the engine speed generated by the microcontroller.
Figure 3.18 The comparison between the accelerator pedal position signal from a vehicle measurement and A2S-HIL.
Figure 3.19 HIL simulation results compared to vehicle data in open-loop mode.
Figure 3.20 HIL results in a steady closed-loop cycle with an accelerator pedal.
Figure 4.1 Configuration of the quadrotor aircraft.
Figure 4.2 Proposed control scheme for the quadrotor aircraft.
Figure 4.3 Procedure of the PIL experiment using the STM32F429 board.
Figure 4.4 Quadrotor’s position (x,y,z).
Figure 4.5 Attitude (Φ,Θ,Ψ).
Figure 4.6 Control signals.
Figure 4.7 3D-path following under disturbances/uncertainties.
Figure 5.1 Signal waveform of ECG.
Figure 5.2 Signal waveform of ECG.
Abbreviations Definition
A2S Automotive systems simulation
ACUA Automotive Company Under Audit
AI Artificial intelligence
ANNs Artificial neural networks
APP Accelerator pedal position
ASIC Application specific integrated circuit
ASIL Automotive safety integrity level
ATI Accurate Technologies Inc.
AVL Anstalt für Verbrennungskraftmaschinen List (Company)
BDC Bottom dead centre
BMBF Federal ministry of education and research
CA Crank angle
CAN Controller area network
CNNs Convolutional neural networks
DAC Digital-analogue converter
DL Deep learning
DMA Direct memory access
DSP Digital signal processing
DT Digital twin
DUT Device under test
ECU Electronic control unit
FAO Food and agriculture organization
FMU Functional mock-up unit
FPGA Field programmable gate array
FSA Functional safety assessment
FS Functional safety
HDL Hardware description language
HIL Hardware-in-the-loop
HLL High-level language
1
Functional Safety Audit/Assessment for
Automotive Engineering
Abstract
Automotive safety is important because lives and reputations are at stake. In
this chapter, we present an overview of functional safety audits and
assessments in the automotive domain. The safety management activities of an
organization need a periodic check, which is ensured by functional safety
audits and assessments. In accordance with ISO 26262, an automotive company
should have a defined process for functional safety audit and assessment. In
this overview, we present some basic definitions related to ISO 26262 and ASIL,
the categories of audits and assessments, and the procedures and phases needed
to comply with safety requirements. In addition, we define the scope,
objectives, roles and responsibilities, and departments inside an automotive
company seeking to ensure a well-defined process for functional safety audit
and assessment.
A–D. ASIL A represents the lowest level of risk, and ASIL D imposes the highest
integrity requirements on the product; as you go from A to D, the compliance
requirements get stricter.
Figure 1.3 Phases of compliance with functional safety procedures and plans to
ISO 26262.
○ Documentation
○ Confidence in the use of software tools
○ Qualification of software components
○ Qualification of hardware components
○ Proven in use argument
• ISO 26262 method selection guidelines for Parts 4, 5, 6, and 8 as per
the criterion: highly recommended (++) for ASIL C
• ISO 26262 Part 4, Part 5, and Part 6 requirements.
The maintenance phase will include:
• ISO 26262 WP from Parts 2, 4, 5, 6, and 8.
• ISO 26262 Part 2 requirements.
• ISO 26262 Part 8 requirements from chapters:
○ Interfaces within distributed developments
○ Specification and management of safety requirements
○ Configuration management
○ Change management
○ Verification
○ Documentation
○ Confidence in the use of software tools
○ Qualification of software components
○ Qualification of hardware components
○ Proven in use argument
• ISO 26262 methods and guidelines (specific to each automotive entity)
for Parts 4, 5, 6, and 8 as per the criterion: highly recommended (++) for
ASIL C
• ISO 26262 Part 7, Part 9, and Part 10.
Audited Departments:
Management
• Responsible: Safety manager
Systems
• Responsible: Systems manager or system process responsible
Hardware
• Responsible: Hardware manager or hardware process responsible
Software
• Responsible: SW manager or SW process responsible
Supporting Processes
• Responsible: Processes manager
1.3.1.4 Non-Conformance
A non-conformance, as defined by ISO, is “the non-fulfillment of specific
requirements.” The definition covers the departure or absence of one or more
quality characteristics or quality system elements from specified
requirements. A non-conformance may be a failure to:
• Comply with the applicable standard.
• Implement the quality manual, procedures, or other documentation
requirements specified by the automotive company.
• Implement a code of practice, regulation, or contract.
1.8 Conclusion
In summary, a good plan is the key to a successful functional safety audit and
assessment. When planning is done before and during the development
phase of a project, potential risks and gaps can be detected and corrected
early. It can also help us to properly manage functional safety within the
project by determining recommendations for corrective actions, as the audit
and assessment are usually carried out by people with engineering and
process experience associated with the ISO 26262 standard.
References
[1] Birch J. et al. (2013) Safety Cases and Their Role in ISO 26262 Functional
Safety Assessment. In: Bitsch F., Guiochet J., Kaâniche M. (eds)
Computer Safety, Reliability, and Security. SAFECOMP 2013. Lecture
Notes in Computer Science, vol 8153. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-642-40793-2_15
[2] A. Nardi and A. Armato, “Functional safety methodologies for auto-
motive applications,” 2017 IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), 2017, pp. 970-975, doi: 10.1109/
ICCAD.2017.8203886.
[3] G. Xie, Y. Li, Y. Han, Y. Xie, G. Zeng, and R. Li, “Recent Advances and
Future Trends for Automotive Functional Safety Design Methodologies,”
in IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp.
5629–5642, Sept. 2020, doi: 10.1109/TII.2020.2978889.
[4] Y. Chang, L. Huang, H. Liu, C. Yang and C. Chiu, “Assessing automotive
functional safety microprocessor with ISO 26262 hardware
requirements,” Technical Papers of 2014 International Symposium
on VLSI Design, Automation, and Test, 2014, pp. 1–4, doi:
10.1109/VLSI-DAT.2014.6834876.
[5] M. Safar, “Asil decomposition using SMT,” 2017 Forum on
Specification and Design Languages (FDL), 2017, pp. 1–6, doi: 10.1109/
FDL.2017.8303902.
[6] W. M. Goble and J. V. Bujkowski, “Extending IEC61508 reliability eval-
uation techniques to include common circuit designs used in industrial
safety systems,” Annual Reliability and Maintainability Symposium.
2001 Proceedings. International Symposium on Product Quality and
Integrity (Cat. No.01CH37179), 2001, pp. 339–343, doi: 10.1109/
RAMS.2001.902490.
Abstract
In the next vehicle generations, connected and highly developed cars
will have an important impact on the networking architecture and the
interconnection between electronic control units (ECUs). The automotive
industry is beginning to develop new and efficient strategies to improve the
performance of the overall system. The AUTOSAR organization, as part of this
industry, aims to present comprehensive solutions, especially software
architectures, for new technologies in this field. Thus, in this chapter, we
present the aspects of new E/E architectures with upcoming technologies. We
discuss a new solution presented by the AUTOSAR organization to implement new
software requirements for next-generation cars. This solution aims to provide a
safe environment for features that require complex data processing and to
communicate with AUTOSAR and non-AUTOSAR platforms. We summarize a
comprehensive comparison between AUTOSAR Adaptive and AUTOSAR Classic in terms
of functionality and application area. We also provide functional safety
preliminaries for the global E/E architectures.
2.1 Introduction
Recent cars, such as connected and autonomous vehicles, are becoming the state
of the art in the automotive industry. This leads directly to an increase in the percentage
22 Comparison between AUTOSAR Platforms with Functional Safety
and thread handling for all adaptive applications and functional clusters that
establish the platform.
In the AUTOSAR Adaptive platform, applications are not totally
bound by static scheduling and memory management but are free to allocate
memory according to their current needs and to break down their tasks thanks
to object-oriented programming.
The Execution Manager module is the element of the architecture
responsible for starting and stopping the AUTOSAR Adaptive applications
and for providing the necessary resources during the execution
period of the applications. To ensure communication between local
applications and applications on other ECUs, including the interaction with the
Adaptive platform services, middleware protocols must be defined. The most
noticeable change in the use of AUTOSAR Adaptive is the universal use of
Ethernet-based communication systems. For release R20-11, the new
technology added to support the Ethernet protocol is related to 10BASE-T1S [9],
which is specified by IEEE 802.3cg. This new feature allows easy
integration of devices into automotive Ethernet using the multidrop configuration.
Furthermore, it is located on layers 1 and 2 of the OSI model and is to be
supported by the Classic Platform as well as the Adaptive Platform.
The AUTOSAR organization has an extensive release plan for Adaptive
AUTOSAR. The latest release was in November 2020. The main focus
of this release is to enhance security and communication (10BASE-T1S,
ara Communication Groups) by adding new functionality to the platform.
Additionally, some concepts target both the Classic and Adaptive Platforms and
reinforce the interaction between the two platforms.
Figure 2.3 Difference between applications within adaptive AUTOSAR and classic
AUTOSAR.
2.5 Classic AUTOSAR vs. Adaptive AUTOSAR
Figure 2.4 An example illustrates the code implementation of RTE and ARA::COM.
Adaptive platform, and they can use the services that are offered. On the other
hand, the focus of the Classic platform is primarily on signal-oriented
communication. Nonetheless, it is also possible to use AUTOSAR Classic in a
service-based way for communication between multiple ECUs. In practice, the
main properties of the AUTOSAR Adaptive and Classic platforms complement
one another; it may therefore be assumed that ECUs based on both standards
will be used in future vehicles, resulting in a heterogeneous architecture.
gateway, and it converts message signals directly into UDP frames [11] on
Ethernet; the AUTOSAR Adaptive ECU then converts the signals from the UDP
frame into a service that is available within ECU2.
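The signal-to-service path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the frame layout (a 2-byte signal ID followed by a 4-byte float), the function names, and the service name are hypothetical and are not taken from the AUTOSAR specifications or from the gateway discussed in this chapter.

```python
import struct
from typing import Dict, Tuple

# Hypothetical payload layout (an assumption for illustration only):
# 2-byte unsigned signal ID + 4-byte float value, network byte order.
FRAME_FORMAT = "!Hf"


def pack_signal(signal_id: int, value: float) -> bytes:
    """Gateway side: pack one signal into a payload that could be sent in a UDP frame."""
    return struct.pack(FRAME_FORMAT, signal_id, value)


def unpack_signal(payload: bytes) -> Tuple[int, float]:
    """Receiver side: recover the signal ID and value from the payload."""
    signal_id, value = struct.unpack(FRAME_FORMAT, payload)
    return signal_id, value


def to_service_event(payload: bytes, service_names: Dict[int, str]) -> dict:
    """Map a received signal to a named service event (hypothetical mapping table)."""
    signal_id, value = unpack_signal(payload)
    return {"service": service_names[signal_id], "value": value}
```

In a real system the payload would be handed to a UDP socket on the gateway and received on the Adaptive ECU; the sketch leaves the transport out so that the conversion step stays visible.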
2.8 Conclusion
Currently, autonomous and connected car technologies are developing and
presenting more challenges, especially in terms of safety. On the software side,
AUTOSAR as an organization provides solutions that fit these challenges
very well.
References
[1] J. G. Kassakian and D. J. Perreault, “The future of electronics in
automobiles,” in Proceedings of the 13th International Symposium on Power
Semiconductor Devices & ICs. ISPSD ’01 (IEEE Cat. No.01CH37216),
Osaka, Japan, 2001, pp. 15–19. doi: 10.1109/ISPSD.2001.934550.
[2] D.-K. Choi, J.-H. Jung, S.-J. Koh, J.-I. Kim, and J. Park, “In-Vehicle
Infotainment Management System in Internet-of-Things Networks,”
in 2019 International Conference on Information Networking
(ICOIN), Kuala Lumpur, Malaysia, Jan. 2019, pp. 88–92. doi:
10.1109/ICOIN.2019.8718192.
[3] J. Schafer and D. Klein, “Implementing Situation Awareness for Car-to-X
Applications Using Domain Specific Languages,” in 2013 IEEE 77th
Vehicular Technology Conference (VTC Spring), Dresden, Germany,
June 2013, pp. 1–5. doi: 10.1109/VTCSpring.2013.6692589.
[4] R. Hussain and S. Zeadally, “Autonomous Cars: Research Results, Issues,
and Future Challenges,” IEEE Commun. Surv. Tutor., vol. 21, no. 2, pp.
1275–1313, 2019, doi: 10.1109/COMST.2018.2869360.
[5] “Furst and Bechter - 2016 - AUTOSAR for Connected and Autonomous
Vehicles The.pdf”.
[6] S. Bunzel, “AUTOSAR – the Standardized Software Architecture,”
Inform.-Spektrum, vol. 34, no. 1, pp. 79–83, Feb. 2011, doi:
10.1007/s00287-010-0506-7.
[7] G. L. Gopu, K. V. Kavitha, and J. Joy, “Service Oriented Architecture based
connectivity of automotive ECUs,” in 2016 International Conference
on Circuit, Power and Computing Technologies (ICCPCT), Nagercoil,
India, Mar. 2016, pp. 1–4. doi: 10.1109/ICCPCT.2016.7530358.
[8] V. Atlidakis, J. Andrus, R. Geambasu, D. Mitropoulos, and J. Nieh,
“POSIX abstractions in modern operating systems: the old, the new,
and the missing,” in Proceedings of the Eleventh European Conference
Abstract
In today’s world, the powertrain system complexity has increased drastically,
whereas the development time for powertrain hardware and software is still
being decreased due to the cost reduction strategies of automotive companies.
This chapter will offer an overview of Electronic Control Unit architectures
and their usage in the automotive domain. To highlight the real implementa-
tion constraints, we report a cost-effective solution based on a hardware-in-
the-loop (HIL) system that was developed to perform real-time closed-loop
simulations for testing the engine hardware components and the software in
the ECU, as well as the calibration of these software functions. The proposed
implementation method uses cheaper components compared to the existing
systems in academia or industry. Also, it concentrates on representing a new
combination of software and hardware tools to simulate the engine and signals.
By utilizing this system, it is expected to have fast, robust, and cost-effective
software and calibration development processes for the automotive industry.
3.1 Introduction
In the development cycle of engine controls, several challenges are encoun-
tered in the determination of control targets, controller design, and software
34 Hardware-in-the-Loop System
Figure 3.1 Typical V-cycle for automotive controllers’ design and validation.
implementation [1]. The design and validation of such a system are performed
according to the V-cycle presented in Figure 3.1. Nowadays, automotive control
and calibration strategies are much more complicated than in the past of the
car industry. Thus, engineers must put more effort into developing and
calibrating systems within a short period while satisfying strict demands from
customers and emissions regulations [2]. Likewise, there is a need for advanced
hardware-software interdependent systems to handle the challenges in the design.
After the software verification step using software-in-the-loop (SIL), the ECU
hardware and calibration are tested in HIL to finalize the cycle. In the past,
several researchers tackled the topic of producing a mobile solution for the HIL
to be used in software and calibration development [3, 4, 5, 6]. However, there
have been many shortcomings in the solutions provided in the literature, espe-
cially in terms of the hardware for generating signals and real-time communi-
cation between the device under test and the plant model environment [7, 8, 9].
As an alternative solution, a system called Automotive Systems
Simulation HIL (A2S-HIL) is planned to be utilized in several calibration
and software development applications such as training activities, engine
model validation, ECU function tests, engine/after-treatment, and com-
ponent calibration activities. Likewise, this tool enhances the process of
studying software functions in the ECU, the closed-loop system structures,
software-hardware interactions in automotive systems, internal combustion
engines, and training activities. Also, software validation tests can be
performed for ECU functions on this system in open-loop mode. Similarly,
any engine model that runs in an open-loop system can be validated via
A2S-HIL as well. To close the control loop, the system requires outputs from
the engine plant model and feedback signals from the ECU to the model. If
the validation tests result in a mature engine model, calibration tasks that
require open-loop or closed-loop tests on a dynamometer can also be handled
using an on-desk A2S-HIL. According to several studies, mature models can
have an error of less than 31.4% [10]. The accuracy of any engine
model can be determined by calculating the error at the operating points with
the highest time spent. Even though the engine model cannot represent all
the transient cases as faithfully as real-life operation, the pre-calibration
tasks can still be done on A2S-HIL. In particular, open-loop functions can
easily be calibrated without the need for feedback coming from the ECU. For
example, catalyst efficiency monitoring does not require any signal to be
returned to the engine model. Consequently, the following calibration tasks are
suited to the open-loop approach: on-board diagnostics (OBD) component
monitoring calibration, OBD release condition calibration, model-based function
calibration, environmental trip precondition, etc.
In this work, we suggest avoiding expensive hardware devices while
ensuring the same accuracy. Therefore, the engine model validation with a real
ECU for control and calibration applications is performed on the same system
without major modifications. In addition, the proposed system can run several
driving cycles in open-loop and closed-loop modes. Also, most of the hardware
connections are replaced with software communication structures. This keeps
all the system components synchronized in almost real time. In parallel,
the concept of the functional mock-up unit (FMU) is introduced into engine HIL
simulations. Besides that, the structure of the system was established on the
Matlab®/Simulink® platform, including all the subsystems, such as the cam/crank
signal generator based on the cycle inputs, the AVL Cruise™ M engine model, and
calibration data acquisition software; AVL Concerto™ was used for plotting.
This chapter is organized as follows: Section 2 presents an introduction
to the automotive electronic controls, Section 3 defines the method and imple-
mentation details of each subsystem, and Section 4 shows the results of the
system in open and closed-loop cycles. Finally, Section 5 concludes the chap-
ter. Moreover, the remainder of this chapter elaborates on the creation of a
testing environment, beginning with the importance of such test systems to the
automotive sector and an example application on a real engine control unit.
targeted as the system [11]. From that point of view, a typical vehicle mostly
consists of an engine, a fuel system, an ignition system, an electrical system,
an exhaust system, a powertrain, a suspension, a steering system, a brake
system, a frame, and a body.
Mainly for an internal combustion engine, the constraints of design
include:
• Obtaining the maximum performance,
• Reducing the fuel consumption,
• Reducing tailpipe emissions,
• Ensuring the safety of passengers,
• Protecting the components of the vehicle.
Consequently, several control algorithms are applied to ensure that the
optimal amount of fuel is injected into the cylinders. For such systems, there
are sensors and actuators to be controlled electronically.
The ECUs are classified based on their functions, such as the engine control
module (ECM), body control module (BCM), electronic brake control module
(EBCM), powertrain control module (PCM), transmission control module
(TCM), suspension control module (SCM), door control unit (DCU), battery
management system (BMS), and many others. All the functions of a vehicle,
from basic window movements to critical safety or fuel injection functions,
are controlled by ECUs. They collect and analyze data and decide on
actions based on defined parameters.
For example, the engine control module (ECM), also named engine control
unit (ECU), ensures that the car operates optimally. It reads most of
the sensors in the engine in order to manage the air-fuel mixture and regulate
emissions. To successfully control an engine, the microcontroller inside the
ECU reads the accelerator, brake, and clutch pedal signals to calculate the mass
of fuel to be injected. Then it gives the opening signal to the injectors at the
optimal time [12]. The ECM controls four main aspects of any car: air-fuel ratio,
idle speed, variable valve timing, and ignition timing. In terms of the air-fuel
ratio, the ECM uses lambda sensors, which measure the oxygen in the exhaust,
to detect whether the engine is running rich or lean and to regulate the
mixture accordingly. For the idle speed, the ECM relies on a crankshaft
position sensor that tracks the engine revolutions. The variable valve timing
system controls the opening and closing of the valves to increase either power
or fuel economy. Lastly, the ECM controls the ignition spark timing. Accurate
control of this timing provides more power and better fuel economy.
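As a rough illustration of the rich/lean decision described above, the following Python sketch classifies a mixture from its air-fuel ratio. The function names, the tolerance band, and the stoichiometric ratio of 14.7:1 (a commonly quoted value for gasoline) are assumptions made for illustration; they are not taken from the ECU software discussed in this chapter.

```python
# Illustrative sketch only: names and thresholds are hypothetical.
STOICH_AFR_GASOLINE = 14.7  # assumed stoichiometric air-fuel mass ratio


def lambda_value(air_mass: float, fuel_mass: float,
                 stoich_afr: float = STOICH_AFR_GASOLINE) -> float:
    """Return lambda: the actual air-fuel ratio divided by the stoichiometric one."""
    if fuel_mass <= 0:
        raise ValueError("fuel_mass must be positive")
    return (air_mass / fuel_mass) / stoich_afr


def mixture_status(lam: float, tolerance: float = 0.01) -> str:
    """Classify the mixture: lambda below 1 is rich, lambda above 1 is lean."""
    if lam < 1.0 - tolerance:
        return "rich"
    if lam > 1.0 + tolerance:
        return "lean"
    return "stoichiometric"
```

A production controller would close the loop on the measured lambda and adjust the injected fuel mass; the sketch only shows the classification step.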
As another example, the battery management system (BMS) is mostly
seen in hybrid and electric vehicles. Their batteries, mainly lithium-ion, are
packs of multiple cells that need to be monitored closely because a failure in a
single cell may affect the performance of the whole battery. The BMS checks
various parameters of the battery, such as its state of health, state of charge,
current, voltage, and power, to optimize performance. The BMS
also supports recharging the batteries through regenerative braking.
Though all ECUs function independently, communication among them
is needed as well. This communication mostly takes place via the CAN
bus and is managed by the body control module (BCM). BCMs fall under the
category of ECUs but act as a gateway connecting the other ECUs. They consist
of processors that manage multiple body functions in a vehicle, such as
windows, lights, and wipers.
ECUs are designed to support a specific number of inputs
and outputs. In addition, multi-purpose controllers exist nowadays.
Tier 1 suppliers such as Bosch, Continental, and Delphi usually use
the same ECU hardware with several software releases. Considering this high
level of complexity, automotive engineers follow the development cycle
Figure 3.2 HIL overview and components used for system realization.
a robust A2S-HIL system are cam/crank signal generation for steady and
transient engine speed simulation, the delivery of the model outputs to the
ECU, and the transmission of feedback signals to the engine model (in
closed-loop mode).
To power the ECU, a 12 V, 3 A power supply was used. Also, a
push-button switch was used in the electrical circuit to simulate the ignition
key of the engine. As for the accelerator pedal of the A2S-HIL system, there
are two modes for accelerator pedal position (APP) control: the cycle input
mode and the external mode. In the cycle input mode, the APP was fed from
the engine model to the ECU by writing its electrical signal value to the memory
directly. In the external mode, by contrast, a potentiometer was used to control
the pedal position by generating the analogue signal for the APP pin.
In Figure 3.2, the subsystem “Cruise™ M Engine Model” represents the
plant model, which is a 1-liter spark-ignition engine. It receives the desired
ambient pressure and temperature of test execution from the block “Cycle
Inputs” which also delivers the desired engine speed signal to the “Cam-
Crank Signal Generator” board. The plant can function in an open or closed
loop according to the status of “Mode Switch”. Subsequently, the outputs
of the model are transferred to the ECU through the “Signal Manipulation”
strategy that writes the electrical values of the physical quantities to the mem-
ory of the controller. In closed-loop mode, the actuators and lambda setpoints
are filtered to obtain the actual feedback values. The throttle and the waste-
gate valve signals are sent to the engine.
The cam/crank signals serve as the core of all ECU calculations.
Depending on the software that the ECU uses, the profile of these signals may
vary. The crankshaft position sensor can be considered the primary sensor for
electronic fuel injection and ignition. It produces a pulse for each tooth on
Figure 3.3 Signal flow for the generated camshaft and crankshaft signals from Simulink® to
ECU through the cam-crank signal generator board.
Figure 3.4 Typical crankshaft and camshaft signals for 1-liter spark-ignition engine [14]
(TDC: Top Dead Centre, BDC: Bottom Dead Centre, CA: Crank Angle.)
the crankshaft which gives the engine speed and the shaft’s instantaneous
position [13]. In A2S-HIL, the frequency of the crankshaft signal was varied
according to the desired engine speed. A board containing an Arduino Uno
and a Sparkfun® CAN-Shield was utilized to generate the crankshaft signal as
shown in Figure 3.3. This “Cam-Crank Signal Generator” board establishes
real-time CAN communication between itself and Simulink®. Furthermore,
intake and exhaust camshaft position sensor signals were also necessary to
measure the camshaft positions for the valves’ opening and closing. These
signals were generated in full synchronization with the crankshaft signal as
shown in Figure 3.4.
For instance, the required PWM frequency to be generated by the
microcontroller for each sample was estimated by converting the maximum
desired engine speed (4000 revolutions/minute ≈ 66.7 revolutions/second) in
the HIL system into sampling time according to the number of degrees in
the crankshaft. Since each revolution of the crankshaft contains 720 degrees
(1 degree = 1 cycle), the maximum frequency of the “Cam-Crank Signal
Generator” microcontroller is expressed in Equation 3.1.
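The arithmetic described in this paragraph can be sketched as follows. This is a minimal illustration that assumes the maximum frequency is the product of the engine speed in revolutions per second and the 720 cycles per revolution quoted in the text; the function and constant names are hypothetical, and the sketch is not a reproduction of Equation 3.1 itself.

```python
# Sketch of the arithmetic described in the text; names are hypothetical.
MAX_ENGINE_SPEED_RPM = 4000   # maximum desired engine speed quoted in the text
CYCLES_PER_REVOLUTION = 720   # figure quoted in the text (1 degree = 1 cycle)


def max_signal_frequency_hz(rpm: float, cycles_per_rev: int) -> float:
    """Convert an engine speed in rev/min into a signal frequency in Hz."""
    rev_per_second = rpm / 60.0
    return rev_per_second * cycles_per_rev


def sample_time_s(frequency_hz: float) -> float:
    """Sampling time is the reciprocal of the signal frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return 1.0 / frequency_hz


f_max = max_signal_frequency_hz(MAX_ENGINE_SPEED_RPM, CYCLES_PER_REVOLUTION)
t_sample = sample_time_s(f_max)
```

Under these assumptions the maximum frequency comes out at about 48 kHz, which corresponds to a sampling time of roughly 21 microseconds.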
Figure 3.5 Signals flow from cycle inputs to ECU by ATI® Vision® and NoHooks®.
Figure 3.6 Signals flow from cycle inputs to ECU by ATI® Vision® and NoHooks®.
Figure 3.7 The approach of slowing down signals flow to ensure real-time running.
time. The engine model was actively slowed down to run closer to real time,
as shown in Figure 3.7.
Figure 3.8 The inputs and outputs of the engine plant model in Simulink.
back from ECU to the model as illustrated in Figure 3.10. This mode supports
the calibration of actuators like the throttle valve by running the system and
observing the set points and actual values according to different controller
adjustments.
Figure 3.11 The system layout shows the used hardware and overall occupied space.
In Figure 3.12, the device under test (DUT) was an open ECU. Such
controllers allow access to all the memory in real time through CAN. The
Kvaser® USBcan® II has two channels. One of them connected the DUT
to the PC, and the other passed the engine speed frequency to the
“Cam-Crank Signal Generator”, also via CAN. Moreover, the wired connection
consisted of the power supply, crank signal, intake/exhaust cam signals, and
the ignition key switch. On the left side of the figure, there were two
cables representing the throttle positive and negative position sensor signals.
As highlighted before, the pins of the other sensors were bypassed from
within the memory of the ECU. For example, the MAF sensor electrical
signal address in the RAM was overwritten with the engine model value from
Matlab®/Simulink®. This strategy reduced the number of wires and
digital-analogue converters (DACs) required to simulate the whole system.
The steady and transient tests were done on a real MPV (Multi-Purpose
Vehicle). A laptop with ATI® Vision® was used to record the input acquisi-
tions via Kvaser® Leaf Light® v2 CAN device as shown in Figure 4.2. To
save cost, the same Kvaser® USBcan® II can be utilized on the vehicle by
connecting a single CAN channel to the ECU. As an alternative,
Vector® CAN devices are compatible with ATI® Vision®. If other tools
are used, their datasheets must be checked for compatibility.
real driving emission (RDE) cycle were run for open-loop validation of the
engine model as well as the cam-crank signal generator board. A transient
cycle with slow dynamics and an external accelerator pedal position signal
was tested for the closed-loop mode validation.
$$
\text{Error}\,(\%) = \frac{\text{Signal}_{\text{Real}} - \text{Signal}_{\text{Model}}}{\text{Signal}_{\text{Real}}} \times 100\% \tag{3.2}
$$
Therefore, the average errors were 7.71% and 15.01% in the mass air flow and
manifold air pressure outputs respectively, as shown in Figures 3.13 and 3.14.
The model accuracy figures show the mapped real measurement data on
the upper right side, the model simulation result on the upper left side, the
difference between measurement and simulation results on the lower left
side, and the error distribution around the y = x line, which shows the
displacement of the simulation results from the real engine data, on the lower
right side. All outputs were mapped according to engine speed (x-axis) and
fuel quantity (y-axis). Concretely, in the lower left plots the blue and red
regions were the sections with the highest error. Additionally, the lower
right plot illustrated the deviation of the model from the real engine and the
engine speed value for each point. Ideally, a 100% accurate model would
have all green on the lower left side and all points on the y = x line on the
lower right side.
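The error metric of Equation 3.2 can be sketched in a few lines. This is a minimal illustration, assuming that the reported averages are means of the absolute per-sample percentage error (the text does not say whether signed or absolute errors were averaged); the function names and sample values are hypothetical.

```python
def percent_error(real, model):
    """Per-sample error in percent, following Equation 3.2."""
    return [(r - m) / r * 100.0 for r, m in zip(real, model)]


def mean_abs_percent_error(real, model):
    """Average of the absolute per-sample errors (an assumed aggregation)."""
    errors = percent_error(real, model)
    return sum(abs(e) for e in errors) / len(errors)


# Made-up measurement and model values, for illustration only
real_values = [100.0, 200.0]
model_values = [90.0, 210.0]
print(percent_error(real_values, model_values))
print(mean_abs_percent_error(real_values, model_values))
```

Note that the metric is undefined wherever the real signal is zero, so such samples would have to be excluded or handled separately.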
Figure 3.15 Normalized transient real vehicle throttle position and pedal position which
were used to analyze the transient results.
3.4 Experimental Evaluation and Measurements 49
Figure 3.16 Normalized transient real vehicle engine speed and injection time which were
used to analyze the transient results.
The other cycle inputs, such as the accelerator pedal position and the
environmental conditions, were converted to electrical signals and fed to the
ECU at a 100 ms sampling time as presented in Figure 3.17. Accordingly, the
real vehicle accelerator pedal signal (blue line) was almost equal to the HIL
system result (red line): the “Signal Manipulation” subsystem processed the
data transfer with a maximum error of 0.02%. This minor error was due
to the time taken by the engine speed signal to pass from Simulink® to the
“Cam-Crank Signal Generator” board.
Figure 3.17 A transient cycle comparison of the engine speed from a vehicle measurement
and the engine speed generated by the microcontroller.
record containing the engine model’s output was taken and compared with
the actual measurements of the cycle. The manifold air pressure and mass
airflow signals were validated against the real engine data. The minor errors
observed in the time ranges 120–130 and 180–190 seconds corresponded to the
blue and red regions in Figures 3.13 and 3.14. These results validated the
“Signal Manipulation” subsystem as well.
Figure 3.18 The comparison between the accelerator pedal position signal from a vehicle
measurement and A2S-HIL.
Figure 3.19 HIL simulation results compared to vehicle data in open-loop mode.
Figure 3.20 HIL results in a steady closed-loop cycle with an accelerator pedal.
3.5 Conclusion
The use of hardware-in-the-loop system setups for engine calibration tasks
has been challenged by the cost, time, and manpower needed to achieve
targets. In this chapter, the A2S-HIL working principle and subsystems were
discussed. The first step of this study targeted cost-effectiveness in HIL
systems as well as a high sampling rate. Addressing the challenges faced, such
as the generation of cam and crank signals, modeling the engine in Cruise™ M,
and the integration of all components, served as a feasibility study for the
A2S-HIL concept. The areas of application such as software development,
engine controls, OBD, and calibration were introduced briefly. Concretely, the
Cruise™ M engine model validation for calibration purposes was done using a
real ECU and real measurements. Also, the sampling time of all the system
components was 100 ms in open-loop and closed-loop simulations. Despite the
robustness of the signals and the integration of the components, there were
deficiencies to be addressed.
In upcoming research, an engine model with higher accuracy will
be introduced. Moreover, several OBD calibration use cases will be
performed. Similarly, actuator models such as throttle valves will replace the
position setpoints. Likewise, an after-treatment model for this engine model
will be developed to perform calibration on the exhaust path. Additionally,
the sampling time of the overall system will be improved. Finally, future
work will also focus on implementing this cost-effective HIL testing
methodology on a hybrid control unit.
3.6 Acknowledgements
The authors gratefully acknowledge all the affiliates of AVL List GmbH for
providing access to the software and hardware used in this project. Also,
everyone who was involved in developing this challenging project from the
family of AVL is appreciated. The authors would like to thank Accurate
Technologies Inc. (ATI) for their suggestions and support.
References
[1] Shahbakhti M., Li J., Hedrick J. (2012). “Early model-based
verification of automotive control system implementation,” in
Proceedings of American Control Conference (ACC), doi:10.1109/
ACC.2012.6314852
[2] Lee, S. Andert, J. Neumann D., Querel C. et al. (2018). “Hardware-
in-the-Loop-Based Virtual Calibration Approach to Meet Real Driving
Emissions Requirements,” in Proceedings of SAE Int. J. Engines
11(6):1479–1504, https://doi.org/10.4271/2018-01-0869.
[3] Pedersen M.M., Hansen M.R., Ballebye M. (2011). “A Cost-Effective
Approach to Hardware-in-the-loop Simulation,” in Proceedings of
Jabloński R., Březina T. (eds) Mechatronics. Springer, Berlin, Heidelberg
[4] Isermann R., Schaffnit J., Sinsel S. (1999). “Hardware-in-the-loop
simulation for the design and testing of engine-control systems,” in
Proceedings of Control Engineering Practice, ISSN: 0967-0661, Vol: 7,
Issue: 5, Page: 643–653
[5] Kendall I. R., Jones R. P. (1999). “An investigation into the use of
hardware-in-the-loop simulation testing for automotive electronic
control systems,” in Proceedings of Control Engineering Practice, ISSN:
0967-0661, Vol: 7, Issue: 11, Page: 1343–1356
[6] Jaikamal V. (2009). “Model-based ECU development – An Integrated
MiL-SiL-HiL Approach,” in Proceedings of SAE World Congress
& Exhibition. SAE Technical Paper 2009-01-0153,
https://doi.org/10.4271/2009-01-0153
[7] Nanjundaswamy H., Tatur, M. Tomazic, D. Dahodwala, M. et al.
(2011). “Development and Calibration of On-Board-Diagnostic
Strategies Using a Micro-HiL Approach,” in Proceedings of SAE
4
Processor in the Loop Experiments of an
Adaptive Trajectory Tracking Control for
Quadrotor UAVs
Abstract
Quadrotor drones are highly maneuverable rotary wing vehicles, which are
vulnerable to modeling uncertainties, state variable couplings, and aerody-
namic perturbations. These factors pose an issue that warrants a robust con-
troller. In this chapter, we explore the design of an adaptive sliding mode
controller for trajectory tracking of an uncertain quadrotor. The suggested
approach delivers good tracking performance, despite the severe impact of
uncertainties and disturbances. The Lyapunov theory is used to analyze the
closed-loop stability of the entire system and to calculate the adaptive laws.
The efficacy of the suggested controller is evaluated under the influence of
disturbances and modeling inaccuracies. Moreover, Processor-in-the-Loop
(PIL) experiments on an STM32F429 discovery board are carried out to con-
firm the workability of the suggested method.
4.1 Introduction
During the past few years, quadrotor drones have grown rapidly in popularity.
Thanks to emerging technologies, quadrotors have become the preferred
solution for multiple tasks including surveillance, package delivery,
mapping, cinematography, etc. [1], [2]. The autonomy in the quadrotor
mode observer to raise the performance of the control system in path track-
ing missions.
Based on the articles cited above, in this chapter we synthesize a simple
adaptive nonlinear control law for path following of an underactuated
quadrotor exposed to time-varying disturbances. The suggested controller
is applied to both position and orientation to improve the tracking ability
despite the severe impact of wind disturbances. The newly designed control
algorithm efficiently rejects the impact of uncertainties and disturbances.
Processor-in-the-loop (PIL) experiments were carried out to confirm the
feasibility of our method. The performance of the suggested controller is
examined under the influence of modelling uncertainties and wind disturbances.
The rest of this chapter is organized as follows: the mathematical model of
the aerial robot is presented in Section 2. Next, the proposed control scheme
is detailed in Section 3. PIL tests are highlighted in Section 4. Section 5
concludes the chapter.
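Using the Newton-Euler method, the full dynamical model of the aerial
robot is given by [22]–[25]:
$$
\begin{aligned}
\ddot{\Phi} &= \dot{\Theta}\dot{\Psi}\,\frac{J_y - J_z}{J_x} - \frac{J_r}{J_x}\,\dot{\Theta}\,\Omega + \frac{U_\Phi}{J_x} - \frac{K_\Phi}{J_x}\,\dot{\Phi}^2 + d_\Phi \\
\ddot{\Theta} &= \dot{\Phi}\dot{\Psi}\,\frac{J_z - J_x}{J_y} + \frac{J_r}{J_y}\,\dot{\Phi}\,\Omega + \frac{U_\Theta}{J_y} - \frac{K_\Theta}{J_y}\,\dot{\Theta}^2 + d_\Theta \\
\ddot{\Psi} &= \dot{\Phi}\dot{\Theta}\,\frac{J_x - J_y}{J_z} + \frac{U_\Psi}{J_z} - \frac{K_\Psi}{J_z}\,\dot{\Psi}^2 + d_\Psi \\
\ddot{x} &= \frac{U_1}{m}\left(\cos\Phi \sin\Theta \cos\Psi + \sin\Phi \sin\Psi\right) - \frac{K_x}{m}\,\dot{x} + d_x \\
\ddot{y} &= \frac{U_1}{m}\left(\cos\Phi \sin\Theta \sin\Psi - \sin\Phi \cos\Psi\right) - \frac{K_y}{m}\,\dot{y} + d_y \\
\ddot{z} &= \frac{U_1}{m}\left(\cos\Phi \cos\Theta\right) - g - \frac{K_z}{m}\,\dot{z} + d_z
\end{aligned}
\tag{4.1}
$$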
The aircraft’s absolute position is denoted by $\chi = (x, y, z)$, while the attitude
is described by the Euler angles $\gamma = (\Phi, \Theta, \Psi)$. $J = \mathrm{diag}(J_x, J_y, J_z)$ symbolizes
the matrix of inertia. $J_r$ denotes the motor inertia. $K_i$, $i = (\Phi, \Theta, \Psi, x, y, z)$,
signify the aerodynamic friction constants. $(U_1, U_\Phi, U_\Theta, U_\Psi)$ are the four control
inputs. $d$ denotes the bounded additive perturbations.
The model of the quadrotor aircraft (Equation 4.1) can be described by
a second order nonlinear system which has the following form:
$$
\dot{\chi} = A(\chi) + B(\chi)\,U + d \tag{4.2}
$$
where $\chi = \{\chi_h,\ h = 1, 2, \ldots, 12\}$ stands for the quadrotor state variables given
by $\{\Phi, \dot{\Phi}, \Theta, \dot{\Theta}, \Psi, \dot{\Psi}, x, \dot{x}, y, \dot{y}, z, \dot{z}\}$, and $A(\chi)$ and $B(\chi)$ are given in Equation (4.3).
$U = [U_1, \tau_\Phi, \tau_\Theta, \tau_\Psi]^T$ represents the control signals. $|d_h| < \xi_h$, where $\xi_h$ is the upper
limit of the additive perturbations.
4.2 Quadrotor Modeling 63
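$$
A(\chi) =
\begin{bmatrix}
\chi_2 \\
a_{1\Phi}\chi_4\chi_6 + a_{2\Phi}\chi_4 + a_{3\Phi}\chi_2^2 \\
\chi_4 \\
a_{1\Theta}\chi_2\chi_6 + a_{2\Theta}\chi_2 + a_{3\Theta}\chi_4^2 \\
\chi_6 \\
a_{1\Psi}\chi_2\chi_4 + a_{2\Psi}\chi_6^2 \\
\chi_8 \\
a_x\chi_8 \\
\chi_{10} \\
a_y\chi_{10} \\
\chi_{12} \\
-g + a_z\chi_{12}
\end{bmatrix};
\quad
B(\chi) =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & B_\Phi & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & B_\Theta & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & B_\Psi \\
0 & 0 & 0 & 0 \\
\dfrac{C_{\chi_1}S_{\chi_3}C_{\chi_5} + S_{\chi_1}S_{\chi_5}}{m} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\dfrac{C_{\chi_1}S_{\chi_3}S_{\chi_5} - S_{\chi_1}C_{\chi_5}}{m} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\dfrac{C_{\chi_1}C_{\chi_3}}{m} & 0 & 0 & 0
\end{bmatrix}
\tag{4.3}
$$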
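where
$$
\begin{aligned}
a_{1\Phi} &= \frac{J_y - J_z}{J_x}, & a_{2\Phi} &= -\frac{J_r}{J_x}\,\Omega, & a_{3\Phi} &= -\frac{K_\Phi}{J_x}, \\
a_{1\Theta} &= \frac{J_z - J_x}{J_y}, & a_{2\Theta} &= \frac{J_r}{J_y}\,\Omega, & a_{3\Theta} &= -\frac{K_\Theta}{J_y}, \\
a_{1\Psi} &= \frac{J_x - J_y}{J_z}, & a_{2\Psi} &= -\frac{K_\Psi}{J_z}, & & \\
a_x &= -\frac{K_x}{m}, & a_y &= -\frac{K_y}{m}, & a_z &= -\frac{K_z}{m}, \\
B_\Phi &= \frac{1}{J_x}, & B_\Theta &= \frac{1}{J_y}, & B_\Psi &= \frac{1}{J_z}.
\end{aligned}
$$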
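The quadrotor vehicle is underactuated. To circumvent this issue, virtual
control signals are introduced as
$$
\begin{aligned}
\nu_x &= \frac{U_1}{m}\left(C_{\chi_1}S_{\chi_3}C_{\chi_5} + S_{\chi_1}S_{\chi_5}\right) \\
\nu_y &= \frac{U_1}{m}\left(C_{\chi_1}S_{\chi_3}S_{\chi_5} - S_{\chi_1}C_{\chi_5}\right)
\end{aligned}
\tag{4.4}
$$
where $C$ and $S$ abbreviate the cosine and sine of the subscripted angle.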
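$$
\begin{aligned}
\Theta_d &= \arctan\left(\frac{\nu_x C_{\Psi_d} + \nu_y S_{\Psi_d}}{\nu_z}\right) \\
\Phi_d &= \arctan\left(C_{\Theta_d}\,\frac{\nu_x S_{\Psi_d} - \nu_y C_{\Psi_d}}{\nu_z}\right)
\end{aligned}
\tag{4.5}
$$
with $\nu_z = \dfrac{U_1}{m}\left(C_{\chi_1}C_{\chi_3}\right)$.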
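The conversion from virtual control signals to desired attitude angles in Equation 4.5 can be illustrated with a short sketch. It assumes that the virtual inputs and the desired yaw angle are already available, that ν_z is nonzero, and that the plain arctangent of the text is used; the function name `desired_angles` and the numeric inputs are hypothetical.

```python
import math


def desired_angles(nu_x, nu_y, nu_z, psi_d):
    """Desired pitch and roll (rad) from the virtual controls, per Equation 4.5.

    Assumption: nu_z != 0, which holds while the thrust component along z
    is nonzero.
    """
    theta_d = math.atan((nu_x * math.cos(psi_d) + nu_y * math.sin(psi_d)) / nu_z)
    phi_d = math.atan(
        math.cos(theta_d) * (nu_x * math.sin(psi_d) - nu_y * math.cos(psi_d)) / nu_z
    )
    return theta_d, phi_d


# With zero yaw, a demand along x only tilts the pitch axis.
theta_d, phi_d = desired_angles(nu_x=1.0, nu_y=0.0, nu_z=1.0, psi_d=0.0)
print(theta_d, phi_d)
```

The resulting angles would then serve as references for the attitude loop, while the yaw reference is supplied directly by the trajectory.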
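$$
\begin{aligned}
e_h &= h - h_d \\
\dot{e}_h &= \dot{h} - \dot{h}_d \\
\ddot{e}_h &= \ddot{h} - \ddot{h}_d = A(h) + B(h)\,U + d_h - \ddot{h}_d
\end{aligned}
\tag{4.6}
$$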
The control goal is to ensure that the tracking errors vanish asymptotically.
To attain this objective, the sliding manifolds σh for the position and attitude
subsystem are selected as:
$$
\sigma_h = \dot{e}_h + \lambda_h e_h \tag{4.7}
$$
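$$
\begin{aligned}
\dot{\sigma}_h &= \ddot{e}_h + \lambda_h \dot{e}_h \\
&= \left(\ddot{h} - \ddot{h}_d\right) + \lambda_h \dot{e}_h \\
&= A(h) + B(h)\,U_h + d_h - \ddot{h}_d + \lambda_h \dot{e}_h
\end{aligned}
\tag{4.8}
$$
$$
\dot{\hat{\xi}}_h = \mu_h \left|\sigma_h\right| \tag{4.10}
$$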
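$$
U_h = \frac{1}{B(h)}\left(\ddot{h}_d - A(h) - \lambda_h \dot{e}_h - T_{1h}\sigma_h - T_{2h}\left|\sigma_h\right|^{\alpha_h}\mathrm{sign}(\sigma_h) - \hat{\xi}_h\,\mathrm{sign}(\sigma_h)\right)
\tag{4.11}
$$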
$$
\dot{V} = -T_{1h}\,\sigma_h^2 - T_{2h}\left|\sigma_h\right|^{\alpha_h + 1} \le 0 \tag{4.14}
$$
Since the time derivative of the Lyapunov function is non-positive, the sliding
surface σh and its first time derivative are forced to converge to the origin.
Thus, the closed-loop system reaches a stable state.
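To make the structure of the adaptive sliding mode law more concrete, the following sketch applies it to a single scalar channel ḧ = A + B·u + d and integrates the closed loop with a forward Euler step. It is only an illustration under stated assumptions: the plant coefficients, gains, reference, and disturbance are made-up values, the adaptation law is taken as the derivative of the gain estimate being μ·|σ|, and the function name `simulate_channel` is hypothetical. It is not the chapter's PIL code.

```python
import math


def simulate_channel(t_end=10.0, dt=1e-3, lam=2.0, t1=5.0, t2=1.0,
                     alpha=0.5, mu=5.0):
    """Track h_d(t) = sin(t) with an adaptive sliding mode law on one channel."""
    a, b = -0.1, 1.0              # hypothetical plant: h'' = a*h' + b*u + d
    h, dh, xi_hat = 0.0, 0.0, 0.0
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        hd, dhd, ddhd = math.sin(t), math.cos(t), -math.sin(t)
        d = 0.5 * math.sin(2.0 * t)            # bounded disturbance, unknown to the law
        e, de = h - hd, dh - dhd               # tracking error and its derivative
        sigma = de + lam * e                   # sliding variable
        sgn = (sigma > 0) - (sigma < 0)        # sign(sigma), zero at sigma == 0
        # Control law: equivalent part plus reaching and adaptive switching terms
        u = (ddhd - a * dh - lam * de - t1 * sigma
             - t2 * abs(sigma) ** alpha * sgn - xi_hat * sgn) / b
        xi_hat += mu * abs(sigma) * dt         # assumed adaptation law
        ddh = a * dh + b * u + d               # plant response
        dh += ddh * dt
        h += dh * dt
        t += dt
    return abs(h - math.sin(t)), xi_hat


final_error, xi_estimate = simulate_channel()
print(final_error, xi_estimate)
```

In such a simulation the switching term causes chattering at the integration step size; a boundary-layer or saturation function is a common remedy, although the text does not state whether one was used.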
Figure 4.3 Procedure of the PIL experiment using the STM32F429 board.
4.5 Conclusion
A simple adaptive sliding mode position/attitude control for an unmanned
quadrotor has been elaborated with consideration of modelling inaccuracies
References
[1] M. Hassanalian and A. Abdelkefi, “Classifications, applications, and
design challenges of drones: A review,” Progress in Aerospace Sciences,
vol. 91, pp. 99–131, 2017.
[2] H. Shraim, A. Awada, and R. Youness, “A survey on quadrotors:
Configurations, modeling and identification, control, collision avoid-
ance, fault diagnosis and tolerant control,” IEEE Aerospace and
Electronic Systems Magazine, vol. 33, no. 7, pp. 14–33, 2018.
[3] H. Hassani, A. Mansouri, and A. Ahaitouf, “Robust autonomous flight
for quadrotor UAV based on adaptive nonsingular fast terminal sliding
mode control,” International Journal of Dynamics and Control, vol. 9,
no. 2, pp. 619–635, 2021.
[4] H. Liu, J. Xi, and Y. Zhong, “Robust attitude stabilization for nonlinear
quadrotor systems with uncertainties and delays,” IEEE Transactions
on Industrial Electronics, vol. 64, no. 7, pp. 5585–5594, 2017.
[5] O. Mofid and S. Mobayen, “Adaptive sliding mode control for finite-
time stability of quad-rotor UAVs with parametric uncertainties,” ISA
Transactions, vol. 72, pp. 1–14, 2018.
5
A Detailed Review on Embedded Based
Heartbeat Monitoring Systems
Abstract
In medical applications, embedded system technologies have become
increasingly significant. Developing tools to improve the safety of healthcare
professionals during outbreaks of contagious diseases, such as pandemic
influenza, is a top priority. Demand for telemedicine services has increased
in recent years due to the rise in infectious diseases such as the Covid-19
viral infection. Telemedicine services include diagnostic tests, prognosis,
and patient monitoring. Various heartbeat monitoring systems have been
introduced to reduce the need for human intervention and to observe what
happens inside the human body. The cardiovascular system pumps blood and
transports oxygen and nutrients via blood circulation. Heartbeat monitoring
plays a vital role in preventing accidents in real time and in reporting the
state of the heart's pumping to the outside world in advance. This review
starts with the various sensors that measure heart rate, covering both
invasive and non-invasive technologies. These modern devices are adapted to
the environmental conditions so that the information is provided without any
delay. Nowadays, emerging wireless sensor devices simplify collecting heart
rate data and sharing the information with anyone in any part of the world.
The types of sensors and the practical difficulties that the various
monitoring systems face are reviewed in detail.
5.1 Introduction
Embedded system applications have been growing in popularity in recent times.
Application areas range from device design, garments, industry, healthcare,
armored vehicles, and handheld devices to mobile networks and ‘e-worlds,’
Artificial Intelligence, and the IoT (Internet of Things), which allow for
the creation of a wide range of software. Monitoring patients’ heartbeat at
home is essential for proper care after they leave the hospital. The
electrical activity of the heart is measured using electrocardiography (ECG).
Examining the signal from an electrocardiograph can identify various traits
of a patient’s heart, including irregularities, chamber size and location,
tissue injury, prevalent cardiac diseases, and heart rate. The issue
with today’s ECG signal equipment is that it cannot characterize the data
without a full assessment and diagnosis by a specialist [1].
An ECG signal with various features is shown in Figure 5.1. In a regular
rhythm or cycle, a P wave is followed by a QRS complex. The beat then
ends with a T wave. On rare occasions, a U wave can follow the T wave.
control over the vehicle without the vehicle driver’s involvement in case any
issues arise in the person’s health.
Various monitoring systems are available, such as biometric-based data
collection of internal health indicators, including blood pressure and fat
level, together with parameters such as sleep pattern, diet, and exercise
level. These data must be evaluated and analyzed appropriately. The
instantaneous heart rate, which may be computed from the ECG, is a crucial
piece of data. In the resting condition, the instantaneous heart rate
fluctuates within a particular range. This heart rate variability can be
measured precisely. Heart rate variability analysis allows for
assessing cardiac autonomic stress, diagnosing angina and ischemic
cardiovascular attacks, and classifying various illnesses [4].
Heart rate variability reflects the balance between the sympathetic and
parasympathetic nerves [5]. Both sympathetic stimulation and reduced vagal
activity are considered. Spectrum analysis uses the low-frequency component,
around 0.04 Hz to 0.15 Hz, as an indicator of the person’s attention.
Usually, a Fast Fourier Transform (FFT) is computed from the R-R intervals,
and the power spectral density is used to evaluate the low- and
high-frequency components. Hence, complex signal-processing algorithms are
needed to assess the J-peak signals of the ballistocardiogram, which are
affected by respiratory effort and spurious movements that obscure the
heartbeat [6]. Cardiopulmonary monitoring utilizes electrodes attached to the
skin, which is uncomfortable. To avoid this, non-invasive methodologies such
as Doppler radars are adopted [7]. Monitoring respiratory activity while
traveling in vehicles is complicated and affects the frequency values; the
efficiency of recognizing the pulse duration degrades as the signal-to-noise
ratio falls to its minimum [8].
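The spectral evaluation of heart rate variability described above can be sketched with a direct discrete Fourier transform. The example assumes the R-R interval series has already been resampled onto a uniform time grid, uses the conventional bands of 0.04 to 0.15 Hz (low frequency) and 0.15 to 0.40 Hz (high frequency), and works on a synthetic signal; the function names `band_power` and `lf_hf_ratio` are hypothetical.

```python
import cmath
import math


def band_power(x, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes of x (mean removed) for f_lo <= f < f_hi."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f < f_hi:
            coeff = sum(xc[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            power += abs(coeff) ** 2 / n ** 2
    return power


def lf_hf_ratio(rr_uniform, fs=4.0):
    """Ratio of low-frequency to high-frequency power of a resampled R-R series."""
    lf = band_power(rr_uniform, fs, 0.04, 0.15)
    hf = band_power(rr_uniform, fs, 0.15, 0.40)
    return lf / hf


# Synthetic R-R series (seconds): 0.8 s mean with a 0.1 Hz and a 0.25 Hz component
fs = 4.0
rr = [0.8 + 0.05 * math.sin(2 * math.pi * 0.10 * i / fs)
      + 0.025 * math.sin(2 * math.pi * 0.25 * i / fs) for i in range(400)]
print(lf_hf_ratio(rr, fs))
```

A direct DFT is used here only to keep the sketch dependency-free; a practical implementation would normally use an FFT routine and a windowed spectral estimate.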
The work presented in this chapter is organized in the following manner.
Section 2 discusses the various types of equipment used to monitor the
heartbeat with invasive and non-invasive technologies. Section 3 provides an
overview of the heartbeat detection algorithms for fast and efficient
measurements. Section 4 focuses on the applications of the heartbeat
detection system and its software side. Section 5 discusses the hardware
components that need to be utilized in heartbeat detection and monitoring.
The future work and conclusion of this review are discussed in Section 6.
available with clothes. As a result, even though the validity of the collected
data is inferior to that of wearable electronics, these approaches are suitable
for measuring the heartbeat in a car, and a non-wearable-type surveillance
system is preferred. The required details are collected without making
contact with the person.
Conventional ECG measures the electric signals created by the body during
every ventricular contraction. The magnitude of the heart’s electrical
activity over time is used to evaluate the ECG. In twelve-lead ECG equipment,
ten electrodes are employed. Six electrodes are positioned across the chest.
The four remaining electrodes are placed on the limbs: the left and right
arms and the left and right legs. The drawback of this conventional method is
that the electrodes come loose when the skin is moist, are difficult to place
correctly, and are affected by physical movement. Cardiogram-based
measurement depends on the sound produced by the heartbeat. Equipment such as
a stethoscope or microphone is used to evaluate the heart rate. The weakness
of this measurement is that the readings are disturbed by finger movements.
Sphygmomanometry is a technique for determining variations in
arterial blood pressure as a function of the heart pulse. Its weak point is
that displacement of the human body affects the value. In wearable equipment
such as a smartwatch, near-infrared light is delivered to the skin, and a
photodiode or similar device receives the reflected light to evaluate the
heart pulse. The readings are affected by physical movements, the contact
state of the fingers and skin, and other factors.
These approaches are acceptable for measuring a person’s heartbeat
in a vehicle or aircraft, and a non-wearable-type surveillance system is
preferable to wearable computing. However, the gathered data is less reliable.
Such tracking devices must accurately determine the driver’s state of
awareness without making contact with the person. Table 5.1 lists
parameters such as age, sex, level of heartbeat activity, and some other noise
factors.
Earlier methods relied on simple filtering of heartbeat-related data
and applying a threshold to the filtered signals to recover heartbeat locations
[23], or on spectral analysis to estimate the heart rate frequency [24]. These
methods could not deliver both fast, real-time performance and accurate
estimation. Simple bandpass filtering gives a fast response time, but
the filtered output signals require additional computation to retrieve the
heart rhythm autonomously. As a result, the robustness of these procedures
is severely limited. To improve the accuracy of retrieving typical heart
characteristics, noise-resistant techniques have been developed based on
ensemble empirical mode decomposition (EEMD) [25] and on auto-correlation
and frequency-time phase regression (FTPR) algorithms [26]. The researchers
in [27-28] used the fast Fourier transform (FFT) or the wavelet transform
(WT) to analyze the temporal fluctuation of the signal within a time frame.
The polyphase-basis discrete cosine transform was applied for pulse rate
assessment [29]. Short-time Fourier transform (STFT) analysis was used in
[30] to extract a specific heartbeat waveform, which was then filtered by an
adaptive bandpass filter for increased precision. Insights obtained from the
time domain of the cardiac signal over windows of 2–3 s were used to control
the adaptive bandpass filter. Machine learning-based algorithms have also
been utilized for these computations [31].
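The threshold-based approach mentioned at the start of this paragraph can be illustrated with a small peak detector. This is a simplified sketch rather than the method of any cited work: it assumes the signal has already been band-pass filtered, uses a fixed threshold and a refractory period chosen by hand, and runs on a synthetic pulse train; the names `detect_beats` and `heart_rate_bpm` are hypothetical.

```python
import math


def detect_beats(signal, fs, threshold, refractory_s=0.3):
    """Indices of local maxima above threshold, at least refractory_s apart."""
    beats = []
    min_gap = int(refractory_s * fs)
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]
        if is_peak and signal[i] > threshold:
            if not beats or i - beats[-1] >= min_gap:
                beats.append(i)
    return beats


def heart_rate_bpm(beats, fs):
    """Mean heart rate from the average beat-to-beat interval."""
    intervals = [(b - a) / fs for a, b in zip(beats, beats[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


# Synthetic pulse train at 1.2 Hz, sampled at 100 Hz for 10 s
fs = 100.0
pulse = [max(0.0, math.sin(2 * math.pi * 1.2 * i / fs)) ** 8 for i in range(1000)]
beat_indices = detect_beats(pulse, fs, threshold=0.5)
print(len(beat_indices), heart_rate_bpm(beat_indices, fs))
```

A fixed threshold is fragile in the presence of motion artifacts, which is one reason the adaptive and decomposition-based methods discussed above were proposed.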
The radar approach has evolved into one of the most effective alternatives
for non-invasive vital sign surveillance. It allows small sensors that are
fully non-obstructive and safe for humans. Heart rate is extracted from
discrete-time radar signals using different signal processing algorithms.
Radar sensors are used to detect sub-millimeter displacements of the
chest wall skin surface caused by the pulse. Radar systems have
shown considerable promise in estimating heart rate and obtaining the
ventricular ejection period using nonlinear filtering approaches.
Many research organizations have looked into the use of CW Doppler
radars to detect the heartbeat. The majority of earlier studies were based on
experimental data recorded from healthy volunteers lying or sitting in
a confined space. Figure 5.3 depicts the quadrature and in-phase signals of
a Doppler radar. Figure 5.4 illustrates the recorded ECG and radar
signals for the time period of 50–250 ms.
Table 5.2 lists the various methods used to classify the signals against the
reference ECG. The authors of [32] report retrieving and organizing the
information with an ANN for more than 21 people using a window of less than
1 s, which enhances the performance of the device and produces a better
result compared to the other methods.
Figure 5.4 ECG and radar recorded signals for a single fragment.
the noise factor. Several strategies have been presented and studied to address
this problem.
In the receiver, quadrature demodulation was implemented, so that at
least one channel that is not operating at the null point can be selected for
demodulation [36]. Frequency tuning was also used, together with double-
sideband transmission. The severe null point problem was alleviated by
choosing an appropriate frequency spacing [37]. A voltage-controlled RF phase
shifter was used to regulate the transmission delay, which is comparable
to adjusting the detecting range. However, additional system complexity and
an adjustable feedback loop are required to keep the sensor continuously
operating at the optimal position without introducing a distance
dependency [34].
When the amplitude of the vibration is comparable to the carrier wavelength,
the small-angle approximation is no longer valid. Even when the measurement
is performed at the optimum location, strong nonlinear harmonics and
inter-modulation products are formed, which do not represent the true motion
of the target. Furthermore, the amplitude characteristics of the vibration
under test cannot be extracted directly from the observed data with the
small-angle approximation technique. One proven way is to use a Bessel
function expansion to estimate the harmonic ratio of the baseband signal
[38]. This strategy can only be used when a small number of vibration tones
are present. It also requires that the harmonic ratios be determined
precisely. An arctangent demodulation technique has been developed to deal
with these challenges [39]. The co-domain of the arctangent function is
(-π/2, π/2). A discontinuity occurs once the demodulated phase exceeds this
range. In theory, phase unwrapping methods that shift the discontinuity point
by an integer multiple of π can remove such a discontinuity. However,
deciding which point has to be shifted is almost impossible for hardware (or
software), particularly when the amplitude of the vibration is high. If the
vibration amplitude is small, on the other hand, remote calibration can be
challenging in the presence of noise. Because noise is always present in
practice, this type of phase unwrapping may not always be beneficial [40].
Differentiate and cross-multiply (DACM), an angle demodulation technique
used in signal processing for optical communication, may eliminate the major
limitations of the approximation method. As a result, it may be a viable
choice for CW Doppler sensors.
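The difference between arctangent demodulation and DACM can be sketched on synthetic quadrature signals. The example assumes ideal, noise-free I and Q channels with no DC offset and uses the accumulated (extended) form of DACM, in which the differentiate-and-cross-multiply term is summed sample by sample; the function names are hypothetical.

```python
import math


def arctangent_demod(i_ch, q_ch):
    """Phase via arctangent; the result is confined to (-pi/2, pi/2)."""
    return [math.atan(q / i) for i, q in zip(i_ch, q_ch)]


def extended_dacm(i_ch, q_ch):
    """Phase via accumulated differentiate-and-cross-multiply terms."""
    phase = [0.0]
    for k in range(1, len(i_ch)):
        d_i = i_ch[k] - i_ch[k - 1]
        d_q = q_ch[k] - q_ch[k - 1]
        num = i_ch[k] * d_q - d_i * q_ch[k]
        den = i_ch[k] ** 2 + q_ch[k] ** 2
        phase.append(phase[-1] + num / den)
    return phase


# Synthetic 1 Hz vibration whose phase swing (3 rad) exceeds pi/2
fs = 1000.0
true_phase = [3.0 * math.sin(2 * math.pi * n / fs) for n in range(2000)]
i_sig = [math.cos(p) for p in true_phase]
q_sig = [math.sin(p) for p in true_phase]

wrapped = arctangent_demod(i_sig, q_sig)
recovered = extended_dacm(i_sig, q_sig)
print(max(abs(p) for p in wrapped), max(recovered))
```

In this idealized setting the arctangent output stays inside its co-domain and therefore wraps, while the accumulated DACM output follows the full phase swing without an explicit unwrapping step. With real signals, DC offsets and noise would need to be handled before either method is applied.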
Figure 5.5 depicts the measured I and Q values along with a detailed
view of the arctangent function in Figure 5.5(c), which is outdated; the
discontinuity and co-domain problems are overcome with the assistance
of an extended DACM. Irrespective of the detecting range, the
frequency of [n] is always determined to be 1 Hz using the extended DACM
algorithm. In other words, compared with the small-angle approximation, it is
independent of distance and of the null point condition.
Figure 5.5 Measured values at the optimum and null points for a) the I channel, b) the Q
channel, c) the arctangent function, d) the DACM function.
5.4 Various Embedded Applications in the Medical Field
computing libraries, works quite well, has a compiler for practically all
microcontrollers, and has easy access to information. Table 4 lists the various
embedded applications in the medical field. Older adults may suffer injuries
from accidental falls. Falls can be detected with support from sensor nodes.
The nodes are connected to servers and feed movement data from the
accelerometer to the servers in real time. The server determines whether or
not a fall has occurred and reacts appropriately [42]. Gadgets are used to
monitor patients, diagnose diseases, and make decisions based on algorithms
[43]. Mechanical ventilators make use of Raspberry Pi boards to monitor the
resuscitator bag [44], and wearable devices examine the human respiratory
cycle [45]. Deep learning-based algorithms classify X-ray images [46] for
efficient detection of diseases. To prevent the spread of airborne diseases,
social-distance monitoring has been implemented with embedded systems [48].
Radar-based wireless healthcare monitoring systems have been implemented to
monitor patients remotely from the healthcare center and the hospital
[49–51].
5.5 Embedded System Hardware for Heartbeat Monitoring
Embedded system technology is improving at a quick pace as well.
Whereas earlier work dealt with microcontrollers with limited resources, it is
now feasible to discuss devices that reach extremely high speeds. Subjects
like internet protocols and encryption techniques have recently emerged as
new research areas. As programming becomes more complex, platforms that
support developers are now required. The Internet of Things (IoT) architecture
includes “entry points,” which connect other functionalities and objects
to the network.
for estimating pulse rate and SpO2 readings. In [53], the sensor works in
reflection mode.
In [54-57], the MAX30102 sensor is used to estimate the pulse rate.
It contains two light sources, one red and one infrared, with
wavelengths of 650 nm and 950 nm. The absorption of light differs between
oxygenated and deoxygenated blood. The sensor receives the reflected light,
and an algorithm evaluates the SpO2 value from it. To get blood from
the heart to the rest of the body, it needs to be pumped at high pressure.
This happens with each pulse, which strains the arteries (blood
pressure) as the blood flows through them. Arteries are the vessels that
convey blood away from the heart. As the blood pressure causes
the artery to expand and shrink, the volume of the artery in the body part (in
this case, the fingertip) rises and falls. As the volume increases, more
hemoglobin accumulates in the illuminated region, raising the amount of
absorbed infrared light and decreasing the signal reflected back to the pulse
rate sensor.
Figure 5.9 NodeMCU and Arduino board connection to monitor heart rate / SpO2.
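The way a red/infrared sensor of this kind can lead to an SpO2 estimate is often explained through the "ratio of ratios". The sketch below is illustrative only: it takes the pulsatile (AC) part of each channel as the peak-to-peak swing and the static (DC) part as the mean, and it maps the ratio to SpO2 with the linear approximation 110 − 25·R that is commonly quoted for uncalibrated demonstrations. The coefficients, the function names, and the synthetic samples are all assumptions; a real device requires empirical calibration.

```python
import math


def ratio_of_ratios(red, ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir) from raw channel samples."""
    def ac_dc(samples):
        dc = sum(samples) / len(samples)      # static component
        ac = max(samples) - min(samples)      # pulsatile swing (peak to peak)
        return ac, dc

    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(ir)
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_estimate(r, a=110.0, b=25.0):
    """Linear approximation SpO2 = a - b*R; a and b are assumed, not calibrated."""
    return a - b * r


# Synthetic channels: the same 1.2 Hz pulse with different modulation depths
fs = 100.0
red_samples = [1000.0 + 10.0 * math.sin(2 * math.pi * 1.2 * i / fs) for i in range(500)]
ir_samples = [1000.0 + 20.0 * math.sin(2 * math.pi * 1.2 * i / fs) for i in range(500)]
r_value = ratio_of_ratios(red_samples, ir_samples)
print(r_value, spo2_estimate(r_value))
```

Peak-to-peak extraction is sensitive to motion artifacts, so practical firmware usually filters the channels and averages over several beats before computing the ratio.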
In Figure 5.10, an Arduino LilyPad controller is used to estimate the heart
rate and to report it wirelessly by sending SOS messages or making calls
through the SIM900A GSM module [53]. A noise-reducing sensor circuit and a
photonic integrated amplifying circuit capture the information from the
finger and deliver it to the microcontroller, which applies the logic to
determine the heart rate. The RF module is used to transmit messages or make
calls that inform the guardian of the person’s current pulse rate [54].
Figure 5.11 shows a non-invasive method of monitoring a person’s
respiratory system [55], which evaluates health without requiring the person
to wear a watch or any other equipment to collect the information. In
section 3, the classification of heartbeat signals and demodulation techniques
5.6 Conclusion
The present state of heart rate monitoring systems was examined with respect
to invasive and non-invasive technologies. A comprehensive study of the
merits and demerits of these technologies revealed that the non-invasive
mode will play a vital role in future technologies, as it works without interfering
or disturbing the person. Combining the non-invasive method with
cloud-based concepts allows for a more accurate and sensitive device
for heartbeat detection. The reliability of invasive technology-based methods
depends mainly on the subject wearing the device; otherwise, they are of no use.
Such devices are dedicated solely to heart-rate detection and can be worn
without constraining the driver. Furthermore, with the advent of smartphones in
recent decades, cameras have become handheld, which is also a significant
advancement. There are a number of issues with today’s wristwatch and
mobile phone technologies, as well as impediments to their practical
implementation in vehicles. Non-invasive methods are therefore widely used to
mitigate the drawbacks of invasive technologies.
In this review, one of the emerging robust radar-based non-invasive
technologies was explored. Its constituent components, such as the radar types,
the classification of algorithms from wavelets to Artificial Neural Networks,
heartbeat displacement models, and demodulation techniques, were examined.
From the application point of view, invasive and non-invasive technologies
for detecting the heartbeat of vehicle drivers in the automobile sector, with
the aim of avoiding accidents, were compared. Their merits and demerits
show that the non-invasive method is one of the best
contenders in the automobile field.
It is vital to examine the right hardware and applications by evaluating
present and future technical developments in automated vehicles, as well as
recent developments in heartbeat detection and monitoring systems. By analyzing
the situation based on the heartbeat of the vehicle’s driver, the mechanical
part alone can take control of the vehicle. Such technologies need a
non-invasive method of extracting the driver’s heartbeat, with sophisticated and
meticulous computation supported by Adaptive Neuro-Fuzzy Inference System
(ANFIS) type algorithms for speedy calculations. In the future, wireless
sensor networks will play a major role in healthcare because they enable remote
patient monitoring. In the coming years, this non-invasive technology,
with the help of radio-frequency wireless technologies, has great potential to
save and prolong the lives of many people. In the current pandemic situation,
diseases spread mostly through the air. This spread can be mitigated with this
kind of wireless sensor system, which monitors the person remotely from the
health care center without any contact with the infected person.
References
[1] Y. Hu, S. P. (1997, September). ‘A patient-adaptable ECG beat classifier
using a mixture of expert’s approach,’ IEEE Trans. Biomed. Eng., vol.
44, no. 9, pp. 891–900.
[2] CNN, Driver Killed and Seven Children Hurt in Mississippi School Bus
Crash. Available online: https://edition.cnn.com/2019/09/10/us/mississippi-school-bus-crash/index.html
(accessed on 30 July 2021).
[3] Kwon, S.; Jung, C.; Choi, T.; Oh, Y.; You, B. Autonomous Emergency
Stop System. IEEE Intell. Veh. Symposium Proc. 2014, 444–449.
[4] Ryan, S. S., ‘Understanding the Electrocardiogram (EKG or ECG)
Signal’, Retrieved December 2, 2014, from Atrial Fibrillation
Resources for Patients: http://a-fib.com/treatmentsfor-atrial-fibrillation/diagnostic-tests/the-ekg-signal/
[5] J. H. Shin, S. H. Hwang, M. H. Chang, K. S. Park, Heart Rate Variability
Analysis Using a Ballistocardiogram During Valsalva Manoeuvre and
Post Exercise. Physiol. Meas. 2011, 32, pp.1239–1264.
[6] Y. Liu, Y. Lyu, Z. He, Y. Yang, J. Li, Z. Pang, Q. Zhong, X. Liu, H.
Zhang, ‘ResNet-BiLSTM: A Multiscale Deep Learning Model for
Heartbeat Detection Using Ballistocardiogram Signals’, J Healthc Eng.,
2022,pp.6388445.
[7] P. Kontou, S. Ben Smida, S. Nektarios Daskalakis, S. Nikolaou, M.
Dragone and D. E. Anagnostou, ‘Heartbeat and Respiration Detection
Using a Low Complexity CW Radar System,’ 2020 50th European
Microwave Conference (EuMC), 2021, pp. 929–932.
[8] K. Tsuchiya, K. Mochizuki, T. Ohtsuki, K. Yamamoto, ‘Heartbeat
Detection Technology for Monitoring Driver’s Physical Condition,’
SAE Technical Paper 2020-01-1212, 2020.
[9] C. Ye, K. Toyoda, T. Ohtsuki, ‘Robust Sparse Adaptive Algorithm for
Non-Contact Heartbeat Detection with Doppler Radar’, IEICE Tech.
Rep., 117, pp. 5–10, 2018.
[10] C. Gu, ‘Short-range non-contact sensors for healthcare and other emerg-
ing applications: A review’ Sensors 2016, 16, 1169.
[11] S. Izumi, Development of Non-Contact Heart Rate Variability and
Respiration Monitoring Technology Using Microwave Doppler
Sensor for in-Vehicle Application, Research Paper Funded by Takata
Foundation; Takata Foundation: Tokyo, Japan, 2008.
[12] M. Zhao, F. Adib, D. Katabi, ‘Emotion recognition using wireless signals’
In Proceedings of the 22nd Annual International Conference on Mobile
Computing and Networking, New York, NY, USA, 3–7 October 2016.
[13] J. A. Healey, R. W. Picard, ‘Detecting stress during real world driv-
ing tasks using physiological sensors’, IEEE Trans. Intell. Transp. Syst.
2005, 6, pp.156–166.
[14] B. G. Lee, B. L. Lee, W. Y. Chung, ‘Wristband-type driver vigilance
monitoring system using smartwatch’, IEEE Sens. J. 2015, 15, pp.
5624–5633.
and FTPR algorithm’, IEEE Trans. Microw. Theory Tech. 2018, 66,
pp.556–567.
[27] M. Li, J. Lin, ‘Wavelet-transform-based data-length-variation technique
for fast heart rate detection using 5.8-GHz CW Doppler radar’, IEEE
Trans. Microw. Theory Tech. 2018, 66, pp.568–576.
[28] J. Tu, J. Lin, ‘Fast acquisition of heart rate in non-contact vital sign
radar measurement using time-window-variation technique’, IEEE
Trans. Instrum. Meas. 2016, 65, pp.112–122.
[29] J. Park, J. W. Ham, S. Park, D. H. Kim, S. J. Park, H. Kang, S. O. Park,
‘Polyphase-basis discrete cosine transform for real-time measurement
of heart rate with CW Doppler radar’, IEEE Trans. Microw. Theory
Tech. 2018, 66, pp.1644–1659.
[30] K. Yamamoto, K. Toyoda, T. Ohtsuki, ‘Spectrogram-based non-contact
RRI estimation by accurate peak detection algorithm’, IEEE Access
2018, 6, pp.60369–60379.
[31] J. J. Saluja, J. J. Casanova, J. Lin, ‘A Supervised Machine Learning
Algorithm for Heart-rate Detection Using Doppler Motion-Sensing
Radar’, IEEE J. Electromagn. RF Microw. Med. Biol. 2019, 4, pp.45–51.
[32] N. Malešević, V. Petrović, M. Belić, C. Antfolk, V. Mihajlović,
M. Janković, ‘Contactless Real-Time Heartbeat Detection via 24 GHz
Continuous-Wave Doppler Radar Using Artificial Neural Networks’,
Sensors 2020, 20, 2351.
[33] C. Ye, K. Toyoda, T. Ohtsuki, ‘Blind source separation on non-contact
heartbeat detection by non-negative matrix factorisation algorithms’,
IEEE Trans. Biomed. Eng. 2019, 67, pp.482–494.
[34] J. Wang, X. Wang, L. Chen, J. Huangfu, C. Li, L. Ran, ‘Non-contact
distance and amplitude-independent vibration measurement based on
an extended DACM algorithm’, IEEE Trans. Instrum. Meas. 2014, 63,
pp.145–153.
[35] W. Pan, J. Wang, J. Huangfu, C. Li, and L. Ran, ‘Null point elimination
using RF phase shifter in continuous-wave Doppler radar system’,
Electron. Lett., vol. 47, no. 21, 2011, pp. 1196–1198.
[36] A. D. Droitcour, O. Boric-Lubecke, V. M. Lubecke, J. Lin, and G. T.
Kovacs, ‘Range correlation and I/Q performance benefits in single chip
silicon Doppler radars for non-contact cardiopulmonary monitoring,’
IEEE Trans. Microw. Theory Tech., vol. 52, no. 3, 2004, pp. 838–848.
[37] Y. Xiao, J. Lin, O. Boric-Lubecke, and V. M. Lubecke, ‘Frequency tun-
ing technique for remote detection of heartbeat and respiration using
low power double-sideband transmission in Ka-band,’ IEEE Trans.
Microw. Theory Tech., vol. 54, no. 5, 2006, pp. 2023–2032.
Abstract
This chapter presents a survey on the use of embedded systems in
biomedical applications such as ECG signal processing. Various embedded
architectures are presented, such as multi-core CPU, FPGA, and CPU-FPGA
architectures. Some recently published implementations of ECG signal
denoising methods are also presented. These implementations concern
time-domain analysis, such as the ADTF technique; time-frequency domain
analysis, such as the DWT technique; or a hybrid approach, such as the
ADTF-DWT technique.
6.1 Introduction
Nowadays, embedded systems are spreading rapidly across different
engineering and research fields. This is due to their ability to meet the
specific needs of these fields. From very low-cost systems to
high-performance ones, embedded systems offer different architectures to
address the different criticality levels of embedded approaches [1].
Embedded systems are complex systems that integrate software and
hardware designed together to provide a given functionality. They are
generally composed of one or more microprocessors intended to execute various
programs defined at design time and stored in memory. To optimize
104 Embedded Systems in Biomedical Engineering
CPU Architectures
A CPU, or Central Processing Unit, is a processor: one of the main
elements in the composition of electronic devices. The processor acts as
an executor, carrying out the instructions given to it by computer
programs. In this respect, its function resembles that of the human brain.
The CPU comprises three main parts: control unit (CU), arithmetic
logic unit (ALU), and registers [6]. The architecture of a CPU is presented
in Figure 6.4.
The CU fetches instructions from memory, decodes them, and then
coordinates the rest of the processor to execute them. An elementary
control unit consists of an instruction register and a “decoder/sequencer” unit.
The ALU executes the logic and arithmetic instructions requested by the
CU. The instructions operate on one or more operands. Execution is
fastest when the operands are located in registers rather than in memory
external to the processor. Registers are memory cells internal to the CPU.
They are few in number but very quick to access. They are used to store
variables, intermediate results of operations (arithmetic or logical), or
processor control information.
The structure of the registers varies from one processor to another. This
is why each type of CPU has its own instruction set. Their basic
functions are nevertheless similar, and all processors have roughly the same
categories of registers:
• The accumulator is primarily intended to contain the data which the
ALU must process.
• General registers are used to store intermediate results.
• Address registers are used to create specific data addresses. These are,
for example, the base and index registers that make it possible to orga-
nize the data in memory as indexed tables.
• The instruction register contains the instruction code, which is pro-
cessed by the decoder/sequencer.
• The program counter saves the address of the following instruction to
be executed. In principle, this register keeps counting. It generates the
addresses of the instructions to execute one after the other.
• The state register, sometimes called the condition register, contains indi-
cators called flags whose values (0 or 1) vary according to the results of
arithmetic and logic operations. These states are used by the conditional
jump instructions.
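The role of these registers can be illustrated with a toy accumulator
machine. The sketch below is not a real instruction set: the opcodes (LOAD,
ADD, SUB, STORE, JNZ, HALT) and the memory layout are hypothetical, chosen only
to show the accumulator, instruction register, program counter, and state flag
working together.

```python
def run(program, memory):
    """Run a toy accumulator machine and return (acc, zero_flag, memory)."""
    acc = 0               # accumulator: data the ALU works on
    pc = 0                # program counter: address of the next instruction
    zero_flag = False     # state register flag set by arithmetic results
    while pc < len(program):
        ir = program[pc]  # instruction register holds the fetched instruction
        pc += 1           # the program counter keeps counting
        op, arg = ir      # decode step
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
            zero_flag = acc == 0
        elif op == "SUB":
            acc -= memory[arg]
            zero_flag = acc == 0
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JNZ":  # conditional jump driven by the state flag
            if not zero_flag:
                pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return acc, zero_flag, memory

# Memory layout (hypothetical): [counter, addend, result, constant 1]
memory = [3, 4, 0, 1]
program = [
    ("LOAD", 2), ("ADD", 1), ("STORE", 2),   # result += addend
    ("LOAD", 0), ("SUB", 3), ("STORE", 0),   # counter -= 1
    ("JNZ", 0),                              # loop while counter != 0
    ("HALT", None),
]
acc, zero_flag, memory = run(program, memory)
print(memory[2])  # 3 * 4 computed by repeated addition
```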
6.2 Embedded System Architectures 109
DSP Architectures
A DSP (Digital Signal Processor) is a particular type of microprocessor.
It integrates a set of special functions that make it particularly
efficient at digital signal processing.
Like a conventional microprocessor, a DSP is implemented by combining
memory (RAM, ROM) and peripherals [9]. A typical DSP, however, is intended
more for stand-alone processing systems. It therefore generally takes the
form of a microcontroller that incorporates, depending on the manufacturer's
brand and product range, memory, fast synchronous serial ports, timers,
DMA controllers, and various I/O ports.
is saved in this memory. One memory port is connected to the I/O controller,
which continuously exchanges data with the DACs and ADCs.
The ADC takes the analog input and converts it into digital data of the
proper width at a suitable sampling rate. The I/O controller writes this data
into the data RAM. After being processed by the DSP, the data is saved back in
RAM, from which the I/O controller collects it and sends it to the DAC. The
DAC then provides the analog output. ADCs and DACs can also be incorporated
into the DSP.
GPU Architectures
Graphics processors have advanced significantly. Compared to CPUs, GPUs run
many threads that exchange data with one another to exploit the full
parallelism of the architecture. The number of threads varies between
architectures depending on the hardware’s needs and limitations.
GPUs have adopted SIMD (Single Instruction Multiple Data) as the basis
for parallelism. SIMD is the best choice because GPU threads run at a lower
processing frequency than CPU threads. Individually, they are therefore weak
at executing many mathematical instructions, which makes data parallelism
more efficient. In general, exploiting a GPU architecture requires specific
languages such as OpenCL to distribute the data or instructions across the
GPU threads.
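The SIMD idea can be sketched in plain Python by emulating a launch in which
every thread runs the same kernel on a different data element. This is only a
sequential emulation under assumed names (`launch`, `scale_kernel`); on a real
GPU the threads would run in parallel.

```python
def launch(kernel, num_groups, group_size, *args):
    """Emulate a GPU launch: every thread runs the same kernel on its own index."""
    for group in range(num_groups):
        for local in range(group_size):
            global_id = group * group_size + local  # unique index per thread
            kernel(global_id, *args)

def scale_kernel(gid, data, out, factor):
    """Single instruction stream applied to one data element per thread."""
    if gid < len(data):  # guard the threads that fall beyond the data length
        out[gid] = data[gid] * factor

data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)
launch(scale_kernel, 2, 4, data, out, 2.0)  # 2 groups of 4 threads cover 5 items
print(out)
```

The launch uses more threads (8) than data elements (5), which is why the
kernel checks its index before writing.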
The Nvidia GPU architecture is extremely different from the CPU
architecture. It consists of several core blocks:
• Multiprocessors, called “Streaming multiprocessors” (SM)
• Processors or “CUDA Core,” called “Streaming processors” (SP)
• Memory (global, constant, shared)
A GPU consists of one or more multiprocessors, each of which has N CUDA
cores (processors). The more multiprocessors a GPU has, the more tasks it can
process simultaneously, or the faster it can complete a single task. The
performance of Nvidia hardware has improved through increases in both the
number of multiprocessors and the number of cores per multiprocessor.
Each multiprocessor has access to what is called a register file, which
is a block of memory that runs at the same speed as the processors (CUDA
cores). Each processor has its own private registers, which serve as a local
memory with effectively zero latency.
Each multiprocessor has its own on-chip memory in the form of shared
memory and an L1-level cache. All processors in one multiprocessor can
access this memory while processors in another multiprocessor cannot.
The L2 cache is the point where data from the different multiprocessors
is unified. It handles all load and store requests and provides faster
access to the global memory, allowing high-speed data sharing. All processors
can access the texture memory and the constant memory, which are used for
read-only data. The GPU also has a global memory, which all processors can
access. The following figure shows the architecture of a GPU.
FPGA Architectures
FPGAs (Field Programmable Gate Arrays, or “programmable logic networks”)
are fully reconfigurable VLSI components, which can be reprogrammed at will
to significantly accelerate certain calculation phases.
The advantage of this type of circuit is its great flexibility, which allows
it to be reused at will for different algorithms in a very short time.
Progress in these technologies makes it possible to build ever faster and
more densely integrated components, on which large applications can be
programmed.
FPGA circuits consist of a matrix of programmable logic blocks
surrounded by programmable input-output blocks. The whole is linked by a
programmable interconnection network.
FPGAs are quite distinct from other families of programmable circuits
and offer the highest level of logic integration. Figure 6.8 shows an example
of the internal structure of a symmetric-array-type FPGA:
Beyond the inputs and outputs of the circuit, the advantage of FPGAs
lies in their ability to be configured on-site, without sending the circuit to
the manufacturer, which allows them to be used within a few minutes of their
design. The most recent FPGAs can be configured in about a hundred milliseconds.
FPGAs are also used for the fast and inexpensive development of ASICs.
CPU-GPU Architectures
CPU-GPU systems contain two parts: the host and the device. The host is
always a CPU, and the device is a GPU. The CPU initializes the necessary data
and sends it to the GPU, and also sends the GPU the commands that launch the
different algorithms. Each command specifies the number of threads to execute
and the number of workgroups. Generally, such a system is used with two levels
of parallelization. The first distributes the processing tasks between the
CPU and the GPU. This mode of parallelism relies on specific languages such
as CUDA and OpenACC for Nvidia architectures, or OpenCL for other
architectures. The second level of parallelism is based on the combination of
OpenMP and OpenCL. It exploits the entire target architecture, using OpenMP
for the CPU cores and OpenCL for the GPU threads. Using these architectures
therefore requires a thorough study in order to exploit all of the resources
through an optimal implementation. Additionally, these architectures have
their own memory model, based on the global memories of the host and the
device, which communicate with each other through buses. Poorly managed
communication between the CPU and the GPU can create memory latency problems
that increase the execution time of the algorithms. The following
figure shows the architecture of a CPU-GPU system.
Figure 6.11 Target architecture of a heterogeneous system integrating CPU and FPGA.
CPU-FPGA Architectures
As accelerators continue to raise the bar for performance and energy effi-
ciency, heterogeneous architectures combining CPU and FPGA are now
becoming increasingly desirable for achieving significant performance gains.
Because of developments in interconnection technologies between heteroge-
neous devices, data communication is also becoming much more efficient.
Since the CPU and FPGA may interact through shared memory in these
systems, the cache hit rate improves and overall data transmission latency
decreases [14].
Figure 6.11 depicts a heterogeneous system platform that combines
CPU and FPGA. To develop individualized hardware accelerators, the FPGA
logic consists of look-up table logic and on-chip memory resources. The
CPU is a multi-core general-purpose processor with a cache structure on the
chip. For fast data transfer, the CPU and FPGA include coherent memory
interfaces. The physical connection between the CPU and the FPGA is cre-
ated using high-speed connectivity.
The shared memory structure between the CPU and the FPGA is
depicted in Figure 6.12. For FPGA, the DRAM access granularity is cache
line. The FPGA, on the other hand, is unable to access the DRAM directly
[15]. To access DRAM data, it must instead submit read/write requests to
the coherent cache system. Figure 6.12 shows how the FPGA and CPU share
the last level cache on the CPU. The shared memory allows the FPGA and
CPU to communicate data efficiently.
6.3 Embedded Systems Applications in Biomedical Engineering: 117
“g” is the mean of the chosen window, “m” is the size of the window, and “ψ(i)”
is the noisy ECG signal. The upper threshold equation is presented by (6.2):
$$Ht = g + (Mx - g) \times \beta \qquad (6.2)$$
“Ht” is the upper threshold of the chosen window, “Mx” is the maximum
value of the chosen window, and “β” is the thresholding coefficient. For the
lower threshold, the equation is presented by (6.3):
$$Lt = g - (g - Mi) \times \beta \qquad (6.3)$$
“Lt” is the lower threshold of the selected window and “Mi” is the minimum
value of the selected window. “β” is the thresholding coefficient, with
0 < β < 1.
As presented in [16, 17], lower values of the β coefficient are proposed for
high noise concentrations. For lower noise concentrations, a larger
tolerance is required, i.e. higher values of β are recommended. For a given
window [i; i+m], the ADTF algorithm is as follows, where “ϕ” is the
corrected ECG signal:
• If ψ(i + m/2) > Ht ⇒ ϕ(i + m/2) = Ht
• If ψ(i + m/2) < Lt ⇒ ϕ(i + m/2) = Lt
• If Lt ≤ ψ(i + m/2) ≤ Ht ⇒ ϕ(i + m/2) = ψ(i + m/2)
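The rules above can be sketched in Python as follows. This is an illustrative
reading rather than the reference implementation: it assumes that the window
covers samples i to i+m inclusive, that g is the arithmetic mean of that
window, and that samples too close to the signal edges to be centred in a
window are left unchanged. The function name `adtf` is hypothetical.

```python
def adtf(noisy, m, beta):
    """Adaptive dual threshold filter sketch; returns the corrected signal."""
    corrected = list(noisy)                 # edge samples stay unchanged (assumed)
    for i in range(len(noisy) - m):
        window = noisy[i:i + m + 1]         # samples i .. i+m (assumed inclusive)
        g = sum(window) / len(window)       # window mean (assumed arithmetic)
        ht = g + (max(window) - g) * beta   # upper threshold, equation (6.2)
        lt = g - (g - min(window)) * beta   # lower threshold, equation (6.3)
        c = i + m // 2                      # centre sample of the window
        # Clip the centre sample to the interval [lt, ht].
        corrected[c] = min(max(noisy[c], lt), ht)
    return corrected

noisy = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]   # a single spike
corrected = adtf(noisy, m=2, beta=0.5)
print(corrected)
```

In this toy input the spike of 10 is pulled down to the upper threshold of its
window, while its neighbours are raised to the lower threshold of theirs.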
Figures 6.13 and 6.14 present two extreme cases of the influence of noise on
the morphology of the ECG signal. In the first case (Figure 6.13), the
ADTF technique restores the characteristics of waves degraded by a very high
noise level of 10 dB. This allows a correct analysis of the ECG signal waves
(e.g., the P or T waves) despite a significant deterioration of their
physiological characteristics. In Figure 6.14, the ECG signal presents a case
of hyperkalemia in which adding a very high noise level of 10 dB critically
degrades the major characteristics used to diagnose this condition (e.g., the
P wave and the QRS complex). Here too, the ADTF restores these characteristics
and delivers a signal ready for diagnosis.
For quantitative evaluation of the ADTF method, the use of the SNR
improvement parameter is proposed. The SNRimp equation is presented
by (6.4):
$$SNR_{imp} = 10 \times \log_{10} \frac{\sum_{i=1}^{n} [\varphi(i) - \psi(i)]^{2}}{\sum_{i=1}^{n} [\varphi(i) - \phi(i)]^{2}} \qquad (6.4)$$
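A minimal Python sketch of (6.4) is given below. It assumes that φ denotes the
original (clean) ECG signal, ψ the noisy signal, and ϕ the corrected signal;
the function name `snr_imp` and the sample values are illustrative only.

```python
import math

def snr_imp(original, noisy, corrected):
    """SNR improvement: noise energy before filtering over residual energy after."""
    before = sum((o - n) ** 2 for o, n in zip(original, noisy))
    after = sum((o - c) ** 2 for o, c in zip(original, corrected))
    return 10 * math.log10(before / after)

original = [1.0, 2.0, 3.0, 4.0]
noisy = [2.0, 3.0, 4.0, 5.0]          # error of 1.0 on every sample
corrected = [1.1, 2.1, 3.1, 4.1]      # error of 0.1 on every sample
improvement = snr_imp(original, noisy, corrected)
print(f"{improvement:.2f}")
```

A positive value means the corrected signal is closer to the original than the
noisy one was; here the error energy drops by a factor of 100.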
Figure 6.13 ECG signal denoising based on ADTF: the case of signal n°221 at 10 dB of
WGN. (a) The original signal, (b) noise infected signal, (c) corrected signal.
Figure 6.14 ECG signal denoising based on ADTF: the case of signal n°210 at 10 dB of
WGN. (a) The original signal, (b) noise infected signal, and (c) corrected signal.
Figure 6.15 Evaluation of the SNRimp parameter for the ECG signal denoising based on ADTF.
Figure 6.16 Comparative diagram of the evolution of the SNRimp parameter for the ECG
signal denoising based on ADTF, EEMD-FR, and EEMD-GA.
$$A[n] = \sum_{k=0}^{m} x[k] \times h[n-k] \qquad (6.5)$$
$$D[n] = \sum_{k=0}^{m} x[k] \times g[n-k] \qquad (6.6)$$
Here A[n] is the result of low-pass filtering the signal x and is referred to
as the approximation of this signal. D[n] is the result of high-pass
filtering the signal x and is referred to as the detail of this signal. m
represents the size of the signal x, i.e. the number of samples of this signal
[21]. Figure 6.17 presents an example of a decomposition of the signal x.
Table 6.1 presents an example of the coefficients of the Daubechies 2 mother
wavelet.
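Equations (6.5) and (6.6) can be sketched in Python as two convolutions. The
example assumes the four-tap Daubechies filter commonly labelled db2 for h,
derives g from h by the usual quadrature-mirror relation, and zero-pads the
signal at its start. The downsampling by two that normally follows each filter
in a DWT is left out, as in the equations. The helper name `convolve` is
hypothetical.

```python
import math

def convolve(x, f):
    """y[n] = sum over k of x[k] * f[n - k], for n = 0 .. len(x) - 1."""
    return [
        sum(x[k] * f[n - k] for k in range(len(x)) if 0 <= n - k < len(f))
        for n in range(len(x))
    ]

s3 = math.sqrt(3)
norm = 4 * math.sqrt(2)
# Four-tap Daubechies scaling filter (assumed to be the db2 coefficients).
h = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]
# Wavelet filter obtained from h by the quadrature-mirror relation.
g = [(-1) ** k * h[len(h) - 1 - k] for k in range(len(h))]

x = [1.0] * 16            # constant test signal
A = convolve(x, h)        # output of filter h, equation (6.5)
D = convolve(x, g)        # output of filter g, equation (6.6)
print(round(A[10], 4), round(D[10], 4))
```

For a constant input, the output of h settles at the sum of its taps (the
square root of 2) and the output of g settles at zero once the start-up
transient has passed, which shows that h keeps the slow content of the signal
and g keeps only its variations.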
wavelet transform (DWT) [17, 22]. The purpose of this method is to combine
the advantages of the two methods to improve the denoising of the ECG
signal. The proposed ADTF-DWT hybrid method is designed to deal with
electromyogram (EMG) noise, power line interference (50 Hz), and
high-frequency noise that could disturb the ECG signal.
The proposed algorithm is based on three denoising steps, which
successively reduce the noise of the ECG signal:
Step 1: The DWT decomposes the ECG signal into various frequency
        bands. The wavelet coefficients used in this approach are the
        Daubechies 6 (db6) coefficients, which give the best results for
        this technique when compared to others. This is owing to db6’s
        resemblance to various ECG signal morphologies. Details 1 and 2
        (D1 and D2), as illustrated in Figure 6.18, concentrate a major
        portion of the noise in the ECG signal [23]. In the ADTF-DWT
        technique, we propose that these details be removed in the first
        stage.
Step 2: The second step applies the ADTF to the corrected signal from the
        first stage. The filtering window for this stage is 10 samples of
        the ECG signal, and the β parameter used is 10%. These settings
        produce better results for this stage. Table 6.2 illustrates the impact
Figure 6.18 Step 1 of the proposed hybrid method. (a) original signal, (b) D1 signal + D2
signal.
Table 6.2 The influence of the β parameter on the results of the proposed hybrid filter.
MIT-BIH signal    β = 5%    β = 10%    β = 15%    β = 20%
SNRimp (101)      6.82      8.69       7.54       7.16
SNRimp (115)      8.72      9.20       8.92       8.60
$$PRD(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n} [\varphi(i) - \phi(i)]^{2}}{\sum_{i=1}^{n} [\varphi(i)]^{2}}} \qquad (6.7)$$
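The PRD of (6.7) can be sketched in Python as follows. The sketch assumes the
usual root form of the percentage root-mean-square difference, with φ as the
original signal and ϕ as the corrected signal; the function name `prd` and the
sample values are illustrative only.

```python
import math

def prd(original, corrected):
    """Percentage root-mean-square difference between two signals."""
    residual = sum((o - c) ** 2 for o, c in zip(original, corrected))
    energy = sum(o ** 2 for o in original)
    return 100 * math.sqrt(residual / energy)

original = [3.0, 4.0]
corrected = [3.0, 3.0]
distortion = prd(original, corrected)
print(f"{distortion:.1f} %")
```

Lower values indicate a corrected signal that is closer to the original; a
perfect reconstruction gives 0%.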
Figure 6.19 Filtering of high-frequency noise: the case of the MIT-BIH 101 signal with 15 dB
of WGN. (a) The original signal, (b) noise infected signal, (c) corrected signal.
As shown in this table, the hybrid method gives better results than the
ADTF alone. This is due to the frequency analysis added by the DWT, which
provides a more precise analysis for the denoising phase of the ECG signal.
Figure 6.20 Matlab and C/C++ MSE comparison of the denoised signals.
Figure 6.21 Matlab and C/C++ MSE comparison of the denoised signals.
takes an average of 7.5 milliseconds, the XU4 architecture takes 2.34
milliseconds, and the desktop takes 0.34 milliseconds.
The results allowed us to exclude the Raspberry as a solution because its
processing time exceeds 2.77 milliseconds. Despite its low energy consumption
and weight, this architecture cannot process the algorithm in real time. The
Figure 6.22 Min, max, and average processing times by different architectures.
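The exclusion criterion can be checked with a few lines of Python. The 2.77 ms
bound and the three average times come from the text; attributing the 7.5 ms
figure to the Raspberry board is an assumption, since the start of that
sentence is cut off, and the dictionary keys are illustrative labels.

```python
# Real-time bound quoted in the text, in milliseconds.
DEADLINE_MS = 2.77

# Average processing times quoted in the text, in milliseconds.
avg_ms = {
    "Raspberry (assumed)": 7.5,
    "XU4": 2.34,
    "Desktop": 0.34,
}

# An architecture is kept only if its average time fits within the bound.
real_time = {name: t <= DEADLINE_MS for name, t in avg_ms.items()}

for name, ok in real_time.items():
    verdict = "real-time" if ok else "too slow"
    print(f"{name}: {avg_ms[name]} ms -> {verdict}")
```

Only average times are compared here; a stricter check would use the maximum
processing time of each architecture.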
Figure 6.26 Simulation results of the ADTF architecture applied to signal No. 100 of MIT-
BIH database with 10 dB WGN.
bits
DSP blocks: 4 (2%) | 2 (<1%) | – | – | – | – | 2 (1%) | 3 (2%)
Embedded multiplier 9-bit elements: – | – | 4 (13%) | 2 (7%) | – | – | – | –
6.3.4 VHDL implementation of ADTF technique using FPGA architecture
In this section, we describe a VHDL implementation of an ADTF architecture
based on three modules: ALM, AFM, and ATM. The first module loads the
cardiac data while ensuring real-time processing without using a large
memory space. The second module calculates the elements needed for
ADTF-based processing. The third module computes the threshold levels,
applies the tests to the ECG data, and assigns the output data. Figure 6.27
presents the RTL diagram of the ADTF architecture based on the three modules.
In digital circuit design, RTL (register-transfer level) is a design
abstraction that models a digital circuit in terms of the flow of signals
between hardware registers and the logic operations performed on these signals.
cardiac data without occupying a large memory space for storing data before
and after processing. Each item in the register is bound to an output of the
same data size.
Figure 6.31 Real-time denoising under ModelSim: the case of signal n°100 at 20 dB of
noise WGN. (a) CLK and Internal CLK clock signals, (b) input signal, and (c) output signal.
Figure 6.32 ECG signal denoising based on ADTF: the case of signal n°203 at 10 dB of
blue-colored noise. (a) The original signal, (b) noise infected signal, (c) soft-ADTF corrected
signal, (d) VHDL-ADTF corrected signal.
Table 6.6 Evaluation results of the SNRimp parameter for colored noise denoising: the case
of signal n°203.
              Colored noises
Methods       Blue (5 dB)   Pink (5 dB)   Blue (10 dB)   Pink (10 dB)
Soft-ADTF     9.37          1.43          6.91           1.18
VHDL-ADTF     9.54          1.39          7.98           1.22
In Table 6.6, the statistical results based on the SNRimp parameter show
only very small differences, in this case in favor of the VHDL
implementation of the ADTF with real-time processing of the ECG signal.
This confirms the high performance of the proposed implementation. In the
case of WGN noise, the statistical results based on the SNRimp parameter
show very small differences in favor of the Soft-ADTF.
These small differences in the statistical results do not reflect an
improvement or a change in the general behavior of ADTF processing. They
are related to the data width used during processing and to the choice of a
fixed-point format with 5 bits for the fractional part. Soft-ADTF runs on a
64-bit machine with floating-point processing. For VHDL-ADTF, in contrast, we
chose fixed-point processing to reduce the complexity of the processing and
to further simplify the proposed implementation, which
allows it to be implemented on different FPGA devices.
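The effect of a 5-bit fractional part can be illustrated in Python. The sketch
assumes round-to-nearest conversion and a truncating multiply, neither of
which is specified in the text; the helper names are hypothetical.

```python
FRAC_BITS = 5                 # fractional bits, as stated in the text
SCALE = 1 << FRAC_BITS        # 2**5 = 32 steps per unit

def to_fixed(x):
    """Convert a real value to a fixed-point integer (round to nearest, assumed)."""
    return round(x * SCALE)

def to_float(q):
    """Convert a fixed-point integer back to a real value."""
    return q / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale (truncating, assumed)."""
    return (a * b) >> FRAC_BITS

beta = to_fixed(0.1)                      # thresholding coefficient of 10%
error = abs(to_float(beta) - 0.1)         # quantization error on beta
product = fixed_mul(to_fixed(2.0), to_fixed(0.5))
print(beta, to_float(beta), error, to_float(product))
```

With 5 fractional bits the resolution is 1/32, so a coefficient of 0.1 is
stored as 3/32 = 0.09375. Rounding effects of this size are consistent with
small differences between floating-point and fixed-point results.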
Figure 6.35 Simulation result of the architecture applied to signal number 100 of the
MIT-BIH database.
Figure 6.36 The architecture’s total logic elements in different FPGA targets.
They present the consumption in terms of total logic elements, registers
used, total pins, and embedded multipliers.
The hybrid technique’s architecture does not require expensive FPGA
boards to achieve high performance. Hence, the devices used in the
comparison are categorized as low-power and low-cost. The Cyclone and Arria
families are studied.
With a total of 292 registers for Cyclone IV GX, Cyclone III LS, Cyclone
IV E, and Arria II GX, and 329 for Cyclone V, the hybrid architecture
consumes less than 1% of the total registers for all FPGA devices. The
occupancy of logic elements ranges from 3% in Cyclone V to 60% in
Cyclone IV GX.
The global architecture has 28 pins: 11 pins for the input signal,
which is coded on 11 bits, 16 pins for the output (denoised) signal, and
one pin for the clock. This represents 9% of the pins for Cyclone IV-E and
Cyclone III-LS, 10% for Cyclone V, 16% for Arria II-GX, and 35% for
Cyclone IV-GX.
Some devices provide embedded 9-bit multiplier elements to optimize
multiplications. On these devices, the architecture requires eight embedded
multiplier 9-bit elements, which account for 5% of those available on the
Cyclone IV-GX, 3% on the Cyclone IV-E, and 2% on the Cyclone III-LS.
To validate the proposed architecture, we implemented the design on the
Intel-Altera FPGA-DE1 board. Using the test signals, we also studied the
state of the architecture’s output signals. The validation step is performed
with the SignalTap II tool included in the Quartus II design software.
The results obtained using ModelSim are confirmed by
the architecture’s implementation on the FPGA-DE1 board, as can be seen in
Figure 6.40.
6.4 Conclusions
Proposing new algorithms, or merging existing ones, always requires
validating the behavior of the resulting algorithm. Validation on
conventional machines such as workstations does not, however, guarantee a
successful implementation on an embedded architecture. Whether homogeneous
(CPU/CPU) or heterogeneous (CPU-GPU, CPU-FPGA, CPU-DSP), such an
architecture requires a detailed study of both the software and hardware
parts. This study covers temporal and algorithmic complexity as well as the
target architecture itself. Furthermore, implementing ECG signal
preprocessing algorithms on an embedded architecture gives flexibility in
terms of temporal constraints. This flexibility comes from optimizing the
target architecture to propose implementations that meet the different
real-time requirements.
In the medical field, these requirements make implementation difficult if
the embedded architecture does not offer the flexibility needed to optimize
the consumed resources, which directly influence the processing time. In
this work, we have studied two filtering algorithms, DWT and ADTF, and
combined them in a single algorithm to exploit the strengths of both. The
evaluation of the algorithm has shown that the results are better than ADTF, which
142 Embedded Systems in Biomedical Engineering
well as the low cost and energy consumption. Still, the drawback of this
type of architecture compared to the CPU-FPGA system is the complexity of
code development, which limits its use. Another alternative proposed in
this work is the use of the code generators offered by Matlab. This type of
tool is very effective as long as optimization for the target architecture
is not a concern, which is a significant limitation. In addition,
multi-core processors can accelerate the processing, but the time spent
scheduling between the different processors becomes a problem.
As a solution, we propose an architecture based on multi-core and FPGA,
which makes it possible to exploit the advantages of both. This requires
high-level languages such as OpenCL and OpenMP to use the FPGA for
accelerating parts of the algorithm and to use the multi-core processor.
However, it also requires a detailed study of the software and hardware
based on the H/S Mapping approach to obtain an optimal implementation that
satisfies the different environmental and algorithmic constraints.
Therefore, in future work, a heterogeneous architecture based on multi-core
and FPGA will be used. Other processing stages will be added to the
implementations to realize a real-time monitoring system of patients’
cardiac status.
References
[1] S. Heath, “Embedded systems design,” Elsevier, 2002.
[2] W. Jenkal, «Conception et implémentation des algorithmes d’analyse
des signaux cliniques ecg : Vers un système embarqué,» PhD thesis
National School of Applied Sciences, 2018.
[3] S. Youngsoo, K. Choi, and T. Sakurai, “Power optimization of real-
time embedded systems on variable speed processors,” in IEEE/ACM
International Conference on Computer-Aided Design. ICCAD-2000,
2000.
[4] H. Ghasemzadeh, S. Ostadabbas, E. Guenterberg, and A. Pantelopoulos,
“Wireless medical-embedded systems: A review of signal-processing
techniques for classification,” IEEE Sensors Journal, vol. 13, no. 2,
pp. 423–437, 2013.
[5] R. M. Rangayyan, “Biomedical signal analysis,” John Wiley & Sons,
vol. 33, 2015.
[6] D. Etiemble, “45-year CPU evolution: one law and two equations,”
arXiv preprint arXiv:1803.00254, 2018.
[7] N. Harki, A. Ahmed, and L. Haji, “CPU scheduling techniques: A
review on novel approaches strategy and performance assessment,”
Journal of Applied Science and Technology Trends, vol. 1.2, pp. 48-55,
2020.
[8] I. Zagan, “Improving the performance of CPU architectures by reducing
the Operating System overhead,” in IEEE 3rd Workshop on Advances
in Information, Electronic and Electrical Engineering (AIEEE). IEEE,
2015.
[9] S. W. Smith, “The Scientist and Engineer’s Guide to Digital Signal
Processing,” California Technical Publishing, pp. 503-534, 1997.
[10] D. Marković and R. W. Brodersen, “DSP architecture design essentials,”
Springer Science & Business Media, 2012.
[11] P. Eles, K. Kuchcinski and Z. Peng, “System synthesis with VHDL,”
Springer Science & Business Media, 2013.
[12] C. H. Roth. and L. K. John, “Digital systems design using VHDL,”
Cengage Learning, 2016.
[13] F. Vahid, “Digital Design with RTL Design, Verilog and VHDL,” John
Wiley & Sons, 2010.
[14] D. Iorga, A. F. Donaldson, T. Sorensen and J. Wickerson, “The
Semantics of Shared Memory in Intel CPU/FPGA Systems,” in Proc.
ACM Program. Lang. 5, OOPSLA, 2021.
[15] C. Zhang, R. Chen and V. Prasanna, “High throughput large scale sorting
on a CPU-FPGA heterogeneous platform,” in International Parallel and
Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2016.
[16] W. Jenkal, R. Latif, A. Toumanari, A. Dliou and O. El B’charri, “An
efficient method of ECG signals denoising based on an adaptive algorithm
using mean filter and an adaptive dual threshold filter,” International
Review on Computers and Software (IRECOS), vol. 10(11), pp. 1089-1095,
2015.
[17] W. Jenkal, R. Latif, A. Toumanari, A. Dliou, O. El B’charri and
F. M. Maoulainine, “An efficient algorithm of ECG signal denoising
using the adaptive dual threshold filter and the discrete wavelet
transform,” Biocybernetics and Biomedical Engineering, vol. 36(3),
pp. 499–508, 2016.
[18] V. Gupta, V. Chaurasia and M. Shandilya, “Random-valued impulse noise
removal using adaptive dual threshold median filter,” Journal of Visual
Communication and Image Representation, vol. 26, pp. 296–304, 2015.
[19] K. M. Chang, “Arrhythmia ECG noise reduction by ensemble empirical
mode decomposition,” Sensors, vol. 10(6), pp. 6063-6080, 2010.
[20] P. Nguyen and J. M. Kim, “Adaptive ECG denoising using genetic
algorithm-based thresholding and ensemble empirical mode decomposition,”
Information Sciences, vol. 373, pp. 499–511, 2016.
Abstract
Surface electromyography (sEMG) signals have been the subject of much
research for many years. The sEMG signal provides a non-invasive means of
studying the muscular activity produced during periods of contraction and
relaxation. However, commercial sEMG systems are not flexible enough and do
not scale to contemporary needs. Therefore, this chapter deals with the
implementation of the signal conditioning circuit, real-time acquisition,
and feature extraction of the sEMG signal. The entire embedded system was
developed in a user-friendly and flexible way. For the EMG signal
conditioning circuit, the National Instruments Educational Laboratory
Virtual Instrumentation Suite (NI-ELVIS II+) was used. Further tasks were
implemented on a Field Programmable Gate Array (FPGA) and a real-time
processor with LabVIEW programming. The implementation results were
presented on a LabVIEW-based graphical user interface (GUI). It can be
concluded that the evaluated time and frequency features, such as the sEMG
envelope, mean absolute value (MAV), root mean square (RMS), mean power
frequency (MPF), and median frequency (MDF), can be employed as valuable
indicators to assist doctors in assessing the state of the muscle.
148 Acquisition and Processing of Surface EMG Signal
7.1 Introduction
Nowadays, biomedical monitoring systems are increasingly popular in
modern healthcare and medicine. The monitoring of bio-signals not only
serves to understand the physiological changes produced in the body, but is
also a powerful way to anticipate the occurrence of specific dysfunctions.
One of the most widely used bio-signals is the surface electromyography
(sEMG) signal, which provides the means to understand how skeletal muscle
works. The sEMG signal expresses the electrophysiology of muscle
contraction/relaxation, and it can be easily recorded with patch electrodes.
Positive and negative electrodes are placed over the target muscle, while a
third, reference electrode is placed away from it [1, 2]. For the standard
sEMG signal, the frequency content typically ranges from 20 to 2000 Hz and
the amplitude lies between 0 and 10 mV, depending on the type of
investigation (non-invasive or invasive) [3, 4]. This non-invasive signal
therefore contains abundant physiological information that can be used in
studies of muscle fatigue [5], strength [6], gait activity [7], and gesture
[8], among other applications. On the other hand, the sEMG signal is
inevitably affected by noise (e.g., electrode motion artifacts, muscle
artifacts) generated by a variety of external sources during the recording
[1]. This noise severely limits the utility of the sEMG signal, so
hardware-efficient filters must be applied well before the sEMG processing
step to exclude the unwanted frequency components. Signal conditioning is
performed by a well-known electronic circuit. It usually consists of an
analog conditioning stage that amplifies and filters the signal, followed
by an analog-to-digital converter [4, 9, 10].
Several sEMG-based health monitoring systems, covering both electronics
and signal processing software, can be found in the literature [11–14].
Despite the ongoing evolution of sEMG monitoring systems, many of them are
still treadmill embedded systems because of their restricted
characteristics. Most traditional designs suffer from various shortcomings.
Like other such systems, they perform specialized tasks, serve specific
targets, or are suitable only for certain technologies. Firstly, they lack
a user-friendly graphical user interface (GUI), and even when one is
available, it is usually programmed with classical software, which makes it
difficult for clinicians to implement their own functional requirements.
This makes the available commercial sEMG monitoring systems hard to extend
and reuse. Secondly, they do not provide pertinent information that helps
medical experts reach the correct interpretation and medical diagnosis.
Therefore, there is an urgent need to develop an embedded system that can
overcome the aforementioned limitations. In this chapter, a comprehensive
7.2 EMG Signal Conditioning Circuit 149
amplifier) eliminates the common mode effect and provides a differential gain
$A_d = \frac{R_4}{R_3}$ [15, 16].
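The output voltages of the differential amplifier (stage 1) are given as:
$$V_{outp} - V_{outn} = (V_p - V_n) \times \left(1 + \frac{R_2}{R_1}\right) \qquad (7.1)$$
$$V_{out} = \frac{R_4}{R_3} \times (V_{outp} - V_{outn}) \qquad (7.2)$$
$$V_{out} = \frac{R_4}{R_3} \times \left(1 + \frac{R_2}{R_1}\right) \times (V_p - V_n) \qquad (7.3)$$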
For the acquisition, we used the AD8226 circuit (see Figure 7.2), which is
designed for use in medical instrumentation. The AD8226 is a low-cost
amplifier with a high CMRR (more than 90 dB). It operates over a range of
supply voltages (single and dual supplies) and needs only one external
resistor to set any gain between 1 and 1000. This gain can be computed
according to the following gain equation [17]:
$$G = 1 + \frac{49.4\ \mathrm{k}\Omega}{R_G} \qquad (7.4)$$
where $R_G$ is the gain-setting resistor.
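As an illustration of equation (7.4), the following minimal sketch computes the gain obtained with a given gain-setting resistor and, conversely, the resistor needed for a target gain. The function names and the example values (494 Ω, gain of 100) are illustrative assumptions, not values taken from the chapter.

```python
# Sketch of equation (7.4): G = 1 + 49.4 kOhm / R_G.
# Function names and example values are illustrative assumptions.

R_INTERNAL_OHMS = 49_400.0  # the 49.4 kOhm constant from equation (7.4)


def gain_from_resistor(r_g_ohms: float) -> float:
    """Return the amplifier gain for a gain-setting resistor R_G in ohms."""
    if r_g_ohms <= 0:
        raise ValueError("R_G must be positive")
    return 1.0 + R_INTERNAL_OHMS / r_g_ohms


def resistor_for_gain(gain: float) -> float:
    """Return the R_G in ohms that yields the requested gain (gain > 1)."""
    if gain <= 1.0:
        raise ValueError("gain must be greater than 1 to need a resistor")
    return R_INTERNAL_OHMS / (gain - 1.0)


if __name__ == "__main__":
    print(gain_from_resistor(494.0))  # gain for a hypothetical 494 ohm resistor
    print(resistor_for_gain(100.0))   # R_G for a hypothetical target gain of 100
```

In practice the computed resistance would be rounded to the nearest standard resistor value, which slightly changes the resulting gain.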
$$T_1(\omega) = -\frac{(j\omega)^2 C_1 C_2 R_1 R_2}{1 + j\omega R_1 (C_1 + C_2 + C_3) + (j\omega)^2 C_1 C_2 R_1 R_2} \qquad (7.5)$$
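$$T_2(\omega) = -\frac{\dfrac{R_3}{R_4}}{1 + j\omega R_5 R_3 C_5 \left(\dfrac{1}{R_3} + \dfrac{1}{R_4} + \dfrac{1}{R_5}\right) + (j\omega)^2 C_4 C_5 R_3 R_5} \qquad (7.7)$$
The low-pass filter cut-off frequency at -3 dB is:
$$\omega_c^2 = \frac{1}{C_4 C_5 R_3 R_5} \qquad (7.8)$$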
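The low-pass transfer function (7.7) and the cut-off relation (7.8) can be evaluated numerically. The sketch below does so in plain Python; the function names and the component values in the example are hypothetical and are not the values used in the chapter's circuit.

```python
import math

# Sketch of equations (7.7) and (7.8). Function names and component
# values are hypothetical; the chapter's actual values are not given here.


def lowpass_response(freq_hz, r3, r4, r5, c4, c5):
    """Evaluate T2 at freq_hz as a complex number, following equation (7.7)."""
    jw = complex(0.0, 2.0 * math.pi * freq_hz)
    numerator = -(r3 / r4)
    denominator = (
        1.0
        + jw * r5 * r3 * c5 * (1.0 / r3 + 1.0 / r4 + 1.0 / r5)
        + (jw ** 2) * c4 * c5 * r3 * r5
    )
    return numerator / denominator


def cutoff_hz(r3, r5, c4, c5):
    """Return the frequency in Hz for which omega^2 = 1 / (C4 C5 R3 R5)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(c4 * c5 * r3 * r5))


if __name__ == "__main__":
    parts = dict(r3=10e3, r4=10e3, r5=10e3, c4=100e-9, c5=22e-9)
    f_c = cutoff_hz(parts["r3"], parts["r5"], parts["c4"], parts["c5"])
    for f in (1.0, f_c, 10.0 * f_c):
        gain_db = 20.0 * math.log10(abs(lowpass_response(f, **parts)))
        print(f"{f:10.1f} Hz  {gain_db:7.2f} dB")
```

At very low frequency the magnitude tends to R3/R4, and well above the cut-off it falls at the second-order rate expected from the squared term in the denominator.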
The frequency response of the designed filter is illustrated in Figure 7.4:
7.3 EMG Signal Processing 153
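$$sEMG_r = \begin{cases} sEMG & \text{if } sEMG \geq 0 \\ -sEMG & \text{if } sEMG < 0 \end{cases} \qquad (7.9)$$
$$MAF(k) = \frac{1}{N_w} \sum_{i=0}^{N_w} sEMG_r(k+i) \qquad (7.10)$$
$$RMS = \sqrt{\frac{1}{N} \sum_{n=1}^{N} sEMG(n)^2} \qquad (7.12)$$
and,
$$X(k) = \sum_{n=0}^{N-1} sEMG(n)\, e^{-j\frac{2\pi k n}{N}}, \quad k = 0, 1, \ldots, N-1 \qquad (7.14)$$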
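The time-domain operations in equations (7.9), (7.10), and (7.12), and the discrete Fourier transform in (7.14), can be sketched in plain Python as follows. The function names are illustrative. As an assumption, the moving average here uses a window of exactly N_w samples (i = 0 to N_w - 1), so that the sum and the 1/N_w factor match.

```python
import cmath
import math

# Sketch of equations (7.9), (7.10), (7.12) and (7.14).
# Function names are illustrative; the window convention is an assumption.


def rectify(signal):
    """Full-wave rectification, equation (7.9)."""
    return [abs(x) for x in signal]


def moving_average(rectified, window):
    """Moving-average filter over `window` samples, after equation (7.10)."""
    if window <= 0 or window > len(rectified):
        raise ValueError("window must be between 1 and len(rectified)")
    n_out = len(rectified) - window + 1
    return [sum(rectified[k:k + window]) / window for k in range(n_out)]


def rms(signal):
    """Root mean square of the signal, equation (7.12)."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def dft(signal):
    """Direct O(N^2) evaluation of the DFT, equation (7.14)."""
    n = len(signal)
    return [
        sum(signal[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [0.2, -0.5, 0.9, -0.1, 0.4, -0.8]  # made-up sEMG samples
    envelope = moving_average(rectify(samples), window=3)
    print(envelope, rms(samples))
```

A real-time implementation would normally use an FFT rather than the direct sum, which is kept here only because it mirrors the equation term by term.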
The relative changes in the PSD of the sEMG signal during voluntary muscle
contraction may be quantified by two well-known descriptors: mean power
frequency (MPF) and median frequency (MDF). Both measure the central
tendency of the spectrum and thus indicate the frequency around which the
power of the sEMG signal is distributed. The MPF and MDF indicators can be
calculated by the following mathematical expressions, respectively [20]:
$$MPF = \frac{\sum_{f=0}^{f_s/2} f \times PSD(f)}{\sum_{f=0}^{f_s/2} PSD(f)} \qquad (7.15)$$
$$\sum_{f=0}^{MDF} PSD(f) = \sum_{f=MDF}^{f_s/2} PSD(f) = \frac{1}{2} \sum_{f=0}^{f_s/2} PSD(f) \qquad (7.16)$$
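Equations (7.15) and (7.16) can be illustrated with a short sketch that takes a list of frequency bins and the corresponding PSD values. The function names are illustrative, and the MDF is approximated as the first bin at which the cumulative power reaches half of the total, which is one common discrete reading of (7.16) and an assumption here.

```python
# Sketch of equations (7.15) and (7.16) on a discrete PSD.
# Function names and the discrete MDF rule are assumptions.


def mean_power_frequency(freqs, psd):
    """Power-weighted mean frequency, equation (7.15)."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("PSD must contain positive power")
    return sum(f * p for f, p in zip(freqs, psd)) / total


def median_frequency(freqs, psd):
    """First frequency bin where cumulative power reaches half the total."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("PSD must contain positive power")
    half = total / 2.0
    cumulative = 0.0
    for f, p in zip(freqs, psd):
        cumulative += p
        if cumulative >= half:
            return f
    return freqs[-1]


if __name__ == "__main__":
    freqs = [0, 10, 20, 30, 40]        # made-up frequency bins in Hz
    psd = [1.0, 1.0, 1.0, 1.0, 1.0]    # made-up flat spectrum
    print(mean_power_frequency(freqs, psd), median_frequency(freqs, psd))
```

For a flat spectrum both descriptors fall at the middle bin; a spectrum that shifts toward lower frequencies lowers both values.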
Table 7.1 Technical characteristics of the NI-cRIO-9035 and the modules used.
Hardware name       Attribute         Specification
NI-cRIO-9035 [23]   CPU               Intel Atom E3825
                    CPU frequency     1.33 GHz
                    Number of cores   2
                    RAM               4 GB
                    Slot count        8-Slot
NI-9215 [24]        Module Type       Voltage Input
                    Channels          4 Differential
                    Sample Rate
                    Simultaneous      Yes
                    Resolution        16-Bit
NI-9474 [25]        Module Type       Sourcing Output
                    Channels          8 (DO0, …, DO7)
                    Update Rate       1 µs
Figure 7.7 Block diagram of the LabVIEW FPGA .vi (data acquisition).
parameters used in the FPGA .vi are presented in Table 7.2. The LabVIEW
FPGA module enables Direct Memory Access (DMA) transfers of data through a
FIFO (First-In, First-Out) architecture. The DMA FIFO consists of two parts
that act as one FIFO. The first part is implemented on the FPGA using the
FPGA’s buffer. The second part is on the real-time controller and uses the
host processor’s memory resources. For every sample, the value acquired by
the FPGA is put into the FIFO queue, which can be accessed from the
real-time processor.
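The producer/consumer behavior of such a FIFO can be modeled in a few lines of Python. This is only a conceptual sketch: the class `DmaFifoModel` and its methods are hypothetical names, it is not the LabVIEW API, and it does not model the actual DMA transfer between the FPGA buffer and host memory.

```python
from collections import deque

# Conceptual model of a bounded FIFO between an acquisition side (writer)
# and a real-time side (reader). Names are hypothetical, not LabVIEW's API.


class DmaFifoModel:
    def __init__(self, depth):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self._depth = depth
        self._buffer = deque()

    def write(self, sample):
        """Enqueue one sample; return False if the FIFO is full (overflow)."""
        if len(self._buffer) >= self._depth:
            return False
        self._buffer.append(sample)
        return True

    def read(self, count):
        """Dequeue up to `count` samples in arrival order."""
        n = min(count, len(self._buffer))
        return [self._buffer.popleft() for _ in range(n)]


if __name__ == "__main__":
    fifo = DmaFifoModel(depth=4)
    for value in (0.1, 0.2, 0.3):   # acquisition side pushes samples
        fifo.write(value)
    print(fifo.read(2))             # real-time side reads the oldest two
```

The overflow flag returned by `write` mirrors the practical concern with any such queue: if the reader falls behind the acquisition rate, samples are lost.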
Figure 7.8 Block diagram of the LabVIEW host real-time.vi (data reading).
that assists developers in easily building flexible and powerful test software
with an intuitive GUI. In this study, the GUI is designed to display the
real-time results of the sEMG analysis. Figure 7.12 shows the overall
LabVIEW GUI of the study. After successfully deploying the host real-time
.vi on the NI-cRIO-9035 processor, users can validate the feasibility of the
implemented system. A simple click on the ‘Start acquisition & processing’
button allows users to monitor the variation of the sEMG features in real
time during muscle contraction and relaxation. For more flexibility, users
can start and stop the real-time data acquisition at any instant by clicking
on the ‘start/stop acquisition’ button. In addition, configuration
parameters related to data processing are available so that users can define
customized sEMG control parameters. The LabVIEW GUI consists of three
different areas: (1) the parameter setting area, where users can enter the
count (ticks) to check the data delivery frequency, and the window length
together with the threshold value used to avoid false alarms in onset/offset
detection; (2) the data visualization area, which displays real-time
monitored information in graphs (such as the sEMG signal, the linear
envelope, and the PSD with the positions of the MDF and MPF); and (3) the
computational display area for observing the key sEMG monitoring indicators
(such as cycle duration, onset, offset, MAV, RMS, MDF, and MPF).
Furthermore, the LED indicator lights up when the start of a contraction
occurs. Otherwise, the LED indicator turns off, which indicates that no
onset event has occurred. This is also noticeable on the indicator of the
NI-9474, as clearly shown in Figure 7.6.
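The onset/offset logic driven by the threshold parameter can be sketched as a simple threshold crossing on the linear envelope. This is an assumed, simplified rule for illustration; the exact detection rule implemented in the LabVIEW .vi is not reproduced here, and the function name is hypothetical.

```python
# Sketch of threshold-based onset/offset detection on an sEMG envelope.
# The rule (simple threshold crossing) is an assumption for illustration.


def detect_onsets_offsets(envelope, threshold):
    """Return (onsets, offsets) as lists of sample indices."""
    onsets, offsets = [], []
    active = False  # True while a contraction is considered in progress
    for index, value in enumerate(envelope):
        if not active and value >= threshold:
            onsets.append(index)
            active = True
        elif active and value < threshold:
            offsets.append(index)
            active = False
    return onsets, offsets


if __name__ == "__main__":
    envelope = [0, 0, 1, 2, 1, 0, 0, 3, 0]  # made-up envelope samples
    print(detect_onsets_offsets(envelope, threshold=1))
```

The `active` flag plays the role of the LED indicator described above: it is set at an onset and cleared at the following offset. The cycle duration follows from the difference between matching offset and onset indices divided by the sampling rate.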
The amplitude variations contained in the acquired sEMG signal are
random, but they certainly carry valuable medical information that can
inform doctors about the state of the muscle. One of the main advantages
of the sEMG signal is that its temporal and frequency dynamics can be
quantified in order to characterize voluntary muscle contractions.
7.5 Conclusion
The goal of this chapter was to build an embedded system suitable for the
pre-processing, real-time acquisition, and processing of sEMG signals. The
investigation ended with the successful design and implementation of the
EMG signal conditioning circuit on the NI-ELVIS II+ board, and of the
acquisition and feature extraction on the CompactRIO-9035 controller. The
physiological information extracted from each detected
contraction/relaxation cycle, including time and frequency features, was
displayed in real time on the LabVIEW GUI, providing a large quantity of
relevant health information for medical practitioners. Compared to
commercial devices, our proposed system has proven to be highly adaptable
to the needs of the medical community. The outcome provides a novel
technique to monitor sEMG information holistically, which can potentially
be applied in further investigations of other surface EMG applications,
especially medical diagnosis.
7.6 Funding
The authors received no financial support for the research, authorship, and/or
publication of this chapter.
7.7 ORCID ID
Abdelouahad Achmamad: https://orcid.org/0000-0002-2951-5468
Mohamed El Fezazi: https://orcid.org/0000-0001-6072-325X
Atman Jbari: https://orcid.org/0000-0002-1855-2503
References
[1] Chowdhury, Rubana H., Mamun BI Reaz, Mohd Alauddin bin Mohd
Ali, Ashrif AA Bakar, Kalaivani Chellappan, and Tae G. Chang.
Surface electromyography signal processing and classification
techniques. Sensors. vol. 13, no. 9, 2013, pp. 12431–12466.
https://doi.org/10.3390/s130912431
[2] Nazmi, Nurhazimah, Mohd Azizi Abdul Rahman, Shin-Ichiroh
Yamamoto, Siti Anom Ahmad, Hairi Zamzuri, and Saiful Amri Mazlan.
A review of classification techniques of EMG signals during isotonic
and isometric contractions. Sensors. vol. 16, no. 8, 2016, pp. 1304.
https://doi.org/10.3390/s16081304
[3] Gerdle, Björn, et al. Acquisition, processing and analysis of the
surface electromyogram. Modern techniques in neuroscience research.
Springer, Berlin, Heidelberg, 1999, pp. 705–755.
https://doi.org/10.1007/978-3-642-58552-4_26
[4] Rodríguez-Tapia, Bernabe, Israel Soto, Daniela M. Martínez, and
Norma Candolfi Arballo. Myoelectric interfaces and related
applications: current state of EMG signal processing–a systematic review.
IEEE Access. 2020, pp. 7792–7805.
https://doi.org/10.1109/ACCESS.2019.2963881
[5] Yochum, Maxime, Toufik Bakir, Romuald Lepers, and Stéphane
Binczak. A real time electromyostimulator linked with emg analysis
device. IRBM. vol. 34, no. 1, 2013, pp. 43–47.
https://doi.org/10.1016/j.irbm.2012.12.003
[6] Ma, Ruyi, Leilei Zhang, Gongfa Li, Du Jiang, Shuang Xu, and Disi
Chen. Grasping force prediction based on sEMG signals. Alexandria
Engineering Journal. vol. 59, no. 3, 2020, pp. 1135–1147.
https://doi.org/10.1016/j.aej.2020.01.007
8
Quick and Efficient Hardware-Software
Design Space Exploration Using
Vivado-HLS: A Case Study of Adaptive
Algorithm for Image Denoising
Abstract
Field Programmable Gate Array (FPGA) devices provide ample configurable
on-chip resources for the implementation of complex applications. The
capability of these devices is enhanced by adding other blocks, such as
multi-core processors, DSP blocks, and on-chip block memories, to the basic
FPGA fabric. This makes them suitable for the implementation of real-time
applications using either a Hardware-only (HW) or a Hardware-Software
(HW-SW) codesign approach. Advancements in FPGAs demand upgraded
implementation techniques in order to improve the utilization capacity and
the productivity of FPGAs. One such technique is the ability to write
applications in High-Level Languages (HLL) rather than the traditional
Hardware Description Languages (HDL). It attracts HLL experts to the FPGA
domain without the need to study complex HDLs.
This chapter explores Embedded System Design with Vivado HLS by
implementing a case study on Least Mean Square (LMS) and Normalized Least
Mean Square (NLMS) algorithms for image denoising. The simplicity of coding
provided by HLS has proved to be advantageous for a quick implementation
turnaround without much knowledge of HDL. The optimization directives
available in the tool made it possible to implement the design with optimal
resources and improved latency, resulting in reduced computation cost.
8.1 Introduction
Field Programmable Gate Array (FPGA) devices have been an attractive
option as accelerators for computation-intensive applications. However,
software engineers seldom choose FPGAs for implementation because of two
important factors: the need for hardware design knowledge and the need for
expertise in an FPGA programming language. This calls for a new approach
for programmers. Nowadays, programmers are primarily trained to use
High-Level Languages (HLL), which creates demand for a technique that can
convert HLL code to Register Transfer Level (RTL) for FPGA implementation.
Smith et al. [1] proposed a way to utilize the advantages of FPGAs without
knowledge of the hardware by using high-level synthesis (HLS) tools.
HLS addresses this challenge. It accepts as input an algorithmic
description of the application written in an HLL such as C, C++, or SystemC.
This algorithmic description is converted to a hardware description, i.e.,
an RTL netlist. An HLL description can typically be implemented faster and
can reduce design effort and susceptibility to programmer error. In recent
years, HLS tools have improved their compatibility with input source code
and produce more accurate output hardware designs. Recent studies have
shown that HLS allows the rapid evaluation of an architecture regardless of
the developer’s expertise (software or hardware), thus expanding the
horizons of the FPGA market [2]–[6]. Various HLS tools are available,
including academic tools such as LEGUP, DWARV, and BAMBU, and commercial
tools such as Altera SDK and Vivado HLS. R. Nane et al. [7] compared
academic and commercial tools and concluded that commercial tools offer
improved features and performance. In a comparison of Altera SDK and Vivado
HLS, the RTL generated by Vivado consumes fewer resources than the RTL of
Altera SDK.
Considering the market demands, the focus of this chapter is to explore
Vivado HLS for the simplicity of coding and optimizations. Furthermore,
this chapter uses the case study of Least Mean Square (LMS) and Normalized
Least Mean Square (NLMS) filters for image denoising [5]–[7]. For the
8.2 High-level Synthesis 169
implementation of the filters, the Zynq board is used and the results are
presented in detail. The implementation is presented in three phases: the
LMS and NLMS algorithms are designed in the first phase, followed by their
C code implementation in the second phase. Finally, the HW-SW codesign
aspect with Zynq is explored in the third phase. The outline of this chapter
is as follows: Section 2 deals with the basics of Vivado HLS, and the
details of the adaptive algorithms targeting LMS and NLMS filters are given
in Section 3. Section 4 is devoted to the implementation of the LMS
algorithm using Vivado HLS, along with a discussion of the results obtained
during implementation [8]–[9]. Lastly, Section 5 provides the concluding
remarks and future scope.
Vivado HLS enables engineers to validate their designs for functional
correctness using C/RTL co-simulation. Engineers can also improve timing
and performance by inserting optimization directives into the code and
taking advantage of more parallelism in the algorithm.
HLS offers many features, such as:
1. Reduced design efforts
2. Easy verification
3. Quick design space exploration
4. Easy adaptation to new platforms
5. Opportunity for software engineers to tackle hardware projects
Along with the above-mentioned features, it also reduces the design and
verification time, which in turn reduces development costs. As a result,
the time to market is reduced and the use of hardware acceleration on
heterogeneous systems becomes more appealing.
real-time images collected from the camera. An algorithm is chosen based
on two key factors: its complexity and the number of computations. This
section briefly describes the LMS and NLMS algorithms, which are the most
frequently suggested candidates for such applications in the literature
[10]–[11], [17]–[23].
where x(n) is the input signal, e(n) is the error signal, and µ is the
convergence factor.
In the preceding equation, the choice of the convergence factor is critical
in the design of a filter. The value of µ ranges from 0 to 1. The smaller
the value of µ, the longer it takes to obtain the exact output. The larger
the value of µ, the less time it takes, but the output is less smooth
(bumped). As a result, the value of µ should be determined by the
application for which it is used. In adaptive algorithms, the value of µ is
chosen during the noise filtration process based on the value of the error
signal. This allows the design to adapt quickly and produce noise-free
output in a shorter amount of time.
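The role of the convergence factor can be illustrated with a minimal LMS/NLMS sketch. It assumes the textbook weight update w(n+1) = w(n) + µ e(n) x(n), with the NLMS variant dividing the step by the input energy. The function name, the `normalized` flag, and the `eps` regularization constant are illustrative assumptions; this is not the C code generated from the chapter's Simulink models.

```python
# Sketch of an LMS / NLMS adaptive filter in plain Python.
# Assumes the textbook update w <- w + mu * e(n) * x(n); names are illustrative.


def adaptive_filter(x, d, num_taps, mu, normalized=False, eps=1e-8):
    """Filter input x toward desired signal d; return (outputs, errors, weights)."""
    weights = [0.0] * num_taps
    outputs, errors = [], []
    for n in range(len(x)):
        # Last num_taps input samples, newest first, zero-padded at the start.
        taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * s for w, s in zip(weights, taps))
        e = d[n] - y
        step = mu
        if normalized:
            # NLMS: scale the step by the energy of the current input window.
            step = mu / (eps + sum(s * s for s in taps))
        weights = [w + step * e * s for w, s in zip(weights, taps)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, weights


if __name__ == "__main__":
    x = [1.0, -1.0] * 50                 # made-up input signal
    d = [0.5 * v for v in x]             # desired signal: input scaled by 0.5
    for mu in (0.01, 0.1):
        _, errors, w = adaptive_filter(x, d, num_taps=1, mu=mu)
        print(mu, w[0], abs(errors[-1]))
```

Running the example with the two step sizes shows the trade-off described above: after the same number of samples, the larger µ leaves a smaller residual error because the weight has moved further toward its target.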
the Simulink model using the Matlab Function Block. In Phase-2, the C code
of both models is imported, simulated, synthesized, and optimized in
Vivado-HLS. Based on a comparison of design metrics, the LMS algorithm is
chosen for system implementation. In Phase-3, a complete embedded system is
developed around the Zynq platform, where real-time images acquired by the
camera are de-noised using the LMS filter.
8.4.1 Phase I
The Simulink models of the LMS and NLMS filters are given in Figure 8.3(a)
and (b), respectively. Each model takes two images: the noisy input image
and the desired image. Since LMS and NLMS operate on 1D arrays, the 2D
images are converted to 1D arrays of pixel values. The value of µ for the
LMS algorithm is set to 0.01, as discussed in Section 3.1.
After successfully obtaining the output from the Simulink model, the
C code of both models is generated using the code generation function
available in the configuration parameter tab in Simulink.
8.4.2 Phase II
In Phase-II, the C code from Simulink is imported and the RTL is generated
using Vivado HLS. The C code is modified in two ways:
• The “memset” instruction is replaced with simple loops, because “memset”
required more BRAM than is available on the board.
• Direct Memory Access (DMA) is replaced with simple loops, because
DMA is not supported in HLS.
References
[1] J. P. Smith et al., “A High-Throughput Oversampled Polyphase Filter
Bank Using Vivado HLS and PYNQ on a RFSoC,” in IEEE Open
Journal of Circuits and Systems, vol. 2, pp. 241–252, 2021, doi: 10.1109/
OJCAS.2020.3041208
[2] J. Caba, F. Rincón, J. Barba, J. A. De La Torre, J. Dondo and J. C.
López, “Towards Test-Driven Development for FPGA-Based Modules
Across Abstraction Levels,” in IEEE Access, vol. 9, pp. 31581–31594,
2021, doi: 10.1109/ACCESS.2021.3059941.
[3] S. Lahti, P. Sjövall, J. Vanne and T. D. Hämäläinen, “Are We There Yet?
A Study on the State of High-Level Synthesis,” in IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no.
5, pp. 898–911, May 2019.
Abstract
Real-time object detection has become a popular technology in many
applications such as real-time security monitoring, robot navigation, event
detection, and autonomous vehicles. In this work, we propose a real-time
implementation of a method for detecting moving objects in a video stream.
The method is based mainly on the comparison of different video frames. The
complete video motion detection chain has been implemented in both software
and hardware using monocular static vision, which requires high
computational performance. The hardware implementation of the proposed
system reduces the detection time of moving objects compared to the software
implementation and achieves a processing speed of 30 fps at a resolution of
640 × 480 pixels. Finally, several tests are performed in real time, and the
experimental results show that the proposed method is robust and provides
excellent performance in terms of detection accuracy and processing speed,
which makes it suitable for real-time applications.
182 Fast FPGA Implementation of A Moving Object Detection System
9.1 Introduction
Detecting moving objects in video footage has become necessary to make
analysis more intelligent in a number of embedded applications such as
security, vehicle navigation, and traffic control. The detection of a
moving object is subject to several constraints, related on the one hand to
the processing time for different objects and on the other hand to the
detection performance, which depends on many parameters such as light
variation and the presence of shadows [1].
The main objective of moving object detection is to distinguish the
foreground object from the stationary background. Several methods have
been proposed so far for moving object detection, including background
subtraction, image differentiation, temporal differentiation, and optical
flow [2, 3, 4, 5]. The most common approach for detecting moving objects is
based on the image subtraction technique. When a new image is captured, the
difference between this image and the next one is computed to extract the
difference image, which marks the areas where a moving object was present.
This subtraction of pixel intensities is simple and easy to implement [6].
This work is based on image differences for moving object detection
and tracking using a static camera. The processing chain contains an RGB to
grayscale conversion block, subtraction between images, thresholding, a
noise removal filter, and an edge detector for the moving objects in the
scene. This detection algorithm is computationally intensive and introduces
processing delay. The purpose of this work is to reduce the processing time
of the moving object detection process by means of a hardware
implementation on FPGA.
Several hardware implementations have been proposed in the literature to
improve image quality and accelerate processing to meet the requirements of
critical systems. Siva Nagi Reddy Kalli and Bhanu Murthy Bhaskara [7]
implemented real-time moving object segmentation on FPGA using background
modeling with Biased Illumination Field Fuzzy C-Means, and the system
produced accurate results. In [8], the authors proposed visual object
tracking by adaptive background subtraction on FPGA to obtain a more
complete moving object. In that work, the threshold gives better noise
immunity, and a morphological filter is used to eliminate the noise.
I. Iszaidy et al. [9] proposed an analysis of background subtraction on an
embedded platform based on a synthetic dataset, providing a comparative
analysis of the available background subtraction algorithms on the embedded
platform. In the work referenced in [10], the authors presented the FPGA based Object
9.2 Detect Moving Objects Algorithm 183
Figure 9.1 Processing chain of the algorithm used to detect moving objects based on the
difference between images.
We convert all the images in the color video sequences to gray scale
images in order to improve performance and reduce the computational
requirements. A color image increases the amount of data to process,
because a pixel in RGB format is represented by three 8-bit values. For
example, a 640 × 480 pixel image in RGB format contains 921600 bytes of
information. Many approaches for color to gray scale conversion have been
proposed in the literature [11]. A common approach for the conversion of
RGB to gray scale is expressed by the formula (9.1).
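The data reduction obtained by the conversion can be checked with a short sketch. The weights 0.299, 0.587, and 0.114 used below are the widely used luma coefficients and are an assumption here; they may differ from the exact coefficients of formula (9.1). The function names are illustrative.

```python
# Sketch of RGB to gray scale conversion and of the frame size arithmetic.
# The luma weights are the common 0.299/0.587/0.114 set, assumed here.


def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel with 8-bit channels to an 8-bit gray level."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def frame_bytes(width, height, channels):
    """Number of bytes in a frame with 8 bits per channel."""
    return width * height * channels


if __name__ == "__main__":
    print(frame_bytes(640, 480, 3))    # RGB frame size in bytes
    print(frame_bytes(640, 480, 1))    # gray scale frame size in bytes
    print(rgb_to_gray((255, 0, 0)))    # gray level of a pure red pixel
```

A gray scale frame therefore carries one third of the data of the RGB frame, which is the reduction the processing chain relies on.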
The technique of frame subtraction [12] detects and tracks the object over
time by subtracting consecutive frames of the video. Moving objects, called
the foreground, are separated from the background of the sequence in the
video stream without any prior information about the object. Let
$I_i(x, y)$ denote the intensity of frame $I_i$. The difference image
$I_d(x, y)$ used to detect the moving region is calculated by
expression (9.2):
$$I_d(x, y) = I_{i+1}(x, y) - I_i(x, y) \qquad (9.2)$$
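Expression (9.2), followed by the thresholding step of the processing chain, can be sketched on small gray scale frames stored as nested lists. As an assumption, the sketch takes the absolute value of the difference so that pixels that become darker are also detected; the function names and the threshold level are illustrative.

```python
# Sketch of frame subtraction (9.2) followed by thresholding.
# Using the absolute difference is an assumption; names are illustrative.


def frame_difference(next_frame, current_frame):
    """Per-pixel absolute difference between two equally sized gray frames."""
    return [
        [abs(p_next - p_cur) for p_next, p_cur in zip(row_next, row_cur)]
        for row_next, row_cur in zip(next_frame, current_frame)
    ]


def threshold(diff, level):
    """Binary mask: 1 where the difference exceeds `level`, else 0."""
    return [[1 if p > level else 0 for p in row] for row in diff]


if __name__ == "__main__":
    current = [[10, 10], [10, 10]]     # made-up 2 x 2 frame
    following = [[10, 200], [12, 10]]  # one pixel changes strongly
    print(threshold(frame_difference(following, current), level=25))
```

The small change of 2 gray levels is rejected by the threshold while the large change is kept, which is how the threshold value controls the sensitivity to camera noise.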
where G is the value of the Sobel gradient operator, GX is the horizontal
Sobel gradient, and GY is the vertical Sobel gradient.
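A minimal sketch of the Sobel edge detector is given below. It assumes the usual 3 × 3 Sobel kernels and the usual magnitude G = sqrt(GX² + GY²), since the defining equation is not reproduced in this excerpt. Border pixels are left at zero for simplicity, and the function name is illustrative.

```python
import math

# Sketch of the Sobel gradient magnitude on a gray scale image (nested lists).
# Kernels and the magnitude formula are the standard ones, assumed here.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return G = sqrt(GX^2 + GY^2) for interior pixels; borders stay 0."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            result[y][x] = math.sqrt(gx * gx + gy * gy)
    return result


if __name__ == "__main__":
    step_edge = [[0, 0, 10, 10] for _ in range(3)]  # made-up vertical edge
    print(sobel_magnitude(step_edge)[1])
```

Hardware implementations often replace the square root with |GX| + |GY| to save resources; that variant is not shown here.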
The proposed system can experiment with different metrics which can
be used for performance evaluation.
Figure 9.2 (a) Video to be tested (b) gray scale video version (c) frame Subtraction result
(d) thresholding video (e) median filtered video (f) edge detection to obtain the result of the
proposed method.
Figure 9.3 (a) Video to be tested (b) gray scale video version (c) frame Subtraction result
(d) thresholding video (e) median filtered video (f) edge detection to obtain the result of the
proposed method.
are the frames of the sequences. The obtained results show that it is perfectly
possible to detect objects in the scene from the cameras.
The extraction of moving objects from the video is based on the frame
subtraction method. Despite several drawbacks of this method, such as
sensitivity to camera noise, shadows, and lighting changes, the algorithm
can detect the moving object effectively. Some pixels of the object become
part of the background, and the quality of the foreground object is highly
dependent on the threshold value. The method used in this work detects
moving objects while removing the noise. By analyzing the obtained results,
we conclude that the algorithm works well and that the results are improved
by using an image processing filter and an edge detector.
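The noise-removal role of the filter can be illustrated with a small median filter. The sketch below is an assumption-laden example rather than the chapter's code: it uses a 3 × 3 window, copies border pixels unchanged, and works on a binary mask held as nested lists; the function name is hypothetical.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (a simplification of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + j][x + i]
                      for j in (-1, 0, 1) for i in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


# Thresholded mask with one isolated noise pixel at the centre.
mask = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
cleaned = median_filter_3x3(mask)
```

An isolated pixel is outvoted by its eight neighbours and disappears, whereas a compact moving region larger than the window would survive the filter.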
The processing time of the overall detection chain is 3.8 s/frame, that
is, about 0.26 frames per second. The program takes a long time to execute
due to its iterative structure and the number of memory allocations it
performs, and the execution time on the Raspberry Pi 4 is long because of
the constraints of the platform. The main disadvantage of this software
solution is therefore its low processing speed. Detecting a moving object
consumes a very high computation time, making this solution unsuitable for
applications that require real-time processing. To overcome this problem,
a specially designed hardware implementation based on a reconfigurable
FPGA circuit is required to perform the task at high speed.
Figure 9.5 (a) RGB video (b) RGB to gray scale conversion (c) median filter result
(d) thresholding video (e) edge detection (f) result of the proposed method.
190 Fast FPGA Implementation of A Moving Object Detection System
approximately 1 frame per 3.8s. The second implementation is faster than the
first one; moreover, it provides an image of a detected moving object in real time.
9.4 Conclusion
In this chapter, moving objects are detected based on the frame subtraction
method. The chapter focuses on video captured from a static camera. Two
experiments for moving object detection are described. The first is
implemented on a Raspberry Pi 4 using the C programming language with the
OpenCV library, while the second is implemented on an FPGA with an
architecture described in VHDL. The results of the prototype system are
obtained at 640×480 resolution. The proposed architecture can be employed
in various real-time computer vision applications, and an FPGA-based system
is a good solution to real-time computer vision problems. Finally, the
results of tests on multiple videos show that the proposed technique
performs well, eliminates noise, and detects moving objects more completely
for both indoor and outdoor sequences. In the future, this work can be
extended to a moving camera.
References
[1] Jaya S. Kulchandani, Kruti J. Dangarwala. “Moving Object Detection:
Review of Recent Research Trends”. International Conference on
Pervasive Computing, 2015.
[2] Soharab Hossain Shaikh, Khalid Saeed, Nabendu Chaki. “Moving
Object Detection Using Background Subtraction”. 2014.
[3] Yong, H.; Meng, D.; Zuo, W.; Zhang, K. Robust online matrix factor-
ization for dynamic background subtraction. IEEE Trans. Pattern Anal.
Mach. Intell. 2018.
[4] S. S. Sengar, S. Mukhopadhyay, “Moving object detection based on
frame difference and W4” in Signal, Image and Video Processing 2017;
11 (7): 1357–1364.
[17] Jia Wei Tang, Nasir Shaikh-Husin, Usman Ullah Sheikh, and M. N. Marsono,
“FPGA-Based Real-Time Moving Target Detection System for Unmanned
Aerial Vehicle Application”. International Journal of Reconfigurable
Computing. 2016.
[18] Jaechan Cho, Yongchul Jung, Dong-Sun Kim, Seongjoo Lee and Yunho
Jung, “Moving Object Detection Based on Optical Flow Estimation and
a Gaussian Mixture Model for Advanced Driver Assistance Systems”.
MDPI. 2019.
[19] Masayuki Shimoda, Shimpei Sato, and Hiroki Nakahara,
“Power Efficient Object Detector with an Event-Driven Camera for
Moving Object Surveillance on an FPGA”. 2019.
10
Face Recognition based on CNN, Hog and
Haar Cascade Methods using Raspberry
Pi v4 Model B
Abstract
Automatic person recognition has received a lot of attention in the last few
years. Face recognition is one of the best biometric modalities for
applications related to the identification or authentication of persons.
Since face recognition is a compute-intensive process, an embedded system
solution allows for low-cost, discrete hardware implementations that can be
embedded in a wide range of applications. Several methods have been
developed for face detection, such as CNN, Haar cascade, and HOG. In this
work, we apply an algorithm that exploits these different methods for facial
recognition on a powerful embedded system and connected object, the
Raspberry Pi. This study aims to better understand the differences between
the methods and to compare the effectiveness of each learning technique used.
10.1 Introduction
In recent times and with the advancement of technology, passwords and keys
used in all areas of security and access control have become easily forged
and breached. This is how biometrics was invented and became fashion-
able in areas that require a high level of security and control. Among all the
biometric technologies that exist, facial recognition is one of the most used
and adapted. It allows exploiting a lot of information about a person.
Facial recognition is applied mainly in the field of security [12]. It is
responsible for the identification and authentication of the face [8]. It can
also be used to detect, track, and recognize persons in public places such
as shops (for payment), schools, ministries, and airports, and in areas with
restricted access such as private offices and houses.
Facial recognition [14] is a technology based on biometric techniques,
artificial intelligence, 3D mapping, and deep learning. This technology
identifies a person in an image or video frame by automatically comparing
the characteristics of their face with those saved in the dataset.
The facial recognition system used in this chapter is based on the
Raspberry Pi 4 Model B. It is a small board, which can be connected to
additional modules. It is a good platform for testing the functionality of a
connected object before it is put into use [11].
Because face recognition is demanding, it is sensible to consider an
embedded system implementation that is specifically optimized to recognize
faces. An embedded system provides multiple benefits: low cost, since only a
small subset of hardware is required relative to general-purpose computing
solutions; real-time face recognition processing that is independent of
other post-processing issues; and integration with other technologies.
The Raspberry Pi can be connected to and controlled by external
devices, which makes it suitable for building Internet of Things (IoT) solutions.
The IoT [10] is a system of interconnected computing devices,
machines, objects, animals, and even people, each provided with a unique
identifier (UID) and the ability to transfer data over a network.
Technically, this interconnectivity digitally identifies an object through a
wireless communication system, such as Wi-Fi or Bluetooth, without requiring
human-to-human or human-to-computer interaction. Collectively, it covers any
natural or man-made “object” to which an IP address can be assigned and
which can transfer data over a network (Figure 10.1).
The domain of facial recognition has faced many challenges since
its development, and several factors make accurate recognition a difficult
task, including the following:
• Illumination: Changing lighting levels can affect the image of a per-
son’s face in a variety of ways.
• Occlusion: Once a part of the face is hidden, the facial characteristics
cannot be totally visualized, and as a result authentication by the facial
recognition application is compromised.
10.3.1.2 Camera Pi V2
The Camera Module v2 used to capture images with the Raspberry Pi features
a Sony IMX219 fixed-focus sensor with a high native resolution. Useful for
time-lapse photography, motion detection, or security camera applications,
the module plugs into the CSI port of the Raspberry Pi. Its main parameters
are: fixed-focus lens; native resolution of 8 MP; photo resolution of
3280 × 2464; video modes 1080p30, 720p60, and 640 × 480p90; size
25 mm × 23 mm × 9 mm; weight 3 g.
to that person (Figure 10.6). Otherwise, the face is unknown. This process is
repeated until we stop the program. Figure 10.5 illustrates the
stages of the learning and testing algorithms.
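The comparison step described above can be sketched as a nearest-neighbour search over feature vectors. The following Python example is only an illustration under stated assumptions: each face is represented by a short numeric vector, similarity is measured with the Euclidean distance, and a face is declared unknown when no enrolled vector is closer than a tolerance of 0.6. The names, the vectors, and the tolerance are all hypothetical and are not taken from the chapter's code.

```python
import math

TOLERANCE = 0.6  # hypothetical distance threshold; to be tuned on real data


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(face_vector, known_faces, tolerance=TOLERANCE):
    """Return the name of the closest enrolled face, or "Unknown"."""
    best_name, best_dist = "Unknown", tolerance
    for name, vector in known_faces.items():
        dist = euclidean(face_vector, vector)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Toy dataset: real feature vectors would be much longer.
known_faces = {
    "alice": [0.10, 0.20, 0.30],
    "bob": [0.90, 0.80, 0.70],
}
match = identify([0.12, 0.19, 0.31], known_faces)
stranger = identify([5.0, 5.0, 5.0], known_faces)
```

In a running system this function would be called for every face detected in each camera frame, which mirrors the loop that repeats until the program is stopped.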
Figure 10.7 Result of face recognition using (a), (b), (c) haar cascade classifier, (d), (e), (f)
hog classifier, (g), (h), (i) CNN classifier.
than one person (Figure 10.7(f)). Haar cascades and HOG are faster but do not
detect faces at different viewing angles and are less accurate
(Figure 10.7(b), (e)). CNNs are more accurate and able to detect several
people at once and in various positions (see Figure 10.7(i), (h)), but they
are slower. The Haar cascade classifier produces more false positive face
predictions than the HOG classifier.
Because of their heavy computations, CNNs require a graphics processing
unit (GPU) with a large amount of memory; on devices like the Raspberry Pi
(4 GB RAM), one should limit oneself to a shallower CNN. Unfortunately,
shallower CNNs do not reach a very high level of accuracy. For a very large
training data set, it is better to use CNNs, but for the Raspberry Pi the
HOG classifier is the most suitable. Table 10.2 summarizes the advantages
and disadvantages of each algorithm:
10.4 Conclusion
In this chapter, we have developed a simple and effective face
authentication system aimed at addressing surveillance and security issues.
At the hardware level, we used a Raspberry Pi 4 board and a Camera Pi V2.
At the software level, we used the Python programming language and the
Raspbian operating system. The application starts with the creation of the
database. The features of each image in the dataset are then extracted.
Finally, these features are compared with the features of the face detected
by the Pi camera. For this code, we used three facial recognition
algorithms, namely HOG, Haar cascade, and CNN. Finally, we compared the
facial recognition results of these three algorithms.
References
[1] Adiono, T., Prakoso, K. S., Putratama, C. D., Yuwono, B., and Fuada,
S. (2018). Hog-adaboost implementation for human detection employing
fpga altera de2-115. International Journal of Advanced Computer
Science and Applications, 9(10):353–8.
[2] Albelwi, S. and Mahmood, A. (2017). A framework for designing the
architectures of deep convolutional neural networks. Entropy, 19(6):242.
[3] Balogh, Z., Magdin, M., and Molnár, G. (2019). Motion detection and
face recognition using raspberry pi, as a part of, the internet of things.
Acta Polytechnica Hungarica, 16(3):167–185.
[4] Bruce, V. and Young, A. (1986). Understanding face recognition. British
journal of psychology, 77(3):305–327.
[5] Dhiman, G. et al. (2020). An innovative approach for face recognition
using raspberry pi. Artificial Intelligence Evolution, pages 102–107.
[6] Gauswami, M. H. and Trivedi, K. R. (2018). Implementation of machine
learning for gender detection using cnn on raspberry pi platform.
In 2018 2nd International Conference on Inventive Systems and Control
(ICISC), pages 608–613. IEEE.
[7] Gupta, I., Patil, V., Kadam, C., and Dumbre, S. (2016). Face detection
and recognition using raspberry pi. In 2016 IEEE international WIE
conference on electrical and computer engineering (WIECON-ECE), pages
83–86. IEEE.
[8] Mane, S. and Shah, G. (2019). Facial recognition, expression recog-
nition, and gender identification. In Data Management, Analytics and
Innovation, pages 275–290. Springer.
[9] Nikisins, O., Fuksis, R., Kadikis, A., and Greitans, M. (2015). Face
recognition system on raspberry pi. Institute of Electronics and Computer
Science, 14.
[10] Nord, J. H., Koohang, A., and Paliszkiewicz, J. (2019). The internet
of things: Review and theoretical framework. Expert Systems with
Applications, 133:97–108.
[11] Patnaik Patnaikuni, D. R. (2017). A comparative study of arduino, rasp-
berry pi and esp8266 as iot development board. International Journal of
Advanced Research in Computer Science, 8(5).
[12] Radzi, S. A., Alif, M. M. F., Athirah, Y. N., Jaafar, A., Norihan, A.,
and Saleha, M. (2020). Iot based facial recognition door access control
home security system using raspberry pi. International Journal of Power
Electronics and Drive Systems, 11(1):417.
[13] Sajjad, M., Nasir, M., Muhammad, K., Khan, S., Jan, Z., Sangaiah, A.
K., Elhoseny, M., and Baik, S. W. (2020). Raspberry pi assisted face
recognition framework for enhanced law-enforcement services in smart
cities. Future Generation Computer Systems, 108:995–1007.
[14] Soldera, J., Schu, G., Schardosim, L. R., and Beltrao, E. T. (2017). Facial
biometrics and applications. IEEE Instrumentation & Measurement
Magazine, 20(2):4–30.
[15] Viola, P. and Jones, M. (2001). Rapid object detection using a boosted
cascade of simple features. In Proceedings of the 2001 IEEE computer
society conference on computer vision and pattern recognition. CVPR
2001, volume 1, pages I–I. IEEE.
[16] Wazwaz, A. A., Herbawi, A. O., Teeti, M. J., and Hmeed, S. Y. (2018).
Raspberry pi and computers-based face detection and recognition
system. In 2018 4th International Conference on Computer and Technology
Applications (ICCTA), pages 171–174. IEEE.
SECTION 5
Internet of Things Based
Embedded System
11
Survey Review on Artificial Intelligence and
Embedded Systems for Agriculture Safety:
A proposed IoT Agro-meteorology System
for Local Farmers in Morocco
Abstract
Agriculture is crucial to human life. It is the main food provider, yet it
remains prone to climate change and other challenges, notably in develop-
ing countries. Some of the most prominent challenges are related to the sur-
veillance and monitoring of climate, water resources, and soil quality. The
evolution of Artificial Intelligence (AI), embedded systems, and the Internet
of Things (IoT) is undeniable. Their adoption at a large scale in agricultural
activities offers a great opportunity to overcome many of these challenges.
Unfortunately, their adoption in many countries is still limited, and farmers
are still the most forgotten of this evolution towards agriculture 4.0.
In this context, IoT-embedded systems can play a major role as a
technology mediator that both boosts the productivity of farmlands and
promotes AI in agricultural fields. A distinctive feature of this study is
that it reviews and investigates the most suitable solutions for rural areas
and family farming. To this end, the present chapter proposes a study and
design of a low-cost local weather station for local farmers. The proposed
solution is
11.1 Introduction
All around the world, agriculture is confronted with serious
challenges, such as climate change, water scarcity, and heavy dependence
on fossil fuels [1]. Furthermore, the COVID-19 pandemic threatens the entire
agricultural value chain [2]. Indeed, the impacts are closely intertwined,
and the agro-food chain is the most affected by these issues [3].
However, many developing countries have been excluded from most
of the benefits of sustainable development [4]. After the Kyoto Protocol and
COP22, little seems to have changed: traditional energy sources (firewood,
biomass waste, human and animal traction) remain the main and often the
only energy resources available to millions of rural families [5].
Therefore, new approaches are urgently required to overcome classic
practices that consider farms and ecosystems as industrial entities [6].
In fact, we live in a world with a growing population and millions of
people going hungry. It will take all our knowledge and imagination to deal
in an integrated way with the challenges of maintaining soil fertility and
water resources and of eliminating pests and diseases that affect both
crops and animals. Thus, sustainability and precision in agriculture must
focus on integrating environmental health, energy profitability, and social
and economic fairness [7].
Global temperatures were estimated at 1.7 degrees, surpassing the global
average by approximately 0.15 degrees [8]. In a Mediterranean context, the
dry and warm summer climate can influence agriculture. Mediterranean
countries are also highly exposed to climate change: an increase in land
degradation is predicted as a consequence of reduced precipitation [9].
Thus, climate data remain one of the sources of information necessary for
the integration of any technology related to the biosphere. In agriculture,
as for renewable energies, parameters such as wind, temperature, relative
humidity, and availability of resources must be quantified and calculated
rigorously in order to ensure good prediction or use.
From another point of view, intelligent systems are able to process
information, deliver complex reports, serve farmers in decision making
[21] | soil water content and meteorological data | convolutional neural network | Pearson correlation, soil water content autocorrelation | deep learning, Near-infrared (NIR) spectroscopy | prediction accuracy 93%
[22] | moisture, weather forecast and water level | Machine Learning algorithm | - | IoT, ZigBee technology, Arduino microcontroller | Drought prediction
[23] | meteorological data, soil humidity | artificial neural network (ANN), Machine learning techniques | Evapotranspiration model | unmanned aerial vehicle (UAV), IoT | Performance to predict water stress
[24][25] | Different conditions | ANN | Optimal model sizing | hybrid intelligent systems | Sizing of optimal stand-alone PV-systems
[26] | Climatic Data | Regression comparison | Optimal model sizing | Standalone power supply (SAPS) | seasonal variability of solar insolation
[27] | Different conditions and imputes | genetic algorithms | Optimal model sizing | sizing grid-connected PV-system | stability voltage distribution
[28] | Climatic data | Feed Forward Neural Network | Optimal model sizing | Pumping systems | Photovoltaic power forecast
11.2 AI-enabled Embedded Systems for Agriculture
use of wireless sensor networks and IoT. On the other hand, the development
of thermal imaging for crops has led to thermal cameras that offer new
opportunities for estimating the hydraulic conditions of plants: by
acquiring thermal indices of plants, they help to determine water needs
precisely [14]. In more advanced cases, AI reasoning about soil water
balance and forecasting can optimize irrigation and protect farms against
probable flooding, droughts, and disasters [15][16].
forecasting
Disease detection DB Web-Based Expert System High performance Internet dependence [48]
Figure 11.2 The site-specific crop management based on three-dimensional approach that
assesses inputs and outputs from fields to watershed and regional scales [50].
Figure 11.3 Crop yielding map using machine intelligence algorithms [54].
quantities [52]. By using AI, the chances of plant or soil degradation are
reduced and crops are able to meet the market trends, maximize the return
of different soils [53], and ensure a better crop mapping for decision-making
(Figure 11.3).
expelled by the development of using IOT coupled with sensors [72]. Several
solutions were proposed to forecast weather conditions in particular areas
[73][74][75].
Table 11.3 summarizes more recent solutions for forecasting weather
variables in many parts of the world. Generally, the design involves sensors
coupled to a master controller. The gathered outputs are processed by a CPU
or an online cloud platform. A mobile application can be proposed for the
display of results with precision, in the short term [76].
Load Multi Linear Different parameters higher accuracy Short term [85]
forecasting Regression (MLR)
After the assembly is complete, the MH-RD can be tested in the Arduino
project with the source code presented in Appendix B.
• Light sensor: A photoresistor, or light-dependent resistor (LDR), is a
sensor whose resistance depends on light: as the light level decreases,
the sensor resistance increases. This characteristic makes the LDR
useful for greenhouses and automated solar pumping.
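The relation between light level and resistance can be worked through with a simple voltage divider. The Python sketch below is an illustration under several assumptions that are not stated in the chapter: the LDR sits between the supply and the measurement node, a fixed 10 kΩ resistor goes from the node to ground, the ADC has 10 bits, and its reference equals the supply voltage. The function and constant names are hypothetical.

```python
ADC_MAX = 1023          # assumed 10-bit ADC full-scale reading
R_FIXED_OHMS = 10_000   # assumed fixed resistor of the divider


def ldr_resistance(adc_reading):
    """Estimate the LDR resistance (ohms) from the divider's ADC reading.

    Assumed wiring: supply -> LDR -> measurement node -> fixed resistor
    -> ground, so V_node / V_supply = R_fixed / (R_ldr + R_fixed).
    """
    if not 0 <= adc_reading <= ADC_MAX:
        raise ValueError("reading outside ADC range")
    if adc_reading == 0:
        return float("inf")  # no current flows: treat as an open circuit
    return R_FIXED_OHMS * (ADC_MAX - adc_reading) / adc_reading


bright = ldr_resistance(900)  # high reading: strong light, low resistance
dim = ldr_resistance(100)     # low reading: weak light, high resistance
```

With this wiring a lower reading corresponds to less light and therefore to a higher computed resistance, in line with the behaviour described for the sensor.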
Appendix A
Appendix B
Appendix C
Appendix D
References
[1] H. Turral, J. Burke, and J.-M. Faurès, Climate change, water and food
security., no. 36. Food and Agriculture Organization of the United
Nations (FAO), 2011.
[2] S. Aday and M. S. Aday, “Impact of COVID-19 on the food supply
chain,” Food Qual. Saf., vol. 4, no. 4, pp. 167–180, 2020.
[3] F. FAO, “The future of food and agriculture–Trends and challenges,”
Annu. Rep., 2017.
[4] L. T. Clausen and D. Rudolph, “Renewable energy for sustainable rural
development: Synergies and mismatches,” Energy Policy, vol. 138,
p. 111289, 2020.
[5] C. Aall, K. Moberg, J.-P. Cerone, E. Reimerson, and F. Dorner, “Policies
for reducing household green house gas emissions,” 2017.
[6] A. A. Mana, A. Allouhi, K. Ouazzani, and A. Jamil, “Feasibility of agri-
culture biomass power generation in Morocco: Techno-economic anal-
ysis.,” J. Clean. Prod., p. 126293, 2021.
[7] J. L. Johnston, J. C. Fanzo, and B. Cogill, “Understanding sustainable
diets: a descriptive analysis of the determinants and processes that influ-
ence diets and their impact on health, food security, and environmental
sustainability,” Adv. Nutr., vol. 5, no. 4, pp. 418–429, 2014.
[8] H.-O. P. Mbow, A. Reisinger, J. Canadell, and P. O’Brien, “Special
Report on climate change, desertification, land degradation, sustainable
land management, food security, and greenhouse gas fluxes in terrestrial
ecosystems (SR2),” Ginevra, IPCC, 2017.
[9] V. Simonneaux, A. Cheggour, C. Deschamps, F. Mouillot, O. Cerdan,
and Y. Le Bissonnais, “Land use and climate change effects on soil ero-
sion in a semi-arid mountainous watershed (High Atlas, Morocco),” J.
Arid Environ., vol. 122, pp. 64–75, 2015.
[10] N. Walmsley and G. Pearce, “Towards sustainable water resources man-
agement: bringing the Strategic Approach up-to-date,” Irrig. Drain.
Syst., vol. 24, no. 3–4, pp. 191–203, 2010.
[11] Y. Shekhar, E. Dagur, S. Mishra, and S. Sankaranarayanan, “Intelligent
IoT based automated irrigation system,” Int. J. Appl. Eng. Res., vol. 12,
no. 18, pp. 7306–7320, 2017.
[12] J. Muangprathub, N. Boonnam, S. Kajornkasirat, N. Lekbangpong, A.
Wanichsombat, and P. Nillaor, “IoT and agriculture data analysis for
smart farm,” Comput. Electron. Agric., vol. 156, pp. 467–474, 2019.
[13] K. Jha, A. Doshi, P. Patel, and M. Shah, “A comprehensive review on
automation in agriculture using artificial intelligence,” Artif. Intell.
Agric., vol. 2, pp. 1–12, 2019.
Abstract
Craft manufacturing activities are rapidly developing all over the world as
the global economy grows. Meanwhile, the handcrafted industry is boom-
ing, particularly traditional crafts that have been passed down through gen-
erations. Customers, such as tourists, require more scientific and practical
guidance to ensure that a craft product is genuine in these circumstances.
An Internet of Things (IoT)-based craft system is presented in this chapter
to track the craft product in its life cycle, especially, when it reaches its des-
tination. The embedded system provides suggestions for end-users and the
manufacturing actors using a product identification label. During the process
of manufacturing products and shopping, the collected data are exchanged
between the embedded sensors and the customer’s smartphone. This data will
then be submitted to cloud computing for processing to extract beneficial
guiding information for consumers and artisans. A detailed implementation
of the various system components is also presented in the rest of this chapter.
12.1 Introduction
Along with the increasing growth of the digital domain and the changing busi-
ness environment characterized by large globalization, several companies are
looking for new ways to value their consumers and get a competitive advantage
244 IoT-Based Intelligent Handicraft System Using NFC Technology
The following sections of the chapter are structured as follows: The next
section introduces relevant technologies and work. Section 3 describes the
proposed system design, including a detailed description of several compo-
nents. The system implementation is described in detail in Section 4. Finally,
the last part is for the conclusion and potential future work.
and business processing to convert the collected data into useful information
helping to achieve a balanced knowledge base.
The application layer consists of custom applications that make use of
models established previously in the computing layer. Among these applica-
tions are many that control production, sales, and the manufacturing process
of Craft Hand products.
and constructs. As shown in the reference [18], the automated data analysis
can reveal information about previously undiscovered interactions between
objects, their surroundings, and their users, allowing for the improvement of
their behavior. Real-time data analysis integrated into physical systems, in
particular, has the potential to allow novel types of remote control.
Depending on the rate of data generation, there are two models of data
processing: data may either be stored and analyzed as a batch, or be
processed directly as a stream.
Batch analysis: large amounts of data cannot be examined on a single
server; the data must be distributed over multiple connected storage
devices. Once saved, the data can be examined in batches by distributed
algorithms that perform jobs collaboratively. Each computer in a data
center may have numerous cores that algorithms can leverage for parallel
processing.
Several frameworks support distributed batch data analysis. Hadoop [19]
is one of the most popular. It is considered a scalable data processing
solution for storing and batch-analyzing very large amounts of data
following the MapReduce paradigm.
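The map and reduce steps can be illustrated without any cluster. The following single-process Python sketch only mimics the three phases of the paradigm (map, grouping by key, reduce); it does not use the Hadoop API, and the record fields and product identifiers are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (product_id, amount) pair per transaction record."""
    for record in records:
        yield record["product_id"], record["amount"]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine all values of a key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical batch of customer transactions.
transactions = [
    {"product_id": "rug-01", "amount": 120},
    {"product_id": "pot-07", "amount": 45},
    {"product_id": "rug-01", "amount": 80},
]
totals = reduce_phase(shuffle(map_phase(transactions)))
```

In a distributed framework the map calls run on the machines that hold the data and the grouped keys are spread across reducers, but the logic of each phase is the same as in this sketch.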
Analysis of streaming data: continuously generated data from various
sources (sensors, recommendations, production...) must be analyzed
sequentially and incrementally, record by record. Stream analysis helps the
production unit discover new opportunities, build strong and fast customer
interactions, and find revenue streams that increase profits, and it
supplies the data needed to build a reliable system.
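Record-by-record processing can be sketched with a Python generator that updates its result after every incoming record instead of waiting for a complete batch. This is a minimal illustration only: the feedback records and the `rating` field are hypothetical, and a plain iterator stands in for a real message stream.

```python
def running_average(stream):
    """Consume records one at a time and yield the mean rating so far."""
    count, total = 0, 0.0
    for record in stream:
        count += 1
        total += record["rating"]
        yield total / count


# An iterator stands in for a live feed of customer feedback.
feedback = iter([{"rating": 4}, {"rating": 5}, {"rating": 3}])
averages = list(running_average(feedback))
```

Only the running count and sum are kept in memory, so the same code works for a stream of any length.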
The proposed system uses both scenarios, batch analysis and streaming,
to improve user experience, the performance of craftsmen, and the labeling
process. Batch processing is deployed when the system receives massive
quantities of data from several sources, such as product data input, label
data entry, and transactions performed by customers. To analyze this large
amount of data, we used the Hadoop MapReduce framework, which provides
vast storage capacity for all types of data and analyzes this huge volume
of data at once.
Stream processing is used in the system to analyze client feedback in
real-time allowing data to be sent to the analysis tools as soon as it is cre-
ated along with obtaining fast analysis results. There are several open-source
stream processing systems available. In the suggested system, we chose
Apache Kafka which seeks to provide a uniform, real-time, low-latency sys-
tem for managing data streams.
As the closed-loop PLM is also adopted, the system can detect through
user feedback whether a manufacturer maintains the requirements of the
labeling process. In addition, a recommender system is implemented to assist
users in the process of choosing the products that meet their needs using the
same real-time gathered data such as location and NFC tag data.
for decision making. The processed data is properly shown to consumers via
dynamic layouts as shown in Figure 12.8.
To ensure a reliable and secure connection between the different
architecture modules, this implementation uses REST web services based on
the HTTP methods (GET, POST, PUT, and DELETE) and on JSON as the format
for communication between the server and the smartphone.
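The pairing of HTTP methods with JSON bodies can be sketched with Python's standard library. The example below only prepares request objects and never sends them; the server address, the resource paths, and the payload fields are hypothetical and are not taken from the system described here.

```python
import json
import urllib.request

BASE_URL = "https://example.com/api"  # hypothetical server address


def build_request(method, path, payload=None):
    """Prepare (but do not send) an HTTP request carrying a JSON body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# POST creates a resource from a JSON document; GET reads it back.
create = build_request("POST", "/products",
                       {"tag_id": "04A2", "name": "rug"})
read = build_request("GET", "/products/04A2")
```

Passing a prepared request to `urllib.request.urlopen` would perform the call; PUT and DELETE requests are built in the same way by changing the `method` argument.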
12.5 Conclusion
This chapter has presented the architecture of an intelligent system, with
an early prototype, based on a collection of technologies such as the
Internet of Things and cloud computing, to monitor the lifecycle of
handcrafted items in order to guarantee the authenticity of those types of
products. The design of the proposed system, as described above, ensures
the proper functioning of the handicraft system through several modules for
collecting, analyzing, storing, processing, and transforming data to guarantee
12.6 Acknowledgments
This work was supported by the National Center for Scientific and
Technical Research of Morocco identified by the following number:
Alkhawarizmi/2020/28
References
[1] I. Lee, “The Internet of Things for enterprises: An ecosystem, archi-
tecture, and IoT service business model,” Internet of Things, 2019, doi:
10.1016/j.iot.2019.100078.
[2] D. Kiritsis, “Closed-loop PLM for intelligent products in the era of the
Internet of things,” CAD Computer Aided Design, 2011, doi: 10.1016/j.
cad.2010.03.002.
[3] H. Ministry, “Labeling and certification,” 2021.
https://mtataes.gov.ma/fr/artisanat/qualite-et-innovation/labellisation/
(accessed Nov. 24, 2021).
[4] H. Ministry, “LE LABEL NATIONAL DE L’ARTISANAT DU
MAROC,” 2021. https://label.artisanat.gov.ma/consomateur?lng=fr
(accessed Nov. 24, 2021).
[5] D. Gil, A. Ferrández, H. Mora-Mora, and J. Peral, “Internet of things: A
review of surveys based on context aware intelligent services,” Sensors
(Switzerland). 2016, doi: 10.3390/s16071069.
[6] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A sur-
vey,” Computer Networks, vol. 54, no. 15, pp. 2787–2805, 2010, doi:
10.1016/j.comnet.2010.05.010.
[7] K. Ashton, “That ‘Internet of Things’ Thing,” RFID Journal, 2009.
[8] A. Whitmore, A. Agarwal, and L. Da Xu, “The Internet of Things—A
survey of topics and trends,” Information Systems Frontiers, 2015, doi:
10.1007/s10796-014-9489-2.
13
SoC Power Estimation: A Short Review
Abstract
In recent years, the development of embedded systems has challenged the
electronics industry, driven by strong market demand for ever-evolving appli-
cation functionalities. However, increasing application functionalities require
an additional power budget, which consequently shortens the system’s battery
lifetime. Estimating an application’s power consumption early in the design
process creates an opportunity to extend the battery lifetime. Therefore, accu-
rate and efficient performance analysis and estimation at all levels of abstrac-
tion throughout the design phase are becoming increasingly important. This
chapter examines the concepts of single and multiprocessors, then focuses
on giving a detailed description of the different existing abstraction levels. It
also examines and analyses existing energy estimation techniques. It features
virtual prototyping platforms that combine scalable hardware and software to
estimate and evaluate energy consumption patterns.
13.1 Introduction
When Moore’s Law [1] was first formulated, a chip contained 32 transistors.
Since then, the number of transistors in processors has grown along with
processing speed, to the point where scaling no longer keeps pace with
Moore’s law, which predicts that the number of transistors doubles every two
years. Due to recent breakthroughs in silicon technology, a large number of
transistors can now be housed in a
Figure 13.1 Typical levels of abstraction and design flow phases in SoCs.
13.2 Background
Microprocessor efficiency has improved exponentially in recent years. The
techniques used to achieve parallelism [11] have diversified, starting with
pipelining and ending with multicore processors. In this section, we first
define some key terms to shed more light on how these technologies attempt to
exploit parallelism. Next, we examine the different levels of abstraction
used.
13.2.2.3 Pipelining
CPUs can typically handle only one or two pieces of data at a time. For
example, most CPUs have an instruction that adds A to B and places the result
in C. In principle, the data for A, B, and C could be encoded directly into
the instruction, but in an actual implementation it is seldom that simple.
The data is seldom sent in raw form; instead, it is pointed to by the address
of the memory location that holds it. Decoding this address and getting the
data out of memory takes some time, during which the CPU sits idle waiting
for the requested data to arrive.
Most modern CPUs use a technique called instruction pipelining to reduce the
time lost in these steps: each instruction passes through several subunits
one after the other. The single-cycle datapath is divided into several
stages, so that parts of different instructions are executed simultaneously.
Figure 2.2 shows how pipelining exploits instruction-level parallelism, which
allows many instructions to be in flight concurrently. The result is an
increase in the number of instructions that can be executed in a given amount
of time.
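The throughput gain can be made concrete with the usual textbook idealisation, which is an assumption added here and not a formula from this chapter: every stage takes one cycle and there are no stalls or hazards. The function names are illustrative only.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Without pipelining, each instruction occupies the whole datapath
    for num_stages cycles before the next one can start."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline: the first instruction needs num_stages cycles to
    traverse the pipe, then one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts (1.0 for no work)."""
    if num_instructions == 0:
        return 1.0
    return cycles_unpipelined(num_instructions, num_stages) / cycles_pipelined(
        num_instructions, num_stages
    )


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, cycles_unpipelined(n, 5), cycles_pipelined(n, 5), speedup(n, 5))
```

Under these assumptions the speed-up approaches the number of stages as the instruction count grows; real pipelines fall short of this bound because of hazards and memory latency.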
vector processors can mask the actual number of usable compute units as a
micro-architectural parameter, allowing the program to determine how much
work there is to do. This also allows a single binary to run with
near-optimal performance on cores with varying levels of parallelism.
later, but its specific value will only be determined with each cycle. At this
level, models are written in hardware description languages (HDLs) such as
VHDL, SystemVerilog, or Verilog.
EDA synthesis tools perform the translation to gate level, allowing for
automatic circuit optimization in terms of area, power, and timing. Several
industrial tools, such as Philips’ Petrol [12], allow consumption to be
evaluated at the RTL level. To obtain accurate consumption values, designers
typically use simulation with input stimuli, which minimizes the estimation
error. Compared to the results produced using SPICE [13-14], this error can
range from 10% to 15%.
$$E_T = P_T \times T_{execution} \qquad (13.2)$$
$$I_{leak} = I_s \left( A T^2 e^{(v_1 V_{dd} + v_2)/T} + B e^{(v_3 V_{dd} + v_4)} \right) \qquad (13.4)$$
$$P_{dynamic} = \sum_i S_i \, C_{L_i} \, V_{DD}^2 \, f \qquad (13.7)$$
where $S_i$ and $C_{L_i}$ are the switching activity and load capacitance of
gate $i$, respectively.
Again, the dynamic power consumed can be controlled simply by changing the
frequency-voltage pair, which yields a different energy per instruction at
each operating point. This method was introduced commercially as Intel’s
SpeedStep technology [29] and AMD’s PowerNow [30].
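As a rough numerical illustration of Equation 13.7 and of frequency-voltage scaling, the sketch below evaluates the dynamic power of a few gates at two operating points. The gate values and the two (voltage, frequency) pairs are made-up numbers chosen for the example, and the helper names are hypothetical.

```python
from typing import Sequence


def dynamic_power(switching: Sequence[float], load_caps: Sequence[float],
                  vdd: float, freq: float) -> float:
    """Equation 13.7: sum over gates of S_i * C_Li, times Vdd^2 * f.
    With farads, volts and hertz as inputs the result is in watts."""
    if len(switching) != len(load_caps):
        raise ValueError("one switching activity per load capacitance is required")
    return sum(s * c for s, c in zip(switching, load_caps)) * vdd ** 2 * freq


def energy_per_cycle(switching: Sequence[float], load_caps: Sequence[float],
                     vdd: float, freq: float) -> float:
    """Dynamic energy spent in one clock cycle (joules): power / frequency."""
    return dynamic_power(switching, load_caps, vdd, freq) / freq


if __name__ == "__main__":
    s = [0.5, 0.2, 0.1]            # assumed switching activities
    c = [2e-12, 1e-12, 4e-12]      # assumed load capacitances in farads
    for vdd, f in ((1.2, 1.0e9), (0.9, 0.5e9)):   # hypothetical operating points
        print(vdd, f, dynamic_power(s, c, vdd, f), energy_per_cycle(s, c, vdd, f))
```

Because the energy per cycle depends on the square of the supply voltage and not on the frequency, lowering the voltage from 1.2 V to 0.9 V in this example reduces it by the factor (0.9/1.2)^2, whatever frequency accompanies that voltage.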
13.4.1 WATTCH
WATTCH [5] is one of the first academic tools for system-level consumption
estimation. The first article describing this tool, created at Princeton
University in collaboration with Intel, was published in 2000. WATTCH allows
analysis and optimization of microprocessor performance with a speed-up
factor of 1000 compared to layout-level tools and an error of less than 10%.
The tool provides configurable consumption models for the components of a
microprocessor. Equation 13.8 gives the total consumption as the sum over the
various sub-blocks of a multiprocessor system.
$$E_T = \sum_i E_T(i) \qquad (13.8)$$
Where i defines the type of component and j defines the type of events.
13.4.2 AVALANCHE
One of the most common tools based on virtual prototyping is the
AVALANCHE tool [33], which was the subject of an NEC laboratory publi-
cation in 2002. AVALANCHE offers consumption models for the following
components: cache, memory, and processor.
Equation 13.10 presents the processor model defined by two types of
events:
N0 and N1 are respectively the number of cycles when the processor is idle
and the number of cycles during which the processor is active. They are
obtained from an ISS simulator. E0 and E1 represent the elementary energies.
13.4.3 PowerViP
PowerViP [34] is an experimental consumption-estimation tool implemented in
ViP [35], a TLM-level simulator. It was created by Samsung Electronics in
collaboration with Yonsei University in Korea. As in AVALANCHE, the processor
is modelled by two states: busy and idle.
$$E_T(i) = \sum_j N(i, j) \, e(i, j) \qquad (13.11)$$
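The event-based accounting of Equations 13.8 and 13.11 can be sketched in a few lines. This is an illustration only: the component names, event names, event counts and elementary energies are invented for the example and do not come from any of the tools discussed. A two-state processor model (idle and busy cycles) fits the same scheme with two event types.

```python
from typing import Dict, Tuple

EventKey = Tuple[str, str]  # (component i, event type j)


def component_energy(component: str, counts: Dict[EventKey, int],
                     unit_energy: Dict[EventKey, float]) -> float:
    """Equation 13.11: E_T(i) = sum over j of N(i, j) * e(i, j)."""
    return sum(n * unit_energy[(i, j)]
               for (i, j), n in counts.items() if i == component)


def total_energy(counts: Dict[EventKey, int],
                 unit_energy: Dict[EventKey, float]) -> float:
    """Equation 13.8: E_T = sum over components i of E_T(i)."""
    components = {i for (i, _j) in counts}
    return sum(component_energy(i, counts, unit_energy) for i in components)


if __name__ == "__main__":
    # Hypothetical event counts from a simulation and energies in nanojoules.
    counts = {("cpu", "idle"): 100, ("cpu", "busy"): 900, ("cache", "read"): 300}
    unit_energy = {("cpu", "idle"): 0.1, ("cpu", "busy"): 0.5, ("cache", "read"): 0.2}
    print(component_energy("cpu", counts, unit_energy))
    print(total_energy(counts, unit_energy))
```

The event counts would normally come from the simulator, while the elementary energies come from a separate characterization step.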
is used to monitor and control the simulation. During the simulation, ATMI
calculates the temperature at a regular pace, such as once every
millisecond, based on the power consumption expressed as the power density
of each area. Since ATMI is packaged as a C library, it is called directly
from the SystemC code: calculating the temperature of the components from
the given power densities is simply a function call to the ATMI library. For
example, when the temperature reaches a given threshold, the temperature
sensor invokes a call-back method that triggers an interrupt.
The general formula is expressed by Equation 13.13.
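The sensor mechanism described above (periodic temperature updates and a call-back once a threshold is reached) can be illustrated as follows. This sketch does not use ATMI or SystemC: `toy_thermal_step` is a made-up first-order placeholder standing in for the thermal library call, and every name and constant below is an assumption.

```python
from typing import Callable, List, Tuple


class TemperatureSensor:
    """Calls on_overheat(time_ms, temperature) when a sample reaches the threshold."""

    def __init__(self, threshold_c: float,
                 on_overheat: Callable[[int, float], None]) -> None:
        self.threshold_c = threshold_c
        self.on_overheat = on_overheat

    def sample(self, time_ms: int, temperature_c: float) -> None:
        if temperature_c >= self.threshold_c:
            self.on_overheat(time_ms, temperature_c)  # stands in for the interrupt


def toy_thermal_step(temp_c: float, power_w: float, ambient_c: float = 45.0,
                     r_th: float = 2.0, alpha: float = 0.1) -> float:
    """Placeholder thermal model (NOT ATMI): move a fraction alpha of the way
    towards the steady-state temperature ambient + r_th * power."""
    steady = ambient_c + r_th * power_w
    return temp_c + alpha * (steady - temp_c)


def simulate(power_w: float, threshold_c: float, steps_ms: int) -> List[Tuple[int, float]]:
    """Update the temperature once per simulated millisecond and log interrupts."""
    interrupts: List[Tuple[int, float]] = []
    sensor = TemperatureSensor(threshold_c, lambda t, temp: interrupts.append((t, temp)))
    temp = 45.0  # start at the assumed ambient temperature
    for t_ms in range(1, steps_ms + 1):
        temp = toy_thermal_step(temp, power_w)
        sensor.sample(t_ms, temp)
    return interrupts


if __name__ == "__main__":
    print(simulate(power_w=20.0, threshold_c=80.0, steps_ms=30)[:1])
```

In a real flow, the body of `toy_thermal_step` would be replaced by the call into the thermal library, and the call-back would raise an interrupt in the simulated platform.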
13.5 Discussion
When running tasks on multiprocessor systems-on-chip, the current trend
calls for optimizing legacy applications to use battery resources
efficiently. Estimating the consumption of these applications while they run
therefore helps developers reconsider their device design in the early
stages of development. In general, a power estimation approach relies on
hardware-level or higher-level estimation to predict consumption during the
execution of a task. On the one hand, low-level estimation came first and
targets the level of abstraction that provides the most data. On the other
hand, demand has recently increased for high-level power estimation
simulators, which enable early exploration of the design space.
WATTCH provides a reasonable balance between simulation precision and speed;
however, not all parts of the system are modeled, in particular the
interconnections. The characterization approach is not homogeneous either:
depending on the design of each sub-block, its power consumption is
calculated with one of four different models, which makes the approach
difficult to generalize. Thus, in addition to the missing circuit models,
characterization is also a weak point of WATTCH. AVALANCHE offers a
reasonable balance between accuracy and speed when estimating the energy
consumed by the processor alone; however, the accuracy is slightly degraded
when considering a complete system. In fact, the power consumption of the
system estimated with AVALANCHE does not take into account the
interconnection consumption. The models of the various components are
characterized in different ways, which makes the approach difficult to
generalize.
For the sake of accuracy, the designers of the PowerViP tool decided to use
a different characterization method for each part of the studied platform:
datasheets for the memories and an RTL-level estimation tool for the rest.
For the bus model, PowerViP uses a different architecture under different
conditions. This heterogeneity increases the time needed to build useful
models for hardware components other than the existing ones.
In HSL, the basic principle is to use regression to characterize the
elementary energies. The benefit of this approach is that it is fairly
general. However, for regression-based characterization to be effective,
test vectors (i.e., different applications) must be written, each capable of
activating one component. These test vectors were not provided by the
authors. Moreover, we believe that the activity of several components,
especially the processor and its cache, is difficult to separate, and the
task becomes much harder when a multicore model is involved. Other research
efforts propose an EDPE technique that can be used to choose between
different implementations of the same software application on the same
hardware architecture. Although EDPE can compare the power consumption of
different hardware platforms, or of different software algorithms on the
same hardware platform, it is often only used for homogeneous architectures.
Power and temperature estimates can be captured early by using recent power
and temperature modeling methods, but they still lack accuracy as long as
the static power model is based on a linear equation. The framework used
improves simulation speed because some TLM modules use the DMI (Direct
Memory Interface) technique, which speeds up memory access by providing the
initiator with a pointer to the memory array. The initiator then uses this
pointer directly when accessing memory, rather than creating a transaction
that travels through the bus.
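The idea behind DMI can be illustrated with a toy model in which a bus counts the transactions it forwards while a direct pointer bypasses it. This is a conceptual sketch in Python, not the SystemC TLM-2.0 interface; the class and method names are invented for the example.

```python
class Memory:
    """Target holding the storage array."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)

    def transport(self, address: int, value=None) -> int:
        """Regular path: read one byte, or write it when a value is given."""
        if value is not None:
            self._data[address] = value
        return self._data[address]

    def get_direct_pointer(self) -> memoryview:
        """DMI-like path: hand out a view on the underlying array."""
        return memoryview(self._data)


class Bus:
    """Interconnect that forwards each transaction and counts it."""

    def __init__(self, target: Memory) -> None:
        self.target = target
        self.transactions = 0

    def transport(self, address: int, value=None) -> int:
        self.transactions += 1  # each forwarded access costs simulation work
        return self.target.transport(address, value)


if __name__ == "__main__":
    bus = Bus(Memory(16))
    for addr in range(4):
        bus.transport(addr, addr + 1)       # four transactions through the bus
    ptr = bus.target.get_direct_pointer()   # obtained once by the initiator
    for addr in range(4, 8):
        ptr[addr] = addr + 1                # direct writes, no transaction created
    print(bus.transactions, bus.transport(7))
```

The direct path saves simulation time precisely because the bus model never sees those accesses, which also means that any timing or power accounting attached to the bus has to be handled separately for them.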
include more than one processor, which describes its flexibility [42]. Our
methodology is currently being further explored and our future work is to
validate it using hardware simulations.
13.7 Conclusion
In this chapter, we provided a comprehensive literature review of
performance estimation techniques for single and multi-processors by
presenting the various techniques used and examining their problems,
advantages, and limitations. An analysis was performed for each technique.
Finally, we proposed a new methodology whose results are comparable to, and
sometimes better than, those of the others. It is clear from the discussion
that there is no perfect way of estimating the power of a system-on-chip
design. Accurate estimation depends not only on the architecture or
microarchitecture but also on the proposed model and the way it is
implemented, as well as on the specifics of the language used. Combining
some of the proposed techniques makes it possible to provide an optimal
solution. Future work will assess a large platform containing multiple IPs
and heterogeneous multiprocessors.
References
[1] Shalf J. 2020. The future of computing beyond Moore’s Law. Phil. Trans.
R. Soc. A 378: 20190061. http://dx.doi.org/10.1098/rsta.2019.0061.
Abstract
The design complexity of embedded systems is constantly increasing. For
this reason, robust approaches must be adopted. The principal aims of such
approaches are reducing the development time, achieving the highest possible
performance, and meeting the functional specifications of the system.
Hardware/software co-design (codesign) is among the most powerful
methodologies used to fulfill these requirements. Hardware/Software
Partitioning (HSP) is one of the most important steps in codesign. Its role
is to define the best possible partitions for the hardware and software
functionalities. This chapter focuses on the major algorithms proposed in
the domain of hardware/software partitioning. It sheds light on some
traditional exact algorithms and on several heuristic and meta-heuristic
algorithms. New perspectives are also presented, including proposed
solutions based on powerful techniques such as the Lagrangian relaxation
method, game theory, and the Balas method. Their common objective is to
optimize the parameters involved in the HSP process and to speed up the
algorithms.
14.1 Introduction
An Embedded System (ES) is generally composed of several hardware (HW)
blocks and software (SW) tasks implemented together in one chip. Depending
on the complexity of the system, the SW tasks run either on a single
processor or on multiprocessors. Other than meeting its functional
specifications, an ES faces several challenges related to non-functional
requirements such as the time to market, the execution time, the hardware
cost, the power consumption, and so on. To meet these requirements, new
approaches have to be adopted while designing an ES. One of the most notable
approaches is the codesign methodology. Codesign is composed of a set of
engineering processes grouped into four: the Co-specification process, the
Co-synthesis process, the Co-simulation process, and the Co-verification
process. The Co-specification describes the functionalities of the system
whereas the Co-synthesis defines the architecture of the system. The
Co-simulation step, on the other hand, simulates the hardware and the
software simultaneously before prototyping. Lastly, the Co-verification
phase mathematically verifies whether the specifications of the system are
met. HSP is part of the Co-synthesis and is considered a vital process in
codesign. The objective of HSP is to split the system’s functionalities into
two major sets (hardware set H and software set S), choosing for each
functionality the best possible implementation (HW or SW).
Many studies in the literature consider the following parameters: the
execution time (performance) and the hardware area (cost). For instance, the
solution proposed in [1] was based on the Genetic Algorithm, while the one
in [2] was based on the Tabu Search algorithm. In [3], the Simulated
Annealing algorithm was used to solve the HSP problem. Other studies also
considered power consumption, which is especially important for portable
devices. For example, the algorithms proposed in [4] (Hierarchical
Clustering), [5] and [6] dealt with the problem on the basis of three
parameters: execution time, hardware cost, and power consumption. Other
studies added more parameters by considering, for example, the access to the
shared memory and IP (Intellectual Property) reuse. There are mainly two
families of algorithms: the family of classical algorithms and the family of
modern algorithms. This chapter gives the principles of some of the
significant algorithms of each family, together with their advantages and
disadvantages.
The perspectives present new heuristic algorithms. These perspec-
tives can be grouped into three categories: The first category is based on
the Lagrangian relaxation method. The second category is based on the
GO game, while the third category is built upon the Balas method and 0-1
Knapsack algorithm.
The rest of the chapter is organized as follows: An overview of the
partitioning problem is given in Section 14.2. Section 14.3 reviews the main
exact algorithms used to solve the problem. The solutions based on classical
heuristic algorithms are presented in Section 14.4. Modern proposed
approaches are explained in Section 14.5. The new perspectives are presented
in Section 14.6. Finally, Section 14.7 is dedicated to conclusions and
possible future research.
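$$
\begin{aligned}
\text{minimize} \quad & \sum_{k=1}^{m} \sum_{i=1}^{n} x_{i,k} \, c_{i,k} \\
\text{subject to:} \quad & \sum_{i=1}^{n} x_{i,k} \le h_k, \quad 1 \le k \le m \\
& \sum_{k=1}^{m} x_{i,k} = 1, \quad 1 \le i \le n \\
& x_{i,k} \in \{0,1\}, \quad 1 \le i \le n, \; 1 \le k \le m
\end{aligned}
\qquad (14.1)
$$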
The binary variable $x_{i,k}$ indicates whether block $i$ belongs to the
partition $P_k$, $c_{i,k}$ is the cost of block $i$ if it belongs to $P_k$,
and $h_k$ is the maximum number of blocks that can be part of $P_k$. Tools
like CPLEX can be used to solve ILP problems. An application of the ILP
method to the HSP problem is given in [8].
parents for each individual. Fourth, the crossover is applied to each parent
to generate a new offspring. Fifth, the newly generated offspring is mutated.
The elitism principle is applied to keep the individuals with the highest fit-
ness in the newly generated population. The algorithm is repeated from the
evaluation step until a terminal condition is met.
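The loop described above (evaluation, parent selection, crossover, mutation, elitism, repeat) can be sketched as follows for a partitioning chromosome in which bit i is 1 when block i is implemented in HW. The objective (hardware cost plus a penalty when a deadline is missed), the tournament selection, the one-point crossover and all parameter values are assumptions made for this sketch and are not taken from the cited works.

```python
import random
from typing import List, Sequence


def fitness(chrom: Sequence[int], hw_cost: Sequence[int], sw_time: Sequence[int],
            hw_time: Sequence[int], deadline: int) -> int:
    """Higher is better: negated hardware cost, heavily penalised when the
    summed execution time exceeds the deadline (assumed objective)."""
    cost = sum(c for bit, c in zip(chrom, hw_cost) if bit)
    time = sum(ht if bit else st for bit, ht, st in zip(chrom, hw_time, sw_time))
    return -(cost + 1000 * max(0, time - deadline))


def genetic_partition(hw_cost: Sequence[int], sw_time: Sequence[int],
                      hw_time: Sequence[int], deadline: int, pop_size: int = 20,
                      generations: int = 50, mutation_rate: float = 0.05,
                      seed: int = 0) -> List[int]:
    """Returns the best chromosome found; requires at least two blocks."""
    rng = random.Random(seed)
    n = len(hw_cost)

    def score(chrom: Sequence[int]) -> int:
        return fitness(chrom, hw_cost, sw_time, hw_time, deadline)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=score)          # evaluation
        next_population = [elite[:]]                # elitism: keep the best individual
        while len(next_population) < pop_size:
            parent1 = max(rng.sample(population, 2), key=score)  # tournament selection
            parent2 = max(rng.sample(population, 2), key=score)
            cut = rng.randint(1, n - 1)                          # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_population.append(child)
        population = next_population
    return max(population, key=score)


if __name__ == "__main__":
    print(genetic_partition([5, 3, 8, 2, 6, 4], [9, 4, 7, 3, 8, 5],
                            [2, 1, 3, 1, 2, 2], deadline=25))
```

Because the best individual is always copied into the next generation, the best fitness in the population never decreases from one generation to the next; the fixed seed only makes the run reproducible.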
The GA is adopted to solve the HSP problem in several articles
([14, 15, 16, 17, 18, 19, 20, 21, 22]). The proposed approaches used
different forms of GA to optimize the overall execution time and the total
hardware cost of the system. Numerous approaches combined the GA with
another heuristic algorithm. In [23], the GA is combined with the Tabu
Search algorithm. In [24], it is combined with the Clustering Algorithm with
the objective of minimizing the hardware cost under a constraint on the
execution time.
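$$
\begin{aligned}
\min \; & \left( f_1(x_1, x_2, \ldots, x_n), \; f_2(x_1, x_2, \ldots, x_n), \; \ldots, \; f_m(x_1, x_2, \ldots, x_n) \right) \\
\text{subject to:} \; & g_k(x_1, x_2, \ldots, x_n) \le 0, \quad 1 \le k \le p \\
& x_i \in \{0,1\}, \quad 1 \le i \le n
\end{aligned}
\qquad (14.2)
$$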
$$
\begin{aligned}
& \forall \, i \in \{1, \ldots, m\}, \quad f_i(X_1) \le f_i(X_2) \\
& \exists \, j \in \{1, \ldots, m\}, \quad f_j(X_1) < f_j(X_2)
\end{aligned}
\qquad (14.3)
$$
The set of solutions that are not dominated by the other solutions is called
the set of Pareto-Optimal solutions. In the example of Figure 14.5, the set of
Pareto-Optimal is composed of points: {1,2,4,6}.
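The dominance test of Equation 14.3 and the resulting non-dominated set can be written directly for a minimisation problem. The sample points below are invented (execution time, hardware cost) pairs and are not the data of Figure 14.5.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b when a is no worse on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (execution time, hardware cost) pairs.
    candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 6), (6, 2)]
    print(pareto_front(candidates))
```

A point never dominates itself, since the strict inequality cannot hold, so no special case is needed when a point is compared with itself; the pairwise check is quadratic in the number of points, which is acceptable for the small solution sets typical of a partitioning study.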
Based on Pareto optimality, several algorithms were proposed to deal with
the HSP problem. The articles [49], [50], [51], [52], [53] and [54] are
examples of such approaches.
For optimal path optimization, the model adopted to represent the system is
the same as the one presented in [55]. As shown in Figure 14.6, the basic
scheduling blocks (BSB) are directly derived from functional specifications
represented as data/control flow graphs. Parent blocks can then be treated
as a single block in place of all the child blocks that compose them.
The corresponding computational model, and the formulation of the HW/SW
partitioning problem as finding the shortest path in a directed graph with a
unique entry point and a unique exit point, are represented in Figure 14.7.a
and Figure 14.7.b respectively, as used in [56].
By finding the shortest path from the entry to the exit points, the sys-
tem is automatically split into two sets (HW, SW). A plethora of studies has
adopted this model to solve the HSP problem and have used different algo-
rithms for solving the optimal path problem. Examples of those studies are
presented in [10], [55], [56], [57], [58], [59], [60] and [61].
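To give an intuition of how a shortest path induces a partition, the sketch below uses a deliberately simplified model that is an assumption of this example and not the model of [55] or [56]: the blocks form a chain, each block has an HW cost and an SW cost, and a fixed communication cost is paid whenever two adjacent blocks sit on different sides. Every entry-to-exit path then picks one side per block, and Dijkstra's algorithm finds the cheapest one.

```python
import heapq
from typing import List, Sequence, Tuple


def partition_by_shortest_path(hw_cost: Sequence[float], sw_cost: Sequence[float],
                               comm_cost: float) -> Tuple[float, List[str]]:
    """Returns (total cost, side chosen for each block). Nodes are
    (block index, side); index -1 with side None is the unique entry point.
    All costs are assumed to be non-negative."""
    n = len(hw_cost)
    cost = {"HW": hw_cost, "SW": sw_cost}
    heap = [(0.0, -1, None, ())]
    visited = set()
    while heap:
        dist, i, side, path = heapq.heappop(heap)
        if (i, side) in visited:
            continue
        visited.add((i, side))
        if i == n - 1:              # last block reached: the exit edge costs nothing
            return dist, list(path)
        for nxt in ("HW", "SW"):
            switch = comm_cost if side is not None and nxt != side else 0
            step = cost[nxt][i + 1] + switch
            heapq.heappush(heap, (dist + step, i + 1, nxt, path + (nxt,)))
    return 0.0, []


if __name__ == "__main__":
    print(partition_by_shortest_path([1, 5, 1], [4, 2, 4], comm_cost=2))
    print(partition_by_shortest_path([1, 5, 1], [4, 2, 4], comm_cost=0))
```

With a communication cost of 2, the middle block stays in HW although SW is cheaper for it in isolation, because switching sides twice would cost more than it saves; with no communication cost, each block simply takes its cheaper side.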
For critical path optimization, the system is modeled as a directed acyclic
graph (DAG). The critical path is the longest path in terms of execution
time. The objective is to minimize this critical path while also optimizing
other parameters such as the hardware cost and the power consumption.
In [62], an algorithm combining the Shuffled Frog Leaping Algorithm and a
Greedy Algorithm was proposed to minimize the critical path. In this study,
the hardware cost is defined as a constraint. The approaches presented in
[63] and [64] have the same objectives as the problem described in [62].
In [63], the execution time of the whole system
Figure 14.6 Partitioning model [55]: (a) example of actual data-dependencies between
hardware and software blocks, (b) how data dependencies between adjacent hardware blocks
and software blocks are interpreted in the model.
Figure 14.7 Partitioning model: (a) example of a computational model, (b) corresponding
graph with unique entry and unique exit points.
is defined as the execution time of the critical path in the graph. In [64],
the approach was based on the Brainstorm Optimization Algorithm. Another
approach based on critical path optimization is presented in [65].
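The critical path of a task graph can be computed with one pass in topological order. The sketch below assumes that the execution time is attached to the nodes and that edges carry no delay; the graph and its timings are invented for the example.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def critical_path(exec_time: Dict[str, float],
                  edges: Sequence[Tuple[str, str]]) -> Tuple[float, List[str]]:
    """Returns (length, nodes) of the longest path in a non-empty DAG whose
    nodes carry execution times. Raises ValueError if the graph has a cycle."""
    succ = {u: [] for u in exec_time}
    indeg = {u: 0 for u in exec_time}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    finish = dict(exec_time)              # longest finish time of a path ending at u
    pred = {u: None for u in exec_time}
    queue = deque(u for u in exec_time if indeg[u] == 0)
    visited = 0
    while queue:                          # Kahn's topological traversal
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            if finish[u] + exec_time[v] > finish[v]:
                finish[v] = finish[u] + exec_time[v]
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(exec_time):
        raise ValueError("graph has a cycle")
    node = max(finish, key=finish.get)    # node where the longest path ends
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return length, path[::-1]


if __name__ == "__main__":
    times = {"A": 2, "B": 3, "C": 1, "D": 4}
    deps = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(critical_path(times, deps))
```

A partitioning heuristic can call such a routine after each tentative move of a block between SW and HW, updating that block's execution time and checking whether the critical path became shorter.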
bandwidth, and the shared memory usage. To achieve this goal, two heuristic
approaches with different purposes were proposed. The first approach ([86])
exploited the ability of Balas’ method to minimize an objective function
under several constraint functions. The study used this method to minimize
the hardware area (cost) under given constraints on the other parameters:
the execution time, the power consumption, and the shared memory usage. The
second approach ([87]) had the inverse goal: it aimed to optimize several
metrics (power consumption, execution time, quality, bandwidth, development
time, and shared memory usage) subject to a constraint on one parameter
(hardware cost). The algorithm is based on the total profit obtained when
blocks are implemented in HW. For each block, this profit is calculated as a
weighted sum of the benefits, for each parameter, of implementing the block
in HW rather than in SW. This solution was developed using the Knapsack (KP)
algorithm and was compared to the GA and SA algorithms. Empirical results
demonstrated that the KP method is the best choice for the proposed
approach, as it proved to be faster and to reach better solutions.
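The profit-based formulation maps onto the classical 0-1 knapsack problem: the blocks are the items, the hardware cost is the weight, the hardware budget is the capacity, and the weighted profit is the value. The sketch below uses the standard dynamic-programming solution with integer hardware costs; the weights, benefits and budget are invented, and the code is not the implementation of [87].

```python
from typing import Dict, List, Sequence, Tuple


def block_profit(benefits: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-parameter benefits of moving a block to HW."""
    return sum(weights[name] * value for name, value in benefits.items())


def knapsack_partition(profits: Sequence[float], hw_costs: Sequence[int],
                       budget: int) -> Tuple[float, List[int]]:
    """Returns (best total profit, indices of the blocks placed in HW)
    such that the summed hardware cost does not exceed the budget."""
    n = len(profits)
    # best[i][b]: best profit using the first i blocks with hardware budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                 # block i-1 stays in SW
            if hw_costs[i - 1] <= b:
                candidate = best[i - 1][b - hw_costs[i - 1]] + profits[i - 1]
                if candidate > best[i][b]:
                    best[i][b] = candidate              # block i-1 goes to HW
    chosen, b = [], budget
    for i in range(n, 0, -1):                           # trace the decisions back
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= hw_costs[i - 1]
    return best[n][budget], sorted(chosen)


if __name__ == "__main__":
    print(block_profit({"time": 4.0, "power": 2.0}, {"time": 0.5, "power": 1.0}))
    print(knapsack_partition([10, 7, 4, 3], [5, 4, 3, 2], budget=7))
```

The table has (n + 1) x (budget + 1) entries, so the running time grows with the numeric value of the hardware budget; this is acceptable when area is expressed in coarse units.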
14.7 Conclusion
This chapter studied the hardware/software partitioning process. First, it
presented the problem and then studied the different algorithms used to
tackle it. Some of the proposed algorithms were based on exact algorithms,
while the others were based on heuristic and meta-heuristic algorithms. Most
of the proposed approaches studied the HSP problem with two parameters: the
hardware cost and the performance (overall execution time). Some approaches
studied multiobjective optimization by considering the power consumption as
well. The new perspectives studied the partitioning problem while taking
into consideration different major aspects such as the criteria (the
hardware cost, the execution time, the power consumption, etc.), the graph
modeling of the system, and the software architecture (single-processor or
multi-processor). The proposed algorithms were based on the Lagrangian
relaxation method, game theory, and the Balas method. Each of the proposed
algorithms was compared to the Simulated Annealing algorithm and the Genetic
Algorithm. Empirical results showed that the proposed algorithms converged
quickly and led to better solutions. In future work, as the software
architecture plays an increasingly important role in system design, the goal
would be to study the task scheduling problem and the optimization of access
to the shared memory.
References
[1] Shuai Guo Li, Fu Jin Feng, Hua Jun Hu, Cong Wang, and Duo Qi.
Hardware/software partitioning algorithm based on genetic algorithm.
In Journal of Computers, volume 9. 2014.
[2] Lin Geng, Zhu Wenxing, and Ali Montaz. A tabu search-based memetic
algorithm for hardware/software partitioning. In Mathematical Problems
in Engineering, pages 1309–1315. 2014.
[3] Sudarshan Banerjee and Nikil Dutt. Very fast simulated annealing for
hw-sw partitioning. In Technical Report, CECS-TR-04-17. 2004.
[4] Jorg Henkel. A low power hardware/software partitioning approach for
core-based embedded systems. In Proceedings 1999 Design Automation
Conference (Cat. No. 99CH36361), pages 122–127. IEEE, 1999.
[5] Wenjun Shi, Jigang Wu, Siew-kei Lam, and Thambipillai Srikanthan.
Algorithms for bi-objective multiple-choice hardware/software parti-
tioning. In Computers & Electrical Engineering, volume 50, pages 127–
142. 2016.
[6] Edwin Sha, Li Wang, Qingfeng Zhuge, Jun Zhang, and Jing Liu. Power
efficiency for hardware/software partitioning with time and area con-
straints on mpsoc. In International Journal of Parallel Programming,
volume 43, pages 381–402. 2015.
[7] Wayne Wolf. A decade of hardware/software codesign. Computer,
36(4):38–43, 2003.
[8] Ralf Niemann and Peter Marwedel. Hardware/software partitioning
using integer programming. In Proceedings of the 1996 European
Conference on Design and Test, pages 473–, 1996.
[9] P. V. Knudsen and J. Madsen. Pace: a dynamic programming algorithm
for hardware/software partitioning. In Proceedings of 4th International
Workshop on Hardware/Software Co-Design. Codes/CASHE ’96, pages
85–92, 1996.
[10] Jigang Wu and Thambipillai Srikanthan. Low-complex dynamic pro-
gramming algorithm for hardware/software partitioning. Inf. Process.
Lett., 98(2):41–46, 2006.
[11] Jens Clausen. Branch and bound algorithms-principles and examples.
Department of Computer Science, University of Copenhagen, pages
1–30, 1999.
[12] Mann Zoltan Adam, Andras Orban, and Peter Arato. Finding optimal
hardware/software partitions. In Form. Methods Syst. Des., volume 31,
pages 241–263. 2007.
[13] Wu Jigang, Baofang Chang, and Thambipillai Srikanthan. A hybrid
branch-and-bound strategy for hardware/software partitioning. In
[25] Petru Eles, Zebo Peng, Krzysztof Kuchcinski, and Alexa Doboli.
Hardware/software partitioning with iterative improvement heuris-
tics. In Proceedings of the 9th International Symposium on System
Synthesis, pages 71–. 1996.
[26] Xibin Zhao, Hehua Zhang, Yu Jiang, Songzheng Song, Xun Jiao, and
Ming Gu. An effective heuristic-based approach for partitioning. Journal
of Applied Mathematics, 2013, 2013.
[27] Peng Liu, Jigang Wu, and Yongji Wang. Hybrid algorithms for hard-
ware/software partitioning and scheduling on reconfigurable devices.
Mathematical and Computer Modelling, 58(1-2):409–420, 2013.
[28] Jigang Wu, Pu Wang, Siew-Kei Lam, and Thambipillai Srikanthan.
Efficient heuristic and tabu search for hardware/software partitioning.
The Journal of Supercomputing, 66(1):118–134, 2013.
[29] Lanying Li and Min Shi. Software-hardware partitioning strategy using
hybrid genetic and tabu search. In 2008 International Conference on
Computer Science and Software Engineering, volume 4, pages 83–86.
IEEE, 2008.
[30] MC Bhuvaneswari and M Jagadeeswari. Hardware/software partition-
ing for embedded systems. In Application of Evolutionary Algorithms
for Multi-objective Optimization in VLSI and Embedded Systems,
pages 21–36. 2015.
[31] G. Lin. An iterative greedy algorithm for hardware/software partition-
ing. In 2013 Ninth International Conference on Natural Computation
(ICNC), pages 777–781. 2013.
[32] Joon Edward Sim, Tulika Mitra, and Weng-Fai Wong. Defining neigh-
borhood relations for fast spatial-temporal partitioning of applica-
tions on reconfigurable architectures. In Proceedings of International
Conference on ICECE Technology, pages 121–128, 2009.
[33] Frank Vahid. Modifying min-cut for hardware and software functional
partitioning. In Proceedings of 5th International Workshop on Hardware/
Software Co Design. Codes/CASHE’97, pages 43–48. IEEE, 1997.
[34] Frank Vahid and Thuy Dm Le. Extending the kernighan/lin heuristic for
hardware and software functional partitioning. Design automation for
embedded systems, 2(2):237–261, 1997.
[35] Zoltan Mann, Andras Orban, and Viktor Farkas. Evaluating the kernighan-lin
heuristic for hardware/software partitioning. International Journal of
Applied Mathematics and Computer Science, 17(2):249–267, 2007.
[36] Dian Palupi Rini, Siti Mariyam Shamsuddin, and Siti Yuhaniz. Particle
swarm optimization: Technique, system and challenges. International
Journal of Computer Applications, 1, 09 2011.
[37] Amin Farmahini Farahani, Mehdi Kamal, Seid Mehdi Fakhraie, and
Saeed Safari. Hw/sw partitioning using discrete particle swarm. In
Proceedings of the 17th ACM Great Lakes symposium on VLSI, pages
359–364, 2007.
[38] MB Abdelhalim and SED Habib. Particle swarm optimization for hw/
sw partitioning. Particle Swarm Optimization, pages 49–76, 2009.
[39] Mohamed B Abdelhalim, AE Salama, and SE-D Habib. Constrained
and unconstrained hardware-software partitioning using particle
swarm optimization technique. In Embedded System Design: Topics,
Techniques and Trends, pages 207–220. Springer, 2007.
[40] Xiaohu Yan, Fazhi He, Neng Hou, and Haojun Ai. An efficient parti-
cle swarm optimization for large-scale hardware/software co-design
system. International Journal of Cooperative Information Systems,
27(01):1741001, 2018.
[41] Alakananda Bhattacharya, Amit Konar, Swagatam Das, Crina Grosan,
and Ajith Abraham. Hardware software partitioning problem in embed-
ded system design using particle swarm optimization algorithm. In
2008 International Conference on Complex, Intelligent and Software
Intensive Systems, pages 171–176. IEEE, 2008.
[42] Marco Dorigo, Mauro Birattari, and Thomas Stutzle. Ant colony optimi-
zation. IEEE computational intelligence magazine, 1(4):28–39, 2006.
[43] Christian Blum. Ant colony optimization: Introduction and recent
trends. Physics of Life Reviews, 2(4):353–373, 2005.
[44] Fabrizio Ferrandi, Pier Luca Lanzi, Christian Pilato, Donatella Sciuto, and
Antonino Tumeo. Ant colony optimization for mapping, scheduling and
placing in reconfigurable systems. In 2013 NASA/ESA Conference on
Adaptive Hardware and Systems (AHS-2013), pages 47–54. IEEE, 2013.
[45] Gang Wang, Wenrui Gong, and Ryan Kastner. Application partitioning
on programmable platforms using the ant colony optimization. Journal
of Embedded Computing, 2(1):119–136, 2006.
[46] Leandro Nunes de Castro and Jonathan Timmis.
Artificial immune systems: a new computational intelligence approach.
Springer Science & Business Media, 2002.
[47] S. Kellie and Z. Al-Mansour. Chapter four - overview of the immune
system. In Mariusz Skwarczynski and Istvan Toth, editors, Micro
and Nanotechnology in Vaccine Development, pages 63–81. William
Andrew Publishing, 2017.
[48] Yiguo Zhang, Wenjian Luo, Zeming Zhang, Bin Li, and Xufa Wang.
A hardware/software partitioning algorithm based on artificial immune
principles. Applied Soft Computing, 8(1):383–391, 2008.
[49] Jiang Hong, Yang Meng-fei, Zhang Shao-lin, and Wang Ruo-chuan. A
new method for multi-objective optimization problem. In 2013 IEEE 4th
International Conference on Electronics Information and Emergency
Communication, pages 209–212. IEEE, 2013.
[50] Cagkan Erbas, Selin C Erbas, and Andy D Pimentel. A multiobjective opti-
mization model for exploring multiprocessor mappings of process networks.
In Proceedings of the 1st IEEE/ACM/IFIP international conference on
Hardware/software codesign and system synthesis, pages 182–187, 2003.
[51] M Jagadeeswari and MC Bhuvaneswari. Efficient multiobjective
genetic algorithm for hardware-software partitioning in embedded sys-
tem design: Enga. International journal of computer applications in
technology, 36(3-4):181–190, 2009.
[52] Yang Liu and Qing Cheng Li. Hardware software partitioning using
immune algorithm based on pareto. In 2009 International Conference
on Artificial Intelligence and Computational Intelligence, volume 2,
pages 176–180. IEEE, 2009.
[53] Cagkan Erbas, Selin Cerav-Erbas, and Andy D Pimentel. Multiobjective
optimization and evolutionary algorithms for the application mapping
problem in multiprocessor system-on-chip design. IEEE Transactions
on Evolutionary Computation, 10(3):358–374, 2006.
[54] Anup Kumar Das, Akash Kumar, Bharadwaj Veeravalli, and Francky
Catthoor. Reliability and energy-aware codesign of multiprocessor
systems. In Reliable and Energy Efficient Streaming Multiprocessor
Systems, pages 75–101. Springer, 2018.
[55] Jan Madsen, Jesper Grode, Peter Voigt Knudsen, Morten Elo Petersen,
and Anne Haxthausen. Lycos: The lyngby co-synthesis system. Design
Automation for Embedded Systems, 2(2):195–235, 1997.
[56] Ji-Gang Wu, Thambipillai Srikanthan, and Guang-Wei Zou. New model
and algorithm for hardware/software partitioning. Journal of Computer
Science and Technology, 23(4):644–651, 2008.
[57] Wu Jigang and Srikanthan Thambipillai. A branch-and-bound
algorithm for hardware/software partitioning. In Proceedings of the Fourth
IEEE International Symposium on Signal Processing and Information
Technology, pages 526–529. IEEE, 2004.
[58] Jigang Wu, Thambipillai Srikanthan, and Chengbin Yan. Algorithmic
aspects for power-efficient hardware/software partitioning. Mathematics
and Computers in Simulation, 79(4):1204–1215, 2008.
[59] Wenjun Shi, Jigang Wu, Siew-kei Lam, and Thambipillai Srikanthan.
Algorithms for bi-objective multiple-choice hardware/software
partitioning. Computers & Electrical Engineering, 50:127–142, 2016.
[72] Hokchhay Tann, Soheil Hashemi, R Iris Bahar, and Sherief Reda.
Hardware-software codesign of accurate, multiplier-free deep
neural networks. In 2017 54th ACM/EDAC/IEEE Design Automation
Conference (DAC), pages 1–6. IEEE, 2017.
[73] Mingxuan Yuan, Xiuqiang He, and Zonghua Gu. Hardware/software
partitioning and static task scheduling on runtime reconfigurable FPGAs
using a SMT solver. In 2008 IEEE Real-Time and Embedded Technology
and Applications Symposium, pages 295–304. IEEE, 2008.
[74] Yuchun Ma, Jinglan Liu, Chao Zhang, and Wayne Luk. HW/SW
partitioning for region-based dynamic partial reconfigurable FPGAs. In 2014
IEEE 32nd International Conference on Computer Design (ICCD),
pages 470–476. IEEE, 2014.
[75] Ihsen Alouani, Braham L Mediouni, and Smail Niar. A multi-objective
approach for software/hardware partitioning in a multi-target tracking
system. In 2015 International Symposium on Rapid System Prototyping
(RSP), pages 119–125. IEEE, 2015.
[76] Naman Govil and Shubhajit Roy Chowdhury. GMA: a high speed
metaheuristic algorithmic approach to hardware software partitioning
for low-cost SoCs. In 2015 International Symposium on Rapid System
Prototyping (RSP), pages 105–111. IEEE, 2015.
[77] Khetatba Mourad and Rachid Boudour. A modified binary
firefly algorithm to solve hardware/software partitioning problem.
Informatica, 45, 2021. doi:10.31449/inf.v45i7.3408.
[78] Tiong Xian, Zaini Halim, Ching Leong, and Tan Gim.
Hardware-software partitioning using three-level hybrid algorithm
for system-on-chip platform. Bulletin of Electrical Engineering and
Informatics, 10:466–473, 2021. doi:10.11591/eei.v10i1.2201.
[79] Adil Iguider, Mouhcine Chami, Oussama Elissati, and Abdeslam
En-Nouaary. Embedded systems HW/SW partitioning based on Lagrangian
relaxation method. In Proceedings of the Mediterranean Symposium
on Smart City Applications, pages 149–160. Springer, 2017.
[80] Adil Iguider, Kaouthar Bousselam, Abdeslam En-Nouaary, Oussama
Elissati, and Mouhcine Chami. A novel approach for hardware software
partitioning in embedded systems. In 2019 International Conference
on Wireless Technologies, Embedded and Intelligent Systems (WITS),
pages 1–5. IEEE, 2019.
[81] Adil Iguider, Oussama Elissati, Mouhcine Chami, and Abdeslam
En-Nouaary. An efficient HW/SW partitioning algorithm for power
optimization in embedded systems. In 2018 International Symposium on
Index

I
Intelligent Handicraft 243, 248
Internet of Things 80, 90–91, 206, 211, 213, 228, 239–240, 243–247, 255–257
IoT 211–213, 215–216, 220–221, 227–229, 234, 239–240
ISO26262 3, 5, 7, 9–11, 14–15, 17

L
LabVIEW software 149, 153, 156, 158
LMS and NLMS algorithm 169, 171, 172, 178

M
Monitoring 235, 239–240
Multi-cores CPU 103

N
NFC tag 249–250, 253, 255

R
Raspberry Pi4 183, 185, 190–191, 196, 198, 200, 205
Real-time monitoring 98, 135, 143, 247

S
Safety Design 19, 30, 32
Sensors 97, 99–100
Signal processing 153, 156, 162–164
Software Architecture 22, 26, 31
SoPC design 176
Surface electromyography signal 162
System on Chip 262, 277

V
Vivado HLS 167–170, 173, 176–179

About the Editor