Robotics, Control and Computer Vision
Hariharan Muthusamy · János Botzheim · Richi Nayak (Editors)
Select Proceedings of ICRCCV 2022
Lecture Notes in Electrical Engineering
Volume 1009
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Yong Li, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering—quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Editors

Hariharan Muthusamy
National Institute of Technology Uttarakhand
Srinagar, India

János Botzheim
Eötvös Loránd University
Budapest, Hungary

Richi Nayak
Queensland University of Technology
Brisbane, QLD, Australia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Contents
Computer Vision
Challenges and Opportunity for Salient Object Detection in COVID-19 Era: A Study . . . . . 3
Vivek Kumar Singh and Nitin Kumar
Human Activity Recognition Using Deep Learning . . . . . 15
Amrit Raj, Samyak Prajapati, Yash Chaudhari, and Ankit Kumar Rouniyar
Recovering Images Using Image Inpainting Techniques . . . . . 27
Soureesh Patil, Amit Joshi, and Suraj Sawant
Literature Review for Automatic Detection and Classification of Intracranial Brain Hemorrhage Using Computed Tomography Scans . . . . . 39
Yuvraj Singh Champawat, Shagun, and Chandra Prakash
A Pilot Study for Profiling Diabetic Foot Ulceration Using Machine Learning Techniques . . . . . 67
Irena Tigga, Chandra Prakash, and Dhiraj
A Deep Learning Approach for Gaussian Noise-Level Quantification . . . . . 81
Rajni Kant Yadav, Maheep Singh, and Sandeep Chand Kumain
Performance Evaluation of Single Sample Ear Recognition Methods . . . . . 91
Ayush Raj Srivastava and Nitin Kumar
AI-Based Real-Time Monitoring for Social Distancing Against COVID-19 Pandemic . . . . . 103
Alok Negi, Krishan Kumar, Prachi Chauhan, Parul Saini, Shamal Kashid, and Ashray Saini
About the Editors
János Botzheim earned his M.Sc. and Ph.D. degrees from the Budapest University of Technology and Economics in 2001 and 2008, respectively. He joined the Department of Automation at Széchenyi István University, Győr, Hungary, in 2007 as a senior lecturer, becoming an assistant professor in 2008 and an associate professor in 2009. He was a visiting researcher at the Graduate School of System Design of Tokyo Metropolitan University from September 2010 to March 2011 and from September 2011 to February 2012, and an associate professor there from April 2012 to March 2017. He was an associate professor in the Department of Mechatronics, Optics, and Mechanical Engineering Informatics at the Budapest University of Technology and Economics from February 2018 to August 2021, and has been Head of the Department of Artificial Intelligence at the Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary, since September 2021. His research interests include computational intelligence, automatic identification of fuzzy rule-based and neural network models, bacterial evolutionary algorithms, memetic algorithms, applications of computational intelligence in robotics, and cognitive robotics. He has published about 180 papers in journals and conference proceedings.
Richi Nayak is the Leader of the Applied Data Science Program at the Centre for Data Science and a Professor of Computer Science at Queensland University of Technology, Brisbane, Australia. She has a driving passion to address pressing societal problems by innovating in Artificial Intelligence, underpinned by fundamental research in machine learning, data mining, and text mining. Her research has produced novel solutions to industry-specific problems in Marketing, K-12 Education, Agriculture, Digital Humanities, and Mining. She has made multiple advances in social media mining, deep neural networks, multi-view learning, matrix/tensor factorization, clustering, and recommender systems, and has authored over 180 high-quality refereed publications. Her research leadership is recognized by multiple best paper awards and nominations at international conferences, QUT Postgraduate Research Supervision awards, and the 2016 Women in Technology (WiT) Infotech Outstanding Achievement Award in Australia. She holds a Ph.D. in Computer Science from the Queensland University of Technology and a Master's in Engineering from IIT Roorkee.
Computer Vision
Challenges and Opportunity for Salient
Object Detection in COVID-19 Era:
A Study
1 Introduction
Humans can identify visually informative regions of a scene effortlessly and rapidly, based on perceived distinctive features. These regions carry rich information about the objects depicted in an image. Salient Object Detection (SOD) aims to highlight important objects or regions and suppress background regions in an image. SOD methods transform an input image into a probability map, called a saliency map [1], that expresses how strongly each image element (pixel or region) attracts human attention. An example of salient object detection is illustrated in Fig. 1. SOD is widely applied as a pre-processing step in computer vision applications such as object detection [4, 5], video summarization [6], and image retrieval [7].
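As a concrete illustration of the saliency-map representation described above, the sketch below normalizes an arbitrary per-pixel score map into a [0, 1] saliency map and thresholds it into a binary foreground mask. The helper functions are hypothetical, not part of any cited method; real SOD methods differ in how the scores themselves are computed.

```python
import numpy as np

def to_saliency_map(scores):
    """Normalize an arbitrary per-pixel score map into a [0, 1] saliency map."""
    scores = scores.astype(np.float64)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)  # flat input: nothing stands out
    return (scores - lo) / (hi - lo)

def binarize(saliency, threshold=0.5):
    """Threshold a saliency map into a binary mask (1 = salient, 0 = background)."""
    return (saliency >= threshold).astype(np.uint8)

scores = np.array([[0.2, 3.0], [1.0, 0.0]])
smap = to_saliency_map(scores)  # values in [0, 1]
mask = binarize(smap)           # [[0, 1], [0, 0]]
```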
Coronavirus disease (COVID-19) is an infectious disease [8–10] that has posed several challenges to salient object detection; for example, face detection performance drops when people wear face masks. The disease spreads quickly from person to person around the world. It is caused by the virus SARS-CoV-2, a member of the coronavirus family that can produce severe acute respiratory syndrome. Common clinical features of COVID-19 are fever, dyspnea, cough, myalgia, and headache [11]. The most common diagnostic tool for COVID-19 is the reverse-transcription polymerase chain reaction (RT-PCR) test. Further, chest radiological imaging, including computed tomography (CT) and X-ray, plays an important role in the early diagnosis and treatment of this disease [12], and researchers are working on detecting infected patients through medical image processing of X-rays and CT scans [13].

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_1

Fig. 1 An example of the salient object detection process: a input image, b saliency map [3], and c ground truth

Fig. 2 A motivational example of this study: a input image, b saliency map obtained from the Graph-Based Manifold Ranking (GMR) [31] method, and c ground truth

The pandemic has also affected human lifestyles, including education, office work, transportation, economic activities, etc. Therefore, our main motivation is to examine the impact of the virus on salient object detection performance and the applicability of salient object detection to controlling its spread. Figure 2 shows a motivational example of this study: the input image contains a person wearing a face mask, and the saliency map fails to highlight the masked region of the face. The purpose of this work is to analyze the effectiveness of saliency detection on images generated around current human activities. In this study, we also propose a dataset used to validate the challenges we suggest for salient object detection due to COVID-19.
The rest of this paper is structured as follows. Section 2 reviews related work on salient object detection methods and on COVID-19. Section 3 presents a detailed discussion of the challenges and opportunities for salient object detection in the COVID-19 era. The suggested challenges are evaluated and analyzed in Sect. 4. Finally, conclusions and future work are presented in Sect. 5.
2 Related Work
A large number of salient object detection methods have been reported in litera-
ture. These methods are broadly categorized into two categories: bottom-up meth-
ods and top-down methods. Bottom-up salient object detection methods utilize the
appearance contrasts between objects and their surrounding regions in the image.
The earliest bio-inspired bottom-up saliency method was proposed by Itti et al. [1]; it extracts three low-level visual features (luminance, color, and orientation) and exploits center-surround mechanisms to compute the saliency maps. Achanta et al. [14] proposed a simple and efficient saliency detection approach that computes the saliency value of each pixel by subtracting the Gaussian-blurred version of the image from the mean pixel value of the image. Goferman
et al. [15] presented four principles, namely, local low-level features, global consid-
erations, visual organizational rules, and high-level factors to compute saliency maps.
Perazzi et al. [16] suggested a saliency detection method based on color contrast.
Cheng et al. [17] proposed a global contrast-based saliency computation approach
which utilizes Histogram-based Contrast (HC) and Region-based Contrast (RC) for
saliency estimation. Liu and Yang [18] proposed a saliency detection method that exploits color volume and perceptually uniform color differences, combining foreground, center, and background saliency to obtain the final saliency map. Top-down
salient object detection methods calculate the saliency values with the help of high-
level priors. Gao et al. [19] computed saliency values of interest points by their
mutual information and extracted discriminant features. Yang et al. [20] proposed a saliency detection method that jointly learns a Conditional Random Field (CRF) and a dictionary for saliency map generation. Jiang et al. [21] suggested a saliency estimation method that effectively integrates a shape prior into an iterative energy minimization framework. Recently, convolutional neural networks (CNNs) have drawn great attention of
computer vision researchers. Wang et al. [22] presented saliency detection method
that employed two different deep networks to compute the saliency maps. Wang et
al. [23] proposed the PAGE-Net for saliency calculation. Ren et al. [24] suggested
the CANet, which has combined high-level semantic and low-level boundary infor-
mation for salient object detection. Currently, computer vision and machine learning
approaches have been rapidly applied for Coronavirus disease-2019 (COVID-19)
detection. Ozturk et al. [25] proposed an automatic COVID-19 detection model that uses deep neural networks to detect and classify COVID-19 from X-ray images. Waheed et al. [26]
proposed an Auxiliary Classifier Generative Adversarial Network (ACGAN) called
CovidGAN which has produced synthetic chest X-ray (CXR) images. Fan et al. [27]
suggested a novel COVID-19 lung CT infection segmentation network called Inf-Net.
Zhou et al. [28] presented a fully automatic, rapid, accurate, and machine-agnostic
method for identifying the infection regions on CT scans. Wang et al. [29] suggested
a novel noise-robust framework to learn from noisy labels for the segmentation. A
summary of the recent research works for object detection during COVID-19 is given
in Table 1.
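Among the bottom-up methods surveyed above, the frequency-tuned approach of Achanta et al. [14] is simple enough to sketch. The version below is a simplified single-channel rendition of the idea (the original operates on the blurred CIELab channels of the image); the function name and parameters are illustrative, not the authors' reference implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frequency_tuned_saliency(image, sigma=1.0):
    """Simplified frequency-tuned saliency: the per-pixel distance between
    the Gaussian-blurred image and the image's global mean value."""
    image = image.astype(np.float64)
    mean_value = image.mean()                # global mean feature
    blurred = gaussian_filter(image, sigma)  # suppress high-frequency noise
    saliency = np.abs(mean_value - blurred)  # contrast against the mean
    if saliency.max() > 0:
        saliency = saliency / saliency.max()  # normalize to [0, 1]
    return saliency

# A bright square on a dark background receives high saliency values.
img = np.zeros((32, 32))
img[12:20, 12:20] = 255.0
smap = frequency_tuned_saliency(img)
```

Because the square differs strongly from the image mean while the background does not, the interior of the square scores much higher than the corners.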
3.1 Challenges
The first challenging scenario is image complexity, where the appearance (e.g., color and texture) of foreground and background regions is similar. This is difficult for salient object detection methods because many of them exploit color and texture as the distinctive features for assigning a saliency value to each image element. If foreground and background regions share similar features, such methods may fail to highlight salient regions and suppress the background. Secondly, saliency detection is very challenging in real-time images in which the target object is partially hidden by other objects. This scenario is known as the occlusion problem in natural images. Saliency detection methods may fail to identify an object that is partially blocked by other objects.
Figure 3 shows various visual challenges for salient object detection in natural images. Similar color and texture of foreground and background regions in complex natural scenes are shown in Fig. 3a: an owl sits in a location whose surroundings are homogeneous with its plumage, so the saliency detection task struggles to identify the bird. Partial occlusion in real-time images is depicted in Fig. 3b: parts of a cow's body are blocked by wooden poles. The images are taken from the PASCAL-S [30] dataset; in this scene the cow is the target object whose salient regions are to be identified, but the methods may detect it only partially. Figure 2a illustrates the effect of the coronavirus on a real image of a human: a man wears a white face mask that does not resemble human skin. This is a case of partial occlusion, where the human face is partially hidden by the face mask.
Moreover, the face mask exhibits a higher center-surround difference than the targeted object (i.e., the man). Hence, salient object detection methods may identify the face mask as the important object instead of the man. This makes it challenging for salient object detection methods to perform well on visual data generated in the COVID-19 era. The pandemic has changed the appearance of real-time visual imagery of human life. For example, people now wear Personal Protective Equipment (PPE), which includes face masks, gloves, gowns, head covers, shoe covers, etc., to protect themselves from COVID-19. Images taken in public places capture human faces partially blocked by face masks. This situation can be considered an occlusion problem in natural images. It poses a challenge to computer vision applications, most of which fail to identify a face hidden behind a mask; it is likewise challenging for salient object detection methods to uniformly highlight the human face. In addition, PPE can appear visually similar to the surrounding environment in terms of color and texture, so any object identification application can easily be misled into identifying the wrong objects in an image.
Further, COVID-19 has also affected the visual appearance of groups of people because of social distancing in public places. People often capture group images such as the one shown in Fig. 4, adopted from the PASCAL-S [30] dataset. In this image, all the people together form one object, and salient object detection methods can easily detect it as a salient object. Today, however, people in group images maintain a minimum defined distance, popularly known as social distancing. This may degrade the performance of salient object detection: the target object is all the people in the image, yet saliency detection methods may detect only some of them. A summary of these challenges is also given in Table 2.
Table 2 Challenges and opportunities for salient object detection in COVID-19 era

1. Challenge: Low contrast between foreground and background.
   Reason: People are wearing Personal Protective Equipment (PPE), which may be similar to the surrounding environment.
   Opportunity: Need to develop SOD methods that work better in low-contrast situations.

2. Challenge: Occlusion of the human face in real-time images.
   Reason: To fight COVID-19, people are wearing face masks, which create high contrast between facial skin and mask in terms of color and texture.
   Opportunity: SOD methods that address the occlusion problem effectively.

3. Challenge: A group of objects may not be detected simultaneously.
   Reason: People no longer stand close together, due to the social distancing rule implemented to control transmission of the coronavirus. In group images, each person is therefore treated as an individual object, although the significance of the image lies in capturing all the people present.
   Opportunity: SOD methods that can detect multiple objects at a distance.

4. Challenge: Saliency detection methods may be misguided by protective gear into highlighting non-salient regions as salient.
   Reason: The face mask may become a more important object than the human in the image, whereas the photographer captured the image with the human as the target object.
   Opportunity: Intelligent SOD methods are required to detect the actual salient object in an image.

5. Challenge: Keeping an eye on student activity in online teaching.
   Reason: It is difficult for an instructor to monitor students in an online class because there is no direct interaction.
   Opportunity: SOD methods are required that can keep an eye on student activities.
3.2 Opportunities

The COVID-19 period has emerged as a great opportunity for computer vision researchers to contribute to battling the disease, and this extends to salient object detection. To help combat COVID-19, salient object detection methods need to address the challenges discussed in Sect. 3.1. In this section, we discuss research opportunities and directions for handling these challenges. A low-contrast image has a similar appearance of foreground and background regions; such images arise during COVID-19 because people wear Personal Protective Equipment (PPE) whose color and texture may resemble the surrounding environment. This scenario offers an opportunity to discover visual features with the discriminative capability to separate foreground from background regions in the input image. The partial occlusion problem also arises in the COVID-19 environment because people wear face masks; this may reduce salient object detection performance, as partial occlusion is a challenging scenario for saliency detection. Consequently, researchers have an opportunity to introduce saliency detection approaches that deal with partial occlusion more effectively.
During COVID-19, people follow social distancing, which affects the visual appearance of groups: people are scattered across the whole image, and it becomes very difficult to identify all the humans present for salient object detection. This is an opportunity to devise methodologies that handle multiple-object detection in a scene. Furthermore, the education system faces a major problem during the pandemic: educational institutions conduct their classes on online platforms, where managing class behavior is very challenging for the instructor. Visual data arrive from many sources, making it difficult to identify which visuals are important. This is yet another opportunity: identifying salient regions from different sources of visual data. A summary of these opportunities is also given in Table 2.
4 Experimental Result
Fig. 5 Qualitative study on sample images of the proposed dataset. The first row shows the original images; GMR [31] and FF-SVR [32] saliency maps are depicted in the second and third rows, respectively; the fourth row shows the ground truth (GT)
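The figure above presents a qualitative comparison. A common quantitative complement in the SOD literature (not reported in this chapter) is the Mean Absolute Error between a saliency map and its binary ground truth; a minimal sketch:

```python
import numpy as np

def mae(saliency, ground_truth):
    """Mean Absolute Error between a [0, 1] saliency map and a binary
    ground-truth mask: lower is better, 0 means a perfect match."""
    s = saliency.astype(np.float64)
    g = ground_truth.astype(np.float64)
    return np.abs(s - g).mean()

gt = np.array([[1, 1], [0, 0]])
perfect = gt.astype(np.float64)   # ideal saliency map
uniform = np.full(gt.shape, 0.5)  # uninformative saliency map
# mae(perfect, gt) == 0.0; mae(uniform, gt) == 0.5
```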
5 Conclusion

The COVID-19 pandemic has noticeably affected human lives across the world, and the death rate is alarming. In this study, we have focused on various scenarios of salient object detection that may be affected by the worldwide presence of the COVID-19 pandemic. Nowadays, people wear various modalities such as Personal Protective Equipment (PPE) and face masks, which change the visual appearance of people in public places. Such visual changes pose certain challenges in real-time images, namely low contrast between foreground and background, partial occlusion, online monitoring, etc. These challenges for salient object detection also bring opportunities for researchers and practitioners working in this area. We have evaluated these challenges on the proposed dataset to provide experimental support. In future work, we will explore saliency detection models that can effectively handle the COVID-19 challenges.
References
1. Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene
analysis. IEEE Trans Pattern Anal Mach Intell 20(11):1254–1259
2. Alpert S, Galun M, Basri R, Brandt A (2007) Image segmentation by probabilistic bottom-up
aggregation and cue integration. In: IEEE conference on computer vision and pattern recognition (CVPR '07), pp 1–8
3. Singh VK, Kumar N (2019) Saliency bagging: a novel framework for robust salient object
detection. Vis Comput 1–19
4. Ren Z, Gao S, Chia L-T, Tsang IW-H (2014) Region-based saliency detection and its application
in object recognition. IEEE Trans Circuits Syst Video Technol 5(24):769–779
5. Zhang D, Meng D, Zhao L, Han J (2017) Bridging saliency detection to weakly supervised
object detection based on self-paced curriculum learning. arXiv:1703.01290
6. Simakov D, Caspi Y, Shechtman E, Irani M (2008) Summarizing visual data using bidirectional
similarity. In: 2008 IEEE conference on computer vision and pattern recognition, pp 1–8
7. Gao Y, Shi M, Tao D, Xu C (2015) Database saliency for fast image retrieval. IEEE Trans
Multimed 17(3):359–369
8. Lau H, Khosrawipour V, Kocbach P, Mikolajczyk A, Ichii H, Schubert J, Bania J, Khosrawipour
T (2020) Internationally lost COVID-19 cases. J Microbiol Immunol Infect
9. Lippi G, Plebani M, Henry BM (2020) Thrombocytopenia is associated with severe coronavirus
disease 2019 (COVID-19) infections: a meta-analysis. Clinica Chimica Acta
10. Zhang J, Yan K, Ye H, Lin J, Zheng J, Cai T (2020) SARS-CoV-2 turned positive in a discharged
patient with COVID-19 arouses concern regarding the present standard for discharge. Int J Infect
Dis
11. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X et al (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395(10223):497–506
12. Zu ZY, Jiang MD, Xu PP, Chen W, Ni QQ, Lu GM, Zhang LJ (2020) Coronavirus disease 2019
(COVID-19): a perspective from China. Radiology 200490
13. Nguyen TT (2020) Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions, vol 10. (Preprint, DOI)
14. Achanta R, Hemami S, Estrada F, Süsstrunk S (2009) Frequency-tuned salient region detection.
In: 2009 IEEE conference on computer vision and pattern recognition, pp 1597–1604
15. Goferman S, Zelnik-Manor L, Tal A (2011) Context-aware saliency detection. IEEE Trans
Pattern Anal Mach Intell 34(10):1915–1926
16. Perazzi F, Krähenbühl P, Pritch Y, Hornung A (2012) Saliency filters: contrast based filtering for
salient region detection. In: 2012 IEEE conference on computer vision and pattern recognition,
pp 733–740
17. Cheng M-M, Mitra NJ, Huang X, Torr PHS, Hu S-M (2015) Global contrast based salient
region detection. IEEE Trans Pattern Anal Mach Intell 37(3):569
18. Liu GH, Yang JY (2019) Exploiting color volume and color difference for salient region detection. IEEE Trans Image Process 28(1):6
19. Gao D, Han S, Vasconcelos N (2009) Discriminant saliency, the detection of suspicious coinci-
dences, and applications to visual recognition. IEEE Trans Pattern Anal Mach Intell 31(6):989–
1005
20. Yang J, Yang M-H (2016) Top-down visual saliency via joint CRF and dictionary learning.
IEEE Trans Pattern Anal Mach Intell 39(3):576–588
21. Jiang H, Wang J, Yuan Z, Liu T, Zheng N, Li S (2011) Automatic salient object segmentation
based on context and shape prior. BMVC 6(7):9
22. Wang L, Lu H, Ruan X, Yang M-H (2015) Deep networks for saliency detection via local
estimation and global search. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp 3183–3192
23. Wang W, Zhao S, Shen J, Hoi SCH, Borji A (2019) Salient object detection with pyramid
attention and salient edges. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp 1448–1457
24. Ren Q, Lu S, Zhang J, Hu R (2020) Salient object detection by fusing local and global contexts.
IEEE Trans Multimed
25. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2020) Automated
detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol
Med 103792
26. Waheed A, Goyal M, Gupta D, Khanna A, Al-Turjman F, Pinheiro PR (2020) Covidgan: data
augmentation using auxiliary classifier gan for improved covid-19 detection. IEEE Access
8:91916–91923
27. Fan D-P, Zhou T, Ji G-P, Zhou Y, Chen G, Fu H, Shen J, Shao L (2020) Inf-Net: automatic
COVID-19 lung infection segmentation from CT images. IEEE Trans Med Imag
28. Zhou L, Li Z, Zhou J, Li H, Chen Y, Huang Y, Xie D, Zhao L, Fan M, Hashmi S et al (2020)
A rapid, accurate and machine-agnostic segmentation and quantification method for CT-based
COVID-19 diagnosis. IEEE Trans Med Imag
29. Wang G, Liu X, Li C, Xu Z, Ruan J, Zhu H, Meng T, Li K, Huang N, Zhang S (2020) A
noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from
CT images. IEEE Trans Med Imag
30. Li Y, Hou X, Koch C, Rehg JM, Yuille AL (2014) The secrets of salient object segmentation. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 280–287
31. Yang C, Zhang L, Lu H, Ruan X, Yang M-H (2013) Saliency detection via graph-based manifold
ranking. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 3166–3173
32. Singh VK, Kumar N (2021) A novel fusion framework for salient object detection based on
support vector regression. In: Proceedings of the Springer conference on evolving technologies
for computing, communication and smart world, pp 437–450
Human Activity Recognition Using Deep
Learning
Amrit Raj, Samyak Prajapati, Yash Chaudhari, and Ankit Kumar Rouniyar
1 Introduction
In the current age, the products of the 4th Industrial Revolution are establishing their
prevalence in our daily lives, and technology has advanced to such a level that going
"off-grid" is no longer a viable option. The boom in technology is directly correlated
with the growth of a nation's economy, and while it has proven apt in improving
the quality of life, the general trend is leading us toward an over-reliance on
technology. This dependence has its pros and cons, and how it plays out depends
on how we humans decide to make use of it. Mobile phones and laptops have now
become commonplace items that are at arm's reach for most of us.
Data from such sources can prove valuable in building security-critical
surveillance systems, as demonstrated during the 2013 Boston Marathon bombings [1], where
video recordings from citizens' mobile phones aided investigators in
determining the cause of the explosion. Given the abundance of CCTV cameras
in nearly every public location, a system designed for activity recognition could
prove invaluable in preventing illegal activities. Such systems could be used for
recognizing abnormal and suspicious activities at crowded public locations and aid
the on-ground personnel in flagging an individual as needed.
This work has the potential to be extended to applications such as assisted living
and healthcare, for example to detect the activities carried out by patients or to detect
whether a person has fallen and needs active assistance. Systems like these can
also be deployed to monitor activities in smart homes, allowing the central system
to control the lighting and HVAC units depending on the activity being performed.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_2
This paper is organized as follows. Section 2 contains the literature review
of related works. Section 3 describes the dataset used, and Sect. 4 presents the chosen
models along with details of the performance metrics that were used. Section 5
presents the results obtained, and finally, Sect. 6 presents the conclusions and
future work.
2 Literature Review
Mohammadi et al. [2] built their results on CNNs pre-trained with
"ImageNet" [3] weights, performing transfer learning along with
attention mechanisms to achieve an average Top-1 classification accuracy of 76.83%
across 8 models. They also created an ensemble of the 4 models that yielded the
highest accuracies, achieving a Top-1 action classification accuracy of 92.67%.
Geng et al. [4] performed feature extraction on raw video inputs using pre-trained
CNNs, and then performed pattern recognition with an SVM classifier on the
extracted features to classify the videos into action classes. Bourdev et al. [5]
define the term poselet to express a part of a person's pose. Their work focuses
on an algorithm to pick the best poselets in the sample space. They proposed a
two-layer regression model for detecting people and localizing body components.
The first layer, containing poselet classifiers, detects local patterns in the input
image; the second layer then combines the classifier outputs in a max-margin
framework.
In their research, González et al. [6] proposed an adaptation of the Genetic Fuzzy
Finite State Machine (GFFSM) method after selecting the three best features from
the human activity data using Information Correlation Coefficient (ICC) analysis
followed by a wrapper Feature Selection (FS) method. Their data was gathered
using two triaxial accelerometers worn on the subjects' wrists while performing
the activities that were to be recognized at a later stage. In their review of
video-based activity recognition, Ke et al. [7] address three stages of activity
recognition. The first stage is Human Object Segmentation, which they divide into
two categories, static-camera segmentation and moving-camera segmentation, and
discuss each. The second stage is Feature Extraction and Representation, where
both global and local features are extracted; local features are used because
global features are sensitive to noise, occlusion, and variation of viewpoint.
The third stage is Activity Detection and Classification Algorithms, where they
discuss classification algorithms such as Dynamic Time
Warping (DTW), K Nearest Neighbor (KNN), Kalman Filter, and Binary tree multi-
dimensional indexing. They have also discussed the various applications of human
activity recognition, specifically healthcare systems and surveillance systems, and
the challenges associated with them.
In their research, Liu et al. [8] proposed using a set of attributes directly
associated with visual characteristics to represent human actions. They claimed that a
representation based on action attributes would be more descriptive and distinct
than the traditional methods.
In their work, Ji et al. [9] proposed a 3D CNN model for human action recognition.
The model extracts features from both the spatial and temporal dimensions by
performing 3D convolutions, thereby capturing the motion information encoded in
multiple adjacent frames. They also propose regularizing the outputs with
high-level features to boost the performance of the model.
3 Data Source
The Stanford 40 Action Classification Dataset [10] was used in this work for training
the models. It contains 9532 images across 40 action classes (each class is exhibited
in Fig. 1), with around 180–300 images per action class. The image collection
contains numerous activities, which results in a colossal number of candidate
attributes; in addition, the number of possible interactions between the attributes,
in terms of co-occurrence statistics, is also large. Subsequently, a custom
dataset [11] was also created, which comprises three YouTube URLs for each action
class present in the Stanford 40 dataset. Each URL points to a copyright-free and
royalty-free "stock" video, with video lengths ranging from 15 to 30 s.
Table 1 depicts the class distribution of images in the original Stanford-40 dataset
and videos in the custom dataset.
4 Methodology
4.1.1 ResNet-50
ResNet 50 is a deep CNN proposed by He et al. [12] that is 50 layers deep. Its
residual blocks use skip connections, which ease the training of the deeper
layers of the network. This increased network depth can result in higher accuracies
on more difficult tasks. It has publicly available model weights that were trained on
the ImageNet dataset, achieving a Top-1 classification accuracy of 75.3% on the
ImageNet dataset.
4.1.2 ResNet-101
ResNet 101 is a deep CNN that, as the name suggests, is 101 layers deep. It was also
proposed by He et al. [12]. It is composed of 100 convolutional layers
along with a single fully connected output layer with softmax activation.
As a member of the ResNet family, it makes use of residual blocks (illustrated in
Fig. 2), which use skip connections to propagate the output of a previous layer to the
"front". As with ResNet 50, it also has publicly available weights trained
on the ImageNet dataset, achieving a Top-1 classification accuracy of 76.4%.
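The skip-connection idea behind these residual blocks can be sketched in a few lines of plain Python. This is only a toy illustration, not the Keras implementation used in this work; `transform` stands in for the block's convolutional layers:

```python
def residual_block(x, transform):
    """Apply a transformation and add back the input (skip connection).

    x: list of floats, a toy stand-in for a feature map.
    transform: callable playing the role of the block's conv layers.
    """
    fx = transform(x)                      # F(x), the learned residual
    return [a + b for a, b in zip(fx, x)]  # F(x) + x, the identity shortcut

# A block whose transformation outputs zeros still passes its input through.
out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```

Because the input is added back unchanged, a block that has learned nothing still forwards its input intact, which is what makes very deep networks trainable.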
4.1.3 InceptionV3
Proposed by Szegedy et al. in their paper [13], the model is made up of symmetric and
asymmetric building blocks, including convolution layers, average pooling layers,
max-pooling layers, concatenation layers, dropout layers, and fully connected layers.
Batch Normalization is used extensively throughout the model and applied to
activation inputs. Softmax activation is used for the final output layer.
Table 1 Distribution of action classes

Class  Stanford-40 imagery dataset  Custom video dataset
Applauding 184 3
Blowing bubbles 159 3
Brushing teeth 100 3
Cleaning the floor 112 3
Climbing 195 3
Cooking 188 3
Cutting trees 103 3
Cutting vegetables 89 3
Drinking 156 3
Feeding a horse 187 3
Fishing 173 3
Fixing a bike 128 3
Fixing a car 151 3
Gardening 99 3
Holding an umbrella 192 3
Jumping 195 3
Looking through a microscope 91 3
Looking through a telescope 103 3
Phoning 159 3
Playing guitar 189 3
Playing violin 160 3
Pouring liquid 100 3
Pushing a cart 135 3
Reading 145 3
Riding a bike 193 3
Riding a horse 196 3
Rowing a boat 85 3
Running 151 3
Shooting an arrow 114 3
Smoking 141 3
Taking photos 97 3
Texting message 93 3
Throwing Frisby 102 3
Using a computer 130 3
Walking the dog 193 3
Washing dishes 82 3
Watching TV 123 3
Waving hands 110 3
Writing on a board 83 3
Writing on a book 146 3
4.1.4 InceptionResNetV2
This model was proposed by Szegedy et al. [14], the network is 164 layers in
“depth” and is a variation of the InceptionV3 model which borrows some ideas from
Microsoft’s original ResNet works [12, 15]. Residual connections allow for short-
cuts in the model and have allowed researchers to successfully train even deeper
neural networks, which has led to increased performance when compared to its base,
InceptionV3. It achieved a Top-1 classification accuracy of 80.1% on the ImageNet
dataset.
4.2 Workflow
The images were first augmented with random rotations between 0 and 359 degrees
followed by resizing them to 256 × 256 pixels. The augmented images were then
used to train four CNNs, namely ResNet50, ResNet101, InceptionV3, and Incep-
tionResNetV2 using Keras. The models were initialized with “ImageNet” weights
Table 2 Optimized hyperparameters

Model  Learning rate  Momentum  Dropout
ResNet50  1e-3  0.9  –
ResNet101  1e-3  0.9  0.2
InceptionV3  1e-3  0.9  –
InceptionResNetV2  1e-4  0.9  0.2
and Stochastic Gradient Descent (SGD) was chosen as the optimizer. The dataset
was then divided into a 90:10 train-test split. The metrics were further improved by
using different combinations of regularization layers and dropout layers and by
hyperparameter tuning; the final optimized hyperparameters are exhibited in Table 2.
To extend classification to videos, the trained models were tested by decomposing
each video into individual frames; every frame was then classified by each model,
and the predicted class with the highest frequency was chosen as the class
exhibited in the video.
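The frame-wise voting step described above amounts to taking the mode of the per-frame predictions. A minimal sketch, where the class names are illustrative:

```python
from collections import Counter

def video_class_from_frames(frame_predictions):
    """Return the class predicted most frequently across all frames."""
    return Counter(frame_predictions).most_common(1)[0][0]

# Hypothetical per-frame predictions produced by one of the trained CNNs
frames = ["running", "running", "jumping", "running", "climbing"]
video_label = video_class_from_frames(frames)  # -> "running"
```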
A browser-based end-to-end deployment was also created using Streamlit to give the
end-user a visual experience of the product. The user can choose from multiple
models for detecting action classes and then decide whether to run inference on a
single image or on a video. If the user elects to run on a single image, the UI
allows them to upload one image, which the selected model uses to generate a
prediction; the prediction is then printed below the input image along with a
confidence value. If the user instead wishes to detect the most prominent action
class of a video, they are given the option to insert a video URL; the video is
downloaded in the background and decomposed into individual frames, the
aforementioned steps are run to perform inference on the video, and the results are
printed below the video. The entire workflow is exhibited as a flowchart in Fig. 3.
The metrics chosen for model evaluation were Top-1 Accuracy, Precision, Recall,
AUROC (Area under the ROC Curve), and F1 Score. The mathematical formulas for the
metrics are described below as functions of True Positives (TP), True Negatives
(TN), False Positives (FP), and False Negatives (FN). The AUROC is calculated by
Riemann summation of the curve plotted between the TP rate and the FP rate.

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)

Precision = TP / (TP + FP)  (2)

Recall = TP / (TP + FN)  (3)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)  (4)
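Equations (1)–(4) can be computed directly from the confusion counts; a small self-contained sketch, with counts that are made up for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and F1 from confusion counts, Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts, not results from this work
acc, prec, rec, f1 = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```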
5 Results
The performance evaluation metrics achieved after training and testing on Stanford-40
imagery and the corresponding videos are tabulated in Tables 3 and 4, respectively.
The accuracy mentioned henceforth refers to the Top-1 accuracy.
As evident from the results of Table 3, the models (initialized with ImageNet
weights) were able to perform quite well without the use of computationally heavy
techniques such as transfer learning.
The lower prediction accuracy in the video classification task exhibits the need
for a certain "memory" in the neural network when predicting prominent action
classes in videos. In such a scenario, a hybrid network with LSTMs, where the
previous prediction has a considerable impact on the current prediction, would
likely perform better.
The models could be further improved by additional training, fine-tuning of the
hyperparameters, and the use of transfer learning. Models with 3D CNN layers, or
hybrid models that incorporate memory-based components such as LSTMs, could be
used to improve the accuracy of video action classification as well. Training on
multiple datasets would expand the scope of use-case scenarios; examples include
the Sports-1M dataset [16], which consists of almost one million videos across
around 487 sporting activities, and the UCF101 dataset [17], which consists of
13,320 videos of various common actions. Data from mobile sensors such as the
accelerometer, heart rate sensor, pedometer, barometer, et cetera could also assist
the models by providing information about the condition of the human body for use
in the prediction. A weighted ensemble model or a cascaded network could also be
used to improve the overall accuracy of action classification.
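The weighted-ensemble idea mentioned above can be sketched as a weighted average of the per-model class-probability vectors. The weights and probabilities below are illustrative, not results from this work:

```python
def weighted_ensemble(prob_vectors, weights):
    """Fuse class-probability vectors from several models, weighted e.g.
    by validation accuracy, and return the arg-max class index."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__)

# Two models, three classes; the higher-weighted model dominates the vote
pred = weighted_ensemble([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]], [0.7, 0.3])
```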
References
1. Hunt for Boston bomber in iPhone era (2013) Financial times. (18 Apr 2013). https://www.ft.
com/content/48adc938-a781-11e2-bfcd-00144feabdc0
2. Mohammadi S, Majelan SG, Shokouhi SB (2019) Ensembles of deep neural networks for
action recognition in still images. In: 2019 9th international conference on computer and
knowledge engineering (ICCKE), Mashhad, Iran, 2019, pp 315–318. https://doi.org/10.1109/
ICCKE48569.2019.8965014
3. Deng J, Dong W, Socher R, Li L, Li K, Fei-Fei L (2009) ImageNet: a large-scale
hierarchical image database. In: 2009 IEEE conference on computer vision and pattern
recognition, Miami, FL, USA, 2009, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
4. Geng C, Song JX (2016) Human action recognition based on convolutional neural networks
with a convolutional auto-encoder. https://doi.org/10.2991/iccsae-15.2016.173
5. Bourdev L, Malik J (2009) Poselets: Body part detectors trained using 3D human pose annota-
tions. In: 2009 IEEE 12th international conference on computer vision, 2009, pp 1365–1372.
https://doi.org/10.1109/ICCV.2009.5459303
6. González S, Sedano J, Villar JR, Corchado E, Herrero L, Baruque B (2015) Features and
models for human activity recognition. Neurocomputing. https://doi.org/10.1016/j.neucom.
2015.01.082
7. Ke S-R, Thuc H, Lee Y-J, Hwang J-N, Yoo J-H, Choi K-H (2013) A review on video-based
human activity recognition. Computers. https://doi.org/10.3390/computers2020088
8. Liu J, Kuipers B, Savarese S (2011) Recognizing human actions by attributes. CVPR 2011.
Published. https://doi.org/10.1109/cvpr.2011.5995353
9. Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recog-
nition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231. https://doi.org/10.1109/TPAMI.
2012.59
10. Yao B, Jiang X, Khosla A, Lin AL, Guibas LJ, Fei-Fei L (2011) Human action recognition by
learning bases of action attributes and parts. In: International conference on computer vision
(ICCV), Barcelona, Spain. 6–13 Nov 2011
11. Prajapati S, Raj A (2021) djsamyak/DM-Stanford40. GitHub. https://github.com/djsamyak/
DM-Stanford40. (Apr 2021)
12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
13. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception archi-
tecture for computer vision. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp 2818–2826
14. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, inception-resnet and the
impact of residual connections on learning. In: Proceedings of the AAAI conference on artificial
intelligence, vol 31, no 1. (Feb 2017)
15. He K, Zhang X, Ren S, Sun J (2016). Identity mappings in deep residual networks. In: The
European conference on computer vision. Springer, Cham, pp. 630–645. (Oct 2016)
16. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale
video classification with convolutional neural networks. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 1725–1732
17. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human action classes from
videos in the wild, CRCV-TR-12-01, Nov 2012.
Recovering Images Using Image
Inpainting Techniques
1 Introduction
Image inpainting is an actively researched area of deep learning which aims to fill
the missing pixels of the image as realistically as possible following the context.
This idea is not new and it has been researched for a long time. Approaches to
the inpainting tasks can be classified as sequence-based, Convolutional Neural Net-
work (CNN)-based, and Generative Adversarial Network (GAN)-based [1]. Initial
approaches used partial differential equations with fluid-dynamics-based approach
and Fast Marching method for inpainting [2, 3]. However, these approaches needed
manual intervention for creating masks and worked for small damage only. Due to the
high availability of data, deep-learning-based approaches can produce better results
but realistic image inpainting is still a difficult task. GAN framework served as a
base to several inpainting approaches to train the models effectively using adversar-
ial loss function [4]. Context encoders started using GANs for inpainting but had
drawbacks for mask sizes and semantic textures. Later models improved on the con-
text encoders to support variable size images and masks to prevent blurry output.
Using deep neural networks with established structures like Visual Geometry Group
(VGG), learning structural knowledge with shared generators, and training generative
models to map a latent prior distribution to natural image manifolds are among the
newer approaches being explored [5–7]. The use of descriptive text is also helpful
for generating better semantics [8].

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_3
Image inpainting techniques are abundantly available, but the choice of technique
for a particular task depends on various factors such as the total damaged area and
the available computational resources, memory, and space. Hence, this work provides
a comparative analysis of two readily available and commonly used techniques, the
Navier–Stokes and Telea algorithms. The rest of the paper is organized as follows:
a literature review in Sect. 2, followed by the proposed methodology in Sect. 3.
Section 4 covers the experimental setup, the achieved results, and their discussion,
followed by the conclusion in Sect. 5.
2 Literature Review
Pathak et al. proposed context encoders consisting of CNN trained to generate con-
tent based on the context of its surroundings. An important contribution of this paper
was the “Channel-wise fully connected layer”. They achieved state-of-the-art perfor-
mance for semantic inpainting and the learned features were useful in other computer
vision tasks [9]. Context encoders were lacking texture details for predicted pixels.
Yang et al. proposed a framework by combining the techniques of neural style transfer
and context encoders and obtained enhanced texture details [10]. Many approaches
were inefficient in handling diverse-size images. Iizuka et al. proposed a Fully Con-
volutional Network with Dilated Convolution and local and global discriminators
and obtained better texture details for diverse images [11]. Demir et al. demonstrated
a combination of PatchGAN and GGAN discriminators. This enhanced local tex-
ture details of generated pixels [12]. Yan et al. proposed guidance loss to improve
decoded features of the missing region and shift connection layer to enhance global
semantic and local texture [13]. Yu et al. proposed contextual attention to obtain
information from distant spatial locations. They achieved better training stability
by using Wasserstein GAN (WGAN) adversarial loss and weighted L1 loss [14].
Wang et al. proposed the idea of ID-MRF loss term, multi-column structure, and
weighted L1 loss following previous trends to obtain high-quality results [15]. Liu
et al. proposed the idea of Partial Convolution to obtain state-of-the-art results [16].
Many inpainting methods usually generate blurry images due to usage of L1 loss
only. Nazeri et al. proposed an Edge Map of the missing region which contains prior
information. They separated the task of image inpainting into edge prediction and
image generation to obtain high-quality inpainting [17].
Yu et al. developed DeepFill v2 with Gated Convolution and SN-patch GAN
to obtain better inpainting results as compared to other methods [18]. Vitoria et al.
incorporated a novel Generator and Discriminator to build on improved WGAN [19].
They produced the ability to recover large regions by learning semantic information.
The approaches toward inpainting were able to handle irregular holes but they were
not able to generate textures of damaged areas. Guo et al. proposed Fixed-Radius
Nearest Neighbors (FRNN) to solve this issue. Using N blocks-one dilation strategy
and residual blocks is effective for smaller irregular holes. However, for larger holes,
this method needed to be trained using a large number of parameters [20]. Zeng
et al. proposed Pyramid-Context Encoder Network (PEN-Net) based on U-Net to
learn contextual semantics from full-resolution input and decode it effectively. This
network can be further refined for high-resolution images [21]. Image inpainting
results highly depend on input and many models yield unsatisfactory results when
the object overlaps with the foreground due to lack of information. Xiong et al.
proposed a foreground-aware inpainting system that outperformed other models on
complex compositions [22].
Li et al. proposed Spatial Pyramid Dilation (SPD) residual blocks for handling
different image and mask sizes. They applied Multi-Scale Self-Attention (MSSA)
to enhance coherency and obtained high PSNR scores [23]. For training inpainting
models, it is usually assumed that missing region patterns are known. This limits
the application scope. Wang et al. proposed Visual Consistency Network (VCNet), a
blind inpainting system, which first learns to locate the mask and then fills the missing
regions [24]. Liu et al. proposed a coherent semantic attention layer to preserve the
contextual structure and modeled the semantic relevance between hole features [25].
Zhao et al. proposed an Unsupervised Cross-space Translation GAN (UCTGAN)
model and were able to create visually realistic images. Their new cross-semantic
attention layer improved realism and appearance consistency [26]. For GAN-based
inpainting tasks, feature normalization helps in training. Most of the methods applied
feature normalization without considering its impact on mean and variance shifts. Yu
et al. proposed Basic and Learnable Region Normalization methods and obtained bet-
ter performance than full spatial normalization [27]. Liu et al. proposed Probabilistic
Diverse GAN (PDGAN) and achieved diverse inpainting results by modulation of
random noise [28]. Liao et al. introduced a joint optimization framework of semantic
segmentation and image inpainting by using the Semantic-Wise Attention Propaga-
tion (SWAP) module and obtained superior results for complex holes [29]. Zhang
et al. proposed a context-aware SPL model for inpainting that uses global seman-
tics to learn local textures [30]. Marinescu et al. proposed a generalizable Bayesian
Reconstruction through Generative Models (BRGM) using Bayes’ theorem for image
inpainting [31]. Although there are a lot of conditional GANs proposed for image
inpainting, they underperform when it comes to large missing regions. Zhao et al.
proposed a generic Co-Mod-GAN structure to represent conditional and stochastic
styles [32].
3 Proposed Methodology
This section explains the OpenCV algorithms used for the comparative analysis and
the custom error masks used for producing corrupt images. The two explored areas are:
1. Algorithms
(a) Telea algorithm
(b) Navier–Stokes algorithm
2. Custom error masks
3.1 Algorithms
The Telea algorithm is based on the Fast Marching Method. It inpaints missing pixels
proximal to known pixels first, similar to manual heuristic operations. First, one of
the invalid boundary pixels is picked and inpainted; all boundary pixels are then
selected iteratively to inpaint the whole boundary region. Invalid pixels are
replaced by the normalized weighted sum of neighboring pixels, with more weight
given to closer pixels. Hence, a newly created valid pixel is influenced most by
the local valid pixels lying on the normal line of the boundary region and contours.
After inpainting one pixel, the next invalid pixel is chosen using the Fast Marching
Method, and the process slowly propagates from the boundary toward the center of the
unknown region, as shown in Fig. 1.
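The distance-weighted fill described above can be illustrated with a heavily simplified one-dimensional sketch. The real Fast Marching Method works in two dimensions and tracks the advancing boundary with a priority queue; this shows only the normalized weighted-average step:

```python
def fill_missing(pixels, radius=2):
    """Replace None entries with a distance-weighted average of the known
    neighbours within `radius`; closer neighbours carry more weight."""
    out = list(pixels)
    for i, v in enumerate(out):
        if v is not None:
            continue
        num, den = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(out), i + radius + 1)):
            if pixels[j] is None:
                continue
            w = 1.0 / abs(i - j)   # inverse-distance weight
            num += w * pixels[j]
            den += w
        if den > 0:
            out[i] = num / den     # normalised weighted sum
    return out

restored = fill_missing([10.0, None, 20.0])
```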
The Navier–Stokes algorithm joins points with the same intensity to form contours,
also known as isophotes. The edges are considered analogous to an incompressible
fluid, and using fluid dynamics methods, the isophotes are continued into the
unknown region. In the end, color is filled in so as to minimize the variance in
the concerned area.
This work analyzes the results on the Oxford Buildings dataset, a medium-sized
dataset consisting of different objects and contexts, using custom error masks.
Emphasis is placed on manually crafted binary error masks covering small damage in
different directions. The diagonal, horizontal, and vertical masks are intended to
corrupt the image contours on a small scale along their respective directions,
while the center mask is used to simulate a large corrupted area. The custom error
masks are shown in Fig. 2.
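Masks of these four kinds can also be generated programmatically; a sketch using plain nested lists, where 255 marks pixels to be corrupted. The sizes and thicknesses below are illustrative, since the masks in this work are hand-crafted:

```python
def make_mask(size, kind, thickness=8):
    """Return a size x size binary mask; 255 marks damaged pixels."""
    mask = [[0] * size for _ in range(size)]
    mid, half = size // 2, thickness // 2
    for r in range(size):
        for c in range(size):
            if kind == "horizontal" and abs(r - mid) < half:
                mask[r][c] = 255
            elif kind == "vertical" and abs(c - mid) < half:
                mask[r][c] = 255
            elif kind == "diagonal" and abs(r - c) < half:
                mask[r][c] = 255
            elif kind == "center" and abs(r - mid) < size // 4 \
                    and abs(c - mid) < size // 4:
                mask[r][c] = 255
    return mask

# The center mask damages a large square block in the middle of the image
center = make_mask(256, "center")
```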
The effectiveness of the Navier–Stokes algorithm and Telea algorithm is ana-
lyzed by measuring established metrics like Peak Signal-to-Noise Ratio (PSNR) and
Structural Similarity Index Measure (SSIM). Runtime and memory allocated by the
algorithms are additionally considered to understand their complexity. The sample
images are shown in Fig. 3.
This section explains the observed results for the two algorithms discussed in this
work. The main criteria of evaluation are the PSNR and SSIM values observed for
both algorithms.
For a practical comparison of the two algorithms, this work used the following
testing setup specifications:
This work uses Python and the OpenCV library for the implementation of sequential
approaches. The OpenCV library contains implementations of the Navier–Stokes and
Telea methods of image inpainting. To obtain the corrupted images, four different
crafted binary masks are used.
4.1.1 Dataset
The Oxford Buildings dataset contains 5062 images obtained by querying Flickr with
17 different keywords [33]. It covers 11 different landmarks, and the images are of
different resolutions. They are preprocessed to 256 × 256 resolution for uniformity.
These preprocessed images are then damaged according to different error masks and
provided as input to the inpainting algorithms.
To get the quality assessment of the inpainting results, PSNR and SSIM are used,
which are part of the OpenCV library.
4.2.1 PSNR
The PSNR between two images is the peak signal-to-noise ratio, measured in decibels.
This ratio is commonly used to assess the quality of compressed images: the higher
the PSNR, the better the quality of the reconstructed image. The Mean Squared Error
(MSE) represents the cumulative squared error between the compressed and the
original image, whereas PSNR represents a measure of the peak error; the lower the
MSE, the lower the error. PSNR is calculated from the MSE as
PSNR = 10 × log10(MAX² / MSE), where MAX is the maximum possible pixel value. For
colored images, PSNR is computed differently: images are converted to color spaces
with separate intensity channels, and PSNR is computed on those intensity channels.
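A minimal sketch of this computation for a grayscale image stored as a flat list of pixel values:

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images: zero error
    return 10 * math.log10(max_val ** 2 / mse)

value = psnr([50, 60, 70], [50, 60, 80])  # MSE = 100/3, roughly 32.9 dB
```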
4.2.2 SSIM
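SSIM compares two images through their means, variances, and covariance:
SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)),
with stabilizing constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)². A single-window simplification of this formula is sketched below; the standard metric averages it over local windows:

```python
def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM between two equal-size grayscale images given as
    flat lists. A simplification: the full metric uses local windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

score = ssim_global([10, 20, 30, 40], [10, 20, 30, 40])  # identical -> 1.0
```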
4.3 Discussion
PSNR and SSIM are established metrics to assess image similarities in image pro-
cessing tasks. Along with that, this work also uses runtime and memory consumption
as supplementary metrics. This work has obtained the average values of the metrics
on each error mask. Both Navier–Stokes and Telea algorithms performed best at
horizontal contour recovery with PSNR 34.12692 and 34.23631, respectively. For
the central mask, as the larger area containing the most useful semantic information
was damaged, the algorithms couldn’t recover the images effectively as seen from
PSNR values 28.78492 and 28.90572, respectively. Memory consumption in all cases
is the same and efficient (196.70 KB). The runtime for the diagonal mask shows that
it is costlier to recover discontinuous areas along different contours than
continuous areas. Both algorithms have efficient and comparable runtimes ranging
between 3 and 10 ms. The
detailed results are summarized in Tables 1, 2, 3, and 4. Sample recovered images
for Vertical Mask, Horizontal Mask, Diagonal Mask, and Center Mask are shown in
Figs. 4, 5, 6, and 7, respectively.
Recovering Images Using Image Inpainting Techniques 35
5 Conclusion
Image inpainting is an actively researched problem, and many solutions are available; these solutions trade off complexity against accuracy. The purpose of this work is to apprise new users and researchers of the effectiveness of readily available algorithms. In the most common use cases, a small area needs to be inpainted while managing time and space complexity. This work shows that a PSNR of up to 34.23631 and an SSIM of up to 0.977399 can be achieved with the Telea algorithm. For larger corrupt regions, both methods failed to achieve decent PSNR and SSIM values; hence, these algorithms are not suitable for recovering larger corrupted regions. Both algorithms are highly efficient in time and space complexity and are suitable for small damage recovery. Overall, the Telea algorithm performs slightly better than the Navier–Stokes algorithm. Future work will use these algorithms as a baseline for comparison against CNN-based and GAN-based algorithms, which provide better inpainting for complex semantics.
References
13. Yan Z, Li X, Li M, Zuo W, Shan S (2018) Shift-net: image inpainting via deep feature rear-
rangement. In: Proceedings of the European conference on computer vision (ECCV), pp 1–17
14. Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with
contextual attention. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 5505–5514
15. Wang Y, Tao X, Qi X, Shen X, Jia J (2018) Image inpainting via generative multi-column
convolutional neural networks. arXiv:1810.08771
16. Liu G, Reda FA, Shih KJ, Wang TC, Tao A, Catanzaro B (2018) Image inpainting for irregular
holes using partial convolutions. In: Proceedings of the European conference on computer
vision (ECCV), pp 85–100
17. Nazeri K, Ng E, Joseph T, Qureshi FZ, Ebrahimi M (2019) Edgeconnect: generative image
inpainting with adversarial edge learning. arXiv:1901.00212
18. Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2019) Free-form image inpainting with gated
convolution. In: Proceedings of the IEEE/CVF international conference on computer vision,
pp 4471–4480
19. Vitoria P, Sintes J, Ballester C (2018) Semantic image inpainting through improved wasserstein
generative adversarial networks. arXiv:1812.01071
20. Guo Z, Chen Z, Yu T, Chen J, Liu S (2019) Progressive image inpainting with full-resolution
residual network. In: Proceedings of the 27th ACM international conference on multimedia,
pp 2496–2504
21. Zeng Y, Fu J, Chao H, Guo B (2019) Learning pyramid-context encoder network for high-
quality image inpainting. In: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pp 1486–1494
22. Xiong W, Yu J, Lin Z, Yang J, Lu X, Barnes C, Luo J (2019) Foreground-aware image inpainting.
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp
5840–5848
23. Li CT, Siu WC, Liu ZS, Wang LW, Lun DPK (2020) Deepgin: deep generative inpainting
network for extreme image inpainting. In: European conference on computer vision. Springer,
pp 5–22
24. Wang Y, Chen YC, Tao X, Jia J (2020) Vcnet: a robust approach to blind image inpainting.
In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28,
2020, Proceedings, Part XXV 16. Springer, pp 752–768
25. Liu H, Jiang B, Xiao Y, Yang C (2019) Coherent semantic attention for image inpainting. In:
Proceedings of the IEEE/CVF international conference on computer vision, pp 4170–4179
26. Zhao L, Mo Q, Lin S, Wang Z, Zuo Z, Chen H, Xing W, Lu D (2020) Uctgan: diverse image
inpainting based on unsupervised cross-space translation. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, pp 5741–5750
27. Yu T, Guo Z, Jin X, Wu S, Chen Z, Li W, Zhang Z, Liu S (2020) Region normalization for
image inpainting. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp
12733–12740
28. Liu H, Wan Z, Huang W, Song Y, Han X, Liao J (2021) Pd-gan: probabilistic diverse gan for
image inpainting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, pp 9371–9381
29. Liao L, Xiao J, Wang Z, Lin CW, Satoh S (2021) Image inpainting guided by coherence priors
of semantics and textures. In: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition, pp 6539–6548
30. Zhang W, Zhu J, Tai Y, Wang Y, Chu W, Ni B, Wang C, Yang X (2021) Context-aware image
inpainting with learned semantic priors. arXiv:2106.07220
31. Marinescu RV, Moyer D, Golland P (2020) Bayesian image reconstruction using deep gener-
ative models. arXiv:2012.04567
32. Zhao S, Cui J, Sheng Y, Dong Y, Liang X, Chang EI, Xu Y (2021) Large scale image completion
via co-modulated generative adversarial networks. arXiv:2103.10428
33. Philbin J (2007) Oxford buildings dataset. http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/
Literature Review for Automatic
Detection and Classification
of Intracranial Brain Hemorrhage Using
Computed Tomography Scans
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 39
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_4
40 Y. S. Champawat et al.
Fig. 1 Sample images of CT scan with intracranial hemorrhage (marked with red arrow) and
healthy brain
Intracranial Brain Hemorrhage comprises five types, named as, epidural hemor-
rhage, subdural hemorrhage, subarachnoid hemorrhage, intraventricular hemor-
rhage, and intraparenchymal hemorrhage [1].
• Epidural Hemorrhage: It is a type of hemorrhage in which the blood accumulates
between the thick outer membrane, that is, the dura mater, and the skull. The main
cause of such hemorrhage is when a skull fracture or injury tears the underlying
blood vessels.
• Subdural Hemorrhage: It is a type of hemorrhage in which the blood accumulates within the skull but outside the tissue of the brain. It is caused when a brain injury bursts the outer blood vessels on the skull head. It sometimes shows no symptoms and needs no treatment.
• Subarachnoid Hemorrhage: It is a type of hemorrhage in which the blood accumulates in the space surrounding the brain. It is mainly caused when a blood vessel present on the surface of the brain's outer tissue bursts. It is a severe type of stroke and needs immediate treatment.
• Intraventricular Hemorrhage: It is a type of hemorrhage in which the blood accu-
mulates into the brain’s ventricular system. It mainly occurs due to a lack of
oxygen in the brain or traumatic birth. It also has a high mortality rate, especially
among newborn babies.
• Intraparenchymal Hemorrhage: It is a type of hemorrhage in which the blood
accumulates within the brain parenchyma region, that is, the tissue region of
the brain. It mainly occurs due to sudden trauma, tumors, rupture of inner brain
arteries or veins, or birth disorders (Fig. 2).
It is well known that India is facing a shortage of both trained medical staff and
medical facilities. As per statistics presented in Thayyil and Jeeja [4], India comprises approx. 17% of the total world population but contributes about 20% of the total world disease burden. About 70% of the country's population resides in rural areas, but approx. 74% of the trained medical staff lives in urban areas, leaving only 26% for the majority of the population. As per a survey conducted in
Literature Review for Automatic Detection and Classification … 41
March 2018, the shortfall in health facilities at different levels is about 18% at the Sub-Centre level, 22% at the PHC level, and 30% at the CHC level [5]. Thus, there is a heavy burden on the existing medical staff, who work day and night for the well-being of society, as seen over the past two years during the COVID-19 pandemic. Advancements in science and technology, particularly in the field of artificial intelligence, should be implemented in ways that help and support this medical workforce. AI-assisted tools and chatbots, AI-powered robots, and various computer-aided diagnostic systems should be promoted widely. Real-time automatic diagnosis of severe health issues like intracranial brain hemorrhage would be a milestone in medical history, saving thousands of patients per year who would otherwise lose their lives to late treatment and improper diagnosis of hemorrhage.
The rest of this paper is organized as follows: Sect. 2 describes the existing methods of diagnosing ICH and compares CT scan images with MRI images for diagnosis. Section 3 describes how machine learning and deep learning techniques can assist in the detection of ICH and presents a summary of previous work, a comparison table, and an analysis based on that table. Section 4 describes some limitations of this study, presents future research directions related to the field, and concludes the paper.
Intracranial Brain Hemorrhage is a severe type of stroke that can affect the functioning of brain cells, lead to critical symptoms, and eventually cause the death of a patient. Fast and effective treatment is generally required in an ICH emergency; in some cases, major surgeries are required to save the patient's life. Diagnosis of ICH is done by either CT scan or Magnetic Resonance Imaging (MRI) [6, 7]. Neurologists and radiologists require images of the inner regions of the brain in order to locate and confirm the presence of hemorrhage. They then perform a volumetric analysis of the ICH on the basis of the spread of blood over the brain
tissues. This is an important step of treatment because it provides information about the location, position, volume, and subtype of the hemorrhage. Generally, a CT scan is done first, and an MRI follows only if clearer and more detailed images are required. Because of MRI's better image quality, it is sometimes assumed that MRI should be preferred over CT scans for diagnosis, but this is not always true. CT scans have many advantages over MRI. CT imaging is fast, generally taking 10–15 min, while an MRI might take 35–45 min; in an emergency, the patient might not have that much time and needs instant treatment. Moreover, a CT scan can be performed while the patient is on a drip, but an MRI cannot. CT scan machines are more readily available than MRI machines, and performing a CT scan is also less costly. An MRI scan cannot be performed if the patient has any metallic or electrical implant in the body. Also, in MRI the patient's body is passed completely into the machine, which might lead to a state of unconsciousness. Sometimes, patients might not fit into the MRI scanner due to their weight. The patient is generally asked to stay still in the MRI machine, but this might not be feasible due to old age or pain. MRI also has some advantages over CT: the dose of harmful X-rays is high in a CT scan, while MRI works on magnetic and electrical power; frequent CT scans can increase a patient's risk of cancer; and the quality of images and information provided by MRI scans is much better than that of CT scan images.
Thus, both types of diagnostic imaging have their own pros and cons. It has been observed that the image quality of a CT scan is sufficient to provide details about a brain hemorrhage so that doctors can start initial treatment; head CT images can even show acute hemorrhage or abnormality in brain tissues. That is why doctors prefer CT scans over MRI for the diagnosis of brain hemorrhage, with MRI performed when frequent imaging reports are required or radiologists need further details of the inner brain tissues. For these reasons, we have chosen Computed Tomography (CT) scans for the diagnosis of Intracranial Brain Hemorrhage in our work.
Intracranial Brain Hemorrhage is a very serious health problem that requires immediate and intensive medical treatment; a delay in proper treatment might lead to the death of the patient. The diagnosis of ICH using CT scans is a very complex process and generally requires a very experienced radiologist, who may not be available at all times, which leads to delays in treatment. Moreover, the volumetric analysis of ICH using CT scan images is a very complex and error-prone process; in the case of complex ICH, it becomes very difficult to estimate the volume of the hemorrhage. Thus, a rapid and accurate alternative method of diagnosis is necessary for the treatment process to succeed against
ICH. Advancements in the fields of machine learning and deep learning, particularly computer vision, attract the research community to propose computer-aided, rapid, and accurate mechanisms for the automatic diagnosis of various diseases. As the diagnosis of hemorrhage depends on images obtained from CT scans or MRI, a self-learning algorithm can be trained to obtain a model that learns the patterns in normal and abnormal images. On the basis of these learned patterns, the model can detect the traces of disease present in medical images. In recent years, a lot of work has been done in the field of machine-learning-based diagnosis [3, 8–18], including detection of pneumonia and COVID-19 from chest X-ray images, classification of brain tumors into benign and malignant, detection of breast cancer, treatment of dead-cell-related skin infections, detection of degenerative diseases like Parkinson's and Alzheimer's, diabetic retinopathy, assisting doctors in prescribing medicines and ICU calls, detection of the stage of diabetes, and many more.
The detection and classification of ICH using machine learning techniques generally follows the pipeline presented in Fig. 3. The first stage is data collection (data acquisition), in which the medical images, along with proper patient metadata, are collected from different hospitals or radiology centres; these images are later used for training and testing the models. The next step is data preparation, which includes various pre-processing techniques applied to the medical images to make them ready as model input. This is an important step, as noise and unwanted information are removed from the images and various data augmentation techniques are applied. The next stage is dataset partition, in which the dataset is divided into training, validation, and test sets. Then comes the training stage, the most important stage in the pipeline, as it includes feature extraction, feature selection, and classification on the basis of the obtained features. The performance of the model is highly dependent on the methods adopted for feature extraction and classification in this stage. Lastly, the trained model is tested on the test dataset, and the performance and generalizability of the model are evaluated on the basis of various parameters like accuracy, recall, precision, F1-score, AUC, sensitivity, specificity, etc. [19]. A brief description of some of the most commonly used performance metrics follows.
• Accuracy: It is defined as the ratio of the sum of true positives and true negatives
to the total number of data instances available.
• Recall: It is defined as the ratio of true positives to the sum of true positives and
false negatives.
• Precision: It is defined as the ratio of true positives to the sum of true positives and false positives.
Fig. 3 The block diagram represents general pipeline for the diagnosis of brain hemorrhage
• Sensitivity: It is defined as the ability of the model to predict true positives among the given labels for each class. In binary classification, sensitivity is the same as recall. In medical diagnosis, high sensitivity is preferred, because classifying a patient who has a hemorrhage as healthy, that is, as having no hemorrhage, might lead to serious harm.
• Specificity: It is defined as the ability of the model to predict true negatives among the given labels for each class. In binary classification, specificity is the true negative rate, TN / (TN + FP).
• F1-score: It is a measure of the model's accuracy on the complete dataset, calculated from precision and recall as their harmonic mean: F1 = 2 · (Precision · Recall) / (Precision + Recall).
Log Loss = −(1/N) · Σᵢ₌₁ᴺ [yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ)]   (5)
where TP stands for True Positives, TN stands for True Negatives, FP stands for
False Positives, FN stands for False Negatives, y stands for the true label of a data
instance and p stands for predicted label of a data instance (Fig. 3).
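These definitions, together with the log loss of Eq. (5), can be written out directly from the four confusion-matrix counts. The following is a minimal sketch; the function names and the synthetic counts are illustrative only:

```python
import math

def metrics_from_counts(tp, tn, fp, fn):
    """Standard classification metrics computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # sensitivity / true positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)     # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, specificity, f1

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy, Eq. (5): -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip predictions to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Illustrative counts: 40 TP, 45 TN, 5 FP, 10 FN
acc, rec, prec, spec, f1 = metrics_from_counts(tp=40, tn=45, fp=5, fn=10)
print(acc, rec, prec, spec, f1)
print(log_loss([1, 0], [0.9, 0.1]))
```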
Depending on stage 4 (feature extraction and classification), we have divided the approaches for building the pipeline into four types:
• Both feature extraction and classification based on machine-learning techniques
and algorithms.
• Feature extraction based on deep learning models and classification using machine
learning algorithms.
• Both feature extraction and classification based on deep learning techniques and
algorithms.
• Classification using IoT-powered techniques or segmentation-based algorithms.
In this approach, after applying suitable data pre-processing methods to input images,
the useful features are extracted using different standard manual methods and then
traditional machine learning-based classifiers like SVM, Random Forest, KNN, etc.
are trained on the obtained features (Fig. 4).
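The generic recipe just described, hand-crafted features followed by a traditional classifier, can be sketched with a toy intensity-histogram feature and a minimal k-NN vote. The synthetic "normal"/"abnormal" images and all names below are illustrative, not taken from any of the reviewed papers:

```python
import numpy as np

def histogram_features(image, bins=16):
    """Normalized intensity histogram as a simple hand-crafted feature vector."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def knn_predict(train_feats, train_labels, query_feat, k=1):
    """Minimal k-nearest-neighbour vote in feature space."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_labels[nearest]
    return np.bincount(votes).argmax()

rng = np.random.default_rng(0)
# Synthetic stand-ins: "normal" slices are dark; "abnormal" slices
# contain uniformly bright, high-intensity pixels.
normals = [rng.integers(0, 80, (64, 64)) for _ in range(5)]
abnormals = [rng.integers(150, 255, (64, 64)) for _ in range(5)]
X = np.array([histogram_features(im) for im in normals + abnormals])
y = np.array([0] * 5 + [1] * 5)

query = histogram_features(rng.integers(150, 255, (64, 64)))
print(knn_predict(X, y, query))  # -> 1 (classified as abnormal)
```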
Shahangian and Pourghassem [20] implemented a pipeline for the segmentation of the hematoma region, its area evaluation, and its classification into subtypes. This pipeline includes pre-processing techniques, skull removal, brain ventricle removal, morphological filtering, segmentation of the ICH region, feature extraction, quantifiable feature selection using a genetic algorithm, and lastly, classification of ICH into subtypes. The skull and brain ventricles were removed by applying a check on the intensity values of the CT scan. A median filter was then applied to remove noise, and the largest-area object was selected from the binary image to retain only the brain region. ICH segmentation was performed by thresholding pixel intensities. For classification, a KNN algorithm and a multilayer perceptron (MLP) model with a tan-sigmoid-activated output layer were trained; the MLP model outperformed KNN.
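A toy version of this thresholding-style pipeline (intensity check, median filter, largest-object selection) might look as follows. The threshold values and the synthetic slice are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np
from scipy import ndimage

def largest_object(binary):
    """Keep only the largest connected component of a binary mask."""
    labels, n = ndimage.label(binary)
    if n == 0:
        return np.zeros_like(binary, dtype=bool)
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def segment_candidate(scan, skull_thresh=250, blob_thresh=180):
    """Zero out skull-range intensities, median-filter the result,
    then threshold and keep the largest bright blob."""
    brain = np.where(scan >= skull_thresh, 0, scan)
    brain = ndimage.median_filter(brain, size=3)
    return largest_object(brain >= blob_thresh)

# Synthetic slice: background 50, a "skull" row at 255, one large and
# one tiny bright region standing in for hemorrhage candidates.
scan = np.full((40, 40), 50, dtype=np.uint8)
scan[0, :] = 255                 # "skull"
scan[10:20, 10:20] = 200         # large candidate (kept)
scan[30:32, 30:32] = 200         # tiny candidate (filtered away)
mask = segment_candidate(scan)
print(mask[15, 15], mask[31, 31])
```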
Liu et al. [7] dealt differently with nasal cavity and encephalic region CT scans. From Fig. 5 we can observe that the two types of CT scans have different textures; thus, a method working efficiently on brain regions might not work well on the nasal
Fig. 4 This flow diagram represents the pipeline for both feature extraction and classification based
on machine learning techniques and algorithms
cavity. Both are separated on the basis of texture analysis using the wavelet transform. Skull removal and gray matter removal were applied to the encephalic region to obtain segmented hemorrhages. Then 12 different features corresponding to intensity distribution and texture descriptions were extracted. Entropy calculation was employed to select good features, and a Support Vector Machine (SVM) classifier was trained to distinguish abnormal slices (slices containing ICH) from normal slices.
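A single level of the 2D Haar transform, the simplest wavelet used for this kind of texture analysis, can be written directly in NumPy. Subband sign conventions and scaling vary between libraries; the version below is one common choice, shown for illustration only:

```python
import numpy as np

def haar2d_level1(img):
    """One level of the 2D Haar wavelet transform.

    Returns the approximation (LL) and the horizontal, vertical and
    diagonal detail subbands; energy in the detail subbands captures
    the texture differences used to separate slice types.
    """
    x = img.astype(np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0   # local average (approximation)
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

img = np.array([[10, 10, 0, 8],
                [10, 10, 0, 0]])
ll, lh, hl, hh = haar2d_level1(img)
print(ll)  # -> [[10.  2.]]
```

A flat 2×2 block yields zero detail coefficients, while the block containing an edge produces nonzero ones, which is exactly what makes these subbands useful texture descriptors.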
Al-Ayyoub et al. [21] proposed a pipeline that includes skull removal, segmentation of ICH, morphological methods, extraction of the region of interest, feature extraction, and classification. For segmentation, Otsu's method was applied, followed by the opening transformation. The region of interest is obtained by applying the region-growing algorithm to the segmentation output. Finally, features based on the size, shape, and position of the hemorrhage ROI were extracted. SVM, multinomial logistic regression (MLR), multilayer perceptron, decision tree, and Bayesian network classifiers were trained independently on the features; the MLR classifier outperformed the others.
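Otsu's method itself is easy to state: pick the threshold that maximizes the between-class variance of the intensity histogram. A minimal NumPy implementation, with an illustrative bimodal image (the numbers are synthetic):

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: the threshold maximizing between-class variance."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]               # pixels at or below t (class 0)
        if w0 == 0:
            continue
        w1 = total - w0             # pixels above t (class 1)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal synthetic image: dark background plus a bright region
rng = np.random.default_rng(1)
img = rng.normal(60, 5, (64, 64))
img[20:40, 20:40] = rng.normal(200, 5, (20, 20))
t = otsu_threshold(np.clip(img, 0, 255).astype(np.uint8))
print(t)  # falls between the two modes (roughly 60 and 200)
```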
In this approach, after applying suitable data pre-processing methods to the input images, pre-trained Convolutional Neural Networks (CNNs) are imported and trained end-to-end to extract features from the images. Traditional machine learning algorithms applied on top of these CNN models are then trained to perform classification using the obtained features (Fig. 6).
Salehinejad et al. [8] stacked three windows of CT scan images to get a 3-channel input for 2D-CNN models. They used pre-trained SE-ResNeXt-50 and SE-ResNeXt-101 models as backbones for extracting features from the images and applied traditional machine-learning algorithms like LightGBM, CatBoost, and XGBoost for classification. To utilize the interdependency among the slices of a CT scan, they applied a sliding-window module. To test the generalizability of the models, they evaluated them on a private external validation dataset. This is an important step,
Fig. 6 This flow diagram represents the pipeline for feature extraction using pre-trained convolu-
tional neural network (CNN) model and classification based on machine learning algorithms
especially in the case of medical images. Testing the models on a dataset consisting
of temporally and geographically different images indicates the generalization power
of models.
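The three-window stacking trick can be sketched as follows. The window centers and widths below are common settings chosen for illustration, not necessarily those used in the paper:

```python
import numpy as np

def apply_window(hu, center, width):
    """Map Hounsfield units to [0, 1] within a given window (WL/WW)."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)

def three_channel(hu_slice):
    """Stack three windows into one 3-channel image, the trick used to
    feed single-channel CT slices to RGB-pretrained CNNs.
    Illustrative settings: brain (WL 40 / WW 80), subdural
    (WL 80 / WW 200), bone (WL 600 / WW 2800)."""
    channels = [apply_window(hu_slice, 40, 80),
                apply_window(hu_slice, 80, 200),
                apply_window(hu_slice, 600, 2800)]
    return np.stack(channels, axis=-1)

hu = np.array([[-1000.0, 0.0, 60.0, 1000.0]])  # air, water, blood-ish, bone
img = three_channel(hu)
print(img.shape)  # -> (1, 4, 3)
print(img[0, 2])  # the 60-HU pixel as seen through each window
```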
Sage and Badura [9] applied region-of-interest extraction, that is, brain region cropping and skull removal, before feeding the images to a ResNet-50 model. Brain region cropping was performed by determining the largest binary object in the CT scan image after applying the Otsu algorithm. Skull removal was performed by reducing the highest-intensity pixel values to zero. A two-branch architecture was used to train the classification model: in the first branch three different windows were stacked, and in the second branch three consecutive subdural windows were stacked, to get a 3-channel image. SVM and Random Forest classifiers were applied on top of the ResNet-50 network to predict the class.
In this approach, after applying suitable data pre-processing methods to the input images, pre-trained Convolutional Neural Networks (CNNs) are imported, and a transfer learning protocol is followed to train these models to perform classification. The features obtained from the pre-output layer of these models can also be used to train Bi-LSTM network layers in order to utilize the spatial interdependence among the slices of a CT scan (Fig. 7).
He et al. [10] developed a classification model using pre-trained CNN models such as SE-ResNeXt50 and EfficientNet-B3 as backbones. They used weighted
Fig. 7 This flow diagram represents the pipeline for feature extraction using pre-trained convolu-
tional neural network (CNN) model and classification using softmax activation function layer or
Bi-LSTM layers as output layers
multi-label logarithmic loss for training the models. To improve performance, they employed K-fold cross-validation (K = 10 in their case) and a pseudo-labeling technique, through which 52,260 new images, originally present as unlabeled data in the RSNA dataset, were added to the training dataset.
Anaya and Beckinghausen [11] proposed a multi-label classification model for classifying ICH into its subtypes. The features were extracted using pre-trained MobileNet and ResNet-50 networks. On the basis of the experimental results, the authors concluded that epidural hemorrhage is the most difficult subtype to detect using a CT scan, probably due to the presence of the epidural hematoma near the skull region of the head.
Juan Sebastian Castro et al. [12] proposed a binary classification model for detecting hemorrhage in CT scans. The brain region was extracted from the background of the CT scans, and then a single window (WW = 80; WL = 50) was applied to get the brain parenchyma region. They used a pre-trained VGG-16 and a customized CNN model as backbones for the classification model. Training was performed using two protocols: one with slices randomized and the other with subjects randomized.
Lewicki et al. [13] presented a multi-label classification model for the detection and classification of ICH into its subtypes. Due to the heavy negative bias and high class imbalance among positive classes in the RSNA dataset [22], class weights were applied to the loss function and recall/precision tuning was performed. A batch of 3-channel CT scan images, produced by stacking three different windows, was fed as input to the ResNet-50 model for training.
Patel et al. [14] used a private dataset to train a combination of CNN and Bi-LSTM networks for predicting the probabilities corresponding to each class. Initially, features of the CT scan images were extracted using a CNN, and then the output spatial vectors of consecutive slices were given together as input to Bi-LSTM layers. The Bi-LSTM network was applied to utilize the interdependency among the slices of a CT scan. Rotation and random-shifting augmentation techniques were also applied. The authors also noted the importance of pre-training the CNN models before applying end-to-end training for fine-tuning.
Nguyen et al. [15] trained a CNN and Bi-LSTM combinational network on the RSNA dataset and used the CQ500 dataset [23] for external validation. They applied various augmentation techniques to improve the generalizability of the models and used weighted binary cross-entropy loss to deal with the class imbalance in the RSNA dataset. They used ResNet-50 and SE-ResNeXT-50 models as the feature extractors.
Burduja et al. [3] proposed a slice-based classification model using the ResNeXt-101 network for feature extraction, with Bi-LSTM layers on top. The ResNeXt-101 network outputs a 2048-sized feature vector for each image. PCA was then applied to reduce this feature vector to a 120-sized vector, which was given as input to recurrent neural networks. The outputs of the RNN were concatenated with the prediction probabilities output by ResNeXt-101, and these concatenated feature vectors were used to train the final output layer.
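The PCA reduction step can be sketched with a plain SVD. The feature matrix below is random stand-in data: 2048-dim vectors reduced to 120 dimensions mirror the description above, while the slice count of 200 is illustrative:

```python
import numpy as np

def pca_reduce(features, k):
    """Project feature vectors onto their top-k principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(2)
feats = rng.normal(size=(200, 2048))   # 200 slices x 2048-dim features
reduced = pca_reduce(feats, 120)
print(reduced.shape)  # -> (200, 120)
```

Note that PCA can retain at most `min(n_samples, n_features)` components, so the number of slices must exceed the target dimension for this reduction to be meaningful.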
In this approach, after applying suitable data pre-processing methods to the input images, various medical image segmentation algorithms are applied to get a segmented image of the ICH. Features are then extracted from the segmented images using either a manual feature extraction process or a fine-tuned pre-trained CNN. These features are later used to train machine learning algorithms or CNN models for classification. Internet-of-Things (IoT) powered techniques can also be used to obtain the processed images in electrical form; these electrical signals act as feature vectors of the images, which are then used to train classifiers (Fig. 8).
Saini and Banga [6] presented a comparison between various ICH segmentation algorithms. Three techniques have mainly been used for the segmentation of ICH: the thresholding technique, the region growing technique, and clustering techniques. The authors implemented and compared the proposed multilevel segmentation approach (MLSA), the watershed method, and the EM method on the basis
Fig. 8 This flow diagram represents the pipeline for classification of ICH using feature vectors that
are extracted from segmented ICH image. For the classification purpose, any classifying model can
be applied
of the time taken to process a single image and the average PCC values. The MLSA technique performed better than the other methods.
Vincy Davis et al. [12] presented a model for the diagnosis and classification of ICH. The model converts the CT scan image to grayscale, then applies resizing and edge detection. After that, several morphological techniques, such as opening and closing transformations and boundary smoothing, were applied. Segmentation of ICH was performed using the watershed algorithm; the paper also presents the importance of the watershed algorithm in extracting hematoma regions. An ANN model was trained using features extracted with the Gray Level Co-occurrence Matrix (GLCM) method.
Patel et al. [14] proposed a CNN model inspired by U-Net for the segmentation of the ICH region in the CT scan image. The model was trained on ground-truth labeled images, and the segmented hematoma was classified into its subtypes. Several data augmentation techniques were applied to achieve better-generalized outcomes. The paper also discusses possible reasons for achieving better results for some subtypes and poorer results for others.
Balasooriya et al. [24] presented a pipeline for the diagnosis of ICH using image segmentation of the hemorrhage region in the CT scan with the watershed algorithm. As the first step, the input images were converted to grayscale and reduced to 2-dimensional images. Then various morphological techniques were applied to remove noise and disturbances from the CT scan image, preparing it for segmentation. Features extracted manually from the segmented images were used to train an artificial neural network (ANN).
Chen et al. [25] presented a smart Internet-of-Things (IoT) based technique for the classification of ICH using machine learning algorithms. In the setup, two types of sensors were placed between the CT scan machine and an Arduino board: a complementary metal oxide semiconductor (CMOS) sensor converted the CT scan images into electrical signals, and an ESP8266 Wi-Fi module posted the data to the server. The electrical signals obtained were used to train a Support Vector Machine (SVM) and a Feedforward Neural Network (FNN) model for classification. A mobile application was also developed for testing CT scan images and generating reports in real time (Table 1).
With reference to Table 1, we can infer some research challenges and provide some suggestions for future work related to the field, which must be taken care of while implementing a model for the detection and classification of intracranial brain hemorrhage into its subtypes. The following are some measures.
Table 1 Table presents the comparison of reviewed papers on the basis of common parameters primarily related to the implementation of work. The parameters
included are application of the paper, dataset used in the paper, windowing policies adopted for the CT scans to convert them into 3-channel images, pre-
processing techniques applied before feature extraction and training of models, saliency or heat maps presenting the presence of ICH in CT scan, performance
metrics included in work, strong points related to the methods adopted by authors and review comments for the presented work
Liu et al. [7]. Application: splitting of CT scan images into nasal cavity and encephalic region; classification of ICH into its subtypes. Dataset: private dataset. Window policy: –. Pre-processing: skull removal; gray-matter removal; wavelet transforms. Saliency map: –. Performance: Accuracy = 80%, Recall = 88%. Strong points: (1) presented pre-processing methods for discarding abnormal slices; (2) wavelet- and Haralick-texture-based model for splitting of CT scan images. Review comments: (1) dataset not made publicly available; (2) applicable only on encephalic-region images; (3) poor feature extraction and selection methods.

Balasooriya et al. [24]. Application: detection of ICH in CT scan using … Dataset: private dataset. Window policy: –. Pre-processing: opening and closing transformation; … Saliency map: –. Performance: Accuracy = 80%, Recall = 88%. Strong points: (1) implemented various pre-processing … Review comments: (1) small private dataset was used.

Saini and Banga [6]. Application: segmentation of ICH region and detection of abnormal slices of CT scan. Dataset: private dataset. Window policy: –. Pre-processing: –. Saliency map: –. Performance: highest Accuracy = 97.1% using the MLSA method; highest Precision = 94.69% using K-means; highest Recall = 90.07% using K-means and FCM. Strong points: (1) presented a comparative analysis of various segmentation methods. Review comments: (1) no pre-processing techniques applied; (2) no presentation of the classifier algorithm was given; (3) the proposed segmentation method is also unclear.

Shahangian and Pourghassem [20]. Application: segmentation of ICH region and classification of ICH into subtypes. Dataset: private dataset. Window policy: –. Pre-processing: skull removal; brain ventricles removal; median filter; soft-tissue edema removal. Saliency map: –. Performance: highest Accuracy = 93.3% using a Multilayer Perceptron model; for segmentation, the highest accuracy obtained is for epidural ICH = 96.22%. Strong points: (1) implemented pre-processing techniques on images before feature extraction; (2) various segmentation techniques were implemented and a comparative analysis was presented. Review comments: (1) dataset is not made publicly available; small dataset; (2) no window policy applied; (3) only three subtypes (epidural, intracerebral and subdural hematoma) are classified; (4) the method proposed for segmentation is based on pixel-intensity division, which is not so promising and might not work well on complex CT scans.

Al-Ayyoub et al. [21]. Application: segmentation of ICH region and classification of ICH into its subtypes. Dataset: private dataset. Window policy: –. Pre-processing: skull removal; segmentation using Otsu's method; opening operation; region growing. Saliency map: –. Performance: accuracy for detection of hemorrhage = 100%; accuracy for classification of ICH into subtypes = 92%. Strong points: (1) texture-based segmentation was applied; (2) various morphological techniques and region-of-interest extraction techniques are presented. Review comments: (1) dataset not made publicly available; (2) poor feature extraction and selection methods; (3) only three subtypes (epidural, intraparenchymal and subdural hematoma) are classified.

Davis and Devane [26]. Application: segmentation of ICH region and classification of ICH into subtypes. Dataset: private dataset. Window policy: –. Pre-processing: edge detection; opening and closing operations; median filter; watershed algorithm (segmentation). Saliency map: –. Performance: error in detection of ICH = 0.47838. Strong points: (1) segmentation using the watershed algorithm is presented. Review comments: (1) small dataset (just 35 images); (2) no window policy; (3) only two subtypes (intracerebral and subdural hematoma) are classified; (4) poor feature extraction and selection methods.

Majumdar et al. [27]. Application: segmentation of ICH region and classification of … Dataset: private dataset. Window policy: –. Pre-processing: data augmentation. Saliency map: –. Performance: Sensitivity = 81%; Specificity = … Strong points: (1) described model for segmentation. Review comments: (1) no proper pre-processing applied.

Anaya and Beckinghausen [11]. Application: detection and classification of ICH into subtypes. Dataset: RSNA. Window policy: –. Pre-processing: –. Saliency map: –. Performance: Accuracy = 76%, Recall = 93%. Strong points: (1) presented a detailed analysis of obtained results; (2) stated the importance of 3D CNN for classification. Review comments: (1) no window policy; (2) no pre-processing done; (3) small dataset (only 5000 images from RSNA were used).

Castro et al. [12]. Application: detection of ICH in CT scan. Dataset: CQ500. Window policy: brain window (WW = 80; WL = 50). Pre-processing: background removal; anisotropic filter. Saliency map: –. Performance: Accuracy = 98%, Recall = 97%, F1-score = 98%. Strong points: (1) two training protocols: slices randomized and subject randomized. Review comments: (1) small dataset; (2) only detection of ICH, no classification into subtypes.

Patel et al. [14]. Application: detection of ICH in CT scan using spatial interdependency among slices of the CT scan. Dataset: private dataset. Window policy: –. Pre-processing: data augmentation. Saliency map: –. Performance: highest AUC = 0.96. Strong points: (1) used spatial interdependency by applying a Bi-LSTM network. Review comments: (1) dataset not made publicly available; (2) no pre-processing and visualization techniques applied.

He et al. [10]. Application: detection and classification of ICH into subtypes. Dataset: RSNA. Window policy: –. Pre-processing: data augmentation. Saliency map: –. Performance: weighted mean log loss = 0.0548. Strong points: (1) applied K-fold cross-validation, which improves model performance. Review comments: (1) no window policy applied; (2) no pre-processing done.

Lewicki et al. [13]. Application: detection and classification of ICH into subtypes. Dataset: RSNA. Window policy: brain window (WW = 80; WL = 40); subdural window (WW = 200; WL = 80); bone window (WW = 2800; WL = 600). Pre-processing: –. Saliency map: –. Performance: highest Accuracy = 93.3%; average per-class Recall = 76%. Strong points: (1) all performance metrics are presented as per-class measures, which helps in better analysis for diagnosis among subtypes of ICH. Review comments: (1) no pre-processing done; (2) no visualization of ICH presented; (3) only one classifier is trained.

Sage and Badura [9]. Application: detection and classification of ICH into subtypes. Dataset: RSNA. Window policy: brain window (WW = 80; WL = 40); subdural window (WW = 200; WL = 100); bone window (WW = 2800; WL = 600). Pre-processing: brain-region cropping; skull removal. Saliency map: –. Performance: highest accuracy reported for intraventricular = 96.7%; intraparenchymal = 93.3%; subdural = 89.1%; epidural = 76.9%; subarachnoid = 89.7%. Strong points: (1) pre-processing techniques were applied on images before the training phase; (2) made use of spatial interdependency among slices of a CT scan. Review comments: (1) spatial interdependency among slices not used; (2) no saliency map visualization; (3) only a subset of RSNA was used.

Nguyun et al. [15]. Application: detection and classification of ICH into subtypes using spatial interdependency among slices of the CT scan. Dataset: RSNA; CQ500 (external validation). Window policy: two windowing sets: (a) brain window (WW = 80; WL = 40), subdural window (WW = 215; WL = 75), bone window (WW = 2800; WL = 600); (b) brain window (WW = 80; WL = 40), subdural window (WW = 200; WL = 80), soft-tissue window (WW = 380; WL = 40). Pre-processing: data augmentation. Saliency map: –. Performance: weighted mean log loss for SE-ResNeXt-50 = 0.05218; for ResNet-50 = 0.05289. Strong points: (1) used interdependency among slices by applying a Bi-LSTM network; (2) tested models on the CQ500 dataset for external validation. Review comments: (1) no pre-processing and visualization techniques applied.

Burduja et al. [3]. Application: detection and classification of ICH into subtypes using spatial interdependency among slices of the CT scan. Dataset: RSNA. Window policy: –. Pre-processing: data augmentation. Saliency map: Grad-CAM heat maps presented. Performance: weighted mean log loss = 0.04989. Strong points: (1) used interdependency among slices by applying a Bi-LSTM network; (2) presented saliency maps. Review comments: (1) no pre-processing techniques applied.

Hoon et al. [16]. Application: detection and classification of ICH in CT scan using spatial interdependency among slices of CT scans. Dataset: RSNA. Window policy: brain window (WW = 80; WL = 40); subdural window (WW = 200; WL = 80); bone window (WW = 1800; WL = 400). Pre-processing: data augmentation; data balancing. Saliency map: –. Performance: weighted mean log loss = 0.07528. Strong points: (1) addressed the problem of class imbalance in the RSNA dataset and presented data-balancing techniques. Review comments: (1) no pre-processing and visualization techniques applied; (2) the number of labels is shown as the number of images in the dataset, which is not correct.

Chen et al. [25]. Application: detection and classification of ICH in CT scan using an Internet-of-Things-based system. Dataset: private dataset. Window policy: –. Pre-processing: –. Saliency map: –. Performance: accuracy for SVM = 80.67%; accuracy for feedforward neural network = 86.7%. Strong points: (1) presented the importance and use of IoT-based devices for diagnosis of diseases; (2) implemented an end-to-end mobile application for real-time use. Review comments: (1) dataset used is small and not made publicly available; (2) no pre-processing techniques were applied; (3) better classifiers could be used for achieving better results.

Salehinejad et al. [8]. Application: detection and classification of ICH into subtypes using spatial … Dataset: RSNA; private dataset (external validation). Window policy: brain window (WW = 80; WL = 40); subdural window (WW = …). Pre-processing: –. Saliency map: Grad-CAM and Grad-CAM++ heat maps presented. Performance: RSNA: AUC = 98.4%, Sensitivity = 98.8%, Specificity = … Strong points: (1) tested models on an external validation dataset, which proves better … Review comments: (1) no pre-processing done; (2) haven't made private dataset …
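Several of the reviewed works convert CT scans into 3-channel images through the windowing policies listed above (WW = window width, WL = window level). A minimal sketch of that transform, assuming Hounsfield-unit input; the helper name is ours, and the three window settings are the commonly cited brain/subdural/bone values from the table:

```python
import numpy as np

def apply_window(hu, ww, wl):
    """Map Hounsfield units through a (WW, WL) window to an 8-bit channel."""
    lo, hi = wl - ww / 2.0, wl + ww / 2.0
    clipped = np.clip(hu.astype(np.float64), lo, hi)
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

hu = np.array([[-100.0, 0.0, 40.0, 200.0]])      # toy slice in HU
brain = apply_window(hu, ww=80, wl=40)           # brain window

# Stacking several windows yields the 3-channel image fed to a CNN.
three_channel = np.stack([
    apply_window(hu, 80, 40),      # brain
    apply_window(hu, 200, 80),     # subdural
    apply_window(hu, 2800, 600),   # bone
], axis=-1)
```

Each channel emphasizes a different tissue range, which is why most pipelines in Table 1 apply more than one window.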
This study aims to investigate the problem of detection of intracranial brain hemor-
rhage and its classification into subtypes. Intracranial hemorrhage (ICH) is a life-
threatening emergency that corresponds to acute bleeding within the skull (cranium).
Thousands of people die every year due to the lack of immediate treatment of ICH. We
have shown the significance of machine learning and deep learning in the diagnosis
of ICH. Along with general insights into intracranial hemorrhage and its subtypes,
the paper described the existing methods of diagnosis using CT scans and MRI. Our
study also explains how AI/ML techniques can be used for the detection and
extraction of the ICH region. In reviewing previous work, the paper presents a
state-of-the-art survey ranging from data handling to feature extraction and
classification. All these stages in the pipeline were explored and analyzed individ-
ually. The works are compared along various dimensions, such as the application
of the work, the dataset used, the data pre-processing steps included, the heat maps
presented, and the AI/ML techniques and classifiers employed.
We have compared previous studies in the field of detection and classification of
intracranial brain hemorrhage on the basis of some common parameters. However,
there are some limitations of this study that need to be addressed in future work.
Firstly, we have mainly reviewed works that use deep learning techniques, because
the performance of deep learning models has generally been observed to be much
better than that of traditional machine learning methods and algorithms; almost
all recent studies in this field have employed deep learning-based CNN models for
classification. Secondly, we assumed that the reader has some prior knowledge of
the implementation details of the various algorithms and methods presented in this
study, which is why we have not included their working details or theoretical
background. Thirdly, some specific parameters, such as hyperparameter values
(batch size, learning rate, number of nodes or layers in customized networks,
epochs, kernel size, etc.), the number of images in the datasets, information about
data splitting, and the results of the reviewed works, have not been presented,
because these parameters were implemented differently in different studies and thus
cannot be directly compared. Lastly, we have not implemented any code to confirm
the results claimed in the reviewed studies, and we do not guarantee the qualitative
results of these studies in real-time applications for the diagnosis of ICH.
As future work, we suggest implementing several pipelines for the detection and
classification of ICH using CT scans. In these pipelines, one can apply different
pre-processing techniques, such as skull removal, head cropping, and enhancement
of medical image quality via CLAHE, gamma correction, or histogram equalization,
along with different image data augmentation techniques, and then compare the
results obtained from these pipelines to identify the pre-processing and
augmentation techniques that achieve the best results. For classification, it would
be suggested to use pre-trained CNN models for feature extraction and Bi-LSTM
References
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 67
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_5
four parts: MPA (medial plantar artery), LPA (lateral plantar artery), MCA (medial
calcaneal artery), and LCA (lateral calcaneal artery) [1, 2], as shown in Fig. 1.
Past work on DFU in the context of machine learning and deep learning has mainly
been done on thermogram image data, where feature selection and extraction are
performed by deep learning models. Various deep learning models are used with
pre-training in order to achieve high accuracy [6]. Francisco et al. (2017)
discussed the thermoregulation of healthy, overweight–obese, and diabetic
individuals; the paper also covers conventional foot assessment methods and
infrared thermography.
Muhammad et al. proposed computer-aided diagnosis of the diabetic foot
using infrared thermography, presenting different techniques for thermal image
analysis. Among them, asymmetric temperature analysis is a commonly used technique,
as it is simple to implement and yielded satisfactory results in previous studies.
In 2019, Dineal et al. created a database of plantar thermograms, discussing
various challenges in capturing and analyzing thermogram data and providing a
database composed of 334 individual thermograms from 122 diabetic subjects and
45 non-diabetic subjects. Each thermogram includes four extra images corresponding
to the plantar angiosomes, and each image is accompanied by its temperature.
Many techniques have been used for processing thermogram patterns, such as spatial
patterns, segmentation, active contour models, edge detection, and diffuse clustering
[2]. Later, further work was done on image classification using deep learning, where
the performance of models such as GoogLeNet and AlexNet was compared with ANN and
SVM. Some issues need to be addressed when DL is used: the dataset size, the
appropriate labeling of the samples, the segmentation and selection of regions of
interest (ROIs), and the use of pre-trained structures in transfer-learning mode or
the design of a proper new learning structure from scratch, among others [7].
Proper feature selection and appropriate hyperparameter adjustment can provide
high-accuracy classification results using traditional ML techniques. In this study,
feature extraction, feature ranking, and machine learning (ML) methods are explored.
This study provides a comparative analysis of various ML techniques applied to the
thermogram database for the DFU profile of the subjects [2]. Grid search provides
the best hyperparameters for the models and helps achieve high accuracy for Random
Forest and SVM.
2 Methodology Proposed
This section presents the methodology proposed for profiling diabetic foot ulcer-
ation using machine learning techniques for rehabilitation. Figure 2 illustrates
the methodology used in this pilot study.
In this methodology, the dataset, an Excel file containing information for each
individual (DM patients and CG people), is first preprocessed. Preprocessing
includes the identification and treatment of missing values and the encoding of
categorical data. The next step is data analysis and feature extraction, where the
data are analyzed using the pandas library to determine feature correlation and
relevance. The next step is the application of ML models to the processed data,
followed by hyperparameter optimization in
Fig. 2 Methodology used for profiling diabetic foot ulceration using machine learning
techniques
order to get optimum results. Further steps describe the comparative analyses
performed using different ML models and different ratios of training and test sets.
The results give a clear understanding of the various features and their role in
different ML models, thereby indicating which ML model provides the optimum result
with which set of features. This analysis points to important features that can be
checked for abnormal temperature change in their foot regions, which can be focused
on in order to avoid DFU at an early stage.
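The preprocessing steps above (treating missing values, encoding categorical data) can be sketched with pandas; the mini-frame and its values below are invented for illustration, only the column names follow the study:

```python
import pandas as pd

# Hypothetical excerpt of the thermogram spreadsheet.
df = pd.DataFrame({
    "Gender": ["M", "F", None, "F"],
    "Age":    [55.0, None, 61.0, 47.0],
    "R_MPA":  [27.1, 28.4, None, 26.9],
    "Result": ["DM", "CG", "DM", "CG"],
})

# Treat missing numeric values with the column mean,
# missing categorical values with the mode.
for col in ["Age", "R_MPA"]:
    df[col] = df[col].fillna(df[col].mean())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# Encode categorical data as integers (e.g. DM = 1, CG = 0).
df["Gender"] = df["Gender"].map({"M": 0, "F": 1})
df["Result"] = df["Result"].map({"CG": 0, "DM": 1})
```

After this step the frame is fully numeric and free of missing values, ready for feature analysis.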
The thermogram database [2] is used, which contains features such as Age, Weight,
Height, IMC, R_General, R_LCA, R_LPA, R_MCA, R_MPA, R_TCI, L_General, L_LCA,
L_LPA, L_MCA, L_MPA, L_TCI, and Result. The database is composed of 167 plantar
thermograms, obtained from 122 diabetic subjects (referred to here as DM, diabetes
mellitus) and 45 non-diabetic subjects (referred to as CG, control group). The
subjects were recruited from the General Hospital of the North, the General
Hospital of the South, the BIOCARE clinic, and the National Institute of
Astrophysics, Optics and Electronics (INAOE) over a period of 3 years (from 2012
to 2014) [2]. Much research has been done on capturing correct and accurate
thermograms; the posture and angle at which a thermogram is taken matter, and in
order to obtain accurate and useful thermograms for clinical practice, the
recommendations of the International Academy of Clinical Thermology were followed
[8]. The dataset is available in two formats: one format consists of a thermogram
image together with a CSV file containing the temperature at each pixel; this
record is maintained for each subject.
The dataset includes information about the following:
• Gender,
• Age,
• Weight,
• Height,
• IMC (stands for BMI in French),
• R_General and L_General (general temperature of the foot; R_ represents the
right foot and L_ the left foot),
• R_LCA and L_LCA (temperature in Celsius for the lateral calcaneal artery),
• R_LPA and L_LPA (temperature in Celsius for the lateral plantar artery),
• R_MCA and L_MCA (temperature in Celsius for the medial calcaneal artery),
• R_MPA and L_MPA (temperature in Celsius for the medial plantar artery),
• R_TCI and L_TCI (based on the mean differences between corresponding
angiosomes of the foot of a diabetic subject).
Based on the output in Fig. 3, Age, Weight, Height, IMC, R_General, R_MCA, R_MPA,
and L_MCA are selected as features for the study.
In this study, k-nearest neighbor, naïve Bayes, decision tree, random forest,
logistic regression, support vector machine (SVM), and AdaBoost methods are explored
on the dataset for independent feature analysis.
KNN: K-nearest neighbors is a simple algorithm that stores all available cases and
classifies new cases by a majority vote of their k nearest neighbors. Various
distance functions can be used: Euclidean, Manhattan, Minkowski, and Hamming. The
first three are used for continuous variables and the fourth (Hamming) for
categorical variables.
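A minimal KNN sketch on toy 2-D points (the data are invented; Minkowski distance with p=2 is the Euclidean case named above, p=1 the Manhattan case):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated toy clusters.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# k = 3 neighbors, Euclidean distance (Minkowski with p = 2).
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2).fit(X, y)
pred = knn.predict([[0.5, 0.5], [5.5, 5.5]])
```

Each query point is assigned the majority label of its three nearest stored cases.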
Naïve Bayes Classifier: A naïve Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature. The
model is easy to build and particularly useful for very large datasets. Bayes'
theorem provides a way of calculating the posterior probability P(c|x) from P(c),
P(x), and P(x|c).
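In symbols, the posterior named above is computed via the standard form of Bayes' theorem, stated here for reference:

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}
```

Under the naïve independence assumption, the class-conditional likelihood for a feature vector factorizes into a product of per-feature terms, which is what makes the model cheap to fit.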
3 Result
A tenfold cross-validation procedure is used to evaluate each algorithm with a
70%/30% training/testing data split, configured with the same random seed to
ensure that the same splits of the training data are performed and that each
algorithm is evaluated in precisely the same way. The results are shown in Table 1.
Figure 5 illustrates the spread of accuracy scores across the cross-validation folds
for each algorithm using a box-and-whisker plot. For the machine learning
approaches, grid search is used to find the best possible set of parameters. Table 1
shows that the Random Forest and SVM classification methods provide higher
classification accuracy under tenfold cross-validation.
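The grid-search-plus-tenfold-cross-validation setup can be sketched with scikit-learn; the data and the parameter grid below are illustrative stand-ins (the real features come from the plantar thermogram database):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 167 subjects, 8 features, binary DM/CG label.
rng = np.random.default_rng(0)
X = rng.normal(size=(167, 8))
y = (X[:, 0] + 0.5 * X[:, 4] + rng.normal(scale=0.5, size=167) > 0).astype(int)

# Grid search over a small hypothetical parameter grid, scored with
# tenfold cross-validation and a fixed random seed, as in the study.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
grid.fit(X, y)
best_params = grid.best_params_
best_score = grid.best_score_
```

`best_params` holds the winning parameter combination and `best_score` its mean cross-validated accuracy; the same pattern applies to SVM or any other estimator in the study.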
In this pilot study, five cases of training/testing splits have been considered on
the dataset. The hypothesis is that machine learning accuracy and dataset features
are not correlated.
Case 1 consists of 10% training data and the remaining 90% as testing data. In
case 2, the split is 30% and 70%, respectively. The ratio is 50% for both training
and testing in case 3. Case 4 comprises 70% training and 30% testing, followed by
90% and 10% for case 5, as shown in Table 2.
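The five splits can be produced with scikit-learn's `train_test_split`; `X` and `y` below are placeholders for the thermogram features and DM/CG labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 50 samples, 2 features, balanced binary labels.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

cases = {1: 0.10, 2: 0.30, 3: 0.50, 4: 0.70, 5: 0.90}  # training fraction
splits = {}
for case, train_frac in cases.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, random_state=42, stratify=y
    )
    splits[case] = (len(X_tr), len(X_te))
```

Stratifying on `y` keeps the DM/CG class proportions the same in every split, which matters most for the extreme 10% and 90% training cases.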
Table 3 shows the results of the five cases for the k-nearest neighbor, naïve Bayes,
decision tree, random forest, logistic regression, support vector machine (SVM),
and AdaBoost methods. Random Forest and Logistic Regression are able to classify
the DFU profile even with only 10% of the data as the training set. Naïve Bayes and
Logistic Regression also perform well in case 2. When the split was 50% for both
training and testing, the decision tree accuracy was 96.42%. This suggests that the
accuracy of DFU profiling may be independent of the dataset size because the
features are prominent identifiers.
Figure 6 shows, for each machine learning technique, which features support the
model and which are not relevant for the respective split ratio across the five
cases. Age is a major factor
Table 2 Cases considered for result analysis with respect to different training and test splits
Case 1: Train 10%, Test 90%
Case 2: Train 30%, Test 70%
Case 3: Train 50%, Test 50%
Case 4: Train 70%, Test 30%
Case 5: Train 90%, Test 10%
Table 3 Accuracy for five cases considered in the study over machine learning techniques
ML technique Case 1 Case 2 Case 3 Case 4 Case 5
KNN 72.8 93.16 94.04 88.25 94.11
Naïve Bayes 86.09 95.72 92.85 76.47 88.23
Decision tree 92.71 92.3 96.42 94.11 88.23
Random forest 94.03 92.3 95.23 94.11 100
Logistic regression 94.03 95.72 92.85 92.15 100
SVM 92.05 94.87 91.66 92.15 91.66
Ada boost 92.71 88.88 94.04 92.15 100
that is a prominent feature in DFU profiling. This correlates with the standard
factors responsible for the diabetic foot. It can be concluded that Height, Weight,
IMC (BMI), and R_General are major factors contributing to the accuracy of the
model for profiling.
Table 4 presents the detailed effect of features on accuracy for the five cases with
respect to the machine learning techniques used. This analysis can help in localizing
the foot regions that are more sensitive toward ulcer formation.
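Feature-effect values like those in Table 4 can be read directly from a fitted model; as a hedged sketch, here are random forest impurity importances over the study's eight selected feature names (the data are synthetic, with "Age" deliberately made the dominant signal, so the numbers are illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["Age", "Weight", "Height", "IMC",
                 "R_General", "R_MCA", "R_MPA", "L_MCA"]

# Synthetic stand-in data where the first column ("Age") drives the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(167, 8))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
```

Tree models expose `feature_importances_` (non-negative, summing to 1), while linear models such as logistic regression or a linear SVM expose signed coefficients instead, which is why Table 4 mixes positive and negative values across techniques.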
Fig. 6 Feature importance across various ML techniques under five different cases
Table 4 Effect of features on accuracy for five cases with respect to the machine learning techniques used
ML technique Age Weight Height IMC R_General R_MCA R_MPA L_MCA
Case 1 KNN 0 0 0 0 0 0 0 0
Naïve Bayes 0.11 0.79 0.03 0.029 0.012 0.011 0.01 0.008
Decision tree 1 0 0 0 0 0 0 0
Random forest 0.41323 0.04004 0.13328 0.09129 0.09398 0.0756 0.07495 0.07667
Logistic regression 1.49027 0.00953 −0.60241 0.43996 0.10717 0.27533 −0.129 0.05483
SVM 1.53529 0.07203 −0.31452 0.3811 0.21512 0.43094 −0.2202 0.1254
Ada boost 1 0 0 0 0 0 0 0
Case 2 KNN 0.11257 0 0.04311 0.03832 0.00479 0 0.00359 0.00599
Naïve Bayes 0.022 0.049 0.03 0.03 0.025 0.0175 0.016 0.1
Decision tree 0.85303 0 0 0.14697 0 0 0 0
Random forest 0.33165 0.04995 0.17966 0.13545 0.09977 0.0493 0.09017 0.06405
Logistic regression 1.77988 −0.01645 −0.98581 0.72805 0.36352 0.1799 0.17141 −0.07007
SVM 1.13259 −0.1013 −0.72769 0.64185 0.56792 0.28346 0.14939 −0.37955
Ada boost 0.24 0 0.16 0.48 0.04 0 0.04 0.04
Case 3 KNN 0.15569 0.01198 0.04072 0.02994 −0.00599 −0.00359 −0.0012 0.00958
Naïve Bayes 0.17 0.01 0 0 0 0 0 0.01
4 Discussion
From the above analysis we can infer that taking high training-set ratios leads to
overfitting of various classifiers, while taking little training data compared to
test data leads some classifiers to predict with high accuracy while internally
using only one or very few features for classification. For example, when
performing decision tree classification over the plantar thermogram database with a
split of 10% for training and 90% for testing, the classifier gives 100% accuracy
under grid search; this was because Age was the only feature used for the creation
of the decision tree. However, we cannot classify based on only one feature, and
Age is very common information that does not have much significance for the medical
problem of the diabetic foot.
Naïve Bayes accuracy falls as the training ratio in the split increases, which
shows that this model is not suitable for classifying this dataset. The 70:30 split
ratio is best suited, as almost all ML models perform well with it. Random Forest
and SVM perform better on the plantar thermogram database.
Earlier work used image data, which relies purely on patterns; this is a complex
process, as DFU leads to foot deformation and the pattern is not fixed. The present
analysis can help localize the foot regions that are more sensitive toward ulcer
formation. This can be inferred from the feature importance for each ML model,
which also provides a clear idea of which features are more important to record and
which data split ratio helps to obtain the optimum result.
5 Conclusion
This study shows that detecting the diabetic foot at an early stage with the help
of thermogram data is a promising approach. The thermogram data consist of the
temperature of four angiosome regions of the plantar area along with personal
details such as age and weight. The paper concludes that using diabetic foot
thermogram data as input to machine learning techniques to classify the diabetes
mellitus group and the control group, while keeping a 70:30 ratio for training the
dataset, gives a balanced result that is free from overfitting and underfitting.
The results indicate that Random Forest performs better, and all the features have
positive importance in the Random Forest machine learning technique. The accuracy
and performance of the model can be further increased by hyperparameter tuning
methods.
References
1. Cajacuri LAV (2014) Early diagnostic of diabetic foot using thermal images. HAL 11 Jul 2014
2. Hernandez-Contreras DA, Peregrina-Barreto H, de Jesus Rangel-Magdaleno J, Renero-Carrillo
F-J (2019) Plantar thermogram database for the study of diabetic foot complications. IEEE
Access. (4 Nov 2019)
3. Lahiri BB, Bagavathiappan S, Jayakumar T, Philip J (2012) Medical applications of infrared
thermography: a review. Infrared Phys Technol 55(4). (July 2012)
4. Mori T, Nagase T, Takehara K, Oe M, Ohashi Y, Amemiya A, Noguchi H, Ueki K, Kadowaki
T, Sanada H (2013) Morphological pattern classification system for plantar thermography of
patients with diabetes. J Diabetes Sci Technol 7(5). (September 2013)
5. Adam M, Ng EYK, Tan JH, Heng ML, Tong JWK, Acharya UR (2017) Computer aided
diagnosis of diabetic foot using infrared thermography: a review. (25 Oct 2017)
6. Gamage C, Wijesinghe I, Perera I (2019) Automatic scoring of diabetic foot ulcers through
deep CNN based feature extraction with low rank matrix factorization. In: 2019 IEEE 19th
international conference on bioinformatics and bioengineering (BIBE). https://doi.org/10.1109/
bibe.2019.00069
7. Cruz-Vega I, Hernandez-Contreras D, Peregrina-Barreto H, de Jesus Rangel-Magdaleno J, and
Ramirez-Cortes JM (2020) Deep learning classification for diabetic foot thermograms. (Mar
2020)
8. International Academy of Clinical Thermology (2002) Thermography guidelines: standards
and protocols in clinical thermographic imaging. Redwood City, CA, USA
9. Peregrina-Barreto H, Morales-Hernandez LA, Rangel-Magdaleno JJ, Avina-Cervantes JG,
Ramirez-Cortes JM, Morales-Caporal R (2014) Quantitative estimation of temperature varia-
tions in plantar angiosomes: a study case for diabetic foot. In: Computational and mathematical
methods in medicine, vol 2014
10. Renero-C FJ (2017) The thermoregulation of healthy individuals, overweight–obese, and
diabetic from the plantar skin thermogram: a clue to predict the diabetic foot, vol 8
A Deep Learning Approach for Gaussian
Noise-Level Quantification
1 Introduction
Image noise removal has been an active research topic in the domain of image
processing. Noise is a random variation of brightness or color in an image that
does not portray the image's true information. The presence of noise alters the
true pixel values and causes a loss of information, which is a disadvantage in
image processing. A few common types of noise found in images are Gaussian noise,
salt-and-pepper noise, and speckle noise [1–3]. Noise may be introduced during
capture or transmission, or due to electrical faults in the capturing device [4–6].
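A minimal sketch of the additive Gaussian case discussed in this chapter: corrupting a grayscale image with zero-mean Gaussian noise of a chosen sigma (the image here is a flat synthetic frame, and the helper name is ours):

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise of standard deviation sigma to an 8-bit image."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((64, 64), 128, dtype=np.uint8)  # flat mid-gray test frame
noisy = add_gaussian_noise(clean, sigma=25)
```

The sigma parameter is exactly the noise level a quantification model would try to estimate from the corrupted image.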
Noise reduction techniques have been a domain of extensive study over the last
few years. Most of these studies focus on additive white Gaussian noise, as it is
one of the most common types of noise present in images. It is hard to dispute
that these techniques have proven very helpful in Digital Image Processing
(DIP) [7]. However, these techniques are based on the assumption that the images
to be processed are noisy; the possibility of an image being noise-free is ignored.
Almost all of the aforementioned methods struggle to determine whether the images
are corrupted by noise, leading to an additional processing overhead in which the
noisy images have to be sorted out manually in advance. Noise quantification
therefore also becomes a necessary step in image denoising. The development of a
Gaussian noise quantification model is the main interest of the author(s). In this
research article,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 81
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_6
82 R. K. Yadav et al.
the author(s) are presenting a Convolutional Neural Network (CNN) model which is
inspired by the LeNet and AlexNet architectures [8, 9]. The proposed model will help to
identify and apply an appropriate denoising algorithm based on the amount of noise present
in the image.
The paper is further organized as follows. Section 2 briefly reviews related
work. Section 3 introduces the proposed model. Section 4 delineates the experimental
results. Finally, Sect. 5 concludes the work and discusses its future scope.
2 Related Work
A lot of work has been done on image noise reduction so far. An image-denoising
collaborative filtering method using a sparse 3D transform domain is proposed by Dabov
et al. [10]. The Non-Local Means (NLM) technique, such as Block Matching and 3D
filtering (BM3D) [11], is one of the powerful image-denoising techniques used by
researchers. Another technique, Prefiltered Rotationally Invariant Non-Local Means 3D
(PRI-NLM3D), is proposed by Manjon et al. [12]. This NLM-based denoising
technique provided good accuracy scores in terms of Peak Signal-to-Noise Ratio
(PSNR), Structural Similarity Index Measure (SSIM), and Universal Image Quality
Index (UQI) for denoising Magnetic Resonance (MR) images. Gondara et
al. [13] proposed a deep learning approach for medical image denoising based on a
convolutional autoencoder. This model was compared with NLM and the median filter
and yielded better SSIM scores for a small training sample of 300.
However, less work has been done on identifying the type and amount of noise
present in an image, although quantification of the noise is no less important than noise
reduction. For identifying the type of noise present in an image, a voting-based deep
CNN model is proposed by Kumain et al. [14]; this model only gives information
about the type of noise present. For quantifying the Gaussian noise present
in an image, Chauh et al. [15] proposed a deep learning approach based on a CNN.
This method quantified Gaussian noise into ten classes, corrupting the images with noise levels
of σ = 10, 20, 30, 40, 50, 60, 70, 80, and 90, and achieved an
accuracy of 74.7%. A noise classifier based on a CNN was proposed by Khaw et al. [16]
utilizing the Stochastic Gradient Descent (SGD) optimization technique; the noisy
image was fed as input to the model and classified based on the distinctive features extracted by
the sequence of convolutional and pooling layers. CNN classification methods
have yielded excellent results in several domains such as Handwritten Character
Recognition or Character Classification [17], Vehicle Logo Recognition [18], Face
Classification [19], and Bank Note Series identification [20].
In a real-world scenario, if the clean image is unavailable and only the noisy
image is available, then performance parameters such as PSNR and SSIM cannot
be computed. So, quantification of the noise level becomes necessary. The author(s) have
proposed a CNN model for Gaussian noise quantification, which the next section
describes.
3 Proposed Model
In this section, the architecture of the proposed model for the quantification of
Gaussian noise is discussed. The model architecture is shown in Fig. 1.
The author(s) developed a noise quantification model based on the deep
learning technique. The proposed model is inspired by the LeNet and AlexNet
architectures [8, 9]. The proposed architecture addresses a multiclass classification
problem and classifies the input image into 11 different classes: 10 classes
represent images corrupted by 10 different levels of zero-mean Gaussian noise,
and 1 class represents a noise-free image. The specification of the dataset
used for model training and testing is given in the experimental analysis section.
CNN has been used to develop the classifier model. Input images are resized to
256 × 256 × 3 and perturbed by Gaussian noise levels of variance 0.01, 0.02, 0.03,
0.04, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. The noisy image is fed to the first Convolutional
(Conv2D) layer. The filter values are adjusted through backpropagation repeatedly
to get the optimum set of values for the best classification accuracy. The model has
a series of four alternating Conv2D and MaxPool layers. MaxPool layer is used to
subsample the feature maps. Further, the author(s) utilized the dropout layer to reduce
the overfitting of the model. In the model development process, the author(s) utilized
Rectified Linear Unit (ReLU) as the activation function and Adam as the optimizer. The
softmax function is used in the final dense layer to predict a probability for each
class of image; the class with the highest probability is chosen. The model
summary with the number of trainable parameters is given in Table 1.
Fig. 1 Architecture of the proposed model depicting the CNN layers used
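The architecture described above can be sketched with the Keras functional API. This is an illustrative reconstruction, not the authors' code: the filter counts, kernel sizes, and the name `build_quantifier` are assumptions, since Table 1 is not reproduced here.

```python
# Sketch of the proposed CNN: four alternating Conv2D + MaxPool blocks,
# a dropout layer, and a softmax dense layer over 11 classes.
# Filter counts and kernel sizes below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers


def build_quantifier(num_classes=11):
    inputs = tf.keras.Input(shape=(256, 256, 3))   # resized input images
    x = inputs
    for filters in (32, 64, 64, 128):              # four Conv2D + MaxPool blocks
        x = layers.Conv2D(filters, 3, activation="relu")(x)
        x = layers.MaxPooling2D()(x)               # subsample the feature maps
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)                     # dropout to reduce overfitting
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",                # Adam, as stated in the text
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_quantifier()
```

Training with the ModelCheckpoint and EarlyStopping callbacks mentioned in the experimental discussion would pass them via `model.fit(..., callbacks=[...])`.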
4 Experimental Analysis

This section comprises the steps used for dataset preparation and the classification
results. Section 4.1 describes the dataset preparation, Sect. 4.2 the
performance parameters used for evaluation, and Sect. 4.3 the experimental
results.
4.1 Dataset Preparation

In the dataset preparation process, due to the non-availability of a suitable dataset,
the noisy dataset was prepared by adding Gaussian noise at different levels
of variance with zero mean. First, 2000 images were taken randomly from the
MSRA10K [21] dataset, and noise was then added. For model training, 70%
of the samples were used; the remaining data was split into validation and testing sets
with 15% of the data in each. Along with the noise-free class, a total of 11 classes were
created. A brief description of the dataset is given in Table 2.
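The preparation steps above can be sketched as follows. Random arrays stand in for the MSRA10K images, and all names here (`add_gaussian_noise`, the index variables) are illustrative, not taken from the paper.

```python
# Corrupt clean images with zero-mean Gaussian noise at the ten variance
# levels, keep one noise-free class, and split 70/15/15.
import numpy as np

VARIANCES = [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]


def add_gaussian_noise(img, var, rng):
    """Corrupt an image with values in [0, 1] with zero-mean Gaussian noise."""
    noisy = img + rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(noisy, 0.0, 1.0)


rng = np.random.default_rng(0)
clean = rng.random((20, 64, 64, 3))   # stand-in for the 2000 MSRA10K images

images, labels = [], []
for img in clean:
    images.append(img)                # class 0: noise-free
    labels.append(0)
    for cls, var in enumerate(VARIANCES, start=1):
        images.append(add_gaussian_noise(img, var, rng))
        labels.append(cls)

X, y = np.stack(images), np.array(labels)

# 70% train, 15% validation, 15% test, via a random permutation
perm = rng.permutation(len(X))
n_train, n_val = int(0.7 * len(X)), int(0.15 * len(X))
train_idx, val_idx, test_idx = np.split(perm, [n_train, n_train + n_val])
```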
4.2 Performance Parameters

For overall evaluation, the classification report [22] and confusion matrix [23] have
been used. The key terms are as follows.
(a) Precision: Precision measures how many of the total predicted positives for a
class are actually positive. The equation for precision is as follows:

precision = True Positive / (True Positive + False Positive)    (1)

(b) Recall: Recall measures how many of the total actual positives for a particular
class were correctly classified. The equation for recall is as follows:

recall = True Positive / (True Positive + False Negative)    (2)
(c) F1-Score: It is the harmonic mean of Precision and Recall.
The best score is represented by 1 and the worst score by 0.
(d) Confusion Matrix: It is a matrix that represents the result in the form of a table.
The diagonals represent true positives for the corresponding class. The rows represent
actual values and the columns represent predicted values.
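The quantities above can be computed directly from the confusion matrix. A minimal NumPy sketch (not the authors' code; function names are illustrative):

```python
# Per-class precision, recall, and F1 from a confusion matrix whose rows
# are actual classes and whose columns are predicted classes.
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                 # rows: actual, columns: predicted
    return cm


def per_class_metrics(cm, c):
    tp = cm[c, c]                     # diagonal: true positives for class c
    fp = cm[:, c].sum() - tp          # predicted c but actually another class
    fn = cm[c, :].sum() - tp          # actually c but predicted otherwise
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Tiny 3-class example
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)
p, r, f1 = per_class_metrics(cm, 1)
```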
4.3 Experimental Results

During the training of the model, the ModelCheckpoint and EarlyStopping [24] callbacks
of the Keras (Python) library were used to retain the model with the best validation
accuracy. Since it is difficult to estimate the exact number of epochs in advance, EarlyStopping
was used with the patience level set to 50 during model training. Figures 2
and 3 show the training/validation accuracy and training/validation
loss, respectively.
After analyzing the accuracy and loss graphs for training and validation, the effect
of the dropout layer during the model development process was observed: the
dropout layer is useful for mitigating overfitting. However, fluctuations can be
seen in the accuracy and loss curves. Nevertheless, the
author(s) were able to save the best model using the ModelCheckpoint [24]
feature of the Keras (Python) library. The best model was obtained at epoch 116; with
a patience level of 50, training automatically stopped at epoch 166. The
best saved model was used to measure the accuracy of the model on the test set.
The experimental results, as per the performance parameters discussed above, are
shown in Figs. 4 and 5. The proposed model shows better results than
Chauh et al. [15], who used standard images from the USC-SIPI dataset [25]
with noise levels of σ = 10, 20, 30, 40, 50, 60, 70, 80, and 90 and achieved an
accuracy of 74.7%, whereas the quantification model presented in this paper
achieved 96% accuracy.
5 Conclusion and Future Scope

In this research article, the author(s) have proposed a model for quantification of the
Gaussian noise present in an image. Quantitative parameters such as SSIM and
PSNR cannot measure the strength of noise reduction when the clean image
is not available. This quantification model can help evaluate a denoising model in
terms of the amount of noise it has removed when no clean image is available.
The author(s) here have addressed only 11 classes for this quantification task. The
work can be further extended by incorporating more levels of noise and developing
a generalized model which will address other types of noise as well.
References
1. Ambulkar S, Golar P (2014) A review of decision based impulse noise removing algorithms. Int J Eng Res Appl 4:54–59
2. Verma R, Ali J (2013) A comparative study of various types of image noise and efficient noise removal techniques. Int J Adv Res Comput Sci Softw Eng 3(10)
3. Singh M, Govil MC, Pilli ES, Vipparthi SK (2019) SOD-CED: salient object detection for noisy images using convolution encoder-decoder. IET Comput Vision 13(6):578–587
4. Hosseini H, Hessar F, Marvasti F (2015) Real-time impulse noise suppression from images using an efficient weighted-average filtering. IEEE Signal Process Lett 22:1050–1054
5. Bovik A (2000) Handbook of image and video processing, 2nd edn. Elsevier Academic Press
6. Kumain SC, Singh M, Singh N, Kumar K (2018) An efficient Gaussian noise reduction technique for noisy images using optimized filter approach. In: 2018 first international conference on secure cyber computing and communication (ICSCCC), pp 243–248
7. Chang SG, Bin Y, Vetterli M (2000) Adaptive wavelet thresholding for image denoising and compression. IEEE Trans Image Process 9:1532–1546
Performance Evaluation of Single Sample Ear Recognition Methods

1 Introduction
Biometrics [1] are physical or behavioral characteristics that can uniquely identify a
human being. Physical biometrics include the face, eye, retina, ear, fingerprint, palmprint,
periocular region, footprint, etc. Behavioral biometrics include voice matching, signature
and handwriting, etc. Biometrics have several applications [1] in diverse
areas such as ID cards, surveillance, authentication, security in banks and airports, and
corpse identification. The ear [2] is a recent biometric which has drawn the attention
of the research community. This biometric possesses certain characteristics which
distinguish it from other biometrics: for example, it requires less information than
the face, and when a person stands in profile to the camera, where face
recognition does not perform satisfactorily, the ear remains usable. Further, no user cooperation is required for
ear recognition, as is required by other biometrics such as iris and fingerprint.
The ear is one of those biometrics whose permanence is very high. Unlike
the face, which changes considerably throughout life, the ear changes very
little. Further, it is fairly collectible, and in the post-COVID scenario it can
be considered a safer biometric, since the face and hands are often covered with masks
or gloves. It is also more acceptable to users when they are not asked for many
samples. In a real-world scenario, the problem of ear recognition becomes more
complex when only a single training sample is available. Under these circumstances,
the one sample per person (OSPP) [3] setting is used. This methodology has been
highlighted in the research community across problem domains such as face
recognition [3, 4], ear recognition [5] and other biometrics. The reason OSPP is
popular is the ease of dataset preparation; specifically, collecting a single sample
from each subject is very easy. However, recognition becomes more complex due to
the lack of samples, and hence the model cannot be trained in the best possible manner.
A. R. Srivastava and N. Kumar
There are several methods suggested in the literature for addressing
OSPP for different biometric traits. Some of the popular methods include Principal
Component Analysis (PCA), Kernel PCA, wavelet transformation, Fourier transformation
with frequency-component masking, and wavelet transformation using
subbands. These methods have been employed for different biometrics and under
different experimental settings. However, it is not clear which method performs best
for ear recognition with a single training sample. Hence, there is a need to compare
the performance of the aforementioned methods for ear recognition. In this paper,
the performance of all the aforementioned methods is compared on three standard
publicly available datasets, viz. the Indian Institute of Technology-Delhi (IIT-D) [6],
Mathematical Analysis of Images (AMI) [7] and Annotated Web Ears (AWE) [8] datasets.
The rest of the paper is organized as follows: Sect. 2 reviews the methods available
in the literature briefly. Section 3 describes the single sample ear recognition methods
whose performance is compared in this paper. Experimental setup and results are
given in Sect. 4. Finally, conclusion and future work are given in Sect. 5.
2 Related Work
The PCA method was used for ear recognition by Zhang et al. [9] in 2008. This method
extracted local as well as global features, and a linear Support Vector Machine (SVM)
was used for classification. Later, in 2009, Long et al. [10] proposed using wavelet
transformations for ear recognition; the proposed method was better than the previously
implemented PCA and Linear Discriminant Analysis (LDA) [11]. In 2011, Zhou et
al. [12] used the color Scale Invariant Feature Transform (SIFT) method for representing
local features. In the same year, Wang et al. [13] employed an ensemble of
local binary patterns (LBP), direct LDA (linear discriminant analysis) and wavelet
transformation methods for recognizing ears. The method gave accuracy
up to 90%, depending upon the feature dimension given as input. A robust method for
ear recognition was introduced in 2012 by Yuan et al. [14]. They proposed an ensemble
method of PCA, LDA and random projection for feature extraction and a sparse
classifier for classification; the proposed method was able to recognize partially occluded
image samples. In 2014, Taertulakarn et al. [15] proposed ear recognition based on
Gaussian curvature-based geometric invariance. The method was particularly robust
against geometric transformations. In the same year, an advanced form of wavelet
transformation along with discrete cosine transformation was introduced by Ying et
al. [16]. The wavelet used a weighted distance which highlighted the contribution of
low-frequency components in an image.
In 2016, Tian and Mu [17] used a deep neural network for ear recognition. The
proposed method also took advantage of CUDA cores for training the model. The final
model was quite accurate on hair-, pin- and glass-occluded ear images. The same
year, the One Sample Per Person (OSPP) problem for the ear biometric was tackled
by Chen and Mu [18]. This method used an adaptive multi-keypoint descriptor sparse
representation classifier; it was occlusion-resistant and better than contemporary
methods, though the recognition time was a little high, in the band of 10–12 s. In
2017, Emersic et al. [8] introduced an extensive survey of ear recognition methods.
In this survey, recognition approaches were divided according to the technique used
for feature extraction, viz. holistic, geometric, local and
hybrid. Holistic approaches describe the ear with global properties: the ear sample
is analyzed as a whole and local variations are not taken into consideration.
Methods using geometrical characteristics of the ear for feature representation,
such as the location of specific ear parts or the shape of the ear, are known as
geometric approaches. Local approaches describe local parts
or the local appearance of the ear and use these features for recognition.
Hybrid approaches involve techniques which cannot be categorized into the
other categories or are ensembles of methods from different categories. The paper also
introduced a very diverse ear dataset called Annotated Web Ears (AWE), which is
also used in this paper.
In 2018, a deep transfer learning method for ear biometric recognition was proposed
by Ali et al. [19] over a pretrained CNN model called AlexNet. The methodology
used Stochastic Gradient Descent with Momentum (SGDM) for training, with a momentum of 0.9.
Another deep learning-based method was suggested in 2019 by Natchapon et al. [20].
In this method, a CNN architecture was employed for frontal-facing ear recognition. It
was more acceptable because creating a face dataset simultaneously
creates an ear dataset. In the same year, Matthew et al. [21] proposed a variation
of wavelet transformation with successive PCA for single sample ear recognition. In
2020, Ibrahim et al. [22] introduced a variation of the Support Vector Machine (SVM) for
ear biometric recognition called Learning Distance Metric via DAG Support Vector
Machine. In 2021, a deep unsupervised active learning methodology was proposed by
Yacine et al. [23], in which the labels were predicted by the model itself, since the method is unsupervised.
A conditional deep convolutional generative adversarial network (cDCGAN) was used
to color the grayscale images, which further increased recognition accuracy.
3 Methodology
3.1 PCA
Principal Component Analysis, or PCA [11], is a method used to reduce the dimensionality
of samples. It extracts those features which contain the most variation in the intensity
values. Its popularity owes to the fact that it is unsupervised, even though the size
of the data is reduced. Reducing the number of variables of a dataset
naturally comes at the expense of accuracy, but the trick in dimensionality reduction
is to trade a little accuracy for simplicity: smaller datasets are easier to
explore and visualize, and machine learning algorithms can analyze the data much
faster without extraneous variables to process. So, in a nutshell, the idea
is to reduce the number of variables while preserving as much information as possible.
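The dimensionality reduction described above can be sketched with a minimal SVD-based PCA. This is illustrative only; the chapter does not specify an implementation, and the sample data is random.

```python
# Minimal PCA: center the data, take the SVD, and project onto the
# directions of largest variance (the top right-singular vectors).
import numpy as np


def pca(X, n_components):
    Xc = X - X.mean(axis=0)                        # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # project onto top components


rng = np.random.default_rng(0)
X = rng.random((10, 50))        # 10 flattened "ear images" of 50 features each
Z = pca(X, 5)                   # reduced 5-dimensional feature vectors
```

The reduced vectors `Z` would then be fed to a classifier such as the SVM used throughout the chapter.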
3.2 KPCA
PCA is a linear method, which means that it is best suited to datasets that are
linearly separable; for such datasets it does an excellent job. But if we use it on
non-linear datasets, the result may not be the optimal dimensionality reduction.
Kernel PCA [9] uses a kernel function to project the dataset into a higher-dimensional
feature space where the data is linearly separable. Hence, using the kernel, the
originally linear operations of PCA are performed in a reproducing kernel Hilbert space.
The most frequently used kernels include the cosine, linear, polynomial, radial basis
function (rbf) and sigmoid kernels, as well as pre-computed kernels. Depending on the
dataset to which they are applied, different kernels may have different projection
efficiency; thus, in the case of KPCA, the accuracy depends heavily on the kernel used.
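Kernel PCA can be sketched directly from this description: build an RBF kernel matrix, center it in the implicit feature space, and project onto its leading eigenvectors. The `gamma` value and function name are arbitrary illustrative choices, not taken from the chapter.

```python
# RBF kernel PCA from first principles: K is the kernel matrix, Kc its
# double-centered version, and the embedding uses the top eigenvectors
# scaled by the square roots of their eigenvalues.
import numpy as np


def rbf_kernel_pca(X, n_components, gamma=0.1):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)                       # RBF kernel matrix
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # center in feature space
    vals, vecs = np.linalg.eigh(Kc)               # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]   # pick the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


rng = np.random.default_rng(0)
X = rng.random((8, 6))
Z = rbf_kernel_pca(X, 2)        # 2-dimensional kernel-PCA embedding
```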
3.3 Fourier
Fourier analysis [24] is named after Jean Baptiste Joseph Fourier (1768–1830), a
French mathematician and physicist. Joseph Fourier, while studying the propagation
of heat in the early 1800s, introduced the idea of a harmonic series that can describe
any periodic motion regardless of its complexity. The Fourier transform is a mathematical
process that relates a measured signal to its frequency content and is used for
analyzing signals. It decomposes a signal in the frequency
domain into sinusoidal and cosinusoidal components. The Fourier transform of a
function of time is a complex-valued function of frequency, whose magnitude (absolute
value) represents the amount of that frequency present in the original function,
and whose argument is the phase offset of the basic sinusoid at that frequency. The
Fourier transform is not limited to functions of time, but the domain of the original
function is commonly referred to as the time domain.
When the image is transformed, there are usually bright areas signifying the edges
or high-frequency components and dull areas signifying noise or low-frequency
components [25]. In this methodology, the high- as well as the low-frequency
components are sequentially masked, and the inverse of the masked frequency profile
is converted back to the spatial domain for further processing.
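The sequential masking step can be sketched with NumPy's FFT: shift the spectrum, zero out a central (low-frequency) region or its complement, and invert. The circular mask and its radius are illustrative assumptions.

```python
# Mask one frequency band of an image and transform back to the spatial
# domain. keep="high" removes the low frequencies; keep="low" removes the
# high frequencies.
import numpy as np


def mask_frequencies(img, radius=8, keep="high"):
    F = np.fft.fftshift(np.fft.fft2(img))         # spectrum, DC at center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    F = F * ~low if keep == "high" else F * low   # zero out one band
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))


img = np.random.default_rng(0).random((32, 32))
high_part = mask_frequencies(img, keep="high")    # low frequencies removed
low_part = mask_frequencies(img, keep="low")      # high frequencies removed
```

Because the FFT is linear, the two masked reconstructions sum back to the original image, which makes the decomposition easy to sanity-check.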
3.4 Wavelet
The edge is the most important high-frequency information of a digital image. The
traditional filter eliminates the noise effectively. But it will make the image blurry.
So it is aimed to protect the edge of the image when reducing the noise in an image.
The wavelet analysis method is a time-frequency analysis method which selects
the appropriate adaptive frequency band based on the characteristics of the signal.
Then the frequency band matches the spectrum which improves the time-frequency
resolution. The wavelet analysis method has an obvious effect on the removal of
noise in the signal.
In this paper, both for directly applying the wavelet transformation [10] and for
further wavelet analysis, the "Discrete" Meyer class of wavelets is used. According
to the features of the multi-scale edges of the wavelet, the denoising
method of the Meyer wavelet transform is based on soft and hard thresholds.
"Discrete" Meyer is a comparatively simple wavelet compared with other classes;
it has only two components, namely the scaling function and the wavelet
function.
After wavelet analysis of the samples, unlike the Fourier method, where the transformed
image had to be converted back to the spatial domain for further processing
and classification, the processed feature vector is fed directly into PCA for dimensionality
reduction. This is a distinguishing feature of the wavelet and Fourier transformations:
the former preserves the locality of features, while the latter takes
a holistic approach to conversion to the frequency domain. The feature vector from PCA
is input to the SVM for classification.
In this method, a slightly more sophisticated wavelet, the "Biorthogonal 1.1"
wavelet, is used. In this family of wavelets, the scaling and wavelet functions of
the discrete Meyer wavelet are extended by introducing a decomposition and a reconstruction
parameter for each of the two wavelet functions. The biorthogonal wavelet is used
to transform the image into the frequency domain. Further, it divides the image into
subbands [21] according to the frequency components: low-low (LL), low-high
(LH), high-low (HL) and high-high (HH). Here, the LL subband is the approximate
image, whereas the LH, HL and HH subbands inherently include the edge
information of the horizontal, vertical and diagonal directions, respectively (Fig. 1).
In this method, a mean image is derived from the HH and LL subbands; the HH
band contains the diagonal details and LL is the approximate image. This mean image
is then fed to the PCA and SVM classifier for the purpose of classification.
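A single-level subband decomposition and the HH/LL mean image can be sketched with the Haar filters (the simplest biorthogonal wavelet, standing in here for bior1.1; an even-sized grayscale image is assumed):

```python
# One-level 2D Haar decomposition into LL, LH, HL, HH subbands, each
# half the size of the input in both dimensions.
import numpy as np


def haar_subbands(img):
    a = (img[0::2, :] + img[1::2, :]) / 2   # lowpass over row pairs
    d = (img[0::2, :] - img[1::2, :]) / 2   # highpass over row pairs
    LL = (a[:, 0::2] + a[:, 1::2]) / 2      # approximation image
    LH = (a[:, 0::2] - a[:, 1::2]) / 2      # detail subband
    HL = (d[:, 0::2] + d[:, 1::2]) / 2      # detail subband
    HH = (d[:, 0::2] - d[:, 1::2]) / 2      # diagonal detail
    return LL, LH, HL, HH


img = np.random.default_rng(0).random((8, 8))
LL, LH, HL, HH = haar_subbands(img)
mean_img = (LL + HH) / 2                    # feature image fed to PCA + SVM
```

The four subbands are invertible back to the original image, so no information is lost by the decomposition; the method simply discards the LH/HL bands before averaging.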
4 Experimental Results
The classification accuracy of the other methods has a large deviation, from 40% to approximately 78%. It is also
observed that the classification accuracy saturates after 15 components. Further, PCA
and KPCA report a drop in performance in comparison to the AMI dataset. This is
due to the high diversity of the ear images in terms of yaw, high occlusion and variation in
ethnicity. The highest classification accuracy is again reported by the multiband wavelet
transformation.
A summary of the highest average classification accuracy reported by the five compared
methods on the three datasets after 25 iterations is given in Table 2. It is apparent
from Table 2 that the wavelet transformation with multiband gives the highest as well
as the most consistent accuracy across the three datasets. The variation in performance by
all the compared methods is smallest on the IIT Delhi ear dataset and largest on the
AWE dataset. These results are also consistent with the characteristics of the individual datasets.
5 Conclusion and Future Work

Ear recognition has emerged as an attractive research area in the past two decades.
This problem becomes more challenging when there is only one sample per person
available for training. In the literature, several methods have been
suggested for ear recognition under different experimental settings. In this paper,
we have attempted to investigate which method performs best for single sample ear
recognition. We have compared the performance of five methods on three publicly
available datasets. It has been found that the wavelet subband-based method performs
best on all three datasets. In future work, it can be explored how the deep learning-
based methods can be exploited for single sample ear recognition.
References
1. Jain A, Bolle R, Pankanti S (1996) Introduction to biometrics. In: Jain AK, Bolle R, Pankanti S (eds) Biometrics. Springer, Boston, MA. https://doi.org/10.1007/0-306-47044-6_1
2. Yuan L, Mu Z, Xu Z (2005) Using ear biometrics for personal recognition. In: Li SZ, Sun Z, Tan T, Pankanti S, Chollet G, Zhang D (eds) Advances in biometric person authentication. IWBRS 2005. Lecture notes in computer science, vol 3781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569947_28
3. Kumar N, Garg V (2017) Single sample face recognition in the last decade: a survey. Int J Pattern Recognit Artif Intell. https://doi.org/10.1142/S0218001419560093
4. Zhao W, Chellappa R, Phillips P, Rosenfeld A (2003) Face recognition: a literature survey. ACM Comput Surv 35(4):399–458. https://doi.org/10.1145/954339.954342
5. Kumar N (2020) A novel three phase approach for single sample ear recognition. In: Boonyopakorn P, Meesad P, Sodsee S, Unger H (eds) Recent advances in information and communication technology 2019. Advances in intelligent systems and computing, vol 936. Springer, Cham. https://doi.org/10.1007/978-3-030-19861-9_8
6. Kumar A, Wu C (2012) Automated human identification using ear imaging. Pattern Recognit 41(5)
7. AMI Ear database. https://ctim.ulpgc.es/research_works/ami_ear_database/
8. Emeršič Z, Štruc V, Peer P (2017) Ear recognition: more than a survey. Neurocomputing 255:26–39. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2016.08.139. https://www.sciencedirect.com/science/article/pii/S092523121730543X
9. Zhang H, Mu Z (2008) Ear recognition method based on fusion features of global and local features. In: 2008 international conference on wavelet analysis and pattern recognition, pp 347–351. https://doi.org/10.1109/ICWAPR.2008.4635802
10. Long Z, Chun M (2009) Combining wavelet transform and Orthogonal Centroid Algorithm for ear recognition. In: 2009 2nd IEEE international conference on computer science and information technology, pp 228–231. https://doi.org/10.1109/ICCSIT.2009.5234392
11. Kaçar Ü, Kirci M, Güeş E, İnan T (2015) A comparison of PCA, LDA and DCVA in ear biometrics classification using SVM. In: 2015 23rd signal processing and communications applications conference (SIU), pp 1260–1263. https://doi.org/10.1109/SIU.2015.7130067
12. Zhou J, Cadavid S, Mottaleb M (2011) Exploiting color SIFT features for 2D ear recognition. In: 2011 18th IEEE international conference on image processing, pp 553–556. https://doi.org/10.1109/ICIP.2011.6116405
13. Wang Z, Yan X (2011) Multi-scale feature extraction algorithm of ear image. In: 2011 international conference on electric information and control engineering, pp 528–531. https://doi.org/10.1109/ICEICE.2011.5777641
14. Yuan L, Li C, Mu Z (2012) Ear recognition under partial occlusion based on sparse representation. In: 2012 international conference on system science and engineering (ICSSE), pp 349–352. https://doi.org/10.1109/ICSSE.2012.6257205
15. Taertulakarn S, Tosranon P, Pintavirooj C (2014) Gaussian curvature-based geometric invariance for ear recognition. In: The 7th 2014 biomedical engineering international conference, pp 1–4. https://doi.org/10.1109/BMEiCON.2014.7017396
16. Ying T, Debin Z, Baihuan Z (2014) Ear recognition based on weighted wavelet transform and DCT. In: The 26th Chinese control and decision conference (2014 CCDC), pp 4410–4414. https://doi.org/10.1109/CCDC.2014.6852957
17. Tian L, Mu Z (2016) Ear recognition based on deep convolutional network. In: 2016 9th international congress on image and signal processing, biomedical engineering and informatics (CISP-BMEI), pp 437–441. https://doi.org/10.1109/CISP-BMEI.2016.7852751
18. Chen L, Mu Z (2016) Partial data ear recognition from one sample per person. IEEE Trans Hum Mach Syst 46(6):799–809. https://doi.org/10.1109/THMS.2016.2598763
19. Almisreb A, Jamil N, Din N (2018) Utilizing AlexNet deep transfer learning for ear recognition. In: 2018 fourth international conference on information retrieval and knowledge management (CAMP), pp 1–5. https://doi.org/10.1109/INFRKM.2018.8464769
20. Petaitiemthong N, Chuenpet P, Auephanwiriyakul S, Theera-Umpon N (2019) Person identification from ear images using convolutional neural networks. In: 2019 9th IEEE international conference on control system, computing and engineering (ICCSCE), pp 148–151. https://doi.org/10.1109/ICCSCE47578.2019.9068569
21. Zarachoff M, Sheikh-Akbari A, Monekosso D (2019) Single image ear recognition using wavelet-based multi-band PCA. In: 2019 27th European signal processing conference (EUSIPCO), pp 1–4. https://doi.org/10.23919/EUSIPCO.2019.8903090
22. Omara I, Ma G, Song E (2020) LDM-DAGSVM: learning distance metric via DAG support vector machine for ear recognition problem. In: 2020 IEEE international joint conference on biometrics (IJCB), pp 1–9. https://doi.org/10.1109/IJCB48548.2020.9304871
23. Khaldi Y, Benzaoui A, Ouahabi A, Jacques S, Ahmed A (2021) Ear recognition based on deep unsupervised active learning. IEEE Sens J 21(18):20704–20713. https://doi.org/10.1109/JSEN.2021.3100151
24. Gonzalez R, Woods R (2006) Digital image processing, 3rd edn. Prentice-Hall Inc, USA
25. Frejlichowski D (2011) Application of the polar-fourier greyscale descriptor to the problem of identification of persons based on ear images. In: Image processing and communications challenges, vol 3. Springer, Berlin, Heidelberg, pp 5–12
AI-Based Real-Time Monitoring for Social Distancing Against COVID-19 Pandemic
Alok Negi, Krishan Kumar, Prachi Chauhan, Parul Saini, Shamal Kashid,
and Ashray Saini
1 Introduction
The COVID-19 epidemic has impacted the lives of millions of people worldwide,
and the crisis’s consequences are still being felt. The COVID-19 catastrophe has
been dubbed the worst economic disaster since the Great Depression. It is a sobering
reminder of long-standing imbalances in our societies. The daily struggles of the
COVID-19 pandemic are constantly compared to living in a war environment [1].
The long-term social and economic effects of the COVID-19 epidemic are uncertain,
but many people are concerned that lockdown-related education cuts affected 1.6 billion
students globally, resulting in a loss of 0.3–0.9 years of education. According to
World Bank statistics, a five-month global shutdown could result in 10 trillion dollars
in lost wages over students' lifetimes. Economic shocks from the pandemic are highly
likely to increase school dropout rates, and nearly two-thirds of the households
surveyed reported a decline in agricultural and non-agricultural income (the latter
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 103
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_8
104 A. Negi et al.
being more severe), with a large majority (94%) reporting reduced remittances
received, which is consistent with international reports from the first months of the
outbreak [2]. In all, almost three-fourths of people reported an unambiguous reduction
in income. Our evidence validates worries regarding pandemics' adverse negative
externalities. When the pandemic hit, it sent most employees scurrying home, worsening
income disparity and hurting employment prospects for those with only a
high school diploma while having little effect on those with graduate degrees.
The COVID-19 pandemic’s trajectory shows the changing environment regu-
lating both coronavirus transmission and its socio-economic consequences. Since
mid-March, a second severe wave of disease has resulted in lockdowns, the second
set of stringent measures established following the original epidemic in the spring.
Although COVID-19 transmission behaviors and consequences differed between
rural and urban regions, there were significant implications on rural incomes and
livelihoods. COVID-19’s effects on revenue, food security, and dietary variety are
progressively appearing as global trends and local variances. Vaccination rates are
already increasing, and people are looking forward to a safer post-pandemic future.
However, certain essential actions are still required after vaccination, such as wearing
masks, which remain essential to prevent transmission and save lives. Social distancing,
avoiding crowded, confined, and close-contact situations, proper ventilation, washing
hands, covering sneezes and coughs, and much more are all parts of a complete “Do
it all!” strategy [3].
Coronaviruses can spread when persons with the infection have close, sustained
contact with those who are not infected. Close contact generally means spending more
than 15 minutes within approximately two meters of an infected individual, for
example while conversing. The more people come into proximity with the droplets
from the coughs and sneezes of an infected individual, the more susceptible they are
to the virus. This necessitates a new measurement notion. These
measures are sometimes referred to as “social distancing” and include activities like
temporarily prohibiting socializing in public areas like entertainment or sporting
events, restricting the usage of non-essential public transportation, or encouraging
more work at home. In general, social distancing is an attempt to prevent coron-
avirus transmission in big gatherings such as meetings, movie theaters, weddings,
and public transportation. Schools, universities, malls, and movie theaters are now
shuttered across the country to emphasize the need for social distance. People are
being encouraged to work from home and have as little interaction with others as
possible.
The WHO advises wearing a mask and keeping a six-foot distance to prevent the disease
from spreading. At the same time, a certain level of social interaction is essential for
citizens' mental wellness. As a result, distinct stages of artificial
intelligence (AI) can be followed depending on the disease’s spread [4]. Therefore,
we developed an AI-based model for real-time monitoring of individuals for social
distancing that uses YOLOv3 person identification, VGG-16-based face mask clas-
sifier, Dual Shot Face Detector-based face detection, and DBSCAN clustering. The
main objectives and contribution of this paper are as follows:
AI-Based Real-Time Monitoring for Social Distancing … 105
– To use real-time video streams to monitor persons who are breaking the rules of
social distancing.
– Building a data-driven framework to assist governments in establishing a secure
de- and re-confinement planning schema for their respective regions.
– To assist in navigating future waves of viral transmission and other unforeseeable
negative consequences.
– To create a decision-making tool that can be used not only for the present epidemic
but also for future pandemics, which we all know are coming, especially as we
witness the repercussions of global climate change.
– To prevent the transmission of new infection waves by shifting from a reactive to
a proactive approach.
The remaining part of the paper is laid out as follows: Section 2 describes the
related work followed by the proposed methodology in Sect. 3. Section 4 describes
results and discussion. Section 5 brings the paper to a conclusion and outlines future
research.
2 Related Work
In this crucial time, social distancing is one of humanity's most urgent measures.
Through it, countries are preventing and reducing infection and flattening the
infection curve in communities. The “lockdown,” as it is known, essentially
lowers the viral load and the number of infected cases that need to be treated. Masks
can help prevent the infection from spreading from the person wearing them to others.
Masks alone do not protect against COVID-19 [5]; they must be combined with
physical separation and hand hygiene.
In the COVID-19 pandemic, identifying persons who wear face masks is complex,
and detecting facemask-wearing with high accuracy has practical applications in
COVID-19 epidemic prevention. As a consequence, Qin et al. [6] proposed a four-
step technique for identifying facemask-wearing conditions: image pre-processing,
face detection and cropping, image super-resolution, and facemask-wearing scenario
detection. For face image classification, the approach integrated a super-resolution (SR)
network with a classification network (SRCNet). The input images were processed with
face detection and cropping, SR, and facemask-wearing condition identification to
recognize the facemask-wearing scenario. Finally, SRCNet achieved 98.70% accuracy
and outperformed conventional end-to-end image classification methods by over
1.5% in kappa.
For face mask identification, Loey et al. [7] introduced a hybrid model that combined
deep learning and conventional machine learning. The model had two parts: the first
extracted features using ResNet50, one of the most popular deep supervised learning
models; the second identified face masks using traditional machine learning techniques
such as the Support Vector Machine (SVM), decision trees, and ensemble algorithms.
In order to work and travel securely during the COVID-19 outbreak, Xiao et al. [8]
created a deep learning-based security detection technique that relied on machine
vision rather than manual monitoring. To identify unlawful actions of workers without
masks in workplaces and highly populated locations, their modified VGG-19 replaces
the original three FC layers with one Flatten layer and two FC layers, and the original
Softmax classifier with a two-class Softmax layer whose labels are masked workers
(Mask) and unmasked workers (Un-mask); both classes were used for training and
testing. The upgraded network model's precision for identifying whether or not a
mask is worn grew by 10.91% and 9.08%, respectively, while its recall improved by
11.4% and 8.39%.
Hussain et al. [9] deployed deep learning to classify and recognize facial emotions
in real time. They classified seven facial expressions using VGG-16. The suggested
model was trained on the KDEF dataset and achieved an accuracy of 88%. The use
of masks is an essential part of the COVID-19 prevention process. Due to embedded
devices' limited memory and computational capability, real-time surveillance of
whether persons are wearing masks is complicated. Roy et al. [10] tested several
prominent object detection methods on the Moxa3K benchmark dataset to address
these issues, including YOLOv3, YOLOv3-Tiny, SSD, and Faster R-CNN. As a good
combination of accuracy and real-time inference, the YOLOv3-Tiny model gave an
excellent mAP of 56.27% at 138 FPS. The backbone of YOLOv3 is Darknet-53.
The work in [11] applied the YOLOv3 algorithm to detect faces; the accuracy of the
proposed technique was 93.9%. It was developed using the CelebA and WIDER FACE
datasets, which contain over 600,000 images.
Din et al. [12] presented a new GAN-based network that can automatically remove
masks covering the facial region and reconstruct the image by filling in the missing
hole. Nieto-Rodríguez et al. [13], in work presented at ICDSC, proposed a system
that divides faces into two categories: those with surgical masks and those without.
The system establishes a per-person ID through tracking, resulting in only one warning
for a mask-less face over several frames in a video. The system achieves five frames
per second with several faces in VGA images on a standard laptop. The tracking
method significantly reduces the number of false positives, and the system's output
includes confidence values for both mask and non-mask face detections.
3 Proposed Work
This work aims to use real-time video streams to track persons who are breaking the
rules of social distancing. Furthermore, a VGG16-based Face Mask Classifier model
is trained and deployed to recognize people who are not wearing a face mask. For
detecting prospective intruders, the suggested technique also employs YOLOv3 and
DBSCAN clustering. The detailed flow is drawn in Fig. 1.
Firstly, the frames are extracted from the real-time video and passed to the
YOLOv3 model for person detection. Next, faces are detected in each frame using
the Dual Shot Face Detector, and a VGG-16-based face mask classifier checks
whether each person is wearing a mask. Person positions are also analyzed with
DBSCAN for cluster detection. The bounding boxes and monitoring status are then
drawn onto the frame, and finally the frames are displayed. This process is repeated
for every frame until the end of the video.
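The social-distance grouping step can be sketched with scikit-learn's DBSCAN. This is a minimal illustration, not the paper's implementation: the 100-pixel eps and the choice of each box's bottom-center point as a person's position are our assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def distancing_clusters(boxes, eps_px=100.0):
    """Group person bounding boxes whose bottom-center points lie within
    eps_px pixels of each other; DBSCAN's label -1 marks isolated
    (safely distanced) people, other labels mark violation clusters."""
    # bottom-center of each (xmin, ymin, xmax, ymax) box
    pts = np.array([[(x1 + x2) / 2.0, y2] for x1, y1, x2, y2 in boxes])
    return DBSCAN(eps=eps_px, min_samples=2).fit(pts).labels_
```

Two nearby boxes share a cluster label while a distant one gets -1, e.g. `distancing_clusters([(0, 0, 10, 50), (5, 0, 15, 50), (500, 0, 510, 50)])` yields labels `[0, 0, -1]`.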
The real-time object detection model YOLOv3 (You Only Look Once), pretrained on
the COCO dataset, is used for person detection. YOLOv3 uses a hybrid architecture
building on YOLOv2, residual networks, and Darknet-53 for feature extraction.
Inside each residual block, the network uses a bottleneck structure (a 1 × 1 followed
by a 3 × 3 convolution layer) and a skip connection. Thanks to the residual connections,
stacking additional layers does not harm the network's performance. Furthermore,
fine-grained features are not lost, because the deeper layers receive information
directly from the shallower layers.
The model made use of the Darknet-53 architecture, which was designed with a
53-layer network for feature extraction training. The detection head for the training
object detector was then stacked with 53 more layers, giving YOLOv3 a total of
106 layers of the fully convolutional underlying architecture. Instead of stacking the
prediction layers at the last layers as before, YOLOv3 added them to the side network.
YOLOv3’s most significant feature is that it detects at three distinct scales. Three
scale-specific detectors are created using the features from the last three residual
blocks. A 1 × 1 kernel is applied at each detection layer, which is responsible for
predicting the bounding boxes for the feature map of each grid cell. An input resolution
of 416 × 416 is used in this work to obtain the bounding boxes on persons.
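A common way to keep only the person detections from YOLOv3's raw output is sketched below; the row layout [cx, cy, w, h, objectness, 80 class scores] with "person" at COCO index 0, and the 0.5 threshold, are assumptions about the decoding stage, which the text does not detail.

```python
import numpy as np

PERSON = 0  # index of the "person" class in the 80-class COCO list

def person_boxes(raw, frame_w, frame_h, conf_thresh=0.5):
    """Keep rows whose best class is 'person' and whose score
    (objectness * class probability) clears the threshold; convert
    normalized center/size coordinates to pixel corner coordinates."""
    out = []
    for row in raw:
        scores = row[5:] * row[4]  # class probabilities scaled by objectness
        if np.argmax(scores) == PERSON and scores[PERSON] >= conf_thresh:
            cx, cy, w, h = row[:4]
            x1 = (cx - w / 2) * frame_w
            y1 = (cy - h / 2) * frame_h
            out.append((x1, y1, x1 + w * frame_w, y1 + h * frame_h))
    return out
```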
On the SMFD dataset, the VGG-16 model is used as a face mask classifier to determine
whether a person is wearing a mask. In VGG-16, the first two convolutional layers
have 64 filters of size 3 × 3 and generate a 224 × 224 × 64 volume. The next layer
is a pooling layer, which reduces the height and width of the 224 × 224 × 64 volume
to 112 × 112 × 64. Then there are two more conv layers with 128 filters, giving a new
dimension of 112 × 112 × 128. After another pooling layer, the dimension becomes
56 × 56 × 128. VGG-16 then has three convolutional layers with 256 filters followed
by a pooling layer, three convolutional layers with 512 filters followed by a pooling
layer, and three more convolutional layers with 512 filters followed by a pooling
layer. Finally, the resulting 7 × 7 × 512 volume feeds fully connected (FC) layers
with 4096 hidden units and a softmax output over 1000 classes. As shown in Figure 2,
the three fully connected layers of the original VGG-16 are replaced here with two
dense layers with 128 and 2 hidden nodes, respectively. The softmax activation
function is used in the second dense layer to produce the final output.
The MTCNN and Haar-Cascade face detectors are ineffective on low-resolution or
partially covered images; hence the Dual Shot Face Detector (DSFD) is used in this
study to detect faces across a wide range of orientations. OpenCV (cv2) and the
face-detection library are used for DSFD with a confidence threshold of 0.5 and an
IoU threshold of 0.3. The model returns a tensor of shape (N, 5), where N is the
number of faces and each row contains xmin, ymin, xmax, ymax, and the detection
confidence.
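Consuming that (N, 5) tensor can look like the following sketch; the function name is ours, and only the 0.5 confidence threshold comes from the text:

```python
import numpy as np

def split_detections(dets, conf_thresh=0.5):
    """Split a DSFD output tensor of shape (N, 5) -- rows of
    [xmin, ymin, xmax, ymax, confidence] -- into boxes and scores,
    keeping only rows at or above the confidence threshold."""
    dets = np.asarray(dets, dtype=float)
    keep = dets[:, 4] >= conf_thresh
    return dets[keep, :4], dets[keep, 4]
```

The kept boxes can then be cropped from the frame, resized to 224 × 224, and fed to the VGG-16 mask classifier.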
For this work, training is performed on Google Colab using a Python script for only 30
epochs, with the Adam optimizer and a batch size of 32. There are 14,780,610
parameters in total, of which 65,922 are trainable and the remaining 14,714,688 are
non-trainable. Real-time video at 25 fps is used for this work.
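The reported counts are internally consistent if the VGG-16 convolutional base (14,714,688 weights) is frozen and the trainable head sees a 512-dimensional vector — e.g., via global average pooling of the 7 × 7 × 512 output, which is our assumption since the text does not state how the volume is reduced:

```python
# Sanity-check the reported parameter counts.
vgg16_base = 14_714_688        # frozen VGG-16 convolutional layers
head_dense1 = 512 * 128 + 128  # Dense(128) weights + biases = 65,664
head_dense2 = 128 * 2 + 2      # Dense(2) weights + biases = 258
trainable = head_dense1 + head_dense2

assert trainable == 65_922                   # trainable parameters
assert vgg16_base + trainable == 14_780_610  # total parameters
```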
The Simulated Masked Face Dataset (SMFD) is used for the face mask classifier. The
dataset contains a total of 1651 images, as shown in Figure 3. The training set has a
total of 1315 images covering both the masked and unmasked classes. The validation
and test sets contain 142 and 194 images, respectively.
Data augmentation helps increase the number of images (by creating image variations)
and provides the images to the model in batches. The images are not replicated
across batches, and augmentation also helps avoid model overfitting. Images are
resized to 224 × 224 × 3 because of their differing sizes and to reduce scale.
ImageDataGenerator is used for the augmentation with rescale (1./255), zoom range
(0.2), shear range (0.2), and horizontal flip (true) parameters. Figure 4 shows random
transformations of the images using data augmentation.
The performance analysis for the proposed work is performed on the basis of the
accuracy curve, loss curve, precision, recall, F1 score, and confusion matrix.
Equations 1, 2, 3, 4, and 5 show the mathematics behind each metric.
Fig. 4 Random transformation using data augmentation
Categorical cross entropy, as shown in Eq. 2, is used as a metric for this work. A
perfect classifier achieves a log loss of 0.
logloss = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} y_{ij} log(p_{ij}) (2)
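Eq. 2 can be written as a short NumPy function; the eps clipping is a standard numerical guard against log(0), not part of the paper:

```python
import numpy as np

def categorical_log_loss(y_true, y_pred, eps=1e-15):
    """Categorical cross entropy of Eq. 2: y_true is one-hot (N x M),
    y_pred holds the predicted class probabilities per sample."""
    p = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(np.asarray(y_true) * np.log(p), axis=1))
```

A perfect classifier indeed gets a log loss of 0, while uniform 50/50 predictions over two classes give log 2 ≈ 0.693.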
Recall is the ability of a classifier to find all positive instances. For each class, it is
defined as the ratio of true positives to the sum of true positives and false negatives.
The proposed work recorded a training accuracy of 99.32% with a loss of 0.02, while
the validation set reached 100% accuracy with a loss of 0.01, as shown in Fig. 5a
and b. Our proposed model achieved 98.97% accuracy with a loss of 0.02 on the
test set.
Confusion matrices are shown in Figs. 6 and 7 for the validation and test sets, with
and without normalization, respectively. True negative, false positive, false negative,
and true positive values are recorded as 71, 0, 0, and 71 for the validation set and
97, 0, 2, and 95 for the test set, respectively. Thus, our model achieved 100%
precision, recall, and F1 score on the validation set. In the validation set, both
classes (with mask, without mask) recorded 100% precision, recall, and f1-score with
a support value of 71 each, for a total of 142 images, as shown in Table 1. Support is
the number of actual occurrences of a class in the specified dataset. Imbalanced
support in the training data may indicate structural weaknesses in the reported scores
of the classifier and could indicate the need for stratified sampling or rebalancing.
Overall, 100%, 97.94%, and 98.96% precision, recall, and F1 score are recorded
for the test set. Further, in the test set, the with-mask class recorded 98%, 100%,
and 99% precision, recall, and f1-score with a support value of 97, while the
without-mask class recorded 100%, 98%, and 99% precision, recall, and f1-score
with a support value of 97, for a total of 194 images, as shown in Table 2. Sample
images obtained from the real-time videos using the proposed work are displayed
in Fig. 8.
We compared the proposed work with some other state-of-the-art models and found
comparable or better results. Starting with [7], the authors used the same dataset for
the face mask detection classifier and obtained 94% and 98.7% accuracy using an
ensemble classifier, 96% and 95.64% using a decision tree classifier, and 100% and
99.49% using an SVM classifier. Similarly, the work proposed in [14] recorded 98.59%
and 98.97% accuracy for the validation and test sets using VGG-16. Nagrath et al. [15]
recorded 92.64% accuracy and a 93% f1 score. The work proposed in [15] obtained
93% accuracy. Zhang et al. [16] recorded 84.10 mAP for face mask detection.
Our work has recorded 99.32%, 100%, and 98.97% accuracy for the training, validation,
and test sets, respectively, in just 30 epochs.
The proposed work yielded promising results in only 30 epochs, but it may be
expanded to include more standard datasets such as RMFD, LFW, and others. For
blurred faces caused by quick movement or noise during capture, blurring augmen-
tation (Motion blur, Average blur, Gaussian blur, etc.) might be utilized.
5 Conclusion
The proposed work can enhance real-time public health governance, decision-
making, and related data insights around the world—not only for the virus we
currently face but also for the pandemics we will inevitably face in the future. In
this work, AI-based real-time monitoring of people for social distancing is imple-
mented using YOLOv3 person detection, VGG-16-based face mask classifier, Dual
Shot Face Detector-based face detection, and DBSCAN clustering. The proposed
work achieved 99.32%, 100%, and 98.97% accuracy for training, validation, and
test set. The proposed study may be expanded using more advanced neural networks
(YOLOv5, VGG19, ResNet, DenseNet, etc.) and standard datasets such as RMFD, LFW,
etc. A successful solution would assist governments and companies in making quick
and confident decisions about proper confinement strategy for their region while also
reducing the number of lives and livelihoods lost.
References
14. Negi A, Kumar K, Chauhan P, Rajput R (2021) Deep neural architecture for face mask detection
on simulated masked face dataset against covid-19 pandemic. In: 2021 International Conference
on Computing, Communication, and Intelligent Systems (ICCCIS). IEEE, pp 595–600
15. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2021) SSDMNV2: a real time
DNN-based face mask detection system using single shot multibox detector and MobileNetV2.
Sustain Cities Soc 66:102692
16. Zhang J, Han F, Chun Y, Chen W (2021) A novel detection framework about conditions of
wearing face mask for helping control the spread of covid-19. IEEE Access 9:42975–42984
Human Activity Recognition in Video Sequences Based on the Integration of Optical Flow and Appearance of Human Objects
1 Introduction
Human activity recognition has emerged as a pivotal research problem in recent years
due to its potential applications in several intelligent automated monitoring applica-
tions such as intelligent surveillance, robot vision, automated healthcare monitoring,
entertainment, video analytics, security and military applications, etc. Video data
is growing very fast due to advancements in multimedia technology such as
smartphones, drones, movies, and surveillance cameras in the modern era. It has
therefore become essential to predict and monitor semantic video content automatically.
Therefore, human activity recognition systems have become an innovative solution
to such automated monitoring of visual systems and encouraged the adoption and
usability of intelligent monitoring visual applications [1, 2]. Vision-based activity
recognition often becomes more difficult for real-world applications when the irreg-
ular motion of non-stationary cameras records activity videos. Such videos have a
complex background, varying illumination conditions, different poses, orientations,
and scaling of objects. Activity recognition therefore involves parsing complex
video sequences and learning complex activity patterns, so the extraction of
compelling features plays a vital role.
Over the last decade, various handcrafted feature descriptors were proposed, such
as single feature descriptors and a combination of multiple feature descriptors [1, 3,
4], and some encoding schemes with mid-level representation such as Bag-of-Words
(BoW) [5] and Fisher Vector [6] have been considered for activity recognition task
using several machine learning algorithms. Since realistic videos have a dynamic
range of varying details, human activity recognition in realistic videos is still a
challenging and open research problem. For accurate recognition of human activity,
there is a need for an excellent and discriminative feature descriptor that selects
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 117
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_9
118 A. Kushwaha and A. Khare
relevant visual data and reduces unnecessary visual content [7]. This fact motivated
us to design a novel framework for human activity recognition for motion activities
recorded in realistic and multi-view environments. This work used the integration
of multiple feature representation techniques to represent human activities recorded
by static and moving cameras with varying scales, poses, orientations of human
objects, and changing illumination conditions. In the proposed approach, first, we
performed object segmentation by the method proposed by Kushwaha et al. [8, 9]
to capture the moving human objects (to compute human appearance in the subsequent
frames) [10]. Then, we computed the magnitude and orientation information of
moving objects using the optical flow technique [11], followed by the histogram
of oriented gradient [12] of optical flow features to capture the dynamic pattern of
human activities [13]. The final feature vector is constructed by a fusion of local-
oriented gradients of magnitude and orientation, which is then processed by a multi-
class support vector machine to compute the class scores of each activity category.
The proposed method’s effectiveness is empirically justified by conducting several
extensive experiments. Therefore, to analyze the proposed framework, we consid-
ered three publically available datasets that are IXMAS [14], UT Interaction [15],
and CASIA [16], and the results of the proposed method were compared with several
state-of-the-art methods. The recognition result demonstrates the usefulness of the
proposed method over considered state-of-the-art methods.
The rest of the paper is organized as follows: Sect. 2 presents a detailed literature
review. Section 3 gives details of the proposed framework. The experimental
result and detailed discussion are given in Sect. 4. We concluded the proposed work
in Sect. 5.
2 Literature Review
With the increase in video recording cameras in different firms like visual surveil-
lance, film crews, drones, robotics, and smartphones, computer vision scientists have
increased their interest in developing an automated monitoring system. Therefore,
video-based human activity recognition (HAR) has become one of the most critical
research problems over the last few decades in different computer vision applica-
tions, such as security monitoring, gaming entertainment, smart indoor security,
intelligent visual surveillance, military applications, healthcare, robot vision, and
daily life activity monitoring. The process of capturing and recognizing human
activities is acknowledged to be cumbersome and challenging due to the high degree
of freedom of human body motion and unpredictable appearance variability, such
as personal style, activity length, clothing, and object appearance in different
viewpoints and scales. Feature extraction techniques always play a crucial role in
accurately recognizing human activities. Researchers in this field used one of two
types of feature extraction techniques: (1) self-learning techniques from raw data
based on deep learning approaches, and (2) traditional handcrafted feature descriptor-
based techniques. Traditional handcrafted feature descriptor-based techniques are
Human Activity Recognition in Video Sequences Based … 119
The ultimate goal of this work is to present a supervised learning-based framework
for the recognition of human activities recorded by single and multiple cameras in
real-world applications. We designed a novel feature descriptor to represent
complex motion activities in this work. The general framework of the proposed
work is shown in Fig. 1. Since excellent and discriminative feature descriptors always
play a crucial role in the activity recognition task, we first segmented the moving
objects from complex video data to capture the objects of interest and reduce the
unnecessary background content from the video clips. Then, we used the optical flow
technique [11] to compute the magnitude (motion or velocity vectors) and orientation
(direction) information of each moving pixel of an object, further avoiding noise
and background content [8]. This magnitude and orientation information is then
used to compute the histogram of oriented gradients (HOG) [12] because it captures
the dynamic pattern of complex motion activities more discriminatively. At last, the
unique dynamic pattern of magnitude and orientation information captured by the
histogram of oriented gradients is further integrated using the feature fusion strategy
(concatenation) to construct the final feature vector. We have taken velocity and
direction information to construct the final feature vector to avoid inter and intra-class
variations and redundant information that may confuse the classifier during training. The
sample data of different activity categories may have the same magnitude (velocity)
but not the direction [8, 9]. A multiclass support vector machine then processes the
final constructed feature vector to compute the class scores of activities [24]. The
proposed work consists of the following steps:
i. The object segmentation technique proposed by Kushwaha et al. [8] separates
the complex background and computes human appearance in the subsequent
video frames.
ii. The optical flow technique [11] has been used to compute the magnitude
(velocity vector) and orientation (direction) of each moving pixel and to
eliminate background noise.
iii. Along with the temporal axis, we integrated optical flow vectors with a histogram
of oriented gradients (HOG) to compute dynamically oriented histograms of
optical flow sequences.
iv. Finally, a local-oriented histogram of the velocity vector and orientation infor-
mation is integrated using a feature fusion strategy to construct the final feature
vector.
v. We used a one-vs-one multiclass support vector machine to compute the class
scores of human activities [24].
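Steps ii–iv above can be sketched as follows. This is a simplified global histogram rather than the cell-wise HOG of [12], and the nine-bin choice is illustrative:

```python
import numpy as np

def flow_feature(flow, n_bins=9):
    """From a dense optical-flow field of shape (H, W, 2), compute
    per-pixel magnitude and orientation, bin each into a histogram
    (orientation weighted by magnitude, in the spirit of HOG), and
    concatenate the two histograms into one fused descriptor."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)                  # velocity magnitude
    ang = np.arctan2(dy, dx) % (2 * np.pi)  # direction in [0, 2*pi)
    ori_hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)
    mag_hist, _ = np.histogram(mag, bins=n_bins)
    return np.concatenate([ori_hist, mag_hist])  # feature fusion
```

In the full pipeline the flow field would come from Farnebäck's method [11] on segmented frames, and the fused descriptors would be fed to the one-vs-one multiclass SVM.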
where C_A is the number of correctly recognized activity sequences and T_A is the
total number of activity sequences tested. The results of the proposed method and of
the other existing methods considered for comparison [19–23, 25] on the IXMAS,
UT Interaction, and CASIA datasets are presented in Table 1.
Fig. 2 Sample frames of the considered datasets: a IXMAS [14], b UT Interaction [15], and c CASIA [16]
As illustrated in Table 1, it can be observed that the proposed method achieves the
highest classification value for the IXMAS dataset (99.19%), CASIA (interaction)
(96.35%), second-highest for UT Interaction (99.11%), and CASIA (single person)
(97.35%). For UT Interaction, Kushwaha et al. [19] achieve the highest accuracy
(100%), and for CASIA (single person), Kushwaha et al. [20] achieve the highest
accuracy (97.95%), but both values are comparable to the result of the proposed
method; therefore, the overall performance of the proposed method is
good. The reason behind the excellent accuracy is that the proposed feature descriptor
extracts more discriminant features and performs well on low-resolution, multi-view,
and realistic data. The efficient object segmentation technique, followed by motion
information and the histogram of oriented gradients, is another reason for the
excellent accuracy.
From Table 1, one can see that the proposed method gives better results for low-
resolution data recorded by different views, i.e. for human–human interaction and
human-object interaction with the capability to deal with challenges like varying illu-
mination conditions, presence of complex background and camera motion, and varia-
tion in scales, poses, and orientations. The recognition results demonstrate the useful-
ness of the proposed method for real-world applications, e.g. surveillance systems
having complex activities and outdoor scenes recorded from different viewing angles.
5 Conclusion
This paper presents a human activity recognition framework for motion activities in
a realistic and multi-view environment. In this work, we designed a novel feature
representation technique based on integrating the object’s appearance of interest
and the object’s motion information. Therefore, we used the object segmentation
technique to extract the human object and the optical flow technique to compute
the velocity (magnitude) and orientation (direction) information of moving human
objects. We considered velocity and direction information to avoid variations in intra-
class activities because samples of different activity categories may have the same
velocity but not the orientation. The histogram of orientated gradients computation
then follows the magnitude and orientation information to compute the dynamic
pattern of human activities, which gives a relative distribution of information of each
activity category uniquely and in a more discriminative way. The final feature vectors
are constructed by integrating local-oriented histogram of optical flow vectors using
feature fusion strategy followed by multiclass support vector machine to compute
the class score of human activities. The effectiveness of the proposed method is
established by conducting several experiments on three different publically available
datasets that are IXMAS, UT Interaction, and CASIA. The result of the proposed
method was analyzed by comparing its result with several existing state-of-the-art
methods. The result of the proposed method demonstrates the outperformance of the
method over the other state-of-the-art methods.
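The descriptor summarized above (optical flow magnitude weighting an orientation histogram, later fused and classified by an SVM) can be illustrated with a minimal sketch. This is only an illustration of the idea, not the authors' implementation; the function name, bin count, and L2 normalization are assumptions:

```python
import numpy as np

def flow_orientation_histogram(u, v, bins=9):
    """Magnitude-weighted histogram of optical-flow orientations.

    u, v: 2-D arrays of horizontal/vertical flow components for one frame.
    Each pixel votes into an orientation bin weighted by its flow magnitude,
    so strong motion dominates the descriptor while jitter contributes little.
    """
    magnitude = np.hypot(u, v)
    orientation = np.arctan2(v, u) % (2 * np.pi)   # map angles to [0, 2*pi)
    hist, _ = np.histogram(orientation, bins=bins, range=(0.0, 2 * np.pi),
                           weights=magnitude)
    norm = np.linalg.norm(hist)                    # L2-normalise per frame
    return hist / norm if norm > 0 else hist

# Toy flow field: every pixel moves uniformly to the right
h = flow_orientation_histogram(np.ones((4, 4)), np.zeros((4, 4)))
```

Per-frame histograms like `h` would then be concatenated (feature fusion) and fed to a multiclass SVM, as described in the text.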
Acknowledgements This work was supported by the Science and Engineering Research Board
(SERB), Department of Science and Technology (DST), New Delhi, India, under Grant No.
CRG/2020/001982.
References
8. Kushwaha A, Khare A, Prakash O, Khare M (2020) Dense optical flow based background
subtraction technique for object segmentation in moving camera environment. IET Image Proc
14(14):3393–3404
9. Kushwaha A, Prakash O, Srivastava RK, Khare A (2019) Dense flow-based video object
segmentation in dynamic scenario. In: Recent trends in communication, computing, and
electronics. Springer, Singapore, pp 271–278
10. Al-Faris M, Chiverton J, Yang L, Ndzi D (2017) Appearance and motion information based
human activity recognition. In: IET 3rd international conference on intelligent signal processing
(ISP 2017). IET, pp 1–6
11. Farnebäck G (2003) Two-frame motion estimation based on polynomial expansion. In:
Scandinavian conference on image analysis. Springer, Berlin, Heidelberg, pp 363–370
12. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings
of IEEE computer society conference on computer vision and pattern recognition, vol 1, pp
886–893
13. Li X (2007) HMM based action recognition using oriented histograms of optical flow field.
Electron Lett 43(10):560–561
14. Kim SJ, Kim SW, Sandhan T, Choi JY (2014) View invariant action recognition using
generalized 4D features. Pattern Recogn Lett 49:40–47
15. Ryoo MS, Aggarwal JK (2009) Spatio-temporal relationship match: video structure comparison
for recognition of complex human activities. In: 2009 IEEE 12th international conference on
computer vision. IEEE, pp 1593–1600
16. Wang Y, Huang K, Tan T (2007) Human activity recognition based on r transform. In: 2007
IEEE conference on computer vision and pattern recognition, pp 1–8
17. Singh R, Dhillon JK, Kushwaha AK, Srivastava R (2019) Depth based enlarged temporal
dimension of 3D deep convolutional network for activity recognition. Multimedia Tools Appl
78(21):30599–30614
18. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
19. Kushwaha A, Khare A (2021) Human activity recognition by utilizing local ternary pattern
and histogram of oriented gradients. In: Proceedings of international conference on big data,
machine learning and their applications. Springer, Singapore, pp 315–324
20. Kushwaha A, Khare A, Khare M (2021) Human activity recognition algorithm in video
sequences based on integration of magnitude and orientation information of optical flow. Int J
Image Graph 22:2250009
21. Yeffet L, Wolf L (2009) Local trinary patterns for human action recognition. In: 2009 IEEE
12th international conference on computer vision, pp 492–497
22. Nigam S, Khare A (2016) Integration of moment invariants and uniform local binary patterns for
human activity recognition in video sequences. Multimedia Tools Appl 75(24):17303–17332
23. Seemanthini K, Manjunath SS (2018) Human detection and tracking using HOG for action
recognition. Procedia Comput Sci 132:1317–1326
24. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers.
In: Proceedings of the fifth annual workshop on Computational learning theory, pp 144–152
25. Aly S, Sayed A (2019) Human action recognition using bag of global and local Zernike moment
features. Multimedia Tools Appl 78(17):24923–24953
Multi-agent Task Assignment Using
Swap-Based Particle Swarm
Optimization for Surveillance
and Disaster Management
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 127
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_10
128 M. S. Ghole et al.
is Marina Beach, Chennai, India, which was affected by a tsunami [10]. An emergency
environment is created by invoking tasks inside the areas of interest. The objective
of the MAS is to complete these tasks while satisfying different real-life
constraints. The distribution of these tasks among the agents is a challenging
problem [11, 12]; thus, an effective procedure for task assignment (TA) is required.
TA is the process of assigning tasks to the available resources (in our case,
agents) in such a way that the agent system performs the tasks effectively. TA plays
an important role in different real-life problems such as surveillance [13],
disaster management [14], intelligent parcel delivery systems [15], and waste
collection management [16]. Once the agents are assigned their tasks, the sequence
in which the tasks are completed has a great impact on the total resources used by
the agent system. In this paper, the agents are deployed from a base camp to
complete some tasks and then return to the starting position. This process can be
represented as a traveling salesman problem (TSP), in which an agent has to visit
every task exactly once and finally return to the starting point along the most
efficient route. To solve the TSP and reduce the resource consumption of the agent
system, a swap-based particle swarm optimization (PSO) paradigm is proposed.
PSO is a meta-heuristic algorithm that optimizes a problem by iteratively improving
candidate solutions [17]. It maintains a population of candidate solutions called
particles; each particle moves in the search space influenced by its personal best
position and the best position found among all particles. In the proposed method,
the objective of PSO is to optimize the assigned task sequence of each individual
agent to reduce resource consumption. A variant of PSO named swap-based PSO is used.
Applications of the swap-based PSO algorithm include the post-earthquake scenario
problem [4], path optimization for intelligent welding robots [18], the flexible job
scheduling problem [19], the team formation problem [20], partial shading of solar
panels [21], and the vehicle routing problem [22]. This has motivated the authors to
use the swap-based PSO algorithm in this paper. The key contributions of this work
are highlighted as follows:
1. A task assignment approach for a multi-agent system is developed which is suit-
able for surveillance and disaster management. It is assumed that all service
requests (tasks) appear at the same time in the form of respective GPS coordi-
nates.
2. A two-stage approach for the assignment of tasks is proposed. At first, the tasks
are distributed for each agent based on available resources such as proximity
of resources and task completion overhead. This breaks down the problem as a
traveling salesman problem for each agent.
3. Then the assigned tasks for an agent are further optimized for the sequence of
executions by a proposed swap-based particle swarm optimization.
4. Extensive results are presented to demonstrate the feasibility of the proposed
method. It is demonstrated on Google Maps considering the real coordinates of two
different locations: M. G. Marg at Gangtok, India, which had experienced earthquakes
[9], and Marina Beach at Chennai, India, which was affected by a tsunami [10].
This paper is arranged as follows: the proposed method is presented in Sect. 2; the
results are given in Sect. 3, followed by the conclusion and future directions of
this work in Sect. 4.
2 Proposed Methodology
In this work, a task assignment approach for a multi-agent system with sequence
optimization is considered. The following assumptions are considered in the proposed
method:
1. Point-based agents and tasks are considered for the simulations.
2. All tasks appear at the same time.
3. An obstacle-free environment is considered.
4. Stationary tasks are considered.
5. A homogeneous agent system is considered, where all the agents have the same
specifications.
Now, let us consider an MAS of N agents and M tasks (service requests) in a
workspace, where

A_i ∈ A, ∀ i = 1, 2, …, N   (1)

T_j ∈ T, ∀ j = 1, 2, …, M   (2)

where A_i is the position of agent i and T_j is the position of task j. Next,
stage I of the proposed method is presented.
Now, the job is to find the closest agent to each task (Eq. 4) and the closest task
for each agent (Eq. 5):
Now, when a task is assigned to an agent, the corresponding agent will be denoted as
A_(assigned,i) and the corresponding task as T_(assigned,j_r). Let there be a binary
matrix C_i of agent i such that

∑_{l=1}^{M_(assigned,i)} C_i(A_i, T_l) = 1   (8)
Now, the remaining tasks are updated and the agents' positions are moved to their
already assigned tasks as follows. Suppose p tasks are assigned in this time step;
then M_r = M_r − p. The distance overhead is then calculated as

dia_i = ‖A_(assigned,i) − T_(assigned,j_r)‖ + dia_i + δ_i   (10)
where δi is the ith agent task completion cost. Now, if the ith agent goes to task
T(assigned, jr ) from task Tl , the binary matrix Ci will be updated as
The process from Eqs. (3) to (11) is repeated until M_r = 0, i.e., until all the
tasks are assigned.
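Because Eqs. (3)–(11) do not survive in the text, the stage-I loop can only be approximated from the prose: each agent repeatedly takes its nearest remaining task, accumulates the travel distance plus the completion cost δ_i, and moves to that task until M_r = 0. The sketch below makes those assumptions explicit (the mutual closest-agent/closest-task matching of Eqs. (4) and (5) is simplified to a per-agent greedy choice; all names are illustrative):

```python
import math

def greedy_assignment(agents, tasks, delta=0.0):
    """Stage-I sketch: nearest-task greedy assignment with distance overhead.

    agents: list of (x, y) starting positions
    tasks:  list of (x, y) task positions
    delta:  per-task completion cost (the delta_i overhead in the text)
    Returns each agent's task-index sequence and accumulated path cost dia_i.
    """
    pos = list(agents)                 # current position of each agent
    cost = [0.0] * len(agents)         # dia_i, the distance overhead
    assigned = [[] for _ in agents]
    remaining = set(range(len(tasks)))
    while remaining:                   # repeat until M_r = 0
        for i in range(len(agents)):
            if not remaining:
                break
            # closest remaining task for agent i
            j = min(remaining, key=lambda t: math.dist(pos[i], tasks[t]))
            remaining.remove(j)
            cost[i] += math.dist(pos[i], tasks[j]) + delta
            pos[i] = tasks[j]          # agent now stands on the task it served
            assigned[i].append(j)
    return assigned, cost

agents = [(0.0, 0.0), (10.0, 0.0)]
tasks = [(1.0, 0.0), (9.0, 0.0), (2.0, 0.0)]
seq, cost = greedy_assignment(agents, tasks)
```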
After the assignment process, each agent will be located at its last assigned task.
Now, the agent has to come back to its starting position, which is represented as
A_(starting point,i) is the starting position of agent i and T_(M_(assigned,i)) is
the last task assigned to agent i. Now, this problem can be presented as a TSP.
Consider that there are M_(assigned,i) tasks and an agent i that has to visit each
task exactly once and finally come back to the starting point. The objective of the
TSP is to complete these tasks in such a way that the total cost incurred by the
agent after completing all the tasks and returning to its starting point is minimum.
Thus, the objective of agent i is defined as
subject to: ∑_{s=1, s≠l}^{M_(assigned,i)} C_i(T_l, T_s) = 1, ∀ l   (14)

∑_{l=1, l≠s}^{M_(assigned,i)} C_i(T_l, T_s) = 1, ∀ s   (15)
where Eq. (14) represents that the agent goes from T_l to exactly one of the tasks
T_s (excluding T_l), and Eq. (15) represents that the agent arrives at T_s from
exactly one of the tasks T_l (excluding T_s).
One of the objectives of this work is to minimize Eq. (13) (henceforth called the
path cost) subject to the constraints given in Eqs. (14) and (15). This leads to
stage II of the proposed method, presented in the next section.
solution, and the best-known solution among the population. To optimize the task
sequence of each agent, a swap-based PSO technique [23] is proposed in this paper.
In the original PSO, each particle starts from an initial position in a defined
search space; in the proposed method, each particle starts with a sequence of the
tasks assigned to a particular agent. Consider an agent i with the sequence of
assigned tasks obtained as discussed in the previous subsection. Let this agent have
K PSO particles (henceforth called particles), each particle k containing a random
sequence of the tasks assigned to agent i, so that the kth particle is defined as
The “+” sign indicates that the swap operator SO(T_i, T_j) acts on Z_k to obtain
Z_k^new. For instance, let Z_k be (1, 3, 2, 4); then SO(1, 2) acts on Z_k to give
Z_k^new = (3, 1, 2, 4).
The swap sequence is defined as the collection of swap operators of particle k and
is denoted as

SS_k = (SO_1 + SO_2 + · · · + SO_(M_(assigned,i) − 1))   (18)
Let a normal solution of the kth particle be Z_k and a target solution be Z_k(tgt).
The swap sequence that should operate on Z_k to obtain Z_k(tgt) is defined as
SS(Z_k, Z_k(tgt)). For example, let Z_k = (2, 3, 1, 4) and Z_k(tgt) = (1, 2, 3, 4).
The generated swap sequence is SS(Z_k, Z_k(tgt)) = (SO(1, 3) + SO(2, 3)): first
SO(1, 3) acts on Z_k to give Z_k = (1, 3, 2, 4), and then SO(2, 3) acts to give
Z_k = (1, 2, 3, 4), which is Z_k(tgt).
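The swap operator and the construction of a swap sequence between two permutations can be sketched directly; the left-to-right position-fixing procedure below is one plausible way to generate SS(Z_k, Z_k(tgt)) (function names are illustrative), and it reproduces the worked example from the text:

```python
def apply_swap(z, so):
    """Apply swap operator SO(a, b): exchange the elements at 1-indexed
    positions a and b, following the text's notation."""
    a, b = so
    z = list(z)
    z[a - 1], z[b - 1] = z[b - 1], z[a - 1]
    return z

def swap_sequence(z, tgt):
    """Build SS(z, tgt): a list of swap operators transforming z into tgt."""
    z, seq = list(z), []
    for i in range(len(z)):
        if z[i] != tgt[i]:
            j = z.index(tgt[i])            # where the wanted element sits now
            seq.append((i + 1, j + 1))     # record SO(i+1, j+1)
            z = apply_swap(z, (i + 1, j + 1))
    return seq

# The example from the text: (2, 3, 1, 4) -> (1, 2, 3, 4)
ss = swap_sequence([2, 3, 1, 4], [1, 2, 3, 4])
```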
t is the current iteration; V_k is the velocity of particle k, which is the
consensus swap sequence of particle k; ω(t) is the inertia weight for iteration t;
N_k^p(t) and N_k^g(t) are the numbers of swap operators allowed to operate on Z_k(t)
for P(t) and G(t), respectively; and P(t) and G(t) represent the swap sequences
generated by comparing Z_k(t) with Z_k^pb(t) and Z_k^gb(t), respectively. N_k(t) is
the total number of swap operators required to generate the swap sequences P(t) and
G(t) separately.
Z_k(t + 1) = Z_k(t) + V_k(t + 1)   (25)

With Z_k(t + 1), the path cost dia_k(t + 1) is calculated. The personal best
solution Z_k^pb of each particle will be updated iff

dia_k(t + 1) < dia_k^pb(t)   (26)

The global best solution Z_k^gb will be updated iff

dia_k(t + 1) < dia_k^gb(t)   (27)
The process from Eqs. (20) to (27) is repeated for each particle k ∈ K until
t = iter_max. The final solution and path cost of agent i are Z_k^gb and dia_k^gb,
respectively.
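Putting stage II together, the loop of Eqs. (20)–(27) can be sketched for one agent's TSP. This is not the authors' implementation: since Eqs. (19)–(24) are not reproduced here, the inertia weight ω(t) and the N_p/N_g truncation are replaced by a single swap-acceptance probability, and all names and parameter values are illustrative:

```python
import math
import random

def path_cost(seq, tasks, start):
    """Closed-tour cost: start -> tasks in the given order -> back to start."""
    pts = [start] + [tasks[t] for t in seq] + [start]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def swaps_toward(z, target):
    """Swap sequence SS(z, target): 1-indexed swaps turning z into target."""
    z, seq = list(z), []
    for i in range(len(z)):
        if z[i] != target[i]:
            j = z.index(target[i])
            seq.append((i + 1, j + 1))
            z[i], z[j] = z[j], z[i]
    return seq

def swap_pso(tasks, start, n_particles=10, iters=200, keep_prob=0.8, seed=1):
    """Swap-based PSO sketch for one agent's task-sequence (TSP) problem."""
    rng = random.Random(seed)
    n = len(tasks)
    swarm = [rng.sample(range(n), n) for _ in range(n_particles)]
    pbest = [list(z) for z in swarm]                      # personal bests
    gbest = list(min(swarm, key=lambda z: path_cost(z, tasks, start)))
    for _ in range(iters):
        for k, z in enumerate(swarm):
            # Velocity: swaps pulling z toward its personal and global bests;
            # each swap is kept with probability keep_prob (a simplification
            # of the N_p/N_g truncation in Eqs. (20)-(24)).
            for a, b in swaps_toward(z, pbest[k]) + swaps_toward(z, gbest):
                if rng.random() < keep_prob:
                    z[a - 1], z[b - 1] = z[b - 1], z[a - 1]
            c = path_cost(z, tasks, start)
            if c < path_cost(pbest[k], tasks, start):     # Eq. (26)
                pbest[k] = list(z)
            if c < path_cost(gbest, tasks, start):        # Eq. (27)
                gbest = list(z)
    return gbest, path_cost(gbest, tasks, start)

# Three tasks on a unit square; the optimal closed tour from (0, 0) costs 4.
tasks = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_seq, best_cost = swap_pso(tasks, start=(0.0, 0.0))
```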
3 Results
Extensive simulations are done on Google Maps. Two locations are considered for the
simulations: M. G. Marg at Gangtok, India [24] and Marina Beach at Chennai, India
[25]. The process of calculating the aerial distance between two GPS coordinates is
given in Appendix 5. For each location, two PSO variants are considered:
1. Biased swap-based PSO, in which, for each agent assigned tasks in TA, the task
execution sequence is optimized by taking the agent's task sequence and the
respective path cost from the TA process as the initial global best task sequence
and initial global best path cost for the particles.
2. Unbiased swap-based PSO, in which a random task sequence and its path cost are
taken as the initial global best sequence and initial global best path cost for the
particles.
In the proposed method, the following parameter values are considered: the number of
PSO particles is 20 for both variants; iter_max is 100 and 50 for M. G. Marg,
Gangtok and Marina Beach, Chennai, respectively; the number of tasks is 100 and 50
for M. G. Marg, Gangtok and Marina Beach, Chennai, respectively; and the number of
agents is 20. The values of ω are updated using the methods and values presented
in [26].
The result of the proposed task assignment method is demonstrated in Fig. 1, and the
results of the agents' assignment by the proposed biased swap-based PSO method are
shown in Fig. 2 for M. G. Marg, Gangtok. In both figures, the movements of agents
11, 12, and 20 are shown. By the proposed task assignment process, as shown in
Fig. 1, agent 11 is assigned tasks 17, 52, 47, and 84, incurring a path cost of
142.503 units; agent 12 is assigned tasks 38, 80, and 18, incurring a path cost of
130.814 units; and agent 20 is assigned tasks 73, 79, 94, 44, 26, 15, 58, 56, 82,
36, and 53, incurring a path cost of 187.868 units. In contrast, in Fig. 2, agent 11
is assigned tasks 17, 52, 84, and 47, incurring a path cost of 140.228 units; agent
12 is assigned tasks 38, 18, and 80, incurring a path cost of 130.814 units; and
agent 20 is assigned tasks 73, 79, 94, 44, 53, 15, 58, 56, 82, 36, and 26, incurring
a path cost of 177.622 units. Similar improvements are observed for Marina Beach,
Chennai, India, as demonstrated in Fig. 3 (stage I, or task assignment, results) and
Fig. 4 (unbiased swap-based PSO results) using the movements of agents 1, 14, and
16. This shows that with the inclusion of the swap-based PSO algorithm, the cost
incurred at the individual-agent level and the total cost incurred by the MAS have
both improved.
The analysis is extended to demonstrate the effect of the maximum number of
iterations on the total path cost of all agents. Three variations in the maximum
number of iterations are considered here. For each variation, the proposed method is
simulated
Table 1 Effect of variations in the total number of iterations in the PSO algorithm
for both PSO variants for M. G. Marg, Gangtok, India

TA          Iterations   Biased swap-based PSO   Unbiased swap-based PSO
2867.949    50           2844.294                2843.297
            100          2840.402                2840.663
            150          2840.663                2841.323

Table 2 Effect of variations in the total number of iterations in the PSO algorithm
for both PSO variants for Marina Beach, Chennai, India

TA          Iterations   Biased swap-based PSO   Unbiased swap-based PSO
33897.434   50           33881.104               33881.134
            100          33884.206               33885.079
            150          33881.133               33881.104
10 times, and the best results among the 10 runs are presented here. Tables 1 and 2
show the variations for M. G. Marg, Gangtok and Marina Beach, Chennai, respectively,
for both variants of the PSO algorithm. It is observed from Table 1 that there is a
good improvement in the path cost for 100 and 150 iterations compared to 50
iterations in the biased swap-based PSO mode. Thus, for the task assignment problem
at M. G. Marg, Gangtok, increasing the number of iterations improves the total path
cost of all agents. However, in the case of Marina Beach, Chennai, increasing the
number of iterations has a negligible effect on the total path cost of all agents,
as shown in Table 2.
4 Conclusion
the authors would like to extend this work to consider the proximity of resources
(e.g., fuel), the priority of tasks, and dynamic task assignment where tasks will appear
randomly or sequentially.
The proposed method is implemented on Google Maps. To calculate the aerial distance
between two GPS coordinates, the law of cosines model is used, where the Earth is
assumed to be spherical [27] and all GPS coordinates are considered to be at mean
sea level. The following process is used to calculate the aerial distance. Let
GPS_1^o = (lat_1^o, lon_1^o), where lat_1^o and lon_1^o are in decimal degrees.
Then,
GPS_1^r = GPS_1^o × π/180

GPS_2^r = GPS_2^o × π/180
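The radian conversion above feeds the spherical law of cosines. A sketch of the full distance computation follows (the function name, the 6371 km Earth radius, and the clamping of the cosine term are assumptions added here):

```python
import math

def aerial_distance_km(p1, p2, radius_km=6371.0):
    """Great-circle distance via the spherical law of cosines.

    p1, p2: (latitude, longitude) in decimal degrees, treated as lying at
    mean sea level on a spherical Earth, as assumed in the text:
        d = R * arccos(sin(lat1) sin(lat2)
                       + cos(lat1) cos(lat2) cos(lon2 - lon1))
    """
    lat1, lon1 = (math.radians(x) for x in p1)   # the pi/180 conversion
    lat2, lon2 = (math.radians(x) for x in p2)
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return radius_km * math.acos(max(-1.0, min(1.0, c)))  # clamp rounding

# The two simulation sites from references [24] and [25]
gangtok = (27.32860, 88.61230)
chennai = (13.056327, 80.283403)
d = aerial_distance_km(gangtok, chennai)   # roughly 1800 km apart
```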
References
11. Ghole MS, Ghosh A, Singha A, Das C, Ray AK (2021) Self organizing map-based strategic
placement and task assignment for a multi-agent system. In: Advances in intelligent systems
and computing. Springer, pp 387–399
12. Ghole MS, Ray AK (2020) A neural network based strategic placement and task assignment
for a multi-agent system. In: Lecture notes in electrical engineering. Springer, pp 555–564
13. Gu J, Su T, Wang Q, Du X, Guizani M (2018) Multiple moving targets surveillance based on
a cooperative network for multi-UAV. IEEE Commun Mag 56(4):82–89
14. Li P, Miyazaki T, Wang K, Guo S, Zhuang W (2017) Vehicle-assist resilient information and
network system for disaster management. IEEE Trans Emerg Top Comput 5(3):438–448
15. Wang F, Wang F, Ma X, Liu J (2019) Demystifying the crowd intelligence in last mile parcel
delivery for smart cities. IEEE Netw 33(2):23–29
16. Shao S, Xu SX, Huang GQ (2020) Variable neighborhood search and Tabu search for auction-
based waste collection synchronization. Transp Res Part B: Methodol 133:1–20
17. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-
international conference on neural networks. IEEE, pp 1942–1948
18. Yifei T, Meng Z, Jingwei L, Dongbo L, Yulin W (2018) Research on intelligent welding robot
path optimization based on GA and PSO algorithms. IEEE Access 6:65397–65404
19. Gu XL, Huang M, Liang X (2020) A discrete particle swarm optimization algorithm with
adaptive inertia weight for solving multiobjective flexible job-shop scheduling problem. IEEE
Access 8:33125–33136
20. El-Ashmawi WH, Ali AF, Tawhid MA (2019) An improved particle swarm optimization with
a new swap operator for team formation problem. J Indus Eng Int 15(1):53–71
21. Li H, Yang D, Su W, Lu J, Yu X (2019) An overall distribution particle swarm optimization
MPPT algorithm for photovoltaic system under partial shading. IEEE Trans Indus Electron
66(1):265–275
22. El-Hajj R, Guibadj RN, Moukrim A, Serairi M (2020) A PSO based algorithm with an
efficient optimal split procedure for the multiperiod vehicle routing problem with
profit. Ann Oper Res 291(1):281–316
23. Liu X, Su J, Han Y (2007) An improved particle swarm optimization for traveling salesman
problem. In: International conference on intelligent computing, pp 803–812
24. MG Marg, Gangtok, India, lat 27.32860 (deg) and lon 88.61230 (deg), (Google Earth). Accessed
4 Feb 2022
25. Marina Beach, Chennai, India, lat 13.056327 (deg) and lon 80.283403 (deg), (Google Earth).
Accessed 4 Feb 2022
26. Huang X, Li C, Chen H, An D (2020) Task scheduling in cloud computing using particle swarm
optimization with time varying inertia weight strategies. Clust Comput 23(2):1137–1147
27. Calculate distance, bearing and more between Latitude/Longitude points. https://www.
movable-type.co.uk/scripts/latlong.html. Accessed 4 Feb 2022
Facemask Detection and Maintaining
Safe Distance Using AI and ML
to Prevent COVID-19—A Study
1 Introduction
COVID-19 was initially reported in Wuhan, China, and has since spread to the whole
world. The rapid spread of the coronavirus had resulted in 4 million global deaths
by Oct 21, 2021, and COVID-19 has become a source of worry and fear for everyone.
The pandemic has created a difficult scenario for the entire world; as a result,
drastic measures are being taken to stem the spread of the coronavirus. Its spread
can be curbed by maintaining distance and wearing masks to prevent transmission of
the virus from one person to another.
In a nutshell, the contributions of this study are as follows:
• This paper makes an extensive study of some recent research works that detect
facemasks worn by people and check safe distances through machine learning and deep
learning techniques, along with the concept of image processing.
• Performances of several state-of-the-art methods are investigated and compared.
• The benefits and applications of these recent studies are discussed.
The rest of the paper is organized as follows. Section 2 discusses some
state-of-the-art methods proposed for handling the spread of COVID-19, covering
various facemask and social distancing approaches in which machine learning, deep
learning, and image processing play vital roles. A brief comparison among some
popular methods in this domain is then discussed and analyzed in Sect. 3.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 139
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_11
140 A. Mishra et al.
2 Related Study
The paper [1] presents a clear picture of recent studies on machine learning and
artificial intelligence for handling all sides of the COVID-19 problem at various
levels, such as molecular, clinical, and societal applications. In [2], a hybrid
deep learning and machine learning model is applied for detecting facemasks: the
first part performs feature extraction using ResNet-50, and the second classifies
facemasks using decision trees, support vector machines (SVM), and an ensemble
method. In [3], the main goal is to identify crowds. A Raspberry Pi with an RPi
camera captures live video, which is then processed frame by frame; image processing
is used to identify people and vehicles in the video, with TensorFlow and OpenCV
playing important roles. A model has been established in [4] to identify masks and
physical distances among construction workers in order to protect their safety
during the COVID-19 epidemic; among several models, a fast region-based CNN
Inception ResNet V2 network is chosen, achieving 99.8% accuracy for recognizing
facemasks. The goal of [5] is to develop RetinaFaceMask, a unique facemask detector
that can detect facemasks and contribute to public healthcare. In [6], the image
classification performance of the convolutional neural network (CNN) is studied. In
[7], MobileNet, a new model architecture based on depth-wise separable convolutions,
is proposed. In [8], neural networks are discussed broadly, including how they can
be used extensively to predict diseases. A facemask detection model based on
computer vision and deep learning has been proposed in [9]; this model can be used
with a computer or laptop camera to determine whether people are wearing masks. The
main goal of [10] is to learn more about social distancing and facemask detection:
object detection is used for social distancing, faces are used to identify masks,
and OpenCV is generally used for all of this, with OpenCV Darknet responsible for
target tracking.
In [11], real-time social distance is calculated using image processing and deep
learning. The YOLO detection model is used, which has three tuning parameters.
First, the frames are extracted; then the bounding box is determined, from which its
center is computed. Each frame is pre-processed to provide three results: the
confidence, the bounding box, and the centroid of each person. In [12], a system has
been proposed that uses computer vision and the MobileNetV2 architecture to
automatically monitor public places and prevent the spread of the COVID-19 virus. In
[13], the intention is to construct a system that detects whether a person is
wearing a mask and notifies the corresponding authority in a smart city network,
with the help of CCTV cameras and CNN-based feature extraction from images. In [14],
pre-trained deep neural network models such as a ResNet classifier, DSFD, and
YOLOv3 bounding boxes have been utilized to identify individuals and masks,
concluding that social distancing can abate the spread of the coronavirus. In [15],
an integrated real-time facemask and social distance infraction detection system has
been built in which objects are identified using YOLOv4. In [16], recent technology
such as computer vision and deep learning is used: the MobileNetV2 architecture for
facemask detection and the Euclidean distance formula for distance computation. In
[17], a system has been suggested that monitors human activity using deep learning
techniques, assuring human safety in public places. In [18], the study is based on
the conclusions of previous literary work on social distancing and related technical
predictions. In [19], a summarized preface to social distancing and masks, the main
resources in the present scenario, is presented. In [20], the approach can
differentiate the type of social distance and categorize it according to social
distancing norms; in addition, it shows labels according to object identification.
The classifier has been applied to live video streams and photos, and by observing
the distance between two people, it can be confirmed whether a person is maintaining
social distance. In [21], a model has been proposed that can detect social distance
and facemasks using YOLOv2, YOLOv3, and YOLOv4: social distance and facemask
detection are performed using the Darknet model YOLOv4 on video collected by a
camera or on user-provided images and videos, identifying whether people follow
social distancing and whether they wear masks. In [22], deep learning and YOLO
methods are used to reduce the scale of coronavirus epidemics by assessing the
distance between humans; a red line reports any pair failing to comply with the
regulations.
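Several of the surveyed systems pair a person detector with a Euclidean-distance check between detected centroids [16, 24]. A minimal, detector-agnostic sketch of that check follows (the function name and threshold are illustrative; in a real system the centroids would come from YOLO or MobileNetV2 detections and the threshold from a pixel-to-metre calibration):

```python
import math

def distancing_violations(centroids, min_dist):
    """Flag index pairs of people whose centroids are closer than min_dist.

    centroids: (x, y) centres of detected person bounding boxes
    min_dist:  safe-distance threshold in the same units as the centroids
    """
    flagged = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < min_dist:
                flagged.append((i, j))   # this pair violates the rule
    return flagged

# Two people close together and one far away
people = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
violations = distancing_violations(people, min_dist=3.0)
```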
In [23], the training and testing of commonly used deep pre-trained CNN models
(DenseNet, InceptionV3, MobileNet, MobileNetV2, ResNet-50, VGG-16, and VGG-19) on a
facemask dataset are simulated. In [24], OpenCV is used to gather live input video
feeds from webcams and feed them into deep learning models; a convolutional neural
network classifies the various object classes discernible in the video, yielding the
objects of interest, such as people, with a closed box around them, and distances
are then compared. In [25], a comparative study of various CNN and machine learning
techniques for detecting and identifying a person wearing a facemask to prevent the
spread of COVID-19 is given. In [26], a deep learning-based approach for detecting
masks has been introduced using a combination of single- and two-stage detectors,
and transfer learning is then applied to pre-trained models to measure the accuracy
and robustness of the system. In [27], the authors built the PWMFD dataset with 9205
high-quality masked face photos and developed SE-YOLOv3, a quick and accurate mask
detector with a channel attention mechanism that improves the backbone network's
feature extraction capability. The findings of [28] show that YOLO can provide
state-of-the-art performance in object identification and classification while
requiring significantly less inference time. The fundamental goal of [29] is to
summarize the critical roles of AI-driven approaches (machine learning, deep
learning, and so on) and AI-empowered imaging techniques in analyzing, predicting,
and diagnosing COVID-19 disease. Various machine learning and deep learning models
are developed in [30] to predict the PPIs between the SARS-CoV-2 virus and human
proteins, which are then confirmed using biological tests.
another important way to get better accuracy in this type of work, after analyzing
Fig. 2. In Table 3, we have considered some popular papers that compare various
social distance-maintaining techniques and mask detection techniques. These papers
use CNN, AI, deep learning, YOLO, MobileNetV2, and so on. Considering the accuracy
graph in Fig. 3, we can say these technologies are very effective at giving accurate
results.
Table 3 Comparison of social distancing and facemask detection methods
• Social distancing and face mask detection using deep learning [17]: trainable
parameters –; accuracy 99.22%
• SocialdistancingNet-19 [20]: the network input size, anchor box, and feature
extraction network are the three tuning parameters in YOLO; accuracy:
SocialdistancingNet-19 92.8%, ResNet-50 86.5%, ResNet-18 85.3%
• DL-based safe distance and facemask detection [19]: learning rate = 0.0001,
epochs = 50, batch size BS = 32; accuracy –
• DL-based safer distancing and facemask detection [12]: Adam optimizer, learning
rate = 1e−4, epochs = 20, BS = 32; accuracy 92%, precision 0.917, recall 0.917
• Detection using DL and computer vision [14]: trainable parameters –; accuracy
between 96.73 and 100%
Facemask Detection and Maintaining Safe Distance Using AI and ML … 147
4 Conclusion
References
1. Bullock J, Luccioni A, Pham KH, Lam CSN, Luengo-Oroz M (2020) Mapping the landscape
of artificial intelligence applications against COVID-19. J Artif Intell Res 69:807–845
2. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) A hybrid deep transfer learning
model with machine learning methods for facemask detection in the era of the COVID-19
pandemic. Measurement 167:108288
3. Dhanush Reddy KN (2021) Social distance monitoring and facemask detection system for
Covid-19 pandemic. Turk J Comput Math Educ (TURCOMAT) 12(12):2200–2206
4. Razavi M, Alikhani H, Janfaza V, Sadeghi B, Alikhani E (2021) An automatic system to monitor
the physical distance and facemask wearing of construction workers in a Covid-19 pandemic.
arXiv preprint arXiv:2101.01373
5. Jiang M, Fan X, Yan H (2020) Retina facemask: a facemask detector. arXiv preprint arXiv:
2005.03950, 2
6. Lubis R. Machine learning (convolutional neural networks) for facemask detection in image
and video. Binus University Repository. https://core.ac.uk/reader/328808130
7. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Adam H (2017) Mobilenets:
efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:
1704.04861
8. Nielsen MA (2015) Neural networks and deep learning, vol 25. Determination Press, San
Francisco, CA
9. Maurya P, Nayak S, Vijayvargiya S, Patidar M (2021) COVID-19 facemask detection. In: 2nd
international conference on advanced research in science, engineering & technology, Paris,
France, pp 29–34
10. Bhutada S, Nirupama NS, Mounika M, Revathi M (2021) Social distancing and mask detector
based on computer vision using deep learning methods. Int J Res Biosci, Agricult Technol
2(9):81–87
11. Murugan KS, Kavinraj G, Mohanaprasanth K, Ragul KB (2021) Real-time social distance
maintaining using image processing and deep learning. J Phys: Conf Ser 1916(1):012190. IOP
Publishing
12. Yadav S (2020) Deep learning-based safe social distancing and facemask detection in public
areas for covid-19 safety guidelines adherence. Int J Res Appl Sci Eng Technol 8(7):1368–1375
13. Rahman MM, Manik MMH, Islam MM, Mahmud S, Kim JH (2020) An automated system
to limit COVID-19 using facial mask detection in the smart city network. In: 2020 IEEE
international IOT, electronics and mechatronics conference (IEMTRONICS). IEEE, pp 1–5
14. Shete I (2020) Social distancing and facemask detection using deep learning and computer
vision (Doctoral dissertation, Dublin, National College of Ireland). http://norma.ncirl.ie/4419/
1/ishashete.pdf
15. Bhambani K, Jain T, Sultanpure KA (2020) Real-time facemask and social distancing violation
detection system using YOLO. In: 2020 IEEE Bangalore humanitarian technology conference
(B-HTC). IEEE, pp 1–6
16. Savita S (2021) Social distancing and facemask detection from CCTV camera. Int J Eng Res
Technol (IJERT) 10(8)
17. Krishna KP, Harshita S (2020) Social distancing and facemask detection using deep learning.
In: 10th international conference, IACC 2020, Panaji, Goa, India, December 5–6, 2020, Revised
Selected Papers, Part I, vol 1367. Springer Nature
18. Pandiyan P. Social distance monitoring and facemask detection using deep neural network.
19. Bala MMS (2021) A deep learning technique to predict social distance and facemask. Turk J
Comput Math Educ (TURCOMAT) 12(12):1849–1853
20. Keniya R, Mehendale N (2020) Real-time social distancing detector using Socialdistancingnet-
19 deep learning network. https://doi.org/10.2139/ssrn.3669311, available at SSRN 3669311
21. Babu DCR, Jyothir Vijaya Lakshmi K, Saisri KM, Anjum SR (2021) Social deprivation with
protective mask detector. J Eng Sci 12(7):219–226
22. Patil NS, Rani K, Rangappa S, Jain V (2021) Social distancing detection. Int J Res Eng Sci
9(9):50–56
23. Teboulbi S, Messaoud S, Hajjaji MA, Mtibaa A (2021) Real-time implementation of AI-based
facemask detection and social distancing measuring system for COVID-19 prevention. Sci
Program 1–21
24. Yadav N, Sule N, Yadav S, Kullur S (2021) Social distancing detector using deep learning. Int
Res J Eng Technol 8(5):3699–3703
25. Jenitta J, Shrusti BK, Vidya DY, Sinnur VS, Varma S (2021) Survey on detection and
identification of facemask. Int J Sci Res Eng Trends 7(2):985–988
26. Sethi S, Kathuria M, Kaushik T (2021) Facemask detection using deep learning: an approach
to reduce risk of Coronavirus spread. J Biomed Inform 120:103848
27. Jiang X, Gao T, Zhu Z, Zhao Y (2021) Real-time facemask detection method based on YOLOv3.
Electronics 10(7):837
28. Liu R, Ren Z (2021) Application of Yolo on mask detection task. In: 2021 IEEE 13th
international conference on computer research and development (ICCRD). IEEE, pp 130–136
29. Chakraborty S, Dey L (2021) The implementation of AI and AI-empowered imaging systems
to fight against COVID-19—a review. Smart Healthc Syst Des: Secur Privacy Aspects 301
30. Dey L, Chakraborty S, Mukhopadhyay A (2020) Machine learning techniques for sequence-
based prediction of viral–host interactions between SARS-CoV-2 and human proteins. Biomed
Journal 43(5):438–450
A Machine Learning Framework
for Breast Cancer Detection
and Classification
1 Introduction
With the implementation of neural networks and other computer-based techniques,
medical researchers have come up with approaches that make early detection
of breast cancer possible. Time plays a vital role here: when a tumor is
detected at an initial stage, the cancer cells can be stopped from growing further
as soon as the tumor is found.
Breast cancer starts in the cells of the breasts and spreads throughout the body.
Women are more likely than men to develop breast cancer. A mass in the breast,
blood extravasation from the nipple, and changes in the consistency or structure of
the breast or nipple are all signs of breast cancer, which is also known
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 151
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_12
152 B. Kumar et al.
Benign conditions are not deadly or harmful: they show some abnormal growth, or
sometimes a few changes in the tissue of the breast, that is not cancerous. They are
basically a lump in the breast, which can look scary, but they are non-cancerous and
do not have a deadly impact.
Malignant cancer is dangerous: these cells grow, accumulate together, and then spread
to other parts of the body. Malignant conditions need to be identified as soon as
possible.
The steps involved in our methodology are exploratory data analysis (EDA) and data
preprocessing, followed by building an SVM to predict the nature of the tumor,
optimizing the SVM classifier, and comparing it with other classification models.
For the EDA section, a correlation matrix and scatter plots have been used. SVM is
the core methodology behind the main model, supported by cross validation and
hyperparameter tuning. In the comparison section, SVM is compared with five other
algorithms with the help of scikit-learn pipelining, for a smooth succession of steps
and to avoid data leakage.
For training, the Wisconsin breast cancer dataset, hosted by the University of
California, Irvine machine learning repository, is used; we discuss it in detail in the
“Dataset” section.
2 Literature Survey
In the first surveyed paper, SVM showed the highest accuracy of 97%; the study was
done to understand the comparative performance of algorithms. Although the paper
does not optimize any particular ML algorithm, such optimization could further
improve performance.
In Ref. [2], “Using Machine Learning algorithms for breast cancer risk prediction
and diagnosis”, the authors applied various machine learning algorithms to breast cancer
data. The scope for improvement here includes fine-tuning of the parameters, data
standardization, ensuring traceability of the data, and optimizing the downstream
data flow.
In Ref. [3], “An Enhanced Breast Cancer Diagnosis Scheme based on Two-Step-
SVM Technique”, the authors used a hybrid support vector machine (SVM) and
a two-step clustering technique to separate the incoming tumors; the two-step
algorithm and SVM were coupled to identify the hidden patterns of malignant
and benign tumors. When tested on the UCI-WBC dataset, the proposed hybrid
approach achieves an accuracy of 99.1%. In future work, an optimization approach
could be coupled with the SVM two-step clustering methodology to further improve
diagnostic accuracy. In [21], the authors used an SVM classifier with statistical
parameters such as entropy, mean, and RMS and achieved 80% accuracy. Similarly,
in [22], contrast stretching is used to increase the contrast of the image. The
segmentation of mammogram images plays an important role in improving the
detection and diagnosis of breast cancer.
3 Dataset
This dataset is the Wisconsin Diagnostic Breast Cancer dataset [4], hosted by the
University of California, Irvine (UCI) machine learning repository. The dataset contains
357 (62.74%) benign and 212 (37.25%) malignant breast cancer cases, where B
and M denote benign and malignant. The dataset consists of 32 columns, with the
first column an exclusive ID number and the second column the diagnosis result (M
or B); after that come the mean, the standard error, and the “worst” value (the mean
of the three largest values) of ten measurements. No missing values were noticed.
The exclusive ID numbers of the specimens and the accompanying diagnoses (M and
B, denoting malignancy and benignity) are thus saved in the first two columns of
the dataset. Columns three through thirty-two contain thirty real-valued
attributes generated from a digitized image of the cell nuclei, which we can use to
build a machine learning model to determine whether the tumor is malignant or
benign. The characteristics were extracted from a digital image of a fine-needle
aspiration biopsy of the tumor, and these features describe the cell nuclei. We have
obtained this dataset from Kaggle.
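As a sanity check on these figures, the same WDBC data also ships with scikit-learn; a sketch assuming scikit-learn is installed (the Kaggle CSV contains the same 569 cases):

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer  # bundled copy of the WDBC data

data = load_breast_cancer()
counts = Counter(data.target)  # in scikit-learn's copy, 0 = malignant, 1 = benign
n = len(data.target)

print(data.data.shape)  # (569, 30): thirty real-valued features
print("benign:", counts[1], round(100 * counts[1] / n, 2))     # 357, 62.74
print("malignant:", counts[0], round(100 * counts[0] / n, 2))  # 212, 37.26
```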
4 Methodology
So far we have a general idea of the dataset we are working with; now we are going
to do a detailed analysis of the features and the data values. Exploratory data analysis
(EDA) is a critical course of action that follows feature engineering and data
acquisition, and it is supposed to be completed prior to any kind of modeling. This
is because a data scientist’s ability to comprehend the nature of the data is critical,
without assuming things prior to the analysis. Data exploration results are incredibly
valuable in determining the arrangement and distribution of data, as well as the
presence of extreme boundary-type points (outliers) and interrelationships within
the data collection.
Summary statistics condense significant aspects of a dataset into simple quantitative
measures; standard deviation (SD), mean, and correlation are some of the most
commonly used. Since our data can be unevenly distributed, we examined the skewness
(distortion) of each feature. The skewness result indicates whether a distribution is
negatively (left) or positively (right) skewed, and values near zero indicate little skew.
Due to the distinct grouping of malignant and benign cancer kinds in them, the graphs
show that “radius mean”, “area mean”, “concave points mean”, “concavity mean”,
and “perimeter mean” are beneficial in predicting cancer type. It is also important to
mention that the parameters “area worst” and “perimeter worst” could be valuable
at some point.
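The skewness check above can be sketched in plain Python; in practice pandas’ `DataFrame.skew()` would be used, and the sample list here is purely hypothetical:

```python
def skewness(xs):
    """Population skewness: negative = left-skewed, positive = right-skewed."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

right_tailed = [1, 1, 1, 2, 2, 3, 10]  # a long right tail
print(skewness(right_tailed) > 0)      # True: positively (right) skewed
```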
4.1.3 Visualization
The process of projecting data, or chunks of data, into abstract visuals is known
as visualization. Data exploration is used in many aspects of the data mining
process, including preprocessing, modeling, and interpretation of results.
1. Density plot.
2. Histogram.
3. Box and whisker plot.
We can see that mean-value parameters with correlations between 0.75 and 1 have a
strong positive association. The radius and perimeter mean values have a strong
positive association with the mean area of the tissue nucleus. Concavity and area,
concavity and perimeter, and other pairs have a moderate positive correlation (“r” in
the range 0.5–0.75). Similarly, the attribute values “texture”, “radius”, and “perimeter
mean” have a high negative association with fractal dimension (Fig. 4).
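These “r” values are Pearson correlation coefficients, which can be sketched in plain Python (pandas’ `DataFrame.corr()` computes the full matrix in practice; the radius/area values below are hypothetical illustrations, not taken from the dataset):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical radius/area pairs; area grows roughly as pi * r^2,
# so the correlation should be strongly positive (near 1)
radius = [10.0, 12.5, 14.1, 20.3, 25.0]
area = [314.0, 491.0, 625.0, 1295.0, 1963.0]
print(round(pearson_r(radius, area), 2))
```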
We can utilize the average values of “area”, “cell radius”, “compactness”, “perimeter”,
“concavity”, and “concave regions” to classify cancer: higher values of these
parameters are associated with malignant tumors. The mean values of texture,
smoothness, symmetry, and fractal dimension do not indicate a preference for one
diagnosis over another. There are no obvious significant outliers in any of the
histograms that need to be cleaned up.
Every predictive analysis project involves preprocessing of data. Formatting our data
in a manner that optimally reveals the nature of the challenge to the machine
learning methodologies will be beneficial, and this is a smart idea most of the time.
The following tasks are involved in the preprocessing of data:
So in this EDA part, the data was studied to learn more about how it was
distributed and how the attributes were related to one another, and we saw a few things
that piqued our interest. In this section, we utilize feature selection, feature extraction,
and transformation to minimize dimensionality in high-dimensional data. Our goal
here is to identify the data’s most predictive attributes and filter them to improve the
analytics model’s predictive capability.
NumPy was used to assign the 30 characteristics to an array X, and the class
names were converted to integers from their original textual format (M and B):
malignant tumors are now designated as category 1 and benign tumors as category
0. Thereafter, we encode the class labels (diagnosis) into the array y, as
shown by invoking the transform method of LabelEncoder on two dummy variables.
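The encoding step can be sketched in plain Python; scikit-learn’s LabelEncoder produces the same B → 0, M → 1 mapping after fitting on the diagnosis column (the label list here is hypothetical):

```python
diagnosis = ["M", "B", "B", "M", "B"]  # hypothetical diagnosis labels
classes = sorted(set(diagnosis))       # ["B", "M"] -> B encoded as 0, M as 1
y = [classes.index(d) for d in diagnosis]
print(y)  # [1, 0, 0, 1, 0]
```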
Splitting the data into train and test sets: using separate training and testing datasets
is the simplest way to measure the performance of a machine learning classifier.
We have divided the data into two sets, a training set and a testing set (70% training,
30% testing). The algorithm is taught on the first part, forecasts are made on the
second, and the forecasts are compared with the expected results. The proportions
of the split depend on the size and details of our dataset, but it is typical to use 67%
of it for training and 33% for testing; the 80:20 split is also pretty common.
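The index logic of such a split can be sketched in plain Python (in practice scikit-learn’s `train_test_split` does this; the seed below is an arbitrary choice):

```python
import random

def train_test_split_indices(n, test_frac=0.30, seed=42):
    """Shuffle indices 0..n-1 and cut off the first test_frac as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]

train, test = train_test_split_indices(569)  # 569 samples in WDBC
print(len(train), len(test))                 # 398 171 -- a 70:30 split
```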
When working with only two dimensions, a number of attribute pairs partition the
dataset similarly, so it is logical to apply a feature extraction technique in order to
use as many attributes as possible while retaining the maximum feasible information.
The PCA method is employed: after applying the linear PCA transformation, we have
a reduced-dimensional subspace (in this case, 2D) in which the data is “most spread”
along the new attribute axes.
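A minimal sketch of this PCA step, assuming scikit-learn is installed and using its bundled copy of the WDBC data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data              # 569 x 30 feature matrix
X_std = StandardScaler().fit_transform(X)  # PCA assumes standardized features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)            # project onto the 2 main axes of spread
print(X_2d.shape)                          # (569, 2)
print(sum(pca.explained_variance_ratio_))  # share of total variance retained
```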
Hyperplanes are decision boundaries that sort the data: points on either side of the
hyperplane may be assigned to separate categories. In the case of only two input
attributes, the hyperplane is simply a line; with three input attributes, the hyperplane
becomes a 2D plane. It becomes hard to imagine when the number of features is more
than three, so we can say that the hyperplane’s dimension is determined by the
number of attributes. In this work, we first split the dataset into train and test sets in
a 70:30 proportion, i.e., 70% of the data is used to train the model and 30% is used
for testing. We analyzed and built a model on this dataset to determine whether a
particular set of manifestations will evolve into breast cancer. The support vector
machine (SVM) is a binary classifier: it looks for the hyperplane that leaves the biggest
feasible fraction of points of the same class on the same side, while at the same time
maximizing the distance between the hyperplane and each class [7]. SVMs are among
the most recent machine learning techniques applied to the prognosis of carcinoma.
SVM first maps the input vectors into a higher-dimensional feature space and then
finds the hyperplane that segregates the data entries into two sub-classes; the margin
between the decision hyperplane and the occurrences closest to the border is made
as large as possible. The final classifier achieves substantial generalizability, so we
can use it for the efficient categorization of new specimens [7].
1. The kernel selection from linear, radial basis function (RBF), or polynomial.
2. C : the regularization parameter.
3. Parameters that are particular to the kernel.
The gamma and C parameters have an impact on the model’s complexity, with large
values of either producing a more complicated model. As a result, good values for
these two variables are generally firmly associated, and gamma and C should be
regulated in coordination. After performing support vector classification on our
model, we got an accuracy of 95%. To improve the model, we now apply a few
techniques.
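A minimal sketch of this baseline fit, assuming scikit-learn is installed and using its bundled WDBC copy; the split seed is arbitrary, so the exact score may differ from the 95% reported above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)

scaler = StandardScaler().fit(X_tr)  # fit the scaler on the training data only
clf = SVC(kernel="rbf", C=1.0)       # scikit-learn's default RBF kernel and C = 1
clf.fit(scaler.transform(X_tr), y_tr)
acc = clf.score(scaler.transform(X_te), y_te)  # accuracy on the held-out 30%
print(acc)
```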
In machine learning we cannot be certain that a model trained on the training data will
perform accurately on practical data in every case. To tackle this issue, it should be
assured that the model extracts the accurate result from the data with low resultant
noise. The cross-validation approach is the go-to strategy for this. In the
cross-validation method, we split the data into various subsections; the ML model is
trained on one subsection of the dataset and the other subsection is used for evaluation.
In this technique, the dataset is divided into k subsections; the model is trained on all
but one of them and then evaluated on the remaining subsection. This is repeated k
times, with a different subsection designated for testing each time.
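The fold bookkeeping can be sketched in plain Python (scikit-learn’s `KFold` is what one would actually use):

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists; each of the k folds is held out exactly once."""
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        yield train, test

for train, test in kfold_indices(9, 3):
    print(len(train), len(test))  # 6 3 on every iteration
```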
Here in this model we first checked the model with three-fold cross validation, i.e.,
k = 3, and got an accuracy of 97%. This assessment was done while taking all the
available parameters into consideration. We then tried to cut down the parameters
and used the three parameters that fit the model best, again getting an accuracy of
97%. Hence, we can conclude that a small number of features can also give us a
model with similar performance, so we need to focus on feature selection. Let us
have a detailed discussion of model accuracy.
A receiver operating characteristic curve (ROC curve) is a graph that shows the
accuracy of the classification model across every classification threshold. In this
plot, the y- and x-axes are as follows:
1. Rate of True Positives.
2. Rate of False Positives.
The True Positive Rate (TPR), also known as recall or sensitivity, is defined as follows:
TPR = (TP)/(TP + FN).
The False Positive Rate (FPR) can be defined as follows:
FPR = (FP)/(FP + TN),
where FP is false positive, TP is true positive, FN is false negative, and TN is true
negative. The True Negative Rate (TNR), also known as specificity, is defined as
TNR = (TN)/(FP + TN).
“Area under the ROC Curve” is abbreviated “AUC”. AUC assesses the full 2D area
beneath the whole ROC curve from (0, 0) to (1, 1). The AUC value lies between
0 and 1: the model with all predictions incorrect has AUC = 0.0, whereas the
model with all predictions correct has AUC = 1.0. The following are two reasons
for AUC to be desirable:
1. The AUC is scale-invariant. It measures how effectively predictions are
ranked instead of measuring absolute values.
2. The classification threshold has no effect on AUC. It evaluates the system’s
classification performance independent of the threshold employed.
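For a piecewise-linear ROC curve, AUC is just the trapezoid rule applied to the curve’s points; a plain-Python sketch with hypothetical ROC points:

```python
def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via the trapezoid rule, for sorted (fpr, tpr) points."""
    pts = list(zip(fpr, tpr))
    return sum((f2 - f1) * (t1 + t2) / 2
               for (f1, t1), (f2, t2) in zip(pts, pts[1:]))

# hypothetical ROC points running from (0, 0) to (1, 1)
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
print(round(auc_trapezoid(fpr, tpr), 3))  # 0.845
print(auc_trapezoid([0, 1], [0, 1]))      # 0.5: the random-guess diagonal
```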
4.3.8 Observation
The confusion matrix for the model’s current performance is shown in Fig. 9. Here
we have “1” and “0” as the two possible predicted classes: benign equals 0, which
indicates the absence of cancer cells, and malignant equals 1, which indicates the
presence of cancer cells. A total of 171 predictions were made by the classifier,
which accurately guessed “yes” or “no” in 163 of the 171 cases. In actuality, 64 of the
patients in the data have cancer, whereas the remaining 107 do not. From the
confusion matrix we have calculated the following rates:
Accuracy is calculated as:
(TP + TN)/(TP + TN + FP + FN) = (57 + 106)/171 = 0.95.
Misclassification Rate:
(FP + FN)/(TP + TN + FP + FN) = (1 + 7)/171 = 0.05 (0.05 = 1 − 0.95).
True Positive Rate (Sensitivity): the ratio of the number of times it predicts yes and
the answer is actually yes, to the total number of actual yes cases.
TP/actual yes = 57/64 = 0.89.
False Positive Rate:
FP/actual no = 1/107 = 0.01.
Prevalence:
Actual yes/total = 64/171.
Precision:
TP/(TP + FP) = 57/58 = 0.98.
True Negative Rate:
TN/(actual no) = 106/107.
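All of these rates follow directly from the four confusion-matrix counts; a plain-Python recomputation:

```python
TP, TN, FP, FN = 57, 106, 1, 7  # counts from the confusion matrix above
total = TP + TN + FP + FN       # 171 predictions in all

accuracy = (TP + TN) / total             # 163/171
misclassification = (FP + FN) / total    # 8/171, i.e. 1 - accuracy
sensitivity = TP / (TP + FN)             # TP / actual yes = 57/64
fpr = FP / (FP + TN)                     # FP / actual no  = 1/107
precision = TP / (TP + FP)               # 57/58
specificity = TN / (FP + TN)             # 106/107

print(round(accuracy, 2), round(misclassification, 2))  # 0.95 0.05
print(round(sensitivity, 2), round(fpr, 2))             # 0.89 0.01
print(round(precision, 2), round(specificity, 2))       # 0.98 0.99
```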
Now, here we have the ROC curve for this model. In this ROC, points that lie on the
diagonal have a 0.5 probability of being either 0 (no) or 1 (yes); there the
classification model is not really making a difference, and the decision is effectively
made at random (Fig. 10).
TPR is greater than FPR in the region above the diagonal, where the model
outperforms randomness. Suppose FPR = 0.01 and TPR = 0.99; in this case, the
chance of a true positive is TPR/(TPR + FPR), i.e., 99%. Besides, with FPR held
constant, it is clear that the classification model performs better as we move
vertically higher above the diagonal.
To tune their behavior to a specific environment, machine learning models are
parameterized. Because models might include many parameters, finding the ideal
combination is a search problem. In this section, we have used scikit-learn to adjust
the SVM classification model’s parameters.
First we tried applying k-fold cross validation with k = 5, which gave an accuracy
of 96%.
Results are shown in Fig. 11.
1. Type of kernel.
2. C and gamma parameters.
It is very important to pick the right kernel type, because if the transformation is
incorrect the model’s outcomes can be much less accurate. We should always check
whether our data is linear and, if so, utilize a linear SVM (linear kernel). By default the
kernel type of SVM is set to RBF (radial basis function) and the C value is set to 1.
The scikit-learn library provides the following techniques for hyperparameter tuning:
1. GridSearchCV
GridSearchCV uses a dictionary to specify the parameters used to train a model:
the grid of parameters is defined as a dictionary, with the keys being the parameters
and the values being the settings to test. There is one shortcoming to this method [12]:
GridSearchCV goes through all of the candidate hyperparameter combinations,
making grid search computationally quite expensive.
2. RandomizedSearchCV
RandomizedSearchCV runs through only a predetermined number of hyperparameter
settings, thereby overcoming the shortcoming of GridSearchCV. It moves randomly
throughout the grid to discover an optimal collection of hyperparameters, which
eliminates the need for extra computation [19]. Through this process we got an
accuracy of 98%, with the parameters that suited our model best being C = 0.1,
gamma = 0.001, and a linear kernel. The result can be seen in Fig. 11.
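A minimal GridSearchCV sketch, assuming scikit-learn is installed; the candidate grid below is a hypothetical choice built around the values reported above, so the selected parameters may differ:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# hypothetical candidate grid: every combination is tried with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1],
              "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)   # the winning combination
print(search.best_score_)    # its mean cross-validated accuracy
```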
We can successfully classify malignant and benign breast cancer tumors with the
support vector machine methodology. Hyperparameter tuning can give a considerable
improvement in the accuracy of the model, and the performance of the SVM improves
over the default SVC when all the parameters are scaled so that the mean is zero and
the standard deviation is set at one (Figs. 12 and 13).
Now before we jump to the comparison, we first made the process a little more
convenient by creating machine learning pipelines. In a machine learning project,
there are regular workflows that we should automate, and pipelines in the scikit-learn
library in Python assist in explicitly defining and automating these operations.
1. Pipelines are useful for resolving issues like data leaks in your test harness.
2. Pipeline is a Python scikit-learn facility for automating workflows of machine
learning.
3. Enabling a linear succession of data transformations to be chained together into a pipeline.
1. Create a validation dataset and separate it from the rest of the data.
2. Set up a ten-fold cross validation for the test system.
3. Use the classification models of scikit-learn for comparison against each other. The six
classification models that we used are as follows:
K-Nearest Neighbor (K-NN): a new case is placed in the category that has the highest
number of similar existing cases [16]. The K-NN approach saves all the available
data and then classifies new data points on the basis of their similarity with the
current data. This means that new data can be put into a precise group right away
with the K-NN method.
Decision Tree Classifier (CART): the decision tree is a supervised learning
approach that can be used for both regression and categorization problems, although
it is most often used for categorization [17]. In this tree-structured classifier, internal
nodes represent dataset attributes, branches represent decision rules, and each leaf
node represents an outcome.
Gaussian Naive Bayes Classifier: “Gaussian Naive Bayes” is a Naive Bayes
variant that supports continuous input following a Gaussian (normal) distribution.
The Bayes theorem is the foundation of the Naive Bayes categorization
methods, which are supervised machine learning classification algorithms. It is a
straightforward categorization method that works efficiently and becomes
advantageous when the input dimensionality is substantial [18]. The Naive Bayes
classifier may also be of use in solving complex categorization issues.
These results indicate that the SVM accuracy has increased and is now the highest
achieved so far. SVM, LDA, and LR have shown good results, and with tuning they
can produce even better results (Fig. 14).
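A sketch of such a pipeline-based comparison, assuming scikit-learn is installed; for brevity only three of the six models are shown, and the exact scores will vary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "LR": LogisticRegression(max_iter=5000),
    "SVM": SVC(),
    "CART": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # scaling inside the pipeline is refit on each CV fold, preventing data leaks
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    results[name] = cross_val_score(pipe, X, y, cv=10).mean()
    print(name, round(results[name], 3))
```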
Now, we have tuned the parameters for the SVC and K-NN classifiers using
GridSearchCV, which receives predetermined hyperparameter values: a dictionary is
created in which every hyperparameter is listed along with its possible values.
SVM: on hyperparameter tuning the SVC, we got the following result and the
best-suited parameters.
K-NN: in the K-NN classifier, the parameter “k” and the distance metric function can
be tuned; by this process we got the following results and best-suited parameters.
Hence, from this we have gathered that the SVM performs better, and we
decided that the SVM is the best-suited model for this machine learning problem.
5 Result
We have finalized that SVM is the best performing model for our classification
problem; after running the model individually, we got impressive results on the test
dataset. The result shows that SVM achieves an accuracy of 97%, the best among
the compared models, and hence we have used it.
Here we conclude that, with the help of hyperparameter tuning and data
standardization, SVM best classifies the malignant and benign cancers from the given
data among the algorithms we considered. In this paper, we have created an optimized
model based on the support vector machine (SVM) and compared its performance
with five other methodologies: logistic regression (LR), linear discriminant
analysis (LDA), K-nearest neighbor classification (K-NN), decision tree classifier
(CART), and Gaussian Naive Bayes classifier (GaussianNB). SVM has proved its
superiority over the others, as we can see in its evaluation metrics. In the future, there
are several directions in which this research can lead. One of them is a change in the
type of database: in this paper, we have worked on the Wisconsin breast cancer biopsy
dataset, which is derived from biopsy results of potential breast cancer patients, and
different results could be obtained from a dataset of X-ray data. Furthermore, more
complex algorithms can be designed with the help of deep learning methods, and
larger datasets can be used to train the model.
References
1. Khourdifi Y, Bahaj M (2018) Applying best machine learning algorithms for breast cancer
prediction and classification. In: 2018 international conference on electronics, control, opti-
mization and computer science (ICECOCS), pp 1—5. https://doi.org/10.1109/ICECOCS.2018.
8610632
2. Bharat A, Pooja N, Reddy RA (2018) Using Machine Learning algorithms for breast can-
cer risk prediction and diagnosis. In: 2018 3rd international conference on circuits, control,
communication and computing (I4C), pp 1–4. https://doi.org/10.1109/CIMCA.2018.8739696
3. Osman AH (2017) An enhanced breast cancer diagnosis scheme based on two-step-
SVM technique. Int J Adv Comput Sci Appl 8(4):158–165
4. Wolberg WH, General Surgery Department, University of Wisconsin Clinical Sciences
Center. Breast cancer Wisconsin (diagnostic) data set. Retrieved from https://www.kaggle.
com/uciml/breast-cancer-wisconsin-data
5. Gupta T (2021) Machine learning—Geeksforgeeks. https://www.geeksforgeeks.org/machine-
learning/
6. Support Vector Machine (SVM) Algorithm -Javatpoint. https://www.javatpoint.com/machine-
learning-support-vector-machine-algorithm
7. Gandhi R (2018) SVM Introduction to Machine Learning algorithms Rohit Gandhi—
Datascience. https://towardsdatascience.com/support-vector-machine-introduction-to-
machine-learning-algorithms-934a444fca47
8. Unknown (2020) Classification: ROC curve and AUC—google developer website. https://
developers.google.com/machine-learning/crash-course/classification/roc-and-auc
9. Narkhede S (2018) Understanding AUC—ROC curve—towards data science. https://
towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
10. Czako Z (2018) SVM and Kernel SVM by Czako zoltan—towards data science. https://
towardsdatascience.com/svm-and-kernel-svm-fed02bef1200
11. Singh T (2020) Hyperparameter tuning—Geeksforgeeks. https://www.geeksforgeeks.org/
hyperparameter-tuning/
12. Tyagikartik (2021) SVM hyperparameter tuning using GridSearchCV—Geeksforgeeks.
https://www.geeksforgeeks.org/svm-hyperparameter-tuning-using-gridsearchcv-
ml/
13. Shah T (2017) About train, validation and test sets in machine learning—towards data science.
https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7
14. Pant A (2019) Introduction to logistic regression—towards data science. https://
towardsdatascience.com/introduction-to-logistic-regression-66248243c148
15. Raman_257 (2021) ML—linear discriminant analysis—Geeksforgeeks. https://www.
geeksforgeeks.org/ml-linear-discriminant-analysis/
16. Unknown KNN algorithm for machine learning—Javatpoint. https://www.javatpoint.com/k-
nearest-neighbor-algorithm-for-machine-learning
17. Majumder P (2020) Gaussian Naive Bayes, machine learning—Opengenus. https://iq.
opengenus.org/gaussian-naive-bayes/
18. Unknown Decision tree classification algorithm—Javatpoint. https://www.javatpoint.com/
machine-learning-decision-tree-classification-algorithm
19. Hussain M (2020) Hyperparameter tuning with GridSearchCV—MyGreatlearning. https://
www.mygreatlearning.com/blog/gridsearchcv/
20. Gardezi SJS, Elazab A, Lei B (2019) Wang T breast cancer detection and diagnosis using
mammographic data: systematic review. J Med Internet Res 21(7):e14464
21. Chanda PB, Sarkar SK (2018) Detection and classification technique of breast cancer using
multi Kernal SVM classifier approach. In: 2018 IEEE applied signal processing conference
(ASPCON), pp 320–325. https://doi.org/10.1109/ASPCON.2018.8748810
22. Rejani YI, Dr. Selvi ST (2009) Early Detection of breast cancer using SVM classifier technique.
Int J Comput Sci Eng 1
Vision Transformers for Breast Cancer
Classification from Thermal Images
1 Introduction
Breast cancer is one of the most prominent causes of death among women, and the number of cases is rising worldwide [1], making annual breast cancer screening necessary for early detection and for reducing the mortality rate. India ranks third in cancer cases, alongside China and the United States, and its incidence is increasing by 4.5–5% every year. In India, the death rate for breast cancer is 1.7 times higher than maternal mortality [2]. Thermal imaging is a physiological imaging technique used as an adjunctive modality and has become an appreciable area of research. Breast thermography is non-contact and non-invasive, as it uses no radiation and avoids painful breast compression [3]. Expert radiologists and pathologists are required to diagnose breast cancer, which is time-consuming, and they draw their conclusions from various observed visual features, which may vary from person to person. Computer-aided diagnosis (CAD) systems can support experts in reaching decisions automatically. These techniques can also minimize inter-observer variations and make the diagnostic process replicable. Deep learning algorithms have performed much the same as human experts on object detection and image classification tasks [4]. The convolutional neural network (CNN) is the most widely used deep learning model for capturing complex discriminative features among image classes. Different CNN architectures such as VGG-16 [5] have presented exceptional results in the past few years on the very large ImageNet dataset. CNNs have also been applied to medical images to produce state-of-the-art results.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 177
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_13
178 L. S. Garia and M. Hariharan
The transformer architecture [6] is already the dominant model in natural language processing (NLP). Inspired by the progress of self-attention-based deep neural networks, namely Transformer models in NLP, the Vision Transformer (ViT) [7] architecture was introduced for image classification. During training, the input image is split into patches and every embedded patch is treated like a word in NLP. ViT uses self-attention modules to learn the relations between these embedded patches. Herein, we apply Transformers to thermal image analysis and examine the potential of self-attention-based architectures for the classification of breast thermal images (thermograms). Specifically, we inspect the ViT base model with different patch sizes on the basis of their performance when fine-tuned for our specific task on the thermogram dataset. The results show the high potential of ViT models in breast thermal image classification. We believe that this is the first study to explore the performance of ViT architectures on the classification of breast thermograms.
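The patch-splitting step that ViT applies to its input can be sketched with NumPy. This is an illustrative sketch only: the image and patch sizes are examples, and the function name is ours, not part of any ViT reference implementation.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches,
    each flattened into a vector as done before patch embedding in a ViT."""
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image must be divisible by patch size"
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C) -> (N, p*p*C)
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)
    return patches

# a 224 x 224 RGB image split into 32 x 32 patches gives 49 patch vectors
img = np.zeros((224, 224, 3))
print(image_to_patches(img, 32).shape)  # (49, 3072)
```

Each of the 49 flattened vectors would then be linearly projected to the embedding dimension and fed to the transformer encoder as a token.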
2 Related Work
This section presents a review of some of the significant works on breast cancer
detection/diagnosis using thermal images, image processing, machine learning, and
deep learning.
Zuluaga-Gomez et al. [8] studied the impact of data pre-processing, data augmentation, and database size on a proposed set of CNN models. The Tree Parzen Estimator was used for fine-tuning the CNN hyperparameters. A set of 57 patients from the DMR-IR database [9] was used, and the CNN models obtained 92% accuracy and F1-score, outperforming various state-of-the-art architectures, namely Inception, ResNet50, and SeResNet50. The results also confirmed that a CNN model using data-augmentation techniques attained performance metrics similar to those of a CNN using a 50% bigger database.
Kakileti et al. [10] explored several CNN architectures for semantic segmentation. Hotspots in the thermal images were detected using approaches ranging from naive patch-based classifiers to several variations of the encoder-decoder architecture. Data from 180 subjects (a private database) were used, and the results revealed that, in terms of accuracy, encoder-decoder architectures performed better than patch-based classifiers in spite of the small thermal image datasets.
Torres-Galvan et al. [11] used the DMR-IR database to classify breast thermograms using transfer learning. The pre-trained architectures GoogLeNet, AlexNet, ResNet50, ResNet101, InceptionV3, VGG-16, and VGG-19 were used. Images were resized to a fixed size of 227 × 227 or 224 × 224 pixels, and a database of 173 patients was randomly split into 70% for training and 30% for validation. A learning rate of 1 × 10^-4 and 5 epochs were used for all deep neural networks. VGG-16 performed best, with a balanced accuracy of 91.18%, specificity of 82.35%, and sensitivity of 100%.
Fernández-Ovies et al. [12] used 216 patients (41 sick and 175 healthy) from the DMR-IR dataset (dynamic thermograms), from which 500 healthy and 500 sick breast thermal images were drawn, allocating 80% for training and testing (80–20 split) and 20% for validation. Various CNN models such as ResNet18, ResNet34, ResNet50, ResNet152, VGG-16, and VGG-19 were used. The results showed that ResNet50 and ResNet34 produced the highest validation accuracy of 100% for breast cancer detection.
Mishra et al. [13] used a DCNN on 160 abnormal and 521 healthy breast thermograms from the DMR-IR database. After conversion from color to grayscale, the thermal images were pre-processed, segmented, and then classified using a DCNN with the SGD optimizer and a learning rate of 0.01. An accuracy of 95.8% was achieved, with specificity and sensitivity of 76.3% and 99.5%, respectively.
From the previous works, it can be observed that researchers have explored and applied different deep convolutional neural network models for the classification of normal and abnormal breast thermograms, using both self-collected breast thermal images and images from the DMR-IR database. The number of images used also differed between studies, and the reported accuracies lie between 90 and 100%. Most of the self-collected datasets are not available for research purposes, and the current public datasets consist of only two classes of breast thermograms (healthy/normal and abnormal/sick). Though considerable research has been published using deep learning models, researchers continue to work on improving the efficiency of the algorithms, reducing the time complexity of the deep learning models, and improving the detection accuracy. In this paper, a Vision Transformer (ViT)-based solution is proposed for the classification of normal and abnormal breast thermograms.
3 Vision Transformer
An input image is chosen and patching is performed on it. Figure 1 shows the splitting of an image into several 32 × 32 patches. The network structure of the Vision Transformer is shown in Fig. 2.
Breast thermograms from the Database for Mastology Research (DMR) [9] are used in this work. Thermograms of healthy and sick patients were acquired using a FLIR SC-620 IR camera with a resolution of 640 × 480 pixels under static and dynamic protocols. The dataset consists of images of individuals aged between 29 and 85 years. In this work, static thermograms are used, as tabulated. A 90–10 data split is used for training and testing purposes (Fig. 3 and Table 2).
In order to measure the performance of the ViT, six performance indices are
measured as follows:
Accuracy (ACC) = (TP + TN) / (TP + FP + TN + FN) (1)

Sensitivity/Recall (SE) = TP / (TP + FN) (2)
Fig. 3 Number of thermograms
Specificity (SP) = TN / (TN + FP) (3)

Positive Predictive Value (PPV)/Precision (PRE) = TP / (TP + FP) (4)

Negative Predictive Value (NPV) = TN / (TN + FN) (5)

F1-score (F1) = 2 · (PRE · SE) / (PRE + SE) (6)
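The six indices of Eqs. (1)–(6) follow directly from the four confusion-matrix counts. A minimal helper, where the counts in the example call are made up purely for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the six indices of Eqs. (1)-(6) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)   # Eq. (1), accuracy
    se = tp / (tp + fn)                     # Eq. (2), sensitivity/recall
    sp = tn / (tn + fp)                     # Eq. (3), specificity
    ppv = tp / (tp + fp)                    # Eq. (4), precision
    npv = tn / (tn + fn)                    # Eq. (5), negative predictive value
    f1 = 2 * ppv * se / (ppv + se)          # Eq. (6), F1-score
    return {"ACC": acc, "SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "F1": f1}

print(classification_metrics(tp=8, fp=1, tn=9, fn=2))
```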
ViT-Base has 12 encoder layers, each with 12 heads for multi-head attention; the network has an embedding size of 768 and an MLP size of 3072. In the present study, 16 × 16 and 32 × 32 image patches are given as input to ViT-B (ViT-B/16 and ViT-B/32). The Adam optimizer with a learning rate of 1e-2 is used for training, and 10% of the test data is used for validation purposes.
A confusion matrix is drawn for each classifier (Fig. 4). In the present study, the
positive and negative cases were allotted to cancerous and non-cancerous patients,
respectively. Hence, TP and TN symbolize the number of correctly diagnosed
cancerous and non-cancerous patients, respectively, while FP and FN represent the number of patients incorrectly diagnosed as cancerous and as non-cancerous, respectively. The results are tabulated in Table 3.
Further, the area under the ROC curve (AUC) [14] is calculated to show the overall
performance of the ViTs (Fig. 5). The F1-score is reported because both False Negatives and False Positives are important for this task [15].
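The AUC can be computed without plotting the ROC curve by using its rank (Mann–Whitney) formulation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. A small library-free sketch; the labels and scores below are illustrative:

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs that the scorer
    orders correctly; ties count as half (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, while 1.0 means every positive outscores every negative.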
The proposed ViT model yielded a maximum accuracy of 95.78% for 32 × 32
patches and 94.73% for 16 × 16 patches using the distribution of 90% training and
10% testing. It is also observed from Fig. 5 that the proposed ViT model yielded
a maximum AUC of 0.957 for 32 × 32 patches and 0.946 for 16 × 16 patches
using the distribution of 90% training and 10% testing. The results of the proposed
model cannot be compared directly with existing works in the literature due to differences in the numbers of images/subjects, the deep learning models, the use of transfer learning versus training from scratch, and the acquisition protocols (dynamic/static). Some of the significant works published in the literature using the DMR dataset with different deep learning models are reported in Table 4. This
table clearly indicates that the CNN models proposed by researchers achieved accuracies between 91.8 and 100%, either using transfer learning [11] or training from scratch, with models including ResNet18, ResNet34, ResNet50, SeResNet50, VGG-16, and Inception. The proposed ViT model yielded a maximum accuracy of 95.78% and a maximum AUC of 0.957 for 32 × 32 patches using the distribution of 90% training and 10% testing. Considering that ViT models demand a large-scale dataset for training and that the size of the DMR data is relatively small, 90% of the images were used for training and the remaining 10% for testing in this work.
Medical images differ from natural images in that they natively have higher resolutions along with smaller regions of interest. As a result, neural network architectures that perform well on natural images may not be appropriate for medical image analysis.
The Vision Transformer model works effectively, though it may require more data to classify each class correctly. The self-attention mechanism is very powerful not only in the field of NLP but also in computer vision. Splitting the image into many patches helps the model learn the image better; when these patches are sent into the transformer encoder, the self-attention mechanism is applied. It looks for the most significant features of each class and predicts the class of a new input image based on those significant parts. The outcomes are compared with the corresponding performance of CNNs and demonstrate that attention-based ViT models achieve performance comparable to CNN methods (95.78% accuracy).
Improving the performance of Vision Transformer is a challenging task. This
work can also be extended and modified for low-resolution breast thermal images
captured using a mobile camera. The results presented in this analysis reveal new
ways to utilize self-attention-based architectures as a substitute for CNNs in different
medical image analysis tasks.
References
1. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D (2011) Global cancer statistics.
Cancer J Clin 61(2):69–90
2. Pandey N (2018) [World Cancer Day] Why does India have the third highest number of cancer
cases among women? https://yourstory.com/2018/02/world-cancer-day-why-does-india-have-
the-third-highest-number-ofcancer-cases-among-women/amp
3. Borchartt TB, Conci A, de Lima RCF, Resmini R, Sanchez A (2013) Breast thermography
from an image processing view point: a survey. Int. J Signal Process 93(10):2785–2803
4. Yamashita R, Nishio M, Do RKG, Togashi K (2018) Convolutional neural networks: an
overview and application in radiology. Insights Imag 9(4):611–629
5. Zhang X, Zou J, He K, Sun J (2015) Accelerating very deep convolutional networks for
classification and detection, arXiv:1505.06798 [cs], May 2015
6. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. In: Advances in neural information processing systems, pp
5998–6008
7. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M
et al (2020) An image is worth 16 × 16 words: transformers for image recognition at scale.
arXiv preprint arXiv:2010.11929
8. Zuluaga-Gomez J, Al Masry Z, Benaggoune K, Meraghni S, Zerhouni N (2019) A CNN-based
methodology for breast cancer diagnosis using thermal images. http://arxiv.org/abs/1910.13757
9. Silva LF, Saade DCM, Sequeiros GO, Silva AC, Paiva AC, Bravo RS, Conci A (2014) A new
database for breast research with infrared image. J Med Imag Health Inform 4(1):92–100
10. Kakileti ST, Dalmia A, Manjunath G (2019) Exploring deep learning networks for tumor
segmentation in infrared images. Quant Infr Thermogr J 17(3):153–168. https://doi.org/10.1080/17686733.2019.1619355
11. Torres-Galvan JC, Guevara E, Gonzalez FJ (2019) Comparison of deep learning architectures
for pre-screening of breast cancer thermograms. In: Proceedings of Photonics North (PN), pp
2–3, May 2019. https://doi.org/10.1109/PN.2019.8819587
12. Fernández-ovies FJ, De Andrés EJ (2019) Detection of breast cancer using infrared thermog-
raphy and deep neural networks. In: Bioinformatics and biomedical engineering. Springer,
Berlin, Germany. https://doi.org/10.1007/978-3-030-17935-9
13. Mishra S, Prakash A, Roy SK, Sharan P, Mathur N (2020) Breast cancer detection using thermal
images and deep learning. In: Proceedings of 7th international conference on computing for
sustainable global development (INDIACom), pp 211–216, March 2020
14. Van Erkel AR, Pattynama PMT (1998) Receiver operating characteristic (ROC) analysis: basic
principles and applications in radiology. Eur J Radiol 27:88–94
15. Sasaki Y (2007) The truth of the F-measure
An Improved Fourier Transformation
Method for Single-Sample Ear
Recognition
1 Introduction
Biometrics [1] are physical or behavioral characteristics that can uniquely identify a human being. Physical biometrics include the face, eye, retina, ear, fingerprint, palmprint, periocular region, footprint, etc. Behavioral biometrics include voice matching, signature, handwriting, etc. Biometrics have found several applications [1] in diverse areas such as ID cards, surveillance, authentication, security in banks and airports, corpse identification, etc. The ear [2] is a relatively recent biometric which has drawn the attention of the research community. It possesses certain characteristics which distinguish it from other biometrics: for example, it requires less information than the face, and when a person stands in profile to the camera, where face recognition does not perform satisfactorily, the ear remains visible. Further, no user cooperation is required for ear recognition, as is required by other biometrics such as the iris, fingerprint, etc.
The ear is one of those biometrics whose permanence is very high. Unlike the face, which changes considerably throughout life, the ear changes very little. Further, it is fairly collectible, and in the post-COVID scenario it can be considered a safer biometric, since the face and hands are often covered with masks or gloves. It can be more acceptable if the user is not asked for a large number of samples. In real-world scenarios, the problem of ear recognition becomes more complex when only a single training sample is available. Under these circumstances, the One Sample Per Person (OSPP) [3] methodology is used. This methodology has been studied by the research community across problem domains such as face recognition [3, 4], ear recognition [5], and other biometrics. OSPP is popular because dataset preparation, specifically the collection of samples from the source, is
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 187
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_14
188 A. R. Srivastava and N. Kumar
very easy. However, recognition becomes more complex due to the lack of samples; hence, the model cannot be trained in the best possible manner.
There are several methods suggested in literature by researchers for addressing
OSPP for different biometric traits. Some of the popular methods include Principal
Component Analysis (PCA), Kernel PCA, Wavelet transformation, Fourier trans-
formation with frequency component masking, and wavelet transformation using
subbands. Here, we propose an improved Fourier transform-based method for single-
sample ear recognition. The biometric image samples are pre-processed using
a morphological operation called opening. This is followed by the selection of high-
frequency components using Fourier transformation and then PCA is used for feature
extraction. Finally, SVM is used as a classifier. The performance of the proposed
method is evaluated on the publicly available Indian Institute of Technology-Delhi
(IIT-D) [6] ear dataset. Sample images from the dataset are shown in Fig. 1.
The rest of the paper is organized as follows: Sect. 2 presents the related work
in single-sample ear recognition. Section 3 details the proposed improved Fourier
transform-based method. Experimental setup and results are given in Sect. 4. Finally,
the conclusion and future work are given in Sect. 5.
2 Related Work
The PCA method was used for ear recognition by Zhang and Mu [9]. This method
extracted local as well as global features. Linear Support Vector Machine (SVM)
was used for classification. Later in 2009, Long and Chun [10] proposed using
wavelet transformations for ear recognition. The proposed method performed better than the previously implemented PCA and Linear Discriminant Analysis (LDA) [11]. In 2011,
Zhou et al. [12] used color Scale-Invariant Feature Transform (SIFT) method for
representing the local features. In the same year, Wang and Yan [13] employed an
ensemble of local binary pattern (LBP), direct LDA (linear discriminant analysis),
and wavelet transformation methods for recognizing ears. The method was able to
give accuracy up to 90% depending upon the feature dimension given as input. A
robust method for ear recognition was introduced in 2012 by Yuan et al. [14]. They
proposed an ensemble method of PCA, LDA, and random projection for feature
extraction and a sparse classifier for classification. The proposed method was able to recognize partially occluded image samples. In 2014, Taertulakarn et al. [15] proposed ear
recognition based on Gaussian curvature-based geometric invariance. The method
was particularly robust against geometric transformations. In the same year, an advanced form of wavelet transformation along with the discrete cosine transformation was introduced by Ying et al. [16]. The method used a weighted distance which highlighted the contribution of low-frequency components in an image.
In 2016, Tian and Mu [17] used a deep neural network for ear recognition. The proposed method also took advantage of CUDA cores for training the model. The final model was quite accurate on ear images occluded by hair, pins, and glasses. In the same year, the One Sample Per Person (OSPP) problem for the ear biometric was tackled by Chen and Mu [18] using an adaptive multi-keypoint descriptor sparse representation classifier. This method was occlusion-resistant and better than contemporary methods, although the recognition time was somewhat high, in the range of 10–12 s. In
2017, Emersic et al. [8] presented an extensive survey of ear recognition methods. In that survey, recognition approaches were divided according to the technique used for feature extraction, viz. holistic, geometric, local, and hybrid. Holistic approaches describe the ear with global properties. In this approach, the
ear sample is analyzed as a whole and local variations are not taken into considera-
tion. Methods using geometrical characteristics of ear for feature representation are
known as geometric approaches. Geometric characteristics of ear include location of
specific ear parts, shape of the ear, etc. Local approaches describe local parts or local
appearance of the ear and use these features for the purpose of recognition. Hybrid
approaches involve those techniques which cannot be categorized into other cate-
gories or are an ensemble of different category methods. The paper also introduced
a very diverse ear dataset called Annotated Web Ears (AWE) which has been used
in this paper also.
In 2018, deep transfer learning was proposed as a deep learning technique for ear biometric recognition by Ali et al. [19] over a pretrained CNN model called AlexNet. The methodology used the Stochastic Gradient Descent with Momentum (SGDM) optimizer with a momentum of 0.9. Another deep learning-based method was suggested in 2019 by Petaitiemthong
et al. [20]. In this method, a CNN architecture was employed for frontal-facing
ear recognition. It was more acceptable due to the fact that creating a face dataset simultaneously creates an ear dataset. In the same year, Zarachoff et al. [21]
proposed a variation of wavelet transformation and successive PCA for single-sample ear recognition. In 2020, Omara et al. [22] introduced a variation of Support
Vector Machine (SVM) for ear biometric recognition called “Learning distance
Metric via DAG Support Vector Machine.” In 2021, deep unsupervised active
learning methodology was proposed by Khaldi et al. [23]. The labels were predicted by the model itself, as the approach was unsupervised. A conditional deep convolutional generative adversarial network (cDCGAN) was used to colorize the gray-scale images, which further increased the recognition accuracy.
Principal component analysis, or PCA [11], is a method used to reduce the dimen-
sions of samples. It extracts those features which contain more variation in the inten-
sity values and have higher contribution in image details. Reducing the number of
variables of a dataset naturally comes at the expense of accuracy, but the trick in
dimensionality reduction is to trade a little accuracy for simplicity: smaller datasets are easier to explore and visualize, and they make data analysis much easier and faster for machine learning algorithms, with no extraneous variables to process.
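The projection PCA performs can be sketched in a few lines of NumPy via the SVD of the centered data matrix. The shapes below mirror the ear-image setting (125 samples of 9000 features) only for illustration; the random data and function name are ours:

```python
import numpy as np

def pca_transform(X, n_components):
    """Project samples (rows of X) onto the top principal components,
    i.e. the directions of greatest variance in the centered data."""
    Xc = X - X.mean(axis=0)                 # center each feature
    # rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(125, 9000))            # e.g. 125 flattened ear images
Z = pca_transform(X, 25)
print(Z.shape)  # (125, 25)
```

Because the singular values are returned in descending order, the first projected column always carries at least as much variance as the second, and so on.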
PCA is a linear method, which means that it can only be applied effectively to datasets which are linearly separable; if used on non-linear datasets, there is a higher chance of obtaining inconsistent results. Kernel PCA [9] uses a kernel function to project the
dataset into a higher dimensional feature space, where the data is linearly separable.
Hence, using the kernel, the original linear operations of PCA are performed in a
reproducing kernel Hilbert space. The most frequently used kernels include the cosine, linear, polynomial, radial basis function (RBF), and sigmoid kernels, as well as pre-computed kernels. Depending on the dataset to which they are applied, different kernels may project with different efficiency; thus, in the case of KPCA, the accuracy depends largely on the kernel used.
In the case of the ear biometric, most of the information is contained in edges; in general too, the edge is the most important high-frequency information in a digital image. Traditional filters not only eliminate noise effectively but also blur the image, and blurring heavily degrades the edges, so noise reduction becomes too costly in terms of the information traded off. It is a top priority to retain the edges of the image when reducing the noise in an image. The wavelet analysis [10, 21] method is a
time–frequency analysis method which selects the appropriate adaptive frequency
band on the basis of the images’ frequency component. Then the frequency band
matches the spectrum which improves the time–frequency resolution. The wavelet
transformation method has an obvious effect on the removal of noise in the signal. It
also falls under the category of “local approaches”. It preserves the locality of data
while conversion from spatial/time to frequency domain. Hence, further operations
can be applied in the frequency domain itself.
Fourier Transform [24] is a mathematical process that represents the image accord-
ing to its frequency content. It is used for analyzing the signals. It involves the
decomposition of the image components in the frequency domain in terms of infinite
sinusoidal or cosinusoidal components. For a function of time, Fourier transform is
a complex-valued function of frequency, whose magnitude gives the amount of that
frequency present in the original function, and whose argument is the phase offset
of the basic periodic wave in that frequency.
Unlike wavelet transformation which was a “local” approach, Fourier is a “holis-
tic” approach. While converting from time/spatial domain to frequency domain, the
locality of data is not preserved. Hence, data at each pixel in the resulting frequency
map represents the components of the whole image in different proportions. Further
operations in frequency domain become tricky, but the same “holistic” nature of this
method increases its responsiveness towards other noise reduction techniques.
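As an illustration of working in the frequency domain, the following NumPy sketch transforms an image with the 2-D FFT, masks out all but the largest-magnitude Fourier coefficients, and reconstructs the image with the inverse transform. The 10% fraction, the selection-by-magnitude rule, and the image size are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def keep_top_components(image, keep_frac=0.10):
    """Zero out all but the largest-magnitude Fourier components,
    then reconstruct the image with the inverse FFT."""
    F = np.fft.fft2(image)
    mags = np.abs(F).ravel()
    # magnitude threshold below which components are discarded
    thresh = np.sort(mags)[::-1][int(keep_frac * mags.size) - 1]
    F_masked = np.where(np.abs(F) >= thresh, F, 0)
    # the imaginary residue is numerical noise for a real input image
    return np.real(np.fft.ifft2(F_masked))

rng = np.random.default_rng(1)
img = rng.normal(size=(50, 180))        # an ear image is 50 x 180 pixels
recon = keep_top_components(img, 0.10)
print(recon.shape)  # (50, 180)
```

Keeping all components (keep_frac = 1.0) reproduces the input exactly, which is a useful sanity check for the round trip through the frequency domain.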
3 Proposed Work
Image pre-processing [24] using morphological operations [25] plays a vital role
in improving the system performance. In morphology, two basic operations include
dilation and erosion. Dilation is performed by sliding a structuring element over the image and marking every pixel where the element overlaps the foreground. It is used to fill holes as well as connect broken areas; consequently, it widens the edges and increases the overall brightness of the image. Erosion, on the other hand, is the dual of the dilation operator. It
removes small anomalies as well as disconnects isthmus-like structures from images.
Other advanced morphological operators are based on these two operators. One
such operation is called opening, which is dilation of the eroded image (erosion followed by dilation).
The main aim of this operation is to remove small noise from the foreground. An
illustration of these morphological operations is shown in Fig. 2.
We can see that the erosion operation, although it effectively removes the hair noise from the background, also thickens the ear periphery edge in the foreground, which is an important descriptor of the ear. Dilation removes that descriptor altogether and also emphasizes the hair noise. The opening operation resembles the denoised ear most closely. The closing operation, although it removes the hair occlusion effectively, also removes the periphery descriptor. Hence, in the proposed method, opening is preferred as the denoising operation for the ear samples.
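The opening operation preferred above (erosion followed by dilation with the same structuring element) can be sketched for binary images in plain NumPy. The 3 × 3 square structuring element and the toy image are illustrative choices, not the paper's kernel:

```python
import numpy as np

def erode(img, k=3):
    """Binary erosion: a pixel survives only if its whole k x k
    neighbourhood is foreground (zero-padded at the borders)."""
    p = k // 2
    padded = np.pad(img, p, constant_values=0)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].all()
    return out

def dilate(img, k=3):
    """Binary dilation: a pixel fires if any k x k neighbour is foreground."""
    p = k // 2
    padded = np.pad(img, p, constant_values=0)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].any()
    return out

def opening(img, k=3):
    """Opening = dilation of the eroded image; removes small foreground
    noise while roughly preserving larger shapes."""
    return dilate(erode(img, k), k)

# an isolated pixel (noise) is wiped out while a solid block survives
img = np.zeros((10, 10), dtype=np.uint8)
img[1, 1] = 1            # speck of noise
img[4:9, 4:9] = 1        # 5 x 5 object
print(opening(img).sum())  # 25: the block is restored, the speck is gone
```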
A schematic representation of the proposed method is shown in Fig. 3. After
the pre-processing step, the Fourier transform is applied for finding low- and high-
frequency components in the biometric image. Due to the fact that low-frequency
components do not contribute much to the classification task, high-frequency compo-
nents are selected using masking operation. The frequency components are arranged
in descending order, and the top 10% of components are selected for image reconstruction using the Inverse Fourier transform (IFT) [24]. Subsequently, PCA is applied for feature extraction. Finally, a support vector machine classifier is used for classification, with the Radial Basis Function (RBF) [26] kernel chosen for its property of projecting data into an infinite-dimensional space.

Fig. 2 (left to right) Binary image; images after erosion, dilation, opening, and closing operations
Since the data are finite, an infinite dimension is not actually needed; nevertheless, the kernel guarantees a near-optimal hyperplane, since all data become linearly separable in infinite dimensions. It has two parameters: the regularization parameter (C) and the kernel coefficient (gamma). The regularization parameter controls the complexity of the decision boundary: a high value leads to overfitting, since the boundary becomes complex enough to miss no training point, while a low value makes the boundary nearly linear, so the model underfits even in the training phase. Gamma applies to the RBF kernel, which is based on the Gaussian function and has a classic inverted bell-shaped curve; gamma controls the width of the significant region of that curve. A high value of gamma makes the model too strict, with very low tolerance for deviation in the samples, and again leads to overfitting, while a low value makes the kernel too tolerant, so the decision boundary becomes overly smooth.
The proposed method improves the performance of the traditional Fourier transform-
based method significantly. Experimental results presented in the next section also
support this fact.
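The role of gamma is easiest to see from the RBF kernel itself, k(x, y) = exp(-gamma * ||x - y||^2): larger gamma shrinks the neighbourhood within which two samples count as similar. A tiny illustration with made-up points:

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: similarity decays with squared distance,
    at a rate controlled by gamma."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

x, y = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # squared distance = 2
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, y, gamma))
# higher gamma -> similarity for the same pair falls off faster
```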
4 Experimental Results
In this section, we compare the performance of the improved Fourier transformation method with peer methods, viz. PCA, KPCA, and wavelet transformation using sub-bands, in the single-sample ear recognition scenario. The experiments are performed on the publicly available IIT-Delhi ear dataset. This dataset contains a total of 493 images corresponding to 125 identities, each image of size 50 × 180. One image per person is used for training and the remaining are used for testing. Each identity contains at least three images. The training is repeated three times by
An Improved Fourier Transformation Method for Single-Sample Ear Recognition 193
selecting one image of each identity in each iteration and forming the test set of
the remaining images. The average classification accuracy of the three iterations is
reported in this paper.
Each ear image is converted into a flattened feature vector of size 9000 (= 50 × 180). Thus, the training data matrix has size 125 × 9000, whose sample-based covariance matrix is of size 125 × 125, so the maximum number of components after application of PCA is restricted to 125. The model is trained and tested over all possible numbers of principal components; the highest accuracy was obtained within the top 25 principal components in most cases. The average classification accuracy of the proposed and compared methods, with and without the morphological operation, is summarized in Table 1, and accuracy is plotted for all methods over all possible numbers of principal components in Fig. 4.
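The 125-component ceiling follows from the rank of the data matrix: with n samples, PCA can yield at most n informative components (n − 1 after mean-centering). A scaled-down NumPy sketch (5 hypothetical samples of 12 "pixels" instead of 125 × 9000) shows the same mechanism:

```python
import numpy as np

# Scaled-down illustration: 5 flattened "images" with 12 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))
Xc = X - X.mean(axis=0)  # mean-center, as PCA does

# The centered data matrix has rank at most n_samples - 1, so singular
# values beyond that count are numerically zero and PCA cannot produce
# more meaningful components than that.
singular_values = np.linalg.svd(Xc, compute_uv=False)
rank = int(np.sum(singular_values > 1e-10))
print(rank)  # 4  (= 5 samples - 1)
```

At full size (125 × 9000) the identical argument caps the component count at 125, as stated above.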
Next, we show the effect of the kernel size of the morphological operation on the classification accuracy, as shown in Fig. 5. It can be observed that a kernel of size 6 × 14 performs optimally for the proposed method, resulting in a classification accuracy of 87.22%. It can also be observed that traditional PCA features are not well suited to single-sample ear recognition. Further, the effect of the regularization and gamma parameters is shown in Fig. 6. The classification accuracy is largely unaffected over a wide range of both parameters, but decreases sharply when both take values above 250. The highest accuracy was obtained with classifier parameters C = 200 and gamma = 0.001.

Table 1 Average classification accuracy of proposed and compared methods with and without morphological preprocessing

Method       | Without opening           | With opening              | % Improvement
             | Accuracy (%) | Components | Accuracy (%) | Components |
PCA [11]     | 71.59        | 6          | 76.05        | 21         | 6.23
KPCA [9]     | 71.03        | 8          | 78.26        | 102        | 10.18
Wavelet [21] | 79.88        | 17         | 82.33        | 23         | 3.07
Proposed     | 74.15        | 18         | 87.22        | 22         | 17.63

Fig. 4 Average classification accuracy of various methods against number of principal components
Fig. 6 Average classification accuracy of various methods against classifier parameters
5 Conclusion
Ear recognition has emerged as an attractive research area in the past few decades.
This problem becomes more challenging when there is only one sample per person
available for training. In this paper, we have proposed an improved method based on
Fourier transformation for addressing single-sample ear recognition. Experimental
results show that the proposed method performs better than the traditional Fourier
transformation-based method. Further, it also performs better than several state-of-the-art methods. In future work, we will explore how deep learning-based methods can be exploited for single-sample ear recognition.
References
1. Jain A, Bolle R, Pankanti S (1996) Introduction to Biometrics. In: Jain AK, Bolle R, Pankanti
S (eds) Biometrics. Springer, Boston, MA. https://doi.org/10.1007/0-306-47044-6_1
2. Yuan L, Mu Z, Xu Z (2005) Using ear biometrics for personal recognition. In: Li SZ, Sun Z, Tan T, Pankanti S, Chollet G, Zhang D (eds) Advances in biometric person authentication. IWBRS 2005. Lecture notes in computer science, vol 3781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569947_28
3. Kumar N, Garg V (2017) Single sample face recognition in the last decade: a survey. Int J
Pattern Recogn Artif Intell. https://doi.org/10.1142/S0218001419560093
4. Zhao W, Chellappa R, Phillips P, Rosenfeld A (2003) Face recognition: a literature survey.
ACM Comput Surv 35(4):399–458. https://doi.org/10.1145/954339.954342
5. Kumar N (2020) A novel three phase approach for single sample ear recognition. In: Boony-
opakorn P, Meesad P, Sodsee S, Unger H (eds) Recent advances in information and commu-
nication technology 2019. Advances in intelligent systems and computing, vol 936. Springer,
Cham. https://doi.org/10.1007/978-3-030-19861-9_8
6. Kumar A, Wu C (2012) Automated human identification using ear imaging. Pattern Recogn
41(5)
7. AMI Ear database. https://ctim.ulpgc.es/research_works/ami_ear_database/
8. Emeršič Z, Štruc V, Peer P (2017) Ear recognition: more than a survey. Neurocomputing
255:26–39. ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2016.08.139. (https://www.
sciencedirect.com/science/article/pii/S092523121730543X)
9. Zhang H, Mu Z (2008) Ear recognition method based on fusion features of global and local
features. In: 2008 international conference on wavelet analysis and pattern recognition, pp
347–351. https://doi.org/10.1109/ICWAPR.2008.4635802
10. Long Z, Chun M (2009) Combining wavelet transform and orthogonal centroid algorithm for ear
recognition. In: 2009 2nd IEEE international conference on computer science and information
technology, pp 228–231. https://doi.org/10.1109/ICCSIT.2009.5234392
11. Kaçar Ü, Kirci M, Güneş E, İnan T (2015) A comparison of PCA, LDA and DCVA in ear biometrics classification using SVM. In: 2015 23rd signal processing and communications applications conference (SIU), pp 1260–1263. https://doi.org/10.1109/SIU.2015.7130067
12. Zhou J, Cadavid S, Mottaleb M (2011) Exploiting color SIFT features for 2D ear recognition.
In: 2011 18th IEEE international conference on image processing, pp 553–556. https://doi.org/
10.1109/ICIP.2011.6116405
13. Wang Z, Yan X (2011) Multi-scale feature extraction algorithm of ear image. In: 2011 inter-
national conference on electric information and control engineering, pp 528–531. https://doi.
org/10.1109/ICEICE.2011.5777641
14. Yuan L, Li C, Mu Z (2012) Ear recognition under partial occlusion based on sparse repre-
sentation. In: 2012 international conference on system science and engineering (ICSSE), pp
349–352. https://doi.org/10.1109/ICSSE.2012.6257205
15. Taertulakarn S, Tosranon P, Pintavirooj C (2014) Gaussian curvature-based geometric invari-
ance for ear recognition. In: The 7th 2014 biomedical engineering international conference, pp
1–4. https://doi.org/10.1109/BMEiCON.2014.7017396
16. Ying T, Debin Z, Baihuan Z (2014) Ear recognition based on weighted wavelet transform and
DCT. In: The 26th Chinese Control and decision conference (2014 CCDC), pp 4410–4414.
https://doi.org/10.1109/CCDC.2014.6852957
17. Tian L, Mu Z (2016) Ear recognition based on deep convolutional network. In: 2016 9th
international congress on image and signal processing, biomedical engineering and informatics
(CISP-BMEI), pp 437–441. https://doi.org/10.1109/CISP-BMEI.2016.7852751
18. Chen L, Mu Z (2016) Partial data ear recognition from one sample per person. IEEE Trans
Hum-Mach Syst 46(6):799–809. https://doi.org/10.1109/THMS.2016.2598763
19. Almisreb A, Jamil N, Din N (2018) Utilizing AlexNet deep transfer learning for ear recognition.
In: 2018 fourth international conference on information retrieval and knowledge management
(CAMP), pp 1–5. https://doi.org/10.1109/INFRKM.2018.8464769
20. Petaitiemthong N, Chuenpet P, Auephanwiriyakul S, Theera-Umpon N (2019) Person identi-
fication from ear images using convolutional neural networks. In: 2019 9th IEEE international
conference on control system, computing and engineering (ICCSCE), pp 148–151. https://doi.
org/10.1109/ICCSCE47578.2019.9068569
21. Zarachoff M, Sheikh-Akbari A, Monekosso D (2019) Single image ear recognition using
wavelet-based multi-band PCA. In: 2019 27th European signal processing conference
(EUSIPCO), pp 1–4. https://doi.org/10.23919/EUSIPCO.2019.8903090
22. Omara I, Ma G, Song E (2020) LDM-DAGSVM: learning distance metric via DAG support
vector machine for ear recognition problem. In: 2020 IEEE international joint conference on
biometrics (IJCB), pp 1–9. https://doi.org/10.1109/IJCB48548.2020.9304871
23. Khaldi Y, Benzaoui A, Ouahabi A, Jacques S, Ahmed A (2021) Ear recognition based on
deep unsupervised active learning. IEEE Sens J 21(18):20704–20713. https://doi.org/10.1109/
JSEN.2021.3100151
24. Gonzalez R, Woods R (2006) Digital image processing, 3rd edn. Prentice-Hall Inc., USA
25. Said M, Anuar K, Jambek A, Sulaiman N (2016) A study of image processing using morphological opening and closing processes. Int J Control Theory Appl 9:15–21
26. Masood A, Siddiqui AM, Saleem M (2007) A radial basis function for registration of local
features in images. In: Mery D, Rueda L (eds) Advances in image and video technology PSIVT,
Lecture notes in computer science, vol 4872. Springer, Berlin, Heidelberg. https://doi.org/10.
1007/978-3-540-77129-6_56
Driver Drowsiness Detection for Road
Safety Using Deep Learning
Parul Saini, Krishan Kumar, Shamal Kashid, Alok Negi, and Ashray Saini
1 Introduction
Drowsiness is a state of reduced attention. It is a normal and transitory stage that occurs as a person transitions from being awake to being asleep. Drowsiness can diminish a person's attention and raise the chance of an accident during activities such as driving a car, operating a crane, or working with heavy machinery, for instance in mining operations. While driving, several indicators of driver drowsiness can be detected, such as inability to keep the eyes open, frequent yawning, tilting the head forward, and so on. Various measures are used to determine the extent of driver drowsiness; physiological, behavioural, and vehicle-based metrics are the three types of assessment [1].
Drowsy driving has resulted in many accidents and deaths. In the United States alone, over 328,000 such crashes happen each year, and $109 billion is spent annually on accidents caused by sleepy driving [2]. To make their vehicles safer, many automobile manufacturers employ drowsy-driver detection technologies. Drowsiness detection systems such as driver alert and driver attention warning systems, from companies like Audi, BMW, and Bosch, are effective and trustworthy. There is, however, still room for improvement. Many different factors may be utilised to identify tiredness in driver drowsiness detection systems: behavioural data, physiological measurements, and vehicle-based data can all be used to detect drowsiness. Eye/face/head movement captured with a camera is considered behavioural data. Heart rate from an electrocardiogram (ECG), the electrooculogram (EOG), the electroencephalogram (EEG), and others are examples of physiological measures [2].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 197
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_15
198 P. Saini et al.
Steering wheel motion, vehicle speed, braking style, and lane position deviation are all used to provide vehicle-based data. Questionnaires and electrophysiological measurements can both be used to acquire data. However, getting meaningful feedback from a driver in a real-world driving situation is usually impossible or impracticable, and each of these methods has advantages and disadvantages. Physiological assessments are excessively intrusive, as they impair the driver's ability to drive safely. Vehicle-based measurements require hardware, which may be prohibitively expensive. Behavioural measurements, on the other hand, require minimal technology, are very cost-effective, and do not impair the driver's ability to drive. Because of these benefits, we use behavioural data as the foundation for the detection system proposed in this study [2].
Behavioural measures are employed to detect driver tiredness in our proposed technique. Various face detection techniques [3] are employed in the facial detection phase to identify the face regions in the input photos. Detecting human faces is simple for people but challenging in computer vision. Face detection algorithms are divided into two categories: feature-based and image-based. Image-based face detection algorithms have used statistical, neural network, and linear subspace methods. Different eye-area detection techniques are employed in the second stage to detect and extract the eye region from facial photographs. After finding the face regions, normalisation is performed in the preprocessing stage to reduce the effects of illumination; histogram equalisation can be used to adjust contrast discrepancies between face images.
Feature extraction is applied to the input eye-region images in the third stage. Appearance-based and geometric-based feature extraction are the two basic methods for extracting features from photos. The geometric approach extracts metrics relating to the shape and position of the eyes and brows. In contrast, appearance-based feature extraction uses techniques like PCA [4], Discrete Cosine Transform (DCT) [5], and Linear Discriminant Analysis (LDA) to extract skin appearance or facial features. These approaches can extract facial traits from the full face or from specific parts of it. Sleeping and non-sleeping images are then classified using the features extracted in the previous steps; a deep layered CNN was created to classify drowsy drivers [6].
2 Literature Review
This section describes drowsiness detection models [7, 8] and their limitations, along with some deep learning [9] approaches that can automatically learn features directly from the raw data.
Babaeian et al. [10] introduced a technique for measuring driver drowsiness that uses biomedical signal analysis based on machine learning, applied to heart rate variability (HRV) measured from an ECG signal. The wavelet transform (WT) and the short-time Fourier transform (STFT) are used in the procedure, after which support vector machine (SVM) and k-nearest neighbour (KNN) methods are used to extract and select the desired features [10]. The technique achieves an accuracy of 80% or more: the SVM approach reaches 83.23% with STFT and 87.5% with WT. Their findings indicate that the most accurate algorithm would lead to a lower number of drowsiness-related accidents.
Driver Drowsiness Detection for Road Safety Using Deep Learning 199
Jabbar et al. [11] proposed a model in which accuracy was improved by using facial landmarks detected by the camera and passed to a Convolutional Neural Network (CNN) to classify tiredness. With more than 88% accuracy for the category without glasses and more than 85% for the category night-without-glasses, the study demonstrated the ability to provide a lightweight alternative to larger classification models; more than 83% accuracy was attained on average across all categories. Furthermore, the proposed model has a significant reduction in size, complexity, and storage compared to the benchmark model, with a maximum size of 75 KB. The suggested CNN-based model may be used to create a high-accuracy, easy-to-use real-time driver drowsiness detection system for embedded systems and Android devices.
Saifuddin et al. [12] proposed a cascade-of-regressors method, in which each regressor estimates facial landmarks, to improve recognition under drastically varying illumination conditions. To learn nonlinear data patterns, the proposed method uses a deep convolutional neural network (DCNN). The challenges of varying illumination, blurring, and reflections for robust pupil detection are overcome by using batch normalisation to stabilise the distributions of internal activations during the training phase, reducing the impact of parameter initialization on the overall methodology. An accuracy of 98.97% was attained at a frame rate of 35 frames per second, which is greater than prior research results. Balam et al. [1] proposed a deep learning architecture based on a convolutional neural network (CNN) for automatic drowsiness detection using a single-channel EEG input. Subject-wise, cross-subject-wise, and combined-subjects-wise validations were used to improve the method's generalisation performance. The entire study is based on pre-recorded sleep-state EEG data from a benchmarked dataset. Compared with existing state-of-the-art drowsiness detection algorithms using single-channel EEG signals, the experimental results reveal a greater detection capability.
3.1 Dataset
The deep learning model developed here is trained on images obtained from an open-source driver drowsiness detection dataset. The dataset is classified into two categories, closed and open eyes, with 1234 training images and 218 test images belonging to the two classes. These images are preprocessed to create frames for this study.
for many types of neural networks. The second dense layer is used for the final output, with 2 nodes using the softmax activation function. In neural network models that predict a multinomial probability distribution, the softmax function is used as the activation function in the output layer; softmax is therefore used for multiclass classification problems involving class membership over more than two class labels.
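A minimal pure-Python sketch of the softmax function described above (not the authors' code; the logits are hypothetical) shows how it turns the output layer's raw scores into a probability distribution:

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max, exponentiate, normalize."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two output nodes, as in the final dense layer described in the text.
probs = softmax([2.0, 1.0])
print(probs)       # probabilities for the two classes
print(sum(probs))  # 1.0
```

The class with the largest logit keeps the largest probability, so taking the argmax of the softmax output yields the predicted class.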
Step 4: Transfer Learning. To detect driver tiredness using hybrid features, a multi-layer transfer learning strategy employing a convolutional neural network (CNN) was applied. A pre-trained VGG-16 model, a form of transfer learning, was employed to optimise the features.
The experiments were conducted on Google Colab using Python, and model training runs for a total of 50 epochs with a batch size of 16. An image data generator is used to randomize the training images for better model performance. Categorical cross-entropy loss and accuracy are used as metrics. A classifier's performance can be measured using a variety of indicators; overall accuracy, precision, recall, and F1 score are used in this paper and are given by Eqs. 1, 2, 3 and 4.
Accuracy is the number of correct predictions made as a ratio of all predictions made:

Acc = (TP + TN) / (TP + TN + FP + FN)    (1)
Precision analyzes the ability of the model to detect activeness when a subject is
actually active.
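As a hedged sketch of the evaluation metrics (Eq. 1 together with the standard definitions of precision, recall, and F1 referred to as Eqs. 2–4), computed from hypothetical confusion-matrix counts chosen only for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    precision = tp / (tp + fp)                    # correct positives / predicted positives
    recall = tp / (tp + fn)                       # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts, not taken from the paper's experiments.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=5, fn=10)
print(acc, prec, rec, f1)
```

Precision penalizes false positives while recall penalizes false negatives, which is why both are reported alongside overall accuracy.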
The proposed work achieved 97.81% training accuracy with a loss of 0.07, and 96.79% accuracy with a loss of 0.08. The accuracy and loss curves are shown in Figs. 2 and 3. The precision, recall, and F1 score are 97.22%, 96.33%, and 96.77%, respectively. The confusion matrix is shown in Fig. 4. Thus, according to this research and experimentation, the eyes are a crucial element in drowsiness classification in any setting.
5 Conclusion
References
1. Balam VP, Sameer VU, Chinara S (2021) Automated classification system for drowsiness detection using convolutional neural network and electroencephalogram. IET Intell Transp Syst 15(4):514–524
2. Dua M, Singla R, Raj S, Jangra A (2021) Deep CNN models-based ensemble approach to
driver drowsiness detection. Neural Comput Appl 33(8):3155–3168
3. Dang K, Sharma S (2017) Review and comparison of face detection algorithms. In: 2017 7th international conference on cloud computing, data science and engineering (Confluence). IEEE, pp 629–633
4. VenkataRamiReddy C, Kishore KK, Bhattacharyya D, Kim TH (2014) Multi-feature fusion based facial expression classification using DLBP and DCT. Int J Softw Eng Its Appl 8(9):55–68
5. Ramireddy CV, Kishore KK (2013) Facial expression classification using kernel based PCA with fused DCT and GWT features. In: 2013 IEEE international conference on computational intelligence and computing research. IEEE, pp 1–6
6. Chirra VR, Reddy SR, Kolli VKK (2019) Deep CNN: a machine learning approach for driver
drowsiness detection based on eye state. Rev d’Intelligence Artif 33(6):461–466
7. Altameem A, Kumar A, Poonia RC, Kumar S, Saudagar AKJ (2021) Early identification and
detection of driver drowsiness by hybrid machine learning. IEEE Access 9:162805–162819
Driver Drowsiness Detection for Road Safety Using Deep Learning 205
8. Esteves T, Pinto JR, Ferreira PM, Costa PA, Rodrigues LA, Antunes I, ... Rebelo A (2021)
AUTOMOTIVE: a case study on AUTOmatic multiMOdal drowsiness detecTIon for smart
VEhicles. IEEE Access 9:153678–153700
9. Negi A, Kumar K, Chauhan P, Rajput RS (2021) Deep neural architecture for face mask detection on simulated masked face dataset against COVID-19 pandemic. In: 2021 international conference on computing, communication, and intelligent systems (ICCCIS). IEEE, pp 595–600
10. Babaeian M, Mozumdar M (2019) Driver drowsiness detection algorithms using electrocardiogram data analysis. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). IEEE, pp 0001–0006
11. Jabbar R, Shinoy M, Kharbeche M, Al-Khalifa K, Krichen M, Barkaoui K (2020) Driver drowsiness detection model using convolutional neural networks techniques for android application. In: 2020 IEEE international conference on informatics, IoT, and enabling technologies (ICIoT). IEEE, pp 237–242
12. Saifuddin AFM, Mahayuddin ZR (2020) Robust drowsiness detection for vehicle driver using
deep convolutional neural network. Int J Adv Comput Sci Appl 11(10)
13. McDonald AD, Lee JD, Schwarz C, Brown TL (2018) A contextual and temporal algorithm
for driver drowsiness detection. Accid Anal Prev 113:25–37
14. Zhao L, Wang Z, Wang X, Liu Q (2018) Driver drowsiness detection using facial dynamic
fusion information and a DBN. IET Intel Transp Syst 12(2):127–133
15. Reddy B, Kim YH, Yun S, Seo C, Jang J (2017) Real-time driver drowsiness detection for embedded system using model compression of deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 121–128
16. Jabbar R, Al-Khalifa K, Kharbeche M, Alhajyaseen W, Jafari M, Jiang S (2018) Real-time
driver drowsiness detection for android application using deep neural networks techniques.
Procedia Comput Sci 130:400–407
17. Deng W, Ruoxue W (2019) Real-time driver-drowsiness detection system using facial features.
IEEE Access 7:118727–118738
Performance Evaluation of Different
Machine Learning Models in Crop
Selection
1 Introduction
Agriculture is the world’s primary source of food supply, and India is no exception.
The pressure for food demand is increasing with growing population and reducing
natural resources [1]. Hence, a more strategic approach using modern technologies such as artificial intelligence is the need of the hour. Machine learning, a subfield of artificial intelligence, has two categories: supervised and unsupervised learning. Supervised learning algorithms perform classification or regression tasks, while unsupervised learning clusters data based on similarity. ML tech-
niques are being applied in various applications such as cybersecurity, agriculture,
e-commerce, healthcare, and many more [2]. There are a variety of machine learning
techniques that can assist in developing predictive models to solve real-world prob-
lems. ML is used in agriculture to solve various issues, including proper crop selec-
tion, weather forecasting, crop disease detection, agricultural production forecasting,
and automated agricultural systems [3].
Traditional agricultural practices pose several challenges in terms of cost-effectiveness and resource utilization, including improper crop selection, declining crop yields, and inappropriate use of fertilizers and pesticides [4, 5]. Farmers and the agri-
culture community can benefit from machine learning technology to solve various
issues by increasing crop yields and profits. Soil quality, climatic conditions, and
water requirements play a vital role in crop selection for a specific piece of land [6].
In recent years, ML algorithms have been used in various aspects of agriculture, such as weather and yield prediction, disease detection, farmers' risk assessment, and many more [7].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 207
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_16
208 A. Bhola and P. Kumar
This paper implements six supervised machine learning models and analyzes their performance for crop selection. The performance of these algorithms is evaluated in terms of accuracy. The paper is organized as follows: Section 1 highlights the importance of ML in agriculture along with various agricultural issues. Section 2 discusses related work in the field of crop selection. The ML models used in this study are discussed in Section 3. Section 4 compares different crop prediction models on experimental data. Finally, Section 5 concludes the paper.
2 Related Work
Agriculture is the most important economic sector for any country, including India.
Machine learning in agriculture aids scientists and researchers in predicting crop
production, fertilizer, and pesticide use to boost crop yield and maximize resource
utilization. Classification and prediction approaches using weather and soil data are
analyzed for crop selection.
Paul et al. [8] provide a soil classification technique that divides soil into three
categories based on its nutrient content: low, medium, and high. KNN classifies soil
characteristics such as phosphorus, nitrogen, potassium, organic carbon, pH, and a
few other micronutrients. The different soil categories help in determining the crop
to a particular soil for optimal yield.
Kumar et al. [9] describe a crop selection approach for maximizing output yield.
This research work also proposes crops plantation in a specific order annually to
maximize the production. Crops are divided into two groups based on how long they
take to grow, such as (1) crops available only at certain times of the year, (2) crops
that can be grown throughout the year. Weather, soil, crop, and water density features
are used for crop selection. This work also recommends crop sequencing depending
on the crop sowing duration, time of plantation, and expected yield.
Tseng et al. [10] implement a crop selection approach that uses sensors to collect meteorological data such as humidity and temperature, and soil data such as electrical conductivity and salinity. 3D clustering is applied to examine the crop growth conditions used by farmers for a particular crop.
Pudumalar et al. [11] present an ensemble technique that uses Random forest,
Naive Bayes, K-nearest neighbor, and CHAID to classify factors including soil
colour, texture, depth, and drainage. This approach selects the crop for a given land
using various input parameters.
Priya et al. [12] implement a Naive Bayes classification technique that uses
weather parameters such as soil moisture, temperature, rainfall, and air pressure
to determine the adaptability of crops such as rice, maize, cotton, and chilli. This
approach also suggests the appropriate time for harvesting and sowing a specific
crop.
Pratap et al. [13] implement a CART based system for fertilizer recommendation
that uses ML model to determine the type and quantity of fertilizer to be used to
Performance Evaluation of Different Machine Learning Models in Crop Selection 209
maximize yield. This work tries to forecast the fertility of a particular soil sample in
real-time by determining the soil nutrient content.
Chiche et al. [14] developed a neural network-based crop yield prediction system. The proposed framework achieves a prediction accuracy of 92.86% on 3281 instances collected from an agricultural land dataset.
Kumar et al. [15] applied Logistic Regression, Support Vector Machine (SVM),
and Decision Tree algorithms to predict the suitable crop based on agriculture param-
eters. These classification algorithms are compared and analyzed for crop prediction.
The result shows that SVM performs better than other studied models.
Islam et al. [16] used Deep Neural Network (DNN) for agricultural crop selection
and yield prediction. Various climatic and weather parameters are given as an input
to the model for the crop prediction. The authors compared proposed DNN with
SVM, Random forest, and Logistic regression. DNN outperforms other models in
terms of accuracy.
From the literature review of existing work, it can be concluded that ML algorithms are being used in the agriculture domain, but there is still considerable scope for improving their performance in crop selection and yield prediction. Hence, this work presents a comparative study of supervised algorithms for crop selection. The following section discusses the machine learning models used in the agriculture domain.
This work implements six machine learning-based crop selection algorithms: decision trees, random forests, support vector machines, naive Bayes, XGBoost, and k-NN are used to design and analyze crop selection models. Supervised machine learning algorithms are chosen because they achieve higher accuracy on prediction tasks than unsupervised learning [17]. Various soil and weather parameters are used to implement these models. The soil parameters used are pH, nitrogen
(N), phosphorus (P), and potassium (K), and weather parameters used are tempera-
ture, humidity, and rainfall. Different machine learning models are discussed in the
following subsection.
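As an illustrative sketch only (not any of the six models in this paper), the crop selection task can be pictured as matching a field's soil and weather feature vector (N, P, K, temperature, humidity, pH, rainfall) against per-crop profiles; the profile values and the choice of crops below are hypothetical.

```python
import math

# Hypothetical per-crop feature profiles: [N, P, K, temp, humidity, pH, rainfall].
CROP_PROFILES = {
    "rice":   [80, 47, 40, 24.0, 82.0, 6.4, 236.0],
    "maize":  [78, 48, 20, 22.4, 65.0, 6.2, 84.0],
    "cotton": [118, 46, 20, 24.0, 80.0, 6.9, 80.0],
}

def recommend_crop(field):
    """Return the crop whose profile is nearest (Euclidean) to the field's features."""
    return min(CROP_PROFILES,
               key=lambda crop: math.dist(field, CROP_PROFILES[crop]))

# A field with high rainfall and humidity lands closest to the rice profile.
print(recommend_crop([82, 45, 38, 23.5, 80.0, 6.5, 230.0]))  # rice
```

The supervised models discussed below replace this naive nearest-profile matching with learned decision boundaries over the same kind of feature vectors.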
Decision Tree Classifier: A decision tree (DT) is a tree-structured classifier where
internal nodes denote features, branches represent the decision rules, and each leaf
node represents the outcome. The decisions or the tests are performed based on
features of the given dataset. One of the DT techniques is classification or regression
tree (CART). The tree begins with the root node, which contains all of the data, and
splits the nodes using intelligent algorithms. It uses impurity measures such as the Gini index or entropy to split the nodes. The Gini index and entropy for a classification problem are defined in Eqs. (1) and (2), respectively, where n denotes the total number of classes and pi is the probability of an object being classified into class i.
Gini = 1 − Σ_{i=1}^{n} (p_i)²    (1)

Entropy = − Σ_{i=1}^{n} p_i log₂(p_i)    (2)
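A minimal Python sketch of the two impurity measures in Eqs. (1) and (2), assuming the class probabilities p_i are supplied directly:

```python
import math

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2), as in Eq. (1)."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy: -sum(p_i * log2(p_i)), as in Eq. (2); 0*log(0) treated as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(gini([1.0, 0.0]))     # pure node: 0.0
print(gini([0.5, 0.5]))     # maximally mixed two-class node: 0.5
print(entropy([0.5, 0.5]))  # 1.0 bit
```

A decision tree chooses the split whose child nodes minimize these impurities, which is why pure nodes (impurity 0) terminate the splitting.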
p(y|X) = p(X|y) p(y) / p(X)    (3)
f(x) = w · x + b    (4)
other, forming a layered structure. The structure comprises an input layer, one or more hidden layers, and an output layer. An ANN uses training data to learn and improve its performance. The equation for a neuron is a linear combination of the independent variables with their respective weights, plus a bias term. Equation (7) shows the neural network formula, where W_0 is the bias, W_1, W_2, …, W_n are the weights, and X_1, X_2, …, X_n are the inputs.
Z = W_0 + W_1 X_1 + W_2 X_2 + ⋯ + W_n X_n   (7)
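To make Eq. (7) concrete, here is a minimal sketch of the weighted sum a single neuron computes (the numeric values are illustrative, not from the paper):

```python
def neuron_output(bias, weights, inputs):
    # Z = W0 + W1*X1 + W2*X2 + ... + Wn*Xn, as in Eq. (7)
    return bias + sum(w * x for w, x in zip(weights, inputs))

# 0.5 + 1.0*3.0 + 2.0*4.0 = 11.5
print(neuron_output(0.5, [1.0, 2.0], [3.0, 4.0]))  # 11.5
```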
The discussed ML algorithms are designed to choose the optimum crop for a specific piece of land based on its soil and environmental properties. These algorithms use the soil attributes of a particular area and the required climatic conditions to recommend crops. The following section discusses the experimental setup, dataset description, results achieved, and their discussion.
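As a toy illustration of one of the listed models, the sketch below implements a 1-nearest-neighbour crop recommender over the seven features used in the paper (N, P, K, temperature, humidity, pH, rainfall). The `train` table and its feature values are invented for illustration and are not taken from the Kaggle dataset:

```python
import math

# Hypothetical labelled land samples: (N, P, K, temperature, humidity, pH, rainfall)
train = [
    ((90, 42, 43, 21.0, 82.0, 6.5, 203.0), "rice"),
    ((20, 67, 20, 18.0, 65.0, 5.7, 70.0), "maize"),
    ((40, 72, 77, 29.0, 94.0, 6.2, 110.0), "banana"),
]

def predict_crop(sample):
    # Recommend the crop of the closest land sample (Euclidean distance)
    best = min(train, key=lambda t: math.dist(sample, t[0]))
    return best[1]

print(predict_crop((85, 45, 40, 22.0, 80.0, 6.4, 200.0)))  # rice
```

In practice the features would be scaled before computing distances, since rainfall otherwise dominates the raw Euclidean metric.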
This section discusses the experimental setup used to perform the analysis, the dataset used, the implementation specification, and the results achieved.
4.2 Dataset
The dataset considered in this study is collected from Kaggle [18]. It includes soil properties such as pH, phosphorus (P), potassium (K), and nitrogen (N), and environmental parameters that affect crop development, such as humidity and precipitation. Table 1 presents the description of the features used in this study.
The data collected contains 2200 land samples and 22 different crops, with each
crop containing 100 different land samples. The various crops included in the study
Performance Evaluation of Different Machine Learning Models in Crop Selection 213
are maize, rice, banana, mango, grapes, watermelon, apple, orange, papaya, coconut,
cotton, jute, coffee, muskmelon, lentil, black-gram, kidney beans, pigeon beans,
mung beans, moth beans, and pomegranates. The following subsection analyzes the
dataset used in this paper.
This section analyses the soil and environmental data that affect the crop selection procedure across the different crops. Primary macronutrients play a vital role in increasing crop yield and quality. Nitrogen, phosphorus, and potassium (N, P, and K) are the three significant elements that must be present in large quantities for proper crop growth. Figure 1 shows a comparison of the N, P, and K values required by various crops. The required amount of macronutrients for crop development is highest for cotton, apple, and grapes, and lowest for lentils, blackgram, and orange, respectively.
Figure 2 shows the essential features for crop selection. It can be inferred that rainfall and humidity are the most important weather parameters. The soil macronutrients N, P, and K carry almost equal weight for all the crops. Overall, rainfall has the highest importance, while pH has the least importance among all the parameters used.
The following subsections discuss the algorithm used in the study, followed by the results and discussion of the implemented machine-learning-based crop selection models.
This section presents the algorithm used in the approach. Algorithm 1 explains the
detailed steps involved in crop selection.
This section highlights the results obtained from the ML techniques applied to the crop data. Machine learning models can be evaluated using a variety of performance metrics such as accuracy, precision, recall, area under the curve (AUC), etc. This paper uses accuracy to evaluate the models used in this study. The models are individually evaluated on the training and testing datasets, as seen in Fig. 3, which compares the training and testing accuracy of the different ML models.
As seen in Fig. 3, the decision tree has the lowest training and testing accuracies of 88.18% and 90%, respectively. Random forest and XGBoost have the highest training accuracy of 100%, while XGBoost has the highest testing accuracy of 99.31%. As a result, in terms of testing accuracy, it can be concluded that random forest and XGBoost outperform all other supervised machine learning models.
The overall accuracy of all the crop prediction models is shown in Fig. 4. XGBoost
has the highest accuracy in comparison to other models. Accuracy for Naive Bayes,
SVM, Random Forest, and kNN are 99.09, 97.72, and 97.5%, respectively.
The Decision Tree is the worst performing model, with an accuracy of 90.0%. It
can be concluded from the results achieved that Naive Bayes, Random Forest, and
XGBoost perform better than other models for crop prediction, while XGBoost is the
one which can be used for real applications, as it performed best in terms of overall
accuracy. The following section concludes this paper, highlighting the research work
done, results achieved and future scope.
5 Conclusion
This paper compares six ML models for selecting crops based on soil and weather inputs. The models used are Decision Tree, Naive Bayes, Support Vector Machine, Random Forest, XGBoost, and K-Nearest Neighbor. The XGBoost supervised machine learning algorithm performed best, with a testing accuracy of 99.31%, when compared with the other models. The analysis in this research work shows that crop selection models based on machine learning produce better results than traditional methods. Future work may include additional parameters, such as water availability, irrigation facilities, fertilizer requirements, and market demand.
References
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 219
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_17
220 I. Mitra et al.
2 Proposed Model
The goal of this research is to use machine learning to help with drug supply. Using the Apriori algorithm's support metrics, the aim is to create a recommendation system for the medicines that a specific customer is most likely to buy, resulting in a win–win situation for both the customer and the shop owner: the customer always gets the most appropriate medicine and does not have to deal with the hassles of out-of-stock medicines, and the pharmacist learns which specific combinations of medicines should be made available quickly. Removing drug shortages also eliminates the medical black market, which helps the economy thrive. The complete workflow of the proposed model is given in Fig. 1.
Apriori Based Medicine Recommendation System 221
Data Preprocessing
The dataset must be preprocessed so that it conforms to the rules and syntax that the specific ML model requires.
The following are the stages of preprocessing:
• Importing the desired libraries
• Importing datasets
• Dealing with missing data
• Encoding categorical data and encoding the dependent variable
• Feature scaling
• Splitting the dataset (training and test sets)
Apriori algorithm
• The Apriori algorithm [2, 13, 14] is an influential algorithm for determining frequent item sets for Boolean association rules.
• Apriori uses a "bottom-up" approach, in which frequent item sets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
• Apriori is designed to operate on datasets containing transactions, for example, the collection of items bought by customers.
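The bottom-up candidate-generation-and-testing loop described above can be sketched in a few lines of plain Python (an illustrative miner, not the chapter's implementation):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Bottom-up Apriori: extend frequent k-itemsets one item at a time
    n = len(transactions)
    result = {}
    candidates = sorted({frozenset([i]) for t in transactions for i in t},
                        key=sorted)
    k = 1
    while candidates:
        # Candidate testing: keep itemsets whose support meets the threshold
        frequent = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                frequent[c] = s
        result.update(frequent)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        keys = list(frequent)
        candidates = list({a | b for a, b in combinations(keys, 2)
                           if len(a | b) == k + 1})
        k += 1
    return result

tx = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
freq = frequent_itemsets(tx, min_support=0.5)
print(freq[frozenset({"a", "b"})])  # 0.5
```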
of relationship is called single cardinality. The metrics used to find associations are Support, Confidence, and Lift.
Support (Supp) is the frequency of X, i.e., the number of times an itemset appears in the transactions. It is the proportion of the transactions T that contain the itemset X, as defined in (1).
Supp(X) = Freq(X) / T   (1)
Confidence (Conf) is the frequency with which a rule is correct, which is reflected in its degree of confidence. It is the ratio of the number of transactions that contain both X and Y to the number of records that include X, as defined in (2).
Conf = Freq(X, Y) / Freq(X)   (2)
Lift is the ratio of the observed support measure and expected support if X and
Y are independent of each other as defined in (3).
Lift = Supp(X, Y) / (Supp(X) × Supp(Y))   (3)
The dataset used for simulation is a sample of medicine combinations commonly bought by customers over the past two months. It is a random dataset created to illustrate the idea of medicine prediction and contains 7500 example records.
Because the dataset is generated randomly, the model's accuracy cannot be attributed to being lucky on a particular dataset. The practical use case of this dataset is that it will
be given by the chemist shop based on their previous sales. The Apriori algorithm
will be executed on this for getting the preferred result.
The most commonly bought medicine items are shown in Fig. 3.
Figure 4 displays the most popular medicines as a frequency distribution. Figure 5 presents the results obtained by using the algorithm to predict the most common associations, in descending order of their Lifts. Table 1 shows the labels for the different medicine combinations. Figure 6 illustrates the associations obtained for the various medicine combinations, as recommended by the algorithm.
It is observed from Figs. 5 and 6 that the combination of the medicines Levothyroxine and Lisdexamfetamine, denoted by (Le+Li), has the highest Lift, which indicates that it is highly recommended. Similarly, the combination of the medicines Sofosbuvir and Lupron, denoted by (So+Lu), has the lowest Lift, which indicates that the combination is least recommended.
Fig. 5 Different medicine combination with their support, confidence and lift value
4 Conclusions
Patients and healthcare providers can use health recommender systems to help them make better health-related decisions. Shortages of key medicines will likely continue to be a problem, so the proposed medicine recommendation system will be helpful for the healthcare sector. People will not have to face the problem of unavailable medicines, since stores will be stocked well in advance, knowing which medicines are most likely to be bought. Moreover, the economy will be helped: with medicines readily available there will be no shortages, eliminating the medical black market and leaving no scope for dishonest people to dupe the needy by profiteering from medicines sold at exorbitant rates. The future scope of this Apriori-based machine learning recommendation model is that it will reduce infrastructural casualties in a healthcare center, as it will ensure that the best possible medicines and other health equipment are available at all times of the year. This will help address the lack of technical and managerial policies in different healthcare centers across India. The model can be further integrated with UI/UX apps that allow a patient and their family to get a clear visual understanding of the current status of the different healthcare facilities available at a healthcare center in developed areas, without travelling long distances in search of a preferable diagnostic center. This approach is expected to save many lives and thereby contribute to better policy making for the common people.
References
1. Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: promise and potential.
Health Inf Sci Syst 2:3. Published 2014 Feb 7. https://doi.org/10.1186/2047-2501-2-3
2. Al-Maolegi M, Arkok B (2014) An improved Apriori algorithm for association rules. Int J Nat
Lang Comput. 3. https://doi.org/10.5121/ijnlc.2014.3103
3. Tran TNT, Felfernig A, Trattner C et al (2021) Recommender systems in the healthcare domain:
state-of-the-art and research issues. J Intell Inf Syst 57:171–201
4. Han Q, Ji M, Martínez de Rituerto de Troya I, Gaur M, Zejnilovic L (2018) A hybrid recom-
mender system for patient-doctor matchmaking in primary care. In: The 5th IEEE international
conference on data science and advanced analytics (DSAA), pp 1–10
5. Kohli PS, Arora S (2018) Application of machine learning in disease prediction. In: 2018 4th
international conference on computing communication and automation (ICCCA). IEEE, pp
1–4
6. Ferdous M, Debnath J, Chakraborty NR (2020) Machine learning algorithms in healthcare: a
literature survey. In: 2020 11th international conference on computing, communication, and
networking technologies (ICCCNT)
7. Ganiger S, Rajashekharaiah KMM (2018) Chronic diseases diagnosis using machine learning.
In 2018 international conference on circuits and systems in digital enterprise technology
(ICCSDET). IEEE, pp 1–6
8. Ramesh D, Suraj P, Saini L (2016) Big data analytics in healthcare: a survey approach. In: 2016
international conference on microelectronics, computing and communications (MicroCom).
IEEE, pp 1–6
9. Ravì D et al (2017) Deep learning for health informatics. IEEE J Biomed Health Inform 21(1):4–21; Geron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc, Canada
10. DeCaprio D, Gartner J, McCall C J, Burgess T, Garcia K, Kothari S, Sayed S (2020) Building
a COVID-19 vulnerability index. J Med Artif Intell 3
11. Ahuja V, Nair LekshmiV (2021) Artificial intelligence and technology in COVID Era: a
narrative review. J Anaesthesiol Clin Pharmacol 37:28. https://doi.org/10.4103/joacp.JOACP_
558_20
12. Tran TNT, Atas M, Felfernig A, Le VM, Samer R, Stettinger M (2019) Towards social choice-
based explanations in group recommender systems. In: Proceedings of the 27th ACM confer-
ence on user modeling, adaptation and personalization, UMAP’19. Association for Computing
Machinery, New York, NY, USA, pp 13–21
13. Bagui S, Dhar PC (2019) Positive and negative association rule mining in Hadoop’s MapReduce
environment. J Big Data 6:75. https://doi.org/10.1186/s40537-019-0238-8
14. Zheng Y, Chen P, Chen B, Wei D, Wang M (2021) Application of apriori improvement algorithm
in asthma case data mining. J Healthc Eng 2021:1–7. Article ID 9018408. https://doi.org/10.
1155/2021/9018408
NPIS: Number Plate Identification
System
Ashray Saini, Krishan Kumar, Alok Negi, Parul Saini, and Shamal Kashid
1 Introduction
Number plate recognition has made feasible vehicle monitoring in recent years. It may be used in a variety of public spaces for a variety of objectives, such as traffic safety enforcement, automatic toll tax collection [1], car park systems [2], and automated vehicle parking systems [3]. Number plate identification systems use several methods to find vehicle number plates on automobiles and then extract the vehicle numbers from the picture. This technology is also gaining popularity because it requires no additional installation on vehicles that already carry a license plate. Although number plate detection algorithms have advanced significantly in recent years, it remains challenging to recognize license plates in photos with complicated backgrounds. Various scholars have offered different strategies for each phase, and each approach has advantages and disadvantages. The three primary steps for identifying license plates are region-of-interest detection, extraction of plate numbers, and character recognition.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 229
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_18
230 A. Saini et al.
Kim et al. [5] used a learning methodology to create a license plate recognition system. Inside the automobile detecting module, the camera collects an image, and the result is an image of the candidate region. Han et al. [6] proposed a system that tracks several targets and generates high-quality photos based on plate numbers. The authors created a fine-tuned dual-camera system with a fixed camera and a pan–tilt–zoom camera to track moving transportation in an open field. A CNN classifier then recognized the license plates consecutively for recognition. Because 64 cars entered the location, data was manually compiled from the scene images, and 59 IDs were accurately recognized using this technology. Dhar et al. [7] developed an automated program for identifying license plates. Prewitt operators performed the detection of the number plate to segment the edges. Morphological dilation was performed to accentuate the points. Eventually, a deep CNN was used to accomplish the recognition job.
As a result, technology is needed to track vehicles used for illegal activities so that criminals can be arrested and punished as quickly as feasible. Human vision is constrained by various factors such as speed, illumination, and tiredness, so relying on human aptitude for such a task is not ideal. This technology is also gaining popularity because it does not require any other installation on vehicles that already have a number plate. Furthermore, previous techniques incur additional overhead due to the learning parameters they use.
To manage the challenges noted above, we developed a vehicle number plate detection approach that can work in low-light and noisy environments. The salient aspects of our work fall into three categories:
– The number plate identification problem, which is complex and time-consuming, is formulated as a machine learning problem that attains better computational complexity for vehicle plate number detection.
– Our method uses computer vision and deep learning to detect the vehicle number
plate in low-light and noisy environments.
– This model relies on color and texture to detect the presence of multiple edges in
images.
The outline of this paper is organized as follows. Related work on vehicle number plate detection is described in Sect. 2. The detection and recognition modules of our framework are described in Sect. 3. Experiments performed on test images and the results obtained are summarized in Sect. 4. Finally, conclusions are drawn, and some general comments are made, in Sect. 5.
2 Literature Review
This section describes vehicle number plate identification models and their limita-
tions, along with some deep learning processes featured directly from the raw data.
Prabuwono et al. [2] studied and designed a car park control system using Optical Character Recognition (OCR) devices. The system
NPIS: Number Plate Identification System 231
is designed to work in a client–server scenario. The results reveal that the system can
save log records, which will make it easier to track parking users, update user and
parking credit databases, and monitor parking space availability.
Kim et al. [5] studied the construction of a license plate recognition system using a learning-based technique. The system comprises three modules: car detection, license plate segmentation, and recognition. The car detection module recognizes a car in a given image sequence collected from the camera using a simple color-based technique. The license plate in a detected car image is extracted using Neural Networks (NNs) as filters to analyze the license plate's color and texture attributes. The recognition module then uses a Support Vector Machine (SVM)-based character recognizer to read the characters on the detected license plate.
Qadri et al. [14] proposed Automatic Number Plate Recognition (ANPR), an image-processing methodology that uses a vehicle's plate number to recognize it. The developed system first detects the car and then takes a picture of it. Image segmentation is used to retrieve the vehicle number plate region, and character recognition is done using an optical character recognition approach. The gathered data is then compared with records in a database to determine specific information such as the vehicle's owner, registration location, and address.
Fahmy et al. [12] explained that the location of each contained character is extracted using image processing procedures, and a Binary Associative Memory (BAM) neural network handles the character identification procedure. BAM is a neural network that may automatically read the characters of a number plate. Even though BAM is a specific neural technique, it can rectify skewed input patterns.
3 Proposed Model
The general architecture of the number plate identification system is shown in Fig. 1.
This section describes the proposed model of vehicle number plate detection steps
in detail.
Step 1: Input image and Noise Reduction
During the picture capture, coding, transmission, and processing phases, noise is always present. Image noise is a random variation of the brightness or color information in collected photographs. The first step reduces the noise in the image using noise-reducing filters, to achieve better accuracy for the model. A common problem with noise-reducing filters is that they can degrade image details or the edges present in the image.
So, to eliminate the noise from the images while maintaining the features, the model uses a bilateral filter [4]. The bilateral filter is non-linear and edge-preserving in nature; it employs the Gaussian filter but adds a multiplicative component based on the pixel intensity difference. This guarantees that only pixel intensities similar to that of the center pixel are used when calculating the blurred intensity value. The filter is defined by Eq. (1). The parameter values of the bilateral filter are as follows: diameter (of each pixel neighborhood) 5, and sigmaColor (value in color space) and sigmaSpace (value in coordinate space) both 21.
BF[I]_p = (1 / W_p) ∑_{x_i ∈ Ω} I(x_i) f_r(‖I(x_i) − I(x)‖) g_s(‖x_i − x‖)   (1)
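A one-dimensional sketch of Eq. (1) in plain Python (the radius and sigma values are illustrative; on real images OpenCV's `cv2.bilateralFilter` would be used instead):

```python
import math

def bilateral_1d(signal, radius=2, sigma_r=21.0, sigma_s=21.0):
    # Each output pixel is a normalized sum of neighbours, weighted by a range
    # kernel f_r on intensity difference and a spatial kernel g_s on distance.
    out = []
    for i, center in enumerate(signal):
        wsum = vsum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            fr = math.exp(-((signal[j] - center) ** 2) / (2.0 * sigma_r ** 2))
            gs = math.exp(-((j - i) ** 2) / (2.0 * sigma_s ** 2))
            wsum += fr * gs
            vsum += fr * gs * signal[j]
        out.append(vsum / wsum)  # division by W_p in Eq. (1)
    return out

# A constant signal passes through (numerically) unchanged
print(bilateral_1d([10.0, 10.0, 10.0]))
```

Because the range kernel down-weights neighbours with very different intensities, sharp steps in the signal are smoothed far less than they would be by a plain Gaussian blur.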
Edges are small fluctuations in the intensity of a picture. Edge detection is a critical mechanism for detecting and highlighting an object in an image and defining the boundaries between objects and the background. It is the most common method for identifying significant discontinuities in intensity levels. The edge representation of an image minimizes the amount of data to be processed while retaining important information about the shapes of the objects in the picture.
A Gabor filter [8] has been used for edge detection and feature extraction. These filters possess optimal localization properties in both the spatial and frequency domains and are thus well suited to texture segmentation problems. A Gabor filter can be described as a sinusoidal signal of a particular frequency and orientation, modulated by a Gaussian wave. The filter comprises a real and an imaginary component representing orthogonal directions; the two parts can be combined into a complex number or used separately. The Gabor filter is represented by Eqs. (3), (4), and (5). The parameter values of the Gabor filter are as follows: λ = 10, θ = π, ψ = 5, σ = 1.9, and γ = 1.
Complex: g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ² y′²) / (2σ²)) exp(i (2π x′/λ + ψ))   (3)

Real: g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ² y′²) / (2σ²)) cos(2π x′/λ + ψ)   (4)

Imaginary: g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′² + γ² y′²) / (2σ²)) sin(2π x′/λ + ψ)   (5)

where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ.
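The real part of the kernel, Eq. (4), can be evaluated directly with the parameter values quoted above. Note that the coordinate rotation by θ is the standard Gabor construction and is an assumption here, since the extracted equations lost it:

```python
import math

# Parameter values stated in the text: lambda=10, theta=pi, psi=5, sigma=1.9, gamma=1
LAM, THETA, PSI, SIGMA, GAMMA = 10.0, math.pi, 5.0, 1.9, 1.0

def gabor_real(x, y, lam=LAM, theta=THETA, psi=PSI, sigma=SIGMA, gamma=GAMMA):
    # Rotate coordinates by theta, then apply Gaussian envelope times cosine carrier
    xp = x * math.cos(theta) + y * math.sin(theta)
    yp = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xp ** 2 + gamma ** 2 * yp ** 2) / (2.0 * sigma ** 2))
    return envelope * math.cos(2.0 * math.pi * xp / lam + psi)

# At the kernel centre the Gaussian envelope is 1, so g(0, 0) = cos(psi)
print(gabor_real(0.0, 0.0))
```

Sampling this function over a small grid yields the convolution kernel that is slid over the image for edge detection.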
Step 3: VGG-16 Model Based on CNN
A typical CNN has several convolutional layers and pooling layers, followed by fully connected layers in the final stage. The convolution operation extracts high-level characteristics such as edges from the input picture. This output is transmitted to the next layer to identify more complex properties such as corners and combinations of edges. As the network advances deeper, it detects increasingly complex characteristics such as objects and faces. The pooling layer is in charge of reducing the spatial size of the convolved features. The resulting matrix is then converted into a vector and fed into a fully connected layer, much like a neural network. Finally, an activation function is used to classify or find particular points in pictures.
We have used transfer learning and a customized VGG-16 architecture [9] to train the CNN model to recognize the number plate points. We have also augmented the data by setting horizontal flip to True, vertical flip to True, zoom range to 0.2, and shear range to 0.2. All the layers use the Rectified Linear Unit (ReLU) activation function except the last layer, which uses a linear activation function to predict the four points of the vehicle number plate in the images. The detailed architecture used to develop our model is given in Fig. 2.
Step 4: Optical Character Recognition
Optical Character Recognition (OCR) [10] systems convert a two-dimensional text picture, including machine-printed or handwritten text, from its image representation to machine-readable text. The initial phase is a connected component analysis, in which the component outlines are saved. Observing the nesting of outlines and the number of child and grandchild outlines makes detecting and recognizing inverse text as straightforward as black-on-white writing. At this point, outlines are nested together to form blobs.
Blobs are grouped into text lines, and the lines and regions are evaluated to determine whether the text is fixed pitch or proportional. Depending on the character spacing, text lines are divided into words in different ways: fixed-pitch text is cut immediately into character cells, while proportional text is divided into words using definite and fuzzy spaces. In the first pass, an effort is made to recognize each word in turn. Each satisfactory word is used as training data by an adaptive classifier, which is then able to detect text farther down the page more accurately. A second pass over the page is conducted because the adaptive classifier may have learned helpful information too late to contribute near the top of the page; words that were not recognized well enough are recognized again.
The NPIS model is built on a standard 2.6 GHz six-core machine with an NVIDIA GeForce RTX 2060 GPU with 6 GB of memory. The experiment was carried out using a dataset of 664 images. The images were resized to 256 × 256 × 3 pixels for training, and the data was then normalized to the range [0, 1] before being input into the CNN architecture for training and testing.
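The [0, 1] normalization step is simple min-max scaling of the 8-bit pixel values. A sketch over a flat list is shown below; the actual input is a 256 × 256 × 3 array:

```python
def normalize(pixels, lo=0.0, hi=255.0):
    # Map 8-bit pixel values into the range [0, 1]
    return [(p - lo) / (hi - lo) for p in pixels]

print(normalize([0, 128, 255]))  # [0.0, ~0.502, 1.0]
```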
The proposed NPIS model correctly detected license numbers with a high accuracy of 98.21% on the training dataset with a 0.013 loss, and 91.79% accuracy on the test dataset with a 0.027 loss. The proposed model was trained for up to 100 epochs with a batch size of 11. The accuracy and loss curves of the model are shown in Fig. 3.
After training is complete, the model is used to predict the number plate. As shown in Fig. 4, the proposed model predicts the vehicle number plate inside the bounding box (shown in red). The average time taken by the proposed model to predict the vehicle number plate in a single image is about 235 ms.
5 Conclusions
The proposed NPIS (number plate identification system) model is based on a CNN architecture. Before processing, appropriate filters were applied to de-noise and sharpen the low-quality photos resulting from high-speed vehicles. One of our strategy's primary characteristics is its scalability, which allows it to perform appropriately on various font styles and font sizes. The technique is so effective that it makes no difference whether the vehicle is stationary or moving at high speed. The method given here may be applied in a cosmopolitan region, a rural location, against an unpleasant background, in poor light, at a toll booth, in any shielded parking lot, and so on. The primary drawback of the model is that it does not work on multiple vehicle number plates at once. In the future, efficiency will be improved on larger datasets comprising a range of number plate styles from various countries.
References
1. Chen Y-S, Cheng C-H (2010) A delphi based rough sets fusion model for extracting payment
rules of vehicle license tax in the government sector. Exp Syst Appl 37(3):2161–2174
2. Prabuwono AS, Idris A (2008) A study of car park control system using optical character
recognition. In: 2008 International conference on computer and electrical engineering. IEEE,
pp 866–870
3. Albiol A, Sanchis L, Mossi JM (2011) Detection of parked vehicles using spatiotemporal maps.
IEEE Trans Intell Transp Syst 12(4):1277–1291
4. Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Proceedings of
the IEEE international conference on computer vision, pp 839–846
5. Kim KK, Kim KI, Kim JB, Kim HJ (2000) Learning-based approach for license plate recog-
nition. In: Proceedings of the 2000 IEEE signal processing society workshop (Cat. No.
00TH8501). Neural Networks for Signal Processing X, vol 2. IEEE, pp 614–623
6. Han CC, Hsieh CT, Chen YN, Ho GF, Fan KC, Tsai CL (2007) License plate detection and
recognition using a dual-camera module in a large space. In: 2007 41st annual IEEE interna-
tional carnahan conference on security technology. IEEE, pp 307–312
7. Dhar P, Guha S, Biswas T, Abedin MZ (2018) A system design for license plate recognition
by using edge detection and convolution neural network. In: 2018 International conference on
computer, communication, chemical, material and electronic engineering (IC4ME2). IEEE, pp
1–4
8. Ji Y, Chang KH, Hung C-C (2004) Efficient edge detection and object segmentation using
gabor filters. In: Proceedings of ACMSE-’04, pp 454–459, 2–3 April 2004
9. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: ICLR
10. Verma R, Ali J (2012) A survey of feature extraction and classification techniques in OCR systems. Int J Comput Appl Inform Technol 1(3):1–3
11. Lotufo RA, Morgan AD, Johnson AS (1990) Automatic number-plate recognition. In: IEE
colloquium on image analysis for transport applications. IET, pp 1–6
12. Fahmy MM (1994) Automatic number-plate recognition: neural network approach. In: Pro-
ceedings of VNIS’94–1994 vehicle navigation and information systems conference. IEEE, pp
99–101
13. Kim KI, Jung K, Kim JH (2002) Color texture-based object detection: an application to license
plate localization. In: International workshop on support vector machines. Springer, Berlin,
Heidelberg, pp 293–309
14. Qadri MT, Asif M (2009) Automatic number plate recognition system for vehicle identification
using optical character recognition. In: 2009 International conference on education technology
and computer. IEEE, pp 335–338
Leveraging Advanced Convolutional
Neural Networks and Transfer Learning
for Vision-Based Human Activity
Recognition
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 239
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_19
240 P. Chauhan et al.
preprocessed data, and a classifier model was developed based on these features at the second level. The most prevalent HAR feature detectors include histograms of oriented gradients, histograms of optical flow, spatio-temporal interest points, dense trajectories, and others. Because the selection of characteristics varies from problem to problem in real time, extracting these features is a time-consuming and challenging operation. To solve these issues, deep learning models were introduced, removing the requirement for handcrafted features while reducing complexity.
Deep learning-based strategies [1, 2] have grown quite effective in recent years, outperforming conventional feature extraction approaches to the extent of winning ImageNet contests. Because of its excellent accomplishments in multiple domains such as biosignal identification, gesture recognition, computer vision, and bioinformatics, deep learning can be fully utilized for human activity recognition. In the proposed study, transfer learning is utilized in conjunction with data augmentation, dropout, and batch normalization to train several advanced convolutional neural networks to categorize human activity images into their appropriate classes.
This proposed work aims to recognize persons' activities from their pose and motion
using various advanced convolutional neural networks. The research discussed in this
paper makes two contributions to the field of human activity categorization. The first
is activity detection and identification: the HAR system recognizes shapes or
orientations in order to trigger the execution of a certain job, and activity detection
is tied to the localization or position of a human at a given moment in a still image
or in a succession of images, i.e., video. The quantitative comparative analysis of
several advanced deep models is the second contribution.
2 Related Work
Many researchers have worked on HAR over the last few decades. For example,
Liu et al. [3] presented a coupled hidden conditional random fields model for the
UTKinect HAR dataset, exploiting the complementary properties of the RGB and
depth modalities. The coupled model extended the standard hidden-state conditional
random fields approach from a single chain-structured sequential observation to
multiple chain-structured sequential observations, synchronizing the sequence
information recorded in different modalities by merging RGB and depth sequential
data. The authors established the graph structure for the interaction of the
modalities and designed the associated potential functions for model formulation.
Inference methods are then utilized to uncover the latent connection between depth
and RGB data while modeling the temporal context within each modality.
Masum et al. [4], continuing this line of HAR research, built an intelligent human
activity recognition system employing Support Vector Machine (SVM), Random
Forest (RF), Multilayer Perceptron (MLP), Naive Bayes (NB), and a deep CNN.
Sensors such as the gyroscope, accelerometer, and magnetometer were then employed
for data accumulation, and uniform and null-label instances of the imbalanced
classes were removed.
For human motion in 3D space, Vemulapalli et al. [5] used translations and rotations
to describe the 3D geometric relationships between various body parts. Because 3D
rigid body motions are always members of the special Euclidean group SE(3), the
suggested skeleton representation lies in the Lie group SE(3) × · · · × SE(3), which
is a curved manifold. The authors mapped all of the curves to the corresponding Lie
algebra, which is a vector space, and performed temporal modeling and classification
there, demonstrating that the suggested representation outperforms several existing
skeleton representations on the UTKinect action dataset.
On the other hand, for a compact representation of postures from depth imagery,
Xia et al. [6] developed a HAR technique using histograms of 3D joint positions
(HOJ3D) within a modified spherical coordinate system. The HOJ3D computed from
the action depth sequence is reprojected using LDA and then grouped into k posture
visual words, which represent the generic action poses. The temporal evolution of
these visual words is modeled with discrete hidden Markov models (HMMs). The
authors also demonstrated considerable view invariance, owing to the spherical
coordinate system design and the robust 3D skeleton estimation from Kinect, on a
3D action dataset consisting of 200 sequences of 10 indoor activities performed by
10 participants from different viewpoints.
In a similar vein, Phyo et al. [7] detected everyday human activities from human
skeletal information, merging image processing and deep learning approaches.
Because of the use of Color Skl-MHI and RJI features, the suggested system has
quite a low computational cost. The processing time was calculated from the feature
extraction times of Color Skl-MHI and RJI, as well as the classification time, at
15 frames per second of video data, resulting in an effective skeletal-information-based
HAR suitable for use as an embedded system. The studies were carried out on two
well-known public datasets of everyday human activities.
In terms of 3D space-time, Zhao et al. [8] suggested a fusion-based action recognition
system made up of three components: a 3D space-time CNN, a human skeletal
manifold representation, and classifier fusion. The strong correlation among human
activities was considered throughout the time domain, and the depth motion map
series served as input to another stream of the 3D space-time CNN. Furthermore,
the related 3D skeleton sequence data was assigned as the recognition framework's
third input. The computational cost of the additional fusion step was in the tens of
milliseconds, so the proposed approach could be run in parallel. In the past few
years, we have seen significant development in HAR for RGB videos using
handcrafted features.
Liu et al. [9] proposed a simple and effective HAR technique based on depth
sequence skeletal joint information. To begin, the authors computed three feature
vectors that collect angle and position data between joints. The resulting vectors were
then utilized as inputs to three independent support vector machine (SVM) classi-
fiers. Finally, action recognition was carried out by combining the SVM classification
findings. Because the retrieved vectors primarily encode angles and normalized
relative positions based on joint coordinates, the attributes are perspective-invariant.
By employing interpolation to standardize action videos of varying temporal durations
to a constant size, the extracted features have the same dimension for different
videos while retaining the main movement patterns, making the suggested technique
time-invariant. The experimental findings showed that the suggested technique
outperformed state-of-the-art methods on the UTKinect-Action3D dataset while being
more efficient and simpler.
3 Proposed Work
The goal of the proposed study is to develop and implement a unique paradigm
that uses advanced convolutional models (CNN, VGG-16, VGG-19, ResNet50,
ResNet101, ResNet152, and YOLOv5) to classify human behavior into ten cate-
gories, making it a multiclass classification problem in machine learning terms.
– Firstly, the UTKinect dataset is divided into training and testing sets, and data
augmentation is performed to obtain views of each image sample from different
angles.
– Initially, a base CNN is implemented, and then pretrained ImageNet weights are
used to fine-tune the VGG-16, VGG-19, ResNet50, ResNet101, and ResNet152
architectures. Finally, the YOLOv5 model is implemented to leverage the power of
deep learning.
– For the advanced CNN models, a fully connected head is designed by exploring the
use of dropout and normalization techniques. Two new dense layers with dropout
and batch normalization are added on top, and a final dense layer with a softmax
activation function is added to predict the class of the image.
– For YOLOv5, a CSPDarknet backbone is used as the feature extractor, giving a
feature-map representation of the input. The neck sits on top of the backbone and
enhances feature discrimination; YOLOv5 uses PANet as its neck. The one-stage
prediction head then performs what is called dense prediction.
– Finally, a comparison study of different advanced deep CNN models and YOLOv5
are performed for the best score.
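The dropout and batch normalization operations used in the new classification head can be illustrated in plain Python (a minimal, framework-free sketch of the two techniques; the dropout rate and toy inputs are illustrative assumptions, not the paper's settings):

```python
import math
import random

def inverted_dropout(x, rate, rng, training=True):
    """Inverted dropout: at train time, zero each unit with probability `rate`
    and scale survivors by 1/(1-rate); at inference, pass values through."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [(v / keep) if rng.random() < keep else 0.0 for v in x]

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a 1-D batch to zero mean / unit variance, then scale and shift."""
    m = sum(batch) / len(batch)
    var = sum((v - m) ** 2 for v in batch) / len(batch)
    return [gamma * (v - m) / math.sqrt(var + eps) + beta for v in batch]
```

In the proposed head, these operations would sit between the pretrained convolutional base and the final softmax layer.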
Input images in the UTKinect-Action dataset come in various sizes and resolutions,
so they were resized to 256 × 256 × 3 to reduce file size; 1610 of the total
1896 images are used for training, while the remaining 286 are used for validation. To avoid
overfitting, the proposed model uses the transfer learning technique along with data
augmentation, dropout, and batch normalization. Fully connected layers are excluded
from each model, and pre-trained weights are used. Accuracy and loss curves with
data augmentation, dense layers, dropout, and batch normalization were recorded
per epoch for 50 epochs. Table 1 displays all the experiments performed along with
their results. In a convolutional neural network, the input layer only reads the
image, so it has no parameters. A convolution layer with an n × m filter that takes
l feature maps as input and produces k feature maps as output has
(n × m × l + 1) × k parameters. The pooling layer has no parameters because it is
used only to reduce dimensions. A fully connected layer with n inputs and m outputs
has (n + 1) × m parameters.
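These counting rules can be checked directly in a few lines of code (the layer sizes below are generic examples, not the exact architecture of the paper's models):

```python
def conv_params(n, m, l, k):
    """(n x m x l + 1) x k: an n x m filter over l input maps, +1 bias, k output maps."""
    return (n * m * l + 1) * k

def fc_params(n, m):
    """(n + 1) x m: n input weights plus one bias for each of m output units."""
    return (n + 1) * m

# A 3x3 convolution taking 3 channels to 32 maps, and a 512->10 classifier head:
print(conv_params(3, 3, 3, 32))  # 896
print(fc_params(512, 10))        # 5130
```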
For Deep CNN, total parameters are 3,209,322 out of which 3,207,274 are train-
able parameters and 2,048 are non-trainable parameters. As shown in Fig. 5, the
training accuracy is 97.45 % near the end of 48 epochs and validation accuracy is
also about 96.94 % near the end of the 44 epochs in the diagram. Similarly, the best
training loss is close to 0.0698 and the validation loss is around 0.0938 as shown in
Fig. 1.
For VGG-16, total parameters are 49,338,186 out of which 34,619,402 are train-
able parameters and 14,718,784 are non-trainable parameters. As shown in Fig. 7,
the training accuracy is 92.02 % near the end of 49 epochs and validation accuracy is
also about 93.09 % near the end of the 31 epochs in the diagram. Similarly, the best
training loss is close to 0.2165 and the validation loss is around 0.1708 as shown in
Fig. 2.
VGG-19 has 37,073,994 total parameters, out of which 17,047,562 are trainable
parameters and 20,026,432 are non-trainable parameters. As shown in Fig. 9 the
training accuracy is 91.43 % near the end of 45 epochs and validation accuracy is
also about 92.52 % near the end of the 44 epochs in the diagram. Similarly, the best
training loss is close to 0.2361 and validation loss is around 0.2003 as shown in
Fig. 3.
ResNet50 has 90,968,970 total parameters, out of which 67,379,210 are trainable
parameters and 23,589,760 are non-trainable parameters. As shown in Fig. 11 the
training accuracy is 91.66 % near the end of 48 epochs and validation accuracy is
also about 92.52 % near the end of the 42 epochs in the diagram. Similarly, the best
training loss is close to 0.2217 and the validation loss is around 0.1764 as shown in
Fig. 4.
For ResNet101, total parameters are 110,039,434 out of which 67,379,210 are
trainable parameters and 42,660,224 are non-trainable parameters. As shown in
Fig. 13, the training accuracy is 92.42 % near the end of 49 epochs and valida-
tion accuracy is also about 93.66 % near the end of the 44 epochs in the diagram.
Similarly, the best training loss is close to 0.2022 and the validation loss is around
0.1883 as shown in Fig. 5.
For ResNet152, total parameters are 193,657,738 out of which 135,282,698 are
trainable parameters and 58,375,040 are non-trainable parameters. As shown in
Fig. 15, the training accuracy is 92.75 % near the end of 46 epochs and valida-
tion accuracy is also about 92.70 % near the end of the 37 epochs in the diagram.
Similarly, the best training loss is close to 0.1922 and the validation loss is around
0.1771 as shown in Fig. 6.
At last, the YOLOv5 model is trained for 50 epochs with a batch size of 8 in 0.558 h.
Only 802 training images and 130 validation images are used for YOLOv5. The model
uses 213 layers, 7,037,095 parameters, and 0 gradients. Precision, recall, and mean
average precision are recorded at 92.9 %, 94.5 %, and 96.6 %, respectively. Mean
average precision averages the precision over recall values from 0 to 1. Figure 7
shows the results of this experiment.
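That definition of average precision can be sketched as follows (a standard formulation over a ranked detection list; the toy inputs are illustrative, and the mAP reported above would further average this value over classes):

```python
def average_precision(ranked_hits, num_positives):
    """ranked_hits: detections sorted by confidence, True marking a correct match.
    Averaging the precision observed at each true positive integrates precision
    over recall values from 0 to 1."""
    tp = 0
    precisions = []
    for i, hit in enumerate(ranked_hits, start=1):
        if hit:
            tp += 1
            precisions.append(tp / i)
    return sum(precisions) / num_positives if num_positives else 0.0

# Two ground-truth objects; the detections ranked 1 and 3 are correct:
print(average_precision([True, False, True], 2))  # 0.8333...
```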
The activity detection task is challenging to complete since the human stance in
the image changes depending on whether the person is sitting, standing, walking, or
sleeping. The rotation can occur both within and outside of the plane.
We compared the mAP and classification accuracy of our proposed system with those
of other systems, as given in Table 1, and found that some methods achieved
comparable or better accuracy, starting from [3], in which the authors recorded 92 %
accuracy on the same dataset. In another work [6], the authors recorded 90.92 %
accuracy, and in [5], 97.08 % accuracy was reported.
6 Conclusion
References
1. Negi A, Kumar K, Chauhan P, Rajput R (2021) Deep neural architecture for face mask detection
on simulated masked face dataset against covid-19 pandemic. In: 2021 International Conference
on Computing, Communication, and Intelligent Systems (ICCCIS). IEEE, pp 595–600
2. Negi A, Chauhan P, Kumar K, Rajput RS (2020) Face mask detection classifier and model
pruning with Keras-Surgeon. In: 2020 5th IEEE International Conference on Recent Advances
and Innovations in Engineering (ICRAIE). IEEE, pp 1–6
3. Liu AA, Nie WZ, Su YT, Ma L, Hao T, Yang ZX (2015) Coupled hidden conditional random
fields for RGB-D human action recognition. Signal Process 112:74–82
4. Masum AKM, Hossain ME, Humayra A, Islam S, Barua A, Alam GR (2019) A statistical and
deep learning approach for human activity recognition. In: 2019 3rd International Conference
on Trends in Electronics and Informatics (ICOEI). IEEE, pp 1332–1337
5. Vemulapalli R, Arrate F, Chellappa R (2014) Human action recognition by representing 3d
skeletons as points in a lie group. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. pp 588–595
6. Xia L, Chen CC, Aggarwal JK (2012) View invariant human action recognition using his-
tograms of 3d joints. In: 2012 IEEE computer society conference on computer vision and
pattern recognition workshops. IEEE, pp 20–27
7. Phyo CN, Zin TT, Tin P (2019) Deep learning for recognizing human activities using motions
of skeletal joints. IEEE Trans Consum Electron 65(2):243–252
8. Zhao C, Chen M, Zhao J, Wang Q, Shen Y (2019) 3d behavior recognition based on multi-modal
deep space-time learning. Appl Sci 9(4):716
9. Liu Z, Feng X, Tian Y (2015) An effective view and time-invariant action recognition method
based on depth videos. In: 2015 Visual Communications and Image Processing (VCIP). IEEE,
pp 1–4
10. Verma KK, Singh BM, Mandoria HL, Chauhan P (2020) Two-stage human activity recognition
using 2D-ConvNet. Int J Interact Multimed Artif Intell 6(2)
Control Techniques and Their Applications
Real Power Loss Reduction by Chaotic
Based Riodinidae Optimization
Algorithm
Lenin Kanagasabai
1 Introduction
Loss reduction is a critical task in power systems, since it plays a foremost role in
efficient operation and has an indisputable influence on upholding stability and
secure power flow. Commonly, the problem is posed as optimal control of the sources
in the network, aiming at minimizing losses and improving the voltage profile. Loss
is primarily caused by the flow of power, and additional loss not only increases
production cost but also lowers the power factor of the system. Consequently, loss
minimization is a key objective function. Many conventional approaches [1–6] have
previously been employed, and evolutionary techniques [7–16] have been applied.
Meta-heuristic procedures differ from those approaches by methodically moving
towards the best attainable optimum throughout the calculation procedure,
sidestepping premature convergence to local optima [17]. In addition, these
approaches frequently suffer from the following inadequacies. Primarily, a
considerable computational burden is incurred owing to repeated power flow
computation, and real-time implementation is difficult. Furthermore, a procedure's
performance is strongly reliant on the accuracy of the system model. In this paper,
a Chaotic based Riodinidae (CRO) optimization algorithm is used to reduce the loss.
In the Riodinidae algorithm, the optimization search process exploits the twofold
characteristics of Riodinidae (metalmark butterflies); values are generated with the
Tinkerbell chaotic map, and the Riodinidae algorithm has been integrated with the
Firefly algorithm's search. In IEEE 118 and 300 bus systems, the Chaotic based
Riodinidae (CRO) optimization
L. Kanagasabai (B)
Prasad V. Potluri Siddhartha Institute of Technology, Kanuru, Vijayawada, Andhra
Pradesh 520007, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 251
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_20
algorithm's validity has been evaluated, and the resulting loss is compared with that
of standard procedures. The projected Chaotic based Riodinidae (CRO) optimization
algorithm reduced the loss effectively.
2 Problem Formulation
$$F = P_L = \sum_{k \in N_{br}} g_k \left( V_i^2 + V_j^2 - 2 V_i V_j \cos\theta_{ij} \right) \quad (1)$$

$$F = P_L + \omega_v \times VDV \quad (2)$$

$$VDV = \sum_{i=1}^{N_{pq}} |V_i - 1| \quad (3)$$

$$P_G = P_D + P_L \quad (4)$$

$$Q_{gi}^{\min} \le Q_{gi} \le Q_{gi}^{\max}, \quad i \in N_g \quad (6)$$

$$V_i^{\min} \le V_i \le V_i^{\max}, \quad i \in N \quad (7)$$

$$T_i^{\min} \le T_i \le T_i^{\max}, \quad i \in N_T \quad (8)$$

$$Q_c^{\min} \le Q_c \le Q_c^{\max}, \quad i \in N_C \quad (9)$$
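Equations (1)–(3) can be evaluated directly in code (a sketch under the same notation; the example branch and bus voltages are made up for illustration):

```python
import math

def real_power_loss(branches):
    """Eq. (1): sum over branches of g_k (V_i^2 + V_j^2 - 2 V_i V_j cos(theta_ij))."""
    return sum(g * (vi ** 2 + vj ** 2 - 2 * vi * vj * math.cos(th))
               for g, vi, vj, th in branches)

def voltage_deviation(voltages):
    """Eq. (3): sum of |V_i - 1| over the load (PQ) bus voltages, in per unit."""
    return sum(abs(v - 1.0) for v in voltages)

# A branch whose end voltages are equal with zero angle difference dissipates nothing:
print(real_power_loss([(1.0, 1.0, 1.0, 0.0)]))  # 0.0
print(voltage_deviation([1.0, 0.95, 1.02]))     # ~0.07
```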
$$R_{i,k}^{t+1} = R_{a1,k}^{t} \quad (10)$$

$$cr^{t} = a \cdot p \quad (11)$$

where a is arbitrary.

Freshly created Riodinidae are calculated as

$$R_{i,k}^{t+1} = R_{a2,k}^{t} \quad (12)$$

$$R_{j,k}^{t+1} = R_{b,k}^{t} \quad (13)$$
In the proposed Chaotic based Riodinidae (CRO) optimization procedure, the
exploration process is boosted by applying the Firefly procedure's exploration
equation. Figure 1 shows the schematic diagram of the Chaotic based Riodinidae
(CRO) optimization algorithm.
$$R_i^{t+1} = R_i^t + \beta_o e^{-\gamma r_{i,j}^2} \left( R_j^t - R_i^t \right) + \gamma (a - 0.50) \quad (16)$$
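The Firefly-style move of Eq. (16) translates line by line into code (a one-dimensional sketch; the parameter values in the example call are arbitrary):

```python
import math

def firefly_move(ri, rj, beta0, gamma, a):
    """Eq. (16): attract R_i toward R_j with strength beta0 * exp(-gamma * r^2),
    plus a small perturbation gamma * (a - 0.50) driven by the random draw a."""
    r2 = (ri - rj) ** 2
    return ri + beta0 * math.exp(-gamma * r2) * (rj - ri) + gamma * (a - 0.50)

# With gamma = 0 the attraction acts at full strength and the noise term vanishes:
print(firefly_move(0.0, 1.0, beta0=1.0, gamma=0.0, a=0.9))  # 1.0
```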
a. Start
b. Generate the population
c. Compute the fitness value of each Riodinidae
d. while t < max. gen do
e. Rank the entities according to their fitness values
f. Split the population
g. For i = 1 to N_PA; Riodinidae in subpopulation A
h. Apply the Riodinidae relocation operator
i. Create fresh entities
j. End for
k. For i = 1 to N_PB; Riodinidae in subpopulation B
l. if t < max. gen × 0.50, then
m. Generate the subpopulation by the Riodinidae regulating operator
n. otherwise
o. Generate a new population in subpopulation B by the Riodinidae regulating operator
p. Apply the Tinkerbell chaotic map
q. e_{t+1} = e_t² − f_t² + a · e_t + b · f_t
r. f_{t+1} = 2 e_t f_t + c · e_t + d · f_t
s. End if
t. End for
u. The entire population is the amalgamation of the freshly created subpopulations A and B
v. Appraise the population according to the freshly updated locations
w. t = t + 1
x. End while
y. Choose the best unit from the complete population
z. End.
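Steps q and r iterate the Tinkerbell chaotic map; a direct transcription is below (the coefficients a, b, c, d are the commonly used chaotic parameter values, an assumption since the paper does not list them):

```python
def tinkerbell(e, f, a=0.9, b=-0.6013, c=2.0, d=0.5):
    """One iteration of the Tinkerbell chaotic map (steps q and r)."""
    return e * e - f * f + a * e + b * f, 2.0 * e * f + c * e + d * f

# Iterate from a typical starting point inside the attractor:
e, f = -0.72, -0.64
for _ in range(3):
    e, f = tinkerbell(e, f)
```

The iterates are deterministic yet non-repeating, which is what lets the map replace a pseudo-random generator in step p of the algorithm.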
4 Simulation Results
[Figure: bar chart comparing the true loss (MW) and the ratio of loss diminution
obtained by CRO with ImPSO, BaCLPSO, BaPSO, BaEPSO, and BaCSO against the base value.]
Table 3 Convergence characteristics

CRO        Loss in MW   Time (S)   No. of iter
IEEE 118   112.19       38.79      29
IEEE 300   625.020208   68.62      36
5 Conclusion
References
1. Lee K (1984) Fuel-cost minimisation for both real and reactive-power dispatches. Proc Gener,
Transm Distrib Conf 131(3):85–93
2. Deeb N (1998) An efficient technique for reactive power dispatch using a revised linear
programming approach. Electr Power Syst Res 15(2):121–134
3. Bjelogrlic M (1990) Application of Newton’s optimal power flow in voltage/reactive power
control. IEEE Trans Power System 5(4):1447–1454
4. Granville S (1994) Optimal reactive dispatch through interior point methods. IEEE Trans Power
Syst 9(1):136–146
5. Grudinin N (1998) Reactive power optimization using successive quadratic programming
method. IEEE Trans Power Syst 13(4):1219–1225
6. Sinsuphan N (2013) Optimal power flow solution using the improved harmony search method.
Appl Soft Comput 13(5):2364–2374
7. Valipour K (2017) Using a new modified harmony search algorithm to solve multi-objective
reactive power dispatch in deterministic and stochastic models. AI Data Min 5(1):89–100
8. Naidji (2020) Stochastic multi-objective optimal reactive power dispatch considering load and
renewable energy sources uncertainties: a case study of the Adrar isolated power system. Int
Trans Electr Energy Syst 6(30):1–12
9. Farid (2021) A novel power management strategies in PV-wind-based grid connected hybrid
renewable energy system using proportional distribution algorithm. Int Trans Electr Energy
Syst 31(7):1–20
10. Sheila (2021) A novel ameliorated Harris hawk optimizer for solving complex engineering
optimization problems. Int J Intell Syst 36(12):7641–7681
11. Prashant (2021) Design and stability analysis of a control system for a grid-independent direct
current micro grid with hybrid energy storage system. Comput & Electr Eng 93(1):1–15
12. Chen (2017) Optimal reactive power dispatch by improved GSA-based algorithm with the
novel strategies to handle constraints. Appl Soft Comput 50(1):58–70
13. Mei (2017) Optimal reactive power dispatch solution by loss minimization using moth flame
optimization technique. Appl Soft Comput 59(1):210–222
14. Uney (2019) New metaheuristic algorithms for reactive power optimization. Tehnički Vjesnik
26(1):1427–1433
15. Abaci K (2017) Optimal reactive-power dispatch using differential search algorithm. Electr.
Engineering 99(1):213–225
16. Huang (2012) Combined differential evolution algorithm and ant system for optimal reactive
power dispatch. Energy Procedia 14(1):1238–1243
17. Kanatip R, Keerati C (2021) Probabilistic optimal power flow considering load and solar power
uncertainties using particle swarm optimization. GMSARN Int J 15:37–43
18. Inoue (2000) Application of chaos degree to some dynamical systems. Chaos Solitons
Fractals 11(1):1377–1385
19. Salimi (2015) Stochastic fractal search: a powerful metaheuristic algorithm. Knowl-Based Syst
75(1):1–18
20. IEEE (1993) The IEEE-test systems. http://www.ee.washington.edu/trsearch/pstca/
21. Dai C (2009) Seeker optimization algorithm for optimal reactive power dispatch. IEEE Trans
Power Syst 24(3):1218–1231
22. Reddy (2014) Faster evolutionary algorithm based optimal power flow using incremental
variables. Electr Power Energy Syst 54(1):198–210
23. Reddy S (2017) Optimal reactive power scheduling using cuckoo search algorithm. Int J Electr
Comput Engineering 7(5):2349–2356
24. Hussain AN (2018) Modified particle swarm optimization for solution of reactive power
dispatch. Res J Appl Sci, Eng Technol 15(8):316–327
5G Enabled IoT Based Automatic
Industrial Plant Monitoring System
1 Introduction
In modern-day industrial plants, electrical machines, i.e., motors, generators,
transformers, etc., are the prime elements. No industry can run without the use of
electrical machines to drive the system. If an electrical machine fails, it may result
in several consequences, such as a break in production continuity, system failure, or
even a complete shutdown of the plant, and in some cases may even pose a threat of
injury or loss of human life. Thereby, failure of an electrical machine may cost
revenue, production, and product quality, and put the safety of workers at risk.
Figure 1 depicts how electrical machines and automation have become key elements
of modern-day industries.
Therefore, condition monitoring of electrical machine parameters such as vibration,
temperature, current, and voltage becomes important in order to identify the
development of a fault in a machine in time. Condition monitoring plays a vital role
in predictive maintenance: with proper condition monitoring, the necessary
maintenance can be scheduled, ensuring the complete health of the machines.
This will prevent consequential damage to the machine and further implications.
Figure 2 shows a typical industrial setup deployed for condition monitoring of
industrial machines.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 259
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_21
260 K. Shinghal et al.
The proposed system uses a 5G technology based IoT system for communication with
the end servers. Figure 3 depicts the various generations of mobile communication.
5G wireless technologies are growing at a rapid pace and will find numerous
applications in the coming years. They have several advantages over existing 4G
wireless technologies, such as faster network speed and lower latency, i.e., a huge
increase in responsiveness and a smooth experience (a must for real-time
applications). Figure 4 depicts the advantages of using 5G wireless technologies for
IoT based industrial plant monitoring systems.
The rest of the paper is organized as follows: the literature review, problem
identification, and gaps in existing technology are covered in Sect. 2; the proposed
5G enabled monitoring system is presented in Sect. 3, followed by the experimental
setup and methodology in Sect. 4. The results are discussed in Sect. 5, and finally
the conclusion and future work are given in Sect. 6.
Fig. 4 Advantages of using 5G wireless technologies for IoT based industrial plant monitoring
systems
2 Literature Review
Karemore et al., in their paper titled "A review of IoT based smart industrial system
for controlling and monitoring", proposed a framework required in industries for the
controlling, monitoring, security, and safety of various activities. The monitoring
framework incorporates sensors such as fire, smoke, ultrasonic, humidity and
temperature, and current and voltage sensors, with a Wi-Fi module for control
operations; on detection of abnormal activity, suitable control actions are actuated
[1]. Gore et al., in their paper titled "Bluetooth based sensor monitoring in
industrial IoT plants", noted that typical industrial IoT use cases involve acquiring
data from sensor devices in a factory and communicating the data to the internet for
local or remote monitoring and control. They described how Bluetooth Low Energy
(BLE) technology can be used to connect sensor nodes to Internet-based services and
applications using a gateway in an industrial factory [2]. A. Vakaloudis et al., in
their paper titled "A framework for rapid integration of IoT Systems with industrial
environments", proposed a comprehensive end-to-end perspective extending from the
sensor devices to the interface with the end user, where all software and hardware
components of the framework are considered and addressed [3]. Zhao et al., in their
paper titled "Design of an industrial IoT based monitoring system for power
substations", gave a practical application that was implemented and tested in a real
power substation. The framework combines the features of an IoT platform with the
requirements of high-speed real-time applications while utilizing a single
high-resolution time source as the reference for both steady-state and transient
conditions [4]. Picot et al., in their paper titled "Industry 4.0 LabVIEW Based
Industrial Condition Monitoring System for Industrial IoT System", presented a
platform to host varied applications; the industry-standard fieldbus protocol Modbus
TCP was used in conjunction with the LabVIEW development environment, where a
bespoke graphical UI was created to provide control and a visual depiction of the
information gathered. In addition, one of the nodes acted as the output for equipment
displays, which in turn mirrored the alert status of the UI [5]. Khan et al., in
their paper titled "IoT Based Health Monitoring System for Electrical Motors",
presented an Internet of Things (IoT) based system designed for electrical motors.
The motor's health is monitored by measuring parameters such as vibration, current,
and temperature through sensors such as an accelerometer, a current sensor, and a
thermocouple. To avoid the limitations of the internet, the sensor signals were also
transferred to the receiver through the Global System for Mobile communications
(GSM), which also works in areas where the internet is not available [6]. Gore et
al., in their paper titled "IoT based equipment identification and location for
maintenance in large deployment industrial plants", presented a condition monitoring
system that employs sensor fusion and uses the acquired data in health-evaluation
algorithms to distinguish faults. In a standard factory deployment, each machine,
such as a motor, would be accompanied by its own health monitoring unit. A condition
monitoring management system, integrated with the controller in the factory control
room, gathers condition monitoring data from the various sub-systems and generates
automated alerts upon
failure detection [7]. Lyu et al., in their paper titled "5G Enabled Codesign of
Energy-Efficient Transmission and Estimation for Industrial IoT Systems", introduced
a transmission-estimation codesign framework to lay the foundation for guaranteeing
the prescribed estimation accuracy with limited communication resources. The
proposed approach is optimized by formulating a constrained minimization problem, a
mixed-integer nonlinear program, which is solved efficiently with a block coordinate
descent based decomposition technique. Finally, simulation results show that the
proposed approach is superior in improving both estimation precision and energy
efficiency [8].
From the literature review, it is evident that monitoring applications have
increasingly strict real-time responsiveness requirements. For critical machinery
such as assembly lines and conveyor belts, where product is continuously being
supplied, an immediate response on detecting a fault is required to save the product
from damage, and a reliable monitoring infrastructure based on the superior
qualities of a 5G network is needed wherever there is risk to human life in case of
failure.
an Arduino coprocessor on board to connect with the local sensors required for
monitoring the electrical machines, and also with the controller-actuator subunit
for controlling relays, switches, valves, etc. All experiments were conducted in the
laboratory on the same local area network within a radius of 6 m. Figure 6 shows the
laboratory experimental setup for conducting the experiments.
Table 1 outlines the hardware configuration of the IoT node and the 5G wireless gateway.
The local sensors were installed to monitor stator current and temperature.
Fig. 6 Laboratory experimental setup for conditioning monitoring of Induction Motor (IM)
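The node-side logic for monitoring stator current and temperature can be as simple as a per-sample threshold check (a hypothetical sketch: the parameter names, limit values, and alert format are invented for illustration and are not taken from the deployed system):

```python
# Hypothetical limits for a healthy induction motor; real values depend on the machine.
LIMITS = {"stator_current_A": 120.0, "winding_temp_C": 105.0}

def check_sample(sample):
    """Return the list of parameters whose readings exceed their limits."""
    return [name for name, value in sample.items()
            if name in LIMITS and value > LIMITS[name]]

# A sample with normal current but an overheated winding triggers one alert:
print(check_sample({"stator_current_A": 96.0, "winding_temp_C": 118.0}))
# ['winding_temp_C']
```

In the proposed setup, such alerts would be forwarded over the 5G gateway to the end servers for logging and operator notification.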
Experimental studies of eighty-four (84) cases have been carried out, out of which
eight (08) cases have been reported here for rotor fault detection. The specifications
of Induction Motors (IM) under observation are tabulated in Table 2. The current
patterns for all the cases are shown in Figs. 7, 8, 9, 10, 11, 12, 13, 14. The rating of
the IM is varying from 180 − 661 KW.
The proposed setup was evaluated using a prototype system deployed in the laboratory
to study the behavior of the 5G based IoT enabled plant monitoring system. Latency,
in terms of actuator and control subunit response time, and reliability were
evaluated. Earlier, such highly reliable, low-latency operation was considered
achievable only through wired connections. The use of 5G based wireless technologies
enabled the development of wireless condition monitoring systems using IoT. This
helps manufacturers increase productivity along with the safety and reliability of
complete systems (Table 3).
It can be observed from the results shown in Fig. 15 that the latency for case 1 and
case 8 is maximum, i.e., 163 and 160 ms, respectively. Even this worst-case latency
of the 5G IoT network is approximately 15% better than standard 4G network
latency [14, 15]. Further, from Table 4 it is observed that resource utilization is
higher in case 1 and case 8, and the required storage is of the solid-state type, which
is costlier than conventional storage devices but faster and more reliable.
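As a rough arithmetic check (the 4G baseline below is implied by the ~15% figure above, not stated in the text), the worst-case numbers can be related as follows:

```python
# Worst-case 5G IoT latency observed in the experiments (Fig. 15).
worst_case_5g_ms = 163.0
improvement = 0.15  # claimed improvement over a standard 4G network [14, 15]

# If 163 ms is ~15% lower than the 4G latency, the implied 4G baseline is:
implied_4g_ms = worst_case_5g_ms / (1 - improvement)
print(round(implied_4g_ms, 1))  # 191.8 ms
```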
Acknowledgements The authors are thankful to Prof. Rohit Garg, Director MIT & the Manage-
ment of MITGI for constant motivation and support.
References
1. Karemore P, Jagtap PP (2020) A review of IoT based smart industrial system for controlling
and monitoring. In: 2020 Fourth International Conference on Computing Methodologies and
Communication (ICCMC). Erode, India, pp 67–69. https://doi.org/10.1109/ICCMC48092.2020.ICCMC-00012
270 K. Shinghal et al.
2. Gore RN, Kour H, Gandhi M, Tandur D, Varghese A (2019) Bluetooth based Sensor Monitoring
in Industrial IoT Plants. In: 2019 International Conference on Data Science and Communication
(IconDSC). Bangalore, India, pp 1–6. https://doi.org/10.1109/IconDSC.2019.8816906
3. Vakaloudis A, O’Leary C (2019) A framework for rapid integration of IoT Systems with
industrial environments. In: 2019 IEEE 5th World Forum on Internet of Things (WF-IoT).
Limerick, Ireland, pp 601-605. https://doi.org/10.1109/WF-IoT.2019.8767224
4. Zhao L, Matsuo, Zhou Y, Lee W (2019) Design of an Industrial IoT-Based Monitoring System
for Power Substations. In: 2019 IEEE/IAS 55th Industrial and Commercial Power Systems
Technical Conference (I&CPS). Calgary, AB, Canada, pp 1-6. https://doi.org/10.1109/ICPS.
2019.8733348
5. Picot HW, Ateeq M, Abdullah B, Cullen J (2019) Industry 4.0 LabVIEW Based Industrial
Condition Monitoring System for Industrial IoT System. In: 2019 12th International Conference
on Developments in eSystems Engineering (DeSE). Kazan, Russia, pp 1020–1025. https://doi.
org/10.1109/DeSE.2019.00189
6. Khan N, Rafiq F, Abedin F, Khan FU (2019) IoT based health monitoring system for electrical
motors. In: 2019 15th International Conference on Emerging Technologies (ICET). Peshawar,
Pakistan, pp 1–6. https://doi.org/10.1109/ICET48972.2019.8994398
7. Gore RN, Kour H, Gandhi M (2018) IoT based equipment identification and location for
maintenance in large deployment industrial plants. In: 2018 10th International Conference on
Communication Systems & Networks (COMSNETS). Bengaluru, pp 461–463. https://doi.org/
10.1109/COMSNETS.2018.8328244
8. Lyu L, Chen C, Zhu S, Guan X (2018) 5G enabled codesign of energy-efficient transmission
and estimation for industrial IoT systems. IEEE Trans Industr Inf 14(6):2690–2704. https://doi.org/10.1109/TII.2018.2799685
9. Acharya V, Hegde VV, Anjan K, Kumar M (2017) IoT (Internet of Things) based efficiency
monitoring system for bio-gas plants. In: 2017 2nd International Conference on Computational
Systems and Information Technology for Sustainable Solution (CSITSS). Bangalore, pp 1–5.
https://doi.org/10.1109/CSITSS.2017.8447567
10. Shyamala D, Swathi D, Prasanna JL, Ajitha A (2017) IoT platform for condition monitoring of
industrial motors. In: 2017 2nd International Conference on Communication and Electronics
Systems (ICCES). Coimbatore, pp. 260–265. https://doi.org/10.1109/CESYS.2017.8321278
11. Zhang F, Liu M, Zhou Z, Shen W (2016) An IoT-based online monitoring system for continuous
steel casting. IEEE Internet Things J 3(6):1355–1363. https://doi.org/10.1109/JIOT.2016.2600630
12. Rahman A, Hossain MRT, Siddiquee MS (2021) IoT based bidirectional speed control and
monitoring of single phase induction motors. In: Vasant P, Zelinka I, Weber GW (eds) Intelligent
computing and optimization. ICO 2020. Advances in intelligent systems and computing, vol
1324. Springer, Cham. https://doi.org/10.1007/978-3-030-68154-8_88
13. Kannan R, Solai Manohar S, Senthil Kumaran M (2019) IoT-based condition monitoring and
fault detection for induction motor. In: Krishna C, Dutta M, Kumar R (eds) Proceedings of
2nd international conference on communication, computing and networking. Lecture notes in
networks and systems, vol 46. Springer, Singapore. https://doi.org/10.1007/978-981-13-1217-5_21
14. Khanaa V (2013) 4G technology. Int J Eng Comput Sci 2(2)
15. Gopal BG (2015) A comparative study on 4G and 5G technology for wireless applications.
IOSR J Electron Commun Eng (IOSR-JECE) 10(6)
Criterion to Determine the Stability
of Systems with Finite Wordlength
and Delays Using Bessel-Legendre
Inequalities
1 Introduction
During the design of controllers for robots, much of the hardware employed is
based on fixed-point representation of data. Usually, fixed-point hardware has a
limited wordlength, known as finite wordlength. Further, many mobile robot
systems are controlled using wired or wireless control, as in the case of drones.
Propagation delays may arise during the control of such mobile robots. The
presence of delays and the finite wordlength nature of the hardware employed may
lead to instabilities in the system. This paper is concerned with the instabilities that
arise in discrete systems during their digital implementation and due to the time-
varying delays present in the system. Due to the limited wordlength employed,
overflow arises in the digital implementation of discrete systems. To overcome
overflow, the saturation finite wordlength nonlinearity is widely employed [2–4, 10–12].
The delays are another source of instability in a system. Various summation
inequalities such as Jensen, Reciprocally Convex and Wirtinger have been employed
to deal with the sum terms that arise in the forward difference of Lyapunov functions
[1, 6, 9].
The system considered in this paper represents a class of systems under the influ-
ence of finite wordlength nonlinearities and time-varying delays. Such systems have
been studied for example in [2, 10, 14, 15]. In [2], a delay-dependent stability crite-
rion was proposed for discrete systems with saturation nonlinearities, time-varying
delays and uncertainties. Free-weighting matrix method was employed to obtain the
criterion. The delay-partitioning method was employed in [14] to obtain less con-
servative results as compared to [2]. Further improvement in conservativeness was
reported in a criterion presented in [15]. The nonlinear characterization was similar
to [2, 14], the improvement was due to Wirtinger-based inequality employed to deal
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 271
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_22
272 R. Nigam and S. K. Tadepalli
with the sum terms in the forward difference of the Lyapunov function. In [10], the
problem was extended for the case of two-dimensional discrete systems represented
by the Fornasini-Marchesini Second Local State Space (FMSLSSS) model.
Through better nonlinear characterization and employing better summation
inequalities, there is still further scope to obtain less conservative results. This is
the motivation behind the work presented in this paper.
The paper is organized as follows: Sect. 2 describes the system and specifies the
lemmas employed to obtain the main results; Sect. 3 presents the main results of the
paper; a numerical example is provided in Sect. 4, where comparisons are made with
previous works available in the literature.
2 Problem Formulation
where x(ι) ∈ Rⁿ is the vector representing the system state; A, A_d ∈ R^(n×n) are the system
matrices; Φ(ι) ∈ Rⁿ represents the initial condition at time ι; F(·) is the saturation
nonlinear function; and τ(ι) is a time-varying delay satisfying τ(ι) ∈ [τ₁, τ₂].
The following Lemmas have been used for obtaining the main results of the paper.
Σ_{i=c}^{d−1} η^T(i) N η(i) ≥ (1/(d − c)) G_r^T(c, d − 1) ω_r(N) G_r(c, d − 1),    (2)
where
where

J_r(a, b) = x(a),  if r = −1
J_r(a, b) = Σ_{i_{r+1}=a}^{b} ⋯ Σ_{i_2=i_3}^{b} Σ_{i_1=i_2}^{b} x(i_1),  if r ≥ 0    (6)

a_l^ι = (−1)^{ι+l} C(ι, l) C(ι + l, l), where C(ι, l) represents the binomial coefficient given by ι!/((ι − l)! l!).
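The coefficients a_l^ι above are the discrete Legendre polynomial coefficients; a small sketch (ours, not the paper's) evaluating them with the binomial formula:

```python
import math

def legendre_coeff(iota: int, l: int) -> int:
    """a_l^iota = (-1)^(iota + l) * C(iota, l) * C(iota + l, l)."""
    return (-1) ** (iota + l) * math.comb(iota, l) * math.comb(iota + l, l)

# C(iota, l) agrees with the factorial formula iota! / ((iota - l)! l!).
assert math.comb(5, 2) == math.factorial(5) // (math.factorial(3) * math.factorial(2))

print(legendre_coeff(0, 0))  # 1
print(legendre_coeff(2, 1))  # (-1)^3 * C(2, 1) * C(3, 1) = -6
```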
3 Main Results
Theorem 1 For a time-varying delay τ(ι) and nonnegative integer r, the system
described by (1) is asymptotically stable if there exist positive definite matrices
P_r ∈ R^((r+2)n×(r+2)n), Q₁, Q₂, R₁, R₂ ∈ R^(n×n), matrices S₁, S₂ ∈ R^(4n×2n) and matrices M,
N and Q such that
[Ξ(τ₁)  E₂; *  −ω₁(R₂)] < 0,    [Ξ(τ₂)  E₁; *  −ω₁(R₂)] < 0    (8)
di ≥ φi , i = 1, 2, . . . n (9)
where
di = min {(nii + ks mii ), (nii + mii )} , i = 1, 2, . . . , n (10)
φ_i = Σ_{s=1, s≠i}^{n} max{|n_si + k_s m_si|, |n_si + m_si|} + Σ_{s=1}^{n} |q_si|,  i = 1, 2, …, n    (11)
k_s denotes Σ_{m=1}^{2n} |a_sm|, where a_sm is the element of Ā = [A  A_d].
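Conditions (9)-(11) are elementwise checks on the matrices involved; a hedged sketch (illustrative data, not the paper's) of how they could be evaluated:

```python
# Sketch: evaluating conditions (9)-(11) for given n x n matrices N, M, Q
# and gain k_s. Data below is hypothetical, chosen only to illustrate the check.
def check_condition(N, M, Q, ks):
    n = len(N)
    for i in range(n):
        d_i = min(N[i][i] + ks * M[i][i], N[i][i] + M[i][i])            # (10)
        phi_i = sum(max(abs(N[s][i] + ks * M[s][i]), abs(N[s][i] + M[s][i]))
                    for s in range(n) if s != i)
        phi_i += sum(abs(Q[s][i]) for s in range(n))                     # (11)
        if d_i < phi_i:                                                  # (9) violated
            return False
    return True

# Hypothetical, strongly diagonal matrices (illustrative only):
N = [[4.0, 0.1], [0.1, 4.0]]
M = [[1.0, 0.05], [0.05, 1.0]]
Q = [[0.2, 0.1], [0.1, 0.2]]
print(check_condition(N, M, Q, ks=0.5))  # True
```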
Proof We employ the Lyapunov functional V(ι) of [5] to obtain the stability
criterion.
Let
Λ_r(ι) = χ_r^T(τ(ι)) ξ_r(ι),    ΔΛ_r(ι) = U_r^T ξ_r(ι)    (19)
where C(·, ·) denotes the binomial coefficient and

U_r = [Ū₀, Ū₁, Ū₂, …, Ū_r]    (20)
Ū₀ = [e_{r+7} − e₁, e₂ − e₄]    (21)
Ū₁ = e₁ − e₂    (22)
Ū_r = C(τ₁ + r − 1, r − 1)(e₁ − e_{r+5})  for r ≥ 2    (23)
χ_r(τ(ι)) = [χ̄₀(τ(ι)), χ̄₁(τ(ι)), χ̄₂(τ(ι)), …, χ̄_r(τ(ι))]    (24)
χ̄₀(τ(ι)) = [e₁, (τ(ι) − τ₁ + 1)e₅ − e₂ + (τ₂ − τ(ι) + 1)e₆ − e₃]    (25)
χ̄_r(τ(ι)) = C(τ₁ + r, r) e_{r+6} − C(τ₁ + r − 1, r − 1) e₁  for r ≥ 1    (26)
ξ_r^T(ι) = [x^T(ι), x^T(ι − τ₁), x^T(ι − τ(ι)), x^T(ι − τ₂), Γ_{2,1}^T(ι), Γ_{3,1}^T(ι), F^T(y(ι))],  if r = 0    (27)

ξ_r^T(ι) = [x^T(ι), x^T(ι − τ₁), x^T(ι − τ(ι)), x^T(ι − τ₂), Γ_{2,1}^T(ι), Γ_{3,1}^T(ι), Γ_{1,1}^T(ι), Γ_{1,2}^T(ι), …, Γ_{1,r}^T(ι), F^T(y(ι))],  if r ≥ 1    (28)
where

Γ_{1,r}(ι) = (r!/(τ₁ + 1)^{r̄}) J_{r−1}(ι − τ₁, ι)    (29)
Γ_{2,1}(ι) = (r!/(τ(ι) − τ₁ + 1)^{r̄}) J_{r−1}(ι − τ(ι), ι − τ₁)    (30)
Γ_{3,1}(ι) = (r!/(τ₂ − τ(ι) + 1)^{r̄}) J_{r−1}(ι − τ₂, ι − τ(ι))    (31)

with (·)^{r̄} denoting the rising factorial.
Γ_{2,1} = (1/(τ(ι) − τ₁ + 1)) Σ_{j₁=ι−τ(ι)}^{ι−τ₁} x(j₁)    (32)

Γ_{3,1} = (1/(τ₂ − τ(ι) + 1)) Σ_{j₁=ι−τ₂}^{ι−τ(ι)} x(j₁)    (33)

Γ_{1,1} = (1/(τ₁ + 1)) Σ_{j₁=ι−τ₁}^{ι} x(j₁)    (34)

Γ_{1,2} = (2/((τ₁ + 1)(τ₁ + 2))) Σ_{j₂=ι−τ₁}^{ι} Σ_{j₁=j₂}^{ι} x(j₁)    (35)

Γ_{1,3} = (6/((τ₁ + 1)(τ₁ + 2)(τ₁ + 3))) Σ_{j₃=ι−τ₁}^{ι} Σ_{j₂=j₃}^{ι} Σ_{j₁=j₂}^{ι} x(j₁)    (36)

Γ_{1,4} = (24/((τ₁ + 1)(τ₁ + 2)(τ₁ + 3)(τ₁ + 4))) Σ_{j₄=ι−τ₁}^{ι} Σ_{j₃=j₄}^{ι} Σ_{j₂=j₃}^{ι} Σ_{j₁=j₂}^{ι} x(j₁)    (37)

Γ_{1,5} = (120/((τ₁ + 1)(τ₁ + 2)(τ₁ + 3)(τ₁ + 4)(τ₁ + 5))) Σ_{j₅=ι−τ₁}^{ι} Σ_{j₄=j₅}^{ι} Σ_{j₃=j₄}^{ι} Σ_{j₂=j₃}^{ι} Σ_{j₁=j₂}^{ι} x(j₁).    (38)
This yields, in particular,

ΔV₃(ι) = ξ_r^T(ι) Ω₂ ξ_r(ι) − τ₁ Σ_{i=ι−τ₁}^{ι−1} x^T(i) Q₂ x(i) − τ₁₂ Σ_{i=ι−τ₂}^{ι−τ(ι)−1} x^T(i) R₂ x(i) − τ₁₂ Σ_{i=ι−τ(ι)}^{ι−τ₁−1} x^T(i) R₂ x(i)

where τ₁₂ = τ₂ − τ₁.
Using Lemma 2,

−ξ_r^T(ι) [ς₁ ς₂] diag((1/α) ω₁(R₂), (1/(1 − α)) ω₁(R₂)) [ς₁ ς₂]^T ξ_r(ι)
  ≤ −ξ_r^T(ι) [ς₁ ς₂] {He(S₁[I_n 0_n] + S₂[0_n I_n]) − α S₁ ω₁(R₂)^{−1} S₁^T − (1 − α) S₂ ω₁(R₂)^{−1} S₂^T} [ς₁ ς₂]^T ξ_r(ι).    (54)

We further obtain

  ≤ −ξ_r^T(ι) [ς₁ ς₂] diag((1/α) ω₁(R₂), (1/(1 − α)) ω₁(R₂)) [ς₁ ς₂]^T ξ_r(ι)    (55)
  ≤ ξ_r^T(ι) Ω₃ ξ_r(ι) + ξ_r^T(ι) [α E₁ ω₁(R₂)^{−1} E₁^T + (1 − α) E₂ ω₁(R₂)^{−1} E₂^T] ξ_r(ι)    (56)
where
Ω₃ = −He(E₁ ς₁^T + E₂ ς₂^T)    (57)
E₁ = [ς₁ ς₂] S₁    (58)
E₂ = [ς₁ ς₂] S₂    (59)
ς₁ = [e₂ − e₃, e₂ + e₃ − 2e₅]
ς₂ = [e₃ − e₄, e₃ + e₄ − 2e₆].    (60)
Since the remaining term is a positive quantity [12], ΔV(ι) ≤ 0 if and only if
Ξ(τ(ι)) < 0. Here, Ξ(τ(ι)) < 0 implies Ξ(τ₁) < 0 and Ξ(τ₂) < 0. Using the Schur
complement would yield the LMIs
[Ξ(τ₁)  E₂; *  −ω₁(R₂)] < 0,    [Ξ(τ₂)  E₁; *  −ω₁(R₂)] < 0.    (66)
4 Numerical Example
Example 1 Consider the system (1) with the following system parameters:
A = [0.68  −0.45; 0.45  0.65],    A_d = [−0.1  −0.2; −0.2  −0.1].    (67)
We find the upper delay bound τ2 for a given lower delay bound τ1 to compare the
conservativeness of the presented criterion with the previously reported criterion.
Here we use SeDuMi solver [13] and YALMIP parser [8] along with MATLAB to
obtain the results.
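Before invoking the solver, a quick necessary check (a sketch of ours, not part of the reported procedure) is that the zero-delay dynamics x(ι + 1) = (A + A_d)x(ι) of Example 1 are Schur stable:

```python
import math

A = [[0.68, -0.45], [0.45, 0.65]]
Ad = [[-0.1, -0.2], [-0.2, -0.1]]

# Zero-delay dynamics x(i+1) = (A + Ad) x(i): a necessary condition for any
# delay-dependent criterion to succeed is that this matrix is Schur stable.
M = [[A[r][c] + Ad[r][c] for c in range(2)] for r in range(2)]
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = tr * tr - 4 * det

# Eigenvalue moduli of a real 2x2 matrix from its trace and determinant.
if disc >= 0:
    rho = max(abs((tr + math.sqrt(disc)) / 2), abs((tr - math.sqrt(disc)) / 2))
else:
    rho = math.sqrt(det)  # complex pair: |lambda| = sqrt(det)

print(rho < 1.0)  # True: rho ~ 0.694, inside the unit circle
```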
Table 1 presents the upper delay bound for various lower delay bounds. 'X' denotes
the inability of a criterion to determine the stability of the system under consideration.
It can be observed that Theorem 2 of [15] and Theorem 3.1 of [2] are unable to test
the stability of the system. Theorem 1 is used with different values of r, which yields
different results: as r increases, conservativeness decreases and the criterion is able
to determine the stability of the system.
5 Conclusion
This paper presented a stability criterion for discrete systems with saturation finite
wordlength nonlinearity and time-varying delays. The criterion is based on the
Bessel-Legendre summation inequalities. With the help of a numerical example,
the reduced conservativeness of the proposed criterion was demonstrated.
References
1. Hien LV, Trinh H (2016) New finite-sum inequalities with applications to stability of discrete
time-delay systems. Automatica 71:197–201
2. Kandanvli VKR, Kar H (2013) Delay-dependent stability criterion for discrete-time uncertain
state-delayed systems employing saturation nonlinearities. Arab J Sci Eng 38(10):2911–2920
3. Kokil P, Jogi S, Ahn CK, Kar H (2020) An improved local stability criterion for digital filters with interference and overflow nonlinearity. IEEE Trans Circuits Syst II Express Briefs
67(3):595–599. https://doi.org/10.1109/TCSII.2019.2918788
4. Kokil P, Jogi S, Ahn CK, Kar H (2021) Stability of digital filters with state-delay and external
interference. Circuits Syst Signal Process. https://doi.org/10.1007/s00034-021-01650-8
5. Lee SY, Park J, Park P (2018) Bessel summation inequalities for stability analysis of discrete-
time systems with time-varying delays. Int J Robust Nonlinear Control 29(2):473–491
6. Liu J, Zhang J (2012) Note on stability of discrete-time time-varying delay systems. IET Control
Theory Appl 6(2):335–339
7. Liu K, Seuret A, Xia Y (2017) Stability analysis of systems with time-varying delays via the
second-order Bessel-Legendre inequality. Automatica 76:138–142
8. Lofberg J (2004) Yalmip: a toolbox for modeling and optimization in MATLAB. In: Proceedings of computer aided control systems design conference, Taipei, Taiwan, pp 284–289
9. Nam PT, Pathirana PN, Trinh H (2015) Discrete Wirtinger-based inequality and its application.
J Franklin Inst 352:1893–1905
10. Pandey S, Tadepalli SK (2021) Improved criterion for stability of 2-D discrete systems involving
saturation nonlinearities and variable delays. ICIC Express Lett 15(3):273–283
11. Rani P, Kumar MK, Kar H (2019) Hankel norm performance of interfered fixed-point state-
space digital filters with quantization/overflow nonlinearities. Circuits Syst Signal Process
38:3762–3777
12. Shen T, Yuan Z, Wang X (2012) Stability analysis for digital filters with multiple saturation
nonlinearities. Automatica 48(10):2717–2720
13. Sturm J (1999) Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric
cones. Optim Methods Softw 11–12:625–653, version 1.05. http://fewcal.kub.nl/sturm
14. Tadepalli SK, Kandanvli VKR (2016) Improved stability results for uncertain discrete-time
state-delayed systems in the presence of nonlinearities. Trans Inst Meas Control 38(1):33–43
15. Tadepalli SK, Kandanvli VKR, Vishwakarma A (2018) Criteria for stability of uncertain
discrete-time systems with time-varying delays and finite wordlength nonlinearities. Trans
Inst Meas Control 40(9):2868–2880
Adaptive Control for Stabilization of Ball
and Beam System Using H∞ Control
Sudhir Raj
1 Introduction
Control of the ball and beam system is an interesting problem in control theory.
The proposed non-linear controller is applied for the stabilization of this underactuated
system. H∞-based adaptive control can be applied for the control of underactuated
systems. The problem considered is the ball and beam system, and the objective is to
develop a controller for the stabilization of underactuated systems.
The method proposed in [1] combines a state feedback controller with observer-
based control for the stabilization of the ball beam system. A state-dependent saturation
controller [2] is used for the stabilization of the ball and beam system. An energy-shaping
inverse Lyapunov controller [3] is applied for the control of the ball beam system.
Convex-optimization-based optimal control [4] is carried out for the stabilization
of the ball beam system. Three different controllers [5] are applied for the control of
the ball beam system, with experimental results presented to validate the controllers.
A static and dynamic sliding mode controller [6] is applied for the control of
the ball beam system, which avoids chattering in the system. A fuzzy sliding mode
controller with fuzzy ant colony optimization [7] is proposed for the control of the ball
beam system. Interpolating sliding mode observer-based control [8] is carried out for
the stabilization of the ball beam system. Adaptive control [9] based on a recurrent
neural network is applied for the stabilization of the ball and beam system, giving
better performance compared with the linear quadratic regulator. A passivity-based
controller [10] is applied for the control of the ball beam system. An input-output
linearization-based controller [11] is carried out for the stabilization of the ball
beam system. The algebraic Riccati equation approach [12] is applied to H∞-based
state feedback control. Adaptive sliding mode control [13] is proposed for non-linear
underactuated systems.
S. Raj (B)
SRM University, Amaravati, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 283
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_23
H∞-based adaptive control was not reported in earlier work for the stabilization
of the ball and beam system. The objective of this controller is to bring the states of
the system to the origin after a perturbation. The control objective is to find a
decentralized control that will bring an arbitrary initial state to the equilibrium point
of the system. The main contribution of this work is to develop a robust controller
for the stabilization of underactuated systems.
The control of the ball beam system is difficult due to its non-linear dynamics.
Figure 1 shows the diagram of the ball and beam system. The ball rolls on the beam
and the rotation of the beam is controlled by the motor. H∞-based adaptive control
is applied so that the position of the ball can be controlled.
The state space model of the ball and beam system is given by Eq. (1). The
disturbance for the ball and beam system is taken as w₁.
The equation of the ball beam for force balance can be written as
Fb = Mball gsinθ − Fr
= Mball ẍ + b1 ẋ (2)
x is taken as the vertical distance between the center of the ball and the center of the
shaft. b₁ is taken as the friction constant. θ gives the beam's tilt angle from the horizontal
position. F_r gives the value of the force which is applied externally. The position of
the ball is given by Eq. (3). The rotational angle and radius of the ball are
taken as α and a₁, respectively.
x = αa1 (3)
J_ball = (2/5) M_ball R_b²    (5)
Equation (6) can be derived from Eqs. (2)–(5).
(1 + (2/5)(R_b/a₁)²) ẍ + (b₁/M_ball) ẋ = g sin θ    (6)
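Linearizing (6) about θ = 0 (sin θ ≈ θ, b₁ = 0) gives ẍ = [g/(1 + (2/5)(R_b/a₁)²)] θ. The 3.7731 entry appearing in the numeric state matrix below is reproduced if R_b/a₁ = 2, an assumed ratio consistent with that matrix (Table 1 is not reproduced here):

```python
g = 9.81      # gravitational acceleration, m/s^2
ratio = 2.0   # assumed Rb/a1, chosen to match the 3.7731 entry of the state matrix

# Linearized ball acceleration gain: x_ddot = K_theta * theta
K_theta = g / (1.0 + 0.4 * ratio ** 2)
print(round(K_theta, 4))  # 3.7731
```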
K is the electromotive force constant as taken in equation number (7). I gives the
current which flows in the motor. The term b is taken as the damping constant of the
rotational system. The torque by the ball and beam is given by the equation numbers
(8) and (9), respectively.
Equation (13) is found by combining Newton's law with Kirchhoff's law, since
the DC motor is armature-controlled.
L (dI/dt) + R I = V − K_e θ̇    (13)
L is taken as the armature inductance. K_e is the motor constant and R is taken
as the armature resistance. Equation (14) can be derived by rearranging Eq. (13).
İ = (V − R I − K_e θ̇)/L    (14)
In state-space form, with X = [x  ẋ  θ  θ̇  I]^T, the model Ẋ = AX + BV has

A = [ 0,  1,  0,  0,  0;
      0,  0,  g/(1 + (2/5)(R_b/a₁)²),  0,  0;
      0,  0,  0,  1,  0;
      −M_ball g/J_bm,  0,  0,  −b/J_bm,  K/J_bm;
      0,  0,  0,  −K_e/L,  −R/L ],

B = [0  0  0  0  1/L]^T
y = [1 0 0 0 0; 0 0 1 0 0] X    (17)
Equation (18) is derived from Eq. (13) since the armature inductance is very
small.
V = R I + K e θ̇ (18)
Equation (20) is derived from Eqs. (19) and (6), assuming that the friction
constant b₁ is zero.
[ẋ; ẍ; θ̇; θ̈] = [ 0,  1,  0,  0;
                  0,  0,  g/(1 + (2/5)(R_b/a₁)²),  0;
                  0,  0,  0,  1;
                  −M_ball g/J_bm,  0,  0,  −(K K_e/R + b)/J_bm ] [x; ẋ; θ; θ̇]
                + [0; 0; 0; K/(R J_bm)] V    (20)
y = [1 0 0 0; 0 0 1 0] X    (21)
The system parameters are given in Table 1. The system parameters are substituted
in system equations.
[ẋ; ẍ; θ̇; θ̈] = [ 0,  1,  0,  0;
                  0,  0,  3.7731,  0;
                  0,  0,  0,  1;
                  −5.170,  0,  0,  −105.1 ] [x; ẋ; θ; θ̇] + [0; 0; 0; 16.85] V
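With the numeric matrices above, a quick check (ours, not part of the paper) confirms the linearized model is controllable, so a stabilizing state feedback exists:

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 3.7731, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-5.170, 0.0, 0.0, -105.1]])
B = np.array([[0.0], [0.0], [0.0], [16.85]])

# Controllability matrix [B, AB, A^2 B, A^3 B] must have full rank n = 4.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
print(np.linalg.matrix_rank(ctrb))  # 4
```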
P is a positive definite solution of the Algebraic Riccati Equation and the system
becomes stable.
u₀ = −B₂₁^T P x₁    (23)
Equation number (24) gives the sliding surface for ball beam system. G is same as
+ +
B21 , and B21 is taken as the Pseudoinverse of matrix B21 .
s(x, t) = G[x₁(t) − x₁(t₀) − ∫_{t₀}^{t} (A₁ − B₂₁ B₂₁^T P) x₁(t) dt]    (24)
The terms u eq and u sw are taken as the equivalent and switching control, respectively.
u 1 = u eq + u sw (26)
The derivative of the sliding surface is taken as zero to find the equivalent control.
G[B₂₁ u₁ + B₁₁ w₁ + B₂₁ B₂₁^T P x₁] = 0
u_eq = −B₂₁^T P x₁ − (G B₂₁)^{−1} G B₁₁ w₁    (27)
The switching control law is found using the Lyapunov theorem. The Lyapunov
function is taken as equation number (28).
V = s^T s / 2    (28)
Equation number (29) gives the condition for the convergence of the sliding mode.
The term η is a positive constant.
V̇ = s ṡ ≤ −η|s| (29)
V̇ = s ṡ
  = s G[B₂₁ u₁ + B₁₁ w₁ + B₂₁ B₂₁^T P x₁]
  = s G[B₂₁ (u_eq + u_sw) + B₁₁ w₁ + B₂₁ B₂₁^T P x₁]
  = s G[B₂₁ u_eq + B₂₁ u_sw + B₁₁ w₁ + B₂₁ B₂₁^T P x₁]
  = s[−G B₂₁ B₂₁^T P x₁ − G B₂₁ (G B₂₁)^{−1} G B₁₁ w₁]
    + s[G B₂₁ u_sw + G B₁₁ w₁ + G B₂₁ B₂₁^T P x₁]
  = s[G B₂₁ u_sw]    (30)
The switching control law for stabilization of ball and beam system can be found
by the equation number (31).
V̇ = s[G B₂₁ u_sw] ≤ −η|s|
u_sw ≤ −η sign(s)/(G B₂₁)
u sw = −u 0 sign (s) (31)
The sliding mode control input is given by equation number (32) for the stabi-
lization of ball and beam system.
u₁ = −B₂₁^T P x₁ − u₀ sign(s)    (32)
ΔA₁ is taken as the uncertainty in the system matrix A₁. Equation (33)
gives the state space equation of the ball and beam system.
The reaching condition for the ball and beam system can be taken as Eq. (35).
s ṡ < 0
s G[ΔA₁ x₁ − B₂₁ u₀ sign(s) + B₁₁ w₁] < 0
s[B₂₁^+ ΔA x₁ + B₂₁^+ B₁₁ w − u₀ sign(s)] < 0
‖B₂₁^+ ΔA x₁‖ ‖s‖ + ‖B₂₁^+ B₁₁‖ γ ‖s‖ − u₀ ‖s‖ < 0    (35)
V = s^T s / 2

V̇ = s ṡ
  ≤ ‖B₂₁^+ ΔA x₁‖ ‖s‖ + ‖B₂₁^+ B₁₁‖ γ ‖s‖ − u₀ ‖s‖
  = ‖B₂₁^+ ΔA x₁‖ ‖s‖ + ‖B₂₁^+ B₁₁‖ γ ‖s‖
    − (‖B₂₁^+ ΔA x₁‖ + ‖B₂₁^+ B₁₁‖ γ + v) ‖s‖
  = −v ‖s‖ < 0    (37)
The negative sign of the derivative of the Lyapunov function V ensures the stabi-
lization of the ball and beam system.
H∞ -based adaptive control is proposed. Lyapunov theory is used to verify the pro-
posed controller. The modified control law can be written as
u = u eq + u sm (38)
The term u_eq is the same as in Eq. (27) used for the nominal system. The adaptive
term, now denoted u_sm, is modified as

u_sm = −ρ̂ sgn(s)    (39)
dρ̂/dt = |s|/α    (40)

The term ρ̂ is an adjustable gain. The term α is the adaptation gain and
α > 0. The adaptation speed of ρ̂ can be tuned by α. The adaptation error is defined
as Eq. (41):

ρ̃ = ρ̂ − ρ_d    (41)
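In discrete time, the adaptation law (40) can be sketched as a simple Euler update (illustrative values only, not from the paper): the gain estimate grows while |s| is large and freezes as the trajectory reaches the sliding surface.

```python
# Euler discretization of the adaptation law d(rho_hat)/dt = |s| / alpha.
alpha = 10.0   # adaptation gain (assumed value)
dt = 0.01      # integration step, s
rho_hat = 0.0  # initial gain estimate

# Illustrative sliding-variable profile: |s| decays as the surface is reached.
for k in range(1000):
    s = 1.0 * (0.99 ** k)            # stand-in for the measured sliding variable
    rho_hat += dt * abs(s) / alpha   # gain increases only while |s| > 0

print(rho_hat > 0)  # True: the gain has adapted upward
```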
Equation number (42) gives the Lyapunov function for the modified controller.
V = s^T s / 2 + (1/2) α ρ̃²    (42)
The derivative of the sliding surface can be taken as equation number (43).
V̇ = s ṡ + α ρ̃ (dρ̃/dt)

ṡ = G[A₁ x₁ + ΔA₁ x₁ + B₂₁ u₁ + B₁₁ w₁] + G[−A₁ x₁ + B₂₁ B₂₁^T P x₁]
  = G[A₁ x₁ + ΔA₁ x₁ + B₂₁ (u_eq + u_sm) + B₁₁ w₁] + G[−A₁ x₁ + B₂₁ B₂₁^T P x₁]
  = G[A₁ x₁ + ΔA₁ x₁] + G B₂₁[−B₂₁^T P x₁ − (G B₂₁)^{−1} G B₁₁ w₁ − ρ̂ sgn(s)]
    + G[B₁₁ w₁ − A₁ x₁ + B₂₁ B₂₁^T P x₁]
  = G[A₁ x₁ + ΔA₁ x₁ − B₂₁ B₂₁^T P x₁] + G[−B₂₁ (G B₂₁)^{−1} G B₁₁ w₁ − B₂₁ ρ̂ sgn(s)]
    + G[B₁₁ w₁ − A₁ x₁] + G B₂₁ B₂₁^T P x₁
  = G[ΔA₁ x₁ − B₂₁ ρ̂ sgn(s)]

s ṡ + α ρ̃ (dρ̃/dt) = s G[ΔA₁ x₁ − B₂₁ ρ̂ sgn(s)] + α (ρ̂ − ρ_d)(dρ̂/dt)
  = s G[ΔA₁ x₁ − B₂₁ ρ̂ sgn(s)] + s (ρ̂ − ρ_d) sgn(s)
  = s G ΔA₁ x₁ − s ρ_d sgn(s)
  = s (G ΔA₁ x₁ − ρ_d sgn(s))
  = s (G ΔA₁ x₁) − ρ_d |s| < 0    (43)
4 Simulation Results
Simulation of the ball and beam system was carried out in MATLAB, with the
parameter values taken from Table 1. Two initial states are considered for the ball
and beam system:
X 0 = [1.2, 0, 0, 0]T
X 1 = [0.09, 0, 0.0873, 0]T
The proposed controller is applied for the stabilization of the ball and beam system.
Simulation results for the ball and beam system, controlled by the proposed controller,
are shown in Figs. 2, 3, 4, and 5. Figures 2 and 3 show the trajectories of the ball
and beam system using the proposed controller. The corresponding control input is
shown in Fig. 4. Figure 5 shows the variation of sliding surfaces s1 and s2 for the ball
and beam system using H∞ -based adaptive control.
5 Conclusion
H∞ -based adaptive control was applied for the stabilization of underactuated non-
linear systems. The effectiveness of the proposed controller is shown considering
various initial conditions for stabilization of ball and beam system. The proposed
controller can be applied to many other non-linear underactuated control problems.
References
1. Rapp P, Sawodny O, Tarin C (2013) Stabilization of the ball and beam system by dynamic
output feedback using incremental measurements. In: European control conference. Zurich,
Switzerland
2. Ye H, Gui W, Yang C (2011) Novel stabilization designs for the ball-and-beam system. In:
Proceedings of the 18th world congress, Italy
3. Aguilar-Ibanez C, Suarez Castanon MS, de Jesus Rubio J (2012) Stabilization of the ball on
the beam system by means of the inverse Lyapunov approach. Math Prob Eng
4. Lian J, Zhao J (2019) Stabilisation of ball and beam module using relatively optimal control.
Int J Mech Eng Robot Res 8(2):265–272
5. Keshmiri M, Jahromi AF, Mohebbi A, Amoozgar MH, Xie W-F (2012) Modelling and control
of ball and beam system using model based and non model based control approaches. Int J
Smart Sens Intell Syst 5
6. Almutairi NB, Zribi M (2010) On the sliding mode control of a ball on a beam system.
Nonlinear Dyn 59:221–238
7. Chang YH, Chang CW, Tao CW, Lin HW, Taur JH (2012) Fuzzy sliding mode control for
ball and beam system with fuzzy ant colony optimization. Expert Syst Appl 39:3624–3633
8. Hammadih ML, Al Hosani K, Boiko I (2016) Interpolating sliding mode observer for a ball
and beam system. Int J Control 39:3624–3633
9. Tack HH, Choo YG, Kim CG, Jung MW (1999) The stabilization control of a ball-beam using
self-recurrent neural networks. In: International conference on knowledge-based intelligent
information engineering systems, Australia
10. Muralidharana V, Anantharamanb S, Mahindrakara AD (2010) Asymptotic stabilisation of
the ball and beam system: design of energy-based control law and experimental results. Int J
Control 83(6):1193–1198
11. Hauser J, Sastry S, Kokotović P (1992) Nonlinear control via approximate input-output
linearization: the ball and beam example. IEEE Trans Autom Control 37(3):392–398
12. Yan XG, Edwards C, Spurgeon SK (2004) Strengthened H∞ control via state feedback: a
majorization approach using algebraic Riccati inequalities. IEEE Trans Autom Control
49:824–827
13. Huang Y-J, Kuo T-C, Chang S-H (2008) Adaptive sliding mode control for nonlinear systems
with uncertain parameters. IEEE Trans Syst Man Cybern 38(2):534–539
Optimal Robust Controller Design
for a Reduced Model AVR System Using
CDM and FOPIλ Dμ
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 297
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_24
298 M. Silas and S. Bhusnur
In an alternator, the rotor and the remaining part of the system are interlocked through
electromechanical coupling, and the assembly behaves like an R-L-C system which
oscillates around the steady state. Turbine output fluctuates in an oscillatory manner
due to sudden load transitions and variation in transmission line parameters. The
most crucial measure to strengthen power system stability is synchronous generator
excitation control. Ignoring the saturation effect and other non-linearities, the
mathematical model of the system is presented in Fig. 1. The AVR system parameter
ranges chosen for simulation are as follows (Tables 1 and 2).
G_AVR = V_ter(s)/V_ref(s) = (0.1s + 10)/(0.0004s⁴ + 0.045s³ + 0.555s² + 1.51s + 11)    (1)
The closed-loop response of the AVR is stable but oscillatory in nature. This
higher-degree transfer function is therefore reduced to a lower degree using a model
reduction technique, for easier controller design, system analysis and representation.
Figures 2 and 3 depict the step response and the Bode plot of the reduced-order AVR
system.
The reduced order system shows a similar response as the original AVR system
and hence it can be used for modeling. Transfer function of the reduced second order
AVR is as follows:
G_AVR = 18.41/(s² + 1.147s + 20.25)    (2)
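A quick consistency check (ours, not in the paper): the reduced model (2) preserves the DC gain of the fourth-order model (1), since 10/11 ≈ 18.41/20.25 ≈ 0.909.

```python
# Steady-state (DC) gains: evaluate each transfer function at s = 0.
dc_full = 10 / 11            # from (1): numerator 10, denominator 11 at s = 0
dc_reduced = 18.41 / 20.25   # from (2)

print(abs(dc_full - dc_reduced) < 1e-3)  # True: both ~0.909
```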
Classical and modern control theory are combined in the CDM method, which
enables an efficient algebraic design and analysis of the controller [14, 15]. CDM is
an effective technique for control system design, controller parameter adjustment,
and observing the effect of parameter variations. The stability indices γi, stability
limits γi* and the equivalent time constant τ are the significant parameters in CDM
design [16]. They, respectively, depict the transient behavior and the stability of the system in the
time domain. Further, the robustness during parameter variations can be observed. By
adapting Lipatov's stability conditions, Manabe modified the range of the stability
indices; the new form is called the Manabe Standard Form [17]. The CDM design
procedure is abridged as follows:
Initially, a mathematical model of the plant is described in polynomial form; the
next step concerns the assumption of a suitable controller order and configuration
in polynomial format. The desired design specifications are translated into
the characteristic equation, and the controller coefficients are deduced by solving
the Diophantine equation. Finally, a coefficient diagram is drawn to visualize and
make inferences about stability and robustness. Two prominent factors, the equivalent
time constant τ and the stability indices γi, are chosen to compute the coefficients of
the CDM controller polynomials.
The standard CDM control structure is presented in Fig. 4. In the plant transfer
function, Np(s) and Dp(s) are the numerator and denominator polynomials; Ac(s) and
Bc(s) are the polynomials of the CDM controller that fix a desired transient response,
and the pre-filter F(s) takes care of the steady-state gain. The symbols u, d, r and y
are the controller signal, external disturbance signal, reference input and system
output, respectively.
From Fig. 4, the closed-loop response of the system is derived, where the closed-loop
characteristic polynomial P(s) is a Hurwitz polynomial with positive real coefficients
given by
P(s) = Σ_{i=0}^{n} a_i s^i    (5)
In Eq. (6), NP(s) and DP(s) are independent of each other and their degrees are
related by the condition m ≤ n.
The controller polynomials Ac(s) and Bc(s) are chosen as

Ac(s) = Σ_{i=0}^{p} li s^i  and  Bc(s) = Σ_{i=0}^{q} ki s^i
Optimal Robust Controller Design for a Reduced Model AVR System … 301
CDM controller polynomials with coefficients li and ki must satisfy the condition
p ≥ q for practical implementation.
The design parameters of CDM are defined as

τ = a1/a0    (7)

γi = ai² / (a_{i+1} a_{i−1}),  i = 1, 2, …, (n − 1)    (8)

γi* = 1/γ_{i+1} + 1/γ_{i−1},  i = 1, 2, …, (n − 1),  γn = γ0 = ∞    (9)
The system stability is determined by the stability indices and stability limits, while
the equivalent time constant determines the speed of the time-domain response. The
required settling time ts is fixed before the design procedure is started. The relation
between the user-defined settling time ts and the equivalent time constant τ is
expressed as

τ = ts / (2.5 ∼ 3)
There is a conflict between τ and the control signal magnitude: as τ increases, the
control signal diminishes and the system becomes slow, whereas a small τ makes the
response faster but the control signal grows in size. Accordingly, the value of τ
should be chosen in view of this trade-off.
PID controllers are among the most prominent controllers designed for various
industrial applications and remain the most widely implemented practical controllers.
In this context, a CDM-PID controller design is proposed.
The CDM-PID controller design for the AVR system covers the following steps:
i. The higher-order AVR is approximated by a second-order model using a model
reduction technique:

Gp(s) = Np(s)/Dp(s) = 18.41 / (s² + 1.147s + 20.25)
The target characteristic polynomial is formulated as

Ptarget(s) = a0 [ ( Σ_{i=2}^{n} { Π_{j=1}^{i−1} 1/γ_{i−j}^j } (τs)^i ) + τs + 1 ]
           = a0 [ (τ³/(γ1² γ2)) s³ + (τ²/γ1) s² + τs + 1 ]    (10)
where γ1 and γ2 are stability indices and τ is the equivalent time constant
iv. The PID controller is formulated, and matching it to the CDM controller
polynomials gives

C(s) = Kc + Kc/(Ti s) + Kc TD s = K1/l1 + K0/(l1 s) + K2 s/l1

Kc = K1/l1,  Ti = K1/K0,  TD = K2/K1    (12)
The concept of fractional calculus originated in 1695, when L'Hôpital and Leibniz
corresponded about the meaning of a half-order, i.e., non-integer-order, derivative.
This mathematical concept represents integration and differentiation of non-integer
order by the operator aDt^α, where a and t are the operating limits.
5.1 Preliminaries
The Grünwald–Letnikov (GL) definition of the fractional derivative is

aDt^α f(t) = lim_{h→0} (1/h^α) Σ_{j=0}^{[(t−a)/h]} (−1)^j C(α, j) f(t − jh)    (14)

where wj^α = (−1)^j C(α, j) are the coefficients of the polynomial (1 − z)^α, with
C(α, j) the generalized binomial coefficient. Alternatively, they can be derived
recursively from

w0^α = 1,  wj^α = (1 − (α + 1)/j) w_{j−1}^α,  j = 1, 2, …

The Riemann–Liouville (RL) fractional integral is

aDt^{−α} f(t) = (1/Γ(α)) ∫_a^t (t − τ)^{α−1} f(τ) dτ    (15)
Here, a denotes the initial time instant and the order varies in 0 < α < 1. The RL
definition is prominently used in fractional calculus; for fractional-order
differentiation with order satisfying (n − 1 < α ≤ n), it is given as:
aDt^α f(t) = (1/Γ(n − α)) (d^n/dt^n) ∫_a^t f(τ)/(t − τ)^{α−n+1} dτ    (16)
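The recursive coefficient formula above makes the GL definition of Eq. (14) directly computable on sampled data. A minimal sketch (step size and function name are our assumptions), verified against the known half-order derivative of f(t) = t, namely 2√(t/π):

```python
import numpy as np

def gl_derivative(f, alpha, h):
    """Grünwald-Letnikov derivative of uniformly sampled f (Eq. 14), using
    the recursion w_0 = 1, w_j = (1 - (alpha+1)/j) * w_{j-1}."""
    n = len(f)
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = (1.0 - (alpha + 1.0) / j) * w[j - 1]
    # D^alpha f(t_k) ~ h^(-alpha) * sum_{j=0}^{k} w_j * f(t_{k-j})
    return np.array([w[: k + 1] @ f[k::-1] for k in range(n)]) / h**alpha

h = 1e-3
t = np.arange(0.0, 1.0 + h, h)
d = gl_derivative(t, 0.5, h)            # half-derivative of f(t) = t
print(d[-1], 2 * np.sqrt(1 / np.pi))    # both close to 1.1284
```

The approximation is first-order accurate in h, so the mismatch at t = 1 shrinks roughly linearly as the step is refined.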
FOCs are an extended form of the classical PID controller. The FOPID is used to
enhance the flexibility, stability and robustness of the system; despite the existence
of uncertainties, the aim of using non-integer-order models is to obtain robust
performance. In FOCs, besides the nominal three parameters, two additional parameters
add further complexity as well as flexibility in tuning the controller. Abundant
analytical and numerical techniques [19–23] have been trialed for optimum tuning of
the five parameters of FOCs; these five parameters make FOCs flexible and less
sensitive to parameter changes. Various toolboxes such as NINTEGER [24], CRONE [25]
and FOMCON [26] aid the design of fractional-order systems, with many optimization
techniques provided within the toolbox itself. The standard mathematical form of the
FOC is presented as
CFOPID(s) = Kp + Ki/s^λ + Kd s^μ,  0 < (λ, μ) < 2    (17)

CFOPID(s) = Kp (1 + 1/(Ti s^λ) + TD s^μ)
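Eq. (17) is straightforward to evaluate on the frequency axis; with λ = μ = 1 it reduces to the classical PID, which gives a quick sanity check. A sketch (function name and gain values are illustrative):

```python
import numpy as np

def fopid(w, Kp, Ki, Kd, lam, mu):
    """Frequency response of Eq. (17): C(jw) = Kp + Ki/(jw)^lam + Kd*(jw)^mu."""
    s = 1j * np.asarray(w, dtype=complex)
    return Kp + Ki / s**lam + Kd * s**mu

# Sanity check: lam = mu = 1 is an ordinary PID; at w = 1 rad/s
c = fopid(1.0, Kp=1.0, Ki=2.0, Kd=3.0, lam=1.0, mu=1.0)
print(c)  # (1+1j), i.e. Kp + j(Kd - Ki)
```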
All conventional PID controllers can be obtained from the FOPID controller, since the
integer-order PID is a particular case of the fractional controller; its converging
region in the two-dimensional (λ, μ) plane is shown in Fig. 5. First, the parameters
Kp, Ki, Kd, λ and μ of the controller were optimized, and then the fractional terms of
the controller were converted into integer-order terms. Several approximation
techniques convert a fractional term into an integer order [27].
Many methods are available for the realization of a fractional-order transfer function
in integer order in the continuous domain [28–30]. In a given specified frequency band
[wb, wh], Oustaloup's recursive method is a ubiquitous approach to approximate the
fractional term by an integer order. The generalized non-integer-order representation
of the differentiator s^α can be presented as:
G(s) = (C0)^α Π_{k=−N}^{N} (s + w′k)/(s + wk)    (18)

where w′k = wb (wh/wb)^{(k+N+(1−α)/2)/(2N+1)} and wk = wb (wh/wb)^{(k+N+(1+α)/2)/(2N+1)}
are the rank-k zeros and poles, respectively, and (2N + 1) is their total number.
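A sketch of Eq. (18) in code; the gain normalization C0^α = wh^α, which matches |s^α| at the upper band edge, is a common convention and an assumption here. At the band's geometric center w = 1 rad/s with wb = 0.01, wh = 100, the approximation of s^0.5 should give magnitude near 1 and phase near 45°:

```python
import numpy as np

def oustaloup_zpk(alpha, wb=0.01, wh=100.0, N=4):
    """Zeros, poles and gain of Oustaloup's recursive approximation of s^alpha
    over the band [wb, wh], per Eq. (18); 2N+1 zero/pole pairs."""
    k = np.arange(-N, N + 1)
    M = 2 * N + 1
    zeros = -wb * (wh / wb) ** ((k + N + 0.5 * (1 - alpha)) / M)
    poles = -wb * (wh / wb) ** ((k + N + 0.5 * (1 + alpha)) / M)
    gain = wh**alpha  # assumed normalization at the high-frequency end
    return zeros, poles, gain

z, p, kgain = oustaloup_zpk(0.5)
s = 1j * 1.0                              # evaluate at w = 1 rad/s
H = kgain * np.prod((s - z) / (s - p))
print(abs(H), np.degrees(np.angle(H)))    # roughly 1 and 45
```

Increasing N flattens the magnitude and phase ripple across the band at the cost of a higher-order rational approximation.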
6 Simulation Results
The closed-loop response of the unity-feedback AVR without a controller is shown in
Fig. 2. Although the Z-N method gives an enhanced response, research continues to seek
improvements in the quality, performance and robustness of the controller. Further,
many researchers have designed and implemented fractional-order PIλDμ controllers to
improve AVR performance [31–33]. The unit step response of the AVR with the FOPIλDμ
controller is shown in Fig. 6.
System performance is further improved by combining CDM-PID with an FOC to develop a
new CDM-FOPIλDμ control technique. The CDM-FOPIλDμ controller is established by
entailing the CDM-PID controller parameters (Kp = 0.7861, Ki = 3.125, Kd = 0.3903),
and its transfer function is given as:

CFOPID(s) = Kp + Ki/s^λ + Kd s^μ    (19)
Fig. 2 Comparison of original and reduced order step response of AVR system
[Bode diagram: magnitude (dB) and phase (deg) versus frequency (rad/s)]
[Figure: step response of the AVR with the FOPID controller]
[Figure: step response comparison, ZN-PID controller versus CDM-FOPID controller]
CCDM−FOPID(s) = 0.7861 + 3.125/s^0.9997 + 0.3903 s^0.9744    (20)
[Figure: step response, ZN-PID versus CDM-FOPID controller with uncertainty in amplifier parameters]
[Figure: step response, ZN-PID versus CDM-FOPID controller with uncertainty in exciter parameters]
[Figure: step response, ZN-PID versus CDM-FOPID controller with uncertainty in generator parameters]
In this work, a new CDM-FOPIλDμ controller was designed for the AVR system by blending
the features of CDM and fractional calculus to optimize the controller parameters. The
response of the AVR with the proposed controller gives better results than the
prevailing PID and FOPID techniques. Simulation results show the effectiveness of the
CDM-FOPIλDμ controller as contrasted with the conventional techniques, and the
standard performance specifications are fully achieved. The variation in the step
response in the presence of uncertainty is trivial, which confirms the robustness.
As an extension of the proposed method, relative stability can be investigated by
comparison with other methods using the Kharitonov theorem, the Edge theorem, etc.
Although fractional-order controller design is computationally complex, it provides
greater flexibility and control over system performance.
References
17. Manabe S (1999) Sufficient condition for stability and instability by Lipatov and its application
to the coefficient diagram method. In: 9-th Workshop on Astrodynamics and Flight Mechanics,
Sagamihara, ISAS, pp 440–449
18. Monje CA, Chen Y,Vinagre BM, Xue D, Feliu-Batlle V (2010) Fractional-order systems and
controls fundamentals and applications. Springer Science & Business Media
19. Chen Y, Petras I, Xue D (2009) Fractional order control—a tutorial. In: American
control conference, 2009. ACC'09. IEEE, pp 1397–1411
20. Valério D, Sá da Costa J (2010) A review of tuning methods for fractional PIDs.
In: 4th IFAC workshop on fractional differentiation and its applications, FDA, vol 10
21. Yeroglu C, Tan N (2011) Note on fractional-order proportional–integral–differential controller
design. IET Control Theory Appl 5(17):1978–1989
22. Xue D, Zhao C, Chen YQ (2006) Fractional order PID control of a DC-motor with elastic
shaft: a case study. In: American control conference. pp 3182–3187
23. Monje CA et al (2004) Proposals for fractional PIλDμ tuning. In: Proceedings of
the first IFAC symposium on fractional differentiation and its applications (FDA04),
vol 38, pp 369–381
24. Valério D, Costa J.Sá da (2004) Ninteger, a non-integer control toolbox for MatLab. In: Proc
First IFAC Work Fract Differ Appl Bordeaux. pp 208–213
25. Oustaloup A, Melchior P, Lanusse P, Cois O, Dancla F (2000) The CRONE toolbox for Matlab.
In: CACSD. Conference Proceedings. IEEE International symposium on Computer-Aided
Control System Design (Cat.No.00TH8537). pp 190–195
26. Tepljakov A, Petlenkov E, Belikov J (2011) FOMCON: Fractional-order modeling and control
toolbox for MATLAB. In: Mixed Design of Integrated Circuits and Systems (MIXDES), 2011
Proceedings of the 18th International Conference IEEE. pp 684–689
27. Vinagre BM, Podlubny I, Hernandez A, Feliu V (2000) Some approximations of fractional
order operators used in control theroy and applications. Fract Calc Appl Anal 3(3):231–248
28. Maione G (2008) Continued fractions approximation of the impulse response of fractional-order
dynamic systems. IET Control Theory Appl 2(7):564–572
29. Xue D, Zhao C, Chen YQ (2006) A modified approximation method of fractional order
system. In: Proceedings 2006 IEEE international conference on mechatronics and
automation, pp 1043–1048
30. Khanra M, Pal J, Biswas K (2013) Rational approximation and analog realization of
fractional order transfer function with multiple fractional powered terms. Asian J
Control 15(4)
31. Verma SK, Nagar SK (2018) Design and optimization of fractional order PIλDμ controller
using grey wolf optimizer for automatic voltage regulator system. Recent Advances in
Electrical & Electronics Engineering (Formerly Recent Patents on Electrical & Electronics
Engineering), vol. 11, no. 2. pp. 217–226
32. Tang Y, Cui M, Hua C, Li L, Yang YY (2012) Optimum design of fractional order PIλDμ
controller for AVR system using chaotic ant swarm. Expert Syst Appl 39(8):6887–6896
33. Zamani M, Karimi-Ghartemani M, Sadati N (2007) FOPID controller design for robust
performance using particle swarm optimization. Fract Calc Appl Anal 10(2):169–187
Neural Network Based DSTATCOM
Control for Power Quality Enhancement
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 313
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_25
314 I. Srikanth and P. Kumar
2 DSTATCOM Topology
The estimation of the reference supply currents using unit vectors through an Adaline
NN-based control technique is discussed here. In each phase, the fundamental active
load-current component, i.e., the reference source current, is extracted. The
neural-network LMS-Adaline extraction algorithm uses the PCC voltages and the load
currents. Weights Wp and Wq are obtained for each phase. Figures 2 and 3 demonstrate
the control algorithm for computing the active and reactive weight components. Using
the LMS algorithm, the weights are derived from the load currents and unit vectors,
and the dc loss component is added to provide the reference currents for each phase.
The sensed 3-φ PCC voltages are filtered and their amplitude is given by

vt = [(2/3)(v²sa + v²sb + v²sc)]^{1/2}    (1)
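Eq. (1) recovers the phase-voltage amplitude from instantaneous samples: for a balanced three-phase set with peak Vm, the sum of squares is (3/2)Vm², so the expression returns exactly Vm at every instant. A small sketch (function name and voltage values are ours):

```python
import numpy as np

def pcc_amplitude(vsa, vsb, vsc):
    """Terminal-voltage amplitude from instantaneous 3-phase samples, Eq. (1)."""
    return np.sqrt((2.0 / 3.0) * (vsa**2 + vsb**2 + vsc**2))

# Balanced 3-phase set, peak 325 V, at an arbitrary angle
Vm, th = 325.0, 0.7
vt = pcc_amplitude(Vm * np.sin(th),
                   Vm * np.sin(th - 2 * np.pi / 3),
                   Vm * np.sin(th + 2 * np.pi / 3))
print(vt)  # ~325, the peak value Vm, independent of the angle
```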
where ωL(i) is the loss component of the active supply currents, and kpd and kid are
the proportional and integral gain constants. The mean weight of the active components
of the supply currents is

ωp(i) = ωL(i) + [ωpa(i) + ωpb(i) + ωpc(i)]/3    (5)
The weights of the fundamental d-axis components of the load currents can be extracted
using the least mean square (LMS) technique, and the weights can be trained using the
Adaline neural network algorithm. The weights of the d-axis components of the 3-φ load
currents are assessed as follows:
ωpa(i) = ωpa(i − 1) + η[iLa(i) − ωpa(i − 1) ua(i)] ua(i)    (6)

ωpb(i) = ωpb(i − 1) + η[iLb(i) − ωpb(i − 1) ub(i)] ub(i)    (7)

ωpc(i) = ωpc(i − 1) + η[iLc(i) − ωpc(i − 1) uc(i)] uc(i)    (8)
where η is the convergence factor, whose value varies from 0.01 to 1. The active
component weights of the 3-φ load currents were extracted using Adaline in each phase.
The fundamental 3-φ reference active components of the supply currents are computed as
i*sapr = ωp ua,  i*sbpr = ωp ub,  i*scpr = ωp uc    (9)
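A minimal numerical sketch of the update law in Eq. (6): with a distorted load current containing a 10 A fundamental and a 3 A third harmonic, the weight converges near the fundamental in-phase amplitude. The signal values and the learning rate η are illustrative assumptions:

```python
import numpy as np

fs, f = 10_000.0, 50.0
t = np.arange(0.0, 0.5, 1.0 / fs)
u_a = np.sin(2 * np.pi * f * t)                       # in-phase unit vector
i_La = 10.0 * u_a + 3.0 * np.sin(6 * np.pi * f * t)   # fundamental + 3rd harmonic
eta, w_pa = 0.01, 0.0
for i in range(len(t)):
    err = i_La[i] - w_pa * u_a[i]      # extraction error
    w_pa += eta * err * u_a[i]         # Eq. (6) weight update
print(w_pa)  # settles near 10, the fundamental active amplitude
```

The harmonic term averages to zero against the unit vector, which is why the weight isolates the fundamental active component; a smaller η reduces the residual ripple at the cost of slower convergence.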
The quadrature unit vectors are obtained from the in-phase unit vectors as
uqa = (−ub + uc)/√3,

uqb = (√3/2) ua + (ub − uc)/(2√3),

uqc = −(√3/2) ua + (ub − uc)/(2√3)    (10)
The measured PCC voltage and the PCC reference voltage are fed to the AC PI controller
as terminal-voltage inputs. At the ith sampling instant, the AC voltage error is

Vte(i) = V*t(i) − vt(i)
The output of the AC voltage PI controller at the ith sampling instant is

ωqv(i) = ωqv(i − 1) + kpa[Vte(i) − Vte(i − 1)] + kia Vte(i)    (12)

where ωqv(i) is the reactive (q-axis) component of the supply currents, and kpa and
kia are the proportional and integral gain constants.
The 3-φ weights of the reactive components of the load currents are computed as
ωqa(i) = ωqa(i − 1) + η[iLa(i) − ωqa(i − 1) uqa(i)] uqa(i)    (13)

ωqb(i) = ωqb(i − 1) + η[iLb(i) − ωqb(i − 1) uqb(i)] uqb(i)    (14)

ωqc(i) = ωqc(i − 1) + η[iLc(i) − ωqc(i − 1) uqc(i)] uqc(i)    (15)
The reactive components of the source currents in the 3-φ system are given as

i*saqr = ωq uqa,  i*sbqr = ωq uqb,  i*scqr = ωq uqc    (17)
The sum of the active and reactive components is used to calculate the total
reference supply currents:

i*sa = i*sapr + i*saqr,  i*sb = i*sbpr + i*sbqr,  i*sc = i*scpr + i*scqr    (18)
The sensed feedback currents are compared with the estimated reference supply currents
to generate the error signals. The error signals drive the IGBTs of the VSC through
the hysteresis current controller.
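The hysteresis current controller mentioned above admits a compact sketch: the switch state changes only when the current error leaves a tolerance band. Function name and band value are illustrative assumptions:

```python
def hysteresis_switch(i_ref, i_meas, prev_state, band=0.1):
    """Two-level hysteresis current control: 1 drives the phase current up
    (upper device on), 0 drives it down; inside the band, hold the state."""
    err = i_ref - i_meas
    if err > band:
        return 1
    if err < -band:
        return 0
    return prev_state

print(hysteresis_switch(1.0, 0.5, 0),    # error above band -> switch up
      hysteresis_switch(1.0, 1.5, 1),    # error below band -> switch down
      hysteresis_switch(1.0, 1.05, 1))   # inside band -> hold previous state
# 1 0 1
```

A narrower band tracks the reference more tightly but raises the switching frequency of the VSC devices.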
The characteristics of the 3-φ system with the DSTATCOM in operation and out of
operation are discussed. The simulation results were validated using MATLAB/Simulink.
Case 1: Performance of the 3-φ system without the DSTATCOM.
Because of the nonlinear load, i.e., an uncontrolled rectifier with an R-L load, the
supply-current waveform of the 3-φ system is non-sinusoidal. The DSTATCOM injected
current is zero, and the DC-link reference voltage is Vdcref = 700 V. The load current
(iLabc) exhibits a non-sinusoidal waveform due to the connected 3-phase uncontrolled
rectifier, as shown in Fig. 4.
Case 2: Performance of the 3-φ system connected to the DSTATCOM.
The 3-φ supply currents are sinusoidal in nature, as seen in the waveform of Fig. 5.
The DSTATCOM injects currents (iDST) into the PCC, and the DC-link voltage (vDC)
remains constant throughout the simulation period.
From Figs. 6 and 7, it is observed that the THD of the supply current is 26.66%
without the DSTATCOM and 1.20% with the DSTATCOM. The results reveal that the neural
network control method performs well in removing harmonic distortion. According to the
IEEE-519 standard, the THD should be less than 5%, which is attained by the neural
network control.

Fig. 4 Simulation waveforms without DSTATCOM under nonlinear load
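THD figures like those quoted above can be reproduced from sampled waveforms with an FFT. A sketch (sampling choices are ours), checked on a synthetic 50 Hz wave carrying a 20% third harmonic:

```python
import numpy as np

def thd_percent(x, fs, f0, nharm=20):
    """THD (%) = RMS of harmonics 2..nharm relative to the fundamental,
    for a waveform x sampled at fs containing fundamental f0."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) * 2.0 / n
    k0 = int(round(f0 * n / fs))                      # fundamental bin
    harm = [spec[h * k0] for h in range(2, nharm + 1) if h * k0 < len(spec)]
    return 100.0 * np.sqrt(np.sum(np.square(harm))) / spec[k0]

fs, f0 = 10_000.0, 50.0
t = np.arange(0.0, 0.2, 1.0 / fs)   # 10 full cycles -> no spectral leakage
x = np.sin(2 * np.pi * f0 * t) + 0.2 * np.sin(2 * np.pi * 3 * f0 * t)
print(thd_percent(x, fs, f0))  # ~20.0
```

Windowing over an integer number of cycles keeps the fundamental on an exact FFT bin, which is why the result lands on the expected 20%.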
6 Conclusion
This paper elaborated the Adaline neural-network-based LMS algorithm for DSTATCOM
control. The DC-link voltage is kept constant throughout the simulation, making the
system more stable and nearly free of harmonics. The DSTATCOM with the neural network
control algorithm compensates the harmonics in the supply currents, and its
performance improved under a nonlinear load condition. The simulation results also
indicate that the source-current distortion complies with the IEEE-519 THD limit.
Appendix
References
1 Introduction
FACTS devices are extensively used for effective power utilization, demand management,
voltage stabilization, power quality improvement, harmonic mitigation and power factor
improvement [1, 2]. The additional benefits of these controllers include reactive
power compensation, power flow control, voltage regulation, enhancement of steady-state
and transient stability, minimization of power losses, and conditioning of power
systems [3, 4]. Emerging trends in non-conventional and distributed energy sources
have given FACTS devices a critical role in maintaining effective energy usage and
improving the reliability and security of the power grid [1].
The advantages of these controllers are exploited in standalone microgrids for the
effective use of distributed power sources to deliver power to remote locations [2].
With the help of power electronic converters, the performance of the system is
collectively improved, with the expected outcome of enhanced power quality at the
point of common coupling.
Utilities and domestic, industrial and commercial customers face a very big challenge
in mitigating the various power quality problems existing in the system [3]. Several
FACTS controllers and their control methodologies can help overcome these power
quality issues.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 323
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_26
324 D. Sarathkumar et al.
To utilize power sources in a more effective and secure manner, FACTS devices made
their debut in power systems during the 1970s. The fundamental operation of these
components depends on various control methodologies to regulate both reactive and
real power flow [4].
Recent research concentrates on the architectures and control strategies of power
electronic converters to enhance the overall efficiency of controllers in electrical
power networks and subsequently improve the security of the power system [5, 6].
Currently, FACTS controllers and smart control approaches have become dominant in
power generation from distributed sources such as solar photovoltaics, wind farms and
fuel cells [6]. Many researchers have concentrated on maximum power extraction from
renewable energy sources. The effective use of these controllers in microgrids and
smart grids integrated with non-conventional sources has paved a new avenue for
overall performance improvement [7, 8]. The major objective of this article is to
survey the advantages of FACTS controllers for microgrids and smart grids integrated
with renewable energy sources.
The paper comprises six sections. Section 2 explains the basic concept of power
quality in power system networks. Section 3 presents an overview of transmission-side
FACTS controllers and their role. Section 4 deals with distribution-side FACTS
controllers and their tasks. Section 5 postulates the role of FACTS controllers in
microgrid and smart grid environments. Section 6 presents the conclusion and the
future focus of FACTS controllers in microgrids and smart grids.
Power quality issues result in voltage or current distortions or frequency deviations
in electrical systems, causing faults or abnormal operation of consumer equipment.
The electrical energy provided to customers must be safe, secure and continuous, with
a pure sinusoidal waveform of constant frequency and magnitude, which needs to be
ensured at all levels.
Commonly, power quality issues lead to increased power losses and mal-operation of
apparatus interconnected with adjacent power networks. The growing utilization of
power electronic devices introduces current harmonics and increases real and reactive
power demand [9]. Nowadays, improving power quality is a very difficult task and
creates serious challenges at various levels of electrical networks. Consequently,
power quality problems are receiving increased attention and awareness among customers
and power companies [10]. Sustaining the quality of power within the permissible range
remains a major challenge; the major issues of poor power quality are explained in [11].
An Extensive Critique on FACTS Controllers and Its Utilization … 325
Table 1 explains the continuous effects, origin and description of power quality
indices and their occurrence in an electrical network. It is noted that voltage swells
have the largest level of occurrence, at approximately 35%, and voltage transients the
minimum, at nearly 8%. Greater usage of critical loads creates harmonics and
non-sinusoidal voltages at around 20% and 18%, respectively. A 30-year Scopus database
search shows that 3264 papers on FACTS controllers were published from 1987 to 2017.
3 FACTS Controllers
FACTS controllers, combining power electronic circuits with high-speed control
methods, are used in recent microgrids comprising AC and DC distributed power sources.
Their operation depends on the following fundamental strategies:
(1) connecting a reactance at the PCC;
(2) supplying the AC system in combination with the power network junctions;
(3) injecting real and reactive current at the point of real power flow operation.
The control tools depend upon current, power, phase angle or real current flow,
applying PID tuning, optimal regulation, analytical optimization methodologies, and
heuristic optimization with a control execution index.
The SVC, implemented in the late 1970s, was among the first FACTS controllers. The SVC
is interconnected in parallel at the point of common coupling to inject or absorb
reactive power; it is competent to interchange inductive and capacitive power to
regulate particular parameters in an existing electrical system [13]. In 1974, the
General Electric Company implemented the initial SVC. Around 500 SVCs with reactive
power ratings ranging from 50 to 500 MVAR have been installed by power companies to
date.
SVCs are used to enhance rotor angle stability by dynamically controlling the voltage
at various locations, as well as transient stability by helping to damp power
oscillations. The availability, effectiveness and speed of response of SVCs enable
superior action in the control of transient and steady-state parameters. Moreover,
this device is used for improving alternator rotor angle stability, damping
power-swing oscillations and minimizing power losses through reactive power control
[14]. The SVC can be operated in two modes, namely VAR regulation mode and voltage
regulation mode. The steady-state behaviour of the SVC in the voltage regulation mode
is given in Fig. 2.
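The SVC's reactive power exchange follows Q = V²·B, so the effective susceptance needed for a given compensation level is immediate. A back-of-envelope sketch (function name and numbers are illustrative):

```python
def svc_susceptance(q_var, v_volts):
    """Effective SVC susceptance B (siemens) to exchange reactive power Q at
    bus voltage V, from Q = V^2 * B; capacitive B > 0 injects VARs,
    inductive B < 0 absorbs them."""
    return q_var / v_volts**2

B = svc_susceptance(100e6, 230e3)   # 100 MVAR at a 230 kV bus
print(B)  # ~1.89e-3 S
```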
The TCSC, a series combination of capacitors in parallel with a thyristor-controlled
reactor, provides a flexibly variable series capacitive reactance [15]. The TCSC plays
an important role in the operation and regulation of electrical systems, such as power
flow improvement, short-circuit current limiting, and improving dynamic and transient
stability.
The important features of TCSC components are enhancing real power flow, damping power
oscillations, and controlling line power flow [5, 6]. The first TCSC was implemented
at an Arizona power substation in late 1994, operating at 220 kV, and is utilized to
enhance power transfer capacity. After implementing this capability, the transfer
capacity of the power network increased by approximately 30%. Figure 3 depicts the
TCSC for power quality problem mitigation.
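A capacity gain of the order of the ~30% reported above follows directly from the power-angle relation P = V1·V2·sin δ/(X − Xc): compensating roughly 23% of the line reactance raises the transfer capacity by about 30%. A sketch with illustrative per-unit values:

```python
import math

def transfer_power(v1, v2, delta_rad, x_line, x_comp=0.0):
    """Sending-end power over a line of reactance x_line with series
    capacitive compensation x_comp: P = V1*V2*sin(delta)/(x_line - x_comp)."""
    return v1 * v2 * math.sin(delta_rad) / (x_line - x_comp)

p0 = transfer_power(1.0, 1.0, math.radians(30), x_line=1.0)
p1 = transfer_power(1.0, 1.0, math.radians(30), x_line=1.0, x_comp=0.23)
print(p1 / p0)  # ~1.30, i.e. about a 30% capacity increase
```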
The STATCOM evolved from the static VAR compensator and is commonly based on gate
turn-off thyristors. The device is capable of supplying or absorbing reactive power at
the receiving-end side; for real power exchange it must be integrated with a power
supply or energy storage system of proper rating.
The initial STATCOM was implemented in Japan in 1994 at the Inumaya power substation.
It had a capacity of ±60 MVAR and supported voltage stability improvement. The
intention of this implementation was to provide variable reactive power compensation.
The STATCOM does not require the large capacitive and inductive components needed by
SVCs to support capacitive and inductive reactive power in large transmission networks
[16]. Its primary advantages are a minimal footprint and large reactive power output
even in weak grid networks. The STATCOM behaves as a current source that does not
depend on the grid supply voltage; it provides better voltage stability at the exact
location and better damping behaviour than the SVC, and it can also transiently
exchange real power with the network. Commonly, a STATCOM operates in two modes, VAR
regulation and voltage control; Fig. 4 depicts the STATCOM in these two modes.
Developing smart grids with distributed generation and renewable energy sources needs
the help of FACTS controllers and power-electronic stabilization, combined with
superior operation methodologies [7, 8]. Advanced FACTS controllers are developed to
assure decoupled AC-DC integration, enhanced power security, reactive power
compensation, voltage and power factor improvement, and loss minimization [9, 10].
They also improve reliability in distribution-side microgrid networks and stand-alone
AC-DC distributed generation strategies based on non-conventional energy systems.
FACTS controllers operate along with voltage source converters and passive filters
[6–10].
Advanced electrical networks with additional demand, advanced metering infrastructure
and distributed generation integration, comprising solar photovoltaic and wind energy,
need newly designed advanced soft-computing tools, operational methodologies and
improved power electronic infrastructure to assure reliability, safety and efficiency
without excessive short-circuit currents and transient over-voltages [19]. Enhanced
power usage and efficient power regulation are the main aims, while
interconnection-line regulation controls the rating of additional or substitute
generation [4]. Clean and non-conventional energy production is expected to deliver
30–35% of total energy by the year 2040 from various sources. The advanced
implementation of FACTS controller strategies is aimed at the generation and
transmission components [20] of the smart grid.
5 Conclusion
This article presented a detailed survey of the application of FACTS controllers
integrated with renewable energy sources for minimizing power quality problems in
microgrid and smart grid technology. The presently available FACTS controllers are
subject to various design modifications depending on the optimization of control
methods; by applying smart grid control methods they serve several functions, such as
power flow control, stability enhancement and reactive power compensation.
The article also surveyed various FACTS control solutions, while the regulation
methods for better usage of linear, nonlinear and critical loads and the power quality
problems in smart grid and microgrid environments were also presented. The survey is
intended for effective power utilization, loss minimization, voltage stabilization,
power quality enhancement and harmonic minimization at the transmission PCC. The issue
of grid integration in weak AC utility networks was also examined. The future of these
controllers is promising, driven by wider and optimal usage of distributed energy
sources in domestic, office, commercial and industrial buildings, and by growing
awareness of hybrid power systems, grid-to-E-vehicle interfaces, energy storage
technologies, better lighting schemes and energy-efficient motors.
References
1. Darabian M, Jalilvand A (2017) A power control strategy to improve power system stability
in the presence of wind farms using FACTS devices and predictive control. Int J Electr Power
Energy Syst 85(2):50–66
2. Subasri CK, Charles Raja S, Venkatesh P (2015) Power quality improvement in a wind farm
connected to grid using FACTS device. Power Electron Renew Energy Syst. 326(4):1203–1212
3. Liao H, Milanović JV (2017) On capability of different FACTS devices to mitigate a range of
power quality phenomena. IET Gener Transm Distrib 11(5):2002–2012
4. Yan R, Marais B, Saha TK (2014) Impacts of residential photovoltaic power fluctuation on
on-load tap changer operation and a solution using DSTATCOM. Electr Power Syst Res.
111:185–193
5. Hemeida MG, Rezk H, Hamada MM (2017) A comprehensive comparison of STATCOM versus
SVC-based fuzzy controller for stability improvement of wind farm connected to multi-machine
power system. Electr Eng 99: 1–17
6. Bhaskar MA, Sarathkumar D, Anand M (2014) Transient stability enhancement by using
fuel cell as STATCOM. In: 2014 International conference on electronics and communication
systems (ICECS). pp 1–5
7. Sarathkumar D, Srinivasan M, Stonier AA, Samikannu R (2021) A research survey on microgrid
faults and protection approaches. In: IOP Conference series: Materials science and engineering,
vol 1055. pp 012128
8. Sarathkumar D, Srinivasan M, Stonier AA, Samikannu R, Dasari NR, Raj RA (2021) A tech-
nical review on classification of various faults in smart grid systems. In: IOP conference series:
Materials science and engineering, Vol 1055. pp 012152
9. Sarathkumar D, Srinivasan M, Stonier AA, Samikannu R, Dasari NR, Raj RA (2021) A tech-
nical review on self-healing control strategy for smart grid power system. In: IOP conference
series: Materials science and engineering, vol 1055. pp 012153
10. Sarathkumar D, Srinivasan M, Stonier AA, Samikannu R, Vijay Anand D (2021) Design
of intelligent controller for hybrid PV/wind energy based smart grid for energy management
applications. In: IOP Conference series: Materials science and engineering, vol 1055. pp 012129
11. Vanaja DS, Stonier AA, Mani G, Murugesan S (2021) Investigation and validation of solar
photovoltaic-fed modular multilevel inverter for marine water-pumping applications. Electr
Eng. https://doi.org/10.1007/s00202-021-01370-x
12. Stonier AA, Lehman B (2018) An intelligent-based fault-tolerant system for solar-fed cascaded
multilevel inverters. IEEE Trans Energy Convers
13. Alexander A, Thathan M (2014) Modelling and analysis of modular multilevel converter for
solar photovoltaic applications to improve power quality. IET Renew Power Gener
14. Albert Alexander S, Manigandan T (2014) Power quality improvement in solar photovoltaic
system to reduce harmonic distortions using intelligent techniques. J Renew Sustain Energy
15. Albert Alexander S, Manigandan T (2014) Digital control strategy for solar photovoltaic fed
inverter to improve power quality. J Renew Sustain Energy
16. Sarathkumar D, Srinivasan M, Stonier AA, Kumar S, Vanaja DS (2021) A brief review on
optimization techniques for smart grid operation and control. In: 2021 International confer-
ence on advancements in electrical, electronics, communication, computing and automation
(ICAECA). pp 1–5. https://doi.org/10.1109/ICAECA52838.2021.9675618
17. Sarathkumar D, Srinivasan M, Stonier AA, Kumar S, Vanaja DS (2021) A review on renew-
able energy based self-healing approaches for smart grid. In: 2021 International confer-
ence on advancements in electrical, electronics, communication, computing and automation
(ICAECA). pp 1–6. https://doi.org/10.1109/ICAECA52838.2021.9675495
18. Stonier A, Yazhini M, Vanaja DS, Srinivasan M, Sarathkumar D (2021) Multi level inverter and
its applications—An extensive survey. In: 2021 International conference on advancements in
electrical, electronics, communication, computing and automation (ICAECA). pp 1–6. https://
doi.org/10.1109/ICAECA52838.2021.9675535
19. Sarathkumar D, Kavithamani V, Velmurugan S, Santhakumar C, Srinivasan M, Samikannu
R (2021) Power system stability enhancement in two machine system by using fuel cell as
STATCOM (static synchronous compensator). Mater Today: Proc 45, Part 2:2130–2138. ISSN
2214-7853. https://doi.org/10.1016/j.matpr.2020.09.730
20. Sarathkumar D, Venkateswaran K, Vijayalaxmi A (2020) Design and implementation of solar
powered hydroponics systems for agriculture plant cultivation. Int J Adv Sci Technol (IJAST)
29(05):3266–3271
Arctangent Framework Based Least
Mean Square/Fourth Algorithm
for System Identification
1 Introduction
One of the major challenges in the study of adaptive filters is the selection of a
suitable cost function [1, 2]. The efficiency of adaptive filters is primarily deter-
mined by the design technique of the filter and the cost function (CF) used. Mean
Square Error (MSE) is preferably a widely used cost function for Gaussian signals or
noise distribution because of its low computational tractability, simplicity, optimal
performance and convexity. Some of the adaptation algorithms developed utilizing
this criterion are least mean square (LMS), normalized LMS (NLMS) and variable
step-size LMS (VSS-LMS) [1, 2]. In practical scenarios, MSE based algorithms
can sometimes deviate and degrade its performance where noise is non-Gaussian or
impulsive [2, 3].
For noise or signals with a light-tailed impulsive distribution, the cost function should be
a higher-order moment of the error measurement. The family of least mean fourth (LMF)
algorithms [2] uses this property; however, instability issues hamper its performance. This
led to the development of the least mean square/fourth (LMS/F) algorithm, which combines
the strengths of the LMF and LMS algorithms [4]. The behavior of the LMS/F algorithm in
a Gaussian noise environment was studied in [4], and its behavior in non-Gaussian noise
environments was examined in [5]. However, under persistently impulsive noise the
algorithm's performance was not satisfactory. Later, in [6], a reweighted zero-attracting
modified variable step-size continuous mixed p-norm algorithm was developed to exploit
sparsity in a system against impulsive noise.
The arctangent function has an error-saturating non-linearity that can enhance the
behavior of adaptive algorithms. A novel cost function framework
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 335
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_27
336 S. Saha et al.
called the arctangent framework was proposed exploiting this property of the arctangent
function. Algorithms such as the arctangent sign algorithm (ATSA), arctangent least mean
square (ATLMS), arctangent least mean fourth (ATLMF), and the arctangent generalized
maximum correntropy criterion algorithm are all based on the arctangent framework [7].
Since the LMS/F algorithm outperforms the standard LMS and LMF algorithms while
maintaining their flexibility and stability [4], an arctangent least mean square/fourth
(ATLMS/F) algorithm is presented here, and its response is evaluated through MATLAB
simulations on a system identification model in a noisy environment. Section 2 reviews the
arctangent framework based cost function, Sect. 3 explains the proposed algorithm, Sect. 4
discusses the simulation and observations, and Sect. 5 states the conclusion.
where X(n) = [x(n), x(n − 1), …, x(n − M + 1)]ᵀ represents the input signal vector and φ̂(n)
is the filter coefficient vector. Defining the weight coefficients of the adaptive filter as
φ̂(n) = [φ̂1, φ̂2, φ̂3, …, φ̂M]ᵀ and transmitting X(n) through the adaptive filter provides the
output signal ŷ(n) and error ε(n) as follows.
(Figure: system identification block diagram — the unknown system and the adaptive filter driven by the same input.)
The values of the weight coefficients of the adaptive system can be optimized by minimizing
or maximizing the CF. It has been recognized that the saturation attributes of error
non-linearities provide resilience against random impulsive disturbances [3, 8]. Based on
the saturation property of the arctangent function, an arctangent framework dependent cost
function was introduced as [7]

ψ(n) = (1/α) arctan(αξ(n)) (4)

where the controlling constant α > 0 controls the steepness of the arctangent cost function
and ξ(n) is the embedded conventional cost function. The gradient-based weight update
obtained from Eq. (4) is
φ(n + 1) = φ(n) − β ∂ψ(n)/∂φ(n) (6)
where β represents the step-size of the weight update. Combining Eqs. (4) and (6), the
updated weight vector is

φ(n + 1) = φ(n) − β ∇φξ(n)/(1 + [αξ(n)]²) (7)
ξ(n) = (1/2)ε²(n) − (1/2)λ ln(ε²(n) + λ) (8)
Substituting the LMS/F algorithm's CF ξ(n) into the arctangent framework update given
in (7), the weight update of the arctangent LMS/F (ATLMS/F) algorithm is defined as
φ(n + 1) = φ(n) + μ ε³(n)X(n) / {(ε²(n) + λ)(1 + α²[(1/2)ε²(n) − (1/2)λ ln(ε²(n) + λ)]²)} (9)
From (9) it is observed that, compared with the conventional LMS/F algorithm, the extra
factor 1 + α²[(1/2)ε²(n) − (1/2)λ ln(ε²(n) + λ)]² in the denominator of the ATLMS/F weight
update counteracts abrupt changes in the weights under the influence of impulsive noise,
making the ATLMS/F algorithm more stable than the typical LMS/F algorithm.
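The update in (9) is straightforward to prototype. The following is an illustrative Python sketch of the ATLMS/F weight update for system identification (the paper's own simulations are in MATLAB); the parameters `mu`, `lam` and `alpha` stand for μ, λ and α, and their default values here are assumptions rather than the paper's settings.

```python
import numpy as np

def atlmsf_identify(x, d, M=8, mu=0.01, lam=0.5, alpha=0.1):
    """ATLMS/F system identification following the weight update of Eq. (9).

    x: input signal, d: desired signal (unknown system output plus noise),
    M: filter length, mu: step size, lam: LMS/F parameter lambda,
    alpha: arctangent steepness. Default values are illustrative.
    """
    w = np.zeros(M)                        # adaptive weight vector phi(n)
    for n in range(M - 1, len(x)):
        X = x[n - M + 1:n + 1][::-1]       # regressor [x(n), ..., x(n-M+1)]
        e = d[n] - w @ X                   # a priori error eps(n)
        xi = 0.5 * e**2 - 0.5 * lam * np.log(e**2 + lam)   # LMS/F cost, Eq. (8)
        # Eq. (9): LMS/F gradient attenuated by the arctangent saturation term
        w = w + mu * (e**3) * X / ((e**2 + lam) * (1.0 + (alpha * xi)**2))
    return w
```

The NMSD can then be computed as ||φ_true − w||²/||φ_true||², averaged over independent trials.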
where ||·||₂ is the l₂ norm. The NMSD is computed over n = 20,000 iterations and averaged
over 100 independent trials. The performance of the proposed algorithm is compared to
that of the LMS/F algorithm. The step-
size parameter used for the LMS/F algorithm is β = 0.002 whereas the cumulative
step-size used for the ATLMS/F algorithm is β = 0.01 where β = 0.1 and α = 0.1
for both the experiments based on system identification.
A system identification case is considered where the impulse response is
constructed synthetically using the method given in [9]. The approach begins by
defining a vector U
U(M×1) = [O(Mp×1)ᵀ 1 e^(−1/τ) e^(−2/τ) … e^(−(Mu−1)/τ)]ᵀ (11)
where Mp is the length of the bulk delay and Mu = M − Mp represents the length of
the decaying window that can be regulated by τ. The synthetic impulse is represented
as
h(n) = [O(Mp×Mp) O(Mp×Mu); O(Mu×Mp) B(Mu×Mu)] u + P (12)
where B(Mu×Mu) = diag(b), and P and b represent zero-mean white Gaussian noise vectors of
length M and Mu, respectively. The simulation parameters used for the generation of the
impulse response shown in Fig. 1 are M = 128, Mp = 30 and τ = 2.
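A sketch of the impulse-response generator of (11)–(12) in Python (assuming NumPy; the function name and the small noise scale used for P are illustrative choices, not values from the paper):

```python
import numpy as np

def synthetic_echo_path(M=128, Mp=30, tau=2.0, seed=0):
    """Generate a sparse synthetic echo-path impulse response, Eqs. (11)-(12).

    A bulk delay of Mp zeros is followed by an exponentially decaying window
    of length Mu = M - Mp, modulated by white Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    Mu = M - Mp
    # Eq. (11): u = [0_{Mp} ; 1, e^{-1/tau}, ..., e^{-(Mu-1)/tau}]
    u = np.concatenate([np.zeros(Mp), np.exp(-np.arange(Mu) / tau)])
    b = rng.standard_normal(Mu)           # modulating noise vector, length Mu
    P = 1e-3 * rng.standard_normal(M)     # small additive noise vector, length M
    B = np.zeros(M)
    B[Mp:] = b                            # action of the block with diag(b)
    return B * u + P                      # Eq. (12)

h = synthetic_echo_path()                 # parameters of Fig. 1
```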
The impulse response of the echo path generated for the first experiment of length
128 is provided in Fig. 2 whereas Fig. 3 shows the NMSD behavior of the proposed
algorithm in comparison to the standard algorithm.
Fig. 4 Concatenating impulse response of the system
5 Conclusion
A novel arctangent least mean square/fourth algorithm was proposed in this work. It
was developed by embedding the standard LMS/F algorithm cost function into the
arctangent framework. The ATLMS/F algorithm's performance was compared with that of the
standard LMS/F algorithm for system identification under the effect of impulsive noise.
The simulation results showed better steady-state values than those of the standard
algorithm.
References
1. Diniz PSR (2020) Introduction to adaptive filtering. In: Adaptive filtering. Springer, Cham, pp 1–8
2. Wang S, Wang W, Xiong K, Iu HH, Tse CK (2019) Logarithmic hyperbolic cosine adaptive filter
and its performance analysis. IEEE Trans Syst Man Cybern Syst
3. Chen B, Xing L, Zhao H, Zheng N, Príncipe JC (2016) Generalized correntropy for robust adaptive
filtering. IEEE Trans Signal Process 64(13):3376–3387
4. Gui G, Peng W, Adachi F (2014) Adaptive system identification using robust LMS/F algorithm.
Int J Commun Syst 27(11):2956–2963
5. Patnaik A, Nanda S (2020) The variable step-size LMS/F algorithm using nonparametric method
for adaptive system identification. Int J Adapt Control Signal Process 34(12):1799–1811
6. Patnaik A, Nanda S (2021) Reweighted zero-attracting modified variable step-size continuous
mixed p-norm algorithm for identification of sparse system against impulsive noise. In: Proceed-
ings of international conference on communication, circuits, and ystems: IC3S 2020, vol 728.
Springer Nature, p 509
7. Kumar K, Pandey R, Bora SS, George NV (2021) A robust family of algorithms for adaptive
filtering based on the arctangent framework. IEEE Trans Circuits Syst II Express Briefs
8. Das RL, Narwaria M (2017) Lorentzian based adaptive filters for impulsive noise environments.
IEEE Trans Circuits Syst I Regul Pap 64(6):1529–1539
9. Khong AW, Naylor PA (2006) Efficient use of sparse adaptive filters. In: 2006 Fortieth
Asilomar conference on signals, systems and computers. IEEE, pp 1375–1379
Robotics and Autonomous Vehicles
Stabilization of Ball Balancing Robots
Using Hierarchical Sliding Mode Control
with State-Dependent Switching Gain
Sudhir Raj
1 Introduction
S. Raj (B)
SRM University, Amaravati, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 345
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_28
proposed controller. Extended Kalman filter [7]-based state estimation is carried out
for the ball bot to maintain its upright position. The proposed robot [8] consists of
three omnidirectional wheels with stepping motors. The observer is designed for the
stabilization of the ball beam system. The proposed sliding mode control [9] gives
better performance as compared to other linear controllers for the stabilization and
tracking of the ball bot. Neural network-based control [10] for trajectory tracking
and balancing of a ball balancing robot is carried out considering uncertainties.
The vertical position is achieved using the proposed controller, and it requires less time
to stabilize the ball bot system. The control input of the state-dependent switching-gain
controller is smaller than that of the hierarchical sliding mode controller. The objective
of this work is to stabilize the ball bot in less time than the previous controllers
reported in the literature. A comparison between the two controllers is carried out to show
the effectiveness of the proposed controller.
The ball bot is an underactuated system with four degrees of freedom and two control
inputs. There are three omni wheel motors in the ball bot. It is assumed that no slip
is occurring between the ball and the floor and between the ball and the wheels. The
equation of the ball bot is derived using the Euler-Lagrange formulation. The motion
of the ball bot is derived in the x-z and y-z planes. Figure 1 shows the ball bot in the
x-z plane. The Lagrangian L is calculated as the difference between the kinetic and
potential energy of the ball bot:
L = T − V (1)
  = Tkx + Twx + Tax − (Vkx + Vwx + Vax) (2)
  = (1/2)(mk + Ik/rk²)ẏk² + (3Iw cos²α/(4rw²))(ẏk + rk θ̇x)² + (1/2)Ix θ̇x²
    + (1/2)ma(ẏk − l θ̇x cos θx)² + (1/2)ma l² θ̇x² sin²θx − ma g l cos θx (3)
Therefore, the Lagrangian dynamics of the ball bot can be written as Eq. (4):

d/dt(∂Lx/∂q̇x) − ∂Lx/∂qx = (1/rw)τx − (1/rk)D(q̇x) (4)
The equations of the ball bot in the y-z plane can be taken as equation numbers
(5) and (6):
The system equations of the ball bot in the y-z plane are taken as equation numbers
(7) and (8):
The mathematical equations describe the ball segway system dynamics in the x-z
plane as follows:
b1ẍk + (b4 cos θy − b3)θ̈y − b4θ̇y² sin θy + bx ẋk = −rw⁻¹τy (9)
(b4 cos θy − b3)ẍk + b2θ̈y − b5 sin θy + bry θ̇y = rk rw⁻¹τy (10)
where
b1 = mk + Ik/rk² + ma + 3Iw cos²α/(2rw²)
b2 = ma l² + 3Iw rk² cos²α/(2rw²) + Iy
b3 = rk · 3Iw cos²α/(2rw²)
b4 = ma l
b5 = ma g l
where
Fy1(qy, q̇y) = Ay⁻¹[b2(b4 sin θy θ̇y² − bx ẋk) + (b3 − b4 cos θy)(b5 sin θy − bry θ̇y)]
Gy1(qy) = Ay⁻¹ rw⁻¹(b2 − b3 rk + b4 rk cos θy)
Fy2(qy, q̇y) = Ay⁻¹[(b3 − b4 cos θy)(b4 sin θy θ̇y² − bx ẋk) + b1(b5 sin θy − bry θ̇y)]
Gy2(qy) = Ay⁻¹ rw⁻¹(rk b1 − b3 + b4 cos θy)
Ay = b1b2 − (b4 cos θy − b3)²
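For simulation, the state-space terms above can be evaluated numerically. The sketch below is a Python rendering of b1–b5, Fy1, Gy1, Fy2, Gy2 and Ay; all numeric parameter values are illustrative assumptions, not the paper's.

```python
import numpy as np

# Illustrative physical parameters (assumed, not from the paper)
mk, rk = 2.0, 0.11            # ball mass (kg) and radius (m)
Ik = 0.4 * mk * rk**2         # ball inertia (solid sphere)
Iw, rw = 0.003, 0.05          # omni wheel inertia and radius
alp = np.deg2rad(45.0)        # wheel mounting angle alpha
ma, l, Iy = 10.0, 0.3, 0.5    # body mass, COM height, body inertia
g = 9.81
bx, bry = 0.1, 0.1            # viscous friction coefficients

b1 = mk + Ik / rk**2 + ma + 3 * Iw * np.cos(alp)**2 / (2 * rw**2)
b2 = ma * l**2 + 3 * Iw * rk**2 * np.cos(alp)**2 / (2 * rw**2) + Iy
b3 = rk * 3 * Iw * np.cos(alp)**2 / (2 * rw**2)
b4 = ma * l
b5 = ma * g * l

def f_g(theta, theta_dot, x_dot):
    """Fy1, Gy1, Fy2, Gy2 for the y-z subsystem in state-space form."""
    Ay = b1 * b2 - (b4 * np.cos(theta) - b3)**2
    f1 = (b2 * (b4 * np.sin(theta) * theta_dot**2 - bx * x_dot)
          + (b3 - b4 * np.cos(theta)) * (b5 * np.sin(theta) - bry * theta_dot)) / Ay
    g1 = (b2 - b3 * rk + b4 * rk * np.cos(theta)) / (Ay * rw)
    f2 = ((b3 - b4 * np.cos(theta)) * (b4 * np.sin(theta) * theta_dot**2 - bx * x_dot)
          + b1 * (b5 * np.sin(theta) - bry * theta_dot)) / Ay
    g2 = (rk * b1 - b3 + b4 * np.cos(theta)) / (Ay * rw)
    return f1, g1, f2, g2
```

At the upright equilibrium both drift terms vanish, and for a small positive tilt the gravity term drives the tilt acceleration positive, as expected for an inverted-pendulum-like system.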
The sliding mode surfaces for the y-z plane are given by equation numbers (13)
and (14):
where cx1 and cx2 are constants, and ex1 and ex2 are taken as tracking errors:
ṡx1 and ṡx2 are equated to zero for finding the equivalent control of subsystems:
τxeq1 = −Gx1⁻¹(qx)[cx1 ẏk + Fx1(qx, q̇x)] (19)
τxeq2 = −Gx2⁻¹(qx)[cx2 θ̇x + Fx2(qx, q̇x)] (20)
The first-layer surface of the hierarchical sliding mode control can be taken as Sx1 = sx1.
Equation (21) gives the sliding mode control law for the first layer, and the Lyapunov
function is taken as Eq. (22):
Here τxsw1 is the switching control of the first layer of sliding mode control.
Differentiating Vx1(t) with respect to time t yields

τx1 = τxeq1 + Gx1⁻¹(qx)Ṡx1 (25)
The second-layer sliding surface is constructed from Sx1 and sx2:

Sx2 = αx Sx1 + sx2 (26)

where αx is the sliding mode parameter. The sliding mode control law for the second
layer can be taken as Eq. (27):
where τxsw2 is the switching control of the second layer of sliding mode control.
Vx2 (t) is differentiated with respect to time t:
β and γ are taken as positive constants. The switching gain ηx1 is a function of
the state variable. Integrating both sides of the equation from 0 to t,
∫₀ᵗ V̇ dt = −∫₀ᵗ ηx1 Sx1 sat(Sx1) dt
V(t) − V(0) = −∫₀ᵗ ηx1 Sx1 sat(Sx1) dt
V(0) = V(t) + ∫₀ᵗ ηx1 Sx1 sat(Sx1) dt
V(0) ≥ ∫₀ᵗ ηx1 Sx1 sat(Sx1) dt
It follows from Eq. (36) that lim(t→∞) Sx1 = 0. As a consequence, the second-level sliding
surface is asymptotically stable.
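The complete two-layer control computation can be sketched in Python as follows. The gains, boundary-layer width, and the particular state-dependent form of the switching gain (β plus γ times the squared error norm is one plausible choice) are illustrative assumptions, not the paper's values.

```python
import numpy as np

def sat(s, phi=0.05):
    """Boundary-layer saturation used in place of sign() to reduce chattering."""
    return np.clip(s / phi, -1.0, 1.0)

def hsmc_control(e1, e1_dot, e2, e2_dot, F1, G1, F2, G2,
                 c1=5.0, c2=5.0, a=1.0, beta=0.5, gamma=2.0):
    """Two-layer hierarchical sliding-mode control input (illustrative gains).

    e1, e2: tracking errors of the two subsystems; F*, G*: model terms of
    each subsystem. The switching gain eta grows with the state so the
    correction is stronger when the errors are large.
    """
    s1 = c1 * e1 + e1_dot                 # first subsystem surface
    s2 = c2 * e2 + e2_dot                 # second subsystem surface
    S2 = a * s1 + s2                      # second-layer surface, Eq. (26)-type
    tau_eq1 = -(c1 * e1_dot + F1) / G1    # equivalent control, Eq. (19)-type
    tau_eq2 = -(c2 * e2_dot + F2) / G2    # equivalent control, Eq. (20)-type
    eta = beta + gamma * (e1**2 + e2**2)  # state-dependent switching gain
    tau_sw = -eta * sat(S2)               # second-layer switching control
    # Setting dS2/dt = -eta*sat(S2) and solving for the single input tau:
    return (a * G1 * tau_eq1 + G2 * tau_eq2 + tau_sw) / (a * G1 + G2)
```

With zero errors and zero drift the control is zero, and a positive position error produces a restoring (negative) control effort.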
4 Simulation Results
Simulation results for the ball bot in the y-z plane are shown in Figs. 7, 8, 9, 10 and 11.
The initial conditions of the ball bot in the y-z plane, (y, ẏ, θx, θ̇x), are taken as
(−25, 0, 6.5°, 0).
5 Conclusion
References
1. Pham DB, Lee S-G (2018) Hierarchical sliding mode control for a two-dimensional ball segway
that is a class of a second-order underactuated system. J Vib Control 25(1):72–83
2. Lee SM, Park BS (2020) Robust control for trajectory tracking and balancing of a
ballbot. IEEE Access 8:159324–159330
3. Hasan A (2020) eXogenous Kalman filter for state estimation in autonomous ball balancing
robots. In: IEEE/ASME international conference on advanced intelligent mechatronics, Boston,
USA
4. Hertig L, Schindler D, Bloesch M, Remy CD, Siegwart R (2013) Unified state estimation
for a ballbot. In: IEEE international conference on robotics and automation, Karlsruhe,
Germany
5. Nagarajan U, Kantor G, Hollis R (2014) The ballbot: An omnidirectional balancing mobile
robot. Int J Robot Res 33(6):917–930
6. Nagarajan U, Kantor G, Hollis RL (2009) Trajectory planning and control of an underactuated
dynamically stable single spherical wheeled mobile robot. In: IEEE international conference
on robotics and automation, Kobe, Japan
7. Herrera L, Hernandez R, Jurado F (2018) Control and extended Kalman filter based estimation
for a ballbot robotic system. In: Robotics Mexican congress, Ensenada, Mexico
8. Kumagai M, Ochiai T (2008) Development of a robot balancing on a ball. In: International
conference on control, automation and systems, Coex, Seoul, Korea
9. Lal I, Codrean A, Busoniu L (2020) Sliding mode control of a ball balancing robot. In: 21st
IFAC world congress. Berlin, Germany
10. Jang H-G, Hyun C-H, Park B-S (2021) Neural network control for trajectory tracking and
balancing of a ball-balancing robot with uncertainty. Appl Sci 11(11):1–12
Programmable Bot for Multi Terrain
Environment
1 Introduction
The IFR (International Federation of Robotics) promotes research and development in the
field of robotics, industrial robots and service robots, as well as setting standards for
the design and manufacturing of robots worldwide. The development of robotics and
automation in India is monitored by the All India Council for Robotics and Automation
(AICRA) [1]. The organization aims at making India the global leader in the fields of
robotics, artificial intelligence and the Internet of Things (IoT), and it supports
educational institutions to produce the best talents in these fields [2]. An intelligent
autonomous system requires accurate information about the location of the vehicle and the
present road scenario. The system must be robust enough to handle adverse weather
conditions. Algorithms must be designed to identify road margins with tolerably small
error, which is possible with measurements obtained from equipment such as laser sensors
and cameras. Even with incomplete information, autonomous vehicles should be able to take
quick decisions, including in situations that might not have been considered by the
programmer.
A miniature version of an autonomous vehicle is an autonomous bot, which is likewise
expected to move from a specified source to a destination without human intervention, or
with minimal intervention. This paper discusses the development of an autonomous,
self-navigating bot equipped with a Kinect sensor for capturing images; it also has an IR
camera which can generate a depth image. It is interfaced with a microprocessor and a Dell
Vostro laptop using the ROS framework on Ubuntu. Obstacles may be dynamic or static, so at
least two approaches are needed to handle them. Ultrasonic sensors, controlled by an
Arduino Uno, are attached to the bot to detect moving objects in its immediate path. A
YOLOv4 model is developed for object detection on images captured by the Kinect RGB
camera, and bot coordinates are
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 357
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_29
358 K. R. Sudhindra et al.
collected by a GPS module. The following sections describe the development stages
of the project. In Sect. 2, the block diagram of the proposed solution with the hardware
and software architecture is illustrated and described. In Sect. 3, the implementation of
the self-navigating bot is discussed. Section 4 discusses the results of each
implementation, and finally, conclusions are given in Sect. 5.
The self-navigating bot development involves both software and hardware interfacing of
different components. A Raspberry Pi acts as the main processor for handling the Kinect
sensor, running Ubuntu 20.04 LTS with the ROS framework. An Arduino Nano collects data from
the speed sensor for odometry of the bot, sends the data to the Pi, and controls the motors
based on Pi signals. An Arduino Uno collects data from the GPS (Neo-6M module) and IMU
(MPU6050) and conveys them to the Pi for location identification and orientation of the
bot, respectively. Ultrasonic sensors are connected for immediate obstacle avoidance, and
YOLOv4 is implemented using OpenCV and machine learning for object detection on the Pi.
The flow chart depicting the operation of the bot with the necessary hardware is shown in
Fig. 1. The software packages and algorithms required for interfacing with the hardware and
successful implementation of the prototype are shown in Fig. 2.
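The raw output of a YOLOv4 detector needs post-processing before the bot can act on it: low-confidence boxes are discarded and overlapping duplicates are removed by non-maximum suppression. The NumPy sketch below illustrates that step only (the bot's actual detector runs through OpenCV's dnn module); the (x1, y1, x2, y2) box format and thresholds are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one (x1, y1, x2, y2) box against many."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def filter_detections(boxes, scores, conf_thresh=0.5, iou_thresh=0.4):
    """Keep confident, non-overlapping detections via greedy NMS."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(scores)[::-1]       # highest confidence first
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return boxes[kept], scores[kept]
```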
Collision avoidance is based on a reconfiguration method where the joints are made
active/passive to enable a collision-free tip trajectory. Previous works on collision
avoidance are based on optimization approaches but have inherent limitations, such as
lacking information about the manipulator configuration after collision avoidance [12].
3 Implementation
In this section, the integral parts of the implementation, such as Unified Robot
Description Format (URDF) model creation, design of the hardware model, the object
detection module, simultaneous localization and mapping (SLAM), path planning, and
interfacing of different components with the Arduino, are discussed.
A 3D model of the robot is initially designed using SolidWorks software and built using
the chassis, motors, controller, and circuit connections. The design of the 3D model is
shown in Fig. 3; the chassis is made of acrylic 4 mm in thickness. The robot has a
differential drive mechanism with a two-stage body carrying two wheels and a castor wheel.
A Kinect sits on top of a flat acrylic plate supported by spacers. The model is then
exported to URDF to provide the transforms between the joints for ROS integration and
simulation purposes. Later, the URDF is used to perform the simulation in RViz along with
some ROS plugins. The robot is made to move in all possible directions and at various
speeds, and its movement is observed for any deviations due to weight distribution while
both motors are given the same velocity. Figure 3a shows the robot model created in
SolidWorks and Fig. 3b shows the model simulated in RViz.
As depicted in the SolidWorks model, the hardware bot is built with a 4 mm thick acrylic
chassis, connections are made as in the block diagram, and the final model is shown in
Fig. 4.
3.4 SLAM
A navigation algorithm is developed using the ROS framework. A map of the environment is
built, which acts as a reference for navigation, localization of the robot in 3D space, and
path planning from the current position to the user's given destination while avoiding both
dynamic and static obstacles. Localization, i.e., identifying the robot's position and
orientation with respect to the environment, can be achieved with a SLAM technique called
RTAB-Map, available in the ROS framework. It is an RGB-D graph SLAM method based on a
global Bayesian loop closure detector: whenever a new frame is captured by the Kinect
sensor, the detector determines whether it comes from a new location or an already visited
one.
IMU and encoder ticks are used to create odometry to localize the robot in the map.
Initially, sub-maps are created using consecutive scan data from the Kinect sensor; each
sub-map is a probability grid (2D matrix) for a specific region of space whose values
indicate the probability of a grid cell being obstructed. After the completion of
environment mapping, the map data is stored in the rtabmap.db database. The launch folder
contains four ROS node launch configurations, the config directory contains the RViz
configuration file, and a script for tele-operating the bot can be found in the script
directory.
Path planning is performed using several components. The move base node is used for path
planning and is responsible for functions such as robot control, traversal, and trajectory
planning. Given a goal in the world, move base publishes the velocities required to move
the robot base towards the goal using a global plan and a local plan. A cost map is a map
data type that uses laser sensor data and saved maps to update information about both
dynamic and static obstacles. For instance, with an inflation radius of 2 m, the cost of
the cells starts decreasing exponentially with distance from the obstacle, and when the
distance from the obstacle is more than 2 m the cost of the cell due to this obstacle is
zero. There are two types of cost maps, global and local: the global cost map is a static
map which considers only static obstacles, while the local cost map accounts mainly for
dynamic obstacles.
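The exponential cost decay described above can be sketched after the inflation model used by ROS's costmap_2d package; the cost constants and the parameter defaults below are illustrative assumptions.

```python
import math

# costmap_2d-style constants: a lethal cell sits on the obstacle itself,
# an inscribed cell would put the robot footprint in certain collision.
LETHAL, INSCRIBED = 254, 253

def cell_cost(distance, inscribed_radius=0.2, inflation_radius=2.0,
              cost_scaling_factor=10.0):
    """Cost contribution of one obstacle to a cell at the given distance (m)."""
    if distance <= 0.0:
        return LETHAL                   # cell centre on the obstacle
    if distance <= inscribed_radius:
        return INSCRIBED                # footprint certainly in collision
    if distance > inflation_radius:
        return 0                        # beyond the inflation radius: no cost
    # exponential decay between the inscribed and inflation radii
    return int((INSCRIBED - 1) *
               math.exp(-cost_scaling_factor * (distance - inscribed_radius)))
```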
The move base path planner subscribes to the map topic along with the wheel odometry and
laser scan, and publishes the global and local plans. The planners further subscribe to
their respective cost maps, calculate the velocity at which the robot should move, and
publish that data over the cmd_vel topic with message type geometry_msgs/Twist. The
differential drive node subscribes to this twist message and calculates the velocity for
the two motors independently based on the linear velocity in the x direction and the
angular velocity in the z direction. It publishes two messages of float type
(e.g., +/−40.0): the sign indicates clockwise or anti-clockwise rotation, and the magnitude
indicates the velocity value in m/s. With the help of rosserial, the Arduino subscribes to
both values and actuates the two motors based on the velocity commands.
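The twist-to-motor-speed computation inside the differential drive node can be sketched as follows; only the kinematics is shown, and the wheel-separation value and function name are assumptions.

```python
def twist_to_wheel_speeds(linear_x, angular_z, wheel_separation=0.3):
    """Split a geometry_msgs/Twist command into left/right wheel speeds (m/s).

    Differential drive kinematics: each wheel carries the linear speed plus
    or minus half the rotation contribution. The sign of each published
    float encodes the rotation direction, its magnitude the speed.
    """
    v_left = linear_x - angular_z * wheel_separation / 2.0
    v_right = linear_x + angular_z * wheel_separation / 2.0
    return v_left, v_right
```

For a pure forward command both wheels get the same speed; for a pure rotation the wheels get equal and opposite speeds.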
The GPS module, speed sensor, ultrasonic sensors, IMU, and keypad are interfaced with the
Arduinos. A Ublox Neo-6M GPS module is connected to the Arduino Uno over the Rx and Tx pins
using the UART serial protocol with a default baud rate of 9600. The GPS module needs to
lock on to 2–3 satellites to receive the coordinates of the bot, which may take up to
3–5 min; this delay is present because the on-chip EEPROM needs to charge up to a certain
level to get a lock on the satellites. A speed sensor is connected to an Arduino Nano, and
an encoder disk is attached to the motor shaft, so motor rotation can be measured by
counting disk ticks. The same data is used to calculate the odometry of the bot.
Programmable Bot for Multi Terrain Environment 363
D = (1/2) × c × t (1)
where D is the distance, c is the speed of sound and t is the time taken for the wave
to return. A total of three ultrasonic sensors are used for three different directions.
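Equation (1) translates directly into code; the speed-of-sound value below is the usual room-temperature assumption.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def echo_distance(echo_time_s):
    """Eq. (1): the pulse travels to the obstacle and back, so halve it."""
    return 0.5 * SPEED_OF_SOUND * echo_time_s
```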
An MPU6050 and a magnetometer (QMC5883L) are connected to the Arduino Uno for orientation
of the bot. Both are based on I2C communication, and the data can be collected over the
same bus. Rosserial communication is then used to publish the data to the ROS framework.
The geo-location detected through the GPS and Arduino interface is published to the ROS
framework for navigation in autonomous mode towards the goal set by the user. In ROS, the
geographiclib Python library and the WGS84 ellipsoid are used to convert the
geo-coordinates into Cartesian coordinates corresponding to the occupancy grid map. The
location info can be sent using either the sensor_msgs/NavSatFix or the
geometry_msgs/PoseStamped message format; in the pose-stamped method the bot's desired
orientation, i.e., a quaternion (x, y, z, w), is sent. A launch file named initialize
origin initializes and sets the origin to (0, 0, 0) and publishes a
geometry_msgs/PoseStamped message to the local xy origin frame parameter of the ROS
coordinate frame.
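A simplified stand-in for that conversion is sketched below. Instead of geographiclib it uses an equirectangular approximation on a spherical Earth, which is adequate near the origin of a small map; the Earth-radius constant and function name are assumptions.

```python
import math

EARTH_RADIUS = 6371000.0  # mean Earth radius in metres (spherical model)

def geo_to_local_xy(lat, lon, origin_lat, origin_lon):
    """Convert geo-coordinates to x (east) / y (north) metres from the origin.

    Near the origin, latitude/longitude offsets map almost linearly to
    metres; longitude is scaled by cos(latitude) to account for the
    narrowing of meridians away from the equator.
    """
    d_lat = math.radians(lat - origin_lat)
    d_lon = math.radians(lon - origin_lon)
    x = EARTH_RADIUS * d_lon * math.cos(math.radians(origin_lat))
    y = EARTH_RADIUS * d_lat
    return x, y
```

One degree of latitude is roughly 111 km, so a 0.001° offset should map to about 111 m north.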
A 3×4 keypad is connected to the Raspberry Pi for entering a security code, a feature
developed for delivery-type bots based on the Pi and keypad interface. Whenever a key on
the keypad is pressed, its column goes high; the Pi sends high signals to each row, so the
pressed key can be determined from the row and column combination. The Python library
random is used to generate a random key code of specific length as the password, and using
the pywhatkit Python library the code can be sent to selected users.
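The code-generation and verification logic can be sketched as follows (function names are assumptions; sending the code via pywhatkit and the keypad scanning itself are omitted).

```python
import random
import string

def generate_security_code(length=4):
    """Generate a random numeric code to be sent to the user and later
    compared against the code typed on the keypad."""
    return ''.join(random.choice(string.digits) for _ in range(length))

def verify_entry(expected, keys_pressed):
    """Compare the sequence of keys pressed on the 3x4 keypad with the
    code that was sent to the user."""
    return ''.join(keys_pressed) == expected
```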
In this work, an indoor environment is considered for testing the bot. Results
corresponding to RTAB-Map, object detection, GPS, and the ultrasonic sensors are presented
and discussed. SLAM is achieved on the ROS framework using the RTAB-Map node. SLAM maps
were obtained for the cases of absence of the bot, presence of the bot, and the bot
navigating in the region of interest; these cases are depicted in Figs. 5, 6 and 7,
respectively.
YOLOv4 successfully detected objects both on the webcam and on a video; Fig. 8 shows an
example of object detection performed using YOLOv4. The encoder ticks from the speed sensor
and the orientation from the MPU6050 were used to create the odometry data. Figure 9 shows
the latitude and longitude values obtained from the GPS module. The current location of the
bot was used as the origin, and the destination coordinates are to be given manually. The
GPS goal for a certain area of interest is depicted in Fig. 10.
Figure 11 shows the data gathered from all 3 ultrasonic sensors at a time.
which enables users to visualize the robot and the occupancy grid map in real time.
The SLAM algorithm was initially developed using gmapping; to increase the speed of mapping
and to navigate in unexplored areas, RTAB-Map was used at the cost of system computation.
The testing environment was limited to a small area of 3 m² due to the range constraint of
the Kinect sensor. The robot was tested for different real-time scenarios. In future work,
we plan to increase the size of the testing area,
Acknowledgements The authors would like to thank B.M.S. College of Engineering for
supporting this work.
References
1. Aziz MVG, Prihatmanto AS (2017) Implementation of lane detection algorithm for self-driving
car on toll road using python language. In: 4th international conference on electric vehicular
technology (ICEVT 2017). ITB Bandung, Indonesia
2. Prabhu S, Kannan G, Indra Gandhi K, Irfanuddin, Munawir (2018) GPS controlled autonomous
bot for unmanned delivery. In: International conference on recent trends in electrical, control
and communication (RTECC 2018), Chennai
3. Brahmanage G, Leung H (2017) A Kinect-based SLAM in an unknown environment using
geometric features. In: International conference on multisensor fusion and integration for
intelligent systems (MFI 2017), Daegu, Korea, 16–18 Nov 2017
4. Jape PR, Jape SR (2018) Virtual GPS guided autonomous wheel chair or vehicle. In: 3rd
international conference for convergence in technology (I2CT 2018). The Gateway Hotel,
XION Complex, Wakad Road, Pune, India, 06–08 Apr 2018
5. Thorat ZV, Mahadik S, Mane S, Mohite S, Udugade A (2019) Self-driving car using Raspberry
Pi and machine learning. Int Res J Eng Technol (IRJET), Navi Mumbai 6(3)
6. Das S. Simultaneous localization and mapping (SLAM) using RTAB-Map. https://arxiv.org/
pdf/1809.02989.pdf
7. ROS Noetic. http://wiki.ros.org/noetic/Installation/Ubuntu
8. rtabmap. http://wiki.ros.org/rtabmapros/Tutorials/SetupOnYourRobot
9. movebase. http://wiki.ros.org/movebase
10. IFR World Robotics. https://ifr.org/worldrobotics/
11. https://www.youtube.com/watch?v=u9l-8LZC2Dc
12. Dalla VK, Pathak PM (2015) Obstacle avoiding strategy of a reconfigurable redundant space
robot. In: Proceedings of the international conference on integrated modeling and analysis in
applied control and automation
A Computer Vision Assisted Yoga
Trainer for a Naive Performer by Using
Human Joint Detection
1 Introduction
Yoga has recently gained worldwide popularity due to its physical and mental bene-
fits. Everyone needs to practice yoga to establish a balance between themselves and
their surrounding environment. The United Nations General Assembly declared June
21st as the 'International Day of Yoga' in 2014 [1]. The uncertainty surrounding COVID-19, as well
as the subsequent lockdown, created a great deal of worry, tension, and anxiety, and being
compelled to remain at home made life extremely difficult [2]. Over
the last few years, yoga has received a lot of attention in the field of healthcare.
Yoga helps in the reduction of stress and anxiety, as well as the improvement of
physical health and the minimization of negative mental effects [3]. People who do
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 369
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_30
370 R. Sachdeva et al.
not have a clear grasp of yoga begin practicing it without proper direction and, as a
consequence, harm themselves through incorrect posture.
Yoga should therefore be performed under the guidance of a professional.
Human Pose Estimation is a well-studied topic with applications in a variety of
fields, including human–computer interaction, virtual reality, robots, and many more
[4]. A perfect blend of these techniques can create wonders. Many frameworks and
keypoint detection libraries for pose estimation have been introduced, which make
it easier for everyone to build AI-based applications. One of them is the Mediapipe
framework by Google, which solves problems such as face detection, hand and pose tracking,
and object detection using machine learning [5].
The aim of our method is to correct the user’s yoga asana in real-time. We have
developed a user-friendly Python-Flask based web application that assists its regis-
tered users to perform every pose accurately. The user is given feedback on how to
modify their incorrect posture. The name of our web application is “Yuj” which is a
Sanskrit root word for yoga: meaning to join or to unite [6].
“Yuj” is currently functioning for four asanas: Adho Mukha Svanasana
(downward-facing dog posture), Phalakasana (plank pose), Trikonasana (triangular
pose) and Virabhadrasana II (warrior-2 pose). The rationale for selecting these four
asanas comes from the ready availability of professional videos on the web, as well
as the fact that these asanas are highly popular and simple for those who are new
to yoga.
Related work, methodology, and results are discussed in the following sections,
followed by concluding remarks. Section 2 provides an overview of the work that has
been proposed by others in the area. In Sect. 3, data collection and methodology are
described. Experimental results are discussed in Sect. 4. Sections 5 and 6 examine
concluding remarks and future prospects, respectively.
2 Related Work
A plethora of work has been proposed for the identification of human posture. Chen et
al. [7] proposed a yoga self-training system that uses a Kinect depth camera to assist
users in correcting postures while performing 12 different asanas. It uses manual
feature extraction and creates separate models for each asana.
Trejo et al. [8] suggested a yoga recognition system for six asanas using Kinect
and Adaboost classification and achieved an accuracy of 94.78%. For identification,
they employed a depth sensor-based camera, which may not be generally available
to the general public.
Borkar et al. [9] have developed a method called Match Pose, to compare a user’s
real-time pose with a pre-determined posture. They employed the PoseNet algorithm
to estimate users’ poses in real-time. They compared and checked whether users’ real-
time poses were properly replicated using pose comparison algorithms. The proposed
approach enabled the user to choose only the image they wanted to replicate. Then,
the user's real-time postures were collected using a camera and analyzed using
A Computer Vision Assisted Yoga Trainer for a Naive Performer … 371
a human pose estimation algorithm. The same technique was used to process the
selected image from the database. The system cannot compare yoga postures that
involve finger placements.
Rishan et al. [10] proposed a yoga posture detection and correction system, that
uses open pose to detect body keypoints and a Deep Learning model that analyzes and
predicts user posture or asana using a sequence of frames utilizing time-distributed
Convolutional Neural Networks, Long Short-Term Memory, and SoftMax regression.
OpenPose is a real-time multi-person keypoint detection library introduced by the
Perceptual Computing Lab of Carnegie Mellon University (CMU) [11]. It can jointly
detect a human body, hand, facial, and foot keypoints on a single image.
Islam et al. [12] used the Microsoft Kinect to capture a person’s joint coordinates
in real-time. This system can only detect yoga poses; it cannot, however, assist the
user in correcting an incorrect yoga posture.
Hand tracking is a key component that enables natural interaction and conversa-
tion, and it has been a subject of great interest in the industry. A significant portion of
previous work necessitated the use of specialized hardware, such as depth sensors.
In one investigation [13], the authors used Mediapipe to demonstrate a real-time on-
device hand tracking system that uses a single RGB camera to identify a human
hand skeleton. It also presents a unique approach that works in real-time on mobile
devices and does not require any additional hardware.
3 Proposed Methodology
The overall workflow of the system is as follows: the user first registers on
"Yuj". After logging into the system, he/she can select the desired asana from the
following: Adho mukha svanasana (downward-facing dog), Phalakasana (plank),
Trikonasana (triangular pose), and Virabhadrasana II (warrior-2 pose), as
shown in Fig. 1.
As soon as the pose is selected, the webcam is activated and the user starts
performing the selected pose. Once in position, he/she shows a closed-fist gesture
to the webcam, which starts the video recording, and his/her posture is captured. After
recording the pose for a 5 s duration, visual and textual feedback is generated and
provided to the user. Figure 2 shows the flowchart of our implementation.
It is hard to find an accurate and effective yoga-pose video dataset on the web. We
gathered videos of people of various age groups and genders performing four yoga
asanas: Virabhadrasana II (warrior-II), Trikonasana (triangular pose), Phalakasana
(plank), and Adho Mukha Svanasana (downward-facing dog) from various online
sources, including video channels and websites, for training purposes. According to
a survey conducted by the Patanjali Research Foundation [14] on 3135 yoga-experienced
persons, most people in the age groups of 21–44 years, 45–59 years, and over
60 years have a strong belief in the benefits of yoga and its practice. So, in our
data collection of yoga videos, we have considered data ratios (shown in Fig. 3)
similar to those provided in Table 2 of the survey [14].
A total of 50 videos were collected for testing and training purposes. In Fig. 3, the
term training data refers to the video datasets with which we determined the angle
ranges for feedback generation, whereas the term testing data refers to the videos
used for observing the accuracy of our feedback.
For testing, all of the videos were recorded for 5 s in indoor as well as outdoor
locations at a frame rate of 20 frames per second (shown in Fig. 4). Table 1 describes
the 4 poses which registered users can perform on Yuj.
Fig. 4 Rows 1, 2 and 3 represent testing data for the age groups 10–20 years, 21–44 years and > 60 years, respectively
Table 2 Number of professionals' videos observed for each pose

Yoga posture            No. of videos of professionals
Virabhadrasana II       10
Trikonasana             9
Phalakasana             8
Adho Mukha Svanasana    8
To identify the timestamp when the user is ready in pose, we have introduced hand
gesture recognition in our code. A specific gesture is defined which, when identified
for the very first time, commands the machine to start pose recognition and stop hand
gesture recognition. In order to minimize latency and complexity, we aim to work only
on those frames in which the user is ready in pose and there is minimum deflection.
To measure deflection, the deviation in the user's keypoints between adjacent
captured frames is observed.
The Mediapipe Hands solution (initialized with "mediapipe.solutions.hands") is used
here for hand keypoint detection with a detection confidence of 0.7. We deduce 21
three-dimensional landmarks of a hand from a single frame (Fig. 5b depicts all 21
keypoints). In our approach, Mediapipe's palm detector (which has an average
precision of 95.7% in palm identification) works on a full webcam-captured image of
640 × 480 and locates palms via an aligned hand bounding box.
The detection of hand gestures is done with the help of a finger count and a frame
count. A "closed fist" hand gesture is used as the initializing gesture to activate
human pose recognition, as shown in Fig. 5a. It is identified when the finger count = 0
for 50 continuous frames. A count of 50 frames means holding the closed-fist gesture
for 2.5 s (50 f / 20 fps = 2.5 s); this wait time makes sure that the triggering
gesture is shown by the user when he/she is actually ready in pose.
We have defined an array which consists of the hand landmarks of the tips of all
fingers (Fig. 5b shows the hand landmarks defined by Mediapipe): tips = [4, 8, 12, 16, 20].
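As an illustration, the closed-fist trigger described above can be sketched as a pure function over the 21 hand landmarks. This is our own minimal sketch of a common finger-counting heuristic (tip vs. lower-joint comparison), not the authors' exact code; the landmark indexing follows Mediapipe Hands.

```python
# Minimal sketch of the finger-count / closed-fist trigger (illustrative only).
# Landmarks are 21 (x, y) pairs indexed as in Mediapipe Hands
# (0 = wrist, 4/8/12/16/20 = fingertips); image origin is top-left.
tips = [4, 8, 12, 16, 20]

def finger_count(landmarks):
    """Count extended fingers; smaller y means higher in the image."""
    count = 0
    # Thumb: compare tip 4 with joint 3 horizontally (assumes a right hand
    # facing the camera; a real system would also branch on handedness).
    if landmarks[4][0] > landmarks[3][0]:
        count += 1
    # Other fingers: the tip lies above its PIP joint (tip index - 2) when extended.
    for tip in tips[1:]:
        if landmarks[tip][1] < landmarks[tip - 2][1]:
            count += 1
    return count

def fist_held(frames, required=50):
    """True once finger_count == 0 holds for `required` consecutive frames
    (50 frames at 20 fps = the 2.5 s hold time used in the text)."""
    streak = 0
    for landmarks in frames:
        streak = streak + 1 if finger_count(landmarks) == 0 else 0
        if streak >= required:
            return True
    return False
```

In a live system, `frames` would be the per-frame landmark output of the Mediapipe Hands pipeline rather than synthetic coordinates.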
Fig. 5 a “Closed fist” gesture which acts as a trigger to start recording of video b Detailed
information of Hand Landmarks in Mediapipe [15]
Once the closed-fist gesture is detected, the incoming video stream is fed to the
Mediapipe pose pipeline (its pose detector is covered briefly in Fig. 6) for pose
landmark detection [16].
Upon correct detection of the pose in a frame, those frames are processed in real-
time to obtain the 33 pose landmarks (joint coordinates), and a live stick diagram is
displayed on the web page. Only the x and y coordinates of human joints, normalized
to [0, 1] by the image width and height respectively, are written to a csv file. Figure 7
depicts all the joint landmarks defined in Mediapipe; they play a vital role in our
feedback mechanism.
The landmark's distance from the camera is represented by the z coordinate in
Mediapipe, with the origin being the depth at the midpoint of the hips; the larger
the value, the farther the joint is from the camera. The value of z is determined
using a scale similar to that of x. With x, y, z, and visibility, the 3D plots
obtained for the 4 different poses are shown in Fig. 8; these 3D plots are not
displayed on our webpage but are shown here for a better understanding of the concept.
Fig. 8 3D plot of: a Adho mukha svanasana (downward-facing dog) b Phalakasana (plank pose)
c Trikonasana (triangular pose) d Virabhadrasana II (warrior-2 pose)
To accurately define the angle ranges for the various poses, we took 35 videos of
professionals as reference. Each frame of these videos was used to determine the
feasible range of angles for particular joints of a specific pose. Table 2 describes
the number of professional videos considered for each asana.
We have used the mathematical formulae in Eqs. (1–3) to calculate the angle
between 3 joints. Let us consider 3 joints J1, J2 and J3. To calculate the angle
between lines J1-J2 and J2-J3:
Step 1: Use the distance formula to find the distances J12, J23 and, analogously, J13:
J12 = sqrt((J1(x) − J2(x))^2 + (J1(y) − J2(y))^2)   (1)
J23 = sqrt((J2(x) − J3(x))^2 + (J2(y) − J3(y))^2)   (2)
Step 2: Use the law of cosines for the angle at vertex J2:
angle(J123) = arccos((J12^2 + J23^2 − J13^2) / (2 · J12 · J23))   (3)
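The law-of-cosines steps of Eqs. (1)–(3) transcribe directly into code. The sketch below is our reading, with the angle taken at vertex J2; the function name is ours.

```python
import math

def joint_angle(j1, j2, j3):
    """Angle in degrees at vertex j2 between segments j1-j2 and j2-j3,
    following Eqs. (1)-(3); joints are (x, y) pairs of normalized coordinates."""
    j12 = math.dist(j1, j2)                  # Eq. (1)
    j23 = math.dist(j2, j3)                  # Eq. (2)
    j13 = math.dist(j1, j3)                  # side opposite vertex J2
    cos_a = (j12 ** 2 + j23 ** 2 - j13 ** 2) / (2 * j12 * j23)  # Eq. (3)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
```

For example, three collinear joints (a straight arm) yield 180°, consistent with reference values near 178° for a straightened limb.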
After observation, we have found that only 8 angles are sufficient to uniquely
identify a particular pose as correct or incorrect. Given below is the list of angles
considered for pose corrections which is also available on our website:
LH = Angle between Left_shoulder, Left_elbow and Left_wrist.
RH = Angle between Right_wrist, Right_elbow and Right_shoulder.
LU = Angle between Left_hip, Left_shoulder and Left_elbow.
RU = Angle between Right_elbow, Right_shoulder and Right_hip.
LW = Angle between Left_shoulder, Left_hip and Left_knee.
RW = Angle between Right_shoulder, Right_hip and Right_knee.
LL = Angle between Left_ankle, Left_knee and Left_hip.
RL = Angle between Right_ankle, Right_knee and Right_hip.
We further calculated the average of all the feasible angles across the professionals'
video dataset, as depicted in Table 3. These averages are taken as the "Threshold
angle" values.
Table 3 The reference angle values obtained from several professional videos

Reference values         LH    RH    LU    RU    LW    RW    LL    RL
Virabhadrasana II        178°  178°  90°   90°   135°  90°   178°  90°
Trikonasana              175°  170°  135°  85°   165°  60°   165°  170°
Phalakasana              90°   90°   90°   90°   167°  167°  178°  178°
Adho Mukha Svanasana     175°  175°  178°  178°  60°   60°   179°  179°

For feedback purposes, the angle range is divided into two categories:
• Threshold angle ± 4° deviation → acceptable range (no feedback needed for that
particular angle)
• Correction is given for angles deviating beyond this range.
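In code, this acceptance test reduces to a single comparison. The sketch below is ours (function and variable names are assumptions, not the authors' code); `tol=4` matches the ±4° deviation above.

```python
def angle_feedback(measured, threshold, tol=4.0):
    """Return None when `measured` lies within threshold ± tol degrees
    (the acceptable range); otherwise return the signed deviation in
    degrees so a correction message can be generated for that angle."""
    deviation = measured - threshold
    return None if abs(deviation) <= tol else deviation
```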
4 Experimental Results
When the trigger of the "closed fist" gesture (depicted by 0 in Fig. 5a) is provided,
the code executes two processes in parallel. The two side-by-side running processes
are
• Video Recording (discussed in detail in Sect. 4.1)
• Real-Time Pose Estimation (discussed in detail in Sect. 3.3).
Recording of a 5 s video is performed using OpenCV from the very moment the
trigger is captured. We chose a 5 s timer because recording a video longer than
5 s, when the user is already in pose from the very start, only increases the
computational load to no benefit. Once the 5 s are over, the system saves the
recorded video from the browser into the user's downloads folder. The purpose of
this feature is to provide the recorded videos to the user for his/her reference;
for example, users can compare their previously recorded videos with their latest
ones to observe their improvement over time.
Once the 5 s timer ends, the front-end pose webpage redirects to the feedback
page, while the backend processing is shown in Fig. 9.
The obtained csv file of joint coordinates is read, and the mean of the ten stillest
frames is computed. The mean is calculated for only 12 joint coordinates (listed in
Table 4) to increase accuracy and decrease complexity, thereby reducing the latency
of the code. The mean coordinates obtained from the csv file are the most precise
coordinates for the 5 s recorded video. Only 12 joints are selected because these
joints are sufficient to calculate the 8 angles for feedback generation.
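The "ten stillest frames" step can be sketched as follows. The stillness metric used here (total joint displacement between adjacent frames) is our assumption, since the text does not spell it out.

```python
import math

def mean_of_stillest(frames, k=10):
    """Average the k least-moving frames, coordinate-wise.

    `frames` is a list of per-frame joint sets, each a list of (x, y)
    pairs (e.g. the 12 joints of Table 4). A frame's stillness score is
    the total displacement of its joints relative to the previous frame;
    the k lowest-scoring frames are averaged joint by joint.
    """
    scores = sorted(
        (sum(math.dist(a, b) for a, b in zip(frames[i - 1], frames[i])), i)
        for i in range(1, len(frames))
    )
    stillest = [frames[i] for _, i in scores[:k]]
    return [
        (sum(f[j][0] for f in stillest) / len(stillest),
         sum(f[j][1] for f in stillest) / len(stillest))
        for j in range(len(frames[0]))
    ]
```

The resulting mean coordinates are then fed to the angle computation for feedback generation.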
Before moving to the angle approach, let us compare the deviation of the 4 poses
performed by the user from the reference pose, i.e., the professional's pose. This
deviation approach alone is not a sufficient basis for feedback: even when the 8
angles of the user and the professional are almost the same, the two plots do not
coincide, as shown in Fig. 10. In the compared scatter plots (refer to Fig. 10), the
main cause of the observed deviations is the individual's distance from the camera.
Figure 11 depicts a frame of the user's performed pose. The images in Fig. 11 are
used only to display the 8 angles of the feedback mechanism; none of them are
shown on the website.
This data is used to give precise visual feedback to the user by plotting a scatter
plot and displaying it on the web page. A list of strings is made to give textual
feedback for all the 8 joint coordinates (if wrongly positioned). Figure 12 depicts
the visual and textual feedback of one yogi performing different poses (an option to
select the pose is available on our website). The feedback for the performed yoga
pose is generated by the web application.
The initial word in the textual feedback is categorized as
• Excellent!—when no angle exceeds the acceptable range
• Good!—when 1 or more angles are beyond the acceptable range
• Oops!—when no angle is within the acceptable range.
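The categories overlap slightly as written (every "Oops!" case also has one or more bad angles); one consistent reading, sketched below with names of our own choosing, is that "Good!" applies when some but not all angles are wrong.

```python
def feedback_word(deviations, tol=4.0):
    """Choose the leading feedback word from the 8 angle deviations
    (measured minus threshold, in degrees). Our reading: 'Good!' covers
    the case where some but not all angles are outside the ±tol range."""
    wrong = sum(1 for d in deviations if abs(d) > tol)
    if wrong == 0:
        return "Excellent!"
    if wrong < len(deviations):
        return "Good!"
    return "Oops!"
```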
To improve the overall visual feedback, the joint angles whose values fall outside
the acceptable range are highlighted with green sticks. Figure 13 shows the visual
and textual feedback displayed on our webpage.
Fig. 13 Stick diagram with textual feedback a Phalakasana (plank pose) b Trikonasana (triangular
pose) c Virabhadrasana II (warrior-2 pose) d Adho mukha svanasana (downward-facing dog)
5 Conclusion
6 Future Prospects
Further enhancements to the web app may be developed by including the concept of
posture classification so that users can perform any pose they desire rather than being
prompted to select a yoga pose. Data set collection is relatively small to perform this
operation which can be further extended to get more accurate results. Our app is
restricted to four asanas at the moment: Adho mukha svanasana (downward-facing
dog posture), Phalakasana (plank pose), Trikonasana (triangular pose) and Virab-
hadrasana II (warrior-2 pose) which can be extended to include a variety of other
yoga poses such as Suryanamaskar, Bhujangasana, Padmasana, etc. Furthermore,
this can also be extended to sports-related activities. It can be applied for evaluating
the quality of skating elements, tracking and estimating the 3D human poses of
players, and estimating jumps of various types, which can benefit sportspeople in
many ways, including coordination checks and injury prevention. The system can
be improved further by incorporating voice feedback.
References
1. Guddeti RR, Dang G, Williams MA, Alla VM (2019) Role of Yoga in cardiac disease and
rehabilitation. J Cardiopulm Rehabil Prev 3:146–152
2. Rodríguez-Hidalgo AJ, Pantaleón Y, Dios I, Falla D (2020) Fear of COVID-19, Stress, and
Anxiety in University undergraduate students: a predictive model for depression. Front Psychol
11
3. Sharma YK, Sharma S, Sharma E (2018) Scientific benefits of Yoga: a review. Int J Multidiscip
Res 03:11–148
4. Chen Y, Tian Y, He M (2020) Monocular human pose estimation: a survey of deep learning-
based methods. Comput Vis Image Understand
5. Lugaresi C, Tang J, Nash H, McClanahan C, Uboweja E, Hays M, Zhang F, Chang CL, Yong
MG, Lee J, Chang WT, Hua W, Georg M, Grundmann M (2019) MediaPipe: a framework for
building perception pipelines
6. Yoga: its origin, history and development. https://www.mea.gov.in/search-result.htm?
25096/Yoga:_su_origen,_historia_y_desarrollo. Accessed 2021
7. Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools
Appl 77:23969–23991
8. Trejo EW, Yuan P (2018) Recognition of Yoga poses through an interactive system with kinect
device. In: 2018 2nd international conference robotics and automation science: ICRAS, pp
12–17
9. Borkar PK, Pulinthitha MM, Pansare A (2019) Match pose—a system for comparing poses.
Int J Eng Res Technol (IJERT) 08(10)
10. Rishan F, Silva BB, Alawathugoda S, Nijabdeen S, Rupasinghe L, Liyanapathirana C (2020)
Infinity Yoga Tutor: Yoga posture detection and correction system. In: 2020 5th international
conference on information technology research
11. Cao Z, Simon T, Wei SE, Sheikh Y (2017) OpenPose: realtime multi-person 2D pose estimation
using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern
recognition (CVPR), pp 7291–7299
12. Islam MU, Mahmud H, Ashraf FB, Hossain I, Hasan MK (2017) Yoga posture recognition
by detecting human joint points in real time using microsoft Kinect. In: 2017 IEEE region 10
humanitarian technology conference (R10-HTC), Dhaka, pp 668–673
13. Zhang F, Bazarevsky V, Vakunov A, Tkachenka A, Sung G, Chang CL, Grundmann G (2021)
MediaPipe hands: on-device real-time hand tracking
14. Telles S, Sharma SK, Chetry D, Balkrishna A (2021) Benefits and adverse effects associated
with yoga practice: a cross-sectional survey from India. Complementary therapies in medicine.
Elsevier
15. MediaPipe Github. https://google.github.io/mediapipe/solutions/hands. Accessed 2021
16. On-device, Real-time Body Pose Tracking with MediaPipe BlazePose. https://ai.googleblog.
com/2020/08/on-device-real-time-body-pose-tracking.html. Accessed 2021
17. MediaPipe Github. https://google.github.io/mediapipe/solutions/pose. Accessed 2021
Study of Deformation in Cold Rolled Al
Sheets
1 Introduction
Rolling is a commonly used method to reduce the thickness of a sheet. The
generally applied parameters for rolling simulation are the radius of the rolls, the
roll velocity, the friction coefficient, and the initial and final thicknesses of the
rolled sheet [1]. In general, the reference directions are indicated according to the
following scheme: x, y and z correspond to the rolling (RD), transverse (TD), and
normal (ND) directions, respectively.
Previous studies on material flow during cold rolling [1, 2] suggest that the
displacement field across the thickness is not homogeneous and can be assessed by
the function:
dx = α · z^n   (1)
μmin = [ln(h0/h) + sqrt((h0 − h)/(4R))] / [2 · sqrt(R/h) · tan⁻¹(sqrt(h0/h − 1))]   (2)
where μmin is the minimum friction coefficient necessary for cold rolling, h is the
sheet thickness of the deformed sheet, h0 is the thickness prior to rolling, and R is
the radius of the rolls.
In numerous studies [2–6], the value of the friction coefficient is estimated either
by analytical approximations or by results of finite element modeling; however, the
exact quantity remains unknown. In this view, this contribution presents a way to
assess the friction coefficient based on experimental evidence and finite element
calculations.
2 Modeling Methods
3 Model Parameters
The model parameters used to simulate the rolling process are presented in Table 1.
The minimum value of μ, calculated with Eq. 2, is 0.048 and therefore the following
COF values were used for the simulation: 0.05, 0.06, 0.07, 0.08, 0.10, 0.15, 0.20 and
0.25.
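A quick numerical check of the minimum-friction condition reproduces the stated value. The form of Eq. 2 used below is our reading of the garbled print, and the 2 mm initial thickness is our assumption, chosen to be consistent with a 30% reduction to the roughly 1.4 mm final sheet implied by the 0.7 mm half-thickness in the displacement plots.

```python
import math

def mu_min(h0, h, R):
    """Minimum friction coefficient for cold rolling (our reading of Eq. 2;
    h0 and h are sheet thicknesses before/after rolling, R the roll radius)."""
    num = math.log(h0 / h) + math.sqrt((h0 - h) / (4 * R))
    den = 2 * math.sqrt(R / h) * math.atan(math.sqrt(h0 / h - 1))
    return num / den

# 30% reduction, 150 mm roll diameter (R = 75 mm); thicknesses in mm.
mu = mu_min(h0=2.0, h=1.4, R=75.0)
print(round(mu, 3))  # ~0.047, in line with the 0.048 quoted in the text
```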
In order to examine the deformation flow in the rolled Al sheet, the TD plane of
a virgin (deformation-free) material was marked by the microindentation technique,
and as a result, rectangular patterns were created (see Fig. 4). The distortion of the
initially straight lines (perpendicular to RD) after a 30% reduction (with a roll
diameter of 150 mm) is shown in Fig. 5. The displacement values can be determined
using the function expressed by Eq. 3. This equation is a polynomial approximation
of Eq. 1; the advantage of expression 3 is that it can also describe the nonmonotonic
displacement patterns which appear at high friction coefficients.
dx = A · z^8 + B · z^6 + C · z^4 + D · z^2   (3)
where coefficients A, B, C, and D are fitting parameters and their values are listed in
Table 2 for various friction coefficients.
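Eq. 3 can be evaluated (or fitted) directly. In the sketch below the coefficient values are placeholders of our own, not the Table 2 values; the point is that the even polynomial vanishes at the mid-plane and is symmetric about it.

```python
def dx(z, A, B, C, D):
    """Even-polynomial displacement profile of Eq. 3; z is the
    through-thickness coordinate, A-D the fitted parameters."""
    return A * z ** 8 + B * z ** 6 + C * z ** 4 + D * z ** 2

# Placeholder coefficients (illustrative only): because all powers are even,
# material points mirrored about the mid-plane displace identically, and the
# displacement vanishes at the mid-plane itself.
coeffs = (0.1, -0.2, 0.3, 0.05)
```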
Fig. 4 Reference patterns, made by microhardness indentation on the plane perpendicular to the
TD prior to rolling (rolling direction is parallel to the scalebar)
Study of Deformation in Cold Rolled Al Sheets 391
Fig. 5 Displacement of
microhardness patterns after
30% thickness reduction
(rolling direction is
perpendicular to the
scalebar)
Analyzing the data of Table 2, one can conclude that the fitting parameters A-D
are functions of friction coefficient μ and can be calculated by employing Eqs. 4–7.
The corresponding displacement patterns for various COFs are shown in Fig. 6.
Fig. 6 Experimentally observed (MEA) and calculated displacement patterns by FEM (SIM) and
analytical expression 3 (FIT); dx (mm) plotted against z (mm)
The strain values can be subdivided into two groups: normal and shear components
[9]. The normal strain can be computed by using Eq. 8 [10, 11], while the shear
component can be estimated by Eqs. 9 and 10 [12, 13]. Once both components are
known, the value of equivalent strain can be determined by Eq. 11 [13].
ε = εx = −εz = ln(h0/h)   (8)
εs = γ · [2(1 − ε)² / (ε(2 − ε))] · ln(1/(1 − ε))   (9)
Fig. 8 Shear strain values εs computed for different friction coefficients (μ from bottom to top:
0.05, 0.06, 0.07, 0.08, 0.10, 0.15, 0.20 and 0.25)
γ = dx/dz   (10)
εvM = sqrt[(4/3) · ln²(1/(1 − ε)) + εs²/3]   (11)
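A numerical sanity check of Eqs. (9)–(11) as reconstructed here (our reading; ε enters Eqs. (9) and (11) as the thickness reduction, 0.3 for the 30% pass) reproduces the floor of the equivalent-strain curves in Fig. 9.

```python
import math

def strains(eps, gamma):
    """Shear and von Mises equivalent strain per our reading of Eqs. (9)-(11);
    eps is the thickness reduction and gamma = dx/dz the shear (Eq. 10)."""
    eps_s = gamma * (2 * (1 - eps) ** 2) / (eps * (2 - eps)) * math.log(1 / (1 - eps))
    eps_vm = math.sqrt((4 / 3) * math.log(1 / (1 - eps)) ** 2 + eps_s ** 2 / 3)
    return eps_s, eps_vm

# With no shear (gamma = 0) at 30% reduction, the equivalent strain reduces
# to (2/sqrt(3))*ln(1/(1-eps)) ~ 0.412, the floor of the curves in Fig. 9;
# adding shear raises it toward the ~0.43 seen at high friction coefficients.
```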
5 Summary
In this study, the friction coefficient was determined for a given roll gap geometry
based on both experimental evidence and numerical simulations. It was shown that
rolling of Al sheet with 30% thickness reduction with a roll diameter of 150 mm
Fig. 9 Equivalent strain values εvM, plotted against z (mm), calculated for different friction
coefficients (μ from bottom to top: 0.05, 0.06, 0.07, 0.08, 0.10, 0.15, 0.20 and 0.25)
corresponds to a friction coefficient of 0.068, and this value correlates well with
those reported in literature sources.
A new polynomial function was developed for the estimation of displacement
fields during cold rolling. The model parameters for the polynomial equation were
determined by analyzing the data obtained from finite element calculations. It was
shown that the analytical expression developed is capable of reproducing the FEM
outputs with high accuracy.
The measured displacement profile values were used for validation of the simu-
lated data. The newly developed model accurately reproduces the experimentally
observed deformation flow profile. The correlation coefficient between the measured
and simulated values is estimated to be 0.871.
The model parameters of the polynomial function developed can be determined
for various rolling conditions by the algorithm described in the current study. The
analytical model can also be extended to other materials.
Acknowledgements Project no. TKP2021-NVA-29 has been implemented with the support
provided by the Ministry of Innovation and Technology of Hungary from the National Research,
Development, and Innovation Fund, financed under the TKP2021-NVA funding scheme.
References
1. Bátorfi JGy, Chakravarty P, Sidor J (2021) Investigation of the wear of rolls in asymmetric
rolling. eis 14–20. https://doi.org/10.37775/EIS.2021.2.2
2. Sidor JJ (2019) Assessment of flow-line model in rolling texture simulations. Metals 9:1098.
https://doi.org/10.3390/met9101098
396 J. G. Bátorfi and J. J. Sidor
3. Avitzur B (1980) Friction-aided strip rolling with unlimited reduction. Int J Mach Tool Des
Res 20:197–210. https://doi.org/10.1016/0020-7357(80)90004-9
4. Decroos K, Sidor J, Seefeldt M (2014) A new analytical approach for the velocity field in
rolling processes and its application in through-thickness texture prediction. Metall Mat Trans
A 45:948–961. https://doi.org/10.1007/s11661-013-2021-3
5. Cawthorn CJ, Loukaides EG, Allwood JM (2014) Comparison of analytical models for sheet
rolling. Procedia Eng 81:2451–2456. https://doi.org/10.1016/j.proeng.2014.10.349
6. Minton JJ, Cawthorn CJ, Brambley EJ (2016) Asymptotic analysis of asymmetric thin sheet
rolling. Int J Mech Sci 113:36–48. https://doi.org/10.1016/j.ijmecsci.2016.03.024
7. Fluhrer J DEFORM(TM) 2D Version 8.1 User’s Manual
8. Beausir B, Tóth LS (2009) A new flow function to model texture evolution in symmetric and
asymmetric rolling. In: Haldar A, Suwas S, Bhattacharjee D (eds) Microstructure and texture
in steels. Springer, London, pp 415–420
9. Bátorfi JGy, Sidor J (2020) Investigation of the deformation of aluminium sheet during
asymmetric rolling (in Hungarian). eis 5–14. https://doi.org/10.37775/eis.2020.1.1
10. Pesin A, Pustovoytov DO (2014) Influence of process parameters on distribution of shear
strain through sheet thickness in asymmetric rolling. KEM 622–623:929–935. https://doi.org/
10.4028/www.scientific.net/KEM.622-623.929
11. Inoue T (2010) Strain variations on rolling condition in accumulative roll-bonding by finite
element analysis. In: Moratal D (ed) Finite element analysis. Sciyo
12. Ma CQ, Hou LG, Zhang JS, Zhuang LZ (2014) Experimental and numerical investigations of
the plastic deformation during multi-pass asymmetric and symmetric rolling of high-strength
aluminum alloys. MSF 794–796:1157–1162. https://doi.org/10.4028/www.scientific.net/MSF.
794-796.1157
13. Inoue T, Qiu H, Ueji R (2020) Through-Thickness microstructure and strain distribution in
steel sheets rolled in a large-diameter rolling process. Metals 10:91. https://doi.org/10.3390/
met10010091
Modelling and Control
of Semi-automated Microfluidic
Dispensing System
1 Introduction
Nowadays, in the field of syringe dispensing systems, the development of a
high-precision device is a challenging task, which is addressed by the proposed design.
The authors of [1] developed a syringe injection rate detection system based on two
Hall-effect sensors in the differential mode of operation. From tests conducted on the
prototype developed, the worst-case error in p was found to be less than 1.2% and
the error in the determination of the rate of injection to be less than 2.4%. This is
within clinically acceptable limits, since the rate of injection in practical scenarios
rarely exceeds 15 ml/s [1]. An electronic technique uses a needling instrument to
detach used needles from the syringe automatically and then collect them. The caliber
of the developed design targets the common 10 and 20 ml syringes used in hospital
practice [2]. A novel machine-driven injection device is presented, specifically
designed for the accurate delivery of multiple doses of product through a variety of
adjustable injection parameters, including injection depth, dose volume and needle
insertion speed. The device was originally intended for the delivery of a cell-based
therapy to patients with skin wounds caused by epidermolysis bullosa [3]. Consequently,
there is a strong demand for automated liquid-handling strategies such as sensor-
integrated robotic systems. Sample volumes are at the micro- or nanoliter level, and
the number of transferred samples is immense when working under large-scale
combinatorial conditions. Under these conditions, liquid handling by hand is tedious,
time-consuming, and impractical [4].
Some patents related to microfluidic dispensing systems concern the technical field
of cell culturing for the production of in-vitro tissues and provide a device for
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 397
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_32
398 M. Prabhu et al.
dispensing a suspension of biological cells into culture vessels for culture, comprising means for re-suspending cells within the suspension [5]. A highly automated, high-volume multichannel pipetting system transfers liquid from mother plates to daughter plates, or from a fill station to daughter plates [6].
Stepper Motor: A motor has to be coupled with the lead screw to provide the rotary motion. As mentioned above, the pitch of the screw does not meet the required minimum movement, so the actuator is selected such that it can achieve the minimum required movement. In this case, a high-resolution stepper motor can be used, as the stepping angle of the motor can be controlled to the required position. The specifications of the stepper motor are tabulated in Table 1.
Torque Calculation of the Microfluidic Dispensing System

Required minimum volume to be manipulated: 25 µl
Pitch of the screw: 1 mm
Coefficient of friction between nut and screw, µf: 0.73
Volume displaced by the syringe per mm of stroke: 0.035 ml or 35 µl
Required minimum movement of the plunger: 25/35 = 0.714 mm
Peak load, P: 100 g or 0.1 N
Angle made by the stepper motor per mm of travel: 360°
Angle required for 0.714 mm of travel: 0.714 × 360° = 257°

For a stepper motor of 1.8° resolution, the number of steps required to rotate through 257° is 257/1.8 = 142.8 steps. So, a stepper motor with 1.8° resolution can be chosen. For a screw having a pitch (p) of 1 mm and a diameter (D) of 4 mm,
Thread Angle, α = p/(πD)    (1)
tan φ = 0.73, hence φ = 0.63 rad
Torque, τ = (PD/2) tan(φ + α)
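As a numerical cross-check, the step-count and torque calculation above can be sketched in Python. The function names are illustrative; the input values (pitch 1 mm, D = 4 mm, µf = 0.73, P = 0.1 N, 1.8° steps) are taken directly from the text.

```python
import math

def lead_screw_torque(pitch_mm, diameter_mm, mu_f, load_n):
    """Raising torque of a lead screw: tau = (P * D / 2) * tan(phi + alpha)."""
    alpha = math.atan(pitch_mm / (math.pi * diameter_mm))  # thread (lead) angle, rad
    phi = math.atan(mu_f)                                  # friction angle, rad
    d_m = diameter_mm / 1000.0                             # mm -> m
    return load_n * d_m / 2.0 * math.tan(phi + alpha)

def steps_for_stroke(stroke_mm, pitch_mm=1.0, step_deg=1.8):
    """Number of full steps needed to advance the nut by stroke_mm."""
    angle_deg = (stroke_mm / pitch_mm) * 360.0
    return angle_deg / step_deg

tau = lead_screw_torque(pitch_mm=1.0, diameter_mm=4.0, mu_f=0.73, load_n=0.1)
steps = steps_for_stroke(0.714)
print(tau, steps)  # tau on the order of 1.7e-4 N m; steps ~142.8
```

The computed torque agrees with the theoretical value of about 0.00017 Nm quoted later in the text, confirming the large margin against the 0.4609 Nm motor.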
The desired position signals were given to the stepper motor to achieve the various stroke lengths. Figure 4 shows the input position values for the stepper motor with lead screw. The input signal represents continuous suction and dispensing operations for five cycles without a break in operation. Each cycle consists of a 10 s plunger movement of the syringe from bottom to top and vice versa, five times, with some random direction changes within each cycle (Fig. 5).
The total torque required by the motor was simulated and is shown in Fig. 6. The graph shows that the recorded maximum torque value in the positive half is 0.00008 Nm, representing the motor rotating in the clockwise direction, and the maximum torque value in the negative half is 0.00012 Nm, representing the motor rotating in the anti-clockwise direction. The stepper motor used in the assembly has 0.4609 Nm (4.7 kg cm) of torque. The calculated theoretical torque value is 0.0001715 Nm.

Fig. 2 Semi-automated syringe dispensing system. a Automated syringe. b Exploded view. c Cut section of syringe dispensing system
5 Experimental Validation

Once a pipette tip is used, it cannot be used again for processing another sample; it must be detached and discarded. The pipette tip attaches and clamps itself to the syringe by means of the frictional force between the outer face of the syringe tip and the inner face of the pipette tip. A simple push operation between these contact faces is enough to detach the pipette tip from the syringe. An actuator which moves relative to the syringe is required, so a cam and follower mechanism is deployed, in which the cam is driven by a motor and the follower moves and pushes off the pipette tip. A servo motor can be used for this purpose, as it can make a full or half rotation precisely [8]. Hence a clamp is needed to hold the servo motor in a fixed position. A simple control algorithm is used to run the stepper motor precisely in the micrometre range, while using piezo-stepper motors it is possible to achieve motion in the nanometre range by utilizing the appropriate high-precision control algorithms explained in [9]. The semi-automated microfluidic dispensing system is shown in Fig. 7. The flow rate of the sample is shown in Fig. 8.
Thus, the position, total force and total torque for the proposed model design of the syringe dispensing system were theoretically calculated, and simulation results were obtained using the software. The syringe system is found to have to move 0.714 mm to dispense the minimum volume of 25 µl. The torque required by the motor to dispense the sample is 0.00012 Nm, which is less than the calculated theoretical value of 0.0001715 Nm. Future work is to ensure that the fluid flows with highly precise movements, especially for applications such as drug delivery, cell injection and cell piercing using the developed dispensing system.

Fig. 7 Experimental setup. a Stepper motor with lead screw setup. b Microfluidic dispensing system
Acknowledgements I would like to express my deep and sincere gratitude to my former research supervisor, the late Dr. R. Sivaramakrishnan, Ph.D., Anna University, Chennai, for giving me the opportunity to do research and providing invaluable guidance throughout this work. It was a great privilege and honor to work and study under his guidance. I express my heartfelt thanks for his patience during the discussions I had with him on this work and many other research activities. In addition, I sincerely thank him for establishing advanced facilities and equipment in the Mechatronics lab under the lab modernization scheme of the University.
References
1. Mukherjee GB, Sivaprakasam M (2013) A syringe injection rate detector employing a dual
Hall-effect sensor configuration. Annu Int Conf IEEE Eng Med Biol Soc
2. Chen CSC, Shih YY, Chen YL (2011) Development of the syringe needle auto-detaching device.
In: 5th international conference on bioinformatics and biomedical engineering, pp 1–4
3. Leoni LAG, Ginty P, Schutte R, Pillai G, Sharma G, Kemp P, Mount N, Sharpe M (2017)
Preclinical development of an automated injection device for intradermal delivery of a cell-based
therapy. Drug Deliv Transl Res 7:695–708
4. Kong YLF, Zheng YF, Chen W (2012) Automatic liquid handling for life science: a critical
review of the current state of the art. J Lab Autom 17:169–185
5. Andreas T (2015) Cell dispensing system. In: WIP Organization (Ed), pp 1–18
6. Meltzer W, Conn NM (2006) Automated pipetting system. Matrix Technologies Corp, Hudson, NH (US); Cosmotec Co Ltd, Tokyo (JP), pp 1–20
7. Sabarianand DV, Karthikeyan P (2019) Nanopositioning systems using piezoelectric actuators. In: Kamalanand K, Jawahar DNJAPM (eds) Advances in nano instrumentation systems and computational techniques. Nova Sci
8. Sabarianand DV, Karthikeyan P, Muthuramalingam T (2020) A review on control strategies for
compensation of hysteresis and creep on piezoelectric actuators based micro systems. Mech
Syst Signal Process 140:1–17
9. Sabarianand DV, Karthikeyan P (2022) Duhem hysteresis modelling of single axis piezoelectric actuation system. In: Suhag MCS, Mishra S (eds) Control and measurement applications for smart grid. Springer, Singapore
Im-SMART: Developing Immersive Student Participation in the Classroom Augmented with Mobile Telepresence Robot
1 Introduction
The COVID-19 pandemic, which completely disrupted the entire world, has had an enormous and long-lasting impact on day-to-day life. Several sectors of the economy have taken massive hits and are working relentlessly to get back on track as soon as possible. The education sector in particular has taken a massive hit due to the ongoing pandemic and the emergent 'not-so' promising scenario. It faces a large number of hurdles in the delivery of knowledge and skills, with educational institutions grappling for alternative and efficient ways to match the efficiency and effectiveness of an offline classroom. Studies and various surveys show that even though classes have taken a virtual route through online platforms, they have failed to provide engagement and an environment similar to an offline classroom [1]. The connectedness, interaction, and engagement that exist between a faculty member and a student in offline classes are something that online classes have failed to replicate. With the pandemic still not completely over, the educational sector bears a large burden, and therefore there is a cogent need to ensure that student attention, inclusion, and participation levels do not drop while at the same
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 407
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_33
408 R. N. Kashi et al.
time maintaining experiences similar to those prevalent in traditional offline classrooms. Several projects and research works have been reported in this area, and we propose a framework and platform that enables the development of an MTR that meets the need for an environment similar to conventional offline classes and also addresses the engagement aspect through virtual telepresence. The novelty of our approach is a scalable, 'build-as-you-go' platform that incrementally adds features while taking cost aspects into account. The prototype also serves as a research test bed for future work.
Mobile Robot Presence (MRP) and Mobile Telepresence Robot (MTR): The terms MRP and MTR are used interchangeably in the literature. Telepresence is the ability to provide a context and an environment wherein the user of the system is empowered to engage with both the tasks and the environment present at a remote location. There are two components: (a) the presence of an automated system, which can be a mobile robot that enables task accomplishment (MRP), and (b) resources on the robot which extend the remote environment to the user so that the user feels 'situated' in and contextually aware of that environment; this requires providing necessary and sufficient information about both the tasks and the environment using the resources on the robot (MTR).

Telepresence has wide-ranging applications: healthcare, security and surveillance, business meetings, mining in remote areas, and medical applications. Reference [2] describes an MTR deployed in the healthcare sector for remote disinfection, while [3] provides an example of training nurses using virtual telepresence. MTRs also find use in training medical students prior to their performing real surgeries or operations. MTRs likewise find use in hazardous or inconvenient environments: [4] describes an application in the mining sector, involving a bot used in mixed-presence teleoperation. A novel mobile robot in a search and rescue operation is detailed in [6] and serves to save a great deal of human effort by employing virtual telepresence robots. The area of MTR provides fertile ground for research and advancement, with many possible applications and challenges.
Several initiatives and projects have sprung up during the pandemic that propose to tackle the issues and challenges of home-bound pupils. In this section, we provide information related to MTRs in learning environments. Reference [5] details work that emphasizes the initial challenges of using telepresence robots and provides some quantifications to measure the psychological efficacy of such approaches, while detailing supporting infrastructure like student and teacher preparation. Reference [7] discusses experiences with two MTR systems in academia and stresses the requirement for stable network coverage and power sources. Reference [8] provides research findings in the context of virtual transnational education scenarios. Some inputs from this research are to address the technology aspects using appropriate hardware, software and their integration. This research also suggests that one needs to look into specific solutions for a particular context, since technology is evolving. A review of system design aspects is collated in [9]; application area challenges are discussed and user evaluation studies are performed. Reference [10] introduces the concept of visibility checks and guiding remote users for enabling visual access to materials in classrooms teaching foreign languages. Reference [11] describes 'Professor Avatar', a telepresence robot using a human-scale holographic projection. A good review of available telepresence robots in education is provided in [12], which analyzes responses from students, teachers, and also parents. Reference [14] outlines a web-based framework for providing a robotic telepresence environment. In [13], a framework that utilizes seven identified dimensions for designing telepresence robots is provided.
Outline of the paper: In Sect. 2, we formulate the system requirements and provide the overall system design. Hardware architecture is dealt with in Sect. 3, with allocation of functions to system components. Section 4 provides an overview of the software aspects. Section 5 discusses implementation and system use cases. Section 6 provides data collected from experiments with the system and also provides insights into future work planned with the platform.
2 Problem Context

The most important requirement of an MTR system is the need to ensure the "connectedness" of the remote student with the classroom. The introduction of new technologies into the learning environment must address the necessity of social presence for the remote users or individuals. An attendant requirement is to provide opportunities that enhance learning skills in a participative environment. These opportunities become more immersive with the use of appropriate sensory inputs connected to the user or the remote location. The immersive aspects are enabled by interfaces that are amenable to adaptation based on the remote individual's needs. An associated aspect is the requirement for flexible movement of the mobile robotic presence system and its subsystems. A goal often cited in this context is the reduction of transactional distance, the 'psychological and communication space to be crossed, a space for potential misunderstanding between the inputs of instructor and those of the learner'. It follows that the learning experience of the student improves as the transactional distance decreases. Considering an MRP system, this transactional distance depends on providing the right technological base for the robot so that the user experience meets the need for social presence, and on providing the right controls for the user so that the learning and immersive experiences are enhanced. In order to meet these design aspects, we have conceived Im-SMART (Immersive Student participation in the classroom Augmented with Mobile Telepresence Robot). Considering the aspects of connectedness, immersive experience and prior work in this area (outlined in Sect. 1), we planned to develop the MTR system in two phases and captured requirements in a structured manner:
authentication, the platform sends back the credentials of the bot via an email mechanism for subsequent access. These credentials are used with the main script on the remote user subsystem. The computing and control platform is also responsible for providing the necessary control signals to the camera on board the bot. The signals are derived by processing the raw commands coming over the internet through filtering, estimation and conversion algorithms. The computing and control platform also serves to convert the commands for movement of the bot into signals that drive the bot's locomotory motors. It incorporates the server, which is responsible for the audio and video streaming functions, along with the microphone integration. Figure 1 indicates the proposed MTR system block diagram with the various components of the remote user subsystem. Figure 2 shows the hardware architecture as a block representation.
A modular design approach is employed in the incremental development of the prototype, driven by the requirements of the two phases. The top-level modules are the 'Android Application', 'Camera Module', 'Video and Audio Feed Module', 'Bot Locomotion Controller Module', and 'Microphone Control Module'. The 'Android Application' module generates the manual (phase-1) and voice (phase-2) commands for locomotion, camera control, and screenshot capture. The Camera
Fig. 1 The proposed Im-SMART system block diagram, outlining the various components of the Remote User subsystem and the Virtual Telepresence bot

Fig. 2 Hardware architecture of the Im-SMART system (Remote End and MTR)
module processes all movements and orientations of the camera in synchronism with the user's orientation. The Bot Locomotion Controller module drives the motors using the traditional PWM technique and is integrated with the ability to receive voice commands, via a speech recognizer, from an Android application hosted on the remote user end. The 'Microphone Control Module' integrates with the Liquidsoap encoder client on the bot to interface with an Icecast streaming server on the remote user end.
3 Hardware Architecture
Table 1 Allocation of hardware elements

Req. Id | MTR design aspect                   | Hardware element
[R1]    | Affordability                       | All elements
[R2]    | Immersive experience                | Camera
[R3]    | Telepresence                        | Camera motor
[R4]    | Connectedness                       | MTR platform motor and driver support
[R5]    | Interaction, mobility               | Microphone
[R6]    | Immersive experience, communication | Mobile device, Raspberry Pi platform
[R7]    | Connectedness, connectivity         | Mobile device
[R8]    | Ease of use, user interface         | Mobile device, Raspberry Pi platform
[R9]    | Flexibility                         | Mobile device, Raspberry Pi platform
[R10]   | Extensibility                       | Raspberry Pi platform
The allocation in Table 1 also serves as a checklist to ascertain whether all appropriate MTR design aspects are met.
The hardware system block diagram is provided in Fig. 2. The key components are the Raspberry Pi platform, servo motors for robot camera movement, power source, USB microphone, and motor driver circuitry for driving the robot's locomotion motors.

Assembly of the robot frame: The chassis of the bot is designed and assembled in a two-tier fashion. The bottom storey houses the servo motors and their brackets, the batteries and the motor driver, whereas the top storey houses the microphone, the Raspberry Pi and the power bank. The bot has two wheels controlled by DC motors and one caster wheel at the front.
Movement of the bot's camera based on sensor readings from the user's smartphone: We use the concept of socket programming to communicate between the bot and the user. Here, the Raspberry Pi acts as a server, and the user device acts as a client. The sensor values needed are those of the accelerometer, gyroscope and magnetometer, used together to determine the orientation of the phone, because using only one of them would cause integration error or noise due to the movement of the bot. The smartphone sends the sensor readings to the bot using the reliable TCP protocol. The PWM values are obtained on the Raspberry Pi on the bot and are mapped to appropriate pulse signals and fed to the servo motors, which move the camera to match the orientation of the user's head.
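On the bot side, the received orientation must be converted into servo pulse widths. A minimal sketch of that mapping follows; the 500-2500 µs range at 50 Hz is a typical hobby-servo assumption, since the paper does not state the actual servo parameters, and the function names are illustrative.

```python
def angle_to_pulse_us(angle_deg, min_us=500.0, max_us=2500.0, span_deg=180.0):
    """Map a head-orientation angle in [0, span_deg] to a servo pulse width.

    Hobby servos typically take a 50 Hz PWM signal whose high time of
    roughly 0.5-2.5 ms selects the shaft angle; the exact range is
    servo-specific, so min_us/max_us here are assumptions.
    """
    angle = max(0.0, min(span_deg, angle_deg))  # clamp noisy readings
    return min_us + (max_us - min_us) * angle / span_deg

def pulse_to_duty(pulse_us, period_us=20000.0):
    """Duty cycle in percent for the given pulse width at 50 Hz."""
    return 100.0 * pulse_us / period_us

# e.g. the phone reports a 90 degree yaw -> centre the camera servo
pulse = angle_to_pulse_us(90.0)
duty = pulse_to_duty(pulse)
print(pulse, duty)  # 1500.0 7.5
```

On the real bot, the duty value would be handed to the Pi's PWM output driving the camera servo.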
Based on the commands received at the motor driver enable pins, the Studo bot moves in the required directions. The control logic is shown in the tabulation in Fig. 3, and the image in Fig. 4 shows the test setup of the bot.
Obtaining the video feed and streaming it to the user: The camera is tested and configured to send the video stream. A web interface is designed using PHP to create the user interface for the camera. The video is streamed through this web interface and can be viewed on the phone in VR. The latency of the stream is extremely low and the quality is greatly improved over the previous version. Using the built-in features of the web interface, we can control camera settings like brightness, contrast, camera scheduling, motion detection, etc. The features of taking a screenshot and starting the recording of an ongoing session have been added to the web interface. Two buttons have been added, which store the recorded files on the server, from where they can be downloaded to the user's device. Options are also included to delete any file, if needed. The web interface is hosted on a server at port 80. Voice commands have been integrated to trigger the screenshot button when needed, significantly reducing user intervention. The negative effects of prolonged screen time can be combatted in software, using the phone's reading mode, night mode or blue light filter.
Integration of the microphone on the bot: A USB port available on the Raspberry Pi is used to connect the USB microphone with ease. A sound card can be employed to reduce the low-frequency noise picked up by the microphone due to its close proximity to the Pi's circuitry. We make use of the Icecast streaming server and the Liquidsoap client to send a low-latency live audio stream from the microphone on the bot. Liquidsoap is an encoder client and a scripting language which reads the microphone input and encodes it to the format required by the user. In our case, we have deployed the .opus encoding due to its extremely low latency and its suitability for live audio streaming. To reduce the effect of microphone noise, Liquidsoap provides built-in filter functions, which have been employed to reduce noise and obtain a better stream. Icecast is a streaming server hosted on port 8000. It automatically streams the audio data incoming from the client, Liquidsoap, and plays it on the server.
Locomotion of the Studo Bot: Locomotion information is obtained from the user as voice commands, through an Android application specifically designed for this purpose. The Android application runs continuously in the background and, on receipt of a trigger command, in our case "Start", starts the speech recognizer and picks up commands like "Forward", "Backward", etc., which are converted to text and sent to the Studo bot through the reliable TCP protocol with the help of sockets. The motor driver is mapped accordingly and the wheels are actuated as per the user's commands. A mechanism is provided to notify the teacher, through an LED on the bot, if a student has any questions or doubts.
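The command handling described above can be sketched as a simple lookup from recognized text to wheel drive directions. The command set and the (+1/-1/0) encoding below are illustrative, since on the real bot these values become logic levels on the motor driver's input pins via GPIO.

```python
# Illustrative mapping from recognised command text to the two drive
# wheels' directions (+1 forward, -1 reverse, 0 stop). On the real bot
# these values would become logic levels on the motor driver's pins.
COMMANDS = {
    "forward":  (1, 1),
    "backward": (-1, -1),
    "left":     (-1, 1),   # spin in place: wheels counter-rotate
    "right":    (1, -1),
    "stop":     (0, 0),
}

def actuate(command_text):
    """Return (left_wheel, right_wheel) for a command; stop on unknown input."""
    return COMMANDS.get(command_text.strip().lower(), (0, 0))

print(actuate("Forward"))  # (1, 1)
print(actuate("mumble"))   # (0, 0) -> safe default
```

Defaulting unrecognized speech to a stop is a deliberately conservative choice for a robot sharing a classroom with people.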
4 Software Architecture
Figure 5 lays out the software architecture of the system, which is partitioned into two parts: the user-side subsystem and the MTR subsystem.
On the bot platform, 'main.sh' spawns five threads, namely the ngrok tunnels, the camera web interface, the Liquidsoap streamer, and the locomotion and servo control Python scripts. A distinct mail server thread is created separately to handle the initial registration and setup process; this thread is responsible for generating the credentials for a remote user. On the user side, 'main.py' spawns four threads, namely the camera feed URL, VLC access, the client socket and the app voice recognition, which form the complementary components of their counterparts on the bot platform.
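The user-side thread structure can be sketched with Python's threading module. The worker bodies below are placeholders standing in for the camera-feed, VLC, client-socket and voice-recognition threads; the names are illustrative, not taken from the actual scripts.

```python
import threading

def spawn_workers(workers):
    """Start one daemon thread per worker callable; return the thread list."""
    threads = []
    for name, fn in workers.items():
        t = threading.Thread(target=fn, name=name, daemon=True)
        t.start()
        threads.append(t)
    return threads

# Placeholder bodies: the real threads would open the camera feed URL,
# launch VLC for the audio stream, run the client socket, and run the
# app voice recognition, respectively.
results = []
workers = {
    "camera_feed_url":   lambda: results.append("camera"),
    "access_vlc":        lambda: results.append("vlc"),
    "client_socket":     lambda: results.append("socket"),
    "voice_recognition": lambda: results.append("voice"),
}
threads = spawn_workers(workers)
for t in threads:
    t.join(timeout=1.0)
print(sorted(results))  # ['camera', 'socket', 'vlc', 'voice']
```

Daemon threads ensure the helpers die with the main script, which matters when the user closes the session abruptly.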
In order to minimize user intervention, and to make the bot truly accessible remotely from anywhere in the world, the entire setup must move to the internet. One of the simplest mechanisms to ensure a server and a client stay connected remotely is port forwarding. As a safe, reliable and cost-friendly option, we opted for the services of ngrok, a platform which creates secure tunnels, enables access to local websites from anywhere, and also enables port forwarding to easily send data packets through TCP from anywhere in the world, as shown in Fig. 6. ngrok creates tunnels and provides secure URLs to view the camera feed and stream the live audio from the bot, as represented in Fig. 7. It also provides access to a public IP and enables port forwarding on local ports to send in data through sockets from the user device seamlessly.

Port forwarding is very useful in preserving public IP addresses. It can help protect servers and clients from unwanted access, hide the services and servers available on a network, and can also limit access to a network. Port forwarding thus adds an extra layer of security.
5 Implementation

The bot is switched on and connects to the network at the remote location. As soon as the bot turns on, it starts scanning for emails using the email server, and if a new one is received from a registered user with the right login credentials (email and password), it sends back the credentials of the bot, using which the user can connect to the bot. The bot is then marked busy and no other requests are entertained until the bot is free again. The user downloads the file received by e-mail. A Python script reads the recently downloaded file and extracts the information needed to establish a connection with the Studo bot; the camera stream opens in the browser, the audio stream begins in the VLC player, and the user can now use the VR headset to control the orientation of the virtual environment. If the user needs to move the bot around the physical location, they speak the trigger command "Start" to start the speech recognizer in the Android application, which runs in the background. On trigger, the control commands spoken by the user are converted to text and sent to the bot over the internet. While in the session, if the user needs to record and store the session for future use, or to take a snapshot of something useful, the "Click" voice command is spoken and the camera web interface takes a snapshot. In this way, the user, with minimum intervention, controls and experiences a physical environment virtually, from the comfort of their own place and surroundings.
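The credential-extraction step might look like the sketch below. The actual cred.txt format is not specified in the paper, so simple key=value lines, and the sample field names and values, are assumed purely for illustration.

```python
def parse_credentials(text):
    """Parse key=value lines from a downloaded credentials file.

    The real cred.txt format is not given in the paper; key=value
    pairs and the sample values below are assumed for illustration.
    """
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds

sample = """\
camera_url = http://example-tunnel.invalid/stream
audio_port = 8000
socket_port = 12345
"""
creds = parse_credentials(sample)
print(creds["audio_port"])  # 8000
```

The extracted fields would then be used to open the camera stream, the VLC audio stream, and the command socket.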
Integrated Model
column ‘μ’ and Standard Deviation under ‘σ’. The camera web interface provides an extremely low latency, higher quality video feed and has been upgraded with new features including brightness control, contrast control, etc. We found that the average latency for the video stream was about 0.89 s, which was acceptable at the remote user end. The remote user did not perceive any difficulties with the video feed and was able to participate in the classroom effectively. Features like taking a screenshot and starting the recording of an ongoing session have been added to the web interface. Two buttons have been added, which store the recorded files on the server, from where they can be downloaded to the user's device. Options are also included to delete any file, if needed. These features have proven to be useful utilities. The audio streaming interface, enhanced with the Liquidsoap encoder client and Icecast streaming server, showed a delay of about 0.59 s on average, and the remote user did not find any of the appreciable delays that usually accompany audio-video synchronization, providing a seamless integration. The bot mobility was also measured, in terms of the time delay for controls to take effect at the bot platform from the moment a voice command was issued at the remote user end. The average delay was slightly larger, measured to be within about 2 s. This aspect is being investigated, since the parameters communicated are very few. The camera deployed on the bot moves according to the orientation values of the remote end user's device as commanded, in real time, accurately (within about 2 degrees of the actual position) and with a very small delay of approximately 1 s on average. In Phase-2, a major improvement was obtained in the servo's movement with the orientation values, with minimal delay, after a smoothing filter was added to the PWM output. Currently, work is ongoing to measure the control aspects of the camera movement, such as the time for the camera to settle in the locked position. The entire Im-SMART bot platform was built with a budgeted cost of Rs 8000, and actual expenditure was limited to Rs 6500.
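The smoothing filter on the PWM output is not specified in the paper; one simple choice is an exponential moving average over the commanded servo set-points, sketched here with illustrative values.

```python
def ema_smooth(samples, alpha=0.3):
    """Exponential moving average over a stream of servo set-points.

    alpha in (0, 1]: smaller values smooth more but respond more slowly.
    An EMA is only one plausible choice; the paper's filter is unspecified.
    """
    smoothed = []
    state = None
    for x in samples:
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# A noisy step in the commanded angle settles gradually instead of jumping
raw = [90, 90, 140, 138, 142, 140, 140]
out = ema_smooth(raw, alpha=0.5)
print([round(v, 1) for v in out])
```

Tuning alpha trades the jitter reduction reported in Phase-2 against added lag in the camera's response.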
We are currently working to improve the system platform and exploring the use of machine learning algorithms to provide adaptable aspects of the video and audio functions. We are also exploring the efficient use of the bot in an educational context by examining usage scenarios more closely. User interfaces are another area where we are currently examining the integration of virtual reality aspects into the platform. One area of active research is repurposing the platform for other applications such as medical education, industry, survey operations, and surveillance.
References
1. Ying L, Jiong Z, Wei S, Jingchun W, Xiaopeng G (2017) VREX: Virtual reality education
expansion could help to improve the class experience (VREX platform and community for
VR based education). In: 2017 IEEE frontiers in education conference (FIE), Indianapolis, IN,
USA, pp 1–5. https://doi.org/10.1109/FIE.2017.8190660
2. Potenza A, Kiselev A, Saffiotti A, Loutfi A, An open-source modular robotic system for
telepresence and remote disinfection. arXiv:2102.01551. [cs.RO]
3. Mudd SS, McIltrot KS, Brown KM (2020) Utilizing telepresence robots for multiple patient
scenarios in an online nurse practitioner program. Nursing Edu Perspect 7/8 41(4):260–262
4. James CA, Bednarz TP, Haustein K, Alem L, Caris C, Castleden A (2011) Tele-operation of a
mobile mining robot using a panoramic display: an exploration of operators sense of presence.
In: 2011 IEEE international conference on automation science and engineering
5. Gallon L et al (2019) Using a Telepresence robot in an educational context. In: Proceed-
ings of the international conference on frontiers in education: computer science and computer
engineering FECS
6. Ruangpayoongsak N, Roth H, Chudoba J (2005) Mobile robots for search and rescue. In:
Proceedings of the 2005 IEEE international workshop on safety, security and rescue robotics
Kobe, Japan, June 2005
7. Herring SC (2013) Telepresence robots for academics. In: Proceedings of the American society
for information science and technology 50(1). https://doi.org/10.1002/meet.14505001156
8. Khadri HO, University academics’ perceptions regarding the future use of telepresence robots
to enhance virtual transnational education: an exploratory investigation in a developing country.
https://doi.org/10.1186/s40561-021-00173-8
9. Kristoffersson A, Coradeschi S, Loutfi A (2013) A review of mobile robotic telepresence. Adv Hum-Comput Interact 2013, Article ID 902316, 17 pages. https://doi.org/10.1155/2013/902316
10. Jakonen T, Jauni H, Mediated learning materials: visibility checks in telepresence robot
mediated classroom interaction. https://doi.org/10.1080/19463014.2020.1808496
11. Belmonte LEL (2018) Professor avatar: telepresence model. In: 16th IACEE world conference on continuing engineering education, Monterrey, 2018
12. Velinov A, Koceski S, Koceska N (2021) Review of the usage of telepresence robots in education. Balkan J Appl Math Inf 4(1)
13. Rae I, Venolia G, Tang JC, Molnar D (2015) A framework for understanding and designing
telepresence, CSCW ’15, 14–18 Mar 2015
14. Melendez-Fernandez F, Galindo C, Gonzalez-Jimenez J (2017) A web-based solution for
robotic telepresence. Int J Adv Robot Syst November-December 2017: 1–19ª
15. Kachach R, Perez P, Villegas A, Gonzalez-Sosa E (2020) Virtual tour: an immersive low
cost telepresence system. In: 2020 IEEE conference on virtual reality and 3D user interfaces
abstracts and workshops (VRW), Atlanta, GA, USA, pp 504–506
Architecture and Algorithms for a
Pixhawk-Based Autonomous Vehicle
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 425
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_34
426 A. Pratap et al.
Prior work in this direction includes platforms like MuSHR, AutoRally, and the MIT
Racecar. While the MuSHR platform focuses on multi-robot systems, it does not have
any GPS navigation functionality, which makes prior mapping a necessity for outdoor
navigation. This restricts its use in featureless or outdoor environments, which are
our primary use case. The AutoRally platform comes with an RTK (real-time
kinematic positioning) corrected GPS (Global Positioning System) module, but it adds
complexity because of the need to set up a ground station and to deploy sensor
fusion, as GPS alone is not accurate enough for reliable navigation.
Further, the high cost of the AutoRally platform might be a barrier to student
researchers. The MIT Racecar also does not provide GPS-based navigation facilities.
In our approach, we exploit the accurate localization and control capabilities of
Pixhawk, an open-source flight controller popular in the UAV (Unmanned Aerial
Vehicle) industry. Pixhawk already provides precise localization using an EKF
(Extended Kalman Filter) and GPS-based waypoint navigation using cheap sensors
such as the Neo-7M GPS-compass module and the built-in IMU (Inertial Measurement
Unit). We combine this with our perception and planning modules, which enables one
to get started with the platform quickly without extensive calibration and tuning. Our
GUI fetches the global path from Google Maps and passes it to Pixhawk as a
mission using the MAVLink protocol. Hence, there is no need to create an SD (Standard
Definition) or HD (High Definition) map beforehand to start autonomous navigation
using our approach. Further, we also discuss possible ways to remove the dependency
on Google Maps and Pixhawk using OpenStreetMap.
Our overall high-level architecture is shown in Fig. 1:
1. The user first selects the start point and end point.
2. We fetch the global path from the Google Maps API or our own SD map.
3. The global map, the localization from Pixhawk, and the output of the perception
sensors (camera/LiDAR/ultrasonic sensor, etc.) reach the companion computer (Jetson
Nano).
4. The perception module on the companion computer processes data from the percep-
tion stack and extracts information useful to the local planner, such as the type and
position of obstacles, the drivable region, etc.
5. Based on the output of the perception stack, the global map, and the current state of
the vehicle, the local planner decides the trajectory for the next step (using DWA).
6. The trajectory is executed by the control module directly or passed to Pixhawk
for execution.
7. The user sees all of this in real time on screen; the information is also live-streamed
to the ground station through radio telemetry/4G communications, and the data is logged.
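The seven steps above can be sketched as a single loop. The following is a minimal illustration only; all of the helper names (`fetch_global_path`, `perceive`, `plan_local`, `run_mission`) are hypothetical placeholders, not the released code:

```python
# Minimal sketch of the high-level loop (all helpers are hypothetical placeholders).

def fetch_global_path(start, end):
    # Stand-in for the Google Maps / SD-map query: a straight line of waypoints.
    n = 5
    return [(start[0] + (end[0] - start[0]) * i / n,
             start[1] + (end[1] - start[1]) * i / n) for i in range(n + 1)]

def perceive(sensors):
    # Stand-in for the perception module: obstacle list and a drivable flag.
    return {"obstacles": sensors.get("obstacles", []), "drivable": True}

def plan_local(state, waypoint, perception):
    # Stand-in for the DWA local planner: step to the waypoint when drivable.
    if not perception["drivable"]:
        return state
    return waypoint

def run_mission(start, end, sensors):
    """Walk the global path waypoint by waypoint, logging each state."""
    log = []
    state = start
    for wp in fetch_global_path(start, end):
        perception = perceive(sensors)
        state = plan_local(state, wp, perception)
        log.append(state)  # in the real system this is streamed to the ground station
    return log

log = run_mission((0.0, 0.0), (10.0, 10.0), {"obstacles": []})
```

In the real system each stand-in is a ROS node; this sketch only shows how the data flows between them.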
We first elaborate on our perception and planning stack. Our perception stack
uses YOLOv4 for object detection and estimates the drivable region using semantic
segmentation, RANSAC, edge detection, and filtering. We demonstrate all these
algorithms using the state-of-the-art CARLA simulator. In the planner part, we first develop
an OSM-format map for the CARLA towns and use A* to find the global path given a
start and end point. Our local planner then uses the global path and the input from our
perception module to calculate the local path using the Dynamic Window Approach
algorithm. We demonstrate the accuracy of our planner through a simulation in which
the car reaches the goal point while avoiding three cars placed in between. Finally, we present
the overall architecture, our approach to scaling it to a full-size golf cart, and future upgrades.
All the videos1 and code2 of the simulation are released as open source.
2 Object Detection
Object detection is the task of detecting instances of objects of a certain class within
an image. The state-of-the-art methods can be categorized into two main types:
1. One-Stage Methods
2. Two-Stage Methods.
One-stage methods prioritize inference speed, for example, YOLO, SSD, etc. Two-
stage methods prioritize detection accuracy, for example, Mask R-CNN, Faster R-
CNN, etc. With this kind of identification and localization, object detection can be
used to count objects in a scene and to determine and track their precise locations, all
while accurately labeling them (Fig. 2).
For our project, we have used “You Only Look Once” or YOLO, a family
of convolutional neural networks that achieves near state-of-the-art results with
a single end-to-end model, performs object detection in real time, and can
1 https://youtube.com/playlist?list=PL3HszLlqYTxCdmZk7xqaDreLpCilpBEyz.
2 https://github.com/AmitGupta7580/Static_vehicle_avoidance_carla.
identify multiple objects in a single frame with high precision while being faster than other
models. Its implementation is based on Darknet, an open-source neural network framework
written in C. In contrast, region-proposal classification networks (e.g., Faster R-CNN)
perform detection on many region proposals and thus end up running
prediction multiple times for different regions of an image, which makes their
predictions slower.
We achieve 3-4 FPS on the live camera feed fetched from the CARLA simulator
on a computer with 8 GB of RAM, a 2 GB Nvidia GPU, and an Intel i7 processor.
The confidence threshold for detecting objects is set to 0.5. The classes and weights
of the model that we use can be found here.3
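The thresholding step can be illustrated with a plain-Python sketch. The detection tuples below are made-up examples for illustration, not the actual YOLO output format:

```python
# Sketch: applying the 0.5 confidence threshold to raw detections.
# Each detection is (class_name, confidence, box); the values here are invented.

CONF_THRESHOLD = 0.5

def filter_detections(detections, threshold=CONF_THRESHOLD):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

raw = [
    ("car", 0.91, (50, 60, 120, 80)),
    ("person", 0.42, (200, 40, 30, 70)),  # below threshold, dropped
    ("car", 0.55, (310, 65, 110, 75)),
]
kept = filter_detections(raw)
```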
Detection of the drivable space is a very important task for calculating the possible trajec-
tories along which our agent can move. For this task, the RGB and depth camera feeds
are used as input, and the algorithm returns the equations of the lanes in the real-world 3D
coordinate system. The algorithm consists of five steps, which are explained below in
sequential order (Fig. 3).
3 https://github.com/AlexeyAB/darknet/releases.
4 https://www.kaggle.com/ammmy7580/lane-road-detection.
Fig. 6 Overlapping the calculated road mask over original RGB image
Our model uses the RANSAC algorithm to reduce the noise coming from the
machine learning model's result. The parameters for RANSAC are given in Fig. 7.
In this step, Canny edge detection is applied to the road-masked image to obtain
the edges of the lanes as their starting and ending points.
Parameters used for Canny Edge Detection:
1. First threshold for the hysteresis procedure: 0
2. Second threshold for the hysteresis procedure: 150.
Filters are then used to rule out noise in the edge detection, for example by merging
nearly identical lanes:
1. Threshold for slope similarity: 0.1
2. Threshold for intercept similarity: 40
3. Threshold for minimum slope: 0.3.
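A minimal sketch of this filtering step, assuming each detected line is represented as a (slope, intercept) pair in image space. The representation and the merge-by-averaging rule are our illustration; the thresholds are the ones listed above:

```python
# Sketch of the lane-filtering step: merge lines whose slope and intercept are
# nearly equal, and drop near-horizontal lines (thresholds from the text).

SLOPE_SIM = 0.1      # threshold for slope similarity
INTERCEPT_SIM = 40   # threshold for intercept similarity
MIN_SLOPE = 0.3      # minimum |slope| for a line to count as a lane edge

def merge_lines(lines):
    """lines: list of (slope, intercept). Returns merged lane candidates."""
    merged = []
    for m, b in lines:
        if abs(m) < MIN_SLOPE:  # too flat to be a lane edge
            continue
        for i, (mm, mb, count) in enumerate(merged):
            if abs(m - mm) < SLOPE_SIM and abs(b - mb) < INTERCEPT_SIM:
                # fold the line into the group by a running average
                merged[i] = ((mm * count + m) / (count + 1),
                             (mb * count + b) / (count + 1),
                             count + 1)
                break
        else:
            merged.append((m, b, 1))
    return [(m, b) for m, b, _ in merged]

lanes = merge_lines([(0.95, 100), (1.0, 110), (-0.9, 520), (0.05, 300)])
```

Here the first two lines merge into one lane, the third is a distinct lane, and the fourth is rejected for being nearly horizontal.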
This step casts the RGB image pixels into a 3D coordinate system, estimating the
x, y, and z coordinates of every pixel in the image.
With the starting and ending pixels of a lane known, the cast into the 3D coordinate
system is straightforward using the depth camera feed. To store and visualize the
lane data, our code then calculates the equation of the lane line from its starting and
ending points (Fig. 9).
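A sketch of this back-projection under a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` are assumed example values, not the actual CARLA camera calibration:

```python
# Sketch: cast an image pixel to 3D using the depth feed and a pinhole camera
# model, then derive a lane-line equation from two 3D points.

def pixel_to_3d(u, v, depth, fx=320.0, fy=320.0, cx=320.0, cy=240.0):
    """Back-project pixel (u, v) with depth z (metres) into camera coordinates."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

def lane_equation(p_start, p_end):
    """Slope and intercept of the lane in the ground (x, z) plane."""
    (x1, _, z1), (x2, _, z2) = p_start, p_end
    m = (z2 - z1) / (x2 - x1)
    return m, z1 - m * x1  # (slope, intercept)

a = pixel_to_3d(400, 240, 8.0)
b = pixel_to_3d(480, 240, 16.0)
slope, intercept = lane_equation(a, b)
```

The resulting (slope, intercept) pair is the compact lane representation used for the similarity check in the next step.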
Using the lane equations, the similarity of the currently encountered lane with
previously encountered lane equations is checked; if the difference is less than
some threshold, the lane is merged into the previous one by calculating a
combined lane equation. The similarity is checked based on the slopes and intercepts
in the lane equations (Fig. 10).
4 Path Planning
For our project, the Town02 map of the CARLA simulator is used. CARLA maps are in
the OpenDRIVE format, so we converted the Town02 map into the OpenStreetMap format.
With the help of the Python library OSMnx, we extracted the road-network data from
the converted Town02 map and selected two nodes of the road network as the start
node and the goal node. Then, we applied the A* algorithm to find the shortest path
(Fig. 11).
The A* search algorithm approximates the shortest path in real-life situations, such as
maps with many obstacles, and is a popular pathfinding technique. A* uses a
heuristic function that estimates the cost of the cheapest path from the current
node to the goal node (Fig. 12).
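A minimal A* sketch over a toy road graph; a road network extracted with OSMnx can be searched the same way. The graph, nodes, and edge costs below are invented for illustration:

```python
# Minimal A* over a graph whose nodes are (x, y) coordinates.
# Heuristic: straight-line distance to the goal (admissible).
import heapq
import math

def a_star(graph, start, goal):
    """graph: {node: [(neighbour, edge_cost), ...]}. Returns the shortest path."""
    h = lambda n: math.dist(n, goal)
    open_set = [(h(start), 0.0, start, [start])]  # (f, g, node, path)
    best = {start: 0.0}
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        for nbr, cost in graph.get(node, []):
            ng = g + cost
            if ng < best.get(nbr, float("inf")):
                best[nbr] = ng
                heapq.heappush(open_set, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None  # goal unreachable

roads = {
    (0, 0): [((1, 0), 1.0), ((0, 1), 1.0)],
    (1, 0): [((2, 0), 1.0)],
    (0, 1): [((2, 0), 3.0)],
    (2, 0): [],
}
path = a_star(roads, (0, 0), (2, 0))
```

A* expands the cheaper route through (1, 0) rather than the detour through (0, 1).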
Once the global path of the mission is fetched, it is divided further into small
straight segments. For example, if the global path is A-B-C-D, the sub-missions
are A to B, B to C, and finally C to D. To follow these sub-paths, the DWA
algorithm is used as our local path planner; it provides a path that meets the following
criteria (Fig. 13):
1. Minimum distance to our goal.
2. Avoidance of obstacles that come into the path.
3. Following the lanes without crossing them.
4. Smooth motion that avoids abrupt turns.
DWA (Dynamic Window Approach) is an algorithm used to find the best collision-
free trajectory among all the possible trajectories. Figure 14 shows the complete
flowchart of the DWA algorithm.
The current state of the car (position, orientation, linear velocity, angular velocity),
the positions of obstacles, the lane equations, and the goal position are provided as input
to our DWA model. Using these values, it returns the next optimal state, i.e., the state
with the minimum cost. It computes four different costs for the optimal path:
1. Goal cost (the distance of the next possible state from the goal)
2. Speed cost (for smoothness of the motion)
3. Obstacle cost (the distance of the next possible state from all the
obstacles)
4. Lane cost (the perpendicular distance of the next possible state from
the lanes)
Total Cost = Goal Cost + Lane Cost + Speed Cost + Obstacle Cost (1)
The figure above shows the predicted trajectory using DWA, where the red dots
display static obstacles and the orange and blue lines represent the lanes of the road.
The complete implementation of the Dynamic Window Approach in the CARLA simulator
can be found here.5
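The total-cost evaluation (goal + lane + speed + obstacle) can be sketched as follows. The weights, candidate states, and geometry are illustrative, not the tuned values from the released code:

```python
# Sketch of the DWA scoring step: sum goal, speed, obstacle, and lane costs for
# each candidate next state and pick the minimum. All numbers are illustrative.
import math

def total_cost(state, goal, obstacles, lane_y, v_max=5.0):
    x, y, v = state
    goal_cost = math.dist((x, y), goal)                  # distance to the goal
    speed_cost = (v_max - v) / v_max                     # prefer fast, smooth motion
    obstacle_cost = sum(1.0 / max(math.dist((x, y), o), 1e-6) for o in obstacles)
    lane_cost = abs(y - lane_y)                          # perpendicular lane distance
    return goal_cost + speed_cost + obstacle_cost + lane_cost

def best_state(candidates, goal, obstacles, lane_y):
    return min(candidates, key=lambda s: total_cost(s, goal, obstacles, lane_y))

candidates = [(1.0, 0.0, 3.0),   # keeps the lane, stays clear of the obstacle
              (1.0, 2.0, 3.0),   # drifts off the lane
              (2.0, 0.0, 3.0)]   # moves closer to the obstacle at (2.5, 0)
chosen = best_state(candidates, goal=(10.0, 0.0),
                    obstacles=[(2.5, 0.0)], lane_y=0.0)
```

The candidate that balances progress toward the goal against obstacle and lane penalties wins.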
5 Controllers
The controller decides the steering and throttle inputs needed to move the vehicle
efficiently from the starting coordinate to the final destination given by the
path-planning module. There are various controllers (e.g., Stanley, Pure Pursuit),
but for our project we use a controller that is more responsive to changes in the
path and also has a smaller error with respect to the actual path.
5 https://github.com/AmitGupta7580/Static_vehicle_avoidance_carla/blob/master/DWA.py.
The control part is divided into two sub-parts:
1. Lateral control: for controlling the steering of the vehicle.
2. Longitudinal control: for controlling the speed of the vehicle.
The lateral control part is the most important task for navigating the vehicle along the
actual path: it decides the steering value, i.e., how much the vehicle has
to turn to follow the path. To select the best lateral controller, a comparison is
made between three different controllers, Pure Pursuit, Stanley, and Model Predictive
Control (MPC), in the CARLA simulator on the same track with the same input
coordinates.
The first two controllers are geometric path-tracking controllers. A geometric
path-tracking controller is any controller that uses the vehicle kinematics and the
actual path to decide the steering value (Fig. 15).
The Pure Pursuit controller uses a look-ahead point, which lies on the
actual path a fixed distance ahead of the vehicle. The vehicle needs to proceed to that point using
a steering angle that we must compute. In this method, the center of the rear
axle is used as the reference point on the vehicle and the target point is selected on the
actual path; the distance between the rear axle and the target point determines
the steering angle of the vehicle. Our target is to make the
vehicle steer at the correct angle and then proceed to that point (Fig. 16).
The Pure Pursuit controller ignores the dynamic forces on the vehicle and also its
limited ability to steer at such high angles. One improvement is to vary the
look-ahead distance based on the current speed of the vehicle to fine-tune the steering
angle: at lower speeds it should be small, so that the vehicle can steer at high angles,
and at higher speeds it should be large, to limit the steering changes.
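The Pure Pursuit geometry can be sketched with the standard bicycle-model steering law, delta = atan(2 L sin(alpha) / ld), where L is the wheelbase, alpha the angle to the target seen from the rear axle, and ld the look-ahead distance. The wheelbase and the look-ahead gain below are assumed example values:

```python
# Sketch of Pure Pursuit steering with a speed-scaled look-ahead distance.
import math

def lookahead_distance(speed, k_ld=0.8, ld_min=2.0):
    """Larger look-ahead at higher speed limits steering changes."""
    return max(ld_min, k_ld * speed)

def pure_pursuit_steer(rear_axle, heading, target, wheelbase=2.5):
    dx, dy = target[0] - rear_axle[0], target[1] - rear_axle[1]
    alpha = math.atan2(dy, dx) - heading   # angle to target in the body frame
    ld = math.hypot(dx, dy)                # distance rear axle -> target
    return math.atan2(2.0 * wheelbase * math.sin(alpha), ld)

# Target straight ahead -> zero steering; target to the left -> positive steering.
straight = pure_pursuit_steer((0, 0), 0.0, (10, 0))
left = pure_pursuit_steer((0, 0), 0.0, (10, 5))
```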
The Stanley controller is also a geometric path-tracking controller. The Stanley
method uses the front axle as its reference point. Meanwhile, it looks at both the
Fig. 15 Throttle value and steer value produced by the Pure Pursuit controller
Fig. 17 Comparison of three different controllers on the basis of change in throttle in the CARLA
simulator
heading error and the cross-track error. In this method, the cross-track error is defined as
the distance between the front axle of the vehicle and the closest point on the path
(Fig. 17).
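The Stanley law combines the two errors as steering = heading_error + atan(k · e / v), where e is the cross-track error at the front axle and v the speed. The gain k and the softening constant below are assumed example values:

```python
# Sketch of the Stanley control law.
import math

def stanley_steer(heading_error, cross_track_error, speed, k=1.0, eps=1e-3):
    """heading_error: path heading minus vehicle heading (rad);
    cross_track_error: signed distance from the front axle to the path."""
    return heading_error + math.atan2(k * cross_track_error, speed + eps)

# Aligned and on the path -> no steering; offset to one side -> corrective steer.
aligned = stanley_steer(0.0, 0.0, 5.0)
offset = stanley_steer(0.0, 1.0, 5.0)
```

Note how the arctangent term shrinks as speed grows, which is what makes Stanley gentler than Pure Pursuit at high speed.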
Model Predictive Control is not a geometric path-tracking controller; it uses a cost
function and a predictive model to output the steering values. The cost function
penalizes the deviation from the reference path (the smaller the deviation, the better
the result) and the magnitude of the control commands, so that passengers in the car
feel comfortable while traveling (the smaller the steering commands, the better the result)
(Fig. 18).
By plotting the change in the steering value relative to its previous value,
multiplied by 10 (so that we can observe small changes as well), we can see
the sudden changes that occur. Sudden changes in the steering
Fig. 18 Comparison of three different controllers on the basis of sum of error in the CARLA
simulator
will make the vehicle unstable at higher speeds. In the plot, Stanley and MPC show better
resistance to sudden changes in the actual path, turning the vehicle slowly back
toward the actual trajectory, whereas the Pure Pursuit method makes sudden changes
in the steering. Stanley and MPC also have a smaller error compared with Pure Pursuit.
By comparing the Stanley and MPC controllers, it was found that the MPC controller is
more stable than Stanley. In particular, when the road ahead is straight, the Stanley
controller still shows variation, but MPC (in the 0-250 range of the plot) remains stable
(Fig. 19).
6 Overall Architecture
First, the user inputs the start point and end point through the screen using our GUI.
Then, we fetch the path as a series of GPS waypoints through Google Maps or our
global planner implemented on OSM. We feed this global plan, together with the output
of the perception module, to our local planner, which calculates the local plan to
reach the closest waypoint on the global path. We can pass this local plan either
to our control module or to the Pixhawk rover firmware's controller. We use ROS
for all communication between the different nodes and processes. The communication
between the different nodes of our system is shown in Figs. 20 and 21.
7 Future Work
8 Conclusion
References
1 Introduction
Obstacle detection and avoidance are crucial for modern-day drone applications
like drone delivery, surveillance, mapping, etc. We present our novel approach for
this task. We first detect the type and location of obstacles through a CNN. Once the
obstacles are detected, we divide the field of view of the RGB-D camera into a 12×16
grid. We find a cost value for each grid cell based on factors like proximity to the goal,
distance from obstacles, smoothness of the drone's motion, etc. Our cost function is built on the
concept of the DWA algorithm for 2D path planning. Based on the cost distribution
and the type of obstacle, the drone maneuvering takes place.
We first present details on the dataset and model used to train obstacle detection in AirSim,
followed by our overall approach for obstacle avoidance and then an elaboration of each
component of our cost function and the calculation of the optimal velocity. We also present
the implementation and results of our approach using the AirSim simulator.
Ankur Pratap Singh, Amit Gupta, Bhuvan Jhamb, Karimulla Mohammad: these authors
contributed equally.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 443
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_35
444 A. P. Singh et al.
YOLOv4 is trained on many object classes, but YOLO has not been used for
aerial object detection. To fill this gap, we have built our own dataset that helps
YOLO detect specific aerial obstacles; the current version of
our dataset provides data for these four classes (Figs. 1 and 2):
1. Bird
2. Drone
3. Building
4. Blocks (Big structures in AirSim Blocks Environment).
1 https://drive.google.com/drive/folders/1mfD6Pdkb4Y8l9C4ksE0ZVxfuQJXgTOeG.
3D Obstacle Detection and Path Planning … 445
In this chapter, we aim to design an aerial object detection system for autonomous
drones in the AirSim simulator using the YOLOv4 object detector.
In object detection, the task is to detect instances of all the objects of a certain
class within an image. The state-of-the-art methods can be categorized into two main
types:
1. One-Stage Methods
2. Two-Stage Methods.
One-stage methods prioritize inference speed, for example, YOLO, SSD, etc. Two-
stage methods prioritize detection accuracy, for example, Mask R-CNN, Faster R-CNN,
etc. With this kind of identification and localization, object detection can be used to
count objects in a scene and to determine and track their precise locations, all while
accurately labeling them.
For our project, we have used “You Only Look Once” or YOLO, a family of con-
volutional neural networks that achieves near state-of-the-art results with a single
end-to-end model, performs object detection in real time, can identify
multiple objects in a single frame with high precision, and is faster than other mod-
els. Its implementation is based on Darknet, an open-source neural network framework
written in C (Fig. 3).
To perform the task of obstacle detection, we employed the concept of transfer
learning. Transfer learning is a technique in which existing models are reused to solve
a new challenge or problem: the knowledge developed during previous training is
recycled to help perform a new task that is related in some way to the previously
trained one, such as categorizing objects in a specific file type. The original trained
model usually requires a high level of generalization to adapt to new, unseen data.
We used a pretrained YOLOv4 model and reused its convolutional-layer weights,
which makes our custom object detector far more accurate, reduces the required
training time, and lets it converge much faster. We use the Blocks
environment of AirSim because it consumes fewer computing resources, so we
can get a higher FPS (frames per second) for testing algorithms and implementing
future work. We have trained YOLOv4 to detect custom objects (Fig. 4).
Developing and testing algorithms for autonomous vehicles in the real world is an
expensive and time-consuming process. Also, in order to utilize recent advances in
machine learning and deep learning we need to collect a large amount of annotated
training data in a variety of conditions and environments. AirSim is a new simulator
built on Unreal Engine that offers physically and visually realistic simulations for
both of these goals. The simulator is designed from the ground up to be extensible
to accommodate new types of vehicles, hardware platforms, and software protocols.
In addition, the modular design enables various components to be easily usable
independently in other projects.
YOLOv4's architecture is composed of a CSPDarknet53 backbone, an SPP (spatial
pyramid pooling) additional module, a PANet path-aggregation neck, and a YOLOv3
head. Our custom-trained model achieves 75.39% mAP@0.5 at 59.585 BFLOPs.
In this approach, we take evenly distributed points in the field of view of the camera,
calculate the obstacle cost, smoothness cost, and goal cost of each path
present in the FOV, and then add up all the costs to get the total cost of the path.
The path with the minimum total cost is selected (Fig. 5).
As our depth camera gives its feed as a 144×256 array, we divide our FOV
into 12×16 paths/points that the drone can take, so as to maintain symmetry. Then
for each of these paths we find the obstacle cost, the smoothness cost, and the goal
cost. The obstacle cost is proportional to the proximity of the obstacles,
the smoothness cost is proportional to sudden changes in velocity, and the goal
cost is inversely proportional to how close a path takes the drone toward the goal. The
costs are calculated as follows (Figs. 6 and 7):
Obstacle Cost
1. We first fetch the camera feed of the depth image, which contains the distance of
the object present at each pixel. AirSim returns the feed in the form of a 144×256
array.
2. We then pass the array through 12×16 average pooling with a stride of 12
in the vertical direction and a stride of 16 in the horizontal direction, such that no
two windows overlap. The pooling results in a 12×16 array, and each of these
points denotes one of the 12×16 paths of the FOV that the drone can take.
2 https://drive.google.com/file/d/1v5KT0cw5LgAQFfhb-4VtaZJBodEPZaEf/view.
3 https://drive.google.com/file/d/1OkrreuxpYbSFslZ3irBKa48P7X9gpYxq/view.
Fig. 7 Obstacle cost versus obstacle distance with effective distance 100
the optimal path to select depends on the smoothness and goal costs, since the obstacle cost
does not vary much.
6. So we pass each value in the distance matrix through the following function to get
the cost matrix:
Cost_ij = W_g ((1/effective_dist) − (1/dist_ij))²
where
W_sh is the smoothness weight in the horizontal direction,
W_sv is the smoothness weight in the vertical direction (Figs. 8, 9 and 10).
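The pooling and the obstacle-cost function above can be sketched together. The weight and effective distance below are illustrative values, not the tuned ones:

```python
# Sketch of the obstacle-cost computation: 12x16 average pooling of the 144x256
# depth image, then Cost_ij = W_g * (1/effective_dist - 1/dist_ij)^2 per cell.

W_G = 1.0
EFFECTIVE_DIST = 100.0

def pool_depth(depth):
    """Average-pool a 144x256 depth array (nested lists) into a 12x16 grid,
    using non-overlapping 12x16 windows (stride 12 vertically, 16 horizontally)."""
    pooled = [[0.0] * 16 for _ in range(12)]
    for i in range(12):
        for j in range(16):
            window = [depth[r][c]
                      for r in range(i * 12, (i + 1) * 12)
                      for c in range(j * 16, (j + 1) * 16)]
            pooled[i][j] = sum(window) / len(window)
    return pooled

def obstacle_cost(pooled, w=W_G, eff=EFFECTIVE_DIST):
    """Cost grows as an obstacle gets closer than the effective distance."""
    return [[w * (1.0 / eff - 1.0 / d) ** 2 for d in row] for row in pooled]

depth = [[50.0] * 256 for _ in range(144)]  # uniform 50 m scene...
for r in range(12):                          # ...with a close obstacle filling
    for c in range(16):                      # the top-left pooling window
        depth[r][c] = 5.0
pooled = pool_depth(depth)
cost = obstacle_cost(pooled)
```

The window containing the close obstacle receives a much larger cost than the rest of the grid.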
Goal Cost
1. The goal cost helps to determine whether the path moves toward the goal or away
from it. The more the path points toward the goal, the smaller its goal
cost.
2. The goal cost is broken down into two parts:
– the difference between the angle the drone will face if it selects the block and
the angle at which the drone should go to reach the goal;
– the difference between the current height of the drone and the preferred
height at which we want our drone.
3. We determine the current heading of the drone using the compass. Let the
current heading angle be alpha.
4. Now we find the angle at which the drone will move if it selects any of the
boxes. Let the field of view of the camera be fov; then the rightmost block corresponds
to (alpha + fov/2) degrees and the leftmost to (alpha − fov/2) degrees, and these
values change linearly in between.
5. We can find the goal angle, gamma, using the current location and the final location:
γ = tan⁻¹((G_y − y)/(G_x − x))
where
A_j = angle of the drone if the jth column is selected
γ = angle toward the goal
W_oa = obstacle weight for the angle
h_i = height of the drone if the ith row is selected
G_h = height of the goal
W_oh = obstacle weight due to the height of the drone
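The goal cost can be sketched as a squared angle term plus a squared height term. The FOV, weights, and candidate heights below are illustrative assumptions, not values from the text:

```python
# Sketch of the goal cost: an angle term comparing each column's heading A_j
# with the goal angle gamma, plus a height term for each row.
import math

FOV = math.radians(90)   # assumed camera field of view
N_ROWS, N_COLS = 12, 16
W_OA, W_OH = 1.0, 0.5    # example weights for the angle and height terms

def column_angle(alpha, j):
    """Heading if column j is selected: leftmost = alpha - fov/2,
    rightmost = alpha + fov/2, varying linearly in between."""
    return alpha - FOV / 2 + FOV * j / (N_COLS - 1)

def goal_angle(pos, goal):
    return math.atan2(goal[1] - pos[1], goal[0] - pos[0])

def goal_cost(alpha, i, j, pos, goal, heights, goal_height):
    gamma = goal_angle(pos, goal)
    angle_term = W_OA * (column_angle(alpha, j) - gamma) ** 2
    height_term = W_OH * (heights[i] - goal_height) ** 2
    return angle_term + height_term

heights = [float(i) for i in range(N_ROWS)]  # candidate height per row
# Drone at the origin heading 0 rad; goal at (10, 3) with preferred height 3.
costs = {(i, j): goal_cost(0.0, i, j, (0.0, 0.0), (10.0, 3.0), heights, 3.0)
         for i in range(N_ROWS) for j in range(N_COLS)}
best = min(costs, key=costs.get)
```

The minimum lands on the row matching the goal height and the column whose heading is closest to the goal angle.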
The total cost of each block at the ith row and jth column is the sum of these costs.
The block containing the minimum cost is selected. Let the selected optimal block
be in the ith row and jth column; then we calculate the effective angle a_j of column
j and the height h_i of row i in the same way as in the goal cost. In
AirSim, to move the drone we give a velocity in the x-direction, a velocity in the y-direction,
and a destination height. So with the help of a_j and h_i we can easily determine them:
X_velocity = v · cos(a_j)
Y_velocity = v · sin(a_j)
Destination height = h_i
Let the optimal path obtained by adding up the costs be in the ith row and jth column.
The distance that the drone can move safely along that path is the value at the
ith row and jth column of the pooled depth image (the obstacle distance) obtained after
applying average pooling (see the obstacle-cost section). We then take the minimum
of this distance and the effective distance, i.e., the distance from the drone beyond which
obstacles are ignored.
d = pooled_depth_image(i, j);
We must stop the drone when it nears the goal, so we also limit the distance the
drone can move along that path by the drone's distance from the goal.
d = min(d, goal_distance);
We want our drone to be fastest when d equals the effective distance and stationary
when d equals zero.
We could change the velocity linearly, but it is better for the rate of change
of the drone's velocity to be low at low values of d, since obstacles are nearby.
So we change the velocity as a quadratic function of d. This also ensures that the velocity
of the drone at low values of d is smaller than it would be if we changed the velocity
linearly (Figs. 11 and 12).
So our velocity will be v = v_max · (d / effective_distance)².
Fig. 11 Top view of goal angle (gamma) and drone angle (alpha)
Fig. 12 Velocity versus safe distance with max velocity set to 10 and effective distance 100
We also don't want the velocity to increase sharply, as the safe distance may increase
suddenly in just one turn, so we always store the previous velocity of the drone and
make sure the current velocity is not more than the previous velocity plus Δv,
where Δv is the maximum change in velocity we want to allow.
We put no restriction on decreasing the velocity: if we did not decrease the
velocity as much as demanded, we might end up colliding with the obstacle.
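The velocity rule can be sketched as follows. The quadratic form is our reading of the text (zero at d = 0, maximum at the effective distance); v_max, the effective distance, and the per-step cap are illustrative values:

```python
# Sketch of the velocity rule: quadratic in the safe distance d, with the
# per-step increase capped while decreases are never capped.

V_MAX = 10.0
EFFECTIVE_DIST = 100.0
DV = 2.0  # maximum allowed increase in velocity per step

def target_velocity(d, prev_v):
    d = min(d, EFFECTIVE_DIST)               # obstacles beyond this are ignored
    v = V_MAX * (d / EFFECTIVE_DIST) ** 2    # slow rate of change near obstacles
    return min(v, prev_v + DV)               # cap increases only

v1 = target_velocity(100.0, prev_v=0.0)  # wants 10.0, capped to 0.0 + 2.0 = 2.0
v2 = target_velocity(50.0, prev_v=9.0)   # quadratic value 2.5; decrease allowed
```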
The weights defined above in the algorithm determine which of the factors (obstacles,
smoothness, or goal) contributes more to the selection of the path, and they change
with the type of obstacle.
This is done because, in the case of birds, it is preferable to pass over the top,
so we can reduce the vertical smoothness weight and increase the horizontal
smoothness weight; whereas in the case of poles, it is better to go sideways,
so we can increase the vertical smoothness weight and reduce the horizontal
smoothness weight.
If there are multiple types of obstacles in the view, the general weights
are selected (Fig. 13).
Simulation Videos4 are released publicly.5
Acknowledgements This work was carried out as an intern project for TSAW. We would like to
thank Mr. Kishan Tiwari and Mr. Rimashu Pandey for their guidance, mentorship, and valuable
inputs.
4 https://drive.google.com/drive/folders/1eSa_CJ5WKoi4o3tcwirDeDtM1xMRdsQv.
5 https://www.tsaw.tech/.
References
1. Shah S, Dey D, Lovett C, Kapoor A (2017) AirSim: high-fidelity visual and physical simulation
for autonomous vehicles
2. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of
object detection. arXiv:2004.10934
3. Fox D, Burgard W, Thrun S (1997) The dynamic window approach to collision avoidance. IEEE
Robot Autom Mag 4(1):23–33. https://doi.org/10.1109/100.580977
4. Borenstein J, Koren Y (1991) The vector field histogram-fast obstacle avoidance for mobile
robots. IEEE Trans Robot Autom 7(3):278–288. https://doi.org/10.1109/70.88137
5. Zhuang F et al (2021) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76.
https://doi.org/10.1109/JPROC.2020.3004555
Vibration Suppression of Hand Tremor
Using Active Vibration Strategy:
A Numerical Study
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 457
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_36
458 A. Sharma and R. Mallick
2 Mathematical Modelling
Fig. 2 Degenerated shell element used to model cylindrical shell mounted on human forearm
The coupled equations of motion are then decoupled into sensor and actuator parts for
active vibration control of hand tremors.
The shell element under study captures five degrees of freedom per
node. In addition, at the element level, the potential difference across the piezoelectric
thickness is incorporated. The coordinates of any location within the structure under
study are represented as
$$
\begin{Bmatrix} x \\ y \\ z \end{Bmatrix}
= \sum_{l=1}^{nnel} N_l \left(
\begin{Bmatrix} x_l \\ y_l \\ z_l \end{Bmatrix}
+ \frac{t\,h_l}{2}
\begin{Bmatrix} l_{3l} \\ m_{3l} \\ n_{3l} \end{Bmatrix}
\right) \qquad (1)
$$
where $h_l$ is the thickness at node $l$, $nnel$ is the number of nodes per element, $t$ is the
thickness of the shell element, and $N_l$ is the shape function.
The displacement within the element may be calculated as
$$
\begin{Bmatrix} u \\ v \\ w \end{Bmatrix}
= \sum_{l=1}^{nnel} N_l \left(
\begin{Bmatrix} u_l^0 \\ v_l^0 \\ w_l^0 \end{Bmatrix}
+ \frac{t\,h_l}{2}
\begin{bmatrix} l_{1l} & -l_{2l} \\ m_{1l} & -m_{2l} \\ n_{1l} & -n_{2l} \end{bmatrix}
\begin{Bmatrix} \alpha_l \\ \beta_l \end{Bmatrix}
\right) \qquad (2)
$$
The linear piezoelectric constitutive relations are
{σ} = [Q]{ε} − [e]ᵀ{E}
{D} = [e]{ε} + [b]{E}
where {D} is the electrical displacement, {σ} is the stress vector, {E} is the electric field,
{ε} is the strain vector, [e] is the matrix of piezoelectric coefficients, [Q] is the matrix of
elastic stiffness coefficients, and [b] is the dielectric constant matrix.
[K φu ]{q} − [K φφ ]{φ} = Fq (7)
where [M_uu] is the mass matrix, which includes the mass of the cylindrical shell structure
and the piezoelectric layers; [K_uu] is the elastic stiffness matrix, which includes the elastic
stiffness of the host shell structure and the piezoelectric layers and the torsional stiffness of
the elbow-joint motion modelled as a torsional spring; [K_φφ] is the electric stiffness matrix;
and [K_uφ] is the coupled elastic-electric stiffness matrix. {F_m} is the mechanical force and
{F_q} is the applied electrical charge, respectively.
[Muu ]{q̈} + [Cuu ]{q̇} + [K uu ]{q} + [K uφs ]{φs } = {Fm } − [K uφa ]{φa } (8)
[K φs u ]{q} − [K φs φ ]{φs } = Fqs (9)
[K φa u ]{q} − [K φa φ ]{φa } = Fqa (10)
From Eq. (9), with no charge applied to the sensor ({F_qs} = 0), the open-circuit sensor voltage may be predicted as
{φ_s} = [K_φs φ]⁻¹ [K_φs u] {q} (11)
[Muu ]{q̈} + [Cuu ]{q̇} + ([K uu ] + [K uφs ][K φs φ ]−1 [K φs u ]){q} = {Fm } − [K uφa ]{φa }
(12)
The crucial objective of the controller design is to regulate the hand tremor to a desired
level by driving an actuator with a control force. The cylindrical shell is sandwiched
between piezoelectric layers. The output voltage from the piezoelectric sensor subjected
to an external force is predicted using Eq. (11). After filtering and amplification, the
sensor output voltage is sent to the controller, which analyses this input voltage. The
controller then delivers a control voltage (φ_a) as output to the piezoelectric actuator,
which in turn generates a control force as represented in Eq. (12). In the present
study, a negative velocity feedback controller is used for active vibration control of
the hand tremor. The closed-loop active vibration control strategy is illustrated in Fig. 3
and is mathematically represented as
{φ_a} = −Gain_V {φ̇_s} (13)
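As a numerical illustration of the negative velocity feedback law, the sketch below integrates a single-mode oscillator whose control force is proportional to minus the velocity signal, so a larger gain adds damping. The modal parameters are invented for illustration, not taken from the shell finite element model:

```python
# Sketch: negative velocity feedback on a single vibration mode. The feedback
# gain enters the damping term, mirroring Eq. (13).

def settle_amplitude(gain, steps=2000, dt=0.001):
    """Integrate x'' + (2*zeta*w + gain)*x' + w^2*x = 0; return the final |x|."""
    w, zeta = 20.0, 0.005  # assumed natural frequency (rad/s) and structural damping
    x, v = 1.0, 0.0        # initial tremor displacement
    for _ in range(steps):
        a = -(2 * zeta * w + gain) * v - w * w * x
        v += a * dt        # semi-implicit Euler step
        x += v * dt
    return abs(x)

passive = settle_amplitude(gain=0.0)  # no control voltage
active = settle_amplitude(gain=5.0)   # velocity feedback damps the tremor faster
```

With the feedback gain applied, the residual amplitude after the same simulated time is far smaller, which is the qualitative behaviour reported for increasing Gain_v below.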
3 Validation
Fig. 4 Radial deflection versus normalized hoop distance of simply supported cylindrical shell
In this section, a numerical study on active vibration control of hand tremor in
patients suffering from Parkinson's disease is presented. A cylindrical shell sand-
wiched between piezoelectric sensor and actuator layers is modelled using the finite
element formulation presented in Sect. 2. The upper piezoceramic layer acts as the sensor
and the lower piezoceramic layer acts as the actuator, as shown in Fig. 1.
As human hand tremor is a kind of sinusoidal movement, the external force applied to
the cylindrical shell model as a tremor is a harmonic force.
To model the human forearm motion, pinned boundary conditions are incorporated at one
curved face of the cylindrical shell to capture motion about the elbow joint.
The material properties used for cylindrical shell and piezoelectric layers are listed
in Table 2.
The effect of active vibration suppression of hand tremor using different control gains
is presented in Fig. 5. With an increase in control gain (Gain_v), the vibration due to
hand tremor is damped out more quickly. The observed damping ratios subjected to control
gains of 0.05, 0.1 and 0.5 are 0.0047, 0.0092, and 0.0295, respectively. With an increase
in the value of the control gain, the damping ratio increases. However, hardware
limitations restrict the maximum value of the control gain, which necessitates the use of
an optimum combination of the other parameters as well. The numerical results show that
the active vibration control strategy with a collocated piezoelectric sensor and actuator
pair can efficiently suppress the hand tremor.
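The damping ratios quoted above can be extracted from a decaying response using the logarithmic decrement of successive peaks. A minimal sketch; the synthetic signal and its 5 Hz frequency are assumed for illustration only:

```python
import numpy as np

def damping_from_log_decrement(signal):
    """Estimate the damping ratio from the logarithmic decrement of the
    positive peaks of a freely decaying oscillation."""
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i - 1] < signal[i] > signal[i + 1] and signal[i] > 0]
    # average log decrement over all detected peak-to-peak intervals
    delta = np.log(signal[peaks[0]] / signal[peaks[-1]]) / (len(peaks) - 1)
    return delta / np.sqrt(4.0 * np.pi ** 2 + delta ** 2)

# synthetic decay using the paper's largest observed damping ratio, 0.0295
zeta, wn = 0.0295, 2.0 * np.pi * 5.0
t = np.arange(0.0, 4.0, 1e-3)
x = np.exp(-zeta * wn * t) * np.cos(wn * np.sqrt(1.0 - zeta ** 2) * t)
print(round(damping_from_log_decrement(x), 4))  # recovers ~ 0.0295
```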
5 Conclusion
This paper numerically investigates the active vibration suppression of hand tremor in
patients suffering from Parkinson's disease. For this purpose, the forearm is covered with
a cylindrical shell panel sandwiched between piezoelectric sensor and actuator layers.
The cylindrical shell is modelled using a four-node degenerated shell element. Hamilton's
principle is used to capture the dynamic response of the forearm subjected to harmonic
tremor, and a harmonic force is applied to simulate the hand tremor. This article
emphasizes hand tremor suppression using the concepts of smart structures. The effect of
control gain on active vibration suppression is investigated. Numerical simulations
reveal that the active vibration control strategy with a collocated piezoelectric sensor
and actuator pair can efficiently suppress the hand tremor. The observed damping ratios
subjected to control gains of 0.05, 0.1, and 0.5 are 0.0047, 0.0092, and 0.0295,
respectively.
Fig. 5 Active vibration suppression of hand tremor subjected to harmonic motion corresponding
to a Gain_v = 0.05, b Gain_v = 0.1 and c Gain_v = 0.5
Design of a Self-reconfigurable Robot
with Roll, Crawl, and Climb Features
for False Ceiling Inspection Task
1 Introduction
False or suspended ceilings are favorable for rodents to seek refuge and build their
habitat. These pests can wreak havoc in buildings, whether residential, commercial, or
industrial. Pest infestation is also a significant health hazard [1]. For example, pests
such as rats, cockroaches, and mosquitoes spread asthma, allergies, and food-contamination
illnesses. Rats damage building structures, chew electrical
wires, and transmit diseases. The false-ceiling environment and the manual inspec-
tion process are shown in Fig. 1.
The ability to implement autonomous tasks smoothly in uncertain environments, with robust
adaptive features, is vital for developing next-generation robots. Legged robots have
higher adaptability to different ground conditions [2, 3]. However, they are more complex
and require high torque and power. On the other hand, a wheeled robot is comparatively
simpler in structure, easier to control [4], and efficient when moving on a plane surface.
Nevertheless, it is inferior at adapting to obstacles or rough terrain. Track wheels can
overcome irregularities in the terrain up to a limited height, and they were used in the
design of a false-ceiling
robot named Falcon reported in [5]. However, the track wheels have limitations in
overcoming and accessing the vertical surfaces such as sidewalls and ducts, in the
false ceiling. Therefore, we propose a novel robot design with roll, crawl, and climb
capabilities referred to here as FalconRCC, i.e., Falcon with Roll Crawl and Climb
(RCC) features.
The mobility of a wheel-legged device can be used to negotiate obstacles; such a system
combines the benefits of both leg and wheel mechanisms. Track-wheel robots can overcome
obstacles and operate on unstructured ground, at the cost of high power consumption.
The evolvability, multi-functionality, and
Fig. 1 a False-ceiling environment and hazards, b section of false ceiling (hanger wire,
T-channels), c manual inspection
survivability in reconfigurable robots [6] are useful for challenging terrains. Several
robotic architectures based on the reconfigurable design principles proposed in [7, 8]
were implemented in a width-changing pavement sweeping robot [9], Tetris-inspired floor
cleaning robots [10, 11], a staircase-accessing robot [12], a rope climbing robot [13],
drain inspection robots [14, 15], among others. Quattroped [16] was designed with a unique
transformation technique from wheeled to legged morphology. It includes a transformation
mechanism that allows it to convert the morphology of the driving mechanism between wheels
(i.e., a full circle) and two-degree-of-freedom legs (i.e., combining two half circles as
a leg). In [17], a robot with a unique claw-wheel
transformation design is described. Moreover, mobile robots with differential wheel
action were used in the area coverage strategy for false ceilings in [18]. However, the
robot cannot self-recover, and its dimensions restrict it from accessing cluttered regions
in the false ceiling. In this work, we present a novel design of a reconfigurable robot
with the ability to switch between crawl and roll modes and a modular attachment that can
aid in climbing walls.
The rest of this paper is organized as follows. Section 2 explains the requirements
and considerations for the design, mechanical layout, and system architecture of the
false-ceiling robot FalconRCC. The mechanical design of the quadruped robot with
the ability to crawl and roll, along with the modular attachment for wall climbing
is detailed in Sect. 3. Section 4 explains the components for the system architecture,
and experimental results for the transition and climbing of the wall by the robot are
shown in Sect. 5. Finally, Sect. 6 concludes the paper.
The existing inspection and surveillance task of the false ceiling is done manually
(Fig. 1c) and is tedious. The environmental scenario and design requirements are
discussed here.
Design of a Self-reconfigurable Robot with Roll, Crawl, and Climb … 469
Based on these observations, the following features in a robot designed for the inspection
and surveillance task will be of help:
• From the false-ceiling standards and observations, it was concluded that the chan-
nel height h (Fig. 1b) varies as 30 < h < 90 mm. Hence the robot design should
overcome this obstacle height.
• The platform must be lightweight and should generate less noise while moving
over the false ceiling.
• The platform should be able to recover itself from the fall.
• The platform should be able to climb vertical surfaces.
• A night-vision camera should be mounted for the inspection task in dark environments.
The transformation design principles [20] were utilized to cater to the need to crawl,
roll, and climb walls by designing the subsystems accordingly. The detailed
470 S. Selvakumaran et al.
aspects of utilizing the design principles and facilitators with the mechanisms presented
in [8] were used in this work to arrive at a system facilitated by roll/wrap/coil,
modularity, shared transmission, furcate, and fold.
3 Mechanical Layout
In robotics, reconfiguration refers to a system's ability to change its configuration to
fulfill the required task by reversibly changing its mechanism type, mobility, gaits,
architecture (say, serial to parallel), and so on. In this work, a self-reconfigurable
robot is designed using transformation principles [20], aimed at a false-ceiling
inspection task. The scale of the designed robot is depicted in Fig. 2a, b, which also
adheres to the system requirements. The two configurations for crawling and rolling are
also shown, along with the exploded view of the system showing its components and the
symmetry in the design. The crawl and roll capabilities over the false ceiling are
discussed next.
Fig. 2 Dimensions of FalconRCC (in mm) and its exploded view depicting the components
Design of a Self-reconfigurable Robot with Roll, Crawl, and Climb … 471
The primary mechanism for the crawling and rolling locomotion of the robot is its four
semi-circular limbs, connected to the body using spherical joints. Each limb has an active
spherical joint providing three Degrees of Freedom (DoF), realised with three
perpendicular revolute joints powered by micro servo motors. The servo motors on each limb
are arranged such that the joint proximal to the robot body, i.e., axis A1 (Fig. 2a),
controls the legged locomotion; the middle joint, with axis A2, helps control the rolling;
and the third motor, connected to the arched leg, helps in lifting the leg and
transitioning its position. HiTec HS-35HD Ultra Nano servo motors are used, three per leg,
to form each spherical joint. The same joint configuration is provided for all four legs
attached to the main body; as a result, twelve servo motors are used in total.
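The 3-DoF spherical joint built from three perpendicular revolute joints can be sketched as a composition of elementary rotations. The mapping of axes A1/A2/A3 to the z, y, and x axes below is an assumption for illustration, not the paper's convention:

```python
import numpy as np

def rot(axis, angle):
    """Elementary rotation matrix about a principal axis."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def limb_orientation(a1, a2, a3):
    """Three perpendicular revolute joints in series (axes A1, A2, A3)
    behave as one active spherical joint: the limb orientation is the
    product of the three joint rotations (axis assignment assumed)."""
    return rot("z", a1) @ rot("y", a2) @ rot("x", a3)

R = limb_orientation(np.pi / 4, 0.2, -0.1)
print(np.allclose(R @ R.T, np.eye(3)))  # composed motion is a pure rotation
```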
The home position of FalconRCC is when its limbs face diagonally outward at an angle of
45° from its body. Crawling begins from this state, with periodic drags created by each
leg, one leg at a time. Figure 3a, b shows the home position of the robot and the crawling
gait pattern over the false-ceiling environment. Forward translation, backward
translation, clockwise rotation, and anticlockwise rotation are the four basic locomotion
patterns in the crawling state. This enables the maneuverability of the robot to be used
for the inspection task. By changing the spread angle of each leg, or by increasing the
leg footprint, the height of the robot can be varied from 135 to 155 mm. This enables the
robot to reconfigure its height according to the obstacle. Figure 3c shows the FalconRCC
reconfiguring itself to go beneath a duct pipe.
The reconfigurability of the limbs also plays a crucial role in enabling the robot
to transition from its crawling state to the rolling state, as shown with the sequence
of leg transformation in Fig. 4a. The advantage of rolling is a ten times higher speed
than crawling, and in the false-ceiling environment, small-height (<90 mm) obstacles can
be overcome while rolling. Both forward and
Fig. 4 Changing the state from crawl to roll and overcoming obstacles
backward rolling can be achieved. Feedback from the time-of-flight (ToF) sensor and the
inertial measurement unit (IMU) attached at the center helps avoid falls due to depth
drops and regulates the rolling action. The robot has a self-recovery mode upon falling:
since the design is symmetric, upside-down recovery can be easily achieved.
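The ToF/IMU guard during rolling can be sketched as a simple threshold check. The 90 mm depth limit follows the obstacle height quoted in the text, while the tilt limit and the function interface are illustrative assumptions:

```python
def rolling_safe(tof_depth_mm: float, pitch_deg: float,
                 depth_limit_mm: float = 90.0,
                 tilt_limit_deg: float = 45.0) -> bool:
    """Allow the roll to continue only while the ToF sensor sees no drop
    deeper than the traversable obstacle height and the IMU reports a
    moderate tilt (thresholds are illustrative)."""
    return tof_depth_mm <= depth_limit_mm and abs(pitch_deg) <= tilt_limit_deg

print(rolling_safe(40.0, 10.0), rolling_safe(250.0, 5.0))
```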
The exploded view of the FalconRCC robot is shown in Fig. 5 with its subsystems. The main
subsystems shown are the transition wheel, the chassis, and the bipedal mechanism. The
four limbs play a crucial part in enabling the robot to transition from its crawling to
its climbing state. This transition mechanism also includes a small pair of wheels to
enable a seamless transition from the floor to the wall after the desired climbing
configuration is achieved. These wheels are driven by an 8 V DC motor, which is engaged to
the wheel shaft using a pair of bevel gears, as shown in Fig. 5a.
The robot uses micro-suction tape from AirStick. This tape establishes stable bonds
between the robot and the wall, similar to a gecko or spider forming Van der Waals forces
between its feet and the wall surface. The sticky surface of the tape contains microscopic
air pockets that create partial vacuums between the tape and the wall surface. Like
suction cups, it leaves no residue behind, so it can be used repeatedly without losing its
adhesive holding power; unlike regular suction cups, however, it does not rely on ambient
pressure. These micro-suction tapes are designed such that it is hard to pull the tape off
in the direction perpendicular to the attachment surface, but easy to peel it off.
A pair of pedals moves synchronously with the central trunk in a periodic fashion. This
periodic motion is made possible by cam mechanisms incorporated within the bipedal
mechanism. The pair of pedals is similar to the long limbs on either side of an ape, with
the central trunk of the robot like the trunk of the ape. The climbing motion is achieved
using a single motor. Before climbing, the transition from the
Fig. 5 Mechanisms helping in the transition from flat to a vertical surface, and pedals
with Gecko tape actuated using transmission from gears for vertical climbing
floor to the wall is assisted using the pair of wheels, as shown in Fig. 5. The assembly
is moved toward the vertical surface with the help of these wheels to make contact
of sticky pedals to the vertical surface.
4 System Architecture
5 Experiments
The transition of the robot from its crawling state to the climbing state is shown in a
series of images in Fig. 7a. In this transition, the robot's legs use two out of their
three Degrees of Freedom (DoF). When the robot has completed its crawling motion, the four
legs return to their default state. The climbing configuration is achieved by rotating the
four legs such that two legs face the left and the other two face the right side of the
robot. All four legs then rotate about the A3-axis, away from one another. As two of the
legs contact the wall, the other two legs continue rotating about the A2-axis. The two
legs furthest from the wall then rotate about the A1-axis until the joints of the two legs
are parallel to the X-axis. Next, the legs closest to the wall rotate until they are right
above the robot's body and parallel to the wall. Once this configuration is achieved, the
robot rolls forward on the two wheels in contact with the floor until the micro-suction
tape (MST) pedals contact the wall.
While moving on a vertical or steeply inclined surface, the modular attachment to
FalconRCC utilizes a bipedal gait. Figure 7b shows one cycle of the bipedal motion, from
the moment the center foot is about to begin detaching from the wall surface to the same
moment one cycle later. Some vertical distance is covered as a result, represented by the
dotted lines in Fig. 7b.
The limitations observed in the current design are, namely: (a) a high number of actuators
used for each leg movement, which can be overcome by studying the kinematic behavior and
replacing a joint with a flexural joint, as in [21], to reduce the number of active
actuators; (b) the passive suction unit for adhering to vertical surfaces works only on
clean and flat surfaces, such as glass, mica, etc.; (c) the localization of the robot on
the false ceiling is another limitation of the current design; (d) the kinematic model and
identification [22] of the geometric parameters of the assembled robot, along with the
dynamic model, is not incorporated in the control scheme; and (e) the noise of the current
servo motors due to the gearbox is more than the acceptable limit for operating in the
false-ceiling environment.
6 Conclusions
In this paper, we conceptualize the design of the FalconRCC robot with the capa-
bility to roll, crawl, and climb inclined surfaces. It was shown that the design of the
robot is suitable for the false-ceiling environment and its inspection task due to the
self-reconfigurable and modular design. The mechanisms for each subsystem were
selected according to the transformation design principles. The modular attachment
of the climbing and transitioning mechanism using spatial transmission of motion
from motors to the pedals was demonstrated using the experimental transition. The
viability of a flexural joint for each leg, improving leg movement while reducing the
number of actuators, is being investigated. Future work includes the design optimization
and control of the robot.
Acknowledgements This research is supported by the National Robotics Programme under its
Robotics Enabling Capabilities and Technologies (Funding Agency Project No. 192 25 00051),
National Robotics Programme under its Robot Domain Specific (Funding Agency Project No. 192
22 00058), National Robotics Programme under its Robotics Domain Specific (Funding Agency
Project No. 192 22 00108), and administered by the Agency for Science, Technology and Research.
References
17. Chou JJ, Yang LS (2013) Innovative design of a claw-wheel transformable robot. In: 2013
IEEE international conference on robotics and automation. IEEE, pp 1337–1342
18. Pathmakumar T, Sivanantham V, Anantha Padmanabha SG, Elara MR, Tun TT (2021) Towards
an optimal footprint based area coverage strategy for a false-ceiling inspection robot. Sensors
21(15):5168
19. AS/NZS 2785:2000. Suspended ceiling design and installation. https://www.shop.standards.
govt.nz/catalog/2785. Accessed Apr 2019
20. Singh V, Skiles SM, Krager JE, Wood KL, Jensen D, Sierakowski R (2009) Innovations in
design through transformation: a fundamental study of transformation principles. J Mech Des
131(8)
21. Hayat AA, Akhlaq A, Alam MN (2012) Design of a flexural joint using finite element methods.
Mach Mech 198–205
22. Hayat AA, Chaudhary S, Boby RA, Udai AD, Dutta Roy S, Saha SK, Chaudhury S (2022)
Identification. Springer Singapore, Singapore, pp 75–113
Smart Technologies for Mobility
and Healthcare
Review Paper on Joint Beamforming,
Power Control and Interference
Coordination for Non-orthogonal
Multiple Access in Wireless
Communication Networks for Efficient
Data Transmission
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 481
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_38
482 L. S. Bitla and C. Sakode
2 Related Works
Liu et al. [23] suggested a method to combine Reconfigurable Intelligent Surfaces (RIS)
with Unmanned Aerial Vehicles (UAV) in order to enhance the service quality of the UAV.
The method effectively allowed roaming of mobile users, coupled with increased spectrum
efficiency even while roaming. The method developed a decaying deep Q-network (D-DQN)
algorithm to reduce the energy utilisation of the system. The status of the UAV-enabled
wireless network was monitored periodically and measures were taken appropriately to make
the system adapt to the changing environment. The interference arising in NOMA was
removed via a linear precoding technique. Upon simulation, the method was demonstrated to
decrease the energy usage of the UAV substantially.
Ashok and Sudha [24] proposed a beamforming method using Conditional Time-Split Energy
Extraction (CT-EE) to optimise the error rate with high energy efficiency. The MSE and
attainable sum rate (ASR) were improved, and a minimal-MSE method was also developed to
improve the system's efficiency. Based on the battery life, the time split was made
between the information decoding and Energy Extraction (EE) phases, and the scope for
bypassing the EE phase was explored based on the power need. The method effectively
manages power-depleted situations with maximum connectivity and increased relay lifetime.
It focuses on allowing the wireless nodes to retain full connectivity even amid natural
catastrophes, and it dominates other beamforming design strategies with minimal error.
Al Obiedollah et al. [25] proposed a beamforming method in which the trade-off between
energy efficiency (EE) and spectral efficiency (SE) is explored and optimised to enhance
the performance of the beamforming method. The approach used the SCA methodology to handle
the non-convexity of the problem and obtain an optimum solution. An integrated
priori-articulation method with a weighted-sum technique was exploited to accomplish an
effective joint optimization of EE and SE. The simulations were carried out against known
beamforming methods relevant to power minimization.
Zhao et al. [26] proposed a method to increase the average sum rate of an IRS-aided
multi-user system. The method optimises the passive IRS phase shifts first; the
beamforming vectors are then designed, reducing the channel-training overhead and the
design complexity compared with previous approaches. The passive phase shifts are
optimised based on statistical CSI (S-CSI) rather than instantaneous CSI (I-CSI), and the
beamforming vectors are designed to accommodate the I-CSI fading channels of the users.
The method presented a Penalty Dual Decomposition (PDD) algorithm for the single-user case
and a generic TTS stochastic sequential convex approximation (SSCA) for multi-user
scenarios. The simulations focused on assessing the effect of S-CSI and channel
correlation on system performance.
Ehlers et al. [27] developed a digital beamforming method to detect the spatial
disparities between nodes in a vast network. The method focused on minimising the
interference between networks using a Multi-User Detection (MUD) receiver that can
decrease the interference between the nodes. MUD was primarily utilised to reduce
interference that could not be removed by beamforming alone. The overall performance of
MUD was improved using a Direct Sequence Spread Spectrum (DSSS) waveform. MUD, DSSS and
digital beamforming were the three main mitigation types, focused on and described as
MUD-Aided Multi-Beam Uncoordinated Random Access Medium Access Control (MAMBU-RAM). The
method proved capable of reducing the interference observed in closely spaced nodes
spanning a limited range.
Newell and Vejarano [28] proposed a method to decrease the average power consumption in
networks during data transmission. A dynamic routing method for power management during
transmission was developed for Wireless Body Area Networks (WBAN) based on the body
motions of the user. The programme recorded the body movement and developed a periodic
time-domain model to minimise energy usage during transmission with an assured packet
delivery rate (PDR). The algorithm also focused on finding the shortest route towards the
sink node to reduce the energy used in reaching the destination. The movement of the
sensors placed on the user's body was monitored by the algorithm utilising inertial
measurement units (IMU). Tests showed that the power consumption depends on the movement
of the user and the distance between the sensor node and the destination.
Zhu et al. [29] put forward a method to jointly optimise power regulation and beamforming
for a two-user uplink NOMA in mmWave communications. The method provided the maximal ASR
for the two users under minimum-rate restrictions for both. The joint optimization was
addressed as two distinct problems: the first phase focused on power management and
beam-gain allocation, while the second phase addressed the analogue beamforming problem
subject to a constant-modulus constraint. The tests carried out indicated that the method
performed better in joint optimization than Orthogonal Multiple Access (OMA).
Ji et al. [30] developed a method to simultaneously optimise the UAV trajectory,
cache location and power needed for transmission within a limited time. The method
used an alternating iterative algorithm to optimise the three stated quantities. The
method mainly focused on increasing the network throughput and fully exploiting wireless
caching and the mobility of the UAV for multi-user content delivery.
The iterative technique was based on convex approximation methods and block
alternating descent. The experimental findings showed the improved convergence of
the algorithm in optimising the lowest throughput for UAV users. Also, the method
showed effective access latency reduction and throughput improvement when several
UAVs were deployed.
Zhan et al. [31] proposed a model to enhance the spectrum-sharing capabilities of the
network and concurrently improve power management during transmission. To accomplish these
two goals, the model constructed a Distributed Proximal Policy Optimization (DPPO) scheme
for efficient spectrum sharing. The spectrum-sharing issue was studied on the basis of
regulating the power among the primary users (PU). Through various power changes, the
method allowed spectrum sharing between the PU and SU while fulfilling the QoS criteria.
Lin et al. [32] proposed a method using user admission control to simultaneously optimise
base station activation, the admissible users, and the transmission beamformers. The
ultimate aim was to optimise the use of power in the networks. The authors formulated the
problem as a convex sparse optimisation problem to ensure proper functioning of the
network and achieve low power consumption. To address the issue, the Alternating Direction
Method of Multipliers (ADMM) was used, which operates iteratively to find a solution. The
tests demonstrated the effectiveness of the method in reducing the power in multi-cell
downlink green networks.
Shen et al. [33] addressed the joint trajectory and cross-link interference problem
through a successive convex approximation algorithm. The authors formulated the joint
trajectory and power control (TPC) problem to increase the aggregate sum rate of the
UAV-enabled Interference Channel (UAV-IC) under certain constraints. The optimal solution
of the TPC problem was obtained based on the fly-hover-fly technique employed to find the
optimal hovering locations of the UAVs. The time complexity of the method was decreased
using a parallel TPC technique, which jointly updated the trajectory and power variables
at every iteration; the algorithm proved to dominate other algorithms in mitigating
cross-link interference while reducing time complexity.
Li et al. [34] addressed the inter-tier interference coordination problem in heterogeneous
networks (HetNet) comprising macro and small cells working under Frequency Division
Duplexing (FDD) mode and sharing the same spectrum. Large-scale and multiple antenna
arrays were installed in the macro and small cells, respectively. The SINR was
approximated at the macro BS, and a 3D beamforming strategy was applied for the users
served by the macro BS. The expected SINR was derived for the small-cell users using the
Wishart matrix. Interference coordination algorithms were then introduced, based on these
observations, to achieve a better trade-off between the network traffic and the
performance of both macro and small cells. In experiments, the algorithm dominated the
other algorithms in reducing the interference between the networks, and the experimental
results correlated with the Monte Carlo results.
Li et al. [35] published a technique to solve the interference problem for cell-edge users
in full-dimension MIMO (FD-MIMO) systems. The authors introduced a Fractional Frequency
Reuse (FFR) method for FD-MIMO systems, allocating the same frequency band for cell-centre
users and the remaining frequency for cell-edge users. Two joint interference coordination
strategies based on 3D beamforming were also introduced, such as the full-cooperative
strategy. Wang et al. [36] developed a two-level beamforming coordination method aimed at
minimising the interference between wireless networks. The method operated by dividing the
network into clusters and followed an inter-cluster coordination for every cluster. A
dynamic time-domain interference coordination method was put forth to collect the
interference information. The method assisted in decreasing the inter-cluster interference
between the network clusters for switched-beam systems (SBS). Upon simulations, the
two-level method was shown to minimise the interference between the networks and proven to
achieve greater performance for the edge users than the other techniques.
486 L. S. Bitla and C. Sakode
4 The Concept
Fig. 1 Spectrum splitting for NOMA and OFDMA for two users
colour-coded information signals being sent from the transmitter, as seen in the
figure. All three signals are included in the signal received by the SIC receiver. The
strongest signal is the first one decoded by SIC, with the others treated as noise.
Decoding proceeds by subtracting each decoded signal from the received signal;
if decoding is successful, a complete waveform is recovered. The SIC receiver
repeats this procedure until it finds its own required signal.
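The spectrum splitting of Fig. 1 can be made concrete with Shannon rates for two users. The sketch below is a generic illustration with assumed channel gains and powers (not values from this paper): OFDMA gives each user half the band and half the power, while NOMA serves both users over the full band, giving the far (weak) user most of the power and letting the near (strong) user remove the far user's signal by SIC.

```python
import math

# Assumed illustrative parameters.
p = 1.0        # total BS transmit power
n0 = 0.1       # noise power over the full band
g_near = 10.0  # channel power gain of the near (strong) user
g_far = 1.0    # channel power gain of the far (weak) user
a = 0.8        # NOMA power fraction allocated to the far user

# OFDMA: half the band (and half the noise) and half the power per user.
r_near_ofdma = 0.5 * math.log2(1 + (p / 2) * g_near / (n0 / 2))
r_far_ofdma = 0.5 * math.log2(1 + (p / 2) * g_far / (n0 / 2))
r_sum_ofdma = r_near_ofdma + r_far_ofdma

# NOMA: far user decodes directly, treating the near user's signal as noise;
# near user cancels the far user's signal by SIC, then decodes its own.
r_far_noma = math.log2(1 + a * p * g_far / ((1 - a) * p * g_far + n0))
r_near_noma = math.log2(1 + (1 - a) * p * g_near / n0)
r_sum_noma = r_far_noma + r_near_noma

print(f"OFDMA sum rate: {r_sum_ofdma:.2f} bit/s/Hz, NOMA sum rate: {r_sum_noma:.2f} bit/s/Hz")
```

For these assumed gains, NOMA improves both the sum rate and the far user's individual rate relative to the OFDMA split.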
5 NOMA Downlink
The BS superimposes the served users' information waveforms into one signal. SIC is used by
each piece of user equipment (UE) to recover its own signal. Figure 3 depicts
a BS and K UEs equipped with SIC receivers in a wireless network. UE1 and UEK
are presumed to be nearest to and furthest from the base station (BS), respectively.
One of the biggest problems for the BS is deciding how much power to allocate
to the various information waveforms. When using the NOMA downlink, the UE
situated further away from the base station receives more power and the UE located
closest to the base station receives less power. This information is sent to all UEs in
the network as a single signal. The strongest signal is decoded first by each UE, and
the decoded signal is then subtracted from the received signal. The subtraction is
repeated until the SIC receiver locates its own signal. The signals from UEs situated
far away from the BS can thus be cancelled by UEs located near the BS. Because
the furthest UE's signal contributes the most to the received signal, that UE decodes its
own signal first.
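The decoding order described above can be sketched with a noiseless toy model. This is an illustrative sketch, not the paper's system model: three BPSK symbols are superimposed with assumed power fractions chosen so that, at each SIC stage, the strongest remaining signal dominates the residual interference; each UE decodes strongest-first and subtracts until it reaches its own signal.

```python
import math
import random

random.seed(1)

# Assumed power fractions, largest for the furthest UE.
powers = [0.7, 0.2, 0.1]
amps = [math.sqrt(pk) for pk in powers]

def sic_decode(y, own_index):
    """Decode strongest-first, subtracting each decoded signal, until own_index."""
    for k in range(own_index + 1):
        sym = 1 if y >= 0 else -1   # hard decision on the strongest remaining signal
        if k == own_index:
            return sym
        y -= amps[k] * sym          # cancel the decoded signal

errors = 0
for _ in range(100):
    bits = [random.choice([-1, 1]) for _ in range(3)]
    y = sum(a * b for a, b in zip(amps, bits))  # superimposed received signal
    for k in range(3):                          # UE k must recover bit k
        if sic_decode(y, k) != bits[k]:
            errors += 1
print("decoding errors:", errors)
```

In this noiseless setting every UE recovers its symbol; with noise, the power allocation must additionally leave enough margin at each SIC stage.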
Most beamforming methods focus on achieving a high data transmission rate,
but the interference and power consumption during transmission are not addressed.
Joint optimization of all three aspects enables the network to operate effectively
with maximum throughput. Interference in the network is a serious problem that
compromises security, leading to loss of data. Because of the development of wireless
communication networks and the dense deployment of BSs and antennas, interference
has become a frequent issue. The energy dissipated by the network also
creates interference and capacity difficulties. Power control helps
prevent interference in the network as well as increasing the capacity
of the network. Efficient beamforming can jointly optimise the power and the
interference occurring in the networks, and when the sensors and BSs are situated
in a sparse area, beamforming can operate well with maximum SINR. In a dense
network, the capacity of the network and the QoS guaranteed to users are severely
restricted owing to inter-cell interference. The capacity of the network may be
increased by frequency reuse techniques, thus decreasing the interference.
Thus, increasing the network capacity and applying effective beamforming with
multiple antennas minimises the inter-cell interference. The proactive method
Review Paper on Joint Beamforming, Power Control and Interference … 489
(prediction) of finding a solution to enhance the capacity and decrease the inter-
ference is more important than strategies that simply respond to issues after
identification. Another essential aspect is power management in the network,
which helps maintain a higher battery capacity. Power control and
interference coordination methods are tightly connected, as a decrease in power
consumption lowers the interference as well. In the dense deployment of BSs, the
dynamic flow needs to be addressed instead of static methods, so that the power
consumed by the BS may be effectively controlled to increase the throughput. With
static methods, the power may only be changed at the BS if the users avoid travelling
and remain in a specific location for a set period of time.
Only a few known beamforming methods include both power management and
interference coordination with simultaneous optimization of SINR and network
performance. Owing to the urgent need for effective beamforming methods that jointly
optimise power management and interference coordination, the joint beamforming
strategies reviewed here have been suggested. Their primary goal is to accomplish
an efficient data transmission rate in a wireless network and to solve the faults
highlighted in the current methods, with decreased interference and from an
effective spectrum-sharing viewpoint.
6 MIMO-NOMA
Because it increases total capacity even when there are a large number of users,
MIMO systems can take advantage of multicast beamforming.
Although there are numerous applications for it, there are also numerous drawbacks.
In one technique, all users share a single beam and everyone receives an identical signal [37].
Alternatively, numerous beams may be employed, each used by a different group
of users to receive a different signal [38]. The following studies on beamforming
in MIMO-NOMA systems are examples of such investigations. Reference [39]
proposes a downlink MIMO-NOMA system with multi-user beamforming as an
alternative to the current standard. Two users can share the same beam if they are
in close proximity to one another. Because each beam can only be shared by two
users at a time, strategies such as clustering and power allocation can be used to
enhance overall capacity while simultaneously decreasing interference between
clusters and between users. The effectiveness of multicast beamforming is examined
in [40]. To distribute their information streams, broadcast-system transmitters use a
large number of antennas based on multi-resolution broadcast ideas, providing only
low-priority signals for consumers who are far away from the broadcasting system,
or who have poor channel quality near the BS, in order to keep them connected.
With the use of superposition coding and a minimum-power beamforming
formulation, it is possible to conduct random beamforming. Because all users in a
cluster are assumed to share the same beam, equal transmission power is assigned
across all beams. A spatial filter should also be used to decrease interference
between clusters and between beams, according to the authors' recommendations.
The fractional frequency reuse concept is proposed as a means of optimising power
distribution across a large number of beams, with consumers under varied channel
conditions accepting a variety of reuse ratios. Reference [41] describes a downlink
multi-user MIMO-NOMA system that reduces interference while simultaneously
increasing capacity, in which the mobile users' receive antennas outnumber the
base station's transmit antennas. The paper presents a zero-forcing beamforming-
based technique for inter-cluster interference reduction, which is particularly useful
when users of various channel qualities must be distinguished. To achieve the
highest possible throughput while minimising interference, user clustering is applied,
and the minorization-maximization approach is also advised as an approximation
in order to minimise the significant computational costs associated with the
nonconvex optimization problem.
The primary objective of the minorisation-maximisation technique
is to maximise system throughput for a given number of users when multiple
beams are used. In [42], a downlink MIMO-NOMA system broadcasts precoded
signals to all cellular users over multiple beams, with each beam serving a single
user. It has been recommended that
three distinct approaches be used in conjunction with one another in order to optimise
the overall rate.
With the use of weighted sum-rate maximisation, a unique beamforming matrix is
created for each beam, with each beam making use of all of the CSI available
at the BS. The second technique makes use of user scheduling in order to exploit
SIC for each mobile user. To realise the maximum potential of SIC, channel
gains within each cluster must be considerably different, and channel correlation
among mobile users must be strong in order to enjoy the benefits to the greatest extent
possible. Fixed power allocation, on the other hand, delivers neither a higher sum
rate nor better performance for customers who have poor channel quality.
Reference [43] investigates a layered transmission system with a two-user
MIMO-NOMA maximum-transmission-power constraint
in order to determine the most efficient power allocation strategy for the system.
Because each mobile user decodes signals in SIC in a sequential manner, using
layered transmission instead of non-layered transmission significantly reduces the
complexity of decoding signals in the SIC. As a consequence, the average sum rate
and its limits are shown to admit closed-form expressions in both the
perfect-CSI and partial-CSI situations. The average total rate grows in tandem
with the growth in the number of antennas used. References [44–46] study networks in
a MIMO-NOMA framework. It was also discovered that by combining two distinct
power distribution schemes, a reasonable balance between fairness and throughput
could be achieved. Different QoS criteria may be satisfied using the fixed power
allocation approach. Further, a power allocation approach based on cognitive radio
technology ensures that the QoS needs of the end user are satisfied straight away.
Exact and asymptotic expressions for the outage probability (OP) were also derived
for the open system. A study published in [47] investigates the power minimisation problem.
According to [48], there exist precoders known as linear beamformers that provide a
greater overall throughput while simultaneously enhancing the user's throughput
on channels of poor quality. These precoders also meet the quality-of-service
requirements. Furthermore, it has been proven that more distinct channel gains
result in superior NOMA performance for the greatest number of users per cluster.
A superimposed pilot scheme is also considered, in which the pilot power is
traded against the power loss caused by dedicating resources to the pilot. The
superimposed pilot scheme is more efficient than the orthogonal scheme when
there are more mobile users and greater mobility. Massive MIMO is distinct from
massive-access MIMO, which is discussed in [49]. In [50], a low-complexity Gaussian
message-passing iterative detection technique is applied to achieve minimum-mean-
square-error multi-user detection, and both its means and variances converge
rapidly. The multiple access scheme NOMA has also been proposed for
millimetre-wave communication systems; it integrates beamspace MIMO and
provides massive connectivity in situations where the number of cellular users exceeds
the number of radio-frequency chains, while also achieving improved spectrum and energy
efficiency performance [51]. In addition, a zero-forcing (ZF) precoding approach
has been developed to reduce inter-beam interference to the greatest extent possible.
Another set of innovations includes a dynamic power allocation system and iterative
optimization methods with higher sum rates and less complexity. The issue of energy-
efficiency optimization for MIMO-NOMA systems with imperfect BS CSI over
Rayleigh fading channels is addressed in [52, 53], subject to constraints on the
total power budget.
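The inter-beam interference suppression achieved by ZF precoding can be seen in a small numerical sketch. This is a generic two-user, two-antenna illustration with an assumed channel matrix, not the precoder of [51]: with an invertible channel H, the ZF precoder W = H⁻¹ makes the effective channel HW diagonal, so each beam reaches its user with no interference from the other beam.

```python
# Assumed 2x2 complex channel matrix (rows: users, columns: BS antennas).
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]

# ZF precoder for a square invertible channel: W = H^-1 (2x2 adjugate formula).
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
W = [[H[1][1] / det, -H[0][1] / det],
     [-H[1][0] / det, H[0][0] / det]]

# Effective channel after precoding: HW should be the identity matrix,
# i.e. zero off-diagonal entries means zero inter-beam interference.
HW = [[sum(H[i][k] * W[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

print("off-diagonal magnitudes:", abs(HW[0][1]), abs(HW[1][0]))
```

In practice W is additionally scaled to meet the transmit power budget, and ZF amplifies noise when H is ill-conditioned, which is why the dynamic power allocation mentioned above matters.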
7 Conclusion
Today’s wireless networks make use of the Non-orthogonal multiple access (NOMA)
technique in order to distribute radio resources equally among its customers devices.
Because of the growth in multiple users, it is more similar to OMA-based tech-
niques that will fall short of more stringent requirements such as high spectral
efficiency, ultra-low latency and broad connectivity. Successive improve spectral
efficiency while yet permitting for some multiple-access interference at the receiver
end, the idea of non-orthogonal multiple access (NOMA) has evolved. Our goal is to
write this instructional-style essay to provide a NOMA downlink model and to offer
extensions to MIMO and cooperative communication scenarios.
References
1. Lin Z, Lin M, Zhu W-P, Wang J-B, Cheng J (2020) Robust secure beamforming for wireless
powered cognitive satellite-terrestrial networks. IEEE Trans Cogn Commun Netw
2. Lu Y, Koivisto M, Talvitie J, Valkama M, Lohan ES (2020) Positioning-aided 3D beamforming
for enhanced communications in mmWave mobile networks. IEEE Access 8:55513–55525
3. Papageorgiou GK, Voulgaris K, Ntougias K, Ntaikos DK, Butt MM, Galiotto C, Marchetti
N et al (2020) Advanced dynamic spectrum 5G mobile networks employing licensed shared
access. IEEE Commun Mag 58(7):21–27
4. Naderializadeh N, Eisen M, Ribeiro A (2020) Wireless power control via counterfactual opti-
mization of graph neural networks. In: 2020 IEEE 21st international workshop on signal
processing advances in wireless communications (SPAWC). IEEE, pp 1–5
5. Gilan MS, Maham B (2020) Virtual MISO with joint device relaying and beamforming in 5G
networks. Phys Commun 39:101027
6. Choi J, Cho Y, Evans BL (2020) Quantized massive MIMO systems with multicell coordinated
beamforming and power control. IEEE Trans Commun
7. Kong J, Dagefu FT, Sadler BM (2020) Simultaneous beamforming and nullforming for covert
wireless communications. In: 2020 IEEE 91st vehicular technology conference (VTC2020-
Spring). IEEE, pp 1–6
8. Liu Y, Li J, Wang H (2019) Robust linear beamforming in wireless sensor networks. IEEE
Trans Commun 67(6):4450–4463
9. Wu Q, Zhang R (2019) Intelligent reflecting surface enhanced wireless network via joint active
and passive beamforming. IEEE Trans Wireless Commun 18(11):5394–5409
10. Huang H, Peng Y, Yang J, Xia W, Gui G (2019) Fast beamforming design via deep learning.
IEEE Trans Veh Technol 69(1):1065–1069
11. Ioushua SS, Eldar YC (2019) A family of hybrid analog–digital beamforming methods for
massive MIMO systems. IEEE Trans Signal Process 67(12):3243–3257
12. Zhu L, Zhang J, Xiao Z, Cao X, Xia X-G, Schober R (2020) Millimeter-wave full-duplex
UAV relay: Joint positioning, beamforming, and power control. IEEE J Sel Areas Commun
38(9):2057–2073
13. Peken T, Tandon R, Bose T (2020) Unsupervised mmWave beamforming via autoencoders. In:
ICC 2020–2020 IEEE international conference on communications (ICC). IEEE, pp 1–6
14. AlAmmouri A, Gupta M, Baccelli F, Andrews JG (2020) Escaping the densification plateau in
cellular networks through mmWave beamforming. IEEE Wirel Commun Lett 9(11):1874–1878
15. Zheng Y, Bi S, Zhang Y-JA, Lin X, Wang H (2020) Joint beamforming and power control for
throughput maximization in IRS-assisted MISO WPCNs. IEEE Internet of Things J
16. Zhao C, Cai Y, Liu A, Zhao M, Hanzo L (2020) Mobile edge computing meets mmWave
communications: Joint beamforming and resource allocation for system delay minimization.
IEEE Trans Wireless Commun 19(4):2382–2396
17. Li X, Zhu G, Gong Y, Huang K (2019) Wirelessly powered data aggregation for IoT via over-
the-air function computation: Beamforming and power control. IEEE Trans Wirel Commun
18(7):3437–3452
18. Zhu L, Zhang J, Xiao Z, Cao X, Wu DO, Xia X-G (2019) Joint Tx-Rx beamforming and
power allocation for 5G millimeter-wave non-orthogonal multiple access networks. IEEE Trans
Commun 67(7):5114–5125
19. Chen W-Y, Chen B-S, Chen W-T (2020) Multiobjective beamforming power control for robust
SINR target tracking and power efficiency in multicell MU-MIMO wireless system. IEEE
Trans Veh Technol 69(6):6200–6214
20. Mei W, Qingqing W, Zhang R (2019) Cellular-connected UAV: Uplink association, power
control and interference coordination. IEEE Trans Wirel Commun 18(11):5380–5393
21. Liang F, Shen C, Wei Y, Feng W (2019) Towards optimal power control via ensembling deep
neural networks. IEEE Trans Commun 68(3):1760–1776
22. Chen Y, Wen M, Wang L, Liu W, Hanzo L (2020) SINR-outage minimization of robust beam-
forming for the non-orthogonal wireless downlink. IEEE Trans Commun 68(11):7247–7257
23. Liu X, Liu Y, Chen Y (2020) Machine learning empowered trajectory and passive beamforming
design in UAV-RIS wireless networks. IEEE J Selected Areas Commun
24. Ashok K, Sudha T (2020) Uninterrupted connectivity using conditional time split energy
extraction with beamforming system for disaster affected wireless networks. IEEE Access
8:194912–194924
25. Al-Obiedollah HM, Cumanan K, Thiyagalingam J, Tang J, Burr AG, Ding Z, Dobre OA (2020)
Spectral-energy efficiency trade-off-based beamforming design for MISO non-orthogonal
multiple access systems. IEEE Trans Wirel Commun 19(10):6593–6606
26. Zhao M-M, Wu Q, Zhao M-J, Zhang R (2020) Intelligent reflecting surface enhanced wireless
network: two-timescale beamforming optimization. IEEE Trans Wirel Commun
27. Ehlers B, Gupta AS, Learned R (2020) A MUD-enhanced multi-beam approach for increasing
throughput of dense wireless networks. IEEE Sens J
28. Newell G, Vejarano G (2020) Motion-based routing and transmission power control in wireless
body area networks. IEEE Open J Commun Soc 1:444–461
29. Zhu L, Zhang J, Xiao Z, Cao X, Wu DO, Xia X-G (2018) Joint power control and beamforming
for uplink non-orthogonal multiple access in 5G millimeter-wave communications. IEEE Trans
Wirel Commun 17(9):6177–6189
30. Ji J, Zhu K, Niyato D, Wang R (2020) Joint cache placement, flight trajectory, and transmission
power optimization for multi-UAV assisted wireless networks. IEEE Trans Wirel Commun
19(8):5389–5403
31. Zhang H, Yang N, Huangfu W, Long K, Leung VCM (2020) Power control based on deep
reinforcement learning for spectrum sharing. IEEE Trans Wirel Commun 19(6):4209–4219
32. Lin J, Zhao R, Li Q, Shao H, Wang W-Q (2017) Joint base station activation, user admission
control and beamforming in downlink green networks. Digital Signal Process 68:182–191
33. Shen C, Chang T-H, Gong J, Zeng Y, Zhang R (2020) Multi-UAV interference coordination
via joint trajectory and power control. IEEE Trans Signal Process 68:843–858
34. Li X, Li C, Jin S, Gao X (2018) Interference coordination for 3-D beamforming-based HetNet
exploiting statistical channel-state information. IEEE Trans Wirel Commun 17(10):6887–6900
35. Li X, Liu Z, Qin N, Jin S (2020) FFR based joint 3D beamforming interference coordination for
multi-cell FD-MIMO downlink transmission systems. IEEE Trans Veh Technol 69(3):3105–
3118
36. Wang J, Weitzen J, Bayat O, Sevindik V, Li M (2019) Interference coordination for millimeter
wave communications in 5G networks for performance optimization. EURASIP J Wirel
Commun Netw 2019(1):1–16
37. Mismar FB, Evans BL, Alkhateeb A (2019) Deep reinforcement learning for 5G networks: Joint
beamforming, power control, and interference coordination. IEEE Trans Commun 68(3):1581–
1592
38. Kaliszan M, Pollakis E, Stańczak S (2012) Multigroup multicast with application-layer coding:
beamforming for maximum weighted sum rate. In: Proceedings of the 2012 IEEE wireless
communications and networking conference, WCNC 2012, France, pp 2270–2275. (Apr 2012)
39. Kimy B, Lim S, Kim H et al (2013) Non-orthogonal multiple access in a downlink multiuser
beamforming system. In: Proceedings of the 2013 IEEE military communications conference,
MILCOM 2013. San Diego, Calif, USA, pp 1278–1283. (Nov 2013)
40. Choi J (2015) Minimum power multicast beamforming with superposition coding for multires-
olution broadcast and application to NOMA systems. IEEE Trans Commun 63(3):791–800
41. Ali MS, Hossain E, Kim DI (2017) Non-orthogonal multiple access (NOMA) for downlink
multiuser MIMO systems: user clustering, beamforming, and power allocation. IEEE Access
5:565–577
42. Sun X, Duran-Herrmann D, Zhong Z, Yang Y (2015) Non-orthogonal multiple access with
weighted sum-rate optimization for downlink broadcast channel. In: Proceedings of the 34th
annual IEEE military communications conference, MILCOM 2015. Tampa, Fla, USA, pp
1176–1181. (Oct 2015)
43. Choi J (2016) On the power allocation for MIMO-NOMA systems with layered transmissions.
IEEE Trans Wirel Commun 15(5):3226–3237
44. Chen C, Cai W, Cheng X, Yang L, Jin Y (2017) Low complexity beamforming and user selection
schemes for 5G MIMO-NOMA systems. IEEE J Sel Areas Commun 35(12):2708–2722
45. Shin W, Vaezi M, Lee B, Love DJ, Lee J, Poor HV (2017) Coordinated beamforming for
multi-cell MIMO-NOMA. IEEE Commun Lett 21(1):84–87
46. Ding Z, Schober R, Poor HV (2016) On the design of MIMO-NOMA downlink and uplink
transmission. In: Proceedings of the 2016 IEEE international conference on communications,
ICC 2016, Kuala Lumpur, Malaysia, May 2016
47. Cui J, Ding Z, Fan P (2017) Power minimization strategies in downlink MIMO-NOMA systems.
In: Proceedings of the 2017 IEEE international conference on communications, ICC 2017,
Paris, France, May 2017
48. Nguyen V-D, Tuan HD, Duong TQ, Poor HV, Shin O-S (2017) Precoder design for signal
superposition in MIMO-NOMA multicell networks. IEEE J Sel Areas Commun 35(12):2681–
2695
49. Liu L, Yuen C, Guan YL, Li Y, Huang C (2016) Gaussian message passing iterative detection
for MIMO-NOMA systems with massive access. In: Proceedings of the 59th IEEE global
communications conference, GLOBECOM 2016, Washington, DC, USA, Dec 2016
50. Liu L, Yuen C, Guan YL, Li Y (2016) Capacity-achieving iterative LMMSE detection
for MIMO-NOMA systems. In: Proceedings of the 2016 IEEE international conference on
communications, ICC 2016, Kuala Lumpur, Malaysia, May 2016
51. Wang B, Dai L, Wang Z, Ge N, Zhou S (2017) Spectrum and energy-efficient beamspace
MIMO-NOMA for millimeter-wave communications using lens antenna array. IEEE J Sel
Areas Commun 35(10):2370–2382
52. Sun Q, Han S, Chin-Lin I, Pan Z (2015) Energy efficiency optimization for fading MIMO non-
orthogonal multiple access systems. In: Proceedings of the IEEE international conference on
communications, ICC 2015, pp 2668–2673, London, UK, June 2015
53. Wu P, Jie Z, Su X, Gao H, Lv T (2017) On energy efficiency optimization in downlink
MIMO-NOMA. In: Proceedings of the 2017 IEEE international conference on communications
workshops, ICC workshops 2017. France, pp 399–404. (May 2017)
3D Reconstruction Methods
from Multi-aspect TomoSAR Method:
A Survey
1 Introduction
1.1 TomoSAR
The SAR system uses a radar sensor mounted on a satellite to synthesize an
antenna several kilometres long. As the sensor moves along the path of the
satellite, it accurately and continuously acquires information about a particular area.
The captured image of the area is then reconstructed by digital processing
technology. The outcome of the process is a 2D high-resolution map of the imaged
scene. A key characteristic of the process is that the microwaves used by SAR
(2D) penetrate media such as snow, cloud and rain. The TomoSAR (3D) system,
derived from SAR (2D), uses a radar that flies along multiple trajectories or paths
[1, 2]. Because the TomoSAR (3D) system observes from multiple paths, it measures the
distance to the target from each of them. For SAR (2D) localisation along
a straight line, the radar only measures the distance from the target to each point of the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 495
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_39
496 N. Akhtar et al.
particular line, whereas TomoSAR (3D), operating along multiple lines, measures
the distance from the target to multiple lines [3, 4]. The resolution of TomoSAR
in the slant-range direction is determined by the bandwidth of the pulse; the
azimuth resolution is set by the length of the synthetic aperture, and the cross-range
(elevation) resolution by the baseline aperture [5, 6]. TomoSAR algorithms are mainly
classified into back-projection [7, 8], compressive sensing [9] and spectral
estimation [10–12].
TomoSAR focusing rests on a single principle: the received signal is related to the
complex reflectivity profile of the scene along the cross-range coordinate by a Fourier
transform [13, 14] (Fig. 1).
Observing the side view of the target building using the TomoSAR principle, the red
line marks the visible area and the blue dots are the scatterers [15]. These scatterers
represent the structure of the building; to reconstruct it, the blue points are mapped
back onto the red line to improve visibility. This process in turn eliminates
fake targets [16, 17] (Fig. 2).
TomoSAR has many advantages. One is that TomoSAR (3D) provides more
accurate information than was available using SAR (2D) alone [1]. Another is
that, for a given area, it acquires a stack of images by flying multiple
trajectories or paths [18]. It can also work in all weather conditions [19].
Alongside these advantages, TomoSAR (3D) has some specific drawbacks:
the quality of the TomoSAR image is deteriorated by the noise and fake targets present
in it [20]. Multiple methods are required to remove these unwanted factors and to
Fig. 2 Geometry of a
TomoSAR (3D) building [17]
reconstruct the 3D TomoSAR point cloud. Acquiring data by flying multiple paths
and extracting the data points of a particular area also takes a lot of time [21].
In this manuscript, different approaches to the construction and reconstruction
of TomoSAR images are described thoroughly, giving researchers the advantage
and flexibility of choosing among different approaches easily and in a very efficient
manner.
2 Methodology
Because noise is present in the tomographic image, false targets are scattered in a
disordered manner. To detect the outline, the Hough transform is applied: its purpose
is to find imperfect instances of an object of a given shape class by a voting procedure.
The outline is composed of several straight lines connected into segments, and the
transform is widely used in pattern recognition. In this algorithm, a voting process is
held in which each data point belonging to the pattern votes for the possible patterns
passing through that point. The votes are stored in the cells of an accumulator array,
known as bins. The pattern that receives the maximum number of votes is the desired
pattern [22].
In an N × N binary edge image, the equation of a straight line in normal form is
ρ = x cos θ + y sin θ (the red dashed line in Fig. 3). For each parameter cell (ρ, θ),
the algorithm calculates the parameter value and accumulates all the pixels that lie
on the line (ρ, θ). A cell (ρ, θ) with enough votes corresponds to a straight line
in the x–y coordinates; cells that are not so supported are treated as noise. The
detected lines may be broken owing to the noise present or to the density of some
point clouds; thus, some of the broken lines belong to the same outline segment
although their line parameters differ slightly. The K-means clustering method
groups the detected lines into clusters, using distance-based parameters for the
clustering of the detected lines.
The computational parts of the Hough transform are as follows:
1. Calculation of the parameter values and storage of the edge pixels in the
parameter space.
2. Finding all local maxima, each of which represents a line segment.
3. Extraction of the line segments using the positions of the maxima.
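These three steps can be sketched for line detection in a small binary edge image. This is a generic illustration with assumed image contents, not the paper's TomoSAR data: edge pixels on the diagonal y = x vote in a (ρ, θ) accumulator, and the bin with the most votes recovers the line (for y = x, the normal form gives θ = 135° and ρ = 0).

```python
import math
import random
from collections import Counter

random.seed(2)

N = 50
# Assumed test image: a diagonal line y = x plus a few random noise pixels.
edge_pixels = [(i, i) for i in range(N)]
edge_pixels += [(random.randrange(N), random.randrange(N)) for _ in range(10)]

# Step 1: vote in the (rho, theta) parameter space, theta in whole degrees.
acc = Counter()
for x, y in edge_pixels:
    for theta in range(180):
        t = math.radians(theta)
        rho = x * math.cos(t) + y * math.sin(t)
        acc[(int(round(rho)), theta)] += 1

# Step 2: the global maximum of the accumulator marks the strongest line.
(best_rho, best_theta), best_votes = acc.most_common(1)[0]

# Step 3: pixels whose vote fell in the winning bin form the extracted segment.
print(f"line: rho={best_rho}, theta={best_theta} deg, votes={best_votes}")
```

The winning bin collects a vote from all 50 diagonal pixels, while noise pixels spread their votes thinly across the parameter space; broken segments with nearly equal (ρ, θ) can then be merged by K-means as described above.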
In this method, three steps are performed to extract large datasets from the
TomoSAR point cloud [23]: detection of facades and their
extraction, segmentation, and reconstruction (Fig. 4).
In facade detection, an existing model such as the DTM (digital terrain
model) is used for detection, together with a filter, and the 2D point
density in the horizontal x–y ground plane is used for extraction. In segmentation, the
reconstruction of each individual facade is required, so the point cloud belonging to
the same facade can be used; an unsupervised clustering technique is sometimes
applied. Lastly, for reconstruction, a facade is normally described by flat surfaces,
curved surfaces, and the edges or boundaries of facades and their vertices [24]. In
place of the Hough transform and facade reconstruction, some other popular
techniques, described below, can be used.
MinPts ≥ D + 1, where D is the dimension of the dataset; here the minimum
number of points is set to 3.
DBSCAN is used in tomographic reconstruction. First the TomoSAR point
cloud is generated, and then it is input to the DBSCAN module (Fig. 7).
In the DBSCAN module, density detection is used to separate high-density
clusters from low-density ones by unsupervised clustering. This
process separates the data points into several groups: points with similar properties
fall into the same group and points with different properties into different groups.
After that, unwanted factors such as noise and fake targets are removed, so the
extraction of the targeted point cloud is achieved [25].
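The noise-removal step can be sketched with a minimal brute-force DBSCAN on a 2D point set. This is an illustrative implementation with assumed toy data, not the cited TomoSAR pipeline [25]: dense points are grouped into one cluster, while isolated points (standing in here for noise and fake targets) are labelled −1 and discarded.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)          # None = not yet visited

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:           # not a core point
            labels[i] = -1                 # provisionally noise
            continue
        labels[i] = cluster                # start a new cluster at this core point
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours(j)) >= min_pts:   # core point: keep expanding
                queue.extend(neighbours(j))
    return labels

# Assumed toy scene: a 5x5 grid of dense points plus two isolated fake targets.
dense = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
fakes = [(5.0, 5.0), (-4.0, 7.0)]
labels = dbscan(dense + fakes, eps=0.15, min_pts=3)
print("labels of the fake targets:", labels[-2:])
```

Keeping only points with label ≥ 0 yields the cleaned point cloud that is passed on to facade reconstruction.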
4 Conclusion
References
16. Liang L, Li X, Ferro-Famil L, Guo H, Zhang L et al (2018) Urban area tomography using a sparse
representation based two-dimensional spectral analysis technique. Remote Sens 10(2):109
17. Liu H, Pang L, Li F, Guo Z (2019) Hough transform and clustering for a 3-D building
reconstruction with tomographic SAR point clouds. Sensors 19:5378
18. Frey O, Magnard C, Ruegg M, Meier E (2009) Focusing of airborne synthetic aperture radar
data from highly nonlinear flight tracks. IEEE Trans Geosci Remote Sens 47(6):1844–1858
19. Meng M, Zhang J, Wong YD, Au PH (2016) Effect of weather conditions and weather forecast
on cycling travel behavior in Singapore. Int J Sustain Transp 10(9):773–780
20. Budillon A, Crosetto M, Johnsy AC, Monserrat O, Krishnakumar V, Schirinzi G (2018)
Comparison of persistent scatterer Interferometry and SAR tomography using sentinel-1 in
urban environment. Remote Sens 10:1986
21. Gini F, Lombardini F, Montanari M (2002) Layover solution in multibaseline SAR interfer-
ometry. Aerospace and electronic systems. IEEE Trans Aerosp Electron Syst 38:1344–1356
22. Basca CA, Talos M, Brad R (2005) Randomized Hough transform for ellipse detection with
result clustering. In: EUROCON 2005-The international conference on “computer as a tool”,
pp 1397–1400
23. Wang Y, Zhu X, Shi Y, Bamler R (2012) Operational TomoSAR processing using multi-
track TerraSAR-X high resolution spotlight data stacks. In: Proceedings of the IEEE IGARSS,
Munich, Germany
24. Zhu XX, Shahzad M (2014) Facade reconstruction using multiview spaceborne TomoSAR
point clouds. IEEE Trans Geosci Remote Sens 52(6):3541–3552
25. Guo Z, Liu H, Pang L, Fang L, Dou W (2021) DBSCAN-based point cloud extraction for tomo-
graphic synthetic aperture radar (TomoSAR) three-dimensional (3D) building reconstruction.
Int J Remote Sens 42(6):2327–2349
26. Bohn FJ, Huth A (2017) The importance of forest structure to biodiversity–productivity
relationships. R Soc Open Sci 4:160521
27. Dănescu A, Albrecht AT, Bauhus J (2016) Structural diversity promotes productivity of mixed, uneven-aged forests in southwestern Germany. Oecologia 182:319–333
28. Toraño Caicoya A, Pardini M, Hajnsek I, Papathanassiou K (2015) Forest above-ground biomass estimation from vertical reflectivity profiles at L-Band. IEEE Geosci Remote Sens Lett 12(12):2379–2383
29. Ho Tong Minh D, Ndikumana E, Vieilledent G, McKey D, Baghdadi N (2018) Potential value
of combining ALOS PALSAR and Landsat-derived tree cover data for forest biomass retrieval
in Madagascar. Remote Sens Environ 213:206–214
30. Le Toan T, Beaudoin A, Riom J, Guyoni D (1992) Relating forest biomass to SAR data. IEEE Trans Geosci Remote Sens 30:403–411
Security and Privacy in IoMT-Based
Digital Health care: A Survey
Ashish Singh, Riya Sinha, Komal, Adyasha Satpathy, and Kannu Priya
1 Introduction
A few decades back, there was nothing to look at or detect inside the human body
because of a lack of knowledge and technology. In many cases, no one knew the
cause of death of many people and the cause of the disease. People were not familiar
with their bodies or which condition was inherited in their bodies. They also did not
know how to overcome from the disease. But now, the scenario is different. IoMT
changes the medical system. IoMT refers to the interconnection of medical devices
architecture with technology. Medical sensors and wearable devices together make
the IoMT. It provides better communication, remote medical assistance, management
of proper medicines, tracking patients’ life cycles, and many more things. The role of
IoMT in human’s life is people use this approach to detect different things inside the
body, such as level of glucose, pulse rate, proper circulation of blood, and many more
in daily life. With the help of a smart system in health care, doctors are successfully
completing critical operations and saving many individuals’ lives. IoMT also helps
people to know and analyze their bodies. After analyzing the body, it suggests suitable
yoga and exercises which keeps them fit and healthy.
In today's scenario, about one-third of IoT devices are engaged in health organizations,
and this share is expected to grow further by the year 2025 [24]. The technology of
IoMT is evolving day by day: its efficiency is growing while its cost is decreasing,
and these outcomes are far better than in the past. Data collection, transmission, and
analysis of the system's raw facts and figures are speedy using IoMT tools. People
can pair their devices with smartphone applications, which lets the system keep track
of the particular parameter of interest.
This survey covers different IoMT aspects, from basic to advanced, in terms of
technology and advancements. We also focus on the security system architecture,
including the device, fog, and cloud layers.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 505
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_40
Then we discussed the different communication protocols running on the different
IoMT protocol layers: link layer, network layer, transport layer, and application
layer protocols. We also
discussed the requirement of security in IoMT. This survey work also covers the
types of malware and mitigation techniques; the mitigation techniques include IDS
approaches such as anomaly-based, misuse-based, and specification-based detection.
Malware detection through blockchain is also discussed in this paper. Security attacks,
including eavesdropping, tag cloning, and sensor tracking, are analyzed, and different
security countermeasures are explained. Applications of IoMT include fitness tracking
and diagnostics, smart pills, virtual home, real-time patient monitoring, and personal
emergency response systems. At the end, some open issues and challenges are
identified in this work.
The first step of the survey was to define the research questions covering the different
types of security attacks, security countermeasures, and applications of IoMT. The
selection of accurate and concise research articles is critical in forming any research
project. The research topics were addressed using a "search keyword" methodology:
Springer, ScienceDirect, IEEE, Elsevier, and other academic research databases were
queried for the search phrases. These are typical databases that cover a wide range of
useful topics and facts, which is why we chose them. We provided the search engines
with precise search phrases; however, the results required some filtration. The first
criterion was that the language be English, and the second was to eliminate brief
publications that do not adequately explain the study. We also sought to stay away
from old research publications and to focus mainly on new approaches. After
retrieving all the required articles, we double-checked the list of selected works. This
search procedure ensures that no crucial and relevant works are overlooked during
the keyword search. Following the discovery of relevant works, the next step was to
categorize them using various criteria, including security requirements, privacy, and
security aspects. The classified papers were then used to develop the sections of this
paper.
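The selection procedure described above is essentially a sequence of filters. It can be sketched as follows, where the page and year thresholds (`min_pages`, `min_year`) are illustrative assumptions rather than values stated in this survey:

```python
# Sketch of the article-selection pass described above. The thresholds
# (min_pages, min_year) are illustrative assumptions, not values from
# the survey itself.

def select_articles(articles, min_pages=6, min_year=2015):
    """Keep English, sufficiently long, recent articles, deduplicated."""
    selected, seen_titles = [], set()
    for art in articles:
        if art["language"] != "English":   # criterion 1: English only
            continue
        if art["pages"] < min_pages:       # criterion 2: drop brief publications
            continue
        if art["year"] < min_year:         # criterion 3: avoid old research
            continue
        if art["title"] in seen_titles:    # double-check: no duplicate entries
            continue
        seen_titles.add(art["title"])
        selected.append(art)
    return selected

corpus = [
    {"title": "IoMT survey", "language": "English", "pages": 18, "year": 2019},
    {"title": "Short note", "language": "English", "pages": 3, "year": 2021},
    {"title": "IoMT survey", "language": "English", "pages": 18, "year": 2019},
]
print([a["title"] for a in select_articles(corpus)])  # -> ['IoMT survey']
```

The duplicate check mirrors the double-checking step: each title is admitted at most once.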
The following is a list of the study’s remaining sections. The existing works
related to IoMT are discussed in Sect. 2. The derive security system architectural
model is discussed in Sect. 3. Section 4 discusses the protocols utilized in this layered
system. Section 5 discusses the security requirements for IoMT. Types of malware
and mitigation techniques are discussed in Sect. 6. Security attacks and their analysis
are covered in Sect. 7. In Sect. 8, security countermeasures are discussed. In Sect. 9,
IoMT applications are presented, followed by challenges and open issues in Sect. 10.
Finally, in Sect. 11, conclusions are discussed.
2 Literature Survey
This section discusses several previous works that are helpful for understanding the
IoMT phenomenon from different aspects. A comparison of all the existing works is
given in Table 1.
Table 1 (continued)

– Alsubaei et al. [5] (2019). Aim: developing a web-based IoMT-SAF that helps in the selection of a solution that matches the stakeholder's security objectives and supports the decision-making process. Proposed approach: created a web-based IoMT Security Assessment Framework (IoMT-SAF) based on a novel ontological scenario-based approach for recommending security features in IoMT and assessing protection and deterrence in IoMT solutions. Advantages: the framework can be used by solution providers to analyze and authenticate the security of their products. Disadvantages: one of the most difficult aspects of IoMT-SAF is the length and complexity of defining security features.
– Maddikunta et al. [41] (2020). Aim: comparison of a DNN with other machine learning techniques using a standard intrusion detection dataset. Proposed approach: in the IoMT context, a DNN is employed to construct an effective and efficient IDS. Advantages: the detection accuracy of the model is good. Disadvantages: it is not suitable for the multi-class problem.
– Haseeb et al. [21] (2021). Aim: develop a machine-learning-based prediction model that predicts network resource usage and improves sensor data delivery. Proposed approach: an ML technique is used to categorize IoT nodes, and the SDN controller's configurable structure is employed for a centralized security system. Advantages: it provides an unsupervised machine learning approach for IoT networks that reduces communication overheads and forecasts resource usage. Disadvantages: limited scalability due to the use of a single controller.
Table 1 (continued)

– Ogundokun et al. [36] (2021). Aim: developing a CryptoStegno model to secure medical information in the IoMT environment. Proposed approach: an amalgamated approach employing Triple Data Encryption Standard (3DES) cryptographic techniques and the Matrix XOR steganography encoding technique was deployed to safeguard medical data on the IoMT platform. Advantages: user privacy, complete assurance, efficiency, and durability are all achieved using this hybrid technique. Disadvantages: works only on text data, not audio or video data.
– Doubla et al. [15] (2021). Aim: investigate the behaviors of a two-neuron non-autonomous tabu learning model. Proposed approach: a tabu learning two-neuron (TLTN) model with a composite hyperbolic tangent function made up of three hyperbolic tangent functions with varying offsets. Advantages: based on unpredictable sequences from the TLTN model, encryption of complicated data such as medical pictures is easy. Disadvantages: it does not extract meaningful data and uses a whole chaotic sequence for encryption, hence taking a larger duration of time.
– Almogren et al. [4] (2020). Aim: developed a Fuzzy-based Trust Management System (FTM) for reducing Sybil attacks in the FTM-IoMT. Proposed approach: an intelligent trust management method developed in two phases; the first phase outlines the mechanisms of processing and the second phase shows how the suggested mechanism works. Advantages: it determines the trust value of a node, and then trust traits such as integrity, receptivity, and responsiveness are assessed. Disadvantages: it has high server overhead and packet delivery delay time.
and estimate possible S&P hazards in the IoMT. Allouzi et al. [3] define a security
plan for the IoMT network. Any flaws or defects in the IoMT network that could
allow unauthorized users to gain access, and the threats that could exploit these flaws,
are also discussed. Using the Markov transition probability matrix, the probability
distribution of IoMT threats is derived. Priya et al. [41] proposed a Deep Neural
Network (DNN) framework to create an efficient IDS that categorizes and anticipates
unexpected cyberattacks in the IoMT environment. A detailed experimental comparison
of the DNN with other machine learning techniques is carried out using a standard
intrusion detection dataset. The Internet of Medical Sensor Data, the IDS, and the
Intruders are the three primary components of the developed framework.
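The Markov-chain step mentioned above can be made concrete with a toy example: given a transition probability matrix P over threat states, the long-run threat distribution is the stationary vector π satisfying πP = π, which power iteration approximates. The states and transition values below are invented for illustration and are not taken from Allouzi et al. [3]:

```python
# Toy transition matrix over three hypothetical IoMT threat states
# (each row sums to 1); the values are illustrative only.
states = ["eavesdropping", "tag cloning", "data tampering"]
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]

def step(pi, P):
    """One application of the transition matrix: pi' = pi @ P."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Power iteration from the uniform distribution; for an irreducible,
# aperiodic chain this converges to the stationary distribution.
pi = [1.0 / len(states)] * len(states)
for _ in range(200):
    pi = step(pi, P)

print({s: round(p, 3) for s, p in zip(states, pi)})
# -> {'eavesdropping': 0.375, 'tag cloning': 0.375, 'data tampering': 0.25}
```

The stationary vector can be checked by hand: π = (0.375, 0.375, 0.25) indeed satisfies πP = π for this matrix.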
3 Security System Architecture

[Figure: the derived security system architecture, comprising the device layer, fog layer and cloud layer, together with identification, authentication, security gateway, private network, messaging and control components.]
4 Communication Protocols in IoMT

In IoMT, many devices are linked together in a network, and the communication
between these physical objects takes place through protocols and standards. It is
therefore very important to use the correct protocols to make the communication
secure and reliable. IoMT protocols are used in the various network layers to facilitate
data exchange between devices, from devices to the cloud, and in other interactions.
This section discusses the protocols used in the various layers of IoMT; the
communication protocols running in the different layers are summarized in Table 2.
– Link Layer Protocols: This layer determines how the data is physically sent over a
medium. It uses the Z-Wave, Wi-Fi, BLE, ZigBee, and NFC protocols.
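The layered organization can be sketched as a simple lookup table. Only the link-layer entries come from the text above; the protocols listed for the network, transport, and application layers are common examples assumed for illustration and may differ from the paper's Table 2:

```python
# Sketch of an IoMT protocol stack as a lookup table. The link-layer
# entries come from the text; the other layers' entries are common
# examples assumed for illustration only.
IOMT_PROTOCOL_STACK = {
    "link":        ["Z-Wave", "Wi-Fi", "BLE", "ZigBee", "NFC"],
    "network":     ["6LoWPAN", "RPL"],        # assumed examples
    "transport":   ["TCP", "UDP"],            # assumed examples
    "application": ["MQTT", "CoAP", "HTTP"],  # assumed examples
}

def layers_for(protocol):
    """Return the layer(s) in which a given protocol appears."""
    return [layer for layer, protos in IOMT_PROTOCOL_STACK.items()
            if protocol in protos]

print(layers_for("BLE"))   # -> ['link']
print(layers_for("MQTT"))  # -> ['application']
```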
5 Security Requirements in IoMT

IoT networks enable various new services and business models for users and service
providers by increasing connectivity across all markets and sectors. Better connectivity
enables more accurate healthcare services, and faster workflows enhance operational
productivity for healthcare organizations [18]. A set of security requirements is needed
to assure the protection of sensitive IoMT data.
6 Types of Malware and Mitigation Techniques

Malware refers to any malicious software intended to harm or exploit any programmable
device, application, or network. Cybercriminals often utilize it to retrieve information
that they can exploit for financial advantage. The following types of malware are
discussed here [48].
1. Types of Malware:
– Spyware: Spyware is a type of malware that monitors user behavior without the
user's permission. Malicious actions such as keylogging, activity tracking, data
harvesting, and the monitoring of account passwords and financial data are examples
of spyware activity. It may also change the software's security settings. It takes
advantage of software flaws and attaches itself to programs running normally on
the computer.
– Keylogger: This is a malicious piece of code that allows a hacker to track the
user's keystrokes. A keylogger [42] attack is more effective than a brute-force
or dictionary-based attack. This dangerous program tries to gain access to a
user's device by convincing the user to download it, for example by clicking on
a link in an email. It is one of the most dangerous types of malware because even
a strong password is not enough to protect the system.
– Trojan Horse: This malware poses as a legitimate computer program to deceive
people into downloading and installing it, thereby enabling a hacker to gain remote
access to the infected system. Once a hacker has access to an infected system,
they can steal sensitive information. It can also install other malicious programs
on the system and carry out additional destructive acts.
– Virus: This harmful application can replicate itself and propagate to other computers.
It infects computers by attaching itself to other programs, and when a user runs
the legitimate code, the attached infected program also runs. It can be used to
steal data, damage the host system, and create botnets.
– Worm: A worm spreads across a network by exploiting flaws in the operating system.
It harms its host networks by consuming too much bandwidth and overwhelming
web servers, and it generally contains a payload designed to harm the host
system. Hackers frequently use worms to steal important information, erase files,
or build botnets. Worms self-replicate and spread independently, whereas viruses
require human intervention to spread; worms are often transmitted through
corrupted email attachments.
2. Mitigation Techniques: The first step in reducing risk is to recognize the potential
risks. This includes addressing the main risks regularly to guarantee that the system
is completely safeguarded.
(a) Intrusion Detection System: An IDS is a piece of software that monitors and
analyzes harmful activity within a network or system. It detects and protects a
variety of devices (such as smart medical equipment) against potential threats
and attacks [29]. In the IoMT context, the deployed IDS monitors and verifies
all traffic (both normal and malicious) and looks for harmful indicators; the
linked IDS component then takes the appropriate action on detecting any
harmful behavior.
An IDS technique can be classified into three types: anomaly-based detection,
misuse-based detection, and specification-based detection. The following is
a summary of these mechanisms.
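As a minimal sketch of the first mechanism, anomaly-based detection can learn a statistical baseline from attack-free traffic and flag large deviations. The single feature (packets per second), the z-score threshold `k`, and the sample values are illustrative assumptions, not part of the survey:

```python
import statistics

class AnomalyIDS:
    """Toy anomaly-based IDS: learn a baseline for one traffic feature,
    then flag observations more than `k` standard deviations away.
    (A misuse-based IDS would instead match known attack signatures.)"""

    def __init__(self, k=3.0):
        self.k = k
        self.mean = self.stdev = None

    def train(self, baseline):
        """Learn normal behavior from attack-free observations."""
        self.mean = statistics.mean(baseline)
        self.stdev = statistics.stdev(baseline)

    def is_anomalous(self, value):
        """Flag anything far from the learned baseline."""
        return abs(value - self.mean) > self.k * self.stdev

# Baseline: packets/second during normal operation (made-up values).
ids = AnomalyIDS(k=3.0)
ids.train([100, 104, 98, 101, 99, 103, 97, 102])
print(ids.is_anomalous(101))   # -> False (within normal range)
print(ids.is_anomalous(450))   # -> True  (possible flooding attack)
```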
10. Impersonation Attack: In this attack, a malicious person poses as a genuine party in
an authentication protocol to obtain access to resources or confidential material
that they are not allowed to access [37].
9 Applications of IoMT
The IoT medical field is rapidly evolving with new developments and applications.
Radical solutions are being deployed to address holistic healthcare concerns, ranging
from smart monitors to patient diagnostic devices. Increased accuracy, enhanced
efficiency, and lower costs are benefits of adopting IoMT into regular healthcare
procedures. Table 3 gives brief information about some of the key applications of
IoMT.
10 Challenges and Open Issues

This section discusses some of the open issues and challenges in the IoMT environment
that are still unsolved [28]:
– Security Concerns: IoMT devices rely on open wireless connections; thus, they are
vulnerable to a variety of wireless and network attacks. In fact, due to a lack of security
protections and security verification mechanisms, numerous IoMT devices are
readily circumvented by a trained intruder. An intruder can gain access to incoming
and outgoing data and information. As a result, security risks such as unauthorized
access can arise.
– Privacy Issues: Passive attacks such as traffic analysis raise privacy concerns. The
majority of these attacks result in the intrusion of patients' privacy through data
leakage, which leads to the exposure of sensitive data. Here, the attacker can
obtain and publish information about patients' identities as well as sensitive and
secret patient data. This might expose a person's medical problems, damage the
patient's image in the social environment, or pose a significant threat to patients.
– Trust Concerns: The trust of IoMT devices is another issue, because device breaches
may leak patients' personal sensitive information. Such breaches might also endanger
patients' lives and social image, because hackers gain access to their confidential
medical information.
– Accuracy Concerns: The accuracy of IoMT devices is another concern, caused
by device malfunction. One report states that more than 8061 malfunctions were
reported from 2001 to 2013. Such malfunctions lead to a lack of precision and
accuracy in medical robot-assisted surgeries, patient misdiagnosis, and incorrect
medical prescriptions.
– Standardization of IoT Devices: The absence of standardization of IoT devices
is a vital issue. Medical devices are incorporated into IoT systems, so a standard
communication protocol is needed that can communicate across different networks
and platforms. Standardization is necessary for numerous pieces of medical
equipment and devices to work together. It also requires manufacturers to implement
the appropriate security measures to safeguard devices from being attacked by
hackers.
11 Conclusion

This paper discussed an architectural model of IoMT in terms of security and privacy.
From the literature, we have identified that security and privacy are significant
problems that limit IoMT usage at the consumer level, so a discussion of the security
system architecture is essential. The work included the different communication
protocols based on the IoMT protocol stack. Security requirements, types of malware
and mitigation techniques, security attacks and their analysis, countermeasures, and
applications are further important points covered in this survey work. Based on the
discussed aspects, problems and open issues in the IoMT field were presented, which
will assist researchers and practitioners in developing new applications securely.
Apart from this, this article covers only a limited number of security solutions and
applications. In the future, application-specific security attacks in IoMT-based health
care, and their prevention, need to be discussed and elaborated.
References
1. Abdul-Ghani HA, Konstantas D (2019) A comprehensive study of security and privacy guide-
lines, threats, and countermeasures: an IoT perspective. J Sens Actuator Netw 8(2):22
2. Al-Kashoash HA, Kemp AH (2016) Comparison of 6LoWPAN and LPWAN for the internet of
things. Australian J Electr Electron Eng 13(4):268–274
3. Allouzi MA, Khan JI (2021) Identifying and modeling security threats for IoMT edge network
using Markov chain and common vulnerability scoring system (CVSS). arXiv:2104.11580
4. Almogren A, Mohiuddin I, Din IU, Almajed H, Guizani N (2020) FTM-IoMT: Fuzzy-based
trust management for preventing Sybil attacks in internet of medical things. IEEE Int Things J
8(6):4485–4497
5. Alsubaei F, Abuhussein A, Shandilya V, Shiva S (2019) IoMT-SAF: internet of medical things
security assessment framework. Int Things 8:100123
6. Alsubaei F, Abuhussein A, Shiva S (2017) Security and privacy in the internet of medical things:
taxonomy and risk assessment. In: 2017 IEEE 42nd conference on local computer networks
workshops (LCN Workshops), pp 112–120. https://doi.org/10.1109/LCN.Workshops.2017.72
7. Aslam B, Javed AR, Chakraborty C, Nebhen J, Raqib S, Rizwan M (2021) Blockchain and
ANFIS empowered IoMT application for privacy preserved contact tracing in the COVID-19
pandemic. Pers Ubiquitous Comput 1–17
8. Bharati S, Podder P, Mondal MRH, Paul PK (2021) Applications and challenges of cloud
integrated IoMT. In: Cognitive internet of medical things for smart healthcare. Springer, pp
67–85
9. Bibi N, Sikandar M, Ud Din I, Almogren A, Ali S (2020) IoMT-based automated detection
and classification of leukemia using deep learning. J Healthc Eng 2020
10. Bigini G, Freschi V, Lattanzi E (2020) A review on blockchain for the internet of medical
things: definitions, challenges, applications, and vision. Futur Int 12(12):208
11. Chen M, Ma Y, Song J, Lai CF, Hu B (2016) Smart clothing: connecting human with clouds
and big data for sustainable health monitoring. Mob Netw Appl 21(5):825–845
12. Das PK, Zhu F, Chen S, Luo C, Ranjan P, Xiong G (2019) Smart medical healthcare of internet
of medical things (IoMT): application of non-contact sensing. In: 2019 14th IEEE conference
on industrial electronics and applications (ICIEA). IEEE, pp 375–380
13. Dilawar N, Rizwan M, Ahmad F, Akram S (2019) Blockchain: securing internet of medical
things (IoMT). Int J Adv Comput Sci Appl 10(1):82–89
14. Ding ZH, Li JT, Feng B (2008) A taxonomy model of RFID security threats. In: 2008 11th
IEEE international conference on communication technology. IEEE, pp 765–768
15. Doubla IS, Njitacke ZT, Ekonde S, Tsafack N, Nkapkop J, Kengne J (2021) Multistability and
circuit implementation of tabu learning two-neuron model: application to secure biomedical
images in IoMT. Neural Comput Appl 1–29
16. Fuji R, Usuzaki S, Aburada K, Yamaba H, Katayama T, Park M, Shiratori N, Okazaki N (2019)
Blockchain-based malware detection method using shared signatures of suspected malware
files. In: International conference on network-based information systems. Springer, pp 305–
316
17. Gaddour O, Koubâa A (2012) RPL in a nutshell: a survey. Comput Netw 56(14):3163–3178
18. Ghubaish A, Salman T, Zolanvari M, Unal D, Al-Ali AK, Jain R (2020) Recent advances in
the internet of medical things (IoMT) systems security. IEEE Int Things J
19. Goffredo R, Accoto D, Guglielmelli E (2015) Swallowable smart pills for local drug delivery:
present status and future perspectives. Expert Rev Med Devices 12(5):585–599
20. Grym K, Niela-Vilén H, Ekholm E, Hamari L, Azimi I, Rahmani A, Liljeberg P, Löyttyniemi E,
Axelin A (2019) Feasibility of smart wristbands for continuous monitoring during pregnancy
and one month after birth. BMC Pregnancy Childbirth 19(1):1–9
21. Haseeb K, Ahmad I, Awan II, Lloret J, Bosch I (2021) A machine learning SDN-enabled big
data model for IoMT systems. Electronics 10(18):2228
45. Sun Y, Lo FPW, Lo B (2019) Security and privacy for the internet of medical things enabled
healthcare systems: a survey. IEEE Access 7:183339–183355
46. Usman M, Jan MA, He X, Chen J (2019) P2dca: a privacy-preserving-based data collection
and analysis framework for IoMT applications. IEEE J Sel Areas Commun 37(6):1222–1230
47. Vaiyapuri T, Binbusayyis A, Varadarajan V (2021) Security, privacy and trust in IoMT enabled
smart healthcare system: a systematic review of current and future trends. Int J Adv Comput
Sci Appl 12:731–737
48. Wazid M, Das AK, Rodrigues JJ, Shetty S, Park Y (2019) IoMT malware detection approaches:
analysis and research challenges. IEEE Access 7:182459–182476
5G Technology-Enabled IoT System
for Early Detection and Prevention
of Contagious Diseases
1 Introduction
The outbreak of the COVID-19 virus has conveyed a message that communities,
countries and civilizations evolve and transform due to disease. Faster means of
transport can quickly turn a disease into an epidemic and then into a pandemic.
Table 1 shows the global health pandemic timeline.
Table 1 clearly indicates that from time to time there has been an outbreak of a
virus, and this is the right time for society to get ready for the next outbreak.
An IoT-based system for the early detection and prevention of the spread of contagious
disease is the need of the hour. The proposed system employs 5G wireless technology
for communication, with cloud computation and storage. Figure 1 shows the
death toll due to various pandemics over a century, and Fig. 2 shows the evolution
of wireless technologies.
The Indian Government has already started implementing 5G networks. The bands
identified for 5G technology are 700 MHz, 3.5 GHz and 26/28 GHz. Table 2 gives
the year-wise details of the various wireless technologies.
The inherent advantages of a 5G technology-based IoT network over a 4G LTE-based
IoT network are shown in Fig. 3. The proposed IoT-based system is built on the latest
5G technology in order to take advantage of these inherent benefits and harness
higher data rates for better processing, as shown in Table 2 and Fig. 3.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 527
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_41
528 A. Saxena et al.
The rest of the paper is organized as follows: the literature review, problem identification
and the gap in existing technology are covered in Sect. 2; the proposed system
architecture is presented in Sect. 3, followed by implementation details in Sect. 4 and
a hardware description of the proposed work in Sect. 5. The results are discussed in
Sect. 6, and finally the conclusion and future work of the proposed work are given
in Sect. 7.
2 Related Work
U. Varshney, in his paper on health monitoring of disabled patients using wireless
technology, proposed a health monitoring system that uses wireless and mobile
networks. The proposed system operated autonomously without patient intervention,
which is generally not possible for patients suffering from one or more disabilities;
however, the system did not address the detection of disease [1]. V. Sharma et al.,
in their paper on low-energy health monitoring for patients based on the LEACH
protocol, proposed a health monitoring wireless device with good range and capability,
and improved the performance of the health monitoring network through the
Low Energy Adaptive Clustering Hierarchy (LEACH) protocol. The proposed system
lacked portability and easy implementation [2]. M. Baswa et al. in
their paper on e-health monitoring architecture proposed a health monitoring
architecture using GSM, based on communication devices like mobile phones and
wireless sensor networks, for real-time analysis of the patient's health condition.
The main focus of the paper was on developing a model that can facilitate doctors
through tele-monitoring. The device failed to address the health monitoring of a
large number of people; it was suitable for individuals who were at home or in the
hospital [3]. M. S. Uddin et al. in their paper on an IoT-based patient monitoring system
proposed a remote monitoring system which includes vehicle or asset monitoring,
kids/pets monitoring, fleet management, parking management, water and oil leakage
detection, energy grid monitoring, etc. They proposed an intelligent patient monitoring
system for automatically monitoring patients' health conditions through sensor-based
connected networks. However, the system had severe limitations in monitoring
patients suspected of contagious diseases [4]. A. Bhatti et al. in their paper
on an economical patient tele-monitoring system for remote areas proposed a novel,
rapid and cost-effective tele-monitoring architecture based on an Arduino hardware
system. Their prime goal was to design a prototype that could serve as a reliable
patient monitoring system, so that healthcare professionals can monitor in real time
patients who are either hospitalized in critical condition or unable to perform their
normal daily activities. The system was not designed for the early detection of
disease, nor did it have any feature to check the spread of disease once detected [5].
T. Erlina et al. in their paper on a patient smart health system proposed
a system that monitors the number of heartbeats and the respiratory rate, and detects
eyelid opening, using a pulse sensor, a thermistor and an Infrared Light Emitting
Diode (IR LED), respectively. Still, the system suffered severely from the lack of
continuous unattended monitoring and of alarm generation on detection of symptoms
of infection in monitored subjects [6]. Shahbaz Khan et al. in their paper on COVID-19
patient monitoring using a health band proposed a health band developed for
monitoring patients sent to quarantine or under medical treatment. The novel
COVID-19 virus created a pandemic in which large crowds of people were sent to
either isolation or quarantine centers; their health monitoring is a challenge for
today's medical teams as well as for the patients under observation. This health band
was developed to provide quality monitoring without spreading the virus among
patients and medical staff. However, the implemented system requires some necessary
changes in terms of the parameters monitored, response time and reliability [7].
Otoom M. et al.
in their paper on identification and monitoring of COVID-19 using IoT proposed
a system that collects real-time symptom data from users through an IoT framework
for early identification of suspected coronavirus cases. The system also monitors the
treatment and response of those who have already recovered from the virus; thus, it
tries to understand the nature of the virus by collecting and analyzing relevant data.
The proposed system suffered severely from the lack of continuous unattended
monitoring and alarm generation on detection of infection in monitored subjects [8].
3 Proposed System Architecture

The proposed system consists of four parts: the sensors, the data aggregator, the
application and the cloud server. The health of persons needs to be monitored, and
the deployed sensors should be able to detect any deviation from normal values and
send an alert message to responsible persons such as government authorities, doctors,
hospitals and family members. Several sensors can be deployed to measure and
monitor various physiological changes. The sensors can be deployed in jackets,
wristbands, watches, clothes, shoes, jewelry, handbags, etc. in order to monitor
parameters like heart rate, blood pressure, body temperature, blood oxygen level,
pulse rate, etc. Figure 4 depicts various possibilities for deploying the proposed system.
The number of sensors used can be changed depending on the parameters to be
sensed; the system is fully customizable. In the present paper, a pulse sensor; a
combined heart rate, SpO2 and temperature sensor; a heart ECG monitoring sensor;
and a PIR motion sensor are used.
The sensors continuously sense various physical parameters and send the data to the
Node MCU for aggregation, analysis and monitoring purposes.
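The aggregation step can be sketched as follows. On the actual Node MCU this logic would run as firmware, so the Python below is only a host-side illustration; the sensor names, smoothing window and sample values are assumptions, not details from the paper:

```python
import time
from collections import deque

class Aggregator:
    """Toy Node MCU-style aggregator: keep a short history per sensor
    and report the smoothed (rolling-average) value of each. The window
    size and field names are illustrative assumptions."""

    def __init__(self, window=3):
        self.history = {}
        self.window = window

    def add(self, sensor, value):
        """Record one reading; only the last `window` values are kept."""
        self.history.setdefault(sensor, deque(maxlen=self.window)).append(value)

    def snapshot(self):
        """Timestamped record of smoothed readings for analysis/monitoring."""
        record = {"timestamp": time.time()}
        for sensor, values in self.history.items():
            record[sensor] = sum(values) / len(values)
        return record

agg = Aggregator(window=3)
for bpm in (74, 78, 76):          # three pulse-sensor polls (made-up values)
    agg.add("pulse_bpm", bpm)
agg.add("body_temp_c", 36.9)
print(agg.snapshot()["pulse_bpm"])  # -> 76.0
```

Smoothing over a short window reduces single-sample sensor noise before the application layer inspects the values.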
3.3 Application
The application part of the proposed system continuously checks the aggregated data
for any unusual or abnormal activity, i.e., for acquired data that crosses the required
pre-set values.
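The threshold check described here can be sketched as a small rule table; the parameter names and "normal" ranges below are illustrative assumptions, not clinical values from the paper:

```python
# Illustrative pre-set ranges for the monitored parameters; the exact
# values are assumptions, not clinical thresholds from the paper.
NORMAL_RANGES = {
    "heart_rate_bpm": (60, 100),
    "spo2_percent":   (95, 100),
    "body_temp_c":    (36.1, 37.5),
}

def check_readings(readings):
    """Return the parameters whose value crosses its pre-set range."""
    alerts = []
    for name, value in readings.items():
        low, high = NORMAL_RANGES[name]
        if not (low <= value <= high):
            alerts.append(name)
    return alerts

sample = {"heart_rate_bpm": 118, "spo2_percent": 93, "body_temp_c": 36.8}
alerts = check_readings(sample)
if alerts:
    # In the real system this alert would go to the cloud server, which
    # forwards it to the hospital, relatives or a smartphone.
    print("ALERT:", ", ".join(alerts))
```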
3.4 Cloud Server

In the case of an abnormal reading, the analyzed data, along with an alert signal, is
sent to the cloud server, which forwards the alert, as decided by the user, to the
hospital, to relatives or to the user's own smartphone.
4 Implementation Details
The proposed system was implemented and a hardware prototype was prepared
for testing. The hardware details of the proposed system are shown in Fig. 5.
Figure 6 gives the complete hardware implementation of the proposed system and
its experimental setup.
5 Hardware Description
The different sensors used in designing the system prototype (shown in Fig. 6) are
listed below.
1. Node MCU ESP8266
2. Pulse Sensor (SKU-835048)
3. Heart rate, SPO2, Temperature sensor (SKU-845800)
4. Heart ECG Monitoring Sensor (AD8232)
5. PIR Motion Sensor.
Node MCU ESP8266 is the main controller used in this IoT application, as shown in Fig. 7a. Its high processing power, low operating voltage of 3.3 V, built-in Wi-Fi and Deep Sleep operating features make it ideal for the present application [9]. The pulse sensor used in the proposed circuit is the SKU-835048, shown in Fig. 7b. It is compatible with most microcontrollers, such as the Arduino and the Node MCU. Its output is digital, so it can be interfaced directly with the MCU; the sensor works on 5 V DC [10]. The heart rate, SPO2 and temperature sensor used in the proposed circuit is the SKU-845800, shown in Fig. 7c. It is likewise compatible with most microcontrollers, produces a digital output that can be interfaced directly with the MCU, and is compatible with both 3.3 and 5 V logic levels. This sensor has three LEDs (green, red and infrared) which, in combination with photodetectors, detect the amount of light reflected back to the sensor. Photoplethysmography (PPG) is the technique used to detect the patient's heartbeat: when the patient's fingertip is pressed against the sensor, the change in the color of the patient's skin with each beat of his/her heart is detected. Because the sensor measures the amount of light bounced back by particles, it can also be used to detect particles in the air, such as smoke [11]. The heart ECG monitoring sensor used in the proposed circuit is the ECG module AD8232, shown in Fig. 7d. It is compatible with most microcontrollers, but its output is analog and therefore cannot be interfaced directly with the MCU; it must be connected through an ADC. The sensor works on 5 V DC and is designed to extract, amplify and filter bioelectric signals in the 0.1–10 mV range, even in noisy conditions such as those created by motion or remote electrode placement. It is a cost-effective board for measuring a patient's ECG. The body-movement sensor used in the proposed circuit is the SeeedStudio Grove Mini PIR Motion Sensor v1.0, shown in Fig. 7e, which is ideal for the present application. It is compatible with most microcontrollers, its output is digital and can be interfaced directly with the MCU, and it works on 5 V DC [12]. PIR stands for Passive Infra-Red: the sensor measures infrared (IR) light radiating from objects in its field of view, and it can easily be used in various items with the proposed design. The sensor is compact, cost-effective, has low power consumption and offers adjustable sensitivity; a reserved pin-out on the back of the board can be soldered to a slide rheostat to adjust the sensitivity [13].
The features and specifications of the above components are given in the Appendix.
The proposed system prototype was implemented and evaluated for performance. The system works as per the theoretical predictions: with the help of the sensors, it is able to predict and send timely alerts whenever any of the sensed parameters indicates a chance of infectious disease. Table 3 gives the sensor outputs, the condition suspected, whether an alert message was sent and the response time of the system. Figure 8 shows the alert message on a smartphone.
This research found that the spread of any contagious disease may quickly turn into an epidemic, and then a pandemic, if not checked in time. Timely detection and control of the spread of infectious disease is therefore a much-needed area of research. This paper has proposed an IoT- and 5G-technology-based automatic system to mitigate the impact of contagious diseases such as COVID-19.
Table 3 Sensor status, response time and alert generation of proposed system

S.no  Pulse   Heart rate,   Heart ECG  PIR      Condition  Response   Alert
      sensor  SPO2, Temp.   sensor     motion   status     time (ms)  message
              sensor                   sensor
1.    BT      BT            BT         BT       OK         52         Not generated
2.    AT      BT            AT         BT       Alert      59         Generated
3.    AT      AT            AT         BT       Alert      64         Generated
4.    BT      BT            AT         AT       Alert      62         Generated
5.    BT      AT            AT         AT       Alert      68         Generated
6.    AT      BT            AT         AT       Alert      67         Generated
7.    AT      AT            BT         AT       Alert      67         Generated
8.    AT      AT            BT         BT       Alert      60         Generated
9.    BT      BT            BT         AT       Alert      57         Generated
10.   AT      AT            AT         AT       Alert      72         Generated

BT = Below Threshold, AT = Above Threshold
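The decision rule implied by Table 3 can be stated compactly: an alert is generated whenever any one of the four monitored sensors reads Above Threshold. A minimal sketch of that rule, with the argument order mirroring the table's columns:

```python
def system_status(pulse, hr_spo2_temp, ecg, pir):
    """Each argument is 'BT' (Below Threshold) or 'AT' (Above Threshold).
    Returns the (condition status, alert message) pair as in Table 3."""
    if "AT" in (pulse, hr_spo2_temp, ecg, pir):
        return ("Alert", "Generated")
    return ("OK", "Not generated")
```

This reproduces every row of the table: only the all-BT row yields OK with no alert.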
Fig. 8 App showing normal and abnormal parameters for an iOS user and an Android user
An experimental prototype was developed and tested; the results showed that the prototype achieved the desired accuracy of more than 90%, and its response time confirmed the theoretical results. Using the proposed design, end users will be equipped with an effective and accurate system to fight the spread of COVID-19 and other such contagious diseases. Employing the proposed system in day-to-day life could potentially reduce the impact of pandemics, as well as mortality rates, through early detection of cases. The proposed system will also provide the ability to follow up on recovered cases, and a better understanding of the disease. The system leverages the inherent properties of 5G and IoT to overcome the limitations posed by 4G/LTE technologies: the 5G-enabled IoT design ensures reduced data delay and increased reliability in terms of quality of service. It has been suggested that the system be deployed in various wearable apparel. This work has been studied extensively, comparing existing approaches, to obtain the best performance from the device. The new features of this design accomplish several objectives: measuring health symptoms, tracking and monitoring the patient during quarantine, and maintaining the data needed to predict the situation. As future work, owing to the present unavailability of the required data and of testing on real subjects, the system will be field-tested in hospitals and nursing homes and its performance established in real-time operation.
Acknowledgements The authors are thankful to Prof. Rohit Garg, Director MIT and the
Management of MITGI for their constant motivation and support.
Appendix
Table 5 Specifications of PIMORONI MAX30105 heart-rate, oximeter and temperature sensor (SKU-845800)

Sl.no  Parameter                 Value
1.     Operating voltage (VDC)   5
2.     Interface                 I2C
3.     I2C address               0x57
4.     Compatible with           All models of Raspberry Pi and Arduino
5.     Sensor length (mm)        19
6.     Sensor width (mm)         19
7.     Sensor height (mm)        3.2
8.     Sensor weight (gm)        10
9.     Sensor weight (kg)        0.015
10.    Sensor dimensions (cm)    5 × 5 × 1
Table 7 Specifications of body-movement sensor, i.e. SeeedStudio Grove Mini PIR Motion sensor

Sl.no  Parameter                    Value
1.     Input supply voltage (VDC)   3.3–5
2.     Working current              12–20 µA
3.     Sensitivity                  120–530 µV
4.     Max. detecting range         2 m
5.     Sensor length (mm)           24
6.     Sensor width (mm)            20
7.     Sensor height (mm)           12
8.     Sensor weight (gm)           8
9.     Sensor weight (kg)           0.012
10.    Sensor dimensions (cm)       6.8 × 4.3 × 1.2
References
1. Varshney U (2006) Managing wireless health monitoring for patients with disabilities. IT Professional 8(6):12–16. https://doi.org/10.1109/MITP.2006.139
2. Sharma V, Sharma S (2017) Low energy consumption based patient health monitoring by LEACH protocol. In: 2017 international conference on inventive systems and control (ICISC), pp 1–4. https://doi.org/10.1109/ICISC.2017.8068632
3. Baswa M, Karthik R, Natarajan PB, Jyothi K, Annapurna B (2017) Patient health management system using e-health monitoring architecture. In: 2017 international conference on intelligent sustainable systems (ICISS), pp 1120–1124. https://doi.org/10.1109/ISS1.2017.8389356
4. Uddin MS, Alam JB, Banu S (2017) Real time patient monitoring system based on Internet of Things. In: 2017 4th international conference on advances in electrical engineering (ICAEE), pp 516–521. https://doi.org/10.1109/ICAEE.2017.8255410
5. Bhatti A, Siyal AA, Mehdi A, Shah H, Kumar H, Bohyo MA (2018) Development of cost-effective tele-monitoring system for remote area patients. In: 2018 international conference on engineering and emerging technologies (ICEET), pp 1–7. https://doi.org/10.1109/ICEET1.2018.8338646
6. Erlina T, Saputra MR, Putri RE (2018) A smart health system: monitoring comatose patient's physiological conditions remotely. In: 2018 international conference on information technology systems and innovation (ICITSI), pp 465–469. https://doi.org/10.1109/ICITSI.2018.8696094
7. Khan S, Shinghal K, Saxena A, Pandey A (2020) Design and development of health band for monitoring of novel COVID-19 under medical observation. Int J Adv Eng Manag (IJAEM) 2(1):332–336
8. Otoom M, Otoum N, Alzubaidi MA, Etoom Y, Banihani R (2020) An IoT-based framework for early identification and monitoring of COVID-19 cases. Biomed Signal Process Control 62:102149. https://doi.org/10.1016/j.bspc.2020.102149
9. Datasheet of Node MCU ESP8266. https://www.espressif.com/sites/default/files/documentation/0a-esp8266ex_datasheet_en.pdf
10. Datasheet of Pulse Sensor SKU-835048. https://robu.in/wp-content/uploads/2020/10/Pulse-Sensor.pdf
11. Datasheet of Heart rate, SPO2, Temperature sensor (SKU-845800). https://datasheets.maximintegrated.com/en/ds/MAX30102.pdf
12. Datasheet of Heart ECG Monitoring Sensor (AD8232). https://www.analog.com/media/en/technical-documentation/data-sheets/ad8232.pdf
13. Datasheet of Grove Mini PIR Motion Sensor v1.0. https://www.mouser.com/datasheet/2/744/Seeed_101020020-1217525.pdf
A Brief Review of Current Smart
Electric Mobility Facilities and Their
Future Scope
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 541
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_42
542 D. Satya Sai Surya Varun et al.
These types of EVs are usually powered by both electricity and gasoline/petrol/diesel, and are driven mainly by an internal combustion engine (ICE) together with an electric motor. They are further classified as follows:
(a) Series Hybrid EV / EREVs / REEVs (Range-Extended)
These hybrid EVs are usually equipped with batteries similar to those in battery electric vehicles (BEVs). The ICE is utilized to drive a generator and to charge the battery. For high power requirements, the combined power of both the battery and the generator is used. Since petrol/diesel is used only indirectly, to drive the electric motor, these are range-extended EVs.
(b) Parallel Hybrid EV
These hybrid EVs are powered by both an ICE and an electric motor/generator. The varying power-distribution system allows both components to work simultaneously. Unlike series hybrid EVs, a separate generator is not required in this case.
(c) Parallel Mild Hybrid EV
The components of this type of EV are the same as those of a parallel hybrid EV, but its one disadvantage is that it cannot be driven purely on electric power. The motor is turned on only when an extra boost is required, under extreme load. As these types cannot deploy either engine system individually, neither the ICE nor the electric motor alone, they are termed "mild hybrids".
(d) Parallel Split Hybrid EV/Through-the-Road (TTR) HEV
This type of HEV is usually equipped with both an ICE and an electric motor (an in-wheel motor (IWM)), just like the EVs mentioned above. The electric motor, however, in this case is capable of providing propulsion power to a different axle [8]. This type of HEV has no mechanical system coupling the two drives; rather, their combined power moves the wheels through the road itself. These EVs are equipped with power-split devices that allow the driver/customer to opt for either mechanical or electrical driving. Such HEVs are capable of zero-emission driving, generally for 20–30 miles.
(e) Series–Parallel Hybrid EV
This type of HEV can be driven on petrol/diesel, in complete reliance on the electric motors, or with the help of both components for optimum performance. While both can be utilized, the engine is given higher priority for performance and power input than the motor, as it is the main component driving the whole system; this also yields the maximum operating range.
(f) Micro HEVs
This kind of HEV is equipped with an integrated alternator/starter-type electric motor to start or stop the engine. The ICE system is utilized once the EV starts moving.
(g) Mild HEVs
This type of HEV is mostly similar to the micro HEV in terms of components, but its integrated alternator/starter is larger and more efficient than the micro HEV's. A battery is also fitted, which is utilized for propulsion only while the EV is in cruising mode.
As the name suggests, these EVs are equipped with a plug-in facility through which the battery powering the electric motor can be charged from grid-connected wall sockets.
These EVs are equipped with one or more electric motors for propulsion, together with high-capacity batteries that can be charged directly from grid systems. They do not use any form of gasoline. They include the following types:
(a) Battery Electric Vehicle (BEV)
Propulsion is provided by an electric motor, and the rest of the vehicle is powered by the energy-storage unit. These EVs are driven solely by batteries, and zero tailpipe emissions are claimed for them.
4 Topology of EVs
See Fig. 1.
The EV industry has been in constant progression for a decade and continues to develop to this day. Ever-growing research creates greater opportunities for implementing better replacement components. A study states that in 2020 the global EV stock hit the 10 million mark, a 43% increase over the previous year. As technology develops, new EV models and designs keep evolving, and battery efficiency keeps advancing. As such technologies emerge to meet the growing needs of the electric-vehicle industry, one trend to look out for is changing customer sentiment: as fuel rates keep skyrocketing, customers' demands and expectations of alternatives from the automobile industry keep increasing. Some more of the important trends are listed below (Figs. 2 and 3).
Fig. 3 Annual passenger-car and light-duty vehicle sales analysis (2010–19). (Image source: Electric vehicles (2020, July 28), Deloitte Insights, https://www2.deloitte.com/us/en/insights/focus/future-of-mobility/electric-vehicle-trends)
The components or elements used inside the vehicle, such as the dashboard and touchscreen, are futuristic design elements and a symbol of luxury and comfort. Customers expect the cruising journey in an EV to be comfortable, and rather better than what they have been experiencing in conventional ICE-based vehicles. Comfort and design play a crucial role in achieving better sales in the automotive industry. The utility-vehicle (UV) design keeps gaining popularity, being the most suitable design for middle-class customers. EV exterior design has become something of a competitive art form, and the aerodynamics of the exterior design plays a crucial role, especially for the manufactured exterior elements. Compared with conventional ICE vehicles, an EV has no engine occupying the front area, i.e. a separate crash-absorption system is uniquely designed for it. This trend gives greater scope for marketing in the automobile industry and is still under constant evolution (Figs. 4 and 5).
Fig. 4 Annual passenger-car and light-duty vehicle sales analysis (2010–20). (Image source: https://www.iea.org/commentaries/how-global-electric-car-sales-defied-covid-19-in-2020)
(b) Demand for Autonomous Facilities
Harmonized charging standards are very important, especially for cities that aim to achieve zero emissions. Development and research on ultra-fast charging facilities are booming in the industry [17]. V2G research with better equipment is also under development for the same purpose. The electrification efficiency of the battery affects the grid system, and hence smart charging facilities must be developed. Autonomous EVs have the potential to replace traditional ICE-type vehicles, and advanced charging and connectivity solutions would create better business opportunities for the industry to excel [18].
Fig. 5 EV sales review pre- and post-COVID-19 pandemic. (Image source: https://www.marketsandmarkets.com/Market-Reports/covid-19-impact-on-electric-vehicle-market-81970499.html)
cases [19]. However, it is difficult to find LCA statements for each type, as only very short reviews of each exist. EV LCA performance analyses and literature reviews are increasing constantly, this being an important subject of concern, especially for customers. Most studies, however, consider only the well-to-wheel performance of EVs while neglecting factors such as battery production. A brief comparison of specific types of EVs was conducted and framed over 79 study cases [19–22]. The well-to-wheel (WTW) study highlights the carbon-emission intensity and the amount of electrification that can be assessed for a specific vehicle type. The study states that a full EV emits roughly half the CO2 of a common conventional ICE-based vehicle [21].
Another study suggests that the average CO2 emission of an EV is over 25% less than that of a common ICE-based vehicle [20].
The prognosis of EVs' carbon-footprint studies also suggests that the life-cycle performance and efficiency of EVs will increase in the coming years. For better performance, the demand for better metals is increasing [23]: Tesla, for example, utilizes metals such as lithium, aluminium oxide, manganese, nickel and cobalt, and rare-earth elements (REEs) are used to manufacture electric motors for greater performance. The electrification of vehicles of larger size and weight (e.g. SUVs) has been constantly criticized, because they require larger battery sizes and storage capacities, which are still hard to achieve in a full-EV system. Yet the same study points out that batteries manufactured with REEs would be better and more efficient, capable of driving even large SUV-type vehicles. It is true that which LCA-influencing factors are taken into account can be decided by individual automobile manufacturers, depending on their own pre-set goals and aims; it is equally true that modern society relies on, believes in and trusts new technology and ever-expanding, scientifically efficient devices and mobility.
Conclusively, the expanding research efforts and studies on EVs keep generating better chances of decreasing carbon emissions in comparison with conventional fuel-based vehicles. Life-cycle analyses in previous research show that the carbon footprint of EVs is far lower, justifying the replacement of conventional ICE-based vehicles with EVs for good. With the increase in studies on generating and harvesting electricity from renewable energy sources, the hazardous climatic carbon effects are expected to diminish rapidly. Technological improvements not only in energy-harvesting systems but also in battery chemistry, efficient battery materials and battery storage capacity will contribute to the same goal of achieving a carbon-free environment.
(d) Demand for Price reductions
EVs are viewed as the ultimate solution to many types of problems. For that economic value, and for them to replace common ICE-based vehicles, the price should be such that middle-class people in different countries can afford to buy them.
Many strategies would help improve this affordability for common members of society. EVs are costly mainly because of their batteries. Batteries of different types have different lifetimes and energy-storage capabilities, and an owner has to worry about the battery's "health" (lifeline). Ideally, the battery should not contain extremely rare earth elements (EREEs) and should not consume too much electricity during manufacturing (depending on the capability of the individual manufacturing plant). With improving technology, EREEs are nevertheless being utilized for better quality. Cobalt-based batteries are cheap and affordable in comparison with other recent battery types such as lithium-titanate and lithium-iron-phosphate [24]. Falling battery prices would solve 25% of the price-demand issue. But what other factors could reduce the value of an EV's battery? Performance: the performance of the battery is essential and is something people/customers ask about before even deciding to buy a certain type of EV.
EV design optimization also plays a crucial role in reducing the price. In this case the focus is not the vehicle's exterior design but rather the compatibility design of its battery and other components. Having an LCB in a luxury V2X EV, it is hard to keep its state as such without increasing the height of the vehicle; if so, this type of vehicle would consume a lot of energy even if it were manufactured with an ICE-based engine. Its design could easily be compared to an SUV's, and hence a complex internal design has to be taken into account. There should be fewer compromises and higher flexibility in the design of these EVs. The electric-cable routing slots have to be pre-designed using computer software to avoid mistakes and to save space.
Battery manufacturing is estimated to account for almost 40–50% of the total vehicle cost. Investing in new, up-and-coming companies is therefore essential, to give new technologies with better ideas of replacement a chance to develop.
Electric vehicles are expected to become cost-effective with time, as better sources and materials for manufacturing the batteries are still under constant development and research. A study suggests that EV battery cost will fall by 77% over the 2016–2030 time frame [25]. The continued efforts of current researchers in the automotive industry are evidence of the same.
(e) Demand for better Wireless systems
"Wireless charging facilities" are an enormous topic of discussion, improvement and research. Optimizing such a facility is a necessity for getting better performance from the EV, and various studies on it are of great interest to automobile/EV researchers [26, 27].
Many factors affect the charging facilities; the major ones are charging time and charging location. This type of facility challenges the currently available grid systems, and the types of grid-facilitated charging systems have been discussed above. When it comes to charging location, charging stations (CSs) should be abundant in frequently travelled areas; in fact, current gas stations should be equipped with a CS facility for electric vehicles [3]. The wireless charging system is quite a new concept and is still subject to evolving government policies in different countries, technological development and manufacturing development. With progressive market competition, wireless EV charging facilities are developing rapidly.
Types of wireless power transfer (WPT) systems:
• Near-field WPT
• Inductive WPT systems
• Capacitive WPT systems.
Current trends in WPT:
• Reducing component sizes and increasing spacing
• Achieving high power transfer with high efficiency
• Achieving variable compensation
• Multi-stage matching-network systems in EVs
• Phased-array field focusing.
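For the inductive WPT systems listed above, a useful back-of-the-envelope figure is the textbook maximum coil-to-coil link efficiency, η_max = u² / (1 + √(1 + u²))² with u = k·√(Q1·Q2), where k is the coupling coefficient and Q1, Q2 are the coil quality factors. The sketch below evaluates this bound; the coil values used are illustrative assumptions, not figures from any cited study.

```python
import math

def max_link_efficiency(k, q1, q2):
    """Upper bound on inductive-link efficiency for coupling coefficient k
    and transmitter/receiver coil quality factors q1, q2."""
    u = k * math.sqrt(q1 * q2)
    return u * u / (1.0 + math.sqrt(1.0 + u * u)) ** 2

def resonant_frequency_hz(l_henry, c_farad):
    """Resonant frequency f0 = 1 / (2*pi*sqrt(L*C)) of a compensated coil."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))
```

For example, with k = 0.2 and Q1 = Q2 = 100 (plausible values for a resonant EV pad, assumed here), the bound is about 0.90, which is why increasing coil Q and coupling, rather than raw power, dominates the trends listed above.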
Irreversible climatic changes have been and are affecting the environment, as well as different types of flora and fauna, imperceptibly in the short term yet noticeably over years of observation. The greenhouse effect is real, and yet people do not seem to be concerned about it. Even if people agree to replace their fuel-type vehicle with a modern EV, they have their own budgets, which are hard to stretch to own a newly developing technology. People even take their own time, waiting "for the tech to develop" before buying, in order to obtain the most advanced technology possible. Emerging technologies are always financially inaccessible at the initial stages of development, and that is true of every one of them. Although EVs reduce carbon emissions, given individuals' financial limits in affording them, an immense and immediate reduction of CO2 and greenhouse-gas emissions is impossible to achieve even within 20 years. Even though the evolutionary development of the ultimate EV may take years, the resulting control of carbon emissions is still better than continuing with ICE-based vehicles. Accessing EVs is not only difficult; it may also become a financial burden for owners if any piece of equipment repeatedly needs to be replaced or changed. Equipment failure plays a crucial role in customers' interest in owning EVs: bad reviews can cause serious issues for an individual automobile company, and hence every piece of equipment needs a thorough life assessment, with a lifetime-warranty facility provided to customers.
Regions and countries such as Africa, India and Bangladesh, still suffering from immense poverty, are far from achieving full EV replacement even after a complete century. Yet people in the modernizing parts of such countries should get access to these technologies, to avoid further purchases of ICE-based vehicles. A study suggests that over 96% of people in India might not access such features even after five decades of development [28]. The automobile industry must ensure that such technologies are showcased in various places; the question of affordability depends upon the people and customers.
(g) Demand for complete Electric Facilities
With the increasing number of electric vehicles, electricity bills are widely expected to approach what is currently spent on the fuel available in the market. This sounds remote, but the reports and projections in futuristic EV research reviews suggest it is achievable. In a conventional ICE-based vehicle, electrical facilities are associated with many components; in the case of EVs, customers expect every component to be driven by electricity alone, which is not true of current EVs. EVs are manufactured in different models and with different ranges and internal components, so this kind of analysis is hard to carry out, because performance and maintenance are not the same even across individual EV types, as discussed above. Although most components could be driven purely by electricity, the remaining challenge is the battery: one with the capacity to store enough energy to deliver mileage as high as that of a conventional fuel vehicle. Hence the common hesitation of people asked to invest their money in such a "nearly emerging technology".
EVs account for a significant load on most countries' grids as of 2021. For the EV future to flourish, new technology and innovative ideas have to be taken up by every minor or major company working in the field [4]. This takes time, and the evolution of EVs is expected to accelerate, much as the usual car evolution did in the 1900s.
Renewable sources of energy are hard to extract or harvest, and it is challenging to get good efficiency from them as well. However, these types of energy-harvesting systems are the future power source for decades to come. As the environmental hazards we humans have created keep growing, repairing the damage becomes more difficult with time. With greater provision and availability of renewable energy resources, the utilization of grid systems is constantly decreasing; but for now it remains a fact that people have to rely on the grid to get fuel/energy for their EVs.
There have been a significant number of studies on improving the driving range of EVs and on accurate range prediction for the electric motors fitted in them. BEVs are given greater importance than any other type, as they are the test module for real-life implementation and further improvement. However, just as the EV types discussed above differ, each type also has its own advantages and disadvantages over the others. As BEVs are equipped with sophisticated battery systems, they usually take a long time to charge and are hence unreliable in emergency situations if the battery runs out. To account for and analyse this range issue, conventional multiple linear regression methods can be used [31]; this is one of the recent innovative applications of machine learning to EV development. Batteries are the main focus of EV development, the aim being to ensure a high range of mobility and wide deployment.
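The multiple-linear-regression approach to range prediction mentioned above can be sketched as an ordinary-least-squares fit. The features (battery capacity and ambient temperature) and the training data below are fabricated for illustration and are not taken from [31].

```python
def fit_ols(X, y):
    """Fit linear coefficients by solving the normal equations
    (X^T X) b = X^T y via Gaussian elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for col in range(p):
        # Pivot on the largest remaining entry in this column for stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, p))) / A[r][r]
    return coef

def predict_range(coef, features):
    """Predict driving range (km) from a feature row [1, capacity_kwh, temp_c]."""
    return sum(c * f for c, f in zip(coef, features))
```

A production range predictor would use far more features (speed profile, HVAC load, battery age); this sketch only shows the regression mechanics.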
AI is under great expansion, and machine learning is one of the leading subjects contributing to the future of electric-vehicle and automobile battery development and research [32]. Recent studies conducted at Stanford are innovative and may become one of the branches of study developed for the future of EVs; they claim to help future automobile batteries hold long-lasting charges and support fast charging from fast-charging grid systems [17].
Machine learning in the field of EVs can be considered a form of trial and error toward successful outcomes. Failure patterns from previously examined and tested batteries can be observed, and solutions grounded in sound scientific concepts by current researchers could shape the future of this field. Better storage, not only for EVs but also for applications such as household inverters and wind and solar energy harvesting systems, would likewise lead to more efficient utilization of renewable power resources.
Predicting EV driving range with machine learning is a fairly recent topic of discussion [31]. Charge scheduling, together with designing and manufacturing the cable infrastructure to charge EVs with minimum waiting time, is a current challenge for ML and, as discussed above, has been a main focus of study for a decade [33].
A Brief Review of Current Smart Electric Mobility Facilities and Their … 555
In conclusion, the main obstacles affecting the study of machine learning in the field of electric vehicles, powering systems, and battery chemistry include the following:
• Battery enhancement difficulty
• Battery storage capacity
• The battery's physical dimensions, packaging, and adjustment
• Charging equipment modelling
• Charging port design for PEV-type EVs
• Charging time efficiency
• Grid system enhancements for CSs.
These obstacles are therefore the current focus of ML studies and research, aimed at futuristic development and at customer comfort while keeping environmental issues in consideration.
As discussed above, deep learning facilitates and provides basic infrastructural development plans for each individual type of futuristic EV. In countries with ever-growing populations such as India and China, demand for road-mapping applications in the EV dashboard is increasing, along with expectations of reliability and efficiency [39]. Deploying driving assistance and automated facilities in these countries is a challenge that requires complex algorithms and development, since any risk of life-threatening circumstances must be avoided. Automated forms of such programs have already been seen in ICE-based vehicles [39, 40]. The High-Definition Road Network (HDRN) currently provides the best road-mapping solution for self-driving vehicles. Its power requirement calls for a complex, separate battery or powering system; BEV-type vehicles, with their high-capacity lithium-ion batteries, are best suited for deployment, as they can power both the drivetrain and these onboard systems.
Let’s brief here about lithium-ion battery management systems for EV’s. In EV’s,
the Battery Management Unit (BMU) is a small part of the system that stores and
converts energy from the battery-stored electricity into motion and vice versa. As we
have discussed above, the different types of EV’s have different battery requirements
and also space for deployment. Modern-type electrical impact of the battery system
vehicle impacts the overall performance of the EV and is expected to be very highly
efficient than the hydraulic-based (ICE) type vehicle, and it is the same for every type
of EV discussed above. Battery management is the most important aspect and concept
of concern in the overall EV system as discussed above, modern type EV major
deployment issue being battery systems, their performance and storage capabilities.
Battery system being the most expensive component of the vehicle, the type of battery
of deployment and its design make a great impact on the overall performance of the
vehicle. Care and the feeding pack of the battery system is a great deal of focus
to ensure the best performance and avoid any damages in the future along with the
longevity of the battery’s life. There are several factors that are taken into account
while designing the battery pack as well as the BMU. In ideal conditions, the service
of the battery pack and its performance outlast that of the overall life span of the
vehicle driven by itself and is highly expected as well. The safety and efficiency of
the same have to be ensured as well. Some of the variations in the BMU available in
the market are as follows.
One approach is to specify a battery pack with greater capacity than the targeted range requires. Even as the overall capacity of the battery diminishes over time, the performance of the vehicle is then retained over a longer period (in years). Diagnostics and prognostics of EV battery systems, along with other types, have also been studied [50].
Some battery-management fuel-gauging techniques are as follows:
• Monitoring Cell Voltage
• Hydrometer Analysis
• Coulomb Counting.
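Coulomb counting, the last technique listed, can be sketched in a few lines: the state of charge (SoC) is estimated by integrating the measured pack current over time. The capacity, sampling period, and current values below are illustrative assumptions, not figures from a specific BMU.

```python
# Minimal coulomb-counting sketch: SoC is tracked by integrating measured
# pack current over time. All numeric values are illustrative assumptions.

def coulomb_count(soc0, capacity_ah, samples, dt_s):
    """Integrate current samples (A, +discharge / -charge) into an SoC estimate."""
    soc = soc0
    for current_a in samples:
        soc -= current_a * dt_s / 3600.0 / capacity_ah   # Ah moved / capacity
        soc = min(max(soc, 0.0), 1.0)                    # clamp to physical range
    return soc

# One hour of a steady 10 A discharge from a full 50 Ah pack, sampled at 1 Hz:
soc = coulomb_count(1.0, 50.0, [10.0] * 3600, 1.0)      # 10 Ah drawn -> SoC 0.8
```

In practice a fuel gauge combines coulomb counting with voltage-based corrections, since current-sensor offset errors accumulate over long drives.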
The overall aim of BMU monitoring is to track how frequently the battery is charged and discharged and how this affects the battery's performance, which in turn affects the vehicle's range, efficiency, and battery life span. In a stacked battery system, fully charging a stack in which one cell holds less charge than the others may damage the entire system. The charge and discharge levels at which such damage occurs depend on various factors of the situation. Hence, damage to the Battery Management System (BMS) is driven not only by external influences but also by the internal structure and the level of charge in each stacked cell. Cell charge balancing and equalization therefore provide a mechanism for keeping all of the stacked cells in the BMU at an almost identical level of charge, maintaining the performance of the battery pack over a long period of charging and discharging.
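The equalization idea described above can be illustrated with a toy passive-balancing loop that bleeds charge from cells sitting above the weakest cell until the stack is nearly level. The bleed step and tolerance are invented for the example; real balancers work on cell voltage and run continuously in hardware.

```python
# Toy passive cell balancing: cells above the weakest cell are bled (as a
# resistor would) until all cells are within a tolerance of the minimum,
# so no cell is over-charged when the pack charges as one series string.
# Bleed step and tolerance are illustrative assumptions.

def balance(cells_ah, bleed_ah=0.01, tol_ah=0.02):
    """Bleed charge from above-minimum cells until the stack is equalized."""
    cells = list(cells_ah)
    while max(cells) - min(cells) > tol_ah:
        floor = min(cells)                               # weakest cell, never bled
        cells = [c - bleed_ah if c - floor > tol_ah else c for c in cells]
    return cells

pack = balance([2.50, 2.46, 2.41, 2.48])                 # Ah held per cell
```

The loop terminates because only cells above the minimum are bled, so the spread shrinks monotonically toward the tolerance.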
Long sequences of charging and discharging cycles affect battery performance, as monitored in many papers [50]. Strategies to maintain these charge levels are necessary, and each individual EV's battery health has to be monitored.
Some of the strategies include [51]:
• Un-Coordinated Direct Charging (U-Di-C)
• Un-Coordinated Direct Charging and Discharging (U-Di-CD)
• Un-Coordinated Delayed Charging (U-De-C)
• Un-Coordinated Delayed Charging and Discharging (U-De-CD)
• Un-Coordinated Random Charging (U-R-C)
• Un-Coordinated Random Charging and Discharging (U-R-CD)
• Continuous Coordinated Direct Charging (CC-Di-C)
• Continuous Coordinated Direct Charging and Discharging (CC-Di-CD)
• Continuous Coordinated Delayed Charging (CC-De-C)
• Continuous Coordinated Delayed Charging and Discharging (CC-De-CD).
For a detailed analysis of these strategic propositions, it is highly recommended to consult [51].
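The difference between the uncoordinated-direct and delayed families of strategies above can be illustrated with a toy load profile. Arrival times, charger power, charge duration, and the off-peak window are assumptions for the sketch, not values from [51].

```python
# Toy comparison of direct vs. delayed EV charging (illustrative numbers only):
# direct charging starts at plug-in and stacks onto the evening grid peak;
# delayed charging defers every session to an assumed 23:00 off-peak window.

def hourly_load(start_hours, duration_h=4, power_kw=7.0):
    """Sum charger power into a 24-slot daily load profile (wraps past midnight)."""
    load = [0.0] * 24
    for start in start_hours:
        for h in range(start, start + duration_h):
            load[h % 24] += power_kw
    return load

arrivals = [17, 18, 18, 19, 20]              # five EVs plug in after work
direct = hourly_load(arrivals)               # charge immediately on arrival
delayed = hourly_load([23] * len(arrivals))  # all deferred to 23:00

evening_peak_direct = max(direct[17:22])     # load added during 17:00-21:00
evening_peak_delayed = max(delayed[17:22])
```

Both schedules deliver the same total energy, but the delayed one moves all of it out of the 17:00-21:00 evening peak, which is the essence of the delayed/coordinated families.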
Broadly, customers' expectations of EVs include the following factors:
• Affordability of EVs
• Charging and electric facilities to be robust
• Exterior design and architectural design
• Quality design and a comfortable interior for an opulent travel experience
• Manufactured with quality material resources
• Less likelihood of repairs
• Affordable charging facilities
• Greater mileage
• Fast charging facility
• Home charging facility and CS’s deployment.
With all of these factors taken into consideration, the goal is clearly attainable but will take years of development and research. Though emission reductions may not be as large as hoped, the production of gases such as NOx and CO2 is expected to fall compared with the levels recorded in 2019 in various regions with high population and pollution rates globally. Electrifying transportation systems is one of the major steps that must be undertaken within this century to maintain the balance between the environment and humanity on this planet.
This paper has discussed various types of possible EV’s available in the market
and has given a brief review of the type of batteries that are in the market and what
new developments are being undertaken by automobile industries to achieve this
goal.
EVs play a major role in the power sector, especially for future reliance on power grid systems as this still-emerging technology matures. Energy conservation and harvesting systems on the one hand, and environmental consciousness on the other, are two distinct factors taken into account in EV development, and various papers have suggested that they can be advanced hand in hand sooner or later. The wide range of EV types in the market shows great potential to achieve the goal over time.
Futuristic innovations such as metal-intensive batteries, beyond the usual high-capacity yet expensive batteries made of nickel, cobalt, or lithium, could be a potential future for batteries and power systems even beyond EVs [25]. Innovative interior and exterior designs and architectural development of modern EVs could draw people's attention to this improving technology. Gradual electrification of individual components, as battery systems develop to supply modest or higher voltage outputs, could eventually power complete vehicles as fully electrified systems after years of study and growth. For the interior, better and wireless equipment could likewise attract customers to invest.
Tablet features and automated driving systems with automated parking systems
are the future of electric vehicles.
Grid system technology is a complicated powering system that still has to be developed, and doing so would provide great opportunities and employment. Wireless and contact-mode energy transmission systems are still research topics that might become the future of EV charging facilities. Better and faster charging for PEV-type EVs has also been an enormous topic of debate in recent years.
Conclusively, it’s an undeniable fact that EV’s have a scintillating future with an
appreciable scope of deployment globally in the coming decades.
When Covid-19 struck the world in 2020 and the whole world went into isolation, the global vehicle market dropped drastically within a very short time, for conventional vehicles rather than for EVs [53]. In 2019, combined annual sales of BEV- and PEV-type electric vehicles passed the 2 million mark and are expected to keep increasing until 2030 [30].
Sources suggest that overall vehicle sales dropped 15% year on year. Though EVs felt the effect as well, and the sales expected for EVs in 2019 could not be reached in 2020 because the pandemic became an obstacle, EV sales were nonetheless observed to increase slightly. For a detailed analysis of how the pandemic affected the market in different countries, refer to [53].
Sales of EVs are likely to grow faster than those of fossil-fuel-powered vehicles across the world. There have been various investigative reports on the impact of the coronavirus on sales of EVs as well as conventional ICE-based vehicles [53, 54]. While some reports suggest that charging station implementation declined by 70–75% in some regions, overall EV demand is still increasing, as is evident from IEA reports [53]. It is also evident that, other than EVs, all forms of transportation systems in the market were majorly impacted. EVs have a bright future after the Covid-19 situation and should be taken into account by different countries, for the following reasons [55]:
• To stimulate the economies of different countries in their individual economic situations.
• Cost Saving for EV’s.
• At times of Emergencies and low demand in the market, to increase new revenue
streams.
• To encourage people to preserve local air quality by using EVs.
According to [55], the following steps should be undertaken by governments to encourage the supply of EVs in the market:
• By Increasing Studies and Research for Charging infrastructure
• By Encouraging and Supporting the people for purchasing EV’s
• By Implementing Emission Standards and EV Mandates.
The pandemic has also affected the mindsets of individuals owning EVs. Although there have been reports of growing interest in sustainable living conditions and driving facilities, the pandemic has diverted
562 D. Satya Sai Surya Varun et al.
interest away from owning EVs drastically. People's interest in relying on a sustainable mobility system has nevertheless grown, as it is the best alternative to the ever-increasing prices of fossil fuels.
While the situation may be bad for charging station deployment, overnight home charging is expected to be more convenient for people to rely on. Yet even though home charging may seem the most convenient option, neglecting the deployment of CSs would be risky, since charging stations will still be needed in different places in case of battery shortages and long driving days for customers' EVs.
A pandemic-driven rebound in sales of fuel-based ICE vehicles would be a disaster for global climate and warming. While that is not what is actually happening, the chances of it are not low. Even in 2021, the electric vehicle market is poised for growth.
All in all, the future of electric vehicles is going to be remarkable, and it’s evident
from all the papers discussed above.
India aspires to be a significant player in the worldwide electric car industry. The
prevalence of BEVs has expanded dramatically in the previous five years, thanks to
various automakers in the nation working on electric cars. Along with the traditional
automotive manufacturers, a number of start-ups have risen in the market with their
own goods and technology.
13 Conclusions
This review gives an elaborated discussion of the current types, trends, and future scope of EVs. It is clear that even though EVs are still an emerging technology under constant development, they have a bright future in the automobile industry. Considering the remaining obstacles, it is a necessity for countries producing high carbon emissions to replace traditional vehicles with modern electric vehicles as soon as possible to avoid further damage to the environment, and EVs are well suited to this task because they provide transport with reduced carbon output. Garages could serve as home charging facilities for individuals who can afford a current EV. Even where a full EV is hard to afford, many people could at least start adopting the EV-derived vehicle types mentioned above to slowly compensate for the environmental effects.
The carbon footprint of an EV also depends on the type and size of its battery, and demand for better quality depends mainly on the type of battery utilized. In recent years, the demand for and price of the raw materials used to manufacture lithium-ion batteries have risen steadily, which reduces the supply of EVs in the market. The price of other materials such as cobalt, another material of significance in battery evolution, is also increasing [24].
The main objective of this study is to provide a clear view of the current types and trends of EVs in the market and to classify the differences and benefits of individual types for future customers. The empirical results confirm that, in the long run, EVs show promise in the fight against environmental issues.
With the ever-increasing demand for EVs in various developing countries, expansion and development of renewable energy generation must increase and be treated as a serious subject of implementation and research. Energy storage systems with effective capacity but without rare-earth elements (REEs) are also yet to be achieved, and the demand for such batteries in mid-to-large EV sizes is growing; policies for developing these technologies must be expanded. Extreme reliance on raw materials such as lithium and cobalt will eventually stall further development of the technology [24], so new materials have to be examined and superior replacements discovered in order to accelerate research and development. Newly emerging companies with great ambitions and innovative ideas should be given a chance to develop and to be considered in the market, which would also accelerate research. The limited and expensive supply of cobalt and lithium is already pushing companies away from their use. Use-phase life-cycle assessment (LCA) of every EV type also deserves more attention as a research subject, as very few papers have appeared in recent years [20, 21]. Cumulative efforts along these lines could promise a better future for humanity through the automobile industry with the help of EVs.
References
1. Towoju OA, Ishola FA (2020) A case for the internal combustion engine powered vehicle.
Energy Rep 6:315–321
2. Boston W (2019) Rise of electric cars threatens to drain German growth. WSJ. https://www.
wsj.com/articles/rise-of-electric-cars-threatens-to-drain-german-growth-11565861401 (2019,
Aug 16)
3. Xu X, Niu D, Li Y, Sun L (2020) Optimal pricing strategy of electric vehicle charging station
for promoting green behavior based on time and space dimensions. J Adv Transp 1–16
4. Sneha Angeline P, Newlin Rajkumar M (2020) Evolution of electric vehicle and its future
scope. Mater Today: Proc 33:3930–3936
5. Global greenhouse gas emissions data. US EPA. https://www.epa.gov/ghgemissions/global-greenhouse-gas-emissions-data (2021, March 25)
6. Nanaki EA (2021) Electric vehicles. Electric Veh Smart Cities 13–49
7. Larman C, Vodde B (2010) Practices for scaling lean and agile development: large, multisite,
and offshore product development with large-scale scrum. Pearson Education, Boston
8. Zulkifli SA, Mohd S, Saad N, Aziz ARR (2015) Split-parallel through-the-road hybrid electric
vehicle: operation, power flow and control modes. In: 2015 IEEE transportation electrification
conference and expo (ITEC), pp 1–7
9. Doucette RT, McCulloch MD (2011) Modeling the prospects of plug-in hybrid electric vehicles
to reduce CO2 emissions. Appl Energy 88(7):2315–2323
10. Chakraborty S, Vu HN, Hasan MM, Tran DD, Baghdadi ME, Hegazy O (2019) DC-DC
converter topologies for electric vehicles, plug-in hybrid electric vehicles and fast charging
stations: state of the art and future trends. Energies 12(8):1569
11. Gago RG, Pinto SF, Silva JF (2016) G2V and V2G electric vehicle charger for smart grids. In:
2016 IEEE international smart cities conference (ISC2)
12. Goel S, Sharma R, Rathore AK (2021) A review on barrier and challenges of electric vehicle
in India and vehicle to grid optimisation. Transp Eng 4:100057
13. Kempton W, Tomić J (2005) Vehicle-to-grid power implementation: from stabilizing the grid to supporting large-scale renewable energy. J Power Sources 144(1):280–294
14. NextEnergy. Vehicle-to-building (V2B). https://nextenergy.org/vehicle-building-v2b/. (2017,
June 26)
15. Sami I, Ullah Z, Salman K, Hussain I, Ali SM, Khan B, Mehmood CA, Farid U (2019) A
bidirectional interactive electric vehicles operation modes: vehicle-to-grid (V2G) and grid-to-
vehicle (G2V) variations within smart grid. In: 2019 international conference on engineering
and emerging technologies (ICEET)
16. Mahure P, Keshri RK, Abhyankar R, Buja G (2020) Bidirectional conductive charging of
electric vehicles for V2V energy exchange. In: IECON 2020 The 46th annual conference of
the IEEE industrial electronics society. Published
17. Attia PM, Grover A, Jin N, Severson KA, Markov TM, Liao YH, Chen MH, Cheong B,
Perkins N, Yang Z, Herring PK, Aykol M, Harris SJ, Braatz RD, Ermon S, Chueh WC (2020)
Closed-loop optimization of fast-charging protocols for batteries with machine learning. Nature
578(7795):397–402
18. Bonnema GM, Muller G, Schuddeboom L (2020) Electric mobility and charging: systems of
systems and infrastructure systems. In: 2015 10th system of systems engineering conference
(SoSE)
19. Runeson P, Höst M (2009) Guidelines for conducting and reporting case study research in
software engineering. Empir Softw Eng 14:131–164
20. Helmers E (2020) Sensitivity analysis in the life-cycle assessment of electric vs. combustion
engine cars under approximate real-world conditions. MDPI (2020, Feb 9).
21. Helmers E, Dietz J, Weiss M (2020) Sensitivity analysis in the life-cycle assessment of electric
vs. combustion engine cars under approximate real-world conditions. Sustainability 12(3):1241
22. Nordelöf A, Messagie M, Tillman AM, Ljunggren Söderman M, Van Mierlo J (2014) Environmental impacts of hybrid, plug-in hybrid, and battery electric vehicles—what can we learn from life cycle assessment? Int J Life Cycle Assess 19(11):1866–1890
23. Jones B, Elliott RJ, Nguyen-Tien V (2020) The EV revolution: the road ahead for critical raw
materials demand. Appl Energy 280:115072
24. Mo J, Jeon W (2018) The impact of electric vehicle demand and battery recycling on price
dynamics of lithium- ion battery cathode materials: a vector error correction model (VECM)
analysis. Sustainability 10(8):2870
25. U.S Department of Energy. (n.d.). All-electric vehicles. www.fueleconomy.gov - the official
government source for fuel economy information. https://www.fueleconomy.gov/feg/evtech.
shtml.
26. Triviño A, González-González JM, Aguado JA (2021) Wireless power transfer technologies
applied to electric vehicles: a review. Energies 14(6):1547
27. Al Mamun MA, Istiak M, Al Mamun KA, Rukaia SA (2020) Design and implementation
of a wireless charging system for electric vehicles. In: 2020 IEEE region 10 symposium
(TENSYMP)
28. Mishra S, Verma S, Chowdhury S, Gaur A, Mohapatra S, Dwivedi G, Verma P (2021) A
comprehensive review on developments in electric vehicle charging station infrastructure and
present scenario of India. Sustainability 13(4):2396
29. Naik AR (2020) How electric vehicles will impact electricity demand, India’s grid
capacity. Inc42 Media. https://inc42.com/features/how-electric-vehicles-will-impact-electr
icity-demand-indias-grid-capacity/ (2020, April 3)
1 Introduction
Photonic crystal fiber (PCF) is a compatible platform on which to design and develop a surface plasmon resonance (SPR)-based RI sensor [1]. PCF is considered a suitable candidate for sensor design because it offers several advantages over conventional optical fibers: design flexibility to maximize the sensing parameters, high non-linearity, small analyte sample volumes, portability, and fitness for remote sensing applications [2]. In PCF-SPR sensors, the deposition of the plasmonic material is an important task. Gold (Au) [3], silver (Ag) [4], copper (Cu) [5], aluminum (Al) [6], titanium dioxide (TiO2) [7], indium tin oxide (ITO) [8], etc. are common plasmonic materials used in sensor design and fabrication. Recently, in the quest for new plasmonic materials, scientists and researchers have turned to materials like tantalum pentoxide (Ta2O5) [9], titanium nitride (TiN) [10, 11], zinc oxide (ZnO) [12], palladium (Pd) [13], etc. These materials can be deposited on the PCF using the chemical vapor deposition (CVD) technique [3]. The base material of a PCF-SPR sensor design is mostly silica, because silica is easily and abundantly available. Besides silica, new background materials like Topaz are also used in sensor design these days [14].
The structural design of PCF-SPR sensors follows three different methodologies. In the first, the plasmonic material coating is applied over the internal air holes of the PCF-SPR design. This is highly complicated from a fabrication perspective: since the PCF-SPR sensor itself is in the micrometer range, the air holes are smaller still, and applying a nanometer-scale layer of plasmonic material over the air holes
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 567
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_43
568 A. K. Shakya and S. Singh
from RI 1.40 to 1.48 RIU, which is the RI range of typical household oils, biochemicals, and analytes. Thus, the proposed sensor offers several new features that will be interesting to observe during plasmonic sensing.
The paper is divided into four sections. Sensor modeling and design parameters are explained in Sect. 2. Sensor simulation results and the future scope of the designed sensor are presented in Sect. 3. Finally, Sect. 4 offers concluding remarks on the research work.
The sensor model consists of elliptical air holes arranged in a pattern that produces a tetra core within the PCF. Fused silica is used as the base material of the presented sensor. The elliptical air holes have semi-minor axes of 1.2 µm and semi-major axes of 1.5 µm. The combination of Au and ZnO is examined as the plasmonic material in the presented design: the Au layer is 35 nm thick and the ZnO layer 75 nm. A 1.25 µm thick analyte layer is placed over the fused plasmonic material for analyte sensing. Finally, a 1.85 µm thick PML layer is placed over the fiber to shield it from atmospheric disturbances [20]. The centers of two adjacent elliptical holes are separated by a distance called the pitch, selected here as Λ = 2.25 µm. Figure 1a presents the 2D design of the presented RI sensor, and Fig. 1b zooms into the thin plasmonic layers so their thickness can be identified visually. Figure 1c presents the formation of the quad cores along the X-polarization, and Fig. 1d shows the quad-core formation along the Y-polarization modes.
The sensing methodology of the presented RI sensor is shown in Fig. 1e. Light from the optical source passes into the proposed fiber through the IN port, along with the analytes whose RI is to be investigated. The analyte is taken out of the PCF through the OUT port. An optical spectrum analyzer (OSA) detects the variation developed in the light signal corresponding to the different analytes passing through the fiber. The output of the OSA is connected to a computer to obtain the change produced in wavelength (nm); the wavelength shift differs for different analytes, oil samples, and chemicals. The capability of the setup can be enhanced with a polarization controller, so different analytes and chemicals can be analyzed with the proposed sensing setup. The RI range of 1.40 to 1.48 RIU covers household oils and analytes [8, 12]. The proposed system works only in the presence of the computer, which reads out the output generated by the OSA device; no information about the chemical or oil behavior can be obtained without the computer read-out. Thus, this work presents the sensing behavior of the proposed sensor with computer vision merged with optics.
The permittivity of the Au layer can be described by the "Drude-Lorentz model," and the layer can be deposited over the PCF using the CVD technique [3].

Fig. 1 a Designed PCF-SPR sensor model, b zoom of the Au and ZnO layers, c quad-core (X-polarization), d quad-core (Y-polarization), and e sensing setup for analyzing analytes with the proposed sensor

Sensing performance parameters for any designed sensor include confinement loss (CL), wavelength sensitivity (WS), amplitude sensitivity (AS), sensor resolution (SR), and the linear relationship between RI and resonant wavelength [3]. They are expressed by Eqs. (1)–(4) [7].
1. Confinement loss (CL): the loss that arises from the non-perfect design of the sensor model; it can be understood as the optical power leaking out of the core of the designed PCF. It is expressed in dB/cm by Eq. (1) [7]:

CL (dB/cm) = 8.686 × k0 × Im(neff) × 10^4    (1)

2. Wavelength sensitivity (WS): the shift of the resonance peak per unit change in analyte RI, expressed in nm/RIU by Eq. (2) [3]:

WS = Δλpeak / ΔRI    (2)

3. Amplitude sensitivity (AS), expressed in RIU^−1 by Eq. (3) [7]:

AS (RIU^−1) = −(1 / α(λ, na)) × ∂α(λ, na)/∂na    (3)

Here, ∂na represents the difference in the RI values of two consecutive analytes.

4. Sensor resolution (SR): the potential of the sensor to identify the slightest drift in the RI of the analyte. It is represented by Eq. (4) and carries the unit RIU [3]:

SR (RIU) = Δna × Δλmin / Δλpeak    (4)
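The four expressions above translate directly into code. The sketch below uses illustrative inputs, and for Eq. (1) it assumes the wavelength is given in micrometres so that the 10^4 factor (1 cm = 10^4 µm) converts the loss to dB/cm:

```python
# The four sensor metrics of Eqs. (1)-(4) transcribed into helper functions.
# Numeric inputs in the example are illustrative.
import math

def confinement_loss(wavelength_um, im_neff):
    """Eq. (1): CL = 8.686 * k0 * Im(n_eff) * 1e4 dB/cm, k0 = 2*pi/lambda (um)."""
    return 8.686 * (2 * math.pi / wavelength_um) * im_neff * 1e4

def wavelength_sensitivity(d_lambda_peak_nm, d_ri):
    """Eq. (2): WS = delta(lambda_peak) / delta(RI), in nm/RIU."""
    return d_lambda_peak_nm / d_ri

def amplitude_sensitivity(alpha, d_alpha, d_na):
    """Eq. (3): AS = -(1/alpha) * d(alpha)/d(n_a), in RIU^-1."""
    return -(1.0 / alpha) * (d_alpha / d_na)

def sensor_resolution(d_na, d_lambda_min_nm, d_lambda_peak_nm):
    """Eq. (4): SR = delta(n_a) * delta(lambda_min) / delta(lambda_peak), in RIU."""
    return d_na * d_lambda_min_nm / d_lambda_peak_nm

# Example: a 5 nm peak shift per 0.01 RIU step, with an assumed 0.1 nm
# detector resolution, gives WS = 500 nm/RIU and SR = 2e-4 RIU.
ws = wavelength_sensitivity(5.0, 0.01)
sr = sensor_resolution(0.01, 0.1, 5.0)
cl = confinement_loss(1.8, 1e-6)   # wavelength 1.8 um, Im(n_eff) = 1e-6
```

With these example inputs the helpers reproduce the 500 nm/RIU and 2 × 10−4 RIU figures quoted for the smallest peak shifts in Sect. 3.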
3 Simulation Results
Fig. 2 Confinement loss (CL, dB/cm) versus wavelength (nm) for analytes of RI 1.40–1.48 RIU: a X-polarization, b Y-polarization

Fig. 3 Amplitude sensitivity versus wavelength (nm) for analytes of RI 1.40–1.48 RIU: a X-polarization, b Y-polarization
43.14 dB/cm, 43.18 dB/cm, 43.20 dB/cm, and 43.24 dB/cm, corresponding to biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, 1.47, and 1.48 RIU, respectively, as presented in Fig. 2b.
The amplitude sensitivities for the different analytes are 3613 RIU^−1, 4107 RIU^−1, 5172 RIU^−1, 8380 RIU^−1, 9272 RIU^−1, 13074 RIU^−1, 14954 RIU^−1, 22150 RIU^−1, and 26834 RIU^−1, corresponding to biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, and 1.47 RIU, respectively, for X-polarization, as presented in Fig. 3a.
For Y-polarization, the amplitude sensitivities are 21380 RIU^−1, 22630 RIU^−1, 24187 RIU^−1, 26178 RIU^−1, 26990 RIU^−1, 33580 RIU^−1, 35590 RIU^−1, and 39550 RIU^−1 for biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, and 1.47 RIU, respectively, as illustrated in Fig. 3b.
The resonance wavelength shifts to 1805, 1810, 1815, 1820, 1830, 1840, 1850, 1860, and 1890 nm for biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, 1.47, and 1.48 RIU, respectively, for X-polarization. The corresponding wavelength sensitivities between consecutive analytes are 500 nm/RIU, 500 nm/RIU, 500 nm/RIU, 1000 nm/RIU, 1000 nm/RIU, 1000 nm/RIU, 1000 nm/RIU, and 3000 nm/RIU for X-polarization.
For Y-polarization, the resonance wavelength shifts to 1760, 1765, 1770, 1775, 1780, 1785, 1795, 1810, and 1835 nm for biochemicals having RI 1.40 to 1.48 RIU. The corresponding wavelength sensitivities between consecutive analytes are 500 nm/RIU, 500 nm/RIU, 500 nm/RIU, 500 nm/RIU, 500 nm/RIU, 1000 nm/RIU, 1500 nm/RIU, and 2500 nm/RIU for Y-polarization.
Gold-ZnO Coated Surface Plasmon Resonance Refractive Index Sensor … 573
and 3.33 × 10⁻⁵ RIU for biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, and 1.47 RIU, respectively.
The sensor resolution corresponding to Y-polarization is 2 × 10⁻⁴ RIU, 2 × 10⁻⁴ RIU, 2 × 10⁻⁴ RIU, 2 × 10⁻⁴ RIU, 2 × 10⁻⁴ RIU, 1 × 10⁻⁴ RIU, 6.66 × 10⁻⁵ RIU, and 4.00 × 10⁻⁵ RIU for biochemicals having RI 1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, and 1.47 RIU, respectively.
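The wavelength sensitivity and resolution values above follow the usual definitions S_λ = Δλ_peak/Δn_a and R = Δn_a · Δλ_min/Δλ_peak. Assuming the conventional Δλ_min = 0.1 nm (an instrument-resolution figure not stated explicitly in the text), a short sketch reproduces the Y-polarization numbers:

```python
# Hedged sketch: per-step wavelength sensitivity (nm/RIU) and sensor
# resolution (RIU) from a list of resonance peaks at analyte RI steps delta_n.
# dlambda_min = 0.1 nm is an assumed minimum detectable wavelength shift.
def sensitivities_and_resolutions(peaks_nm, delta_n=0.01, dlambda_min=0.1):
    shifts = [peaks_nm[k + 1] - peaks_nm[k] for k in range(len(peaks_nm) - 1)]
    sens = [dl / delta_n for dl in shifts]                # S = dlambda/dn
    res = [delta_n * dlambda_min / dl for dl in shifts]   # R = dn*dlambda_min/dlambda
    return sens, res

# Y-polarization resonance peaks read from the text (nm), at RI steps of 0.01:
y_peaks = [1760, 1765, 1770, 1775, 1780, 1785, 1795, 1810, 1835]
```

Running this on the Y-polarization peaks gives the 500–2500 nm/RIU sensitivities and the 2 × 10⁻⁴ to 4 × 10⁻⁵ RIU resolutions quoted above.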
The fitting between resonance wavelength and RI provides information about "sensor optimization." An R-squared value close to unity represents a good fit between resonance wavelength and RI. The fit between RI and resonance wavelength produces R² = 0.9839 for X-polarization and R² = 0.9758 for Y-polarization, as illustrated in Fig. 4a and b, respectively. Both values are close to unity, which indicates a good fit to the sensor response.
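The R² values can be reproduced with a polynomial fit. The fit order is not stated in the text, but a second-order fit of the X-polarization peaks listed above yields R² ≈ 0.984, matching the reported value; the sketch below uses the peak list transcribed from the text:

```python
import numpy as np

# Hedged sketch: goodness of fit (R^2) between analyte RI and resonance
# wavelength via a polynomial least-squares fit. The quadratic order is an
# assumption that happens to reproduce the reported X-polarization R^2.
def r_squared(x, y, deg):
    coeffs = np.polyfit(x, y, deg)
    residuals = np.asarray(y) - np.polyval(coeffs, x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((np.asarray(y) - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

ri = [1.40, 1.41, 1.42, 1.43, 1.44, 1.45, 1.46, 1.47, 1.48]        # RIU
x_peaks = [1805, 1810, 1815, 1820, 1830, 1840, 1850, 1860, 1890]   # nm, X-pol
```

A first-order fit on the same data gives a noticeably lower R², which is why the fit order matters for this comparison.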
The peak values of the sensing parameters are obtained for an RI of 1.47 RIU. Thus, the proposed sensor exhibits the various features on the basis of which it can be considered an effective RI sensor.
Finally, Table 1 compares the parameters obtained for the proposed RI sensor
with other reported sensors developed to date.
Besides the conventional sensor parameters, the figure of merit (FOM) can also be obtained for the designed sensor model. The FOM depends on the full width at half maximum (FWHM). Today, the PCF-SPR sensing field has been immensely revolutionized. Scientists and researchers have presented several applications of PCF-SPR sensors, such as cancer detection, environmental monitoring, pregnancy detection, transformer oil monitoring, and food pathogen detection. These photonic sensors operate on variations in the RI values. Thus, they can potentially be used in several application areas where a change is determined on the basis of variation in the RI values. Household oils like coconut oil, gooseberry oil, and amla oil have RI varying in the range of 1.40–1.48 RIU, besides some biochemicals
having the same operational range of RI. Thus, the proposed RI sensor is designed to cover the RI range of various chemicals, household oils, and biochemicals. It is expected that, with the evolution of RI sensing, PCF-SPR RI sensors will be used in several new application areas.
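The figure of merit mentioned above is commonly defined as FOM = S_λ / FWHM. A minimal sketch, assuming the FWHM is read numerically off a loss spectrum (the Lorentzian-shaped spectrum below is synthetic, for illustration only):

```python
# Hedged sketch: FOM = S_lambda / FWHM, with the FWHM taken as the width of
# the wavelength region where the loss stays at or above half its peak value.
def fwhm(wavelengths_nm, loss):
    half = max(loss) / 2.0
    above = [wl for wl, l in zip(wavelengths_nm, loss) if l >= half]
    return max(above) - min(above)

def figure_of_merit(s_lambda_nm_per_riu, fwhm_nm):
    """FOM in RIU^-1 for S_lambda in nm/RIU and FWHM in nm."""
    return s_lambda_nm_per_riu / fwhm_nm

wl = list(range(1700, 1901))                                        # nm grid
spectrum = [100.0 / (1.0 + ((w - 1800) / 10.0) ** 2) for w in wl]   # synthetic
```

A narrower resonance dip (smaller FWHM) at the same wavelength sensitivity yields a proportionally larger FOM.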
4 Conclusion
Acknowledgements This work is performed under the All India Council for Technical Education (AICTE) National Doctoral Fellowship (NDF). The authors further thank AICTE for the AICTE NDF RPS project, sanction order File No. 8-2/RIFD/RPS-NDF/Policy-1/2018-19 dated March 13, 2019.
References
1. Liu W, Wang F, Liu C, Yang L, Liu Q, Su W, Lv J (2020) A hollow dual-core PCF-SPR sensor
with gold layers on the inner and outer surfaces of the thin cladding. Results Opt 1:100004.
https://doi.org/10.1016/j.rio.2020.100004
2. Khanikar T, De M, Singh VK (2021) A review on infiltrated or liquid core fiber optic SPR
sensors. In: Photonics and nanostructures —fundamentals and applications, vol 46, p 100945.
https://doi.org/10.1016/j.photonics.2021.100945
3. Shakya AK, Singh S (2021) Design of dual-polarized tetra core PCF based plasmonic RI
sensor for visible-IR spectrum. Opt Commun 478:126372. https://doi.org/10.1016/j.optcom.
2020.126372
4. Yang H, Wang G, Lu Y, Yao J (2021) Highly sensitive refractive index sensor based on SPR
with silver and titanium dioxide coating. Opt Quantum Electron 53:341. https://doi.org/10.
1007/s11082-021-02981-1
5. Butt M, Khonina S, Kazanskiy N (2021) Plasmonics: a necessity in the field of sensing-a review
(invited). Fiber Integrat Opt 40:14–47. https://doi.org/10.1080/01468030.2021.1902590
6. Liu Q, Ma Z, Wu Q (2020) The biochemical sensor based on liquid-core photonic crystal fiber
filled with gold, silver, and aluminum. Opt Laser Technol 130:106363. https://doi.org/10.1016/
j.optlastec.2020.106363
7. Shakya AK, Singh S (2021) Design and analysis of dual-polarized Au and TiO2-coated photonic
crystal fiber surface plasmon resonance refractive index sensor: an extraneous sensing approach.
J Nanophotonics 15(1):016009
8. Liu A, Wang J, Wang F, Su W, Yang L, Lv J, Fu G (2020) Surface plasmon resonance (SPR)
infrared sensor based on D-shape photonic crystal fibers with ITO coatings. Opt Commun
464:125496. https://doi.org/10.1016/j.optcom.2020.125496
9. Danlard, Akowuah EK (2021) Design and theoretical analysis of a dual-polarized quasi D-
shaped plasmonic PCF microsensor for back-to-back measurement of refractive index and
temperature. IEEE Sens J 21(8):9860–9868
10. Shakya K, Singh S (2022) Design of novel Penta core PCF SPR RI sensor based on the fusion of
IMD and EMD techniques for analysis of water and transformer oil. Measurement 188:110513.
https://doi.org/10.1016/j.measurement.2021.110513
11. Monfared YE (2020) Refractive index sensor based on surface plasmon resonance excitation in
a D-shaped photonic crystal fiber coated by titanium nitride. Plasmonics 15:535–542. https://
doi.org/10.1007/s11468-019-01072-y
12. Liang H, Shen T, Feng Y, Liu H, Han W (2021) A D-shaped photonic crystal fiber refractive
index sensor coated with graphene and zinc oxide. Sensors 21(1):71
13. Chen DY, Zhao Y (2021) Review of optical hydrogen sensors based on metal hydrides: Recent
developments and challenges. Opt Laser Technol 137:106808. https://doi.org/10.1016/j.optlas
tec.2020.106808
14. Hasan MM, Barid M, Hossain MS, Sen S, Azad MM (2021) Large effective area with high
power fraction in the core region and extremely low effective material loss-based photonic
crystal fiber (PCF) in the terahertz (THz) wave pulse for different types of communication
sectors. J Opt 50:681–688. https://doi.org/10.1007/s12596-021-00740-9
15. Ramola A, Marwaha A, Singh S (2021) Design and investigation of a dedicated PCF SPR
biosensor for CANCER exposure employing external sensing. Appl Phys A 127:643. https://
doi.org/10.1007/s00339-021-04785-2
16. Popescu V, Sharma AK, Marques C (2021) Resonant interaction between a core mode and
two complementary supermodes in a honeycomb PCF reflector-based SPR sensor. Optik
227:166121. https://doi.org/10.1016/j.ijleo.2020.166121
17. Zhu M, Yang L, Lv J, Liu C, Li Q, Peng C, Li X, Chu PK (2021) Highly sensitive dual-core
photonic crystal fiber based on a surface. Plasmonics 1:1–8. https://doi.org/10.1007/s11468-
021-01543-1
18. Yan X, Wang Y, Cheng T, Li S (2021) Photonic crystal fiber SPR liquid sensor based on
elliptical detective channel. Micromachines 12(4):408
576 A. K. Shakya and S. Singh
19. Falah AS, Wong WR, Adikan FRM (2022) Single-mode eccentric-core D-shaped photonic
crystal fiber surface plasmon resonance sensor. Opt Laser Technol 145:107474. https://doi.org/
10.1016/j.optlastec.2021.107474
20. Shakya AK, Singh S (2022) Design of biochemical biosensor based on transmission,
absorbance, and refractive index. Biosens Bioelectron X 10:100089. https://doi.org/10.1016/j.
biosx.2021.100089
21. International Gem Society (2021) Refractive index list of common household liquids. IGS, 01 January 2021. https://www.gemsociety.org/article/refractive-index-list-of-common-household-liquids/. Accessed 01 Nov 2021
22. Otupiri R, Akowuah EK, Haxha S, Ademgil H, AbdelMalek F, Aggoun A (2014) A novel
birefringent photonic crystal fiber surface plasmon resonance biosensor. IEEE Photonics J
6(4):6801711
23. Gao D, Guan C, Wen Y, Zhong X, Yuan L (2014) Multi-hole fiber-based surface plasmon
resonance sensor operated at near-infrared wavelengths. Opt Commun 313:94–98. https://doi.
org/10.1016/j.optcom.2013.10.015
24. Osório H, Oliveira R, Aristilde S, Chesini G, Franco MAR (2017) Bragg gratings in surface-
core fibers: refractive index and directional curvature sensing. Opt Fiber Technol 34:86–90.
https://doi.org/10.1016/j.yofte.2017.01.007
25. Dash N, Jha R (2014) Graphene-based birefringent photonic crystal fiber sensor using surface
plasmon resonance. IEEE Photon Technol Lett 26(11):1092–1095
Fault Detection and Diagnostics
in a Cascaded Multilevel Inverter Using
Artificial Neural Network
Stonier Albert Alexander , M. Srinivasan , D. Sarathkumar ,
and R. Harish
1 Introduction
In industrial applications, inverters play a major role in adjustable-speed AC drives, induction heating, aircraft stand-by power supplies, UPS for computers, etc. A phase-controlled converter operated in the inverter mode is called a line-commutated inverter, which requires the existing AC supply for commutation. This implies that a line-commutated inverter cannot operate as an isolated AC voltage source or as a variable-frequency generator with DC power as input; its AC-side voltage and frequency cannot be varied independently. Hence, forced-commutated inverters are used to provide adjustable voltage and frequency for an independent AC output, and these are used in a wider range of applications. The DC power input to the inverter is fed from different kinds of sources such as a battery, a photovoltaic array or a fuel cell. This can also be done using a DC link, which comprises an AC-to-DC converter and a DC-to-AC inverter connected to the DC link. Most of the rectification is performed using diode or thyristor converter circuits.
Basically, inverters are classified into two types: voltage source inverters (VSI) and current source inverters (CSI). For the reduction of harmonics, multilevel inverters are highly preferred; their main types are (i) the flying-capacitor inverter, (ii) the diode-clamped inverter and (iii) the cascaded H-bridge inverter [1–5]. Among the various types, owing to its advantages, the cascaded multilevel inverter is considered in this paper. A cascaded H-bridge multilevel inverter can be used for both single-phase and three-phase systems. Each H-bridge cell consists of four switches and fly-wheeling diodes.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_44
The proposed method deals with the implementation of a five-level cascaded
multilevel inverter employed with multilayer perceptron networks to identify the fault
location from inverter output voltage measurement and the corresponding diagnosis
for the same. Figure 1 shows the five-level cascaded multilevel inverter comprising
8 semiconductor switches. The objective of the work is to appropriately detect the
various faults existing in the system. In addition, the system should locate the fault and
diagnose it by stimulating the auxiliary circuit for providing continuous power even
under fault conditions. Most of the literature dealt with the faults by considering only
the common short-circuit and open-circuit faults [6–15]. In this paper, an intelligence-
based ANN is proposed to detect and diagnose the various faults in an inverter
configuration.
2 Proposed Methodology
The structure of a fault diagnostic system is illustrated in Fig. 2. The structure has
four main blocks such as feature extraction, network configuration, fault diagnosis
and switching pattern calculation. The feature extraction block extracts the output
voltage of a five-level inverter and transfers the same to the ANN. The ANN is trained
with normal and fault data and provides the corresponding binary code that if “1”
arrives it is a normal condition, and if “0” it is a fault condition. Hence, the output
of the network configuration is merely the binary code of either 0 or 1.
The location corresponding to the code is then sent to the fault diagnosis to
interpret the condition. Based on this, the switching pattern is calculated which is
then provided to the inverter switches. A single-phase cascaded multilevel inverter
with 10 V DC and MOSFET as the switching device is used. The level of an inverter
is given by m = 2Ns + 1. Here, m denotes the level of an inverter and Ns denotes
the number of stages included. In the proposed configuration, m = 5 and Ns = 2.
The types of faults considered and their conditions are as follows:
• Open-circuit fault (V = 10 V; I = 0.09693A)
• Short-circuit fault (V = 0 V; I = 10.32A)
• Over-voltage fault (V = 99.96 V; I = 9.63A)
• Losing drive pulse fault (V = 19.99 V; I = 1.907A).
The losing drive pulse fault occurs when the pulse given to the circuit is lost or if
the pulse is not given properly. If the given pulse is wrong, the normal output will
not be displayed. The output may vary based on the pulse provided.
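As an illustration only (the paper's detector is an ANN, not a rule-based test), the four (V, I) fault signatures listed above are distinct enough that even a nearest-signature lookup separates them; the normalization scales below (100 V, 10 A) are assumptions of this sketch:

```python
import math

# Hedged sketch: nearest-signature classification of a measured (V, I)
# operating point against the fault signatures listed in the text.
SIGNATURES = {
    "open-circuit":       (10.0, 0.09693),
    "short-circuit":      (0.0, 10.32),
    "over-voltage":       (99.96, 9.63),
    "losing drive pulse": (19.99, 1.907),
}

def classify(v, i):
    """Return the fault label whose (V, I) signature is nearest to (v, i)."""
    def dist(sig):
        sv, si = sig
        return math.hypot((v - sv) / 100.0, (i - si) / 10.0)  # assumed scales
    return min(SIGNATURES, key=lambda name: dist(SIGNATURES[name]))
```

The ANN approach described next generalizes this idea to noisy measurements and overlapping conditions that a simple lookup cannot handle.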
The MATLAB/Simulink simulation tool is used to simulate the proposed system. The selection of an appropriate signal is essential for feature extraction and provides significant insight for decision making; the highest degree of accuracy is obtained with a neural network. The features concentrate on voltage, current and error signals at various normal and abnormal conditions. The dataset is the first pre-requisite for the ANN process. Once the dataset is obtained, the next stage is training, which is done with the aid of a backpropagation algorithm. Once the training
is completed, the testing process is followed to check the accuracy of the system.
The network is examined by the test data values given to the network and is trained
to achieve the desired goal. Testing of the system network is based on the way by
which the system responds to normal and fault conditions. The trained system covers
the entire fault detection and diagnosing of the network to the required level of the
output requirement. Figure 3 shows the simulation of a five-level inverter without an
ANN-based controller. Figure 4 shows the simulation of the inverter with ANN.
Neural networks comprise different layers: input, hidden and output. Figure 5 shows the network architecture. The layers are interconnected through activation functions that perform the mathematical calculations and corresponding scaling. The input layer is connected to a hidden layer, which is in turn connected to the output layer. A sign activation function is used for the input-layer nodes, a tan-sigmoid function for the hidden nodes, and a log-sigmoid function for the output node. Among the various algorithms used for the implementation of ANNs, the BPN (backpropagation) algorithm is predominantly used for complex applications. The functions performed in the BPN algorithm are the feed-forward pass of data, error backpropagation and updating of the weights (the connection links between the layers) [16–20]. The algorithm for the implementation of fault detection and diagnosis is as follows:
• A two-stage five-level inverter is simulated using MATLAB/Simulink.
• Voltage and current values are collected by varying the load conditions.
• With the aid of this dataset, the neural network is trained to obtain the best training performance curve.
• The network is trained to detect and diagnose the various faults.
• The trained system is tested to check its accuracy.
• The five-level inverter is then implemented with the ANN.
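The steps above can be sketched in miniature as follows, assuming synthetic (V, I) features built from the fault signatures listed earlier and an arbitrary 2-bit code assignment; the authors' dataset and exact architecture are not reproduced here:

```python
import numpy as np

# Hedged sketch of the BPN setup described above: tan-sigmoid hidden layer,
# log-sigmoid output layer, trained by backpropagation to map inverter (V, I)
# features to a 2-bit fault code. All data below are synthetic placeholders.
rng = np.random.default_rng(0)

signatures = np.array([[10.0, 0.09693],   # open-circuit
                       [0.0, 10.32],      # short-circuit
                       [99.96, 9.63],     # over-voltage
                       [19.99, 1.907]])   # losing drive pulse
codes = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # assumed code map

X = np.repeat(signatures, 25, axis=0) + rng.normal(0.0, 0.05, (100, 2))
Y = np.repeat(codes, 25, axis=0)
X = (X - X.mean(axis=0)) / X.std(axis=0)          # normalize features

W1 = rng.normal(0.0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 2)); b2 = np.zeros(2)

for _ in range(3000):                             # batch backpropagation
    H = np.tanh(X @ W1 + b1)                      # tan-sigmoid hidden layer
    O = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))      # log-sigmoid output layer
    dO = (O - Y) / len(X)                         # output-layer error gradient
    dH = (dO @ W2.T) * (1.0 - H ** 2)             # error propagated backward
    W2 -= 0.5 * H.T @ dO; b2 -= 0.5 * dO.sum(axis=0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(axis=0)

H = np.tanh(X @ W1 + b1)
O = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
accuracy = float(((O > 0.5) == (Y > 0.5)).all(axis=1).mean())
```

In practice, as in the paper, the trained network would be evaluated on held-out test data rather than on its training set.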
The MATLAB simulation results for the various fault conditions using the ANN controller are shown in the following figures. Without introducing any fault, the waveform under normal conditions is obtained as shown in Fig. 6; it clearly depicts the five-level output voltage waveform. By introducing faults such as open circuit, short circuit, losing drive pulse and overvoltage, the waveforms in Figs. 7, 8, 9 and 10, respectively, are obtained. Figure 11 shows the training performance curve of the ANN-based controller. The faults are introduced, tested and analyzed over different time intervals. Figure 12 shows the waveform obtained in the five-level inverter with ANN after introducing a fault in the system.
The various types of faults are detected by the corresponding binary values of the ANN (as per its training), as displayed in Table 1. The fault detections are observed during the simulation by comparing the reference output voltage waveform with the actual waveform obtained under the different fault conditions. According to the results, the values assigned to the faults by the ANN controller are 00, 01, 10 and 11, and the fault detection process can be easily assessed.
5 Conclusion
In this article, the fault detection and diagnosis of the cascaded five-level inverter
using a backpropagation algorithm-enabled artificial neural network is performed.
Different types of faults are induced in the cascaded multilevel inverter, and fault
detection and diagnosis are undertaken with reduced computation complexity. The
fault conditions considered in the paper are short-circuit fault, open-circuit fault and
overvoltage fault along with other common faults.
Funding The authors acknowledge and thank the Department of Science and Technology (Govern-
ment of India) for sanctioning the research grant for the project titled, “Design and Development
of Solar Photovoltaic Assisted Micro-Grid Architecture with Improved Performance Parameters
Intended for Rural Areas” (Ref. No. DST/TMD/CERI/RES/2020/32 (G) dated 03.06.2021) under
TMD-W&CE Scheme for completing this work.
References
1. Vanaja DS, Stonier AA, Mani G, Murugesan S (2021) Investigation and validation of solar
photovoltaic fed modular multilevel inverter for marine water pumping applications. Electr
Eng. https://doi.org/10.1007/s00202-021-01370-x
2. Jalhotra M, Sahu LK, Gupta S, Gautam SP (2021) Highly resilient fault-tolerant topology of
single-phase multilevel inverter. IEEE J Emerg Select Topics Power Electron 9(2)
3. Kumar M (2021) Open circuit fault detection and switch identification for LS-PWM H- bridge
inverter. IEEE Trans Circuits Syst—Ii: Express Briefs 68(4)
4. Majumder MG, Rakesh R, Gopakumar K, Umanand L, Al-Haddad K, Jarzyna W (2021) A
fault-tolerant five-level inverter topology with reduced component count for OEIM drives.
1 Introduction
VSIs are typically used for generating three-phase AC voltages of variable magnitude and frequency from a fixed DC source for different applications such as variable speed or torque drives [1], traction drives for electric vehicles [2], STATCOMs [3], distributed generation in power systems [4] and solar photovoltaic systems [5]. VSIs in the electrical industry have proven to be more efficient, dependable and quicker in dynamic response, as well as capable of operating de-rated motors [6]. For low-power applications, to improve the quality of the voltage source inverter output line voltage, the number of pulses is increased [7], i.e., P = 2N + 1, where N represents the number of triggering instants in a quarter cycle of the fundamental voltage. However, due to higher switching losses in power semiconductor devices, low-frequency device switching is favored at higher power levels [8]. At low switching frequency, odd harmonics surround the fundamental component in the pole voltage of a voltage source inverter (VSI) [9]. Various PWM techniques, such as the traditional SPWM, SVPWM and SHE PWM, have been proposed for enhancing inverter performance [10]. This paper presents the SHE technique, and several solutions for the bipolar PWM waveform are examined. The primary distinction among the discussed modulation schemes is the way the pulse width modulation (PWM) signals that switch the corresponding power electronic devices ON and OFF are generated [11]. The SHE PWM method was established in the early 1970s, with the inverter switching angles based on off-line calculations [12]. This strategy is based
Vinesh Agarwal and Ashish Maheshwari contributed equally to this work.
Figure 1 depicts the setup of a two-level voltage source inverter. Each leg of the inverter contains two power electronic switches, and the pole voltage of any phase is measured with respect to the DC bus midpoint 'O', i.e., V_RO, V_YO and V_BO. The upper and lower switches are operated in a complementary manner to avoid a short-circuit condition during DC supply transients. It is recommended to keep a minimal delay period during which both switches of the same inverter leg are turned OFF. While S_R1 is turned ON and S_R2 is turned OFF, the pole voltage V_RO = V_dc/2; when S_R2 is turned ON and S_R1 is turned OFF, the pole voltage V_RO = −V_dc/2. The voltage waveform of phase R is shown in Fig. 2
Fig. 1 The 3 phase two-level Voltage Source Inverter fed with a squirrel cage induction motor
592 M. Chaitanya Krishna Prasad et al.
where two switching instants (α1 and α2) occur in each quarter waveform, i.e., N = 2. The number of pulses (P) for the two switching angles is given by P = 2N + 1; here P = 5 indicates that the switching frequency is 5 times the fundamental frequency of the inverter.
It should be noted that the symmetric characteristics of the two-level PWM waveform are retained, as shown in Fig. 2, for both the quarter-wave symmetry (QWS) and half-wave symmetry (HWS) periods in each cycle of the waveform. Equations (1) and (2) give the mathematical expressions for the QWS and HWS requirements,
where θm indicates either the positive or the negative maximum angle with respect to the fundamental R-phase voltage.
The SHE PWM approach can completely eliminate (N − 1) odd non-triplen unwanted harmonics from the output line voltage, where N denotes the total number of switching angles in a quarter-wave cycle. In the present research paper, two switching angles are employed to eliminate the 5th odd harmonic while retaining the correct required fundamental voltage value. The SHE PWM approach is based on the Fourier series of the pole voltage V_RO shown in Fig. 2, written as:

V_out = Σ_{n=1}^{∞} [a_n cos(nθ) + b_n sin(nθ)]   (3)
Identification of Multiple Solutions Using Two-Step Optimization … 593
b_n = (2V_dc / nπ) [1 − 2cos(nα1) + 2cos(nα2)]   (4)
where V_dc is the DC source voltage. The values of the switching angles are found by solving the following non-linear equation set, stated in Eqs. (5) and (6), for the elimination of the 5th harmonic component while keeping a specified fundamental component:

V_1 = (2V_dc / π) [1 − 2cos(α1) + 2cos(α2)]   (5)

V_5 = (2V_dc / 5π) [1 − 2cos(5α1) + 2cos(5α2)] = 0   (6)
The optimal switching angles determined by Eqs. (5) and (6) are subject to the inequality constraint stated in Eq. (7), which enables continuous inverter operation across the whole modulation range:

0 ≤ α1 ≤ α2 ≤ π/2   (7)
F(α) = H (10)
where

F(α) = [ 1 + 2cos(α2) − 2cos(α1) ;  1 + 2cos(5α2) − 2cos(5α1) ],
H = [ M*  0 ]^T, and
α = [ α1  α2 ]^T
Next, the Jacobian matrix of the non-linear equation set is obtained using Eq. (11):

J^i(α) = [ ∂F1^i(α)/∂α1   ∂F1^i(α)/∂α2 ;  ∂F2^i(α)/∂α1   ∂F2^i(α)/∂α2 ]
       = [ 2sin(α1)   −2sin(α2) ;  10sin(5α1)   −10sin(5α2) ]   (11)
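The Newton-Raphson iteration built from these equations can be sketched as follows; the initial guess and tolerance are illustrative, and different initial guesses converge to the different solution sets discussed later in the section:

```python
import numpy as np

# Hedged sketch of the Newton-Raphson step for N = 2: solve F(alpha) = [M*, 0]^T
# using the Jacobian of Eq. (11). Guess and tolerance are assumptions.
def F(alpha, m_star):
    a1, a2 = alpha
    return np.array([1 - 2 * np.cos(a1) + 2 * np.cos(a2) - m_star,
                     1 - 2 * np.cos(5 * a1) + 2 * np.cos(5 * a2)])

def jacobian(alpha):
    a1, a2 = alpha
    return np.array([[2 * np.sin(a1), -2 * np.sin(a2)],
                     [10 * np.sin(5 * a1), -10 * np.sin(5 * a2)]])

def newton_raphson(m_star, guess_rad, tol=1e-12, max_iter=100):
    alpha = np.asarray(guess_rad, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jacobian(alpha), F(alpha, m_star))
        alpha = alpha - step                       # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return alpha
```

Sweeping M* in steps of 0.01 and warm-starting each solve from the previous solution, as described in the text, traces out one solution set across the modulation range.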
With initial values for the switching angles at M* = 0.01, the displacement vector α is obtained as follows:
The value of M* is then increased in increments of 0.01 to obtain the optimal switching angles over the whole modulation index range. Figures 3 and 4 show two distinct solution sets. The first set of switching-angle solutions is found within 60°, whereas the second set is found within 90°. As shown in Fig. 5, removal of the first significant (5th) harmonic is achieved by solution set 1 over the modulation index range M = 0 to 0.95. Compared with solution set 1, solution set 2 eliminates the 5th harmonic over a relatively limited range of M, i.e., M ≤ 0.80. Figure 6 depicts the performance of the two solution sets in terms of the weighted THD, V_WTHD: solution set 2 distinctly outperforms solution set 1 over the range 0 < M < 0.8, whereas solution set 1 provides a marginal gain at M values above 0.8. Figure 7 depicts the two-dimensional α1-α2 plane with the constraint 0 < α1 < α2 < π/2; the solutions are confined to the triangular region defined by 0 < α1, α1 < α2 and α2 < π/2, and the solid and dotted lines indicate the curves that reflect the two solution sets for SHE with N = 2.
Fig. 7 2D representation of solutions for α2 and α1
To assess the quality of the line voltage and current, the harmonic distortion measures I_THD and V_WTHD are determined by Eqs. (14) and (15). Furthermore, uncontrolled voltage harmonics introduce harmonic distortion into the inverter's line voltages.
I_THD = √( Σ_{n=6k±1} I_n² ) / I_1   (14)

V_WTHD = √( Σ_{n=6k±1} F_n²/n² ) / F_1   (15)

where k = 1, 2, 3, 4, 5, …
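Equation (15) can be evaluated directly from the Fourier coefficients of Eq. (4). In the sketch below, the common 2V_dc/π factor cancels in the ratio, and the truncation order of the harmonic sum is an assumption of this illustration:

```python
import math

# Hedged sketch of Eq. (15): weighted THD of the line voltage from the
# harmonic amplitudes of Eq. (4), summed over non-triplen odd orders
# n = 6k +/- 1 up to an assumed truncation order n_max.
def coeff(n, a1, a2):
    """Harmonic amplitude of Eq. (4) up to the common 2*Vdc/pi factor."""
    return (1 - 2 * math.cos(n * a1) + 2 * math.cos(n * a2)) / n

def v_wthd(a1_deg, a2_deg, n_max=199):
    a1, a2 = math.radians(a1_deg), math.radians(a2_deg)
    f1 = coeff(1, a1, a2)
    total = 0.0
    k = 1
    while 6 * k - 1 <= n_max:
        for n in (6 * k - 1, 6 * k + 1):
            if n <= n_max:
                total += (coeff(n, a1, a2) / n) ** 2
        k += 1
    return math.sqrt(total) / f1
```

Because the ratio is independent of V_dc, the switching angles alone determine V_WTHD, which is why the two solution sets can be ranked directly from their angle pairs.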
Fig. 9 V_RY voltage waveform of the a solution set 1, b solution set 1 harmonic spectra and c solution set 2, d solution set 2 harmonic spectra for M = 0.65 and P = 5
Fig. 10 V_RY voltage waveform of the a solution set 1, b solution set 1 harmonic spectra and c solution set 2, d solution set 2 harmonic spectra for M = 0.9 and P = 5
Table 1 Weighted harmonic distortion values and optimum switching angle values for the two solution sets

M = 0.65
  Solution set 1: α1 = 22.35°, α2 = 41.53°, simulated V_WTHD = 0.1140
  Solution set 2: α1 = 73.951°, α2 = 84.20°, simulated V_WTHD = 0.0591
M = 0.9
  Solution set 1: α1 = 21.08°, α2 = 27.96°, simulated V_WTHD = 0.0409
  Solution set 2: α1 = 87.15°, α2 = 89°, simulated V_WTHD = 0.0481
index of the two solution sets, the modulation index M = 0.9 is used in this example.
Figure 10a depicts the line voltage waveform for solution set 1 and Fig. 10c for solution set 2. In particular, no solution for the claimed 5th-harmonic removal is possible at M = 0.9 for solution set 2, resulting in a greater amplitude of the 7th harmonic compared with solution set 1, as represented by the FFT spectra of Fig. 10b, d. These clearly show that solution set 1 achieves full 5th-harmonic removal, exhibiting better harmonic minimization or elimination at M = 0.9. Table 1 summarizes the switching angles and overall distortion values of the two solution sets. Figures 5 and 6 indicate that the modulation index range is extended from 0.8 to 0.95 with reduced total harmonic distortion. These findings can be further applied to induction motor drives, renewable energy integration, real-time microgrid optimization for home appliances, etc.
5 Conclusion
This work reports an expansion of the solutions linked to SHE approaches for two-level inverters. A hybrid GA-NR approach is used to identify multiple sets of solutions for the two switching angles in a quarter cycle. Several solution sets were compared and analyzed. Compared with solution set 1, whose 5th-harmonic elimination covers M = 0-0.95, solution set 2 covers only the narrower range M = 0-0.79. With regard to voltage THD performance, solution set 2 distinctly outperforms solution set 1 up to M = 0.8, while solution set 1 assures better performance beyond M = 0.8. The simulation findings for the bipolar-type waveform confirm the robustness of the results and of the recommended solution augmentation.
References
1. Iqbal A, Khan MA (2008) A simple approach to space vector PWM signal generation for a
five-phase voltage source inverter. Ann IEEE India Conf 2008:418–424. https://doi.org/10.
1109/INDCON.2008.4768760
2. Su G, Tang L (2011) Current source inverter based traction drive for EV battery charging
applications. IEEE Veh Power Propuls Conf 2011:1–6. https://doi.org/10.1109/VPPC.2011.
6043143
3. Kantaria RA, Joshi SK, Siddhapura KR (2011) A novel hysteresis control technique of VSI
based STATCOM. India Int Conf Power Electron 2010(IICPE2010):1–5. https://doi.org/10.
1109/IICPE.2011.5728110
4. Kantaria RA, Joshi SK, Siddhapura KR (2011) A novel hysteresis control technique of VSI
based STATCOM. India Int Conf Power Electron 2010(IICPE2010):1–5. https://doi.org/10.
1109/IICPE.2011.5728110
5. Meshram S, Agnihotri G, Gupta S (2012) The steady state analysis of Z-source inverter based
solar power generation system. In: 2012 IEEE 5th India international conference on power
electronics (IICPE), pp 1–6. https://doi.org/10.1109/IICPE.2012.6450366
6. Kharjule S (2015) Voltage source inverter. Int Conf Energy Syst App 2015:537–542. https://
doi.org/10.1109/ICESA.2015.7503407
7. Holmes D, Lipo T (2003) Pulse width modulation for power converters: principles and practice.
Wiley
8. Tripathi A, Narayanan G, Investigations on optimal pulse-width modulation to minimize total harmonic distortion in the line current
9. Abdul Azeez N, Mathew J, Gopakumar K, Cecati C (2013)A 5th and 7th order harmonic
suppression scheme for open-end winding asymmetrical six-phase IM drive using capacitor-
fed inverter. In: IECON 2013—39th annual conference of the IEEE industrial electronics
society, pp 5118–5123. https://doi.org/10.1109/IECON.2013.6699966
10. Sinha A, Jana KC, Das MK, An inclusive review on different multi-level inverter topologies,
their modulation and control strategies for a grid connected photo-voltaic system
11. Corzine KA, Wielebski MW, Peng F, Wang J (2003) Control of cascaded multi level inverters
in electrical machines and drives conference, IEMDC’03. IEEE Int 149–1555
12. Omara AM, Moschopoulos G (2018) Implementation of SHE-PWM technique for parallel
voltage source inverters employed in uninterruptible power supplies. IEEE Int Telecommun
Energy Conf (INTELEC) 2018:1–6. https://doi.org/10.1109/INTLEC.2018.8612396
13. Yang K, Fu S, Hu H, Yuan R, Yu W (2010) Real solution number of the nonlinear equations in
the SHEPWM technology. Int Conf Intell Control Inf Process 2010:446–450. https://doi.org/
10.1109/ICICIP.2010.5565322
14. Omara AM, Sleptsov M, El-Nemr MK (2018) Genetic algorithm optimization of SHE-PWM
technique for paralleled two-module VSIs employed in electric drive systems. In: 2018 25th
international workshop on electric drives: optimization in control of electric drives (IWED),
pp 1–6. https://doi.org/10.1109/IWED.2018.8321380
15. Ahmad S, Ashraf I, Iqbal A, Fatimi MAA (2018) SHE PWM for multilevel inverter using mod-
ified NR and pattern generation for wide range of solutions. In: 2018 IEEE 12th international
conference on compatibility, power electronics and power engineering (CPE-POWERENG
2018), pp 1–6. https://doi.org/10.1109/CPE.2018.8372498
16. Artificial Neural Network and Newton Raphson (ANN-NR) Algorithm Based Selective Har-
monic Elimination in Cascaded Multilevel Inverter for PV Applications SANJEEVIKUMAR
PADMANABAN, (Senior Member, IEEE), C. DHANAMJAYULU 2 , (Member, IEEE), AND
BASEEM KHAN 3 , (Member, IEEE)
17. Patil SD, Kadwane SG (2017) Application of optimization technique in SHE controlled multi-
level inverter. In: 2017 international conference on energy, communication, data analytics and
soft computing (ICECDS), pp 26–30. https://doi.org/10.1109/ICECDS.2017.8390050
18. Kavousi A, Vahidi B, Salehi R, Bakhshizadeh MK, Application of the Bee Algorithm for
selective harmonic elimination strategy in multilevel inverters
19. Jiang Y, Li X, Qin C, Xing X, Chen Z (2022) Improved particle swarm optimization based
selective harmonic elimination and neutral point balance control for three-level inverter in low-
voltage ride-through operation. IEEE Trans Ind Inf 18(1):642–652. https://doi.org/10.1109/TII.
2021.3062625
20. Deniz E, Aydogmus O, Implementation of ANN-based selective harmonic elimination PWM
using hybrid genetic algorithm-based optimization
21. Kato T (1999) Sequential homotopy-based computation of multiple solutions for selected
harmonic elimination in PWM inverters. IEEE Trans Circ Syst I: Fund Theory Appl 46(5):586–
593. https://doi.org/10.1109/81.762924
22. Guan Eryong, Song Pinggang, Ye Manyuan, Bin Wu (2005) Selective harmonic elimination
techniques for multilevel cascaded H-bridge inverters. Int Conf Power Electron Drives Syst
2005:1441–1446. https://doi.org/10.1109/PEDS.2005.1619915
23. Mythili M, Kayalvizhi N (2013) Harmonic minimization in multilevel inverters using selective
harmonic elimination PWM technique. Int Conf Renew Energy Sustain Energy (ICRESE)
2013:70–74. https://doi.org/10.1109/ICRESE.2013.6927790
24. Dahidah MSA, Agelidis VG (2007) Non-Symmetrical selective harmonic elimination PWM
techniques: the unipolar waveform. IEEE Power Electron Spec Conf 2007:1885–1891. https://
doi.org/10.1109/PESC.2007.4342290
A Review on Recent Trends in Charging
Stations for Electric Vehicles
1 Introduction
The international introduction of electric vehicles (EVs) will change how private
passenger cars are used, operated, and managed [1]. To construct a large number of
electric vehicle charging stations at appropriate locations, a multi-level layout
planning model that simultaneously minimizes the initial construction investment and
the users' charging cost is necessary [2]. With the increasing popularity of electric
vehicles and growing awareness of renewable energy systems, EV charging should
enable EV owners to align with the available RES generation and the available
charging time while maximizing profit by exploiting the variation in grid prices [3].
A renewable-energy-supported system has the potential to meet the increasing
charging demand and to reduce the effect of EV charging on the grid [4]. The rise of
renewable energies on the one hand, and of electric cars on the other, often requires
expensive expansion of the low-voltage network. Alternative approaches, such as
vehicle-to-grid (V2G) applications, can circumvent these expansion measures [5].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 601
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_46
602 V. C. Thombare et al.
2 EV Charging Technology
For EV charging stations, the following standards have been mentioned in the
literature.
IEC Standards
AC charging is defined at three power levels.
Level 1: 120 V, single-phase supply, charging current of 12 to 16 A.
Level 2: 240 V, single-phase supply, current up to 60 A.
Level 3: 400 V AC, three-phase supply, current from 32 to 63 A. Level 3 is used for
fast charging, with charging times of around 30 min.
DC fast charging: the electric current reaches 400 A, with power ratings from
100 to 200 kW. A DC charger charges faster than an AC charger.
The AC charging levels of the SAE standards are the same as the IEC standards,
but the DC charging system of the SAE standards differs from the IEC one. It is
divided into three levels, and the DC output voltage can be adjusted to suit various
EVs and batteries.
SAE Standards
Level 1: 80 A of electric current, rated power 40 kW.
Level 2: 200 A of electric current, rated power 90 kW.
Level 3: 400 A of electric current, rated power 240 kW.
CHAdeMO Standards
This standard was developed by the Tokyo Electric Power Company (TEPCO). The
electric current is 400 A and the rated power is 240 kW [6].
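The levels quoted above can be collected into a small lookup table. The following sketch uses the figures from the text; the field names and the rough AC power estimate are our own illustrative additions.

```python
# Charging levels from the text. Field names are our own;
# current ranges are stored as (min_A, max_A).
IEC_AC_LEVELS = {
    1: {"voltage_v": 120, "phases": 1, "current_a": (12, 16)},
    2: {"voltage_v": 240, "phases": 1, "current_a": (0, 60)},
    3: {"voltage_v": 400, "phases": 3, "current_a": (32, 63)},
}

SAE_DC_LEVELS = {
    1: {"current_a": 80,  "power_kw": 40},
    2: {"current_a": 200, "power_kw": 90},
    3: {"current_a": 400, "power_kw": 240},
}

def max_ac_power_kw(level: int) -> float:
    """Rough upper-bound AC power for an IEC level.

    Single phase: P = V * I. Three phase: P = sqrt(3) * V_line * I.
    """
    spec = IEC_AC_LEVELS[level]
    i_max = spec["current_a"][1]
    factor = 3 ** 0.5 if spec["phases"] == 3 else 1.0
    return factor * spec["voltage_v"] * i_max / 1000.0
```

For example, IEC Level 1 tops out at roughly 2 kW, which is why it is only suitable for slow overnight charging.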
One important reason is energy storage: nothing compares with the specific energy
of gasoline, around 10,000 Wh/kg, against roughly 150 Wh/kg for the best Li-ion
battery. Alternatives to batteries, such as flywheels and ultracapacitors, present the
same energy limitation as electrochemical batteries [7] (Figs. 1 and 2).
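The roughly 67:1 specific-energy gap quoted above translates directly into pack mass. A quick back-of-the-envelope check, using the figures from the text and an assumed, purely illustrative 60 kWh pack:

```python
# Specific energies quoted in the text (Wh/kg).
GASOLINE_WH_PER_KG = 10_000
LIION_WH_PER_KG = 150

def pack_mass_kg(energy_kwh: float, wh_per_kg: float) -> float:
    """Mass of storage needed to hold a given amount of energy."""
    return energy_kwh * 1000.0 / wh_per_kg

# A 60 kWh pack (an assumed mid-size EV figure, not from the text):
battery_mass = pack_mass_kg(60, LIION_WH_PER_KG)    # 400 kg of Li-ion cells
fuel_mass = pack_mass_kg(60, GASOLINE_WH_PER_KG)    # 6 kg of gasoline-equivalent
```

The same 60 kWh needs 400 kg of cells but only about 6 kg of gasoline, which is the energy-storage limitation the text refers to.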
A standalone PV system has no connection to the grid. Off-grid systems are suitable
for EV charging stations along roads, and backup batteries are the most common form
of energy storage for them. PV-battery energy storage charging stations work either
on grid power or on solar power, and both have advantages and limitations: grid
power increases the reliability of the system but also increases the cost of energy,
while solar power decreases the cost of energy but reduces the reliability of the
system [8]. The PV array delivers power to the DC link through boost converters,
whereas the DG set and grid exchange power at the PCC; local loads also draw power
from the PCC. A filter placed on both sides of the switch removes the switching
noise in islanded and DG-set or grid-connected modes [9]. Maximum power point
tracking (MPPT) is performed by a boost converter, which lets the wind turbine
operate at its maximum power point. The energy storage unit (ESU) is connected to
the DC bus through a bidirectional buck-boost converter, and excess power from the
renewable energy sources charges the ESU [10]. This paper proposes a hybrid
charging station for electric vehicles that uses both solar and conventional energy.
The charging station charges the EV from solar power when it is available; when
solar power is not available, it uses grid power. When solar power is available but
no EV is present for charging, grid-tie inverter technology feeds the power back to
the grid. A grid-tie inverter generally uses a transformer to step the voltage, which
makes the system costly, so solar power is instead fed to the grid through a
voltage-source PWM inverter. When the DC-bus charging station is fed from the PV
array and operates in islanded mode, the power supply is limited by the PV system;
hence the MPPT algorithm is applied to the PV array. The DC bus voltage can be
affected by an increase in the number of PEVs or by weather conditions (Table 1).
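The text does not specify which MPPT algorithm is used; perturb-and-observe is a common choice, and a minimal sketch of one iteration might look like the following (the step size and variable names are illustrative assumptions):

```python
def perturb_and_observe(v, p, prev_v, prev_p, step=0.5):
    """One perturb-and-observe iteration: return the next reference voltage.

    If the last perturbation increased the power, keep moving in the same
    direction; otherwise reverse. v and p are the present PV (or turbine)
    operating voltage and power; prev_v and prev_p are the previous sample.
    """
    dv = v - prev_v
    dp = p - prev_p
    if dp == 0:
        return v              # at (or very near) the peak: hold position
    if (dp > 0) == (dv > 0):
        return v + step       # the last move raised power: continue
    return v - step           # the last move lowered power: reverse
```

Driven against any single-peaked power curve, this rule climbs toward the maximum and then oscillates within one step of it, which is the classic P&O trade-off between tracking speed and steady-state ripple.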
Charging stations for EVs supplied by small-scale wind energy systems are
reasonable for the following reasons:
(1) Immense advancement in power converter topologies for small-scale wind energy
systems
(2) Excess electricity production by the system from the slow winds that are frequent
The economics of the wind energy system can be improved by absorbing this excess
wind production, and the EV can absorb it [11].
MPPT is performed by a boost converter that enables the wind turbine to operate at
maximum power. The energy storage system is connected to the DC bus through a
bidirectional buck-boost converter and can charge the EV when power from the wind
turbine is not available. This charging station can be installed at shopping malls,
universities, and similar locations. To ensure reliable operation of the charging
station, different operating modes need to be considered (Figs. 3, 4 and Table 2):
Mode 1: WPCS with grid connection.
Mode 2: Inversion operation.
Mode 3: Rectification operation.
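The three operating modes above could be driven by a simple supervisory rule. The sketch below is purely illustrative: the thresholds and the mapping from power balance to mode are our assumptions, not taken from the paper.

```python
def select_mode(wind_power_kw: float, ev_demand_kw: float,
                battery_soc: float) -> str:
    """Toy supervisory logic for the wind-powered charging station.

    Mode names follow the text: grid-connected WPCS, inversion (surplus fed
    to the grid), rectification (grid supplies the DC bus). The SoC
    thresholds below are illustrative assumptions.
    """
    surplus = wind_power_kw - ev_demand_kw
    if surplus > 0 and battery_soc >= 0.9:
        return "inversion"        # storage full: export the surplus
    if surplus < 0 and battery_soc <= 0.2:
        return "rectification"    # deficit and storage depleted: draw from grid
    return "grid-connected"       # normal operation; storage buffers the rest
```

In a real controller these decisions would also respect converter ratings and ramp limits, but the structure (measure balance, check storage, pick a mode) is the same.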
The converter circuits discussed in the literature include the boost converter [17],
the buck-boost converter [16], the dual converter [18], the isolated converter [19],
the Cuk converter [20], and the SEPIC converter [21].
5 Conclusion
This review has covered some of the designs and converter topologies required for
the effective operation of electric vehicle charging stations, focusing on the design
of MPPT controllers, the hybridization of charging stations, the software required
for better operation of the charging station, and grid synchronization. The demands
placed on PV modules have also been reviewed; in particular, effective designs for
the charging station have been investigated. According to the above discussion, the
converter should boost the voltage to meet the demand of the electric vehicle, and
the specifications of the DC-DC converters for this application were investigated.
Compared with a standalone charging station, a hybrid charging station offers
greater reliability.
References
1. Foley AM, Winning IJ, Ó Gallachóir BP (2010) State-of-the-art in electric vehicle charging
infrastructure. In: 2010 IEEE vehicle power and propulsion conference, pp 1–6
2. Jin M, Shi R, Zhang N, Li Y (2012) Study on multi-level layout planning of electric vehicle
charging stations based on an improved genetic algorithm, pp 5–10
3. Li H, Liu H, Ji A, et al (2013) Design of a hybrid solar-wind powered charging station for
electric vehicles. In: 2013 international conference on materials for renewable energy and
environment, pp 977–981
4. Wang R, Wang P, Xiao G (2014) Two-stage mechanism design for electric vehicle charging
involving renewable energy. In: 2014 international conference on connected vehicles and expo
(ICCVE), pp 421–426
5. Aldejohann C, Maasmann J, Horenkamp W, et al (2014) Testing environment for vehicle to
grid (V2G) applications for investigating a voltage stability support method. In: 2014 IEEE
transportation electrification conference and expo (ITEC), pp 1–6
6. Dost P, Bouabana A, Sourkounis C (2014) On analysis of electric vehicles DC quick-chargers
based on the CHAdeMO protocol regarding the connected systems and security behaviour. In:
IECON 2014—40th annual conference of the IEEE industrial electronics society, pp 4492–4497
7. Takeda K, Takahashi C, Arita H, et al (2014) Design of hybrid energy storage system using
dual batteries for renewable applications. In: 2014 IEEE PES general meeting | conference
exposition, pp 1–5
8. Nizam M, Wicaksono FXR (2018) Design and optimization of solar, wind, and distributed
energy resource (DER) hybrid power plant for electric vehicle (EV) charging station in rural
area. In: 2018 5th international conference on electric vehicular technology (ICEVT), pp 41–45
9. Verma A, Singh B (2018) A Solar PV, BES, Grid and DG set based hybrid charging station
for uninterruptible charging at minimized charging cost. In: 2018 IEEE industry applications
society annual meeting (IAS), pp 1–8
10. Vijayakumar R (2018) Design of public plug-in electric vehicle charging station for improving
LVRT capability of grid connected wind power generation. In: 2018 international conference
on soft computing and network security (ICSNS), pp 1–6
11. Koochaki A, Divandari M, Amiri E, Dobzhanskyi O (2018) Optimal design of solar-wind
hybrid system using teaching-learning based optimization applied in charging station for
electric vehicles. In: 2018 IEEE transportation electrification conference and expo (ITEC), pp
1–6
12. Narula A, Verma V (2018) PV fed cascaded modified T source converter for DC support to
grid coupled inverters. In: 2018 IEEE international conference on power electronics, drives
and energy systems (PEDES), pp 1–6
13. Uno M, Sugiyama K (2019) Switched capacitor converter based multiport converter integrating
bidirectional PWM and series-resonant converters for standalone photovoltaic systems. IEEE
Trans Power Electron 34:1394–1406. https://doi.org/10.1109/TPEL.2018.2828984
14. Jensanyayut T, Phongtrakul T, Yenchamchalit K, Kongjeen Y (2020) Design of solar-powered
charging station for electric vehicles in power distribution system, pp 7–10. https://doi.org/10.
1109/iEECON48109.2020.229545
15. Fareed N, Kumar MVM (2020) Single stage grid tied solar PV system with a high gain bi-
directional converter for battery management. In: 2020 international conference on power
electronics and renewable energy applications (PEREA), pp 1–6
16. Nesrin AKN, Sukanya M, Joseph KD (2020) Switched dual input buck-boost inverter for
continuous power operation with single stage conversion. In: 2020 international conference on
power electronics and renewable energy applications (PEREA), pp 1–6
17. Singh S, Manna S, Hasan Mansoori MI, Akella AK (2020) Implementation of perturb &
observe MPPT technique using boost converter in PV system. In: 2020 international conference
on computational intelligence for smart power system and sustainable energy (CISPSSE), pp
1–4
18. Tayebi SM, Chen X, Batarseh I (2020) Control design of a dual-input LLC converter for
PV-battery applications. In: 2020 IEEE applied power electronics conference and exposition
(APEC), pp 917–921
19. Wei Y, Luo Q, Mantooth A (2020) A function decoupling partially isolated high voltage gain
DC/DC converter for PV application. In: 2020 IEEE transportation electrification conference
expo (ITEC), pp 1–5
20. Sudiharto I, Murdianto FD, Budikarso A, Wibisana A (2020) CUK converter using FLC to
manage power consumption from PV directly. In: 2020 international conference on applied
science and technology (iCAST), pp 575–579
21. Manikandan K, Sivabalan A, Sundar R, Surya P (2020) A study of Landsman, SEPIC and Zeta
converters by particle swarm optimization technique. In: 2020 6th international conference on
advanced computing and communication systems (ICACCS), pp 1035–1038
IoT-Based Vehicle Charging Eco System
for Smart Cities
1 Introduction
Batteries have become the dominant form of energy storage in electric vehicles
(EVs). Over the past few years, transportation has changed substantially, driven by
steadily growing social demand [1]. Since the battery is the component that is most
often depleted, estimating its state of charge is now of fundamental importance.
Because CO2 emissions from industry and transportation keep rising, electric cars
are being adopted as a way to replace combustion engines. This policy has been
claimed to lower CO2 emissions, while breakthroughs in new, cleaner combustion
technologies have been slow to arrive; electric vehicles have therefore been
identified as a way to curb CO2 discharges. As the share of electric cars improves
and EVs spread around the globe, there is interest in installing electric vehicle
charging machines in car parks and networks. Together with Enel, a worldwide power
company, Nissan launched a vehicle-to-grid (V2G) trial for full-size cars in the UK.
Nissan has been researching V2G networks for some time, and these trials are the
first of their kind in the UK and among the most ambitious partnerships to date. The
integration of smart grid equipment into power systems will bring profound change
in how these structures are owned and operated [2]. In addition to the spread of
distributed non-conventional power sources, deliberately shifting load offers an
optimal way to manage both planning cost and output.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 611
H. Muthusamy et al. (eds.), Robotics, Control and Computer Vision, Lecture Notes
in Electrical Engineering 1009, https://doi.org/10.1007/978-981-99-0236-1_47
612 N. Dinesh Kumar and F. B. Shiddanagouda
2 Literature Survey
As more countries move towards pollution-free traffic, EVs are gaining momentum
around the world, and EV charging facilities become a basic requirement as the
number of EVs grows. An IoT device can streamline EV charging and monitor its
consequences, an approach that is useful for transportation systems and V2G
services. This new system would improve public planning and make city life easier:
the whole V2G infrastructure can be handled effectively through IoT, which saves
both time and resources. The job is to design a smart application that communicates
with the grid and understands its different tariff rates, covering both the grid
energy delivery rate and the grid power take-off rate. If the customer has charged
the car battery fully, he can supply some power back to the grid and collect payment.
Using the ARM Mbed controller, the battery state of charge (SoC) is measured and
sent to the cloud, and the program can also show the user's battery status (SoC)
relative to the grid [7].
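The buy/sell decision described above, based on the battery SoC and the two tariff rates, can be sketched as follows; the thresholds are illustrative assumptions, not taken from the paper.

```python
def v2g_action(soc: float, buy_tariff: float, sell_tariff: float,
               soc_full: float = 0.95, soc_min: float = 0.3) -> str:
    """Decide whether to charge, discharge to the grid, or idle.

    soc is the battery state of charge in [0, 1]. buy_tariff is the grid
    energy delivery rate and sell_tariff the grid power take-off rate
    mentioned in the text. The SoC thresholds are illustrative.
    """
    if soc >= soc_full and sell_tariff > buy_tariff:
        return "sell"    # battery full and export pays more: feed the grid
    if soc < soc_min:
        return "charge"  # protect the battery: always recharge when low
    return "idle"
```

A deployed controller would evaluate this rule on each tariff update from the grid, which is exactly the information the smart application is meant to expose to the user.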
3 Methodology
As more nations move towards pollution-free traffic, EVs are gaining prestige across
the globe, and as the number of EVs increases, an EV charging structure becomes an
integral necessity. The technique proposed here is useful for transportation
networks and V2G structures: it strengthens the planning of the city, makes life
easier, and handles the whole V2G scheme while saving time and resources. The job is
to develop an application that communicates with the system and takes account of
the system's real tariff rates. The application likewise displays the client's
battery state of charge (SoC) as the vehicle connects to the system.
A. Existing System
The existing system provides an electric car battery-charging system with wired
charging and control infrastructure. Rising oil prices and environmental problems
have led to growing demand for clean-vehicle technology such as battery and
fuel-cell EVs. Electric vehicles (EVs) are now becoming a more attractive choice
than conventional vehicles (CVs). EVs are powered by electric batteries that have to
be recharged with energy from the grid, so EVs are a direct link between the
transport and electricity sectors. Moreover, since they have low energy usage and
zero tailpipe emissions, EVs are well positioned to reduce the environmental impact
of transport and energy dependency. Battery chargers fall into two widely used
types, off-board and on-board, with unidirectional or bidirectional power flow.
During the daytime, on-board chargers can be used to charge from an electrical
socket at the office, at home, or at a shopping centre. Off-board charging is
similar to refuelling a conventional vehicle at a gas station, but the purpose is to
charge quickly. In contrast, on-board charging requires less dedicated
infrastructure. The existing system is shown in Fig. 1.
B. Proposed System
In the proposed framework we use the Raspberry Pi controller board, which functions
as a small computer: once a keyboard and a mouse or trackpad are added, it can be
used for much of what a PC does. The Raspberry Pi is interfaced with external
modules to create the vehicle charging device. Three separate passive RFID tags are
used, two of which are authorized and one unauthorized. The RFID tags enable
customer details to be identified and billing to be automated. Consumer data is
stored on the ThingSpeak IoT server, where the state of the charging battery can be
checked and the data downloaded and analyzed. Every cloud channel has its own API
key and IP address, and a channel can be made private or public.
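ThingSpeak channel updates are plain HTTP requests keyed by the channel's write API key. A minimal sketch of how the controller could push readings to the cloud follows; the field assignments and the placeholder key are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def update_url(api_key: str, **fields) -> str:
    """Build a ThingSpeak channel-update URL (e.g. field1=SoC, field2=balance)."""
    params = {"api_key": api_key}
    params.update(fields)
    return THINGSPEAK_UPDATE + "?" + urlencode(params)

def push(api_key: str, **fields) -> bytes:
    """Send the update; ThingSpeak replies with the new entry id (0 on failure)."""
    with urlopen(update_url(api_key, **fields)) as resp:
        return resp.read()

# Example with a placeholder key (replace with the channel's write key):
# push("XXXXXXXXXXXXXXXX", field1=80, field2=990)
```

Because each channel has its own write key, separating the key from the field data as above keeps the controller code reusable across channels.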
C. Working Principle
Electric vehicles today carry a significant charge and have become important for
both public and business use. At the same time, the battery can run flat,
particularly in emergency cases where access to a regular charger is not feasible.
To cope with this problem, coin-operated battery chargers have been built. These
machines work like the coin-operated public telephones that were once popular: when
a coin is inserted into the slot, the machine compares the captured coin image with
a reference stored in its database. If the current picture and the saved photograph
match, this is shown on the display, and the user can then connect the device to the
billing plug, with billing based on the coin at fixed charges. In the proposed
method a similar but more secure and easier-to-use technology is chosen: the plastic
RFID tag card (Fig. 2).
The heart of the block diagram is the Raspberry Pi, which interfaces with modules
such as a voltage sensor, RFID reader, LCD, chargeable battery, and Wi-Fi. The
Raspberry Pi controller and the external modules are fed from a 230 V step-down
transformer whose 12 V AC output is converted at a bridge rectifier, with near-ideal
filtering, to 12 V DC (EM-18 reader module, TTL pin). Three separate passive RFID
(radio frequency identification) tags are used with protected authorization. The LCD
monitor displays which card has been swiped. Using the EM-18 reader module, the user
first swipes the RFID tag and then inputs the amount to be deducted for charging the
car. All the tags are assumed to be recharged with Rs 1000. The complete flow of
charging a battery is shown in Fig. 3.
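The swipe-and-deduct flow above can be sketched as follows. The tag names and the Rs 1000 starting balance follow the text; the function shape and the messages are illustrative.

```python
# Balances keyed by RFID tag id. Per the text, each authorized tag starts
# with Rs 1000; the unauthorized tag (C3) is simply absent from the table.
balances = {"C1": 1000, "C2": 1000}

def swipe(tag_id: str, amount: int) -> str:
    """Deduct `amount` rupees for a charging session, or reject the tag."""
    if tag_id not in balances:
        return "unauthorized tag - charging refused"
    if balances[tag_id] < amount:
        return "insufficient balance"
    balances[tag_id] -= amount
    return f"debited Rs {amount}, balance Rs {balances[tag_id]}"
```

In the real system the lookup would hit the ThingSpeak record for the tag rather than an in-memory table, but the authorize/deduct/report sequence is the same one the LCD messages reflect.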
4 Simulation Tools
1. Operating System—Raspberry Pi OS
The Raspberry Pi is a single-board computer based on an ARM processor with built-in
graphics and sound. If a keyboard and a mouse or trackpad are attached, it can be
used for much the same tasks as a PC. Storage can take the form of an SD card or a
hard drive connected over USB. Operating system images can be obtained from the
Raspberry Pi downloads page or via NOOBS, which offers several operating systems,
including OSMC, a Kodi-like media player. The board can also be used to practice
programming in several languages, from ones suitable for a ten-year-old up to Python
and Java, and external electronics such as musical instruments, lights, motors, and
robots can be operated from it.
2. Editor and Compiler—Python
Python is a high-level, interpreted, open-source programming language that is very
easy to use; it is also known to be a very powerful language. Python is well suited
to this kind of programming, and consumer applications and gaming applications can
be built with it very quickly.
3. ThingSpeak
ThingSpeak is an open-source cloud platform where real-time sensor information is
uploaded; the information can be downloaded, reviewed, and used for one's own
purposes. Each cloud channel has its own API key and IP address, and channels may be
made public or private.
In the application, the user can see the live data and may also use it to identify
the locations of charging stations. As soon as the consumer learns the state of his
vehicle battery, he can easily decide whether to keep feeding power to the grid or
to take power from it, based primarily on the tariff rates. To get the desired
results, the IoT architecture uses sensors (or controls), so an essential operating
system is employed, and devices such as a mobile phone or tablet PC are used to
examine the final results, which reduces the effort needed to read the measurements.
Figure 4 shows the complete hardware circuit.
The LCD display in Figs. 5 and 6 shows a particular RFID tag, C1, debited with 10
rupees for charging, with a remaining balance of 990 rupees on the card. The battery
voltage level is 13 V, and T indicates the ThingSpeak web server where the data is
stored in the cloud. Hence, the proportional amount of charging is delivered to the
battery of user C1. If an unauthorized tag is used, no user data is available and
the battery charging does not proceed.
Figure 7 shows the graphical representation of user C1's data on the ThingSpeak IoT
server web page, displaying the balance on the card along with the date and time.
Similarly, Fig. 8 shows the graphical representation of user C2's data, and Fig. 9
shows the output data for the unauthorized RFID tag C3.
The voltage levels of the battery before and after charging are shown graphically in
ThingSpeak in Figs. 10 and 11. In the ThingSpeak plots, the time needed to charge
the battery for the deducted amount is clearly visible.
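The SoC shown on ThingSpeak can be estimated from the measured battery voltage. A crude linear map for a nominal 12 V battery is sketched below; the endpoint voltages are our assumptions, not values from the paper.

```python
def soc_from_voltage(v: float, v_empty: float = 11.8,
                     v_full: float = 13.0) -> float:
    """Linear voltage-to-state-of-charge estimate, clamped to [0, 1].

    The 11.8 V (empty) and 13.0 V (full) endpoints are illustrative
    assumptions for a nominal 12 V battery; a real estimator would use
    the cell chemistry's open-circuit-voltage curve.
    """
    soc = (v - v_empty) / (v_full - v_empty)
    return min(1.0, max(0.0, soc))
```

Under this map, the 13 V reading quoted above would correspond to a full battery, which is consistent with the before/after plots showing the voltage rising toward that value as charging completes.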
5 Conclusion
This article reviewed the battery-charging infrastructure for an electric vehicle.
Transportation is the greatest cause of environmental emissions in any region, and
to address the climate crisis we need to make the cars on our highways as clean as
possible. Vehicle pollutants are not only bad for our atmosphere, they are bad for
our wellbeing: air emissions from gasoline- and diesel-powered vehicles cause
asthma, bronchitis, cancer, and premature death. The electronic vehicle charging
system presented here shows promising results. In this paper, passive RFID tags are
used to identify customer information and to bill automatically. The customer data
is stored in the ThingSpeak IoT cloud server, so the server is always up to date on
the status of every customer. Therefore, the implementation of an EV charging
management system (CMS) is important to organize these large charging demands
automatically and effectively by leveraging the benefits of IoT technology.
References
1. Benedetto M, Ortenzi F, Lidozzi A, Solero L (2021) Design and implementation of reduced grid
impact charging station for public transportation applications. World Electr Veh J 12:28
2. Vermesan O, Friess P (2013) Internet of things: converging technologies for smart environments
and integrated ecosystems. River Publishers
3. Yao L, Chen YQ, Lim WH (2015) Internet of things for electric vehicle: an improved decen-
tralized charging scheme. In: Proceedings of the 2015 IEEE international conference on data
science and data intensive systems, Sydney, Australia, 11–13 December 2015, pp 651–658
4. Sousa RA, Monteiro V, Ferreira JC, Melendez AA, Afonso JL, Afonso JA (2018) Development
of an IoT system with smart charging current control for electric vehicles. In: Proceedings of the
IECON 2018—44th annual conference of the IEEE industrial electronics society, Washington,
DC, USA, 21–23 October 2018, pp 4662–4667
5. Yao L et al (2015) Internet of things for electric vehicle: an improved decentralized charging
scheme. In: 2015 IEEE international conference on data science and data intensive systems, pp
651–658
6. Sharma E, Bharath S, Devaramani A, Deepti SR, Kumar S (2019) IoT enabled smart charging
stations for electric vehicles. J Telecommun Study 2:34–39