Machine Intelligence
Features:
■ Motion images object detection over voice using deep learning algorithms
■ Ubiquitous computing and augmented reality in HCI
■ Learning and reasoning using Artificial Intelligence
■ Economic sustainability, mindfulness, and diversity in the age of artificial
intelligence and machine learning
■ Streaming analytics for healthcare and retail domains
Pethuru Raj is the chief architect in the Site Reliability Engineering (SRE) division
of Reliance Jio Platforms Ltd., Bangalore, India. He previously worked as a cloud
infrastructure architect in the IBM Global Cloud Center of Excellence (CoE), as
a TOGAF-certified enterprise architecture (EA) consultant in Wipro Consulting
Services (WCS) Division and as a lead architect in the corporate research (CR)
division of Robert Bosch. In total, he has more than 19 years of IT industry experi-
ence and 8 years of research experience.
Machine Intelligence
Computer Vision and Natural
Language Processing
Edited by
Pethuru Raj, P. Beaulah Soundarabai, and
D. Peter Augustine
First edition published 2024
by CRC Press
2385 Executive Center Drive, Suite 320, Boca Raton, FL 33431
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2024 Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or
contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-
8400. For works that are not available on CCC, please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used
only for identification and explanation without intent to infringe.
ISBN: 9781032201993 (hbk)
ISBN: 9781032543727 (pbk)
ISBN: 9781003424550 (ebk)
DOI: 10.1201/9781003424550
Typeset in Garamond
by Newgen Publishing UK
Preface
data into insights is being hugely simplified and speeded up. By feeding the know-
ledge discovered into IoT devices, the days of having cognitive devices are at hand.
Such edge devices can exhibit intelligent behavior.
Machine and deep learning (ML/DL) algorithms along with a host of enabling
frameworks, pre-trained models, and specialized accelerators come in handy in pro-
viding computer vision (CV) and natural language processing (NLP) applications.
These unique capabilities are being replicated in resource-intensive IoT devices and
machines. Machines that can see and perceive can bring forth deeper and decisive
acceleration, automation, and augmentation capabilities to businesses as well as
people in their everyday assignments and engagements. Machine vision is becoming
the grandiose reality with the distinct advances in computer vision and device instru-
mentation spaces. Machines are increasingly software-defined. That is, AI-enabled
machines are being deployed in mission-critical locations to minutely monitor
everything that is happening there and to take decisive actions in time with all the
clarity and alacrity.
These pioneering developments have laid down a stimulating and scintillating
foundation to visualize and realize a variety of sophisticated applications not only
for enterprising businesses but also for people at large. Empowering machines with
sensing, vision, perception, analytics, knowledge discovery and dissemination,
decision-making and action capabilities through a host of breakthrough digital tech-
nologies and tools goes a long way in establishing and sustaining intelligent systems
and societies. In an industrial setting, these technologies in conjunction with high-
speed and adaptive networking, such as 5G and Wi-Fi 6, can lead to a new indus-
trial revolution. Edge AI-inspired robots, drones, gizmos, and gadgets are bound to
facilitate smart manufacturing and Industry 4.0 and 5.0 applications. Similarly the
next-generation digitally transformed healthcare domain is set to immensely benefit
society through such vision-enabled machines. Technology-induced innovations can
aid experts and workers in conceiving and concretizing fresh ways and means to per-
form low-waste and high-efficiency industrial activities.
Machines are enabled to be interactive with other machines in the vicinity and
with remote machines through a host of networking options. Similarly, through nat-
ural interfaces, machines and humans are able to interact purposefully. That is, human-
machine interfaces (HMIs) are becoming a grand reality these days with a plethora
of technological innovations and disruptions. Just as convolutional neural networks
(CNNs) facilitate machine vision, the faster maturity and stability of recurrent
neural networks (RNNs), a grandiose part of the deep learning (DL) field, have
strengthened the NLP capability of our everyday machines.
The third and final feature is to produce autonomous machines with self-
configuring, healing, defending, and managing features. With the steadily growing
power of machine learning (ML) algorithms, a dazzling array of industrial machinery,
medical instruments, defense equipment, consumer electronics, handhelds,
wearables, and other edge devices are being readied to intrinsically exhibit intelligent
behavior. Devices are becoming autonomous; therefore, they can continuously and
consistently deliver their designated services without any slowdown or breakdown.
In their actions and reactions, they become fault-tolerant, high-performing, self-
defending, adaptive, reactive, proactive, etc. In a nutshell, people and businesses will
get state-of-the-art products, solutions, and services through a bevy of cutting-edge
technologies.
It is possible to produce and supply a host of people-centric, real-time, service-
oriented, event-driven, and context-aware services with intelligent machines in and
around us. Thus, with connected devices, digitized entities, and intelligent applications,
business-driven IT is all set to become people IT in the years to come.
This book aims to enlighten our esteemed readers with all the details on the
following topics.
Chapter 1
A New Frontier in
Machine Intelligence:
Creativity
Ardhendu G. Pathak
1.1 Introduction
On October 25, 2018, the topic of computer creativity, until then mostly an idle
curiosity for lay persons and a plaything for computer nerds, suddenly captured the
public imagination when a painting called "Portrait of Edmond de Belamy" (Figure 1.1) was
sold for $432,000 at Christie’s auction house [1]. This painting was created using a
type of Artificial Intelligence (AI) program called Generative Adversarial Network
(GAN). At the bottom right of the painting, a mathematical equation appeared
where normally the artists’ signature is placed.
Creativity has long been considered a gift from the gods. For example, the ancient
Greeks believed that the Muses – goddesses for different areas of artistic expression,
such as epic poetry – whispered in the ears of the mortals to impart the powers of
creativity. In India too, the story goes that “Kavikulguru” (the Dean of Poets) Kalidas
received a boon from the goddess Kali to compose his epic poems in Sanskrit.
The modern view of human creativity, however, considers it a part of the many
dimensions of human intelligence. Because creativity is subjective and socially indexed, there
is no single accepted definition of it. Among many different classification
schemas, Boden [2] proposes that creative output can be classified as either combina-
torial, exploratory, or transformational creativity. Novel or improbable combinations
of familiar ideas are denoted as combinatorial creativity. The exploratory
DOI: 10.1201/9781003424550-1
creativity involves the generation of novel ideas by the exploration of existing ideas
or constructs, while the transformational creativity involves generation of novel ideas
and constructs by transformation of an existing idea space in such a way that the new
constructs bear little or no relation to the existing ones.
The attempts to get computers to do something that can be fun or interesting
(“creative”) apart from mathematical computations started with the widespread
availability of computers. In this chapter, we will first briefly discuss the history
of such efforts. We will then see some examples of creative output generated by
computers using AI. Next, the legal and intellectual property rights implications of
the output autonomously generated by computers will be covered. We will conclude
with the outline of the philosophical debate around the artificial creativity domain.
McKinnon-Wood [4]. It generated short haikus such as the one below, based on the
words chosen by the user:
A program called JAPE was constructed in 1981 at the University of Edinburgh [5].
It produced short punning puzzles such as:
In the 1960s, for the early computer artists, especially for visual art, one of the sig-
nificant limitations was the output devices. Plotters, connected with a brush or pen,
were used in early experimentations. An example of a complex algorithmic work
from that period is a screen-print of a plotter drawing, entitled “Hommage à Paul
Klee 13/9/65 Nr. 2,” created by Frieder Nake in 1965. The work was inspired by
Paul Klee’s oil painting entitled “Highroads and Byroads” (1929) [3].
One of the pioneers of computer art, or what is now called generative art, was
Harold Cohen. He was an accomplished painter and an outstanding engineer. He
developed a system called AARON that became increasingly sophisticated (and
large in size), and the output was exhibited at prestigious museums and galleries
[6]. When asked who the artist was, Cohen or AARON, Cohen gave the example of
the relationship between Renaissance painters and their assistants. This question of
authorship, which appears quite academic and abstract, has some serious commer-
cial and legal implications. We will discuss this later in the section related to intel-
lectual property (IP) rights.
With the advent of Artificial Intelligence (AI), the “picture” for computer cre-
ativity changed dramatically. This was due to the availability of powerful algorithms
and computational power coupled with advances in the related fields such as 3D
printing and interactive display technologies.
Figure 1.2 Image generated using the VQGAN+CLIP Colab notebook. The text
prompt was "a cityscape in the style of Van Gogh."
spans a spectrum from an artist as a programmer, with the machine carrying out the
instructions, to the learning or evolutionary behavior being so unpredictable as to
leave the programmer/artist almost entirely out of the creative process.
The machine learning models used for creative purposes can be classified into two
categories: (1) continuous; and (2) discrete or sequential models [7]. The continuous
models are used to generate continuous artifacts such as images whereas the sequen-
tial models are used mainly to generate artifacts such as text. The models or net-
work topologies used in such application include Variational Auto-Encoders (VAE),
autoregressive models, Generative Adversarial Networks (GANs), and sequence pre-
diction models. For more technical details of these models, see [8].
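To make the adversarial idea concrete, the following minimal sketch (an illustrative assumption, not the implementation behind any of the systems discussed in this chapter) shows the core training loop of a GAN in PyTorch: a generator maps random noise to images, while a discriminator learns to tell real images from generated ones.

import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28  # illustrative sizes for small, flattened grayscale images

# Generator: maps a random latent vector to a flattened image
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
# Discriminator: scores how "real" a flattened image looks (probability in [0, 1])
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def train_step(real_images):
    """One adversarial update; `real_images` is a (batch, img_dim) tensor from the training set."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1. Train the discriminator on real and generated images
    fake_images = G(torch.randn(batch, latent_dim)).detach()
    d_loss = loss_fn(D(real_images), real_labels) + loss_fn(D(fake_images), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Train the generator to fool the discriminator
    g_loss = loss_fn(D(G(torch.randn(batch, latent_dim))), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

In practice, such a loop would be run over many epochs of a curated training set before the generator produces convincing samples.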
These models have been successfully used to generate images from curated
training data sets. Figure 1.2 is a sample image generated by the author using one
such program that uses text prompt to generate images and uses a second, “adver-
sarial,” network to refine it [9]. The text prompt given in this case was “a cityscape
in the style of Van Gogh.”
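The image in Figure 1.2 was produced with a VQGAN+CLIP Colab notebook; a comparable, more recent text-to-image pipeline can be invoked in a few lines through the Hugging Face diffusers library. The sketch below is only an analogous example under stated assumptions: the model checkpoint named in it and the availability of a CUDA GPU are assumptions, and it is not the program used by the author.

# A sketch of text-to-image generation with the Hugging Face "diffusers" library.
# Not the VQGAN+CLIP notebook behind Figure 1.2; the checkpoint name is assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA-capable GPU is available

prompt = "a cityscape in the style of Van Gogh"
image = pipe(prompt, num_inference_steps=30).images[0]  # denoise for 30 steps
image.save("cityscape_van_gogh.png")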
In 2016, an artwork in the style of the Dutch master painter Rembrandt was
unveiled by researchers in the Netherlands (Figure 1.3). The artwork, entitled “The
Next Rembrandt,” was generated using a training set consisting of the known
portraits painted by the seventeenth-century master [10].
For Natural Language Processing (NLP) applications, sequence prediction
models for text generation have been successfully demonstrated through training
sets consisting of either phonemes, syllables, words, or lines. One of the early and
fun examples of text generation using recurrent neural networks (RNN) was the
Shakespeare generator [11]. A sample below looks quite convincing!
KING LEAR:
O, if you were a feeble sight, the courtesy of your law,
Your sight and several breath, will wear the gods
With his heads, and my hands are wonder’d at the deeds,
So drop upon your lordship’s head, and your opinion
Shall be against your honour.
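A drastically reduced sketch of such a character-level recurrent generator is shown below. It illustrates the general technique rather than the code behind the Shakespeare generator in [11]; the corpus path, network sizes, and starting prompt are assumptions, and the training loop is omitted.

import torch
import torch.nn as nn

# Assume the training corpus (e.g., the collected works of Shakespeare) is in this file.
text = open("shakespeare.txt").read()          # path is an assumption
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}     # character -> integer id
itos = {i: c for c, i in stoi.items()}         # integer id -> character

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, idx, state=None):
        out, state = self.rnn(self.embed(idx), state)
        return self.head(out), state           # logits over the next character

model = CharRNN(len(chars))

def sample(model, start="KING LEAR:\n", length=200):
    """Generate text one character at a time from the model's predictions."""
    idx = torch.tensor([[stoi[c] for c in start]])
    state, out = None, start
    for _ in range(length):
        logits, state = model(idx, state)
        probs = torch.softmax(logits[0, -1], dim=-1)
        nxt = torch.multinomial(probs, 1).item()   # sample the next character
        out += itos[nxt]
        idx = torch.tensor([[nxt]])
    return out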
In the field of poetry, Google has an experimental AI-powered tool that can compose
poetry snippets, based on the training corpus of the works by classic American poets
and another one that can even generate your “poetry portrait” [12, 13]. In 2016, a
short Japanese novel written by a program reached the second round of the national
literary prize contest [14].
Another interesting example in a creative field was the first AI-scripted movie,
whose screenplay was generated by using sci-fi scripts as the training set. The effect,
where professional actors enact the AI-generated scenes, is quite hilarious and unex-
pected. The movie, Sunspring, was placed in the top ten at the annual Sci-Fi London
Film Festival in one of the categories [15].
Meanwhile in the field of games, AlphaGo, developed by Google DeepMind,
defeated Mr. Lee Sedol, the winner of 18 world titles at professional Go, by a 4-1
victory in Seoul, South Korea, in March 2016. The viewership for this game reached
200 million worldwide. During the games, AlphaGo played several inventive moves,
some of which – including move number 37 in game two – were sur-
prising even for the defending champion. Afterwards Lee Sedol said, "I thought
AlphaGo was based on probability calculation and that it was merely a machine. But
when I saw this move, I changed my mind. Surely, AlphaGo is creative” [16].
Recently, the company OpenAI announced the Generative Pre-trained
Transformer 3 (GPT-3) language model [17]. Trained on hundreds of billions of words from the
internet, it was the largest NLP model up to that point. GPT-3 is aimed at natural-
language question answering, but it can also translate between languages and
coherently generate improvised text. It has been successfully used for many tasks
from copywriting to automatic generation of computer code. Recently, OpenAI also
announced a program called DALL·E 2 in the beta testing phase that can create real-
istic images and art from a natural language description (e.g., “Teddy bears shopping
for groceries as a one-line drawing”).
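GPT-3 itself is accessed through OpenAI's commercial API. The sketch below instead uses the much smaller, freely downloadable GPT-2 model from the Hugging Face transformers library as a stand-in, purely to illustrate the same kind of improvised text continuation; the prompt and generation settings are assumptions.

# Improvised text generation with a small, freely available stand-in for GPT-3.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
completions = generator(
    "Machine creativity is",    # the prompt to be continued
    max_length=60,              # total length including the prompt
    num_return_sequences=2,     # ask for two alternative continuations
)
for c in completions:
    print(c["generated_text"])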
Similarly, in the field of music, there are many projects, including Sony
Computer Science Laboratories' AI-assisted music production program called Flow
Machines. DeepMind, an AI company owned by Google, has created software that
can generate music by listening to recordings [18]. At this point we are not aware of
a similar project for Indian classical or pop music.
Mathematics is another area where creativity is thought to be the preserve of a
select few individuals with natural (or God-given) gifts. Here too, AI has been used
for discovery and to generate proofs of mathematical theorems. Google has an open
source project called DeepHOL that has proved more than 1,200 theorems it had
not previously seen [19]. Among the AI-based discoveries are a new formula for pi,
and some results in number theory.
discussions have closely followed the similar debate about artificial intelligence [2,
20, 21, 22].
Broadly speaking, the arguments fall into the following groups:
1. Some researchers have proposed a test for artificial creativity on the lines of the
Turing Test for judging artificial intelligence as “the philosophy, science and
engineering of behaviors that unbiased observers would deem to be creative”
[23]. This does not presume any a priori criteria and relies only on human
judgment.
2. The second camp proposes to define a criterion for creativity and then test to
see if any of the machine-generated works would pass muster. For example,
Boden [2] defined creativity as “the ability to come up with ideas or artifacts
that are new, surprising and valuable.” Of course, all three attributes listed here
are not objectively measurable and hence necessarily assume human judgment
and subjectivity.
3. The third stream of discussion is the one Alan Turing [24] referred to as
“Lovelace’s objection,” after Ada Lovelace, the celebrated English math-
ematician whom many consider to be the first computer programmer. She
wrote: "[The Analytical Engine of Charles Babbage] … has no pretensions
to originate anything, since it can only do whatever we know how to order
it to perform.” The famous “Chinese room” thought experiment of John
Searle [25] for Artificial Intelligence is in a similar vein. It goes something
like this: imagine a scenario where a person ignorant of Chinese language and
script assembles sentences according to an instruction manual using cards that
have Chinese logograms. The long chain of symbols thus assembled is mean-
ingless to the person assembling it, but to a person who knows Chinese, they
may make perfect sense. As per this scenario, the output of any computer
program (including AI) might surprise its creators, but it still does not consti-
tute intelligence (or creativity) on the part of machines in any human sense.
As can be seen from the above, the debate is far from settled and there are valid
arguments on all sides. But meanwhile, surprising, useful, and sometimes aes-
thetically pleasing results are being obtained by programs that use increasingly
sophisticated techniques and datasets, as seen in Section 1.3.
computer and the program used for generation have been seen as mere tools in the
creative process. But with the advent of sophisticated AI paradigms, the program
is not just a tool; it makes many “creative” decisions to produce an output that is
novel, useful and surprising to humans. Therefore, the long-held legal doctrines and
policies are being tested in the new AI-driven world [26].
While looking at the copyright and AI-created work, two types of situations have
to be considered: (1) how the work “created” by AI should be protected, if at all (“AI
as producer”); and (2) to what extent the data and algorithms used in training the AI
might constitute fair use and infringement (“AI as consumer”) [27].
The question of “authorship” must be decided first when considering the
provisions of copyright laws to any work in general, and AI-generated work in par-
ticular, since the author is the first owner of the copyright. For AI-generated work,
this question is not easy to answer. The candidates for authorship include the
programmer who created the system, the user who might have given final inputs
at "runtime," the AI computer system itself, and finally the investors who might
have funded the effort. In most jurisdictions, including the US and EU coun-
tries, the law does not recognize non-human authorship. In a recent case involving
selfies taken by monkeys, the US Court rejected any legal standing for animals to
claim copyright [28]. The same logic may then be extended to machine-generated/
assisted work.
In terms of “AI as consumer” of the copyright, it is known that lots of data are
consumed in training the AI models. Some of this data, whether in terms of text,
images, or music, may be owned by third parties. Whether any liability
results from the use of such data would depend on the application
of the "fair-use" doctrine, which covers transformative and derivative products that are
not market substitutes of the original copyrighted material (e.g., making thumbnail
images and storing them for training).
The topic of copyright for autonomously generated creative work using machines
is quite involved, and the legal thinking is still being shaped in the important
jurisdictions [29].
1.6 Conclusion
In this chapter we have seen a brief history of using machines to produce creative
work. We also saw the debate this has generated right from the start about the
very nature of creativity and whether machines can be creative. In many ways, the
arguments, both for and against, parallel those that are familiar from the debate on
the nature of artificial intelligence. After outlining representative samples from the
early attempts at machine creativity, we saw the virtual explosion of the use of AI
in what are traditionally considered creative fields such as painting, writing poetry,
proving mathematical theorems, and playing complex strategy games.
1.7 Postscript
While this manuscript was in preparation, US-based company OpenAI released
their chatbot, ChatGPT, based on a so-called “Large Language Model” (LLM),
consisting of billions of parameters. It became a global sensation for its ability to
generate text in a variety of writing styles, invent engaging narratives, and even to
explain poetry and jokes. LLMs, coupled with the vastly improved image gener-
ation programs such as DALL·E and Stable Diffusion, have significantly redefined
the landscape of creativity in the context of artificial intelligence and machine
learning. As these models continue to evolve, they will reshape our notions of cre-
ativity itself. Furthermore, their emergence has sparked a renewed debate on intel-
lectual property rights. Additionally, these models have brought the issues of bias
and fairness to the forefront of the conversation, as their responses are influenced
by the data they are trained on. The implications of these developments extend
far beyond the academic sphere, influencing industries ranging from literature to
advertising.
References
1. Christie’s (n.d.) “Is Artificial Intelligence Set to Become Art’s Next Medium?” Available
at: www.christies.com/features/A-collaboration-between-two-artists-one-human-one-a-
machine-9332-1.aspx (accessed March 25, 2022).
2. M. A. Boden (1998) “Creativity and Artificial Intelligence.” Artificial Intelligence,
103(1): 347–356, doi:10.1016/S0004-3702(98)00055-1.
3. Victoria and Albert Museum (2011) “The Pioneers (1950–1970).” Available at: www.
vam.ac.uk/content/articles/a/computer-art-history/ (accessed April 19, 2022).
4. “Computerized Haiku.” Available at: www.in-vacua.com/cgi-bin/haiku.pl (accessed
April 20, 2022).
5. G. Ritchie (2003) “The JAPE Riddle Generator: Technical Specification.” Institute for
Communicating and Collaborative Systems, January (accessed April 18, 2022). Online.
Available at: www.academia.edu/403908/The_JAPE_Riddle_Generator_Technical_Specification
6. CHM (2016) "Harold Cohen and AARON—A 40-Year Collaboration." Blog. August
23. Available at: https://computerhistory.org/blog/harold-cohen-and-aaron-a-40-year-
collaboration/ (accessed March 25, 2022).
25. J. R. Searle (1980) “Minds, Brains, and Programs,” Behavioral and Brain Sciences, 3:
417–424.
26. WIPO (2017) “Artificial Intelligence and Copyright.” Available at: www.wipo.int/wipo_
magazine/en/2017/05/article_0003.html (accessed March 24, 2022).
27. E. Bonadio and L. McDonagh (2020) “Artificial Intelligence as Producer and Consumer
of Copyright Works: Evaluating the Consequences of Algorithmic Creativity,”
Intellectual Property Quarterly, 2: 112–137. Online. Available at: www.sweetandmaxwell.
co.uk/Catalogue/ProductDetails.aspx?recordid=380&productid=6791 (accessed March
24, 2022).
28. Wikipedia (2022) “Monkey Selfie Copyright Dispute.” May 1. Online. Available
at: https://en.wikipedia.org/w/index.php?title=Monkey_selfie_copyright_dispute&oldid=1085659976 (accessed May 3, 2022).
29. J. K. Eshraghian (2020) “Human Ownership of Artificial Creativity.” Nature Machine
Intelligence, 2(3). doi:10.1038/s42256-020-0161-x.
Chapter 2
Overview of Human-
Computer Interaction
Nancy Jasmine Goldena
2.1 Introduction
HCI definition: Human-computer interaction (HCI), also known as man-machine
interaction (MMI) or computer-human interaction (CHI), is the study of inter-
active computing systems for human use, including their design, evaluation, and
implementation.
HCI defined by experts: Stuart K. Card, Allen Newell, and Thomas P. Moran
popularized the term HCI in their book, The Psychology of Human-Computer
Interaction, in 1983. Carlisle was the first to coin it in 1975. Experts have defined
HCI as “computers, unlike other technologies with specialized and limited
functions, offer a wide range of applications that typically necessitate an open-
ended interaction between the user and the computer” [1].
DOI: 10.1201/9781003424550-2
what they wish to do and the computer’s ability to comprehend such a user’s activity.
HCI can be implemented in any sector where computer installation is possible.
In general, HCI is the study of how to create computer systems that efficiently
meet people’s needs. HCI incorporates information and methodologies from a var-
iety of fields [2], including computer science, cognitive psychology, social sciences,
and ergonomics (human factors) (Figure 2.1).
Computer science includes technological knowledge as well as developing
abstractions, methodologies, languages, and tools to handle the problem of creating
a successful design with inspiration and an understanding of what people want.
Cognitive psychology is a branch of psychology that analyses mental processes,
such as perception, memory, problem-solving, emotions, and learning in order to
gain insight into users’ capabilities and limitations. It focuses on users and their
tasks rather than the technology they use. Ethnomethodology is a technique used
in social psychology to understand the structure and functions of organizations
and addresses human behavior with respect to context, society, and interactions.
Ergonomics (also known as human factors) is a scientific discipline that studies the
interactions between humans and other system elements, as well as a profession
that uses theory, concepts, data, and methodologies in order to improve human
well-being and overall system performance. Sociology is a social science that studies
2.3 Factors in HCI
In order to analyse and build a system employing HCI principles, a significant number
of aspects must be examined. Many of these variables interact, thus complicating the
investigation. The following are the most important factors affecting HCI [3].
1. Human-machine symbiosis
2. Human-environment interactions
3. Ethics, privacy and security
4. Well-being, health and eudaimonia
5. Accessibility and universal access
6. Learning and creativity
7. Social organization and democracy.
2.4.1 Human-Machine Symbiosis
1. Meaningful human control: It is necessary to address the issue of design trade-
offs between human control and automation. Machine learning systems must
demonstrate their ability to explain and convey knowledge in a way that
humans can comprehend. The system’s operation must be well understood by
the users and the users need to have a deeper understanding of the underlying
computational processes in order to better regulate the system’s actions.
2. Human digital intelligence: A reliable technology needs to be established that
supports and respects individual and social life, respects human rights and
privacy, supports users in their activities, creates trust, and allows humans to
exploit their individual, creative, social, and economic potential, as well as
living and enjoying a self-determined life.
3. Adaptation and personalization to human needs: Other organizational and soci-
etal aspects, such as the culture of the organization and the user’s level of
expertise in the use of technology to acquire Big Data, should be considered in
personalization/adaptation, in addition to individuals’ preferences.
4. Human skills support: Instead of computers that objectify knowledge, a
method that requires instruments to assist human talent and inventive-
ness is required. AI needs to be integrated to support human memory as
well as human problem-solving, particularly in instances when a user must
handle a large quantity of sophisticated information rapidly or under stress.
Human memory and learning capacity can be increased by ensuring that
people have access to networked cognitive intelligent technology that can
help them solve common difficulties. Through the convergence of ICT and
the biological brain, technology is employed to increase human perception
and cognition.
5. Emotion detection and simulation: Technology must display emotions and
empathetic behavior in order to record and correlate the manifestation of
human emotions. Deception, whether intentional or accidental, must be
dealt with.
a. Human safety: Advanced intelligent systems that are “safe by design” are
recommended. New testing approaches must be implemented that not
only test the design and implementation of the system, but also the system’s
gained knowledge. Through focused investigations, monitoring, and ana-
lysis, an interdisciplinary group of specialists will provide evaluations and
suggestions for a safety system.
b. Cultural shift: Intelligent surroundings and AI provide a cultural transform-
ation in addition to technological hurdles and adjustments. As a result,
technological advances must progress in tandem with societal changes. To
persuade the public and the scientific community, extensive studies of the
ethical and privacy concerns, as well as design methods with their under-
lying perspectives on the role of people, are required.
2.4.2 Human-Environment Interactions
1. Interactions in the physical and digital continuum: The computer no longer appears
as a recognizable object. For vanishing computers and devices, new kinds of
affordances must be created. It is necessary to frame user commands successfully
for the right interactive artifacts. The right user command interpretation should
be delivered based on the present context. Recognizable interaction is required
for everyday objects, and information overload is also a risk.
tailoring the design to each user. As a result, many businesses regard universal
design as an additional cost or feature, which is not the case.
2. Population aging: A major concern is how senior users will be encouraged
to adopt ICT technologies, as well as how acceptable they will find them,
both of which are predicted to alter in the near future as a new generation of
technologically-adept elder users emerges.
3. Accessibility in technologically enriched environments: The variety of gadgets,
programs, and environment intelligence can put a lot of cognitive strain
on consumers, if these programs are not properly built. Intelligent settings
must be transparent to their users, but this in itself presents a serious acces-
sibility issue, especially for the elderly and people who suffer from cognitive
disabilities.
4. Methods, techniques, and tools: New evaluation methodologies and tools must
be targeted in order to take into account the changing human needs and
context of use, develop appropriate user models, advance knowledge of user
requirements, and determine whether various solutions are appropriate for
different combinations of user and environment characteristics.
5. Universal access in future technological environments: In order to prevent
anyone from being excluded, isolated, or exploited by the new futures, special
consideration must be given to any person who is at risk of exclusion. Age,
education, occupational status, health, social connection, infrastructure acces-
sibility, impact on digital inequality, including race and ethnicity, gender, and
socio-economic level are some of the key elements that must also be taken
into account.
4. Serious games for learning: Designing for and measuring fun in the user experi-
ence of serious games is a crucial issue. It is important to strike the correct
balance between seriousness and gamification: if learners are exposed to too
much educational information, their motivation may decline; conversely, if
too much fun is included, learning may suffer. To ensure that game objectives
and learning objectives are aligned, all stakeholders need to participate in and
collaborate on design processes, metrics, and assessment methods.
5. Intelligent environments: How to give learners the appropriate material at the
appropriate time and in the appropriate manner presents a significant diffi-
culty, especially when many users are present in such settings and judgments
made by one user may be influenced by those of other users. The difficulty lies
in finding ways to increase human learning capacity while attempting to make
learning efficient and simple.
6. Pedagogical impact of learning technologies: A significant issue is that tech-
nology should put an emphasis on the viewpoints and requirements of the
learners themselves and push the frontiers of education in order to include
factors besides grades, such as motivation, self-assurance, fun, contentment,
and achieving professional goals. The fact that the same technology may per-
form differently in various economic, social, and cultural contexts presents
another barrier to the effective pedagogical use of technology in education.
As a result, a “one-size-fits-all” strategy would not be practical and technology
should be customizable and adaptable.
7. Creativity: The comprehensive support of the entire creative process, the auto-
matic retrieval and dynamic delivery of stimuli for varying degrees of relevance
to the creation task, and the provision of individualized assistance for each
participant in a collaborative creation process are indicative developments.
Tools will also need to re-evaluate how to encourage creativity in physical
environments when the digital and physical are combined.
These are the top seven technological and sociological issues that need to be addressed
in order to respond to urgent societal and human requirements when using the ever-
increasing interaction intelligence.
must be developed from the inside out. Rather than organizing the code so
that the application has control, the program must be designed as a series of
sub-routines that the user interface toolkit calls when the user does something.
Each sub-routine will have strict time limits so that it completes before the
user is ready to issue the next command, making it more difficult to organize
and modularize reactive programs (a minimal sketch of this callback-driven structure follows this list).
3. Multiprocessing: User interface software is typically structured into various
processes in order to be reactive. As a result, programmers developing
user interface software will encounter well-known issues with numerous
processes, such as synchronization, thread consistency, deadlocks, and race
situations.
4. Need for real-time programming: Objects that are animated or move around
with the mouse are common in graphical, direct manipulation interfaces. To
be appealing to users, the objects must be presented between 30 and 60 times
per second with no uneven pauses. As a result, the programmer must ensure
that any processing required to calculate the feedback can be completed in
less than 16 milliseconds. This may include employing less realistic but faster
approximations and complex incremental algorithms that compute the output
based on a single changing input rather than a simpler recalculation based on
all inputs.
5. Need for robustness: The program that handles user input has particularly
rigorous requirements since all inputs must be handled gracefully. A pro-
grammer must define the interface to an internal method to only work when
a specific type of data is supplied, while the user interface must always take
any conceivable input and continue to operate. Furthermore, unlike internal
procedures, which may abort to a debugger when an incorrect input is
detected, user interface software must respond with a helpful error message
and allow the user to restart or repair the error and continue. User interfaces
should also allow the user to cancel and undo any operation. As a result, the
programmer must design all activities so that they can be canceled during
execution and reversed after completion. To accommodate this, special data
structures and coding styles are usually required.
6. Low testability: Because automated testing methods have problems supplying
input and testing output for direct manipulation systems, user interface soft-
ware is more challenging to test.
7. No language support: Programming user interface software is challenging since
today’s programming languages lack necessary features such as primitives
for graphical input and output, as well as reactive programming and
multi-processing.
8. Complexity of the tools: A huge number of tools designed to improve the
software’s user interface are notoriously difficult to use. The tool manuals
are often many volumes long and contain hundreds of procedures, therefore
learning to program user interfaces with these tools often requires extensive
training. Despite their size and complexity, the tools may not even provide
enough flexibility to accomplish the intended result.
9. Difficulty of modularization: One of the main ways to make software easier to
design and maintain is to properly modularize the various elements. The user
interface should be segregated from the rest of the software, in part so that it
may be quickly modified (for iterative design). Unfortunately, programmers
discover in practice that it is difficult or impossible to segregate the user inter-
face and application elements, and modifications to the user interface fre-
quently necessitate reprogramming parts of the application as well.
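The reactive, "inside out" structure described in item 2 can be sketched with Python's standard tkinter toolkit: the application merely registers short callbacks, and the toolkit's event loop decides when they run. This is a minimal illustration of the idea, not a recipe for production user interface code; the widget labels are assumptions.

# Event-driven structure: control belongs to the toolkit, not the application.
import tkinter as tk

def on_click():
    # Callbacks must return quickly so the interface stays responsive.
    label.config(text="Button pressed")

root = tk.Tk()
label = tk.Label(root, text="Waiting...")
label.pack()
tk.Button(root, text="Press me", command=on_click).pack()

root.mainloop()   # the toolkit's event loop calls on_click whenever the user acts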
■ Accessibility: A system’s environment that is most usable and has the most cap-
abilities. Web accessibility refers to the ability of persons with impairments to
interact, perceive, comprehend, and navigate the web.
■ User experience: Beyond efficacy, efficiency, and satisfaction, this covers the
elements of user interactions with and responses to systems.
Designers must keep in mind and examine these essential aspects and issues when
creating and implementing HCI systems.
HCI is a scientific and practical sector that first developed in the late 1970s and
early 1980s as a sub-field of computer science. HCI has grown rapidly and consist-
ently, attracting individuals from a variety of disciplines and encompassing a wide
range of concepts and methodologies.
During the 1980s and 1990s, HCI was primarily concerned with developing
systems that were simple to understand and use. An individual could not afford a
computer at that time. Computers developed into communication tools between
the 1990s and the early 2000s. It became essential to examine external impacts and
how interactions develop across technologies and enterprises. During this time,
email became ubiquitous, which meant that people were not only connecting with
computers, but also with one another through computer systems.
During the 2000s and 2010s, value-driven innovation took the lead in
implementing initiatives and designing interfaces for long-term development. A new
interface design approach is needed for complicated connections among people,
environments, and tools [7].
Ubiquitous Computing (UC) is now the most active research field in
HCI. UC and immersive technologies such as augmented reality (AR) are becoming
extremely important in HCI.
2.7 Components of HCI
A computer interface is a medium that allows any user to communicate with a com-
puter. HCI is made up of three main components:
■ a human
■ the computer
■ the interaction between them.
The interface is essential to most HCI. To create effective interfaces, users must
understand the limitations and capabilities of the components. To accomplish
various tasks, humans interact with computer systems. Input-output channels vary
between humans and computers [8]. Input and output channels for human and
computer components are briefly discussed here.
■ Long-term memory: This has the ability to store data for a long time.
■ Short-term memory: All information that enters the system is stored in short-
term memory, for a limited time.
■ Sensory memory: Sensory memories are short-term memories that occur in our
sensory organs.
■ Visual perception: Visual perception is the ability to perceive the environment
through the eyes, including colors, patterns, and structures.
■ Auditory perception: Auditory perception is the ability to receive and com-
prehend information provided to the ears via audible frequency waves trans-
mitted through the air.
■ Speech and voice: Speech has become a more frequent mode of communica-
tion with computer systems. The most popular examples are Amazon Echo,
Apple’s Siri, and Google Home.
■ Text input devices: A text input device, sometimes known as a text entry device,
is an interface for entering text data into an electronic device. A mechanical
computer keyboard is a well-known example of an input device.
■ Speech recognition: This is a feature that allows a computer to transform
human speech into text, also known as automatic speech recognition (ASR).
■ Mouse/touchpad/keyboard: A mouse is a small, movable device that can be
wired or wireless and allows the user to control a variety of functions on a
computer. Used with the keyboard to control input.
■ Eye-tracking: Eye-tracking is a sensor technology that can detect and monitor
the movement of a person’s eyes in real time.
■ Display screens: The graphic design and layout of user interfaces on displays are
referred to as screen design.
■ Auditory displays: The use of audio to convey information from a computer to
a user is known as auditory display.
■ Printing abilities: Users have the option of printing their own customized user
interface.
■ Learnability and familiarity determine how quickly a novice user can learn to
interact with a system. To do so, a system must be predictable.
■ To be effective, all characters and objects must be available.
■ Users will make mistakes, but the mistakes should not interrupt the system
or have a negative impact on an interaction. To prevent potential mistakes,
warnings should be provided and the system should also offer assistance when
potential errors occur.
■ Any non-trivial critical action should be authenticated.
■ Reduce the amount of information that needs to be recalled in the gaps
between tasks.
■ The system may struggle to meet the user’s needs if the information given is
insufficient or inconsistent.
■ Don’t overload the user with data; instead, choose a presentation layout that
enables an efficient information system.
■ Use common labels, abbreviations, and colors that are most likely.
Several HCI principles have been suggested by three experts: Ben Shneiderman,
Donald Norman, and Jakob Nielsen.
The following eight broad guidelines were developed by Ben Shneiderman, an
American computer scientist, who documented certain hidden truths regarding
design:
1. Consistency is important.
2. Universal usability should be considered.
3. Provide constructive feedback.
4. Create dialogs to achieve closure.
5. Errors should be avoided.
6. Allow easy action reversal.
7. Encourage internal locus of control.
8. Short-term memory load should be reduced.
Using these eight rules, you can tell the difference between a good and a bad inter-
face design. These are useful in the experimental evaluation of better graphical user
interfaces (GUIs).
Donald Norman recommended seven stages for transforming challenging tasks:
The ten Nielsen principles listed above serve as a checklist for the heuristic evaluator
while analyzing a design [8, 10].
Users do not need to consider the intricacies and complexity of using the com-
puting system while adopting efficient HCI designs. Interfaces that are user-friendly
ensure that user interactions are clear, accurate, and natural.
There are four HCI design approaches [11] that can create user-friendly, effi-
cient, and intuitive user-interface designs:
2.11 HCI Devices
Computers are now present in almost every aspect of our lives. In the early stages,
batch processing techniques became popular in the mid-twentieth century. Human
operators used punched cards to feed data into the system. These cards had holes
in them and light was flashed through the holes; wherever the light passed through
represented a one, otherwise it was a zero. The user-friendly interface appeared over
time. The Xerox Star was the first workstation to feature a commercial GUI OS with
a desktop and bit-mapped displays.
At present, the smartphone revolution has arrived. Massive computational power
is contained within a portable handset. Virtual assistants such as Siri, Cortana,
Google Assistant, Alexa, and Bixby use voice recognition technology to make calls,
take notes, and have extended, nearly human, conversations. An analog to digital
converter (ADC) converts the spoken word into a series of ones and zeros. The signal is
then broken into segments lasting fractions of a second, which are matched with phonemes,
the smallest elements of a language. Complex statistical methods are now being
investigated to determine the context of spoken words. The Internet-of-Things is
the most exciting means of communication. Smartphones, laptops, cars, and homes
will all be connected to a vast network of websites. Sensors at various locations will
collect vital data. When the garbage cans are full, sensors will tell automated trucks
to dispose of the segregated garbage [13]. Without leaving the house, one can ask a
virtual assistant to turn on a car’s heater.
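The speech-to-text step outlined above is exposed to application developers through libraries. The sketch below assumes the third-party SpeechRecognition package (with PyAudio for microphone access) is installed; the heavy statistical modelling happens inside the cloud recognizer it calls, not in this code, and it is only an illustration rather than the pipeline used by any commercial assistant.

# A sketch of speech-to-text using the SpeechRecognition package (assumed installed).
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    print("Say something...")
    audio = recognizer.listen(source)              # record until a pause is detected

try:
    # Forward the captured audio to a free web recognizer and print the transcript
    print("You said:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)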
activities by minimizing effort and enhancing performance. Some of the most recent
technologies, as well as their associated tools, are briefly discussed here.
2.13.2 Remote Eye-Tracking
Modern remote systems do not require any interaction with the participant; hence they
are termed "remote." The camera is structured to provide a distant image of the eyes
and the systems can automatically adjust the camera’s view point. They track eye
location and head orientation using the eye center and cornea reflection remotely.
While the eye-tracking device is working, a person can use a computer normally.
This technology is suitable for usability testing, behavioral psychology, vision experi-
mentation, and screen-based market research.
Limitations: Remote devices are only capable of tracking a fixed working area.
Touch-screens can be challenging since the participant must frequently reach across
the camera, generating data gaps. Non-infrared light is blocked by an optical filter in
most remote systems. They can work in any artificial light.
2.13.3 Mobile Eye-Tracking
Eye-tracking glasses, used in mobile eye-tracking, are also known as “head-mounted”
(Figure 2.4). This type of technology usually necessitates the placement of a camera
in the visual path of one (monocular) or both (binocular) eyes, as well as another
camera that records the scene or field of view. This technology is used in various
studies, such as sports, driving, navigation, social communication, hand-eye coord-
ination, and mobile device and retail inventory testing.
Limitations: These devices, like all eye-tracking equipment, can struggle to cap-
ture eye movements in bright light. Eye-tracking cameras require an uninterrupted
view of the eyes. Sometimes edges are difficult to track and mostly result in lower
accuracy. When adopting mobile eye-tracking equipment, no absolute coordinate
system exists.
(AR) gadgets. Eye-tracking systems built into VR and AR devices can also act as a control system,
allowing the user to interact with information by moving their gaze [13] (Figure 2.5).
Where there is no mouse or keyboard, the technology can provide an effective con-
trol method for interfaces in AR and VR.
■ Apple’s Siri: After its launch in 2011, Apple’s Siri quickly became the most
popular voice assistant. Siri is available on all iPhones, iPads, Apple Watches,
HomePods, Mac desktops, and Apple TVs. Siri follows the user wherever
they go, i.e., on the road, at home, and physically on their body. This pro-
vides Apple with a significant advantage in terms of early adoption. Siri is
capable of sending a text message or making a phone call on your behalf
(Figure 2.6).
■ Alexa from Amazon: Alexa and the Echo were introduced worldwide by
Amazon in 2014, starting the smart speaker revolution (Figure 2.7). Alexa
can be found in the Echo, Echo Show (a voice-controlled tablet), Echo Spot
(a voice-controlled alarm clock), and Echo Buds headphones (Amazon’s vari-
ation of Apple’s AirPods). Alexa was ahead of the curve in terms of smart
home device integration, including cameras, door locks, sound systems,
lighting, and thermostats. Users can ask Alexa to reorder their garbage bags,
and it will simply order them from Amazon. In fact, Alexa can order millions
of things from Amazon without the user ever having to lift a finger, giving it an advan-
tage over its competitors.
■ Google Assistant: In 2016, Google launched Google Home, Alexa's most signifi-
cant competitor. Google Assistant not only responds appropriately, but also
■ Tessel 2: The Tessel 2 is a development board with built-in WiFi that allows
users to write Node.js programs. Tessel is a networked hardware prototyping
system that can be used in a variety of applications. Tessel is a JavaScript-
capable microcontroller that allows users to quickly create physical devices
that connect to the internet.
■ Eclipse IoT: Eclipse IoT is an open-source community that provides a frame-
work for developing IoT projects. This platform allows developers to propose
These four technologies are widely used in a variety of applications. The following
are cloud computing tools:
2.17 HCI Applications
HCI is important since it will be required for products to be more effective, secure,
useful, and efficient [18]. It will improve the user’s experience in the long term.
Some of the most useful applications are:
■ In daily life: Technology has now penetrated every aspect of human life. Even
if a person does not own or use a computer directly, computers still have an
impact on their lives. ATM machines, train ticket vending machines, and hot
beverage vending machines are just a few examples of computer interfaces
that individuals can use on a daily basis without owning a computer. HCI
concepts should be investigated and properly considered when developing
any of these systems or interfaces, whether it is for an ATM or a desktop PC,
to ensure that the interface is comfortable, helpful, and effective.
■ The commercial sector: HCI is essential for every business that uses computers
in its daily operations. Computers are used by employees in their daily activ-
ities. They may use distinct software for their operations. If the design is poor,
they will experience frustration and other issues as well. HCI is especially
important when designing safety-critical systems, such as power plants and air
traffic control centers. Design errors can have severe implications in certain
situations, including the death of the user.
■ Assistive technology: HCI is an important concern when selecting systems that
are not only effective but also accessible to people with disabilities. The core
2.18 Advantages of HCI
HCI is a field that cannot be avoided because it has now been integrated into every
phase of product development. There are numerous advantages of HCI and some
of them are:
■ Users learnt how to communicate with computers using the keyboard, mouse,
and desktop when computers first arrived. Users started with Microsoft Excel
for calculations, Word for documents, PowerPoint for presentations, and virtual
desktops via GUIs; GUIs have revolutionized the way computers and
users interact.
■ HCI improves data security.
■ Mobile phones are the most widely used medium of communication in the world.
■ Music players, global positioning system (GPS), news apps, calendars, games,
and other applications are all available on today’s smartphones.
■ HCI is adaptable, versatile, and simple to use.
■ AR and VR are making games more fascinating.
2.19 Disadvantages of HCI
HCI can be very useful, but there are also certain disadvantages:
2.20 Conclusion
The study and deliberate design of human and computer interactions are known as
human-computer interaction (HCI). HCI is used in a variety of computer systems,
such as in offices, nuclear processing, air traffic control, and video games. HCI
systems are simple, secure, efficient, and interesting. There is a vast body of
knowledge on HCI, comprising concepts, rules, and design standards. In this present
technological age, cleverly designed computer interfaces encourage the use of digital
devices. HCI allows two-way communication between man and machine. Because
of the effectiveness of the communication, people assume they are interacting with
human personalities rather than a complicated computational system.
Nonetheless, creating an appropriate interface on the first try is impossible. This
will be a huge challenge in the coming years. Tasks, users, needs, context analysis,
and interface design must all have more explicit, direct, and formal relationships.
Current ergonomic information should also be used more directly and simply in
interface design. As a result, it is essential to provide a solid HCI foundation that
will affect future applications, such as targeted marketing, eldercare, and even psy-
chological trauma healing.
Chapter 3
Edge/Fog Computing:
An Overview and Insight
into Research Directions
Priya Thomas and Deepa V. Jose
3.1 Introduction
Cloud computing has emerged as a platform which provides on-demand access to
computing resources and services anywhere, at any time. It enables remote access
for storage, software, data repositories, etc. and offers services such as networking,
data filtering, and analytics. The cloud services can be accessed using the pay per
use policy, making it more flexible and customer-friendly. There are many cloud providers, such as AWS, Google Cloud, and Microsoft Azure, that guarantee seamless services to end users. The reliability of cloud technology greatly reduces infrastructure and maintenance costs, which has made it very popular.
The cloud computing technology can be classified into public, private, and
hybrid cloud, based on the access model followed. The public cloud offers services
to users by resource sharing. The users can access the services of the remote cloud
without building their own infrastructure. Data availability depends on the under-
lying network condition, and the security and privacy policies will be defined by the
cloud providers. The private cloud provides dedicated infrastructure to users with
more security as they are built within the premises of an organization. It helps to
access data faster and with greater reliability. Hybrid clouds are a combination of pri-
vate and public clouds which can be relied on to improve the scalability. This type
helps to access all the benefits of the public cloud without compromising security
and speed.
All these services are available to cloud users, with choice of service based on their
requirements.
Cloud computing offers numerous benefits to its users, such as scalability, agility,
elasticity, low cost, high performance, higher productivity, etc. Cloud computing
also suffers from a few drawbacks [1], such as single point of failure, security issues,
high response time, longer processing delays, latency, less reliability, etc. Research is
continuously going on to improve the features of cloud computing models. Among
the different solutions, one of the most trustworthy solutions is the incorporation
of fog/edge computing. The terms fog and edge are usually used interchangeably in
the literature, and these two types of computing help to improve the performance of
applications which depend on cloud computing technology.
The term fog computing was originally coined by Cisco. The concept of the
edge/fog was introduced to bring processing to the edge of the network instead
of remote data centres. The major drawback faced by cloud computing was related
to high response time due to the longer distance travelled by the data. The cloud data
centres reside in geographically distant locations from the user or application which
uses it. This significantly increases the response time and the delay will be much
more if the network is congested, making cloud a less promising solution for delay-
sensitive applications. Edge/fog computing helps to reduce this problem by reducing
the distance between the end user and the computing nodes. Even though the terms
are used synonymously, there is a difference between edge and fog. The difference
mainly lies in the location where the data is processed.
In edge computing, the data is processed on the edge of the network. For
example, if the Internet of Things (IoT) is an application that implements edge
computing platform, the IoT sensor itself can be configured as an edge node and
data can be processed in this node. In fog computing, the processing happens in
processors connected to a local area network (LAN) or the LAN hardware itself
which is nearer to the end user. So, both of these computing platforms help to pro-
cess data locally and respond faster to the application. The time-sensitive data will be
processed in these edge/fog nodes and the response will be transferred immediately
to the end devices, thereby reducing latency and delay considerably. The less time-
sensitive data will be transferred to the remote cloud for further processing, forming
a three-layered architecture.
The introduction of edge/fog computing has greatly reduced the delay in getting
a response in time-sensitive applications [2]. The edge/fog layer is not an independent
entity. It relies on the cloud for further processing and permanent storage of data.
Thus, the end devices, the edge/fog layer and the cloud together form the design
architecture for many applications. Like any technology, the edge/fog also has a
few drawbacks which are an area where active research is still going on. The recent
research shows that the latency, response time, delay, etc. can be considerably reduced
by incorporating edge/fog computing, whereas the impact on other parameters needs
to be tested. The lack of a service-level agreement (SLA), offloading issues, resource
monitoring, etc. need to be addressed to make the system much better. The major
challenges to be addressed on edge/fog incorporation are discussed in later sections.
In short, cloud computing helps to reduce the infrastructure and maintenance
costs by providing on-demand access to a pool of resources, including software,
databases, storage, etc. and provides numerous services to users based on a pay
per use policy. One of the major drawbacks of cloud computing lies in the delay
in getting a response as the data centres are in geographically remote locations.
The cloud also suffers from security and privacy issues and higher bandwidth con-
sumption. The introduction of edge/fog computing solves many issues related to
cloud implementation. Section 3.2 introduces edge/fog computing, defines the architecture, outlines the benefits of edge/cloud integration and its applications, and discusses the research challenges to be addressed when integrating edge/fog into the cloud environment.
3.2.1 Edge Computing
Edge computing refers to performing the processing at the edge of the network,
which can be either in the IoT sensor itself or the gateways [3]. The time-sensitive
data will be analysed and processed in the network’s edge and the response will be
immediately sent to the end devices. Many companies are making the shift from
cloud to edge or have incorporated the edge as a middle layer to improve the per-
formance of the existing applications.
3.2.2 Fog Computing
Fog computing is synonymous with edge computing with the difference being only
in location where the data is processed. The fog follows a distributed architecture
where the incoming tasks are offloaded to different fog nodes [2]. In fog computing,
the processing happens in LAN hardware or processors attached to a LAN. This also
greatly reduces the response time and reduces latency issues.
According to Cisco, fog computing can be defined as an extension of the cloud
computing paradigm from the core of the network to the edge of the network. It
is a highly virtualized platform which supports processing, analytics, data storage,
and networking services between end devices and cloud servers. The fog services
are usually evaluated using the Quality of Service (QoS) metric. The QoS is usually
evaluated based on four parameters: (1) connectivity; (2) reliability; (3) capacity;
and (4) delay.
Connectivity refers to the ability to stay connected to the network using the
appropriate protocols. As the fog follows a distributed nature, failure of one fog
node will not affect the connectivity. Reliability is the measure of how error-free
the data is. Fog computing performs processing in close proximity to the end
devices. This will considerably reduce the errors and issues which can happen
for data which needs to travel long distances. This ensures reliability in fog
computing.
Capacity is defined by bandwidth usage and storage. Capacity needs can be
improved based on how well the data is offloaded to different fog nodes and how
well the filtering is performed and the cloud resources are utilized. Delay is the time
between the request and the response, which also affects the QoS requirement of
fog computing. As processing happens on the edge of the network, latency will be
reduced and the response time will be shorter, which considerably reduces delay and
improves performance.
clearly shows that response time, delay, and jitter will be comparatively lower for edge/fog. Security is usually predictable and can be defined in the edge/fog, whereas it is undefined in the cloud. The chance of an attack on data en route is much lower in fog computing and lower in edge computing, whereas cloud computing has a greater chance of attacks. The cloud also does not support location awareness and mobility. The cloud is designed as a centralized data centre, whereas both edge and fog follow a distributed pattern.
3.2.4.1 Low Latency
One of the most promising characteristics of edge computing is latency reduction.
The latency can be defined as a delay in communication in simple terms. It refers
to the overall delay happening for packets from the source to the destination, right
from packet transfer by the sender to decoding at the receiver side. As the edge com-
puting performs computations at the edge of the network, latency can be consider-
ably reduced compared to cloud implementation.
3.2.4.3 Mobility Support
Fog/edge computing supports mobility with the help of protocols, such as Locator
ID Separation Protocol (LISP). LISP is an RFC 6830 standard protocol developed
by the Internet Engineering Task Force LISP Working Group. The LISP protocol
decodes the location information from the host packet and provides the required
mobility support for applications using edge computing.
3.2.4.4 Heterogeneity
Heterogeneity refers to different types of elements and technologies involved in
the edge computing infrastructure. The edge computing elements include the end
devices, servers and the edge network which performs a variety of functions. The
APIs, communication protocols, policies and platforms involved also contribute to
the heterogeneous nature of edge computing.
3.2.4.5 Location Awareness
Applications which require location awareness such as transportation and utility
management require edge/fog computing environment. Cloud computing does not
support geographically based applications due to lack of the location awareness fea-
ture. The edge/fog helps to collect and process information based on geographic
location.
3.2.5.2 Speed
The edge/fog computing services are provided at the edge of the network which
increases the speed in communication. The incoming requests are processed faster
and the response is generated immediately. Only the data which needs detailed processing will be transferred to the remote cloud. This also reduces the traffic in the network connecting the edge and the cloud. This integration greatly enhances the
speed and performance of the system.
3.2.5.3 Scalability
The edge computing infrastructure can be scaled as per the requirements. Companies
can easily expand the edge infrastructure in their premises as needed. Fog computing, which is more scalable than edge computing, can be used by organizations that need to scale their resources.
3.2.5.4 Performance
The performance of applications which rely on edge/fog computing improves as the
delay, jitter, response time, etc. will be reduced. The faster processing and filtering
done at the edge level contribute to high performance. Proper task scheduling
algorithms should be chosen to distribute the load among the available fog nodes.
This also will enhance the performance of the applications.
3.2.5.5 Security
The edge computing offers better security than the cloud as the edge/fog imple-
mentation lies within the premises and is under the control of the company. The
company can set up the required policies and can decide which data has to be passed
to the remote cloud and which has to be processed locally, which enhances security.
The data need not travel longer distances, reducing the chance of data manipulation.
The longer the transmission time, the more the chance of attacks and data loss. The
edge/fog reduces this risk and thus contributes to security.
3.2.5.6 Reliability
Edge/fog computing offers a more reliable service compared to the cloud. The
cloud-enabled applications completely rely on the remote cloud for storage, data
access and data processing. A failure or crash of the cloud server makes the cloud
services completely inaccessible. Fog and edge computing models are distributed in
nature, reducing the problems arising from a single point of failure. The failure of
one node will not affect the other nodes and the data transfer. The failed node can
be identified and tasks can be rescheduled to compensate the loss. The data can be
monitored properly to avoid errors creeping in, making the system more reliable
and trustworthy.
3.2.5.7 Agility
The fog/edge computing model can be built by small and individual businesses,
making it more agile. The system deployment is now possible with the help of open
software interfaces and development kits. There are numerous edge/fog simulation
platforms which can be explored to test the use cases.
The edge/fog devices can easily be configured and require lower infrastructure
costs for operation and maintenance. The edge/fog cooperates with the cloud data
centre and follows a layered design approach. The available architectural pattern for
edge/fog incorporation and its features are discussed next.
3.3.1.2 Fog/Edge Layer
The fog layer includes the fog nodes, fog devices and fog servers in general. Fog
nodes are the devices which are configured to collect data from the virtual and phys-
ical nodes. The fog nodes include fog servers where the data collected is stored and
analysed. Fog servers will be connected to fog devices which act as an intermediate
device between the end devices and the servers. The fog devices need to be connected
to the fog servers and this is enabled using gateways [8].
The fog servers will be responsible for performing the offloading and storing
various algorithms which decide the offloading strategy. Offloading refers to the dis-
tribution of incoming requests to the available fog nodes. The fog layer also includes
any errors and communications have to be properly encoded. The security policies for the connected applications and users have to be properly framed and strictly monitored, using security tools to maintain the reliability of the data.
The different units in the fog/edge layer cooperate with each other and work in
synchrony with the upper cloud layer and lower physical layer.
3.3.1.3 Cloud Layer
The filtered data from the middle layer will be forwarded to the upper cloud layer
for permanent storage. The cloud performs detailed analytics and processing which
cannot be done in the fog layer. The cloud is rich in resources and acts as a supporting layer for the intermediate edge/fog layer, providing permanent storage of data and data accessibility at any time.
The cloud layer includes components such as a client interface, which consists of a set of graphical user interfaces (GUIs) that help applications interact with the cloud; applications, the software capable of delivering the required functionalities to requesting clients; and a wide range of service models, such as Software as a Service, Platform as a Service, and Infrastructure as a Service. The cloud has a large volume of storage space and proper security principles to ensure the safety of the data.
The management module in the cloud is responsible for managing the front end and
back end and also checks resource utilization and availability. The management module
ensures that communication happens without errors and also monitors the perform-
ance of the system. The cloud storage and services can be accessed remotely. The major
components of the cloud layer work together to ensure data transfer between the lower
layers. The response time depends on availability and network conditions. But as the edge/fog layer is capable of performing most of the processing locally, the performance of connected applications can be guaranteed. The different units in the cloud layer that support data manipulation are shown in Figure 3.3.
The major components of the sub-layers of the three-tier architecture coord-
inate together to improve the performance of the connected system. The working
of the model can be summarized as follows. The lowermost end user layer continu-
ously generates data which moves to the edge/fog layer through the communication
medium. The fog devices in the middle layer accept the data and perform analysis.
The time-sensitive data will be processed in a faster mode to reduce the response
time. The response generated will be sent back to the requesting devices in the lower
layer. The edge/fog layer has data processors which decide whether the data has to
be offloaded to fog nodes or to the remote cloud [7]. If the data is not time-critical,
it will be filtered and sent to the remote cloud for further storage and processing.
The cloud layer, after processing the request, will send the response back to the lower
layer. Thus, the combined strategy helps to reduce the delay without compromising
the performance and resource requirements.
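To make this decision logic concrete, the following minimal Python sketch routes each incoming reading either to local fog processing or to the remote cloud, based on how much delay the application can tolerate. It is only an illustration: the 50 ms threshold and the handler functions are hypothetical and are not taken from any particular edge/fog framework.

    # Minimal sketch of the three-tier routing decision described above.
    # The deadline threshold and handler bodies are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Reading:
        device_id: str
        value: float
        max_tolerable_delay_ms: float  # how long the application can wait

    LOCAL_DEADLINE_MS = 50.0  # hypothetical cut-off for "time-critical" data

    def handle_at_fog(reading: Reading) -> str:
        # Placeholder for fast, local processing at the edge/fog node.
        return f"fog: processed {reading.device_id} locally"

    def send_to_cloud(reading: Reading) -> str:
        # Placeholder for filtering and forwarding to the remote cloud.
        return f"cloud: queued {reading.device_id} for deep analysis and storage"

    def route(reading: Reading) -> str:
        # Time-critical data is processed locally; the rest goes to the cloud.
        if reading.max_tolerable_delay_ms <= LOCAL_DEADLINE_MS:
            return handle_at_fog(reading)
        return send_to_cloud(reading)

    if __name__ == "__main__":
        print(route(Reading("ecg-01", 0.93, max_tolerable_delay_ms=20)))
        print(route(Reading("temp-07", 36.6, max_tolerable_delay_ms=5000)))

In a real deployment the routing decision would, of course, also consider node load, bandwidth, and security policy, as discussed in the following sections.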
The above model has numerous applications in different sectors. Most of the
IoT applications are currently shifting from cloud-only infrastructure to edge/
fog–cloud integration to reduce latency. The research in this area is active and
different algorithms are being tested to make the applications more productive. The
different layers are independently analysed and problems are identified. The research
initiatives in this area are gaining popularity nowadays.
3.4.1 Healthcare Industry
The healthcare sector generates a vast amount of data every second, which needs
to be stored, processed and analysed. The analysis and decision-making process of
health information are highly time-critical and cannot tolerate delay. Edge or fog
computing reduces the delay by providing faster response times. The data can be
filtered and transferred to the cloud for permanent storage.
3.4.2 IoT Applications
IoT applications depend on the cloud for storage and processing as the IoT nodes
are resource-constrained. The sensors generate millions of bits of data which need
to be stored and processed at a fast rate. Cloud computing models suffer from huge
delays as the bulk amount of raw data needs to be filtered and analysed before gen-
erating a response. Edge/fog computing helps data filtering and reduces delay as the
processing happens near the IoT devices. The bandwidth consumption, delay and
traffic congestion will be reduced after the integration of edge and fog computing.
3.4.5 Manufacturing Industries
Manufacturing industries require faster response as their operations are highly time-
critical. The delay in getting the response from the remote cloud can result in on-site
accidents and errors. The machines generate bulk amounts of raw data which need
to be analysed to derive patterns and decisions. The huge volume of data will over-
load the cloud server, reducing its capacity to generate a response. Edge/fog computing helps by filtering data, rather than passing massive amounts of raw data to the central server. This makes the decision-making process easier. The edge also helps in
processing data locally and reducing the response time and transportation cost for
time-critical applications.
3.4.6 Automobile Industry
EdgeAI is a promising platform for the rapidly growing automobile industry. Edge computing provides the support required for predictive maintenance, assisting drivers in identifying problems and taking emergency control, monitoring drivers to prevent accidents, and supporting driver identification. Emergency situations can be properly tracked and alerts can be issued with the incorporation of EdgeAI.
3.4.7 Summary
The edge/fog computing also supports online gaming applications, video streaming
applications, industrial control applications etc. The user experience can be greatly
improved in online streaming applications with edge/fog incorporation. Edge/Fog
is a promising solution for all time-critical and delay-sensitive applications.
3.5.1 Increased Complexity
Edge/fog computing is distributed in nature, with fog nodes spanning multiple locations. The system resources will also be distributed among different
locations, making the design complex. The resource allocation and maintenance
should be done properly to improve the performance of the applications. Proper
research initiatives have to be undertaken to reduce the complexity of distributed
implementation.
still ongoing. The common encryption policies and security algorithms need to be
implemented and tested in the edge/fog layer to understand the potential risks.
3.5.3 Task Scheduling/Offloading
Offloading is the process of distributing the given tasks to the available fog nodes in a distributed manner. The incoming requests from different devices need to be efficiently distributed to improve the performance of the application. The fog implementation should choose appropriate load balancing algorithms to effectively offload the data. This is one of the challenges to be addressed in fog/edge implementation. Research work should be focused on designing newer algorithms to perform offloading in an efficient manner, for example along the lines sketched below.
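As an illustration only (the chapter does not prescribe any particular algorithm), the short Python sketch below assigns each incoming task to the least-loaded of the available fog nodes, one simple form of load balancing; the node names and task costs are hypothetical.

    # Minimal least-loaded offloading sketch; only one of many possible policies.
    import heapq

    def offload(tasks, node_names):
        """Assign each (task_id, cost) pair to the fog node with the lowest current load."""
        heap = [(0.0, name) for name in node_names]  # (current load, node)
        heapq.heapify(heap)
        assignment = {}
        for task_id, cost in tasks:
            load, node = heapq.heappop(heap)   # pick the least-loaded node
            assignment[task_id] = node
            heapq.heappush(heap, (load + cost, node))
        return assignment

    if __name__ == "__main__":
        tasks = [("t1", 3.0), ("t2", 1.0), ("t3", 2.5), ("t4", 0.5)]
        print(offload(tasks, ["fog-a", "fog-b"]))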
3.5.4 Power Consumption
The fog nodes span across multiple locations, increasing the power consumption.
Not a lot of research has been done in this area. The power-constrained end devices
are sometimes configured as edge devices. The edge nodes need to perform complex
processing and analytics which can make the battery run down faster. This challenge
has to be prioritized and appropriate solutions have to be identified.
3.5.5 Data Management
Data management is another challenge which needs to be addressed in fog imple-
mentation. The distribution of incoming data, the collection of data, data caching,
encryption principles, namespace issues, etc. need to be addressed to effectively
manage the data. The algorithms should be carefully designed to manage data
effectively.
3.5.6 Quality of Service
The fog nodes are usually heterogeneous in nature. The QoS requirements need to
be satisfied by different participating nodes. The quality has to be maintained with
respect to the connectivity, data content, delivery, etc. The incorporation of different
fog nodes with varying capacities can be a challenge for industrial applications [10],
where QoS requirements must be guaranteed. Proper research has to be done to
enable edge/fog for QoS-guaranteed applications.
etc. But current research work focuses only on one or two parameters and doesn’t
support a combined strategy. This is an area of open research where fog designs
should include multiple characteristics and an efficient system has to be designed.
The fog/edge computing suffers from a few challenges as discussed above. As the
technology is still in its infant stage, there are a lot of open research problems which
need to be addressed. The fog security and privacy, data offloading in distributed fog
nodes, bandwidth utilization in fog, multi-characteristic fog design ideas, etc. are
some of the areas where research work has to be focused. Numerous simulation platforms are available for edge/fog implementation, and researchers can explore the features of different platforms and choose an appropriate simulator by considering the
requirements. The architectural hierarchy, encryption policies, task scheduling, net-
work usage and resource monitoring are areas where research opportunities are open.
3.6 Conclusion
Edge/fog computing has emerged as a complement to cloud computing to make processing faster and improve performance. A wide range of applications depends on the cloud for data storage, processing, data analytics, etc. Some
applications which depend on the cloud suffer from serious latency issues. The delay
and latency can be reduced by the incorporation of edge/fog computing. Edge/
fog computing is implemented at the edge of the network. This greatly reduces the
response time, delay, jitter and latency. Edge/fog computing has numerous benefits
in terms of cost, scalability, reliability, privacy and security.
A wide range of applications is currently moving from cloud-alone platforms
to an edge/fog-cloud environment. This incorporation has greatly improved the per-
formance of the applications. Edge/fog is a promising platform which has lots of
advantages over the existing cloud-alone implementation. The hybrid model which
combines edge/fog and the cloud has gained wide popularity. Addressing the edge/
fog-related challenges needs a thorough understanding of the concept. The different
sections in this chapter will help researchers to gain knowledge about the edge/fog
environment and to focus on areas which need further research inputs.
References
1. B. Alouffi, M. Hasnain, A. Alharbi, W. Alosaimi, H. Alyami, and M. Ayaz (2021) “A
Systematic Literature Review on Cloud Computing Security: Threats and Mitigation
Strategies”. IEEE Access, 9: 57792–57807. doi: 10.1109/ACCESS.2021.3073203.
2. M. B. Yassein, O. Alzoubi, S. Rawasheh, F. Shatnawi, and I. Hmeidi (2020) “Features,
Challenges and Issues of Fog Computing: A Comprehensive Review”. WSEAS
Transactions on Computers, 19. doi: 10.37394/23205.2020.19.12.
Chapter 4
Reduce Overfitting
and Improve Deep
Learning Models’
Performance in Medical
Image Classification
Nidhin Raju and D. Peter Augustine
4.1 Introduction
Image classification is the process of assigning at least one label to an image, and it is
one of the most important challenges in computer vision. It has several applications,
including image and video recovery, video reconnaissance, web content evaluation,
and biometrics. Feature coding is an important element of image classification that
has received a lot of attention in recent years, with numerous coding algorithms presented [1]. In general, image classification involves extracting features from images and then grouping the extracted features. In this way, the critical question of image classification is how to extract and analyse features from images. To handle an image, traditional image classification approaches use low-level or mid-level features. Low-level features are mostly determined by grayscale intensity, color, texture, shape, and location information, all of which are influenced by human perception (otherwise called hand-crafted features). After the extraction of features in computer vision, a
classifier is typically used to assign labels to various types of objects [2].
The deep learning technique, unlike the traditional image classification strategy, combines the processes of image feature extraction and classification into a single structure. The high-level feature representation of deep learning has proven to be superior to hand-crafted low-level and mid-level features, achieving excellent results in the recognition and classification of images. This is the premise of the deep learning model, which is made up of many layers, such as convolutional layers and fully connected layers, that transform input data (e.g., pictures) into output data (e.g., classification outputs) while progressing to increasingly higher-level features. The primary advantage of deep learning is that it can learn data-driven (or task-specific) hierarchical features and perform feature extraction and classification in a single network that is trained end to end [3, 4]. Deep learning is an effective approach in many domains, including medical image classification, due to its ability to automatically learn and extract intricate features from complex data.
Compared to the computer vision domain, applying deep learning to medical image classification has a few inherent problems. While huge databases of common, general-purpose images are easily accessible to computer vision professionals, often for free, obtaining and using clinical images in the development of new deep learning-based innovations is far more difficult and is a critical constraint. Despite the fact that the picture archiving and
communication system (PACS) frameworks are filled with a huge number of
images obtained on a regular basis in almost every clinic, medical image datasets
are often small and private [3, 5]. There are two main reasons why this massive
amount of stored data is not easily exploitable for clinical image analysis. The
first reason is ethical and privacy concerns, as well as legal difficulties. Clinical information must be used according to certain rules to guarantee proper usage. To include an image in a specific study, the patient must give their informed consent, and data anonymization procedures must be followed to ensure the patient's safety. The lack of image annotations is another reason. Constructing a clinical image classification model assumes that every pixel in the picture is labeled with its class, i.e., object or background, which is time-consuming and requires expert knowledge [6, 7].
The constraints stem not only from the imaging data but also from the nature of the relevant annotations. Obtaining specific labels for each pixel in the image is time-consuming and requires expert knowledge. As a result, professionals try to lighten the load by developing semi-automatic annotation tools, producing incomplete annotations, or crowdsourcing non-expert labels. Managing label noise when creating classification models is a challenge in any of these scenarios. The quality of the resulting annotations, which can be treated as noisy, is often substandard, with factors such as annotator knowledge, the purpose of the images, visual perception, and fatigue playing a significant role. As a result, developing a deep learning framework based on such
data necessitates careful consideration of the most effective way of managing noise
and exposure in the real world, which is still an ongoing topic of research [8].
As a result, learning properly from limited annotated data is an important topic of investigation that designers and specialists must focus on. There are several strategies to increase the dataset size for deep learning-based classification. Data augmentation is one such strategy: it enlarges the dataset by adding extra images, either through simple operations such as translations and rotations or through more advanced methods. In many applications, instead of conventional photos, 3D medical volumes are treated as stacks of independent 2D images to provide additional samples to the model [9]. Sub-volumes or image patches are also frequently extracted from images to increase the amount of data available. The undeniable flaw is that the anatomic context in directions orthogonal to the slice plane is completely lost. Based on Generative Adversarial Networks (GANs), another technique for creating synthetic images to enlarge the dataset has recently been proposed. GANs are a unique type of neural network model in which two models are trained simultaneously: the generator, which is focused on creating synthetic images from noise vectors, and the discriminator, which is focused on distinguishing between genuine and fake images produced by the generator. GANs are trained by solving an optimization problem whose objective the discriminator tries to maximize and the generator tries to minimize [10, 11].
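The following minimal Python sketch illustrates the simpler, transformation-based side of data augmentation, assuming the torchvision library; the specific transforms and parameter values are illustrative choices rather than a recommended pipeline.

    # Minimal data-augmentation sketch using torchvision; transforms and
    # parameter values are illustrative, not the chapter's prescribed pipeline.
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(degrees=10),       # small random rotations
        transforms.RandomHorizontalFlip(p=0.5),      # random left-right flips
        transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),  # random patches
        transforms.ToTensor(),
    ])

    # Applied on the fly during training, every epoch sees slightly different
    # versions of each image, effectively enlarging the dataset, e.g.:
    # train_set = torchvision.datasets.ImageFolder("train/", transform=augment)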
dimensionality of the activation map. This capability is similar to that of C-cells and contributes to the abstraction of hierarchical features in our visual data processing [14, 15]. Convolutional neural networks (CNNs) are classified as deep learning
networks and are commonly used to analyse and classify images. They are made up
of an input and an output layer, with several hidden layers in between. The hidden
layers are made up of a few convolutional layers that extract the various features of
the images.
A CNN is represented by the following non-linear function:

$c = C(M; \theta)$    (4.1)

which transforms an image $M \in \mathbb{R}^{W \times W}$ of size $W \times W$ into a vector $c = (c_1, c_2, \ldots, c_n)^T$, where $c_i \in [0, 1]$ signifies the probability that the image $M$ belongs to class $i$, for $i = 1, \ldots, n$. The $K$ parameters used to map the input image to the vector $c$ are $\theta = \{\theta_1, \theta_2, \ldots, \theta_K\}$.

CNN training can be viewed as a non-linear optimization problem:

$D_i = \dfrac{1/O_i}{\sum_{i=1}^{n} 1/O_i}$    (4.4)
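As a minimal, hypothetical illustration of the mapping $c = C(M; \theta)$, the PyTorch sketch below defines a small CNN that maps a single-channel W×W image to a vector of n class probabilities; the layer sizes are illustrative and are not taken from this chapter.

    # Minimal sketch of c = C(M; theta): a small CNN mapping a W x W grayscale
    # image to n class probabilities. Layer sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, w: int = 64, n_classes: int = 3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                  # halves spatial dimensions
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * (w // 4) * (w // 4), n_classes)

        def forward(self, m: torch.Tensor) -> torch.Tensor:
            h = self.features(m).flatten(1)
            return torch.softmax(self.classifier(h), dim=1)  # c_i in [0, 1], summing to 1

    model = SmallCNN(w=64, n_classes=3)
    image = torch.randn(1, 1, 64, 64)   # a single 64 x 64 image M
    print(model(image))                 # class probabilities c_1 ... c_n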
than other excellent CNNs but too expensive to be trained on a standard computer
device [24]. While there has been significant development in Central Processing
Units and Graphics Processing Units recently, it is not practical in some real-world
applications. As this is still in its early stages, this current situation provides an
opportunity to investigate lightweight architectures (LWAs) that can be used in
cell phones to improve convenience and productivity. A dataset is called labeled if it contains appropriate image annotations for each of the available classes. For example, if a cardiac image dataset has two classes, "cardiac" and "normal," each image should be annotated with one of these classes [25].
Labeled datasets are important for medical image classification because deep
learning models rely on a large number of labeled images for training in order to
provide productive image classification. Collecting a large number of labeled clinical
image datasets can be a difficult task, in contrast to regular images. With access to
decent labeled image data, transfer learning can also succeed and produce more powerful solutions. Traditional transfer learning techniques perform well when data is scarce, but there is still room for improvement [26]. In normal images, the differences in features from one image to the next are significant, with the images having a variety of lighting and shapes; in medical images, these distinctions can be minute, depending on the problem, making it difficult to learn representations from limited data while using a transfer learning strategy. Multi-stage transfer learning is one viable strategy for overcoming this challenge.
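A common way to apply transfer learning with a lightweight pretrained model is sketched below, assuming torchvision's ImageNet-pretrained MobileNetV2 (torchvision 0.13+ API) and a hypothetical two-class medical task: the pretrained feature extractor is frozen and only a new classification head is fine-tuned.

    # Minimal transfer-learning sketch: the two-class setup is hypothetical,
    # and MobileNetV2 is used only as an example of a lightweight architecture.
    import torch.nn as nn
    from torchvision import models

    model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)

    for param in model.parameters():          # freeze the pretrained feature extractor
        param.requires_grad = False

    model.classifier[1] = nn.Linear(model.last_channel, 2)   # new trainable head

    # Only the new head's parameters would be passed to the optimizer, e.g.:
    # optimizer = torch.optim.Adam(model.classifier[1].parameters(), lr=1e-3)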
examples of LWAs that are opening the way for more efficient image classification products. Each of these models has unique architectural advantages that result in improved image recognition. MobileNets, for example, uses depth-wise separable convolutions, whereas SqueezeNet reduces computational cost by replacing 3×3 convolutional filters with 1×1 filters while maintaining accuracy. The use of LWAs in medical image classification is still in its early stages, but the field is rapidly progressing with proposals of new ways to further improve efficiency, as mentioned in Section 4.3.1. A large number of these models are used for transfer learning in medical image classification. Due to its architecture with fewer parameters, the SqueezeNet model outperformed all other models in terms of speed. Although SqueezeNet achieved results comparable to more complex models, there were some significant misclassifications of precancerous images as benign, indicating that there are issues in LWAs and that further analysis is necessary [30, 31].
data and ends up retaining the patterns as well as the noise and random variations. On unseen data, these models fail to generalize and perform adequately, negating the point of the model. A high variance in the model's performance indicates an overfitting problem. The model's training duration or architecture depth may cause it to overfit [33]. If the model trains for an extremely long time on the training data or is overly complex, it learns the noise or superfluous detail from the dataset. A model that fails to adequately learn the problem, performs poorly on the training dataset, and does not perform well on a hold-out test is an underfit model. A model's fit can be thought of as a bias–variance trade-off. Underfit models have a high bias but a low variance: they cannot learn the problem, regardless of the specific examples in the training data. An overfit model has a low bias and a high variance: the model learns the training data extremely well, but its performance varies significantly on new unseen data or even when statistical noise is added to samples in the training dataset [34]. The easy ways to reduce overfitting are:
The advantage of extremely deep neural networks is that their performance improves as they are fed increasingly large datasets. With a near-unlimited amount of data, a model will eventually plateau at what the network's capacity is able to learn. A model can overfit a training dataset if it has sufficient capacity; reducing the model's capacity reduces the likelihood of it overfitting the training dataset, to the point where it no longer overfits. The complexity of a neural network model is defined by both its structure, in terms of nodes and layers, and its parameters, in terms of weights. In this way, we can change the complexity of a model to reduce overfitting in two different ways:
In neural networks, the complexity can be changed by adjusting the number of adaptable parameters. Another primary method for controlling the complexity of a model is regularization, which adds a penalty term to the error function. Large weights will almost always result in sharp transitions in activation functions and, as a result, large changes in output for small changes in inputs. It is more practical to concentrate on techniques that constrain the size of the weights of a neural network, because a single model structure can be defined that is under-constrained, i.e., it has a much higher capacity than is required for the problem, and regularization can be used during training to ensure that the model does not overfit. In such cases, performance may be improved because the additional
capacity can be directed toward learning generalizable concepts for the task [35, 36]. Regularization strategies are those that attempt to decrease overfitting by keeping network weights small. Regularization refers to a class of approaches that add extra information to turn an ill-posed problem into a more stable, well-posed one. When small changes in the given data cause large changes in the solution, a problem is said to be ill-posed. Because of the uncertainty in the data, solutions are unreliable, as minor estimation errors in the parameters can be greatly amplified and lead to wildly different outcomes. The concept behind regularization is to use prior information to restate an ill-posed problem in a stable form. Regularization strategies are so extensively used to reduce overfitting that the term regularization could be used to refer to any strategy that improves the generalization error of a model. Regularization is any adjustment to a learning model that is intended to reduce its generalization error but not its training error. Regularization is one of the primary concerns in the field of deep learning, rivaled in importance only by optimization [37, 38].
4.4.1 Weight Regularization
The easiest and possibly most common regularization strategy is to add a penalty to the loss function in relation to the size of the weights in the model. Neural networks learn a set of weights that best map inputs to outputs. A network with very large weights may be an indication of an unstable architecture in which minor changes in input can result in significant changes in the output. This could indicate that the model overfits the training dataset and will probably perform poorly when predicting on new data. One way to solve this is to update the learning model to encourage the network to keep the weights small. This is known as weight regularization, and it is commonly used as a general strategy to reduce overfitting of the training dataset and improve the model's generalization. While training a neural network model, we learn the network's weights using stochastic gradient descent and the training dataset. The more we train the model, the more specialized the weights become to the data, causing the training data to be overfit. The weights grow in size to handle the specifics of the samples in the training data. Massive weights render the model unstable. When the weights are specific to the training dataset, minor variations or statistical noise in the expected inputs will cause huge differences in the results. A network with huge weights is more complex than one with smaller weights, and it indicates a model that is overly specialized to the training data. Generally, we recommend using simpler models to solve problems, i.e., models with smaller weights. Another possibility is that there are numerous input variables, each with varying degrees of relevance to the output variable. We can sometimes use strategies to help us select input variables, but the interrelationships between variables aren't always
the network’s advancement to take into account the size of the weights. Keep in
mind that when we train a model, we limit a loss measurement. We can add the
current size of all loads in the model or include a layer to this estimation when cal-
culating the difference between the anticipated and expected values in a batch. This
is referred to as a penalty because we are punishing the model in relation to the size
of the weights in the model. Larger weights result in a harsher penalty, as well as a
higher loss score. The optimization algorithm will then push the model to have more
modest weights, such as weights no larger than what is required to perform well on
the training dataset. Tinier weights are regarded as more normal or less abnormal,
use this penalty as weight regularization in this way [39]. Higher weights result in a
harsher punishment, as well as a higher loss score. The advancement calculation will
then drive the model to have more modest weights, such as weights no larger than
what is necessary to perform well on the training dataset. We refer to this penalty
as weight regularization since lighter weights are considered more common or less
particular. When this technique of punishing model coefficients is used in various
AI models, it might be addressed as shrinkage, because the punishment energizes the
coefficients to retreat during the progression interaction.
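A minimal sketch of weight regularization, assuming PyTorch, is shown below: an L2 penalty on the weights is added to the loss or, equivalently, weight decay is set in the optimizer. The coefficient 1e-4 and the tiny network are illustrative values only.

    # Minimal weight-regularization sketch; network size and coefficient are
    # illustrative assumptions, not recommendations from the chapter.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    criterion = nn.CrossEntropyLoss()
    l2_lambda = 1e-4

    def loss_with_weight_penalty(outputs, targets):
        # Penalty grows with the squared size of every weight in the model.
        penalty = sum((p ** 2).sum() for p in model.parameters())
        return criterion(outputs, targets) + l2_lambda * penalty

    x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))
    print(loss_with_weight_penalty(model(x), y))

    # The same effect is commonly obtained through the optimizer instead:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=l2_lambda)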
4.4.2 Activity Regularization
Deep learning models can facilitate feature learning. The model will automatically extract the salient features from the input data, or learn features, during network training. These features can be used by the model to predict a quantity for regression or a class label for classification. These internal representations are tangible things: the features learned by the model at a given point in the network are represented by the output of a hidden layer within the network. There is an area of research focused on the efficient and effective automatic learning of features, often examined by having a network compress an input into a small learned feature before using a second network to reconstruct the original input from that feature. These models are known as auto-encoders or encoder-decoders, and their learned features can be helpful in learning more about the domain and in predictive analytics. The learned features, or encoded inputs, should be large enough to capture the salient characteristics of the data while also being compact enough not to overfit the specific samples in the training dataset. As a result, there is a tension between the expressiveness and the generalization of the learned features [43].
The network's loss function can be updated to penalize the model in proportion to the magnitude of its activations. This is similar to weight regularization, in which the loss function is updated to penalize the model based on the magnitude of the weights. This type of penalty or regularization is known as activation regularization or activity regularization, because the output of a layer is referred to as its activation or activity. The output of an encoder or, more generally, the output of a hidden layer in a neural network can then be viewed as the model's representation of the problem.
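A minimal sketch of activity regularization, assuming PyTorch, is given below: an L1 penalty on a hidden layer's activations is added to a small auto-encoder's reconstruction loss. The architecture and the 1e-3 coefficient are illustrative assumptions.

    # Minimal activity-regularization sketch: an L1 penalty on the hidden
    # activations encourages a sparse learned representation.
    import torch
    import torch.nn as nn

    class TinyEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(32, 16)
            self.out = nn.Linear(16, 32)

        def forward(self, x):
            activation = torch.relu(self.hidden(x))   # the "activity" being penalized
            return self.out(activation), activation

    model = TinyEncoder()
    l1_lambda = 1e-3
    x = torch.randn(8, 32)

    reconstruction, activation = model(x)
    loss = nn.functional.mse_loss(reconstruction, x) + l1_lambda * activation.abs().mean()
    print(loss)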
The L1 norm promotes sparsity by, for example, allowing some activations to become exactly zero, whereas the L2 norm promotes small activation values overall. The L1 norm is the more commonly used penalty for activation regularization. A hyperparameter indicating the amount or degree to which the loss function weights the penalty must be specified; typical values lie on a logarithmic scale between 0 and 0.1. Activity regularization can be applied in conjunction with other regularization procedures, such as weight regularization. It may be used with most, if not all, types of neural network models, including the best-known network types, such as Multilayer Perceptrons and Convolutional Neural Networks. Activity regularization may be the best option for model types that aim to learn a good internal representation, such as auto-encoder and encoder-decoder models. The L1 norm, which encourages sparsity, is the most commonly used activation regularization; other options, such as the L2 norm or using both the L1 and L2 norms concurrently, can also be explored. The rectified linear activation function, also known as ReLU, is currently widely used in the hidden layers of deep neural networks. Unlike classic activation functions such as tanh and sigmoid (the logistic function), ReLU readily produces exact zero values [45]. This makes it a viable candidate when learning sparse representations, for example, with the L1 vector norm for regularization. It is usual to provide small values for
4.4.4 Noise Regularization
Adding random noise is one technique to reduce the generalization error and improve the structure of the mapping problem. Adding noise during the training of a neural network model has a regularizing effect and, consequently, improves its performance. It has been demonstrated that this has an effect on the loss function similar to that of weight regularization approaches, i.e., the addition of a penalty term. The size of the training dataset is effectively increased by introducing noise. The input variables are altered randomly each time a training sample is presented to the model, making each exposure to the model unique. This makes the addition of noise to input samples a straightforward method of data augmentation. The addition of noise reduces the network's ability to memorize training samples since they change all the time, leading to smaller network weights and a more robust network with reduced generalization error. The noise implies that fresh samples are being drawn from the domain around known samples, smoothing the structure of the input space. Because of this smoothing, the mapping function may be easier for the network to learn, resulting in better and faster learning [50, 51].
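A minimal sketch of this idea, assuming PyTorch, is shown below: small Gaussian noise is added to the input batch on every training pass, so each exposure of a sample is slightly different. The noise scale of 0.05 is an illustrative value only.

    # Minimal noise-regularization sketch; the noise scale is illustrative.
    import torch

    def add_input_noise(batch: torch.Tensor, std: float = 0.05) -> torch.Tensor:
        # Perturb each input with zero-mean Gaussian noise (training time only).
        return batch + torch.randn_like(batch) * std

    clean_batch = torch.randn(16, 1, 28, 28)      # e.g. a batch of small images
    noisy_batch = add_input_noise(clean_batch)    # used only during training
    print((noisy_batch - clean_batch).abs().mean())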
the validation dataset begins to deteriorate, the training process is terminated. The
model that is used when training is stopped is known to have strong generalization
performance. This is known as early stopping, and it is one of the oldest and most extensively used forms of neural network regularization [54, 55].
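A minimal early-stopping sketch in Python is given below; the patience value and the train_one_epoch and evaluate helpers are placeholders for a real training loop rather than part of any specific framework.

    # Minimal early-stopping sketch: stop once the validation loss has not
    # improved for `patience` consecutive epochs and keep the best model.
    import copy

    def train_with_early_stopping(model, train_one_epoch, evaluate,
                                  max_epochs=100, patience=5):
        best_loss, best_state, epochs_without_improvement = float("inf"), None, 0
        for epoch in range(max_epochs):
            train_one_epoch(model)
            val_loss = evaluate(model)          # loss on the hold-out validation set
            if val_loss < best_loss:
                best_loss, best_state = val_loss, copy.deepcopy(model)
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                        # validation loss stopped improving
        return best_state if best_state is not None else model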
4.5 Conclusion
Deep learning is one of the important components of medical image assessment. It has been used successfully in target identification, segmentation, and classification. The advance of deep learning in the clinical sector depends on the gathering of vast amounts of clinical information, which has multi-modal features and thus provides a lot of rich information for deep learning. In terms of disease treatment, deep learning can not only locate the affected region but also distinguish and identify specific lesions; when a lesion is present, several detection models can also segment the affected area. While deep learning has many advantages, it also has some drawbacks. The deep learning approach is heavily reliant on datasets. Every deep learning network requires massive amounts of data for training, making dataset acquisition genuinely difficult. The fundamental reason is that the pixel features in the original input image are highly complex; thus, a future development trend is to focus on models that work with smaller datasets. Deep learning has evolved so quickly in the medical industry because it is inextricably linked to a wide range of treatment procedures. The goal of adequately applying deep learning to all phases of medical care is becoming more complicated. It is determined by two factors: the perpetual iteration of technology and the continuing accumulation of medical experience. On the other hand, such deep learning models tend to overfit, mainly because of the lack of available data in the training set and the network architecture. In this chapter, we have presented some solutions to the overfitting problem that improve the performance of deep learning models.
References
1. C. Affonso, A. L. D. Rossi, F. H. A. Vieira, and A. C. P. de Carvalho (2017) “Deep
Learning for Biological Image Classification.” Expert Systems with Applications, 85: 114–
122. doi:10.1016/J.ESWA.2017.05.039.
2. S. R. M. Zeebaree et al. (2020) “Deep Learning Models Based on Image Classification:
A Review of Facial Expression Recognition Based on Hybrid Feature Extraction
Techniques,” International Journal of Science and Business, 4(11): 75–81. doi:10.5281/
zenodo.4108433.
3. S. Suganyadevi, V. Seethalakshmi, and K. Balasamy (2021) “A Review on Deep Learning
in Medical Image Analysis.” International Journal of Multimedia Information Retrieval,
11(1): 19–38. doi:10.1007/S13735-021-00218-1.
18. L. Alzubaidi et al. (2021) “Novel Transfer Learning Approach for Medical Imaging with
Limited Labeled Data.” Cancers, 13(7):1590, doi:10.3390/CANCERS13071590.
19. J. Wang, H. Zhu, S. H. Wang, and Y. D. Zhang (2021) “A Review of Deep Learning
on Medical Image Analysis.” Mobile Networks and Applications, 26(1): 351–
380. doi:10.1007/S11036-020-01672-7/TABLES/11.
20. S. Niu, M. Liu, Y. Liu, J. Wang, and H. Song (2021) “Distant Domain Transfer
Learning for Medical Imaging.” IEEE Journal of Biomedical and Health. Informatics,
25(10): 3784–3793. doi:10.1109/JBHI.2021.3051470.
21. G. Ayana, K. Dese, and S. W. Choe (2021) “Transfer Learning in Breast Cancer Diagnoses via Ultrasound Imaging.” Cancers, 13(4): 738. doi:10.3390/CANCERS13040738.
22. F. Altaf, S. M. S. Islam, and N. K. Janjua (2021) “A Novel Augmented Deep Transfer
Learning for Classification of COVID-19 and Other Thoracic Diseases from X-Rays,”
Neural Computing and Applications, 33(20): 14037–14048. doi:10.1007/S00521-021-
06044-0/TABLES/7.
23. M. A. Morid, A. Borjali, and G. Del Fiol (2021) “A Scoping Review of Transfer Learning
Research on Medical Image Analysis Using ImageNet.” Computers in Biology and
Medicine, 128: 104115. doi:10.1016/J.COMPBIOMED.2020.104115.
24. D. R. Sarvamangala and R. V. Kulkarni (2022) “Convolutional Neural Networks in
Medical Image Understanding: A Survey.” Evolutionary Intelligence, 15(1): 1– 22.
doi:10.1007/S12065-020-00540-3/FIGURES/2.
25. M. Arbane, R. Benlamri, Y. Brik, and M. Djerioui (2021) “Transfer Learning for
Automatic Brain Tumor Classification Using MRI Images.” Paper presented at 2nd
International Workshop on Human-Centric Smart Environments for Health and Well-
Being, IHSH 2020, pp. 210–214, February. doi:10.1109/IHSH51661.2021.9378739.
26. S. Cyriac, N. Raju, and S. Ramaswamy (2021) “Comparison of Full Training and
Transfer Learning in Deep Learning for Image Classification.” Lecture Notes on Networks
Systems, 290: 58–67. doi:10.1007/978-981-16-4486-3_6/COVER.
27. D. Karimi, S. K. Warfield, and A. Gholipour (2021) “Transfer Learning in Medical Image
Segmentation: New Insights from Analysis of the Dynamics of Model Parameters and
Learned Representations.” Artificial Intelligence in Medicine, 116: 102078. doi:10.1016/
J.ARTMED.2021.102078.
28. M. O. Aftab, M. Javed Awan, S. Khalid, R. Javed, and H. Shabir (2021) “Executing
Spark Big DL for Leukemia Detection from Microscopic Images Using Transfer
Learning.” Paper presented at 2021 1st International Conference on Artificial
Intelligence and Data Analysis (CAIDA 2021), April, pp. 216–220. doi:10.1109/
CAIDA51941.2021.9425264.
29. N. Raju, Anita H. B., and P. Augustine (2020) “Identification of Interstitial Lung
Diseases Using Deep Learning.” International Journal of Electrical and Computer
Engineering, 10(6): 6283–6291. doi:10.11591/IJECE.V10I6.PP6283-6291.
30. L. Gaur, U. Bhatia, N. Z. Jhanjhi, G. Muhammad, and M. Masud (2021) “Medical
Image-Based Detection of COVID-19 Using Deep Convolution Neural Networks.”
Multimedia Systems, 1: 1–10. doi:10.1007/S00530-021-00794-6/TABLES/6.
31. J. M. Valverde et al. (2021) “Transfer Learning in Magnetic Resonance Brain Imaging:
A Systematic Review.” Journal of Imaging, 7(4): 66. doi:10.3390/JIMAGING7040066.
T.me/nettrain
82 | Machine Intelligence
T.me/nettrain
DL Models’ Performance in Medical Image Classification | 83
T.me/nettrain
T.me/nettrain
Chapter 5
Motion Images Object Detection Using Deep Learning Algorithms
5.1 Introduction
5.1.1 Self-Driving Vehicles
Self-driving vehicles are cars or trucks that operate without a human driver. Also
referred to as autonomous or driverless cars, they combine sensors and software to
control, navigate, and drive the vehicle.
5.1.2 Levels of Autonomy
Different cars are capable of various levels of self-driving, which researchers usually
describe on a scale of 0–5 (Figure 5.1):
DOI: 10.1201/9781003424550-5
5.1.3.1 Image Pre-Processing
The pre-processing technique is used to remove or suppress unwanted noise and
improve the image quality in such a way by using image processing techniques
such as smoothing and sharpening. Other common pre-processing operations
include rotation, scaling, zooming, resizing, conversion to grayscale, and geometric
transformation.
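A minimal OpenCV sketch of these operations is shown below; the file name and parameter values are illustrative assumptions.

    import cv2

    img = cv2.imread("frame_001.jpg")                   # hypothetical input frame
    resized = cv2.resize(img, (416, 416))               # resize to the network input size
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)    # grayscale conversion
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)        # smoothing (noise suppression)
    # A simple unsharp mask: subtract part of the blurred image to sharpen edges.
    sharpened = cv2.addWeighted(gray, 1.5, smoothed, -0.5, 0)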
5.1.4 Deep Learning
Deep learning (also called deep structured learning) is part of a broader family
of machine learning methods based on artificial neural networks with representation
learning. Learning may be supervised, unsupervised, or reinforced. Deep learning is
a collection of algorithms inspired by the workings of the human brain that process
information and discover patterns for use in decision-making. It is built on a model
design known as the artificial neural network. Deep learning has achieved enormous
success in computational science, and its algorithms are broadly used in industries to
solve complex problems. Each deep learning algorithm uses a particular variety of
neural network to accomplish a specific task, and these algorithms train computers
by learning from examples. Industries such as healthcare, e-commerce, entertainment,
and advertising commonly use deep learning.
5.1.6 Computer Vision
Computer vision means the extraction of information from pictures, text, videos, etc.
Generally, computer vision strives to imitate human vision. It is a subdivision of
Artificial Intelligence that collects data from digital pictures or videos and processes
them to describe their attributes. The complete process involves image acquisition,
viewing, reviewing, distinguishing, and data extraction. This in-depth process helps
computers to understand any visual content and consequently act upon it. Computer
vision projects translate digital visual content into precise descriptions to assemble
multidimensional knowledge. This knowledge is then turned into a computer-readable
form to assist the decision-making process. The main objective of this wing of AI is
to train machines to gather information from pictures.
The field of AI has seen dramatic changes over the past few years, resulting in
several new techniques. Computer vision is one such field that has gained momentum
in recent times; it aims at instructing machines to study and interpret the visual
world. Computer vision deals with numerous challenging tasks, ranging from
classifying pictures to recognizing faces. One such challenge that can be addressed
today is object detection.
5.1.7 Object Detection
Object detection is a technique that incorporates two tasks: object classification and
object localization. A model is trained to detect the presence and location of multiple
categories of objects. This can be applied to static pictures or in real time to videos.
Object detection locates an object, classifies it into one of several categories, and
localizes it by drawing bounding boxes around it. There are several use cases for
object detection. For example, a self-driving or automated vehicle must be able to
identify different objects on the road while driving. To handle these use cases, various
state-of-the-art algorithms are used to detect objects, e.g., R-CNN, Fast R-CNN,
Faster R-CNN, Mask R-CNN, SSD, YOLO, etc. YOLO
(You Only Look Once) is considered one of the most powerful object detection
algorithms; it was designed by Joseph Redmon and three collaborators.
boxes of that grid cell accurately locate and predict that object (Figures 5.2 and 5.3).
Each bounding box has five predictions: x coordinate, y coordinate, width, height,
and confidence. The confidence score indicates how certain the model is that a
category lies inside the bounding box, and how well the predicted box fits the object
is measured with a metric known as intersection over union (IoU). Intersection over
union is used in object detection to compare the ground truth bounding box with
the predicted bounding box. By dividing the area of overlap by the area of the union,
we obtain a score that rewards strong overlap and penalizes inaccurate bounding box
predictions. The goal of the bounding box is to enclose the object in the image as
accurately as possible, and intersection over union is a good metric for this. When an
image is run through YOLO, it outputs predictions as an S x S x (B x 5 + C) tensor,
where every grid cell predicts the location and confidence of B bounding boxes
across C categories. Ultimately, we end up with a large number of bounding boxes,
most of which are irrelevant. To filter out the correct boxes, only the bounding boxes
whose predicted category meets a particular confidence threshold are kept. This
allows all relevant objects in the image to be isolated. Figure 5.4 gives an overview of
the proposed method.
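A minimal sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates is shown below.

    def iou(box_a, box_b):
        """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
        # Corners of the intersection rectangle.
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0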
5.2 Literature Review
T. H. Jung et al. [1] designed a system that detects only persons in the agricultural
environment. A four-channel frame camera was mounted on the four sides (left,
right, front, and rear) of a tractor, and data was collected from the entire surroundings.
The Nvidia Jetson Xavier platform was used as an experimental system capable of
high-speed GPU-based parallel processing to avoid resource problems. CNNs [2]
have several object detection algorithms, such as R-CNN [3], Fast R-CNN [3],
Faster R-CNN [3], Mask R-CNN, OverFeat, the Single Shot Detector (SSD), and
the YOLO series. YOLO is considered the best network in this chapter because it is
capable of high-accuracy detection and can run in real time. YOLO (You Only Look
Once), as the name implies, needs only one forward propagation through a neural
network for object detection. YOLO [3] uses a CNN for object detection. This
chapter adopted the YOLOv3 [4] network for person recognition, and some changes
were made to fit the proposed system.
In a study [5], a pedestrian object detection system was set up using YOLOv3.
The YOLO algorithm was trained on only one category, the person category, and
achieved better precision and recall than the plain YOLOv3. In a LiDAR-based
LS-R-YOLOv4 approach, one paper combined a LiDAR point cloud map with deep
learning to detect and segment objects [6]. The primary dataset was KITTI, which
provides the depth information for LiDAR segmentation of objects obtained from
the point clouds and then applied to the images. A region of interest (ROI) in the
YOLOv4 [7] neural network then performs the final object detection.
Small target detection based on an improved YOLO network was proposed in
[8]. The paper chiefly introduces small object detection for road scenes, based on a
proposed YOLO network model that optimizes the YOLOv3 network [4] and uses a
twin feature extraction network structure. The backbone network is paired with a
special auxiliary feature extraction network with different receptive fields. The feature
fusion of the auxiliary network and the backbone network adopts an attention
mechanism, which focuses on the effective feature channels, suppresses the invalid
channels, and improves the efficiency of the network.
A modified YOLOv1 [9] neural network has also been proposed for object
detection, in which the old loss function of YOLOv1 is changed. The loss function
of the original YOLOv1 [10] assigns the same error to large and small objects, so the
prediction of neighbouring objects is unsatisfactory and detecting two objects in the
same grid cell becomes a problem. The new loss function is more flexible and easier
to optimize than the old one.
P. L. Chithra and Gomathi [11] proposed a method, based on the YOLOv3
algorithm, that detects pedestrians together with lane detection, which is essential
for self-driving vehicles.
Handalage and Lakshini [12] reviewed the fundamental structure of CNN
algorithms and gave an overview of YOLO's real-time object detection algorithm.
The CNN design models can extract features and detect objects in any given image.
The authors used the YOLOv2 model to convey the summary of their work.
Thakkar et al. [13] reported a custom image dataset that was built and trained
for six classes using YOLO together with SORT for tracking. Recognizing a vehicle
or a pedestrian in a live video is helpful for traffic analysis.
Pal and Chawan [14] gave a brief introduction to deep learning and object
detection frameworks, in which they clearly explained CNNs and the latest
CNN-based algorithms for object detection.
Lan et al. [15] examined the deep network, improved the network structure of the
YOLO algorithm, and proposed a new network structure, YOLO-R. In the original
network there was some loss of pedestrian information, which caused vanishing
gradients and inaccurate pedestrian detection.
5.3 Proposed Methodology
Road accidents are quite common today, particularly in large cities with numerous
modes of transport. Moreover, the roads have become narrower and the cities more
crowded. Object detection is a significant part of driverless vehicles. This work aims
to detect several different categories, trained using the YOLOv4 design, and to
convert the object labels (text) into speech or audio responses so that a visually
impaired person can grasp unfamiliar surroundings. It works by using the YOLOv4
algorithm, which runs on a variant of an advanced convolutional neural architecture
referred to as Darknet, together with OpenCV and Google Text-to-Speech. The
methodology for multiple object detection with audio output is shown in
Figure 5.5.
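A minimal sketch of loading a YOLOv4 Darknet model with OpenCV's DNN module is given below; the configuration, weight, and image file names are illustrative assumptions.

    import cv2

    # Hypothetical file names for the Darknet configuration and trained weights.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

    frame = cv2.imread("traffic_frame.jpg")
    # YOLO expects a square, scaled blob with channels in RGB order.
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0,
                                 size=(416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    # Forward pass through the YOLO output layers.
    outputs = net.forward(net.getUnconnectedOutLayersNames())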
5.3.1 Proposed Architecture
In this proposed architecture, the input data are collected from video taken in traffic
and then converted into frames. This system focuses on recognizing and identifying
the specified five classes (person, car, bus, bike, auto) which appear in the traffic
environment. The five classes are considered to be the objects in our datasets. Further,
these frames are processed using the following workflow, as shown in Figure 5.6 and
detailed here.
The flowchart in Figure 5.6 of the proposed method starts by receiving input
images and processing them on the YOLO network to detect objects. Once detection
is done, the object label is converted into speech to produce audio output until the
execution terminates or stops.
Step 1: The frames are the input images; a labelling tool is used to annotate those
images.
5.3.4 YOLOv4 Architecture
We have already discussed YOLO, one of the most famous object detection
algorithms due to its speed and accuracy. YOLOv4 is designed for this process based
on recent research findings, using CSPDarknet53 as the backbone, SPP (spatial
pyramid pooling) and PAN (path aggregation network) as “the Neck”, and the
YOLOv3 head as “the Head”. The backbone contains 53 convolution layers with
kernel sizes of 1x1 and 3x3, and every convolution layer is followed by a batch
normalization (BN) layer and a Mish activation layer; the original Darknet-53 model
instead follows each of its 53 convolution layers with batch normalization and Leaky
ReLU activation. SPP is a method for exploiting both fine and coarse information by
simultaneously pooling with multiple kernel sizes (1, 5, 9, 13). PAN is a technique
that leverages information in layers close to the input by conveying features from
different backbone levels to the detector. For prediction, the grid cell that contains
the centre of an object's ground truth box is responsible for detecting that object and
for predicting, through probabilities, the class contained in the box.
5.3.4.2 mAP
Mean average precision (mAP) is computed by collecting a set of predicted object
detections and a set of ground truth object annotations.
■ For every prediction, the IoU is computed with reference to each ground truth
box in the image.
■ These IoUs are then thresholded at some value (generally between 0.5 and
0.95) and predictions are matched with ground truth boxes using a greedy
strategy (i.e., the highest IoUs are matched first).
■ A precision–recall (PR) curve is then generated for every object class and
the average precision (AP) is computed. A PR curve takes into consideration
the performance of a model with reference to true positives, false positives,
and false negatives over a spread of confidence values.
■ The mean of the AP over all object classes is the mAP.
5.3.4.3 Precision
Precision is the ratio of the number of true positives to the total number of positive
predictions. For example, if the model detected 100 cars, and 90 were correct, the
precision is 90 per cent.
5.3.4.4 Recall
Recall is the ratio of the number of true positives to the total number of actual (rele-
vant) objects. For example, if the model correctly detects 75 cars in an image, and
there are actually 100 cars in the image, the recall is 75 per cent.
The F1 score is a weighted average of the precision and recall. Values range from
0 to 1, where 1 means the highest accuracy.
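These metrics follow directly from the true positive (TP), false positive (FP), and false negative (FN) counts, as in the small sketch below; the example counts are hypothetical.

    def precision_recall_f1(tp, fp, fn):
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        return precision, recall, f1

    # Hypothetical counts: 90 correct detections, 10 false alarms, 10 missed objects.
    print(precision_recall_f1(tp=90, fp=10, fn=10))   # -> (0.9, 0.9, 0.9)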
5.3.5 Google Text-to-Speech
Google Text-to-Speech (gTTS) is a Python library and command-line tool used to
interface with the Google Translate text-to-speech application programming interface
(API). It features versatile pre-processing and tokenizing. Text-to-Speech generates
raw audio data of natural, human speech; that is, it creates audio that sounds like a
person talking. When you send a synthesis request to Text-to-Speech, you need to
specify a voice that ‘speaks’ the words. Text-to-Speech features a wide variety of
customized voices. The voices vary in language, gender, and accent (for some
languages). For instance, we are able to produce audio that mimics the sound of a
female English speaker with a British accent. We can also convert the same text into
a different voice, say, a male English speaker with an Australian accent.
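A minimal gTTS sketch that turns a detected object label into an audio file is shown below; the label text, accent choice, and output file name are illustrative assumptions.

    from gtts import gTTS

    label = "person ahead"                              # hypothetical detected label
    speech = gTTS(text=label, lang="en", tld="co.uk")   # British-English voice
    speech.save("alert.mp3")                            # audio file that can be played back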
■ Capturing and decoding the video file: the video is captured using a VideoCapture
object and capture is initialized so that every video frame is decoded (i.e.,
converted into a sequence of images); a minimal sketch is given after this list.
■ Pre-processing: The converted image frames are pre-processed to a particular
size. In this method, we use an image size of 416x416 with 3 channels for
colour images.
■ Labelling: The pre-processed images are labelled using the labelling tool to
annotate the images. As the YOLO object detection model is a supervised
learning algorithm, it requires labelled datasets. YOLO versions always use a
text file for their annotation format, with .txt as the extension.
■ YOLOv5 object detection: In this version, we have taken the YOLOv5s,
YOLOv5m, YOLOv5l, and YOLOv5x models for the detection of an object.
■ Object detection: The predicted bounding boxes with confidence scores are the
detected objects in the various images.
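The capture-and-decode step in the first bullet could look like the following OpenCV sketch; the video file name is an illustrative assumption.

    import cv2

    cap = cv2.VideoCapture("traffic.mp4")    # hypothetical recorded traffic video
    frames = []
    while True:
        ok, frame = cap.read()               # decode the next frame
        if not ok:                           # end of the video stream
            break
        frames.append(cv2.resize(frame, (416, 416)))   # pre-process to network size
    cap.release()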
Here we implement YOLOv5 based on the PyTorch framework, which uses a .yaml
file for configuration. PyTorch and TensorFlow are two of the most popular
Python-based deep learning libraries. The output structures of the YOLOv5 s, m, l,
and x models are displayed in TensorBoard in Section 5.4. PyTorch is one of the most
recent deep learning frameworks, developed by Facebook. It is gaining popularity
because of its simplicity of use and dynamic computational graph.
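As a minimal sketch, the pretrained YOLOv5 models can be pulled through torch.hub from the Ultralytics repository, assuming an internet connection; the image file name is illustrative.

    import torch

    # Load the small YOLOv5s variant with pretrained COCO weights.
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

    results = model("traffic_frame.jpg")   # a path, URL, or NumPy array is accepted
    results.print()                        # per-class counts and confidences
    boxes = results.xyxy[0]                # tensor of [x1, y1, x2, y2, conf, class]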
5.4 Experimental Results
To evaluate the proposed techniques, an experimental set-up was created to detect
the person, bus, car, auto, and bike classes. This work takes 200 frames from various
video sequences to demonstrate the experimental results. For object detection,
different sets of images are taken freely in different scenes and under different lighting
conditions, and bounding boxes are predicted using the YOLOv4 algorithm.
The outcome is the calculated average precision (AP) and the mean average
precision (mAP) over all detections, which achieve good results (Table 5.1).
Figure 5.8 (a) mAP and loss chart; (b) YOLOv4 performance outcomes with
audio.
5.4.3 Predictions
Figures 5.10 a–d show the detections of YOLOv5 based on the different models.
The YOLOv5x model achieved predictions of up to 98 per cent, whereas the lowest
prediction observed was 93 per cent.
1. YOLOv5 x
2. YOLOv5 l
3. YOLOv5 m
4. YOLOv5 s
Figure 5.10 Testing detected results based on YOLOv5 algorithm: (a) YOLOv5 x,
(b) YOLOv5 l, (c) YOLOv5 m, (d) YOLO v5 s.
5.5 Conclusion
In this study, five classes were taken as the objects to be detected when they appear
in the image frames, mainly for self-driving vehicles in the traffic environment. This
work demonstrates object detection with the YOLO architecture. YOLOv4 is a
recent and fast algorithm with a highly accurate rate of true positive predictions.
The width and height of the predicted bounding boxes and the IoU metric were
used to assess the real-time detection. In future, the algorithm can be fine-tuned and
the number of classes increased under different weather conditions. In our future
work, the network architecture along with the weight file will be created from
scratch rather than using a pre-trained model, with different numbers of objects and
an increased number of classes in our datasets.
References
1. Taek-Hoon Jung, Benjamin Cates, In-Kyo Choi, Sang-Heon Lee, and Jong-Min Choi
(2020) “Multi-Camera-Based Person Recognition System for Autonomous Tractors”.
http://dx.doi.org/10.3390/designs4040054
2. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
3. https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
4. Joseph Redmon and Ali Farhadi (2018) “Yolov3: An Incremental Improvement”. arXiv
1804.02767v1
5. https://towardsdatascience.com/review-yolov2-yolo9000-you-only-look-once-object-detection-7883d2b02a65
6. Yu-Cheng Fan, Yelamandala Chitra, Ting-Wei Chen, and Chun-Ju Huang (2021)
“Real-Time Object Detection for LiDAR Based on LS-R-YOLOv4 Neural Network”.
Journal of Sensors, 1–11. doi:10.1155/2021/5576262.
7. A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao (2020) “Yolov4: Optimal Speed and
Accuracy of Object Detection”. arXiv. https://arxiv.org/abs/2004.10934.
8. Qiwei Xu, Runzi Lin, HanYue, Hong Huang, Yun Yang, and Zhigang Yao (2020)
“Research on Small Target Detection in Driving Scenarios Based on Improved Yolo
Network”. IEEE Access. doi:10.1109/ACCESS.2020.2966328.
9. Tanvir Ahmad, Yinglong Ma, Muhammad Yahya, Belal Ahmad, Shah Nazir, Amin Haq,
and Rahman Ali (2020) “Object Detection through Modified YOLO Neural Network”.
Scientific Programming. doi:10.1155/2020/8403262.
10. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi (2016) “You Only Look Once:
Unified, Real-Time Object Detection”. Paper presented at 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp.
779–788.
11. P. L. Chithra and V. Gomathi (2021) “Video Analysis Based on Pedestrian Object
Detection using YOLOv3”. Available at: www.researchgate.net/publication/363211040
12. Upulie Handalage and Kuganandamurthy Lakshini (2021) “Real-Time Object
Detection Using YOLO: A Review”. Preprint. doi:10.13140/RG.2.2.24367.66723.
13. Heet Thakkar, Noopur Tambe, Sanjana Thamke, and Vaishali Gaidhane (2020) “Object
Tracking by Detection Using YOLO and SORT”. International Journal of Scientific
Research in Computer Science, Engineering and Information Technology, 30 March: 224–
229. doi:10.32628/CSEIT206256.
14. Shubham Pal and Pramila Chawan (2019) “Real-Time Object Detection Using Deep
Learning: A Survey”. International Research Journal of Engineering and Technology, 6:
2395–056.
15. Wenbo Lan, Jianwu Dang, Yangping Wang and Song Wang (2018) “Pedestrian
Detection Based on YOLO Network Model”. Paper presented at IEEE International
Conference on Mechatronics and Automation. 978-1-5386-60751/18/$31.00 IEEE.
Chapter 6
Diabetic Retinopathy
Detection Using
Various Machine
Learning Algorithms
P. K. Nizar Banu and Yadukrishna Sreekumar
6.1 Introduction
Diabetes is a metabolic disease that causes high blood sugar. India is projected to be
the diabetes capital of the world by 2030, with over 80 million people affected, so it
is necessary to diagnose and treat diabetes in its early stages. Diabetic retinopathy is
another condition connected to diabetes. Retinopathy is a condition that develops in
the eye and, if not treated at an early stage, can lead to permanent blindness.
Diabetic retinopathy, also known as diabetic eye disease, is a medical condition in
which damage occurs to the retina due to diabetes mellitus. Diabetic retinopathy is
the leading cause of blindness in the working-age population of the developed world
(California Healthcare Foundation, 2015). According to the California Healthcare
Foundation, diabetic retinopathy will affect over 93 million people. The US Centers
for Disease Control and Prevention estimate that 29.1 million people in the US have
diabetes and the World Health Organization estimates that 347 million people have
the disease worldwide. If a person has long-standing diabetes, they are potentially a
diabetic retinopathy sufferer. It is observed by the US Centers for Disease Control
and Prevention, that around 40–45 per cent of Americans with diabetes have diabetic
6.2 Background
6.2.1 Proliferative Diabetic Retinopathy
As the disease progresses, severe non-proliferative diabetic retinopathy enters an
advanced or proliferative stage, where blood vessels proliferate/grow. The lack of
oxygen in the retina causes fragile new blood vessels to grow along the retina and
Figure 6.1 Fundus image – scatter laser surgery for diabetic retinopathy.
Source: https://en.wikipedia.org/wiki/File:Fundus_photo_showing_scatter_laser_surgery_for_diabetic_retinopathy_EDA09.JPG
in the clear gel-like vitreous humour that fills the inside of the eye. Without timely
treatment, these new blood vessels can bleed, cloud the vision, and destroy the
retina (www.rxlist.com/diabetic_retinopathy/definition.htm). Fibrovascular prolif-
eration can also cause tractional retinal detachment (Conrad Stöppler, 2021). The
new blood vessels can also grow into the angle of the anterior chamber of the eye
and cause neovascular glaucoma. Non-proliferative diabetic retinopathy shows
up as cotton wool spots, or microvascular abnormalities or as superficial retinal
haemorrhages. Even so, the advanced proliferative diabetic retinopathy (PDR) can
remain asymptomatic for a very long time, and so should be monitored closely
with regular check-ups (www.rxlist.com/diabetic_retinopathy/definition.htm).
A computer-aided screening system that uses machine learning classifiers such as
k-nearest neighbour (kNN), support vector machine (SVM), Gaussian mixture
model (GMM), and AdaBoost to classify retinopathy lesions from non-lesions is
analysed in Roychowdhury et al. (2014). Several methods have been used for
automatic retinopathy detection and the diagnostic procedure of proliferative diabetic
retinopathy (Randive et al., 2019). This study used fundus images with varying
illumination and views to generate a severity grade for diabetic retinopathy.
Alazzam et al. (2021) claims a radial basis machine shows good diagnostic accuracy
and can be potentially used in screening for diabetic retinopathy. With the support
of machine learning and advanced intelligence, the Diabetes Web would provide the
essential support and guidance to predict both diabetes and diabetic retinopathy.
The Diabetes Web, developed as an application in this research work,1 is a website
which will mainly benefit clinicians and ophthalmologists where they will be able to
predict diabetes and diabetic retinopathy, such as identifying which stage of diabetes
retinopathy the patient is in, based on the blood count and image of the retina
respectively.
The main objectives of the Diabetes Web will be prediction of diabetes and dia-
betic retinopathy, which will be helpful for patients to start their treatment based on
the results obtained through our website. Because of late diagnosis, diabetic
retinopathy can lead to permanent blindness and many other health issues. Hence
this chapter addresses the following tasks:
The Diabetes Web is designed as an open-source web application where the users will
be able to predict diabetes by giving the count of insulin, glucose content and age.
Also, the users can upload images of the retina and get instant results for diabetic
retinopathy (DR). Manual effort is reduced and the application is easily accessible
from any device across platforms.
This research work is carried out to serve the needs of ophthalmologists and lab
technicians, and of rural areas (Tier 4 towns) with small dispensaries that do not have
medical professionals for an emergency but can afford a fundus camera. It also acts as
a tool for doctors to prioritize cases based on the severity of the disease condition.
6.3 Applicability
Diabetes Web is a website where the users will be able to predict diabetes. If the user
is found to be diabetic, then the users can check their diabetic retinopathy status as
well on the same website. The users are also able to check on which level of DR they
are, by uploading the fundus image of the retina. This website will be very useful
for clinicians as well as ophthalmologists, who can cross-check the tests they perform
manually and confirm both diabetes and diabetic retinopathy. Diabetes Web can be
considered an important tool for doctors to prioritize their patients based on their
criticality. Also, Diabetes Web will be particularly relevant in Tier 2 areas where
patients are unable to afford ophthalmologists or specialists.
The main aim of Diabetes Web is to predict whether the patient is diabetic or
not. Then there is another feature which could help ophthalmologists to detect
blindness before it happens. Millions of people suffer from diabetic retinopathy, the
leading cause of blindness among working-age adults (APTOS, 2019). In India,
Diabetes Web will help to detect and prevent this disease among people living in
rural areas where medical screening is difficult to conduct. APTOS (Asia Pacific
■ Clinicians: Clinicians can make use of Diabetes Web to check the diabetes
results of the patients. They can reconfirm their results in comparison with
the results they obtained through Diabetes Web.
■ Ophthalmologists: Ophthalmologists can make use of Diabetes Web to pre-
dict the different levels of DR by uploading the fundus image of the patient’s
retina.
■ Doctors: Doctors can also make use of this website for prioritizing their
patients based on the severity of their DR.
Figure 6.2 shows the flow of the research work carried out: it illustrates the
sub-systems making up the system and the framework for sub-system control and
communication. From Figure 6.2, we can see that the user provides the input (an
image of the retina) through our user interface.
With the help of Flask microservices, the image is passed to the Python backend,
where the actual computation starts. As the first stage of processing, the image is sent
to the image processing unit, where the necessary information is extracted from the
uploaded retinal image. After confirming with the user that the uploaded image is the
intended one, the extracted representation is passed to the pretrained model to obtain
the results. The result is returned to the Python backend and, again with the help of
Flask microservices, the response is delivered to the user through the user interface.
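A minimal Flask sketch of this request/response path is shown below; the route name, file names, and the run_densenet_model() helper are hypothetical stand-ins for the actual backend.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_densenet_model(path):
        # Placeholder for the pretrained DenseNet121 inference pipeline.
        return 0

    @app.route("/predict-dr", methods=["POST"])      # hypothetical route name
    def predict_dr():
        file = request.files["image"]                # fundus image from the dashboard
        file.save("uploaded_fundus.png")
        grade = run_densenet_model("uploaded_fundus.png")
        return jsonify({"dr_level": grade})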
The visual part of our web-based computer application is the part through which
a user interacts with Diabetes Web. It determines how commands are given to the
system or the program and how data is displayed on the screen. Our website will be
a single page website where the user can input the images onto our site. Figures 6.3
and 6.4 show two pages of the dashboard of the Diabetes Web.
After uploading the image as an input to the system, the user will be able to see
the result on the same page. Also, the user will be able to download the solution as a
pdf, by clicking the print button. Figure 6.5 shows the diabetes prediction page. On
this page, the user can input the necessary basic details to the site and get an instant
result (diabetic or not) on the same page. Figures 6.6 and 6.7 show the diabetes ret-
inopathy prediction page and how to choose an image.
Figure 6.3 Dashboard of Diabetes Web.
Figure 6.4 View of Diabetes Web dashboard page 2.
Figure 6.5 Diabetes prediction page.
■ Batch Normalization
■ ReLU activation
■ 3x3 Convolution
The authors found that the pre-activation mode (BN and ReLU before the Conv)
was more efficient than the usual post-activation mode.
6.4.2.1 Transition Layer
Instead of summing the residuals as in ResNet, DenseNet concatenates all the feature
maps. It would be impracticable to concatenate feature maps of different sizes, so
within each dense block the feature maps of every layer have the same size. However,
down-sampling is essential in CNNs, and this is handled by a transition layer made
of batch normalization, a 1x1 convolution, and average pooling.
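A minimal PyTorch sketch of the pre-activation composite layer and the transition layer described above is given below; the channel arguments are illustrative.

    import torch.nn as nn

    # Pre-activation composite layer inside a dense block: BN -> ReLU -> 3x3 Conv.
    def dense_layer(in_channels, growth_rate):
        return nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    # Transition layer between dense blocks: BN -> 1x1 Conv -> average pooling.
    def transition_layer(in_channels, out_channels):
        return nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )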
We analysed around six models for diabetes prediction and also analysed two models
for DR detection.
6.4.4 Dataset Description
The system’s dataset comes from the Mendeley Diabetes Types dataset. This infor-
mation was gathered from Iraqi society at the Medical City Hospital’s labora-
tory (the Specialized Center for Endocrinology and Diabetes-Al-Kindy Teaching
Hospital). The data from the patients’ file is extracted and entered into the data-
base. Medical information and laboratory analysis make up the data. There are
103 patients with no diabetes, 53 with pre-diabetes, and 844 with diabetes in the
database.
For diabetes prediction, the results are shown in Table 6.1. Here the accuracy was
highest for the KNN (k-nearest neighbours) algorithm, so KNN is connected to the
frontend.
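A minimal scikit-learn sketch of the KNN step is shown below; the CSV file name and column names are assumptions, since the exact field names of the Mendeley dataset may differ.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    df = pd.read_csv("diabetes_types.csv")     # hypothetical export of the dataset
    X = df[["Insulin", "Glucose", "Age"]]      # assumed feature columns
    y = df["Class"]                            # no diabetes / pre-diabetes / diabetes

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    print("Test accuracy:", knn.score(X_test, y_test))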
For DR detection, the images are accessed from www.kaggle.com/competitions/
diabetic-retinopathy-detection/data. We created a couple of models, and the results
are shown in Table 6.2. Here we obtained higher accuracy for DenseNet, so the
DenseNet model is connected to the user interface.
6.5 Conclusion
In this chapter, a web application called Diabetes Web is introduced to analyse a
patient’s diabetes level. This website helps to identify the degree of diabetic retin-
opathy in a patient. Basic machine learning applications are used to detect if the
patient is diabetic or not. Deep learning techniques such as CNN and DenseNet121
are used to analyse the retinopathy level of a patient. This web application can be
used by clinicians, ophthalmologists, and physicians. One of the major problems
faced by diabetics today is the inability to track blood sugar levels, which can
eventually lead to blindness. This application can also be used for diabetes diagnosis
and for early intervention to preserve vision.
Figure 6.8 Test images.
Figure 6.10 One level deeper look at DenseNet-121. Dense Block and
Transition Block.
Note
1 The Diabetes Web is available from the authors at: C:/Users/Yadukrishna/Downloads/yadu_web%20(2)/yadu_web/light-bootstrap-dashboard-master/BS3/dashboard.html
References
Antal, B. and Hajdu, A. (2014) “An Ensemble-Based System for Automatic Screening of
Diabetic Retinopathy”. Knowledge-Based Systems, 60: 1–8.
APTOS (2019) “Kaggle: APTOS 2019 Blindness Detection”. Available at: www.kaggle.com/
c/aptos2019-blindness-detection (accessed 12 April 2021).
Bader Alazzam, M., Alassery, F. and Almulihi, A. (2021) “Identification of Diabetic
Retinopathy through Machine Learning”. Mobile Information Systems, 2021: 1–8.
California Healthcare Foundation (2015) “Kaggle Data Repository”. Available at: www.kaggle.com/c/diabetic-retinopathy-detection (accessed 5 April 2021).
Chablani, M. (2017) “Towards Data Science”. Available at: https://towardsdatascience.com/
densenet-2810936aeebb (accessed 5 April 2021).
Conrad Stöppler, M. (2021) “Definition of Diabetic Retinopathy”. RxList. Available at: www.rxlist.com/diabetic_retinopathy/definition.htm (accessed 5 April 2021).
Douillard, A. (2018) “Densely Connected Convolutional Networks”. arXiv: 1608.06993.
Available at: https://ieeexplore.ieee.org/document/8099726
Gondhalekar, A. (2020) Data Augmentation: Is It Really Necessary? Delhi: Analytics Vidhya.
Available at: https://medium.com/analytics-vidhya/data-augmentation-is-it-really-
necessary-b3cb12ab3c3f.
Jee, G., Harshvardhan, G., M., Gourisaria, M. H., Singh, V., Rautaray, S. S. and Pandey,
M. (2021) “Efficacy Determination of Various Base Networks in Single Shot Detector
for Automatic Mask Localisation in a Post COVID Setup”. Journal of Experimental &
Theoretical Artificial Intelligence, 35(3): 345–364. doi: 10.1080/0952813X.2021.1960638
Narasimha-Iyer, H., Can, A., Roysam, B., et al. (2006) “Robust Detection and Classification
of Longitudinal Changes in Color Retinal Fundus Images for Monitoring Diabetic
Retinopathy”. IEEE Transactions on Biomedical Engineering, 53: 1084–1098.
Niemeijer, M., Abramoff, M. D., and van Ginneken, B. (2009) “Information Fusion for
Diabetic Retinopathy CAD in Digital Color Fundus Photographs”, IEEE Transactions
on Medical Imaging, 28: 775–785.
Osareh, A., Shadgar, B., and Markham, R. (2009) “A Computational Intelligence Based
Approach for Detection of Exudates in Diabetic Retinopathy Images”. IEEE Transactions
on Information Technology in Biomedicine, 13 : 535–545.
Pires, R., Jelinek, H. F., Wainer, J., et al. (2013) “Assessing the Need for Referral in Automatic
Diabetic Retinopathy Detection”. IEEE Transactions on Biomedical Engineering, 60:
3391–3398.
Randive, S. N., Senapati, R. K., and Rahulkar, A. D. (2019) “A Review of Computer-Aided
Recent Developments for Automatic Detection of Diabetic Retinopathy”. Journal of
Medical Engineering & Technology, 43(2): 87–99.
Roychowdhury, S., Koozekanani, D. D., and Parhi, K. K. (2014) “DREAM: Diabetic
Retinopathy Analysis Using Machine Learning”. IEEE Journal of Biomedical and Health
Informatics, 18(5): 1717–1728. doi: 10.1109/JBHI.2013.2294635. PMID: 25192577.
Sil Kar, S. and Maity, S. P. (2016) “Retinal Blood Vessel Extraction Using Tunable Bandpass
Filter and Fuzzy Conditional Entropy”. Computer Methods and Programs in Biomedicine,
133 : 111–132.
Taylor, D. J., Goatman, K., Gregory, A. A., Histed, M., et al. (2009) “Image-Quality
Standardization for Diabetic Retinopathy Screening”. Expert Review of Ophthalmology,
4: 469–476.
Walter, T., Klein, J. C., Massin, P., et al. (2002) “A Contribution of Image Processing to the
Diagnosis of Diabetic Retinopathy Detection of Exudates in Color Fundus Images of
the Human Retina”. IEEE Transactions on Medical Imaging, 21: 1236–1243.
Zhang, L., Li, Q., You, J., et al. (2009) “A Modified Matched Filter with Double-Sided
Thresholding for Screening Proliferative Diabetic Retinopathy”. IEEE Transactions on
Information Technology in Biomedicine, 13: 528–534.
Chapter 7
IIoT Applications
and Services
P. Shanmugavadivu, T. Kalaiselvi,
M. Mary Shanthi Rani, and P. Haritha
7.1 Introduction
The Internet of Things (IoT) deals with the connection of internetworked sensors,
devices, machinery, and gadgets through supportive software and technologies.
The IoT generates and exchanges data across the internet [1]. It is a path-breaking
New Age technology that finds a place in the automation of complex and labor-
intensive manual processes. It has a record of numerous applications in several
domains including agriculture, healthcare, security, remote sensing, industry, wea-
ther forecasting, automated vehicles, wearable devices, and smart home appliances.
IoT can be deployed in a toothbrush and can be scaled up for a massive manu-
facturing industry [2]. The Internet of Things variously can be found in a heart
monitor that analyzes the heartbeat of a person, a biochip transponder that can
provide a unique identification number for an animal [3], a sensor in an auto-
mobile that can alert the driver when the air pressure of a tire drops, etc. [4]. IoT
devices can be interfaced without a keyboard or a screen. The virtual assistants such
as Alexa and Siri are IoT-enabled software that gathers information across the globe.
From the industrial perspective, IoT devices produce a rich set of data, which plays
a vital role in automated operation control for decisive, prediction-based safety and
security measures [5].
■ Open Source Platform: The open source software plays a major role in IoT.
The Linux operating system is predominant in IoT. Arduino is a popular
open source electronic platform that provides an integrated development
environment (IDE) for hardware and software IoT prototypes using sensors.
Devicehub.net serves as a viable interface between the interactive IoT and
machine-to-machine (M2M) communication. This software also provides support for
device control, data management, and wireless control. The IoT Toolkit is
an advanced platform that offers deployment services for IoT to interface
with embedded agents and support prototyping with Raspberry Pi, CoAP
(Constrained Application Protocol) proxies and data-compatible frameworks.
SiteWhere supports the fabrication of scalable IoTs on the cloud as well as Big
Data management. ThingSpeak facilitates real-time data processing and visu-
alization. Webinos is a specialized web programming platform for platform-
independent IoTs.
■ Big Data Technologies: The technology-driven way of life has triggered the
usage of IoT and the generation of Big Data by IoT related to automobiles,
healthcare, business, industry, meteorological devices, the military, smart
devices, agriculture, e-governance, e-commerce, etc. As IoT is inseparable
from Big Data, the implicit potential of Big Data is harnessed using the
appropriate tools for data engineering, information processing and analytics,
using HBase, MongoDB, etc. [7].
■ Cybersecurity: The IoT devices are prone to security threats due to the vulner-
able open-source code, APIs (application programming interfaces), security
integration, unpatched software, test cases as well as due to limited visibility
on the integration of IoT components. These threats can be suitably mitigated
through data encryption API security, periodic open-source software updates,
authentication schemes, compliance of IoT standards and DNS (Domain
Name Server) filtering [8].
■ Software-Defined Networking (SDN): SDN is an emergent architecture, which
offers flexible and dynamic centralized network management. It has a special
feature of decoupling network control and data forwarding functions. This
ensures higher-order abstraction and programmability for the network con-
trol plane [9]. Some of the SDNs available on the market are: Cisco DNA
Center, Cisco ACI, IBM Cloud Internet Services, Cradlepoint NetCloud
Engine, VMware NSX, and IBM Networking Services for Software Defined
Networks [10].
7.2 Components of IoT
An IoT system includes four principal components: (1) IoT sensors/devices; (2) con-
nectivity; (3) data processing; and (4) user-interface.
■ Sensors or devices are designed to generate or gather data from their respective
environment. These gadgets are the bridge between the digital and the real world.
The data can be simple, say, a temperature reading, or complex, such as a recorded
video. An IoT set-up can have a single sensor or multiple sensors. For instance, a
smartphone has multiple sensors, such as a motion sensor, light sensor, geoposition
sensor, environment sensor, proximity sensor, compass sensor, and biometric sensor,
along with meters, probes, and actuators to measure various parameters.
■ Connectivity is the second crucial element that provides physical/virtual con-
nectivity among the sensors/devices through Wi-Fi, Bluetooth, cellular sat-
ellite, low-power Wide Area Networks gateway, routers, Ethernet, etc. The
optimality of connectivity is decided by the quantum of power consumption,
range coverage, and its bandwidth [11].
■ Data processing deals with handling raw data (batched or steaming data), format-
ting data (table, Spreadsheet, plain text, files, etc.), data storage (into local, remote
or cloud storage), processing and preparing output, and visualization [12].
■ User-interface deals with the delivery of information to the end-user, in a
human-interpretable form. This can be accomplished through command line,
menu-based, form-based, voice-based, and graphical user interface (GUI).
Alexa and Siri are the typical examples of voice-based interfaces.
The other important elements of IoT are security, gateway, application, and user.
In general, the seven layers of IoT are delineated as: (1) sensors; (2) sensor
to gateway network (e.g., BLE, LoRaWAN, ZigBee, and Sigfox); (3) gateway
(routers/modems); (4) gateway to internetwork (Ethernet, Wi-Fi, satellite, cel-
lular); (5) data ingestion and information processing (backend of mobile/web
applications); (6) internet-to-user network (user-view of the processed data); and
(7) value-added information (visualization of data for better insights leading to
strategic decisions).
A few trivial applications of IoT are [13]:
languages, C and C++ have proved to be handy for developing interfaces and
embedded software for IoT. Python has proved to be a simple but powerful
programming language to interface with the Raspberry Pi CPU. As an interpreted
language, Python has a good collection of built-in functions to handle and process
volumes of data with ease.
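As a minimal sketch of Python on the Raspberry Pi, the snippet below polls a digital sensor through the RPi.GPIO library; the pin number and sensor type are illustrative assumptions.

    import time
    import RPi.GPIO as GPIO   # available on Raspberry Pi OS

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(4, GPIO.IN)    # e.g., a PIR motion sensor wired to GPIO 4

    try:
        while True:
            if GPIO.input(4):
                print("Motion detected")
            time.sleep(0.5)   # poll twice per second
    finally:
        GPIO.cleanup()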
■ Python-based Microsoft Azure for IoT: Microsoft has updated its open-source
IoT extension, by augmenting Azure CLI 2.0’s functionalities. The Azure CLI
2.0 uses commands to communicate with the Azure Resource Manager and
management endpoints. For instance, we can create an Azure Virtual Machine
(VM) or IoT Hub using Azure CLI 2.0. Through the CLI extension, an Azure
service can enhance Azure CLI by allowing users to gain access to exclusive
services. The IoT extension gives programmers command-line access to the
IoT Hub Device Provisioning Service, IoT Edge, and IoT Hub [15].
7.4 IoT Hardware
IoT hardware encompasses a wide range of gadgets, including sensors, bridges,
and routing devices. These IoT devices handle system activation, security, action
definitions, communication, and the detection of support-specific objectives and
actions. Low-power IoT boards and single-board processors (e.g., the Arduino Uno)
are tiny boards that are attached to mainboards to improve and extend their
functionalities. GPS modules, light and heat sensors, and interactive displays are
common pieces of IoT hardware. Figure 7.1 depicts one such Arduino Uno board.
The Raspberry Pi 2 is a popular IoT platform: a small, extremely affordable
computer that can function as a complete web server. It is frequently referred to as
“RasPi,” and it has sufficient processing power and memory to run Windows 10 IoT
Core. Figure 7.2 shows the Raspberry Pi and its components.
RasPi exhibits excellent processing power when programmed with Python.
A single-board computer called the BeagleBoard uses an ARM (Advanced RISC
Machine) processor, more powerful than the RasPi’s, and runs a Linux-based
operating system. Here, RISC stands for Reduced Instruction Set Computer.
Galileo and Edison boards from Tech Giant Intel are further alternatives; both
are suitable for larger-scale production. Qualcomm has produced a variety of
enterprise-level IoT technology for everything from automobiles and cameras to
healthcare [17].
The BeagleBone Black shown in Figure 7.3 has a faster processor than the
Raspberry Pi (1 GHz vs. 700 MHz): instead of an ARM11-based processor, it uses
an ARM Cortex-A8 processor. However, the effective processing speed depends on
the use case [18].
7.5.1 Applications of IIoT
The scope and influence of IIoT are increasing over time. In the data-driven era,
the IoT serves as one of the primary sources of data for production and business
management. This section outlines the role of IoT applications and impact in the
industry sector [19]:
■ Equipment management and monitoring: IIoT can remotely control many pro-
duction plants across various geographical regions by using digital devices and
software. The goal of gathering and utilizing the historical data is to improve
the process/product and to create an environment for information-driven
decision-making.
7.5.2 IIoT Sensors
The sensors are the most significant component of the IoT hardware. The IoT
sensors commonly used in the industry are primarily categorized into four types: (1)
mechanical sensors; (2) thermodynamic sensors; (3) electrical sensors; and (4) geo-
spatial sensors:
■ Mechanical sensors: The real-time mechanical IoT sensors can monitor both
static (time-invariant) and dynamic (time-variant) inputs. These
types of sensors are seamlessly used in all major industrial processes. Some
of the most frequently used mechanical sensors are gyroscopes, piezoelectric
transducers, tilt sensors, and accelerometers. These sensors are used by operators
to gauge aircraft pitch, yaw, and weights of solids in containers and the quan-
tity of oil in tanks.
■ Thermodynamic sensors: Thermodynamic sensors are designed to measure the
internal energy and specific heat at a given constant pressure. Since liquids
and gases are essential to several production processes, thermodynamic
sensors are often used in the manufacturing industry. The most commonly
used IoT thermodynamic sensors are: temperature sensors, flow meters, gas
sensors, humidity/moisture sensors, anemometers, and pressure sensors.
■ Electrical sensors: In order to monitor the automated and manual operations
involving electricity, these sensors are used. In many industrial IoT set-ups,
mechanical, thermodynamic, and electrical sensors work in tandem. Electrical
oil-drilling systems, and many more. Some of the most popular pressure
sensors used in industrial IoT applications include Pressure Systems Series 960
and 970, the Paroscientific Inc. Series 1000, 2000 and 6000, and Environdata
BP10 Series.
■ Temperature sensors: These sensors have found application in many
industrial IoT systems, such as FMCG (fast-moving consumer goods),
pharmaceuticals, biotechnology, and other sectors, where temperature
monitoring is essential. The common temperature sensors in industrial
IoT applications include Melexis MLX90614, Environdata TA40 Series,
and Geokon 4700.
■ Optical sensors: These sensors are designed to measure the properties of light,
namely wavelength, intensity, frequency, and polarization of light waves. These
sensors are capable of tracking any type of electromagnetic radiation, namely
light, electricity, and magnetic field. Optical sensors are used in industrial
automation process including telecommunications, elevators, and construc-
tion, healthcare and safety systems. The optical sensors VCNL4020X01 and
TCxT1600X01 are fabricated by Vishay exclusively for industrial IoT
applications.
■ Image sensors: In several sectors, including healthcare, transportation, and
others, image sensing has immense potential to transform industrial automa-
tion. These sensors find a place in virtual monitoring in hospitals, factories,
and other facilities when connected with an IoT-enabled system. The gesture
analysis of a patient can send relevant messages or alerts to a physician during
critical conditions. Image sensors can alert a driver to emergency situations,
such as a train collision. Omron and Invisage are the successful examples of
integrating image sensing into the IoT [21]. Figure 7.4 portrays the applica-
tion of various sensors in operations and business.
for how the result might be used, by addressing the question: “What should
be done next?” Prescriptive analytics is ideal for Industrial IoT (IIoT) settings
where business intelligence-based choices are made by using the capabilities of
cloud/edge computing, Big Data analytics, and machine learning.
■ Diagnostic analytics: This type of analytics also finds a place in IIoT
applications. On identification of a faulty operation or failure of an event,
diagnostic analytics finds the reasons for that occurrence. This analytics aims
to address the question, “Why did it happen?”.
■ Adaptive analytics: The results of the predictive analytics must be adjusted
with real-time data during actual deployment. In order to do this, adaptive
analytics are employed to modify or optimize the outcomes, based on the
most recent data history and the correlation among the data.
■ Stellapps Technologies: Stellapps is one of the leading IIoT firms in India that
has gained enormous growth by offering top-notch services. This Bangalore-
based company is primarily focused on data collection and machine learning.
The SmartMooIoT platform of Stellapps collects data, using sensors built
into animal wearables, milking systems, milk chilling machinery, and
milk procurement accessories. The gathered data is sent to the Stellapps
SmartMooTM Big Data Cloud Service Delivery Platform (SDP), wherein the
Stellapps SmartMooTM suite of applications extracts and analyses the informa-
tion before sending the results of the analytics and data science to various
stakeholders via low-end and sophisticated mobile devices [22] (www.stella
pps.com).
■ Parentheses Systems PLC: Parentheses Systems, a deep-tech start-up, is thriving
on shifting the Industrial IOT ecosystem from a technology-driven eco-
system to a behavior-led transformation for small and medium-sized discrete
manufacturing companies. The HuMaC (Human Machine Convergence)
platform allows the orchestration of the “Data-Decision” cycle to maximize
man-machine margins. HuMaC is a human-centric paradigm of artificial
7.10 Applications of IoT
IoT has a wide spectrum of applications in the primary sectors such as healthcare,
agriculture, security, embedded systems, robotics and much more.
■ Smart Homes: The integration of IoT-enabled intelligent utility systems and
entertainment together forms a Smart Home. The utilities in a Smart Home
include the electricity meter, locking systems, television, lighting systems,
surveillance systems, refrigerator, washing machine, etc., with IoT monitoring
the respective activities.
■ Smart City: Cities use IoT-enabled Information and Communication
Technologies (ICT) to manage traffic flows, electricity, water distribution,
transport, resources, and public services. Smart city mission aims at improving
the economic growth and increasing the well-being of the citizens [26].
■ Smart Farming: IoT, drones, and ICT play a very important role in managing agricultural practices and monitoring related parameters such as micro-climatic conditions, soil moisture and nutrients, irrigation, and the supply of fertilizers and pesticides. These technologies enable farmers to increase productivity with minimal resources, cost, and effort [27].
■ Retail Business: Sales data from both online and offline retail, combined with data from IoT sensors, can drive warehouse robotics and automation. The
analytics on IoT data helps in strategic planning and decision-making,
aiming at profit maximization, market expansion and penetration, product
modification, etc.
■ Wearable devices: The fitness bands and smart watches track a variety of health-
related parameters, including calories burned, distance traveled, heart rate,
and blood oxygen saturation. There are many other specialized health care
IoT gadgets that connect the patients and doctors round the clock.
■ Traffic monitoring: IoT plays a major role in smart city planning. For instance,
it helps metropolitan cities to manage their automobile traffic flows, using
Google Maps or Waze to collect and exchange data from cars using mobile
phones as sensors. In addition to aiding in traffic monitoring, it provides
information on the various traffic conditions, expected arrival times, and
distances to destinations.
■ Fleet management: The installation of IoT sensors in fleet cars establishes
effective communication between the drivers, vehicles, and management.
This guarantees that owners and drivers are well-informed on the condition,
functionality, and requirements of vehicles on the move.
■ Water supply: A sensor coupled with software, externally fitted to water meters, aids in the seamless collection, processing, and analysis of data, allowing for better knowledge of consumers' behavior, detection of supply service issues, and reporting of results [28].
7.11.1 Predictive Maintenance
IoT devices providing the right information at the right time help managers receive relevant information related to maintenance, so that mitigating tasks can be planned well ahead. Predictive maintenance helps in increasing safety and extending the lifetime of plants/equipment, as well as in avoiding accidents and hazards [29].
7.11.2 Smart Metering
A smart meter is an internet-enabled device that detects the amount of energy,
water, or natural gas used by a building or residence. Conventional meters can track only the total consumption, whereas smart meters are able to detect the quantum of resource consumption along with a time-stamp. Industries make use of smart meters to track consumer usage as well as to fix the price as per
the usage. Moreover, this approach is helpful in hazardous industries like mining
as well as those with widely dispersed equipment or heavy reliance on envir-
onmental conditions. To avoid squandering resources or overloading its energy-
generating equipment, the energy sector relies on smart metering and remote
monitoring [30].
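As a rough illustration of the interval-based, time-stamped billing described above, the following Python sketch aggregates readings from a hypothetical smart meter and applies peak and off-peak tariffs. The reading format, tariff values, and peak window are assumptions made for the example, not details of any particular metering product.

```python
from datetime import datetime

# Hypothetical interval readings: (ISO timestamp, kWh consumed in that interval)
readings = [
    ("2023-06-01T07:30", 1.2),
    ("2023-06-01T13:00", 0.8),
    ("2023-06-01T19:45", 2.1),
]

# Assumed tariff: a higher rate during the evening peak (18:00-22:00)
PEAK_RATE, OFF_PEAK_RATE = 9.0, 5.0  # currency units per kWh

def bill(interval_readings):
    """Return total consumption and cost for a list of time-stamped readings."""
    total_kwh, total_cost = 0.0, 0.0
    for stamp, kwh in interval_readings:
        hour = datetime.fromisoformat(stamp).hour
        rate = PEAK_RATE if 18 <= hour < 22 else OFF_PEAK_RATE
        total_kwh += kwh
        total_cost += kwh * rate
    return total_kwh, total_cost

kwh, cost = bill(readings)
print(f"Consumption: {kwh:.1f} kWh, amount due: {cost:.2f}")
```

In a real deployment, the readings would arrive over a network protocol such as MQTT and the tariff schedule would come from the utility's billing system; the point here is only that time-stamped interval data enables usage-based pricing that conventional meters cannot support.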
7.11.3 Location Tracking
GPS systems, RFID tags, and other wireless technologies can be used to track assets
and locations and provide up-to-date information to the companies.
7.11.4 Location Services
These are frequently used by logistics organizations to track goods whilst in transit;
and to reroute the drivers, when needed. Location-based IIoT technology can
be used in industrial warehouses to track the materials. These services enable the
employees to quickly locate the materials.
7.12 Future of IIoT
It is estimated that the use of IIoT-enabled devices will double in the future. Transformative and disruptive technologies, namely, AI, cloud storage and computing, edge computing, mobile computing, robotics, augmented reality, virtual reality, the human-computer interface (HCI), etc., will expand the scope of IIoT and trigger the design and development of innovative IoT applications and products [33].
7.13 Conclusion
This chapter has dealt with the basics of IoT, IIoT, hardware and software support for
IoT, types of IIoT sensors, security concerns, applications, domains and the future
scope of IIoT. This technology is expected to record exponential growth due to its anticipated expansion of application domains and innovations.
References
1. www.oracle.com/in/internet-of-things/what-is-iot/
2. https://aws.amazon.com/what-is/iot/
3. www.techtarget.com/iotagenda/definition/Internet-of-Things-IoT
4. www.scottgroup.ie/internet-of-things-iot
5. www.networkworld.com/article/3207535/what-is-iot-the-internet-of-things-explained.
html
6. https://medium.com/zeux/the-internet-of-things-iot-5-reasons-why-the-world-needs-
it-125fe71195cc
7. www.channelfutures.com/security/why-is-iot-popular-because-of-open-source-big-
data-security-and-sdn
8. www.comptia.org/content/articles/what-is-iot-cybersecurity#:~:text=IoT%20security%20is%20one%20of,attacks%2C%20such%20as%20through%20botnets
9. www.ciena.com/insights/what-is/What-Is-SDN.html
10. www.trustradius.com/software-defined-networking
11. www.leverege.com/iot-ebook/how-iot-systems-work
12. https://trackinno.com/iot/how-iot-works-part-3-data-processing/
13. www.biz4intellia.com/blog/7-layers-of-iot-what-makes-an-iot-solution-comprehensive/
14. https://trackinno.com/iot/how-iot-works-part-4-user-interface/
15. www.javatpoint.com/internet-of-things-with-python
16. www.topcoder.com/thrive/articles/python-in-iot-internet-of-things
17. www.it4nextgen.com/best-development-software-ide-internet-things-iot/
18. https://data-flair.training/blogs/iot-hardware/
19. https://nexusintegra.io/7-industrial-iot-applications/
20. https://blog.wellaware.us/blog/ultimate-guide-types-of-iot-sensors-transmitters-industr
ial-applications
21. www.embitel.com/blog/embedded-blog/7-most-commonly-used-sensors-for-develop
ing-industrial-iot-solutions
22. www.bisinfotech.com/top-iiot-companies-in-india-industry-leaders/
23. www.f6s.com/companies/industrial-iot/india/co
24. https://startuptalky.com/iot-startups-india/
25. www.dsci.in/blogs/iot-technology-in-india/
26. www.jigsawacademy.com/top-uses-of-iot/
27. www.iotsworldcongress.com/iot-transforming-the-future-of-agriculture/
28. www.simplilearn.com/iot-applications-article
29. www.ibm.com/blogs/internet-of-things/top-5-industrial-iot-use-cases/
30. www.datamation.com/trends/industrial-iiot-use-cases/
31. www.techtarget.com/iotagenda/feature/Top-5-industrial-IoT-use-cases
32. www.mytechmag.com/iot-in-manufacturing/
33. https://doi.org/10.1007/978-3-030-37468-6_27
Chapter 8
Design of Machine
Learning Model
for Health Care Index
during COVID-19
Nishu Gupta, Soumya Chuabey, Ity Patni,
Vaibhav Bhatnagar, and Ramesh Chandra Poonia
8.1 Introduction
The lockdown due to COVID-19 badly affected the economies of almost every country, but its degree of impact depends on various factors, such as the spread of the virus, the time taken to pass through all the stages of the coronavirus, and the recovery rate of patients. The slowdown in the economy in the years 2017–2018, 2018–2019 and now 2019–2020 has made it difficult to achieve the target of a $5 trillion economy by 2024–2025. India lost its fifth position in the GDP rankings in 2018 compared to 2017. The consumption of consumer goods has slowed down, which has affected
the share of capital investment in India. With a slump in production and consump-
tion, the dream of being the third largest economy seems far away. In 2017, the size
of the Indian economy stood at $2.65 trillion, the fifth largest economy. In 2018, the
growth rate of India was minimal in comparison to that of the United Kingdom and France: India grew by 3.01 percent to $2.73 trillion, while the UK and French economies grew by 6.8 percent and 7.3 percent, respectively, due to which India lost its fifth position and was pushed into seventh position in the World Bank's GDP rankings in 2018 [1]. The pandemic has affected all sectors due to the complete lockdown, and
the health care sector, which it was assumed would remain unaffected, is no excep-
tion. This chapter is an attempt to observe the volatility in the Health Care Index of
the Bombay Stock Exchange (BSE) and to predict the future movement of the Index
using the Auto Regressive Integrated Moving Average (ARIMA) model. The Health
Care Index of the BSE reported the biggest monthly gain in 21 years in the month
of April 2020. The index rallied around 27 percent in the same month [2]. The hope
of investors that demand would be created is connected to the development of a
vaccine against this deadly disease.
The Indian health care sector can be a valuable bet as it includes pharma
stocks too. India has done remarkably well in the field of generic medicines in the
pharmaceutical industry and is the largest supplier of generic medicines globally.
The Indian pharmaceutical industry accounts for over 50 percent of the global demand for various vaccines; nearly 40 percent of the demand from the USA and 25 percent of that from the United Kingdom are fulfilled by Indian pharmaceutical manufacturers and suppliers. India supports the second largest share of pharma-
ceutical and biotech workforce on the global platform. The valuation of the
pharmaceutical industry in India was calculated at $33 billion in 2017 [2] and $38
billion in June 2019 [3]. The turnover of India’s domestic pharmaceutical market
in 2018 was measured as Rs 1,29,015 crore ($18.12 billion), which grew at 9.4 per-
cent year-on-year from Rs 1,16,389 crore ($17.87 billion) in 2017 [1]. India has
earned a very good reputation in the pharmaceutical industry at a global level by
developing a large pool of engineers and scientists who have been instrumental in
winning appreciation for their research work, and they have the potential to direct
the industry forward to achieve more accolades at a global level. Around 80 per-
cent of the antiretroviral drugs which are used to fight AIDS (Acquired Immune
Deficiency Syndrome), a disease which has killed millions of people across the
globe, are supplied by Indian pharmaceutical manufacturers and suppliers. The
pharmaceutical and allied sectors like health care and insurance have contributed
in a major way to India’s economic growth and have played a very important role
in the comparatively lower mortality rate in India in recent years. Estimates suggest
that this industry has become a major source of employment creation in India
with over 2.7 million people dependent directly or indirectly on highly skilled
areas like manufacturing and research & development activities. The industry is
among the top five best performing sectors which generates a trade surplus of over
$11 billion and it has helped the Indian economy to reduce the burden of trade
deficit [4]. Foreign direct investors have shown tremendous interest in the Indian
pharmaceutical industry and they have invested more than $2 billion over the past
three financial years, making it one of the top 10 priority sectors attracting foreign
direct investment (FDI).
The Indian government is focusing more upon personal hygiene and care and
has significantly increased the Personal Hygiene Cost (PHC) in the short run. It will
help in creating community awareness of hygiene, which will have a positive impact
on health care and allied industries like the insurance sector and the pharmaceutical
sector, not only in India but at a global level. Due to the nationwide lockdown, the
general medical facility has been badly impacted and has created a gap in the care
of patients with other health problems, especially chronic diseases. This will have a
long-term impact on the health care industry, as these patients' conditions will become more critical because their treatment was postponed.
The aims of this chapter are:
■ to design a time series model for health care closing price data, starting from
1 February 1999 to March 2020.
■ to study the impact of COVID-19, by comparing actual closing prices and
forecast values from the developed model, starting from 23 March 2020 to
July 2020.
■ to forecast the trend up to December 2020 based on the model developed.
8.2 Literature Review
Prediction of the price of a stock or index has always spurred the interest of investors.
[5] revealed that ARIMA is useful for predicting short-term movement in prices. [6]
summarized the performance through six forecasting methods, such as minimum error, maximum error, mean error, standard deviation, root mean square error, and ARIMA; the forecast accuracy was very acceptable. [7, 8] discussed technical, fun-
damental long-and short-term approaches for prediction and highlighted that a
combination of robust tests would provide accurate forecasting. [9, 10] fitted the
ARIMA model on 10 years and 2 years of data, with RMSE and MAE observed for prediction accuracy. This shows that short-term prediction was more accurate compared to long-
term prediction. [11] applied the model to 56 different stocks in various sectors.
The model achieved a prediction accuracy of 85 percent. [12] attempted the model fit on NSE Nifty Midcap-50 companies and observed the trend for the future.
[13, 14] also attempted to forecast the future unobserved prices of the index and validated the data against Sensex movements of 2013. [15] explored how the German market behaved in the same manner as the North American market, and how the Asian and European markets responded.
financial data for the short term. The following steps are followed to predict the
future:
∆Rt = Rt − Rt−1 (8.1)

In terms of the backshift operator B, defined by BRt = Rt−1 (8.2), the first difference can be written as

∆Rt = (1 − B)Rt (8.3)

■ The stationarity of the series is obtained through the dth difference of Rt:

∆^d Rt = (1 − B)^d Rt (8.4)
■ Univariate models: Under the univariate model, only one variable is measured
over time based on its past values. The Box-Jenkins methodology was developed to explain the process of generating the series as well as of forecasting. Further, ARIMA is used
for analysis purposes.
■ ARIMA model: A pth-order autoregressive model is integrated with a qth-order moving-average model. The general equation for the ARIMA(p, d, q) model, written for the differenced (stationary) series, takes the standard form:

∆^d Rt = c + φ1 ∆^d Rt−1 + … + φp ∆^d Rt−p + θ1 ∈t−1 + … + θq ∈t−q + ∈t

Where:
Rt = the actual data set, with the non-stationary data converted into stationary data by differencing the time series values
φ, θ = parameters of the autoregressive and moving-average terms, respectively
p = the order of the autoregressive term, i.e., the number of lagged values of the series on which the series is regressed
q = the order of the moving-average term, i.e., the number of lagged values of the error term
∈t = the random error at time t
The ARIMA model is processed in a few steps, namely, testing, identification, estimation, and forecasting [19, 5].
■ Model fit checking: Model fit checking needs to be performed to know the
reliability and validity of data. Further, prediction of future pattern of series
can be performed.
■ Linear trend model: Trend analysis of the closing price is done using a linear
trend model. The fitted trend equation and accuracy measure are shown in
Table 8.1. The graph is shown in Figure 8.1.
In Table 8.2, since the p value is greater than 0.05, it can be inferred that the given
dataset is not stationary.
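The stationarity check summarized in Table 8.2 can be reproduced in outline with the augmented Dickey-Fuller test from the statsmodels library. The file and column names below are placeholders standing in for the BSE Health Care closing-price series, not the authors' actual data files.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Placeholder file/column names for the BSE Health Care closing-price series
series = pd.read_csv("bse_healthcare.csv", parse_dates=["Date"], index_col="Date")["Close"]

stat, p_value, *_ = adfuller(series.dropna())
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
# p-value > 0.05 -> fail to reject the unit-root hypothesis, i.e. the series is not stationary

# First differencing, as done for the first-difference linear model
diff1 = series.diff().dropna()
stat_d, p_d, *_ = adfuller(diff1)
print(f"ADF after first differencing: statistic {stat_d:.3f}, p-value {p_d:.3f}")
```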
The first difference linear equation model: First differencing is performed to con-
vert the data into stationary form. Figure 8.2 shows the same behavior after the first
differencing. The first difference linear equation model is shown below:
The accuracy measure of the first difference linear model is shown in Table 8.3.
■ Auto correlation function and partial auto correlation function: The coefficient
of the correlation between two values in a time series is called the autocorrel-
ation function (ACF). For example, the ACF for a time series yt is given by:
Corr(yt, yt−k), k = 1, 2, ...
This value of k is the time gap being considered and is called the lag. A lag 1
autocorrelation (i.e., k = 1 in the above) is the correlation between values that are one time period apart. More generally, a lag k autocorrelation is the correlation between values
that are k time periods apart. The graph of ACF is shown in Figure 8.3.
The ACF is a way to measure the linear relationship between an observation at
time t and the observations at previous times. If we assume an AR(k) model, then we
may wish to only measure the association between yt and yt−k and filter out the linear influence of the random variables that lie in between (i.e., yt−1, yt−2, …, yt−(k−1)), which requires a transformation of the time series. Then, by calculating the correlation of the transformed time series, we obtain the partial autocorrelation function (PACF). The graph of the PACF is shown in Figure 8.4.
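Plots corresponding to Figures 8.3 and 8.4 can be generated with statsmodels; this is a minimal sketch, again assuming a placeholder CSV file for the closing-price series rather than the authors' actual data.

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Placeholder file/column names, as in the earlier stationarity sketch
series = pd.read_csv("bse_healthcare.csv", parse_dates=["Date"], index_col="Date")["Close"]
diff1 = series.diff().dropna()  # work on the first-differenced (stationary) series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(diff1, lags=40, ax=axes[0])    # autocorrelation function, cf. Figure 8.3
plot_pacf(diff1, lags=40, ax=axes[1])   # partial autocorrelation function, cf. Figure 8.4
plt.tight_layout()
plt.show()
```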
■ Auto correlation function and partial auto correlation function: In this stage, the ARIMA model is applied with different values of p, d, and q, which are shown in Table 8.4. The MSE is computed for the forecast values from 1 March 2020 to 31 August 2020.
As seen from Figure 8.4 and Table 8.4, the ARIMA(1, 1, 1) model has been selected for further forecasting.
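A minimal sketch of fitting the selected ARIMA(1, 1, 1) model and comparing its forecasts with the actual closing prices might look as follows; the file name, column name, and exact split dates are assumptions for illustration and should be adapted to the actual dataset.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Placeholder file/column names; the train/test split follows the dates described in the text
series = pd.read_csv("bse_healthcare.csv", parse_dates=["Date"], index_col="Date")["Close"]
train = series[:"2020-02-29"]              # history used to fit the model
test = series["2020-03-01":"2020-08-31"]   # period over which the MSE is computed

# Fit ARIMA(1, 1, 1), the model selected from Table 8.4, and forecast over the test horizon
model = ARIMA(train, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=len(test))

print("MSE against actual closing prices:", mean_squared_error(test, forecast))
print(model.summary())
```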
8.5 Conclusion
The research has been undertaken to forecast the closing price of the BSE Health
Care Index through the ARIMA model and to show the effect of COVID-19 on the
closing price which is decently predicted by the ARIMA model. The ARIMA model
should be used for short-term forecasting and it also helps the investor in making
the short-term investment predictions. It is suggested that investors do not panic about investing in health stocks, as many things, such as operations, elective surgeries, major treatments, and international patients, have been postponed till life gets back to normal. Thus, as suggested by the data, this sector is expected to continue to follow the pattern it is showing at present.
References
1. Radhakrishnan Vignesh and Sumant Sen (2019) “The Downturn in the Indian Economy
Continues.” The Hindu, 6 August. Available at: www.thehindu.com/data/the-downturn-
in-the-indian-economy-continues/article28836274.ece.
2. IBEF (n.d.) “Brand India.” Available at: www.ibef.org/industry/indian-pharmaceuticals-
industry-analysis-presentation.
3. Indian Pharmaceutical Alliance (2019) The Indian Pharmaceutical Industry -The Way
Forward, Indian Pharmaceutical Alliance, June, pp.1–33.
4. FICCI (2013) “Indian Life Sciences: Vision 2030 Expanding Global Relevance and
Driving Domestic Access.” 2013-14.1-7.
5. Adebiyi A. Ariyo, Adewumi O. Adewumi, and Charles K. Ayo (2014) “Stock price
Prediction Using the ARIMA Model.” Paper presented at 2014 UKSim-AMSS 16th
International Conference on Computer Modelling and Simulation. IEEE.
6. Jaydip Sen, and Tamal Chaudhuri (2017) “A Time Series Analysis-Based Forecasting
Framework or the Indian Healthcare Sector.” Journal of Insurance and Financial
Management, 3(1).
7. Dev Shah, Haruna Isah, and Farhana Zulkernine (2019) "Stock Market Analysis: A Review and Taxonomy of Prediction Techniques." International Journal of Financial Studies, 7(2): 26.
8. Sheikh Mohammad Idrees, M. Afshar Alam, and Parul Agarwal (2019) “A Prediction
Approach for Stock Market Volatility Based on Time Series Data.” IEEE Access, 7:
17287–17298.
9. Madhavi Latha Challa, Venkataramanaiah Malepati, and Siva Nageswara Rao Kolusu (2018) "Forecasting Risk Using Auto Regressive Integrated Moving Average Approach:
An Evidence from S&P BSE Sensex.” Financial Innovation, 4(1): 24.
10. George E. P. Box, et al. (2015) Time Series Analysis: Forecasting and Control. Hoboken,
NJ: John Wiley & Sons.
11. Prapanna Mondal, Labani Shit, and Saptarsi Goswami (2014) “Study of Effectiveness
of Time Series Modeling (ARIMA) in Forecasting Stock Prices.” International Journal of
Computer Science, Engineering and Applications, 4(2): 13.
12. B. Uma Devi, D. Sundar, and P. Alli (2013) “An Effective Time Series Analysis for Stock
Trend Prediction Using ARIMA Model for Nifty Midcap-50.” International Journal of
Data Mining & Knowledge Management Process, 3(1): 65.
13. Debadrita Banerjee (2014) “Forecasting of Indian Stock Market Using Time-Series
ARIMA model.” Paper presented at 2nd International Conference on Business and
Information Management (ICBIM). IEEE.
14. C. Narendra Babu and B. Eswara Reddy (2014) “Selected Indian Stock Predictions
Using a Hybrid ARIMA-GARCH Model.” Paper presented at 2014 International
Conference on Advances in Electronics Computers and Communications. IEEE.
15. Jeffrey E. Jarrett, and Janne Schilling (2008) “Daily Variation and Predicting Stock
Market Returns for the Frankfurter Börse (Stock Market).” Journal of Business Economics
and Management, 9(3): 189–198.
16. Pier Paolo Ippolito (2019) “Stock Market Analysis Using ARIMA.” May 21. Available
at: https://towardsdatascience.com/stock-market-analysis-using-arima-8731ded2447a.
17. M. Venkataramanaiah (2018) "Forecasting Time Series Stock Returns Using ARIMA: Evidence from S&P BSE Sensex." International Journal of Pure and Applied Mathematics, 118(24).
18. J. Van Greunen, A. Heymans, C. Van Heerden, and G. Van Vuuren (2014) “The
Prominence Of Stationarity in Time Series Forecasting.” Studies in Economics and
Econometrics, 38(1): 1–16.
19. Milind Paradkar (2017) "Forecasting Stock Returns Using the ARIMA Model." QuantInsti Blog, March 9. Available at: https://blog.quantinsti.com/forecasting-stock-returns-using-arima-model/.
Chapter 9
Ubiquitous Computing
and Augmented
Reality in HCI
Nancy Jasmine Goldena and Thangapriya
9.1 Introduction
Human-computer interaction (HCI) is a branch of computer science that studies
how people and computers interact. HCI researchers study how people interact with
computers and create technologies that enable people to connect with computers in
unique ways. HCI is at the crossroads of computer science, cognitive science, design,
mass communication, and a number of other specialties. The most fundamental
and essential aspects of HCI are augmented reality (AR) and ubiquitous computing
(UC) [1–7].
example, a user may want to drive from home to the gym. Smartphones, wearable
devices, and other fitness trackers may be required. Wireless communication and
networking technologies, mobile devices, embedded systems, wearable computers,
radio frequency identification (RFID) tags, middleware, and software agents are all
examples of UC [8].
9.2.1 UC’s History
Mark Weiser, the Chief Technologist at Xerox Palo Alto Research Center (PARC),
coined the term “ubiquitous computing” in 1988. Weiser published some of
the initial papers on the subject, both alone and with PARC Director and Chief
Scientist John Seely Brown, primarily defining it and sketching out its significant
obstacles. The Active Badge, a “clip-on computer” the size of an employee ID card,
was designed at the Olivetti Research Laboratory in Cambridge, England, allowing
businesses to track the exact locations of people in a building as well as the objects
to which they were attached [8].
At PARC, Weiser was inspired by social scientists, anthropologists, and
philosophers, and he explored a new approach to computers and connectivity. He
believes that people live by following their routines and knowledge bases, so that the most powerful technologies are effectively invisible in use. Achieving this is a challenge for the entire field of computer science. First and foremost, it means activating the world with computing: hundreds of wireless computing devices of different sizes, per person and per office, are envisaged.
work in areas such as operating systems, user interfaces, the internet, wireless,
graphics, and a variety of other domains. At that time, Weiser called this work
“ubiquitous computing.” This is distinct from personal digital assistants (PDAs)
and Dynabooks: UC does not sit on a personal device of any kind, but is everywhere. Weiser believed that, for 30 years, most interface design and computer design had followed the route of the "dramatic" machine, whose ultimate goal is to create a computer that is thrilling, magnificent, and fascinating. Weiser
suggests the “invisible” path, which is a less-traveled path where the main aim is
to make a computer so embedded, so comfortable, that the user stops worrying
about it. Weiser believes that the second path will come to dominate in the next
20 years. But it won’t be easy, PARC has been testing several versions of the future
infrastructure. In the late 1990s, IBM’s pervasive computing group popularized
the phrase “pervasive computing” [8].
9.2.2 Characteristics of UC
The primary purpose of UC is to create linked smart products and make communi-
cation and data sharing easier and less intrusive [3]. The following are some of the
key characteristics of UC:
9.2.3 UC’s Layers
HCI technologies such as IoT, Cloud Computing, and the others enable users to
access wireless sensor networks. Such sensor networks gather data from a device’s
sensors before transferring it to a server. Three of the most common layers that it
may encounter when transmitting are shown in Figure 9.1.
Layer 1: The Task Management layer examines user tasks, context and index. It
also manages the territory’s complicated interdependencies.
9.2.4 UC Types
UC can be categorized based on its portability, simplicity, technical capability,
mobility, and accessibility. UC is widely divided into four categories, which are
shown in Figure 9.2: (1) portable computing; (2) pervasive computing; (3) calm
computing; and (4) wearable computing.
■ Portable computing: Laptops and personal devices have made computing port-
able. Computers are now easily portable, allowing them to be carried wher-
ever they are needed. Portable computers are used in a variety of industries
and for a variety of purposes from the television industry to the military.
Notebook and subnotebook computers, hand-held computers, palmtops, and
PDAs are all examples of portable computers.
■ Pervasive computing: Pervasive computing attempts to make our lives easier
by providing tools that make it simple to handle data. Portable handheld
personal assistant devices with high speed, wireless connectivity, lower power
consumption rate, data storage in durable memory, small storage devices, color
display video, and voice processing technologies will be available through per-
vasive computing. Electronic toll systems on highways are examples of perva-
sive computing, as is monitoring software like Life360, which can track a
user’s position, speed, and how much battery life they have left.
■ Calm computing: Calm computing refers to a level of technical development
in which the primary role of a user is not computing, but rather augmenting
and bringing important information to the experience. Calm computing
prioritizes people and tasks rather than computation and data. The next gen-
eration of linked devices is known as Calm Design. Its goal is to connect
customers to their devices and other people in a way that allows them to live
their lives without interruptions or pop-ups. A good example of calm tech-
nology is video conferencing. In contrast to telephone conferences, informa-
tion can be gathered through gestures and facial expressions.
■ Wearable computing: The study of creating, designing, and managing com-
putational and sensory systems is known as wearable computing. Wearable
computing allows users to work, interact, and entertain while maintaining
mobility and access to the device without using their hands or eyes. Wearables
have the potential to improve interaction, learning, perception, and practical
abilities. A wearable can filter a user's calls, provide a reference, track the user's health, remind the user of a name, and even assist the user in catching the bus.
Wearable technology refers to a group of electronic devices such as smart
watches, rings, smart clothes, head-mounted displays, and medical devices
that make life easier by connecting fitness, convenience, and lifestyle to the
rest of the world [8].
9.3 UC Devices
Sensors are continually evolving, becoming more compact, lighter, more precise,
durable, effective, reactive, and with increased communication capabilities. These
key components, as well as the availability of new technologies, are supporting the
expansion of the consumer electronics sensor industry, improving efficiency. For
example, camera sensors are frequently used in smart systems to collect photos and
videos of users interacting with the system. Digital color cameras can be used as
sensing devices to interpret human hand positions, postures, and gestures, which can
be converted into appropriate commands for controlling nearly any digital system.
The capacity to recognize and track human movements and poses in the field is one
of the most important features of camera-based HCI systems [9].
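As a minimal sketch of the camera-as-sensor idea, the following OpenCV snippet thresholds skin-like colours in a webcam feed and outlines the largest detected region, which could stand in for a hand candidate in a gesture interface. The HSV skin range is a rough assumption; production gesture systems rely on far more robust models.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # default webcam as the sensing device
lower_skin = np.array([0, 30, 60], dtype=np.uint8)    # rough HSV skin-tone bounds (assumption)
upper_skin = np.array([20, 150, 255], dtype=np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_skin, upper_skin)            # keep skin-coloured pixels
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)              # largest blob ~ hand candidate
        x, y, w, h = cv2.boundingRect(hand)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand candidate", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```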
9.3.1 Smartwatches
Smartwatches are probably the best-known and most-used smart wearable in the
workplace today. When a smartwatch is linked to a smartphone, the wearer can
receive and send new messages directly from their watch, removing the need to hold
and view the phone.
Apple’s smart watches come in a variety of designs, with the features expanding
with each new version (Figure 9.3). Safety features like fall detection, emergency
SOS, and high and low heart rate notifications make the smartwatch useful for eld-
erly family members, even if they don’t own an iPhone. Calling, texting, and loca-
tion sharing are simple ways to keep in touch with family through the smartwatch.
The Fitbit Luxe’s lozenge configuration attracts the fashion-conscious, while still
providing all of the health and fitness tracking, phone management, and even men-
strual tracking that a regular wearable gadget provides (Figure 9.4). When compared
with Apple smartwatches, the Fitbit is cheaper.
9.3.2 Smart Rings
Smart rings give consumers social feedback and can be used to interact with their
surroundings in ways that other smartwatches and portable devices can’t. When a
text message, phone call, or other notification is received, certain smart rings provide
notifications to alert the user.
The NFC Opn ring allows the user to navigate apps, lock and unlock their door, and transfer data, without ever needing to be charged (Figure 9.5).
The Oura Ring delivers useful health information, such as sleep patterns and
daily exercise (Figure 9.6). The Oura also monitors a variety of biometrics that many
other wearable tech gadgets don’t, and notifies the user if something is unusual.
Heat training can help you improve your fitness or prepare for off-road races, marathons, and Ironman competitions. A core body temperature sensor can
assist you in reaching your goal of successful heat training. Previously, extreme
athletes could only track the metric with invasive measures like electronic pills
(Figure 9.7).
Breathing training is an important therapy for asthmatics, athletes who want to
improve their lung capacity, and individuals suffering from COVID-19. The Airofit
Pro Breathing Trainer (Figure 9.8) and its companion software can help users to
breathe faster and more efficiently by improving their lung function.
9.3.4 Smart Earphones
Smart headphones, also known as smart earbuds or hearables, are audio devices that do more than just transmit audio. These headphones are often wireless
and can connect to your phone, tablet, computer, or any other Bluetooth-enabled
device.
Some research shows that sleeplessness is one of the most common health problems, and sleep health is increasingly taking a central role. The Bose Sleepbuds II Earphones have
been professionally proven to improve sleep. They won’t play radio or songs; instead,
they’ll play sounds from the Bose Sleep App (Figure 9.9).
9.3.5 Smart Clothing
Smart clothing can provide deeper insights than other forms of modern wearable
technology since it makes contact with a larger area of the body, allowing for enhanced
tracking for both medical care and lifestyle development. Samsung initiated sub-
stantial research in this field and has filed a number of interesting patents. If these
patents become commercially accessible products, Samsung might produce smart
shirts that diagnose respiratory problems and smart shoes that track running form,
in the coming years. Consumers can already buy Siren Socks, smart socks that detect
developing foot ulcers, Wearable X Nadi X smart pants (yoga pants that vibrate to
improve form during yoga exercises), and Naviano smart swimsuits that provide
alerts when the user should apply sunscreen (Figure 9.10).
9.4 UC’s Applications
Nowadays, UC is a leading and unique technology that can be applied in a variety
of research domains as well as other disciplines. This adaptable UC is used in a wide
range of industries, particularly as sensors everywhere.
9.4.1 Healthcare Industry
UC is used in everyday hospital administration such as patient monitoring, well-
being, and elderly care. Ubiquitous healthcare is a new field that uses ubiquitous tech-
nologies to provide a technology-oriented environment for healthcare professionals to
provide efficient and effective care. Various technology solutions in diverse healthcare
domains, such as chronic disease monitoring, gait analysis, mood and fall detection,
neuropathic monitoring, physiological and vital sign monitoring, pulmonological
monitoring and so on, have been developed by the research community [10].
9.4.2 Accessibility
Gregg Vanderheiden used the term “ubiquitous accessibility” to describe a new
technological approach to supporting accessibility. He suggests that, as computing
evolves from personal workstations to UC, accessibility must be examined, based
on this new computational paradigm. u-accessibility has arisen as a method of
improving the quality of life of people with disabilities and the elderly.
9.4.3 Learning
Ubiquitous learning is defined as learning that is enhanced by the use of mobile
and wireless communication technologies, as well as sensors and location/
tracking mechanisms, all of which work together to integrate learners with their environment.
9.4.4 Logistics
Several studies have been conducted to assess the use of ubiquitous technologies to
improve transportation logistics. MobiLab/Unisinos suggested two models in this
context:
1. SWTrack: This enables businesses to track their vehicles and regulate the
routes they travel. The model allows users to determine whether a vehicle is
following a predetermined route or straying from it.
2. SafeTrack: This is a SWTrack extension that allows for the autonomous
delivery management of loads with no user interaction.
RFID is used to manage the input/output cargo on the vehicle, identifying shipments
and recoveries of loads made by mistake, as well as potential cargo theft, in real time.
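A route-deviation check of the kind SWTrack performs can be sketched as follows: each GPS fix is compared against the planned route and flagged if it lies farther than a chosen threshold from every planned waypoint. The waypoints, threshold, and haversine helper below are illustrative assumptions, not details of the SWTrack model itself.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical planned route (waypoints) and allowed corridor width
route = [(12.9716, 77.5946), (13.0827, 80.2707)]   # e.g. Bangalore -> Chennai
MAX_DEVIATION_KM = 15.0

def is_on_route(gps_fix):
    """True if the current fix lies within the corridor of at least one waypoint."""
    return min(haversine_km(gps_fix, wp) for wp in route) <= MAX_DEVIATION_KM

print(is_on_route((12.98, 77.60)))   # near Bangalore -> True
print(is_on_route((15.30, 75.10)))   # far from the planned corridor -> False
```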
9.4.5 Commerce
Context-aware commerce, also known as ubiquitous commerce or u-commerce, is
the use of mobile devices and ubiquitous technology to promote business. Gershman
identified three requirements for u-commerce success:
Some proposals focus on product trading, while others concentrate on market revenue. The MobiLab group proposed the MUCS paradigm, which stands
for “Universal Commerce Support.” The UC technologies are used by MUCS to
identify commercial opportunities for users as clients or suppliers. Furthermore, the
model provides a generic method to encourage commerce in products and services
across all domains.
9.4.6 Games
Ubiquitous games were born through the use of UC technology in game production. In this type of game, players need to travel to certain locations in order to complete tasks. Furthermore, users interact with one another (multi-player) as well as with the environment around them (real objects). Wireless technologies enable users to move around while sending and receiving information such as position, nearby users, and other types of context-aware data.
9.5 Advantages of UC
With the help of so many integrated sensors, UC is used in various fields to help people
improve their day-to-day activities more efficiently. Some of the advantages are:
9.6 Disadvantages of UC
Even as technology advances, there are still some limitations. Some of the
disadvantages of UC are discussed here:
9.7.2 Characteristics of AR
AR is a computer system that can merge real-world and computer-generated data
and it has its own set of characteristics.
■ It enables the actual and virtual worlds to be combined so that elements of the
virtual world can interact with aspects of the actual world.
■ Information is exhibited virtually, such as photos, audio, videos, graphics and
GPS data, and is strongly linked to the information that we see with our
actual eyes.
■ It should be real-time interactive.
■ It directly interacts with the environment’s physical capabilities.
9.7.3 AR Types
Many immersive AR app development companies are always looking to improve the
consumer experience by integrating rich and innovative ways of expressing digital
content. This is made feasible through the use of several types of AR.
9.7.3.1 Marker-Based AR
Marker-based AR uses AR applications to identify physical images ("markers") and overlay 3D models on them, as shown in Figure 9.11. When you open the AR app, it uses your mobile device's back camera to track such markers. This form of AR technology is also known as image recognition or recognition-based AR. The AR display overlays content in the form of videos, images, 3D models, or animation clips. Users can interact with this content through an app. This is a popular sort of AR that allows users to observe an object or image in better detail and from various angles. Furthermore, when the user turns the marker, the 3D imagery rotates at the same time [13].
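A minimal marker-based detection loop can be sketched with OpenCV's ArUco module (available in the opencv-contrib-python package; note that the aruco API changed in OpenCV 4.7, so the calls below assume an earlier version). Rendering the 3D overlay itself is a separate step that a full AR engine would perform.

```python
import cv2

# Marker dictionary and detection parameters (pre-4.7 opencv-contrib aruco API assumed)
aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

cap = cv2.VideoCapture(0)  # back/default camera tracks the printed markers
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
    if ids is not None:
        # A real AR app would warp its overlay (3D model, video, image) onto the
        # marker using these corner coordinates; here the markers are only outlined.
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("marker-based AR (detection only)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```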
9.7.3.2 Markerless AR
Markerless AR works by scanning the environment without the use of a marker
(Figure 9.12). Users can place the virtual object or information anywhere they want
without having to move anything in the background. Markerless AR collects positional information using the device's location, digital compass, camera, and accelerometer. In addition, so that AR objects do not simply float in the air, mobile apps with such features frequently ask the user for a flat surface or floor to place them on. The AR app detects that flat surface and places items
on top of it [13].
9.7.3.3 Location-Based AR
This is one of the most frequently used types of AR. The location is mostly
determined by GPS, Digital Compass, smartphone camera, and other technologies.
It does not require special markers to identify the location where the virtual object
is placed, unlike marker-based AR (Figure 9.13). A specific location is assigned to
the virtual products. The objects are displayed on the screen when the user enters a
predetermined area. Because it can anticipate the user's focus and connect real-time data with the current spot, location-based AR does not require a cue from the object in order to deploy. It also enables app developers to display imaginative, interesting, and valu-
able digital content to physical points of interest [13].
9.7.3.4 Superimposition AR
Superimposition AR creates a different view of an object that can be used to tem-
porarily replace the original view (Figure 9.14). This means that an augmented view
replaces the full view of an object or a portion of it with this technology. Object
recognition is key in this type of AR. Superimposition AR displays numerous
perspectives of a target object with the option of emphasizing additional relevant
information [13].
9.7.3.5 Projection-Based AR
Projection-based AR is a video projection technique that allows users to extend or
distribute digital data by projecting images onto the surface of 3D objects or into
their actual space (Figure 9.15). The user does not operate this type of technology.
This is one of the most useful types of AR since it allows users to freely navigate
around the world within the limitations of the projector and camera. With this tech-
nology, it is simple to create graphical representations with high-definition photos or
films that are impossible to see with traditional lighting techniques. With the passage
of time, it can also modify the visual appearance of the object. This is AR software
that uses artificial light to create illusions of depth, position, and orientation of
objects on real flat surfaces under proper supervision to simplify difficult manual
activities in organizations. The best part is that because the instructions are presented
on a specific work location, it eliminates the need for computers and screens [13].
9.8 AR Devices
AR is often recognized as one of the most promising technologies currently available.
After the smartphone revolution, everyone carries a mobile device and all of these
mobile devices generally include a processor, GPS, a display, a camera, microphone,
and some other hardware essential for AR. The terms AR and VR are frequently mis-
understood. When it comes to AR, while most smartphones and tablets can handle
it, VR requires specialized hardware.
9.8.1 Microsoft HoloLens
Microsoft HoloLens is an AR headset with transparent lenses. AR content is added
to the environment using a set of sensors and highly developed lenses. The HoloLens
also includes multiple microphones, an HD camera, a light sensor, and the
“Holographic Processing Unit,” which allows users to interact with 3D holograms
that are presented as part of the real environment. The ability to interact with the
virtual world, in particular, provides customers with a wide range of possibilities
while using this technology. Automobile manufacturers, for example, could use the
HoloLens to make real-time changes to new models (Figure 9.16). Other examples
include watching Netflix on your room’s wall or having a virtual pet wander around
your room.
Users can use their eyes to choose hologram apps and interact with the VR
world. The cursor will follow the movement of the head and body gestures can be
used to pick holograms or products and activate programs. Voice recognition can be used to navigate and operate apps.
9.8.2 MagicLeap One
The “Lightpack,” a little computer that can be attached to your belt or pocket,
powers this futuristic-looking AR headset. MagicLeap’s ultimate goal is to combine
your virtual and actual lives. MagicLeap, for example, can project a whale into a
user’s classroom (Figure 9.17). MagicLeap does more than just insert image content
in the real world. As with the HoloLens, the computer-generated content responds to its environment and allows the user to interact with it. The way the user interacts
with virtual content differs. MagicLeap employs a controller with a large button and
a touchpad instead of gestures and eye movement.
9.8.3 Epson Moverio
This AR headset has been specifically developed around binocular, transparent glasses. This may not be the best choice if the user is seeking a fashion-
able and elegant headset. The Moverio glasses, on the other hand, are extremely
adaptable and can be adjusted to fit any size or user requirement. Epson Moverio’s
Si-OLED technology produces a sharp, bright, and high-quality image for customers. These smart glasses operate on Android 5.1 and are powered by an Intel Atom x5 CPU, making it easier for developers to create AR apps. Flying drones
with the Epson Moverio headset’s AR app is a fun way to use it (Figure 9.18). The
user can have a hands-free flying experience while also seeing the drone through
the glasses.
These AR headsets and smart glasses have the potential to take over the AR
world [14]. However, smartphones, mobile devices, and AR smart glasses will almost certainly interact in the future.
9.9 AR Applications
AR and other VR technologies are becoming increasingly popular. Here are modern-
day AR applications that users should be aware of.
9.9.2 3D Animals
Another exciting application of AR technology is 3D animals. Google 3D Animals
allows users to search for and watch life-sized animals in 3D, as well as interact with
them via AR in their local area. Google image-search looks for 3D creatures like
tigers, zebras, koalas, or sharks using an ARCore-equipped device and supported
browsers like Chrome, Opera, and Safari. Use “Tap on View in 3D,” then view in
your space on the results page. To see the life-sized tiger in 3D, simply point the
smartphone to the ground (Figure 9.21). The user can rotate the animal and hear
the animal bark, bray, and snort, much like it does in the wild, using AR. This was
launched at Google I/O 2019 and includes a growing number of animals.
9.9.3 Fashion
See My Fit, a technology developed by Asos, uses AR to visually fit garments on
models. This allows the user to try up to 500 things per week on six different models
to see how they fit. The user can visualize the size, fit, and cut of each item before
making a purchase. Due to COVID-19, the shop is now unable to work with models
in its studios. In addition to See My Fit, Asos is integrating flat shot photographs and
product shots from models’ homes to continue to serve its customers. The Virtual
Catwalk, an AR-enabled tool that allows users to visually envision and experience up
to 100 ASOS Design products, was introduced earlier this year by the top fashion
store (Figure 9.22). Zara, Ikea, Shopify, Kendra Scott, and a slew of other retailers
are among the fashion companies aiming to use this technology in future.
9.9.4 Gaming
In game creation, game creators are experimenting with more immersive and experiential elements. For example, Pokémon Go, which was introduced in 2016, is one
of the most prominent examples of AR. It allows users to experience AR by bringing
game characters into their physical surroundings (Figure 9.23).
9.9.5 Coloring Books
Traditional coloring books allow users to sketch and express themselves artistically
in two dimensions on paper. However, in today’s extended reality environment, this
is static and fairly restricting. Use AR to explore the amazing world of 3D drawing.
Users can sketch their favorite characters in 2D on paper and have them reproduced
in 3D in real time with Disney’s AR coloring book app (Figure 9.24). The program
matches and copies elements of the drawing in 3D using texture generation. You
can move your characters as you turn the pages using deformable surface tracking.
Coloring books by Disney are accessible for both children and adults, with over 100
illustrations to encourage creativity and motivate you to draw. Disney received the
Edison Award in 2016 for this unique application.
9.9.6 Obstetrics
Road to Birth is an AR+VR project being created by the University of Newcastle’s
Innovation Team to provide visual insights into the stages of childbirth and their
effects on pregnant women. This is a world-first invention that is designed to
improve healthcare for both healthcare professionals and pregnant women.
Samsung Gear VR and HTC Vive (VR), as well as HoloLens (AR), are all used. Using
AR and VR, this mixed-reality simulation (Figure 9.25) will allow you to learn
about essential anatomy changes, delivery approaches, and real-world situations
in 3D. This will help students learn faster and provide pregnant women with a
better understanding of the biological changes they will experience throughout
pregnancy.
9.9.7 Architecture
In the architecture world and construction industries, AR is used for space planning
and design visualization. Turning a simple sketch design on paper into a full 3D
model with specialist software is a great advantage of AR. AR can also be used in
construction to take a virtual tour of the finished 3D model during design analysis
(Figure 9.26). This visual inspection can support engineers and contractors in deter-
mining the height of the ceiling, as well as identifying any construction difficulties
and assisting with component prefabrication. The University of Canterbury in New
Zealand launched CityViewAR in 2011, following the Christchurch earthquake, to
allow city planners and engineers to understand and re-imagine structures as they
were before the earthquake.
9.9.8 Sports
Consider 3D team set-plays, pre-game practice with computer-generated opponents,
virtual training venues, custom-built training sets for individual athletes, and more
(Figure 9.27). These technologies are already in use by teams in the National Football
League (NFL), the Atlantic Coast Conference (ACC), and several soccer teams.
One of the companies driving this trend is EON Sports. Virtual training and 24/7
personal AR coaches promise fewer injuries in sports, as well as greater experiences
for athletes, officials, spectators, and other stakeholders.
9.10 Advantages of AR
The possibilities are enormous as AR technology becomes more accessible and affordable. AR is a rapidly expanding technology with a wide range of applications.
The following are some of the advantages:
■ Anyone can use the apps, as they are easy to operate.
■ It saves money by allowing crucial scenarios to be tested before they are adopted in real life.
■ Solutions can be implemented in the real world once they have been verified.
■ Military personnel can use AR in battlefield simulations before the actual
conflict without putting their lives at risk. This will also support them in
making important decisions during the actual fighting.
■ Can be used as part of training programmes because it makes things memor-
able and interesting [16].
9.11 Disadvantages of AR
Every new piece of technology that is emerging also has a negative aspect to it. The
following are the disadvantages of AR:
9.12 Conclusion
The field of HCI covers a number of disciplines, including computer science,
cognitive science, design, mass communication, and others. The two major HCI
developments discussed in this chapter seem to be UC and AR. This chapter
provides a comprehensive overview of UC, detailing it from its early years to the
present. The UC timeline, characteristics of UC, layers of UC, types of UC, and
various modern technologies used by people, such as smartphones, smart watches,
smart rings, and so forth, have been properly covered. This chapter described the
applications of UC in the fields of healthcare, education, logistics, business, and
the gaming industry. UC’s benefits and drawbacks were listed. Similar to UC, AR’s
history, characteristics, types, AR devices, AR applications, and advantages and
disadvantages were all explained. AR is a revolution for the HCI economy. The level
of progress and continued development of the technology are visible to the public.
Acknowledgments
We thank God Almighty for His abundant grace and mercy who made this possible.
References
1. MRCET (n.d.) “Digital Notes on Human Computer Interaction.” Available at: https://
mrcet.com/pdf/Lab%20Manuals/IT/R15A0562%20HCI.pdf (accessed May 31,
2022).
2. Fakhreddine Karray, et al. (2008) “Cooperative Multi Target Tracking Using Multi
Sensor Network.” International Journal on Smart Sensing and Intelligent Systems, 1(3).
3. Wikipedia (n.d.) “Human–Computer Interaction.” Available at: https://en.wikipedia.
org/wiki/Human%E2%80%93computer_interaction. (accessed August 1, 2021).
4. The Interaction Design Foundation (2015) “The 5 Main Characteristics of Interaction
Design | Interaction Design Foundation (IxDF).” 1 August. Available at: www.interact
ion-design.org/literature/article/the-5-main-characteristics-of-interaction-design.
5. Educative (n.d.) “Introduction to Human-Computer Interaction & Design Principles.”
Interactive Courses for Software Developers. Available at: www.educative.io/blog/intro-human-computer-interaction (accessed May 31, 2022).
6. “Human Computer Interface–Quick Guide.” Available at: www.tutorialspoint.com,
www.tutorialspoint.com/human_computer_interface/quick_guide.htm (accessed May
31, 2022).
7. Utsav Mishra, and AnalyticsSteps (2021) “Human- Computer Interaction (HCI):
Importance and Applications.” June 25. Available at: www.analyticssteps.com/blogs/
human-computer-interactionhci-importance-and-applications.
8. IoT Agenda (2019) "What Is Ubiquitous Computing (Pervasive Computing)?" 1 October. Available at: www.techtarget.com/iotagenda/definition/pervasive-computing-ubiquitous-computing.
9. Wikipedia (n.d.) “Portal: Human– Computer Interaction.” Available at: https://
en.wikipedia.org/wiki/Portal:Human%E2%80%93computer_interaction (accessed 31
May 2022).
10. IoT Agenda (2019) "What Is Ubiquitous Computing (Pervasive Computing)?" 1 October. Available at: www.techtarget.com/iotagenda/definition/pervasive-computing-ubiquitous-computing.
11. The Franklin Institute (2017) “What Is Augmented Reality?” Sept. 21. Available at:
www.fi.edu/what-is-augmented-reality.
12. www.g2.com/articles/history-of-augmented-reality (accessed May 31, 2022).
13. Ayushi Doegar, et al. (2021) "What Are the Different Types of Augmented Reality?" Latest Technology Blogs & Updates by Top Research Agency, Oct. 27. Available at:
https://mobcoder.com/blog/types-of-augmented-reality/.
14. Onirix (2020) “The Best Augmented Reality Hardware in 2019–Onirix.” May 15.
Available at: www.onirix.com/learn-about-ar/the-best-augmented-reality-hardware-in-
2019/.
15. Ritesh Pathak, and AnalyticsSteps (2020) “7 Top Trends in Augmented Reality | Use
Cases of AR.” October 20. Available at: www.analyticssteps.com/blogs/5-trends-augmen
ted-reality.
16. Myayanblog (n.d.) “Advantages and Disadvantages of AR– Augmented Reality.”
Available at: www.myayan.com/advantages-and-disadvantages-of-ar-augmented-reality
(accessed May 31, 2022).
Chapter 10
A Machine Learning-
Based Driving Assistance
System for Lane and
Drowsiness Monitoring
Sanjay A. Gobi Ramasamy
10.1 Introduction
Lane detection is a critical component when driving heavy vehicles. This is done by
using computer vision methods with the help of the OpenCV library [1]. Lane detec-
tion is performed by looking for white markings on both sides of the lane and patterns
with color. Countless people drive along the nation's highways without proper sleep, causing accidents. Using Python, OpenCV, and Keras, the driver can be alerted to drowsiness so that accidents are avoided [2]. The concept of driving in the right
lane refers to avoiding the risk of moving into another lane without the knowledge
of the driver, which leads to accidents [2]. Lane departure signals and lane-keeping
signals are among the warning signals. Sleep deprivation is a problem for truckers who
travel long distances. As such, driving while sleepy becomes extremely dangerous as the
driver can wander into the wrong lane, so we recommend a portable system to monitor
a driver’s actions that can sit on the windshield or any surface.
Humans have developed technology to simplify and protect their lives, whether
they are performing mundane tasks, such as going to work [2] or more complex
ones, such as traveling by air. Technology has impacted our lives profoundly. At the
pace we travel today, even our grandparents wouldn’t have believed we could do it.
Almost everyone on this planet now uses some type of transport every day; some
drive their own vehicles, while others use public transportation because they cannot afford a car. No matter a person's
social standing, there are a few rules for drivers. For instance, remaining alert and
active while driving is a must [3]. In developing countries such as India, fatigue,
combined with bad infrastructure, leads to disaster. There are no clear key indicators
or tests available for measuring or observing fatigue, unlike alcohol and drugs, which
have obvious key indicators and tests [4]. It is likely that raising awareness of fatigue-
related accidents and encouraging drivers to report fatigue at the earliest oppor-
tunity can solve this problem. Many long-distance truckers have to drive for many
hours, which is challenging and costly. But the higher wages associated with the
job encourage many people to take it on. Transport vehicles must be driven
at night as well as in the day. When drivers are motivated by money, they make unwise
decisions, such as driving all night despite being fatigued [3]. Driving while fatigued
causes many accidents since the drivers don’t realize the risk involved.
Some nations have imposed restrictions on the number of hours a driver can drive
at a stretch [3]. However, such restrictions are difficult and costly to enforce, and,
even then, they are not sufficient to solve the problem.
Technical enthusiasts who wish to develop such a project are the intended
audience for this chapter. In many vehicles, there are a variety of products to measure
driver fatigue. Unlike the traditional drowsiness detection system, our driver drowsi-
ness detection system offers better results and additional benefits. A user will receive
an alert if the measure of drowsiness reaches a certain saturation level.
Despite fatigue being a health and safety problem, no nation in the world has yet
managed to tackle it comprehensively [5]. Alcohol and drugs, on the other hand, have
easy-to-measure key indicators and tests, while fatigue is difficult to measure or observe.
It is impossible to discern fatigue without some sort of indicator, as drivers will
not state their fatigue level, given the high profit margin of driving long hours.
10.2 Literature Review
10.2.1 System Review
This survey examined various sites and applications and looked at the fundamental
data to figure out what the general public’s needs are. By analyzing these data, we
came up with an audit that gave us a fresh perspective and helped us to devise
different strategies. A similar application is needed and some reasonable progress is
being made in this area.
deep learning to detect lanes. CNN and recurrent neural networks (RNN) are
both effective in detecting lanes. As part of this process, a camera attached to
the vehicle takes a picture of the road. Next, the image is converted to grayscale
to reduce the processing time. Following that, if the image suffers from interfer-
ence, one can activate filters to remove these interferences to ensure accurate edge
detection. In addition to bilateral and Gaussian filters, trilateral filters can also be
used. Using the Canny edge detector, an edge image can be obtained by applying
automatically generated thresholds. These edges can be
detected by line detectors. Segments of the lane boundary will be generated on the
left and the right. As a result, the RGB color codes are used to determine yellow
and white lanes.
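A minimal sketch of this lane-detection pipeline is shown below, assuming a sample image file "road.jpg"; the filter size, Canny thresholds, and Hough parameters are illustrative choices, not values prescribed in this chapter.

```python
import cv2
import numpy as np

frame = cv2.imread("road.jpg")                      # image captured from the vehicle camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # grayscale to reduce processing time
blurred = cv2.GaussianBlur(gray, (5, 5), 0)         # Gaussian filter to suppress interference
edges = cv2.Canny(blurred, 50, 150)                 # Canny edge detection with two thresholds

# Line detector: probabilistic Hough transform returns candidate lane segments
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)  # draw detected segments
cv2.imwrite("lanes.jpg", frame)
```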
10.2.4 Requirement
Figure 10.1 shows the tools required to carry out the process.
10.2.6 Process Flow
Figure 10.3 shows the process flow.
10.4 Image Capturing
Webcams are used to capture input images. An infinite loop records every frame
from the webcam while it is accessed. The cv2.VideoCapture(0) method of the
OpenCV library is used to capture video and set the capture object [3]. Each image
is stored in a frame variable by calling read() on the capture object.
Grayscale images are required for OpenCV’s object detection algorithm to detect
a face in the image. Detection of faces is not dependent upon color information [3].
This is done by implementing a Haar cascade classifier.
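A minimal sketch of the capture loop and face detection described above is shown below; the cascade file name is the one distributed with OpenCV, and the loop and display code are shortened for illustration.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)            # set the capture object on the default webcam
while True:                          # "infinite" loop over webcam frames
    ret, frame = cap.read()          # read() stores each image in the frame variable
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)        # face detection needs grayscale
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)   # Haar cascade face detection
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```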
10.4.1 Edge Detection
Boundaries are defined by edges, which makes edge detection a fundamental
problem in image processing. An edge is a sharp contrast between adjacent pixels, a sudden change in
intensity [5]. When edges are detected in an image, data is significantly reduced and
useless information is culled, while the image’s significant structural properties are
preserved. One example is the Canny detector, which is designed as an optimal edge
detector. Ideally, the first criterion is a low error rate, making sure that useful information is
preserved while unwanted data is filtered out. The second criterion is that the detected
edges should lie as close as possible to the true edges of the original image
[5]. The third criterion involves removing multiple responses to a single edge. The edge
detector uses these criteria to smooth out the image first. Next, it highlights regions
with high spatial derivatives by calculating the image gradient.
Based on these regions, the algorithm then suppresses any pixels that are not
at the maximum using non-maximum suppression. Hysteresis thresholding is then
applied along the gradient array to remove streaking along the edges. Figure 10.7 shows
the original image with edge detection techniques applied. Figures 10.8 and 10.9 show
the Sobel enhanced technique.
Figure 10.9 Canny edge detection and Sobel enhanced edge in X, Y direction
through Sobel function.
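A short sketch comparing the Canny and Sobel edge responses illustrated in Figures 10.7-10.9 is given below; the kernel size and thresholds are illustrative assumptions.

```python
import cv2

gray = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)

canny_edges = cv2.Canny(gray, 50, 150)                  # optimal (Canny) edge detector
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)    # gradient in the X direction
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)    # gradient in the Y direction
sobel_xy = cv2.magnitude(sobel_x, sobel_y)              # combined Sobel-enhanced edges
```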
cascade classifier and then each cascade classifier is used to detect both eyes [6].
The full image in this case only needs to be analyzed for eye information. The
boundary box of the eye must first be extracted so that the eye image can then be
grabbed. The detection function returns only the left eye's image data [6]. In
this way, we will be able to determine whether an eye is open or closed by using
our CNN classifier. This process is repeated for determining whether the right
eye is open.
10.4.3 Grayscale Conversion
A few operations are required to set the correct dimensions before the image can
be fed into the model. The cvtColor function with the BGR-to-grayscale attribute
converts the color image into grayscale. The model was trained on
24*24-pixel images [7], so the image is resized to 24*24 pixels. For better
convergence, normalize the data by dividing the pixel values by 255 (all values will
then be within the range of 0–1). To feed the classifier the expected number of dimensions, expand the
data [8]. Use the load_model function to load the model. The model's predict_
classes method will be used to classify each eye
[8]. Having the predicted class value of 1 indicates that the eyes are open, and having
the predicted class value of 0 indicates that the eyes are closed.
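A minimal sketch of this pre-processing and classification step is given below, assuming a trained Keras model saved as "cnn_eye.h5" (a hypothetical file name) and an eye region already cropped from the frame; newer Keras versions replace predict_classes with predict plus argmax, as shown here.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("cnn_eye.h5")                      # load the trained open/closed-eye CNN

def classify_eye(eye_bgr):
    eye = cv2.cvtColor(eye_bgr, cv2.COLOR_BGR2GRAY)   # colour to grayscale
    eye = cv2.resize(eye, (24, 24))                   # model was trained on 24x24 images
    eye = eye / 255.0                                 # normalize pixel values to 0-1
    eye = eye.reshape(1, 24, 24, 1)                   # add batch and channel dimensions
    pred = np.argmax(model.predict(eye), axis=-1)[0]  # 1 = eyes open, 0 = eyes closed
    return pred
```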
10.4.4 Score Calculation
As a result of the score, we can calculate how long the individual has had their eyes
closed. As long as both eyes are closed, the system will constantly increase the score,
while as long as the eyes are open, the score will decrease (Figure 10.11). After the
classification has been made on the screen, Cv2.putText will display the status of the
person in real time. If the score is above 15, for example, then the person’s eyes must
have been closed for a long time; the system will subsequently beep the alarm using
the sound_play function.
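A minimal sketch of this scoring logic is shown below; the threshold of 15 follows the text, while the alarm playback uses the pygame mixer as a stand-in for the sound_play function mentioned in the chapter, and the alarm file name is hypothetical.

```python
import cv2
from pygame import mixer

mixer.init()
alarm = mixer.Sound("alarm.wav")     # hypothetical alarm sound file
score = 0

def update_score(left_eye_closed, right_eye_closed, frame):
    global score
    if left_eye_closed and right_eye_closed:
        score += 1                   # both eyes closed: score keeps increasing
    else:
        score = max(score - 1, 0)    # eyes open: score decreases
    status = "Closed" if left_eye_closed and right_eye_closed else "Open"
    cv2.putText(frame, f"{status}  Score: {score}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    if score > 15:                   # eyes closed for a long time: sound the alarm
        alarm.play()
```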
proper rules. The majority of these crashes occur because the driver is lethargic and
distracted. Regardless of whether the driver or pedestrian is distracted, lane discipline
is crucial. The software has the goal of identifying lane markings [3]. This
software will improve traffic conditions.
The changes in lane boundaries have caused the above-stated problems. In this
chapter, an algorithm is proposed for the detection of lane markings on the road by
using computer vision technology to analyze the video of the road [5] (Figure 10.12).
This technology aims to reduce the frequency of accidents by detecting lane markings
on the road.
To prevent accidents caused by reckless driving on the road, the system can be
installed in cars and taxis. The system will ensure the safety of children in school
buses. The set-up can also monitor the performance of drivers, which in turn allows
the Road Transportation Offices to report and check negligence and lack of attention
by drivers on the road.
10.5.2 Implementation
10.5.2.1 System Design Approach
Figure 10.13 shows the system design approach.
10.6 Computational Results
Image segmentation is done pixel-by-pixel using semantic segmentation. Datasets
used for training were downloaded from the Internet. Segmentation was performed
using 30 classes. PReLU is used as the activation function with dilated convolutions.
The main goal is finding a suitable kernel.
10.6.1 Accuracy Achieved
Here the training curve settles into a steady line, which indicates that the trained model
is efficient at both of the evaluated points (Figure 10.14). An accuracy
check of the system can be carried out by measuring the system efficiency per minute
(Table 10.1). Table 10.2 shows accurate detections per minute in the Accurate column,
while the Approximate and Missing columns (Figure 10.15) show
approximate positives and undetected lanes, respectively.
10.6.2 Assumptions
A sunny day with adequate lighting on the roads is assumed. The lane detection system is
assumed to run fast enough, and memory is not considered a limiting factor. The yellow lane markings
mark the left side of the road, whereas the white lane markings mark the right.
Highway authority standards are assumed to apply to all roadways built.
10.7 Conclusion
The classical OpenCV method has limitations: it is inefficient and cannot accurately
detect roads without clear markings. This method is also incompatible with all environ-
ments. The model can be tuned incrementally by adding additional parameters, such
as blink rate, car condition, and yawning rates. A variety of applications have been
developed to address these issues, including traffic control, traffic monitoring, and
traffic flow.
The modular implementation allows us to update algorithms easily and continue
working on changes to models in the future. Using a pickle file of the model, the
needed areas can be inserted and then easily applied to the product. Consequently,
it would not be necessary to compile the entire code each time. By introducing the
concept of detecting dark roads during the night, we can add value to the project.
When it is daylight, the process identifies and selects colors very well. The
accuracy can be significantly improved by using all of these parameters. Netflix and
other streaming services can use the same model and techniques to detect when users
are asleep and stop their videos accordingly. Anti-sleep applications could also be
developed using this method. Poorly functioning autonomous driving systems have
been implicated in several high-profile accidents, but other accidents have shown
the benefits of this technology. China reported a fatality after a Tesla driver crashed
into a road-cleaning vehicle in January 2016. At the time of the crash, Tesla's
autonomous features were reportedly engaged. In this first death involving an ADS
feature, the Tesla driver was not paying enough attention to the road, according to
the police.
In terms of safety, there are still difficulties evaluating self-driving cars using
publicly available data. Driverless cars are usually tested in cities and states where the
weather is relatively dry and the road system is simple, which facilitates autonomous
driving. California is also the only state in America that requires companies testing
driverless cars to file detailed reports for any accident involving autonomous
vehicles on public roads.
References
1. J. Long, E. Shelhamer, and T. Darrell (2015) “Lane Detection Techniques –A Review.”
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
3431–3440.
2. S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and
P. H. Torr (2015) “A Layered Approach to Robust Lane Detection at Night.” Conference
paper, pp. 1529–1537.
3. P. Shopa, N. Sumetha and P. S. K. Pathra (2014) “Traffic Sign Detection and Recognition
Using OpenCV.” Paper presented at International Conference on Information
Communication and Embedded Systems (ICICES2014).
4. V. Badrinarayanan, A. Handa, and R. Cipolla (2015) “Segnet: A deep Convolutional
Encoder-Decoder Architecture for Robust Semantic Pixelwise Labelling.” arXiv preprint
arXiv:1505.07293.
5. W. Han, Y. Yang, G.-B. Huang, O. Sourina, F. Klanner, and C. Denk (2015) “Driver
Drowsiness Detection Based on Novel Eye Openness Recognition Method and
Unsupervised Feature Learning.” Paper presented at IEEE International Conference on
Systems, Man, and Cybernetics.
6. Z. Li, S.E. Li, R. Li, B. Cheng, and J. Shi (2017) “Online Detection of Driver Fatigue
Using Steering Wheel Angles for Real Driving Conditions.” Sensors, 17(3): 495.
7. A.G. Correa, L. Orosco, and E. Laciar (2014) “Automatic Detection of Drowsiness
in EEG Records Based on Multimodal Analysis.” Medical Engineering and Physics, 36:
244–249.
8. R. Pooneh, R. Tabrizi, and A. Zoroofi (2009) “Drowsiness Detection Based on Brightness
and Numeral Features of Eye Image.” Paper presented at Fifth International Conference
on Intelligent Information Hiding and Multimedia Signal Processing.
Chapter 11
Prediction of Gastric
Cancer from Gene
Expression Dataset Using
Supervised Machine
Learning Models
P. Manikandan, D. Ruban Christoper,
and Luthuful Haq
11.1 Introduction
The geographical incidence of gastric cancer has changed dramatically since the
1940s. Gastric cancer was the most common reason behind cancer death in men.
Recent studies reported that gastric cancer is 2.2 times more likely to be diagnosed
in males compared with females (Rawla and Barsouk, 2019). Mortality rates of gas-
tric cancer in males are high in several countries, such as Central and East Asia and
Latin America. Gastric cancer peaks within the seventh decade of life. Often, a delay
in diagnosing can account for the poor prognosis. Luckily, dedicated analysis into its
pathological process and identification of recent risk factors, treatment and advanced
endoscopic techniques offer aid to earlier detection of gastric cancer. Recognition
that Helicobacter pylori infection causes most stomach ulcers has revolutionized the
approach to gastric cancer these days.
of NLR. Univariate analysis and variations of this parameter were also shown to
be correlated with tumor progression. NLR values should be considered a useful
follow-up parameter (Sahin et al., 2017). The cancer-related mortality and Potential
Years of Life Lost (PYLL) were calculated by age, sex, and district (urban or rural) to
describe the patterns of life lost to cancers. The dataset consists of 255 registries
that were incorporated into the registration office in 2013, covering 226,494,490
people from urban areas. Three categories of people (those in urban areas, males, and
people over 60 years) were found to suffer more serious cancer
deaths and life lost (Yan et al., 2019).
Worldwide cancer incidence and mortality were estimated by the International
Agency for Research on Cancer. Some 19.3 million new cancer
cases and almost 10.0 million cancer deaths occurred in 2020. The global cancer
burden is expected to be 28.4 million cases in 2040, a 47 percent rise from 2020
(Sung et al., 2021). Clinicopathological factors predictive of lymph node metastasis
in patients with poorly differentiated early gastric cancer were identified. Some
1,005 patients were included in the analysis. A logistic regression model
revealed that lymph node metastasis was significantly associated
with sex, tumor size, depth of tumor invasion, and lymphatic involvement (Lee
et al., 2012). The potential relationship between the severity of inflammation
and prognosis in cancer patients was investigated. Patients’ records of 220 gastric
surgery patients, from January 2002 to December 2006, were analyzed with the
univariate analysis (Lee et al., 2013). Medical check-up data of 25,942 participants
from 2006 to 2017 at a single facility in Japan was analyzed using machine learning
techniques. The results showed that XGBoost outperformed logistic regression and
achieved the highest area under the curve value of 0.899 (Taninaga et al., 2019).
A microRNA panel was identified in the serum of patients to predict gastric cancer
non-invasively with high accuracy and sensitivity. The Support Vector Machine
Classifier outperformed the other models with the highest accuracy of 95 percent
(Huang et al., 2018).
The most important aspects of gastric cancers, which include epidemiology, risk
factors, classification, diagnosis, prevention, and treatment, were briefly reviewed
(Sitarz et al., 2018). Supervised learning methods were used to distinguish the
three gastric cancer subtypes. Leave-one-out cross-validation error was 0.14,
suggesting that more than 85 percent of samples were classified correctly (Shah
et al., 2011). New classification of gastric cancers and the up-to-date guidance in
the application of molecular testing were reviewed. Recent advances in molecular
medicine have not only shed light on the carcinogenesis of gastric cancer, but
also offered novel approaches regarding prevention, diagnosis, and therapeutic
intervention (Hu et al., 2012). The current applications of convolutional neural
networks (CNN) in identification of gastric cancer were reviewed. The transfer
learning approaches in CNN, such as AlexNet, ResNet, VGG, Inception and
DenseNet, were used to identify gastric cancer. A total of 27 articles were retrieved
for the identification of gastric cancer using medical images (Zhao et al., 2022).
Deep convolutional neural networks (CNNs) are applied to automatically clas-
sify Magnifying-Narrow Band Imaging (M-NBI) images into two groups: normal
gastric images and EGC images. VGG16, InceptionV3 and InceptionResNetV2
are selected to classify the M-NBI image. Experimental results show that transfer
learning of deep CNN features performs better than traditional handcraft methods
(Liu et al., 2018). Hence, it is necessary to develop an accurate and rapid screening
method for the diagnosis of gastric cancer. Recent work suggests that Artificial
Intelligence and Big Data technologies are effective in improving screening,
prediction and diagnosis in the medical field (Hinton et al., 2017; Nishio et al.,
2018). Based on the existing literature, this research work also focuses on clas-
sifying the gastric cancer dataset using supervised machine learning models. The
rest of this chapter is organized as follows. Section 11.2 describes the Methodology
and Section 11.3 shows the experimental results on gastric cancer dataset and
discusses the outcomes. Finally, the conclusion and future enhancement are given
in Section 11.4.
11.2 Methodology
This research work compares four supervised machine learning algorithms, namely
Logistic Regression (LR), Decision Tree (DT), Naïve Bayes (NB), and K-Nearest
Neighbor (KNN), to identify the best method for classifying the gastric
cancer gene expression dataset. The overall framework for this research work is
shown in Figure 11.1.
11.2.1 Dataset Description
The gastric cancer gene expression dataset was collected from Gene Expression
Omnibus with the GEO accession code (GSE2685) which consists of 4,524
attributes that include the records of 30 patients (Hippo et al., 2002). The Python
programming language is used to analyze the performance of the supervised machine
learning algorithms.
11.2.2 Classification Techniques
In order to predict the outcome of the dataset, the algorithm processes a training
set containing a set of attributes and the respective outcome, usually called target or
prediction attribute. In this work, the supervised machine learning algorithms are
compared to determine which is most suitable for classifying the gastric cancer gene
expression dataset. The supervised machine learning algorithms namely Logistic
Regression (LR), Decision Tree (DT), Naïve Bayes (NB) and K-Nearest Neighbor
(KNN) are used to find out which fits effectively for the gastric cancer gene expres-
sion dataset.
Sigmoid Function: $S(x) = \dfrac{1}{1 + e^{-x}}$   (11.1)
expression dataset. The Gini Index (Eq. 11.3) computes the degree of probability of
a specific variable that has been wrongly classified. It works on categorical variables
and gives the outcome of success or failure:
$\text{Entropy} = -\sum_{i=1}^{n} p(C_i)\,\log_2 p(C_i)$   (11.2)

$\text{Gini} = 1 - \sum_{i=1}^{n} p^2(C_i)$   (11.3)
where:
P(A|B) is posterior probability
P(B|A) is likelihood probability
P(A) is prior probability
P(B) is marginal probability
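A small sketch of the impurity measures in Eqs (11.2) and (11.3), which a decision tree uses to choose split attributes, is given below; the class-probability input is an illustrative example, not data from the gastric cancer dataset.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                          # ignore empty classes (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))        # Eq. (11.2)

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)           # Eq. (11.3)

print(entropy([0.5, 0.3, 0.2]), gini([0.5, 0.3, 0.2]))
```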
Of the 4,524 features present in the dataset, “class” is the dependent variable
with three classes “Normal,” “Diffuse,” and “Intestinal.” Using the countplot from
the seaborn library, the count of the three classes present in the dataset was plotted
(Figure 11.2).
To find the correlation of 4,522 independent variables in the dataset, the Pearson’s r
function from the scipy.stats module is used, after converting the independent variables
to an integer datatype. A loop is used with a range of 0–4521 to find the correlation
between the features and dependent variable. Since the dependent variable is a
categorical variable, the label encoder is used to convert the values to numeric. Since
the number of features is huge, many of the features do not provide much meaning
relevant to the study. Hence, the Feature Selection is employed using Analysis
of Variance (ANOVA) to select the top 50 features which will be used for modeling
(Figure 11.3). The top 50 features selected by the ANOVA method are passed as an
independent feature to the four supervised machine learning classification models
such as LR, DT, NB, and KNN. The performance metrics, namely Accuracy,
Figure 11.4 Comparison of accuracy and performance measures for the gastric
cancer gene expression dataset.
Precision, Recall and F1-Score, are used to evaluate the performance of the machine
learning models. From the experimental results (Figure 11.4), it is observed that the
Decision Tree (DT) algorithm performs better than the other supervised machine
learning algorithms. Out of 30 patients, 5 were identified as having diffuse gastric
cancer, 17 as having intestinal gastric cancer, and 8 as normal, without gastric cancer.
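A minimal sketch of this workflow is shown below, assuming the GSE2685 expression matrix has been loaded into a pandas DataFrame `df` with a categorical "class" column; the train-test split ratio is an illustrative assumption.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

X = df.drop(columns=["class"])
y = LabelEncoder().fit_transform(df["class"])              # Normal / Diffuse / Intestinal -> 0/1/2

X_top50 = SelectKBest(f_classif, k=50).fit_transform(X, y) # ANOVA F-test feature selection
X_tr, X_te, y_tr, y_te = train_test_split(X_top50, y, test_size=0.3, random_state=42)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(X_te)))  # accuracy, precision, recall, F1
```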
11.4 Conclusion
In this research work, the performance of four supervised machine learning
algorithms, namely Logistic Regression, Decision Tree, Naïve Bayes and K-Nearest
Neighbor, were analyzed. The gastric cancer gene expression dataset was used to
calculate the performance by using the train-test split method. And, finally, this
research has analyzed the algorithms using the Accuracy and Performance Metrics.
From the results, it is observed that the DT algorithm performs better than the
other algorithms. Out of 30 patients, 5 were identified as having diffuse gastric
cancer, 17 as having intestinal gastric cancer, and 8 as normal, without gastric cancer.
In future, the DT algorithm can be experimented with on other
datasets. The DT algorithm could also be modified to obtain more
effective results. Also, the classification algorithms can be analyzed using other
parameters such as the cross-validation split.
References
Brindha, Senthil Kumar, et al. (2021) “Logistic Regression for Gastric Cancer Classification
Using Epidemiological Risk Factors in Cases and Controls.” Science & Technology
Journal, 9: 154–160.
Danish, Jamil, Sellappan, Palaniappan, Asiah, Lokman, et al. (2021) “Diagnosis of Gastric
Cancer Using Machine Learning Techniques in the Healthcare Sector: A Survey.”
Informatica. 45(7): 147–166.
Hinton, D. J. et al. (2017) “Metabolomics Biomarkers to Predict Acamprosate Treatment
Response in Alcohol-Dependent Subjects.” Scientific Reports. 7: 2496. PMID: 28566752
Hippo, Y., Taniguchi, H., Tsutsumi, S., Machida, N. et al. (2002) “Global Gene Expression
Analysis of Gastric Cancer by Oligonucleotide Microarrays.” Cancer Research, 62(1):
233–240.
Hu, B., El Hajj, N., Sittler, S., Lammert, N., Barnes, R., and Meloni-Ehrig, A. (2012)
“Gastric Cancer: Classification, Histology and Application of Molecular Pathology.”
Journal of Gastrointestinal Oncology, 3(3): 251–261.
Huang, Y., Zhu, J., Li, W., Zhang, Z., Xiong, P., Wang, H., and Zhang, J. (2018) “Serum
MicroRNA Panel Excavated by Machine Learning as a Potential Biomarker for the
Detection of Gastric Cancer.” Oncology Reports, 39(3): 1338–1346.
Jun, J.K., Choi, K.S., Lee, H.Y., Suh, M., Park, B., Song, S.H., Jung, K.W., Lee, C.W.,
Choi, I.J., Park, E.C., and Lee, D. (2017) “Effectiveness of the Korean National Cancer
Screening Program in Reducing Gastric Cancer Mortality.” Gastroenterology, 7152(6):
1319–1328.e7.
Lee, D.Y., Hong, S.W., Chang, Y.G., Lee, W.Y., and Lee, B. (2013) “Clinical Significance
of Preoperative Inflammatory Parameters in Gastric Cancer Patients.” Journal of Gastric
Cancer, 13(2): 111–116.
Lee, J.H., Choi, M.G., Min, B.H., Noh, J.H., Sohn, T.S., Bae, J.M., and Kim, S. (2012)
“Predictive Factors for Lymph Node Metastasis in Patients with Poorly Differentiated
Early Gastric Cancer.” British Journal of Surgery, 99(12): 1688–1692.
Leung, W.K., Cheung, K.S., Li, B., Law, S.Y.K., and Lui, T.K.L. (2021) “Applications of
Machine Learning Models in the Prediction of Gastric Cancer Risk in Patients After
Helicobacter Pylori Eradication.” Alimentary Pharmacology and Therapeutics, 53(8):
864–872.
Li, C., Zhang, S., Zhang, H., Pang, L., Lam, K., Hui, C., Zhang, S. (2012) “Using the K-
Nearest Neighbor Algorithm for the Classification of Lymph Node Metastasis in Gastric
Cancer.” Computational and Mathematical Methods in Medicine. 2012: 876545.
Liu, X., Wang, C., Hu, Y., Zeng Z., Bai J., and Liao G. (2018) “Transfer Learning with
Convolutional Neural Network for Early Gastric Cancer Classification on Magnifying
Narrow-Band Imaging Images.” Paper presented at 2018 25th IEEE International
Conference on Image Processing (ICIP), pp. 1388–1392.
Mortezagholi, A., Khosravizadeh, O., Menhaj, M.B., Shafigh, Y., and Kalhor, R. (2019)
“Make Intelligent Gastric Cancer Diagnosis Error in Qazvin’s Medical Centers: Using
Data Mining Method.” Asian Pacific Journal of Cancer Prevention, 20(9): 2607–2610.
Nishio, M. et al. (2018) “Computer-Aided Diagnosis of Lung Nodule Using Gradient Tree
Boosting and Bayesian Optimization.” PLoS One 13, e0195875. PMID: 29672639.
Rawla, P., and Barsouk, A. (2019) “Epidemiology of Gastric Cancer: Global Trends, Risk
Factors and Prevention.” Prz Gastroenterology, 14(1): 26–38.
Sahin, A.G., Aydin, C., Unver, M., and Pehlivanoglu, K. (2017) “Predictive Value of
Preoperative Neutrophil Lymphocyte Ratio in Determining the Stage of Gastric Tumor.”
Medical Science Monitor, 23: 1973–1979.
Shah, M.A., Khanin, R., Tang, L., Janjigian, Y.Y., Klimstra, D.S., Gerdes, H., and Kelsen,
D.P. (2011) “Molecular Classification of Gastric Cancer: A New Paradigm.” Clinical
Cancer Research, 17(9): 2693–2701.
Sitarz, R., Skierucha, M., Mielko, J., and Offerhaus, G.J.A., Maciejewski, R., and Polkowski,
W.P. (2018) “Gastric Cancer: Epidemiology, Prevention, Classification, and Treatment.”
Cancer Management and Research, 10: 239–248
Su, Y, Shen, J, Qian, H, Ma, H, Ji, J, Ma, H, Ma, L, … Shou, C. (2007) “Diagnosis of Gastric
Cancer Using Decision Tree Classification of Mass Spectral Data.” Cancer Science, 98(1):
37–43.
Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., and Bray,
F. (2021) “Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and
Mortality Worldwide for 36 Cancers in 185 Countries.” CA: A Cancer Journal for
Clinicians, 71(3): 209–249.
Taninaga, J., Nishiyama, Y., Fujibayashi, K. et al. (2019) “Prediction of Future Gastric
Cancer Risk Using a Machine Learning Algorithm and Comprehensive Medical Check-
Up Data: A Case-Control Study.” Scientific Reports, 9: 12384.
Yan, Y, Chen, Y, Jia, H, Liu, J, Ding, Y, Wang, H, Hu, Y, … Li, S. (2019) “Patterns of Life Lost
to Cancers with High Risk of Death in China.” International Journal of Environmental
Research and Public Health, 16(12): 2175.
Yasar, A., Saritas, I., Korkmaz H. (2019) “Computer-Aided Diagnosis System for Detection
of Stomach Cancer with Image Processing Techniques.” Journal of Medical Systems,
43(4): 99.
Zhao, Y., Hu, B., Wang, Y., Yin, X., Jiang, Y., and Zhu, X. (2022) “Identification of Gastric
Cancer with Convolutional Neural Networks: A Systematic Review.” Multimedia Tools
and Applications, 81(8): 11717–11736.
Zhou, B., Zhou, Z., Chen, Y., Deng, H., Cai, Y., Rao, X., Yin, Y., and Rong, L. (2020)
“Plasma Proteomics-Based Identification of Novel Biomarkers in Early Gastric Cancer.”
Clinical Biochemistry, 76: 5–10.
Zhu, S.L., Dong, J., Zhang, C., Huang, Y.B., and Pan, W. (2020) “Application of Machine
Learning in the Diagnosis of Gastric Cancer Based on Noninvasive Characteristics.”
PLoS One, 5(12): e0244869.
Chapter 12
Sewer Pipe Defect Detection in CCTV Images
12.1 Introduction
The sewer pipe systems are a component of civil infrastructure that is designed to
collect wastewater. In the United States, 800,000 miles of public and 500,000 miles
of private lateral sewers are connected to public sewer pipes. Most of the sewer
systems are between 30 and 100 years old and suffer pipe blockage, which is mainly
caused by grease and debris. Sewer defects are caused by root intrusion, line breaks,
and unequal amounts of inflow and infiltration. For example, cracks can be divided
into longitudinal cracks, vertical cracks and complex cracks consisting of both lon-
gitudinal and vertical ones. Pipe deposits are of different types, such as attached
deposits (attached to the pipe wall) and settled deposits (settled on the pipe floor).
As for tree root intrusion, this can belong to the mass root (high density or large
occupying area) or minor root (scattered and with low density). All these defects
such as cracks, deposits, tree root intrusions, as well as water infiltrations, impair
the performance of the sewer pipe. Currently, visual inspection techniques such as
closed-circuit television (CCTV) are commonly used for underground sewer pipe
inspection.
12.2 Related Work
One of the challenging problems in today's world is the automatic detection of sewer
pipe defects. The sewer pipe system is designed to collect sewage water and groundwater or
stormwater [1], which are monitored by CCTV videos in some municipalities [2,
3]. It is expected that approximately 19,500 sewer pipe systems [4] can handle the
flow of 50 billion gallons of raw sewage but most of the sewer systems have become
obsolete, as they are 30–100 years old [5]. There are various sewer pipe defects such
as cracks, roots, deposits, and infiltrations, etc. [6]. For example, cracks [7, 8] are
divided into longitudinal cracks (same orientation of the pipeline), vertical cracks
(orientation is vertical to the pipeline) as well as complex cracks consisting of both
longitudinal and vertical ones [9].
Deposits are of different types, such as attached deposits (attached to the pipe
wall) and settled deposits (settled on the pipe ground) [10]. As for the tree root
intrusion, this can be a mass root intrusion (high density or large occupying area)
or a minor root (scattered and with low density). In the visual inspection technique
by CCTV video images, a large number of inspection images are mostly examined
manually to identify defect types and locations, which is time-consuming, costly,
labor-intensive, and inaccurate. Computer vision is performed by a computer or
machine that can recognize digital images or videos [11]. The computer vision
techniques [12, 13] are Haar detector [14], SIFT [15], HOG [16] using larger
HOG filters [17] and the deformable part-based model (DPM) [18]. However, these
methods require the complex manual design of the feature extractor and training
process, which are inefficient. The digital image processing techniques are segmenta-
tion [19], histogram equalization, and transformation.
Deep learning-based approaches have been developed for object detection [20]
to address the limitation of conventional methods. The R-CNN method uses region
proposals generated through an external method called the selective search for the
input image. Each warped region proposal image is forwarded to a CNN model
to compute features, which are fed into a support vector machine (SVM) [4] classifier to cal-
culate the classification scores. Bounding box regression is then conducted for the
classified image so that the location of each object can be predicted. One limitation
of R-CNN is that the multi-staged training process is time-consuming and requires
large computation cost. In addition, the detection speed is quite slow for each image
as the convolution, classification, and regression need to be implemented for each
region.
In fast R-CNN, a CNN is applied to the whole input image to produce the feature map,
onto which the region proposals generated by selective search are projected [21].
Through a Region of Interest (ROI) pooling layer, the generated region proposals
are converted into fixed-length vectors, which are fed into the fully connected layers
for both classification and bounding box regression. Compared with RCNN, fast R-
CNN [22] performs training using multi-task loss in a single-stage manner, which
greatly reduces the training time. The detection is faster because the convolution
process is only conducted once for the original image instead of for all the region
proposals, and both detection accuracy and speed are improved. Cheng et al. adopted
faster R-CNN, a deep learning technique, instead of conventional computer vision
techniques to identify cracks and other defects [6]. The detection model is
trained using 3,000 images collected from CCTV inspection videos of sewer pipes.
After training, the model is evaluated in terms of detection accuracy and com-
putation cost using mean average precision (MAP), missing rate, detection speed,
and training time. Specifically, the increase of dataset size and convolutional layers
can improve the model’s accuracy. The experiment results demonstrate that dataset
size, initialization network type, training mode and network hyper- parameters
model performance are improved. The adjustment of hyper-parameters such as filter
dimensions or stride values contributes to higher detection accuracy, achieving MAP
of 83 percent and a faster detection speed for sewer pipes. The approach is very efficient
for the automated detection of sewer pipe defects in CCTV.
A robust methodology for detecting faults in sewer pipes has been developed.
Faults of all defined types are detected in still images (81.5 percent accuracy). The
methodology can be effectively applied to continuous footage (80 percent accuracy).
Random Forests are identified as the most appropriate machine learning classifier.
Smoothing greatly reduces the False Positive Rate [22].
This method presents the development of an automated tool to detect some
defects such as cracks, deformation, settled deposits, and joint displacement in sewer
pipelines. The automated approach is dependent upon using image-processing
techniques and several mathematical formulas to analyze output data from CCTV
camera images [23]. Automated CCTV interpretation could improve the speed and
accuracy of inspections. The defects in CCTV inspection use CNNs for classifying
sewer defects. The CNNs were trained and tested on a set of 12,000 images collected
from over 200 pipelines [24].
12.3 Proposed Methodologies
The deep learning technique for object detection using faster R-CNN with the com-
binations of different CNN architecture networks and image processing techniques
is used to detect defects.
12.3.1.1 Convolutional Layer
Every neuron takes inputs from a rectangular n*n section of the previous layer. The
rectangular section is called a local receptive field. The layer is combined with bias
and weights to produce a feature map. A parameter can be viewed as a trainable
filter or kernel F; the convolutional process acts as an image
convolution, taking its input from the previous layer (Eq. 12.1). Sometimes it may
be called a trainable filter from the input layer to the hidden layer, based on the fea-
ture map with shared weights and bias.
$x_{ij}^{(l)} = \sigma\!\left(b + \sum_{r=0}^{n}\sum_{c=0}^{n} W_{r,c}\, x_{(i+r)(j+c)}^{(l-1)}\right)$   (12.1)
12.3.1.2 Feature Map
The convolved feature matrix is formed by sliding the filter over the image and
computing the dot product. Every layer feature map output acts as input to the
next layer.
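A small sketch of the per-neuron computation in Eq. (12.1) and the sliding-window feature map described above is given below; the 3x3 filter, bias, and input are illustrative values, not parameters from the trained detection model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feature_map(x, W, b):
    n = W.shape[0]                                    # n x n local receptive field
    out_h, out_w = x.shape[0] - n + 1, x.shape[1] - n + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):                            # slide the filter over the image
        for j in range(out_w):
            fmap[i, j] = sigmoid(b + np.sum(W * x[i:i + n, j:j + n]))
    return fmap                                       # output acts as input to the next layer

x = np.random.rand(8, 8)                # toy input from the previous layer
W = np.random.rand(3, 3)                # shared weights (trainable filter/kernel)
print(feature_map(x, W, b=0.1).shape)   # -> (6, 6)
```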
$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_i p_i^{*}\, L_{reg}(t_i, t_i^{*})$   (12.2)

where $i$ is the index of an anchor, $p_i$ represents the predicted probability
of anchor $i$ being one type of sewer pipe defect, and $p_i^{*}$ represents the ground-truth
label of anchor $i$: $p_i^{*}$ equals 1 if the anchor is a positive sample and 0 if it
is negative. $t_i$ is a vector indicating the coordinates of the predicted bounding box,
while $t_i^{*}$ represents the ground-truth bounding box related to a positive anchor. $N_{cls}$
and $N_{reg}$ are two normalization factors for the two components and are weighted by
a balancing parameter $\lambda$. With reference to a related study [22], the value of $N_{cls}$ is
set to be 256 and $N_{reg}$ equals 2,400, which is an approximate value of the number of
anchor locations in the model. The balancing parameter $\lambda$ is set to 10, so that both the
cls and reg terms are roughly equally weighted.
Figure 12.1 Architecture model for developed faster R-CNN for sewer pipe
defect detection.
12.3.2 System Architecture
This proposed work detects sewer pipe defects using models trained on different networks,
with some background effects applied. Defects are detected and feature
matching is carried out. The flow diagram for pipe defect detection is shown in
Figure 12.3. In this method, sewer pipe images have been collected from CCTV
inspection videos.
are fed into the fast R-CNN detector for further refining, so that more accurate
classification and bounding box localization are obtained. Bounding box regression
predicts the location of the proposed boxes, matching each anchor box with the best
nearby ground-truth box. The box coordinates relative to the anchor are parameterized using Eqs
(12.3)–(12.6).
$t_x = \frac{x - x_a}{w_a}, \quad t_y = \frac{y - y_a}{h_a}$   (12.3)

$t_w = \log\frac{w}{w_a}, \quad t_h = \log\frac{h}{h_a}$   (12.4)

$t_x^{*} = \frac{x^{*} - x_a}{w_a}, \quad t_y^{*} = \frac{y^{*} - y_a}{h_a}$   (12.5)

$t_w^{*} = \log\frac{w^{*}}{w_a}, \quad t_h^{*} = \log\frac{h^{*}}{h_a}$   (12.6)
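A minimal sketch of the bounding-box parameterization in Eqs (12.3)-(12.6) is given below, using (center x, center y, width, height) boxes; the sample boxes are illustrative values only.

```python
import numpy as np

def regression_targets(box, anchor):
    x, y, w, h = box                       # predicted (or ground-truth) box
    xa, ya, wa, ha = anchor                # anchor box
    tx = (x - xa) / wa                     # Eq. (12.3) / (12.5)
    ty = (y - ya) / ha
    tw = np.log(w / wa)                    # Eq. (12.4) / (12.6)
    th = np.log(h / ha)
    return tx, ty, tw, th

# Targets t (for a predicted box) and t* (for a ground-truth box) share the same form:
print(regression_targets((120, 80, 60, 40), anchor=(110, 90, 64, 48)))
```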
$V = \frac{1}{3}(R + G + B)$   (12.7)

$S = 1 - \frac{3}{R + G + B}\,\min(R, G, B)$   (12.8)

$H = \begin{cases} \theta & \text{if } B \le G \\ 360^{\circ} - \theta & \text{if } B > G \end{cases}$   (12.9)
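A short sketch of this color conversion is given below; Eqs (12.7)-(12.9) follow an HSI-style definition, while OpenCV's built-in HSV conversion uses a slightly different value channel, so both the library call and a direct computation of Eqs (12.7)-(12.8) are shown. The frame file name is hypothetical.

```python
import cv2
import numpy as np

bgr = cv2.imread("sewer_frame.jpg")              # hypothetical CCTV frame
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)       # library conversion to H, S, V channels

# Direct computation from the equations above, on R, G, B scaled to 0-1
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(float) / 255.0
V = rgb.mean(axis=2)                                           # Eq. (12.7)
S = 1.0 - 3.0 * rgb.min(axis=2) / (rgb.sum(axis=2) + 1e-6)     # Eq. (12.8)
```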
12.3.3 Crack Detection
Crack detection is used to identify cracks in an image and draw bounding boxes
around them. This is an important computer vision problem with numerous
applications. Figure 12.6 shows the detection applied to cracks, roots, deposits, and
infiltration.
12.4 Experimental Results
This proposed technique has been applied to detect defects, such as cracks, roots,
deposits, infiltration, etc. This work uses 3,000 defect instances from various video
images for the experiments. For defect detection, different sets of images are taken
in different scenes and under various lighting conditions, and the defects are marked
with bounding boxes using faster R-CNN. Performance of the method is calculated
using average precision (AP) and mean average precision (MAP) in overall defects
and achieves better results. The proposed method applies some digital image pro-
cessing techniques to produce better results. The dataset used in our experiments is a
collection of sewer pipe inspection videos (CCTV footage), with the videos converted
into frames. The frames number between 30,000 and 100,000 images, depending upon
the CCTV videos.
12.4.1 Performance Analysis
Defects in the sewer pipes, such as cracks, roots, deposits, and infiltration, are detected
with the ZF, ResNet50, and VGG16 networks. Figure 12.7 (a) shows the average result
for the ZF network, Figure 12.7 (b) shows that the ResNet50 network gives better
performance than ZF net, and Figure 12.7 (c) shows that VGG16 gives better
performance than both the ZF and ResNet50 networks.
Our experimental results show that the proposed model based on the VGG network
performs best, with a detection speed of 0.99 secs and a training time of
8.573 secs, better than the ZF and ResNet50 networks. After training,
this model is evaluated in terms of detection accuracy achieving MAP of 90 percent
using the VGG network in the sewer pipe system. The defect detection MAP and the
AP value are mentioned in Table 12.1.
12.4.2 Measure of Effectiveness
To calculate the AP and MAP for all the classes, Eqs. (12.10) and (12.11) are used:
$AP = \frac{1}{11} \sum_{r \in \{0,\,0.1,\,0.2,\,\ldots,\,1\}} \; \max_{\tilde{r}:\,\tilde{r} \ge r} p(\tilde{r})$   (12.10)

$MAP = \frac{1}{N_{cls}} \sum_i AP_i$   (12.11)
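A small sketch of the 11-point interpolated average precision in Eq. (12.10) and the mean over classes in Eq. (12.11) is given below; the precision and recall arrays are illustrative inputs, not results from the sewer pipe experiments.

```python
import numpy as np

def average_precision(recall, precision):
    recall, precision = np.asarray(recall), np.asarray(precision)
    ap = 0.0
    for r in np.arange(0.0, 1.1, 0.1):            # r in {0, 0.1, ..., 1}
        mask = recall >= r
        p_max = precision[mask].max() if mask.any() else 0.0   # max p(r~) with r~ >= r
        ap += p_max / 11.0
    return ap                                      # Eq. (12.10)

def mean_average_precision(ap_per_class):
    return float(np.mean(ap_per_class))            # Eq. (12.11): MAP over defect classes

print(average_precision([0.1, 0.4, 0.8, 1.0], [1.0, 0.9, 0.7, 0.5]))
```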
Figure 12.7 Chart for (a) ZF network, (b) RESNET50 network and (c) VGG 16
networks.
Figure 12.8 Defect images trained with ZF network. (a) sample oval crack
outcome, (b) longitudinal crack, settled and attached deposit (c) vertical crack.
Figure 12.9 Defect images trained with Resnet50 network (a) attached deposit,
(b) mass root (c) vertical crack.
Figure 12.10 Defect images trained with VGG16 network (a) longitudinal crack
(b) infiltration.
defected image are shown in Figure 12.13. Finally, segmented defects are created
using bounding boxes (Figure 12.14).
12.5 Conclusion
This work suggests solutions for sewer pipe defect detection. Many pipe systems are old
so various defects such as roots, cracks, deposits, and infiltration are found, which
can lead to serious consequences. CCTV is used for sewer pipe inspection by cap-
turing inspection videos. Manual interpretation of the CCTV inspection videos is
time-consuming, and the assessment results are affected by the inspector’s expertise
and experience. Here defects are found quickly using faster R-CNN with VGG
techniques, and some digital image processing techniques were applied, which locate
defects accurately. The performance of the network is measured by the MAP value.
The proposed method yields a better average MAP of 90 percent,
which is higher than the 36 percent and 7 percent achieved by the ZF
and Resnet50 respectively. Future work may focus on defect detection in order to
improve the accuracy rate as well as detection speed further.
References
1. Rinker Materials (2009) “Storm Sewer Pipeline Laser Profiling.” Available at: www.rinkerpipe.com (accessed November 19, 2017).
2. ASCE “Infrastructure Report Card.” Available at: www.asce.org/infrastructure/.
3. J. Mattsson, A. Hedström, M. Viklander, and G.T. Blecken (2014) “Fat, Oil, and Grease
Accumulation in Sewer Systems: Comprehensive Survey of Experiences of Scandinavian
Municipalities.” Journal of Environmental Engineering, 140(3). http://dx.doi.org/
10.1061/(ASCE)EE.1943-7870.0000813.
4. C. Cortes and V. Vapnik, (1995) “Support-Vector Networks.” Machine Learning 20(3):
273–297. https://doi.org/10.1007/bf00994018.
5. EPA (n.d.) “Why Control Sanitary Sewer Overflows?” Available at: www3.epa.gov/
npdes/pubs/ sso_casestudy_control.pdf.
6. Jack C.P. Cheng and Mingzhu Wang (2018) “Automated Detection of Sewer Pipe Defects
in Closed-Circuit Television Images Using Deep Learning Techniques.” Automation in
Construction, 95: 155–171.
7. H. Zakeri, F.M. Nejad, and A. Fahimifar (2016) “Image Based Techniques for Crack
Detection, Classification and Quantification in Asphalt Pavement: A Review.” Archives
of Computational Methods in Engineering, 24: 935–977. https://doi.org/10.1007/s11
831-016-9194-z.
8. G. Li, S. He, Y. Ju, and K. Du (2014) “Long-Distance Precision Inspection Method for
Bridge Cracks with Image Processing.” Automation in Construction, 41: 83–95. https://
doi.org/10.1016/j.autcon.2013.10.021.
9. A. Hawari, M. Alamin, F. Alkadour, M. Elmasry, and T. Zayed (2018) “Automated
Defect Detection Tool for Closed Circuit Television (CCTV) Inspected Sewer
Pipelines.” Automation in Construction, 89: 99–109. https://doi.org/10.1016/j.aut
con.2018.01.004.
10. R.J. King Wong Allen (2009) Hong Kong Conduit Condition Evaluation Code, 4th edn.
Hong Kong: Hong Kong Institute of Utility Specialists.
11. D.H. Ballard and C.M. Brown (1982) Computer Vision. Englewood Cliffs, NJ: Prentice
Hall.
12. R. Girshick (2015) “Fast R-CNN.” Paper presented at IEEE International Conference
on Computer Vision (ICCV), IEEE, Santiago, Chile, pp. 1440–1448, https://doi.org/
10.1109/.
13. C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, and P. Fieguth (2015) “A Review on
Computer Vision Based Defect Detection and Condition Assessment of Concrete and
Chapter 13
Learning and Reasoning Using Artificial Intelligence
13.1 Introduction
Artificial Intelligence (AI) is based on the workings of the human brain, in order
to build smart machines. AI is capable of performing work intelligently without
human intervention, and is becoming capable of acting and thinking like a human.
AI involves natural language processing, speech recognition, expert systems and
machine vision to behave like a human. Human intelligence involves the mental
capability which includes planning, the skill to learn, to solve difficulties, to under-
stand complex ideas, to be quick at learning, to be able to reason abstractly and to
learn through practice. AI is not learning from a book or academic skills. Rather, it
replicates a larger and deeper understanding of its surroundings.
think, while Thorndike emphasized learning and the ability to offer good responses
to questions. In general, the psychologists agreed that adapting to the environment
is the key understanding of intelligence. This adaption has a number of intellectual
processes such as perception, learning, memory, reasoning and solving problems.
Thus, intelligence is not a single ability but the combination of many abilities. AI
uses computers to simulate human intelligence.
13.1.2.1 Type 1 Category
AI includes the ability to think like humans, solve puzzles, to reason, plan, make
judgments, learn and communicate on its own.
■ Narrow AI: Narrow AI is one of the AI types which performs a task based on
intelligence. The most commonly used AI is narrow AI. It is trained only for a
specific task, so it cannot perform beyond the trained task. Hence it is called
weak AI. It fails in unpredictable ways when the task is beyond the limits of its training.
■ Super AI: Super AI is the level of intelligence which could exceed human intelli-
gence and could perform a task better than a human, along with possessing cognitive
properties. It is the result of General AI development. Super AI is still a theoretical
concept of AI. Development of Super AI is still a challenging task in the real world.
The second main type of AI is the Type 2 category which is divided into reactive
machines, limited machines, theory of mind machines and self-aware machines.
13.1.2.2 Type 2 Category
■ Reactive machines: Reactive machines (RM) are the basic types of AI. This
AI system will not store memories for future use. This system focuses on the
current scenario and performs the best action. Examples of RMs are the
Deep Blue system by IBM and AlphaGo by Google.
■ Limited machines: Limited machines (LM) store the experience and the data
for a short period of time. An example of an LM is the self-driving car. These
cars store the recent driven speed and the navigation information.
■ Theory of mind: Theory of mind AI should recognize human emotions, beliefs
and people, and communicate like humans. These kinds of systems are still
under development.
■ Self-awareness: Self-awareness is the future of AI. This machine will be super-
intelligent and have sentiments, consciousness and even self-awareness. These
machines are smarter than humans.
■ Artificial Narrow Intelligence (ANI) is the most used AI. ANI performs one or
two tasks based on the training data and its learning experience through the
previous tasks. Currently, the core development of AI used for decision-making
can be categorized under ANI. It is also called weak AI as it completes the task
under a predefined and limited set of constraints.
■ Artificial General Intelligence (AGI) is related to the theory of mind function-
ality. The AGI is still in the development stage. The main target of AGI is to
create independent connections through various domains.
■ Artificial Super Intelligence (ASI) is the future development of AI achievement.
AI will improve in memory performance and better decision-making. It will
provide an outstanding amount of memory, a fast processing capacity and
enhanced way of intelligent decision-making.
13.1.4 Types of Intelligence
The intelligence system is a branch of AI which describes the calculation ability,
reasoning, perception, the ability to store and retrieve the data through memory,
complex problem solving and adaption.
■ Linguistic intelligence: This is the skill to speak, recognize sounds and the
mechanism of phonology such as speech and sound, grammar syntax and
meaning. Narrators and Orators are examples of linguistic intelligence.
■ Musical intelligence: The skill to create, communicate and also understand the
meaning of sound, pitch and rhythm. Musicians, singers and composers are
examples of musical intelligence.
■ Logical mathematical intelligence: This is the skill to use and understand the
relationship in the absence of the action or objects and understand complex
ideas. Mathematicians and scientists are examples of logical mathematical
intelligence.
■ Spatial intelligence: This is the skill to perceive visual or spatial informa-
tion, recreate visual images without reference to the objects, use 3D image
construction and to rotate and move them. Astronauts, map readers and
physicians are examples of spatial intelligence.
■ Bodily-kinesthetic intelligence: This is the skill to use the whole body or parts
of the body to solve problems or fashion products, along with the ability to
control motor skills and manipulate objects. Sports players and dancers are examples of bodily-
kinesthetic intelligence.
■ Intra-personal intelligence: This is the skill to distinguish among one’s own
feelings, purposes and inspirations. Gautam Buddha is an example of intra-
personal intelligence.
■ Inter-personal intelligence: This is the skill to recognize the variation in other
people’s feelings, purposes and beliefs. Mass communication and interviewers
are examples of inter-personal intelligence.
Intelligent agents fall into five categories, grouped on the basis of their
capabilities and their degree of perceived intelligence. Figure 13.3 shows the categories
of intelligent agents. The five categories are:
1. Simple reflex agents: The simple reflex agents use the current experience rather
than the past performed history. The condition action rule is the basis for the
function of an agent. The agent function succeeds because the environment is fully
observable.
2. Model-Based Reflex Agents (MRA): The MRA considers the historical data
in its actions. The MRA can perform well even if the environment is not
fully observable. This agent uses the internal model which determines the
history and the effects of action. It reflects on the current state that was
unobserved.
3. Goal-based agents have higher capabilities than the previous agents. They use
the goal information for the capability description. These allow it to choose
between various possibilities. These agents select the best possibility which
enhances the performance of the goal.
4. Utility-Based Agents make decisions based on utility. These agents are more
advanced than the goal-based agents. Using a utility function, a state is
mapped against the measure of utility. A rational agent selects the action
which optimizes the utility of the outcome.
5. The learning agents have the capability to learn through the previous experi-
ence. The elements of learning agents are as follows:
a. Learning element: This element makes the agents learn through previous
experiences.
b. Critical element: It provides feedback on the agents’ performance.
c. Performance element: This depends on the external action which needs to
be taken.
d. Problem generator: It provides the feedback agent which performs some
tasks, such as making suggestions and keeping track of the history.
■ Sensors: Sensors detect the changes in the environment. The details are
sent through other devices. In AI the environment is observed through the
sensor agent.
■ Actuators: Actuators convert the energy into motion and perform the role of
moving and controlling a system. Examples are motors, rails and gears.
■ Effectors: The effectors affect the environment. Examples are fingers, legs,
wheels, display screens and arms.
Figure 13.4 shows the inputs from the environment are received through the intel-
ligent agent by sensors. The sensor agent uses AI to make decisions based on the
observation through experience of prior tasks. Then action is activated through
actuators. Future decisions are based on history and past actions.
■ Virtual assistant: The virtual assistants, like Amazon’s Alexa, Google’s assistant
and Apple’s Siri, use the voice-to-text technology and AI to perform searches,
with connectivity based on the internet, for online orders, list creation,
reminder setting and answering questions. If integrated in the smart home
and entertainment system, it can be used to automate the daily routine.
Algorithms are used for the conversational interfaces to train the system to
cater for the customer. Advanced chatbots could give an answer to the com-
plex questions and give the customer the impression that they are talking to a human.
■ Hands-free steering: This allows the car to drive without the driver's hands on the steering wheel, although the driver's attention is still required.
■ Cruise control: This automatically maintains the distance between the driver's car and the car in front.
■ Auto steer: This is an advanced driver-assistance system which keeps the vehicle centered in its lane.
13.2.2 E-Learning Tools
The digital era has enhanced many areas of society, including education. Online education has developed rapidly in the past few years, based on different types of interactive tutorials, e-books, video tutorials, and similar resources. Moreover, one-to-one interaction with students makes the platform more comfortable and offers individual attention and a better understanding of the subject.
13.2.3 Conducting Auctions
Numerous websites offer products for auction. Customers bid for a product and pay according to their bidding amount. eBay is one such website which provides a bidding option for products.
13.2.4 Travel Reservations
Booking tickets for flights or hotels is one of the popular uses of e-commerce. Online
travel reservation is one of the most convenient ways to save time and to find the
different options for destinations and offers.
13.2.6 Electronic Banking
Many banks have introduced mobile apps for electronic banking. Using a mobile phone or a computer, customers can connect to the bank and manage their financial dealings in the comfort of their own surroundings. They can even pay many bills through these facilities. This kind of service also reduces the workload of bank staff.
13.2.8 Customer Service
Businesses get an opportunity to communicate with customers through the internet. Product defects can be identified through online messages and
customer reviews, and complaints can be rectified through various services based on the customers' feedback.
13.4.1 Cardiovascular Abnormalities
In recent years, cardiac imaging analysis has developed greatly through the use of AI techniques. AI can help to analyze the results of echocardiography, coronary angiography, and electrocardiography. Differences in the size of the heart indicate the cardiovascular risk of each individual. Automating the identification of abnormalities in common imaging tests, including chest X-rays, MRI, etc., could lead to quicker decision-making with fewer errors. In addition, AI can also analyze echocardiographic
images that include the measurement of the size of each chamber and the assessment
of functions of the left ventricle.
AI can also help physicians make a quick decision on the identification of sub-clinical organ dysfunction with the help of clinically related information. The continuous development of AI and its sub-domains, machine learning and deep learning, makes it attractive to medical experts because it can produce reliable and efficient models for healthcare.
Echocardiography is critical in the diagnosis and management of cardiovascular diseases and in the accurate assessment of cardiac function and structure. AI tools provide new machine learning features that enhance the accuracy of image interpretation. Machine learning also helps in the automatic interpretation of otherwise unused data generated by multidimensional imaging modalities.
A wide range of machine learning models is used for the diagnosis of cardiovascular disease in clinical echocardiography practice. Sengupta et al. [1] proposed a cognitive machine learning technique using speckle tracking echocardiographic data to differentiate constrictive pericarditis from restrictive cardiomyopathy. Research on the interpretation of speckle tracking echocardiographic data is continuing: Narula et al. [2] developed a supervised learning algorithm to differentiate athletes' hearts from hypertrophic cardiomyopathy based on speckle tracking echocardiographic data. Other applications of machine learning models to heart valve disease are given in [3, 4]. Playford et al. [5] showed that AI could estimate the aortic valve area in aortic valve stenosis from other echocardiographic data, without measuring the left ventricular outflow tract, with an accuracy of 95 percent.
In cardiac MRI, segmentation of the ventricles of the heart is one of the fields supported by machine learning models, and AI improves the efficiency of clinical assessment [6–8]. Avendi et al. [9] proposed a deep learning technique for automatic detection and segmentation of the right ventricular chamber, based on a cardiac MRI dataset. In the same way, various automated neural network techniques are used for the segmentation of the left ventricle based on cardiac cine MRI [10, 11]. Dawes et al. [12] proposed supervised machine learning of systolic cardiac motion for the early prediction of right heart failure in patients with pulmonary hypertension.
Machine learning algorithms are increasingly used in CT for the diagnosis and risk assessment of coronary artery disease and atherosclerosis. Coronary computed tomographic angiography is a non-invasive medical modality to detect coronary artery disease. Various machine learning models have been developed [13] to derive non-invasive fractional flow reserve and increase the performance of coronary computed tomographic angiography. Wolterink et al. [14] used supervised machine learning to identify coronary artery calcification. Machine learning analysis is performed on prospective and retrospective data, such as clinical, coronary computed tomographic angiography imaging, biohumoral, and lipidomic data, to distinguish high-risk patients.
13.4.2 Musculoskeletal Injury
Imaging is one of the important tools for the identification of patients with musculoskeletal conditions. Increased use of imaging requires operational efficiency along with better accuracy and image quality. AI is an exciting tool to help radiologists meet these requirements.
13.4.3 Neurological Diseases
The neuro-oncology field has many challenges, and AI can help to overcome them. AI assists in the field of neurodegeneration, which includes Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis, and it supports the assessment and diagnosis of brain tumors. Heidelberg University and the German Cancer Research Centre trained a machine learning algorithm on 500 MRI scans of patients with brain tumors; the algorithm detects brain tumors automatically from an MRI scan and produces a volumetric tumor segmentation. The technique has great value because an accurate diagnosis helps in assessing the response to therapy. Prediction is another application in neuro-oncology, and AI is used to speed up processes based on brain mapping or functional brain imaging.
13.5 Advantages of AI
13.5.1 Automation
Automation is one of the core benefits of AI models, with an influence on the transportation industry, communications, the service industry, and consumer products. Automation leads to high accuracy along with the efficient use of raw
materials, improved production quality, reduced lead times and superior safety.
13.5.2 Smart Decisions by AI
AI makes smarter decisions for businesses. AI supports data delivery, trend analysis, data consistency, forecasting, and the assessment of uncertainty in order to make better decisions. As long as AI is not programmed to imitate human emotions, it remains unbiased on the matter, which helps it to make better decisions for the development of the system.
13.5.5 Data Analysis
AI and machine learning are used to analyze data efficiently. They help in creating models that interpret and process data to predict future outcomes. The advanced capability of the computer speeds up the process, since manual interpretation by humans takes too long. AI and machine learning techniques are designed to solve complex problems, including interaction with customers, detection of fraud, weather reporting and medical diagnosis. AI also helps in finding the best solutions to challenges, which increases productivity and reduces costs.
13.5.6 Stability of Business
Business forecasting based on AI helps in making critical decisions, even in emergencies, to ensure the stability of the business. In general, there are many benefits
for businesses using AI to increase efficiency. AI has a great ability to gather vast amounts of data and to do the work automatically, based on its training and implementation. AI can increase business performance in the following ways:
■ sales prediction
■ extracting and reviewing data
■ leveraging smart chatbots
■ AI tools implementation
■ automatic call management
■ real-time operations.
13.5.8 Minimizing Errors
AI solves complex problems consistently and also speeds up processes in a way that is unmatched by human intelligence. However, AI cannot replace human intelligence: it needs the right problem to solve and the right data in order to design solutions, and the right process must be developed for the AI to adapt, learn from its errors and produce better results. AI also frees up more time for employees to handle the most impactful tasks of the business. Even wrong diagnoses by doctors can be reduced, based on the large amounts of data that connect the symptoms of an illness with its cause. AI strengthens doctors' decisions and removes errors, based on machine learning and pattern recognition algorithms.
Undoubtedly, healthcare and other businesses around the world are leveraging
AI to produce important leaps in error management.
13.6 Training AI
When we train AI, we are teaching the model to interpret data and learn from it so that it can perform its task accurately. Only through training to interpret the data correctly will the AI make accurate decisions based on the information provided. All machine learning projects require high quality data in order to be successful; if the AI is trained on poor features, the accuracy of its output will be poor. AI training has three stages. Figure 13.5 shows the three stages of the AI model: the training stage, the validation stage and the testing stage.
1. Training stage: In the first stage, an AI model is given a set of training data and makes decisions based on that information. The AI sometimes stumbles at this stage; like a child, it is just beginning to learn. As mistakes are pointed out, adjustments are made that help the AI become more accurate. One of the main things to avoid is overfitting, which happens when a model learns the noise in the training data to the extent that it negatively affects its performance on new data.
2. Validation stage: Once the AI has completed the training stage, the next stage is validation. This stage shows how the AI performs on a new set of data. As in the training stage, one has to evaluate and confirm the behavior of the AI, including its response to variables that were not considered earlier. Issues with overfitting will be revealed during the validation stage.
3. Testing stage: After training and validation, a new dataset is sent to the testing stage to see how the AI model performs. A raw dataset is sent to the AI model for testing, to see whether it can make correct decisions on unstructured information. If the decisions are wrong, the training stage has to be repeated until the model performs accurately.
13.6.1 Success of AI Training
In general, we need three key features to train the AI model to work well: high
quality data, accurate annotation of data, and experimentation.
1. High quality data: The AI model requires high quality data. If the dataset includes even a small amount of poor quality data, there is a chance it will produce undesirable results. Such undesirable results constitute an error in AI called bias.
2. Accurate annotation of data: Accurate annotation of the data is even more essential than simply having high quality data; otherwise, the AI model learns the wrong interpretation. Correctly annotated images help the AI to learn better and perform with high accuracy.
3. Experimentation: Expect the AI to make mistakes during the training stage. Errors, also called bias, are normal during the AI training process. When the AI does not interpret the data properly, the key is to analyze the results to identify what went wrong during the training stage. The knowledge gained through this investigation helps to improve the AI, which leads to better predictions with higher accuracy.
1. Supervised learning: The algorithm learns from data using the labels attached to the data. In general this means that the values to be predicted are known and are well defined for the algorithm from the beginning.
2. Unsupervised learning: Unlike supervised methods, the algorithm has no labels or correct answers for the data; the algorithm itself groups related data together and makes sense of it.
3. Semi-supervised learning: A hybrid of supervised and unsupervised learning.
4. Reinforcement learning: Rewards are given to the algorithm for correct predictions in order to drive its accuracy higher.
Data scientists are needed to determine the best statistical algorithms when using ML software. Among the most popular algorithms, Naïve Bayes is used for sentiment analysis, spam detection and recommendations. A decision tree is used for outcome prediction; to improve prediction performance, the random forest algorithm, which merges multiple decision trees, is used. Logistic regression is used for binary classification, while linear regression is used to predict numerical values from large datasets.
Clustering data into groups, as in market segmentation, uses algorithms such as Gaussian mixture models and K-Means clustering; AdaBoost and recommender algorithms are also popular. Figure 13.6 shows the model building process for the dataset; a minimal comparison of several of these algorithms is sketched below.
There are three stages of machine learning: training, validation and testing. Before these begin, it is necessary to prepare the data, making it well organized and correct. Getting the data into order is time-consuming and requires human intervention in a detail-oriented process. The main goal is for the data to be free of duplication, disconnection and typos. After cleaning, the data is divided into three sets, one for each stage of training; the division is made randomly to discourage the selection of biased data, as in the sketch below.
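The following minimal sketch shows one way to perform this random three-way division; the 60/20/20 ratio and the helper name three_way_split are assumptions, since the text does not specify them.

# Minimal sketch of a random split into training, validation and test sets.
from sklearn.model_selection import train_test_split

def three_way_split(X, y, seed=42):
    # First carve off the 60 percent training set, then halve the remainder.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=seed, shuffle=True)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed, shuffle=True)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)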
Here are a few points relevant to the creation of the model.
Before training starts, the initial step is to label the data so that the ML software can process it with the help of vital clues. Unsupervised learning does not require
labels, which can be added later. The process can start with the ML software's default parameter values, or the parameters can be modified individually.
It is in the validation stage, after training, that the AI encounters the success criteria. A new set of data is used in the first pass; if the results are good enough, the model proceeds to the final stage, which is testing. If performance is poor, the ML software makes additional passes through the data, continuing until it finds no new patterns or reaches the maximum number of passes. The parameters are modified automatically by the ML software or adjusted manually as training progresses.
The final stage is testing with a new set of data (for supervised learning). The software becomes a working model when it passes the success criteria test; if it fails, it has to go back to the training stage, where the team manually changes the parameters or the ML software modifies them automatically.
Machine learning is essentially the repeated application of the ML software to the dataset: parameters are changed iteratively, both automatically by the software and through human intervention, to make the model smarter after each pass over the data. The software keeps making passes until no new patterns are detected or it reaches the maximum number of passes, at which point it stops.
Even after deployment, the AI model requires constant monitoring. Determining the performance of the AI model involves the obvious task of checking whether actual outcomes match the AI's predictions. If prediction performance worsens, the model has to re-enter the ML training process to be corrected with up-to-date data.
The input data can easily change over time, which is called data drift. Data drift causes the accuracy of the AI model to deteriorate, so drift warnings should be raised before problems appear. AI tools can track data drift and find outliers; Neptune, Fiddler and Azure ML can provide early warnings so that data problems can be addressed by updating the ML model sooner rather than later.
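As a rough illustration only (this is not how Neptune, Fiddler or Azure ML work internally), a simple statistical check such as a two-sample Kolmogorov-Smirnov test can flag features whose live distribution has drifted away from the training distribution.

# Illustrative drift check on two pandas DataFrames with the same columns.
from scipy.stats import ks_2samp

def drifted_features(train_df, live_df, alpha=0.01):
    flagged = []
    for col in train_df.columns:
        stat, p_value = ks_2samp(train_df[col], live_df[col])
        if p_value < alpha:                 # distributions differ significantly
            flagged.append((col, round(stat, 3)))
    return flagged                          # columns worth investigating for drift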
13.7 Conclusion
We are on the cusp of an artificial intelligence revolution, and there is much more to come in the future. There are many research studies on AI, and a detailed history would include AI applications in public service, military applications of AI, the ethics of AI and the rules of robotics. AI is increasingly integrated into our daily lives, from video games and computers to kitchen appliances. However, there are problems and challenges with AI; every new technology has pros and cons, and AI is no exception. It can therefore be concluded that AI is a great technology which supports data science, engineering, machine learning and IT, and that it is positioned for the development and maintenance of system software which runs on AI algorithms and enhances the quality of human life.
References
1. Partho P. Sengupta, Yen-Min Huang, Manish Bansal, Ali Ashrafi, Matt Fisher, Khader
Shameer, Walt Gall and Joel T. Dudley (2016) “Cognitive Machine-Learning Algorithm
for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis from
Restrictive Cardiomyopathy.” Circulation: Cardiovascular Imaging, 9(6).
2. Sukrit Narula, Khader Shameer, Mabrouk Salem Omar, Joel T. Dudley, and Partho
P. Sengupta (2016) “Machine-Learning Algorithms to Automate Morphological and
Functional Assessments in 2D Echocardiography.” Journal of the American College of
Cardiology, 68(21): 2287–2295.
3. R. Sengur (2012) “Support Vector Machine Ensembles for Intelligent Diagnosis of
Valvular Heart Disease.” Journal of Medical Systems, 36(4): 2649–2655.
4. C. Martin-Isla, V. M. Campello, C. Izquierdo et al. (2020) “Image-Based Cardiac
Diagnosis with Machine Learning: A Review.” Frontiers in Cardiovascular Medicine, 7.
5. E. Playford, L. Bordin, R. Talbot, B. Mohamad, B. Anderson, and G. Strange (2018)
“Analysis of Aortic Stenosis Using Artificial Intelligence.” Heart, Lung and Circulation,
27: S216.
6. H. B. Winther, C. Hundt, B. Schmidt et al. (2018) “v-Net: Deep Learning for
Generalized Biventricular Mass and Function Parameters Using Multicenter Cardiac
MRI Data.” JACC: Cardiovascular Imaging, 11(7): 1036–1038.
7. M. R. Avendi, A. Kheradvar, and H. Jafarkhani (2017) “Automatic Segmentation of
the Right Ventricle from Cardiac MRI Using a Learning-Based Approach.” Magnetic
Resonance in Medicine, 78(6): 2439–2448.
8. L. K. Tan, R. A. McLaughlin, E. Lim, Y. F. Abdul Aziz, and Y. M. Liew (2018) “Fully
Automated Segmentation of the Left Ventricle in Cine Cardiac MRI Using Neural
Network Regression.” Journal of Magnetic Resonance Imaging, 48(1): 140–152.
9. M. R. Avendi, A. Kheradvar, and H. Jafarkhani (2017) “Automatic Segmentation of
the Right Ventricle from Cardiac MRI Using a Learning-Based Approach.” Magnetic
Resonance in Medicine, 78(6): 2439–2448.
10. Q. Tao, W. Yan, Y. Wang et al. (2019) “Deep Learning-Based Method for Fully Automatic
Quantification of Left Ventricle Function from Cine MR Images: A Multivendor,
Multicenter Study.” Radiology, 290(1): 81–88.
11. T. A. Ngo, Z. Lu, and G. Carneiro (2017) “Combining Deep Learning and Level Set
for the Automated Segmentation of the Left Ventricle of the Heart from Cardiac Cine
Magnetic Resonance.” Medical Image Analysis, 35: 159–171.
12. T. J. W. Dawes, A. de Marvao, W. Shi et al. (2017) “Machine Learning of Three-
Dimensional Right Ventricular Motion Enables Outcome Prediction in Pulmonary
Hypertension: A Cardiac MR Imaging Study.” Radiology, 283(2): 381–390.
13. Y. Shiono, H. Matsuo, T. Kawasaki, et al. (2019) “Clinical Impact of Coronary
Computed Tomography Angiography-Derived Fractional Flow Reserve on Japanese
Population in the Advance Registry.” Circulation Journal, 83(6):1293–1301.
14. J. M. Wolterink, T. Leiner, B. D. de Vos, R. W. van Hamersvelt, M. A. Viergever,
and I. Isgum (2016) “Automatic Coronary Artery Calcium Scoring in Cardiac CT
Angiography Using Paired Convolutional Neural Networks.” Medical Image Analysis,
34: 123–136.
Chapter 14
Auto Encoder Ensemble Technique for Sentiment Analysis
14.1 Introduction
Rapid developments in technology have resulted in people being able to express
their views and opinions freely across the social media networking sites (Daradkeh,
2022). Social media sites are a great platform for expressing emotive issues.
Emotion Artificial Intelligence, also termed opinion mining or sentiment ana-
lysis, is a procedure that involves analyzing whether a piece of text conveys a posi-
tive, negative or neutral opinion regarding any subject matter under consideration
(Paliwal et al., 2022; Shukla and Garg, 2022; Wu et al., 2022). The sentiments
behind such emotive issues can be analyzed using several sentiment analysis
methods like machine learning, lexicon-based, deep learning, hybrid and nature
inspired techniques (Alali et al., 2022; Dang et al., 2021; Ruz et al., 2022). Among
supervised and unsupervised machine learning techniques, unsupervised machine
learning techniques are usually applied for sentiment analysis as they do not require
training datasets. Clustering, an unsupervised machine learning technique, is quite
popular (Kaushik and Bhatia, 202; Pandey et al., 2022). Several NLP methods can
be adopted for evaluating the emotions considering any scenario (Dandekar and
Narawade, 2022). Opinion mining plays a great role in generating several models
that mimic the human thinking processes while handling uncertain situations (Kaur
et al., 2022). Sentiment analysis is an indispensable part of cognitive computing
(Pasupa and Ayutthaya, 2022). Opinion mining is multi-disciplinary in nature, involving the combination of Natural Language Processing, Machine Learning, Deep Learning, Statistics, Mathematics, Psychology, Sociology, Artificial Intelligence and many other domains. Innumerable research articles are available on sentiment analysis, and several tertiary studies have also been carried out using secondary research articles on opinion mining. The information obtained by applying opinion mining to a certain subject matter can be used to explain, analyze and comprehend that particular phenomenon (Pozzi et al., 2017). In the business area, opinion mining plays a very important role in analyzing customers' views on a particular product, and strategies to improve the business can be derived using sentiment analysis. Analyzing the needs of the customer is extremely important in the current business domain, which is highly aligned towards the customer (Chagas et al., 2018).
1. Syntactic layer: The syntactic layer involves the following functions: normal-
ization of the dataset under consideration, disambiguation of boundaries
associated with the sentences, tagging parts of speech to each set of words
in the sentence or phrase, breaking down an entire sentence into smaller
chunks, and lemmatizing the words, i.e., reducing each word to its root form
(Hemmatian and Sohrabi, 2017).
2. Semantics layer: The semantics layer involves the following tasks: detec-
tion of subjectivity and objectivity of sentences, extraction of underlying
concepts, anaphora resolution and recognition of named entities (Kumar and
Teeja, 2012).
3. Pragmatics layer: The pragmatics layer involves the following functions: detec-
tion of sarcastic elements in the dataset, understanding the metaphors involved,
extraction of aspects involved in a particular sentence and detection of polar-
ities (Cambria et al., 2017; Ligthart et al., 2021).
14.2.1 Classification of Sentiments
Classification of sentiments is one of the most frequently researched areas in Computer Science. Analyzing the polarities associated with the phrases in a dataset is a sub-division of the classification of sentiments. Research work usually classifies polarities as negative or positive (Wang et al., 2014); some research works include a third category called neutral. Cross-language and cross-domain classification can be considered sub-divisions of the classification of sentiments. The ambiguity associated with word polarities still needs to be researched: work is being carried out on information retrieval models that can be considered an alternative methodology to machine learning models for implementing word polarity disambiguation (Kumar and Garg, 2020).
14.2.2 Subjectivity Classification
Subjectivity classification deals with analyzing whether subjectivity is present in the
considered dataset (Kasmuri and Basiron, 2017). The prime objective of subjectivity
classification is to eliminate objective phrases in the dataset. Subjectivity classifica-
tion is one of the foremost tasks associated with sentiment analysis. Words that are
associated with emotions like “better,” “fine,” etc. are detected by the subjectivity
classification system (ibid.).
14.2.5 Extraction of Aspects
Aspect extraction is one of the major tasks involved in sentiment analysis (Wu et al.,
2022). In a particular scenario there can be a target entity, for example, the target
entity can represent a product, an event, a person or an organization (Kumar and
Sebastian, 2012). The concept of fine-grained sentiment analysis is quite popular
(ibid.). Extraction of aspects considering blogs that do not have any specific topics
and social media data like Tweets is extremely important. Several methods are used
for the extraction of aspects. The conventional and the first and foremost method to
implement extraction of aspects is analysis based on frequency. The analysis based on
frequency involves identifying regularly incorporated compound nouns and nouns
that can most probably correspond to aspects. A significant approach for frequency-
based analysis for aspect extraction says that if the noun or compound noun occurs
at a minimum of 1 percent of the statements, then it can be represented as an aspect.
The next way to figure out the aspects is on the basis of syntax. For example, it could
be analyzing the aspects that appear before a modifying adjective that is a word
carrying emotions. Through this method, aspects that do not occur frequently can
be discovered. An algorithm based on syntax is suggested by Qiu et al. (2009).
learning (Javed and Muralidhara, 2022; Meena et al., 2022; Mohanty and
Mohanty, 2022).
6. Lexicon-based approach: One of the most conventional approaches to sentiment analysis is the lexicon-based methodology (Hemmatian and Sohrabi, 2017). The lexicon-based technique goes through the entire document, analyzing the terms that have sentiments associated with them.
14.3 Research Methodology
14.3.1 Data Extraction
Global tweets on COVID-19 were retrieved using the snscrape Python library and
500 tweets from the month of May 2020 were obtained. Keywords like #covidStrain,
#covidstress, #covidanxiety, #covidmentalhealth, #coviddepression, #covidpain,
#covidfear, #covidmental, #alonetogether were considered when extracting the
tweets. The components of the dataset are the tweet ID, the actual tweet text, the username associated with the tweet, the location from which the tweet originated, and the date and time of each tweet. The extracted tweets contain inconsistencies and need to undergo pre-processing to eliminate them.
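A hedged sketch of this extraction step follows. The snscrape module path and the tweet attribute names (content, date, user.username, user.location) follow the library's commonly documented interface but may differ between versions, and the query shown uses only one of the keywords listed above.

# Sketch of retrieving May 2020 tweets with snscrape and keeping 500 of them.
import snscrape.modules.twitter as sntwitter
import pandas as pd

query = "#covidstress since:2020-05-01 until:2020-05-31"
rows = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 500:                            # the study keeps 500 tweets
        break
    rows.append({
        "tweet_id": tweet.id,
        "text": tweet.content,
        "username": tweet.user.username,
        "location": tweet.user.location,
        "datetime": tweet.date,
    })
tweets_df = pd.DataFrame(rows)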
14.3.2 Data Pre-Processing
The extracted tweets are subjected to pre-processing to eliminate inconsistencies.
Python programming is adopted to carry out pre-processing of the retrieved tweets.
Initially, the necessary packages like numpy, pandas, re, nltk, spacy and string were
imported. To maintain a sense of uniformity among the tweets, all the tweets were
converted to lower case. Punctuation was removed. Some of the most commonly
used words in English, like “a,” “but” and “is,” are considered stop words. Stop words are those that are not necessary from an analysis point of view. Stop words in the dataset are eliminated using the nltk package, and rare words in the dataset are removed. It is quite important to obtain the root form of a word in order to analyze words morphologically. Lemmatization is an important aspect of NLP that reduces a word to its root form, and the nltk package of Python plays a great role in the
lemmatization of the considered tweets. Hyperlinks that are not necessary are also
scraped out.
Emoticons and emojis that convey the emotional feelings of a person in a con-
versation play an important role in opinion mining. Tweets contain lots of such
emoticons and emojis. The emoticons and emojis are converted to their respective
meanings. A number of HTML tags present in the dataset are also eliminated since
they are of no significance for analysis. A number of short forms, such as Ttyl (“Talk
to you later”) are used by users of social media sites like Twitter and Facebook.
The short forms or chat words like Ttyl are converted to the appropriate text form.
Duplicate tweets in the dataset are eliminated. Removal of duplicate tweets resulted
in a total of 498 tweets.
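The minimal sketch below, continuing from the extraction sketch above, illustrates these cleaning steps with re, string and nltk; the tiny chat-word dictionary is an illustrative stand-in for the fuller mapping used in practice.

# Sketch of the pre-processing pipeline applied to each tweet.
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()
CHAT_WORDS = {"ttyl": "talk to you later"}              # illustrative subset

def clean_tweet(text):
    text = text.lower()                                  # uniform lower case
    text = re.sub(r"http\S+|www\.\S+", "", text)         # drop hyperlinks
    text = re.sub(r"<.*?>", "", text)                    # drop HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [CHAT_WORDS.get(w, w) for w in text.split()] # expand chat words
    words = [w for w in words if w not in STOP_WORDS]    # remove stop words
    words = [LEMMATIZER.lemmatize(w) for w in words]     # reduce to root form
    return " ".join(words)

tweets_df["clean_text"] = tweets_df["text"].apply(clean_tweet)
tweets_df = tweets_df.drop_duplicates(subset="clean_text")   # remove duplicate tweets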
14.3.3 Polarity Classification
The pre-processing step of opinion mining removes inaccuracies and inconsistencies
in the considered dataset. The next step involves classifying each tweet as positive or
negative (Kalaivani and Jayalakshmi, 2022). The process of assigning the tags to each
tweet as positive or negative is termed polarity classification. The TextBlob library of
Python plays an important role in carrying out several Natural Language Processing
(NLP) tasks, sentiment analysis and classification procedures.
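A short sketch of this polarity tagging with TextBlob follows; the threshold of zero used to separate positive from negative tweets is an assumption.

# Tag each cleaned tweet as positive (1) or negative (0) using TextBlob polarity.
from textblob import TextBlob

def polarity_label(text):
    polarity = TextBlob(text).sentiment.polarity         # value in [-1, 1]
    return 1 if polarity > 0 else 0                      # 1 = positive, 0 = negative

tweets_df["label"] = tweets_df["clean_text"].apply(polarity_label)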
14.3.4 Tweet Classification
The present study involves distinguishing the tweets as negative or positive using
a novel auto-encoder network-based ensemble technique for sentiment analysis
using tweets on COVID-19 data. Figure 14.1 gives a pictorial representation of the
research methodology adopted.
14.3.5 Auto-Encoder
An auto-encoder comes under the category of unsupervised learning. Auto-encoders
belong to the class of feed forward neural networks. They can also be interpreted
as a self-supervised method because the technique obtains its own labels using the
training dataset. The major aspects involved in auto-encoders are the construction
of the input layer, the encoder network, the decoder network and the output layer.
The dimensionality of the output layer of an auto-encoder is the same as that of the input layer.
14.3.6 Adam Optimization
An optimization algorithm, when applied with machine learning or deep learning algorithms, yields great results. The Adam optimization algorithm, which can be considered an extension of stochastic gradient descent, finds immense
applications in the field of NLP. The Adam optimizer is considered in the present
research work when defining the parameters of the auto-encoder network.
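The hedged sketch below builds a small dense auto-encoder and compiles it with the Adam optimizer using the Keras API; the 2,000-dimensional input, the layer sizes and the commented-out training call are assumptions, since the chapter does not state the actual architecture.

# Sketch of a dense auto-encoder compiled with the Adam optimizer.
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 2000                  # e.g. TF-IDF vocabulary size (assumed)
encoding_dim = 64                 # size of the compressed representation (assumed)

inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(256, activation="relu")(inputs)             # encoder network
encoded = layers.Dense(encoding_dim, activation="relu")(encoded)
decoded = layers.Dense(256, activation="relu")(encoded)            # decoder network
decoded = layers.Dense(input_dim, activation="sigmoid")(decoded)   # same size as input

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")                  # Adam optimization

# The inputs serve as their own targets, which is why the method is self-supervised:
# autoencoder.fit(X_tfidf, X_tfidf, epochs=20, batch_size=32, validation_split=0.1)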
The decision tree algorithm is widely used to perform classification tasks. The structure of the algorithm is that of a tree with a number of nodes: features of the dataset are represented by the internal nodes, and the decision rules of the decision tree are represented by the branches of the tree.
14.3.10 Logistic Regression
Logistic regression comes under the umbrella of supervised machine learning techniques. Despite its name, logistic regression is mainly used to classify samples of data, so it can be considered a classification algorithm. The outcome of a categorical dependent variable can be predicted easily using the logistic regression classifier. Logistic regression is closely related to linear regression. A logistic regression model is implemented as part of the ensemble model and is used for predictive modeling.
14.3.11 Genetic Algorithm
The term “genetic” comes from the theory of evolution in biology, and the genetic algorithm is inspired by the idea of natural selection. The genetic algorithm plays an important role in problems that require optimization. In the present work, GeneticSelectionCv is used to select features from the dataset; executing the GeneticSelectionCv code displays the selected features of the dataset under consideration.
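A hedged sketch of this feature selection step is shown below. It uses the GeneticSelectionCV class of the third-party sklearn-genetic package (module genetic_selection), which appears to be what the text refers to; the estimator and parameter values are illustrative assumptions.

# Sketch of genetic-algorithm feature selection over a scikit-learn estimator.
from genetic_selection import GeneticSelectionCV
from sklearn.linear_model import LogisticRegression

selector = GeneticSelectionCV(
    LogisticRegression(max_iter=1000),    # estimator used to score feature subsets
    cv=5,
    scoring="accuracy",
    n_population=50,
    n_generations=20,
)
# selector.fit(X_train, y_train)
# print(selector.support_)                # boolean mask of the selected features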
of data can easily be placed in the right class. The decision boundary is also termed a hyperplane. The present research work incorporates the stacking technique, as stated, with an SVM classifier chosen as the meta-classifier. A well-defined training dataset is considered, containing global tweets on COVID-19 from May 2020, with each tweet labeled as positive or negative: a positive tweet is represented by the digit 1 and a negative tweet by the digit 0. The base-level models considered in the present research, namely decision tree, gradient boosting, logistic regression and GeneticSelectionCv, are trained on this training dataset. The meta-classifier, SVM, is trained on the outputs obtained from the base-level classifiers.
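A minimal sketch of this stacking arrangement, using scikit-learn's StackingClassifier with an SVM meta-classifier, is shown below; the hyperparameters are assumptions, and the genetic feature selection step is assumed to be applied to the features beforehand rather than stacked as a base model.

# Sketch of stacking: base learners feed their predictions to an SVM meta-classifier.
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

base_learners = [
    ("decision_tree", DecisionTreeClassifier(random_state=42)),
    ("gradient_boosting", GradientBoostingClassifier(random_state=42)),
    ("logistic_regression", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=SVC(kernel="rbf"),    # SVM as the meta-classifier
    cv=5,                                 # base models trained with cross-validation
)
# stack.fit(X_train, y_train)
# print(stack.score(X_test, y_test))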
14.3.13 Dataset Visualization
Dataset visualization helps to obtain a systematic pictorial representation of the
dataset under consideration. Interesting characteristics and patterns in the dataset
can be analyzed efficiently using several graphs.
14.3.14 Word Cloud
A word cloud is one of the most fascinating dataset visualization techniques. A word
cloud helps to discover interesting trends in the data. Every word in the word cloud
appears in a different font size and color. Textual data is represented in a clear and
succinct manner according to its frequency of occurrence. Size associated with each
word in the dataset plays a major role in the analysis of word clouds. The bigger and
bolder the word, the greater is the frequency of the occurrence of that particular
word in the dataset.
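A short sketch of word-cloud generation with the wordcloud and matplotlib packages follows; the column name continues from the earlier sketches.

# Draw a word cloud of the cleaned tweets; larger words occur more frequently.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

all_text = " ".join(tweets_df["clean_text"])
cloud = WordCloud(width=800, height=400, background_color="white").generate(all_text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()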
14.4 Packages/Libraries of Python
The following are some of the libraries imported as a part of the current research work:
14.4.2 TfidfVectorizer
TfidfVectorizer is a Python class provided by the scikit-learn library; TF-IDF stands for Term Frequency-Inverse Document Frequency. TF-IDF is one of the most common algorithms used by researchers across the NLP domain: it converts textual data into an appropriate numerical representation. This conversion helps in fitting any machine learning or deep learning algorithm.
14.4.3 train_test_split
train_test_split is a function in Python that comes under sklearn's model_selection module and helps to divide the dataset into training and testing datasets. Arbitrary partitions of the considered dataset are made automatically by the sklearn train_test_split function.
14.4.4 Logistic Regression
The Scikit-learn library of Python provides a consistent pattern for modeling machine learning algorithms, and the logistic regression classifier is modeled using the Scikit-learn library, as sketched below.
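The minimal sketch below ties the three libraries above together: the cleaned tweets are vectorized with TfidfVectorizer, split with train_test_split, and used to fit a logistic regression classifier. The parameter values are illustrative assumptions.

# TF-IDF features, an 80/20 split and a logistic regression baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

vectorizer = TfidfVectorizer(max_features=2000)          # text -> numerical matrix
X = vectorizer.fit_transform(tweets_df["clean_text"])
y = tweets_df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)                # arbitrary 80/20 partition

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", round(clf.score(X_test, y_test), 3))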
14.5 Conclusion
The total number of tweets considered for analysis is 500 (tweets on COVID-19 from May 2020), and 498 tweets remain after the removal of duplicates. The TextBlob library of Python was used to determine the polarity of the tweets: 216 tweets in the dataset are positive and 79 are negative, so positive tweets outnumber negative ones. A novel algorithm combining machine learning, deep learning, nature-inspired techniques and artificial neural networks is implemented. The suggested model is an ensemble of the decision tree algorithm, gradient boosting, logistic regression and a genetic algorithm, built on the auto-encoder technique. The objective of the research was to classify the tweets in the considered dataset, that is, tweets on COVID-19 from May 2020, as positive or negative. The proposed novel ensemble technique resulted in an accuracy score of 81.25 percent.
Acknowledgments
Special thanks to Professor Rohini and Professor Joy Paulose for the constant support
and guidance throughout this research.
References
Alali, Muath, Nurfadhlina Mohd Sharef, Masrah Azrifah Azmi Murad, Hazlina Hamdan,
and Nor Azura Husin (2022) “Multitasking Learning Model Based on Hierarchical
Attention Network for Arabic Sentiment Analysis Classification.” Electronics, vol. 11,
no. 8, pp. 1193–1197.
Alsayat, Ahmed. (2022) “Improving Sentiment Analysis for Social Media Applications
Using an Ensemble Deep Learning Language Model.” Arabian Journal for Science and
Engineering, vol. 47, no. 2, pp. 2499–2511.
Amulya, K., S. B. Swathi, P. Kamakshi, and Y. Bhavani (2022) “Sentiment Analysis on
IMDB Movie Reviews Using Machine Learning and Deep Learning Algorithms.”
Paper presented at 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 814–819.
Balakrishnan, Vimala, Zhongliang Shi, Chuan Liang Law, Regine Lim, Lee Leng Teh, and
Yue Fan (2022) “A Deep Learning Approach in Predicting Products’ Sentiment Ratings:
A Comparative Analysis.” The Journal of Supercomputing, vol. 78, no. 5, pp. 7206–7226.
Bhagat, Meenu, and Brijesh Bakariya (2022) “Sentiment Analysis Through Machine
Learning: A Review.” Paper presented at 2nd International Conference on Artificial
Intelligence: Advances and Applications, pp. 633–647.
Cambria, Erik, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco (2017) “Affective Computing and Sentiment Analysis.” In Erik Cambria, Dipankar Das, Sivaji Bandyopadhyay, and Antonio Feraco (Eds.), A Practical Guide to Sentiment Analysis. Cham: Springer.
Krishnaveni, N. and V. Radha (2022) “A Hybrid Classifier for Detection of Online Spam
Reviews.” In Artificial Intelligence and Evolutionary Computations in Engineering Systems,
pp. 329–339. DOI:10.1007/978-981-16-2674-6_25
Kumar, Akshi and Geetanjali Garg (2020) “Systematic Literature Review on Context-Based
Sentiment Analysis in Social Multimedia.” Multimedia Tools and Applications, vol. 79,
no. 21, pp. 15349–15380.
Kumar, Akshi, and Mary Sebastian Teeja (2012) “Sentiment Analysis: A Perspective on Its
Past, Present and Future.” International Journal of Intelligent Systems and Applications,
vol. 4, no. 10.
Ligthart, Alexander, Cagatay Catal, and Bedir Tekinerdogan (2021) “Systematic Reviews in
Sentiment Analysis: A Tertiary Study.” Artificial Intelligence Review, vol. 54, no. 7.
Mamun, Md, Mashiur Rahaman, Omar Sharif, and Mohammed Moshiul Hoque (2022)
“Classification of Textual Sentiment Using Ensemble Technique.” SN Computer Science,
vol. 3, no. 1, pp. 1–13.
Meena, Gaurav, Krishna Kumar Mohbey, and Ajay Indian (2022) “Categorizing Sentiment
Polarities in Social Networks Data Using Convolutional Neural Network.” SN Computer
Science, vol. 3, no. 2, pp. 1–9.
Mohanty, Mohan Debarchan, and Mihir Narayan Mohanty (2022) “Verbal Sentiment
Analysis and Detection Using Recurrent Neural Network.” In S. De et al. (Eds.),
Advanced Data Mining Tools and Methods for Social Computing, New York: Academic
Press, pp. 85–106.
Nandedkar, S. C., and J. B. Patil (2022) “SeAbOM: Semi-Supervised Learning for Aspect-
Based Opinion Mining.” In Proceedings of International Conference on Data Science and
Applications, pp. 479–489.
Paliwal, Sushila, Suraiya Parveen, M. Afshar Alam, and Jawed Ahmed (2022) “Sentiment
Analysis of COVID-19 Vaccine Rollout in India.” ICT Systems and Sustainability, pp.
21–33. DOI:10.1007/978-981-16-5987-4_3
Pandey, A. C., Pandey, A. Kulhari, and D. S. Shukla (2022) “Enhancing Sentiment Analysis
Using Roulette Wheel Selection-Based Cuckoo Search Clustering Method,” Journal of
Ambient Intelligence and Humanized Computing, vol. 13, no. 1, pp. 1–29.
Pasupa, Kitsuchart, and Thititorn Seneewong Na Ayutthaya (2022) “Hybrid Deep Learning
Models for Thai Sentiment Analysis.” Cognitive Computation, vol. 14, no. 1, pp.
167–193.
Pozzi, Federico Alberto, Elisabetta Fersini, Enza Messina, and Bing Liu (2017) “Challenges
of Sentiment Analysis in Social Networks: An Overview.” In F. Pozzi, et al. (Eds.),
Sentiment Analysis in Social Networks, New York: Morgan Kaufmann, pp. 1–11.
Qazi, Atika, Ram Gopal Raj, Glenn Hardaker, and Craig Standing (2017) “A Systematic
Literature Review on Opinion Types and Sentiment Analysis Techniques: Tasks and
Challenges.” Internet Research, vol. 27, pp. 608–630.
Qiu, Guang, Bing Liu, Jiajun Bu, and Chun Chen (2009) “Expanding Domain Sentiment
Lexicon through Double Propagation.” IJCAI, vol. 9, pp. 1199–1204.
Rabiu, Idris, Naomie Salim, Maged Nasser, Faisal Saeed, Waseem Alromema, Aisha Awal,
Elijah Joseph, and Amit Mishra (2022) “Ensemble Method for Online Sentiment
Classification Using Drift Detection-Based Adaptive Window Method.” In International
Conference on Reliable Information and Communication Technology, pp. 117–128.
Ruz, Gonzalo A., Pablo A. Henríquez, and Aldo Mascareño (2022) “Bayesian
Constitutionalization: Twitter Sentiment Analysis of the Chilean Constitutional Process
through Bayesian Network Classifiers.” Mathematics, vol. 10, no. 2, pp. 166–172.
Shaaban, Mai A., Yasser F. Hassan, and Shawkat K. Guirguis (2022) “Deep Convolutional
Forest: A Dynamic Deep Ensemble Approach for Spam Detection in Text.” Complex &
Intelligent Systems, vol. 8, no. 6, pp. 4897–4909.
Shah, Parita, Priya Swaminarayan, and Maitri Patel (2022) “Sentiment Analysis on Film
Reviews in Gujarati Language Using Machine Learning.” International Journal of
Electrical and Computer Engineering, vol. 12, no. 1.
Shukla, Priyanka, and Adarsh Garg (2022) “Sentiment Analysis of Online Learners in
Higher Education: A Learning Perspective through Unstructured Data.” In S. Pathak et
al. (Eds.), Intelligent System Algorithms and Applications in Science and Technology. New
York: Apple Academic Press, pp. 157–170.
Singh, Vishakha, Sameer Shrivastava, Sanjay Kumar Singh, Abhinav Kumar, and Sonal
Saxena (2022) “StaBle-ABPpred: A Stacked Ensemble Predictor Based on Bilstm and
Attention Mechanism for Accelerated Discovery of Antibacterial Peptides.” Briefings in
Bioinformatics, vol. 23, no. 1.
Valle-Cruz, David, Vanessa Fernandez-Cortez, Asdrúbal López-Chau, and Rodrigo Sandoval-
Almazán (2022) “Does Twitter Affect Stock Market Decisions? Financial Sentiment
Analysis During Pandemics: A Comparative Study of the HLNL and the Covid-19
Periods.” Cognitive Computation, vol. 14, no. 1, pp. 372–387.
Vosoughi, Soroush, Deb Roy, and Sinan Aral (2018) “The Spread of True and False News
Online.” Science, vol. 359, no. 6380, pp. 1146–1151.
Wang, Gang, Jianshan Sun, Jian Ma, Kaiquan Xu, and Jibao Gu (2014) “Sentiment
Classification: The Contribution of Ensemble Learning.” Decision Support Systems, vol.
57, pp. 77–93.
Wu, Haiyan, Zhiqiang Zhang, Shaoyun Shi, Qingfeng Wu, and Haiyu Song (2022)
“Phrase Dependency Relational Graph Attention Network for Aspect-Based Sentiment
Analysis.” Knowledge-Based Systems, vol. 236, 107736.
Wu, Jian, Xiaoao Ma, Francisco Chiclana, Yujia Liu, and Yang Wu. (2022) “A Consensus
Group Decision Making Method for Hotel Selection with Online Reviews by Sentiment
Analysis.” Applied Intelligence, vol. 52, no. 9, pp. 1–25.
Chapter 15
Economic Sustainability,
Mindfulness, and
Diversity in the Age
of Artificial Intelligence
and Machine Learning
Ranjit Singha and Surjit Singha
all this, the capitalist can use automation and continue the traditional business in an automated manner and make a profit. Handloom weaving and pottery making are examples of such industries. However, it is widely assumed that automation cannot truly replace manual labour. If we consider a fabric's durability, this is most likely correct: a hand-made cloth is much more durable and can last a lifetime, but today's consumers are looking for good looks, low prices, and constant fashion changes. Creating a hand-made fabric takes a lot of labour, enthusiasm for the product, love for the customer, and attention to detail. Automated technology has replaced traditional weaving, and similar fabrics can be made in an hour instead of a week. This is a problem for weavers because machine-made cloth is less expensive and can be more beautiful than hand-made cloth; the only real difference is durability, since machine-made cloth will not last as long as hand-made cloth, and customers may not understand the distinction between the two. With the advent of technology, automated cloth manufacturers will find ways to sell their products, even if there are some legal restrictions on selling machine-made cloth in some geographical areas.
The vicious cycle that the weavers suffer does not tend to stop because there is
too much exploitation in the whole process. Today’s social climate has deteriorated
where a weaver who maintains traditional looms faces difficulty finding a suitable
bride or groom because the sustainability of that career is in doubt. Such social
issues can be resolved by incorporating technology into the entire system; pro-
duction will increase if traditional looms can be automated. Thus, a lower-priced
product can be sold; however, money will be required to implement such tech-
nology. Many looms are automated; the owners provide houses and looms; skilled
professionals work as labourers and are paid on a per-sari basis. The problem with modern technology is that the poor can learn the technology and gain the skills, but they cannot afford to purchase the technology. Most weavers in India are unaware of intellectual property rights (IPR) and do not hold the copyright or design patents for their designs. Most of their designs have already been filed for copyright, and the design patent holders are large corporations, leaving weavers with no choice but to work, for someone else, with the same designs they are familiar with. Even if weavers are given the technology and skills, they do not have the legal right to create the same design. The solution is for them to learn how to file copyright, patents, and the other legal aspects of the entire process, or for someone to assist them in the whole process. The filing of copyright, trademarks and patents in India is a complicated process that needs to be simplified for the weavers. Obtaining a
digital signature could be automated using the PAN card number or AADHAAR
card number, thereby reducing the cost of purchasing a separate digital signature. This
legal process of obtaining a digital signature, patent drafting, and copyright filing
may require legal professionals, but all of this can be done individually if NGOs,
government bodies, or institutions provide real-time training in patent drafting and
copyright filing.
In most cases, NGOs and government agencies wash their hands of the matter
simply by raising awareness and submitting a report. This doesn’t help, and most of
the success stories submitted by various agencies are fabricated. The process of copy-
right filing and patent filing can be automated in a simplified manner so that any
citizen should be able to file a copyright or patent.
and listening with AI, machine learning, and beyond, the digital future has the
potential to incorporate real-time emotions and feelings. This would entail emo-
tional responses on both ends and a variety of other technologies and users, but each
of these carries its own set of risks. Through AI and ML, it is possible to cultivate
mindfulness and an awareness of diversity and communal harmony, as AI and ML
are capable of inferring the emotional and cognitive states of the people with whom
they interact. Artificial intelligence and machine learning could be used to educate
weavers and farmers about their legal rights, cultivation methods, banking processes,
and the harmful effects of tobacco use and other health-related issues.
making agarbatti, and there are a good number of NGOs associated with such
activity. Agarbatti making may appear to be a solution in the short term, but it is harmful in the long run; so far, in our observation, no funding agency has been found sponsoring biri making or cigarette-rolling training.
Some trades will be rendered obsolete by automation, and incorporating skill
training for such businesses into livelihood training is a waste of time because the
product will not maintain a market share due to price competition. Automation
combined with AI and ML will result in a higher quality product with a better
design in less time and at a lower cost. As a result, labour costs will be reduced, as
will the product price. It may appear that funding such traditional trades will benefit
conventional skilled workers, but this is not the case until and unless they incorp-
orate automation, AI, and ML into their entire process. During the COVID-19
pandemic, people’s purchasing power decreased; they are looking for high-quality
products and better designs at lower prices.
Before providing any CSR funding, the sponsor should screen to see if an NGO
or institute is involved in agarbatti making livelihood training or if the NGOs are
providing skill training in specific trades that are already being automated. In such
a case, no organization should provide CSR funding. The hidden reality of the time
is that industries are looking for free training through NGOs in order to obtain a skilled labour force. The industry is attempting to solve its problem of skilled-labour scarcity by obtaining free professional training through CSR funding, because this saves it money on training. While resolving the industrial problem of a lack of skilled labour will undoubtedly provide a nominal wage-based income for an individual or family, this is not the ultimate solution. Process-based training may not help because, often, the process and product are patented and the design is copyrighted or registered. So, despite being highly skilled in a process for an extended period, individuals may be unable to leave that employment and start their own venture in the same trade in which they had been employed. Integrating skill-based
training about new technology into training is a solution; however, adopting such
technology for real-time manufacturing for cottage-based industry is impossible
until and unless such technology is provided with a high subsidy.
thus many farmers are using modern technology. There is a lack of understanding
of contemporary technology and scientific methods in organic farming. The region
of Himachal Pradesh is well known for its organic farming because chemical-based
farming is prohibited by law in that state. J&K is famous for its apple production;
however, it is less well known that apple production is also possible in Arunachal
Pradesh. Commercial cultivation of apples is not common in Arunachal Pradesh;
however, the geographical area has the potential for commercial apple cultivation. AI can help with harvesting, identify different pests, plant diseases and poor crop nutrition, detect weeds, and spray herbicides around them. In India, the purchase of agricultural technology is subsidized; however, many farmers are unaware that such technology exists.
Implementing modern technology is doable and affordable due to available sub-
sidies. One of the main issues is a lack of electricity in India’s remote areas. Most
high-end technology requires electricity to charge the battery. If such technology
were compatible with solar power, it would be possible for Indian villages to apply
such technology. There is a possibility of raising awareness for such a programme.
Since the early 1990s, the Government of India has used television to broadcast
various farmer-related programmes, including agricultural news. There is still room
to raise awareness about modern technology in agriculture. India is a country whose
entire climatic condition is influenced by the Himalayan region and deserts; thus,
different climatic conditions exist across India’s diverse geography. The problem is
how to deploy AI with 100 per cent accurate output because each region would
require a different database according to the climatic conditions there. There would
be variations in the pattern of plant diseases. Such a database, specifically on a
regional basis, does not exist in India; we have to depend upon an international
database for prediction, which might not be 100 per cent accurate, considering
the varieties of crops and climatic conditions in India. This raises the question of investment in such technology: if the output contains many errors, it would waste time and money and bring the risk of wrong diagnosis and treatment. There
is a need for an Indian regionally-based database of different plant species, images
of diseases, pathogens, and pests. The regional-based database must be built and
expanded over time. It requires a dedicated R&D team, technology, investment,
and significant funds.
are associated with occupational hazards and indoor air pollution (Lin et al., 2008).
CO, CO2, NO2, SO2, and other gases are produced when incense is burned.
In addition to volatile organic compounds such as benzene, toluene, and xylenes,
incense burning produces aldehydes and polycyclic aromatic hydrocarbons (PAHs).
Air pollution is significant in all areas where agarbatti burning occurs and is hazardous to one's health. Making incense sticks out of waste flowers may appear to be a solution
in many cases, but it is not; it has the same potential to pollute the air (Gallego et al.,
2011). The best way to manage flower waste is to dispose of it in the soil and convert
it into plant manure. Making agarbatti from waste flowers is the wrong solution;
instead, it adds to the existing problem. The fragrance of the agarbatti releases vola-
tile organic compounds (VOCs), which are hazardous to health (DiGiacomo et al.,
2018). Environmental tobacco smoke is harmful to health. It is associated with
cardiovascular disease. The only possible difference between agarbatti and tobacco
smoke is that tobacco smoke causes addiction because of nicotine, whereas agarbatti
smoke may not cause any addiction.
There are smart cities across the globe that use Deep Learning and AI to detect pollution. The CNN-LSTM (CNN Long Short-Term Memory Network) is already deployed in China to detect such pollution, and a hybrid multivariate CNN-LSTM could be deployed to produce even more accurate predictions. Deep Learning (DL) algorithms may include Long Short-
Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Gated Recurrent Unit
(GRU), Bidirectional GRU (Bi-GRU), Convolutional Neural Network (CNN),
and a hybrid CNN-LSTM model, Denoising Auto-Encoders (DAE). In an ideal scenario, the LSTM model is observed to be more accurate than the DAE model because it stacks more layers. LSTM is a type of recur-
rent neural network (RNN) that addresses the vanishing gradient problem, enabling
the network to retain long-term dependencies in sequential data. Bi-LSTM extends
the LSTM model by processing the input sequence in both forward and back-
ward directions, allowing the network to capture information from past and future
contexts. GRU is another type of RNN that is similar to LSTM but has a simpler
architecture. It also addresses the vanishing gradient problem and can capture long-
term dependencies. Similar to Bi-LSTM, Bi-GRU processes the input sequence in
both directions and captures information from both past and future contexts. CNN
is a deep learning model commonly used for image processing tasks. It uses convo-
lutional layers to automatically learn hierarchical representations from input data.
The hybrid CNN-LSTM model combines the strengths of CNNs and LSTMs by using CNN layers for feature extraction from input data and LSTM layers for sequential processing and capturing dependencies. DAE is a type of
unsupervised learning model that learns to remove noise from input data. It consists
of an encoder network that maps input data to a latent space representation and a
decoder network that reconstructs the original data from the latent representation.
These DL algorithms encompass a wide range of models and there are many other
variations and architectures available, depending on the specific problem and data
characteristics.
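To make the hybrid CNN-LSTM idea concrete, the following is a minimal sketch, assuming TensorFlow/Keras is available and that the multivariate pollutant readings have already been windowed into arrays of shape (samples, timesteps, features); the layer sizes, window length, and variable names are illustrative assumptions, not taken from any deployed system.

```python
# Minimal hybrid CNN-LSTM sketch for multivariate air-quality forecasting.
# Assumes windowed data: X has shape (samples, timesteps, features),
# y holds the pollutant value to predict one step ahead.
import numpy as np
from tensorflow.keras import layers, models

timesteps, features = 24, 6          # e.g., 24 hourly readings of 6 pollutants

model = models.Sequential([
    # Conv1D learns local patterns across the time window
    layers.Conv1D(32, kernel_size=3, activation="relu",
                  input_shape=(timesteps, features)),
    layers.MaxPooling1D(pool_size=2),
    # LSTM captures longer-range temporal dependencies
    layers.LSTM(64),
    layers.Dense(1),                  # predicted pollutant concentration
])
model.compile(optimizer="adam", loss="mse")

# Synthetic stand-in data, only to show the expected shapes
X = np.random.rand(1000, timesteps, features)
y = np.random.rand(1000, 1)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```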
However, a model with more LSTM layers is harder to train, even though adding layers ideally improves the accuracy of the results. The same methods can be used in India to predict better output. India is moving towards smart cities and villages, but for the time being, the effort is limited to road connectivity, electricity in every home, and solar street lights. India has launched the Digital India and Make in India initiatives. In Make in
India, the idea is to collaborate with technology from various parts of the world
and assemble products in India. However, the issue of technology compatibility
affects the whole process, which has its own merits and demerits. AI and ML could
be used to raise awareness about the detrimental effects of tobacco use and other
health-related issues associated with the use of agarbatti and other pollution con-
trol mechanisms.
15.7 Diversity
India is a diverse country with over 1,652 languages and dialects, and there may
be more languages in India that are not yet known. Any CSR funding should not
undermine the country’s existing diversity, and all training activities must consider
diversity as an essential component of training. There should be diversity among
the trainees, including males, females, and other genders. It must consider all
ethnicities, religions, and cultures. This could potentially become one of the elem-
ents of success in any venture because united we are strong, and divided we fall.
However, various NGOs focus on empowering only one specific community, religion, or gender, because they were established to serve the growth and development of a particular district. It is possible that they make a mistake by ignoring diversity and its repercussions. If they avoid diversity in training,
there is a risk of losing creativity and innovation throughout the process. Deploying AI can help address racial discrimination, since the first level of initial blockage can be removed in the process. There is a possibility of discrimination based on
race, colour, gender, nationality, religion and other diversity factors, but there are
15.8 Mindfulness
Being mindful could be the possible solution to most of the issues; in the first place,
burning agarbatti is harmful to one’s health, and many people are unaware of it.
Agarbatti has never been labelled as dangerous to health. Millions of people indeed
depend upon the agarbatti industry, and it’s profitable; it’s also associated with faith
and religion (Tang et al., 2013). Recovery from tobacco addiction is possible by
applying brief mindfulness (Ehrlich, 2015), and the practice of mindfulness can
create mindful leaders, whereby their decisions could be conscious decisions, which
could eventually help make a better policy (Wamsler et al., 2018). Mindfulness is
considered a sustainable science (Wilson et al., 2019). In mindfulness, practising
self-compassion could be much more helpful in the whole process, so integrating
self-compassion into mindfulness could be a practical solution. AI and ML can play
a role in the guided meditation process. ML can learn a user’s likes and dislikes regarding tone, type of voice, and sound. AI can then deliver the
guided meditation in the appropriate style required by the user, making any inter-
vention more effective. There is a potential danger in this process because once AI
learns the preferences of voice and sound, it can deliver any content it wants. There
is a possibility of influencing any person in the world by making things happen
accordingly. Artificial intelligence and machine learning could be used to educate
about mindfulness and conduct mindfulness-based sessions. Web, Android, and
iOS apps now provide guided meditation and mindfulness exercises; additional biofeed-
back can be tracked through artificial intelligence and machine learning. AI and
machine learning can create harmony in music, arithmetic, and art. For instance,
musical notes such as C and G sound harmonious when combined because their frequency ratio is a simple fraction: G/C = 392 Hz / 262 Hz ≈ 3/2 (Berberich et al., 2020).
Verbeek (2005; 2011; 2016) developed the concept of mediation theory in tech-
nology. Additionally, a blue-green deployment model can be used to maintain two
independent infrastructures or duplicate feature stores. The blue-green deployment
model is a software release management strategy that aims to minimize downtime
and risk during the deployment of new features or updates to a production envir-
onment. In this model, two independent infrastructures, referred to as “blue” and “green”, are set up, while only one of them acts as the single production environment serving feature and prediction requests at any given time.
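As a rough illustration of the idea (not any particular vendor’s tooling), the following sketch routes prediction requests to whichever of two hypothetical model endpoints is currently live and lets an operator switch environments in one step; the endpoint URLs and function names are invented for the example.

```python
# Minimal blue-green routing sketch for a model-serving setup.
# The two endpoint URLs are hypothetical placeholders.
ENDPOINTS = {
    "blue": "http://models.internal/blue/predict",
    "green": "http://models.internal/green/predict",
}
active = "blue"  # environment currently serving production traffic

def predict_url() -> str:
    """Return the endpoint of the environment that serves live traffic."""
    return ENDPOINTS[active]

def switch_traffic() -> str:
    """Cut production traffic over to the idle environment, e.g. after testing."""
    global active
    active = "green" if active == "blue" else "blue"
    return active

# Example: deploy a new model to 'green', validate it, then switch.
print(predict_url())   # blue endpoint
switch_traffic()
print(predict_url())   # green endpoint
```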
15.9 Conclusion
Human diversity and mindfulness are necessary for the sustainable implementation
of artificial intelligence (AI) and machine learning (ML). This chapter has explored
the various ways in which AI and ML can collaborate with humans to advance
society. These include assisting with copyright and design patent registrations,
cultivating mindfulness, educating weavers and farmers on their legal rights and
References
Berberich, N., Nishida, T., and Suzuki, S. (2020) “Harmonizing Artificial Intelligence for
Social Good.” Philosophy & Technology, 33(4): 613–638. https://doi.org/10.1007/s13
347-020-00421-8
DiGiacomo, S. I., Jazayeri, M., Barua, R. S., and Ambrose, J. A. (2018) “Environmental
Tobacco Smoke and Cardiovascular Disease.” International Journal of Environmental
Research and Public Health, 16(1): 96. https://doi.org/10.3390/ijerph16010096
Ehrlich, J. (2015) “Creating Mindful Leaders and Organizations.” People & Strategy, 38(3).
https://link.gale.com/apps/doc/A425111993/AONE
Gallego Piñol, E., Roca Mussons, F. J., Perales Lorente, J. F., and Guardino Solà, X. (2011)
“Simultaneous Evaluation of Odor Episodes and Air Quality in Urban Areas by Multi-
Sorbent Sampling and TD-GC/MS Analysis.” Reporter, 48: 3–5. http://hdl.handle.net/
2117/15563
Huang, H., and Nishida, T. (2013) “Evaluating a Virtual Agent Who Responses Attentively
to Multiple Players in a Quiz Game.” Journal of Information Processing, 8(1): 81–96.
https://doi.org/10.11185/imt.8.81
Lin, T., Krishnaswamy, G., and Chi, D. S. (2008) “Incense Smoke: Clinical, Structural and
Molecular Effects on Airway Disease.” Clinical and Molecular Allergy, 6. https://doi.org/
10.1186/1476-7961-6-3
Suzuki, S. (2010) Takt in Modern Education. Münster: Waxmann.
Suzuki, S. (2019) “Etoku and Rhythms of Nature”. In J. R. Resina and C. Wulf (eds)
Repetition, Recurrence, Returns. Lanham, MD: Lexington Books, pp. 131–146.
Tang, Y., Tang, R., and Posner, M. I. (2013) “Brief Meditation Training Induces Smoking
Reduction.” Proceedings of the National Academy of Sciences of the United States of America,
110(34): 13971–13975. https://doi.org/10.1073/pnas.1311887110
Verbeek, P. P. (2005) What Things Do: Philosophical Reflections on Technology, Agency, and
Design. Philadelphia, PA: The Pennsylvania State University Press.
Verbeek, P. P. (2011) Moralizing Technology: Understanding and Designing the Morality of
Things. Chicago: University of Chicago Press.
Verbeek, P. P. (2016) “Toward a Theory of Technological Mediation: A Program For Post
Phenomenological Research.” In J. K. Berg, O. Friis and Robert C. Crease (eds),
Technoscience and Post Phenomenology: The Manhattan Papers. London: Lexington
Books, pp. 189–204.
Višić, B., Kranjc, E., Pirker, L., Bačnik, U., Tavčar, G., Škapin, S. D., and Remškar, M.
(2018) “Incense Powder and Particle Emission Characteristics During and After Burning
Incense in an Unventilated Room Setting.” Air Quality, Atmosphere & Health, 11(6):
649–663. https://doi.org/10.1007/s11869-018-0572-6
Wamsler, C., Brossmann, J., Hendersson, H., Kristjansdottir, R., McDonald, C. P., and
Scarampi, P. (2018) “Mindfulness in Sustainability Science, Practice, and Teaching.”
Sustainability Science, 13(1): 143–162. https://doi.org/10.1007/s11625-017-0428-2
Wilson, A. F., Mackintosh, K., Power, K., and Chan, S. W. Y. (2019) “Effectiveness of Self-
Compassion Related Therapies: A Systematic Review and Meta-Analysis.” Mindfulness,
10(6): 979–995. https://doi.org/10.1007/s12671-018-1037-6
Chapter 16
Adopting Streaming
Analytics for Healthcare
and Retail Domains
G. Nagarajan, Kiran Singh, T. Poongodi,
and Suman Avdhesh Yadav
16.1 Introduction
Streaming analytics is the implementation of analytics in streaming real-time data
for crucial applications that need real-time information for decision-making. In most
streaming analytic applications, the transmission latency is minimal, sometimes less than 1 millisecond, but it is never zero [1]. The strength of the
system is revealed with the precision of the real-time information that is collected.
Streaming analytics is based on streaming data processing, which processes data in real time, one unit at a time, and on low-latency analytics that delineate the
internal events as they occur. Such processes must be continuous, immune to tech-
nical glitches and capable of reconstructing the data transfer if interrupted by failure
of either internal nodes or processes.
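The following is a minimal, self-contained sketch of the one-unit-at-a-time processing style described above; the event source is simulated with a generator, and the alert threshold is an arbitrary illustrative value.

```python
# One-event-at-a-time stream processing with a running aggregate.
import random
import time

def event_stream(n=10):
    """Simulated source of real-time readings (stand-in for a message bus)."""
    for i in range(n):
        yield {"id": i, "value": random.uniform(0, 100)}
        time.sleep(0.01)          # pretend events arrive continuously

count, total = 0, 0.0
for event in event_stream():
    count += 1
    total += event["value"]
    running_mean = total / count  # state is updated per event, not per batch
    if event["value"] > 90:       # illustrative low-latency alert rule
        print(f"alert: event {event['id']} value {event['value']:.1f}")
    print(f"after event {event['id']}: running mean = {running_mean:.2f}")
```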
Healthcare is an important example of applications of streaming analytics.
The healthcare databases process massive heterogeneous data in real time that is
being shared on an extended network. This database is used by hospitals, medical
practitioners, researchers, insurance companies, government authorities and many
others. Healthcare industries have grown exponentially after implementing Big
Data analytics [2]. Researchers’ emphasis on the inspection of data is on identifying diseases on the basis of changing physiology. Understanding the relations,
16.2.1 Patient Data
Electronic health records (EHRs) are a computerized representation of a patient’s medical history. An EHR includes a comprehensive variety of infor-
mation relevant to a case’s treatment, such as demographics, issues, medications,
physician’s observations, vital signs, medical records, lab results, radiological reports,
patient records, and billing information. Many EHRs go beyond a patient’s health
or treatment history to provide new insights into their care. EHRs allow physicians
and associations to communicate information efficiently and effectively. In this con-
text, EHRs are built from the ground up to be in real time, and they may be input
and changed immediately by authorized practitioners. It is quite beneficial in a var-
iety of practical situations. As an example, a health center or specialists may seek to
incorporate the health records of the main practitioner into their system of record.
An EHR system simplifies the process by enabling immediate access to the simpli-
fied information in real time [5]. It can assist other care-related elements such as
evidence-based decision assistance, standard operating procedures, and problem reporting.
EHRs improve the efficiency with which wellness data is stored and recovered.
Patient care is improved by increasing patient engagement in the treatment process.
It also helps to improve the accuracy of judgments and health conditions and the
overall quality of care cooperation.
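As an illustration of programmatic, real-time access to records, the sketch below queries a hypothetical FHIR-style REST endpoint for a patient resource; the server URL and patient ID are placeholders, and nothing here reflects a specific EHR vendor’s API.

```python
# Fetch a patient record from a hypothetical FHIR-style endpoint.
import requests

BASE_URL = "https://ehr.example.org/fhir"   # placeholder server
PATIENT_ID = "12345"                        # placeholder identifier

resp = requests.get(f"{BASE_URL}/Patient/{PATIENT_ID}",
                    headers={"Accept": "application/fhir+json"},
                    timeout=5)
resp.raise_for_status()
patient = resp.json()

# Pull a few commonly present demographic fields (structure varies by server).
name = patient.get("name", [{}])[0]
print(name.get("family"), patient.get("birthDate"), patient.get("gender"))
```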
Medical imaging modalities include PET and ultrasound, among others [6]. These images are very helpful in the examination of the
patients’ internal body organs without any impact on their health condition. The
medical practitioners get a better understanding of the cause of the ailment and
response of the patient’s body to the prescribed treatment. The primary purpose of
these imaging methods is to collect real-time information and assist the physician to
treat their patients in critical situations, which sometimes can be life-threatening.
However, there are numerous challenges in capturing and transmitting these critical images, since complex, diverse, and noisy images fall into this category. Object identification, categorization, recognition, and feature extraction are broad areas of image analysis research challenges. These concerns may provide crucial analytical measures used in other elements of healthcare information analytics. Once all of these issues are handled, the development of relevant analytic measures used as input to other aspects of healthcare data analytics will be possible.
16.3 Retail Domain
16.3.1 Retail Industry
Retailing is one of the most successful and diverse industries in the world. There are various sorts of retail outlets, such as supermarkets, pharmacies, department stores, beauty stores, and so on, and the neck-and-neck competition among retailers keeps intensifying the rivalry. This rivalry takes the shape of price reductions and other incentives offered by stores of the same kind to entice shoppers to buy. Over time, commerce has transformed disordered stacks into nicely organized storefronts [10].
However, the fundamental difficulty for today’s merchant is that the contem-
porary client is concerned with quality and brand. They can evaluate the services
supplied by numerous stores from the comfort of their own homes with a simple click
on a computer or smartphone. As a result, clients choose to buy from e-commerce
websites rather than physically visiting a retail shop, resulting in a decrease in sales
for merchants, who are now facing a major challenge from online competitors. The
result is that retail establishments are obliged to strive diligently to meet and exceed
their customers’ expectations by bringing all of their desired items together in one
location. As a result, the key goal for today’s merchants is to maintain their existing
client base while avoiding being swayed by intense competition.
Streaming analytics may assist in the transformation of business insights by
obtaining data from within the organization, such as enterprise resource planning
(ERP) and investment management, and from other sources, such as network edge
and public assets, and connecting it to the ultimate decision. From acquiring historical
information to the most recent data, streaming analytics may assist merchants in
navigating existing issues and anticipating any upcoming problems.
16.3.3.1 Inventory Tracking
■ Electronic data interchange (EDI): Computer-to-computer interactions from
the shop to the suppliers’ databases and online ordering.
■ Hand-held inventory devices used wirelessly: Maintain an inventory and upload
the information to the database at headquarters.
■ Universal product code (UPC): Products are identified and tracked via the use
of barcodes and unique numbers.
■ Automatic replenishment: Handles the refilling of items that have been sold.
■ Virtual shelves: Intranets between merchants and providers allow faster com-
munication and more accurate inventory management.
16.3.3.2 Customer Service
■ CRM software: Shoppers’ information may be gathered by merchants using
this tool.
■ CD-ROMs at the register: Allow sales representatives to place specific orders
on the spot. In addition, sales training is provided to sales workers on
the floor.
■ Kiosks: Deliver product information to consumers.
■ Point-Of-Sale (POS) terminals: Coupons and reports may be printed, frequent
buyer discounts can be calculated, buyer identity information is captured,
labor hours can be scheduled, and email connections between stores and
headquarters are also used.
■ Fingerprint technology: Input for credit card purchases at the point of sale ter-
minal. A digital copy of the receipt is preserved.
■ E-commerce technology: It enables companies and consumers to interact any-
where throughout the world.
16.3.3.3 Data Warehousing
■ Executive Information Systems (EIS): Create charts of detailed data to aid retail
professionals in business decisions in their various sectors by analysing and
visualizing the data.
creative sources, they all play a critical role in ensuring that they obtain the informa-
tion necessary to provide correct and full sales analytics:
Data analysis using Big Data has been hailed as a revolutionary technology that can
transform the retail business. Despite the fact that data management techniques have
several advantages, an increasing number of firms are using data analysis over large datasets to gain insight into their processes and boost their revenue-generation capabilities [12].
Table 16.1 Main Platforms Used in Streaming Analytics (platform, streaming analytics project, and description)

■ Apache (Apex): Apex is a Hadoop platform powered by YARN. It can handle unbounded data sets.
■ Amazon (Firehose): Amazon Firehose makes it possible to link data streams into current business intelligence tools, analytical edges, and warehouses. It assists in retrieving data and integrating it into current data warehousing systems.
■ Azure (Azure Stream Analytics): It works with Power BI to analyse data. Both systems are cloud-managed, and it combines data from numerous sources and delivers low-latency processing.
■ Azure (Power BI): Power BI is a public analysis tool. It is possible to integrate Stream Analytics with Power BI to update dashboards and create visualizations using interactive components and settings.
■ Google Cloud (Google Cloud Stream Analytics): A specialized engine for data input, computing, and analysis is built into it, and it offers capabilities equivalent to Amazon Simple Storage Service.
■ Oracle (Oracle Stream Analytics): Its cloud-based platform provides stream ingestion, processing, and visualization tasks in a single environment.
■ IBM (IBM Streaming Analytics): IBM Streaming Analytics allows you to construct real-time analytics apps, with ingestion, transformation, and analysis of data using IBM Streams. Streams may be implemented on-premises or on the IBM Cloud.
NoSQL database or a memory filter to store results, send out alerts to the appro-
priate groups, and ingest messages. To keep track of message offsets, producers and consumers use ZooKeeper [13].
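For concreteness, here is a minimal consumer sketch in the spirit of the pipeline described above, assuming the kafka-python client, a broker on localhost, and a hypothetical topic of JSON-encoded vital-sign readings; the topic name and alert threshold are invented for the example.

```python
# Consume a stream of vital-sign messages and raise simple alerts.
import json
from kafka import KafkaConsumer   # assumes the kafka-python package

consumer = KafkaConsumer(
    "patient-vitals",                               # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    reading = message.value                         # e.g. {"patient": "p1", "hr": 128}
    if reading.get("hr", 0) > 120:                  # illustrative alert rule
        print(f"alert: patient {reading['patient']} heart rate {reading['hr']}")
```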
■ Data collection and ingestion: This can begin with the use of a streaming
collection and ingestion platform capable of handling a wide range of
waveforms with varying degrees of precision, accuracy, and reliability. Dynamic
waveform data must be integrated into the electronic health record (EHR).
Analytics engines can gain situational and contextual awareness by analysing
EHR data. By increasing the amount of data used, it is important to maintain
a healthy balance between predictive analytics’ sensitivity and specificity. The
study’s illness cohort will have a significant impact on signal processing. A large number of target features can be extracted using signal processing algorithms; they are then analysed by a machine learning model that has been pre-trained to produce real-time actionable insights (see the sketch after this list). These breakthroughs
could pave the way for new diagnostics, predictions, and prescriptions. In
response to the data, additional procedures, such as alerts and notifications to
doctors, could be developed. Combining discrete data from various sources
with consistent waveform data is a critical step in developing new diagnostics
and therapies. Before these systems can be used in clinical settings, a number
of system, analytic, and clinical needs must be addressed. When designing
monitoring systems that can accept both continuous and discrete data from
non-continuous sources, there are a number of challenges and solutions to
consider.
■ Data analysis and interpretation: Prior to this study, physiological signal data
collected throughout time was seldom streamed in real time or saved for
an extended period of time. Data captures were frequently time-limited
and could only be recovered if the device-makers provided proprietary soft-
ware and data formats. Even though the majority of major medical device manufacturers have begun to develop interfaces for obtaining data in motion, such as live streaming data from their equipment, the high velocity of such data still raises traditional Big Data challenges. In the current situ-
ation, governing issues such as data protocols, data standards, and worries
about data privacy are all factors. Another impediment to adoption by the
masses of real-time data collecting has been a lack of network bandwidth,
scalability, and cost [17]. Programs that were once specific to medical research communities [18] can now be implemented system-wide [19]. The scientific community
is considering continuous monitoring systems based on data generated by
live monitors [20]. Several indigenous and off-the-shelf attempts have been
made to design and install data collection systems [21]. Thanks to new
industry technologies, data collection from patient monitors is becoming
easier across a wide range of healthcare systems, regardless of the device
vendor.
■ Data retrieval and storage: To handle the huge amount of data on the cloud
and other patient information that could come from a therapeutic setting,
a dependable method of data storage is required. Given how computation-
ally and time-intensive storing and retrieving data can be, it’s vital to have
a storage facility architecture that enables speedy information to be pulled
and saved in response to analytic needs. Because of their ability to keep and
process huge amounts of information, healthcare researchers are increasingly
turning to computer systems: Hadoop, MapReduce, and MongoDB are
examples of such systems [14]. Traditional relational databases can be replaced by document-oriented databases such as MongoDB, which run on a variety of platforms. Interoperable health systems frequently have their own relational
database schemas and data models, making it difficult to share healthcare data
across institutions or conduct research projects. To make matters worse, trad-
itional databases do not support the integration of data from multiple sources
(e.g., streaming waveforms and EHR data). When it comes to healthcare
information, document-based databases like MongoDB can help with high
speed, high availability, and easy scalability. Open-source frameworks such as
Apache Hadoop can now be used to distribute the processing of massive data
sets across a distributed network of computers. MapReduce and Spark are just
two of the numerous computing modules that can run on this massively scal-
able platform [14]. Because it can consume and process streaming data and
provides machine learning and graphing tools, an analytics module like Spark
is ideal for continuous telemetry waveforms. The ultimate goal of using such
tools for researchers is to quickly and efficiently translate scientific discoveries
into medicinal applications, both in real time and in retrospect.
■ Data aggregation: This is the process of merging data from several different
sources. Integrating data from multiple sources and maintaining consist-
ency are just two of the challenges that healthcare data aggregation faces.
Aside from data quality and standardization, two other issues must be addressed. Because medical data is frequently complex, interconnected, and interdependent, it is critical to keep it as simple as possible. Secure medical
data storage, access, and use are critical to the privacy and provenance of
medical records [22]. The analysis of continuous data makes considerable
use of time-related information. The dynamic nature of the time context
during integration may greatly increase the technical challenges when com-
bining measured data with fixed electronic health record data, because static
data does not necessarily provide real-time context. It takes time and effort
to create an open database of waveforms and other electronic medical data
that is accessible to researchers all over the world [23]. MIMIC II [24] and
other datasets contain signals and other medical data from various real-life
populations of patients.
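To tie the steps above together, the following sketch (a simplification, not any specific clinical pipeline) extracts a couple of features from a simulated telemetry waveform with SciPy and scores them with a previously trained scikit-learn classifier loaded from disk; the model file name, the sampling rate, and the feature choices are all assumptions made for illustration.

```python
# Extract simple features from a waveform segment and score them.
import numpy as np
from scipy.signal import find_peaks
import joblib

FS = 250                                    # assumed sampling rate in Hz
segment = np.random.randn(FS * 10)          # stand-in for 10 s of telemetry data

# Feature 1: approximate beat rate from peak detection.
peaks, _ = find_peaks(segment, distance=FS // 3)
beats_per_minute = len(peaks) * 6           # peaks in 10 s scaled to a minute

# Feature 2: signal variability.
variability = float(np.std(segment))

features = np.array([[beats_per_minute, variability]])

# Score with a pre-trained model (hypothetical file produced offline).
model = joblib.load("deterioration_model.joblib")
risk = model.predict_proba(features)[0, 1]
print(f"estimated deterioration risk: {risk:.2f}")
```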
In a current clinical trial, biomarkers for predicting successful extubation in real time can be derived via signal processing of cardiac and breathing signals [29]. An animal
study established the use of non-invasive time series such as oxygen delivery, water
content, and blood circulation as indicators of soft tissue healing in wound care.
Electrocardiographic data collected via telemetry was coupled with demographic
characteristics to develop an in-hospital cardiac arrest early detection system based
on medical record, ejection fraction, lab results, and medicines [30].
The MIMIC II database was used by Lee and Mark [31] to use cardiac and blood
pressure time series data to prompt therapeutic treatment in hypotensive episodes.
Another study [32] found that physiological waveform data could be combined with
clinical data from the MIMIC II database and be used to detect shared characteristics
among patients in specific cohorts. This similarity may improve caregiver decision-
making because caregivers can draw on information gained from similar illness states
to make informed decisions. To identify patients with cardiovascular instability,
a combination of waveform data from the MIMIC II database is used [33]. Pre-
operative care and operating room settings can collect a wide range of physiological
data that can be analysed, as in [34], to track the patient’s condition during and after the
procedure. The authors of [35] demonstrated that machine learning models based on data fusion
can improve diagnosis accuracy when using breath omics biomarkers (in an inhaled
air metabolomics study) as a diagnostic aid.
Neurology researchers have recently become interested in the use of
electrophysiologic monitoring in patient treatment. This monitoring could be used
in the near future to develop novel diagnostic and treatment procedures, as well
as shed new light on difficult illnesses. According to an article [36], many physio-
logical monitoring devices have been developed specifically for the treatment of
patients in need of neurocritical care. Due to the efforts of [37], patients with severe
brain injuries can now benefit from multimodal monitoring and care that is tailored
to their specific needs. Researchers used multimodal brain monitoring to see if
patients’ reliance on mechanical ventilation and hospitalization could be reduced.
The authors of [38] discuss the fundamentals of multimodal monitoring in neurocritical care for
secondary brain damage, as well as the first and future approaches to understanding
and applying such data to clinical care.
Patients and enthusiasts of all ages are using personal monitoring devices, which
are becoming more affordable, portable, and easy to use outside of professional
settings. However, combining data from multiple portable devices at the same
time, as with clinical applications, could be difficult. Pantelopoulos and Bourbakis
[39] examined wearable biosensor research and development. They discussed the
advantages and disadvantages of working in this industry. A network-based tele-
medicine study [40] employs the same portable blood pressure and body weight
equipment. According to [41, 42], a variety of stationary and mobile sensors that
can be used to improve patient care technology can help with healthcare data
mining.
changes from the standard genome code. Changing a single base in a nucleotide sequence
from “ACGTTTGA” to “ACGTCTGA” may have negative health implications.
Interpreting genetic data brings us closer to medicine’s future; it signifies the start
of an age characterized by personalized medicine, in which every human has their
own personal DNA sequence and, thus, their own personal genome, loaded with
numerous mutations, some of which are beneficial and others dangerous. The Human Genome Project (HGP), which was sponsored by the US government, completed its first full sequencing effort in 2003. From 1990 to 2003, the HGP received $3 billion in funding to convert sequencing from a manual to an automated process. Moreover, the HGP discovered that in total there are over 20,500 human genes. The human genome was determined in three ways by scientists working on the HGP,
Genome Wide Association Studies (GWAS) move the Human Genome Project a
step forward by identifying the most common genetic variants in various people and
establishing links among genes and diseases. Some challenges in the manipulation
of genomic data are:
■ Despite the fact that around 6000 Mendelian disorders are examined at the
primary genetic level, the majority of their activities in health and disease
remain unknown.
■ The National Institutes of Health (NIH) Sequence Read Archive (SRA) databases have grown at an exponential pace over the last eight years.
While advances in NGS have made it simple to sequence an entire genome/
exome, the handling, analysis, and interpretation of the genomic data
produced by NGS remain a significant challenge.
Considering that the human genome has about 3 billion base pairs, sequencing a human genome creates hundreds of gigabytes of data in file formats such as BAM (Binary Version of Sequence Alignment/Map) and VCF (Variant Call Format). Moreover, the BAM file size in a sequencing experiment is determined by the coverage and read length. For a single sample of 30x WGS data, the FASTQ file could be around 250 GB, the BAM file could be around 100 GB, the VCF file could be around 1 GB, and the other annotated files could be around 1 GB. Given the estimated file sizes of various NGS data formats, it is time-consuming to generate such files.
Data can be analysed swiftly with the Big Data infrastructures. In comparison to the
original version, a Big Data-based Burrows-Wheeler Aligner (BWA) can increase the alignment speed up to 36-fold.
Currently, most sequencing data analysis methods depend on VCF files, which assume that all “no-call locations” are just like reference alleles. In reality, inadequate coverage can be the cause of several “no-call locations.” As a result, data quality information for each location, which includes coverage and Phred-scaled base quality scores, must be used in the downstream data review to determine whether “no-call locations” are reference-consistent owing to adequate coverage or reference-inconsistent owing to insufficient coverage. Exome sequencing data has been examined using a variety of toolkits for cloud computing, data compression, variant prioritization, data exchange, Copy Number Variation (CNV) identification, and phenotype analysis. Analytical tools that work on VCFs do not generally require such an infrastructure, since VCFs are quite a bit smaller than BAM files.
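As a small illustration of using per-site coverage in the way just described, the sketch below checks depth at a candidate no-call position with pysam; the BAM path, contig name, position, and depth threshold are placeholders chosen for the example, not values from any study.

```python
# Check read depth at a candidate "no-call" site in a BAM file.
import pysam

BAM_PATH = "sample.bam"       # placeholder alignment file
CONTIG, POS = "chr1", 123456  # placeholder site (0-based position)
MIN_DEPTH = 10                # illustrative coverage threshold

bam = pysam.AlignmentFile(BAM_PATH, "rb")

# count_coverage returns per-base counts (A, C, G, T) for the window,
# filtered by a Phred-scaled base-quality threshold.
a, c, g, t = bam.count_coverage(CONTIG, POS, POS + 1, quality_threshold=20)
depth = a[0] + c[0] + g[0] + t[0]

if depth >= MIN_DEPTH:
    print(f"{CONTIG}:{POS} has depth {depth}: likely reference-consistent")
else:
    print(f"{CONTIG}:{POS} has depth {depth}: too little coverage to call")
bam.close()
```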
Researchers are now encountering considerable difficulties in maintaining, pro-
cessing, manipulating, interpreting, and analysing WGS data. These problems will be compounded as millions of people are sequenced, which is one of the targets of the Precision Medicine Initiative (PMI) in the United States and related
work around the world. It may be possible to build a Big Data framework to handle
and interpret large volumes of genomic data that is consistent with clinical workflows
by exploiting the scalability and distribution features in Big Data infrastructures.
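As a sketch of how such a framework might exploit distribution, the snippet below counts variant records in a VCF with PySpark by filtering out header lines; the file path is a placeholder, and this is only a toy stand-in for a full genomics pipeline.

```python
# Count variant records in a VCF file with Spark (header lines start with '#').
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vcf-count").getOrCreate()

lines = spark.read.text("cohort.vcf")                   # placeholder path
variants = lines.filter(~lines.value.startswith("#"))   # drop meta/header lines
print("variant records:", variants.count())

spark.stop()
```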
When handling protected health information, cloud service providers must sign a business associate agreement and comply with the HIPAA privacy, security, and breach notification rules. Managing data access and implementing a user role-based access system are two ways that cloud service providers can follow in order to resolve these concerns.
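A user role-based access system like the one mentioned can be sketched very simply; the roles, permissions, and record actions below are invented solely to show the control flow.

```python
# Minimal role-based access control sketch for genomic records.
ROLE_PERMISSIONS = {
    "clinician":  {"read_record", "annotate_record"},
    "researcher": {"read_deidentified"},
    "admin":      {"read_record", "annotate_record", "manage_users"},
}

def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role is permitted to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

# Example checks
print(is_allowed("researcher", "read_record"))     # False
print(is_allowed("clinician", "annotate_record"))  # True
```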
Some issues that arise while handling genomic data are described below.
16.8.2 Privacy
Privacy is one of the most sensitive issues in data science. In fact, the public’s current
privacy concerns regarding surveillance cameras, financial transactions, and email
communication are crucial nowadays. Privacy disclosures are commonly difficult
to detect due to the cross-referencing of larger datasets; quasi-identifiers are the best example of this. While genomics-based privacy has some distinct characteristics when compared to data science-based privacy issues, the fact is that the genome is passed across generations and is very important to the public. Moreover, revealing
genomic data is more dangerous than leaking other kinds of data. Once a person’s
data or their related variants have been revealed or leaked, they may not be able to
take them back. Finally, genomic data are far larger in size than several other kinds
of personal data; for example, the genome contains much more confidential personal
information than a social security number or credit card.
16.8.3 Data Ownership
Privacy is a major concern in data management. An individual entity or patient owns their personal information, but a coercive trend in biomedical science is that the researcher who produces a dataset manages it. Researchers with large datasets will
have a long history that supports them in analysing the data over the period in order
to discover interesting stories. In particular, health data has commercial and med-
ical value, so corporations and governments often claim control and ownership of
large datasets. According to data miners, all information should be accessible, since
this would allow easy aggregation of massive amounts of data, better statistical ana-
lysis, and optimal outcomes. In fact, combining larger datasets results in increasingly
better genotypes being correlated with phenotypes. Furthermore, even in an ideal case, in which individual users were committed to free access and the dataset became entirely accessible and freely transferred by the users, we estimate that sharing biases would occur, with people being more enthusiastic to share genetic data relating to particular diseases, pathologies, and phenotypes.
Skew in the dataset can be caused by education, socio-economic status, and
healthcare access, all of which can bias mining efforts, including knowledge extrac-
tion. Likewise, around 80 percent of participants in genome-wide association studies
are of European ancestry, despite the fact that this group makes up only 16 percent of
the global population. As a result, complete data sharing is unlikely to be prac-
tical in the future for the best genomic studies. The development of a large private
archive may be one potential technological solution for exchanging genomics data.
This is completely distinct from the World Wide Web, which is essentially a
public resource. To allow data sharing and suggest a way to centralize the computa-
tional storage of larger datasets to maximize performance, a private archive will be
approved for use only by certified researchers. Furthermore, in future, people will be
faced with challenging data science problems, namely sharing constrained data in a
certain context. Finally, data ownership is closely associated with obtaining profits and benefits from the data. The best strategies have to be identified not only to recognize the data generation process, but also to analyse the data efficiently and to fairly reward analysts and data generators.
16.9 Conclusion
This chapter discusses real-time streaming analytics in the healthcare and retail
domain. Developing data analytics is now easier than ever before, thanks to the
digital transformation of business. Data streaming analytics may enhance decision-
making by providing better business insight predictions. Stream analytics’ ultimate
goal is to assist organizations to stay ahead of their competition by allowing them
to make smarter decisions. Streams of data are gathered in retail and healthcare
via different electronic devices such as mobile phones, wireless sensors, etc., linked
through the internet. The healthcare business requires software tools and technology
capable of real-time analysis of massive and diverse data streams. The streaming ana-
lysis technology is used in retail establishments and e-commerce because increasing
client happiness is the key aim. Streaming analytics also helps new companies reach
their target audiences and gain market share. Big Data analytics uses both structured
and unstructured data. It enhances decision-making for both researchers and med-
ical practitioners.
References
1. Kantardzic, M., and Zurada, J. (2016) “Introduction to Streaming Data Analytics
and Applications Minitrack.” In 2016 49th Hawaii International Conference on System
Sciences (HICSS), IEEE, January, pp. 1729–1729.
2. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., and Hung Byers,
A. (2011) Big Data: The Next Frontier for Innovation, Competition, and Productivity. New
York: McKinsey Global Institute.
3. Oyelade, J., Soyemi, J., Isewon, I., and Obembe, O. (2015) “Bioinformatics, Healthcare
Informatics and Analytics: An Imperative for Improved Healthcare System.” International
Journal of Applied Information System, 13(5): 1–6.
4. Yadav, S. A., Sharma, S., Das, L., Gupta, S., and Vashisht, S. (2021) “An Effective
IoT Empowered Real-time Gas Detection System for Wireless Sensor Networks.” In
21. Kaur, K., and Rani, R. (2015) “Managing Data in Healthcare Information Systems:
Many Models, One Solution.” Computer, 48(3): 52–59.
22. Uzuner, Ö., South, B. R., Shen, S., and DuVall, S. L. (2011) “2010 i2b2/VA Challenge
on Concepts, Assertions, and Relations in Clinical Text.” Journal of the American Medical
Informatics Association, 18(5): 552–556.
23. Athey, B. D., Braxenthaler, M., Haas, M., and Guo, Y. (2013) “tranSMART: an Open
Source and Community-Driven Informatics and Data Sharing Platform for Clinical and
Translational Research.” Paper presented at AMIA Summits on Translational Science
Proceedings, 2013, 6.
24. Scott, D. J., Lee, J., Silva, I., Park, S., Moody, G. B., Celi, L. A., and Mark, R. G.
(2013) “Accessing the Public MIMIC-II Intensive Care Relational Database for Clinical
Research.” BMC Medical Informatics and Decision Making, 13(1): 1–7.
25. Belle, A., Kon, M. A., and Najarian, K. (2013) “Biomedical Informatics for Computer-
Aided Decision Support Systems: A Survey.” The Scientific World Journal, 2013.
doi:10.1155/2013/769639
26. Bloom, B. S. (2002) “Crossing the Quality Chasm: A New Health System for the 21st
Century (Committee on Quality of Health Care in America, Institute of Medicine).”
JAMA, Journal of the American Medical Association-International Edition, 287(5): 645.
27. Han, H., Ryoo, H. C., and Patrick, H. (2006) “An Infrastructure of Stream Data
Mining, Fusion and Management for Monitored Patients.” Paper presented at 19th
IEEE Symposium on Computer-Based Medical Systems (CBMS’06), IEEE, June, pp.
461–468.
28. Bressan, N., James, A., and McGregor, C. (2012) “Trends and Opportunities for
Integrated Real Time Neonatal Clinical Decision Support.” In Proceedings of 2012
IEEE-EMBS International Conference on Biomedical and Health Informatics, IEEE, pp.
687–690.
29. Seely, A. J., Bravi, A., Herry, C., Green, G., Longtin, A., Ramsay, T., ... and Marshall,
J. (2014) “Do Heart and Respiratory Rate Variability Improve Prediction of Extubation
Outcomes in Critically Ill Patients?.” Critical Care, 18(2): 1–12.
30. Attin, M., Feld, G., Lemus, H., Najarian, K., Shandilya, S., Wang, L., ... and Lin, C. D.
(2015) “Electrocardiogram Characteristics Prior to In-Hospital Cardiac Arrest.” Journal
of Clinical Monitoring and Computing, 29(3): 385–392.
31. Lee, J., and Mark, R. G. (2010) “A Hypotensive Episode Predictor for Intensive
Care Based on Heart Rate and Blood Pressure Time Series.” Paper presented at 2010
Computing in Cardiology Conference, IEEE, September, pp. 81–84.
32. Sun, J., Sow, D., Hu, J., and Ebadollahi, S. (2010) “A System for Mining Temporal
Physiological Data Streams for Advanced Prognostic Decision Support.” Paper
presented at 2010 IEEE International Conference on Data Mining, IEEE, December,
pp. 1061–1066.
33. Cao, H., Eshelman, L., Chbat, N., Nielsen, L., Gross, B., and Saeed, M. (2008)
“Predicting ICU Hemodynamic Instability Using Continuous Multiparameter Trends.”
Paper presented at 2008 30th Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, IEEE, August, pp. 3803–3806.
34. Reich, D. L. (2011) Monitoring in Anesthesia and Perioperative Care. Cambridge:
Cambridge University Press.
35. Smolinska, A., Hauschild, A. C., Fijten, R. R. R., Dallinga, J. W., Baumbach, J., and
Van Schooten, F. J. (2014) “Current Breathomics—A Review on Data Pre-Processing
Techniques and Machine Learning in Metabolomics Breath Analysis.” Journal of Breath
Research, 8(2): 027105.
36. Le Roux, P., Menon, D. K., Citerio, G., Vespa, P., Bader, M. K., Brophy, G. M.,
... and Taccone, F. (2014) “Consensus Summary Statement of the International
Multidisciplinary Consensus Conference on Multimodality Monitoring in Neurocritical
Care.” Neurocritical Care, 21(2): 1–26.
37. Tisdall, M. M., and Smith, M. (2007) “Multimodal Monitoring in Traumatic Brain
Injury: Current Status and Future Directions.” British Journal of Anaesthesia, 99(1):
61–67.
38. Hemphill, J. C., Andrews, P., and De Georgia, M. (2011) “Multimodal Monitoring and
Neurocritical Care Bioinformatics.” Nature Reviews Neurology, 7(8): 451–460.
39. Pantelopoulos, A., and Bourbakis, N. G. (2009) “A Survey on Wearable Sensor-Based
Systems for Health Monitoring and Prognosis.” IEEE Transactions on Systems, Man, and
Cybernetics, Part C (Applications and Reviews), 40(1): 1–12.
40. Winkler, S., Schieber, M., Lücke, S., Heinze, P., Schweizer, T., Wegertseder, D., ... and
Koehler, F. (2011) “A New Telemonitoring System Intended for Chronic Heart Failure
Patients Using Mobile Telephone Technology—Feasibility Study.” International Journal
of Cardiology, 153(1): 55–58.
41. Sow, D., Turaga, D. S., and Schmidt, M. (2013) “Mining of Sensor Data in Healthcare:
A Survey.” In C. Aggarwal (Ed.), Managing and Mining Sensor Data. Cham: Springer,
pp. 459–504.
42. Prasad, S., and Sha, M. N. (2013) “NextGen Data Persistence Pattern in Healthcare:
Polyglot Persistence.” In 2013 Fourth International Conference on Computing,
Communications and Networking Technologies (ICCCNT), IEEE, July, pp. 1–8.
43. Brophy, J. T. (1992) “WEDI (Workgroup for Electronic Data Interchange) co-chair
predicts big savings from EDI.” Computers in Healthcare 13(10): 54– 57. PMID:
10122897.