Top Five Machine Learning Libraries in Python - A Comparative Analysis
Vikrant Bhateja · K. V. N. Sunitha · Yen-Wei Chen · Yu-Dong Zhang (Editors)
Intelligent System Design
Proceedings of INDIA 2022
Lecture Notes in Networks and Systems
Volume 494
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to
both the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Yen-Wei Chen
College of Information Science
and Engineering
Ritsumeikan University
Kusatsu, Shiga, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Conference Organization Committees
Chief Patron
Patrons
Conference Chairs
Organizing Chair
Publication Chair
Organizing Committee
Publicity Committee
Advisory Committee
IT Association, in 2019. She has guided 9 Ph.D.s and is currently guiding 8 research scholars. She has authored 5 textbooks and published more than 150 papers.
Yen-Wei Chen received the B.E. degree in 1985 from Kobe University, Kobe, Japan,
the M.E. degree in 1987 and the D.E. degree in 1990, both from Osaka University,
Osaka, Japan. He was a research fellow with the Institute for Laser Technology,
Osaka, from 1991 to 1994. From October 1994 to March 2004, he was an associate professor and then a professor with the Department of Electrical and Electronic Engineering, University of the Ryukyus, Okinawa, Japan. He is currently a professor with the College of Information Science and Engineering, Ritsumeikan University, Japan. He is also a visiting professor with the College of Computer Science, Zhejiang University, China. He was a visiting professor with Oxford University, Oxford, UK, in 2003 and a visiting professor with Pennsylvania State University, USA, in 2010. His research interests include medical image analysis, computer vision, and
computational intelligence. He has published more than 300 research papers in a
number of leading journals and leading conferences including IEEE Transactions on
Image Processing, IEEE Transactions on SMC, Pattern Recognition. He has received
many distinguished awards including ICPR2012 Best Scientific Paper Award, 2014
JAMIT Best Paper Award, and the Outstanding Chinese Overseas Scholar Fund of the Chinese Academy of Sciences. He is/was a leader of numerous national and industrial research
projects.
Yu-Dong Zhang received his Ph.D. degree from Southeast University, China, in 2010. He worked as a postdoc from 2010 to 2012 and as a research scientist from 2012 to 2013 at Columbia University, USA. He served as a Professor from 2013 to 2017 at Nanjing Normal University, where he was the director and founder of the Advanced Medical Image Processing Group at NJNU. Since 2017, he has served as Full Professor in the Department of Informatics, University of Leicester, UK. His research interests are deep learning in communication and signal processing, and medical image processing.
He was included in “Most Cited Chinese researchers (Computer Science)” from 2015
to 2018. He won “Emerald Citation of Excellence 2017”, and “MDPI Top 10 Most
Cited Papers 2015”. He was included in top scientist list in “Guide2Research”. He is
now the editor of Scientific Reports, Journal of Alzheimer’s Disease, International
Journal of Information Management, etc. He is the senior member of IEEE and
ACM. He has conducted and joined many successful academic grants and industrial
projects, such as NSFC, NIH, EPSRC, etc.
Contributors
1 Introduction
L. K. Kumar (B)
MVGR College of Engineering (A), Vizianagaram, Andhra Pradesh, India
e-mail: [email protected]
P. Srinivasa Rao · S. Sreenivasa Rao
Department of Computer Science & Engineering, MVGR College of Engineering (A),
Vizianagaram, Andhra Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 1
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_1
2 L. K. Kumar et al.
Her symptoms included memory loss, language difficulties, and erratic behavior [5, 6]. Following her death, he examined her brain and discovered numerous abnormal clumps and tangled bundles of fibers. These plaques and tangles in the brain are thought to be among the most significant hallmarks of Alzheimer's disease [7–9]. The loss of connections between nerve cells is another distinguishing feature. Neurons communicate with muscles and organs by sending messages from various parts of the brain, and many other complex brain problems can arise as Alzheimer's disease develops. This damage first manifests itself in the part of the brain known as the hippocampus [10–12], where the brain's memory is stored. As neurons die, other parts of the brain suffer as well. By the end stage of Alzheimer's disease, the damage is extensive and brain tissue has begun to shrink markedly; it is a neurodegenerative disease that causes the brain to gradually atrophy. As the disease progresses, patients experience symptoms such as difficulty with speech and language, mood swings, loss of motivation, self-neglect, and behavioral changes [13–15]. Gradually, function deteriorates, eventually leading to death. A proper Alzheimer's disease diagnosis is made on the basis of the patient's medical history, together with medical imaging and possibly blood tests to rule out other potential causes. Initial symptoms are often misdiagnosed as normal age-related changes in behavior. A detailed assessment of the affected brain tissue is required for a definitive diagnosis, but this can only be done after death. Good nutrition, physical activity, and social connections are known to be beneficial in aging generally, and these may help lower the risk of dementia and Alzheimer's disease [16–18].
Alzheimer's disease is the world's most common neurodegenerative brain disorder. Mini-mental state examination score, age, and gender can be examined, along with brain structure, to estimate how much risk is associated with the disease. Deep learning approaches for the classification process have been demonstrated using MRI data [19–21]. Alzheimer's disease includes a phase of mild cognitive impairment (MCI); because MCI may or may not progress to AD, identifying it is one way of appropriately diagnosing patients. The disease has been identified in a large number of people and primarily affects those over the age of 65. People frequently lose their sense of identity at this stage and gradually lose their memory [22–24]. Patients with Alzheimer's disease and patients with mild cognitive impairment have been classified with good accuracy; such studies were primarily interested in differentiating between people with AD and those with MCI [25–27]. The most common form of dementia is Alzheimer's disease, and it is manageable if mild cognitive impairment symptoms are detected early, so efforts continue to improve the classification and prediction accuracy of Alzheimer's disease. A dataset from the Neuroimaging Initiative has been used to develop a novel method for classifying MCI and normal control subjects using structural magnetic resonance imaging [28–30]. By analyzing the functional and anatomical changes in the brain, a computer-based diagnosis of Alzheimer's disease can be made. Multispectral image fusion fuses complementary information while removing redundant data to produce a single image that contains both spatial and spectral characteristics [31]. The Clinical Dementia Rating (CDR) is a clinical scale used to characterize the severity of dementia: a CDR value of 0 indicates no dementia; however, the CDR
A Framework for Early Recognition of Alzheimer’s … 3
2 Related Work
Many researchers are working to create a model for detecting Alzheimer’s disease
early. Some researchers employ various techniques to detect the presence of
Alzheimer’s disease, such as developing a model and classifying the model to achieve
the best results.
Dill et al. [38] explained that the input image is registered to a template, a difficult process, particularly in diseased brains. Three meta-data parameters, such as age, gender, and range, can be used to improve accuracy. Hippocampus segmentation can be used to achieve better results, and statistical analysis can be applied for the segmentation. Cao et al. [39] noted that MR images contain hippocampus-derived features that are used in computer-aided disease diagnosis. Previously, human annotation was used for hippocampus segmentation, and the respective preprocessing techniques carry a high computational cost. To resolve these problems, a multi-task deep learning method is used for segmentation, together with a regression method of high accuracy; the advantage of this method is that it is not time-consuming. Their study builds a network for joint hippocampus segmentation and clinical score regression. Yalcin et al. [40] proposed a rough set model, a mathematical technique for analyzing clinical data. Physiological characteristics, diagnostics, and neurological function values are included in this work, which is mainly aimed at ill patients. In terms of clinical characteristics, the classification techniques include support vector machine, logistic regression, random forest, and decision trees, applied to data sets that include genome sequences, images, demographic information, diagnostic tests, and environmental data. In order to refer patients to long-term care, they must also address the chronological aspect of the disease progression model.
Shipe et al. [41] observed that prediction models are employed during the diagnostic testing and therapy phases to help healthcare professionals and patients. The patient's state is estimated using risk prediction algorithms; for instance, the TREAT model can forecast whether or not a patient has lung cancer. A prognostic model, such as the ACS surgical risk calculator, predicts the risk factors that will exist after surgery.
Lin et al. [42] noted that, for predicting Alzheimer's disease, contemporary machine learning systems have been proposed in biological domains such as proteomics, genotyping, and systems biology. Their study employed statistical and spectral methods of the arterial pulse to aid in diagnosis. Classifiers such as the J48 decision tree, random forest, Bayes Net, and Ripper rule-based induction are used
3 Proposed Methodology
The proposed model is depicted in Fig. 1. The data set was obtained from the repository. The first step is to pre-process the data. The second step is to apply all classifiers to the pre-processed data. The third step is for the classifiers to divide the pre-processed data set into a training set and a test set. Finally, predictions are made on the data and the results evaluated.
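The four steps just listed can be sketched in a few lines. The sketch below is a minimal, hypothetical illustration using scikit-learn on synthetic data; the actual OASIS columns and classifier settings are not reproduced here.

```python
# Minimal sketch of the four steps on synthetic data; the real OASIS
# columns and classifier settings are assumptions not reproduced here.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=373, n_features=8, random_state=0)

# Step 1: pre-process (here, just feature scaling).
X = StandardScaler().fit_transform(X)

# Steps 2-3: the classifier works on a train/test division of the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 4: predict on held-out data and evaluate.
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {acc:.4f}")
```

The same skeleton applies unchanged when the decision tree is swapped for any of the other classifiers compared in this paper.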
Preprocessing is the first step in any data classification process. Data cleaning, data integration, data transformation, and data reduction are some of the techniques used in preprocessing. Incomplete, noisy, and inconsistent data are common characteristics of real-world data. Data cleaning and cleansing methods aim to fill in missing values, smooth out noise while identifying outliers, and fix data errors. Data can be noisy and attribute values erroneous: data collection instruments may be faulty, data entry mistakes may have been made by humans or computers, and data transfer errors are possible as well. As in data warehousing, data integration is used in data analytic tasks that combine data from numerous sources into a coherent data store. Binning, grouping, and regression are examples of smoothing procedures. Aggregation is the process of applying summary or aggregation operations to data. Using concept hierarchies, low-level or primitive/raw data is replaced with higher-level concepts in generalization of the data (Fig. 1).
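A minimal sketch of these preprocessing techniques — cleaning of missing values, generalization via binning, and aggregation — on a toy table; the column names are illustrative assumptions, not the actual OASIS schema.

```python
import numpy as np
import pandas as pd

# Toy table; the column names are illustrative, not the OASIS schema.
df = pd.DataFrame({
    "Age":  [68, 75, np.nan, 81, 90, 62],
    "MMSE": [29, 23, 27, np.nan, 18, 30],
})

# Data cleaning: fill missing values with the column median.
df = df.fillna(df.median(numeric_only=True))

# Generalization via a concept hierarchy: replace raw ages with
# higher-level concepts (binning).
df["AgeGroup"] = pd.cut(df["Age"], bins=[59, 69, 79, 96],
                        labels=["60s", "70s", "80+"])

# Aggregation: apply a summary operation per group.
summary = df.groupby("AgeGroup", observed=True)["MMSE"].mean()
print(summary)
```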
To categorize the OASIS data set, various machine learning classifiers are used in this paper. The data set is classified using random forest, SVM, decision tree, and XGB classifiers. To obtain the best outcome, a framework is built around these machine learning algorithms; the framework recommended in this paper is based on the CatBoost classifier. Applying all methods to the OASIS data set allowed us to determine the classification accuracy for future use. In this paper, random forest, support vector machine, decision tree, and XGB classifiers are compared to the CatBoost classifier to determine which has the highest accuracy.
The CatBoost algorithm has a wide range of parameters for fine-tuning the features during the processing stage. Gradient boosting is a machine learning algorithm for solving classification and regression problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. CatBoost can boost model performance while decreasing overfitting and tuning time, and it exposes several parameters that can be tuned.
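As a rough illustration of the gradient-boosting idea behind CatBoost — an ensemble of weak decision trees with tunable parameters — here is a sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in, since the `catboost` package may not be installed everywhere; CatBoost's own `CatBoostClassifier` exposes analogous knobs (e.g. `iterations`, `depth`, `learning_rate`). The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=373, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=1)

# Knobs analogous to the ones CatBoost exposes for tuning:
# boosting rounds, tree depth, and shrinkage (learning rate).
gb = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                learning_rate=0.1, random_state=1)
gb.fit(X_tr, y_tr)
print(f"test accuracy: {gb.score(X_te, y_te):.3f}")
```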
4 Environmental Setup
Python is the most commonly used and popular programming language. Various
machine learning tasks are carried out in Python. So, for the best results, I use Jupyter
notebook to run Python modules. Classification algorithms are used to retrieve data
from a dataset and apply algorithms to determine the best accuracy. Python 3.7.6 is
used here. Jupyter notebook is a free and open source web application for producing
documents. Windows 10 operating system is used for application development. Intel
core processor and 8GB RAM were used to implement the application. This paper
may make use of data from the OASIS dataset. This data set includes 150 individuals
ranging in age from 60 to 96. On the 373 image collections in the data set, each
subject was scanned twice or more. Non-dementia is assigned to 72 individuals.
Sixty-four subjects have been classified as demented, while 14 have been classified
as converted. Various classification algorithms are used in this data set to determine
the best classification accuracy for dividing two groups. Demented and non-demented
people are separated into two groups. A data set is a group of fields and records. The
data set is made up of real-world data. The majority of the data is raw and will be
used in the future. To classify data, must be perform several operations on the data
set.
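One such operation is encoding the group label numerically before classification. The sketch below uses a hypothetical column mirroring the counts reported above; mapping the 14 "converted" subjects to the demented class is purely an assumption for illustration, not necessarily the paper's choice.

```python
import pandas as pd

# Hypothetical group column mirroring the counts reported above:
# 72 non-demented, 64 demented, 14 converted.
groups = pd.Series(["Nondemented"] * 72 + ["Demented"] * 64
                   + ["Converted"] * 14)

# Assumption for illustration: treat 'Converted' subjects as demented,
# yielding the binary target used by the classifiers.
target = groups.map({"Nondemented": 0, "Demented": 1, "Converted": 1})
print(target.value_counts().to_dict())  # → {1: 78, 0: 72}
```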
Table 1  Accuracy of machine learning classifiers

Classifier               Accuracy   Error rate
Random forest            0.8035     0.1965
Support vector machine   0.7767     0.2233
Decision tree            0.7946     0.2054
Extreme gradient boost   0.8392     0.1608
CatBoost                 0.8571     0.1429
[Fig. 2: bar chart of the accuracy and error rate of the five classifiers]
5.1 Accuracy
The accuracy of a classification model refers to how well it classifies data samples. According to Table 1, the CatBoost classifier has an accuracy of 85.71%. The accuracy of the random forest and the support vector machine is 80.35% and 77.67%, respectively, while the decision tree and extreme gradient boosting achieve 79.46% and 83.92%. This demonstrates that the CatBoost classifier produces the best results. Table 1 shows how the different machine learning classification algorithms compare in accuracy, and Fig. 2 plots the accuracy and error rate of each classifier; CatBoost has the highest accuracy in the chart.
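All four metrics reported in this section can be computed directly from a classifier's predictions. A small sketch on made-up labels follows; the numbers are illustrative, not the paper's OASIS results.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Illustrative labels and predictions (not the paper's OASIS results).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)    # fraction classified correctly
err = 1 - acc                           # error rate, as in Table 1
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of prec and rec
print(acc, err, prec, rec, f1)
```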
5.2 Precision
Table 2  Precision of machine learning classifiers

Classifier               Precision
Random forest            0.83
Support vector machine   0.84
Decision tree            0.84
Extreme gradient boost   0.86
CatBoost                 0.89
[Fig. 3: bar chart of the precision of the five classifiers]
The precision of a classification model refers to what fraction of the samples predicted positive are actually positive. Table 2 shows the precision obtained by the different machine learning classification algorithms, and Fig. 3 plots these values; CatBoost achieves the highest precision in the chart. According to Table 2, the CatBoost classifier has a precision of 89%. The precision of the support vector machine and the decision tree is 84% each, while random forest and extreme gradient boosting achieve 83% and 86%, respectively. This demonstrates that the CatBoost classifier produces the best results.
5.3 Recall
A classification model's recall refers to what percentage of the actual positive samples are correctly predicted as positive.
Figure 4 represents the recall of the machine learning classifiers; CatBoost and extreme gradient boost produce the highest recall in the chart.

[Fig. 4: bar chart of the recall of the five classifiers]

According to Table 3, the CatBoost classifier has a recall of 83%. The recall of the random forest, support vector machine, decision tree, and extreme gradient boost is 80%, 72%, 77%, and 83%, respectively. This demonstrates that the extreme gradient boost and CatBoost classifiers produce the best results. Table 3 shows the recall obtained by the different machine learning classification algorithms.
5.4 F1-Score
The F1-score of a classification model is the harmonic mean of its precision and recall.
Figure 5 represents the F1-score of the machine learning classifiers; CatBoost produces the highest F1-score in the chart. Table 4 shows the F1-score obtained by the different machine learning classification algorithms.
According to Table 4, the CatBoost classifier has an F1-score of 86%. The F1-score of the random forest, support vector machine, decision tree, and extreme gradient boost is 81%, 77%, 80%, and 85%, respectively. This demonstrates that the CatBoost classifier produces the best results.
[Fig. 5: bar chart of the F1-score of the five classifiers]
Table 4  F1-score of machine learning classifiers

Classifier               F1-score
Random forest            0.81
Support vector machine   0.77
Decision tree            0.80
Extreme gradient boost   0.85
CatBoost                 0.86
Machine learning techniques can aid in the early diagnosis and detection of a variety of diseases in medicine and health care studies. According to the findings of this study, the CatBoost classifier has an accuracy of 85.71% and divides the data into two categories, demented and non-demented; the precision, recall, and F1-score results support the same conclusion. To detect Alzheimer's disease early, the demented group selected by the classification algorithm was examined against additional fields, such as the MMSE and the Clinical Dementia Rating, which were used to determine the disease status of the patients. Machine learning techniques can thus be used successfully in disease detection, prediction, and diagnosis. Identifying people with Alzheimer's at an early stage makes it possible to recommend yoga, daily exercise, healthy food, and counseling to help them maintain mental stability; with these suggestions, people can avoid reaching a critical stage and live a healthy life.
References
23. Gupta Y, Lama RK, Kwon G-R (2019) Prediction and classification of Alzheimer’s disease
based on combined features from apolipoprotein-E genotype, cerebrospinal fluid, MR, and
FDG-PET imaging biomarkers. Front Comput Neurosci
24. Billeci L, Badolato A, Bachi L, Tonacci A (2020) Machine learning for the classification
of Alzheimer’s disease and its prodromal stage using brain diffusion tensor imaging data: a
systematic review. MDPI
25. Krishna Prasad MHM, Thammi Reddy K (2014) An efficient data integration framework
in Hadoop using MapReduce. In: Computational intelligence techniques for comparative
genomics. Springer Briefs in Applied Sciences and Technology, pp 129–137. ISSN: 2191-530X
26. Li Q, Wu X, Xu L, Chen K, Yao L (2018) Classification of Alzheimer’s disease, mild cognitive
impairment, and cognitively unimpaired individuals using multi-feature kernel discriminant
dictionary learning. Front Comput Neurosci
27. Liu M, Zhang D, Shen D (2012) Ensemble sparse classification of Alzheimer’s disease.
NeuroImage
28. Latha Kalyampudi PS, Swapna D (2019) An efficient digit recognition system with an
improved pre-processing technique. In: ICICCT 2019—system reliability, quality control,
safety, maintenance and management. Springer Nature Singapore, pp 312–321
29. Khan RU, Tanveer M, Pachori RB (2020) A novel method for the classification of Alzheimer’s
disease from normal controls using magnetic resonance imaging. Expert Syst
30. Vidya Sagar Appaji S, Srinivasa Rao P (2018) A novel scheme for red eye removal with image
matching. J Adv Res Dyn Control Syst 10(13)
31. Bhateja V, Moin A, Srivastava A, Bao LN, Lay-Ekuakille A, Le D-N (2016) Multispectral
medical image fusion in contourlet domain for computer based diagnosis of Alzheimer’s
disease. Rev Sci Instrum 87(7):074303
32. Vadaparhi N, Yarramalle S (2014) A novel clustering approach using Hadoop distributed
environment. (Appl Sci Technol) 9:113–119, ISSN: 2191-530X
33. Vos SJB, Xiong C, Visser PJ, Jasielec MS, Hassenstab J, Grant EA, Cairns NJ, Morris
JC, Holtzman DM, Fagan AM (2014) Preclinical Alzheimer’s disease and its outcome: a
longitudinal cohort study. HHS Public Access
34. Zhang D, Wang Y, Zhou L, Yuan H, Shen D (2011) Multimodal classification of Alzheimer’s
disease and mild cognitive impairment. Neuro Image Sci Direct 55(3)
35. Maram B, Gopisetty GKD (2019) A framework for data security using cryptography and image
steganography. Int J Innov Technol Explor Eng (IJITEE) 8(11). ISSN: 2278-3075
36. Arevalo-Rodriguez I, Smailagic N, Figuls MRI, Ciapponi A, Sanchez-Perez E, Giannakou
A, Pedraza OL, Cosp XB, Cullum S (2015) Mini-mental state examination (MMSE) for the
detection of Alzheimer’s disease and other dementias in people with mild cognitive impairment
(MCI). Cochrane Library
37. Calero M, Gómez-Ramos A, Calero O, Soriano E, Avila J, Medina M (2015) Additional
mechanisms conferring genetic susceptibility to Alzheimer’s disease. Front Cell Neurosci 9
38. Dill V, Klein PC, Franco AR, Pinho MS (2018) Atlas selection for hippocampus segmentation:
relevance evaluation of three meta-information parameters. Comput Biol Med 95
39. Cao L, Li L, Zheng J, Fan X, Yin F, Shen H, Zhang J (2018) Multi-task neural networks for
joint hippocampus segmentation and clinical score regression. Springer Science
40. Yalcin A, Barnes LE, Centeno G, Djulvegovic B, Fabri P, Kaw A, Tsalatsanis A (2013)
Classification models in clinical decision making. University of Florida
41. Shipe ME, Deppen SA, Farjah F, Grogan EL (2019) Developing prediction models for clinical
use using logistic regression: an overview. J Thorac Dis (4)
42. Lin S-K, Hsiu H, Chen H-S, Yang C-J (2021) Classification of patients with Alzheimer’s disease
using the arterial pulse spectrum and a multilayer-perceptron analysis. Sci Rep 11
43. Jo T, Nho K, Saykin AJ (2019) Deep learning in Alzheimer’s disease: diagnostic classification
and prognostic prediction using neuroimaging data. Front Aging Neurosci
On the Studies and Analyzes of Facial Detection and Recognition Using Machine Learning Algorithms
1 Introduction
In terms of computer vision, one of the most widely researched topics would be
detection and recognition. This concept is implemented in various fields like safety,
education, automobile, social media, etc. Facial detection and recognition are widely
implemented concepts, and one such example in our daily life is Facebook, where
the app automatically detects and recognizes the people in a particular photograph.
Understanding how the different algorithms are implemented is the first step in understanding the process of detection and recognition. The facial recognition market is predicted to reach an estimated 8.5 billion dollars by the year 2025 [1]. This shows that the market is expanding with the increasing demand for facial detection and recognition in almost all aspects of different industries.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 15
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_2
16 N. Thampan and S. A. Muthukumaraswamy
2 Related Study
• Recognition: This is the last step of the process where the object is recog-
nized based on what it has been trained on. In other words, it is the process
of identification of various categories of certain objects.
This implementation does not require a large dataset and hence is a better choice if the available datasets are limited. Features in the images or object(s) can be extracted and fed into the machine learning model by various feature extraction methods, and subsequently these are categorized and classified. In addition, this mode of implementation offers flexibility, since it chooses the best outcome using the specified features and is less intricate than deep learning. This way of working can fetch accurate results regardless of the size of the dataset.
One of the oldest and most powerful algorithms used to detect an object is the Haar cascade detection algorithm (HC algorithm), shown in Fig. 1. Features in faces are extracted as Haar-like features.
This machine learning algorithm was proposed by Paul Viola and Michael Jones in 2001. The method involves training on varying amounts and sorts of positive and negative images. With these negative and positive images, one can train the Haar cascade classifier to distinguish whether a face is present or not. Furthermore, it reduces computational time and simplifies the algorithm.
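The computational savings of the Viola–Jones method come from Haar-like features evaluated on an integral image, where the sum over any rectangle costs only four lookups. A minimal numpy sketch of a two-rectangle (edge) feature follows; the image here is random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 24))  # stand-in detection window

# Integral image with a zero row/column prepended, so the sum of any
# rectangle img[r0:r1, c0:c1] costs exactly four lookups.
ii = np.zeros((25, 25), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) via the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_edge(r, c, h, w):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return (rect_sum(r, c, r + h, c + half)
            - rect_sum(r, c + half, r + h, c + w))

f = haar_edge(4, 4, 8, 12)
direct = int(img[4:12, 4:10].sum() - img[4:12, 10:16].sum())
assert f == direct  # four lookups agree with direct summation
```

The cascade then applies thousands of such features, discarding non-face windows with the cheapest ones first.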
The LBPH algorithm makes use of four parameters to analyze a face for recognition: radius, neighbors, grid X, and grid Y. The idea is to train the LBPH algorithm with the training datasets and yield the ID of the recognized face, describing what the recognized object is [7]. Its working is simple yet efficient. Each pixel in a 3 × 3 window is thresholded against the center pixel and assigned either 0 (lower) or 1 (higher or equal) [7]. The resulting binary pattern is then converted to a decimal value, and these values are further translated into histograms [7]. In this approach, the histogram values for each face in the training datasets can be retrieved, as shown in Fig. 2.
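The thresholding-and-histogram procedure just described can be sketched directly in numpy. This is a simplified, non-circular 3 × 3 variant; the full LBPH algorithm additionally divides the face into a grid and uses the radius/neighbors parameters.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP: threshold the 8 neighbors against each center
    pixel and read the resulting bits as one decimal code per pixel."""
    c = gray[1:-1, 1:-1]                      # center pixels
    # Neighbor offsets, enumerated clockwise from the top-left; each
    # contributes one bit (1 if neighbor >= center, else 0).
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(shifts):
        nb = gray[1 + dr:gray.shape[0] - 1 + dr,
                  1 + dc:gray.shape[1] - 1 + dc]
        codes |= (nb >= c).astype(np.uint8) << np.uint8(bit)
    return codes

rng = np.random.default_rng(1)
face = rng.integers(0, 256, size=(16, 16))    # stand-in grayscale face
codes = lbp_codes(face)

# The histogram of codes is the descriptor compared between faces
# (per grid cell, in the full LBPH algorithm).
hist, _ = np.histogram(codes, bins=256, range=(0, 256))
```

Because each code depends only on the ordering of neighboring intensities, not their absolute values, the descriptor is largely insensitive to monotonic lighting changes — the property discussed next.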
Compared with other local face recognition algorithms such as Eigenfaces and Fisherfaces, which can be influenced by lighting and illumination conditions, the LBPH algorithm obtains results regardless of the lighting circumstances involved in the dataset. In this study, the Haar cascade classifier was chosen for face detection, and LBPH was chosen for face recognition, as the machine learning approach.
Deep learning, a subfield of machine learning, involves various networks and layers working beneath one another. This mode of implementation can be carried out either from scratch or with a pre-trained model. If going for the former, a large number of data samples is required for the algorithm to train on and build up its confidence score when recognizing the object; since the model is built from scratch, one will also have to manually assign weights and biases. If choosing the latter, since that model is already trained, the only addition is to supply the new data. This is a comparatively less drawn-out method, since computational time is shorter. Despite the complexity and time involved, this means of recognition can promise highly accurate results. In this study, GoogLeNet was considered as the deep learning approach.
3.2.1 GoogLeNet
This deep learning algorithm based on neural networks was proposed by Google in the
2014 paper 'Going Deeper with Convolutions'. The model can detect objects in the
various images included in the dataset. This CNN model does not require bounding
boxes around the detected object, and the paper [8] gives evident signs that
using a sparse architecture is practical and much more convenient. The overall
function is to retrieve information through convolution and pooling layers with
different window shapes so as to reduce model complexity. This layout can be
seen in Fig. 3.
4 Implementation
The detection algorithm evaluated first is the 'Haar cascade object detection'
algorithm; in this case, the object is a face. Along with the importation of the
main library, OpenCV, the algorithm file 'Haar cascade for frontal face' in its
.xml form was downloaded beforehand.
The training dataset, consisting of 8 samples for each of three people, was stored
for use in training the algorithm to recognize them. The images were converted
to grayscale in order to minimize noise as much as possible, and boundary boxes
were drawn around the detected faces. Figure 4 represents the Haar cascade
algorithm's path of work.
The next step of evaluation was recognition of faces. Again using the OpenCV
library, the Haar cascade classifier for frontal-face detection and the LBPH
face recognizer were introduced in the initial lines of the code. In the later
stages, the images were converted to grayscale and resized for training purposes.
Once executed, the algorithm trained itself and was saved under 'trainner.yml'.
This file stores all the data, such as the coordinates of the object in each
iteration over the different images and other such information. Figure 5 shows
the working path of this algorithm.
The final layers are to be replaced by user-specified layers for classification and
training purposes [10]. Figure 7 denotes the process of transfer learning in a
pre-trained network.
By doing so, the weights and biases of the pre-trained model are frozen: as the
network trains, those values are not updated. The network re-initializes only
the weights required for the new purpose. On loading the pre-trained network,
the final layers are replaced; the fully connected layer is substituted with a
new fully connected layer that carries information such as class probabilities
and predicted labels. The GoogLeNet used in this study was trained for 6 epochs
with a mini-batch size of 5.
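The effect of freezing can be illustrated with a toy one-weight-per-layer example in plain Python (all numbers are illustrative, not GoogLeNet's): the pre-trained weight is simply excluded from the update step, so only the newly added layer learns.

```python
# Toy illustration of weight freezing during transfer learning.
frozen_w = 0.5        # pre-trained weight: kept fixed ("frozen")
new_w = 0.0           # replacement final layer: re-initialized, trainable
lr = 0.1              # learning rate

x, target = 1.0, 1.0
for _ in range(6):                           # 6 epochs, as in the study
    hidden = frozen_w * x                    # forward pass through frozen part
    out = new_w * hidden                     # forward pass through new layer
    grad_new = 2 * (out - target) * hidden   # d(squared error)/d(new_w)
    new_w -= lr * grad_new                   # only the new layer is updated
    # frozen_w is deliberately never updated

print(frozen_w)   # unchanged: 0.5
```

The frozen weight ends the loop exactly where it started, while the new layer's weight has moved toward fitting the target, which is the behaviour described above.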
With 18 training samples for the face recognizer and around 40 for the CNN model,
the code was compiled. Python with the OpenCV library was used for the Haar
cascade and LBPH face recognizer, whereas MATLAB was used to explore the
CNN-based algorithm, GoogLeNet.
This algorithm performed well in terms of computational time, providing output in
about 2 s: it took 2.11 s to detect a face in real-time video and only 2.2 s to
detect a face in an image (inclusive of computational time). In the initial
runs, the algorithm was able to detect only a single face from a crowd, or it
detected all faces except for one; some of the results also had false positives.
These discrepancies can be seen in Fig. 8.
Tuning the parameters was found to be the ideal solution for obtaining all faces in
a picture with no false positives. Hence, this issue was rectified by fine-tuning
the detector's constraints.
On the Studies and Analyzes of Facial Detection … 23
Fig. 8 a Person not detected, b false detection in a picture of a group of people using HC algorithm
Fig. 9 Haar cascade object detection, detecting all faces after final tuning
24 N. Thampan and S. A. Muthukumaraswamy
OpenCV also supports the LBPH face recognizer. The training dataset consisted of
8 pictures of each of three people. In the first run, this algorithm was not
able to recognize faces. This may have been due to the presence of other objects
or accessories, such as sunglasses, in the training images; this is a factor
that can affect the algorithm. After replacing those training images with ones
showing only the person's unobstructed face (no accessories, no photos of the
person with other people), the algorithm was able to recognize some faces upon
training. Initially, there were false outcomes, which were rectified by
• Readjusting the size of the training images
• Including an even number of training samples to avoid faulty learning/training.
Despite tuning certain parameters, the algorithm was not able to recognize a third
person. This could have been due to occurrences of shadow on the test image:
shadows on the face can change the confidence score and lead to more false
recognitions. Another reason the third person was not recognized correctly could
be that the training images of this person included a non-uniform mix of his
younger and older self. This discrepancy may have been the reason the algorithm
showed false results. These problems were rectified by including a sufficient
number of training images.
Later, different test images of the third person were put to validation, and
finally the algorithm was able to accurately recognize that person. After
altering those inconsistencies, the LBPH algorithm was able to train in about
2.5 s, with an overall computational time of 3.4 s, and within this interval it
achieved an accuracy of 85% when subjected to test images.
The pre-trained model loaded in MATLAB uses the CNN architecture designed by
Google; it has already been trained on more than a million images and can
classify about a thousand different objects. The model was adapted for facial
detection and recognition by the method of 'transfer learning', replacing the
last two layers of the structure with user-defined layers. This gives the user
flexibility in training the network to produce the desired results. The dataset
consisted of training images of four people, each having a set of 10 images,
where a common split of 70% was used for training and the remaining 30% for
validation. With the pixel and scaling ranges defined, this deep learning
algorithm was allowed to pass through the dataset and train itself. As can be
seen in Fig. 10, the accuracy was low initially but gradually rose as the loss
decreased. The training was completed with a learning rate of 0.0003.
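The 70/30 split mentioned above can be sketched in a few lines of Python (the file names are placeholders for the actual image set of four people with 10 images each):

```python
# Sketch of a 70/30 train/validation split over 4 x 10 = 40 images.
import random

images = [f"person{p}_{i}.jpg" for p in range(4) for i in range(10)]
random.seed(0)          # fixed seed keeps the split reproducible
random.shuffle(images)

cut = int(0.7 * len(images))          # 70% of 40 images = 28
train, val = images[:cut], images[cut:]
print(len(train), len(val))           # -> 28 12
```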
At the start, the network recognized the test images incorrectly, because some of
the test images were not in the same path as the network directory. Further
issues arose from poor resizing of images and uneven data samples being
submitted. After those rectifications, some of the samples
scored a confidence rate of less than 80%. To resolve this, more training data was
supplied, and finally the algorithm scored a better confidence score, as seen in Fig. 11.
For (c) of Fig. 11, the network was extremely confident that a black-and-white
test image was of Einstein. Although Einstein was included in the dataset, the
test image is not of him. A reason could be that more black-and-white photos
were supplied in Einstein's dataset. This could be rectified by supplying
colored samples of Einstein as well, giving uniformity in both learning and
training. GoogLeNet can be made more accurate by supplying a larger dataset to
train on.
Table 1 summarizes the efficiency of the algorithms obtained during the tests. The
only shortcoming of GoogLeNet is that it takes more time to train on its sets
(around 40 s), unlike LBPH, which is much quicker. Directly inputting
black-and-white images for training could save computational time, but it would
not be ideal, since some edges might not be expressed, thereby affecting the
algorithm.
6 Conclusion
From evaluating the Haar cascade classifier for face detection and LBPH
recognition against the GoogLeNet convolutional neural network, it is evident
that the Haar cascade classifier with LBPH, for detection and recognition
respectively, computes much more quickly and is ideal when the dataset is small.
With some parameters tuned, the facial detection algorithm works effectively,
and the user can manually tune out false positives. LBPH trains just as quickly
and provides accurate results. In comparison, GoogLeNet requires a large dataset
to train and validate over; the network takes much more time to train than the
previous approach, but its accuracy is much higher, and more data yields still
higher accuracy. One should opt for the right approach based on their
requirements: if small datasets and quick computation are preferred, the machine
learning approach is advised; if not, the CNN does the job with much higher
accuracy at the cost of longer computational time. In any case of facial
recognition, the age factor in the test and training images affects the
algorithm and its result; algorithms can find it difficult to recognize a person
across their young and older selves, but including a large dataset of such faces
may open up a powerful recognition tool.
References
7. Face Recognition: Understanding LBPH Algorithm. Towards Data Science (2017). http://www.
towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b
8. Szegedy C et al (2015) Going deeper with convolutions. In: IEEE conference on computer
vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
9. MathWorks, Update parameters using stochastic gradient descent with momentum (SGDM)
MATLAB sgdmupdate. https://www.mathworks.com/help/deeplearning/ref/sgdmupdate.html
10. MathWorks, Transfer learning using pretrained network. https://in.mathworks.com/help/dee
plearning/ug/transfer-learning-using-pretrained-network.html
IPL Analysis and Match Prediction
1 Introduction
With technology growing abundantly over recent years, comprehensive data
collection has become reasonably straightforward. Consequently, machine learning
is becoming a significant trend in sports analysis, given the availability of
live as well as historical data. Sports analytics is the procedure of gathering
previous game data and investigating it to extract essential information, with
the expectation of supporting effective and dynamic judgement. The decision
could be anything: whether to buy a player (not just in the auction), whom to
put on the field in the coming match, or a more ambitious task such as setting
up strategies for future matches based on predictions made using different factors
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 29
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_3
30 A. Singhal et al.
from past matches. Benefits of the proposed system include: (i) it offers both
visualization and tabular output for several functions, (ii) it can greatly help
captains and coaches to make the right pre-match decisions, (iii) it can greatly
assist individuals who are interested in the IPL and its statistics, and (iv) it
can help in putting resources into the right team for wagering.
The dataset used in this work is a collection of various matches: there are
details of around 817 matches, with complete information about the match winner,
location, toss winner, team names, and other important attributes. The matches
span 2008–2020. This dataset has helped us accomplish the main aim of our project.
2 Literature Survey
In the last few years, numerous models related to sports analysis have been built.
In one of them, Banasode et al. [1] made an application for analyzing data by
picking attributes from the dataset and anticipating the outcome of the match as
well as of the players. Predictions are made for questions such as which player
of the team will play well in the upcoming matches, which team will win the
toss, and even which will win the match. Anticipating the winner of a cricket
match depends on many factors such as batsmen's performances, the teams'
strengths, venues, and weather conditions. In one such study, the algorithms
used were Naive Bayes, decision tree, and SVM; the prediction model benefits
cricket boards in studying a team's strength and in cricket analysis [2].
Others have tried mining models [3]. To build the mining model, the model is
streamlined by selecting parameters and iterating; the parameters are then fed
to the dataset to extract actionable patterns and detailed statistics. That work
centres on observing significant data about the IPL teams using the functions of
an R package. R reduces the complexity of data analysis, as it shows the results
using visual portrayals. The dataset is loaded, and a batch of pre-processing is
done, followed by feature selection.
Others have attempted the KNIME tool [4]. In that model, prediction is done using
Euler's strength calculation formula, the KNIME tool, and a Naive Bayes network.
The datasets and previous statistics are trained on in order to cover all
dimensions and important factors such as toss, venue, captains, favourite
players, previous encounters, and previous statistics.
Amala Kaviya et al. [5] reported results using a detailed ball-by-ball dataset of
all the matches played in the history of the IPL and performed a comprehensive
analysis of various aspects of the game, alongside pragmatic visualizations.
They ranked all the players on the basis of a player ranking index.
Others use a multivariate regression-based methodology to estimate the points of
each team in the league table. The past performance of every team determines its
likelihood of dominating a game against a specific rival. Finally, a set of seven factors
IPL Analysis and Match Prediction 31
or attributes is identified that can be used for predicting the IPL match winner [6].
Logistic regression has also been used; experimental results show that its
accuracy is quite low [7].
Lamsal and Choudhary [8] built a model after recognizing seven variables which
impact the result of an IPL match, using a multi-layered perceptron.
Sankaranarayanan and Sattar [9] also use clustering methods for match
prediction: they used linear regression, nearest neighbour, and clustering
methods to present numerical outcomes that exhibit the performance of all the
algorithms used in the model's result prediction.
Cricket carries a lot of similarities to baseball, and a great deal of work and
discussion is already available on baseball. The technique of sabermetrics deals
with the use of statistical methods to make predictions about the sport of
baseball, and comparable methodologies and procedures can be applied to the
sport of cricket. Performance analysis using bowling and batting averages,
economy rate, and strike rates was proposed by Lemmer [10]. Normal batting
averages face a drawback related to players who are not out in a match; to
overcome this, Kimber and Hansford [11] came up with alternate batting-average
methods to manage circumstances where the batsman has not been out in one-day
matches.
This paper attempts to fulfil all those needs by providing an interactive and
user-friendly portal with advanced functionalities for performing detailed
exploratory analysis on all dimensions of matches.
3 Proposed Methodology
The proposed methodology that is used has been described compactly in this archi-
tecture. The first step is the processing of datasets and loading them in the back
end. The analysis and prediction of the match are performed. Then, user interface
with different functionalities is provided, which can be used for match analysis and
prediction (Fig. 1).
For analysis, prediction, and visualization, we have implemented the following
modules.
• Processing the datasets
• Match analysis
• Visualization
• Match prediction
• Creating user interface
This module analyzes the datasets completely. Apart from basic functionalities
such as previous encounters, it is also integrated with advanced analysis and
visualization functionalities. A subset of them includes:
Head-to-Head Analysis of Teams: a comparison of two teams is performed by
analyzing the matches they played against each other in the past. This feature
offers great help in predicting the winning team. It covers, among other things,
the captain's decision after winning the toss and the winning percentage of the
toss-winning team.
3.3 Visualization
For prediction, we analyze all the factors affecting the results of the match.
Prediction is made on three sets of data:
Set 1: Training Data—Season 1 to Season 10 IPL Data and Testing Data—Season
11.
Set 2: Training Data—Season 1 to Season 11 IPL Data and Testing Data—Season
12.
Set 3: Training Data—Season 1 to Season 12 IPL Data and Testing Data—Season
13.
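These three sets follow a rolling-origin pattern that can be written compactly (seasons as plain integers; this is a sketch, not code from the paper):

```python
# Each evaluation set trains on seasons 1..n-1 and tests on season n.
splits = [(list(range(1, n)), n) for n in (11, 12, 13)]

for train_seasons, test_season in splits:
    print(f"train: seasons {train_seasons[0]}-{train_seasons[-1]}, "
          f"test: season {test_season}")
```

This mirrors Set 1 through Set 3 above: each new split extends the training window by one season and tests on the next.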
We use various classification algorithms for predicting the match winner. In
machine learning, classification is an important supervised learning approach in
which the program learns from training data and uses this to classify new data
into different classes. Here, four different classification algorithms are
applied: SVM, decision tree, K-nearest neighbour, and random forest.
Decision Tree: A decision tree is a supervised learning algorithm that is used for
both regression and classification. It is fundamentally a tree-structured graph
that exhibits each possible outcome of a decision.
SVM: An SVM is a supervised machine learning algorithm that can be used for both
classification and regression problems. Classification is performed by finding
the hyperplane that best differentiates the two classes.
K-Nearest Neighbour: KNN is a machine learning algorithm used for both regression
and classification. A K-nearest-neighbour classifier assigns a data point to the
class most common among the data points closest to it.
Random Forest: Random forest is a powerful and versatile supervised machine
learning algorithm that can be used for both regression and classification. It
builds and combines multiple decision trees to create a "forest".
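As a concrete illustration of one of these classifiers, here is a minimal from-scratch K-nearest-neighbour sketch in pure Python (the two-feature encoding of a match is invented for illustration and is not the paper's actual feature set):

```python
# Minimal KNN: classify by majority vote among the k closest training points.
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature vector."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    # majority vote among the k nearest neighbours
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# toy encoded matches: (toss_won, home_venue) -> winning side
train = [((1, 1), "A"), ((1, 0), "A"), ((0, 1), "B"),
         ((0, 0), "B"), ((1, 1), "A")]
print(knn_predict(train, (1, 1)))   # -> A
```

Library implementations such as scikit-learn add distance weighting and efficient neighbour search, but the voting principle is the same.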
A Web application is created using front-end and back-end frameworks, React and
Django, to make user interaction more efficient. Web templates are designed to
make the output more attractive to the user, and all the ML algorithms are
connected through Django in the back end.
The IPL dataset, covering all match details from 2008 to 2020, was prepared and
used to train the different machine learning algorithms; the accuracy shown by
the algorithms is discussed below (Tables 1 and 2). Some of the predictions made
by our model are discussed below (Table 3).
5 Conclusion
The T20 format of cricket carries a lot of randomness: as it is the shortest
format, the whole game can change in just one over. Therefore, predicting the
winner in this format is very challenging and complex, but with the help of ML
algorithms, predictions can be made more efficiently and easily.
In this study, we identified various factors that influence the result of any
IPL match, such as the match venue, the toss, and the decision after winning the toss.
Table 3 Predictions

Match                  Result   Decision tree   SVM    KNN   Random forest
MI versus CSK (2018)   CSK      CSK             CSK    CSK   CSK
KXIP versus DD (2018)  KXIP     KXIP            DD     DD    KXIP
RCB versus KKR (2018)  KKR      KKR             RCB    KKR   KKR
RCB versus CSK (2019)  CSK      CSK             RCB    RCB   CSK
KKR versus SRH (2019)  KKR      KKR             SRH    SRH   KKR
MI versus DC (2019)    DC       MI              MI     MI    DC
MI versus CSK (2020)   CSK      CSK             CSK    MI    CSK
DC versus KXIP (2020)  DC       DC              KXIP   DC    DC
KKR versus RCB (2020)  RCB      RCB             RCB    RCB   RCB
the dataset. We also plan to make our model more accurate by using more attributes
like player’s performance, etc.
References
1. Banasode P, Patil M, Verma S (2021) Analysis and predicting results of IPL T20 matches. IOP
Conf Ser Mater Sci Eng 1065:012040
2. Srikantaiah KC, Khetan A, Kumar B, Tolani D, Patel H (2021) Prediction of IPL match outcome
using machine learning techniques. In: Proceedings of the 3rd international conference on
integrated intelligent computing communication & security (ICIIC). Atlantis highlights in
computer sciences, vol 4
3. Sudhamathy G, Raja Meenakshi G (2020) Prediction on IPL data using machine learning
techniques in R package. ICTACT J Soft Comput 11(01)
4. Bhutada S, Team (2020) IPL match prediction using machine learning. Int J Adv Sci Technol
29(5):3438–3448
5. Amala Kaviya VS, Mishra AS, Valarmathi B (2020) Comprehensive data analysis and
prediction on IPL using machine learning algorithms. Int J Emerg Technol 11(3):218–228
6. Sai Abhishek Ch, Patil KV, Yuktha P, Meghana KS, Sudhamani MV (2019) Predictive analysis
of IPL match winner using machine learning techniques. Int J Innov Technol Explor Eng
(IJITEE) 9(2S). ISSN: 2278-3075
7. Vistro DM, Rasheed F, David LG (2019) The cricket winner prediction with application of
machine learning and data analytics. Int J Sci Technol Res 8(09)
8. Lamsal R, Choudhary A (2018) Predicting outcome of Indian premier league (IPL) matches
using machine learning
9. Sankaranarayanan, Sattar J (2014) Auto-play: a data mining approach to ODI cricket simulation
and prediction. In: Proceedings of SIAM conference on data mining, pp 1–7
10. Lemmer H (2004) A measure for the batting performance of cricket players. S Afr J Res Sport
Phys Educ Recreation 26:55–64
11. Kimber AC, Hansford AR (1993) A statistical analysis of batting in cricket. J R Stat Soc
156:443–455
12. Rupai AAA, Mukta M, Islam AKMN (2020) Predicting bowling performance in cricket from
publicly available data. In: International conference on computing advancements, pp 1–6
13. Passfield L, Hopker JG (2017) A mine of information: can sports analytics provide wisdom
from your data? Int J Sports Physiol Perform 12(7):851–855
14. Gupta S, Jain H, Gupta A, Soni H (2017) Fantasy league team prediction. Int J Res Sci Eng
6(3):97–103
15. Deep Prakash Dayalbagh C, Patvardhan C, Vasantha Lakshmi C (2016) Data analytics based
deep mayo predictor for IPL-9. Int J Comput Appl 152(6):6–11
16. Kampakis S, Thomas W (2015) Using machine learning to predict the outcome of English
county twenty over cricket matches. arXiv preprint arXiv:1511.05837
17. Hajgude J, Parameshwaran A, Nambi K, Sakhalkar A, Sanghvi D (2015) IPL dream
team: a prediction software based on data mining and statistical analysis. Int J Comput Eng Appl
9(4):113–119
18. Freitas AA (2014) Comprehensible classification models—a position paper. SIGKDD Explor
15(1)
19. Halvorsen P, Sægrov S, Mortensen A, Eichhorn A, Stenhaug M, Dahl S, Stensland HK, Gaddam
VR, Griwodz C et al (2013) Bagadus: an integrated system for arena sports analytics: a soccer
case study. In: Proceedings of the 4th ACM multimedia system conference. ACM, pp 48–59
20. Saikia H, Bhattacharjee D (2011) A Bayesian classification model for predicting the
performance of all-rounders in the Indian premier league. Vikalpa 36(4):51–66
21. Lewis A (2008) Extending the range of player performance measures in one-day cricket. J
Oper Res Soc 59:729–742
22. Bandulasiri A (2008) Predicting the winner in one day international cricket. J Math Sci Math
Educ 3(1):6–17
23. Saikia H, Bhattacharjee D, Bhattacharjee A (2003) Performance based market valuation of
cricketers in IPL. Sport Bus Manage Int J 3(2):127–146
24. Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on
document analysis and recognition, vol 1. IEEE, pp 278–282
25. https://www.rediff.com/cricket
26. https://www.iplt20.com
Application of ANN Combined
with Machine Learning for Early
Recognition of Parkinson’s Disease
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 39
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_4
40 B. Uppalapati et al.
the age of 50 [3]. Men are 1.5 times more likely than women to be affected by
Parkinson's disease. Parkinson's disease symptoms are split into two types:
motor and non-motor. Postural difficulty, gait freezing, tremor, and tiredness
are all motor signs; Rapid Eye Movement (REM) sleep behavior disorder, cognitive
impairment, and mental problems are examples [4, 5] of non-motor symptoms [6, 7].
The actual cause of this disease is still unknown, but some elements, such as
genetic inheritance and environmental factors, are more likely to cause PD.
Researchers have observed gene mutations in only about 10–15% of patients.
Parkinson's disease symptoms are mostly unnoticeable at the beginning but
gradually get severe over time. As the severity increases, people experience
major changes in their body, such as difficulty in body movement and speaking,
depression, and memory problems [8–10]. Parkinson's disease mainly causes a
dopaminergic effect: degenerating neurons produce less dopamine, which results
in weak neurotransmission. This causes shakiness, muscle stiffness, and movement
problems. The breakdown or death of neurons in the basal ganglia, the area of
the brain that controls body movement, causes Parkinson's disease [11]. PD
patients also suffer from loss of the norepinephrine hormone, which acts as a
messenger between nerve endings and controls non-motor features such as heart
rate and blood pressure. The diagnosis of Parkinson's disease is difficult,
since its early symptoms are also observed in various other health issues
[12–14]; CT and MRI scans are used to detect such disorders that cause similar
symptoms. Although there is no cure for Parkinson's disease, medicines and other
therapies can help people manage their symptoms, and exercise is the best
practice for controlling symptoms significantly [15–17].
Deep learning is a branch of Machine Learning (the ability of a machine to learn
from large amounts of data instead of a fixed set of instructions) that allows
us to train machines to predict an output for a given set of inputs. It allows
both structured and unstructured data to be used for training and learning. The
biggest advantage of deep learning is that, with continuous training, the
architecture becomes adaptive and can work on complex problems. It works like a
human brain in managing data and mapping patterns for future reference: the
larger the data set, the more efficient the decision-making [18]. One of the
hardships of deep learning is the cost of computational power; the larger the
data set, the more computational power is needed. This can also cause a lack of
transparency in fault revision [19, 20].
Deep learning is based on neural networks, which are layers of nodes that
function similarly to neurons in the human brain. The nodes of each layer are
linked to those of contiguous layers, and the more levels there are, the more
complex the network is considered to be [21–23]. Signals move between nodes in
an artificial neural network, and weights are applied to them: a node with a
higher weight has a greater impact on the nodes below it. The last layer
assembles the weighted inputs into a result [24].
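The weighted-signal flow described above can be sketched as a toy forward pass in plain Python (all weights and inputs are illustrative):

```python
# Toy two-layer forward pass: weighted sums of signals, layer by layer.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # weighted sum of the incoming signals, squashed by an activation
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

x = [0.5, -1.0]                       # input signals
h = [neuron(x, [0.8, 0.2], 0.1),      # hidden layer: two nodes
     neuron(x, [-0.4, 0.9], 0.0)]
out = neuron(h, [1.5, -0.7], 0.2)     # the last layer assembles the result
print(round(out, 3))
```

Larger weights in this sum give the corresponding nodes more influence on the result, which is the behaviour the text describes.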
Because deep learning systems handle a vast quantity of data and perform multiple
difficult mathematical computations, they demand strong hardware. The real-world
Application of ANN Combined with Machine Learning … 41
applications of deep learning include virtual assistants (Siri and Alexa),
language translation applications, chatbots used in banking and health
applications, and facial recognition methods that recognize a person in pictures
[25, 26]. The contributions of this paper are:
1. A Parkinson's disease classification system is developed to distinguish
between healthy subjects and PD patients.
2. A hybrid classifier is designed by associating a deep learning ANN with a
Machine Learning classification algorithm.
The remainder of the paper is structured as follows: Sect. 2 reviews the
literature, Sect. 3 deals with the methodology, Sect. 4 describes the
experimental setup, Sect. 5 reviews the performance analysis and experimental
results, and Sect. 6 discusses the conclusion and future work.
2 Literature Work
At present, the available medical diagnosis methods for Parkinson's disease are
few, which has led many researchers to look for effective solutions for
detecting PD at an early phase. Salama et al. [27] developed a method using
various feature assessments and Machine Learning classification approaches,
based on the study of voice problems, to enhance the diagnosis of PD. A
multi-feature evaluation approach and Machine Learning classifiers are used to
determine the best solution to the problem.
The results revealed that Naive Bayes and Random Forest both achieved improved
accuracy in detecting PD. Sivaranjini and Sujatha [28] designed an approach for
identifying MR images of Parkinson's disease patients and healthy participants
using Deep Learning Neural Networks; the AlexNet architecture from Convolutional
Neural Networks is used for data categorization. Transfer Learning techniques
were used to train on the MR images, which were then tested for accuracy; with
the proposed approach, an accuracy of 88.9% was obtained.
Senturk [29] suggested a method for identifying Parkinson's disease using Machine
Learning classifiers and feature selection approaches. The Recursive Feature
Elimination approach and the Feature Importance method were used in the feature
selection process; the Support Vector Machine classifier with Recursive Feature
Elimination obtained 93.85% accuracy for detecting PD. Celik and Omurca [30]
experimented by analyzing voice datasets of PD patients and healthy control
subjects. They applied Principal Component Analysis (PCA) and Information Gain
(IG) techniques for analyzing the extracted features, and Machine Learning
classification methods were implemented for better prediction of PD. Pahuja and
Nagabhushan [31] experimented by processing speech datasets and implemented
Machine Learning classifiers to find the most efficient and accurate classifier
for PD classification; Levenberg-Marquardt with Artificial Neural Networks was
found to be the most accurate, with a 95.89% measure for Parkinson's disease. A
comparison of related works is listed in Table 1.
Wang et al. [36] suggested a deep learning model and compared it with twelve
Machine Learning models to find the most accurate model for early prediction of
PD based on premotor indicators; the proposed deep learning model obtained a
superior accuracy of 96.45%. Byeon [35] developed a depression model for early
diagnosis of PD by implementing eight models of the Support Vector Machine ML
classifier, where two types of SVM were implemented with four algorithms each;
the results indicated that the Nu-SVM model with a Gaussian-based algorithm
achieved the highest accuracy, with a measure of 95.0%. Grover et al. [32]
developed a Deep Neural Network model for predicting the severity of PD by
analyzing the speech dataset of patients suffering from PD using the TensorFlow
library; the proposed model achieved accuracies of 83.36% and 81.66% on training
and testing data, respectively.
Lahmiri and Shmuel [37] presented a Machine Learning model to identify
Parkinson's disease from voice patterns. The model focuses on evaluating
the effectiveness of eight alternative pattern ranking approaches when used
in conjunction with a nonlinear SVM to distinguish between PD patients and
healthy subjects. Shahid and Singh [38] developed a Deep Neural Network model
for the prediction of PD by analyzing a speech dataset. Principal Component Analysis
is implemented to reduce the input feature space. The Unified Parkinson's Disease
Rating Scale (UPDRS) score is used for assessing Parkinson's disease in
the proposed model. Nilashi et al. [39] designed a method to predict UPDRS metrics
from voice signals, devising a hybrid technique for predicting the Total UPDRS and
Motor UPDRS clinical scales of Parkinson's disease. The findings showed that the
suggested method is effective in forecasting PD development, lowering computation
time and enhancing efficiency.
As per the literature review, many researchers have addressed the problem of detecting
Parkinson's disease using state-of-the-art and advanced Machine Learning techniques
[40]. This paper focuses on designing and developing a hybrid classifier that combines
an Artificial Neural Network (ANN) with an ML classifier to detect PD using speech
features.
Application of ANN Combined with Machine Learning … 43
3 Methodology
With the main objective of detecting Parkinson's disease, a neurologist
may suggest a Single-Photon Emission Computerized Tomography (SPECT) scan
called a Dopamine Transporter scan (DaTscan) [41]. Although this helps to
strengthen the suspicion of Parkinson's disease, the symptoms together with
neurologic and physical testing ultimately lead to the proper diagnosis. Early detection
helps patients receive effective treatment and slows the progression of the
disease by providing symptomatic relief. To detect Parkinson's disease at an early stage,
an automated PD classification model using a neuronal fuzzy classifier is explored in
this paper.
The architecture of the proposed neuronal fuzzy inference classification system
is presented in Fig. 1. The dataset is split into 80% training and 20% testing data.
The Correlation Coefficient is employed to eliminate irrelevant features in the data as
a preprocessing step [42]. The filtered data is passed to the proposed neuronal fuzzy
classifier for training the model. The trained model's performance in classifying
healthy subjects and PD patients is then evaluated on the testing data.
The proposed neuronal fuzzy classifier is built from five layers, as shown in Fig. 2.
The hybrid architecture is stacked with an input layer of size 139 × 1, four hidden
layers of Mnodes neurons each, and ReLU as the activation function. All the hidden layers
are dense and fully connected to each other. The features extracted by the hidden
layers are passed to the final layer for classification, which is a Random Forest classifier.
Mnodes = (Nfeatures × 2)/3 + 2    (1)
The neurons in each hidden layer fire upon satisfying the ReLU activation
function, where the input to the activation function is calculated as follows:
hθ(x) = Σ (i=1 to n) wi xi + bias = w1x1 + w2x2 + … + wnxn + bias    (3)
where wi is the weight of the ith neuron, xi signifies the input at neuron i, and the bias
ensures that the decision boundaries do not pass through the origin.
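A minimal numerical sketch of Eqs. (1) and (3) with ReLU follows; the toy weights and bias, and rounding Mnodes to an integer, are assumptions (the paper does not say how a non-integer width is handled).

```python
import numpy as np

def m_nodes(n_features):
    # Eq. (1): M = (N * 2) / 3 + 2, rounded here because a layer width
    # must be an integer (rounding is an assumption)
    return round(n_features * 2 / 3) + 2

def hidden_layer(x, W, b):
    # Eq. (3) feeds the weighted sum plus bias into ReLU:
    # a neuron fires only when its pre-activation is positive
    z = W @ x + b
    return np.maximum(z, 0.0)

x = np.array([1.0, -2.0, 0.5])           # toy 3-feature input
W = np.array([[0.2, 0.4, -0.1],
              [-0.3, 0.1, 0.6]])         # weights of two toy neurons
b = np.array([0.9, -0.2])

width = m_nodes(139)                     # hidden width for the 139 x 1 input layer
out = hidden_layer(x, W, b)              # only the first neuron fires here
```

With these toy values the pre-activations are 0.25 and −0.4, so ReLU passes the first neuron's output and zeroes the second.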
4 Experimental Setup
The entire experiment was carried out on a 64-bit Windows 10 Operating System
with an Intel Core i7 processor running at 2.20 GHz, 16 GB of RAM, and a 2 TB
hard drive. The platform is set up with Anaconda, supported by Machine Learning
and deep learning packages, with Python as the programming interface.
The dataset "Parkinson's Disease Classification" was obtained from the UCI Machine
Learning Repository [43]. The dataset consists of 754 features and 756 voice samples:
564 samples from 188 Parkinson's disease patients and 192 samples from 64 healthy
controls. The dataset description is shown in Table 2.
The performance of the proposed neuronal fuzzy classifier is assessed using
different evaluation metrics: precision, MSE, recall, RMSE, F1-Score, and accuracy,
defined as follows:
Precision = Tpositive/(Tpositive + Fpositive)    (4)

MSE = (1/n) Σ (i=1 to n) (Yi − Ŷi)²    (5)

Recall = Tpositive/(Tpositive + Fnegative)    (6)

RMSE = √[(1/n) Σ (i=1 to n) (Yi − Ŷi)²]    (7)

F1-Score = (2 × Precision × Recall)/(Precision + Recall)    (8)

Accuracy = (Tpositive + Tnegative)/(Tpositive + Fpositive + Tnegative + Fnegative)    (9)
where Tpositive refers to samples correctly classified as PD patients, Fpositive represents
samples incorrectly classified as PD patients, Tnegative refers to samples correctly
classified as healthy, and Fnegative to samples incorrectly classified as healthy; Yi is
the actual output, Ŷi is the predicted output of the classifier, and n is the total number
of samples. The evaluation metrics of the proposed Neuronal Fuzzy Inference
Classifier are shown in Fig. 3.
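Equations (4)–(9) can be reproduced in a few lines of Python; the confusion counts and the regression outputs below are hypothetical, for illustration only.

```python
import numpy as np

def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)                          # Eq. (4)
    recall = tp / (tp + fn)                             # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (8)
    accuracy = (tp + tn) / (tp + fp + tn + fn)          # Eq. (9)
    return precision, recall, f1, accuracy

def mse_rmse(y_true, y_pred):
    mse = np.mean((y_true - y_pred) ** 2)               # Eq. (5)
    return mse, np.sqrt(mse)                            # Eq. (7): RMSE = sqrt(MSE)

# Hypothetical confusion counts for a 152-sample test split (illustrative only)
p, r, f1, acc = classification_metrics(tp=110, fp=3, tn=36, fn=3)

# Hypothetical predicted probabilities against binary labels
mse, rmse = mse_rmse(np.array([1, 0, 1, 1]), np.array([0.9, 0.1, 0.8, 0.4]))
```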
The comparison with related work, derived from the accuracies of PD classification,
is presented in Table 3. The model accuracy during the training and testing phases
is shown in Fig. 4.
Parkinson's disease is the second most prevalent neurodegenerative illness, associated
with aging and movement disorders. Reduced production or loss of the neurotransmitter
dopamine is the underlying cause of PD. Neurologists face a difficult challenge in diagnosing Parkinson's
[Fig. 3 Bar chart of evaluation metric scores for the proposed classifier: Precision 98.65, Recall 95.36, F-Score 94.65, Accuracy 96.23, plus MSE and RMSE bars (values 14.24 and 2.02)]
Table 3 Accuracy comparison for various classifiers

Paper, Year    Dataset used                   Classifier                  Accuracy (%)
[32], 2018     Telemonitoring voice dataset   Deep Neural Network         81.66
[28], 2020     MR image dataset               AlexNet                     88.90
[34], 2019     Parkinson's speech dataset     SMOTE and RF                94.89
[7], 2019      Parkinson's voice dataset      XGBoost                     95.39
[31], 2021     Voice dataset                  ANN                         95.89
This paper     Parkinson's speech dataset     Neuronal fuzzy classifier   96.23
disease (PD) at an early stage, before the condition progresses. This research presents an
automated end-to-end classification approach for early identification of Parkinson's
disease. The voice-based Parkinson's Disease Classification dataset is used in this
approach. A hybrid neuronal fuzzy classifier is developed by combining an Artificial
Neural Network and a Random Forest classifier to classify the healthy and PD
patient classes. The proposed neuronal fuzzy classifier is compared with GB, SVM,
KNN, DT, and RF for performance analysis. Among all the compared classification
algorithms, the hybrid neuronal fuzzy classifier achieved the highest accuracy,
96.23%, in classifying healthy subjects and PD patients. Designing and implementing
different architectures using deep learning techniques and surveying large-volume
datasets for improved classification performance in detecting PD patients
are considered future work.
References
13. Vidya Sagar Appaji S, Lakshmi PV (2020) Maximizing joint probability in visual question
answering models. Int J Adv Sci Technol 29(3):3914–3923
14. Madhusudhana Rao TV, Latha Kalyampudi PS (2020) Iridology based vital organs malfunc-
tioning identification using machine learning techniques. Int J Adv Sci Technol 29(5):5544–
5554
15. Delaville C, Deurwaerdère PD, Benazzouz A (2011) Noradrenaline and Parkinson’s disease.
Front Syst Neurosci 5:31. https://doi.org/10.3389/fnsys.2011.00031
16. Bhat S, Rajendra Acharya U, Hagiwara Y, Dadmehr N, Adeli H (2018) Parkinson’s disease:
cause factors, measurable indicators, and early diagnosis. Comput Biol Med 102
17. Srinivasa Rao P, Krishna Prasad PESN (2017) A secure and efficient temporal features based
framework for cloud using MapReduce. In: 17th international conference on intelligent systems
design and applications (ISDA 2017), vol 736. Springer, pp 114–123. ISSN 2194-5357 Held
in Delhi, India, December 14–16, 2017
18. Lauzon FQ (2012) An introduction to deep learning. In: 2012 11th international conference on
information science, signal processing and their applications (ISSPA), pp 1438–1439. https://doi.org/10.1109/ISSPA.2012.6310529
19. Vásquez-Correa JC, Arias-Vergara T, Orozco-Arroyave JR, Eskofier B, Klucken J, Nöth E
(2019) Multimodal assessment of Parkinson’s disease: a deep learning approach. IEEE J
Biomed Health Inform 23(4):1618–1630. https://doi.org/10.1109/JBHI.2018.2866873
20. Krishna Prasad MHM, Thammi Reddy K (2014) An efficient data integration framework in
Hadoop using MapReduce. In: Computational intelligence techniques for comparative
genomics. SpringerBriefs in applied sciences and technology. Springer, pp 129–137. ISSN:
2191-530X
21. Wodzinski M, Skalski A, Hemmerling D, Orozco-Arroyave JR, Nöth E (2019) Deep learning
approach to Parkinson’s disease detection using voice recordings and convolutional neural
network dedicated to image classification. In: 2019 41st annual international conference of the
IEEE engineering in medicine and biology society (EMBC), pp 717–720. https://doi.org/10.1109/EMBC.2019.8856972
22. Kaur S, Aggarwal H, Rani R (2020) Hyper-parameter optimization of deep learning model for
prediction of Parkinson’s disease. Mach Vis Appl 31. https://doi.org/10.1007/s00138-020-01078-1
23. Vadaparthi N, Srinivas Y (2014) A novel clustering approach using Hadoop distributed
environment. In: Applied science and technology, vol 9. Springer, pp 113–119. ISSN:
2191-530X
24. Walczak S (2018) Artificial neural networks. In: Mehdi Khosrow-Pour DBA (ed) Encyclopedia
of information science and technology, 4th edn. IGI Global, pp 120–131. https://doi.org/10.4018/978-1-5225-2255-3.ch011
25. Wingate J, Kollia I, Bidaut L, Kollias S (2019) A unified deep learning approach for prediction
of Parkinson’s disease
26. Maram B, Gopisetty GKD (2019) A framework for data security using cryptography and image
steganography. Int J Innov Technol Explor Eng (IJITEE) 8(11). ISSN: 2278-3075
27. Mostafa SA, Mustapha A, Mohammed MA, Hamed RI, Arunkumar N, Ghani MKA, Jaber
MM, Khaleefah SH (2019) Examining multiple feature evaluation and classification methods
for improving the diagnosis of Parkinson’s disease. Cogn Syst Res 54. ISSN 1389-0417
28. Sivaranjini S, Sujatha CM (2020) Deep learning-based diagnosis of Parkinson’s disease using
convolutional neural network. Multimedia Tools Appl
29. Senturk ZK (2020) Early diagnosis of Parkinson’s disease using machine learning algorithms.
Med Hypotheses 138
30. Celik E, Omurca SI (2019) Improving Parkinson’s disease diagnosis with machine learning
methods. In: Scientific meeting on electrical-electronics & biomedical engineering and
computer science (EBBT)
31. Pahuja G, Nagabhushan TN (2021) A comparative study of existing machine learning
approaches for Parkinson’s disease detection. IETE J Res
32. Grover S, Bhartia S, Akshama, Yadav A, Seeja KR (2018) Predicting severity of Parkinson’s
disease using deep learning. Proc Comput Sci 132
33. Berus L, Klancnik S, Brezocnik M, Ficko M (2019) Classifying Parkinson’s disease based on
acoustic measures using artificial neural networks. Sensors 19:16. https://doi.org/10.3390/s19010016
34. Polat K (2019) A hybrid approach to Parkinson disease classification using speech signal:
the combination of SMOTE and random forests. In: 2019 scientific meeting on electrical-
electronics & biomedical engineering and computer science (EBBT), pp 1–3. https://doi.org/10.1109/EBBT.2019.8741725
35. Byeon H (2020) Development of a depression in Parkinson’s disease prediction model using
machine learning. World J Psychiatry 10:19
36. Wang W, Lee J, Harrou F, Sun Y (2020) Early detection of Parkinson’s disease using deep
learning and machine learning. IEEE Access 8
37. Lahmiri S, Shmuel A (2019) Detection of Parkinson’s disease based on voice patterns ranking
and optimized support vector machine. Biomed Signal Process Control 49
38. Shahid AH, Singh MP (2020) A deep learning approach for prediction of Parkinson’s disease
progression. Biomed Eng Lett
39. Nilashi M, Ibrahim O, Samad S, Ahmadi H, Shahmoradi L, Akbari E (2019) An analyt-
ical method for measuring the Parkinson’s disease progression: a case on a Parkinson’s
telemonitoring dataset. Measurement 136
40. Bheemavarapu P, Latha Kalyampudi PS, Madhusudhana Rao TV (2020) An efficient method
for coronavirus detection through X-rays using deep neural network. Curr Med Imaging.
ISSN: 1875-6603
41. Hustad E, Aasly JO (2020) Clinical and imaging markers of prodromal Parkinson’s disease.
Front Neurol 11. https://doi.org/10.3389/fneur.2020.00395
42. Latha Kalyampudi PS, Swapna D (2019) An efficient digit recognition system with an
improved preprocessing technique. In: ICICCT 2019—system reliability, quality control,
safety, maintenance and management. Springer Nature Singapore, pp 312–321
43. https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification
People Count from Surveillance Video
Using Convolution Neural Net
Abstract People counting is used to count the number of people in an image.
People counting is not an easy task if done manually, because we can lose count
in the middle of this laborious task, especially when dealing with objects that
intersect with each other or with dense crowds. This project automates the
counting process by building a machine learning system that converts a video into
frames; the model then outputs the number of objects in a particular frame. We built
the model using the convolutional neural network (CNN) technique. The system
we built is capable of counting pedestrians in a mall. The frames/images are generated
from a CCTV camera placed somewhere in the mall. From those frames/images,
the system outputs how many pedestrians are at that particular place in the mall.
VGG16 is used to extract the features of the image, and the structural similarity
index (SSIM) measures the similarity between the given images. The similarity
measure is then used in a loss function combining Euclidean and local pattern consistency
losses. The experimental results show the predicted and exact number
of people in the image, with 90% accuracy using the convolutional neural net.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 51
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_5
52 L. Lakshmi et al.
1 Introduction
Counting how many objects are in an image is essential for various industries and
researchers. Some of the use cases are (1) monitoring high-traffic roads or public
places, (2) preventing people from entering forbidden or dangerous places, and (3)
giving information about favorite spots where a high number of people gather.
This shows the usefulness of crowd counting in real life. People counting [1] is not an
easy task if done manually, because we can lose count in the middle
of this laborious task, especially when dealing with objects that intersect with
each other or with dense crowds.
To solve this problem, we can automate [2] the counting process by building a
machine learning system that receives as input an image containing the objects we want
to count; the model then outputs the number of objects in that image. The system
we build is capable of counting pedestrians in a mall. The images [3]
are generated from a CCTV camera placed somewhere in the mall. From those images,
the system tells us how many pedestrians are at that particular place in the mall. We
built the model using the convolutional neural network (CNN) technique.
People counting is not an easy task. The objective of the project is to count [4]
the total number of people per frame obtained from a video, i.e., CCTV
footage. This will help control the number of people entering heavy-traffic areas and
assist in monitoring high-traffic roads or public places. There are many other use cases [5]
of crowd counting not mentioned here, which shows the usefulness of crowd
counting in real life. The model takes a video and converts it into frames/images. A
machine learning model is trained on the images to predict [6] the number of
individuals in every frame/image.
2 Literature Review
There are numerous algorithms for people counting, as the applications [11] range from
handling emergency situations in high-rise buildings to monitoring environmental conditions.
A comparative study has been performed on various algorithms used for
people counting in surveillance videos, including gradient-based methods,
frame differencing, and circular frame transformations.
There are many real-life scenarios in which we need to
detect and count the number of people in surveillance videos. Even though
people counting and detection systems are available, there are still challenges
in accurately predicting the number of people in real-time scenarios. Earlier work
segmented groups of people [12] into individuals and tracked them over a
period of time. We have conducted a literature survey on various techniques used
for counting the number of people in a surveillance video over a period of time;
different regression models and classification models have been used. In our
proposed model, we use convolutional neural networks, which, with the progression
of deep learning, have shown tremendous performance in various applications.
3 Dataset
The dataset used in the model is shown in Fig. 1. It is a set of images generated
from a single CCTV camera placed at one spot in a mall, capturing pedestrians
who walk past the camera. Each image has a different number of persons.
The images are generated from the video at a given time rate, i.e.,
frame rate. The video is 100 s long and the sampling interval is 0.1 s, so 1000
images/frames are generated in .jpg format from the video. These 1000 images
extracted from the video are used as the dataset.
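The frame-extraction step described above could be sketched with OpenCV along these lines; the file names are placeholders and the helper is illustrative, not the authors' code.

```python
def frame_step(fps, interval_s):
    """Number of source frames between saved frames, at least 1
    (e.g. a 30 fps video sampled every 0.1 s keeps every 3rd frame)."""
    return max(1, round(fps * interval_s))

def extract_frames(video_path, interval_s=0.1, out_pattern="frame_{:04d}.jpg"):
    """Save one frame every `interval_s` seconds from `video_path`
    (both arguments are placeholder names)."""
    import cv2  # pip install opencv-python
    cap = cv2.VideoCapture(video_path)
    step = frame_step(cap.get(cv2.CAP_PROP_FPS), interval_s)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                        # end of video
            break
        if idx % step == 0:               # keep one frame per interval
            cv2.imwrite(out_pattern.format(saved), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

For a 100 s clip sampled every 0.1 s, a call like `extract_frames("mall.mp4")` (hypothetical file name) would write roughly 1000 JPEG frames.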
4 Proposed Methodology
The architecture of the proposed model is shown in Fig. 2, where ‘CONV a’ denotes
a convolution with kernel size a × a, and ‘CONV T’ denotes a transposed convolutional
layer. The structure of VGG16, shown in Fig. 3, consists of five convolution blocks
with the ReLU activation function; a batch normalization layer is added inside each
block to reduce the internal covariate shift of the model, which causes the
model to train faster. An average pooling layer is added to reduce the feature map
dimensions and make the model easier to train, followed by a dropout layer to prevent
overfitting. Finally, global max pooling is added to reduce the depth of the
feature map and is connected to a final ReLU activation layer. Mean squared
error is used as the loss for this task, the Adam method performs the optimization, and
mean absolute error is used as the metric.
As the dataset consists of a set of images generated from surveillance video, suppose
an image with a size of 96 × 128 is given to VGG16; the following steps are
performed.
1. First, the first ten layers of VGG16 are used to extract the features of the
image. After extracting features with these layers, we get an
output of size 24 × 32, a quarter of the original size in each dimension.
2. Second, we feed the output from VGG16 to four filters with different sizes. We
do not use pooling layers in this set of convolutional layers, because using
many pooling layers would cause loss of spatial information from the feature
map.
3. Third, we use a feature enhancement layer, where we concatenate the outputs
(x_conct, containing the 4 filters from the previous layers) and use flatten + MLP with
a softmax function in the output layer to get a weight for each input filter. Thus,
the model learns to give a high weight to the filter that best represents the
image.
4. Last, we need to upsample the image from size 24 × 32 to 96 × 128. A transposed
convolutional layer is used as the upsampling method. x_conct is concatenated
with the filter generated by the transposed convolution, and the weighted x_conct is also used.
In the last layer, we set the filter size to 1, and that filter represents the predicted
density map.
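The four steps above might be sketched in Keras as follows. This is an assumption-laden sketch, not the authors' implementation: the branch kernel sizes and filter counts are not given in the text, global average pooling stands in for the flatten + MLP step, and weights=None is used so the sketch runs offline (weights="imagenet" would load the pretrained backbone).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_counting_model(h=96, w=128, kernels=(1, 3, 5, 7), filters=32):
    # Step 1: the first ten layers of VGG16 (through block3_conv3) yield a
    # feature map at a quarter of the input resolution (24 x 32 here)
    base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                       input_shape=(h, w, 3))
    feat = base.get_layer("block3_conv3").output

    # Step 2: four parallel convolutions with different kernel sizes, no pooling
    # (kernel sizes and filter count are assumptions)
    branches = [layers.Conv2D(filters, k, padding="same", activation="relu")(feat)
                for k in kernels]
    x_conct = layers.Concatenate()(branches)

    # Step 3: feature enhancement - a small MLP with softmax learns one
    # weight per branch; each branch is scaled by its learned weight
    wts = layers.GlobalAveragePooling2D()(x_conct)
    wts = layers.Dense(len(kernels), activation="softmax")(wts)
    wts = layers.Reshape((1, 1, len(kernels)))(wts)
    weighted = layers.Concatenate()(
        [b * wts[:, :, :, i:i + 1] for i, b in enumerate(branches)])

    # Step 4: transposed convolution upsamples 24 x 32 back to 96 x 128;
    # a final 1-filter convolution emits the predicted density map
    x = layers.Concatenate()([x_conct, weighted])
    x = layers.Conv2DTranspose(16, 3, strides=4, padding="same",
                               activation="relu")(x)
    density = layers.Conv2D(1, 1)(x)
    return Model(base.input, density)

model = build_counting_model()
```

Summing the predicted density map over its spatial dimensions gives the estimated people count for a frame.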
For every convolutional layer, batch normalization and the ReLU activation function
are used. The local pattern consistency loss and structural similarity index (SSIM) are
calculated as follows:
1. The structural similarity index measures the similarity between two images. This
similarity measure is used in a loss function, named local pattern consistency loss,
together with the Euclidean loss.
2. For each predicted density map and actual density map, the similarity between
small patches of the image is computed, using a Gaussian kernel of 12 × 16 as the
weight.
3. The model loss function is defined as follows: Euclidean loss (MSE) + alpha ×
local pattern consistency loss.
4. The local pattern consistency loss helps the model learn the similarity between small
patches of the image.
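A hedged sketch of the combined loss is given below, using tf.image.ssim (which compares Gaussian-weighted local patches) as a stand-in for the paper's 12 × 16 Gaussian-kernel patch comparison; the alpha value is an assumption, as the text does not state it.

```python
import tensorflow as tf

ALPHA = 0.001  # weight of the pattern-consistency term (assumed value)

def combined_loss(y_true, y_pred):
    """Euclidean (MSE) loss plus a local-pattern-consistency term.
    1 - SSIM penalizes dissimilar local patterns between the
    predicted and actual density maps."""
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    ssim = tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0,
                                        filter_size=11))
    return mse + ALPHA * (1.0 - ssim)
```

For identical density maps the loss is zero (MSE is zero and SSIM is one), and it grows as either the pixel-wise error or the local-pattern mismatch increases.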
5 Results
The convolutional neural network VGG16 is used to train the model. The dataset of
images generated from surveillance video is divided into training and test sets:
70% of the data is used for training, and 30% is used for testing. We took a
random image and gave it as input to count the number of people. Figure 4 shows
such a random image given to the model to count the number of people.
The model is evaluated with various convolutional neural networks, namely VGG16,
ResNet50, and ResNetV2, with ReLU and ELU activation functions (Table 1).
We have evaluated our proposed model with different backbones and activation
functions. The loss and accuracy graphs of ResNetV2 with ELU and ReLU
activation functions are shown in Fig. 5. Evaluating the model over 20 epochs,
we attained train and test accuracies of 92% and 91%, respectively, which is reasonably good.
Similarly, the loss and accuracy graphs of ResNet50 with ELU and ReLU
activation functions are shown in Fig. 6. Over 20 epochs, we
attained train and test accuracies of 100% and 92%, respectively; ResNet50
performed better than ResNetV2. Similarly, the loss and accuracy graphs of
VGG16 with ELU and ReLU activation functions are shown in Fig. 7. We
Table 1 Comparison of loss and accuracy for VGG16, ResNet50, and ResNetV2 with activation
functions as ReLU and ELU
Activation Optimizer Train loss (%) Test loss (%) Train acc. (%) Test acc. (%)
ELU ResNetV2 59 58 92 91
ReLU ResNetV2 6 6 78 91
ELU ResNet50 4 19 100 92
ReLU ResNet50 1 1 100 100
ELU VGG16 2 3 100 100
ReLU VGG16 3 2 100 100
People Count from Surveillance Video Using Convolution Neural Net 57
Fig. 5 Performance of ResNetV2 optimizer with ReLU and ELU activation functions
Fig. 6 Performance of ResNet50 optimizer with ReLU and ELU activation functions
have evaluated the model over 20 epochs, attaining train and test accuracies of 100%
and 100%, respectively. The performance of VGG16 is superior to that of
ResNetV2 and ResNet50.
6 Conclusion
The advancement of deep learning techniques used for people counting in surveillance
videos has had a significant impact on various applications across
domains. In our proposed system, we have used ResNetV2, ResNet50, and
Fig. 7 Performance of VGG16 optimizer with ReLU and ELU activation functions
VGG16 backbones with ELU and ReLU activation functions. The structural similarity
index (SSIM) is used to measure the similarity between two images. A transposed
convolutional layer is used for upsampling rather than the conventional
upsampling method. The proposed system shows superior performance using the
VGG16 model in terms of train and test accuracy in counting the number of people in
surveillance video. Future work on this application includes detecting people
directly from live CCTV footage, so that the total number of people in the mall can
be predicted; dividing the video into frames may produce the same count across 5–10
frames when a person stays in the same place for a long time.
References
1. Kowcika A (2017) People count from the crowd using unsupervised learning technique from
low resolution surveillance videos. In: 2017 international conference on energy, communica-
tion, data analytics and soft computing (ICECDS), August 2017, pp 2575–2582. https://doi.org/10.1109/ICECDS.2017.8389919
2. CrowdNet: a deep convolutional network for dense crowd counting. In: Proceedings of the 24th
ACM international conference on multimedia. https://doi.org/10.1145/2964284.2967300. Accessed 9 Dec 2021
3. Pervaiz M, Jalal A, Kim K (2021) Hybrid algorithm for multi people counting and tracking
for smart surveillance. In: 2021 International Bhurban conference on applied sciences and
technologies (IBCAST), Jan 2021, pp 530–535. https://doi.org/10.1109/IBCAST51254.2021.9393171
4. Pervaiz M, Ghadi YY, Gochoo M, Jalal A, Kamal S, Kim D-S (2021) A smart surveillance
system for people counting and tracking using particle flow and modified SOM. Sustainability
13(10), Art no. 10. https://doi.org/10.3390/su13105367
5. Park JH, Cho SI (2021) Flow analysis-based fast-moving flow calibration for a people-counting
system. Multimed Tools Appl 80(21):31671–31685. https://doi.org/10.1007/s11042-021-11231-1
6. Lakshmi L, Reddy MP, Santhaiah C, Reddy UJ (2021) Smart phishing detection in web pages
using supervised deep learning classification and optimization technique ADAM. Wirel Pers
Commun 118(4):3549–3564. https://doi.org/10.1007/s11277-021-08196-7
7. Conte D, Foggia P, Percannella G, Tufano F, Vento M (2010) A method for counting moving
people in video surveillance videos. EURASIP J Adv Signal Process 2010(1), Art no. 1. https://doi.org/10.1155/2010/231240
8. Agustin OC, Oh B-J (2012) People counting using object detection and grid size estimation.
In: Communication and networking. Berlin, Heidelberg, pp 244–253. https://doi.org/10.1007/978-3-642-27192-2_29
9. Lefloch D, Alaya Cheikh F, Hardeberg J, Gouton P, Picot-Clemente R (2008) Real-time people
counting system using a single video camera. In: Proceedings of SPIE, vol 6811. https://doi.org/10.1117/12.766499
10. Alekya L, Lakshmi L, Susmitha G, Hemanth S (2020) A survey on fake news detection in
social media using deep neural networks 9(03):4
11. Raghavachari C, Aparna V, Chithira S, Balasubramanian V (2015) A comparative study of
vision based human detection techniques in people counting applications. Proc Comput Sci
58:461–469. https://doi.org/10.1016/j.procs.2015.08.064
12. Liu X, Tu PH, Rittscher J, Perera A, Krahnstoever N (2005) Detecting and counting people in
surveillance applications. In: Proceedings of IEEE conference on advanced video and signal
based surveillance. Como, Italy, pp 306–311. https://doi.org/10.1109/AVSS.2005.1577286
Detection of Pneumonia and COVID-19
from Chest X-Ray Images Using Neural
Networks and Deep Learning
1 Introduction
Pneumonia is an infection that causes the air sacs in the lungs to fill with fluid
or pus, causing cough, difficulty in breathing, and various other breathing problems.
Pneumonia can be caused by a variety of organisms such as bacteria, viruses, and
fungi. Children under the age of 2, adults over the age of 65, and people who have
received or are receiving chemotherapy are considered to be in the high-risk zone, and it
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 61
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_6
62 J. S. Nimbhorkar et al.
could be life threatening for people in these groups if pneumonia is not detected
and diagnosed in its early stages. Studies show that in India around 4 lakh children die
every year due to pneumonia, which is almost fifty percent of the pneumonia-related
deaths in India. Vaccines are available that prevent some types
of pneumonia to a large extent. There are various steps a doctor may follow to
diagnose pneumonia: (1) first, check the person’s medical history or smoking
habits and listen for crackling or bubbling sounds when the person inhales; (2)
blood tests are taken to check for signs of bacterial infection; (3) a
pulse oximeter is used to measure the level of oxygen in the blood; and (4) a CT scan
gives a more detailed image of the person’s lungs, which is used to determine whether
a person is suffering from pneumonia.
Recently, a lot of advancements have been made in the fields of neural
networks and deep learning, especially in medical imaging using CNNs to solve
real-world problems. CNN-based models have been used extensively in the recent
past to classify tumors, segment images, and detect abnormalities. By training
a CNN model, it is possible to detect abnormalities such as missing tissue and a weak
diaphragm. Even experienced radiologists may take a long time to observe these
minute details, and may sometimes even miss them, which in turn delays
diagnosis and treatment. Such automated models will be really helpful in giving
fast and accurate results and would also provide aid in areas with limited
access to skilled radiologists. Training models from scratch requires a high
computational cost; hence, we adopt the method of transfer learning.
We have contributed to the existing work by executing different pre-trained models
on three similar datasets with different classes in order to obtain a generalized solution.
The low computational cost of transfer learning becomes apparent in the
large reduction of trainable parameters.
The remainder of this paper is organized as follows: the literature review is
presented in Sect. 2; various CNN architectures are discussed in Sect. 3; the proposed
CNN model is explained in Sect. 4; the experimentation and results are discussed
in Sect. 5; and the conclusion is given in Sect. 6.
2 Related Work
An artificial neural network model is used to extract knowledge and can identify
redundant inputs and outputs, as well as analyze the behavior of hidden
neurons [1]. The authors of [2] used InceptionV3, Xception, and ResNeXt models to
classify COVID-19, normal, and pneumonia images on a
single dataset. One dense layer with 256 neurons was added to Xception and
Inception, and 126 neurons were added to the extra dense layer in ResNeXt.
LeakyReLU was used as the activation function instead of the originally used
ReLU function. A convolutional neural network (CNN) improved by reducing its
parameters is used to detect pneumonia in [3]. Retinal disease is identified using deep
learning models with transfer learning [4]. The author has used Inception-ResNetV2,
Detection of Pneumonia and COVID-19 from Chest X-Ray Images … 63
Xception, DenseNet201, and VGG19 for pneumonia detection [5]. The input tensors
for these models were reduced. A basic CNN, VGG16, VGG19, and InceptionV3
were executed on a pediatric pneumonia dataset [6]. The convolutional layers of
the pre-trained models were frozen during training. All the models had an accu-
racy above 97%. Tuberculosis disease is classified using deep learning models with
transfer learning [7] on chest X-ray images.
In [8], a modified VGGNet was used to categorize chest X-ray images into four
different categories, namely COVID-19, bacterial pneumonia, viral pneumonia, and
normal X-ray. In order to obtain a higher classification rate, three different pooling
layers are used. In [9], the authors developed an algorithm from scratch using deep
convolutional neural networks. They included three convolutional layers with a
ReLU activation for each layer. For classification of emotions, fully connected layers,
softmax, and classification output layers have been used.
Deep learning models are used to identify tumor cells [10]. The authors have
used a deep learning-based model to recognize and characterize the inconsistencies
in a given chest X-ray sample and classify them as unaffected, COVID affected,
or pneumonia [11]. A support vector machine (SVM) is used to identify tumors in
X-ray images [12].
3 CNN Architectures
In this paper, we have proposed a CNN model with four convolutional layers as
shown in Fig. 1. The first layer is with 32 filters, second and third with 64 filters, and
fourth with 128 filters. The size of the filter is 3 × 3. Input size of the image is 150
× 150 × 3. The activation function used is ReLU. The max pooling layer of 2 × 2
size is implemented after each convolutional layer to reduce the spatial dimensions
of the output volume. This is followed by a hidden layer with 64 neurons and,
finally, the output layer.
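As a sanity check, the feature-map sizes and weight counts of this architecture can be traced in plain Python (a sketch assuming unpadded 3 × 3 convolutions with stride 1 and non-overlapping 2 × 2 max pooling, which the text does not state explicitly):

```python
# Trace feature-map size and parameter count through the proposed CNN.
# Assumptions (not stated in the paper): 'valid' padding, stride-1 convolutions,
# non-overlapping 2x2 max pooling after each convolutional layer.

def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def trace(input_size=150, channels=3, filters=(32, 64, 64, 128), k=3):
    size, c_in, total = input_size, channels, 0
    for c_out in filters:
        size = size - k + 1   # 'valid' 3x3 convolution
        size = size // 2      # 2x2 max pooling halves each spatial dimension
        total += conv_params(k, c_in, c_out)
        c_in = c_out
    return size, total

final_size, conv_weights = trace()
print(final_size, conv_weights)  # 7, 130176
```

Under these assumptions, the final 7 × 7 × 128 volume is flattened before the 64-neuron hidden layer.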
64 J. S. Nimbhorkar et al.
5.1 Dataset
In this experiment, three publicly available datasets from Kaggle are used. The
pneumonia chest X-ray dataset [17] comprises 5863 X-ray images divided into 2 cate-
gories (pneumonia/normal). The COVID-19 and pneumonia dataset [18] contains a
total of 6432 X-ray images, with 20% of the data held out as test images, divided into
3 categories (COVID-19, pneumonia, normal). The COVID-19 radiography database
[19] has 3616 COVID-19 positive cases, 10,192 normal, 6012 lung opacity
(non-COVID lung infection), and 1345 viral pneumonia images.
The experiment is conducted on all three datasets using the online Kaggle Nvidia
P100 GPU with 16 GB memory and a 1.32 GHz memory clock. AlexNet, InceptionV3,
ResNet50, VGG19, and our CNN model are executed using the parameters shown in
Table 1.
5.3 Results
In this section, Table 2 shows the precision, recall, and accuracy of all the five models
on different datasets by varying the parameters.
It is observed from the results in Table 2 that the Adam optimizer gives better
accuracy than SGD across all the datasets. Hence, we have implemented our
proposed CNN model with the Adam optimizer for 20 epochs on all three datasets.
The result of our proposed CNN model is shown in Table 3.
We first built a CNN model from scratch, whose architecture is shown in Fig. 1.
The accuracies obtained when we executed the model on dataset [17], dataset
[18], and dataset [19] are 93.58%, 94.25%, and 84.32%, respectively. On the first
dataset [17], we implemented three pre-trained models (InceptionV3, ResNet50,
and VGG19) and one non-pre-trained model (AlexNet). As this dataset contains
only two classes, a sigmoid unit is used as the output function, and binary cross
entropy is used as the loss function. First, the original versions of these four models
were executed. The optimizer, along with its different parameters, is taken into
consideration for all four models according to [13–16]. The results are displayed in
Fig. 2. It is observed from the graphs that our proposed model outperforms all the
other models. The accuracy, precision, and recall are computed using Eqs. 1, 2,
and 3, respectively.
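Eqs. 1–3 themselves are not reproduced in this excerpt; assuming they denote the standard confusion-matrix definitions, they can be written as:

```python
# Standard metric definitions over confusion-matrix counts:
# tp/tn = true positives/negatives, fp/fn = false positives/negatives.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)   # Eq. 1 (assumed)

def precision(tp, fp):
    return tp / (tp + fp)                    # Eq. 2 (assumed)

def recall(tp, fn):
    return tp / (tp + fn)                    # Eq. 3 (assumed)

# Toy example: 90 TP, 85 TN, 10 FP, 15 FN
print(accuracy(90, 85, 10, 15))  # 0.875
print(precision(90, 10))         # 0.9
print(recall(90, 15))
```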
The original models of ResNet50 [15] and InceptionV3 [14] do not contain any
fully connected dense layers. Hence, we fine-tuned these two models by adding
hidden layers to get better results. A widely used rule of thumb is that a hidden
layer should have fewer nodes than the input layer (usually capturing 70–90% of
the variance of the input, according to [1]). Taking this into consideration, one
hidden layer with 2048 neurons was added initially. The results are shown in
Table 2. Comparing the results from Table 2, the accuracy of InceptionV3
decreased by less than 1%, but that of ResNet50 increased by around 5%. Taking
the above rule of thumb into consideration, we executed the same models with
1024 neurons in the hidden layer. As seen in Table 2, this gave better results than
the previous InceptionV3 and ResNet50 models. Hence, we proceeded with this
architecture by
adding one more hidden layer with 512 neurons. This structure gave the best results,
with a great increase in accuracy for ResNet50. Lastly, the 1024 + 1024 structure
was executed, which did not give satisfactory results compared with the 1024 + 512
structure.
Table 2 Precision (P), recall (R), and accuracy (A) for dataset1
[Fig. 2 Accuracy (0–1) of AlexNet, InceptionV3, ResNet50, VGG19, and our CNN model]
All these models were executed on the other two datasets as well. As there are
more than two classes, softmax was used as the output function, and categorical cross
entropy was used as the loss function for these two datasets. The precision and recall
for all the classes in each dataset are displayed in Table 2.
Out of all the models, VGG19 with Adam optimizer was the best model for the
first dataset, InceptionV3 with one dense layer (2048 neurons) was the best model
for the second dataset, and the original model of InceptionV3 with Adam optimizer
was the best model for the third dataset.
There were a few observations that were similar across all three datasets:
• There was a decrease in loss and a great increase in accuracy when the optimizer
was changed from stochastic gradient descent to Adam in VGG19 (a 4.3% increase
on the first dataset, 8.5% on the second, and 34.5% on the third).
• The best performance of ResNet50 was obtained when two hidden layers with
1024 and 512 nodes were added.
• Out of all the models, the performance of ResNet50 was the worst.
In [11], two different datasets were used. The first dataset was processed as input
to an improved LeNet model; only transfer learning was used to experiment on
this dataset. On the second dataset, hyperparameters were adjusted many times
without changing the network structure, and the results were compared.
In [5], the authors implemented Inception-ResNetV2, Xception, and DenseNet201
on a single dataset. Only the method of transfer learning was used. In [6], the authors
implemented VGG16, VGG19, and InceptionV3 on a single dataset. Two dense
layers of 32 nodes each were added. In our paper, we experimented on three datasets
Table 4 Difference between actual and trainable parameters for all the models
Models | Actual parameters | Trainable parameters | Difference
AlexNet | 58,290,945 | 58,288,193 | 2752
InceptionV3 | 21,804,833 | 2049 | 21,802,784
ResNet50 | 23,589,761 | 2049 | 23,587,712
VGG19 | 139,574,337 | 119,549,953 | 20,024,384
using transfer learning with different pre-trained models like AlexNet, InceptionV3,
ResNet50, and VGG19. Our own CNN model was also built from scratch, and all
the results are compared. The architectures of the models are tweaked, and dense
layers with a varying number of nodes were added which resulted in the reduction
of trainable parameters. The reduction in parameters is shown in Table 4.
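The 2049 trainable parameters reported for InceptionV3 and ResNet50 in Table 4 are consistent with a frozen base that emits 2048 pooled features feeding a single sigmoid output unit; the arithmetic is easy to check (the layer sizes here are inferred from the table, not given as code in the paper):

```python
def dense_params(n_in, n_out):
    """A dense layer holds n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out

# Frozen 2048-feature base plus one sigmoid output unit:
print(dense_params(2048, 1))  # 2049, matching Table 4

# A 2048 -> 1024 -> 512 -> 1 head (as in the best ResNet50 variant,
# assuming a single-unit output as on the first dataset) would instead train:
head = (dense_params(2048, 1024) + dense_params(1024, 512)
        + dense_params(512, 1))
print(head)
```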
Adam combines the best properties of the AdaGrad and RMSProp algorithms.
According to [11], Adam is straightforward to implement, computationally
efficient, and has low memory requirements. It performs best when there is a large
number of parameters. Hence, the combination of Adam with VGG19 gave a great
increase in accuracy, as VGG19 has the most parameters of the models used.
The concept of skip connections makes the architecture of ResNet50 [15] different
from that of the other models. It uses shortcut connections to address the vanishing
gradient problem. As a result of this architecture, the accuracy of this model may
have been compromised.
6 Conclusion
Pneumonia and COVID-19 have been a major cause of a large number of deaths
across various age groups. Early detection has been shown to aid faster diagnosis
and treatment. In this paper, we have implemented transfer learning methods on
existing CNN models to obtain faster results. The results were compared between
AlexNet (non-pre-trained), InceptionV3, ResNet50, VGG19, and our proposed
CNN model across three datasets. For datasets 1 and 2, the proposed CNN model
achieved the highest accuracy, while for dataset 3 it was the original InceptionV3
model with Adam, though by a small margin.
References
1. Boger Z, Guterman H (1997) Knowledge extraction from artificial neural network models. In:
IEEE international conference on systems, man, and cybernetics. Computational cybernetics
and simulation, vol 4, pp 3030–3035. https://doi.org/10.1109/ICSMC.1997.63305
2. Jain R, Gupta M, Taneja S, Hemanth D (2020) Deep learning based detection and analysis
of COVID-19 on chest X-ray images. Appl Intell 1–11. PMCID: PMC7544769
3. Li X, Chen F, Hao H, Li M (2020) A pneumonia detection method based on improved convo-
lutional neural network. In: 2020 IEEE 4th information technology, networking, electronic and
automation control conference (ITNEC), pp 488–493. https://doi.org/10.1109/ITNEC48623.
2020.9084734
4. Kermany DS et al (2018) Identifying medical diagnoses and treatable diseases by image-based
deep learning. Cell 172(5):1122-1131.e9. https://doi.org/10.1016/j.cell.2018.02.010 PMID:
29474911
5. Jiang Z (2020) Chest X-ray pneumonia detection based on convolutional neural networks.
In: 2020 international conference on big data, artificial intelligence and internet of things
engineering (ICBAIE), Fuzhou, China, pp 341–344. https://doi.org/10.1109/ICBAIE49996.
2020.00077
6. Labhane G, Pansare R, Maheshwari S, Tiwari R, Shukla A (2020) Detection of pediatric
pneumonia from chest X-Ray images using CNN and transfer learning. In: Proceedings of
the 3rd international conference on emerging technologies in computer engineering: machine
learning and internet of things (ICETCE), Jaipur, India, 7–8 Feb 2020, pp 85–92
7. Seshu Babu G, Sachin Saj TK, Sowmya V, Soman KP (2021) Tuberculosis classification
using pre-trained deep learning models. In: Advances in automation, signal processing,
instrumentation, and control. Select proceedings of i-CASIC 2020, pp 767–774
8. Anand R, Sowmya V, Menon VK, Gopalakrishnan EA, Soman KP (2021) Modified
VGG deep learning architecture for COVID-19 classification using bio-medical images.
IOP Conf Ser Mater Sci Eng 1084:012001
9. Palaniswamy S, Suchitra (2019) A robust pose & illumination invariant emotion recognition
from facial images using deep learning for human-machine interface. In: 2019 4th international
conference on computational systems and information technology for sustainable solution
(CSITSS), pp 1–6. https://doi.org/10.1109/CSITSS47250.2019.9031055
10. Subbiah U, Kumar RV, Panicker SA, Bhalaje RA, Padmavathi S (2020) An enhanced deep
learning architecture for the classification of cancerous lymph node images. In: 2020 second
international conference on inventive research in computing applications (ICIRCA), pp 381–
386. https://doi.org/10.1109/ICIRCA48905.2020.9183250
11. Kishore SLS, Sidhartha AV, Reddy PS, Rahul CM, Vijaya D (2021) Detection and diagnosis
of covid-19 from chest X-ray images. In: 2021 7th international conference on advanced
computing and communication systems (ICACCS), pp 459–465. https://doi.org/10.1109/ICA
CCS51430.2021.9441862
12. Pooja A, Mamtha R, Sowmya V, Soman KP (2016) X-ray image classification based on
tumour using GURLS and LIBSVM. In: International conference on communication and
signal processing (ICCSP'16)
13. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional
neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
14. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception
architecture for computer vision. IEEE Conf Comput Vision Pattern Recognit (CVPR)
2016:2818–2826. https://doi.org/10.1109/CVPR.2016.308
15. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. IEEE
Conf Comput Vision Pattern Recognit (CVPR) 2016:770–778. https://doi.org/10.1109/CVPR.
2016.90
16. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. CoRR, abs/1409.1556
17. Chest X-ray images (Pneumonia). https://www.kaggle.com/paultimothymooney/chest-xray-
pneumonia
18. Chest X-ray (COVID-19 and Pneumonia). https://www.kaggle.com/prashant268/chest-xray-
covid19-pneumonia
19. COVID-19 Radiography Database. https://www.kaggle.com/tawsifurrahman/covid19-radiog
raphy-database
Plant Leaf Disease Detection
and Classification Using Deep Learning
Technique
Abstract Food is the primary resource for humans, so securing and taking care of
plants is a top priority. The rise in crop leaf diseases is becoming a major
problem in agriculture. Treating a disease at an early stage prevents it from
spreading between plants. Modern technology paves the way for the detection of
crop leaf diseases, and deep learning makes that detection easier. The dataset used
for training is publicly available. The trained model can classify up to 15 diseases.
The training accuracy reached 97.35%, which is sufficient to detect diseases
accurately. The proposed system can detect crop leaf disease with high accuracy
and can be utilized to detect disease in the real world.
1 Introduction
Modern technology has helped humans in useful ways, such as boosting crop
yields and providing new ways of harvesting crops. Technology has enabled
enough crops to be grown to feed almost 7 billion people on Earth. Crop diseases
arise from changes in external factors such as climate, a decrease in pollination,
an increase in the spread of disease, and so on. Controlling these external factors
can help increase crop yield. The main factor that can be controlled is plant
disease. Detecting the disease in an
S. S. Bhoomika (B)
Department of Information Science and Engineering, GM Institute of Technology, Shimoga,
Karnataka, India
e-mail: [email protected]
K. M. Poornima
Department of Computer Science and Engineering, JNN College of Engineering, Shimoga,
Karnataka, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 73
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_7
74 S. S. Bhoomika and K. M. Poornima
early stage can help save the crops. More than 80% of crop production in
developing countries is handled by smallholder farmers. Statistical analysis shows
a yield loss of more than 50%, which is not acceptable in modern times. The loss
of yield is largely due to the spread of disease between leaves, and it chiefly
affects smallholder farmers. Farmers are trying their best to minimize crop disease
and its spread between plants, using pesticides and other preventive measures to
mitigate the problem. However, these are not effective against leaf disease,
because the disease has to be analyzed before pesticides are applied: the wrong
pesticide on the wrong plant is ineffective and can even reduce the crop yield. In
rural areas, most people work in the agricultural sector. In plants, the leaf is the
main organ for producing food. A major factor in decreased yield is disease
attacking the leaves, stems, and nodes of the plant. It is therefore necessary to
identify diseases at an early stage to prevent losses.
Identification of disease on a plant is very difficult due to the variety of diseases
spanning different crops. There are many agriculture centers where the disease can
be analyzed and the correct pesticides prescribed, but analyzing a disease is a
major challenge since the crop has to be examined manually, and the analysis
depends on the knowledge and experience of the person performing it, so it may
vary between people. This is where modern technology comes into play: it can
detect the disease type within seconds by making use of deep learning. Deep
learning is being applied in many fields and has proven to provide good and
acceptable results.
This work applies a deep learning technique to detect and classify leaf disease in
plants accurately. The main objectives are as follows:
• To extract the features from the leaf.
• To detect the type of leaf disease.
• To classify the type of leaf disease.
2 Literature Survey
Sardogan et al. [1] have proposed the classification of leaf diseases on plants using
CNN and the LVQ algorithm. Features are extracted from the image during
training, i.e., during each epoch of the training process. In the proposed system,
the LVQ algorithm is utilized for the classification of data and also for training.
The system achieves 86% accuracy.
Thejuswini et al. [2] have proposed leaf disease detection with fertilizer
recommendation. The system utilizes k-means clustering with the SVM algorithm
for classification and detection of disease, and reaches around 80% accuracy.
Jasim et al. [3] have proposed the detection of plant leaf diseases using image
processing and deep learning techniques. Convolutional neural networks are imple-
mented to detect the diseases. The classification is performed based on the inputs
Plant Leaf Disease Detection and Classification Using Deep … 75
from the previous layer. The system obtains 98.29% of training accuracy and 98.02%
of testing accuracy.
Sholihati et al. [4] have proposed a classification of potato leaf diseases. The
system uses the VGG16 and VGG19 neural network models within a deep
learning approach based on convolutional neural networks. The VGG16 model
performs slightly better than the VGG19 architecture. The proposed system
achieved a maximum accuracy of 91%.
Karol et al. [5] have proposed using convolutional neural networks and image
processing techniques to detect plant disease. In the proposed system, a database
stores the pesticides for the corresponding detected pests and diseases. The layers
residing in the convolutional neural network are dense, dropout, activation,
flatten, Convolution2D, and MaxPooling2D. The model provides an accuracy of 78%.
Haridas et al. [6] proposed diagnosis and severity measurement of tomato leaf
disease. The algorithms used to analyze performance are linear discriminant
analysis (LDA), KNN, SVM, Naïve Bayes, and decision tree. Based on the final
result, it is concluded that the support vector machine performed better than the
other algorithms.
Rajesh et al. [7] proposed the classification of leaf disease using a decision tree.
The image is first refined with a refinement filter, and decision tree classifiers are
applied to distinguish between healthy and diseased leaves. A limitation is that the
trees cannot be split into particular nodes when there is insufficient supporting data.
Kumari et al. [8] proposed leaf disease detection using a k-means clustering
algorithm and an ANN for classification. GLCM is used to generate statistical
features, and a backpropagation neural network classifies the data. After the
network has been trained, it displays the performance plot, confusion matrix, and
error histogram. The proposed system detects leaf disease with 92% accuracy.
Robert et al. [9] implemented a deep learning-based automated image capturing
system for detecting and recognizing tomato plant leaf disease. The network
model used in the proposed system was AlexNet. A faster R-CNN model is also
used and is trained for up to 50 epochs with modifications to the fully connected
layer. A Web server hosts a webpage that the user can access to view the resulting
output image. The proposed system works very well and has an accuracy of 91.6%.
Figure 1 describes the system architecture for the training phase. It consists of
image acquisition, preprocessing, and feature extraction using CNN. Finally, the
trained model is stored. Training is the process of generating a trained model file
from the given input. In the developed system, the dataset was split into training
and testing sets; 80% of the dataset is used for training.
Figure 2 describes the architecture of the system for testing leaf disease on a
plant. The proposed system is composed of image acquisition, preprocessing, and
[Fig. 1 Training phase: training dataset → feature extraction using CNN]
[Fig. 2 Testing phase: feature extraction using CNN → detect disease]
feature extraction using CNN. Testing is the process of checking whether the
model has been trained properly. During the testing phase, the trained model goes
through rigorous checking against a testing dataset. The remaining 20% of the
dataset is used for testing.
Image acquisition is the initial stage of the process. The dataset of leaf disease
images was collected from the Web site. It includes leaf disease images of three
major crops: pepper bell, potato, and tomato, and contains 15 different classes.
Preprocessing modifies the raw input image before passing it to the learning
algorithm. The selected input image was resized to 256 × 256 pixels to fit the
network, speed up the training process, and generate a model that can be tested
realistically.
CNN is one of the best approaches in deep learning; it contains multiple layers
and trains on the dataset in a fast manner. In the proposed system, the structure of
the CNN consists of 8 layers, as follows:
• Input layer
• Convolutional layer
• Pooling layer
• Normalization layer
• Nonlinear layer
• Fully connected layer
• Dropout layer
• Softmax layer
The input images and their pixel values are stored in the input layer. The input
volume has the dimensions
nh × nw × nc (1)
where
nh is the size of the height,
nw is the size of the width,
nc is the number of channels.
This layer produces the output tensor by convolving the layer input with the
convolution kernel. The stride refers to the kernel's sliding step size; here, the
stride value is set to 2.
A pooling layer operates on the feature map to reduce its dimensions. In the neural
network, the pooling layer is placed after the convolutional layer. The pooling
layer performs two main functions: the first is to reduce the number of parameters
in the feature map and in the weights, and the second is to control overfitting of
the model. Max pooling is used: the pooling layer selects the maximum value of
the region of the feature map covered by the filter.
Normalization Layer
The proposed system uses a batch normalization layer, which normalizes the
output of the previous layer in mini-batches. Normalization is applied to each
feature map using the batch mean and variance. The normalization formula is
defined in Eq. 2:

x̂i = (xi − μB) / √(σB² + ε) (2)

where
μB is the mean of the mini-batch,
xi is an instance of the input data,
σB² is the variance of the mini-batch,
ε is a smoothing term.
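Eq. 2 can be exercised directly on a small batch (a minimal sketch in plain Python; the learnable scale and shift parameters that typically follow batch normalization are omitted):

```python
import math

def batch_normalize(xs, eps=1e-5):
    """Apply Eq. 2 to a 1-D mini-batch xs."""
    mu = sum(xs) / len(xs)                           # mu_B: batch mean
    var = sum((x - mu) ** 2 for x in xs) / len(xs)   # sigma_B^2: batch variance
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

out = batch_normalize([1.0, 2.0, 3.0, 4.0, 5.0])
print(out)  # roughly zero mean and unit variance
```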
The nonlinear layer applies the rectified linear unit (ReLU) activation function,
defined in Eq. 3:

y = max(0, x) (3)
A fully connected layer connects every neuron to all activations of the previous
layer of the convolutional network. The fully connected layer performs its
calculation in matrix form: the activations are multiplied by a weight matrix and
offset by a bias. The matrix multiplication computed by the fully connected layer
helps to classify the given image.
The dropout layer nullifies part of the data by masking the outputs of some
neurons while leaving the rest unmodified. It sets some of the values to 0, at a
frequency set for each epoch of the training period, which helps prevent the model
from overfitting.
The softmax layer is the last layer in the CNN; using the softmax function, the
correct disease is predicted. The softmax function converts a vector of k real
values into values that sum to 1, which is necessary for classification in a CNN.
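A softmax sketch in plain Python (subtracting the maximum before exponentiating is a standard numerical-stability step, not something the text specifies):

```python
import math

def softmax(values):
    """Convert a vector of k real values into probabilities summing to 1."""
    m = max(values)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest input gets the largest probability
print(sum(probs))  # 1.0 (up to floating-point error)
```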
Algorithm
The steps of CNN algorithm are as follows:
Step 1 : Input image.
Step 2 : Feature extraction using convolutional operation.
Step 3 : Max pooling layer to reduce the feature map.
Step 4 : Batch normalization normalizes the mini-batch mean and standard
deviation.
Step 5 : Rectified linear unit creates the activation map.
Step 6 : Fully connected layer connects the neurons.
Step 7 : Softmax function detects and classifies the disease.
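The steps above can be traced end to end on a toy input in plain Python (a didactic sketch with one filter and made-up weights, not the trained network; Step 4, batch normalization, is skipped here for brevity):

```python
import math
import random

random.seed(0)

def conv2d(img, kern):
    """Step 2: 'valid' convolution of a square image with a square kernel."""
    n, k = len(img), len(kern)
    return [[sum(img[i + a][j + b] * kern[a][b]
                 for a in range(k) for b in range(k))
             for j in range(n - k + 1)]
            for i in range(n - k + 1)]

def maxpool(fm):
    """Step 3: non-overlapping 2 x 2 max pooling."""
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, len(fm[0]) - 1, 2)]
            for i in range(0, len(fm) - 1, 2)]

def relu(fm):
    """Step 5: rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in fm]

def softmax(vals):
    """Step 7: turn logits into class probabilities."""
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    return [e / sum(exps) for e in exps]

# A random 6 x 6 'image' (Step 1) and a single 3 x 3 filter with toy weights.
img = [[random.random() for _ in range(6)] for _ in range(6)]
kern = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

# Steps 2, 3, 5, then flattening for the fully connected layer (Step 6).
features = [v for row in relu(maxpool(conv2d(img, kern))) for v in row]
logits = [0.1 * (c + 1) * sum(features) for c in range(3)]  # toy FC weights
probs = softmax(logits)
print(probs)
```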
80 S. S. Bhoomika and K. M. Poornima
In the proposed system, the PlantVillage dataset is used. This dataset is trained,
and a model file is generated. Using this trained model file, the system can classify
leaf diseases of three types of plants: tomato, pepper, and potato. The user
interface library used in this project is the Tk interface (Tkinter), which makes it
easy to place buttons, edit text boxes, and display images.
In our project, we used the PlantVillage dataset collected from kaggle.com. The
dataset comprises 900 images of three types of plant leaves (pepper bell, potato,
and tomato), each of size 256 × 256 pixels, spread over 15 different classes of leaf
disease. Of the 900 images, 750 were selected for training and 150 for detecting
the diseases in the plant. From each class, 50 images are used for training and 10
for testing.
Figure 3a shows the original image, and Fig. 3b shows the resized image of the
tomato leaf disease. Figure 3c shows the layer_1 image of the convolutional layer
of the tomato leaf; in layer_1, the filter is applied to produce 128 × 128 pixels.
Figure 3d shows the layer_2 image, and Fig. 3e shows the layer_3 image of the
convolutional layer; layer_2 produces 64 × 64 pixels, and layer_3 produces 32 ×
32 pixels.
Figure 3f shows the visual image of a pooling layer, and Fig. 3g shows the visual
image of a normalization layer.
In the pooling layer, max pooling is used, and batch normalization is applied in
the normalization layer. Figure 3h shows the visual image of the tomato leaf at the
nonlinear layer, and Fig. 3i at the dropout layer. Figure 3j shows a visual image of
the tomato leaf at the softmax layer of the CNN. Figure 3k shows the detection
and classification of disease on the tomato leaf; the predicted disease is tomato
bacterial spot.
Table 1 shows the testing of pepperbell leaves. For training, 100 images are
taken; for testing pepperbell leaf disease, 20 images are taken. The bacterial spot
and healthy classes of pepperbell are detected. Of the 10 bacterial spot images, 8
were classified correctly and the remaining 2 were predicted falsely. All 10
images of healthy pepperbell leaves were classified correctly. So, out of 20
images, 18 were correctly classified, giving a recognition rate of 90%.
Table 2 shows the testing results for potato leaves. For training, 150 images are
taken, and for testing, 30 images. In the potato leaf, 2 diseases are detected, early
blight and late blight, and healthy potato leaves are also identified. For potato
early blight, 9 images predicted the disease correctly and 1 was predicted falsely.
For potato late blight, all 10 leaf images detected the disease correctly. For potato
a) b) c) d)
e) f) g)
h) i) j)
k)
Fig. 3 a Original tomato leaf image, b Resized leaf image, c Image of convolutional layer_1, d
Image of convolutional layer_2, e Image of convolutional layer_3, f Image of pooling layer, g Image
of normalization layer, h Image of nonlinear layer, i Image of dropout layer, j Image of softmax
layer, k Detection and classification of tomato leaf disease
healthy leaves, all 10 input images were predicted correctly. So, out of 30 images,
29 were correctly classified, and the recognition rate is 96.67%.
Table 3 shows the testing results for tomato leaves. For training, 500 images are
taken, and for testing, 100 images. In the tomato leaf, 9 diseased classes and 1
healthy class are detected. For tomato bacterial spot, 8 images detected the
disease correctly and the remaining 2 produced wrong predictions. All 10 images
each of tomato early blight, target spot, late blight, septoria leaf spot, spider mite,
and healthy leaf were classified correctly. For tomato leaf mold and yellow leaf
curl virus, 9 images each classified the disease correctly, and the remaining 1
image in each was predicted wrongly. So, out of 100 images, 94 were recognized
correctly; the recognition rate is 94%.
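The recognition rates quoted here are simply the number of correctly classified test images over the total, for example:

```python
def recognition_rate(correct, total):
    """Percentage of correctly classified test images."""
    return round(100 * correct / total, 2)

print(recognition_rate(29, 30))   # potato: 96.67
print(recognition_rate(94, 100))  # tomato: 94.0
```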
5 Conclusion
The identification of leaf diseases is very important for the successful cultivation of
crops, and the system is implemented using deep learning technique. It identifies
the diseases of pepper, tomato, and potato leaves. The trained model can classify up
to 15 different classes of disease. It detects both healthy and diseased leaves. The
system is developed to detect and classify the type of diseases in an accurate way
using deep learning technique. The convolutional neural network algorithm is used
to classify the different types of leaf diseases of a plant. The proposed method gives
the accuracy of 97.35%.
References
1. Sardogan M, Tuncer A, Ozen Y (2018) Plant leaf disease detection and classification based on
CNN with LVQ algorithm. In: Proceeding of third international conference on computer science
and engineering, pp 382–385, Bosnia and Herzegovina, 20–23 Sept 2018
2. Indumathi R, Saagari N, Thejuswini V, Swarnareka R (2019) Leaf disease detection and fertilizer
suggestion. In: Proceeding of third international conference on systems computation automation
and networking, pp 1–7, India, 29–30 Mar 2019
3. Jasim MA, AL-Tuwaijari JM (2020) Plant leaf diseases detection and classification using image
processing and deep learning techniques. In: Proceeding of international conference on computer
science and software engineering, pp 259–265, Iraq, 16–18 Apr 2020
4. Sholihati RA, Sulistijono IA, Risnumawan A, Kusumawati E (2020) Potato leaf disease classi-
fication using deep learning approach. In: Proceeding of international electronics symposium,
pp 392–397, Indonesia, 29–30 Sept 2020
5. Karol AMA, Gulhane D, Chandi T (2019) Plant disease detection using CNN and remedy. Int
J Adv Res Electr Electron Instrum Eng 08(3):622–628, India, Mar 2019
6. Gadade HD, Kirange DK (2020) Tomato leaf disease diagnosis and severity measurement. In:
Proceeding of fourth world conference on smart trends in systems, security and sustainability,
pp 318–323, UK, 27–28 Jul 2020
7. Rajesh B, Vishnu Sai Vardhan M, Sujihelen L (2020) Leaf disease detection and classification
by decision tree. In: Proceedings of fourth international conference on trends in electronics and
informatics, pp 705708, Tirunelveli, India, 15–17 June 2020
8. Kumari CU, Prasad SJ, Mounika G (2019) Leaf disease detection feature extraction with K-
means clustering and classification with ANN. In: Proceedings of third international conference
on computing methodologies and communication, pp 1095–1098, India, 27–29 Mar 2019
9. de Luna RG, Dadios EP, Bandala AA (2018) Automated image capturing system for deep
learning-based tomato plant leaf disease detection and recognition. In: Proceeding of tenth
international conference of electrical and electronics engineers, pp 1414–1419, Korea, 28–31
Oct 2018
Breast Mass Classification Using
Convolutional Neural Network
Varsha Nemade, Sunil Pathak, Ashutosh Kumar Dubey, and Deepti Barhate
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 85
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_8
1 Introduction
Nowadays, breast cancer is the most perilous disease for women. According to a 2019
report of the Indian Council of Medical Research (ICMR), new breast cancer cases in
India rose to about 1.5 lakh, and every year about 70,000 women succumb due to delays
in detection and treatment [1]. This shows that cases in India are increasing, a rate
that can be reduced by awareness of early detection [2]. Imaging techniques such as
mammography, Magnetic Resonance Imaging (MRI), ultrasound, and tomosynthesis are used
for breast cancer study and diagnosis. Mammography is one of the most widely used
imaging methods for early diagnosis of breast cancer. For each breast, two views are
produced by a radiologist: Mediolateral Oblique (MLO) and Craniocaudal (CC). Early
diagnosis through mammogram screening increases the chance of survival [3]. Diagnosis
depends on the radiologist's experience and expertise, and some surveys have found
that errors in diagnosis may increase the cost of surgeries [4].
Benign and malignant are the two main classes of breast cancer; in this study,
mammogram images are used to classify breast cancer into these two classes. Computer-
Aided Diagnosis (CAD) provides a second opinion to the radiologist when making
decisions from mammogram image interpretations. Deep learning, particularly with
CNNs, is a widely used technique in medical image analysis [5]. Several algorithms
have already been used for breast cancer and other cancer detection [6–10], but there
is scope for improvement in this area.
The work is organized as follows: related work on breast mass categorization is
discussed in Sect. 2, followed by the proposed model in Sect. 3. Observations are
discussed in Sect. 4, followed by the conclusions of the work in Sect. 5.
2 Related Works
Computer-Aided Diagnosis (CAD) systems use computer technology to detect anomalies
in mammograms, which helps to increase accuracy. Figure 1 shows the difference
between conventional ML and DL techniques. Recently, many techniques for the
classification of masses have been proposed. In the traditional machine learning
approach, features need to be extracted from the input image and then a classifier
is applied. Beura et al. [11] used the 2-D discrete orthonormal S-transform (DOST)
for feature extraction with the AdaBoost algorithm using RF as the base classifier
and achieved accuracies of 98.3% and 98.8% on the MIAS and DDSM datasets,
respectively. Li et al. [12] used the DDSM dataset and performed classification using
contour features, obtaining a best accuracy of 99.66% with SVM. Mughal et al. [13]
described a method for detecting tumors in breast masses and classifying them as
normal or abnormal, benign or malignant, using a combination of the Hat
transformation and GLCM with a back-propagation network. Khan et al. [14] proposed
different techniques for mass classification from mammograms using Gabor feature
extraction and SELwSVM for classification. In [15], textural features are extracted
using the contourlet transform and GLCM, and SVM and KNN are used for classification,
achieving accuracies of 94.12% and 88.89%, respectively, on the MIAS dataset.
In deep learning, features are automatically learned and used for classification
[16–22]; there is no need for handcrafted features. Suzuki et al. [23] presented
results using a deep convolutional neural network (DCNN) trained by transfer learning
and achieved 89.9% sensitivity. Ribli et al. [24] described a Faster R-CNN approach
for detecting and classifying masses in mammograms, with an area under the curve
(AUC) of 0.95. Wang et al. [25] worked on a hybrid deep network using a Recurrent
Neural Network (RNN) to extract features from multi-view data on the BCDR dataset
and achieved an AUC of 0.89. Al-Masni et al. [26] described a method for finding and
classifying masses in the DDSM dataset using the ROI-based CNN YOLO model and
achieved a classification accuracy of 97%. On the DDSM dataset, Al-Antari et al. [27]
suggested a system that combined a deep learning YOLO detector with an
InceptionResNetV2 classifier and achieved 97.50% accuracy. Gnanasekaran et al. [28]
proposed a CNN model with 8 convolutional layers, 4 max-pooling layers, and 2 fully
connected layers, applied it to the MIAS and DDSM datasets, and obtained accuracies
of 92.54% and 96.47%, respectively.
Deep learning CNNs learn features automatically for classification problems, which
removes the need for handcrafted feature extraction and selection. Deep learning has
consistently shown improved and accurate performance, which motivated us to explore
DL for breast cancer analysis. In this work, we designed a CNN model with five
convolutional layers, five max-pooling layers, four dropout layers, and two fully
connected layers for the classification of mammogram images using the DDSM breast
image dataset.
3 Proposed Methodology
The publicly available mammogram datasets DDSM and CBIS-DDSM are used. The DDSM
dataset contains more than 2600 images covering normal, benign, and malignant cases,
with CC and MLO views of breast images. The dataset is a collaborative effort of
Massachusetts General Hospital, the University of South Florida, and Sandia National
Laboratories. The CBIS-DDSM dataset is a subset and updated version of DDSM that
contains ROI segmentations, bounding boxes, and pathologic diagnosis data. Negative
images are taken from DDSM and positive images from CBIS-DDSM. The extracted ROIs
are preprocessed by applying random flips and rotations and then resized.
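The flip/rotate/resize preprocessing described above can be sketched in plain Python. The paper does not state the exact augmentation parameters or target size, so the choices below (50% flip probability, quarter-turn rotations, nearest-neighbour resize) are illustrative assumptions:

```python
import random

def augment(image, rng=random.Random(0)):
    """Randomly flip and rotate a 2-D image given as a list of rows."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]              # horizontal flip
    for _ in range(rng.randrange(4)):                     # 0-3 quarter turns
        image = [list(row) for row in zip(*image[::-1])]  # rotate 90 degrees
    return image

def resize(image, out_h, out_w):
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]
```

In practice such augmentation is usually performed with a deep learning framework's data pipeline; this sketch only shows the operations themselves.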
Deep learning has shown improvement on medical images [29]. Figure 2 shows the newly
proposed model. It consists of convolutional, max-pooling, and dropout layers: five
convolutional layers, five max-pooling layers, four dropout layers, and two fully
connected layers. The model uses a batch size of 32 with a dropout rate of 0.20.
Each convolutional layer uses 3 × 3 kernels with the ReLU activation function, and
max pooling is used with a pool size of (2, 2) and 'same' padding. Figure 3 shows the
detailed model summary, with details of the layers, output shapes, and parameters.
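As a sanity check on the architecture above, the spatial shapes can be traced through the five conv/pool stages in a few lines of Python. The 224 × 224 input size is an assumption for illustration; the paper does not state the input resolution:

```python
def model_shapes(h=224, w=224, stages=5):
    """Trace spatial shapes through conv(3x3, 'same' padding) + 2x2 max-pool stages."""
    shapes = [("input", (h, w))]
    for i in range(1, stages + 1):
        # 'same' padding keeps the spatial size unchanged through the conv
        shapes.append((f"conv{i}", (h, w)))
        h, w = h // 2, w // 2          # 2x2 max pooling halves each dimension
        shapes.append((f"pool{i}", (h, w)))
    return shapes
```

Starting from 224 × 224, the feature map shrinks to 7 × 7 after the fifth pooling layer, which is then flattened into the two fully connected layers.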
architecture applied to the dataset created using images from the DDSM and CBIS-
DDSM datasets. The classification report shows that the recall of class 1 (malignant)
is very low because of the imbalanced dataset, which has a smaller number of
malignant records. The problem of the imbalanced dataset can be addressed by K-fold
cross-validation in future work.
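A minimal sketch of the K-fold idea mentioned above, using a stratified split so that each fold preserves the benign/malignant ratio. Note that plain K-fold does not by itself fix class imbalance; the stratification shown here is an assumption added for illustration:

```python
from collections import defaultdict

def stratified_kfold(labels, k=5):
    """Yield k (train_idx, test_idx) splits that preserve class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):       # deal indices round-robin per class
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

With an 80/20 benign/malignant split, every test fold then contains the same 4:1 class ratio as the full dataset.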
5 Conclusions
This paper presented a CNN model for breast mass classification using the DDSM and
CBIS-DDSM datasets. The CNN model uses four different types of layers, with the
output taken from the last fully connected, softmax layer. The model produces an
accuracy of 89.46%. The performance of the proposed CNN model can be enhanced by
extending it with K-fold cross-validation, and accuracy and recall can be further
improved by using techniques to balance the dataset.
References
19. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, Kadoury S, Tang A
(2017) Deep learning: a primer for radiologists. Radiographics 37(7):2113–2131
20. Platania R, Shams S, Yang S, Zhang J, Lee K, Park SJ (2017) Automated breast cancer diagnosis
using deep learning and region of interest detection (BC-DROID). In: Proceedings of the ACM
international conference on bioinformatics, computational biology, and health informatics,
ACM, pp 536–543
21. Wang J, Ding H, Bidgoli FA, Zhou B, Iribarren C, Molloi S, Baldi P (2017) Detecting
cardiovascular disease from mammograms with deep learning. IEEE Trans Med Imaging
36(5):1172–1181
22. Patil S, Kirange DK, Nemade V (2020) Predictive modelling of brain tumor detection using
deep learning. J Crit Rev 7(4):1805–1813
23. Suzuki S, Zhang X, Homma N, Ichiji K, Sugita N, Kawasumi Y, Ishibashi T, Yoshizawa M
(2016) Mass detection using deep convolutional neural network for mammographic computer-
aided diagnosis. In: 2016 Annual conference of the society of instrument and control engineers
of Japan, Sep 20, IEEE, pp 1382–1386
24. Ribli D, Horváth A, Unger Z, Pollner P, Csabai I (2018) Detecting and classifying lesions in
mammograms with deep learning. Sci Rep 8(1):1–7
25. Wang H, Feng J, Zhang Z, Su H, Cui L, He H, Liu L (2018) Breast mass classification via deeply
integrating the contextual information from multi-view data. Pattern Recogn 80:42–52
26. Al-Masni MA, Al-Antari MA, Park JM, Gi G, Kim TY, Rivera P, Valarezo E, Choi MT,
Han SM, Kim TS (2018) Simultaneous detection and classification of breast masses in digital
mammograms via a deep learning YOLO-based CAD system. Comput Methods Programs
Biomed 157:85–94
27. Al-Antari MA, Han SM, Kim TS (2020) Evaluation of deep learning detection and classification
towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput
Methods Programs Biomed 196:105584
28. Gnanasekaran VS, Joypaul S, Sundaram PM, Chairman DD (2020) Deep learning algorithm
for breast masses classification in mammograms. IET Image Proc 14(12):2860–2868
29. Bhatt C, Kumar I, Vijayakumar V, Singh KU, Kumar A (2020) The state of the art of deep
learning models in medical science and their challenges. Multimedia Syst 25:1–5
Deep Generative Models Under GAN:
Variants, Applications, and Privacy
Issues
Abstract Deep learning has lately acquired a lot of attention in machine learning
because of its capacity to train features and classifiers at the same time, resulting
in a significant boost in accuracy. To attain a high level of accuracy, the models
require huge amounts of data and processing capacity, both of which are now available
due to advancements in big data, the Internet of Things, and cloud computing. Even
so, some applications like medical diagnosis, image recognition, and biometric
authentication face the problem of data scarcity, which affects the predictive
analytics of deep learning. To tackle this issue, deep generative models such as
Generative Adversarial Networks (GAN) have come into existence that are capable of
artificially generating synthetic data for specific problems. In this article,
various GAN models and their applications are explored and a comparison of the models
is given. As data increases, another issue faced by applications is data privacy.
With rising privacy concerns, more priority has to be given to privacy issues while
developing intelligent applications. GAN and its variants are nowadays used both as
an attacker and as a defender against various privacy risks, which are also presented
in this review. As future work, GAN's potential to solve the issues of data privacy
and security has to be explored in depth.
1 Introduction
Deep structural learning, or deep learning, is a part of machine learning based on
the concepts of artificial neural networks. It differs from traditional machine
learning in its feature learning techniques, which allow the machine to automatically discover
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 93
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_9
94 R. Raveendran and E. D. Raj
the patterns, replacing the manual feature engineering process, to perform a specific
task. To achieve a high level of accuracy, the models need access to massive amounts
of data and processing power. With recent advancements in deep learning and the
Internet of Things, the amount of data has grown to a large extent, as many new
devices are deployed in various fields for better communication. Deep neural learning
creates complicated statistical models utilizing its own repetitive output from
enormous amounts of unlabeled, unstructured data, resulting in accurate predictive
models. All forms of big data analytics applications, particularly those focused on
NLP, language translation, medical diagnostics, stock market trading signals, network
security, and image identification, now use deep learning. Even though it has
achieved tremendous success, some applications like medical diagnosis, image
recognition, and biometric authentication face the problems of data scarcity and data
privacy. Insufficient data affects the predictive analytics of deep learning
techniques, and privacy issues hinder the improvement of model robustness by
preventing the sharing of data. One way to tackle the unavailability of data is to
artificially generate synthetic data for the specific problem. Synthetic datasets are
automatically generated by extracting the statistical properties of features from the
original dataset, which increases the performance of algorithms and allows more
generic models to be created. Nowadays, two classes of algorithms, Generative
Adversarial Networks (GAN) [1] and Variational Autoencoders (VAE) [2], have gained
importance due to their generative properties for creating sample data. Extensive
research and development have been done on these models, and many synthetic data
architectures have been built on these core methods for generating image, audio,
tabular, and textual data.
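The idea of reproducing statistical properties of the original data can be illustrated with a deliberately simplified sketch. Real generative models (GAN, VAE) capture far richer structure; this toy generator only fits an independent Gaussian per feature, an assumption made purely for illustration:

```python
import random
import statistics

def fit_and_sample(rows, n, rng=random.Random(0)):
    """Toy synthetic-data generator: fit per-feature mean and standard
    deviation on the real rows, then sample n new rows from independent
    Gaussians with those parameters."""
    cols = list(zip(*rows))
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in cols]
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n)]
```

The synthetic rows reproduce the per-feature means and spreads of the original table, which is the minimal sense in which synthetic data "looks like" the source dataset.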
Although the data shortage can be solved by generative modeling techniques, more
focus has to be given to preserving data privacy. Traditionally, machine learning
models for intelligent applications have been trained by uploading the data from all
connected devices to a centralized server in the cloud to train a generic model.
Since the clients here are sharing their data, there is a chance of data leakage,
which results in privacy concerns and even regulatory and judicial issues. To address
this problem, machine learning has introduced the concept of collaborative
decentralized learning, in which the model is learnt locally on the client's real
data and then disseminated to a remote/global server without exposing the original
sensitive data. Recent advancements in distributed learning and Generative
Adversarial Networks are emerging as a new solution for most of the challenges faced
in processing the heterogeneous data available from different edge devices, resulting
in efficient communication and predictive models.
Generative Adversarial Networks were first introduced by Goodfellow et al. [1]. GAN
can be considered a generative model that can be applied to unsupervised and
semi-supervised learning tasks. Due to its generative capability, the foremost issue
faced by deep learning methods, overfitting of the data, can be solved to a great
extent. The basic architecture and the objective function of the GAN model are
explained in the section below.
In Generative Adversarial Networks, the model uses two neural networks: a generator
G and a discriminator D. The two models compete with each other in a min–max game.
The generator is responsible for generating the synthetic data that GAN uses for
training along with the real data, whereas the discriminator is a network that acts
as a binary classifier distinguishing between true and fraudulent data. The generator
attempts to mislead the discriminator by producing more realistic data, focusing on
minimizing the objective function, while the discriminator attempts to maximize the
objective function in order to detect the fake data. A loss function is used as the
objective function and is backpropagated to increase the accuracy of the model. This
ability lets the model act like a supervised model, which helps in most classification
and regression tasks. Figure 1 shows the architecture of the basic GAN. The generator
tries to generate fake but realistic data, while the discriminator tries to tell the
difference between artificial data (generated by the generator) and genuine data. The
discriminator is specified as D(x) → [0, 1], while the generator G maps a random
vector z in the latent space to synthetic data, G(z) → x, for which the discriminator
ideally outputs a value close to 0.
The following objective (loss) function is used to train the network:

min_G max_D V(D, G) = E_{x∼X}[log D(x)] + E_{z∼Z}[log(1 − D(G(z)))]   (1)

where X denotes the set of actual images and Z denotes the latent space. The loss
function (1) is referred to as the adversarial loss. The generator tries to minimize
the loss function while the discriminator tries to maximize it.
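The value function in Eq. (1) can be estimated from minibatches of discriminator outputs; a small sketch (the function names and toy values are illustrative, not from the paper):

```python
import math

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def gan_value(d_real, d_fake):
    """Minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    d_real: discriminator outputs on real samples; d_fake: on generated ones."""
    return mean(math.log(d) for d in d_real) + mean(math.log(1 - d) for d in d_fake)
```

A perfect discriminator (d_real near 1, d_fake near 0) drives V toward its maximum of 0, while a fully fooled one (both outputs at 0.5, the equilibrium point) gives V = 2·log(0.5) ≈ −1.39, which is the value the generator pushes toward.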
DCGAN Deep convolutional GAN uses convolutional neural networks (CNN) for the
generator and discriminator. It is considered the first structure to use
de-convolutional (fractionally strided convolutional) layers in the generator to
stabilize GAN training. In addition, a newly proposed set of constraints has been
added to the network, including batch normalization, with ReLU activations in the
generator and Leaky ReLU in the discriminator.
LapGAN Laplacian GANs are composed of a cascade of CGANs using the Laplacian pyramid
framework with K levels. The model incorporates a set of GAN processes that generate
different levels of image detail in the LP representation, with each generation
distinct from the others. The key modification of LapGAN is that it up-scales the
low-resolution input image to a higher-resolution output image in a coarse-to-fine
pattern, resulting in more photo-realistic images than regular GAN. The
coarse-to-fine approach, however, makes the model computationally expensive, and
convergence is slower for deep LapGANs.
InfoGAN Information maximizing GAN, a variation of CGAN, learns interpretable and
meaningful disentangled representations in an unsupervised way. A regularization term
in this model maximizes the mutual information between a predefined small subset of
latent random variables and the observations.
EBGAN Energy-based GAN uses a combination of AutoEncoder (AE) and GAN
frameworks. Instead of utilizing a probability function to detect actual and fake data
as in the original GAN, the EBGAN discriminator uses an energy function, with low
energy indicating real data and high energy indicating fake data. Both G and D are
trained using two different losses in this model.
WGAN Wasserstein GAN (WGAN) is a loss function variant of the GAN model that uses the
Earth Mover (EM), or Wasserstein, distance as its cost function. With this loss
function, the GAN's vanishing gradient problem is avoided, and the mode collapse
impediment to stabilizing GAN training is partially removed. Similarly, WGAN-GP was
created by adding a gradient penalty (GP) term to the discriminator to improve GAN
training stability, resulting in high-quality samples with a better convergence
rate than WGAN.
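The Wasserstein critic objective described above can be sketched in the same style as the adversarial loss; the helper names are illustrative, and the WGAN-GP gradient penalty is noted only in a comment since computing it requires an autodiff framework:

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def wgan_critic_loss(d_real, d_fake):
    """Critic maximizes E[D(x)] - E[D(G(z))]; written here as a loss to minimize.
    WGAN-GP would add a gradient-penalty term lambda * E[(||grad D|| - 1)^2]."""
    return -(mean(d_real) - mean(d_fake))

def wgan_generator_loss(d_fake):
    """Generator tries to raise the critic's score on generated samples."""
    return -mean(d_fake)
```

Unlike the original GAN loss, the critic outputs unbounded scores rather than probabilities, which is what keeps gradients informative even when real and generated distributions barely overlap.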
PROGAN Progressively growing GAN emphasizes a multi-scale generation process in which
both the generator and the discriminator begin training on low-resolution images
(4 × 4), slowly increasing the depth by adding new layers, and eventually producing
high-resolution images (1024 × 1024). In comparison with existing non-progressive
GANs, the model enhances quality, stability, and variation. However, because of the
uneven training of the generator and discriminator, which creates similar samples,
it still suffers from the mode collapse problem.
BigGAN Because of its vast scale and its capability to produce high-quality images
indistinguishable from real ones, BigGAN has become one of the best models. The model
surpasses large computational models with more parameters in terms of output control
and interpolation phenomena between pictures. Despite its great performance in
producing huge, high-fidelity, diversified images, BigGAN has limited data
augmentation capacity on large datasets, and it cannot reproduce its outcomes from
scratch without sufficient data.
StyleGAN StyleGAN, an improved version of ProGAN, relies on the generator network to
enable reasonable control over specific features of the generated image. The model
uses an unconventional GAN architecture and Adaptive Instance Normalization, which
scales normalized input with style spatial statistics, to control the correlation
between input features such as coarse features (hair, face, pose, and shape), medium
features (facial features, eyes), and a fine scheme, without compromising the high
quality. StyleGAN2 enhances the normalization used in StyleGAN's generator, improving
image quality, efficiency, diversity, and disentanglement.
MSGGAN Multi-scale gradient GAN is inspired by ProGAN, but here the network is not
trained progressively; instead, all the layers are trained at the same time.
Other Models Least squares GAN (LSGAN) and unrolled GAN (UGAN) are loss function
variants of GAN introduced to solve the vanishing gradient and mode collapse
problems, respectively. Another model, loss-sensitive GAN (LS-GAN), produces
realistic samples by reducing the margins between the real data distribution and the
generated sample distribution. Traditional CNN-based GANs have difficulty learning
multi-class images, which can be solved by a self-attention mechanism, leading to the
development of SAGAN. CycleGAN is a very popular architecture for image-to-image
translation, and BicycleGAN, an enhanced version of CycleGAN, is used for multimodal
image-to-image translation.
A comparative analysis of the models based on architecture, learning method, and
merits and demerits is presented in Table 1.
3.2 Applications
Due to various advancement in the GAN architecture, it has been widely used in
various domains like computer vision, medical diagnosis, cyber security, and natural
Table 1 Comparison of GAN models

GAN — Learning: unsupervised; Architecture: MLP; Optimizer: SGD; Activation: sigmoid.
Merits: can generate different versions of text, video, and audio. Demerits: harder
to train; suffers from vanishing gradient and mode collapse.

CGAN — Learning: supervised; Architecture: MLP; Optimizer: SGD; Activation: ReLU.
Merits: prevents mode collapse and produces high-quality images. Demerits: better
performance only for labeled datasets.

DCGAN — Learning: unsupervised; Architecture: CNN; Optimizer: SGD; Activation: Leaky
ReLU. Merits: steadier in generating higher-quality samples and in training.
Demerits: misclassification rate is higher than other GAN-based models.

InfoGAN — Learning: unsupervised; Architecture: MLP; Optimizer: Adam; Activation:
Leaky ReLU (discriminator), ReLU (generator). Merits: learns latent variables without
labels in the data. Demerits: gives good performance only when data is not very
complex and small in size.

CycleGAN — Learning: unsupervised; Architecture: CNN; Optimizer: Adam; Activation:
ReLU and Leaky ReLU. Merits: better for unpaired image-to-image translation tasks.
Demerits: poor performance if substantial geometric changes to the images are needed.

BigGAN — Learning: supervised and unsupervised; Architecture: deep CNN; Optimizer:
Adam; Activation: Leaky ReLU (discriminator), ReLU (generator). Merits: capable of
generating large, high-quality images; suitable for large neural networks. Demerits:
can't repeat the outcomes from scratch without sufficient data.
language processing. This section details some of the related works in different
domains. The synthetic data generation, or data augmentation, capability of GAN has
enabled incredible development in fields facing data scarcity, such as medical
diagnosis, target detection, and satellite imaging. It can be used for generating
various data sources like images, videos, audio, and structured data [3].
GANs were used for synthetic image generation with a CNN classifier as discriminator
for the classification of polyps into benign and malignant. WGAN [4], which is more
effective than GAN, has been used for multi-class classification of cancer stages
with a DNN classifier as discriminator. Inspired by CGAN, a new model, Conditional
SinGAN [5], which combines SinGAN and CGAN, has been proposed for generating
constrained multi-target scene images; it makes the images more realistic in spatial
layout and semantic information and also improves the controllability of the
generated images. Deep residual GAN [6] can be used for image denoising and
defogging, applied to both grayscale and color images; it keeps the main features
without loss of perceptual detail. Rather than learning clean images from noisy
images as in traditional approaches, the complex-valued convolutional neural network
(CVMIDNet) [7] proposes residual learning, which learns the noise from noisy images
and then removes it from them to generate clean images.
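The residual-learning idea described for CVMIDNet can be sketched abstractly; `predict_noise` below is a hypothetical stand-in for the trained network, not an API from [7]:

```python
def residual_denoise(noisy, predict_noise):
    """Residual learning for denoising: the network predicts the noise
    component, which is subtracted from the noisy input to recover the
    clean signal, rather than predicting the clean image directly."""
    noise = predict_noise(noisy)
    return [x - n for x, n in zip(noisy, noise)]
```

With an oracle noise predictor on a toy 1-D signal, subtracting the predicted noise recovers the clean values exactly; in practice the network only approximates the noise, but learning the residual is often easier than learning the clean image itself.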
The method has shown high accuracy on chest X-ray images and can also be applied to
MRI and CT images. FA-GAN [8] and Res-WGAN [9] have been proposed to generate
super-resolution images from low-resolution images, effectively reducing scanning
time. Beyond medical images, in [10] the authors presented a new GAN framework for
the reconstruction of low-resolution satellite images. The term "image in-painting"
refers to the approximate replacement of a picture's missing pixels; it is a
sophisticated reconstruction technique used in photo and video editing software, for
which Exemplar GAN (Ex-GAN) [11] is employed. Another in-painting model that achieves
good results by mixing local and global data is PGGAN. Photo editing, computer-aided
design, and image synthesis are just a few of the uses for text-to-image generation
in computer vision. Attentional GAN (AttnGAN) [12], text ACGAN (TAC-GAN), and KD-GAN
[13] have been proposed for text-to-image generation. GAN has also been used in music
generation, dialog systems, and machine translation. A ranker GAN for high-quality
language (sentence) generation was introduced in [14]. Another method, which
integrates VAE and WGAN (VAE-WGAN), has been used for voice conversion, and GAN has
also been utilized for music generation by creating continuous sequential data.
GAN models have been used in a variety of video applications, including future frame
prediction, video retargeting, and learning disentangled image representations from
video, in addition to image and audio.
4 GANs in Privacy
With rising privacy concerns among individuals, resisting security and privacy risks
has become a top priority when developing applications that share private data, such
as medical image and record analysis, street-view image sharing, face recognition,
and biometric authentication. Various GAN models can be used to investigate privacy
concerns without making any assumptions. The models can be employed to launch an
attack or to protect against powerful adversaries: in the attack model, the generator
takes on the role of an attacker to deceive the discriminator, whereas in the defend
model, the generator takes on the role of a defender to counter a powerful attacker.
GAN-based privacy issues can be related to data utilization and model design, as
shown in Fig. 3.
Image data, speech data, video data, textual data, graph data, and spatio-temporal
data [15–17] are the six types of data that GAN can safeguard. On the one hand, the
generator is designed to hide private information and/or trained by one or more
discriminators to generate privacy-preserving data. On the other hand, the
discriminator ensures data similarity, so that the created privacy-preserving data
can be utilized in real applications while remaining difficult for attackers to
distinguish from genuine data.
Face and medical images [18, 19] that focus on a single object, as well as
street-view images that contain several objects, carry a variety of sensitive
information that can result in privacy leaks, and have therefore received a lot of
research interest. Different GAN techniques for anonymous text synthesis and
privacy-preserving public/medical records have been presented for textual data. The
works on speech data focus on remote health monitoring and voice assistance in IoT
systems. With the popularity of edge computing devices, GPS data collected from IoT
devices includes users' sensitive information that needs to be protected. Various GAN
approaches used for data privacy are summarized in Table 2.
Data breaches can occur not only through the data itself but also through the
learning models: an adversary may utilize the model output to deduce private features
of the data used to train the model.
5 Future Works
6 Conclusion
Generative Adversarial Networks have been widely used in various domains due to their
generative capability, which makes them effective in overcoming data scarcity as well
as privacy issues. The review of current GAN models demonstrates GAN's creative
contributions in a variety of disciplines, including image processing tasks, audio
and video synthesis, textual data synthesis, and graphical data synthesis. Beyond
these use cases, we have also reviewed various GAN models addressing privacy issues
in both centralized and decentralized machine learning systems.
References
1 Introduction
In 2019, PAN laboratories organized the celebrity profiling challenge [1]. The
celebrity profiling task predicts the gender, fame, birth year, and occupation of
celebrities. Gender has male, female, and nonbinary as sub-profiles; rising, star,
and superstar are the sub-profiles for the degree of fame; birth years range from
1940 to 2012; and sports, performer, creator, manager, science, politics,
professional, and religion are the sub-profiles under occupation. The task has 48,335
user profiles written in 50 languages; of these, 33,836 profiles are used for
training the model and the remainder for testing. Celebrity profiling analyzes user
tweets and predicts user traits such as gender, degree of fame, birth year, and
occupation. Celebrity profiling is similar to author profiling: from 2013 to 2018,
PAN laboratories organized the author profiling task [2], which predicts demographics
such as gender, age, native language, and personality, with the organizers providing
the demographic features and datasets. In 2019, PAN laboratories included celebrity
profiling.
Celebrity profiles are used in applications such as marketing, forensics,
security services, and recommendation systems. Most celebrities follow a characteristic
writing style when writing text on social media platforms. In general, the writing style
of an author rarely changes over their lifetime. To analyze the profiles of
celebrities, researchers have used stylistic features such as word-level, character-level,
syntactic, and semantic features, and have found different styles by analyzing
different datasets. Rangel Pardo et al. [2] analyzed tweets and identified
that male authors write more about politics, sports, and technology, whereas female
authors write more about lifestyle topics such as jewellery, shopping, and beauty. Koppel
et al. [3] identified that feature selection and the content of the text play a major role in
gender prediction. Newman et al. and Pennebaker et al. [4, 5] observed that as authors
age they tend to use more prepositions, idioms, and determiners. They
also found that younger authors use more articles and pronouns, while
older authors write lengthier sentences. Celebrity profiling predicts the class labels
gender, degree of fame, birth year, and occupation.
Researchers have represented the text data in different ways, and different
machine learning algorithms were trained on these representations to classify the data. Most
researchers used classification algorithms such as SVM, the Naïve Bayes classifier,
and random forest. The paper is organized into 5 sections. Section 2 covers the
related work. The methodology is discussed in Sect. 3. The results and discussion are
covered in Sect. 4. The final Sect. 5 concludes the paper.
2 Related Work
In author profiling and celebrity profiling, most researchers have differentiated
the writing style of authors by selecting stylistic features. Argamon et al. [6]
proposed a technique that extracted corpus-based, stylistic, and lexicon-based
attributes, which are useful to distinguish the age range and gender of an author.
De-Arteaga et al. [7] proposed a method in which a feature vector was generated using
TF-IDF and a random forest was trained on it. They observed that the model did not perform
well because of the large number of features, and it also consumed more time and memory.
Petrik and Chuda [8] proposed a model that uses n-gram features and is trained using
logistic regression. This model performed well in predicting the gender of the author, but
its performance was poor in predicting age range, fame, and occupation. The third-ranked
team of the PAN 2019 competition created four models, one per sub-profile, applied
mainly preprocessing on the tweets, and used n-gram features such as unigrams and
character-level tetragrams. The experiments were conducted using classifiers such as
SVM, random forest, gradient boosting, and logistic regression. Logistic regression
gave good accuracy but did not outperform the other approaches.
Fusion-Based Celebrity Profiling Using Deep Learning 109
Martinc et al. [9] proposed a transfer learning approach using ULMFiT. Four classifiers were
created to predict gender, fame, occupation, and birth year. They achieved
accuracies of 68%, 51%, 39%, and 32% for the sub-profiles gender, occupation, fame,
and birth year, respectively. Pelzer [10] implemented SVM and logistic regression
on TF-IDF feature vectors generated with n-grams. The performance
of these algorithms was better for bigrams than for the other n-grams.
Radivchev et al. [11] proposed a model that uses word distance as the feature vector and
implemented six algorithms for each of the tasks gender, fame, occupation, and birth year.
Decision tree, random forest, Naïve Bayes, KNN, logistic regression, and SVM were
the classification algorithms used to predict all four profiles.
Asif et al. [12] extracted socio-linguistic features from user tweets. Logistic
regression was applied to this feature vector and achieved accuracies of 88%, 65%, and 38.7%
for gender, fame, and birth year, respectively. The multinomial Naïve Bayes classifier
achieved an accuracy of 56.7% for occupation prediction. Kavadi et al. [13] proposed sub-
profile-based weightage (SPW) for feature representation, which outperformed
the existing models. In this paper, we represent tweets as a combination of stylistic
features and word embeddings, and propose a fusion-based deep learning algorithm to predict
the profiles gender, fame, occupation, and birth year.
3 Methodology
3.1 Dataset
The dataset is taken from the PAN 2019 challenge. The training set is in
English, and the details are presented in Table 1. The dataset has 48,835 user profiles,
with an average of 2181 tweets per user. The dataset is not balanced: some of the sub-
profiles within a profile are unbalanced, and while building the model, sub-profiles
with low user counts were not considered.
The performance of the model is evaluated by F1 score, precision, recall, and accuracy.
Precision measures how often an instance predicted as positive is actually positive;
recall measures how often an actual positive instance is predicted as positive.
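These definitions can be made concrete with a short sketch (toy labels and pure Python, not the authors' evaluation code):

```python
def prf_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, fraction correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, fraction found
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy binary gender predictions over five profiles
acc, prec, rec, f1 = prf_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

The macro-averaged variants reported for multi-class profiles (fame, occupation) would average these per-class scores.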
110 K. Adi Narayana Reddy et al.
Initially, preprocessing is applied to the tweets: all words are converted to lower
case, and unwanted data not required for the stylistic features is removed. The
stylistic features are used to identify the author's writing style. In general,
the features considered for a document are word count, sentence count, average word
count, period count, average word length, count of exclamation marks, count of
colons, count of commas, and count of semicolons. Along with these, other features
are also extracted from each tweet: count of @ mentions, count of hashtags, count of
URLs, count of re-tweets, positive and negative word counts, and POS tags. The
tweets are represented as vectors using these stylistic features.
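A minimal sketch of such a stylistic feature vector for one tweet (the feature names and regular expressions are illustrative assumptions, not the authors' implementation):

```python
import re

def stylistic_features(tweet: str) -> dict:
    """Extract simple stylistic counts from a single tweet."""
    words = tweet.split()
    sentences = [s for s in re.split(r"[.!?]+", tweet) if s.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "exclamations": tweet.count("!"),
        "commas": tweet.count(","),
        "mentions": len(re.findall(r"@\w+", tweet)),     # count of @ mentions
        "hashtags": len(re.findall(r"#\w+", tweet)),     # count of hashtags
        "urls": len(re.findall(r"https?://\S+", tweet)), # count of URLs
        "retweet": 1 if tweet.startswith("RT ") else 0,
    }

feats = stylistic_features("RT @fan: Great match today! #sports https://t.co/x")
```

The dictionary values, in a fixed order, form the stylistic feature vector for that tweet; positive/negative word counts and POS-tag counts would be appended similarly from lexicons and a tagger.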
Word embeddings play an important role in tasks such as text classification, text summarization,
question answering, and machine translation. The text is represented
as word embeddings using Word2Vec [14, 15], Glove [16], and FastText [17].
Word2Vec Word2vec [14, 15, 18] is a popular word embedding technique that
uses a shallow two-layer neural network. It is trained on a large corpus and learns
the context of words. The vector representation is obtained by two methods, CBOW
and skip-gram.
Glove Glove [16], the global vector representation, uses the global co-occurrence of words.
It learns the substructure of words and outputs the vector representation.
FastText FastText [17] is an extension of the Word2vec model. It represents each word
as character n-grams. It also generates word vectors for unknown or out-of-
vocabulary words, and its word representations work well for rare words.
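The character n-gram idea behind FastText can be sketched as follows (the `<` and `>` boundary markers follow the FastText convention; this is an illustration, not the library's API):

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 3) -> list:
    """Decompose a word into character n-grams, FastText-style.

    The word is wrapped in boundary markers so prefixes and suffixes get
    distinct n-grams; the full wrapped word is appended as its own token.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    grams.append(wrapped)
    return grams

# An out-of-vocabulary word still shares n-grams with known words, which
# is why FastText can produce vectors for rare and unseen words.
grams = char_ngrams("where")
```

A word's vector is then the sum of the vectors of its n-grams, so two words sharing substrings share parts of their representation.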
3.5 Method
In the proposal, we use both content-based features using word embedding’s and
stylistic features as input to the model and predict the profiles of the celebrities. The
architecture is presented in Fig. 1. The LSTM takes word embedding’s as input. The
flow of LSTM is presented with the mathematical equations.
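The fusion idea — a recurrent branch over word embeddings concatenated with a dense branch over stylistic features, feeding a softmax classifier — can be sketched as follows. This is a simplified illustration, not the trained model: the LSTM branch is replaced by mean pooling over token embeddings to keep the sketch short, and all sizes and weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def fusion_forward(token_embeddings, stylistic, params):
    """Forward pass of a simplified fusion model.

    token_embeddings: (n_tokens, emb_dim) word vectors of one tweet.
    stylistic: (style_dim,) stylistic feature vector of the same tweet.
    """
    W_s, b_s, W_o, b_o = params
    content = token_embeddings.mean(axis=0)    # stand-in for the LSTM output
    style = np.tanh(stylistic @ W_s + b_s)     # dense stylistic branch
    fused = np.concatenate([content, style])   # feature fusion
    logits = fused @ W_o + b_o
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                     # softmax class probabilities

emb_dim, style_dim, hidden, n_classes = 8, 5, 4, 3   # e.g. 3 fame classes
params = (rng.normal(size=(style_dim, hidden)), np.zeros(hidden),
          rng.normal(size=(emb_dim + hidden, n_classes)), np.zeros(n_classes))
probs = fusion_forward(rng.normal(size=(20, emb_dim)),  # 20 tokens of one tweet
                       rng.normal(size=style_dim), params)
```

In the actual architecture the `content` vector would be the final hidden state of the LSTM over the embedding sequence, with one such model trained per profile.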
In the experiment, word embeddings and stylistic features were given as input to an
LSTM and a fully connected network, respectively. The model is trained for each profile,
and the results are presented in Table 2. The accuracy of the proposed model is higher
than that of existing models for the gender, fame, and occupation profiles.
The fusion of two different data representations is more effective than single
word embeddings or stylistic features alone. The classification accuracies for gender, fame,
and occupation are 78.1%, 88.2%, and 84.3%, respectively. The test accuracies of
the proposed model are higher than those of the existing stylistic and content-based models;
the fusion of content and stylistic features dominated the other existing models.
5 Conclusion
Celebrity profiling predicts author demographics such as gender, occupation,
fame, and birth year by analyzing user-written text. In this proposal, we considered
the gender, fame, and occupation profiles and used both content-based and
stylistic features. The fusion of these features improved the accuracy for the
gender, fame, and occupation profiles, and the best accuracies were achieved using
deep learning. In the future, we plan to propose an attention-based technique to
predict the profiles.
References
1. https://pan.webis.de/clef19/pan19-web/celebrity-profiling.html
2. Rangel Pardo F, Rosso P, Koppel M, Stamatatos E, Inches G (2013) Overview of the author
profiling task at PAN 2013. In: Forner P, Navigli R, Tufis D (eds) CLEF 2013 evaluation labs
and workshop—working notes papers, Valencia, Spain, Sept 2013. CEUR-WS.org, pp 23–26
3. Koppel M, Argamon S, Shimoni A (2003) Automatically categorizing written texts by author
gender. Lit Linguist Comput 401–412
4. Newman ML, Groom CJ, Handelman LD, Pennebaker JW (2008) Gender differences in lan-
guage use: an analysis of 14,000 text samples. Discourse Process 45(3):211–236
5. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC 2001,
vol 71, no 2001. Lawrence Erlbaum Associates, Mahwah, pp 2001–2009
6. Argamon S, Koppel M, Pennebaker JW, Schler J (2007) Mining the blogosphere: age, gender
and the varieties of self-expression. First Monday 12(9)
7. De-Arteaga M, Jimenez S, Duenas G, Mancera S, Baquero J (2013) Author profiling using
corpus statistics, lexicons and stylistic features-notebook for PAN at CLEF
8. Petrik J, Chuda D (2019) Twitter feeds profiling with TF-IDF-notebook for PAN at CLEF
2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF 2019 labs and workshops,
notebook papers, Sept 2019. CEUR-WS.org
9. Martinc M, Škrlj B, Pollak S (2019) Who is hot and who is not? Profiling celebs on twitter-
notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF
2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
10. Pelzer B (2019) Celebrity profiling with transfer learning-notebook for PAN at CLEF 2019. In:
Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF 2019 labs and workshops, notebook
papers, Sept 2019. CEUR-WS.org
11. Radivchev V, Nikolov A, Lambova A (2019) Celebrity profiling using TF-IDF, logistic regres-
sion, and SVM-notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller
H (eds) CLEF 2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
12. Asif MU, Naeem S, Ramzan Z, Najib F (2019) Word distance approach for celebrity profiling-
notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF
2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
13. Kavadi DP, Al-Turjman F, Adi Narayana Reddy K, Patan R (2021) A machine learning approach
for celebrity profiling. Int J Ad Hoc Ubiquitous Comput 38(1–3):111–126
14. Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations
in vector space. ICLR
15. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of
words and phrases and their compositionality. Adv Neural Inf Process Syst 26
16. Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword
information. Trans Assoc Comput Linguist 5. https://doi.org/10.1162/tacl_a_00051
17. Xu J, Du Q (2019) A deep investigation into fastText. In: 2019 IEEE 21st international
conference on high performance computing and communications; IEEE 17th international
conference on smart city; IEEE 5th international conference on data science and sys-
tems (HPCC/SmartCity/DSS), pp 1714–1719. https://doi.org/10.1109/HPCC/SmartCity/DSS.
2019.00234
18. Rong X (2014) word2vec parameter learning explained
DeepLeaf: Analysis of Plant Leaves Using
Deep Learning
Deepti Barhate, Sunil Pathak, Ashutosh Kumar Dubey, and Varsha Nemade
Abstract A growing number of scientists are examining the survival of plant
species under the adverse climate conditions caused by global warming. The
extinction of some plant species is a growing concern, and to save such species
they must first be assessed, which requires experience and expertise and makes
the process manual and time consuming. Various scientific methods have evolved,
such as image processing, digital cameras, mobile devices, and pattern recognition,
but they lag in accuracy. A solution to this problem is to identify the correct plant
species using recent methods such as Convolutional Neural Networks (CNN) and Visual
Geometry Group-16 (VGG16), deep learning, and machine learning.
The proposed system comprises CNN and VGG16 for fused feature extraction of shape,
texture, contour, and margin. Finally, the results of each feature were combined and
classified using a Hyper Parameter Tuned Gradient Descent (HPTGD) classifier with the
dimension reduction method PCA. This paper presents the collection of images,
preprocessing, extraction of features using deep learning methods, and classification
on the Flavia dataset. The images were preprocessed, augmented, and forwarded to the
CNN + VGG16 feature extractor and the classifier. Our model achieved an accuracy of
up to 97%. It has been observed that the VGG16 architecture with the HPTGD classifier
achieved better accuracy at a similar execution time compared to other methodologies.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 115
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_11
1 Introduction
Since ancient times, plants have been used for various purposes such as medicine,
decoration, the environment, and mainly agriculture. Owing to their varied properties,
different plants are also used in the preparation of food items, scents, beauty
products, and medicines. To meet the increasing demand for such plants, one should know
their names and properties, which can be achieved through automatic species recognition
using deep learning. In this computerized age, individuals do not have sufficient
knowledge to distinguish the various natural plants which were utilized by our
ancestors for a long time [1]. At present, the recognition of herbal plants is founded
simply on human discernment or knowledge; a technical process is required to amend
this. When a plant species is recognized, its leaf, fruit, and flower parts are
examined [2]. The leaves, which are available almost all year round, are perceived as
useful for the identification of plant species [3]. DL uses classification preceded by
feature extraction, in addition to the preprocessing of images. In conventional
computer vision approaches to plant recognition, accuracies have been reported as
94% [4], 76.3% [5], and 90% [6]. Deep learning and image processing methods were
combined in the CenterNet model [7], which detected vegetables by drawing rectangles
around them; the remaining green regions outside the drawn boxes were considered weeds.
For background removal, a color index-based segmentation was used and evaluated through
a genetic algorithm. The CNN is a supervised deep learning (DL) approach comprising
convolution and pooling layers followed by fully connected layers (FCLs) [8–14]. New
architectures such as AlexNet, ResNet50, VGG16, and Inception V3 have made identifying
plant species possible. Recognition is basically the assessment of the likeness or
contrast between two components, i.e., whether two components are very similar or
different. In this paper, CNN and VGG16 have been introduced to extract and combine
various features such as shape, contour, margin, color, and texture. After extracting
all the features, dimensionality reduction by PCA followed by the HPTGD classifier is
used; the flow of the process is depicted in Fig. 1. The performance is evaluated on
the Flavia dataset and is high compared to other methods; it is observed that these
methods with feature fusion enhanced the results compared with the other methodologies,
as depicted in Fig. 2. The final output of this system is the species name of the plant,
for which images at various stages were used in the research.
2 Related Work
The authors of [15] used Semantic Annotation-Based Clustering (SABC) and
Semantic-Based Clustering (SBC) for pictures and web pages, respectively; both
picture and page content were retrieved in the proposed work. Factors such as
computation time, recall, and accuracy were investigated using the SABC methodology.
In [16], the authors presented a work based on a multilayer perceptron and Ada-boosting
for the extraction of morphological features such as shape, color, margin, and texture,
followed by classification. The system comprised image preprocessing, feature
extraction, classification, and result prediction, and achieved over 90% precision.
In [17], a method involving an IoT framework for fine-grained infection detection was
proposed; the analysis done by the system was fetched and forwarded to farmers for the
next action. They also worked on a multidimensional feature compensation residual
neural network (MDFC-ResNet) model for better results. In [18], a novel system for
plant recognition from the contour data of leaf images was introduced, so that
different plant species can be distinguished by observing contour information.
Matching occluded plant leaves is more difficult than whole-leaf matching due to the
enormous variety and intricacy of leaf structures. In [19], the authors proposed an
overlapping-free individual leaf segmentation method. They identified plant point
clouds using 3D filtering to remove leaf images with covering constraints; further
3D filtering introduced a Radius-based Outlier Filter (RBOF) and a Surface Boundary
Filter (SBF) to assist with isolating occluded leaves. A concentric-circles-based
technique to investigate the outer surface of leaves was presented in [20]; the
shading progression effect in binary pictures is identified to detect compound leaves,
and the technique delivered the highest accuracy in leaf prediction. In [21], a
Convolutional Neural Network (CNN), AlexNet, fine-tuned AlexNet, and D-Leaf were used
for preprocessing and the extraction of various features. The hybridization of these
methods extracted better features, and the subsequent classification provided better
results compared to other available methods. In [22], the authors proposed CNN models
such as VGG16, VGG19, InceptionV3, and Xception with an ANN classifier, along with
Xception-SVM and Xception-SVM-BO, which pair the Xception CNN with an SVM classifier.
Of these, the DeepHerb model (Xception + ANN) extracted features with an accuracy of
97.5% on the DeepHerb dataset. In [23], the authors used the Canny method for edge
detection, and the edge-detected images were used for further feature extraction.
They proposed three CNN models, ResNet101, InceptionV3, and VGG16, and used transfer
learning to avoid training the models on unwanted images. They found excellent
results, with an accuracy of 97.32% using Inception V3, higher than the other models.
In [24], MobileNet, Xception, and DenseNet-121 were used, along with homogeneous
hybridizations (MoMoNet = MobileNet + MobileNet, XXNet, DeDeNet) and heterogeneous
hybridizations (MOXNet = MobileNet + Xception, XDeNet = Xception + DenseNet, MoDeNet =
MobileNet + DenseNet), with a set of Linear Discriminant Analysis (LDA), multinomial
Logistic Regression (MLR), k-Nearest Neighbor (k-NN), Naïve Bayes (NB), Bagging
Classifier (BC), Random Forest Classifier (RF), Classification and Regression Tree
(CART), Multilayer Perceptron (MLP), and Support Vector Machine (SVM) classifiers, of
which MoDeNet + MLR worked best with an accuracy of 98.71%. In [25], the authors
worked on the shape and texture features of leaves, proposing the multiscale triangle
descriptor (MTD) to extract shape features and the local binary pattern histogram
Fourier (LBP-HF) to extract texture features; the shape and texture features were then
combined, achieving an accuracy of 77.6% on the Flavia dataset. From the literature it
is observed that traditional methods, such as a simple CNN with standard classifiers,
did not employ a feature fusion methodology, and no prior work had used leaves at
various stages such as seedling, tiny, matured, and dried. In this research we have
considered these stages of leaf images.
3 Methodology
The most prominent feature of a leaf is its shape, as it has various dimensions and a
contour; apart from that, margin, texture, venation, apex-to-centroid ratio, and
eccentricity can be calculated. In this research a feature fusion method is proposed
that extracts shape, contour, texture, and venation features and combines them to
detect the exact plant species. CNN and VGG16 were used for feature extraction; in
addition, PCA was used for dimensionality reduction, followed by the HPTGD classifier.
The experiment was conducted on the Flavia dataset, which consists of 33 species with
1907 images in total. The dataset contains images of species such as Chinese rose,
pinewood, acer, barberries, and citrus.
In addition, an analysis is carried out on Flavia considering Local Directional
Patterns (LDP) and Local Binary Patterns (LBP); the results for the Flavia dataset are
shown in Table 1. Texture is considered for feature extraction and classified using
KNN. The existing method showed 96.03% accuracy for LBP and 96.94% for LDP.
However, the proposed system showed a better accuracy rate (97%) for both LBP and LDP,
as shown in Table 1.
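As an illustration of the LBP texture feature, the basic 3 x 3 LBP code thresholds each of the eight neighbours against the centre pixel and packs the results into a byte (a generic sketch of LBP itself, not the paper's implementation):

```python
def lbp_code(patch):
    """Basic 3x3 local binary pattern: each neighbour >= centre sets one bit.

    `patch` is a 3x3 list of grey values; neighbours are read clockwise
    starting at the top-left corner.
    """
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(1 << i for i, v in enumerate(neighbours) if v >= c)

# A histogram of these codes over all 3x3 windows of a leaf image forms the
# texture descriptor that would then be classified, e.g. with KNN.
code = lbp_code([[6, 5, 2],
                 [7, 6, 1],
                 [9, 8, 3]])
```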
Our deep network consists of a CNN and VGG16. The VGG model, as a relatively new
network architecture among convolutional neural networks, investigates the
relationship between convolutional neural networks and their performance [21]. The
VGG16 model has smaller kernels and a different number of parameters than AlexNet. The
first five layers, based on VGG16, are used for feature extraction and provide a
higher-resolution feature map. Finally, the fully connected layer reduces the
parameters. Dimension reduction is done by an enhanced principal component analysis,
which removes unwanted parameters and irrelevant features without any information
loss. Hyper parameter tuning with gradient descent (HPTGD) reduces the predefined loss
functions and classifies the species without error. The input layer is fed to a
zero-padding layer to eliminate information loss, then into a Rectified Linear Unit
(ReLU) activation, which generates non-negative outputs. The result is passed to max
pooling, which extracts the maximum element, followed by average pooling for an
averaged feature map. We also compared our results with Naïve Bayes, random forest,
and SVM classifiers, but our proposed solution gave more accurate results compared to
the other methods.
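The dimension-reduction step can be sketched with a standard PCA via SVD on the fused feature vectors (a generic sketch of plain PCA; the paper's "enhanced" variant and the HPTGD classifier are not reproduced here, and the sizes are toy values):

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors onto the top-k principal components.

    X: (n_samples, n_features) matrix of fused CNN + VGG16 features.
    Returns the (n_samples, k) reduced representation.
    """
    Xc = X - X.mean(axis=0)                      # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # scores on the top-k components

rng = np.random.default_rng(1)
fused = rng.normal(size=(50, 32))                # 50 leaves x 32 fused features
reduced = pca_reduce(fused, k=8)
```

The reduced vectors would then be passed to the classifier in place of the full fused features.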
Table 2 Performance analysis compared with other methods

Performance metrics   Random forest   Support vector machine   Proposed
Accuracy              84.11           79.05                    97
Precision             85.25           82.04                    88
Recall                83.52           79.05                    84
F1-measure            82.23           79.36                    83
For the analysis of the proposed and existing systems on the Flavia dataset, the
Scale-Invariant Feature Transform (SIFT), Global Image Descriptor (GIST), and
Multiscale R-Angle (MSRA) have been considered. The obtained results are shown in
Table 3.
Figure 2 shows the parameter extraction. The model consists of convolution, max
pooling, and dropout layers: 4 convolution layers, 3 max pooling layers, and 2 dropout
layers. The total number of extracted parameters is 1,653,568. The results were
classified by HPTGD with PCA, with an accuracy of 97%, as shown in Table 3.
5 Conclusion
CNN and other hybrid models are working well in the field of agriculture but need to
be more specific and effective. We proposed an improved CNN method for the recognition
of a large dataset of images of different plant species. The proposed model is a
hybrid of CNN and VGG16, which improved the classification accuracy. Unwanted
parameters and irrelevant features were eliminated by the proposed hyper parameter
tuning with gradient descent, and principal component analysis was used for dimension
reduction; as a result, our model achieved the highest classification rate of 97%
compared with the other classifiers. We performed the experiment using other
classifiers, random forest and SVM, alongside the proposed system, and observed that
our proposed method achieved state-of-the-art performance compared to the others.
References
1. Pravin A, Deepa C (2021) A identification of piper plant species based on deep learning
networks. Turk J Comput Math Educ (TURCOMAT) 12(10):6740–6749
2. Raj AP, Vajravelu SK (2019) DDLA: dual deep learning architecture for classification of plant
species. IET Image Proc 13(12):2176–2182
3. Aakif A, Khan MF (2018) Automatic classification of plants based on their leaves. Biosyst
Eng 139:66–75
4. Selvam L, Kavitha P (2020) Classification of ladies finger plant leaf using deep learning. J
Ambient Intell Humanized Comput 1–9
5. Jin T, Hou X, Li P, Zhou F (2015) A novel method of automatic plant species identification
using sparse representation of leaf tooth features. PloS one 10(10):e0139482
6. Wu SG, Bao FS, Xu EY, Wang YX, Chang YF, Xiang QL (2017) A leaf recognition algorithm for
plant classification using probabilistic neural network. In: 2007 IEEE international symposium
on signal processing and information technology 7 Dec 15, IEEE, pp 11–16
7. Jin X, Che J, Chen Y (2021) Weed identification using deep learning and image processing in
vegetable plantation. IEEE Access 8(9):10940–10950
8. Ashhar SM, Mokri SS, Abd Rahni AA, Huddin AB, Zulkarnain N, Azmi NA, Mahaletchumy
T (2021) Comparison of deep learning convolutional neural network (CNN) architectures for
CT lung cancer classification. Int J Adv Technol Eng Explor 8(74):126
9. Barhate D, Nemade V (2019) Comprehensive study on automated image detection by using
robotics for agriculture applications. In: 2019 3rd International conference on electronics,
communication and aerospace technology (ICECA), Jun 12. IEEE, pp 637–641
10. Kumar PY, Singh P, Pande S, Khamparia A (2022) Plant leaf disease identification and
prescription suggestion using deep learning. In: Proceedings of data analytics and management.
Springer, Singapore, pp 547–560
11. Minowa Y, Kubota Y (2022) Identification of broad-leaf trees using deep learning based on
field photographs of multiple leaves. J Forest Res 1–9
12. Tarek H, Aly H, Eisa S, Abul-Soud M (2022) Optimized deep learning algorithms for tomato
leaf disease detection with hardware deployment. Electronics 11(1):140
13. Senthil T, Rajan C, Deepika J (2021) An efficient CNN model with squirrel optimizer for
handwritten digit recognition. Int J Adv Technol Eng Explor 8(78):545
14. Mundada MR, Shilpa M (2022) Detection and classification of leaf disease using deep neural
network. In: Deep learning applications for cyber-physical systems. IGI Global, pp 51–77
15. Deepa C (2017) SABC-SBC: a hybrid ontology based image and webpage retrieval for datasets.
Automatic Control Comput Sci 51(2):108–113
16. Kumar M, Gupta S, Gao XZ, Singh A (2019) Plant species recognition using morphological
features and adaptive boosting methodology. IEEE Access 7:163912–163918
17. Chaudhury A, Barron JL (2018) Plant species identification from occluded leaf images.
IEEE/ACM Trans Comput Biol Bioinfo 17(3):1042–1055
18. Chouhan SS, Kaul A, Singh UP, Jain S (2018) Bacterial foraging optimization based radial
basis function neural network (BRBFNN) for identification and classification of plant leaf
diseases: an automatic approach towards plant pathology. IEEE Access 6:8852–8863
19. Li D, Cao Y, Shi G, Cai X, Chen Y, Wang S, Yan S (2019) An overlapping-free leaf segmentation
method for plant point clouds. IEEE Access 7:129054–129070
20. Chau AL, Hernandez RR, Mora VT, Canales JC, Mazahua LR, Lamont FG (2017) Detection
of compound leaves for plant identification. IEEE Latin Am Trans 15(11):2185–2190
21. Wei Tan J, Chang SW, Abdul-Kareem S, Yap HJ, Yong KT (2018) Deep learning for plant
species classification using leaf vein morphometric. IEEE/ACM Trans Comput Biol Bioinfo
17(1):82–90
22. Gu J, Yu P, Lu X, Ding W (2021) Leaf species recognition based on VGG16 networks
and transfer learning. In: 2021 IEEE 5th advanced information technology, electronic and
automation control conference (IAEAC), Mar 12. vol 5, IEEE, pp 2189–2193
23. Roopashree S, Anitha J (2021) DeepHerb: a vision based system for medicinal plants using
xception features. IEEE Access 9:135927–135941
24. TS SK, Prabalakshmi A (2021) Identification of indian medicinal plants from leaves using
transfer learning approach. In: 2021 5th international conference on trends in electronics and
informatics (ICOEI), Jun 3. IEEE, pp 980–987
25. Yang C (2021) Plant leaf recognition by integrating shape and texture features. Pattern Recogn
112:107809
Potential Assessment of Wind Power
Generation Using Machine Learning
Algorithms for Southern Region of India
Abstract Nowadays, large-scale grid-interconnected wind power generation systems are
increasing day by day, and the stable operation of the grid highly depends on the
amount of wind energy penetrating it. This is essential not only for stable operation
but also for generation allocation and load scheduling. To achieve this, a precise
method for estimating the potential is necessary. In this paper, a modest attempt has
been made to estimate the potential of wind power generation for the southern region
of India. The methodology presented is based on efficient machine learning regression
methods, viz. linear, support vector, K-nearest neighbour, and decision tree
regression models, for predicting the number of units generated and the output power.
To evaluate the efficiency of these algorithms, key performance indicators such as
mean absolute error, mean square error, root mean square error, and R2 score have been
considered. It has been observed that the linear regression model performs better than
all the other methods considered in this study, as summarized in the results.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 125
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_12
126 P. Upendra Kumar et al.
sources in the country. As per the recent statistics of the Central Electricity
Authority (CEA), the installed capacity in India was 392,017 MW as of November 2021.
Of the total generation, the share of renewable energy sources is around 26.5%, i.e.
104,031 MW, of which wind power generation is around 40,034 MW, i.e. 38.48% of all
renewable energy sources, as per the CEA November 2021 statistics. The peak demand in
India is around 203,014 MW, with a peak demand met of 200,539 MW and a deficit of
1.2%. With the associated environmental issues and depleting fossil-fuel-based
resources, the emphasis is now shifting to the utilisation of renewable energy. This
necessitates the extension of the grid to renewable energy sources in order to
capitalise on generation diversity and dispersed renewable resources [1].
To fulfil the ever-increasing energy demand, new generation capacity must be planned
and built simultaneously. In this respect, wind power generation is one of the most
widely explored renewable energy technologies for meeting rising load demand while
reducing carbon emissions and protecting fossil fuels and natural resources at
minimum operating cost. It also has the advantage that the wind turbines and other
auxiliary equipment can be manufactured in factories and readily assembled at the
project construction site, making installation and operation easy. It is evident that
such a huge amount of grid-connected wind power generation needs proper planning in
execution and operation to maintain grid stability, and precise estimation of wind
power generation is necessary for effective planning. However, the output power from a
wind power plant is random in nature; it depends on climatic conditions, wind speed,
wind direction, etc. One of the most reliable ways of meeting rising load demand by
augmenting wind power generation is to perform a wind power generation assessment,
which provides prior information about the power that could be generated so that other
activities related to generation scheduling and O&M can be planned without affecting
grid operations. In order to connect a wind power system to the grid, an efficient
prediction mechanism is required to avoid grid instability problems [2].
Predicting wind energy is not easy, as it is highly dependent on climate and on atmospheric conditions that change over time. Earlier estimation processes relied on meteorological data from numerical weather prediction and on satellite images of cloud movement to predict wind speed, wind direction and other dependent parameters. The fundamental problem with these older methods is that the requisite meteorological data is not always accessible for the wind power site, nor always available at the required resolution, limiting their applicability for highly accurate forecasts. To overcome this difficulty, it is important to use new, intelligent methods that yield valid and accurate results. At present, advanced estimation algorithms that combine the advantages of artificial intelligence and machine learning are gaining importance because they can extract detailed information from wind power records and produce more reliable forecasts [2]. In the recent past, machine learning has brought radical changes in various domains, and researchers have started integrating machine-learning-based prediction methods into electrical engineering problems such as grid management, fault prediction, load balancing, output power prediction and load prediction [3, 4]. Regression models such as
Potential Assessment of Wind Power Generation… 127
linear regression, support vector regression, K-nearest neighbour regression and decision tree regression are among the most popular supervised learning methods in the machine learning domain. In this paper, the potential of wind power generation is assessed using these regression-based machine learning techniques for the southern region of India, and the results are summarized.
Wind energy technology captures the natural wind in our environment and converts it into mechanical energy. Wind is generated by differences in air pressure, and wind speeds differ depending on location, topography and season. The apparatus that converts air velocity into power is known as a turbine. Turbines are large structures with multiple spinning blades; when the wind drives the blades to spin, electrical energy is generated, since the blades are connected to an electromagnetic generator [5] (Figs. 1 and 2).
Kinetic energy is the energy associated with wind movement and is given by
$KE = \frac{1}{2} m v^2$  (1)

$\text{Power} = \frac{dE}{dt}$  W  (2)

$\text{Power} = \frac{1}{2} \rho A V^3$  W  (3)

where ρ is the air density in kg/m³, A is the area of cross-section of blade movement in m², and V is the wind velocity in m/s (Table 1).
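The power relation of Eq. (3) can be sketched directly in code. The density, swept area and wind speed values below are illustrative, not data from this study.

```python
# Illustrative computation of the theoretical wind power of Eq. (3).

def wind_power(rho, area, velocity):
    """Power in watts extractable from an air stream: P = 0.5 * rho * A * V^3."""
    return 0.5 * rho * area * velocity ** 3

# Air density 1.225 kg/m^3, 10 m^2 swept area, 8 m/s wind: roughly 3.1 kW
p = wind_power(1.225, 10.0, 8.0)
```

The cubic dependence on V is why small changes in wind speed dominate the output, which is exactly what makes precise wind-speed forecasting so important for power estimation.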
3 Methodology
The intensity of wind in a region depends on latitude, time of year, and atmospheric conditions. Problems arise from wind velocity, temperature, wind direction, expensive energy storage, grid stability, and continuous fluctuations due to seasonal effects. Moreover, integrating a wind power system into the power grid as an emergency power source to cover increasing demand is not directly technically feasible, since such a structure affects the stability of the network. In short, fluctuations in weather conditions lead to uncertainty in wind system performance. It is in this respect that accurate and precise models are required for estimating wind power generation, especially for grid-connected large-scale systems. Earlier estimation processes relied on meteorological data from numerical weather prediction and on satellite images of cloud movement to predict wind velocity and other dependent parameters [6]. To overcome the demerits of these earlier forecasting methods, the present study focuses on various machine-learning-based regression methods for predicting the performance of a wind power system. The proposed machine learning algorithms are used to predict the number of units generated and the wind power output of a wind power plant, which depend on independent variables such as wind direction and wind speed [7–12].
Linear regression determines the straight line that fits the given data with least error: it finds a linear relationship between the predictor variables (wind direction and wind speed) and the response variable (number of billable units generated or output power). If Y is the dependent variable and X is an independent variable, the population regression line is given by
$Y = B_0 + B_1 X$  (4)
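A minimal sketch of this regression step, using scikit-learn's LinearRegression as a stand-in; the feature names follow the text, but the toy values and the library choice are assumptions, not the authors' implementation.

```python
# Least-squares fit of Eq. (4) on invented [wind_speed, wind_direction] data.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: [wind_speed (m/s), wind_direction (degrees)] -> output power (kW)
X = np.array([[4.0, 120.0], [6.0, 130.0], [8.0, 125.0], [10.0, 140.0]])
y = np.array([200.0, 650.0, 1500.0, 2900.0])

model = LinearRegression().fit(X, y)          # estimates B0 (intercept) and B1 (coefficients)
pred = model.predict(np.array([[7.0, 128.0]]))
```

With two predictors, `model.coef_` holds one slope per feature and `model.intercept_` corresponds to B0.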
Support vector regression is another supervised learning approach, used for both classification and regression problems. In its training phase it finds the best hyperplane with least error and fixes positive and negative boundaries around that hyperplane; in the testing phase it checks which side of the hyperplane a new point lies on to predict its value. The decision surface separating the classes is a hyperplane of the form

$W^T X + b = 0$  (5)
where W is the weight vector, X is the input vector and b is the bias.
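A hedged sketch of support vector regression via scikit-learn's SVR; the RBF kernel and hyperparameters are illustrative choices, not necessarily those used in the paper.

```python
# Support vector regression on invented wind data.
import numpy as np
from sklearn.svm import SVR

X = np.array([[4.0, 120.0], [6.0, 130.0], [8.0, 125.0],
              [10.0, 140.0], [12.0, 135.0]])
y = np.array([200.0, 650.0, 1500.0, 2900.0, 5000.0])

# C controls the penalty on points outside the epsilon-tube around the hyperplane.
svr = SVR(kernel="rbf", C=100.0, epsilon=0.1).fit(X, y)
pred = svr.predict(np.array([[9.0, 132.0]]))
```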
In the K-nearest neighbour algorithm, the value of K plays a vital role: small K values provide the most adaptable fit, with little bias but large variance.
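The role of K can be seen in a small sketch: the prediction is simply the average target of the K closest training points. The data values below are invented for illustration.

```python
# K-nearest-neighbour regression: prediction = mean of the K nearest targets.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([1.0, 2.0, 3.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)
pred = knn.predict([[2.5]])   # nearest targets are 2.0 and 3.0, so their mean
```

With K=1 the model reproduces the training data exactly (low bias, high variance); larger K smooths the fit at the cost of more bias.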
Decision trees are a non-parametric supervised learning technique used for classification and regression. The goal is to build a model that predicts the value of the target variable by learning simple decision rules derived from the data characteristics. The model is built as a tree structure: the data set is decomposed into smaller and smaller subsets while a corresponding decision tree is grown incrementally. The end result is a tree with decision nodes and leaf nodes. Splits are chosen by information gain: the amount of information obtained about a random variable (b) is the reduction in uncertainty achieved by observing another existing variable (a).
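A sketch of decision-tree regression with scikit-learn; the depth limit and toy data are illustrative assumptions.

```python
# Decision-tree regression: recursive splitting into smaller subsets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])

# The tree splits the data at decision nodes until each leaf holds a prediction;
# with four distinct points, depth 3 is enough to fit them exactly.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
```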
4 Performance Indices
The performance of the proposed model is evaluated with the help of certain performance indices called evaluation metrics [9]. Values of Mean Absolute Error, Mean Square Error and Root Mean Square Error close to zero mean that the predicted values are similar to the actual values. The magnitude of the difference between a prediction and the true value of an observation is referred to as the absolute error. Mean Absolute Error computes the average of the absolute errors over a set of predictions and observations to determine the magnitude of error for the entire set. Mean Square Error measures the average squared difference between the estimated and actual values. Root Mean Square Error is the standard deviation of the residuals, where a residual measures the distance of a data point from the regression line; it therefore indicates how concentrated the data is around the best-fit line. The R2 score describes how well the regression model fits the observed data; a score nearer to 1 usually indicates a more suitable model.
Mean Absolute Error calculates the absolute difference between the actual (A_i) and predicted (P_i) data points; this difference is the absolute error (E_i) made by the model. The sum of all absolute errors divided by the total number of data points gives the Mean Absolute Error, as shown in Eq. (8).

$\text{Mean Absolute Error} = \frac{1}{n} \sum_{i=1}^{n} |A_i - P_i|$  (8)
This metric is used as a loss function. It finds the squared distance between actual and predicted data points; squaring avoids the cancellation of negative terms. Its mathematical representation is shown in Eq. (9).

$\text{Mean Square Error} = \frac{1}{n} \sum_{i=1}^{n} (A_i - P_i)^2$  (9)
Root Mean Square Error tells how closely the data is scattered around the line. It is the square root of the MSE, as shown in Eq. (10).

$\text{Root Mean Square Error} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - P_i)^2}$  (10)
4.4 R2 Score
This metric describes the performance of the regression method and is the key output of a regression analysis. It is defined as the fraction of the dependent variable's variation that can be predicted from the independent variable. The coefficient of determination ranges from 0 to 1; a higher R2 score indicates that the model better fits the observed data points. The R2 score is computed by dividing the Sum of Squared Residuals (SSR) by the Total Sum of Squares (SST) and subtracting the result from 1, as shown in Eq. (11).

$R^2\ \text{Score} = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{j=1}^{n} (A_j - \hat{P}_j)^2}{\sum_{j=1}^{n} (A_j - \bar{A})^2}$  (11)
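Eqs. (8)–(11) can be implemented directly in NumPy. Here `actual` and `predicted` stand for the A and P of the text; the sample values are invented.

```python
# Direct NumPy implementations of the four evaluation metrics, Eqs. (8)-(11).
import numpy as np

def evaluate(actual, predicted):
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(a - p))               # Eq. (8)
    mse = np.mean((a - p) ** 2)                # Eq. (9)
    rmse = np.sqrt(mse)                        # Eq. (10)
    ssr = np.sum((a - p) ** 2)                 # sum of squared residuals
    sst = np.sum((a - a.mean()) ** 2)          # total sum of squares
    r2 = 1.0 - ssr / sst                       # Eq. (11)
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = evaluate([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

A perfect prediction gives (0, 0, 0, 1); note that R2 can go negative when the model fits worse than the mean of the data.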
For performing the estimation, day-wise data at 10-min intervals is taken from wind power plants operating in South India, with various practical parameters such as voltage, current, AC and DC power, and frequency. The dataset consists of two features and 3962 instances; 80% of the data is used for training and 20% for testing. The dependent variables Wind_Energy (number of units generated) and Wind_Power (output power) rely on independent variables such as wind direction and wind speed. Table 2 and Fig. 4 illustrate the Mean Absolute Error, Mean Square Error, Root Mean Square Error and R2 score of all four proposed regression methods. From the performance indices, one can observe that the Mean Absolute Error, Mean Square Error and Root Mean Square Error of K-Nearest Neighbour Regression are higher than those of all the other regression methods, while the R2 score of Support Vector Regression is lower than that of all the other regression methods. Support Vector Regression performs better than Decision Tree Regression and K-Nearest Neighbour Regression, and of the four methods the Linear Regression evidently performs best. The error analysis for the Linear Regression model is depicted in Fig. 8 in terms of actual and predicted quantities.
6 Conclusions
In this paper, a modest attempt has been made to estimate the potential of wind power generation for the southern region of India. Since large-scale grid-interconnected wind power generation systems are increasing day by day, the stable operation of the grid depends strongly on the amount of wind energy penetrating it. This is essential not only for stable operation but also for generation allocation and load scheduling, and achieving it requires a precise method for estimating the potential. The methodology used in this study applies efficient machine-learning regression methods, viz. linear, support vector, K-nearest neighbour and decision tree regression models, to predict the number of units generated and the power output. The efficiency of these algorithms was evaluated using key performance indicators such as mean absolute error, mean square error, root mean square error and R2 score. It has been observed that the linear regression model performs better than all the other methods considered in this study. The techniques and methodology used proved efficient and effective for potential assessment of wind energy systems. The methodology presented can be extended to all renewable power generation sources for addressing concerns regarding grid operation, and is extremely useful during the planning stage for capacity fixation, generation and load scheduling.
References
Abstract The traffic on roads is increasing day by day, and it is becoming very difficult to track manually the vehicles that violate traffic rules. To address this issue, researchers have proposed various methodologies to detect number plates automatically, but the accuracy of the existing methodologies is very low. To overcome this, a new efficient methodology integrated with image processing is proposed. The proposed system captures the number plate from a video of the vehicle. After capturing the image of the number plate region, the long short-term memory (LSTM) algorithm, an optical character recognition algorithm, is used to recognize the characters from the captured plate. The recognized number is compared with the numbers in a database to find the details of the vehicle. The proposed system is implemented with python-tesseract and deployed in a real-time scenario at a busy security gate. Compared with the existing methodologies, the proposed system shows better accuracy.
1 Introduction
In the number plate detection system (NPDS), the computer detects vehicle numbers from digital images of the vehicle. The number plate detection system is integrated in various applications such as traffic cameras and security cameras at buildings,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 135
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_13
136 M. Indrasena Reddy et al.
industries, companies and business areas. Any NPDS consists of three phases, namely detection of the number plate, segmentation of characters and recognition of characters. Moreover, most commercial buildings need a system in which only authorized vehicles are allowed to park inside the parking areas, and such systems need to be efficient, accurate, fast and robust in nature.
After the vehicle images are captured, they need to be processed in order to extract information such as the number of the vehicle. Processing consists of several phases, such as segmentation of the image and enhancement of the image by filtering out unnecessary parts. Before processing, the image needs to be preprocessed: the RGB image is converted to a grayscale image, a blurring operation is applied to the result, and then thresholding and contouring operations are performed. A bilateral filter is used to remove unwanted parts of the image, such as redundant fragments, and the Canny edge detection algorithm is used to detect the edges of the image. In an NPDS, this preprocessing is very important for extracting text from the image.
Extracting text from document images, such as handwritten, skewed or typed images, is quite fast compared with extracting text from number plates. Text in document images can be extracted easily because such images have fixed parameters, whereas number plates have variable parameters: plate images are captured from different distances, and their clarity is lower than that of document images. Hence, extracting text from number plates is more difficult than from document images. The preprocessed image is given to Tesseract, a package that contains an LSTM, a deep learning technique, for extracting text from the image; the LSTM detects the text with greater accuracy than the existing algorithms.
1.1 Tesseract
The Tesseract package contains two components: libtesseract, an OCR engine, and tesseract, a command-line program. Tesseract can recognize text in more than 100 languages, can be trained to recognize text in a new language, and can produce output in various formats such as HTML, plain text and PDF.
OCR-LSTM: An Efficient Number Plate Detection System 137
1.2 OpenCV
OpenCV is an open-source tool for both commercial and academic use, aimed at real-time computer vision. It has several applications, such as object identification, facial recognition, motion recognition and motion tracking. The OpenCV package is used here to convert the RGB image into a grayscale image, which is then filtered with a bilateral filter to remove the unwanted parts of the image, also called noise.
1.3 LSTM
2 Literature Review
the number plates. The proposed methodology uses public datasets and shows high detection accuracy for number plates with 5 characters, but not for number plates with 7 characters.

In general, deep neural networks are very difficult to train. The authors of [5–7] proposed a learning framework that can be trained easily and used it to detect objects; however, the methodology requires more epochs to achieve high accuracy.

From the existing works, it is observed that most achieve low accuracy and do not detect the complete number on the plate. Some methods are limited to language-specific number plates only, and some detect text that is not even present on the number plate.
2.2 Objective
3 System Model
An RGB image can be converted into a grayscale image in two ways: the average method and the weighted method.

In an RGB image, every pixel has three colour components, red, green and blue, each with a value from 0 to 255. In the average method, the three component values of each pixel are summed and divided by 3; Eq. (1) represents this conversion formula.
The weighted method is also called the luminosity method: the colours are weighted according to their wavelengths rather than averaged. Figure 2 depicts this weighting methodology. As per Fig. 2, red has a longer wavelength than green and blue, so for converting the RGB image into grayscale, 30% of the red value, 53% of the green value and 17% of the blue value are taken into consideration.
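Both conversions are one-liners in NumPy. The weighted coefficients (0.30, 0.53, 0.17) are the ones quoted in the text; other references use slightly different luminosity weights.

```python
# NumPy sketches of the average and weighted grayscale conversions.
import numpy as np

def to_gray_average(rgb):
    # average method: (R + G + B) / 3 per pixel
    return rgb.astype(float).mean(axis=-1)

def to_gray_weighted(rgb):
    # weighted (luminosity) method with the percentages quoted in the text
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.30 * r + 0.53 * g + 0.17 * b

pixel = np.array([[[90, 150, 30]]], dtype=np.uint8)   # one RGB pixel
avg = to_gray_average(pixel)    # (90 + 150 + 30) / 3 = 90.0
wgt = to_gray_weighted(pixel)   # 0.30*90 + 0.53*150 + 0.17*30
```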
3.3 LSTM
LSTM was introduced in 1997 by Hochreiter and Schmidhuber and has been applied to a variety of problems. In an LSTM, information is remembered for a longer period than in a standard recurrent network. Figure 4 depicts the LSTM cell architecture.
The result of the bilateral filter is given as input to the Canny edge detection algorithm; Fig. 8 depicts its result. The Canny-edged image is then given as input for drawing contours, and Fig. 9 depicts the resultant image. After drawing contours, the number plate is detected, as shown in Fig. 10.
The extracted number plate is given as input to the LSTM algorithm, which extracts the number from the image; the extracted number is then compared with the details in the database to display the information regarding the vehicle.
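The final matching step can be sketched without the OCR itself: the recognized string is normalized and looked up in a vehicle database. The database contents and plate numbers here are invented for illustration.

```python
# Sketch of matching a recognized plate string against a vehicle database.

VEHICLE_DB = {
    "AP09CD1234": {"owner": "Example Owner", "model": "Example Model"},
}

def normalize(plate_text):
    # OCR output may contain spaces, hyphens or lowercase letters.
    return "".join(ch for ch in plate_text.upper() if ch.isalnum())

def lookup(plate_text):
    return VEHICLE_DB.get(normalize(plate_text))

record = lookup("ap-09 cd 1234")   # matches despite spacing and case noise
```

Normalizing before the lookup makes the match robust to the formatting noise that OCR typically introduces.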
The result of the proposed OCR-LSTM methodology is compared with the existing methodologies in terms of extraction of the number plate region, segmentation rate and recognition rate. Figure 11 depicts the information extracted from the database, and Fig. 12 depicts the performance analysis of the proposed OCR-LSTM compared with the existing methodologies.
5 Conclusions
In this paper, a deep learning technique is integrated to detect number plates. The proposed OCR-LSTM uses the LSTM algorithm, a deep-learning-based algorithm. As the LSTM remembers information for a longer duration, the proposed algorithm performs much better than the existing algorithms, recognizing the number plate with high accuracy in less time. The bilateral filter and contouring remove the unwanted space around the number plate, which enhances the accuracy of number plate detection and reduces the time consumed. The proposed OCR-LSTM methodology shows 98% detection accuracy, much better than the existing methodologies, and works efficiently for all types of environments and images.
Fig. 12 Performance analysis of OCR-LSTM
References
1. Laroca R, Severo E, Zanlorensi LA, Oliveira LS, Gonçalves GR, Schwartz WR et al (2018) A robust real-time automatic license plate recognition based on the YOLO detector. In: International joint conference on neural networks (IJCNN), pp 1–10
2. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
3. Ghosh AK, Sharma SK, Islam MN, Biswas S, Akter S (2019) Automatic license plate recognition (ALPR) for Bangladeshi vehicles. Global J Comput Sci Technol
4. Montazzolli S, Jung CR (2017) Real-time Brazilian license plate detection and recognition using deep convolutional neural networks. In: SIBGRAPI conference on graphics, patterns and images, pp 55–62
5. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
6. Li H, Wang P, Shen C (2017) Towards end-to-end car license plates detection and recognition with deep neural networks. CoRR abs/1709.08828
7. Masood SZ, Shu G, Dehghan A, Ortiz EG (2017) License plate detection and recognition using deeply learned convolutional neural networks. arXiv preprint arXiv:1703.07330
Artificial Neural Network Alert Classifier
for Construction Equipments Telematics
(CET)
Abstract The Internet of Things (IoT) connects devices via a cloud or centralized platform. It is useful in many applications that involve a variety of services, such as sharing information from one device to another. Built on these concepts is telematics, which deals with the long-distance transmission of computerized information: it provides navigation, routing and network-related information for many applications and service providers in transportation, logistics, travel, and more. Telematics faces many challenges, notably prediction of failures in the system and diagnostic analysis. There is therefore a need for predictive analysis of construction equipment telematics (CET) to analyse failures in the system, and the proposed work uses an artificial neural network to raise alerts. An experiment conducted with an ANN on a CET data set obtained an accuracy of 100%; various machine learning (ML) algorithms, namely DT, KNN, and Naive Bayes classifiers, were also analysed and obtained accuracies of 93.72%, 93.19%, and 62.57%, respectively.
1 Introduction
The Internet of Things (IoT) connects devices via a cloud or centralized platform. It is useful in many applications that involve a variety of services, such as sharing information from one device to another. Built on these concepts is telematics, which deals with the long-distance transmission of computerized information [1]. It provides navigation, routing, or
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 147
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_14
148 M. G. R. Urs et al.
2 Related Works
Aslan and Koo worked on optimizing operation planning and establishing long-term strategic organization in telematics [1]. Chan and Louis worked on a novel use of telematics data, which is currently used only for equipment-centric analysis [4]. Lee et al. worked on a GPS-based fleet telematics system for heavy earthwork equipment that can analyse time-log information without utilizing any other on-board sensors [5]. Slaton et al. worked on automated activity recognition systems for tracking and monitoring equipment [6]. Lekan et al. worked on a framework for sustainable innovation and a system for the inclusive monitoring of innovations in the design and planning of construction maintenance [7]. Singh et al. worked on the realistic demand of handling and manipulating the humongous data arriving every few seconds from several vehicles through IoT, a NoSQL CloudantDB database, and cloud computing [8]. Aldelaimi et al. worked on objects belonging to a community collaborating with each other to collect, manipulate, and share interesting content and to provide services that enhance the quality of human interactions in smart cities [9]. Hussein et al. worked on the resource capabilities of context-awareness, in addition to the user-friendliness and connectivity proposed as part of its infrastructure [10]. Barrett-Powell et al. worked on a lightweight environment to facilitate experiments and demonstrations [11]. Hao et al. worked on a novel diversified top-maximal clique detection approach based on formal concept analysis [12]. Bruno et al. worked on sensors that monitor a person's position at the topological level and generate tracking signals [13]. Huk and Kurowski worked on analysing telematics systems used in transport and forwarding and on proposing improvements in the form of central solutions [14]. Hu et al. worked on the spatio-temporal distributions of different parameters, including traffic speeds, fuel economy, and emissions [15].
3 Problem Statement
4 System Model
The CET system model SM has many pieces of equipment E, which provide services Sr and applications Ap. The independent equipment E within the given set of networks Nx is distributed over a space DNx. The homogeneous equipment Ho and heterogeneous equipment Ht carry sensors Sn that gather data, including vehicle location, driver behaviour, engine diagnostics and vehicle activity, and visualize this data on software platforms that help fleet operators manage their resources. In construction equipment telematics, data is based on the communication of services and applications via global positioning system (GPS) tracking T; using these technologies, supply chain management functions in construction services. The variables and their descriptions used to model the system are given in Table 1.
The objects are distributed randomly in the environment, and the equipment serves the applications in the given system network. Hence, the system model is as shown in Eq. 1.

$SM = \lim_{H_o, H_t} S_n\left(D_{N_x}(E)\right) : \forall (S_r, A_p) = f\left(\frac{D_{N_x} E(S_r, A_p)}{T \cdot S_n}\right)$  (1)

In CET, the equipment in the given network is distributed randomly and obtains data from sensors. The applications and services act as an interface to fetch the provided services for each application from the equipment; each piece of equipment provides a service with respect to time for an application in CET.
The CET data set contains environment data that helps the user make a decision through an alert model Am with decision support based on an artificial neural network (ANN). The model provides the decision based on environment information such as location status L, time status T, ignition status I, power status P, speed status S, and fuel status F. The data is analysed through a predictive modelling technique using an artificial neural network algorithm; hence, the problem is formulated as Eq. 2.

$A_m = \lim_{H_o, H_t} \frac{1}{T}\left(D_{N_x}, S_r, A_p\right) = f\left(\frac{L \cdot S \cdot I \cdot P \cdot F}{T}\right)$  (2)

The objective function states that the proposed alert model (Am) takes a decision with respect to time (T) in the network (DNx) for services (Sr) and applications (Ap). Therefore, the environment information E is as shown in Eqs. 3 and 4.

$A_m = \lim_{n \to \infty} f_T\left(\sum_{n=H_o}^{H_t(E)} E\right)$  (3)

$E = (L + P + S + I + F)$  (4)
This section explains the proposed work carried out to conduct an experiment on CET data.

Construction telematics is all about data. When a piece of construction equipment or asset is called into service, it can be monitored by software solutions that provide a whole host of information. These areas generate an immense amount of information with wide-ranging implications and applications, from reducing engine idling to identifying the need for further operator training, or even investing in alternative-energy machines such as electric vehicles. Further, under the predictive modelling technique, the data is analysed using an artificial neural network (ANN) that produces the information needed to alert the system with respect to time. Thus, the proposed ANN is based on the services in CET and helps to make decisions according to the alerts raised by CET applications.
The working principle of the artificial intelligence model is to provide the decision based on environment information such as location status L, time status T, ignition status I, power status P, speed status S, and fuel status F; the data is analysed through a predictive modelling technique using an artificial neural network (ANN) algorithm as in Eq. 2. The network contains three ReLU layers and one sigmoid output function for 18-dimensional data; it is optimized using Adam, with the error measured as mean square error, from which the accuracy is obtained. The design and working methodology of the CET environment are shown in Fig. 1.
Fig. 2 Construction equipment telematics proposed methodology
The methodology begins with pre-processing of the CET data to normalize the null and missing values in the data set. Next comes feature selection based on the equipment, power, ignition, location, engine, and application models. Finally, the decision is made using an artificial neural network model with rectified linear unit (ReLU) input layers and one sigmoid output layer, which provides the alert system for the CET environment. The methodology of the CET environment is shown in Fig. 2.
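A rough sketch of this classifier using scikit-learn's MLPClassifier as a stand-in for the described ANN: three ReLU hidden layers and the Adam optimizer, as in the text. The layer sizes and the synthetic 6-feature "status" data (standing in for L, T, I, P, S, F) are assumptions for illustration, not the paper's configuration or data.

```python
# Stand-in for the CET alert ANN: three ReLU layers trained with Adam.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))              # toy status features
y = (X[:, 4] + X[:, 5] > 0).astype(int)    # invented "alert" rule on two features

clf = MLPClassifier(hidden_layer_sizes=(16, 16, 16), activation="relu",
                    solver="adam", max_iter=2000, random_state=0).fit(X, y)
acc = clf.score(X, y)                      # training accuracy
```

scikit-learn picks the output activation automatically (logistic for two classes, softmax for more), which loosely mirrors the sigmoid/softmax output layer described in the text.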
The CET data set contains readings from real IoT objects over a period of 36 h, comprising
12,000 values that are categorized into a few service data models, namely the equipment
models, power models, ignition model, location models, engine models, and application
models, together with the alert system. The equipment model has equipment ID and
equipment name; the power model has main power and status of power; the ignition model has ignition status,
Artificial Neural Network Alert Classifier for Construction … 153
vehicle status, digital input, speed status, and conditions. The location model has time,
latitude, and longitude; the engine model has fuel status, temperature, battery status,
and battery alert.
6.2 Results
The proposed work predicts the services of the CET environment. This experiment
decides the alerts of the system based on a knowledge model using ANN models. The
conventional ANN analyses the sensor data that is encoded with the service. The data
is categorized into two types, target and features: there are 6 features corresponding
to the sensor values, and the alert target has 3 classes, namely good, bad, and average.
The classes and features are split into two phases, 70% training data and 30% testing
data. The neural network model has three ReLU activation layers and one softmax
output layer; training this network yields an accuracy of 100%. The proposed work is
also analysed with the same 70%/30% training/testing split using the DT, KNN, and
Naive Bayes classifiers, which obtain accuracies of 93.72%, 93.19%, and 62.57%,
respectively, as given in Table 2 (Fig. 3).
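The setup above can be sketched with scikit-learn. The synthetic six-feature, three-class data, the hidden-layer width of 32, and the seeds below are illustrative stand-ins, not the paper's CET data or exact network:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the CET sensor data: 6 features and 3 alert
# classes (good / average / bad).
X = rng.normal(size=(600, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)

# 70% training / 30% testing split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

models = {
    # Three ReLU hidden layers; MLPClassifier applies a softmax-style
    # output for multi-class targets and optimizes with Adam.
    "ANN": MLPClassifier(hidden_layer_sizes=(32, 32, 32),
                         activation="relu", solver="adam",
                         max_iter=2000, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
}

accuracy = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```

On real sensor data, the relative ordering of the four classifiers will of course depend on the data set itself.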
Table 2 Results
Algorithm Precision Recall F1-score Accuracy
DT 0.88 0.88 0.96 0.93
NB 0.58 0.98 0.92 0.62
KNN 0.78 0.58 0.77 0.93
ANN 1.00 1.00 1.00 1.00
Fig. 3 Accuracy
154 M. G. R. Urs et al.
7 Conclusions
The proposed work is carried out on the CET data set using an artificial neural network
to raise alerts for the system. The experiment conducted with the ANN obtained an
accuracy of 100%. Various machine learning (ML) algorithms were also analysed: the
DT, KNN, and Naive Bayes classifiers obtained accuracies of 93.72%, 93.19%, and
62.57%, respectively. In future, the data set will be enlarged to analyse the CET
environment further, and its performance will be compared with various deep learning
techniques.
Acknowledgements This work was carried out under the “Development program of ETU ‘LETI’
within the framework of the program of strategic academic leadership” Priority-2030 No. 075-15-
2021-1318 on 29 Sept 2021.
Hybrid Approach of Modified IWD
and Machine Learning Techniques
for Android Malware Detection
Abstract Mobile phones have become an indispensable part of our daily lives due
to the rapid improvement in smartphone technologies. The increased use of smart-
phones in online payments has attracted cybercriminals and is contributing to the
rise of malware infections. Many cyberattacks are caused by mobile application
vulnerabilities and malware. As a result, these attacks pose a significant threat to
smartphone security. In general, big datasets are employed for malware analysis, and
these datasets may contain numerous redundant, inappropriate, and noisy features,
causing misclassification and low detection rates. So, we have to choose the most
important features from the dataset. This research work presents a hybrid model
for malware detection, based on a modified intelligent water drop algorithm (IWD)
and ML techniques. To investigate the performance of the proposed techniques, we
used the DREBIN dataset. The results of the experiments reveal that this approach
effectively removes more than 60% of the irrelevant features from the dataset and
produces promising results.
1 Introduction
Smartphones are frequently utilized nowadays because of their portability and multi-
functional capabilities. Smartphones play a huge role in our daily lives, and they are
utilized for a variety of things like Web browsing, e-banking, e-learning, e-shopping,
social media, and so on. In the past couple of decades, Android has risen to
prominence as a dominant mobile operating system. The ability to make digital payments
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 157
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_15
158 R. M. Sharma and C. P. Agrawal
via a mobile device makes it incredibly unique, and this characteristic makes it the
most attractive target for hackers. Mobile apps can be acquired from a variety of
sources, depending on the requirement and purpose. Malware and benign-ware are
the two main categories of Android apps. Malware is a malicious program that is
purposely created to harm mobile functions; it infects mobile devices and executes
a variety of fraudulent operations on its own. Benign-ware is a program designed
to aid the user and does not harm system functions in any way. Signature-based
detection has several limitations, including the inability to detect new malware
and the requirement for malicious source code to build the signature. As a result,
behavior-based malware detection is becoming increasingly common. The proposed
work is dedicated to behavior-based malware detection. To study the behavior of
malware, a large number of attributes are extracted from the APK files of apps,
and progressively more attributes are included in the dataset. For this reason, a very
large dataset is built, which may contain many duplicate, useless, and noisy features
[1]. Feature selection can reduce computational complexity and classification time,
and it also eliminates inoperative features. In feature selection, an optimal subset N is
determined from the entire set M, where N < M. The optimization
criterion function is set up in such a way that the best subset is generated from the full
set. Many nature-inspired meta-heuristic algorithms, such as particle swarm
optimization (PSO), ant colony optimization (ACO), artificial bee colony (ABC), and
genetic algorithms (GA), have established their effectiveness in feature selection in
a variety of domains in recent decades [2]. For Android malware detection, machine
learning-based methods with meta-heuristic methodologies are progressively being
explored and deployed [3]. We introduce a hybrid detection model in this paper
that combines a modified version of the intelligent water drop algorithm for optimal
feature selection with machine learning techniques for optimal set evaluation. The
following are the major contributions of the planned work.
• We modified the original IWD by using the feature importance function instead of
the probability function for edge selection.
• After getting the subset from the first step, we evaluate the subset using six different
machine learning classifiers.
• To demonstrate the effectiveness of the proposed hybrid approach, we present the
results of various classifiers. In addition, we compare the results of the previous
work to the proposed work.
• To test the proposed method’s performance, we used a well-known android dataset
DREBIN.
The remainder of the paper is divided into the following sections. Section 2
discusses related work in this field, whereas Sect. 3 describes the modified IWD
algorithm, and Sect. 4 describes the feature selection procedure. Section 5 describes
the datasets, data preprocessing steps, and experimental environment in detail; Sect. 6
presents performance assessment metrics to assess the proposed approach’s perfor-
mance; Sect. 7 summarizes the proposed work’s findings, and Sect. 8 summarizes
the proposed approach with the future development.
Hybrid Approach of Modified IWD … 159
2 Related Works
This section explains the proposed IFWDA feature selection algorithm. The
suggested algorithm is a tweaked version of the IWD algorithm. Hosseini was the
first to introduce the IWD in 2007 [15]. The swarms of water droplets pursue the
most efficient path from the source to the destination, avoiding obstacles and
environmental disturbances; the algorithm was created based on this natural
phenomenon. It generates new solutions based on previously acquired preserved
solutions, and it achieves an optimal path using artificially generated intelligent
water drops (IWDs) and ambient factors. The problem is denoted by a graph
G(N, E), where N denotes the nodes of the graph and E represents the edges.
Each water drop builds a path gradually, roaming through edges and nodes until
its whole solution is reached. An iteration is finished when all IWDs have established
their whole solutions. The algorithm obtains the final solution after completing the
following steps.
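The construction loop can be sketched as follows; the greedy low-soil edge choice, the toy quality function, and all parameter values are simplifying assumptions rather than the authors' exact implementation:

```python
import random

def iwd_search(n_nodes, n_drops=5, max_it=20, init_soil=100.0, seed=0):
    """Skeleton of the intelligent water drop search: each drop builds
    a path by preferring low-soil edges; the iteration-best path is
    reinforced by removing soil from its edges."""
    rnd = random.Random(seed)
    # Soil on every directed edge of the complete graph G(N, E).
    soil = {(i, j): init_soil for i in range(n_nodes)
            for j in range(n_nodes) if i != j}
    best = None
    for _ in range(max_it):                      # It_Cnt < Mx_It
        for _ in range(n_drops):                 # ND artificial drops
            visited = [0]                        # NlD, initially short
            while len(visited) < n_nodes:        # build a whole solution
                cand = [j for j in range(n_nodes) if j not in visited]
                nxt = min(cand, key=lambda j: soil[(visited[-1], j)]
                          + rnd.random())        # tiny random tie-break
                visited.append(nxt)
            # Toy quality function: lower total soil along the path wins.
            quality = sum(soil[(a, b)] for a, b in zip(visited, visited[1:]))
            if best is None or quality < best[0]:
                best = (quality, visited)
        # Reinforce the best path found so far by eroding its soil.
        for a, b in zip(best[1], best[1][1:]):
            soil[(a, b)] *= 0.9
    return best[1]

path = iwd_search(6)   # a path visiting all 6 nodes, starting at node 0
```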
The static parameters remain unchanged during the whole process. The number of
artificially generated IWDs is denoted by ND. The velocity of an IWD is parameterized
by via, vib, and vic, and the soil values of the local path are parameterized by sia,
sib, and sic. Mx_It represents the total number of iterations, and It_S denotes the
initial soil value of the local pathway. The dynamic parameters are initialized at the
beginning of the process and updated during it. The list of nodes visited by each water
drop, NlD, is initially blank and is updated whenever the IWD visits a node. The initial
velocity of an IWD is denoted by vD, and the initial soil, denoted by sD, is set to zero.
Table 1 lists the static and dynamic parameters.
The proposed modification has been applied at this step. In the IWD algorithm,
the artificially generated water droplets select the edges that have less soil content. In
this way, the water droplets follow the optimal path from the source to the destination.
In the proposed work, the nodes represent the features of the dataset, and the edges
connecting the nodes form an undirected network. In the traditional algorithm, a matrix
was generated over this network by the probability function for each node to select the
connected path with less soil, and the node with the higher value was selected.
Similarly, in the proposed modification, the feature-importance matrix obtained from
scikit-learn is used to determine the optimal path and the choice of the important node.
All the nodes of the optimal path obtained in this way determine the optimal subset.
If an IWD drop k is presently at node i and moves to node j, then the feature importance
is calculated using Eq. (1).
Fp(j) = Σ_{i : node i splits on node j} Np(i) / Σ_{t : all nodes} Np(t)   (1)

Np(i) = Wz(i)·Hz(i) − Wz(left(i))·Hz(left(i)) − Wz(right(i))·Hz(right(i))   (2)

Gmp = Σ_{i=1}^{L} fz(i)·(1 − fz(i))   (3)
In Eq. (2), Wz(i) represents the weighted number of IWDs reaching node i, and Hz(i)
indicates the Gini impurity value of node i. right(i) denotes the child node from the
right split on node i, and left(i) denotes the child node from the left split on node i.
In Eq. (3), L denotes the number of labels, and fz(i) represents the frequency of
label i. If the value of Np(j) is higher than the previously calculated Np(i), then node
j is added to the list of visited nodes NlD.
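Equations (1)–(3) mirror the node-importance computation that scikit-learn exposes as `feature_importances_`. The sketch below recomputes the Gini impurity of Eq. (3) by hand and reads the per-feature importances from a fitted decision tree; the data is synthetic and the helper `gini_impurity` is our own:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini_impurity(labels):
    """Eq. (3): G = sum over the L labels of f(i) * (1 - f(i))."""
    _, counts = np.unique(labels, return_counts=True)
    f = counts / counts.sum()
    return float(np.sum(f * (1.0 - f)))

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)   # only feature 2 carries information

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Eq. (1): the per-feature importances are normalized node
# importances, so they sum to 1 and the informative feature dominates.
importances = tree.feature_importances_
```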
The parameter vD(t+1) represents the velocity of IWD k at time (t + 1); this parameter
is updated using Eq. (4).
vD(t+1) = vD(t) + via / (vib + vic·Slp(i, j))   (4)
Slp(i, j) = (1 − ρin)·Slp(i, j) − ρin·Δs(i, j)   (5)
The amount of soil in the local path is denoted by Slp (i, j).
The soil values are updated using Eqs. (5), (6), and (7), respectively, where the value
of the constant ρin lies between 0 and 1.
sD = sD + Δs(i, j)   (6)
Δs(i, j) = sia / (sib + sic·t(i, j : vD(t+1)))   (7)
t(i, j : vD(t+1)) = HUD(i, j) / vD(t+1)   (8)
Equation (8) defines the time function t(i, j : vD(t+1)), which denotes the time needed
for water drop k to travel from i to j at time (t + 1), where HUD(i, j) denotes the
heuristic desirability function.
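Equations (4)–(8) can be exercised as a single local update step; the function below and all of its parameter values are illustrative assumptions, not values from the paper:

```python
def update_drop(v, soil_edge, soil_drop, heuristic,
                va=1.0, vb=0.01, vc=1.0,
                sa=1.0, sb=0.01, sc=1.0, rho_in=0.1):
    """One local IWD update following Eqs. (4)-(8)."""
    # Eq. (4): velocity grows faster on edges carrying less soil.
    v_new = v + va / (vb + vc * soil_edge)
    # Eq. (8): travel time = heuristic desirability / new velocity.
    t = heuristic / v_new
    # Eq. (7): soil picked up, inversely related to travel time.
    delta = sa / (sb + sc * t)
    # Eq. (5): the local path loses soil as the drop erodes it.
    soil_edge_new = (1 - rho_in) * soil_edge - rho_in * delta
    # Eq. (6): the drop accumulates the eroded soil.
    soil_drop_new = soil_drop + delta
    return v_new, soil_edge_new, soil_drop_new

v1, s_edge, s_drop = update_drop(v=1.0, soil_edge=10.0,
                                 soil_drop=0.0, heuristic=1.0)
```

After one step the drop speeds up, the edge loses soil, and the drop carries the eroded soil, which is exactly the dynamic the equations encode.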
where TP denotes the population of solutions and q(xp) is the fitness function that is
used to measure the quality of a solution. The soil of all edges in TIBest is updated
using Eq. (10).
Sgp(i, j) = (1 + ρD)·Sgp(i, j) − ρIWD·(1 / q(TIBest))·sBk   (10)
Equation (11) is used to substitute TGBest by TIBest or to preserve the same value.
The solution-building and reinforcement phases are repeated until the termination
state is reached: if the value of It_Cnt becomes equal to or higher than Mx_It, the
iteration process is stopped.
The improved IWD feature selection process finds the best subset S from the complete
dataset U. The searching process of the suggested modified IWD is represented by an
undirected graph G(N, E), where N are the nodes (i.e., features) connected by the
edges E. The selection of an edge indicates the next node to be selected. A small
amount of soil is present on each edge, signifying impediments on the nearby path.
Each water drop is randomly dispersed over the graph and serves as a search agent.
The iteration-best solution TIBest is utilized to determine the global best solution
TGBest. The path with the fewest barriers is the best solution, and the optimal feature
subset is the set of all nodes that are members of the optimal path.
To examine the performance of the proposed approach, the Drebin-215 dataset, which
contains 9476 benign and 5560 malware samples from the DREBIN project, is used for
evaluation. This dataset is extensively used by many researchers [16]. In the data
preprocessing phase, the duplicate occurrences are removed from the dataset, and the
entries containing NaN values are also removed. Then, the important features are
selected using the modified IWD algorithm, and the selected subset is evaluated using
six machine learning classifiers. The flowchart of the proposed model is given in Fig. 1.
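The preprocessing and subset-evaluation steps can be sketched with pandas and scikit-learn; the synthetic five-feature frame stands in for the Drebin-215 matrix, and a random-forest importance threshold stands in for the modified-IWD selection:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the DREBIN binary feature matrix
# (1 = permission or API call present); Drebin-215 has 215 features.
data = pd.DataFrame(rng.integers(0, 2, size=(200, 5)).astype(float),
                    columns=[f"feat_{i}" for i in range(5)])
data["label"] = ((data["feat_0"] + data["feat_1"]) > 0).astype(int)
data.iloc[0, 1] = np.nan                 # simulate a NaN entry

# Preprocessing as described: drop duplicate rows and NaN entries.
clean = data.drop_duplicates().dropna()

X, y = clean.drop(columns="label"), clean["label"]

# Stand-in for the modified-IWD subset: keep features whose
# random-forest importance exceeds the mean importance.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
selected = list(X.columns[rf.feature_importances_
                          > rf.feature_importances_.mean()])
```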
The proposed approach was implemented in Anaconda Python 3.8 on Jupyter Notebook,
on a system with an Intel(R) Core(TM) i7-8550U processor @ 1.80 GHz and 8 GB of
RAM.
The confusion matrix is used to display the results of classification; it not only
provides insight into the performance of classifiers but also shows which classes are
correctly classified and which are not.
• True Positive (∂): a correctly forecast occurrence that belongs to the malware samples.
• False Positive (α): an incorrectly forecast occurrence that is labelled as malware.
• True Negative (ρ): a correctly forecast occurrence that belongs to the benign-ware
samples.
• False Negative (σ): an incorrectly forecast occurrence that is labelled as benign-ware.
The following metrics are used to evaluate the usefulness of the proposed
technique.
• Accuracy (μ): The accuracy represents the ratio of correctly categorized samples
to the total number of samples. The accuracy can be defined as Eq. (12).
Accuracy(μ) = (∂ + ρ) / (∂ + α + ρ + σ)   (12)
• Recall (δ): The recall is the ratio of correctly predicted malware samples to the
total number of actual malware samples. The recall can be described as Eq. (13).
Recall(δ) = ∂ / (∂ + σ)   (13)
• Precision (λ): The precision is the ratio of true positive predictions to the total
number of positive forecasts. The precision can be represented as Eq. (14)
Precision(λ) = ∂ / (∂ + α)   (14)
• F 1-Score (τ ): The F1-score is the harmonic mean of precision (λ) and recall
(δ); it delivers a better measure of the wrongly classified occurrences than the
accuracy metric. The F1-score can be expressed as Eq. (15).
F1-Score(τ) = 2(λ·δ) / (λ + δ)   (15)
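Equations (12)–(15) can be checked with a small helper; the confusion counts below are made up purely for illustration:

```python
def metrics(tp, fp, tn, fn):
    """Eqs. (12)-(15) from confusion-matrix counts,
    with tp = TP, fp = FP, tn = TN, fn = FN."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (12)
    recall = tp / (tp + fn)                               # Eq. (13)
    precision = tp / (tp + fp)                            # Eq. (14)
    f1 = 2 * (precision * recall) / (precision + recall)  # Eq. (15)
    return accuracy, recall, precision, f1

# Made-up confusion counts, purely for illustration.
acc, rec, prec, f1 = metrics(tp=90, fp=10, tn=85, fn=15)
```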
machine (SVM), random forest (RF), and multi-layer perceptron (MLP). The results
of all applied ML classification methods are given in Table 2. Among the proposed
variants, (Modified IWD + RF) achieved 96% accuracy and a 98% recall value on
benign-ware. The results also show that (Modified IWD + KNN), (Modified IWD
+ DT), and (Modified IWD + MLP) performed better than the other methods in terms
of precision. The variants (Modified IWD + RF), (Modified IWD + SVM), and
(Modified IWD + KNN) achieved F1-scores of 98% and 97%.
The comparison of the proposed method with previous methods is given in Table 4.
It is clear from Table 4 that the proposed approach is better than all the others except
one [7]. The outcomes also demonstrate that models hybridized with meta-heuristic
methods give better performance. Since heuristic algorithms continually try to reach
an optimum solution by learning from their previous steps, these methods do not
reconsider the paths that have already been covered; instead, the meta-heuristic uses
the information from the preceding steps to discover new promising solutions. Thus,
one of the benefits of meta-heuristic optimization algorithms is that they considerably
reduce the size of the dataset by choosing appropriate features, thereby reducing time
and complexity. The modified IWD algorithm selected 71 optimal features from the
dataset; the top 12 features selected from the DREBIN dataset are shown in Table 3.
This process reduces the size of the dataset by more than 60%; thus, it can be
established that the proposed method is more effective and delivers better performance
than many other existing models. Figure 2 depicts the suggested model's F1-score,
whereas Fig. 3 depicts its accuracy. The recall and precision derived from subset
evaluation are shown in Figs. 4 and 5, respectively. Figure 6 shows a comparison of
the proposed approach's precision and recall.
[Figures 4–6: bar charts of recall and precision for the benign (B) and malware (M)
classes across the KNN, DTC, LR, SVM, RF, and MLP classifiers, and a
recall-versus-precision comparison.]
ML classifiers are used for subset examination. Among all the proposed variants,
(modified IWD + RF) achieved the highest accuracy of 96%. Some variants of this
approach, such as (modified IWD + RF), (modified IWD + SVM), and (modified
IWD + KNN), achieved F1-scores of 98% and 97%. The results also prove that the
proposed approach was able to reduce the size of the dataset by more than 60% and
was found to be better than many other previous approaches. Future work will combine
other meta-heuristics for feature optimization with ML techniques for subset examination
to attain a more effective hybrid approach.
References
1. Acharya N, Singh S (2017) An IWD-based feature selection method for intrusion detection
system. Soft Comput 22(13):4407–4416
2. Shunmugapriya P, Kanmani S (2017) A hybrid algorithm using ant and bee colony optimization
for feature selection and classification (AC-ABC Hybrid). Swarm Evol Comput 36:27–36
3. Lou S, Cheng S, Huang J, Jiang F (2019) Tfdroid: Android malware detection by topics
and sensitive data flows using machine learning techniques. In: 2019 IEEE 2nd international
conference on information and computer technologies (ICICT), pp 30–36
4. Sun L, Li Z, Yan Q, Srisa-An W, Pan Y (2017) SigPID: significant permission identification
for android malware detection. In: 2016 11th International conference on malicious unwanted
software, MALWARE 2016, pp 59–66
5. Jiang X, Mao B, Guan J, Huang X (2020) Android malware detection using fine-grained
features. Sci Prog 2020(5190138):1–13
6. Zhang W, Wang H, He H, Liu P (2020) DAMBA: detecting android malware by ORGB analysis.
IEEE Trans Reliab 69(1):55–69
7. Wang W, Zhao M, Wang J (2019) Effective android malware detection with a hybrid model
based on deep autoencoder and convolutional neural network. J Ambient Intell Hum Comput
10(8):3035–3043
8. Arp D, Spreitzenbarth M, Hübner M, Gascon H, Rieck K (2014) Drebin: effective and
explainable detection of Android malware in your pocket. NDSS 14:1–15
9. Talha KA, Alper DI, Aydin C (2015) APK auditor: permission-based android malware detection
system. Digital Invest 13:1–14
10. Mehtab A et al. (2020) AdDroid: rule-based machine learning framework for android malware
analysis. Mob Netw Appl 25(1):180–192
11. Jerlin MA, Marimuthu K (2018) A new malware detection system using machine learning
techniques for API call sequences. J Appl Secur Res 13(1):45–62
12. Alzaylaee M, Yerima SY, Sezer S (2020) DL-Droid: deep learning based android malware
detection using real devices. Comput Secur 89:101663
13. Idrees F, Rajarajan M, Conti M, Chen TM, Rahulamathavan Y (2017) PIndroid: a novel Android
malware detection system using ensemble learning methods. Comput Secur 68:36–46
14. Alam S, Alharbi SA, Yildirim S (2020) Mining nested flow of dominant APIs for detecting
android malware. Comput Netw 167:107026
15. Hosseini HS (2007) Problem solving by intelligent water drops. In: 2007 IEEE congress on
evolutionary computation. IEEE, pp 3226–3231
16. Android malware dataset for machine learning 2. https://figshare.com/articles/dataset/And
roid_malware_dataset_for_machine_learning_2/5854653 Accessed 11 Sep 2021
17. Milosevic N, Dehghantanha A, Choo KKR (2017) Machine learning aided Android malware
classification. Comput Electr Eng 61:266–274
Intuitionistic Fuzzy 9 Intersection Matrix
for Obtaining the Relationship Between
Indeterminate Objects
Abstract This paper defines intuitionistic fuzzy core (IFC), intuitionistic fuzzy
fringe (IFF), and intuitionistic fuzzy outer (IFO) of an intuitionistic fuzzy set (IFS)
in an intuitionistic fuzzy topology space (IFTS). It has been shown that the IFC, IFF,
and IFO of an IFS are mutually disjoint. Further, the intuitionistic fuzzy 9 intersection
matrix (IF9IM) is defined, which can determine the topological relation between
any two IFSs. The IF9IM is an upgraded version of the fuzzy 9 intersection matrix.
As the IFS is capable of handling any hesitancy or indeterminacy, the IF9IM determines
the relationship between two uncertain objects having any indeterminacy.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 171
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_16
172 S. Jana and J. Mahanta
have been used to deal with uncertainty. Studies [2, 9, 17] based on fuzzy sets were
introduced to make GIS database systems capable of dealing with uncertainties.
To determine the relationship between two uncertain geographical objects,
9 intersection matrix using broad boundary [3], egg yolk method [4], fuzzy 9 inter-
section matrix [15], unified fuzzy 9 intersection [6], fuzzy 9 intersection matrix in
a crisp fuzzy topological space [16], and α-induced 9 intersection matrix [11, 12]
were proposed.
Data accuracy in any database model is the primary and most crucial component.
As GIS is a database model, the aim is to make the model as accurate as possible.
But hesitancy or indeterminacy is a concern that occurs due to several facts such
as noise in the data or incapability of collecting data at a particular location. The
fuzzy models are capable of dealing with uncertainty, though they fail to handle any
kind of indeterminacy; the generalized fuzzy sets serve better for that purpose. The
intuitionistic fuzzy set introduced by Atanassov [1] is one such generalization of the
fuzzy set that considers the membership and non-membership, as well as the measure
of the hesitancy, of any object in the set. The elements of the set follow the condition
that the sum of the membership and non-membership is always less than or equal to
one. GIS modeling has already been studied in terms of the intuitionistic fuzzy
set. Malek [13] pointed out the shortcomings of the fuzzy framework and proposed an
intuitionistic fuzzy framework with several possible applications in GIS. From
the viewpoint of point-set topology, that model describes an intuitionistic fuzzy set
in terms of the interior and boundary of the membership and those of the non-membership.
Interestingly, the study neither examines whether these parts are topological properties
nor verifies their mutual disjointness. In this paper, we introduce the intuitionistic
fuzzy fringe and intuitionistic fuzzy outer and show that the fringe, outer, and core are
mutually disjoint topological properties in an intuitionistic fuzzy topological space
(IFTS). Finally, we propose the IF9IM for determining the relationship between two
objects in an IFTS.
The paper is organized as follows. Section 2 discusses the preliminary concepts
required for the study. Section 3 introduces intuitionistic fuzzy core, intuitionistic
fuzzy fringe and intuitionistic fuzzy outer and constructs the IF9IM. Finally, Sect. 5
concludes the study.
2 Preliminary Concepts
In this section, we briefly recall the preliminary concepts required for the study.
The intuitionistic fuzzy set (IFS) is a generalization of the fuzzy set [18], introduced
by Atanassov [1]. Each element of the set is assigned a membership value as well as
a non-membership value.
Intuitionistic Fuzzy 9 Intersection Matrix for Obtaining the Relationship … 173
The intuitionistic fuzzy topology was introduced by Çoker [5] in 1997 and defined
as follows:
Definition 2.2.1 Let X ≠ φ be any set, let I = [0, 1], and let τ ⊂ I^X be a collection
of IFSs such that τ satisfies the following conditions:
(i) 0X, 1X ∈ τ,
(ii) A, B ∈ τ ⇒ A ∧ B ∈ τ,   (1)
(iii) (Aj)_{j∈J} ∈ τ ⇒ ∨_{j∈J} Aj ∈ τ, where J is an index set,
where 1X and 0X are, respectively, the whole set X and the null set.
Then τ is called an intuitionistic fuzzy topology (IFT) for X.
Members of τ are called intuitionistic fuzzy open sets in τ, and complements of
elements of τ are said to be intuitionistic fuzzy closed sets. For an intuitionistic fuzzy
set A in an intuitionistic fuzzy topological space (IFTS), the supremum of all the
intuitionistic fuzzy open sets contained in A is defined as the intuitionistic fuzzy
interior of A, denoted IFIntA, and the infimum of all the intuitionistic fuzzy closed
sets that contain A is defined as the intuitionistic fuzzy closure of A, denoted IFClA.
The exterior of A, denoted A−, is defined as A− = (IFClA)c. The relation between
IFCl and IFInt can be obtained from the following theorem [5].
Theorem 2.2.1 For an IFS A in an IFTS X,
1. IFClAc = (IFIntA)c
2. IFIntAc = (IFClA)c .
The intuitionistic fuzzy boundary of an IFS A is defined by Hur et al. [10] as follows
Crisp methods: The 4 intersection matrix [8] by Egenhofer was the first algebraic
method to obtain the topological relation between geographical objects. The model
was later upgraded to the famous 9 intersection matrix model by Egenhofer and
Franzosa [7]. For two crisp sets A and B, the 9 intersection method is defined [8] as
follows:
| IntA ∩ IntB   IntA ∩ BdB   IntA ∩ B− |
| BdA ∩ IntB   BdA ∩ BdB   BdA ∩ B− |
| A− ∩ IntB    A− ∩ BdB    A− ∩ B−  |
The components of a general 9 intersection matrix are interior, boundary and exterior.
These are the mutually disjoint topological properties of a set in a topological space.
For an IFS in an IFTS, the intuitionistic fuzzy interior, boundary, and exterior are
not mutually disjoint. We find three mutually disjoint topological parts here.
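For crisp point sets, the matrix above reduces to emptiness checks between interiors, boundaries, and exteriors. A minimal sketch, with invented 1-D example regions:

```python
def nine_intersection(int_a, bd_a, int_b, bd_b, universe):
    """Egenhofer 9-intersection matrix over crisp point sets;
    each entry is True when the corresponding intersection
    is non-empty."""
    ext_a = universe - (int_a | bd_a)   # exterior A- = complement of ClA
    ext_b = universe - (int_b | bd_b)
    return [[bool(pa & pb) for pb in (int_b, bd_b, ext_b)]
            for pa in (int_a, bd_a, ext_a)]

# Two disjoint 1-D "regions" on a 10-point universe:
# A occupies [1, 4] with interior {2, 3}; B occupies [5, 8].
universe = set(range(10))
m = nine_intersection({2, 3}, {1, 4}, {6, 7}, {5, 8}, universe)
```

The pattern of True/False entries in `m` is what distinguishes topological relations such as disjoint, meet, overlap, and containment.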
Definition 3.0.1 The intuitionistic fuzzy core (IFC) of an IFS A, denoted by Aθ, is
defined as Aθ = {x ∈ IFIntA : IFIntA(x) = ⟨1, 0⟩}.
Theorem 3.0.1 For an IFS A in an IFTS X, IFBdA ∧ Aθ = φ.
Proof Aθ = {x ∈ IFIntA : IFIntA(x) = ⟨ 1, 0⟩}
IFClAc = (IFIntA)c by Theorem 2.2.1, therefore, IFClAc = ⟨ 0, 1⟩, for all x ∈ X ,
(μIFBdA ∧ μIFInt A )(x) = (μIFClA ∧ μIFClAc ∧ μIFInt A )(x) = 0, and
(νIFBdA ∨ νIFInt A )(x) = (νIFClA ∨ νIFCl Ac ∨ νIFInt A )(x) = 0, for x ∈ X .
Which implies IFBd A ∧ Aθ = φ.
Theorem 3.0.2 For an IFS A in an IFTS X, IFBdA ∧ IFIntA = φ implies either
IFIntA = φ or IFIntA = Aθ.
Proof IFBdA ∧ IFIntA = φ ⇒ IFBdA = φ or IFIntA = φ.
IFIntA = φ implies IFIntA is crisp,
whereas IFBdA = φ ⇒ IFClA ∧ IFClAc = φ
⇒ either IFClA is empty or IFClAc is empty.
IFClA being empty would imply that IFIntA is empty,
whereas IFClAc being empty ⇒ IFClAc = ⟨0, 1⟩
⇒ (IFClAc)c = ⟨1, 0⟩
⇒ IFIntA = ⟨1, 0⟩.
Therefore, either IFIntA = φ or IFIntA = Aθ.
For a particular set A in a crisp topological space X, the whole space splits into IntA,
BdA, and A−. For crisp sets, IntA ∪ BdA = ClA and ClA − IntA = BdA; thus, we
can say that ClA splits into IntA and BdA. But in the case of an IFS, the intersection
between IntA and BdA is in general non-empty, and Theorem 3.0.1 suggests that the
decomposition of IFClA for an IFS is due to Aθ. So, in an IFTS, IFClA splits into
Aθ and IFClA − Aθ.
Definition 3.0.2 The intuitionistic fuzzy fringe (IFF) of an intuitionistic fuzzy set
A, denoted by ΔA, is defined as ΔA = IFClA − Aθ.
It is clear from the definition of ΔA that ΔA ∨ Aθ = IFClA.
Theorem 3.0.3 For an IFS A in an IFTS X, the intersection between Aθ and ΔA is
empty.
Proof Aθ ∧ ΔA = Aθ ∧ IFClA ∧ (Aθ)c = φ, as Aθ is crisp.
Definition 3.0.3 Let A be an IFS in an IFTS. We denote the intuitionistic fuzzy outer
(IFO) of A by A∗, defined by A∗ = {x ∈ (ClA)c : (ClA)c(x) = ⟨1, 0⟩}.
Proof From the definitions of Aθ and A∗, it is obvious that the intersection between
them is empty.
From Corollary 3.0.1, we can say that the intersection between ΔA and A∗ is empty,
and Theorem 3.0.3 proves that the intersection between Aθ and ΔA is empty.
An IFS certainly gives a better description of objects, as it considers the membership,
the non-membership, and any kind of indeterminacy or hesitancy of objects.
Therefore, considering the geographical elements as intuitionistic fuzzy sets would
significantly improve the accuracy of GIS models. Geographical elements considered
as intuitionistic fuzzy need an upgraded 9 intersection matrix to obtain the relationship
between objects. As discussed earlier, the 9 intersection matrix was introduced by
Egenhofer and Franzosa [7] and has later been upgraded [11, 15] by many others for
finding the relationship between fuzzy objects.
Given any set A in a topological space, the whole space decomposes into three
mutually exclusive topological parts, namely IntA, BdA, and A−. But in the case of
an IFS A, the intersection between any two of IFIntA, IFBdA, and A− can be non-empty.
Corollary 3.0.2 in the previous section shows that Aθ, ΔA, and A∗ are mutually
disjoint, and for any IFS A in an IFTS X, the whole space splits into these subsets
of X. Figure 1 gives the geometrical interpretation of an intuitionistic fuzzy area
object in the whole space and its decomposition into Aθ, ΔA, and A∗. Now, to form
a 9 intersection matrix, it is required to establish that Aθ, ΔA, and A∗
Fig. 1 Decomposition of an intuitionistic fuzzy area object
The 9 intersection matrix between two IFSs A and B in an IFTS is defined as follows.

    | Aθ ∧ Bθ   Aθ ∧ ΔB   Aθ ∧ B∗ |
I = | ΔA ∧ Bθ   ΔA ∧ ΔB   ΔA ∧ B∗ |
    | A∗ ∧ Bθ   A∗ ∧ ΔB   A∗ ∧ B∗ |
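A minimal sketch of the Aθ / ΔA / A∗ decomposition and the IF9IM, under the simplifying assumption that each point's ⟨membership, non-membership⟩ pair is read off directly (so core, fringe, and outer become pointwise tests); the example sets are invented:

```python
def core(ifs):
    """A^theta: points whose <membership, non-membership> is exactly <1, 0>."""
    return {x for x, mu_nu in ifs.items() if mu_nu == (1.0, 0.0)}

def outer(ifs, universe):
    """A*: points whose value is <0, 1>, i.e. fully outside the set."""
    return {x for x in universe if ifs.get(x, (0.0, 1.0)) == (0.0, 1.0)}

def fringe(ifs, universe):
    """Delta A: everything that is neither core nor outer."""
    return universe - core(ifs) - outer(ifs, universe)

def if9im(a, b, universe):
    """IF9IM: 3x3 emptiness flags between the core, fringe, and
    outer parts of two intuitionistic fuzzy sets."""
    parts_a = (core(a), fringe(a, universe), outer(a, universe))
    parts_b = (core(b), fringe(b, universe), outer(b, universe))
    return [[bool(pa & pb) for pb in parts_b] for pa in parts_a]

U = set(range(8))
# A has a crisp core on {0, 1} and a hesitant fringe on {2, 3};
# B has its core on {4} and overlaps A's fringe at point 3.
A = {0: (1.0, 0.0), 1: (1.0, 0.0), 2: (0.6, 0.2), 3: (0.4, 0.3)}
B = {3: (0.5, 0.4), 4: (1.0, 0.0), 5: (0.7, 0.1)}
M = if9im(A, B, U)
```

Because the three parts are mutually disjoint, every point of the universe contributes to exactly one row part and one column part of the matrix.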
Uncertainty and hesitancy in modeling are unarguably vital aspects. Almost every
data model suffers from uncertainty and hesitancy, and GIS models are no exception.
The existing fuzzy modelings of geographical objects were introduced to counter
uncertainty, whereas the intuitionistic fuzzy set was introduced to deal with hesitancy.
This paper has introduced a framework, viz. the IF9IM, to determine the topological
relationship between spatial objects having uncertainty as well as hesitancy.
References
Abstract In this paper, we propose a hybrid model of latent semantic analysis with
graph-based extractive text summarization on Telugu text. Latent semantic analysis
(LSA) is an unsupervised method for extracting and representing the contextual-usage
meaning of words by statistical computations applied to a corpus of text. The Text
rank algorithm is a graph-based ranking algorithm based on the similarity scores of
the sentences. This hybrid method has been implemented on Eenadu Telugu e-news
data. The ROUGE-1 measures are used to evaluate the summaries of the proposed
model against human-generated summaries in this extractive text summarization. The
proposed LSA with Text rank method has an F1-score of 0.97, as against F1-scores of
0.50 for LSA and 0.49 for Text rank. The hybrid model thus yields better performance
compared with the individual latent semantic analysis and Text rank algorithms.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 179
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_17
180 A. Lakshmi and D. Latha
verbs and nouns is more complex than those of the English language. Thus, adapting
the existing models of text summarization to the Telugu language is not feasible,
which has resulted in little reported work on Telugu text. So, in this paper, we
propose and implement a generic evaluative extractive text summarization on
Telugu text using the LSA model.
This paper is organized in seven sections. Section 2 covers the related
work on LSA and Text rank. Section 3 describes latent semantic analysis and the
existing algorithms proposed by Gong and Liu, Steinberger and Jezek,
Murray et al., and Ozsoy et al. The Text rank algorithm is explained in Sect. 4. In Sect. 5,
the proposed algorithm and its implementation are discussed. Section 6 presents the
results of the proposed algorithm, its evaluation metrics, and a comparative analysis.
Conclusions and future scope are given in Sect. 7.
2 Related Work
The important and relevant studies on text summarization using LSA and Text rank
for documents in English and some Indian languages are reviewed in this section.
Gong and Liu [3] proposed two generic text summarization methods, relevance
measure and LSA. Steinberger and Jezek [4] proposed two new LSA-based evaluation
methods that measure the similarity between a summary and its original document.
Murray et al. [5] proposed automatic speech summarization using maximal
marginal relevance (MMR) and latent semantic analysis. Ozsoy et al. [6] proposed
text summarization using LSA on Turkish documents, introducing the cross-method.
Dokun and Celebi [7] proposed two approaches, avesvd and ravesvd, on English
documents using latent semantic analysis. Chowdhury [8] proposed generic text
summarization using latent semantic analysis on Bengali text. Geetha et al. [9]
proposed text summarization using latent semantic analysis on Kannada text, where
two approaches (the cross-method and Steinberger and Jezek's method) are used
for generating the summary. In [10], a survey of cross-domain text categorization
techniques is presented. Improving classification accuracy and dimensionality
reduction of large text data by least squares support vector machines along with
singular value decomposition was implemented in [11]. Kumar [12] proposed a new
hybrid model based on fuzzy logic using two graph-based techniques, Textrank
and Lexrank, together with latent semantic analysis. Mandal and Singh [13] proposed
generic and query-based text summarization using latent semantic analysis. Reddy
[1] proposed a hybrid model for text categorization using an SVM classifier with latent
semantic analysis.
A Hybrid Model of Latent Semantic Analysis … 181
$A = U S V^{T}$ (1)
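Equation (1) is the singular value decomposition of the term-by-sentence matrix. The sketch below illustrates LSA-style sentence selection in the spirit of Gong and Liu's method; the toy matrix values and the `gong_liu_select` helper are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy term-by-sentence matrix A (rows: terms, columns: sentences).
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Eq. (1): A = U S V^T. Row k of Vt indexes latent topic k;
# Vt[k, j] is the weight of sentence j in that topic.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

def gong_liu_select(Vt, k):
    """For each of the top-k topics, pick the sentence with the
    largest absolute weight in that topic (skipping duplicates)."""
    chosen = []
    for topic in range(k):
        j = int(np.argmax(np.abs(Vt[topic])))
        if j not in chosen:
            chosen.append(j)
    return chosen

summary_ids = gong_liu_select(Vt, k=2)
print(summary_ids)
```

Selecting one sentence per leading singular vector is the core idea; variants such as Steinberger and Jezek's weight each sentence by its length in the reduced latent space instead.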
The Text rank method is a popular graph-based ranking algorithm used to extract
sentences based on their sentence scores. It is based on the PageRank algorithm,
which ranks Web pages in search engine results [15]. The Text rank algorithm is
illustrated in Fig. 1 and implemented by the following steps.
Step 1: The input document is tokenized into sentences.
Step 2: Find the vectors for each sentence; here, term frequency-inverse document
frequency (TF-IDF) vectorization is used. Term frequency captures the
importance of a term in a document. Inverse document frequency downweights
terms that occur in many documents of the collection.
Step 3: Find the similarity between sentence vectors using cosine similarity.
Step 4: The similarities, stored in matrix form, are represented as a graph.
The nodes of this graph represent the sentences, and the edges carry the
similarity scores between the sentences.
Step 5: The top-ranked sentences are selected to form the summary.
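The steps above can be sketched as follows. This is a simplified illustration: the toy sentences, the damping factor, and the iteration count are illustrative choices, not from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Step 2: TF-IDF per sentence; idf = log(N / df) over the collection."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    """Step 3: cosine similarity between two sparse vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Steps 4-5: similarity graph + PageRank-style power iteration."""
    vecs = tfidf_vectors(sentences)
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(sim[j])          # total outgoing weight of node j
                if sim[j][i] and out:
                    rank += sim[j][i] * scores[j] / out
            new.append((1 - d) + d * rank)
        scores = new
    return scores

sents = ["the cat sat on the mat",
         "the dog sat on the log",
         "stocks rallied today"]
scores = textrank(sents)
print(scores)
```

The first two sentences share several terms and reinforce each other's scores, while the unrelated third sentence bottoms out at the damping baseline of (1 − d).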
5 Proposed Algorithm
In the proposed algorithm, the hybrid model of LSA and Text rank summarization
methods is used for extracting the sentences based on their sentence ranking. Figure 2
shows the detailed flowchart of the proposed algorithm (hybrid model).
In the preprocessing step, cleaning (removing unnecessary symbols) and tokenization
are done. From the preprocessed document, the term-document matrix
with TF-IDF is constructed. The LSA and Text rank algorithms discussed in the
earlier sections are then run separately on the term-document matrix.
The two algorithms individually generate two different sets of the top "n" sentences with their
scores [12]. Sentences common to both results are included in the
final summary. The common sentences are considered very important because
they are selected by both algorithms. If there are no common sentences, the top
n sentences from the merged list of sentences, sorted according to their length, are
selected as the final summary.
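The merging rule described above can be sketched as follows; the function name and example sentences are illustrative, not from the paper.

```python
def hybrid_summary(lsa_top, textrank_top, n):
    """Merge the two top-n ranked lists.

    Sentences common to both lists form the final summary; if there is no
    overlap, fall back to the n longest sentences from the merged list.
    """
    common = [s for s in lsa_top if s in textrank_top]
    if common:
        return common[:n]
    merged = list(dict.fromkeys(lsa_top + textrank_top))  # dedupe, keep order
    merged.sort(key=len, reverse=True)
    return merged[:n]

# Example with overlap between the two rankings:
a = ["rain lashes the coast", "markets rally", "new school opens"]
b = ["markets rally", "team wins cup", "rain lashes the coast"]
print(hybrid_summary(a, b, 2))  # → ['rain lashes the coast', 'markets rally']
```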
We have used the unigram overlap method for evaluating the proposed summaries [8].
Precision, recall, and F1-score are the metrics used to analyze the efficiency of the proposed model.
Precision = |UH ∩ UM| / |UH|   (2)

Recall = |UH ∩ UM| / |UM|   (3)

F1 = (2 × Precision × Recall) / (Precision + Recall)   (4)
where |UH| is the number of unigrams in the human-generated summary,
|UM| is the number of unigrams in the system-generated summary, and
|UH ∩ UM| is the number of unigrams common to the human-generated and
system-generated summaries.
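Equations (2)-(4), using the paper's own definitions of |UH| and |UM|, can be computed as in this sketch; the example summaries are illustrative.

```python
def unigram_scores(human_summary, system_summary):
    """Precision, recall and F1 from unigram overlap, per Eqs. (2)-(4)."""
    uh = set(human_summary.lower().split())   # human-summary unigrams
    um = set(system_summary.lower().split())  # system-summary unigrams
    overlap = len(uh & um)
    precision = overlap / len(uh) if uh else 0.0
    recall = overlap / len(um) if um else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = unigram_scores("the match ended in a draw",
                         "the match was a draw")
print(p, r, f)
```

Note that the paper divides precision by |UH| and recall by |UM|, which swaps the roles these sets usually play in ROUGE; the code follows the paper's definitions as written.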
The proposed hybrid model based on LSA and the Text rank method was implemented
on a sample dataset manually compiled from the e-news data of Eenadu, a popular
daily newspaper. One hundred news articles were collected to evaluate the
performance of the model. The proposed LSA with Text rank method is compared
with the LSA and Text rank algorithms on five categories of Telugu e-news data. The
F1-score measured for the proposed LSA with Text rank method on Telugu e-news
data is shown in Fig. 3. The comparative analysis of the proposed LSA-Text rank
method with the LSA and Text rank algorithms is shown in Table 1. The comparison
shows that the proposed LSA with Text rank method has higher efficiency than the
LSA and Text rank methods in the Andhra Pradesh and business categories. In the
remaining categories, the proposed hybrid model's results are equivalent to either
the LSA or the Text rank method.
Fig. 3 F1-score of the proposed method across the categories Andhra Pradesh, Politics, Crime, Fresh news, and Sports
7 Conclusion
In this paper, we proposed a hybrid model of latent semantic analysis with graph-based
text summarization on Telugu text. Daily e-news data from the Telugu newspaper
Eenadu were collected to evaluate the performance of the proposed LSA with Text rank
method. We evaluated our approach by computing precision, recall, and F1-score
using the ROUGE metrics. The results show that the proposed LSA with Text rank
method has an F1-score of 0.97, while the existing LSA and Text rank methods have
F1-scores of 0.50 and 0.49, respectively. The proposed method thus yields better
scores than the individual latent semantic analysis and Text rank algorithms.
It has been observed to work well for small documents. The future scope of
this work is to apply the model to large documents and extend it to perform
abstractive summarization of documents in the Telugu language.
References
1. Reddy PVP. A hybrid approach for text categorization with LSA. 5, pp 20181–20188. ISSN
1076-5131
2. Suleman RM, Korkontzelos I (2021) Extending latent semantic analysis to manage its syntactic
blindness. Expert Syst Appl 165:114130. https://doi.org/10.1016/j.eswa.2020.114130
3. Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic
analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research
and development in information retrieval, pp 19–25
4. Steinberger J, Jezek K (2017) Using latent semantic analysis in text summarization
5. Murray G, Renals S, Carletta J (2005) Extractive summarization of meeting recordings
6. Ozsoy MG, Cicekli I, Alpaslan FN (2010) Text summarization of Turkish texts using latent
semantic analysis. In: Proceedings of the 23rd international conference on computational
linguistics (Coling 2010) 2:869–876
7. Dokun O, Celebi E (2015) Single-document summarization using latent semantic analysis
1:1–13
8. Chowdhury SR (2017) An approach to generic Bengali text summarization using latent
semantic analysis. In: 2017 International conference on information technology, pp 11–16.
https://doi.org/10.1109/ICIT.2017.12
9. Kannada text summarization (2015), pp 1508–1512
10. Murty MR, Murthy JVR, Prasad Reddy PVGD, Satapathy SC (2012) A survey of cross-domain
text categorization techniques. In: 2012 1st international conference on recent advances in
information technology (RAIT-2012), pp 499–504. https://doi.org/10.1109/RAIT.2012.6194629
11. Murty MR, Murthy JV, Prasad Reddy PVGD (2011) Text document classification based on
least square support vector machines with singular value decomposition. Int J Comput Appl
27:21–26. https://doi.org/10.5120/3312-4540
12. Kumar A (2020) Fuzzy logic based hybrid model for automatic extractive text summarization,
pp 7–15
13. Mandal S, Singh GK (2020) LSA based text summarization. Int J Recent Technol Eng 9:150–
156. https://doi.org/10.35940/ijrte.b3288.079220
14. Hussein A, Joan A, Qiang L (2019) An efficient framework of utilizing the latent semantic
analysis in text extraction. Springer, US. https://doi.org/10.1007/s10772-019-09623-8
15. Vijay R, Vangara B, Vangara SP (2020) A hybrid model for summarizing text documents using
text rank algorithm and term frequency
A Combined Approach of Steganography
and Cryptography with Generative
Adversarial Networks: Survey
Abstract Secure transmission of data over public networks like the Internet,
that is, achieving authenticity, secrecy, and confidentiality, is now a primary
concern. These issues may be addressed by data hiding techniques.
Steganography, cryptography, and watermarking techniques are
used to hide data and ensure its security during transmission. The objective of this
submission is to analyze and examine several deep learning methods in image
cryptography and steganography. The hidden message is concealed via steganography,
whereas its format is altered through cryptography. Steganography and cryptography are both
essential and robust techniques. This paper's primary goal is to explore several ways
of integrating steganography with encryption to create a hybrid system. In addition, specific
differences between cryptographic and steganographic approaches are also given.
This paper aims to help other researchers survey current trends, problems, and
possible future directions in this area.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 187
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_18
188 K. Sandya and S. Kompella
to hide secret messages in digital media in such a manner that they cannot be detected.
Steganography is mainly designed to communicate secret messages securely using images.
Steganography does not modify the secret data; instead, it hides it inside an
image, video, or audio file so that it cannot be detected [2]. Messages are encrypted
using cryptography to keep them safe from unwanted access [3]. A steganographic
technique can be tracked, or the steganography system broken, as long
as the encoding method is understood.
The steganographic technique conceals the transmission of messages through digital
media; the communication between senders and receivers [4] is invisible.
Cryptography protects the integrity of information such that it cannot
be decoded by anybody other than the sender and receiver. Data integrity, entity
authenticity, and data authenticity are elements of information security connected to
cryptography, a mathematical discipline [5].
2 Background Works
This section explores data hiding techniques used primarily in industrial and military
applications, where the data must be maximally secure. Cryptography alters the
secret data; steganography conceals the data's existence; and watermarking marks
the ownership of the data.
Hybrid Methods
In [16], the authors proposed a cloud architecture that allows safe data transfer
from the client's organization to the cloud service provider's (CSP) servers. The data
are sent over the network using a hybrid technique that combines cryptography and
steganography.
In [17], the authors reviewed current steganographic methodologies
against video copying, forgery, and unauthorized access, using LSB techniques,
RSA algorithmic methods, and DNA steganographic methods. Existing concealing
techniques have drawbacks such as increased key size, higher computation cost,
decreased performance, and larger input sizes. Compared to traditional cryptography
methods, the proposed HECC-based DNA steganography improves the encryption
and decryption processing times by 30% and 42%, respectively.
In [18], the emphasis was on security, speed, and load. The suggested approach
begins by taking two inputs: medical image data and medical report data. The
suggested technique justifies its performance on example images with an average
PSNR of 55–70 dB, an MAE of 0.2–0.7%, and an average correlation coefficient of
1 (SSIM/correlation coefficient).
190 K. Sandya and S. Kompella
3 Advanced Techniques
Fig. 2 Flowchart of the framework using learning and transferring representations [20]
A Combined Approach of Steganography … 191
Experimentation with both gray scale and color images may be performed for the
cryptography, steganography, and compression methods, with a detailed discussion
of the reasons behind the improved performance of the recommended approaches.
Quantitative measures were generated and assessed to emphasize each method's
characteristics. A significant effort has been made to establish adequate quantitative
metrics for security algorithms that can evaluate and compare different conventional
security algorithms. Experiments on natural (real-world) images and benchmarks
were performed to assess the algorithms objectively.
Quantitative Metrics
Four quantitative measures are utilized to evaluate the performance of the new
techniques presented in the study: peak signal-to-noise ratio (PSNR), mean square
error (MSE), number of pixel change rate (NPCR), and unified averaged changed
intensity (UACI). PSNR is utilized for image quality, MSE for image distortion
measurement, and NPCR and UACI for the evaluation of image encryption
techniques.
Peak Signal-to-Noise Ratio (PSNR): When the computed PSNR value is high, the
performance of the proposed system is high. PSNR is based on the peak pixel value
of the received image rather than on individual pixels, and noise is eliminated during
the image recovery process. For secret images, it quantifies the quality of the
decoded image.
PSNR = 20 log10 ( MAX_f / √MSE )

where MAX_f is the maximum possible pixel value of the decoded image and MSE is the mean square error.
Mean Square Error (MSE): When the estimated value of MSE is low, the performance
of the suggested system is high. A low MSE indicates that the proposed
technique is suitable for multivariate images and that recovery noise is eliminated
throughout image processing.
MSE = (1 / (m · n)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} ‖ f(i, j) − g(i, j) ‖²
where
'm' represents the secret image width,
'n' represents the secret image height,
'f(i, j)' represents the original secret binary image, and
'g(i, j)' represents the decoded secret binary image.
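The two formulas above can be sketched as follows, assuming 8-bit images with a peak value of 255; the toy images are illustrative.

```python
import numpy as np

def mse(f, g):
    """Mean square error between original f and decoded g (same shape)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return np.mean((f - g) ** 2)

def psnr(f, g, max_f=255.0):
    """Peak signal-to-noise ratio in dB; max_f is the peak pixel value."""
    e = mse(f, g)
    if e == 0:
        return float("inf")   # identical images: distortion-free
    return 20 * np.log10(max_f / np.sqrt(e))

img = np.full((4, 4), 100.0)
noisy = img + 5.0            # constant error of 5 -> MSE = 25
print(mse(img, noisy))       # → 25.0
print(round(psnr(img, noisy), 2))
```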
NPCR and UACI: Decoded image quality is assessed through the number of pixel
change rate (NPCR) and the unified average changing intensity (UACI).
NPCR may be specified as

NPCR = (1 / (X · Y)) Σ_{i,j} K(i, j) × 100%

where k1(i, j) is the input image, k2(i, j) is the decoded image, X and Y are the
image width and height, respectively, and

K(i, j) = 1 if k1(i, j) ≠ k2(i, j), and 0 otherwise.

UACI = (1 / (X · Y)) Σ_{i,j} |k1(i, j) − k2(i, j)| / 255 × 100%
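A sketch of NPCR and UACI for 8-bit images, following the common convention that NPCR counts changed pixels and both measures are reported as percentages; the toy arrays are illustrative.

```python
import numpy as np

def npcr(k1, k2):
    """Percentage of pixels that differ between the two images."""
    k1, k2 = np.asarray(k1), np.asarray(k2)
    return np.mean(k1 != k2) * 100.0

def uaci(k1, k2):
    """Unified averaged changed intensity for 8-bit images."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    return np.mean(np.abs(k1 - k2) / 255.0) * 100.0

a = np.array([[0, 255], [128, 128]])
b = np.array([[0, 0], [128, 255]])
print(npcr(a, b))   # two of four pixels changed → 50.0
print(uaci(a, b))
```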
5 Conclusion
K. Padma Vasavi
Abstract In India, around 4.61 lakh road accidents happened in 2017, out of which
1.49 lakh led to fatalities. It is estimated that Andhra Pradesh alone accounts for
7416 of these deaths. Among the total accidents, about 55,336 deaths are from
two-wheeler crashes alone, which indicates the alarming scenario of road accidents
in India. Many lives could be saved in such conditions if the accident vehicle is
detected and the information regarding the incident is sent to the right people at the
right time. This situation motivated us to take up this research, which detects road
accidents using a computer vision system built around a Raspberry Pi and notifies
the registered mobile numbers through IoT. The vehicle accident detection system
(VADS) is built around a Raspberry Pi interfaced with a Web camera. The camera
may be fixed in places such as four-road junctions, T-junctions, and other important
locations where the probability of accident occurrence is high. The camera
continuously captures the scene under consideration and gives the input to the
processor. A convolutional neural network architecture is designed and implemented
to classify the severity of the accident into one of three categories: good, moderate,
and worst. When the test image in the scene is classified as the "moderate" class or
the "worst" class, the system identifies the situation as serious and immediately
triggers an event in the Ubidots cloud through the Wi-Fi interfaced with the
controller. On receiving the event from the processor, the cloud immediately delivers
a text message to registered mobile numbers, such as an ambulance service or a
police control room, to ensure the immediate help that can save the life of the victim.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 197
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_19
198 K. Padma Vasavi
1 Introduction
According to a report by the World Health Organization on road safety, India
accounts for more than 11% of the total number of road accidents with only 1% of the
world's vehicles. In many cases, human lives are lost in road accidents due to delays
in emergency medical assistance [1]. According to the golden hour principle [2], there
is a high probability that timely medical and surgical aid can avoid death during the
golden hour, the period immediately after a traumatic injury. A decrease in the
response time of emergency medical care can reduce the probability of death by
one-third on average. The percentage of people who die before reaching the hospital
in low- and middle-income countries is more than twice that in high-income countries
[3]. The rest of the paper is organized as follows: the literature related to the detection
of vehicle accidents is reviewed in Sect. 2; Sect. 3 gives the methods
used in the proposed vehicle accident detection system; Sect. 4 provides the results
and discussion of the proposed system. Finally, Sect. 5 concludes the paper.
2 Literature Review
of accident occurrence is high. The camera continuously captures the scene under
consideration and gives the input to the processor. A deep learning algorithm with
a convolutional neural network architecture is designed and implemented to classify
the severity of the accident into one of three categories: good, moderate, and
worst. When the test image in the scene is classified as the "moderate" class or the
"worst" class, the system identifies the situation as serious and immediately
triggers an event in the cloud through the Wi-Fi interfaced with the controller. On
receiving the event from the processor, the cloud immediately delivers a text
message to registered mobile numbers, such as an ambulance service or a police
control room, to ensure the immediate help that can save the life of the victim.
3 Methods
is done on the database to resize the images to 110 × 110 × 3, to suit the size of
the input layer of the proposed CNN. The images in the database are also subjected
to translation and rotation to improve the classification accuracy of the proposed
system. A zero-center normalization is also performed on the database so that
training completes at a quicker pace. The convolution layer in the network uses
three 3 × 3 filters with a stride of 1 to extract the features of each class of vehicles.
The ReLU layer computes the activations of the inputs. The Maxpool layer reduces
the dimensions of the features calculated by the convolution layer, using a 3 × 3
filter with a stride of 1. The convolution layer, the ReLU layer, and the Maxpool
layer are repeated three times to calculate the features of the vehicles from
macro-level to micro-level. Finally, a fully connected layer flattens the features to
one dimension and categorizes the vehicles into one of the three classes.
So far, the implementation details of the proposed architecture for vehicle accident
classification have been presented. The next section discusses the results of
implementing the proposed system using MATLAB simulations as well as a
real-time implementation on a Raspberry Pi processor.
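The spatial dimensions through the three conv/pool stages can be traced with the standard output-size formula. This sketch assumes no padding, which the text does not specify; with "same" padding, the spatial size would stay at 110 throughout.

```python
def out_size(n, k, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * pad - k) // stride + 1

# Three repetitions of (3x3 conv, stride 1) + (3x3 maxpool, stride 1),
# starting from the 110x110x3 input described above, assuming no padding.
size = 110
for stage in range(3):
    size = out_size(size, 3)   # convolution: shrinks by 2
    size = out_size(size, 3)   # max pooling: shrinks by 2
print(size)  # → 98
```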
Initially, the dataset is presented to the convolutional network for the purpose of
accident classification. A random set of images from the dataset chosen for vehicle
accident detection is shown in Fig. 4.
The proposed architecture was trained on an Nvidia (TM) GeForce GTX GPU
with 16 GB of memory; training the network took approximately two minutes. The
network was trained using stochastic gradient descent with a learning rate of 0.001.
Of the six hundred images in the database, 90% of the images
are chosen for training the neural network, and the remaining 10% are used for testing
it. The training accuracy obtained was 100% after eight epochs, with each epoch
running for 1500 iterations. The details of training the neural network are shown in
Fig. 5.
After training, the neural network is tested with the images in the remaining 10%
of the database, and the results are shown in Fig. 6. In all the given instances, the
neural network could correctly label the category of the vehicle, with a validation
accuracy of 94.7%. So far, the MATLAB simulation results of the neural network
have been presented. However, to implement the system in real time, we deployed
it on a Raspberry Pi processor. The equivalent C code required for the Raspberry Pi
is generated using the MATLAB Coder support package for the Raspberry Pi
processor. The real-time implementation was done by creating a simulated
environment in front of our institution's main gate, deliberately keeping a
moderately damaged vehicle on the roadside. The camera interfaced with the
Raspberry Pi processor continuously captured the scene and divided it into frames,
and these frames were given as test images to the neural network deployed on the
processor. The convolutional neural network could label the vehicles in good
condition as "good" and the vehicle we deliberately kept on the roadside as
"moderate," as shown in Fig. 7. When the neural network identified a vehicle with
moderate damage, it triggered an event in the cloud remotely connected to the
Raspberry Pi processor through the Wi-Fi module built into the processor.
The Raspberry Pi is connected to the Ubidots cloud through a Wi-Fi connection.
When the neural network identifies any vehicle with moderate to worst damage, it
triggers two events on the cloud: one event sends a short message to a nearby
hospital with an ambulance, and the other notifies the nearby police command and
control room about the incident, as shown in Fig. 8.
Comparisons
The performance of the proposed method is compared with other popular deep
learning architectures, Alex Net and Squeeze Net. Alex Net is chosen for comparison
because of its efficiency in terms of classification accuracy; Squeeze Net is chosen
because of its low computational complexity and small execution time. All three
architectures are compared in terms of classification accuracy and execution time,
and the comparison results are given in Table 1. From Table 1, it is observed that
the proposed method executes at a faster pace than Alex Net without losing much
classification accuracy. The faster computational speed helps save the victim by
preserving the golden hour.
Table 1 Performance evaluation of proposed method

Architecture | Classification accuracy (%) | Execution time (min)
Alex Net | 97 | 8
Squeeze Net | 96.8 | 6
Proposed CNN | 94.7 | 2
5 Conclusion
A dataset of 600 images containing good, moderate, and worst vehicles, with
200 images in each category, is preprocessed using bilateral filtering and adaptive
Real-Time Accident Detection … 205
Gaussian thresholding. A CNN architecture is designed and built with three
convolutional and Maxpooling layers and one fully connected layer, and trained
on data from which 6,315,843 features were extracted, with 100% training accuracy.
The testing accuracy obtained is 94.7%. The vehicle accident system is implemented
on a Raspberry Pi to detect the accident vehicle in a real-world scenario and transmit
the information to hospitals via the built-in Wi-Fi of the processor. The message
from the R-Pi is received on the registered user's mobile phone from the Ubidots cloud.
Acknowledgements The author would like to express her fond appreciation to Md. Imamunnisa
and her team for their active participation in the execution of the project and their enthusiasm in
its real-time implementation. The author would also like to express her heartfelt gratitude and
sincere thanks to the management of Shri Vishnu Engineering College for Women for their support
and encouragement in completing this research.
Abstract In this paper, the authors present an analysis of the sensitivity response
of a tin oxide-based Cu-doped thick-film gas sensor using the neural computing method
of ANN simulation, which enables us to predict the response to wood alcohol at a
temperature of 350 °C. The device's sensitivity has been studied at entirely
different Cu-doping concentrations, including no doping. Furthermore,
the minimum and maximum sensitivity at a particular temperature, 350 °C, has
been analyzed upon exposure to methanol. A unique approach has been adopted to
measure the sensitivity of the Cu-doped SnO2 thick-film gas device with applied
ANN algorithms for three distinct network functions. The algorithmic training
rule of feed-forward networks, namely gradient descent backpropagation
with adaptive learning rate (TRAINGD), was used. The performance of ANN
models with different algorithms is evaluated for the responsiveness of the
device with different network transfer functions. By experimentation, we
find that the ANN model with this algorithmic training rule is appropriate for the
sensitivity device. The results presented in the paper show that ANN is
an efficient tool for designing SnO2-based thick-film gas
sensor devices.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 207
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_20
208 A. Gupta et al.
to meet the demand for low-level gas detection [1]. Furthermore, thick-film gas
sensors ought to be cost-effective and reliable over the long term [2]. Metal-oxide-semiconductor
(MOS) sensors based on electron conduction have been widely used
as a widespread means of gas detection. The sensing characteristics of SnO2
prepared by the gel-combustion method were reported by Neri et al. (2006) [3]. Mendoza
et al. (2014) demonstrated chemical sensors based on SnO2-CNT films using the HF-CVD
technique [4]. Owing to its low cost, durability, and reusability, SnO2 is the
most widely used material among semiconductor oxides for fabricating sensors [5,
6]. Furthermore, owing to its high sensitivity, subtle design, light weight, low cost,
small particle size, and dopant properties, the thick-film SnO2 gas sensor device is
most suited and competent [7].
Conversely, SnO2 exhibits n-type behavior due to non-stoichiometry caused by
oxygen vacancies. As an n-type semiconductor, SnO2 has a
forbidden bandgap of 3.6 eV. In addition, each anion in the unit cell is attached to
the cations in a planar-trigonal conformation, so that the p orbitals of oxygen contain
the four-atom plane [8, 9].
The thick-film gas sensor is conductive due to the non-stoichiometric compositions
resulting from oxygen deficiency [10]. The sensing property of the thick-film
gas sensor is that adsorption of the gas on the particles at its surface produces
changes in its conductivity [11].
Freshly prepared SnO2 particles adsorb oxygen atoms on the surface when
exposed to air [12]. Every SnO2 particle is shielded by negatively charged ions
on the surface, while positive charges are deposited just below the particle surface
after atoms donate electrons, creating a depletion layer. When
the sensor is exposed to reducing gases at higher temperatures, the adsorbed oxygen
species react and release the electrons back toward the conduction band [13].
Consequently, the space-charge region shrinks, resulting in a
decline of the potential barrier height for conduction at the grain boundaries.
ANN analysis appeared approximately fifty years ago; however, it has been applied
to hands-on problems only for the past 20 years [14].
An ANN is a collection of small, distinctly interconnected processing units.
Information is passed between these units along forward interconnections. An incoming
connection has two values associated with it: one is the input value, and the other is
the weight [15]. The output is a function of the estimated amount. An ANN is trained
on predefined input data and is then ready for prediction or classification. ANNs can
self-learn to distinguish patterns in real data. An ANN can handle many inputs and
provide a suitable selection for designers [16].
Feed-forward ANNs allow signals to travel in one direction only, from input to output.
There are no feedback loops, i.e., the outcome of any layer does not influence that same
layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with
outputs [17]. They are widely used in pattern identification. This type of network
is also referred to as bottom-up or top-down. The single-layer network is
the simplest form of a layered system, with only a single input layer that
connects directly to the output layer [18].
Design of Cu-Doped SnO2 Thick-Film Gas Sensor … 209
The perceptron is the simplest form of artificial neural network, used to classify patterns that are linearly separable. Linearly separable patterns lie on opposite sides of a hyperplane. The model consists of a single neuron with adjustable synaptic weights and a bias. A single-neuron perceptron is limited to pattern classification with only two classes. For classification with more than two classes, the output layer of the perceptron can include more than one neuron.
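As an illustration, a minimal single-neuron perceptron for a two-class, linearly separable problem can be sketched as follows (the toy data, learning rate, and epoch count are invented for the example, not taken from the paper):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Single-neuron perceptron for labels in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            err = target - pred        # 0 when correct, otherwise +/-1
            w += lr * err * xi         # adjust synaptic weights
            b += lr * err              # adjust bias
    return w, b

# Linearly separable toy problem (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if x @ w + b > 0 else 0 for x in X]
```

Because the classes are separable, the perceptron converges to a hyperplane that classifies all four points correctly.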
With multilayer feed-forward ANNs, hidden layers are available between the input and output layers. In feedback or recurrent ANNs, there are linkages from later layers back to earlier layers of neurons; in this type of neural network there is at least one feedback loop, and the activation of the network's hidden neuron units on the output data is fed back into the network as input. This work presents the recognition of sensitivity in a Cu-doped SnO2 sensor using a feed-forward network that can be used to recognize the pattern. The feed-forward network utilizes the Gaussian activation function, whose importance is that it is non-negative for all values of x.
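A sketch of such a feed-forward pass with a Gaussian activation (the layer sizes and random weights are purely illustrative, not the paper's trained network):

```python
import numpy as np

def gaussian(x):
    """Gaussian activation exp(-x^2): non-negative for all values of x."""
    return np.exp(-x ** 2)

def feed_forward(x, W_hidden, W_out):
    """One forward pass: signals travel from input to output only."""
    h = gaussian(W_hidden @ x)   # hidden layer with Gaussian activation
    return W_out @ h             # linear output layer

rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(10, 3))   # 10 hidden neurons, 3 inputs
W_out = rng.normal(size=(1, 10))
y_out = feed_forward(np.array([0.5, -1.2, 2.0]), W_hidden, W_out)
```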
2 Proposed Experiment
Methanol gas is highly toxic and unsafe for living beings; severe exposure can produce instant bronchial contraction, narrowing of the airways, high pulmonary resistance, and increased airway reactivity in experimental animals. Critical exposures of experimental animals have also produced changes in metabolism and irritation of the mucous membranes in the eyes. The calibration of the heater element was carried out in air ambient. The temperature variation of the substrate containing the heater, with external electrical power supplied, was recorded using a thermistor. The toxic gases and liquids are injected into the chamber by a needle from the top of the chamber. The base of the chamber is insulated and isolated by a cotton bed or sheet. The chamber allows measurement of Ra, the resistance in air, and Rg, the resistance in the test gas or liquid. To perform the experiment, a series of different concentrations of liquids and gases is necessary. The sensor resistance started falling immediately, owing to the semiconducting nature of the sensor; this decrease in resistance is exponential. It was followed by an increase in the resistance of the sensor due to the adsorption of oxygen molecules on the sensor surface. After some time, the sensor resistance stabilized at the resistance value in clean air for that temperature. At this moment, one ml of the test gas was introduced into the enclosed chamber, and the resistance was noted down. The gas concentration was increased by injecting more gas into the chamber ml by ml, and the corresponding sensor reading was noted down. The concentration is measured in parts per million (ppm); for liquids, 1 ml equals 100 ppm, and for gases, 1 ml equals 250 ppm. To measure the sensitivity of the SnO2-based 1% Cu-doped thick-film gas sensor, the resistance of the thick-film gas sensor in air (Ra) is measured with a digital multimeter (DMM). Secondly, the resistance in the sample gas (Rg) is measured with the same DMM.
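The paper does not state its sensitivity formula explicitly; the sketch below assumes the common definition S = (Ra − Rg)/Ra × 100%, together with the ml-to-ppm conversion given above (the resistance readings are hypothetical):

```python
def sensitivity(ra_ohms, rg_ohms):
    """Assumed definition: S = (Ra - Rg) / Ra * 100 (%)."""
    return (ra_ohms - rg_ohms) / ra_ohms * 100.0

def ppm(ml_injected, is_liquid):
    """1 ml = 100 ppm for liquids, 250 ppm for gases (as stated above)."""
    return ml_injected * (100 if is_liquid else 250)

s = sensitivity(ra_ohms=1.0e6, rg_ohms=0.775e6)   # hypothetical DMM readings
```

With these hypothetical readings the sensitivity comes out to 22.5%, of the same order as the values reported for the sensor at 150 °C.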
210 A. Gupta et al.
Fig. 1 Alumina substrate
The experimental data were first extrapolated with the Matlab tool, and 10 extrapolated data points were obtained for the different concentrations of methanol at 150 and 350 °C. Of these, the first six points were used for training and the remaining four for confirmation. The confirmation set was used to stop the training when the neural network began to overfit the data. The test dataset was not used during the model validation of the neural network. A multilayer perceptron feed-forward ANN was designed, trained, and tested using the gradient descent backpropagation (GDB) algorithm. LEARNGDM was used as the adaptation learning function, and the mean square error (MSE) was used as the performance function and training goal to estimate the network efficiency: the smaller the MSE, the better the network's performance and accuracy. For the actual data, tansig, logsig, and purelin were used as transfer functions for all the neurons, respectively, at each iteration over the input and output data, after building the neural network in MATLAB, setting the network type and parameters, and training for 1000 iterations with ten hidden neurons. When the sensitivity was tested with the Matlab neural network tool using the Levenberg–Marquardt feed-forward propagation algorithm, the maximum sensitivity with the tansig network transfer function was 22.52% at 150 °C, compared to the various network transfer functions in Fig. 2.
While analyzing the sensitivity of the sensor with the function extracted from the Matlab neural network tool, it was observed that with the purelin transfer function of the gradient descent backpropagation with adaptive learning rate algorithm, the sensitivity reached a maximum of 79.28% at 350 °C, compared to the other network transfer functions in Fig. 3a. The gradient descent backpropagation (GDB) with adaptive learning rate algorithm yielded a regression parameter of 0.9899 for the training data and 0.98305 for the output target of the training data, as shown in Fig. 3b.
Fig. 4 Results of regression: a logsig transfer function, b purelin network transfer function
The gradient descent backpropagation with adaptive learning rate algorithm yielded a regression factor of 0.97678 for the training data and 0.96097 for the output target of the training data, as shown in Fig. 4a. The same GDB propagation with adaptive learning rate algorithm yielded a regression parameter of 0.9996 for the training data and 0.9723 for the output target data, as shown in Fig. 4b.
LEARNGDM is used as the adaptation learning function, and the mean square error is used as the performance function at 350 °C. The Levenberg–Marquardt feed-forward propagation algorithm yields a regression parameter of 0.99922 for the training data and 0.98369 for the output target data, as demonstrated in Fig. 5. The same algorithm, with a regression parameter of 0.99298 for the training data and 0.99004 for the output target data, is shown in Fig. 6a. Its regression parameter of 1.000 for the training data and 0.99996 for the output target data is shown in Fig. 6b.
4 Conclusion
The maximum sensitivity recorded for the 1% Cu-doped SnO2-based thick-film gas sensor was 22.52% at 150 °C. The maximum sensitivity for methanol was also tested with the Matlab neural network tool. With the gradient descent backpropagation with adaptive learning rate algorithm and the logsig network transfer function, the regression parameter was established to be 0.9830 at 150 °C in Fig. 4.
Among the three transfer-function networks, logsig is the most suitable function, as the maximum validation performance is already reached at an early epoch. The gradient descent backpropagation with adaptive learning rate network function was found to have the lowest error with the logsig transfer function network. The gradient descent backpropagation with adaptive learning rate algorithm is the substantially better approach compared to the Levenberg–Marquardt feed-forward propagation algorithm. The maximum sensitivity found for the 1% Cu-doped SnO2-based thick-film gas sensor was 79.33% at 350 °C.
The prediction capability of the 1% Cu-doped SnO2-based thick-film gas sensor was checked with the tansig transfer function for methanol at 350 °C.
References
17. Dargar SK, Srivastava VM (2019) Design and analysis of IGZO thin film transistor for
AMOLED pixel circuit using double-gate tri active layer channel. Heliyon 5(4):e01452
18. Gupta A, Kumar VR (2020) Machine learning technology using thick film gas sensor toxic
liquid detection for industrial IOT application. In: 2020 IEEE international conference on
electronics, computing and communication technologies (CONECCT)
Detect Traffic Lane Image Using
Geospatial LiDAR Data Point Clouds
with Machine Learning Analysis
1 Introduction
Climate change will bring a slew of new dangers to the earth. Floods and other hydrological hazards may alter the global map geographically. Changes in land use and land cover are also among the most difficult challenges in maintaining a geographical area. The harmful effects include surface runoff and altered surface characteristics. Natural disasters can do significant harm to human life. Taking
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 217
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_21
218 M. Shanmuga Sundari et al.
into account and preserving the overall design of geological space is critical. Many researchers are interested in exploring the connections between land use and land cover and any changes in the landscape. These will have an impact on traffic restrictions as well. The diverse characteristics of hydrological hazards will impact the forest's built-up space extensions.
Afforestation drives land changes owing to the high frequency of changes in land cover. A range of study fields is related to geospatial analysis. All natural climatic information and alterations are managed using geoinformation [1], which refers to geographic information systems (GISs). The escalation of hydrological risks sets the stage for spatial resolutions and land-cover changes. Satellite images were obtained and used for enhancement. The GIS's position is specified using the statistics used to generate vector or raster inputs.
The geospatial images [2] will show the changes in the road map, such as traffic
rerouting. When there is an irregular traffic flow, our dynamic technique will regulate
it. The tremendous intensity of traffic flow constantly disrupts the usual routine.
Machine learning technologies are applied to improve the accuracy of traffic forecasts
and flow control.
2 Literature Survey
Nowadays, various tools and technologies available in the geospatial domain are leading the industry to the next level. These research tools mostly employ machine learning techniques using algorithms such as logistic regression (LR), support vector machines (SVMs), and stochastic gradient descent (SGD). Many studies have been conducted on the early prediction of events and on supporting decisions [3]. Labs have developed an LR-based model for real-time heart prediction based on machine learning analysis. The SVM algorithm is used to calculate the dependency between attributes and analyze the disease; it predicts acute cardiac effects in [4] and diastolic and systolic blood pressure [6]. In addition to carrying out contour recognition for every signal, a linear regression model is applied, based primarily on a raster image [5]. It uses earlier records to remove ground and building points.
The technique for detecting highly elongated objects is based on the top or border of the street in the MLS point cloud. It clusters these elongated objects into traffic-signal and light-pole classes [6]. Big data analysis is ongoing in geospatial fields, for example to find predictions of traffic flow [7]. The LiDAR technique is one of the efficient techniques that help with image visualization [8]. Much geospatial research is ongoing, such as time-series segmentation [9] and image reproduction of the original image.
Thanh Ha and Chaisomphob [10] propose utilizing principal component analysis (PCA) to find planar MLS data. Their paper classifies poles into different objects such as utility poles, lamps, and street signs. CNN features give the best performance by providing visual data and image representation. Many algorithms are used in
geospatial methods, but getting the data and images is the most challenging part of this area and a major limitation of this research. This research helps developing countries regulate traffic in necessary places, and it will help to improve personal safety in the country.
3 Proposed System
Figure 1 represents the proposed system of our research. Geospatial analysis is carried out using weighted regression. Preprocessing of the image passes through many stages to reach the regression; correction and classification steps help to obtain a refined image suitable for regression. This research differs from previous research in its matrix prediction using the Markov matrix and in finding the geometric points. This method is useful for discovering the interception points in geospatial image recognition.
The Markov matrix is calculated using the formula below and is used to find land-use changes. The transitions were obtained in ArcGIS [11] using different datasets with the help of the Raster Calculator tool. The obtained value is quantified and categorized using Eq. (1), called TRDSDLUI, which is derived from land-use/land-cover change, namely the synthetic dynamic land-use index. LU is the land-usage value from the source to the destination LiDAR points.
TRDSDLUI = [ Σ_{i=1..n} |ΔLU_{i−j}| / (2 Σ_i LU_i) ] × 100 (%)   (1)

MFI = Σ_{i=1..12} P_i^2 / P   (2)
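Eqs. (1) and (2) can be evaluated directly; in the sketch below the land-use class areas and monthly values are hypothetical, not the paper's data:

```python
import numpy as np

def trdsdlui(lu_before, lu_after):
    """Eq. (1): sum of |dLU| over classes, over twice the total LU, in %."""
    delta = np.abs(np.asarray(lu_after) - np.asarray(lu_before))
    return float(delta.sum() / (2 * np.asarray(lu_before).sum()) * 100.0)

def mfi(monthly_p):
    """Eq. (2): sum over 12 monthly values of P_i^2, divided by total P."""
    p = np.asarray(monthly_p, dtype=float)
    return float((p ** 2).sum() / p.sum())

lu_2010 = [40.0, 30.0, 20.0, 10.0]   # hypothetical land-use class areas
lu_2020 = [35.0, 28.0, 24.0, 13.0]
idx = trdsdlui(lu_2010, lu_2020)
```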
LiDAR points provide the numerous geometric points of the objects observed in the image; this has nothing to do with structural or other data. Machine learning and deep learning are applied to forecast geospatial hazards from irregular or incomplete data. Because of their ability to predict items in images, LiDAR points are extremely valuable for geospatial accounting.
Figure 2 explains the architecture of the sensing concept using the neural network
concept. The given inputs are considered as the attributes which will carry the inputs
in different layers and transform the output attribute.
The traffic locations and other land-cover details are identified and used to train the
MLP model. The classes/categories were calculated with the frequency ratio values
calculated using Eq. (3).
FR = [ N(FX_i) / Σ_{i=1..m} N(FX_i) ] / [ N(X_j) / Σ_{j=1..n} N(X_j) ]   (3)
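Eq. (3)'s frequency ratio can be computed per class as follows (the event and class counts are invented for illustration):

```python
import numpy as np

def frequency_ratio(event_counts, class_counts):
    """Eq. (3): a class's share of events, N(FX_i) / sum N(FX_i),
    divided by its share of all cells, N(X_j) / sum N(X_j)."""
    ev = np.asarray(event_counts, dtype=float)
    cl = np.asarray(class_counts, dtype=float)
    return (ev / ev.sum()) / (cl / cl.sum())

fr = frequency_ratio([30, 10, 10], [100, 100, 50])
```

A ratio above 1 means the class hosts more events than its areal share would suggest.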
4 Interpretation Concept
The processing of LiDAR point clouds using machine learning is based on the concept of interpretation. LiDAR points are useful for extracting features and providing applications or services such as the search area in a geospatial image. ML will train on the features and create a model. The evaluation of outcomes is triggered by the analysis processes.
Figure 3 presents a comparison of LiDAR point cloud interpretation to the work-
flow: (a) workflow and LiDAR point cloud (b) semantic workflow and raw data and
features in training data. The following functions will be met by the machine learning
engine:
• LiDAR point classification: Labels are computed and applied as per-point properties, together with the likelihood for this category assignment, according to established point categories.
• Cloud segmentation: Segmenting LiDAR point clouds as a key process helps to reduce fragmentation and subdivide big point clouds.
Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1-measure = 2 × (Precision × Recall) / (Precision + Recall)

Quality = True Positives / (True Positives + False Positives + False Negatives)
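These four measures follow directly from the confusion counts; the counts below are hypothetical, but chosen so that the output reproduces the reported 82.35% precision and 100% recall:

```python
def metrics(tp, fp, fn):
    """Precision, recall, F1, and quality from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    quality = tp / (tp + fp + fn)
    return precision, recall, f1, quality

# Hypothetical counts: 14 true positives, 3 false positives, 0 false negatives
p, r, f1, q = metrics(tp=14, fp=3, fn=0)
```

With these counts the values come out to 82.35% precision, 100% recall, about 90.3% F1, and 82.35% quality.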
Using the LiDAR technique on the image achieved a precision value of 82.35%, a recall value of 100%, an F1 value of 90%, and an overall quality of 82%. This performance is attributed to the use of optimization with equality constraints to more accurately classify a pixel as road or non-road, the use of deep features to more accurately represent the visual data of candidates, and the use of the sparse classifier optimization model to more accurately classify each candidate as a traffic sign or non-traffic sign.
The precision value for this traffic lane is 82.35%, and the quality of the image recognized by LiDAR is 82.35%, as shown in Fig. 6. This can depend on the location captured and differ according to the coordinates.
Accuracy is measured using the geospatial image coordination and the flow of each vehicle using the interception concept. Thus, the traffic tracking process [12] is successful using geospatial techniques.
In this research, we used airborne geo-color-referenced images, including noisy data in the traffic signs. The steps of this research are (1) road extraction, (2) traffic sign candidate detection, and (3) traffic sign classification. Drone usage is also critical for capturing LiDAR data, because this technique can cover small areas, even positions inaccessible to topographic equipment. It is recommended for use in areas with low altitude differences, where traffic can be tracked at an average altitude.
Local deep features are integrated with the sparse representation to optimize the
traffic sign candidates with different color images with all coordinate projections.
The proposed system shows the qualitative and quantitative effectiveness of the
traffic lane. The outcome showcased in our project is accurate detection in the traffic
lane and reduction of collisions using the optimization model for classification. In
the future, we plan to enhance our research in remote aerial vehicle technology for
high-resolution aerial images with LiDAR data.
References
1. Huang X, Gong J, Chen P, Tian Y, Hu X (2021) Towards the adaptability of coastal resilience:
Vulnerability analysis of underground gas pipeline system after hurricanes using LiDAR data.
Ocean Coast Manage 209:105694
2. Johnson KM, Ouimet WB (2021) Reconstructing historical forest cover and land use dynamics
in the northeastern United States using geospatial analysis and airborne LiDAR. Ann Am Assoc
Geogr 111(6):1656–1678
3. Padmaja B, Prasad VVR, Sunitha KVN, Reddy NCS, Anil CH (2019) Detectstress: a novel
stress detection system based on smartphone and wireless physical activity tracker. Adv Intell
Syst Comput 815. https://doi.org/10.1007/978-981-13-1580-0_7
4. Lakshmi L, Purushotham Reddy M, Praveen A, Suniha KVN (2020) Identification of diabetes
with recursive partitioning algorithm using machine learning. Int J Emerg. Technol 11(3)
5. Nelson JR, Grubesic TH (2020) The use of LiDAR versus unmanned aerial systems (UAS) to
assess rooftop solar energy potential. Sustain Cities Soc 61:102353
6. Ureta JC, Zurqani HA, Post CJ, Ureta J, Motallebi M (2020) Application of nonhydraulic
delineation method of flood Hazard areas using LiDAR-based data. Geosciences 10(9):338
7. Malik R, Nishi M (2021) Flexible big data approach for geospatial analysis. J Ambient Intell
Humaniz Comput 1–20
8. Lyu F, Xu Z, Ma X, Wang S, Li Z, Wang S (2021) A vector-based method for drainage network
analysis based on LiDAR data. Comput Geosci 156:104892
9. Anders K, Winiwarter L, Mara H, Lindenbergh R, Vos SE, Höfle B (2021) Fully automatic
spatiotemporal segmentation of 3D LiDAR time series for the extraction of natural surface
changes. ISPRS J Photogramm Remote Sens 173:297–308
10. Thanh Ha T, Chaisomphob T (2020) Automated localization and classification of expressway
pole-like road facilities from mobile laser scanning data. Adv Civ Eng 2020
11. Ahmed C, Mohammed A, Saboonchi A (2020) ArcGIS mapping, characterisations and
modelling the physical and mechanical properties of the Sulaimani City soils, Kurdistan Region,
Iraq. Geomech Geoengin 1–14
12. Sundari MS, Nayak RK (2021) Efficient tracing and detection of activity deviation in event log
using ProM in Health Care Industry. In: 2021 Fifth international conference on I-SMAC (IoT
in social, mobile, analytics and cloud)(I-SMAC), pp 1238–1245
Classification of High-Dimensionality
Data Using Machine Learning
Techniques
1 Introduction
Several research works have been done over the past two decades. One recognition technique is handwritten digit recognition: the process of converting handwritten text from an image or a scanned file into editable text. It is not possible to identify all digits correctly, even for humans. Here, we implement dimensionality reduction together with machine learning classification.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 227
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_22
228 D. Padmaja Usharani et al.
2 Related Work
Combining DR strategies can give better results than using only one approach, according to de Paula Rodrigues et al. [11]. Mardani et al. [12] worked on the World Development Indicator (WDI) dataset, with SVD used for reduction. Zebari et al. [13] applied feature selection as well as feature extraction methods and found that the high dimensionality of data has a direct impact on the learning algorithm, computational time, computer resources (memory), and model accuracy. Ramakrishna Murty et al. [14] suggested dimensionality reduction of large text data by least-squares SVM along with singular value decomposition, clustering the dimensionality-reduced text data with prediction of the optimal number of clusters. Saleem and Chishti [15] suggested a lightweight CNN model on the MNIST dataset based on execution time.
P(c|x) = P(x|c) × P(c) / P(x)   (1)
• P(c|x): The posterior probability of class (c, target) given predictor (x, attributes).
• P(c): The prior probability of class.
• P(x|c): The likelihood which is the probability of predictor given class.
• P(x): The prior probability of predictor.
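Eq. (1) can be applied directly; in the sketch below the probabilities are invented for illustration:

```python
def posterior(likelihood, prior, evidence):
    """Bayes' rule, Eq. (1): P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood * prior / evidence

def classify(likelihood_by_class, priors):
    """Naive Bayes decision: pick the class with the largest
    unnormalized posterior P(x|c) * P(c); P(x) is common to all classes."""
    return max(priors, key=lambda c: likelihood_by_class[c] * priors[c])

# Hypothetical numbers: P(x|c) = 0.8, P(c) = 0.3, P(x) = 0.4
p = posterior(0.8, 0.3, 0.4)
label = classify({"spam": 0.9, "ham": 0.2}, {"spam": 0.4, "ham": 0.6})
```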
The goal of the SVM algorithm is to find the best line or decision boundary that can separate n-dimensional space into classes, so that new data points can easily be placed in the correct category later on. This best decision boundary is known as a hyperplane. The point is to find a hyperplane that separates the classes and maximizes the margin in an n-dimensional space. SVM picks the extreme points/vectors that assist in creating the hyperplane. These extreme cases are called support vectors, and consequently the algorithm is named SVM. SVMs are a set of supervised learning methods used for classification, regression, and outlier detection.
K-nearest neighbor [16] is one of the supervised learning methods. The KNN algorithm can be used for regression as well as for classification, but for the most part it is used for classification problems. It is known as an instance-based or lazy learner algorithm, since it does not learn from the training set immediately; rather, it stores the dataset and, at classification time, determines which class a data point belongs to based on how closely it matches its k nearest neighbors. The Euclidean distance between the data points is calculated: the distance between two points used to find the k nearest neighbors.
Three distance measures (Euclidean, Manhattan, and Minkowski distance) are valid only for continuous variables. In the case of categorical variables, the Hamming distance should be used.
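A minimal KNN classifier following this description (lazy: it simply stores the data and votes among the k Euclidean-nearest points; the toy data is invented):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k Euclidean-nearest neighbors."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # Euclidean distances
    nearest = np.argsort(d)[:k]                     # k closest stored points
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Two well-separated toy clusters
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
label = knn_predict(X, y, np.array([4.5, 5.0]))
```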
Minkowski distance: a somewhat more complex measure than most. The Minkowski distance is a metric in a normed vector space which can be considered a generalization of both the Euclidean distance and the Manhattan distance. This measure has three requirements:
Zero vector: The zero vector has a length of zero, while every other vector has a positive length. For instance, if we travel from one place to another, that distance is always positive; but if we go from one place to itself, that distance is zero.
Scalar factor: When you multiply the vector by a positive number, its length is scaled while keeping its direction. For instance, if we head a certain distance in one direction and add the same distance, the direction does not change.
Triangle inequality: The shortest distance between two points is a straight line.
D(x, y) = ( Σ_{i=1..k} |x_i − y_i|^q )^(1/q)   (3)
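Eq. (3) reduces to the Manhattan distance at q = 1 and the Euclidean distance at q = 2, which a quick sketch confirms:

```python
def minkowski(x, y, q):
    """Eq. (3): D(x, y) = (sum over i of |x_i - y_i|^q)^(1/q)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)   # q = 1
euclidean = minkowski(a, b, 2)   # q = 2
```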
PCA [1] is a linear dimensionality reduction method used to reduce the high dimensionality of large datasets by transforming a large set of features (mostly containing all features) into a smaller set that still contains most of the information in the large set. We need to reduce the dimensionality because smaller datasets are easier to visualize and explore, and they make analyzing the data much faster and easier for ML classification. PCA consists of five steps, as follows:
Standardization: We need to perform standardization prior to PCA, as in Eq. (4), by subtracting the mean and dividing by the standard deviation for each value of each variable.

x_ij = (x_ij − x̄_j) / σ_j,  ∀j   (4)
All variables will be transformed onto the same scale. We then compute the covariance matrix to identify the correlations:

Σ = (1/m) Σ_{i=1..m} x^(i) (x^(i))^T,  Σ ∈ R^(n×n)   (5)

U = [ u_1 u_2 u_3 … ],  u_i ∈ R^n   (6)
Recast the data onto the principal components' axes. In the previous steps, the principal components were selected to form the feature vector, but the input dataset always remains in terms of the original axes.
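The five steps can be sketched end to end in numpy (the toy data is illustrative; the equation numbers refer to the formulas above):

```python
import numpy as np

def pca(X, n_components):
    """PCA via the five steps above."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # Eq. (4): standardization
    cov = (Z.T @ Z) / len(Z)                   # Eq. (5): covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigen-decomposition
    order = np.argsort(vals)[::-1]             # sort by explained variance
    U = vecs[:, order[:n_components]]          # Eq. (6): feature vector
    return Z @ U                               # recast onto component axes

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)   # a redundant feature
reduced = pca(X, 2)
```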
4 Proposed Model
i. The MNIST dataset has 42,000 labeled (28 × 28 pixel) grayscale images of handwritten digits from 0 to 9 and 28,000 unlabeled test images. In order to identify the digits correctly, we use different classification techniques of machine learning. A sample image of the MNIST dataset is shown in Fig. 2.
ii. The data is normalized using the standard score z = (x − μ)/σ, where z is the standard score, μ the population mean, and σ the standard deviation.
iii. The normalized data is experimented on using ML algorithms such as Naive Bayes, SVM, and KNN. The performance of the classifiers is then evaluated on various metrics: precision, recall, f1-score, and accuracy.
iv. PCA is applied to the normalized data. The resulting reduced dataset is then experimented on using the same ML algorithms (Naive Bayes, SVM, and KNN). The obtained results are again evaluated using the aforementioned metrics of precision, recall, f1-score, and accuracy.
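Steps (i)–(iv) amount to the following pipeline, sketched here with synthetic stand-in data instead of MNIST and a 1-NN classifier for brevity:

```python
import numpy as np

def zscore(X):
    """Step (ii): z = (x - mu) / sigma for each feature."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def pca_project(X, n):
    """Step (iv): project standardized data onto the top-n principal axes."""
    cov = (X.T @ X) / len(X)
    vals, vecs = np.linalg.eigh(cov)
    return X @ vecs[:, np.argsort(vals)[::-1][:n]]

def nn_accuracy(X_tr, y_tr, X_te, y_te):
    """1-nearest-neighbor accuracy, standing in for the NB/SVM/KNN step."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    return float((y_tr[d.argmin(axis=1)] == y_te).mean())

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 50))          # 50 mostly-noise features
X[:, :5] += 3.0 * y[:, None]          # five correlated informative features
Z = zscore(X)
R = pca_project(Z, 5)                 # reduced representation
tr, te = slice(0, 150), slice(150, None)
acc_raw = nn_accuracy(Z[tr], y[tr], Z[te], y[te])
acc_pca = nn_accuracy(R[tr], y[tr], R[te], y[te])
```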
Metrics like precision, accuracy, recall, and f1-score are used here to analyze the performance of this method. We discuss these metrics below.
Accuracy: Accuracy is the proportion of total predictions that are correct.
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (10)
Precision: Precision is the ratio of the number of correctly classified positive examples (TP) to the total number of predicted positive examples (TP + FP). It gives the correctness achieved in positive prediction.

Precision = TP / (TP + FP)   (11)
Recall: Recall is the ratio of the number of correctly classified positive examples (TP) to all positive examples that could have been predicted.

Recall = TP / (TP + FN)   (12)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (13)
6 Result Analysis
Table 1 Accuracy of classifiers with and without DR

Classifiers    Accuracy (Without DR, n = 784)    Accuracy (With DR (PCA), n = 70)
SVM            0.9171                            0.9381
NB             0.5447                            0.8754
KNN (k = 3)    0.9400                            0.9750
The accuracy of the different classifiers with and without DR is given in Table 1, using Eq. (10). The KNN classifier gives 94% accuracy without DR when k = 3. After applying PCA, the dimensionality is reduced to 70, and KNN gives 97.5% accuracy. SVM and NB also increase in accuracy with PCA, to 93.8% and 87.5%, respectively.
The precision of the different classifiers with and without DR is given in Table 2, using Eq. (11). The precision in predicting each digit (0–9) by the different algorithms has been identified as follows: precision values were low when using only the classification algorithms and increased when classification was used together with PCA. Among all techniques, PCA + KNN gives the best result.
The recall of the different classifiers with and without DR is given in Table 3, using Eq. (12). The recall in predicting each digit (0–9) has been identified as follows: recall values were very low in the case of NB and increased when classification was used together with PCA.
The f1-scores of the various classifiers with and without DR are given in Table 4, using Eq. (13). The f1-score for predicting each digit (0–9) correctly has been identified as follows: f1-scores were very low when using classification alone, and these values increased when classification was used together with PCA.
7 Conclusion
In this paper, the effect of DR using PCA on ML classification algorithms has been investigated. MNIST has 42,000 labeled (28 × 28 pixel) grayscale images, i.e., 784 features in total. ML classification (Naive Bayes, SVM, and KNN) was applied to the raw dataset as well as the reduced dataset, and the results were compared. PCA and ML classification algorithms together give better results. In the future, the effectiveness of the DR technique can also be examined on other datasets, such as text data and image datasets with high dimensionality. Other DR techniques and classification algorithms can also be tested.
References
P. Praveen (B)
Department of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana
506371, India
e-mail: [email protected]
M. Nischitha · C. Supriya · M. Yogitha · A. Suryanandh
Department of Computer Science Engineering, SR Engineering College, Warangal,
Telangana 506371, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 239
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_23
240 P. Praveen et al.
1 Introduction
All of these issues might be solved with organic farming. Pest and disease management, as well as fertilization, are the most important aspects of organic farming. Disease identification is a complex undertaking that necessitates prior experience. Colorful dots or streaks from infections, as well as associated symptoms, are frequently visible on plant leaves. Microbes such as fungi, bacteria, and viruses, widely found in the environment, are accountable for plant illnesses. The signs and symptoms of plant illness vary depending on the disease's cause or etiology [1–3].
The current method for identifying plant diseases is basic naked-eye inspection, which necessitates more staff, properly equipped laboratories, costly technologies, and so on. Incorrect disease identification can lead to incorrect pesticide application, which can contribute to the development of long-term pathogen resistance and a decrease in the crop's ability to defend itself. Plant disease can be identified using a variety of methods. Even skilled agricultural professionals and plant pathologists commonly fail to diagnose specific diseases from infected leaves, due to this complexity as well as the enormous number of crops in development and their current phytopathogenic conditions, resulting in incorrect findings and remedies. Identifying plant infections is the only way to avert losses in agricultural output quantities [4–6].
To be sustainable, agriculture must rely on plants to detect diseases, yet physically monitoring plant diseases is challenging. It necessitates a substantial amount of work, as well as plant disease knowledge and a lengthy processing time. Our project is based on a convolutional neural network-based system that detects cotton leaf illnesses. It makes it easier to detect bacterial illnesses and their consequences for the environment. It is difficult to pinpoint disease in crops in the early stages, and this task requires farmers to be physically present. Detecting and identifying diseases on the crop are therefore really important [7–9].
Agriculture is essential for feeding the world's populations of humans and livestock, and with the introduction of renewable energy technology, agriculture's involvement in the generation of clean energy has grown. Agriculture also offers raw materials for textile, chemical, and pharmaceutical manufacturing. Despite only a modest increase in the amount of land used for agriculture between the 1960s and the early twenty-first century, agricultural output increased roughly threefold [1, 10].
2 Related Work
We have selected several papers that employ advanced approaches to detect plant leaf diseases, and we summarize them below.
Using K-means clustering together with texture and color analysis, the authors of papers [1, 11] devised a method for identifying disease in Malus domestica.
To Detect Plant Disease Identification on Leaf Using … 241
The method exploits textures and colors that differ between healthy and diseased areas to identify and distinguish between different crop types.
The author of paper [4] surveyed a total of 40 research projects that used deep learning approaches to address a variety of food and agricultural production issues. The survey examines the specific agricultural problems investigated, the frameworks and models utilized, and the overall effectiveness attained on the metrics for each task under inquiry. It also compares deep learning with other well-known approaches to see whether the classification or regression results differ.
Deep learning outperforms traditional image processing systems [2, 12]. In one research paper, the implementation of an SVM-based regression approach resulted in more accurate predictions; the relationship between environmental factors and disease severity is described, which could be useful in disease management.
[Sujatha R., Y. Sravan Kumar, and Garine Uma Akhil]
According to this study, handling plant disease in the agriculture sector is fairly challenging. There is a significant loss in agricultural production and market economic value if the identification is incorrect. The detection of leaf diseases requires a vast amount of labor, plant disease knowledge, and additional processing time. As a result, MATLAB image processing can be used to detect leaf disease. Image loading, contrast improvement, RGB-to-HSI conversion, feature extraction, and SVM classification are all steps in the disease detection process. This study uses image processing techniques to provide a method for detecting and categorizing plant diseases that is both efficient and accurate. K-means and GLCM algorithms are used to detect plant leaf disease [13, 14]. This method automates the process, reducing detection time and labor expenses.
[Dr. Gagan Jindal Chandigarh, Simranjeet Kaur, Geetanjali Babbar, Navneet Sandhu]
As summarized in this research article, identifying plant leaf diseases is a preventative step toward reducing yield loss and preserving overall crop quantity in agriculture. Observing and recognizing the patterns engraved on the leaves is the essence of plant disease research. Early diagnosis of any plant disease, before it has a negative impact, therefore becomes crucial for long-term agricultural sustainability. However, manually identifying, monitoring, and drawing conclusions from plant leaf diseases is exceedingly difficult because of the high costs involved: the process needs a large amount of work, energy, and skill and, last but not least, processing time. As a result, image processing concepts come in handy and are used to diagnose diseases. The detection method includes picture acquisition, picture preprocessing, segmentation, segment feature extraction, and classification based on the findings [15–17].
3 Problem Statement
This is a study of leaf disease detection using various machine learning models; it indicates whether or not a leaf has a disease. Figure 1 depicts the pipeline for early detection of leaf disease. At the start of the convolution layer, a convolution core (kernel) is defined. The essential benefit of the convolutional neural network is the local receptive field. When the convolution core processes input data, it slides over the input and extracts a piece of feature information into a feature map. After the convolution layer has extracted the features, they are sent to the pooling layer. Current pooling methods compute the mean, maximum, or a random value of all values in the local receptive field.
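The sliding-kernel convolution and local-receptive-field pooling described above can be illustrated with a toy NumPy sketch (this is not the authors' code; the input size and the mean-filter kernel are illustrative assumptions):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide the kernel over the input ('valid' padding) and sum products."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):                 # the kernel glides over the input
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d(fmap, p=2):
    """Take the maximum over each p x p local receptive field."""
    h, w = fmap.shape[0] // p, fmap.shape[1] // p
    return fmap[:h * p, :w * p].reshape(h, p, w, p).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)     # toy 6 x 6 "image"
fmap = conv2d_valid(img, np.ones((3, 3)) / 9.0)    # mean-filter feature map
pooled = maxpool2d(fmap)                           # downsampled feature map
```

The 6 × 6 input yields a 4 × 4 feature map after a 3 × 3 convolution, and 2 × 2 after max pooling, the same shape bookkeeping a real CNN layer performs.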
In this project, we first gather a dataset from Kaggle that contains training and validation sets of images of various diseased leaves. The data has ten classes in total: nine classes of diseased leaves and one class of healthy leaves. After importing the datasets, we perform image processing, which includes resizing and reshaping; for this we import ImageDataGenerator from keras.preprocessing.image and then check the model to ensure all the parameters are trainable.
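As a minimal sketch of how the ten classes might be indexed, the mapping below mimics the class-name-to-index dictionary that Keras's ImageDataGenerator.flow_from_directory builds from folder names; the specific class names are assumptions modeled on the tomato disease classes discussed later, not the authors' exact dataset layout:

```python
# Assumed class names (nine diseased + one healthy); illustrative only.
classes = sorted([
    "Tomato_Bacterial_spot", "Tomato_Early_blight", "Tomato_Late_blight",
    "Tomato_Leaf_Mold", "Tomato_Septoria_leaf_spot",
    "Tomato_Spider_mites_Two_spotted_spider_mite", "Tomato_Target_Spot",
    "Tomato_Yellow_Leaf_Curl_Virus", "Tomato_mosaic_virus", "Tomato_healthy",
])

# Name -> integer label, as a flow_from_directory-style class_indices dict.
class_indices = {name: i for i, name in enumerate(classes)}
```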
In addition, we import and preprocess an image from the validation dataset, display it, and send it through the different layers of the convolutional neural network. The next step is visualization: here we view the different filters, or feature maps, for a given number of rows and columns produced by the CNN in the previous step. Later, we train the model for a certain number of epochs at the specified learning rate to acquire better accuracy; generally, the more epochs, the better the accuracy. After training, we plot a graph of training and validation accuracy. The next step is saving the model, by importing load_model from keras.models and saving the model to any location on disk. The final step is to import an image from the validation dataset, preprocess it, and detect whether the provided input leaf is diseased or healthy. We also compute the probability of the given leaf against the leaves in the training dataset. Hence, we can determine whether a leaf is diseased or healthy. Convolutional networks are a type of neural network that has been shown to be particularly effective at image recognition and categorization.
The data was gathered from the Kaggle website, which provides eleven characteristics that can be used to detect the disease a leaf is suffering from. The qualities of tomatoes are investigated in this research; the tomato classes include Bacterial Spot, Leaf Mold, Target Spot, Yellow Leaf Curl Virus, Septoria Leaf Spot, and Two-Spotted Spider Mites. There are a total of 19,286 leaf images in the dataset, with one of the ten classes being disease free.
The table lists the different types of diseases; this information is based entirely on the diseases occurring on different types of tomato leaves. Different spots are considered different types of disease; a leaf can have different types of disease, and here we display those types.
The local receptive field is the main benefit of the convolutional neural network. The convolution kernel glides across the input and, while processing the data, extracts pieces of feature information into a feature map. The most common pooling algorithms today compute the mean, maximum, or a random value of all values in the local receptive field. Convolutional neural networks (CNNs) are a type of neural network that has been found to be particularly effective in image recognition and categorization. Convnets have been used to recognize faces, objects, and traffic signs, as well as to power robotics. By learning visual attributes and employing small filters, convolution preserves the spatial relationship between pixels. The convolutional network is made up of several critical components.
Preprocessing of the database images includes reshaping, resizing, array conversion, and data transformation. The same treatment is applied to the test image. A database of roughly 10,000 plant images is compiled, and any image from that collection can be utilized as a software test image. The CNN model is trained on the training database to recognize the test image and the ailment it is suffering from. The CNN layers include Dense, Dropout, Activation, Flatten, Convolution2D, and MaxPooling2D. If the plant species is in the database and the model has been properly trained, the program can detect the illness. After adequate training and preprocessing, the comparison is made: to predict the disease, the test image is compared against the trained model.
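The sizing of such a layer stack can be checked by hand. The sketch below is not the authors' architecture; the filter counts and layer order are assumptions, used only to show how Conv2D, MaxPooling2D, Flatten, and Dense layers reshape the data and how many trainable parameters each contributes:

```python
# Hand-computed sizing of an illustrative Keras-style Sequential CNN.
# Layer types mirror those named in the text; filter counts are assumed.

def conv2d(shape, filters, k=3):
    h, w, c = shape
    params = (k * k * c + 1) * filters              # weights + biases
    return (h - k + 1, w - k + 1, filters), params  # 'valid' padding

def maxpool2d(shape, p=2):
    h, w, c = shape
    return (h // p, w // p, c), 0                   # pooling has no weights

def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0

def dense(shape, units):
    (n,) = shape
    return (units,), (n + 1) * units                # weights + biases

shape = (256, 256, 3)                               # input size used in the text
layers = []
for name, fn, arg in [("conv1", conv2d, 32), ("pool1", maxpool2d, 2),
                      ("conv2", conv2d, 64), ("pool2", maxpool2d, 2),
                      ("flatten", flatten, None), ("dense", dense, 64),
                      ("output", dense, 10)]:       # ten leaf classes
    shape, p = fn(shape) if fn is flatten else fn(shape, arg)
    layers.append((name, shape, p))

total_params = sum(p for _, _, p in layers)
```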
An algorithm processes the image as soon as it arrives on the server. By convolving filters across the image, we extract image features, producing feature maps that capture edges, texture, spots, holes, and color. These feature maps are downsampled before being passed to a fully connected layer acting as a classifier, where the ReLU non-linearity helps solve a hard task like classification (Fig. 2).
. Step 1: Resize all of the photographs in the collection to 256 × 256.
. Step 2: The information is divided into two groups: training and testing.
. Step 3: Data augmentation: To minimize overfitting, the training set is augmented
by rotating, scaling, and adding random noise to images.
. Step 4: Extraction of features: Features would be obtained using the convolution
technique to generate layers in the CNN design.
. Step 5: Model training: In our circumstance, we use the sequential model. The
sequential model API allows you to create deep learning models by creating a
sequential class and layering model layers on top of it.
. Step 6: Evaluation: The model’s correctness will be tested using a test set.
. Step 7: Tuning: If the results are not what you expected, fine-tune the model by
changing architecture elements such as kernel size and nodes.
. Step 8: Save the weights: Save the final model under the model name once you
have completed training it.
. To use it with new data, you will need to load the saved h5 model file.
. Step 9: A web application based on Flask would be built to upload images to the
server and display the results.
. Step 10: These programs are responsible for preprocessing the user's uploaded
image, categorizing it based on its features, and presenting the results.
. Step 11: Photograph a scene, resize it, and upload it to the server.
. Step 12: Compare the extracted characteristics to the model that has been trained.
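Step 3 above (augmentation) can be sketched with plain NumPy, assuming images are float arrays in [0, 1]; a real pipeline would typically use Keras's ImageDataGenerator instead:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Yield simple augmented variants of one image."""
    yield np.rot90(img)                               # 90-degree rotation
    yield np.fliplr(img)                              # horizontal flip
    noisy = img + rng.normal(0, 0.05, img.shape)      # additive Gaussian noise
    yield np.clip(noisy, 0.0, 1.0)                    # keep pixels in [0, 1]

img = rng.random((256, 256, 3))                       # stand-in 256 x 256 RGB image
variants = list(augment(img))
```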
We considered ten tomato disease classes which holds around 10,000 images.
This work provides a genuine idea for detecting the attacked leaf, and the farmer
who works to produce these receives a remedy, allowing them to increase agricul-
tural production. Specialists in the agriculture department accept image processing
techniques for rapid disease detection, and as a result, image processing technology
has reached a significant milestone in a relatively short time frame. The infected
portion of the leaf is easily segmented and analyzed using the CNN model, and the
best possible result is provided instantly. As a result, farmers who detect plant disease
manually can save time and reduce their risk of misdiagnosis. Our long-term goal is
to create an open multimedia system, and software that can automatically detect and
treat plant diseases (Fig. 3).
In this code sample, we train on the dataset and find its accuracy. After running the code, we get the output as graphs: the red line represents the validation accuracy and the blue line represents the training accuracy. Training accuracy can be calculated by dividing the number of correct predictions by the total number of predictions.
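The accuracy computation just described amounts to a one-liner (labels here are invented for illustration):

```python
def accuracy(y_true, y_pred):
    """Accuracy = correct predictions / total predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

acc = accuracy([0, 1, 2, 2, 1], [0, 1, 1, 2, 1])  # 4 of 5 correct -> 0.8
```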
Fig. 3 Training the dataset and finding the validation accuracy of the dataset
6 Result Analysis
After calculating the validation and training accuracy, we saved the model using load_model from keras.models, and later loaded the saved model for testing to get accurate results. This is followed by preprocessing the test data using the Sequential module from keras.models. Here, we set the image width and height to 256 (256 × 256).
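A dependency-free sketch of this test-time preprocessing is shown below; nearest-neighbour resizing stands in for the Keras image helpers, and in the actual code keras.models.load_model would first reload the saved network:

```python
import numpy as np

def preprocess(img, size=256):
    """Shape a raw HxWx3 array into the (1, 256, 256, 3) batch a model expects."""
    rows = np.arange(size) * img.shape[0] // size   # nearest-neighbour resize
    cols = np.arange(size) * img.shape[1] // size
    resized = img[rows][:, cols]
    x = resized.astype("float32") / 255.0           # scale pixel values to [0, 1]
    return x[np.newaxis, ...]                       # add the batch dimension

batch = preprocess(np.zeros((300, 400, 3), dtype=np.uint8))
```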
Tomato results (Fig. 4):
Here, we have validated the accuracy by training the model and predicted whether the leaf is diseased or not. Using this CNN model, we obtained good prediction accuracy. After preprocessing our images to size 256 × 256 with the keras.preprocessing image module, we loaded the image using image.load_img and then converted it into an array using image.img_to_array(img), where img is a temporary variable for a particular image. There are two lists in this section: the first is a list of keys, and the second is a list of values. Using these classes, we can eventually determine which form of sickness is affecting the leaf. Because we had ten distinct sorts of illnesses, we were able to figure out which one impacted the leaf. Here, we can observe that the leaf belongs to the Tomato_mosaic_virus class, which means it belongs to a diseased class. The word result in the code is defined to hold the prediction for the image taken from one of the classes of the validation dataset. We use dictionaries to get the keys and values of the classes after converting them into lists. Finally, we display the class name at the index position of the result value; in this case it is Tomato_mosaic_virus (Fig. 5).
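The key/value lists described above can be sketched as follows, assuming a class_indices-style dictionary (the names shown are an illustrative subset of the ten classes, not the full mapping):

```python
# Assumed subset of the class dictionary; indices are illustrative.
class_indices = {"Tomato_healthy": 0, "Tomato_mosaic_virus": 1,
                 "Tomato_Target_Spot": 2}

keys = list(class_indices.keys())      # first list: class names
values = list(class_indices.values())  # second list: class indices

predicted = 1                          # e.g. argmax of the model's output
result = keys[values.index(predicted)] # class name for the predicted index
```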
After preprocessing, we predicted which class the output belongs to. We took a test image from the validation dataset and used model.predict_classes. The disease keyword in the code displays the test image figure. Hence, we predicted that the leaf is healthy. After assigning the location of the image to result, loading the image, and displaying it, we obtained the probability for the leaf based on the result using NumPy, which performs array operations efficiently. In this case, the leaf belongs to the healthy class; as mentioned earlier, there are ten classes in total, where nine of them contain diseased leaves and one contains healthy leaves. Here, the output leaf belongs to the healthy class with 99% probability, so the leaf is displayed as healthy. In the second test case, we took another leaf from the validation dataset using model.predict_classes, just as in the previous test case, and followed the same process from preprocessing onward to predict which class the output belongs to. The disease keyword in the code again displays the test image figure. Here, the input leaf is compared with all the images from the training set to predict its class. In this test case, the leaf belongs to Tomato_Spider_mites_Two_spotted_spider_mite, which means it is a diseased leaf, as all classes apart from the healthy_leaf class contain diseased leaves. The probability that it belongs to that class is 99%.
7 Conclusion
Although there are several approaches for detecting and classifying plant diseases using automation or computer vision, this research area is still immature. Furthermore, with the exception of those dealing with plant species recognition based on leaf photos, no commercial solutions are available on the market. In this paper, a deep learning algorithm was used to investigate a novel methodology for automatically categorizing and identifying plant illnesses from leaf photos. The proposed model was successful in detecting the presence of leaves and distinguishing between healthy leaves and 13 distinct disorders that could be identified visually. The full technique was detailed, from picture collection for training and validation, to image preprocessing and augmentation, and eventually training and fine-tuning the deep CNN. The performance of the newly designed model was evaluated using a series of tests.
References
1. Kulkarni AH, Ashwin Patil RK (2012) Applying image processing technique to detect plant
diseases. Int J Mod Eng Res 2(5):3661–3664
2. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image
processing edge detection techniques. In: IEEE International conference on emerging trends
in science, engineering and technology, Tiruchirappalli, Tamil Nadu, India, pp 169–173
3. Al-Tarawneh MS (2013) An empirical investigation of olive leave spot disease using auto-
cropping segmentation and fuzzy C-means classification. World Appl Sci J 23(9):1207–1211
4. Argenti F, Alparone L, Benelli G (1990) Fast algorithms for texture analysis using co-
occurrence matrices. IEE Proc Radar Signal Process 137(6):443–448
5. Wang H, Li G, Ma Z, Li X (2012) Image recognition of plant diseases based on back propagation
networks. In: 5th International congress on image and signal processing, Chongqing, China,
pp 894–900
6. Arivazhagan S, Newlin Shebiah R, Ananthi S, Vishnu Varthini S (2013) Detection of unhealthy
region of plant leaves and classification of plant leaf diseases using texture features. Comm Int
Genie Rural (CIGR) J 15(1):211–217
7. Jaware TH, Badgujar RD, Patil PG (2012) Crop disease detection using image segmentation. In:
National conference on advances in communication and computing, World Journal of Science
and Technology, Dhule, Maharashtra, India, pp 190–194
8. Zhang Y-C, Mao H-P, Hu B, Li M-X (2007) Feature selection of cotton disease leaves
image based on fuzzy feature selection techniques. In: Proceedings of the 2007 international
conference on wavelet analysis and pattern recognition, Nov 2007, Beijing, China, pp 124–129
9. Arivazhagan S, Newlin Shebiah R, Ananthi S, Vishnu Varthini S (2013) Detection of unhealthy
region of plant leaves and classification of plant leaf diseases using texture features. Agric Eng
Int CIGR 15(1):211–217
10. Shaik MA, Verma D (2020) Deep learning time series to forecast COVID-19 active cases in
INDIA: a comparative study. IOP Conf Ser Mater Sci Eng 981:022041. https://doi.org/10.1088/
1757-899X/981/2/022041
11. Praveen P, Shaik MA, Kumar TS, Choudhury T (2021) Smart farming: securing farmers using
block chain technology and IOT. In: Choudhury T, Khanna A, Toe TT, Khurana M, Gia Nhu N
(eds) Blockchain applications in IoT ecosystem. EAI/Springer innovations in communication
and computing. Springer, Cham. https://doi.org/10.1007/978-3-030-65691-1_15
12. Shaik MA, Verma D (2021) Agent-MB-DivClues: multi agent mean based divisive clus-
tering. Ilkogretim Online Elementary Educ 20(5):5597–5603. https://doi.org/10.17051/ilkonl
ine.2021.05.629
13. Pramod Kumar P, Sagar K (2019) A relative survey on handover techniques in mobility
management. IOP Conf Ser Mater Sci Eng 594:012027
14. Shaik MA, Verma D, Praveen P, Ranganath K, Yadav BP (2020) RNN based prediction of
spatiotemporal data mining. IOP Conf Ser Mater Sci Eng 981:022027. https://doi.org/10.1088/
1757-899X/981/2/022027
To Detect Plant Disease Identification on Leaf Using … 249
15. Kumar S, Manjula B, Shaik MA, Praveen P (2019) A comprehensive study on single sign on
technique. Int J Adv Sci Technol (IJAST) 127. ISSN: 2005-4238; E-ISSN: 2207-6360
16. Shaik MA, Verma D (2020) Enhanced ANN training model to smooth and time series forecast.
IOP Conf Ser Mater Sci Eng 981:022038. https://doi.org/10.1088/1757-899X/981/2/022038
17. Praveen P, Babu CJ, Rama B (2016) Big data environment for geospatial data analysis. In: 2016
International conference on communication and electronics systems (ICCES), Coimbatore, pp
1–6
18. Ravi Kumar R, Babu Reddy M, Praveen P (2019) An evaluation of feature selection algorithms
in machine learning. Int J Sci Technol Res 8(12):2071–2074. ISSN 2277-8616
Association and Correlation Analysis
for Predicting the Anomaly in the Stock
Market
Abstract The stock market is volatile and fluctuates over time; with rapid changes in the price value, it is very difficult to predict the price of a stock. The stock market price is mostly determined by the demand for the stock, which is determined by gross purchases and sales. In the stock market, these are mostly made by domestic institutional investors (DII) and foreign institutional investors (FII). Their percentage of investment is very large compared to the retail investors in the market, and the price change is mostly determined by FII and DII activity. Since the market price is dominated by the FII and DII, in this work we identified the association and correlation between FII and DII activities. The results show a suspicious anomaly between the FII and DII. In the Indian stock market, an average of 6.43 billion shares is traded every day, depending on total composite volume. But surprisingly, over the last decade, the DII and FII have been negatively correlated.
1 Introduction
The most essential question for investors in the stock market is how to predict future stock values. It is the most uncertain thing in the market: despite huge developments and research in subjects such as mathematics, statistics, machine learning, data mining, and deep learning, no model has predicted the price accurately. Finding a specific model or tool to identify the price of the market
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 251
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_24
252 R. Ravinder Reddy et al.
or the momentum of the market is very difficult. In our observation of the market over a period of a decade, we have come to one conclusion: these markets mostly depend on FII and DII activities. These institutions can pour large quantities of money into the market and easily change its direction. Various methodologies, mathematical formulations, genetic algorithm (GA)-based models, neural network models, machine learning-based techniques, and so on have all been presented and tested with varying degrees of success. Against this background, the surprising behavior of the FII and DII is detected in this work. Using these correlations, we can predict the price more accurately, up to a certain level.
Traditional investors mostly believe that the market movement is natural, but it is deviated by the FII and DII. The strategies of the FII and DII have been observed to be mostly negatively correlated, which influences the market condition; this activity is heavily involved in diverting the actual move of the market. The general public of traditional investors may feel that the market moves as predicted by the financial conditions of the current scenario. But this may not be accurate or true, because these big players, based on some understanding, move the markets in their direction to create panic among the general investors [1].
Increased computational power enables users to determine prices and study different strategies easily, and many existing tools identify the flaws and trade values of different participants. The main aim of this work is to determine whether the direction of the stock market is a real move or an anomalous move. This work majorly focuses on the FII and DII data, which were collected in real time. Most of the time, the data show that FII and DII activity is inversely proportional, which influences the market for long periods of time.
The basic building blocks of the model consist of the following stages; at each stage, the model is crucial to the decision-making process. Data collection is the most challenging part; real-time data were collected from the Indian stock market.
1. Data collection
2. Data analysis
3. Feature engineering
4. Model selection
5. Model building.
2 Related Work
Most researchers have contributed toward stock price prediction using machine learning and deep learning [2]. Many people invest their money in the stock market to get the maximum benefit, but often this does not happen, because of heavy corrections driven by the different fund houses playing strategies to extract the maximum money. This kind of investment therefore carries a lot of risk. Detecting such risks and anomalies in stock market prices is important before smart players or smart investors like FIIs and DIIs exit their positions. Here, risk and anomaly are different perspectives: risk means trying to gain money while trusting the other parties, whereas an anomaly means taking money by creating panic environments using different strategies.
The distribution phase of Wyckoff theory is the opposite of its accumulation phase, as defined by Richard Wyckoff [3]. This is an old theory; as a little background, Richard Wyckoff wrote in the Wall Street Journal a long time ago, and his theory holds that most fund managers and smart investors exit at the top while retailers do not. Big gambling happens in the stock market because everyone wants the maximum profit [1].
Retail traders generally feel that prices have retraced and are heading to new highs, but instead the price merely tests those levels, generally known as the LPSY (last point of supply). This usually happens with gaps: we witness gap-downs, re-tests of the gaps, prices going down again, and continued selling pressure. In this distribution, the stock price forms a major top, which can hold for weeks, months, and even years. The Wyckoff theory really helps us find the right exit position in a stock; most fund managers use it wisely to exit their positions without affecting the selling price. It can be applied to any asset class that has volume data, so volume plays a key role to keep in mind [4].
Other researchers used a histogram-based detector to detect irregularities, which were then characterized via association rule mining. The Apriori and FP-growth algorithms were used to build the metadata rule collection. In that study, the authors compared the results of the Apriori technique with the FP-growth algorithm, showing how the FP-growth algorithm achieves better results in terms of lower time and space complexity. A full implementation of the FP-growth algorithm was left as a future development [5].
3 Data Preparation
The stock dataset contains several characteristics, including both continuous and
categorical data. These values should be quantitative and categorized to aid in the
optimization process. All alphanumeric values must now be translated to numeric
values first, followed by continuous values being changed to categorical values [6].
As shown in Table 1, the day-wise DII and FII activities regarding the purchase
or sale are collected for the month of January 2020. We have collected the data for
the past 10 years from the NSE India. This data is collected for the Indian stock
market. These data are pre-processed to remove any inconsistencies, redundant records, and missing values.
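A minimal sketch of this cleaning step is shown below, assuming day-wise records with illustrative field names (not the actual NSE schema):

```python
# Toy day-wise FII/DII records; field names and values are invented.
rows = [
    {"date": "2020-01-01", "fii_net": -120.5, "dii_net": 80.2},
    {"date": "2020-01-01", "fii_net": -120.5, "dii_net": 80.2},  # duplicate
    {"date": "2020-01-02", "fii_net": None,   "dii_net": 95.0},  # missing value
    {"date": "2020-01-03", "fii_net": 60.1,   "dii_net": -40.3},
]

seen, clean = set(), []
for r in rows:
    key = tuple(sorted(r.items()))     # hashable fingerprint of the record
    if key in seen or any(v is None for v in r.values()):
        continue                       # drop redundant or incomplete records
    seen.add(key)
    clean.append(r)
```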
4 Methodology
The collected market data is used to analyze the behavior of the FII and DII. To analyze these data, we use the association and correlation of the DII and FII, which show users how the market actually moves in real scenarios [7, 8]. The major task is to analyze FII and DII behavior along with the market movement. Both positive and negative values of the data are used; here, a negative value represents net sales in the market.
Data mining, for the most part, refers to the process of mining information from a huge dataset. In this process, new information is predicted by comprehending the current information. By and large, data mining is classified into two categories: predictive and descriptive. Descriptive mining depicts the general properties of the information in the dataset, while predictive mining performs inference on present information to make predictions [9, 10].
Specifically, two data mining approaches have been proposed [11] and used for anomaly discovery: association rules and frequency episodes. Association rule algorithms discover connections between elements or attributes used to describe a dataset. Association rule mining began as a strategy for finding interesting rules in transactional datasets.
Association rule mining was formally defined as follows. Let L = {i1, …, in} be a set of literals, called items. Let the dataset D be a set of transactions, where each transaction T is a set of items such that T ⊆ L. Associated with each transaction is a unique identifier, called its transaction id (TID).
A transaction T contains X, a set of items in L, if X ⊆ T. An association rule is an implication of the form X → Y, where X ⊂ L, Y ⊂ L, and X ∩ Y = ∅. The rule X → Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule X → Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y.
Given a set of items I = {I1, I2, …, Im} and a dataset of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, the association rule problem is to recognize all association rules X → Y with a minimum support and confidence. The support of a rule is the proportion of transactions that contain both X and Y among all transactions, determined as |X ∪ Y|/|D|; it gauges the significance of the correlation between the itemsets. The confidence is the proportion of transactions that contain Y among the transactions that contain X, defined as |X ∪ Y|/|X|; it gauges the degree of correlation between the itemsets. Support measures the frequency of a rule, and confidence measures the strength of the connection between the sets of items [11].
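The support and confidence definitions above can be computed directly on a toy transaction set (the item labels are invented for illustration, not the paper's data):

```python
# Toy transactions; each transaction is a set of items.
transactions = [
    {"fii_sell", "dii_buy"},
    {"fii_sell", "dii_buy", "gap_down"},
    {"fii_buy", "dii_sell"},
    {"fii_sell", "dii_sell"},
]

def support(itemset, D):
    """Fraction of transactions containing the itemset: |X ∪ Y| / |D|."""
    return sum(itemset <= t for t in D) / len(D)

def confidence(X, Y, D):
    """Fraction of X-containing transactions that also contain Y."""
    return support(X | Y, D) / support(X, D)

s = support({"fii_sell", "dii_buy"}, transactions)        # 2/4 = 0.5
c = confidence({"fii_sell"}, {"dii_buy"}, transactions)   # 0.5 / 0.75 = 2/3
```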
Correlation is a bivariate analysis that measures the strength of the relationship between two factors and the direction of the relationship. As far as the strength of the relationship is concerned, the value of the correlation coefficient varies between +1 and −1. A value of ±1 indicates a perfect degree of association between the two factors; as the correlation coefficient value goes toward 0, the relationship between the two factors becomes weaker. The direction of the relationship is indicated by the sign of the coefficient: a + sign indicates a positive relationship, and a − sign indicates a negative relationship. As a rule, in statistics, we measure four kinds of correlations [12]:
1. Pearson correlation,

r_{xy} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2} \, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}    (1)

2. Kendall correlation,

\tau = \frac{N_c - N_d}{\frac{1}{2} n (n - 1)}    (2)

N_c Number of concordant pairs.
N_d Number of discordant pairs.

3. Spearman correlation,

\rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}    (3)

d_i Difference between the ranks of the i-th pair.

4. Point-biserial correlation,

r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{pq}    (4)

M_1 Mean (for the entire test) of the group that received the positive binary variable.
M_0 Mean (for the entire test) of the group that received the negative binary variable.
s_n Standard deviation for the entire test.
p Proportion of cases in the “0” group.
q Proportion of cases in the “1” group.
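Eqs. (1)–(4) can be implemented directly with NumPy; the sketch below assumes no tied ranks, and the point-biserial coefficient is cross-checked against Pearson, to which it is equivalent on 0/1 data:

```python
import numpy as np

def pearson(x, y):                                    # Eq. (1)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = (np.sqrt(n * np.sum(x ** 2) - np.sum(x) ** 2)
           * np.sqrt(n * np.sum(y ** 2) - np.sum(y) ** 2))
    return num / den

def kendall(x, y):                                    # Eq. (2), no ties
    n, nc, nd = len(x), 0, 0
    for i in range(n):
        for j in range(i + 1, n):                     # concordant vs discordant pairs
            s = (x[i] - x[j]) * (y[i] - y[j])
            nc, nd = nc + (s > 0), nd + (s < 0)
    return (nc - nd) / (0.5 * n * (n - 1))

def spearman(x, y):                                   # Eq. (3), no ties
    n = len(x)
    d = np.argsort(np.argsort(x)) - np.argsort(np.argsort(y))  # rank differences
    return 1 - 6 * np.sum(d.astype(float) ** 2) / (n * (n ** 2 - 1))

def point_biserial(binary, score):                    # Eq. (4)
    score = np.asarray(score, dtype=float)
    g1, g0 = score[binary == 1], score[binary == 0]
    p, q = g1.size / score.size, g0.size / score.size
    return (g1.mean() - g0.mean()) / score.std() * np.sqrt(p * q)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])               # toy series
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
```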
Association and Correlation Analysis for Predicting … 257
Correlation was computed for the FII and DII net purchase/sale values; we found
that the correlation between the two is negative (−0.35212). The analysis of the
results, however, shows the reverse of the expected market action: the market should
move upward under heavy purchasing and downward under selling pressure. Instead of
following the demand and supply of the market, the FII and DII appear to be
artificially creating the trends in the market.
In this work, we applied three correlation measures:
1. Kendall
2. Pearson
3. Spearman.
All three correlation measures show a negative relation, as given in Table 2.
As shown in Fig. 1, the correlation between the FII and DII indicates that the two
are purely negatively correlated, and this impacts the market hugely. The relation
points to an anomaly in the agreement between the two parties [13–16]. What ought
to be a causal relation turns out, over the past decade of data compared, to be a
negative correlation. This may not hold in the general market: from an ordinary
investor's perspective, behaviour is usually positively correlated among
participants, since most participants read price movement from demand and supply.
That is not what happens here. The suspicious finding is that the two behaviours
are negatively correlated, which is surprising: across billions of transactions,
the FII and DII behaviours are negatively associated. The association does not
depend on just a few parameters.
6 Conclusion
To study the movement of the stock-market and patterns among sectorial indexes,
association rule mining and statistical correlation analysis were used. According
to the study’s findings, various sectoral indices are connected together. Another
intriguing discovery is that distinct industry indexes have a time-lag relationship.
This correlation can be used to estimate the direction of future index movement with
a forecast horizon of d days, where d is the number of days lag considered. We tested
References
1. Coleman H (2021) Is the stock market gambling? Why trading in the stock market isn’t
gambling, February 2021
2. Marchai FL, Martin W, Suhartono D (2021) Stock prices prediction using machine learning.
In: 2021 8th international conference on information technology, computer and electrical
engineering (ICITACEE). IEEE
3. Louis S, McGraw G, Wyckoff RO (1993) Case-based reasoning assisted explanation of genetic
algorithm results. J Exp Theor Artif Intell 5(1):21–37
4. US Equities Historical Market Volume Data, February 2021, [online] Available: https://www.
cboe.com/us/equities/market_statistics/historical_market_volume/
5. Aung KMM, Oo NN (2015) Association rule pattern mining approaches for network anomaly
detection. In: Proceedings of 2015 international conference on future computational
technologies (ICFCT’2015), Singapore
6. Jyothsna V, Rama Prasad VV (2016) FCAAIS: anomaly based network intrusion detection
through feature correlation analysis and association impact scale. ICT Express 2(3):103–116
7. Umer M, Awais M, Muzammul M (2019) Stock market prediction using machine learning
(ML) algorithms. ADCAIJ: Adv Distrib Comput Artif Intell J 8(4):97–116
8. Ding G, Qin L (2020) Study on the prediction of stock price based on the associated network
model of LSTM. Int J Mach Learn Cybern 11(6):1307–1317
9. Kamalov F (2020) Forecasting significant stock price changes using neural networks. Neural
Comput Appl 32(23):17655–17667
10. Henrique BM, Sobreiro VA, Kimura H (2018) Stock price prediction using support vector
regression on daily and up to the minute prices. J Finance Data Sci 4(3):183–201
11. Nalavade K, Meshram BB (2014) Finding frequent itemsets using apriori algorithm to detect
intrusions in large dataset. Proc 2014 IJCAIT 6(I):84–92
12. Kendall M, Gibbons JD (1990) Rank correlation methods. Edward Arnold, a division of Hodder
and Stoughton, a Charles Griffin title, London, pp 29–50
13. Su S et al (2019) A correlation-change based feature selection method for IoT equipment
anomaly detection. Appl Sci 9(3):437
14. Saboori E, Parsazad S, Sanatkhani Y (2010) Automatic firewall rules generator for anomaly
detection systems with Apriori algorithm. In: 2010 3rd international conference on advanced
computer theory and engineering (ICACTE), pp V6-57–V6-60. https://doi.org/10.1109/ICA
CTE.2010.5579365
15. Razaq A, Tianfield H, Barrie P (2016) A big data analytics based approach to anomaly detec-
tion. In: Proceedings of the 3rd IEEE/ACM international conference on big data computing,
Applications and Technologies
16. Mazel J, Casas P, Labit Y, Owezarski P (2011) Sub-space clustering, inter-clustering results
association and anomaly correlation for unsupervised network anomaly detection. In: 2011 7th
international conference on network and service management, pp 1–8
Early Identification of Diabetic
Retinopathy Using Deep Learning
Techniques
1 Introduction
The retina is the innermost, thin layer of tissue that is situated at the back of the eyeball
from inside. It is situated near the optic nerve. The retina receives the light that has
focused on the lens, converts it into the signals and transmits that signal to the brain for
enabling us to see [1]. The most common eye disease is diabetic retinopathy. Usually,
diabetic retinopathy affects those people who have had diabetes for a significant
number of years but they may have or have not gone through diagnosis. Diabetic
retinopathy can affect any diabetic person and if it is left untreated for a longer time,
it may become dangerous and the risk of blindness may increase [2].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 261
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_25
262 S. Sharma et al.
Sometimes increase in blood glucose levels causes changes in retinal blood vessels
and it may cause diabetic retinopathy. These blood vessels swell and leak fluid
occasionally, or even close off completely. In the other case, an abnormal new blood
vessel grows on the surface of the retina [3]. Patients suffering from diabetic
retinopathy may experience blurry vision, black spots in their field of view, or
complete vision loss. Early identification of diabetic retinopathy is important for
preserving eyesight and providing treatment in time. Ophthalmologists perform the
identification of DR manually, which has drawbacks:
. It is a time-consuming task and expensive.
. Preclinical signs are not easily detected with manual grading.
2 Literature Review
classified images and detected the affected lesions. In [6] authors have used machine
learning techniques for the detection of diabetic retinopathy. First preprocessing is
done on retinal fundus images using the green channel, histogram equalization, crop-
ping, and resizing techniques. Images were divided into two datasets: one of normal
retinal images and the other of affected retinal images. A total of 14 features of
DR were extracted from both the normal and the diabetic retinal fundus image
datasets. These features were used for comparing and classifying the images into
normal and diabetic fundus images. From the results obtained, the authors observed
that exudate is the best feature for primary diabetic detection; after exudates,
blood vessels and other features can be used for detection. Authors in [7] covered
a detailed survey of the identification of diabetic
retinopathy in light of almost 150 research articles, summarized with the collection
of retinal datasets, adoption of different kinds of methodologies to detect diabetic
retinopathy, and selection of the performance evaluation metrics for the representa-
tion of their outcomes. Initially, retinal datasets are discussed and then several kinds
of approaches have been explained to detect the retinal abnormalities including retinal
neovascularization, hemorrhages, microaneurysm, and exudates. Moreover, the role
of evaluation metrics for computer-aided diagnosis (CAD) systems has been briefly
discussed. Authors in [8] suggest that early detection of glaucoma is necessary and
that treatment should be done in time. Otherwise, in affected patients, it can cause
permanent blindness. For this, they proposed morphological filtering algorithms for
preprocessing or enhancement of retinal images. In the morphological enhancement
module, input images are converted into gray scale images, then a channel extracts
the optic cup and optic disc from the images. Here, top-hat transform highlights the
bright objects on a dark background, and bottom-hat transform highlights the darker
regions of the image [9]. Then, in morphological operations, bottom-hat transform
is subtracted from top-hat transform and channel images are merged, and converted
into grayscale images.
3 Methodology
Figure 2 represents our proposed system architecture. The first step acquires the
input image; we have used the APTOS 2019 Kaggle dataset, which is freely available
online [10]. In the second step, preprocessing steps such as cropping, resizing,
conversion to grayscale, and Gaussian blur are applied to the dataset so that
training of the model may improve. In the third step of the architecture, feature
extraction is applied, covering features such as microaneurysms, exudates,
hemorrhages, and blood vessels. In the fourth step, classification determines
whether the image shows DR or No DR.
4 Experimentation Setup
5 Dataset
We have used fundus retinal images from the Kaggle challenge of APTOS 2019
blindness detection [10]. The dataset consists of 3662 training images and 1929
test images and is divided into five classes: no DR, mild, moderate, severe, and
proliferative DR. The class labels are numbered from 0 to 4. We have used 2929
images for training and 733 images for testing. In this dataset, id_code identifies
each image, and its diagnosis level is given as shown below in Fig. 3.
Early Identification of Diabetic Retinopathy … 265
6 Image Processing
To enhance and extract useful features from images, image processing is used. Oper-
ations such as circular cropping, resizing, converting to gray, applying Gaussian blur
are performed on images.
See Fig. 4.
See Fig. 5.
See Fig. 6.
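Two of these preprocessing operations can be sketched in pure Python; the 3 × 3 Gaussian kernel, the standard luma weights, and the tiny nested-list "image" are illustrative assumptions, not the paper's exact pipeline (which would use an image library such as OpenCV):

```python
# Grayscale conversion and a 3x3 Gaussian blur on a tiny RGB "image".

LUMA = (0.299, 0.587, 0.114)                 # standard BT.601 luma weights
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # 3x3 Gaussian, sums to 16

def to_gray(rgb):
    """Convert an H x W list of (r, g, b) tuples to a grayscale grid."""
    return [[sum(w * c for w, c in zip(LUMA, px)) for px in row] for row in rgb]

def gaussian_blur(gray):
    """Convolve with the 3x3 kernel, replicating edge pixels at borders."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16.0
    return out

img = [[(200, 100, 50)] * 4 for _ in range(4)]   # flat 4x4 RGB image
gray = to_gray(img)
blurred = gaussian_blur(gray)
```

A uniform image stays uniform under the blur, which is a quick sanity check that the border handling does not darken the edges.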
A convolutional neural network (CNN) is used as the deep learning method. Several
CNN architectures can be used for image processing, object detection, segmentation,
and image classification. There are various CNN architectures with pre-trained
ImageNet weights, such as:
. ResNet50
. VGG16
. MobileNet
. Inception Net
. Dense Net.
We have used ResNet50 and VGG16 for training, testing, and classification.
Parameters such as batch size, learning rate, epochs, height, and width are used for
training. An image data generator function is used for data augmentation, and its
data flow is used to prepare the dataset and split it into training, validation,
and testing sets. Here, 2344, 585, and 733 images are used for training,
validation, and testing, respectively.
The dataset is ready for the training phase after initializing the above-mentioned steps.
We did transfer learning using standard ResNet50 and VGG16 CNN architecture with
pre-trained ImageNet weights. As the standard ImageNet weights classify the objects
into 1000 categories, we excluded the top layer and added some layers of our own
in the model. The standard input image size for ResNet50 and VGG16 is 224, 224,
3 but we have used 320, 320, 3.
We froze all the layers in the base model and then added the following layers to
the original model:
. GlobalAveragePooling2D
. Dropout Layer (50% dropout)
. Dense (2048 inputs and ReLu activation) and Dense (512 inputs and ReLu
activation) for ResNet50 and VGG16, respectively
. Dropout Layer (50 percent dropout)
. Dense Layer (Softmax activation with 5 classes).
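A pure-Python sketch of the added head's forward pass at inference (global average pooling over the spatial grid, then a dense softmax over the 5 DR classes); the tiny feature-map shape and the weight values are illustrative assumptions, and the dropout layers are omitted since they are inactive at inference:

```python
import math

def global_average_pool(fmap):
    """fmap: H x W x C nested lists -> length-C vector of spatial means."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[y][x][k] for y in range(h) for x in range(w)) / (h * w)
            for k in range(c)]

def dense_softmax(vec, weights, bias):
    """weights: C x num_classes, bias: num_classes -> class probabilities."""
    logits = [sum(v * w[j] for v, w in zip(vec, weights)) + bias[j]
              for j in range(len(bias))]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # numerically stable softmax
    s = sum(exps)
    return [e / s for e in exps]

# A 2x2 spatial grid with 3 channels, mapped to 5 classes (No DR .. proliferative)
fmap = [[[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
        [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]]
pooled = global_average_pool(fmap)          # one mean per channel
W = [[0.1, 0.0, 0.0, 0.0, 0.0]] * 3         # only class 0 has nonzero weights
b = [0.0] * 5
probs = dense_softmax(pooled, W, b)
```

The probabilities sum to 1, and class 0 receives the largest probability here because only its weights are nonzero.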
The method works because the kind of information needed to distinguish between all
1000 ImageNet classes is often also useful for distinguishing between new kinds of
images (fundoscopic retinal images in our case).
Fine-Tuning
Fine-tuning is one approach to transfer learning. Here, the pre-trained model is
first trained for 2 epochs on our new dataset; fine-tuning is then done by
unfreezing the whole base model (or a part of it) and retraining the whole model
with a very low learning rate of 0.0001. Binary cross-entropy and the Adam
optimizer are used, and early stopping is used with mode minimum, monitoring
val_loss with verbose 1. Fine-tuning reuses the same model to tweak the parameters
of the already trained network: the initial layers learn very general features,
while layers higher up the network tend to learn features more specific to the task
being trained. We used the Adam optimizer, 2 epochs, and categorical cross-entropy
as the loss function in the first case and binary cross-entropy in the second case.
In fine-tuning, the model is trained for 12 epochs and 22 epochs for ResNet50 and
VGG16, respectively, with a batch size of 8.
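The early-stopping behaviour described above (monitor val_loss, mode minimum) can be sketched as a small helper; the patience value and the loss history below are illustrative assumptions, since the paper does not state them:

```python
class EarlyStopping:
    """Stop training when the monitored loss stops improving (mode='min')."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: reset the counter
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience

stopper = EarlyStopping(patience=2)
history = [0.9, 0.7, 0.71, 0.72, 0.5]   # val_loss per epoch (toy values)
stopped_at = None
for epoch, loss in enumerate(history):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

With a patience of 2, training halts at epoch 3, before the later improvement at epoch 4, which is exactly the trade-off the patience parameter controls.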
8 Result
Keras is used to train a classifier for detecting whether a person has DR or not.
During the process, we keep track of training and validation accuracy and loss.
Model accuracy graph: The learning curves (training and validation accuracy and
loss) of the ResNet50 model, which took almost 26 min for 12 epochs, are shown
below (Figs. 7 and 8; Table 1).
Fig. 8 Confusion matrix shows how accurate it is predicting using different shades
Fig. 10 Confusion Matrix shows how accurate it is predicting using different shades
Model accuracy graph: The learning curves (training and validation accuracy and
loss) of the VGG16 model, which took almost 43 min for 22 epochs, are shown below
(Figs. 9 and 10; Table 2).
Table 3 shows the comparison of results of ResNet50 and VGG16. We can see that
the training accuracy of ResNet50 is higher than that of VGG16, and its test
accuracy is slightly higher. As there is not much difference between the test
accuracies of the two models, we can say that ResNet50 predicts a little better for
some classes.
9 Conclusion
An automated system for diabetic retinopathy will help everyone in aspects such as
time and cost. The system helps ophthalmologists give fast and assured treatment to
their patients. Preprocessing is a very important part, and in our paper we showed
that it improved our accuracy to a great extent. In our paper, 2344, 585, and 733
images are used for training, validation, and testing of the model. We obtained the
best results using the ResNet50 and VGG16 models with ImageNet pre-trained weights
and a softmax layer with 5 output units at the end. The training accuracy of the
ResNet50 model is higher than that of VGG16, but the test accuracy is almost the
same for both.
10 Future Work
As there are very few images for classes 3 and 4, our model got less information
about these classes. Therefore, we either need more data for these classes or need
to augment the data further. We can use bounding boxes to extract features of DR
in images while testing on individual images and can show the probability of how
accurately the model is predicting.
11 Competing Interest
References
1. Healthline. https://www.healthline.com/human-body-maps/retina#1
2. https://www.medicalnewstoday.com/articles/183417?c=1338628189797
3. Webmd. https://www.webmd.com/diabetes/diabetic-retinopathy
4. Eye 7. https://www.eye7.in/retina/diabetic-retinopathy/
5. Alyoubi WL, Shalash WM, Abulkhair MF (2020) Diabetic retinopathy detection through deep
learning techniques: a review. Inf Med Unlocked 20(2020):100377
6. Sisodia DS, Nair S, Khobragade P (2017) Diabetic retinal fundus images: pre-processing and
feature extraction for early detection of diabetic retinopathy. Biomed Pharmacol J 10(2):615–
626
7. Mateen MM, Wen J, Hassan M, Nasrullah N, Sun S, Hayat S (2020) Automatic detection of
diabetic retinopathy: a review on datasets. Meth Eval 8
8. Johri A et al (2021) Enhancement of retinal images using morphological filters. Data
engineering and intelligent computing. Springer, Singapore
9. Bhadauria AS, Nigam M, Arya A, Bhateja V (2018) Morphological filtering-based enhance-
ment of MRI. In: Proceedings of 2nd international conference on computing, communication
and control technology (IC4T), Lucknow, (U.P.), India, pp 54–56
10. Kaggle. https://www.kaggle.com/c/aptos2019-blindness-detection/data?select=train_images
Performance Evaluation of MLP
and CNN Models for Flood Prediction
Abstract Accurate and reliable forecasts with an appropriate lead-time affect oper-
ational flood control systems for making required arrangements against floodings.
Developing a suitable artificial intelligence (AI) model for flood forecasting poses
a severe challenge in terms of interpretability and accuracy. Due to nonlinearity
and uncertainty of floods, prevailing hydrological solutions consistently attain less
prediction robustness. Thus, present work developed a flood model utilising a convo-
lution neural network (CNN) to move forward from artificial neural network (ANN)
that has been broadly applied for developing flood models to secure diversity and
establish model’s suitability. The mean squared error (MSE) and Willmott index
(WI) of CNN were 1.743 and 0.9878, respectively, representing an excellent overall
model performance in flood prediction. The conclusive results indicated that CNN
generated improved forecasting results than MLP models and can be recommended
for monthly flood forecasting. Using commonly accessible data of the region crucial
for prediction, the outcomes would be helpful for real-time flood forecasting, evading
complexity of physical procedures.
I. S. Macharyulu · A. Ray
Department of Civil Engineering, GIET University, Bhubaneswar, Odisha, India
e-mail: [email protected]
D. P. Satapathy · S. Samantaray (B)
Department of Civil Engineering, OUTR, Bhubaneswar, Odisha, India
e-mail: [email protected]
D. P. Satapathy
e-mail: [email protected]
A. Sahoo
Department of Civil Engineering, NIT Silchar, Assam, India
N. R. Mohanta
Department of Civil Engineering, NIT Raipur, Raipur, Chhattisgarh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 273
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_26
274 I. S. Macharyulu et al.
1 Introduction
Of several natural disasters, floods are the most dangerous since they often cause loss
of life and assets each year and destroy settlements, farmland, and roads worldwide
[1–3]. It was reported that during 2011–2012, the flood disasters influenced about
200 million people, and total indirect damages were around 95 billion dollars. India,
in particular, has been facing frequent floods for so long, and this disaster causes huge
amounts of property losses and fatalities. Hence, it is vital to predict floods and recog-
nise the susceptibility zones [4]. Floodings are a complex and nonlinear process; as
a result, it is impossible to prevent floods entirely [2, 5, 6]. However, we can predict
future flood occurrences and help in mitigating human and economic damages. ANNs
are nonlinear nonparametric regression systems [7–9] known as ‘universal approx-
imators’, i.e., if provided with an adequate number of hidden neurons, they have
the ability to approximate every continuous function when trained with an informa-
tive data set [10, 11]. CNN is among the most prevalent models utilised today. This
computational neural network model utilises a variation of MLPs and comprises one
or additional convolutional layers which can either be entirely linked or pooled.
Tiwari and Chatterjee [12] explored ability of bootstrapping and wavelet methods
by applying a hybrid bootstrap–wavelet–ANN (BWANN) model to develop a reliable
and accurate model for forecasting hourly flood magnitude of Mahanadi river basin,
India. Results revealed that robust BWANN model generated better outcomes than
other applied models in their study. Kim and Singh [13] developed and applied
MLP, GRNN, and SOM for flood forecasting at Sangye site of Bocheong stream
watershed, Republic of Korea. Their findings revealed that SOM forecasted flood
discharge more accurately than MLP and GRNN during testing period. Hong and
Hong [14] evaluated the usage of MLP neural network to forecast water levels of a
gauge station positioned at Kuala Lumpur city centre, Malaysia. Phitakwinai et al.
[15] implemented MLP-CS (cuckoo search) for predicting the water level 7 h ahead to
develop a flood model of the River Ping, Thailand. Results indicated that the
MLP-CS model performed better than the simple MLP model. Le et al. [16] suggested a long
short-term memory (LSTM) model for flood forecasting using daily discharge and
rainfall of Da River basin, Vietnam, as input data. Wang et al. [17] introduced CNN to
evaluate flood susceptibility in Shangyou County, China and compared the obtained
results with conventional support vector machine (SVM) classifier. They concluded
that CNN could help manage and mitigate floods. Suddul et al. [18] proposed and
evaluated MLP, MLP-GA (genetic algorithm), MLP-BA (bat algorithm), and MLP-
BA-GA models for automated and real-time river flood prediction. They found that
MLP-BA-GA model provided improved river flood prediction with better accuracy
and reliability. Duan et al. [19] developed a temporal CNN model for predicting long-
term streamflow in California using catchment characteristics. Results of the developed
model were compared with LR, RNN, ANN and LSTM models, which showed the
ability and potential of temporal CNN model. Song [20] used CNN to develop a runoff
model for Heuk River, South Korea. He found great potential in implementing the
CNN model with better results and accuracy.
Performance Evaluation of MLP and CNN Models for Flood Prediction 275
2 Study Area
Subarnarekha river basin lies between 21° 33' N to 23° 32' N and 85° 09' E to 87°
27' E covering 19,300 km2 area and originating near Nagri village in Ranchi district
(Fig. 1). Present study area covers the central and lower watersheds of the river
basin. Subarnarekha flows through extreme southwestern regions of West Medinipur
district (West Bengal) and easternmost regions of Baleswar and Mayurbhanj districts
of Odisha. Average annual precipitation fluctuates between 1150 and 1500 mm, with
most rain experienced from June to October. In winter, minimum temperature is as
low as 8 °C, whereas during summer season, temperature ranges from 40 to 45 °C.
3 Methodology
3.1 MLP
Rumelhart et al. [21] developed a feed-forward MLP network usually applied for
problems associated with pattern mapping. The MLP network utilised in present
study comprises a sensory unit set that establishes an input layer, one or more hidden
layers with computational neurons, and an output layer [22]. A neuron contains a
single output with multiple inputs. The basic equation representing net in an MLP
network is as follows:
net = Σ_i x_i w_i − b   (1)

where w denotes the weights, b the bias, and x the inputs. The neuron's output,
f(net), is then determined by an activation function that defines the node's
response to the input signal it receives.
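A minimal sketch of the neuron computation in Eq. (1); the sigmoid activation and the toy weights are illustrative assumptions, since the text does not fix an activation function:

```python
import math

def neuron(inputs, weights, bias):
    """net = sum(x_i * w_i) - b, per Eq. (1), followed by an activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) - bias
    return 1.0 / (1.0 + math.exp(-net))   # sigmoid activation f(net)

out = neuron([1.0, 2.0], [0.5, 0.25], 1.0)   # net = 0.5 + 0.5 - 1.0 = 0
```

A net of exactly zero maps to 0.5 under the sigmoid, which makes this toy case easy to check by hand.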
3.2 CNN
Through local connections, shared weights, and pooling, a CNN is differentiated
from a conventional NN. The primary idea of a CNN is that the input data are images
or can be interpreted as images. This significantly reduces the number of
parameters, which results in faster processing. A CNN is an optimal architecture
for detecting patterns in 1D and 2D data, as it can be customised based on the
application's number and kind
Discharge data (Dt ) of monsoon season (June–October) are collected from CWC,
Guwahati, for a period of 1988–2019. The data collected from 1988–2011 (75%
of data) are utilised for training and from 2012–2019 (25%) are utilised for testing
the models. Three metrics, WI, R², and MSE, are applied to evaluate the model
performance.
MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²   (2)

WI = 1 − [ Σ_{i=1}^{N} (Y_i − Ŷ_i)² / Σ_{i=1}^{N} (|Ŷ_i − Ȳ| + |Y_i − Ȳ|)² ]   (3)

where Y_i is the observed discharge, Ŷ_i the predicted discharge, and Ȳ the mean of
the observed values.
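A small pure-Python sketch of Eqs. (2) and (3); the discharge values below are toy numbers, not data from this study:

```python
def mse(y, y_hat):
    """Mean squared error, Eq. (2)."""
    n = len(y)
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n

def willmott_index(y, y_hat):
    """Willmott index, Eq. (3); 1.0 means perfect agreement."""
    y_bar = sum(y) / len(y)
    num = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    den = sum((abs(b - y_bar) + abs(a - y_bar)) ** 2 for a, b in zip(y, y_hat))
    return 1.0 - num / den

observed = [10.0, 12.0, 14.0, 16.0]   # toy monthly discharge values
perfect = list(observed)              # a perfect forecast
noisy = [11.0, 11.0, 15.0, 15.0]      # a forecast off by 1 everywhere
```

A perfect forecast gives MSE 0 and WI 1, while the noisy forecast gives MSE 1 and a WI between 0 and 1, matching the interpretation of WI as an agreement index.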
The present study aims to demonstrate the practicability of a hybridised AI model
for flood forecasting. Monthly data of the Jamsolaghat meteorological station were
utilised for developing the proposed CNN model, whose performance was then compared
against MLP. Before the forecasting procedure, correlation analysis is used to
determine the relevant lag times of the forecasting matrix for constructing the
predictors. Table 1 summarises five input combinations assimilated with four
different lag times. For more descriptive
Fig. 4 Predicted versus actual monthly flood discharge of MLP and CNN models
5 Conclusion
One of the most complex and challenging problems in hydrology is flood forecasting.
However, because of its critical contribution to reducing loss of life and economic
losses, it is also one of the most significant aspects of hydrology. From the
standpoint of providing reliable forecasts while avoiding the difficulty of
physical processes, CNN and ANN models were applied in this study to predict floods
at a specific location. This work recommends the potential of using the CNN model
in the hydrological field to construct and manage real-time flood warning systems.
Results indicated that the accuracy of CNN was better than MLP for monthly flood
forecasting in the study area. With this knowledge, estimating future river-system
floods is promising, utilising past rainfalls and water levels from stations
without any comprehensive data requirements.
References
1. Sahoo A, Ghose DK (2021) Flood frequency analysis for menace gauging station of Mahanadi
River, India. J Inst Eng (India): Series A, pp 1–12
2. Sahoo A, Samantaray S, Ghose DK (2021) Prediction of flood in Barak River using hybrid
machine learning approaches: a case study. J Geol Soc India 97(2):186–198
3. Samantaray S, Tripathy O, Sahoo A, Ghose DK (2020) Rainfall forecasting through ANN
and SVM in Bolangir watershed, India. In: Smart intelligent computing and applications, pp
767–774. Springer, Singapore
4. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and
hybrid neural network method: Mahanadi River basin, India. J Geol Soc India 97(8):867–880
5. Samantaray S, Sahoo A (2019) Estimation of flood frequency using statistical method:
Mahanadi River basin, India. H2Open J 3(1):189–207
6. Sahoo A, Samantaray S, Paul S (2021b) Efficacy of ANFIS-GOA technique in flood prediction:
a case study of Mahanadi river basin in India. H2Open J 4(1):137–156
7. Samantaray S, Ghose DK (2018) Dynamic modelling of runoff in a watershed using artifi-
cial neural network. In: Smart intelligent computing and applications, pp 561–568. Springer,
Singapore
8. Samantaray S, Ghose DK (2020) Modelling runoff in an arid watershed through integrated
support vector machine. H2Open J 3(1):256–275
9. Samantaray S, Ghose DK (2021) Prediction of S12-MKII rainfall simulator experimental runoff
data sets using hybrid PSR-SVM-FFA approaches. J Water Clim Change. https://doi.org/10.
2166/wcc.2021.221
10. Sahoo A, Samantaray S, Bankuru S, Ghose, DK (2020) Prediction of flood using adaptive
neuro-fuzzy inference systems: a case study. In: Smart intelligent computing and applications,
pp 733–739. Springer, Singapore
11. Sahoo A, Singh UK, Kumar MH, Samantaray S (2021c) Estimation of flood in a river basin
through neural networks: a case study. In: Communication software and networks, pp 755–763.
Springer, Singapore
12. Tiwari MK, Chatterjee C (2010) Development of an accurate and reliable hourly flood
forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J Hydrol
394(3–4):458–470
13. Kim S, Singh VP (2013) Flood forecasting using neural computing techniques and conceptual
class segregation. JAWRA J American Water Resour Assoc 49(6):1421–1435
14. Hong JL, Hong K (2016) Flood forecasting for Klang river at Kuala Lumpur using artificial
neural networks. Intl J Hybrid Inf Technol 9(3):39–60
15. Phitakwinai S, Auephanwiriyakul S, Theera-Umpon N (2016) Multilayer perceptron with
cuckoo search in water level prediction for flood forecasting. In: 2016 international joint
conference on neural networks (IJCNN). IEEE, pp 519–524
16. Le XH, Ho HV, Lee G, Jung S (2019) Application of long short-term memory (LSTM) neural
network for flood forecasting. Water 11(7):1387
17. Wang Y, Fang Z, Hong H, Peng L (2020) Flood susceptibility mapping using convolutional
neural network frameworks. J Hydrol 582:124482
18. Suddul G, Dookhitram K, Bekaroo G, Shankhur N (2020) An evolutionary multilayer percep-
tron algorithm for real time river flood prediction. In: 2020 zooming innovation in consumer
technologies conference (ZINC). IEEE, pp 109–112
19. Duan S, Ullrich P, Shu L (2020) Using convolutional neural networks for streamflow projection
in California. Frontiers Water 2:28
20. Song CM (2020) Hydrological image building using curve number and prediction and
evaluation of runoff through convolution neural network. Water 12(8):2292
21. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error
propagation. Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
22. Mohanta NR, Panda SK, Singh UK, Sahoo A, Samantaray S (2022) MLP-WOA is a successful
algorithm for estimating sediment load in Kalahandi gauge station, India. In: Proceedings of
international conference on data science and applications, pp 319–329. Springer, Singapore
23. Zhang C, Sargent I, Pan X, Li H, Gardiner A, Hare J, Atkinson PM (2018) An object-based
convolutional neural network (OCNN) for urban land use classification. Remote Sens Environ
216:57–70
24. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
25. Ghorbanzadeh O, Blaschke T, Gholamnia K, Meena SR, Tiede D, Aryal J (2019) Evaluation
of different machine learning methods and deep-learning convolutional neural networks for
landslide detection. Remote Sens 11(2):196
Bidirectional LSTM-Based Sentiment
Analysis of Context-Sensitive Lexicon
for Imbalanced Text
P. Krishna Kishore, K. Prathima, Dutta Sai Eswari, and Konda Srikar Goud
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 283
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_27
284 P. Krishna Kishore et al.
1 Introduction
2 Related Work
It is well known that subjective text contains many opinions and is rarely
objective. Sentiment analysis is suited to evaluating whether such qualitative text
is positive, negative, or neutral. A review is a brief document; as a result, the
authors regard review classification as paragraph-level sentiment analysis. To
begin, this section describes machine learning techniques that have been used in
the literature to handle sentiment classification. Learning techniques for
sentiment analysis are classified into four categories: semi-supervised, ensemble
learning, supervised, and unsupervised.
Machine learning classifiers use training data to identify an effective set of features for sentiment categorisation. Supervised machine learning algorithms require a substantial amount of labelled training data for this purpose. Many supervised algorithms, including Naive Bayes, support vector machines, and maximum entropy classification, have been used to classify movie reviews. Each movie review is treated as a single document when determining its overall polarity [15]. The bag-of-words model, which counts the occurrences of terms and phrases in a review, represents each review as a feature vector. POS tags and classifier attributes are used to create feature-based representations of sentiment orientation [16]. The dominant set of n-gram features is selected using semantic information and syntactic interactions between n-grams [17]. Bespalov et al. [18] used supervised latent n-gram analysis to construct a word embedding model for sentiment categorisation. The authors compared support vector machines, character-based n-grams, and Naive Bayes for supervised machine learning.
A neural network-based approach has been investigated for encoding the sentiment characteristics of tweets into word representations [19]. Two supervised algorithms, SVM and ANN, have been combined to classify movie reviews: first, SVM assigns a sentiment value to every feature and determines which features are the most informative; ANN [20] is then employed to perform the final classification. Using four feature selection strategies (IG, GR, CHI, and DF) and five supervised models (NB, KNN, DT, SVM, and RBFNN), Liu et al. [21] examined the performance of feature selection methods in the context of multi-class sentiment classification. Zhang and colleagues developed an architecture that recognises long-term dependencies in sentence and document modelling without the use of syntax [22]. A bidirectional long short-term memory approach has been used to capture global semantic information and compositional relations [23]. Recent years have seen the use of CNN models [24, 25] and recurrent neural networks [26, 27] for sentence-level sentiment analysis.
Unsupervised sentiment classification methods do not require labelled training data. Sentiment reviews have been analysed using a sentiment lexicon that contains a list of well-known sentiment words along with their orientation; the authors refer to these strategies as lexicon-based methodologies. Turney and Littman [28] used the hit counts returned by a search engine to assess a word's sentiment orientation, while WordNet-based work [29] used the shortest distance between a word and the bipolar seed terms ("outstanding" and "terrible"). By applying similarity relations in WordNet to phrases and documents, Missen and Boughanem [30] derived sentiment scores.
Semi-supervised learning uses both labelled and unlabelled data. These algorithms propagate sentiment labels outward from a limited portion of labelled training data [32]. To extend sets of adjectives, Hatzivassiloglou and McKeown [32] employed a graph-based classification scheme together with a large number of association rules. Zhu and Ghahramani [33] proposed a label propagation algorithm that iteratively updates labels throughout a data set. Using a general-purpose lexicon, He and Zhou [34] developed a framework for merging prior lexicon knowledge with sentiment classifier information. By employing a simple majority-voting strategy on confidently classified instances, Zhang and He [35] aim to reduce the number of misclassified cases. An approach based on self-training over multiple feature subspaces was proposed by Gao et al. [36] for identifying appropriate features and informative examples for automatic labelling. Unsupervised and supervised information are combined in a semi-supervised framework described by da Silva et al. [37]. A technique for calculating conceptual similarities between document sentences is presented by Tai and Kao [38]. According to Hamilton et al. [39], domain-specific sentiment lexicons can be induced from corpora by combining label propagation with word representations that are specific to the topic in question.
A large number of randomly generated feature subspaces was used by Li et al. [47] to address the imbalanced sentiment classification challenge. Song et al. [48] suggested bi-directional clustering-based sampling algorithms to categorise unbalanced text data. The SMOTE algorithm combines oversampling of the minority class with undersampling of the majority class. Classifier performance on severely unbalanced Twitter post data was studied by Prusa et al. [49] using data sampling in the selection, construction, and boosting stages of the classification process. Distributional random oversampling of the minority class for text classification was examined by Moreo et al. [50]; this approach assesses the distribution of a feature across a large corpus of textual data [46] and operates in a distributional vector space.
3 Proposed Methodology
In this section, the authors describe a framework that handles the task of highly imbalanced text sentiment analysis. The method described in [23] is used to calculate the sentiment score of reviews in domain-specific corpora. In general, domain-specific data provides the semantic meaning of words in the context of the domain. To capture the local and nonlocal semantics of reviews, a bidirectional LSTM is used. Consider a review R = wo_1, wo_2, wo_3, …, wo_n, which has n words. The authors identify the subjective words wo_1^L, wo_2^L, wo_3^L, …, wo_m^L in review R using a lexicon resource L. Then, as shown in Eq. (1), the sentiment value of review R is computed. The sentiment score of subjective word wo_ij^L is denoted S_value(wo_ij^L). The sentiment weight and bias of the review R are represented by the factors α_ij and b, respectively. Because a review may contain several paragraphs, the authors treat it as a single paragraph; as a consequence, there is only one bias term throughout the analysis.
S_value(R) = Σ_{j=1}^{m} α_ij ∗ S_value(wo_ij^L) + b    (1)
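Eq. (1) can be sketched directly in code. The lexicon entries, weights α, and bias b below are illustrative values, not the paper's learned parameters:

```python
# Sketch of Eq. (1): review score as a weighted sum of lexicon word scores
# plus a single bias term. All numeric values here are illustrative.

def review_sentiment_score(review_words, lexicon, alphas, bias):
    """Sum alpha_j * S_value(wo_j) over subjective words found in the lexicon."""
    score = bias
    for word in review_words:
        if word in lexicon:
            score += alphas.get(word, 1.0) * lexicon[word]
    return score

lexicon = {"great": 0.8, "boring": -0.6}   # S_value per subjective word
alphas = {"great": 1.2, "boring": 0.9}     # sentiment weights alpha_ij
score = review_sentiment_score(
    ["a", "great", "but", "boring", "film"], lexicon, alphas, bias=0.05
)
```

Only the two lexicon words contribute; the remaining tokens are skipped.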
The complete model of the proposed method is depicted in Fig. 1 (general framework of the proposed model). Three distinct layers make up the model. (i) The embedding layer is in charge of transforming incoming words into dense, real-valued vectors; it sits between the input data and the bidirectional LSTM layer. (ii) The bidirectional LSTM layer processes the input data (the entire review) across time steps; it extracts local and nonlocal semantic information from the input by combining past and future context at each time step. One LSTM reads the entire input from left to right, while another LSTM reads the same data in the opposite direction (right to left). (iii) Finally, logistic regression is applied on top of the output nodes to produce the text sentiment score.
Hochreiter and Schmidhuber (1997) proposed LSTM to solve the vanishing gradient problem in recurrent neural networks. LSTM introduces a series of gates that determine how much information must be kept from the previous state and how features are extracted from the current input. A BLSTM is a double-layer LSTM: the first LSTM processes the input data token by token from left to right, and the second layer encodes the input from right to left. The BLSTM model used in this study is shown in Fig. 2. BLSTM extracts features from an input sequence word by word. It also has a memory cell c_ts. These elements direct the information flow from the past inputs x_1, x_2, x_3, …, x_{ts−1} and hidden states h_1, h_2, h_3, …, h_{ts−1} to the present state h_ts and output gate o_ts. Each LSTM cell has three gates; at each time stamp ts, the memory cell c_ts, input gate i_ts, forget gate f_ts, and output gate o_ts are updated in the following way:
f_ts = 1 − i_ts    (3)

g_ts = tanh(W_g [x_ts, h_{ts−1}] + b_g)    (4)

c_ts = f_ts ∗ c_{ts−1} + i_ts ∗ g_ts    (5)
The embedded dense real-valued vector of the input word wo_ts is denoted by x_ts. The input gate applies a sigmoid function (σ) parameterised by W_i, where W_i is the concatenation of the parameters w_i, p_i, and q_i — the weight vectors of the input word, hidden state, and memory cell, respectively. W_g is defined as the concatenation of the parameters w_g and p_g. ∗ represents the point-wise multiplication operation. W_o is defined as the concatenation of the parameters w_o and p_o. Finally, b_i, b_g, and b_o are the bias parameters of i_ts, g_ts, and o_ts, respectively.
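The gate updates above can be sketched in NumPy. The coupled forget gate of Eq. (3), the tanh candidate of Eq. (4), and the cell update of Eq. (5) are taken from the text; the sigmoid input/output gates and the weight shapes are standard assumptions, not the paper's exact parameterisation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_ts, h_prev, c_prev, Wi, bi, Wg, bg, Wo, bo):
    """One coupled-gate LSTM time step over the concatenated [x_ts, h_{ts-1}]."""
    z = np.concatenate([x_ts, h_prev])  # [x_ts, h_{ts-1}]
    i_ts = sigmoid(Wi @ z + bi)         # input gate (assumed sigmoid)
    f_ts = 1.0 - i_ts                   # Eq. (3): coupled forget gate
    g_ts = np.tanh(Wg @ z + bg)         # Eq. (4): candidate cell state
    c_ts = f_ts * c_prev + i_ts * g_ts  # Eq. (5): memory cell update
    o_ts = sigmoid(Wo @ z + bo)         # output gate (assumed sigmoid)
    h_ts = o_ts * np.tanh(c_ts)         # hidden state
    return h_ts, c_ts

# Tiny deterministic demo: 3-dim input, 2-dim hidden state.
h_ts, c_ts = lstm_step(
    np.ones(3), np.zeros(2), np.zeros(2),
    Wi=np.full((2, 5), 0.1), bi=np.zeros(2),
    Wg=np.full((2, 5), 0.1), bg=np.zeros(2),
    Wo=np.full((2, 5), 0.1), bo=np.zeros(2),
)
```

A BLSTM runs one such recurrence left to right and a second one right to left, then concatenates the two hidden states per word.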
To compute the sentiment weights α_ij, each BLSTM layer generates a hidden state vector for each input word. The hidden states produced by the left-to-right and right-to-left LSTM layers are denoted by h_l and h_r, respectively. In this part of the model, the authors calculate the overall sentiment score of the review. To determine the weights, the authors leverage the hidden state vectors h_l and h_r for each subjective word wo_ij in the review R.
The following equations are used to calculate sentiment weights.
F_ij^R = σ(W_p^R · H_ij + b_p)    (8)

β_ij^R = W_pw · F_ij^R + b_w    (9)
The review's overall bias score R_bias is calculated by the authors using the same methodology. R_bias is calculated once for each review and reflects the whole review. The authors then leverage R_base and R_bias to calculate the overall sentiment score of the review R using the equation below. φ is a factor with a value between 0 and 1, which is computed as follows.
φ = σ(W_φ H_B + b_φ)    (14)

prob_Ri = σ(S_value(R))    (15)
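As a hedged sketch of Eqs. (14)–(15): a gate φ in (0, 1) weighs R_base against R_bias, and a sigmoid maps the combined score to a probability. The convex combination below is an assumption — the paper's exact composition of the two scores is not fully recoverable from the text:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def review_probability(r_base, r_bias, phi):
    # Assumed combination of lexicon-based score and review-level bias.
    s_r = phi * r_base + (1.0 - phi) * r_bias
    return sigmoid(s_r)  # Eq. (15): probability via sigmoid

# Eq. (14) with a scalar stand-in for W_phi * H_B + b_phi = 0.4.
phi = sigmoid(0.4)
p = review_probability(r_base=1.5, r_bias=-0.2, phi=phi)
```

Values of p above 0.5 would be read as positive-class predictions.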
Existing oversampling approaches and algorithms are based on the principle of generating new synthesised points from a numeric data distribution. Traditional methods for dealing with unbalanced data have been successful in a wide range of applications, but they are restricted to numeric value distributions. To address the issue of unequal class distribution, the researchers use a new text-based oversampling approach [51] built on inversion and imitation. This method generates new synthetic texts by inverting the class of selected examples and imitating minority-class examples. More specifically, the technique balances the data by generating new texts in the minority class, producing each text according to the minority-class distribution. Note that the negative class is treated as the minority class, whereas the positive class is treated as the majority class.
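A hedged sketch of the inversion-and-imitation idea: new minority-class (negative) texts are synthesised by flipping the polarity words of sampled majority-class (positive) reviews. The antonym table and sampling scheme are illustrative, not the exact procedure of [51]:

```python
import random

# Illustrative polarity-inversion table; a real system would use a lexicon.
ANTONYMS = {"good": "bad", "great": "terrible", "love": "hate"}

def invert_text(words):
    """Flip known polarity words to synthesise an opposite-class example."""
    return [ANTONYMS.get(w, w) for w in words]

def oversample_minority(majority_texts, n_needed, seed=0):
    """Create n_needed synthetic minority-class texts from majority samples."""
    rng = random.Random(seed)
    picks = [rng.choice(majority_texts) for _ in range(n_needed)]
    return [invert_text(t) for t in picks]

synthetic = oversample_minority([["i", "love", "this", "good", "film"]], 2)
```

Each synthetic text inherits the structure of a positive review but carries negative-class polarity words, balancing the training set.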
4 Experimental Study
In this section, the authors first present the experimental settings, followed by the results, which confirm the performance of the proposed method.
Four Amazon multi-domain review data sets and the SemEval-2013 Twitter data set (subtask B) are used to validate the proposed model. Electronics, DVD, kitchen, and books are the four Amazon multi-domain review domains analysed. To produce a highly unbalanced variant of the Amazon review data sets, the researchers used the same distribution rule as in [51]. Each domain has 1000 positive and 400 negative samples, so the imbalance ratio of each Amazon multi-domain review data set is 1:2.5. There are 6400 Twitter posts in the SemEval-2013 Twitter data set. The authors used 3550 tweets as the training set (positive: 2600 and negative: 950) and 2850 tweets as the test set (positive: 1950 and negative: 900); the SemEval-2013 Twitter data set has an imbalance ratio of 1:2.45. Stop words, punctuation symbols, and unexpected words were removed, and every word was converted to lowercase. The authors identified the subjective words and their scores using SentiWordNet 3.0 [52], a lexicon resource specifically developed for sentiment classification applications. In SentiWordNet 3.0, the scores of lexicon words range from 0 to 1.
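The preprocessing steps described above (stop-word, punctuation, and rare-word removal plus lowercasing) can be sketched in plain Python. The stop-word list and the optional vocabulary filter are illustrative stand-ins; a real pipeline would then look up the surviving tokens in SentiWordNet 3.0:

```python
import string

# Illustrative stop-word list; production code would use a full list.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of"}

def preprocess(text, vocabulary=None):
    """Lowercase, strip punctuation, drop stop words and unexpected tokens."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        if not word or word in STOP_WORDS:
            continue
        if vocabulary is not None and word not in vocabulary:
            continue  # drop unexpected / out-of-vocabulary words
        tokens.append(word)
    return tokens

tokens = preprocess("The plot is Great, and the ending is terrible!")
```

On the sample sentence, only the content words survive, lowercased and stripped of punctuation.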
Evaluation metrics for imbalanced data sets differ from those used for balanced data sets. In this article, the authors use accuracy and area under the ROC curve (AUC) as performance measures. The authors employed the same techniques as [53] and [51] to evaluate the efficacy of the proposed method. In terms of accuracy, the proposed model is also evaluated by comparison with the SE-HyRank lexicon [54], the cluster UC, and the cost-sensitive algorithms [55]. For the four Amazon multi-domain review data sets, the proposed model outperforms the comparison algorithms in terms of accuracy, as is clear from Table 1. On the SemEval-2013 Twitter data set, however, the proposed approach achieved an F1-score of 81.2% and lagged slightly behind the best comparison methods. The F1-score, defined in Eq. (16), is used as the metric for the SemEval-2013 Twitter data set.
F1score = 2 ∗ (Precision ∗ Recall) / (Precision + Recall)    (16)
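Eq. (16) translates directly to code, with a guard for the degenerate case:

```python
# Eq. (16): F1 is the harmonic mean of precision and recall.
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

f1 = f1_score(0.80, 0.75)
```

For precision 0.80 and recall 0.75 this yields roughly 0.774, i.e. 77.4%.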
Table 1 Accuracy and F1-score results for the state of the art and the proposed model

Model               Books ACC   DVD ACC   Electronics ACC   Kitchen ACC   SemEval-2013 F1
SE-HyRank lexicon   –           –         –                 –             81.7
Cluster UC          71.3        73.1      79.7              80.3          –
Cost sensitive      66.3        71.2      74.8              77.3          –
Unified model       78.5        75.5      78.6              81.1          81.5
Proposed model      79.2        76.4      80.0              81.9          81.2
The ROC curves for the four Amazon multi-domain data sets (books, DVD, electronics, and kitchen) are depicted in Figs. 3, 4, 5 and 6. The authors achieved an accuracy (ACC) of 79.2% for the books data set, 76.4% for the DVD data set, 80.0% for the electronics data set, and 81.9% for the kitchen data set. The AUC was 77% for the books data set, 85% for the DVD data set, 87.5% for the electronics data set, and 82% for the kitchen data set.
5 Conclusion
The authors proposed a method for the task of imbalanced text sentiment analysis. It evaluates a review's sentiment value as a weighted sum of the sentiment scores of its subjective words. To identify subjective terms in reviews, the method makes use of a lexicon resource, and it uses a bidirectional LSTM to determine the sentiment value of reviews. The goal of using a BLSTM is to determine sentiment weights for subjective words using both local and nonlocal semantic text features. The method employs logistic regression on top of the BLSTM to compute the final sentiment prediction. Finally, using a text-based oversampling method, the authors generate additional training samples by inverting class membership and imitating the underrepresented class.
References
1. Mishne G, Glance NS (2006) Predicting movie sales from blogger sentiment. In: AAAI spring
symposium: computational approaches to analyzing weblogs, pp 155–158
2. Dave K, Lawrence S, Pennock DM (2003) Mining the peanut gallery: opinion extraction and
semantic classification of product reviews. In: Proceedings of the 12th international conference
on World Wide Web, pp 519–528
3. Godbole N, Srinivasaiah M, Skiena S (2007) Large-scale sentiment analysis for news and blogs.
Icwsm 7(21):219–222
4. Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8
5. Goldsmith RE, Horowitz D (2006) Measuring motivations for online opinion seeking. J Interact
Advert 6(2):2–14
6. Pk MR (2018) Role of sentiment classification in sentiment analysis: a survey. Ann Libr Inf
Stud (ALIS) 65(3):196–209
7. Andreevskaia A, Bergler S (2008) When specialists and generalists work together: overcoming
domain dependence in sentiment tagging. In: Proceedings of ACL-08: HLT, pp 290–298
8. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a
survey. Ain Shams Eng J 5(4):1093–1113
9. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of
words and phrases and their compositionality. In: Advances in neural information processing
systems, pp 3111–3119
10. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation.
In: Proceedings of the 2014 conference on empirical methods in natural language processing
(EMNLP), pp 1532–1543
11. Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J
Mach Learn Res 3(Feb):1137–1155
12. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions.
Prog Artif Intell 5(4):221–232
13. Wang S, Yao X (2012) Multiclass imbalance problems: analysis and potential solutions. IEEE
Trans Syst Man Cybern Part B (Cybern) 42(4):1119–1130
14. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision.
CS224N project report. Stanford 1(12)
15. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine
learning techniques. In: Proceedings of the ACL-02 conference on empirical methods in natural
language processing. 10:79–86. Association for Computational Linguistics
16. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth
ACM SIGKDD international conference on knowledge discovery and data mining, pp 168–177
17. Abbasi A, France S, Zhang Z, Chen H (2010) Selecting attributes for sentiment classification
using feature relation networks. IEEE Trans Knowl Data Eng 23(3):447–462
18. Bespalov D, Bai B, Qi Y, Shokoufandeh A (2011) Sentiment classification based on supervised latent n-gram analysis. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 375–382
19. Ye Q, Zhang Z, Law R (2009) Sentiment classification of online reviews to travel destinations
by supervised machine learning approaches. Expert Syst Appl 36(3):6527–6535
20. Tripathy A, Anand A, Rath SK (2017) Document-level sentiment classification using hybrid
machine learning approach. Knowl Inf Syst 53(3):805–831
21. Liu Y, Bi JW, Fan ZP (2017) Multi-class sentiment classification: The experimental compar-
isons of feature selection and machine learning algorithms. Expert Syst Appl 80:323–339
22. Zhang R, Lee H, Radev D (2016) Dependency sensitive convolutional neural networks for
modeling sentences and documents. arXiv preprint arXiv:1611.02361
23. Teng Z, Vo DT, Zhang Y (2016) Context-sensitive lexicon features for neural sentiment analysis.
In: Proceedings of the 2016 conference on empirical methods in natural language processing,
pp 1629–1638
24. Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:
1408.5882
25. Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for
modelling sentences. arXiv preprint arXiv:1404.2188
26. Dong L, Wei F, Tan C, Tang D, Zhou M, Xu K (2014) Adaptive recursive neural network for
target-dependent twitter sentiment classification. In: Proceedings of the 52nd annual meeting
of the association for computational linguistics (vol 2: Short papers), pp 49–54
27. Liu P, Qiu X, Chen X, Wu S, Huang XJ (2015) Multi-timescale long short-term memory neural
network for modelling sentences and documents. In: Proceedings of the 2015 conference on
empirical methods in natural language processing, pp 2326–2335
28. Turney PD, Littman ML (2002) Unsupervised learning of semantic orientation from a hundred-
billion-word corpus. arXiv preprint cs/0212012
29. Kamps J, Marx M, Mokken RJ, De Rijke M (2004) Using WordNet to measure semantic
orientations of adjectives. In: LREC (vol 4, pp 1115–1118)
30. Missen MMS, Boughanem M (2009) Using wordnet’s semantic relations for opinion detection
in blogs. In: European conference on information retrieval, pp 729–733. Springer, Berlin,
Heidelberg
31. Fernández-Gavilanes M, Álvarez-López T, Juncal-Martínez J, Costa-Montenegro E, González-
Castaño FJ (2016) Unsupervised method for sentiment analysis in online texts. Expert Syst
Appl 58:57–75
32. Hatzivassiloglou V, McKeown KR (1997) Predicting the semantic orientation of adjectives. In:
Proceedings of the 35th annual meeting of the association for computational linguistics and
eighth conference of the European chapter of the association for computational linguistics, pp
174–181. Association for computational linguistics
33. Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation
34. He Y, Zhou D (2011) Self-training from labeled features for sentiment analysis. Inf Process
Manage 47(4):606–616
35. Zhang P, He Z (2013) A weakly supervised approach to Chinese sentiment classification using
partitioned self- training. J Inf Sci 39(6):815–831
36. Gao W, Li S, Xue Y, Wang M, Zhou G (2014) Semi-supervised sentiment classification with
self- training on feature subspaces. In: Workshop on Chinese lexical semantics, pp 231–239.
Springer, Cham
37. da Silva NFF, Coletta LF, Hruschka ER, Hruschka ER Jr (2016) Using unsupervised information
to improve semi-supervised tweet sentiment classification. Inf Sci 355:348–365
38. Tai YJ, Kao HY (2013) Automatic domain-specific sentiment lexicon generation with label
propagation. In: Proceedings of international conference on information integration and web-
based applications & services, pp 53–62
39. Hamilton WL, Clark K, Leskovec J, Jurafsky D (2016). Inducing domain-specific sentiment
lexicons from unlabeled corpora. In: Proceedings of the conference on empirical methods in
natural language processing. Conference on Empirical methods in natural language processing
(vol 2016, pp 595) NIH Public Access
40. Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment
classification. Inf Sci 181(6):1138–1152
41. Onan A, Korukoğlu S, Bulut H (2017) A hybrid ensemble pruning approach based on consensus
clustering and multi-objective evolutionary algorithm for sentiment classification. Inf Process
Manage 53(4):814–833
42. Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier
based on differential evolution algorithm for text sentiment classification. Expert Syst Appl
62:1–16
43. Onan A, Korukoğlu S, Bulut H (2016) Ensemble of keyword extraction methods and classifiers
in text classification. Expert Syst Appl 57:232–247
44. Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers.
Eng Appl Artif Intell 51:191–201
45. Lochter JV, Zanetti RF, Reller D, Almeida TA (2016) Short text opinion detection using
ensemble of classifiers and semantic indexing. Expert Syst Appl 62:243–249
46. Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-
imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
47. Li S, Wang Z, Zhou G, Lee SYM (2011). Semi-supervised learning for imbalanced sentiment
classification. In: Twenty-second international joint conference on artificial intelligence
48. Song J, Huang X, Qin S, Song Q (2016) A bi-directional sampling based on K-means method for
imbalance text classification. In: 2016 IEEE/ACIS 15th international conference on computer
and information science (ICIS). IEEE, pp 1–5
49. Prusa JD, Khoshgoftaar TM, Seliya N (2016) Enhancing ensemble learners with data sampling
on high- dimensional imbalanced tweet sentiment data. In: The twenty-ninth international flairs
conference
50. Moreo A, Esuli A, Sebastiani F (2016) Distributional random oversampling for imbalanced text
classification. In: Proceedings of the 39th international ACM SIGIR conference on research
and development in information retrieval, pp 805–808
51. Li Y, Guo H, Zhang Q, Gu M, Yang J (2018) Imbalanced text sentiment classification using
universal and domain-specific knowledge. Knowl-Based Syst 160:1–15
52. Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC, vol 10, pp 2200–2204
53. Loyola-González O, Martínez-Trinidad JF, Carrasco-Ochoa JA, García-Borroto M (2016)
Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced
databases. Neurocomputing 175:935–947
54. Tang D, Wei F, Qin B, Yang N, Liu T, Zhou M (2015) Sentiment embeddings with applications
to sentiment analysis. IEEE Trans Knowl Data Eng 28(2):496–509
55. Li S, Ju S, Zhou G, Li X (2012) Active learning for imbalanced sentiment classification. In:
Proceedings of the 2012 joint conference on empirical methods in natural language processing
and computational natural language learning, pp 139–148. Association for computational
linguistics
Improving Streamflow Prediction Using
Hybrid BPNN Model Combined
with Particle Swarm Optimization
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 299
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_28
300 N. M. Kumar et al.
1 Introduction
model to develop a seepage prediction model. Outcomes showed that the enhanced dam seepage model improves generalisation ability and nonlinear mapping capacity.
This study aims to develop an efficient streamflow forecasting model based on the BPNN-PSO model using historical monthly streamflow data of the Panposh gauging station on the Brahmani River, India, and to investigate its performance against the conventional BPNN model.
2 Study Area
Brahmani River basin lies between latitudes 20° 30' and 23° 36' N and longitudes 83° 52' and 87° 00' E and flows in the eastern region of India with a total catchment area of 39,313 km2 (Fig. 1). Brahmani is located between River Baitarani on the left and River Mahanadi on the right and has four separate sub-basins: Jaraikela, Tilga, Jenapur, and Gomlai. It receives a mean annual precipitation of 1305 mm, with maximum rain occurring during the four months of the southwest monsoon season (June–October). Temperature falls to a minimum of 4 °C in winter and reaches as high as 47 °C in summer.
3 Methodology
3.1 BPNN
Rumelhart et al. [28] developed BPNN on the basis of the error back-propagation algorithm (Fig. 2). BPNN is a commonly applied, efficient neural network comprising input, hidden, and output layers [29, 30]. The number of hidden nodes is governed by the complexity of the problem and is determined experimentally. The input I_j and output O_j of the j-th node are computed by
I_j = Σ_i w_ij O_i    (1)

O_j = f(I_j + θ_j)    (2)

where w_ij is the weight between an input node and a hidden node (or a hidden node and an output node), f(I_j + θ_j) is the activation function, and θ_j is the bias input to neuron j.
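A minimal sketch of Eqs. (1)–(2) for one layer of the network; the sigmoid activation is a common choice, since the text does not fix f:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """weights[j][i] connects input node i to node j; biases[j] is theta_j."""
    outputs = []
    for w_j, theta_j in zip(weights, biases):
        i_j = sum(w * o for w, o in zip(w_j, inputs))  # Eq. (1): I_j
        outputs.append(sigmoid(i_j + theta_j))         # Eq. (2): O_j
    return outputs

# Two inputs feeding two nodes, with illustrative weights and biases.
out = layer_forward([0.5, -1.0], [[0.2, 0.4], [-0.3, 0.1]], [0.0, 0.1])
```

Stacking such layers (input → hidden → output) and training the weights by error back-propagation gives the BPNN.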
3.2 PSO
where V_l and P_l are the velocity and position of the particle, respectively. In Eq. (3), the updated velocity is obtained from the best swarm position (P_g) and the personal best value (P_b) using the following relation:
V_{l+1} = a · V_l + c_1 · r_1 (P_l − P_b) + c_2 · r_2 (P_l − P_g)    (4)
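A hedged sketch of one PSO update step. It follows the canonical Kennedy–Eberhart formulation, in which velocity is attracted toward the personal best P_b and the swarm best P_g; the inertia weight a and acceleration coefficients c1, c2 are illustrative, not tuned for streamflow data:

```python
import random

def pso_step(position, velocity, p_best, g_best,
             a=0.7, c1=1.5, c2=1.5, rng=random.Random(0)):
    """One velocity/position update for a single particle (per dimension)."""
    new_vel, new_pos = [], []
    for x, v, pb, pg in zip(position, velocity, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        v_next = a * v + c1 * r1 * (pb - x) + c2 * r2 * (pg - x)
        new_vel.append(v_next)
        new_pos.append(x + v_next)  # position update
    return new_pos, new_vel

pos, vel = pso_step([0.0, 1.0], [0.1, -0.1], [0.5, 0.5], [1.0, 0.0])
```

In the hybrid model, each particle encodes a candidate set of BPNN weights, and the swarm minimises the network's prediction error.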
The present work applies two data-driven modelling tools, BPNN and BPNN-PSO, to develop runoff models on the basis of six different input combinations of predictor variables. Performance assessment results of both models are given in Table 1. Predicted values were computed using data from 30 years (1990–2019). Evaluation is done using NSE and RMSE values between observed data and predicted results. According to Table 1, when the flow predictions of the six scenarios are compared, the sixth scenario provides higher NSE values and lower RMSE values than the other scenarios. Similarly, for the BPNN-PSO model, scenario VI provides superior performance to the other five scenarios. Regarding the performance of the prediction models in Table 1, the BPNN-PSO model clearly outperformed the conventional BPNN method.
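The two evaluation statistics follow directly from their standard definitions; the observed and predicted series below are illustrative:

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted flows."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

obs = [10.0, 20.0, 30.0]
pred = [12.0, 19.0, 29.0]
r = rmse(obs, pred)
n = nse(obs, pred)
```

Higher NSE and lower RMSE together indicate the better model, which is how the six scenarios above are ranked.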
Assessment of the prediction models in terms of graphical representation is shown in Figs. 4, 5 and 6. The performance of the two proposed models for streamflow forecasting is compared using scatter plots, time-series plots, and violin plots. Figure 4 shows that the coefficient of determination (R2) between observed flow data and predicted values equals 0.92965 and 0.97583 for the BPNN and hybrid BPNN-PSO models, respectively.
Fig. 4 Scatter plot of actual versus predicted streamflow for the testing period
Fig. 6 Violin plots of observed streamflow data versus predicted data through standalone BPNN
and hybrid BPNN-PSO models
5 Conclusion
Predicting streamflow is vital to assess impending flood risks and evaluate and plan
flood mitigation actions. Generally, in hydrological modelling, sensitivity and uncer-
tainty are two essential considerations. The primary goal of this research was to
analyse the predictive abilities of hybrid BPNN-PSO algorithm for monthly stream-
flow forecasts. Compared with the BPNN model, the ANN models trained by PSO
algorithm obtain better forecasting results. Obtained results are validated utilising
different statistical measures, indicating that proposed model is computationally fast
and able to learn quickly. Present work only considered ANN-based modelling tech-
nique and utilised data from one gauging station. Future studies can focus on applying
different time-series modelling methods, and additional data from other stations may
be necessary for strengthening our conclusions.
References
7. Sahoo GB, Schladow SG, Reuter JE (2009) Forecasting stream water temperature using regres-
sion analysis, artificial neural network, and chaotic non-linear dynamic models. J Hydrol
378(3–4):325–342
8. Sattari MT, Yurekli K, Pal M (2012) Performance evaluation of artificial neural network
approaches in forecasting reservoir inflow. Appl Math Model 36(6):2649–2657
9. Mehr AD, Kahya E, Olyaie E (2013) Streamflow prediction using linear genetic programming
in comparison with a neuro-wavelet technique. J Hydrol 505:240–249
10. Gowda CC, Mayya SG (2014) Comparison of back propagation neural network and genetic
algorithm neural network for stream flow prediction. J Comput Environ Sci
11. Mehr AD, Kahya E, Şahin A, Nazemosadat MJ (2015) Successive-station monthly stream-
flow prediction using different artificial neural network algorithms. Int J Environ Sci Technol
12(7):2191–2200
12. Chen XY, Chau KW, Busari AO (2015) A comparative study of population-based optimization
algorithms for downstream river flow forecasting by a hybrid neural network model. Eng Appl
Artif Intell 46:258–268
13. Peng T, Zhou J, Zhang C, Fu W (2017) Streamflow forecasting using empirical wavelet
transform and artificial neural networks. Water 9(6):406
14. Gao G, Liu F, San H, Wu X, Wang W (2018) Hybrid optimal kinematic parameter identification
for an industrial robot based on BPNN-PSO. Complexity
15. Li X, Sha J, Wang ZL (2019) Comparison of daily streamflow forecasts using extreme learning
machines and the random forest method. Hydrol Sci J 64(15):1857–1866
16. Zhang X, Chen X, Li J (2020) Improving dam seepage prediction using back-propagation
neural network and genetic algorithm. Math Probl Eng
17. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: 1995 proceedings of the IEEE
international conference on neural networks, vol 4, pp 1942–1948
18. Cao Y, Zhang H, Li W, Zhou M, Zhang Y, Chaovalitwongse WA (2018) Comprehensive learning
particle swarm optimization algorithm with local search for multimodal functions. IEEE Trans
Evol Comput 23(4):718–731
19. Mohanta NR, Panda SK, Singh UK, Sahoo A, Samantaray S (2022) MLP-WOA is a successful
algorithm for estimating sediment load in Kalahandi Gauge Station, India. In: Proceedings of
International Conference on Data Science and Applications. Springer, Singapore, 319–329
20. Sridharam S, Sahoo A, Samantaray S, Ghose DK (2021) Assessment of flow discharge in a river basin through CFBPNN, LRNN and CANFIS. In: Communication software and networks. Springer, Singapore, pp 765–773
21. Samantaray S, Sahoo A (2021b) Modelling response of infiltration loss toward water table depth using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
22. Kisi O (2015) Streamflow forecasting and estimation using least square support vector regression and adaptive neuro-fuzzy embedded fuzzy c-means clustering. Water Resour Manage 29:5109–5127. https://doi.org/10.1007/s11269-015-1107-7
23. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and hybrid neural network method: Mahanadi River Basin, India. J Geol Soc India 97(8):867–880
24. Samantaray S, Sahoo A (2021c) Prediction of suspended sediment concentration using hybrid SVM-WOA approaches. Geocarto Int 1–27
25. Tien Bui D, Pham BT, Nguyen QP, Hoang ND (2016) Spatial prediction of rainfall-
induced shallow landslides using hybrid integration approach of Least-Squares Support Vector
Machines and differential evolution optimization: a case study in Central Vietnam. Int J Digital
Earth. 1077–1097
26. Samantaray S, Sahoo A, Ghose DK (2020) Infiltration loss affects toward groundwater fluctuation through CANFIS in arid watershed: a case study. In: Smart intelligent computing and applications. Springer, Singapore, pp 781–789
27. Nourani V, Komasi M, Mano A (2009) A Multivariate ANN-Wavelet Approach for Rainfall–
Runoff Modeling. Water Resour Manage 23(14):2877–2894
308 N. M. Kumar et al.
28. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536. https://doi.org/10.1038/323533a0
29. Samantaray S, Ghose DK (2020a) Modelling runoff in an arid watershed through integrated support vector machine. H2Open J 3(1):256–275
30. Samantaray S, Ghose DK (2020b) Assessment of suspended sediment load with neural networks in arid watershed. J Inst Eng (India) Ser A 101(2):371–380
Prediction of Pullout Resistance
of Geogrids Using ANN
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 309
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_29
310 I. S. Amacharyulu et al.
1 Introduction
2 Pullout Force
between soil and grid (δ). L and W are geogrid properties, and the remaining are filling material properties.
Pullout force can be determined experimentally using a pullout test apparatus. In this study, pullout test data were collected from the literature [5], where tests were performed on 35 geogrids with different lengths, widths, overburden pressures and soil–grid friction angles (δ). Here, δ is calculated using the procedure of [1] (δ = 2ϕ/3). A sample geogrid material is shown in Fig. 1.
For the development of the ANN model, the network needs input parameters for prediction of pullout force. Five input parameters are required: normal stress acting on the geogrid (q), length of embedment (L), width of geogrid (W), relative density of sand (Dr) and average friction angle between soil and geogrid (δ); the single output parameter is pullout force (P). Initially, the ANN was trained with 5 inputs, 4 hidden neurons and 1 output (5-4-1), and its performance was observed. As performance was not satisfactory, the hidden-layer neurons were increased one by one and performance was rechecked. Performance was most satisfactory with 10 hidden neurons, so a 5-10-1 architecture was adopted, shown in Fig. 2. The training algorithm used was Bayesian regularization, and the transfer function between the input and hidden layers was TANSIG, whereas PURELIN was used between the hidden and output layers [15–17].

Table 2 Untrained test data and comparison of experimental (P) and ANN-predicted (ANN-P) pullout force
S. No.  q      L      W      Dr   δ     P    ANN-P
1       1.265  29.5   10.38  37   43.7  740  735
2       1.265  18.5   10.38  84   45.4  493  511
3       1.265  12.25  10.38  87   52.2  415  430
4       1.265  18.25  10.38  42   33.7  320  317
5       1.841  29.5   10.38  40   34.6  777  766
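For readers who want to see the adopted architecture concretely, a minimal NumPy sketch of the 5-10-1 forward pass with TANSIG/PURELIN transfer functions follows. The weights here are random stand-ins, not the trained values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in weights for a 5-10-1 network; in the study these are
# learned with Bayesian regularization from the pullout-test cases.
W1, b1 = rng.normal(size=(10, 5)), rng.normal(size=10)  # input -> hidden
W2, b2 = rng.normal(size=(1, 10)), rng.normal(size=1)   # hidden -> output

def predict_pullout(x):
    """Forward pass: TANSIG (tanh) hidden layer, PURELIN (identity) output."""
    hidden = np.tanh(W1 @ x + b1)
    return (W2 @ hidden + b2).item()

# One input vector taken from Table 2: q, L, W, Dr, delta
x = np.array([1.265, 29.5, 10.38, 37.0, 43.7])
y_hat = predict_pullout(x)
```

With trained weights in place of the random ones, `predict_pullout` would return the pullout force in the units of the training data.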
After developing the ANN model successfully, it was applied to untrained new data collected from the above literature, presented in Table 2. The ANN-predicted pullout forces P were close to the experimental pullout force values, as shown in Fig. 3. To further assess the performance of the ANN, its predictions were compared using the statistical measures mean square error (MSE), coefficient of determination (R²), root mean square error (RMSE) and variance accounted for (VAF), calculated using Eqs. (1), (2), (3) and (4), respectively.
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y - y')^2 \quad [10] \quad (1)

R^2 = \left[ \frac{N \sum y\,y' - \sum y \sum y'}{\sqrt{\left[ N \sum y^2 - \left(\sum y\right)^2 \right]\left[ N \sum y'^2 - \left(\sum y'\right)^2 \right]}} \right]^2 \quad [11, 13] \quad (2)

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y - y')^2} \quad [12, 14] \quad (3)

\mathrm{VAF} = 100\left[ 1 - \frac{\operatorname{var}(y - y')}{\operatorname{var}(y)} \right] \quad (4)
where
y   pullout force from experiment
y'  pullout force from ANN
N   total number of cases.
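Eqs. (1)–(4) can be coded directly as a sanity check; run on the five Table 2 cases, this sketch reproduces the values reported below (MSE ≈ 140, RMSE ≈ 11.9, VAF ≈ 99.59).

```python
import numpy as np

def ann_metrics(y, y_pred):
    """MSE, R^2, RMSE and VAF exactly as defined in Eqs. (1)-(4)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    n = len(y)
    mse = np.sum((y - y_pred) ** 2) / n                      # Eq. (1)
    num = n * np.sum(y * y_pred) - np.sum(y) * np.sum(y_pred)
    den = np.sqrt((n * np.sum(y ** 2) - np.sum(y) ** 2)
                  * (n * np.sum(y_pred ** 2) - np.sum(y_pred) ** 2))
    r2 = (num / den) ** 2                                    # Eq. (2)
    rmse = np.sqrt(mse)                                      # Eq. (3)
    vaf = 100.0 * (1.0 - np.var(y - y_pred) / np.var(y))     # Eq. (4)
    return mse, r2, rmse, vaf

# Experimental vs. ANN-predicted pullout forces (lb) from Table 2
y_exp = [740, 493, 415, 320, 777]
y_ann = [735, 511, 430, 317, 766]
mse, r2, rmse, vaf = ann_metrics(y_exp, y_ann)
```

Note that Eq. (2) computes R² as the squared Pearson correlation between experimental and predicted values.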
It was observed that MSE was 140, a slightly high but acceptable value. The coefficient of determination R² was close to 1, RMSE was 11, and the VAF value was 99.58. These indices indicate that ANN performance was very close to the experimental values. The coefficient of determination R² can also be seen in Fig. 4.

Fig. 3 Experimental and ANN-predicted pullout resistance force (lb) for the five test cases

Fig. 4 Experimental vs. ANN-predicted pullout force values (lb)
After successfully developing the ANN model with 30 cases of data, the relative influence of the various factors contributing to pullout resistance was assessed from the residual weights between neurons. For this calculation, all input parameters must vary over the same range [3], which requires normalizing the training data. Accordingly, all training parameters were normalized to [0, 1] using Eq. (5).
\text{Normalized } X = \frac{X - \operatorname{Min}(X)}{\operatorname{Max}(X) - \operatorname{Min}(X)} \quad (5)
where X is the value of the considered parameter, and Max(X) and Min(X) are its maximum and minimum values.
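Eq. (5) is ordinary min-max scaling; a small sketch (with hypothetical values for one parameter) is:

```python
def min_max_normalize(values):
    """Eq. (5): scale a parameter's values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical normal-stress values q for four training cases
q = [1.265, 1.841, 1.265, 2.302]
q_norm = min_max_normalize(q)  # minimum maps to 0.0, maximum to 1.0
```

Each of the five inputs (and the output) is normalized with its own Min(X) and Max(X) before retraining.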
After normalization, the normalized data were used again as training data without modifying the ANN architecture or transfer functions. After successful training with the normalized data, the interconnected residual weights were collected from the trained ANN and are presented in Table 3. The sensitivity of the input parameters was then calculated using Garson's procedure [3, 4], and the relative contribution of each parameter is shown in Fig. 5.
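Garson's procedure itself is compact: each input's share of every hidden neuron's absolute input weights, weighted by that neuron's connection to the output, is summed over neurons and rescaled to percentages. A sketch with small illustrative matrices (not the trained weights of Table 3) is:

```python
import numpy as np

def garson(W_ih, w_ho):
    """Garson's algorithm: relative importance (%) of each input.

    W_ih: (n_hidden, n_inputs) input-to-hidden weights
    w_ho: (n_hidden,) hidden-to-output weights
    """
    contrib = np.abs(W_ih) * np.abs(w_ho)[:, None]  # per-neuron products
    contrib /= contrib.sum(axis=1, keepdims=True)   # input share per neuron
    importance = contrib.sum(axis=0)                # sum shares over neurons
    return 100.0 * importance / importance.sum()    # rescale to percentages

# Illustrative 2-hidden-neuron, 3-input network (hypothetical weights)
W_ih = np.array([[0.8, 0.1, 0.3],
                 [0.5, 0.2, 0.1]])
w_ho = np.array([0.9, -0.4])
ri = garson(W_ih, w_ho)  # percentages summing to 100
```

In the classic formulation sketched here, the per-neuron normalization cancels the output weight's magnitude, a well-known quirk of Garson's method.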
Table 3 Interconnected residual weights between input and hidden layers and hidden and output
layers
Hidden neuron q L W Dr δ Sub-weights
1 0.21741 0.050567 0.090315 0.001530 0.10769 0.28683
2 − 0.16025 − 0.034861 − 0.072725 0.004811 − 0.062762 − 0.20685
3 0.11967 0.023289 0.052802 − 0.005020 0.04226 0.1513
4 − 0.61447 − 0.24765 − 0.51278 0.2689 − 0.038005 − 0.81159
5 − 0.70989 0.14546 − 0.018932 − 0.080583 − 0.64969 − 1.0713
6 − 0.15113 − 0.032127 − 0.068666 0.005114 − 0.057351 − 0.19419
7 0.36616 0.10538 0.20311 0.016883 0.20854 0.52003
8 − 0.23035 − 0.40549 − 0.031452 − 0.13675 − 0.9227 − 0.74142
9 1.0268 0.48102 0.35375 0.084676 − 0.75253 1.006
10 − 0.07513 − 0.012934 − 0.031304 0.003429 − 0.025275 − 0.093299
5 Conclusions
The main objective of this study was to determine the pullout force of geogrids using an ANN and to calculate the relative contribution of the parameters governing pullout force using the Garson algorithm. The ANN-predicted pullout forces were close to the experimental results, as shown by the R² and VAF values, whereas MSE and RMSE were slightly high but within acceptable limits. Results can be improved by increasing the training data; in this study only 30 cases were available, which is the reason MSE and RMSE were somewhat high. Besides predicting pullout force, the relative contribution of each parameter was calculated. The contribution of 'q' was highest (42%), implying that pullout force depends mainly on overburden pressure; the next most significant parameters were the soil–geogrid friction angle δ (26%), width (17%) and length (12%), with the relative density of sand least significant (6%). The larger a parameter's contribution, the greater its role in calculating pullout force: even a small deviation in its input value produces a large variation in the calculated pullout force. It is therefore important to identify the relative contribution of each parameter in the calculation.
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 319
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_30
320 I. Saikrishnamacharyulu et al.
1 Introduction
Water table depth (WTD) forecasting is vital for water resource managers in managing water supply, planning land development and creating efficient irrigation schedules. Groundwater is one of the primary sources of irrigation and food production in many countries [1–4]. Rapid population expansion, constant industrialisation, climate and weather change, and the changing frequency and intensity of rainfall have severely affected groundwater resources. Hence, groundwater resource evaluation is urgently needed to guarantee sustainable usage and management [5–7]. Variations of water level in wells directly quantify the effect of groundwater development, and essential information regarding aquifer dynamics is frequently embedded in continuously recorded WTD time series [8]. As a result, modelling and prediction of WTD are essential for water administrators and engineers to qualify and quantify groundwater resources and keep a balance between demand and supply.
GWL mainly fluctuates due to natural processes, like precipitation or surface water
interaction and artificial effects like artificial recharge and groundwater pumping.
Predicting variations in GWLs is a complicated problem when natural time series
that affect GWL are diversified with very uneven anthropogenic impacts. Artificial
neural networks (ANNs) can give a more practical method to predict GWLs in a
highly uncertain and dynamic system. These are the reasons for which ANN models
based on time series have recently been broadly utilised in hydrology to estimate
and predict GWL [9, 10], water quality [11, 12], flood [13–15], precipitation [6,
16], etc. Gholami et al. [17] used CANFIS for simulating groundwater quality and
geographic information system (GIS) as a preprocessor tool for demonstrating spatial
changes in groundwater quality. Results confirmed high efficacy of combining neuro-
fuzzy methods and GIS. Allawi et al. [18] proposed a modified CANFIS model
to allow improved detection of high nonlinearity patterns faced during reservoir
inflow forecasting of Aswan High Dam (AHD), Egypt. Applied statistical measures
show better performance of modified CANFIS, which considerably outperformed
ANFIS. Zhang et al. [19] developed a long short-term memory (LSTM) model
to predict long-term WTD in agricultural areas and compared the obtained results
with traditional feed-forward neural network (FNN) model. CANFIS was developed
for estimating monthly pan evaporation at two stations in the Uttarakhand and Uttar Pradesh states of India [20, 21]. It was validated against multilayer perceptron (MLP) and multiple linear regression (MLR), and the superiority of the developed CANFIS model was concluded. Also, MLP, MLR and CANFIS were applied, and their
performance was investigated in predicting drought index at different study loca-
tions [21, 22]. In both studies, CANFIS predicted drought index better than other
models. Some studies have also revealed that hybrid methodologies that integrate
different ML models with optimization algorithms and data pre-processing tech-
niques can provide more precise outcomes than conventional ML as certain patterns
in data (e.g. periodicities, level shifts, trends) can be well apprehended by hybrid
methods. Bayatvarkeshi et al. [23] evaluated ANN, CANFIS, ANN-PCA (principal
component analysis) and three conjoint models, comprising WANN, WPCA–ANN
Simulation of Water Table Depth Using Hybrid CANFIS Model … 321
and W-CANFIS, for predicting daily relative humidity. Their findings revealed that the WPCA–ANN model was the optimal model for estimating RH. Singh et al. [24] used MLP, random forest (RF), decision tree (DT), CANFIS and support vector machine (SVM) models and their wavelet variants W-MLP, W-DT, W-RF, W-CANFIS and W-SVM
for predicting soil permeability corresponding to physical aspects of soil. It was
found that wavelet-based models simulated better soil permeability results than non-
wavelet models, and W-RF had the highest efficiency and accuracy. Supreetha et al.
[25] used a hybrid lion algorithm-long short-term memory (LA-LSTM) to develop a
GWL forecasting model and compared it with traditional LSTM and FNN models.
Results reveal that hybrid LA-LSTM model forecasted GWL with better accuracy
for a larger data set. Moravej et al. [26] applied hybrid LSSVR-ISA and LSSVR-GA
and conventional GP and ANFIS models for monthly GWL forecasting in Karaj
plain, Iran. Azizpour et al. [27] implemented ANFIS-FA and WANFIS-FA to esti-
mate qualitative and quantitative groundwater parameters using collected data from
Karnachi Well, Kermanshah, Iran. They found that WANFIS-FA was successful
in groundwater parameter forecasting. The current study employs the firefly algorithm (FA) to tune CANFIS (CANFIS-FA) for WTD simulation and forecasting in the Nuapada watershed and evaluates the potential of the proposed technique against conventional CANFIS.
2 Study Area
Nuapada district lies between 20°0′ N and 21°5′ N latitude and 82°20′ E and 82°40′ E longitude. It falls in the western region of Odisha, covering an area of 3407.5 km². Owing to a noticeable lack of industry, the economy of this region revolves mostly around
agricultural activities. The soils are red, mixed red and black. The district agriculture
faces systematic risk and uncertainties with drought and acidic soils. It receives
an average rainfall of 1286 mm, majorly through monsoon rains. The summer is
scorching, and the temperature may increase to 48 °C (Fig. 1).
3 Methodology
3.1 CANFIS
CANFIS is an integrated form of artificial neural network and fuzzy system with precise and quick learning capabilities [28]. The usefulness of ANFIS or CANFIS lies in the use of nonlinear fuzzy rules. Using fuzzy rules, correlations amongst outputs in CANFIS are formulated with collective membership values [29]. If an FIS with one output z and two inputs x1 and x2 is used, then for a CANFIS network a characteristic rule set with two fuzzy IF–THEN rules for a first-order Sugeno fuzzy model can be articulated as given below:
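The rule pair itself did not survive extraction; in the standard textbook form for a first-order Sugeno model (a generic sketch, with A_i, B_i the antecedent fuzzy sets and p_i, q_i, r_i the consequent parameters), it reads:

```latex
\text{Rule 1: if } x_1 \text{ is } A_1 \text{ and } x_2 \text{ is } B_1,
  \text{ then } z_1 = p_1 x_1 + q_1 x_2 + r_1 \\
\text{Rule 2: if } x_1 \text{ is } A_2 \text{ and } x_2 \text{ is } B_2,
  \text{ then } z_2 = p_2 x_1 + q_2 x_2 + r_2 \\
z = \frac{w_1 z_1 + w_2 z_2}{w_1 + w_2}
```

where w1 and w2 are the firing strengths of the two rules.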
3.2 CANFIS-FA
Yang [30] developed FA based on the flashing behaviour of fireflies. The three primary rules in FA are: all fireflies are unisex; every firefly has its own brightness; and luminous intensity governs attractiveness between fireflies [31]. Figure 2 illustrates the training arrangement flowchart of CANFIS-FA. The elementary stages involved are elaborated in the following steps.
Step I: Initialise the firefly population: a set of fireflies is randomly generated, and each firefly maps one candidate set of CANFIS parameters.
Step II: The fitness function f(u) represents the light intensity of each firefly.
Step III: Based on the light intensity of each firefly, compute the attractiveness β using Eq. (4); then, comparing pairs of fireflies, move each lower-intensity firefly u_i towards the higher-intensity firefly u_j using Eq. (5).
\beta(r) = \beta_0 \exp(-\gamma r^2) \quad (4)

u_i = u_i + \beta_0 \exp(-\gamma r_{ij}^2)(u_j - u_i) + \alpha(\mathrm{rand} - 0.5) \quad (5)

where u_i and u_j are the fireflies with lower and higher light intensity, respectively; \gamma is the absorption coefficient; r_{ij} is the Euclidean distance between the ith and jth fireflies; \alpha controls random movement; \beta_0 is the attractiveness at r = 0; and rand is a random number in [0, 1].
Step IV: If the maximum permissible number of iterations or the fitness threshold is reached, stop; otherwise return to Step II and repeat the procedure. The output is the fitness value and position of the best firefly.
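Steps I–IV amount to a short loop; a minimal sketch (minimising a toy sphere function rather than the CANFIS training error, with illustrative parameter values) is:

```python
import numpy as np

def firefly_minimize(f, dim=2, n_fireflies=15, iters=100,
                     beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Minimal firefly algorithm (Steps I-IV); brighter = lower f."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-5, 5, size=(n_fireflies, dim))   # Step I: init population
    intensity = np.array([f(x) for x in u])           # Step II: light intensity
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if intensity[j] < intensity[i]:        # firefly j is brighter
                    r2 = np.sum((u[i] - u[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)          # Eq. (4)
                    u[i] = (u[i] + beta * (u[j] - u[i])         # Eq. (5)
                            + alpha * (rng.random(dim) - 0.5))
                    intensity[i] = f(u[i])
        alpha *= 0.97   # damp the random walk over iterations (a refinement)
    best = int(np.argmin(intensity))                  # Step IV: best firefly
    return u[best], intensity[best]

x_best, f_best = firefly_minimize(lambda x: float(np.sum(x ** 2)))
```

In CANFIS-FA each firefly encodes a candidate set of CANFIS parameters and f(u) is the training error.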
The results of the CANFIS and CANFIS-FA models are compared to assess their potential in predicting WTD, using mean absolute error (MAE) and the coefficient of determination (R²) as statistical metrics. Values of MAE and R² are shown in Table 1, which reveals that the best CANFIS model predicted WTD with R² = 0.95184 and MAE = 7.9614 in the training stage, while the best CANFIS-FA model estimated WTD with R² = 0.99163 and MAE = 1.084 m. Comparison of the results indicates that the precision of the CANFIS model increased by applying the optimisation algorithm: during the testing phase, MAE reduced from 13.89 (CANFIS) to 4.11 (CANFIS-FA).
On the whole, the comparison between the CANFIS and CANFIS-FA models shows that hybrid CANFIS-FA performed better than CANFIS in both stages. The key reason for the better performance of CANFIS-FA is that the model incorporates both neural
Table 1 Performance indicators (MAE, R2 ) values using CANFIS and CANFIS-FA methods
Station name Model name MAE (training) R2 (training) MAE (testing) R2 (testing)
Nuapada CANFIS 1 11.62 0.94815 16.3847 0.92736
CANFIS 2 10.3766 0.9497 14.906 0.92905
CANFIS 3 9.005 0.95026 14.217 0.9308
CANFIS 4 7.9614 0.95184 13.89 0.93177
CANFIS-FA 1 3.709 0.98421 5.752 0.96733
CANFIS-FA 2 3.1971 0.98596 5.114 0.9689
CANFIS-FA 3 1.92 0.9872 4.7899 0.96941
CANFIS-FA 4 1.084 0.99163 4.11 0.97098
Fig. 3 Scatter plots of predicted and calculated values by CANFIS and CANFIS-FA models in
testing period
Fig. 4 Comparison plot of observed and predicted monthly water table depth
5 Conclusion
Long-term WTD forecasting presents a significant challenge and is vital for the sustainable management of water and environmental resources. However, because of the nonlinear interactions between GWL and its drivers, and their multiscale behaviour varying with time, producing accurate water table depth predictions is difficult. To address such problems, this study tested the potential of the CANFIS-FA model for predicting WTD of the Nuapada watershed, Odisha, India. The applied model provides a capable new technique for predicting WTD, verified by its satisfactory prediction performance at the specified location. The robust CANFIS-FA model can serve as a valuable tool for predicting WTD, and the results of this study can guide government policymakers and authorities in future water management projects.
References
21. Malik A, Rai P, Heddam S, Kisi O, Sharafati A, Salih SQ, Al-Ansari N, Yaseen ZM (2020) Pan
evaporation estimation in Uttarakhand and Uttar Pradesh States, India: validity of an integrative
data intelligence model. Atmosphere 11(6):553
22. Malik A, Kumar A, Rai P, Kuriqi A (2021) Prediction of multi-scalar standardized precipitation
index by using artificial intelligence and regression models. Climate 9(2):28
23. Bayatvarkeshi M, Mohammadi K, Kisi O, Fasihi R (2020) A new wavelet conjunction approach
for estimation of relative humidity: wavelet principal component analysis combined with ANN.
Neural Comput Appl 32(9):4989–5000
24. Singh VK, Kumar D, Kashyap PS, Singh PK, Kumar A, Singh SK (2020) Modelling of soil
permeability using different data driven algorithms based on physical properties of soil. J
Hydrol 580:124223
25. Supreetha BS, Shenoy N, Nayak P (2020) Lion algorithm-optimized long short-term memory
network for groundwater level forecasting in Udupi District, India. Appl Comput Intell Soft
Comput
26. Moravej M, Amani P, Hosseini-Moghari SM (2020) Groundwater level simulation and forecasting using interior search algorithm-least square support vector regression (ISA-LSSVR). Groundw Sustain Dev 11:100447
27. Azizpour A, Izadbakhsh MA, Shabanlou S, Yosefvand F, Rajabi A (2022) Simulation of time-series groundwater parameters using a hybrid metaheuristic neuro-fuzzy model. Environ Sci Pollut Res 1–17
28. Abyaneh HZ, Varkeshi MB, Golmohammadi G, Mohammadi K (2016) Soil temperature esti-
mation using an artificial neural network and co-active neuro-fuzzy inference system in two
different climates. Arab J Geosci 9(5):377
29. Mohanta NR, Patel N, Beck K, Samantaray S, Sahoo A (2021) Efficiency of river flow prediction
in river using wavelet-CANFIS: a case study. In: Intelligent data engineering and analytics.
Springer, Singapore, pp 435–443
30. Yang XS (2009) Firefly algorithms for multimodal optimization. In: International symposium
on stochastic algorithms. Springer, Berlin, Heidelberg, pp 169–178
31. Poursalehi N, Zolfaghari A, Minuchehr A, Moghaddam HK (2013) Continuous firefly
algorithm applied to PWR core pattern enhancement. Nucl Eng Des 258:107–115
Monthly Runoff Prediction by Support
Vector Machine Based on Whale
Optimisation Algorithm
Abstract This study was conducted in the catchment area of the Baitarani River at Jaraikela, in Eastern India. The Baitarani River is one of the most important rivers in the eastern region of peninsular India and eventually joins the Bay of Bengal. The region frequently experiences floods owing to its erratic rainfall patterns and climatic conditions, which makes runoff prediction important for planning better watershed management techniques and mitigation strategies. To simulate the rainfall-runoff process, an SVM model integrated with the Whale Optimisation Algorithm (WOA) has been used; WOA enhances the results by reducing the error margin of SVM. Statistical data for 1981–2020 were used for calibration, validation and testing of the model. The results show that the hybrid SVM-WOA model outperforms the classical SVM model in forecasting accuracy and efficiency, based on root mean squared error (RMSE), mean absolute error (MAE) and Nash–Sutcliffe efficiency (NE) performance evaluation measures.
1 Introduction
Runoff prediction and simulation in watersheds are prerequisites for several practical
applications concerning environmental disposal and conservation and management
of water resources [1–3]. Over the last decade, machine learning and optimisation
algorithms have been widely used for creating hydrological models of rainfall-runoff
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 329
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_31
330 A. Mishra et al.
relationships [4–6], sediment load modelling [7–9]; flood prediction [10, 11] (Sahoo
et al. 2020), groundwater prediction [7, 12] and many more. These techniques gained
momentum among scholars and researchers due to better forecasting accuracy. SVM
has greater prediction capability than the classical ANN models. It incorporates the
structural risk minimisation principle, which reduces the risk (error) and also addresses several shortcomings of ANN models. Proper mid- and long-term
prediction of runoff is significant for better urban planning, watershed management,
urban construction and flood mitigation.
Bray and Han [13] illustrated the rainfall-runoff process using SVM in the Bird Creek region, USA. A transfer function (TF) model was compared with the unit response curve from SVM, and the TF model outperformed SVM in short-range predictions; the study also highlighted the difficulty of finding global optima for an SVM model. Behzad et al. [14] compared the applicability of SVM, ANN and ANN-GA (genetic algorithm) models for predicting one-day-ahead streamflow of the Bakhtiyari River, Iran, using local rainfall and climate data. Their findings showed that SVM was the most efficient, providing highly accurate outcomes and demonstrating the forecasting capability of SVM. Misra et al. [15] performed runoff estimation for the
Vamsadhara River basin, using SVM and ANN. It was concluded that SVM provided
robust and accurate estimation and significantly improved results than ANN. Sharma
et al. [16] examined the performance of SVM, ANN, and simple regression models
to estimate runoff in the Nepal watershed, and SVM proved to be superior to the
other two models under comparable accuracy conditions. Sedighi et al. (2016) used ANN and SVM models for simulating the rainfall-runoff process subject to snow water equivalent (SWE) height in the Roodak catchment, Iran. The results showed that including SWE enhanced the performance and precision of SVM. Regardless of the extensive application of these techniques, there are substantial disadvantages to applying these models: a key one is the need to tune the parameters of the optimal learning procedure, while the major concern is the performance and predictability of these models.
In recent times, metaheuristic optimisation algorithms have emerged as a significant means of easing the complications in parameterising these models [17–19].
Wang et al. [20] demonstrated the superiority of EEMD-SVM-PSO over ordi-
nary least-squares (OLS) regression and feed-forward neural network (FFNN) for
rainfall-runoff forecasting at Yellow River, China. Komasi and Sharghi [21] vali-
dated the dominance of wavelet SVM model over standalone classical ANN and
SVM models in predicting both long-term and short-term runoff discharge by taking
into consideration the seasonal effects in two catchments. Obtained outcomes showed
that wavelet SVM model could estimate both short- and long-term flow discharges.
Feng et al. [22] adopted SVM-QPSO to determine the input–output relationships in
Yangtze Valley, China. Test outcomes indicated that hybrid model gives improved
forecasting precision than several classical techniques like ANN and extreme learning
machines (ELM). Mohammadi et al. [23] applied standalone SVM and SVM-WOA
models to predict daily evapotranspiration at three meteorological stations in Iran.
It was found that hybrid SVR-WOA model delivered the best performance. Al-
Zoubi et al. (2018) incorporated the SVM-WOA model to detect spam profiles on
social networking sites in multiple lingual contexts (English, Arabic, Spanish and
Korean). This hybrid model outperformed other commonly used classical models.
The proposed model detects spam profiles automatically and provides information
about the most influencing features during detection procedure, with a high degree
of accuracy. Anaraki et al. [24] estimated flood frequency using changing climatic
conditions in the region of Karun River basin, Iran, using machine learning, decom-
position and metaheuristic algorithms. MARS and M5 tree models are applied to
classify precipitation, WOA is used to train LSSVM, wavelet transform (WT) is
performed to decompose temperature and precipitation, ANN, K-nearest neigh-
bour (KNN), LSSVM and LSSVM-WOA are applied to downscale temperature and
precipitation, and discharge is simulated under the considered time period. Results
showed the superiority of LSSVM-WOA-WT model in simulating discharge. Vahed-
doost et al. [25] developed a metaheuristic optimisation tool, ANN-WOA model, to
define soil parameters of East and West Azerbaijan provinces in Iran. This hybrid
model turned out to be superior to other core optimisation models of ANN and
multilinear regression (MLR).
The objective of the current research is to develop a runoff forecasting model based on the hybrid SVM-WOA method, emphasising high-value flows. Outcomes of the standard SVM model are used as a benchmark to demonstrate the hybrid method's performance.
2 Study Area
The Baitarani River originates in the hill ranges of Keonjhar District, Odisha, at elevations varying between 32 and 1024 m. One of the major rivers of Odisha, it flows mainly through the state of Odisha and partly through Jharkhand. The basin lies between 20°35′ N and 22°15′ N latitude and 85°03′ E and 87°03′ E longitude, spreading over 14,351 km² in Odisha, with a major part covered by agricultural land. The monsoon season extends from June to October, with minimum and maximum annual rainfall of 800 mm and 2000 mm, respectively, and an average yearly precipitation of about 1400 mm (Fig. 1).
3 Methodology
Vapnik [26] first introduced SVM as a two-layer network (weights are nonlinear in the first layer and linear in the second). Whereas an ANN generally adapts all of its parameters (using clustering or gradient-based methods), SVM selects training input vectors as the parameters of the first layer, which minimises dimensionality [27–30]. In mathematical terms, the primary function of the statistical learning procedure is

y = f(x) = \sum_{i=1}^{M} a_i \varphi_i(x) = w\varphi(x) \quad (1)
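As a concrete illustration of this setup in practice, the scikit-learn SVR estimator with an RBF (Gaussian) kernel, the kernel family used in this study, can be fitted to a synthetic rainfall-runoff series; the data below are invented stand-ins, not the Jaraikela records, and the hyperparameter values are illustrative only.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic monthly rainfall (mm) and a roughly proportional runoff series
rng = np.random.default_rng(1)
rain = rng.uniform(0, 400, size=(120, 1))
runoff = 0.6 * rain[:, 0] + rng.normal(0, 10, size=120)

# RBF (Gaussian) kernel SVR; C, gamma and epsilon are illustrative values
model = SVR(kernel="rbf", C=100.0, gamma=0.01, epsilon=1.0)
model.fit(rain, runoff)
pred = model.predict(rain)
```

In practice C, gamma and epsilon must be tuned; this is precisely the parameterisation burden that WOA is brought in to relieve.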
The WOA was first introduced by Mirjalili and Lewis [31], motivated by the bubble-net feeding behaviour of humpback whales, which usually hunt small fish and other aquatic animals by generating an enclosing curtain of bubbles. In WOA, the target prey is considered the best solution. The probable position of humpback whales around the target is articulated by:
\vec{X}(t+1) = \vec{X}^*(t) - \vec{A} \cdot \vec{D}, \qquad \vec{D} = \left| \vec{C} \cdot \vec{X}^*(t) - \vec{X}(t) \right| \quad (3)
where \vec{X} is the position vector of a whale, t is the current iteration and \vec{X}^* is the position vector of the best solution so far, which is updated whenever a better solution is found.
\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a}, \qquad \vec{C} = 2 \cdot \vec{r}

where \vec{A} and \vec{C} are coefficient vectors, \vec{a} decreases linearly from 2 to 0 over the iterations and \vec{r} is a random vector in [0, 1].
\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t) \quad (4)

where \vec{D}' = |\vec{X}^*(t) - \vec{X}(t)| is the distance between the prey and the ith whale, b is a constant defining the shape of the logarithmic spiral and l is a random number in [−1, 1]. By shrinking circles, whales move around the prey along spiral-shaped paths. The following mathematical model formulates this concurrent behaviour:
\vec{X}(t+1) = \begin{cases} \vec{X}^*(t) - \vec{A} \cdot \vec{D} & \text{if } p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t) & \text{if } p \ge 0.5 \end{cases} \quad (5)
where p ∈ [0, 1] is a random number that decides, with equal probability, whether a whale updates its position by the encircling mode or by the spiral mode.
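Eqs. (3)–(5) translate into a compact loop; the sketch below minimises a toy sphere function, uses illustrative settings, and omits the |A| ≥ 1 random-search variant of the encircling step for brevity.

```python
import numpy as np

def woa_minimize(f, dim=2, n_whales=20, iters=200, b=1.0, seed=0):
    """Minimal whale optimisation algorithm built from Eqs. (3)-(5)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(n_whales, dim))
    best = min(X, key=f).copy()                 # best solution = target prey
    for t in range(iters):
        a = 2.0 * (1 - t / iters)               # a decreases linearly 2 -> 0
        for i in range(n_whales):
            if rng.random() < 0.5:              # encircling mode, Eq. (3)
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                D = np.abs(C * best - X[i])
                X[i] = best - A * D
            else:                               # spiral bubble-net mode, Eq. (4)
                l = rng.uniform(-1, 1)
                D_prime = np.abs(best - X[i])
                X[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            if f(X[i]) < f(best):
                best = X[i].copy()
    return best, f(best)

x_best, f_best = woa_minimize(lambda x: float(np.sum(x ** 2)))
```

In SVM-WOA each whale position encodes candidate SVM hyperparameters and f is the calibration error.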
5 Conclusion
This paper uses SVM (a machine learning tool) to model monthly rainfall-runoff data of the Baitarani River Basin at Jaraikela, Odisha, India. The kernel functions used were RBF and Gaussian, of which Gaussian showed the highest efficacy. The results obtained from the standalone model were compared with those of the SVM-WOA
model. Although the SVM model generated fairly good results, SVM-WOA bridged the few gaps in the SVM model and produced better predictions in terms of variability and efficiency. The RMSE values of the SVM-WOA and SVM models were 0.968 and 11.374, while the NS values were 0.9956 and 0.9721, respectively, clearly indicating that SVM-WOA outperformed standalone SVM. Thus, we conclude that SVM-WOA is suitable for runoff estimation and can also be applied to flood and groundwater forecasting.

Fig. 3 Scatter plots for best models of observed vs. predicted (SVM and SVM-WOA) runoff values

Fig. 4 Monthly predicted runoff by SVM and SVM-WOA models for Jaraikela station
Monthly Runoff Prediction by Support Vector Machine … 337
Application of Adaptive Neuro-Fuzzy
Inference System and Salp Swarm
Algorithm for Suspended Sediment Load
Prediction
Abstract Owing to the importance of suspended sediment load (SSL) in watershed management and the design of engineering structures, and considering the influence of rainfall, temperature, and runoff in quantifying and understanding its nonlinear behaviour, predicting suspended sediment load from these parameters is a crucial task. For this purpose, a soft computing model, the adaptive neuro-fuzzy inference system (ANFIS), is optimised with the Salp swarm algorithm (SSA), and the results were validated against a well-established classical ANFIS model. Data from the Jaraikela catchment area in Jharkhand, with some part of it in the Sundergarh district of Odisha, were used in the analysis. The performance of the models was evaluated based on the MSE and WI performance indicators. A comparison of the results shows that the ANFIS-SSA model proved superior to ANFIS.
1 Introduction
Sediment load is one of the most indispensable hydrological and hydraulic criteria,
which affects efficiency of water diversion projects and hydraulic structures. It can
cause environmental issues like damaging the aquatic ecosystem and reducing the
quality of surface water. Besides that, the reservoir capacity is reduced, and oper-
ational policy (i.e., energy generation, irrigation, and water supply) is affected due
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 339
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_32
340 G. K. Sahoo et al.
prediction of SSL for the Talar basin, Iran, and the Eagle Creek basin, USA. They compared stand-alone ANFIS with other neural network models and also with hybrid ANFIS integrated with four optimisation algorithms, i.e., particle swarm optimisation (PSO), sine-cosine algorithm (SCA), bat algorithm (BA), and firefly algorithm (FA). Results specified that ANFIS-SCA outperformed the other applied models. The goal of the current study is to integrate the stand-alone ANFIS model with SSA to create a robust model for SSL prediction. For this purpose, recorded data (2001–2020) of the Jaraikela catchment, India, were used for predicting monthly SSL.
2 Study Area
Jaraikela catchment area spreads from 21° 50' N to 23° 36' N latitude and 84° 29' E to
85° 49' E longitude. This catchment area extends around the River Koel, a tributary of
River Brahmani, that originates close to Palamu Tiger Reserve, Jharkhand. Elevation
of watershed area stretches from 185 m at Jaraikela gauge station to 640 m in the
upper portion of the watershed. Total drainage area of Jaraikela catchment is almost
9160 km2 . A major part of this catchment lies in Jharkhand state and some part of
the catchment covers the Sundergarh district of Odisha. About 80% of rainfall in
this catchment occurs during the monsoon season. The catchment’s topography is
undulating and flat, shielded with deep forest and cultivable lands, and the climate
is categorized as sub-humid (Fig. 1).
3 Methodology
3.1 ANFIS
Two rules can be expressed for a typical first-order Sugeno inference system:
Rule 1: If x is A1 and y is B1, then f1 = p1·x + q1·y + r1
Rule 2: If x is A2 and y is B2, then f2 = p2·x + q2·y + r2
where A1, A2, B1, B2 are fuzzy sets over the inputs x and y, and p, q, r are the consequent parameters.
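The defuzzified output of such a two-rule system is the firing-strength-weighted average of the rule consequents, sketched below. This is only an illustration of the standard first-order Sugeno computation, not the authors' code; the firing strengths would come from the membership functions of the premise layer.

```python
def sugeno_two_rule(x, y, w1, w2, p, q, r):
    """Weighted-average output of a two-rule first-order Sugeno system.

    w1, w2 : firing strengths of rules 1 and 2
    p, q, r: consequent parameter pairs, e.g. p = (p1, p2)
    """
    f1 = p[0] * x + q[0] * y + r[0]   # rule 1 consequent: f1 = p1*x + q1*y + r1
    f2 = p[1] * x + q[1] * y + r[1]   # rule 2 consequent: f2 = p2*x + q2*y + r2
    return (w1 * f1 + w2 * f2) / (w1 + w2)
```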
3.2 SSA
Y_j^i = (Y_j^i + Y_j^(i−1)) / 2    (4)

where Y_j^i denotes the position of the ith (follower) salp in the jth dimension. SSA is applied to the fitness function given by the following equation.
λ = Min(LP) (5)
where LP is the ANFIS learning parameter tuned by SSA; after each iteration, the λ value is updated and checked for convergence to an accurate value. The best number of iterations can be decided at the end of the run. The objective function of SSA is defined by the following equation.
F = √[ (1/N) Σ_(a=1)^(N) (OB(R = P_a, I_a, D_a))² ]    (6)
The ANFIS and ANFIS-SSA models were utilised to model the SSL time series for the considered input combinations. The performance of the proposed models is given in Table 1. For evaluating model efficiency, the MSE and WI measures were employed for all scenarios. Based on Table 1, WI and MSE for the ANFIS models range from 0.93816 to 0.96883 and from 29.4963 to 14.908, respectively. For the ANFIS-SSA models, the parameters are in the ranges 0.97336 to 0.99489 and 9.1998 to 1.0132, respectively.
Predicted and observed cumulative SSL during the testing phase are illustrated as scatter plots in Fig. 3. Predictions by ANFIS-SSA were closer to the 45° straight line than those of the stand-alone ANFIS model in the Jaraikela catchment. As seen in Fig. 4, the ANFIS-SSA models give better SSL predictions than the stand-alone ANFIS models. The ANFIS model consistently underestimated peak SSL, whereas the ANFIS-SSA model provided results consistent with peak SSL occurrences in the Jaraikela catchment. The magnitudes of the ANFIS-SSA models' low, medium, and high SSL predictions were nearer to the observed values. It was observed that the cumulative SSL estimated by the optimal ANFIS-SSA model was in good agreement with the collected data. Overall, the ANFIS-SSA model showed superior performance to the ANFIS model.
Application of Adaptive Neuro-Fuzzy Inference System … 345
5 Conclusion
The comparison revealed that the ANFIS-SSA model outperformed the ANFIS model, with MSE = 1.0132, WI = 0.99489 and MSE = 14.908, WI = 0.96883, respectively. Hence, this study revealed the potential of the ANFIS-SSA model in predicting suspended sediment load. The outcome of this research can be expanded further by including more hydrological and climatological data in the proposed model and by trying different input combinations for various time series data.
Abstract After mango, the banana (Musa sp.) is India's second most important fruit crop. Bananas are also among the most important fruits in worldwide trade and the most widely consumed, ranking second only to citrus in terms of value. The size, colour, and ripeness of the fruits are the primary factors in grading. The grading of bananas is based on maturity in four stages: green, yellowish-green, mid-ripen, and over-ripen. Here, the maturity label of a banana is estimated using deep features of VGG16 and texture features with an SVM classifier. The classification performance is measured individually for the deep features and the texture features. The classification task is also performed using both deep and texture features via parallel fusion. The accuracy and AUC using the deep features, the texture features, and both (using parallel feature fusion) are 92.34% and 0.99, 89.99% and 0.97, and 99.87% and 1, respectively.
1 Introduction
Deep learning has been hailed as the cutting-edge technology in computer vision
techniques for image classification in the age of computerization. The quality of
fresh banana fruit is the most significant source of concern for purchasers and fruit
processors, and ripeness is the most important element in determining the fruit’s
storage life. The productivity of a banana’s development stage and the speed with
which it can be classified is the most conclusive variables influencing its quality
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 349
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_33
350 A. K. Ratha et al.
[1]. Fruit ripeness can be determined by a variety of characteristics, the most important of which is skin colour [2]. Most of the time, human specialists use visual inspection to determine the stage of maturity of the fruit, which is prone to inaccuracy. Consequently, it is important to apply image processing techniques to determine the ripening stage of the various freshly arriving banana bundles at the appropriate time.
the greatest impact on the eating quality and market price of the banana fruit [3]. In
this research, it is proposed that a computer vision model be used to automatically
detect the ripening stages of bananas. Bananas are outlined at four different stages
of maturity in this article. The deep feature, the texture feature, and the SVM are
all used in the execution of the operation. Preparation of a data set consisting of
images of 104 green bananas, 48 yellowish-green bananas, 88 mid-ripen bananas,
and 32 over-ripen bananas is completed. This proposed system categorises bananas
according to their maturity level, which makes them more easily marketable.
There have been numerous reports on the use of image processing and machine learning techniques for the classification of bananas. Mendoza and co-workers used image analysis techniques to categorise ripened bananas into seven ripening classes. 49 banana samples were classified into their seven ripening stages with an accuracy of 98% using the L*, a*, and b* bands, brown area percentage, and contrast. Several chemical parameters, including Brix and pH, were used to verify the findings [4]. Prabha
et al. proposed an image processing technique that may be used to precisely detect the
maturity stage of fresh banana fruit based on the colour and size values of the images
they took. A total of 120 images were used, with 40 images from each stage of devel-
opment, such as under-mature, mature, and over-mature. The accuracy was 85% [5].
Diez et al. used hyperspectral imaging techniques in the visible and near-infrared
(400–1000 nm) wavelength ranges to investigate the ripening stages of banana fruits
over their storage time in a ripening chamber (12 °C and 80–90% relative humidity)
[6]. Two batches of bananas, containing seven and fourteen bananas, were observed.
It was possible to discern between the spectral patterns associated with the various
ripening stages. The most significant changes in the relative reflectance spectra occur
around 680 nm, which is the wavelength at which an absorption band of chlorophyll is
centred. Principal component analysis applied to a calibration set of spectra revealed
statistically significant differences between the seven maturity classes based on the
scores of the first principal component (94.6% of the explained variance). Mesa et al.
proposed a deep learning model utilising morphological features and hyperspectral
imaging for grading bananas into three categories based on their quality [7]. This
method took into account both the external and internal characteristics of the banana
and achieved an accuracy of 98.45%. Mohapatra et al. [8] used the dielectric properties of bananas to create a quick and non-destructive method for measuring their ripening stage. Olaniyi et al. [9] developed an automated method for distinguishing between healthy and unhealthy bananas using the GLCM texture feature and SVM; the approach obtained an accuracy of 100%. An autonomous computer vision system for identifying the ripening
phases of bananas was created by Mazen and colleagues. First and foremost, a four-
class handmade database is constructed. Second, a framework based on artificial
Maturity Status Estimation of Banana Using Image … 351
neural networks is used to categorise and grade the ripening stage of banana fruits.
The system takes into account colour, the development of brown spots, and Tamura
statistical texture data. According to [10], this approach had a 97.45% of accuracy.
According to the state of the art, the maximum accuracy reached for grading bananas by maturity level into three classes is 85% when image processing and machine learning are used in conjunction. As a result, it is necessary to improve the accuracy of the system by including more banana maturity classes.
2 Methodology
The methodology comprises two phases. In the first phase, the performance of two classification models, VGG16 plus SVM and GLCM texture features plus SVM, is evaluated for classifying bananas into four classes as per their maturity level. In the second phase, parallel feature fusion is adopted. In our previous work [11] on recognising 40 kinds of fruits, it was observed that the deep learning approach VGG16 plus SVM and the machine learning approach GLCM features plus SVM outperformed the other classification models. Hence, we evaluated these two models individually and obtained satisfactory results. Again, to enhance performance, parallel feature fusion is adopted. Here, the deep features of VGG16 (extracted from the fc8 layer) and 13 GLCM features are fused in a parallel fashion. So, a total of 1013 features (1000 deep features of VGG16 plus 13 GLCM texture features) are fed to the SVM for classification. The detailed flow of the methodology is depicted in Fig. 1.
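Conceptually, the parallel fusion step reduces to a per-image concatenation of the two feature vectors before classification. The sketch below illustrates this in Python with hypothetical array names (the paper's pipeline was implemented in MATLAB 2020a, so this is only a mirror of the idea, not the authors' code).

```python
import numpy as np

def parallel_fuse(deep_feats, texture_feats):
    """Parallel feature fusion: concatenate per-image feature vectors.

    deep_feats    : (n_images, 1000) array, e.g. VGG16 fc8 activations
    texture_feats : (n_images, 13) array, e.g. GLCM texture properties
    returns       : (n_images, 1013) fused array, then fed to an SVM classifier
    """
    if deep_feats.shape[0] != texture_feats.shape[0]:
        raise ValueError("feature sets must describe the same images")
    return np.concatenate([deep_feats, texture_feats], axis=1)
```

The fused matrix would then be passed to an SVM (e.g. scikit-learn's `SVC` or MATLAB's `fitcecoc`) exactly as either feature set would be on its own.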
The images of the four stages of bananas are collected, and the data set is then enhanced by introducing different flipping and rotating operations. The distribution of the original and enhanced data sets is detailed in Table 1. The banana samples concerning the four ripening stages are illustrated in Fig. 1. The data set is enhanced by executing horizontal right-flip, horizontal left-flip, rotate right 90°, and rotate left 90° operations, so the data set is increased five times (Fig. 2).
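The five-fold augmentation can be sketched as follows. The paper names its two flips "horizontal right-flip" and "horizontal left-flip", which is ambiguous; here a left-right and an up-down flip are used as stand-ins, so treat the exact flip choices as an assumption.

```python
import numpy as np

def augment_five(img):
    """Return the original image plus four variants (flips and ±90° rotations),
    giving the 5x data set enhancement described in the text."""
    return [
        img,                     # original
        np.fliplr(img),          # horizontal flip (stand-in for right-flip)
        np.flipud(img),          # vertical flip (stand-in for left-flip)
        np.rot90(img, k=-1),     # rotate right 90 degrees
        np.rot90(img, k=1),      # rotate left 90 degrees
    ]
```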
Fig. 1 Samples of banana. a Green banana b yellowish-green banana c mid-ripen banana d over-
ripen banana
Initially, the classification models, i.e. VGG16 plus SVM and GLCM plus SVM, are evaluated. The models are executed on a Windows 10, Core i5, 5th generation, 8 GB RAM laptop with built-in NVIDIA GeForce GPU on the MATLAB 2020a platform. The deep features of VGG16 with SVM resulted in an accuracy of 92.34% and an AUC of 0.99. Further, the SVM with the GLCM texture features resulted in an accuracy of 89.99% and an AUC of 0.97. Again, with the adoption of parallel feature fusion, the SVM achieved an accuracy of 99.87% and an AUC of 1. This experimentation revealed that performance is significantly increased with the adoption of the parallel feature fusion technique. Hence, the deep features of VGG16 with GLCM texture features and SVM constitute the best classification model for grading bananas into four maturity levels.
4 Conclusion
An automatic system for grading bananas according to their maturity level is important for the stock and export market. Here, three classification models are evaluated: a deep learning approach, a machine learning approach, and a hybrid approach. The deep learning approach uses the deep features of VGG16 with SVM, and the machine learning approach uses the GLCM texture features with SVM. These two classification models were chosen from their respective strategies based on our expert knowledge and previous results for fruit recognition. Both the deep learning and machine learning approaches are satisfactory for grading bananas into four maturity levels. The deep learning approach, i.e. the deep features of VGG16 plus SVM, achieved an accuracy of 92.34% and an AUC of 0.99. The machine learning approach, i.e. GLCM texture features plus SVM, attained an accuracy of 89.99% and an AUC of 0.97. Further, with the adoption of the parallel feature fusion technique, the performance of the classification model is significantly increased, i.e. the accuracy is 99.87% and the AUC is 1. This automatic grading approach is helpful for grading ripened bananas.
References
1. http://nhb.gov.in/report_files/banana/BANANA.html
2. Prasad K, Jacob S, Siddiqui MW (2018) Fruit maturity, harvesting, and quality standards. In:
Preharvest modulation of postharvest fruit and vegetable quality. Academic Press, pp 41–69
3. Maduwanthi SDT, Marapana RAUJ (2019) Induced ripening agents and their effect on fruit quality of banana. Int J Food Sci 2019, Article ID 2520179, 8 pages. https://doi.org/10.1155/2019/2520179
4. Mendoza F, Aguilera JM (2004) Application of image analysis for classification of ripening
bananas. J Food Sci 69(9):E471–E477
5. Prabha DS, Satheesh Kumar J (2015) Assessment of banana fruit maturity by image processing
technique. J Food Sci Technol 52(3):1316–1327
6. Diez B et al (2016) Grading banana by VNIR hyperspectral imaging spectroscopy. In: VIII
international postharvest symposium: enhancing supply chain and consumer benefits-ethical
and technological issues, pp 1194
7. Mesa AR, Chiang JY (2021) Multi-input deep learning model with RGB and hyperspectral
imaging for banana grading. Agriculture 11(8):687
8. Mohapatra A, Shanmugasundaram S, Malmathanraj R (2017) Grading of ripening stages of red
banana using dielectric properties changes and image processing approach. Comput Electron
Agric 143:100–110
9. Olaniyi E et al (2017) Automatic system for grading banana using GLCM texture feature
extraction and neural network arbitrations. J Food Process Eng 40(6):e12575
10. Mazen F, Nashat A (2019) Ripeness classification of bananas using an artificial neural network.
Arab J Sci Eng 44(8):6901–6910
11. Behera SK, Rath A, Sethy PK (2020) Fruit recognition using support vector machine based on deep features. Karbala Int J Mod Sci 6(2), Article 16. https://doi.org/10.33640/2405-609X.1675
Application of a Combined GRNN-FOA
Model for Monthly Rainfall Forecasting
in Northern Odisha, India
Abstract Rainfall is considered the most complex variable in the hydrological cycle, and its cause-impact relationship often cannot be articulated in simple or complex mathematical terms. Because of climate change, the varying amount of rain can lead to either surplus or dryness in reservoirs. This research introduces a novel hybrid model, a generalised regression neural network integrated with the fruit fly optimisation algorithm (GRNN-FOA), to forecast monthly rainfall. Rainfall data collected from a local meteorological station from 1971 to 2020 were utilised in this study to assess model performance. The performance of each approach is assessed using the root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), and Willmott index (WI). Results specify that the hybrid GRNN-FOA model is consistent and accurate in estimating the risk level of significant rainfall events. The proposed robust model shows improved performance over conventional techniques, providing a new direction in the area of rainfall prediction. This artificial intelligence-based study would also help in quickly and accurately predicting monthly rainfall.
1 Introduction
The most significant meteorological event with a substantial impact on human life
is rainfall. It is also considered one of the essential constituents of feasible planning
and design. Hence, proper understanding of rainfall-runoff process is significant in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 355
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_34
356 D. P. Satapathy et al.
water-sensitive urban designing approaches [5, 17, 21, 22]. As a result, understanding
and modelling rainfall have turned out to be essential in solving many water engi-
neering and flood problems and maintaining a stable agro-economic system fulfilling
necessities of sustainable growth [4, 7, 15, 21, 22, 24, 25]. Then again, inadequate
rain for an extended period causes droughts. Hence, rainfall prediction is essential
for protecting and improving the human lives, aquatic environment, and water usage
[3]. Recently, AI-based methods have become popular and are being broadly utilised
for forecasting/prediction purposes in different areas of science and engineering [1,
9, 11, 18–20]. These techniques are generalised data-driven methodologies that can
model linear and non-linear systems.
Nagahamulla et al. [12] investigated applicability of combined multilayer feed-
forward networks (MLFNs) with backpropagation (BP) algorithm, radial basis func-
tion network (RBFN), and GRNN for forecasting precipitation in Colombo, Sri
Lanka. Outcomes revealed that the performance of the integrated model was superior to that of the other models. In another study, Chen et al. [2] applied MLFN, RBFN, and
GRNN to predict streamflow of River Jinsha, China. Lu et al. [8] employed GRNN,
support vector machine (SVM), and an autoregressive model to forecast monthly
rainfall. Their findings revealed that performance of both SVM and GRNN models
was better. Modaresi et al. [10] assessed performance of artificial neural network
(ANN), GRNN, least square-SVM, and K-nearest neighbour (KNN) for monthly
inflow forecasting to Karkheh dam, Iran, in different environments. Sanikhani et al.
[23] applied GRNN, multivariate adaptive regression splines (MARS), random forest
(RF), and extreme learning machines (ELMs) to estimate air temperature deprived
of climate-based inputs. They found that GRNN model was capable of estimating
temperature without climate-based inputs. Kamel et al. [6] employed GRNN and
RBFN to predict sub-surface evaporation rate considering wind speed, temperature,
water depth, and humidity as input parameters. Results showed that neural network
models have the potential for accurate prediction of evaporation rate.
Despite their expected flexibility, recent investigations have revealed that stand-alone AI techniques are not adequately suited to forecasting rainfall at longer time scales, predominantly in semi-arid and arid areas where rainfall time series are highly intermittent. Niu et al. [13] proposed GRNN-FOA for improving the stability and accuracy of icing prediction on transmission lines. Results indicated that the GRNN-FOA model provided better robustness, generality, and accuracy in icing forecasting. Ruiming and Shijie [14] developed a reference evapotranspiration (ET0) prediction model for daily ET0 prediction of Tieguanyin on the basis of integrating GRNN and mathematical morphology clustering (MMC). FOA was utilised for optimising GRNN's smoothing factor. Predictions for different seasons under multifaceted meteorological conditions showed that the projected model is effective, with higher precision and better flexibility. Salehi et al. [16] aimed at forecasting and optimising paclitaxel biosynthesis and growth utilising a GRNN-FOA data mining approach. Results revealed that the GRNN-FOA model produced better forecasting outputs than a multilayer perceptron-genetic algorithm (MLP-GA).
Application of a Combined GRNN-FOA Model for Monthly … 357
2 Study Area
Keonjhar District lies between 21° 1' N to 22° 10' N latitudes and 85° 11' E to
86° 22' E longitudes and covers a geographical area of 8303 km2 (Fig. 1). This
region is structurally and geologically complicated and is characterised by diverse
geomorphological set up leading to broadly deviating hydrogeologic conditions.
Because of its tropical humid climate, Keonjhar receives moderate to heavy rain from the southwest monsoon between June and September, and a little from the northeast monsoon between December and January. Average annual precipitation varies between 150 and 200 cm, and mean annual temperature varies between 22 and 27 °C.
3 Methodology
3.1 GRNN
GRNN has robust non-linear mapping capability and is suitable to solve prob-
lems related to linear and non-linear regression. GRNN shows good performance
on converging speed to best outcomes for small and large sample datasets. In
mathematical terms, solution of GRNN can be expressed as
∫ +∞ ( )
[ ] Y→ f Y→ , | X→ d X→
−∞
E Y→ | X→ = ∫ ( ) (1)
+∞ → → →
−∞ f Y , | X d X
[ ]
where X→ —input vector; Y→ —predicted result of GRNN; E Y→ | X→ —true value of
output Y→ ; and f (Y→ , | X→ )—combined probability density function of X→ and Y→ .GRNN
algorithm's main architecture includes three main layers: input, hidden, and output. The first hidden layer is a pattern layer (RBF layer) with a Gaussian function, whereas the second is a summation layer with a linear function. Even though GRNN is considered a straightforward and quick predictor, its usability is limited to regression models. It also has some disadvantages because it lacks the ability to extrapolate. Moreover, GRNN is closely related to kernel methods and is negatively affected by issues associated with dimensionality. GRNN cannot ignore irrelevant inputs without significant adjustments to its elementary algorithm.
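A minimal sketch of the GRNN estimator of Eq. (1) is given below: the pattern layer evaluates a Gaussian kernel of the distance to each training sample, and the summation layer forms the weighted numerator and denominator. This is an illustrative implementation, not the authors' code.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN prediction: kernel-weighted average of training targets (Eq. 1).

    sigma is the smoothing factor of the Gaussian pattern layer.
    """
    preds = []
    for x in np.atleast_2d(X_query):
        d2 = np.sum((X_train - x) ** 2, axis=1)        # squared distances
        w = np.exp(-d2 / (2 * sigma ** 2))             # pattern-layer outputs
        preds.append(np.sum(w * y_train) / np.sum(w))  # Eq. (1) as a discrete ratio
    return np.array(preds)
```

With a small sigma the estimate collapses to the nearest training target; with a large sigma it approaches the mean of all targets, which is why tuning sigma matters.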
3.2 FOA
Pan (2012) proposed FOA, an optimisation algorithm that mimics the foraging
behaviour of a fruit fly. FOA simulates procedure utilised by fruit flies for finding
food by manipulating their intense sense of vision and smell. FOA applies an itera-
tive space search for finding solutions (Cao and Wu 2016). Computations are simple
and minimal; their convergence rate is quick and generally easy for implementation
(Li et al. 2020; Cao and Wu 2016). Therefore, FOA has subsequently been the main
point of most investigations in optimisation domain (Mao et al. 2014). In addition,
FOA can get the better of problems in finding the optimum GRNN flattening factor
σ faced by the prevailing GRNN method, thus enhancing the prediction accurate-
ness. Figure 2 shows the flowchart of FOA optimisation procedure for GRNN, with
detailed steps.
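The flowchart's steps can be sketched as follows (an illustrative, simplified FOA loop, not the authors' exact configuration): fly positions are perturbed around the swarm location, the smell concentration 1/distance is taken as the candidate σ, and the swarm flies to the best-smelling position found so far.

```python
import numpy as np

def foa_optimise(fitness, n_flies=20, n_iter=50, seed=0):
    """Minimise fitness(sigma) with a basic Fruit Fly Optimisation loop.

    Smell-based search: each fly's distance from the origin is mapped to a
    candidate smoothing factor sigma = 1/distance (Pan 2012).
    Vision-based search: the swarm moves to the best fly found so far.
    """
    rng = np.random.default_rng(seed)
    x_axis, y_axis = rng.uniform(0.0, 1.0, 2)           # initial swarm location
    best_sigma, best_fit = None, np.inf
    for _ in range(n_iter):
        xs = x_axis + rng.uniform(-0.1, 0.1, n_flies)   # random flight of flies
        ys = y_axis + rng.uniform(-0.1, 0.1, n_flies)
        sigma = 1.0 / np.sqrt(xs ** 2 + ys ** 2)        # smell concentration
        fit = np.array([fitness(s) for s in sigma])
        i = int(fit.argmin())
        if fit[i] < best_fit:                           # fly to the best position
            best_fit, best_sigma = float(fit[i]), float(sigma[i])
            x_axis, y_axis = xs[i], ys[i]
    return best_sigma, best_fit
```

For GRNN-FOA, `fitness(σ)` would be, for example, the training (or cross-validated) RMSE of the GRNN with that smoothing factor.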
Application of a Combined GRNN-FOA Model for Monthly … 359
This section describes the results of rainfall prediction using the GRNN and GRNN-FOA
models under different scenario conditions. Performance indicators, namely NSE, RMSE,
and WI, are employed to evaluate model efficacy. Results reveal that GRNN-FOA
performs better than standard GRNN, with NSE, RMSE, and WI values
of 0.9964, 1.39, and 0.9937, respectively, during the training phase; the corresponding GRNN values of NSE,
RMSE, and WI are 0.9547, 10.3398, and 0.9533. The
performances of the proposed algorithms for all five scenario conditions are given in
Table 1.
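For reference, the three indicators can be computed as follows (a standard formulation with our own `obs`/`sim` naming; WI is taken here to be Willmott's index of agreement):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 matches the mean predictor."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root mean square error, in the units of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def willmott_index(obs, sim):
    """Willmott's index of agreement (assumed meaning of WI here)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    num = np.sum((obs - sim) ** 2)
    den = np.sum((np.abs(sim - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return 1.0 - num / den
```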
Figure 3 shows the scatter plots of predicted rainfall at the Keonjhar gauge
station, with prominent R² values of 0.94353 and 0.96875 for GRNN and GRNN-FOA,
respectively. Figure 4 compares the predictions of the two models (GRNN and GRNN-FOA) with observed rainfall for the Keonjhar gauge station. Box plots for the actual values and the
predicted models (GRNN and GRNN-FOA) are given in Fig. 5.
It is observed that GRNN-FOA is superior to GRNN as a prospective approach.
GRNN cannot capture non-linearity in a dataset, and GRNN-FOA winds up being
Table 1 Results presented in training and testing phases using GRNN and GRNN-FOA models

Station name  Model name   Training                    Testing
                           NSE     RMSE     WI         NSE     RMSE     WI
Keonjhar      GRNN 1       0.9516  13.5219  0.9486     0.9448  17.0254  0.9426
              GRNN 2       0.9521  12.964   0.9502     0.9452  16.934   0.943
              GRNN 3       0.9532  12.36    0.951      0.9463  16.128   0.9447
              GRNN 4       0.954   11.047   0.9526     0.947   15.4126  0.9451
              GRNN 5       0.9547  10.3398  0.9533     0.9487  14.394   0.9462
              GRNN-FOA 1   0.9932  4.5872   0.9908     0.9658  8.8746   0.9617
              GRNN-FOA 2   0.9943  3.478    0.9916     0.9664  8.103    0.9631
              GRNN-FOA 3   0.995   2.9634   0.9924     0.967   7.4469   0.965
              GRNN-FOA 4   0.9956  2.1178   0.993      0.9683  6.72     0.9668
              GRNN-FOA 5   0.9964  1.39     0.9937     0.9699  5.3301   0.9672
Fig. 3 Scatter plots showing R² and linearly fitted equation between observed and forecasted
rainfall values
Fig. 4 Observed and forecasted rainfall for a GRNN and b GRNN-FOA models
362 D. P. Satapathy et al.
useful in such situations. In addition, the root mean square error (RMSE) is computed for
both models to assess their performance. The outcomes of the present
research reveal that stand-alone ML models are capable of predicting rainfall with a
standard level of precision, but applying hybrid ML algorithms certainly provides
more precise rainfall predictions.
5 Conclusion
References
1. Agnihotri A, Sahoo A, Diwakar MK (2021) Flood prediction using hybrid ANFIS-ACO model:
a case study. In: Inventive computation and information technologies: proceedings of ICICIT
2021, p 169
2. Chen L, Singh VP, Guo S, Zhou J, Ye L (2014) Copula entropy coupled with artificial neural
network for rainfall–runoff simulation. Stoch Env Res Risk Assess 28(7):1755–1767
3. Danandeh Mehr A, Nourani V, Karimi Khosrowshahi V, Ghorbani MA (2019) A hybrid support
vector regression-firefly model for monthly rainfall forecasting. Int J Environ Sci Technol
(IJEST) 16(1)
4. Hartmann H, Snow JA, Stein S, Su B, Zhai J, Jiang T, Krysanova V, Kundzewicz ZW (2016)
Predictors of precipitation for improved water resources management in the Tarim River basin:
creating a seasonal forecast model. J Arid Environ 125:31–42
5. Jimmy SR, Sahoo A, Samantaray S, Ghose DK (2021) Prophecy of runoff in a river basin using
various neural networks. In: Communication software and networks. Springer, Singapore, pp
709–718
6. Kamel AH, Afan HA, Sherif M, Ahmed AN, El-Shafie A (2021) RBFNN versus GRNN
modeling approach for sub-surface evaporation rate prediction in arid region. Sustain Comput
Inform Syst 30:100514
7. Kusiak A, Wei X, Verma AP, Roz E (2012) Modeling and prediction of rainfall using radar
reflectivity data: a data-mining approach. IEEE Trans Geosci Remote Sens 51(4):2337–2342
8. Lu W, Chu H, Zhang Z (2015) Application of generalized regression neural network and
support vector regression for monthly rainfall forecasting in western Jilin Province, China. J
Water Supply Res Technol—AQUA 64(1):95–104
9. Moustris KP, Larissi IK, Nastos PT, Paliatsos AG (2011) Precipitation forecast using artificial
neural networks in specific regions of Greece. Water Resour Manage 25(8):1979–1993
10. Modaresi F, Araghinejad S, Ebrahimi K (2018) A comparative assessment of artificial neural
network, generalized regression neural network, least-square support vector regression, and
K-nearest neighbor regression for monthly streamflow forecasting in linear and nonlinear
conditions. Water Resour Manage 32(1):243–258
11. Mohanta NR, Patel N, Beck K, Samantaray S, Sahoo A (2021) Efficiency of river flow prediction
in river using wavelet-CANFIS: a case study. In: Intelligent data engineering and analytics.
Springer, Singapore, pp 435–443
12. Nagahamulla HR, Ratnayake UR, Ratnaweera A (2012) An ensemble of artificial neural
networks in rainfall forecasting. In: International conference on advances in ICT for emerging
regions (ICTer2012). IEEE, pp 176–181
13. Niu D, Wang H, Chen H, Liang Y (2017) The general regression neural network based on
the fruit fly optimization algorithm and the data inconsistency rate for transmission line icing
prediction. Energies 10(12):2066
14. Ruiming F, Shijie S (2020) Daily reference evapotranspiration prediction of Tieguanyin tea
plants based on mathematical morphology clustering and improved generalized regression
neural network. Agric Water Manage 236:106177
15. Sahoo A, Samantaray S, Paul S (2021) Efficacy of ANFIS-GOA technique in flood prediction:
a case study of Mahanadi river basin in India. H2Open J 4(1):137–156
16. Salehi M, Farhadi S, Moieni A, Safaie N, Hesami M (2021) A hybrid model based on general
regression neural network and fruit fly optimization algorithm for forecasting and optimizing
paclitaxel biosynthesis in Corylus avellana cell culture. Plant Methods 17(1):1–13
17. Samantaray S, Sahoo A (2020) Prediction of runoff using BPNN, FFBPNN, CFBPNN
algorithm in arid watershed: a case study. Int J Knowl Based Intell Eng Syst 24(3):243–251
18. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table depth
using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
19. Samantaray S, Sahoo A, Ghose DK (2019) Assessment of groundwater potential using
neural network: a case study. In: International conference on intelligent computing and
communication. Springer, Singapore, pp 655–664
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 365
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_35
366 V. Bhateja et al.
under a microscope. Microscopy images are acquired from human body samples (blood, oral
cavity, or urine), from which a slide is prepared and placed under a microscope to capture an image. Using these images,
microorganisms are detected and classified by the pathologist [2]. Because of poor
visibility, low contrast, acquired noise, etc., the credibility of these images for classification purposes is questionable. To increase classification accuracy, enhancement
of the microscopy images is necessary. To achieve better results, pre-processing
techniques are used followed by machine learning algorithms for an automated clas-
sification. In the works [3–5], multi-scale retinex (MSR) is used for the contrast
enhancement of the image. Dynamic range compression and color constancy are
obtained by this method. The GIF [6] is used for enhancement as well as noise
filtering in the microscopy images. That work mostly emphasizes the various uses of GIF,
such as noise reduction, contrast enhancement, and feathering. In [7], OT has
been used for segmentation of the images. In [8], SIFT feature extraction is used for
classification of the images. In [9], SVM is used for classification
of bio-medical images. The combinative approach using these techniques can be
used to develop an automated classifier for microscopy images. The rest of the paper is organized as follows: Sect. 2 gives a general
overview of GIF, OT, SIFT, and SVM; Sect. 3 discusses the IQA, performance metrics, and the outcomes of this work; and Sect. 4 concludes the work.
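As a sketch of the GIF enhancement step referenced above (the standard single-channel guided filter formulation of He et al.; the parameters `r` and `eps` are illustrative, not the values used in this work):

```python
import numpy as np

def box_mean(a, r):
    """Mean over a (2r+1)x(2r+1) window, edge-padded, via integral images."""
    n = 2 * r + 1
    p = np.pad(a.astype(float), r, mode="edge")
    c = p.cumsum(axis=0).cumsum(axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))               # zero row/col for clean diffs
    s = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return s / n ** 2

def guided_filter(I, p, r=4, eps=1e-2):
    """Guided image filter: the output is locally a linear transform of the
    guide I, smoothing noise while preserving the guide's edges."""
    mI, mp = box_mean(I, r), box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mI * mp
    var_I = box_mean(I * I, r) - mI * mI
    a = cov_Ip / (var_I + eps)                    # eps controls edge retention
    b = mp - a * mI
    return box_mean(a, r) * I + box_mean(b, r)
```

Self-guided filtering (p = I) gives the edge-preserving smoothing commonly used as a base layer for contrast enhancement and noise filtering.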
The segmentation output can be written as p(x, y) = 1 if q(x, y) > t, and p(x, y) = 0 otherwise,
where p(x, y) is the pixel value of the result at (x, y), q(x, y) is the input
image pixel value, and t is the optimum threshold value.
OT is an automatic region-based segmentation algorithm whose result largely depends on
the threshold value selected. To achieve reliable segmentation, special attention
has to be given to selecting an optimal threshold value.
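A compact sketch of the threshold selection (the standard between-class-variance formulation of Otsu's method; the 8-bit histogram assumption is ours):

```python
import numpy as np

def otsu_threshold(img):
    """Pick t maximising the between-class variance w0*w1*(mu0-mu1)^2
    of the 8-bit greyscale histogram (Otsu's criterion)."""
    prob = np.bincount(img.ravel(), minlength=256) / img.size
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue                                  # one class is empty
        mu0 = (np.arange(t) * prob[:t]).sum() / w0    # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_b = w0 * w1 * (mu0 - mu1) ** 2
        if var_b > best_var:
            best_var, best_t = var_b, t
    return best_t

def segment(img):
    """Binary map: p(x, y) = 1 where q(x, y) > t, else 0."""
    return (img > otsu_threshold(img)).astype(np.uint8)
```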
SIFT is used to extract local features, known as key points, of an
image. These extracted key points of the microscopy images are rotation- and
scale-invariant and are used for image matching and image classification by the
machine learning algorithms. The SIFT features depend on the appearance of
the bacterial cells in the microscopy image [14] but are largely independent
of the illumination level of the image, minor changes in viewpoint, and the
noise in the images. The SIFT key points extracted are saved as a visual vocabulary
for the classification of images. This visual vocabulary is known as the bag of visual
words (BoVW) model. This BoVW model is used as a reference by the classification
algorithms.
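The BoVW construction can be sketched as follows (illustration only: a toy k-means over descriptor vectors stands in for the real SIFT pipeline, and all names are ours):

```python
import numpy as np

def build_vocabulary(descriptors, k=8, n_iter=20, seed=0):
    """Cluster local descriptors (e.g. 128-D SIFT vectors) into k 'visual
    words' with plain k-means; the centres form the vocabulary."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k,
                                     replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((descriptors[:, None] - centres) ** 2).sum(-1),
                           axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)     # recentre each word
    return centres

def bovw_histogram(descriptors, centres):
    """One image's feature vector: normalised counts of its descriptors'
    nearest visual words, which is what the classifier consumes."""
    labels = np.argmin(((descriptors[:, None] - centres) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(centres)).astype(float)
    return hist / hist.sum()
```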
for classification; therefore, the SVM hyperplane is of ‘Y’ shape. During the testing
phase, the datapoints (i.e. the features) are classified into the category to which
they are closest. The proposed design methodology for the automated classification
is shown in Fig. 1.
Fig. 2 a Original test image#1, b contrast enhancement using GIF, c segmented image, d SIFT
feature extraction
using SVM as discussed in Sect. 2.4. Based on this classifier, a confusion matrix is
obtained to evaluate the overall accuracy of the classifier. The simulation responses are shown in Figs. 2 and 3: Fig. 2 depicts the results obtained at each step
of the workflow, while Fig. 3 depicts the confusion matrix obtained for the SVM classification of the bacterial cells.
The IQA parameters SD and entropy are used to compare the responses
of the original image and the GIF output.
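The two IQA parameters are straightforward to compute (standard definitions; an 8-bit greyscale image is assumed):

```python
import numpy as np

def iqa_metrics(img):
    """Standard deviation (spread of intensities, a proxy for contrast)
    and Shannon entropy (information content) of an 8-bit image."""
    sd = float(img.std())
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]                                  # 0 * log 0 := 0
    entropy = float(-(p * np.log2(p)).sum())
    return sd, entropy
```

Higher values after filtering indicate increased contrast and detail, which is how the GIF response is compared with the original image.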
3.3 Discussions
Figure 2 depicts a test image and the responses at all the proceeding steps, while Fig. 3
shows the confusion matrix obtained for the SVM classifier. The original test image
consists of bacterial cells, but the appearance of these cells in the microscopy
images is not very clear, thus making classification very challenging.
The poor visual characteristics of the input microscopy images should be improved
for the proper segmentation and classification. The contrast enhancement achieved
by GIF is shown in Fig. 2b. It can be clearly inferred from this image that GIF
has improved the image quality by enhancing the contrast and sharpening the
edges. This is further evidenced by the increased SD and entropy values of the GIF
response compared with the original microscopy images. The bacterial cells are properly visible
and are separated from each other as well as from the background. After the
properly and are separated from each other as well as the background. After the
contrast enhancement, OT is used for segmentation. In the segmentation process, the
unwanted background is removed from the images, while the meaningful regions
(bacteria cells) are retained as depicted in Fig. 2c. After the segmentation of the
images, features are extracted in the form of key points in the images which are
further used for training the SVM for the successful classification of the bacterial
cells. The SVM is trained for the classification of six species of bacteria cells, and
the confusion matrix thus obtained is depicted in Fig. 3. On the basis of the confusion
matrix, the accuracy of the classifier is calculated to be 93.3%.
4 Conclusion
the classification is shown in the form of a confusion matrix in Fig. 3, on the basis
of which the accuracy is calculated to be 93.3%, which is very convincing. This
work can be further improved by refining the various methods used. The GIF used in
this work is manually tuned; it could be tuned automatically using
optimization algorithms [19]. OT is a very simple segmentation algorithm and could
be replaced by a more sophisticated technique. Further, the SVM classifier used
here could be improved by using deep learning algorithms for
more accurate classification. By incorporating these techniques, a more flexible
classifier can be developed.
References
15. Bhateja V, Taquee A, Sharma DK (2019) Pre-processing and classification of cough sounds in
noisy environment using SVM. In: Proceedings of 4th international conference on information
systems and computer networks (ISCON). Mathura, India, pp 822–826
16. Bhateja V, Nigam M, Bhadauria AS, Arya A, Zhang EY (2019) Human visual system based opti-
mized mathematical morphology approach for enhancement of brain MR images. J Ambient
Intell Humanized Comput 1–9
17. Sahu A, Bhateja V, Krishn A (2014) Medical image fusion with Laplacian pyramids.
In: Proceedings of international conference on medical imaging, M-health and emerging
communication systems (MedCom). Greater Noida, India, pp 448–453
18. The bacterial image dataset (DIBaS) is available online at: http://misztal.edu.pl/software/dat
abases/dibas/. Last visited on 10 Dec 2020
19. Jordehi AR (2015) Enhanced leader PSO (ELPSO): a new PSO variant for solving global
optimisation problems. Appl Soft Comput 26:401–417
Application of Machine Learning
Algorithms for Creating a Wilful
Defaulter Prediction Model
Abstract A “wilful defaulter” is a borrower who has the financial means to repay the
bank but chooses not to do so. With the increasing cases of such defaulters creating
serious economic implications for the country, it is essential to develop a robust
credit assessment model to predict defaulters. The primary objective of this study
is to develop an efficient wilful defaulter prediction model through the deployment
of machine learning algorithms like logistic regression, Naïve Bayes, and random
forest. The dataset comprised 250 publicly listed companies as published by the RBI and
the All India Bank Employees Association for 2020–2021. The analysis showed that
debt service coverage ratio, debt-equity ratio, profit after tax, governance factors like
board size, promoters, and board composition are crucial factors in this prediction.
The study helps organizations by giving them a framework to focus upon these factors
to avoid financial complications in future.
1 Introduction
The term “default” refers to any individual or company that has not paid their loans
within the agreed-upon payback time and has breached the lending authority’s terms
and conditions. A “wilful defaulter” is a borrower who has the financial means to
repay the bank but chooses not to do so. The Reserve Bank of India (RBI) has
issued guidelines for identifying a wilful defaulter [16]. The defaulter ratio has been
steadily rising over a period of time for a number of reasons: taking advantage of the
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 373
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_36
374 B. Uma Maheswari et al.
country’s weak governance systems, ineffective economic and legal systems, and the
inability of the financial institutions’ risk assessment models to predict defaulters.
However, with the implementation of machine learning models, data analytics can
be effectively deployed to improve the prediction rate. This study aims to propose
a model that would not only prove to be an efficient credit assessment solution but
also act as a tool in minimizing the risk of defaults by predicting potential defaulters
and analysing the attributes that lead to the scenario.
2 Literature Review
The banking sector has contributed significantly to the country’s economic develop-
ment. One of the most important banking activities is lending loans to customers,
corporate organizations, micro, small, and medium enterprises (MSMEs), and start-
ups. Bank credit increased at a compounded annual growth rate (CAGR) of 0.29%
from fiscal year (FY) 16 to fiscal year (FY) 21, with total credit extended in FY21
totalling $1487.60 billion. The current challenges faced by a multitude of financial
institutions in the banking sector in India are high non-performing assets (NPAs) due
to an increasing number of defaulters with outstanding payments exceeding 25 lakhs
and 1 crore, according to RBI [16]. In March 2021, Indian banks were said to have
gross NPAs valued at over Rs. 8.3 lakh crore. Many businesses that were affected
by the current pandemic are experiencing liquidity shortages, which has resulted in
payments and reimbursements being delayed. Post-COVID defaults are expected to
increase to 10–11 lakh crores by March 2022 [4]. Wilful defaulters rose by over 200
from 2208 to 2494 in FY21, as of March 2021 [13] according to RBI data. The top
100 wilful defaulters owe lenders Rs. 84,632 crore [14]. Furthermore, legacy debts
or backlogs on the bank balance sheets must still be accounted for as they account
for a large share of NPAs. The Ministry of Finance in India has recently proposed
a scheme by setting up bad banks, collectively called the National Asset Recon-
struction Company (NARCL), with funding to help the banks recover from backlogs
during the first phase of recovery. This will help stabilize the margins of banks and,
furthermore, contribute to the GDP of the country [6]. Initially, the government has
agreed to lend Rs. 90,000 crores to NARCL and provide additional incremental funds
with an overall target of covering 2 lakh crores of NPA.
Many studies have developed models for credit risk assessment using unstructured
and structured data [2]. In certain studies, it was found that, in comparison with the
final prediction model, more emphasis should have been given to the data prepro-
cessing steps to achieve better accuracy [1]. The relevant features were extracted prior
to model construction by using a compromised analytic-hierarchy process (AHP)
approach [5]. In contrast to a mere quantitative metric-based approach, the financial
characteristics of companies were studied, which classified defaulters with respect to
industry and identified the likeliness of a defaulter trait using the Altman Z Score [9].
Another study focused on the various types of loan products and the credit scoring
models used to grant loans [1]. The characteristic traits were used to build a credit risk
model using logistic regression along with net cash flow from financing activities,
investment activities, and cash inflows and outflows [8].
A CatBoost model with synthetic characteristics was used to create a
prediction model that focuses significantly on categorical features [15]. Research on the Indian
banking sector and significant factors responsible for the non-performing assets have
been identified. According to this study, non-priority sector lending contributes more
to NPAs than priority sector loans [5]. In addition to this, it was also inferred that
a fraudulent credit rating affects the probability of default. It is seen that increasing
liquidity and competition are partly responsible for the growing NPAs [11]. An
empirical study was undertaken to examine the non-performing assets in India’s
governmental, corporate, and foreign sector banks [3]. In the case of educational loan
defaults, the impact of macroeconomic conditions greatly improved the classification
accuracy [7]. A few others studied the measures taken by the government to accelerate
the loan recovery rate, which can help with new model design [12].
Studies also dealt with the partitioning and clustering of real-time data by imple-
menting incremental k-means for clustering with reduced iterations, modified k-
modes using frequency measures, and K-prototype algorithms for a hybrid method,
thereby reducing the cost of implementation [10]. A thorough review of existing
literature in this domain indicates that mostly quantitative and financial factors were
considered for prediction of such credit assessment models, and the qualitative factors
were neglected. This study addresses the limitations of the previous research and uses
financial indicators, macroeconomic factors represented by industry performance,
and factors relating to corporate governance. The primary objective of the paper is
to design a machine learning model to predict the wilful defaulters using logistic
regression, Naive Bayes, and random forest algorithms and to understand the key
influential factors that lead to wilful default and outstanding loans.
3 Methodology
The backbone of the research primarily starts with the selection of a relevant
dataset comprising of wilful defaulters and non-defaulters, extracted from the recent
published list of wilful defaulters by the RBI based on outstanding payments,
credit ratings, history of repayments, and potential bankruptcy. A list of 150 wilful
defaulters was selected from this list, and 100 non-defaulter companies were selected from
the NIFTY 50 index. The resultant dataset is the combination of these 250 publicly
listed companies. The dataset considers variables that the literature has identified
as factors that contribute significantly to a firm’s performance in terms of financial,
industrial, governance, and firm performance characteristics. The data pertaining to
the parameters that define the characteristics of a wilful defaulter has been extracted
from the CMIE Prowess IQ database. The final dataset contains 30 variables under
these four main categories. The financial variables comprise key ratios like liquidity
ratios, profitability ratios, and leverage ratios. The industry characteristics include
market share, expenses, total assets, and total debts. The governance attributes include
board size, composition of directors, board diversity, number of meetings attended,
and CEO duality. The firm’s performance is measured using return on assets (ROA),
net worth, and net income. Among these variables, the dependent variable is the vari-
able indicating a wilful defaulter which is a binary classification variable with the
values “Yes” (defaulter) and “No” (non-defaulter). The sub-categories under each of
the indicators are presented in Table 1. The data preprocessing was done by identi-
fying the outliers and missing values and treating them. The outliers were treated by
the winsorization technique using the 95 percentile and 5 percentile rule to achieve
better results. Missing (NA) values were treated using the mean imputation method.
Figure 1 represents the model relating variables such as financial indicators,
industry characteristics, and firm performance to whether a company would
be a wilful defaulter or not.
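The preprocessing described above can be sketched as follows (the column names are hypothetical; the actual CMIE Prowess variables are not reproduced here):

```python
import numpy as np
import pandas as pd

def preprocess(df, numeric_cols):
    """Winsorise each numeric column at its 5th/95th percentiles, then
    mean-impute remaining missing (NA) values, as described in the text."""
    out = df.copy()
    for col in numeric_cols:
        lo, hi = out[col].quantile([0.05, 0.95])
        out[col] = out[col].clip(lower=lo, upper=hi)   # winsorisation
        out[col] = out[col].fillna(out[col].mean())    # mean imputation
    return out
```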
The different model performance measures used in this study are discussed here.
Accuracy is a measure of overall model performance and is the percentage of
correct predictions; sensitivity (“true positive rate”) is the percentage
of actual positives correctly predicted; and specificity (“true negative rate”) is the
percentage of actual negatives correctly predicted. The area under the curve (AUC) measure
gives the area under the ROC curve (ROC—receiver operating characteristic). The
higher the AUC, the better the model distinguishes between positive and negative
classes. The Gini coefficient ranges from 0 (no discriminative power) to 1 (perfect discrimination). The Kolmogorov–
Smirnov (KS) parameter estimates the degree of separation between the positive and
negative distributions. The higher the number, the better the model is at distinguishing
between positive and negative classes.
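Under the definitions above, the measures can be computed from labels and predicted scores as follows (a standard sketch with our own names, assuming no tied scores for the rank-based AUC):

```python
import numpy as np

def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, AUC, Gini (= 2*AUC - 1), and the
    KS statistic (max separation of class-wise score distributions)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    # AUC via the Mann-Whitney rank formulation (assumes untied scores)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n1 = int(y_true.sum()); n0 = len(y_true) - n1
    auc = (ranks[y_true == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    # KS: widest gap between the two classes' exceedance rates
    ks = max(abs(np.mean(y_score[y_true == 1] >= t) -
                 np.mean(y_score[y_true == 0] >= t))
             for t in np.unique(y_score))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "auc": auc, "gini": 2 * auc - 1, "ks": ks}
```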
The analysis shows that there is a significant difference between defaulters and non-defaulters in terms of the different financial indicators, and the ratios of the non-defaulters
are better than those of the defaulters (Table 2).
Three machine learning models (logistic regression, Naïve Bayes, and random
forest) were applied to the dataset. The model outputs and the corresponding accuracy
measures are presented in Table 3.
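A minimal sketch of the model comparison (scikit-learn with illustrative hyperparameters and train/test split, not the study's exact configuration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def compare_models(X, y, seed=42):
    """Fit the three classifiers used in the study; return test accuracies."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),
        "random_forest": RandomForestClassifier(n_estimators=200,
                                                random_state=seed),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te)
            for name, m in models.items()}
```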
Table 3 shows that the random forest algorithm has the highest accuracy in compar-
ison with the other models. Here, the specificity value is the highest in all the models,
which indicates that the prediction of negative classes is done accurately. The KS
parameter is very high in the random forest, and the Gini coefficient ranges from
0.15 to 0.60 in the model. Overall, the random forest algorithm gives the best results
in terms of models predicting wilful defaulters. Therefore, the variable importance
plot (Fig. 2) of the random forest algorithm was analysed in order to understand the
significant variables influencing defaulters.
It is seen that among the four main categories of financial, industry, governance,
and firm performance, the variable importance plot (Fig. 2) shows that under the
financial indicators, the highly significant variables include profit after tax (PAT), debt-to-
equity ratio, and debt service coverage ratio (DSCR). PAT is an important indicator
of the operational efficiency and performance of the organization and is important
for the distribution of dividends or retained earnings to shareholders. A corporation’s
DSCR is important when a corporation has borrowings such as bonds, loans, or lines
of credit. If the DSCR ratio is less than one, it indicates a negative cash flow and the
borrower can only service debts partially. This can help lenders effectively assess
borrowers. If the DSCR ratio is 1 or above, it means the firm has sufficient revenue to
settle the debts. Hence, DSCR is an important parameter in risk assessment and loan
approval. The debt-equity ratio (D/E) measures a firm’s ability to repay loans based
on the equity distributed among its shareholders. Thus, lenders can be self-assured
about the credit worthiness of the borrowing firm before granting loans.
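The two ratios reduce to simple arithmetic (illustrative figures only; the partial-servicing case is the DSCR < 1 branch described above):

```python
def dscr(net_operating_income, total_debt_service):
    """Debt service coverage ratio; < 1 means debts can only be partly serviced."""
    return net_operating_income / total_debt_service

def debt_equity(total_debt, shareholder_equity):
    """D/E ratio: leverage relative to the equity held by shareholders."""
    return total_debt / shareholder_equity

# e.g. income of 120 against debt service of 100 gives DSCR 1.2: debts covered
```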
The other common variables affecting the performance of a company and causing
potential defaults include the short-term liquidity of the company as represented by
current assets and current liabilities. Creditors use this ratio to assess a company’s
capacity to pay short-term obligations. Before a loan approval is granted, lenders
understand the ability of the firm to pay off its debts and other obligations. In general,
higher assets indicate that the company is expanding. The asset turnover of a corpo-
ration is a measure of its capacity to produce revenue from its assets. The higher the
asset turnover, the more efficient the company is; on the other hand, a lower value
indicates that the company is not successfully utilizing its assets to generate revenue.
This is an important factor to consider while assessing the credit worthiness of a
borrower. If the total volume of assets is not sufficient, it could possibly
indicate that the market performance and liquidity of that company are weak. In such
cases, it is better to avoid taking risks by reconsidering loan approvals. The measure
of total assets is not only critical for loan grants but also determines the market
potential of a company in attracting potential investors. A high debt-to-equity ratio
is believed to signal that a company is having financial troubles and may be unable
to pay its creditors. If it is too low, the company is relying too heavily on equity to
support its operations, which can be expensive and inefficient. This ratio displays
the proportions of equity and debt utilized to fund a company’s assets, as well as the
extent to which shareholder equity can satisfy creditors’ obligations in the event of
a company’s failure.
Furthermore, the variables that fall under governance characteristics include the
number and percentage of promoters, board size, board composition, and the number of directors, both independent and non-independent. Independent directors are those
who are outside board members with minimal responsibilities, like attending the
annual general meetings and being present during executive board events. The non-
independent board directors are those who are directly involved in the process of
decision making. These variables define the administrative efficiency of a company
and its corporate governance. By taking these into consideration, a lender can gain
insights into the management, hierarchy, and internal work ethics of an organiza-
tion. The ownership structure of a company has a significant impact on its long-term
performance, and research shows that a high concentration of promoter ownership
5 Conclusion
In India, the majority of the lenders grant loans to individuals and firms based upon
historical payment behaviour, credit ratings, and current financial position. Any finan-
cial institution with the authority to provide funding must have a stable credit risk
assessment model to avoid outstanding debt payments. Contemporarily, the risk
assessment system uses financial metrics and neglects administrative attributes. The
potential scope for the future lies with predictive analytics as it serves the purpose of
identifying frauds by prevention of risks involved in approving bad loans. This paper
proposes a machine learning model that takes all the major influential factors. The
prediction model built using the random forest algorithm provides the highest accu-
racy of 93.85%. The models have also identified key significant predictor variables
that impact the defaulter prediction. More weightage should be given to industry
parameters like total assets and liabilities, DSCR ratio, and debt-equity ratio. In
addition, governance factors like board size, promoters, composition and financial
indicators like PAT were also identified as important. The model can play an impor-
tant role in the mitigation of non-performing assets (NPA) in the country by predicting
potential wilful defaulters. The outcome of the paper highlights the advantages of
using machine learning to predict defaulters and promises a stable and effective credit
assessment solution that can be integrated with banks and financial institutions in
future. The study helps organizations by giving a framework to concentrate on the
factors that need to be focused upon to avoid such financial complications in future.
References
1. Ahmad Itoo R, Selvarasu A (2017) Loan products and credit scoring methods by commercial
banks. Int J Latest Trends Finance Econ Sci 7(1):1297–1304
2. Attigeri G, Pai MMM, Pai RM (2019) Framework to predict NPA/willful defaults in corporate
loans: a big data approach. Int J Electr Comput Eng (IJECE) 9(5):3786
3. Bhasin ML (2017) Galloping non-performing assets bringing a stress on India’s banking sector:
an empirical study of an Asian country. Int J Manage Sci Bus Res 6(3):1–26
4. Das JK, Dey S (2019) Factors contributing to non-performing assets in India: an empirical
study. Rev Prof Manage J New Delhi Inst Manage 16(2):62
5. Eddy YL, Nazri EM, Mahat NI (2020) Identifying relevant predictor variables for a credit
scoring model using compromised-analytic hierarchy process (compromised-AHP). J Adv
Res Bus Manage Stud 20(1):1–13
Chirag Arora
1 Introduction
Circularly polarized antennas readily reduce multipath effects and suppress rain-induced
interference. Such antennas are therefore good candidates
for various applications such as satellite communication [1–7]. Researchers
have adopted various techniques to enhance the gain and 3 dB axial ratio bandwidth
of circularly polarized antennas [8–14]. In [8], Guo and Tan designed a
multilayered circularly polarized patch antenna. This antenna uses a single feed and
provides wide bandwidth, but its fabrication is quite tedious, since very high precision
is required to maintain the desired gap between the different layers of the antenna.
C. Arora (B)
KIET Group of Institutions, Delhi-NCR, Ghaziabad, Uttar Pradesh, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 383
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_37
Wang et al., in [10], proposed a circularly polarized antenna with substrate integrated
waveguide. The proposed antenna shows good circular polarization characteristics,
but this feeding technique is not easy to implement in practice. The design
proposed by Cheng and Dong, in [11], provides a wide 3 dB axial ratio bandwidth
of 22.58% and a −10 dB impedance bandwidth of 48.75%, but these characteristics
were obtained by using two suspended metal rods, making the antenna bulky.
Pan and Dong proposed a circularly polarized stacked antenna for radio frequency
identification reader applications [14]. This antenna produces good boresight gain, but at the cost
of a director patch in addition to the parasitic patch. Thus, from the literature
survey presented above, it is concluded that the most common technique for
enhancing the performance of circularly polarized antennas is a multilayered
structure with air filled between the different layers. This technique results in low
fabrication cost, low antenna profile, and low dielectric loss. However, it
requires dielectric posts to support the upper patch, which complicates
fabrication. To the best of the author's knowledge, limited work has been done on
improving the performance of circularly polarized patch antennas using multilayered
structures in which the layers are not separated by air but are instead fixed together
with glue.
In this chapter, a two-layered circularly polarized microstrip patch antenna is
explored, in which the two layers are not separated by an air gap but
are stuck together with a pasting material. The upper layer of the proposed
antenna is composed of a mitered square patch to obtain circular polarization.
Metamaterials have been used to improve the performance of this antenna.
These are specially engineered structures that possess peculiar properties not
found in naturally occurring materials. Such materials were proposed by Veselago
[15], and in the 1990s Pendry et al. demonstrated electric plasma from wire-shaped
structures [16] and magnetic plasma from ring-shaped structures [17]. However,
those structures were not planar and hence were difficult to fabricate. Since
then, various two-dimensional metamaterial structures have been designed by various
researchers and are widely used in antenna as well as other microwave and
millimeter-wave applications [18–28]. Metamaterials can be incorporated into
conventional antennas in several ways, the most common techniques being
their loading as a substrate [29, 30] or as a superstrate [31–34]. Although these two
techniques significantly improve the performance parameters of patch
antennas, they come at the cost of the tedious design of an array of metamaterial unit
cells or an increased profile, respectively.
In this paper, the author has realized the metamaterial characteristics through
thin slots on the lower patch and via holes that extend from the L-shaped patches to the
ground plane. The narrow slots provide the left-handed series capacitance, and the via holes
contribute the left-handed shunt inductance. This technique of metamaterial
realization thus eliminates the need to design an array of metamaterial unit
cells.
Moreover, to reduce the profile of the patch antenna, the use of multi-band patch
antennas proves to be very beneficial. Several techniques for designing multi-band
antennas have been discussed in the literature, such as stacking two different structures
Design of Metamaterial-Based Multilayer Dual Band … 385
[35], using stubs [18], using a defected ground plane [36], and slotting the radiator to
create perturbations [37]. Of these techniques, creating perturbations by slotting
the radiator to achieve multi-band operation appears to be the simplest, as
it neither increases the profile of the antenna nor requires any special
fabrication arrangements. Taking advantage of this fact, in this chapter the author
has etched a plus-shaped slot on the lower patch so that dual band operation can
be achieved. The lower patch of the proposed antenna thus serves two
functions: it realizes the metamaterial effect, and it helps achieve
dual band operation through the plus-shaped slot.
Thus, in this paper a metamaterial-based multilayer dual band circularly polarized
microstrip patch antenna is proposed. The antenna comprises two layers,
both pasted together with glue, thus eliminating the problem
of aligning the two layers. The upper layer has a mitered square patch to achieve
circular polarization. The lower layer possesses a plus-shaped patch to obtain dual
band behavior. Further, metamaterials have been used to improve the performance
of this antenna; the metamaterial is realized using L-shaped
slots and via holes. The antenna is simulated on an FR-4 substrate of thickness
h = 1.48 mm, dielectric constant εr = 4.3, and loss tangent 0.01.
2 Antenna Design
A. Configuration of Antenna
The top view of this antenna is presented in Figs. 1 and 2, whereas the side view
of the composite structure is given in Fig. 3. The designed antenna is composed of
two layers, which are pasted together with glue. The upper layer is
composed of a mitered square patch of dimensions 12 mm × 16 mm, and the lower
patch has the same dimensions. Both substrates measure 60 mm × 60 mm.
The perturbation caused by the mitered patch produces the circular
polarization. The width of all slots on the lower patch is 4 mm, the radius of each via hole
is 0.3 mm, and the upper patch is mitered over a length of 4 mm. The antenna is excited
using a probe feed. The lower patch possesses a plus-shaped slot to obtain dual band
behavior. The performance of this antenna is improved by using
metamaterials. The metamaterial behavior is realized by etching L-shaped slots on
the four corners of the lower patch and making via holes from each slot to the ground
plane. The L-shaped patches account for the left-handed series capacitance, whereas the via
holes contribute the shunt inductance. These four patches are etched symmetrically on the
four corners of the lower patch with respect to the central plus-shaped patch. The
via holes are located in such a way that their presence does not affect the
circular polarization; since the current intensity at the corners is usually weak, the
via holes are introduced at the corners of the patches.
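As a rough, illustrative sanity check (not part of the chapter), the standard cavity-model formula for a rectangular microstrip patch can be evaluated for the stated dimensions and substrate. The slots, mitered corners, and metamaterial loading described above shift the actual resonances (reported at 2.45 and 5.8 GHz), so this gives only a ballpark figure for the unperturbed patch:

```python
import math

# Cavity-model estimate of a rectangular patch's dominant-mode resonant
# frequency, neglecting the slots, mitered corners, and metamaterial
# loading described in the chapter.
c = 3e8            # speed of light, m/s
eps_r = 4.3        # FR-4 dielectric constant (from the chapter)
h = 1.48e-3        # substrate thickness, m (from the chapter)
L = 16e-3          # longer patch dimension, m (from the chapter)
W = 12e-3          # shorter patch dimension, m (from the chapter)

# Effective permittivity for a microstrip of width W.
eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5

# Fringing-field length extension.
dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
     ((eps_eff - 0.258) * (W / h + 0.8))

f_r = c / (2 * (L + 2 * dL) * math.sqrt(eps_eff))
print(f"estimated dominant-mode resonance: {f_r / 1e9:.2f} GHz")
```

The estimate lands between the two reported bands, consistent with the plus-shaped slot and left-handed loading pulling the modes apart.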
B. Theory
As discussed in the literature [38], metamaterial-based antennas can be treated
as composite right/left-handed (CRLH) transmission lines with an open termination, whose
equivalent circuit is shown in Fig. 4. As observed from Fig. 4, a metamaterial
structure not only comprises the traditional right-handed shunt capacitance and series
inductance but also possesses a left-handed shunt inductance and series capacitance.
These left-handed characteristics can be realized by thin slots on the microstrip patch and via holes
Fig. 3 Side view of the two-layered metamaterial-inspired dual band circularly polarized microstrip
patch antenna (upper patch, via holes, lower substrate, coaxial probe, and ground)
Fig. 4 Equivalent circuit of the composite right/left-handed transmission line, including the shunt
elements Cshunt and Lshunt
to the ground. This results in both negative and positive phase constants.
Compared with a conventional half-wavelength antenna of the same physical length,
metamaterial antennas possess lower resonant frequencies; hence, metamaterial
structures can be used to realize compact antennas. Moreover, owing to the presence
of multiple left- and right-handed resonant frequencies, metamaterial antennas also
provide dual band operations.
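The dual band mechanism sketched above can be made concrete with the dispersion relation of a homogeneous CRLH transmission line. The formula below is quoted from the general Caloz–Itoh theory underlying [38], not from this chapter; \(L_R, C_R\) are the right-handed per-unit-length inductance and capacitance and \(L_L, C_L\) the left-handed elements:

```latex
\beta(\omega) \;=\; s(\omega)\,
  \sqrt{\omega^{2} L_R C_R \;+\; \frac{1}{\omega^{2} L_L C_L}
        \;-\; \left(\frac{L_R}{L_L} + \frac{C_R}{C_L}\right)},
\qquad
s(\omega) =
  \begin{cases}
    -1, & \omega < \min(\omega_{se}, \omega_{sh}) \quad \text{(left-handed band)}\\
    +1, & \omega > \max(\omega_{se}, \omega_{sh}) \quad \text{(right-handed band)}
  \end{cases}
```

with series and shunt resonances \(\omega_{se} = 1/\sqrt{L_R C_L}\) and \(\omega_{sh} = 1/\sqrt{L_L C_R}\). An open-ended CRLH resonator therefore supports resonances at both negative (left-handed) and positive (right-handed) mode indices, which is the origin of the dual band behavior described above.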
The two opposite cut corners produce the perturbation needed for circular
polarization with a single feed. The via holes are located in such a way that they
coincide with the direction of the current, so that the circular polarization
performance is enhanced.
3 Results
This section presents the simulated results of the proposed antenna with and
without metamaterial loading. Figure 5 shows the simulated return loss characteristics
of the conventional dual band circularly polarized microstrip patch antenna and
of the metamaterial-loaded antenna. It is seen that the unloaded
antenna resonates at 5.8 GHz and 2.45 GHz with bandwidths of 525 MHz and
Fig. 5 Simulated return loss (S11 in dB versus frequency in GHz) of the unloaded and
metamaterial-loaded antennas
Fig. 6 Elevation plane radiation pattern curves of unloaded and loaded proposed antenna at a
5.8 GHz b 2.45 GHz
270 MHz, respectively. When this traditional patch antenna is loaded with the metamaterial,
the bandwidth increases to 590 MHz at the 5.8 GHz resonant frequency and to 325 MHz at
the 2.45 GHz resonant frequency. From Fig. 6, it is observed that the gain of the proposed
antenna is almost the same at both resonant frequencies under loaded and unloaded
conditions.
4 Conclusions
References
1. Jeon SI, Kim YW (2000) New active phased array antenna for mobile direct broadcasting
satellite reception. IEEE Trans Broadcast 46(1):34–40
2. Sajal S, Latif SI, Spencer E (2018) Circularly polarized small-footprint hybrid ring-patch
stacked antenna for pico-satellites. In: IEEE international symposium on antennas and
propagation & USNC/URSI national radio science meeting. Boston, MA, USA
3. Satapathy SC et al (eds) (2016) Information systems design and intelligent applications. In:
Proceedings of third international conference INDIA 2016, vol 2. Springer India
4. Satapathy SC et al (2016) Computer communication, networking and internet security. In:
Proceedings of IC3T
5. Matsunaga M, Yamamoto M (2018) A double-band circularly polarized antenna for satellite
signal bands in the ratio of 3:8. In: IEEE conference on antenna measurements & applications
(CAMA). Sweden
6. Arora C (2021) Metamaterial-inspired circularly polarized microstrip patch antenna. In:
Proceedings of international conference on computer communication, networking and IoT.
Lecture notes in networks and systems book series, vol 197. LNNS, pp 183–190
7. Arora C (2021) Metamaterial-loaded circularly polarized patch antenna array for C band appli-
cations. In: Proceedings of 6th international conference on recent trends in computing. Lecture
notes in networks and systems book series, vol 177. LNNS, pp 57–64
8. Guo YX, Tan DCH (2009) Wideband single-feed circularly polarized patch antenna with
conical radiation pattern. IEEE Antennas Wirel Propag Lett 8:924–926
9. So KK, Wong H, Luk KM, Chan CH (2015) Miniaturized circularly polarized patch antenna
with low back radiation for GPS satellite communications. IEEE Trans Antennas Propag
63(12):5934–5938
10. Wang Y, Zhu F, Gao S (2018) 24-GHz circularly polarized substrate integrated waveguide-fed
patch antenna. In: International applied computational electromagnetics society symposium.
Beijing, China
11. Cheng Y, Dong Y (2021) Wideband circularly polarized split patch antenna loaded with
suspended rods. IEEE Antennas Wirel Propag Lett 20(2):229–233
12. Satapathy SC, Bhateja V, Joshi A (eds) (2016) Proceedings of the international conference on
data engineering and communication technology: ICDECT 2016, Volume 2, vol 469. Springer
13. Satapathy SC, Bhateja V, Das S (2018) Smart intelligent computing and applications. In:
Proceedings of the second international conference on SCI, vol 1
14. Pan Y, Dong Y (2020) Circularly polarized stack Yagi RFID reader antenna. IEEE Antennas
Wirel Propag Lett 19(7):1053–1057
15. Veselago VG (1968) The electrodynamics of substances with simultaneously negative values
of ε and μ. Soviet Physics Uspekhi 10(4):509–514
16. Pendry JB, Holden AJ, Stewart WJ, Youngs I (1996) Extremely low frequency plasmons in
metallic mesostructures. Phys Rev Lett 76(25):4773–4776
17. Pendry JB, Holden AJ, Robbins DJ, Stewart WJ (1999) Magnetism from conductors and
enhanced nonlinear phenomena. IEEE Trans Microw Theory Tech 47(11):2075–2084
18. Ali T, Biradar RC (2017) A compact multiband antenna using λ/4 rectangular stub loaded with
metamaterial for IEEE 802.11N and IEEE 802.16E. Micro Opt Tech Lett 59(5):1000–1006
19. Alu A, Engheta N, Erentok A, Ziolkowski RW (2007) Single negative, double-negative, and
low index metamaterials and their electromagnetic applications. IEEE Antennas Propag Mag
49(1):23–36
20. Rezaeieh SA, Antoniades MA, Abbosh AM (2017) Miniaturization of planar Yagi antennas
using Mu-negative metamaterial-loaded reflector. IEEE Trans Antenna Propag 65(12):6827–
6837
21. Chen PY, Alu A (2010) Dual-mode miniaturized elliptical patch antenna with μ–negative
metamaterials. IEEE Antenna Propaga Lett 9:351–354
22. Joshi JG, Pattnaik SS, Devi S, Lohokare MR (2012) Metamaterial embedded wearable
rectangular microstrip patch antenna. Int J Antenna Propag 2012:1–9
390 C. Arora
23. Arora C, Pattnaik SS, Baral RN (2015) SRR inspired microstrip patch antenna array. J. Prog
Electromag Res C 58(10):89–96
24. Arora C, Pattnaik SS, Baral RN (2015) Microstrip patch antenna array with metamaterial
ground plane for Wi-MAX applications. In: Proceedings of the Springer second international
conference on computer and communication technologies (IC3T-2015). India, pp 665–671
25. Arora C, Pattnaik SS, Baral RN (2016) Metamaterial superstrate for performance enhancement
of microstrip patch antenna array. In: Proceedings of 3rd international conference on signal
processing and integrated networks (SPIN-2016). India, pp 775–779
26. Palandoken M, Grede A, Henke H (2009) Broadband microstrip antenna with left-handed
metamaterials. IEEE Trans Antennas Propag 57(2):331–338
27. Du G, Tang X, Xiao F (2011) Tri-band metamaterial-inspired monopole antenna with modified
S-shaped resonator. Prog Electromag Res Lett 23:39–48
28. Gao XJ, Cai T, Zhu L (2016) Enhancement of gain and directivity for microstrip antenna using
negative permeability metamaterial. AEU Int J Electron Commun 70(7):880–885
29. Li M, Luk KM, Ge L, Zhang K (2016) Miniaturization of magnetoelectric dipole antenna by
using metamaterial loading. IEEE Trans Antennas Propag 64(11):4914–4918
30. Dong Y, Toyao H, Itoh T (2012) Design and characterization of miniaturized patch antennas
loaded with complementary split-ring resonators. IEEE Trans Antennas Propag 60(2):772–785
31. Arora C, Pattnaik SS, Baral RN (2017) SRR superstrate for gain and bandwidth enhancement
of microstrip patch antenna array. Prog Electromag Res B 76:73–85
32. Arora C, Pattnaik SS, Baral RN (2017) Performance enhancement of patch antenna array for
5.8 GHz Wi-MAX applications using metamaterial inspired technique. Int J Electron Commun
(AEÜ) 79:124–131
33. Chung KL, Chaimool S (2012) Broadside gain and bandwidth enhancement of microstrip patch
antenna using a MNZ-metasurface. Microw Opt Technol Lett 54(2):529–532
34. Wu Z, Li L, Li Y, Chen X (2016) Metasurface superstrate antenna with wideband circular
polarization for satellite communication application. IEEE Antennas Wirel Propag Lett 15:374–
377
35. Shafai L, Chamma W, Seguin G, Sultano N (1997) Dual-band dual polarized microstrip
antennas for SAR applications. In: Proceedings of IEEE antennas and propagation international
symposium. Canada, pp 1866–1869
36. Zayed ASA, Shameena VA (2016) Planar dual-band monopole antenna with an extended
ground plane for WLAN applications. Int J Antennas Propag 1–10
37. Mok WC, Wong SH, Luk KM, Lee KF (2013) Single-layer single-patch dual-band and triple
band patch antennas. IEEE Trans Antenna Propag 61(8):4341–4344
38. Caloz C, Itoh T (2002) Application of the transmission line theory of left-handed (LH) materials
to the realization of a microstrip LH line. In: 2002 IEEE antennas and propagation society
international symposium, vol 2, pp 412–415
Heart Disease Prediction in Healthcare
Communities by Machine Learning Over
Big Data
Abstract In today’s world, big data is the fastest and most widely used tool in
every industry. Medical and healthcare industries flourish with the help of vast data,
and with the help of massive data, the advantages of accurate medical data analysis,
early illness prediction, and accurate patient information may be securely held on and
employed. Furthermore, the accuracy of the study may be harmed due to a variety of
factors such as poor medical information and regional sickness features that might
be used to anticipate outbreaks and so on. In this work, we will show how to use
a machine learning algorithmic programme to correctly anticipate disease. To do
so, we will collect hospital data from a specific location. We can utilize latent factor
models to actualize unfinished information in the case of missing data. In prior work,
a convolutional neural network-based unimodal sickness prediction (CNN-UDRP)
algorithmic programme was used to forecast illness. The CNN-MDRP algorithmic
programme, which is based on multimodal sickness prediction, solves the shortcom-
ings of the CNN-UDRP algorithmic programme, which only works with structured
data. This algorithmic application makes use of all of the hospital’s organized and
unstructured data. None of the previous studies focused on every type of data in the
field of medical big data analysis.
L. Thirupathi (B)
CSE Department, Stanley College of Engineering and Technology for Women, Hyderabad,
Telangana, India
e-mail: [email protected]
B. Srinivasulu
CSE Department, Vidya Jyothi Institute of Technology, Hyderabad, Telangana, India
U. Khanapurkar
CSE Department, Methodist College of Engineering & Technology, Hyderabad, Telangana, India
D. Rambabu
CSE Department, Sreenidhi Institute of Science & Technology, Hyderabad, Telangana, India
C. M. Preeti
CSE Department, Institute of Aeronautical Engineering, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 391
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_38
1 Introduction
Therapeutic X-rays are images commonly used to examine sensitive body parts
such as the bones, chest, teeth, and skull. For decades, experts have used this
method to study and visualize fractures or irregularities in human organs. Beyond
their non-invasive nature and low cost, X-rays are highly effective diagnostic
tools for detecting pathological alterations. Chest X-ray (CXR) images can reveal
chest infections in the form of cavitation, consolidations, infiltrates, and small
widely disseminated nodules. Radiologists examine a chest X-ray image for a variety
of conditions and illnesses, including inflammatory disease, effusion, infiltration,
nodules, other pathology, abnormality, fractures, and many others. Classifying chest
X-ray abnormalities is considered a repetitive task for radiologists; accordingly,
several algorithms have been proposed by researchers to perform this task accurately.
Over the past decades, computer-aided diagnosis (CAD) systems have been developed
to extract helpful information from X-rays to give specialists quantitative insight
into an X-ray. However, these CAD systems have not reached a level sufficient to
make decisions on the type of disease conditions present in an X-ray. Deep networks
have shown remarkable accuracy in performing such tasks. This success motivated
researchers to apply these networks to medical images for disease classification
tasks, and it has been shown that deep networks can efficiently extract useful
features that distinguish different categories of images.
The most commonly used deep learning architecture is the convolutional neural
network (CNN). CNN has been applied to various medical image classification tasks
thanks to its ability to extract different levels of features from images. Following
this line of research, in this paper a deep convolutional neural network (CNN) is
used to improve the performance of chest disease classification in terms of the
accuracy and least square error achieved. For this purpose, both traditional and
deep learning-based networks are used to classify the most common chest diseases
and to present comparative results.
2 Literature Survey
Accurate prediction saves time by minimizing the need to locate frequent action
sets. The goal is to enhance the accuracy of classification results for failure
prediction, anticipating a person's risk of developing heart disease and offering
a succinct explanation of numerous classification rules. In [1–4], the authors
used various methods to predict diseases. In [5–19], the authors focused on
security-related aspects of disease prediction. In [20–32], the authors gave an
overview of big data in healthcare domains. In [33], enhanced fingerprinting and
trajectory prediction for IoT were developed. In [34–39], the authors used
different models in the healthcare domain to assess risks in data mining. Heart
disease is among the most commonly detected diseases, and an accurate model is
required to reduce manual effort. On the basis of mining rules and the given
inputs, these works attempt to address the problems of predicting cardiomyopathy.
In [40–43], data analysis, news recognition, and plant leaf disease detection are
performed with machine learning.
3 Proposed System
We use Python libraries to carry out all of the transformations of the values in
the data set on which we conduct our work. Our paper's architecture is depicted
in Fig. 1. We use pandas to load the data into a data frame for manipulation.
Age, chest pain frequency and type, blood pressure, cholesterol levels, and other
variables are included in our data set. In Python, we employ machine learning
methods such as decision trees and SVM through the scikit-learn package, and we
visualize the results of our implementation using the Matplotlib library. We then
compare the algorithms to see which performs best in terms of accuracy. The input
layer is followed by a convolutional layer with 16 kernels and ReLU activation,
and twenty-five per cent of the nodes are dropped by the dropout layer in the
subsequent layer. A second convolutional layer is applied with eight kernels and
the same settings as before, again followed by a 25% dropout layer, and an output
layer computes the prediction probability. For training and testing purposes, the
cleaned data is divided into 80% training and 20% testing. The same data set is
examined using a variety of machine learning classifiers, including logistic
regression (LR), NB, KNN, and SVM with various kernels, such as linear and RBF,
as well as simple neural networks. In this study, we use a CNN to accurately
predict whether or not a patient has a cardiac problem.
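The classifier comparison described above can be sketched with scikit-learn. This is a hedged illustration on synthetic stand-in data, since the paper's hospital data set is not distributed: the feature count, labels, and resulting accuracies are all invented, and only the 80/20 split and the classifier line-up follow the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

# Synthetic stand-in for the heart-disease data set (age, chest-pain
# type, blood pressure, cholesterol, ...).
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.8, size=400) > 0).astype(int)

# The 80% training / 20% testing split described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

classifiers = {
    "LR": LogisticRegression(),
    "NB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-rbf": SVC(kernel="rbf"),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: {acc:.3f}")
```

On a real data set the same loop yields the per-classifier accuracies that the paper compares against the CNN.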
4 Algorithm
Neural network using convolution In general, a CNN is made up of two levels. One
is the feature extraction layer, which connects each neuron's input to the
previous layer's local receptive field and extracts the local features. Once the
local features are extracted, the spatial relationship between them and other
features is resolved as well. Figure 2 shows the CNN-based multimodal disease
risk prediction algorithm.
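The local-receptive-field idea above can be sketched in plain NumPy. This is an illustrative toy (a single valid 2-D convolution followed by ReLU), not the paper's implementation; the image and kernel values are invented.

```python
import numpy as np

def conv2d_relu(image, kernel):
    """Valid 2-D convolution followed by ReLU: each output neuron sees
    only a local receptive field of the previous layer."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)
    return np.maximum(out, 0.0)                 # ReLU activation

# Toy 6x6 "image" and a 3x3 vertical-edge kernel.
img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.]] * 3)

features = conv2d_relu(img, kernel)
print(features.shape)   # a 4x4 feature map
```

A full CNN stacks many such layers (with several kernels each, plus pooling and dropout) so that later layers capture the spatial relationships between the extracted local features.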
5 Implementation
6 Results
We applied the convolutional neural network algorithm to our data set, and the
results produced are given in the form of a confusion matrix, which indicates the
accuracy of the model in terms of true positive and true negative values.
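A confusion matrix of this kind can be produced with scikit-learn. The labels below are invented for illustration, following the text's convention of 1 for high risk and 0 for low risk:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Illustrative labels: 1 = high risk, 0 = low risk (per the text).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")     # TP=4 TN=4 FP=1 FN=1
print("accuracy:", accuracy_score(y_true, y_pred))   # 0.8
```

The diagonal counts (TP, TN) are exactly the quantities from which the model's accuracy is read off.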
Figure 6 depicts the user interface we designed: the user is requested to enter
an input image to check whether he or she is at low risk or high risk.
In Fig. 7, the x-axis is in units of 0.5 s and the y-axis in units of 0.5 epoch,
and the input here is chest X-ray images. The system analyses the images of both
virus-infected people and normal people; we assign the value 1 for high risk and
0 for low risk (Figs. 8 and 9).
7 Conclusion
For simple diseases, structured data alone is enough for prediction, but for
complex diseases both structured and unstructured data are required. In this
article, we used a new convolutional neural network-based multimodal disease risk
prediction (CNN-MDRP) algorithm using structured and unstructured data from the
data set. To the best of our knowledge, no previous work in the field of medical
big data analytics has focused on both categories of data; our algorithm targets
both structured and unstructured hospital data. In comparison with several common
prediction algorithms, the proposed algorithm has a prediction accuracy of 94.8%
and a convergence speed faster than that of the CNN-based unimodal disease risk
prediction (CNN-UDRP) algorithm.
References
22. Jensen PB, Jensen LJ, Brunak S (2012) Mining electronic health records: towards better
research applications and clinical care. Nat Rev Genet 13(6):395–405
23. Tian D, Zhou J, Wang Y, Lu Y, Xia H, Yi Z (2015) A dynamic and self-adaptive network
selection method for multimode communications in heterogeneous vehicular telematics. IEEE
Trans Intell Transp Syst 16(6):3033–3049
24. Chen M, Ma Y, Li Y, Wu D, Zhang Y, Youn C (2017) Wearable 2.0: enable human-cloud
integration in next generation healthcare system. IEEE Commun 55(1):54–61
25. Chen M, Ma Y, Song J, Lai C, Hu B (2016) Smart clothing: connecting human with clouds and
big data for sustainable health monitoring. ACM/Springer Mob Netw Appl 21(5):825–845
26. Chen M, Zhou P, Fortino G (2016) Emotion communication system. IEEE Access. https://doi.
org/10.1109/ACCESS.2016.2641480
27. Qiu M, Sha EH-M (2009) Cost minimization while satisfying hard/soft timing constraints for
heterogeneous embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 14(2):25
28. Wang J, Qiu M, Guo B (2017) Enabling real-time information service on telehealth system
over cloud-based big data platform. J Syst Architect 72:69–79
29. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G (2014) Big data in health care: using
analytics to identify and manage high-risk and high-cost patients. Health Aff 33(7):1123–1131
30. Qiu L, Gai K, Qiu M (2016) Optimal big data sharing approach for tele-health in cloud
computing. In: IEEE international conference on smart cloud (SmartCloud). IEEE, pp 184–189
31. Zhang Y, Qiu M, Tsai C-W, Hassan MM, Alamri A (2015) Health CPS: healthcare cyber-
physical system assisted by cloud and big data. IEEE Syst J
32. Lin K, Luo J, Hu L, Hossain MS, Ghoneim A (2016) Localization based on social big data
analysis in the vehicular networks. IEEE Trans Ind Inform
33. Lin K, Chen M, Deng J, Hassan MM, Fortino G (2016) Enhanced fingerprinting and trajectory
prediction for IoT localization in smart buildings. IEEE Trans Autom Sci Eng 13(3):1294–1307
34. Oliver D, Daly F, Martin FC, McMurdo ME (2004) Risk factors and risk assessment tools for
falls in hospital in-patients: a systematic review. Age Ageing 33(2):122–130
35. Marcoon S, Chang AM, Lee B, Salhi R, Hollander JE (2013) HEART score to further risk stratify
patients with low TIMI scores. Crit Pathw Cardiol 12(1):1–5
36. Bandyopadhyay S, Wolfson J, Vock DM, Vazquez-Benitez G, Adomavicius G, Elidrisi M,
Johnson PE, O’Connor PJ (2015) Data mining for censored time-to-event data: a Bayesian
network model for predicting cardiovascular risk from electronic health record data. Data Min
Knowl Disc 29(4):1033–1069
37. Qian B, Wang X, Cao N, Li H, Jiang Y-G (2015) A relative similarity based method for
interactive patient risk prediction. Data Min Knowl Disc 29(4):1070–1093
38. Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV (2015) Incorporating
temporal data in predictive models for risk stratification of renal function deterioration. J
Biomed Inform 53:220–228
39. Wan J, Tang S, Li D, Wang S, Liu C, Abbas H, Vasilakos A (2017) A manufacturing big data
solution for active preventive maintenance. IEEE Trans Ind Inf. https://doi.org/10.1109/TII.
2017.2670505
40. Thirupathi L et al (2021) J Phys Conf Ser 2089:012049. https://doi.org/10.1088/1742-6596/
2089/1/012049
41. Lingala T et al (2021) J Phys Conf Ser 2089:012050. https://doi.org/10.1088/1742-6596/2089/
1/012050
42. Pratapagiri S, Gangula R, Ravi G, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early
detection of plant leaf disease using convolutional neural networks. In: 2021 3rd international
conference on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.
1109/ICERA53111.2021.9538659
43. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message over
the network using another frequency and timing technique to circumvent the jammers. J Nucl
Energy Sci Power Gener Technol 10:9
A Novel Twitter Sentimental Analysis
Approach Using Naive Bayes
Classification
Abstract The world is evolving into a better place thanks to the innovations happening
around the globe. Since people spend much time regularly on social media expressing
their opinions, social networks are the primary sources of information regarding
people's opinions and feelings on various topics. Twitter is a microblogging and
social networking site that allows users to post brief status updates of up to 140
characters in length, and it is a rapidly growing service. This project addresses
the problem of sentiment analysis on Twitter. Sentiment analysis is a form of natural language
processing used to monitor public opinion on a specific product or subject. Sentiment
analysis, also known as opinion mining, entails creating a framework to capture and
analyze product opinions expressed in blog posts, comments, reviews, or tweets.
The objective of this report is to illustrate this fascinating problem and to
provide a model for performing sentiment analysis on Twitter tweets using the Naïve
Bayes classification algorithm.
1 Introduction
L. Thirupathi (B)
CSE Department, Stanley College of Engineering and Technology for Women, Hyderabad,
Telangana, India
e-mail: [email protected]
G. Rekha
CSE Department, Kakatiya Institute of Technology & Science, Warangal, Telangana, India
S. K. Shruthi · B. Sowjanya · S. Jujuroo
CSE Department, Methodist College of Engineering & Technology, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_39
In this big data era, almost every individual has access to the Internet. As a result,
people find it easy to share their thoughts and opinions on social and cultural
issues on global platforms like Twitter. Twitter provides an opportunity
to communicate with strongly influential people. Hence, common people often address
their issues through tweets to bring them to politicians' notice. Many brands launch
their products on Twitter. Influential people such as actors share their life events and
experiences through tweets with their fans, and their tweets receive numerous replies.
As a consequence, large volumes of data are collected on Twitter every single day,
hour, minute, and second. When put to the right use, this data can help businesses
make major decisions. Sentiment analysis is crucial because it allows companies
to easily gauge their consumers' overall views. Twitter sentiment analysis lets you
follow what people are saying on social media about your product or service, and
it can help you discover disgruntled customers or unfavorable mentions before they
become a major problem.
Simultaneously, sentiment analysis on Twitter may provide useful information.
What characteristics of your business do your customers enjoy the most?
What are the most frequently mentioned negative aspects?
The primary notion of sentiment analysis is to determine the polarity of brief
sentences and classify them accordingly. The polarity of a sentiment can be
classified as "positive," "negative," or "neutral." Because sentiment analysis in the
context of microblogging is a relatively new field of study, there is plenty of room
for further research. Prior work has been conducted on sentiment analysis of user
comments, papers, web blogs/articles, and general phrase analysis. The 280-character
limit distinguishes tweets from these. Although work on unsupervised and semi-supervised
approaches exists, there is still much room for improvement.
The major goal of this research is to investigate and evaluate a sentiment analysis
model based on Naïve Bayes classification.
2 Literature Survey
domains. Another study attempted to pre-process the dataset, extract the adjectives
that carry significant meaning (the feature vector), select the
feature vector array, and apply machine learning algorithms such as Naïve Bayes,
Maximum Entropy, and SVM. Finally, the classifier's efficiency was evaluated in
terms of recall, precision, and accuracy. According to Bo Pang and
Lillian Lee, Naïve Bayes is the most efficient method, with the highest accuracy.
3 Proposed Methodology
The proposed methodology is a classifier built using the Naïve Bayes algorithm. The
following steps are performed, as shown in Fig. 1:
(1) creating a Twitter developer account, (2) obtaining access keys and access
tokens, (3) connecting to the Twitter API, (4) data acquisition, (5) data preprocessing,
(6) feature extraction, (7) training the classifier, and (8) classification.
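The data preprocessing step (5) can be sketched in Python. The substitution patterns below (lowercase placeholders for links and mentions) are illustrative assumptions, not the paper's exact rules.

```python
import re

def preprocess(tweet):
    """Clean a raw tweet: substitution, normalization, tokenization.

    A minimal sketch; the exact cleaning rules used in the paper are
    not specified, so these patterns are illustrative.
    """
    text = tweet.lower()                          # normalization: case folding
    text = re.sub(r"https?://\S+", "url", text)   # substitution: links
    text = re.sub(r"@\w+", "user", text)          # substitution: mentions
    text = re.sub(r"#", "", text)                 # drop hashtag symbol, keep word
    text = re.sub(r"[^a-z\s]", " ", text)         # strip punctuation and digits
    return text.split()                           # tokenization on whitespace

print(preprocess("Loving the new phone from @BrandX! https://t.co/abc #happy"))
```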
Examples of positive tweets:
. It's always great hearing from happy customers! Show your support for small
businesses like ours.
. Happy 6 months to this beautiful girl, I can't wait for what the future has for us.
. To all runners training for the Every Step Counts 5K event, May the 4th is with
you! Happy for you! Register for our ESC 5K today.
Examples of negative tweets:
. It makes me feel so sad that bullying still exist I think it will never stop in this
world and that is so horrible.
. Stormy Daniels is a very selfish person. For her to bring this up from years ago.
Sad, not thinking of Melania.
. I am disappointed. It’s not live in CA anymore:(
Examples of neutral tweets:
. Are there any flights flying from nyc to CA this afternoon?
. Where is the Eiffel tower entrance?
. Where can I get my license renewed?
As seen in the examples above, tweets may contain useful information and express
opinions on any subject. But they also contain many irrelevant characters. Hence
preprocessing of the data is important.
We apply tokenization, normalization, and substitution preprocessing
techniques, and then extract the features. To build a training model, we choose one
of the text classification algorithms and feed the training corpus to the classifier.
We have chosen the Multinomial Naïve Bayes classifier. Once we have the trained
model, we can feed it the testing data and obtain a classification prediction.
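The training and classification steps (7)–(8) can be sketched with scikit-learn's CountVectorizer and MultinomialNB. The tiny corpus below is an illustrative stand-in, not the paper's collected dataset.

```python
# A minimal sketch of training a Multinomial Naïve Bayes text classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "happy great love this product",      # positive
    "best day ever so excited",           # positive
    "sad disappointed terrible service",  # negative
    "worst experience very angry",        # negative
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words feature extraction followed by the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_tweets, train_labels)

# Predicted label for an unseen tweet.
print(model.predict(["so happy with this great service"]))
```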
The Naïve Bayes learning algorithm is widely used in text classification problems
because it is computationally efficient and simple to implement. There are two
different event models: the multinomial event model and the multivariate Bernoulli
event model.
The multinomial event model is referred to as Multinomial Naïve Bayes. The
Multinomial Naïve Bayes algorithm is a probabilistic learning method prominent in
Natural Language Processing (NLP). It calculates each tag's probability for a given
sample and outputs the tag with the highest probability.
Bayes' theorem gives a route to estimating the posterior probability P(c|x) from P(c),
P(x), and P(x|c).
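In symbols, this relationship and the resulting decision rule for a tweet with tokens x_1, ..., x_n read:

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)},
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The classifier outputs the class c that maximizes this posterior; P(x) is the same for every class and can be dropped from the comparison.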
Figures 3 and 4 below show examples of positive and negative tweets,
respectively.
Text classification is among the most crucial elements of text data mining, and it is
used to perform sentiment analysis. Although today's data is exploding, classifying
vast amounts of data has become a challenge. Through this methodology, we can
collect tweets and analyze their sentiment based on a searched keyword. Sentiment
analysis is still in its early phases, especially in the context of microblogging, and
is far from complete. As a result, we have come up with a few ideas that we think are
worth investigating in the future and could lead to even better outcomes.
References
1. Fouad MM, Gharib TF, Mashat AS (2018) Efficient Twitter sentiment analysis system with
feature selection and classifier ensemble. In: International conference on advanced machine
learning technologies and applications. Springer, pp 516–527
2. Musto C, Semeraro G, Polignano M (2014) A comparison of lexicon-based approaches for
sentiment analysis of microblog posts. Info Filtering Retrieval 59
3. Kharde V, Sonawane P Sentiment analysis of Twitter data: a survey of techniques. arXiv
preprint arXiv:1601.06971
4. Harb A, Plantié M, Dray G, Roche M, Trousset F, Poncelet P (2008) Web Opinion Mining:
How to extract opinions from blogs? In: Proceedings of the 5th international conference on
Soft computing as transdisciplinary science and technology, ACM, pp 211–217
5. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine
learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural
language processing vol 10, Association for Computational Linguistics, pp 79–86
6. Silge J, Robinson D (2017) Text mining with R: a tidy approach, O’Reilly Media
7. Abirami A, Gayathri V (2017) A survey on sentiment analysis methods and approach. In: 2016
eighth international conference on Advanced computing (ICoAC), IEEE, pp 72–76
8. Thirupathi L, Rao PVN (2021) Multi-level protection (Mlp) policy implementation using graph
database. Int J Adv Comput Sci Appl (IJACSA) 12(3). https://doi.org/10.14569/IJACSA.2021.
0120350
9. Thirupathi L, Rao PVN (2020) Developing a multilevel protection framework using EDF. Int
J Adv Res Eng Technol (IJARET) 11(10):893–902
10. Thirupathi L, Padmanabhuni VNR (2020) Protected framework to detect and mitigate attacks.
Int J Anal Exp Modal Anal 12(4):2335–2337. https://doi.org/18.0002.IJAEMA.2020.V12I6.
200001.0156858943
11. Thirupathi L, Rekha G (2016) Future drifts and modern investigation tests in wireless sensor
networks. Int J Adv Res Comput Sci Manag Stud 4(8)
12. Thirupati L, Pasha R, Prathima Y (2014) Malwise system for packed and polymorphic malware.
Int J Adv Trends Comput Sci Eng 3(1):167–172
13. Thirupathi L, Galipelli A, Thanneru M (2014) Traffic congestion control through vehicle-to-
vehicle and vehicle to infrastructure communication. (IJCSIT) Int J Comput Sci Info Technol
5(4):5081–5084
14. Swathi M, Thirupathi L (2013) Algorithm for detecting cuts in wireless sensor networks. Int J
Comput Trends Technol (IJCTT) 4(10)
15. Thirupathi L, Reddemma Y, Gunti S (2009) A secure model for cloud computing based storage
and retrieval. SIGCOMM Comput Commun Rev 39(1):50–55
16. Thirupathi L, Nageswara RPV (2018) Understanding the influence of ransomware: an inves-
tigation on its development mitigation and avoidance techniques. Grenze Int J Eng Technol
(GIJET) 4(3):123–126
17. Thirupathi L, Sandeep R (2017) Social media: to deal crisis circumstances. Int J Innov Adv
Comput Sci (IJIACS) 6(9)
18. Rekha S, Thirupathi L, Renikunta S, Gangula R (2021) Study of security issues and solutions in
Internet of Things (IoT). Mater Today Proc ISSN 2214–7853. https://doi.org/10.1016/j.matpr.
2021.07.295
19. Gangula R, Thirupathi L, Parupati R, Sreeveda K, Gattoju S (2021) Ensemble machine learning
based prediction of dengue disease with performance and accuracy elevation patterns, Mater
Today Proc ISSN 2214–7853. https://doi.org/10.1016/j.matpr.2021.07.270
20. Nalajala S, Thirupathi L, Pratap NL (2020) Improved access protection of cloud using feedback
and de-duplication schemes. J Xi’an Univ Architect Technol 12(4)
21. Srividya V, Swarnalatha P, Thirupathi L (2018) Practical authentication mechanism using
passtext and OTP. Grenze Int J Eng Technol Spec Issue Grenze ID 1 GIJET.4.3.27,© Grenze
Scientific Society
22. Thirupathi L, Rehaman Pasha MD, Reddy GS (2013) Game based learning (GBL). Int J Res
Eng Adv Technol 1(4)
23. Thirupathi L et al. (2021) J Phys Conf Ser 2089:012049. https://doi.org/10.1088/1742-6596/
2089/1/012049
24. Thirupathi L et al. (2021) J Phys Conf Ser 2089:012050. https://doi.org/10.1088/1742-6596/
2089/1/012050.
25. Pratapagiri S, Gangula R, Ravi G, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early
detection of plant leaf disease using convolutional neural networks. In: 2021 3rd International
conference on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.
1109/ICERA53111.2021.9538659
26. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message over
the network using another frequency and timing technique to circumvent the jammers. J Nucl
Ene Sci Power Generat Techno 10:9
27. Reddy CKK, Anisha PR, Shastry R, Ramana Murthy BV (2021) Comparative study on internet
of things: enablers and constraints. Adv Intell Syst Comput
28. Reddy CKK, Babu BV (2015) ISPM: improved snow prediction model to nowcast the presence
of snow/no-snow. Int Rev Comput Softw
29. Reddy CKK, Rupa CH, Babu BV (2015) SLGAS: supervised learning using gain ratio as
attribute selection measure to nowcast snow/no-snow. Int Rev Comput Softw
30. Reddy CKK, Rupa CH, Babu BV (2014) A pragmatic methodology to predict the presence of
snow/no-snow using supervised learning methodologies. Int J Appl Eng Res
Recognition and Adoption
of an Abducted Child Using Haar
Cascade Classifier and JSON Model
Abstract The purpose of this paper is to recognize an abducted child from the available
photos of children with the help of face recognition and face matching techniques,
and to enable the adoption of an abandoned child from the available data with the
help of a string comparison technique. The work in this paper is extended by providing
the concept of an adoption system for the abandoned child. The Haar cascade classifier,
an OpenCV classifier, is used for face recognition, the mean method is used for face
matching, and a database (DB) is used for string comparisons. The dataset consists of 500
images for child recognition and child matching, and 50 images for child adoption.
The accuracy obtained from the face recognition algorithm is 80.7%.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_40
of them remain untraced. This project will be helpful to these abducted children,
their parents, and the authorities who are searching for these children.
Even after being traced, some children are not taken back by their real parents
or guardians. These are abandoned children. Some parents do not want to take their
children back, while there are others who want to adopt a child. These parents, who
want to adopt a child, are known as adoptive parents. This project will help adoptive
parents adopt a child and live happily.
2 Literature Survey
In paper [3], a methodology is proposed for a missing child identification system that
combines facial feature extraction and matching. A deep learning method is used for
feature extraction, and a support vector machine is used for matching. Face
detection is done using the HOG algorithm. A box is bounded on the detected face,
and by using the face landmark estimation algorithm, sixty-eight specific points
(landmarks) on the face are identified. After passing these images to the deep CNN,
128 measurements are obtained. An SVM classifier takes the measurements
from a test image and gives the closest match as output; a face is recognized using
this classifier. The dataset in this system consists of 43 child cases, and the
accuracy of the system is 99.4%. The system does not perform well on large
datasets because the required training time becomes higher.
In paper [4], a method of tracking people online and identifying them using
RFID and Kinect was proposed. A Kinect V2 sensor was used for tracking, and it
generates a skeleton of the body for up to six persons. Identification was performed
using both Kinect and passive RFID. A person's skeleton is first measured, then their
RFID tag is measured using the reader antenna positions as references, and the
best match is made between the two. Only six people can be tracked by this system
simultaneously, the effective area is limited to four meters, and people have to
physically wear the RFID tag.
In paper [5], an e-crime alert system using robust face recognition was
presented. It works on the LEM algorithm to detect points, the LSD is
calculated, and finally the feature is computed. The system is 85% efficient, but
it doubles the computing time cost.
In paper [6], a system is developed using deep learning for face detection and
tagging: a deep dense face detector is used for face detection, and the LBPH method is
used to recognize the detected faces. The system is extended by providing the concept
of a tagging system for the detected faces. For the faces detected successfully, the
system achieved an accuracy of 85% in tagging the faces.
In paper [7], the author presented work on identifying aging deep face features of
missing children. The system proposed an age-progression module that is responsible
for age-progressing the deep face features given by any commodity face matcher. Three
face matchers, FaceNet, CosFace, and COTS, were used to evaluate the face
matching results. The dataset is the ITWCC dataset, consisting of
7990 images of 745 child celebrities.
In paper [8], a methodology is proposed for missing child identification using
deep learning and a multi-class SVM. It combines facial feature extraction
and matching. A deep learning method is used for feature extraction, and a support
vector machine is used for matching. Face recognition is done using the VGG-Face
network. An SVM classifier takes the measurements from a test image and gives
the closest match as output. The dataset is user defined and consists
of 846 child face images covering 43 unique children's cases. The accuracy of the system
is 99.41%. The system does not perform well on large datasets because the
required training time becomes higher.
In paper [9], work was presented on developing ML-based methods to recommend
the search level for a missing person. It proposed ML methods to support
police decisions in searches for missing persons. According to the author,
the time between the moment a person disappears and the moment a decision is
made must be short. Several methods were explored, including decision trees,
random forests, Naïve Bayes, support vector machines, and multi-layer perceptrons.
Among these, decision trees and random forests gave the Fit factor value that
indicates their best adaptation to the classified information. The weakness of the
system was the small number of real cases.
In paper [10], an application is proposed for uploading complaints about a missing
person to an AWS web server. It can be accessed by government officials
and also by local people for matching a missing person's face. Using face
recognition, this application matches the image of a missing person on any Android
platform. It consists of several layers: a Presentation Layer for the front end, a Business
Layer for requests and responses, and a Database Layer for storing data. The application
obtained good accuracy, but it is limited to Android devices only and requires
an internet connection.
. Some of the papers do not have an authentic missing child image dataset; for
example, reference [7] includes only a general image dataset, not a missing child
image dataset.
. According to reference [8], a deeply disturbing fact about India's missing children
is that many children go missing every day, and half of them remain untraced.
. According to reference [8], the earliest methods for face recognition commonly
used features such as LBPH, HOG, SIFT, or SURF, which do not give good
performance.
. Even after being traced, abandoned children are not provided with any option
in the system.
. There is no adoption module for abandoned children.
. No module is provided for users who want to adopt one of the abandoned
children.
. There is no option provided for the authorities or officials regarding how to
proceed with abandoned children.
3 Methodology
3.1 Dataset
The dataset in this system is user-defined and built from real data. The data used to
make the dataset is collected from the authentic website of the Ministry of Women
and Child Development (MWCD), Govt. of India. This website provides information
on both missing children and recovered children.
The data is downloaded by web scraping, written to a .txt file, and then loaded
into a MySQL DB after preprocessing. For the missing children,
the data covers the period 11 June 2021–13 August 2021 and
consists of 500 records (RCF images). Ten more images were downloaded of
missing children whose details are not added to the DB (RCNF images), and ten
more random (thing) images were downloaded whose details are not added to the DB
(NRCNF images). For the adoptable children, the data was downloaded before March
2021 and consists of 50 records of recovered children.
The system architecture consists of two main modules: (1) Child Recognition and
(2) Child Adoption. These are subdivided into four modules: Child Recognition is
divided into the Official Module and the Public Module, and Child Adoption is
divided into the Welfare Module and the Adopting Parent Module (Fig. 1).
Module 2: Public Module The Public Module is accessible to users who want
to upload details of a suspected child. The public can upload the suspected child's
image with other details, after which the face recognition and face matching process
begins. The screen then shows the status of the uploaded child image, which is one
of three outputs:
1. Thank you for uploading. It is not recognized as a face, and no child found
(NRCNF).
2. Thank you for uploading. It is recognized as a face, but no child found (RCNF).
3. Thank you for uploading. It is recognized as face and child found (RCF).
When the user uploads an image of a thing rather than a child, the status shows
the first output. When the user uploads a suspected child image that is not in the
missing database, the status shows the second output. When the user uploads a
suspected child's photo that is stored in the missing database, the status shows the
third output.
Module 3: Welfare Module The Welfare Module is responsible for uploading the
adoptive child images. Welfare people add the images according to the conditions
provided by the adoptive parents. There are six folders which contain adoptive child
images based on the requirements (age and gender) provided by the adoptive parents.
Using string comparison through the DB, the welfare people match the data provided
by the adoptive parents at the time of registration with the data submitted when the
child details are resubmitted. If the details match, the respective child images are
displayed on the screen. If the details do not match, the adoptive parent will
not have access to the adoptive child images. After adoption, the welfare people
generate the adoption certificate.
Module 4: Adopting Parent Module The Adopting Parent Module is accessible to
users who want to adopt an abandoned child. It is responsible for providing
abandoned children to adoptive parents. This module requires the adopting parent
to log in. Before the adoption process, the adopting parent has to go through the
adoption rules, provide the required documents, and strictly adhere to the rules. After
reading the adoption rules, the adopting parent has to register. After registration,
adopting parents can log in with the username and password created at the time of
registration. For verification, the adopting parent has to resubmit the child details
that were submitted at registration. The system compares both strings through the
Database (DB). If the strings match, a screen appears showing the Adoption Child
Images List. From that list, adopting parents can adopt an abandoned child, get the
adoption certificate, and log out.
3.3 Methods
If both means match, the system deletes the image from the first folder, adds
it to the third folder, and shows the message "matched." Otherwise, it shows the
message "did not match."
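The mean method is not fully specified in the text; one minimal interpretation, comparing the mean pixel intensities of two grayscale face crops within a tolerance, can be sketched as follows. The tolerance value is a hypothetical parameter, not taken from the paper.

```python
import numpy as np

def mean_match(face_a, face_b, tol=2.0):
    """Compare two grayscale face crops by mean pixel intensity.

    A sketch under the assumption that "mean method" means comparing
    average intensities; `tol` is an illustrative threshold.
    """
    return abs(float(np.mean(face_a)) - float(np.mean(face_b))) <= tol

# Two synthetic 8-bit "face" crops with similar overall brightness:
a = np.full((64, 64), 120, dtype=np.uint8)
b = np.full((64, 64), 121, dtype=np.uint8)
print(mean_match(a, b))      # means differ by 1 <= tol -> matched
print(mean_match(a, b * 0))  # all-zero crop, means differ by 120 -> not matched
```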
String Comparison Through the DB The two important strings are Child Age and
Child Gender. This string comparison method takes the age and gender strings from
the log-in page and compares them with the strings stored in the database for that
particular adopting parent, under the respective attributes. This is done with Python
"for" and "if" statements. If the strings match, the system displays a screen showing
the appropriate list of abandoned child images; this uses Python "if" and "elif"
conditions. If the strings do not match, the system displays a screen showing the
message "login failed."
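The matching logic just described can be sketched in plain Python; the field names and return messages below are illustrative, not the paper's actual schema.

```python
def details_match(stored, submitted):
    """Compare the age and gender strings resubmitted by an adopting
    parent with those stored at registration, using plain for/if logic.
    Field names are hypothetical.
    """
    for field in ("child_age", "child_gender"):
        if stored.get(field, "").strip().lower() != submitted.get(field, "").strip().lower():
            return "login failed"
    return "matched"

stored = {"child_age": "4", "child_gender": "Female"}
print(details_match(stored, {"child_age": "4", "child_gender": "female"}))  # matched
print(details_match(stored, {"child_age": "6", "child_gender": "female"}))  # login failed
```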
The dataset consists of 500 missing child images (1–17 years old, both male and
female). From these 500 images, the system is tested on 396 missing child images.
Of the 10 additional images of missing children whose details are not added to the
DB, the system is tested on 4. Of the 10 additional random (thing) images whose
details are not added to the DB, the system is tested on 5. Therefore, the system is
tested on 405 images in total: 396 belong to RCF, 4 to RCNF, and 5 to NRCNF.
Of the actual 396 RCF images, 318 were predicted as RCF, 0 as RCNF, and 78 as
NRCNF. All 4 actual RCNF images were predicted as RCNF, and all 5 actual
NRCNF images were predicted as NRCNF.
The average accuracy obtained is 0.8074, i.e., 80.74%. The precision obtained is
1.0 for RCF, 1.0 for RCNF, and 0.0602 for NRCNF, giving an average precision
over all classes of 0.6867. The recall obtained is 0.8030 for RCF, 1.0 for RCNF,
and 1.0 for NRCNF, giving an average recall of 0.9343. The F1-score over all
classes is 0.7916 (Fig. 4).
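These figures follow directly from the confusion matrix described above; the short computation below reproduces them. Note that the reported F1-score corresponds to the harmonic mean of the averaged precision and recall.

```python
# Confusion matrix from the test results above
# (rows = actual, columns = predicted; class order RCF, RCNF, NRCNF).
cm = [
    [318, 0, 78],  # actual RCF
    [0,   4,  0],  # actual RCNF
    [0,   0,  5],  # actual NRCNF
]
total = sum(sum(row) for row in cm)                    # 405 tested images
accuracy = sum(cm[i][i] for i in range(3)) / total     # 327 / 405

# Per-class precision (column-wise) and recall (row-wise).
precision = [cm[i][i] / sum(cm[r][i] for r in range(3)) for i in range(3)]
recall = [cm[i][i] / sum(cm[i]) for i in range(3)]

avg_p = sum(precision) / 3
avg_r = sum(recall) / 3
f1 = 2 * avg_p * avg_r / (avg_p + avg_r)  # harmonic mean of the averages

print(round(accuracy, 4), round(avg_p, 4), round(avg_r, 4), round(f1, 4))
```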
5 Conclusion
In this paper, we have proposed a system for recognizing an abducted child using
a classifier. Even after being traced, some children are not taken back by their
real parents or guardians. This research will be helpful for these abducted children,
their parents, the authorities, the abandoned children, and the adoptive parents who
can adopt a child and live happily. The work is extended by providing the
concept of an adoption system for the abandoned child. The performance of the
face recognition and matching algorithm is evaluated using parameters such as
precision, recall, and F1-score. The accuracy obtained from the face recognition
algorithm is 80.7%.
References
1. Ministry of home affairs (2019) Report on missing women and children in India. National
Crime Records Bureau
2. Flores JR (2002) National incidence studies of missing, abducted, runaway, and thrown-away
children
3. Kumar BK, Supriya G, Divya N, Bhargavi T, Venkatesh T (2020) Missing children identification
system using deep learning and multiclass SVM. J Info Comput Sci
4. Arniker SB (2014) RFID based missing person identification system. In: Conference Paper
5. Pate S, Deepak GM, Vinit JM, Parmesh KY (2016) Robust face recognition system for E-crime
alert. Int J Res Eng Appl Manage (IJREAM)
6. Mehta J, Ramnani E, Singh S (2018) Face detection and tagging using deep learning. Int Conf
Comput Commun Signal Proc (ICCCSP)
7. Deb D, Aggarwal D, Jain AK (2019) Finding missing children: aging deep face features
8. Chandran PS, Byju NB, Deepak RU, Nishakumari KN, Devanand P, Sasi PM (2018) Missing
child identification system using deep learning and multiclass SVM. In: IEEE recent advances
in intelligent computational systems (RAICS)
Abstract Artificial Intelligence has the potential to bring about a paradigm shift
in the detection of brain tumors. Many health organizations have identified
brain tumors as the second leading cause of mortality in humans worldwide.
Effective medical therapy is possible if a brain tumor is identified at an
early stage. For appropriate diagnosis, Magnetic Resonance Imaging (MRI) is firmly
recommended for individuals with brain tumor indications. The immense geographical
and structural variety of the brain tumor's surrounding environment makes automatic
brain tumor classification a challenging task. The differences in tumor site,
structure, and size present a significant difficulty for brain tumor identification. This
research proposes the design and implementation of a Convolutional Neural Network
(CNN) classifier for automatic brain tumor detection. When compared
to other cutting-edge methodologies such as Support Vector Machines (SVM) and
Deep Neural Networks (DNN), the obtained results demonstrate that the CNN
achieves 97.5% accuracy with minimal complexity.
1 Introduction
Artificial Intelligence (AI) is a field of computer science that aims to provide machines
with human-like intelligence, allowing them to learn, analyze, and solve problems
when confronted with many forms of data. In recent times, the infusion
of Artificial Intelligence into the healthcare system has helped clinical experts
provide quality patient care. AI has been demonstrated in research to have a positive impact
A. B. Ifra
Shadan Women’s College of Engineering and Technology, Hyderabad, Telangana, India
M. Sadaf (B)
Chaitanya Bharathi Institute of Technology, Hyderabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_41
on many preoperative stages such as diagnosis, assessment, and planning [1]. One
of the important organs in the human body is the brain, which is made up of billions
of cells. Irregular cell division produces an abnormal group of cells, often known
as a tumor. Brain tumors come in two forms: low grade and high grade. A low-grade
brain tumor is known as a benign tumor, while a high-grade tumor is called malignant.
Because a malignant tumor is cancerous, it spreads rapidly and endlessly throughout
the body and can cause sudden death [2]. The available imaging techniques include
X-rays, CT scans, and Magnetic Resonance Imaging (MRI).
The X-ray provides visual evidence of the living structures and overall composition
of the brain or skull. However, neuroimaging, such as MRI, is still the fundamental
basis for diagnosing brain tumors [3]. Brown et al. devised a Natural Language
Processing (NLP) ML system that analyzed brain MRI inputs and then determined
the optimal MRI brain imaging sequence to generate the most therapeutically
valuable images, demonstrating the influence of AI even before radiological images
are obtained [4]. ML-based sequential algorithms could help standardize the MRI
sequence protocol, increasing the clinical utility of the scans produced [5]. Moreover,
researchers observed that radiologist sequence selection is frequently challenged by
these instances [1]. Using publically available datasets [6], the goal of this study is to
create a completely automatic CNN model for brain tumor detection. The following
is how the rest of the paper is organized: The second and third section contains a
summary of the current system and proposed solution. Section 4 digs much deeper
into the proposed CNN model. A full comparison of the proposed method to current
methodologies, as well as a description of the experimental results, is included in
Sect. 5. Portion 6 is the paper’s final portion, and it draws everything to an end.
2 Present System
Since it became possible to capture and transmit image data to the computer, auto-
matic algorithms for brain tumor detection and type labeling have been used with
brain MRI images. Over the last decade, neural networks (NN) and support vector
machines (SVM) have been the most commonly used approaches for classifying brain
tumor images, owing to their ease of use, whereas deep learning models have recently
established an emerging trend in ML by representing complex relations with the
fewest possible nodes. As a result, they have gradually risen to prominence across
healthcare sectors such as medical image analysis, healthcare analytics, and
bioinformatics. Algorithms such as FCM, SVM, and DNN are commonly applied to brain
tumor detection. Fuzzy C-Means (FCM) clustering is a soft clustering method in which
each data point is assigned a probability, or likelihood score, of belonging to each cluster.
It is preferred when datasets overlap. Support Vector Machines (SVMs) are
supervised learning algorithms for classification, regression, and outlier detection.
They remain effective in high-dimensional settings, even when the number of dimensions
Automatic Brain Tumor Detection Using Convolutional Neural… 421
exceeds the number of samples [7]. An SVM classifier is used to detect a cluster of
malignant tumor cells in a Magnetic Resonance (MR) segment and to segment the tumor
cells in order to assess the size of the tumor present in that area of the brain.
However, the SVM approach is not suitable for large datasets and does not perform
well when the dataset is noisy. The support vector classifier also has no probabilistic
interpretation, because it operates by positioning data points around the separating
hyperplane [8]. So
far, many algorithms have been proposed to detect and extract tumors in medical
images, using techniques such as hybrid approaches with Support Vector Machines
(SVM), backpropagation, and the Dice coefficient. Among them, the algorithms that
used backpropagation as the base classifier achieved the highest accuracy, 90% [7].
The Deep Neural Network (DNN) is another deep learning framework that has been
successfully used for classification and regression in a spectrum of areas. It is a
feed-forward network in which the input is routed from the input layer to the output
layer through numerous (more than two) hidden layers. The DNN classification rate is
96.97%, with a recall of 0.97 [9].
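As a rough illustration of the SVM-based classification discussed above, the following sketch trains an SVM on synthetic two-class feature vectors. It is illustrative only, not the cited authors' code; the data and all parameter values are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical feature vectors: two Gaussian clusters standing in for
# non-tumor (0) and tumor (1) feature-extraction outputs.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(2.0, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf")        # supervised max-margin classifier
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)    # test-set accuracy
print(acc)
```

With well-separated features the test accuracy is high; on large or noisy datasets the limitations noted above apply.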
3 Proposed System
4 Methodology
The design and operation of a neural network are modeled on the human brain.
Vector quantization, data aggregation, optimization methods, approximation, pattern
recognition, and classification algorithms are all common uses for neural networks. A
neural network's interconnections fall into three categories: feedback, feed-forward,
and recurrent. In a typical neural network, images cannot be resized, whereas a
convolutional neural network (CNN) can handle resized images. A CNN is made up of
an input layer, a convolution layer, a Rectified Linear Unit (ReLU) layer, a pooling
layer, and a fully connected layer. The convolution layer divides the image into small
parts. The ReLU layer applies the activation function element by element. The pooling
layer is optional and is mostly used for downsampling. In the last layer (the fully
connected layer), the class or label score is produced as a probability score between
0 and 1 [11]. A block diagram of brain tumor
classification using convolutional neural networks is shown in Fig. 1. The CNN-based
brain tumor categorization process is divided into two phases: the training phase and
the testing phase. The set of images is divided into categories using label names
such as tumor and non-tumor. To create a prediction model, the training phase
includes preprocessing, semantic segmentation, and classification using the loss
function. To begin, the image collection is labeled for training purposes. During
preprocessing, the images are resized.
The loss function is minimized using a gradient descent-based approach. A scoring
function maps the raw image input to class scores. The loss function measures how
good a given set of parameters is, defined by how closely the generated scores match
the ground-truth labels in the training data. Computing the loss function is therefore
critical for improving accuracy. Accuracy and loss are inversely related: if one is
high, the other is low. The gradient descent method uses the value of the loss
function, repeatedly evaluating the gradient of the loss to update the parameters.
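The loss-and-gradient-descent loop described above can be sketched as follows; a linear scoring function and a squared-error loss are assumed purely for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                    # raw input components (toy data)
y = rng.integers(0, 2, size=50).astype(float)   # ground-truth labels

w = np.zeros(4)     # parameters of the scoring function s = X @ w
lr = 0.1            # learning rate
for _ in range(200):
    scores = X @ w                       # map raw input to class scores
    grad = X.T @ (scores - y) / len(y)   # gradient of 0.5 * mean squared error
    w -= lr * grad                       # step down the gradient

loss = 0.5 * np.mean((X @ w - y) ** 2)   # low loss <-> scores match the labels
print(loss)
```

Each pass re-evaluates the gradient, exactly as the text describes, so the loss decreases as accuracy improves.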
Fig. 1 Flowchart of the proposed CNN classification system for brain tumors
4.1 Datasets
The proposed CNN is trained on the Kaggle dataset [6], which contains MRIs of brain
tumors. The MRI dataset contains a total of 253 images [6]. Among other data
augmentation techniques, we used the “ImageDataGenerator” provided by Keras [12].
It replaces the original batch with a new batch of randomly modified images. The
images are flipped, rotated, tilted, and brightened before being resized
to 128 × 128 pixels.
For training the CNN, the Kaggle dataset is used, which contains 98 brain MRIs
without tumor and 155 with tumor. To improve the performance of the model,
“ImageDataGenerator” is used, which continuously generates new data from the
dataset provided to train the model. Grayscale images are produced by converting
multi-channel images into single-channel images [13]. The model is trained using
80% of the images from this dataset. All images are pre-processed before being
fed to the CNN. After geometric and color augmentation, the images are scaled,
tilted, and rotated before being resized to 128 × 128 pixels. The CNN can then
operate robustly across varied inputs.
The trained model is now tested with the remaining 20% of the data. A snapshot of
the data set is shown in Fig. 2.
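The augmentation step above uses Keras's “ImageDataGenerator”; the following dependency-free sketch mimics the same kinds of random transforms (flip, rotation, brightness) on a stand-in batch. The parameter values are assumptions, not the authors' settings.

```python
import numpy as np

def augment(batch, rng):
    """Apply random flip / rotation / brightness to each grayscale image."""
    out = []
    for img in batch:
        if rng.random() < 0.5:
            img = np.fliplr(img)                        # random horizontal flip
        img = np.rot90(img, k=rng.integers(0, 4))       # random 90-degree rotation
        img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)  # random brightening
        out.append(img)
    return np.stack(out)

rng = np.random.default_rng(0)
batch = rng.random((8, 128, 128))   # stand-in for resized grayscale MRIs in 0..1
aug = augment(batch, rng)
print(aug.shape)  # (8, 128, 128)
```

Like ImageDataGenerator's `flow`, this yields a batch of the same shape with randomly modified pixel content.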
The CNN architecture designed in this study consists of three convolution layers.
The convolution layer, the network's fundamental building block, convolves the image
with a convolution filter, producing a feature map. The three layers extract feature
maps and build richer information for classification (Fig. 3).
Algorithm:
1. A convolution filter is added to the first layer.
2. The convolution filter is smoothed to reduce its sensitivity.
3. An activation layer controls signal transduction between the layers.
4. A Rectified Linear Unit (ReLU) is used to reduce training time.
5. All neurons in each layer are connected to those in the following layer.
6. A loss layer is added to the neural network to provide feedback towards the
end of training (Fig. 4).
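A minimal numerical sketch of one pass through the layer types listed above: convolution over small image parts, element-wise ReLU, max-pooling for downsampling, and a fully connected layer producing a 0-to-1 probability score. The shapes and weights are illustrative stand-ins, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))              # tiny stand-in for a 128x128 MRI
kernel = rng.normal(size=(3, 3))      # one convolution filter

# Convolution layer: slide the filter over small image patches.
fmap = np.empty((6, 6))
for i in range(6):
    for j in range(6):
        fmap[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)

relu = np.maximum(fmap, 0.0)          # element-wise ReLU activation

# Pooling layer: 2x2 max-pooling for downsampling.
pooled = relu.reshape(3, 2, 3, 2).max(axis=(1, 3))

# Fully connected layer: flatten and map to a probability score in (0, 1).
w = rng.normal(size=pooled.size)
score = 1.0 / (1.0 + np.exp(-(pooled.ravel() @ w)))   # sigmoid
print(round(float(score), 3))
```

In the real network this is repeated for three convolution layers before the fully connected stage.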
5 Discussions
The tumor and non-tumor MRI scans from Kaggle [6] make up our dataset, which
contains real patient cases. In this study, a convolutional neural network is used
to enable effective automatic brain tumor detection. The simulation is carried out
in Python. The accuracy is calculated and compared with all other existing
approaches. The efficiency of the proposed brain tumor classification system is
determined by calculating the training accuracy, validation accuracy, and validation
loss. The current technique for detecting brain lesions is SVM-based classification,
which takes the feature extraction output as input. The classification output is
generated, and the accuracy is
determined using the feature values. In SVM-based tumor and non-tumor
identification, the computation rate is slow and the accuracy is low. The proposed
CNN-based classification requires no separate feature extraction step; the feature
values are computed by the CNN itself. The classification of tumor and non-tumor
brain images is shown in Fig. 3. As a result, complexity and computation time are
reduced while the accuracy remains high. The accuracy of brain tumor classification
is shown in Fig. 5. Finally, the segmentation yields a tumor or non-tumor brain,
based on the probability score value.
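The final labeling step, turning the probability score into a tumor/non-tumor decision and measuring accuracy, can be sketched as follows. The 0.5 threshold and the sample values are assumptions for illustration.

```python
import numpy as np

probs = np.array([0.93, 0.12, 0.77, 0.05, 0.64])  # hypothetical CNN scores
truth = np.array([1, 0, 1, 0, 0])                  # 1 = tumor, 0 = non-tumor

pred = (probs >= 0.5).astype(int)   # threshold the probability score
acc = float((pred == truth).mean())
print(pred.tolist(), acc)  # [1, 0, 1, 0, 1] 0.8
```

The same comparison against held-out labels gives the validation accuracy reported for the model.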
6 Conclusion
The fundamental goal of this research is to create an accurate, quick, and easy-to-use
automatic brain tumor classification system. Tumor classification has traditionally
relied on Fuzzy C Means (FCM)-based segmentation, texture and shape feature
extraction, as well as SVM- and DNN-based classification. These methods suffer from
long tumor processing times and poor accuracy. The suggested system includes a
CNN-based classification to improve accuracy and reduce time complexity. The results
obtained are labeled as either tumor or normal brain images. The training accuracy is
97.5%, and the validation loss and validation accuracy are also determined.
Validation accuracy is likewise good, with negligible validation loss. Convolutional
neural networks are a developing field that will likely aid
radiologists in providing more accurate patient care. This paper provides a funda-
mental review of automated segmentation, allowing the reader to be well-informed
about the field. This could be used in other fields of radiology by further developing
segmentation techniques in brain tumors.
References
1. Williams S, Layard Horsfall H, Funnell JP, Hanrahan JG, Khan DZ, Muirhead W, Stoyanov
D, Marcus HJ (2021) Artificial intelligence in brain tumor surgery-an emerging paradigm.
Cancers 13(19):5010. https://doi.org/10.3390/cancers13195010
2. Zhang J et al. (2018) Brain tumor segmentation based on refined fully convolutional neural
networks with a hierarchical dice loss. In: Cornell university library, computer vision, and
pattern recognition
3. Ranjbar Zadeh R, Bagherian Kasgari A, Jafarzadeh Ghoushchi S et al (2021) Brain tumor
segmentation based on deep learning and an attention mechanism using MRI multi-modalities
brain images. Sci Rep 11:10930. https://doi.org/10.1038/s41598-021-90428-8
4. Brown AD, Marotta TR (2018) Using machine learning for sequence-level automated MRI
protocol selection in neuroradiology. J Am Med Inform Assoc 25:568–571.
https://doi.org/10.1093/jamia/ocx125
5. Brown AD, Marotta TR (2017) A natural language processing-based model to
automate MRI brain protocol selection and prioritization. Acad Radiol 24:160–166.
https://doi.org/10.1016/j.acra.2016.09.013
6. Abdalslam L (2019) Brain tumor detection CNN, retrieved 10 November 2021 from
https://www.kaggle.com/loaiabdalslam/brain-tumor-detection-cnn/data
7. Pedapati P, Tanneedi RV (2017) Master's thesis, electrical engineering, December 2017.
http://www.diva-portal.org/smash/get/diva2:1184069/FULLTEXT02.pdf
8. Dhiraj K (2019) Top 4 advantages and disadvantages of Support Vector Machine or
SVM, Medium. https://dhirajkumarblog.medium.com/top-4-advantages-and-disadvantages-
of-support-vector-machine-or-svm-a3c06a2b107
9. Mohsen H, El-Dahshan ESA, El-Horbaty ESM, Salem ABM (2018) Classification using deep
learning neural networks for brain tumors. Future Comput Info J 3(1):68–71, ISSN 2314–7288.
https://doi.org/10.1016/j.fcij.2017.12.001
10. Alam MS, Rahman MM, Hossain MA, Islam MK, Ahmed KM, Ahmed KT, Singh BC, Miah
MS (2019) Automatic human brain tumor detection in MRI image using template-based
K-means and improved fuzzy C-means clustering algorithm. Big Data Cogn Comput 3(2):27.
https://doi.org/10.3390/bdcc3020027
11. Seetha J, Raja SS (2018) Brain tumor classification using convolutional neural networks.
Biomed Pharmacol J 11(3). https://doi.org/10.13005/bpj/1511
12. Naseer A, Yasir T, Azhar A, Shakeel T, Zafar K (2021) Hindawi Int J Biomed Imaging 2021,
Article ID 5513500. https://doi.org/10.1155/2021/5513500
13. Reddy CKK, Anisha PR, Apoorva K (2021) Early prediction of pneumonia using convolutional
neural network and X-Ray images. In: Smart innovation, systems and technologies
Deep Learning and Blockchain
for Electronic Health Record
in Healthcare System
Abstract Emerging technologies such as artificial intelligence (AI) and blockchain
have a wide range of applications in the healthcare system. Deep neural networks,
the deep learning branch of AI that works in a way loosely inspired by the human
brain, coupled with blockchain technology provide effective tracking and
personalized collection of data in the medical field. Integrating these two
technologies allows data security and transparency in the medical care system with
high accuracy. A review of several research papers using deep learning and
blockchain technology illustrates advances in security and efficiency for prediction
and decision-making in biomedical applications. Blockchain technology stores the
cryptographically protected data that artificial intelligence requires, and real-life
data increases the accuracy of regression or classification models in deep neural
networks. Blockchain technology ensures the safety of data exchange and analysis
among data suppliers. This comparative study of deep learning and blockchain
technology in the medical field aims to give a brief overview of the process flow of
their integration in the electronic healthcare sector. Integrating artificial
intelligence and blockchain technology provides timely and more accurate results
while keeping exchanged data safe.
1 Introduction
Medical services are entering a new age in which abundant biomedical information
plays an increasingly significant role. The tremendous
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 429
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_42
430 Ch. Sravanthi and S. Chowdary
2 Related Work
In this article, we discuss the most recent uses of deep learning and blockchain
in clinical science, emphasizing the distinguishing elements that can considerably
influence health care. We have especially focused on electronic health records
stored and protected using deep learning and blockchain, based on several published
peer-reviewed studies.
The large models applied to the clinical care domain have generally been based on
convolutional neural networks (CNNs) [3] (Ismail et al.). K-Nearest
Neighbor (K-NN) has been used for classification and forecasting, but its prediction
time is long, so improvements using LSTM (Long Short-Term Memory) networks with
Recurrent Neural Networks (RNNs) have been made [4], along with Restricted Boltzmann
Machines (RBMs) [5] and Autoencoders (AEs) [6].
Patel et al. [7] describe one of the significant AI techniques used in drug
discovery: proposing sensible molecular candidates from training inputs by means of
QSAR. They also developed a neural-network-based retrosynthesis algorithm that
obtains 90% accuracy. Their paper surveys the major role AI plays in drug discovery
and pharma intelligence.
Keshavari et al. [8] advanced COVID-19 drug and vaccine discovery with artificial
intelligence techniques. Given sufficient data, a model can be trained to predict the
required vaccine. They therefore gathered a dataset of compounds, peptides, and
epitopes found either in silico or in vitro from CoronaDB-AI. The results evaluated
from the trained dataset identified effective viral therapies.
Jo et al. [9] observed that, on the basis of neuroimaging techniques and data, the
use of deep learning to detect Alzheimer's disease is growing rapidly in the
healthcare system. The accuracy of Alzheimer's disease prediction is improved by
combining machine learning with stacked autoencoders (SAE). Alzheimer's disease
research is still developing, with performance typically improved by fusing hybrid
data types, omics data, and so on. Fisher et al. [10] explained the use of
unsupervised machine learning to predict Alzheimer's disease for dozens of patients
simultaneously, using a technique called the Conditional Restricted Boltzmann
Machine (CRBM). The dataset was collected from 44 clinical sites, with 18-month
trajectories from 1909 patients. This unsupervised technique predicts changes in
ADAS-Cog scores with good accuracy.
Lundervold and Lundervold [11] elaborated on machine learning algorithms in clinical
image processing and analysis, applying image analysis to MRI from segmentation
through disease prediction. Esteva et al. [12] illustrated deep learning in medical
computer vision and how various applications benefit from it, such
The need for patient-centered facilities and for connecting disparate frameworks has
driven the adoption of blockchain. Blockchain gives patients full authority over
their health records. Patient information is highly sensitive and must be kept and
shared in a protected, private way. It is therefore a significant target for
malicious attacks such as Denial of Service (DoS), mining attacks, storage attacks,
and dropping attacks. Blockchain provides a secure and reliable platform for
healthcare against failures and attacks, since it contains diverse access-control
mechanisms [14].
The application of blockchain technology in medical services does not focus on
patient confidentiality and security alone; it is also applied to address additional
significant subjects such as interoperability. Applying protected methods to share
clinical information is challenging because of the heterogeneous information
structures among different parties, which hinders compatibility. Information
understanding can be incomplete because of divergent usage of healthcare
terminology. It is mandatory to agree on both the structure and the semantics of
information in order to share medical data.
One example application is Guardtime, a Netherlands-based information security firm,
which partnered with the government of Estonia to create a blockchain-based system
to confirm patient identities. A second EHR-related implementation is MedRec, a
venture begun between the MIT Media Lab and Beth Israel Deaconess Medical Center.
This platform offers a decentralized method for handling permissions, authorization,
and data sharing among healthcare systems [15].
Reegu et al. [16] suggested an EHR for handling secure data using blockchain
techniques. They also examined making pandemic management more efficient, including
monitoring the vaccine supply chain, data aggregation, forecasting further spread of
infection in the population, COVID certificates for individuals, and so on.
Blockchain technology thus provided accurate and secure data
storage during the pandemic period. Dubovitskaya et al. [17] illustrated that
clinical care services have grown with specialization across multiple hospitals for
disease prediction and the diagnosis of chronic diseases such as cancer. They
collaborated with Stony Brook University Hospital and developed ACTION-EHR to
support radiation treatment for cancer. The technique is built on the Hyperledger
Fabric blockchain framework. By adopting blockchain technology, cancer therapy can
clearly proceed without delay and more efficiently.
Fatokun et al. [18] aimed to demonstrate blockchain concepts used to solve privacy,
security, and data exchange issues, and implemented an Ethereum consortium
blockchain. There are, however, some drawbacks to be addressed in the future: making
the blockchain system scalable, the extra overhead due to bandwidth resources, and
the suggestion to include machine learning to detect intrusions.
Wang and Song [19] proposed a secure cloud-based EHR framework built on blockchain
and an attribute-based cryptosystem. To encrypt clinical information, they used a
combination of identity-based encryption and identity-based signatures at the same
time to implement digital signing. On top of the blockchain, different strategies are
used to protect the integrity and traceability of clinical records. While the three
aforementioned investigations centered on cryptographic features to secure EHR
blocks, Roehrs et al. [20] addressed the different challenges associated with merging
scattered health records and with access control for healthcare providers and
consumers. These two matters were settled by proposing OmniPHR, a distributed model
for integrating personal health records (PHR) that uses a replicated database to
store PHR in blocks and combines structural and semantic interoperability with a
unified view of the various PHR setups. Finally, in an entirely different strategy,
Hussein et al. [21] laid out a structure for safeguarding health records with
blockchain technology based on genetic algorithms and discrete wavelet transforms.
The proposed technique involves a redesigned cryptographic hash generator for
producing the essential client security key. In addition, MD5 (a message-digest
algorithm using a hash function that yields a 128-bit hash value) strings were used
to make another key setup by applying a discrete wavelet transform. This strategy
further improves overall system security and resistance to various attacks.
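As a generic illustration of why blockchain-style storage makes EHR tampering detectable (this is not the design of any system cited above, and SHA-256 is used here rather than the MD5 mentioned in the last study):

```python
import hashlib
import json

def add_block(chain, record):
    """Append a block whose hash covers the record and the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"record": record, "prev": prev}, sort_keys=True)
    chain.append({"record": record, "prev": prev,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain):
    """Recompute every hash; any edited record breaks the chain."""
    for i, block in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "0" * 64
        body = json.dumps({"record": block["record"], "prev": prev},
                          sort_keys=True)
        if block["prev"] != prev or \
           block["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
    return True

chain = []
add_block(chain, "patient A: MRI 2021-03-01")   # hypothetical records
add_block(chain, "patient A: therapy note")
print(verify(chain))          # True
chain[0]["record"] = "edited"
print(verify(chain))          # False: tampering is detected
```

Because each block's hash covers the previous block's hash, altering any stored record invalidates every later block, which is the tamper-evidence property these EHR systems rely on.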
Griggs et al. [22] proposed extending WBANs with blockchain smart contracts for a
secure real-time patient monitoring and medical intervention framework. The study
proposes integrating blockchain to execute smart contracts that evaluate data
gathered by a patient's IoT healthcare devices against custom-tailored threshold
rules. This is done to overcome the problem of logging data-transfer transactions in
an IoT healthcare structure. Rahman et al. [23] introduced a smart dyslexia analysis
solution in which a decentralized
big data repository was used to store data and then share it with healthcare groups
and individuals using blockchain.
Mobile hypermedia health data were captured during dyslexia testing and kept in a
decentralized big data repository, which could be shared for additional clinical
examination and statistical assessment. Ichikawa et al. [24] laid out a
tamper-resistant mobile health framework using blockchain tools to ensure the
reliability of records. The objective of that study was to develop a mobile health
system for cognitive behavioral therapy for insomnia via a smartphone application.
Deep learning enhances data analysis and decision making; efficient and reliable
data sharing improves its accuracy. Decentralizing data with blockchain technology
places the emphasis on data sharing. The data shared should be secure and
legitimate, and the blockchain concept enables both. The combination of the two
techniques, deep learning and blockchain, offers high accuracy together with
security and dependability of shared data, which helps healthcare intelligence.
Tagde et al. [25] show that integrating blockchain and deep learning concepts makes
a significant difference in healthcare. They generalized the analytical technologies
that can be integrated into a risk-management approach: healthcare utilizes the data
in blockchain medical records, and deep learning techniques analyze them to settle
issues and track down problems. The authors of [26] proposed the latest advancements
of blockchain and AI approaches in healthcare monitoring systems. Their main focus
was a sustainable framework based on integrating the two technologies, the
characteristics of healthcare supply chains, the impact of these techniques on
humans, and emerging technologies such as big data, IoT, and AI.
Bhattacharya et al. [27] demonstrated Healthcare 4.0 with decentralization and
provided the necessary inputs for user data privacy based on analysis of previous
electronic health records. The proposed architecture, BinDaaS (Blockchain-based
Deep-learning as-a-Service in Healthcare 4.0 applications), is an integrated method
for accurate prediction. The contributions of that research include a lattice-based
key scheme to avoid quantum attacks, and validation of the security design and
prediction model against existing state-of-the-art frameworks. The model comprises a
large number of parameters based on Gaussian distributions.
Kumar et al. [28] addressed the new issue the world faces with the increase in
COVID-19 cases: how best to approach data collection and maintain the data securely.
Diagnosing COVID-19 patients was hampered by the shortage and limited reliability of
testing kits, and the rise in positive cases and predictions made this a difficult
time for everyone. Another issue
3 Conclusion
References
1. Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2018) Deep learning for healthcare: review,
opportunities and challenges. Brief Bioinform 19(6):1236–1246
2. Hasselgren A, Kralevska K, Gligoroski D, Pedersen SA, Faxvaag A (2020) Blockchain in
healthcare and health sciences—a scoping review. Int J Med Info 134:104040
3. Ismail WN, Hassan MM, Alsalamah HA, Fortino G (2020) CNN-based health model for regular
health factors analysis in internet-of-medical things environment. IEEE Access 8:52541–52549
4. Aldahiri A, Alrashed B, Hussain W (2021) Trends in using IoT with machine learning in health
prediction system. Forecasting 3(1):181–206
5. Cifuentes J, Yao Y, Yan M, Zheng B (2020) Blood transfusion prediction using restricted
Boltzmann machines. Comput Methods Biomech Biomed Engin 23(9):510–517
6. Baucum M, Khojandi A, Vasudevan R (2021) Improving deep reinforcement learning with
transitional variational autoencoders: a healthcare application. IEEE J Biomed Health Inform
25(6):2273–2280
7. Patel L, Shukla T, Huang X, Ussery DW, Wang S (2020) Machine learning methods in drug
discovery. Molecules 25(22):5277
Abstract This paper focuses on the visible (VIS) and near-infrared (NIR) ranges of
the electromagnetic (EM) spectrum, where the available EM energy is very high,
whereas in the thermal infrared (TIR) range the EM energy is very low. This results
in poor or coarse spatial resolution and a lower amount of detail in TIR images. An
Artificial Neural Network (ANN) approach is adopted to improve the coarse spatial
resolution (120 m) of Landsat Thematic Mapper (TM) TIR data by utilizing the
advantages of the fine-resolution (30 m) VIS and NIR data. The model works on the
3 VIS and NIR band data, with raw TIR data used as an additional input. This results
in a substantial improvement in spatial resolution at the output of the model.
1 Introduction
The Sun is the basic and major source of electromagnetic (EM) energy for remote
sensing, with the Earth as the secondary source. The solar energy available in the
visible (VIS) and near-infrared (NIR) range (0.4 µm–0.7 µm) of the EM spectrum is
very high (10⁸ W/µm/m²), whereas in the TIR range (8 µm–15 µm) the EM energy is very
low (10 W/µm/m²) [1]. The high energy in the VIS–NIR range results in fine or high
spatial resolution of the acquired images [1], and the low energy in the TIR range results in
M. Gurudeep
Department of ECE, MCET Hyderabad, Hyderabad, India
G. Samatha
Department of ECE, JBIET Hyderabad, Hyderabad, India
S. Ravikanti (B)
Methodist College of Engineering & Technology, CSE, Hyderabad, India
e-mail: [email protected]
G. R. Kulkarni
IIT BOMBAY, Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 437
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_43
438 M. Gurudeep et al.
coarse resolution. TIR data, in general, is 3–4 times coarser than data in the
VIS–NIR range. Spatial resolution in digital imagery refers to the area on the
ground (the ground resolution element, GRE) covered by one picture element (pixel),
i.e., one data value in the digital image. The amount of detail conveyed by
satellite imagery depends mainly on the spatial resolution of the sensor, apart from
the spectral resolution; finer spatial resolution results in a higher amount of
detail. Artificial Neural Networks (ANNs) [2], owing to their favorable properties,
are used in many applications [2], such as classification of Multi-Spectral (MS)
data [3] and urban land use classification [4]. TIR data add thermal responses
[4, 5] in many Earth resources applications.
Study Area and Data
The study area falls in the North-East part of Bombay (Plate 1). Thane creek is
the most dominant feature in the area flowing from North to South. The study area
covers the three prominent lakes viz. Tulsi, Vihar and Powai from North to South
on the left side of the creek. The urban area spreading from Thane to Kurla falls on
the left (West) of the creek, and New Bombay falls on the right (East) of the creek.
There are mangroves on either side of the creek [6], and forests cover the hilly areas
around the Tulsi and Vihar lakes. The Eastern Express Highway runs on the left side
of Thane creek. The Central Railway and Lal Bahadur Shastri Marg, running in the
North–South direction, are also prominently seen in the area. Landsat 5 Thematic
Mapper (TM) VIS–NIR data of 30 m spatial resolution (bands 2, 3, 4) and TIR data of
120 m resolution (band 6), acquired on 20 Dec 1989, were available for the study
area and used in the present studies [6]. Plate 2 shows the Landsat 5 TM band 3
(red, 0.6–0.7 µm) image (left) at 30 m resolution, and Plate 3 shows the raw TIR
band 6 (10.4–12.5 µm) image (right) at 120 m resolution. Plate 1 is an FCC image of
bands 2, 3, and 4 of the Landsat 5 TM (Fig. 3).
Artificial Neural Networks (ANNs), with their favorable properties [2], have been
found far superior in many applications. ANNs are highly interconnected systems of
information-processing cells/nodes, arranged into a few layers with several nodes in
each layer. All the nodes in each layer are connected to all the nodes of the next
layer. Generally, a 3-layer network is used, with 1 input layer (I), 1 output layer
(O) and 1 or more hidden layers (H) in between. The number of input nodes (NI) is
known and equal to the number of input variables considered (e.g., the number of
bands), and the number of output nodes (NO) is likewise known and equal to the
number of output variables (e.g., landuse classes). The number of nodes in the
hidden layer (NH) is chosen empirically [7].
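A single forward pass through the 3-layer network just described can be sketched as follows; the values of NI, NH, NO and the weights W_ij, W_jk are arbitrary illustrative choices, not those derived in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

NI, NH, NO = 3, 8, 1          # e.g. 3 input bands -> 1 output band; NH empirical
rng = np.random.default_rng(0)
W_ij = rng.normal(scale=0.5, size=(NI, NH))   # input -> hidden weights
W_jk = rng.normal(scale=0.5, size=(NH, NO))   # hidden -> output weights

x = rng.random(NI)            # one pixel's band values, scaled to 0.0-1.0
h = sigmoid(x @ W_ij)         # hidden-layer activations (fully connected)
y = sigmoid(h @ W_jk)         # network output in (0, 1)
print(y.shape)
```

In use, the trained weights replace the random ones and the pass is repeated for every pixel.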
A good number of samples is extracted from selected areas of the input
image data representing the landuse classes and scaled to 0.0–1.0 for the sigmoid activation function. This data is used in training the chosen network to derive the requisite
weights (Wij and Wjk) for the connections between the NI–NH and NH–NO layers
[8]. In training the network, the TIR data (120 m) is enlarged 4 times to match the
resolution (30 m) of the VIS–NIR data (Fig. 4).
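The two preparation steps mentioned here — scaling samples to 0.0–1.0 for the sigmoid and enlarging the 120 m TIR band 4 times to the 30 m grid — might look like the following sketch (nearest-neighbour replication is an assumption; the paper does not state the enlargement method):

```python
import numpy as np

def scale_01(band):
    """Linearly scale DN values to the 0.0-1.0 range expected by the sigmoid."""
    lo, hi = band.min(), band.max()
    return (band - lo) / (hi - lo)

def enlarge_4x(tir):
    """Replicate each 120 m TIR pixel into a 4x4 block of 30 m pixels."""
    return np.repeat(np.repeat(tir, 4, axis=0), 4, axis=1)

tir_120m = np.array([[120, 150], [160, 183]], dtype=float)  # toy 2x2 DN patch
tir_30m = enlarge_4x(scale_01(tir_120m))
print(tir_30m.shape)  # (8, 8)
```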
Artificial Neural Networks in Improvement of Spatial Resolution… 439
The primary goal of the current research is to enhance the spatial resolution of TIR
data from 120 to 30 m. Different methods [6] have been employed in the past to
improve spatial resolution, but ANNs have been proven to be significantly superior.
A 3-layer (I, H, O) ANN is employed in the current investigations, and programmes
developed [10] in “C” in previous studies are used in the process. Two cases are
carried out in the process of improvement of effective spatial resolution of the TIR
data. In case 1, only 3-band (VIS–NIR) data is used as input. In case 2, Raw TIR
data is also used along with the 3 VIS–NIR bands as input. Only a single forward
pass is used in the improvement of resolution. The output is the improved resolution
[11] (30 m) TIR data.
In the first case, data of bands 2, 3 and 4 (VIS and NIR) only are used as the input.
The network used for this case is shown in Fig. 2. The output of the network is
the improved resolution (30 m) TIR data.
In the second case, data of bands 2, 3, 4 and raw day time band 6 (TIR) data also is
used as the input. The network is shown in Fig. 5. The output of the Network is the
improved resolution (30 m) TIR data.
440 M. Gurudeep et al.
The higher resolution of VIS–NIR data is taken advantage of in the process. Two
cases, (i) using only the 3 bands of VIS–NIR data as input, and (ii) using 3 bands of
[Figure: network diagram with inputs B1–B4 connected through weights Wij and Wjk to output B6]
VIS–NIR data added with the Raw TIR data also as input. Results of Improved TIR
data obtained with ANN method are found to be far superior to the results obtained
by the statistical approaches carried out earlier.
[Figure: network diagram with band inputs producing the improved Band 6 output]
Histograms of the Raw TIR data (Fig. 9a), the ANN improved TIR data with 3 bands
only (Fig. 9b), and the ANN improved TIR data with 4 bands (VIS–NIR and
Raw TIR) (Fig. 9c) are obtained to compare the results. The ranges of DN values in the ANN
improved TIR data are found to be within the range (128–182), of values of the Raw
TIR data (120–183) (Table 1).
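The range comparison summarized in Table 1 can be checked in a few lines; the arrays below are synthetic stand-ins for the raw and ANN improved TIR images:

```python
import numpy as np

rng = np.random.default_rng(1)
raw_tir = rng.integers(120, 184, size=(100, 100))        # synthetic raw DN values
improved_tir = rng.integers(128, 183, size=(400, 400))   # synthetic ANN output

# The improved image should stay within the radiometric range of the raw image.
within_range = (improved_tir.min() >= raw_tir.min() and
                improved_tir.max() <= raw_tir.max())
print(within_range)  # True

# Histograms (as in Fig. 9) summarise the DN distributions for comparison.
counts, bin_edges = np.histogram(improved_tir, bins=32,
                                 range=(int(raw_tir.min()), int(raw_tir.max())))
```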
Visual interpretation of the ANN improved images (Plates 4 and 5) also shows far
superior results, with a greater amount of land use detail and ease of interpretation.
The results of Case 2 (VIS–NIR and Raw TIR) are found to be far better than those of Case 1.
Fig. 9 a Histograms of Raw TIR data, b TIR data case 1, c Improved TIR data case 2
5 Conclusion
The major goal of this research was to use the good features of ANNs to improve
the effective spatial resolution of TIR data. It has been discovered that artificial
neural networks (ANNs) can be utilized to improve the effective spatial resolution
of low resolution (120 m) TIR data. The ANN technique has been proven to provide
significantly improved resolution and detail, as well as being very good at retaining
the thermal patterns of the original images. Using only the high-resolution VIS and
NIR band data resulted in a significant increase in spatial resolution. The addition
of Raw TIR band data to the high-resolution VIS and NIR data as input improves
the resolution even further. The DN ranges of the raw images are maintained better,
according to the histograms of the ANN improved TIR images.
Acknowledgements The authors express their gratitude to Dr. K. Gopal Rao, Prof. (Retd.), IIT
Bombay, for his guidance and valuable support.
References
1. Lillesand TM, Kiefer RW (1994) Remote sensing and image interpretation, 3rd edn. John
Wiley & Sons, Inc
2. Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018) State-of-the
art in artificial neural network applications: A survey. Heliyon 4
3. Kahle BA, Michael JA, Frank DP, John PS (1984) Geological mapping using thermal images,
Rem Sens Environ 16:13–33
4. Price JC (1981) The contribution of thermal data in landsat multi spectral classification.
Photogram Eng Remote Sens 47(2):229–236
5. Stephen ML, Venugopal G (1990) Thematic mapper thermal infrared data in discriminating
selected urban features. Int J Remote Sens 11(5):841–857
6. Valdes M, Inamura M (2001) Improvement of remotely sensed low spatial resolution images by
backpropagated neural networks using data fusion technique. Int J Remote Sens 22(4):629–642
7. Zhang X, Van Gendern JL, Kroonenberg SB (1997) A method to evaluate the capability of
landsat TM band 6 data for sub-pixel coal fire detection. Int J Remote Sens 15:3279–3288
8. Heermann PD, Khazenie K (1992) Classification of multispectral remote sensing data using a
back propagation neural networks. IEEE Trans Geosci Remote Sens 30(1):81–88
9. Chavez SP, Stuart CS, Labrey AA (1991) Comparison of three different methods to merge
multi resolution and multi spectral data: landsat TM data and SPOT panchromatic. Photogram
Eng Remote Sens 57(3):295–303
10. Ravikanti S (2017) Internet of everything (IoE): a new technology era will have impact on
every facet of our life
11. Ravikanti S, Preeti G (2016) Future’s smart objects in IOT, Based on big-data and cloud
computing technologies
12. A hybrid forecasting method based on exponential smoothing and multiplicative neuron model
artificial neural network, (IRSYSC-2017)
13. Fidalgo JN (2015) Neural networks applied to spatial load forecasting in GIS. INESC Porto
and Department of Electrical Engineering and Computing
14. Application of artificial neural network and climate indices to drought forecasting in south-
central Vietnam, September 2019
15. Mehdy MM, Ng PY, Shair EF, MdSaleh NI, Gomes C (2017) Artificial neural networks in
image processing for early detection of breast cancer. Hindawi Comput Math Meth Med 2017,
Article ID 2610628
16. Thirupathi L, Padmanabhuni VNR (2021) Multi-level protection (MLP) policy implementation
using graph database. Int J Adv Comput Sci Appl (IJACSA) 12(3). https://doi.org/10.14569/IJACSA.2021.0120350
17. Thirupathi L et al (2021) J Phys: Conf Ser 2089:012049
18. Lingala T et al (2021) J Phys: Conf Ser 2089:012050
19. Pratapagiri S, Gangula RRG, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early detection
of plant leaf disease using convolutional neural networks. In: 2021 3rd International conference
on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.1109/ICE
RA53111.2021.9538659
20. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message
over the network using another frequency and timing technique to circumvent the jammers. J
Nucl Energy Sci Power Generat Technol 10:9
21. Reddy CKK, VijayaBabu B (2015) ISPM: improved snow prediction model to nowcast the
presence of snow/no-snow. Int Rev Comput Softw
22. Reddy CKK, Rupa CH, VijayaBabu B (2015) SLGAS: supervised learning using gain ratio as
attribute selection measure to nowcast snow/no-snow. Int Rev Comput Softw
23. Artificial neural networks-based machine learning for wireless networks: IEEE, 03 July 2019
Facial Micro-expression Recognition
Using Deep Learning
Nasaka Ravi Praneeth, Godavarthi Sri Sai Vikas, Ravuri Naveen Kumar,
and T. Anuradha
1 Introduction
Subtle expressions involuntarily cause an emotional leakage that exposes true feel-
ings of a person. Because these expressions occur inadvertently, they can also be
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 447
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_44
considered as a person’s true feelings. These expressions are officially called micro-
expressions because they do not last long. In order to detect these subtle expres-
sions in real-life situations, a well-trained accurate model is needed to identify the
real motives. Human micro-expression recognition can be applied in many
areas, for security, lie detection, or gaining information about a person. Nowadays,
facial micro-expression analysis is widely used in day-to-day life. Happy, sad, angry,
fearful, neutral, and surprised human emotions can be detected using facial
micro-expressions.
The MEVIEW dataset consists of 146 video clips containing micro-expression videos
of different categories in sequential form [1]. These sequenced data
help us recognize spontaneous changes of expression even in live video.
A model was built using the SAMM [2, 3] and MEVIEW datasets with a convolutional
neural network (CNN) classifier for detecting the micro-expressions.
The network needs careful training with a large number of layers, since it must recognize
the correct class even for a minute change in expression: micro-expressions
occur as fast as 1/15–1/25 of a second. The model we built predicts the expression
classes with an accuracy of 89% and also shows the emotional levels as percentages.
2 Literature Survey
Earlier work concluded that it was better to use deep learning models for
micro-expression detection than machine learning methods [2].
Qu et al. explained the CAS(ME)² database and its characteristics. They
employed the LBP method for spotting and evaluating micro- and macro-expressions
[12]. Adegun et al. presented their work of recognizing micro-expressions using
combination of LBP on three orthogonal planes and ELM. They concluded that
detecting expression from static images will not be effective for subtle movements
[13].
3 Proposed System
The target domain for this model is research areas such as human physiological
interaction detection, especially during interrogations. Building a
model for micro-expression recognition is a demanding task: it has to be trained
thoroughly to detect micro-expressions lasting 0.2–0.5 s, i.e., with an upper
bound of less than half a second. The proposed
system as shown in Fig. 1 captures the face of the user using a webcam. The frames
will be extracted from the video, and all these frames will be converted to grayscale
for pixel formatting. The model detects these facial micro-expressions reliably, as it
was trained on the existing spontaneous micro-expression databases SAMM and MEVIEW.
MEVIEW consists of a set of video clips that helps us to train the model for
predicting facial micro-expressions precisely. Those videos were split into frames,
and preprocessing techniques such as image rescaling, cropping, conversion of
RGB images to grayscale, and image rotation were applied to the dataset.
Usually, a person's subtle expressions come out when they are in a stressful, high-stakes
situation, such as when they try to hide their real feelings. As we know, a video
is a sequence of images, where every type of expression can be extracted as there
will be continuality. So, we extracted the frames from each video and used them to
train our model so that it can be able to predict micro-expressions.
From the video clips, frames are extracted. Preprocessing techniques such as rescaling,
rotation, and cropping were applied, and all images were converted from
RGB to grayscale so that information can be extracted easily from smaller pixel
arrays. The images in the training dataset are 48 × 48 pixels in size.
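A minimal sketch of this preprocessing, assuming standard luminance weights for the RGB-to-grayscale conversion and simple block-averaging for the rescale (the paper does not specify either):

```python
import numpy as np

def preprocess(rgb_frame):
    """Convert an RGB frame to a normalised 48x48 grayscale input."""
    # ITU-R BT.601 luminance weights for RGB -> grayscale (an assumption here).
    gray = rgb_frame @ np.array([0.299, 0.587, 0.114])
    # Crude rescale to 48x48 by block-averaging (a real pipeline would interpolate).
    h, w = gray.shape
    gray = gray[:h - h % 48, :w - w % 48]
    gray = gray.reshape(48, gray.shape[0] // 48,
                        48, gray.shape[1] // 48).mean(axis=(1, 3))
    return gray / 255.0  # pixel values now in [0, 1]

frame = np.random.randint(0, 256, size=(96, 144, 3)).astype(float)
x = preprocess(frame)
print(x.shape)  # (48, 48)
```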
Face detection plays a key role in recognizing facial micro-expressions. For this,
the Haar cascade model was used, which is provided in OpenCV as a pretrained
detector [14, 15]. A pop-up window appears on the screen showing the
webcam feed. Haar cascade is an object detection algorithm which helps to draw a
bounding box around the face which denotes the identification of the object that we
are looking for.
As described in Fig. 1, the detected frame will be given as input to the trained model
for predicting the class label of that particular expression. The model calculates
and extracts the features from the image. This model learns the feature detection
via hidden layers of the model. These extracted features will be compared with the
training sets of data. Thus, the class label will be displayed on the top of the bounding
box occurring around the face. Also, all the labels of micro-expressions that were
captured when the webcam is on will be visible in the background with the prediction
score of that respective image.
3.4 Limitations
Since assessing subtle facial expressions is a laborious task, the face must be
clearly visible for accurate assessment. A high-quality camera that captures every
frame sharply is therefore helpful in assessing subtle expressions accurately.
4 Proposed Algorithm
We have used a deep learning approach suited to image processing: the convolutional
neural network (CNN). A CNN consists of different layers. We built this CNN
model with six convolutional layers and max-pooling layers. The input image
must have 48 × 48 dimensions. In a convolution layer, filters are applied to the
original image or to other feature maps. Each convolution layer uses the ReLU
activation function and can be applied with a different number of filters. In these
convolution layers, the kernel size is 3 × 3, as shown in Fig. 2.
Max-pooling is a CNN layer that selects the maximum element from each region
of the feature map covered by the filter. In this model, the max-pooling size is
2 × 2, as shown in Fig. 2. The output of the max-pooling layer is thus a feature map
preserving the most prominent features of the previous one. The flattening
layer converts the data into a one-dimensional array for input to the next layer.
Finally, the flatten layer is connected to the fully connected layer, the last layer
of the network, which produces the output. Softmax is used as the activation function
for this multi-class classification problem, since more than two classes of emotions
need to be predicted. Micro-expressions are detected within 0.5 s, and the level
of each detected micro-expression can be displayed in the background.
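A sketch of such a network in PyTorch; the filter counts and the placement of pooling after every second convolution are illustrative assumptions, since the text specifies only six convolutional layers, 3 × 3 kernels, 2 × 2 pooling, a 48 × 48 input, and a softmax output:

```python
import torch
import torch.nn as nn

# Six 3x3 conv layers (ReLU) with 2x2 max-pooling after each pair,
# then flatten and a softmax over the six emotion classes.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 48 -> 24
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 24 -> 12
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 6
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 6),   # six emotion classes
    nn.Softmax(dim=1),
)

x = torch.randn(1, 1, 48, 48)    # one grayscale 48x48 frame
probs = model(x)
print(probs.shape)  # torch.Size([1, 6])
```

In actual training, the final Softmax would usually be dropped and raw logits fed to a cross-entropy loss, which applies log-softmax internally.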
5 Results
6 Conclusion
The model predicts the expression classes with an accuracy of 89%, and it can also
show the emotional levels in the form of a percentage.
The accuracy of the model can be improved by adding much more layers to it.
Use of a high-quality camera is required to capture every frame, and the face must
be visible to accurately assess subtle expressions.
References
1. Husák P, Cech J, Matas J (2017) Spotting facial micro-expressions “in the wild”. In: 22nd
computer vision winter workshop (Retz). https://cmp.felk.cvut.cz/~cechj/ME/
2. Davison AK, Lansley C, Costen N, Tan K, Yap MH (2018) SAMM: a spontaneous micro-facial
movement dataset. IEEE Trans Affect Comput 9(1):116–129
3. Davison AK, Merghani W, Yap MH (2018) Objective classes for micro-facial expression
recognition. J Imaging 4(10):119
4. Choi DY, Song BC (2020) Facial micro-expression recognition using two-dimensional land-
mark feature maps. IEEE Access 8:121549–121563. https://doi.org/10.1109/ACCESS.2020.
3006958
5. Reddy S, Karri ST, Dubey SR, Mukherjee S (2019) Spontaneous facial micro-expression recog-
nition using 3D spatiotemporal convolutional neural networks, pp 1–8. https://doi.org/10.1109/
IJCNN.2019.8852419
6. Dubey V, Takkar B, Lamba PS (2020) Micro-expression recognition using 3D—CNN. Fusion:
Pract Appl 1(1):5–13. https://doi.org/10.54216/FPA.010101
7. Adegun IP, Vadapalli HB (2020) Facial micro-expression recognition: a machine learning
approach. Sci Afr 8:e00465, ISSN 2468-2276. https://doi.org/10.1016/j.sciaf.2020.e00465
8. Takalkar MA, Xu M (2017) Image based facial micro-expression recognition using deep
learning on small datasets. Int Conf Digital Image Comput: Tech Appl (DICTA) 2017:1–7.
https://doi.org/10.1109/DICTA.2017.8227443
9. Yap CH, Yap MH, Davison AK, Cunningham R (2021) 3D-CNN for facial micro- and macro-
expression spotting on long video sequences using temporal oriented reference frame. https://
arxiv.org/abs/2105.06340v3
10. Zhao Y, Xu J (2019) A convolutional neural network for compound micro-expression
recognition. Sensors 19:5553. https://doi.org/10.3390/s19245553
11. Peng M, Wang C, Chen T, Liu G, Fu X (2017) Dual temporal scale convolutional neural network
for micro-expression recognition. Front Psychol 8:1745. Published 2017 Oct 13. https://doi.
org/10.3389/fpsyg.2017.01745
12. Qu F, Wang SJ, Yan WJ, Li H, Wu S, Fu X (2018) CAS(ME)²: a database for spontaneous macro-expression and micro-expression
spotting and recognition. IEEE Trans Affect Comput
13. Adegun P, Vadapalli HB (2016) Automatic recognition of micro-expressions using local binary
patterns on three orthogonal planes and extreme learning machine. In: 2016 pattern recognition
association of South Africa and robotics and mechatronics international conference (PRASA-
RobMech), pp 1–5. https://doi.org/10.1109/RoboMech.2016.7813187.
14. Sri BR, Akanksha Y, Puthali R, Anuradha T (2021) Early driver drowsiness detection using
convolution neural networks. In: Proceedings of the 2nd international conference on electronics
and sustainable communication systems, ICESC 2021, pp 1779–1784
15. Teja PR, AnjanaGowri G, PreethiLalithya G, Anuradha T, Kumar CSP (2021) Driver drowsi-
ness detection using convolution neural networks, smart innovation, systems and technologies
224:617–62
Precision Agriculture with Weed
Detection Using Deep Learning
Abstract Agriculture is a field that needs care and attention, and it remains
the backbone of the Indian economy. Nowadays, production and yield decrease
due to an increasing variety of crop diseases and weeds. Identification and elimination
of weeds is a tedious task. To reduce the stress on the farmers and increase
the productivity of the crop, machine learning and deep learning can be used to
detect the weeds and the diseases. Various researches have been conducted in this
area using machine learning algorithms like Random Forest (RF) and Support Vector
Machine (SVM). But for better accuracy in the results, deep learning techniques—
InceptionV4 and Xception are used to detect the weeds with higher speed and usage
of less computing resources.
1 Introduction
Weeds grow on farmland by feeding on nutrients present in the soil that is meant for
the crop plant. Weeds indeed compete for various resources necessary for growth
from the crop plants and deplete the nutrients available for the crop plants to grow.
Therefore, pulling weeds becomes an inevitable task in farming. However, manually
pulling weeds on huge croplands is time-consuming for farmers. This
problem raises the need for automated weeding, the process of
removing weeds using machines. Weed identification, also called weed detection,
plays an important role in automated weeding. One crop that suffers from many
types of weeds is cotton [1].
Identifying crop and weeds (hereinafter referred to as detection) is the first impor-
tant step in an automated weed control process. The development of computer-vision
algorithms for weed detection has a long history, and research dates back to the 1980s.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 455
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_45
456 I. Deva Kumar et al.
Many algorithms are used to distinguish between weeds and crops. Feature detec-
tion with machine learning algorithms combined can improve performance. Recent
studies have shown that deep learning algorithms can be used to further improve
performance. These algorithms learn the characteristics of the associated image and
detect weeds directly from the camera image.
Hence, in this paper total eleven weeds in the cotton crop are classified using
the convolutional neural network algorithms. The models used are InceptionV4 and
Xception. InceptionV4 is a CNN architecture built on previous iterations of the Incep-
tion family by simplifying the architecture and using more Inception modules than
Inceptionv3. The module was developed, among other things, to solve the problem
of computational effort and overfitting.
Xception is a deep convolutional neural network architecture that includes convo-
lutions that can be separated by depth. It was proposed by Google researchers. Google
has presented the interpretation of the convolutional neural network Inception module
as an intermediate step between normal convolution and a separable convolution oper-
ation in the depth direction (depth convolution followed by pointwise convolution).
From this point of view, the depth-separable convolution can be understood as an
Inception module with the largest number of towers. Based on this observation, they
propose a new deep convolutional neural network architecture inspired by Inception,
replacing the Inception module with a depth-separable convolution.
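The depthwise-separable operation described here — a per-channel (depthwise) convolution followed by a 1 × 1 pointwise convolution — can be expressed in PyTorch through the groups argument; the channel counts are illustrative:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # groups=in_ch applies one independent 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # The 1x1 convolution then mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Compared with a full 3 × 3 convolution from 32 to 64 channels, this factorization uses far fewer parameters, which is one reason Xception remains computationally economical.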
2 Review of Literature
There are different papers published to detect the weeds and control the weeds in real
time. One of them is “A Multi-class weed species Image Dataset for Deep Learning”
[2]. 17,509 images are collected from eight different kinds of species from the fields
in Australia, and InceptionV3 and ResNet-50 were used to identify the weeds. But
the authors focused only on preparing the dataset.
Next paper is “A Deep Learning Approach for Weed Detection in Lettuce Crops
Using Multispectral Images” [3]. They gathered the images from the field with the
help of drone and used machine learning algorithm called “Support Vector Machine”
and convolutional neural network—YOLO and R-CNN for identification of the
weeds.
The authors of the paper titled “Weed Location and Recognition based on UAV
imaging and Deep Learning” have taken 2000 weed images from fields in China
[4]. To classify the weeds, deep learning techniques called YOLO V3 and YOLO
V3-tiny were used. But a disadvantage is that the model failed to identify weeds
in the growing stage or of small size.
The models InceptionV4 and Xception are used in this paper. InceptionV4 uses
stronger architectural constraints than InceptionV3. The skip connections introduced by ResNet
are included in the Xception network, along with depthwise convolution
and pointwise convolution techniques. Ultimately, high-end models like these are used in this work.
Figure 1 shows the architecture of the proposed work. The weed images which are
taken in different soil conditions, growth stages, etc., are fed into the preprocessing
stage. In the preprocessing stage, the data augmentation and normalization techniques
are applied. Before the normalization, the pixel values are between 0 and 255. Now,
the pixel values are between 0 and 1. After preprocessing step, the data (image
dataset) is separated to training data and testing data. Training images are given to
the classification model called Inception, whereas the testing images are used for
the evaluation of the model. The model takes many parameters. After completion of
the iteration, the model is saved and evaluated against the test data. If accuracy is
acceptable, model can be used on real-world data.
3.2.2 Algorithm
Pretrained models from torch (PyTorch) are imported and used in this project. Here,
we summarize how the algorithm works.
Inception Algorithm:
1. Input the image to the model
2. for i := 1 to epochs
   2.1 parallel filtering is applied with different kernel sizes (1×1, 3×3, 5×5)
   2.2 extract the features by applying the filters
   2.3 average pooling is applied to reduce the number of parameters
   2.4 if predicted_label does not match the actual_label
       2.4.1 a loss function is applied
   2.5 if classification then
       2.5.1 Cross_Entropy_Loss is applied
   2.6 else
       2.6.1 Mean_Squared_Loss is applied
3. Output the probabilities of the class labels
4. The class label with the highest probability is the predicted label
Optimization algorithms are used to speed up and stabilize training.
torch.optim is a package that implements various optimization algorithms;
the one used in this paper is SGD (stochastic gradient descent).
torch.optim.lr_scheduler helps to adjust the learning rate after each epoch based on
the chosen schedule.
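A minimal sketch of this setup; the learning rate, momentum, and decay schedule are illustrative assumptions, as the paper does not state its hyperparameters:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the Inception classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Multiply the learning rate by 0.1 every 7 epochs (illustrative schedule).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

for epoch in range(8):
    optimizer.step()      # (weight update would follow loss.backward() in training)
    scheduler.step()      # adjust the learning rate after each epoch

print(round(optimizer.param_groups[0]["lr"], 6))  # 0.001 after the decay at epoch 7
```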
Cost (loss) functions are used to guide optimization and improve accuracy; the
main aim is to decrease the loss. A model with high loss is not preferred, while a
model with low loss is suitable for real-time usage. The cross-entropy loss function
is mainly used for classification problems and is preferred when the dataset is
unbalanced. The formula for the cross-entropy loss is given below.
loss(x, class) = −log(exp(x[class]) / Σ_j exp(x[j])) = −x[class] + log(Σ_j exp(x[j]))   (1)
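Equation (1) is the negative log-softmax identity; a quick numerical check with illustrative logits:

```python
import math

x = [2.0, 1.0, 0.1]   # illustrative logits for three classes
cls = 0               # index of the true class

# Left-hand side: negative log of the softmax probability of the true class.
softmax_p = math.exp(x[cls]) / sum(math.exp(v) for v in x)
lhs = -math.log(softmax_p)

# Right-hand side of Eq. (1): -x[class] + log(sum_j exp(x[j])).
rhs = -x[cls] + math.log(sum(math.exp(v) for v in x))

print(abs(lhs - rhs) < 1e-12)  # True
```

The right-hand form is what frameworks actually compute, since it avoids dividing two exponentials and is numerically more stable.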
4.1 Dataset
The cotton crop was selected for the project, and the cotton weed images were taken
from Kaggle [6]. The dataset contains 11 labels and 5187 images of weeds.
The images are taken under different conditions like the natural light, varied weed
growth stages, and different soil types. The names of the weeds are Carpetweeds,
Crabgrass, Ragweed, Sickle pod, Spurred Anoda, Swinecress, Water hemp, Morning
glory, Spotted Spurge, and Prickly Sida. Figure 2 shows the sample images of each
class.
The environment used to construct and validate the model is Google Colab, a
free notebook environment that connects to Google Drive. Colab has
many pre-installed machine learning libraries, which can be loaded into a notebook
using the import keyword followed by the library name. The libraries used in this
project are PyTorch, Seaborn, Matplotlib, time, multiprocessing, csv, etc. Figure 3 shows the
predicted results of some weed images as part of the validation process of the model.
The three weeds in the above figure are predicted correctly but the last one is wrongly
predicted as the Morning glory. From the confusion matrix of the InceptionV4,
carpetweeds were classified with 96% accuracy but the Swinecress and Spurred
Anoda were not classified properly due to the similar structure with other weeds.
The graphical representation of the per class, precision, recall, and F1score is shown
in Fig. 4.
The different models are constructed using the same dataset with different CNN
algorithms. Even though Xception and EfficientNet-B5 gave high training
accuracy and validation accuracy, these models failed to predict real-time weeds in
the cotton fields, possibly due to overfitting of the data. InceptionV4 predicts
the labels with more accuracy for real weed images. The tabular representation of
the training time, training accuracy, validation accuracy, etc., for different models
with different epochs is shown in Table 1. The training and validation loss for the
InceptionV4 model is shown in Fig. 5. The loss graph of Fig. 5 is with 25 epochs. At
epoch-0, the validation loss is 2.2558 and the training loss is 2.2456, and at epoch-25,
the validation loss is nearly 0.8328 and the training loss is 0.3127.
5 Conclusion
crop cannot be afforded. A weed-pulling application demands both high accuracy
and high speed, and our project concentrates on achieving higher accuracy with increased speed.
The project also facilitates retraining when a sufficient amount of new data is added.
Therefore, models with less computational cost and higher efficiency are used as
part of this project.
References
1. Olsen A, Konovalov DA, Philippa B, Ridd P, Wood JC, Johns J, Banks W et al (2019) DeepWeeds:
a multiclass weed species image dataset for deep learning. Sci Rep 9(1):1–12
2. Zhang R et al (2020) Weed location and recognition based on UAV imaging and deep learning. Int
J Precision Agric Aviat 3(1)
3. Arif S et al (2021) Weeds detection and classification using convolutional long-short-term
memory. ResearchSquare
4. Islam N, Rashid MM, Wibowo S, Wasimi S, Morshed A, Xu C, Moore S (2020) Machine
learning based approach for weed detection in chilli field using RGB images
5. Li L, Zhang S, Wang B (2021) Plant disease detection and classification by deep learning. IEEE Access
6. Chen D, Lu Y, Li Z, Young S (2021) Performance evaluation of deep transfer learning
on multiclass identification of common weed species in cotton production systems.
arXiv:2110.04960[cs.CV]
An Ensemble Model to Detect
Parkinson’s Disease Using MRI Images
1 Introduction
PD is a neural disorder that affects the human brain’s motor system. The disease
occurs when neurons that control the human body’s movement become impaired
and eventually die. When this phenomenon occurs, neurons produce less dopamine,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 465
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_46
466 T. S. Lakshmi et al.
a brain chemical responsible for the disease’s movement problems. The symptoms
appear when 60% of the dopaminergic neurons begin deteriorating. Significant PD
symptoms are shaking, rigidity, and difficulty with walking, balance, and coordi-
nation. These symptoms typically begin slowly and get worse over time. PD is the
second primary neurological disorder widespread in older adults after Alzheimer’s.
Age is identified as a major risk factor for Parkinson's. The disease's occurrence
peaks at around 80 years of age. The number of patients is expected to grow by more than
30% by 2030. Environmental and genetic factors play a considerable role in the cause
of PD.
Detection of PD in the early stages is crucial in impeding the disease’s growth
and providing patients with some opportunity to have access to good treatment. The
disease is usually diagnosed by using history and neurological investigations [1]. But
the condition may not be identified accurately, as other neurodegenerative diseases
with similar symptoms exist. Diagnosis is typically made when there is a heavy
loss of dopamine chemicals. The exact detection of PD is a task that still poses a
challenge. Various clinical tests can diagnose it. But, as the clinical tests are related to
biological brain changes, visual image inspection can be an appropriate technique for
diagnosis. Several neuroimaging methods like Single Photon Emission Computed
Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance
Imaging (MRI), Functional Magnetic Resonance Imaging (fMRI), and Transcranial
Sonography are used for diagnoses of PD [2]. However, MRI has seen many recent
improvements, making the diagnosis relatively easier. Convolutional neural network
(CNN) is a deep learning (DL) technique that has recently demonstrated excellent
results in classifying images in visual content analysis. Over the years, researchers
and students have studied and worked on artificial neural networks to solve crucial
image classification challenges [3]. In this work, an ensemble of the popular
convolutional neural networks VGG16 and ResNet50 is built to achieve higher overall
classification performance than either model achieves individually.
2 Literature Review
The authors of [4] discuss the existing deep learning architectures for image detec-
tion, segmentation, classification, etc., of MRI. The survey mainly focuses on deep learning's
application in disease detection using the MRI modality and numerous problems
and current advances in deep learning linked to image processing. In paper [5], a
CNN architecture has been implemented to efficiently classify Alzheimer’s subjects
from healthy control subjects using fMRI data. The authors of [6] have shown how
complex networks can be proficiently used to define novel brain connectivity and
introduce accurate PD markers. In the study of [7], the authors proposed a custom
CAD-based CNN model for classifying healthy patterns and MRI patches related to
Parkinson’s.
The authors of [8] proposed a framework to classify MRI scans of Parkinson’s
disease and healthy control subjects by combining data augmentation techniques
An Ensemble Model to Detect Parkinson’s Disease Using MRI Images 467
with a transfer-learned CNN such as AlexNet. In [9], the authors have shown how the
dropout algorithm and batch normalization can affect accuracy in diagnosing Parkinson's
disease; an accuracy of 97.92% was achieved when these were applied to the
LeNet-5 architecture.
In the study [10], two DL models were used to classify PD subjects from healthy
control subjects at the early stages of diagnosis. The flair and T2-weighted MRI
scans were extracted from the public database of PPMI. To improve model perfor-
mance, pre-processing was performed in four stages in the following order: N4 bias
correction, histogram matching, z-score normalization, and image rescaling. They
regularized the model, which has almost 123 million parameters, using dropout and a ridge
regularizer and achieved a high accuracy of 88.9%.
All the above-proposed models have achieved considerable accuracy in classifying
the brain MR images.
3 Proposed Methodology
This section contains a brief explanation of the methodology involved in our proposed
work. It covers the MRI database that is taken, the pre-processing of the image dataset,
and the architectures of the CNN models used.
3.1 Dataset
Table 1 Demographics

                 HC                     PD
Subject records  84                     102
Sex (F/M)        38/46                  38/62
Age              58 ± 11 (approx.)      60 ± 9 (approx.)
images are normalized to the (0, 1) range using normalization methods. A filtering
operation is then applied to these images to reduce their noise. A 2D Gaussian filter
with an optimized standard deviation of 0.8 and a 5 × 5 kernel is applied to smooth
the previously normalized images, reducing intensity
inconsistencies.
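As a rough sketch of this pre-processing step (the exact implementation is not given here; the helper name `preprocess_slice` and the min-max normalization are assumptions), SciPy's Gaussian filter reproduces the stated standard deviation of 0.8 with a 5 × 5 kernel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_slice(img):
    """Normalize an MR slice to the (0, 1) range, then smooth it."""
    img = img.astype(np.float64)
    # Min-max normalization to (0, 1)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    # 2D Gaussian smoothing with sigma = 0.8; SciPy derives the kernel size
    # from `truncate`: radius = int(2.5 * 0.8 + 0.5) = 2, i.e. a 5 x 5 kernel.
    return gaussian_filter(img, sigma=0.8, truncate=2.5)
```

Because smoothing is a convex combination of neighbouring pixels, the output stays within the normalized (0, 1) range.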
To increase the size of the dataset and better train the models, we perform real-time
augmentation, which creates new variations of the images on the fly while the model is being
trained. We create multiple transformed copies of the same image using this image
augmentation technique by applying different transformations to original dataset
images.
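A minimal NumPy sketch of real-time augmentation (the exact transformations are not listed here; the flips and shifts below, and the names `augment` and `realtime_batches`, are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Return a randomly transformed copy of `image` (flip and small shift)."""
    out = image
    if rng.random() < 0.5:          # random horizontal flip
        out = out[:, ::-1]
    shift = rng.integers(-2, 3)     # small random horizontal shift
    return np.roll(out, shift, axis=1)

def realtime_batches(images, labels, batch_size):
    """Yield endlessly; every epoch sees freshly transformed copies."""
    n = len(images)
    while True:
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield np.stack([augment(images[i]) for i in batch]), labels[batch]
```

In frameworks such as Keras this role is typically played by a data generator passed directly to the training loop.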
Pre-processing is then done on the raw images shown in Fig. 1a by applying a
Gaussian filter to reduce noise in the images shown in Fig. 1b. The image dataset is
split into training, validation, and test sets in the ratio of 70:15:15, indicating that
70% of the dataset is split for training, 15% for validation, and 15% for the test set.
The models are then made to train with images of size 224 × 224 pixels for
VGG16 and 256 × 256 pixels for ResNet50, which are processed in their respective
layers from the input layer to the final output layer. An average of 45 ± 10 image
slices from each patient has been used in the input dataset according to the criteria
given in Table 2.
Fig. 1 MR images
3.2 Methodology
The ensemble model used in our study consists of two CNN models, namely VGG16
and ResNet50. The VGG16 is a 16-layer convolutional neural network. This network
substitutes large kernel-sized filters with multiple 3 × 3 kernels applied one after the
other, followed by max-pooling over a 2 × 2 pixel window with stride 2, and three
fully connected layers [12]. ResNet50, a 50-layer neural network, was successfully
trained using the ResNet unit. The core concept is residual learning, and it is
shown to be effective in dealing with network degeneration. It produces good results
despite having fewer parameters than VGGNet [13].
To tackle the issue of a large number of parameters, models that have already been
pre-trained on different image datasets and contain pre-trained weights are used to
ease the process for classifying new datasets [14]. Both of our proposed VGG16
and ResNet50 models have been trained previously on the ImageNet dataset, which
consists of around 15 million images and 1000 classes. This knowledge helps in
better classification of our dataset, giving improved accuracy.
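The transfer-learning idea can be illustrated with a toy sketch: a pre-trained backbone is frozen and used purely as a feature extractor, while only a new classification head is trained. In the actual work the backbone is VGG16 or ResNet50 with ImageNet weights; the random frozen layer and logistic head below are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a pre-trained backbone: its weights stay frozen, so it acts
# purely as a fixed feature extractor (as the ImageNet-trained models do here).
W_frozen = rng.standard_normal((64, 16)) / 8.0

def extract_features(x):
    """Frozen layer plus ReLU; no gradients ever flow into W_frozen."""
    return np.maximum(x @ W_frozen, 0.0)

def train_head(x, y, lr=0.5, epochs=300):
    """Train only the new logistic-regression head on top of frozen features."""
    feats = extract_features(x)
    w, b = np.zeros(feats.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid output
        grad = p - y                                # gradient of the logistic loss
        w -= lr * feats.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b
```

Only the head's parameters are updated, which is why transfer learning sidesteps the cost of fitting the backbone's millions of parameters from scratch.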
Ensemble Modeling
The proposed methodology’s core concept involves ensembling two CNN models
to increase accuracy considerably. It was first discussed in [15]. Ensembling is the
process of combining multiple learning models or algorithms to achieve a collec-
tive and improved performance in prediction. Traditionally, while many models are
available to classify or predict data individually, using them alone may lead to lower
accuracy, because a model may not fit the whole training data or may identify
specific features better than the other models do. If we combine such models, the
overall accuracy is boosted, leading to a better classification of images. Our study
ensembles a VGG16 model and a ResNet50 model.
The proposed method shown in Fig. 2 uses the weighted average ensemble method.
The weighted average ensemble combines the models based on their effectiveness
and contribution in classifying the given dataset, since some models tend to classify
a specific set of features better than the others. Our study combines the models by
finding the ideal weights to achieve maximum possible accuracy.
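A sketch of the weighted average ensemble together with a grid search for the weight (the function names and the grid resolution are illustrative assumptions):

```python
import numpy as np

def weighted_ensemble(p1, p2, w):
    """Combine two models' class probabilities with weights (w, 1 - w)."""
    return w * p1 + (1.0 - w) * p2

def grid_search_weight(p1, p2, y_true, steps=101):
    """Find the weight that maximizes ensemble accuracy on a held-out set."""
    best_w, best_acc = 0.0, -1.0
    for w in np.linspace(0.0, 1.0, steps):
        preds = weighted_ensemble(p1, p2, w).argmax(axis=1)
        acc = (preds == y_true).mean()
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Here `p1` and `p2` are the softmax outputs of the two models on the validation images, so each model's contribution is weighted by how well it actually classifies.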
4 Experimental Results
The proposed models are trained on the MR image dataset. The raw data is fed into
both models’ first convolution layers, and image slices’ convolution is performed
with filters. Prominent features that help recognize images are extracted at each
convolution layer. The fully connected (FC) layer is then fed by the features learnt
by all the previous layers. The VGG16 and ResNet50 models are trained for 25
epochs using the training set and validation set we prepared earlier. The models’
parameters are tuned based on the validation set for every epoch. The training loss
and accuracy and the validation loss and accuracy are calculated for every epoch.
The learning rate is initialized to 1e-4 and decreases if the
validation loss is not improved over several epochs. The models are saved when
there is an improvement in validation loss. The best models that were saved during
the training process after a cycle of 25 epochs are used, and the metrics, loss value,
and accuracy are found. The parameters used are shown in Table 3.
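The training schedule described above, i.e. reduce the learning rate when validation loss plateaus and checkpoint only on improvement, can be sketched as follows (a simplified stand-in for callbacks such as Keras' ReduceLROnPlateau and ModelCheckpoint; the patience and factor values are assumptions):

```python
def reduce_lr_on_plateau(val_losses, lr, patience=3, factor=0.5):
    """Halve the learning rate when validation loss has not improved over
    the last `patience` epochs (a simplified ReduceLROnPlateau)."""
    if len(val_losses) > patience:
        recent_best = min(val_losses[-patience:])
        earlier_best = min(val_losses[:-patience])
        if recent_best >= earlier_best:   # no recent improvement
            return lr * factor
    return lr

def run_training(epoch_val_losses, initial_lr=1e-4):
    """Drive the schedule over a recorded sequence of validation losses;
    `saved` stands in for checkpointing the model on each improvement."""
    lr, best_val, history, saved = initial_lr, float("inf"), [], []
    for val_loss in epoch_val_losses:
        history.append(val_loss)
        if val_loss < best_val:           # save only when validation improves
            best_val = val_loss
            saved.append(val_loss)
        lr = reduce_lr_on_plateau(history, lr)
    return lr, best_val, saved
```

In a real run, each entry of `epoch_val_losses` would come from evaluating the model on the validation set after the epoch.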
During the training process of VGG16, the validation loss is initially in the range
of 0.6–0.7. It gradually decreases with each epoch as the model learns and tunes
its hyperparameters based on the validation set. The learning rate is reduced if the
validation loss does not improve for a set number of iterations. The validation loss
reaches 0.38 by the end of 25 epochs. Graphs for training and validation metrics are
plotted in Fig. 3.
Similarly, for ResNet50, the validation loss is initially observed to be in the range
of 1.2–1.6 and decreases gradually with each iteration as the model learns and tunes
its hyperparameters based on the validation set. The learning rate decreases if the
validation loss does not improve for a set number of iterations. The validation loss
improves to 0.27 by the end of 25 epochs. Graphs for the training process are plotted
in Fig. 4.
These models are then used to predict the classes of the images in the test
dataset. The final fully connected layer with the softmax
function gives two output probabilities between 0 and 1. Using the predicted labels
and true labels, the performance of the classifiers of VGG16, ResNet, and ensemble
models are obtained by measuring their accuracy, recall, precision, and F1-score.
The classification results for VGG16, ResNet50, and ensemble model are tabulated
in Table 4.
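The reported metrics can be computed from the predicted and true labels as follows (a plain NumPy sketch; `binary_metrics` is an illustrative helper name):

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Treating PD as the positive class, precision measures how many predicted PD scans are truly PD, while recall measures how many true PD scans are caught.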
ResNet50 performed better in classifying PD, while VGG16 classified HC better.
The weighted average ensemble is applied using these models, combining the predic-
tions from each model based on the ideal weights. Ideal weights for each model are
found using the grid search algorithm. The contribution of each model is weighted
proportionally to its capability and effectiveness, which in turn results in the best
achievable combination and maximum performance, which can be seen in the table.
Table 5 shows the comparison of VGG16, ResNet, and ensemble models’ accuracy
on test data. We can see that the ensemble model achieved higher accuracy than the
ResNet50 and VGG16 models for the detection and classification of Parkinson’s
disease.
Table 5 Comparison of accuracies of all models

Model                                   Accuracy (%)
VGG16                                   90.19
ResNet50                                92.18
Ensemble model (VGG16 and ResNet50)     96.09
5 Conclusion
Parkinson's disease has no cure to date, and effective early diagnosis, before the disease can severely affect patients, is essential so that proper care can be taken in later stages. MRI has been increasingly used in recent years for neuroimaging anal-
ysis of degenerative diseases. In this work, we performed a study on the classifi-
cation of MRI scanned images of HC and PD patients by applying state-of-the-art
deep learning architectures and techniques. We have used pre-trained VGG16 and
ResNet50 models for detection and classification purposes. The final FC layer is
fine-tuned for classifying HC and PD classes. Later, we built an ensemble of the
best performing versions of the two models using the weighted average ensemble
technique where each model’s prediction is multiplied by their ideal weights, and
then, their average is calculated. The contribution of each model to the final predic-
tion is weighted by its individual performance. An accuracy of 90.19% is achieved
with VGG16 and 92.18% with ResNet50 individually. The proposed ensemble model
achieved an accuracy of 96.09%, showing better discriminative ability
than the individual deep learning models. In future, modern state-of-the-art CNN models
with deeper architectures such as EfficientNet and DenseNet, with millions of parameters,
and their ensembles can be used to classify MR images with very high accuracy,
making the diagnosis of PD less arduous for clinicians.
References
1. Jankovic J (2008) Parkinson's disease: clinical features and diagnosis. J Neurol Neurosurg
Psychiatry 79(4):368–376
2. Chung HY, Chung YL, Tsai WF (2019) An efficient hand gesture recognition system based on
deep CNN. In: 2019 IEEE international conference on industrial Technology (ICIT). IEEE
3. Provost JS, Hanganu A, Monchi O (2015) Neuroimaging studies of the striatum in cognition
part I: healthy individuals. Front Syst Neurosci 9:140
4. Kalyani G, Janakiramaiah B, Karuna A, Prasad LV (2021) Diabetic retinopathy detection and
classification using capsule networks. Complex Intell Syst, pp 1–14
5. International Conference on Power Energy, Environment and Intelligent Control (PEEIC)
(2019) Greater Noida, India, pp 458–465. https://doi.org/10.1109/PEEIC47157.2019.8976727
6. Sarraf S, Tofighi G (2016) Deep learning-based pipeline to recognize Alzheimer’s disease using
fMRI data. In: 2016 future technologies conference (FTC), San Francisco, CA, pp 816–820.
https://doi.org/10.1109/FTC.2016.7821697
7. Amoroso N, La Rocca M, Monaco A, Bellotti R, Tangaro S (2018) Complex networks reveal
early MRI markers of Parkinson’s disease. Med Image Anal 48:12–24
8. Shah PM, Zeb A, Shafi U, Zaidi SFA, Shah MA (2018) Detection of Parkinson disease in brain
MRI using convolutional neural network. In: 2018 24th international conference on automation
and computing (ICAC). IEEE, pp 1–6
9. Kaur S, Aggarwal H, Rani R (2021) Diagnosis of Parkinson’s disease using deep CNN with
transfer learning and data augmentation. Multimedia Tools Appl 80(7):10113–10139
10. Bhan A, Kapoor S, Gulati M, Goyal A (2021) Early diagnosis of Parkinson’s disease in brain
MRI using deep learning algorithm. In: 2021 third international conference on intelligent
communication technologies and virtual mobile networks (ICICV). IEEE, pp 1467–1470
11. Vyas T, Yadav R, Solanki C, Darji R, Desai S, Tanwar S (2021) Deep learning-based scheme
to diagnose Parkinson’s disease. Expert Syst, e12739
12. Fellner F, Schmitt R, Trenkler J, Fellner C, Helmberger T, Obletter N, Böhm-Jurkovic H (1994)
True proton density and T2-weighted turbo spin-echo sequences for routine MRI of the brain.
Neuroradiology 36(8):591–597. https://doi.org/10.1007/BF00600415
13. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv 1409.1556
14. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition, pp 770–778.
https://doi.org/10.1109/CVPR.2016.90
15. Subramanian M, LV NP, B J, A MB, VE S (2021) Hyperparameter optimization for transfer
learning of VGG16 for disease identification in corn leaves using Bayesian optimization. Big
Data, pp 1–15. https://doi.org/10.1089/big.2021.0218
Classification of Diabetic Retinopathy
Using Deep Neural Networks
Abstract Diabetic retinopathy (DR) is a disorder that generally occurs among
diabetic patients and can gradually affect the eye. This disorder
has to be identified at an early stage, or else it can damage the eyesight
permanently. Since the fundus oculi are so easily visible, retinopathy is the most
commonly recorded chronic complication of diabetes and, as a result, the one we
know the most about in terms of epidemiology and natural history. Clinicians can use
empirical but effective methods to postpone the initiation and development of diabetic
retinopathy by achieving near-normal blood glucose and blood pressure levels. In
order to identify this abnormality, ophthalmologists use the "fundus images", that is,
retinal images of the eye. However, detecting this abnormality with the naked eye is
difficult for ophthalmologists, as it takes a lot of time, is costly, and is prone to
misjudgement. Hence, "deep learning" can be
used to detect diabetic retinopathy at an early stage. There are many techniques
of deep learning using "convolutional neural networks (CNNs)" to grade
the level of eye damage. In this work, "Residual Networks" are also experimented with
to improve accuracy.
J. Hyma (B)
Department of CSE, GITAM University (Deemed to Be), Visakhapatnam, India
e-mail: [email protected]
M. Ramakrishna Murty · S. Ranjan Mishra
Department of CSE, Anil Neerukonda Institute of Technology & Sciences (ANITS),
Visakhapatnam, India
e-mail: [email protected]
Y. Anuradha
Department of CSE, G.V.P College of Engineering (A), Visakhapatnam, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 475
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_47
476 J. Hyma et al.
1 Introduction
Diabetes mellitus (DM) is a leading cause of vision loss in people of working age.
Medical signs of vascular defects in the eye are used to diagnose DR. Increased
vascular permeability and capillary occlusion are two key findings in the retinal
vasculature in non-proliferative diabetic retinopathy (NPDR), which represents the
early stage of DR. Fundus photography can detect retinal pathologies such as microa-
neurysms, haemorrhages, and hard exudates at this point, even if the patients are
asymptomatic.
Proliferative diabetic retinopathy (PDR), a more advanced stage of DR, is distinguished
by neovascularization. When the new irregular vessels bleed into the vitreous (vitreous haemorrhage)
or when tractional retinal detachment is present, the patients may experience extreme
vision impairment. Diabetic macular oedema (DME) is characterized by swelling or
thickening of the macula caused by sub- and intra-retinal fluid accumulation in the
macula as a result of breakdown of the blood-retinal barrier (BRB). DME can occur
at any level of DR, causing visual image distortion and a loss of visual acuity.
Laser photocoagulation has proved to be extraordinarily successful even when
other treatments have failed and retinopathy has progressed to the point of sight loss.
Despite this, retinopathy continues to be a leading cause of blindness, and there is no
evidence that diabetes-related vision loss is declining in developed countries. This
may be due to the mixed blessing of longer survival of diabetic patients who were
diagnosed when metabolic regulation was less stringent than it is now. Screening
for sight-threatening retinopathy is the most cost-effective medical technique docu-
mented, and it can help improve the usage of diagnostic and therapeutic services, but
most healthcare systems are still stuck in a state of stagnation and lack of interest. In
order to identify this abnormality, ophthalmologists use the "fundus images", that is,
retinal images of the eye. However, detecting this abnormality with the naked eye is
difficult for ophthalmologists, as it takes a lot of time, is costly, and is prone to
misjudgement. Several traditional image processing techniques have
been experimented on DR detection. More advanced deep learning techniques are
also used to detect the diabetic retinopathy at an early stage. There are many tech-
niques of deep learning using “convolutional neural network (CNN)” to achieve the
results of the level of eye damage. This work is extended with an experiment using a
"Residual Network" to improve accuracy.
Diabetic retinopathy is divided into five categories: “No Diabetic Retinopathy
(NDR)”, “Mild Non-proliferative Retinopathy”, “Moderate Non-proliferative
Retinopathy", "Severe Non-proliferative Retinopathy", and "Proliferative
Retinopathy”.
1. No Diabetic Retinopathy (NDR): At this stage, there is no irregular blood vessel
development (proliferation).
2. Mild Non-proliferative Retinopathy: This is the first stage of the disorder, which
is characterized by microaneurysms, which are small balloon-like swellings that
develop within the retina’s tiny blood vessels.
Classification of Diabetic Retinopathy Using Deep Neural Networks 477
2 Literature Study
The work proposed in [1] aimed to find the abnormalities with a proper detection of
abnormal features of the retinal fundus images. It focused on pre-processing steps
like image enhancement, noise removal, etc., which are crucial in detecting important
features. The results depicted the successful extraction of features and their classifica-
tion into various DR stages. Another work proposed in [2] developed a saliency-based
technique for leakage detection in the angiography. The work proposed in [3] has used
Principal Component Analysis (PCA) for better feature selection and also used back-
propagation neural networks for classifying retinal images into non-diabetic and diabetic
classes. In [4], a hybrid classifier has been proposed with a combination of m-medoids
with Gaussian mixture model to detect retinal lesions with an improved accuracy.
Digital colour images of retinas have been considered for automatic detection of
retinopathy [5]. Another work proposed in [6] came up with an advanced method
for automatic extraction of anatomical features with greater precision and accuracy to
detect and diagnose glaucoma. Convolutional neural networks and advanced deep
learning techniques in defining and analysing the deviations in the DR fundus images
from the non-DR fundus images (the input data) were proposed in the paper [7]. The
work given in [8] used a convolutional neural network. Current DR screening
systems usually use retinal fundus imaging, which is manually assessed by profes-
sional readers. The aim of this research was to create a reliable diagnostic technology
that could be used to automate DR screening. Using their local data collection, they
achieved an AUC of 0.97 with 94% sensitivity and 98% specificity
after fivefold cross-validation. They also developed a systematic analysis of the causes of
vision loss.
3 Methodology
3.1 Datasets
There are several freely accessible datasets for detecting DR and vessels in the retina.
These datasets are often used to train, verify, and evaluate systems, as well as to
compare the performance of one system to that of others. Retinal imaging includes
fundus colour images and optical coherence tomography (OCT). OCT images are
two- and three-dimensional images of the retina taken with low-coherence light that
reveal a lot about the shape and thickness of the retina, while fundus images are
two-dimensional images taken with reflected light. OCT retinal images have been
available for a few years now. A wide range of publicly accessible fundus image
datasets is in wide use. A sample fundus image is shown in Fig. 2.
Kaggle: It comprises 88,702 high-resolution images obtained from various
cameras, with resolutions ranging from 433 × 289 pixels to 5184 × 3456 pixels. Each
picture is assigned to one of the five DR levels. Only the ground truths for training
photographs are open to the public. Many of the photographs on Kaggle are of low
quality and have inaccurate labelling. We have also created a CSV file with the image
names corresponding to the degree of eye injury, which ranges from 0 to 4.
3.2 Pre-processing
Images from patients of various ethnicities, genders, and lighting conditions in fundus
photography were included in the dataset. This has an effect on the pixel intensity
values in the images, resulting in unneeded variance that is unrelated to classification
levels. To combat this, the Python Image Library package was used to apply colour
normalization to the files. The images were also of high resolution, requiring a large
amount of memory. The images were resized to 128 × 128 pixels.
3.3 Training
The CNN was pre-trained on 1000 photographs at first, before it achieved a significant
degree of accuracy. This was essential in order to get a fast classification result without
wasting a lot of training time. The model was trained on 1000 training images for
another epoch after two epochs of training on the initial images. Over-fitting is a
problem for neural networks, particularly in a dataset like ours that is dominated by images with no signs
of retinopathy. The class weights were changed with a ratio proportional to how
many images in the training batch were graded as having no signs of DR for each
batch loaded for back-propagation. The probability of over-fitting to a specific class
was significantly decreased as a result of this. To stabilize the weights, more epochs
were used [9]. This increased the model's accuracy to over 75%.
The network was then trained with a low learning rate on the entire training set of
photographs. Various layers and their importance in reaching the objective are given
below.
• Pooling Layer
Pooling is a nonlinear down-sampling technique. Pooling can be implemented
using a variety of nonlinear functions, the most common of which is max pooling.
It divides the input image into sub-regions and outputs the maximum of each. A feature's exact
location matters less than its rough location relative to other features. Pooling in convolutional
neural networks is based on this concept, and down-sampling is the term for it. In
a CNN architecture, a pooling layer is often inserted between successive convo-
lutional layers (each of which is usually accompanied by an activation feature,
such as a ReLU layer). Pooling layers contribute to local translation invariance in
a CNN, but they do not have global translation invariance unless global pooling
is used.
• Rectified Linear Unit (ReLU)
Rectified Linear Units (RLUs) are frequently used in deep learning models. To be
clearer, if the function receives a negative value, it returns 0; if it receives a positive
value, it returns the same positive value [10]. The function can be described as
f(x) = max(0, x).
The RLU, also known as the ReLU, helps the deep learning model to account for
nonlinearities and complex interaction effects. The ReLU function has the advan-
tage of being a relatively inexpensive function to compute due to its simplicity.
The model can be trained and run in a short amount of time because there is
no complex math involved. Similarly, it converges faster, implying that the slope
does not plateau as X increases. Unlike other functions such as sigmoid or tanh,
ReLU avoids the vanishing gradient problem [11]. Finally, ReLU is only partially
activated since the output is zero for all negative inputs.
• SoftMax
The SoftMax function is a generalization of the logistic function that
squashes a vector of values into the range (0, 1). The reason for using SoftMax is to ensure
that the resulting outputs all add to 1, thus satisfying the probability constraints. If
one of the inputs is large, it becomes a large probability; however, it will always
be between 0 and 1 [12, 13].
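The three building blocks described above (max pooling, ReLU, and SoftMax) can be sketched directly in NumPy; this is a minimal illustration, not the implementation used in the work, and the max-subtraction in `softmax` is a standard numerical-stability detail not mentioned in the text:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max pooling: split the input into non-overlapping size x size windows
    and keep each window's maximum (stride equals the window size)."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    x = x[:h2 * size, :w2 * size]                    # drop ragged edges
    return x.reshape(h2, size, w2, size).max(axis=(1, 3))

def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(x, 0.0)

def softmax(logits):
    """Squash logits into probabilities in (0, 1) that sum to 1."""
    exps = np.exp(logits - np.max(logits))   # shift for numerical stability
    return exps / exps.sum()
```

For instance, pooling a 4 × 4 map with `size=2` yields a 2 × 2 map of block maxima, and `softmax` always returns values that sum to exactly 1.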
4 Results
To validate the proposed algorithm, the Kaggle dataset, with clinically specified referable
diabetic retinopathy as a benchmark, was used. The proposed model was trained for
a sufficient number of epochs on the Kaggle dataset. For validation purposes, 20,000
images from the dataset were set aside. Accuracy is the proportion of patients with a
proper classification. The network's classifications were numerically
described as follows: 0—No DR, 1—Mild DR, 2—Moderate DR, 3—Severe DR,
and 4—Proliferative DR. The entropy image computed from the grey level
outperforms the raw photograph. This approach, which uses the greyscale channel,
is more accurate and sensitive than the entropy of luminance of the fundus photograph.
Using the entropy image of the greyscale portion improves accuracy and prevents
under-diagnosis. When using the CNN with the grey-level entropy image as input, the
accuracy is around 73.5%, better than when using the CNN with the raw input. The final
validation dataset achieved 75.5% accuracy using the Residual Network,
as presented in Figs. 3 and 4.
Fig. 3 Accuracy versus batch size and versus number of epochs
Fig. 4 Accuracy (%) of the CNN and ResNet models
5 Conclusion
In diabetic patients, deep learning can improve the accuracy of diagnosing retinal
pathologies. The proposed method starts with the grey portion of
the RGB picture. The RGB component's entropy image can help with accuracy and
sensitivity. In this case, the CNN, fed the pre-processed greyscale input image,
has a lower accuracy than the Residual Network. The proposed deep learning
technology will benefit the automated retinal image analysis system and will assist
ophthalmologists in diagnosing referable diabetic retinopathy. As a future work,
more deep models can be experimented with to obtain better accuracy.
References
1. Raman V, Then P, Sumari P (2016) Proposed retinal abnormality detection and classification
approach: computer-aided detection for diabetic retinopathy by machine learning approaches.
In: 2016 8th IEEE international conference on communication software networks (ICCSN).
IEEE
2. Zhao Y (2017) Intensity and compactness enabled saliency estimation for leakage detection in
diabetic and malarial retinopathy. IEEE Transac Med Imaging 36(1):51–63
3. Prasad DK, Vibha L, Venugopal KR (2015) Early detection of diabetic retinopathy from digital
retinal fundus images. In: 2015 IEEE recent advances in intelligent computational systems
(RAICS). IEEE
4. Akram MU (2014) Detection and classification of retinal lesions for grading of diabetic
retinopathy. Comput Biol Med 45:161–171
5. Winder RJ (2009) Algorithms for digital image processing in diabetic retinopathy. Comput
Med Imaging Graphics 33(8):608–622
6. Haleem MS (2013) Automatic extraction of retinal features from colour retinal images for
glaucoma diagnosis: A review. Comput Med Imaging Graph 37(7):581–596
7. Harshitha C, Asha A, Pushkala JLS, Anogini RNS, Karthikeyan C (2021) Predicting the stages
of diabetic retinopathy using deep learning. In: 2021 6th international conference on inventive
computation technologies (ICICT), 2021, pp 1–6. https://doi.org/10.1109/ICICT50816.2021.
9358801
8. Goh JK, Cheung CY, Sim SS, Tan PC, Tan GS, Wong TY (2016) Retinal imaging techniques
for diabetic retinopathy screening. J Diabetes Sci Technol 10(2):282–94. https://doi.org/10.
1177/1932296816629491. PMID: 26830491, PMCID: PMC4773981
9. Praneel ASV, Rao TS, Ramakrishna Murty M (2019) A survey on accelerating the classifier
training using various boosting schemes within cascades of boosted ensembles. Springer SIST
series, 169:809–825
10. Agarap AF (2018) Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.
08375
11. Hanin B (2019) Universal function approximation by deep neural nets with bounded width and
relu activations. Mathematics 7(10):992
12. Wang M, Lu S, Zhu D, Lin J, Wang Z (2018) A high-speed and low-complexity architecture for
softmax function in deep learning. In: 2018 IEEE Asia Pacific conference on circuits and
systems (APCCAS). IEEE, pp 223–226
13. de La Torre J, Valls A, Puig D (2020) A deep learning interpretable classifier for diabetic
retinopathy disease grading. Neurocomputing 5(396):465–476
A Deep Learning Model for Stationary
Audio Noise Reduction
Abstract The primary aim of the paper is to reduce the noise in the audio using
deep learning techniques. Speech denoising is a long-standing problem. Given a noisy
input signal, we aim to filter out the undesired noise without degrading the signal
of interest. Noise is a widespread problem faced during calls, video recordings, and
many other situations. The proposed work focuses on short-time Fourier analysis,
which can be done on the sound wave. A convolutional neural network (CNN)
architecture removes the noise from the analyzed data. Then, inverse short-time Fourier
analysis is used to reconstruct the sound wave. The work targets stationary noise
like wind or thermal noise and non-stationary noises like strings, music, chattering,
etc. The result is compared with existing work to show the efficacy of the proposed
framework.
1 Introduction
Audio noise reduction plays a vital role in scenarios such as traveling by bike or
bus, when an individual receives an urgent call and answers it, but the person on the
other end hears the noise from the wind. This problem is addressed by many different signal
processing techniques. For instance, in the case of mobile phones, this problem has
already been addressed using multiple mics. However, keeping multiple mics on a phone
to remove noise from recorded audio consumes extra space,
making the phone thicker.
The above problems can be addressed if we shift from a hardware-based solution
to a software-based solution. The current work presents one such software-based
solution using deep learning techniques. The deep learning technique used in this
work is a kind of convolutional neural network (CNN). The reason behind using a CNN is that it works well with 2D matrices and has a low number of parameters, requiring less computation.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_48
S. S. Kulkarni et al.
Beyond phones, this solution can be utilized wherever arranging a multiple-mic setup is not possible due to power, cost, and space constraints (such as hearing aids). However, this work focuses on reducing noise from signals received during phone calls, concentrating on stationary noises such as thermal noise. Noise reduction is considered as the process of
1. Identifying the presence of noise in the audio signal
2. Removing noise from the signal
3. Reproducing the clean signal
The objective of this work is to reduce wind noise in audio in a way that is feasible both for real-time applications like calls and for non-real-time applications like audio from a video recording. Current-generation phones use multiple mics to reduce background noise; the latest iPhones use four mics, which takes more phone space and adds cost. The RNNoise software developed by Mozilla uses handpicked features, which limits the performance of its neural network, and many non-RNN solutions suffer from high latency due to their sampling rates.
Applications include noise-free calls, removing noise from video recordings, and removing noise caused by cochlear implants. Noise reduction in video can help viewers retain concentration, and cleaned audio can serve as input to other neural networks such as caption generators, speech summarizers, or sentiment analyzers. In the case of cochlear implants it is especially necessary, because most patients complain of thermal noise in them. Thus, the necessity of noise reduction mechanisms is undeniable.
2 Literature Review
The paper "Recurrent Neural Networks for Noise Reduction in Robust ASR" [3] by Maas et al. suggests an end-to-end deep recurrent neural network with three hidden layers. It shows how end-to-end neural networks, given enough data, can easily reduce a wide variety of noises.
The technical work on speech processing for cochlear implants with the discrete wavelet transform [4] suggests an alternative to the traditional filter-bank spectral analysis strategies: the speech signal can be analyzed using the discrete wavelet transform (DWT). Preliminary tests were conducted to compare the WT and filter-bank analysis methods. Additionally, the intelligibility of speech processed with the proposed strategy was tested on normal-hearing people using acoustic simulations, and a comparison was made with traditional CI algorithms.
The paper by Park and Lee explains why a CNN is suitable for speech noise reduction in real-time situations, and provides a way to build a CNN for noise reduction and evaluate it using SDR values [5].
3 Proposed Methodology
The central part of the architecture is the neural network. It receives the Fourier-transformed data of the noisy signal as input, processes it, and returns the Fourier-transformed data of the clean signal. First, the Fourier transform is applied to all the audio files, both clean and noisy. The Fourier transforms of the noisy audio files are fed to the input side of the neural network, and those of the clean files to the output side. Then the neural network is trained.
Once the model is trained, it is used for making predictions. The noisy audio is taken as input, and the Fourier transform is applied to it. This data is fed into the neural network to get the Fourier transform of the clean audio as the output. The inverse Fourier transform is then applied to the output to get the final noise-free audio.
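The prediction pipeline above can be sketched as follows. This is a minimal illustration built on `scipy.signal`'s STFT; a crude spectral-gate mask stands in for the trained CNN, and the window size and sample rate are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from scipy.signal import stft, istft

def denoise(audio, sr=16000, nperseg=256, model=None):
    """STFT -> model -> inverse-STFT prediction pipeline.

    `model` stands in for the trained CNN; by default a crude
    spectral gate is applied purely to illustrate the data flow.
    """
    _, _, Z = stft(audio, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    if model is None:
        clean_mag = mag * (mag > np.median(mag))  # placeholder for the CNN
    else:
        clean_mag = model(mag)
    # recombine the estimated magnitude with the noisy phase and invert
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return clean

# usage: a 1-s 440 Hz tone with additive white noise
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).standard_normal(sr)
out = denoise(noisy, sr)
```

Note that only the magnitude is passed through the "model" here; the noisy phase is reused for reconstruction, a common simplification in magnitude-domain denoisers.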
Figure 1 denotes the training model with Fourier analysis, with X-Train as the input parameter and Y-Train as the output parameter. The sample rates used for the Fourier transform must be optimal, so that they neither add much processing latency nor degrade the speech data in the original audio.
Dataset Collection: The dataset used in this work is obtained from FloydHub [6]. It contains recordings of speech audio files from different persons. The noises are obtained from the urban noise dataset. The clean audios are combined with the noises to form the overall dataset, which is saved as .npy (NumPy) files so that the audio-to-NumPy conversion does not have to be repeated every time the model is trained.
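The mixing and caching step can be sketched as below; the target SNR, clip lengths, and file name are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix a clean clip with a noise clip at a target SNR in dB."""
    noise = np.resize(noise, clean.shape)      # loop/trim noise to clip length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)   # stand-in speech clip
noise = rng.standard_normal(8000)                          # stand-in urban noise
noisy = mix_at_snr(clean, noise, snr_db=10)
np.save("noisy_batch.npy", noisy)   # cache so audio-to-array runs only once
```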
Model: A CNN-based model is used for this work. The CNN takes as input a 128 × 128 × 1 matrix obtained from the Fourier transform of the audio signal. Each successive layer halves the matrix size and doubles the number of kernels. The values are converted into features until the final matrix dimensions become 8 × 8 and the kernels become 256. From the next layer the exact opposite happens, and finally a 128 × 128 × 1 matrix is obtained, which can later be used to construct the wave. To preserve the original values, these layers are concatenated with the previous layers of the same size.
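A minimal Keras sketch of such an encoder-decoder is shown below. The U-Net-style skip connections and the base filter count of 16 (so that 8 × 8 reaches 256 kernels) are assumptions inferred from the description, not the paper's exact layer list.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.losses import Huber

def build_denoiser(size=128, base=16):
    inp = layers.Input((size, size, 1))
    skips, x, filters = [], inp, base
    # encoder: halve the spatial size and double the kernels, 128 -> 8
    while x.shape[1] > 8:
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
        filters *= 2
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)  # 8 x 8 x 256
    # decoder: mirror the encoder, concatenating the saved same-size layers
    for skip in reversed(skips):
        filters //= 2
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 3, padding="same")(x)
    return Model(inp, out)

model = build_denoiser()
model.compile(optimizer="adam", loss=Huber())  # Huber loss, as in the paper
```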
Figure 2 demonstrates the model visualization of various layers in the CNN model,
which uses the Fourier transform of the audio signal as the input.
Post-processing is exactly the reverse of the preprocessing and contains three steps.
1. Rescale the output values with the same parameters used for scaling.
2. Change the shape of the array as required.
3. Apply inverse short-time Fourier analysis to convert the wave from the frequency domain back to the time domain.
The last stage is amplitude scaling: after applying the Fourier and inverse Fourier analysis, the wave's amplitude is reduced, so the wave has to be amplified again. This is done by taking the mean amplitudes of the input wave and the denoised wave, then multiplying the amplitudes of the denoised wave by the ratio between the mean amplitude of the input wave and that of the denoised wave. As a result, the speech sounds slightly louder than in the original audio: the original audio contains noise while the denoised audio does not, so equalizing their mean amplitudes slightly increases the loudness of the speech signal.
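The rescaling just described reduces to a one-line ratio; a small sketch (with made-up sample values) is:

```python
import numpy as np

def rescale_amplitude(noisy_in, denoised):
    """Scale the denoised wave so its mean |amplitude| matches the input's."""
    ratio = np.mean(np.abs(noisy_in)) / (np.mean(np.abs(denoised)) + 1e-12)
    return denoised * ratio

x = np.array([1.0, -1.0, 2.0, -2.0])   # input wave
y = 0.5 * x                            # attenuated denoised wave
z = rescale_amplitude(x, y)            # restored to the input's mean amplitude
```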
Huber loss is used as the cost function, as the noise reduction use-case can have many outliers that have to be handled. The model converged after nearly 200 epochs, as demonstrated in Fig. 3, which shows the training and validation loss.
Figure 4 shows the waveform of the clean signal without noise attenuation, Fig. 5 the waveform of the noisy signal, and Fig. 6 the waveform of the denoised signal. Figure 7 shows the spectrogram of the clean audio signal, Fig. 8 the spectrogram of the noisy signal with various levels of disturbance, and Fig. 9 the spectrogram of the denoised and amplified signal. These figures show the efficacy of the proposed architecture.
Signal-to-distortion ratio: The SDR in Eq. (1) is used to measure the error between the clean signal $y$ and the denoised signal $f(x)$:

$$\mathrm{SDR} = 10 \log \frac{\|y\|^{2}}{\|f(x) - y\|^{2}} \tag{1}$$
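Equation (1) translates directly into a few lines of NumPy (the small epsilon guarding against a zero denominator is an implementation convenience, not part of the paper's formula):

```python
import numpy as np

def sdr(clean, estimate):
    """Signal-to-distortion ratio of Eq. (1), in dB; higher is better."""
    num = np.sum(clean ** 2)
    den = np.sum((estimate - clean) ** 2) + 1e-12
    return 10 * np.log10(num / den)
```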
The paper by Park and Lee [5], which is used for comparison, reports an SDR of 8.62 using 15 convolutional layers. As tabulated in Table 1, the SDR obtained by the proposed method is 8.64 using 11 convolutional layers. The reduction in the number of layers removes a significant number of parameters from the neural network and hence reduces the latency and computational power requirements.
Fig. 6 Waveform of denoised signal
Fig. 9 Spectrogram of denoised and amplified signal

Table 1 Comparative analysis

              Park and Lee [5]   Proposed work
SDR           8.62               8.64
No. of layers 15                 11

The proposed work provides a completely software-based solution for noise reduction, which reduces hardware cost and space requirements in phone manufacturing. Even though this work handles many noises, more kinds of noise can be added to the training set so that better generalization is achieved. Short-duration, non-stationary noises could be handled by reducing the sampling rates, although this causes more latency than present-day mobiles can handle. This whole work is based on Python; Cython can be used to optimize the builds, and the resultant files can be used for the end deployment.
References
Abstract A 360° image enables the user to interact with the view and explore the whole environment around the camera. As there can be an infinite number of viewports in a 360° image, the viewer's task becomes cumbersome and confusing. This work aims to study 360° images, to classify them into 10 different categories based on place, and then to predict the viewpoint using deep CNN architectures. The study explores the advantages of transfer learning and uses it to create a classifier that sorts 360° images into different categories. Further, two approaches are proposed to predict the viewpoint in a 360° image in order to recommend the best viewport to the viewer.

Keywords 360° images · Viewport prediction · Best view synthesis · CNN · Place classification · Spherical video
1 Introduction
360° (spherical) images and videos have taken over the world because of their effectiveness in providing users with a rich, immersive experience of a view or scenario. With the emergence of virtual reality as a mainstream trend, 360° images and videos have become more and more popular, and users can explore the whole view using a mobile phone or computer screen.

Due to the COVID-19 lockdowns, there has been a huge growth in people's interest in virtual tours of places. Cheap consumer 360° cameras help to supercharge the amount of 360° content on the internet. 360° photos and videos provide a controllable spherical view surrounding the center point from which the shot was taken, giving the user the flexibility to look around the place. However, as the viewer can see only a part of the whole sphere at any particular time, it is a tedious task to identify where to look.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_49
S. Raj and A. Mahapatra
Identifying the best viewport is one of the main problems for which a solution is sought, and there have been only a few attempts at research in this area. Classifying 360° images into different categories can make use of the many established models trained over millions of normal, limited field-of-view images. Though the curved view in 360° images has different features compared to limited field-of-view images, this study shows that building on an already established model provides better accuracy. The limited availability of data is one of the biggest challenges in establishing an accurate model.
Deep learning models help in studying the complex features of images that distinguish them into different categories, and several state-of-the-art deep learning architectures have successfully classified limited-view images. The dataset is a critical resource that must be fed to the model for good results: both the quality of coverage and the quantity of images greatly influence the end results. Various standard benchmark datasets are available for limited-view images, but the number of datasets available for 360° image processing is very limited.
A captured picture can hold one or more important viewports. The number of subjects in a limited field-of-view image is smaller than in a 360° image, and often not all the subjects in a 360° image are of importance to the user. The most important or interesting subject in a 360° image is called the viewpoint. It shows the user the main subject of interest in the picture, so they need not turn the 360° image around to find it; based on the viewpoint, they can decide whether they want to look beyond it.
The objective of this work is to explore 360° images, various suitable architectures
and methods that can be used to classify the images and predict the viewpoint in the
given 360° image.
The rest of the paper is organized as follows. Section 2 discusses the literature survey required for the work. Section 3 explains the existing tools and technologies used for the implementation. Section 4 demonstrates the implementation details and the results. Finally, the conclusion and results are discussed in Sect. 5.
2 Literature Review
The study related to this work covers (1) 360° images, (2) choosing a dataset, (3) preprocessing techniques used on 360° images, and (4) various architectures and methodologies. Xiao et al. devised a model to classify a given limited-view photo into the place category of the panorama to which it belongs [1]. The dataset consists of various panoramas categorized by place using Amazon Mechanical Turk workers. The model is first trained with limited-view photos taken from the panorama images. It takes advantage of the symmetry of images to find the best viewpoint of the observer, that is, the direction towards which the observer is facing. They used a two-stage greedy algorithm. In the first stage the category of the limited-view photo is predicted; there are 26 panorama categories, and the given photo is classified into one of them. In the second stage the panorama is aligned and the observer's direction is identified. The model additionally finds the average representation of each panorama category, so that, given a limited-view photo, the view beyond the given image can be filled in using the average representation of the panorama. They used a support vector machine (SVM) for the classification of the photo and the identification of the viewpoint. In the first iteration of training they used sample limited-view photos from one panorama to train the model, then used the model to predict on new panoramas; the prediction with the highest confidence score was then used to train the model again. Thus, they implemented a greedy approach to training. They achieved an accuracy of 51.9% for SUN360 place classification, 50.2% for viewpoint prediction, and 24.2% for place classification and viewpoint prediction combined.
Raja et al. built a KNN classifier that achieves reasonable performance in classifying images into indoor and outdoor scenes using limited resources [2]. Their work mainly consists of training the KNN model on a small dataset and using the features learned during training to answer new queries. Instead of the RGB color model, the authors chose the HSV color model, which is based on how humans perceive colors and is hence considered a more natural way to capture details. Low-level image features like color, texture, and entropy are considered important factors for better accuracy. The proposed methodology was evaluated on two datasets: IITM-SCID2, which consists of 907 images, and a second set of 3442 images downloaded from the internet.
Li et al. devised a viewport-based convolutional neural network (V-CNN) for viewport prediction [3], based on studying the real head and eye movements of observers viewing 360° videos. The work consists of three stages. The first stage is a viewpoint prediction network, in which the various viewpoints in the 360° images are identified from the data. The second stage consists of viewpoint alignment methods and a viewpoint quality network: the alignment method maps the angular viewpoint obtained in the first stage onto a flat plane, which enables smooth processing of the viewpoints in the later stages, and the quality of the aligned viewpoint is assessed by the viewpoint quality network. In the final stage, video quality assessment (VQA) is performed by integrating the scores of the individual viewpoints obtained in the second stage. They used a CNN as the base model and performed the experiment on the 360° video dataset VQA-ODV, which consists of 540 360° videos, 432 used for training and 108 for testing. In addition to the videos, the dataset contains the head and eye movements of 200+ subjects while watching the videos. Accuracy is measured using normalized scanpath saliency and the correlation coefficient.
Afzal et al. studied YouTube 360° videos in comparison to normal videos and found that 360° videos demand about three times the resolution of normal videos [4], causing higher bandwidth usage in the network. They developed a technique to determine the resolution requirement of 360° videos based on the field of view of the VR headset, which helps reduce the bandwidth requirement of 360° videos in a network.
Caglayan et al. describe a large standard database for image classification problems [5]. The database consists of 10 million images categorized into 365 different categories. Its exhaustiveness was measured with respect to various classic architectures like GoogleNet, VGG, and ResNet. The study showed that these architectures, when trained on the Places365 dataset, performed better than on other popular databases like ImageNet88 and SUN88; models trained solely on Places365 showed only a 6% loss compared to models trained with an additional 1.2 million images, demonstrating the exhaustiveness and robustness of the dataset. Standard datasets like SUN contain a large coverage of place and scene categories, but while exhaustive, SUN lacked quantity: it contained 397 categories and a total of around 100,000 images, whereas deep learning algorithms need large amounts of data for good results. The Places dataset overcomes this drawback, containing 10,624,928 images from 434 different categories. There are four benchmarks for this dataset: Places365-Standard, Places365-Challenge, Places205, and Places88, each containing different quantities of images for training, validation, and testing.
Qin Yang et al. proposed a model to predict the future viewport users will be interested in, based on eye movement while watching 360° videos [6]. They further extended the study to find the viewports of interest over a future duration, called the viewport trajectory, and combined an RNN with a correlation filter-based viewport tracker (CFVT) to explore the correlation between viewport and video content, improving accuracy by up to 40%. They used a CNN for viewport prediction, an RNN for viewport trajectory prediction, and CFVT for content-aware viewport prediction. The dataset used for the experiment consists of head motion data of 153 volunteers for each of 16 different 360° videos, with 985 views in total. The viewport trajectory predicts the various viewpoints the observer may look at over a future duration; to achieve this, they used many CNN models, each predicting one of the possible viewpoints from the trajectory, so that combining the results of all the models yields many viewpoints predicted from the observer's head movements. The correlation filter-based viewpoint tracker performs content-aware viewpoint prediction and helps find a target as in a normal video. The spherical image is mapped to a plane surface before being processed by the model. The combination of the RNN with CFVT helped increase the accuracy of the model.
Simonyan and Zisserman designed the classic convolutional network architecture VGG16 [7], a breakthrough for its time. They studied the performance of deep CNNs by varying the depth of the convolutional layers while keeping all other parameters constant; the best of these models won the ImageNet challenge 2014. They designed about five models, varying the number of convolutional layers in each. All the models consist of 5 max-pool layers, 3 fully connected layers, and a final softmax layer. The number of convolutional layers varies from the 1st model to the 5th: they started with 8 convolutional layers and added more in each subsequent model, the last having 16. Not every convolutional layer is followed by a max-pool layer. The width of the convolutional layers is rather small, starting from 64 in the first layer and increasing by a factor of 2. The number of parameters required is considerably small compared to shallower networks with larger widths. The model was trained on the ImageNet dataset and has shown good performance.
3 Proposed Methodology
Two different approaches are proposed to predict the viewpoint in a 360° image. The
first approach is viewpoint prediction using feature map (Sect. 3.1) and the second
one is viewpoint prediction using object detection models (Sect. 3.2).
3.1 Viewpoint Prediction Using Feature Map

The feature map of an image is the feature vector extracted by the layers of a convolutional neural network: the convolution of a convnet filter with an input image results in a feature map. Each layer of a CNN is capable of detecting specific features in an image. Filters in lower layers detect subtle features like edges and lines, whereas filters in higher layers capture complex patterns specific to a class. A model trained to classify images will learn a common feature in each class and show it as the most activated region for that class. For example, what distinguishes a bedroom from a classroom is the presence of a bed, so there is a high probability that a model trained to separate the two will capture the bed as the most activated region for all images that contain one. The bed is also often one of the most sought-after sights in a bedroom, and hence a potential viewpoint. That is, what the model captures as the most activated region for a class of images has the potential to be a viewpoint. The intuition behind this approach is that the final feature map of an image will contain the subject specific to the given class, often the viewpoint, as its most activated region. Figure 1 shows the flowchart of the first approach.
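The activation-peak idea can be sketched in a few lines of NumPy: sum the final feature map over channels, take the location of the maximum, and scale it back to image coordinates. The feature-map size and the centre-of-cell mapping are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def viewpoint_from_feature_map(fmap, img_w, img_h):
    """Map the peak of a channel-summed final feature map back to
    image coordinates as the candidate viewpoint centre."""
    act = fmap.sum(axis=-1)                       # aggregate all channels
    y, x = np.unravel_index(np.argmax(act), act.shape)
    # scale feature-map cell centres up to image coordinates
    return (int((x + 0.5) * img_w / act.shape[1]),
            int((y + 0.5) * img_h / act.shape[0]))

# usage: an 8 x 8 x 4 feature map with one strongly activated cell
fmap = np.zeros((8, 8, 4))
fmap[2, 5] = 1.0
centre = viewpoint_from_feature_map(fmap, img_w=256, img_h=256)
```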
Various pre-trained models are trained to fit and classify the 360° image dataset SUN360 into different classes. The dataset contains 80 categories and a total of 67,583 panoramic images. Due to computational constraints, only a total of 700
3.2 Viewpoint Prediction Using Object Detection Models

In this approach, the potential of an object detection model to predict the viewpoint in a 360° image is explored; the flowchart is illustrated in Fig. 4. The difficult task in using an object detection model is preprocessing the data, which requires annotating the viewpoints in each 360° image; this annotation is manual work. The SUN360 dataset is used for this experiment [1]. The viewpoint is marked in an XML file corresponding to each image; if an image contains more than one viewpoint, each is annotated by the diagonal coordinates of its bounding box in the XML file. 100 images are used for training and 30 for testing in each class. Figure 3a and b show a sample frame and the annotated viewpoint, respectively. Faster R-CNN is used to train and fit the dataset.
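Reading such an annotation file can be sketched with the standard library; the Pascal VOC-style tag layout below is a hypothetical example assumed from the description of diagonal coordinates stored per viewpoint, not the paper's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in a Pascal VOC-style layout (illustrative only)
XML = """<annotation>
  <object><name>viewpoint</name>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>300</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>"""

def read_viewpoints(xml_text):
    """Return every annotated viewpoint box as (xmin, ymin, xmax, ymax)."""
    boxes = []
    for obj in ET.fromstring(xml_text).iter("object"):
        b = obj.find("bndbox")
        boxes.append(tuple(int(b.find(k).text)
                           for k in ("xmin", "ymin", "xmax", "ymax")))
    return boxes

boxes = read_viewpoints(XML)
```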
4 Implementation and Results

Two different approaches were proposed in this study to predict the viewpoint in a 360° image.
It is observed from Table 1 that the classification result obtained from the pretrained ResNet50 model is better than that of the SVM method proposed in [1]. However, the first approach gave poor performance on subjective evaluation; the second approach has shown better results. The metric mean average precision (mAP) is used to evaluate the model in the latter approach. Average precision (AP) is similar to the F1-score in that it combines precision and recall; the higher the score, the better the model is at predicting the viewpoint. mAP and AP coincide when only a single class of viewpoint is detected. Keeping the learning rate at 0.0001 and varying the epochs, Tables 2 and 3 show the AP scores obtained for two different classes trained separately on the model proposed in this paper. The score also depends on the type of viewpoint in each class. For example, the viewpoint in a bedroom is less complex than the viewpoint in a church: in a bedroom the viewpoint is mostly the bed, whereas in a church it is the altar, which has more complex features in terms of pattern, arrangement, and detail, and varies a lot from one image to another within the class. So even with more epochs, the mAP score obtained for church is only 61%, compared to 86% for the bedroom class.
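Detection metrics of this kind rest on the IoU overlap between a predicted box and an annotated viewpoint. The sketch below shows IoU plus a deliberately simplified precision/recall count (no one-to-one matching or score ranking, which full AP computation would require).

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def precision_recall(preds, gts, thr=0.5):
    """Precision/recall at an IoU threshold (simplified matching)."""
    tp = sum(any(iou(p, g) >= thr for g in gts) for p in preds)
    missed = sum(all(iou(p, g) < thr for p in preds) for g in gts)
    precision = tp / len(preds) if preds else 0.0
    recall = (len(gts) - missed) / len(gts) if gts else 0.0
    return precision, recall
```

A false positive such as the mirror reflection discussed below lowers precision while leaving recall untouched, which matches the reported 75% recall versus 65% precision.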
Based on the results of the object detection model, the viewpoint in 360° images is predicted. Figure 5 shows the original image, the manual annotation, and the prediction by the model; the results from the trained model can thus be used for viewpoint detection. The model has shown an average recall of 75% and an average precision of 65%. The higher recall means that the model detects most of the viewpoints correctly. In the sample prediction shown in Fig. 5c, the model predicted the positive sample correctly, but the reflection of the bed in the mirror and an object like the table were also predicted as false positives.
Fig. 5 a Original 360° image, b manually annotated image, c viewpoint predicted by model

A comparative evaluation of the Faster R-CNN-based model proposed in this study against other work is shown in Tables 4 and 5. Various authors have used different metrics to evaluate their results. Xiao et al. [1] evaluated viewpoint prediction using precision; the model proposed in this study obtained a better result. The YOLO-based model used by Yang et al. [8] obtained a mAP score of 30.29% for 360° images. Thus, on evaluating both metrics against the existing methodologies, the proposed model obtains better results (Tables 4 and 5).
Table 4 Comparative analysis of models (mAP)

Model      mAP
Proposed   70.5
Yang [8]   30.29

Table 5 Comparative analysis of models (Precision)

Model      Precision
Proposed   65
Xiao [1]   50.2
5 Conclusion

The proposed work studies 360° images and various deep learning architectures through which 360° images can be classified into classes and viewpoints can be predicted. The study uses a transfer learning approach to classify the images into 10 categories. Viewpoints extracted from the feature maps of classification models give lower accuracy than object detection models like Faster R-CNN, which showed good results compared to existing models: 86% mAP is obtained in predicting the viewpoint for the Bedroom class. This work can be extended to experiment with larger datasets and with more classic and new state-of-the-art architectures.
References
1. Xiao J, Ehinger KA, Oliva A, Torralba A (2012) Recognizing scene viewpoint using panoramic
place representation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition.
IEEE, pp 2695–2702
2. Raja R, Roomi SM, Dharmalakshmi D, Rohini S (2013) Classification of indoor/outdoor
scene. In: 2013 IEEE International Conference on Computational Intelligence and Computing
Research. IEEE, pp 1–4
3. Li C, Xu M, Jiang L, Zhang S, Tao X (2019) Viewport proposal CNN for 360 video quality assess-
ment. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE, pp 10169–10178
4. Afzal S, Chen J, Ramakrishnan KK (2017) Characterization of 360-degree videos. In:
Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, pp 1–6
5. Caglayan A, Imamoglu N, Can AB, Nakamura R (2020) When CNNs meet random RNNs:
towards multi-level analysis for RGB-D object and scene recognition. arXiv preprint arXiv:
2004.12349
6. Yang Q, Zou J, Tang K, Li C, Xiong H (2019) Single and sequential viewports prediction for
360-degree video streaming. In: 2019 IEEE International Symposium on Circuits and Systems
(ISCAS). IEEE, pp 1–5
7. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
8. Yang W, Qian Y, Kämäräinen JK, Cricri F, Fan L (2018) Object detection in equirectangular
panorama. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, pp
2190–2195
ConvNet of Deep Learning in Plant
Disease Detection
Abstract Balancing yield with the population of the country is the most important challenge faced by farmers in the agricultural field. Many farmers all over the world still struggle with natural disasters, unexpected rainfall, nutrient deficiency in the soil, etc., but above all, the key problem is pest infection. Many researchers have used various techniques to detect plant diseases. Deep learning is widely used to solve image-oriented problems using convolutional neural networks; the CNN (ConvNet) model is an effective and efficient technique for analysing an image. This work compares the different models used to detect plant diseases using CNNs. Finally, this paper outlines the existing achievements, limitations, and suggestions for future plant disease detection research using convolutional neural networks.
1 Introduction
Plant infections are a regular problem in farmers' lives, since most farmers are unaware of plant pathology, the study of plant diseases. Plant diseases are normally classified into two major categories, biotic and abiotic: plants infected by microorganisms fall under the biotic category, while abiotic diseases are caused by natural factors like temperature, rainfall, humidity, etc. (Fig. 1) [1].
Population pressure is one of the most important factors governing Indian agriculture. Directly or indirectly, seventy per cent of the population depends on farmers for their food. Agricultural production is basically affected by various problems like
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 501
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_50
502 J. Gajavalli and S. Jeyalaksshmi
Fig. 1 Classification of plant diseases into biotic factors (virus, bacteria, fungus) and abiotic factors (temperature, rainfall, humidity, nutrient deficiency)
Table 1 Crops affected by microorganisms: images of infected leaves, disease names, affected crops and their symptoms

Diseases | Crops affected | Symptoms
Black rot | Brassicas | Leaf margins with yellow to light brown
image processing. In the next milestone of image processing, deep learning techniques are widely used to identify the affected region of the plant; convolutional neural network models in particular have been created and have achieved the required accuracy rate in their findings.
2 Image Processing
The image processing concept is used to perform operations on digital images in order to solve image-related problems. Image processing consists of a series of phases: image acquisition, image preprocessing, image segmentation, feature extraction, and detection and classification (Fig. 3) [6].
Fig. 3 The image processing cycle: image acquisition, image preprocessing, image segmentation, feature extraction, and detection and classification
The image acquisition phase is the process of collecting image inputs. Data in digital form is collected using hardware such as cameras and scanners. This is the initial step of image processing.
This stage is used to improve the image for further operations, for example by removing unwanted distortions and enhancing image features for later analysis. Important preprocessing techniques include resizing, noise reduction, brightness correction and geometric transformations.
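None of the surveyed papers' code is reproduced in this chapter; as an illustrative sketch only (the function names and the nearest-neighbour method are our own choices, not the authors'), two of the preprocessing operations named above, resizing and brightness correction, can be written with NumPy:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image array."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return img[rows][:, cols]

def adjust_brightness(img, delta):
    """Additive brightness correction, clipped to the valid 0-255 range."""
    return np.clip(img.astype(int) + delta, 0, 255)

img = np.arange(16).reshape(4, 4) * 16     # synthetic 4x4 grayscale image
small = resize_nearest(img, 2, 2)          # downsample to 2x2
brighter = adjust_brightness(img, 40)      # raise brightness by 40
```

Real pipelines would use a library resize with interpolation (e.g. bilinear); nearest-neighbour is shown here only because it fits in a few lines.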
In the image segmentation stage, the input image is divided into multiple portions, grouping portions with similar features for further analysis.
Understanding the image details through the required parameters extracted from the image is known as the feature extraction phase. Each extracted piece of information describes more details about the image. These feature parameters include the texture, shape and colour of an image, which play an important role in further processing.
The classification and detection phase finds the actual region of interest in an image based on the extracted features. Various classification and detection methodologies are implemented to identify the diseased portion of the leaf. The familiar classification methods are artificial neural networks, probabilistic neural networks, K-nearest neighbours, SVM, backpropagation, etc.
3 Deep Learning
Deep learning is a subfield of machine learning. Here, the model learns and predicts using series of neurons, similar to the functioning of the human brain. A deep learning algorithm creates its features and models on its own rather than through manual definition. A large amount of labelled data is required to train a deep learning model for the best accuracy [7]. Information is processed by the many hierarchical layers of deep learning in a non-linear manner, in which the lower-level features and concepts help to define the higher-level features and concepts. Famous types of deep learning neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN) and artificial neural networks (ANN) (Fig. 4) [8].
CNNs are used to perform computer vision tasks such as image and video recognition. A CNN is more advanced than an ANN; this neural network model is used in
Fig. 4 This deep neural network diagram shows the process flow of finding the predicted output
[9]
Fig. 5 CNN diagram describes the architecture of convolutional neural network [11]
image recognition and classification. A CNN works on the pixels of images and trains the model to extract features for better classification [10]. The layers of a CNN consist of (i) an input layer, (ii) an output layer and (iii) hidden layers that include (a) multiple convolutional layers, (b) pooling layers, (c) fully connected layers and (d) normalization layers (Fig. 5).
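The two core hidden-layer operations just listed, convolution and pooling, can be sketched minimally in NumPy. This is an illustrative reconstruction, not any surveyed paper's code; the kernel and array sizes are arbitrary:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as CNN layers compute it)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling that downsamples each spatial dimension."""
    h, w = feature_map.shape
    cropped = feature_map[:h // size * size, :w // size * size]
    return cropped.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge filter
features = conv2d(image, edge_kernel)   # feature map, shape (5, 5)
pooled = max_pool(features)             # pooled map, shape (2, 2)
```

A deep learning framework would stack many such layers with learned kernels; the loop above only shows what one forward pass of a single filter computes.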
Existing research work on plant disease detection has focussed on traditional image processing concepts and their phases for detecting the diseased part of an input image. Currently, researchers are turning their focus to the convolutional neural network architectures of deep learning for detecting the infected region of a leaf. The familiar convolutional neural network milestone models are listed in Table 2 [12].
This paper is a comparative study of the CNN architectures used by various researchers to detect plant diseases. The collected information is structured comparatively across the following three phases: data collection, data augmentation, and data detection and classification (Fig. 6).
This phase describes the dataset collection performed by the various researchers: capturing images directly from the field, using existing repositories or using certain prepared datasets. Image quality is most important for further processing, so quality measures are adhered to by the researchers whilst collecting the images. Table 3 lists the source dataset for each plant.
The data augmentation phase is used to increase the amount of data by adding slightly modified copies of existing images to the dataset. Popular augmentation techniques are cropping, scaling, flipping, rotation, translation, resizing and Gaussian noise. Normally, image preprocessing is recommended to remove unwanted portions and enhance the image for further processing [27]. The CNN input layer accepts images of a fixed size: 227 × 227 for AlexNet and 224 × 224 for other architectures such as DenseNet, ResNet and VGG (Table 4) [18].
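Several of the augmentation techniques listed above (flips, rotation, translation) are one-liners on an image array. The function name and tiny example array below are hypothetical, for illustration only:

```python
import numpy as np

def augment(image):
    """Return simple augmented variants of a 2-D image array."""
    return {
        "horizontal_flip": np.fliplr(image),
        "vertical_flip": np.flipud(image),
        "rotate_90": np.rot90(image),                       # 90 degrees counter-clockwise
        "translate_right": np.roll(image, shift=1, axis=1), # wrap-around translation
    }

leaf = np.array([[1, 2],
                 [3, 4]])
variants = augment(leaf)
```

Training pipelines typically apply such transforms randomly per batch rather than materializing every variant, but the underlying operations are the same.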
Table 4 Comparison of the data augmentation phase: the image sizing and augmentation techniques used in each research work

Authors and year | Plant common name | Image sizing dimension | Augmentation techniques
Prabhakar et al. [18] | Tomato | AlexNet 227 × 227; other networks 224 × 224 | Rotation, translation, scaling
Chhillar et al. [19] | Corn, Pepper, Potato, Tomato | 256 × 256 | Flipping, rotation, zooming
Toda et al. [20] | 14 crop species | 224 × 224 | N/A
Konstantinos et al. [22] | Apple, Watermelon, Banana, Strawberry, Blueberry, Tomato, Cabbage, Squash, Cantaloupe, Orange, Cassava, Eggplant, Celery, Potato, Cherry, Peach, Corn, Pumpkin, Cucumber, Raspberry, Gourd, Soybean, Grape, Onion, Pepper | 256 × 256 | Size reduction, cropping
Darwish et al. [23] | Maize | 256 × 256 | Rotation, shear, fill mode, width shift, height shift, horizontal flip, zoom
Gupta et al. [24] | Tomato, Apple, Corn, Potato, Grapes | N/A | Resize, segmentation, crop, flipping, rotation, zooming, noise removal, background removal
Gutierrez et al. [25] | Tomato | N/A | Flipping, rotation, crop
Wang et al. [26] | Apple | Shallow networks 256 × 256; VGG16, VGG19 and ResNet50 224 × 224; InceptionV3 299 × 299 | Resize, flipping, rotation, zooming
Table 5 Comparison of the various CNN models used for identifying plant diseases

Authors and year | Plant name | CNN architecture | Accuracy rate
Prabhakar et al. [18] | Tomato | ResNet101 | Training 97.6%, testing 94.6%
Chhillar et al. [19] | Corn, Pepper, Potato, Tomato | CNN | 96.54%
Hari et al. [21] | Maize, Grape, Tomato, Potato | PDDNN (17 layers) | Implemented from scratch; 86%
Konstantinos et al. [22] | Apple, Watermelon, Banana, Strawberry, Blueberry, Tomato, Cabbage, Squash, Cantaloupe, Orange, Cassava, Eggplant, Celery, Potato, Cherry, Peach, Corn, Pumpkin, Cucumber, Raspberry, Gourd, Soybean, Grape, Onion, Pepper | VGG | 99.53%
Darwish et al. [23] | Maize | VGG16; VGG19; AE model | 97.9%; 97.7%; 98.2%
Gupta et al. [24] | Tomato, Apple, Corn, Potato, Grapes | VGG13 | 95.21%
Gutierrez et al. [25] | Tomato | Faster RCNN | 82.51%
Wang et al. [26] | Apple | VGG16 | 90.40%
Lu et al. [29] | Rice | AlexNet | 95.48%
Mohanty et al. [30] | 14 crop species | AlexNet and GoogleNet | 99.34%
This phase covers the detection and classification of plant diseases, examined here only through the different CNN architectures used for image classification. The most familiar CNN architectures are [28]: (i) LeNet-5, (ii) AlexNet, (iii) VGG-16, (iv) Inception-v1, (v) Inception-v3, (vi) ResNet-50, (vii) Xception, (viii) Inception-v4, (ix) Inception-ResNet and (x) ResNeXt-50 (Table 5) (Fig. 7).
5 Conclusion
Automation in the agricultural field is essential for balancing population pressure. Automatic plant disease detection is a necessary form of automation in the agricultural field to promote production, which helps to decrease farmers' losses. Automatic plant disease detection has been achieved by traditional image processing methods and algorithms. Currently, many researchers are focussed on the deep learning CNN architectures for analysing an image and detecting the required region; in plant disease detection especially, various CNN models are used to identify the diseased region.

Fig. 7 The accuracy rate of the different models of the CNN architecture, visualized as a chart
This comparative study highlights many researchers' implementation methods for plant disease detection using deep convolutional neural networks. The collected details are presented across three phases of the CNN workflow: (i) data collection, (ii) data augmentation and (iii) data detection and classification. This research motivates the search for the best CNN architecture for plant disease detection in the future.
References
1. Vishnoi VK, Kumar K, Kumar B (2021) Plant disease detection using computational intel-
ligence and image processing. J Plant Diseases Protection 128(1):19–53. https://doi.org/10.
1007/s41348-020-00368-0
2. Goyal SK, Rai JP, Singh SR (2016) Indian agriculture and farmers problems and reforms
3. Jeyalaksshmi S, Rama V, Suseendran G (2019) Data mining in soil & plant nutrient manage-
ment, recent advances and future challenges in organic crops. Int J Recent Technol Eng 8(2)
S11:pp 213–216. https://www.ijrte.org/wpcontent/uploads/papers/v8i2S11/B10350982S1119
4. https://www.world-grain.com/articles/13645-focus-on-india
5. Rama V, Jeyalaksshmi S (2019) Data mining based integrated nutrient and soil management
system for agriculture—a survey. CIKITUSI J Multidisciplinary Res 6(5):144–147, ISSN NO:
0975-6876. http://www.cikitusi.com
6. Devaraj A, Rathan K, Sarvepalli J, Indira K (2019) Identification of plant disease using image processing technique. In: 2019 international conference on communication and signal processing (ICCSP), Chennai, India, pp 0749–0753. https://doi.org/10.1109/ICCSP.2019.8698056
7. https://www.smlease.com/entries/technology/machine-learning-vs-deep-learning-what-is-
the-difference-between-ml-and-dl
8. https://roboticsbiz.com/different-types-of-deep-learning-models-explained
9. https://commons.wikimedia.org/wiki/File:MultiLayerNeuralNetworkBigger_english.png
10. Amara J, Bouaziz B, Algergawy A (2017) A deep learning-based approach for banana leaf diseases classification. In: Mitschang B, Nicklas D, Leymann F, Schöning H, Herschel M, Teubner J, Härder T, Kopp O, Wieland M (eds) Datenbanksysteme für Business, Technologie und Web (BTW 2017), Workshopband. Gesellschaft für Informatik e.V., Bonn, pp 79–88
11. https://docs.ecognition.com/eCognition_documentation/User%20Guide%20Developer/8%
20Classification%20-%20Deep%20Learning.htm
12. https://machinelearningmastery.com/review-of-architectural-innovations-for-convolutional-
neural-networks-for-image-classification/
13. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791
14. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems 25 (NIPS 2012)
15. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (v6, last revised 10 Apr 2015)
16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9
17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
18. Prabhakar M, Purushothaman R, Awasthi DP (2020) Deep learning based assessment of disease
severity for early blight in tomato crop. Multimed Tools Appl 79:28773–28784. https://doi.org/
10.1007/s11042-020-09461-w
19. Chhillar A, Thakur S (2021) Plant disease detection using image classification. In: Tiwari S,
Suryani E, Ng AK, Mishra KK, Singh N (eds) Proceedings of international conference on big
data, machine learning and their applications. Lecture notes in networks and systems, vol 150.
Springer, Singapore. https://doi.org/10.1007/978-981-15-8377-3_23
20. Toda Y, Okura F (2019) How convolutional neural networks diagnose plant disease. Plant
Phenomics, p 14, ArticleID 9237136. https://doi.org/10.34133/2019/9237136
21. Hari SS, Sivakumar M, Renuga P, karthikeyan S, Suriya (2019) Detection of plant disease by
leaf image using convolutional neural network. In: 2019 International conference on vision
towards emerging trends in communication and networking (ViTECoN), pp 1–5. https://doi.
org/10.1109/ViTECoN.2019.8899748
22. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput Electron Agricult 145:311–318. https://doi.org/10.1016/j.compag.2018.01.009. https://www.sciencedirect.com/science/article/pii/S0168169917311742
23. Darwish A, Ezzat D, Hassanien AE (2020) An optimized model based on convolutional neural
networks and orthogonal learning particle swarm optimization algorithm for plant diseases
diagnosis. Swarm Evol Comput 52:100616, ISSN: 22106502. https://doi.org/10.1016/j.swevo.
2019.100616. https://www.sciencedirect.com/science/article/pii/S2210650219305462
24. Gupta S, Garg G, Mishra P, Joshi RC (2021) CDMD: an efficient crop disease detection and
pesticide recommendation system using mobile vision and deep learning. In: Tiwari S, Suryani
E, Ng AK, Mishra KK, Singh N (eds) Proceedings of international conference on big data,
machine learning and their applications. Lecture notes in networks and systems, vol 150.
Springer, Singapore. https://doi.org/10.1007/978-981-15-8377-3_25
25. Gutierrez A, Ansuategi A, Susperregi L, Tubío C, Rankić I, Lenža L (2019) A benchmarking
of learning strategies for pest detection and identification on tomato plants for autonomous
scouting robots using internal databases. J Sens. 2019:15. ArticleID 5219471. https://doi.org/
10.1155/2019/5219471
26. Wang G, Sun Y, Wang J (2017) Automatic image-based plant disease severity estimation using
deep learning. Comput Intell Neurosci 2017:8. https://doi.org/10.1155/2017/2917536
27. Srivastava P, Mishra K, Awasthi V, Sahu V, Pawan Kumar P (2021) Plant disease detection
using convolutional neural network. Int J Adv Res 09:691–698. https://doi.org/10.21474/IJA
R01/12346
28. https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
29. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep
convolutional neural networks. Neurocomputing 267:378–384, ISSN: 09252312. https://doi.
org/10.1016/j.neucom.2017.06.023. https://www.sciencedirect.com/science/article/pii/S09
25231217311384
30. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419. https://doi.org/10.3389/fpls.2016.01419. https://www.frontiersin.org/article/10.3389/fpls.2016.01419
Recognition of Iris Segmentation Using
CNN and Neural Networks
1 Introduction
Nowadays, biometric recognition has become a solid method for identifying and
recognizing individuals based on physiological or behavioral features. Traditional
means of identity verification, passwords and identity cards, for example, are not
always trustworthy since they might be forgotten or stolen. Biometric identification
has been utilized in security systems such as authentication and information protec-
tion. Biometric technologies use behavioral (such as handwriting) or physiological
(such as fingerprint, face, and iris) features to correctly validate human identification.
In comparison with other biometric technologies, iris recognition has the best accuracy in identifying individuals [1]. A system that can automatically identify a person based on differences in biological features among humans would be revolutionary; this is referred to as biometric recognition [2].
S. Jeyalaksshmi (B)
Vels Institute of Science, Technology and Advanced Studies, Chennai, Tamil Nadu, India
e-mail: [email protected]
P. J. Sai Vignesh
Rajalakshmi Engineering College, Thandalam, Chennai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 515
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_51
516 S. Jeyalaksshmi and P. J. Sai Vignesh
Iris texture patterns are thought to be unique to each individual and even to the two eyes of the same person. It is also stated that the iris patterns of a specific person rarely change beyond youth. In investigations to date, very high recognition/verification rates for iris recognition systems have been recorded [3]. As a result,
the iris is regarded as the most accurate and reliable means of identifying people,
and it has attracted a lot of attention in the recent decade [4]. In this paper, we used
CNN and neural networks for better iris recognition.
2 Related Works
Shashidhara and Aswath [5] demonstrated an iris area segmentation method for iris
recognition. The human iris is a one-of-a-kind feature that differs from person to
person. Human irises are unique, much like fingerprints, according to biological
research. In addition, any vision-capturing device can easily capture the iris. The iris's two-dimensional structure makes the technology much more useful.
Hu et al. [6] created a unique approach for improving colour iris segmentation accuracy and reliability in both static and mobile device captures. Their approach is a fusion technique that selects segmentation results from a number of different methods. They first propose and investigate a three-model iris segmentation framework, demonstrating that selecting among the three models' outputs can result in improvements.
Sreeja and Jeyalakshmi [7] presented the issues and disputes with existing iris
biometric systems.
Hofbauer et al. [8] demonstrated that CNN-based pupil classification systems outperform traditional iris segmentation techniques in terms of segmentation error metrics. They created a method for parameterizing CNN-based segmentation that bridges the gap between the rubber sheet transform and CNNs.
Abiyev and Altunkaya [9] presented an NN-based biometric security technique for verifying physical identity. The location of the iris area and the creation of an iris image dataset are the first steps in personal identification, followed by iris pattern recognition. The iris is extracted from an eye picture and represented as a dataset after normalization and augmentation. Using this data collection, an NN is used to categorize iris patterns.
Arsalan et al. [10] observed that existing iris identification systems are largely reliant on certain conditions, for example image capture distance and a stop-and-stare environment, both of which need significant user engagement. For the non-cooperative scenario, they introduced a two-stage CNN-based approach for identifying the true iris boundary in noisy iris images [11, 12].
3 Proposed System
One of the most difficult aspects of iris recognition is capturing a high-quality picture of the iris by an operator. First, pictures of the iris with adequate resolution and crispness to allow identification are desirable. Second, it is crucial that the inner iris pattern shows high contrast, which is not possible without a certain amount of light. In order to circumvent this, we used iris pictures from the CASIA database (Fig. 1).
If the recorded image is in colour (RGB), it is transformed to grayscale before being stored for further processing. One copy of the picture is used to detect the inner circle, while the other is used to detect the outer circle. The picture is captured only once to locate the pupil area. The image is transformed to black and white using a certain threshold value.
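A minimal sketch of the grayscale conversion and thresholding step described here, assuming the common luminosity weights 0.299/0.587/0.114 (the paper does not state which conversion or threshold it uses):

```python
import numpy as np

def rgb_to_gray(rgb):
    """Luminosity grayscale conversion of an H x W x 3 array with values in 0-1."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def binarize(gray, threshold):
    """Black-and-white image: True where the pixel is darker than the threshold."""
    return gray < threshold

rgb = np.zeros((4, 4, 3))
rgb[1:3, 1:3] = 1.0                       # a bright 2x2 patch on a dark background
mask = binarize(rgb_to_gray(rgb), 0.5)    # dark (pupil-like) pixels become True
```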
A vertical scan is performed from one side of the image to the other. If the scan starts on the left side of the picture, for example, we obtain a tangent to the circle on the left; scanning from the other side produces the other tangent. The diameter is determined by the distance between these two tangent points, and the radius and centre are derived as a result.
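The scan-for-tangents idea above can be sketched as follows. This is an illustrative reconstruction, not the authors' code, operating on a binary image in which pupil pixels are True:

```python
import numpy as np

def pupil_extent(binary):
    """Scan for the two vertical tangents of the dark (True) pupil region
    and return (diameter, radius, centre_column)."""
    cols = np.where(binary.any(axis=0))[0]   # columns containing pupil pixels
    left, right = cols[0], cols[-1]          # the two tangent columns
    diameter = right - left + 1
    return diameter, diameter / 2, (left + right) / 2

# Synthetic binary image with a "pupil" spanning columns 3..6
img = np.zeros((8, 10), dtype=bool)
img[2:6, 3:7] = True
d, r, c = pupil_extent(img)
```

The same scan along rows would give the vertical extent; the paper combines both to fix the centre.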
To get the iris’s outer boundary, step ii is done with a high value (0.38).
The centre of both circles remains the same since they are concentric. To obtain a tangent, a horizontal scan is performed from this point; as a consequence, the iris's outer edge radius is calculated. This yields the radii of the iris's inner and outer rings, as well as the centre. The final step is to use the equations below to create the circles.
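The circle equations themselves do not survive in this extraction; assuming the standard forms are intended, a circle with centre (a, b) and radius r is given by:

```latex
(x - a)^2 + (y - b)^2 = r^2,
\qquad
x(\theta) = a + r\cos\theta, \quad
y(\theta) = b + r\sin\theta, \quad
\theta \in [0, 2\pi).
```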
7 Segmentation
7.1 CNN
As indicated in Eq. (3), the sigmoid-based activation function maps the candidate input value into the range 0 to 1: for large negative inputs the output approaches 0, and for large positive inputs it approaches 1.
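Eq. (3) itself is not reproduced in this chunk; assuming it is the standard logistic sigmoid, sigma(x) = 1 / (1 + e^(-x)), a minimal sketch:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Saturation behaviour described in the text: ~0 for large negative inputs,
# 0.5 at zero, ~1 for large positive inputs.
values = [sigmoid(x) for x in (-10, 0, 10)]
```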
The iris patterns in this article are identified using a neural network (NN). In this method, the normalized and enhanced iris picture is represented as a two-dimensional array that stores the grayscale values of the iris pattern's texture. These values are the input signals to the neural network. Figure 2 depicts the NN architecture; the NN employs two hidden layers.
X1, X2, …, Xm are the grayscale input array values that characterize the iris texture information, whereas P1, P2, …, Pn are the output patterns that characterize the irises (Fig. 3).
Table 1 and Fig. 4 show the accuracy of CNN and the neural network in iris segmentation. We can conclude that CNN has better accuracy than the neural network in the iris recognition method.
8 Conclusion
Once the inner and outer circles have been obtained, the iris region is determined and examined for pattern recognition. CNN and neural network algorithms are therefore used for iris segmentation. It is worth mentioning that the eyelids and lashes are included in the outer circle, so this approach is effective for pattern matching within a specified area of the obtained region. As a result,
Fig. 4 Accuracy (percentage) of CNN and NN iris segmentation across data sets D1, D2 and D3
References
Abstract In today’s scenarios, online marketing and social networking require senti-
ment for opinion mining to understand its customers and users. The sentiment anal-
ysis involves extracting information from the text and symbols shared by the individ-
uals over the website reflecting their opinions. It describes various emotions of the
customers based on any product. Sentiment analysis is applicable to monitor social
media that recognized the mood of customers against the brand or any other product.
It has been observed that a variety of techniques were used to optimize the features
extracted during sentiment analysis. In the present paper, the author has presented a
detailed literature survey to outline the popularity of optimization techniques used
in the field of sentiment analysis. The literature review conducted over the authenti-
cated research published in the last decade had illustrated that most of the researchers
had implemented Ant Colony Optimization (ACO) and Particle Swarm Intelligence
(PSO) as optimization techniques. In addition to this hybrid, optimization had also
been emerging in recent years. The work outcomes are supported by the graphical
illustrations to show the rising popularity of optimization techniques in the field of
sentiment analysis.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 523
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_52
524 Priyanka and K. Walia
because it predicts the actual needs of the customers through customer feedback. Many studies based on sentiment analysis have been provided by several researchers. In the year 2012, [1] presented a survey giving a detailed study of sentiment analysis, its definitions, problems and development. Nowadays, several applications based on sentiment analysis are available, for example in the political sector, public opinion, medical analysis and business analysis [2]. One recent research work, presented by Saha et al., is based upon sentiment analysis of COVID-19 with the help of Twitter data sets [3]. Several challenges and applications of sentiment analysis were discussed in detail by Makinist et al. [4] and Tang et al. [5].
Sentiment analysis is an intelligent technique because it helps to capture and predict various opinions, attitudes, feelings and emotions from different sources such as speech, databases and text. The main aim of sentiment analysis is to discover the various emotions and attitudes of customers through product feedback [6]. Table 1 defines the various emotions used in sentiment analysis. Three types of emotion are mostly preferred in sentiment analysis: positive, neutral and negative [7].
Social media such as blogs, wikis, review sites, social sites and tweets help customers to share their experiences, knowledge and thoughts. In the last few years, people have preferred social networking due to its several advantages, so social media plays a major role in sentiment analysis [8]. Microblogging is most often preferred for sentiment analysis. It is a network-based service that provides the exchange of information on customers' feedback with the help of messages, videos, images, etc. [9]. Several machine learning techniques are used in sentiment analysis; [10] provided a deep study of the ML techniques that have been implemented for sentiment analysis.
Contributions of the study: Overall, the paper provides a review showing the popularity of optimization techniques in the field of sentiment analysis, based on published and authenticated work. The major contributions of the review study are:
. The key stages of sentiment analysis and classification work are discussed.
. The various optimization techniques that have been frequently utilized for sentiment analysis, particularly at the feature extraction stage, are highlighted.
. A year-wise assessment is provided to show the changing trend in the popularity of optimization techniques for sentiment analysis.
. A literature review is summarized to show the goal of existing works behind the integration of optimization techniques at various stages.
Popularity of Optimization Techniques in Sentiment Analysis 525
2 Sentiment Analysis
In sentiment analysis, the first step performed is known as data collection. Social media helps in this step because data are collected from blogs, tweets, forums, reviews, etc. NLP is used for the mining and classification of the data [12]. However, the majority of research works focus on Twitter data sets for analysing the positive, neutral and negative sentiments of customers or consumers to support various review or survey studies.
After the collection of data, the text is prepared and non-textual or irrelevant content is eliminated. This step, also known as pre-processing of the data, is responsible for cleaning the data before initiating any type of analysis. At this stage, all the unwanted or irrelevant words are removed from the collected text [13]. These unwanted words usually include repeated words, phrases, stop words, punctuation, etc. Three important techniques are widely used in the pre-processing or preparation of the text for sentiment analysis: normalization, punctuation removal and stop word removal.
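The three pre-processing techniques named above can be sketched in a few lines of Python; the stop-word list here is a tiny illustrative stand-in for a real lexicon such as NLTK's:

```python
import string

# Illustrative stop-word list; real systems use much larger lexicons.
STOP_WORDS = {"the", "is", "a", "an", "and", "it", "this", "i"}

def preprocess(text):
    """Normalize case, strip punctuation, and drop stop words."""
    text = text.lower()                                               # normalization
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    return [w for w in text.split() if w not in STOP_WORDS]           # stop-word removal

tokens = preprocess("This phone is AMAZING, and I love it!!!")
```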
The previous step results in the preparation of the most relevant textual data for sentiment analysis. At this step, the features representing the refined data are extracted [14, 15] and passed to the next step. Techniques such as bag of words, N-grams and TF-IDF have been popularly used by researchers to extract features from the textual data for sentiment analysis.
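A minimal sketch of two of the feature extraction techniques named above, bag of words and TF-IDF. The toy corpus and the ln(N/df) idf variant are illustrative choices, not taken from the surveyed papers:

```python
import math
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def tf_idf(tokens, vocabulary, corpus):
    """TF-IDF feature vector with idf = ln(N / df)."""
    n_docs = len(corpus)
    counts = Counter(tokens)
    vector = []
    for term in vocabulary:
        df = sum(1 for doc in corpus if term in doc)   # document frequency
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector

corpus = [["good", "phone"], ["bad", "battery"], ["good", "battery"]]
vocab = ["good", "bad", "phone", "battery"]
bow = bag_of_words(corpus[0], vocab)
weights = tf_idf(corpus[0], vocab, corpus)
```

Terms appearing in fewer documents (like "phone") receive larger TF-IDF weights than common terms (like "good"), which is exactly why TF-IDF is preferred over raw counts for sentiment features.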
The output of the feature extraction step is further refined by applying various optimization techniques. This step increases the accuracy of the sentiment analysis and classification work. Popular nature-inspired optimization techniques include Artificial Bee Colony, Particle Swarm Optimization, Ant Colony Optimization and Firefly Optimization [16–18].
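As an illustration of how such a nature-inspired optimizer works, here is a minimal particle swarm optimization sketch in pure Python, minimizing a toy objective; the inertia and acceleration constants are common textbook values, not taken from the surveyed papers:

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, seed=0):
    """Minimal PSO minimizing `fitness` over the box [-5, 5]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    gbest = min(pbest, key=fitness)[:]          # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                       # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy stand-in for a feature-subset score: minimize the sphere function.
best = pso(lambda p: sum(x * x for x in p), dim=2)
```

In the surveyed feature selection works, the fitness function would instead score a candidate feature subset by the classifier accuracy it achieves.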
In sentiment detection, the extracted sentences related to customer opinions and reviews are examined deeply. Only sentences containing subjective expressions are considered, and objective communication is removed [19]. After sentiment detection, the next step is sentiment classification, which classifies the subjective sentences as positive, negative or neutral. Sentences can also be classified in terms of likes and dislikes, or good and bad, on different points [20, 21]. This is performed using machine learning algorithms such as Naïve Bayes, SVM and neural networks.
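A minimal sketch of one such classifier, a multinomial Naïve Bayes with add-one smoothing; the toy training documents below are hypothetical:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing.
    docs is a list of (tokens, label) pairs; returns a classify function."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def classify(tokens):
        def log_score(label):
            total = sum(word_counts[label].values())
            score = math.log(label_counts[label] / len(docs))   # log prior
            for t in tokens:                                    # log likelihoods
                score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
            return score
        return max(label_counts, key=log_score)
    return classify

classify = train_nb([
    (["great", "phone"], "positive"),
    (["love", "it"], "positive"),
    (["terrible", "battery"], "negative"),
    (["awful", "screen"], "negative"),
])
label = classify(["great", "phone"])
```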
The last step of sentiment analysis is the output, which is represented pictorially
through bar charts, pie charts, and line graphs [22].
Sentiment analysis collects textual opinions based on customer feedback and characterizes
the various forms of customer emotion. In the past few years, several authors
have proposed studies on sentiment analysis. In [23], a study used Twitter data to
examine sentiments on a particular subject; the tweets were categorized into two
opinion classes, negative and positive, and a hybrid of the machine learning algorithm
SVM and ACO was used for classification. The simulation results show that the
average classification accuracy improved from 75.54% (using SVM alone) to 86.74%
(using SVM-ACO). In [24], the author introduced a hybrid algorithm integrating
ACO and the KNN algorithm for feature selection, with simulations performed on
customer review data sets. The proposed work was compared with baseline
algorithms such as information gain (IG), genetic algorithm (GA), and rough set attribute
reduction (RSAR); the overall evaluation shows that the proposed method improves
on the baselines. In [25], the author proposed a sentiment analysis system applied to
the election of the West Java Governor. In the proposed work, PSO and information
gain help select suitable attributes from the documents, and SVM is used as the
classifier; the system achieved an accuracy of 94.8% and an AUC of 0.98. In [26],
the authors proposed a hybrid approach using a swarm intelligence-based optimization
algorithm. Pre-processing is performed through steps such as tokenization,
stemming, and removal of emoticons and stop words. The authors utilized ACO and
PSO because these techniques select the best features and also reduce the number of
paths; optimization is performed before text categorization. For tweet classification,
Naïve Bayes (NB) and support vector machine (SVM) techniques were implemented.
In [27], the author presented a study that used PSO for feature selection to check the
performance of various classification algorithms on two data sets, a sentiment
analysis data set and an SMS spam detection data set. The overall analysis shows that
PSO reduces the space complexity and improves classifier accuracy. In [28], the
author introduced a study that addresses the difficulty of feature selection in sentiment
analysis using ACO, which is widely considered a strong feature selection
approach. In the proposed work, a KNN classifier was implemented
528 Priyanka and K. Walia
to generate the optimum feature subset of candidates. The experimental results
show that the relationship between the features and sentiment is determined through
accuracy, which depends on precision, recall, and F-score. The overall evaluation
shows that the proposed ACO-KNN algorithm obtained a better feature subset and
also improved sentiment classification accuracy. Shekhawat et al. [29] also presented
a study describing sentiment analysis and optimization, in which ACO is considered
a better approach than the others, while some authors proposed PSO or ABC as the
best optimization techniques for sentiment analysis. The literature survey in this
paper is therefore organized around the different optimization techniques being
used for sentiment analysis. In the current era, Twitter is an important
microblogging platform used to collect different customers' opinions in the form of
"tweets". The detailed analysis of published work on "sentiment analysis" and
"optimization techniques" is presented in the next section.
The overall inferences drawn from the aforementioned survey analysis are discussed
in this section of the paper. Critically, the study has focused on sentiment analysis
performed over the last decade. With the rising popularity of meta-heuristics and
swarm intelligence, it is concluded that a lot of work has been done on sentiment
analysis using different optimization techniques to address the feature selection stage.
Table 2 describes the different optimization techniques that were implemented by
researchers in combination with machine learning, while focusing on the purpose
of implementing the optimization techniques. It has been observed that Ahmad
et al. integrated GA at the feature selection stage in 2015 and later, in 2017, combined
KNN with ACO for the same purpose. Table 2 also shows that several
researchers have integrated swarm-based optimization techniques at the feature
selection stage because their objective fitness functions can resolve the optimization
and selection issues. Further, it has also been observed that the integration
of optimization approaches enhanced the classification accuracy of existing
machine learning techniques.
The published resources summarized in Table 2 give a constructive outline of the
existing research and lay the foundation for future research. It has been observed
that in recent years the optimization techniques ACO, PSO, and ABC, as well as hybrid
techniques, have been popularly implemented by the research community. Moreover,
a rising trend towards integrating optimization approaches to enhance the
classification performance of machine learning-based sentiment classification
has also been observed. The graphical presentation of these observations is
used to illustrate the interpretations of the review study.
Further, Fig. 2 presents a pie chart of sentiment analysis work broken down by
optimization technique. It shows the popularity of the optimization techniques
ACO, PSO, and ABC, along with hybrid approaches, as illustrated by the published
papers cited within the present survey. It is observed that the popularity of ACO
Table 2 (continued)

Author's detail | Implemented techniques | Optimization techniques | Data sets | Purpose of optimization technique
A. Jain, B. Pal Nandi, C. Gupta, and D. K. Tayal, 2020 [27] | Senti-NSetPSO | PSO | Blitzer, aclIMDb, Polarity and subjective data set | To categorize the document
K. Machová, M. Mikula, X. Gao, and M. Mach, 2020 [28] | Naïve Bayes | Particle swarm optimization | Movie and general data sets | Sentiment labelling
S. S. Shekhawat, S. Shringi, and H. Sharma, 2021 [29] | SVM and NB | Spider monkey optimization | Sender2 and Twitter | Identify optimal cluster-heads of the data set
A. Jain, B. Pal Nandi, C. Gupta, and D. K. Tayal, 2020 [27] | NA | PSO with Neutrosophic Set | Blitzer, aclIMDb, Polarity and subjective data set | Classify large-sized text
Naresh and Venkata, 2021 [30] | SVM | Sequential minimal optimization (SMO) | Twitter data set | Multistage optimized classification
Datta and Chakrabarti, 2021 [31] | Recurrent neural network (RNN) | Firefly algorithm (FF) and multi-verse optimization (MVO) | Demonetization tweets | Optimization of weights of the polarity scores
A. Hosseinalipour, F. S. Gharehchopogh, M. Masdari, and A. Khademi, 2021 [32] | Fuzzy C-means data clustering technique, decision tree (DT), and Naïve Bayes (NB) | Social spider optimization (SSO) | ISEAR data set, sentiment polarity data sets, and Stanford sentiment treebank data sets | Feature selection
Vasudevan and Kaliyamurthie, 2021 [33] | Support vector machine (SVM) | PSO | Amazon data sets | Feature selection
has increased to 32%, in comparison with PSO (26%) and ABC (21%), in the field
of sentiment analysis. This also suggests that ACO is an emerging optimization
technique that proves its strength over the earlier ABC and PSO techniques. Moreover,
the hybridization of various optimization techniques has also attracted significant
attention for improving sentiment analysis and classification work.
5 Conclusion
Sentiments reflect the thoughts of an individual, and sentiment analysis is a text-based
analysis that reflects customers' emotions and opinions. It has been observed
that, for sentiment analysis, a few optimization techniques have been popularly
implemented and form the first choice of most researchers for the feature
selection stage.
[Figure: year-wise distribution of the surveyed publications, 2015–2021]
Further, it is observed that ABC, PSO, and ACO hold the major
proportion among various optimization techniques concerning sentiment analysis.
However, the highest popularity is observed for ACO as an optimization approach
for feature selection in the field of sentiment analysis. The overall results show
that ACO has emerged as the most popular approach for sentiment analysis since
2015, with Twitter data sets used in most cases.
References
1. Saha G, Roy S, Maji P (2021) Sentiment analysis of twitter data related to COVID-19. In: Impact
of AI and data science in response to coronavirus pandemic, Singapore, Springer, Singapore,
pp 169–191
2. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr
2(1–2):1–135
3. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167
4. Makinist S, Hallaç IR, Karakuş BA, Aydın G (2017) Preparation of improved Turkish DataSet
for sentiment analysis in social media. ITM Web Conf 13:01030
5. Tang H, Tan S, Cheng X (2009) A survey on sentiment detection of reviews. Expert Syst Appl
36(7):10760–10773
6. Trupthi M, Pabboju S, Narasimha G (2016) Improved feature extraction and classification—
Sentiment analysis. pp 1–6
7. Mouthami K, Devi KN, Bhaskaran VM (2013) Sentiment analysis and classification based
on textual reviews. In: 2013 International conference on information communication and
embedded systems (ICICES), Chennai
8. Alessia D, Ferri F, Grifoni P, Guzzo T (2015) Approaches, tools and applications for sentiment
analysis implementation. Int J Comput Appl 125(3)
9. Kaur J, Sehra SS, Sehra SK (2016) Sentiment analysis of twitter data using hybrid method of
support vector machine and ant colony optimization. Int J Comput Sci Inf Secur 14(7):222
10. Ahmad SR, Yusop NMM, Bakar AA, Yaakub MR (2017) Statistical analysis for vali-
dating ACO-KNN algorithm as feature selection in sentiment analysis. In: 2nd international
conference on applied science and technology 2017 (ICAST’17), Kedah, Malaysia
11. Kurniawati I, Pardede HF (2018) Hybrid method of information gain and particle swarm opti-
mization for selection of features of SVM-based sentiment analysis. In: 2018 international
conference on information technology systems and innovation (ICITSI), Bandung-Padang,
Indonesia
12. Bakshi G, Shukla R, Yadav V, Dahiya A, Anand R, Sindhwani N, Singh H (2021) An optimized
approach for feature extraction in multi-relational statistical learning. J Sci Ind Res (JSIR)
80(6):537–542
13. Gupta A, Anand R, Pandey D, Sindhwani N, Wairya S, Pandey BK, Sharma M (2021) Prediction
of breast cancer using extremely randomized clustering forests (ERCF) technique: prediction
of breast cancer. Int J Distrib Sys Technol (IJDST) 12(4):1–15
14. Anand R, Chawla P (2016) A review on the optimization techniques for bio-inspired antenna
design. In: 2016 3rd international conference on computing for sustainable global development
(INDIACom), IEEE. pp 2228–2233
15. Srivastava A, Gupta A, Anand R (2021) Optimized smart system for transportation using RFID
technology. Math Eng Sci Aerosp 12(4):953–965
16. Anand R, Chawla P (2020) A hexagonal fractal microstrip antenna with its optimization for
wireless communications. Int J Adv Sci Technol 29(3s):1787–1791
17. Badr EM, Salam MA, Ali M, Ahmed H (2019) Social Media Sentiment Analysis using Machine
Learning and Optimization Techniques. Int J Comput Appl 975:8887
18. Bajeh AO, Funso BO, Usman-Hamza FE (2019) Performance analysis of particle swarm
optimization for feature selection. FUOYE J Eng Technol 4(1)
19. Ahmad SR, Bakar AA, Yaakub MR (2019) Ant colony optimization for text feature selection
in sentiment analysis. Intell Data Anal 23(1):133–158
20. Nayar N, Gautam S, Singh P, Mehta G (2021) Ant colony optimization: a review of literature
and application in feature selection. In: Inventive computation and information technologies.
Springer, Singapore, pp 285–297
21. Ahmad SR, Bakar AA, Yaakub MR (2015) Metaheuristic algorithms for feature selection
in sentiment analysis. In: 2015 Science and information conference (SAI), London, United
Kingdom
22. Ahmad SR, Bakar AA, Yaakub MR, Yusop NMM (2017) Statistical validation of ACO-KNN
algorithm for sentiment analysis. J Telecommun Electron Comput Eng JTEC 9(2–11):165–170
23. Gupta DK, Reddy KS, Ekbal A (2015) Pso-asent: Feature selection using particle swarm
optimization for aspect based sentiment analysis. pp 220–233
24. Kumar S, Yadava M, Roy PP (2019) Fusion of EEG response and sentiment analysis of products
review to predict customer satisfaction. Inf Fusion 52:41–52
25. Nagarajan SM, Gandhi UD (2019) Classifying streaming of Twitter data based on sentiment
analysis using hybridization. Neural Comput Appl 31(5):1425–1433
26. Orkphol K, Yang W (2019) Sentiment analysis on microblogging with K-means clustering and
artificial bee colony. Int J Comput Intell Appl 18(03):1950017
27. Jain A, Pal Nandi B, Gupta C, Tayal DK (2020) Senti-NSetPSO: large-sized document-level
sentiment analysis using Neutrosophic Set and particle swarm optimization. Soft Comput
24(1):3–15
28. Machová K, Mikula M, Gao X, Mach M (2020) Lexicon-based sentiment analysis using the
particle swarm optimization. Electronics (Basel) 9(8):1317
29. Shekhawat SS, Shringi S, Sharma H (2021) Twitter sentiment analysis using hybrid Spider
Monkey optimization method. Evol Intell 14(3):1307–1316
30. Naresh A, Krishna PV (2021) An efficient approach for sentiment analysis using machine
learning algorithm. Evol Intell 14(2):725–731
31. Datta S, Chakrabarti S (2021) Aspect based sentiment analysis for demonetization tweets by
optimized recurrent neural network using fire fly-oriented multi-verse optimizer. Sādhanā 46(2)
32. Hosseinalipour A, Gharehchopogh FS, Masdari M, Khademi A (2021) Toward text psychology
analysis using social spider optimization algorithm. Concurr Comput 33(17)
33. Vasudevan P, Kaliyamurthie KP (2021) Product sentiment analysis using particle swarm opti-
mization based feature selection in a large-scale cloud. In: Proceedings of the 1st international
conference on computing, communication and control system, I3CAC 2021
Predominant Role of Artificial
Intelligence in Employee Retention
Abstract The present study throws light on the role of artificial intelligence in human
resources. As technology is changing very rapidly, many industries have adopted
such systems to give more satisfaction to their employees. An employee plays a vital
role in the organization, and organizations use new techniques and technologies to
retain their employees; it is important for all organizations to offer more benefits
to employees. The present study is based on factors identified in previous literature;
with the help of this literature review, a structured questionnaire was developed,
and its validity and reliability were validated by Cronbach's alpha. AI technology
will continue to grow, and at some point in the future AI will be the norm, while
old-fashioned recruiting and hiring processes will seem stone-age. There is a positive
relationship between AI-assisted hiring and training of employees, and the study
identifies the factors for which employees require AI. With the help of SPSS, the
study found that AI shows a positive relationship with the human resource department
as well as with employees. Using the random sampling technique, data were
collected from employees of different companies (N = 50).
1 Introduction
Artificial intelligence is a tool that uses human knowledge in various areas and
improves adoption, and it is a creative development used across organizations to
improve utility and performance. In this article, we take a wide approach that
consolidates expert systems, simulation and modelling, robotics, natural language
processing (NLP), the use of innovatively derived algorithms, and so on. Thus, we
include assisted, augmented, and autonomous intelligence in the different functions
of an organization.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 535
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_53
536 R. Kaur and H. Kaur
[Fig. 1: role of AI linking the organization, human resources, and employees]
2 Literature Review
A study in 2021 found that AI contributes to the success of the recruitment
process: identifying, selecting, and retaining talented people. Howard identified that
AI is playing a role in everybody's life. Strategic foresight on AI workplace applications
will shift occupational research and practice from a reactive attitude to a proactive
one [8]. Understanding the opportunities and challenges of AI for the future of
work will help mitigate the adverse effects of AI on the safety, health, and well-being
of workers.
In 2019, a study emphasized the use of AI in human resource management in terms
of team estimation, recruitment and selection, employability, turnover, HR performance
measurement, corporate education and training, human resource development (HRD),
management by competencies, and quality of life at work [9]. Another study worked
on the requirements of the organization, finding that chatbots attract employees
and help the organization engage them; with the help of AI, the recruitment
procedure became easier, alongside organizational functions such as production,
performance management, sales, strategic planning, customer relationship management,
banking, coaching, training, taxes, etc. [10]. It was also investigated, as early
as 1986, that with the help of AI time is saved, i.e. tasks can be completed in less
time; the study also emphasized learning and development, which helps the
organization focus on critical thinking [11].
Artificial intelligence has now entered the overall procedures, methods, and techniques
of organizations, and one of the areas where AI has replaced humans is the
human resource department, where functions such as candidate interviews,
recruitment, orientation of human resource activities, and performance management,
among others, are carried out through technology [12] (Fig. 1).
(continued)

S. No | Research paper name | Year | Findings | Sector
4 | "Artificial intelligence chatbots are new recruiters" | 2019 | AI chatbots are very productive instruments in the recruitment process, according to the research, and they will be useful in developing a recruitment strategy for the industry [16] | Food processing
5 | "Artificial intelligence: implications for the future of work" | 2019 | Taking a proactive approach to AI workplace applications will transform occupational research and practice from a reactive to a proactive position [17] | Manufacturing sector
6 | "Impact of artificial intelligence in recruitment, selection, screening, and retention outcomes in the Irish market in view of the global market" | 2019 | The researcher attempted to establish a link between mass hiring drives and the success of using AI in identifying top performers who will be interested in long-term organisational development [18] | Manufacturing sector
7 | "The influence of organizational culture on employee" | 2019 | Some employees showed poor task performance and lacked discipline in carrying out tasks, such as coming to and leaving work without following applicable regulations; some employees carried out their tasks without following applicable guidelines (resulting in poor work quality); there were delays in reporting by employees [11] | Manufacturing sector
9 | "Evolution of artificial intelligence research in human resources" | 2019 | The interdisciplinary practice of AI and HR is thought not to have yet resulted in a theoretical break or a new conceptual field [19] | IT sector
10 | "Artificial intelligence in human resources management: challenges and a path forward" | 2019 | Solutions for (1) HR phenomenon complexity, (2) restrictions imposed by tiny data sets, (3) ethical concerns related to fairness and legal constraints, and (4) employee reaction to management by data-based algorithms [20] | Industry sector
11 | "Artificial intelligence and the future of HR practices" | 2018 | AI in HR: the benefits of AI, the obstacles to implementing AI, and the way ahead; AI and machine learning are two critical tech trends that must be adopted if inch-perfect decision-making and successful people management are to be achieved [21] | IT sector
12 | "Can artificial intelligence change the way in which companies recruit, train, develop, and manage human resources in workplace?" | 2018 | According to the findings, AI has a good impact [22] | Industry sector
13 | "Artificial intelligence in human resource management" | 2018 | Artificial neural networks for turnover prediction, knowledge-based search engines for candidate search, genetic algorithms for staff rostering, text mining for HR sentiment analysis, information extraction for résumé data acquisition, and interactive voice response for employee self-service [9] | IT sector
14 | "Employee turnover prediction and retention policies design: a case study" | 2017 | Reduce the employee turnover [23] | Industry sector
15 | "Prediction of employee turnover in organizations using machine learning algorithms" | 2016 | It has a far greater accuracy rate when it comes to predicting employee turnover [12] | IT sector
16 | "Artificial intelligence for marketing" | 2015 | Assisting the organization in the development of new strategies and innovations [24] | IT sector
17 | "Retention: A case of Google" | 2011 | Retention policy should be resolved, as anonymized logs could be shared with third parties without prior user approval [25] | IT sector
4 Research Methodology
The research methodology is descriptive in nature; with the help of the above
literature review, a structured questionnaire was prepared. The information was
gathered from the respective HR managers using AI in their organizations, by means
of a self-administered survey. The factor analysis technique was used
[Fig. 2: factors for which managers can use AI — employee learning, retaining the employee with the organization, removing biases, developing leadership skills, and analysing the potential of the employee]
to find the factors in which managers can easily use AI; these factors were identified
with the help of the above literature review (Fig. 2).
[Fig. 3: research methodology — a literature survey of publications and an industry survey conducted by email, through industry contacts, and by visits, followed by analysis]
In this examination, the investigation was performed with the help of SPSS. Using
the factor analysis technique, hiring, training, vacations, appraisal, and engagement
were found to be the factors that best suit employees and make them feel motivated
and well treated. This examination sheds light on employee retention as well as
employee satisfaction (Fig. 3).
6 Factor Analysis
Using this technique, the 50 questions were reduced to six factors, so that managers
can easily understand the factors that give employee satisfaction and enable them to
retain employees for a long time. Factor loadings take values from −1 to 1, and values
near these extremes indicate that an item strongly influences the factor.
Reliability Test
Cronbach alpha tests were conducted on the 24 parameters evaluated for analysing
the success of employee retention tactics in the selected organisation, in order to
validate the questionnaire. The Cronbach alpha for the 24 items is 0.8, which meets
the accepted threshold level for the social sciences. As a result, the values used
in the evaluation of the research factors are consistent.
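For reference, Cronbach's alpha is computed as k/(k−1) · (1 − Σ item variances / variance of totals). The sketch below computes it for a hypothetical toy data set (the scores are illustrative, not the study's actual survey data):

```python
def cronbach_alpha(items):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    `items` holds one list of respondent scores per questionnaire item."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Toy data: 3 Likert-style items answered by 5 respondents.
items = [[4, 5, 3, 4, 2],
         [4, 4, 3, 5, 2],
         [5, 5, 2, 4, 3]]
print(round(cronbach_alpha(items), 3))  # → 0.886
```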
7 R Statistics
Correlations
8 Result
Using factor analysis in SPSS, six factors were identified:
– Hiring with the help of AI
– Training with the help of AI
– Vacation requests
– Employee development
– Appraisals through AI
– Employee engagement
– It was shown that there is a strong and favourable association between AI-
assisted hiring and employee retention techniques (r = 0.634).
– It was revealed that AI-assisted training was linked to organisational manpower
involvement, as demonstrated by (r = 0.585).
– It was discovered that vacation request has a statistically significant link with
staff retention techniques (r = 0.680).
– It was discovered that staff development has a significant impact on employee
retention, as evidenced by (r = 0.445).
– AI-assisted appraisals have been found to boost the retention rate in
organisations (r = 0.564).
– Through AI, it was shown that employee engagement strategies had a direct
association with appraisals (r = 0.551).
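The r values above are Pearson correlation coefficients. The sketch below shows how such a coefficient is computed (the two score lists are hypothetical illustrations, not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical Likert scores: AI-assisted hiring vs. retention ratings.
hiring = [4, 5, 3, 4, 2, 5, 3]
retention = [4, 4, 3, 5, 2, 5, 2]
print(round(pearson_r(hiring, retention), 3))
```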
9 Discussion
All of these applications are novel, and as fascinating as they may appear, there
are a few risks to be aware of. The good news is that AI can’t function without
training data. Algorithms, in other words, learn from their experiences. You may
wind up with all of the things you despise if your existing management techniques
are prejudiced, discriminating, punishing, or overly consistent. To ensure that ways,
techniques [algorithms, methods] are doing the right thing, we need visible and
544 R. Kaur and H. Kaur
adaptable AI. Our early algorithms would require a bump and a change of knots in
order to learn how to develop, design, and manufacture more precisely, just as early
cars did not always go straight. Methods can be used to determine bias. Consider
this scenario: your company has never employed a female engineer and just a few
African–American engineers. According to the AI recruitment method, women and
black engineers are less likely to rise in management. This form of bias should
be carefully removed from algorithms, and it will take time to do it successfully.
There’s also the risk of data breaches and misuse. Consider the widespread use,
universal, and integrated analysis in which we attempt to predict the likelihood of
the most productive employee leaving the company. In fact, informing management
that this person is more likely to leave the company may result in the manager firing or
disregarding the employee. Instead of being an independent decision-making process,
modern AI is a tool for recommendation and improvement. The need of establishing
interpretative and transparent AI systems was underlined by AI scientists at Entelo.
To put it another way, whenever a system makes a decision, it must explain why it
made that decision so that we, as humans, can assess whether the approach it employs
is still effective. Unfortunately, most AI algorithms today are completely opaque, and
this is one of the most influential display elements of the most recent tools.
Identifying an employee's intention to leave early means that turnover can be better
anticipated and measures to avoid it can be taken at an early stage. Although AI
applications are unlikely to have the emotional and insightful capacities that humans
have, these AI-powered HR applications can still verify, predict, and isolate patterns,
which is an incredible benefit when such a connection is real. The fear sweeping the
global workforce shows how AI is impacting work in diverse fields around the world.
However, it is not that these advances are displacing people; rather, people should
adapt to and view these movements in a way that achieves prosperity and success.
In our estimation, a majority of organizations will eventually have enough AI-based
tools to choose from in most of the areas where AI will support HR: recruitment,
organization, onboarding, performance evaluation, retention, and so on. In summary,
the adoption of AI should be seen as an opportunity: AI improves life, and it improves
the future when it is properly observed and used in a genuine way.
References
25. Kumar BSP, Nagrani K Artificial intelligence in human resource management. JournalNX
106–118
26. Ajit P (2016) Prediction of employee turnover in organizations using machine learning
algorithms. Algorithms 4(5): C5
27. Sterne J (2017) Artificial intelligence for marketing: practical applications. Wiley
28. Toubiana V, Nissenbaum H (2011) An analysis of Google log retention policies
Semantic Segmentation of Brain MRI
Images Using Squirrel Search
Algorithm-Based Deep Convolution
Neural Network
Abstract In recent years, brain tumor has become a severe threat to human lives.
These tumors are so often inadequately contrasted and are inadequately dispersed.
In recent days, brain tumor is automatically detected using semantic segmentation.
However, the variability in the size of brain tumors and the low contrast of brain
imaging are the two major problems affecting the performance of semantic segmen-
tation. To address this problem, a squirrel search algorithm-based deep convolution
neural network (SSA-DCNN) is proposed for semantic segmentation of the medical
images in this paper. The proposed method is a blend of deep convolution neural
network (DCNN) and squirrel search algorithm (SSA). The SSA is used to fine-
tune the performance of DCNN by optimizing the hyperparameters of the DCNN,
which in turn enhances the accuracy of the semantic segmentation. The proposed
method is implemented and validated by performance metrics such as accuracy, loss,
IoU, and BF score. The performance of SSA-DCNN is compared with the jellyfish
algorithm-based deep convolution neural network (JA-DCNN), oppositional-based
seagull optimization algorithm (OSOA-3DCNN), and particle swarm optimization
(PSO)-DCNN.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 547
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_54
548 B. Tapasvi et al.
1 Introduction
A brain tumor is a growth of abnormal cells in the brain, some of which may be
malignant and can cause cancer. Glioma is a typical tumor that occurs in the brain
and spinal cord. Based on the glial cell involved in the tumor, gliomas can be
classified into three types: astrocytoma, ependymoma, and oligodendroglioma.
Gliomas are also classified as low-grade gliomas (LGG) and high-grade gliomas
(HGG), the latter being more aggressive and invasive than the former [1]. A glioma
usually grows aggressively and penetrates deeply, rapidly attacking the central
nervous system (CNS). About 18,000 Americans are diagnosed with glioma,
according to the U.S. National Cancer Institute, and a significant number of them
die within 14 months [2]. In clinical practice, clinical imaging, mainly computed
tomography (CT) and magnetic resonance imaging (MRI), has been used to determine
(1) the presence of a tumor, (2) peritumoral edema, and (3) localization. However,
the variety and complexity of cerebral tumors under MRI often make it difficult for
radiologists and other physicians to identify and classify tumors [3, 4]. Therefore,
automatic segmentation of tumors into clinically relevant regions, which relieves
physicians of the burden of manually delineating tumors in images, can positively
affect clinical treatment [5, 6]. Many different types of semantic segmentation
methods have been developed by researchers. Several deep neural network architectures
available in the literature are showing great efficiency in classification and object
recognition. However, the computational complexity of building these architectures
is very high because they have to be custom designed for each different application
and problem domain in a manual fashion. Therefore, there is a need for reducing the
computational complexity in designing the neural network architecture for a specific
application. To this extent, this paper proposes to use a squirrel search-based genetic
algorithm to automatically optimize the structure of deep neural network to be suit-
able for the semantic segmentation of brain tumor images to distinguish between
tumor and the rest of the brain.
The remaining part of the paper is organized as follows: Sect. 2 provides a
detailed description of the proposed methodology, Sect. 3 presents the results of
the semantic segmentation, and the conclusion is given in Sect. 4.
for classification of the tumor. In general, not all the layers in the DCNN may be
needed for classifying brain tumors. The decision to skip some layers while
retaining others, so as to optimize the architecture, is made by the squirrel
search algorithm. After training the SSA-DCNN with the database, the DCNN is
tested by presenting the test images to the network.
The heart of the proposed method lies in two sub-systems: the deep convolution
neural network and the squirrel search algorithm. Therefore, these two sub-systems
are elaborated with respect to the classification of tumors in brain MRI images.
To optimize the architecture of the DCNN, the squirrel search algorithm must be
presented with the input database, all the layers of the DCNN, and the
classification labels of the required application. So, the first step in the
proposed system is to design a base DCNN architecture to be provided to the SSA
for optimization. The base DCNN for brain tumor classification is presented in
Fig. 2.
The size of the images in the input layer is chosen to be 110 × 110 × 3 and is
connected to a convolution layer as shown in Fig. 2. The DCNN consists of two
convolution layers with a 3 × 3 filter size and 1 × 1 stride. Each convolution layer is
followed by a batch normalization layer to improve the classification accuracy. Two
max pooling layers are used to reduce the dimensions of the features extracted after
convolution. Two fully connected layers are used to flatten the output and the final
fully connected layer is connected to a softmax layer for classifying the brain image
with or without the tumor.
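Under stated assumptions (unpadded 3 × 3 convolutions and 2 × 2, stride-2 max pooling, neither of which is specified in the text), the spatial dimensions flowing through this base architecture can be traced with a small sketch:

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Spatial output size of a convolution layer."""
    return (size - kernel + 2 * pad) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max pooling layer."""
    return (size - kernel) // stride + 1

size = 110                     # input images are 110 x 110 x 3
for block in range(2):         # two conv + batch-norm + max-pool blocks
    size = conv_out(size)      # 3 x 3 filter, 1 x 1 stride, no padding (assumed)
    size = pool_out(size)      # 2 x 2 max pooling, stride 2 (assumed)
    print(f"after block {block + 1}: {size} x {size}")
```

With these assumptions the two blocks shrink the 110 × 110 input to 54 × 54 and then 26 × 26 before the fully connected layers; with "same" padding the sizes would instead be 55 × 55 and 27 × 27.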
Fig. 2 Base DCNN architecture for brain tumor classification (input image,
convolution, batch normalization, max pooling, fully connected, and softmax
layers)
Now, the details of the DCNN are presented to the squirrel search algorithm for
optimizing the architecture and thereby reducing the computational complexity.
where S_{i,j} represents the jth dimension of the ith flying squirrel. The initial
location of each flying squirrel is allocated assuming a uniform distribution in
the forest. The fitness of each flying squirrel's location is computed by passing
the decision variable values into a user-defined fitness function. The fitness
function values are stored in the array below.
$$FF = \begin{bmatrix} F_1([S_{1,1}, S_{1,2}, \ldots, S_{1,d}]) \\ F_2([S_{2,1}, S_{2,2}, \ldots, S_{2,d}]) \\ \vdots \\ F_N([S_{N,1}, S_{N,2}, \ldots, S_{N,d}]) \end{bmatrix} \quad (2)$$
$$FF = \max\{\text{PSNR}\} \quad (3)$$

$$\text{PSNR} = 10 \log_{10}\left(\frac{MAX_P}{\text{MSE}}\right) \quad (4)$$

$$\text{MSE} = \frac{1}{N \times M} \sum_{X=1}^{N} \sum_{Y=1}^{M} \left[ I_{\text{image}}(A, B) - I_{d\text{-image}}(A, B) \right]^2 \quad (5)$$
where I_{d-image}(A, B) is the segmented image and I_{image}(A, B) is the input
image. Based on the fitness function, the DCNN candidates that enhance the
semantic segmentation process are selected. Once the fitness values are computed,
they are stored in the array and then sorted in ascending order. The flying
squirrel with the minimum fitness value is considered to be on the hickory nut
tree. The next three best flying squirrels are considered to be on acorn nut
trees and move toward the hickory nut tree. The remaining flying squirrels are
considered to be on normal trees. The foraging behavior of flying squirrels is
affected by the presence of predators, which is modeled by including a predator
presence probability in the location update. New solutions are generated from the
dynamic foraging behavior of the flying squirrels, which can be described by
three scenarios: scenario 1, in which the flying squirrels move from acorn nut
trees to the hickory nut tree; scenario 2, in which the flying squirrels move to
acorn nut trees; and scenario 3, in which the squirrels are on normal trees. The
mathematical description of these three scenarios is presented in this section.
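A minimal Python sketch of the fitness computation in Eqs. (3)–(5) follows. It takes Eq. (4) as printed, i.e. MAX_P/MSE rather than the more common MAX_P²/MSE, and assumes MAX_P = 255 for 8-bit images; images are plain nested lists here for clarity.

```python
import math

def mse(reference, segmented):
    """Mean squared error between the input image and the segmented image, Eq. (5)."""
    n = len(reference)          # image rows (N)
    m = len(reference[0])       # image columns (M)
    total = sum((reference[x][y] - segmented[x][y]) ** 2
                for x in range(n) for y in range(m))
    return total / (n * m)

def psnr(reference, segmented, max_p=255.0):
    """Peak signal-to-noise ratio used as the fitness function, Eqs. (3)-(4)."""
    e = mse(reference, segmented)
    if e == 0:
        return float("inf")     # identical images
    return 10 * math.log10(max_p / e)
```

In the SSA loop, each squirrel's candidate DCNN configuration would be scored by the PSNR of its segmentation, and the squirrel with the best score occupies the hickory nut tree.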
Scenario 1: Flying squirrels on acorn nut trees move toward the hickory nut tree.
The new location is computed as follows.

$$S_{AT}^{T+1} = \begin{cases} S_{AT}^{T} + d_g \times G_c \times \left( S_{HT}^{T} - S_{AT}^{T} \right) & r_1 \geq P_{DP} \\ \text{Random location} & \text{otherwise} \end{cases} \quad (6)$$
Here, r_2 is a random number in the range [0, 1].
Scenario 3: In this scenario, the squirrels on normal trees that have already
consumed acorn nuts may move toward the hickory nut tree to store hickory nuts
for times of food scarcity. The new location of these squirrels is obtained as
follows.

$$S_{NT}^{T+1} = \begin{cases} S_{NT}^{T} + d_g \times G_c \times \left( S_{HT}^{T} - S_{NT}^{T} \right) & r_3 \geq P_{DP} \\ \text{Random location} & \text{otherwise} \end{cases} \quad (8)$$
Here, r_3 is a random number in the range [0, 1], and P_{DP} is the predator
presence probability, taken as 0.1 in all three scenarios.
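The gliding updates of Eqs. (6) and (8) share one form, sketched below; the gliding distance d_g, gliding constant G_c, and the bounds of the random relocation are illustrative assumptions, since the paper does not fix their values here.

```python
import random

PDP = 0.1   # predator presence probability, 0.1 in all three scenarios

def update_location(current, hickory, dg, gc, r):
    """One SSA position update, Eqs. (6)-(8): glide toward the hickory
    nut tree when no predator is present, otherwise jump to a random
    location in the (assumed) [-1, 1] search space."""
    if r >= PDP:
        return [s + dg * gc * (h - s) for s, h in zip(current, hickory)]
    # predator present: relocate randomly
    return [random.uniform(-1.0, 1.0) for _ in current]
```

For example, with d_g = 1.0 and G_c = 1.9 a squirrel at the origin glides past a hickory tree at (1, 1), overshooting it, which is what gives the search its exploratory character.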
In the SSA, seasonal changes significantly affect the foraging activity of
squirrels, since they increase heat loss at very low temperatures [12]. A
seasonal constant is therefore introduced to enhance performance, computed as
follows.
$$S_C = \sqrt{\sum_{K=1}^{D} \left( S_{AT,K}^{T} - S_{HT,K}^{T} \right)^2} \quad (9)$$
where T = 1, 2, 3.
The relocation of the flying squirrels is modeled by the following equation.

$$S_{NT}^{\text{new}} = S_L + \text{Levy}(N) \times (S_U - S_L) \quad (10)$$
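Eq. (10) relies on a Lévy(N) step whose construction the paper does not detail; a common choice is Mantegna's algorithm with β = 1.5, sketched here under that assumption.

```python
import math
import random

def levy_step(beta=1.5):
    """One Levy-distributed step via Mantegna's algorithm (a common choice;
    the paper does not specify how Levy(N) in Eq. (10) is generated)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma_u)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def relocate(lower, upper):
    """Seasonal relocation of a normal-tree squirrel, Eq. (10):
    S_new = S_L + Levy * (S_U - S_L), dimension by dimension."""
    return [lo + levy_step() * (up - lo) for lo, up in zip(lower, upper)]
```

The heavy-tailed Lévy steps occasionally produce long jumps, which is what lets the relocated squirrels escape local optima after the seasonal check.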
Semantic Segmentation of Brain MRI Images Using Squirrel… 553
4 Conclusion
In this paper, SSA-DCNN has been developed for semantic segmentation of medical
images. Initially, the brain tumor images were collected from an open-source
repository. The proposed semantic segmentation process is a combination of DCNN
and SSA: the hyperparameters of the DCNN are selected with the help of the SSA to
enhance segmentation accuracy. The proposed method has been implemented and
validated using performance metrics such as accuracy, loss, IoU, and BF score,
and compared with conventional methods such as JA-DCNN, OSOA-3DCNN, and
PSO-DCNN. From the results, the proposed methodology achieves the best results in
terms of accuracy, loss, IoU, and BF score. In the future, efficient methods will
be developed to achieve the best segmentation outcomes for different medical
images.
Bar charts compare SSA-DCNN with OSOA-3DCNN, JA-DCNN, PSO-DCNN, and DCNN in terms
of accuracy (%), IoU, and BF score.
References
1. Zhang D, Huang G, Zhang Q, Han J, Han J, Yu Y (2021) Cross-modality deep feature learning
for brain tumor segmentation. Pattern Recogn 110:107562
2. Naser MA, Deen MJ (2020) Brain tumor segmentation and grading of lower-grade glioma using
deep learning in MRI images. Comput Bio Med 121:103758
3. Khan H, Shah PM, Shah MA, ul Islam S, Rodrigues JJ (2020) Cascading handcrafted features
and convolutional neural network for IoT-enabled brain tumor segmentation. Comput Commun
153:196–207
4. Aboelenein NM, Songhao P, Koubaa A, Noor A, Afifi A (2020) HTTU-Net: hybrid two track
U-net for automatic brain tumor segmentation. IEEE Access 8:101406–101415
5. Yogananda CGB, Shah BR, Vejdani-Jahromi M, Nalawade SS, Murugesan GK, Yu FF, Pinho MC
et al (2020) A fully automated deep learning network for brain tumor segmentation. Tomography
6(2):186–193
6. Zhang W, Yang G, Huang H, Yang W, Xu X, Liu Y, Lai X (2021) ME-net: multi-encoder net
framework for brain tumor segmentation. Int J Imaging Syst Technol
Top Five Machine Learning Libraries in
Python: A Comparative Analysis
Abstract Nowadays, machine learning (ML) is used in all sorts of fields such as
health care, retail, travel, finance, and social media. An ML system learns from
input data to construct a suitable model by continuously estimating, optimizing,
and tuning the parameters of the model. Python is one of the most flexible
programming languages for this purpose, and it contains special libraries for ML
applications, namely SciKit-Learn, TensorFlow, PyTorch, Keras, Theano, etc.,
which are great for linear algebra and for getting to know the kernel methods of
machine learning. Python is well suited to working with ML algorithms and has
relatively easy syntax. When taking a deep dive into ML, choosing a framework can
be daunting. The most common concern is to understand which of these libraries
has the most momentum in ML system modeling and development. The major objective
of this paper is to provide extensive knowledge of various Python libraries and
different ML algorithms, compared so as to meet multiple application
requirements. This paper also reviews various ML algorithms and application
domains.
1 Introduction
Machine learning (ML) is the most popular technology in today's world. The ML
domain is immense and developing quickly, being constantly partitioned and
sub-partitioned into various sub-specialties and types [1]. ML is the field of
study that enables computers to learn without being explicitly programmed and
tackles problems that cannot be addressed analytically. Among the various kinds
of ML tasks, a pivotal distinction is drawn between supervised and unsupervised
learning. In supervised machine learning, the program is "trained" on a
predefined set of "training examples", which then enable it to reach an
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_55
560 M. Rajesh and M. Sheshikala
accurate conclusion when given new data. In unsupervised machine learning, the
program is given a large amount of data and must discover patterns and
relationships within it.
These days, the strength of an organization is measured by the amount of data it
has. Organizations analyze this data and extract valuable information. For
example, e-marts keep suggesting items based on your buying patterns, and
Facebook and Twitter consistently recommend friends and posts in which you might
be interested. Data in raw form is like crude oil: we need to refine crude oil to
make petrol and diesel. Similarly, machine learning is helpful in processing data
to obtain valuable insights.
Machine learning algorithms are being applied in many interesting places. They
are becoming increasingly ubiquitous, with an ever-increasing number of
applications in areas we might not expect, such as the medical field, academia,
and data center optimization. In the medical field, machine learning plays a
fundamental role and is increasingly applied to clinical image segmentation,
image registration, multi-modal image fusion, computer-aided diagnosis,
image-guided therapy, image annotation, and image database retrieval, where
failure could be fatal [2]. In academia, educators need to prepare teaching
materials, manually grade students' homework, and give feedback to the students
on their learning progress. Students, in turn, often struggle with a
"one-size-fits-all" training process that is not customized to their abilities,
needs, and learning context [3]. Recent advances in ML provide new opportunities
to tackle challenges in the education system by collecting and analyzing the
data that students produce when they interact with a learning system. As for
data centers, those large racks of humming servers use tremendous amounts of
energy; together, all existing data centers use roughly 2% of the world's
electricity, and if left unchecked, this energy demand could grow as quickly as
Internet use. So, making data centers run as efficiently as possible is a very
serious matter. ML is well suited to the data center environment given the
complexity of plant operations and the abundance of existing monitoring data,
and one of the tasks handled by machine learning is DC plant configuration
optimization [4].
Python is becoming more famous day by day and has begun to replace many popular
languages in the industry. The simplicity of Python has attracted many developers
to build libraries for machine learning and data science; thanks to this wealth
of libraries, Python is almost as popular as R for data science. It is an
appealing choice for algorithm development and exploratory data analysis [5, 6].
An overview of the top machine learning libraries in Python is shown in Fig. 1.
2.1 Scikit-Learn
2.2 TensorFlow
TensorFlow [8] (TF) was created by Google Brain and a team at Google. It has
decent documentation and a lot of functionality alongside the basics, and it
makes it feasible to write entirely customizable code. Since it is written as a
low-level library, it is a bit harder to master. TensorBoard is a visualization
tool that comes with every standard installation of TF. It allows users to
monitor their models, parameters, losses, and much more. Some of the main areas
where TensorFlow shines are handling deep neural networks, NLP, abstraction
capabilities, image, text, and speech recognition, and effortless collaboration
on ideas and code. The core task of TensorFlow is to build deep learning models.
Figure 2 illustrates the data flow processing mechanism in TensorFlow.
TensorFlow has many benefits: for example, it helps us implement reinforcement
learning, we can directly visualize machine learning models using TensorBoard, a
tool in the TensorFlow library, and we can deploy models built with TensorFlow
on CPUs as well as GPUs. On the other hand, it has a few weaknesses: it can run
significantly slower than other frameworks on the same CPUs/GPUs, and its
computational graphs are slow to execute.
2.3 Keras
Keras [9] is built on and works on top of TensorFlow, and programming in it is
similar but at a higher level. The cost of that is harder customization of code;
notably, customization and tweaking of code are much simpler when coding at a
low level. Keras includes several of the building blocks and tools fundamental
for creating a neural network, for example, neural layers, activation and cost
functions, objectives, batch normalization, dropout, and pooling.
Keras has several benefits: it is excellent for research work and rapid
prototyping, the framework is portable, it permits a simple representation of
neural networks, and it is highly productive for visualization and modeling. On
the other hand, it has some limitations: it is slow, as it requires a
computational graph before carrying out an operation.
2.4 PyTorch
PyTorch [10] (PT) was created and is used by Facebook. It was developed later
than TensorFlow, but its community is growing quickly. PyTorch runs its code in
a more procedural style, while in TensorFlow one first needs to define the
entire model and then run it inside a session. For this reason, it is much
easier to debug code in PyTorch. Its code is more "pythonic", so it is easier to
learn and easier to use for quick prototyping. PyTorch and Keras also have good
documentation. Some of the crucial features that set PyTorch apart from
TensorFlow are tensor computing with the capacity for accelerated processing on
GPUs; ease of learning, using, and integrating with the rest of the Python
ecosystem; and support for neural networks built on a tape-based automatic
differentiation system. The various modules PyTorch ships with to help create
and train neural networks are tensors (torch.Tensor), optimizers (the
torch.optim module), neural networks (the nn module), and Autograd.
PyTorch has several benefits: the framework is famous for its speed of
execution, it is capable of handling dynamic computation graphs, and it
integrates with various Python objects and libraries. On the other hand, it has
a few weaknesses: the community for PyTorch is not as extensive, and it lags in
providing answers to questions. In comparison with other Python frameworks,
PyTorch has fewer features for visualization and application debugging.
2.5 Theano
The Theano [11] library is a Python interface to an optimizing compiler. After
optimization and compilation, the functions become available as regular Python
functions, but with superior performance. Vector, matrix, and tensor operations
are supported and efficiently parallelized on the available hardware. Several
features make Theano a robust library for scientific computation: support for
GPUs to perform data-intensive calculations faster than CPUs, strong integration
with NumPy, fast and stable evaluation of even the trickiest of expressions, and
the ability to generate custom C code for your mathematical operations. Theano
has several benefits: it supports GPUs, which help applications perform complex
calculations efficiently; it is straightforward to use because of its
integration with NumPy; and there is a huge community of developers using
Theano. On the other hand, it has several drawbacks: it is slower on the back
end, there are various issues in Theano's low-level API, and it produces a lot
of back-end errors. Also, the Theano library has a steep learning curve.
The overall comparison of the machine learning libraries is done in the next
section. Table 1 lists the basic information about each library: who developed
it, the year it was launched, the languages it is written in, and its well-known
applications.
Table 2 gives information about the libraries taken from GitHub, a popular
online hosting service for version control [12]. Here, the information is based
on parameters such as the number of stars, forks, contributors, and activity on
the library repository [13]. Looking at the table, all the libraries have a good
number of stars, which means all the libraries are performing well and users
give good feedback.
In the list of specified libraries, TensorFlow has the most contributors, which
implies that it is the most popular among all the machine learning libraries
[14–17]. Theano is also becoming famous day by day in machine learning
applications.
4 Conclusions
There are many more libraries in the ML world, but these are the most popular
and widely used. ML is a huge field and the most promising technology right now.
No matter the programming language or the area a developer is working in,
learning to work with these libraries is important. Doing so helps simplify
development and cut down tedious effort.
References
1. https://www.toptal.com/machine-learning/machine-learningtheory-an-introductory-prime
2. There A, Jeon M, Sethi IK, Xu B (2017) Machine learning theory and applications for
healthcare. J Healthc Eng 2017. Article ID 5263570
Abstract Developing efficient tracking systems is a popular and innovative area
of current work, and a smart phone application makes this straightforward. The
main objective of the vehicle tracking system is to provide vehicle locations on
maps using tools like the Global Positioning System (GPS) and the Global System
for Mobile communications (GSM) (Hlaing et al. in Int J Trend Sci Res Dev 3,
2019; Ramadan et al. in Int J Mach Lear Comp 2, 2012; Shafee et al. in Int J Adv
Comp Sci App 4, 2013), which operate using base stations and satellites over the
Internet in both web and Android applications. Markers on the map provide
complete information about a vehicle, including time, address, and vehicle ID.
Server-side scripting is done in PHP, which is used to insert and retrieve the
location details of vehicles. An Android smart phone application is developed to
track the vehicle; the user is expected to have this application on his smart
phone. The application continuously monitors the position and updates the
database every five minutes. Registered vehicles are issued a track ID, and this
is confined to only one smart phone, which is achieved by MAC-based
authentication. The user has to specify the track ID, which is basically the
transport vehicle number, and the MAC address of his smart phone during the
registration process. The paper emphasizes the tracking system as its core
concept and explains various application areas in which the system can be
implemented. This work provides real-time results through experimentation and
implementation.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_56
568 B. V. D. S. Sekhar et al.
1 Introduction
The initial implementation of tracking systems for vehicles was done for the
shipping industry, where the system provides the current position and
whereabouts of a vessel at any given instant of time. With the needs of people
and innovations in technology, such tracking systems are now implemented in real
time. Moving further in this work, we have successfully implemented a real-time
tracking system with which people can know the exact location of an intended
vehicle on their handheld Android mobile phones or smart phone applications.
In this work, we used technologies like GPS and GSM to provide real-time
location and time information anywhere on the globe. PHP, a server-side
scripting language, is used to send values from an Android device to the MySQL
database, to retrieve values from the database, and to perform analytics on the
GPS data available in the database.
A GPS network involves GPS transmitters, GPS receivers, and satellites, which
send radio signals at a particular frequency that are detected by the GPS
receiver. The messages transmitted by the satellites are time-coded at this
particular frequency with exact time stamps, because the satellites use atomic
clocks.
The GPS receiver detects the satellites it can hear and then collects their
messages. As noted above, the messages include the time, the current position of
the satellite, and bits of other information. These streams of messages are
staggered to save power and ease message decoding, as all satellites transmit on
the same frequency. Hence, locating a position on a normal GPS receiver takes
30–60 s.
a track ID and a mobile number to which the track ID has to be sent, so that a
parent or friend can use this track ID to track the person. Vehicles within a
specified range can also be identified by using our smart phone application, in
which results are viewed as markers on a map that gives complete information.
2 Existing System
Existing vehicle tracking systems cover the aspect of tracking a vehicle by
using GPS satellites and an Android smart phone application. They do not deal
with application areas such as woman safety, and they cannot identify the
vehicles located within a threshold distance of our vehicle or device [9–11].
3 Proposed System
3.1 Architecture
The architecture of the proposed system can be described as follows (Fig. 1).
i. A code snippet sends the geo-coordinates, the associated address obtained
by reverse geocoding, and the time to our web server. The address is obtained
from the GPS coordinates, and the data is then inserted into the dedicated
database. This data is used for the woman safety module and for threshold
distance-based tracking.
ii. Whenever a user wants to track the vehicle, he enters the track ID in the
application or website to get its position, which can be viewed in the Google
Map associated with it.
iii. To confine a track ID to a specific device, MAC-based authentication is
used, in which the user has to register the device with the track ID by
entering the MAC address of the mobile.
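The MAC-based registration check of step (iii) might be sketched as follows; the paper implements the server side in PHP with a MySQL table, so the Python in-memory store and the function names here are purely illustrative.

```python
# registered_devices maps each track ID (vehicle number) to the MAC
# address supplied at registration time (hypothetical in-memory store;
# the paper keeps this in a database table instead).
registered_devices = {}

def register(track_id, mac_address):
    """Bind a track ID to exactly one smart phone's MAC address."""
    if track_id in registered_devices:
        return False                      # track ID already confined to a device
    registered_devices[track_id] = mac_address.lower()
    return True

def is_authorized(track_id, mac_address):
    """Accept location updates only from the device registered for this track ID."""
    return registered_devices.get(track_id) == mac_address.lower()
```

Rejecting a second registration for the same track ID is what confines the ID to a single phone, as the abstract describes.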
The Android application retrieves the GPS location of the device every 5 min and
sends the location details to the database along with the device ID, which is
basically a vehicle number/service number. Alternatively, the bus driver starts
the bus by entering the service number/vehicle number and starting the
application, so that the Android application starts tracking the bus location
every 5 min and sending it to our database. We provide a website to register the
bus by entering an ID and other details. Users can track the bus by entering the
ID via our application, or they can use a normal browser,
A Novel Technique of Threshold Distance-Based Vehicle... 571
where markers can be seen on the Google Map. The map can be set to satellite
view for more clarity. By tapping on a marker, users can get information such as
the bus ID, which is basically the vehicle number issued by the Road Transport
Authority, and the service number, from which the route the bus travels can be
known; they also get the time and address where the bus is located, as well as
the path along which the bus travels once tracking starts (Fig. 2).
Dealing with the aspect of women's safety, this application can protect a woman.
Women travelling in unknown vehicles can ensure their safety with our
application. Before entering the vehicle, a woman should enter the vehicle
number and start the application. She also has to enter the mobile number of a
contact person, typically a parent, who should know the vehicle ID in order to
track the vehicle. As soon as she enters the track ID and mobile number, she
taps the track button, and the application sends this vehicle number, which is
the track ID, as an SMS to the entered mobile number. Then, by entering the
vehicle number on our website or in the Android application, the contact can
track the vehicle if there is any problem (Fig. 3).
Consider a scenario in which a person is stuck on a highway and wants to know
which buses are available within a specified threshold distance. Through our
Android application or website, he can find the vehicles within the threshold
distance. By taking the service number and checking the route, he can retrieve
the source and destination of a bus. This is done by taking his own location and
running data analytics against the location data in our database, selecting only
locations at most 5 km away from his location. In this way, he or she can know
which vehicles are approaching and, from the service number, the path each bus
travels. We also provide the possible path from the user's location to a vehicle
within the threshold distance, which can be viewed in the Google Map provided in
both the Android application and the website (Fig. 4).
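The 5 km filter described above amounts to a great-circle distance test against the latest coordinates in the database; a sketch follows, in which the haversine formula and the dict-based vehicle store are our assumptions rather than the paper's actual code.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS coordinates, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def vehicles_within(user, vehicles, threshold_km=5.0):
    """Filter latest vehicle positions {id: (lat, lon)} to those within range."""
    return [vid for vid, (lat, lon) in vehicles.items()
            if haversine_km(user[0], user[1], lat, lon) <= threshold_km]
```

In a deployment this query would run against the most recent row per vehicle in the locations table, returning the IDs whose markers are then drawn on the map.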
4 Experimental Results
The experimental results were obtained practically and are presented for several
scenarios as follows.
1. Testing smartphone application.
2. Testing web server.
3. Testing results in both Android and website.
4. Testing all the modules.
Testing objectives and modules and their related data and status are given in
Table 1.
Real-time tracking screenshots of the proposed tracking system and the mobile
interface are shown in Figs. 5, 6, 7 and 8.
Adding a fuel level sensor to this system enables prediction of the distance the
vehicle can travel and of whether it can reach the nearest refilling centre.
Based on the data available in the database, we can calculate the speed and
distance travelled by the vehicle. A person's frequent locations can also be
predicted using data analytics, i.e. from the greater presence of the user in
one location. We can build an anti-theft module by using a microcontroller with
a GSM module, and implement this tracking system for deliveries to track items.
Other possibilities include finding the nearest ambulance or school buses with
the threshold distance module, finding the nearest water bodies using satellite
images, and developing an application that lists the buses arriving at a
particular station within five minutes.
5 Conclusions
The tracking system gives companies the ability to locate their vehicles and
retrieve their exact positions. It can be utilized by organizations that deal
with product delivery, and the recent evolution of online cab services can use
this system too. Even a group of people can keep a log of their location details
so that they know how far apart from each other they are. It is also useful in
investigation-related activities carried out by police or military departments.
In the future, we can even expect a default option in smart phones to enable a
location log tied to the unique ID or registered mobile number itself: a
centralized system capable of giving complete location information for a mobile,
which can be useful in robbery situations or to know the exact location of the
person who is calling (CONSTRAINT: the mobile number is the track ID). We can
even use GLONASS and GALILEO, which are more accurate than GPS, but their
receivers cost more. The same system can be developed by using a GPS + GPRS
module and a microcontroller board.
Author Index

A
Aakunoori Suryanandh, 239
Abinash Sahoo, 273, 299, 319, 329, 339, 355
Adi Narayana Reddy, K., 107
Aiswarya Mishra, 329
Akash Naik, 299
Aluri Lakshmi, 179
Amit Gupta, 207
Amtul B. Ifra, 419
Anisha, P. R., 409
Anjali Singhal, 29
Ankit Yadav, 365
Ansuman Mahapatra, 483, 491
Anuradha, T., 447
Anuradha, Y., 475
Arkajyoti Ray, 273
Ashoka Kumar Ratha, 349
Ashutosh Kumar Dubey, 85, 115

B
Bala Sundar, T., 483
Balendra Mouli Marrapu, 309
Bharathi Uppalapati, 39
Bhimala Raghava, 465
Bhoomika, S. S., 73

C
Chaitanya P. Agrawal, 157
Chakravarthy, V. V. S. S. S., 567
Chilupuri Supriya, 239
Chirag Arora, 1, 383

D
Deba Prakash Satapathy, 273, 329, 339, 355
Deepanshi Agarwal, 29
Deepti Barhate, 85, 115
Deva Kumar, I., 455
Devi Sowmya, M., 455
Dharmesh Shah, 261
Disha Singh, 365
Dutta Sai Eswari, 283

E
Ebin Deni Raj, 93
Esha Singh, 29

G
Gaddam Samatha, 437
Gajavalli, J., 501
Ghousia Begum, 409
Gnana Manoharan, E., 547
Godavarthi Sri Sai Vikas, 447
Gopal Krishna Sahoo, 339
Gopal Rao Kulkarni, 437

H
Harapriya Swain, 355
Hardeep Kaur, 535
Hari Shankar Chandran, 373
Hyma, J., 475

I
Indrasena Reddy, M., 135
Ippili Saikrishna Amacharyulu, 309

J
Jeet Santosh Nimbhorkar, 61
Jeevesh, K., 61
Jeyalaksshmi, S., 501, 515
Juthika Mahanta, 171

K
Kakunuri Sandya, 187
Kalyani, G., 455
Kavitha, D., 373
Kirill Krinkin, 147
Kirti Walia, 523
Kishore, T. S., 125
Kishor Kumar Reddy, C., 409
Konda Srikar Goud, 283
Kranthi, A., 217
Krishna Kishore, P., 283
Krishna Rao, S., 567
Kurapati Sreenivas Aravind, 61

L
Lakshmana Rao, K., 125
Lakshmi, L., 51
Lakshmi Ramani, B., 465
Latha, D., 179
Lingala Thirupathi, 391, 401
Lolla Kiran Kumar, 1

M
Madiha Sadaf, 419
Mallam Gurudeep, 437
Mandala Nischitha, 239
Mavoori Hitesh Kumar, 299, 319
Mitta Yogitha, 239
Mohan Gopal Raje Urs, 147
Mothe Rajesh, 559
Murali Nath, R. S., 51

N
Naga Kalyani, A., 51
Nagarampalli Manoj Kumar, 299
Naga Satish, G., 51
Nageswara Rao, A. V., 207
Nalini Kanta Barpanda, 349
Nasaka Ravi Praneeth, 447
Naveen Kumar Laskari, 107

P
Padmaja Usharani, D., 227
Padma Vasavi, K., 197
Poornima, K. M., 73
Prabira Kumar Sethy, 349
Prameet Kumar Nanda, 319
Prathima, K., 135, 283
Praveen, P., 239
Preeti, C. M., 391
Priyanka, 523
Priyashree Ekka, 319

R
Raghavaiah, B., 207
Raghavender Raju, L., 251
Rajat Valecha, 29
Raju, Bh. V. S. R. K., 567
Rakesh, B., 135
Ramakrishna Murty, M., 475
Rambabu, D., 391
Rambabu Pemula, 227
Ranjan Mishra, S., 475
Ravi Mohan Sharma, 157
Ravinder Kaur, 535
Ravinder Reddy, R., 251
Ravuri Naveen Kumar, 447
Rekha, G., 401
Remya Raveendran, 93
Ritika Malik, 29
Rohith Kumar Jayana, 465

S
Sachin Sharma, 261
Sagenela Vijaya Kumar, 227
Sai Rashitha Sree, J., 455
Sai Vignesh, P. J., 515
Sakshi Zanje, 261
Sandeep Ravikanti, 437
Sandeep Samantara, 339
Sandeep Samantaray, 273, 299, 319, 329, 355
Santi Kumari Behera, 349
Satwik Kaza, 465
Sekhar, B. V. D. S., 567
Senthil Arumugam Muthukumaraswamy, 15
Shanmuga Sundari, M., 217

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3