Proceedings of Fourth International Conference on Computer and Communication Technologies
IC3T 2022
Lecture Notes in Networks and Systems
Volume 606
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to
both the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose ([email protected]).
K. Ashoka Reddy · B. Rama Devi · Boby George ·
K. Srujan Raju · Mathini Sellathurai
Editors
Proceedings of Fourth International Conference on Computer and Communication Technologies
IC3T 2022
Editors
K. Ashoka Reddy
Kakatiya Institute of Technology and Science
Warangal, India

B. Rama Devi
Department of Electronics and Communication Engineering
Kakatiya Institute of Technology and Science
Warangal, India

Boby George
Department of Electrical and Electronics Engineering
Indian Institute of Technology
Chennai, Tamil Nadu, India

K. Srujan Raju
Department of Computer Science and Engineering
CMR Technical Campus
Hyderabad, Telangana, India

Mathini Sellathurai
Department of Signal Processing
Heriot-Watt University
Edinburgh, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
in Embedded System Design in Current Scenario.” These public talks were very
accessible to a general audience. In addition, notably, this was the third conference
at KITSW, and a formal session was held on the first day to honor the event as well as
those who were instrumental in initiating the conference.
Generous support for the conference was provided by Captain V. Lakshmikantha
Rao, Honorable Ex. MP (Rajya Sabha), Former Minister, and Chairman, KITS,
Warangal. The funds were sizeable, timely, greatly appreciated, and permitted us to
support a significant number of young scientists (postdocs and students) and persons
from developing/disadvantaged countries. Nevertheless, the number of requests was
far greater than the total support available (by about a factor of five!), and we had to
turn down many financial requests. We encourage the organizers of the next IC3T
to seek a higher level of funding for supporting young scientists and scientists from
developing/disadvantaged countries. All in all, the Springer Scopus Indexed 4th
IC3T 2022 in Warangal was very successful. The plenary lectures and the progress
and special reports bridged the gap between the different fields of Computers and
Communication Technology, making it possible for non-experts in a given area to
gain insight into new areas. Also, included among the speakers were several young
scientists, namely postdocs and students, who brought new perspectives to their fields.
The next IC3T will take place in Warangal in 2023, and the trend is to be continued every
year. Given the rapidity with which science is advancing in all of the areas covered
by IC3T 2022, we expect that these future conferences will be as stimulating as this
most recent one was, as indicated by the contributions presented in this proceedings
volume.
We would also like to thank the authors and participants of this conference, who
supported the conference despite all hardships. Finally, we would like to thank all
the reviewers, session chairs, and volunteers, who made tireless efforts to meet the
deadlines and to arrange every detail so that the conference ran smoothly.
Dr. B. Rama Devi received her Ph.D. from JNTUH College of Engineering, Hyderabad,
Telangana, India, in April 2016. She completed her M.Tech. in Digital Communication
Engineering from Kakatiya University, Warangal, in 2007, and joined the faculty of
Electronics and Communication Engineering, KITSW, in the same year. Currently,
she is working as Professor and Head, Department of ECE. She has published more than 40
papers in various journals and conferences, filed four patents, published three books,
and acted as Session Chair for various international conferences. Her areas of interest
include wireless communication, wireless networks, signal processing for communications,
medical body area networks, and smart grid. She is an active reviewer for IEEE
Transactions on Vehicular Technology (TVT), Elsevier, Wireless Personal Communications,
and Springer journals.
Dr. Boby George received the M.Tech. and Ph.D. degrees in Electrical Engineering
from the Indian Institute of Technology (IIT) Madras, Chennai, India, in 2003 and
2007, respectively. He was a Postdoctoral Fellow with the Institute of Electrical
Measurement and Measurement Signal Processing, Technical University of Graz,
Graz, Austria, from 2007 to 2010. He joined the faculty of the Department of Electrical
Engineering, IIT Madras, in 2010, where he is currently working as a Professor.
His areas of interest include magnetic and electric field-based sensing approaches,
sensor interface circuits/signal conditioning circuits, and sensors and instrumentation
for automotive and industrial applications. He has co-authored more than 75 IEEE
transactions/journal papers. He is an Associate Editor for the IEEE Sensors Journal, IEEE
Transactions on Industrial Electronics, and IEEE Transactions on Instrumentation
and Measurement.
Dr. K. Srujan Raju is currently working as Dean, Student Welfare, and heading the
Department of Computer Science and Engineering and Information Technology at
CMR Technical Campus. He obtained his Doctorate in Computer Science in the
area of Network Security and has more than 20 years of experience in academics and
research. His research interests include computer networks, information security,
data mining, cognitive radio networks, image processing, and programming languages.
Dr. Raju is presently working on two projects funded by the Government of
India under CSRI and NSTMIS. He has also filed seven patents and one copyright at
the Indian Patent Office, edited more than 14 books published in Springer's AISC,
LAIS, and other book proceedings series, which are indexed by Scopus, authored
books on C Programming and Data Structure, Exploring to Internet, and Hacking Secrets,
contributed chapters to various books, and published more than 30 papers in reputed
peer-reviewed journals and international conferences. Dr. Raju has been invited as Session
Chair, Keynote Speaker, Technical Program Committee member, Track Manager,
and reviewer for many national and international conferences, and has been appointed as
subject Expert by CEPTAM DRDO—Delhi and CDAC. He has undergone specific
training conducted by Wipro Mission 10X and NITTTR, Chennai, which helped
his involvement with students and is very conducive to solving their day-to-day
problems. He has guided various student clubs for activities ranging from photography
to hackathons and has mentored more than 100 students in incubating cutting-edge
solutions. He has organized many conferences, FDPs, workshops, and symposiums,
and has established a Centre of Excellence in IoT and Data Analytics. Dr. Raju is
a member of various professional bodies, received the Significant Contributor Award
and the Active Young Member Award from the Computer Society of India, and has served
as a Management Committee member, State Student Coordinator, and Secretary of the
CSI—Hyderabad Chapter.
Dr. Mathini Sellathurai is currently the Dean of Science and Engineering and
the Head of the Signal Processing for Intelligent Systems and Communications
Research Group, Heriot-Watt University, Edinburgh, UK, leading research in
signal processing for radar and wireless communication networks. Professor Sellathurai
has 5 years of industrial research experience, having held positions with
Bell Laboratories, New Jersey, USA, and with the Canadian (Government) Communications
Research Centre, Ottawa, Canada. She was an Associate Editor for the
IEEE Transactions on Signal Processing (2005–2018) and a member of the IEEE Signal
Processing for Communications Technical Committee (2013–2018). She was an organizer
of the IEEE International Workshop on Cognitive Wireless Systems, IIT Delhi,
India, in 2009, 2010, and 2013, and the General Chair of the 2016 IEEE Workshop on
Signal Processing Advances in Wireless Communications (SPAWC), Edinburgh,
UK. She is also a peer review college member and a Strategic Advisory Committee
member for Information and Communications Technology of the Engineering and
Physical Sciences Research Council, UK. Professor Sellathurai has published over 200
peer-reviewed papers in leading international journals and IEEE conferences, as well
as a research monograph. She was the recipient of the IEEE Communications Society
Fred W. Ellersick Best Paper Award in 2005, the Industry Canada Public Service Award
for her contributions to science and technology in 2005, and awards for contributions
to technology transfer to industries in 2004. She was also the recipient
of the Natural Sciences and Engineering Research Council of Canada's doctoral
award for her Ph.D. dissertation. Her research has been funded by the UK Engineering
and Physical Sciences Research Council through projects titled "A Unified Multiple Access
Framework for Next Generation Mobile Networks By Removing Orthogonality"; "Large Scale
Antenna Systems Made Practical: Advanced Signal Processing for Compact Deployments";
"A Systematic Study of Physical Layer Network Coding: From Information-theoretic
Understanding to Practical DSP Algorithm Design"; "Advanced Signal Processing
Techniques for Multi-user Multiple-input Multiple-output Broadband Wireless
Communications"; "Bridging the gap between design and implementation of soft-detectors
for Turbo-MIMO wireless systems"; and "Signal Processing Techniques to Reduce the
Clutter Competition in Forward Looking Radar".
Contributors
Plant Diseases Detection Using Transfer Learning
S. Divya Meena et al.
1 Introduction
Agriculture is vital in every economy, since it plays a major role in supplying food and
income to a large portion of the population. Plant diseases have become a major source
of concern in the agricultural sector because they generally cause crop damage,
leading to a reduction in the availability and value of the food produced [TMLAI]. Crop
product quantity and quality have a direct impact on people's daily living conditions
[1]. The worsening diversity has changed the environmental structure to a greater
extent in the past years, which has paved the way for widespread outbreaks of
agricultural diseases and pests. Even pathologists and agriculturists may not be able
to spot the illnesses that have afflicted plants just by looking at disease-affected leaves,
owing to the wide diversity of crop products. Visual inspection is
still the major method of disease detection in rural parts of poor nations [2], but it
involves various processes which are unsuitable, making it unfeasible for
many medium-sized farms throughout the world [3]. According to the Food
and Agriculture Organization of the United Nations (FAO), around 40% of plants
are lost each year due to sickness and pests [4]. The prevention of crop losses is
becoming a center of research today, as noted in [5], since it is an issue closely
connected with climate change, food standards, and protection of the
environment. As a result, it is so important and impactful for productivity that it has
become a key instrument in precision agriculture (PA) [6]. It is easy to determine a
crop's insufficiency just with the help of an image of the leaf. As a result, finding
a quick, easy-to-use, and low-cost method to automatically detect plant illnesses is
critical and realistic.
To control the problem, an automatic, inexpensive, and accurate detection
system that predicts the illness from images of any part of a plant and suggests
an apt pesticide as a possible solution is essential [7]. Even though image processing
techniques are effective in detecting plant illnesses, they are subject to differences
in leaf pictures due to form, texture, image noise, and other factors.
Machine learning techniques may also be used to classify plant diseases utilizing
a variety of feature sets. Before the features can be extracted efficiently,
preprocessing steps such as picture enhancement, color modification, and segmentation
are required [8]. Many classifiers, such as random forest, support vector machine,
artificial neural network, and deep neural network models, may be employed after
feature extraction. Traditional machine learning algorithms for disease diagnosis are
very challenging to implement. As a result, deep learning methods can assist in
overcoming these challenges and in developing a better and expert system for
agricultural growth. Many deep learning concepts have been applied to the agricultural
area in recent years to solve problems including insect identification, fruit detection,
plant leaf categorization, and fruit disease detection, among others. To design a plant
disease detection system, pictures of other parts of the plants can also be taken, but the
most common and easiest portion of a plant from which to detect the sickness of a
particular plant is its leaves. As a result, we have used the leaves as samples in this
study to identify diseased crops [7].
This paper proposes a system based on deep learning for detecting and classifying
plant diseases. The MobileNet and ResNet models are used to analyze performance
on a minimal, memory-efficient interface. The MobileNetV1 architecture
can attain better accuracy rates compared to ResNet while decreasing the number of
parameters and computations. Images of plant leaves from 14 crops were considered
in 38 different classes depending on their state of fitness and illness categories. To
ensure broad applicability of the suggested paradigm, the publicly accessible resources
include pictures from multiple countries' archives. To establish a strong foundation,
the photographs include both laboratory and field images. The following points
summarize the contributions of this work:
• A transfer learning concept is employed in the proposed methodology by fitting
the data into four CNN models, and the proposed system classifies the leaves
depending on the disease.
• With the emergence of smart applications, a very simple web application has been
designed to give an enhanced farming platform and assistance for recognizing plant
pathogens.
• For numerous iterations, a vast dataset of gathered photos with diverse characteristics
is used to analyze the deep learning designs.
• Different neural network models are compared in these studies.
The paper is structured as follows: Section 2 deals with the literature review,
Sect. 3 with the proposed work methodology, Sect. 4 with the framework analysis
and datasets, Sect. 5 with the results, and Sect. 6 with the conclusion and future
research recommendations.
2 Literature Review
Although many researchers have been focusing on machine learning methods to detect
a vast range of materials, very few studies have utilized a MobileNet network, either
with or without transfer learning, to predict plant illnesses. Pest analysis demands
the statistical study of vast amounts of data to determine the association of multiple
components in order to derive guidelines for protection. Manual identification
techniques have a plethora of issues, such as being applicable only to limited-size
plantations. Moreover, the experience of employees varies a lot, resulting in
inappropriate data on plant diseases and pests, negating the effort and resulting in
agronomic losses. Most of the other research in this area, built on prevailing deep
networks, has in the majority of cases been constrained to computer systems with
substantial storage and computational resources. SVM models, which require
hand-constructed features to differentiate the classes, have long been used to recognize
many plant diseases, including grape leaf diseases, palm oil leaf diseases, potato
blight illnesses, and so on. Singh et al. [9] demonstrated a multilayer CNN model to
classify the leaves of mango trees afflicted by bacterial blight, with the help of a
dataset containing real-time images of both afflicted and uninfected leaves. Mohanty
et al. [10] employed a deep learning architecture for diagnosing 26 infections in 14
different crops with 99.35% estimated accuracy. Barbedo [11] explored the challenges
of detecting plant disease using visible-range imagery; the author examined numerous
issues connected to plant disease recognition in this work. La et al. [12] described a
new method for detecting rice illnesses: to diagnose ten rice ailments, 500 pictures of
healthy and sick rice leaves were used, and the suggested CNN achieves good accuracy
under tenfold cross-validation.
Ma et al. [13] used a deep learning model for symptom-based categorization to
classify four diseases of cucumber: downy mildew, anthracnose, target leaf
spots, and powdery mildew. Dan et al. [14] used an updated version of the MobileNet
V2 algorithm for photo recognition to assess 11 different Lycium barbarum illnesses
and pests. For this experiment, a total of 1955 photographs were taken, which were
subsequently spatially transformed into 18,720. Their recommended solution,
SEMobileNet V2, has a 98.23% accuracy rate, which is greater than previous testing
in this sector. Deep neural networks were also trained by Rangarajan et al. [15]:
GoogleNet, AlexNet, and VGGNet, three excellent deep learning architectures,
were used to detect various kinds of plant disease, and their combined method
was able to reach an overall accuracy of 80%.
A smaller amount of time and effort has gone into determining the extent of
stress, which Kranz [16] and Bock et al. [17] believe is critical for controlling pest
infestations, estimating harvest, and suggesting control remedies, as well as for
understanding fundamental biological processes like coevolution and plant disease
causation [17]. This input is severely limited owing to a scarcity of reliable information
that includes these crucial data. All of the prior research and published findings
are encouraging, but more inventive and improved solutions in the field of plant
disease identification are still needed. Disease detection and categorization with high
accuracy rely on sophisticated neural network designs. Such automated analysis
methods should be evaluated with a large number of crops in diverse classes
and scanning circumstances to increase their durability and efficacy. As a result, the
algorithms suggested in this research increase both the efficiency and the classification
accuracy for plant disease photos by building on the cited research and pertinent data.
To address the stated difficulties, this research provides an image processing approach
for plant disease and pest detection.
3 Proposed Model
CNN-based models are the deep learning methods used for identifying and
classifying plant diseases from images. CNN models are used for most image
processing work because of their accuracy in predicting the classes of an image.
CNN models (Fig. 1) consist of several layers, and numerous designs have
been implemented to obtain accurate results. In this project, various CNN models
are used. The workflow of the project is shown in Fig. 1.
Table 1 Information regarding class and label

Classification category | Label information
Presence of diseases | 0—Healthy; 1—Unhealthy
Data augmentation is the most effective method for boosting the amount of training
data by changing an existing dataset to create a new one. Because deep neural
networks are extremely data-hungry models, they necessitate a significant amount
of data to provide correct results. The Kaggle dataset contains a series of photos that
have been augmented using the data augmentation approach [18], which involves
making modest adjustments to the images such as image flipping, color augmentation,
rotation, scaling, and so on. This updated, fresh dataset is used to train the
models. The model will regard each little change as a new image, resulting in more
accurate and better outcomes on unseen data. The augmented images are shown in
Fig. 2.
Fig. 2 Sample augmented images: a width shift, b horizontal and vertical flip, c zoom, and d color
adjustment
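As a rough illustration of the augmentation settings described above, a minimal sketch using the Keras preprocessing API cited in [18] is given below; the parameter values, directory path, and batch size are assumptions for illustration, not the authors' exact configuration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical augmentation settings mirroring Fig. 2: width shift,
# horizontal/vertical flips, zoom, and brightness (color) adjustment.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,           # normalize pixel values to [0, 1]
    width_shift_range=0.1,       # (a) width shift
    horizontal_flip=True,        # (b) horizontal flip
    vertical_flip=True,          # (b) vertical flip
    zoom_range=0.2,              # (c) zoom
    brightness_range=(0.8, 1.2), # (d) color/brightness adjustment
)

# Stream augmented batches directly from the image folders.
train_generator = train_datagen.flow_from_directory(
    "dataset/train",             # hypothetical path
    target_size=(224, 224),      # MobileNetV1 input size used in the paper
    batch_size=32,
    class_mode="categorical",
)
```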
In today's world, transfer learning is a useful deep learning technique in which a CNN
model that has previously been constructed for one job is used as a starting point for a
model for a different task. Employing pre-trained models is a productive technique,
for example for natural language processing (NLP) tasks, as rebuilding network models
from scratch takes a long time [19]. Transfer learning is frequently used when dealing
with predictive modeling issues in which picture data is used as input. As demonstrated
in Fig. 3, this might be a prediction job with photos or video data as input. Transfer
learning reduces time by eliminating the need to analyze huge amounts of data, since
it builds on existing knowledge [20]. We get better and more accurate outcomes as a
result. MobileNet, ResNet, and EfficientNet are three well-known and useful designs.
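A minimal sketch of the transfer learning setup described above, assuming Keras/TensorFlow, an ImageNet-pre-trained MobileNet backbone, and the 38 classes mentioned in the paper; the head layers, dropout rate, and optimizer settings are illustrative assumptions rather than the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 38  # health/disease classes considered in the paper

# Start from a MobileNet pre-trained on ImageNet and drop its classifier head.
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the transferred feature extractor

# Attach a small task-specific head for plant disease classification.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),                          # illustrative regularization
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # 0.001, as in the paper
    loss="categorical_crossentropy",                         # Eq. (1)
    metrics=["accuracy"],
)
```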
MobileNetV1
TensorFlow's first mobile computer vision model, MobileNet, was created by a team
of Google engineers specifically for mobile applications. It relies on depthwise separable
convolutions: compared to a regular network of similar depth, the number of parameters
is substantially decreased, which results in a lightweight neural network [21]. When
depthwise and pointwise convolutions are counted independently, a MobileNet has
28 layers. In a typical MobileNet, which is an upgraded version of other current models,
the width multiplier hyperparameter may be modified or tweaked to lower the number
of parameters (4.2 million). The size of the input picture is 224 * 224 * 3 pixels [22].
The MobileNetV1 architecture is depicted in Fig. 4.
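The parameter saving from depthwise separable convolutions can be sketched as follows; this is a hedged illustration with arbitrary example shapes, not the paper's code.

```python
from tensorflow.keras import layers, models

def standard_block(filters):
    # Ordinary 3x3 convolution: one kernel per (input channel, output channel) pair.
    return layers.Conv2D(filters, 3, padding="same", activation="relu")

def depthwise_separable_block(filters):
    # MobileNet-style factorization: a 3x3 depthwise convolution followed
    # by a 1x1 pointwise convolution, which needs far fewer parameters.
    return models.Sequential([
        layers.DepthwiseConv2D(3, padding="same", activation="relu"),
        layers.Conv2D(filters, 1, padding="same", activation="relu"),
    ])

# Compare parameter counts on a 112x112x64 feature map (illustrative shape).
inp_shape = (112, 112, 64)
std = models.Sequential([layers.Input(inp_shape), standard_block(128)])
sep = models.Sequential([layers.Input(inp_shape), depthwise_separable_block(128)])
print("standard conv params:", std.count_params())   # ~73,856
print("separable conv params:", sep.count_params())  # ~8,960
```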
ResNet34
ResNet34 is a state-of-the-art 34-layer convolutional neural network for image
classification. Its core building unit is the residual block: residual neural networks
(ResNets) are artificial neural networks (ANNs) built by stacking residual blocks
on top of each other [22]. ResNet34's 34 layers comprise 33 convolutional layers
along with a 3 × 3 max-pooling layer, an average-pooling layer, and a fully connected
layer. In the "Basic Block," rectified linear unit (ReLU) activation and batch
normalization (BN) follow the convolutional layers, and the sigmoid activation function
is applied to the final layer in the typical manner. The ResNet34 model has 63.5 million
parameters. The ResNet model is trained using residuals, which are the differences
between a layer's output and its input. The input shape for each ResNet34 model is
150 × 150 × 3. The residual building component consists of several convolutional
layers (Conv), batch normalizations (BN), a rectified linear unit (ReLU) activation
function, and a shortcut connection. Figure 5 depicts the ResNet34 architecture.
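A hedged sketch of the residual "Basic Block" described above, written with the Keras functional API; it is an illustration of the general block structure, not the authors' exact implementation.

```python
from tensorflow.keras import layers

def basic_block(x, filters, stride=1):
    """Residual basic block: two 3x3 convolutions with BN/ReLU plus a shortcut."""
    shortcut = x

    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)

    y = layers.Conv2D(filters, 3, strides=1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    # Project the shortcut when the spatial size or channel count changes.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)

    # Add the shortcut (identity mapping) and apply the final ReLU.
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)

# Usage (illustrative): stack blocks after a stem convolution, e.g.
#   inp = layers.Input((150, 150, 3))
#   out = basic_block(layers.Conv2D(64, 7, strides=2, padding="same")(inp), 64)
```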
4 Experimental Framework
A deep learning-based framework for identifying and quantifying plant diseases
was analyzed on multiple leaf photos from diverse crops, on a device with 64 GB
RAM, an Intel Xeon CPU, and a 64-bit Windows 10 operating system. Python is the
programming language, used in an Anaconda Jupyter Notebook with the TensorFlow,
Keras, PyTorch, and other libraries. A learning rate of 0.001 is used in this project.
The loss and optimization functions used in our implementation are discussed in this
section.
Loss Function
To learn, machines use a loss function. It is a technique for determining how
well a selected set of rules replicates the data. The loss function returns a large
value if the forecasts are too far off from the actual circumstances. With the help
of an optimization function, the loss function gradually adapts to decrease
prediction errors. Cross-entropy [25] has been employed as the loss function in this
paper. Since the images are divided into various groups, categorical cross-entropy
[26] is used. The error is computed using the loss function for each class and
ranges from 0 to 1. Categorical cross-entropy is expressed mathematically as Eq. (1):

CE = −∑_{i=1}^{output size} y_i · log(ŷ_i)    (1)

where y_i is the ground-truth label and ŷ_i is the predicted probability of class i.
W denotes weight, b denotes bias, and V is the rate at which the gradient descent takes
place. After each epoch, the optimizer's update equations are utilized to adjust the
weights and biases of each layer. η is the learning rate, and Epsilon ε (ε = 10^−8) is a
parameter that prevents division by zero.
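A small numerical illustration of Eq. (1), assuming one-hot labels; the class counts and probability values below are purely illustrative.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-8):
    """Eq. (1): CE = -sum_i y_i * log(y_hat_i), averaged over the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)          # eps guards against log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# One-hot ground truth for 3 samples over 4 hypothetical classes.
y_true = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]])
# Softmax-like predicted probabilities.
y_pred = np.array([[0.90, 0.05, 0.03, 0.02],
                   [0.10, 0.10, 0.70, 0.10],
                   [0.25, 0.60, 0.10, 0.05]])

print(round(categorical_cross_entropy(y_true, y_pred), 4))  # ≈ 0.3243
```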
Dataset Description
Three different datasets have been used for the training and testing process. Table 2
describes the dataset, and Fig. 6 illustrates the sample of each image class.
The trained deep learning model was evaluated using all photos from the
validation set as well as unseen test images. True Positive (TP), False Positive (FP),
True Negative (TN), False Negative (FN), accuracy, recall, precision, and F1-score
are the metrics that are evaluated. Data labels that were correctly predicted with
respect to the ground truth are referred to as TP. Negative data labels that were
incorrectly predicted and categorized into a separate image label category are
referred to as FP. Negative data samples that have been correctly predicted are
referred to as TN. FN stands for positive data labels that were incorrectly predicted.
The outcomes of training and testing the neural networks are shown below for both
networks. Data with both ground-truth and predicted identities is assessed using
confusion matrices.
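A minimal sketch of how these metrics follow from the TP/FP/TN/FN counts; the definitions are standard and the example counts are hypothetical, not taken from the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, specificity, and F1 from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return dict(accuracy=accuracy, precision=precision,
                recall=recall, specificity=specificity, f1=f1)

# Hypothetical per-class counts for one disease class.
print(classification_metrics(tp=95, fp=3, tn=980, fn=5))
```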
Table 3 shows the recall, precision, specificity, and F1-score for each class. Specificity,
recall, precision, and F1-score had average values of 99.72%, 93.26%, 92.88%, and
92.98%, respectively. The trained deep learning model achieved an overall average
accuracy of 98.91% on the test dataset. Table 4 shows the various performance
parameters estimated on the test set of photos; the trained ResNet34 model surpassed
the other model in performance. Table 5 shows the time it takes for the different
models to process the identical collection of photos used in the suggested study.
Frames are processed quickly by the MobileNetV1 deep learning model.
Table 3 Performance evaluation of the test dataset

Class | Specificity (%) | Recall (%) | Precision (%) | Accuracy (%) | F1-score (%)
Apple black rot | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Apple cedar rust | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Apple healthy | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Apple scab | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Blueberry health | 100.00 | 97.67 | 100.00 | 96.02 | 98.82
Cherry health | 99.2 | 100.00 | 100.00 | 93.40 | 98.76
Cherry powdery mildew | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Corn cercospora spot | 99.96 | 100.00 | 97.80 | 100.00 | 98.88
Corn common rust | 99.34 | 65.44 | 74.32 | 78.33 | 69.84
Corn healthy | 99.92 | 96.25 | 88.50 | 92.00 | 92.19
Corn Northern leaf blight | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Grape black rot | 100.00 | 97.54 | 100.00 | 67.44 | 100.00
Grape esca | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Grape healthy | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Grape leaf blight | 99.92 | 93.45 | 65.55 | 78.45 | 81.50
Orange Haunglongbing | 97.56 | 67.13 | 74.34 | 97.44 | 98.45
Peach bacterial spot | 99.96 | 100.00 | 98.76 | 100.00 | 98.31
Peach healthy | 99.90 | 76.44 | 71.35 | 100.00 | 63.53
Pepper bell bacterial spot | 99.96 | 100.00 | 88.93 | 74.62 | 100.00
Pepper bell healthy | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Potato early blight | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Potato late blight | 99.92 | 81.00 | 78.31 | 65.78 | 100.00
Potato healthy | 99.96 | 100.00 | 94.76 | 100.00 | 67.34
Raspberry health | 100.00 | 67.44 | 87.40 | 74.99 | 62.45
Soybean health | 99.95 | 100.00 | 67.98 | 98.45 | 78.11
Squash powdery_mildew | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Strawberry leaf_scorch | 100.00 | 99.34 | 99.45 | 67.33 | 74.77
Strawberry health | 100.00 | 99.45 | 98.45 | 97.56 | 99.34
Tomato bacterial spot | 99.97 | 99.85 | 98.46 | 99.12 | 99.32
Tomato early blight | 99.94 | 100.00 | 98.88 | 99.74 | 99.81
Tomato late blight | 99.83 | 78.65 | 89.54 | 89.99 | 99.99
Tomato leaf_mold | 99.96 | 100.00 | 100.00 | 88.88 | 100.00
Tomato seporia leaf spot | 99.56 | 88.91 | 87.69 | 85.99 | 99.45
Tomato spider_mites | 99.39 | 87.88 | 88.93 | 88.45 | 85.38
Tomato target spot | 100.00 | 99.92 | 98.45 | 98.67 | 98.51
Tomato leaf curl virus | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Tomato mosaic virus | 100.00 | 99.97 | 98.88 | 99.95 | 99.95
Tomato health | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Citrus black spot | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Citrus canker | 99.36 | 99.96 | 99.98 | 99.95 | 99.95
Citrus greening | 99.26 | 100.00 | 78.99 | 87.53 | 80.23
Citrus scab | 99.98 | 86.77 | 80.35 | 89.96 | 99.96
Rice bacterial leaf blight | 99.96 | 99.34 | 99.98 | 90.56 | 99.98
Rice brown spot | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
Rice leaf smut | 100.00 | 99.97 | 92.19 | 93.79 | 96.25
In a system with a GPU, ResNet34 took 219.037 s to handle plant leaf images with
dimensions of 128 × 128 pixels, at a frame rate of 18.932 frames per second.
The suggested approach achieved 98.91% correctness using a complex network
trained on a huge collection of crop leaf pictures from a variety of datasets, with a
wide variety of laboratory and on-field photos. Thus, such systems can be utilized
to identify plant diseases, since they provide the best accuracy with real-time
performance on a dataset with intra-class and inter-class variance. Screenshots of the
web application, which has been built using a Python framework and the deep learning
architecture, are shown in Figs. 8, 9, and 10. Plant pathogen identification and
classification approaches can be developed in the context of crop remote monitoring
in order to make timely decisions and ensure healthy crop growth. The results in Fig. 7
are the average accuracy and loss of the two different models.
Using CNNs to predict and identify agricultural illnesses is a difficult task. Numerous
innovative approaches have been developed for categorizing the agricultural pathogens
that prey on damaged crops. However, there is currently no commercially available
approach for identifying illnesses that is both trustworthy and cost-effective. A CNN-
based plant disease prediction and analysis technique is provided in this paper. Three
crop datasets are used in this study. A fully convolutional neural network is used
to create the data processing model, and the data analysis approach is improved
to assure the accuracy of the data analysis model. The simulation findings suggest
that the proposed method is able to accurately predict and detect crop disorders, with
a good network model performance and an accuracy of 98.91%. The model's training
time was far shorter than that of earlier machine learning approaches. According to
the results of the studies, the integrated segmentation and classification methods can
be used well for crop disease prediction. Overall, the proposed technique holds a lot of
promise for crop disease recognition and classification, and offers a fresh idea for the
crop disease detection process. Future work could concentrate on disease and pest
image analysis, predicting the affected surface of plant pathogens, and judging the
intensity of crop diseases and pests, so as to deliver an efficient and systematic diagnosis
and avoid significant economic losses. We also plan to implement the system on an
integrated platform so that a broad spectrum of biological diseases can be detected
quickly, allowing faster response.
References
1. Liu Y, Zhang X, Gao Y, Qu T, Shi Y. Improved CNN method for crop pest identification based
on transfer learning. https://www.hindawi.com/journals/cin/2022/9709648/
2. Hassan SM, Maji AK, Jasiński M, Leonowicz Z, Jasińska E (2021) Identification of plant-leaf
diseases using CNN and transfer-learning approach. Electronics 10(12):1388
3. Sharma P, Berwal YPS, Ghai W (2020) Performance analysis of deep learning CNN models
for disease detection in plants using image segmentation. Inf Process Agric 7(4):566–574
4. Food and Agriculture Organization of the United Nations, Plant Health and Food Security,
International Plant Protection Convention, Rome, Italy, 2017
5. Fenu G, Malloci FM (2021) Forecasting plant and crop disease: an explorative study on current
algorithms. Big Data Cogn Comput 5(1):2
6. Harvey CA, Rakotobe ZL, Rao N et al (2014) Extreme vulnerability of smallholder farmers to
agricultural risks and climate change in Madagascar. Philos Trans R Soc B: Biol Sci 369(1639).
Article ID 20130089
7. Tahamid A (2020) Tomato leaf disease detection using Resnet-50 and MobileNet architecture
(Doctoral dissertation, Brac University)
8. Camargo A, Smith J (2009) An image-processing-based algorithm to automatically identify
plant disease visual symptoms. Biosyst Eng 102:9–21
9. Singh LTP, Chouhan SS, Jain S, Jain S (2019) Multilayer convolution neural network for the
classification of mango leaves infected by anthracnose disease. IEEE Access 7:43721–43729
10. Mohanty SP, Hughes DP, Salathe M (2016) Using deep learning for image-based plant disease
detection. Front Plant Sci 7:1419
11. Barbedo JGA (2016) A review on the main challenges in automatic plant disease identification
based on visible range images. Biosyst Eng 144:52–60
12. La Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep
convolutional neural networks. Neurocomputing 267:378–384
13. Ma J, Du K, Zheng F, Zhang L, Gong Z, Sun Z (2018) A recognition method for cucumber
diseases using leaf symptom images based on deep convolutional neural network. Comp
Electron Agric 154:18–24
14. Dan B, Sun X, Liu L (2019) Diseases and pests identification of lycium barbarum using se-
mobilenet v2 algorithm. In: 2019 12th International symposium on computational intelligence
and design (ISCID), vol 1. IEEE, pp 121–125
15. Rangarajan K, Purushothaman R, Ramesh A (2018) Tomato crop diseases classification using
pre-trained deep learning algorithm. Procedia Comp Sci 133:1040–1047
16. Kranz J (1988) Measuring plant disease. In: Experimental techniques in plant disease
epidemiology. Springer, Berlin, Germany, pp 35–50
17. Bock CH, Poole GH, Parker PE, Gottwald TR (2010) Plant disease severity estimated visually,
by digital photography and image analysis, and by hyperspectral imaging. Crit Rev Plant Sci
29(2):59–107
18. Team K (2021) Keras documentation: image data preprocessing. Keras.io. https://keras.io/api/
preprocessing/image/
19. Chollet F (2018) Deep learning with Python. https://livebook.manning.com/book/deep-lea
rning-with-python/about-this-book/9
20. Brownlee J (2019, May 14) Transfer learning in keras with computer vision models. Machine
Learning Mastery. https://machinelearningmastery.com/how-to-use-transfer-learning-when-
developing-convolutional-neural-network-models/
21. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H.
https://paperswithcode.com/paper/mobilenets-efficient-convolutional-neural
22. Singh N. https://iq.opengenus.org/mobilenet-v1-architecture/
23. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv. https://arxiv.org/abs/1409.1556
24. Boesch G (2021, Aug 29) Deep residual networks (ResNet, ResNet50)—guide in 2021. Viso.ai.
https://viso.ai/deep-learning/resnet-residual-neural-network/
25. Mannor S, Peleg D, Rubinstein R (2005) The cross entropy method for classification. In:
Proceedings of the 22nd international conference on Machine learning. https://icml.cc/Confer
ences/2005/proceedings/papers/071_CrossEntropy_MannorEtAl.pdf
26. Parmar R (2018, Sept 2) Common loss functions in machine learning. Medium. https://toward
sdatascience.com/common-loss-functions-in-machine-learning-46af0ffc4d23
27. Doshi S (2020, Aug 3) Various optimization algorithms for training neural network. Medium.
https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
28. Hinton G, Srivastava N, Swersky K (n.d.) Neural networks for machine learning lecture 6a
Overview of mini-batch gradient descent. https://www.cs.toronto.edu/~tijmen/csc321/slides/
lecture_slides_lec6.pdf
29. Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Statis 22(3):400–
407. https://www.jstor.org/stable/2236626
Shuffled Frog Leap and Ant Lion Optimization for Intrusion Detection in IoT-Based WSN
D. Jayanayudu and A. Ch. Sudhir
Abstract In recent years, examining the energy and security of the nodes that make
up an Internet of Things (IoT) network has become important. However, because of the
limited resources, it is impossible to build a system that is 100% safe. An intrusion
detection system (IDS) is used to check all incoming traffic and identify network
intrusions in WSN and IoT communication networks. It is also feasible for an attacker
to steal sensors from an IoT network. To guarantee the safety of the WSN and IoT, an
effective IDS must be designed. Therefore, SFLA-ALO is proposed in this study to
identify intruders in WSN-IoT in order to safeguard against damaging malicious
attacks. The suggested SFLA-ALO surpasses the previous systems in terms of
throughput, detection rate, energy usage, and delay. MATLAB was used to assess these
instances, and the proposed scheme clearly beats existing detection systems.
1 Introduction
WSNs are a kind of wireless network in which data transfer from a source to a
base station is possible without the need for any infrastructure [1]. The Internet of
things (IoT) has lately emerged as a superset of the previously outlined pattern
of networking. Because of their distributed nature, IoT-WSN networks pose a
significant security problem [2]. In addition to data transmission and reception, these
IoT devices are used to connect numerous devices to the Internet [3]. This research is
committed to protecting an IoT network known as a WSN. In the recent decade,
machine learning and artificial intelligence-based IDSs have been extensively studied
[4]. IoT networks link millions of sensors wirelessly, making the network resource
constrained [6]. Nodes in a WSN may move freely within the network due to its high
mobility [7]. Although the Internet of things (IoT) offers many opportunities for
creating an efficient system, power consumption remains a significant issue [8].
Because of the WSN's dynamic nature, routes between nodes often change, necessitating
an efficient routing protocol [9]. Due to the increasing mobility of network nodes,
discovering and tracing a route becomes a difficult task [10]. The major contributions
of this research are:
• Analysis of the security needs of WSN and IoT-based communication networks and
of probable harmful attacks.
• A focus on the protocols for IoT-WSN network intrusion detection.
• The design and implementation of a secure WSN-IoT application security algorithm
named SFLA-ALO.
2 Literature Survey
3 Problem Statement
• IoT networks have significant security challenges due to the increasing mobility
of WSN nodes.
• A problem for safe routing is posed by IoT nodes’ ability to self-organize and
function without external infrastructure.
• The multicast routing method is computationally expensive and does not take
into consideration priority assignment criteria for route discovery.
4 Proposed Method
The major goal of this work is to build SFLA-ALO for energy-conscious multicast
routing in WSNs. Both sorts of malicious behavior used in the studies are discussed
here: the black hole attack, which makes a malicious node appear as if it has the
shortest path, and the distributed denial of service (DDoS) attack.
DDoS attacks and black holes may both be launched from rogue devices, which may
already have malicious scripts pre-installed. When it comes to selecting secure nodes
for secure and efficient communication, the fit factor is an important consideration.
According to the trust and energy model of IoT nodes:
T_{i,j} = T_{i,j}^{direct} · T_{i,j}^{indirect} · T_{i,j}^{recent} · T_{i,j}^{bytes}    (1)

The trust components used for the evaluation of node trust are the direct trust T_{i,j}^{direct}, the indirect trust T_{i,j}^{indirect}, the recent trust T_{i,j}^{recent}, and the bytes trust T_{i,j}^{bytes}.
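A hedged sketch of how the combined trust of Eq. (1) and a residual-energy term might be folded into a fit factor for node selection; the weighting scheme, the helper names, and the example values are assumptions for illustration, not taken from the paper.

```python
def combined_trust(direct, indirect, recent, bytes_trust):
    """Eq. (1): combine the four trust components (rendered here as a product)."""
    return direct * indirect * recent * bytes_trust

def fit_factor(trust, residual_energy, max_energy, alpha=0.6):
    """Hypothetical fit factor: weighted blend of trust and normalized energy."""
    return alpha * trust + (1 - alpha) * (residual_energy / max_energy)

# Example: pick the most suitable forwarding node among two neighbours.
neighbours = {
    "node_7": fit_factor(combined_trust(0.9, 0.8, 0.95, 0.9), 0.7, 1.0),
    "node_3": fit_factor(combined_trust(0.6, 0.7, 0.80, 0.9), 0.9, 1.0),
}
best = max(neighbours, key=neighbours.get)
print(best, round(neighbours[best], 3))
```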
X_i = (X_{i1}, X_{i2}, …, X_{in})    (2)

A frog's position can change by no more than D_max from its initial position; thus, random numbers between 0 and 1 are used.
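The position-update equations themselves are not reproduced in this extract. In the standard SFLA (an assumption about the variant the authors follow), the worst frog in each memeplex leaps toward the best one, with the step clamped to D_max; a sketch is given below.

```python
import numpy as np

rng = np.random.default_rng(0)

def sfla_leap(x_worst, x_best, d_max):
    """Standard SFLA update (assumed variant): move the worst frog toward the
    best frog by a random fraction of their difference, clamped to +/- d_max."""
    step = rng.random(x_worst.shape) * (x_best - x_worst)   # rand in [0, 1)
    step = np.clip(step, -d_max, d_max)                     # respect D_max
    return x_worst + step

# Example leap in a 3-dimensional search space.
x_worst = np.array([0.2, 0.9, 0.4])
x_best = np.array([0.8, 0.1, 0.5])
print(sfla_leap(x_worst, x_best, d_max=0.3))
```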
Generally, ALO primarily consists of five processes: the random movement of ants, building traps, entrapment of ants in traps, catching of prey, and rebuilding of traps. The ants' positions are kept randomly in the matrix M_ant, given as Eq. (6):

M_{ant} = \begin{bmatrix}
\text{Ant}_{1,1} & \text{Ant}_{1,2} & \cdots & \text{Ant}_{1,d} \\
\text{Ant}_{2,1} & \text{Ant}_{2,2} & \cdots & \text{Ant}_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
\text{Ant}_{n,1} & \text{Ant}_{n,2} & \cdots & \text{Ant}_{n,d}
\end{bmatrix} \quad (6)
Here, Ant_{i,j} designates the value of the jth variable of the ith ant, n represents the number of ants, and d is the number of variables. The fitness of each ant, evaluated with the objective function f, is retained in the matrix M_OA, given as Eq. (7):

M_{OA} = \begin{bmatrix}
f([\text{Ant}_{1,1}\ \text{Ant}_{1,2} \cdots \text{Ant}_{1,d}]) \\
f([\text{Ant}_{2,1}\ \text{Ant}_{2,2} \cdots \text{Ant}_{2,d}]) \\
\vdots \\
f([\text{Ant}_{n,1}\ \text{Ant}_{n,2} \cdots \text{Ant}_{n,d}])
\end{bmatrix} \quad (7)
M_antlion and M_OAL denote the locations and the fitness of the ant lions, and these matrices are given by Eqs. (8) and (9):

M_{antlion} = \begin{bmatrix}
\text{AntL}_{1,1} & \text{AntL}_{1,2} & \cdots & \text{AntL}_{1,d} \\
\text{AntL}_{2,1} & \text{AntL}_{2,2} & \cdots & \text{AntL}_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
\text{AntL}_{n,1} & \text{AntL}_{n,2} & \cdots & \text{AntL}_{n,d}
\end{bmatrix} \quad (8)

M_{OAL} = \begin{bmatrix}
f([\text{AntL}_{1,1}\ \text{AntL}_{1,2} \cdots \text{AntL}_{1,d}]) \\
f([\text{AntL}_{2,1}\ \text{AntL}_{2,2} \cdots \text{AntL}_{2,d}]) \\
\vdots \\
f([\text{AntL}_{n,1}\ \text{AntL}_{n,2} \cdots \text{AntL}_{n,d}])
\end{bmatrix} \quad (9)
The roulette wheel is used to estimate the selection probability of the ant lions, so that the fittest ant lion is most likely to be chosen to catch ants. For the trapping procedure, Eq. (10) is applied, where the random walks around the ant lion selected via the roulette wheel are referred to as R_t.
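A brief sketch of fitness-proportionate (roulette wheel) selection of an ant lion, as commonly used in ALO; this is illustrative, not the authors' MATLAB code, and assumes a maximization-style fitness score.

```python
import numpy as np

rng = np.random.default_rng(1)

def roulette_wheel_select(fitness):
    """Pick an index with probability proportional to fitness (minimization
    problems would first convert fitness into a 'goodness' score)."""
    weights = np.asarray(fitness, dtype=float)
    probs = weights / weights.sum()
    cumulative = np.cumsum(probs)
    r = rng.random()
    return int(np.searchsorted(cumulative, r))

# Example: five ant lions with illustrative fitness values.
antlion_fitness = [0.9, 0.2, 0.5, 0.7, 0.1]
print(roulette_wheel_select(antlion_fitness))
```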
The detection rate denotes precise attacker detection, and the delay performance refers
to the interval required for transmitting data between the IoT nodes in the network.
The proposed SFLA-ALO technique yields better performance in all metrics, namely
detection rate, throughput, and energy, with minimal delay. The results section presents
a comparative investigation of the proposed SFLA-ALO approach based on these
performance indexes in the presence of two types of attacks: the black hole attack
and the DDoS attack. This investigation is conducted with 50 nodes in the
MATLAB simulation setting, as discussed below.
Fig. 2 Performance of detection rate
Figure 2 shows the performance of the detection rate over time. The detection rate
obtained with the proposed SFLA-ALO and with crow whale-ETR is 0.712 and 0.651,
respectively, at the interval of 20 s.
Correspondingly, the examination of the energy performance is shown in
Fig. 3. The results of the proposed SFLA-ALO and CWOA are 71 and 65 at the
time of 20 s. Figure 4 illustrates the throughput performance of the proposed SFLA-ALO
and CWOA, respectively. The throughput results clearly show that the proposed
SFLA-ALO and CWOA achieve values of 0.2 and 0.05, respectively, at the time of 20 s.
In this section, the investigation is conducted with 50 nodes in the presence of the
DDoS attack. At the establishment of the node sequences there is not much delay;
as the time starts to increase, the delay also increases. On the other hand,
Fig. 5 strongly indicates that the proposed SFLA-ALO technique
accomplished a marginal delay rate when compared with the existing crow whale
method. The delay of the proposed SFLA-ALO and crow whale-ETR is 0.2388 and
0.2890, respectively, at the interval of 20 s.
Fig. 3 Performance of energy
Fig. 4 Performance of throughput
Fig. 6 Performance of detection rate
Fig. 7 Performance of energy
Fig. 8 Performance of throughput
6 Conclusion
One of the most difficult issues in networking is ensuring security while maximizing
energy efficiency. Monitoring an IDS over IoT-WSN necessitates an enhanced focus
on security. In this study, we provide a safe routing intrusion prevention architecture
for IoT-WSN. Improved network performance and protection against malicious
attacks are the primary goals. The greedy method for data routing, which relies on
static sensor nodes, is used by the majority of energy-efficient techniques. As a
consequence, such solutions do not cope with dynamic circumstances. The preservation
of IoT security and privacy is essential to IoT services, but it also presents
a significant challenge. We now have an abundance of information
thanks to the Internet's many communication channels and social media platforms.
The SFLA-ALO approach proposed in this research is a novel intrusion detection
tool for the Internet of things. According to the simulation findings, the suggested
SFLA-ALO approach increases security, as measured by performance metrics such as
detection rate, throughput, latency, and energy consumption.
References
1. Butun I, Morgera SD, Sankar R (2013) A survey of intrusion detection systems in wireless
sensor networks. IEEE Commun Surv Tut 16(1):266–282
2. Borkar GM, Patil LH, Dalgade D, Hutke A (2019) A novel clustering approach and adaptive
SVM classifier for intrusion detection in WSN: a data mining concept. Sustain Comput: Inf
Syst 23:120–135
3. Pundir S, Wazid M, Singh DP, Das AK, Rodrigues JJ, Park Y (2019) Intrusion detection
protocols in wireless sensor networks integrated to internet of things deployment: survey and
future challenges. IEEE Access 8:3343–3363
4. Amouri A, Alaparthy VT, Morgera SD (2020) A machine learning based intrusion detection
system for mobile Internet of Things. Sensors 20(2):461
5. Halder S, Ghosal A, Conti M (2019) Efficient physical intrusion detection in Internet of Things:
a Node deployment approach. Comput Netw 154:28–46
A Comprehensive Alert System Based on Social Distancing …
K. Naveen et al.
Abstract The World Health Organization (WHO) has suggested social distancing as a
successful strategy for reducing the spread of the COVID-19 virus in public places. All
governments and national health bodies have mandated a 2-m physical distance
in malls, schools, and congested areas. Existing algorithms proposed and
developed for object detection include Simple Online and Real-time Tracking (SORT)
and Convolutional Neural Networks (CNN). The YOLOv3 algorithm is used here because
it is an efficient and powerful real-time object detection algorithm in comparison
with several other object detection algorithms. Video surveillance cameras are
used to implement this system. A model is trained against comprehensive
datasets, such as the COCO dataset, for this purpose. As a result,
high-risk zones, i.e., areas where virus spread is most likely, are identified. This may
support authorities in improving the layout of a public space according to the
precautionary measures so as to reduce hazardous zones. The developed framework is a
comprehensive and precise solution for object detection that can also be used in a variety
of fields, such as autonomous vehicles and human action recognition.
1 Introduction
2 Objectives
This project aims to caution people who are violating the social distancing norms;
as a result, organizations could use it to monitor whether the social distancing norms
are well followed. It uses a novel single-stage model approach to increase speed
without compromising much accuracy, with the YOLOv3 algorithm, which improves
object detection speed while maintaining accuracy in comparison with similar
algorithms.
3 Literature Review
This project was created after reviewing the following literature, from which we have
studied several technological aspects. Reference [1] offers a basic overview of object
detection schemes that combine two object detectors. Although single-stage detectors
are significantly faster than two-stage object detectors, two-stage detectors achieve
the best prediction rates. Reference [2] describes detection methods based on region
proposal and on regression, as well as their advantages and disadvantages. SSD
produces more accurate results, and YOLO operates more quickly; because of the speed
with which execution was accomplished, the solution proposed there makes use of
MobileNet SSD [3]. Using a simplified IoT paradigm would result in excessive
electrical energy consumption, and even minor movements, such as a strong breeze or
wildlife, would be misinterpreted as human presence. The work in [4] suggests wearing
a social distancing device that uses a microprocessor and an ultrasonic sensor to
determine the distance between two persons. Compared to image processing algorithms,
this detection is much more accurate; however, it does not guarantee that each
individual has the detector with them. The same work employs a deep neural network,
a Mask R-CNN (Regions with Convolutional Neural Networks), for identifying faces in
video frames [4]. The CNN algorithm can handle large datasets and detect without
human intervention, but it is slow. When considering distancing in public spaces,
CNN-based object detectors with a recommended social distancing algorithm produce
promising results [5]. CNN models are used in image recognition and text mining and
are important in classification; small objects, on the other hand, are difficult to detect.
A new adaptive detection methodology for effectively recognizing and monitoring
people is constructed using both interior and exterior contexts [6, 7].
The proposed model helps to identify people who violate social-distancing rules in crowded environments. Deep learning and computer vision techniques are used: the people in each video frame are detected using an open-source object detection network based on the YOLOv3 algorithm, and the pedestrians are identified. In this application, all other object classes are ignored. As a result, for each recognized individual in the image, the best bounding box with its centroid is generated, and the centroids are used for distance measurement. The centroids of two people can be used to calculate the distance between them. Let the first person's centroid be (x1, y1) and the second person's centroid be (x2, y2); the distance between the centroids is the square root of the sum of the squared differences of the respective coordinates, d = sqrt((x2 - x1)^2 + (y2 - y1)^2). The model's flow diagram shows the procedure performed on the input video. The first step is to load a video frame and count the number of people in the frame. After the people are detected, the distance between each pair is measured, and if the distance is greater than or equal to the social distance, a green bounding box surrounds the person; otherwise, the individual is marked with a red bounding box, and the number of infractions is incremented and displayed on the screen. This is a recursive procedure that continues until the entire video is processed. The social distance between people is evaluated against a threshold of 2.5 ft: people closer than 2.5 ft are bounded in red boxes, and those farther apart than 2.5 ft are bounded in green boxes. In this way, we can identify the people who are not following social distancing (red boxes) and those who are (green boxes), and notify them accordingly with a beep sound from an alarm [8].
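As a concrete illustration of the centroid-based distance check described above, the following minimal Python sketch computes pairwise centroid distances and flags pairs that fall below a chosen threshold. The box layout, function names, and example values are illustrative assumptions, and the threshold is expressed in the same (calibrated) units as the centroids rather than literally in feet.

import math

def centroid(box):
    # box = (x, y, w, h) of a detected person; returns the centre point
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def check_violations(boxes, min_distance):
    """Return the indices of people whose centroids are closer than min_distance."""
    centres = [centroid(b) for b in boxes]
    violations = set()
    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            dx = centres[i][0] - centres[j][0]
            dy = centres[i][1] - centres[j][1]
            if math.sqrt(dx * dx + dy * dy) < min_distance:
                violations.update([i, j])   # both people in the pair are marked red
    return violations

# Example with three hypothetical detections; min_distance is a calibrated
# pixel threshold standing in for 2.5 ft.
people = [(10, 40, 50, 120), (45, 42, 52, 118), (300, 60, 48, 115)]
print(check_violations(people, min_distance=75))   # -> {0, 1}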
The first input frame from the video is imported, as shown in Fig. 1 and previously explained, and a grid is formed using a convolutional neural network. Assume a 3 × 3 grid producing 9 vectors, each with 7 variables: the object probability, the x-coordinate, the y-coordinate, the height, and the width come first, followed by the classes c1 and c2. This vector is then used to identify objects in the picture and create bounding boxes. If there is more than one box for an object, non-max suppression deactivates the bounding boxes with lower probability. The image regions are now bound to high-probability boxes, and the distance between the bounding boxes is determined. In this investigation, the YOLO algorithm was used to detect people. The YOLO method learns bounding box coordinates (tx, ty, tw, th), object confidence, and matching class label probabilities (P1, P2, …, Pc) to distinguish objects in a given input image. YOLO was trained using the COCO dataset [9], which contains 80 labels, including the human and pedestrian classes. Only the box coordinates, object confidence, and pedestrian object class from the YOLO model detection result were used for person detection in this study [4] (Fig. 2).
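The following sketch shows how such a person-only detection step could look with OpenCV's DNN module and pretrained YOLOv3 weights. The file names, thresholds, and the 416 × 416 input size are assumptions, not the exact configuration used in this study.

import cv2
import numpy as np

# Hypothetical file names: a pretrained YOLOv3 config/weights pair is assumed
# to be available locally.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()

def detect_people(frame, conf_thresh=0.5, nms_thresh=0.4):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, confidences = [], []
    for output in net.forward(output_layers):
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if class_id == 0 and confidence > conf_thresh:   # COCO class 0 = person
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
    if not boxes:
        return []
    # Non-max suppression keeps only the highest-probability box per person
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [boxes[i] for i in np.array(keep).flatten()]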
5 Performance Metrics
The average prediction time is 7.12 s for a one-second video frame, with person detection taking the longest, at 5.24 s.
Accuracy = (TN + TP) / ((TN + FN) + (TP + FP))    (1)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (2)
where TN stands for true negative, TP stands for true positive, FN stands for false
negative, and FP stands for false positive (Table 1).
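A small Python sketch of Eqs. (1) and (2), using illustrative confusion-matrix counts rather than the paper's measurements:

def accuracy(tp, tn, fp, fn):
    return (tn + tp) / ((tn + fn) + (tp + fp))            # Eq. (1)

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # Eq. (2)

# Illustrative counts only
tp, tn, fp, fn = 90, 80, 10, 20
print(round(accuracy(tp, tn, fp, fn), 3))   # 0.85
print(round(f1_score(tp, fp, fn), 3))       # 0.857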
6 Data Collection
7 Outcome
The results are illustrated from top to bottom, with boxes indicating each runner in terms of image processing. To acquire an accurate count, we estimate the distances by computing the centroids of the boxes. The red coloured boxes represent people who are too close to this individual [10], and the green hue boxes represent those who keep a safe distance. In all of the above examples, slower object detection algorithms may fail to determine the right distance, since the distances between people are continuously changing owing to their motions; hence, the quicker one-stage object detection model is utilized to suit the objective. YOLO operates at high speed, with an accuracy of 0.358 and a processing time of 0.8 s per frame.
As a result, there is no issue with counting the same object too many times. Only runners who do not keep the necessary distance are tallied in the number of infractions. Aside from that, each violation results in a warning, so it is simple to determine how many people are breaking the norms.
In tests 1 and 2 shown above, a video stream from Velagapudi Ramakrishna Siddhartha Engineering College's Information Technology department is evaluated. The input video shows a group of individuals strolling at a steady pace. The detection procedure has been completed, and the results are depicted in Figs. 3 and 4, respectively.
In a comparison between YOLOv3 and Faster RCNN, YOLOv3 can process eight more frames per second than Faster RCNN, so YOLOv3 works better in terms of speed (Figs. 5, 6 and 7). Table 2 shows that YOLOv3 has a higher accuracy of 93.46% than the other algorithms.
Image processing techniques are used to detect social distancing violations. This
design was validated using a video of a group of people competing in a running race.
The visualization results confirmed that this method could determine the distance
between people, which can also be used in public places such as bus stops, shop-
ping malls, and hospitals. Furthermore, it can be improved by combining it with
mask detection. It can also be improved by modifying the system capabilities and
implementing more advanced algorithms for faster detection. For social distancing
Fig. 7 Percentage graph comparing different algorithms on the COCO dataset
detection, the flows are depicted from top to bottom as boxes denote each runner [13]. To get an accurate value, we estimate the distances by calculating the centroids of the boxes. The red boxes represent the runners who violate the safe distance from these runners, while the runners in the green boxes are those who keep a safe distance. The system was successfully tested, and it was able to detect social-distancing violations accurately. Errors are possible as a result of the runners running too
close to this runner. However, the obtained results have a certain number of limita-
tions. According to the results of the system tests that have been performed, the object
detection model that has been used for detecting people has difficulty in correctly
detecting people outdoors and there have been issues with distant scenes too. In this
case, we may not be able to determine the correct distance [14]. The YOLO algorithm
can also detect the runner’s half body as an object by displaying the bounding box.
The visualization results confirmed that this approach was effective. Furthermore, it
can be improved by combining it with mask detection [7].
References
1. Hou YC, Baharuddin MZ, Yussof S, Dzulkifly S (2020) Social distancing detection with
deep learning model. In: 2020 8th International conference on information technology and
multimedia (ICIMU), pp 334–338. https://doi.org/10.1109/ICIMU49871.2020.9243478
2. Wei W (2020) Small object detection based on deep learning. In: Proceedings of the IEEE
international conference on power, intelligent computing and systems (ICPICS), pp 938–943
3. Gupta S, Kapil R, Kanahasabai G, Joshi SS, Joshi AS (2020) SD-measure: a social
distancing detector. In: Proceedings of the IEEE 12th international conference on computational
intelligence and communication networks, pp 306–311
4. Ansari MA, Singh DK (2021) Monitoring social distancing through human detection for
preventing/reducing COVID spread. Springer
5. Adarsh P, Rathi P, Kumar M (2020) YOLO v3-tiny: object detection and recognition using one
stage improved model. In: Proceedings of the IEEE 6th international conference on advanced
computing and communication systems (ICACCS), pp 687–694
6. Madane S, Chitre D (2021) Social distancing detection and analysis through computer vision.
In: 2021 6th International conference for convergence in technology (I2CT), pp 1–10. https://doi.org/10.1109/I2CT51068.2021.9418195
7. Tyagi A, Rajput D, Singh A (2021) A review on social distancing auto detection techniques in
perspective of COVID’ 19. In: 2021 Fifth international conference on I-SMAC (IoT in social,
mobile, analytics and cloud) (I-SMAC), pp 1–6. https://doi.org/10.1109/I-SMAC52330.2021.9640663
8. Pan X, Yi Z, Tao J (2021) The research on social distance detection on the complex environment
of multi-pedestrians. In: 2021 33rd Chinese control and decision conference (CCDC), pp
763–768. https://doi.org/10.1109/CCDC52312.2021.9601818
9. Saponara S, Elhanashi A, Gagliardi A (2021) Implementing a real-time, AI-based, people
detection and social distancing measuring system for Covid-19. Springer
10. Hou YC, Baharuddin MZ, Yussof S, Dzulkifly S (2020) Social distancing detection with
deep learning model. In: 2020 8th International conference on information technology and
multimedia (ICIMU), pp 334–338
11. Melenli S, Topkaya A (2020) Real-time maintaining of social distance in Covid-19 environ-
ment using image processing and Big Data. In: 2020 Innovations in intelligent systems and
applications conference (ASYU), pp 1–5. https://doi.org/10.1109/ASYU50717.2020.9259891
12. Indulkar Y (2021) Alleviation of COVID by means of social distancing and face mask detec-
tion using YOLO V4. In: 2021 International conference on communication information and
computing technology (ICCICT)
13. Shao Z, Cheng G, Ma J, Wang Z, Wang J, Li D (2022) Real-time and accurate UAV pedestrian
detection for social distancing monitoring in COVID-19 pandemic. IEEE Trans Multimedia
24:2069–2083. https://doi.org/10.1109/TMM.2021.3075566
14. Hossam H, Ghantous MM, Salem MA (2022) Camera-based human counting for COVID-
19 capacity restriction. In: 2022 5th International conference on computing and informatics
(ICCI), pp 408–415
An Improved Image Enhancing
Technique for Underwater Images
by Using White Balance Approach
Abstract In recent years, a lot of study has been devoted to improving the visual quality of underwater and undersea imaging in submarine and military operations, to find hidden structures and support sea excursions. This paper proposes an underwater image enhancement method based on colour constancy theory. Compared with much of the existing research, the time and complexity of the proposed method are low, and it achieves excellent performance. Firstly, we analyse the underwater imaging model and its distortion. Then, a linear transformation is performed by compensating the red channel and applying local white balance. Finally, results are obtained by applying histogram equalization to the RGB channels. We also measure image quality using the parameters PSNR, UIQM, UCIQE, and Entropy. These parameters are compared between the proposed and existing approaches, and our method produces higher image quality.
1 Introduction
Earth is an aquatic planet, and most of its surface is covered by water. A person who dives into the water faces many problems, since they have to stay underwater for an extended period of time in order to conduct experiments [1]. Exploration of the oceans is not an easy task. At present, much research is being carried out in the oceans, but due to the poor imaging environment, the quality of the captured images is bad. The low quality of underwater images leads to low efficiency when humans use image sensors to explore the ocean. In Fig. 1, we can clearly see the distortion caused by underwater conditions. Underwater, the quality of the image degrades and light properties differ compared to air [2]. Only one way to get clear underwater images is
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 39
K. A. Reddy et al. (eds.), Proceedings of Fourth International Conference on Computer
and Communication Technologies, Lecture Notes in Networks and Systems 606,
https://doi.org/10.1007/978-981-19-8563-8_4
pass through the water as we travel deeper into it. Depending on the water type, the
images seem green or bluish.
2 Literature Survey
Ancuti et al. [7] proposed a novel method for improving underwater images and videos. It is based on the fusion principle; that is, it is driven by the inherent qualities of the source picture. The degraded image first undergoes white balancing, which removes the unwanted colour cast, but this step alone does not handle the problem of visibility. Contrast-limited adaptive histogram equalization is therefore used to obtain the second input, since it works well and the distortion is small. Next, the weight maps of this algorithm evaluate the image qualities that identify the spatial pixel relationships. The weight measures must be designed with the intended appearance of the restored output in mind. Global contrast is handled by applying the Laplacian filter on each input channel, but this weight is inadequate to recover the contrast, so another measurement, the local contrast weight, is taken to strengthen the contrast. The saliency weight highlights the salient objects that lose their prominence in the underwater scene, and the exposedness weight estimates how well a pixel is exposed; all these weights are combined using a normalised weight. Finally, the multi-scale fusion process is completed; that is, the restored output is obtained by adding up the fused contributions of all the inputs. Although this method has high contrast and excellent performance, it can easily cause noticeable distortion in images and amplify the noise in images.
Ancuti et al. [3] proposed an effective technique to improve the underwater
captured images. It builds on a two-step strategy. Firstly, the original image is given
to the white balancing algorithm, which removes the unwanted colour casts. The two-
step strategy consists of combining local white balance and image fusion. To enhance
the edges and reduce the loss of contrast, we use image fusion. First, we perform
gamma correction on the white-balanced image. The second input, a sharper version
of the white-balanced picture, is also used, and it reduces the degradation caused by
the scattering. Next, the weight maps are used during the combining process to repre-
sent pixels with higher weight values in the final image. Here, the Laplacian contrast,
saliency, and saturation weights are used. Saturation weight enables chrominance
information to be adapted by highly saturated regions. The reconstructed image is
created in the following naive fusion step by fusing the inputs with weight measure-
ments at each pixel location. Finally, the output of the multi-scale fusion process
is obtained by adding all the inputs fused contributions. Although this method has
high contrast and excellent performance, it can easily cause noticeable distortion in
images and amplify the noise in images.
Berman et al. [4] proposed a method: it reconstructs underwater sceneries from
a single photograph using several spectral profiles of various water kinds. The chal-
lenge can be simplified by estimating the two global parameters, namely the attenu-
ation ratios of the blue–red and blue–green colour channels. But we do not know the
water type. Each water type has a set of attenuation ratios that are known and constant. To begin, we calculate the veiling light. To detect the pixels that correspond to veiling light, an edge map is created using the Structured Edge Detection Toolbox with a pre-trained model and default settings; the pixels are then grouped into
haze-lines to gain a first estimate of the blue channel's transmission. Finally, a guided image filter is used, with a contrast-enhanced input picture as guidance, to refine the transmission. Then, the restored image is calculated, white balance is performed on that image, and the gray world assumption is applied to its pixels. We perform the restoration several times with various attenuation ratios before deciding on the optimum one based on the gray world assumption, which we found produces the best results. This method only performs well if it
meets the assumptions about the underwater environments, and this is one of the
drawbacks.
Huang et al. [8] proposed a simple but very effective method. It has three major
steps: contrast correction, colour correction, and quality evaluation. Firstly, in contrast correction, after RGB channel decomposition, colour equalization and relative global histogram stretching are applied to the image; relative global histogram stretching ignores the histogram distribution of the distinct channels and uses the same parameters for all RGB channels. By using a bilateral filter, the noise remaining after the transformation can be eliminated, and the result is given to the colour
correction process. In this colour correction, we apply simple histogram stretching
on the “L” component and adjust the “a” and “b” in CIE Lab colour space. Next,
this CIE Lab colour space is given to the adaptive stretching of “L”, “a”, and “b”.
The channels are then combined and returned to the RGB colour model. Finally, a
contrast and colour-corrected output image may be generated, and we evaluate the
effectiveness of the proposed method using five quality evaluation models. But, in
RGHS, the main drawback is that it lacks the ability to correct the colour casts in
underwater images.
3 Existing Method
The existing method consists of three steps, as follows:
1. Red channel compensation
2. Colour correction
3. Histogram stretching.
In water, light of different wavelengths attenuates at different rates: the red channel corresponds to a wavelength of around 600 nm, the green channel to around 525 nm, and the blue channel to around 475 nm, and the longer red wavelengths attenuate most strongly. If we directly apply the white balance approach to the images, the result is unsatisfactory. So, we first perform red channel compensation on the images, then apply white balance, and finally obtain an effective result.
After this red channel compensation, we still need to correct the red channel because it suffers severe attenuation [10]. A traditional method like gray world also fails to correct the images, because the colour degradation of the images is not uniform. So we divide the images into patches. The effect of distance on the extent of distortion may be ignored within each patch, and colour constancy can be used to rectify the red channel distortion. For each patch, we use the gray world method to solve the weight map for the local white balance.
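A rough Python/NumPy sketch of these two steps is given below. The red-compensation formula follows a commonly used form (the paper does not state its exact expression), and the patch size and gain computation are illustrative assumptions rather than the authors' implementation.

import numpy as np

def compensate_red(img):
    """Boost the attenuated red channel using the green channel.

    img is a float RGB image in [0, 1]. The compensation form used here,
    Ir' = Ir + alpha * (mean(Ig) - mean(Ir)) * (1 - Ir) * Ig, is a common
    choice and may differ from the paper's exact formula.
    """
    r, g = img[..., 0], img[..., 1]
    alpha = 1.0
    r_new = r + alpha * (g.mean() - r.mean()) * (1.0 - r) * g
    out = img.copy()
    out[..., 0] = np.clip(r_new, 0.0, 1.0)
    return out

def local_gray_world(img, patch=64):
    """Apply the gray-world assumption patch by patch (local white balance)."""
    out = img.copy()
    h, w, _ = img.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = out[y:y + patch, x:x + patch]
            means = block.reshape(-1, 3).mean(axis=0) + 1e-6
            gain = means.mean() / means          # per-channel gray-world gains
            out[y:y + patch, x:x + patch] = np.clip(block * gain, 0.0, 1.0)
    return out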
Fig. 2 a Raw image, b after red channel compensation, c colour correction, and d histogram
stretching
4 Proposed Method
The histogram equalization technique uses the histogram to modify the contrast of an image [2]. After the colour correction step, the image still has low contrast; to eliminate this, we use histogram equalization. When histogram equalization is applied to the RGB channels of the image, it improves the image contrast and gives better image quality. The block diagram is designed in such a way that the input is first sent to the red channel compensation, local white balance is then applied for the specific channel, the picture is corrected using the colour correction technique, and the image is finally improved using histogram equalization (Fig. 3).
Figure 4a depicts the raw picture, and Fig. 4b shows the red channel compensation applied to the raw image. Figure 4c depicts the colour correction, and Fig. 4d shows how histogram equalization is applied to boost image contrast and yield a high-quality image.
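A minimal OpenCV sketch of this final step, applying histogram equalization to each channel independently (the file name in the usage comment is only an example):

import cv2

def equalize_rgb(img_uint8):
    """Apply histogram equalization independently to each channel of an 8-bit image."""
    channels = cv2.split(img_uint8)
    equalized = [cv2.equalizeHist(c) for c in channels]
    return cv2.merge(equalized)

# Usage (hypothetical file name):
# raw = cv2.imread("underwater.png")       # uint8 image
# enhanced = equalize_rgb(raw)
# cv2.imwrite("enhanced.png", enhanced)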
5 Experimental Results
We took the above images from the references mentioned below. Figure 5a contains raw images; in Fig. 5b, a fusion-based method [7] is used; in Fig. 5c, another fusion-based method [3] is used; and Fig. 5d shows relative global histogram stretching. In Fig. 5e, we first compensate the red channel and use the white balance method, then the
Fig. 4 a Raw image, b after red channel compensation, c colour correction, and d histogram
equalization
colour correction method and finally we use histogram stretching. Figure 5f shows
red channel compensation and local white balance, followed by colour correction
and histogram equalization. Compared to those methods, our proposed method gives
a good-quality image and improves the image contrast.
In Table 1, we calculate several parameters: peak signal-to-noise ratio (PSNR), underwater image quality measure (UIQM) [13], underwater colour image
Fig. 5 a Raw image, b fusion method [7], c fusion method [3], d RGHS, e existing method, and f
proposed method
quality evaluation (UCIQE) [14], and Entropy. Evaluating the images with these parameters, we can strongly say that our proposed method obtains better results compared to the other methods.
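Of the four reported metrics, PSNR and Entropy have simple closed forms; a small NumPy sketch is shown below (UIQM and UCIQE require their published reference implementations and are omitted here):

import numpy as np

def psnr(reference, enhanced, peak=255.0):
    # Peak signal-to-noise ratio between two same-sized uint8 images
    mse = np.mean((reference.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def entropy(gray_uint8):
    # Shannon entropy of an 8-bit grayscale image in bits
    hist, _ = np.histogram(gray_uint8, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))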
6 Conclusion
References
1. Padmavati G et al (2010) Comparison of filters used for underwater image pre-processing. Int
J Comp Sci Netw Security 10(1):58–65
2. Singh B, Mishra RS, Gour P. Analysis of contrast enhancement techniques for underwater
image. IJCTEE 1(2):190–195
3. Ancuti CO, Ancuti C, De Vleeschouwer C, Bekaert P (2017) Color balance and fusion for
underwater image enhancement. IEEE Trans Image Process 27(1):379–393
4. Berman D, Treibitz T, Avidam S (2017, Sept) Diving into haze-lines: color restoration of
underwater images. BMVC 1(2)
5. Jerlov NG (1976) Marine optics. Elsevier
6. Buchsbaum G (1980) A spatial processor model for object colour perception. J Franklin Inst
310(1):1–26
7. Ancuti C, Ancuti CO, Haber T, Bekaert P (2012, June) Enhancing underwater images and
videos by fusion. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE,
pp 81–88
8. Huang D, Wang Y, Song W, Sequeria J, Mavromatis S (2018, Feb) Shallow water image
enhancement using relative global histogram stretching based on adaptive parameter acquisi-
tion. In: International conference on multimedia modeling, pp 453–465
9. Iqbal K, Odetayo M, James A, Salam RA, Talib AZH (2010, Oct) Enhancing the low quality-
images using unsupervised colour correction method. In: 2010 IEEE international conference
on systems, man and cybernetics. IEEE, pp 1703–1709
10. Zhang H, Li D, Sun L, Li Y (2020) An underwater image enhancement method based on
local white balance. In: 5th International conference on mechanical, control and computer
engineering (ICMCCE), pp 2055–2060
11. Jaffe JS (1990) Computer modeling and the design of optimal underwater imaging systems.
IEEE J Ocean Eng 15(2):101–111
12. McGlamery BL (1975) Computer analysis and simulation of underwater camera system
performance. SIO Ref 75(2)
13. Panetta K, Gao C, Agaians S (2015) Human-visual-system-inspired underwater image quality
measures. IEEE J Ocean Eng 41(3):541–551
14. Yang M, Sowmya A (2015) An underwater color image quality evaluation metric. IEEE Trans
Image Process 24(12):6062–6071
32-Bit Non-pipelined Processor
Realization Using Cadence
K. Prasad Babu
15PH0426, Department of ECE, JNTUA, Anantapuramu, Andhra Pradesh, India
K. E. Sreenivasa Murthy (B)
Department of ECE, RECW, Kurnool, Andhra Pradesh, India
e-mail: [email protected]
M. N. Giri Prasad (B)
Academics and Audit, JNTUA, Anantapuramu, Andhra Pradesh, India
e-mail: [email protected]
Department of ECE, JNTUA, Anantapuramu, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 49
K. A. Reddy et al. (eds.), Proceedings of Fourth International Conference on Computer
and Communication Technologies, Lecture Notes in Networks and Systems 606,
https://doi.org/10.1007/978-981-19-8563-8_5
1 Introduction
The basic working principle of a processor is to fetch, decode, execute, and store instructions. The ALU is the heart of any processor; the arithmetic unit performs the logical and arithmetic tasks. Processors can be RISC or CISC, and most of the time RISC processors are suitable for low-power applications and hence suited for portable or embedded applications. They have very few instructions of preset length, additional general-purpose registers, a load–store architecture, and basic addressing modes for faster collective execution of instructions, and the area required is less when compared to CISC processors. For the execution of instructions, different instruction formats are used, such as R-Type, B-Type, and I-Type. The PC provides the address of the next instruction to be fetched, and the instruction register stores the fetched instruction for decoding. In the decode process, the destination register, source register, memory address, or immediate value is assigned based on the operation to be performed. Short-circuit power dissipation occurs during the switching of transistors, leakage power is consumed even when transistors are not switching, and dynamic power is due to the charge and discharge of the output load; dynamic power can be expressed as P_dynamic = α · C_L · V_DD^2 · f, where α is the switching activity, C_L is the load capacitance, V_DD is the supply voltage, and f is the clock frequency. All three power components contribute to the total net power dissipation in any design.
Many works have been done and proposed for processor design, and RISC processors are widely employed. In [1], Venkatesan et al. implemented the design of a 16-bit RISC processor using 45 nm technology. In [2], Nirmal Kumar et al. proposed using separated LUTs for embedded systems. In [3], Indu et al. implemented the design of a low-power pipelined RISC processor. In [4], Chandran et al. employed a rounding technique for the energy efficiency of multipliers. In [5], Samiappa et al. designed a processor for convolution applications. In [6], Topiwala implemented a 32-bit MIPS processor using Cadence. In [7], Jain implemented a 32-bit pipelined processor on a Spartan FPGA board. In [8], Rupali et al. analysed the instruction fetch and decode blocks of a 32-bit pipelined processor. In [9], Gautham et al. proposed a low-power 5-stage MIPS-32 processor.
2 Implementation
The basic blocks of the implemented processor are shown in Fig. 1. The inputs to the CPU block are Clock and RESET; the CPU block in this design is composed of the instruction register (InstructReg), program counter (PrgCntr), and accumulator (Accmltr). The memory IO controller block is a combination of separate RAM, ROM, an input–output controller, and a multiplexer. The data to the CPU is fed back from the output of the Mux to the CPU block, while the CPU outputs (data from the CPU, address, and write enable) are fed to the memory IO controller block.
The instruction format is shown below: only the most significant bits (31 to 28) are used for opcode selection, bits 27 to 16 are unused, and bits 15 to 0 hold the operand.

Bits [31:28]: OPCODE | Bits [27:16]: UNUSED | Bits [15:0]: OPERAND
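For illustration, the following small Python sketch decodes a 32-bit instruction word according to this format. It is only a software model of the field extraction, not part of the Verilog design; the example value is one of the test vectors from the simulation listing later in this section.

def decode(word):
    """Split a 32-bit instruction word into its opcode and operand fields."""
    opcode = (word >> 28) & 0xF        # bits [31:28]
    operand = word & 0xFFFF            # bits [15:0]; bits [27:16] are unused
    return opcode, operand

# Example with one of the simulation test vectors (32'h7000005f):
print([hex(v) for v in decode(0x7000005F)])   # ['0x7', '0x5f']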
2.1 Simulations
The simulation of the proposed work starts with ISE for coding in Verilog, and Fig. 2 shows the RTL view of the CPU and memory controller unit. Using Cadence SimVision, the output waveform is obtained for the code. The total number of instances and their short-circuit, leakage, and dynamic power consumption are calculated using RTL Encounter, and the layout is obtained as the final step of the processor work.
Figure 2 shows the schematic view of the processor and its memory unit. The data width is 32 bits, the address width is 16 bits, and there are four input–output values.
Figure 3 depicts the waveforms obtained on invoking SimVision. All the blocks are synchronized with respect to the clock, and according to the write enable, the IR, PC, AC, Datain, and Dataout signals are processed.
The variation in the RTL values can be observed in Fig. 4.
The area synthesis report gives the number of cells used in the design and their occupancy in µm² values; Fig. 5 emphasizes this. The critical part of the entire design, the usage of the various instances of the design, is clearly shown in Fig. 6, and the memory controller utilizes most of the area.
The individual basic gates, with the total number of mapped instances and their area (also in µm² values), are listed in Fig. 7.
Figure 8 discloses the power consumption of the different types: leakage power, short-circuit power, net power, and dynamic power in nW. Figure 9 shows the synthesis layout of the design. Lastly, the layout generated for the 32-bit processor design is shown in Fig. 10.
Fig. 5 Area synthesis report. This shows a figure consisting of different types of cells. The
corresponding cell area and net area are obtained
Fig. 6 Total power report. This shows a figure consisting of different types of blocks. The
corresponding percentage of net power usage individually is obtained
Fig. 8 Total net power consumption of cells. This shows a figure consisting of different types of
cells. The corresponding values of cells and their power usage individually are obtained
5'h05:dataout<=32'h7000005f;
5'h06:dataout<=32'h4000ffff;
5'h07:dataout<=32'h2000005f;
//Result (AC) should be fffe
//Test the shift left (SL)
5'h08:dataout<=32'h40000001;
5'h09:dataout<=32'h7000005f;
5'h0A:dataout<=32'h4000ffff;
5'h0B:dataout<=32'h3000005f;
//Test the OR
5'h0C:dataout<=32'h4000f0f0;
5'h0D:dataout<=32'h7000005f;
5'h0E:dataout<=32'h40000000;
5'h0F:dataout<=32'h6000005f;
//Test the AND
5'h10:dataout<=32'h40000f0f;
5'h11:dataout<=32'h7000005f;
5'h12:dataout<=32'h400000f0;
5'h13:dataout<=32'h9000005f;
//Branch
5'h14:dataout<=32'h80000000;
3 Conclusion
Acknowledgements I am very thankful for the guidance and support given by my Supervisor, Dr. K. E. Sreenivasa Murthy, in doing this work, and special thanks to the Co-Supervisor, Dr. M. N. Giri Prasad, for helping me in this work.
References
1. Venkatesan C, Thabsera Sulthana M, Sumithra MG, Suriya M (2019) Design of a 16-Bit Harvard
structure RISC processor in cadence 45 nm technology. In: 2019 5th international conference
on advanced computing and communication systems (ICACCS), pp 173–178
2. Kumar RN, Chandran V, Valarmathi RS, Kumar DR. Bitstream compression for high speed
embedded systems using separated split LUTs. J Comput Theor Nanosci 15(Special):1–9
3. Indu M, Arun Kumar M (2013) Design of low power pipelined RISC processor. Int J Adv Res
Electr Electron Instrum Eng 2(Aug 2013):3747–3756
4. Chandran V, Elakkiya B. Energy efficient and high-speed approximate multiplier using rounding
technique. J VLSI Des Sig Process 3(2, 3)
5. Sakthikumaran S, Salivahanan S, Bhaaskaran VK (2011) 16-Bit RISC processor design for
convolution applications. In: IEEE international conference on recent trends in information
technology, June 2011, pp 394–397
6. Topiwala MN, Saraswathi N (2014) Implementation of a 32-Bit MIPS based RISC processor
using cadence. In: IEEE International conference on advanced communication control and
computing technologies (ICACCCT), 2014, pp 979–983
7. Jain N (2012) VLSI design and optimized implementation of a MIPS RISC processor using
XILINX tool. Int J Adv Res Comp Sci Electron Eng (IJARCSEE) 1(10), Dec 2012
8. Balpande RS, Keote RS (2011) Design of FPGA based instruction fetch & decode module of
32-bit RISC (MIPS) processor. In: 2011 IEEE. https://doi.org/10.1109/CSNT.2011.91
9. Gautham P, Parthasarathy R, Balasubramanian K (2009) Low power pipelined MIPS processor
design. In: proceedings of the 2009, 12th international symposium, 2009 pp 462–465
Metaverse: The Potential Threats
in the Virtual World
Abstract The term virtual world now seems a bit dated; the current term is the metaverse. Neal Stephenson, a science fiction writer, coined the term metaverse in 1992. Its most basic definition is "the concept of a fully immersive virtual world where people assemble to socialize, play, and work." From basic games and shopping to complex meetings, activities can now be organized in the virtual world called the metaverse. In general, we are habituated to working in the physical world, but the metaverse is the future, where we will play games, shop virtually, hold virtual meetings, create our own virtual worlds, and invest in stocks. Cryptocurrencies are the fuel that keeps the metaverse working, and Non-Fungible Tokens (NFTs) are the crypto assets used to stand for real-world objects. These NFTs are difficult to hack since they use blockchain technology, which decentralizes the data to prevent such attacks. Now comes the billion-dollar question: is the metaverse safe? As we know, nothing can be 100% perfect, and the metaverse also has some shortcomings. Anyone could enter a highly confidential meeting using our unique avatar, and this raises much doubt: is our data safe in the metaverse? There are privacy concerns, digital boundaries, and social engineering risks. This makes the metaverse a bit scary even though it has clear advantages, and it calls for creating a more secure, robust metaverse and raising awareness among people.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 59
K. A. Reddy et al. (eds.), Proceedings of Fourth International Conference on Computer
and Communication Technologies, Lecture Notes in Networks and Systems 606,
https://doi.org/10.1007/978-981-19-8563-8_6
1 Introduction
Fig. 1 Metaverse
The rest of the paper is organized as follows: Sect. 3 discusses the role of each technology in the metaverse and its issues, Sect. 4 discusses the findings of the study, and Sect. 5 presents the conclusion.
Utilizing virtual reality technology, we can create a hybrid universe where the impossible is possible. You can step into different stores, explore additional items, gain knowledge, and participate in fantastic occasions. It fosters a sense of community, and it also allows us to feel the warmth of hugs and holding hands from loved ones all over the world. Using VR technology can enable the metaverse to disrupt vast industries even more. Facebook Horizons, for example, enables you to design your ideal virtual landscape and to connect with people worldwide and establish relationships with them. Meanwhile, VR gloves are expected to be the next big thing. Worldwide, HTC, PlayStation VR, Oculus Quest, and Valve Index are among the most popular VR headsets.
2.1.1 Issues in VR
Virtual reality (VR) is encountering several major difficulties on its way to broad
acceptance, despite its many benefits. A lot of huge investors like Google and
Facebook have thrown billions into the VR business, allowing for some incred-
ibly powerful devices to join the market in the last year, such as the Oculus Rift.
According to Riccitiello, the problem is that consumers were not prepared for every-
thing [4]. When seen from the perspective of an organization, where a design team
may require many VR machines for its process, the cost issue becomes much more
acute. Even if the price barrier is solved, VR still confronts a significant challenge
in the shape of a scarcity of must-have content. As a result, it becomes a significant
market impediment.
In the metaverse, augmented reality transforms real-life items and characters into
digital visual components. Virtual reality also generates a virtual environment using
computer-generated graphics. Virtual components can be embedded in the real world
using augmented reality. For example, Facebook uses virtual reality headsets and
augmented reality smart glasses to introduce metaverse to users on desktop and
mobile devices. The increased use of AR and VR technologies implies that metaverse
development is becoming more understandable [5].
Augmented reality's biggest attempt to catch up to virtual reality and enter the public consciousness may also prove to be one of its most difficult challenges. The addition of augmented reality to regular mobile devices puts it in the hands of hundreds of millions of people right away, but that augmented reality experience is far from perfect. Although Apple and Google developers have done an incredible job of bringing augmented reality to devices that weren't designed specifically for it [5, 6], most consumers' first exposure to the technology will be restricted to what a mobile device can achieve.
The blockchain that enables the usage of Non-Fungible Tokens or NFTs has proven
to be immensely useful in the accomplishment of digital ownership, governance,
value transfer, accessibility, and interoperability. Considering its infinite potential
and capabilities, metaverse will furnish ample opportunities for businesses of all
shapes and sizes, thereby, accelerating the growth of major economic industries
such as real estate, eCommerce, entertainment, and media as shown in Fig. 3. To
ensure the full functioning of the blockchain metaverse, all participants must see
and interact in the same virtual landscape. A decentralized ecosystem powered by
blockchain technology enables tens of thousands of independent nodes to seamlessly
synchronize [7].
We’ve seen large corporations develop private blockchain solutions for compa-
nies who prefer to keep certain information classified. Top tech firms like IBM and
Intel deploy these blockchain technology solutions to businesses looking to improve
supply chain problems.
It is critical to remember that the endpoints of the majority of blockchain transactions are significantly less secure. As a result of bitcoin trading or investment, a sizeable amount of bitcoin may be deposited into a "hot wallet," or virtual savings account, and these wallet accounts may not be as secure as the actual blocks of the blockchain. The absence of clear legislative standards presents another challenge to blockchain security (Fig. 2).
The blockchain sector has no homogeneity, making it challenging for developers
to learn from others’ mistakes. It is clear that blockchain technology isn’t completely
Fig. 2 Blockchain
secure; as a result, it is crucial to understand every facet of blockchain security. Moreover, the potential of blockchain privacy protection has not yet been fully realized. According to one study, chaff coins and mixins are missing from about 66% of the transactions that were examined; chaff coins and mixins make it more difficult for hackers to determine whether a single user acquired cryptocurrency in numerous transactions and to link the coins used in a transaction.
With the fusion of artificial intelligence, mobile app developers are showing more interest in creating AI mobile applications. Artificial intelligence is critical for the metaverse experience because it improves the link between the real world and the digital world. In improving user contact and experience, AI can be used to make more realistic and lifelike avatars, as well as to tailor the user experience to individual preferences. It can also be used to improve social connections by simulating real-world interactions in virtual places.
2.4.1 Issues in AI
Artificial intelligence will soon become one of the metaverse's most essential, and potentially hazardous, features. I am referring to agenda-driven artificial agents that look and act like regular users but are actually virtual simulations that will engage us in "conversational manipulation" on behalf of paying advertisers. This is particularly problematic when AI algorithms gain access to information about our preferences, beliefs, habits, and temperament, as well as the ability to interpret our facial expressions and verbal inflections. Such agents will be able to sell to us better than any salesperson, and they may easily promote political propaganda and targeted misinformation on behalf of the highest bidder; it won't only be to offer us items and services (Fig. 3).
Since real-world identities are connected to digital avatars, NFTs can be used to
limit who has access to the metaverse. With the implementation of NFT-controlled
access, the metaverse NFT token first surfaced in 2019. Guests were admitted via an
NFT-based ticket to the first NFT.NYC conference, which took place in 2019. Even
though no one could recognize it as the “metaverse,” the conference provided a good
illustration of NFT metaverse interaction [8]. NFTs have the potential to play a key
role in the greater ecosystem of the metaverse.
remedy this, the larger networks, such as Ethereum, the most popular network for
NFTs, are not.
Accenture, like most corporations, faced the dilemma of how to onboard new workers
when they couldn’t come to the office when COVID directed staff to work from
home. The answer devised by the consulting behemoth was to transport them to the
Nth Floor for orientation. The Nth Floor is a virtual office where coworkers may
collaborate as if they were all in the same room using a virtual reality headset. In an
interview, Yusuf Tayob, group chief executive of operations, said, “We distributed
the device to new hires and then held training sessions on the Nth Floor.” “I have my
Oculus device on my desk across from me, and I can now go to the Nth floor and
interact with peers.” There is no doubt that businesses and supply chain managers are
interested—just look at Facebook’s name change to Meta. For example, in a digital
twin, you’d be able to visualize the impact of modifications and adjustments to your
operations rather than merely scenario planning by running reports. “You might link
the physical space to supplier data, publicly available data like weather data, or
other digital twins,” he said. “The setting becomes considerably more lively.” While
the metaverse may arrive sooner than robotics, Tayob (chief executive of Accenture
Operations) believes it will be a gradual process. He sees five stages in the evolution of
digital twins, some of which are currently taking place. We’ve heard of the metaverse,
and it appears that a new world order is on the way. Various theories, multiple thoughts,
possibilities, expectations, and, of course, news are all circulating to keep us guessing
and imagining what it will be like [9]. As a result of our long hours of screen time,
whether on laptops, PCs, mobile phones, tablets, or even smartwatches, we are living
in a mini-metaverse.
One of the questions that arises as we consider the concept of a digital universe is whether users will be required to use a single digital identity or "avatar" across the entire metaverse, or whether there will be numerous avatars for different micro-pocket communities. This is similar to logging into an iOS app using your Facebook ID,
communities. This is similar to logging into an iOS app using your Facebook ID,
which is linked to your Google Account, as you may do now. To access the app, you’re
essentially utilizing three distinct IDs. In the metaverse, how will identification and
transparency work? This is something that the developers must first resolve, as the
wallet address is insufficient [9, 10].
4 Discussions
Some speculate that the metaverse may be the internet’s future. Many businesses are
investing in the development of the metaverse. It is critical to ensure that no monopoly
exists in the shared virtual environment. Addictions to the Internet and smartphones
are becoming commonplace. As a result, virtual world addiction may become the next
big thing. Furthermore, the metaverse contains entertainment, commerce, games, and
a variety of other addictive activities. Even in this day and age, not everyone has
an Internet connection. Many people lack basic digital skills. Many people will be
unable to benefit from the metaverse because of the digital divide. Few companies
may have control over the metaverse, leaving power and influence in the hands of a
few individuals [10].
5 Conclusion
The final perspective on the metaverse clearly depicts its powers. The promise of a fully immersive web presence incorporating a variety of components, such as social media, entertainment, video production, and other contemporary technology, is very advantageous to the metaverse. On the other hand, worries about privacy and security, as well as the need for advanced technologies, emerge as major metaverse issues. The benefits and drawbacks of the metaverse are considered to create a balanced view of what the metaverse is and could be. Metaverse enthusiasts and organizations should study both sides of the metaverse before drawing a final conclusion.
References
1. Park S-M, Kim Y-G (2022) A metaverse: taxonomy, components, applications, and open
challenges. IEEE Access
2. Wang Y, Su Z, Zhang N, Liu D, Xing R, Luan TH, Shen X (2022) A survey on metaverse:
Fundamentals, security, and privacy. arXiv preprint arXiv:2203.02662
3. Kim T, Jung S (2021) Research on metaverse security model. J Korea Soc Digi Indus Inf
Manage 17(4):95–102
4. Ning H, Wang H, Lin Y, Wang W, Dhelim S, Farha F, Ding J, Daneshmand M (2021) A survey
on metaverse: the state-of-the-art, technologies, applications, and challenges. arXiv preprint
arXiv:2111.09673
5. Di Pietro R, Cresci S (2021) Metaverse: security and privacy issues. In: 2021 Third IEEE international conference on trust, privacy and security in intelligent systems and applications
(TPS-ISA), pp 281–288. IEEE
6. Far SB, Rad AI (2022) Applying digital twins in metaverse: user interface, security and privacy
challenges. J Metaverse 2(1):8–16
7. Masadeh R (2022) Study of NFT-secured blockchain technologies for high security metaverse
communication
8. Brown Sr R, Shin SI, Kim JB (2022) Will NFTs be the best digital asset for the metaverse?
9. Skalidis I, Muller O, Fournier S (2022) The metaverse in cardiovascular medicine: applications,
challenges and the role of non-fungible tokens. Can J Cardiol
10. Nath K (2022) Evolution of the ınternet from web 1.0 to metaverse: the good, the bad and the
ugly
A Mobile-Based Dynamic Approach
to Comparative Study of Some
Classification and Regression Techniques
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 69
K. A. Reddy et al. (eds.), Proceedings of Fourth International Conference on Computer
and Communication Technologies, Lecture Notes in Networks and Systems 606,
https://doi.org/10.1007/978-981-19-8563-8_7
1 Introduction
Beginners are perplexed and doubtful about which algorithm suits a particular
problem. Furthermore, comparing the performance of several models generally
entails a significant amount of time and effort. For example, we can predict which
algorithm would perform best on a specific task, but we must test these algorithms
on the specific use case and inspect the finer features and approach details. To reduce
this overhead, our application aims to automate these activities and provide a comparison between different models using Flutter.
The world is moving on to intelligent systems that can do a lot more than the
currently existing classical systems. With the advent of Machine Learning, computer systems can now be created to learn and be taught how to do new things. Machine Learning also gives computers the ability to sense to some extent, as in computer vision, audio processing, and sentiment analysis. This field has reached a state of maturity in recent years and is now able to exert a tremendous amount of influence on
the regular world. Machine Learning is so omnipresent now that every person in this
world is somehow related to a machine learning algorithm. Machine Learning grew
out of the field of artificial intelligence or AI and was strictly a branch that dealt with
data and experience.
There is an abundance of data created due to the boom in information systems, and the Internet specifically. Almost every person in this world has an online presence and is generating data as we speak. Although this data by itself does not amount to much, its utilization through Machine Learning and AI can cause huge stirs in the world. Organizations that collect user data can understand what a particular user likes or dislikes and can provide this valuable insight to whoever is willing to buy it. As of now, Machine Learning is so potent that, with the right data about a person, a machine could predict their entire day with only minor alterations. Although this seems to invade privacy, with controlled usage it could be used to make human lives better; one example is allowing an AI to predict an underlying disease by understanding the daily habits of a person.
Another aspect responsible for the boom in Machine Learning is the improvement in computational power. Right now, computational power is abundant and readily available, which makes the high requirements of Machine Learning algorithms attainable at relatively low cost. Along with the cost, the availability of computational power as cloud and distributed resources further makes it easy to obtain. Previously, expensive computation time and the centralized availability of computational power made researching and experimenting with Machine Learning difficult. Now, almost anyone with a Google account can use powerful CPUs, GPUs, and TPUs to arrive at a Machine Learning solution.
With all these advancements and popularity increase for Machine Learning, there
is a need for programmers and developers to acclimate to the new technology.
Machine Learning is already widely used and is being implemented in every industry.
Software engineers and developers should now have expertise in Machine Learning
to facilitate the organization in keeping up with the trend of using AI and ML
in services and products. However, Machine Learning involves statistics and complex modeling paradigms that dictate how learning happens and how the results can be delivered, and learning these paradigms requires time and steady practice. Similarly, statisticians and professionals with a background in other mathematical fields can also enter the ML world, but the barrier of the programming and computational approach might slow them down. In such scenarios, a one-stop solution that can build and compare the models for a required problem becomes crucial.
Hence, with this study, the entry barrier to the world of Machine Learning from
the perspective of computational approach is vastly reduced. The user will be able
to comprehend the performance of a Machine Learning algorithm through a simple
interface and be able to make informed decisions based on the results provided.
2 Literature Review
A programmer needs a lot of practice and knowledge about all the algorithms to know which algorithm works better for a specific problem. It is quite tedious and time-consuming to conclude which algorithm works better and faster for a specific problem. So, our goal with this project is to make it easier for programmers, newcomers, and students of Machine Learning to understand which algorithm best discovers a solution to a certain problem. The following studies show that comparative analysis of the algorithms helps in finding accurate results by finding the best algorithm for a certain problem in less time.
Research shows that comparative analysis of Machine Learning algorithms can be used to find the best algorithms for many problems in many fields [1]: the authors of this paper put the data sets through many Machine Learning algorithms, such as Support Vector Machine, decision tree, logistic regression, and random forest, to predict the presence of dementia across the data set. Comparing the results obtained by the algorithms suggested that the Support Vector Machine algorithm gives the most accurate results for dementia when compared to the remaining algorithms. This emphasized the importance of SVM in actual use cases, and hence it is included in this project as one of the major algorithms to test.
Intrusion detection is one of the challenging tasks in the modern networking industry [2]. A network should be continuously monitored to detect intrusions, so we need an intrusion detection system that monitors the network for sudden intrusions. In this paper, several classification techniques and Machine Learning algorithms were considered to categorize the network traffic. Out of the classification techniques, nine suitable classifiers were found: BayesNet, Logistic, IBk, J48, PART, JRip, Random Tree, Random Forest, and REPTree. This study showed which algorithms are implemented in situations that demand speed and precision. Hence, some of these algorithms are included in the project for classification.
It is very necessary to distribute electricity efficiently across the population to reduce power loss [3]. Smart Grids (SG) have the potential to reduce the power loss during distribution. Many algorithms are analyzed to predict the most suitable one that can be applied to SGs: Support Vector Machines (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Neural Networks, Naive Bayes, and a decision tree classifier have been deployed for predicting the stability of the SG.
Identifying druggable proteins has received a lot of interest, so based on 443 sequence-derived protein features, the algorithms were applied to identify whether a protein is druggable and also to determine the superior algorithm among the chosen set of algorithms [4]. The Neural Network was the best classifier, with an accuracy of 89.98%.
Identifying stock market trends is quite challenging for anyone, so comparative analysis of Machine Learning algorithms can be used to identify which one works best for this task [5]. After using five techniques, i.e., Naive Bayes, Random Forest, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Softmax, the results show that the Random Forest algorithm performs best for large data sets, and Naive Bayes performs best for small data sets.
3 Methodology
In this project, there are four major steps to arrive at the desired end result of a
comparative analysis flutter application. They are as follows.
To deploy the Machine Learning pipeline, it should be embedded into a server environment. This can be done with the Flask web framework, which creates routes for specific tasks on the web. Before starting with the coding part, we need to install Flask and some other libraries; by using a virtual environment, all the libraries are managed and made easier to handle for development and deployment. We create a script.py file in the project folder and implement the Machine Learning pipeline code as designed above. Then we import the libraries and, by using app = Flask(__name__), create an instance of Flask. @app.route('/') is used to set the URL that should trigger the function index(), and the function index() uses render_template('index.html') to display the page index.html in the browser.
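A minimal sketch of what such a script.py skeleton could look like (the templates/index.html file is assumed to exist; the actual application's routes and pipeline code are not shown here):

from flask import Flask, render_template

app = Flask(__name__)   # create the Flask instance

@app.route("/")
def index():
    # Serves templates/index.html when the root URL is opened in a browser
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)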
Once the Machine Learning pipeline is set into the Flask environment, it should be enabled to accept and send data. A flow of data must be established from the API call to the Machine Learning pipeline and back to the API response. The Flask API handles the calls through the HTTP and Werkzeug libraries, which are responsible for HTTP calls and file handling, respectively.
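A hedged sketch of one such upload route is shown below; the route name /classification and the helper run_classification_pipeline are illustrative placeholders standing in for the actual pipeline code.

import pandas as pd
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

app = Flask(__name__)

def run_classification_pipeline(df):
    # Placeholder: the real pipeline trains and scores the chosen classifiers
    return pd.DataFrame([{"Model": "Random Forest", "Accuracy": None}])

@app.route("/classification", methods=["POST"])
def classification():
    # The mobile client posts the CSV as multipart/form-data under "file"
    file = request.files["file"]
    filename = secure_filename(file.filename)
    df = pd.read_csv(file)                      # Werkzeug file object is readable
    results = run_classification_pipeline(df)   # pandas table of model scores
    return jsonify(filename=filename, results=results.to_dict(orient="records"))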
Finally, the API should interact with an interface to display the results. Similarly, the user will also need an interface to upload the data and look at the results. To allow the application to develop and expand into multiple formats in the future, Flutter is used. Flutter's Dart language allows easy application building for multiple platforms (Figs. 1 and 2).
The flow of the application is as follows: Once the user starts the application,
they are greeted with a home page that has three buttons. There is a file selection
button and two file upload buttons. Initially, the user has to click on the file selection
button and pick their CSV file. Once the file is selected, the application comes back
to the home page with a changed text over the buttons informing the user that the
file is picked. Once the file pick confirmation text is shown, the user can move onto
uploading the file for either regression task or classification task. If the user clicks
on the regression task, the regression path of the API is invoked and the Machine
Learning pipeline processes the data accordingly. Similarly, when the user clicks on
the classification upload button, the classification path of the API is invoked and the
pipeline processes the data accordingly. Once the processing of the pipeline is done,
a pandas table is generated with the comparative results of all the tested algorithms. This table is converted into JSON and sent to the application interface through an API call. Once the interface receives the data, it shows the user a view of the tabulated data in a new screen.
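A hedged sketch of one such API route is shown below; the route name and the run_classification_pipeline() helper are hypothetical placeholders for the pipeline described above, and app is assumed to be the Flask instance created in script.py.

```python
# Hedged sketch of the classification route: the uploaded CSV is passed to a
# hypothetical run_classification_pipeline() and the comparison table is
# returned as JSON for the Flutter interface. Names are illustrative only.
import pandas as pd
from flask import request, jsonify

@app.route('/classification', methods=['POST'])
def classification():
    df = pd.read_csv(request.files['file'])            # CSV uploaded by the app
    results = run_classification_pipeline(df)           # hypothetical call returning a pandas DataFrame
    return jsonify(results.to_dict(orient='records'))   # pandas table -> JSON response
```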
4 Results
Let us understand the working of the application through an example. Let's assume the user wants to perform a comparative analysis of different algorithms for a classification task on a data set called boston.csv (https://www.kaggle.com/datasets/puxama/bostoncsv) (Fig. 3).
The boston.csv file has 15 columns, with 1 target column and 14 features. The label column has 5 classes which describe the favorability level of living in the location. All the features in the data set describe a particular location inside Boston. The goal of this classification task is to predict the class of a particular location depending on the 14 features (Fig. 4).
In the above screenshot a, we see the home page of the application with three buttons. To continue, the user clicks on the file picking button at the bottom right of the screen, which takes them into the Android file space. In the above screenshot b, we see the Android file picker, where the user can select the file they want to upload or perform the task on. Once the user selects the boston.csv data set, the file picker module locks onto the file location and keeps it ready for upload to the API (Fig. 5).
In the above screenshot a, we see an alert generated by the application so that the user knows which algorithms will be checked on the data set. For classification, we use Random Forest, decision trees, light gradient boosting, K-nearest neighbors, Support Vector Machine, Ridge Classifier, extra trees, multi-layer perceptron, and gradient boosting classifier. In screenshot b, we see the next step: after the removal of the alert, the application shows text saying "File selected", which informs the user that they can proceed with uploading the file. Since the user is using a classification data set, they will have to choose the first button, which says "Upload for classification."
Fig. 4 Flutter application showing the home page and android file picker
Fig. 5 Flutter application showing an alert to inform the user about the algorithms
Once the back end is done with the processing, the user is automatically redirected to a new screen as per screenshot c, where tabular information is presented. This table gives the comparative results of all the algorithms previously mentioned. If the user wishes to restart the whole process or try the application with a new data set, they simply have to press the back button in the navigation bar or the system's back button at the bottom. The whole process can then be redone.
References
technologies and management for computing, communication, controls, energy and materials.
IEEE, pp 89–95
3. Bashir AK, Khan S, Prabadevi B, Deepa N, Alnumay WS, Gadekallu TR, Maddikunta PKR
(2021) Comparative analysis of machine learning algorithms for prediction of smart grid stability.
Int Trans Electr Energy Syst 31(9):e12706
4. Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E (2016) DrugMiner: comparative
analysis of machine learning algorithms for prediction of potential druggable proteins. Drug
Discov Today 21(5):718–724
5. Kumar I, Dogra K, Utreja C, Yadav P (2018) A comparative study of supervised machine
learning algorithms for stock market trend prediction. In: 2018 Second ınternational conference
on ınventive communication and computational technologies (ICICCT), pp 1003–1007
Land Cover Mapping Using
Convolutional Neural Networks
Abstract Using deep learning, the proposed method provides a novel approach for classifying UC Merced satellite photos. The main aim of the deep learning method is to extract a large number of features without human interaction. Adding object-based segmentation to deep learning further improves classification accuracy. Remote sensing images are accurately classified using deep object-based feature learning with CNN. This method is based on the extraction of deep features and their application to object-based classification. The proposed model extracts intensive features using predefined filter values, which improves overall performance compared to randomly initialised filter values. In complicated satellite pictures, the object-based classification technique can preserve edge information. Object-based deep learning is used to increase classification accuracy and reduce complexity. The proposed object-based deep learning strategy significantly enhances the classification accuracy, and the object-based approach outperformed the alternatives in the experiments.
1 Introduction
Land cover mapping and detection, water resource detection, agricultural usage, wetland mapping, geological statistics, and urban and regional planning are only a few of the applications of classifying various regions of remote sensing images. However, due to its complexity, categorization of remote sensing images remains a time-consuming operation. In categorization, feature extraction can be very significant. The classification process loses efficiency when features are selected manually with human interaction. As a result, we use an independent feature learning
2 Related Work
In 2021, Ali Jamali presented "Improving land use land cover mapping of a neural network using three optimizers of multi-verse optimizer, genetic algorithm, and derivative-free function," which describes how data on land use and land cover are crucial for land management and planning. To increase the accuracy of remote sensing image categorization using a small-sized neural network, three optimizers, namely the multi-verse optimizer, the genetic algorithm, and the derivative-free function, are designed in the MATLAB programming language. The outcomes are compared to those of a medium-sized neural network created in the MATLAB programming language based on the results of the assessments. Landsat-8 imagery with pixel-based spatial resolution is used [1].
In 2021, Yao Li, Peng Cui, Cheng-Ming Ye, and Jose Marcato Junior presented "Accurate Prediction of Earthquake-Induced Landslides Based on Deep Learning Considering Landslide Source Area," which explains that an earthquake-induced landslide (EQIL) is a rapidly changing process occurring on the Earth's surface that is firmly controlled by the earthquake in question and predisposing conditions. To explain the complex link and enhance spatial prediction accuracy, they present a deep learning framework that takes into account the source area features of EQIL. To isolate the source area of an EQIL, they first employed high-resolution remote sensing photographs and a digital elevation model (DEM). For EQIL prediction, shallow machine learning models only make use of relevant parameters, according to this study [2].
In 2018, "Identification of farm regions in satellite pictures using Supervised Classification Technique," proposed by Rahul Neware and Amreen Khan, outlined the process. Finding land usage or area covered by reviewing prior satellite data and providing analytics is a remote sensing problem. This work examines the use of supervised classification to identify farm areas from satellite images. Minimum distance, maximum likelihood, spectral angle mapping, parallelepiped classification, and land cover signature classification are some of the mathematical procedures used for classification [3].
3 Proposed Methodologies
3.1 VGG16
VGG-16 is a 16-layer deep convolutional neural network. You can use the ImageNet database to load a pretrained version of the network that has been trained on over one million photographs. The network can categorise photographs into a thousand distinct object classes, such as keyboards, mice, pencils, and a variety of animals. As a result, the network has learnt a spread of rich feature representations for a wide range of images. The network's image input size is 224 × 224 pixels. This model takes the input image and turns it into a 1000-value vector.
$$ y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{999} \end{bmatrix} \qquad (1) $$
The configuration "D" in the table below is referred to as VGG16. Configuration "C" also has 16 weight layers; in stacks 3, 4, and 5, however, its last convolution layer is a 1 × 1 filter. This layer was utilised to boost the nonlinearity of the decision functions without compromising the layer's receptive field. Unless otherwise noted, configuration "D" shall be referred to as VGG16 throughout this discussion (Fig. 1).
The input to the conv1 layer is a 224 × 224 RGB image of fixed size. The image is processed through a stack of convolutional (conv.) layers with a very small receptive field: 3 × 3 (the smallest size that captures the notions of left/right, up/down, and centre). One of the configurations also uses 1 × 1 convolution filters, which can be thought of as a linear transformation of the input channels (followed by a nonlinearity). The convolution stride is fixed to 1 pixel, and the spatial padding of the conv. layer input is set to 1 pixel for the 3 × 3 conv. layers so that the spatial resolution is preserved after the convolution. Five max-pooling layers, which follow some of the conv. layers, perform spatial pooling (not all of the conv. layers are followed by max-pooling). Max-pooling is performed with stride 2 over a 2 × 2 pixel window.
Because of its efficacy in spatial feature exploration, the CNN method is useful for high-resolution picture categorization. Deep features are extracted from the LISS III picture using CNN as a feature extractor. The acquired deep characteristics are merged with object-based textural information to further boost efficiency. A neural network is generally trained in two stages: in the forward pass, the input is routed entirely through the network; in the backward pass, gradients are back propagated (backprop) and the weights are modified.
A CNN is made up of input and output layers, as well as numerous hidden layers in between. It is typically used to categorise photos, cluster them based on similarity, and recognise objects within scenes. CNNs are algorithms that can recognise a wide variety of visual data. Two key elements in a CNN are the locally connected network and parameter sharing. If we employ a fully connected network, a massive number of parameters will be required. As a result, locally connected
3.3 Fast.AI
Fast.ai is a non-profit research organisation that focuses on deep learning and AI, with the objective of making deep learning more accessible to the general public. "Practical Deep Learning for Coders" is a massive open online course (MOOC) that requires only knowledge of the programming language Python as a prerequisite. Fast.ai is also a deep learning library that offers practitioners high-level components which can quickly and easily provide results in standard deep learning domains, in addition to offering researchers low-level components that can be mixed and matched to create novel techniques. It strives to attain both goals without sacrificing usability, flexibility, or overall performance. This is made possible through a layered design that expresses the common underlying patterns of many deep learning and data processing techniques in language that is easy to understand.
4 Experimental Analysis
We’re ready to study the data once we’ve downloaded it and unzipped it. First,
let’s look at the labels. Pandas helped us read the labels. The information is stored
in One-Hot Encoded format. Each image includes 17 labels, with “0” indicating that
the label is not present in the image and “1” indicating that it is there. We have 2100
photographs in total (Fig. 3).
Checking the dataset’s distribution for data imbalances is a critical phase in the
process. To store the classes and their numbers, we first establish a new data frame.
The following visualisation depicts the dataset’s class imbalances. There are almost
1200 photographs in the pavement class and just 100 shots in the aeroplane class
(Fig. 4).
We'll need to prepare the data for the training. Our data labels are in One-Hot Encoded format, which I expected to be difficult to handle. Fortunately, a quick search of the Fast.ai Forum revealed that Fast.ai has a native method for multiple labels in the One-Hot Encoding format. When labeling the dataset, we must provide the column names as well as the fact that it is a multi-category dataset. After we've created the data source, we can use Fast.ai's data bunch API to feed it through. In addition, we apply certain data augmentations.
Next, we build a learner and provide it with the data bunch we made, the model we want to use (in this case, ResNet34), and the metrics we want to use (accuracy thresh and F score) (Figs. 5, 6, 7, 8, 9 and 10).
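A hedged sketch of this labelling and learner setup, written against the Fast.ai v1-style data bunch API referred to above, is given below; the file paths, column names, image size and metric thresholds are assumptions for illustration.

```python
# Hedged sketch using the Fast.ai v1-style data bunch API mentioned above.
from fastai.vision import *       # ImageList, get_transforms, cnn_learner, metrics, ...
from functools import partial
import pandas as pd

df = pd.read_csv('labels.csv')    # assumed CSV with one-hot / space-delimited labels
data = (ImageList.from_df(df, path='.', folder='images', cols='image_name')
        .split_by_rand_pct(0.2)
        .label_from_df(cols='tags', label_delim=' ')   # multi-category labels
        .transform(get_transforms(), size=128)         # data augmentations
        .databunch(bs=32)
        .normalize(imagenet_stats))

acc_02 = partial(accuracy_thresh, thresh=0.2)          # accuracy with a threshold
f_score = partial(fbeta, thresh=0.2)                   # F score metric
learn = cnn_learner(data, models.resnet34, metrics=[acc_02, f_score])
learn.fit_one_cycle(5)
```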
5 Architecture
Our architecture diagram shows how the process from the UC Merced imagery to the land use mapping is performed. This project uses various techniques for producing the land use mapping; the texture features involved are defined in Eqs. (2)–(10) below (Fig. 11).
$$\text{Mean: } \bar{x} = \frac{1}{N^2}\sum_{i=0}^{N}\sum_{j=1}^{N} x_{ij} \qquad (2)$$

$$\text{Variance: } V = \frac{1}{N^2}\sum_{i=0}^{N}\sum_{j=1}^{N}\left(x_{ij}-\bar{x}\right)^{2} \qquad (3)$$

$$\text{Entropy} = \sum_{i=1}^{N}\sum_{j=1}^{N} C(i,j)\,\log\bigl(C(i,j)\bigr) \qquad (4)$$

$$\text{Contrast} = \sum_{i,j=0}^{N} (i-j)^{2}\, C(i,j) \qquad (5)$$

$$\text{Energy} = \sum_{i=1}^{N}\sum_{j=1}^{N} C(i,j)^{2} \qquad (6)$$

$$\text{Local consistency} = \sum_{i,j=0}^{n} \frac{1}{1+(i-j)^{2}}\, C(i,j) \qquad (7)$$

$$\text{Cluster shade} = \sum_{i,j=0}^{n} \bigl(i - M_x + j - M_y\bigr)^{3}\, C(i,j) \qquad (8)$$

$$\text{Cluster prominence} = \sum_{i,j=0}^{n} \bigl(i - M_x + j - M_y\bigr)^{4}\, C(i,j) \qquad (9)$$

where $M_x = \sum_{i,j=0}^{n} i\,C(i,j)$ and $M_y = \sum_{i,j=0}^{n} j\,C(i,j)$.

$$\text{Correlation} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} \bigl[\,i\,j\,C(i,j)\,\bigr] - \mu_x \mu_y}{\sigma_x \sigma_y} \qquad (10)$$

where $\mu_x = \sum_{i}^{N}\sum_{j}^{N} i\,C(i,j)$, $\mu_y = \sum_{i}^{N}\sum_{j}^{N} j\,C(i,j)$, $\sigma_x^{2} = \sum_{i}^{N}\sum_{j}^{N} (i-\mu_x)^{2}\, C(i,j)$, and $\sigma_y^{2} = \sum_{i}^{N}\sum_{j}^{N} (j-\mu_y)^{2}\, C(i,j)$.
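As one concrete way to obtain such texture descriptors, the hedged sketch below uses scikit-image's gray-level co-occurrence matrix utilities; the distance and angle choices are assumptions, and the entropy is computed with the conventional negative sign.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # greycomatrix/greycoprops in older scikit-image

def glcm_texture_features(gray_img, levels=256):
    """Sketch of several GLCM descriptors from Eqs. (4)-(10); gray_img is a uint8 image."""
    glcm = graycomatrix(gray_img, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    C = glcm[:, :, 0, 0]                       # normalized co-occurrence matrix C(i, j)
    i, j = np.indices(C.shape)
    Mx, My = (i * C).sum(), (j * C).sum()
    return {
        'contrast':           graycoprops(glcm, 'contrast')[0, 0],     # Eq. (5)
        'local_consistency':  graycoprops(glcm, 'homogeneity')[0, 0],  # Eq. (7)
        'correlation':        graycoprops(glcm, 'correlation')[0, 0],  # Eq. (10)
        'energy':             (C ** 2).sum(),                          # Eq. (6)
        'entropy':            -(C[C > 0] * np.log(C[C > 0])).sum(),    # Eq. (4), usual sign convention
        'cluster_shade':      (((i - Mx) + (j - My)) ** 3 * C).sum(),  # Eq. (8)
        'cluster_prominence': (((i - Mx) + (j - My)) ** 4 * C).sum(),  # Eq. (9)
    }
```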
6 Observations
By using this project, we can identify the changes that have taken place in an area from previous years until now. We can also identify multiple objects in the satellite images used in the project from the dataset we loaded.
7 Conclusion
References
1 Introduction
To what extent can neural network models communicate with each other and discover
each other’s identity? How would they use this information in a competitive setting?
For example, in a social deduction game, players attempt to uncover each other’s
hidden allegiance—typically with one “good” team and one “bad” team. Players
S. Jain
Maharaja Agrasen Institute of Technology, Delhi, India
e-mail: [email protected]
V. K. Bunga (B)
Andhra University, Visakhapatnam, India
e-mail: [email protected]
must utilize deductive reasoning to find the truth or instead lie to keep their roles hidden. In this paper, we explore whether neural networks can be successfully trained to compete in such a scenario, and how the opposing parties would interact during the period of debate.
1.1 Among Us
Within this design space, there are adversarial parties working against each other.
In the deep learning realm, adversarial situations appear in adversarial examples [1]
and within generative adversarial networks (GANs) [2]. In particular, the latter often
designs a contest between two neural networks, in the form of a zero-sum game. We
build upon these concepts and foundations in our work.
2 Approach
During the period of communication in Among Us, the crew must do tasks and gain information while the imposters must sabotage and kill the crew. We decided to simplify the "game" by removing both the tasks and the killing and making the entire perception of each player predetermined [7]. Specifically, each agent is given as input a matrix consisting of N events. During each event, the agent "sees" some subset of the other players (sight is reflexive and symmetric, but not transitive) [8]. They may also experience a "sabotage," which means they are in the presence of an imposter who is sabotaging. They cannot see the imposter, but the imposter can see them during this event. Ultimately, each agent will receive an N × (P + 1) matrix, where P is the total number of players; the first P values of a row indicate who the agent is seeing, and the final value indicates a sabotage.
These event matrices are generated randomly using four key parameters: the total number of players (P), the number of those players who are imposters (I), the chance that any given pair of players will see one another during an event (view chance), and the chance that any given imposter will sabotage during an event (sabotage chance) [9].
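A hedged sketch of this event generation is given below; treating players 0 to I − 1 as the imposters and the exact handling of sabotage visibility are assumptions made for illustration.

```python
import numpy as np

def generate_events(n_events, n_players, n_imposters, view_chance, sabotage_chance, seed=0):
    """Return an array of shape (n_players, n_events, n_players + 1): for each agent,
    row n records who they see during event n plus a final sabotage flag."""
    rng = np.random.default_rng(seed)
    sees = rng.random((n_events, n_players, n_players)) < view_chance
    sees = sees | sees.transpose(0, 2, 1)                 # sight is symmetric
    sees = sees | np.eye(n_players, dtype=bool)           # sight is reflexive
    sabotaging = rng.random((n_events, n_imposters)) < sabotage_chance
    events = np.zeros((n_players, n_events, n_players + 1), dtype=np.float32)
    for agent in range(n_players):
        events[agent, :, :n_players] = sees[:, agent, :]
        for imp in range(n_imposters):
            if imp == agent:
                continue
            # an agent experiences a sabotage when a sabotaging imposter can see them;
            # following the text, the imposter is hidden from the agent's view in that event
            hit = sabotaging[:, imp] & sees[:, imp, agent]
            events[agent, hit, imp] = 0.0
            events[agent, hit, n_players] = 1.0
    return events
```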
In order to process the input events, our agent model has an LSTM, which generates c_N and h_N; these are used as the initial inputs h_0 and c_0 of the next phase: communication. Communication is also modeled with an LSTM. During each "round" of communication, every agent contributes a message vector of size M via a small MLP using h_t as input. These messages are collected into a matrix of size M × P, which is fed as input into each agent's LSTM so that their memory can be updated before the next round of communication; there are R rounds in total. We chose to model
Fig. 1 Diagram of the agent model. The red section is the perception LSTM, which takes in a
sequence of events. The blue section is the communication LSTM, which receives messages and
generates messages using the green MLP. Finally, the purple section is the voting MLP, which
produces the agent’s vote vector
communication this way because it is a simple and symmetric way for the agents to
pass information between each other in multiple rounds. Figure 1 shows a simplified
diagram of the complete architecture.
The last stage, voting, is the simplest one. The model simply takes the h_R and c_R from the end of the communication LSTM and feeds them through a small MLP finalized with a softmax layer. This results in a probability vector giving the confidence that a specific player should be "voted out."
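A hedged sketch of the agent architecture in Fig. 1 follows; the hidden sizes, the two-layer MLPs and the use of an LSTM cell for communication are assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

class AgentModel(nn.Module):
    """Sketch of the agent in Fig. 1: a perception LSTM over the event matrix, a
    communication LSTM cell fed with all P concatenated messages, and a voting MLP."""
    def __init__(self, n_players, msg_size, hidden=64):
        super().__init__()
        self.perception = nn.LSTM(n_players + 1, hidden, batch_first=True)
        self.msg_mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, msg_size))
        self.comm = nn.LSTMCell(msg_size * n_players, hidden)
        self.vote_mlp = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, n_players), nn.Softmax(dim=-1))

    def perceive(self, events):            # events: (batch, N, P + 1)
        _, (h, c) = self.perception(events)
        return h[-1], c[-1]                # h_N, c_N -> initial h_0, c_0 for communication

    def speak(self, h):                    # this round's message vector of size M
        return self.msg_mlp(h)

    def listen(self, messages, h, c):      # messages: (batch, M * P) from all agents
        return self.comm(messages, (h, c))

    def vote(self, h, c):                  # confidence that each player should be voted out
        return self.vote_mlp(torch.cat([h, c], dim=-1))
```

One communication round then amounts to calling speak() for every agent, concatenating the P message vectors, and calling listen() for every agent; after R rounds, vote() produces each agent's probability vector.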
At the end of voting, we calculate a "crew score." This score is simply the maximum vote-off score that any imposter received, where votes are averaged across all agents. Clearly, the imposters would like to minimize the votes on themselves, so their loss function for training purposes is simply the crew score [10]. Inversely, the crew's loss function is the negation of the crew score. This takes the form of a zero-sum game with a similar setup as in generative adversarial networks [11].
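A small sketch of the crew score computed from the collected vote vectors, assuming PyTorch tensors and known imposter indices:

```python
import torch

def crew_score(votes, imposter_ids):
    """votes: (P, P) tensor with one vote vector per agent. The crew score is the
    maximum average vote received by any imposter; the crew maximize it, the
    imposters minimize it (their loss), making the game zero-sum."""
    avg_votes = votes.mean(dim=0)                      # average vote on each player across agents
    return avg_votes[torch.as_tensor(imposter_ids)].max()
```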
There were multiple decisions we had to make when attempting to train the models.
Similar to GANs, we decided the best approach to train the two adversarial models
was to use alternating training to help find eventual convergence and juggle two
different optimizations.
2.5 Challenges
The main challenge in building and running the model came mainly in the form of training time, gradient overlap, and hyperparameter search. As discussed previously, to help speed up training time and reduce gradient overlap between multiple different models training over the same dataset, we had to train only one imposter or one crew member at a time and keep the rest constant. This limits quick adaptivity and may lead to the models attempting to train against the constant cooperative models rather than against the adversarial model.
Furthermore, the immense number of combinations of hyperparameters and input and output sizes, along with no precedent for recommended values, led us to generalize our model significantly. Although this offers an extended ability for customizability and exploration, it severely reduces reproducibility over small changes.
Another important challenge to recognize is the extreme "black box" nature of the model. With our current model iteration, we have no approach for visualizing exactly what the model is doing, especially with communication. Therefore, there are assumptions and estimations when drawing conclusions about the model.
To list the variable hyperparameters and input and output sizes for the player model: overall, we have batch size, number of epochs, epoch length, the global LSTM hidden layer size, and the number of players and imposters. For the perception phase, there is the chance a player will view another player, the chance a sabotage occurs, and the number of events N. The communication stage has as variables the message size M and the number of communication rounds R (Fig. 2).
3 Results
Due to the unprecedented architecture, we decided our overall goals were to find the
balance between imposter and crew score, locate trends within the variable changes,
and attempt to understand how the models communicate, to a degree. We conducted
many training sessions over multiple different combinations of variables.
A typical training session started with the first two alternating epochs, which
were considered as the “initialization” of the two models. Training the imposters
first gave them time to learn to not vote for each other. Then, the crew learned the same thing in the next epoch. After this initial training, the models started to train significantly against each other.
Fig. 2 Grid search heat map over the size parameters M (message vector) and rounds (the number
of communications) displaying the ending crew score after 6 epochs. Figure 3 contains the other
parameters used
Through our search, we found a trend toward three different types of results: an imposter 'win,' a crew 'win,' and a convergence. An imposter win happens under certain circumstances where the crew are unable to employ a better strategy than equally voting for all other candidates. A crew win occurs with the opposite: the crew can determine exactly who the imposters are, and the crew vote for one imposter. A convergence places itself in the middle, in what we assume to be a fair, balanced game between the two opposing parties.
Due to resource constraints, we were not able to get as much data as we had hoped for. This led to a consistency issue, where the same variable combination would yield different results because of a certain randomness within training the models. The hopeful goal was to find the set of variables that leads not only to a convergence, but to an almost "tug-of-war" battle when one model trains against the other and vice versa. This is shown graphically via an oscillating crew score around the eventual convergence crew score. This would mean that the two models are successfully balanced and able to improve their strategy with training time against the opposition. Figure 3 shows an example of a single run which converges to a crew score of around 0.6.
Fig. 3 Crew score over time for a particular training run. The crew model is being trained in odd
epochs, and imposters in even epochs
Unfortunately, we were not able to interpret the communication vectors due to the disconnect between representations of information. However, we found an interesting result: when the players were given no identifying information (e.g., there were no sabotages whatsoever in the perception stage), the crew could still win the situation. At first glance, this seems almost impossible because the crew would be unable to identify the imposters via the necessary events. Further thought led us to make an estimated conclusion that the crew model was actually creating a hidden tag or key to identify themselves within the communication stage, thereby identifying the imposters immediately. The imposters would then have to train to figure out this key or become helpless [10]. This situation resembles a cryptographic adversarial problem, opening up more future directions this work could potentially take.
4 Conclusion
This approach is just a start into understanding how neural networks communicate
with one another. Although assumptions can be made, understanding the actual infor-
mation being transferred between each model is a dark area. Innovative techniques
must be found to analyze this information flow in order to fully comprehend the
strategy within the game.
Furthermore, the results could help to explore more cryptographic use cases for neural networks. It was observed that the crew could identify the imposters with no prior information, which leads us to believe the crew model had some sort of hidden code or tag to identify themselves.
Finally, similar work in adversarial networks could potentially give rise to more techniques utilizing them. Generative adversarial networks are at the forefront of adversarial techniques, but large multi-agent adversarial communication networks could possibly yield further useful results.
References
1. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples,
2015. Published as a conference paper at ICLR 2015
2. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio
Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence N,
Weinberger KQ (eds) Advances in neural information processing systems, vol 27, pp 2672–
2680. Curran Associates, Inc.
3. Foerster JN, Assael YM, de Freitas N, Whiteson S (2016) Learning to communicate to solve
riddles with deep distributed recurrent q-networks. CoRR, abs/1602.02672
4. Foerster JN, Assael YM, de Freitas N, Whiteson S (2015) Learning to communicate with deep
multi-agent reinforcement learning. CoRR, abs/1605.06676
5. Abadi M, Andersen DG (2016) Learning to protect communications with adversarial neural
cryptography. CoRR, abs/1610.06918
6. Fekri MN, Grolinger K, Mir S (2022) Distributed load forecasting using smart meter data:
federated learning with recurrent neural networks. Int J Electr Power Energy Syst 137:107669
7. Ni Y, Li X, Zhao H, Yang J, Xia W, Gui G (2022) An effective hybrid v2v/v2i transmission
latency method based on LSTM neural network. Phys Commun 51:101562
8. Jing Y, Ye X, Li H (2022) A high precision intrusion detection system for network security
communication based on multi-scale convolutional neural network. Futur Gener Comput Syst
129:399–406
9. Liu F, Meng W, Lu R (2022) Anti-synchronization of discrete-time fuzzy memristive neural
networks via impulse sampled-data communication. IEEE Trans Cybern
10. Bhushan M, Nalavade A, Bai A (2020) Deep learning techniques and models for improving
machine reading comprehension system. IJAST 29
11. Kim SH, Moon SW, Kim DG, Ko M, Choi YH (2022) A neural network-based path loss
model for bluetooth transceivers. In: 2022 International conference on information networking
(ICOIN), pp 446–449. IEEE
Hardware Implementation of Cascaded
Integrator-Comb Filter Using Canonical
Signed-Digit Number System
Satyam Nigam
1 Introduction
In the modern era of digital signal processing, the requirement for more robust and accurate filters has increased. Digital signal processors are utilized to facilitate filtering operations in high-bandwidth applications. Digital filters are widely utilized because they eliminate various issues associated with analog filters. Filters are commonly used to reduce noise and improve the quality of information. The presence of interference can mask the resulting signal or interfere with its analysis. However, if signal and noise occupy distinct spectral regions, it can be possible to improve the signal-to-noise ratio (SNR) with the aid of digital filters
S. Nigam (B)
Electronics and Communication Department, Netaji Subhas University of Technology, Dwarka
Sector-3, Dwarka, Delhi, India
e-mail: [email protected]
2 Method
The filter simulations are based on FPGA Nexys A7-100T (xc7a100tcsg324-1) board.
All the simulation results are calculated by Xilinx VIVADO and MATLAB R2017a.
The cascaded integrator-comb (CIC) decimation filters are multi-rate digital filters. These filters have several advantages over traditional moving average filters: there is no need for coefficient storage, since CIC filters do not have filter coefficients, and narrow band-pass filters can be built from CIC filters with less complexity than their FIR counterparts. The noise is reduced due to averaging. Perfect precision can be achieved using fixed-point numbers only.
The principle on which these filters work is the recursive running sum. Traditional digital FIR filters require a total of (D − 1) summations to calculate a single filter output. Moreover, there are also (D) multiplications with the filter coefficients.
The recursive running sum filters shown in Fig. 1 subtract the oldest sample x(n − D) from the output y(n − 1) and simultaneously add the present input sample x(n) to obtain the present output y(n). The number of computations per output sample is reduced drastically by applying this methodology.
Equations 1 and 2, as in [4], will express the complete picture of recursive running sum filters:

$$y(n) = \frac{1}{D}\,\bigl[x(n) - x(n-D)\bigr] + y(n-1) \qquad (1)$$

And the transfer function for a second order CIC filter can be expressed as

$$H(z) = \frac{Y(z)}{X(z)} = \left(\frac{1}{D}\cdot\frac{1 - z^{-D}}{1 - z^{-1}}\right)^{2} \qquad (2)$$
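A hedged sketch of a second-order CIC decimator consistent with Eqs. (1) and (2) (without the 1/D normalisation, as is common in hardware implementations) is given below; the parameter defaults mirror the implemented filter.

```python
import numpy as np

def cic_decimate(x, D=2, N=2):
    """N-section CIC decimator: N integrators at the input rate, decimation by D,
    then N comb sections (differential delay M = 1) at the output rate."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(N):                 # integrator sections (running sums)
        y = np.cumsum(y)
    y = y[D - 1::D]                    # keep every D-th sample
    for _ in range(N):                 # comb sections: y(n) - y(n - 1) at the low rate
        y = np.diff(y, prepend=0)
    return y                           # DC gain of (D * M) ** N relative to a unity-gain average

# e.g. a constant input of ones settles to (2 * 1) ** 2 = 4 for D = 2, N = 2
print(cic_decimate(np.ones(16)))
```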
The applications of these filters are endless as they support high data-rate filtering [3]. Modern technologies such as quadrature amplitude modulation and delta-sigma ADCs and DACs use CIC filters. CIC filters are well suited for designing anti-aliasing filters prior to decimation. CIC filters provide BIBO stability, are linear phase, and have a finite length impulse response (Table 2).
Frequency characteristics of CIC filters are defined in Eq. 3, as mentioned in [4].

$$H_{\mathrm{CIC}}\left(e^{\,j2\pi f}\right) = \frac{\sin\!\left(\tfrac{2\pi f D}{2}\right)}{\sin\!\left(\tfrac{2\pi f}{2}\right)}\; e^{-j2\pi f (D-1)/2} \qquad (3)$$
A CSD number is a vector of digits drawn from the set {1̄, 0, 1}, where the digits 1 and 0 have their standard binary meaning and 1̄ represents −1 in decimal format. We can use these digits to avoid the occurrence of consecutive 1's. To achieve this, we first need to convert the binary number to a CSD number with the help of recursive algorithms. These algorithms work on the principle that if a run of k consecutive 1's has the value $2^k - 1$, then it can be represented using only two nonzero digits and k − 1 zeros. Other than this, some higher-radix CSD representations are also discussed in [7]. Higher-radix CSD numbers can further reduce the complexity of the systems.
Now focusing on the simple CSD representation, we have designed an adder circuit which avoids carry propagation and gives out the result in a single iteration. This can be achieved once we complete the conversion process from CSD numbers to encoded CSD formats. Encoding is required to achieve an optimized addition operation. Negative-and-positive encoding [8] is one such technique used to convert a CSD number to an encoded CSD format. As a part of this encoding technique, an algorithm converts 1̄, 0 and 1 to 10, 00 and 01, respectively. The remaining combination is treated as a don't care in the hardware implementation.
To further elaborate how we can convert a binary number to CSD, we can look into the algorithm explained in [9].
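A hedged sketch of such a conversion, together with the two-bit encoding quoted above, is shown below; the digit-to-code assignment follows the text and is otherwise an assumption.

```python
def to_csd(n):
    """Convert a non-negative integer to its canonical signed-digit representation:
    digits in {-1, 0, 1}, least significant first, with no two adjacent nonzero digits."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)        # +1 if the next bit is 0, -1 if the next bit is 1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits or [0]

# Two-bit encoding of each CSD digit, following the assignment quoted in the text
# (1-bar -> 10, 0 -> 00, 1 -> 01); this mapping is an assumption for illustration.
ENCODE = {-1: '10', 0: '00', 1: '01'}

print(to_csd(7))                              # [-1, 0, 0, 1], i.e. 8 - 1 with only two nonzero digits
print([ENCODE[d] for d in to_csd(7)])
```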
The CSD adder consists of combinational hardware logic gates based on the truth table in Table 3 and calculates sum and carry simultaneously in each iteration. From Table 3, it is evident that the addition module has no dependency on the previous carry. At the hardware modeling level, we can get all these values in a single clock cycle, ready for the next operation. The adder is also modeled in MATLAB to incorporate its behavior in the CIC filter (Fig. 3). As the filter consists of adders, the step responses are similar and the robustness of the system increases. The real novelty of the design lies in the CSD adder modules, as we have introduced a direct decimal to encoded CSD number conversion algorithm that checks the decimal number stored in a variable and directly records a code accordingly (Fig. 4).
This implementation improves path latency by using carry-free addition, and the whole system can run at higher clock frequencies. Filter responses are simulated using MATLAB, and the rest of the implementation is done using the VIVADO software. The software provides the interface to convert the hardware description of the filter into a schematic.
The hardware implementation of CSD adder is done using Verilog HDL. This will
help us analyze the power and delay optimization of the system. The parameters of
CIC filter that are incorporated on the FPGA implementation are shown in Table 2.
Simulations are done using Xilinx Simulator (XSIM) on Intel Core i5 sixth generation
processor. Simulation results are shown in Fig. 5.
The path delay (the delay experienced by a sample from the input to the output of the circuit) is known as data-path delay. The comparison in Table 4 shows the timing improvement of the CSD number system implementation in CIC decimation filters.
The filter consumes 0.022 W with the CSD adder at a frequency of 50 MHz, while the normal CIC filter consumes 0.028 W at 100 MHz. The filter input has a 16-bit width (B_in), while the output has 18 bits (B_out), obtained by using the formula mentioned in [4] for full resolution
Fig. 5 Xilinx simulation results for 16 bit CIC filter with CSD adder
Here, for the implemented filter, D (conversion factor) is 2 and M (differential delay) is 1. The number of sections (N) is two. The post-implementation utilization statistics show that the filter with the CSD adder requires 802 LUTs and 108 flip-flops.
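As a small illustrative check, the sketch below evaluates the full-resolution output width formula commonly attributed to [4], B_out = B_in + ⌈N log2(DM)⌉, for the implemented parameters; the formula here is stated under that assumption.

```python
import math

def cic_full_resolution_width(b_in, n_sections, D, M):
    """Full-resolution output width of an N-section CIC decimator (register-growth formula)."""
    return b_in + math.ceil(n_sections * math.log2(D * M))

print(cic_full_resolution_width(16, 2, 2, 1))   # -> 18, matching B_out for this filter
```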
6 Conclusion
A different approach for adding two samples in a filter is presented. Using the CSD number system has its own advantages in making high-performance devices. The design methodology used in this work requires further analysis, yet the results look promising. The algorithm as well as the CSD adder work seamlessly both in the hardware and in the software implementation. With all the timing constraints met, there is an improvement in data-path delay. We can therefore conclude that the CSD arithmetic has brought some improvement in the time domain while maintaining the integrity of the signal. CSD numbers, while reducing switching activity, also prevent carries from propagating, thus reducing the overall computational effort of the system.
References
1. Crochiere R, Rabiner L (1975) Optimum FIR digital filter implementations for decimation,
interpolation, and narrow-band filtering. IEEE Trans Acoust Speech Signal Process 23(5):444–
456
2. Goodman D, Carey M (1977) Nine digital filters for decimation and interpolation. IEEE Trans
Acoust Speech Signal Process 25(2):121–126
3. Peled A, Liu B (1974) A new hardware realization of digital filters. IEEE Trans Acoust Speech
Signal Process 22(6):456–462
4. Hogenauer E (1981) An economical class of digital filters for decimation and interpolation.
IEEE Trans Acoust Speech Signal Process 29(2):155–162
5. Jing Q, Li Y, Tong J (2019) Performance analysis of resample signal processing digital filters
on FPGA. EURASIP J Wirel Commun Netw 31:1–9
6. Aggarwal S, Meher PK (2022) Enhanced sharpening of CIC decimation filters, implementation
and applications. Circuits, Syst Signal Process:1–23
7. Coleman JO (2001, Aug) Express coefficients in 13-ary, radix-4 CSD to create computationally
efficient multiplierless FIR filters. In: Proceedings European conference on circuit theory and
design
8. Parhami B (1988) Carry-free addition of recoded binary signed-digit numbers. IEEE Trans
Comput 37(11):1470–1476
9. Hewlitt RM, Swartzlantler ES (2000, Oct) Canonical signed digit representation for FIR
digital filters. In: 2000 IEEE workshop on signal processing systems, SiPS 2000. Design and
implementation (Cat. No. 00TH8528). IEEE, pp 416–426
Study of Security for Edge Detection
Based Image Steganography
Abstract Due to the increase in the speed of computer networks, the advantages in information communication have also increased, and thus the importance of information security cannot be overstated. The method of steganography has the purpose of making the communication hidden by wrapping the data into some other form. Many steganography file formats are available, but digital images are still considered the most popular due to their frequency on the internet. There are plenty of algorithms for data hiding, but they might compromise image quality. In this paper, a new technique is proposed for performing image steganography by utilizing edge detection techniques for grayscale images. In this proposed method, edges are detected by converting the image into grayscale, and then text is embedded into the digital image. Different methods like Canny, Sobel, Prewitt and Laplacian are applied here for better secrecy and also for enhancing the stego image as well as for correctly recovering the data that was embedded.
1 Introduction
[2]. Steganography makes use of image, video, audio or text files for information embedding.
Steganography methods are mainly of five different types depending on the cover object, which can be text, image, video, audio or network. Of all the methods, the most frequently used is image steganography, in which an image is used as a cover object for embedding the secret information. In image steganography, various images come in several file formats, and most of them exist for a particular application.
In image steganography, lossy and lossless are the compression types [3]. These two types are not similar, but both of them are helpful in saving storage. In the lossy type, smaller files are created and excess image data is discarded from the original image. In the lossless technique, information is hidden in important areas. Hence, the more preferred image formats for image steganography are the lossless ones [4].
Steganography on the whole image can sometimes distort the image to a level at
which the modifications in the image are perceptible to a human eye. Also, several
techniques have evolved over time through which an adversary can easily extract
the concealed message from the image. Much research work has been carried out to
overcome this limitation with image steganography, and it involves embedding the
confidential data in some specific regions of an image called region of interest (ROI).
One such region is the edges present in an image. Any changes made in the smooth
area of the image is easily detected by the human vision, but when the same changes
are done on the edges of an image, there is a high probability of going unnoticed or
undetected [1].
A basic characteristic of an image is that an edge is a combination of gray values with a large pixel change in the image [5]. The initial step of detecting an edge is processing the image, and once the detection is done, the results influence image analysis and object recognition [6]. Thus, detecting edges has important significance.
In this paper, the contribution of the author includes collection of information from
various sources and designing the work. The contribution also consists of analyzing
the performance of the proposed system and drafting the article regarding the same.
This section includes an introduction about steganography, why it is used, how
it is used and why image steganography is used despite the different steganography
methods available and also an introduction about edge detection methods. Section 2
consists of a literature survey about various edge detection methods. In Sect. 3, an
algorithm developed for the proposed system is mentioned. In Sect. 4, performance
analysis about the proposed work is done, and lastly, conclusion and future work is
included in Sect. 5.
2 Literature Survey
3 Proposed Methodology
The currently proposed scheme uses edge detection techniques and the least significant bit (LSB) technique to conceal the secret message in the cover object. The cover image is first pre-processed by converting the RGB image to a grayscale image, and then the last bit of every 8-bit pixel value is set to zero. For edge detection, the image is passed through five stages, namely the image noise reduction phase, the gradient calculation phase (where we apply different kernels like Sobel, Prewitt and Laplacian), the non-maximum suppression phase, the double thresholding phase and the edge tracking by hysteresis phase.
For double thresholding, conditions of the following standard form were used for classifying the edges as strong and weak edges: a pixel is marked a strong edge if I(x, y) ≥ T_high, a weak edge if T_low ≤ I(x, y) < T_high, and suppressed if I(x, y) < T_low, where I(x, y) indicates the magnitude of the pixel at position (x, y) and T_high, T_low are the chosen high and low thresholds.
The heterogeneous edge is then classified as either a strong or a weak edge, depending on its connectivity with a strong edge in the edge tracking by hysteresis phase. After this phase, we will get an edge-detected image of the original image. We store the secret data in those detected edges in the original image. The image formed after this stage will be our stego image. The initial steps (i.e., Steps 1, 2, 3 and 4) are the same for both the encoding and the decoding phase of the proposed method, as mentioned below:
Step 1: Convert the cover image into grayscale image
Step 2: Change the rightmost (LSB) bit of every pixel to zero.
Step 3: Perform edge detection on that image using the below mentioned steps
(a) Apply Gaussian blur to smooth the image
(b) Detect edge direction and intensity by computing the gradient of the
smoothen image.
(c) Carry out non-maximum suppression on the modified image to thin out the
edges
(d) Implement double threshold detection on non-maximized suppressed image.
(e) Perform edge tracking by hysteresis and an edge-detected image will be
generated.
Step 4: Store the edge positions from the edge-detected image in an array.
Encoding Phase:
Step 5: Convert the secret message from ASCII to binary format.
Step 6: Compute the length of the message in 8-bit binary format and prepend it
to the binary format of the secret message.
Step 7: Perform LSB substitution of secret data in the original image at the stored
edge positions.
Decoding Phase:
Step 5: With the help of stored edge positions, extract the first 8-bits of the
message, which will determine the length of the message.
Step 6: Then continue extracting the message bit by bit until the length of the
message.
Step 7: Convert the extracted message from binary format to ASCII format.
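A hedged sketch of the encoding phase using OpenCV's Canny detector is given below; the Canny thresholds and file handling are illustrative assumptions. Decoding mirrors it: the stego image is loaded, its LSBs are cleared before edge detection so that the same edge positions are found, and the bits are read back in order.

```python
import cv2
import numpy as np

def embed_message(cover_path, message, out_path='stego.png'):
    """Sketch of the encoding phase: Canny edges locate the pixels whose LSBs carry the payload."""
    img = cv2.imread(cover_path, cv2.IMREAD_GRAYSCALE)   # Step 1: grayscale cover image
    work = img & 0xFE                                    # Step 2: clear the LSB of every pixel
    edges = cv2.Canny(work, 100, 200)                    # Step 3: edge detection (illustrative thresholds)
    ys, xs = np.nonzero(edges)                           # Step 4: store the edge positions
    bits = f'{len(message):08b}' + ''.join(f'{ord(c):08b}' for c in message)   # Steps 5-6
    if len(bits) > len(ys):
        raise ValueError('message too long for the available edge pixels')
    for bit, y, x in zip(bits, ys, xs):                  # Step 7: LSB substitution at edge positions
        work[y, x] |= int(bit)
    cv2.imwrite(out_path, work)                          # keep a lossless format so the LSBs survive
    return work
```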
Image steganography was performed using edge detection techniques like Canny, Prewitt, Laplacian and Sobel to detect edges, and the secret message was then concealed in those edges. The results obtained by using the Sobel edge detection method are shown in Figs. 1 and 2, where the concealed image obtained as a result of the proposed method is shown along with the original image.
In order to perform a comparative analysis of the different techniques, the following performance metrics were considered: mean squared error (MSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), embedding capacity (EC), structural similarity index (SSIM) and image quality index (Q), as discussed in [11–13].
For the "lena.png" and "lena.jpeg" images, the measurements obtained from the proposed algorithm are shown in Tables 1 and 2, respectively.
As seen from Table 1, the Prewitt and Sobel methods perform almost similarly for a PNG image. The Laplacian method achieves better results than the other methods in terms of MSE, RMSE and PSNR, but at the cost of EC. The Canny method has the highest embedding capacity among the four methods, and its values for the other parameters are close enough. For JPEG images, the Canny method performs better than the others, as evident from Table 2. Thus, we can say that the Canny method is ideal for performing edge detection based image steganography. As this scheme focuses more on secrecy, it achieves much better results at the cost of some payload capacity, and it is also practically feasible.
Though there are several steganography methods available, in this paper we have discussed image steganography, which is the most prominent method for information hiding. The main image file formats have several ways of hiding information, each with its weak as well as strong points. This paper covers methods like Canny, Sobel, Prewitt and Laplacian for edge detection. This helps one to understand the methods in a better way, along with their capability as well as robustness. Canny edge detection is considered to be the best among the four edge detection methods. For different image formats and images, Canny edge detection works best because it is time efficient as well as simple to implement. Along with that, edge detection using the Canny method is less noisy when compared with other methods. Thus, it depends on the user as well as the application which algorithm to use.
In this paper, we have converted the image into grayscale for better edge detection. Instead of this, the RGB channels can be used, through which the embedding capacity can be increased, but care must be taken as the mean squared error might also increase. Also, a key can be used for encoding and decoding for more security. When normal strings are used as keys, high security will not be provided. The use of cryptographic functions to implement keys will give more security, but at the same time the embedding capacity might get compromised. Therefore, it is important to check all the parameters while performing the task.
References
1. Luo W, Huang F, Huang J (2009) A more secure steganography based on adaptive pixel-value
differencing scheme. Springer Science Business Media, LLC
2. Johnson NF, Jajodia S (1998) Exploring steganography: seeing the unseen. IEEE Computer 31(2):26–34
3. Moerland T. Steganography and steganalysis. Leiden Institute of Advanced Computing Science
4. Lal M, Singh J (2008) A novel approach for message security using steganography. In: 3rd
International conference of advance computing and communication technologies, 08-09 Nov,
2008, APIIT, Panipat, India
5. Xiaofeng Z, Yu Z, Ran Z (2011) Image edge detection method of combining wavelet lift with
canny operator. Proc Eng 15:1335–1339
6. Kaur SP, Singh S. A new image steganography based on 2k correction method and Canny edge
detection. Int J Comput Bus Res. ISSN: 2229-6166
7. Jain N, Meshram S, Dubey S (2012) Image steganography using LSB and edge-detection
technique. Int J Soft Comput Eng (IJSCE) 2(3). ISSN: 2231-2307, July 2012
8. Alam S, Kumar V, Siddiqui WA, Ahmad M (2014) Key dependent image steganography using
edge detection. In: Fourth international conference on advanced computing & communication
technologies
9. Singh S, Agarwal G (2010) Use of image to secure text message with the help of LSB
replacement. Int J Appl Eng Res 1
10. Setiadi DRIM, Jumanto J (2018) An enhanced LSB-image steganography using the hybrid
canny-Sobel edge detection. Cybern Inf Technol 18(2):74–88. https://doi.org/10.2478/cait-
2018-0029
11. Pradhan A, Sahu AK, Swain G, Sekhar KR (2016) Performance evaluation parameters of image
steganography techniques. In: 2016 International conference on research advances in integrated
navigation systems (RAINS), pp 1–8
12. Asamoah D, Oppong E, Oppong S, Danso J (2018) Measuring the performance of image
contrast enhancement technique. Int J Comp Appl 181:6–13. https://doi.org/10.5120/ijca20
18917899
13. Gaurav K, Ghanekar U (2018) Image steganography based on Canny edge detection, dilation
operator and hybrid coding. J Inf Secur Appl 41:41–51. ISSN 2214-2126
Fake Face Image Classification
by Blending the Scalable Convolution
Network and Hierarchical Vision
Transformer
Abstract A face has been used as a primary and unique attribute to authenticate
individual users in emerging security approaches. Cybercriminals use the double-
edged sword “image processing” capabilities to deceive innocent users. The under-
lying technology is based on advanced machine learning and deep learning algo-
rithms. The intentions of cyber criminals range from simple mimicking or trolling
to creating violent situations in society. Hence, it is necessary to resolve such prob-
lems by identifying the fake face images generated by expert humans or artificial
intelligent algorithms. Machine learning and artificial neural networks are used to
resolve the issue. In this work, we have designed an approach for detecting deep
learning-generated fake face images by combining the capabilities of the scalable
convolutional neural networks (CNN) “EfficientNet” and hierarchical vision trans-
former (ViT) “shifted window transformer”. The proposed method accurately clas-
sifies the fake face images with a 98.04% accuracy and a validation loss of 0.1656
on the 140 k_real_fake_faces image dataset.
1 Introduction
Computers can distinguish humans based on their distinct physical or biometric char-
acteristics. Such systems have the potential to be employed in a variety of near real-
world applications, including security and telemedicine systems. Preventing illegal
2 Literature Review
This section presents brief information on face manipulation and generation tech-
niques along with the detection strategies. Face modification methods have substan-
tially advanced over time. Several technologies, including deepfakes and face
morphs, have been presented in the literature to achieve realistic face manipula-
tions. Manipulated photos and videos might be nearly indistinguishable from genuine
content to the untrained eye.
The commonly used algorithms for fake image generation range from simple cuts
and pastes to generating visually appealing images using deep learning techniques.
Primary deep learning methods include autoencoders (AE) and generative adversarial
networks (GAN).
Encoder-decoder [5] pairs are used in the AE-based generator. They are taught
how to dismantle and recreate one of the two faces that will be swapped. The decoder
is then switched to reassemble the target picture. DeepFaceLab, DFaker, and DeepFake-tf are among the examples. GANs [6] use two neural networks, one for gener-
ation and one for discrimination for image generation. The generator network draws
the random noise to generate the sample to deceive the discriminator. The discrim-
inator learns to differentiate the generated sample against the actual sample. The
discriminator score is iteratively fed into the generator network to learn a better
approximation of the real sample. The iterations stop when the discriminator can no longer differentiate between the real and generated images. WGAN, StarGAN,
DiscoGAN, and StyleGAN-V2 are some examples.
A style transfer-inspired swapping generator architecture for generative adver-
sarial networks was developed by Karras et al. [7]. The new scheme offers intuitive,
scale-specific artificial feature generation by automatic learning and unsupervised
separation of high-level features—such as posture and identity. The resulting pictures
also generate random variations (e.g., freckles, hair). This generator scheme has
state-of-the-art capability in terms of distribution quality, resulting in superior inter-
polation features, and better disentangles the varying latent characteristics. In the
current work, a data set authored by XHLULU [8] is used for training and evaluation
purposes. This dataset contains a set of high-quality human faces with diverse styles.
The various approaches for identifying fake image manipulation come with their
merits and limitations. Abdulreda and Obaid [9] studied the earlier work to examine
deepfakes, principles, and counterfeiting strategies. ImageNet was used by Touvron
et al. [10] to train a robust convolution-free transformer. On ImageNet, the vision
transformer obtained top-1 accuracy of 83.1% on single-crop evaluation. More
importantly, they offered a transformer-specific teacher-student strategy. It is based
on a distillation token, which guarantees that the student pays attention and learns
from the teacher. They showed how beneficial this token-based distillation might
be, especially when using a convnet as a teacher. As a result, they outperformed the
ConvNets for ImageNet and when transferring to other tasks.
The fused facial region feature descriptor (FFRFD) is presented as an alternative
to mining more subtle and broad characteristics of deepfakes. It is a discriminative
feature description vector for practical and quick detection. DeepFake faces contain
more minor feature points in facial areas than actual faces, according to their study. To
improve the generalizability, FFRFD takes advantage of such crucial insights. FFRFD
is trained with a random forest classifier to accomplish efficient detection. Tests
on six large-scale Deepfake datasets show that this lightweight strategy successfully
has an AUC of 0.706, outperforming most state-of-the-art approaches [11].
To detect deepfake videos, Kolagati et al. [12] created a deep hybrid neural network
model. They gathered data about numerous face features from the videos using facial
landmark recognition. This information is fed into a multilayer perceptron, which
is used to understand the distinctions between actual and deepfake videos. They
utilize a convolutional neural network to extract features and train on the videos
simultaneously. These two models are used to create a multi-input deepfake detector.
The model is trained using a subset of the Deepfake Detection Challenge Dataset and
the Dessa Dataset. The suggested model produces good classification results with an accuracy of 84% and an AUC score of 0.87.
A transfer learning approach based on the ResNet50v2 architecture was described for detecting manipulated images, especially spliced images. The image splicing approach was employed with the pre-trained weights of a YOLO CNN model to see whether the photos had been intentionally tampered with.
Vision transformer-based models and self-attention processes have piqued researcher
curiosity to acquire visual representation successfully. Convolution layer injection
and the construction of local or hierarchical structures are among them. Several
solutions add substantial architectural complexity [13].
A self-attention mechanism can be integrated into CNNs to model long-range interactions,
although this is difficult because of the locality of convolutional kernels. Recent research
has discovered that a self-attention-only structure with no convolution works effectively
[14]. The original ViT beats convolutional networks but needs hundreds of millions of
images to train; however, such a data demand is not always practical. Data-efficient ViT
(DeiT) addresses this problem by incorporating distillation from a neural network teacher.
Despite its promise, this adds to the supervised training complexity, and the currently
reported performance on data-efficient benchmarks still falls short of convolutional
networks [10].
3 Proposed Method
The detailed workflow for the EfficientNet hierarchical vision transformer approach
is illustrated in Fig. 1. The given set of input images is preprocessed using various
image augmentations. The preprocessed image set is used to extract the features
from the EfficientNet, and then the hierarchical vision transformer block classifies
the images as real or fake.
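As a rough illustration of this two-stage pipeline, the Python sketch below extracts features with a pre-trained EfficientNetB0 and feeds them to a small classification head; the dense head is only a stand-in for the hierarchical Swin vision transformer block described here, and the input size, layer widths, and optimizer are illustrative assumptions rather than the authors' settings.

    import tensorflow as tf

    # EfficientNetB0 backbone used purely as a frozen feature extractor.
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    backbone.trainable = False

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = backbone(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # A dense head stands in here for the hierarchical (Swin) vision transformer
    # block; it is not a reproduction of the authors' classifier.
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # real vs. fake

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])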
3.1 Preprocessing
Image augmentation operations are carried out on the dataset to generate more data
to train the model. They include:
3.2 EfficientNet
Randomly selecting the network depth, width, or image resolution is a common tactic
to scale a CNN network for training and validation. This method involves tuning the
network manually and frequently produces sub-optimal results. EfficientNet [3] is a
systematic compound scaling method. This method appropriately resizes the network
width, depth, and resolution correctly. A compound scaling coefficient p is a hyper-
parameter used to scale coefficients of the network width w1 , depth d 1 , and image
resolution r 1 for available computational resources.
Depth: d = d1^p    (1)

Width: w = w1^p    (2)

Resolution: r = r1^p    (3)

The values d1, w1, and r1 are obtained using a small grid search over the network
parameters under the restriction d1 × w1² × r1² ≈ 2. This restriction imposes that, for any
newly chosen value of p, the total number of floating-point operations increases by
approximately 2^p times.
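A minimal Python sketch of the compound scaling rule in Eqs. (1)–(3) is given below; the base coefficients used in the example (d1 = 1.2, w1 = 1.1, r1 = 1.15) are the values reported in the original EfficientNet paper and are an assumption here, not values taken from this chapter.

    def compound_scale(d1, w1, r1, p):
        # Eqs. (1)-(3): scale depth, width, and resolution by the compound
        # coefficient p; d1 * w1**2 * r1**2 should be approximately 2.
        return d1 ** p, w1 ** p, r1 ** p

    # Example with the EfficientNet paper's base coefficients (an assumption here).
    depth, width, resolution = compound_scale(1.2, 1.1, 1.15, p=1)
    print(depth, width, resolution)        # 1.2 1.1 1.15
    print(1.2 * 1.1 ** 2 * 1.15 ** 2)      # ~1.92, i.e. close to 2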
Fig. 2 EfficientNetB0
The details of the experimental design, dataset, and results are presented in this
section.
The "140K Real and Fake Faces" dataset consists of 70,000 StyleGAN2-generated
counterfeit images and 70,000 real images collected from Flickr.com by Nvidia. The
dataset contains many high-quality face images with subjects of different sexes and ages,
and even real-world fake faces [8]. The dataset is split into train, test, and validation
subsets. The train set has 135,000 images, and the test and validation groups have 5000
images each. The input dataset has an equal number of labeled images belonging to the
two classes, "real" and "fake".
4.3 Discussions
5 Conclusion
In this work, we combined the capabilities of EfficientNet, a scalable convolutional
neural network (CNN), as the feature extraction algorithm, and the hierarchical SWIN
vision transformer for classification. On the validation set, the proposed model obtains an
accuracy of 98.04% and a validation loss of 0.1656. The method will be applied to
processing real-time image data in our future work.
References
Abstract Osteoarthritis (OA) is a chronic disease that has a large impact on people's
health. The earlier scoring methods and physical diagnosis processes require considerable
human involvement and time. This article develops an automatic OA diagnosis based on
convolutional neural network (CNN) architectures to help rheumatologists diagnose the
disease and plan treatments. The article covers various CNN architectures, namely
DenseNet121, VGG16, ResNet50, and InceptionV3, with and without augmentation of
the data used for RA diagnosis. By the end of the 50th epoch, InceptionV3 accuracy
reached 98.91% without data augmentation, with the least error of 1.65%; DenseNet121
reached 96.57% (training set). InceptionV3 reached 96.7% on validation, which indicates
that InceptionV3 has lower variance.
1 Introduction
Most of the patients suffering from OA are classified into different types of OA, such as
knee, hip, and spine osteoarthritis. OA can be diagnosed by doctors from physical testing
together with the medical images of the OA patient that are accumulated in hospitals.
Assessing OA in patients is, however, a very time-consuming process. Many articles have
considered the automatic detection of OA from images based on deep learning algorithms
[1–3]. However, besides medical images, health-behavior data can be collected in the form of
statistical data, which are easier to collect than medical images [4]. Taking the medical
image parameters into account to predict the occurrence of the various types of OA has a
significant impact on proactive and preventive medical care. Here, we use a deep neural
network to detect the occurrence of OA, using data obtained from statistics together with
medical and health-behavior data [5, 6]. Component analysis along with quantile-transformer
scaling was used to generate features from the patient's background record for identifying
the occurrence of OA.
OA is a leading cause of disability and imposes great social costs on aged citizens. With
increasing age and increasing obesity, OA is becoming much more widespread than in
earlier years [7, 8]. As time passes, there is a vast increase in the insight into the
originating pathogenesis of OA pain. Prevention and disease modification are the areas
targeted by many research endeavors that indicate huge potential [9]. OA is a prevalent
and incapacitating disorder with a significant and rising health burden and notable
implications for the affected individuals, the medical system, and wider socioeconomic
costs. A convolutional neural network (CNN), a machine learning technique based on
multilayer neural network structures, has been utilized for automatic OA detection
[10, 11]. CNN applications include automatic prediction of OA complications, early
detection, and the management of various stages of gout arthritis. Deep learning has also
been applied to analyzing medical images: DenseNet121 is a 121-layer CNN, and the
CheXNet network was trained on 10,000 X-ray images with ten different diseases; the
Xception and VGG-based models have been used for image classification. This work
includes a performance analysis of CNN architectures, namely ResNet50, DenseNet121,
VGG16, and InceptionV3, for the RA classification task. A correct prediction of OA is a
very essential step toward effectively diagnosing and preventing severe OA.
2 Literature Survey
Mora et al. [11] and Mandl et al. [12] collected knee joint images that were split for
training (∼60%) and testing (∼25%) according to the KL grade. To classify the images of
the knee joints, they extracted features from the fully connected, pooling, and convolution
layers of VGG16, VGG-M-128, and BVLC CaffeNet. For both binary and multi-class
classification, a linear SVM was trained separately on the obtained features. The
classification results achieved with the CNNs were compared with knee OA image
classification using Wndchrm. The classification accuracy computed using the
convolutional (conv4, conv5) and pooling (pool5) layer features was higher than that of
the fully connected layer features.
There was minimal variation in the classification accuracies obtained from the features
of the VGG-M-128 net and the BVLC reference CaffeNet compared with VGG16.
Antony et al. [13] presented multi-class classification results from fine-tuning the BVLC
CaffeNet and VGG-M-128 networks. The authors excluded the VGG16 network from
this experiment because the difference in accuracy from the pre-trained CNNs was too
small, and the fine-tuned VGG16 had a greater computational expense. The data were
divided into training (60%), validation (10%), and testing (30%) sets for fine-tuning. To
increase the total number of samples in the dataset, right-to-left flipped images of the
knees were added to the training set. The network was fine-tuned for 20 epochs using a
learning rate of 0.001 for the transferred layers, boosted by a factor of 10 in the newly
introduced layers. The performance of the fine-tuned BVLC CaffeNet is significantly
better than that of VGG-M-128.
A CNN architecture has an input layer, hidden layers, and an output layer. The input
layer represents the input image, all the features are learnt in the hidden layers, and the
result is obtained from the output layer. This architecture makes use of multiple
convolution, pooling, and activation layers. The convolution layer of a CNN is one of the
most important components for feature extraction. Filters are used to find various
properties at various levels by applying multiple filters of various kernel sizes to the input
image.
a. Image dataset: The X-ray data were collected from the ChanRe Rheumatology and
Immunology Center, Bengaluru, Karnataka, India. The data include normal and
RA-affected images. The dataset contains 398 radiograph images, comprising 180
normal and 168 OA images. The 398 images were divided into 278 images (~70%)
for training and 35 images (~9%) for validation. The images are of dimension
256 × 256. Figure 1 shows samples from the normal and OA dataset.
b. Data augmentation: Deep learning requires large datasets to produce accurate
predictions in the testing phase. Data augmentation is a necessary step to increase the
overall performance of the network, and it avoids the overfitting, irrelevant pattern
recognition, and memorizing conditions that occur during the training phase of the
network. All the algorithms are trained with both the original dataset and the
augmented dataset.
Initially, the images are resized to 256 × 256. The data augmentation operations include
width shift and height shift by 0.2, shear transformation by 0.02, zoom range by 0.17, and
rotation by 7 degrees. All the images are normalized to 256 × 256 (Table 1).
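A minimal Keras sketch of this augmentation pipeline is shown below; the augmentation values follow the parameters stated above, while the rescaling, batch size, and directory layout are illustrative assumptions.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rescale=1.0 / 255,        # assumed normalization
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.02,
        zoom_range=0.17,
        rotation_range=7,
    )

    train_gen = datagen.flow_from_directory(
        "dataset/train",          # hypothetical path
        target_size=(256, 256),
        batch_size=32,
        class_mode="categorical",
    )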
f(s_i) = 1/(1 + e^(−s_i))    (2)
Table 2 Description of performance analysis formulas

Performance parameter    Formula
Accuracy                 (TP + TN)/(TP + FN + TN + FP)
Loss                     (FP + FN)/(TP + FN + TN + FP)
Sensitivity              TP/(TP + FN)
Specificity              TN/(TN + FP)
Precision                TP/(TP + FP)
Recall                   TP/(TP + FN)
F1-score                 2 × (Precision × Recall)/(Precision + Recall)
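The formulas in Table 2 map directly onto a small helper function; the sketch below is an illustration of those definitions in Python, not code from the study.

    def performance_metrics(tp, tn, fp, fn):
        # Measures from Table 2, computed from confusion-matrix counts.
        total = tp + fn + tn + fp
        accuracy = (tp + tn) / total
        loss = (fp + fn) / total
        sensitivity = tp / (tp + fn)               # identical to recall
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return {"accuracy": accuracy, "loss": loss, "sensitivity": sensitivity,
                "specificity": specificity, "precision": precision,
                "recall": recall, "f1": f1}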
5 Conclusion
Our study is a comparative evaluation of various CNN architectures for RA disease
diagnosis. The CNN architectures used are DenseNet121, VGG16, ResNet50, and
InceptionV3. The networks were trained on the original and the augmented datasets for
RA detection. At the end of the 50th epoch, InceptionV3 accuracy reached 96.1% with
zero data augmentation at a low error of 1.65%, and DenseNet121 reached 96.57%
(training set). InceptionV3 reached 96.1% on the validation set, which indicates that
InceptionV3 has lower variance in comparison. The accuracy plots indicate that
InceptionV3 performs best on the training and validation sets for both augmented and
non-augmented data compared with the other models. Simultaneously, the loss plots show
the variation of the categorical cross-entropy loss for the different networks. InceptionV3
yields lower categorical cross-entropy loss on both augmented and non-augmented data
compared with the other models. The conclusion drawn is that InceptionV3 also performs
well on the different plots of F1-score, precision, recall, and specificity.
References
1. Lim J, Kim J, Cheon S (2019) A deep neural network-based method for early detection of
osteoarthritis using statistical data. Int J Environ Res Public Health 16(7):1281
2. Antony J, McGuinness K, O’Connor NE, Moran K (2016, Dec) Quantifying radiographic knee
osteoarthritis severity using deep convolutional neural networks. In: 2016 23rd International
conference on pattern recognition (ICPR). IEEE, pp 1195–1200
3. Kokkotis C, Moustakidis S, Papageorgiou E, Giakas G, Tsaopoulos DE (2020) Machine
learning in knee osteoarthritis: a review. Osteoarthr Cartilage Open 2(3):100069
4. Chen P, Gao L, Shi X, Allen K, Yang L (2019) Fully automatic knee osteoarthritis severity
grading using deep neural networks with a novel ordinal loss. Comput Med Imaging Graph
75:84–92
5. Saleem M, Farid MS, Saleem S, Khan MH (2020) X-ray image analysis for automated knee
osteoarthritis detection. SIViP 14(6):1079–1087
6. Awan MJ, Rahim MSM, Salim N, Rehman A, Nobanee H, Shabir H (2021) Improved deep
convolutional neural network to classify osteoarthritis from anterior cruciate ligament tear
using magnetic resonance imaging. J Pers Med 11(11):1163
7. Wahyuningrum RT, Anifah L, Purnama IKE, Purnomo MH (2019, Oct) A new approach to
classify knee osteoarthritis severity from radiographic images based on CNN-LSTM method.
In: 2019 IEEE 10th international conference on awareness science and technology (iCAST).
IEEE, pp 1–6
8. Glyn-Jones S, Palmer AJR, Agricola R, Price AJ, Vincent TL, Weinans H, Carr AJ (2015)
Osteoarthritis. The Lancet 386(9991):376–387
9. Chow YY, Chin KY (2020) The role of inflammation in the pathogenesis of osteoarthritis.
Mediators of inflammation, 2020
10. Hunter DJ, Bierma-Zeinstra S (2019) Osteoarthritis. The Lancet 393(10182):1745–1759
11. Mora JC, Przkora R, Cruz-Almeida Y (2018) Knee osteoarthritis: pathophysiology and current
treatment modalities. J Pain Res 11:2189
12. Mandl LA (2019) Osteoarthritis year in review 2018: clinical. Osteoarthritis Cartilage
27(3):359–364
13. Antony J, McGuinness K, O’Connor NE, Moran K (2016) Quantifying radiographic knee
osteoarthritis severity using deep convolutional neural networks. 1195–1200. https://doi.org/
10.1109/ICPR.2016.7899799
Efficient Motion Detection
and Compensation Using FPGA
Abstract Moving target detection plays a vital role in computer vision applications,
which require rigorous algorithms and high computation; realizing these algorithms in
real time is also difficult. Hence, in this paper, an FPGA implementation of moving
object detection with correction of the unwanted motion is proposed. The sub-components
of the architecture are optimized to obtain optimization of the entire module. Also, the
performance of the proposed technique is validated by calculating different performance
metrics, and the method is compared with some of the methods from the literature. The
experimental results illustrate that the approach is excellent at detecting moving objects
even when the camera is moving, and that less hardware is utilized to detect the moving
targets efficiently. The designed architecture is implemented on a Xilinx 14.5 Zynq Z7-10
series FPGA development board.
1 Introduction
Visual moving object detection and tracking have attained pronounced advances in the
past decades and have been used in many applications such as traffic monitoring and
control, security, surveillance, military, sports, and much more. Real-time computer
vision applications require rigorous algorithms and high computation; due to the limited
input and output capabilities and processing power, it is difficult to run these algorithms
on general-purpose computers. Hence, high-speed dedicated hardware development is
essential. There are many methods to determine the motion and compensate for the
unwanted motion that hinders the efficient detection of moving parts in video sequences.
Traditionally, many methods exist to determine the motion in a video sequence. VLSI
realization of these methods degrades the detection accuracy and is therefore not suitable
for hardware implementation. However, the full
search (FS) method exhibits a consistent data flow and is hence appropriate for hardware
realization. The FS block matching method is the most popular motion estimation
technique. Here, the block is divided into a finite number of sub-blocks, from which the
best match is detected to estimate the motion vector.
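To make the FS idea concrete, the Python sketch below performs a full search of a 16 × 16 block over a reference frame using the sum of absolute differences (SAD) as the matching cost; the search range is an illustrative assumption, and the sketch is a software analogue of the hardware described later, not the VHDL design itself.

    import numpy as np

    def full_search(ref_frame, cur_block, top, left, search_range=8):
        # Slide the current block over a (2*search_range + 1)^2 window in the
        # reference frame and keep the displacement with the minimum SAD.
        h, w = ref_frame.shape
        bs = cur_block.shape[0]
        best_sad, best_mv = np.inf, (0, 0)
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + bs > h or x + bs > w:
                    continue
                candidate = ref_frame[y:y + bs, x:x + bs].astype(np.int32)
                sad = np.abs(candidate - cur_block.astype(np.int32)).sum()
                if sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
        return best_mv, best_sad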
Contribution: This paper focuses mainly on the architecture level, and the contributions
are: (i) the "Controller" architecture is optimized by using a simple counter-based
architecture; (ii) the regular adder is supplanted with a high-speed Kogge–Stone adder
architecture; (iii) the use of a data reuse technique reduces the overall hardware
utilization.
Numerous methods are available in the literature to detect and compensate for motion;
some of them are discussed below. The authors of paper [1] used two steps to obtain the
motion vectors: a rough motion vector was obtained in the first step, and in the second
step, the search area was reduced to compute the vectors. Block matching with a
RANSAC approach to estimate the movement is proposed in [2] to compensate for
ego-motion; the authors also developed a prototype and implemented it on an FPGA.
FPGA-based motion estimation is explained in paper [3], where the correlation between
pixels in the reference and current frames is used for estimating the motion; further, to
reduce lighting issues, the authors used normalized correlation. In reference [4], two steps
are used for searching the blocks to estimate the motion: initially, using the reference
frame, a partial distortion measure is constructed, and it is extended further to find the
motion vectors. A review of various motion estimation methods is given in paper [5]. To
reduce the zonal improvement range, the authors of paper [6] used the wavelet transform
to obtain the starting point from the current frame and the reference frame; this improved
the TZ search by removing the data dependency. The authors of paper [7] claim that the
designed processing architecture uses a pipelining technique and achieves reduced
latency, high throughput, and complete utilization of the hardware.
The rest of the paper is organized as follows: a brief insight into the methodology of
designing the architecture and its sub-components is presented in Sect. 2, followed by the
results of FPGA implementation and simulation in Sect. 3. Finally, Sect. 4 concludes with
some remarks.
2 Architecture Framework
Figure 1 illustrates the architecture proposed to estimate the motion in video sequences.
The current block, with a block size of 16 × 16, is stored in the external memory from
which the motion vector is estimated. The pixel data of the computational zone and the
current block are directed to the DEMUX unit, which further distributes these data pixels
to three different memory units called SUBM1, SUBM2, and SUBM3. The pixel values
stored in these memory units are used for finding the motion using the sum of absolute
differences (SAD). Further, motion correction and compensation are accomplished using
the compare and correct module. For the next frame, the motion vectors stored in the
memory as motion features are taken as reference. Moreover, the data path of the
complete system framework is controlled by the controller module.
The raw video data are not compatible with processing in hardware because of the
various formats of video frames. Hence, it is necessary to convert the raw video
sequences into a fixed number of frames of standard size. In this work, MATLAB and
System Generator are used for the conversion. After preprocessing, the description of
each module is given below.
The controller in this architecture regulates the data flow by enabling or disabling the
appropriate module. The operation starts by setting the select line of the DEMUX to zero,
which in turn directs the pixels of the first sub-matrix to SM1. The absolute difference
calculation for the current zone begins after 16 clock cycles. Next, the SM2 block is
selected by the controller through the DEMUX. Up to the 49th clock pulse, the absolute
differences are estimated for the two sub-matrices, and the final absolute value
computation starts at the 50th clock pulse. The logic is realized using a state machine-based
counter. The corresponding architecture is shown in Fig. 2, which comprises a counter
unit, a decision unit, and an encoder unit. The operation of each unit is discussed below.
When the reset signal goes high, at every positive clock edge,
the counter unit starts to count. The decision unit then chooses the proper categorization
of sub-blocks to be enabled. Finally, the encoder unit encodes the units to enable the
processing components in the proper order. The controller unit is then initialized to its
preliminary state. After calculating the motion vectors corresponding to three successive
16 × 16 sub-matrices, the process repeats from the beginning.
By using the parallel architecture of the Kogge–Stone adder, the absolute difference is
computed in binary arithmetic in this work, thereby improving the processing speed.
The motion recognition module comprises the absolute difference, adder array, and
decision units, through which the motion vectors are estimated to find the actual
movements present in the video sequences. The method adopted for the absolute
difference calculation is illustrated in Fig. 3. Here, the entire architecture is optimized
owing to the Kogge–Stone adder.
Subtraction is performed through addition using binary arithmetic, from which the sign
of the value is determined using the concatenation module. Based on this value, the MUX
sends either the data or the data in 2's complement form. Here, the Kogge–Stone adder is
used to compute the 2's complement, thereby optimizing the architecture. All the matched
motion vectors stored in the memory are motion vectors obtained from overlapping blocks
using the comparator. The memory is composed of three sub-parts, namely SUBM1,
SUBM2, and SUBM3. Through the select line of the DEMUX module, the pixels enter
the memory 16 bits at a time, row by row from top to bottom. The architecture of the local
memory is given in Fig. 4.
The estimated motion vectors stored in the memory are compared to determine the
vectors that represent the motion of the object in the video. The false detections from this
technique are removed by using the correction module, which is given in Fig. 5. A block
of memory is used for storing the motion vectors obtained from SAD. After comparing
the motion vectors using the comparator, the vectors are stored in separate modules,
which are then interpolated using the interpolation array to detect the correct motion
vector. Moreover, the controller module controls the entire data flow of the architecture.
In this work, the accuracy of detection is demonstrated with two different scenarios of
traffic videos. The proposed approach is synthesized using a Xilinx Zynq Z7-10 series
FPGA board and is coded in VHDL. Figure 6 demonstrates the detection of moving
objects from the normal traffic flow. Similarly, Fig. 7 shows the detection of moving
vehicles from a moving camera. The unwanted motion instigated by the movement of the
camera, which may be placed on a moving platform, is corrected using the compare and
correction unit given in Fig. 5. The performance metrics of the rate of true detection (TR),
rate of false detection (FR), and moving objects not detected (NR) are calculated from
randomly selected frames and are given in Table 1.
Table 1 Illustration of performance metrics

Scenarios        TR (%)    FR (%)    NR (%)
Normal traffic   93.78     3.97      1.5
Moving camera    92.93     4.714     2.14
The experimental results obtained are compared with the methods from the literature
and are listed in Table 3 in terms of the hardware used (Table 4).
4 Conclusion
This paper proposes an FPGA-based hardware framework to estimate and compensate
for motion. In order to detect only the moving objects in a video frame, the method has to
remove the motion induced by the camera. The proposed method detects and compensates
for the unwanted movements. Also, to achieve optimized hardware utilization, different
techniques are used in the sum of absolute differences (SAD), controller, and memory
blocks. The Kogge–Stone adder is used to optimize the sum of absolute differences
calculation and the modified absolute difference block. Basic logic elements are used to
optimize the comparator and compensation blocks. Further, the controller module is
optimized by adopting counter-based operation using FSM modeling. Finally, the entire
framework is tested and synthesized on a Xilinx FPGA development board.
References
1. Chatterjee SK, Vittapu SK (2019) An efficient motion estimation algorithm for mobile
video applications. In: 2019 Second international conference on advanced computational and
communication paradigms (ICACCP)
2. Tang JW, Shaikh-Husin N, Sheikh UU, Marsono MN (2016) FPGA based real-time moving
target detection system for unmanned aerial vehicle application. Int J Reconfig Comput
3. Viorela Ila V, Garcia R, Charot F, Batlle J (2004) FPGA implementation of a vision-based
motion estimation algorithm for an underwater robot. In: Becker J, Platzner M, Vernalde S
(eds) Field programmable logic and application. FPL 2004. Lecture notes in computer science,
vol 3203. Springer, Berlin, Heidelberg
4. Paramkusam V, Reddy VSK (2014) A novel block-matching motion estimation algorithm
based on multilayer concept. In: 2014 IEEE international conference on multimedia and expo
(ICME)
5. Bnadou R, Hiramori M, Iwade S, Makino H, Yoshimura T, Matsuda Y (2016) A study on
motion estimation algorithm for moving pictures. In: 5th IEEE global conference on consumer
electronics, pp 1–3
6. Pakdaman F, Hashemi MR, Ghanbari M (2020) A low complexity and computationally scalable
fast motion estimation algorithm for HEVC. Multimed Tools Appl, Springer, vol 79: 11639–
11666
7. Singh K, Shaik RA (2015) A new motion estimation algorithm for high efficient video coding
standard. In: Annual IEEE India conference, 2015
Abstract Blockchain technology is used throughout the world for digital ledgers of
transactions. To maintain all participant transactions on a blockchain, distributed ledger
technology (DLT) is used. Blockchain technology has become a popular method of
transferring huge amounts of money due to the pandemic situation. A hacker may exploit
some part of a chain, a smart contract, or an exchange to steal cryptocurrency, that is,
commit a hack or a theft. Such attacks are referred to as crypto-jacking attacks. The
Wormhole cryptocurrency platform was hacked in February 2022, resulting in a loss of
$326 million. Many cryptocurrency Web pages are being hacked every day. Most of the
attacks have a financial motive, so attacks on banking have risen globally. To identify
crypto-jackers and to trace the hijackers, we propose a new technique.
T. Subburaj
Department of Computer Applications, Rajarajeswari College of Engineering, Bangalore, India
e-mail: [email protected]
K. Shilpa (B) · S. Sultana
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
S. Sultana
e-mail: [email protected]
K. Suthendran
Department of Information Technology, Kalasalingam Academy of Research and Education,
Krishnan Koil, Srivilliputhur, Tamilnadu, India
e-mail: [email protected]
M. Karuppasamy
Department of Computer Applications, Kalasalingam Academy of Research and Education,
Krishnan Koil, Srivilliputhur, TamilNadu, India
S. Arun Kumar
Department of Computer Science and Engineering, Bethesda Institute of Technology and Science,
Gwalior, India
A. Jyothi Babu
Department of MCA, Sree Vidyanikethan Engineering College, Tirupati, India
1 Introduction
2 Crypto-Jacking Attack
Crypto-jacking has become increasingly common across the world today, and criminals
are constantly learning new methods. In September 2017, cybercriminals began launching
crypto-jacking attacks [3].
The most common methods cybercriminals use to steal currency from cryptocurrencies
are file-based, browser-based, and cloud-based crypto-jacking. Crypto-jacking attacks are
unique types of attacks in the crypto sphere. An attack of this kind involves hackers
creating a fake environment around a particular block in a blockchain so that they can
manipulate the artificial node in order to commit crimes.
To launch a crypto-jacking attack on the Ethereum network, there are generally two
methods: one is to establish TCP connections from an attacker's malicious nodes to the
victim before the victim can establish TCP connections itself; the other is to own the
victim's table and then crypto-jack it. A victim-centric crypto-jacking attack framework is
designed in light of these attack methods. The crypto-jacking framework defines four
states for nodes, and based on the change in state, we can determine whether a node is
currently under a crypto-jacking attack [4]. Figure 1 shows the crypto-jacking model.
3.1 Running
The running state means that a node has already been running for at least the last 24 h.
Every node maintains a database and a table. The database of every node is maintained
through the ping and pong messages. In the table, every node automatically fills in the
information about the SHA3 features.
3.2 Reborn
When a crash or recovery occurs in a node, the node is rebooted. After rebooting, the
node's information is deleted from the table. As a result, an attacker is able to initiate
connections or carefully crafted packets to a rebooting victim once the node has been
accessed. It may be best to collect malicious packets at this time [5].
3.3 Submerge
The submerge state occurs when an attacker establishes Maxpeers TCP connections from
its own adversarial nodes to the victim [6]. The victim is forced to set all connections to
incoming at this point.
3.4 Poisoned
A table is poisoned when the attacker inserts a crafted nodeID into the victim's table.
There is a high probability that the victim forms all of its outgoing connections to the
attacker's nodes.
An active crypto-jacking attack shows the state changes. Node A is in the running state
when it has been active for a while. An attacker B needs to send many ping requests to
node A to launch an attack. After victim A becomes reborn for some reason (that is, it
reboots), its attack probability increases. Depending on the configuration of A's
connections, it will either be submerged or poisoned. A may enter the submerge state if it
does not create outgoing TCP connections. By contrast, a crafted message from attacker B
will poison A's table.
The ground-truth data needed for building our detection models must be collected
beforehand [7]. A Geth v1.6.6 client is designated as the victim. First, we gather packets
from normal access connections; then we write an attack script that attempts to send ping
repeatedly to the target. After the victim reboots, packet collection begins and continues
until all connections of that victim are occupied or the victim's table is filled with node
entries from our node.
4 Proposed Approaches
The main objective of the proposed system is to identify attacks by cybercriminals on
cryptocurrency networks.
To detect the crypto-jacking attack, we developed a tool based on arbitrary forest
decision sorting. Attackers continually send requests to victim nodes. The victim nodes
collect all UDP packets from the spontaneous nodes. We analyze the incoming packets of
the crypto-jacking attack.
H(X) = − Σ_{i=1}^{n} p(x_i) log₂(p(x_i))    (1)

Equation (1) is used to find the entropy value of X, where p(x_i) is the probability of x_i.
The entropy of X given Y is

H(X/Y) = − Σ_{j=1}^{m} p(y_j) Σ_{i=1}^{n} p(x_i/y_j) log₂(p(x_i/y_j))    (2)

Flow of Data = − Σ_{j=1}^{m} p(d_j) Σ_{i=1}^{n} p(S_i/d_j) log₂(p(S_i/d_j))
             = − Σ_{j=1}^{m} (A_j/S) Σ_{i=1}^{n} (B_ij/A_j) log₂(B_ij/A_j)    (3)
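A small Python sketch of Eq. (3) is shown below. It assumes B is an n × m matrix of packet counts, with B[i, j] the number of packets from source i observed in data flow j, A_j the per-flow totals, and S the grand total; these variable meanings are inferred from the form of Eq. (3) rather than stated explicitly in the text.

    import numpy as np

    def flow_of_data_entropy(B):
        # Eq. (3): - sum_j (A_j / S) * sum_i (B_ij / A_j) * log2(B_ij / A_j)
        B = np.asarray(B, dtype=float)
        A = B.sum(axis=0)          # per-flow totals A_j
        S = A.sum()                # grand total S
        total = 0.0
        for j in range(B.shape[1]):
            if A[j] == 0:
                continue
            p = B[:, j][B[:, j] > 0] / A[j]     # p(S_i / d_j) = B_ij / A_j
            total -= (A[j] / S) * np.sum(p * np.log2(p))
        return total

    # Example: 3 sources observed across 2 data flows.
    print(flow_of_data_entropy([[10, 1], [10, 1], [10, 98]]))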
The following features are used to easily identify the attacks: packets_size,
access_frequencies, and access_time [9, 10].
c. Arbitrary Forest Sorting Method
Machine learning algorithms such as arbitrary forest sorting improve detection accuracy
without causing significant computational complexity. In addition to reducing overfitting
and variance, the arbitrary forest can resolve several problems associated with decision
trees.
I. AFS Model Training process
Algorithm 1: AFS model training process
Input: T, p, k    Output: DT
for i = 1 to n
    T' = withResample_Sample(T, p);
    Att = get_Attributes(T');
    Att' = withoutResample_Sample(Att, k);
    T'' = remain_Attributes(T', Att');
    DT[i] = create_DecisionTree(T'');
return DT
II. AFS Model Classification
After the training process, each test sample passes through each tree until a certain leaf
node is reached; then the probability of the sample is calculated. The AFS model relies on
majority voting for classification. The test sample set is E = [e1, e2, …, em], the decision
trees are DT = [dt1, dt2, …, dtn], CIR is an array of n class-index counters, the class set is
C = [c1, c2, …, cn], and the classification results are CR = [CR1, CR2, …, CRm].
for j = 1; j ≤ n; j++ do
    CIR[j] = 0;
for j = 1; j ≤ n; j++ do
    classifyResultIndex = classify(E[i], DT[j]);
    CIR[classifyResultIndex]++;
maxIndex = getMaxAppeared(CIR);
CR[i] = C[maxIndex];
return CR;
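Since the experiment section states that the data are classified with a random forest built in scikit-learn, a minimal sketch of that step is given below; the synthetic feature matrix merely stands in for the real per-packet features (packets_size, access_frequencies, access_time), and the 3:7 test/train split follows the experiment description.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score

    # Placeholder data standing in for the real packet features and labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))       # packets_size, access_frequencies, access_time
    y = rng.integers(0, 2, size=1000)    # 0 = normal, 1 = crypto-jacking attack

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)   # 3:7 test/train split

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(precision_score(y_test, pred, zero_division=0),
          recall_score(y_test, pred, zero_division=0))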
5 Our Experiment
Throughout this section, we illustrate how accurate and effective our crypto-jack
detector is at detecting crypto-jack attacks on Ethereum networks. We followed four
basic steps in our experiment.
a. Collection
We collect the normal incoming packets during the running state and attempt to
crypto-jack the victim by sending ping repeatedly. As soon as the victim reboots, we begin
collecting crypto-jack attack packets until our nodes have occupied all of the victim's
incoming connections and its table is filled. Wireshark is used at this stage to collect the
UDP packets from the victim. Figure 3 illustrates how sample data are captured with
Wireshark.
Additionally, we added the Ethereum devp2p protocol dissector plug-in to Wireshark
for analysis of the collected UDP packets.
b. Preprocessing
The collected data consist of ping, pong, findnode, and neighbors packets. We decode
the pcap file of captured Ethereum packets into a readable format using an Ethereum
UDP packet dissector for discovery protocol v4. The decoded data are shown in Fig. 4.
Each packet has a packet type, a destination IP address, and a source IP address. Attack
traffic samples are sampled every 5 ms for 25 ms, with a single increase of 5 ms, while
background data samples are sampled every 5 s. From the five sets of data obtained, a
sample sequence consists of 100 continuous samples, so there are 20 sequences in total.
c. Training Model
Initially, we analyze the distribution of UDP packets in the two states using statistical
analysis, as illustrated in Fig. 5. Malicious packets have a different size distribution than
honest packets. An attacker must ping the victim many times to eclipse an honest node. In
comparison, packets of the types findnode, neighbors, ping, and pong contain less data
information; as a result, their sizes are distributed differently.
Figure 6 shows a higher complexity of attack access. Normally, short-connection
access is built for shorter connections. The attacker may wait longer for the victim to
respond with a pong when the victim cannot do so on time.
In Fig. 7, the chart shows that a node under eclipse attack experiences a much higher
visit frequency. Eclipse works by repeatedly sending ping requests to a victim; this is an
indicator that a victim is being eclipsed. Our data are classified using a random forest with
these features.
d. Detection
The data were prepared using the statistical distribution of the UDP data. Sklearn is
used to build our detection model, and the collected data are split into test and training
sets in a 3:7 ratio. As adversarial nodes connect to our node through UDP, it reboots
several times. Detecting eclipse attacks with high probability allows adversary connection
requests to be identified.
Our detection rate is quite high in practice, with a precision of 72% and a recall of 93%
(Table 1). According to the experimental results, a third of the attacked data can hit its
ground label. Most of the attack packets can be blocked by our detection model, as more
than 90% of malicious data can be correctly identified.
6 Conclusion
7 Future Works
References
1. https://101blockchains.com/blockchain-security-issues/
2. https://www.digitalshadows.com/blog-and-research/cryptocurrency-attacks-to-be-aware-of-
2021/
3. https://www.varonis.com/blog/cryptojacking/
4. Locher T, Mysicka D, Schmid S, Wattenhofer R (2010) Poisoning the Kad network. Lecture
notes in computer science book series (LNCS) distributed computing and networking, vol 5935,
pp 195–206
5. Xu G, Liu J, Lu Y, Zeng X, Zhang Y, Li X (2018) A novel efficient MAKA protocol with
desynchronization for anonymous roaming service in global mobility networks. J Netw Comput
Appl 107:83–92
6. Marcus Y, Heilman E, Goldberg S (2018) Low-resource eclipse attacks on Ethereum’s peer-
to-peer network. Cryptology ePrint Archive, 236
7. Chen S, Xue M, Fan L, Hao S, Xu L, Zhu H, Li B (2018) Automated poisoning attacks and
defenses in malware detection systems: an adversarial machine learning approach. Comp Secur
73:326–344
8. Subburaj T, Suthendran K, Arumugam S (2017) Statistical approach to trace the source of
attack based on the variability in data flows. In: ICTCSDM 2016, Lecture notes in computer
science, LNCS 10398. Springer, pp 392–400
9. Qiang Z, Wang Y, Song K, Zhao Z (2021) Mine consortium blockchain: the application research
of coal mine safety production based on blockchain. Secur Commun Netw 2021, Article ID
5553874. https://doi.org/10.1155/2021/5553874
10. Wu D, Xiang Y, Wang C (2018) Data protection technology for information systems based on
blockchain. J Command Control 4(3)
Automated Detection for Muscle Disease
Using EMG Signal
Abstract Muscle disease is a term used to describe illnesses that affect the human
muscle system. To diagnose muscle diseases such as myopathy and amyotrophic lateral
sclerosis (ALS), specialists examine EMG signals. This manual method is a
time-consuming procedure and needs specialized skills. In this paper, we propose an
automated detection technique for the same. The proposed algorithm uses the Fourier
decomposition method (FDM) and classifiers such as ensemble subspace k-nearest
neighbour (KNN) to distinguish ALS and myopathy EMGs from normal EMG signals
and obtains 92.3% accuracy for the ALS versus myopathy versus normal case.
1 Introduction
the cause may even be genetic or hereditary in some cases. Muscle weakness, palsy, and
loss of brain control over muscles are potential consequences of ALS. Another condition
related to the muscle fibres is myopathy, a term used to describe disorders that affect
muscular tissue and lead to muscle weakness, inflammation, spasms, and cramping. If
ignored, all these conditions may deteriorate over time. Hence, prompt diagnosis, medical
intervention, and care are strongly advised.
A common diagnostic tool for these diseases is electromyography (EMG). EMG
measures the cumulative effect of the action potentials generated by the contraction and
expansion of skeletal muscles and is a good source of information regarding muscle
activity; hence, it is very useful for the diagnosis of conditions related to muscle.
An EMG signal is made up of a number of motor unit action potentials (MUAPs). An
EMG pertaining to ALS shows an overall reduction in the EMG amplitude associated
with fasciculation and fibrillation potentials. There is also prolonged distal motor latency
and slowed conduction velocity. There may also be sharp wave potentials, as shown in
Fig. 1a. In muscular dystrophy or myopathy, the EMG shows motor unit potentials that
are prolonged and associated with polyphasia. There is also a reduction in the amplitude
of the EMG waves, as shown in Fig. 1b, whereas, as shown in Fig. 1c, normal muscle
EMG has no fasciculations or fibrillations, and continuous muscle activity can be seen
with normal contraction and relaxation.
2 Proposed Methodology
The data set used for training the classifiers is acquired from the online repository of
EMGlab N2001 at http://www.emglab.net [14]. This is a repository of clinical signals
which is divided into three major subsets: normal, ALS, and myopathy. The ALS group
consisted of eight participants, four male and four female, aged from 35 to 67 years. The
myopathy group contained seven subjects, two men and five women, within the age
bracket of 19–63 years. The normal group consisted of ten subjects, six men and four
women, aged 21–37 years, with no history or signs of neuromuscular illness. A
conventional concentric needle electrode was employed. The EMG signals obtained vary
in the location from where they are taken and the level of needle insertion (low, medium,
or deep insertion).
2.2 Pre-processing
The EMG measuring equipment's electrical network and electronic components may
introduce noise, which degrades the quality of the digitized EMG signal. To nullify these
artefacts, a pre-processing step is performed, which consists of processing the signal
using a cascaded filter bank designed to remove each type of noise. AC power line
interference is dealt with using a notch filter with a cut-off frequency of 50 Hz. Another
noise which might be introduced is due to involuntary movement of the subject. This type
of noise introduces oscillations in the form of a low-frequency baseline. Suppression of
these baseline oscillations is done before decomposition. The denoised EMG data are
then segmented to create more samples and enhance the size of the data set, as also done
in [20]. Here, we have considered a 0.5 s non-overlapping window to obtain the required
segments.
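A minimal Python/SciPy sketch of this pre-processing step is given below; the sampling rate, notch quality factor, and synthetic test signal are illustrative assumptions, while the 50 Hz notch and 0.5 s non-overlapping segmentation follow the description above.

    import numpy as np
    from scipy.signal import iirnotch, filtfilt

    def remove_powerline(emg, fs, f0=50.0, q=30.0):
        # Suppress 50 Hz AC power-line interference with a notch filter.
        b, a = iirnotch(w0=f0, Q=q, fs=fs)
        return filtfilt(b, a, emg)

    fs = 2000                                   # assumed sampling rate (Hz)
    t = np.arange(2 * fs) / fs                  # 2 s synthetic EMG-like signal
    emg = np.random.randn(t.size) + 0.5 * np.sin(2 * np.pi * 50 * t)

    denoised = remove_powerline(emg, fs)
    segments = denoised.reshape(-1, fs // 2)    # 0.5 s non-overlapping windows
    print(segments.shape)                       # (4, 1000)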
The signal is decomposed using FDM into orthogonal intrinsic band functions (FIBFs).
For a detailed discussion of FDM, refer to [18]. The following features are computed
from each FIBF:
1. Maximum amplitude of the EMG signal in one frame.
2. Minimum amplitude of the EMG signal in one frame.
3. Variance = ((1/N) Σ_{n=0}^{N−1} (s[n] − μ)²)^{1/2}.
4. Kurtosis = Σ_{n=0}^{N−1} ((s[n] − μ)/σ)⁴, where μ and σ denote the mean and
   variance of s[n], respectively.
5. Entropy = − Σ_{n=0}^{N−1} p(s[n]) log₂(p(s[n])), where p(s[n]) is the discrete
   probability of the signal s[n].
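The features above translate into a short NumPy routine, sketched below for a single FIBF frame; the histogram-based estimate of the discrete probability p(s[n]) (and its bin count) is an assumption, and the "variance" feature keeps the 1/2 exponent exactly as written in item 3.

    import numpy as np

    def fibf_features(s, bins=64):
        # Features 1-5 computed from one FIBF frame s[n].
        mu = s.mean()
        max_amp, min_amp = s.max(), s.min()
        variance = np.sqrt(np.mean((s - mu) ** 2))      # item 3 (with the 1/2 exponent)
        sigma = variance
        kurtosis = np.sum(((s - mu) / sigma) ** 4)      # item 4
        hist, _ = np.histogram(s, bins=bins)            # assumed probability estimate
        p = hist[hist > 0] / hist.sum()
        entropy = -np.sum(p * np.log2(p))               # item 5
        return max_amp, min_amp, variance, kurtosis, entropy

    print(fibf_features(np.random.randn(1000)))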
Here, we have used different machine learning algorithms, including support vector
machines (SVMs) with linear, quadratic, cubic, and Gaussian kernels, k-nearest neighbour
(kNN), and ensemble methods including ensemble bagged trees and ensemble subspace
kNN. For a detailed discussion of the classifiers, refer to [13]. The performance of these
algorithms is compared to select the best classifier for the proposed methodology. Here,
we have used tenfold cross-validation as the data set used is small.
In this section, we present the results obtained for muscle disease detection using the
proposed algorithm. The data set discussed in Sect. 2 has been used here. The simulations
have been carried out in MATLAB 2021b to obtain the results presented in this section.
We have developed muscle disease detection models for four classification tasks, namely
ALS versus CO, myopathy versus CO, ALS versus myopathy, and ALS versus myopathy
versus CO. In Table 1, the performance of different machine learning algorithms is
compared for the four classification tasks mentioned above.
The data set used in this work included EMG signals collected from different muscles.
Among these, for four of the muscles, namely Vastus Medialis, Tibialis Anterior,
Deltoideus, and Biceps Brachii, data are available for both ALS and myopathy. In order
to select the most discriminative signal, we compare the classification performance of
ALS versus myopathy for each of these muscles, as given in the data set [14]. It appears
from the results that the Vastus Medialis muscle group, which is part of the quadriceps
located in the front of the thigh, shows the best performance. However, it is pertinent to
mention here that, since the data present for each muscle are very small in size, our
results are not conclusive (Table 2). We now present the performance of the ESKNN
classifier for each feature in Table 3. As can be observed from the table, maximum and
minimum amplitude values are relevant features for myopathy versus CO and ALS versus
CO. This may be due to the fact that in ALS, the amplitude of the MUAPs increases by a
large margin, as shown in Fig. 1a, whereas for myopathy, these values decrease, as shown
in Fig. 1b. Since the ALS MUAPs have both high positive and negative amplitude values
compared with myopathy and control, variance as a feature works better for
distinguishing these muscle diseases. It can also be noted that the performance of kurtosis
is not as good as that of the other features. Finally, we compare our results for the four
classification tasks, namely ALS versus CO, ALS versus myopathy, myopathy versus CO,
and ALS versus CO versus myopathy, with the literature in Table 4. The proposed
algorithm performs better for all these tasks, as can be seen in the table.
4 Conclusions
We proposed a machine learning model for NMD detection using FDM and ensemble
subspace KNN. Results are shown for binary (ALS vs. CO, ALS vs. myopathy, and
myopathy vs. CO) and ternary (ALS vs. CO vs. myopathy) classes. Our model classifies
NMD with accuracies of 95.2% for ALS versus CO, 96.4% for ALS versus myopathy,
93.7% for myopathy versus CO, and 92.3% for ALS versus myopathy versus CO. In
future, we would like to develop a subject-independent methodology in which the
subjects involved in training the model are not used for testing. Also, we would like to
use control subjects' data collected from different muscles and develop a
muscle-independent data set. We will aim to obtain an improved algorithm with better
performance metrics.
References
1. Belkhou A, Achmamad A, Jbari A (2019) Classification and diagnosis of myopathy emg signals
using the continuous wavelet transform. In: 2019 scientific meeting on electrical-electronics &
biomedical engineering and computer science (EBBT). IEEE, pp 1–4
2. Doulah ASU, Iqbal MA, Jumana MA (2012) Als disease detection in emg using time-frequency
method. In: 2012 international conference on informatics, electronics & vision (ICIEV). IEEE,
pp 648–651
3. Doulah A, Fattah S (2014) Neuromuscular disease classification based on mel frequency cep-
strum of motor unit action potential. In: 2014 international conference on electrical engineering
and information & communication technology. IEEE, pp 1–4
4. Dubey R, Kumar M, Upadhyay A, Pachori RB (2022) Automated diagnosis of muscle diseases
from emg signals using empirical mode decomposition based method. Biomed Signal Process
Control 71:103098
5. Fatimah B, Javali A, Ansar H, Harshitha B, Kumar H (2020) Mental arithmetic task classifica-
tion using Fourier decomposition method. In: 2020 international conference on communication
and signal processing (ICCSP). IEEE, pp 0046–0050
6. Fatimah B, Preethi A, Hrushikesh V, Singh BA, Kotion HR (2020) An automatic siren detec-
tion algorithm using Fourier decomposition method and MFCC. In: 2020 11th international
conference on computing, communication and networking technologies (ICCCNT), pp 1–6.
https://doi.org/10.1109/ICCCNT49239.2020.9225414
7. Fatimah B, Singh P, Singhal A, Pachori RB (2020) Detection of apnea events from ecg segments
using Fourier decomposition method. Biomed Signal Process Control 61:102005
8. Fatimah B, Singh P, Singhal A, Pachori RB (2021) Hand movement recognition from semg
signals using Fourier decomposition method. Biocybern Biomed Eng 41(2):690–703
9. Fatimah B, Singh P, Singhal A, Pramanick D, Pranav S, Pachori RB (2021) Efficient detection
of myocardial infarction from single lead ecg signal. Biomed Signal Process Control 68:102678
10. Istenič R, Kaplanis PA, Pattichis CS, Zazula D (2010) Multiscale entropy-based approach
to automated surface emg classification of neuromuscular disorders. Med Biol Eng Comput
48(8):773–781
11. Joshi D, Tripathi A, Sharma R, Pachori RB (2017) Computer aided detection of abnormal emg
signals based on tunable-q wavelet transform. In: 2017 4th international conference on signal
processing and integrated networks (SPIN). IEEE, pp 544–549
12. Mishra VK, Bajaj V, Kumar A (2016) Classification of normal, als, and myopathy emg signals
using elm classifier. In: 2016 2nd international conference on advances in electrical, electronics,
information, communication and bio-informatics (AEEICB). IEEE, pp 455–459
Abstract This paper presents an IoT-based monitoring system for drowsiness detection
for automotive drivers in real time. The proposed system undergoes three levels of
drowsiness detection to monitor driver drowsiness and alert the driver as and when
required. The process begins with alcohol detection as a safety precaution; if alcohol is
not sensed, the system proceeds further to detect the face, else the engine turns off.
Initially, the driver's face is captured and trained using the Haar cascade classifier, and the
AdaBoost algorithm is used to select the meta-data in Haar-like features. The proposed
system detects only the authorised driver's face and estimates the eye closure rate, which
is captured through the live streaming video from the Pi camera. In level 1, if the
eye-aspect ratio is below the threshold value, a sound alert is generated. In level 2, if the
sound alert is prolonged for more than two times, a human voice alerting system is
enabled, and in the final level, a notification with the GPS location is sent to the vehicle's
owner or any concerned person. The continuously retrieved data are stored in a log file.
The system uses infrared light to detect driver drowsiness at night-time.
Abbreviations
1 Introduction
2 The Methodology
be sent to the owner of the vehicle. Detection of the driver's eyes is determined by the
Haar cascade frontal eye detection classifier [12]. The state of the eye is classified as
open, closed, or drowsy based on the eye coordinates. Drowsiness of the driver is
determined by computing the eyelid closure, which is based on the eye-aspect ratio
(EAR) of both eyes [13].
If the EAR value descends below the threshold value and eyelid closure [14, 15]
occurs more than twice, Level 1 drowsiness is detected and the driver is alerted by the
sound alerting system. Subsequently, in Level 2, if the alerting system is enabled more
than twice, a human voice message is generated. In Level 3, if the voice message is
generated more than once, an SMS and an e-mail are sent to the vehicle's owner.
Simultaneously, the parking light is enabled and the engine is turned off. The
continuously retrieved data are stored in a log file containing the various levels of
drowsiness detected with the current date and time. The system is also used for vehicle
theft detection: if any person apart from the concerned person tries to drive the vehicle,
an alerting sound, an e-mail, and a message notification are sent to the vehicle's owner.
Once the driver gets seated in the vehicle, the MQ-3 (Mı̆ngăn Qı̌lai) alcohol sensor, which
is placed near the driver's seat, detects the existence of alcohol gases. If the value ranges
from 0.05 to 10 mg/L, alcohol is detected; otherwise, alcohol is not sensed. Based on the
alcohol sensor readings, the system allows the driver to start the vehicle's engine;
otherwise, it remains in the idle state.
The face is detected, and images are captured through the camera. The detected face is
verified against the trained faces, which were captured earlier, and only the authorised
driver's face is detected.
Haar cascade classification is a machine learning algorithm where positive and negative
images are used to train the classification classifier [15]. The positive images contain the
objects that have to be detected, and the negative images contain images other than the
positive ones [16]. It is essential to have fast and precise detection of the face. The Haar
cascade classifier is based on the technique of the "Haar wavelet" in order to determine
the pixels in the image [17]. Initially, Haar features are obtained by considering the
corresponding rectangular regions in the sliding window at specific locations. This
measures the intensities of the pixels in each area and assesses the difference between
these quantities. It uses the concept of the "integral image" to analyse the "features" [18].
An integral image is an image in which, along both the horizontal and vertical axes, we
get the cumulative addition of the intensities of the preceding pixels. The Haar cascade
uses the AdaBoost learning algorithm to extract the appropriate features from a large
collection to produce an effective classification result [19]. A cascading technique is used
to identify the face in an image and remove images that are insignificant. This reduces the
number of weak classifiers and increases the detection speed [20].
The system is trained with the authorised driver's face; the training module collects
pictures of the authorised driver with various face orientations through the camera and
trains the system using the Haar cascade classifier.
In Fig. 2, image samples are collected to train the face with different orientations using
the Haar cascade. The adaptive boosting ("AdaBoost") algorithm is used to select the
important features from the dataset to produce an effective result from the classifiers.
Several weak classifiers are trained on the same training set. The strong classifier is made
up of the previous weak classifiers, which are jointly boosted. The efficient classifier has
a greater capacity to identify the face [21, 22]. In order to process the data using the
AdaBoost algorithm, we need:
Quality data: this tries to correct misclassifications in the training data.
Outliers: to rule out any unrealistic observations.
Noisy data: to isolate the required data from the unwanted data [23].
This technique of cascading detects the face in an image and discards irrelevant
images. These images help the system to detect the faces in real-time even with
different head orientations. The eye is detected from the upper half region of the face [24]. The driver's face is captured through the live streaming video with various face orientations, and training of the face is done using the Haar cascade classifier. The cascade confidence is measured across the image: the system computes a confidence level for the detected face by checking for features similar to those of the trained images [25]. This confidence level improves the accuracy of detecting the authorised driver's face. If the confidence level is greater than 80%, the detected face is treated as the authorised driver's face. Detection of unauthorised faces is performed in order to prevent vehicle theft: training is carried out by capturing the driver's face with various face orientations using the Haar cascade classifier [26], the cascade confidence is measured across the image, and if the confidence level is less than 80%, the detected face is treated as an unauthorised driver's face.
The eye is detected from the upper half region of the face [10, 27]. A Haar classifier is used to train the eye images using edge detection. Different eye images are collected and trained so that the eye can be detected from the live streaming video frames [28]. Here, eye images are considered as positive images, and images that do not contain eyes are considered as negative images [29]. The AdaBoost learning algorithm is used to select the important features from the dataset to produce an effective classifier. The cascading technique then detects the eye in an image and discards irrelevant regions.
P = E_c/(E_o + E_c) × 100% = 19/(41 + 19) × 100% ≈ 31%    (1)
Based on this value, if the eyelid closure is less than 31%, the eye state is classified as "close"; otherwise it is classified as "open". The eye-aspect ratio (EAR) of both eyes is computed, and the average EAR is obtained as shown in Eq. (2):
EAR = (A + B)/(2C)    (2)
where
A = Euclidean_distance(eye[P2], eye[P6])
B = Euclidean_distance(eye[P3], eye[P5])
C = Euclidean_distance(eye[P1], eye[P4]).
The eye threshold is set to 0.31, as illustrated in Eq. (1). If the EAR value falls below this threshold, drowsiness is detected [23, 24]. The drowsiness check is performed at five frames per second; in this way the eye-state classifier reduces false alarms at the eye detection level [25].
In Fig. 3, six eye landmarks are located at the co-ordinates P1, P2, P3, P4, P5, and P6, and the Euclidean distances are calculated between pairs of these points. The numerator in Eq. (2) computes the sum of the distances between the vertical eye landmarks (P2, P6; P3, P5), while the denominator computes twice the distance between the horizontal eye landmarks (P1, P4).
In general,
EAR = (||P2 − P6|| + ||P3 − P5||)/(2||P1 − P4||)
    = ((0.91 − 0.71) + (0.85 − 0.74))/(2(0.38 − 0.22))
    = 0.31/0.32 ≈ 0.97    (3)
Here, the EAR value of about 0.97 is well above the threshold, so the eye is concluded to be open.
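A small sketch of the EAR computation and eye-state decision in Eqs. (1)–(3) is shown below; the landmark points are assumed to be (x, y) co-ordinates produced by the eye detector, and the threshold is the 0.31 value derived above.

```python
# Sketch of the EAR computation from the six eye landmarks P1..P6.
from math import dist   # Euclidean distance (Python 3.8+)

EAR_THRESHOLD = 0.31     # threshold derived in Eq. (1)

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    a = dist(p2, p6)                 # first vertical distance
    b = dist(p3, p5)                 # second vertical distance
    c = dist(p1, p4)                 # horizontal distance
    return (a + b) / (2.0 * c)       # Eq. (2)

def eye_state(ear):
    # Below the threshold the eye is treated as closed (potentially drowsy)
    return "open" if ear > EAR_THRESHOLD else "close"
```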
3 Experimental Setup
Fig. 4 Hardware components of the experiment
Initially, the engine is in the off state, and it starts only when alcohol is not sensed. The alcohol sensor attached to the system checks whether the driver has consumed alcohol, which helps to ensure safe driving.
In Fig. 5, alcohol is not sensed. The MQ-3 gas sensor concentration range is 0.05–10 mg/L; if the reading lies within this range, the engine remains in the off state. Once the alcohol check is complete and alcohol is not sensed, the engine starts.
The system then detects the face and checks whether it is authorised. Figure 6 shows detection of a known face, and Fig. 7 shows detection of an unknown face.
If an unknown face is detected, then an e-mail alert is sent to the vehicle’s
owner. The sent e-mail contains an image of the unknown driver’s face with Global
Positioning System (GPS) location of the vehicle as shown in Fig. 8.
In this way the system helps to identify theft activity. An SMS alert is also sent to the vehicle's owner when an unknown face is identified, as shown in Fig. 9.
In Fig. 9, the SMS alert "Hello owner, unknown driver found, Take action!!!" is received immediately by the vehicle's owner when an unknown face is recognised. This enables the vehicle's owner to take immediate action.
Once the authorised face is recognised, then the system identifies the eyes as in
Fig. 10.
Fig. 8 E-mail alert received by the vehicle’s owner when an unknown driver is identified
In the third level of drowsiness detection, a notification with the GPS location is sent to the vehicle's owner or another concerned person. Drowsiness can be clearly detected during both day and night, as shown in Fig. 15a, b, respectively.
Here, the EAR in daytime is 0.32, which is greater than the threshold value of 0.31, hence drowsiness is not detected. The EAR during night-time driving is 0.33, which is also greater than the threshold, so drowsiness is again not detected.
In Fig. 16, E-mail alert is received immediately after the third level of drowsiness
detection. Vehicle’s owner receives an e-mail alert with an attachment containing the
driver’s face along with the GPS location. SMS Alert is sent to vehicle’s owner as in
Fig. 17.
Figure 17 shows an SMS sent to the vehicle's owner after the third level of drowsiness detection. Here, Vonage's SMS API is used to send and receive the text messages. Once drowsiness is detected, the Vonage API is triggered and an SMS is sent to the vehicle's owner using the local number. The continuously retrieved data are stored in the log file as shown in Fig. 18.
Fig. 15 a Detection of face and eye during daytime driving. b Detection of face and eye during night-time driving
The various drowsiness levels are stored in the log file with the current date and time for future reference. This helps the vehicle's owner to identify the different drowsiness detection levels and take appropriate action. Here, the first level of drowsiness is detected first, followed by the second and third levels.
The drowsiness of the driver is predicted using the threshold value of 0.31: when the EAR value descends below this threshold, drowsiness is detected, as shown in Table 2.
Table 2 consists of four sets, each containing 20 drowsy cases:
In set I, 18 cases are predicted as drowsy with a threshold of 0.21 and 19 cases with a threshold of 0.31.
In set II, 17 cases are predicted as drowsy with a threshold of 0.21 and 18 cases with a threshold of 0.31.
In set III, 19 cases are predicted as drowsy with a threshold of 0.21 and 19 cases with a threshold of 0.31.
In set IV, 16 cases are predicted as drowsy with a threshold of 0.21 and 18 cases with a threshold of 0.31.
The accuracy of the drowsiness predicted is higher when the threshold value is
0.31. The mean accuracy is calculated as shown in Eq. (4) by considering four sets.
In Fig. 19, drowsiness is predicted for the four sets; the red line indicates the threshold value of 0.31 and the grey line the threshold value of 0.21. Drowsiness is predicted more accurately when the threshold value is 0.31.
In Fig. 20, the various drowsiness levels detected are shown with date and time. At the initial level, the system checks through the alcohol sensor whether the driver has consumed alcohol. If alcohol is not sensed, the engine starts. Subsequently, the camera turns on and captures images from the live streaming video. The system then recognises the face: if a known face is recognised, eye detection is performed; otherwise an unknown face is flagged and a sound alarm is raised along with SMS and e-mail alerts. When a known face is detected and driver drowsiness is recognised, Level 1 drowsiness is detected and the driver is alerted through the sound alert system. If this persists more than twice, Level 2 drowsiness is detected and the driver is alerted through the human voice alerting system. If the drowsiness then persists more than once, Level 3 drowsiness is detected: the parking light is enabled and blinks three times, and the engine is turned off automatically.
The illustration of three-level verification is as shown in Table 3.
Here, the verification of drowsiness is carried out in three levels. The system is
validated in order to check the accuracy. These results are accurate as false predictions
can be eliminated in earlier levels.
The overall framework is illustrated in Table 3 showing sample data of 15 cases.
Here, “A” and “D” denote the user is “alert” and “drowsy”, respectively.
Here, out of 15 cases, seven cases are alert and eight cases are drowsy.
The probability of alert P(A) and drowsy P(D) is calculated, respectively, as shown
in Eqs. (5 and 6).
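Using the counts above (seven alert and eight drowsy cases out of 15), these probabilities work out to P(A) = 7/15 ≈ 0.47 and P(D) = 8/15 ≈ 0.53.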
The probability of eyelid closure predicting "drowsy" given that the driver is actually drowsy in Level 1 is obtained as
The probability of eyelid closure predicting "alert" given that the driver is actually alert in Level 1 is obtained as
The probability of eyelid closure predicting "alert" given that the driver is actually alert in Level 3 is obtained as
This indicates that the three levels of drowsiness detection have improved the prediction of the alert state, P(A/A) = 0.714, with a false positive probability of 2/7 = 0.285.
The system accuracy levels can be determined as shown in Table 4.
In Table 4, system accuracy is calculated by considering four sets of 25 samples
each.
In set I, alert cases are 15, drowsy cases are 10. Out of these 25 cases, our system
predicted 23 cases correctly. Therefore, the accuracy is 92%.
In set II, alert cases are 13, drowsy cases are 12. Out of these 25 cases, our system
predicted 21 cases correctly. Therefore, the accuracy is 84%.
In set III, alert cases are 16, drowsy cases are 9. Out of these 25 cases, our system
predicted 22 cases correctly. Therefore, the accuracy is 88%.
In set IV, alert cases are 12, drowsy cases are 13. Out of these 25 cases, our system
predicted 21 cases correctly. Therefore, the accuracy is 84%.
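Taking the four set-wise accuracies together, the mean accuracy referred to in Eq. (4) works out to (92 + 84 + 88 + 84)/4 = 87%.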
5 Conclusion
The various levels of drowsiness detected are stored in the log file for future reference, allowing the vehicle's owner to track the driver. Infrared light is used to detect drowsiness of the driver during night-time. The proposed system is more reliable because it has three levels of driver drowsiness detection. The system also benefits from the SMS service, which informs the vehicle's owner or another concerned person about the driver's loss of attention. The system is further used for vehicle theft detection: if any person other than the authorised driver tries to drive the vehicle, an alerting sound is raised and an e-mail and SMS are sent to the vehicle's owner. The application can also be used to monitor faces in ATM centres and lifts and to raise an alarm if any incident occurs, supporting child and women safety.
6 Future Scope
This work can be extended to exploit driver health conditions in order to improve driver safety in drowsiness detection. Drowsiness could also be detected from speech signals: the system generates a question through the speaker and the driver has to respond by voice. Seat vibration can be implemented as a form of physical alert.
Acknowledgements We would like to thank the funding agency "Karnataka State Council for Science and Technology" (KSCST) for accepting and sponsoring the proposed project, and our college, Ramaiah Institute of Technology, CSE department, for helping us complete the project with a favourable outcome.
References
10. Singh S, Prasad SVAV (2018) Techniques and challenges of face recognition: a critical review. Procedia Comput Sci 143. In: 8th International conference on advances in computing and communication (ICACC-2018)
11. Ji Q, Zhu Z, Lan P (2004) Real-time nonintrusive monitoring and prediction of driver fatigue.
IEEE Trans Veh Technol 53(4):1052–1068
12. Hong T, Qin H (2007) Drivers drowsiness detection in embedded system. In: Proceedings of
international conference on vehicular electronics and safety (ICVES), Dec 2007, pp 1–5
13. Lang L, Qi H (2008) The study of driver fatigue monitor algorithm combined PERCLOS and
AECS. In: Proceedings of international conference computer science software engineering, vol
1, Dec 2008, pp 349–352
14. Zhang Y et al (2019) Research and application of AdaBoost algorithm based on SVM. In:
2019 IEEE 8th joint international information technology and artificial intelligence conference
(ITAIC), Chongqing, China, pp 662–666. https://doi.org/10.1109/ITAIC.2019.8785556
15. Chandana R, Sangeetha J (2021) Review on drowsiness detection for automotive drivers in
real-time. Nat Volatiles Essen Oils 8(6), Jan 2021
16. Vinay A, Joshi A, Surana HM, Garg H, Murthy KB, Natarajan S (2018) Unconstrained
face recognition using ASURF and cloud-forest classifier optimized with VLAD. In: 8th
International conference on advances in computing and communication ICACC-2018
17. Kakade SD (2016) A review paper on face recognition techniques. Int J Res Eng Appl Manage
(IJREAM) 2(2), May 2016
18. Parte RS, Mundkar G, Karande N, Nain S, Bhosale N (2015) A survey on eye tracking and
detection. Int J Inno Res Sci Eng Technol 4(10), Oct 2015
19. Jin L, Niu Q, Jiang Y, Xian H, Qin Y, Xu M (2013) Driver sleepiness detection system based
on eye movements variables. Hindawi Publishing Corporation, Article ID 648431
20. Fitriyani NL, Yang CK, Syafrudin M (2016) Real-time eye state detection system using Haar
cascade classifier and circular Hough transform. In: IEEE 5th global conference on consumer
electronics
21. Sahu M, Nagwani NK, Verma S, Shirke S (2015) Performance evaluation of different classifier
for eye state prediction using EEG signal. Int J Knowl Eng 1(2), Sept 2015
22. Chan TK, Chin CS, Chen H, Zhong X (2019) A comprehensive review of driver behavior
analysis utilizing smartphones. IEEE Trans Intell Transp Syst
23. Pratama BG, Ardiyanto I, Adji TB (2017) A review on driver drowsiness based on image,
bio-signal, and driver behaviour. In: 3rd International conference on science and technology—
computer (ICST) 2017
24. Kusuma Kumari BM, Ramakanth Kumar P (2017) A survey on drowsy driver detection system. IEEE
25. Ramzan M, Khan HU, Awan SM, Ismail A, Ilyas M, Mahmood A (2019) A survey on state-
of-the-art drowsiness detection techniques. IEEE Access 7
26. Dhupati LS, Kar S, Rajaguru A, Routray A (2010) A novel drowsiness detection scheme based
on speech analysis with validation using simultaneous EEG recordings. In: Proceedings on
IEEE international conference on automation science and engineering. (CASE), Aug 2010, pp
917–921
27. Song F, Tan X, Liu X, Chen S (2014) Eyes closeness detection from still images with multi-scale
histograms of principal oriented gradients. Pattern Recognit 47(9):2825–2838
28. Manjutha M, Gracy J, Subashini P Dr, Krishnaveni M Dr (2017) Automated speech recognition
system—a literature review. Int J Eng Trends Appl (IJETA) 4(2), Mar–Apr 2017
29. Kashevnik A, Lashkov I, Gurtov A (2019) Methodology and mobile application for driver
behavior analysis and accident prevention. IEEE Trans Intell Transp Syst
30. Tran D, Du J, Sheng W, Osipychev D, Sun Y, Bai H (2018) Human-vehicle collaborative driving
framework for driver assistance. IEEE Trans Intell Transp Syst
Prediction of Dementia Using Deep
Learning
Abstract Artificial intelligence and its sub-field machine learning are continuously
evolving and being applied in medicine and healthcare amongst other important
fields. Machine learning and deep learning are frequently used to aid dementia predic-
tion and diagnosis. Deep learning models outperform other machine learning models for dementia detection and prediction, but they are computationally more expensive. The objective of this work is to build a deep learning model to predict
dementia. This model is designed to predict dementia from brain MRI images and
is based on the concepts of deep learning and convolutional neural network (CNN).
The developed model is able to identify demented and non-demented MRI images
with an accuracy of 99.35%, better than existing models.
1 Introduction
The number of people living with dementia worldwide is expected to rise to 152 million by 2050 [1]. However, scientists are yet to discover a cure
for Alzheimer’s disease that can treat and prevent the disease precisely. Based on
the clinical dementia rating (CDR) value, dementia is categorized into four stages: very mild, mild, moderate, and severe. Because the treatment costs for very mild dementia patients differ greatly from those of severe dementia patients, it is important to diagnose dementia early in order to maximize patient recovery and reduce treatment costs [2].
A major issue is incorrect diagnosis, as the majority of dementia patients are
initially seen by general physicians, who often fail to recognize dementia and hence
diagnose it incorrectly. Due to such late diagnosis, physicians are often unable to
slow the progression of dementia and reduce debilitating behavioural changes. A simple way of diagnosing dementia early in its development might lead people to seek diagnosis and treatment sooner rather than later.
Recent advances in deep neural network approaches have shown a lot of promise in combining the power of massive administrative claims and electronic health record databases with powerful computation to generate good predictive models
for healthcare. Many deep learning techniques have been used to detect and diag-
nose dementia along with other neurological diseases. Deep learning, unlike typical
machine learning algorithms, incorporates all three fundamental processes in neural
network modelling: feature extraction, feature dimension reduction, and classifica-
tion. CNN and RNN have become predominant mechanisms in deep learning. In
computer vision and image analysis, CNN is currently the most successful deep
learning model. CNN model architectures are typically made of several layers such
as convolutional layer, pooling layer, and activations. The model uses these layers to
extract features from images gradually.
2 Related Work
This section of the paper explains about the various existing works on the prediction of
dementia and other neurological disorders like Alzheimer’s disease and Parkinson’s
disease. Some of these works use neuroimaging MRI datasets, whilst others review parameters from the clinical data of patients. A thorough review of the literature reveals that dementia is a degenerative brain condition that eventually leads to memory loss. Exploratory data analysis on a longitudinal MRI dataset resulted in a technique called 'CapNet', which emphasizes the use of classification methods in which an image retrieval system is fed with images as query inputs [3]. Investigative
analysis on the use of deep learning models to predict dementia using the longitu-
dinal health information of patients reveals that the deep learning models provide
a significant boost in the performance of models. Dementia is not the only neuro-
logical disease for which predictive models have been built. Analysis of algorithms
like linear discriminant analysis, K-nearest neighbours, and support vector machines
in identifying Parkinson’s disease revealed that SVM provides better accuracy [4].
An ‘ALL-PAIRS’ technique developed to investigate the progression of Alzheimer’s
was effective when trained on patient data [5]. A predictive and preventive CNN
model to predict Alzheimer’s disease in the early stages along with a system that
displays the preventive measures to be taken along with suggestion of medication
outperforms traditional ML algorithms, when trained on both cross-sectional and
longitudinal MRI scans [6]. Apart from these, a deep learning model validated on
MRI scans to predict the progression towards Alzheimer’s disease ranging from 6 to
18 months with a follow-up duration of 18–54 months finds a clear advantage of using
‘hippocampal’ features for improved prediction [7]. The use of fluorodeoxyglucose
(FDG) PET and structural MRI in a 3D DenseNet brain age prediction model to see
how the brain age gap relates to degenerative cognitive disorders showed an age-
dependent saliency pattern of brain areas, and CNN-based age prediction provided
good accuracy is proposed in [8]. A study with the goal of developing a machine
learning model for predicting occurrence of Alzheimer’s disease, mild cognitive
impairment, and similar dementias using structured data obtained from electronic
health record and administrative sources, developed a ‘label-learning approach’ using
a cohort of patients and controls using data obtained within two years of the patient’s
incident diagnosis date [9]. The model achieved an accuracy of over 80% and AUC
and sensitivity over 40% and thus has the utility to pre-screen patients for further
diagnosis or evaluation for clinical trials.
3 Proposed Work
We propose a CNN-based model to predict dementia using MRI scan images of the
patient. The MRI scan is given to the proposed model as input. In the pre-processing,
the input image is resized to 128 * 128 and normalized. This image is then sent to
the CNN classifier which predicts the presence of dementia as depicted in Fig. 1.
Convolutional neural networks (CNNs) are made up of various layers, typically the input, hidden, and output layers. The convolution takes place in the hidden layers. Computers view images as pixels, and convolution exploits this to classify images. Features are extracted from the images in the convolutional layers, whose kernels scan through the images to produce feature maps. Each convolution layer is followed by a pooling layer, whose function is to reduce the feature map and prevent overfitting. Activation functions are used to activate a neuron when needed. The pooling method selected here is MaxPooling2D.
4 Experiment
In this paper, we have used the dementia MRI dataset [10], publicly available on Kaggle. The dataset is hand collected from various Websites with labels verified. This dataset has 6199 brain MRI images of size 176 × 208 pixels, and we have labelled the images with 'yes' for patients with dementia and 'no' for patients without dementia. The dataset comprises 3190 images with the 'no' label and 3009 images with the 'yes' label. The dataset is further divided into training and validation sets, with 80% of the images used for training and 20% for validation. Sample non-demented and demented images from the dataset are given in Fig. 2.
The proposed CNN model architecture is given in Fig. 3. The model is made up of three blocks, each consisting of a convolutional layer followed by a max pooling layer, stacked sequentially. The last max pooling layer is followed by two fully connected layers. Each convolutional layer uses a (3 × 3) kernel and is followed by a pooling layer of size (2 × 2). The final pooling layer is flattened; this is followed by one fully connected layer and one output layer. We have used the sigmoid function for activation in the output layer, as shown in Fig. 4, and the ReLU function for activation in the remaining layers. The binary cross entropy function is used as the loss function along with the Adam optimization algorithm. We trained the model for fifty epochs and used early stopping to stop the training when the change in validation loss is smaller than 0.003 for 5 epochs.
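As an illustration of the described setup, a minimal Keras sketch is given below: three Conv2D (3 × 3) plus MaxPooling2D (2 × 2) blocks, a flatten step, one fully connected layer, a sigmoid output, binary cross entropy with Adam, and early stopping when the validation loss changes by less than 0.003 for 5 epochs. The filter counts and the width of the dense layer are assumptions, not values reported in the paper.

```python
# Minimal Keras sketch of the described CNN (filter counts are assumptions).
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # fully connected layer
    layers.Dense(1, activation="sigmoid"),     # demented / non-demented output
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

early_stop = callbacks.EarlyStopping(monitor="val_loss",
                                     min_delta=0.003, patience=5,
                                     restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
```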
The main difference between other predictive models and our model is the use
of binary cross entropy as the loss function. We used binary cross entropy since we
only wanted two possibilities for our prediction model: either presence or absence
of dementia. Other predictive models prefer using categorical cross entropy as the
loss function.
5 Results
Our model gives a training accuracy of 99.35% and a validation accuracy of 96.21%. The training stopped at 31 epochs because the change in validation loss over the last 5 epochs was less than 0.003. The accuracy graph for training and validation data is given in Fig. 5.
The loss graph for training and validation data is given in Fig. 6.
The comparison of the results of our model with different models is as shown
in Table 1. Classification by application of query using ‘CapNet’ [3] has produced
92.39% accuracy, the predictive and preventive CNN model [6] gives an accuracy of 85%, and the 'label-learning' approach [9] gives an accuracy of 80%. Our model comparatively gives a better accuracy of 99.35%.
6 Conclusion
References
1. Nori VS, Hane CA, Crown WH, Au R, Burke WJ, Sanghavi DM, Bleicher P (2019) Machine
learning models to predict onset of dementia: a label learning approach. In: Alzheimer’s &
Dementia: translational research & clinical interventions, vol 5, pp 918–925
2. Isik Z, Yiğit A (2019) Applying deep learning models on structural MRI for stage prediction
of Alzheimer’s disease. Turk J Electr Eng Comp Sci 28(1), Article 14
3. Basheer S, Bhatia S, Sakri SB (2021) Computational modeling of Dementia prediction using
deep neural network: analysis on OASIS dataset. IEEE Access 9:42449–42462
4. Mathkunti NM, Rangaswamy S (2020) Machine learning techniques to identify Dementia. SN
Comput Sci 1:118
5. Albright J (2019) Forecasting the progression of Alzheimer’s disease using neural networks
and a novel preprocessing algorithm. In: Alzheimer’s & Dementia: translational research &
clinical interventions, vol 5, pp 483–491
Prediction of Dementia Using Deep Learning 199
6. Singhania U, Tripathy B, Hasan MK, Anumbe NC, Alboaneen D, Ahmed FR, Ahmed TE,
Nour MM (2021) A predictive and preventive model for onset of alzheimer’s disease. Front
Public Health 9
7. Li H, Habes M, Wolk DA, Fan Y (2019) A deep learning model for early prediction
of Alzheimer’s disease dementia based on hippocampal magnetic resonance imaging data.
Alzheimers Dement 15(8):1059–1070
8. Lee J, Burkett BJ, Min HK et al (2022) Deep learning-based brain age prediction in normal
aging and dementia. Nature Aging 2:412–424
9. Nori VS, Hane CA, Sun Y, Crown WH, Bleicher PA (2020) Deep neural network models for
identifying incident dementia using claims and EHR datasets. PLoS One 15(9)
10. Alzheimer’s dataset. https://www.kaggle.com/datasets/tourist55/alzheimers-dataset-4-class-
of-images
Performance Analysis of Universal
Filtered Multicarrier Waveform
with Various Design Parameters for 5G
and Beyond Wireless Networks
1 Introduction
The massive deployment of wireless systems and Internet devices with new appli-
cation scenarios has created demands for ubiquitous connectivity with extreme data
traffic. To fulfill these needs, 5G technology has emerged to cope with challenges like
increase in user density, seamless connectivity, traffic density, data rate, and exten-
sive applications. In the current cellular network, increasing bandwidth or increasing
cell density is the major factors considered to meet the requirement of peak data rate
and increased capacity. The primary challenge in this approach is that the limited
resources are reaching their saturation and also increasing the cost of the hardware [1].
To improve spectrum usage, a new air interface and novel approaches to radio resource and multiple access management are needed. A novel multicarrier waveform is designed at the physical layer to fulfill the needs of next-generation wireless networks with low peak-to-average power ratio (PAPR), high throughput, improved spectral efficiency, and reduced interchannel interference (ICI). OFDM is a widely used multicarrier modulation air interface in 4G LTE, WiMAX, optical communication, etc., but it fails to meet the requirements of future physical-layer scenarios. Due to its sensitivity to frequency offset, high PAPR, and spectral leakage, the OFDM multicarrier technique is not suitable for next-generation wireless networks [2].
For efficient utilization of spectrum with high data rate transmission and to cope
with ICI, a new multicarrier modulation technique needs to be designed to bring a
faster and better user experience. Various multicarrier techniques are available to
meet the requirements of 5G and to improve the spectrum efficiency as discussed in
the following section.
In filter bank multicarrier (FBMC), the spectrum is divided into multiple sub-bands which are orthogonal to each other, and subcarrier filtering is applied. Adaptable filters are applied at the subcarrier level to adjust to the channel conditions and use cases [3]. Although FBMC has various advantages, such as time-frequency efficiency, lower OOB emissions, and reduced ICI, which make it suitable for 5G, it has very high computational complexity and is incompatible with multiple-input multiple-output (MIMO). In generalized frequency division multiplexing (GFDM), the modulated data symbols are transmitted in two-dimensional time-frequency blocks divided into sub-symbols and subcarriers. Subcarriers are filtered with non-orthogonal pulse shaping prototype filters [4]. The major drawbacks of this method are higher latency, incompatibility with MIMO, and complex pilot design [5]. In FBMC and GFDM,
subcarrier-wise filtering is applied, but it requires a new transceiver design. Also,
there are major problems with channel equalization and backward incompatibility
with 4G. So we will prefer sub-band wise filtering. In universal filtered multicarrier
(UFMC), sub-band filtering is applied where the total bandwidth is divided into N
number of sub-bands, and filtering is applied in the frequency domain to reduce
OOB emissions. Due to fine frequency filtration, shorter filter length, and compat-
ibility with MIMO, UFMC is the best multicarrier waveform for 5G and beyond
wireless networks. The remaining paper is focused on the UFMC system model and
its performance.
1. Let us consider a multicarrier system whose total bandwidth contains C subcarriers indexed [0, 1, 2, …, C − 1].
2. All the subcarriers are grouped into smaller sub-bands indexed i = 1, 2, …, B.
3. Each ith sub-band comprises K consecutive subcarriers, where K = C/B.
4. For the ith sub-band, where 1 ≤ i ≤ B, the data blocks are represented as x_{i,k} (ith sub-band, kth subcarrier), where 1 ≤ k ≤ K.
5. A random bit stream of data is generated and mapped to M-QAM symbols.
6. The QAM modulated symbol block is represented as S_i (i = 1, 2, 3, …, B) for the ith sub-band, including k_i subcarriers (Σ_{i=1}^{B} k_i = C). The QAM symbols in the frequency domain are assigned to each sub-band with length k_i [6].
7. To overcome the problem of sub-band carrier interference, the signal processing tool inverse fast Fourier transform (IFFT) is applied.
8. The N-point IFFT converts the symbols from the frequency domain (S_i) to the time domain (y_i) as shown in Eq. (2):
y_i = IFFT{s_i}    (1)
y_i(l) = (1/√N) Σ_{k=0}^{K−1} s_i(k) e^{j2πkl/N}    (2)
9. Each time-domain sub-band signal y_i is filtered with a band filter f_i(l) of length L, whose energy is normalized such that
Σ_{l=0}^{L−1} |f_i(l)|² = 1    (3)
10. The prototype filter should have a constant response in the signal spectral range and must be suitable for communications in dispersive channels [7].
11. The summation of the outputs from the band filters is passed through the channel. The superposition of the filtered sub-band symbols is the signal X_UFMC, expressed as
X_UFMC = Σ_{i=1}^{B} y_i(l) ∗ f_i(l),  where l = 0, 1, …, N + L − 2    (4)
Here, '∗' symbolizes linear convolution. Finally, the UFMC signal can be represented by Eq. (5):
X_K = Σ_{i=0}^{K−1} Σ_{l=0}^{L−1} Σ_{n=0}^{N−1} y_{i,k} f_{i,k}(l) e^{j2πk(n−1)/N}    (5)
where F_{i,k} is a Toeplitz matrix comprising the filter impulse response, with dimension (N + L − 1) × N; V_{i,k} is an IFFT matrix that includes the relevant columns as per the sub-band position within the available frequency range, with dimension N × n_i, where n_i is the number of QAM symbols in each resource block; and y_{i,k} is a time-domain symbol with dimension n_i × 1.
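A rough NumPy sketch of this transmit chain (per-sub-band IFFT, sub-band filtering, and superposition as in Eqs. (1)–(4)) is given below. The Dolph-Chebyshev window, its length and attenuation, and the frequency-shifting of the prototype filter to each sub-band centre are illustrative assumptions rather than the exact design parameters used in the paper.

```python
# Illustrative UFMC transmitter: per-sub-band IFFT, FIR sub-band filtering, superposition.
import numpy as np
from scipy.signal import windows

N = 1024          # FFT size
B = 16            # number of sub-bands
K = 12            # subcarriers per sub-band
L = 73            # filter length (assumption)
qam = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)   # 4-QAM alphabet

# Dolph-Chebyshev prototype window as the sub-band filter, normalized to unit energy (Eq. 3)
proto = windows.chebwin(L, at=40)
proto = proto / np.linalg.norm(proto)

x_ufmc = np.zeros(N + L - 1, dtype=complex)
for i in range(B):
    s_i = np.zeros(N, dtype=complex)
    sub_idx = np.arange(i * K, (i + 1) * K)            # subcarriers of sub-band i
    s_i[sub_idx] = qam[np.random.randint(0, 4, K)]     # random bits mapped to QAM
    y_i = np.fft.ifft(s_i) * np.sqrt(N)                # Eq. (2): N-point IFFT
    # Shift the prototype filter to the centre of sub-band i before filtering
    f_i = proto * np.exp(2j * np.pi * np.arange(L) * (i * K + K / 2) / N)
    x_ufmc += np.convolve(y_i, f_i)                    # Eq. (4): filter and superpose

print(len(x_ufmc))   # N + L - 1 samples per UFMC symbol
```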
6 Conclusion
The multicarrier waveform UFMC is most suitable for the existing 4G as well as
for future 5G and beyond systems. Good spectral efficiency due to the absence of
cyclic prefix and reduced PAPR makes UFMC better than other multicarrier tech-
niques. Sub-band filtering helps in reducing OOB emissions with the flexibility to
choose sub-band size, filter length, stop-band attenuation, FFT size, and prototype
window. Higher-order QAM modulation makes it best suitable with massive MIMO
transmission. This paper concludes that BER performance improves as the FFT size and sub-band size increase and as the side lobe attenuation and filter length decrease. The most important result is that for larger FFT sizes (1024 or 2048), UFMC BER performance becomes independent of the FIR filter length. The
UFMC waveform may be the best choice for the 5G and beyond wireless networks.
In future, this waveform can be implemented with a massive MIMO scenario to
enhance the system capacity and spectrum utilization.
Fig. 5 a BER versus SNR for UFMC with different filter lengths. b BER versus SNR for UFMC (256-QAM, 1024-point FFT) at filter side-lobe attenuations of 5, 10, 20, and 40 dB
Fig. 6 CCDF and BER curves for 256-QAM at different FFT sizes
QAM mapping    PAPR
4-QAM          9.04 dB
16-QAM         8.2379 dB
64-QAM         8.6229 dB
References
1. Chataut R, Akl R (2020) Massive MIMO systems for 5G and beyond networks—overview,
recent trends, challenges, and future research direction. Sensors 20:2753. https://doi.org/10.
3390/s20102753
2. Wei S, Li H, Zhang W, Cheng W (2019) A comprehensive performance evaluation of universal
filtered multi-carrier technique. IEEE Access 7:1–1. https://doi.org/10.1109/ACCESS.2019.
2923774
3. Nissel R, Schwarz S, Rupp M (2017) Filter bank multicarrier modulation schemes for future
mobile communications. IEEE J Sel Areas Commun 35(8):1768–1782
4. Fettweis G, Krondorf M, Bittner S (2009) GFDM—Generalized frequency division multi-
plexing. In: Proceedings VTC Spring-IEEE 69th vehicular technology conference, Apr 2009,
pp 1–4
5. Sahin A, Guvenc I, Arslan H (2014) A survey on multicarrier communications: prototype filters,
lattice structures, and implementation aspects. IEEE Commun Surv Tutorials 16(3):1312–1338,
Third Quarter 2014. https://doi.org/10.1109/SURV.2013.121213.00263
6. Sakkas L, Stergiou E, Tsoumanis G, Angelis CT (2021) 5G UFMC scheme performance with
different numerologies. Electronics 10:1915
7. Shawqi FS, Audah L, Hammoodi AT, Hamdi MM, Mohammed AH (2020) A review of PAPR
reduction techniques for UFMC waveform. In: 2020 4th International symposium on multidis-
ciplinary studies and innovative technologies (ISMSIT), pp 1–6. https://doi.org/10.1109/ISM
SIT50672.2020.9255246
8. Baig I, Farooq U, Hasan NU, Zghaibeh M, Jeoti V (2020) A multi-carrier waveform design
for 5G and beyond Communication Systems. Mathematics 8:1466. https://doi.org/10.3390/mat
h8091466
Abstract Diabetic retinopathy (DR) is a condition that causes vision loss and blindness in those who have diabetes. It directly affects the blood vessels of the retina, which leads to visual deficiency. Diabetic retinopathy may not have any symptoms at first, but its early diagnosis can help to take further steps to protect vision. Screening for DR is a time-consuming procedure and requires experts such as ophthalmologists. The proposed work tries to solve this problem with the help of deep learning. A ResNet34 model is trained on a dataset of fundus eye images. There are five DR stages: 0, 1, 2, 3, and 4. Features are extracted from the fundus images, and an activation function is then used to obtain the output. The model achieves an accuracy of 0.82.
1 Introduction
Diabetic retinopathy (DR) is a complication of diabetes which damages the retina of the eye. It is caused by high blood sugar levels. If it is not diagnosed and treated at the appropriate time, it can cause blindness: the retina is severely damaged and vision impairments result. It affects the blood vessels that pass through the retinal tissue, causing them to leak fluid and distort vision. DR is among the most persistent diseases, alongside disorders that cause visual impairment such as cataracts and glaucoma. DR is divided into the following stages: 0, 1, 2, 3, and 4.
The table below summarizes the various stages of DR. Each stage has its unique symptoms and characteristics, and doctors can no longer distinguish between the DR phases using normal imaging alone. Furthermore, conventional diagnostic approaches are ineffective since they take a long time, which can cause therapy to proceed in the wrong direction. Doctors use a fundus camera to diagnose retinopathy, which captures pictures of the vessels and nerves behind the retina. Because there are no indications of DR in its early stages, identifying the disease can be difficult. We employed several CNN algorithms for early detection so that doctors could begin therapy at the appropriate moment.
The dataset for this research was obtained from "Aravind Eye Hospital". Two CNN designs, VGG16 and DenseNet121, were compared, and the outcomes of both architectures were illustrated. In recent research, "deep learning" in AI has shown good results in learning hidden representations for different tasks, especially in the domain of medical image analysis [1–3]. These models help to categorize illnesses, aid medical decision-making, and improve patient care [4]. The work is organized as follows: Section 2 contains the literature review on DR image categorization. Section 3 gives a detailed description of the dataset and the methodology of the DL architecture. The primary outcome of this study is described in Sect. 4. Finally, Sect. 6 brings the paper to a close.
2 Literature Review
There are various drawbacks: even expert medics find it challenging to classify DR pictures. As a result, a deep convolutional neural network (DCNN) was used to classify DR with a 94.5% accuracy [6].
A novel DCNN has been developed that performs initial temporal detection by identifying all microaneurysms (MAs), the first sign of DR. It also reliably assigns labels to retinal fundus images and divides them into five groups. The architecture was evaluated on the Kaggle dataset, and it yields a QWK score of 0.851 and an AUC score of 0.844. The model has a sensitivity of 98% and a specificity of 94% for early-stage detection, which demonstrates the efficacy of the technique [7, 8].
With transfer learning on ImageNet models, classification accuracies of 74.5%, 68.8%, and 57.2%, respectively, were reported [7].
With proper therapy at the early stages of DR, this form of sickness can be avoided. For the diagnosis of the DR condition, a novel feature extraction approach based on a modified Xception architecture has been presented [8].
The objective is to utilize a universal approach to identify DR and quantify its severity with high efficiency. The use of various CNN architectures is investigated. The training results show that VGG16 achieved 71.7% accuracy, VGG19 76.9%, and Inception v3 70.2% [9].
Unfortunately, determining the DR stage is notoriously difficult and needs expe-
rienced human interpretation of fundus images. Individual imaging of the human
fundus is now being used to build an autonomous approach for DR stage detection.
The technique may be utilized for early-stage detection on the APTOS dataset since
it has a sensitivity and specificity of 0.99 and a QWK score of 0.925466 [10].
3 Methodology
A. Dataset
The dataset consists of high-quality eye fundus images. The dataset for this
research was obtained from Kaggle. There are a total of 5593 images. These
images are of left and right eye, and clinicians have divided them into 5 classes
as per the stage of DR (Fig. 4).
B. Data Preprocessing
The model takes an eye image as input. The eye fundus images are divided into 5 classes: no DR (class 0), mild DR (class 1), moderate DR (class 2), severe DR (class 3), and proliferative DR (class 4). Firstly, weights are assigned to each class. The images are then processed to extract important features. There are several steps involved in image preprocessing. Image resizing is a critical preprocessing step, as deep learning models train faster on smaller images; all eye fundus images are cropped and resized to a fixed size of 512 × 512 pixels. The images are then transformed to tensors. A tensor is similar to a NumPy array, and converting the images to tensors allows accelerated computation. The data is then normalized to a smaller range, which helps to improve the accuracy and integrity of the data and is generally preferred for classification algorithms. After normalization, the tensors are given to the model for training and testing.
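A possible torchvision pipeline for the steps above (resize to 512 × 512, conversion to tensors, normalization) is sketched below; the normalization statistics shown are the common ImageNet values and are an assumption, not taken from the paper.

```python
# Sketch of the preprocessing pipeline: resize, convert to tensor, normalize.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),                     # fixed-size fundus images
    transforms.ToTensor(),                             # PIL image -> tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # normalize to a smaller range
                         std=[0.229, 0.224, 0.225]),
])
# Example: dataset = torchvision.datasets.ImageFolder("fundus_images", transform=preprocess)
```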
Fig. 3 Flowchart of the proposed approach: data acquisition followed by image preprocessing
C. Model Training
The next step is to train the model after preprocessing the data. A pretrained ResNet34 model is used. Residual network (ResNet) is a convolutional neural network architecture; ResNet34 consists of 34 convolutional layers which can be used for image classification. The final layer of this architecture is replaced with 4 new layers. Using a ResNet model overcomes the problem of vanishing gradients. Every ResNet architecture is made up of five blocks. The first block has 64 filters, each with a stride of two, followed by a max pooling layer and the ReLU activation function. The second block has a max pooling layer and a 3 × 3 kernel size. The third, fourth, and fifth blocks have kernel sizes of 3 × 3 with input channels of 64, 256, and 512, respectively. A linear activation function is used to keep all the layers connected.
This ResNet34 model (Fig. 5) is trained and validated for 30 epochs. Also,
accuracy is calculated for each epoch, and then, the trained model is saved.
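A hedged PyTorch sketch of this transfer-learning setup is shown below: the final layer of a pretrained ResNet34 is replaced with four new layers ending in five outputs, one per DR stage. The hidden width and dropout rate are illustrative choices, and torchvision ≥ 0.13 is assumed for the weights API.

```python
# Sketch: pretrained ResNet34 with its head replaced by four new layers (5 DR stages).
import torch.nn as nn
from torchvision import models

model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)  # pretrained backbone
model.fc = nn.Sequential(              # four new layers replacing the original head
    nn.Linear(model.fc.in_features, 256),   # hidden width is an illustrative choice
    nn.ReLU(),
    nn.Dropout(0.3),                        # dropout rate is an illustrative choice
    nn.Linear(256, 5),                      # classes 0-4 (no DR ... proliferative DR)
)
```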
5 Future Scope
Further work may include utilizing more comprehensive behavioral data and altering
the layers of the neural network. Specific models can also be trained to increase the
overall accuracy.
6 Conclusion
In recent years, diabetes has become one of the fastest-growing illnesses. According
to numerous studies, a diabetic patient has a 30% probability of developing diabetic
retinopathy (DR). Also, manual detection of DR requires ophthalmologists and
consumes a lot of time. So, with the knowledge of data mining and deep learning,
we developed an architecture for automatic DR detection. The ResNet34 architecture successfully achieved an accuracy of 0.82 and classifies unseen input images into one of the five stages of DR. The findings demonstrate that the proposed model outperforms previous state-of-the-art approaches and can detect all phases of DR. To improve the accuracy for the early stages, we propose in future to train models for the various phases and then ensemble the results.
References
Rahul Deo Sah, Neelamadhab Padhy, Nagesh Salimath, Sibo Prasad Patro,
Syed Jaffar Abbas, and Raja Ram Dutta
R. D. Sah (B)
Department of CA & IT, Dr. SPM University, Ranchi, Jharkhand, India
e-mail: [email protected]
N. Padhy · S. P. Patro
Department of CSE, Giet University, Gunupur, Odisha, India
e-mail: [email protected]
N. Salimath
Department of CSE, Poojya Dodappa Appa College of Engineering, Kalaburagi, India
e-mail: [email protected]
S. J. Abbas
Department of CSE, Jharkhand Rai University, Ranchi, Jharkhand, India
e-mail: [email protected]
R. R. Dutta
BIT Mesera, Ranchi, Jharkhand, India
e-mail: [email protected]
1 Introduction
2 Related Work
In a review article assessing air quality in India, the authors applied machine learning algorithms to predict air quality index (AQI) values for specific regions. The air quality index is a standard measure for evaluating air quality, and monitoring agencies track gas concentrations such as SO2, NO2, CO2, RSPM, and SPM. The researchers built a model that predicts the air quality index for the following year from the previous year's historical data, using gradient descent to solve the multivariate regression problem, and improved the model's performance by including a cost function to evaluate the fit. Given historical pollutant records, such a model can estimate air quality indices for an entire province, state, or a limited zone [1]. Another study uses an artificial neural network (ANN) together with kriging to estimate the degree of air pollution at different locations in Navi Mumbai; the proposed model is implemented using MATLAB for the ANN and R for kriging, and its outputs are compared against measured indicators [2–4]. For next-day prediction, a further approach employed a multilayer ANN together with regression. This framework supports accurate predictions driven by the main variables, studies current pollution, and estimates future pollution. Time series analysis was further used to predict pollution levels and to identify future data elements [5]. The proposed framework serves two main goals: (i) determine
the PM2.5 level based on climate data and (ii) predict the PM2.5 level for a specific date. Logistic regression is performed to determine whether a data sample is polluted. Given past PM2.5 measurements, autoregression is then used to predict future PM2.5 levels. The main task is to predict air pollution levels in cities using a set of ground data [6]. An important goal of that article was to describe the extensive research work and provide a useful overview of the latest techniques for air quality assessment and prediction, big data approaches, and AI procedures. The air quality models were prepared and designed using data from Shenzhen, China.
Algorithms such as ANN with genetic optimization, random forest, decision tree, and deep belief network have been used, and the advantages and disadvantages of each model were presented [7]. Ongoing studies apply state-of-the-art statistical learning algorithms to assess the prediction of air quality and pollution levels. Neural networks have been used [7–9] to predict individual pollutants such as particulate matter of 10 microns (PM10). To prepare these models, support vector machines (SVM) and artificial neural networks (ANN) were used [7]; the best ANN model achieved almost 79% with a false positive rate of 0.82%, and the best SVM model achieved 80% with a false positive rate of 0.13%. For AQI class prediction [10], RAQ, a random forest-based method, is recommended. Leong et al. [3] have since used deep neural networks to predict pollution subcategories. To predict the AQI level, Frank et al. [11] used various configurations that outperform K-nearest neighbors (KNN), decision trees, and SVMs; their ANN model outperformed all other evaluated algorithms with an accuracy of 92.3%.
3 Dataset Observation
The dataset is in CSV format and is publicly available on Kaggle. There are around 450,000 records in the collection. In this study, we focus on PM10 (particulate matter), gaseous pollutants such as sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone, and ambient temperature. The dataset was split into two parts: 75% for training and 25% for testing.
We reviewed and applied the algorithms to this data. We simplified the type attribute to contain only one of six categories: PM10, CO, NO2, SO2, ozone, and temperature. After preprocessing, our dataset contains 60,380 rows and 6 columns.
4 Technical Approach
Ten-fold cross-validation (k = 10) is applied throughout, with the same dataset splits and the same random seed (887), so that the various strategies for this classification problem can be compared fairly. The methods compared are naive Bayes, the generalized linear method, logistic regression, the fast large margin method, deep learning, decision trees, gradient boosted trees, and support vector machines. Cross-validation is a technique for reducing variance when a fitted model is transferred to a dataset [12]. Finally, the collected findings are compared using this measurement. The area under the ROC curve is a graphical and statistical check of the quality of the predictions; unfortunately, this metric alone does not evaluate or represent the area under the precision-recall curve, which is very important for imbalanced samples. The basic idea of deep learning is based on hierarchical learning methods, which are related to logistic regression. In this study, deep learning, decision tree, SVM, naive Bayes, generalized linear method, fast large margin method, and gradient boosted tree classifiers are evaluated.
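A scikit-learn sketch of this comparison protocol (the same 10-fold split with random seed 887 applied to several of the listed classifiers) is given below. The file name, target column, and the MLP standing in for the deep learning model are assumptions, and a binary target is assumed for the ROC-AUC scoring, matching the two-class ROC comparison reported later.

```python
# Sketch: compare several classifiers with the same 10-fold split (seed 887).
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("air_quality.csv")                   # placeholder file name
X, y = df.drop(columns=["target"]), df["target"]      # placeholder binary target column

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=887)
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=887),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=887),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Deep Learning (MLP)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=887)),
}
for name, clf in models.items():
    auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```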
It is a straightforward and fast method, but it assumes that the predictor variables are normally distributed.
The decision tree classification technique is one of the most widely used classification methods since it reliably performs well [4].
Decision trees are vulnerable to overfitting, and alternative methods are usually superior in accuracy. In this situation, bagging of decision trees is a powerful option for creating many trees to improve prediction accuracy and reduce the risk of overfitting [13].
Support vector machines define category boundaries using linear and non-linear kernels. This is useful in classification tasks because it allows observations to be split into classes using a polynomial or other non-linear kernel function, while a linear kernel is used for linearly separable observations [15, 16].
In the final part of the study, we used a hierarchical learning approach to implement the core idea of deep learning and neural network algorithms. The best-performing algorithm for this dataset builds on logistic regression and needs further development. The method applies the algorithms in the proper order, tracking the results of the previous iteration. The "deep learning" operator is used to run well-known deep learning algorithms, and the "target" attribute that indicates the outcome is analyzed [4].
A data science software interface makes it simple and quick to apply various models in the machine learning classification discipline. These steps compare the outcomes of the different models' analysis techniques using standard machine learning measures. First, the data are analysed and preprocessed so that descriptive statistics can be produced.
In Fig. 2, different data mining techniques are compared using their receiver operating characteristic (ROC) curves; the deep learning method performs better than the others.
In Fig. 3, the different models are compared: deep learning achieves an area under the curve of 0.783 (standard deviation ±0.055, gain 58), with a total runtime of 11 s and a scoring time of 10.797 ms per 1000 rows, which compares favourably with the other models.
In Fig. 4, the classification errors of the different models are shown; after building the models, the classification error rate of 19.3% for deep learning is the lowest among them.
The overview lists model prediction accuracy and other performance criteria, depending on the type of classification problem. Performance is calculated using a 40% holdout set, which was not used to optimize the models. This holdout set is then used as input to a multi-holdout validation that computes the performance on seven relatively disjoint subsets. The highest and lowest results are removed, and the average of the remaining five results is displayed here. This validation is not as thorough as full cross-validation, but it balances run-time and the quality of model validation.
In Fig. 6, the statistics of temperature are shown for range 1 and range 2, together with the totals of these statistics. The detailed description is shown in Table 1.
Gains/lift Table (avg response rate: 29.03%, avg score: 30.12%) (Fig. 7).
The accuracy and execution time of the model are displayed in the overview. ROC
comparison: it shows the ROC curves for all models in one graph; the better the model,
the closer the curve is to the upper left corner. Only two-class problems are displayed.
After making changes for simulation, the dataset is stored electronically. This is the data
that all modeling approaches and automated feature engineering can take as input. You
can use only a subset of this data in your model or generate more columns. Text:
displayed only if feature extraction of text data is enabled; as an overview, the words in
the text columns used in the analysis are shown. Finally, if sentiment or language
evaluation is enabled, the distribution of those values across the text columns can be
examined. Correlated weights: the global relevance of each input data column to the
value of the target column, regardless of the model-based algorithms or showcased
techniques. For predictions, the weights are based on the correlation between the
columns and the target column. Model-specific weights, on the other hand, identify the
columns that have the greatest influence on the specific model.
Fig. 7 Variables' correlation, numeric values
There are basically two possibilities for this study to improve the results obtained
with logistic algorithms. One is to improve the algorithm characteristics, and the
other is to use the “deep rooted” approach in this algorithm.
Lambda parameters, also known as generalized linear factors, are an important
component of logistic regression techniques and help researchers find the optimal
combination of simplicity and complexity. In other words, a high lambda value indicates
that the model is simple to fit, while a small lambda value indicates an inadequate model
that is too complex and overfits. Moving on to deep learning, it demonstrates how
sophisticated methods such as hierarchical learning
can significantly boost classification results. The standard response rate was 28.95%,
whereas the average score was 31.26%. It is also worth noting how this study demon-
strates how advanced deep learning approaches can boost performance, particularly
in classification algorithms.
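As an illustration of the lambda trade-off discussed above, the following sketch (using scikit-learn, where the regularization strength is expressed as C = 1/lambda; the data here is a synthetic placeholder, not the air-quality dataset) fits logistic regressions at several regularization strengths:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder features and binary target standing in for the preprocessed data.
X = np.random.rand(200, 6)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

for lam in (10.0, 1.0, 0.1, 0.01):               # high lambda -> simpler model
    model = LogisticRegression(C=1.0 / lam, max_iter=1000)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"lambda={lam:5.2f}  mean CV accuracy={acc:.3f}")
```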
References
1. Soundari AG, Jeslin JG, Akshaya AC (2019) Indian air quality prediction and analysis using
machine learning. Int J Appl Eng Res 14(11). ISSN 0973-4562 (Special Issue)
2. Guttikunda K, Goel R, Pant P (2014) Nature of air pollution, emission sources, and management
in the Indian cities. Atmos Environ 95:501–510
3. Leong WC, Kelan RO, Ahmad Z (2020) Prediction of air pollution index (API) using support
vector machine (SVM). J Environ Chem Eng 8(3):103208
4. Kotsiantis SB, Kanellopoulos D, Pintelas PE (2006) Data preprocessing for supervised learning.
Int J Comput Sci 1(2):111–117
5. Han S, Qubo C, Meng H (2012) Parameter selection in SVM with RBF kernel function. In:
Proceedings of world automation congress, Puerto Vallarta, Mexico, pp 1–4
6. Smola AJ, Scholkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–
222
7. Arampongsanuwat S, Meesad P (2011) Prediction of pm10 using support vector regression. In:
Proceedings of international conference on information and electronics engineering, Bangkok,
Thailand, vol 6
8. Vong CM, Wong PK, Yang JY (2012) Short-term prediction of air pollution in Macau using
support vector machines. J Control Sci Eng 2012
9. Sah RD, Sheetlani J (2017) Pattern extraction and analysis of health care data using rule-based
classifier and neural network model. Int J Comp Technol Appl 8(4):551–556
10. Vapnik V et al (1997) Predicting time series with support vector machines. In: Proceedings of
ICANN, Lausanne, Switzerland, pp 999–1004
11. Frank E, Hall MA, Pal CJ, Witten IH (2017) Data mining: practical machine learning tools and
techniques, 4th edn. Elsevier/Morgan Kaufmann, Cambridge, Massachusetts, pp 147
12. Chicco D, Warrens MJ, Jurman G (2021) The coefficient of determination R-squared is more
informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation.
PeerJ Comp Sci 7:623
13. Albon A (2018) Machine learning with Python cookbook: practical solutions from prepro-
cessing to deep learning. O’Reilly, First edition. Kindle Edition, p. 91
14. Parbat D, Chakraborty M (2020) A python based support vector regression model for prediction
of COVID19 cases in India. Chaos, Solitons Fractals 138:109942
15. Sah RD (2017) Review of medical disease symptoms prediction using data mining technique.
IOSR J Comp Eng (IOSR-JCE) 19(3):59–70, Ver. I (May–June 2017). a-ISSN: 2278-0661,
p-ISSN: 2278-8727
16. Weizhen H et al (2014) Using support vector regression to predict PM10 and PM2.5. Proc IOP
Conf Ser: Earth Environ Sci 17:012268. Jakarta, Indonesia
Complexity Reduction by Signal Passing
Technique in MIMO Decoders
Abstract Breadth first tree search algorithms are intended to search the lattice points
using the breadth first search method, which guarantees optimal BER performance
without the need for an estimate of the SNR. However, one such breadth first signal
decoder (BSIDE) algorithm usually searches more nodes in the tree and incurs a higher
implementation complexity. A signal passing technique capable of minimizing the
number of multipliers needed for realizing the processing unit of the breadth first
signal decoder is proposed. The proposed signal passing technique reduces the
computational complexity by 86% for 2 × 2 and by 99% for 4 × 4 multiple input
multiple output (MIMO) systems with performance similar to that of BSIDE.
1 Introduction
R. Jothikumar (B)
Department of Electronics and Communication Engineering, Sri Manakula Vinayagar
Engineering College, Puducherry 605014, India
e-mail: [email protected]
N. Rangasamy
Department of Electronics Engineering, School of Engineering and Technology, Pondicherry
University, Puducherry 605014, India
search has been done, namely breadth first and depth first. Efficient methods, for
instance sphere decoding (SD) [6, 7] and K-best [8–10] for the depth and breadth first
decoding methods, respectively, have a layer processing unit for symbol detection at the detector,
where the constellation points of quadrature amplitude modulation (QAM) are real-
ized using multipliers. These multipliers multiply a constant with the decomposed
channel element [11] and are named as constant multiplication units (CMUs). These
CMUs used at each layer of the tree are realized by parallel processing and increase
the hardware requirement of symbol detection. Therefore, this paper proposes an
alternate method that majorly contributes toward the reduction of hardware require-
ments needed for layer processing unit by employing a signal processing technique
that involves serial computation. Thus, the proposed method shows an improvement in
complexity reduction with similar logic delay.
2 System Model
$$Y = HX + \tilde{n} \quad (1)$$
$$Y = QRX + n$$
$$Q^H Y = RX + Q^H n$$
$$\hat{Y} = RX + \hat{n} \quad (2)$$
where ‘S’ denotes the set of quadrature amplitude modulation (QAM) entries in the
constellation. The received signal Y and the transmitted signal X at the receiver are
transformed to a real-valued representation with N = 2NT and M = 2NR elements,
respectively, which in turn transforms H into an M × N matrix. Each Xi, where i = 1,
…, N, may be one of the real numbers from the set S; for example, it can be +1 or −1
if 4-QAM modulation is considered, and it takes values from S = {1, 3, −3, −1} for
16-QAM. The maximum likelihood (ML) detection method evaluates the metric $\|\hat{Y} - RX\|^2$
and selects the minimum as the ML estimate. The ML distance is illustrated as
$$\hat{d} = \left\|\hat{Y} - R\hat{X}\right\|^2 = \left(\hat{Y}_1 - \sum_{i=1}^{N} R_{1,i}\hat{X}_i\right)^2 + \left(\hat{Y}_2 - \sum_{i=2}^{N} R_{2,i}\hat{X}_i\right)^2 + \cdots + \left(\hat{Y}_N - R_{N,N}\hat{X}_N\right)^2 \quad (4)$$
where $\hat{Y}_i$ represents the ith element of $\hat{Y} = Q^H Y$, $R_{i,j}$ denotes an element of the upper
triangular matrix R, and $X_j$ is an element of the transmitted signal.
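For intuition, a brute-force evaluation of the ML metric of Eq. (4) can be sketched as follows (illustrative Python only, feasible when R and the constellation are small enough to enumerate; the function and variable names are ours):

```python
import itertools
import numpy as np

def ml_detect(Y_hat, R, constellation):
    """Exhaustively evaluate ||Y_hat - R x||^2 over all candidate vectors
    and return the minimizer (the ML estimate) and its distance."""
    N = R.shape[1]
    best_x, best_d = None, np.inf
    for cand in itertools.product(constellation, repeat=N):
        x = np.array(cand, dtype=float)
        d = np.sum((Y_hat - R @ x) ** 2)
        if d < best_d:
            best_x, best_d = x, d
    return best_x, best_d

# Example: real-valued 16-QAM alphabet {±1, ±3} with a small upper-triangular R.
R = np.triu(np.random.randn(4, 4))
x_true = np.array([1.0, -3.0, 3.0, -1.0])
Y_hat = R @ x_true + 0.01 * np.random.randn(4)
print(ml_detect(Y_hat, R, (-3, -1, 1, 3)))
```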
3 Existing Method
Consider a C-ary tree as shown in Fig. 1 with the number of layers equal to N, where C
denotes the number of elements present in the lattice. Let $X^{(s)} = \left[X^{(s)}_{1,l}, X^{(s)}_{2,l}, \ldots, X^{(s)}_{N-l+1,l}\right]^T$ be a
vector corresponding to the sth node of the lth layer, where $1 \le s \le C^{N-l+1}$ and
$1 \le l \le N$. The BSIDE [12, 13] algorithm uses a breadth first search strategy, in which
the search for the minimum-valued node is done at each layer, and the assessment of the
received signal is taken at the appropriate corresponding level. The same procedure
repeats until the first layer is reached, to obtain the ML solution. The computational burden
decreases when the parameters $\{d_l\}_{l=N}^{2}$ are smaller. To realize this, BSIDE merges a
linear detection algorithm, namely decision feedback equalization (DFE), with the
nonlinear ML method, with a result of $\tilde{X} = \left[\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_N\right]^T$ and
$\tilde{d} = \|\hat{Y} - R\tilde{X}\|^2$. Here $\tilde{X}$ and $d_l$ give the DFE solution and the distance, respectively.
The equation for $d_l$ is given by
$$d_l = \min\left(d_{l+1},\; \left\|\hat{Y} - Rq(l)\right\|^2\right) \quad (6)$$
T
With dN +1 = d̃ and q(l) = X̃1 , X̃2 , . . . , X̃l−1 , X (1) (1) (1)
1,l , X 2,l , . . . , X N −l,l+1 . Let el
be the chosen node without discarded at the lth layer with assuming condition el +
1 = 1. The distance of the node at the lth layer is mathematically given as
$$d\!\left(\left[X^{(j)}_{1,N}, X^{(t)}_{1,l+1}, \ldots, X^{(t)}_{N-l,l+1}\right]^T\right) = \left(\hat{Y}_l - R_{l,l}X^{(j)}_{1,N} - \sum_{i=l+1}^{N} R_{l,i}X^{(q)}_{i-l,l+1}\right)^2 + \cdots + \sum_{j=l+1}^{N}\left(\hat{Y}_j - \sum_{i=l+1}^{N} R_{j,i}X^{(t)}_{i-l,l+1}\right)^2$$
$$= \left(\hat{Y}_l - R_{l,l}X^{(j)}_{1,N} - X^{(t)}_{l+1} + X^{(t)}_{l+1}\right)^2 + \cdots \quad (7)$$
Fig. 2 Conventional processing unit for 16-QAM MIMO symbol detection, where DEC—decoder, ADD—adder, NEG—negator, and << 2—left shift operation [11]
4 Proposed Algorithm
A signal passing technique that exploits serial computation in the CMU, with reduced
complexity in realizing the constellation, is proposed. The number of CMUs required in
each layer of the processing unit is high, so reducing the complexity of a single CMU
has a significant impact on the total complexity of the processing unit. The proposed
signal passing technique can be realized through the cost metric defined below:
$$\left\|\hat{Y} - RX\right\|^2 = \left\|\hat{Y} - RX a_i\right\|^2 \quad (8)$$
where ‘X’ is written in terms of the modulated signal, $S_i(t) = a_i\sqrt{E_0}\,\phi(t) = a_i X$, $\phi(t)$ is
the basis function, and $E_0$ is the energy of the signal with the lowest amplitude. Considering
only $a_i$ at the receiver and neglecting all others, the $a_i$ of QAM can be reproduced as
$$a_i = (2i - 1 - M), \quad i = 1, 2, \ldots, M \quad (9)$$
$$\Omega_1 = \left\{a_i = (2i - 1 - M) : i = 1, 2, \ldots, \tfrac{M}{2}\right\},\qquad \Omega_2 = \left\{a_i = (2i - 1 - M) : i = \tfrac{M}{2} + 1, \ldots, M\right\} \quad (10)$$
This enforces computing the cost metric with only one set (namely $\Omega_2$); the cost metric
of the other set ($\Omega_1$) can be realized through reflection. The proposed method passes
the signal of the first computed value to realize the next, so that the modified
constellation is given as
$$\Omega_2 = \left\{a_i = (2i - 1 - M) : i = \tfrac{M}{2} + 1, \ldots, M\right\} \quad (11)$$
Then, the set $\Omega_2$ can be written as
$$\Omega_2 = \{a_i,\; a_i + 2,\; a_i + 2, \ldots\} \quad (12)$$
where
$$\Omega_1 = -\Omega_2 \quad (13)$$
With the help of Eq. (13), the modified version of the cost metric is illustrated as
$$\left\|\hat{Y} - RX a_i\right\|^2 = \left\|\hat{Y} - RX(2i - 1 - M)\right\|^2$$
$$L_m = \left(\hat{Y}_m - \sum_{k=m+1}^{2N} R_{m,k}X_k(2i - 1 - M)\right)^2, \quad i = \tfrac{M}{2}+1, \ldots, M$$
$$L_m = \left(\hat{Y}_m - \sum_{k=m+1}^{2N} R_{m,k}X_k\left(2\left(\tfrac{M}{2}+1\right) - 1 - M\right) + (X+2) + (X+2) + \cdots\right)^2 \quad (14)$$
where each $(X + 2)$ term reuses the previously computed value of X.
Since ‘R’ remains identical for the respective layer of the breadth first tree structure
of MIMO, a sign change technique can be applied. Thus, the constellation points are
grouped into two sets, and the calculation of $RXa_i$ can be made simple. To further
reduce the arithmetic computations required by the CMU, the proposed technique
passes the presently computed value of X to compute the next, in a serial manner. To
illustrate this, a 64-QAM modulation scheme is considered, in which the set of $a_i$ is
divided into $\Omega_1 = \{-1, -3, \ldots\}$ and $\Omega_2 = \{1, 3, \ldots\}$, where $\Omega_2$ can be obtained
by changing the sign of $\Omega_1$. Let the processing unit compute the fourth layer of
symbol detection for the 64-QAM system, for which the input–output relation is given
as
$$X_4 = \underset{x \in \Omega}{\arg\min}\; \left\|\hat{y}_4 - R_{44}X_4\right\|^2 \quad (15)$$
$X_4$ substitutes the entries from the sets $\Omega_1$ and $\Omega_2$, and Eqs. (11)–(12) are rewritten
accordingly with respect to the proposed model.
Thus, the computational complexity of the multiplier unit is cut down by the signal
passing technique. The hardware configuration for the proposed multiplier unit is
represented in Fig. 3. The feedback encountered in the proposed method introduces
delay, which is trivial when compared to the reduction in computational complexity.
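A minimal numerical sketch of the signal passing idea, as we read it from Eqs. (11)–(14), is shown below: the products R·X·a_i for the positive half of a 64-QAM amplitude set are obtained serially, each from the previous one by adding 2·R·X, and the negative half follows by a sign change; the function and variable names are ours, not the paper's hardware signals:

```python
def cmu_products(r, x, M=8):
    """Serially compute r*x*a_i for a_i in {1, 3, 5, ..., M-1} by passing the
    previous value forward (add 2*r*x each step); the negative half of the
    constellation is obtained by a sign change instead of new multiplications."""
    base = r * x                       # the only true multiplication
    step = 2 * base
    positive = [base]                  # a_i = 1
    for _ in range(M // 2 - 1):        # a_i = 3, 5, ..., M-1
        positive.append(positive[-1] + step)
    negative = [-p for p in positive]  # reflection: a_i = -1, -3, ...
    return positive, negative

pos, neg = cmu_products(r=0.75, x=1.0)
print(pos)   # [0.75, 2.25, 3.75, 5.25] -> r*x*{1, 3, 5, 7}
print(neg)   # sign-changed half
```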
5 Evaluation
Table 2 Hardware requirement of multiplier unit to process ‘N’ layers of tree
Modulation   Conventional      Proposed          Percentage of reduction (%)
4 QAM        N − i + 1         N − i + 1         –
16 QAM       3N − 3i + 3       3N − 3i + 3       –
64 QAM       7N − 7i + 7       5N − 5i + 5       28.5
256 QAM      15N − 15i + 15    7N − 7i + 7       53.3
at the lth layer for the signal passing technique can be given as
$$\sum_{l=1}^{N}\left(\frac{C}{4} + \frac{C}{2} + e_{l+1} + U_l + D_l\right) \quad (17)$$
$$e_l = 2, \qquad U_l = N - l, \qquad 0 \le D_l \le \sum_{i=1}^{N}(N + i) \quad (18)$$
This proposed algorithm works well for increased constellation size with
substantial reduction in complexity.
6 Simulation Results
a 4 × 4 system is 99%. Figures 5 and 6 show a comparison of the complexity curves for
both the 2 × 2 and 4 × 4 systems.
7 Conclusion
References
1. Gesbert D, Shafi M, Shiu DS, Smith PJ, Naguib A (2003) From theory to practice: an overview
of MIMO space-time coded wireless systems. IEEE J Sel Areas Commun 21(3):281–302
2. Yang S, Hanzo L (2015) Fifty years of MIMO detection: the road to large-scale MIMO. IEEE
1 Introduction
A micro-controller or computer system will usually have many serial data ports
that are used for communicating with input–output devices such as computer serial
communication ports that are compatible with UART, serial printers, key-boards,
Bluetooth-UART devices, and so on. As its name implies, a UART is a serial communication
protocol that receives and transmits data serially [1, 2]. UART is an
acronym of universal asynchronous receiver and transmitter. It works as a data trans-
mission protocol that facilitates serial communication among devices. The UART
protocol supports full-duplex, half-duplex, and simplex transmissions between any
transmitter–receiver pair. In the simplex mode of communication systems, the data
bits are transmitted from the source end only. In the half-duplex mode of commu-
nication systems, the data transmission is possible from both directions, but at a time,
only one of the two users can perform data transmission. If the receiver receives the
data, then the transmitter is in an idle state and vice versa. In the full-duplex mode
of communication systems, both users can actively participate and exchange data
at the same time. UART consists of two modules, namely transmitter and receiver.
The transmitter module converts the bytes into serial bits and transmits the data seri-
ally. The receiver performs serial-to-parallel conversion on the asynchronous data
frame received from the serial data input [3]. UARTs are asynchronous in nature
since the transmitter and receiver modules transfer the data without support from
an external clock signal. To synchronize the received data frame, the clock is not
required. Instead, UART’s transmitter and receiver module operates at equal baud
rates. A baud rate is a rate at which unit data is transmitted through a communication
channel, usually in bits-per-second (bps). Standard baud rates for UART are 1200,
2400, 4800, 9600, 19200, 38400, 57600, and 115200 bps. In order to transmit data
effectively over UART, both the transmitter and the receiver must use the same baud
rate [4–6]. The standard UART data frame consists of 1 start bit, 8 data
bits, 1 parity bit, and 1 stop bit. The parity bit is optional and depends upon the
designer’s requirement whether they want to consider even parity or odd parity. In
order to produce a parity bit in the UART protocol, all 8 bits of the data byte are
added up, and the evenness or the oddness of the sum decides whether the bit is
set or not. So, this low-level error checking mechanism makes the whole system
less reliable because if, for instance, two data bits are corrupted, the parity check will not
detect the error. Thus, to remove this limitation, we introduce checksum bits
in the standard UART protocol [7]. Checksums are counts of the bits that are also
transmitted with the payload. This helps the receiver to ensure that the number of bits
received is equal to the number of bits transferred by the sender. If both counts are
equal, then the transmission is judged a success; otherwise, an error detection mechanism
is initiated [8, 9]. In this paper, we propose the architecture of a UART transmitter
block and receiver block that consists of a checksum generator and a checksum
checker, respectively, and these blocks have been synthesized and simulated using
Verilog hardware descriptive language [10].
Whenever the data transmission is initiated using the UART module, it always gener-
ates a data frame. Now, to manage the serial transmission of this data, the transmitter
adds certain bits, namely start bit (one), stop bit (one), and checksum (two bits)
initially. So, a total of 12 bits are present in the data frame at the time of its creation,
out of which only 8 bits represent the actual data. During the reset condition, the data
line remains high, i.e., logic 1. At the time of transmission, the start bit, which has logic 0,
is sent first; after that, 8 bits of data are transmitted followed by 2 bits of
checksum, and at last, the stop bit, which has logic 1, is sent (Fig. 1).
The checksum is added to this protocol to eliminate the problem of corrupted data
bits, which even/odd parity cannot handle. Using this method, the receiver
can check whether the output is correct, which makes it more reliable. The checksum generator first
divides the transmitted 8-bit input data into 4 chunks of 2 bits each; then, an add
operation is performed on these four 2-bit chunks. After that, the 1's complement
of the result is taken. Afterward, the checksum bits are attached to the 8-bit input data as
the final result. The checksum generator operates on a mechanism formed by full
adders and NOT gates.
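A behavioural model of the checksum generation, written here in Python purely for illustration (the actual design is a Verilog circuit of full adders and NOT gates), is given below; the 2-bit end-around-carry addition is our reading of the scheme, and it reproduces the checksums reported in the simulation section (10 for data 10101100 and 01 for 11001000):

```python
def checksum2(data_bits: str) -> str:
    """Split the 8-bit data into four 2-bit chunks, add them with 2-bit
    end-around carry, and return the 1's complement as the checksum."""
    total = sum(int(data_bits[i:i + 2], 2) for i in range(0, len(data_bits), 2))
    while total > 0b11:                      # fold the carry back in
        total = (total & 0b11) + (total >> 2)
    return format((~total) & 0b11, "02b")    # 1's complement, 2 bits

def make_frame(data_bits: str) -> str:
    """Assemble the 12-bit frame: start(0) + 8 data bits + checksum + stop(1).
    The bit ordering on the wire is simplified here."""
    return "0" + data_bits + checksum2(data_bits) + "1"

print(checksum2("10101100"), make_frame("10101100"))  # 10 010101100101
print(checksum2("11001000"))                          # 01
```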
A baud rate generator (BRG) actually operates as a frequency divider circuit. This
BRG module has an active-low reset and a system clock, which act as inputs, and
the baud clock acts as an output. In this design, the BRG produces a clock whose frequency
is 8 times the baud rate. In this way, the asynchronous serial data at the
receiver is sampled precisely.
The transmitter FSM changes the transmitter’s state. This module consists of three
inputs and three outputs. The transmit enable signal, the active-low reset, and the
baud clock act as input signals, and the load, shift, and busy act as outputs of a
transmitter FSM. In this transmitter FSM module, there are four states: idle state,
load state, shift state, and hold state. Idle is the initial state of the transmitter FSM. In
this state, the transmit enable signal, load signal, busy signal, and shift signal remain
low. The transmitter FSM moves to load state when transmit enable is high. In the
load state of the transmitter FSM, the data is loaded before a frame is generated. On
the next baud clock, the transmitter's FSM changes to the shift state, where the data
is transmitted serially one bit per baud clock until all the data has been transmitted. The hold
state is used to clear the signals' values (Fig. 3).
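The four-state transmitter FSM can be modelled abstractly as follows (a Python sketch of the state transitions only, not the Verilog module; the event names are ours, following the signals described above):

```python
# Abstract model of the transmitter FSM: idle -> load -> shift -> hold -> idle.
TRANSITIONS = {
    ("idle", "tx_enable"): "load",     # transmit enable high: start a transmission
    ("load", "baud_tick"): "shift",    # frame loaded, begin serial shifting
    ("shift", "all_bits_sent"): "hold",
    ("hold", "baud_tick"): "idle",     # clear signals, return to idle
}

def next_state(state: str, event: str) -> str:
    """Return the next FSM state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "idle"
for event in ["tx_enable", "baud_tick", "all_bits_sent", "baud_tick"]:
    state = next_state(state, event)
    print(event, "->", state)
```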
To transmit serial data, PISO registers are used. The baud clock, the load signal, the
active-low reset, the shift signal, the checksum bits, and the input 8-bit transmitted
data act as inputs of the PISO register. Serial out acts as an output of the PISO
A New Approach to Improve Reliability in UART Using Checksum … 247
register. The data frame is created when the load signal is high, meaning that the
required additional bits (start, checksum, and stop bits) are appended to the data bits.
Furthermore, when the shift signal goes high, serial data transmission starts.
The transmission data frame’s start bit is detected by a negative edge detector. Prior
to transmission, logic high is used as the default level of the transmitted data signal. Serial bits
are received by the UART receiver when the start bit appears and the signal shifts
from logic high to low. A negative edge detector is useful for detecting the start bit.
A combination of an AND gate and a D flip-flop is used to design such an edge detector.
A baud rate generator actually operates as a frequency divider circuit. This baud rate
generator module has an active-low reset and a system clock, which act as inputs,
and the baud clock acts as an output. Both the transmitter and receiver operate at
equal baud rates.
The receiver FSM changes the receiver’s state. This module consists of three inputs
and three outputs. The negative edge detector signal, the active-low reset, and the
baud clock act as input signals, and the load, shift, and busy act as outputs of a receiver
FSM. In this receiver FSM module, there are four states: idle state, shift state, load
state, and hold state. Idle is the initial state of the receiver FSM. In this state, the
negative edge detector signal, load signal, busy signal, and shift signal remain low.
The start bit is being detected by the negative edge detector module, which signals
the receiver to start. Once the receiver reaches the shift state, shifting operations start
until all bits have been received. On the next baud clock, the receiver moves to the
load state. Here, 8 bits of data are loaded by removing start bit, checksum bits, and
a stop bit. On the next baud clock, the receiver moves from the load state to the hold
state, and the hold state is used to clear the signals' values (Fig. 5).
To receive the serial data, SIPO registers are used. The baud clock, the load signal,
the active-low reset, the shift signal, and the received 8-bit serial data act as inputs of
the SIPO register. Parallel data out acts as an output of the SIPO register. One bit of
data is shifted on the positive edge of the baud clock when the shift signal is set to 1.
After removing the extra bits, the 8 bits carrying the actual data are sent to the receiver's
output when the load signal is high.
The checksum checker module validates the correctness of received data. This
module consists of two inputs and one output. The 8-bit input data signal and
checksum bits signal from the SIPO register act as inputs, and the data valid signal
acts as an output for the checksum checker. If the value of the data valid signal is 00,
then the received data is correct, otherwise not.
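A matching behavioural model of the checksum checker (again a Python illustration with the same end-around-carry assumption as before, not the Verilog module itself) is:

```python
def check(data_bits: str, checksum_bits: str) -> str:
    """Add the four 2-bit data chunks and the 2-bit checksum with end-around
    carry and return the 1's complement: '00' means the data is valid."""
    chunks = [int(data_bits[i:i + 2], 2) for i in range(0, len(data_bits), 2)]
    total = sum(chunks) + int(checksum_bits, 2)
    while total > 0b11:                      # fold the carry back in
        total = (total & 0b11) + (total >> 2)
    return format((~total) & 0b11, "02b")

print(check("10101100", "10"))  # '00' -> received data is correct
print(check("11101010", "01"))  # '10' -> corrupted, matching the result in Fig. 9
```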
We have simulated our UART architecture design on the Xilinx ISE. Figure 6 repre-
sents the waveform simulation of the UART transmitter module. In this paper, 8-bit
input data, i.e., 10101100 is transferred serially using the UART module which
utilizes the baud clock’s positive edge and is shown using blue color in Fig. 6. More-
over, the data out signal is present in violet color. By default, the serial data out
remains high (logic 1). At the time of high load signal, the PISO register creates a
new data frame using one start bit (logic 0), 8 input bits (10101100), two checksum
bits (10), and at last one stop bit (logic 1). When the shift signal is high, the shifting operation
is initiated, and the start bit (logic 0) shifts out on the next positive edge of the
baud clock. Similarly, the least significant bit of the 8 data bits shifts out, followed
by the shifting of the checksum bits and the stop bit. While the transmitter is in the shift state,
the counter counts the bits until all bits have been transmitted serially. Finally, the serial
output shows that 010101100101 (12 bits of data, i.e., one modified UART frame) is
transmitted successfully. If the FSM does not detect the start bit, it remains in its
idle state. Once it detects a valid start bit (in this case logic 0), the FSM moves
to the shift state. Moreover, the shift signal holds a high value until all the
serial bits, including data and extra bits, are saved in the temporary register. When
the load signal is high, the stored data (10101100) is loaded to the receiver output.
The 8-bit receiver output (10101100) and checksum bits signal (10) from the SIPO
register act as inputs for the checksum checker. The checksum checker divides
the 10-bit data (8 bits of input data and 2 bits of checksum) into five 2-bit
chunks. Then, an add operation is performed on these five 2-bit chunks. After that,
the 1's complement of the result is taken. Now, the value of the data valid signal is 00,
shown in yellow in Fig. 7, which means the received data is correct.
Now, let us consider another example. In this, we will intentionally corrupt some
bits of the input data signal (i.e., the output of the UART transmitter), and this signal
will act as an input for the UART receiver. On the transmitter side, the user provides
8 bits of input data (11001000) to the UART transmitter; it is shown using the
orange color in Fig. 8. After applying the respective module’s logic, the transmitter
can transmit the 12 bits of data serially, which includes one start bit (logic 0), 8
bits of data (11001000), two checksum bits (01), and one-stop bit (logic 1). During
transmission of data, 2 bits of the input data signal get corrupted, specifically the
2nd (D2) and 6th (D6) bit. Now, the UART receiver receives 11101010 as an input
instead of 11001000, which includes one start bit (logic 0), checksum bits (01), and
stop bit (logic 1). Therefore, the data frame becomes 011101010011, which
acts as the serial input data for the UART receiver. The data valid signal gives 10
as an output, shown in yellow in Fig. 9, which contradicts the expected result of the
checksum algorithm (data valid = 00). So, the received signal is found
to be corrupted.
6 Conclusion
Almost all UART protocols use even/odd parity as an error detection technique.
This low-level error checking mechanism makes the whole system less reliable.
To fix this limitation, an enhanced version of UART has been presented with the
introduction of a checksum. The modified UART protocol has been verified
by simulating the transmitter and receiver waveforms on Xilinx ISE. Using this
modified UART protocol could significantly enhance the efficiency of the serial
data transmission protocol, and it also adds reliability, stability, and flexibility to the
standard UART design that is often used in embedded systems and digital circuit
applications.
References
1. Fang Y-Y, Chen X-J (2011) Design and simulation of uart serial communication module based
on vhdl. In: 2011 3rd International workshop on intelligent systems and applications. IEEE,
pp 1–4
2. Nanda U, Pattnaik SK (2016) Universal asynchronous receiver and transmitter (uart). In: 2016
3rd International conference on advanced computing and communication systems (ICACCS),
vol 1. IEEE, pp 1–5
3. Daraban M, Corches C, Taut A, Chindris G (2021) Protocol over uart for real-time applications.
In: 2021 IEEE 27th international symposium for design and technology in electronic packaging
(SIITME). IEEE, pp 85–88
4. Wang Y, Song K (2011) A new approach to realize uart. In: Proceedings of 2011 international
conference on electronic & mechanical engineering and information technology, vol 5. IEEE,
pp 2749–2752
5. Anjum F, Thakre MP. Vhdl based serial communication interface inspired by 9-bit uart
6. Mahure B, Tanwar R (2012) Uart with automatic baud rate generator and frequency divider. J
Inf Syst Commun 3(1):265
7. Fletcher J (1982) An arithmetic checksum for serial transmissions. IEEE Trans Commun
30(1):247–252
8. Tong XR, Sheng ZB (2012) Design of uart with crc check based on fpga. In: Advanced materials
research, vol 490. Trans Tech Publication, pp 1241–1245
9. Wakhle GB, Aggarwal I, Gaba S (2012) Synthesis and implementation of uart using vhdl codes.
In: 2012 International symposium on computer, consumer and control. IEEE, pp 1–3
10. Priyanka B, Gokul M, Nigitha A, Poomica J (2021) Design of uart using verilog and verifying
using uvm. In: 2021 7th International conference on advanced computing and communication
systems (ICACCS), vol 1. IEEE, pp 1270–1273
Modified VHDL Implementation
of 128-Bit Rijndael AES Algorithm
by Asymmetric Keys
Abstract Using electronic means to transfer data exposes the data to risk of attack.
The increasing usage of electronic media has pushed security into the spotlight.
Cryptography’s relevance has risen dramatically in recent years as a result of the
rise of electronic data transfers. This paper gives an overview of the commonly
used and highly reliable advanced encryption standard (AES) algorithm. It also throws
light on the functional cipher operation. Since digital data is being
exchanged at such a rapid rate, the security of information in data storage and trans-
mission becomes significantly more important. The security of information trans-
mitted over wireless networks is of the utmost importance. Security of the data is
ensured in wireless communication by encryption and decryption of the data. Security
is provided through encryption algorithms used in the transmission channels. Devel-
oped as a Federal Information Processing Standard (FIPS) of the United States, AES
is an algorithm that can protect electronic data by encrypting it. The AES algorithm
for cryptography is a block cipher that encrypts and decrypts information by means
of asymmetric keys.
1 Introduction
With the expansion of data communications, security systems and devices that safe-
guard personal information transmitted over transmission channels have become
more necessary. A cryptosystem is much more appropriate for protecting large
amounts of data. Cryptography is already becoming increasingly important in
embedded systems innovation due to the rapid increase in devices and apps sending
and receiving data, and data transfer rates are increasing. Any organization or academic
institution should analyze the cipher strength as part of its security risk assess-
ment [1]. The NIST of the USA has approved the AES algorithm to succeed DES
(FIPS-197, 2001). Here, for the encryption and decryption purpose, we use separate
keys (Key A and Key B). Both the keys (A and B) are given prior to their respec-
tive inputs. Key A is used for the encrypting the plaintext, and Key B is used for
decrypting the cipher text (Fig. 1).
The block size for this encryption algorithm is 128 bits, while the key size is 128,
192, or 256 bits (Table 1).
Because of its great soundness and dependability in both software and hardware,
AES is extensively used [2]. Despite the availability of several technology solutions,
they are too sluggish for fast-paced operations such as wireless communication
networks. For a wide range of applications, a number of AES optimized designs and
modifications have been presented. AES analysts state that, out of 10 rounds, about
8 can be brute-forced successfully on today's modern hardware systems. However,
the remaining 2 rounds cannot be broken quickly enough to allow the
attacker to make the attack on the system impactful [3, 4].
Numerous studies have been conducted on adapting cryptography and handling big
data using cloud servers. One article describes the use of unexpected confidentiality
as a security solution with AES-based storage needs and less storage [5]. AES was
chosen due to its lower storage requirements and faster execution time as compared
to previous approaches. The study presented a secure method on the basis of two private
keys, with the secondary (extra) key being used for both encryption and decryption.
According to the conclusions, this enhances the security while maintaining the perfor-
mance index close to the original AES [6]. A similar study effort, with a special emphasis
on protecting the cloud computing paradigm, is examined in [7], where a reconfiguration
of AES is outlined that offers protection over data stored in the cloud by leveraging a new
key generation procedure as well as a transpose matrix to construct ciphertexts
that are hidden from the eyes of third parties, providing security for accessing crucial
data over the cloud. Reena et al. [8] offered a study that focused on key expansion and
shift row transformation to maintain a high degree of security. The purpose was to
prevent and safeguard the information against cyber-attacks. Their experiment also
cut the time taken to encrypt images and produced a better outcome than AES.
They also helped to improve bandwidth efficiency.
1.2 Methodology
Inputs and outputs: The AES algorithm uses a single 128-bit sequence as both outputs
and inputs. An AES cipher key is 128 bits in length. A byte is used in the AES
algorithm as its basic unit of computation, so the input bits are converted into byte
sequences prior to processing. After that, a two-dimensional array of bytes (known
as the State) is created. A state array is organized into four rows of bytes. Each row
contains Nb bytes, where Nb is the block size divided by 32. The State array goes
through the core processes (cipher and inverse cipher), after which its
final result is transmitted to the output [9].
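For reference, arranging the 128-bit input into the 4 × Nb State array (Nb = 4) follows the standard column-major mapping of FIPS-197; the sketch below is a Python illustration (not the paper's VHDL), using the plaintext hex value quoted later in this paper:

```python
def bytes_to_state(block: bytes, nb: int = 4):
    """Map a 16-byte input block into the 4 x Nb State array, column by column:
    state[row][col] = block[row + 4*col]."""
    assert len(block) == 4 * nb
    return [[block[r + 4 * c] for c in range(nb)] for r in range(4)]

state = bytes_to_state(bytes.fromhex("6e69746a616c616e64686172676f6f64"))
for row in state:
    print([f"{b:02x}" for b in row])
```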
Key Schedule: To construct a key schedule, the AES system accepts the cipher key
as an input and runs it through a key expansion procedure. Nb(Nr + 1) words are generated
via the key expansion. Broadly, two types of keys are used during the algorithm
process.
Symmetric Key: Secret-key or shared-key cryptography is another name for
symmetric key cryptography. In this sort of system, the transmitter and receiver
utilize the same key for both decryption and encryption. The framework is depen-
dent on self-certification, which implies that the key is self-certified. This type of
cryptographic technology is necessary since it allows for speedier service without
consuming a lot of resources [10].
r = 0, while the second row shifts by one byte, the third by two bytes, and finally the fourth
row moves cyclically left by three bytes.
Mix Columns: A Galois field multiplication is used to achieve this transformation.
Each byte in a column is given a new value depending on a combination of all four
bytes in the column.
Add Round Key: In addition to its use within the encryption and decryption
rounds, the add round key procedure is performed once more. As a hardware
implementation, it uses a simple exclusive-or operation between the 128-bit data and
the key.
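The shift rows and add round key operations described above can be modelled directly; the following Python sketch (illustrative only, not the VHDL design) rotates row r of the State cyclically left by r bytes and XORs the State with a round key:

```python
def shift_rows(state):
    """Rotate row r of the 4x4 State cyclically left by r bytes (r = 0..3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def add_round_key(state, round_key):
    """Bytewise exclusive-or of the State with the round key."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]

s = [[0x00, 0x01, 0x02, 0x03],
     [0x10, 0x11, 0x12, 0x13],
     [0x20, 0x21, 0x22, 0x23],
     [0x30, 0x31, 0x32, 0x33]]
print(shift_rows(s)[1])         # second row rotated by one: [0x11, 0x12, 0x13, 0x10]
print(add_round_key(s, s)[0])   # x ^ x = 0 for every byte
```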
Inverse Cipher: This is accomplished by first copying the input (ciphertext)
into the State array, and then performing three inverse transformations, and adding
an add round key transformation to the State array. After adding the first round
key to the State array, a round function is constructed, with the final round
again differing slightly from the other rounds, as in encryption. A key expansion
routine is used to derive a one-dimensional array of four-byte words (round keys)
that are used to parameterize the round function. Except for the last round, which
does not involve the inverse mix columns transformation, there are no distinctions
among the Nr rounds [13].
Add Round Key: Because of its exclusive-or nature, add round key is exactly its own inverse.
The round keys are simply applied in the reverse order.
Inverse Shift Rows: Inverse shift rows have similar properties to that of shift rows.
The first row is not changed, but the second is shifted to the right by one byte, the third by
two, and the fourth by three bytes.
Inverse Sub Bytes: It automates the substitution process by making use of a
previously calculated substitution table called inverse S-box. 256 numbers (from
0 to 255) and their values are stored in the inverse S-box table.
Inverse Mix Columns: It operates similarly to mix columns in the encryption
part; however, it has a different matrix. Polynomial transformations of degree less
than 4 over GF(2^8) are used for the inverse mix columns transformation. The
coefficients of these polynomials are the columns of the state multiplied by the mix
columns matrix.
The VHDL programming language was used to code the suggested design,
and the ISE Design Suite software was used to analyze the results. Here,
we have taken the hexadecimal value of nitjalandhargood as an input,
and then, we have carried out the simulation results. Figure 3a shows
3c4fcf098815f7aba6d2ae2816157e2b is the private key, and 10 rounds of encryp-
tion have been performed to the plaintext of 6e69746a616c616e64686172676f6f64.
Figure 3b shows that the message has been encrypted at the end of the 10th
round for the given input of 6e69746a616c616e64686172676f6f64, and at
the end of the encryption algorithm, we have successfully encrypted the input
2.1 Conclusion
The AES algorithm can easily be implemented in software. Software implementations are the
cheapest, but they have the least physical security and are the slowest. With the
increasing demand for highly secure communications combined with physical
security, cryptography is now being implemented efficiently. Cryptography is
becoming particularly crucial in today's society. As a result, the operating frequency is by far
the most important aspect in order to minimize the time duration. We have addressed
the basics of the AES algorithm as well as the implementation of its modules in
VHDL in this study.
Fig. 3 a Simulation results of AES encryption algorithm; b simulation results of AES encryption
algorithm; c simulation results of AES decryption algorithm; d simulation results of AES decryption
algorithm
References
1. Sharma N (2017) A review of information security using cryptography technique. Int J Adv
Res Comp Sci 8(4)
2. Luo AW, Yi QM, Shi M (2011, May) Design and implementation of area-optimized AES
based on FPGA. In: 2011 International conference on business management and electronic
information, vol 1. IEEE, pp 743–746
3. Jun Y, Jun D, Na L, Yixiong G (2010, March) FPGA-based design and implementation of
reduced AES algorithm. In: 2010 International conference on challenges in environmental
science and computer engineering, vol 2. IEEE, pp 67–70
4. Deshpande AM, Deshpande MS, Kayatanavar DN (2009, June) FPGA implementation of
AES encryption and decryption. In: 2009 International conference on control, automation,
communication and energy conservation. IEEE, pp 1–6
5. Roy S, Das AK, Chatterjee S, Kumar N, Chattopadhyay S, Rodrigues JJ (2018) Provably secure
fine-grained data access control over multiple cloud servers in mobile cloud computing based
healthcare applications. IEEE Trans Industr Inf 15(1):457–468
6. Fadul IMA, Ahmed TMH (2013) Enhanced security of Rijndael algorithm using two secret
keys. Int J Secur Appl 7(4):127–134
7. Pancholi VR, Patel BP (2016) Enhancement of cloud computing security with secure data
storage using AES. Int J Inno Res Sci Technol 2(9):18–21
8. Mehla R, Kaur H (2014) Different reviews and variants of advance encryption standard. Int J
Sci Res (IJSR), ISSN (Online), pp 2319–7064
9. Daemen J, Knudsen L, Rijmen V (1997, Jan) The block cipher Square. In: International
workshop on fast software encryption. Springer, Berlin, Heidelberg, pp 149–165
10. Terec R, Vaida MF, Alboaie L, Chiorean L (2011) DNA security using symmetric and asym-
metric cryptography. In: The society of digital information and wireless communications (vol
1, No 1, pp 34–51). IEEE, Piscataway, NJ, USA
11. Wang CH, Chuang CL, Wu CW (2009) An efficient multimode multiplier supporting AES and
fundamental operations of public-key cryptosystems. IEEE Trans Very Large Scale Integration
(VLSI) Syst 18(4):553–563
12. Cheng H, Ding Q (2012, Dec) Overview of the block cipher. In: 2012 Second international
conference on instrumentation, measurement, computer, communication and control. IEEE,
pp 1628–1631
13. Jing MH, Chen YH, Chang YT, Hsu CH (2001, Nov) The design of a fast inverse module
in AES. In: 2001 International conferences on info-tech and info-net. Proceedings (Cat. No.
01EX479), vol 3. IEEE, pp 298–303
A Computationally Inexpensive Method
Based on Transfer Learning for Mobile
Malware Detection
Abstract With the broad usage of Android smartphones, malware growth has been
rising exponentially. The high prominence of Android applications has roused attack-
ers to target them. In the past few years, most scientists and researchers have
researched detecting Android malware through machine learning and deep learning
techniques. Though these traditional techniques provide good detection accuracy,
they need high configuration machines such as GPUs to train complex datasets. To
resolve this problem, the transfer learning approach is presented in this paper to
efficiently detect Android malware with low computational power requirements. By
transferring the necessary features and information from a pre-trained source model
to a target model, transfer learning lowers the computational cost. In this paper, we
initially performed Android malware detection using traditional models such as con-
volutional neural networks and then we applied the transfer learning technique to
reduce the computational cost. Additionally, we evaluated how well the suggested
strategy performed against other cutting-edge malware detection methods. The pro-
posed method achieved an accuracy of 97.5% with a 2.2% false positive rate. In addition,
the overfitting problem and high computational power requirements are also reduced.
1 Introduction
Mobile malware has been increasing drastically from the last few years. The rapid
growth of this malware has become the motivation for attackers to target smartphones,
especially Android. According to Zimperium1 mobile threat report, in 2021, more
than 10 million mobile phones were impacted by various threats in more than 214
countries. During 2019–21, more than 50 million phishing websites were examined,
and mobile-specific phishing websites grew by 250%. There is a huge increase in the
percentage of phishing web pages based on HTTPS from 2019 to 2021. This makes
it tough to differentiate between legitimate and malicious sites. Mobile threats along
with network attacks have dominated the malware ground.
Conventional mobile malware detection methods were limited by pattern match-
ing, and hence, it becomes difficult to identify novel variants. The detection methods
are based on artificial intelligence algorithms to provide more accurate and robust
results in the recent times. Moreover, the probability of getting false positive results
with these algorithms is also less compared to traditional detection methods. Malware
detection methods based on AI algorithms have two common phases: preprocessing
and classification. The first phase deals with feature extraction and the second phase
utilized the extracted features to train the machine learning or deep learning model.
The feature extraction methods are further classified into static and dynamic feature
extraction [1]. In static extraction, the features are extracted without executing the mobile appli-
cation [2]. Static features include dex files, XML files, bytecode, API calls, application
permissions, etc. The major objective of static feature extraction is to disassemble the
application to get the source code. Ahmad Firdaus et al. [3] proposed a technique
based on a static approach to extract static features. To choose the features among
106 strings, the authors further employed genetic search (GS), which is a search query based on
the genetic algorithm. In contrast to static feature extraction, the dynamic feature extrac-
tion methods rely on executing the applications in a virtual emulator. Hence, the
obtained features provide more accurate information.
To classify malware, machine learning and deep learning techniques are com-
monly utilized. The machine learning models require detailed knowledge of feature
selection. Some of the machine learning algorithms used for malware classification
are support vector machine (SVM), random forest, k-nearest neighbor (KNN), and
so on. Deep learning models provide more accurate results during the classification
stage as compared to classical machine learning techniques. However, these models
require heavily configured machines for training and testing. The most common deep
learning techniques used to detect mobile malware are convolutional neural networks
(CNNs) and recurrent neural networks (RNNs).
In our study, we have statically obtained the bytecode features of Android appli-
cations. Further, grayscale images are recreated using an autoencoder. In the end, the
overall malware features are generated with the help of an autoencoder. The exper-
iments have been conducted using CCCS-CIC-AndMal-2020, Drebin, AAGM, and
1 https://www.zimperium.com/global-mobile-threat-report/.
hybrid datasets, respectively. The experimental outcomes provide good accuracy and
outperform various machine learning and deep learning models for detecting Android
malware.
The rest of the article is structured as follows: Sect. 2 discusses the related lit-
erature study. Section 3 demonstrates the proposed method. Section 4 presents the
experimental results. In the end, Sect. 5 concludes the paper.
2 Related Work
The scholarly world has done a lot of research in malware detection. Most of the
studies utilized machine learning algorithms to detect and classify Android malware.
The results obtained with machine learning algorithms are promising but the major
drawback of using machine learning algorithms is that they require domain-level
knowledge for feature selection and extraction. Hence, deep learning algorithms
were introduced to automatically extract critical features and classify mobile mal-
ware more efficiently as compared to machine learning approaches. The Maldozer
framework, proposed by Karbab et al. [4], can automatically detect Android malware
and offer familial categorization. To discover malicious applications, the authors used
deep learning algorithms. They derived numerous features from the dataset’s API
code sequences. Over 30 k hazardous samples out of 70 k samples were used in the
dataset used to evaluate the framework. Low false positive rates were attained by
the authors; however, the framework needed complex calculations to operate more
effectively.
The authors in [5] have suggested DL-Droid, a deep learning model that uses
dynamic analysis and stateful input generation to detect malware in Android plat-
forms. A study found that 94% accuracy (dynamic features only) and 95% accu-
racy (dynamic+static features) may be attained. The method employs an automated
framework for running Android apps and extracting their functionality. DL-Droid
uses these features as inputs for categorization. The DynaLog dynamic analysis
framework was used to test numerous apps.
The authors in [6] proposed a system that can be used with mobile phones. It
saves money by utilizing flexible computing resources. They use a convolutional
neural network (CNN) on an API call graph to determine whether or not an application
is malicious. Using a simple classifier, it distinguishes between API call
networks used for malicious actions and API call graphs used by apps. They were
successful in achieving a high degree of accuracy. The technique uses API call graphs
from both harmful and helpful applications to train datasets. The next step is to use
Grad-CAM to discover high-weight API call graphs that are used by rogue apps.
Feng et al. proposed MobiTive [7], a real-time and responsive malware detec-
tion system for mobile phones. It protects by utilizing specialized deep neural
networks. This environment should be pre-installed and ready to use on mobile
phones. There are two parts to the functionality, i.e., model preparation, Dl training
model, model migration, and model quantization mobile phone deployment using
3 Proposed Method
Fig. 1 Proposed methodology: static features of Android application samples (manifest permissions, API calls, system calls, Dalvik code, libraries, activities, and file, network, and runtime services) are extracted and fed to a source CNN model; transfer learning with layer upgradation and fine-tuning produces the target model, which classifies applications as benign or malicious
The APK files are visualized best by utilizing static features. To extract the binary
images from files, the files are converted into binary vector pixels. The entire APK
file data is treated as a byte stream and is stored in a matrix called binary vector
matrix. The APK files are extracted to produce 8-bit binary data files which are
further transformed into grayscale images. This transformation is depicted in Fig.
2. Every byte in the binary vector matrix is transformed into a pixel value, since a byte
can take a value between 0 and 255.
The steps to generate the images are thus: treat the APK as a byte stream, convert each
byte into a pixel intensity in the range 0–255, and reshape the resulting pixel vector into
a two-dimensional grayscale image.
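A minimal sketch of this byte-to-pixel conversion is given below (our illustration; the square reshaping, the zero padding, and the file name are assumptions not fixed by the text):

```python
import numpy as np

def apk_bytes_to_grayscale(raw: bytes) -> np.ndarray:
    """Treat the APK as a byte stream, map each byte (0-255) to a pixel,
    pad to a square length, and reshape into a 2-D grayscale image."""
    pixels = np.frombuffer(raw, dtype=np.uint8)
    side = int(np.ceil(np.sqrt(pixels.size)))
    padded = np.zeros(side * side, dtype=np.uint8)
    padded[: pixels.size] = pixels
    return padded.reshape(side, side)

# "sample.apk" is a placeholder path, not a file from the paper.
image = apk_bytes_to_grayscale(open("sample.apk", "rb").read())
```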
Fig. 2 Transformation of the feature vector matrix into a grayscale image, and transfer of features from the source model to the target model (benign/malicious classification)
To resolve the issue of overfitting, some of the layers of the pre-trained source
model are fine-tuned. Fine-tuning avoids re-training the generalized features
again and again. These general features can be the APK version, history, software
information, temporary files, etc. To achieve a better fine-tuning mechanism, we
freeze the initial few layers of the source model.
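Freezing the initial layers of a pre-trained source model can be expressed, for example, with the Keras API as sketched below; the model file name, the number of frozen layers, and the optimizer settings are placeholders of ours, not values taken from the paper:

```python
import tensorflow as tf

# Load the pre-trained source CNN (the file name is a placeholder).
source = tf.keras.models.load_model("source_cnn.h5")

# Freeze the early layers so their generalized features are not re-trained
# on the target dataset; only the last few layers stay trainable.
for layer in source.layers[:-4]:
    layer.trainable = False

# Re-compile and fine-tune the remaining layers on the target data.
source.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
               loss="binary_crossentropy", metrics=["accuracy"])
# source.fit(x_target, y_target, epochs=5)   # target data not shown here
```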
In our study, we initially trained a CNN model to classify benign and malicious apps.
For training, we used the CCCS-CIC-AndMal-2020 dataset containing 400K sam-
ples. Further, we transferred some of the features of the trained model to the target
transfer learning model and performed the classification with Drebin, AAGM, and
hybrid datasets. To select the transferrable features, we used the feature selection
method which is described below.
Feature Selection Method: Many of the features produced by the feature extraction
are irrelevant. We use attribute selection to pick the most important features from the
ones that were extracted. During attribute selection, we calculate the information gain
of each feature to determine its value. Information gain is the decrease in entropy caused
by classification and captures the efficacy of a feature in relation to the class. Formally,
let $F_s$ be a set of features to be classified into C classes and let $F_n$ denote the nth
subclass. Then, the entropy of $F_s$ will be:
$$E(F_s) = -\sum_{n \in C} \frac{|F_n|}{|F|} \times \log_2\frac{|F_n|}{|F|} \quad (1)$$
Let $F_x$ denote the sample subset with feature value x for a feature f, with x(f) as the set
of its potential values. The information gain can be calculated as:
$$\mathrm{Infogain}(F_s, f) = E(F_s) - \sum_{x \in x(f)} \frac{|F_x|}{|F|} \times E(F_x) \quad (2)$$
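Equations (1)–(2) translate directly into code; the following Python sketch (our illustration, for a single discrete feature) computes the entropy and the information gain:

```python
import math
from collections import Counter

def entropy(labels):
    """E(Fs) = -sum_n |Fn|/|F| * log2(|Fn|/|F|) over the class counts."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Infogain(Fs, f) = E(Fs) - sum_x |Fx|/|F| * E(Fx)."""
    total = len(labels)
    gain = entropy(labels)
    for x in set(feature_values):
        subset = [lab for val, lab in zip(feature_values, labels) if val == x]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Tiny example: a permission flag versus the benign/malicious label.
print(info_gain([1, 1, 0, 0, 1], ["mal", "mal", "ben", "ben", "ben"]))
```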
smaller size than the AndMal-2020 dataset. It consists of a total of 9476 benign
samples and 5560 malicious samples, respectively. The malicious samples belong
to 179 different families. The AAGM dataset consists of 1500 benign apps and 500
malware apps from 42 different malware families. Further, we constructed a hybrid
dataset using 15,000 benign applications from Google Play with the help of crawler
tools, and 10,000 malicious apps from the Drebin and AAGM datasets respectively.
4.2 Evaluation
In the first stage, the classification test is done for the CCCS-CIC-AndMal-2020
dataset containing more than 399 k samples. We used three well-known parameters
to evaluate the process: “Recall rate (Rec)”, “Precision (Prec)”, and “F-Score”. The
following formulas are used to define these parameters:
$$\mathrm{Prec}(b, i) = \frac{N_{bi}}{N_i} \quad (3)$$
$$\mathrm{Rec}(b, i) = \frac{N_{bi}}{N_b} \quad (4)$$
$$F_{\mathrm{Score}}(b, i) = 2 \times \frac{\mathrm{Rec}(b, i) \times \mathrm{Prec}(b, i)}{\mathrm{Rec}(b, i) + \mathrm{Prec}(b, i)} \quad (5)$$
The experimental results provide an efficiency of 94.2% with a false positive rate
of 5.7%.
In the next stage, transfer learning is applied by fine-tuning the feature sets of
the CNN layers. The classification test is done for Drebin, AAGM, and Hybrid
Table 2 Classification results using classical CNN approach for CCCS-CIC-AndMal-2020 dataset
Sample Precision F-score Rec Support
Genuine apps 0.933 0.915 0.90 1200
Malicious apps 0.918 0.917 0.92 1563
Table 3 Classification results using transfer learning approach for hybrid dataset
(Drebin/AAGM/Google Play)
Type Precision F-score Rec Support
Genuine apps 0.963 0.935 0.93 1100
Malicious apps 0.968 0.957 0.95 1423
and distributed equally throughout the dataset. We then used the transfer learning
strategy, which resulted in a cross-validated score of 97.5%. We changed the config-
uration file and fine-tuned the hyper-parameters of the CNN layer and dense layer
while using transfer learning. In comparison with the classic CNN model, the transfer
learning approach achieves superior performance and fewer false positives. Table 6
gives the results of the performance evaluation. It can be observed that the transfer
learning strategy outperforms the other two in terms of efficiency, computational
requirements, and FPR, and it has no overfitting concerns. The transfer model's con-
vergence rate is also quick because the entire model re-training is not required (Figs.
4 and 5).
Fig. 5 Performance comparison of CNN and transfer learning models
5 Conclusion
Malware has been a part of smartphones since their inception. Malware applications
continue to succeed in eluding security models as the popularity of Android grows.
We explored how to detect and categorize Android malware using classic CNN and
transfer learning approaches in this article. The application of CNN on malware
images has become essential due to the widespread use of CNN in image processing.
A two-stage method for converting Android APKs into binary grayscale images
was suggested. The standard CNN model is fed these images as input. We applied
the transfer learning strategy to the trained model, freezing the first layers of the
pre-trained model, to avoid the difficulties of overfitting, complexity, and computing
expense. The results of the evaluation demonstrate that the transfer learning strategy
has a higher accuracy of 97.5%.
References
8. Naway A, Li Y (2018) A review on the use of deep learning in android malware detection.
arXiv:1812.10360
9. Li D, Wang Z, Xue Y (2018) Fine-grained android malware detection based on deep learning.
In: 2018 IEEE conference on communications and network security (CNS). IEEE, pp 1–2
10. Mahindru A, Sangal A (2021) Fsdroid—a feature selection technique to detect malware from
android using machine learning techniques. Multimed Tools Appl 80(9):13271–13323
11. Xiao X, Yang S (2019) An image-inspired and cnn-based android malware detection approach.
In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE).
IEEE, pp 1259–1261
A Statistical Approach for Extractive
Hindi Text Summarization Using
Machine Translation
1 Introduction
This study is based on the results of ATS systems for various MT engine outputs
that are not affected by human involvement. ATS systems are becoming increasingly
popular and widely used [1, 2]. Numerous languages are available on the Internet
with great variations. However, except for a few languages such as English, most languages
are low-resource in terms of datasets and modeling techniques, and hence accurate
methods for summary generation are scarce. To overcome and address this problem,
we propose a solution for automatic extractive text summarization using different
machine translation engines. We have used machine translation engines to translate
benchmark BBC News [3], CNN News [4], and DUC 2004 [5] datasets into Hindi
language to overcome the under-resource issue. These datasets are very popular and
easily accessible on the Internet. We have focused on Google [6], Microsoft Bing [7],
and Systran translators [8] for English–Hindi translation and corpus generation for
the proposed method.
The proposed framework consists of preprocessing, summary extraction,
summary generation, and postprocessing steps. This work ranks sentences using maximum likelihood estimation (MLE) and generates summaries from the ranking scores. This method can also check the similarity score, or closeness, of the output summary. We have evaluated our system by calculating the ROUGE-3 score and F-score.
The rest of the paper is organized as follows: a brief overview of related work is given
in Sect. 2. Section 3 describes the proposed work. Results and evaluation are shown
in Sect. 4. Finally, the conclusion of this work is discussed in Sect. 5.
2 Related Work
Further, the evaluation has been performed for summaries using different metrics, such as precision and recall for computing the F-score, and the ROUGE-3 score [27]. ROUGE is a set of metrics rather than a single metric. ROUGE measures overlap at different levels of N-grams, where N = 1, 2, 3 denotes unigrams, bigrams, and trigrams, respectively [28].
3 Proposed Work
In this paper, we have used the BBC News [3], CNN News [4], and DUC 2004 [5] datasets in the English language. We have collected English sentences from these three datasets and then translated them into Hindi using the three machine translators given in Table 1. We have extracted the unigrams, bigrams, and trigrams from the translated Hindi text documents.
Table 1 MT systems
Engine No. Description
Engine 1 Microsoft Bing machine translator [6]
Engine 2 Google machine translator [7]
Engine 3 Systran machine translator [8]
3.2 Preprocessing
The score for the output text has been calculated using MLE. We have used a trigram language model for calculating the probability of each trigram of Hindi text, using the Markov chain approach to compute the occurrence score and coherence factor. For example, if we want to compute the probability of a string W = (w1, w2, …, wn), then the MLE estimate of a trigram over the given sentences is given by Eq. (1):

$$P(w_{n-2}\, w_{n-1}\, w_n) = \frac{\mathrm{Count}(w_{n-2}\, w_{n-1}\, w_n)}{\mathrm{Count}(w_{n-2}\, w_{n-1})} \qquad (1)$$
The ATS module defines the overall process of generating scores for the translated text documents. The probability of each translated sentence is computed using MLE. Furthermore, we have applied a ranking algorithm to find the score of each sentence. These scores are computed for every sentence of the given datasets.
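A minimal sketch of this scoring and ranking step, assuming whitespace tokenization and no smoothing (the paper's exact ranking algorithm may differ), is given below:

from collections import Counter

# Hedged sketch: rank sentences by their average trigram MLE probability.
def ngram_counts(sentences):
    tri, bi = Counter(), Counter()
    for sent in sentences:
        tokens = sent.split()  # assumption: whitespace tokenization
        for i in range(len(tokens) - 2):
            tri[tuple(tokens[i:i + 3])] += 1
            bi[tuple(tokens[i:i + 2])] += 1
    return tri, bi

def sentence_score(sent, tri, bi):
    tokens = sent.split()
    total, n = 0.0, 0
    for i in range(len(tokens) - 2):
        t = tuple(tokens[i:i + 3])
        if bi[t[:2]]:
            total += tri[t] / bi[t[:2]]  # MLE estimate from Eq. (1)
            n += 1
    return total / n if n else 0.0

def extract_summary(sentences, k):
    tri, bi = ngram_counts(sentences)
    return sorted(sentences, key=lambda s: sentence_score(s, tri, bi), reverse=True)[:k]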
In this step, summary generation is performed with the help of the generated sentence scores. The sentences with the highest scores are selected as the output summary of the text documents.
We conduct our experiments on three datasets, BBC News, CNN News articles, and DUC 2004 documents, which have been translated into Hindi. The details of these datasets are shown in Table 2. The summary lengths chosen for this work are 3, 10, and 15 sentences for the BBC News, CNN News, and DUC 2004 datasets, respectively.
To measure the accuracy of our model, we use ROUGE-3 [33] for evaluation of the proposed method. This involves comparing the summary generated by our approach with the existing reference summaries. ROUGE-3 measures the overlapping trigrams between predicted and reference summaries. By selecting the top-ranked sentences from the documents, we obtain the output summary.
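For intuition, the trigram overlap behind such a score can be computed roughly as follows (a simplified sketch; the actual evaluation presumably relies on a standard ROUGE implementation):

from collections import Counter

# Hedged sketch: ROUGE-3-style precision, recall, and F-score from trigram overlap.
def trigram_bag(text):
    tokens = text.split()
    return Counter(tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2))

def rouge3_f(predicted, reference):
    p, r = trigram_bag(predicted), trigram_bag(reference)
    overlap = sum((p & r).values())               # clipped matching trigrams
    precision = overlap / max(sum(p.values()), 1)
    recall = overlap / max(sum(r.values()), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)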
To evaluate the performance of the overall approach, we have tested our system on 25 documents from the given datasets. These documents are preprocessed by the proposed method. We found 14 and 11 correct summaries out of 25 documents for the BBC and CNN datasets, respectively. For the DUC dataset, 7 correct summaries were retrieved out of 15 documents. These observations are summarized in Fig. 2. The proposed approach achieves an accuracy of 56% for the BBC News dataset, 44% for the CNN News dataset, and 46% for the DUC 2004 dataset.
Table 3 shows the F-score of the 10 documents for BBC News and CNN
News datasets. F-score is measured by ROUGE-3 metrics extracted from machine-
translated Hindi text documents. The obtained results for BBC and CNN News
datasets have been shown in Figs. 3 and 4, respectively.
Fig. 2 Evaluation of
generated summaries
Table 3 F-score for BBC and CNN News datasets for translated summaries in Hindi
Documents BBC News CNN News
Bing Google Systran Bing Google Systran
D1 0.77 0.75 0.79 0.63 0.62 0.63
D2 0.31 0.31 0.24 0.91 0.93 0.91
D3 0.68 0.71 0.42 0.67 0.74 0.72
D4 0.59 0.57 0.65 0.92 0.81 0.93
D5 0.17 0.81 0.39 0.82 0.82 0.88
D6 0.53 0.89 0.82 0.58 0.42 0.54
D7 0.27 0.6 0.01 0.59 0.58 0.57
D8 0.97 0.48 0.89 0.92 0.88 0.89
D9 0.94 0.83 0.83 0.34 0.32 0.38
D10 0.65 0.59 0.46 0.16 0.14 0.15
Fig. 3 Comparison of MT engines for Hindi summary generation for BBC News dataset
Fig. 4 Comparison of MT engines for Hindi summary generation for CNN News dataset
5 Conclusion
References
1. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
2. Maybury M (1999) Advances in automatic text summarization. MIT Press
3. https://www.kaggle.com/pariza/bbc-news-summary
4. https://www.tensorflow.org/datasets/catalog/cnn_dailymail
5. https://www.kaggle.com/datasets/usmanniazi/duc-2004-dataset
6. https://www.microsofttranslator.com
7. https://translate.goolge.com
8. https://www.systran.net/en/translate/
9. Aggarwal CC (2018) Machine learning for text, vol 848. Springer, Cham
10. Radev DR, Allison T, Blair-Goldensohn S, Blitzer J, Celebi A, Dimitrov S, Zhang Z (2004)
MEAD-a platform for multidocument multilingual text summarization
11. Abdulateef S, Khan NA, Chen B, Shang X (2020) Multidocument Arabic text summarization
based on clustering and Word2Vec to reduce redundancy. Information 11(2):59
12. Oufaida H, Blache P, Nouali O (2015) Using distributed word representations and mRMR
discriminant analysis for multilingual text summarization. In: International conference on
applications of natural language to information systems, pp 51–63
13. Kaljahi R, Foster J, Roturier J (2014) Semantic role labelling with minimal resources:
experiments with French. In: SEM@ COLING, pp 87–92
14. Kabadjov M, Atkinson M, Steinberger J, Steinberger R, Goot EVD (2010) NewsGist: a multi-
lingual statistical news summarizer. In: Joint European conference on machine learning and
knowledge discovery in databases, pp 591–594
15. Rani R, Lobiyal DK (2022) Document vector embedding based extractive text summarization
system for Hindi and English text. Appl Intell:1–20
16. Edmundson HP (1969) New methods in automatic extracting. J ACM (JACM) 16(2):264–285
17. Srivastava R, Singh P, Rana KPS, Kumar V (2022) A topic modeled unsupervised approach to
single document extractive text summarization. Knowl-Based Syst 246:108636
18. Yang K, He H, Al Sabahi K, Zhang Z (2019) EcForest: extractive document summariza-
tion through enhanced sentence embedding and cascade forest. Concurr Comput: Pract Exp
31(17):e5206
19. Yousefi-Azar M, Hamey L (2017) Text summarization using unsupervised deep learning. Expert
Syst Appl 68:93–105
20. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2020) EdgeSumm: graph-based
framework for automatic text summarization. Inf Process Manage 57(6):102264
21. Patel A, Siddiqui T, Tiwary US (2007) A language independent approach to multilingual text
summarization. Large scale semantic access to content (text, image, video, and sound), pp
123–132
22. Gupta V (2013) Hybrid algorithm for multilingual summarization of Hindi and Punjabi
documents. In: Mining intelligence and knowledge exploration, pp 717–727
23. Mishra R, Bian J, Fiszman M, Weir CR, Jonnalagadda S, Mostafa J, Del Fiol G (2014) Text
summarization in the biomedical domain: a systematic review of recent research. J Biomed
Inform 52:457–467
24. Al-Radaideh QA, Bataineh DQ (2018) A hybrid approach for Arabic text summarization using
domain knowledge and genetic algorithms. Cogn Comput 10(4):651–669
25. Koehn P (2010) Statistical machine translation. Cambridge University Press
26. Barzilay R, Lee L (2004) Catching the drift: probabilistic content models, with applications to
generation and summarization. arXiv preprint cs/0405039
27. Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization
branches out, pp 74–81
28. Jing H, Barzilay R, McKeown K, Elhadad M (1998) Summarization evaluation methods:
experiments and analysis. In: AAAI symposium on intelligent summarization, pp 51–59
29. https://www.nltk.org/nltk_data/
Semantic Parser Using
a Sequence-to-Sequence RNN Model
to Generate Logical Forms
Abstract Neural networks have been shown to replicate neural processing and in
some cases intrinsically show features of semantic insight. It all starts with a word;
a semantic parser converts words into meaning. Accurate parsing requires lexicons
and grammar, two kinds of intelligence that machines are just starting to gain. As
the neural networks get better and better, there will be more demand for machines
to parse words into meaning through a system like this. The goal of this paper is to
introduce the reader to a new method of semantic parsing with the use of vanilla or
ordinary recurrent neural networks. This paper briefly discusses how mathematical
formulation for recurrent neural networks (RNNs) could be utilized for tackling
sparse matrices. Understanding how neural networks work is key to handling some
of the most common errors that might come up with semantic parsers. This is because
decisions are generated based on data from text inputs. At first, we present a copying
method to speed up semantic parsing and then support it with data augmentation.
1 Introduction
A key area where natural language processing (NLP) is transforming the era of arti-
ficial intelligence is semantic parsing. The goal of a prevailing extensible semantic parsing system is to rely on parsers and phrase-structure grammars in less of a classical AI fashion and more of a neural-network-style approach, where the parser is provided with many more inputs before coming up with an answer. Furthermore, recurrent
neural networks are predominantly used for prediction functions in speech recogni-
tion, handwriting recognition, and language understanding. The basic architecture
of a recurrent neural network is a loop; although it contains loops, it does not constitute an infinite loop. In fact, the input to this network is based on its previous values. The outputs of a recurrent unit constitute the input to the next time
step. So, the natural question is, can we use RNNs to build an accurate semantic
parser?
Two major challenges stand in the way. First, semantic parsers must be able to
generalize to a large set of entities that may not appear during training.
Second, semantic parsers must understand compositionality: they must be able to
recognize hard alignments between fragments of utterances and logical forms and
know about the predictable ways in which these fragments can be combined. RNNs
do not intrinsically have a concept of compositionality and can only learn about these
crisp structural regularities by observing data.
In this paper, we present the first semantic parser that uses a sequence-to-sequence
RNN model to generate logical forms. Our contributions are twofold. First, we intro-
duce an attention-based copying mechanism that allows our RNN model to generalize
to unseen entities. Second, to teach the model about compositionality, we introduce
compositional data augmentation, which induces a high-precision grammar from the
training data and augments the training data with new examples sampled from this
grammar.
2 Literature Review
Recent literature acknowledges that current models for natural language processing
(NLP), though delivering progress, still have lots of room for improvement. We specifically find in recent work that deep recurrent neural networks (RNNs) combined with predictive models like conditional random fields can help us overcome these kinds of limitations due to their robustness and performance [1].
Semantic parsers are computer systems capable of understanding the meaning of a human statement, returning a human-readable representation of that meaning, and then outputting a response [2].
They perform well in sentiment analysis, which is defined as estimating whether sentences are to be interpreted as happy or sad [3]. The system reads in the sentences and predicts the embedding vectors, which are obtained in relation to a probability estimation process [4]. Clustered vector projection can be used for modeling expressions to help spot words and phrases that could have multiple meanings due to spaces inserted by prepositions, connectors, or transposed letters [5].
In natural language processing, a semantic parser refers to the mechanism for understanding bi-tagged sentences, which can offer a broader context and its representation in the reading framework [6]. The review is guided by a sense inventory, which is built separately as a strongly annotated corpus [7]. This review's potential goals mainly
include analysis of recent research and state-of-the-art results on how to use neural
network architectures such as bi-directional recurrent neural network (BDRNN),
convolutional neural network (CNN), and recursive artificial neural network (RANN)
[8].
Semantic parsers have reached a threshold of competent human translation; now,
anybody can get a truly human interface without even speaking a word [9]. The self-
feeding of language corpora (or the iteration of data in machine learning models) is
the bread and butter of generative algorithms that are outside the strict purview of
neural networks. However, recent advances in recurrent neural network architectures
point to some new and fascinating applications [10].
Classically, recurrent neural networks (RNNs) have been mostly confined to
modeling numbers, making them ill-suited for research and industry alike when the
processing of natural languages based on more complex soft phenomena is desired.
Our study is focused on blending two different mechanisms through RNNs. First, we
alter the model so that it can more easily handle a particular type of crisp regularity:
words that can be copied from input to output. Second, we generate synthesized
training examples to teach the model about the rules that govern how smaller frag-
ments of language can be composed to form larger units. Understanding the trade-offs
between these two paradigms—designing new models and generating new data—is
an important open challenge.
3 Task
3.1 Datasets
One of the things we looked for when testing machine intelligence was how it would
score on 3 standard datasets:
• GeoQuery comes with 600 questions about US geography, each paired with a database query. It has a standard training/testing split; the 600 examples cover all the questions seen during training.
• Regular Expressions contains natural language descriptions of regular expressions paired with the associated regular expressions. We evaluate on a test set of 164 examples selected randomly from the dataset.
• ATIS: here, each query is translated into SQL, and the database is queried to produce the corresponding result set.
The scope of this research is limited to extracting knowledge from logical forms.
We, therefore, do not use any semantic parsing datasets that only include denotations,
such as WebQuestions.
4 RNN Model
We are using a standard recurrent neural network model that is backed by our generic
sequence-to-sequence framework. It combines existing neural machine translation
models with our novel copying mechanism.
At a high level, our system consists of two main modules:
1. Encoder Module. It transforms a string of words x 1 , …, x m into context-sensitive
representations b1 , …, bm , where each bi is a real-valued fixed-dimensional
vector.
2. Decoder Module. This module takes in the input sequence and the context-sensitive embeddings and generates a probability distribution over output sequences y = y1, …, yn, where each yj is an output token. It writes the output tokens one at a time, maintaining a hidden state sj at each time step j.
This can be further decomposed into four modules:
1. Initialization Module: Takes in the context-sensitive embeddings b1 , …, bm ,
and outputs the initial decoder hidden state s0 .
2. Attention Module: Takes in b1 , …, bm and the current state sj , and outputs an
attention score vector ej of length m.
3. Output Module: Takes in b1 , …, bm , sj , ej , and x, and outputs a probability
distribution for yj+1 , the next word to write.
4. Update Module: Takes in b1 , …, bm , sj , ej , and yj+1 , and outputs the new state
sj+1 .
Figure 1 illustrates how these modules are connected to form the overall RNN
model. In the next sections, we describe these modules in greater detail.
At each time step j, and for each word x_i in the input, we compute an attention score e_{ji}. We use the general content-based scoring function:

$$e_{ji} = s_j^{\top} W^{(a)} b_i \qquad (2)$$
At each time step j, the scores e_j from the attention module are converted to a probability distribution over {1, …, m} with a softmax:

$$\alpha_{ji} = \frac{\exp(e_{ji})}{\sum_{i'=1}^{m} \exp(e_{ji'})} \qquad (3)$$
α_{ji} is known as the attention weight and can be interpreted as the amount of attention paid to the i-th input word at time step j. Then, a context vector c_j is computed as a weighted average of the b_i's:

$$c_j = \sum_{i=1}^{m} \alpha_{ji} b_i \qquad (4)$$
The current input vector v_{j+1} is computed as the concatenation of φ(y_{j+1}) and c_j, where φ is another word embedding function. Finally, the state is updated according to the recurrence

$$s_{j+1} = \mathrm{LSTM}(v_{j+1}, s_j) \qquad (5)$$
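A compact NumPy sketch of Eqs. (2)–(5) is given below (the vector shapes and the LSTM-cell interface are assumptions; the actual model relies on a full LSTM implementation):

import numpy as np

# Hedged sketch: content-based attention and decoder state update (Eqs. 2-5).
def attention_step(s_j, B, W_a):
    # s_j: decoder state (d,); B: encoder states stacked as rows (m, d_b); W_a: (d, d_b).
    e_j = B @ (W_a.T @ s_j)              # e_ji = s_j^T W_a b_i for every input position i
    alpha = np.exp(e_j - e_j.max())
    alpha /= alpha.sum()                 # softmax over the m input positions
    c_j = alpha @ B                      # context vector: weighted average of the b_i
    return e_j, alpha, c_j

def decoder_update(y_embedding, c_j, s_j, lstm_cell):
    # Concatenate the output-word embedding with the context and advance the LSTM.
    v_next = np.concatenate([y_embedding, c_j])
    return lstm_cell(v_next, s_j)        # s_{j+1} = LSTM(v_{j+1}, s_j)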
Finally, we describe two decoder output modules: a baseline module, and a more
sophisticated module that performs attention-based copying.
Baseline The baseline output module uses a simple softmax over all output vocabulary words. At each time step j, it first computes the context vector c_j, as in the update module. The unnormalized score for predicting word w as the next output token is

$$\exp\!\big(M_w s_j + U_w c_j\big) + \sum_{i=1}^{m} \mathbb{I}[x_i = w]\, \exp(e_{ji}) \qquad (7)$$
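A rough sketch of the copying-augmented distribution in Eq. (7) is shown below (matrix shapes are assumptions): the score of each vocabulary word is boosted by the attention scores of input positions holding that word, and the scores are then normalized.

import numpy as np

# Hedged sketch: attention-based copying output distribution (Eq. 7).
def output_distribution(s_j, c_j, e_j, x_ids, M, U):
    # x_ids: ids of the m input words; M and U project the state and context to vocabulary scores.
    logits = M @ s_j + U @ c_j                   # writing score for each vocabulary word
    shift = max(logits.max(), e_j.max())         # common shift for numerical stability
    scores = np.exp(logits - shift)
    for i, w in enumerate(x_ids):                # add copy mass: I[x_i = w] * exp(e_ji)
        scores[w] += np.exp(e_j[i] - shift)
    return scores / scores.sum()                 # P(y_{j+1} = w | x, y_{1:j})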
We define a total of three models (one main model and two baselines):
• Attention-Based Copying This is our full model with attention-based copying.
• Attention This is the same as attention-based copying, except with the baseline
output module.
• Encoder-Decoder This baseline is an encoder-decoder model that uses the baseline decoder output module. It can be thought of as a variant of the attention model where the decoder initialization module just returns s_0 = h_m^F, and the context vector c_j is artificially set to always be 0.
4.7 Learning
We train the model using stochastic gradient descent. Gradients are computed
automatically using Theano.
The strength of deep learning models lies in their flexibility. However, this flexibility
also presents a challenge: because neural models make fewer assumptions about the
task, they can be at a disadvantage compared to specialized systems that have domain
knowledge baked in.
Our solution to this problem is to augment our training datasets with new examples
generated from the original training examples. This approach allows us to inject prior
knowledge into our system, as the new examples can be generated in a way that
leverages domain knowledge.
For semantic parsing, one important phenomenon to model is compositionality.
There are often hard alignments between fragments of the input and output, and
these units can be composed with each other in predictable ways. We, therefore,
propose a compositional data augmentation scheme that uses an induced grammar
to generate new, highly structured examples. We focus primarily on applying this to
the GeoQuery domain. More details are shown in Fig. 2.
This procedure begins by identifying high-precision alignments between pieces
of an utterance and associated logical form. First, for each (x, y) pair, there is a
trivial alignment that matches the entire utterance with the entire logical form (e.g.,
what states border Illinois? aligns to an entire logical form). We write some manual
rules to convert questions into noun phrases by stripping things like question marks
and “wh” words (e.g., to create states border Illinois). Finally, we match the entity
mentioned in the input and output based on simple string matching (e.g., Illinois).
Regular expressions and ATIS have less nesting structure, making them less suited for
the compositional data augmentation scheme described above. However, we can still
use high-precision alignment rules to perform a simpler form of data augmentation.
We do this on regular expressions by looking for quoted strings and integers. We
generate new examples by swapping quoted strings and integers in one example for
other quoted strings or integers.
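A hedged sketch of this simpler augmentation is shown below (the matching rule and data format are assumptions): a quoted string found in one (description, regex) pair is substituted into another pair, in both fields, to synthesize a new example.

import re

# Hedged sketch: augmentation by swapping quoted strings between examples.
QUOTED = re.compile(r'"[^"]*"')

def swap_quoted(example_a, example_b):
    # Each example is a (natural-language description, regular expression) pair.
    qa = QUOTED.search(example_a[0])
    qb = QUOTED.search(example_b[0])
    if not (qa and qb):
        return None
    # Substitute B's quoted string for A's in both the description and the regex.
    new_desc = example_a[0].replace(qa.group(), qb.group())
    new_regex = example_a[1].replace(qa.group(), qb.group())
    return new_desc, new_regex

# e.g. swap_quoted(('lines with the word "cat"', '.*"cat".*'),
#                  ('lines with the word "dog"', '.*"dog".*'))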
Note that unlike our synthesized examples for GeoQuery, these synthesized
examples are more like additional (non-independent) samples from the probability
distribution that generated the training data.
6 Experiments
We evaluate our system based on the following grounds. Denotation match assesses the relevance of content in relation to a specific keyword, while match accuracy looks for similarity in the surrounding context. A regular expression is an elegantly simple way of matching a single text string against an ordered sequence. Unlike denotation match, this evaluation is based on character-level similarities of the response.
First, we evaluate our system trained on the original dataset alone, with no data
augmentation.
Note that we are roughly competitive with the state-of-the-art on regular expres-
sions although the numbers are not directly comparable as the other work evaluates
different data. However, we lag behind on GeoQuery and ATIS.
We see that our compositional data augmentation improves our accuracy on
GeoQuery by more than two percentage points. In contrast, we do not see accu-
racy gains on regular expressions, where we performed a less compositional form of
data augmentation.
[Figure: accuracy (%) versus number of additional training examples, shown in three panels, with and without data augmentation]
7 Conclusion
Our research introduces the first sequence-to-sequence RNN model for semantic
parsing. Our model is easy to train and gives good accuracy on several semantic
parsing datasets when trained with logical form supervision. Furthermore, we
propose a compositional data augmentation scheme to inject prior knowledge about
compositionality into our model.
One limitation of our current approach is that it uses annotated logical forms as
supervision.
An alternative direction would be to incorporate the execution step itself into the
network. Our model includes a novel attention-based copying mechanism to deal
with unseen words such as entity names. Our attention-based copying can be used
for both rare and common words, so our model can learn when it is best to perform
copying.
We used a small set of high-precision manual rules to perform data augmentation.
It is possible that an automatic grammar induction approach could expand the recall
of our grammar while keeping precision high.
Our experiments on artificial data show that compositional data augmentation can
help the model learn even when the new examples look different than the examples
seen at test time.
Tree-structured recursive neural networks leverage the structure of a syntactic
parse tree to compositionally build representations of sentences. Their focus on
soft representations contrasts with our goal of modeling hard relationships between
fragments of sentences and logical forms.
References
8. Lukovnikov D (2022) Deep learning methods for semantic parsing and question answering
over knowledge graphs. Ph.D. dissertation, Universitäts und Landesbibliothek Bonn, 2022
9. Marton G, Bilotti MW, Tellex S. Why names and numbers need semantics
10. Yang L, Liu Z, Zhou T, Song Q (2022) Part decomposition and refinement network for human
parsing. IEEE/CAA J Automatica Sinica 9(6):1111–1114
NFF: A Novel Nested Feature Fusion
Method for Efficient and Early Detection
of Colorectal Carcinoma
Abstract Colorectal cancer is one of the most common cancer types and causes of
death due to cancer in the world. Wireless capsule endoscopy is used to diagnose and classify colorectal carcinoma. However, the major drawback of wireless capsule endoscopy is that it presents many images to be analyzed by the medical practitioner.
Therefore, many studies have been performed to automate the detection and classi-
fication of colorectal carcinoma using machine learning and deep learning models.
Studies vary from traditional image classification techniques to image processing
algorithms combined with data augmentation and pre-trained neural networks for early detection and type classification of colorectal carcinoma. In this
manuscript, we proposed a novel nested feature fusion method to fuse the deep fea-
tures extracted by the pre-trained EfficientNet family to devise an approach for early
detection and classification of colorectal carcinoma. We have used the WCE curated
colon disease dataset, which consists of 4 classes: normal, ulcerative colitis, polyps,
and esophagitis. Our proposed method outperformed the state of the art, with the fused model achieving an accuracy of 94.11%.
Medical centers can use the proposed method to detect colorectal cancer efficiently
in real life.
1 Introduction
Colorectal carcinoma (CRC) is ubiquitous and is a leading cause of death due to cancer worldwide [1, 2]. Unfortunately, colorectal carcinoma is often discovered at stages too late for effective treatment [3]. Mainly, colonoscopy is used
to detect the various types of CRCs. However, such methods also impose risks to the
patient, such as bleeding, negative consequences of sedation, colonic perforation, and
other clinical risks [4, 5]. Furthermore, due to wide-ranging variation in data from
one patient to another, traditional learning methods of diagnosis are not extremely
reliable [6].
Biomedical image processing is a mainstay of scientific research and an essential part of medical care, and it is highly sought after in the field of deep learning [7]. Although clinical detection of diseases based on traditional medical imaging methods has provided factual accuracy, developments in machine learning have pushed deep learning research forward in biomedical imaging [6].
To augment the process of colorectal carcinoma detection, a tremendous amount
of research is focused on detecting CRCs through medical image processing and
computer-aided diagnosis.
Machine learning methods have provided accurate classification and prediction
abilities and have been deployed to be used for the diagnosis and prognosis of various
medical ailments and health conditions due to their data-backed method of analysis,
which unifies diverse risk factors into a classification/prediction algorithm [8–10].
However, deep learning methods are more effective than conventional machine learn-
ing methods due to their ability to process a high number of available samples during
the training stage [11], their ability to execute feature engineering on its own, and
their need for less human intervention while training which is highly suitable for
datasets with a large number of samples. Furthermore, deep neural network mod-
els and frameworks can be retrained using a custom dataset compared to traditional
computer vision algorithms, which are highly domain-specific. This provides much
flexibility in deep learning compared to traditional machine learning algorithms [12].
With deep learning, an image dataset with object classes annotated for each image is presented to the machine to facilitate end-to-end learning [13], which is much easier than traditional computer vision techniques, where parameters have to be fine-tuned by the CV engineer.
The remaining contents of the proposed experimentation can be summarized as
follows: Sect. 2 briefs about the previous academic works of various scholars in
detecting colorectal carcinoma. Section 3 explores EfficientNet models, other deep
learning strategies, and the materials and methods used. Section 4 describes the deep
feature extraction and model training. Finally, Sect. 5 presents the experiments and their results.
2 Related Works
A variety of research has been performed on the automated detection and classifi-
cation of colorectal cancer using machine learning and computer vision algorithms.
Recently, deep learning has become the state-of-the-art approach for performing the
classification of colorectal cancer due to its current popularity in biomedical image
classification experimentations.
The study presented by Jesmar et al. proposed a model that integrates EfficientNet,
MobileNetV2, and ResNetV2 into a single feature extraction pipeline called multi-
The WCE curated colon disease dataset is an image dataset of the gastrointestinal tract, or simply a colon disease image dataset [19, 20]. These are images of the gastrointestinal tract captured during the procedure of wireless capsule endoscopy, which, in the scope of the current experimentation, will be used to devise a deep learning model for the early detection of colorectal carcinoma. The dataset contains 6000 colored images in four classes: normal, ulcerative colitis, polyps, and esophagitis, as given in Table 1.
Data preprocessing is an essential step for deep learning model training. It outlines
the processes required to alter or encode data so the model can parse it effectively. In
neural networks, the model expects the input image to be the same size. However, the
images gathered are not the same size or form. The images in our dataset originally
ranged in size from 400 × 300 to 936 × 768 pixels. We converted all the images
into a common size of 128 × 128 pixels as a preprocessing step before training
because the dataset’s images were not homogeneous and came in varied sizes. After
applying RGB reordering to all images, the model’s final input was delivered as a
128 × 128 × 3 matrix.
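A minimal sketch of this resizing step (the file layout, decoding format, and [0, 1] scaling are assumptions; the exact preprocessing used for each backbone may differ):

import tensorflow as tf

# Hedged sketch: load a WCE image and resize it to the 128 x 128 x 3 model input.
def load_and_preprocess(path):
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)   # 3-channel RGB image
    img = tf.image.resize(img, (128, 128))        # common size chosen in this study
    return tf.cast(img, tf.float32) / 255.0       # scale pixel values to [0, 1]

# dataset = tf.data.Dataset.list_files("wce/*/*.jpg").map(load_and_preprocess)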
While downscaling the images, we can sometimes lose some vital information, so
this has to be done carefully by observing the dataset. For example, suppose we have
a dataset of MRI scans for brain tumor classification. In that case, if we downscale the
images to a minimal size, the tumor will almost disappear from MRI scans, which can
impact training accuracy. Also, resizing the image to a very large size like 512 × 512
can exceed the GPU memory. Therefore, to make it both memory efficient and not
lose any critical information from the image, we have to choose the best image size
based on the experiments.
We scaled and ran our trials on all 128 × 128, 196 × 196, and 256 × 256 image
sizes in this study, and we found that the accuracy is similar for all three image
sizes. However, training time is considerably shorter on 128 × 128, saving significant
computational efforts.
A deep learning model may obtain a 99% accuracy rate yet still fail when evaluated on real-world images. In order to prevent model selection bias and overfitting, it is essential to divide the dataset into training, validation, and testing sets. Furthermore, our parameter estimates are more variable when we have a scant amount of data. Similarly, our performance measure will be more variable if we have fewer testing data. As a result, we should split the data so that no variances are excessive.
Adding more data to the final testing set ensures the method’s resilience and
minimizes the chance of failure in real-world tests. As a result, as given in Table 2,
we partitioned the entire dataset into three sections: 70% training, 10% validation,
and 20% testing.
Transfer learning was initially discussed at NeurIPS (the Conference on Neural Information Processing Systems), where the idea of using previously learned knowledge to augment future learning was raised. Deep transfer learning (DTL) combines deep
learning architecture with transfer learning. Deep neural networks (DNNs) provide
a powerful way to learn features, making them useful in feature-based transfer learn-
ing. Methods based on latent feature spaces utilize DNNs to discover a common
latent feature space where both source and target data can exhibit the same probabil-
ity properties. Consequently, the source data can be used as a training set for target
data in the latent feature space, which improves the model’s performance with target
data [21].
3.5 EfficientNet
EfficientNet is a simple convolutional neural network architecture known for its effective compound scaling method, which helps researchers scale up a convolutional neural network to any target resource constraint in a principled and efficient way. Unlike other architectures, EfficientNet uniformly scales network resolution, depth, and width. EfficientNets are also widely used in transfer learning, which is why they are used in the scope of this experiment [22].
In order to construct a CNN, you need to extract features and classify them. The model's first layers may be considered descriptors of image features, whereas the latter layers are associated with specific categories. In feature extraction, many convolution layers are utilized, followed by max-pooling and an activation function. A fully connected layer and a softmax activation function are standard components of a classifier. Since the number of classes in a dataset is directly related to the number of features the model has to learn, the feature extraction component of the convolutional neural network should be deeper and more complex in order to learn complex features.
The loss function is used to measure the deviation of the estimated value from the true value. It is a computational procedure to assess how well the algorithm models the data. In this experiment, the cross-entropy loss function is used because of its ability to increase in magnitude when the predicted probability deviates from the actual result. Equation (1) gives the computation of the cross-entropy loss function:
$$L_{CE} = -\sum_{i=1}^{n} t_i \log(p_i), \quad \text{for } n \text{ classes} \qquad (1)$$
where ti is the truth label and pi is the Softmax probability for the ith class.
The softmax classifier is an output function that outputs the probabilities for each
class label in the form of a vector. It is usually used for multi-class classification
purposes. Softmax function is defined in Eq. 2.
$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (2)$$
Learning rate decay is a practical technique used to train modern neural networks. Training is initialized with a large learning rate, which is then decreased several times over the course of training. It is used to enhance optimization and generalization in the experimentation process. Learning rate decay can be time-based, step-based, or exponential.
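For instance, an exponential decay schedule can be set up as follows (the initial rate, decay interval, and decay factor below are illustrative assumptions, not the values used in this study):

import tensorflow as tf

# Hedged sketch: exponential learning-rate decay.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # start with a relatively large rate
    decay_steps=1000,             # decay once every 1000 optimizer steps
    decay_rate=0.9)               # multiply the rate by 0.9 at each interval
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)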
All the models mentioned in the proposed research were implemented with TensorFlow in Python. Further, Kaggle was used to train the models, with the following specs: a Tesla P100-PCIE GPU (compute capability 6.0) with 16 GB of GPU RAM.
Fig. 3 Loss curve of the training of the EfficientNet family (B0–B7) over epochs
The first and most crucial step in constructing a deep learning model is to define the
network architecture. We prefer to use pre-trained networks to extract deep features
as they have been initially trained on a large-scale ImageNet dataset. Therefore,
we save a lot of computational power when adjusting weights to match our WCE
dataset. In this study, we have used pre-trained networks of the EfficientNet family
for feature extraction. The extracted deep features were then used to train a multi-layer perceptron network with a softmax activation function. The accuracy achieved with each of the networks is reported in Table 3. The loss and accuracy curves of the training of the EfficientNet family are shown in Figs. 3 and 4, respectively.
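A hedged sketch of this pipeline is shown below (the backbone variant, global-average pooling, hidden-layer width, and training settings are assumptions):

import tensorflow as tf

# Hedged sketch: deep features from a frozen EfficientNet, classified by an MLP with softmax.
backbone = tf.keras.applications.EfficientNetB1(
    include_top=False, weights="imagenet",
    input_shape=(128, 128, 3), pooling="avg")
backbone.trainable = False                         # use as a fixed ImageNet feature extractor

inputs = tf.keras.Input(shape=(128, 128, 3))
features = backbone(inputs)                        # deep feature vector
hidden = tf.keras.layers.Dense(256, activation="relu")(features)
outputs = tf.keras.layers.Dense(4, activation="softmax")(hidden)  # 4 WCE classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])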
Fig. 4 Accuracy curve of the training of the EfficientNet family (B0–B7) over epochs
Three classifiers are required to generate the fusion model. After working with the
whole EfficientNet family, it was discovered that EfficientNetB1, EfficientNetB2,
and EfficientNetB4 provided the best testing accuracy. As a result, Fused Model 1
was created by combining EfficientNetB1 and EfficientNetB2, while Fused Model
2 was created by combining EfficientNetB2 and EfficientNetB4. Furthermore, we
have fused models 1 and 2 together to generate our final nested fusion model.
On the test dataset, combining the EfficientNetB1 and EfficientNetB2 generated
an accuracy of 93.43%, while combining the EfficientNetB2 and EfficientNetB4
gave an accuracy of 93.63%. Finally, when the previous two fused models were
combined, an accuracy of 94.11% was achieved on the test dataset as given in
Table 4. The loss and accuracy curve of the training of fusion models are shown
in Fig. 5. The confusion matrix and AUC-ROC plots of each fusion model are shown
in Fig. 6.
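A minimal sketch of the fusion idea (concatenating deep feature vectors from two backbones before a shared classifier; layer sizes and training details are assumptions) is given below:

import tensorflow as tf

# Hedged sketch: fuse deep features of two EfficientNet backbones by concatenation.
def frozen_backbone(net_cls):
    net = net_cls(include_top=False, weights="imagenet",
                  input_shape=(128, 128, 3), pooling="avg")
    net.trainable = False
    return net

b1 = frozen_backbone(tf.keras.applications.EfficientNetB1)
b2 = frozen_backbone(tf.keras.applications.EfficientNetB2)

inputs = tf.keras.Input(shape=(128, 128, 3))
fused = tf.keras.layers.Concatenate()([b1(inputs), b2(inputs)])   # fused feature vector
hidden = tf.keras.layers.Dense(256, activation="relu")(fused)
outputs = tf.keras.layers.Dense(4, activation="softmax")(hidden)
fused_model_1 = tf.keras.Model(inputs, outputs)   # analogous to "Fused Model 1" above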
Fig. 5 Loss and accuracy curve of the training of final fused model
References
1. Ponzio F, Macii E, Ficarra E, Cataldo SD (2018) Colorectal cancer classification using deep
convolutional networks. In: Proceedings of the 11th international joint conference on biomed-
ical engineering systems and technologies, vol 2, pp 58–66
2. Matthew F, Sreelakshmi R, Tatishchev Sergei F, Wang Hanlin L (2012) Colorectal carcinoma:
pathologic aspects. J Gastrointest Oncol 3(3):153
3. Wan N, Weinberg D, Liu T-Y, Niehaus K, Ariazi EA, Delubac D, Kannan A et al (2019) Machine
learning enables detection of early-stage colorectal cancer by whole-genome sequencing of
plasma cell-free DNA. BMC Cancer 19(1):1–10
4. Young Patrick E, Womeldorph Craig M (2013) Colonoscopy for colorectal cancer screening.
J Cancer 4(3):217
5. Su H, Lin B, Huang X, Li J, Jiang K, Duan X (2021) FFNet: multi-branch feature fusion
network for colonoscopy. Front Bioeng Biotechnol 515
6. Razzak MI, Naz S, Zaib A (2018) Deep learning for medical image processing: overview,
challenges and the future. Classification BioApps 323–350
7. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier KH (2021) nnU-Net: a self-configuring
method for deep learning-based biomedical image segmentation. Nature Methods 18(2):203–
211
8. Liyan P, Guangjian L, Fangqin L, Shuling Z, Huimin X, Xin S, Huiying L (2017) Machine
learning applications for prediction of relapse in childhood acute lymphoblastic leukemia. Sci
Rep 7(1):1–9
9. Konstantina K, Exarchos Themis P, Exarchos Konstantinos P, Karamouzis Michalis V, Fotiadis
Dimitrios I (2015) Machine learning applications in cancer prognosis and prediction. Comput
Struct Biotechnol J 13:8–17
10. Passos IC, Mwangi B, Kapczinski F (2016) Big data analytics and machine learning: 2015 and
beyond. Lancet Psychiatry 3(1):13–15
11. Dinggang S, Guorong W, Heung-Il S (2017) Deep learning in medical image analysis. Annual
Rev Biomed Eng 19:221
12. O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan
D, Walsh J (2019) Deep learning vs. traditional computer vision. In: Science and information
conference. Springer, Cham, pp 128–144
13. Montalbo Francis Jesmar P (2022) Diagnosing gastrointestinal diseases from endoscopy images
through a multi-fused CNN with auxiliary layers, alpha dropouts, and a fusion residual block.
Biomed Signal Process Control 76:103683
14. Poudel S, Kim YJ, Vo DM, Lee S-W (2020) Colorectal disease classification using efficiently
scaled dilation in convolutional neural network. IEEE Access 8:99227–99238
15. Khan MA, Kadry S, Alhaisoni M, Nam Y, Zhang Y, Rajinikanth V, Sarfraz MZ Computer-
aided gastrointestinal diseases analysis from wireless capsule endoscopy: a framework of best
features selection. IEEE Access 8:132850–132859
16. Juan S, Aymeric H, Olivier R, Xavier D, Bertrand G (2014) Toward embedded detection of
polyps in wce images for early diagnosis of colorectal cancer. Int J Comput Radiol Surgery
9(2):283–293
17. Fan S, Lanmeng X, Fan Y, Wei K, Li L (2018) Computer-aided detection of small intestinal
ulcer and erosion in wireless capsule endoscopy images. Phys Med Biol 63(16):165001
18. Chenjing C, Shiwei W, Youjun X, Weilin Z, Ke T, Qi O, Luhua L, Jianfeng P (2020) Transfer
learning for drug discovery. J Med Chem 63(16):8683–8694
19. Pogorelov K, Randel KR, Griwodz C, Eskeland SL, de Lange T, Johansen D, Spampinato C
et al (2017) Kvasir: a multi-class image dataset for computer aided gastrointestinal disease
detection. In: Proceedings of the 8th ACM on multimedia systems conference, pp 164–169
20. Juan S, Aymeric H, Olivier R, Xavier D, Bertrand G (2014) Toward embedded detection of
polyps in wce images for early diagnosis of colorectal cancer. Int J Comput Radiol Surgery
9(2):283–293
21. Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–
1359
22. Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks.
In: International conference on machine learning. PMLR, pp 6105–6114
Arrhythmia Classification Using
BiLSTM with DTCWT and MFCC
Features
Abstract Heart disease is the number one cause of mortality all over the world. The electrocardiogram (ECG) is a valuable and powerful tool for the diagnosis of cardiac disorders and the detection of arrhythmia. In this study, a new feature set is proposed by combining MFCC and DTCWT-based features for accurate identification and classification of arrhythmia. First, various filters and the wavelet transform are used to remove noise from the ECG signals. Then, R-peak locations are detected, and ECG segments are generated. From these ECG segments, MFCC and DTCWT-based features are extracted and provided to a BiLSTM to implement the classification. The arrhythmia classification is carried out according to the Association for the Advancement of Medical Instrumentation (AAMI) criteria. Our model attained an average sensitivity of 94.59%, precision of 94.97%, and overall accuracy of 99.12% on the class-oriented arrhythmia classification scheme.
1 Introduction
According to the WHO, cardiovascular disease (CVD) is the leading cause of death
worldwide [1]. Heart disease is very difficult to cure in the later stages. Therefore, it
is important to diagnose and treat cardiovascular disease in advance.
One type of heart disease is arrhythmia. It is a disorder of the frequency or rhythm
of heartbeats [2]. During arrhythmia, the heart may not be able to pump enough
blood to the body. Due to this circulatory failure, the brain, heart, and other organs
may be damaged and can lead to death. Types of arrhythmia are broadly classified
into two categories. The first category includes life-threatening arrhythmias such as
tachycardia and ventricular fibrillation. These arrhythmias need prompt defibrillator
therapy. Although the other group contains arrhythmias that may not be immediately life-threatening, they require appropriate treatment or therapy to avoid additional complications in the future [3].
ECG is an important modern medical tool that can record the process of cardiac
activity. A careful examination of ECG can help to diagnose a cardiac function
issue [4]. The occurrence of abnormal beats in an ECG may not be regular, so ECG signals need to be monitored for long durations. Monitoring such a large volume of data manually is not practicable [5]. As a result, automated approaches for ECG signal
processing and analysis are essential.
Arrhythmia classification from ECG typically consists of three stages: prepro-
cessing the ECG signal, extracting features from the preprocessed signal, and clas-
sifying arrhythmia beats using machine learning techniques [6]. The preprocessing
step is primarily concerned with detecting and attenuating unwanted frequencies
from the ECG signal. Then, features are extracted from the preprocessed ECG signals.
The extracted features can be frequency based, statistical based, ECG morphology
based, or auto-extracted. The collected features are then supplied as input into
machine learning-based classification algorithms. Deep learning (DL) is a high-
performance and effective machine learning algorithm that is gaining popularity. DL
is frequently employed in image processing, signal processing, voice and natural
language processing operations. Actually, DL is a neural network topology that uses
additional hidden layers to handle deeper feature levels to improve classification
performance [7].
The aim of this paper is to classify arrhythmia beats according to the AAMI standards. Initially, the ECG signal is denoised, and then features are extracted using the dual-tree complex wavelet transform (DTCWT) and Mel-frequency cepstral coefficients (MFCC). These features are fed to a BiLSTM to classify the beat type.
The rest of this paper is organized as follows. The ECG database used in the proposed work is introduced in Sect. 2. Section 3 covers noise removal from the ECG signal and obtaining ECG segments, while Sect. 4 describes the feature extraction process. Section 5 outlines the proposed model, as well as its training process and parameters, and presents the results and discussion. Finally, the conclusion of the article is presented in Sect. 6.
2 Database Used
The MIT-BIH Arrhythmia Database [8] is a widely used and openly available ECG database for heartbeat classification, and it is used to assess the proposed method. The database has 48 ECG recordings, each with a duration of half an hour and a sampling rate of 360 Hz. According to the AAMI standards, the fifteen approved arrhythmia classes from the MIT-BIH Arrhythmia Database are divided into five super-classes [9]: N (normal), V (ventricular), S (supraventricular), F (fusion), and Q (unclassified) beats. The performance of the proposed ECG classification model is assessed using these five AAMI beat classes.
3 Preprocessing
Detecting R-peak locations and forming ECG segments are crucial for arrhythmia beat classification performance. However, detecting R-peak positions is beyond the scope of this work; the R-peak locations already indexed in each ECG record of the MIT-BIH Arrhythmia Database are used instead. An ECG segment having 359 samples to the left and 360 samples to the right of each indexed R-peak is created. In other words, each ECG segment has 720 samples, or two seconds of data. Our method largely mimics the way doctors scan an ECG. Moreover, compared to previous ECG segmentation strategies, each segment obtained in this work always contains more ECG data than a single heartbeat cycle. This segmentation strategy requires additional processing time to train the proposed model, but it captures hidden ECG features that improve classification performance.
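A hedged sketch of this segmentation step (array layout and boundary handling are assumptions):

import numpy as np

# Hedged sketch: build 720-sample segments around the annotated R-peaks.
def make_segments(signal, r_peaks, left=359, right=360):
    # Each segment: 359 samples left of the R-peak, the peak itself, and 360 to the right.
    segments = []
    for r in r_peaks:
        if r - left >= 0 and r + right + 1 <= len(signal):   # skip peaks too close to the edges
            segments.append(signal[r - left:r + right + 1])  # 720 samples (2 s at 360 Hz)
    return np.asarray(segments)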
Fig. 1 a Sample raw ECG signal. b After removal of baseline wander. c After removal of high-
frequency noise. d After removal of power-line interference
4 Feature Extraction
(iv). Apply a 1D FFT to the absolute coefficient values and compute the logarithm of the Fourier spectrum.
The Mel filter bank maps a frequency f to the Mel scale as $\mathrm{Mel}(f) = 2595 \log_{10}(1 + f/700)$, where f is the filter-bank input and Mel is the output; 700 and 2595 are predefined values that have been used by many researchers.
(iii). Calculate the N features with the discrete cosine transform (DCT) to generate the MFCC.
5 Proposed Methodology
In the classification phase, we have used a BiLSTM for classifying the arrhythmia types. The best architecture of the BiLSTM is usually obtained through a trial-and-error process. Therefore, after running many simulations, the architecture of the BiLSTM classifier was fixed at two BiLSTM layers, each containing 50 hidden units, followed by a flatten layer. Then follow two dense layers: the first contains 128 neurons with the ReLU activation function, and the second contains 5 neurons with the softmax activation function and gives the classification output. The proposed model focuses on solving the objective function in terms of maximizing the accuracy, sensitivity, and precision of the arrhythmia classification. The aim of the developed model is indicated in Eq. (2).
$$F_2 = \underset{\{HN_b^{\mathrm{blstm}},\; ep_c^{\mathrm{blstm}}\}}{\arg\min}\; \frac{1}{acr + sen + prc} \qquad (2)$$
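A minimal Keras sketch matching the architecture described above is given below (the input feature shape and training settings are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# Hedged sketch: two BiLSTM layers of 50 units, flatten, dense-128 ReLU, 5-class softmax.
def build_bilstm(timesteps, n_features, n_classes=5):
    model = tf.keras.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.Bidirectional(layers.LSTM(50, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(50, return_sequences=True)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# model = build_bilstm(timesteps=40, n_features=33)  # shapes depend on the MFCC/DTCWT features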
Fig. 2 Confusion matrix of BiLSTM with a MFCC features, b DTCWT features, c MFCC +
DTFCT features
6 Conclusion
This study investigated the use of a BiLSTM classifier to classify ECG beats accu-
rately. A robust approach is proposed for cardiac arrhythmia identification and classi-
fication using MFCC and DTCWT time–frequency-based features. The classification
scheme started with denoising of ECG signals and extracting important morpholog-
ical features using MFCC and DTCWT. The combined features are provided as input
to BiLSTM classifiers to perform classification of the arrhythmia according to AAMI
standard. The results show that the BiLSTM classifier has the best detection accuracy
of 99.12%, indicating its superiority in detecting cardiac arrhythmia. As a result, the
presented automated approach can be used to detect cardiac arrhythmias effectively.
References
1. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
2. Essa E, Xie X (2021) An ensemble of deep learning-based multi-model for ECG heartbeats
arrhythmia classification. IEEE Access 9:103452–103464
3. Karraz G, Magenes G (2006) Automatic classification of heartbeats using neural network classi-
fier based on a Bayesian framework. In: 2006 international conference of the IEEE engineering
in medicine and biology society. IEEE
4. Pandey SK, Janghel RR (2019) ECG arrhythmia classification using artificial neural networks.
In: Proceedings of 2nd international conference on communication, computing and networking.
Springer, Singapore
5. Acharya UR et al (2017) A deep convolutional neural network model to classify heartbeats.
Comput Biol Med 89:389–396
6. Ebrahimzadeh A, Khazaee A (2009) An efficient technique for classification of electrocardio-
gram signals. Advances in Electrical and Computer Engineering 9(3):89–93
7. Cai J et al (2021) Real-time arrhythmia classification algorithm using time-domain ECG feature
based on FFNN and CNN. Mathematical Problems in Engineering 2021
8. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med
Biol Mag 20(3):45–50
9. Yang H, Wei Z (2020) Arrhythmia recognition and classification using combined parametric
and visual pattern features of ECG morphology. IEEE Access 8:47103–47117
10. Jagtap SK, Uplane MD (2012) A real time approach: ECG noise reduction in chebyshev type
ii digital filter. International Journal of Computer Applications 49(9)
11. Mogili R, Narsimha G (2021) A study on ECG signals for early detection of heart diseases
using machine learning techniques. J Theor Appl Inf Technol 99(18):4412–4424
12. Yang Y et al (2014) Dual-tree complex wavelet transform and image block residual-based
multi-focus image fusion in visual sensor networks. Sensors 14(12):22408–22430
13. Yusuf SAA, Hidayat R (2019) MFCC feature extraction and KNN classification in ECG signals.
In: 2019 6th international conference on information technology, computer and electrical
engineering (ICITACEE). IEEE
Anomaly-Based Hierarchical Intrusion
Detection for Black Hole Attack
Detection and Prevention in WSN
Abstract The wireless sensor network (WSN) is a network of sensors that may be deployed in the environment to sense any kind of physical phenomenon. The sensed data is transmitted to a base station (BS) for processing. During this process, the security of the routed data is vital and very challenging in WSN. The black hole is a highly malicious attack that targets the routing protocols of sensor nodes. This type of attack can have devastating impacts on hierarchical routing protocols. In this paper, anomaly-based hierarchical intrusion detection for black hole attack detection and prevention in WSN is presented. A black hole attack may happen if an intruder captures and reprograms a set of nodes in the network to block packets instead of transmitting them to the BS. Here, the active trust routing model concept is utilized for detecting black hole attacks during data packet routing. The results demonstrate that the presented system enhances security with a prolonged network lifetime, lower energy utilization, and higher throughput and packet delivery ratio (PDR).
1 Introduction
A WSN is a low-cost network which contains small sensing devices, namely sensors. The sensor nodes have unique identities and the capabilities of sensing, processing, and sharing information with other devices. WSN components range from small sensing devices, e.g., temperature sensors, to the most critical and complex jet-engine parts. The WSN is a self-organized network of low-cost devices. Such devices utilize actuators and sensors, which can minimize human interaction. Smart home appliances such as air conditioners (AC) adjust the room temperature by sensing it, and motion detection devices can alert the user about suspicious activities. The nodes of a WSN are low cost and simple to deploy, communicating through the wireless medium. However, the sensor nodes are limited in terms of battery power, computation, and processing, and these devices are not protected by traditional cryptographic algorithms. This resource-constrained behavior and the wireless medium make them vulnerable to various attacks.
The sensor nodes are often deployed in an unattended and hostile field in which they are always prone to various security attacks. The WSN is highly susceptible to security breaches because of its inherent nature, limited resources, and unattended, hostile, and open environment. Security is one of the most vital concerns among all other aspects of a network. Earlier security methods are not very effective because of limitations such as energy, memory, and node accessibility after deployment. Hence, the security aspect is one of the most challenging issues and deserves much attention in WSNs.
The routing of data packets from source to sink via the network has gained much attention from researchers in the WSN field. One of the major constraints is the limited energy source, since energy is a fundamental element in routing protocol design. In addition, to lessen superfluous transmissions of the same data, data aggregation needs to be considered in WSN routing protocols [1]. Many of the present routing protocols aim at parameters such as responsiveness, energy preservation, robustness and reliability. However, ignoring feasible security obstacles in routing is perilous, since in most application fields where WSNs are utilized the sensor nodes are deployed in unfavorable and hostile environments, providing adversaries with opportunities to launch attacks against the sensor nodes.
Security solutions such as key management, cryptography and authentication improve protection in WSNs; however, such solutions alone cannot prevent all possible attacks. A greater variety of attacks can be introduced by compromised nodes in the WSN, which appear legitimate inside the network while in fact operating for a third party; hence, a defense system such as an intrusion detection system (IDS) is needed.
The security attacks in WSNs can be categorized as passive and active. In passive attacks, the attacker is generally disguised and taps the related connection to gather information or to degrade the working elements of the system.
The active attacks can be classified as jamming, Sybil types, denial-of-service (DoS),
flooding and hole attacks (sinkhole, wormhole) [2].
In the black hole attack, a malicious node attracts the entire traffic by advertising that it possesses the shortest route in the network. It thus creates a symbolic black hole with the malicious node or adversary at the centre. The black hole drops all the packets that are received from other nodes. During this attack, a compromised node tries to pull the entire traffic from the surrounding nodes by providing false route information to its neighbourhood nodes, which diverts the entire traffic to the malicious node. A malicious node also advertises that it has high remaining energy; by doing so, the malevolent node is chosen as cluster head (CH) in every round. All nodes then transmit their packets to the malicious node because it acts as CH. The malicious node collects all the packets and does not send them to the BS.
The rest of the paper is organized as follows: Sect. 2 presents the literature corresponding to the presented work, Sect. 3 discusses black hole attack detection and prevention in WSN, Sect. 4 discusses the performance of the presented security solution, and finally, the paper is concluded in Sect. 5.
2 Literature Survey
Liu et al. [3] presented a new secure and trust routing system relying on active detection. This system achieved higher scalability, anticipation and successful routing security. This active trust system is able to sense nodal trust and even to stop doubtful nodes. In addition, the design is highly energy efficient; it utilizes residual energy to create multiple detection routes. The authors carried out a test run for results verification. Das et al. [4] presented an algorithm for dynamic formation of clusters and CHs based on the distance of nodes from the cluster node, using a genetic algorithm and sensor node trust. The cluster information is passed to each node, after which real-time routing takes place. Motamedi and Yazdani [5] proposed an unmanned aerial vehicle (UAV) to find black hole attacks in WSN. In a black hole attack, a malicious node advertises that the route to the destination through it is short and feasible, which can attract a huge amount of traffic and drop all the packets. Their scheme uses the UAV to validate nodes and uses the sequential probability ratio test model as a dynamic threshold mechanism to avoid malicious nodes.
Geethu and Mohammed [6] designed a novel multipath transmission system. This method is used as a protection approach against selective forwarding attacks. In this system, during routing, when a node senses that a packet has been dropped, that packet is resent through an alternate node. Due to this resending method, the reliability of the routing mechanism is maximized. Satyajayant et al. [7] presented multiple BSs to improve data delivery in the presence of black hole attacks. However, these multiple BSs produce additional overhead and increase the memory and communication cost.
In addition, the strategic position of the black holes is not considered: a black hole region that is close to the base station captures all the packets with higher probability.
Tan et al. [8] presented a new model for achieving confidentiality in multi-hop code dissemination. In the multi-hop protocol, the authors integrated confidentiality as well as DoS-attack resistance. Based on Deluge, a state-of-the-art open-source code dissemination protocol for WSNs, they also provided a performance evaluation of this approach against the original Deluge and the current secure Deluge.
3 Black Hole Attack Detection and Prevention in WSN
The flow diagram of anomaly-based hierarchical intrusion detection for black hole attack detection and prevention in WSN is represented in Fig. 1. This system contains two major protocols, namely the data routing protocol and the active detection routing protocol.
Different types of attacks such as data type attacks, selective forwarding attacks and black hole attacks are detected and prevented using this system. First, the network is deployed by entering the number of nodes. Next, the user selects the source and destination. After the network is generated, all possible multiple paths are computed from source to destination. A detection packet (DP) is transmitted via each path. The DP consists of certain data, where the path length defines the number of hops to the destination.
If the destination receives the DP, then every node in the path transmits a feedback path (FP) to the source node.
Here, the threshold must be calculated for each path, and the path with the lowest
threshold is considered the safest path for routing data. To achieve this, each node
contains its own trust value, which is calculated as follows:
NodeTrust = Σ_{i=1}^{w} C_{A=B} B_A(t_i) / (B_A(t_i) · h_{wi}), for w ≠ 0 (1)
NodeTrust = 0, for w = 0 (2)
For every node, the distance between the node and destination would be calculated.
The threshold value is computed for every node by the equation which is as follows
X = Threshold_Node = Trust / Distance (3)
Fig. 1 Flow diagram of the presented system (network deployment; check whether the energy of the trusted path is sufficient, otherwise select another trusted path; check whether a node is selected as CH more than the maximum limit; stop)
Using this equation, the threshold will be calculated for every path.
Threshold_Path = Σ_{node=0}^{n} X (4)
The above formula is the sum of the thresholds for all the nodes in the path.
Finally, the path with the lowest threshold is adopted as the safest and most reliable
path for routing data.
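The per-node and per-path threshold computation of Eqs. (3) and (4), and the selection of the safest route, can be illustrated with a short sketch. The code below is only an illustrative sketch; the Node structure, its field names and the function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch of Eqs. (3)-(4): per-node threshold X = Trust / Distance,
# per-path threshold as the sum of X over all nodes, and selection of the
# path with the lowest threshold as the safest route. Field names are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    node_id: int
    trust: float             # trust value obtained from feedback (Eqs. (1)-(2))
    distance_to_dest: float  # distance between the node and the destination

def node_threshold(node: Node) -> float:
    # Eq. (3): X = Threshold_Node = Trust / Distance
    return node.trust / node.distance_to_dest

def path_threshold(path: List[Node]) -> float:
    # Eq. (4): Threshold_Path = sum of X over all nodes in the path
    return sum(node_threshold(n) for n in path)

def safest_path(paths: List[List[Node]]) -> List[Node]:
    # The path with the lowest threshold is adopted for routing data
    return min(paths, key=path_threshold)
```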
After route selection, the data is transmitted via that path. Whenever a node receives the information packet, it verifies the information against the data of the packet. If the information does not match, it is determined that a data type attack took place at the previous node; in such cases, the node drops that packet and transmits the rest of the packets to the further nodes. In a selective forwarding attack, if the data size does not match, the node recovers the data from the previous node which was attacked by the attacker. Thus, the packet loss ratio is low in the presented system.
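As a rough illustration of the per-hop check described above, the sketch below verifies a received packet against the expected data description and flags a data type or selective forwarding attack. The packet fields and return codes are assumptions made for the example, not definitions from the paper.

```python
# Hypothetical per-hop packet check: a data type mismatch indicates a data type
# attack at the previous node (drop the packet); a data size mismatch indicates
# selective forwarding (recover the data from the previous node).
def verify_packet(packet: dict, expected_type: str, expected_size: int) -> str:
    if packet.get("data_type") != expected_type:
        # Data type attack suspected at the previous hop: drop this packet;
        # the remaining packets are still forwarded to the next node.
        return "drop"
    if packet.get("data_size") != expected_size:
        # Selective forwarding suspected: request the data again from the
        # previous (attacked) node before forwarding.
        return "recover_from_previous"
    return "forward"
```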
In each cluster of sensor nodes, one node is selected as a local base station for a fixed time duration and acts as the cluster head for that particular cluster. A sensor node sends its sensed information only to its related CH. Since all the members of a cluster communicate with a single node, their CH, the CH requires more transmission and computation than the sensor node members. The LEACH protocol rotates the CH role randomly among the sensor nodes to avoid rapid death of the cluster head. Thus, the energy of all sensor nodes is consumed equally, and the alive time of the network is increased. Using a local data fusion technique at each cluster head, compressed data is transmitted to the BS by each CH. The CH selection is based on an energy probability distribution in which the CH nodes broadcast their status as CH to all sensor nodes in the sensing network, so that every member node knows the cluster head to which it belongs. Cluster formation is done as per the signal strength. The LEACH protocol is used to observe how many times a specific node has become CH over the entire duration. If a CH is repeated more than the maximum limit, then the network is under a black hole attack and the BS transmits an alert packet to all the sensor nodes; otherwise, data transmission is done successfully across the network.
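The LEACH-based detection rule described above (a node repeatedly becoming CH beyond a maximum limit signals a black hole) can be sketched as follows; the counter structure and the MAX_CH_REPEAT limit are illustrative assumptions, not values from the paper.

```python
from collections import Counter

MAX_CH_REPEAT = 3          # assumed maximum allowed CH repetitions per node

def detect_black_hole(ch_history: list) -> list:
    """Return the node ids whose cluster-head count exceeds the limit.

    ch_history is the sequence of cluster-head node ids chosen in each
    LEACH round over the observation period.
    """
    counts = Counter(ch_history)
    return [node_id for node_id, times in counts.items() if times > MAX_CH_REPEAT]

# Example: node 7 keeps advertising high residual energy and wins CH repeatedly
suspects = detect_black_hole([1, 7, 4, 7, 2, 7, 7, 5])
if suspects:
    print("Black hole suspected; BS broadcasts an alert for nodes:", suspects)
```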
4 Result Analysis
a minimum, and limited power resources have reached the end of their life. The energy consumption for sending a k-bit message at distance d is defined in terms of the following quantities: E_Tx is the total energy consumed during the transmission of data, ∈_amp is the energy of the amplifier, d indicates the distance, k is the message length in bits and E_elec is the transmitter electronics energy.
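The energy model referred to here is, presumably, the first-order radio model commonly used with LEACH; a hedged reconstruction consistent with the listed symbols (an assumption, not a formula quoted from the paper) is:

```latex
% Assumed first-order radio model consistent with the listed symbols
E_{Tx}(k, d) = E_{elec}\cdot k + \epsilon_{amp}\cdot k\cdot d^{2}
```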
Throughput: The network throughput is the rate of successfully delivered messages over the communication channel, measured in bits/second. Packets are still successfully delivered during flooding and selective forwarding attacks, because the former floods only undesired packets, so the desirable packets are successfully delivered, and in the latter case only certain packets are dropped, so the throughput is not much affected compared to the black hole attack, where all the packets are dropped.
[Energy consumption graph (energy in Joules): comparison of Black Hole Attack Detection using UAV and the HID-based Black Hole Attack Detection method]
[Comparison over network sizes 20–100: Black Hole Attack Detection using UAV versus the HID-based Black Hole Attack Detection method]
the data is transmitted via another trusted path; hence, in this manner, the entire data reaches the destination. Therefore, from the results, it is clear that the described model detects and prevents black hole attacks more efficiently than previous models.
5 Conclusion
In this paper, anomaly-based hierarchical intrusion detection for black hole attack detection and prevention in WSN is described. One of the most challenging issues in WSN is security. For detecting and preventing black hole attacks, this system utilizes an updated active trust model and a data routing method with data type checking at the time of routing. A modified low-energy adaptive clustering hierarchy (LEACH) protocol is used for black hole attack simulation on the WSN. The impact of black hole attacks is analyzed using the parameters PDR, throughput and energy consumption. Comparative analysis between the 'HID-based Black Hole Attack Detection method' and 'Black Hole Attack Detection using UAV' resulted in minimum energy consumption, high throughput and high PDR, which indicates the great efficiency of the HID-based black hole attack detection model.
References
1. Abdul-Wahab Y, Alhassan A-B, Salifu A-M (2020) Extending the lifespan of wireless sensor
networks: a survey of LEACH and non-LEACH routing protocols. International Journal of
Computer Applications 975:8887
2. Sikora M, Fujdiak R, Kuchar K, Holasova E, Misurec J (2021) Generator of slow denial-of-
service cyber attacks. Sensors 21(16):5473
3. Liu Y, Dong M, Ota K, Liu A (2016) ActiveTrust: secure and trustable routing in wireless sensor
networks. IEEE Trans Inf Forensics Secur 11(9):2013–2027
4. Das S, Barani S, Wagh S, Sonavane SS (2016) Energy efficient and trustable routing protocol for
wireless sensor networks based on genetic algorithm (E2TRP). In: 2016 international conference
on automatic control and dynamic optimization techniques (ICACDOT), Pune, pp 154–159
5. Motamedi M, Yazdani N (2015) Detection of black hole attack in wireless sensor network using
UAV. In: 2015 7th conference on information and knowledge technology (IKT), Urmia, pp 1–5
6. Geethu PC, Mohammed AR (2013) Defense mechanism against selective forwarding attack in
wireless sensor networks. In: 2013 fourth international conference on computing, communica-
tions and networking technologies (ICCCNT), Tiruchengode, pp 1–4
7. Satyajayant M, Kabi B, Guoliang X (2011) BAMBi: blackhole attacks mitigation with multiple
base stations in wireless sensor networks. In: IEEE ICC proceedings
8. Tan H, Ostry D, Zic J, Jha S (2009) A confidential and DoS-resistant multi-hop code dissemination protocol for wireless sensor networks. In: ACM WiSec'09, Zurich, Switzerland, 16–18 March 2009
A Reliable Novel Approach of Bio-Image
Processing—Age and Gender Prediction
Abstract Image processing has many applications in its field. With the advancement of deep learning, many researchers have experimented with recognition of various facial traits. One of the best applications is age prediction: using various location points on the face, the age is predicted from the face, and similarly the gender. Age and gender prediction allows us to predict age and gender from a texture image or real-time video. An important application of age and gender prediction is in biometrics, which is used for security purposes. This paper presents the results of a gender prediction and age estimation system based on convolutional neural networks, extracting features from a given input image and performing recognition by taking a large data set and dividing it into training data (80%) and testing data (20%). The proposed system can obtain accurate results by taking large sets of training data. The proposed method uses the ResNet architecture with facial point identification to classify the age group and gender of the input subject. The experimentation achieved an accuracy of 84% in predicting the age and 71% in predicting the gender.
1 Introduction
Biometrics is used to analyze the characteristics of each individual for identification. Age and gender prediction is mainly used in biometrics for security purposes, where gender prediction and age estimation are done from a facial image or a real-time video. Face recognition has been one of the most interesting and important tasks in predicting age and gender from face images. Many techniques have been applied for gender prediction from face images; over the last few years, convolutional neural networks in deep learning have been used, which have a powerful ability to estimate and extract
features from the given input image or real-time video and obtain accurate results. The main aim is to develop intelligent systems which are able to learn efficiently and recognize objects.
The proposed system splits the data into training data and test data, applies a sequential model and tests the predictions. In this machine learning project, we train convolutional neural networks to predict age and gender. With the increase in social networks and social media, automatic age classification in social interaction has become a concern; the most fundamental facial qualities are age and gender. TensorFlow, an open-source library, is used for math, data flow and specific machine learning applications. A convolutional neural network is a deep learning algorithm which takes input images, considers different aspects and can differentiate one image from another [1–6]. Convolutional networks take less processing power compared to other algorithms. The prediction algorithm that is implemented works in such a way that the model is able to predict age and gender.
2 Literature Survey
The proposed scheme aims to fill the gap in automatic age and gender prediction. We first introduce the basic structure of the CNN, then describe the ResNet model for training data to classify gender and age; the result is then obtained from these data using the trained model. The primary aim of the proposed system is to recognize the gender and age from human face images. Extraction of features from face images using a set of facial features is an important part of this method in real-time applications. Figure 1 explains the classification of age and gender using the ResNet model. Binary classification is used for gender prediction, as we need to classify into two groups. Multi-class classification and regression models are used for the age classification techniques.
The proposed architecture is shown in Fig. 1. The model is trained using the ResNet architecture, resulting in a deep neural network with 50 layers trained on the FGNet dataset. The skip connections bypass the training of a few layers and connect directly to the output. Hence, if H(x) is the initial mapping for the network to fit, the skip connection gives H(x) := F(x) + x, as in Eq. (1). The steps of the proposed method are explained in the following subsections.
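The skip connection H(x) = F(x) + x described above can be sketched in Keras as a minimal residual block; the layer sizes and activation choices are illustrative assumptions, not the exact configuration of the 50-layer ResNet used in this work.

```python
# Minimal sketch of a residual (skip-connection) block, H(x) = F(x) + x.
# Assumes the input tensor already has `filters` channels so Add() is valid.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                        # identity branch, x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)    # F(x)
    y = layers.Add()([shortcut, y])                     # H(x) = F(x) + x
    return layers.Activation("relu")(y)
```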
The training and testing accuracies for the two networks are compared against the number of training examples and training hours. Expressing accuracy per epoch enables one to assess how quickly the network learns as data is submitted to it, whereas representation in terms of time allows one to evaluate how quickly the network learns as it is trained. The factor to be optimized is reflected in terms of training time. Figure 1 shows that the increase in test accuracy ceases a few phases after the last modification of the learning rate. As a result, we consider both systems to be fully trained at the conclusion of 30 epochs, resulting in top test accuracies of 69.09% (epoch 5) for the targeted system and 89.75% (epoch 30) for the reference network.
The model is built using the CNN mobilenet_1.0 model, depicted in Fig. 2. It is used to extract the face area from the background, because the background can be confusing and cause failure to recognize the expressions. It involves segmentation and extraction of facial features from uncontrolled backgrounds.
Face extraction plays an important role in gender object detection. It includes the shapes, color, texture and movements of the facial image. It also reduces the information of the image, which requires less storage. The geometric separation of two reference points is used. Following the identification of the eye centers, 11 correlated points are obtained from the provided face input picture. The crucial points identified are three locations from each eye, as well as the lateral endpoints placed on the face, the nose's vertical midpoint, the lip's midpoint and two points on the lateral ends of the lips. This procedure works when the face is frontal, color pictures and consistent lighting are used, and the sample image is either neutral or smiling.
The ResNet model is trained, the given input image is passed to the trained model, and then performance evaluation is done to obtain the output age and gender.
3 Results
The proposed model is built using the ResNet architecture with a new kernel obtained from Eq. (2). With the filter size and pooling layers, the proposed system uses convolution layers to evaluate the impact of the CNN depth and filter size on gender prediction. The dataset is given as input and divided into train data (80%) and test data (20%); if more training data is given, there are better chances of good accuracy. Hence, the proposed system uses its own dataset obtained from the university and the existing UTK dataset for comparison. First, the system is trained using the train data. Then the input image, from which age and gender should be predicted, is pre-processed, given to the facial model and then to the final model, after which the results are obtained. During training, image processing is also performed, the CNN algorithm is applied to extract features and perform classification, and the result is given to the final model.
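The 80/20 division of the data set into training and test portions, as used above, can be expressed with scikit-learn; the arrays below are placeholders standing in for the face images and their labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the face images and their labels
images = np.random.rand(100, 64, 64, 3)      # hypothetical image array
labels = np.random.randint(0, 2, size=100)   # hypothetical gender labels

# 80% of the samples are used for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.20, random_state=42
)
print(X_train.shape, X_test.shape)   # (80, 64, 64, 3) (20, 64, 64, 3)
```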
Preferably, the proposed method would be able to completely train a small network, upsize its kernels and instantaneously attain the target network's test accuracy. However, the drop in learning rate and the elimination of weight decay led to a rise in overfitting, imposing certain limits on this basic technique.
The proposed system is executed on a GPU-based system with the TensorFlow package of Python. In order to map to a logistic regression model, the proposed system defined a classification threshold of 0.5. For each step of the threshold, the accuracy, F1-score, recall, precision, false prediction rate and true prediction rate were analyzed. The values are tabulated in Table 1. At a threshold of 0.5, the system shows better accuracy compared with the other threshold values. The accuracy, precision and recall were calculated using the true positive, true negative, false positive and false negative values obtained from the confusion matrix while training the dataset. The corresponding accuracy, precision and recall are noted in Table 2.
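The metrics reported in Tables 1 and 2 are derived from the confusion matrix at the 0.5 threshold; a hedged sketch of that computation (with made-up predictions, not the paper's data) is shown below.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth gender labels and predicted probabilities
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3])

y_pred = (y_prob >= 0.5).astype(int)          # classification threshold of 0.5
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1_score)
```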
It is a noticeable fact that a large number of datasets are available publicly. Among them, MORPH-II and FG-Net are widely used datasets. The proposed
method experimented on FG-Net and obtained the results with respect to FACES. The proposed system evaluated the gender and age for a loss value of 0.59 in the model. The proposed system was also checked using the Python TensorFlow library on a CPU-based system.
Resultant images are shown in Fig. 3. Figure 3a, b, c, d show predictions of age and gender. Using multi-class classification, the results are shown in Fig. 3e, f, g, h. The proposed system is executed using input video and also on the images of the test set. The proposed system has shown an accuracy of 88.9% at epoch 24. Accuracy for the training dataset is a little less compared with the trained set, at 69%. Although the difference is too tiny to be called an improvement (0.11%), it does demonstrate that the upper bound is reachable with the suggested strategy, while eliminating 30.7 h of training is a great improvement over the existing state-of-the-art systems by 11.41% (Fig. 4).
4 Conclusion
CNN can be used to provide improved results for age and gender classification even when considering much smaller sets of unconstrained images labeled for age and gender. The simplicity of the model implies that a more elaborate system using more or larger training data may be capable of further improving the results and gender accuracy. A regression model could be used for age and gender prediction instead of classification if enough data is available. The main conclusion that can be drawn is that age and gender recognition from faces is very popular in research and can be used in social networks and advertising; to implement an intelligent system that achieves good and robust recognition accuracy, a deep learning algorithm, the convolutional neural network, is used to study various ResNet models for gender classification, trained on well-known datasets, and an efficient model is then applied for age estimation.
Fig. 4 Accuracy plot obtained for both the original image with 231 × 231 resolution and its pre-trained image of 147 × 147 resolution
References
1. Fu Y, Guo G, Huang TS (2010) Age synthesis and estimation via faces: a survey. IEEE Trans
Pattern Anal Mach Intell 32(11):1955–1976
2. Dhimar T, Mistree K (2016) Feature extraction for facial age estimation: a survey. In:
2016 international conference on wireless communications, signal processing and networking
(WiSPNET). IEEE, pp 2243–2248
3. Dantcheva A, Elia P, Ross A (2015) What else does your biometric data reveal? A survey on
soft biometrics. IEEE Trans Inf Forensics Secur 11(3):441–467
4. Fu S, He H, Hou Z-G (2014) Learning race from face: a survey. IEEE Trans Pattern Anal Mach
Intell 36(12):2483–2509
5. Zafeiriou S, Zhang C, Zhang Z (2015) A survey on face detection in the wild: past, present and
future. Comput Vis Image Underst 138:1–24
336 A. Swathi
6. Ng C-B, Tay Y-H, Goi B-M (2015) A review of facial gender recognition. Pattern Anal Appl
18(4):739–755
7. Sariyanidi E, Gunes H, Cavallaro A (2014) Automatic analysis of facial affect: a survey of
registration, representation, and recognition. IEEE Trans Pattern Anal Mach Intell 37(6):1113–
1133
8. Ding C, Tao D (2016) A comprehensive survey on pose-invariant face recognition. ACM Trans
Intell Syst Technol (TIST) 7(3):1–42
9. Wu Y, Ji Q (2019) Facial landmark detection: a literature survey. Int J Comput Vision
127(2):115–142
10. Savchenko AV (2019) Efficient facial representations for age, gender and identity recognition
in organizing photo albums using multi-output ConvNet. Peer J Computer Science 5:e197
11. Gowroju S, Kumar S (2020) Robust deep learning technique: U-Net architecture for pupil
segmentation. In: 2020 11th IEEE annual information technology, electronics and mobile
communication conference (IEMCON). IEEE, pp 0609–0613
12. Swathi A, Kumar S (2021) A smart application to detect pupil for small dataset with low
illumination. Innovations Syst Softw Eng 17(1):29–43
13. Swathi A, Kumar S (2021) Review on pupil segmentation using cnn-region of interest. In:
Intelligent communication and automation systems. CRC Press, pp 157–168
14. Gowroju S, Kumar S (2021) Robust pupil segmentation using UNET and morphological image
processing. In: 2021 international mobile, intelligent, and ubiquitous computing conference
(MIUCC). IEEE, pp 105–109
15. Gowroju S, Aarti KS (2022) Review on secure traditional and machine learning algorithms for
age prediction using IRIS image. Multimed Tools Appl. https://doi.org/10.1007/s11042-022-
13355-4
Restoration and Deblurring the Images
by Using Blind Convolution Method
Abstract Because of camera shake or motion, blurring is introduced into digital photographs. Another reason for blurriness of the image is low shutter speed or background light intensity. Because of this, important information in the image is significantly degraded. There are different techniques to deblur these affected images. One of the techniques is blind image deblurring, which works even in cases of little information or unavailability of the point spread function (PSF). Once the PSF is estimated, deconvolving the blurred image is simple with the help of any deblurring filter. The proposed deblurring process can be used even when there is no information about the blur type. With the help of the estimated PSF, the deblurred image is re-blurred. Then, the quality of the deblurred image is calculated by the peak signal-to-noise ratio (PSNR) between the re-blurred image and the original blurred image. The deblurred images contain noise, which is produced by the deblurring filters. Every iteration of this method uses the Richardson-Lucy algorithm along with blurred image computation, and the image is restored using the PSF.
1 Introduction
With the fast growth of modern digital technology, the use of digital images as information carriers has attracted people's attention. Digital images are used in various areas, such as medical, military and transportation applications, microscopy imaging and photography deblurring [1]. The recorded image consists of a noisy and blurred version of the original picture [2].
Different techniques are used to identify colors and shades in pictures that are not really recognized by the human eye. A huge amount of information is conveyed through a single image, more than many words could convey. The main aim of image capturing is that the captured image cannot be differentiated from the original or real image. However, sometimes images are affected by interference or disturbance in the form of blurriness, and the actual information in the image is disturbed. Outside interference or the camera's physical properties can result in the occurrence of disturbance in the original image.
Camera or object movement is the main reason for image blurring at capture time, along with using wide-angle lenses, long exposure times, etc. The process of recovering the real image from the corresponding blurred image is called image restoration [3, 4], and it is widely used in image processing technology [5]. Thus, the original image is retrieved from the distorted image by this image restoration process. In many situations, the process of eliminating blur from the image is difficult and can cause great damage to the original image.
In general terms, deviations in image sharpness and contrast are called blurring. Image restoration is the best solution for this type of problem in images. The blur can be eliminated from distorted images by using the process of image deblurring. In this process, sharpness is given to the degraded image with a clear appearance. The noise function and degradation function express the blurred image in the degradation model of the image [6].
The process of recovering an image disturbed by linear degradation is called image deblurring, commonly known as an inverse problem [7]. The first challenge with blurred images is the approximation of the blur kernel [8] and point spread function (PSF), because estimation of blur kernels in the blurred image is very difficult. If there is a dynamic scene or camera rotation in the image, then noise estimation is very hard because the blur is spatially variant. The second challenge is noise elimination from the blurred image in order to get a noise-free image. The noise attenuates high-frequency information from the scene and averages neighboring pixels. The sharpness of the image is estimated by blind motion deblurring from the blurred image [9]. The image blurriness is defined in this model as
B = K ∗ S + n (1)
where the blurry image is denoted by B, the blur kernel by K, the noise by n and the latent sharp image by S. In the case of blind motion deblurring, the blur kernel value is unknown; therefore, the blur kernel and the latent sharp image have to be calculated for the given image B.
2 Literature Survey
Optical aberration, atmospheric scattering, sensor spatial and temporal integration and lens defocus are different sources of blurred images. These mechanisms are only partially understood by humans, although visual systems recognize blur easily; therefore, blur estimation in images is very difficult. Inaccurate focusing of the camera and movement of the camera result in blur in the image. An aperture can cause a shallow depth of field, which results in blur and subsequently a non-sharp image. The blind deconvolution algorithm gives effective results even when there is no information regarding the noise or blur of the image.
The blurring degradation is present in the form of space-invariant or space-variant blur. Two types of image deblurring methods exist: blind and non-blind. In the blind type, the blurring operator is an unknown factor, whereas it is a known factor in the non-blind type. Blurring refers to image bandwidth reduction, which is caused by the imperfect image formation process. Relative motion between the original image and the camera can result in imperfect image formation. Recovering the image by blind image deconvolution is very difficult because little knowledge of the degrading PSF is available in this process. Therefore, the blind deconvolution algorithm performs point spread function restoration simultaneously. In each iteration, the Richardson-Lucy method is used. The improvement in the quality of image restoration is achieved with additional optical system characteristics, for example, the input parameters of the camera. The PSF constraints can be passed in a user-specified function.
Phase recovery is also called phase retrieval. Estimation of the phase component of k̂(ω) is required to recover the kernel k from its power spectrum |k̂|². However, this procedure only obtains the spectrum information; the phase information is still unknown because it iteratively switches between the Fourier and real-space domains. A unique solution is not guaranteed by the spatial constraints and the input |k̂|². A hybrid input–output method is used to estimate the blur kernel in an iterative phase retrieval procedure under appropriate frequency/spatial domain constraints. Therefore, based on the iterative phase retrieval algorithm, the blur kernel can be recovered, and the blurry image can then be deblurred through deconvolution.
As described above, n blur kernels can be obtained after iterating n times. The NSM value for deconvolution using each corresponding kernel can be calculated. It is obvious that a symmetric relationship exists among the blur kernels and that the estimated blur kernel for each iteration is different. Hence, the measure of blur kernel quality tests the symmetry of the blur kernel and gives it a score. In order to estimate the symmetric characteristic of the blur kernel, the NSM score has to be calculated twice. For example, if there are thirty kernels, the NSM will be calculated sixty times. After computing the NSM values, the smaller the NSM score, the better the reconstructed image.
Natural image signals are highly structured: pixels exhibit strong dependencies and contain important information about the structure of objects in the visual scene. In order to estimate the structural performance of the reconstructed image after deconvolution, we adopt the structural-similarity-based image quality measure (SSIM) instead of the mean squared error (MSE). The SSIM mainly computes the structural similarity between the reference and the distorted signals. However, one usually requires an overall image quality measure; a mean SSIM (MSSIM) derived from SSIM is used for this purpose, which shows good visual appearance with the best consistency.
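A hedged sketch of the SSIM/MSSIM comparison between a reference image and a reconstructed image is shown below, using scikit-image; the images here are synthetic placeholders rather than outputs of the presented method.

```python
import numpy as np
from skimage.metrics import structural_similarity

# Placeholder reference and reconstructed (deconvolved) images
reference = np.random.rand(128, 128)
reconstructed = np.clip(reference + 0.05 * np.random.rand(128, 128), 0.0, 1.0)

# structural_similarity returns the mean SSIM (MSSIM) over the image;
# with full=True it also returns the local SSIM map.
mssim, ssim_map = structural_similarity(
    reference, reconstructed, data_range=1.0, full=True
)
print("MSSIM:", mssim)
```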
3.1 Methodology
The blurred image is obtained due to various camera properties and movements. In this process, the blurred image is formed by convolving the real PSF (h) with the true image (f). The blurred image (g) is deblurred when it is passed through the restoration filter. This deblurred image estimates the true image with a candidate PSF (h′), which is extracted from the list of PSFs. When the real PSF (h) is the same as or similar to the candidate PSF, the blur in the re-blurred image (g′) matches that produced by the restoration filter, and less noise is produced by the restoration filter.
[Block diagram of the methodology: original image f, PSF h, Wiener filter, candidate PSF, deblurring image, PSF of image, reconstructed image]
The produced re-blurred image is similar to the real blurred image, and the peak signal-to-noise ratio (PSNR) value between the re-blurred and blurred images is measured. The point spread function (PSF) of the image is derived in the next step of this process. Then, an undersized PSF, with four times fewer pixels, is derived from the colored image. In the next step, this undersized color image is oversized to four times more pixels than the initial image. Finally, the initial PSF is obtained from the colored image with the same pixel size. The PSF images are then analyzed and stored.
Every iteration of this method uses the Richardson-Lucy algorithm along with blurred image computation. The input parameters of the camera are used as additional optical system characteristics, which improve the quality of image restoration. The PSF constraints are passed in a user-specified function. The definition of the blind deblurring method is represented through the following equation.
The blur in images is removed or eliminated by the Wiener filter, which is one of the most important techniques. The blur is formed because of unfocused optics or linear motion. Linear motion of the photograph results in poor sampling, and from this signal-processing standpoint blurring is also introduced. A pixel of the digital representation of a photograph represents the intensity of a single stationary point in front of the camera. If the camera is in motion with a slow shutter speed, then the pixel intensity is an amalgam of the points along the camera's motion line.
The Wiener filter is given by
H ∗ (m, n)
G(u, v) = (3)
[H (m, n)]2 + NSR
where NSR represents the noise-to-signal ratio. To obtain optimal results, the NSR parameter is adjusted when the original signal is unknown. Noise is completely eliminated when the NSR value is high, and the deblurred image is extensively smoothed. On the other hand, image sharpness is improved with a lower NSR value, and less noise is present in this case.
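Equation (3) can be applied in the frequency domain as sketched below. This is a minimal FFT-based Wiener deconvolution written for illustration, assuming a known blur kernel and a scalar NSR value; it is not the exact implementation used by the authors.

```python
import numpy as np

def wiener_deblur(blurred, psf, nsr=0.01):
    """Frequency-domain Wiener filter: G = H* / (|H|^2 + NSR), then F_hat = G * B."""
    H = np.fft.fft2(psf, s=blurred.shape)        # OTF of the (zero-padded) PSF
    B = np.fft.fft2(blurred)
    G = np.conj(H) / (np.abs(H) ** 2 + nsr)      # Eq. (3)
    return np.real(np.fft.ifft2(G * B))

# Example with a simple 5x5 box (motion-like) PSF and a random test image
psf = np.ones((5, 5)) / 25.0
image = np.random.rand(128, 128)
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf, s=image.shape)))
print(wiener_deblur(blurred, psf).shape)
```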
The point spread function is defined as the degree of blur or spread of a point of light in any optical system. The Fourier transform of the point spread function (PSF) is the optical transfer function (OTF), which is a frequency-domain function. The impulse response of a linear, shift-invariant system is defined by the OTF; conversely, the inverse Fourier transform of the OTF is the PSF. The point spread function (PSF) is given by the light emission pattern that is diffracted from a point source. The PSF is one of the fundamental units of an image. Blurring can be given by a convolution integral represented as
g(r) = ∫ h(r, s) f(s) ds (4)
where h(r, s) denotes the point spread function for the image position r, and f(s) is the object brightness distribution. The equation can be simplified by using the same coordinates for r and s.
The centered PSF is given by
The response of an imaging system to a point object or point source is explained by the point spread function (PSF), which is a representation of the system's impulse response.
The restoration of blurred images can be done by the Richardson-Lucy (R-L) algorithm, a widely used method. Image deblurring and restoration are very compatible with the R-L method because of its many desirable characteristics. The R-L method adopts Poisson statistics to obtain the best probability solutions for its data. The images are restored as non-negative at the local and global iterations, and flux is conserved by the R-L method. Strong characteristics are gained in the point spread function (PSF) of the restored images. Certain calculations are required for restoring the image in the R-L algorithm. The derivation of the R-L algorithm iterations follows naturally from the Poisson statistics equation.
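The Richardson-Lucy iteration can be sketched with a few lines of NumPy/SciPy. This is the textbook form of the update, not the damped variant mentioned later in the paper, and the initial guess, stopping criterion and iteration count are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(blurred, psf, num_iter=30, eps=1e-12):
    """Basic Richardson-Lucy deconvolution (non-negative, flux-preserving update)."""
    estimate = np.full_like(blurred, 0.5)          # flat non-negative initial guess
    psf_mirror = psf[::-1, ::-1]                   # flipped PSF for the correlation step
    for _ in range(num_iter):
        reblurred = fftconvolve(estimate, psf, mode="same")
        ratio = blurred / (reblurred + eps)        # compare data with current re-blur
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")
    return estimate

# Example usage on a synthetic blurred image
psf = np.ones((5, 5)) / 25.0
sharp = np.random.rand(128, 128)
blurred = fftconvolve(sharp, psf, mode="same")
restored = richardson_lucy(blurred, psf, num_iter=20)
print(restored.shape)
```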
4 Results
The results obtained from the deblurring process are close to the real images. This process includes images for the motion blur case and atmospheric turbulence. Figure 2a shows a blurred frame, and the corresponding image deblurred using the atmospheric turbulence PSF is also shown. The camera captures the observed scene, which is affected by atmospheric turbulence, considered as blur, because of fluctuations of the refractive index of the medium.
Fig. 2 a Blurred video frame, b deblurred using the estimated atmospheric turbulence PSF
The atmospheric turbulence blur OTF for long exposures under some conditions is given as
A Gaussian function can estimate the atmospheric turbulence blur for long exposures as
d(i, j; σ_G) = C exp(−(x² + y²) / (2σ_G²)) (8)
where σ_G² denotes the blur variance. Uniform blur is represented in Fig. 2a, so in this case the PSF is utilized. Figure 2b shows the deblurred image with high sharpness. The blur variance in this method is 0.79 (Fig. 3).
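Equation (8) can be realized as a small function that builds the Gaussian atmospheric-turbulence PSF; the grid size and the choice of the normalization constant C here are assumptions made for illustration.

```python
import numpy as np

def gaussian_turbulence_psf(size=15, sigma_g=0.89):
    """Gaussian long-exposure turbulence PSF, d(i, j; sigma_G) = C exp(-(x^2+y^2)/(2 sigma_G^2)).

    sigma_g**2 is the blur variance (a variance of 0.79 gives sigma_g ~ 0.89).
    C is chosen here so that the PSF sums to one (an assumption).
    """
    half = size // 2
    x, y = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    psf = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma_g ** 2))
    return psf / psf.sum()          # normalization constant C

print(gaussian_turbulence_psf().sum())   # ~1.0
```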
This proposed method can be used even when the distortion type is unknown, and it efficiently deblurs colored images. The PSF is used to analyze the image, after which the blind deconvolution process is done. Every iteration of this method uses the damped Richardson-Lucy algorithm. Colored images are also deblurred by this method, and the image is handled by the point spread function. The undersized color image is oversized to four times more pixels than the initial image. Finally, the initial PSF is obtained from the colored image with the same pixel size. The PSF constraints are passed in a user-specified function. Figure 4 shows the corresponding results.
The blurred image due to motion is depicted with an unreadable book name in Fig. 5a. In the figure, a man is holding the book and moving across the camera. In this process, the image is captured with an ordinary camera. The deblurred image with the estimated PSF is depicted in Fig. 5b. An angle of minus 4° with a length of 71 pixels is considered in the proposed PSF estimation approach. In the kurtosis-based scheme, a 1-degree angle and a 76-pixel length are considered for PSF estimation. The book title is readily readable.
The performance evaluation factors used for the comparison are the peak signal-to-noise ratio (PSNR) and the mean square error (MSE) with respect to the ground truth. The PSNR and MSE values for the reconstructed image according to the threshold values are given in Table 1. The results show that the PSNR value is maximum at threshold 0.2, and the corresponding MSE value is minimum at this threshold. The graphical representation of Table 1 is in Fig. 6.
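The MSE and PSNR used for Table 1 can be computed as sketched below, assuming 8-bit images with a peak value of 255; the arrays here are placeholders, not the paper's test images.

```python
import numpy as np

def mse(reference, restored):
    return np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)

def psnr(reference, restored, peak=255.0):
    error = mse(reference, restored)
    return float("inf") if error == 0 else 10.0 * np.log10(peak ** 2 / error)

# Placeholder ground-truth and reconstructed images
ground_truth = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
reconstructed = np.clip(ground_truth + np.random.randint(-5, 6, (128, 128)), 0, 255)
print("MSE:", mse(ground_truth, reconstructed), "PSNR:", psnr(ground_truth, reconstructed))
```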
Fig. 5 a Image under motion blur. b Deblurred with a PSF of length 71 pixels and an angle of minus 4°
5 Conclusion
A novel PSF-based method for deblurring images is discussed in this paper, which works even when the distortion type is unknown and efficiently deblurs colored images. The implementation of this process is very easy, and it works efficiently. Different types of blurring situations, such as motion blur or atmospheric turbulence, are applicable for this method. The ringing effect and noise in the deblurred image are eliminated by using various restoration filters, which significantly affects the characteristics of the image. Through this blind convolution method, blurred images are deblurred efficiently with less computational time. Images of moving objects captured by low-quality surveillance cameras are effectively restored with good visual quality through the algorithm presented in this paper. It can enhance the image quality shown in many kinds of viewing devices.
References
1. Li B, Cheng Y (2021) Image segmentation technology and its application in digital image
processing. In: 2021 IEEE Asia-Pacific conference on image processing, electronics and
computers (IPEC)
2. Lu X, Gu C, Zhang C, He Y (2021) Blur removal via blurred-noisy image pair. IEEE Trans
Image Process 30
3. Tao S, Dong W, Chen Y, Xu G (2021) Blind deconvolution for poissonian blurred image with
total variation and L0-norm gradient regularizations. IEEE Trans Image Process 30
4. Rajagopalan AN, Purohit K, Suin M (2021) Degradation aware approach to image restoration
using knowledge distillation. IEEE Journal of Selected Topics in Signal Process 15(2)
5. Tang M (2020) Image segmentation technology and its application in digital image processing.
In: 2020 international conference on advance in ambient computing and intelligence (ICAACI)
6. Chen J, Wu G, Wang W, Zeng L, Cai W (2020) Robust prior-based single image super resolution
under multiple Gaussian degradations. IEEE Access 8
7. Li H, Luo W, Zhang K, Ma L, Zhong Y, Liu W, Stenger B (2020) Deblurring by realistic blurring.
In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
8. Li Y, Zhang H, Zhang Z, Wu Y (2020) BID: an effective blind image deblurring scheme to
estimate the blur kernel for various scenarios. IEEE Access 8
9. Lee D, Seo D, Kim H, Cha D, Jung J (2019) Blind motion deblurring for satellite image using
convolutional neural network. In: 2019 digital image computing: Technique and Apps (DICTA)
Interpretation of Brain Tumour Using
Deep Learning Model
J. Avanija
Department of CSE, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh, India
e-mail: [email protected]
B. Ramji
Department of CSE (DS), CMR Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
A. Prabhu (B) · K. Maheswari · V. N. Kumar
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
K. Maheswari
e-mail: [email protected]
V. N. Kumar
e-mail: [email protected]
R. H. S. Vittal
Hyundai Mobis, Hyderabad, Telangana, India
e-mail: [email protected]
D. B. V. Jagannadham
Department of ECE, Gayatri Vidya Parishad College of Engineering, Madhurawada,
Visakhapatnam, India
e-mail: [email protected]
is a huge amount of data to assist. Extracting the tumour from the images becomes difficult. To overcome this drawback, the proposed method uses a convolutional neural network-based model using MobileNet for the detection of brain tumours from MRI images.
1 Introduction
Today, we live in an era where illnesses are on the rise, necessitating the advancement of treatment quality. Tumours are irregular bulges that can appear anywhere on the body and are one of the most hazardous illnesses. The most dangerous of all cancers is the brain tumour, which can develop in any area of the brain. It is primarily described as aberrant cell proliferation in the brain. These aberrant cells can cause damage to healthy brain cells, resulting in brain dysfunction. There are several distinct forms of brain tumours; these tumours can be either malignant (cancerous) or benign (not cancerous). Detecting a brain tumour and correctly identifying its kind is not an easy process. CNN [1] outperforms the competition due to its widespread application in image recognition. It is essentially a collection of neurons with weights that may be learned, and CNNs are also noted for their exceptional precision and performance. Because of the noise and abnormalities in the picture, human observation in predicting the tumour might be misleading. This drives our efforts to develop a tumour prediction algorithm, which contains methods for identifying tumours and categorising them as benign, malignant, or normal. A new horizon for radiology has opened up with the emergence of technologies to quantitatively analyse gliomas using computational methodologies. It is critical for radiologists to stay up to date on machine learning developments. The college of radiologists in New Zealand has recently updated its curriculum to include machine learning in the part I applied imaging technology examinations.
Quantitative analytic methods will complement the traditional visual study of pictures. This will enable statistical examination of characteristics that are not visible to the naked eye. Radiomics is rapidly evolving as a way of forecasting survival durations using imaging parameters such as the shape of a region of interest. With the advancement of these approaches, the necessity for automatic segmentation has grown. Inconsistencies in the first and second authors' blinded hand segmentation of brain tumours are considered. The Sørensen-Dice coefficient, which was determined using the StudierFenster calculator, is a measure of picture segmentation consistency. The result obtained from the first and second authors' segmentation was 0.91, demonstrating the disparity in hand segmentation. Convolutional neural networks (CNNs) [2, 3] work on the principle of the human brain and are a machine learning method. Machine learning is rapidly evolving, with increasing representation at major conferences. Radiologists require an educated viewpoint. This research
2 Related Work
This section focuses on the background analysis that was done in this domain. Owing to the variety and complexity of tumours, detecting MRI brain tumour pictures is a tough process. This study introduces two detection techniques: the first is edge detection and segmentation, and the second is artificial neural network proficiency. The proposed strategy for brain tumour identification and segmentation is more accurate and successful in this study [4]. First, while all interscale correlations were statistically significant, they were modest, indicating that the scales were measuring different aspects of the quality of life concept [5]. Due to the variety and complexity of tumours, detecting MRI brain tumour pictures is a tough process; this study introduces two detection techniques, the first being edge detection and segmentation and the second artificial neural network proficiency [6, 7]. The data set collected ePROs through the cancer clinics, which gave the monitoring of patient care a survey of the validated symptoms with 78 questions [8].
Patients who are diagnosed with cancer frequently experience uncertainty and a
lack of control over their circumstances, which has a poor impact on their health
outcomes. Patients’ quality of life is further harmed by cancer therapy. Patients
frequently rely on their doctors for social/interpersonal, informational, and deci-
sional support during their cancer experience. An increasing amount of evidence
suggests that doctors’ communication style has a favourable influence on patient
health outcomes. As a result, the patient–physician contact is extremely important in
the delivery of cancer care. It is great to see that cancer researchers are paying atten-
tion to research in this field, which is generally dominated by primary care studies. A
review of significant data tying physician conduct to cancer patient health outcomes
follows a discussion of several techniques to evaluate physician behaviour [9, 10].
Finally, the shortcomings of the existing work are mentioned, as well as opportunities for future research.
Alternative approaches have been used to diagnose brain tumours, including pre-trained models, different designs of convolutional neural networks, and ensemble models that combine many models. The existing methods had issues with noise such as light fluctuations, blurring, and occlusion, and some of the existing systems failed to identify tumours in real time due to limited data sets.
3 Proposed Method
The proposed system uses convolutional neural networks to diagnose brain tumours, handling the scalability of images through an architecture including an input layer, convolution layer, rectified linear unit (ReLU), pooling layer, and fully connected layer.
The architecture of the proposed approach is specified in Fig. 1. During the training phase, the images from the data set are pre-processed to remove noise and outliers. The next step is to extract the features from the pre-processed images and then perform classification of the images using a convolutional neural network.
A convolutional neural network is a deep learning neural network which is mainly used for image processing and classification. CNNs are feedforward networks in that information flow takes place in one direction only, from their inputs to their outputs. It is an algorithm which takes an image and is able to differentiate one image from another with minimal pre-processing compared to other classification algorithms. Automatic detection of features without any human supervision is the main advantage of CNNs compared to others. The CNN architecture [11, 12] is built using three types of layers: convolutional layer, pooling layer, and fully connected layer. A convolutional layer can be followed by additional convolutional and pooling layers, and the final layer is a fully connected layer. These layers are stacked together to form a deep model. The convolution layer divides the supplied input image into smaller parts. The ReLU [13] layer activates each element individually. The pooling layer is optional. The network architecture contains a fully connected layer to compute the scores for each class label, based on probability values ranging from 0 to 1.
The convolutional layer acts as a feature extractor to extract the features from the input image. It contains learnable filters called kernels, which are matrices of integers (trainable weights). The filter shifts by a stride throughout the image and performs a dot product with the portion of the image over which the filter is hovering in order to produce a feature map. Various categories of feature maps in the same layer of the convolutional network contain different weights, and at each location several features are extracted. In order to reduce the dimensionality of the feature maps by selecting the best features, a pooling layer is used. In the pooling layer, the pooling
operation sweeps the filter throughout the entire input, but it does not contain any weights like the convolution layer. The filter applies an aggregation function to the values in the respective fields and produces an output array. A fully connected layer [14] with a softmax or sigmoid activation function is used for image classification. The softmax activation function uses probability distributions to classify the images (Fig. 2).
The training of the convolutional neural network specified in Fig. 3 is divided into two stages, forward propagation and backpropagation. During forward propagation, the sample x and its label y are extracted, where x is the input given to the network and y is specified as a vector of dimension 7. The output of the previous layer is the input to the current one. An activation function is applied to calculate the output, which is passed to the layers at the lower level. At last, the model finds the output of the softmax layer. After completing the forward propagation, the error between the output y and the softmax layer is calculated and propagated back. Based on the error value, weight adjustment takes place. The MobileNet model is used, which works in the same way as the convolution network to apply the image filters, but the depth of the convolution varies from the normal representation. The rectified linear unit (ReLU) function is used, which has a derivative function and allows for backpropagation while simultaneously being computationally efficient. The neurons are only deactivated if the output of the linear transformation is less than 0.
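A minimal Keras sketch of a MobileNet-based classifier for the four MRI classes is given below; the input size, the frozen backbone, and the dense head are illustrative assumptions rather than the authors' exact configuration.

```python
# Hedged sketch: MobileNet backbone with a small classification head for the
# four classes (glioma, meningioma, pituitary, no tumour).
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False                      # assumed: backbone kept frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),   # ReLU as described above
    layers.Dense(4, activation="softmax"),  # softmax over the 4 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```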
4 Experiments
The data set used in the proposed system is brain tumour MRI scan images, an open-source data set publicly shared on Kaggle. The data set consists of 3264 images, out of which 2860 are used for training and the rest for testing. The images of both training and testing are divided into four classes: class-1 is glioma_tumour, class-2 is meningioma_tumour, class-3 is pituitary_tumour, and class-4 is no tumour.
The number of images used for training and testing, i.e., the training and test data description, is specified in Table 1.
A data set consisting of images is collected (in this case, the brain tumour MRI scan data set, an open-source data set of 3264 greyscale brain images, each labelled with one of four classes: glioma_tumour, meningioma_tumour, pituitary_tumour, and no tumour) [15]. Experimentation was carried out using Python libraries in Google Colab. The image data set is pre-processed using the ImageDataGenerator() function, and classification is performed using the CNN model with three layer types: convolution, pooling, and dense. Model fitting is performed by calling model.fit_generator() with the training data set as parameter and setting epochs to 35. This model is validated on the test data set during training. During training, the forward and backward propagation phases are performed on the pixel values. After the model is trained, evaluation of the model on the test data is performed.
The trained model predicts the classes for the test data. A test run of the system
is performed to remove defects before implementing the new system activity or
capability. Figure 4 shows the sample input and output images. Table 2 gives the
evaluation metrics considered to measure the performance of the model. Compar-
ison of various existing models along with proposed model is given in Table 3. The
proposed CNN-based MobileNet model shown accuracy of 96.6% which is better
when compared to other models as specified in Table 3.
Table 3 Comparison of existing methods

Features      Model      Accuracy (%)
Model based   CapsNet    86.56
Model based   CNN        84.19
CNN           NN         91.90
CNN           MobileNet  96.6
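A minimal sketch of the training pipeline just described (ImageDataGenerator pre-processing, 35 epochs, validation on the test set) is shown below; the directory names, image size, batch size, and the small stand-in classifier are assumptions, and model.fit() is used because recent Keras releases accept the generators that the paper passes to fit_generator().

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Any compiled four-class classifier will do; a tiny stand-in model is built here.
model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(4, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory('Training', target_size=(224, 224),
                                        batch_size=32, class_mode='categorical')
test_gen = datagen.flow_from_directory('Testing', target_size=(224, 224),
                                       batch_size=32, class_mode='categorical')

model.fit(train_gen, epochs=35, validation_data=test_gen)
model.evaluate(test_gen)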
5 Conclusion
The main aim of the proposed work is to detect brain tumours from a given data set of patients' MRI scan images. The proposed model uses a convolutional neural network with MobileNet to classify the images. The model performs well in distinguishing glioma tumour, meningioma tumour, pituitary tumour, and non-tumour scans among the selected MRI images. To categorise the tumours, image enhancement methods, a CNN model, and a softmax classifier were used, achieving an accuracy of 96.6%, which is better than existing methods. Future work is to identify an optimal deep learning network architecture and to extend the model to detect other types of tumours.
References
1. Vijayakumar T (2019) Neural network analysis for tumor investigation and cancer prediction.
Journal of Electronics 1(02): 89–98. https://doi.org/10.36548/jes.2019.2.004
2. Hassan M, DeRosa MC (2020) Recent advances in cancer early detection and diagnosis: role
of nucleic acid based aptasensors. TrAC, Trends Anal Chem 124:115806. https://doi.org/10.
1016/j.trac.2020.115806
3. Pandian P (2019) Identification and classification of cancer cells using capsule network with
pathological images. Journal of Artificial Intelligence and Capsule Networks 01(01): 37–44.
https://doi.org/10.36548/jaicn.2019.1.005
4. Siegel RL, Miller KD, Jemal A (2017) Cancer statistics, 2017. CA: A Cancer Journal for
Clinicians 67(1): 7–30. https://doi.org/10.3322/caac.21387
5. Razzak MI, Imran M, Xu G (2019) Efficient brain tumor segmentation with multiscale two-
pathway-group conventional neural networks. IEEE J Biomed Health Inform 23(5):1911–1919.
https://doi.org/10.1109/jbhi.2018.2874033
6. Khan HA, Jue W, Mushtaq M, Mushtaq MU (2020) Brain tumor classification in MRI image
using convolutional neural network. Math Biosci Eng 17(5):6203–6216
U. M. F. Dimlo
Department of CSE, Sreyas Institute of Engineering and Technology, Hyderabad, Telangana, India
e-mail: [email protected]
J. Narasimharao (B) · B. Laxmaiah · D. S. Rani · V. N. Kumar
Department of CSE, CMR Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
B. Laxmaiah
e-mail: [email protected]
D. S. Rani
e-mail: [email protected]
V. N. Kumar
e-mail: [email protected]
E. Srinath
Department of CSE, Keshav Memorial Institute of Technology, UGC Autonomous, Hyderabad,
Telangana, India
e-mail: [email protected]
Sandhyarani
Department of CSE (Data Science), CMR Technical Campus, Hyderabad, Telangana, India
e-mail: [email protected]
picture without ringing. Finally, the results are presented in terms of performance parameters such as signal-to-noise ratio (SNR), mean squared error (MSE), and peak signal-to-noise ratio (PSNR). The results show that the performance parameters of the improved blind deconvolution model are superior to those of existing image blur removal algorithms.
1 Introduction
Digital photographs are used in many fields, including medicine, the military, transportation, microscopy imaging, and image deblurring applications, among others. The recorded picture is a noisy and blurred version of the original picture, and images are affected by blurring and noise in various areas of applied science.
Blurring is a problem created by the imaging system (caused, for example, by diffraction, aberrations, etc.), whereas noise is part of the detection process. As a result, image deconvolution is essentially a post-processing of the recorded image with the goal of reducing blur and noise.
Convolution, which is frequently associated with the band-limited nature of acquisition technology, and contamination by additive Gaussian noise, which may be attributable to the electronics of the recording and transmission processes, are well-known sources of signal/image degradation in many practical situations. For example, the blur in remote sensing images is caused by the limited aperture of satellite cameras, the optical system, and mechanical vibrations. A blurred image is created by convolving a sharp image with a blur kernel or point spread function (PSF). To recover the crisp image, the blur kernel must first be separated from the sharp image; the difficulty, however, is the estimation of the blur kernel, and deconvolution with an unknown blur kernel is known as blind deconvolution.
These concepts are used by the majority of deblurring techniques. A data restoration process is frequently required before any further processing to remove these artifacts. Many papers have been written on the deconvolution of noisy signals [1]. Inverse problems of practical interest are often badly conditioned, so it is difficult to devise appropriate deconvolution methods. Deconvolution is a computationally intensive image processing technique that is widely used to improve digital image contrast and resolution [2]. Deconvolution is essentially a set of methods designed to remove blurring from an image; therefore, it is often recommended as a good choice for reducing the effects of visual blurring in captured images. In addition, image processing using a deconvolution technique offers an advantage in cases where images are captured through a pinhole aperture [3].
2 Literature Survey
By minimizing the average square error, i.e., by Wiener filtering, the blurry images are restored. Some authors have considered ringing issues and sought to minimize deconvolution artifacts. Liu et al. [10] devised a ringing detection method that builds a pyramid at various scales of the restored image and computes the gradient difference between each level of the pyramid. Such ringing detectors can only be used to assess the quality of deblurred images; they are not directly involved in producing deblurred images that are free of artifacts. The original ringing artifacts are eliminated by applying a residual multi-scale deconvolution approach with an edge-preserving bilateral filter and the traditional Richardson–Lucy (RL) algorithm.
This paper gives an improved blind blur-removal algorithm based primarily on dark channels, together with a bilateral filter shared with the original algorithm, to eliminate ringing and generate a deblurred image. The gradient's prior probability, for example, is effective in suppressing deterioration such as ringing; as a result, only gradient information is used to estimate the ideal image. The method compares the previously estimated ideal image with the image obtained using the gradient information and a bilateral filter. Figure 1 shows the steps used for improved blind deconvolution with the ringing removal process.
The pixels of remote sensing pictures are uniformly blurred due to the jitter and blur of the remote sensor. Mathematically, the blurred image is the convolution of the clear picture with the blur kernel plus noise, and it can be expressed as:

b = k ∗ x + n    (1)
The dark channel of a fog-free outdoor image has almost zero-valued pixels, and dark channels were previously applied to the image defogging problem. Intuitively, the blurring process replaces the values of very dark pixels with a weighted average of other, brighter pixels nearby, increasing the values of the very dark pixels. As a result, a dark channel prior can be used to exploit the dark channel's potential to favour sharper images. The dark channel of an image is defined as:
D(I)(x) = \min_{y \in N(x)} \min_{c \in \{r,g,b\}} I^{c}(y)    (2)
where x and y are pixel positions, N(x) is the image patch centred at x, and I^c is the cth colour channel. The dark channel is computed by taking the smallest red, green, and blue (RGB) component at each pixel, saving it in a greyscale image of the same size as the original, and then applying the minimum filter of Eq. (2); the size of the local patch determines the filter radius of the minimum filter.
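A small NumPy/SciPy sketch of the dark channel of Eq. (2) is given below, assuming an RGB image stored as an H × W × 3 array; the 15 × 15 neighbourhood N(x) is an illustrative assumption, not a value given in the paper.

import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch_size=15):
    """Per-pixel minimum over the RGB channels followed by a local minimum
    filter, realising D(I)(x) = min_{y in N(x)} min_c I^c(y) of Eq. (2)."""
    per_pixel_min = image.min(axis=2)                 # smallest of R, G, B at each pixel
    return minimum_filter(per_pixel_min, size=patch_size)

# The dark channel of a blurred image has fewer near-zero pixels, which is the
# property the dark-channel prior exploits during deblurring.
blurred = np.random.rand(128, 128, 3)
d = dark_channel(blurred)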
JBF[I, D]_m = \frac{1}{k_p} \sum_{n \in \Omega(x)} I_n \, f(m - n) \, g(D_m - D_n)    (3)
where f represents the spatial filter, g represents the range (distance) filter, I represents the input image, D represents the guide image, Ω represents the spatial support of the kernel f, and k_p represents the normalization coefficient.
The problem of blind image recovery can be divided into two parts: calculating the PSF from the degraded image (k-step) and the best image from the PSF (x-step). These two steps are alternated to repair the damaged image. This blind image reconstruction process employs a deconvolution-based fast reconstruction method. By regularizing the total variation, the x-step reduces the effects of noise and enhances edges with shock filters to obtain the ideal image. In the k-step, only the strong edge component of the gradient map R_map is retained. The process is then repeated as a series of PSF calculations using derivative thresholding of the estimated ideal image and the conjugate gradient method, and the PSF obtained from the iteration is used for the final deconvolution. For error detection and prevention, the energy value of Eq. (4) is calculated; as this value increases, the estimated PSF threshold changes, and when the objective function converges and the energy value decreases, the reconstruction is successful.
e = \frac{|b - x \ast k|^2}{w \times h}    (4)
where x ∗ k is the estimated blurred image and w and h are the horizontal and vertical pixel counts of the image.
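A sketch of the convergence check of Eq. (4) is shown below, where the residual energy between the observed blurred image b and the re-blurred estimate x ∗ k is normalised by the image size; fftconvolve with 'same' mode is used here as one convenient way to form x ∗ k.

import numpy as np
from scipy.signal import fftconvolve

def residual_energy(b, x, k):
    """e = |b - x * k|^2 / (w * h) as in Eq. (4); a decreasing e indicates that
    the PSF/ideal-image iteration is converging."""
    reblurred = fftconvolve(x, k, mode='same')
    h, w = b.shape
    return np.sum((b - reblurred) ** 2) / (w * h)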
The L_0-regularized prior of the image is used in this method to remove blur. Based on the prior probability of the pixel values and the prior probability of the gradients, L_0 regularization computes the image's prior probability P(x) as in (5), where P_t(x) represents the prior probability of the pixel values and P_t(∇x) represents the prior probability of the gradient. The gradient prior helps to control deterioration such as ringing. As a result, to reduce ringing, σ is set to 0 and the ideal image is estimated using only the gradient information, as represented by (6).
x = F^{-1}\!\left( \frac{\overline{F}(k)\,F(b) + \beta F(u) + \mu F_G}{\overline{F}(k)\,F(k) + \beta + \mu \overline{F}(\nabla)\,F(\nabla)} \right)    (6)
where u, β, and μ denote auxiliary variables, F(·) and F^{-1}(·) are the fast Fourier transform (FFT) and inverse FFT, respectively, and \overline{F}(·) denotes the complex conjugate operator; F_G is expressed by (7).
4 Results
The images were evaluated objectively for signal-to-noise ratio (SNR), mean squared
error (MSE), and peak signal-to-noise ratio (PSNR). SNR is a simple metric used to
assess the effectiveness of noise reduction techniques. Higher signal-to-noise ratios
are regarded as a sign of effective noise reduction. The SNR is given as
\mathrm{SNR\,(dB)} = 20 \log_{10}\!\left( \frac{\mathrm{RMS_{signal}}}{\mathrm{RMS_{noise}}} \right)    (8)
MSE is a metric used to assess denoising accuracy. Lower MSE values indicate that the denoised signal is more similar to the original signal, which is thought to result in better noise reduction. The MSE is given as
\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i, j) - K(i, j) \right]^2    (9)
PSNR is a metric similar to SNR, with higher values indicating more accurate noise reduction. The PSNR is given as

\mathrm{PSNR\,(dB)} = 10 \log_{10}\!\left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right)    (10)
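The three objective measures of Eqs. (8)–(10) can be computed directly in NumPy; the sketch below assumes greyscale images with values in [0, 255] and uses 255 as MAX_I.

import numpy as np

def mse(reference, restored):
    return np.mean((reference.astype(float) - restored.astype(float)) ** 2)

def snr_db(reference, restored):
    noise = reference.astype(float) - restored.astype(float)
    return 20 * np.log10(np.sqrt(np.mean(reference.astype(float) ** 2)) /
                         np.sqrt(np.mean(noise ** 2)))

def psnr_db(reference, restored, max_value=255.0):
    return 10 * np.log10(max_value ** 2 / mse(reference, restored))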
The results of the objective evaluation are presented in Figs. 2 and 3. Figure 2 shows the SNR and PSNR analysis of the image at the different steps of the improved blind deconvolution; the SNR and PSNR have higher values after blind deconvolution than at the known-PSF stage and for the input blur image. Figure 3 shows the MSE analysis of the image at the different steps; the MSE is lower after blind deconvolution than at the known-PSF stage and for the input blur image.
Here, the performance of the image ringing removal scheme is evaluated. A series
of blurry images was used for this purpose. Blurred images and their point spread
function (PSF) pairs are used. An image containing motion blur (handshake) was
captured by the camera. The PSF of the image was estimated using the blind decon-
volution approach. Blurred images were deblurred by applying an improved blind
deconvolution using a ringing removal process. The deblurring results are shown in
Fig. 4a–c. Input parameters for the deblurring algorithm such as rule weights and
smoothing factors are chosen in such a way that they do not produce overly sensitive
and cartoonish results. The deconvolution scheme, as seen in these figures, produces
ringing artifacts, as shown in Fig. 4a.
To identify the artifacts generated during the deconvolution stage, the blurred image was subjected to a ringing removal process, and a filter was used to remove the ringing artifacts. The Gaussian parameters were determined during the ringing detection step. Because of the symmetry of the PSF Fourier transform, half of the detected minimum points are ignored, which cuts down the number of filters needed during detection. Figure 4b illustrates the ringing artifact detection results, in which the detected ring mask is superimposed on the blurred image in yellow. The algorithm identifies almost all ringing areas in the blurred image. Blind deconvolution was used to estimate the PSF used in the image deblurring process.
5 Conclusion
This paper implemented an improved dark channel prior image deblurring method with blind deconvolution and restored the image with a ringing removal process that targets the ringing effect of image deblurring. Joint bilateral filtering is used during restoration to reduce ringing in the restored image, preserve edges more effectively, and enhance the restoration result. The performance parameters signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), and mean squared error (MSE) are calculated and displayed graphically. According to the simulation results, compared with non-blind deconvolution, the signal-to-noise ratio and peak signal-to-noise ratio are higher, indicating that more signal information is preserved, and the mean squared error is lower, indicating a smaller error. According to the experimental results, this algorithm effectively eliminates motion blur in an image, and the blind deconvolution technique performs better when reconstructing an image from an out-of-focus image.
References
1. Cheng L, Wei H (2020) An image deblurring method based on improved dark channel prior. J
Phys: Conf Ser 1627(1):012017
2. Xu X, Zheng H, Zhang F, Li H, Zhang M (2020) Poisson image restoration via transformed
network. Journal of Shanghai Jiaotong University (Science) 1–12
3. Kanwal N, Pérez-Bueno F, Schmidt A, Molina R, Engan K (2022) The devil is in the details:
whole slide image acquisition and processing for artifacts detection, color variation, and data
augmentation. A review. IEEE Access
4. Shamshad F, Ahmed A (2020) Class-specific blind deconvolutional phase retrieval under a
generative prior. arXiv preprint arXiv:2002.12578
5. Barani S, Poornapushpakala S, Subramoniam M, Vijayashree T, Sudheera K (2022) Analysis
on image restoration of ancient paintings. In: 2022 international conference on advances in
computing, communication and applied informatics (ACCAI). IEEE, pp 1–8
6. Sarbas CHS, Rahiman VA (2019) Deblurring of low light images using light-streak and
dark channel. In: 2019 4th international conference on electrical, electronics, communication,
computer technologies and optimization techniques (ICEECCOT). IEEE, pp 111–117
7. Wang H, Pan J, Su Z, Lianga S (2017) Blind image deblurring using elastic-net based rank
priors. In: Computer vision and image understanding, Elsevier, pp 157–171
8. Yang F-W, Lin HJ, H Chuang HJ (2017) Image deblurring, IEEE smart world, ubiqui-
tous intelligence and computing, advanced and trusted computed, scalable computing and
communications, cloud and big data computing, internet of people and smart city innovation
9. Marapareddy R (2017) Restoration of blurred images using wiener filtering. International
Journal of Electrical, Electronics and Data Communication
10. Liu Y, Wang J, Cho S, Finkelstein A, Rusinkiewicz S (2013) A no-reference metric for
evaluating the quality of motion deblurring. ACM Transactions on Graphics (SIGGRAPH
Asia)
A Review on Deep Learning Approaches
for Histopathology Breast Cancer
Classification
Abstract Deep learning (DL) is the most rapidly expanding field in the current scenario.
For image analysis and categorization, deep neural networks (DNNs) are presently
the most extensively utilized technology. DNN designs include GoogleNet, residual
networks, and AlexNet, among others. Breast cancer is seen as a major problem that
endangers the lives and health of women. Ultrasonography or MRI scanning methods
are used to diagnose breast cancer disease. Imaging methods used for diagnosis
include digital mammography, ultrasonography, magnetic resonance imaging, and
infrared thermography. The primary objective is to investigate different deep learning algorithms for recognizing breast cancer-affected images. The best models provide accuracy for the 2-class and 4-class classifications on cancer datasets. No previous research has been carried out for the current model investigation. Early detection and screening
are critical for effective therapy. The following is a synopsis of recent progress in
mammograms and identification, as well as a discussion of technological advance-
ments. An effective test result should meet the following requirements: performance,
sensitivity, specificity, precision, recall, and low cost. The experimental settings for
every study on breast cancer histopathology images are thoroughly reviewed and
deliberated in this article.
R. Kalavathi (B)
Research Scholar, Department of Computer Science and Engineering, Osmania University,
Hyderabad, India
e-mail: [email protected]
M. Swamy Das
Department of Computer Science and Engineering, Chaitanya Bharati Institute of Technology,
Hyderabad, India
e-mail: [email protected]
1 Introduction
As per the National Cancer Institute (NCI), women are facing breast cancer problems [1]. It is envisaged that all advanced cases of breast cancer should be recognized and treated in time. Histopathology plays a vital part in the diagnostic process; it is also used for differentiating between malignant and benign tissues and for separating them into in situ and invasive carcinoma [2].
Tissue samples are stained with Hematoxylin and Eosin (H&E), and pathologists then evaluate the samples using light microscopy. However, due to the complexity of the visible structures and the photographic estimation of the tissue microstructure, the overall assessment of the arrangement of cell centres in histological pictures takes time and is highly subjective. As a result, computer-assisted diagnostic methods that work automatically are critical for minimizing expert labor by enhancing diagnostic efficiency and reducing subjectivity in illness categorization [3]. Many approaches for object detection in medical diagnostics have been developed. Deep learning-based approaches have recently been proven to outperform traditional machine learning methods in several image analysis tasks, and most image processing using DL methods has shown promise in the detection of breast cancer [4–7].
The volume and size of medical datasets are continually rising, yet the majority of these data are not evaluated for important and hidden knowledge. Useful patterns and correlations can be discovered using powerful data mining algorithms [8].
Simulations derived from these approaches can help healthcare professionals
make sound judgments. Breast cancer is viewed as a severe danger to the health
and lives of women. Breast cancer is one of the most frequent kinds of cancer in
women all over the world [9]. Mammography produces high-quality pictures of the
breast’s interior architecture. Breast cancer can be detected on mammograms either through architectural deformities or through macrocalcifications. Mammography is very useful for detecting primary tumors, although architectural aberrations are less relevant than masses and MCs [10]. Various authors have
recently developed ML algorithms for diagnosing breast abnormalities in mammog-
raphy images. Singh and Gutte [11] developed an ensemble classification based on majority voting. On the Wisconsin breast cancer dataset (WBCD), several ML algorithms were applied to identify and classify the cancer data and were evaluated with an accuracy of 99.42%. Reference [12] employed image processing to eliminate
the pectoral muscle from the digital mammogram database for the mammographic
image analysis society (MIAS) [13] and the digital mammogram dream challenge
dataset [14]. The features were extracted and classified by the researchers using conventional and multiple classifiers based on statistical measures. The
maximum achievable accuracy was 99.7% [12, 13]. To categorize the MIAS dataset
samples, [15] employed Fourier analysis, PCA, and SVM. The achieved accuracy
was 92.16%. Furthermore, certain articles, such as Refs. [16–21], acknowledged
conventional CAD systems that utilized ML approaches.
Screening for breast cancer is done using breast self-examination (BSE) and clinical
breast examination (CBE). The sensitivity of CBE is 57.14%, and the specificity is
97.11% [22]. Although it cannot be used to identify cancer with certainty, it can be
used to detect worrisome breast lesions. Reference [23] discovered no difference in
breast cancer death tolls between those who were tested with BSE and CBE and
those who were not, despite the fact that persons who were screened had twice as many biopsies. Other studies indicate that many professors and healthcare
professionals, i.e., people who influence young women, are either uneducated or
unable to perform BSE properly [24]. In one study, 99% of nurses felt able to conduct
a BSE, but only 26% performed BSE every month [25]. The BSE and CBE methods are very useful in screening for cancer. The sensitivity and specificity of cancer screening are influenced by parameters such as age, HRTs, BMI, menstrual phase, and genetics [26, 27]. The research results from women who utilized HRTs provided a
mammographic specificity of around 91.7% [27]. Mammography is less sensitive in
women who have thick radiographic breasts. Sensitivity ranges from 62.9% in highly
dense-breasted women to 87% in extremely fatty-breasted women, while specificity
ranges from 89.1 to 96.9% [27].
Using the leukemia dataset, Ref. [28] employed the Bayesian model for feature
selection and then used ANN, KNN, and SVM classifiers. In 2004, the researchers
[29] employed uncorrelated linear discriminant analysis (ULDA) for feature selec-
tion and found that it outperformed previous approaches in terms of classifier accu-
racy. The authors [30] used SVM-RFE to choose features and a kernel-based fuzzy
technique to classify them. Reference [31] employed the subset information gain
strategy for feature selection in 2012, repeatedly gaining an informative gene subset
with the subset merge and split procedure. Reference [32] employed a discriminant
kernel-based classifier with ANOVA, a statistical technique for feature selection.
Reference [33] used slide photographs to diagnose metastatic breast cancer using
a deep learning technique. Reference [34] used deep belief networks to construct a
breast cancer classification model with 99.68% accuracy. Skin infections are a fairly prevalent type of infection, yet they are difficult to identify and forecast. To categorize skin illnesses, Ref. [35] presented a deep learning technique. Reference [36]
suggested a method for identifying and diagnosing cancer kinds based on unsuper-
vised feature learning. They employed deep learning to extract characteristics auto-
matically by merging different forms of cancer gene expression data. The majority of
the offered techniques see feature selection as a pre-classification activity. Reference
[37] suggested a hybrid method for feature selection that combines correlation and
optimization approaches. They tested their method on multi-class benchmark gene
expression cancer datasets including MLL, Lymphoma, and SRBCT. Reference [38]
developed an architecture for detecting and visualizing basal cell carcinoma. To
achieve balanced accuracy, they applied fivefold cross-validation procedures on the
BCC dataset.
The paper is organized as follows: Sect. 3 presents an outline of breast cancer, and the datasets, augmentation, preprocessing, and a few approaches are described in Sect. 4.
Although there are around 20 primary kinds of breast cancer, the bulk of them may be
divided into two histological classes: Invasive Ductal Carcinoma (IDC) and Invasive
Lobular Carcinoma (ILC) [38, 39]. Researchers are focusing more on IDC than on the other kinds of breast cancer. The various stages of breast cancer are seen in
the following Fig. 1 such as (a) normal duct, (b) usual ductal hyperplasia, (c) atypical
hyperplasia, (d) ductal carcinoma in situ, and (e) invasive cancer.
4.1 Databases
• Natural databases
– ImageNet
– Object-centric database
• Pathology datasets
– Cancer Metastases in Lymph Nodes (Camelyon)
– Breast Cancer Histopathological Image Classification (BreakHis)
– Bio-Image Semantic Query User Environment (BISQUE)
– Tissue Microarray (TMA) from Stanford
– Breast cancer histopathology (BACH)
Cropping, rotation, color change, flipping, translation, and intensity are data
augmentation procedures used in breast cancer histopathology.
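As a hedged illustration of those augmentation procedures (not tied to any one of the surveyed papers), a Keras ImageDataGenerator can apply rotation, flipping, translation, and intensity/colour perturbations; the specific ranges below are arbitrary examples.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=90,            # rotation
    width_shift_range=0.1,        # translation
    height_shift_range=0.1,
    horizontal_flip=True,         # flipping
    vertical_flip=True,
    zoom_range=0.1,               # approximates random cropping
    brightness_range=(0.8, 1.2),  # intensity change
    channel_shift_range=20.0,     # simple colour change
)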
4.3 Preprocessing
5 Comparative Analysis
This section compares previously published material on deep learning models for
histopathology pictures, as indicated in Tables 1 and 2.
6 Conclusion
In conclusion, this research shows that when models are analyzed at different resolutions, different results are obtained. According to this distinction, DL models are prone to low perseverance and high noise. As a result, dealing with high-resolution and high-quality breast cancer histopathology images is crucial. One of the challenges is gathering high-resolution photographs through cutting-edge scanners and the associated data storage. Consequently, researchers should study and investigate the performance of DL models after applying super-resolution techniques. Future research should focus on evaluating the performance of deep learning models when they are used to analyze pathology images.
References
1. Eastland TY (2017) Prostate cancer screening in the African American community: the female
impact
2. Tasnim Z, Shamrat FMJM, Islam MS, Rahman MT, Aronya BS, Muna JN, Billah MM
(2021) Classification of breast cancer cell images using multiple convolution neural network
architectures. International Journal of Advanced Computer Science and Applications 12(9)
22. Ratanachaikanont T (2005) Clinical breast examination and its relevance to the diagnosis of a
palpable breast lesion. J Med Assoc Thai 88(4):505–507
23. Kosters JP, Gotzsche PC (2003) Regular self-examination or clinical examination for early
detection of breast cancer. Cochrane Database of Systematic Reviews 2, Article ID CD003373
24. Amoah C, Somhlaba NZ, Addo F-M, Amoah VMK, Ansah EOA, Adjaottor ES, Amankwah
GB, Amoah B (2021) A preliminary psychometric assessment of the attitude of health trainee
undergraduate students towards breast-self examination in Ghana
25. Madubogwu CI, Madubogwu NU, Azuike EC (2021) Practice of breast self-examination among
female students of Chukwuemeka Odumegwu Ojukwu University, Awka. Journal of Health
Science Research 10–18
26. Hanis TM, Islam MA, Musa KI (2022) Diagnostic accuracy of machine learning models on
mammography in breast cancer classification: a meta-analysis. Diagnostics 12(7):1643
27. Sadovsky R (2003) Factors affecting the accuracy of mammography screening. Am Fam
Physician 68(6):1198
28. Dai X, Fu G, Reese R, Zhao S, Shang Z (2021) An approach of Bayesian variable selection for
ultrahigh dimensional multivariate regression. Stat e476
29. Wang Z, Sun X, Sun L, Qian X (2013) Tissue classification using efficient local fisher
discriminant analysis. Przegl˛ad Elektrotechniczny 89(3b):113–115
30. Hernandez JCH, Duval B, Hao J-K, A counting technique based on SVM-RFE for selection
and classification of microarray data. Advances in Computer Science and Engineering 99
31. Koul N, Manvi SS (2020) Ensemble feature selection from cancer gene expression data using
mutual information and recursive feature elimination. In: 2020 third international conference
on advances in electronics, computers and communications (ICAECC). IEEE, pp 1–6
32. Syafiandini AF, Wasito I, Mufidah R, Veritawati I, Budi I (2018) Prediction of breast cancer
recurrence using modified kernel based data integration model. Journal of Theoretical and
Applied Information Technology 96(16):5489–5498
33. Broadwater DR, Smith NE (2018) A fine-tuned inception v3 constitutional neural network
(CNN) architecture accurately distinguishes between benign and malignant breast histology.
59 MDW San Antonio United States
34. Dandil E, Selvi AO, Çevik KK, Yildirim MS, Süleyman UZUN (2021) A hybrid method based
on feature fusion for breast cancer classification using histopathological images. Avrupa Bilim
ve Teknoloji Dergisi 29:129–137
35. Liao H (2016) A deep learning approach to universal skin disease classification, CSC 400-
Graduate Problem Seminar-Project Report
36. Oh J (2020) Potential of disease prediction using deep learning algorithms. Science 5(4):283–
286
37. Namwongse P, Limpiyakorn Y (2012) Learning Bayesian network to explore connectivity of
risk factors in enterprise risk management. International Journal of Computer Science Issues
(IJCSI) 9(2):61
38. Zavareh PH, Safayari A, Bolhasani H (2021) BCNet: a deep convolutional neural network for
breast cancer grading. arXiv preprint arXiv:2107.05037
39. de Boo LW, Jóźwiak K, Joensuu H, Lindman H, Lauttia S, Opdam M, van Steenis C et al
(2022) Adjuvant capecitabine-containing chemotherapy benefit and homologous recombina-
tion deficiency in early-stage triple-negative breast cancer patients. British Journal of Cancer
126(10):1401–1409
IoT-Based Smart Agricultural
Monitoring System
Abstract Agriculture is critical to the Indian economy and people’s survival. The
intention of this work is to build an embedded-based soil surveillance system and to
assist farmers in identifying appropriate crops to plant on the soil. The pH value of
the soil, temperature, and humidity level in the air all have an impact on crop output.
Using the Node MCU ESP8266 and the ThingSpeak server, this architecture makes it possible to decrease physical field monitoring and to receive information on a mobile phone or laptop. The technique is designed to assist farmers in increasing their agricultural output. The soil is evaluated using a pH sensor, while the humidity and temperature values are collected using a DHT11 sensor. Depending on the values
sensed, these parameters are fed into a machine learning technique called decision
tree regression, which aids in accurately determining the crop that best suits the soil.
Farmers can plant the optimum crop for the soil type.
1 Introduction
Farming has been performed for centuries in every country. Agriculture is the science
and skill of growing plants. Agriculture was a pivotal event in the evolution of
sedentary human society. Agriculture was always done by hand. As the world moves
toward new technologies and applications, agriculture must keep up. The Internet of
Things (IoT) is essential in smart agriculture [1–8]. Sensors in the Internet of Things
can collect data on agricultural lands. We proposed a solution for automated IoT and
smart agriculture. Adequate soil moisture is required for proper plant structure and
high crop yields. Water acts not only as a moisture repellent, but also as a temperature
regulator in the plant. During the process of thermo-regulation, the plant evaporates
up to 99% of its total water content while using only 0.2–0.5% to build vegetable
weight. As a result, it is effortless to see how a plant’s humidity requirements vary
depending on the climate and growth stage. Whenever the IoT-based farm monitoring
system is activated, it runs a set of tests. A smart farm monitoring project based on
the IoT, Raspberry Pi, and Node MCU is presented to enhance the efficiency of crop
production and effectiveness. Agriculture provides a significant source of income
for India’s largest population and contributes significantly to the Indian economy.
Crop improvement has been minimal in the agricultural industry over the last decade.
Food prices have risen steadily as crop yields have declined. A variety of factors,
including water, contributed to this. The fundamental purpose of the Internet of
Things is to ensure that the appropriate information is sent to the appropriate persons
at the proper time. Hence, IoT integrated with agriculture gives an excellent solution, and adding the decision tree regression machine learning algorithm addresses this problem. Choosing the suitable crop for a soil is becoming more difficult for humans due to either atmospheric conditions or the instability of the soil's pH value, but by using the decision tree regression algorithm, it is quite easy for farmers to grow the crop that best matches the soil.
2 Types of Sensors
The “DHT11” is a temperature and humidity sensor. This sensor is widely used in many applications due to its accuracy and simple architecture. The DHT11 senses the humidity content in the air and the temperature. The sensor has a specialized NTC for temperature measurement as well as an 8-bit microcontroller that provides the temperature and humidity values as serial output data. After calibration, the sensor is ready to connect to other microcontrollers (Fig. 1).
Fig. 2 pH sensor
2.2 pH Sensor
3 Raspberry Pi 4, Model B
Raspberry Pi is a small computer that can run a variety of apps when connected to regular monitors and peripherals. Traditional desktop operations such as file creation, storage, and Internet streaming are available on Raspberry Pi models, which are barely larger than a credit card and include the required hardware components. The Raspberry Pi Foundation contributes to the Linux kernel and other open source developments, as well as providing open source software for its own products (Fig. 3).
Fig. 3 Raspberry Pi 4
Model B
The data can be transmitted via the Wi-Fi protocol utilizing the ESP8266-based Node MCU platform. The ESP8266 is a low-cost Wi-Fi communication module that may be used over a UART serial connection to add Wi-Fi functionality. Among its features are the 802.11 b/g/n protocol and an integrated TCP/IP protocol stack. Node MCU is a low-cost open source IoT platform; it comes with firmware that runs on Espressif Systems’ ESP8266 Wi-Fi SoC and hardware based on the ESP-12 module (Fig. 4).
Fig. 6 ThingSpeak
visualization
6 ThingSpeak Server
ThingSpeak is an IoT open data platform and API that lets you gather, store, evaluate, monitor, and act on sensor data. It is a cloud-based platform that allows users to combine, display, and study data streams, and it contains a range of useful capabilities, including the ability to set up devices to submit data to it using standard IoT protocols and to evaluate sensor data in real time (Fig. 6).
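A minimal Python sketch of pushing readings to a ThingSpeak channel over its HTTP update API is shown below; the write API key and the field-number mapping are placeholders that depend on how the channel is configured.

import requests

WRITE_API_KEY = 'XXXXXXXXXXXXXXXX'   # placeholder: channel write key

def push_reading(temperature, humidity, ph):
    # field1..field3 are assumed to be mapped to temperature, humidity, and pH
    response = requests.get('https://api.thingspeak.com/update',
                            params={'api_key': WRITE_API_KEY,
                                    'field1': temperature,
                                    'field2': humidity,
                                    'field3': ph},
                            timeout=10)
    return response.text   # ThingSpeak returns the entry number, or 0 on failure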
7 Flowchart
See Fig. 7.
The Raspberry Pi requires a power supply to run, and the Node MCU is activated through serial communication with the Raspberry Pi (Fig. 9).
Fig. 7 Flowchart
10 Methodology
Fig. 9 Connection of
ESP8266 with Raspberry Pi
Node MCU and DHT11” connected to it. Now, by executing the code in the Arduino IDE software, we can see the results in the serial monitor.
The Raspberry Pi is used as a storage device to store the sensed values from the sensors. These values are then passed to the decision tree regression machine learning algorithm to find, with high accuracy, the exact crop that should be grown on that soil.
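A sketch of that final step is given below, assuming the Raspberry Pi has logged the sensed values to a CSV file with columns named ph, temperature, humidity, and a numerically encoded crop column; the file name and column names are assumptions for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

data = pd.read_csv('sensor_log.csv')              # assumed file logged by the Raspberry Pi
X = data[['ph', 'temperature', 'humidity']]
y = data['crop_code']                             # assumed numeric encoding of the crop

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
print('R^2 on held-out data:', model.score(X_test, y_test))

# Suggest a crop for a freshly sensed reading (values are illustrative).
sample = pd.DataFrame([[6.8, 29.0, 64.0]], columns=['ph', 'temperature', 'humidity'])
print('Predicted crop code:', model.predict(sample)[0])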
11 Results
The pH, humidity, and temperature values of various fields are sensed, and the resulting values are sent to the cloud with the help of the Node MCU. The data sent to the cloud are useful for analysing the values in order to suggest the crop best suited to the soil. The data set of pH, temperature, and humidity is then exported in .CSV (comma separated values) format and trained with the “decision tree regression” machine learning algorithm to obtain the accuracy and the ideal crop to be grown (Figs. 10, 11 and 12).
12 Conclusion
We used a Raspberry Pi, a Node MCU ESP8266 (a Wi-Fi module), a pH sensor, and a DHT11 sensor in this IoT-based smart agriculture monitoring system. With this system, the soil pH value, as well as the temperature and humidity in a specific region, can be known, so that the irrigation system and fertilizer usage can be monitored and controlled. IoT is not restricted to a single application but may develop and explore new trends, and it is utilized in a variety of agricultural sectors to improve time efficiency, pest control, and soil production management in varied ways. This project reduces human effort while increasing crop yield. Farmers can benefit from this smart farming, which has a high level of precision.
Fig. 12 Output after sensed values are subjected to decision tree regression algorithm
References
1. Sakthipriya N (2014) An effective method for crop monitoring using wireless sensor network.
Middle-East J Sci Res 20(9):1127–1132
2. Hade AH, Sengupta DM (2014) Automatic control of drip irrigation system & monitoring of
soil by wireless. IOSR Journal of Agriculture and Veterinary Science (IOSR-JAVS). e-ISSN,
2319–2380
3. Kuenzer C, Knauer K (2013) Remote sensing of rice crop areas. Int J Remote Sens 34(6):2101–
2139
4. Sanjukumar RK (2013) Advance technique for soil moisture content based automatic motor
pumping for agriculture land purpose. International Journal of VLSI and Embedded Systems
4:599–603
5. Giri M, Kulkarni P, Doshi A, Yendhe K, Raskar S (2014) Agricultural environmental sensing
application using wireless sensor network. International Journal of Advanced Research in
Computer Engineering & Technology (IJARCET) 3(3)
6. Ayaz M, Ammad-Uddin M, Sharif Z, Mansour A, Aggoune EHM (2019) Internet-of-Things
(IoT)-based smart agriculture: toward making the fields talk. IEEE Access 7:129551–129583
7. Kurosu T, Fujita M, Chiba K (1995) Monitoring of rice crop growth from space using the ERS-1
C-band SAR. IEEE Trans Geosci Remote Sens 33(4):1092–1096
8. Chakraborty M, Manjunath KR, Panigrahy S, Kundu N, Parihar JS (2005) Rice crop parameter
retrieval using multi-temporal, multi-incidence angle Radarsat SAR data. ISPRS J Photogramm
Remote Sens 59(5):310–322
Singular Value Decomposition
and Rivest–Shamir–Adleman
Algorithm-Based Image Authentication
Using Watermarking Technique
1 Introduction
Nowadays, the availability of digital data like image, audio, video, etc., has increased
significantly. This data can be shared among different persons without losing its
quality parameters. This exponential development of digital data has additionally led to a number of threats concerning multimedia security, copyright protection, and critical content verification. Nowadays, a huge amount of digital data is generated in
the real world, so it is essential to handle issues related to privacy, security, and copyright protection. Copyright protection can be provided using a watermarking technique.
Digital watermarking is a technique proposed for protecting the ownership rights of digitized data by determining the original copyright owners of information contents. Digital watermarking integrates or embeds some information, such as the owner’s name or logo, into digital media; the watermark information thus serves as the identification mark of its owner. With the aid of this embedded watermark, it can be detected whether the data or an image has been illegally edited or copied. Digital watermarking is the process of embedding specific digital data such as text, audio, or an image into source content; the data embedded into the source content is called a watermark, or can be termed a label.
A digital watermark may be a visible, invisible, or fragile identification code that is embedded permanently in digital data. The watermark remains in the digital data even after it is extracted by means of various decryption algorithms, so that rightful ownership of the data remains established at all times. Visible watermarks are those that are visible to the naked eye and are widely used to show image identity, whereas invisible watermarks are not visible to the human eye.
2 Related Work
The proposed scheme has three phases: key creation, embedding the watermark image, and extraction of the watermark image. As depicted in Fig. 1, two keys (Key1 and Key2) generated using the RSA algorithm are used in embedding and extracting the watermark image. This secret key provides the initial conditions and parameters to produce a complex mapping system, which is used to change the watermark before the embedding process; the updated watermark protects the real watermark from attack.
Authenticated data is generated by applying an Exclusive-OR operation on the binary bits obtained from the singular values of an image block and the watermark image. The two keys generated by the RSA algorithm and used in the embedding and extraction processes enhance the system security. This method improves durability and tolerates various image-deception attacks. The proposed system thus adds strength to the authentication system.
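A toy sketch of RSA key-pair generation for Key1/Key2 is given below, using deliberately small primes so the arithmetic is visible; the prime values are illustrative assumptions only, and a real deployment would use a cryptographic library with much larger keys.

# Toy RSA key pair: n = p*q, public exponent e, private exponent d = e^(-1) mod phi(n).
p, q = 61, 53                  # illustrative small primes only
n = p * q
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

key1 = (e, n)                  # the two keys referred to in Fig. 1
key2 = (d, n)

def rsa_apply(message, key):
    exponent, modulus = key
    return pow(message, exponent, modulus)

# Applying one key and then the other recovers the original value.
assert rsa_apply(rsa_apply(123, key1), key2) == 123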
Fig. 2 Embedding
watermark
3. Round off after multiplying S_r by a scalar α and apply the modulo operation to obtain binary bits (S_r = floor(αS_r), B_r = S_r mod 2).
4. Tile these binary bits to the dimensions of the image block, i.e., create a matrix B_r whose rows contain the bit vector B.
5. Permute B_r with an irregular permutation that depends on the encrypted Key2, and then apply XOR between B_r and W_r (X_r = B_r XOR W_r).
6. Insert the authenticated information (X_r) into the LSBs of A_r to obtain the watermarked image block.
By performing the above steps with complete blocks, we obtain an image as shown
in Fig. 2.
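A condensed NumPy sketch of steps 3–6 for a single block is shown below, assuming 8 × 8 greyscale blocks, an illustrative scaling factor α, and a matching 8 × 8 binary watermark tile; the Key2-driven permutation is stood in for by a seeded NumPy shuffle, which is an assumption rather than the paper's exact mapping system.

import numpy as np

def embed_block(block, watermark_bits, alpha=10.0, key2_seed=7):
    """Steps 3-6 for one 8x8 block: SVD -> scaled, floored singular values ->
    bits mod 2 -> tiled matrix B_r -> Key2-driven permutation -> XOR with the
    watermark bits -> insertion into the LSB plane of the block."""
    a = block.astype(np.uint8)
    _, s, _ = np.linalg.svd(a.astype(float))
    bits = (np.floor(alpha * s).astype(np.int64) % 2).astype(np.uint8)   # the B vector
    b_r = np.tile(bits, (a.shape[0], 1))                                 # each row is the B vector

    rng = np.random.default_rng(key2_seed)            # stand-in for the Key2 permutation
    perm = rng.permutation(b_r.size)
    b_r = b_r.ravel()[perm].reshape(a.shape)

    x_r = np.bitwise_xor(b_r, watermark_bits.astype(np.uint8))           # authenticated data
    return (a & np.uint8(0xFE)) | x_r                                    # write into the LSBs

block = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
wm = np.random.randint(0, 2, (8, 8), dtype=np.uint8)
watermarked = embed_block(block, wm)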
W_r = B_r^w XOR X_r^w
the watermark that is extracted will be an image that represents noise, as shown in
Fig. 9.
Gaussian Noise attack is used to compare the efficiency of our work. The proposed
SVD–RSA-based watermarking technique is measured by using Normalization
Coefficient (NC) given in Eq. 1.
NC = \frac{\sum_i \sum_j W_i(i, j) \cdot RW_i(i, j)}{\sum_i \sum_j W_i(i, j) \cdot \sum_i \sum_j RW_i(i, j)}    (1)
The proposed SVD- and RSA-based digital watermarking method is resistant to the following attacks.
Active Attacks: The hacker intentionally removes the watermark from the original image or makes it undetectable. The attacked image is critical for identification of the owner, proof of identity, etc. To overcome this attack, encryption is used in the proposed method.
Passive Attacks: Passive attacks are also intentional and aim to detect the presence of the watermark; the hacker hides the watermark without destroying it. In this work, SVD is used to overcome these attacks.
5 Conclusion
Crop Yield Prediction Using Machine
Learning Algorithms
Abstract Agriculture is the most crucial aspect in ensuring survival. Climate and
other environmental changes have become a significant threat to agriculture. Esti-
mating the crop yield before the harvest would assist farmers in choosing marketing
and storage strategies. Machine learning algorithms are used for developing prac-
tical and efficient solutions to predict the yield. Historical data, such as rainfall,
temperature, fertilizer, and past crop yield data, are used to predict crop yield. This
paper focuses mostly on estimating yield by utilizing a variety of machine learning
methods. The models utilized here are ensemble XGBoost-RF, gradient boosting, random forest, and XGBoost, out of which ensemble XGBoost-RF showed the maximum accuracy, with an R2 of 0.976111 and an MSE of 0.002163.
1 Introduction
algorithms help to predict the crop yield, which is a better way than using excessive
hybrid products to increase the crop. This work emphasizes crop yield prediction
with the help of machine learning (ML) algorithms. It is vital to make efficient use
of agricultural land to ensure the food security of the country. So, ML algorithms
can be used to predict the yield from the historical data. Various ML algorithms [1]
such as random forest, XGBoost, gradient boosting, and ensemble XGBoost-RF are
used to predict the yield based on various parameters [2] like rainfall, temperature,
fertilizers, etc. From the results obtained using the above-mentioned algorithms, it can be concluded that the proposed hybrid model, called extreme gradient boosting–random forest, gave the maximum accuracy.
2 Crop Yield
The quantity of a crop produced per unit of land is referred to as crop yield. It is a
crucial measurement to comprehend since it helps us to understand food security.
Crop yield is one of the measures used to assess the efficiency of food production.
Understanding crop yield and being able to estimate, it is significant for several
reasons. First, understanding food security, or the capacity to produce enough food to
fulfill human needs soon, requires the ability to estimate crop yield. Second, the potential yield of each crop should be estimated in advance. Finally, crop yields are important
because they have a direct impact on how much money people will spend on food.
Rainfall, temperature, and fertilizers are the different factors that are important to
achieving high yields [2]. In this work, a dataset related to agriculture (shown in
Table 1) is used for the analysis. The dataset contains rainfall, fertilizer, temperature,
nitrogen, phosphorus, and past crop yield.
Table 1 Minimum, maximum, and standard deviation of the parameters

Parameters                     Minimum   Maximum   Standard deviation
Rainfall (in mm)               400       1300      400.0427
Fertilizer (urea) (kg/acre)    50        80        10.0282
Temperature (°C)               24        40        5.42635
Nitrogen (N)                   59        80        6.677079
Phosphorus (P)                 18        25        1.951695
Potassium (K)                  15        22        1.817254
Yield (Q/acre)                 5.5       12        1.965902
The collecting of electronic data has grown more prevalent in most domains of human
endeavor because of advances in computer technology over the last several decades.
Many organizations need vast volumes of data dating back many years. This informa-
tion relates to individuals, financial activities, biological data, etc. Simultaneously,
data scientists have been working on algorithms which are iterative computer soft-
ware applications that can look at vast amounts of data, evaluate it, and find patterns
and links that people cannot. Analyzing the previous events can reveal a wealth
of information on what to expect in future from the same or nearly comparable
events. These algorithms may learn from the past and use what they have learned to
make better decisions in future. Data analysis is not a novel concept, but ML algorithms distinguish themselves from other techniques because they can cope with significantly larger amounts of data and data with minimal structure. This enables ML algorithms to be effective in a wide range of applications previously thought to be too complicated for conventional learning techniques.
In the current work, four ML algorithms are developed and applied to predict the crop yield: random forest, gradient boosting, XGBoost, and ensemble XGBoost-RF.
For classification and regression tasks, random forest is one of the most common and powerful supervised ML approaches. During training, this technique creates a vast number of decision trees whose outputs are combined. Random forest (RF) is a bagging technique that employs many decision trees on subsets of a given set of observations and averages
the results to improve the dataset’s estimated accuracy.
The predictions from each tree are collected by random forest, which then predicts
the ultimate output based on the popular vote of predictions. The more trees in the
forest, the more accurate it becomes, and the risk of errors is reduced. There are two
random factors in a random forest. They are as follows:
1. Random subset of features.
2. Bootstrap samples of data.
A random forest [3] is merely a group of trees, each of which makes a prediction; the predictions from all of them are gathered, and the mean, mode, or median of the collection is used as the forest's prediction, depending on whether the data are continuous or categorical. To a large extent this is acceptable, although individual trees may generate predictions based on random chance because each tree has its own set of conditions.
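A brief scikit-learn sketch of the bagging idea just described is shown below: each tree sees a bootstrap sample and a random subset of features, and the forest averages the trees' predictions; the synthetic data and hyper-parameter values are illustrative only.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=6, noise=0.1, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,      # number of bootstrapped trees
    max_features='sqrt',   # random subset of features at each split
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:3]))   # the forest averages the individual tree outputs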
Gradient boosting is one of the boosting methods that is used to reduce the bias error
of the model. It can be used for predicting continuous target values, i.e., as a regressor.
The gradient boosting regressor (GBR) reduces the prediction error and increases
the accuracy of the model. GBR is a fully integrated model that offers improved
performance and stability. To address the regression problem, the GBR method [4] extends the boosting technique; it makes use of negative gradients of the loss function to find the minimum. GBR has been widely utilized in biological research because of its capacity to handle messy and noisy data, and it has good predictive ability for non-linear data.
3.3 XGBoost
XGBoost refers to the extreme gradient boost algorithm. It provides a parallel tree
boosting that solves the issues in data science fast and accurately. This algorithm
performs best on datasets that are well-structured or tabular.
This model uses boosting ensemble learning with the help of decision trees.
Gradient boosting is XGBoost’s original model that involves iteratively merging
weak base learning techniques into a stronger learner. The residual will be utilized to
adjust the previous predictor at each iteration of gradient boosting, so that the stated
loss function may be improved. Regularization is introduced to the loss function in
XGBoost to create the objective function for monitoring model performance, which
is represented by

Obj(ϕ) = L(ϕ) + Ω(ϕ)

where ϕ denotes the parameters trained from the provided dataset; L denotes the training loss function, which is a metric for how well the model fits the training data; and Ω denotes the regularization term that penalizes model complexity.
The argument n_estimators sets the number of trees used in the ensemble. The XGBoost-RF ensemble is first fitted to the available data, after which the predict function generates predictions on new data.
Gradient boosting is extremely slow at training a model, a problem exacerbated by big datasets. XGBoost addresses the speed concerns of gradient boosting by incorporating different strategies that drastically speed up the model's training and, in many cases, improve the model's overall performance [6]. The primary advantage of training random forest ensembles with the XGBoost library is the increase in speed.
where y refers to the number of target values, b = (b1, b2, ..., by)^T, b* is the prediction value, and f(ax) denotes the regression function for the feature vector ax.
The collection and processing of sample data is the initial step in the construction of a prediction model. A large amount of data must be compiled to serve as input. To train the algorithms, a dataset with different parameters is considered. The variables in the dataset are rainfall, temperature, fertilizer, nitrogen, phosphorus, potassium, and yield. After collecting the data, the four ML algorithms are applied and the accuracy of each is checked. In this project, random forest, gradient boosting, XGBoost, and ensemble XGBoost-RF are implemented using Python in the Jupyter notebook application. pandas, scikit-learn, NumPy, and Matplotlib are the main libraries used. The data is split into two parts, (i) training and (ii) testing (67% for training and 33% for testing). During hyperparameter tuning, the maximum model depths were used. Figure 1 shows the correlation between the actual crop yield and the crop yield predicted by the ensemble XGBoost-RF algorithm.
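A hedged sketch of this pipeline is given below; the synthetic DataFrame stands in for the collected dataset, and reading the ensemble XGBoost-RF as xgboost's random-forest mode (XGBRFRegressor) is an assumption, not the authors' stated implementation.

```python
# Sketch of the 67/33 split and R^2 / MSE evaluation with synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
from xgboost import XGBRegressor, XGBRFRegressor

rng = np.random.default_rng(0)
cols = ["rainfall", "temperature", "fertilizer", "nitrogen", "phosphorus", "potassium"]
df = pd.DataFrame(rng.random((200, 6)), columns=cols)  # stand-in feature values
df["yield"] = rng.random(200)                          # stand-in target values

X, y = df[cols], df["yield"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=42)

models = {
    "Random forest": RandomForestRegressor(),
    "Gradient boosting": GradientBoostingRegressor(),
    "XGBoost": XGBRegressor(),
    "Ensemble XGBoost-RF": XGBRFRegressor(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "R2:", r2_score(y_te, pred), "MSE:", mean_squared_error(y_te, pred))
```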
Figure 2 depicts the correlation between actual crop yield and predicted crop yield
of XGBoost algorithm. Similarly, Fig. 3 shows the correlation between actual crop
Fig. 1 Plot of measured crop yield versus predicted crop yield of ensemble XGBoost-RF
yield and predicted crop yield of the random forest algorithm. Finally, Fig. 4 depicts the correlation between the actual crop yield and the crop yield predicted by the gradient boosting algorithm.
The R2 and MSE values obtained for the four algorithms applied to the collected dataset are shown in Table 2. The R2 value for gradient boosting is 0.952457, which
Fig. 2 Plot of measured crop yield versus predicted crop yield of XGBoost
Fig. 3 Plot of measured crop yield versus predicted crop yield of random forest
Fig. 4 Plot of measured crop yield versus predicted crop yield of gradient boosting
means the accuracy level for gradient boosting is 95.24%. Likewise, the accuracy
level for random forest, XGBoost, and ensemble XGBoost-RF is 95.43%, 96.58%,
and 97.61%, respectively. From the above results, an ensemble XGBoost-RF has the
highest R2 value. The higher the R2 value, the more accurate the algorithm. The value
of MSE for gradient boosting is 0.004303; for the XGBoost algorithm, it is 0.003092; for the random forest algorithm, it is 0.004133; and for the ensemble XGBoost-RF algorithm, it is 0.002163, which is the least among
Table 2 Comparison of R2 and MSE

Controller model     R2        MSE
XGBoost-RF           0.976111  0.002163
XGBoost              0.965855  0.003092
Random forest        0.954357  0.004133
Gradient boosting    0.952457  0.004303
the four algorithms used. The lower the MSE value, the more accurate the algorithm. So, from the above results, the ensemble XGBoost-RF is the best one.
5 Conclusion
In this work, the crop yield data (Q/acre) was analysed. The key observations from the analysis are as follows:
• The data used for constructing the model consists of rainfall, temperature, fertil-
izer, nitrogen, phosphorous, and potassium which are the input parameters, and
crop yield is the output.
• Four ML algorithms, namely XGBoost, random forest, gradient boosting, and ensemble XGBoost-RF, were developed to predict the crop yield. R2 and MSE were considered to evaluate the performance of the developed algorithms.
• From the results, it is evident that the ensemble XGBoost-RF algorithm shows
maximum accuracy (R2 = 0.97611) and least error (MSE = 0.002163), whereas
XGBoost algorithm provides R2 of 0.965855 and MSE of 0.003092.
• The results analysis (Table 2) shows that ensemble XGBoost-RF shows better
performance over random forest, gradient boosting, and XGBoost. Hence, the
above specified results show that ensemble XGBoost-RF can predict the crop
yield efficiently.
References
1. Raja SP, Sawicka B, Stamenkovic Z, Mariammal G (2022) Crop prediction based on character-
istics of the agricultural environment using various feature selection techniques and classifiers.
IEEE Access 10:23625–23641
2. Venugopal A, Aparna S, Mani J, Mathew R, Williams V (2021) Crop yield prediction using
machine learning algorithms. Int J Eng Res Technol 9(13):87–91
3. Priya P, Muthaiah U, Balamurugan M (2018) Predicting yield of the crop using machine learning
algorithm. Int J Eng Sci Res Technol 7(4):1–7
4. Khan R, Mishra P, Baranidharan B (2020) Crop yield prediction using gradient boosting
regression. Int J Technol Exploring Eng 9(3):2293–2297
5. Ravi R, Baranidharan B (2020) Crop yield prediction using XG boost algorithm. Int J Recent
Technol Eng 8(5):3516–3520
Crop Yield Prediction Using Machine Learning Algorithms 405
6. Oikonomidis A, Catal C, Kassahun A (2022) Hybrid deep learning-based models for crop yield
prediction. Appl Artif Intell 36(1)
7. Ragam P, Nimaje DS (2018) Evaluation and prediction of blast-induced peak particle velocity
using artificial neural network: a case study. Noise Vib Worldw 49(3)
Analysis of Students’ Fitness and Health
Using Data Mining
1 Introduction
Each country’s progress requires high-quality education. The amount of data in the
education domain is expanding by the day, thanks to admission systems, academic
information systems, learning management systems, and e-learning. As a result,
using this vast amount of educational data to predict student health is a hot topic.
The technique of obtaining useful insights from vast quantities of data is referred to as data mining, or knowledge discovery in databases (KDD). Student health and fitness analysis is a critical topic in the educational data mining field [1], since it
is a significant step toward personalized education. The following aspects have been
shown to have a significant impact on academic performance:
• Personality traits of students (e.g., neurotic tendencies, conscientiousness, and
extroversion).
• Personal concerns of students (e.g., age, sex, physical fitness, indifference,
emotional stability, stress, mood, panic attacks, activeness, and energy levels).
• Lifestyle behaviors (e.g., nutrition, regular exercise, sleeping habits, social
connections, and effective planning); and
• Learning conduct (e.g., presence in class, active participation, and study time).
Many data-driven approaches for predicting health status have been developed
by analyzing the impact of various factors on student health and fitness. Despite the
development of various health prediction systems for college students, substantial
challenges remain, such as acquiring student’s whole profile and merging this data
to achieve a comprehensive overview. The aim is to analyze the elements that affect students' health, to utilize that data to construct a strong, high-accuracy prediction model, and to leverage the model to give individualized support that could help students improve their behavior and enhance their study-life balance.
2 Literature Survey
3 Proposed Methodology
The proposed methodology deals with decision tree regression, LSTM, KNN, random forest regression, and a voting classifier for accuracy comparison [11]. The algorithm is
trained using a student dataset that contains information on students’ health status.
Students’ data is collected from educational institutes in order to create a health and
fitness management [12] system. The most effective and efficient model would be
determined by comparing the performance and accuracy of these models (Fig. 1).
We develop a single model that trains on numerous models and forecasts results based on the cumulative majority of votes [13] for each resultant class, rather than building separate models and evaluating their performance.
Random forest is a flexible supervised machine learning algorithm which uses
bagging techniques to improve the performance. A random forest is a collection of
tree classifiers with the parameters {h(x, βk), k = 1, ...}. The meta classifier h(x, βk) is a CART-based regression tree with x as the input vector and βk as an independent random vector with the same distribution. The forest algorithm's final output is determined by voting. Randomness enters in two ways: the bagging algorithm is used to choose the training sample set, and the split attribute set is generated at random as well. Considering that the classification model has N attributes in all, we set a value S ≤ N at each intermediate node, select S attributes at random from the N-attribute set as the split attribute set, and determine the optimal splitting strategy for the S attributes. The tree classifiers' vote determines the final classification outcome as
shown in Fig. 2.
The Gini index, Gini(T), is defined as follows:

$$\mathrm{Gini}(T) = 1 - \sum_{i=1}^{c} p_i^2 \qquad (1)$$
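A tiny numerical illustration of Eq. (1), not taken from the paper, for a node containing four class labels:

```python
# Gini index of a node from its class labels, as in Eq. (1).
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class proportions p_i
    return 1.0 - np.sum(p ** 2)

print(gini(["healthy", "healthy", "stressed", "unhealthy"]))  # 0.625
```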
In the first experiment, three classification algorithms (random forest, voting classi-
fier, and decision tree) are run on a dataset containing student personal and health
details.
The accuracy of the decision tree algorithm, the matrix-based Apriori algorithm, the support vector machine, and the K-nearest neighbor algorithm lies between 73% and 76%.
According to the graphical representation in Fig. 3, the best accuracy was achieved
by random forest (79.8%), which was satisfactory in comparison with prior studies,
while the lowest accuracy was achieved by decision tree.
In our study, we evaluate categorization quality using five popular distinct measures.
Details are as follows:
Accuracy: It is also abbreviated as CCI (correctly classified instances). It is
determined by the formula
$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \qquad (2)$$
ICI: obtained by calculating the count of misclassified instances divided by the overall
instances.
Precision: the fraction of correctly classified positive instances among all instances classified as positive.

$$\text{Precision} = \frac{T_p}{T_p + F_p} \qquad (3)$$
$$\text{Recall} = \frac{T_p}{T_p + F_n} \qquad (4)$$
$$F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (5)$$
In Eqs. (2)–(5), Tp indicates true positives, Tn true negatives, Fp false positives, and Fn false negatives. These values were derived from the confusion matrix that resulted from the execution of the algorithm.
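For illustration only, the measures in Eqs. (2)–(5) can be computed from a confusion matrix with scikit-learn as sketched below; the label vectors are placeholders, not the study's data.

```python
# Placeholder labels; in the study these come from the trained classifiers.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy (CCI):", (tp + tn) / (tp + tn + fp + fn))  # Eq. (2)
print("Precision:", precision_score(y_true, y_pred))        # Eq. (3)
print("Recall:", recall_score(y_true, y_pred))               # Eq. (4)
print("F1:", f1_score(y_true, y_pred))                       # Eq. (5)
```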
4 Results
The best model turned out to be random forest as it has the highest accuracy compared
to others. By using random forest algorithm, a framework is created which predicts
the score. Now, a questionnaire is prepared to collect details of a student to analyze
students’ health.
The questionnaire consists of few personal questions such as name, gender, and
age and questions related to students’ health status mentioned in Table 1. The students
answer these questions according to their choice. By analyzing the answers of these
questions, we get a numerical score at the end. A framework is developed using
random forest algorithm to predict the score based on the inputs collected from the
students.
From the score obtained in Fig. 4, we can conclude that a student having a score from:
• 16–20 is healthy and extremely active and attends college regularly.
• 11–15 is experiencing mild stress.
• 6–10 is weak, with emotional instability and apathy. The student must practice
self-care and seek professional counseling.
• 1–5 is unhealthy, suffering from serious disorder, stress, and depression. The
student should see a doctor and receive treatment.
5 Conclusion
In this paper, data mining methods are used to extract seven health dimensions,
resulting in a health and fitness management system. The findings provide a realistic
framework for educational institutions to master student health and colleges to scien-
tifically prevent health problems among college students. Every educational institute
is in need of an accurate student health and fitness prediction model. However,
resolving data quality issues in student health prediction models is sometimes the
most difficult task. This research develops a random forest model-based student
performance prediction model. Many academics have looked into student health and
fitness status prediction as an essential topic in the field of education data mining.
However, there are still several hurdles in predicting accuracy and interpretability
due to a lack of abundance and diversity in both data sources and characteristics. This
system has the potential to lead to extensive investigations. The knowledge gained
in this study has the potential to help with related studies among students who are
interested in developing a student health management system.
References
1. Abd-Ali RS, Radhi SA, Rasool ZI (2020) A survey: the role of the internet of things in the
development of education. Indonesian J Electrical Eng Computer Sci 19(1):215
2. Zhang X, Liu L, Xiao L, Ji J (2020) Comparison of machine learning algorithms for predicting
crime hotspots. IEEE Access
3. Thota C, Sundarasekar R, Manogaran G, Varatharajan R, Priyan MK (2018) Centralized fog
computing security platform for IoT and cloud in healthcare system. In: Fog computing:
breakthroughs in research and practice. IGI global, pp 365–378
4. Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: promise and potential.
Health Information Science and Systems 2(1):3
5. Smys S, Raj JS (2019) Internet of things and big data analytics for health care with cloud
computing. J Inf Technol 1(01):9–18
6. Rajesh SR (2021) Design of distribution transformer health management system using IoT
sensors. Journal of Soft Computing Paradigm 3(3):192–204
7. Ghosh P, Shamrat FMJM, Shultana S, Afrin A, Anjum A et al (2020) Optimization of prediction
method of chronic kidney disease using machine learning algorithm. In: 2020 15th international
joint symposium on artificial intelligence and natural language processing (iSAI-NLP), pp 1–6
8. Dubey H, Yang J, Constant N, Amiri AM, Yang Q, Makodiya K (2015) Fog data: enhancing tele-
health big data through fog computing. In: Proceedings of the ASE bigdata & social informatics
2015. ACM, p 14
9. Yassine A, Singh S, Hossain MS, Muhammad G (2019) IoT big data analytics for smart homes
with fog and cloud computing. Futur Gener Comput Syst 91:563–573
10. Suma V (2019) Towards sustaınable industrialization using big data and internet of things.
Journal of ISMAC 1(01):24–37
11. Furnham A, Monsen J (2009) Personality traits and intelligence predict academic school grades.
Learning and Individual Differences 19(1):0–33
12. Yaacob WFW, Sobri NM, Nasir SAM et al (2020) Predicting student drop-out in higher
institution using data mining techniques. J Phys Conf Ser 1496(1):13–15
13. Alam S, Abdullah H, Abdulhaq R et al (2021) A blockchain-based framework for secure
educational credentials. Turkish J Comput Math Edu (TURCOMAT) 12(10):5157–5167
Local Agnostic Interpretable Model
for Diabetes Prediction
with Explanations Using XAI
Abstract Diabetes mellitus is a deadly disease that affects the production of insulin. Diabetes is a life-threatening disease if it is not detected early.
Recently, artificial intelligence-based machine learning (ML) predictive models are
predominantly used in sensitive healthcare domain for predicting diseases in advance.
Most of these ML models are black-box models which provide approximate expla-
nations of how a model behaves. If the models were interpretable, then domain
expert can understand the reasons and modify the model accordingly to get the best
results. In this paper, we present an ensemble local explainable agnostic model for
predicting diabetes. Our study shows that the ensemble voting classifier produced
81% accuracy on the Pima Indian diabetes dataset as compared to other conventional
predictive models. We then applied the explainable AI (XAI) technique which helps
the medical experts in understanding the predictions made by the model.
1 Introduction
As per the International Diabetes Federation (IDF) [1] report, nearly 537 million
people are suffering from diabetes across the world. Every year diabetes causes 6.7
million casualties, and more than a million children and adolescents (0–18 years)
are suffering from insulin-dependent diabetes. Every year, more than 21 million
children are born with diabetes [2]. About 541 million grown-ups are in danger
2 Related Work
In this section, a discussion on materials and methods used for conducting the study
has been presented. This section is divided into three sections, namely Sects. 3.1,
3.2, and 3.3. Section 3.1 describes the dataset used in the study. Section 3.2 describes
the problem statement, and finally, Sect. 3.3 explains the explainable model.
The ML and DL predictive models were trained and tested on the Pima Indian diabetes dataset [19]. This dataset was
created by the National Institute of Diabetes and Digestive and Kidney Diseases,
USA. The dataset consists of 768 diabetic patients from the Pima Indian population
near Phoenix, Arizona. Dataset consists of 268 diabetic patients (positive) and 500
non-diabetic patients (negative) with eight different features.
Soft Voting Classifier: The idea behind the soft voting classifier (SVC) is to integrate
theoretically diverse predictive models and use a majority result (predicted outcome)
or the average predicted probabilities to predict the category labels. A classifier of this
type can be useful for a collection of equally well-performing models to compensate
for their deficiencies or shortcomings.
Example: Diabetic prediction is a classification task with class label k belonging to
{0, 1}.
0 indicates negative (non-diabetic class) and 1 indicates positive (diabetic class).
Sample calculations are shown in Eqs. 1–3.
$$\hat{y} = \arg\max_{k} \big\{ \mathrm{prob}(k_0 \mid x), \ \mathrm{prob}(k_1 \mid x) \big\} = 0 \qquad (3)$$
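As an illustrative sketch (the paper does not list the exact base learners here), a soft voting classifier that averages predicted probabilities can be assembled with scikit-learn as follows; the feature matrix is a random stand-in for the eight Pima Indian features.

```python
# Illustrative soft voting classifier; X, y are random stand-ins for the
# 8-feature Pima Indian data and its 0/1 outcome labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = rng.random((200, 8)), rng.integers(0, 2, 200)

clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier()),
        ("svc", SVC(probability=True)),   # probability=True is needed for soft voting
    ],
    voting="soft",                        # average the predicted class probabilities
)
clf.fit(X, y)
print(clf.predict_proba(X[:1]))           # averaged probabilities for one sample
```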
This section presents the experimental procedure, results, and analysis. This section
is divided into three sections, namely Sects. 4.1, 4.2, and 4.3. Section 4.1 presents the
experimental setup of the study, Sect. 4.2 describes results, and Sect. 4.3 describes
explanations generated by LIME explainer.
4.2 Results
As the dataset is imbalanced, accuracy alone may not be the right metric for selecting the best-performing model, as it can mislead the classification decisions. The accuracy and AUC values are popular metrics for comparing predictive models on class-imbalanced datasets. From the performance analysis bar graphs presented in Fig. 2, it is clear that the soft voting classifier has the best values for accuracy (81%) and AUC (84%). Therefore, we have selected the soft voting classifier as the best-performing complex predictive model to generate explanations for the instance of interest using LIME.
References
20. Ribeiro M, Singh S, Guestrin C (2016) “Why should I trust you?” Explaining the predictions of any classifier. arXiv preprint arXiv:1602.04938
21. Vivekanand A, Vadlakonda D, Lendale V (2021) Performance analysis of predictive models on
class balanced datasets using oversampling techniques. Soft computing and signal processing.
Springer, Singapore, pp 375–383
22. Felzmann H, Fosch-Villaronga E, Lutz C, Tamò-Larrieux A (2020) Towards transparency by
design for artificial intelligence. Science and Engineering Ethics 26(6):3333–3361
23. Langer M, Oster D, Speith T, Hermanns H, Kästner L, Schmidt E, Sesing A, Baum K (2021)
What do we want from explainable artificial intelligence (XAI)? A stakeholder perspective
on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence
296:103473
Exploring the Potential of eXplainable AI
in Identifying Errors and Biases
Abstract Artificial intelligence has virtually pervaded every field and its adaptation
is a catalyst for organizational growth. However, the potential of artificial intelligence
is often associated with a difficulty to understand the logic veiling behind its decision
making. This is essentially the premise upon which XAI or eXplainable AI functions.
In this field of study, researchers attempt to streamline techniques to provide an
explanation for the decisions that the machines make. We endeavor to delve deeper
into what explainable means and the repercussions of the lack of definition associated
with the term. We intend to show in this paper that an evaluation system based solely
on how easy it is to understand an explanation, without taking into account aspects
such as fidelity, might produce potentially harmful explanation interfaces.
1 Introduction
R. Chahar
Delhi Technological University, New Delhi, India
U. Latnekar (B)
Bennett University, Noida, India
e-mail: [email protected]
This aspect represents a huge technical and ethical issue for this field, especially
when building autonomous systems that are meant to replace or aid humans in highly
impacting decisions. If we can’t explain “why” a certain algorithm took a certain
decision, how can we trust these systems? How do we ensure that their internal
models are not biased or broken? How do we understand when the machine is failing?
The problem of introspection and accountability for these systems is a very serious
one. Marvin Minsky et al. raised the issue that AI can function as a form of surveil-
lance, with the biases inherent in surveillance, suggesting Humanistic Intelligence
(HI) as a way to create a more fair and balanced “human-in-the-loop” AI [8].
As a natural result of these emerging concerns about AI, the field of Explainable
AI (XAI) was born. The goal of this research field is to build systems that can provide
humans with a deeper understanding of AI algorithms [5], with the ultimate objective
of making errors and biases easier to spot or predict and AI-based systems generally
more trustworthy.
In this paper, we will analyze some of the extreme consequences of the lack of such a definition, and more generally the lack of a comprehensive way to evaluate AI explanations.
To explain this idea, we will proceed as follows.
Section 2 provides some background on the problems which XAI is trying to solve
and a classification of the solutions that are currently being developed. Section 4
introduces the problem of defining interpretability, and proposes a classification of
the aspects that define an explanation. Section 5 discusses the idea that explanation
interfaces might be able to fool a human user into believing that a specific algorithm
is doing the right thing, leveraging his or her own bias. Finally, Sect. 6 contains some
concluding thoughts on this subject.
2 Background
3 Methods
From a general perspective, [2] identifies two families of approaches to this problem:
Transparent box design, which aims at building algorithms that are more inter-
pretable by design. A transparent box design is a cognitive approach that tries to simplify things so that it is easier for the human brain to understand them. This benefits people with cognitive impairments, parents or carers of young children, and scholars, among others. It has also been seen that clear visualizations help in various types of user engagement.
Reverse-engineering approaches, also called post-hoc interpretability approaches, try to provide explanations for already existing algorithms, i.e., the model is interpreted after it has been trained, without altering its internals.
Some examples of the latter type are listed in Ref. [4].
Visualization, for instance, focuses on representing visually some key aspects of
the model, for example which pixels of an image are important for a classification
output.
Approximation consists in using simple models or simplifying already existing
models: in single tree approximation, for example, the internal structure of an AI
algorithm is approximated to a classification tree, shown in Fig. 1.
Causal Models (CAMEL) try to generate causal explanations of Machine
Learning operations and present them to the user as intuitive narratives. A scheme
of the architecture needed for this approach is illustrated in Fig. 2.
Other approaches include Learning and Communicating Explainable Represen-
tations, where explanations themselves are learned as a separate part of the training
process, and Explanation by Example, where the AI is able to provide an example,
or a prototype, of how it thinks that a typical member of a given class should appear
and/or which characteristics should be changed to change the outcome.
It is important to notice how these approaches differ in how thick the explanation
interface is, i.e., how many complex manipulations the initial model undergoes before
being presented to the user. Intuitively, we can see for example that the visualization
approach tries to give a close insight on how the internal elements are activated by
a certain picture, while in techniques such as CAMEL and Learned Explanations
there is a much more indirect connection between elements of the original model
and elements of the explanation, which is also reflected on the increased complexity
of the interface itself.
This intuitive idea will be further expanded in Sect. 5.1 using the concept of
fidelity.
4 Defining Interpretability
As anticipated in Sect. 1, one fundamental problem in the field of XAI is that there is
no single conventional notion of interpretability. Reference [7] goes as far as consid-
ering the term itself ill-defined, therefore stating that claims about interpretability
generally have a quasi-scientific nature. Reference [2] on the other hand, considers
the lack of a mathematical description as an obstacle for the future development of
this field. Reference [3] itself defines the formalization of an evaluation metric for
explanations as one of the goals of the XAI program, to be developed in parallel with
technical solutions.
When analyzing the problem of defining and evaluating interpretability, two
questions naturally arise:
Explainable to whom? The concept of user of an AI system is not always well-
defined, nor is the concept of user of an explanation. This might include:
Bearing in mind the goals of XAI, there are a number of metrics that can be used to
characterize and evaluate a solution:
• Complexity: how many elements are there in the explanation?
• Clearness: how cognitively hard is the explanation? How difficult is it to under-
stand the correspondence between the elements of the explanation and the
information we are trying to gain?
• Informativeness: how much information, weighted on how meaningful it is, can
be extracted by the explanation? E.g., does the explanation significantly modify
the level of uncertainty about the AI behavior?
• Fidelity: how closely does the explanation represent the functioning of the system?
Are all the facts inferred from the explanation also applicable to the original
system?
Clearly, a specific metric will be more or less important depending on the specific
user and use-case. There is however a deeper distinction that has to be made, which
is related to how these metrics are measured.
1 This goal is not explicitly listed in the original scope of XAI, but has gained traction recently with
the introduction of the concept of right for an explanation in Europe’s new GDPR [10].
Complexity, for instance, is often measured using a proxy quantity such as the
number of elements in the explanation, which can be for example the depth of the
decision tree or the number of neurons. On the other hand, clearness and informa-
tiveness are more difficult to quantify a-priori, but could be empirically evaluated by
providing the explanations to a group of humans and verifying how they respond.
In general, we can identify two ways of evaluating an AI explanation: one is
using a direct measurement of some quantity that we can derive directly from the
explanation. The second one is considering an explanation itself a black-box, and
check if it actually provides a better understanding of the AI model to some selected
group of individuals used as a benchmark. While the first method is not always
feasible, since choosing which quantity is representative of a certain aspect is in itself
a difficult decision to make, the second method clearly presents the same problems
of opaqueness and unreliability that AI models themselves have.
Of all the metrics highlighted in Sect. 4.2, fidelity, also called faithfulness in literature
[1], is probably the most complex to evaluate. On one hand, the maximum fidelity
is already represented by the implementation itself, but on the other hand the reason
we need explanations is that the implementation itself is not clear enough.
This is particularly important since AI explanations are also targeted to unspecial-
ized users, which need to understand what’s happening without necessarily having
a solid background on the internal functioning of such systems.
Yet, fidelity plays a fundamental role when we have to evaluate an AI algorithm,
as it quantifies the difference between what is being evaluated (the AI model) and
the instrument we are using for this evaluation (the AI explanation). This represents
in some sense the “measurement error” introduced by the explanation.
Let's take for example the situation depicted in Fig. 3: in this case, a human operator is evaluating an AI model through an explanation interface.
While this idea might seem easy enough to understand, devising an operational
way to measure it is a non-trivial task.
Let’s take for example Causal Models: in this case, the explanation and the original
model will typically have a very different nature, since the explanation interface
produces causal relationships, while the AI model typically reasons in terms of
statistical correlation. In this case, how can we measure the fidelity of this interface?
On the other hand, being unable to measure fidelity poses another question: if
both the AI and the explanation are treated as black boxes, how can we be sure
that evaluating the AI using that explanation interface will effectively improve our
understanding of the underlying AI model? Couldn’t it be that we just think we
understand it?
Human decision making is known to be affected by many cognitive biases, which are deeply rooted in our thinking and are often difficult, if not impossible, to exclude when we make decisions. Recently, [6] studied the consequences of the framing effect in the domain of AI, in particular how likely a person is to accept or reject an AI recommendation based on how the output was framed. An interesting result of
this research is, for example, that “perceived reasonableness was significantly higher
when the suggestion of AI was provided before the decision is made than after the
decision is made when perceived accuracy was controlled” ([6], page 5).
While this is not a direct study on AI explanation interfaces, it does show how the
same local decision of an AI can be judged differently simply varying the timing of the
explanation. Similar results have been observed when varying how the explanation
is framed (positive or negative sentences, etc.).
This shows how the evaluation of the correctness of an AI model is not only a
subjective matter, but can vary in the same individual depending on factors that are
external to the AI behavior itself.
6 Conclusions
In conclusion, this paper should have shown how the fact that there is no single
definition of what interpretability is and no comprehensive way of evaluating simul-
taneously all the important aspects that compose an explanation, especially fidelity,
leads to the possibility of creating yet another black-box layer over the black-box
model, which can accentuate biases instead of reducing them.
While the proposed argument is just a thought experiment, there are many realistic
elements in this setting that should warn us about the possibility of creating deceitful
explanation interfaces.
References
1. Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L (2018) Explaining explanations: an
overview of interpretability of machine learning. In: 2018 IEEE 5th international conference
on data science and advanced analytics (DSAA), pp 80–89
2. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2018) A survey of
methods for explaining black box models. ACM Comput Surv 51(5)
3. Gunning D (2017) Darpa’s explainable artificial intelligence (xai) program. In: Proceedings of
the 24th international conference on intelligent user interfaces, IUI’19, page ii, New York, NY,
USA. Association for Computing Machinery
4. Gunning D (2018) Xai for nasa
5. Islam MR, Ahmed MU, Barua S, Begum S (2022) A systematic review of explainable artificial
intelligence in terms of different application domains and tasks. Appl Sci 12(3):1353
6. Kim T, Song H (2020) The effect of message framing and timing on the acceptance of artificial
intelligence’s suggestion
7. Lipton Z (2016) The mythos of model interpretability. Commun ACM 61:10
8. Minsky M, Kurzweil R, Mann S (2013) The society of intelligent veillance. In: 2013 IEEE
international symposium on technology and society (ISTAS): social implications of wearable
computing and augmediated reality in everyday life, pp 13–17
9. Russell S, Norvig P (2009) Artificial intelligence: a modern approach, 3rd edn. Prentice Hall
Press, USA
10. Selbst AD, Powles J (2017) Meaningful information and the right to explanation. Int Data
Privacy Law 7(4):233–242
Novel Design of Quantum Circuits
for Representation of Grayscale Images
Mayukh Sarkar
1 Introduction
Quantum computing, one of the buzzwords in today's research, started from the idea of a quantum mechanical model of the Turing machine proposed by Paul Benioff in 1980 [3]. In 1982, Richard Feynman proposed the idea of a quantum computer [4], and it became a buzzword when, in 1994, Peter Shor proved its capability by proposing a quantum polynomial-time algorithm for integer factorization [5]. Since
then, researchers all around the world have been trying to solve multitudes of compu-
tational problems using this technology. One of the promising applications is the
domain of image processing using this powerful paradigm.
M. Sarkar (B)
Department of Computer Science and Engineering, Motilal Nehru National Institute of
Technology Allahabad, Prayagraj, India
e-mail: [email protected]
2 Background Information
$$|I\rangle = \sum_{k=0}^{2^{n}-1} c_k \, |k\rangle \qquad (2)$$

of $n = \log_2(ML)$ qubits. $|k\rangle$ represents the computational basis state encoding the position $(i, j)$, and $c_k = f_{ij}\big/\sqrt{\sum (f_{ij})^2}$ represents the pixel values, encoded as a probability distribution satisfying $\sum_k |c_k|^2 = 1$.
Given that, ck and |ck |2 can be calculated efficiently, the n-qubit state repre-
senting the image data can be created efficiently in O(poly(n)) steps, where poly(n)
represents some polynomial function of n [1]. Arbitrary state preparation techniques
proposed by Grover et al. [13] and Soklakov et al. [14], includes unit vectors in
2n -dimensional Hilbert space, i.e., vectors may contain complex amplitudes. But
pixel data of an image is always real. Though the generalized state preparation tech-
niques can also prepare such states, removing the necessity of handling complex
amplitudes has the ability to obtain circuits with smaller subset of gates, such as
NCT, and Ry gates and their controlled counterparts, which keeps the states only in
real vector space. The goal of the current paper is to propose an algorithm that produces a quantum circuit preparing the state in Eq. (2), solely for unit vectors in real vector space, such as the normalized pixel data of a grayscale image.
Note that, as the pixel data is being represented as probability amplitudes of
a quantum statevector, it cannot be used to store the image for further retrieval,
as measuring the statevector will collapse the complete quantum state, thereby
destroying the complete pixel data. This work is expected to be important in the
applications requiring state preparation circuits, where an image needs to be repre-
sented using minimal number of qubits, temporarily. These qubits are then further
processed via the image processing circuit, performing important image processing
applications.
3 Proposed Work
In this section, the technique to generate a quantum circuit with n qubits that will
produce an arbitrary unit vector in 2n -dimensional real vector space, is proposed.
The circuit consists of only NCT, and Ry gates and their controlled counterparts. To
demonstrate the technique, let us first start with a 2-dimensional unit real vector.
Let us consider an arbitrary real vector |ψ⟩ = (α1, α2)^T with α1² + α2² = 1. Thus we can readily consider α1 = cos(θ/2) and α2 = sin(θ/2) for a certain angle θ, which can be obtained as θ = 2 arccos(α1). The following circuit will generate the desired state (Fig. 1).
Let us now consider an arbitrary 4-dimensional real vector (α1, α2, α3, α4)^T with α1² + α2² + α3² + α4² = 1. Thus we can consider three real angles θ1, θ2, θ3 such that α1 = cos(θ1/2), α2 = sin(θ1/2)cos(θ2/2), α3 = sin(θ1/2)sin(θ2/2)cos(θ3/2), and α4 = sin(θ1/2)sin(θ2/2)sin(θ3/2). This is in accordance with the spherical coordinate system.
Now, with initial state of a two-qubit quantum system being (1, 0, 0, 0)T , the
circuit generating the desired quantum state can be designed as follows.
(a) Employing R_y(θ1) on the first qubit yields the state (cos(θ1/2), sin(θ1/2), 0, 0)^T, following the same logic as in Sect. 3.1.
(b) Employing a controlled-R_y(−θ2) gate with control on the first qubit and target on the second qubit performs the following operation.

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\!\left(-\tfrac{\theta_2}{2}\right) & 0 & -\sin\!\left(-\tfrac{\theta_2}{2}\right) \\ 0 & 0 & 1 & 0 \\ 0 & \sin\!\left(-\tfrac{\theta_2}{2}\right) & 0 & \cos\!\left(-\tfrac{\theta_2}{2}\right) \end{bmatrix} \begin{bmatrix} \cos\tfrac{\theta_1}{2} \\ \sin\tfrac{\theta_1}{2} \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos\tfrac{\theta_1}{2} \\ \sin\tfrac{\theta_1}{2}\cos\tfrac{\theta_2}{2} \\ 0 \\ -\sin\tfrac{\theta_1}{2}\sin\tfrac{\theta_2}{2} \end{bmatrix}.$$
(c) Employing a controlled-R_y(π + θ3) gate with control on the second qubit and target on the first qubit performs the following operation.

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cos\!\left(\tfrac{\pi}{2}+\tfrac{\theta_3}{2}\right) & -\sin\!\left(\tfrac{\pi}{2}+\tfrac{\theta_3}{2}\right) \\ 0 & 0 & \sin\!\left(\tfrac{\pi}{2}+\tfrac{\theta_3}{2}\right) & \cos\!\left(\tfrac{\pi}{2}+\tfrac{\theta_3}{2}\right) \end{bmatrix} \begin{bmatrix} \cos\tfrac{\theta_1}{2} \\ \sin\tfrac{\theta_1}{2}\cos\tfrac{\theta_2}{2} \\ 0 \\ -\sin\tfrac{\theta_1}{2}\sin\tfrac{\theta_2}{2} \end{bmatrix} = \begin{bmatrix} \cos\tfrac{\theta_1}{2} \\ \sin\tfrac{\theta_1}{2}\cos\tfrac{\theta_2}{2} \\ \sin\tfrac{\theta_1}{2}\sin\tfrac{\theta_2}{2}\cos\tfrac{\theta_3}{2} \\ \sin\tfrac{\theta_1}{2}\sin\tfrac{\theta_2}{2}\sin\tfrac{\theta_3}{2} \end{bmatrix}.$$
The output state, as observed, matches our desired statevector. The circuit thus demonstrated is shown in Fig. 2.
The generated circuit has been tested on several randomly generated 4-dimensional real arrays with elements in the range [0, 255], using the Qiskit library in Python 3.9.
As an example, when the above-mentioned procedure is employed on the pixel
data [0, 128, 192, 255], the following quantum circuit, as shown in Fig. 3, is produced.
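A minimal Qiskit sketch (not the author's exact code) reproducing the 2-qubit construction of Sect. 3.2 for this pixel data and checking the resulting statevector is shown below; it assumes the intermediate sine factors are nonzero.

```python
# Illustrative check of the Sect. 3.2 construction for the 4-pixel image
# [0, 128, 192, 255]; assumes the intermediate sine factors are nonzero.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

pixels = np.array([0, 128, 192, 255], dtype=float)
alpha = pixels / np.linalg.norm(pixels)               # normalized amplitudes

theta1 = 2 * np.arccos(alpha[0])                      # spherical angles
theta2 = 2 * np.arccos(alpha[1] / np.sin(theta1 / 2))
theta3 = 2 * np.arccos(alpha[2] / (np.sin(theta1 / 2) * np.sin(theta2 / 2)))

qc = QuantumCircuit(2)
qc.ry(theta1, 0)                  # step (a): Ry(theta1) on the first qubit
qc.cry(-theta2, 0, 1)             # step (b): controlled-Ry(-theta2)
qc.cry(np.pi + theta3, 1, 0)      # step (c): controlled-Ry(pi + theta3)

print(Statevector(qc).data.real)  # ~ alpha, up to numerical precision
```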
In comparison, as accessed on the day of this writing, the circuit proposed by the Qiskit tutorial website [2] for the 4-pixel image with pixel values [0, 128, 192, 255] consists of 5 quantum gates.
Suppose we have been given an arbitrary grayscale image. We can readily pad the image with zeros to make the number of pixels a power of 2. Let the number of pixels, after padding, be 2^n. After scaling and converting the pixel data into probability amplitudes of a possible quantum statevector, the n-qubit quantum circuit generating this arbitrary 2^n-dimensional statevector can be obtained as follows. The generation of a 3-qubit circuit for an 8-dimensional statevector is shown as an example along with each step.
(a) Obtain spherical angles from the statevector. With a 2^n-dimensional statevector, we will obtain (2^n − 1) angles. As an example, for an 8-dimensional statevector [c0, c1, c2, c3, c4, c5, c6, c7], we can obtain 7 spherical angles [α0, α1, α2, α3, α4, α5, α6] such that the statevector can be represented as [cos(α0/2), sin(α0/2)cos(α1/2), sin(α0/2)sin(α1/2)cos(α2/2), sin(α0/2)sin(α1/2)sin(α2/2)cos(α3/2), sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2)cos(α4/2), sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2)sin(α4/2)cos(α5/2), sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2)sin(α4/2)sin(α5/2)cos(α6/2), sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2)sin(α4/2)sin(α5/2)sin(α6/2)].
(b) If n = 1 or 2, employ the design techniques mentioned in Sects. 3.1 and 3.2, respectively. Otherwise, design an (n − 1)-qubit arbitrary statevector generator circuit, recursively, employing the first (n − 1) qubits of the system. This will involve the first (2^{n−1} − 1) spherical angles and will build up the first (2^{n−1} − 1) entries of the statevector completely, and the 2^{n−1}th entry partially. As an example, for the 3-qubit system, employ the design of Sect. 3.2 with the first (2^{n−1} − 1) = 3 angles, as shown in Fig. 4. The output of the partial circuit is [cos(α0/2), sin(α0/2)cos(α1/2), sin(α0/2)sin(α1/2)cos(α2/2), sin(α0/2)sin(α1/2)sin(α2/2), 0, 0, 0, 0]. Observe that c3 has been created partially.
(c) Employ an (n − 1)-qubit controlled R_y(α_{2^{n−1}−1}) gate, with controls on the first (n − 1) qubits and target on the last qubit, where α_{2^{n−1}−1} represents the 2^{n−1}th spherical angle. For the 3-qubit system, this will employ R_y(α3) on the entries 0 11…1 (the 2^{n−1}th entry) and 1 11…1 (the 2^n th entry), each with n − 1 trailing ones. The circuit in Fig. 5 has the output [cos(α0/2), sin(α0/2)cos(α1/2), sin(α0/2)sin(α1/2)cos(α2/2), sin(α0/2)sin(α1/2)sin(α2/2)cos(α3/2), 0, 0, 0, sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2)].
(d) Employ (n − 1) CNOT gates, one by one, on each of the first (n − 1) qubits. Each of these CNOT gates has its control on the last qubit. These (n − 1) gates will take the entry at 11…1 (the last entry, with n ones) to 1 00…0 (the (2^{n−1} + 1)th entry, with n − 1 trailing zeros). As a continuation of the example, for the 3-qubit system, the circuit in Fig. 6 has the output [cos(α0/2), sin(α0/2)cos(α1/2), sin(α0/2)sin(α1/2)cos(α2/2), sin(α0/2)sin(α1/2)sin(α2/2)cos(α3/2), sin(α0/2)sin(α1/2)sin(α2/2)sin(α3/2), 0, 0, 0].
(e) Employ another (n − 1)-qubit arbitrary statevector generator circuit with the last (2^{n−1} − 1) angles, recursively, on the first (n − 1) qubits. Each gate in this sub-circuit must have an additional control from the last qubit. The final 3-qubit circuit is shown in Fig. 7. It has the 2-qubit arbitrary statevector generator circuit of Fig. 2 with angles [α4, α5, α6], employed after the partial circuit of Fig. 6, each gate having an additional control from the last qubit.
The circuit of Fig. 7 eventually has the desired output statevector. The circuit thus
designed, has also been verified successfully with Qiskit library in Python 3.9, on
several randomly generated 8-dimensional arrays with values in range [0, 255].
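The angle-extraction step (a) can be checked numerically with a short sketch such as the following (illustrative only, assuming non-negative amplitudes):

```python
# Recover the (2^n - 1) spherical angles from an 8-dimensional normalized
# pixel vector and rebuild the amplitudes from those angles as a check.
import numpy as np

def spherical_angles(c):
    """Angles [a_0, ..., a_{d-2}] reproducing the unit vector c (non-negative)."""
    angles, running_sin = [], 1.0          # running product of sin(a_k/2)
    for k in range(len(c) - 1):
        ratio = np.clip(c[k] / running_sin, -1.0, 1.0) if running_sin > 1e-12 else 1.0
        a = 2 * np.arccos(ratio)
        angles.append(a)
        running_sin *= np.sin(a / 2)
    return angles

def amplitudes_from_angles(angles):
    """Inverse of spherical_angles: rebuild the unit vector from the angles."""
    c, running_sin = [], 1.0
    for a in angles:
        c.append(running_sin * np.cos(a / 2))
        running_sin *= np.sin(a / 2)
    c.append(running_sin)
    return np.array(c)

pixels = np.random.randint(0, 256, size=8).astype(float)
c = pixels / np.linalg.norm(pixels)
print(np.allclose(amplitudes_from_angles(spherical_angles(c)), c))  # True
```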
4 Conclusion
References
1. Yao XW, Wang H, Liao Z, Chen MC, Pan J, Li J, Zhang K, Lin X, Wang Z, Luo Z, Zheng W
(2017) Quantum image processing and its application to edge detection: theory and experiment.
Phys Rev X 7(3):031041
2. Quantum Edge Detection—QHED Algorithm on Small and Large Images. https://qiskit.org/
textbook/ch-applications/quantum-edge-detection.html
3. Benioff P (1980) The computer as a physical system: a microscopic quantum mechanical
Hamiltonian model of computers as represented by Turing machines. J Stat Phys 22:563–591
4. Feynman RP (1982) Simulating physics with computers. Int J Theor Phys 21(6/7):467–488
5. Shor PW (1994) Algorithms for quantum computation: discrete logarithms and factoring.
In: 35th IEEE annual symposium on foundations of computer science, pp 124–134, IEEE
6. Latorre JI (2005) Image compression and entanglement. Comput Sci
7. Venegas-Andraca SE, Bose S (2003) Storing, processing, and retrieving an image using
quantum mechanics. In: Proceedings of SPIE—the international society for optical engineering,
vol 5105
8. Venegas-Andraca SE, Ball JL (2010) Processing images in entangled quantum systems.
Quantum Inf Process 9:1–11
9. Le PQ, Dong F, Hirota K (2011) A flexible representation of quantum images for polynomial
preparation, image compression, and processing operations. Quantum Inf Process 10:63–84
10. Zhang Y, Lu K, Gao YH, Wang M (2013) NEQR: a novel enhanced quantum representation
of digital images. Quantum Inf Process 12:2833–2860
11. Sang JZ, Wang S, Li Q (2017) A novel quantum representation of color digital images. Quantum
Inf Process 16:14
12. Su J, Guo X, Liu C, Lu S, Li L (2021) An improved novel quantum image representation and
its experimental test on IBM quantum experience. Sci Rep 11(1):1–13
13. Grover L, Rudolph T (2002) Creating superpositions that correspond to efficiently integrable
probability distributions. arXiv preprint quant-ph/0208112
14. Soklakov AN, Schack R (2006) Efficient state preparation for a register of quantum bits. Phys
Rev A 73(1):012307
Trajectory Tracking Analysis
of Fractional-Order Nonlinear PID
Controller for Single Link Robotic
Manipulator System
Abstract Increasing demand for automation is being observed especially during the
recent scenarios like the Covid-19 pandemic, wherein direct contact of the healthcare
workers with the patients can be life-threatening. The use of robotic manipulators
facilitates in minimizing such risky interactions and thereby providing a safe environ-
ment. In this research work, a single link robotic manipulator (SLRM) system is taken,
which is a nonlinear multi–input–multi–output system. In order to address the limi-
tations like heavy object movements, uncontrolled oscillations in positional move-
ment, and improper link variations, an adaptive fractional-order nonlinear propor-
tional, integral, and derivative (FONPID) controller has been suggested. This aids
in the effective trajectory tracking of the performance of the SLRM system under
step input response. Further, by tuning the controller gains using genetic algorithm
optimization (GA) based on the minimum objective function (JIAE ) of the inte-
gral of absolute error (IAE) index, the suggested controller has been made more
robust for trajectory tracking performance. Finally, the comparative analysis of the
simulation results of proportional & integral (PI), proportional, integral, & deriva-
tive (PID), fractional-order proportional, integral, & derivative (FOPID), and the
suggested FONPID controllers validated that the FONPID controller has performed
better in terms of minimum JIAE and lower oscillation amplitude in trajectory tracking
of positional movement of SLRM system.
The links of SLRM systems are very flexible, due to which positional oscillations and unsustained vibrations can be observed. The SLRM system shown in Fig. 1 comprises a single link joint and a rotational base, modeled using Euler's Lagrangian technique by assessing the kinetic energy (KE) and potential energy (PE) of the system. The model equations of the SLRM system in state-space form are given in Eq. (1) [8, 9].
$$\begin{bmatrix} \dot{\theta} \\ \dot{\beta} \\ \ddot{\theta} \\ \ddot{\beta} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & \dfrac{C_s}{J_{eq}} & -\dfrac{\eta_g C_t C_m \eta_m N^2 + R_m \gamma}{J_{eq} R_m} & 0 \\ 0 & -\dfrac{C_s (J_{eq} + J_l)}{J_{eq} J_l} & \dfrac{\eta_g C_t C_m \eta_m N^2 + R_m \gamma}{J_{eq} R_m} & 0 \end{bmatrix} \begin{bmatrix} \theta \\ \beta \\ \dot{\theta} \\ \dot{\beta} \end{bmatrix} \qquad (1)$$
where θ is the angle of rotation; β is the angle of oscillation; θ̇ is the rate of change of the angular rotation; β̇ is the rate of change of the oscillation angle; Cs = 1.3792 is the stiffness constant; Ct = 0.0069 is the thermal constant; Jeq = 0.00208 kg m² is the moment of inertia without load; Jl = 0.000410 kg m² is the moment of inertia of the link; ηg = 0.90 and ηm = 0.69 are the efficiencies of the gearbox and motor, respectively; Cm = 0.0078 V/(rad/s) is the back-emf constant; N = 70 is the gearbox ratio; γ = 0.004 N m/(rad/s) is the damping coefficient; and Rm = 2.6 Ω is the armature resistance [8, 9].
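For illustration, and assuming the state matrix as reconstructed in Eq. (1) above, the listed constants can be plugged in to inspect the open-loop poles of the linearized model:

```python
# Numerical sketch based on the reconstructed state matrix of Eq. (1) and the
# physical constants listed above; prints the open-loop poles of the model.
import numpy as np

Cs, Ct, Cm = 1.3792, 0.0069, 0.0078
Jeq, Jl = 0.00208, 0.000410
eta_g, eta_m, N, gamma, Rm = 0.90, 0.69, 70, 0.004, 2.6

damp = (eta_g * Ct * Cm * eta_m * N**2 + Rm * gamma) / (Jeq * Rm)
A = np.array([
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, Cs / Jeq, -damp, 0],
    [0, -Cs * (Jeq + Jl) / (Jeq * Jl), damp, 0],
])
print(np.linalg.eigvals(A))   # open-loop poles of the linearized SLRM model
```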
The discovery of fractional calculus has enabled switching from long-established models and controllers to those based on noninteger-order differential equations.
where uFONPID is the controller output, i.e., the control signal in V; kp, ki, and kd are the proportional, integral, and derivative gains of the suggested FONPID controller; λ and μ are the fractional-order integral (FOI) and fractional-order derivative (FOD) operators, respectively; e(t) is the error signal; f(h) = cosh(kn e(t)) is a nonlinear hyperbolic gain function for the proportional and integral terms; and kn is a positive gain for the nonlinear hyperbolic function.
The logic behind the nonlinear hyperbolic function [14] is that if the total of errors
e(t) and manipulator system output is big, the nonlinear function is considerably
high, resulting in greater corrective actions that quickly direct the output toward the
intended trajectory. Hence, the incorporation of the FONPID controller [7, 15] has
been suggested for such a purpose which lessens the error e(t) as well as the flexibility
to changes in robot output. The combined error signal and the actual robot output
are input into a nonlinear function in the supplied loop.
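A minimal numerical sketch of the described control law is given below (not the authors' MATLAB/Simulink implementation); the fractional-order terms are approximated with a Grünwald-Letnikov sum rather than the ORA filter used in the paper, and the gains are the GA-tuned FONPID values reported later in Table 1.

```python
# Sketch of the FONPID law: f = cosh(kn*e) scales the P and I terms, while the
# I and D terms use fractional orders lambda and mu (Grunwald-Letnikov sum).
import numpy as np

def gl_fractional(signal, alpha, h):
    """Grunwald-Letnikov approximation of D^alpha applied to a sampled signal
    (alpha < 0 gives a fractional integral); returns the latest output value."""
    n = len(signal)
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):                      # recursive GL weights
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return h ** (-alpha) * np.dot(w, signal[::-1])

def fonpid_output(err_hist, kp, ki, kd, lam, mu, kn, h):
    """Control signal for the current time step from the error history."""
    e = err_hist[-1]
    f = np.cosh(kn * e)                        # nonlinear hyperbolic gain
    i_term = gl_fractional(err_hist, -lam, h)  # fractional-order integral
    d_term = gl_fractional(err_hist, mu, h)    # fractional-order derivative
    u = f * (kp * e + ki * i_term) + kd * d_term
    return np.clip(u, -5.0, 5.0)               # +/-5 V saturation limit

# toy usage with the GA-tuned FONPID gains of Table 1 and a 1 ms step size
hist = np.full(200, 0.1)                       # constant 0.1 rad error history
print(fonpid_output(hist, kp=4.2, ki=0.5, kd=0.83, lam=0.15, mu=0.9, kn=3.23, h=1e-3))
```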
For the implementation of the FOI and FOD, a fifth-order Oustaloup Recursive Approximation (ORA) filter is considered for the proper distribution of poles and zeros, as shown in Eq. (3), with a lower frequency of 0.01 rad/s and an upper frequency of 100 rad/s.

$$D_t^{\lambda,\mu} = \begin{cases} \dfrac{d^{\mu}}{dt^{\mu}}, & \Re(\mu) > 0 \\ 1, & \Re(\lambda,\mu) = 0 \\ \dfrac{d^{-\lambda}}{dt^{-\lambda}}, & \Re(\lambda) < 0 \end{cases} \qquad (3)$$
The key to building an effective control scheme is to tune a controller. The
parameters tuned using a nature-inspired algorithm yield higher performance than
parameters tuned with traditional algorithms. Because the system requires precise
tracking with little fluctuation in control effort, a machine learning control optimiza-
tion strategy is required. For the precise and effective positional movement of the
link, GA optimization [16] is utilized in order to tune the controller gains based on the
minimum objective function (JIAE ) of the IAE as given in Eq. (4). The genetic algo-
rithm is the most common type of evolutionary algorithm (EA) that solves optimiza-
tion problems by maintaining approaches triggered by natural processes including
selection, inheritance, mutation, and crossover. The closed-loop control system with
a tuned controller structure is showcased in Fig. 3.
$$J_{\mathrm{IAE}} = \int \lvert e(t) \rvert \, dt \qquad (4)$$
Fig. 3 Closed-loop control configuration of the whole system with tuned FONPID controller
new candidate solutions (kp, ki, kd, λ, μ, kn) are generated from the best solutions found in the previous loops using the genetic operators (crossover and mutation). Steps 3 through 6 are repeated until the best FONPID controller coefficients are obtained.
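A compact GA sketch of this tuning loop (illustrative only; the closed-loop IAE evaluation is a hypothetical placeholder, and the authors used MATLAB's GA optimization) could look as follows:

```python
# GA sketch for tuning the six FONPID parameters by minimizing J_IAE.
# simulate_iae is a hypothetical stand-in for a closed-loop SLRM simulation.
import numpy as np

rng = np.random.default_rng(0)
LOW  = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])    # [kp, ki, kd, lam, mu, kn]
HIGH = np.array([10.0, 10.0, 1.0, 1.0, 1.0, 5.0])  # assumed search bounds

def simulate_iae(params):
    # Placeholder: replace with a closed-loop simulation of Eq. (1) under the
    # FONPID law that returns the integral of |e(t)| dt.
    return float(np.sum((params - HIGH / 2) ** 2))

pop = rng.uniform(LOW, HIGH, size=(30, 6))          # population of 30 candidates
for gen in range(50):
    fitness = np.array([simulate_iae(p) for p in pop])
    parents = pop[np.argsort(fitness)[:10]]         # selection: keep the best 10
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(6) < 0.5, a, b)       # uniform crossover
        child = child + rng.normal(0, 0.05, 6) * (HIGH - LOW)  # mutation
        children.append(np.clip(child, LOW, HIGH))
    pop = np.vstack([parents, children])

best = pop[np.argmin([simulate_iae(p) for p in pop])]
print(dict(zip(["kp", "ki", "kd", "lam", "mu", "kn"], best.round(3))))
```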
In this section, the trajectory tracking of the single link robotic manipulator system using the suggested FONPID controller is simulated and tested for a step input, with regard to effective and proper control of the positional movement of the link. The simulations are carried out in the MATLAB/Simulink R2016b environment. The fourth-order Runge-Kutta method is used for solving the differential equations in the model of this system. A step size of 1 ms is used in the simulation analysis, with the control signal saturation limit ranging between −5 V and +5 V.
The trajectory tracking performance of the suggested FONPID controller is studied in comparison with its traditional analogues, the FOPID and the classical PID and PI controllers, based on the minimum objective function value of GA optimization considering the IAE performance index. The convergence curve is shown in Fig. 5, and the corresponding IAE values are showcased in Table 1, validating the performance of the PI, PID, FOPID, and the suggested FONPID controllers. The controller gains optimized using the GA approach are showcased in Table 2. The bar chart in Fig. 4 represents the objective function values for the PI, PID, FOPID, and FONPID controllers.
Table 1 Gains for PI, PID, FOPID, and FONPID controllers tuned using the IAE performance index

Controller   KP     KI      KD    λ     μ     K0
PID          9.99   0.50    0.48  –     –     –
PI           5.08   −9.60   –     –     –     –
FOPID        9.26   0.50    0.92  0.19  0.91  –
FONPID       4.20   0.50    0.83  0.15  0.90  3.23
The control goal is to track the reference trajectory of the positional movement of the single link robotic manipulator system with the various controllers, namely PI, PID, FOPID, and FONPID, as showcased in Fig. 6. The corresponding error signal is shown in Fig. 7, and the control signal, i.e., the controller output, is showcased in Fig. 8. The step response of the rate of change of rotational position for the PI, PID, FOPID, and FONPID controllers incorporated into the system is showcased in Fig. 9. The step responses of the oscillational position and of the rate of change of oscillational position for the PI, PID, FOPID, and FONPID controllers are showcased in Figs. 10 and 11, respectively.
adaptive and robust. The suggested FONPID controller achieved 32.9%, 23.8%, and 1.7% improvements over the proportional & integral (PI), the proportional, integral, & derivative (PID), and the fractional-order proportional, integral, & derivative (FOPID) controllers, respectively. Hence, the suggested FONPID controller showcased more resilient, effective, and better performance than the PI, PID, and FOPID controllers based on
Fig. 10 Response of oscillational position for PI, PID, FOPID, and FONPID controllers
minimum JIAE function values. In the future, this work can be further extended by
modeling complex multi-link robotic manipulators controlled with various intelligent
control techniques such as fuzzy and neural networks.
References
1. Kumar J, Gupta D, Goyal V (2022) Nonlinear PID controller for three-link robotic manipu-
lator system: a comprehensive approach BT. In: Proceedings of international conference on
communication and artificial intelligence. Presented at 2022
2. Agrawal A, Goyal V, Mishra P (2021) Comparative study of fuzzy PID and PID controller
optimized with spider monkey optimization for a robotic manipulator system. Recent Adv
Comput Sci Commun (Formerly Recent Patents Comput Sci) 14
3. Agrawal A (2021) Analysis of efficiency of fractional order technique in a controller for a
complex nonlinear control process BT. In: Proceedings of international conference on big data,
machine learning and their applications. Presented at 2021
4. Boulkroune A, M’saad M (2012) On the design of observer-based fuzzy adaptive controller
for nonlinear systems with unknown control gain sign. Fuzzy Sets Syst 201:71–85. https://doi.
org/10.1016/j.fss.2011.12.005
5. Hultmann Ayala HV, dos Santos Coelho L (2012) Tuning of PID controller based on a multi-
objective genetic algorithm applied to a robotic manipulator. Expert Syst Appl 39:8968–8974.
https://doi.org/10.1016/j.eswa.2012.02.027
6. Renuka K, Bhuvanesh N, Reena Catherine J (2021) Kinematic and dynamic modelling and PID
control of three degree-of-freedom robotic arm. In: Kumaresan G, Shanmugam NS, Dhinakaran
V (eds) Advances in materials research. Springer Singapore, Singapore, pp 867–882
7. Rawat HK, Goyal V, Kumar J (2022) Comparative performance analysis of fractional-order
nonlinear PID controller for NPK model of nuclear reactor. In: 2022 2nd International confer-
ence on power electronics & IoT applications in renewable energy and its control (PARC), pp
1–6. https://doi.org/10.1109/PARC52418.2022.9726661
8. Jayaswal K, Palwalia DK, Kumar S (2020) Analysis of robust control method for the flexible
manipulator in reliable operation of medical robots during COVID-19 pandemic. Microsyst
Technol 9. https://doi.org/10.1007/s00542-020-05028-9
9. Jayaswal K, Palwalia DK, Kumar S (2021) Performance investigation of PID controller in
trajectory control of two-link robotic manipulator in medical robots. J Interdiscip Math 24:467–
478. https://doi.org/10.1080/09720502.2021.1893444
10. Gupta D, Goyal V, Kumar J. (2019) An optimized fractional order PID controller for ıntegrated
power system. Presented at 2019. https://doi.org/10.1007/978-981-13-8461-5_76
11. Faieghi MR (2011) On fractional-order PID design. Presented at 2011. https://doi.org/10.5772/
22657
12. Goyal V, Mishra P, Kumar V (2018) A robust fractional order parallel control structure for
flow control using a pneumatic control valve with nonlinear and uncertain dynamics. Arab J
Sci Eng. https://doi.org/10.1007/s13369-018-3328-6
13. Agarwal A, Mishra P, Goyal V (2021) A novel augmented fractional-order fuzzy controller for
enhanced robustness in nonlinear and uncertain systems with optimal actuator exertion. Arab
J Sci Eng 46:10185–10204. https://doi.org/10.1007/s13369-021-05508-8
14. Agrawal A, Goyal V, Mishra P (2019) Adaptive control of a nonlinear surge tank-level
system using neural network-based PID controller BT. In: Applications of artificial ıntelligence
techniques in engineering. Presented at 2019
15. Kumar J (2021) Design and analysis of nonlinear PID controller for complex surge tank system
BT. In: Proceedings of ınternational conference on communication and artificial intelligence.
Presented at 2021
16. Deb K (1999) An introduction to genetic algorithms. Sadhana 24:293–315. https://doi.org/10.
1007/BF02823145
PCA-Based Machine Learning Approach
for Exoplanet Detection
Abstract The search for planets capable of sustaining life has been taken to a whole new level with NASA's Kepler mission. The mission has successfully discovered around 4000 planets; however, the task of manually evaluating this data is cumbersome and labor intensive and calls for more efficient methods of discovering exoplanets that remove false positives and errors. The goal of this project is to utilize machine learning algorithms to classify stars as exoplanet hosts using the data collected by the Kepler satellite. To this end, we plan to use preprocessing methods and apply suitable classification algorithms to build an accurate and optimal classifier, increasing the proficiency of the process.
1 Introduction
The everlasting curiosity of humans to know more about the world around them has
been a key factor in the advancement of civilization. Since ancient times, humans have
wondered where the edge of the world might be, which led pioneers such as Columbus
to set sail into an unknown horizon. This human penchant to discover more and
know more has now taken the form of space exploration. Even though the Universe is vast, we have not yet discovered any other life form in outer space, and this seemingly paradoxical situation has baffled scientists and astronomers for decades as they continuously look for new worlds where other life forms might be thriving. In this quest, one of the most important factors is the discovery of exoplanets, which could possibly host new life forms. A planet that orbits a star beyond our solar system is known as an exoplanet. In the hope of finding an exoplanet with conditions similar to those of Earth, ultimately supporting life, humankind took a huge step forward when NASA launched the Kepler Mission. The Kepler Mission was the first of its kind, capable of finding exoplanets smaller than or equal in size to the Earth orbiting a star. When a planet crosses its star, it momentarily obstructs the light emitted by the star, so a dip in the observed intensity of the star's light occurs, as shown in Fig. 1. This event is known as a 'transit' and can also be observed from the Earth when Venus or Mercury passes in front of the Sun. The Kepler Space Observatory makes use of this transit method by observing a solar system for a long time and looking for variations in the star's flux; it accurately measures the brightness of the star. Astronomers use this data to determine whether a regular transit exists, and if it does, it is evidence that a planet may be orbiting the star. Once a planet is discovered, other aspects such as the size of the planet, its orbit, and its star are observed and calculated. These values help in knowing whether the newly discovered planet is capable of hosting life forms. Figure 1 shows the light intensity observed during a planet's transit.
Manually interpreting this data is a complex, time-consuming task and is subject to human error. Moreover, further planet-hunting missions such as TESS and PLATO are underway, and with advanced technology they provide more comprehensive data. This calls for progressive data analysis methods. Hence, this project aims to simplify and accelerate the process of discovering exoplanets with the use of machine learning techniques. The data amassed by Kepler over more than a decade has been made available to the public by NASA to let researchers carry on making discoveries. In this project, we apply data preprocessing methods such as normalization and Principal Component Analysis (PCA) to the dataset and then apply machine learning models to predict whether an object is an exoplanet from the given data points. Highly efficient prediction models will be extremely helpful in determining the general characteristics of exoplanets as recorded by Kepler and in checking whether the exoplanets confirmed in the literature are supported by the measurements of the satellite.
2 Literature Review
3 Methodology
The following are the steps that we have followed for this research work (Fig. 2).
4 Implementation
4.1 Normalization
5 Experimental Results
5.1 Normalization
In Fig. 3, the original dataset is presented; a few records from the whole dataset are extracted and printed on the desktop console. In Fig. 4, the dataset after applying the normalization techniques is presented; every feature is transformed onto the same scale so that each contributes equally. We have again extracted and presented some records from the whole dataset.
In Fig. 5, we provide a graph-based visualization of the results of the Principal Component Analysis (PCA) applied to the provided dataset. Principal Component 1 versus Principal Component 2 is plotted on the min-max scaled data. The results show the relation between the first two principal components, and the points appear widely scattered.
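To make the preprocessing step concrete, the sketch below applies min-max normalization followed by a two-component PCA and plots PC1 versus PC2 with scikit-learn. This is only an illustrative sketch: the file name kepler.csv and the LABEL column are hypothetical placeholders, since the exact layout of the Kaggle dataset is not given in this excerpt.

```python
# Minimal normalization + PCA sketch (assumed file and column names).
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

df = pd.read_csv("kepler.csv")          # hypothetical file name
y = df["LABEL"]                         # hypothetical target column
X = df.drop(columns=["LABEL"])          # flux measurements

# Min-max scaling so every feature contributes on the same [0, 1] range
X_scaled = MinMaxScaler().fit_transform(X)

# Project onto the first two principal components and plot PC1 vs PC2
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, s=5)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
```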
The following are the results (Fig. 6) obtained from training SVM classifiers on the exoplanet data. The output screenshot presents the values of the various statistical parameters used to measure the model performance. Figure 6 shows the confusion matrix generated by the model evaluation code module when training with SVC.
Fig. 6 The confusion matrix of the results obtained by training using SVC
In Fig. 7, we present the confusion matrix generated by the model evaluation code module when training with the Random Forest Classifier.
In a given dataset, not all features are equally important; some features carry greater importance than others. Figure 7 also shows the relative importance of the PCA-derived features as estimated by the Random Forest Classifier.
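A minimal sketch of the classification and evaluation stage described above is given below; X_pca and y are the arrays produced in the previous sketch, and the hyperparameters shown are illustrative defaults rather than the values used by the authors.

```python
# Classification sketch: SVC and Random Forest on the PCA-reduced data.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.3, random_state=42, stratify=y)

svc = SVC(kernel="rbf").fit(X_train, y_train)
print(confusion_matrix(y_test, svc.predict(X_test)))      # cf. Fig. 6
print(classification_report(y_test, svc.predict(X_test)))

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(confusion_matrix(y_test, rf.predict(X_test)))        # cf. Fig. 7
print(rf.feature_importances_)   # relative importance of the PCA features
```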
6 Conclusion
References
Abstract The convolutional neural network (CNN) architecture has shown remarkable success in image classification and segmentation. Its popularity has increased rapidly due to various factors such as the exponential growth in computational resources and the availability of benchmark datasets, supporting libraries, and open-source software. The efficiency of a CNN architecture depends mainly on the complexity of the architecture, the availability of datasets, and hyperparameter selection. However, due to the huge number of parameters in a CNN architecture, its selection has remained completely ad hoc in past works. In this article, a novel encoding technique is proposed that can represent complex CNN architectures effectively. The article defines basic building blocks to represent a CNN architecture, namely the genesis block, transit block, agile block, and output block. This encoding structure is used to generate a dynamic-length chromosome structure that is initialized using evolutionary algorithms. A comparative analysis is also presented that compares its effectiveness with existing encoding representations on the basis of the number of encoding parameters, training cost, and efficiency.
1 Introduction
The convolutional neural network (CNN) is a special type of deep neural network
that is specially designed for image classification and segmentation [1]. The CNN
architecture is a layered combination of convolutional layer, pooling layer, and fully
connected layer. In the convolutional layer, input images are passed pixel by pixel and
multiplied with an adaptive weight matrix known as filters. A number of filters have
been employed to capture different features of the input dataset. In each convolutional
layer, the number of filters and dimensions vary according to the input image size
and the complexity of the CNN network. The output of the convolutional layer is passed on to the next convolutional layer or a pooling layer. In the pooling layer, the output of the convolutional layer is optimized by reducing redundant or less useful features using different pooling operations, such as max, mean, and min. After multiple convolution and pooling operations, the data is passed to a fully connected layer, in which multidimensional data is converted into a one-dimensional layer based on the predicted output classes. In training a CNN architecture, the feed-forward process is followed iteratively with a fixed set of hyperparameters and adaptive learning parameters. In each iteration, the input image is passed through the network and, based on the output, an average learning error is calculated and propagated through a feedback mechanism to adapt the filter and weight values. As the hyperparameters are numerous and their values are not correlated, we need to check many combinations before predicting a suitable CNN architecture. The hyperparameter tuning problem is NP-hard, and finding an optimum solution takes exponential time. Based on the literature, we observed that evolutionary algorithms and RNN-based methods are helpful in solving this problem. RNN-based models [2] are efficient at solving hyperparameter tuning problems, but they require huge computation power. Evolutionary algorithms [3, 4] are more suitable in a computationally constrained environment with comparable effectiveness. Another limitation of manual architecture selection is that it requires a good amount of knowledge of CNN design as well as of the problem domain. To solve both problems, evolutionary algorithms help to design the architecture and select the hyperparameters [5, 6] automatically. In this paper, we propose an encoding scheme that maps the CNN architecture and is passed to the evolutionary algorithm as input parameters.
The remaining content of the paper is organized as follows. Section 2 presents various existing techniques with their methodology and findings. The proposed encoding scheme and the contribution of the work are elaborated in Sect. 3, followed by a discussion of results in Sect. 4. Section 5 concludes the work with its main outcomes.
2 Related Work
a complex task to define the genetic operations that mutate the architecture. Architectures such as CNN-GA [8], CGP-CNN [9], and AE-CNN [4] use variable-length encoding representations. The effectiveness of an encoding representation is evaluated based on computation cost, accuracy, number of parameters, and adaptability. If an encoding scheme is represented with only a few parameters, it restricts the exploration of the architecture space. If we represent each individual training parameter as one unit, the number of possible chromosomes becomes huge, which increases the computation cost.
Evolutionary algorithms are used to generate improved CNN architectures from an input encoding scheme, and efficiency is compared on benchmark datasets such as CIFAR-10 and CIFAR-100 [10]. Table 1 presents a comparison of the existing encoding schemes with their representations; we compared them based on representation, decoded architecture, and accuracy. A comparative analysis of the existing encoding techniques is shown in Fig. 1, and the methodology with performance on the CIFAR-10 dataset using a genetic algorithm is presented in Table 1.
This section details the proposed encoding scheme to represent a CNN topology. The proposed scheme employs a variable-length encoding that represents the depth as well as the width of the architecture. The scheme comprises four basic building blocks, as shown in Fig. 2. A few bit strings represent each building block, and the concatenated structure represents the complete CNN architecture. The genesis block (a combination of a convolutional block and a pooling block) is used to pass the input image size. To reduce the feature map and the dimension of the input pixels, a transit block is introduced that uses a 1 × 1 convolution and a pooling operation. The value of the pooling operation is defined in the range of 0-1: if the value is less than 0.5, the max pool operation is used; otherwise, the mean pool operation is used. The agile block, working on the concept of dense connections, uses multiple convolutional blocks with the same learning parameters. These convolutional blocks are connected using skip connections to reduce the number of parameters. An agile block is the combination of five elements: operation, filter size, number of filters, depth, and the interconnection of the different convolutional layers. In the end, a fully connected block is introduced to flatten the layer into one-dimensional data and convert it into the output layer with the number of classes.
The main advantage of the proposed encoding scheme is that it can represent an architecture as a combination of two different layers. It makes the representation simple, and one can increase the depth of the architecture easily. Also, due to fewer parameters, one can define the different evolutionary operations, such as mutation and crossover, efficiently. The scheme also supports increasing the complexity within a block: in the agile block, it can generate the filter size and depth randomly and thereby increase complexity. The proposed scheme is a hybrid encoding that utilizes binary as well as decimal representations. The encoding scheme offers the maximum choice of exploration in depth and width as well as faster optimization. We pass our initialized encoding to the evolutionary algorithms to optimize toward a better architecture; the maximum number of iterations is fixed at 50 as limited computation power is available.
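To make the block structure concrete, the following sketch shows one possible way such a variable-length chromosome could be laid out in Python. The field names and value ranges are our assumptions based on the description above, not the authors' actual implementation.

```python
# Illustrative chromosome layout for the proposed encoding (assumed fields).
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class GenesisBlock:          # convolution + pooling applied to the raw input
    filters: int
    kernel_size: int

@dataclass
class TransitBlock:          # 1 x 1 convolution + pooling to shrink features
    pool_code: float         # < 0.5 -> max pooling, otherwise mean pooling

@dataclass
class AgileBlock:            # densely connected convolutions with skip links
    filters: int
    kernel_size: int
    depth: int               # number of convolutional layers in the block

@dataclass
class Chromosome:
    blocks: List[object] = field(default_factory=list)

def random_chromosome(max_agile: int = 4) -> Chromosome:
    blocks = [GenesisBlock(filters=random.choice([16, 32, 64]), kernel_size=3)]
    for _ in range(random.randint(1, max_agile)):
        blocks.append(AgileBlock(filters=random.choice([32, 64, 128]),
                                 kernel_size=3, depth=random.randint(2, 4)))
        blocks.append(TransitBlock(pool_code=random.random()))
    # the output/fully connected block is appended when the chromosome is decoded
    return Chromosome(blocks)
```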
Fig. 1 Block diagram of the encoding representation of CNN architecture in the literature reviewed;
a GACNN [7], b CGP-CNN [8], c CNN-GA [2], d genetic CNN [5]
A novel encoding scheme for representing CNN architectures is proposed which can effectively represent a complex architecture with a variable number of parameters.
The study also presents a decisive comparison among various existing encoding schemes that can help researchers choose the most suitable method for their application-specific projects. The comparative analysis highlights the merits and demerits of the existing schemes through multiple parameters such as accuracy and computational power. The authors also present an in-depth analysis based on the number of parameters used to represent the input chromosomes, their initialization methods, the operators used to explore different combinations, and the fitness function used to stop the search.
Fig. 3 Comparison of various encoding schemes under training in CIFAR-10 dataset using genetic
algorithm; a accuracy achieved, b number of parameters used, c error rate and training cost
algorithms. The proposed scheme is adaptive and versatile in nature. A simple repre-
sentation offers a better understanding of the complex network. Adaptive behavior
scales up the application domain of the proposed scheme.
5 Conclusion
The study proposed a novel encoding method that is used to represent complex CNN architectures. With this encoding, we can represent existing architectures as well as generate new architectures using an available dataset. It covers both the depth and the width of the architecture, which reduces the number of parameters and helps to identify comparable architectures with a significant saving in computation power at comparable accuracy. This encoding is passed to evolutionary algorithms to design new architectures automatically using different datasets. In future work, we can use evolutionary algorithms for hyperparameter tuning together with this encoding representation.
References
1. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer
vision, pp 1440–1448
2. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
3. Sun Y, Xue B, Zhang M, Yen GG (2019) Evolving deep convolutional neural networks for
image classification. IEEE Trans Evol Comput 24(2):394–407
4. Suganuma M, Kobayashi M, Shirakawa S, Nagao T (2020) Evolution of deep convolutional
neural networks using cartesian genetic programming. Evol Comput 28(1):141–163
5. Sinha T, Haidar A, Verma B (2018) Particle swarm optimization based approach for finding
optimal values of convolutional neural network parameters. In: 2018 IEEE congress on
evolutionary computation (CEC), pp 1–6
6. Serizawa T, Fujita H (2020) Optimization of convolutional neural network using the linearly
decreasing weight particle swarm optimization. arXiv preprint arXiv:2001.05670
7. Xie L, Yuille A (2017) Genetic CNN. In: Proceedings of the IEEE international conference on
computer vision, pp 1379–1388
8. Sun Y, Xue B, Zhang M, Yen GG, Lv J (2020) Automatically designing CNN architectures
using the genetic algorithm for image classification. IEEE Trans Cybern 50(9):3840–3854
9. Suganuma M, Shirakawa S, Nagao T (2017) A genetic programming approach to designing
convolutional neural network architectures. In: Proceedings of the genetic and evolutionary
computation conference, pp 497–504
10. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
11. Esfahanian P, Akhavan M (2019) GACNN: training deep convolutional neural networks with
genetic algorithm. arXiv preprint arXiv:1909.13354
12. Joshi D, Mishra V, Srivastav H, Goel D (2021) Progressive transfer learning approach for
identifying the leaf type by optimizing network parameters. Neural Process Lett 53(5):3653–
3676
13. Wang Y, Zhang H, Zhang G (2019) cPSO-CNN: an efficient PSO-based algorithm for fine-
tuning hyper-parameters of convolutional neural networks. Swarm Evol Comput 49:114–123
14. Loussaief S, Abdelkrim A (2018) Convolutional neural network hyper-parameters optimization
based on genetic algorithms. Int J Adv Comput Sci Appl 9(10):252–266
15. Joshi D, Singh TP, Sharma G (2022) Automatic surface crack detection using segmentation-
based deep-learning approach. Eng Fract Mech 268:108467
Bird Species Recognition Using Deep
Transfer Learning
1 Introduction
The study of birds has contributed much to both the theoretical and practical aspects of biology. In classifying birds, most environmentalists have historically relied upon structural characteristics to infer evolutionary relationships. To help them recognize bird species, a machine learning application model is built to assist them. To classify bird species with greater accuracy, deep learning with the transfer learning model ResNet50 is used.
To predict the bird species, an interface is developed for extracting information from bird images; the pre-trained ResNet50 model is used and a few dense layers are added to it. The number of output neurons corresponds to the number of bird classes. A dataset of birds was taken from Kaggle, and the model is trained on this dataset. The output of the machine learning model is an array of class probabilities, and the class with the highest probability is the output. Images can be taken in various situations: for example, an image may be captured in dull light, or the bird might appear small in the image. To overcome the problem of building the model from scratch, the concept of transfer learning is used.
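A minimal sketch of this transfer-learning setup, using the Keras ResNet50 weights with a few dense layers on top, is shown below; the image size, number of classes, and added layer widths are assumptions, since the paper does not list them in this excerpt.

```python
# Transfer-learning sketch (assumed input size, class count, and layer widths).
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

NUM_CLASSES = 200  # placeholder: set to the number of bird classes in the dataset

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False   # reuse pre-trained features instead of training from scratch

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one neuron per bird class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```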
The proposed system aims to maximize the accuracy in determining the bird
species in an unconstrained environment from the images and to overcome the
problem of class imbalance within the bird images.
2 Relevant Study
The existing system [1] was developed using convolutional neural networks with skip connections. These skip connections provide the output of previous layers as an input to the current layer. This system has to be built from scratch. In the existing system [2], SVM with decision trees has been used; it suffers from an error accumulation problem, and this system also has to be built from scratch. The existing system [3] converts grayscale images into autographs and makes predictions based on score sheet analysis. The existing system [4] uses a color segmentation technique to remove the background elements and locate the bird [5]. Later, they use the histogram bin size to recognize the bird species; however, minute variations cannot be differentiated using these histogram bin sizes.
In these existing systems, there is the problem of building a new machine learning architecture, and the weights are initialized with random values, which take time to reduce the loss. The proposed technique solves the problem of starting from scratch and using random weights to build a model.
3 Proposed Method
ResNet
A feed-forward network with a single layer can represent any function if it has enough capacity. However, such a layer can be quite vast, and the network might be prone to overfitting the data. As a result, there is broad agreement among researchers that network designs have to become deeper and more complicated. ResNet's basic concept is to provide an "identity shortcut connection" that bypasses one or more layers, as indicated in the diagram below. These are called skip connections, and they are able to overcome the vanishing gradient problem.
Residual Block
ResNets are made up of residual blocks, as shown in Fig. 1. It can be noticed that there is a direct link that bypasses certain layers (which may vary depending on the model) in between. This term then goes through the activation function f(), and H(x) is considered the output.
H(x) = f(wx + b)
H(x) = f(x) + x
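The identity-shortcut idea expressed by H(x) = f(x) + x can be written as a small Keras block, as sketched below; this is a generic residual block, not the exact bottleneck block used inside the pre-trained ResNet50.

```python
# Generic residual block sketch implementing H(x) = f(x) + x.
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size=3):
    # assumes x already has `filters` channels so the addition is valid
    shortcut = x                                   # identity shortcut connection
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                # H(x) = f(x) + x
    return layers.Activation("relu")(y)
```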
4 Experimental Results
Table 1 Comparing accuracy values

Model      Training accuracy (%)   Validation accuracy (%)
ResNet50   98.73                   96.11
VGG16      87.47                   87.89
5 Conclusion
Pre-trained models such as ResNet50 and VGG16 are employed as part of the transfer learning process. ResNet50 provides better results than the other model when it comes to predicting bird species; with this model, an accuracy of 98.7% can be achieved. The proposed system outperforms some of the existing systems in predicting bird species. However, whenever a new bird species is included, the model has to be re-trained, which is a time-consuming task. The application could be changed so that images can be uploaded directly from the camera instead of from a folder. The effectiveness of recognition is hampered by poor image quality.
References
1. Huang Y-P, Basanta H (2019) Bird image retrieval and recognition using a deep learning platform.
IEEE Access 7:66980–66989. https://doi.org/10.1109/ACCESS.2019.2918274
2. Qiao B, Zhou Z, Yang H, Cao J (2017) Bird species recognition based on SVM classifier and
decision tree. In: 2017 First international conference on electronic instrumentation & information
systems (EIIS)
3. Gavali P, Banu JS (2020) Bird species identification using deep learning on GPU platform. In:
2020 International conference on emerging trends in information technology and engineering
(ic-ETITE)
4. Marini A, Facon J, Koerich AL (2013) Bird species classification based on color features. In:
2013 IEEE international conference on systems, man, and cybernetics
5. Cox DTC, Gaston KJ (2015) Likeability of garden birds: importance of species knowledge &
richness in connecting people to nature. PLoS ONE 10
6. Ragib KM, Shithi RT, Haq SA, Hasan M, Sakib KM, Farah T (2020) Pakhi Chini: automatic bird
species identification using deep learning. In: 2020 Fourth world conference on smart trends in
systems, security and sustainability (WorldS4)
CNN-Based Model for Deepfake Video
and Image Identification Using GAN
Abstract Deepfakes are new-age tools that automate the synthesis and detection of computer-altered videos through GANs. Studies are being done to detect deepfakes and to study their impact on social media and on human lives. In this paper, we investigate deepfake (DF) technologies such as MTCNN and ResNext-v1 classification models to automate the task of deepfake detection using datasets from varied sources covering a diverse range of people. We also present a deep learning-based technique that can successfully distinguish AI-created counterfeit recordings from genuine recordings. It is critically important to develop technology that can spot fakes, so that deepfakes can be recognized and kept from spreading over the Web. Our strategy works by examining the facial zones and their surrounding pixels, splitting the video into frames, extracting the features with a ResNext-v1 CNN, and utilizing MTCNN to catch the temporal irregularities between frames introduced by GANs during the reconstruction of the pixels. Our aim is to make an audio-less deepfake detection system using ML and DL techniques to curb the spread of misinformation.
Hitesh Kumar Sharma and Tanupriya Choudhury contributed equally to the work.
1 Introduction
The free and open access to enormous amount of public data through various social
media Websites and e commerce Websites, along with the quick advancements of
deep learning strategies specifically generative adversarial networks, have prompted
the age of deepfakes content in this time of providence of news through social
media. Deepfake videos which include biometric information created by digitally
manipulating information, with deepfake algorithms, have become matter of grave
concern. The well-known term “deepfake” referred to the DL-based technology able
to forge synthetic videos by mapping the facial features of a person onto the target
person. Human faces are usually preferred to current deep fake algorithms because:
In computer vision, augmenting facial details are well researched fields. Faces are the
first and most important in human connection because we tend to believe the message
if it is coming from a trust worthy faces. These factors stirred consideration around
the innovation’s disinformation hazards. Be that as it may, an absence of answers to
key inquiries has left policymakers and experts without clear direction in creating
arrangements to address these hazards. How quickly is the innovation for manufac-
tured media progressing, and what are sensible assumptions around the commodifi-
cation of these instruments? For what reason would a disinformation crusade decide
to disperse deepfakes rather than all the more roughly made phony substance at times
similarly as powerful? What sorts of entertainers are probably going to receive these
advances for vindictive finishes? How might they utilize them?
Deepfakes offer online platforms an interesting opportunity to make hoaxed content. ML-driven deception can create strikingly realistic portrayals of people and circumstances. Critically, deepfakes can reproduce various subtle details (such as convincing facial movements or realistic shadows for a fake object pasted into a picture) that make it challenging to identify a picture or video as a lie. At the very least, such fakes may plant enough uncertainty about a target individual or circumstance to create confusion and doubt. Deepfake technologies are progressively incorporated into software platforms that do not need exceptional technical mastery. Simple-to-use, ML-driven software that performs a "face swap" (removing one face from a picture or video and embedding another) is increasingly accessible to users with no technical expertise, and other routine ML-driven modifications of pictures and video are likely to follow. This trend toward democratization may diminish or effectively remove many of the operational costs that would otherwise make deepfakes an unattractive option for disinformation perpetrators. Uncovering the truth in such settings has therefore become increasingly essential. Nowadays, there are many system-independent platforms to create DFs, and anyone can make deepfakes with little to no knowledge using existing DF models or software. There are many detection models used to detect deepfakes. The majority of them depend on DL, and therefore a fight between malicious and positive uses of DL strategies has been emerging [1]. Taking the emergence of DF into consideration, the US DARPA started a research program in media forensics in order to accelerate the advancement of deepfake detection techniques. Recently, Facebook Inc., collaborating with Microsoft Corporation and the Partnership on AI alliance, has launched the Deepfake Detection Challenge to catalyze more research and development in identifying and preventing deepfakes from being used to delude viewers.
2 Literature Review
The detection of synthetic portrait videos using biological signals [6] is a technique that extracts biological signals from facial pixel regions in pairs of genuine and counterfeit portrait videos. It then aggregates probabilities to check whether a given video is genuine or not. It is an approach to detect synthetic content in portrait videos, as a preventive answer to the emerging danger of DF [7, 8]. As such, it presents a deepfake detector. We see that detectors blindly using DL are not powerful in catching fake content, as GANs generate considerably realistic outcomes. The key observation is that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, since they are neither spatially nor temporally preserved in counterfeit content.
There are numerous tools available for creating deepfakes; however, for deepfake detection, few tools are available. Our methodology for recognizing deepfakes will be a significant contribution toward preventing the spread of deepfakes over the Internet. In this paper, the expression "deepfakes" refers to the broad scope of synthetic images, video, and audio created through recent breakthroughs in the field of ML [9], specifically in deep learning. The term covers ML procedures that seek to modify some part of an existing piece of media or to create completely new content. While this paper emphasizes advances in neural networks, its analysis is relevant for other strategies in the broader field of ML. The expression "deepfakes" excludes the wide range of strategies for manipulating media without the use of ML, including many existing tools for copy-pasting objects from one picture to another. One of the significant goals is to assess its performance and acceptability in terms of security, ease of use, accuracy, and reliability. Our technique focuses on distinguishing a wide range of deepfakes. First, we need to create datasets containing the pictures and videos of both the person we want to mimic and the person onto whom we want to map that information. Then, an encoder is created to encode the available information in the pictures and videos by using a CNN-based deep learning model [10, 11]. Then, we create a decoder to re-enact the image and video information. These autoencoders (the encoder and the decoder) have thousands of pooling layers, which are used to extract the image data, re-enact it, and augment it. Hence, an encoder is required to extract the various facial features and learn the provided input data. To decode the extracted facial maps, we use two separate decoders, one for each person. The encoder and decoders are trained using backpropagation, such that the output data from the decoder resembles the input data to the encoder (Fig. 1).
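The shared-encoder/two-decoder arrangement described above can be sketched as follows. The layer counts and sizes are illustrative (chosen to match the 64 × 64 face crops mentioned later), not the authors' exact network.

```python
# Face-swap autoencoder sketch: one shared encoder, two person-specific decoders.
from tensorflow.keras import layers, models

def build_encoder():
    inp = layers.Input(shape=(64, 64, 3))
    x = inp
    for filters in (128, 256, 512):
        x = layers.Conv2D(filters, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(8 * 8 * 512, activation="relu")(x)
    x = layers.Reshape((8, 8, 512))(x)
    return models.Model(inp, x, name="shared_encoder")

def build_decoder(name):
    inp = layers.Input(shape=(8, 8, 512))
    x = inp
    for filters in (256, 128, 64):
        x = layers.Conv2DTranspose(filters, 5, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(3, 5, padding="same", activation="sigmoid")(x)  # 64 x 64 output
    return models.Model(inp, out, name=name)

encoder = build_encoder()
decoder_a, decoder_b = build_decoder("decoder_A"), build_decoder("decoder_B")

inp = layers.Input(shape=(64, 64, 3))
autoencoder_a = models.Model(inp, decoder_a(encoder(inp)))  # trained on person A faces
autoencoder_b = models.Model(inp, decoder_b(encoder(inp)))  # trained on person B faces
autoencoder_a.compile(optimizer="adam", loss="mae")
autoencoder_b.compile(optimizer="adam", loss="mae")
```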
After training our model, the video is processed frame-by-frame to map information from one face to another. Face detection machine learning algorithms are used on face A to identify the features, and then the decoder of face B is used to superimpose them for GAN-based fake image generation; a minimal sketch of this frame-by-frame face extraction step is given after the list below. The dataset is prepared before applying the methodology to the subject images. For this work, we have used a pre-processed dataset from Kaggle to achieve high accuracy from our algorithm.
• Choose images in the dataset that contain only one face.
• There should be lots of videos containing different facial expressions from different angles.
• Remove any bad-quality images.
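A minimal sketch of the frame-by-frame face extraction step is shown below. The paper uses MTCNN for face detection; here an OpenCV Haar cascade is used as a simple stand-in so the example stays self-contained, and the 64 × 64 crop size mirrors the image size discussed below.

```python
# Frame-by-frame face extraction sketch (Haar cascade as a stand-in for MTCNN).
import cv2

def extract_face_crops(video_path, size=64):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    crops = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces[:1]:              # keep at most one face per frame
            face = cv2.resize(frame[y:y + h, x:x + w], (size, size))
            crops.append(face)
    cap.release()
    return crops   # these crops feed the autoencoder sketched earlier
```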
In the subject picture, we identify a 5 × 5 grid of points and move them slightly away from their originally identified positions. We utilize a straightforward algorithm to warp the picture according to those moved grid points. Even if the distorted picture does not look spot on, that is the noise that we want to introduce. We then utilize a more complex algorithm to construct a target picture using the moved grid points. We want our generated pictures to look as close as possible to the target image (Fig. 2).
The CNN model used for the encoder includes 5 CNN layers and 2 dense layers; the dense layers provide the fully connected neurons. The decoder consists of 4 CNN layers and reconstructs the 64 × 64 image. To bring the dimensions up from 16 × 16 to 32 × 32, a convolution filter (3 × 3 × 256 × 512) maps the (16, 16, 256) layer into (16, 16, 512), which is then reshaped to (32, 32, 128). The face area of a fake image is blurred, which shows that people are using a forceful approach for fake image or video generation [1].
5 Experimental Results
6 Conclusion
References
1. Sharma HK, Khanchi I, Agarwal N, Seth P, Ahlawat P (2019) Real time activity logger: a user
activity detection system. Int J Eng Adv Technol 9(1):1991–1994
2. Filali Rotbi M, Motahhir S, El Ghzizal A, Blockchain technology for a Safe and Transparent
Covid-19 Vaccination. https://arxiv.org/ftp/arxiv/papers/2104/2104.05428.pdf
3. Choudhury T et al (2022) CNN based facial expression recognition system using deep learning
approach. In: Tavares JMRS, Dutta P, Dutta S, Samanta D (eds) Cyber intelligence and infor-
mation retrieval. Lecture Notes in Networks and Systems, vol 291. Springer, Singapore. https://
doi.org/10.1007/978-981-16-4284-5_34
4. Shi F, Wang J, Shi J, Wu Z, Review of artificial intelligence techniques in imaging data acqui-
sition, segmentation, and diagnosis for COVID-19. https://ieeexplore.ieee.org/document/906
9255
5. Wang L, Qiu Lin Z, Wong A (2020) COVID-Net: a tailored deep convolutional neural network
design for detection of COVID-19 cases from chest X-ray images 19549
6. Chuang M-C, Hwang J-N, Williams K (2016) A feature learning and object recognition
framework for underwater fish images. IEEE Trans Image Process 25(4):1862–1872
7. Chuang M-C, Hwang J-N, Williams K (2014) Regulated and unsupervised highlight extraction
methods for underwater fish species recognition. In: IEEE Conference Distributions, pp 33–40
8. Kim H, Koo J, Donghoonkim, Jung S, Shin J-U, Lee S, Myung H (2016) Picture based
monitoring of jellyfish using deep learning architecture. IEEE Sens Diary 16(8)
9. Sharma HK, Kumar S, Dubey S, Gupta P (2015) Auto-selection and management of dynamic
SGA parameters in RDBMS. In: 2015 2nd International conference on computing for
sustainable global development (INDIACom). IEEE, pp 1763–1768
10. Khanchi I, Ahmed E, Sharma HK (2020) Automated framework for real-time sentiment anal-
ysis. In: 5th International conference on next generation computing technologies (NGCT-2019)
11. Mishra M, Sarkar T, Choudhury T et al (2022) Allergen30: detecting food items with possible
allergens using deep learning-based computer vision. Food Anal Methods. https://doi.org/10.
1007/s12161-022-02353-9
Comparative Analysis of Signal Strength
in 5 LTE Networks Cell
in Riobamba-Ecuador with 5
Propagation Models
Abstract This article analyzes the signal strength measured in 5 LTE network cells located in the central urban area of Riobamba, Ecuador. These measurements were made using the Network Cell Info Lite and WiFi software for Android systems, and 3 campaigns were carried out, obtaining 50 measured points in each cell. The results were compared with 5 propagation models (log-normal, Okumura-Hata, COST 231, Walfisch-Bertoni, and SUI), of which the SUI and log-normal models fit the analyzed areas better. The signal strength varies from −70 to −110 dBm. Finally, the model that best fits the real values obtained is determined through the calculation of the quadratic error.
1 Introduction
The signal strength and power levels received by mobile devices connected to a cellular network are identified by the received signal code power (RSCP) parameter, which indicates the level of signal reception in the UMTS (3G) network. The RSRP parameter (reference signal received power) within 4G technology measures the signal strength that reaches the mobile from the cell or tower to which it is connected [2]. The 4G mobile operating frequencies in Ecuador are 1900 MHz (Band 2) for the Movistar operator and 1700/2100 MHz (Band 4) for the Claro operator. For 3G technology, the Movistar operator uses 850 MHz or 1900 MHz, and the Claro operator uses 850 MHz (Band 5) [3].
Network Cell Info Lite is a monitoring and measurement software tool for 4G long-term evolution (LTE), 4G+, wideband code division multiple access (WCDMA), code division multiple access (CDMA), and GSM [4]. It is dual-SIM compatible, except on Android mobile devices below version 5.0, due to device/Android limitations. It is capable of measuring the received signal strength in decibel-milliwatts (dBm). The application needs the actual network whose signal strength is being measured to be specified. This application is available in the Play Store; there is also a more complete paid version, but the one used for this article is the free one. The app can be very useful to check the mobile network coverage available at a given time. A limitation of Android devices is GPS: it is recommended to set the GPS mode to "high precision" in the location settings of the device to get the best performance from the application. Google Earth was used to approximate the distance between the measurement points and the base station [5].
Empirical propagation models are widely used to calculate path losses in a wireless channel in different types of scenarios, and their results are considered when selecting the location of base stations and planning their coverage area [6]. At 2.1 GHz, the propagation models used to estimate signal attenuation in long-term evolution (LTE) mobile communication systems are mainly the Stanford University Interim (SUI) model and the Walfisch-Bertoni model, both applicable up to 3 GHz. For these models, the equations depend on different variables of the propagation environment (the effect of roofs and the height of buildings, among others), which makes them precise; however, they are more complex to calculate [7]. In this article, received power measurements were taken at 5 strategic points in the city of Riobamba, Ecuador, using the Network Cell Info Lite and WiFi software to obtain the received power in the coverage area of each base station. We compare the graphs obtained using propagation models with the coverage measurements of the Movistar and Claro mobile telephony powers in band 2 and band 4, respectively, for suburban environments. The log-normal, Okumura-Hata, COST 231, Walfisch-Bertoni, and SUI models were evaluated in this work in order to find the best fit. Finally, the propagation model that best fits the measurements is selected using error theory. The best-fitting models in relation to the measurements were COST 231, Walfisch-Bertoni, and SUI.
2 Theoretical Framework
The log-distance path loss model is a generic model and an extension of the Friis free space model. It is used to predict the propagation loss for a wide range of environments, whereas the Friis free space model is restricted to an unobstructed, clear path between the transmitter and the receiver. The following equation gives the mean path loss:

PL(d) [dB] = PL(d0) + 10 n log(d/d0) + χ   (1)
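A direct translation of Eq. (1) into code is shown below; the reference distance d0, the path loss exponent n, and the shadowing term χ are inputs that would be fitted to the measured data.

```python
# Log-distance path loss, Eq. (1): PL(d) = PL(d0) + 10 * n * log10(d / d0) + chi
import math

def log_distance_path_loss(d, d0, pl_d0, n, chi=0.0):
    """Path loss in dB at distance d (same units as d0)."""
    return pl_d0 + 10.0 * n * math.log10(d / d0) + chi
```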
This model is considered one of the simplest and best in terms of its precision in path loss calculation and has become the standard method for mobile system planning in Japan.
The most important result provided by the model is the median value of the basic propagation loss as a function of frequency, distance, and the heights of the base station and mobile antennas. Although it does not include any of the path-type correction factors present in the Okumura model, the equations proposed by Hata have important practical value [9]. The model is valid for:
• f: 150–1500 MHz
• hb: 30–200 m
• hm: 1–10 m
• d: 1–20 km
the following equation is used to calculate the correction factor for small cities:
L_su = L_u − 2 [log(f/28)]² − 5.4   for suburban areas   (4)
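The suburban correction of Eq. (4) can be coded directly on top of the urban median loss. Since the urban expression L_u is not reproduced in this excerpt, the sketch below uses the standard Okumura-Hata formulation (frequency in MHz, antenna heights in metres, distance in kilometres) as an assumption.

```python
# Okumura-Hata sketch: standard urban median loss plus the Eq. (4) suburban term.
import math

def hata_suburban(f_mhz, hb_m, hm_m, d_km):
    # small/medium-city mobile antenna correction (standard Hata form)
    a_hm = (1.1 * math.log10(f_mhz) - 0.7) * hm_m - (1.56 * math.log10(f_mhz) - 0.8)
    l_u = (69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(hb_m)
           - a_hm + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km))
    return l_u - 2.0 * (math.log10(f_mhz / 28.0)) ** 2 - 5.4   # Eq. (4)
```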
The COST 231 model is a semi-empirical path loss prediction model resulting from the combination of the Walfisch-Bertoni and Ikegami models. It is recommended for macro-cells in urban and suburban scenarios, with good path loss results for transmitting antennas located above the average roof height. However, the prediction error increases considerably as the transmitter height approaches the rooftop height, with very low accuracy for transmitters below that level [10].
For this analysis, the model is implemented with the following equation:
where L_o is the attenuation in free space and is described as:
Here, L_ORI is a function of the orientation of the antenna relative to the street, a (in degrees), and is defined in Table 2.
L_MSD represents the diffraction loss due to multiple obstacles and is specified as:
where:
Table 2 Equations depending on the range of the angle

L_ORI                  Angle range
−10 + 0.354a           0 < a < 35
2.5 + 0.075(a − 35)    35 < a < 55
4 − 0.114(a − 55)      55 < a < 90

k_F = −4 + k (f/924)   (9)
Here, k = 0.7 for suburban centers and 1.5 for metropolitan centers (Tables 3, 4 and
5) [11].
This model estimates the influence of building heights and rooftops by using diffraction models to predict the average signal power at pavement level [12].

L_P = 57.1 + A + log(f) + 18 log(d) − 18 log(H) − 18 log[1 − d²/(17H)]   (10)
A = 5 log[(b/2)² + (h_b − h_r)²] − 9 log(b) + 20 log{arctan[2(h_b − h_r)/b]}   (11)
SUI is based on the Hata model. It applies to heights from the MS between 2 and
3 m and from the BS between 10 and 80 m. The frequency range for the model is
from 0 to 2000 MHz [13].
PL = A + 10 γ log(d/d0) + X_f + X_h + S   (12)

γ = a − b h_b + c/h_b   (13)

A = 20 log(4π d0 / λ)   (14)

X_f = 6 log(f/2000)   (15)

X_h = −10.8 log(h_r/2000)   (16)
The SUI model groups the propagation scenarios into three different categories,
each with its own specific characteristics:
• Category A: mountainous ground with medium and high levels of vegetation,
which corresponds to high loss conditions.
• Category B: mountainous ground with low levels of vegetation, or flat areas with
medium and high levels of vegetation. Medium level of losses.
• Category C: flat areas with very low or no vegetation density. Corresponds to
paths where losses are low.
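Equations (12)-(16) translate directly into a small routine, as sketched below. The terrain constants a, b, and c are the commonly cited SUI values for categories A, B, and C (assumed here, since the corresponding table is not part of this excerpt), and the shadowing term S is passed in directly because its equation is omitted above.

```python
# SUI path loss sketch following Eqs. (12)-(16); terrain constants are assumed.
import math

TERRAIN = {"A": (4.6, 0.0075, 12.6),   # hilly, medium/high vegetation
           "B": (4.0, 0.0065, 17.1),   # intermediate conditions
           "C": (3.6, 0.0050, 20.0)}   # flat, little or no vegetation

def sui_path_loss(d_m, f_mhz, hb_m, hr_m, category="B", d0_m=100.0, s_db=0.0):
    a, b, c = TERRAIN[category]
    lam = 3e8 / (f_mhz * 1e6)                               # wavelength in metres
    A = 20.0 * math.log10(4.0 * math.pi * d0_m / lam)       # Eq. (14)
    gamma = a - b * hb_m + c / hb_m                         # Eq. (13)
    xf = 6.0 * math.log10(f_mhz / 2000.0)                   # Eq. (15)
    xh = -10.8 * math.log10(hr_m / 2000.0)                  # Eq. (16)
    return A + 10.0 * gamma * math.log10(d_m / d0_m) + xf + xh + s_db   # Eq. (12)
```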
Errors can be the result of the inaccuracy of the measuring equipment, which are
called systematic errors, or caused by external agents or by the operator himself,
which are called accidental errors. While the former are repeated in the same sense,
whenever the same measuring apparatus is used, the latter vary from one experience
to another, both in value and in sign [14]. The absolute error can be conceptualized
as the difference between the real value and the value obtained:
E_a = X_r − X_o   (18)

in which X_r represents the actual value and X_o the obtained value. The relative error is the result of dividing the absolute error by the actual value, while the relative percent error is the relative error expressed as a percentage:

E_r = E_a / X_r   (Relative Error)   (19)

E_r% = (E_a / X_r) × 100%   (Relative Percent Error)   (20)
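The error metrics of Eqs. (18)-(20) are what the paper uses to rank the models; a small helper that averages the relative percent error over all measurement points could look as follows, with the measured and predicted arrays being placeholders.

```python
# Mean relative percent error over all points, following Eqs. (18)-(20).
def mean_relative_percent_error(measured, predicted):
    errors = []
    for x_r, x_o in zip(measured, predicted):
        e_a = x_r - x_o                        # Eq. (18): absolute error
        errors.append(abs(e_a / x_r) * 100.0)  # Eq. (20): relative percent error
    return sum(errors) / len(errors)

# Example: choose the propagation model with the smallest mean error
# models = {"SUI": sui_predictions, "COST 231": cost231_predictions}
# best = min(models, key=lambda m: mean_relative_percent_error(measured, models[m]))
```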
3 Methodology
For this study, 5 base stations (Movistar and Claro) located in the downtown area of the city of Riobamba were selected; their locations were chosen using the coverage maps provided by the cellular telephone companies (the locations are shown in Fig. 1). The data obtained from the mobile application (Network Cell Info Lite) were collected quantitatively, with a sample size of 60 measurements per base station. The data were collected at different times of the day in order to observe the influence of variables such as weather, traffic, users, cellular technology, and the distance between the transmitting antenna and the receiver. Google Earth was used to obtain the approximate distance between each measurement point and the base station.
Several of the parameters used in the propagation models were obtained from information provided by the country's mobile telephone companies. Each selected base station is located in a different environment, and the received power is affected by its surroundings.
For each of the propagation models, it was necessary to determine certain parameters, as shown in the following tables.
The important parameters to consider are the operating frequency, the P.I.R.E. (EIRP), and the height of the transmitting antenna, as shown in Table 6. It should also be considered that, in 4G/LTE, the Movistar operator works in band 2 at a frequency of 1900 MHz.
Table 7 considers a network cell with characteristics similar to the previous one, because both base stations belong to the Movistar operator; their main difference is the antenna height, which is 49 m.
Claro works in Band 4, at a frequency of 1700/2100 MHz. In addition, this cell operates with a P.I.R.E. of 20 dBm, as shown in Table 8; these are the most important parameters to consider.
Table 9 considers narrow streets and a transmitter height of 49 m. It should also be noted that the theoretical calculations yielded a P.I.R.E. of 19 dBm; therefore, a P.I.R.E. of 20 dBm is assumed (Table 10).
Finally, knowing all the parameters required by the propagation models, the models were applied to each of the base stations in order to predict the loss curve with respect to distance; the error calculation was then applied to determine which model best fits the real values, based on the error percentage.
4 Results
Figure 2 shows the results obtained in the field measurements, providing information about the coverage offered by the El Cisne base station located at Daniel León Borja and Duchicela avenues. In general, the figure shows that in the area near the base station, a signal with excellent power was received, reaching values of up to −60 dBm; however, at similar distances, a drop reaching −87 dBm could be observed. Thanks to the field measurements, it was possible to see that, because the base station is located in a central area of the city of Riobamba, at certain points a considerably good power was received owing to a direct view of the transmitting antenna; however, in certain areas near the antenna, there is an abrupt drop in places where the signal should apparently be good. At this point, we can consider that there is attenuation due to the infrastructure surrounding the base station. In spite of this, the final result, which is the average of the measurements, indicates that there are no considerable losses, and it can be deduced that the coverage in the area of the Hotel Zeus as well as in Guayaquil Park is really good. However, it should be noted that base stations in an urban area seek to reduce power while more infrastructure is deployed, which is why it was observed that, on reaching a distance of 450 m with a drop to −110 dBm, the mobile device suddenly switches, connecting to a transmitting antenna that emits considerably higher power, thus achieving wider coverage in the urban area.
Figure 3 shows the results obtained by applying the propagation models to the mean of the measurements obtained, which allows the behavior of the logarithmic curve to be observed under conditions consistent with the scenario found.
To determine the model that best fits the measurements obtained, error theory was applied, giving the results shown in Table 11; the Walfisch-Bertoni model shows a smaller error compared with the other propagation models.
Figure 4 shows the results obtained from the measurements, which give us information about the coverage offered by the Zeus base station located on Daniel León Borja Avenue. The figure shows that in the area closest to the base station, a signal with good power is received, reaching values up to −65 dBm; however, at similar distances, there is a drop in the signal that reaches −89 dBm. From the field measurements, we observed that, because the base station is located in the downtown area of the city of Riobamba, a good signal is received at certain points where there is a clear view of the base station.
As it is in a central area of the city, there are elements that interfere with the power intensity, producing attenuation due to the infrastructure. Finally, the result shown in Fig. 4 is the average of our three measurements, which shows that there are no large-scale losses, so we can say that there is good signal reception.
Figure 5 shows the comparison of the empirical models used in the suburban area
of the city of Riobamba. Base station (Hotel Zeus).
Table 12 shows the comparison of the absolute and relative errors of the propagation models used to select the best model, from which it can be noted that the Walfisch-Bertoni model is the most appropriate for the measurements obtained from this base station.
Figure 6 shows the average received power level of 150 samples obtained at different times of the day, highlighting that the best time for good received power is from 11:00 to 13:00. On the contrary, reception values of −100 dBm can be observed from 08:00 to 10:00. This is due to many factors such as the number of connected users, the distance from the connection point, the mobile device, or the peak hours of the city.
Figure 7 shows the comparison of the empirical models used in the suburban area
of the city of Riobamba. Base station (Guayaquil Park).
Table 13 shows the comparison of the absolute and relative errors of the propa-
gation models to select the best model. For the data obtained at this base station, the
Walfisch-Bertoni model is the best fit.
Figure 8 shows the results obtained during measurements at different times at the Santa Cecilia base station, providing information about the coverage offered by the operator Claro in the sector located between Vicente Rocafuerte and Carabobo streets.
Figure 10 shows the results obtained in the field measurements, providing information on the coverage offered by the Banco Pichincha base station located at Primera Constituente and Pichincha streets. The figure shows that in areas close to the base station, it was possible to receive a signal with excellent power, reaching values up to −57 dBm; however, at similar distances, a drop reaching −103 dBm could be observed.
Through the field measurements, it was possible to see that, because the base station is located in a central area of the city of Riobamba, a considerably good power was received at certain points owing to a direct view of the transmitting antenna. In other data, a notable dispersion in the measurements can be observed due to the different attenuations present in the area; it can be noted that the height of the buildings is an obstacle to the power intensity that reaches the receiver, owing to the diffraction that occurs in the scenario where the measurements were collected.
Figure 11 presents the comparison of the empirical models used in the suburban
area of the city of Riobamba for the Banco Pichincha base station. It shows the results
obtained by applying the propagation models to the averaged measurements, which
allows observing the behavior of the logarithmic curve under conditions compatible
with the results of the scenario found.
Table 15 shows the results of the error estimation for each propagation model
applied. The model with the highest margin of error is the Okumura-Hata model,
with 23%, and the one with the lowest margin of error is the COST 231 model,
with 6%; it can therefore be concluded that the model that best fits the average of
the measurements obtained is the COST 231 model.
5 Conclusions
• It was observed that the models that most closely adapted to the scenarios proposed
were the SUI and COST 231 models, based on the fact that the conditions for the
use of these models were met.
• It was concluded that the Okumura-Hata propagation model cannot be used in
small cities such as Riobamba, since it has a greater error than the other models;
in addition, the Hata model applies only over distances between 200 m and 20 km
and at frequencies below 1800 MHz.
• According to the results, the 4G network shows greater stability at ranges of up
to 250 or 300 m. However, its coverage area is more limited, so nearby base
stations are required to ensure a permanent connection.
• The received power is strongly influenced by the polarity of the antenna located
at the base station; although the coverage area is nominally radial, within cities
there are antennas directed to cover specific areas. In most cases, this is due to
building infrastructure that obstructs the line of sight.
• It was determined that the Walfisch-Bertoni model does not assume the existence
of a line of sight between the transmitting antenna and the receiving antenna;
instead, it uses diffraction to analyze the losses suffered by the signal before
reaching the receiving antenna as a function of the distance to the buildings.
• The applied models consider variables such as street width, building height, and
reflection angles, among other parameters. As seen in the results, no single model
fits every base station, since the received power at each one is affected by different
factors such as weather, traffic, infrastructure, and the number of connected users.
Abstract In today's world, a secure and reliable system configuration is very important,
be it for a service-based or a product-based company. Through Blaze, we enable the
functionality to remotely connect to any device over the network and configure it as
per our needs. Flexibility and scalability are the values we follow at our roots. Blaze
focuses on customer satisfaction with seamless, high-quality support for users and
ensures the correctness of the actions performed through powerful automation scripts
that mitigate human errors. Further, automation is a value that we nourish throughout
the project by adding the functionality of a complete system upgrade and including
real-time reporting for the system being upgraded. In this project, we aim to develop
a command line tool for Service Management and Monitoring that follows cutting-edge
automation compliance.
1 Introduction
We live in a world where Alexa sets alarms and students study from e-books.
Computers have become an inevitable part of our lives, and the ability to connect
to and use any service worldwide has completely changed them. Within organizations,
as an organization grows, its workforce, resources, systems, services and
infrastructure also tend to grow considerably, and it becomes difficult to maintain
each system physically. Maintaining systems, provisioning them and configuring
them are a major concern for most IT industries today, as we need to ensure that
the different system services are running smoothly with the same configuration to
keep IT services running right. The primary reason is that, while using any software,
many users notice slight human errors that degrade performance or render the system
useless. This project focuses on implementing a Service Manager and Monitoring
System. The project works by using a set of automation scripts and roles (modules)
for the provisioning and configuration part to mitigate the probability of human errors.
The project uses an object-oriented approach as well as handling and processing data
in a database. The entire Service Manager and Monitoring System works on real-time
data, and we wish to make it an intelligent system using intelligent learning models.
The Ansible automation engine and various other tools will be utilized to achieve
automation [9] using the principles of DevOps (Fig. 1).
The project will be delivered as a command line utility and will be deployed on a
dedicated system (which can be hosted either in the cloud or on premises) driven by
Python scripts. In the Linux system architecture, everything is a file, and all files and
directories appear under the root (/) directory, even if they are stored on different
physical or virtual devices. Each file in the file system is isolated and present in its
respective directory, and everything is well organized. The directories are organized
in such a way that they provide access to a group of files, e.g., /bin contains essential
binaries, /etc contains the configuration files, /var the variable data files, and there are
many other directories that provide the same isolation of files. We will utilize this
concept of the Linux architecture and will configure and modify only the parts of
the file system that are necessary. Considering the example of a system upgrade, we
will modify only the /bin and /boot directories so that all other data remains as is.
2 Literature Review
In Refs. [1, 2], the authors note that numerous companies run distributed workloads
on their on-premise servers. However, if the load on those servers fluctuates suddenly,
it becomes tedious to scale the resources and requires skilled human intervention to
address such situations, which may increase capital expenditure. Hence, numerous
companies have opted to migrate their on-premise workloads to the cloud. This
migration of workloads to the cloud is one of the substantial challenges. Setting up
and managing the increasingly sophisticated architecture after migrating these
workloads to the cloud is a time-consuming and tedious operation that results in
downtime. Hence, we need to automate this process. To attain an architecture for
distributed systems that supports security, redundancy, reliability and scalability, we
require cloud automation tools. These works summarize tools such as Terraform and
CloudFormation for infrastructure automation and Docker and Habitat for application
automation.
In Ref. [3], the authors discuss practices that organizations are adopting to accelerate
the pace of their software development process and to improve the quality of their
software. They also describe the results of an exploratory interview-based study of
six organizations of various sizes operating in various industries. Among the findings,
all organizations were positive about their experience, and only minor challenges
were encountered while adopting DevOps.
3 Dataset Characteristics
Here, in our project, the database refers to the list of the devices that are to be provided
to the Service Manager and Monitoring System for configuration purposes [8].
We are using two different kinds of datasets, one being static and the other being
dynamic in nature.
• The static database is defined in the Blaze inventory and contains the list of IP
addresses to be configured.
• The dynamic inventory is used in case of the cloud-based inventories (AWS in our
case), where the inventory is parsed dynamically based on the credentials stored
in the console.
In the case of the static inventory, we give the user the option to use and configure
any Linux distribution according to his/her comfort.
In the case of the dynamic inventory, on the other hand, we use RHEL-based
operating systems to make it smooth and easy for the user to configure the system,
as the user is focused on receiving working services rather than on the underlying
architecture [5].
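As an illustration of how a dynamic inventory could be produced, the following is a minimal sketch assuming the conventional JSON layout that Ansible expects from inventory scripts; the group name, region, filters and the boto3-based AWS lookup are placeholders, not the actual Blaze implementation.

```python
#!/usr/bin/env python3
"""Minimal dynamic-inventory sketch: prints host groups as JSON for Ansible.

The AWS lookup below is illustrative only; the region, filters and the group
name 'blaze_managed' are assumptions, not part of the original project.
"""
import json
import boto3  # assumes AWS credentials are already configured in the console


def build_inventory():
    ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]

    hosts = [
        inst["PublicIpAddress"]
        for res in reservations
        for inst in res["Instances"]
        if "PublicIpAddress" in inst
    ]
    return {
        "blaze_managed": {"hosts": hosts, "vars": {"ansible_user": "ec2-user"}},
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    print(json.dumps(build_inventory(), indent=2))
```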
4 Methodology
The objective of the project is to build a command line utility that realizes a Service
Manager and Monitoring System with all the features of automation [6, 7].
The complete task is further divided into subtasks:
1. Configuring and making resources available on cloud.
2. Managing the workspace and configuring it with the desired toolset.
3. Generating logs for any privilege activity made.
4. Upgrading the system in case the system is old.
The configurations that we want to achieve after the completion of the project are listed below (a small sketch follows the list):
1. Installing/updating any package over the remote system.
2. Updating the OS of the remote system from EL6 to EL7 or from EL7 to EL8 [8].
3. Starting/Stopping/Restarting any service over the remote system.
4. Configuring yum repositories.
5. Running any docker image in the remote server.
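To make the configurations listed above concrete, the sketch below shows how such tasks could be driven from Python by shelling out to the Ansible CLI. This is only an assumption about how a tool like Blaze might wrap Ansible; the inventory path, host group and module arguments are hypothetical examples.

```python
import subprocess


def run_adhoc(module, args, inventory="inventory.ini", group="all"):
    """Run a single Ansible ad-hoc task against the given host group."""
    cmd = [
        "ansible", group,
        "-i", inventory,
        "-m", module,
        "-a", args,
        "--become",          # privilege escalation for package/service changes
    ]
    return subprocess.run(cmd, check=True)


# Hypothetical examples matching the goals listed above:
run_adhoc("yum", "name=httpd state=latest")            # install/update a package
run_adhoc("service", "name=httpd state=restarted")     # restart a service
run_adhoc("yum_repository",                            # configure a yum repository
          "name=epel description=EPEL baseurl=https://example.org/epel enabled=yes")
```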
This project blends the agile and waterfall methodologies of software development,
as it is rare to find all the desired qualities in a single methodology. There are different
ways to implement a waterfall methodology, including iterative waterfall, which still
follows the phased approach but delivers in smaller release cycles. The project uses
the agile methodology for the build part while taking advantage of the documentation
discipline of the waterfall methodology and utilizing sprints as part of the agile
workflow. Overall, the beginning of the project is dedicated to requirement analysis
and documentation, and during implementation all team members follow their
dedicated sprint cycles to implement the functionality. After implementation, testing
is done for the whole application. Finally, the application is deployed along with the
documentation.
The following data flow diagram depicts how data flows in the application. In our
case, there are three different roles to be considered for the working of the application,
i.e., the engineer, Blaze, and the remote server to be configured (Fig. 2). The engineer
works on his/her machine and wants to configure the remote server as per his/her
needs. First, the user installs and configures Blaze on his/her system and creates the
inventory and playbook for the systems to work upon; both the playbook and the
inventory syntax are simple. Blaze works with Terraform as well as Ansible, which
it connects to in the background; it acts as a third-party tool designed to work on top
of existing DevOps tools [4], bring their extensive functionality under a single
umbrella, and make the services available to all users easily and hassle free. Blaze
follows the principles of DevOps and follows the DevOps culture rather than just
being a tool for automation. The principles of low-code, idempotence, code
generation, failure detection and many more are used in the project (Fig. 3).
5.1 sshKeygen
sshKeygen generates a new, unique key for authentication purposes and stores it in
the keyStorage directory. The status of sshKeygen is maintained in a file which can
be utilized further as required by the program (a minimal sketch follows the steps
below).
• Check whether the status directory exists.
• If it exists, continue.
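The following is a minimal sketch of what this step could look like, assuming the key is generated with the standard ssh-keygen utility; the keyStorage and status paths follow the description above, while the key name, key type and file layout are assumptions.

```python
import os
import subprocess

KEY_DIR = "keyStorage"          # directory described above
STATUS_DIR = "status"           # status files described above
KEY_PATH = os.path.join(KEY_DIR, "blaze_key")  # hypothetical key name


def ssh_keygen():
    """Generate a new key pair and record the outcome in a status file."""
    os.makedirs(KEY_DIR, exist_ok=True)
    os.makedirs(STATUS_DIR, exist_ok=True)   # check/create the status directory

    if os.path.exists(KEY_PATH):             # if a key already exists, continue
        return

    subprocess.run(
        ["ssh-keygen", "-t", "rsa", "-b", "4096", "-N", "", "-f", KEY_PATH],
        check=True,
    )
    with open(os.path.join(STATUS_DIR, "sshKeygen.status"), "w") as fh:
        fh.write("done\n")
```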
5.2 sshCopyId
sshCopyId takes the key generated by sshKeygen and stored in the keyStorage
directory and copies it to the remote user so that password-less access to the remote
system can be provided. The status of sshCopyId is maintained in a file which can be
utilized further as required by the program (Figs. 4 and 5); a minimal sketch follows
the steps below.
• Execute the command to copy the sshKey from the server on our end to the server
on the client end.
• If not executed successfully, dump the error and exit the program.
• If executed successfully, save the status in the status file (Fig. 6).
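A sketch of this step is given below in the same spirit, assuming the ssh-copy-id utility is available on the control node; the remote user, host, key path and status file name are placeholders rather than the project's actual values.

```python
import subprocess


def ssh_copy_id(remote_user, remote_host,
                key_path="keyStorage/blaze_key.pub",
                status_file="status/sshCopyId.status"):
    """Copy the public key to the remote host and record the status."""
    try:
        subprocess.run(
            ["ssh-copy-id", "-i", key_path, f"{remote_user}@{remote_host}"],
            check=True,
        )
    except subprocess.CalledProcessError as err:
        # dump the error and exit, as described in the steps above
        raise SystemExit(f"ssh-copy-id failed: {err}")

    with open(status_file, "w") as fh:
        fh.write(f"copied to {remote_user}@{remote_host}\n")
```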
5.3 5x-Automation
In the modern world, to install any software we download an executable file and
install it directly, rather than downloading the individual files, placing them in
separate directories, configuring the pre-install and post-install steps, and performing
checks for the required packages.
6 Conclusion
A DevOps tool that helps manage other infrastructure provisioning and configuration
management tools under a single umbrella has been achieved. Blaze works on all the
stated preferences and is currently under active development, so much more can be
expected from it. In the end, Blaze is able to achieve the following results: configure
a remote system according to the provided configuration, upgrade the infrastructure
in place without the need to re-configure the system, and provide a company-specific
full-stack DevOps use case to deploy the infrastructure and configure it.
References
1. Masek P, Štůsek M, Krejčí J, Zeman K, Pokorny J, Kudlacek M (2018) Unleashing full potential
of ansible framework: university labs administration. In: Proceedings of the XXth conference of
open innovations association FRUCT, p 426. https://doi.org/10.23919/FRUCT.2018.8468270
2. Jayachandran P, Pawar A, Venkataraman N (2017) A review of existing cloud automation tools.
Asian J Pharm Clin Res 10.471. https://doi.org/10.22159/ajpcr.2017.v10s1.20519.
3. Erich F, Amrit C, Daneva M (2017) A qualitative study of DevOps usage in practice. J Softw
Evol Process. https://doi.org/10.1002/smr.1885
4. Agarwal A, Gupta S, Choudhury T (2018) Continuous and integrated software development
using DevOps. In: 2018 International conference on advances in computing and communication
engineering (ICACCE), pp 290–293. https://doi.org/10.1109/ICACCE.2018.8458052
5. Kumar S, Dubey S, Gupta P (2015) Auto-selection and management of dynamic SGA param-
eters in RDBMS. In: 2015 2nd International conference on computing for sustainable global
development (INDIACom), pp 1763–1768
6. Tian J, Varga B, Tatrai E, Fanni P, Mark Somfai G, Smiddy WE, Cabrera DeBuc D (2016)
Performance evaluation of automated segmentation software on optical coherence tomography
volume data. J Biophoton 9(5):478–489
7. Klein R, Klein BEK (2013) The prevalence of age-related eye diseases and visual impairment
in aging: current estimates. Invest Ophthalmol Vis Sci 54(14)
8. Biswas R et al (2012) A framework for automated database tuning using dynamic SGA
parameters and basic operating system utilities. Database Syst J III(4)
9. Gulia S, Choudhury T (2016) An efficient automated design to generate UML diagram from
Natural Language Specifications. In: 2016 6th International conference—cloud system and big
data engineering (Confluence), pp 641–648. https://doi.org/10.1109/CONFLUENCE.2016.750
8197
Face Mask Detection Using Multi-Task
Cascaded Convolutional Neural
Networks
1 Introduction
Among these developing technologies, face detection is one of the most popular and
significant. In a world battling the coronavirus disease COVID-19, the technology is
of great use. Many organizations converted to a 'work from home' style as a
precaution during the pandemic. As the effect of the pandemic slowly reduces, many
workers are now apprehensive about returning to the 'in-person' office work style.
During this transformation from work from home (WFH) to in-person work, checking
for violations manually is almost impossible on large premises. Computer vision and
artificial intelligence techniques therefore motivate automatic detection that helps in
monitoring and screening society during the coronavirus (COVID-19) pandemic.
2 Related Work
In this section, related works done in this domain are reviewed; the methodologies
and algorithms designated solely for face mask detection are still inadequate. The
Viola-Jones method for face detection [1] uses Haar features for extracting facial
features. A novel detection framework [2] identifies face mask-wearing conditions to
help control the spread of COVID-19. Jignesh et al. [3] proposed a detector using the
SRCNet classification network. Another model developed using SRCNet in Ref. [4]
shows good performance. With deep learning advancements, neural networks learn
features without prior knowledge, forming feature extractors like the You Only Look
Once (YOLO) algorithm [5, 6]. Face detection models developed not only use CNNs
and pre-trained models but also include independent techniques, classifiers such as
support vector machines and softmax, and optimizers such as the Adam optimizer for
better accuracy and efficient classification. A model proposed by Preeti et al. [7],
SSDMobileNetV2, performs face mask detection using OpenCV. Some of the earlier
developed systems include pre-trained models such as MobileNet as the main
component or backbone, along with other models with fine-tuning. Some of the
models developed [8] have high computational efficiency and are easy to set up for
embedded systems.
Different pre-trained deep convolutional neural networks (CNNs) extract deep
features [9] from images of faces. The extracted features are further processed using
machine learning classifiers. To overcome the weak generalization ability problem,
Qi et al. [10] proposed a network that takes input images of variable size, which is an
extension of the existing system. Chaves et al. in Ref. [4] evaluate the speed-accuracy
trade-off of three popular models. Different face detection methods using deep
learning [11, 12] and image processing technologies are presented in Ref. [13]. A
receptive field enhanced multi-task cascaded CNN (RFE-MTCNN) is proposed by
Xiaochao et al. in Refs. [14, 15]. The existing models take fixed-size input images,
which makes their generalization ability weak.
3 Proposed Method
The proposed approach is demonstrated in Fig. 1 and aims to detect whether people
are wearing a face mask or not. It consists of two stages: detection of faces from
images and prediction of face mask-wearing conditions.
A. MTCNN
MTCNN comprises three deep neural networks, P-Net, R-Net and O-Net. Hence,
MTCNN is called a three-stage neural network. The primary stage is to resize the
input images into different scales in order to build an image pyramid (Figs. 2 and 3).
Figures 2 and 3 show (b) the architecture of P-Net, (c) the architecture of R-Net, and (d) the architecture of O-Net.
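As an illustration of this face detection stage, the sketch below uses the publicly available mtcnn Python package, which implements the same three-stage P-Net/R-Net/O-Net pipeline; the image file name is a placeholder and the 50 × 50 grayscale crop follows the preprocessing described later in this paper, so this is an assumption about usage rather than the authors' exact code.

```python
import cv2
from mtcnn import MTCNN  # pip install mtcnn

detector = MTCNN()

# Hypothetical test image path; mtcnn expects an RGB array
image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)

faces = []
for det in detector.detect_faces(image):
    x, y, w, h = det["box"]                 # bounding box refined by the O-Net stage
    x, y = max(x, 0), max(y, 0)             # guard against slightly negative coordinates
    crop = image[y:y + h, x:x + w]
    crop = cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY)
    crop = cv2.resize(crop, (50, 50))       # 50 x 50 input expected by the classifier
    faces.append(crop)

# `faces` would then be fed to the mask / no-mask classifier described below.
```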
Table 1 Summary of network model

Layer               | Output   | Param#
                    | 24 × 24  | 1000
                    | 12 × 12  | 0
                    | 10 × 10  | 57,664
                    | 5 × 5    | 0
                    | 1600     | 0
Dense (Dense)       | 50       | 80,050
Dropout (Dropout)   | 50       | 0
Dense (Dense)       | 2        | 102
4 Experiments
A. Dataset
The dataset used for training is the face mask detection dataset
(https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset) [16]. The
dataset contains a total of 5933 images.
The flow of the proposed approach is illustrated in Fig. 9, showing the training phase
and the testing phase. The proposed model has been trained for 30 epochs. The Adam
optimizer is a simple, time-efficient optimizer and a good replacement for stochastic
gradient descent when training deep learning models; thus, the Adam optimizer is
chosen for the proposed model. The hyperparameter settings of the proposed model
are listed in Table 2. The training images are annotated by human labelling of the
face coordinates, for both single-face and multiple-face images. The facial
coordinates, the corresponding image source and the class label are loaded from a
comma-separated values (CSV) file. During the training phase, the model converts
the images into grayscale using the OpenCV module. Using the facial coordinates
from the CSV file, all the faces are cropped from the images and resized into 50 ×
50 dimensions as part of image post-processing. Finally, after the necessary
transformations and normalizations on the cropped images and features, they are fed
to the model.
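A minimal sketch of this preprocessing step is shown below; the CSV column names and file paths are hypothetical placeholders, since the paper does not list them, and only the grayscale conversion, cropping by annotated coordinates and 50 × 50 resize follow the description above.

```python
import cv2
import numpy as np
import pandas as pd

# Hypothetical column names: image file, bounding-box corners, class label
ann = pd.read_csv("train_annotations.csv")

X, y = [], []
for _, row in ann.iterrows():
    img = cv2.imread(row["name"], cv2.IMREAD_GRAYSCALE)     # grayscale via OpenCV
    face = img[row["y1"]:row["y2"], row["x1"]:row["x2"]]     # crop annotated face
    face = cv2.resize(face, (50, 50))                        # 50 x 50 post-processing
    X.append(face / 255.0)                                   # simple normalization
    y.append(1 if row["classname"] == "face_with_mask" else 0)

X = np.asarray(X)[..., np.newaxis]   # shape: (samples, 50, 50, 1)
y = np.asarray(y)
```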
The testing phase contains 1698 images. The final result of the testing phase is to
classify the images into face_with_mask and face_no_mask. The testing phase
consists of two steps: detection of faces from the test images and classification of
face mask-wearing conditions. As described in Sect. 3A, the MTCNN algorithm is
used for face detection. All the test images are fed to the MTCNN detector, which
outputs the bounding box coordinates of all the faces in the image. These coordinates
are used to crop the faces from the images. The cropped image further undergoes
image post-processing before being passed to the classifier.
Table 2 Hyperparameters of the proposed model

Hyperparameter   | Value
Epochs           | 30
Batch size       | 5
Optimizer        | Adam
Learning rate    | 0.001
Decay rate       | 1e-5
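Combining Tables 1 and 2, a plausible Keras sketch of the classifier is given below. The convolutional front end is not fully specified in the paper, so those layers are illustrative assumptions; only the Dense(50), Dropout, Dense(2, softmax) head, the Adam learning rate, and the epoch/batch settings follow the tables.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Front-end conv layers are assumptions; the head follows Tables 1 and 2.
model = models.Sequential([
    layers.Input(shape=(50, 50, 1)),               # 50 x 50 grayscale crops
    layers.Conv2D(16, 3, activation="relu"),       # assumed feature extractor
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(50, activation="relu"),           # Dense(50) as in Table 1
    layers.Dropout(0.5),                           # dropout rate not reported
    layers.Dense(2, activation="softmax"),         # face_with_mask / face_no_mask
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Table 2 (decay 1e-5 not shown here)
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(X, y, epochs=30, batch_size=5)  # epochs and batch size from Table 2
```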
5 Conclusion
During these COVID-19 pandemic times, people are required to wear a face mask at
all public places such as markets and offices, but manually checking the mask-wearing
condition of every person is not achievable. Thus, researchers are motivated to
develop automatic face mask detection systems. In this paper, a model is proposed
for the detection of face mask-wearing conditions. The proposed model accommodates
the MTCNN algorithm, a hybrid model for efficient facial region detection in
unconstrained environments, which stands out among the existing detectors. For the
purpose of face mask detection, a CNN architecture is developed that extracts
features. Finally, a softmax classifier is used for the binary classification, which
classifies faces in the images into two classes, namely face_with_mask and
face_no_mask. The model is evaluated on the face mask detection dataset available
on the Kaggle website. This model achieved 99.53% accuracy and 0.14% loss after
30 epochs. Further, the proposed model outperforms several of the existing models
in the face mask detection area of research.
References
1. Huang J, Shang Y, Chen H (2019) Improved Viola-Jones face detection algorithm based on
HoloLens. Springer Access
2. Zhang J, Han F, Chun Y, Chen W (2021) A novel detection framework about conditions of
wearing face mask for helping control the spread of COVID-19. IEEE Access 9:42975–42984
3. Jignesh Chowdary G, Punn NS, Sonbhadra SK, Agarwal S (2021) Face mask detection using
transfer learning of InceptionV3. Springer access
4. Chaves D, Fidalgo E, Alegre E, Alaiz Rodríguez R, Jáñez-Martino F, Azzopardi G (2020)
Assessment and estimation of face detection performance based on deep learning for forensic
applications. Sensors
5. Loeya M, Manogaran G, Hamed M, Tahad Nour N, Khalifa EM (2020) Fighting against
COVID-19: a novel deep learning model based on YOLO-v2 with ResNet-50 for medical
face mask detection. Elsevier
6. Kumar A, Kalia A, Verma K, Sharma A, Kaushal M (2021) Scaling up face masks detection
with YOLO on a novel dataset. Elsevier Access
7. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2020) SSDMNV2: a real time
DNN-based face mask detection system using single shot multibox detector and MobileNetV2.
Elsevier
8. Qin B, Li D (2020) Identifying facemask-wearing condition using image super-resolution with
classification network to prevent COVID-19. Springer access
9. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) A hybrid deep transfer learning
model with machine learning methods for face mask detection in the era of the COVID-19
pandemic. Elsevier
10. Qi R, Jia R-S, Mao Q-C, Sun H-M, Zuo L-Q (2019) Face detection method based on cascaded
convolutional networks. IEEE Access 7:110740–110748
11. Pei Z, Xu H, Zhang Y, Guo M, Yang Y-H (2019) Face recognition via deep learning using data
augmentation based on orthogonal experiments. Electronics
12. Zheng G, Xu Y (2021) Efficient face detection and tracking in video sequences based on deep
learning. Elsevier
13. Liu Q, Peng H, Chen J, Yang S (2020) Face detection based on open Cl design and image
processing technology. Elsevier
14. Li X, Yang Z, Wu H (2020) Face detection based on receptive field enhanced multi-task
cascaded convolutional neural networks. IEEE Access 8:174922–174930
15. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask
cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
16. Cabania A, Hammoudi K, Benhabiles H, Melkemi M (2020) MaskedFace-Net—A Dataset of
correctly/incorrectly masked face images in the context of COVID-19. Elsevier
Empirical Study on Categorized Deep
Learning Frameworks for Segmentation
of Brain Tumor
Abstract In the medical image segmentation field, automation is a vital step toward
illness detection and thus prevention. Once the segmentation is completed, brain
tumors are easily detectable. Automated segmentation of brain tumor is an important
research field for assisting radiologists in effectively diagnosing brain tumors. Many
deep learning techniques like convolutional neural networks, deep belief networks,
and others have been proposed for the automated brain tumor segmentation. The
latest deep learning models are discussed in this study based on their performance,
dice score, accuracy, sensitivity, and specificity. It also emphasizes the uniqueness
of each model, as well as its benefits and drawbacks. This review also looks at
some of the most prevalent concerns about utilizing this sort of classifier, as well
as some of the most notable changes in regularly used MRI modalities for brain
tumor diagnosis. Furthermore, this research establishes limitations, remedies, and
future trends or offers up advanced challenges for researchers to produce an efficient
system with clinically acceptable accuracy that aids radiologists in determining the
prognosis of brain tumors.

Authors Roohi Sille and Tanupriya Choudhury contributed equally and all are the first author.

R. Sille (B)
Systemics Cluster, University of Petroleum and Energy Studies (UPES), Dehradun,
Uttarakhand 248007, India
e-mail: [email protected]

T. Choudhury (B) · P. Chauhan
Informatics Cluster, University of Petroleum and Energy Studies (UPES), Dehradun,
Uttarakhand 248007, India
e-mail: [email protected]

P. Chauhan
e-mail: [email protected]

H. F. Mehdi
Department of Computer and Software Engineering, University of Diyala, Baquba, Iraq
e-mail: [email protected]

D. Sharma
School of Business and Management, Christ University, Delhi NCR Campus, Mariam Nagar,
Meerut Road, Delhi NCR, Ghaziabad 201003, India
e-mail: [email protected]
1 Introduction
Medical imaging is crucial for analyzing data about the human body. Various imaging
modalities, like computed tomography (CT) scans, X-rays, magnetic resonance
imaging (MRI), and so on, are utilized in order to identify various illnesses. CT scans
and MRI are the modalities most frequently used in diagnosis and clinical research.
The medical imaging approaches presented have certain advantages and
disadvantages. MRI has the following advantages over other imaging modalities:
high resolution, high signal-to-noise ratio, and soft tissue imaging capability [1].
Segmentation of medical pictures aids in the separation of distinct objects contained
in a medical image for better brain MRI analysis. The segmentation of brain MRI
data has been successfully automated using a variety of deep learning techniques,
because manual segmentation is laborious and has poor reproducibility [2]. Deep
learning algorithms have made transfer learning possible without a vast amount of
data or handmade features. They have the ability to extract the features of specific
brain MRI tissues automatically. Because of the intensity inhomogeneities in brain
MRI images, preprocessing is required before using a deep learning model to process
the image. Preprocessing improves the texture quality of images, allowing deep
learning approaches to do more accurate segmentation. Computerized diagnostics are
necessary to aid radiologists in clinical diagnosis. This allows a large number of cases
to be processed with the same precision and in less time. It has also been noticed that,
due to the overlap in intensity between the two groups, distinguishing non-healthy
tissues from healthy tissues is challenging. Recent studies have used deep neural
networks or convolutional neural networks to segment brain MRI data.
Different categories of brain datasets are publicly available for research work in
object detection, separation of gray matter, white matter, tumor segmentation, and
cerebrospinal fluid, among other things, owing to the extensive research being done
in automated brain tumor segmentation. According to the literature survey undertaken
for this publication, researchers have largely used the BraTS dataset from 2015 to
2021. T1-weighted CE-MRI, MRBrainS, and iSEG-2017 were among the other
datasets examined.
Focusing on qualitative measures like accuracy, specificity, sensitivity, and precision,
together with quantitative parameters like entropy (a measure of system disorder),
peak signal-to-noise ratio, and root mean square error (RMSE)/mean square error
(MSE), can help improve the efficiency of segmentation algorithms [2]. The
above-mentioned datasets were used to train various DL models, and their
performance was measured using dice score, mean IoU, Hausdorff distance, and other
metrics. Most of the time, dice scores are employed as the evaluation criterion.
In this paper, the benefits and drawbacks of various brain MR segmentation
algorithms are explored, with a focus on performance evaluation. DL algorithms have
been demonstrated to provide the expected outcomes for brain tumor segmentation
when compared with ML methods, and DL has a number of advantages over ML
techniques when applied to medical image segmentation [3]. Single-path and
multi-path CNNs, cascaded CNNs, fully convolutional networks (FCN), and fusion
approaches are the four categories of deep learning algorithms considered, and the
literature review is organized around these categories. Medical imaging developments
have made real-time segmentation of medical images possible, providing real-time
feedback on therapeutic decisions. This study examines the deep learning techniques
used to improve the computational speed and efficiency of medical image
segmentation in a real-time setting, in order to address all of the problems mentioned
above at once.
2 Related Works
Tumor segmentation is a vital and critical step in detecting and managing cancer.
However, accurately segmenting tumors is a challenging research problem because
of the characteristics of brain tumors and device noise. Brain tumor segmentation
approaches based on fully convolutional networks have shone brightly and received
a growing amount of attention with the recent success of deep learning.
F2 FCN is offered as a way to cut down on CNN training time and improve
segmentation accuracy. It is a new distributed and parallel computing concept based
on a hypergraph membrane technology. It has a feature reuse and conformance
module that extracts more valuable features, reduces noise, and improves the fusion
of multiple feature map levels [9].
nnU-net has been updated to include the most recent BraTS team suggestions for
post-processing, region-based training, and more aggressive augmentation methods.
Based on dice scores and Hausdorff distance, the nnU-net modification achieved
excellent performance results [10]. The following DSC and HD95 values are attained
with the proposed methodology (Table 1).
Table 5 Comparison between deep learning frameworks for brain tumor segmentation

S. No | Paper | Dataset | CNN architecture | Input | Performance parameter
1 | [19] | MRBrainS, iSEG-2017 | Single and multipath CNN | 35 × 35 × 35 | Mean IoU: 87.16
2 | [20] | BraTS 2018, 2019, 2020, 2021 | Single and multipath CNN | 72 × 72 × 72 | Dice scores: BraTS 2018: 77.71%, 79.77%, 89.59%; BraTS 2019: 74.91%, 80.98%, 88.48%; BraTS 2020: 72.91%, 80.19%, 88.57%; BraTS 2021: 77.73%, 82.19%, 89.33%
3 | [21] | BraTS 2020 | Two-path CNN | 2 × 128 × 128 × 128 | 0.891, 0.842, 0.816
4 | [9] | BraTS 2020 | Hybrid FCN | 4 × 128 × 128 × 128 | Dice scores: 0.78, 0.91, 0.85; HD: 26.57, 4.18 and 4.97
5 | [10] | BraTS 2020 | U-Net (FCN-based) | 32 × 128 × 128 × 128 | DSC: 88.95, 85.06 and 82.03; HD95: 8.498, 17.337 and 17.805
6 | [11] | BraTS 2017 and 2018 | U-Net (FCN-based) | 128 × 128 × 4 | NR (not reported)
7 | [12] | BraTS 2020 | Cascaded DNN | Patches of size 120 × 120 | DSC: 0.8858, 0.8297, 0.7900; HD: 5.32 mm, 22.32 mm, 20.44 mm
8 | [13] | BraTS 2015 | Cascaded CNN | NR (not reported) | DSC: 0.81, 0.76, 0.73
9 | [15] | T1-weighted CE-MRI | Cascaded LinkNet | 512 × 512 and 256 × 256 | Dice: 0.8003; Mean IoU: 0.9074
10 | [16] | BraTS 2015 | Skip connections with ResNets | 120 × 120 | 0.83, 0.65, 0.62
11 | [17] | BraTS 2018 and 2019 | Multipath CNN with FCN | 44 × 192 × 192 | DSC: BraTS 2019: 0.89, 0.78, 0.76; BraTS 2018: 0.90, 0.79, 0.77
12 | [18] | BraTS 2018 and 2019 | FCN and cascaded | 200 × 168 | BraTS 2018: 0.787, 0.886, 0.801; BraTS 2019: 0.751, 0.885, 0.776
3 Conclusion
Despite the fact that several deep learning models have been trained on a variety of
datasets, brain tumor segmentation remains a difficult task. The CNN models cannot
be trained using all of the trainable parameters connected to the affected tumor
because of the insufficient datasets available; the segmentation findings are inaccurate
as a result. In brain MRI imaging, data imbalance happens as a result of the
diminished volume of the tumor or lesion regions. The possibility of inaccurate
segmentation due to biased prediction also exists as a result of hand annotation. These
reasons allow for the use of generative adversarial networks (GANs) or adversarial
learning to replace CNN models [22–25]. GANs have the capability to annotate the
images required for training the models and can also be used to segment brain tumors
from different image modality scans.
References
1. Isa IS, Sulaiman SN, Mustapha M, Karim NKA (2017) Automatic contrast enhancement
of brain MR images using Average Intensity Replacement based on Adaptive Histogram
Equalization (AIR-AHE). Biocybern Biomed Eng 37(1):24–34
2. Battalapalli D, Rao BP, Yogeeswari P, Kesavadas C, Rajagopalan V (2022) An optimal brain
tumor segmentation algorithm for clinical MRI dataset with low resolution and non-contiguous
slices. BMC Med Imaging 22(1):1–12
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
4. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. Proc IEEE 86(11):2278–2324
5. Matsugu M, Mori K, Mitari Y, Kaneda Y (2003) Subject independent facial expression
recognition with robust face detection using a convolutional neural network. Neural Netw
16(5–6):555–559
6. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. In: 3rd International conference on learning representations, ICLR 2015—confer-
ence track proceedings, arXiv preprint arXiv:1409.1556
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition.
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
8. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 4700–4708
9. Jia H, Cai W, Huang H, Xia Y (2020) H 2 NF-Net for brain tumor segmentation using
multimodal mr imaging: 2nd place solution to BraTS challenge 2020 segmentation task. In:
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp 58–68
10. Isensee F, Jäger PF, Full PM, Vollmuth P, Maier-Hein KH (2020) nnU-net for brain tumor
segmentation. In: International MICCAI brainlesion workshop. Springer, Cham, pp 118–132
11. Zhang J, Lv X, Sun Q, Zhang Q, Wei X, Liu B (2020) SDResU-net: separable and dilated
residual U-net for MRI brain tumor segmentation. Curr Med Imaging 16(6):720–728
12. Silva CA, Pinto A, Pereira S, Lopes A (2020) Multi-stage deep layer aggregation for brain tumor
segmentation. In: International MICCAI brainlesion workshop. Springer, Cham, pp 179–188
13. Khan H, Shah PM, Shah MA, ul Islam S, Rodrigues JJ (2020) Cascading handcrafted
features and Convolutional Neural Network for IoT-enabled brain tumor segmentation. Comput
Commun 153:196–207
14. Chaurasia A, Culurciello E (2017) Linknet: Exploiting encoder representations for efficient
semantic segmentation. In: 2017 IEEE visual communications and image processing (VCIP).
IEEE, pp 1–4
15. Sobhaninia Z, Rezaei S, Karimi N, Emami A, Samavi S (2020) Brain tumor segmentation by
cascaded deep neural networks using multiple image scales. In: 2020 28th Iranian conference
on electrical engineering (ICEE). IEEE, pp 1–4
16. Ding Y, Gong L, Zhang M, Li C, Qin Z (2020) A multi-path adaptive fusion network for
multimodal brain tumor segmentation. Neurocomputing 412:19–30
17. Sun J, Peng Y, Guo Y, Li D (2021) Segmentation of the multimodal brain tumor image used
the multi-pathway architecture method based on 3D FCN. Neurocomputing 423:34–45
18. Tong J, Wang C (2022) A performance-consistent and computation-efficient CNN system for
high-quality automated brain tumor segmentation. arXiv preprint arXiv:2205.01239
19. Sun Q, Fang N, Liu Z, Zhao L, Wen Y, Lin H (2021) HybridCTrm: Bridging CNN and
transformer for multimodal brain image segmentation. J Healthc Eng
20. Akbar AS, Fatichah C, Suciati N (2022) Single level UNet3D with multipath residual attention
block for brain tumor segmentation. J King Saud Univ Comput Inf Sci
21. Wang Y, Zhang Y, Hou F, Liu Y, Tian J, Zhong C, … He Z (2020) Modality-pairing learning for
brain tumor segmentation. In: International MICCAI brainlesion workshop. Springer, Cham,
pp 230–240
22. Mishra M, Sarkar T, Choudhury T et al (2022) Allergen30: detecting food items with possible
allergens using deep learning-based computer vision. Food Anal Methods. https://doi.org/10.
1007/s12161-022-02353-9
23. Choudhury T et al (2022) Quality evaluation in guavas using deep learning architectures: an
experimental review. In: 2022 International congress on human-computer interaction, optimiza-
tion and robotic applications (HORA), pp 1–6. https://doi.org/10.1109/HORA55278.2022.979
9824
24. Arunachalaeshwaran VR, Mahdi HF, Choudhury T, Sarkar T, Bhuyan BP (2022) Freshness
classification of hog plum fruit using deep learning. In: 2022 International congress on human-
computer interaction, optimization and robotic applications (HORA), pp 1–6. https://doi.org/
10.1109/HORA55278.2022.9799897
25. Khanna A, Sah A, Choudhury T (2020) Intelligent mobile edge computing: a deep learning
based approach. In: Singh M, Gupta P, Tyagi V, Flusser J, Ören T, Valentino G (eds) Advances in
computing and data sciences. ICACDS 2020. In: Communications in computer and information
science, vol 1244. Springer, Singapore. https://doi.org/10.1007/978-981-15-6634-9_11
An Extensive Survey on Sentiment
Analysis and Opinion Mining:
A Software Engineering Perspective
Abstract Context: The authors have analyzed opinion mining and sentiments related
to software engineering and the sentimental issues software engineers are facing in
the current scenario. Objective: The authors identify the research challenges and gaps
related to sentiments and opinions and outline overall solutions to the research issues.
Conclusion: The authors of the current paper have analyzed the work done in various
research papers on sentiment analysis related to software engineering. In the software
engineering process, the authors include a step where the positive, negative and
neutral polarities of opinions and reviews are analyzed and classified. This process is
called sentiment analysis in software engineering. The authors give a systematic and
extensive survey on sentiment analysis and opinion mining.
1 Introduction
1. To write this article, the authors went through papers from IEEE, Springer, ACM,
GS, Elsevier, MDPI, Wiley, etc., and shortlisted 200 papers relevant to the topic,
out of which 59 papers were filtered based on abstract and contents; from these,
relevant works were identified and 45 papers were finally selected.
2. By analyzing the work in all the papers, the authors found 10 research questions,
which are mentioned in Section VII.
3. The authors plan to carry out research in analyzing sentiments and opinions
related to software engineering and to find the necessary solutions to the existing
challenges.
4. The authors have carried out a systematic literature review in order to produce
this paper.
tweets per day. Therefore, it is possible to extract the various opinions of people from
different backgrounds, which may help to improve services and products.
2 Literature Review
In Maks et al. [2], the authors introduced a model that explains the relationship
between actors in a sentence and thereby obtains the attitude of the actor. This work
explains the categorization of opinion mining and sentiment analysis. A more detailed
model is introduced by Strapparava and Valitutti [3, 4], who developed
WordNet-Affect, which defines direct synsets that elaborate emotions and indirect
synsets that include emotion carriers. Khairullah Khan et al. [1] performed sentiment
analysis at the sentence level. A Naive Bayesian classifier has been used for
word-level feature extraction. The semantic orientation of individual sentences is
obtained from contextual information. This method claims an accuracy rate of 83%
on average. Guzman et al. [5] studied the opinions and sentiments of commit
comments available in GitHub and gave evidence that projects with more teams have
more positive opinions as a result. The authors also observed that comments written
on Monday carry more negative opinion.
A study was conducted by Sinha et al. [6] on 28,466 projects, and these projects
were analyzed over a period of 7 years. This study revealed that most of the sentiments
that are supposed to be neutral tend to be negative on Tuesdays. Bo Pang et al. [7]
analyzed the classification of positive and negative tags. Document classification has
traditionally been done on a topic basis. The work shows that if the same topic-based
classification techniques are used, sentiment analysis will fail; therefore, a larger
number of techniques must be utilized in solving the opinion mining and sentiment
analysis problem. Jongeling et al. [8] compared four sentiment analysis techniques,
namely SentiStrength, NLTK, Stanford CoreNLP and AlchemyAPI, and evaluated
their performance. They found that none of the four sentiment analysis techniques
provides 100% accuracy and concluded that there is disagreement among the tools.
Figure 1 describes the procedure used to write this paper: from the papers gathered
from IEEE, Springer, ACM, GS, Elsevier, MDPI, Wiley, etc., 200 relevant papers
were shortlisted, 59 were filtered based on abstract and contents, and 45 papers were
finally selected.
3 Sentiment Analysis
lexicon-based and predefined rules. Automatic systems learn from machine learning
techniques. A hybrid sentiment analysis uses both.
Apart from identifying sentiments, opinion mining obtains polarity, which is
nothing but the degree of positivity or negativity. Moreover, sentiment analysis can
be applied to documents, paragraphs, sentences and sub-sentences.
Figure 2 shows the steps involved in sentiment analysis and opinion mining. The
first step is setting the goal. Then the text goes through a preprocessing stage, where
it is read and organized in a way convenient for the compiler. The next step is parsing,
where the text is divided into tokens. Then the text is refined as per the regulations.
In the final step, the filtered tokens are analyzed and scored.
reviews through public portals and social media as the reviews are ambiguous
and controversial.
lexicon for the sentiment classification problem, which is a list of positive terms like
beautiful, good, useful, etc., and a list of negative terms like bad, uncomfortable,
ugly, frustrated, etc.
When given a piece of text, the model counts the number of positive and negative
tokens and assigns the related sentiment (a minimal counting sketch of this idea
follows this list). If the input text contains more positive terms than negative terms,
it is tagged as positive; if it contains more negative terms, it is tagged as negative.
This technique has limitations. Words that do not appear in the lexicon are not
recognized, and the unrecognized words are isolated from the context.
2. Automated Systems (Based on Machine Learning): Automated systems use
machine learning algorithms that can predict sentiments from past observations.
In this approach, researchers need a data set with included tags, termed training
data. During the training process, text data is converted into vectors and patterns
are identified so that vectors are associated with predefined tags ("Positive",
"Negative" and "Neutral"). Once the related data is fed, the automated system
starts producing its own predictions, which classify the unseen data. In this way,
one can improve the accuracy of such models with a larger number of tagged
examples.
3. Hybrid Systems: Hybrid systems merge both rule-based and machine
learning-based approaches. A hybrid system tries to learn and detect sentiments
from tagged examples and then verifies the results with the lexicon, which
improves accuracy. The main goal is to get the best possible outcome and to
overcome the limitations of each individual approach.
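As a minimal illustration of the rule-based counting described in item 1 above, the sketch below counts lexicon hits; the word lists are tiny placeholders drawn from the examples given, not an actual sentiment lexicon.

```python
POSITIVE = {"beautiful", "good", "useful"}       # toy lexicon from the examples above
NEGATIVE = {"bad", "uncomfortable", "ugly", "frustrated"}


def lexicon_sentiment(text):
    """Tag text as Positive/Negative/Neutral by counting lexicon hits."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"


print(lexicon_sentiment("The new build is good and useful but the UI is ugly"))
# -> "Positive" (2 positive hits vs 1 negative); words outside the lexicon are ignored
```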
Figure 3 shows the sentiment analysis classification. Data is initially fed to the system
and goes through a pre-processing stage; then a lexicon-based approach is applied to
the pre-processed data. The analyzed data is fed to a sentiment classifier and the
opinions and sentiments are extracted. On some occasions, the opinions and
sentiments are extracted without using a sentiment classifier. Finally, the classified
data is obtained in the last step.
Fig. 3 Sentiment classification
10 Conclusion
Sentiment analysis is essentially a machine learning problem, and many researchers
are interested in carrying out research in this area. In this literature survey, the authors
have highlighted the work done to solve sentiment analysis problems in machine
learning, and it can be studied further. Although notable work has been done in this
field, completely automated systems have not been introduced until now, which may
be due to the unstructured nature of natural language. The authors also conclude that
opinions and sentiments are controversial and ambiguous, thereby adding more
complexity to sentiment analysis.
References
1. Khairullah Khan B, Khan A (2010) Sentence based sentiment classification from online customer
reviews. In: ACM, 2010
2. Maks I, Vossen P (2012) A lexicon model for deep sentiment analysis and opinion mining
applications. Decis Support Syst 53(4):680–688
3. Strapparava C, Valitutti SA (2004) WordNet-affect: an affective extension of WordNet. In:
Proceedings LREC 2004, Lisbon, Portugal, 2004
4. Valitutti A, Strapparava C (2010) Interfacing wordnet-affect with OCC model of emotions. In:
Proceedings of EMOTION-2010, Valletta, Malta, 2010
5. Sinha V, Lazar A, Sharif B (2016) Analyzing developer sentiment in commit logs. In: Proceedings
of MSR 2016 (13th international conference on mining software repositories). ACM, pp 520–523
6. Jongeling R, Sarkar P, Datta S, Serebrenik A (2017) On negative results when using sentiment
analysis tools for software engineering research. Empir Softw Eng 2017:1–42
7. Bo Pang SV, Lee L (2002) Thumbs up? Sentiment classification using machine learning tech-
niques. In: Proceedings of the conference on empirical methods in nat ural language processing
(EMNLP), ACL, July 2002, pp 79–86
8. Padhy N, Panigrahi R, Satapathy SC (2019) Identifying the reusable components from
component-based system: proposed metrics and model. Springer, pp 89–99
9. Guzman E, Az´ocar D, Li Y (2014) Sentiment analysis of commit comments in GitHub: an
empirical study. In: Proceedings of MSR 2014 (11th working conference on mining software
repositories). ACM, pp 352–355
Feature Enhancement-Based Stock
Prediction Strategy to Forecast the Fiscal
Market
1 Introduction
The forecast of the stock market has piqued the interest of both academics and those
in finance. The problem persists: "To what degree can the price history of a common
stock be utilized to generate reliable forecasts about the stock's future price?" [1].
Earlier research on forecasting relied on the Efficient Market Hypothesis (EMH) and
the random walk hypothesis [1, 2]. These older models said that stock markets cannot
be anticipated because they are affected by news rather than current market prices.
Because of this, stock prices will move in a way that is hard to predict with more
than 50% accuracy [3]. In contrast, a growing number of studies [4–14] present data
that contradicts the EMH and random walk hypotheses.
Stock market forecasting is critical in the financial business because a reasonably
accurate assessment can earn a lot of money and protect against market risks [7, 8,
12]. Regardless of how predictable the stock market is, it is still hard to predict how
the price of stocks will move. This is because the financial sector is an extremely
complicated, emergent, and highly nonlinear system that interacts with political
trends, the financial environment, and stockholders' assumptions [12]. Being able to
accurately predict stock prices in the short term and long term remains very important,
because this is one of the most interesting and important research topics in the
investment field, and the drive to overcome inaccurate predictions encourages
researchers to come up with new and better tools and techniques. In the broad sense,
there are two ways to figure out how the stock market will go. These two methods
are called "fundamental analysis" and "technical analysis." The first looks at economic
factors to figure out how much a stock is worth, while the second looks at past stock
prices to do so. It is a huge field, and new techniques are being developed each day,
notably in the field of automatic feature learning, which involves a lot of work.
Information and a framework are the two main parts of ML. When extracting hidden
features [15], it is always good practice to select only those features whose results
have some contributive meaning, because potential features give more accuracy to a
model during model building. The prime objective of feature engineering is not only
to reduce the dimension but also to find the potential features for the predictive model.
Researchers in [16] used machine learning to make predictions about the stock and
were pleased with the results.
In addition, there have been a lot of studies that used feature engineering, but none
of them had anything to do with stock prediction. Scholars in [17] used feature
extraction to diagnose faults in induction motors. Researchers in [18] came up with
a semantic feature framework for concurrent engineering. Another study used
gradient boosting to create new features for energy theft detection and discovered
useful pairings from the original features [19]. The authors of [20] came up with a
way to use AETA data to predict short-term earthquakes. In [21], researchers looked
into how to make search ads easier to recognize. Based on prior investigation, it can
be seen that there are not many studies that used feature extraction to predict the
stock price. So, this study tries to come up with a new way to predict stock prices
daily. It is important to point out that our study was the first to look at and use feature
extraction for stock prediction with ensemble methods.
The remainder of the article is organized as follows: Sect. 2 presents our research
framework, Sect. 3 the model implementation, Sect. 4 the discussion, and the final
section the conclusion.
2 Research Framework
Our research plan is made up of five main steps, namely collecting datasets,
preprocessing data, designing features, making a model, and evaluating the model
(Fig. 1).
For our practical experiment, we collected a 5-year dataset of daily ITC stock prices,
downloaded from the publicly available Yahoo! Finance website. The period of the
dataset is from April 27th, 2017 to April 26th, 2022. Our original dataset (ITC)
contains 1235 records of daily historical transaction data. Each record has six
pre-existing features (a short download sketch follows the list):
a. Date: the date of each trading day.
b. Close: the final (closing) price of the stock on each trading day.
c. Volume: the total number of shares bought and sold on that trading day.
d. Open: the opening price of the stock on each trading day.
e. High: the highest value of the stock on that particular trading day.
f. Low: the lowest value of the stock on that particular trading day.
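For reproducibility, daily historical data of this kind can be pulled programmatically. The sketch below uses the yfinance package and assumes the NSE ticker "ITC.NS", which is our guess for the ITC listing rather than something stated in the paper; the output file name is also a placeholder reused in later sketches.

```python
import yfinance as yf

# Download ~5 years of daily ITC data (ticker symbol is an assumption)
data = yf.download("ITC.NS", start="2017-04-27", end="2022-04-27")

print(data.shape)             # roughly 1235 trading-day records
print(data.columns.tolist())  # ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
data.to_csv("ITC.csv")        # saved for the preprocessing steps below
```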
Data preprocessing is the act of converting raw data into a standard format that
machine learning models can easily learn from. After receiving the pre-existing
historical data, we have to find the missing values and clean up the existing records.
Feature engineering is an essential step to identify the potential features and hidden
behavior of our dataset, so in this segment we find out the real features that are
essential and fruitful for our experiment. During our feature enhancement phase, we
selected the pre-existing features 'Open', 'Volume', 'Close', 'High', and 'Low' as
independent variables, whereas 'Close' is treated as the dependent variable. During
the feature building phase, we found that the independent features selected by us
suffered from multicollinearity issues.
are selected by us suffered from multicollinearity issues.
When the independent variables in a regression model are linked to each other,
this is called multicollinearity. This is problematic since the independent variables
should be kept distinct. If the correlation between variables is high enough, it can be hard to fit the model and to interpret the results. In particular, it becomes difficult to tell which independent variables actually affect the dependent variable when they move together in the regression model.
The concept is that we may alter the value of one independent variable while
leaving the rest unchanged. On the other hand, the correlation between independent
variables suggests that changes in one variable are connected with changes in another.
Correlation strength reveals how hard it is to change one variable without influencing
another. Since the independent variables tend to move together, it’s hard for the model
to figure out how each independent variable affects the dependent variable on its own.
Multicollinearity is classified into two types. Structural multicollinearity arises when we construct new features from the data itself rather than from the sampled data. Data multicollinearity is already present in the features of the dataframe and is much harder to see; in this case the multicollinearity lies in the data itself and is not caused by our framework.
If we identify the independent variables that cause multicollinearity and measure the strength of their correlation, we can fix this issue. Two common techniques for detecting it are listed below, followed by a small illustrative sketch.
1. Correlation coefficient (Heat map).
2. Variance Inflation Factors (VIF) (Fig. 2)
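As a rough illustration of these two checks (not the authors' exact code; the file name and column names are assumptions), the following sketch draws a correlation heat map with seaborn and computes VIF values with statsmodels:

```python
# Sketch: multicollinearity checks on the pre-existing ITC features (assumed CSV layout)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("ITC.csv")                          # hypothetical file name
features = df[["Open", "High", "Low", "Close", "Volume"]].dropna()

# 1. Correlation coefficient (heat map)
sns.heatmap(features.corr(), annot=True, cmap="coolwarm")
plt.show()

# 2. Variance Inflation Factor; values above 10 flag severe multicollinearity
X = add_constant(features)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                index=X.columns)
print(vif.drop("const"))
```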
The heat map shows that the features selected from the pre-existing dataset are highly correlated with each other (correlation values close to 1), except for Volume, so this issue must be resolved before modelling.
Table 1 shows that the VIF of every feature except Volume exceeds 10, and VIF values above 10 are considered unacceptable. This confirms that the selected combination of features is not suitable on its own. We therefore extracted two additional hidden features, known as technical indicators, as shown below.
Fig. 3 Heat map of two pre-existing features and two derived features of ITC stock
3 Model Implementation
In this phase, we selected several base machine learning models along with some ensemble models. The algorithms are listed below, and an illustrative training sketch follows the list:
1. Linear Regression
2. Lasso Regression
3. SVR
4. KNN
5. GradientBoostingRegressor
6. BaggingRegressor
7. HistGradientBoostingRegressor
8. LGBMRegressor
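The sketch below is a hedged illustration of how such a set of models could be assembled with scikit-learn and LightGBM; the feature matrix X and target y are assumed to come from the feature-engineering stage, and all hyperparameters are placeholders rather than the authors' settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              HistGradientBoostingRegressor)
from lightgbm import LGBMRegressor

# X: engineered feature matrix, y: next-day closing price (assumed prepared earlier)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, shuffle=False)

models = {
    "Linear regression": LinearRegression(),
    "Lasso regression": Lasso(alpha=0.01),                    # alpha is a placeholder
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "GradientBoostingRegressor": GradientBoostingRegressor(),
    "BaggingRegressor": BaggingRegressor(),
    "HistGradientBoostingRegressor": HistGradientBoostingRegressor(),
    "LGBMRegressor": LGBMRegressor(),
}
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}
```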
After building models that can predict future prices, we have to find the best model among them. For this purpose, we use the root mean square error (RMSE) and the coefficient of determination (R²). The forecasting error is expressed by the following equation.
\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{Y}_t - Y_t\right)^{2}} \tag{1}
\]
where $\hat{Y}_t$ is the predicted value, $Y_t$ is the actual value, n is the total number of predicted samples, and t denotes the time period.
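Continuing the earlier sketch (and assuming its fitted models and held-out test split), RMSE from Eq. (1) and R² can be computed with scikit-learn as follows:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

for name, model in fitted.items():
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))   # Eq. (1)
    r2 = 100 * r2_score(y_test, pred)                  # reported as a percentage
    print(f"{name}: R2 = {r2:.2f}, RMSE = {rmse:.3f}")
```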
In this section, we discuss the findings of our experiment. Our starting point was the raw dataset described in Sect. 2.1. After obtaining the raw dataset, we carried out feature engineering to find the best combination of features for forecasting the next day's closing price. Finding the best features is a tedious job, so we applied multicollinearity analysis and VIF to identify the combination with the most importance for model building. After engineering the features, the final combination is shown in Table 2. To find the best predictive model and to allow comparison, we use eight different algorithms: four base-level machine learning algorithms (Linear Regression, Lasso Regression, SVR, and KNN) and four ensemble techniques (GBR, BR, HGBR, and LGBM).
After training the above models on 70% of the data and testing on the remaining 30%, we obtained the results summarized in Table 3: linear regression achieved an R² of 99.18 with an RMSE of 3.451, Lasso regression 99.17 and 3.459, SVR 99.23 and 3.429, KNN 99.12 and 5.859, the GradientBoostingRegressor 99.10 and 3.53, the BaggingRegressor 99.08 and 3.60, the HistGradientBoostingRegressor 98.76 and 3.72, and the LGBMRegressor 98.63 and 3.81.

Table 3  Performance evaluation of all models of ITC stock

Models                           R square   RMSE
Linear regression                99.18      3.451
Lasso regression                 99.17      3.459
SVR                              99.23      3.429
KNN                              99.12      5.859
GradientBoostingRegressor        99.10      3.53
BaggingRegressor                 99.08      3.60
HistGradientBoostingRegressor    98.76      3.72
LGBMRegressor                    98.63      3.81

Considering the best-performing model, SVR does slightly better than the other models.
This study’s goal was to assess machine learning-based forecasting skills to identify
difficulties related to intraday trading transactions. As shown in Table 3, while the
RMSE for each model is increasing, the advanced approach (ensemble techniques)
is unable to give an appropriate improvement over traditional procedures. So it is
clearly understood that if we properly find out the best combination of features
that impact our machine learning model, even if baseline machine learning models
also perform well. So in our future research, we will expand this study in terms
of feature engineering to find the best combination of features that really impact a
heterogeneous dataset with a heterogeneous combination of algorithms.
References
8. Farias Nazário RT, e Silva JL, Sobreiro VA, Kimura H (2017) A literature review of technical
analysis on stock markets. Q Rev Econ Financ 66:115–126
9. Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-
imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
10. Wang L, Wang Z, Zhao S, Tan S (2015) Stock market trend prediction using dynamical Bayesian
factor graph. Expert Syst Appl 42(15):6267–6275
11. Moghaddam AH, Moghaddam MH, Esfandyari M (2016) Stock market index prediction using
artificial neural network. J Econ Financ Adm Sci 21(41):89–93
12. Nayak A, Pai MMM, Pai RM (2016) Prediction models for Indian stock market. Proc Comput
Sci 89:441–449
13. Weng B, Ahmed MA, Megahed FM (2017) Stock market one-day ahead movement prediction
using disparate data sources. Expert Syst Appl 79:153–163
14. Zhao Y, Li J, Yu L (2017) A deep learning ensemble approach for crude oil price forecasting.
Energy Econ. 66:9–16
15. Khurana U, Turaga D, Samulowitz H, Parthasrathy S (2016) Cognito: automated feature engi-
neering for supervised learning. In: 2016 IEEE 16th International conference on data mining
workshops (ICDMW), 2016, pp 1304–1307
16. Long W, Lu Z, Cui L (2019) Deep learning-based feature engineering for stock price movement
prediction. Knowledge-Based Syst 164:163–173
17. Panigrahy PS, Santra D, Chattopadhyay P (2017) Feature engineering in fault diagnosis of induction motor. In: 2017 3rd International conference on condition assessment techniques in electrical systems (CATCON 2017), pp 306–310
18. Liu YJ, Lai KL, Dai G, Yuen MMF (2010) A semantic feature model in concurrent engineering.
IEEE Trans Autom Sci Eng 7(3):659–665
19. Punmiya R, Choe S (2019) Energy theft detection using gradient boosting theft detector with
feature engineering-based preprocessing. IEEE Trans Smart Grid 10(2):2326–2329
20. Huang J, Wang X, Yong S, Feng Y (2019) A feature engineering framework for short-term
earthquake prediction based on AETA data. In: Proceedings of 2019 IEEE 8th joint international
information technology and artificial intelligence conference, ITAIC 2019, 2019, pp 563–566
21. Sun Y, Yang G (2019) Feature engineering for search advertising recognition. In: Proceed-
ings of 2019 IEEE 3rd information technology, networking, electronic and automation control
conference, ITNEC 2019, 2019, pp 1859–1864
22. TA-LIB: Technical analysis library. Available online: www.ta-lib.org. Accessed on 10 Jan 2022
Early Prediction of Diabetes Mellitus
Using Intensive Care Data to Improve
Clinical Decisions
Abstract Insulin deficiency causes diabetes mellitus (DM), which can lead to multi-
organ failure in patients. Insufficient data is always a threat for detection and diagnosis
of health disorders. Intensive Care Units (ICUs) do not have verified medical histories
of their patients. This paper presents and elaborates the most significant analysis done
during WIDS Datathon 2021. Data from the first 24 h of ICU admission was used to
build a model that can identify if a patient has been diagnosed with a particular type
of diabetes during admission to an ICU. The work focuses on the Diabetes Mellitus type and on discovering a competent classifier that obtains the most accurate result, particularly in comparison with clinical outcomes. For analytic and comparative purposes, four
algorithms were used. The optimum result was obtained using the LGBM classifier
with roc_auc_score of 0.871. The evaluation is done using Stratified threefold cross-
validation and the predictions for the test set won accolades in the Kaggle hackathon.
1 Introduction
During the COVID-19 pandemic, monitoring the overall health of the public has become very critical. It paved the way for digitalization in healthcare to obtain faster health analytics, and advances in AI/ML came in handy at the right time. It is now possible to detect, and thereby treat, diabetes in its early stages through automated techniques. A patient may not always be able to provide information about chronic ailments such as injuries or heart diseases, and medical records can take many days to be transferred from another medical service provider. The clinical decisions can
2 Literature Survey
The following are various works done in this area which are found useful for designing
our approach.
Diabetes mellitus (DM), caused by insufficient insulin, can result in multi-organ failure in people. Thanks to advancements in AI/ML, it is now possible to identify and diagnose diabetes in its early stages using automated methods that are more effective than manual diagnosis [1]. In order to perform and analyze tasks effectively, data must be structured. The data were checked for missing information, and diabetes cases are represented by a 1 or a 0. During the data analysis, it was discovered that there were a fair number of instances with a zero
value. Data imputing was used to address missing or zero values in the dataset [2].
The proposed Logistic Regression model has an AROC of 84.0% and a sensitivity
of 73.4% compared to the suggested GBM model’s 84.7% AROC and 71.6% sensi-
tivity. GBM and Logistic Regression models perform worse than Random Forest
and Decision Tree models [3]. Missing data can be handled in a variety of ways,
as detailed in a large body of research. There are three approaches to dealing with missing data: strategies that disregard the missing data, imputation of the missing data, and modelling based on the missing data. This study concentrates mainly on missing data imputation methods [4].
XGBoost and pGBRT are two helpful versions of the well-known machine learning
algorithm known as the Gradient Boosting Decision Tree (GBDT). Experiments on
several publicly available datasets show that Light GBM speeds up the training of
traditional GBDT by up to 20 times while keeping roughly the same accuracy [5].
The suggested approach analyses the features in the dataset and picks the fittest features based on correlation values [6]. Random Forest, which consists of a number of decision trees, can act as a dimensionality reduction approach; in other words, it is an ensemble approach for classification, regression, and related tasks, and a tool for ranking the significance of factors [7].
[7]. XGBoost is a type of boosting-based ensemble learning method. The idea behind
XGBoost is to use an iterative computation of the CART decision tree classifier to
get accurate prediction results quickly. By fusing a linear model with a tree learning
model, XGBoost is an optimization model that improves the gradient boost tech-
nique. It is highly precise and utilized to solve a number of real-world problems
[8]. The Random Forest model that was built might be used to help doctors diag-
nose diabetes. Other measures, such as classification time, might also be employed
to assess the present research’s performance. Given the positive outcomes gained
with Random Forests, this method can be used to help with pediatric emergency
management [9]. Feature selection has already been used to improve classification
performance in a variety of medical scenarios. In the current study, the technique
for determining the contribution of each characteristic based on its relevance is also
important. The most common method for locating such obscured patterns in data is
correlation analysis. Although correlation was not included in the prediction chal-
lenge, the ranking correlations can be used to enhance our findings at a later stage of
model development [10].
In the provided training data as shown in Fig. 1, 22% of patients are diagnosed
with diabetes.
The dtypes of the 180 columns were obtained, which helped in further calculations and analysis. The count of each type is shown in Table 1.
The number of unique classes in each data column are found for investigating
the correlations and balancing between the attributes. There are 6 ethnicity classes,
2 gender classes, 15 hospital_admit_source classes, 5 icu_admit_source classes, 3
icu_stay_type classes, and 8 icu_type classes.
The box plot shown in Fig. 2 depicts the age distribution for men and women,
excluding diabetic patients.
1. The majority of persons diagnosed with diabetes are between the ages of 60 and
70.
2. Males account for the most positive instances.
3. There is one age 0 value, which is an anomaly.
The visualization in Fig. 3 shows that African Americans have the largest number of diabetes-positive cases.
The initial goal of our data analysis was to find the more relevant features in order
to focus our preprocessing on them. The more important features are usually those
with a higher correlation with the target variable of interest. Figure 4 shows a ranked
histogram that shows how the 15 most essential features influence class prediction.
Correlation matrices help to discover answers fast. They are used to examine the interdependence of numerous variables at once and to determine which variables in a data table are the most related. The values of the correlation coefficients are indicated by the shading in Fig. 5.
4 Preprocessing
The crucial aspect in this dataset is the amount of missing values for some variables.
There are 160 columns that have missing values. The irrelevant columns and the
columns with high rate of missing values are also dropped as they could introduce
noise into the dataset.
The missing gender values were predicted with logistic regression from the patient's age, height, and weight. Mean values of weight and height, aggregated by gender, ethnicity, and age, were used to produce lookup tables, and these lookup tables were used to fill in the missing values for height and weight; a rough sketch of this imputation idea is given below.
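The sketch below illustrates this imputation idea; the file and column names follow the public WiDS data dictionary and are assumptions, not the authors' exact code.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("training_v2.csv")                  # hypothetical file name

# Predict missing gender from age, height and weight
known = df.dropna(subset=["gender", "age", "height", "weight"])
clf = LogisticRegression(max_iter=1000).fit(known[["age", "height", "weight"]],
                                            known["gender"])
mask = df["gender"].isna() & df[["age", "height", "weight"]].notna().all(axis=1)
df.loc[mask, "gender"] = clf.predict(df.loc[mask, ["age", "height", "weight"]])

# Lookup tables of mean height/weight aggregated by gender, ethnicity and age
lookup = df.groupby(["gender", "ethnicity", "age"])[["height", "weight"]].transform("mean")
df["height"] = df["height"].fillna(lookup["height"])
df["weight"] = df["weight"].fillna(lookup["weight"])
```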
Duplicate columns or functionally similar columns in the dataset are removed. The invasive and noninvasive variables are removed from the dataset because they are redundant with respect to diasbp, sysbp, and mbp and have a high rate of missing values. Columns such as hospital id and encounter id are no longer needed: because there were no repeat patient visits, the encounter id was irrelevant to our models, and the hospitals in the annotated dataset do not overlap with the hospitals in the unlabeled dataset. Furthermore, the readmission_status column is dropped because it has only one unique value of 0. The dataset relates to young adults and adults aged 16 and older; however, there are 30 data points with age = 0 in the training data, which are dropped for the initial analysis. They account for only about 0.02% of the data, so the loss is negligible.
This dataset also has categorical columns such as 'ethnicity', 'gender', 'hospital_admit_source', and 'icu_admit_source'. These features are encoded with the
categorical encoding technique of one-hot encoding, since they are nominal (they do not contain any order). In one-hot encoding, a new variable is constructed for each level of a categorical feature, as in the sketch below.
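A minimal sketch of this encoding step with pandas (column names assumed):

```python
import pandas as pd

categorical_cols = ["ethnicity", "gender", "hospital_admit_source", "icu_admit_source"]
df = pd.get_dummies(df, columns=categorical_cols)    # one new 0/1 column per category level
```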
5 Methodology
Based on data from the first 24 h of critical care, a model was created in this work
using Logistic Regression, Random Forest, XGBoost, and Light GBM classifiers to
predict the likelihood that a patient has been diagnosed with Diabetes Mellitus.
One of the most fundamental and widely used Machine Learning techniques for
binary classification is logistic regression. The link between one dependent binary
variable and independent variables is described and approximated using logistic
regression. The Logistic Regression model’s roc auc score is 0.638. The most often
used classification technique is Random Forest. Random Forest builds several deci-
sion trees and combines them to get a more precise and trustworthy prediction. It
takes less training time as compared to other algorithms. The roc_auc_score of the
Random Forest model is 0.824. Therefore, Random Forest can be considered as a
good predictor for diabetes. XGBoost is a distributed gradient boosting library built
with efficiency, versatility, and portability in mind. XGBoost efficiently handles the
missing value and has in-built cross-validation capability which makes it a great
choice for large datasets and classification problems. XGBoost performed very well
with roc_auc_score of 0.848.
A decision tree-based gradient boosting system called Light GBM can be utilized
for a variety of machine learning applications, including ranking and classification.
The execution time for model training differs significantly when Light GBM is used
in place of XGBOOST, despite the fact that accuracy and auc score only slightly
improve. For handling enormous datasets, Light GBM is a considerably superior
approach that is roughly seven times faster than XGBOOST. When working on
enormous datasets in a short amount of time, this proves to be a great benefit. By
comparing all these models, it is evident that Light GBM performs best and has the
highest roc_auc_score of 0.871.
Light GBM uses the leaf-wise tree growth algorithm instead of the depth-wise tree
development method, which is used by many other widely used approaches. In
comparison to the depth-wise technique, the leaf-wise algorithm can converge much
more quickly. However, leaf-wise growth may lead to over-fitting if appropriate parameters are not used. A minimal cross-validation sketch of the Light GBM model is given below.
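The following is a hedged sketch of such a Light GBM model evaluated with stratified threefold cross-validation on ROC AUC; the target column name and hyperparameters are assumptions, and the feature matrix is assumed to be the imputed, encoded data prepared above.

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = df.drop(columns=["diabetes_mellitus"])           # target column name assumed
y = df["diabetes_mellitus"]

model = LGBMClassifier(n_estimators=500, learning_rate=0.05)   # placeholder hyperparameters
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("Mean roc_auc_score:", scores.mean())
```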
The area under the receiver operating characteristic (ROC) curve between the predicted and observed target (diabetes mellitus diagnosis) was used for evaluation when submitting to the leaderboard. The ROC curve in Fig. 6 shows the performance of a classification model across all classification thresholds.
Initially, a validation set was created from the training set with test_size = 0.20. The roc_auc_scores of Logistic Regression, Random Forest, Light GBM, and XGBoost were 0.649, 0.829, 0.844, and 0.848, respectively. The Light GBM parameters were then tuned to obtain higher scores. There are two types of cross-validation: k-fold and stratified
k-fold. The k-Fold cross-validation method is used to divide the dataset into k folds.
To guarantee that each fold of the dataset has the same percentage of observations
with a particular label, the stratified k-fold is utilized. Figure 7 shows the comparison
of various classifiers used.
Stratified threefold cross-validation yielded the best results. The score on the private leaderboard is 0.87278, which is calculated with approximately 30% of the test data.
7 Conclusion
Our study provides medics with a means to determine whether or not a patient has diabetes. Understanding patients' health problems in an ICU and improving clinical judgements is the key subject of our research. The LGBM classifier outperformed all other models, providing an outstanding insight into this study.
The team that submitted this work stood at 85th position on the global leaderboard and secured 3rd position in the Hyderabad region for the WiDS Datathon 2021 conducted on the Kaggle platform.
References
1. Chaki J, Ganesh ST, Cidham SK, Theertanb SA. Machine learning and artificial intelligence
based diabetes mellitus detection and self-management: a systematic review
2. Sarwar MA, Kamal N, Hamid W, Shah MA. Prediction of diabetes using machine learning
algorithms in healthcare
3. Lai H, Huang H, Keshavjee K, Guergachi A, Gao X. Predictive models for diabetes mellitus
using machine learning techniques
4. Houari R, Bounceur A, Tari AK, Kecha MT (2014) Handling missing data problems with
sampling methods. In: 2014 International conference on advanced networking distributed
systems and applications, pp 99–104
5. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu F. LightGBM: a highly efficient
gradient boosting decision tree. In: NIPS
6. Sneha N, Gangil T (2019) Analysis of diabetes mellitus for early prediction using optimal
features selection. J Big Data 6(13)
7. Choudhury A, Gupta D (2019) A survey on medical diagnosis of diabetes using machine
learning techniques. In: Springer recent developments in machine learning and data analytics,
pp 67–78
8. Xu Z, Wang Z (2019) A risk prediction model for type 2 diabetes based on weighted feature selection of random forest and XGBoost ensemble classifier. In: 2019 IEEE eleventh international conference on advanced computational intelligence (ICACI), pp 278–283
9. Benbelkacem S, Atmani B (2019) Random forests for diabetes diagnosis. In: IEEE 2019
International conference on computer and information sciences (ICCIS), pp 1–4
10. Ahmad HF, Mukhtar H, Alaqail H, Seliaman M, Alhumam A. Investigating health-related features and their impact on the prediction of diabetes using machine learning
Authorship Identification Through
Stylometry Analysis Using Text
Processing and Machine Learning
Algorithms
Abstract The project aims to detect the identity of an anonymous author of a defamatory blog post or comment. A dataset of samples containing a list of authors is acquired, and the anonymous author is then predicted using a custom machine learning model. The main task in this proposal is to build an authorship analysis model that will match a sample to the defamatory blog post and reveal the anonymous author. Text preprocessing methods along with machine learning algorithms such as the SGD classifier are employed. Stylometry analysis gives clarity about text information such as text length, vocabulary, and writing style, so the technique can be used for authorship attribution. The project consists of building a model that can learn authorship style and then scaling the model to handle hundreds of such cases. Stylometry analysis plays a major role in this project. An accuracy of 79% is obtained with 40 classes, which improves with a smaller number of classes.
1 Introduction
The aim of the work is to identify the true author of an anonymous post, which could be a defamatory blog post or a comment. The sample dataset contains a list of existing authors, and we try to detect the closest matching author of the post from this dataset through a tested machine learning model. Authorship analysis comes under text mining. This work follows the approach of breaking the text into useful tokens and building predictive models to classify new text. Authorship identification uses a different approach to deal with text; there is a need to perform content analysis
and writing style first. The attempt to identify the author irrespective of the content is called "Stylometry Analysis."
Stylometric analysis is the study of linguistic style. It can be applied to written texts, music, and fine arts. It is based on the intuition that authors have a consistent style of writing, such as their usage of vocabulary and punctuation, which can be analyzed statistically. The analysis uses input features such as frequency distributions, word length, n-grams (word and character), sentence length, part-of-speech tags, content words, and function words.
For example:
• Every human has their own style of writing, which is visible in their vocabulary (rich or poor). The quality of a vocabulary is usually associated with its size, which may not always be the case: the 1954 Nobel Prize winner for Literature, Ernest Hemingway, is well known for using a small number of words in his writing.
• The length of sentences and the use of clauses vary among authors.
• No two people use punctuation the same way.
The goal of optimization is to solve a real-world problem by obtaining the best possible result. The optimization procedure in machine learning is different: generally, we alter the data features while optimizing and locate the most efficient dataset in the process. In machine learning, we optimize on the training data and compare against new validation data to see how well the model performs. Gradient descent is the most widely utilized optimization technique in machine learning. The Stochastic Gradient Descent (SGD) algorithm is used to fit linear classifiers and regressors with convex loss functions. SGD is mainly used on large-scale datasets, such as the one in this project, i.e., text mining. SGD helps to fit linear classifiers over the text features and is used here to build the feature matrix for author similarity, so SGD is the better technique for this project.
Plain gradient descent runs slowly on large datasets because it needs the entire training dataset for each update. Hence SGD, a variant of this algorithm, is used: a few samples are randomly selected from the dataset for each iteration, which is called a "mini-batch". If the dataset is redundant, the gradient on the first half is nearly identical to that on the second half. Computing the gradients for several samples simultaneously requires matrix multiplications, so GPUs are preferred to improve efficiency.
2 Literature Survey
3 Proposed System
Fig. 1
Flowchart/architecture
4 Preprocessing
“The Blog Authorship Corpus” dataset was used from bloggers.com website. The
corpus consisted of 681,288 posts and over 140 million words, which came up to
approximately 7 k words per person. This is a real dataset from genuine blogs. The
dataset has required attributes [11] which need not do the preprocessing techniques.
They have been labeled with anonymous author ids. There are other features attached
to the documents such as age, gender, astrological sign and industry.
To load the XML from the dataset folder, we used the glob library which helps in
loading the data into the list with regex expression provided along with the file path
in the runtime. The dataset contains all posts in XML files each for an author. We
have multiple authors leading this to be multi-classification problem. Loading the
dataset is a common step for all the models. However, the steps succeeding these
steps vary basing on the model type.
The files are in XML format, which made preprocessing a bit more challenging than anticipated. The XML files are first gathered through the glob library from the blogs folder. Different encodings are present throughout the files, and files with improper encoding are removed. The Beautiful Soup library is used to parse the XML files and obtain the posts as strings, and these strings are then preprocessed.
Stylometric analysis is a way of understanding the author's style of writing [13], so in this project we cannot apply heavy preprocessing, since we might lose the author's style. However, there is some unnecessary text to remove from each file: some files and posts contained unnecessary information such as URL links, so we scanned through each file and removed these links.
After preprocessing, the strings are wrapped into post objects using a custom post class. The post objects cannot be used for classification directly, so they are converted into a data frame. Handling such a large data frame is time consuming, which led us to create a compressed version of the data frame. A rough sketch of this loading and parsing step is given below.
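The following is a hedged sketch of the loading and parsing step; the folder layout, file-name pattern, and tag names are assumptions based on the corpus description, not the authors' exact code.

```python
import glob
import os
import pandas as pd
from bs4 import BeautifulSoup

rows = []
for path in glob.glob("blogs/*.xml"):                      # assumed folder of per-author XML files
    author_id = os.path.basename(path).split(".")[0]       # file name encodes the author id
    with open(path, encoding="utf-8", errors="ignore") as f:
        soup = BeautifulSoup(f.read(), "lxml-xml")
    for post in soup.find_all("post"):                     # each blog post sits in a <post> tag
        rows.append({"author": author_id, "text": post.get_text(strip=True)})

df = pd.DataFrame(rows)
df.to_pickle("blog_posts.pkl.gz", compression="gzip")      # compressed version of the data frame
```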
5 Methodology
The model is a linear classifier (of the SVM or logistic regression family) trained with SGD. Here k is the iteration number, θ is the parameter vector, x is the author's feature (signature) vector, and y is the corresponding target value.
SGD Algorithm: update at the kth iteration.
1. Learning rate ε_k
2. Initial parameter θ
3. While the stopping criterion is not met:
   i. Sample a mini-batch of m examples {x^(1), …, x^(m)} from the training set with corresponding targets y^(i)
   ii. Compute the gradient estimate and update the parameters:
\[
\hat{g} \leftarrow \frac{1}{m}\sum_{i}\nabla_{\theta} L\!\left(f\!\left(x^{(i)};\theta\right),\, y^{(i)}\right), \qquad
\theta \leftarrow \theta - \varepsilon_k\,\hat{g}
\]
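As a hedged illustration (not the authors' exact model), the SGD classifier can be combined with TF-IDF features in a scikit-learn pipeline; the data frame from the preprocessing sketch is assumed, and the vectorizer settings are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["author"], test_size=0.2, stratify=df["author"], random_state=0)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),   # word n-grams as rough stylometric features
    SGDClassifier(loss="hinge", alpha=1e-5, max_iter=1000),
)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
```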
6 Result Analysis
Table 1 displays the accuracy according to class size. When the class size is small the accuracy is high, and when the class size is large the accuracy is reduced. This reduction can be attributed to the subjective nature of the analysis: the misclassification error is bound to rise because a few authors have similar writing styles. When the variance in writing style is high, the accuracy was found to improve, for example by bundling the authors according to country/region.
Table 2 compares the real and predicted author ids. Our approach gives the maximum number of correct author predictions using stochastic gradient descent, and the table also shows that the predicted author ID's writing style is very close to that of the real author ID; this was verified manually.
Figure 3 shows the confusion matrix for 10 authors, which interprets the author identification as an array.
The model is used to predict the author of any defamatory blog post or comment. The accuracy currently obtained only gives a lead and cannot be completely relied upon. There can also be new authors who were not part of the trained model, and authors' writing styles are vastly subjective, so some kind of filtering technique may be useful. Bagging and boosting techniques may help if the datasets are bundled according to their features. Future enhancements of this application can be:
References
1. Benzebouchi NE, Azizi N, Hammami NE, Schwab D, Khelaifia MCE, Aldwairi M (2019)
Authors’ writing styles based authorship identification system using the text representation
vector. In: 2019 16th International multi-conference on systems, signals & devices (SSD),
2019, pp 371–376
2. RamakrishnaMurty M, Murthy JVR, Prasad Reddy PVGD, Satapaty S (2012) Statistical
approach based keyword extraction aid dimensionality reduction. In: International confer-
ence information systems design and intelligent application—2012, vol 132. Springer—AISC
(indexed by SCOPUS, ISI proceeding DBLP etc), pp 445–454. ISBN 978-3-642-27443-5
3. Kuzu RS, Balci K, Salah AA (2016) Authorship recognition in a multiparty chat scenario. In:
2016 4th International conference on biometrics and forensics (IWBF), 2016, pp 1–6
Abstract Nowadays, numerous songs are available on the Internet and other front-line streaming media, which makes it hard to find the genre one wants to listen to. Quick classification removes the need to search manually for music of a specific genre. Music has traditionally been classified into genres by extracting features from time-series data; another efficient way to classify music into different genres is to apply a convolutional neural network (CNN). Since CNNs give promising results, we have built a CNN model for classification. We used the Librosa library to extract Mel frequencies and to understand the data through the Mel spectrum. We used the GTZAN data set, which has ten different genres and 1,000 audio files (.wav), and classified the music into these genre classes.
1 Introduction
The steps used to classify an audio file into its respective genre are as follows: The
first step is feature extraction from the audio file, and the second step would be to
build a classifier using these features. Feature extraction [1] depends on one factor,
which is Mel-frequency cepstral coefficients (MFCCs). We use the Librosa library to understand the audio file, its parameters, and the most significant contributing
factor that helps in classification [2]. We then use the Librosa library to extract Mel-
frequency cepstral coefficients from the given audio files in the data set and store
them in a .json file. We extract 13 Mel-frequency cepstral coefficients from each
audio sample in the data set. GTZAN consists of ten different genres, so we create ten labels and assign one to each genre. As mentioned, each genre consists of 100 audio files, which makes 1,000 audio files in total. We take each audio
file and extract 13 MFCCs from it using the Librosa library. We assign each audio
file a label to define which genre the audio file is from. Then we use the extracted
features to build a classifier. We load the JSON file data into two different vectors.
One consists of extracted MFCCs, and the other consists of labels. The convolutional
neural network model is then built with three convolutional layers using Keras. The
classifier is trained with a 70:30 ratio of training and test data. The model has been
trained, and it can now predict the type of music. The accuracy of the classifier was
77.5%. We take ten random samples from the given data set and predict the genre of
each audio file. We now take the input from the user, a .wav audio file, and process
it by extracting its 13 MFCCs. We now assign labels to each MFCC and display the
most occurring label as the predicted genre.
2 Literature Survey
The Global Layer Regularization (GLR) technique has been used with CNN and RNN models for evaluating training and accuracy [3]. Many music classification techniques compare acoustic features of the audio signal, such as Mel-frequency cepstral coefficients (MFCCs). Range bins have been analysed with a mono-component linear frequency-modulated (LFM) signal model [4]. Music genre recognition has been performed using a CNN with NetVLAD [5], which aggregates high-level features and provides suitable feature selection to capture musical information across different levels; however, traditional feature coding methods are unsupervised clustering-based approaches that may add confusion to the classification task. EEG signals [6] have been tested for emotion recognition and, without traditional utilization strategies, are considered a reliable technique for emotion recognition because of their noninvasive nature. Error rates have been examined by conjecturing that noise can cause tag-wise performance differences, which connects this research to music tagging [7] and neural networks. Music process mining [8] is a technique to categorize the flow of music. Music prediction can also be done by gender [9] based on voice frequency. Effective music classification helps enhance music zoning across different platforms [10].
3 Proposed System
4 Implementation
We collected a data set of different music genres from the Kaggle [12] website. The data set consists of ten genres and 1,000 audio files, with 100 sample audio files per genre. Blues, Classical, Country, Disco, Hip-hop, Jazz, Metal, Pop, Reggae, and Rock are the genres in the data set.
There are four steps in data preprocessing [13] that extract noise-free data from our original data set; in the data transformation step, the final data set is built with the required attributes.
5 Algorithm
The convolutional layer, pooling layer, and fully connected layer are the major layers
of a convolutional neural network. The convolutional layer works on the principle
of obtaining the attributes of the image or audio through a fixed-length window
(convolutional kernel) by sliding up and down. The feature map which is generated
by the activation function is given as the input for the next layer. The pooling layer
works to retain the salient attributes, minimize the dimension of each feature map, and
lessen the size of the input image or audio file. After obtaining the feature information
from the previous convolutional and pooling layers, the fully connected layer acts as a general neural network that is used for classification. By contrast, a convolutional layer's neurons are linked only to the pixels covered by the kernel in the previous layer, and the kernel weights are shared across the layer.
Figure 1 shows the architecture of the convolutional neural network. The pooling
layer works on a principle to reduce the attributes to lessen the time and computing
resources. Max pooling and average pooling are the two methods of pooling. We
have used a max pooling method in our pooling layer that helps choose a maximum
value from a matrix and lessen the data of the matrix.
Figure 2 illustrates max pooling. Over-fitting is the most common issue in a neural network: when learning features, the model matches a specific data set too precisely, so it is not generic, and the outcome is behaviour that is specific to the training data set and low accuracy on new data. The dropout layer mitigates over-fitting and improves generalization, and it is the most widely used deep learning technique for this purpose. When dropout is invoked, neurons are randomly disconnected during learning and are not allowed to participate in the current training step. A sub-network is thus sampled from the original neural network, and this sub-network structure differs from the original network structure. Different machine learning algorithms [14] can also be used for prediction alongside the CNN for analysis, and a process mining sequence [15] can regulate the activities to obtain accurate performance.
Figure 3 shows the architecture of the neural network and its different layers. The input audio features are sent to the convolution layer and then passed to the pooling layers, which reduce the feature dimensions. Figure 4 shows the feature reduction performed in the previous layer, after which the attributes are sent to the next CNN layer to obtain the output.
Figure 5 shows the attribute values after passing through the pooling layer. Attribute selection and reduction give better accuracy. A hedged sketch of such a CNN follows.
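The sketch below is an illustrative Keras model in the spirit of the described architecture (three convolutional layers, max pooling, dropout, Adam at a 0.001 learning rate); the exact layer sizes and input shapes are assumptions, not the authors' configuration.

```python
from tensorflow import keras

def build_model(input_shape, num_genres=10):
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                            input_shape=input_shape),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(32, (2, 2), padding="same", activation="relu"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),                 # dropout layer to curb over-fitting
        keras.layers.Dense(num_genres, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",   # integer genre labels
                  metrics=["accuracy"])
    return model

# X: MFCC tensor of shape (samples, frames, 13, 1); y: integer genre labels (both assumed)
# model = build_model(X.shape[1:])
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=32)
```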
5.2 Librosa
The Librosa library is used to load audio from various sources, compute spectrogram representations, and analyze audio files. Its matrix decomposition methods include harmonic-percussive source separation (HPSS) and generic spectrogram decomposition. Time-domain audio processing, such as pitch shifting and time stretching, is also available. Low-level feature extraction and manipulation routines provide various spectral and rhythmic features, along with helpers for delta features and memory embedding. A small extraction sketch is shown below.
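The following is a hedged sketch of the MFCC extraction step with Librosa; the directory layout and JSON structure are assumptions.

```python
import json
import os
import librosa

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
data = {"labels": [], "mfcc": []}

for label, genre in enumerate(GENRES):
    genre_dir = os.path.join("gtzan", genre)                      # assumed folder layout
    for fname in os.listdir(genre_dir):
        signal, sr = librosa.load(os.path.join(genre_dir, fname), sr=22050)
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)   # shape (13, n_frames)
        data["mfcc"].append(mfcc.T.tolist())
        data["labels"].append(label)

with open("mfcc_data.json", "w") as f:
    json.dump(data, f)
```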
6 Result
As mentioned earlier, we predict the genre for randomly chosen audio files from the testing data set. Ten randomly chosen audio files are passed to the model, which processes them by extracting the 13 MFCCs from each test sample and assigning each MFCC segment the label whose features are nearest to the MFCCs retrieved in the preprocessing step. We then take the most frequently occurring label and output it as the predicted genre (Fig. 6).
Figure 7 shows the accuracy curve of our CNN model. The model is trained for 50 epochs using the Adam optimizer at a learning rate of 0.001 with the categorical cross-entropy loss function. The figure also shows the validation loss.
7 Conclusion
A music genre classification tool is a time-saving method: users can have their audio files classified into the respective genres in no time. Our proposed convolutional neural network model gives the highest accuracy of 77.50%, which will help future work on music genre classification. The music genre classification system can be integrated with a music recommendation system to recommend more accurate, artist-favourite music, and accuracy can be improved by using different models or different optimizers. In the future, our research will extend to gender identification on a large volume of data.
References
1. Sharma AK, Aggarwal G (2021) Classification of Indian classical music with time-series matching deep learning approach. IEEE Access 9:102041–102052. https://doi.org/10.1109/ACCESS.2021.3093911
2. Chen J et al (2020) An automatic method to develop music with music segment and long short term memory for tinnitus music therapy, vol 8. https://doi.org/10.1109/ACCESS.2020.3013339
3. Ahmad F, Abid F (2020) A globally regularized joint neural architecture for music classification,
vol 8
4. Qiuchen LIU, Yong W, Qingxiang Z (2020) ISAR cross-range scaling based on the MUSIC
technique, vol 31(5):928–938. https://doi.org/10.23919/JSEE.2020.000070
5. Ng WWY, Member S, Zeng W (2020) Multi-level local feature coding fusion for music genre
recognition, pp 152713–152727. https://doi.org/10.1109/ACCESS.2020.3017661
6. Sheykhivand S, Mousavi Z, Rezaii TY, Farzamnia A (2020) Recognizing emotions evoked by music using CNN-LSTM networks on EEG signals, vol 8. https://doi.org/10.1109/ACCESS.2020.3011882
7. Choi K (2018) The effects of noisy labels on deep convolutional neural networks for music
tagging, vol 2(2):139–149. https://doi.org/10.1109/TETCI.2017.2771298
8. Sundari MS, Nayak RK (2021) Efficient tracing and detection of activity deviation in event log
using ProM in health care industry. In: 2021 Fifth international conference on I-SMAC (IoT
in social, mobile, analytics and cloud) (I-SMAC), 2021, pp 1238–1245
9. Reddy RR, Ramadevi Y, Sunitha KVN (2017) Enhanced anomaly detection using ensemble
support vector machine. In: 2017 International conference on big data analytics and computa-
tional intelligence (ICBDAC), March 2017. IEEE, pp 107–111
10. Zhu Y, Member S, Liu J, Member S, Mathiak K (2020) Deriving electrophysiological brain
network connectivity via tensor component analysis during freely listening to music. IEEE
Trans Neural Syst Rehabil Eng 28(2):409–418. https://doi.org/10.1109/TNSRE.2019.2953971
11. Castillo JR, Flores MJ (2021) Web-based music genre classification for timeline song visualization and analysis, vol 9:18801–18816. https://doi.org/10.1109/ACCESS.2021.3053864
12. www.kaggle.com/andradaolteanu/gtzan-dataset-music-genre-classification
13. Padmaja B, Prasad VVR, Sunitha KVN, Reddy NCS, Anil CH (2019) Detectstress: a novel stress detection system based on smartphone and wireless physical activity tracker. In: Advances in intelligent systems and computing, vol 815. https://doi.org/10.1007/978-981-13-1580-0_7
14. Kania D, Kania P, Łukaszewicz T (2021) Trajectory of fifths in music data mining. https://doi.org/10.1109/ACCESS.2021.3049266
15. Sundari MS, Nayak RK (2020) Process mining in healthcare systems: a critical review and its future. Int J Emerg Trends Eng Res 8(9):5197–5208. https://doi.org/10.30534/ijeter/2020/50892020
Web Application for Solar Data
Monitoring Using IoT Technology
Abstract This paper presents the implementation of a web application for solar
data monitoring using IoT technology. A prototype of a solar photovoltaic (PV)
panel supporting an electrical load is built. The voltage and current produced by
the solar PV panel is continuously sensed and sent to Arduino microcontroller for
energy consumption calculations. The computed information is then transferred to
the Ethernet shield server and will be stored in firebase. The graphs depict the solar
power generation’s intermittent behaviour under various weather conditions. Data in
the firebase can be accessed through a user interface, which can display the present
and past data based on the given inputs like date and time. With this application, one
can track the present and historic data produced by a solar PV panel. Having first-hand information about generation and consumption, the user is able to optimize their use of the generated electrical energy.
1 Introduction
Electrical energy plays a significant role in the economic growth of a country. With
growing population, the demand for electricity is also increasing rapidly [1]. Renew-
able energy sources are currently being deployed on a large scale not only to meet
the increasing demand for electrical energy but also to mitigate the environmental
pollutants and achieve socio-economic benefits for sustainable development [2, 3].
The power produced by these renewable sources can be integrated to the grid. In this
regard, microgrid provides low-cost electrical energy. Microgrid is basically a small
grid with one or more renewable and/or conventional energy sources (addressed as
distributed energy resources (DER) integrated and supplying a small cluster of load)
[4]. As the energy is pooled from more than one resource and supplied to different
loads, a microgrid needs an energy management system (EMS), which can optimize
the use of energy in the smartest, safest, and most reliable way. Though the EMS was initially developed for demand-side management, the emergence of the Internet of Things (IoT) paved the way for better management on both the supply and demand sides of the grid. In
this paper, a prototype of the microgrid with one DER is made. A solar PV panel is connected to a load through voltage and current sensors. The sensors read the voltage and current every second and send the data to an Arduino microcontroller, which calculates the energy consumption. This data is continuously transferred to Firebase using the Ethernet shield and can be accessed by the user interface application built in JavaScript [5, 6]. A small sketch of the underlying energy calculation is given below.
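The following minimal sketch shows only the arithmetic of that energy bookkeeping (the prototype itself runs Arduino code; this Python version is purely illustrative):

```python
def accumulate_energy(samples, dt_seconds=1.0):
    """samples: iterable of (voltage_V, current_A) pairs taken every dt_seconds."""
    energy_wh = 0.0
    for voltage, current in samples:
        power_w = voltage * current                  # instantaneous power P = V * I
        energy_wh += power_w * dt_seconds / 3600.0   # accumulate watt-hours
    return energy_wh

# Example with five one-second readings at roughly 17 V / 0.5 A
print(accumulate_energy([(17.1, 0.52), (17.0, 0.50), (16.9, 0.49), (17.2, 0.51), (17.0, 0.50)]))
```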
The importance of an energy management system in a microgrid is introduced in Sect. 1 and its relevance is discussed in Sect. 2. The methodology and implementation of the proposed system's hardware and software design are discussed in Sect. 3. Section 4 presents the real-time set-up of the proposed project. The solar data and the outcomes under various weather scenarios are discussed in Sect. 5. The conclusion of the intended research as well as the work's future scope is presented in Sect. 6.
Fig. 1 Block diagram of microgrid EMS and web application for solar data using IoT technology
management system’s primary functions are to determine how much energy is created
or used, how it is used, and when it is utilized [7]. Microgrid energy management
system is implemented with software and hardware integration. Figure 1 shows a
microgrid energy management system (EMS) with one distributed energy resource
(solar PV panel) and a web application for solar data monitoring using IoT technology
[8].
Figure 4 shows the proposed system’s real-time set-up, which includes a solar PV
panel, an Arduino Mega2560 controller, an Ethernet shield server, a voltage and
current sensor, and a load. A user interface was built that displays voltage, current,
power, and energy values based on the user’s preferences. Present data will be shown
automatically on the user interface. If the user wishes to view previous data, they must enter a date and then click the submit button, after which the UI will display the
historic data. If the user wants to view the present data again, he or she must click the Back Button [12–15].
Figure 5 represents present data and historic data of a solar PV panel on the user
interface. The user receives a pop up warning message in case of missing inputs.
Fig. 6 Intermittent behaviour of solar power generation under a variety of weather conditions
The table and graph show the intermittent behaviour of solar power generation under a
variety of weather conditions, including sunny day with more light intensity, normal
sunny day, cloudy day, and rainy day. An energy management system (EMS) is
required for a microgrid in order to optimize energy usage in the most intelligent,
safe, and reliable manner possible. Though EMS was originally designed for demand
management, the advent of the Internet of Things (IoT) has paved the way for better
grid management on both the supply and demand sides (Fig. 6; Table 1).
The proposed system, with a solar PV panel supplying an electric load, is implemented. The data from the sensors are transferred to the Arduino microcontroller for computation of the energy consumption, and the computed information is then transferred to the Ethernet shield server, which communicates directly with the user interface. The user interface displays real-time and historical power and energy consumption data according to user requirements. The system is useful for industries and can also be implemented in homes to reduce the cost of energy consumption by coordinating distributed energy resources. The proposed approach becomes effective by implementing an energy management system in a microgrid and can be extended to the grid-connected mode of a microgrid using DERs. The data should be protected from cyber attacks. The user interface can also be equipped with advanced features such as sending alerts in case of low energy generation from a source or high energy consumption by the consumer, and the system can be optimized by automatically switching to another source when the existing source generates low power.
References
1. Khaparde SA, Mukerjee A (2018) Infrastructure for sustainable renewable energy in India:
a case study of solar PV installation. In: IEEE power and energy society general meeting—
conversion and delivery of electrical energy in the 21st century, pp 1–7
2. Punna S, Manthati UB, Chirayarukil Raveendran A (2021) Modeling, analysis, and design of
novel control scheme for two-input bidirectional DC-DC converter for HESS in DC microgrid
applications. Int Trans Electr Energy Syst e12774
3. Punn S, Manthati UB (2020) Optimum design and analysis of a dynamic energy management
scheme for HESS in renewable power generation applications. SN Appl Sci 1–13
4. Nayanatara C, Divya S, Mahalakshmi EK (2018) Micro-grid management strategy with the
integration of renewable energy using IoT. In: International conference on computation of
power, energy, information and communication (ICCPEIC), pp 160–165
5. Arun J, Manivannan D (2016) Smart energy management and scheduling using internet of
things. Indian J Sci Technol 9(48)
6. Legha MM, Farjah E (2018) Implementation of energy management of a microgrid using
HMAS. In: IEEE smart grid conference (SGC)
7. Hosseinzadeh N, Mousavi A, Teirab A, Varzandeh S, Al-Hinai A (2019) Real-time monitoring and control of a microgrid—pilot project: hardware and software. In: 29th Australian universities power engineering conference (AUPEC), pp 1–6
Abstract With the increased usage of DERs such as solar and wind energy, power quality (PQ) has become a serious concern in distribution systems and industries. This study suggests utilizing an ultracapacitor (UCAP) at the DC link of the power conditioner, employing a bidirectional DC-to-DC converter (BDC), to lessen a variety of power quality issues. The ultracapacitor improves active power transfer capability and also reduces voltage sag and voltage swell problems. The UCAP's low energy density, high power density, and quick charging and discharging rates help to address distribution system power quality problems. The performance of the ultracapacitor along with the bidirectional DC-DC converter configuration is studied using MATLAB/SIMULINK software, and the effectiveness of the UCAP using a PID controller and a fuzzy logic controller (FLC) is compared.
1 Introduction
Power quality is a term that power engineers give great importance to nowadays. The quality of power is measured by how far parameters such as voltage and current deviate from the given standards. In distribution systems, power quality (PQ) has become a research focus in the present scenario due to the vast growth in customers on the distribution side. The major concern is variation in different parameters
like voltage, current, real and reactive power at different customer premises. The
main causes of power quality issues are external events like lightning strikes, motor
starting, load variations, nonlinear loads, and arc furnaces. These lead to various
electrical disturbances like voltage sag, voltage swell, harmonic distortion, interruptions,
and flicker [1]. Power quality issues are considered mainly from the
customer side, but addressing them is also useful from the utility side. Power quality issues
also arise due to the increasing expansion of DG technologies, including fuel cells (FC),
photovoltaics (PV), wind turbines (WT), small-scale hydro plants, and energy storage (ES).
Conventionally, shunt capacitors were used to improve power factor as one of
the reactive power compensation techniques. But, sizing and optimal location of the
capacitor are the major concerns in the radial system.
Due to the development of different custom power devices, power quality problems
are being reduced nowadays. Devices such as the DVR, D-STATCOM, UPQC, IPFC,
etc., are used for mitigation of various power quality issues in both transmission
and distribution systems. The D-STATCOM is a shunt controller that produces or
consumes reactive power at the PCC, allowing the maintenance of power quality.
The DVR is a series-connected device which injects three-phase voltages at the same
frequency in series with the network voltages to compensate the disturbances.
The advantages of integrating series controllers such as the DVR and APF through a
converter architecture, dubbed the UPQC, for power quality enhancement in
distribution systems are discussed in [2]. The main goal of the conventional UPQC is to restore
power quality in distribution systems. The energy storage integration proposed in this
paper improves active power capability along with mitigation of voltage sag, swell,
and harmonics.
2 3-Ø Converter
An in-phase compensation method was implemented for the series inverter, which necessitates
the use of a PLL to estimate θ. Based on θ and the L-L source line voltages
converted to dq coordinates, the L-N components of the source voltage
are calculated using the following equations.
$$
\begin{bmatrix} V_{sa}\\ V_{sb}\\ V_{sc} \end{bmatrix}
=
\begin{bmatrix} 1 & 0\\ -\frac{1}{2} & \frac{\sqrt{3}}{2}\\ -\frac{1}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix}
\begin{bmatrix} \cos\left(\theta-\frac{\pi}{6}\right) & \sin\left(\theta-\frac{\pi}{6}\right)\\ -\sin\left(\theta-\frac{\pi}{6}\right) & \cos\left(\theta-\frac{\pi}{6}\right) \end{bmatrix}
\frac{1}{\sqrt{3}}
\begin{bmatrix} V_{d}\\ V_{q} \end{bmatrix}
\qquad (1)
$$
$$
\begin{bmatrix} V_{\mathrm{ref}a}\\ V_{\mathrm{ref}b}\\ V_{\mathrm{ref}c} \end{bmatrix}
= 169.7
\begin{bmatrix} \sin\theta\\ \sin\left(\theta-120^{\circ}\right)\\ \sin\left(\theta+120^{\circ}\right) \end{bmatrix},
\qquad
V_{\mathrm{inj2}a} = V_{\mathrm{ref}a} - V_{sa}
\qquad (2)
$$
The voltages are kept as normal sine waves of 415 V rms and compared with a unit
sine. V ref is the voltage needed to have a stable voltage at the load. The DVR injects
an equivalent voltage V inj2 in-phase in case of the mentioned disturbances in the supply,
and the UCAP is employed to compensate and maintain the promised voltage V L at the
load. Equation (3) uses the injected voltage V inj2a and load current I La to find the active
power and reactive power delivered by the series inverter, where φ is the phase difference.
$$
P_{\mathrm{ref}} = -\frac{3}{2} V_{sq}\, i_{q\mathrm{ref}}, \qquad
Q_{\mathrm{ref}} = -\frac{3}{2} V_{sq}\, i_{d\mathrm{ref}}
\qquad (4)
$$
$$
\begin{bmatrix} i_{\mathrm{ref}a}\\ i_{\mathrm{ref}b}\\ i_{\mathrm{ref}c} \end{bmatrix}
=
\begin{bmatrix} 1 & 0\\ -\frac{1}{2} & \frac{\sqrt{3}}{2}\\ -\frac{1}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix}
\begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} i_{d\mathrm{ref}}\\ i_{q\mathrm{ref}} \end{bmatrix}
\qquad (5)
$$
The id-iq technique was used to create the controller for the shunt inverter, which
delivers active power and reactive power compensation, with the id component
controlling the reactive power and the iq component controlling the active power.
The active and reactive power references are calculated by using iqref and idref , i.e.
from Eq. (4). Equation (5) is used to calculate the reference currents.
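For illustration only, the following minimal Python sketch (not taken from the paper) evaluates Eqs. (4) and (5): it inverts Eq. (4) to obtain the dq current references from the power references and then applies the dq-to-abc mapping of Eq. (5); the numerical values in the example call are placeholders.

```python
import numpy as np

def reference_currents(p_ref, q_ref, v_sq, theta):
    """Shunt-inverter reference currents from P/Q references, per Eqs. (4)-(5)."""
    # Eq. (4) rearranged: i_qref sets the active power, i_dref the reactive power
    i_qref = -2.0 * p_ref / (3.0 * v_sq)
    i_dref = -2.0 * q_ref / (3.0 * v_sq)

    # Eq. (5): rotate the dq references by theta and map them onto the three phases
    rot = np.array([[np.cos(theta),  np.sin(theta)],
                    [-np.sin(theta), np.cos(theta)]])
    t_abc = np.array([[1.0, 0.0],
                      [-0.5,  np.sqrt(3.0) / 2.0],
                      [-0.5, -np.sqrt(3.0) / 2.0]])
    return t_abc @ rot @ np.array([i_dref, i_qref])

# Placeholder example: 2 kW and 1 kvar references with V_sq = 339 V at theta = 0.3 rad
print(reference_currents(2000.0, 1000.0, 339.0, 0.3))
```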
3 Description of BDC
The buck-boost converter is used as an interface between the UCAP and the DC link in this
BDC. This converter offers active/reactive power assistance as well as voltage sag
correction while the UCAP is in discharge mode. During intermittency smoothing,
this converter functions bidirectionally, injecting or absorbing power from the grid.
When discharging power from the ultracapacitor, the proposed buck-boost DC-to-DC
converter functions as a boost converter, and when charging the ultracapacitor
from the grid, it operates as a buck converter.
The output voltage of this DC-to-DC converter is controlled using average current
mode control. Compared with other strategies like voltage mode control
and peak current mode control, this strategy works better.
4 UCAP
UCAP can transport extremely high power within a short time. When compared to
Li-ion batteries, ultracapacitors have lower energy density and higher power density.
UCAPs feature superior power density, more lifespan cycles of charge and discharge,
and higher terminal voltages for each module than conventional batteries. These
are suitable properties for delivering active and reactive power assistance to the
distribution system in a short period of time. The terminal voltage at the UCAP, the
DC-link voltage, and the grid voltages on the distribution side all affect how many
ultracapacitors are required for grid support.
The UCAP bank has three modules that are practical and economical for a 260-V
DC-link voltage. The energy delivered during the UCAP bank discharging process can be calculated as follows:
$$
E_{\mathrm{UCAP}} = \frac{1}{2}\, C\, \frac{\left(V_{\mathrm{uc,ini}}^{2} - V_{\mathrm{uc,fin}}^{2}\right)}{60}\ \mathrm{W\,min}
\qquad (6)
$$
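For concreteness, a minimal Python sketch of Eq. (6) follows; the capacitance and voltage values in the example are placeholders and are not taken from the paper.

```python
def ucap_discharge_energy_wmin(capacitance_f, v_initial, v_final):
    """Energy released by the UCAP bank in watt-minutes, per Eq. (6)."""
    return 0.5 * capacitance_f * (v_initial**2 - v_final**2) / 60.0

# Example with placeholder values: a 165 F bank discharged from 260 V to 130 V
print(ucap_discharge_energy_wmin(165.0, 260.0, 130.0))
```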
5 Fuzzy Controller
The procedure of converting fuzzy data into a single crisp value is known as defuzzification.
An aggregate of all the rule outputs is determined in this method. The aggregate
indicates the required change in the switching instant. A particular change in the triggering
angle is calculated, which results in a change in the grid currents to enhance power quality.
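As an illustration of the defuzzification step described above, the following minimal Python sketch applies centroid defuzzification to an aggregated rule output; the universe of discourse and membership values are invented placeholders, and the paper does not state which defuzzification method it uses.

```python
import numpy as np

def centroid_defuzzify(x, aggregated_membership):
    """Return the crisp output as the centroid of the aggregated fuzzy set."""
    return np.sum(x * aggregated_membership) / np.sum(aggregated_membership)

# Placeholder universe of discourse: change in triggering angle (degrees)
angle = np.linspace(-10.0, 10.0, 201)
# Placeholder aggregated membership obtained from the rule outputs
membership = np.maximum(0.0, 1.0 - np.abs(angle - 2.0) / 5.0)
print(centroid_defuzzify(angle, membership))
```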
6 Simulation Results
See Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17.
7 Conclusion
This study describes a power conditioner system for distribution grids that employs
UCAP-based energy storage with rechargeable capacity. With the proposed configuration,
the active power filter is able to deliver active and reactive power to the distribution grid,
and the DVR component is able to independently compensate
for voltage swells and sags. The shunt inverter and series inverter
(DVR) control strategies rely on the id-iq methodology and in-phase compensation,
respectively. The performance of the UCAP is tracked using a PID controller and a
fuzzy logic controller. The system's power quality was improved more with the fuzzy
logic controller, because it responded more effectively in lowering the sags and swells.
References
Abstract One in eight females globally suffers from breast cancer. It is identified by
recognizing the malignancy of breast tissue cells. Current medical image processing
techniques use histopathological pictures recorded by a microscope to examine them
employing various procedures and methodologies. Nowadays machine learning algo-
rithms are widely used for the interpretation of medical-related images and tools
related to pathology. As identifying cancer cells manually is time consuming and
may be prone to errors in operation, computer-aided processes are used to get better
outcomes than manual pathological detection methods. This is often accomplished
in deep learning by the process of feature extraction fully aided by a convolutional
neural network (CNN) followed by the classification process via a fully connected
network. Deep learning is widely used in medical imaging since it does not need
prior knowledge of a related discipline. The current study involves
training a CNN, which accomplished a prediction accuracy of up to 86.9%.
1 Introduction
India has seen 30% of the cases of breast cancer over the last couple
of years, and the number is likely to increase [1]. Among all kinds of cancer in women,
breast carcinoma is the most prevalent. Carcinoma has the second-highest fatality rate after lung
and bronchial cancer, and about 30% of freshly diagnosed cases are of breast carcinoma alone.
Progressing the fight to eradicate cancer needs early detection, which is only attainable through
an efficient detection system. Techniques have been developed to detect carcinoma, including
the processing of medical images and digital pathology. Images are examined by histopathology,
which typically involves diagnostic tests of the affected tissues. The tissue regions affected by
the tumour are excised by the pathologist and stained with a combination of hematoxylin and eosin
stain. They are examined under a microscope for cancerous cells. The microscopic
pictures thus accumulated are utilized for creating computer-aided cancer
detection systems. Performing the detection process manually can be monotonous
and may involve human-caused errors, as most components in the cell
are often distributed unevenly. The objective is to find out whether the growth of
a tumour is benign or malignant, because far more complications arise when a tumour is
malignant. Briefly, this is a binary classification problem and has a high possibility of being
solved by a wide range of machine learning methods. It has been shown in the past that using
machine learning algorithms in diagnosing various diseases gives better results when compared
to diagnosis by a human medical expert alone. Based on a study conducted by Phillips (Europe), a wide
assortment of computer-based procedures operating on breast images has
provided more correct information in the detection process. This additional
information, together with good-quality pictures, offers an opportunity to improve
the performance and accuracy of detecting cancer [2].
2 Background Study
In the process of diagnosis and recognition of breast cancer, CNN plays a real-
istic role to offer high accuracy compared to the multilayer perceptron method [3].
A CNN training task involves a huge volume of data, which is lacking in the medical
domain, particularly for breast cancer [4]. Some of the ways to detect breast cancer
are mammography, nuclear imaging, computed tomography (CT) scans [5], magnetic
resonance imaging, etc. However, there is a limitation on how accurately to detect
breast cancer from these techniques. On the other hand, histopathology tests are
tissue-based, where cell structures along with additional external elements are stained
and captured in high resolution for pathologic analysis. These images give a high
level of detail for accurately diagnosing cancer, but the identification is equally
difficult due to various factors such as the varied appearance of cancer cells, intra-
observer variation arising from commonly shared features, and the area of the tissue
selected, since the selected area may lie at the tumor periphery. Hence, this issue can easily
be resolved by deep learning models.
Deep learning is a subsection of machine learning that has resulted in high success
rates and accuracies. The deep learning neural networks are inspired and modeled
around the human brain to analyze unstructured patterns. These techniques are used
to extract and analyze the features at each layer of the neural network and there-
fore improve the prediction of tumors [6]. There are existing deep learning models
designed exclusively for image classification with high accuracies such as VGG19,
AlexNet, Mobile Net [7], etc. One can easily use these existing models or design a
new model to solve the problem at hand.
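To make the classification setup concrete, a minimal, hypothetical Keras sketch of a small binary CNN of the kind discussed here is shown below; the patch size, layer sizes, and training settings are illustrative assumptions and not the exact architecture used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_small_cnn(input_shape=(50, 50, 3)):
    """Small CNN for binary IDC(+)/IDC(-) patch classification (illustrative only)."""
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # probability of the IDC(+) class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_small_cnn()
model.summary()
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=7)
```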
As stated above, there are various networks designed aiming to classify breast
cancer, such as Artificial Neural Networks based on Maximum Likelihood Estima-
tion (MLE), GRU–SVM model based on Recurrent Neural Networks (RNNs), Gated
Recurrent Unit (GRU) reinforced by Support Vector Machine (SVM), and many
more. Neural networks together with Multi-variate Adaptive Regression Splines
(MARS) can likewise be used in identifying tumor growth. The BreakHis dataset [8]
published in 2015 has been used by Fabio A. Spanhol, who described the constraints
along with the neural-net system obtaining an accuracy between 80 and 85%.
Arpit B and Aruna T came up with a Genetically Optimized Neural Network
[9] (GONN) for breast cancer identification as benign or malignant. The neural
network architecture has been optimized by presenting state-of-the-art crossover and
mutation operators [10]. This had been evaluated using the WBCD dataset and by comparing
the classification accuracy, confusion matrix, sensitivity, specificity, and receiver
operating characteristic curves of GONN with those of a classical backpropagation model
[11]. This technique presented an acceptable accuracy in the classification process.
However, there is a scope for improvement by using a larger dataset. Ashraf O I
and Siti M S have given out a computer-based process to classify breast cancer
using a multilayer perceptron (MLP) neural network centered around the concept of
improved non-dominated sorting genetic algorithm (NSGA-II) for the optimization
of the accuracy and network structure.
3 Architecture
4 Results
Figure 4 presents a few IDC (−) samples from the validation set along with
the model's predictions.
The top losses incurred by the model during the training process are shown in
Fig. 5.
One can see that some samples are originally IDC (+), but the model predicts
them as IDC (−). This is a serious issue: one needs to be particularly cautious with
false negatives, since a patient must not be categorized as "No cancer" when
they are in fact "Cancer positive."
Fig. 5 Prediction/actual/loss/probability
5 Conclusion
The final model is 86.9% accurate and has improved recall
for both the positive and the negative classes. The network could still have been
trained further. Here, the model was trained for 7 epochs, which took
approximately two hours. More fine-tuning could have been done, and sophisticated data
augmentation and resolution techniques could be applied as future scope.
References
1. Vaka AR, Soni B, Reddy S (2020) Breast cancer detection by leveraging machine learning.
ICT Express 6(4):320–324. ISSN 2405-9595
2. Dabeer S, Khan MM, Islam S (2019) Cancer diagnosis in histopathological image: CNN
based approach. Inf Med Unlocked 16:100231. https://doi.org/10.1016/j.imu.2019.100231.
ISSN 2352-9148
3. Alanazi SA, Kamruzzaman MM, Nazirul Islam Sarker MD, Alruwaili M, Alhwaiti Y, Alsham-
mari N, Siddiqi MH (2021) Boosting breast cancer detection using convolutional neural
network. J Healthcare Eng 5528622:11
4. Saber A, Sakr M, Abo-Seida OM, Keshk A, Chen H (2021) A novel deep-learning model for
automatic detection and classification of breast cancer using the transfer-learning technique.
IEEE Access 9:71194–71209
5. Epimack M et al (2021) Breast cancer segmentation methods: current status and future
potentials. BioMed Res Int 2021:9962109. https://doi.org/10.1155/2021/9962109
6. Han Z, Wei B, Zheng Y, Yin Y, Li K, Li S (2017) Breast cancer multi-classification from
histopathological images with structured deep learning model. Sci Rep 7(1):4172
7. Salama WM, Aly MH (2021) Deep learning in mammography images segmentation and clas-
sification: automated CNN approach. Alexandria Eng J 60(5):4701–4709. https://doi.org/10.
1016/j.aej.2021.03.048. ISSN 1110-168
8. Joshi SA, Bongale AM, Bongale AM (2021) Breast cancer detection from histopathology
images using machine learning techniques: a bibliometric analysis. Library Philos Pract (e-J)
5376. https://digitalcommons.unl.edu/libphilprac/5376
9. Spanhol FA, Oliveira LS, Petitjean C, Heutte L (2016) Breast cancer histopathological image
classification using convolutional neural networks. In: 2016 International joint conference on
neural networks, IJCNN 2016, Vancouver, BC, Canada, July 24–29, 2016, pp 2560–2567
10. Solanki YS, Chakrabarti P, Jasinski M, Leonowicz Z, Bolshev V, Vinogradov A, Jasinska E,
Gono R, Nami M (2021) A hybrid supervised machine learning classifier system for breast
cancer prognosis using feature selection and data imbalance handling approaches. Electronics
10:699. https://doi.org/10.3390/electronics10060699
11. Agarap AF. On breast cancer detection: an application of machine learning algorithms on the
Wisconsin diagnostic dataset. CoRR abs/1711.07831
12. Bhardwaj A, Tiwari A (2015) Breast cancer diagnosis using genetically optimized neural network model. Expert Syst
Appl 42(10):4611–4620. https://doi.org/10.1016/j.eswa.2015.01.065
13. Ashraf OI, Siti MS (2018) Intelligent breast cancer diagnosis based on enhanced Pareto optimal
and multilayer perceptron neural network. Int J Comput Aided Eng Technol Inderscience
10(5):543–556
14. Yu X, Zhou Q, Wang S, Zhang Y-D (2021) A systematic survey of deep learning in breast
cancer. Int J Intell Syst 37(1):152–216. https://doi.org/10.1002/int.22622
15. Khamparia A, Bharati S, Podder P et al (2021) Diagnosis of breast cancer based on modern
mammography using hybrid transfer learning. Multidim Syst Sign Process 32:747–765. https://
doi.org/10.1007/s11045-020-00756-7
Voltage Stability Analysis
for Distribution Network Using
D-STATCOM
1 Introduction
provides better voltage profiles at each bus of the distribution system, as well as
increased reactive loading capabilities in all loading conditions, improved system
stability, and reduced reactive power flow to reduce line losses. Using a D-STATCOM
controller and Simulink, this work aims to construct an indicator for analysing
the voltage stability of distribution systems [3]. The impact of voltage-dependent load variation
is investigated in this research. In the presence of DG and D-STATCOM,
the critical voltage stability limits are calculated using the continuation load flow
technique.
2 Methodology
Find x such that f(x) = 0; the Newton–Raphson update is
$$
x_{1} = x_{0} - \frac{f(x_{0})}{f'(x_{0})}
$$
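A minimal Python sketch of the scalar Newton–Raphson iteration above is given below for illustration; practical power-flow programs apply the same idea to the multivariable mismatch equations with the Jacobian, and the example function here is a placeholder.

```python
def newton_raphson(f, df, x0, tol=1e-8, max_iter=50):
    """Solve f(x) = 0 starting from x0 using x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve x^2 - 2 = 0 (a placeholder equation, not a power-flow equation)
print(newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))
```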
Gauss–Seidel (G–S) is a more advanced variant of the Gauss iterative method. The change was
made to keep the number of iterations to a bare minimum, which is appropriate
for studying power flow in small-scale power systems. A solution vector is first
estimated based on the real-time data. To acquire the updated value of a given
variable, each iteration involves substituting the current values of the other
variables into one of the modelling equations. With respect to this variable, the solution
vector is updated immediately, and the method is repeated until one iteration is
complete. The solution vector is iterated in this manner until it converges to the
given precision.
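The following minimal Python sketch shows the Gauss–Seidel update for a linear system Ax = b as a simplified stand-in for the power-flow iteration described above; the actual load-flow version iterates over the nonlinear bus-voltage equations, and the matrix here is an invented example.

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-8, max_iter=100):
    """Gauss-Seidel iteration: each updated component is reused immediately."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            sigma = A[i] @ x - A[i, i] * x[i]   # contribution of the other variables
            x[i] = (b[i] - sigma) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([15.0, 10.0, 10.0])
print(gauss_seidel(A, b, np.zeros(3)))
```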
(1) If V 1 = V pcc , the D-STATCOM will not draw or generate any reactive power, since
the power exchange between the controller and the grid is zero.
(2) If V 1 > V pcc , the D-STATCOM supplies reactive power to the grid, behaving as a
capacitive reactance connected at its terminals.
(3) If V 1 < V pcc , the D-STATCOM absorbs reactive power from the grid, behaving as an
inductive reactance (a simple sketch of this mode selection follows the list).
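The tiny Python sketch below merely encodes the three operating cases above as a mode-selection rule; it is illustrative only, and the tolerance parameter is an assumption, not part of the paper.

```python
def dstatcom_mode(v_converter, v_pcc, tol=1e-3):
    """Return the D-STATCOM operating mode from the voltage comparison."""
    if abs(v_converter - v_pcc) <= tol:
        return "floating (no reactive power exchange)"
    if v_converter > v_pcc:
        return "capacitive (supplies reactive power)"
    return "inductive (absorbs reactive power)"

for v1 in (1.0, 1.05, 0.95):
    print(v1, "->", dstatcom_mode(v1, 1.0))
```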
The study focuses on constructing an indicator for analysing the voltage stability of
distribution networks using the Simulink platform and the D-STATCOM controller.
While load variations are kept constant, the controller's reaction to changes in the
source voltage is observed, with the modulation timing set to [Ton Toff] = [0.15 1] *100,
longer than the simulation period. The internal voltage of the 25-kV equivalent
is varied by means of the source voltage. To keep the D-STATCOM floating at first, the
input is adjusted to 1.077 pu (B3 voltage = 1 pu and reference voltage V ref = 1 pu).
The source voltage is then increased by 6%, decreased by 6%, and finally restored in three
steps at 0.2, 0.3, and 0.4 s, ultimately regaining its initial value (1.077 pu). After
around 0.15 s of transient, the steady state is approached during simulation.
During this period, the D-STATCOM is inactive with respect to the
source; it neither absorbs nor provides reactive power to the grid. The source voltage
increases by 6% after t = 0.2 s, and the D-STATCOM absorbs network
reactive power. The source voltage is lowered by 6% below nominal at t = 0.3 s. To
maintain a 1 pu voltage, the D-STATCOM must generate reactive power (Q shifts from
+2.7 to −2.8 MVAR). The modulation index of the PWM inverter increases from
0.56 to 0.9 when the D-STATCOM switches from inductive to capacitive operation
(output 4). As a result, the voltage of the inverter rises correspondingly. A rapid
reversal of reactive power is visible on the D-STATCOM current (output 1). During
voltage flicker mitigation, the source voltage is unaffected, but the variable load is
modulated, allowing the D-STATCOM [6–8] to demonstrate voltage flicker mitigation.
The voltage of Bus B3 (output 1), the voltages of Buses B1 and B3 (output 2), and the
changes in P and Q can be observed when the modulation timing parameter is set to
[Ton Toff] = [0.15 1] and Q regulation is enabled. In the absence of the D-STATCOM,
the bus B3 voltage varies between 0.96 and 1.04 pu (±4% fluctuation). The
voltage fluctuation on Bus B3 is reduced to less than 0.7% when the D-STATCOM
controller is fitted. When the voltage falls below a certain threshold, the D-STATCOM
compensates by injecting a 5 Hz-modulated reactive current (output 3) that varies
between 0.6 pu capacitive and 0.6 pu inductive depending on the voltage.
4 Conclusions
The work focuses on voltage stability analysis with the D-STATCOM controller and
Simulink analysis of different power transmission issues. The performance of a single
D-STATCOM that can do both load balancing and reactive power compensation is
being investigated. The controller’s performance is assessed in a range of opera-
tional scenarios, with each case’s resilience being documented. The D-STATCOM
improves voltage regulation in the power system while also helping to lower fault
current during fault circumstances since it stabilizes the reactive power need in
power systems and works as a controlled reactive source. The findings show that
the controller is capable of controlling system voltage in both normal and abnormal
conditions.
References
1. Kumar P, Kumar N (2012) D-STATCOM for stability analysis. IOSR J Electr Electron Eng
1(2):2278–1676
2. Patel AA, Karan B (2017) Application of DSTATCOM for voltage regulation and power quality
improvement. Int J Res Edu Sci Methods 5(3)
3. Hannan MA. Effect of DC capacitor size on D-STATCOM voltage regulation performance
evaluation. Przegląd Elektrotechniczny. ISSN 0033-2097
4. Swaminathan HB (2017) Enhancing power quality issues in distribution system using D-
STATCOM. Int J Recent Res Aspects 4(4):356–359. ISSN 2349-7688
5. Palod A, Huchche V (2015) Reactive power compensation using DSTATCOM. Int J Electr
Electron Data Commun, Special Issue:21–24. ISSN 2320-2084
6. Singh B, Solanki J (2006) A comparative study of control algorithms for DSTATCOM for
load compensation. In: IEEE International conference on industrial technology, Mumbai, pp
1492–1497
7. Singh B, Solanki J (2009) A comparison of control algorithms for DSTATCOM. IEEE Trans
Ind Electron 56(7):2738–2745. ISSN 1557-9948
8. Singh B, Jayaprakash P, Kothari DP, Chandra A, Al Haddad K (2014) Comprehensive study of
DSTATCOM configurations. IEEE Trans Ind Inf 10(2). ISSN 1941-0050
E-Dictionosauraus
(usually, with the back of a pencil). It then extracts the word from the text and gathers
its meaning. Additionally, it also gives the user its synonyms and antonyms. Further,
it delivers the output in the form of audio, which helps the user learn how to pronounce
the word. So, with the help of E-Dictionosauraus, one can effortlessly get this work done quickly.
1 Introduction
English is a widely used language across the globe. It has a huge number of words,
and it is quite impossible for a person to know the meaning of each one while reading
a book, magazine, or article. The dictionaries that are currently available are either a
physical dictionary or a search engine such as Google, Firefox, etc., and both of these
options are wearisome. Searching for a meaning in a physical dictionary is a tedious
job, and searching in the latter is time-consuming. Also, neither of them has readily
available synonyms and antonyms. Further, effective usage of a dictionary depends on
the user-friendliness of the dictionary and also on the skills of the user.
Some electronic devices such as the Kindle have an inbuilt dictionary. The software
available in this technology gives the meaning of the word searched by the reader.
Even though it gives the meanings of words on the device, it cannot help in the
case of printed books or newspapers, yet the majority of people use the physical mode to
read. So, we need an application to assist us in this situation, software that
not only saves time but also makes the job simpler and effortless. That is exactly
what "E-Dictionosauraus" does. With this software, reading can be made joyous and
uninterrupted.
Using E-Dictionosauraus, when a reader comes across an unfamiliar word, he/she can
directly get the meaning of it, irrespective of where they are reading. The reader uses
the back of a pencil to point to the word they are unaware of. The system captures
the image of the page and extracts the word that is being pointed at. Inbuilt libraries
such as PyDictionary and WordNet help to draw out the meaning of the word along
with its synonyms and antonyms. The output is displayed on the screen. In addition
to this, an audio output of the meaning is also provided. This helps the user to know how
to pronounce the word.
2 Literature Survey
Researchers across the globe have developed different types of e-dictionaries, but these are restricted
to narrow applications, such as the language of a particular country, and only a few have discussed
the usage of e-dictionaries. The paper [1] mainly focuses on the
dictionary usage skills required to use modern digital dictionaries and on the shift
from the usage of paper to electronic dictionaries.
3 Proposed System
Optical character recognition (OCR) is a technique used to recognize the text
in images and convert it into editable text form. Those images can contain handwritten
text or printed text such as documents, receipts, name cards, books, and newspapers.
OCR is a two-step process. In the initial step, "text detection", the text in the image
is detected. In the second step, called "text recognition", the detected
text is extracted from the image. Performing these two steps together is how
text is extracted from the image (Fig. 1).
The most commonly used color space, RGB, represents colors by their red, green, and
blue components. RGB describes a color as a tuple of three components, each of which
can take a value between 0 and 255. The tuple (255, 255, 255)
represents white, the tuple (0, 0, 0) represents black, and the tuple
(255, 0, 0) represents red. RGB is one of the five major color space
models. There are many color spaces because different color spaces serve
unique purposes. HSV is a representation of hue, saturation, and value (brightness). These
characteristics are specifically useful for identifying contrast in images.
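As a sketch of how the pointer colour could be isolated in practice, the hypothetical OpenCV snippet below converts an image to HSV and masks red pixels; the HSV thresholds are illustrative assumptions, not the exact values used by the authors.

```python
import cv2

def red_pointer_mask(image_bgr):
    """Return a binary mask of red pixels (the pointer) using HSV thresholds."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so two ranges are combined (illustrative values)
    lower = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    return cv2.bitwise_or(lower, upper)

# Example usage (the file name is a placeholder)
# mask = red_pointer_mask(cv2.imread("page_with_pointer.jpg"))
```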
3.3 Working
Firstly, the image of the page, along with a pointer pointing to the word whose meaning
is to be extracted, is captured. This captured image is given as input to
the application. The application extracts the color of the pointer, which is pointing
to the word in the input image, using the color detection method. After extracting the
location of the pointer, a rectangular box with fixed length and breadth is drawn and
the image is cropped according to the rectangular box. A few image processing
methods are applied to the cropped image, like converting the image into grayscale
and thresholding.
The nearest contour found is then taken as input to the pytesseract function, which
extracts the word from the image. Then, the meaning, synonyms, and antonyms of
the extracted word are displayed and also produced as audio output. The meaning,
synonyms, and antonyms of the word are extracted using the PyDictionary and WordNet
libraries (Fig. 2).
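A condensed, hypothetical Python sketch of this pipeline is shown below. It assumes pytesseract for OCR and PyDictionary for the meaning, synonyms, and antonyms, and uses gTTS for the audio step, which the paper does not name explicitly; the threshold settings and file names are placeholders.

```python
import cv2
import pytesseract
from PyDictionary import PyDictionary
from gtts import gTTS

def lookup_pointed_word(cropped_bgr):
    """Extract the pointed word from the cropped region and look it up."""
    gray = cv2.cvtColor(cropped_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    word = pytesseract.image_to_string(binary).strip()
    if not word:
        return None, "No Valid Word Detected. Please Try Again"

    dictionary = PyDictionary()
    result = {
        "word": word,
        "meaning": dictionary.meaning(word),
        "synonyms": dictionary.synonym(word),
        "antonyms": dictionary.antonym(word),
    }
    # Speak the meaning so the user also hears how the word is pronounced
    gTTS(text=f"{word}. {result['meaning']}", lang="en").save("meaning.mp3")
    return result, None

# Example usage (the cropped region comes from the pointer-detection step)
# info, error = lookup_pointed_word(cropped_region)
```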
This section discusses the results for various test cases to check the outcome of the
designed system in successfully pointing to the targeted word and providing its meaning
along with antonyms and synonyms and a voice pop-up.
4.1 Example 1
Consider the image shown below, where the image of the page is captured initially.
Then, the pointed word is extracted.
Figure 3 shows the captured input picture with the word "successful" pointed at
with a red pointer, whose meaning is to be produced. Figure 4 shows that the word
"successful", which is to be extracted, is detected and highlighted with a rectangular
box.
Figure 5 displays the meaning, synonyms, and antonyms of the extracted
word "successful". Audio output can also be produced by clicking on the play button
beside the result.
4.2 Example 2
The process is repeated to check the functionality of the developed application with
another word.
Figure 6 shows the captured input picture with the word "expedition" pointed at
with a red pointer, whose meaning is to be produced. Figure 7 shows that the word
"expedition", which is to be extracted, is detected and highlighted with a rectangular
box.
Figure 8 displays the meaning, synonyms, and antonyms of the extracted
word "expedition". Audio output can also be produced by clicking on the play button
beside the result.
From Fig. 9, one can observe that when, instead of a valid word or text, an image with
no red pointer is shown to the detector, the output is "No Valid Word Detected.
Please Try Again". This confirms the efficacy
of the designed system. It can be concluded that the required output is generated only
if a valid input is detected.
5 Conclusion
The purpose of the project is to help readers find the meaning of a word along with
its synonyms and antonyms instantaneously using the developed application. It
makes their tasks easier and more comfortable. The application has been tested successfully
for various words, and the outcome is effective. In the case of non-text input, such as when an image
is given as input, it reports that no valid word is detected. The developed application
helps the reader to get the meaning, synonyms, and antonyms of the word they are
looking for easily and also saves time. The application also produces audio
output, by which the user can also learn how to pronounce the word. The application
developed is easy to use and saves time, as we do not have to manually search for the
meaning of a word.
References
Abstract A healthy life leads to healthy growth in a human's life. Health is more
important than all other things in our life, and the heart is the backbone of our life.
Much research is going on in the medical field to find new treatments
in the healthcare industry. Machine learning is an important technology that helps
predict disease accurately. Machine learning takes the attributes from the health
industry and analyzes the data using many algorithms. Based on the trained data,
the experiment explores the prediction analysis on the given dataset. It helps
humans in many ways in their lives. In this research, we predicted cardiovascular
disease at an early stage. We collected healthcare data and applied various
machine learning algorithms. We analyzed the accuracy of the support vector machine
(SVM), logistic regression (LR), and stochastic gradient descent (SGD) algorithms.
Finally, the research shows that the support vector machine is the best algorithm for predicting
cardiovascular disease at an early stage of the lifespan.
1 Introduction
Cardiovascular disease [1] is a collection of different disorders that affect the blood
circulation in the body and thereby affect the heart. It leads to coronary artery disease,
heart arrhythmias, and heart failure. Heart disease disturbs patients' minds and
lives, and patients need support and encouragement from neighbours, relatives, and friends.
Heart-friendly recipes, healthy living tips, and strategies for maintaining
a healthy heart are widely available. The goal of informing the public is to improve their heart
health and to avoid heart disease and stroke, by learning stress management and eating habits
that strengthen the heart.
Nowadays, predicting heart disease plays an essential role in the healthcare industry.
It has a vital role in giving treatment to patients in the early stages. Heart
disease is also predictable from various factors and attributes. A healthy lifestyle
produces fewer factors that create heart disease. If a patient has high values of
chronic parameters, then we can use machine learning technology to predict the
problem as early as possible in life. Machine learning [2] is a technology that helps predict
heart disease from human activity and lifestyle. It gives accurate predictions and
solutions in the health industry, and it saves human resources in terms of heart disease.
Machine learning is a field of data mining that handles real-time and dynamic
problems efficiently. In the health industry, machine learning acts like a doctor
that helps researchers in many ways to predict and diagnose diseases. The main
goal of this paper is to provide a prediction of heart disease in the early stage
of life. It helps to predict whether a person will get heart problems or not at an early
stage of life. Machine learning is a technology that analyzes data and produces
accurate results based on a few parameters.
This paper explains the support vector machine [3], logistic regression [4], and
stochastic gradient descent [5] algorithms. We implemented the model and applied the
three algorithms. This research found the best algorithm, which gives the most accurate
result for heart problems.
2 Literature Survey
Nowadays, various tools and technologies are available in the healthcare domain that
take the industry to the next level [2–5]. They work with machine learning techniques
using algorithms such as logistic regression (LR), the support vector machine (SVM),
and stochastic gradient descent (SGD). In health care, many studies are
ongoing on predicting heart disease earlier. Many laboratories are developing real-time
analyses for heart prediction using machine learning algorithms. SVM is one of the
algorithms used to find the dependency between attributes and analyze disease, applied to
acute cardiac effects in [3] and blood pressure in [6]. Cardiac
problems were discussed using SVM in [7]. Classical machine learning algorithms were analyzed
with predictions in [8]. The SGD algorithm [9] is used to predict acute heart problems. A random
forest algorithm was used to predict cardiovascular disease [10]. SVM has been used in many
research processes for heart disease [11]. SVM, SGD, and LR were applied to predict
the efficiency of survival rates in [12]. The authors of [13] compared a multilayer perceptron and
SVM to predict cardiac problems. Clinical decision support systems are built on
collections of incorporated hospital datasets [3–6, 9, 10, 14]. From 2017 onwards,
laboratories have used the prediction process mentioned in [15].
The feature detection [16] technique is applied to detect the required features in the data
preprocessing model. Brain tumor prediction [17] was implemented using a CNN.

Fig. 1 Workflow of the proposed model: patient database and attribute selection, preprocessing,
classification with support vector machine, logistic regression, and stochastic gradient descent,
cardiovascular disease prediction, and accuracy measurement
Our proposed system implements the prediction of heart disease. Figure 1
shows the working process flow of the recommended model. The research implemented
the model to predict heart disease earlier. The dataset has a large
volume of patient details. Attribute selection helps to identify the relevant features
to support cardiac disease research. After dataset collection, a preprocessing technique
is used to clean the dataset. We implemented three machine learning algorithms
after extracting data from the dataset. The medical database consists of discrete
information, and discrete data makes the prediction a complex and tedious task.
4 Implementation
4.1 Dataset
We collected a dataset from the Kaggle [18] website to research heart disease. The
dataset was collected from residents of Framingham, Massachusetts, USA. We
implemented the prediction of coronary heart disease at an early stage. The
dataset contains 4240 records and 16 attributes/columns. The collected dataset is in
Excel format, which we changed to comma-separated value (CSV) format. We used
the pandas library in Python to construct our code.
There are four steps in data preprocessing [19]. We followed these steps and extracted
the noiseless data from our original dataset.
Steps followed in preprocessing (a minimal pandas sketch of these steps is given after the list):
• Data cleaning: The dataset is cleaned up by removing the noisy and inconsistent
data. We reduced 17 attributes to 14 attributes to fit our analysis.
• Data integration: The data is collected and incorporated into the research analysis.
• Data reduction: The attributes suitable for our research are selected.
• Data transformation: The final dataset has the required attributes.
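The minimal pandas sketch below illustrates these preprocessing steps; the file name and the columns dropped are placeholders, since the paper does not list them explicitly.

```python
import pandas as pd

# Load the Framingham CSV exported from the original Excel file (path is a placeholder)
df = pd.read_csv("framingham.csv")

# Data cleaning: drop rows with missing or inconsistent values
df = df.dropna()

# Data reduction: keep only the attributes used in the analysis
# (the dropped column names below are illustrative, not the paper's exact choice)
df = df.drop(columns=["education", "currentSmoker"], errors="ignore")

# Data transformation: the cleaned frame now holds the required attributes
print(df.shape)
print(df.dtypes)
```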
We have 14 attributes after cleaning our dataset. The next step is finding the correlation
[20] between the numerical values of pairs of attributes to gain insight into their
relation. The dataset has many fields, and the matrix holds the correlations between attributes.
Figure 2 shows the correlation matrix for heart diseases.
The points that lie far away from the expected line are called outliers [21]. We need to
delete all the outlier values in the matrix, since outliers produce noisy transactions in
the process. Figure 3 shows the outliers of each attribute; after removing them, we obtain the final dataset
for our implementation.
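A small pandas sketch of the correlation matrix and an outlier filter is shown below; the 1.5 x IQR rule and the target column name are assumptions, as the paper does not state its exact outlier criterion.

```python
import pandas as pd

df = pd.read_csv("framingham.csv").dropna()  # as in the preprocessing sketch above

# Correlation matrix between the numerical attributes (cf. Fig. 2)
corr = df.corr()
print(corr["TenYearCHD"].sort_values(ascending=False))  # target name is an assumption

# Outlier removal with the 1.5 * IQR rule, applied per attribute (assumed criterion)
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
mask = ~((df < (q1 - 1.5 * iqr)) | (df > (q3 + 1.5 * iqr))).any(axis=1)
df_clean = df[mask]
print(len(df), "->", len(df_clean), "records after outlier removal")
```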
5 Algorithm
SVM is a technique to investigate information and patterns for prediction. SVM
includes two stages: the initial step is to prepare an informational data set and build
a model, and the second step is to utilize the SVM model to predict on the testing
dataset. Hyperplanes segregate the data based on the related attributes. Our idea is to
recognize the plane that is the most accurate hyperplane, i.e., we need to locate the
separation between the information points of the two classes. We need to increase,
i.e., maximize, the margin separating the information with particular certainty.
The support vector machine approach also handles non-separable data and nonlinear
projection without depending on the cost function; kernel tricks are used to handle such
data. SVM is a powerful tool in health care to predict many
applications and reduce health issues. It also plays a huge role in medicine composition.
In formulas (1)–(3), β is a constant value and X is the input attribute value.
Maximum margin classifier:
$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 > 0 \qquad (1)$$
$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 < 0 \qquad (2)$$
$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0 \qquad (3)$$
The SVM algorithm finds the decision boundary using a hyperplane and classifies
the prediction value with it. The model creates a hyperplane based
on the distance of the vector data, and the marginal value is calculated using the distance of
the support vectors. The hyperplane with the maximum margin is the optimal hyperplane. Using
this hyperplane concept, we can determine whether a cardiovascular disease case falls within
the margin or not, so SVM can predict cardiovascular disease using
maximal-margin hyperplanes. Table 1 shows the classification report of the SVM algorithm after
applying the cardiovascular database attributes; the accuracy was found to be 85.61%.
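As a rough illustration (not the authors' exact code), the scikit-learn sketch below trains an SVM on the preprocessed frame from the earlier sketches and prints a classification report; the target column name and split ratio are assumptions, so the numbers will not exactly reproduce Table 1.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# df_clean is the preprocessed frame from the earlier sketches; "TenYearCHD" is assumed
X = df_clean.drop(columns=["TenYearCHD"])
y = df_clean["TenYearCHD"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
svm = SVC(kernel="rbf").fit(scaler.transform(X_train), y_train)

y_pred = svm.predict(scaler.transform(X_test))
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```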
Stochastic gradient descent (SGD) is used to fit linear classifiers and regressors
with convex loss functions. SGD is mainly used for large-scale and sparse machine learning
data, such as that encountered in text classification. Given a training set of examples
(x_1, y_1), ..., (x_n, y_n), where x_i ∈ R^m are the input attributes and y_i ∈ {−1, 1}
for classification, a linear scoring function f(x) = w^T x + b is learned, with
parameters w ∈ R^m and intercept b ∈ R. The binary classification
prediction is calculated using the sign of f(x).
Using the formula below, we minimize the regularized training error
$$
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\left(y_i, f(x_i)\right) + \alpha R(w)
\qquad (4)
$$
Table 2 shows the classification report of the SGD algorithm after applying the cardio-
vascular database attributes; the accuracy was found to be 85.14%.
The logistic regression (LR) model works well when the two classes are labelled
0 and 1. The regression minimizes the distance of the points to the hyperplane. The
sigmoid function is used to convert predicted values into probabilities, which are limited
to between 0 and 1. The curve looks like an S, so it is called a sigmoid function. The
threshold value is calculated using the sigmoid function and the curved line.
The logistic function is defined as
$$
\mathrm{logistic}(\eta) = \frac{1}{1 + \exp(-\eta)}
\qquad (5)
$$
The equation gives probabilities between 0 and 1 using the sigmoid function.
The output obtained from this gives the prediction of cardiovascular disease, where
β is a constant value and x is the input attribute value:
$$
P\left(y^{(i)} = 1\right) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 x_1^{(i)} + \cdots + \beta_p x_p^{(i)}\right)\right)}
\qquad (6)
$$
6 Comparative Analysis
We find the optimal values of the parameters that give the minimum possible value
of the given cost function. SGD produced 85.14% accuracy. Logistic regression is
a supervised classification algorithm; it is a predictive analysis algorithm based
on the concept of probability. It estimates the probability of the dependent variable
from one or more independent variables using a logistic function. The LR algorithm
produced 85.49% accuracy. Of the three algorithms we implemented, we achieved the
highest accuracy using the support vector machine, a supervised learning algorithm:
with the SVM algorithm, we reached an accuracy of 85.61%. In the future, we will
add health image attributes as parameters in our dataset and find the accuracy of
cardiovascular disease prediction.
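For completeness, a compact scikit-learn sketch comparing the three classifiers on the same split is given below; it reuses the variables from the earlier sketches and is illustrative only, so the printed accuracies will not match the paper's exact values.

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test come from the train/test split in the SVM sketch
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "SGD": make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", random_state=42)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.4f}")
```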
References
1. Rubini PE, Subasini CA, Vanitha Katharine A, Kumaresan V, Gowdham Kumar S, Nithya TM
(2021) A cardiovascular disease prediction using machine learning algorithms. Ann Romanian
Soc Cell Biol 904–912
2. El-Ganainy NO, Balasingham I, Halvorsen PS, Rosseland LA (2020) A new real time clinical
decision support system using machine learning for critical care units. IEEE Access 8:185676–
185687
3. Wu J, Guo P, Cheng Y, Zhu H, Wang XB, Shao X (2020) Ensemble generalized multiclass
support-vector-machine-based health evaluation of complex degradation systems. IEEE/ASME
Trans Mechatron 25(5):2230–2240
4. Ksantini R, Ziou D, Colin B, Dubeau F (2007) Weighted pseudometric discriminatory power
improvement using a bayesian logistic regression model based on a variational method. IEEE
Trans Pattern Anal Mach Intell 30(2):253–266
5. Costilla-Enriquez N, Weng Y, Zhang B (2020) Combining Newton-Raphson and stochastic
gradient descent for power flow analysis. IEEE Trans Power Syst 36(1):514–517
6. Ghosh P, Azam S, Jonkman M, Karim A, Shamrat FJ, Ignatious E, Shultana S, Beeravolu
AR, De Boer F (2021) Efficient prediction of cardiovascular disease using machine learning
algorithms with relief and LASSO feature selection techniques. IEEE Access 9:19304–19326
7. Micek A, Godos J, Del Rio D, Galvano F, Grosso G (2021) Dietary flavonoids and cardio-
vascular disease: a comprehensive dose–response meta-analysis. Molec Nutrition Food Res
65(6):2001019
8. Zhou J, Qiu Y, Zhu S, Armaghani DJ, Li C, Nguyen H, Yagiz S (2021) Optimization of support
vector machine through the use of metaheuristic algorithms in forecasting TBM advance rate.
Eng Appl Artif Intell 97:104015
9. Isola G, Polizzi A, Alibrandi A, Williams RC, Lo Giudice A (2021) Analysis of galectin-3 levels
as a source of coronary heart disease risk during periodontitis. J Periodontal Res 56(3):597–605
10. Sundari MS, Nayak RK (2020) Master card anomaly detection using random forest and support
vector machine algorithms. Int J Crit Rev 7(09). ISSN 2394-5125
11. Reddy RR, Ramadevi Y, Sunitha KVN (2017) Enhanced anomaly detection using ensemble
support vector machine. In: 2017 International conference on big data analytics and computa-
tional intelligence (ICBDAC), March 2017. IEEE, pp 107–111
12. Padmaja B, Prasad VR, Sunitha KVN (2016) TreeNet analysis of human stress behavior using
socio-mobile data. J Big Data 3(1):1–15
13. Ji Y, Kang Z (2021) Three-stage forgetting factor stochastic gradient parameter estimation
methods for a class of nonlinear systems. Int J Robust Nonlinear Control 31(3):971–987
14. Sun D, Xu J, Wen H, Wang D (2021) Assessment of landslide susceptibility mapping based on
Bayesian hyperparameter optimization: a comparison between logistic regression and random
forest. Eng Geol 281:105972
15. Dai R, Zhang W, Tang W, Wynendaele E, Zhu Q, Bin Y, De Spiegeleer B, Xia J (2021) BBPpred:
sequence-based prediction of blood-brain barrier peptides with feature representation learning
and logistic regression. J Chem Inf Model 61(1):525–534
16. Pasha SJ, Mohamed ES (2020) Novel feature reduction (NFR) model with machine learning and
data mining algorithms for effective disease risk prediction. IEEE Access 8:184087–184108
17. Khatoon Mohammed T, Shanmuga Sundari M, Sivani UL (2022) Brain tumor image clas-
sification with CNN perception model. In: Soft computing and signal processing. Springer,
Singapore, pp 351–361
18. https://www.kaggle.com/amanajmera1/framingham-heart-study-dataset/version/1
19. Sundari MS, Nayak RK (2020) Process mining in healthcare systems: a critical review and its
future. Int J Emerg Trends Eng Res 8(9). ISSN 2347-3983
20. Nayak RK, Tripathy R, Mishra D, Burugari VK, Selvaraj P, Sethy A, Jena B (2021) Indian stock
market prediction based on rough set and support vector machine approach. In: Intelligent and
cloud computing. Springer, Singapore, pp 345–355
21. Tripathy R, Nayak RK, Das P, Mishra D (2020). Cellular cholesterol prediction of mammalian
ATP-binding cassette (ABC) proteins based on fuzzy c-means with support vector machine
algorithms. J Intell Fuzzy Syst (Preprint) 1–8
Multi-layered PCM Method
for Detecting Occluded Object
in Secluded Remote Sensing Image
1 Introduction
Detection of objects has become a simple yet challenging problem in the interpretation
of remote sensing photographs in recent years. Sensors and post-processing
technology have advanced dramatically. With spatial resolutions down to 0.5 m,
high-resolution secluded remote sensing (HR-RS) images have improved in both quality and
quantity. On the one hand, the abundance of spatial and spectral
information allows more precise recognition of more complicated geographical
objects, while on the other hand, the crowded backdrop introduces more interference.
Various object detection tasks, like the detection of aeroplanes, aerodromes,
vehicles, ships, and buildings, have been the focus of recent research. These
techniques primarily deal with different object formats and a model built from parts,
employing modern classifiers like the support vector machine (SVM) and random forest
with features such as SIFT and HOG. The appearance of remote objects in HR-RS images
can be significantly influenced by environmental factors such as weather, light or cloud,
and by intraclass changes (size, style or texture). It has been a
significant obstacle to identify the appropriate objects.
In HR-RS object detection [1], occlusion is a common problem. Large-scale
object detection, such as airport detection, is mainly concerned with occlusion in
moderate/low-resolution pictures. This problem is becoming more important for smaller
objects as the resolution of remote sensing photographs improves.
In traditional approaches, objects are represented using a single layer, and object entities
are represented directly by low-level attributes or by representational components.
Such a system is readily influenced when its basic representations are distorted.
In most DPBM models, parts are utilised to represent an object, and they
are representations of small significant regions in feature form; if one component is
adversely impacted, the impact is passed on to the object's parent node. As a result,
when distortion occurs, such as the occlusion typical of HR-RS images, the performance
of these single-layer models deteriorates.
An effective occlusion model should prevent this effect from being transmitted to the object.
To protect against these occluded representational aspects, it is natural to use as many of the
remaining parts as possible. These single-layer models, on the other hand, are
unlikely to do this, for two reasons. The first is that most of their structures rely on
linear scoring algorithms, which limit their capacity to handle occluded parts
even when the mechanism is simply set to maximum-scoring
mode, because the presence of an object is not supported by a robust response from a single
element. The second consideration is that in many single-layer models the capacity of a
representation element to infer the complete object independently is also limited.
When parts are absent from object polling, the performance of the remaining pieces is
also affected.
To reduce this effect, a buffer layer is constructed, and only this layer is capable of
passing the influence on to the layer below. This buffer layer essentially consists
of a collection of frequently used single-layer models, each of which is capable of
inferring the full-object entity, and it mimics the state of elements that occlude one
another. As a result, the impact is reduced with minimal performance loss.
The suggested PCM model is made up of two layers, each of which represents
a distinct level of object properties. The first layer, the partial configuration, is a collection
of semantic pieces that capture local object properties, while the second layer is a
graph-structured arrangement of the first layer's partial configurations. The
model's first layer gives it flexibility, enabling it to take deformation within item
categories into consideration. The second layer employs a semi-global representation
to capture the appearance and shape of objects with a wider area of coverage.
The model keeps both deformation-modelling and shape-capturing properties. It is
important to note that the model presented here is direction-dependent, and it only
works with objects oriented in one direction. To detect objects in all directions, a group
of models is necessary. The PCM model is described in more detail in this section using the
aeroplane object as a sample. Figure 1 represents the
framework for object detection.
The semantic parts in the first layer differ from the DPBM parts, which are dynamically
generated at the object's most significant locations. We begin by defining
the set of n semantic parts for the object category. Semantic parts are arranged
using the object's matching undirected skeleton graph G = (V, E), where V = {p_i}, i = 1, ..., n, and
E ⊆ {(x, y) | x, y ∈ V, x ≠ y}.
For each object category, this description of the semantic pieces and the skeleton graph are
interconnected: the skeleton graph is used to arrange the semantic parts and links the
semantic-part description with the object category.
In the second layer, partial configurations are organised using a graph structure.
Each partial configuration casts a vote for the object's existence and is capable of
deducing the bounding box of the object on its own. The graph's edges display
the spatial connections between partial configurations and objects. This layout makes it
easy to avoid the problem of high occlusion rates degrading detector performance:
the unaffected partial configurations in our
model support the presence of the object. The occlusion pattern is captured in a partial
configuration. The full-object DPBM receives the greatest score, whereas the single-object
DPBM performs badly. To account for the deformation of the item caused by intraclass
geometric variance, the spatial relationship is used; accounting for the deformation of an item in
DPBM is similar to a "spring" in pictorial structure models.
A similar approach is taken in constellation models, where an easily
approximated Gaussian distribution captures the spatial organisation. It is worth
noting that the graph describes only the interactions between the object and the partial
configurations, not the interactions among partial configurations. Although the latter
interactions would benefit the model, omitting them significantly simplifies the detection process.
As a result, there is no need to re-annotate the bounding box. Furthermore, the
partial configuration’s spatial interdependence with the object can be directly approx-
imated using the partial configuration’s MBR and the full object’s bounding box.
Using the suggested weighted continuous NMS technique, the partial configuration
hypotheses from the primitive layer are transformed into objects with a wide range of
functionality before being clustered for the final detections. This section
also includes an occlusion inference method. The detection rotation problem is solved
by combining the results of a group of models from different direction bins [2].
DPBM's multiresolution detection capability is used to handle objects of
various sizes.
The merging of partial configurations is a major challenge in the PCM model, because each partial configuration votes in favour of the object's existence. Prior part-based methods treated such partial-configuration-like parts equally. Our objective is to account for the fact that, by construction, some partial configurations are more discriminative than others, particularly against cluttered backgrounds. Moreover, some DPBMs in the first layer are trained inequitably because of poor training data. This is resolved by balancing the partial configurations through assigned weights. The differences in weights clearly reflect the appearance of the partial configurations, since the weights are based on the appearance responses, i.e. the partial configuration scores. According to this idea, a good partial configuration should receive high scores on positive samples and low scores on negative samples, and vice versa. The problem is therefore treated as a two-class classification task over the two groups of samples. Using a maximum-margin SVM formulation, the weights are learned from the first-layer scores on validation samples: the sum of the error (slack) terms and the reciprocal of the margin, i.e. the shortest distance between the separating hyperplane and the closest samples of the two classes, are minimised together.
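A hedged sketch of this weighting step, using scikit-learn's LinearSVC as a stand-in for the max-margin learner; the feature vector of a validation sample is its vector of first-layer partial-configuration scores, and the toy data, class labels and normalisation are our assumptions:

```python
# Hypothetical sketch: learn partial-configuration weights with a max-margin linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

# X: (num_validation_samples, N) first-layer scores for N partial configurations;
# y: 1 for positive validation samples, 0 for negatives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.5, (50, 10)),     # toy positives
               rng.normal(-1.0, 0.5, (50, 10))])   # toy negatives
y = np.array([1] * 50 + [0] * 50)

clf = LinearSVC(C=1.0).fit(X, y)
weights = np.clip(clf.coef_.ravel(), 0, None)
weights /= weights.sum()            # normalised per-configuration weights
print(weights)
```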
The first layer is a basic DPBM with only one component. Applying the first layer to the HR-RS images produces many partial configuration hypotheses. Because several of these hypotheses represent the same object, they are fused within a generalised Hough transform framework.
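As a minimal, assumed sketch of Hough-style fusion, each hypothesis can cast a score-weighted vote for an object centre and nearby votes are grouped; the grid size and grouping rule below are our illustration, not the paper's exact procedure:

```python
# Hypothetical sketch of generalised-Hough-style fusion of hypotheses.
from collections import defaultdict

def fuse_hypotheses(hypotheses, cell=32):
    """hypotheses: list of (cx, cy, score). Votes are binned on a coarse grid;
    each occupied cell becomes one fused object candidate."""
    bins = defaultdict(list)
    for cx, cy, s in hypotheses:
        bins[(int(cx // cell), int(cy // cell))].append((cx, cy, s))
    fused = []
    for votes in bins.values():
        total = sum(s for _, _, s in votes)
        cx = sum(x * s for x, _, s in votes) / total   # score-weighted centre
        cy = sum(y * s for _, y, s in votes) / total
        fused.append((cx, cy, total))
    return fused

print(fuse_hypotheses([(100, 60, 0.9), (104, 58, 0.7), (400, 300, 0.8)]))
```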
The NMS technique is applied prior to clustering to suppress repeated or heavily overlapping detections arising from the same partial configuration. This makes clustering easier, since only one candidate of each type of partial configuration remains in a local region. Based on the characteristics of the hypotheses, clustering is subdivided into score clustering and bounding-box clustering.
Clustering of scores
The score of each cluster is derived from its constituent hypotheses and their respective learned weights. A linear combination over all partial configurations is not applied, to ensure that the scoring is not degraded by occlusion; instead, the partial configuration with the highest weighted score is used.
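For reference, a hedged sketch of the suppression step as plain greedy IoU-based NMS applied per partial-configuration type; this is standard NMS, not the paper's specific "weighted continuous NMS", and the threshold is an assumption:

```python
# Hypothetical sketch: standard greedy NMS applied per partial-configuration type.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thr=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7]))  # [0, 2]
```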
For a better comprehension of the image at the object level, occlusion inference is useful. By our definition, every partial configuration indicates a potential occlusion pattern. As a result, the occlusion status, i.e. the occluded region of the detected object, can be deduced from the combination of partial configurations. The features used for occlusion inference are the partial configuration scores of a detected hypothesis, since the occlusion pattern is encoded in the partial configuration responses. A one-vs-rest multiclass linear SVM classifier [3, 4] is trained to tackle the resulting classification problem and thus solve the occlusion inference problem. The underlying notion is that, for the same occlusion pattern, the score distribution in the N-dimensional score space is similar.
The features for inference are derived from the intermediate outcomes of the intra-group clustering: the scores $(s_1, \ldots, s_i, \ldots, s_N)$ of the N partial configurations contributing to a single output bounding box form one group. The score of an undetected partial configuration is set to 1. The trained classifier can then estimate the occlusion status of an output from the direction of its partial-configuration score vector. For N partial configurations, N + 1 classes are trained, including a fully visible class, with every other class indicating the occlusion state of one partial configuration. In this stage, the classifier is trained on the additional occlusion dataset. Each sample's label indicates its occluded state, i.e. the ID of the truly occluded partial configuration.
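A hedged sketch of this inference step as an (N + 1)-class one-vs-rest linear SVM over N-dimensional score vectors (here N = 3; class 0 stands for "fully visible" and class k for "configuration k occluded"; the toy data and labels are our own):

```python
# Hypothetical sketch: occlusion inference as (N + 1)-class classification of score vectors.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

N = 3                                                 # number of partial configurations
rng = np.random.default_rng(1)
visible = rng.normal(1.0, 0.2, (40, N))               # all configurations respond well
occluded = []
for k in range(N):                                    # configuration k suppressed
    s = rng.normal(1.0, 0.2, (40, N)); s[:, k] = rng.normal(-0.5, 0.2, 40)
    occluded.append(s)
X = np.vstack([visible] + occluded)
y = np.array([0] * 40 + sum([[k + 1] * 40 for k in range(N)], []))

clf = OneVsRestClassifier(LinearSVC(C=1.0)).fit(X, y)
print(clf.predict([[1.1, 0.9, -0.4]]))   # expected: class 3 (third configuration occluded)
```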
3.3 Datasets
The recommended PCM model is trained using publicly available image datasets [2]. To increase the number of available instances, the training samples are flipped in various directions. Negative samples are randomly extracted from training photographs that do not include the objects.
To account for rotational variability, the circle is split into right-handed direction bins, with north fixed to 0°. Aside from the bounding boxes, a few other settings need to be considered. The centre degree of the direction bin whose model gives the maximum score is taken as the predicted direction. To compensate for truncated objects and make the detection results easier to inspect, the occlusion images are zero-padded by 200 pixels in the aeroplane occlusion dataset and by 100 pixels in the ship and automobile datasets. In Algorithm 1, the least overlap t is set to 0.55. For the ship category, five semantic parts are used and structured with a line skeleton graph, yielding three partial configurations, each made up of three adjacent semantic parts. For the automobile category, the experiments establish four semantic parts aligned along a line-shaped skeleton graph, with two partial configurations selected, each from three related semantic parts.
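A minimal sketch of the kind of flip augmentation and direction binning described above; the bin count of 8 is purely an assumption for illustration, as the paper does not state it here:

```python
# Hypothetical sketch: flip augmentation and mapping an angle to a direction bin.
import numpy as np

def augment_flips(patch):
    """Return the original patch plus horizontal, vertical and both-axis flips."""
    return [patch, np.fliplr(patch), np.flipud(patch), np.flipud(np.fliplr(patch))]

def direction_bin(angle_deg, n_bins=8):
    """Direction bins measured from north = 0 degrees; returns (bin_id, bin_centre)."""
    width = 360.0 / n_bins
    b = int((angle_deg % 360.0) // width)
    return b, b * width + width / 2

print(len(augment_flips(np.zeros((64, 64)))))   # 4 augmented copies
print(direction_bin(95.0))                      # bin 2, centre 112.5 degrees
```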
In the tests, a detection is considered a true positive (TP) if the overlap ratio, computed as the intersection-over-union between the detected bounding box and the ground truth, is greater than 0.6; otherwise, it is counted as a false positive (FP). The missed objects are false negatives (FNs), and the remaining cases are true negatives (TNs).
The precision–recall curve (PRC) is used to measure detection performance, since it represents the trade-off between precision and recall. The trials also report the average precision (AP), which is the area under the PRC; higher AP values indicate better performance. Furthermore, because precision and recall are traded off against each other, different score thresholds yield a wide range of recall and precision values. To account for this, the F1 score, the harmonic mean of precision and recall, is used: the threshold that yields the maximum F1 is selected for the subsequent detection. In general, a larger optimal F1 score indicates a more effective method.
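A hedged sketch of these metrics (IoU threshold 0.6 as stated above; the AP here is a simple trapezoidal area under the PRC, which may differ from the exact interpolation the authors used; all inputs are toy values):

```python
# Hypothetical sketch: precision, recall, F1 and a simple area-under-PRC "AP".
import numpy as np

def prf1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(recalls, precisions):
    r, p = np.asarray(recalls, float), np.asarray(precisions, float)
    order = np.argsort(r); r, p = r[order], p[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2))   # trapezoidal area

print(prf1(tp=60, fp=15, fn=25))                    # (0.8, ~0.706, ~0.75)
print(average_precision([0.2, 0.5, 0.8], [0.95, 0.85, 0.7]))
```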
The model's detection performance is evaluated on the testing datasets. During the test phase, object proposals are generated by a sliding window method over different angles and sizes, and the recommended settings are applied to the DPBMs. On the same test datasets, the model is compared against two state-of-the-art object detection algorithms: exemplar SVMs and the original DPBM approach. To account for rotation variation, a variety of models for different angles is trained for these methods in the same way. The comparison uses identical training and testing datasets for all algorithms, each with its best parameters.
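A minimal sketch of a multi-scale, multi-angle sliding window generator of the kind implied above; the window size, strides, scales and angles are illustrative assumptions:

```python
# Hypothetical sketch: sliding-window proposals over scales and rotation angles.
def sliding_windows(img_w, img_h, win=96, stride=48,
                    scales=(0.5, 1.0, 2.0), angles=(0, 45, 90, 135)):
    """Yield (x, y, size, angle) proposals to be scored by the direction-binned DPBMs."""
    for s in scales:
        size = int(win * s)
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                for a in angles:
                    yield x, y, size, a

proposals = list(sliding_windows(512, 512))
print(len(proposals))   # number of candidate windows to score
```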
On the aeroplane datasets, the detection approach is tested in a variety of scenarios, since alternative models and design aspects can have a significant impact on the final outcome and on our model. The motivation for weighting the partial configurations stems from the belief that the local features of the object that they reflect are essentially different because of structural variances [6].
To test this intuition, the planned ten north-direction partial configurations are considered. Their respective models are analysed on the validation dataset to determine their mean scores on both positive and negative samples, and the corresponding fine-tuned weights are obtained after normalisation. A good partial configuration should perform well on positive samples while performing poorly on negative samples, and vice versa. The fifth, sixth and ninth partial configurations turn out to be poor and have their weights reduced, whereas greater weights are given to the first, fourth and eighth partial configurations. The weight learning approach has therefore captured this property.
According to the partial configuration concept, partial configurations with four semantic parts have a higher average weight than partial configurations with three semantic parts. This partially supports the assumption that "pieces" with a larger surface area are more discriminative than those with a smaller surface area. The phenomenon can be noticed because partial configurations that cover the entire wings are more discriminative than partial configurations that contain just the head. The fourth partial configuration demonstrates that the combination of the wings and the tail is more discriminative than the head, even though it spans a similar area ratio. In future, these phenomena could be used to guide part selection for aeroplane identification tasks. The performance difference on the ordinary dataset is negligible, but it increases on the occlusion dataset, showing that the difference between partial configurations is amplified when only a small number of partial-configuration clusters is used. Finally, it may be said that weighted partial configurations more accurately capture the object's response, leading to detection results that are more consistent and dependable.
The partial configurations are meant to cover the unoccluded parts of objects, and their number affects our PCM model's final performance. The complete model (group A + B) is compared, as a baseline, against the partial configuration groups A and B separately. This evaluation is based on our occlusion datasets and reports the PRC and AP.
The results support the idea that group A is more discriminative than group B when the object is fully visible; on the occlusion dataset, however, the larger partial configurations are partly occluded. Small partial configurations collect more localised unoccluded information about the full object, whereas larger partial configurations capture more global unoccluded information. This helps to explain why group A performs poorly under occlusion, whereas group B and the complete model perform similarly. On the occlusion dataset, therefore, more partial configuration coverage does not always mean higher performance. Because the entire model is made up of groups A and B, it shows performance gains over either group alone, but at the cost of increased computational time.
Our model can estimate an object's occlusion state, i.e. the occluded region of its bounding box, based on its detection and direction. This uses the intermediate results of the first layer, which provide the scores of all partial configurations; the trained inference model then makes inferences from them. The predicted occlusions are very close to the ground truth, thanks partly to well-predicted object directions. The percentage index summarises our inference model's overall performance: about 60% of cases are entirely correct and 11% partially correct. Approximately 29% of detected items are incorrectly inferred; many of them have poor scores from the first layer, which makes them unstable to predict. The results, nonetheless, are encouraging and very instructive for future inference. Precision and recall curves of the proposed PCM model for object detection on the different datasets are shown in Fig. 2; the red curve (PCM) shows better performance than the competing SVM approach. Here FP denotes false positives, TP true positives and FN false negatives. Table 1 reports the occlusion inference and prediction accuracy.
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
7 Conclusion
References
1. Cheng G, Han J, Zhou P, Guo L (2014) Multi-class geospatial object detection and geographic
image classification based on collection of part detectors. ISPRS J Photogramm Remote Sens
98:119–132
2. Yu Y, Guan H, Zai D, Ji Z (2016) Rotation-and-scale-invariant airplane detection in high-
resolution satellite images based on deep-Hough-forests. ISPRS J Photogramm Remote Sens
112:50–64
3. Han J et al (2014) Efficient, simultaneous detection of multi-class geospatial targets based on
visual saliency modeling and discriminative learning of sparse coding. ISPRS J Photogramm
Remote Sens 89:37–48
4. Cheng G et al (2013) Object detection in remote sensing imagery using a discriminatively trained
mixture model. ISPRS J Photogramm Remote Sens 85:32–43
5. Yao X, Han J, Guo L, Bu S, Liu Z (2015) A coarse-to-fine model for airport detection from remote
sensing images using target-oriented visual saliency and CRF. Neurocomputing 164:162–172
6. Bai X, Zhang H, Zhou J (2014) VHR object detection based on structural feature extraction
and query expansion. IEEE Trans Geosci Remote Sens 52(10):6508–6520
7. Ranjana R, Narendra Kumar Rao B, Nagendra P, Sreenivasa Chakravarthy S (2022) Broad
learning and hybrid transfer learning system for face mask detection. Telematique 21(1):182–196
8. Narendra Kumar Rao B, Naseeba B, Challa NP, Chakrvarthi S (2022) Web scraping (IMDB)
using python. Telematique 21(1):235–247
Author Index
J
Jagannadham, D. B. V., 319, 347
Jayant, G. S., 157
Jaya Pooja Sri, M., 593
Jitendra Kumar, 443
Jonnadula Narasimharao, 319, 337, 357
Juturu Harika, 79
Jyothi Babu, A., 145
Jyothi Jarugula, 473

K
Kacham Akanksha, 613
Kamakshi, P., 407
Kamuju Sri Satya Priya, 583
Kanaka Durga Returi, 651
Kanna Naveen, 27
Kanneganti Bhavya Sri, 1
Kartheek, G. C. R., 191
Karthika, G., 69
Karuna, G., 473
Karuppasamy, M., 145
Katakam Ananth Yasodharan Kumar, 1
Kavya, G., 593
Keerthi Reddy, A., 631
Kevin Chiguano, 491
Kiran Kumar Bejjanki, 387
Kshitiz Rathore, 243
Kunamsetti Vaishnavi, 561

L
Lalit Kane, 463
Laveesh Pant, 157
Lopamudra Panda, 1
Luis Ramirez, 491

M
Madhu Khurana, 481
Madipally Sai Krishna Sashank, 69
Mahdi, Hussain Falih, 511
Mahesh Babu Katta, 613
Maheswari, K., 347
Maina Goni, 377
Mamta Khosla, 243
Mayukh Sarkar, 435
Meenakshi, M., 135

N
Nagaraju Rayapati, 521
Nagasai Anjani kumar, T., 387
Nagasai Mudgala, 27
Naga Vishnu Vardhan, J., 631
Naga Yamini Anche, 613
Nagesh Salimath, 221
Najeema Afrin, 319
Nakkeeran Rangasamy, 233
Nandula Haripriya, 583
Narayana, V. A., 417
Narendra Kumar Rao, B., 651
Naresh Tangudu, 521
Neelamadhab Padhy, 221, 541, 551
Nidhi Jani, 109
Nitesh Kashyap, 253
Nitesh Pradhan, 297
Nitesh Sonawane, 213

P
Padma Mayukha, K., 631
Paleti Krishnasai, 69
Pappala Lokesh, 79
Pavan Kumar, C. S., 27
Piyush Chauhan, 531
Pooja Gupta, 275
Prabhu, A., 347
Pragati Tripathi, 443
Prasad Babu, K., 49
Prashanth Ragam, 377, 397
Priyakanth, R., 613
Priyanshi Shah, 109
Pujitha, B., 593

R
Radhika Arumalla, 337
Raheem Unnisa, 337
Rahul Deo Sah, 221
Rahul Roy, 27
Raja Ram Dutta, 221
Rajeswari Viswanathan, 623
Rajiv Singh, 275
Rama Devi Boddu, 377
Raman Chahar, 427
Ramesh Deshpande, 631