Springer Proceedings
Sarika Jain
Nandana Mihindukulasooriya
Valentina Janev
Cogan Matthew Shimizu Editors
Semantic Intelligence
Select Proceedings of ISIC 2023
Lecture Notes in Electrical Engineering
Volume 1258
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Napoli, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany
Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore,
Singapore
Rüdiger Dillmann, University of Karlsruhe (TH) IAIM, Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Dipartimento di Ingegneria dell’Informazione, Sede Scientifica Università degli Studi di Parma,
Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid,
Spain
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences, Warsaw,
Poland
Alaa Khamis, Department of Mechatronics Engineering, German University in Egypt El Tagamoa El Khames,
New Cairo City, Egypt
Torsten Kroeger, Intrinsic Innovation, Mountain View, USA
Yong Li, College of Electrical and Information Engineering, Hunan University, Changsha, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, USA
Ferran Martín, Departament dʼEnginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, USA
Subhas Mukhopadhyay, School of Engineering, Macquarie University, Sydney, Australia
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, USA
Toyoaki Nishida, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova,
Genova, Italy
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore,
Singapore
Joachim Speidel, Institute of Telecommunications, University of Stuttgart, Stuttgart, Germany
Germano Veiga, FEUP Campus, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, Department of Computer Engineering, Electrical Engineering and Applied Mathematics,
DIEM—Università degli studi di Salerno, Fisciano, Italy
Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Kowloon Tong, Hong Kong
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Sarika Jain · Nandana Mihindukulasooriya ·
Valentina Janev · Cogan Matthew Shimizu
Editors
Semantic Intelligence
Select Proceedings of ISIC 2023
Editors
Sarika Jain
Department of Computer Applications
National Institute of Technology
Kurukshetra, India

Nandana Mihindukulasooriya
MIT-IBM Watson AI Lab
Cambridge, MA, USA

Valentina Janev
Institute Mihajlo Pupin
Belgrade, Serbia

Cogan Matthew Shimizu
Department of Computer Science
Wright State University
Dayton, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
Chapter Preview
Invited Papers
The first part consists of four Keynote papers that address challenges in building
knowledge graphs, as well as applications of semantic technologies and advanced
analytical solutions in Smart Manufacturing Environments and Smart Energy
Management Systems. Chapter “Towards Understanding the Impact of Schema on
Knowledge Graph Embeddings” utilizes the Deep Graph Library on two Knowledge
Graph schemas over the same Wright State University CORE Scholar data.
The authors concluded that there exists a difference in how knowledge graph
embedding models perform when trained on a schema that is rich or shallow
in design. In Chapter “Fragmenting Data Strategies to Scale Up the Knowledge
Graph Creation”, the authors introduce KatanaG, an engine-agnostic framework
designed to enhance the scalability of KG creation processes, especially when
dealing with large and heterogeneous data sources. Chapter “On the Potential of
Sustainable Software Solutions in Smart Manufacturing Environments” compares
green computing approaches and explains the advantages and disadvantages of
semantic methods. Chapter “Technologies and Concepts for the Next-Generation
Integrated Energy Services” proposes a solution for the efficient integration of
data-driven services and connecting physical energy assets in future smart grids.
Trends
The second part consists of 14 papers. The Trends and Perspectives Track first
explores the state of the art of artificial intelligence techniques in different appli-
cations, for instance, in Chapter “From Text to Voice: A Comparative Study of
Machine Learning Techniques for Podcast Synthesis” for podcast synthesis; in
Chapter “Artificial Intelligence and Legal Practice: Jurisprudential Foundations
for Analyzing Legal Text and Predicting Outcomes” for analysing legal text
and predicting outcomes; in Chapter “Unveiling the Truth: A Literature Review
on Leveraging Computational Linguistics for Enhanced Forensic Analysis” for
enhanced forensic analysis; in Chapter “Navigating the Digital Frontier: Unraveling
the Complexities and Challenges of Emerging Virtual Reality” in emerging virtual
reality; in Chapter “An In-Depth Exploration of Anomaly Detection, Classification,
and Localization with Deep Learning: A Comprehensive Overview” for intrusion
detection in computer security services.
Additionally, in this part of the book, the contributions from the IEdTC 2023
conference are presented that deal with trends in Information and Communication
Technologies for Education. Specific challenges for India are explored in Chapters
“Challenges to Admissibility and Reliability of Electronic Evidence in India
in the Age of ‘Deepfakes’” and “Visualization and Statistical Analysis of Research
Pillar of Top Five THE (Times Higher Education)-Ranked Universities for the Years
2020–2023”.
The use of artificial intelligence techniques for evaluation in education is presented
in several contributions including Chapters “Dimensions of ICT Based Student
Evaluation and Assessment in the Education Sector”, “A Formula for Effective
Evaluation Practice Using Online Education Tool” and “Admission Prediction for
Universities Using Decision Tree Algorithm and Support Vector Machine”.
The impact of technological trends in the field of modern education is
discussed in Chapters “Effectiveness of Online Education System”, “Deciphering
the Catalysts Influencing the Willingness to Embrace Digital Learning Applications:
A Comprehensive Exploration”, and “Pedagogical Explorations in ICT:
Navigating the Educational Landscape with Web 2.0, 3.0, and 4.0 for Transformative
Learning Experiences”.
Finally, in the second part, in Chapter “Comparative Analysis of Docker Image
Files Across Various Programming Environments”, the Docker technology is
discussed as an enabler for building innovative applications portable across different
computer systems.
Research
The Research Track incorporates 10 papers that analyse research gaps and offer
solutions that fill those gaps, thus contributing significantly to the advancement of
semantic intelligence and artificial intelligence. Comparative analysis of machine
learning approaches is given in Chapters “Assessing Machine Learning Algorithms
for Customer Segmentation: A Comparative Study” and “Genre Classification of
Movie Trailers Using Audio and Visual Features: A Comparative Study of Machine
Learning Algorithms”. Novel approaches are elaborated using the following:
• Deep Convolution Neural Network in Chapters “Classifying Scanning Electron
Microscope Images Using Deep Convolution Neural Network” and “YOLO
Algorithm Advancing Real-Time Visual Detection in Autonomous Systems”;
• Support Vector Machines in Chapter “An Efficient Kernel-SVM-based Epilepsy
Seizure Detection Framework Utilizing Power Spectrum Density”;
• Enhanced Binary Particle Swarm Optimization in Chapter “Optimizing Feature
Selection in Machine Learning with E-BPSO: A Dimensionality Reduction
Approach”;
• Hamming distance algorithm for information retrieval in Chapter “Ranking of
Documents Through Smart Crawler”;
Applications
results and gave consent for publishing their innovative solutions and contributions
to science with Springer.
Contents
Invited Papers
Towards Understanding the Impact of Schema on Knowledge
Graph Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Brandon Dave and Cogan Shimizu
Fragmenting Data Strategies to Scale Up the Knowledge Graph
Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Enrique Iglesias, Ahmad Sakor, Philipp D. Rohde, Valentina Janev,
and Maria-Esther Vidal
On the Potential of Sustainable Software Solutions in Smart
Manufacturing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Simon Paasche and Sven Groppe
Technologies and Concepts for the Next-Generation Integrated
Energy Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Valentina Janev, Lazar Berbakov, Marko Jelić, Dea Jelić,
and Nikola Tomašević
Trends
From Text to Voice: A Comparative Study of Machine Learning
Techniques for Podcast Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Pankaj Chandre, Viresh Vanarote, Uday Mande, Mohd Shafi Pathan,
Prashant Dhotre, and Rajkumar Patil
Artificial Intelligence and Legal Practice: Jurisprudential
Foundations for Analyzing Legal Text and Predicting Outcomes . . . . . . . 57
Ivneet Walia and Navtika Singh Nautiyal
Unveiling the Truth: A Literature Review on Leveraging
Computational Linguistics for Enhanced Forensic Analysis . . . . . . . . . . . . 71
Deepak Mashru and Navtika Singh Nautiyal
Research
Assessing Machine Learning Algorithms for Customer
Segmentation: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Katta Subba Rao, Sujanarao Gopathoti, Ajmeera Ramakrishna,
Priya Gupta, Sirisha Potluri, and Gaddam Srihith Reddy
Genre Classification of Movie Trailers Using Audio and Visual
Features: A Comparative Study of Machine Learning Algorithms . . . . . . 231
Viresh Vanarote, Pankaj Chandre, Uday Mande, Pathan Mohd Shafi,
Dhanraj Dhotre, and Madhukar Nimbalkar
Classifying Scanning Electron Microscope Images Using Deep
Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Kavitha Jayaram, S. Geetha, Prakash Gopalakrishnan,
and Jayaram Vishakantaiah
An Efficient Kernel-SVM-based Epilepsy Seizure Detection
Framework Utilizing Power Spectrum Density . . . . . . . . . . . . . . . . . . . . . . . 251
Vinod Prakash and Dharmender Kumar
YOLO Algorithm Advancing Real-Time Visual Detection
in Autonomous Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Abhishek Manchukonda
Optimizing Feature Selection in Machine Learning with E-BPSO:
A Dimensionality Reduction Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Rajalakshmi Shenbaga Moorthy, K. S. Arikumar,
Sahaya Beni Prathiba, and P. Pabitha
CRIMO: An Ontology for Reasoning on Criminal Judgments . . . . . . . . . 297
Sarika Jain, Sumit Sharma, Pooja Harde, Archana Pandey,
and Ruqaiya Thakrawala
Ranking of Documents Through Smart Crawler . . . . . . . . . . . . . . . . . . . . . . 317
Amol S. Dange, B. Manjunath Swamy, and Ashwini B. Shinde
Ensemble Learning Approaches to Strategically Shaping Learner
Achievement in Thailand Higher Education . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Sittichai Bussaman, Patchara Nasa-Ngium, Wongpanya S. Nuankaew,
Thapanapong Sararat, and Pratya Nuankaew
Applications
Convolutional Neural-Network-based Gesture Recognition System
for Air Writing for Disabled Person . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Soham Kr Modi, Manish Kumar, Sanjay Singla, Charnpreet Kaur,
Tulika Mitra, and Arnab Deb
A Protection Approach for Coal Miners Safety Helmet Using IoT . . . . . . 377
Shabina Modi, Yogesh Mali, Lakshmi Sharma, Prajakta Khairnar,
Dnyanesh S. Gaikwad, and Vishal Borate
Face Cursor Movement Using OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
R. S. M. Lakshmi Patibandla, Madupalli Manoj,
Vantharam Sai Sushmitha Patnaik, Alapati Jagadeesh,
and Bathina Sasidhar
Powerpoint Slide Presentation Control Based on Hand Gesture . . . . . . . . 401
Ankit Kumar, Kamred Udham Singh, Gaurav Kumar, Teekam Singh,
Tanupriya Choudhury, and Ketan Kotecha
SQL Queries Using Voice Commands to Be Executed . . . . . . . . . . . . . . . . . 413
R. S. M. Lakshmi Patibandla, Sai Naga Satwika Potturi,
and Namratha Bhaskaruni
A Compatible Model for Hybrid Learning and Self-regulated
Learning During the COVID-19 Pandemic Using Machine
Learning Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Pratya Nuankaew, Sittichai Bussaman, Patchara Nasa-Ngium,
Thapanapong Sararat, and Wongpanya S. Nuankaew
Hand Gesture Recognition and Real-Time Voice Translation
for the Deaf and Dumb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Shabina Modi, Yogesh Mali, Rekha Kotwal, Vishal Kisan Borate,
Prajakta Khairnar, and Apashabi Pathan
IoT-Based Smart EV Power Management for Basic Life Support
Transportation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
M. Hema Kumar, G. Nagalali, D. Karthikeyan, C. S. Poornisha,
and S. Priyanka
About the Editors
Dr. Sarika Jain has served in education for over 19 years and is currently serving
at the National Institute of Technology Kurukshetra (Institute of National Impor-
tance), India. She has authored or co-authored over 150 publications including both
authored and edited books. Her research interests include knowledge management
and analytics, ontological engineering, knowledge graphs, and intelligent systems.
She has been the principal investigator of sponsored research projects and works
in collaboration with various researchers across the globe, including in Germany,
Austria, Australia, Malaysia, Spain, the USA, and Romania. She serves as a reviewer
for journals published by IEEE, Elsevier, and Springer. She has been involved as a
program and steering committee member at many prestigious conferences in India
and abroad. She is a senior member of the IEEE, a member of ACM, and a Life
Member of the CSI.
Dr. Valentina Janev is a Senior Researcher at the Mihajlo Pupin Institute, Univer-
sity of Belgrade, Serbia. She has extensive experience in research, software systems
development, and maintenance in different industrial domains for clients from
Europe. She has published several conference and journal papers, books, and book
chapters on responsible knowledge management, semantic intelligence in Big Data
applications, knowledge graphs, and Big Data processing. Dr. Valentina Janev has
served as an external Expert engaged by the European Commission, Research Exec-
utive Agency for the evaluation of EU research proposals and projects. She is a senior
member of the IEEE. Dr. Janev has acted as a reviewer of respectable international
journals including Artificial Intelligence Review (Springer), International Journal on
Semantic Web and Information Systems (IGI Global), International Journal of Digital
Earth (Taylor & Francis), Information Systems Management (Taylor & Francis),
International Journal of Intelligent Information Systems (Science Publishing Group)
and American Journal of Software Engineering and Applications (Science Publishing
Group).
1 Introduction
Domain experts are able to utilize¹ the implementation of knowledge graphs for data
insights in their respective field of study. The schematic design of a knowledge graph,
also referred to as an ontology, should be a reflection of the use case described by the
domain experts. Knowledge graphs represent data connections where a class entity
can be connected by a relationship to another entity; thus, an ontology can result in
a variety of designs dependent on the data and the developer’s needs.
1 https://dglke.dgl.ai/doc/.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 3
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_1
A rich schema design represents identified data as class entities, which allows an
ontologist to create layers in a knowledge graph that attempt to mirror a realistic
view of data and information. An ontology that follows a shallow design allows
developers to describe data without auxiliary layers when that extra structure is
not significant. This direct pairing of class entities and data values streamlines
representation without a need for further abstraction. Our research aims to understand
whether there is a benefit or consequence to utilizing one
design over the other when implemented with knowledge graph embedding (KGE).
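The contrast between the two designs can be sketched with a toy example. All entity, relation, and attribute names below are invented for illustration and are not the actual CORE Scholar schemas:

```python
# Hypothetical triples for one publication record, contrasting the two
# schema styles discussed above (all names are invented for illustration).

# Rich design: an intermediate Author entity adds a layer of abstraction.
rich = [
    ("pub:123", "hasAuthor", "author:jdoe"),       # class entity -> class entity
    ("author:jdoe", "hasName", "Jane Doe"),        # class entity -> data value
    ("author:jdoe", "affiliatedWith", "inst:wsu"),
    ("inst:wsu", "hasName", "Wright State University"),
]

# Shallow design: data values hang directly off the publication entity.
shallow = [
    ("pub:123", "authorName", "Jane Doe"),
    ("pub:123", "institutionName", "Wright State University"),
]

# The rich graph introduces more entities and triples for the same facts.
assert len(rich) > len(shallow)
```

The same facts thus yield graphs of different depth and size, which is exactly the variable whose effect on embedding quality the paper investigates.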
The rest of this paper is organized as follows. Section 2 provides insight into the
state of the art and the foundational concepts for context of the paper, as well as the
dataset we have opted to utilize. In Sects. 3, 4, and 5, we provide our experimental
methodology, present preliminary results, and discussion thereof. Finally, in Sect. 6,
we conclude with our anticipated next steps.
2 Background
Research exists on many facets of knowledge graphs, whether their integration
or their applications. Modular ontology modelling [5] applies software engineering
techniques to launch and maintain a knowledge graph from the ground up.
With encoding techniques, knowledge graphs can be represented within a
hyperplane, which allows KGE models to be trained for predictive tasks, such
as finding missing relationships between entities or understanding the semantics of
entities and relationships. These contributions allow further experiments to continue
into a variety of fields for ontology modelling and knowledge graph usages. Section
2.2 goes on to detail the CORE Scholar data. Section 2.3 continues to describe the
differences in schema conceptualization.
The dataset we used for this research was obtained from Wright State University's
(WSU) library. Both Figs. 1 and 2 are graphical representations of the two schemas
designed to be used with the CORE Scholar data. CORE Scholar is a
public repository that celebrates research and collaboration by WSU-affiliated
researchers. The repository contains data describing a multitude of publications,
including newsletters published by the college deans and photographs of historical
monuments to scientific research in a variety of fields. From the CORE Scholar
dataset, we focussed on detailing publications, the respective publishers, and their
affiliated institutions.
Fig. 1 Rich schema. Note Classes are represented with orange rectangles, relationships are represented by connected edges, and data types are represented in yellow ovals

Fig. 2 Shallow schema. Note Classes are represented in orange rectangles, relationships are represented by connected edges, and datatypes are represented in yellow ovals
3 Methodology
Our research utilized Deep Graph Library’s DGL-KE2 in order to train and evaluate
our knowledge graphs with embedding models. DGL-KE consists of a variety of
embedding models; however, the knowledge graph embedding models we chose
to use for this preliminary research are TransR [4], TransE [1], ComplEx [7], and
DistMult [9]. Table 1 specifies the partition of triples used for training and evaluation.
Each model was trained for 500 steps with a learning rate of 0.25 and a gamma
of 19.9.
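The triple-partitioning step can be sketched in plain Python. The 90/5/5 split, the random seed, and the synthetic triples below are assumptions for illustration only; the paper's actual partition is given in its Table 1:

```python
import random

def partition_triples(triples, train_frac=0.9, valid_frac=0.05, seed=42):
    """Shuffle and split a triple list into train/valid/test partitions.

    The 90/5/5 split here is an illustrative assumption, not the paper's
    actual partition (see the paper's Table 1).
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    shuffled = triples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Synthetic chain-shaped triples, purely to exercise the function.
triples = [(f"e{i}", "rel", f"e{i + 1}") for i in range(1000)]
train, valid, test_set = partition_triples(triples)
assert len(train) == 900 and len(valid) == 50 and len(test_set) == 50
```

In practice DGL-KE consumes such partitions as separate files passed to its training command, so a step like this typically precedes training.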
4 Preliminary Results
Table 2 lists the average losses of our models after training. DGL-KE optimizes with
respect to the average of both positive and negative loss values. As TransE's training
shows, there are no significant differences in model performance with or without
data cleaning. For the rest of the experiment, the remaining KGE models were
trained with cleaned data.
2 https://github.com/awslabs/dgl-ke.
5 Discussion
We include Tables 3 and 4 to display the results of our models for the metrics Mean
Rank (MR), Mean Reciprocal Rank (MRR), and Model Prediction Hits@k where
k is 1, 3, and 10. We note the measurable differences a model has when trained
on and evaluated with seen data. For visibility, the better-performing metric results
are highlighted in bold. A Delta row is added to illustrate the difference, as an absolute
value, between the rich and shallow designs' metric results. Although there is a marginal
difference for MRR and hits@k in KGE performance, there appears to be a larger
difference in MR-based performance.
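For reference, all three reported metrics can be computed from the rank a model assigns to the correct entity in each test query; the toy ranks below are invented for illustration:

```python
def rank_metrics(ranks, ks=(1, 3, 10)):
    """Compute Mean Rank, Mean Reciprocal Rank, and Hits@k from a list of
    1-based ranks assigned to the correct entity in each test query."""
    n = len(ranks)
    mr = sum(ranks) / n                                  # lower is better
    mrr = sum(1.0 / r for r in ranks) / n                # higher is better
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mr, mrr, hits

# Toy example: four test queries where the true entity ranked 1, 2, 5, and 12.
mr, mrr, hits = rank_metrics([1, 2, 5, 12])
assert mr == 5.0
assert hits[1] == 0.25 and hits[3] == 0.5 and hits[10] == 0.75
```

Because MR sums raw ranks while MRR and Hits@k are bounded in [0, 1], a handful of badly ranked queries can move MR dramatically without much effect on the other metrics, which is consistent with the larger MR gaps observed in Table 4.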
6 Conclusion
In this preliminary work, we determined that there exists a difference in how knowl-
edge graph embedding models perform when trained on a schema that is rich or
shallow in design.
Table 4 KGE evaluation when trained and evaluated with all data
MR MRR H@1 H@3 H@10
TransE_NotClean Shallow_All 17.671754 0.510615 0.411413 0.548074 0.711626
Rich_All 37.021571 0.478205 0.391775 0.507560 0.651298
Delta 19.349817 0.032411 0.019638 0.040514 0.060327
TransE_Clean Shallow_All 16.004682 0.510416 0.409428 0.549654 0.712017
Rich_All 41.615039 0.482275 0.395826 0.511988 0.654758
Delta 25.610357 0.028141 0.013601 0.037665 0.057258
TransR Shallow_All 413.633237 0.092006 0.075774 0.095220 0.113197
Rich_All 434.098484 0.112367 0.096677 0.115643 0.134727
Delta 20.465247 0.020361 0.020903 0.020423 0.021530
DistMult Shallow_All 165.963736 0.340006 0.289562 0.353055 0.433333
Rich_All 265.549778 0.261939 0.228910 0.267384 0.320704
Delta 99.586042 0.078067 0.060652 0.085671 0.112630
ComplEx Shallow_All 171.694639 0.348654 0.300776 0.360821 0.437884
Rich_All 263.238816 0.268759 0.235774 0.274572 0.326594
Delta 91.544177 0.079895 0.065001 0.086249 0.111290
Future Work
We have identified several next steps for this research that will take it beyond
the preliminary work presented herein.
Acknowledgements This work was funded by the National Science Foundation under Grant
2333532; Proto-OKN Theme 3: An Education Gateway for the Proto-OKN. Any opinions, find-
ings, and conclusions or recommendations expressed in this material are those of the authors and
do not necessarily reflect the views of the National Science Foundation. The authors would like to
acknowledge Andrew Eells for identifying related work.
References
Abstract In recent years, the exponential growth of data has necessitated a unified
schema to harmonize diverse data sources. This is where knowledge graphs (KGs)
come into play. However, the creation of KGs introduces new challenges, such as
handling large and heterogeneous input data and complex mappings. These challenges
can lead to reduced scalability due to the significant memory consumption
and extended execution times involved. We present KatanaG, a framework designed
to streamline KG creation in complex scenarios, including large data sources and
intricate mappings. KatanaG optimizes memory usage and execution time. When
applied alongside various KG creation engines, our results indicate that KatanaG
can improve the performance of these engines, reducing execution time by up to
80% and achieving 70% memory savings.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 11
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_2
1 Introduction
Data has grown exponentially as a result of accurate data-generation devices (e.g.,
sensors, MRI scanners, or DNA sequencers). However, data can be ingested in various
formats and represent the same entities using different schemas and semantics,
thus hindering data processing. Knowledge graphs (KGs) have emerged as expressive
data structures to enable interoperability, and harmonize and integrate heterogeneous
data sources and their meaning [1]. Nevertheless, despite the increasing acceptance
of KGs in industrial [2] and academic sectors [3], KG creation is still facing chal-
lenges to scale up to real-world applications [4, 5]. Specifically, the process of KG
creation can be impacted by multiple parameters [6] (e.g., data size and heterogeneity,
data duplicate rate). Consequently, efficient methods for creating KGs are needed
to address interoperability.
Building upon prior work in the field of databases, KGs can be conceptually
defined as a data integration system (DIS) [7]. A DIS encompasses an ontology,
denoted O, which describes a unified perspective; a collection of data sources S; and the
mapping rules or correspondences that establish connections between the concepts
in ontology O and the attributes of the data sources within the set S. Various engines
have been developed for creating KGs, including RMLMapper [8], RocketRML [9],
SDM-RDFizer [10], and Morph-KGC [11]. These engines make use of the RDF
Mapping Language (RML) [12], which defines the structure of a KG in accordance
with the specifications of the Resource Description Framework (RDF).¹ However,
it is important to note that various parameters described by Chaves-Fraga et al. [6]
can also exert an influence on these KG creation processes, potentially limiting their
scalability.
Problem and Proposed Solution. The primary aim of this work is to address the
aforementioned challenges related to managing heterogeneous data sources during
KG creation. Our approach, KatanaG, is underpinned by an optimization principle
that emphasizes the execution of mapping rules on smaller input data fragments,
resulting in faster execution times and reduced memory consumption. This optimization
step is an integral component of the planner introduced by Iglesias et al. [13]
and has been empirically assessed within existing engines, including SDM-RDFizer,
Morph-KGC, and RMLMapper. The experimental findings from these evaluations
underscore a notable enhancement in the performance of these engines when data
fragmentation is employed. This highlights the role of planning in the pipelines for
KG creation, demonstrating the significant impact of this optimization strategy.
Contributions. This paper makes the following contributions:
(i) Data Management Techniques: We introduce data management techniques cen-
tered around data fragmentation, designed to enhance the scalability of KG
creation processes, especially when dealing with large and heterogeneous data
sources.
1 https://www.w3.org/TR/2004/REC-rdf-primer-20040210/.
Knowledge graphs (KGs) are directed edge-labeled graphs that model statements as
entities and their relationships as labeled edges [1]. The creation of a KG G can be
defined as a data integration system DIS_G = ⟨O, S, M⟩, where O is a set of classes
and properties of a unified ontology, S is a set of data sources, and M corresponds to
mapping rules or assertions that express the concepts established in O as conjunctive
queries over the sources in S. The execution of M over the data sources in S generates
the instances in G. When creating a KG, multiple factors affect the process in terms
of memory usage and execution time. Chaves-Fraga et al. [6] define these factors as
the size, heterogeneity, and number of duplicates in the raw data, and the complexity of
the mappings. For that reason, different approaches have been developed to handle these
factors. For example, planning the execution and partitioning the mappings help to
address the issues that may arise when transforming complex mappings, while data
fragmentation reduces the size of the data sources, thus reducing how much data is
processed.
Mapping Rule Languages. R2RML [14] is the W3C standard for defining mapping
rules from relational databases into Resource Description Framework (RDF)
KGs. R2RML mapping rules allow for the definition of (a) instances of a class C or
subject definitions, (b) values of the properties of C, and (c) instances of the predicates
that relate C with other classes. The RDF Mapping Language (RML) [12]
is an extension of the W3C-standard mapping language R2RML, enhancing it
with support for logical sources (referred to as rml:logicalSource) in various
heterogeneous formats, including CSV, relational, JSON, and XML. Similar
to the W3C-standard R2RML, a triples map in RML corresponds to mapping
rules that define subjects (referred to as rml:subjectMap) of an RDF class and
their properties (referred to as rr:predicateMap) with values (referred to as
rr:objectMap) sourced from logical data sources. The rr:objectMap can
be defined as a constant, as a reference to an attribute of the logical source, or as a
template, and it can also refer to the subjects of another triples map to express joins
between sources.
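These term-map semantics can be illustrated with a small stand-alone sketch that applies a simplified triples map (a subject template plus predicate/object maps) to tabular rows. The dictionary structure, helper name, and example IRIs below are illustrative, not any engine's API; the attribute and predicate names are taken from the running example of this paper.

```python
# Simplified illustration of RML triples-map semantics: a subject template
# plus predicate/object maps are applied to each row of a logical source.
# The map representation below is illustrative, not a real engine's API.

def apply_triples_map(rows, triples_map):
    """Yield one (subject, predicate, object) triple per predicate/object map
    and row, with the subject built from a template over the row's attributes."""
    for row in rows:
        subject = triples_map["subject_template"].format(**row)
        for predicate, attribute in triples_map["predicate_object_maps"]:
            if row.get(attribute) is not None:
                yield (subject, predicate, row[attribute])

# Attributes and predicates follow the running example (class ex:C1).
rows = [{"id": "1", "a_1": "x", "a_2": "y"}]
triples_map = {
    "subject_template": "http://example.org/C1/{id}",
    "predicate_object_maps": [("ex:p1", "a_1"), ("ex:p3", "a_2"), ("ex:p4", "id")],
}
triples = list(apply_triples_map(rows, triples_map))
```

Each row yields one triple per predicate/object map, all sharing the subject derived from the template.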
14 E. Iglesias et al.
Table 1 Motivating example. Results of executing the motivating example of
knowledge graph creation (a) without planning and (b) with planning; the rows
shown correspond to the engines combined with the planner

Engine                 Execution time   Memory usage
SDM-RDFizer+Planner    70.72 s          915.54
Morph-KGC+Planner      27.34 s          1240.78
RMLMapper+Planner      Timeout (1 h)    2812.00
We motivate our work with two real-world datasets (i.e., Dataset1 and Dataset2) that
include data about building occupancy; they correspond to two vertical fragments of
the records. The datasets (DS) have the following properties:
DS1: CSV file with a size of 19 MB with three columns: identifier, date, and hour.
DS2: CSV file with a size of 27 MB with four columns: identifier, zone, source, and
connections.
These datasets are collected in the context of the EU H2020 project PLATOON,2 and
maintain data about wind turbines. The PLATOON semantic data models are utilized
as the unified schema.3 Additionally, mapping rules of these datasets are defined in
terms of five RML triples maps partitioned into two groups. With the increase in
data generation, finding an efficient method of generating a KG from large data
sources has become necessary. For that reason, as a pre-processing method, the data
source is divided into smaller chunks, thus reducing the cost of generating the KG
and the overall burden on the system. Multiple state-of-the-art KG creation engines
have been developed, such as RMLMapper, RocketRML,4 SDM-RDFizer, and Morph-KGC.
Unfortunately, these tools present problems in terms of memory usage,
especially when handling large data sources (Table 1).
Table 2 reports on the results of the execution of the RML engines on these
datasets. As observed, Morph-KGC has the best execution time but presents high
2 https://platoon-project.eu/.
3 https://kgswc.org/industry-talk-1-semantic-data-models-construction-in-the-h2020-platoon-
project/.
4 https://github.com/semantifyit/RocketRML.
Table 2 Initial experimental results. The effect of KatanaG on state-of-the-art RML
engines: KatanaG reduces execution time

Engine        No data fragmentation (s)   KatanaG (s)   % Saving
SDM-RDFizer   50081.2                     9842.9        80.35%
Morph-KGC     3604.98                     911.26        74.72%

Table 3 Initial experimental results. The effect of KatanaG on state-of-the-art RML
engines: KatanaG reduces memory usage

Engine        No data fragmentation (MB)   KatanaG (MB)   % Saving
SDM-RDFizer   15648.16                     5124.96        67.25%
Morph-KGC     18777.79                     5365.51        71.43%
memory usage; this can be attributed to the fact that Morph-KGC uses the pandas
Python library, which is known to use a great deal of memory. While the SDM-
RDFizer does not have the best execution time, it presents better memory usage.
Afterward, to see if there is an improvement in the performance of the engines, the
RML-Planner is applied. The RML-Planner [13] assesses an optimized number of
partitions considering the number of data sources, type of mapping assertions, and
the associations between different triples maps. After providing a list of partitions
and triples maps that belong to each partition, the planner determines their execution
order. A greedy algorithm is implemented to generate the partitions’ bushy tree
execution plan. Bushy tree plans are translated into operating system commands that
guide the execution of the partitions of the mappings in the order indicated by the
bushy tree. Table 3 reports an improvement in the execution time but an increase in
memory usage. Unfortunately, RMLMapper cannot generate the KG in either case.
Thus, partitioning the mappings does increase the performance of the engines, but at
the cost of higher memory usage.
Another example that motivates this work is the Renewal Energy Resource dataset
(RES) [16], which contains data collected over almost 7 years from a solar array farm.
The data source, presented as a MySQL table, has 35,263,490 rows. Given the
size of the data source, it was divided into smaller chunks to test whether data
fragmentation is beneficial to the KG creation process. The table is partitioned into
36 smaller tables, where the first 35 tables have 1,000,000 rows each and the last
table contains the final 263,490 rows. Each partitioned data source is given to a KG
creation engine and transformed into its corresponding KG. Finally, the partitioned
KGs are combined into one large KG. The resulting KG contains 1,410,539,600
triples with a size of approximately 570 GB.
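The partitioning arithmetic above can be reproduced in a few lines; the SQL template and table name are illustrative assumptions, not the actual setup used in the experiments.

```python
import math

# Partitioning the RES table (35,263,490 rows) into chunks of 1,000,000 rows.
TOTAL_ROWS = 35_263_490
CHUNK_SIZE = 1_000_000

n_chunks = math.ceil(TOTAL_ROWS / CHUNK_SIZE)               # 36 partitions
last_chunk_rows = TOTAL_ROWS - (n_chunks - 1) * CHUNK_SIZE  # 263,490 rows

# One paginated query per partition (table name 'res' is hypothetical).
queries = [
    f"SELECT * FROM res LIMIT {CHUNK_SIZE} OFFSET {i * CHUNK_SIZE}"
    for i in range(n_chunks)
]
```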
When generating a KG, the characteristics of the input mapping and its corre-
sponding data source affect the creation process. As seen in Chaves-Fraga et al. [6],
the parameters that influence the KG creation are the size, duplicate rate, heterogene-
ity of the input data, and complexity of the input mapping. Therefore, partitioning the
data source is necessary since applying the transformation to the whole data source
is very expensive regarding memory usage and execution time. When executing the
RES dataset without partitioning the data source, it was observed that after 48 hours,
not only had the creation process not ended, but the first RML triples map had still
not been completely processed. The number of triples generated from the first
triples map is 176,317,450; this amount consumes much memory, slowing the overall
creation. Eventually, it would consume all available memory, stopping the process
itself. By partitioning the data source, the created KGs are much smaller; thus, the
required memory is much less as well. Additionally, combining the smaller KGs
as they are generated at the command-line level reduces the memory usage of KG
creation engines. Therefore, this work seeks to expand what was established with the
RML-Planner by focusing not on partitioning the mappings but on partitioning the
data source, to determine whether the performance of an engine can be further
increased without increasing the cost of execution.
This paper tackles the problem of reducing memory usage during the creation of
a KG G specified in terms of a data integration system DIS_G = ⟨O, S, M⟩. Our
solution resorts to data fragmentation strategies to transform DIS_G into an
equivalent data integration system DIS_New_G = ⟨O, S_New, M_New⟩, where the
data sources in S_New are horizontally and vertically fragmented and the mapping
rules in M are adjusted accordingly in M_New; these techniques are implemented
in KatanaG. A data source S_i in S is vertically fragmented according to each
triples map t_i for which S_i is the logical source. For each t_i, a new copy of S_i
is created that includes only the attributes used in the mapping rules of t_i; i.e., the
attributes that define the subject and properties of t_i are projected. Furthermore,
horizontal fragmentation is performed by partitioning S_i based on a given threshold
σ that indicates how many records will be included in each partition of S_i. The
triples map t_i is rewritten according to all these partitions. The resulting partitions
of S_i and the rewritten triples maps are included in S_New and M_New, respectively.
Figure 2 illustrates with our running example how KatanaG transforms data sources
and mapping rules from a data integration system. Figure 2a presents triples map
TriplesMap1 specified over S_1.csv; it defines instances of the class ex:C1
and the predicates ex:p1, ex:p3, and ex:p4. These predicates are, respectively,
expressed in terms of the attributes a_1, a_2, and id. Source S_1.csv comprises
not only these attributes but also a_3, a_4, and a_5. Therefore, KatanaG
is applied to project source S_1.csv to simplify its transformation. As seen
in Fig. 2a, vertical fragmentation is used to reduce source S_1.csv to only the
attributes used by TriplesMap1.
Fig. 2 KatanaG workflow: (a) original data source and mapping; (b) results of
applying vertical fragmentation. The figure shows how the data sources are
fragmented
This study aims to determine the impact of data fragmentation techniques imple-
mented in .Katana.G on the KG creation process. For that reason, the RML-compliant
engines RMLMapper, SDM-RDFizer, and Morph-KGC are used, and executed in
combination with .Katana.G. The empirical evaluation seeks to answer the following
research questions:
RQ1) How does data source fragmentation affect the performance of the state-of-
the-art RML-compliant engines during KG creation?
RQ2) What is the impact of the mapping assertions and volume of the data sources
on execution time and memory consumed by KG creation engines?
RQ3) What is the impact of data source fragmentation on the execution time of the
mapping assertions?
Benchmarks. The benchmark is built from real-world datasets created in the context
of the EU H2020 funded project PLATOON.5 The Renewal Energy Resource dataset
(RES) [16] comprises two mapping files containing five RML triples maps. One
of the mapping files consists of five mapping rules defining subjects and 20 mapping
rules defining properties using other triples maps. The other mapping file also has
five mapping rules defining subjects and 10 mapping rules defining properties. The
source used is a MySQL data table containing data measured from a solar array
farm over almost 7 years; it presents measurements regarding panel temperature,
insulation, etc., and indicates from which plant it was measured and a timestamp.
The data table has 35,263,490 rows (ca. 20 GB). Only 5,000,000 rows are used
instead of the full data source; this decision was made so that the experiments
could be executed in a reasonable time.
RML Engines. These are the KG creation engines used for the experiments:
RMLMapper v4.12,6 Morph-KGC v2.4.0,7 and SDM-RDFizer v4.6.7.8
Metrics. Execution time is the metric considered to determine the performance of the
RML engines. Execution time is the elapsed time required to fragment the raw data,
execute the partitions, and combine the smaller KGs into the intended KG. The
partitioned data sources are executed in parallel, and the execution of the partitions
accounts for most of the execution time. It is measured as the absolute wall-clock
system time, as reported by the time command of the Linux operating system.
The timeout is 15 hours. Memory usage determines the maximum memory used to
generate the KG. It was measured using the Python library tracemalloc; the results
are converted to megabytes for ease of understanding.
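A measurement of this kind can be sketched with Python's tracemalloc module; note that tracemalloc reports sizes in bytes, and the workload below is only a stand-in for the actual KG creation step.

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload; in the evaluation this would be the KG creation step.
materialized = [str(i) for i in range(100_000)]

current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()

peak_mb = peak / (1024 * 1024)  # report the maximum usage in megabytes
```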
5 https://platoon-project.eu/.
6 https://github.com/RMLio/rmlmapper-java.
7 https://github.com/morph-kgc/morph-kgc.
8 https://github.com/SDM-TIB/SDM-RDFizer.
5 Related Work
9 https://github.com/SDM-TIB/KatanaG.
As mentioned earlier in this work, the complexity of a triples map affects the per-
formance of the KG creation engine executing it. In other words, more complex
mappings require more resources to transform. Therefore, Mapping Partitioning
seeks to tackle the complexity of a mapping by dividing it into smaller and much
simpler ones. Different approaches have seen the benefits of using mapping parti-
tioning for KG creation. Iglesias et al. [13] apply mapping partitioning by grouping
triples maps by data sources and then generating new mapping files from the triples
maps grouping. An execution plan in the form of a bushy tree plan is defined by
determining which groupings have overlapping properties and whether joins exist
between them. The leaves represent the execution of the mappings, and the inner
nodes are union operators that combine the resulting RDF triples from the leaves.
Additionally, if there exists overlapping properties between the leaves, a duplicate
removal process is applied. Morph-KGC [11] also utilizes mapping partitioning but
in a different manner. Morph-KGC divides each triples map by the same number of
rr:predicateObjectMap that it has. For example, if a triples map has three
rr:predicateObjectMap, Morph-KGC partitions the triples into three smaller
triples maps, where each new triples map has one rr:predicateObjectMap
from the original mapping and its rr:subjectMap. SDM-RDFizer [10] does not
apply mapping partitioning, but it does create an execution plan for the triples maps
by determining which triples maps have overlapping predicates and then executing
first those with the highest overlap.
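Morph-KGC's partitioning scheme can be sketched as follows; the dictionary encoding of a triples map is illustrative, not the engine's internal representation.

```python
def partition_triples_map(triples_map):
    """Split a triples map into one smaller map per rr:predicateObjectMap,
    each keeping the original rr:subjectMap (Morph-KGC-style partitioning)."""
    return [
        {"subject_map": triples_map["subject_map"], "predicate_object_maps": [pom]}
        for pom in triples_map["predicate_object_maps"]
    ]

# A triples map with three predicate-object maps yields three partitions.
triples_map = {
    "subject_map": "http://example.org/C1/{id}",
    "predicate_object_maps": ["ex:p1", "ex:p3", "ex:p4"],
}
partitions = partition_triples_map(triples_map)
```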
Data fragmentation seeks to reduce a data source into various smaller data sources.
In the context of KG creation, smaller data sources take less time and memory to
transform into a KG. Multiple approaches have surfaced over the years that utilize
data fragmentation to improve the KG creation process. MapSDI [18] utilizes a
vertical fragmentation method, where the data source of a triples map is projected
in such a way that only the attributes the triples map needs remain, while duplicate
records are removed from the projected data source. SDM-RDFizer [10]
applies vertical fragmentation when transforming CSV files and relational databases
when only the table name is provided. Morph-KGC [11] presents a hybrid approach
to data fragmentation. Since Morph-KGC divides each triples map into smaller triples
maps, the original data source must be adapted to the new triples map. For that rea-
son, Morph-KGC applies vertical fragmentation to project the data source. Then,
it applies horizontal fragmentation by dividing the projected data sources into sub-
groups and transforming them. Unfortunately, Morph-KGC does not have criteria
for partitioning the data source, leaving it to the Python library pandas.
We have described KatanaG and illustrated the need to scale up the KG creation
process in the energy domain. Albeit initial, these results put into perspective the
effects of performing optimization techniques in state-of-the-art RML engines. In the
future, we aim to formalize this process further and substantiate our proposed meth-
ods’ effectiveness. Additionally, we are committed to creating benchmarks accessi-
ble to the scientific community, facilitating the assessment of the data management
techniques implemented by the community to scale up the process of KG creation.
Acknowledgements This work has been partially supported by the EU H2020 project PLATOON
and the Federal Ministry for Economic Affairs and Energy of Germany (BMWK) in the project
CoyPu (project number 01MK21007[A-L]). Maria-Esther Vidal has been supported by the project
TrustKG-Transforming Data in Trustable Insights with grant P99/2020.
References
13. Iglesias E, Jozashoori S, Vidal M (2023) Scaling up knowledge graph creation to large and
heterogeneous data sources. J Web Semant 75:100755. https://doi.org/10.1016/j.websem.2022.
100755
14. Das S, Sundara S, Cyganiak R (2012) R2RML: RDB to RDF mapping language. W3C recommendation. https://www.w3.org/TR/r2rml/
15. Özsu MT, Valduriez P (1999) Principles of distributed database systems. Second Edition.
Prentice-Hall
16. Janev V, Vidal ME, Pujić D, Popadić D, Iglesias E, Sakor A, Čampa A (2022) Responsible
knowledge management in energy data ecosystems. Energies 15(11). https://doi.org/10.3390/
en15113973
17. Maria P (2022) CARML: a pretty sweet RML engine. https://github.com/carml/carml
18. Jozashoori S, Vidal M (2019) MapSDI: a scaled-up semantic data integration framework for
knowledge graph creation. In: ODBASE
On the Potential of Sustainable Software
Solutions in Smart Manufacturing
Environments
Abstract Managing and using business data is considered a key success factor
of modern companies and enterprises. Application domains range from the medical
sector (cf. smart healthcare) to automated driving and digital manufacturing (cf.
smart manufacturing and German Industry 4.0). Internet of things (IoT) landscapes
are often built for data acquisition and process monitoring, in which many small dis-
tributed devices collect information about business-relevant processes. After data col-
lection, so-called data-driven applications are used, for example, to support planning
activities and strategic decisions, to optimize internal processes, or to help uncover
sources of error and thus reduce error rates during production. In the latter two cases,
the use of digital technologies also results in more sustainable resource utilization
of important raw materials. At the same time, digital technologies are responsible
for a significant share of energy consumption. Based on a country’s energy mix, ICT
applications are thus responsible for a non-negligible share of climate-damaging
emissions such as carbon dioxide (CO2). These emissions should be avoided
wherever possible, as they demonstrably contribute to climate change. In
our work, we highlight approaches to make ICT more sustainable. To this end, we
address, among other things, data validation in smart manufacturing and explain the
advantages and disadvantages of semantic methods.
S. Paasche
Automotive Electronics, Robert Bosch Elektronik GmbH, 38228 Salzgitter, Germany
S. Paasche (B) · S. Groppe
Institute of Information Systems, University of Lübeck, 23562 Luebeck, Germany
e-mail: [email protected]
S. Groppe
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 25
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_3
26 S. Paasche and S. Groppe
Fig. 1 The number of networked devices has been growing steadily for years and exceeds the
world’s population. The figure is taken from a keynote by Prof. Dr. Sven Groppe (University of
Lübeck)
1 Introduction
1 https://www.statista.com/statistics/871513/worldwide-data-created/.
2 https://yearbook.enerdata.net/electricity/electricity-domestic-consumption-data.html.
3 https://www.verivox.de/strom/themen/1-kilowattstunde/.
consumption of 23,653 TWh (year 2020), means over 5,000 TWh. Since energy
production is not yet climate-neutral, ICT is responsible for tons of climate-damaging
emissions, depending on the energy mix of a country.
Keeping these emissions as small as possible by reducing computing resources is
a major task we as computer scientists and software developers have to face during
the next years and decades.
Our work is structured as follows: Sect. 2 introduces general techniques and methods
to integrate sustainability in software projects. Afterward, we exemplify these
methods with a use case from the smart manufacturing area. In Sect. 4, we discuss the
main development decisions of our use case. Finally, we conclude our work in Sect. 5.
2 Green Computing
Fig. 4 Power in Watt (W) per transistor over the past 50 years
The next step is to determine what hardware we have or require. In this process,
we should also take into account the environmental costs that the production of
new hardware entails. In general, it can be seen from Fig. 3 that hardware is steadily
becoming more efficient. The number of transistors and thus the chip performance has
increased almost exponentially, whereas energy consumption has remained almost
constant over the past 20 years.
This trend is also reflected in the energy consumption per transistor (see
Fig. 4), which has fallen exponentially over the past decades.
The final step is to design the software in a sustainable manner. This includes the
selection of the programming language, the skillful integration of existing frame-
works, and the general software architecture and design patterns. The goal should
Our use case addresses data validation in manufacturing lines. Figure 5 illustrates
our problem. Smart machines continuously generate data during manufacturing. By
means of a data validator, we want to ensure that only consistent and error-free
datasets are stored, since only this clean data enables meaningful analyses. A tradeoff
between the cost to store valid data and the benefit of clean data arises [6]. In order
to address this tradeoff, we have made our initial consistency checker (CC) more
efficient step by step in an iterative development process.
Our FullCC uses ontologies to map our machine data into a semantic graph struc-
ture [3]. For this we use the Resource Description Framework6 (RDF). This graph
structure is traversed in the actual validation step by using SPARQL Protocol And
RDF Query Language7 (SPARQL) queries. The queries contain the characteristics
of the discrepancies we know about.
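The validation idea can be illustrated without the RDF and SPARQL machinery: machine data becomes (subject, predicate, object) statements, and a known discrepancy pattern is checked against them. In the actual FullCC this check is a SPARQL query over an RDF graph; the plain tuple set and names below are purely illustrative stand-ins.

```python
# Stand-in for the FullCC validation idea: a known discrepancy pattern
# (here, a dataset missing a required property) is checked against a set of
# (subject, predicate, object) statements.

def find_incomplete_subjects(triples, required_predicates):
    """Return subjects that lack at least one of the required predicates."""
    present = {(s, p) for s, p, _ in triples}
    subjects = {s for s, _, _ in triples}
    return sorted(
        s for s in subjects
        if any((s, p) not in present for p in required_predicates)
    )

triples = {
    ("msg1", "hasTimestamp", "t1"), ("msg1", "hasValue", "42"),
    ("msg2", "hasTimestamp", "t2"),            # inconsistent: no hasValue
}
inconsistent = find_incomplete_subjects(triples, ["hasTimestamp", "hasValue"])
```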
With our GreenCC [5], we have sacrificed accuracy and functionality by using
a heuristic approach (see Fig. 6). Our LightCC predicts the number of expected
messages and detects time frames in which inconsistencies are more likely. In these
cases, our FullCC can be activated to perform an exact check.
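A minimal sketch of the LightCC idea, assuming a fixed expected message count per time frame; the function name, counts, and threshold are illustrative.

```python
def flag_suspicious_frames(counts_per_frame, expected, tolerance=0):
    """Return indices of time frames whose message count deviates from the
    expected number (missing or duplicate messages); only these frames would
    trigger the exact FullCC check."""
    return [
        i for i, count in enumerate(counts_per_frame)
        if abs(count - expected) > tolerance
    ]

observed = [10, 10, 8, 10, 12]  # messages per time frame; 10 are expected
suspicious = flag_suspicious_frames(observed, expected=10)
```

Only the flagged frames are handed to the exact (and more expensive) check, which is where the energy saving comes from.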
As the results in Table 1 show, this step has a positive effect on the energy demand
Fig. 5 Tradeoff between costs for data collection and benefit of gathered data
6 https://www.w3.org/RDF/.
7 https://www.w3.org/TR/sparql11-query/.
Fig. 6 The LightCC monitors an incoming stream for missing and multiple messages. In time
frames with a higher likelihood of inconsistencies, it initiates an accurate check
Table 1 Energy consumption in kilowatt-hours (kWh) and daily operating costs for
a small, medium, and large manufacturing plant

Approach                        Small plant per day      Medium plant per day     Large plant per day
Flink (all)                     1.949 kWh (24.69 cent)   7.308 kWh (92.59 cent)   12.180 kWh (154.32 cent)
LightCC with Change Detection   1.229 kWh (15.57 cent)   4.608 kWh (58.38 cent)   7.680 kWh (97.31 cent)
FullCC (1&2)                    1.251 kWh (15.85 cent)   4.692 kWh (59.45 cent)   7.820 kWh (99.08 cent)
FullCC (all)                    1.261 kWh (15.97 cent)   4.728 kWh (59.90 cent)   7.880 kWh (99.84 cent)
Automaton (all)                 0.963 kWh (12.20 cent)   3.612 kWh (45.76 cent)   6.020 kWh (76.27 cent)
(LightCC vs. FullCC). The direct impact for a single plant is relatively small, but
scaled to the number of validators in use, the saving is significant.
In a subsequent extension, we have designed an automaton structure that enables
immediate validation and thus keeps the amount of cached data to a minimum [4].
In addition, validation of the content no longer takes place via SPARQL queries but
template matching. These adjustments enable further energy savings.
Overall, our extensions reduce the initial daily energy requirement in a
medium-sized plant from 7.308 kWh (Flink) to 3.612 kWh (Automaton).
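The relative saving for the medium plant follows directly from the values in Table 1:

```python
# Daily energy demand of the validator in a medium-sized plant (Table 1).
flink_kwh = 7.308      # initial Flink-based approach
automaton_kwh = 3.612  # automaton-based approach

saving_pct = 100 * (1 - automaton_kwh / flink_kwh)  # roughly half the energy
```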
4 Discussion
5 Conclusion
Acknowledgements This work has been supported by AE/MFT1 department of Robert Bosch
Elektronik GmbH.
References
Abstract In recent years, as part of the European Union’s initiatives to help combat
climate change and reduce greenhouse gas emissions, the Citizen Energy Commu-
nities (CEC) concept was promoted with a primary objective to enhance the self-
consumption of locally produced renewable energy. The integration of distributed
energy resources (DERs) requires the orchestration of tools and services on edge and
cloud levels. This paper describes an approach to establish and validate an SGAM-
compliant software platform with deployed data-driven services for holistic control
and energy dispatch optimization. The platform, developed by and deployed at the
Institute Mihajlo Pupin (IMP), has been tested for a CEC from Spain within the NEON
project framework. As part of future work, additional short-, mid-, and long-term
planning services will be integrated and tested using data from the IMP campus.
1 Introduction
In the past few years, particularly in Europe, a significant number of measures have
been taken to develop and validate future scenarios that target the “Net Zero CO2
Emissions by 2050” goals. According to the International Energy Agency, the energy
sector is responsible for around three-quarters of global greenhouse gas (GHG)
emissions [1] and hence the uptake of all the available technologies and emissions
reduction options is crucial for the implementation of the foreseen decarbonization
scenarios.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 33
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_4
34 V. Janev et al.
The focus of this paper is the electricity value chain. In the centralized system
(as it used to be in the twentieth century), electricity is produced through the gener-
ation system (see Fig. 1, part 1), transported through the transmission system (see
Fig. 1, part 2), and is distributed to the end users through the distribution system (see
Fig. 1, part 3). Nowadays, with solar and wind power on the rise and integrated with
consumption devices, there is a need for new equipment and monitoring and control
systems to make the whole power system operate flexibly. Smart Energy Management
(SEM) refers to a variety of novel concepts and technologies, serving both the energy
generation and consumption sides, such as energy efficiency, demand management,
Smart Grid, micro-grids, renewable energy sources, and other emerging solutions.
SEM tools are built upon advanced edge-cloud computing frameworks, Big Data
Analytics techniques, AI-driven methodologies, novel integration approaches based
on semantic technologies, and others. SEM solutions are deployed on the consumers’
side (buildings, districts) in order to achieve holistic optimization of the use of locally
distributed energy resources (wind, solar, EV charging stations, batteries), improve
the self-consumption, and lower the costs of electricity used from the grid. European
Union legislation refers to these initiatives as Energy Communities or Citizen Energy
Communities (CEC). CECs vary in size, configuration, and capacities in terms of
the renewable energy sources involved, as well as other devices deployed, including
energy storage batteries, energy consumption devices, and green hydrogen produc-
tion devices, among others. The primary objective shared by these initiatives is to
enhance the self-consumption of locally produced renewable energy.
This paper discusses the approach of building and deploying a software platform
that will enable and enhance monitoring and control of smart communities. It is
organized as follows. Section 2 explains the topic of smart communities; Sect. 3
presents the process of design and deployment of an SGAM-compliant platform
at the Institute Mihajlo Pupin and Sect. 4 discusses the approach for platform and
services validation.
In the last years, in Europe, there has been a notable increase in the number of citizen-
led energy initiatives focused on producing, distributing, and consuming energy
from renewable energy sources (RES) at a local level. Grid operators (distribution
and transmission) stand to gain advantages from the rise of citizen-led energy
initiatives, for instance, reduced maintenance and operation costs resulting from
improved grid stability and lower transmission losses, courtesy of the increased
hosting capacity for local renewable energy sources. However, in order to establish
a smart community, substantial involvement of end users and citizens is needed.
Service providers, which may include ICT companies specializing in integrating
various energy services, can also derive benefits from these initiatives. They may earn
service fees based on the contracted share of energy savings and receive payments for
providing unlocked flexibility and automated demand response (DR) mechanisms [2]
under Energy Performance Contracting (EPC) [3] and Pay-for-Performance (P4P)
arrangements [4] established with utilities. These initiatives often generate local jobs,
ranging from the installation and maintenance of renewable energy systems to the
development of innovative technologies and services [5]. Moreover, they encourage
entrepreneurship and foster a supportive ecosystem for local businesses, such as
renewable energy equipment suppliers, energy consultants, and energy efficiency
specialists.
Figure 1 illustrates an example of a control center established to integrate energy
services, supervise self-consumption, dispatch electricity in the smart community,
and control the export to the main grid. Examples of services that have to be deployed
in such centers are given in Table 1.
the platform to suit the specific needs and requirements of different CECs, while
maintaining stability, reliability, and consistency.
In Fig. 2, we present the platform architecture. Business Layer encompasses the
applications and dashboards that facilitate the management and visualization of data.
This layer focuses on providing user-friendly interfaces and tools, on one side for
RES Production sizing and planning, and on the other, for CEC monitoring and
control of electricity and financial data. The financial data is related to the business
arrangements and contracting mechanisms.
Function Layer constitutes a crucial aspect of the platform architecture, as it plays a
vital role in enabling the desired energy management capabilities and services within
CECs. Example services that are part of this layer are
• Self-consumption management tool,
• RES Production forecasting,
• Flexibility forecasting,
• Non-intrusive load monitoring,
• User energy efficiency benchmarker,
• Holistic energy dispatch optimization, and
• Flexible assets consumption dispatcher.
4 Platform Validation
The SGAM-compliant platform was validated within the EU project NEON (Next-
Generation Integrated Energy Services fOr Citizen Energy CommuNities) for the
POLÍGONO INDUSTRIAL LAS CABEZAS CEC (Spain) and will be validated in
OMEGA-X (Orchestrating an interoperable sovereign federated Multi-vector Energy
data space built on open standards and ready for GAia-X) project for the Institute
Mihajlo Pupin (IMP) R&D Campus.
To assess and measure the performance of the pilot sites during operation, it is
crucial to evaluate how the goals and objectives of the pilot sites are achieved. This
evaluation is carried out using scientific methodologies to provide accurate and reli-
able results. Key Performance Indicators (KPIs) provided a means to quantify different
metrics and gain insights into the specific and overall performance of the CECs. The
use of KPIs allowed for a standardized and systematic approach to measuring and
evaluating the effectiveness of the solutions. The identified KPIs were categorized
into several key areas:
• Energy Efficiency KPIs account for the optimization of users’ energy usage
through the exploitation of demand flexibility and energy efficiency of multi-
carrier opportunities. They focus on the benefits derived from the holistic
cooperative Demand Response (DR) strategy implemented within the CECs.
• The Economic KPIs evaluate the economic savings resulting from changes in
user behavior as a result of their engagement and energy usage following the
recommendations and services provided for the CECs and the platform.
• The Comfort KPIs assess the benefits experienced by end users in terms of
their indoor environment. They aim to measure the improvements in comfort levels
resulting from the implementation of energy efficiency services.
Technologies and Concepts for the Next-Generation Integrated Energy … 39
• User Engagement KPIs are designed to describe the behavior and interaction of
users with the CEC services and the platform. These KPIs provide insights into
the level of engagement and participation of users within the CEC ecosystem.
• The Social KPIs explore how the required levels of flexibility intersect with social
norms and everyday practices, such as routines and family life. They also consider
the effects of CECs on health and well-being, emphasizing the social impact of
energy services/solutions.
• Environmental KPIs evaluate the impact of NEON solutions on the local envi-
ronment, focusing on aspects such as carbon footprint reduction, greenhouse gas
emissions, and other environmental indicators.
• Technical KPIs evaluate different technical characteristics of the CEC services
and systems. These KPIs provide insights into the
performance, reliability, and functionality of the technical infrastructure.
By defining and measuring these diverse categories of KPIs, one can comprehen-
sively evaluate the performance and impact of the proposed solutions. This allows
for evidence-based decision-making, continuous improvement, and the refinement
of the platform and services to ensure optimal outcomes within the CECs.
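Two of the KPI categories above can be reduced to simple formulas for illustration. The function names and the baseline figures below are assumptions made for the example, not values or definitions taken from the NEON pilots.

```python
# Hedged illustration of how an Energy Efficiency KPI and an Economic KPI
# might be computed from baseline and measured data. Hypothetical figures.

def energy_savings_kpi(baseline_kwh, measured_kwh):
    """Energy Efficiency KPI: relative consumption reduction vs. baseline, in %."""
    return 100.0 * (baseline_kwh - measured_kwh) / baseline_kwh

def economic_savings_kpi(baseline_cost, measured_cost):
    """Economic KPI: cost savings after behavioral change, in currency units."""
    return baseline_cost - measured_cost

print(energy_savings_kpi(1200.0, 1080.0))  # a 10% reduction vs. baseline
print(economic_savings_kpi(500.0, 450.0))  # savings of 50 currency units
```

Comfort, engagement, social, environmental, and technical KPIs typically need survey data or system telemetry rather than a closed-form expression, which is why only the first two categories are sketched here.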
5 Discussion
The platform was designed, installed, and tested at the Institute Mihajlo Pupin
premises in the NEON project framework, and has been adopted for the forthcoming
activities in SINERGY [19] and OMEGA-X projects [20].
Example from Spain:
In the NEON framework, this installation serves as a crucial step in the develop-
ment and validation of the platform’s capabilities. During the testing phase, services
for energy dispatch optimization, demand, and production forecasting have been
put to the test. These services focus on optimizing the dispatch and distribution of
energy resources within the platform. By analyzing the available data and utilizing
advanced algorithms for production and demand forecasting and optimization, the
energy dispatch optimization service aims to maximize the efficiency and effective-
ness of energy distribution. The data utilized in the testing process is sourced from
Spanish CEC, providing a real-world context for evaluating the performance and
functionality of the platform. Overall, the installation of the platform at the Institute
premises and the subsequent testing using data from Spain represents a significant
milestone in the development and evaluation of the NEON project.
Example from Serbia:
Activities in SINERGY and OMEGA-X frameworks contribute to the refinement and
enhancement of the platform’s capabilities, ensuring its suitability for deployment
within Citizen Energy Communities (CECs) and promoting the efficient management
and utilization of renewable energy resources. The IMP team is looking for strategies
to (1) reduce emissions and optimize costs, by focusing on the installation of on-site
renewable electricity and storage solutions, as well as (2) methods for integration of
EV chargers. Thermal and electric storage solutions will complement the existing
installation to maximize the use of locally produced electricity. In this scenario,
combined district modeling with a prospective Serbian electricity
mix and hourly electricity prices has been used.
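The storage-plus-PV strategy described for the IMP campus can be illustrated with a simple greedy dispatch rule: charge storage from PV surplus and discharge it to cover deficits before exchanging with the grid. This is a toy sketch under assumed figures, not the actual SINERGY/OMEGA-X optimization models.

```python
# Greedy battery dispatch sketch: surplus charges the battery first,
# deficits discharge it first; the remainder goes to/from the main grid.
# Capacity and energy figures are hypothetical.

def greedy_storage_dispatch(production, load, capacity_kwh):
    soc, grid = 0.0, []          # state of charge and grid exchange log
    for p, l in zip(production, load):
        net = p - l
        if net >= 0:                           # surplus interval
            charge = min(net, capacity_kwh - soc)
            soc += charge
            grid.append(net - charge)          # positive = export
        else:                                  # deficit interval
            discharge = min(-net, soc)
            soc -= discharge
            grid.append(net + discharge)       # negative = import
    return grid, soc

grid, soc = greedy_storage_dispatch([5.0, 0.0], [2.0, 2.0], capacity_kwh=2.0)
```

A real deployment would optimize against the hourly price signal mentioned above rather than applying a fixed priority rule.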
6 Conclusion
Acknowledgements This work was supported by the EU H2020 funded projects SINERGY
(Capacity building in Smart and Innovative eNERGY management, GA No. 952140), NEON (Next-
Generation Integrated Energy Services fOr Citizen Energy CommuNities, GA No. 101033700); and
OMEGA-X (Orchestrating an interoperable sovereign federated Multi-vector Energy data space
built on open standards and ready for GAia-X, GA No. 101069287).
References
1. https://www.iea.org/reports/global-energy-and-climate-model/net-zero-emissions-by-2050-
scenario-nze
2. Jelić M, Batić M, Tomašević N (2021) Demand-side flexibility impact on prosumer energy
system planning. Energies 14(21):7076
3. Shang T, Zhang K, Liu P, Chen Z (2017) A review of energy performance contracting business
models: status and recommendation, Sustain. Cities Soc 34:203–210. https://doi.org/10.1016/
J.SCS.2017.06.018
4. Szinai J et al. (2017) Putting your money where your meter is: a study of pay-for-performance
energy efficiency programs in the United States
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 45
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_5
46 P. Chandre et al.
1 Introduction
2 Literature Survey
models, support vector machines, genetic algorithms, and particle swarm optimi-
sation. The application of computational intelligence methods to speech acoustics
analysis is also covered in the paper. These applications include speech recognition,
speaker identification, emotion recognition, and speech synthesis. The author offers
a thorough analysis of the literature in each of these fields, emphasising the benefits
and drawbacks of various approaches. The discussion of potential future paths for
computational intelligence research in speech acoustics analysis brings the paper to a
close. The author recommends that in order to increase the precision and robustness
of speech analysis systems, future research should concentrate on the development
of hybrid techniques that combine various computational intelligence techniques.
Overall, the paper provides a valuable resource for researchers and practitioners
working in the field of speech acoustics analysis, as well as for those interested
in the application of computational intelligence techniques in other areas of signal
processing.
The paper entitled “FastSpeech: Fast, Robust and Controllable Text to Speech” by
Yi Ren et al. [13] provides a detailed description of a new text-to-speech (TTS) system
that utilises a feed-forward transformer network to produce high-quality and control-
lable speech output. The significance of TTS technology in a variety of applications,
such as virtual assistants, audiobooks, and language learning, is discussed at the
outset of the paper. The author then discusses the drawbacks of current TTS systems,
including their slow processing speeds and absence of speech output control. The
suggested FastSpeech system’s design is covered in the following part of the paper.
This system is made up of three modules: an encoder, a duration predictor, and a mel-
spectrogram predictor. The duration predictor determines the length of each phoneme
in the input text, while the encoder turns the input text into a series of hidden
representations. A mel-spectrogram is produced by the mel-spectrogram predictor using
expected phoneme durations and hidden representations. The paper also discusses
a number of methods for enhancing the system’s performance and robustness, such
as data augmentation, teacher-student training, and a post-processing algorithm that
modifies the output speech’s pitch and speed. The author then offers a thorough anal-
ysis of the FastSpeech system, contrasting its effectiveness with several other TTS
systems that are already in use on different datasets. The findings demonstrate that
compared to current TTS systems, FastSpeech is quicker, more reliable, and offers
greater control over voice output. The paper ends with a discussion of the FastSpeech
system’s possible applications, which include customised TTS and speech synthesis
for low-resource languages. Overall, the paper provides a valuable contribution to the
field of TTS by presenting a new system that addresses several limitations of existing
TTS systems and achieves state-of-the-art performance on various datasets.
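The interaction between the duration predictor and the mel-spectrogram predictor described above hinges on expanding each phoneme's hidden vector by its predicted duration (what the FastSpeech paper calls length regulation). A minimal plain-Python sketch of that one step, with invented vectors and durations, not the authors' code:

```python
# Sketch of FastSpeech-style length regulation: each phoneme's hidden
# vector is repeated according to its predicted duration so the expanded
# sequence matches the mel-spectrogram's time axis. Toy data throughout.

def length_regulate(hidden_states, durations):
    """Expand phoneme-level states to frame-level states."""
    frames = []
    for h, d in zip(hidden_states, durations):
        frames.extend([h] * d)  # repeat the vector d times
    return frames

# Three phonemes with 2-dim hidden vectors; predicted durations 2, 1, 3.
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frames = length_regulate(hidden, [2, 1, 3])
print(len(frames))  # 6 frame-level vectors feed the mel-spectrogram predictor
```

In the real model both the hidden states and the duration prediction come from trained transformer modules; this only shows why the feed-forward architecture can generate all frames in parallel.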
The paper entitled “Choice of Voices: A Large-Scale Evaluation of Text-to-Speech
Voice Quality for Long-Form Content” by Julia Cambre et al. [14] provides a compre-
hensive evaluation of text-to-speech (TTS) systems for long-form content, specifi-
cally assessing the perceived quality of different TTS voices by human listeners. The
introduction to the article discusses the value of TTS technology for accessibility,
education, and entertainment, as well as how crucial voice quality is to a satis-
fying user experience. The author then discusses the shortcomings of current TTS
TTS systems for low-resource languages. The use of found data, such as text data,
speech data, and other kinds of data sources that can be used to create TTS systems,
is then covered. The study then looks at various methods for TTS synthesis using
data that has already been collected, such as rule-based systems, statistical para-
metric systems, and hybrid systems that combine both statistical and rule-based
approaches. The survey also discusses different acoustic modelling methods, such
as deep learning, neural networks, and hidden Markov models (HMMs). The study
then offers a thorough analysis of case studies in which data was discovered and
used to create TTS systems for low-resource languages. These case studies cover a
variety of languages, including Kiswahili, Wolof, and Yoruba.
The survey covers the methods used to create TTS systems for these languages
as well as the data sources used in each case study. The survey concludes by
discussing the present state of TTS synthesis using the data that was collected and
outlining potential future research areas. It highlights the significance of creating TTS
systems for languages with limited resources and the possibility of using discovered
data to do so. Overall, the survey is a useful resource for academics and professionals
working on TTS synthesis for languages with limited resources. It emphasises the
potential of using discovered data to create TTS systems for languages with limited
resources and can guide further study in this field.
3 System Methodology
A number of components would probably be present in the system design for the
comparative study of machine learning methods for podcast synthesis. There would
first be a dataset of written text that needed to be translated into audio, such as scripts
or transcripts of already-published podcasts. A natural language processing (NLP)
component would then be added, which would analyse the text and isolate important
elements like sentiment, tone, and structure. A speech synthesis component would
then be present, which produces audio output based on the text and NLP analysis.
This could involve different techniques, such as concatenative synthesis, parametric
synthesis, or neural TTS (text-to-speech) models. The assessment component would
then assess the output quality of the synthesised audio. This may include both objec-
tive measurements, like the word error rate and signal-to-noise ratio, and subjective
user input, like assessments of clarity and naturalness. For each of these compo-
nents, various machine learning techniques would be tested as part of the compara-
tive research, and their performance would be compared across various metrics. The
objective would be to establish the most efficient methods for podcast synthesis and
assess their applicability in the real world.
In this research paper, we conduct a comparative study of different machine
learning techniques for synthesising podcast audio from text. Our methodology
involves the following steps:
Data collection: We collect a diverse set of text data, including articles, blog posts,
and other written content, which we use as input for the podcast synthesis models.
Model training: We train multiple machine learning models using different tech-
niques, including natural language processing, speech synthesis, and deep learning.
Each model is trained on the same dataset and optimised for podcast audio synthesis.
Audio generation: We use the trained models to generate podcast audio from the text
data. We evaluate the quality of the synthesised audio using both objective metrics
(such as signal-to-noise ratio and word error rate) and subjective user feedback.
Comparative analysis: We compare the performance of the different machine
learning models, taking into account factors such as audio quality, computational
efficiency, and ease of use.
Result interpretation: We interpret the results of our comparative analysis, identi-
fying the strengths and weaknesses of each technique and providing recommenda-
tions for future research and development.
Overall, our methodology aims to provide a rigorous and comprehensive evalua-
tion of machine learning techniques for podcast synthesis, with the goal of advancing
the state of the art in this emerging field.
4 Discussions
Several potential gaps appear to remain in the
existing research:
Lack of comparison across a broader range of machine learning techniques:
While the research paper focuses on comparing several machine learning techniques
for podcast synthesis, there may be other techniques that were not considered. Future
research could explore a wider range of techniques to see if there are any that perform
better.
Limited consideration of non-English languages: It is not clear from the title
whether the research papers in question are only concerned with podcast synthesis
in English. If this is the case, then there may be a gap in the literature on podcast
synthesis in other languages. Future research could explore podcast synthesis in
other languages, particularly those with more complex grammatical structures or
tonal languages.
Lack of exploration of different types of podcasts: The synthesis of podcasts that
are based on written text seems to be the main focus of the study paper. Podcasts come
in a wide variety of formats, including news programs, story shows, and interview-
based shows. Future studies might examine whether specific podcast types are better
adapted to particular machine learning approaches.
From Text to Voice: A Comparative Study of Machine Learning … 51
Need for more evaluation metrics: The study paper may have used some metrics to
assess the effectiveness of various machine learning methods for podcast synthesis,
but additional evaluation metrics may be required. Researchers could, for instance,
take into account metrics for the podcast’s perceived quality or the naturalness of the
synthesised speech.
Lack of exploration of the impact of voice on listener engagement: There may
be a gap in the literature regarding how the voice used in the synthesis process
impacts listener engagement, even though the study article seems to be concen-
trated on the technical aspects of podcast synthesis. Future studies might examine
which synthesised speech voices or speaking patterns are more interesting to listeners
(Table 1).
The field of machine learning has seen significant advancements in recent years,
and as a result, we now have access to various techniques that can convert text to
speech. In this discussion, we will compare some of the most popular techniques
for podcast synthesis, which is the process of converting written text into an audio
podcast.
Rule-based systems: Rule-based systems are one of the earliest methods of audio
synthesis. In this method, text is translated into speech using a series of rules. The
guidelines may be founded on grammatical, syntactic, or other language norms. The
ability of this method to produce speech that sounds natural, however, is constrained
because it is challenging to take into consideration all the subtleties of spoken
language.
Concatenative synthesis: Concatenative synthesis, which stitches together previ-
ously recorded speech segments to produce new audio, is another method for creating
podcasts. This method can produce high-quality speech, but it only works if there is a
sizable library of recorded speech segments. This method can therefore be expensive
and time-consuming.
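The stitching idea behind concatenative synthesis can be shown in miniature: look up a prerecorded unit for each token and join the sample arrays end to end. The unit inventory below is invented toy data; real systems select among many candidate units per phone and smooth the joins.

```python
# Toy concatenative synthesis: join prerecorded segments end to end.
# UNIT_LIBRARY is a hypothetical inventory of recorded units (sample lists).

UNIT_LIBRARY = {
    "hel": [0.1, 0.2, 0.1],
    "lo":  [0.3, 0.2],
}

def concatenate_units(unit_names):
    audio = []
    for name in unit_names:
        audio.extend(UNIT_LIBRARY[name])  # stitch segments in order
    return audio

print(concatenate_units(["hel", "lo"]))
```

The library-size limitation noted above is visible even here: any unit absent from the inventory simply cannot be synthesized.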
Formant synthesis: In formant synthesis, the vocal tract’s size and shape are just two
examples of the acoustic factors that are used to create speech sounds. This method
can produce speech with a high degree of control and precision, but it necessitates a
thorough knowledge of speech acoustics and can be computationally costly.
Deep learning-based systems: Podcast synthesis has greatly benefited from deep
learning, which has revolutionised the field of machine learning. Without the aid
of pre-recorded speech segments or linguistic rules, deep learning-based algorithms
can learn to produce speech from text data. These systems can produce high-quality
audio that sounds natural because they use neural networks to understand the mapping
between text and speech. For podcast synthesis, a variety of methods are available,
each with benefits and drawbacks of their own. Traditional methods like rule-based
systems and concatenative synthesis have limitations when it comes to producing
speech that sounds realistic. High-quality speech can be produced using formant
Table 1 (continued)

Paper title | Methodology | Dataset | Evaluation metric | Key findings
"A Neural Parametric Singing Synthesizer", Blomberg et al. (2019) | Deep learning | Private dataset | MOS | Developed a singing synthesiser that can synthesise high-quality singing voices with natural vibrato and expressive dynamics
"Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", Ping et al. (2019) | Deep learning | VCTK and LibriSpeech datasets | MOS | Proposed a transfer learning approach to improve the performance of neural network-based TTS models in multispeaker scenarios
5 Conclusions
In order to produce audio content for podcasts, the process of podcast synthesis entails
translating text into speech. Machine learning techniques have produced encouraging
outcomes in this field in recent years. In this comparative research, we looked at
a variety of machine learning methods, including more established ones like rule-
based systems and cutting-edge ones like neural networks. Our research demonstrates
that the Tacotron 2 and Transformer models of neural network-based techniques
outperform conventional techniques in terms of audio quality and naturalness. These
models are able to produce audio that closely resembles a human speaker and
capture the subtleties of human speech. These sophisticated models do, however, have
some drawbacks, such as the requirement for substantial computational resources
and a big quantity of high-quality training data. Furthermore, the outcomes of these
models can differ based on the particular topic and language being employed. In
general, machine learning methods present a promising route for podcast synthesis,
with neural network-based models demonstrating the greatest potential for high-
quality audio production. To address the shortcomings and enhance the performance
of these models in real-world situations, additional research is required.
References
1. Hansen GC, Falkenbach KH, Yaghmai I (1988) Voice recognition system. Radiology
169(2):580. https://doi.org/10.1148/radiology.169.2.3175016
2. Chandre PR, Mahalle PN, Shinde GR (2018) Machine learning based novel approach for
intrusion detection and prevention system: a tool based verification. In: 2018 IEEE global
conference on wireless computing and networking (GCWCN), pp 135–140. https://doi.org/10.
1109/GCWCN.2018.8668618
3. Skouby KE, Williams I, Gyamfi A (2019) Handbook on ICT in developing countries: next
generation ICT technologies
4. Isewon I, Oyelade J, Oladipupo O (2014) Design and implementation of text to speech conver-
sion for visually impaired people. Int. J. Appl. Inf. Syst. 7(2):25–30. https://doi.org/10.5120/
ijais14-451143
5. Raul S (2022) Review paper on SPEECH TO TEXT USING. 9(5):615–620
6. Chandre PR (2021) Intrusion prevention framework for WSN using deep CNN. 12(6):3567–
3572
7. Chandre P, Mahalle P, Shinde G (2022) Intrusion prevention system using convolutional neural
network for wireless sensor network. IAES Int J Artif Intell 11(2):504–515. https://doi.org/10.
11591/ijai.v11.i2.pp504-515
8. Patil VH, Dey N, Mahalle PN (2020) Lecture notes in networks and systems 169 proceeding
of first doctoral symposium on natural computing research
9. Luo OX (2019) Deep learning for speech enhancement. Degree project in the field of
technology
10. Yasir M, Nababan MNK, Laia Y, Purba W, Robin, Gea A (2019) Web-based automation speech-
to-text application using audio recording for meeting speech. J Phys Conf Ser 1230(1):2019.
https://doi.org/10.1088/1742-6596/1230/1/012081
11. Ext ENDT, Peech TOS, Ren Y (2020) AND, pp 1–15
12. Singh A, Kaur N, Kukreja V, Kadyan V, Kumar M (2022) Computational intelligence in
processing of speech acoustics: a survey. Complex Intell Syst 8(3):2623–2661. https://doi.
org/10.1007/s40747-022-00665-1
13. Ren Y, Tan X (2019) FastSpeech: fast, robust and controllable text to speech. In: NeurIPS.
arXiv:1905.09263
14. Cambre J, Colnago J, Tsai J (2020) Choice of voices: a large-scale evaluation of text-to-speech
voice quality for long-form content, pp 1–13. https://doi.org/10.1145/3313831.3376789
15. Bhangale K, Kothandaraman M (2022) Introduction
16. Cooper E (2019) Text-to-speech synthesis using found data for low-resource languages
17. Dhotre D, Pankaj R Chandre, Anand Khandare, Megharani Patil, and Gopal S Gawande (2023)
The rise of crypto malware: leveraging machine learning techniques to understand the evolution,
impact, and detection of cryptocurrency-related threats. Int J Recent Innovat Trends Comput
Commun 11(7):215–22. https://ijritcc.org/index.php/ijritcc/article/view/7848
18. Makubhai S, Pathak GR, Chandre PR (2023) Prevention in healthcare: an explainable AI
approach. Int J Recent Innov Trends Computing Commun 11(5):92–100. https://doi.org/10.
17762/ijritcc.v11i5.6582
19. Chandre P, Vanarote V, Kuri M, Uttarkar A, Dhore A, Pathan S, Elahi DDM, Cremonesi P
(2016) Using visual features and latent factors for movie recommendation. CEUR Workshop
Proc 1673:15–18
Artificial Intelligence and Legal Practice:
Jurisprudential Foundations
for Analyzing Legal Text and Predicting
Outcomes
Abstract In recent years, tremendous progress has been made in the use of AI
in the legal profession, revolutionizing the way attorneys evaluate legal texts and
foretell case outcomes. This study examines the jurisprudential underpinnings that
support AI-driven strategies in the legal field, concentrating on the application of
AI tools for predictive analytics and legal text analysis. The paper emphasizes how
these technologies make it easier for attorneys to efficiently explore complicated
legal texts by facilitating the extraction and interpretation of legal information from
huge textual collections. The next section demonstrates how AI algorithms may be
created to mimic human-like legal reasoning processes by making links between
legal reasoning, legislation interpretation, and case law analysis. The need for a
cooperative connection between AI systems and human attorneys is emphasized as
the potential for AI to supplement rather than replace legal competence is taken into
account.
1 Introduction
I. Walia (B)
Rajiv Gandhi National University of Law, Patiala, Punjab, India
e-mail: [email protected]
N. S. Nautiyal
School of Law, Forensic Justice and Policy Studies, National Forensic Sciences University,
Gandhinagar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 57
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_6
58 I. Walia and N. S. Nautiyal
superiors in the race. Artificial intelligence has transgressed the boundaries of almost
every domain. Artificial intelligence and law have a special relation as they can be
used to facilitate the legal processes and enhance the legal reasoning component. At
present, various courts have introduced the software run by artificial intelligence algo-
rithms for performing case research, analyzing information, maintaining database,
etc. Programs like Watson, the open-source text analysis tools have been the founda-
tional basis for perceiving the idea of developing legal analytic models and tools [1].
Though open-source text analysis tools may be used for analyzing legal text
and context, they largely lack the component of legal reasoning. The technicians
are now discussing computational mechanisms where the information analysis will
move towards conceptual analysis to draft arguments from both perspectives i.e.,
for and against. Hence, the whole idea is to build a Computational Model of Legal
Reasoning, to ensure a credible and rational administration of justice. Developing
a reliable computational model will help us predict answers to legal disputes based
on algorithmic legal reasoning [2]. In many sectors, artificial intelligence (AI) has
become a disruptive force. The legal profession is no exception. This essay exam-
ines the relationship between AI and the practice of law, emphasizing how it affects
crucial processes including legal research, document review, contract analysis, and
legal judgment.
The first section of this article gives an introduction to the AI technologies, such
as machine learning, natural language processing, and expert systems, that are often
used in the legal industry. It explores the potential of AI systems, demonstrating how
they may effectively handle enormous volumes of legal data, helping attorneys to
swiftly access pertinent information and make better conclusions. The benefits AI
provides to legal research are the main topic of the second section. To extract relevant
precedents and spot trends, AI-powered technologies may search through vast legal
databases, case law, and historical records. This helps legal practitioners save time
while also improving the precision and thoroughness of their investigation. The
article also looks at how AI affects document review and e-discovery procedures.
AI-driven algorithms are incredibly good at locating pertinent documents, which
eases the workload for attorneys during litigation and due diligence. The topic of
potential difficulties and moral questions raised by utilizing AI in these situations is
also discussed.
The research also explores the possibilities of AI in contract analysis. AI solu-
tions may assist attorneys in identifying possible hazards, ensuring compliance, and
streamlining contract administration since they have the capacity to analyze and
comprehend complicated contractual language. Additionally, the use of AI in predic-
tive analytics and legal decision-making is investigated. In order to predict case
outcomes and offer useful insights for both attorneys and clients, AI models may
analyze previous court judgments and other legal data. The study does, however,
recognize that employing predictions made by AI in court procedures requires open-
ness and responsibility. Last but not least, the article discusses the ethical issues and
worries related to the use of AI in the legal field, including bias in AI algorithms and
the effect on job displacement for legal practitioners.
that exist in different parts of the world are referred to as Argument Retrievals and
Cognitive computing. Most of the law firms are using these models and some of the
websites also make these models available for the public [9]. These models extract
semantic information from the legal documents and provide the user with legal advice
that they can refer to while dealing with their legal dispute. Another common term
in the field of legal analytics is that of conceptual legal information retrieval, where
the matching concepts and information concerning a legal provision can be readily
found by the user by making a simple search. The cognitive computing aspect is
an advanced feature that offers a customization facility. The users may
customize their legal research to get a summary of information with highlighted text
to draw the user's attention to the relevant text [10]. Though cognitive computing
is an advanced technology, it is still not an expert technology. In expert technologies,
law personnel curate the kind of information they would want to retrieve by manually
selecting the input information and delivering it to the system in a way that responds
as desired [11].
While we talk about incorporating the use of computational techniques into legal
practice, we must understand the role and reference of commercial and institutional
frameworks that already exist. These commercial and institutional approaches have
already been in place for the purpose of whole text retrievals, indexing, referencing,
and search facilities. To name a few such facilities, Westlaw, and LexisNexis have
been widely used by students and practitioners. These applications and platforms
have maintained wealthy databases of text and literature and the way they are regu-
larly updated is also worth appreciation. These systems already indicate the inter-
face between law and artificial intelligence. But as discussed earlier, these intelligent
technologies can find, analyze, and retrieve information but cannot provide legal
reasoning [12].
In furtherance of the concept of legal analytics comes the context of text mining.
Text mining, in general terms, can be understood as the examination of unstructured
data sets to extract relevant information patterns from textual sources.
Text mining may appear to be a single activity, but in reality it is a combination
of several tasks such as retrieval, data extraction, and machine learning. The text
mining process involves the accumulation of unstructured data from varied sources.
The data once collected is required to be cleaned from ambiguities and anomalies.
Application of text mining tools and applications forms the corpus of the process.
Deploying the management information systems allows for data pattern development.
The most important of all is the storage of such data for further analysis and timely
references. The techniques, characteristics, and tools are listed in Table 1, see also
Abhinav Rai, What is Text Mining: Techniques and Applications, available at: https://
www.upgrad.com/blog/what-is-text-mining-techniques-and-applications/.
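The pipeline just described (accumulate, clean, mine patterns, store) can be sketched in a few lines. The corpus, the helper names, and the frequency-based notion of a "pattern" are illustrative assumptions, not part of the original text.

```python
import re
from collections import Counter

def clean(text):
    """Cleaning step: remove case and punctuation so surface variants line up."""
    return re.sub(r"[^a-z\s]", " ", text.lower())

def mine(documents, min_count=2):
    """Pattern step: treat terms that recur across the corpus as candidate patterns."""
    counts = Counter()
    for doc in documents:
        counts.update(clean(doc).split())
    return {term: n for term, n in counts.items() if n >= min_count}

# Hypothetical two-document corpus of unstructured legal text.
corpus = [
    "Section 12: the tenant SHALL pay rent monthly.",
    "The tenant shall maintain the premises.",
]
patterns = mine(corpus)
print(patterns["tenant"])  # 2 — 'tenant' recurs across both documents
```

The stored `patterns` dictionary stands in for the "storage for further analysis" step; a real system would persist it in a database or index.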
62 I. Walia and N. S. Nautiyal
It is pertinent to examine the functioning of existing legal information retrieval
systems before turning to more advanced technological approaches. The moment the
user raises a query through search, the system retrieves documents from
systematically indexed databases, assesses and measures them for relevance to the
query, and positions the output in a ranked list. Information retrieval systems rest
on three foundations: the Boolean, Vector, and Probabilistic models. The first
retrieves information on the basis of the proximity between the terms searched by
the user and those found in the documents. The Vector model is based on a search
for a collection of terms, which may or may not be systematically placed; no
preference is given to the sequence of words entered in the search tab. Lastly, the
probabilistic approach provides for exhaustive research close to what the user
intends to find, retrieving information from documents after considering words,
meanings, definitions, synonyms, and so on [13], see Fig. 1.
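The Vector model just described can be sketched with term-frequency vectors and cosine similarity. The documents and function names are hypothetical, and real systems weight terms (e.g., with TF-IDF) rather than using raw counts.

```python
import math
from collections import Counter

def tf_vector(text):
    """A bag-of-words vector: word order is deliberately ignored."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, documents):
    """Rank documents by cosine similarity to the query, best first."""
    q = tf_vector(query)
    return sorted(documents, key=lambda d: cosine(q, tf_vector(d)), reverse=True)

docs = [
    "breach of contract damages",
    "criminal sentencing guidelines",
    "contract formation and consideration",
]
results = search("contract damages", docs)
print(results[0])  # the breach-of-contract document ranks highest
```

Note that "damages contract" would rank the documents identically, illustrating the point that the Vector model gives no preference to word sequencing.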
This figure explains the mechanism for the retrieval of legal information. The
illustrative figure offers a thorough overview of the multiple systems that support the
retrieval process in the complex world of legal information retrieval. While assessing
the functioning of machine learning systems used for information retrieval, one must
cross-check the following points [14]:
a. Predictability of relevant and irrelevant documents;
b. Number of retrieved documents that were relevant, to check precision;
c. Circulation and previous searches made pertaining to that document;
d. Citation scores;
e. Evaluation on the basis of data, for example, at lexical and legal levels;
f. Production of relevant results from concentrated data sets, as desired by the user;
g. Segregation of terms reflecting the same meaning or connotations;
h. Allowance for manual editing.
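Point (b) above corresponds to the standard precision measure, with recall as its natural counterpart. A minimal sketch, assuming hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant (point b);
    recall: fraction of relevant documents that were actually retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Of four retrieved documents, two are among the three relevant ones.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"], relevant=["d2", "d4", "d7"])
print(p, r)  # 0.5 and roughly 0.667
```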
Artificial Intelligence and Legal Practice: Jurisprudential Foundations … 63
When machine learning algorithms retrieve data from the available text, they
identify a pattern in the data and use the same pattern to identify similar data in other
sets of information. The algorithm analyzes the features required for the commission
of a wrong or an offense, analyzes the situation, applies the text, and predicts an
outcome [15]. The decision can be reached by the algorithm on the basis of existing
jurisprudence, logic, or statistics [16]. The recognition of patterns for resolving
human issues is the basic aim of employing machine learning techniques. Machine
learning can be applied to legal text in a series of steps. First, collect and process the
raw data, which is available as legal text in some form; this text can be downloaded
one document at a time or in bulk, depending on availability and authorization.
Second, normalize the text using language processing tools to maintain uniformity.
Third, once language processing is done, build feature vectors, which capture the
length of the document and additional information for better classification and
prediction. Fourth, classify sentences on the basis of how strongly they support a
conclusion. Lastly, a cross-validation procedure may be used on a small set of
information to test the workings. Machine learning uses predictive coding to check
the relevancy and responsiveness of documents. Thus, it can be said that while
extracting information from statutory or legal text, one may emphasize representing
statutory provisions in a classified format, selecting an applicable algorithm, and
managing the data sorted and provided [17].
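The steps above can be illustrated with a deliberately tiny sketch: normalization, feature vectors that include document length, and a nearest-centroid classifier standing in for whatever learning algorithm is actually chosen. The sentences, labels, and helper names are invented for illustration only.

```python
import re
from collections import Counter

def normalize(text):
    """Step 2: normalize raw legal text (lowercase, strip punctuation)."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()

def featurize(tokens):
    """Step 3: a feature vector of term counts plus document length."""
    feats = Counter(tokens)
    feats["__length__"] = len(tokens)
    return feats

def train(labelled):
    """Step 4 stand-in: aggregate one centroid vector per label."""
    centroids = {}
    for text, label in labelled:
        centroids.setdefault(label, Counter()).update(featurize(normalize(text)))
    return centroids

def classify(text, centroids):
    """Assign the label whose centroid overlaps the query vector most."""
    q = featurize(normalize(text))
    overlap = lambda c: sum(min(q[t], c[t]) for t in q)
    return max(centroids, key=lambda label: overlap(centroids[label]))

data = [
    ("The tenant shall pay rent on the first of each month", "obligation"),
    ("The landlord may inspect the premises with notice", "permission"),
]
model = train(data)
predicted = classify("rent shall be paid by the tenant", model)
print(predicted)  # "obligation" — it shares more terms with the first example
```

Cross-validation (the last step) would repeat this train/classify loop while holding out a slice of the labelled data each time.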
Machine learning may also be used for retrieving information from case law or
decisions. Extraction from legal decisions is likely to yield argument-based
information rather than legal provisions, so this kind of search may focus on the
arguments and sentences delivered in a court setting. Artificial intelligence facilitates
labeling information as 'upheld' or 'overruled' for quick reference. The analysis taken
up by intelligent algorithms may extract information about the facts of the case, its
history, the arguments, the ratio and obiter, and the orders and decisions [18]. Once
the information about a legal issue is explored, it helps by providing an explanation
of the legal principle used or the reasoning behind the judgment. In using or
preparing a program for extracting information from case law, the computer scientist
must consider: first, whether the evidence produced in a case justifies the decision or
conclusion; second, whether a sentence states a legal rule, irrespective of its impact
on the final decision; and lastly, whether a sentence is a citation sentence, identifying
references made to statutes, regulations, documents, and writings while reaching a
decision [19].
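Identifying citation sentences, the third consideration above, can be approximated with pattern matching. The citation shapes below are hypothetical simplifications of real citation styles, and the sample decision is invented.

```python
import re

# Hypothetical, simplified citation shapes: 'Name v. Name' case citations
# and 'Section N' statutory references.
CITATION = re.compile(r"([A-Z][a-z]+ v\. [A-Z][a-z]+|Section \d+)")

def citation_sentences(decision):
    """Return the sentences that cite a case or a statutory provision."""
    # Split on sentence enders, but not on the '.' inside 'v.'
    sentences = re.split(r"(?<=[.;])(?<!v\.)\s+", decision)
    return [s for s in sentences if CITATION.search(s)]

decision = (
    "The appeal raises two issues. "
    "Counsel relied on Smith v. Jones; "
    "the trial court applied Section 42 correctly."
)
cites = citation_sentences(decision)
print(cites)  # the two citing sentences, without the opening sentence
```

Production systems use far richer citation grammars, but the shape of the task — segment, then label each sentence — is the same.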
Tools and applications that function on principles of cognitive computation already
exist. These applications systematize citations according to the hierarchical structure
of the courts and also maintain a chronology. Applications such as ROSS and Lex
Machina are helpful in answering legal questions and in predicting legal decisions.
Specifically, Lex Machina analyzes the previous decisions of a judge to anticipate
the likely legal decision in the dispute presented before that judge. The emergence
and application of these computational tools will strengthen the interface between
the legal text corpus and human understanding. This may sound like a
commercialization of legal practice, but it would ensure a step toward speedier
justice. A whole set of confirming and contesting hypotheses produced by intelligent
machines would change perspectives on human–machine interactions [20].
In general, for any legal practitioner there are two most relevant models: one is
case-based reasoning, which depends on precedents for connecting the context with
the case in hand; the other is adversarial reasoning, which enables the building of
relevant and assertive arguments for both sides. To master the art of legal reasoning
one must focus on precedents, the ability to structure unrefined information, the
management of exceptions, the resolution of conflicts in laws and rules, and the
ability to argue and justify a stand. When we consider the role of artificial
intelligence in the legal reasoning process, immediate attention goes to case-based
reasoning, expert opinions and rules, logic, language processing models, creative
and critical thinking, and illustrations. When artificial intelligence works on the
reasoning model, it strengthens legal reasoning in terms of citations and indexing,
comparison of cases, evaluation of arguments, and the connecting and relative
factors for drawing analogies and describing hypothetical situations. The primary
work on artificial intelligence in matters of legal interpretation is McCarty's
TAXMAN II project. For McCarty, legal interpretation is basically theory
construction. The two most famous systems that infuse artificial intelligence with
legal reasoning are HYPO and CATO. These models assist lawyers in making use
of past decisions while constructing arguments. The HYPO system formulates a
dispute between two parties in reference to a legal claim; the practice makes use
of analogies and references to precedents, and the rules of this model prepare
both parties as to which cases will be most relevant to cite. In the other model,
CATO, a set of favorable and unfavorable factors is provided, and an assessment of
the decision depends on a proper evaluation of the competing factors. The HYPO
system was embedded in the CABARET system, which made use of precedents and
their relevance in the application of rules. The HYPO model was also used along
with the CATO system to teach case-based argument skills to students of law. After
the HYPO and CATO models, the GREBE system produced the most extensive
jurisprudence on industrial injury through the use of semantic networks [21].
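A toy sketch of CATO-style factor comparison: precedents are ranked by how many factors they share with the case at hand. The factor names and case names are invented, and real HYPO/CATO reasoning is far richer than simple set overlap.

```python
def shared_factors(precedent_factors, current_factors):
    """Relevance proxy: how many factors a precedent shares with the case."""
    return len(precedent_factors & current_factors)

def best_precedent(current, precedents):
    """The case most relevant to cite, by factor overlap."""
    return max(precedents, key=lambda name: shared_factors(precedents[name], current))

# Invented trade-secret-style factors in the spirit of HYPO/CATO.
precedents = {
    "Case A": {"disclosure-in-negotiations", "security-measures"},
    "Case B": {"bribed-employee", "identical-products"},
}
current = {"security-measures", "identical-products", "disclosure-in-negotiations"}
print(best_precedent(current, precedents))  # "Case A" shares two factors
```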
Building a model for legal reasoning with technology at its base may not be a
very difficult task to accomplish. What is expected is the precise and clear input of
a situation, followed by the determination of legal rules, which would further result
in the predictability of a decision based on the acceptance or rejection of a legal
rule. Though the process does not seem complicated, it may pose challenges because
of ambiguity and indeterminacy [22]. The application of a statute and statutory
interpretation are two different aspects that trigger arguments. Legal indeterminacy
is a subset of legal reasoning: despite agreeing with the facts stated and the rules
quoted, a lawyer may still rebut the argument with logical reasoning [23].
While modeling statutory legal reasoning, two kinds of ambiguity are possible,
viz., semantic ambiguity and syntactic ambiguity [24]. The first refers to concepts
or terminology that are not clearly defined; the latter concerns the connective terms
used by legislatures, such as 'if', 'and', and 'whether', which on interpretation may
give rise to a complex set of arguments [25]. Semantic ambiguity is basically a
condition of vagueness and uncertainty in establishing what a legal term does or
does not mean. On this view, the language of a legal text cannot be designed to
indicate one specific proposition clearly; it will always encompass terms that may
need to be interpreted by the courts. Sometimes the language in a legal text is
deliberately kept open-ended to attain political consensus. Syntactic ambiguity
arises from the imperfect logical structure of a legal statute, which resonates with
natural language processes: the syntax used in a legal text can open a whole new set
of interpretations and lead to multiple, diverse arguments whose impact on decisions
depends on the convincing power of the arguments. The syntactic issue can be
resolved by normalizing the text before initiating an algorithmic process. After
normalization, a statute can be made available in a logical format, which helps
clarify the syntax and streamline interpretation [26]. A propositional logical format
will then be useful in making the statute compatible with an artificial intelligence
algorithm. Once this sorting is done, reformulating the text for machine language
purposes becomes the priority. Reformulations and negations help in substituting
more legally relevant arguments and replacing less-researched aspects of the same
dispute. This leads further to the concept of default reasoning, where earlier
decisions can be modified or overruled on the basis of newly identified legislative
determinants that have the power to change the decisions made so far. Artificial
intelligence algorithms may also be confused by open, subjective terms such as
'beyond reasonable doubt', 'preponderance of probability', or 'person of good
character' [27]. Thus, statutory interpretations must be clearly fed into the algorithms
for such applications to function and produce decisions with legal reasoning. The
system must accept a clear interpretation of a legal rule unless there is a need to add
further rules to develop sufficient reasoning; once the reasoning is developed, it can
be supported by provisions and cross-references [28].
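Normalizing a provision into propositional form, as suggested above, can be illustrated with a hypothetical rule; the provision and the predicate names are invented for the example.

```python
# Hypothetical provision, normalized into propositional form:
#   liable := driver AND (speeding OR negligent) AND NOT emergency
def liable(facts):
    """Evaluate the normalized rule against a dictionary of established facts."""
    return (facts["driver"]
            and (facts["speeding"] or facts["negligent"])
            and not facts["emergency"])

facts = {"driver": True, "speeding": True, "negligent": False, "emergency": False}
print(liable(facts))                          # True: the conditions are met
print(liable({**facts, "emergency": True}))   # False: the exception defeats it
```

The explicit AND/OR/NOT structure is exactly what removes the syntactic ambiguity of the natural-language statute: there is only one reading of the normalized rule.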
Different computational and reasoning models can be both created and followed.
Case-based legal reasoning requires a methodology that represents adequate
understanding and knowledge of the facts of the case and its similarities with
previously given judgments, or with reasoning in similar sets of situations [29].
Three kinds of legal computational models have been suggested for this purpose,
viz., prototypes and deformations, dimensions, and legal factors. The prototypes and
deformations model emphasizes legal arguments that are in synchronization with
the principles and concepts stated in similar or relevant cases. The second model, of
dimensions and legal factors, represents techniques that enable algorithms to
compare and draw analogies with existing case law repositories; it is more exhaustive
than the previous model and selects positive instances while rejecting the negative
[30]. Thus, legal outcomes and their prediction are based on analysis of legal text,
case-based analysis, or specialized computations made by artificial intelligence-
driven algorithms. The models discussed above make an algorithm analyze a variety
of information about existing legal provisions, similar cases, behavioral patterns of
judges, the history of such cases, fact patterns, claims and arguments, and so on.
Artificial intelligence uses feature frequencies to evaluate information and draw a
correspondence between case characteristics and predictable outcomes [31].
The interface of artificial intelligence and law can be put to optimum use through
supervised machine learning techniques. Algorithms that become smarter by self-
learning may produce predictions based on statistical means; a machine learning
algorithm becomes supervised when it performs classification rather than mere
labeling. Legal reasoning can become crucial to judicial processes and may even
reduce the uncertainty, disproportionality, and over-discretionary aspects of
sentencing. Legal outcomes and decisions will not only provide statistical support
for making uniform and certain decisions but will also help eradicate personal
biases. The complex formulations and computations made by analyzing legal text,
and the predictions based on case law, will contribute to developing legal literature
and to policies for the efficient and reliable delivery of justice. The interlock between
law and artificial intelligence may deal chiefly with legal reasoning and outcomes,
but it would also be a major trigger to systematize and prioritize the collection of
data, the placement of legal principles, and the critical evaluation of arguments in
reference to statutory interpretation. Though this paper has focused more on
outcomes, the collateral benefits cannot be ignored. The legal profession has
tremendous potential to change as a result of the use of artificial intelligence (AI) in
legal practice, particularly in the study of legal language and the prediction of case
outcomes. The researchers have examined the jurisprudential underpinnings of
AI-driven techniques in the legal arena throughout this study, and we have seen
firsthand how AI technologies have the potential to transform the work of legal
practitioners.
One such natural language processing model is Bidirectional Encoder
Representations from Transformers, popularly known as BERT. Unlike directional
models, which read the context sequentially from left to right or from right to left,
BERT's transformer reads the entire sequence of words at once; calling it
non-directional would have been better than calling it bidirectional. Reading the
words in this manner allows the machine to understand a word in every context and
in the circumstances of its available surroundings. The major drawback in using
BERT-style NLP is that predicting the next words automatically restrains the inflow
of further ideas and context. This is largely overcome in two ways: first, by the
Masked LM, which adds a classification layer, redefines the domains of vocabulary,
and calculates or predicts the occurrence of each word; second, by Next Sentence
Prediction, in which the BERT model analyses pairs of sentences and identifies
whether the second sentence follows the first in the original document. Positional
embeddings of the sentences and their match with the original document also
facilitate the purpose aimed at by BERT models [32].
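The two training devices described, Masked LM and Next Sentence Prediction, can be illustrated at the input-construction level. This is a simplification: real BERT masks roughly 15% of tokens at random and works over subword vocabularies, whereas the sketch masks a fixed position in a whitespace-tokenized sentence.

```python
def mask_tokens(tokens, positions):
    """Masked LM input: hide chosen tokens; the model must recover them
    from the full bidirectional context."""
    targets = {i: tokens[i] for i in positions}
    masked = ["[MASK]" if i in targets else t for i, t in enumerate(tokens)]
    return masked, targets

def nsp_pair(sent_a, sent_b):
    """Next Sentence Prediction input: [CLS] A [SEP] B [SEP]."""
    return ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]

tokens = "the consideration for the contract was adequate".split()
masked, targets = mask_tokens(tokens, positions=[1])   # hide 'consideration'
pair = nsp_pair(tokens, "the parties signed it".split())
print(masked[1], targets[1])  # [MASK] consideration
```

A polysemous legal term like 'consideration' is exactly where bidirectional context helps: the surrounding words on both sides disambiguate the contractual sense from the everyday one.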
Though BERT variants for other domains have been deployed successfully, no legal
BERT is famously known to be in use at present. A clear advantage of using BERT
in the legal arena would be deploying it to analyze linguistic polysemy for words
such as 'consideration', 'workers', and 'labor' [33].
The use of AI in legal text analysis, enabled by machine learning and natural
language processing, has proven to be revolutionary. With these technologies,
attorneys can now navigate enormous legal databases, extract pertinent facts, and
understand complicated legal documents more quickly and effectively. Artificial
intelligence (AI) algorithms assist legal practitioners in deciphering complex
legislation, rules, and case law, thereby improving legal services.
Supporters of artificial intelligence often boast of productive outputs and
cost-cutting; they point to rising Gross Domestic Product (GDP), less labor, and
less human involvement [34]. Though the proponents cite correct propositions,
there is a darker side to this practice. Enough incidents of racism and discrimination
committed by machine learning algorithms have been reported in different instances.
After all, who creates these algorithms? We humans do. Certain biases are bound to
be present in the criminal justice administration system, workplace management
systems, and financial institutions. Bias becomes entrenched in an artificial
intelligence-driven machine because huge data sets are used for predictive analysis
[35], and these huge data sets, which engulf every kind of information recorded by
humans, are not free from contamination, misinformation, and manipulation.
Corrupt and unjust practices have prevailed in almost every nation at one point in
time or another [36]. The incidence of bias chiefly affects minorities and
marginalized groups, people of color and women in particular [37].
If embedded, this bias and racism will follow the complete trail of development,
processing, and execution of the artificial intelligence algorithm. The Suspect Target
Management Plan (STMP) used in Australia is blamed for disproportionately
targeting Aboriginal people and other marginalized groups. The United States has
used Correctional Offender Management Profiling for Alternative Sanctions
(COMPAS), which mistakenly tags people of color with a likelihood of reoffending.
The most striking example is also from the United States, where algorithms now
find a place in the sentencing system and, for the purpose of sentencing, count on
economic status and employment factors. Studies show a general stereotyped notion
of people of color suffering at the hands of destiny and being tagged as born
criminals [38].
AI-driven predictive analytics gives attorneys a groundbreaking ability to forecast
case outcomes based on prior legal information. AI models can give insightful
analyses of precedents and trends in case law, helping attorneys make better
judgments and giving clients a more accurate evaluation of their legal issues. To
avoid biases, it is crucial to approach predictive analytics cautiously and make sure
that predictions made by AI are explicit, comprehensible, and constantly reviewed.
The use of AI in legal practice must proceed with the utmost ethical care. To
protect the ideals of justice and fairness in the legal system, it is crucial to address
bias in training data, ensure fairness in decision-making, and preserve transparency
in AI algorithms.
AI must be used responsibly and accountably in order to avoid unforeseen
outcomes and preserve public confidence in AI-driven legal systems. The case studies
included in this report show the practical advantages of AI adoption in a range of legal
fields, including contract analysis, litigation support, and legal research. AI's capacity
to automate processes, boost productivity, and cut costs can have a significant
influence on the legal industry, allowing practitioners to concentrate on higher-value
work and provide clients with better services. The potential effects of AI on the legal
profession are both intriguing and difficult to predict. Although AI has the potential
to supplement and enhance legal knowledge, it is unlikely to completely replace
human attorneys. Collaboration between AI systems and legal experts to improve
decision-making and legal analysis will probably become the norm.
6 Conclusion
References
1. Chan J, Yonamine J, Hsu N (2016) Data analytics: the future of legal, 9 INT’l. IN-House
Counsel J 1
2. Mead L (2020) AI Strengthens Your Legal Analytics, 46 LAW PRAC. 52
3. Stepka M. Business Law Today, American Bar Association. Available at: https://businesslawt
oday.org/2022/02/how-ai-is-reshaping-legal-profession/
4. Legal Analytics Shop Talk with Lex Machina, 20 AALL Spectrum 40 (2016)
5. Stouffer CM, Baker JJ (2019) Ask a Director: Shaping Legal Data Analytics, 24 AALL
Spectrum 30
6. Jack GC, Karl Branting L (2018) Introduction to the Special Issue on Legal Text Analytics 26
A.I. & L. 99
7. Byrd O (2017) Moneyball Legal Analytics Now Online for Commercial Litigators, 31 COM.
L. WORLD 12
8. Wittenberg DS (2018) Data analytics: a new arrow in your legal quiver, 43 LITIG. News
26
9. Rapoport NB, Tiano JR Jr. (2019) Legal analytics, social science, and legal fees: reimagining
legal spend decisions in an evolving industry, 35 GA. St. U. L. REV. 1269
10. Ashley KD (2022) Prospects for legal analytics: some approaches to extracting more meaning
from legal texts, 90 U. CIN. L. REV. 1207
11. Massini CI (2007) Between analytics and hermeneutics: legal philosophy as a practical
philosophy, 56 Persona & DERECHO 205
12. Savelka J, Grabmair M, Ashley KD (2020) A law school course in applied legal analytics and
AI, 37 LAW CONTEXT: A Socio-LEGAL J. 134
13. Rapoport NB, Tiano JR Jr (2019) Leveraging legal analytics and spend data as a law firm
self-governance tool, 13 J. Bus. Entrepreneurship & L. 171
14. Zodi Z (2021) Big-data-based legal analytics programs. What Will Data-Driven Law Look
like?, 10 ACTA UNIV. Sapientiae: LEGAL Stud. 287
15. Flanagan GP, Dewey MH (2019) Where do we go from here: transformation and
acceleration of legal analytics in practice, 35 GA. St. U. L. REV. 1245
16. Weinshall K, Epstein L (2020) Developing high-quality data infrastructure for legal analytics:
introducing the israeli supreme court database, 17 J. EMPIRICAL LEGAL Stud. 416
17. Zatarain JMN (2018) Artificial intelligence and legal analytics: new tools for law practice in
the digital age, 15 SCRIPTed 156
18. Andrade MD, Rosa BC, Castro Pinto ERG, Legal tech: analytics, artificial intelligence and the
new perspectives for the private practice of law, 16 DIREITO GV L. REV. 1
19. Sorkin D, Lai J, Cuevas-Trisan M (2015) Legal problems in data management: ethics of big
data analytics and the importance of disclosure, 31 J. Marshall J. INFO. TECH. & PRIVACY
L. [xi]
20. Borden BB, Baron JR (2014) Finding the signal in the noise: information governance, analytics,
and the future of legal practice, 20 RICH. J.L. & TECH. 1
21. Prakken H, Legal reasoning: computational models. Available at: https://webspace.science.uu.
nl/~prakk101/pubs/EncyBS.pdf
22. Buchanan BG, Headrick TE (1970) Some speculation about artificial intelligence and legal
reasoning, 23 Stan. L. REV. 40
23. Paul J (2021) When justice is served: using data analytics to examine how fraud-based legal
actions affect earnings management, 2 CORP. & Bus. L.J. 64
24. McCarty LT (1977) Reflections on Taxman: An Experiment in Artificial Intelligence and Legal
Reasoning, 90 HARV. L. REV. 837
25. Najjar M-C (2023) Legal and ethical issues arising from the application of data analytics and
artificial intelligence to traditional sports, 33 ALB. L.J. Sci. & TECH. 51
26. Susskind RE (1986) Expert systems in law: a jurisprudential approach to artificial intelligence
and legal reasoning, 49 MOD. L. REV. 168
27. Lashbrooke Jr. EC (1988) Legal reasoning and artificial intelligence, 34 LOY. L. REV. 287
28. Koenig MEL, Mandell C (2022) A new metaphor: how artificial intelligence links legal
reasoning and mathematical thinking, 105 MARQ. L. REV. 559
29. Clark M (1997) Automation of legal reasoning: a study on artificial intelligence and law, 6
INFO. & COMM. TECH. L. 178
30. Tiscornia D (1993) Meta-reasoning in law: a computational model, 4 J.L. & INF. Sci. 368
31. Berman DH, Hafner CD (1987) Indeterminacy: a challenge to logic-based models of legal
reasoning, 3 Y.B. L. Computers & TECH. 1
32. Horev R, BERT explained: state-of-the-art language model for NLP, Towards Data Science.
Available at: https://towardsdatascience.com/bert-explained-state-of-the-
art-language-model-for-nlp-f8b21a9b6270
33. Zhang E, LawBERT: towards a legal domain-specific BERT? Towards Data Science. Available at:
https://towardsdatascience.com/lawbert-towards-a-legal-domain-specific-bert-716886522b49
34. Solow-Niederman A (2020) Administering artificial intelligence, 93 S. Cal. L. Rev. 633
35. Stark L, Hutson J (2022) Physiognomic artificial intelligence, 32 Fordham Intell. Prop. Media &
Ent. L.J. 922
36. Opderbeck DW (2021) Artificial intelligence, rights and the virtues, 60 Washburn L.J. 445
37. Atkinson D (2019) Criminal liability and artificial general intelligence. J Robot, Artif Intell
Law (Fastcase) 333
38. Buiten MC (2019) Towards intelligent regulation of artificial intelligence. 10 Eur. J. Risk Reg.
41
Unveiling the Truth: A Literature Review
on Leveraging Computational Linguistics
for Enhanced Forensic Analysis
Abstract The fusion of computational linguistics (CL) and forensic linguistics (FL)
has become a powerful tool for boosting the effectiveness of forensic investigations
in the fast-changing field of digital forensics. This thorough assessment of the liter-
ature looks into this multidisciplinary nexus and explains how CL may enhance
and improve the procedures and findings of forensic investigations. The review
summarises the literature, highlighting recurring themes, divergences, and trends,
and suggests new lines of inquiry based on the gaps found. Examining the results’
implications for both CL and FL highlights the possible effects on the corresponding
fields. The review examines the techniques, conclusions, and limits of significant
research in depth. It points out knowledge gaps, especially with regard to the use of
CL approaches in FL situations. These gaps serve as a guide for further research,
emphasising areas where additional study could result in important breakthroughs.
1 Introduction
The fields of computational linguistics (CL) and forensic linguistics (FL), which
are separate but related, have advanced significantly in recent years. Each subject
contributes a distinct viewpoint and set of approaches to the study of language,
and the junction of these fields opens up new prospects for further study and use.
The multidisciplinary field of CL makes use of notions from computer science to
understand and represent language. In order to enable computers to interact with
people in a fashion that looks natural and is similar to that of a person, it tries to
develop computational models and algorithms that can process, analyse, and grasp
human language. The subfields of CL include, but are not limited to, natural language processing.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 71
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_7
72 D. Mashru and N. S. Nautiyal
This work aims to explore the relationship between computational linguistics (CL)
and forensic linguistics (FL), illuminating how CL could improve the effective-
ness and precision of forensic investigations. In order to develop knowledge in both
domains, it strives to give a thorough grasp of how the methodology and approaches
of CL may be applied to the problems and difficulties of FL. The underlying research
question that informs this work is “How can the principles and techniques of compu-
tational linguistics be leveraged to enhance the effectiveness and accuracy of forensic
linguistics in the context of forensic investigations?” The study will conduct a crit-
ical analysis of the corpus of prior research in the area, identify knowledge gaps,
and provide solutions to close these gaps in order to respond to this research topic.
Additionally, it will look at particular instances when CL has been applied to FL, eval-
uating the efficiency of these techniques and their consequences for both domains.
In short, this study's goal and its central research question are to reveal the
revolutionary potential of CL in improving FL and to open the door to forensic
investigations that are more effective, precise, and all-encompassing. For scholars
and practitioners in the domains of CL and FL, as well as for attorneys, law
enforcement officials, and society at large, this endeavour is very valuable.
4 Research Methodology
To guarantee the applicability and calibre of the chosen literature, inclusion and
exclusion criteria for the studies were established. Studies that satisfied the following
requirements were considered for the review:
1. The study concentrated on forensic linguistics’ use of computational linguistics.
2. The study’s publication in a peer-reviewed journal or conference proceedings
attested to the study’s calibre and objectivity.
3. Since English was the language of the review, the study was also available in that
language.
The following exclusion criteria were established:
1. Language restrictions prevented us from including studies that were not available
in English.
2. To assure the calibre of the included research, papers that were not published in
peer-reviewed journals or conference proceedings were omitted.
To keep the review’s emphasis on the junction of computational and forensic
linguistics, studies that had no direct bearing on this topic were disregarded.
The process of reading, analysing, and synthesising the literature involved several
steps. First, crucial details, such as the authors, year of publication, research
methodologies, important findings, and research gaps, were extracted after a
thorough reading of the chosen articles. To aid the analysis, this data was kept in a
tabular format.
After that, a thematic analysis of the literature was conducted, which required
finding recurring themes, differences, and patterns within the research. Topics were
developed and refined iteratively as more material was read during the thematic
analysis.
To present a thorough overview of the area, the results from the various investigations
were integrated as part of the literature synthesis. This included discussing recurring
themes and patterns, comparing and contrasting the results of different studies, and
pinpointing knowledge gaps. The synthesis included a critical evaluation of the present
status of the field, recommendations for future study, and a discussion of the
implications of the results for both computational linguistics and forensic linguistics.
7 Literature Review
Over the past several decades, research in computational linguistics (CL) has increased
significantly, driven by advances in technology and artificial intelligence that have
accelerated the creation of sophisticated computational models and language-processing
algorithms. Forensic linguistics (FL) is one of the many fields in which these
advancements have found use.
Research on the use of CL in FL has mostly concentrated on a few important topics. One
of the best-known applications is authorship attribution: computational approaches are
used to analyse linguistic features such as word usage, syntactic structures, and
stylistic patterns in order to identify the author of a document. Numerous studies have
shown that these techniques are successful
at accurately identifying authors even when there are many potential authors or large
datasets.

Table 1 Details of literature review along with research area and key findings

1. Almela et al. [1], Automatic Classifier for Deception in Spanish: Developed an SVM classifier to identify deception in written Spanish communication; emphasised the gap in research for languages other than English.
2. Church and Liberman [2], Major Shifts in Computational Linguistics: Offered a comprehensive discussion of the evolving landscape of computational linguistics and provided insights for budding researchers in the field.
3. Solovyev et al. [3], Linguistic Complexology Paradigms and Methods: Presented a detailed overview of the paradigms and methods in linguistic complexology and underscored the need to refine complexity-prediction metrics.
4. Ophir et al. [4], Computational Linguistics in Suicide Prevention: Explored the integration of computational linguistics into suicide prevention, bringing attention to both ethical dilemmas and methodological challenges.
5. Moura et al. [5], Automatic Classifier for Deception Detection in Portuguese: Developed an automatic classifier targeting deception in Portuguese; emphasised a research gap similar to that observed in Spanish-language deception studies.
6. Simon and Nyitrai [6], Linguistic Fingerprints in Decision-making and Investigation: Highlighted the pivotal role linguistic fingerprints play in authoritative decisions, aiding investigative bodies in their work.
7. Almela et al. [7], Quantitative Analysis of Lying in Psychopathic Discourse: Provided quantitative insights into the linguistic nuances of deceptive communication within psychopathic discourse.
8. Alshahrani et al. [8], Deep-Learning-Based Intent Detection for Natural Language: Proposed a deep learning approach for intent detection in natural language understanding and discussed potential ethical ramifications.
9. Donatelli and Koller [9], Evolution of Computational Linguistics: Presented an in-depth overview of the historical changes, varying motivations, evolving methods, and diverse applications in computational linguistics.
10. Tsujii [10], Forensic Linguistics in Epidemic Crime and Fake News: Emphasized the increasing relevance and critical role of forensic linguistics in tackling challenges such as epidemic crime and the spread of fake news.
11. Silva [11], Argumentation in Hate Speech on Facebook: Explored the structure and nature of argumentation present in hate speech on Facebook, illuminating patterns and potential motivations.
12. Choobbasti et al. [12], CL-DLBIDC for Natural Language Understanding: Proposed a novel methodology that synergizes computational linguistics and deep learning for enhanced natural language understanding.
13. Gurram et al. [13], Fast Native Language Identification Techniques: Introduced state-of-the-art NLI techniques leveraging string kernels, addressing efficiency and speed in identifying native languages.
14. Alduais [14], Comparative Research Approaches in Language Study: Offered a holistic comparative review of the research methodologies adopted in the study of language.
15. Abdalla [15], Role of Forensic Linguistics in Crime Investigation: Detailed the significant contributions and applications of forensic linguistics in the investigation and solving of crimes.
16. Kuznetsov [16], History and Current State of Forensic Linguistics: Presented a chronological narrative of the evolution, current methodologies, and future prospects of forensic linguistics.
17. Sari et al. [17], Hate Speech Acts on Social Media: Conducted an exhaustive study of the manifestations, patterns, and repercussions of hate speech acts on various social media platforms.
18. Orr et al. [18], Ethical Role of Computational Linguistics in Suicide Prevention: Investigated the ethical considerations and responsibilities involved in leveraging computational linguistics tools and techniques in suicide prevention efforts.

Deception detection is another key area of research. Several studies have used machine
learning algorithms to identify linguistic indicators of dishonesty in spoken and
written language, demonstrating that deceptive language frequently exhibits particular
linguistic characteristics and that computational tools can be useful in identifying
them.
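The feature-comparison idea underlying both authorship attribution and deception detection can be sketched concretely. The toy below is a hedged illustration of the general stylometric technique, not the method of any cited study: the names `profile`, `cosine`, and `attribute`, and the tiny function-word list, are illustrative choices only.

```python
import math
from collections import Counter

# A tiny illustrative feature set; real stylometric systems use hundreds
# of features (function words, character n-grams, syntactic patterns).
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it", "is", "was"]

def profile(text):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(disputed_text, known_texts):
    """Pick the candidate whose known writing is stylistically closest."""
    target = profile(disputed_text)
    return max(known_texts,
               key=lambda author: cosine(target, profile(known_texts[author])))
```

A deception-detection sketch would look much the same, with features such as pronoun rates or hedging terms replacing the function-word list; production systems train classifiers on large corpora and validate them carefully.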
The use of CL for linguistic profiling has also been studied: computational techniques
are used to develop thorough profiles of people based on their language use. These
profiles can offer useful demographic information about an unknown author's background,
which can help identify them.
Despite these developments, applying CL to FL still presents a number of difficulties
and opportunities. The complexity and diversity of languages, the intricacies of human
communication, and the ethical and legal ramifications of using language as evidence in
court cases all pose significant obstacles.
Although the corpus of existing research has made great gains in the application of
computational linguistics (CL) to forensic linguistics (FL), there are still a number
of knowledge gaps that need to be filled.
First, much of the existing research has focused on English or other widely spoken
languages. Research on applying CL to FL in the context of minority or less widely used
languages is lacking; given the linguistic diversity of the world, including a wider
range of languages in the research is critical. Second, while authorship attribution and
deception detection have been studied extensively, other possible uses of CL in FL have
received less attention. For instance, further study is required on applying CL to tasks
such as spotting threats, detecting hate speech, or analysing the language of legal
documents. Third, many prior investigations have relied on rather basic computational
models or linguistic features. Research that examines more intricate linguistic features
and makes use of state-of-the-art computational models, including deep learning models,
is required.
Another research gap is that the ethical and legal ramifications of employing CL in FL
have not been rigorously examined. As computational approaches are used more often in
forensic investigations, it is critical to address these challenges to ensure their fair
and responsible application.
One of the most prevalent themes in the literature is the use of computational tools in
forensic linguistics, notably for authorship determination, deception detection, and
hate speech analysis. These applications use computational linguistics to analyse
massive volumes of data, spot patterns, and make predictions, giving forensic
investigators useful resources.
The application of deep learning and machine learning methods in forensic linguistics is
another recurring subject. Numerous studies, including that of Alshahrani et al. [8],
discuss the use of these sophisticated computational algorithms for natural language
interpretation and show how they might improve forensic investigations.
11.2 Disagreements
Despite these common elements, there are disagreements in the literature. One point of
contention is the viability and moral ramifications of using computational methods in
forensic linguistics. While computational tools can offer insightful data, some
researchers, including those in the study by Orr et al. [18], contend that they should
not replace human discretion and expertise. They also raise concerns about privacy and
the possible abuse of these technologies.
11.3 Trends
Although numerous studies have examined how computational linguistics may be used in
different facets of forensic linguistics, comprehensive research focusing on the fusion
of these two domains is still lacking. This paper fills this vacuum by offering a
thorough examination of the ways in which computational linguistics might advance
forensic linguistics, drawing on a variety of sources to present a comprehensive picture
of the subject.
The literature study also showed a need for more nuanced analyses of the moral
ramifications of using computational methods in forensic linguistics. Although certain
studies, such as that by Orr et al. [18], have briefly touched on these problems, a more
thorough and in-depth investigation of the subject is needed.
This paper addresses this gap by giving a balanced assessment of the possible advantages
and hazards and devoting a sizeable portion of the discussion to the ethical issues of
utilising computational tools in forensic investigations. The literature study also
identified a trend in forensic linguistics towards the use of sophisticated
machine-learning methods; however, there is a dearth of studies addressing the
real-world difficulties and potential drawbacks of these methods. This work fills this
gap by critically examining the application of machine learning in forensic linguistics
and outlining its possible advantages, drawbacks, and difficulties.
Finally, the research closes a gap in the literature by offering a thorough assessment
of the current state of the area, including the most recent trends and advancements.
This study offers a broader perspective on the subject than many others that concentrate
on particular applications or methods, making it an important resource for academics,
practitioners, and students interested in the nexus between computational linguistics
and forensic linguistics.
By filling a number of highlighted gaps and offering a thorough, balanced, and current
analysis of the nexus between computational linguistics and forensic linguistics, this
work adds significantly to the body of literature.
12 Findings
The results of the literature study have important ramifications for both forensic and
computational linguistics.
The results highlight the potential for computational linguistics to contribute to
useful, real-world applications such as forensic investigations. The use of
computational approaches in hate speech analysis, deceit detection, and authorship
identification, as emphasised in the works of Simon and Nyitrai [6] and Moura et al.
[5], shows the usefulness of computational linguistics. Additionally, the use of
sophisticated machine learning algorithms provides promising potential for further study
and development in computational linguistics, as noted by Alshahrani et al. [8].
This is consistent with findings by Gurram et al. [13], who showed the value of
string kernel-based methods for Native Language Identification (NLI), a crucial field
in forensic linguistics.
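The string-kernel idea behind such NLI systems can be illustrated with the classic p-spectrum kernel, which scores two texts by the character p-grams they share. This is a generic sketch of the technique, not the implementation of Gurram et al.; the names `spectrum` and `spectrum_kernel` and the default p = 3 are illustrative assumptions.

```python
from collections import Counter

def spectrum(text, p):
    """Counts of all character substrings of length p in the text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))

def spectrum_kernel(s, t, p=3):
    """p-spectrum kernel: sum over shared p-grams of count_s * count_t."""
    cs, ct = spectrum(s, p), spectrum(t, p)
    return sum(cs[g] * ct[g] for g in cs.keys() & ct.keys())
```

In an NLI system, kernel values between a disputed text and texts from writers of known native languages would feed a kernel classifier such as an SVM; character-level kernels capture spelling and morphology transfer effects without any parsing.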
The results also show that computational linguists must think about the ethical
ramifications of their work, especially when it is applied in delicate situations like
forensic investigations. The privacy issues and potential abuse of computational
approaches brought up in the study by Orr et al. [18] highlight how crucial ethical
considerations are in computational linguistics. The work of Azhniuk (2022), who
examined methodological approaches of forensic linguistic research to the judicial
assessment of speech acts, further echoes this.
The results show how useful it is for forensic linguists to incorporate computational
methods into their work. These methods can offer invaluable information and tools that
can improve the precision and effectiveness of forensic investigations. The work of
Abdalla [15] shows how computational linguistics has the potential to dramatically
improve the capabilities of forensic linguists by helping to identify authorship,
uncover fraud, and analyse hate speech.
The results also demonstrate the necessity for forensic linguists to be judicious users
of these methods. To guarantee their ethical and efficient use, it is essential to be
aware of their constraints and potential hazards. The relevance of this critical
viewpoint is shown by the concerns raised in the literature over the efficacy and moral
ramifications of these techniques. Further evidence comes from the work of Kuznetsov
[16], who explored the history of forensic linguistics' evolution and its present status
and emphasised the necessity of a thorough grasp of the discipline.
The results of the literature review have important ramifications for both forensic
linguistics and computational linguistics, underlining the possible advantages, diffi-
culties, and ethical issues related to the interdisciplinary study of these two domains.
The publications cited offer a rich tapestry of ideas that emphasise these consequences
even more.
13 Limitations
The primary limitation of this review is its dependency on the provided sources.
The year of publication and the name of journals/books from which the studies
originated were not provided, potentially limiting the context in which the findings
should be interpreted. Furthermore, the rapidly evolving nature of both computational
linguistics and forensic analysis suggests that newer advancements might not be
included in this review.
14 Conclusion
This study’s major goal was to examine how computational linguistics and forensic
linguistics relate to one another, with a focus on the ways that computational tools
might aid forensic investigations. The comprehensive literature review’s findings
provided a complex tapestry of viewpoints on this relationship. In forensic linguistics,
computational linguistics has a variety of applications, including the analysis of hate
speech and the identification of authors. It also highlighted the growing application
of complex machine learning techniques in forensic linguistics. However, the study
also found some inconsistencies in the literature, particularly with regard to the moral
and practical consequences of these uses. The findings of the literature review have
significant implications for both computational linguistics and forensic linguistics.
The results highlight the potential for Computational Linguistics to contribute to
useful, real-world applications, such as forensic investigations. Additionally, they
emphasise how crucial it is for computational linguists to think about the ethical
ramifications of their work, especially when it is applied in delicate situations like
forensic investigations.
References
12. Choobbasti AJ, Gholamian ME, Vaheb A, Safavi S (2018) JSpeech: a multi-lingual
conversational speech corpus. In: IEEE Spoken Language Technology Workshop (SLT),
Athens, Greece
13. Gurram VK, Sanil J, Anoop VS, Asharaf S (2023) String Kernel-based techniques for native
language identification. Hum-Cent Intell Syst
14. Alduais A (2012) A comparative and contrastive account of research approaches in the study
of language. Int J Learn Develop 2
15. Abdalla AE (2020) Forensic linguistics and its role in crime investigation: descriptive study.
JALL | J Arabic Linguist Literat 2(2):55–75
16. Kuznetsov VO (2021) Forensic linguistics as a form of application of specialized linguistic
knowledge in legal proceedings: development history and current state. Theory Practice Foren
Sci 4(16):17–25
17. Sari PLP, Supiatman L, Aryni Y (2022) Hate speech acts on social media (Forensic Linguistics
Study). English Teach Linguist J 2(3)
18. Orr M, Kessel, Parry D (2022) The ethical role of computational linguistics in
digital psychological formulation and suicide prevention. In: Proceedings of the eighth
workshop on computational linguistics and clinical psychology. https://doi.org/10.18653/v1/2022.clpsych-1.2
Navigating the Digital Frontier:
Unraveling the Complexities
and Challenges of Emerging Virtual
Reality
N. S. Nautiyal (B)
National Forensic Sciences University, Delhi 110085, India
e-mail: [email protected]
A. Patel
National Forensic Sciences University, Gandhinagar, Gujarat, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 85
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_8
86 N. S. Nautiyal and A. Patel
1 Introduction
Over the past few years, virtual reality (VR) technology has developed quickly, offering
immersive experiences that go beyond traditional media. This technical progress has
created new opportunities across several industries, including gaming, education,
healthcare, and training, and the development of new VR devices, applications, and
experiences has drawn growing interest from these sectors. However, these breakthroughs
have also brought a variety of complexities and obstacles that researchers and
developers must overcome to fully realize the promise of this game-changing technology.
VR has emerged as a revolutionary force, changing how people interact with digital
surroundings and creating new opportunities in a variety of fields.
Thanks to developments in hardware, software, and user experience, the potential uses of
VR have grown beyond entertainment and gaming to include industries such as education,
healthcare, architecture, and training simulations. To fully reap the rewards of VR,
however, the complications and obstacles accompanying this rapid expansion must be
thoroughly recognized and handled. This study presents an in-depth analysis of the
emerging virtual reality landscape, examining its technical difficulties, user
experience problems, ethical dilemmas, societal implications, and accessibility
restrictions. While there is no denying that VR has the potential to transform society,
the ethical issues it raises must be carefully considered before implementation, and
this article highlights the significance of a moral framework to guide its application.
The immersive qualities of VR, made possible by the combination of cutting-edge hardware
and software, provide both remarkable prospects and substantial obstacles that call for
careful investigation. This investigation digs into the technological intricacies,
cognitive issues, moral conundrums, and significant societal effects that together shape
the landscape of this fast-changing field. Through our research, we hope to shed light
on the details that lie behind immersive digital worlds, ultimately developing a better
understanding of VR's effects on both individuals and society as a whole.
User Privacy and Data Security: The collection and storage of user data within
VR ecosystems raise significant privacy concerns. VR applications often require
personal information and behavioral data, which can be susceptible to breaches and
unauthorized access [1]. Striking a balance between providing personalized expe-
riences and safeguarding user privacy requires robust data protection measures and
informed consent mechanisms.
Content Integrity and Responsibility: Virtual environments can host a wide array
of content, including user-generated material, simulations, and entertainment experi-
ences. Ensuring that these environments uphold ethical standards and do not promote
harm or misinformation is a crucial consideration [2]. Ethical guidelines should be
established to govern the creation and dissemination of VR content, particularly in
cases where the lines between reality and simulation are blurred.
Psychological Well-being and User Experience: The immersive nature of VR can
impact users’ psychological well-being, potentially leading to phenomena like cyber-
sickness or psychological distress [3]. Designers and developers must prioritize user
experience by minimizing adverse effects and optimizing comfort. Ethical respon-
sibility involves creating experiences that enhance well-being while minimizing
negative psychological outcomes.
Societal Implications and Accessibility: VR has the potential to exacerbate existing
societal inequalities if not made accessible to all segments of the population [4]. The
technology can lead to a “digital divide” where certain demographics are excluded
from its benefits due to economic or accessibility constraints. Ethical considerations
call for efforts to bridge this divide and ensure equitable access to VR experiences.
One of the most significant complexities of new virtual reality technology lies in
the demanding hardware requirements. High-quality VR experiences necessitate
powerful computing components such as advanced graphics processing units (GPUs),
fast central processing units (CPUs), and substantial memory capacity [5, 6].
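A rough back-of-the-envelope calculation makes the scale of these hardware demands concrete. All numbers below are illustrative assumptions (a hypothetical 2000 x 2000-per-eye, 90 Hz headset with 1.4x supersampling), not specifications of any real device:

```python
def pixels_per_second(width, height, refresh_hz, eyes=2, supersample=1.0):
    """Raw pixels a GPU must shade per second for a stereo headset.

    supersample > 1.0 models the enlarged render target often used to
    counter lens distortion (an illustrative parameter, not a spec).
    """
    return round(width * height * eyes * refresh_hz * supersample)

# Hypothetical headset: 2000 x 2000 pixels per eye at 90 Hz, 1.4x supersampling.
rate = pixels_per_second(2000, 2000, 90, supersample=1.4)  # about 1.0 billion pixels/s
```

At roughly a billion shaded pixels per second, before lighting and post-processing, it is easy to see why high-quality VR pushes GPUs far harder than a single 60 Hz monitor.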
The user experience and its psychological ramifications become a focus of inquiry as
people move across this digital frontier. Jerald [13] explores negative psychological
impacts that can degrade the overall user experience, such as disorientation and
cybersickness; discomfort and reduced immersion are caused by factors including the
field of view, motion-tracking precision, and the vergence-accommodation conflict [13].
An investigation of the uncanny valley phenomenon [14], which examines the connection
between realism and comfort in VR settings, adds to the complexity of user involvement,
and psychological elements such as presence also influence user acceptance and emotional
engagement. A multidisciplinary strategy incorporating psychology, human-computer
interaction, and neuroscience is necessary to address these issues.
Comfort and Fatigue: In the development of VR, user comfort is of the utmost importance
[15]. Long-term use of virtual environments can cause discomfort, weariness, and even
health problems such as motion sickness, headaches, and eye strain. Creating VR
experiences that address these issues requires a deep understanding of human physiology
and psychology. Designers must prioritize ergonomic design principles, optimize the
field of view, and distribute the weight of VR headsets evenly. Addressing discomfort
frequently requires optimizing rendering methods, lowering motion-to-photon latency, and
designing user-friendly locomotion mechanics that reduce sensory conflicts and lessen
motion sickness, so that users can interact with VR material without experiencing
negative physical effects.
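The latency constraint behind motion-to-photon optimization can be expressed as a simple budget check. The roughly 20 ms comfort threshold is a widely cited rule of thumb, and the per-stage timings below are illustrative assumptions, not measurements of any real system:

```python
# Illustrative motion-to-photon pipeline stages, in milliseconds
# (hypothetical timings, not measurements of any real headset).
PIPELINE_MS = {
    "tracking_sample": 1.0,     # IMU / camera pose sampling
    "simulation": 3.0,          # application and physics update
    "render": 8.0,              # GPU frame rendering
    "compositor_scanout": 5.0,  # reprojection and display scan-out
}

# A widely cited comfort rule of thumb for motion-to-photon latency.
BUDGET_MS = 20.0

def within_budget(stages, budget_ms=BUDGET_MS):
    """Sum the stage latencies and check them against the comfort budget."""
    total = sum(stages.values())
    return total, total <= budget_ms

total_ms, ok = within_budget(PIPELINE_MS)  # 17.0 ms, inside the 20 ms budget
```

Framing the pipeline this way shows why each optimization above matters: shaving a few milliseconds off rendering or scan-out is the difference between fitting the comfort budget and inducing cybersickness.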
Interaction and Realism: Constructing realistic virtual worlds that replicate the
complexity of the real world is a major technological problem. Producing photorealistic
images, accurate physics simulations, and lifelike animations requires a significant
amount of computing power. Furthermore, advanced input devices and gesture-detection
algorithms are needed to enable natural and intuitive interaction within these settings.
The creation of haptic feedback systems that convincingly recreate tactile sensations is
an ongoing problem, as it entails elaborate sensor arrays and complex algorithms.
VR equipment frequently includes cameras and sensors that record user motions, gestures,
and even the physical surroundings, which raises data security and privacy concerns
[17]. These sensors have the ability to capture private information about users'
movements and real environments, and privacy problems are increasingly in the spotlight
as a result of the massive volumes of user data that VR technologies capture. To allay
these worries and foster confidence within the VR ecosystem, it is essential to provide
strong encryption of user data, transparent data-usage regulations, and user control
over how much data is shared.
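One concrete way to act on these recommendations is to pseudonymize user identifiers before telemetry leaves the device. The sketch below, using keyed hashing from the Python standard library, is an illustrative pattern rather than any platform's actual scheme; `pseudonymize`, `scrub_event`, and the event fields are hypothetical names:

```python
import hmac
import hashlib

def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a user ID with a stable keyed hash.

    The same user always maps to the same token (so aggregate analytics
    still work), but without the key the original ID cannot be recovered.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def scrub_event(event: dict, secret_key: bytes) -> dict:
    """Copy a telemetry event, replacing the raw user ID with its pseudonym."""
    scrubbed = dict(event)
    scrubbed["user_id"] = pseudonymize(event["user_id"], secret_key)
    return scrubbed
```

Keyed hashing (HMAC) rather than a plain hash matters here: without the secret key, an attacker who obtains the telemetry cannot brute-force identities by hashing candidate user IDs.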
Because VR experiences are so intense, moral concerns have been raised about their
possible psychological and social effects [18]. Extended use of virtual environments may
blur the line between the virtual and real worlds, which may affect users' mental
health. User perceptions and attitudes might also be shaped by VR experiences, raising
questions about the possibility of immersive propaganda or exposure to dangerous
content. Creating rules for ethical VR content production and consumption is necessary
to overcome these ethical difficulties.
The ability of VR to produce immersive and compelling experiences begs the question of
where to draw the line between constructive participation and escapism. Overuse of
virtual reality for enjoyment may cause people to withdraw from obligations and
relationships in the real world. Balancing the experiential potential of VR against its
possible effects on social dynamics is a persistent societal problem that needs careful
study.
Virtual reality presents a wide range of issues related to technology, psychology,
ethics, and society. Addressing them requires collaboration between technologists,
psychologists, ethicists, legislators, and cultural specialists. As VR continues to
influence many facets of contemporary life, a deep awareness of its complexity is
crucial to maximizing its benefits and minimizing its drawbacks.
These case studies underscore the intricate challenges faced when navigating the
complexities of emerging virtual reality technologies. Each case study exemplifies
the multi-faceted nature of the digital frontier, where solutions are born from collab-
orative efforts, innovative strategies, and ethical considerations. The analysis of case
studies is shown in Table 1.
The complexities and difficulties of evolving virtual reality cover a wide range of
topics, such as its effects on personality, responsibility for creating material, crimes
committed in virtual worlds, and the need for proper legal frameworks. As VR technology
develops, stakeholders must work together to solve these issues in order to fully
realize its promise while preserving users' rights, ethics, and well-being.
• Identity and personality in virtual reality: Questions about how these immersive
experiences affect personality and identity arise as people explore the virtual
reality (VR) frontier. The “Proteus effect” is a phenomenon that occurs when
users build avatars or other digital representations of themselves that may be
different from who they really are [21]. This impact emphasizes how the look of
users’ avatars may affect their behavior and attitudes. This raises moral questions
concerning the degree to which VR might alter one’s behavior and self-perception,
affecting both one’s identity and interactions in virtual worlds.
• Platforms and Content Creators’ Ethical Responsibility: Virtual reality content
developers and platforms bear a significant ethical obligation. Content makers must take
Table 1 Analysis of case studies

1. The inclusive learning initiative (Virtual Learning University, VLU)
Situation: VLU, a leading online education platform, aims to revolutionize distance learning through immersive VR experiences. However, they face the challenge of ensuring equitable access to VR education experiences across socio-economic backgrounds.
Solution: VLU collaborates with non-profit organizations and educational institutions to provide subsidized VR kits to underprivileged students [20]. They also develop content that caters to diverse learning styles and language preferences, ensuring inclusivity.
Outcome: The initiative leads to a significant increase in the participation of students from disadvantaged backgrounds, enriching their educational experiences and reducing the digital divide.

2. Ethical content curation (Immerse VR Studios)
Situation: Immerse VR Studios is a content creation company specializing in VR experiences. They grapple with the challenge of responsibly curating content to ensure that their immersive simulations do not perpetuate stereotypes or harmful narratives.
Solution: Immerse VR Studios collaborates with ethicists and cultural experts to establish content guidelines that prioritize diversity and cultural sensitivity [16]. They implement AI-driven filters that identify and flag potentially offensive content for review.
Outcome: The company has gained a reputation for producing culturally enriching and responsible content, attracting a broader audience and contributing positively to the VR ecosystem.

3. Overcoming cybersickness (healthcare institution: VR Health Solutions)
Situation: VR Health Solutions develops VR applications for pain management and physical therapy. However, they face the challenge of users experiencing cybersickness and discomfort during prolonged VR sessions.
Solution: The company invests in research to understand the causes of cybersickness and incorporates user feedback to refine their applications [13].
Outcome: Through iterative design and continuous improvement, VR Health Solutions minimizes the occurrence of cybersickness, leading to higher user satisfaction and improved treatment outcomes.

Navigating the Digital Frontier: Unraveling the Complexities …
94 N. S. Nautiyal and A. Patel
into account the possible psychological effects of their works since the immersive
nature of VR encounters might make it difficult to distinguish between reality
and simulation [2]. It is crucial to ensure that content complies with responsible
content rules, respects cultural sensitivities, and refrains from supporting negative
stereotypes. To stop the spread of harmful or improper information, VR platforms
should employ effective content moderation methods [16].
• Virtual Offences’ Challenges: The digital frontier is not exempt from virtual
offenses such as cybercrime, bullying, and harassment. According to
Boukhechba [22], virtual reality settings can open new doors for harmful
behaviors that transcend physical boundaries. Such offenses may have a greater
impact because of VR’s immersive features, which can cause psychological
injury and emotional suffering. The challenge lies in developing effective
strategies to prevent and resolve virtual offenses as users navigate these virtual
spaces, demanding cooperation among developers, law enforcement, and legal
professionals.
• Challenges with Jurisdiction and Legal Frameworks: VR’s legal environment
is complicated, especially with regard to responsibility and jurisdiction. Virtual
offenses that take place in virtual settings may involve people from many
geographical areas, making it harder to determine which rules and regulations
apply [23]. Furthermore, concerns relating to user consent, intellectual property
rights, and data privacy in virtual environments require legal frameworks that
take VR’s particular characteristics into account. Policymakers and legal
professionals must preserve users’ rights while adapting current laws to the
digital frontier.
The complexities and difficulties of evolving virtual reality extend beyond the
technical to include personality, responsibility, criminal activity, and legal prob-
lems. Stakeholders must sustain ongoing discussions as the digital frontier develops
in order to create moral standards, responsible content policies, and regulatory
frameworks that secure VR’s positive potential while minimizing its negative effects.
The aspects of complexities and challenges related to emerging Virtual Reality (VR)
technology in terms of personality, responsibility, offenses, and laws are shown in
Table 2.
As the digital frontier unfolds, navigating the complexities and challenges inherent
in emerging Virtual Reality (VR) technologies requires a multi-pronged approach
that draws on technological innovations, ethical considerations, and societal inclu-
siveness. The integration of these solutions can pave the way for responsible and
sustainable development and usage of VR. Innovations in Technology for Better
Table 2 Complexities and challenges of virtual reality

1. Personality and psychological impact
Complexity: VR experiences have the potential to evoke strong emotional responses, impacting individuals’ personalities and psychological well-being. The immersive nature of VR can influence mood, emotions, and even alter perceptions of self.
Challenge: The challenge lies in understanding how different personality traits interact with VR stimuli. Introverted individuals might experience heightened discomfort in socially interactive VR scenarios [14]. Moreover, prolonged exposure to VR could lead to a phenomenon called “post-VR reality,” where individuals struggle to transition from the virtual to the real world, potentially affecting personality dynamics.

2. Responsibility in content creation
Complexity: The vast diversity of VR content, from educational simulations to entertainment experiences, poses a challenge in maintaining responsible content creation practices. VR content creators face ethical dilemmas in ensuring that their creations are both engaging and free from harmful narratives or stereotypes.
Challenge: Striking the balance between artistic freedom and responsible content creation is a major challenge [2]. Creators must consider the potential impact of their content on users’ beliefs and behaviors. Ethical guidelines and moderation mechanisms need to be established to prevent offensive or inappropriate content from being disseminated.

3. Offenses and ethical dilemmas
Complexity: VR environments can serve as platforms for both positive and negative interactions. Just as in any digital space, there is a potential for offensive behavior, harassment, and even virtual “crimes” within the VR landscape.
Challenge: Determining the appropriate response to offenses committed within virtual environments presents an ethical challenge. While the impact might be less tangible than in the physical world, emotional harm and psychological distress are still very real [16]. Establishing a system for reporting and addressing such incidents, while respecting the unique nature of virtual interactions, is essential.

4. Laws and legal frameworks
Complexity: The fast-paced development of VR technology often outpaces the establishment of legal frameworks to address related challenges. VR can blur the lines between virtual and real, making it difficult to apply existing laws to novel situations.
Challenge: Legislators and legal experts grapple with the challenge of crafting laws that adequately address offenses and disputes arising within VR environments. Intellectual property rights, data privacy concerns, and liability issues are just a few of the legal aspects that need clarification [1]. A proactive approach that anticipates potential legal challenges is crucial to ensuring a fair and just virtual space.
8 Conclusion
Virtual reality, the newest frontier, has enormous potential and presents previously
unheard-of capabilities for communication, entertainment, and education. However,
it is impossible to ignore the complexity and difficulties brought on by VR technology.
VR researchers and developers encounter a variety of challenges, from technical
difficulties such as system requirements and latency to user-experience issues such as
comfort and content adaptation. The situation is further complicated by ethical worries
about privacy, psychological effects, and societal ramifications.
A multidisciplinary strategy involving cooperation between engineers, psycholo-
gists, ethicists, and legislators is required to address these complexities and obstacles.
Finding creative solutions to these problems will be essential for realizing the full
promise of virtual reality while guaranteeing its ethical and fair inclusion into our
lives as the VR ecosystem develops. The new virtual reality landscape is complex and
difficult in terms of technology, user experience, ethics, and societal issues. Researchers
and developers must manage a variety of challenges, including demanding hardware
requirements, latency considerations, user comfort, content adaptation, and ethical
issues. Innovative answers to these problems will be essential to realizing the full
promise of virtual reality while guaranteeing its responsible incorporation into our
daily lives as VR technology develops.
References
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 99
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_9
100 D. Shukla and A. Pandey
1 Introduction
For how long does “seeing is believing” apply? Images and videos currently carry
a lot of probative weight in society. They are regarded as prima facie proof that
the reported occurrence actually took place. The introduction of so-called “deepfake”
videos could alter that. Advances in artificial intelligence have made it possible to
produce videos that appear to be real but actually contain fabricated actions and
statements attributed to real people [1].
Deepfakes will inevitably surface in court settings as AI applications increasingly
permeate our lives, and the evidence presented in civil matters and criminal trials
will come to include material generated by this uncanny technology [2]. This article
discusses the effects of deepfakes on courtroom procedure, along with the authenticity
and admissibility of such electronic evidence in the Indian context.
Fortunately, the phenomenon of falsification and evidence tampering is not new to
courts. To help weed out fakes, the rules of evidence have long established authen-
tication requirements. We contend that these standards, as they stand, are insufficient
to combat deepfakes, and that the standard for authenticating video evidence should be
strengthened to meet the challenges posed by AI-based technology.
Although Indian courts have previously dealt with inauthentic evidence presented
before them, the challenge posed by this AI-based technology should not be viewed
through the same lens. Moreover, no published court opinion in India considers the
issues of AI admissibility in any depth.
Emerging technologies are enablers of a better future; they are empowering notions
that hold the power to shape the future of humanity. The advent of Artificial
Intelligence (hereinafter referred to as AI) has, through advances in data science, given
birth to numerous other branches of technology. Deepfakes are part of this family tree
and are considered a form of synthetic media. A deepfake is a technique that uses AI
to recreate, from audio-visual cues, a synthetically augmented video or image of a real
person, making them appear to act or speak in ways they never did in reality. Through
machine learning, the underlying algorithms and systems process large collections of
data so that a face, body, or other visual attribute can be regenerated on screen in a
way that seems real but is not [3].
Challenges to Admissibility and Reliability of Electronic Evidence … 101
Deepfakes derive from deep learning, a branch of AI, and the technology works on
the basis of neural networks. The neural networks used in deep learning are typically
built from input/output structures. The algorithm consists of two related components,
known as the generator and the discriminator. These components matter because
together they distinguish fake content from real. The generator creates the fake
content, while the discriminator tries to tell the faked features apart from real ones,
in effect judging the material’s authenticity. After detecting the tell-tale features, the
discriminator reports back to the generator so that the fake content can be refined
further and brought ever closer in line with the real instance. The system thus improves
itself through this feedback: the input receives more weight the closer the generated
picture or content comes to the real image; essentially, it acts as a scale of success
measuring the degree of correctness.
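The generator-discriminator feedback loop just described can be sketched in a few lines of Python. The sketch below is a deliberately toy stand-in, not a real deepfake model: the "generator" is a single affine map on one-dimensional noise, the "discriminator" is a logistic regression, and the "real data" is a Gaussian; all names and hyperparameters are our own assumptions. What it demonstrates is only the adversarial feedback: the discriminator learns to separate real from generated samples, and its verdict is the signal that pulls the generator's output toward the real distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def real_batch(n):
    # Stand-in for genuine footage: samples from N(4, 1.25)
    return rng.normal(4.0, 1.25, size=n)

# Generator g(z) = w*z + b; discriminator d(x) = sigmoid(a*x + c).
w, b = 1.0, 0.0   # generator parameters (starts far from the real data)
a, c = 0.0, 0.0   # discriminator parameters
lr = 0.05

for _ in range(2000):
    # Discriminator step: minimize -log d(real) - log(1 - d(fake))
    x_real = real_batch(32)
    x_fake = w * rng.normal(size=32) + b
    d_real = sigmoid(a * x_real + c)
    d_fake = sigmoid(a * x_fake + c)
    grad_a = np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    grad_c = np.mean(-(1 - d_real) + d_fake)
    a -= lr * grad_a
    c -= lr * grad_c

    # Generator step: the discriminator's verdict is the feedback signal;
    # minimize -log d(fake) (the non-saturating generator loss)
    z = rng.normal(size=32)
    x_fake = w * z + b
    d_fake = sigmoid(a * x_fake + c)
    dx = -(1 - d_fake) * a        # loss gradient w.r.t. each fake sample
    w -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

fakes = w * rng.normal(size=1000) + b
print(round(float(np.mean(fakes)), 2))  # should have drifted toward 4
```

With each round the generator's samples become harder for the discriminator to reject, which is exactly the self-improvement loop the chapter describes.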
The underlying technology can overlay face images, create facial motions, switch
faces, maneuver facial expressions, produce faces, and synthesize the speech of
a target individual onto a video of a spokesperson in order to create a video of the
target individual acting like the source person. The resulting impersonation
is often practically indistinguishable from the original [4]. Videos of Barack
Obama [5], Donald Trump [6], Nancy Pelosi [7], Russian President Vladimir Putin
[8], Ukrainian President Volodymyr Zelenskyy [9], Economic Affairs Minister of
Malaysia [10], Tom Cruise [11], Facebook CEO Mark Zuckerberg [12], American
President Richard Nixon [13], and Queen Elizabeth [14] are a few deepfake incidents
that reflect the endless creativity of this technology. India, for the first time, witnessed
the emergence of Deepfake manipulation in the 2020 Delhi Assembly Elections when
a deepfake video of Manoj Tiwari, the state president of Bhartiya Janta Party (BJP),
was widely circulated through WhatsApp [15].
AI-assisted technology may be used to create videos showing corrupt authorities,
atrocities committed by the military, immoral presidential candidates, or emer-
gency professionals warning of a terrorist attack [16]. The diverse domains in which
harm can be caused by deepfake videos or images are best understood from the
Deepfakes Accountability Act [17]. The Act imposes criminal liability where the
technology is used to create a false personation with the intent to
• feature person in sexual activity,
• cause violence or physical harm, incite armed or diplomatic conflict,
• interfere in an official proceeding,
• commit fraud, including securities fraud,
• influence a domestic policy debate,
• interfere in a Federal, state, or territorial election.
The threat of deepfakes is compounded by the fact that they are extraordinarily
convincing, easy to create, and harmful to viewers. Moreover, the realism of deepfake
videos will keep improving over time [18]; detecting fakes will become difficult for
unaided humans, and as a result it will be hard for both people and AI systems
themselves to differentiate real videos from fake ones [19]. This makes the detection
of deepfakes a continuing problem.
Governments all around the world, the tech industry [20], and other stakeholders
have made efforts to develop technology for detecting deepfakes. The main goal of
deepfake detection is to establish the authenticity of video recordings and to determine
whether a video has been manipulated in any manner.
Although the harms associated with the prevalent use of deepfakes strike at many levels
of society, in this research article we confine the discussion to deepfakes as an imminent
threat to courtroom integrity and the investigative process.
Deepfake technology can be used by a complainant to produce fabricated audio-
visual evidence in order to obtain a judgement in their favour; conversely, a defence
lawyer can plant seeds of doubt in the judge’s mind about the authenticity of digital
evidence produced by the other party, even while knowing that the evidence is
genuine. Therefore, even where there are no fake videos, the mere fact that deepfakes
exist will make it more difficult to verify the veracity of actual evidence. In the long
run, this can create bias and scepticism in the minds of judges regarding the
admissibility of audio-visual evidence in general.
So far, there have been two major instances where deepfakes have negatively impacted
legal proceedings.
i. In one case, deepfake evidence was produced in a British court, where the mother
used doctored threatening audio of the father in order to obtain custody of the
child [21].
ii. Another recent example shows how mere allegations of deepfake production
can negatively impact a trial. In March 2021, Raffaela Spone was detained and
charged with harassing her daughter’s cheerleading rivals by reportedly creating
deepfakes that showed them nude, drinking, and vaping. Spone denied creating
any deepfakes. Forensic and technology experts believed the video was real and
not a fabrication, but the poor video quality and the dearth of additional evidence
prevented them from reaching a definitive conclusion. Later, the prosecutor’s
office announced that the lead officer on the case had concluded the video was
fake on a “naked eye” inspection, and that it was no longer pursuing the deepfake
video as the basis for the accusation [22].
The first case shows that, by producing easily fabricated evidence, parties can make
the role of judges burdensome, as judges must take additional measures to determine
the authenticity of the evidence produced. Luckily, in that case the husband was able
to locate the original audio from which the doctored version was made, compare the
metadata, and prove to the court that the audio had been manipulated; but it will not
be so easy for parties and judges in every case to determine the authenticity of
evidence in the absence of the original file and technical aid from experts. Moreover,
the nature of evidence such as video and audio recordings makes it appear so
trustworthy that judges tend to take it at face value.
The second case is a perfect example of how a defense lawyer can manipulate
proceedings by sowing doubt in the minds of judges that even an authentic video is a
deepfake. In that situation, Spone had already suffered harm by the time the
prosecution revised its strategy: she was overwhelmed with unfavorable attention,
and, according to her lawyer, her reputation was destroyed; she received death threats
and was mocked and bullied in her neighborhood and online [23].
In addition to the above two cases, in the United States, deepfake evidence was
produced as proof in defamation cases [24], a federal civil rights action [25], child
pornography [26], and assault with intent to murder [27]. Hence, as the
sophistication of the technology increases over time, deepfake evidence is likely to
become the central focus of litigation [16].
These cases from other jurisdictions are a notable warning for India to frame and
amend laws to meet the pressing exigencies of increasingly sophisticated deepfake
technology. It is only a matter of time before such fake evidence is presented in
India’s district courts, and the adversarial system of justice may well falter before
this technology in the absence of relevant detection systems and expert opinion.
Although existing Indian law contains a procedure for the authentication of digital
evidence, it falls short chiefly because the rules were developed before the emergence
of deepfake technology. Hence, in the digital age, when video and audio recordings
are frequently presented as evidence, there is a need to revisit and amend India’s
evidentiary standards for authenticating video and audio evidence to counter the
impact of deepfake technology.
Under the Information Technology Act, electronic form evidence means “any
information of probative value that is either stored or transmitted in electronic form
and includes computer evidence, digital audio, digital video, cell phones, digital fax
machines”. Hence, the definition is inclusive enough to bring within its ambit videos
or audio created by deepfake technology.
This section analyses whether existing Indian legislation is equipped to address the
challenges of production, identification, and admission of deepfake electronic
evidence in the courtroom.
Section 85B of the Indian Evidence Act creates a presumption as to the alteration of
an ‘electronic record’. It is significant that the term ‘electronic record’ includes
within its definition ‘electronic evidence’. ‘Electronic record’ is defined under
Sect. 2(t) of the Information Technology Act as “data, record or data generated,
image or sound stored, received or sent in an electronic form or micro film or
computer-generated micro fiche”. Although the definition does not specifically
mention video, videos are nothing but data stored in electronic form; hence it is
difficult to exclude electronic evidence from the ambit of electronic records.
Section 85B provides that “In any proceedings involving a secure electronic
record, the Court shall presume unless contrary is proved, that the secure electronic
record has not been altered since the specific point of time to which the secure status
relates”. Moreover, sub-Sect. 2 of Sect. 85B provides that, “nothing in this section
shall create any presumption, relating to authenticity and integrity of the electronic
record, except in the case of a secure electronic record”.
This section is relevant in the context of deepfake technology manipulation and
two different conclusions can be drawn from the interpretation of this section:
i. Firstly, it provides that if the electronic record is secure, the court shall presume
that there is no manipulation or alteration in the evidence produced, unless
proved otherwise. But if the electronic evidence is not secure, there is no
presumption of any nature, positive or negative. Hence, as mentioned earlier,
the face value that judges accord to video and audio recordings is not something
supported by law; it rests instead on judges’ belief and human psychology,
owing to how trustworthy such evidence appears.
ii. Secondly, the law demarcates between an ‘electronic record’ and a ‘secure
electronic record’. The definition of ‘secure electronic record’ is, surprisingly,
found in two different places under the Information Technology Act regime:
Sect. 14 of the IT Act and Rule 3 of the Information Technology (Security
Procedure) Rules, 2004. Section 14 provides that “Where any security procedure
has been applied to an electronic record at a specific point of time, then such
record shall be deemed to be a secure electronic record from such point of time
to the time of verification”. Rule 3 of the Information Technology (Security
Procedure) Rules, 2004 provides that “An electronic record shall be deemed to
be a secure electronic record for the purposes of the Act if it has been
authenticated by means of a secure digital signature”. Read together, these
provisions indicate that an electronic record, to be considered secure, must be
digitally signed.
In sum, if any party to a case produces digital evidence for consideration before the
court, the court shall presume that the digital or electronic evidence is not fabricated
only if that evidence is authenticated by means of a secure digital signature.
Otherwise, the court shall make no presumption regarding authenticity.
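The two-track logic of Sect. 85B (a presumption of integrity only for secured records) can be sketched in Python. This is a simplified illustration, not the statutory mechanism: real secure digital signatures under the IT Act rest on public-key certificates, whereas the sketch uses a keyed HMAC from the standard library as a stand-in, and the key and function names are our own assumptions.

```python
import hmac
import hashlib

# Hypothetical signing key; a real secure digital signature would instead
# use a certified public/private key pair.
SIGNING_KEY = b"hypothetical-signing-key"

def sign(record):
    """Produce the 'secure digital signature' stand-in for a record."""
    return hmac.new(SIGNING_KEY, record, hashlib.sha256).hexdigest()

def presumption(record, signature=None):
    """Return the presumption a court would draw under Sect. 85B."""
    if signature is None:
        # 85B(2): no presumption at all for an unsecured electronic record
        return "no presumption"
    if hmac.compare_digest(sign(record), signature):
        # 85B(1): a secure record is presumed unaltered unless the contrary is proved
        return "presumed unaltered"
    # Failed verification: the record is not 'secure', so again no presumption
    return "no presumption"

clip = b"raw bytes of an audio-visual recording"
sig = sign(clip)
print(presumption(clip, sig))            # presumed unaltered
print(presumption(clip + b"x", sig))     # no presumption
print(presumption(clip))                 # no presumption
```

The point of the sketch is the asymmetry: verification either attaches a rebuttable presumption of integrity or attaches nothing at all; it never presumes fabrication.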
Securing electronic evidence by means of a digital signature seems like a simple
option: it would confirm that a particular scene was actually captured by a camera
rather than digitally fabricated. But it is neither a viable nor a practical solution, and
it would raise numerous further issues, as discussed below:
i. If, to counter deepfake technology, the concept of digital signatures in videos
were enforced through law, then every camera device of any specification, be it a
mobile camera, professional camera, or spy camera, would have to be embedded
with this verification technology by its manufacturer. Manufacturers might not
consent to adding a digital-signature feature to their cameras because of privacy
concerns: integrating such a feature could fulfil any state’s dream of surveillance
[30].
ii. Another technical problem with integrating digital signatures into camera
devices is that the technology does not work well with videos. Changing the
format of a video from MPEG4 to MPEG2 completely changes the hash value
of the original video, thus suggesting fabrication even when none has occurred.
John Collomosse, professor of computer vision at the University of Surrey and
project leader for Archangel, states: “Publishers run their document through a
cryptographic algorithm such as SHA256, MD5, or Blowfish, which produces a
‘hash,’ a brief string of bytes that represents the content of that file and serves as
its digital signature. Running the same file through the hashing algorithm at any
time will produce the same hash if its contents haven’t changed. Hashes are
extremely sensitive to changes in the source file’s binary structure. When you
change only one byte in the hashed file and rerun the procedure, the outcome
would be completely different. But while hashes work well for text files and
applications, they present challenges for videos, which can be stored in different
formats” [31].
iii. The solution is also not viable because recorded videos often depict only the
conclusion of an incident, not the events leading up to it, and may circulate with
an incorrect description. (Social media users may post videos showing police
handcuffing and shooting a suspect in the leg, then claim that the man was an
unarmed, innocent pedestrian and a victim of drive-by police brutality.) Adding
digital signatures to cell-phone cameras would not address this common source
of false videographic narrative, since the issue is not whether the footage is real
or fake, but whether it captures the entire situation and whether the description
assigned to it reflects what the video actually depicts [31].
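The hash sensitivity Collomosse describes is easy to demonstrate with Python's standard hashlib module; the byte strings below merely stand in for a video file's contents and are our own illustration:

```python
import hashlib

# Bytes standing in for a video file; the content is illustrative only.
original = b"\x00\x01\x02 frame data of some recording \x03\x04"

# Re-saving the "same" footage in another container changes the bytes;
# here we mimic that by flipping a single bit of one byte.
reencoded = bytearray(original)
reencoded[0] ^= 0x01

h_original = hashlib.sha256(original).hexdigest()
h_same = hashlib.sha256(original).hexdigest()
h_changed = hashlib.sha256(bytes(reencoded)).hexdigest()

assert h_original == h_same     # unchanged bytes: identical digest
assert h_original != h_changed  # one-bit change: completely different digest
print(h_original[:12], h_changed[:12])
```

This is exactly why a hash computed over a video's raw bytes breaks under a harmless format conversion, even though the depicted content is unchanged.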
Section 136 of the Indian Evidence Act mentions that “When either party proposes to
give evidence of any fact, the Judge may ask the party proposing to give the evidence
in what manner the alleged fact, if proved, would be relevant; and the Judge shall
admit the evidence if he thinks that the fact, if proved, would be relevant, and not
otherwise”. Therefore, before determining the admissibility of electronic records,
judges must decide upon the relevancy of the electronic record in question.
As per Sect. 3 of the Indian Evidence Act, “One fact is said to be relevant to
another when the one is connected with the other in any of the ways referred to in
the provisions of this Act relating to the relevancy of facts”. Deepfake videos and
audio can easily be made relevant under any of the principles of the Indian Evidence
Act: for example, a fake video or audio clip may be presented as evidence to show
motive, intention, or state of mind, to depict bad character, to create estoppel, to
impeach the credit of a witness, or in other circumstances that cannot be anticipated,
as the two cases discussed above have shown.
After determining the relevancy of the evidence produced, the court shall look into
the procedural aspect of admission of electronic evidence, i.e., whether the evidence
produced in question is admissible or not. The admissibility of electronic evidence
is greatly affected by reliability and authenticity. Authentication entails persuading
the court that (a) the record’s contents have not changed, (b) the information in the
record actually came from its alleged source, whether a person or a machine, and
(c) extraneous information, like the record’s apparent date, is accurate. Sections 65A
and 65B were inserted into the Indian Evidence Act of 1872 by amendment in order
to accomplish everything stated above [32].
Any documentary evidence by way of an electronic record under the Evidence
Act, in view of Sects. 59 and 65A, can be proved only in accordance with the
procedure prescribed under Sect. 65B. The admissibility of the electronic record
is covered under Sect. 65B. These clauses are meant to legalize the production of
computer-generated secondary evidence in electronic form.
Section 65B(4) requires the production of a certificate that, among other things,
identifies the electronic record containing the statement, describes how it was created,
and gives specifics of the device used in its creation in order to show that an electronic
record was created using a computer. This certificate must be presented by someone
who is either in charge of the management of the relevant device or operates it in an
official capacity.
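The particulars that Sect. 65B(4) requires of a certificate can be modelled as a simple checklist. The sketch below is our own illustration, not statutory text: the field names, sample values, and completeness check are all assumptions made for the example.

```python
from dataclasses import dataclass, fields

@dataclass
class Section65BCertificate:
    """Checklist of the particulars Sect. 65B(4) asks a certificate to supply.
    Field names are our own shorthand, not statutory language."""
    record_identified: str    # identifies the electronic record and its statement
    manner_of_creation: str   # describes how the record was produced
    device_particulars: str   # specifics of the device used in its creation
    certifier_capacity: str   # official capacity of the person certifying

def facially_complete(cert):
    """True only if every required particular has been supplied."""
    return all(getattr(cert, f.name).strip() for f in fields(cert))

cert = Section65BCertificate(
    record_identified="exported CCTV clip, camera 2, 2023-01-05 (hypothetical)",
    manner_of_creation="automatic recording to DVR, exported to USB drive",
    device_particulars="DVR make, model, and serial number (hypothetical)",
    certifier_capacity="officer in charge of operating the DVR",
)
print(facially_complete(cert))   # True

blank = Section65BCertificate("", "", "", "")
print(facially_complete(blank))  # False
```

Such a checklist only captures facial completeness; as the discussion below argues, even a complete certificate says nothing about whether the underlying "original" was itself fabricated.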
However, in matters where relevant digital evidence may be created by the use of
deepfake technology and produced before the court, Sect. 65B of the Indian Evidence
Act is no longer useful. The main reason for this argument is that deepfakes can be
created in real time [33]. This can be shown in the following two ways:
i. The conditions of Sect. 65B exist to certify that the secondary evidence produced
before the court has not been manipulated relative to the original primary
electronic record. However, deepfake technology can create fake videos and
audio in real time, so the ‘original’ may itself be fake; certifying the authenticity
of secondary evidence under Sect. 65B is then no longer valid or useful.
ii. Secondly, if the fabricated original video or audio (created in real time) is itself
presented in the courtroom, Sect. 65B will not apply at all, as seen in Preeti Jain
vs Kunal Jain & Anr [34], in which the Court held that compliance with
Sect. 65B was not necessary because clippings from the hard disk of a spy
camera constituted primary evidence.
In Anvar P.V. v. P.K. Basheer, the Supreme Court of India held that “opinion of an
examiner of electronic records under Sect. 45A could only be obtained once the
secondary electronic evidence has been produced in compliance with Sect. 65-B”.
The Apex Court opined that “all these safeguards are taken to ensure the source and
authenticity, which are the two hallmarks pertaining to electronic record sought to be
used as evidence. Electronic records being more susceptible to tampering, alteration,
transposition, excision, etc. without such safeguards, the whole trial based on proof of
electronic records can lead to travesty of justice”[35]. However, only if the electronic
record is duly produced in terms of Sect. 65B of the IEA, the question would arise
as to the genuineness thereof, and in that situation resort can be made to Sect. 45A,
IEA - opinion of examiner of electronic evidence.
Section 4.2 enumerates the erosion of the usefulness of Sect. 65B of the Indian Evidence Act in the context of the production of deepfake evidence. Taking the above-mentioned points into consideration, the significance of Sect. 45A increases manifold for determining relevancy, provided we move away from the settled law laid down in the Anvar case.
Section 45A of the IEA deals with the ‘Opinion of Examiner of Electronic Evidence’. It provides that “When in a proceeding, the court has to form an opinion on any matter relating to any information transmitted or stored in any computer resource or any other electronic or digital form, the opinion of the Examiner of Electronic Evidence referred to in Sect. 79A of the Information Technology Act, 2000 (21 of 2000), is a relevant fact”.
108 D. Shukla and A. Pandey
Section 79A of the Information Technology Act provides that “The Central Govern-
ment may, for the purposes of providing expert opinion on electronic form evidence
before any court or other authority specify, by notification in the Official Gazette,
any Department, body or agency of the Central Government or a State Government
as an Examiner of Electronic Evidence”.
A combined reading of these two sections specifies that the Central Government shall notify an authority as an ‘Examiner of Electronic Evidence’, and the opinion of such authority shall be considered relevant before the court as expert opinion.
Although these two sections were added through amendment in 2009, it was only eight years later, in 2017, that the Ministry of Electronics and Information Technology (MeitY) designed the mechanism to assess and notify Examiners of Electronic Evidence [36], and nine years after the amendment, in 2018, the Central Government for the first time notified the Forensic Science Laboratory, Sector 14, Rohini, New Delhi, under the Government of the National Capital Territory of Delhi, as an Examiner of Electronic Evidence within India [37]. However, since that first notification, as of today, a total of 15 Central or State Government agencies have been notified by the Ministry as ‘Examiners of Electronic Evidence’ [38].
If we analyse the scope of work of these 15 agencies, none of them is eligible to provide expert opinions on matters related to deepfake technology. Although there is no doubt about the technical competency of such agencies, the credibility of their expert opinion can be challenged because of the limited scope of activities for which they were notified as competent. This argument is substantiated by the following paragraph.
The scope of approval is outlined in the scheme’s second part. It states that any department, body, or organization of the Central Government or a State Government that wishes to be recognized as an Examiner of Electronic Evidence may submit an application to the Ministry of Electronics and Information Technology (MeitY) for one of the activities listed below:
i. Computer (Media) Forensics
ii. Network (Cyber) Forensics
iii. Mobile Devices Forensics
iv. Digital Video/Image & CCTV Forensics
v. Digital Audio Forensics
vi. Device Specific Forensics
vii. Digital Equipment/Machines (having embedded firmware)
viii. Any other.
Even though there are eight specific areas of activity in which any forensic lab or agency can be notified by MeitY, all 15 such agencies, interestingly, were notified as Examiner of Electronic Evidence in only two areas of activity, namely:
Firstly, the Bharatiya Sakshya Bill must recognise that there are technical and procedural roadblocks to the presumption of any secure electronic evidence under Sect. 85B of the Indian Evidence Act. With the advancement of the technological wings of the AI giant, the P.V. Anvar judgment appears technically infeasible, as the requirements for admissibility of evidence under Sect. 65B of the Indian Evidence Act cannot be met in the case of deepfake evidence. Lastly, the Central Government must notify a competent Forensic Science Laboratory that can technically determine the authenticity of deepfake evidence produced anywhere in India and, if needed, may take help from software giants in the field, whose expert opinion could be trusted in the court of law.
The Supreme Court has often acknowledged that electronic records are more prone to manipulation, alteration, transposition, excision, and other errors, and that without such safeguards, the whole trial based on proof of electronic records can lead to a travesty of justice.
Electronic Evidence produced through deepfake technology poses an even bigger
threat to the already established legislative safeguards and potentially challenges the
justice delivery system to the core. Examples cited by the United States Department of Homeland Security [41] are proof that deepfake issues must be taken seriously and deserve due attention from legislators.
Indian law lags behind in tackling the potential harms associated with the emerging arms of AI’s technological capabilities that produce deepfakes. Law is necessary to address the difficulties brought about by new social developments. The concept and application of law must evolve together with society if it is to remain relevant.
The Bharatiya Sakshya Bill, 2023, which was recently introduced in the Parliament of India, failed to amend or evolve any law relating to electronic evidence in light of the changing contours of society. The proposed Digital India Bill, a draft of which has long been pending, hints at replacing the 22-year-old Information Technology Act. It acknowledges the concerns related to the challenges posed by deepfake technology but provides no solid solution yet.
The Digital India Bill provides hope that the legislature will recognise that AI-assisted technologies should be treated differently from the age-old ways of fabricating information. This is the right time for the Bharatiya Sakshya Bill, the Bharatiya Nyaya Sanhita, and the Bharatiya Nagarik Suraksha Sanhita to be integrated with the Digital India Bill, to define the word ‘deepfake’, and to contain a separate chapter on the harms associated with deepfakes.
India has not yet witnessed a case in which a deepfake video or audio was presented before a court; it may also be that such evidence was produced in a court of law at the district level and caught no one’s attention due to unawareness of the subject matter, because ‘seeing is believing’.
Appendix A
References
1. Pfefferkorn R (2020) “Deepfakes” in the courtroom. Public Interest Law J 245–275
2. Grimm PW, Grossman MR, Cormack GV (2021) Artificial intelligence as evidence. Northwestern J Technol Intell Prop 10–105
3. Cover R (2022) Deepfake culture: the emergence of audio-video deception as an object of social anxiety and regulation. J Media Cultural Studies 4–12
4. Gaur L (2023) DeepFakes creation, detection, and impact. Taylor & Francis
5. Business Insider India. https://www.businessinsider.in/tech/a-video-that-appeared-to-show-
obama-calling-trump-a-dipsh-t-is-a-warning-about-a-disturbing-new-trend-called-deepfakes/
articleshow/63807263.cms. Last accessed 02 Sept 2023
6. The Guardian. https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-
of-the-deepfake-and-the-threat-to-democracy. Last accessed 02 Sept 2023
7. Forbes. https://www.forbes.com/sites/charlestowersclark/2019/05/31/mona-lisa-and-nancy-
pelosi-the-implications-of-deepfakes/?sh=5e46695e4357. Last accessed 02 Sept 2023
8. MIT Technology review. https://www.technologyreview.com/2020/09/29/1009098/ai-dee
pfake-putin-kim-jong-un-us-election/. Last accessed 02 Sept 2023
9. NPR. https://www.npr.org/2022/03/16/1087062648/deepfake-video-zelenskyy-experts-war-
manipulation-ukraine-russia. Last accessed 02 Sept 2023
36. Government of India, Ministry of Electronics & Information Technology (MeitY), Scheme
for Notifying Examiner of Electronic Evidence under Section 79A of Information Technology
Act 2000, https://www.meity.gov.in/writereaddata/files/annexure-i-pilot-scheme-for-notify
ing-examiner-of-electronic-evidence-under-section-79a-of-the-information-technology-act-
2000.pdf. Last accessed 13 Sept 2023
37. Ministry of Electronics & Information Technology Notification, https://www.meity.gov.in/wri
tereaddata/files/12.eGazetteeNotification_FSL%20Rohini_Delhi.pdf. Last accessed 13 Sept
2023
38. Notification of Forensic labs as ‘Examiner of Electronic Evidence’ under Section 79A of
the Information Technology Act 2000. https://www.meity.gov.in/notification-forensic-labs-
‘examiner-electronic-evidence’-under-section-79a-information-technology. Last accessed 13
Sept 2023
39. Dr. Nilay Mistry, Associate Professor, School of Cyber Security & Digital Forensics, National
Forensic Science University, Gandhinagar, India
40. RTI Application No. DITEC/A/E/23/00017
41. Homeland Security, Increasing Threat of Deepfake Identities, https://www.dhs.gov/sites/def
ault/files/publications/increasing_threats_of_deepfake_identities_0.pdf. Last accessed 13 Sept
2023
An In-Depth Exploration of Anomaly
Detection, Classification,
and Localization with Deep Learning:
A Comprehensive Overview
Abstract The ability to identify trends in data when one set of data deviates from another is called anomaly detection, a key task in data mining. Anomaly detection has made it possible to identify and prevent malware, as well as several other unlawful practices. Traditional detection strategies have shown strong results. However, as deep learning has progressed, important findings have emerged over the past few years. In order to summarize existing and the most cutting-edge fraud and intrusion detection strategies, we organize these approaches according to the type of neural network involved, from deep to shallow. This paper provides an analysis of the published techniques for anomaly detection, especially on the contribution of deep learning to detection. Methods were sorted according to the kind of DNN included in this study.
K. U. Singh (B)
School of Computer Science, University of Petroleum and Energy Studies (UPES),
Dehradun 248007, India
e-mail: [email protected]
A. Kumar · G. Kumar
Department of Computer Science & Engineering, Symbiosis Institute of Technology, Symbiosis
International University, Lavale Campus, Pune, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
Department of Computer Science & Engineering, Symbiosis Institute of Technology, Symbiosis
International University, Lavale Campus, Pune, India
e-mail: [email protected]; [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 115
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_10
116 K. U. Singh et al.
These classes helped us to categorize the deep learners by how often they have been used for data representation and for differentiating between various types of anomalies. In addition, deep neural networks applied to specific anomaly detection tasks presented incontrovertible proof of their effective implementation.
1 Introduction
Anomaly detection is widely applied to both computer and social security; an illustration of the latter is the analysis of financial and banking details, where unconventional pattern recognition enables unexpected trends to be identified. When unexpected activity occurs on a computer network, it is possible that sensitive data is being sent to an unauthorized location. Since various actions could have been taken in the payment card data entry phase, there is a greater chance of fraud. Anomalies in a spacecraft’s internal equipment can result in an error [1]. Changes in pixel intensity in unpredictable locations can show the existence of potentially deadly tumors. Any behaviors that deviate from the planned actions are classified as deviant and have the potential to hold new attacks and tumors at bay. Figure 1 shows some instances of where abnormalities are discovered: commercial fraud identification according to Ref. [1] covers banks, lending institutions, telecommunications, and capital exchanges, among others. Some intrusion detection programs aim to track illegal behavior inside a data system, while others also target non-malicious behavior for defensive purposes [2]. Misuse detection and anomaly analysis are two separate forms of outlier detection [3]. Misuse identification is limited to known attacks and routines and is evaded by novel (or abnormal) actions [4]. Because the environment is volatile, we concentrate on the identification of deviations, owing to the fact that this can accommodate the unpredictable existence of data shifts.
The author of Ref. [5] notes that anomalies have been defined in various ways. Anomalies are stated by the authors of Ref. [6] to be observations that appear doubtful in the eyes of the analyst. The term anomaly as found in Grubbs’ work [7] is defined as a statistically aberrant observation that differs from the rest of the population. An anomaly was characterized by Heron [8] as being highly deviant or challenging to pattern. Similarly, another work concluded that an anomaly is an observation inconsistent with the rest of the results. According to Ref. [9], an exception is something that sticks out among the others and thus becomes suspicious. The writer of Ref. [10] describes an exception as anything that is distinct from the rest of the points. All of these definitions operate under the same principle. Thus, we define our system as follows: there will often be data in a data set that is only slightly out of the ordinary; there are, in reality, just two kinds of data in a data set: ordinary and exceptional.
The theory contains three major aspects of detection: those relating to the intrinsic characteristics of the signal, the number and types of features that the device identifies, and the form of output that is yielded [11]. Data can be one- or multidimensional, representing many kinds of instances such as objects, documents, points, and patterns, among others [12]. Anomalies conform to three definitions: point, contextual, and collective [13]. Detection can be categorized into three forms: supervised, semi-supervised, and unsupervised [14]. Detected abnormalities are returned in the form of scores or labels [15]. Finally, various methodologies, such as statistical processing, machine learning, information theory, and spectral theory, are applied to the task of identifying anomalies [16].
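The scores-versus-labels distinction can be sketched with a simple statistical scorer (a z-score is our illustrative choice here, not a method from the cited works): a continuous score is produced for every observation, and a threshold turns the scores into binary normal/anomalous labels.

```python
import numpy as np

def anomaly_scores(x):
    """Continuous output: a z-score per observation (higher = more anomalous)."""
    return np.abs(x - x.mean()) / x.std()

def anomaly_labels(x, threshold=3.0):
    """Binary output: threshold the scores into normal/anomalous labels."""
    return anomaly_scores(x) > threshold

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 1000), [25.0]])  # one planted outlier
labels = anomaly_labels(data)
assert labels[-1]  # the planted outlier is flagged
```

Score output preserves a ranking of suspiciousness, which an analyst can inspect; label output commits to a decision, which is what a deployed detector ultimately needs.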
Among machine learning techniques, deep learning has become very popular in the scientific community, due to the very good results achieved in dissimilar topics such as image processing, face recognition, digit recognition, and text classification [17]. These results encourage us to use this method in our studies and experiments.
Algorithms based on the deep learning technique are motivated by the field of
artificial intelligence and try to emulate the cognitive ability of the human brain
[18]. These algorithms commonly make use of the data structure known as the
neural network [19], to which modifications have been made creating new types
of networks destined to work with different types of data or specific functionalities.
Among these new structures, we can mention: Autoencoders (AEs), Deep Neural
Networks (DNN) [20], Restricted Boltzmann Machines (RBM) [21], the Deep Belief
Networks (DBN) [22], the Convolutional Neural Networks (CNN) [23], and the
Recurrent Neural Networks (RNN) [24]. Although these structures are different,
they are all neural networks because they maintain the basic structure of neurons, layers, and connections between neurons using linear and nonlinear activation functions. Deep architectures combine many activation and representation layers trained on many samples [25], which allows them to process complicated data at increasing levels of abstraction [26]. These networks may be used individually, but greater efficiency is achieved when they are used in combination. One of the most commonly employed methods is the GAN (generative adversarial network), which comprises a generator network and a discriminator network: the generator produces samples in the training data space, making them difficult for the discriminator to classify. This adversarial relationship between the two networks achieves a simultaneous optimization through a two-player minimax game.
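The two-player game can be illustrated with a deliberately tiny numerical sketch (one-dimensional data, a shift-only generator, and a logistic discriminator — all simplifications ours, not from any cited work): each side takes a gradient step on its own objective in turn.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Real data: N(4, 1). Generator: g(z) = z + theta, a single learnable shift.
# Discriminator: logistic regression D(x) = sigmoid(w*x + b).
theta, w, b = 0.0, 0.0, 0.0
lr = 0.05

for _ in range(2000):
    real = rng.normal(4.0, 1.0, 64)
    fake = rng.normal(0.0, 1.0, 64) + theta

    # Discriminator ascends log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * ((1 - d_real) * real - d_fake * fake).mean()
    b += lr * ((1 - d_real) - d_fake).mean()

    # Generator ascends the non-saturating objective log D(fake).
    d_fake = sigmoid(w * fake + b)
    theta += lr * ((1 - d_fake) * w).mean()

# theta should have moved from 0 towards the real mean of 4.
```

The generator improves only through the discriminator's gradient, which is exactly why, at equilibrium, the discriminator can no longer separate the two sample streams.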
The purpose of this document is to survey deep learning methods that could be useful for anomaly detection, in particular novel approaches to the prevention and detection of fraud and malicious software. This reflects the current state of the art of our understanding of the topic.
This section reviews the most recent work related to anomaly detection, specifically fraud and intrusion detection, based on deep learning techniques [43–48]. It begins with a brief explanation of how anomaly detection methods operate. Feature extraction is the starting point of the execution flow in anomaly detection methods. The data are then represented so that the algorithm can distinguish between usual and irregular situations. The trained models are then used to make predictions on future data.
2 Fraud Detection
According to the Association of Fraud Examiners, occupational fraud involves the use of one’s job for personal gain through the fraudulent misuse of company property [27]. In addition, the Concise Oxford Dictionary describes fraud as criminal deception. There are two methods to deter fraud: prevent it, or catch it as it occurs [28]. Fraud prevention means stopping fraud before it happens, whereas fraud detection means identifying it as quickly as possible once it has been perpetrated [29]. Credit card theft, cell phone fraud, insurance premium fraud, and stock dealing fraud have been widely researched [30] (Fig. 2).
AEs have been very useful for unsupervised fraud detection, which is why they have been used in several studies [31]. A method based on a cost-sensitive learning approach was proposed where a type of AE known as Stacked Denoising Autoencoders (SDAE) is used [32] to identify fraudulent transactions in a financial
fraud detection problem. In this work, a basic selection of instances is carried out
in the characteristic extraction step, taking into account the number of non-null
attributes of the transactions. In addition, they introduce a modification to the cost
function of the SDAE in order to minimize the cost of misclassification. In this
way, fraudulent transactions are identified effectively and efficiently. The authors
of [33] proposed a method for detecting credit card fraud. This method consists of
Fig. 2 Flow of an anomaly detection method taking credit card transactions as data
classifying a bank transfer request in real time using an AE, which is trained to take
into account the information of transactions carried out previously. The authors of
[34] propose three methods for the detection of fraud in banking transactions using AEs. Among the three combinations, the first is an AE for feature extraction plus a traditional classifier, and the other two are AE–AE and AE–SDAE combinations under the GAN strategy, where the first network acts as the feature extractor and the other as the classifier. In Ref. [35] a method is proposed that uses an AE in the feature
extraction step and follows a GAN strategy for fraud detection. In this work, an AE
is used to achieve a representation of non-malicious users taking into account their
activity online. They then generate another fictitious representation of non-malicious
users using a DNN that is used as the generating network for the GAN. Finally, using
another DNN (known as the GAN discriminator), the method learns to identify real non-malicious users. In this way, by processing the actual data, the method is able to
separate non-malicious users from the rest. Another work presented a testing platform for payment card abuse, adding a new characteristic, based on the entropy gain over time, to the inquiry. The authors generated a feature matrix from seven classical features, which they then used to derive this entropy. Oversampling is chosen for this project because of the existing class imbalance in the data; this is addressed by repopulating the minority-class transactions. This increase in weighted data aims to avoid over-training the network towards only one data type. These feature matrices are used as input to a CNN, which aims to classify transactions as abnormal or normal.
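The reconstruction-error idea underlying the AE-based detectors above can be sketched minimally (this is a generic illustration, not any specific cited system, and the data and dimensions are invented): an autoencoder trained only on "normal" vectors reconstructs them well, so a high reconstruction error flags a record as suspicious.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for transactions: "normal" records lie near a 2-D subspace
# of a 6-D feature space; fraudulent records do not.
basis = np.array([[1, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 1]], dtype=float)

def normal_batch(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 6))

# Linear autoencoder with tied weights W (6 -> 2 -> 6), trained only on
# normal data by gradient descent on the reconstruction error.
W = 0.1 * rng.normal(size=(6, 2))
lr = 0.01
for _ in range(3000):
    x = normal_batch(32)
    err = x @ W @ W.T - x                    # reconstruction minus input
    W -= lr * (x.T @ err @ W + err.T @ x @ W) / len(x)

def recon_error(v):
    return np.square(v @ W @ W.T - v).sum(axis=1)

normal_err = recon_error(normal_batch(200)).mean()
fraud_err = recon_error(2.0 * rng.normal(size=(200, 6))).mean()
assert fraud_err > normal_err                # fraud reconstructs poorly
```

Thresholding `recon_error` then yields the abnormal/normal decision without ever needing labeled fraud examples at training time.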
There are other networks, Restricted Boltzmann Machines (RBMs), that have been considered for detecting fraud [36]. A technique built in the study [37] uses an RBM to classify credit card fraud: applying RBMs to previous transaction history verifies bank transfers in real time. The authors of the work [38] carry out a comparative study between some traditional classification methods (multinomial logistic regression, multilayer perceptron, and support vector machine) and a method based on a DBN with an RBM. This work showed the superior efficiency of the RBM-based method for the classification of credit card fraud.
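As an illustration of how an RBM can score transactions, here is a minimal sketch (our own toy example, not the method of Refs. [37, 38]): a tiny binary RBM trained with one-step contrastive divergence, with free energy used as the anomaly score. Patterns resembling the training data receive lower free energy; all sizes and patterns are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid = 8, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))
vb = np.zeros(n_vis)
hb = np.zeros(n_hid)

# Two "legitimate transaction" binary patterns the RBM should learn.
data = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)

lr = 0.1
for _ in range(2000):
    v0 = data
    ph0 = sigmoid(v0 @ W + hb)                         # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    pv1 = sigmoid(h0 @ W.T + vb)                       # reconstruct visible
    ph1 = sigmoid(pv1 @ W + hb)
    # One-step contrastive-divergence (CD-1) updates.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    vb += lr * (v0 - pv1).mean(axis=0)
    hb += lr * (ph0 - ph1).mean(axis=0)

def free_energy(v):
    """Lower free energy means the RBM assigns higher probability."""
    return -(v @ vb) - np.log1p(np.exp(v @ W + hb)).sum(axis=-1)

seen = free_energy(data).mean()
unseen = free_energy(np.array([[1, 0, 1, 0, 1, 0, 1, 0]], dtype=float)).mean()
assert seen < unseen  # trained patterns score as less anomalous
```

Because free energy is computable without sampling, it makes a cheap real-time score for incoming transactions.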
In [39], a framework was proposed for the detection of fraud in auto insurance
through a combination of a text mining technique based on LDA [40], categorical data
information and numerical data, as well as a DNN. In this framework, a word segmentation technique is used for text processing, and an LDA model for the extraction of topics from segmented texts. From these topics and the categorical and numerical information, the features that are passed to the DNN are built so that it learns from them. In this way, it is identified whether an auto accident claim is fraudulent.
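The feature-construction step such a framework implies can be sketched as concatenating topic proportions, a one-hot categorical field, and standardized numeric fields into a single DNN input vector. All field names and values below are invented for illustration.

```python
import numpy as np

def build_features(topic_dist, category, categories, numeric, num_means, num_stds):
    """Concatenate LDA topic proportions, a one-hot categorical field, and
    standardized numeric fields into one input vector for a DNN."""
    one_hot = np.zeros(len(categories))
    one_hot[categories.index(category)] = 1.0
    standardized = (np.asarray(numeric) - num_means) / num_stds
    return np.concatenate([topic_dist, one_hot, standardized])

x = build_features(
    topic_dist=np.array([0.7, 0.2, 0.1]),            # from an LDA model
    category="collision", categories=["collision", "theft", "fire"],
    numeric=[12000.0, 3.0],                          # claim amount, prior claims
    num_means=np.array([8000.0, 1.0]), num_stds=np.array([4000.0, 2.0]))
assert x.shape == (8,)  # 3 topics + 3 categories + 2 numeric fields
```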
3 Intrusion Detection
Computer security services prioritize the early identification of interference with data devices [41]. This early identification helped create intrusion systems that prevent further assaults from causing substantial harm [42]. Attacks include probing and denial-of-service assaults, among others. When a denial-of-service attack floods all services and the computer’s bandwidth with false requests, no one can access network resources. Vulnerability detectors (probes) find exploitable holes. In an R2L attack, packets are sent from a remote machine to a system on which the attacker has no account, in order to gain local access. U2R attacks start with a user account and advance to device (root) control.
An intrusion detection and defence approach is proposed in Ref. [43]. It identifies DoS-type assaults by learning encodings of the attacks. The authors argued that deep AEs may drop the decoder without affecting quality; these modified AEs were named non-symmetric deep autoencoders (NDAEs). The final representation structure uses two NDAEs in a chain, with the outputs of one going into the other. After representing the data using this chain of NDAEs, a Random Forest classifier is employed to detect intrusions. AEs can also identify U2R-type assaults [44]. SDAEs are utilized to represent data with minimal dimensionality in this study. The SDAE has three AEs in a chain that were trained unsupervised and then fine-tuned; attacks were identified using a Softmax classifier. Similarly, in Ref. [45], an SDAE reduces the data and a support vector machine classifies traffic network threats, as on the PU-IDS dataset [46]. A greedily trained deep AE is used in Ref. [47].
The layers are trained greedily to avoid overfitting and local optima, which helps them categorize assaults efficiently.
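The denoising objective used by SDAEs can be sketched as follows (a single numpy layer with invented toy data, not the cited architecture): the input is corrupted with noise, but the reconstruction target is the clean input, which forces the hidden layer to capture robust structure.

```python
import numpy as np

rng = np.random.default_rng(42)
basis = rng.normal(size=(3, 8))      # toy low-rank structure for the data

def batch(n):
    """Toy stand-in for correlated traffic records (8-D, rank 3)."""
    return rng.normal(size=(n, 3)) @ basis

# One denoising autoencoder layer: corrupt the input, reconstruct the CLEAN
# version. W1 encodes (8 -> 3), W2 decodes (3 -> 8), tanh hidden units.
W1 = 0.1 * rng.normal(size=(8, 3))
W2 = 0.1 * rng.normal(size=(3, 8))
lr = 0.005

def loss_and_grads(x):
    noisy = x + 0.3 * rng.normal(size=x.shape)   # corruption step
    h = np.tanh(noisy @ W1)
    err = h @ W2 - x                             # compare with clean input
    g2 = h.T @ err / len(x)
    dh = (err @ W2.T) * (1.0 - h ** 2)           # backprop through tanh
    g1 = noisy.T @ dh / len(x)
    return np.square(err).mean(), g1, g2

first_loss, _, _ = loss_and_grads(batch(256))
for _ in range(4000):
    _, g1, g2 = loss_and_grads(batch(64))
    W1 -= lr * g1
    W2 -= lr * g2
final_loss, _, _ = loss_and_grads(batch(256))
assert final_loss < first_loss                   # denoising improves with training
```

Stacking several such layers, each trained greedily on the previous layer's hidden codes, gives the SDAE representation the cited works feed into their classifiers.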
DNNs may also be employed for intrusion prevention. To minimize threat propagation, such a network was tested in attack classification systems using DDoS (distributed denial of service) methods [48]. The hidden layers employ an appropriate number of neurons, determined by a swarm-based genetic algorithm, which enhances network learning. A probabilistic neural network classifier identifies each network assault type. A probabilistic classifier is also employed at the end of a DBN network to fine-tune data classification in [49]; the hidden and output layer neurons were double the quantity required in the previous study.
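A probabilistic neural network classifier of the kind mentioned can be sketched with Parzen-window class densities; the toy data and bandwidth below are our own choices.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Probabilistic neural network: each training sample is a Gaussian
    kernel unit; the class score is the average kernel activation per class."""
    d2 = np.square(train_X - x).sum(axis=1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    classes = np.unique(train_y)
    scores = np.array([k[train_y == c].mean() for c in classes])
    return classes[scores.argmax()]

# Hypothetical 2-D features for two attack classes.
X = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
assert pnn_classify(np.array([0.2, 0.1]), X, y) == 0
assert pnn_classify(np.array([4.8, 5.2]), X, y) == 1
```

Because each training pattern becomes one hidden unit, a PNN needs no iterative training, which is one reason probabilistic classifiers suit the fine-tuning role described above.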
The authors of [50] used CNNs for intrusion detection. CNNs combined with sequential data modeling were tested to analyze and classify all network assaults [51]. CNN-RNN, CNN-LSTM, and CNN-GRU combinations were utilized; the best combination was a CNN-LSTM with a three-layer CNN. A CNN variation called dilated convolutional autoencoders (DCA) is proposed in [54], which uses stacked autoencoders and CNNs. This study uses convolution and deconvolution to decompose data, and dilated convolutional layers replace the pooling layers in this network. A Softmax layer was used to fine-tune this variant for attack categorization, as the variant itself does not require labeled data for training.
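Dilated convolution itself is easy to state precisely; a small sketch (our own, independent of [54]) shows how spacing the kernel taps enlarges the receptive field without adding parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D convolution whose taps are spaced `dilation` steps apart,
    enlarging the receptive field without extra parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1            # receptive field length
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(10, dtype=float)
# With dilation 2, each output mixes inputs that are 2 steps apart.
assert np.allclose(dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2),
                   x[:-2] + x[2:])
```

Stacking layers with dilations 1, 2, 4, … grows the receptive field exponentially, which is why dilated layers can stand in for pooling while keeping the full temporal resolution.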
Gated recurrent computation is also used for intrusion detection [55]. The authors employ gating and activation functions in an RNN to avoid knowledge loss when the gradient approaches 0. The findings are presented with a multi-layer perceptron and an RNN-LSTM modification; the method detects DDoS, injection, and malware attacks. Tang et al. [56] integrated an RNN with a sequential data modeling approach; they efficiently categorize software network intrusion assaults using a GRU-RNN. An RNN was also employed as a classifier without modification to identify various invasions. They further combined an RNN with a sequential data model and an LSTM using a stochastic gradient descent optimizer, then optimized it using a Nadam optimizer.
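A single GRU step, the gated recurrent computation these works rely on, can be sketched in numpy (the weights here are random placeholders, not trained values): the update gate decides how much of the previous state survives, which is what lets information persist over long sequences.

```python
import numpy as np

def gru_cell(x, h, params):
    """One GRU step: gates decide how much past state to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = [rng.normal(scale=0.5, size=(n_hid, n_in)) if i % 2 == 0
          else rng.normal(scale=0.5, size=(n_hid, n_hid)) for i in range(6)]
h = np.zeros(n_hid)
for _ in range(20):                           # run over a short random sequence
    h = gru_cell(rng.normal(size=n_in), h, params)
```

Since the new state is a convex combination of the old state and a tanh-bounded candidate, the hidden activations stay in [-1, 1], and the `(1 - z)` path gives gradients a direct route backwards through time.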
4 Discussion
Owing to its usefulness in forecasting network threats, deep learning has seen increasing adoption. Many implementations in pattern recognition and data mining have taken advantage of this form of learning, which has increased the effectiveness of techniques for spotting anomalies. When choosing a detection tool, one must consider the origin of the deviations, which may change while a process is being developed. Solving the detection issue requires a great deal of adaptability, which deep neural networks provide. The different models of deep neural networks have previously shown the capacity to find anomalies. A large number of these ventures have utilized neural networks for dimensionality reduction, network fraud prevention, and detecting network spoofing, among other features.
On the other hand, it can be mentioned that the GAN strategy has been little used despite the good results achieved with it. This is due to the complexity involved in its implementation and training. However, combinations of various types of networks were shown, as in the works where a deep AE and an SDAE are combined, and where an AE is combined with a DNN to separate non-malicious users.
5 Conclusions
Most of the articles in this study utilize deep or stacked autoencoders. With respect to data classification, most of these AEs were employed for binary classification, while only a few were used for rule creation. With respect to the above, there is not enough data to justify multi-class identification for this form of intrusion. Thus, it can be concluded that substantial advances are possible in data mining, especially in anomaly detection, which can be made through supervised and
References
1. Studiawan H, Sohel F, Payne C (2020) Anomaly detection in operating system logs with deep
learning-based sentiment analysis. In: IEEE transactions on dependable and secure computing.
https://doi.org/10.1109/TDSC.2020.3037903
2. Ahmed, Sajan KS, Srivastava A, Wu Y (2021) Anomaly detection, localization and clas-
sification using drifting synchrophasor data streams. In: IEEE transactions on smart grid. https://
doi.org/10.1109/TSG.2021.3054375
3. Ahn H (2020) Deep learning based anomaly detection for a vehicle in swarm drone system. In:
2020 international conference on unmanned aircraft systems (ICUAS), Athens, Greece, 2020,
pp 557–561. https://doi.org/10.1109/ICUAS48674.2020.9213880
4. Park H, Park D-H, Kim S-H (2020) Deep learning-based method for detecting anomalies
of operating equipment dynamically in livestock farms. In: 2020 international conference
on information and communication technology convergence (ICTC), Jeju, Korea (South), pp
1182–1185. https://doi.org/10.1109/ICTC49870.2020.9289351
5. Naseer S et al (2018) Enhanced network anomaly detection based on deep neural networks.
IEEE Access 6:48231–48246. https://doi.org/10.1109/ACCESS.2018.2863036
6. Garg S, Kaur K, Kumar N, Rodrigues JJPC (2019) Hybrid deep-learning-based anomaly detec-
tion scheme for suspicious flow detection in SDN: a social multimedia perspective. IEEE Trans
Multimedia 21(3):566–578. https://doi.org/10.1109/TMM.2019.2893549
7. Munir M, Chattha MA, Dengel A, Ahmed S (2019) A comparative analysis of traditional
and deep learning-based anomaly detection methods for streaming data. In: 2019 18th IEEE
international conference on machine learning and applications (ICMLA), Boca Raton, FL,
USA, 2019, pp 561–566. https://doi.org/10.1109/ICMLA.2019.00105
8. Qian K, Jiang J, Ding Y, Yang S (2020) Deep learning based anomaly detection in water distri-
bution systems. In: 2020 IEEE international conference on networking, sensing and control
(ICNSC), Nanjing, China, 2020, pp 1–6. https://doi.org/10.1109/ICNSC48988.2020.9238099
9. Zhang G, Qiu X, Gao Y (2019) Software defined security architecture with deep learning-
based network anomaly detection module. In: 2019 IEEE 11th international conference on
communication software and networks (ICCSN), Chongqing, China, 2019, pp 784–788. https://
doi.org/10.1109/ICCSN.2019.8905304
10. Dong Y, Wang R, He J (2019) Real-time network intrusion detection system based on deep
learning. In: 2019 IEEE 10th international conference on software engineering and service
science (ICSESS), Beijing, China, 2019, pp 1–4. https://doi.org/10.1109/ICSESS47205.2019.
9040718
11. Kavousi-Fard, Dabbaghjamanesh M, Jin T, Su W, Roustaei M (2020) An evolutionary deep
learning-based anomaly detection model for securing vehicles. In: IEEE transactions on
intelligent transportation systems. https://doi.org/10.1109/TITS.2020.3015143
12. Garg S, Kaur K, Kumar N, Kaddoum G, Zomaya AY, Ranjan R (2019) A hybrid deep learning-
based model for anomaly detection in cloud datacenter networks. IEEE Trans Netw Serv
Manage 16(3):924–935. https://doi.org/10.1109/TNSM.2019.2927886
13. Fernández Maimó L, Perales Gómez ÁL, García Clemente FJ, Gil Pérez M, Martínez Pérez G
(2018) A self-adaptive deep learning-based system for anomaly detection in 5G networks.
IEEE Access 6:7700–7712. https://doi.org/10.1109/ACCESS.2018.2803446
An In-Depth Exploration of Anomaly Detection, Classification … 123
14. Li X, Chen P, Jing L, He Z, Yu G (2020) SwissLog: robust and unified deep learning based log
anomaly detection for diverse faults. In: 2020 IEEE 31st international symposium on software
reliability engineering (ISSRE), Coimbra, Portugal, 2020, pp 92–103. https://doi.org/10.1109/
ISSRE5003.2020.00018
15. Alrawashdeh K, Purdy C (2018) Fast activation function approach for deep learning based
online anomaly intrusion detection. In: 2018 IEEE 4th international conference on big data
security on cloud (BigDataSecurity), IEEE International Conference on High Performance
and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and
Security (IDS), Omaha, NE, USA, 2018, pp 5–13. https://doi.org/10.1109/BDS/HPSC/IDS18.
2018.00016
16. Dong L, Zhang Y, Wen C, Wu H (2016) Camera anomaly detection based on morphological
analysis and deep learning. In: 2016 IEEE international conference on digital signal processing
(DSP), Beijing, China, 2016, pp 266–270. https://doi.org/10.1109/ICDSP.2016.7868559
17. Lee W-Y, Wang Y-CF (2020) Learning disentangled feature representations for anomaly
detection. In: 2020 IEEE international conference on image processing (ICIP), Abu Dhabi,
United Arab Emirates, 2020, pp 2156–2160. https://doi.org/10.1109/ICIP40778.2020.9191201
18. Manimurugan S, Al-Mutairi S, Aborokbah MM, Chilamkurti N, Ganesan S, Patan R (2020)
Effective attack detection in internet of medical things smart environment using a deep
belief neural network. IEEE Access 8:77396–77404. https://doi.org/10.1109/ACCESS.2020.
2986013
19. Fernández GC, Xu S (2019) A case study on using deep learning for network intrusion detection.
In: MILCOM 2019—2019 IEEE Military Communications Conference (MILCOM), Norfolk,
VA, USA, 2019, pp 1–6. https://doi.org/10.1109/MILCOM47813.2019.9020824
20. Lin M, Zhao B, Xin Q (2020) ERID: a deep learning-based approach towards efficient real-
time intrusion detection for IoT. In: 2020 IEEE eighth international conference on communi-
cations and networking (ComNet), Hammamet, Tunisia, pp 1–7. https://doi.org/10.1109/Com
Net47917.2020.9306110
21. Haselmann M, Gruber DP, Tabatabai P (2018) Anomaly detection using deep learning based
image completion. In: 2018 17th IEEE international conference on machine learning and appli-
cations (ICMLA), Orlando, FL, USA, pp 1237–1242. https://doi.org/10.1109/ICMLA.2018.
00201
22. Malaiya RK, Kwon D, Suh SC, Kim H, Kim I, Kim J (2019) An empirical evaluation of deep
learning for network anomaly detection. IEEE Access 7:140806–140817. https://doi.org/10.
1109/ACCESS.2019.2943249
23. Haider S, Akhunzada A, Ahmed G, Raza M (2019) Deep learning based ensemble convolutional
neural network solution for distributed denial of service detection in SDNs. In: 2019 UK/China
emerging technologies (UCET), Glasgow, UK, 2019, pp 1–4. https://doi.org/10.1109/UCET.
2019.8881856
24. Miau S, Hung W-H (2020) River flooding forecasting and anomaly detection based on deep
learning. IEEE Access 8:198384–198402. https://doi.org/10.1109/ACCESS.2020.3034875
25. Potluri S, Diedrich C (2019) Deep learning based efficient anomaly detection for securing
process control systems against injection attacks. In: 2019 IEEE 15th international conference
on automation science and engineering (CASE), Vancouver, BC, Canada, 2019, pp 854–860.
https://doi.org/10.1109/COASE.2019.8843140
26. Abeyrathna D, Huang P, Zhong X (2019) Anomaly proposal-based fire detection for cyber-
physical systems. In: 2019 international conference on computational science and computa-
tional intelligence (CSCI), Las Vegas, NV, USA, 2019, pp 1203–1207. https://doi.org/10.1109/
CSCI49370.2019.00226
27. Ma N, Peng Y, Wang S, Liu D (2018) Hyperspectral image anomaly targets detection with
online deep learning. In: 2018 IEEE international instrumentation and measurement technology
conference (I2MTC), Houston, TX, USA, 2018, pp. 1–6. https://doi.org/10.1109/I2MTC.2018.
8409615
28. Ding K, Ding S, Morozov A, Fabarisov T, Janschek K (2019) On-line error detection and
mitigation for time-series data of cyber-physical systems using deep learning based methods.
124 K. U. Singh et al.
In: 2019 15th european dependable computing conference (EDCC), Naples, Italy, 2019, pp
7–14. https://doi.org/10.1109/EDCC.2019.00015
29. Ma X, Shi W (2020) AESMOTE: adversarial reinforcement learning with SMOTE for anomaly
detection. IEEE Trans Netw Sci Eng. https://doi.org/10.1109/TNSE.2020.3004312
30. Maggipinto M, Beghi A, Susto GA (2019) A deep learning-based approach to anomaly detection
with 2-dimensional data in manufacturing. In: 2019 IEEE 17th international conference on
industrial informatics (INDIN), Helsinki, Finland, 2019, pp 187–192. https://doi.org/10.1109/
INDIN41052.2019.8972027
31. Fang X et al (2020) Sewer pipeline fault identification using anomaly detection algorithms
on video sequences. IEEE Access 8:39574–39586. https://doi.org/10.1109/ACCESS.2020.297
5887
32. Aygün RC, Yavuz AG (2017) A stochastic data discrimination based autoencoder approach for
network anomaly detection. In: 2017 25th signal processing and communications applications
conference (SIU), Antalya, Turkey, 2017, pp 1–4. https://doi.org/10.1109/SIU.2017.7960410
33. Hussain B, Du Q, Ren P (2018) Deep learning-based big data-assisted anomaly detection in cellular
networks. In: 2018 IEEE global communications conference (GLOBECOM), Abu Dhabi,
United Arab Emirates, 2018, pp 1–6. https://doi.org/10.1109/GLOCOM.2018.8647366
34. Marsiano FD, Soesanti I, Ardiyanto I (2019) Deep learning-based anomaly detection on surveil-
lance videos: recent advances. In: 2019 international conference of advanced informatics:
concepts, theory and applications (ICAICTA), Yogyakarta, Indonesia, 2019, pp 1–6. https://
doi.org/10.1109/ICAICTA.2019.8904395
35. Togo R, Saito N, Ogawa T, Haseyama M (2019) Estimating regions of deterioration in electron
microscope images of rubber materials via a transfer learning-based anomaly detection model.
IEEE Access 7:162395–162404. https://doi.org/10.1109/ACCESS.2019.2950972
36. Nie L, Zhao L, Li K (2020) Glad: global and local anomaly detection. In: 2020 IEEE interna-
tional conference on multimedia and expo (ICME), London, UK, pp 1–6. https://doi.org/10.
1109/ICME46284.2020.9102818
37. Miller J, Wang Y, Kesidis G (2018) Anomaly detection of attacks (ADA) on DNN classifiers at
test time. In: 2018 IEEE 28th international workshop on machine learning for signal processing
(MLSP), Aalborg, Denmark, 2018, pp 1–6. https://doi.org/10.1109/MLSP.2018.8517069
38. Perera P, Patel VM (2019) Learning deep features for one-class classification. IEEE Trans
Image Process 28(11):5450–5463. https://doi.org/10.1109/TIP.2019.2917862
39. Salama R, Al-Turjman F, Bordoloi D, Yadav SP (2023) Wireless sensor networks and green
networking for 6G communication—an overview. In: 2023 international conference on compu-
tational intelligence, communication technology and networking (CICTN), Ghaziabad, India,
2023, pp 830–834. https://doi.org/10.1109/CICTN57981.2023.10141262
40. Aygun RC, Yavuz AG (2017) Network anomaly detection with stochastically improved autoen-
coder based models. In: 2017 IEEE 4th international conference on cyber security and cloud
computing (CSCloud), New York, NY, USA, 2017, pp 193–198. https://doi.org/10.1109/CSC
loud.2017.39
41. Masood U, Asghar A, Imran A, Mian AN (2018) Deep learning based detection of sleeping
cells in next generation cellular networks. In: 2018 IEEE global communications conference
(GLOBECOM), Abu Dhabi, United Arab Emirates, 2018, pp 206–212. https://doi.org/10.1109/
GLOCOM.2018.8647689
42. Qin Y, Wei J, Yang W (2019) Deep learning based anomaly detection scheme in software-
defined networking. In: 2019 20th Asia-Pacific network operations and management symposium
(APNOMS), Matsue, Japan, 2019, pp 1–4. https://doi.org/10.23919/APNOMS.2019.889
2873
43. Mishra M, Sarkar T, Choudhury T et al (2022) Allergen30: detecting food items with possible
allergens using deep learning-based computer vision. Food Anal Methods 15:3045–3078.
https://doi.org/10.1007/s12161-022-02353-9
44. Sayyad S, Kumar S, Bongale A, Kotecha K, Abraham A (2023) Remaining useful-life predic-
tion of the milling cutting tool using time–frequency-based features and deep learning models.
Sensors 23:5659. https://doi.org/10.3390/s23125659
45. Choudhury T, Aggarwal A, Tomar R (2020) A deep learning approach to helmet detection
for road safety. J Sci Ind Res (India) 79(June):509–512
46. Rajendran A et al (2022) Detecting extremism on Twitter during U.S. Capitol Riot using deep
learning techniques. IEEE Access 10:133052–133077. https://doi.org/10.1109/ACCESS.2022.
3227962
47. Natarajan B et al (2022) Development of an end-to-end deep learning framework for sign
language recognition, translation, and video generation. IEEE Access 10:104358–104374.
https://doi.org/10.1109/ACCESS.2022.3210543
48. Khanna A, Sah A, Choudhury T (2020) Intelligent mobile edge computing: a deep learning
based approach. In: Singh M, Gupta P, Tyagi V, Flusser J, Ören T, Valentino G (eds) Advances
in computing and data sciences. ICACDS 2020. Communications in Computer and Information
Science, vol 1244. Springer, Singapore. https://doi.org/10.1007/978-981-15-6634-9_11
Comparative Analysis of Docker Image
Files Across Various Programming
Environments
K. U. Singh (B)
School of Computing, Graphic Era Hill University, Dehradun, India
e-mail: [email protected]
A. Kumar · G. Kumar
Department of Computer Engineering & Applications, GLA University, Mathura, UP, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi,
Dehradun, Uttarakhand 248007, India
e-mail: [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_11
run the same lines of code on a variety of different platforms. The major aim was to
improve the feasibility of conducting a thorough comparison study. Software engi-
neers were given the ability to evaluate and compare results by exploiting Docker’s
capabilities, which finally enabled them to arrive at well-informed conclusions about
the most appropriate course of action. This research gives the software development
community useful insights into the benefits and considerations of using Docker across
multiple platforms, paving the way for more streamlined development and deployment.
1 Introduction to Docker
1.1 Docker
Docker is a platform that is both open and free to use, and it was created with the
intention of simplifying the whole of the software application lifecycle. This includes
the phases of development, distribution, and operation. Docker's features let you
compartmentalise your apps separately from their underlying infrastructure [1, 2].
This separation enables a
procedure that is both quick and efficient in the deployment of applications. Docker
ushers in a new paradigm in which the administration of your infrastructure should
reflect the strategy you use for the management of your apps. Utilising Docker’s
optimised procedures, which include quick code delivery, testing, and deployment,
results in a considerable decrease in the amount of time that elapses between the
production of new code and its actual application in the real world. Containers are the
fundamental building block of Docker's architecture [3]. Containers are standardised
units that package software together with all of the essential dependencies needed for
smooth application execution. Notably, containers keep the execution contexts of various
programmes separate while still allowing them to share essential components of the
operating system, which is a significant benefit [4, 5]. These containers, often measured
in megabytes, use fewer resources than conventional virtual machines (VMs), and their
starting times are much faster. Because of their high level of efficiency,
they may be tightly packed into the same piece of hardware and can be collec-
tively started or terminated with a minimum amount of work and overhead [6, 7].
Building software components into contemporary application and service stacks is
made much easier with the foundation that containers offer. These stacks are crucial
in the modern commercial world. Additionally, they simplify the process of regularly
updating and maintaining the system with a high level of granularity. Both Tianshuo
Yang (2019) and Yao (2018) agree that Docker is a helpful tool for creating and
deploying software, which is more evidence of the platform’s value.
After you have carefully created your Dockerfile, the next step is to use the docker
build tool to materialise an image based on the blueprint defined inside that
Dockerfile [11].
The resultant Docker image is a self-contained entity housing the specifications for
the software components that containers will run, as well as dictating how these
components work together. The Dockerfile serves as a roadmap, instructing the build
process on how to piece the image together.
Docker images take on the function of portable files, which enables the settings of
applications to be moved across different types of environments without any difficulty
[12]. Because the Dockerfile often contains instructions to fetch software packages
from online repositories, great care must be taken to specify the correct versions.
Ignoring this can result in unintended differences between the images produced,
depending on when the docker build process is invoked.
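To illustrate the version concern described above, a minimal Dockerfile sketch (the base image and package version here are hypothetical, not taken from the chapter) pins versions explicitly rather than relying on whatever the online repositories serve at build time:

```dockerfile
# Hypothetical sketch: pin versions so that repeated invocations of
# `docker build` produce the same image.
# Pinned base-image tag instead of a floating tag such as "latest":
FROM python:3.11.4-slim
# Pinned package version instead of an unpinned `pip install requests`:
RUN pip install requests==2.31.0
```

With every version fixed, the image produced is independent of when docker build is invoked.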
Once created, an image undergoes no further transformation; Docker images are
immutable, as astutely pointed out [13]. The dynamic journey started by the
Dockerfile culminates in a static but powerful artefact that encapsulates the core of the
application’s design as well as its requirements. This artefact is then prepared to be
instantiated into containers for the purpose of efficient deployment and operation
[14].
2 Experiment
In this part of the article, we will begin an investigation into the performance of
Docker by digging into several instances in which different technologies provide
different results for the same desired purpose. In the Docker ecosystem, the procedure
begins with the creation of an image file, which is then deposited into a repository.
When the process moves to a different computer, the image file is pulled, preparing
the way for the operation to be carried out. CPU utilisation, the size of the image
file, and the total number of lines of code are three of the many aspects that come
into play across the various technologies.
The complex dynamic between all of these factors is a contributor to the wide range
of possible outcomes. This research focuses on doing a comparative examination
of the programming languages Python and Java within the context of the Docker
environment in order to shed light on the relative strengths and weaknesses of each
language [15]. Our work is set to shed light on the complex performance differ-
ences, which are driven by the technology used, and give insights into how Docker
interacts with various languages. We want to achieve this goal by doing painstaking
research on facets such as computing efficiency, image size, and code complexity
in an effort to understand the complicated relationships that exist between Docker
and other technological options [16]. In the end, this examination should enhance
our knowledge of Docker’s flexibility as well as the subtleties that influence its inter-
action with a variety of technologies, which will highlight the intricacies of current
software development and deployment.
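The workflow just described — build an image, push it to a repository, pull it on another machine, and run it — can be sketched with standard Docker CLI commands (the image name and registry below are invented for illustration):

```shell
# On the first machine: build the image and push it to a registry.
docker build -t myregistry.example.com/demo:1.0 .
docker push myregistry.example.com/demo:1.0

# On the second machine: pull the image file and run it.
docker pull myregistry.example.com/demo:1.0
docker run myregistry.example.com/demo:1.0
```

docker build, push, pull, and run are standard Docker commands; only the names are assumptions.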
A methodical procedure is required for the generation of an image file for Java while
working inside the Docker framework. The process of creating an image begins
with the drafting of a Java program, stored with the .java extension. The resulting
image file provides an environment in which the Java program can run in its
entirety, containing all of the necessary components [17, 18]. The Java runtime
environment, crucial libraries, auxiliary files, and, of course, the Java source code
itself are all considered to be part of the environment. The word "environment"
[28–32] encompasses the whole constellation of these important components. This
conglomeration of requirements and resources comes together to generate a unified
image file in the Java ecosystem, which is then ready for deployment in the Docker
ecosystem. A tangible example of this may be seen when a Java file is created, which
is often given a designation such as “f100.” This critical phase lays the groundwork
for the eventual generation of the image file, which will be referred to throughout
the process as the "f100" image [19]. The figure that accompanies this explanation
provides a visual representation of the process, which captures the spirit of the Java-
centric image development that occurs inside the Docker ecosystem. This technique,
in its most basic form, encompasses the transformation of a Java program into
an independent entity that is optimised for the containerisation offered by Docker.
This method results in the Docker image being an enclosed artefact that is ready to
run the Java program while also coordinating the necessary runtime components
and dependencies (Fig. 1).
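As a sketch of the Java image creation just described (the base image and build steps are assumptions; the chapter does not show its actual Dockerfile), a Dockerfile for a program stored as f100.java containing a class f100 might look like:

```dockerfile
# Hypothetical Dockerfile for the "f100" Java program.
FROM openjdk:17
WORKDIR /app
# Copy the source into the image and compile it.
COPY f100.java .
RUN javac f100.java
# Run the compiled class when a container starts.
CMD ["java", "f100"]
```

Running docker build -t f100 . in the directory containing this file would then yield the "f100" image referred to in the text.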
The visual representation provides a vivid insight into the image file named "f100."
It is identified by its image ID, 5166a3ba961b, and has a substantial size of 514 MB.
Notably, the image bears the default tag "latest," signifying its current iteration.
The illustration also depicts the outcome of executing this image file, initiated with
the command "docker run [image ID]"; notably, only the first four digits of the image
ID are needed to trigger execution, provided they identify the image uniquely. This
portrayal offers an at-a-glance understanding of the interplay of Docker images, tags,
and execution.

Python likewise provides its own self-contained environment, streamlined so that
Python files execute without issue. To match the results obtained with Java, we
developed a Python file customised to produce the same output. Once the Python
program runs, it is used to build a corresponding image file [21, 22], which is then
stored in a repository. This image file carries a unique image ID that serves as its
identifier, along with a specific size that indicates its scale. As with Java, the default
tag applied to this image file is "latest," marking the most recent version available in
the repository [23, 24]. This description highlights the symmetry between the Java
and Python image-creation processes: Python offers a robust environment optimised
for running Python files, while image files play a central role within the Docker
architecture. The comparison deepens our understanding of Docker's adaptability in
supporting a variety of programming languages and of the relationships at work in
modern software development and deployment (Fig. 2).
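A corresponding sketch for the Python side (again hypothetical; the chapter does not show its Dockerfile, and the file name f100.py is an assumption) needs no compilation step:

```dockerfile
# Hypothetical Dockerfile for the equivalent Python program.
FROM python:3
WORKDIR /app
COPY f100.py .
CMD ["python", "f100.py"]
```

Once built, the image's ID, tag, and size can be inspected with docker images, and its CPU and memory usage while running with docker stats — the kinds of measurements compared below.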
Our investigation up to this point has brought to light two unique image files: the
“f100” file, which is designed for Java, and the “python” file, which is designed for
Python. Despite their different underlying languages, both image files produced the
same output, demonstrating an exciting
convergence of results across a variety of different programming environments.
During our in-depth analysis, we scrutinised a range of important parameters,
including CPU utilisation, image size, the number of lines of code (LOC), and
memory usage. The
use of this analytical approach enables us to get insights into the performance of
Python and Java within the context of Docker [25]. Using the information that we
have collected, we will now give a detailed comparison.
The measurements gathered for the two image files can be summarised as follows:

Factor                    Python     Java
Utilisation of the CPU    20%        25%
LOC                       Less       More
Size of the image         855 MB     515 MB
Runtime utilisation       12%        8%

The insights obtained from this table reveal fascinating comparisons and differences
between Python and Java inside the Docker environment. Python has a
lesser number of Lines of Code (LOC), which translates to scripts that are more
succinct, while Java often exhibits a larger number of LOC. The image size, on
the other hand, reveals a fascinating paradox: despite Python's more succinct script
style, Python's image size (855 MB) is much greater than Java's (515 MB). Python's
runtime utilisation is also somewhat higher than Java's, while Python's CPU
utilisation shows a modest edge.
This exhaustive comparison, in essence, highlights the complicated interaction
that exists between programming languages, image features, and the complex envi-
ronment that Docker provides [26, 27]. This realisation provides the way for informed
decisions to be made in aligning particular technology choices with the needs of
individual applications, which in turn propels contemporary software development
methods ahead.
3 Conclusion
Using the Docker technology enables the building of all-encompassing runtime envi-
ronments that are easily portable across different computer systems. During the
course of this endeavour, a striking realisation became apparent: the enormous
quantity of files produced as a consequence. This finding naturally sparks a question:
what lies behind the differing file sizes produced by comparable program outputs written
in different programming languages? As a result, the Java and Python program-
ming languages, which both provide identical results, are going to be investigated
throughout this research. When programmers are given the option of using a different
language, an in-depth comparative examination of the whole system follows, with
the goal of determining which programming language comes out on top. This inves-
tigation began with a concentration on file size, but it has now expanded to include
a wide variety of auxiliary elements that have an effect on the overall programming
environment.
The voyage of study dives into a deep investigation, illuminating the complex
dynamic that exists between programming languages and the Docker environment.
In addition to simple concerns about file size, the research identifies a variety of other
aspects that contribute to the formation of the holistic programming environment.
This analysis provides programmers with essential insights that empower them to
make educated decisions, which eventually helps to advance the status of current
software development techniques. These insights are provided by unravelling the
complexities that control the selection of programming languages inside Docker’s
domain.
4 Future Scope
References
18. Pittard WS, Li S (2020) The essential toolbox of data science: Python, R, Git, and Docker.
Methods Mol Biol 2104:265–311. https://doi.org/10.1007/978-1-0716-0239-3_15
19. Yadav DP, Kishore K, Gaur A, Kumar A, Singh KU, Singh T, Swarup C (2022) A novel
multi-scale feature fusion-based 3SCNet for building crack detection. Sustainability 14:16179
20. Rahman M, Chen Z, Gao J (2015) A service framework for parallel test execution on a devel-
oper’s local development workstation. In: Proceedings—9th IEEE international symposium on
service-oriented system engineering, IEEE SOSE 2015, vol. 30, pp 153–160. https://doi.org/
10.1109/SOSE.2015.45
21. Ruan B, Huang H, Wu S, Jin H (2016) A performance study of containers in cloud environ-
ment. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics), vol. 10065 LNCS, pp 343–356. https://doi.
org/10.1007/978-3-319-49178-3_2
22. Saklani R, Purohit K, Vats S, Sharma V, Kukreja V, Yadav SP (2023) Multicore implementation
of K-means clustering algorithm. In: 2023 2nd international conference on applied artificial
intelligence and computing (ICAAIC), Salem, India, 2023, pp 171–175. https://doi.org/10.
1109/ICAAIC56838.2023.10140800
23. Ramon-Cortes C, Serven A, Ejarque J, Lezzi D, Badia RM (2018) Transparent orchestration of
task-based parallel applications in containers platforms. J Grid Comput 16(1):137–160. https://
doi.org/10.1007/s10723-017-9425-z
24. Sochat V (2018) The scientific filesystem. GigaScience 7(5). https://doi.org/10.1093/gigasc
ience/giy023
25. Shukla A (2015) A modified bat algorithm for the quadratic assignment problem. In: 2015
IEEE congress on evolutionary computation (CEC), Sendai, Japan, 2015, pp 486–490. https://
doi.org/10.1109/CEC.2015.7256929
26. Sipek M, Muharemagic D, Mihaljevic B, Radovan A (2020) Enhancing performance of cloud-
based software applications with GraalVM and quarkus. In: 2020 43rd international convention
on information, communication and electronic technology, MIPRO 2020—Proceedings, pp
1746–1751. https://doi.org/10.23919/MIPRO48935.2020.9245290
27. Špaček F, Sohlich R, Dulík T (2015) Docker as platform for assignments evaluation. Proc
Eng 100:1665–1671. https://doi.org/10.1016/j.proeng.2015.01.541
28. Singh BK, Danish M, Choudhury T, Sharma DP (2021) Autonomic resource management in
a cloud-based infrastructure environment. In: Choudhury T, Dewangan BK, Tomar R, Singh
BK, Toe TT, Nhu NG (eds) Autonomic computing in cloud resource management in industry
4.0. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.
org/10.1007/978-3-030-71756-8_18
29. Ahmad F et al (2022) Levelized multiple workflow allocation strategy under precedence
constraints with task merging in IaaS cloud environment. IEEE Access 10:92809–92827.
https://doi.org/10.1109/ACCESS.2022.3202651
30. Jain D, Zaidi N, Bansal R, Kumar P, Choudhury T (2018) Inspection of fault tolerance in cloud
environment. In: Bhateja V, Nguyen B, Nguyen N, Satapathy S, Le DN (eds) Information
systems design and intelligent applications. Advances in intelligent systems and computing,
vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_103
31. Dewangan BK, Agarwal A, Choudhury T, Pasricha A (2021) Workload aware autonomic
resource management scheme using grey wolf optimization in cloud environment. IET
Commun 15(14):1869–1882
32. Pachouly J et al (2022) SDPTool: a tool for creating datasets and software defect predic-
tions. SoftwareX 18:101036. https://doi.org/10.1016/j.softx.2022.101036
Dimensions of ICT-Based Student
Evaluation and Assessment
in the Education Sector
R. Arulmurugan , P. Balakrishnan, N. Vengadachalam,
and V. Subha Seethalakshmi
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_12
138 R. Arulmurugan et al.
1 Introduction
Teaching and learning activities are enhanced year by year. In earlier days a
teacher-centric approach prevailed, but it was slowly replaced by the student-centric
approach. The teacher-centric approach has many drawbacks: communication is
one-sided, students are afraid or shy to ask questions in class, and the teacher does
not understand the learners' level. Due to these drawbacks, exam results suffered.
Outcome Based Education (OBE) then came into the picture, requiring an expected
outcome to be defined for each programme, course, event, and so on. The outcome
of an event must be written before it begins, so that the organizer and guest lecturer
can shape the content of the delivery and activity. Previously, only the objective of
a course appeared in the syllabus and programme events: the instructor defined
objectives for each event, course, and programme, and problems were solved against
those objectives. That method failed to identify the students' learning level or the
attainment of the course objective; OBE overcomes this.
The OBE clearly instructed to define the outcome of the course, and the outcome of
the program in terms of Programme Outcome (PO) and program program-specific
outcome (PSO). Based on PO and PSO define the Programme Educational Objective
of the concerned programme. Followed the PEO to define the department’s vision
and mission. It creates one closed circle. The Attainment of the PO and PSO is calcu-
lated from direct and indirect attainment. The indirect attainment is calculated from
the guest lecture and workshop feedback, parent’s feedback, graduate feedback or
programme exit survey, alumni survey, employer feedback, recruiter feedback, etc.
This feedback or survey contains the PO and PSO to the question. The participant
points the value between three to one. From the analysis calculate the indirect attain-
ment. The direct attainment calculated from the summation of all the courses for
the concerned batch students covers semester one to semester eight of theory and
practical courses. From the attainment of direct give, the weightage of 80% or 90%
and indirect for 20% or 10%. Through the calculation coming to know the concerned
programme outcome level.
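The weighted combination just described can be sketched as a short calculation. This is an illustrative sketch only: the function name and the sample attainment values are hypothetical, and only the 80%/20% (or 90%/10%) split is taken from the text.

```python
def overall_attainment(direct: float, indirect: float,
                       direct_weight: float = 0.8) -> float:
    """Combine direct and indirect PO/PSO attainment.

    direct, indirect: attainment levels on the 1-3 scale described above.
    direct_weight: 0.8 or 0.9 as per the text; the indirect weight is
    the remainder (0.2 or 0.1).
    """
    return direct_weight * direct + (1 - direct_weight) * indirect

# Hypothetical example: direct attainment 2.5 (from course results,
# semesters one to eight), indirect attainment 2.0 (from surveys).
level = overall_attainment(2.5, 2.0)   # 0.8*2.5 + 0.2*2.0 = 2.4
print(round(level, 2))
```

Passing `direct_weight=0.9` gives the alternative 90%/10% split mentioned in the text.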
2 Literature Survey
Students can be developed and engaged through experiential learning. SFIMAR carried out a few case studies of experiential teaching pedagogy; experiential learning stimulates the learner's curiosity and enhances the level of understanding [1]. The learner-centric approach is at the forefront: NEP 2020 suggests critical thinking for employability and fostering experiential learning as an objective, and learner practice is enhanced by experiential learning. Recent NAAC and NBA accreditations expect experiential learning and ICT-based learning activities [2]. Education conducted in a project manner, instead of memorizing data, creates a higher level of cross-cultural awareness and understanding. The experiential learning activity
answers. In addition, with which one correct answer, etc. ICT-based assessment on one side reduces time consumption in the classroom; on the other side, the teacher needs to spend more than an hour preparing the questions and answers on the ICT tool's website. These days a lot of free tools are offered for conducting ICT-based assessment, such as Quizizz, Google Forms add-ons such as Fibonacci, the Moodle quiz, Kahoot, etc. The Moodle quiz has various features, namely the possibility to shuffle the questions, an option to avoid repeating questions, time-based closing of the form, etc. Quizizz plays background music while students participate in the activity, which acts as a boost rather than letting participation become vague. On the teacher's screen, Quizizz shows the score values second by second, so it is easy to identify candidates who are not participating. Once such candidates are identified, the teacher reaches out to them to encourage them or to find the reason for the lack of participation, which helps the students learn. Sometimes students are not prepared for the activity; in that case, they are asked to read the possible questions. Various such methods make them learn the content.
Before conducting an ICT-based assessment, the teacher should have a clear vision of the activity. The activity may be think-pair-share, peer group learning, collaborative learning, a poster activity, an assignment activity, a diagram activity, etc. A learning activity followed by assessment through ICT is very helpful. At the least, inform the students early, after the completion of the chapter, and ask them to prepare for the next day's assessment activity. Sometimes the ICT-based assessment activity takes twenty to thirty minutes for some questions. At the end of the activity, the top-scoring candidate is called onto the stage for a prize and applause, which energizes the students to prepare well for the upcoming activity.
Figure 1 shows the involvement of the students in ICT-based assessment activities. These activities do not require more classroom space: there is no need to separate the students, because the questions are shuffled. Students might ask a friend for an answer, but this too is eradicated by the timing of each question. Around ten seconds are given per question, during which students need to think and select; if a student talks with others, the time to answer elapses. So during the activity students cannot perform any malpractice. On the other side, the teacher observes the students' performance through a monitoring screen, as shown in Fig. 2, which clearly shows each student's percentage of completion, the minutes taken, who participated well, who did not, etc.
The teacher gives running commentary to the students while they work. Once the time elapses, the teacher tells the students to close the activity by clicking the end button in the top right corner of the screen; after the teacher clicks the end button, the activity comes to an end and students cannot participate further. The ICT assessment shows the results in points. The top scorers are asked to come to the dais for appreciation and applause, to create a spark in all the participants, as shown in Fig. 3.
Figure 4 shows the per-question student response performance screen preview. That screen shows the percentage accuracy in answering the
question, the average time taken on the question, and how many players chose each option. Through these details, the teacher assesses the students' level of performance on the particular topic.
Figure 5 shows the students' live performance level and the overall score of the class; it shows 60% accuracy for the corresponding topic, with the
number of correct answers shown in green and the number of wrong answers shown in red. Not only overall but also individual students' performance charts are shown, as slider bars and points.
Once the teacher clicks the activity end button, the screen in Fig. 6 appears, showing the three top-scoring student candidates. It creates a spark in all the candidates for active preparation for the upcoming activity.
Figure 7 shows the ICT assessment by question: for each question, how many students answered correctly or wrongly is highlighted in green and red. A question shown in red indicates that students are weak or poor on that particular topic. Using the chart, the teacher conducts a revision class or activity based on the score for each question.
The ICT online quiz method not only encourages the students' performance but also helps to reduce documentation and enables micro-level result analysis. Figure 7 shows the automatic micro result analysis of the test sample concerned, Fig. 8 shows individual performance on all the questions, and Fig. 5 shows the overall class average marks, etc. This real-time statistical analysis helps to identify the students'
weak areas by question. The ICT tool method creates interest in participating in the quiz activity and enhances the students' learning levels.
4 Conclusion
References
Abstract The online education system became famous during the COVID-19 pandemic. Before the pandemic, utilization of online education systems was very low. Information and Communication Technology (ICT) offers various online education tools, such as online meetings for oral interaction, online assignment collection through Google Forms, online evaluation through Google Sheets, enhancement of the students' learning, appreciation of the students' assignments, and creation of e-study materials in the form of e-audio and e-video content. It also supports assessing the students' learning level through poll activities, enhancing student creativity through mind-map activities, developing students' critical thinking using brainstorming activities, etc. Especially for individual assessment, monitoring student participation is very easy compared to the conventional teaching method. The online method converts the teaching methodology into a teaching-and-learning method. This article discusses various requirements for an effective education system, such as refreshment, motivation, and enthusiasm for participation, especially when some physical activity is conducted in the afternoon to activate the brain before starting the class. E-audio content helps to recap a concept after many days, and this type of content strongly supports slow learners. Smiley and thumbs-up emojis motivate the students to participate actively in the work. Finally, the poll activity for re-learning missed and wrong concepts is discussed.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 147
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_13
148 N. Vengadachalam et al.
1 Introduction
Online education systems and culture became famous during the COVID-19 pandemic. Before that, only a few faculty had an online classroom platform such as Moodle. The pandemic forced us to shift from conventional teaching methods to online platforms. Various ICT tools help to conduct class more effectively than offline classrooms. The only drawback is that the personal touch on the student's side is missing; otherwise, interaction with the students, time management, and conducting activities are entirely possible during online classes, as is the evaluation of individual students. The students' learning level is analyzed at the end of the session with a 5-min poll activity, to conclude what percentage of the students grasped the topics. Online platforms are used to enhance the students' level because a large percentage of students are addicted to mobile phones; the question is whether the candidate uses them in an effective way. These online class sessions help to create e-materials such as audio materials, PowerPoint materials, video materials, etc. I fully enjoyed this material preparation. Once the class is completed, the audio material is shared in the online classroom instead of on WhatsApp, because the online classroom keeps the content for a long time, whereas WhatsApp-shared content may be erased after some days. This e-content helps absentees and slow learners very much. Preparing e-content, such as a video of the lesson, was easy while taking the class itself. Through this method, any number of video materials can be developed and shared easily through the YouTube platform, which also stores the e-video materials. In this article, various innovative methods are elaborated in detail.
2 Literature Survey
To develop and engage students through experiential learning, SFIMAR (St. Francis Institute of Management and Research) carried out a few case studies of experiential teaching pedagogy. Experiential learning stimulates the learner's curiosity and enhances the level of understanding [1]. The learner-centric approach is at the forefront: NEP 2020 suggests critical thinking for employability and fostering experiential learning as an objective, and learner practice is enhanced by experiential learning. Recent NAAC and NBA accreditations expect experiential learning and ICT-based learning activities [2]. Education conducted in a project manner, instead of memorizing data, creates a higher level of cross-cultural awareness and understanding. Experiential learning activities include business games, role-playing, virtual reality, and computer-based simulation. Experiential learning enhances the importance of the university in supporting educational and teaching pedagogical tools. Deeper learning is another popular form of experiential learning, known as project-based learning [3]. Through the activity, students gain deeper technical knowledge of the corresponding topics. It concentrates on the
Effectiveness of Online Education System 149
four main concepts of social interactions, activity building, cognitive tools, and contextual learning. Project-based learning comes in several forms and encourages the students to ask questions in the relevant domain; through question and answer, learners gain motivation and retention [4]. Project learning helps to minimize drawbacks such as ineffective participation, missed teaching time, student-driven projects getting off-topic, and failure to meet academic requirements. Project-based learning builds motivation in studies and enhances students' problem-solving ability and confidence level. Through experiential learning, faculty give fast feedback about the students' learning level and understand them better; on the other side, it increases the students' self-learning pace through the activity, along with their enthusiasm and confidence level [5]. The experiential activity improves the results every consecutive year. While conducting the activity, the tutor learns its shortfalls, which helps to improve the activity [6–8]. It is noticed that learners need clear objectives and a picture of how to perform the activity [7]. Project-based learning highly encourages students' critical thinking, innovation, and creative skills.
Creating interest in learning is very important for every course. For example, some school children say that "the mathematics subject is not interesting or likeable"; some students from rural backgrounds say that "English is one of the toughest subjects", while city students say the same English subject is very easy. The reason behind this is a lack of knowledge, practice, and available resource guidance. The availability of guidance, i.e. the teacher, plays a very important role. How do teachers teach the subject? Some teachers teach complicated subjects in the simplest way; some teach simple concepts in a complicated way. On the other side, the teacher's attitude is very important in conveying to the students whether a subject is easy or not. Even when some teachers are not strong in subject knowledge, their way of treating students can touch the heart, so students become interested in studying the subject and take them as role models. If you ask the students who the best teacher is, it is the one who behaves in a friendly manner and connects with the students; a teacher who touches the students' hearts gets good feedback. So creating interest in the subject is very important. On entering the class, one shouldn't start with the syllabus; discuss general things related to the subject and link them to the subject, which is called the analogy concept. Creating interest and enthusiasm for the course is thus essential. The second major problem is low motivation or lack of confidence: a large percentage of students have an inferiority complex and lack confidence about their careers. The third problem is concentration on the
subject. There is no gap between the first period and the second period. Due to the continuous classes in the forenoon and afternoon, students lose their power of concentration. In particular, without interest created in the subject, students never concentrate on it.
These are the three major problems for every education sector. To avoid them, I was in the habit of conducting the following activities.
If my classes were in the forenoon session, I was in the habit of conducting a five-minute meditation activity. It was very easy and effective during online class sessions; during offline classes it was somewhat accepted but not at a 100% satisfactory level, whereas in online sessions it was a great success. I was in the habit of playing a five-minute prayer song, sung by a blind person. The outcomes of the song were:
a. If you have a skill you can win, so develop the skills.
b. Students get inspired, because the blind person succeeds and lives without sight; what about you?
c. Students get relaxed by hearing the song.
d. A stressed mind returns to normal.
In the end, students became enthusiastic and interested in attending the class. The five-minute short break freshens the mind. Figure 1 shows a picture from the prayer song sung by a blind girl. During offline sessions this type of activity is a little difficult because of the arrangement of the system, playback, speaker availability, etc. Because of this, I asked the students to close their eyes for three to five minutes; sometimes I instructed them to observe the surrounding noise or to observe their breathing. In the afternoon session, I asked them to perform short puzzles and mental and physical games, for example:
a. I asked the students to repeat what I was saying, e.g., stand up, sit down, clap, jump, walk, hi-fi.
b. At level two, I asked them to do the reverse: when I say stand up, students need to sit down; when I say clap, students need to jump; and so on.
c. Make eye-to-eye contact with a friend, etc.
As research says, songs are one type of drug; songs and sounds can mesmerize a person. So these types of methods are used to enhance the students' concentration level during class hours. Figures 2 and 3 show the brain and physical activities, which are used to refresh the mind and body.
During the afternoon or after-lunch session, the students' concentration level is very low even when they try to take more interest in the subject, because of the climate conditions and the food consumed. Except for high-IQ students, the remaining students concentrate on a particular session for a maximum of seven to fifteen minutes, after which the mind wanders either to the past or to the future. Due to these mind disturbances, concentration on the subject or class is very low. This is overcome by some physical activity; through the activity, students remember the topic discussed at the time. Some physical activities help to avoid laziness and sleep during class hours. Figure 3 shows one type of physical hand activity, through which the students' left and right brains start to function. Most of the time the right brain is dormant, and this type of activity helps to create freshness in the students. In earlier days the word thoppukaranam was used in the Tamil language; it is called super brain yoga, a method used to create a spark in left- and right-brain activity.
The e-audio and e-video subject content is very helpful for recapping a concept after a long time. Every semester lasts a minimum of 105 days, or three and a half months. Content stored in the human mind may be erased after seven to ten days; that is why teaching faculty start to prepare the course content before handling the period. Every time, a lot of effort, examples, and analogies are used to explain the concept, but after some days all the concepts evaporate from the mind; without frequent recaps, they are erased. These problems are overcome by creating e-audio and e-video content. For example, after completing a unit, I prepare one audio file containing a short explanation of the entire unit, and these audio files are shared after the unit is completed. During students' free time or exam time, this audio content helps very much. The video content was uploaded to YouTube and the link shared with the students through the Moodle classroom and WhatsApp.
Figure 4 shows a screenshot of homework content shared through the online classroom's OneNote page. The page shows the content for tomorrow's homework, a related video, and an instruction audio clip. The audio clip gives a short brief about the homework, with the details in the video content; the recap of the content is prepared in the audio file.
Figure 5 shows the prerequisite content for the next chapter. After completing a chapter, one prepares an audio file recapping the related content and asks the students to listen to it and revise the Chap. 1 content, followed by a recap activity through question and answer, as shown in Fig. 6.
Figure 6 shows the possible questions listed for Chap. 1. Students need to answer the questions themselves to know their learning level, and can then click the audio recording file to listen to hints about the content.
Figure 7 shows a sample screenshot of the recap activity. The OneNote screen above contains some balloon and smiley icons. These icons create some energy in the students' minds; for example, school children often come home with stars on their hands, a single star, three stars, etc. These stars indicate effective participation in the classroom activity and the scoring level, and the children show them to their parents with enthusiasm. The same pattern is continued by showing these smiley and thumbs-up indications.
Figure 8 shows a screenshot of the homework image and the related video content. The homework was conducted through the activity, and the screenshot clearly shows what the students have to do. Step 1 shares a video of today's class without audio, i.e. the video shows only pictures and an explanation of the images without sound. Step 2 asks the students to prepare a dubbing voice for the video content. Step 3 asks them to upload it to a Google Form. Through the activity, students enhance their memory and communication skills. These written instructions are also provided as oral audio files, which students can click to listen to.
Figure 9 shows the tutor's screen-sharing window and the participants' viewing window. The tutor's screen has an option to be seen by the students: once it is clicked, the link is copied and shared in the students' WhatsApp group or Google Classroom. When students click the link, they can see the current and past pages of the subject, but they cannot edit the screen; only the tutor can edit. Students can click any number of times to listen to and watch the video content.
The poll activity is very helpful for summarizing the session. The last five minutes of the session are allotted to a summary activity for the day's content. This summary helps the participating candidates recap the learned topic, and on the other side lets the teacher know the students' learning level. The summary is performed through a poll: write a question with four options and ask the students to answer by clicking A, B, C, or D. Once the allotted time is over, stop the poll to display the percentage of students who answered A, B, C, and D. From the percentages scored, one comes to understand the students' learning level. At the end of the poll, the correct and wrong answers are explained, so that the topic is understood in depth, as shown in Fig. 10.
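The percentage display described above reduces to a simple tally of responses per option. The sketch below is a hypothetical illustration of that arithmetic, not the API of any particular polling tool; the function name and sample responses are invented for illustration.

```python
from collections import Counter

def poll_percentages(responses: list[str]) -> dict[str, float]:
    """Return the percentage of students who chose each of the options A-D."""
    counts = Counter(responses)
    total = len(responses)
    return {opt: 100.0 * counts.get(opt, 0) / total
            for opt in "ABCD"}

# Hypothetical poll: ten students answer a four-option question.
responses = ["A", "B", "A", "C", "A", "D", "A", "B", "A", "A"]
print(poll_percentages(responses))
# {'A': 60.0, 'B': 20.0, 'C': 10.0, 'D': 10.0}
```

A high percentage on a wrong option immediately points the teacher to the misconception worth explaining after the poll closes.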
Fig. 9 The tutor screen sharing window and participant viewing screen window
6 Conclusion
The offline education system is more effective for one-to-one interaction, personal touch, and connection with student participation, but assessing the students' learning level as per rubrics, enhancing students' critical thinking, conducting brainstorming activity, etc. are complicated tasks in it. Young budding engineers are addicted to and interested in using smart mobile phones. Some students flow in the correct direction, meaning they utilize the internet and YouTube for healthy education; on the other side, fifty percent of school and college students are addicted to playing video games, watching movies, short videos, etc. Sometimes students go to extreme conditions, losing games and losing money playing them. So a teacher needs to create enthusiasm, encouragement, and motivation in their lives and courses by playing short meditation prayers and motivational videos, to turn the students' addiction into a valuable life. In this session, various online ICT tools were utilized for effective monitoring and to encourage the students' performance.
References
Abstract The COVID-19 pandemic taught us to take classes in online mode. It forced us to move from the conventional chalk-and-board method to online platforms such as Google Meet, Zoom, Microsoft Teams, etc. These platforms help to interact with the entire class of students on a single screen, to deliver the content, and to interact with the students. Some free apps provide limited features compared to the paid apps; the full paid apps offer session recording and poll conduction. For evaluating student performance, one needs to go to an Information and Communication Technology (ICT) tool. These tools help to enhance the students' critical thinking, learning level, views beyond the content, etc. In addition, these tools help to evaluate the students' learning level easily. The outcomes of the mind-map and brainstorming activities help to recap the content after many days. To perform the mind-map activity, students need to utilize all of Bloom's levels, for example remembering, understanding, applying, analyzing, evaluating, and creating.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 159
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_14
160 V. S. Seethalakshmi et al.
1 Introduction
Online education systems and culture became famous during the COVID-19 pandemic. Before that, only a few faculty had an online classroom platform such as Moodle. The pandemic forced us to shift from conventional teaching methods to online platforms. Various ICT tools help to conduct class more effectively than offline classrooms [1–5]. The only drawback is that the physical presence on the student's side is missing; otherwise, interaction with the students, time management, and conducting activities are entirely possible during online classes [2–6], as is the evaluation of individual students by conducting a poll activity. The students' learning level is analyzed at the end of the session with a 5-min poll activity, to conclude what percentage of the students grasped the topics [3]. Online platforms are used to enhance the students' level because a large percentage of students are addicted to mobile phones; the question is whether the candidate uses them in an effective way [4]. These online class sessions help to create e-materials such as audio materials, PowerPoint materials, and video materials. I fully enjoyed this material preparation [7]. Once the class is completed, the audio material is shared in the online classroom instead of on WhatsApp, because the online classroom keeps the content for a long time, whereas WhatsApp-shared content may be erased after some days. This e-content helps absentees and slow learners very much. Preparing e-content, such as a video of the lesson, was easy while taking the class itself [8]. Through this method, any number of video materials can be developed and shared easily through the YouTube platform, which also stores the e-video materials. This article discusses various innovative methods used to attract the participants and enhance the learning level.
2 Literature Survey
To develop and engage students through experiential learning, SFIMAR (St. Francis Institute of Management and Research) carried out a few case studies of experiential teaching pedagogy. Experiential learning stimulates the learner's curiosity and enhances the level of understanding [1]. The learner-centric approach is at the forefront: NEP 2020 suggests critical thinking for employability and fostering experiential learning as an objective, and learning by practicing is enhanced by experiential learning. Recent NAAC and NBA accreditations expect experiential learning and ICT-based learning activities [2]. Education conducted in a project manner, instead of memorizing data, creates a higher level of cross-cultural awareness and understanding. Experiential learning activities include business games, role-playing, virtual reality, and computer-based simulation. Experiential learning enhances the importance of the university in supporting educational and teaching pedagogical tools. Deeper learning is another popular form of experiential learning, known as project-based learning [3]. Through the activity,
A Formula for Effective Evaluation Practice Using Online Education Tool 161
Mind maps make students think creatively. Through the activity, students' remembering, communication, analysis, design, and application skills are enhanced. miro.com offers facilities to create mind-map diagrams. At the beginning, one sample mind map is shown, with how it links to the previous and next nodes, to give some basic idea of a mind map, as shown in Fig. 1. The students are then asked to draw a mind map for the assigned topic. Through the activity, students start to recap the concept, understand the basics, design the mapping, analyze the next and previous mapping content, and finally apply the content to the mapping.
4 Brainstorm Activity
Every student expects credit from the teacher. As teachers, we need to encourage the students to enhance their confidence level. Simple appreciation and encouragement enhance the students' level of performance better than the conventional method. In recent days, the instructions given to all the students were as follows:
The online classroom platform helps to share content and hold it for a long time. In recent days, a lot of free online classroom platforms have come into the picture, such as Google Classroom, Edmodo, the Moodle classroom, etc. Moodle classrooms are offered by various domains such as moodlecloud.com, gnomio.com, etc. In addition, some free websites also help to create an online classroom; Google too offers a free website-creating facility, and in earlier days wordpress.com, Webs, etc.
offered free website platforms. Figure 4 shows a Moodle classroom screenshot. The Moodle classroom is used to access the content without needing to log in, join the class, attend the quiz activity, or get permission from the course handler. This Moodle classroom helps all the students access the material from anywhere in the world.
7 Conclusion
The article showed various online education tools utilized to assess the students' performance effectively. It described various assessment tools that help to enhance the students' critical thinking and brainstorming activity and that encourage assignment activity. These tools and methods enhance and attract student participation in the class. They help students remember the studied content through discussion with the team, understand the pros and cons of the content, enhance the creativity of the content through the novel design or tree-diagram method, analyze the content with excitement after creating it, and finally apply all the content to the map. The conventional teaching method does not encourage learning and poses several challenges for evaluating the students' performance. E-learning platforms attract and motivate a student-centric approach to the learning methodology.
References
Keywords Digital learning · Online learning · Learning process · TRA & TAM
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 167
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_15
168 A. Srivastava and N. S. Nautiyal
way that people learn, particularly in higher education, is mobile learning. Various
studies have defined the term “digital learning” [5].
Due to the fact that mobile applications for digital learning can be used anytime,
anywhere, even in remote locations, and that learners can benefit from them, many
educational institutions have realized the potential of digital learning as a teaching
tool for their students and have incorporated it into the distance learning environ-
ment [6]. Though the majority of research has focused on the value and applicability
of e-learning, distance learning, and digital learning in the context of the adoption
of distance education technologies, few researchers have investigated the primary
motivation and intention behind why students in higher education
choose to adopt digital learning applications. Understanding users’ intentions to use
technology has grown to be one of the most difficult issues for information system
researchers, according to Teo [7]. The literature has demonstrated that characteristics
connected to the acceptability of technology were used to identify researchers’ inter-
ests in information system studies [8, 9]. Because of this, information system experts
have created intention models to aid in forecasting and elucidating the adoption of
technology across numerous fields.
According to research, students are technologically knowledgeable and desire
to use mobile applications to access the learning opportunities provided by their
institution [10]. Davis [11] used TRA and TAM to study users' acceptance of computer
technology, Yuen and Ma [12] adopted TAM to explore teacher acceptance of e-learning
technology, and Teo and Schaik [13] applied both TRA and TAM to examine
student–teachers' intention to use computer technology.
computer technology. Despite the extensive application of TRA and TAM in research
studies, few, if any, have explored an integration of TRA and TAM to predict and
explain students’ intention to use mobile learning (m-learning) in developing countries, since
m-learning is fairly new in educational environments in these countries. With the
increase in tendencies to adopt digital learning, learners are expanding their bound-
aries with the use of numerous mobile apps. Past studies on virtual learning have
identified a few fundamental constructs and benefits of digital learning: Technical
progression, utility, quality, flexibility, learning at own pace, cost-saving, secured
content, and increased outreach [14].
Several scholars have shown their curiosity about the safe and disruptive learning
potential of mobile learning [15] which has gained global popularity. Some empirical
studies have been done by various researchers to investigate the effect of perceived
usefulness, ease of use, and perceived security on digital or mobile learning because
these elements are considered important factors for the adoption of digital learning
[16].
Despite numerous studies in this area, no study has taken a
comprehensive framework of all the variables that help to identify the factors affecting
the intention to adopt mobile learning apps. The rest of the paper is
organized into the following sections: literature review, research methodology, data
analysis, findings and discussion, and conclusion.
Deciphering the Catalysts Influencing the Willingness to Embrace … 169
2 Literature Review
services can be better strategies for encouraging learning and the adoption and
acceptance of digital learning [25]. However, quality content provides a better
understanding through digital learning apps and motivates users to adopt
digital learning [26]. In addition, digital learning apps that help users read, search,
and collect subject-related content or data while using the app provide
satisfaction in studying [27]. So rich graphic elements, a proper content structure,
the latest learning materials, refined and short content, and an attractive interface
can also be major determinants of the adoption of digital learning
apps. In light of the above discussion, this study seeks to identify the factors behind the
intention to adopt mobile learning apps and their impact on the performance of
students.
3 Research Methodology
The study used a structured questionnaire to collect responses on the different
statements/items used to explore the factors affecting the intention to adopt
mobile learning apps and their impact on the performance of students. The respondents
are students pursuing master's-level courses at a central university in western
India, and a census was taken. The central university was selected as it is
believed to have the best resources available. The questionnaire was developed on a
five-point Likert scale, where 1 stands for strongly disagree and 5 stands for strongly
agree, and was floated to the students of the university. The population
was 300, but a total of 280 questionnaires were floated as 20 students were
absent that day. Out of 280, only 249 questionnaires were complete in all aspects
and suitable for data analysis. Exploratory factor analysis was then
conducted to identify the factors, and multiple regression analysis was
used to measure the impact of those factors on the performance of students.
The dependent variable is the performance of students, and the independent variables
are utility, flexibility, cost-effectiveness, technical advancement, quality perception,
and accessibility (obtained through exploratory factor analysis). The questionnaire
was tested for reliability and content validity. Reliability was tested using Cronbach
alpha and Guttman's split-half test; both measure internal consistency.
Cronbach alpha: This is the most common measure of internal-consistency
reliability; it requires only a single test administration yet provides
a unique estimate of a test's reliability [28, 29]. The analysis resulted in
an overall Cronbach alpha score of 0.82.
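As an illustration of this calculation, Cronbach's alpha can be computed directly from an item-response matrix. The sketch below uses synthetic Likert responses (the respondent count of 249 mirrors the study, but the data and the resulting alpha are invented for illustration, not the chapter's dataset):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) response matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Synthetic 5-point Likert responses: 249 respondents, 6 correlated items
rng = np.random.default_rng(0)
latent = rng.normal(size=(249, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(249, 6))), 1, 5)
print(f"alpha = {cronbach_alpha(items):.2f}")
```

Alpha approaches 1 as the items' inter-correlations rise; a value of 0.82, as reported above, is conventionally taken as good internal consistency.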
Guttman's split-half reliability: A fundamental assumption of split-half
reliability is that the two halves of the test should yield similar true scores and error
variances when the test items focus on the same construct. That is, it measures
the extent to which all parts of the tool contribute uniformly to the content being
measured. To use split-half reliability, the items were divided into two equal halves,
the halves were administered to study participants, and analyses were
Fig. 1 Reliability of the questionnaire: Cronbach's alpha = 0.82; Guttman split-half = 0.70
run between the two respective “split halves”. A Spearman’s rho correlation was
computed between the two halves of the instrument, with SPSS software used to
conduct the split-half analysis. This coefficient ranges from 0 to 1.0; in this
study, the split-half reliability value is 0.70, which indicates high reliability of
the instrument (Fig. 1).
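The split-half procedure described above can be sketched on synthetic data as follows. The odd/even item split and the use of SciPy's `spearmanr` are illustrative choices, since the chapter does not state how the halves were formed:

```python
import numpy as np
from scipy.stats import spearmanr

def split_half_rho(items: np.ndarray) -> float:
    """Total the odd- and even-numbered items separately (one simple way to
    form two equal halves) and correlate the half scores with Spearman's rho."""
    half_a = items[:, 0::2].sum(axis=1)
    half_b = items[:, 1::2].sum(axis=1)
    rho, _ = spearmanr(half_a, half_b)
    return rho

# Synthetic Likert data with one common construct driving all items
rng = np.random.default_rng(0)
latent = rng.normal(size=(249, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(249, 6))), 1, 5)
print(f"split-half rho = {split_half_rho(items):.2f}")
```

A common refinement is to apply the Spearman-Brown correction to the half-score correlation; the chapter reports the uncorrected coefficient (0.70), so the sketch stops at the correlation itself.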
4 Data Analysis
First, the statements were analyzed to explore factors via factor analysis. This is
also known as a dimension-reduction technique, as it summarizes statements based
on their similarities. To perform it, we first need to conduct the KMO test and Bartlett's
test of sphericity. The results suggest an adequate sample and permit us to proceed with
factor analysis.
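These two preliminary checks can be sketched with their textbook formulas. The data below are synthetic (two latent factors, eight items), and the chapter reports only that the tests passed, not their values:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data: np.ndarray):
    """Bartlett's test of sphericity: tests H0 that the items' correlation
    matrix is an identity matrix (i.e., the items share no variance)."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return stat, chi2.sf(stat, p * (p - 1) / 2)

def kmo(data: np.ndarray) -> float:
    """Kaiser-Meyer-Olkin sampling adequacy: ratio of squared zero-order
    correlations to squared zero-order plus squared partial correlations,
    with partials taken from the inverse correlation matrix."""
    R = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(R)
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, pr2 = (R[off] ** 2).sum(), (partial[off] ** 2).sum()
    return r2 / (r2 + pr2)

# Synthetic data: 249 respondents, 8 items loading on two latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(249, 2))
loadings = np.array([[0.8] * 4 + [0.0] * 4, [0.0] * 4 + [0.8] * 4])
data = latent @ loadings + rng.normal(scale=0.6, size=(249, 8))
stat, p_value = bartlett_sphericity(data)
print(f"Bartlett chi2 = {stat:.1f}, p = {p_value:.2e}, KMO = {kmo(data):.2f}")
```

A significant Bartlett p-value and a KMO above the conventional 0.6 threshold are the usual grounds for proceeding with factor extraction.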
[Table: Total variance explained — initial eigenvalues and extraction/rotation sums of squared loadings (total, % of variance, cumulative %) per component; the trailing components 21–22 have near-zero eigenvalues (7.755E−17 and −1.512E−17) at 100.000% cumulative variance]
Table 5 ANOVA
Model 1      Sum of squares  df   Mean square  F         Sig.
Regression   256.410         6    42.735       4154.456  0.000b
Residual     2.489           242  0.010
Total        258.900         248
Table 6 Coefficients
Model 1              B       Std. error  Beta    t        Sig.   Tolerance  VIF
(Constant)           4.020   0.006               625.460  0.000
REGR factor score 1  −0.023  0.006       −0.022  −3.559   0.000  1.000      1.000
REGR factor score 2  −0.025  0.006       −0.025  −3.902   0.000  1.000      1.000
REGR factor score 3  −0.025  0.006       −0.024  −3.833   0.000  1.000      1.000
REGR factor score 4  1.016   0.006       0.994   157.712  0.000  1.000      1.000
REGR factor score 5  −0.021  0.006       −0.021  −3.281   0.001  1.000      1.000
REGR factor score 6  0.003   0.006       0.003   0.422    0.673  1.000      1.000
(B = unstandardized coefficient; Beta = standardized coefficient; Tolerance and VIF are collinearity statistics)
inherent to the apps themselves or are employed to evaluate the merits of the body of
research currently available on mobile learning apps. Nevertheless, the study
demonstrates that opinions differ significantly on whether mobile learning apps are
beneficial to students. Even though most of the variables in this aspect are
significant, they are hurting students' performance. Because digital learning is more
affordable than computer- or laptop-based learning, especially when done through mobile
apps, it is regarded as a popular uprising in its own right. Six factors have been recognized
by the exploratory factor analysis: cost-effectiveness, accessibility, quality perception,
technological advancement, flexibility, and utility. These are all either features
of mobile learning applications or advantages derived from the body
of existing research and verified using reliability statistics. The survey does, however,
unequivocally show that opinions about how useful and helpful digital learning apps
are to students differ greatly. In this regard, the majority of the variables, while
significant, have a negative impact on students' performance.
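For readers who want to reproduce the structure of Tables 5 and 6, the sketch below runs an ordinary least-squares regression on synthetic data shaped like the study's (249 cases, six near-orthogonal factor scores, one dominant factor). The coefficients are hypothetical stand-ins, not the study's data:

```python
import numpy as np

def ols_anova(X: np.ndarray, y: np.ndarray):
    """Least-squares fit of y on [1, X]; returns coefficients, the overall
    ANOVA F statistic (as in Table 5), and VIFs per predictor (near 1.0 for
    orthogonal regression factor scores, as in Table 6)."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    f_stat = ((ss_tot - ss_res) / k) / (ss_res / (n - k - 1))
    vifs = []
    for j in range(k):  # VIF_j = 1 / (1 - R^2 of predictor j on the others)
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        e = X[:, j] - Z @ b
        r2 = 1 - (e @ e) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return beta, f_stat, np.array(vifs)

# Synthetic stand-in: factor 4 dominates performance, the rest are small
rng = np.random.default_rng(1)
X = rng.normal(size=(249, 6))
y = 4.0 + X @ np.array([-0.02, -0.03, -0.02, 1.0, -0.02, 0.0]) \
    + rng.normal(scale=0.1, size=249)
beta, f_stat, vifs = ols_anova(X, y)
print(f"intercept = {beta[0]:.3f}, F = {f_stat:.0f}, max VIF = {vifs.max():.2f}")
```

Because factor scores from an orthogonal rotation are uncorrelated by construction, the tolerances and VIFs in Table 6 sit at exactly 1.000; the synthetic predictors here are only approximately orthogonal, so the VIFs come out slightly above 1.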
This undesirable association is largely due to distractions, which are frequent
with mobile learning applications, for instance games and frequent
advertisements. The problem can be summarised from the perspective of
usefulness: such an app can offer numerous benefits when used properly, but if it is handled
carelessly, it can completely ruin everything. Different factors come together to
influence people's decision to use digital learning apps as educational tools, and
these factors also determine people's readiness to use them. Based on the previously
presented data, the following conclusions can be drawn about the factors influencing
the propensity to use digital learning applications:
Professed Usefulness: People’s inclination to use digital learning apps is signifi-
cantly influenced by their perceived benefits. If people believe that using technology
will improve and add value to their educational experience, they are more likely to
adopt it.
Technological Proficiency: New digital learning resources are more likely to
be welcomed by those who are more at ease with technology. The willingness
to use digital learning apps is strongly influenced by one’s level of technological
competency.
Perceived Ease of Use: Technology that is navigable, intuitive, and easy to use
influences adoption intentions in a good way. People’s willingness to use digital
learning apps is significantly influenced by how simple they believe them to be to
use.
Social Influence: People tend to adopt technology that others support or
endorse. Social factors, like peer endorsements, celebrity endorsements, or teacher
support, can have a big impact on people's decisions to use digital learning tools.
Institutional Support: People are more likely to adopt new technology when it is
supported and encouraged in schools. The adoption of digital learning applications
can be greatly influenced by legislative and educational support.
Infrastructure and Accessibility: Access to devices and reliable internet
connectivity is an essential piece of technological infrastructure that can support
or impede the desire to adopt digital learning applications.
These factors need to be fully addressed for the adoption of digital learning apps
to be successful. To establish an environment that supports the advantages of digital
learning, provides users with the resources and training they require, and guarantees
a seamless and safe experience, cooperation between educational institutions, legis-
lators, and technology suppliers is required. By doing this, we can make the most
of the transformative potential of digital learning apps to revolutionize the educational landscape and provide excellent, engaging learning opportunities for students
everywhere. Educational stakeholders can facilitate the seamless and efficient inte-
gration of digital learning apps into the classroom by recognizing and addressing
these issues.
6 Conclusion
support, awareness and education are necessary to maintain positive attitudes towards
these applications. Trust and security concerns must be addressed to instil confidence
in potential customers. Many factors influence an individual’s willingness to adopt
digital learning applications, such as perceived value, usability, technical expertise,
social impact, accessibility, institutional support, personal creativity, and cost. To
ensure the widespread adoption of digital learning applications, developers, educa-
tors, and policymakers must consider these variables and manage them appropriately.
By understanding what factors influence adoption intentions, they can develop and
market digital learning applications that better meet the needs and preferences of
teachers and students.
References
1. Costley KC (2014) The positive effects of technology on teaching and student learning. ERIC.
https://eric.ed.gov/?id=ED554557
2. Akour H (2009) Determinants of mobile learning acceptance: an empirical investigation in
higher education, Dissertation, Oklahoma State University, Oklahoma
3. Lillejord S, Børte K, Nesje K, Ruud E (2018) Learning and teaching with technology in higher
education: a systematic review. Knowledge Centre for Education, Oslo. www.kunnskapssenter.no
4. Klimova B, Poulová P (2016) Surveying university teaching and students’ learning styles. Int
J Innov Learn 19:444–458. https://doi.org/10.1504/IJIL.2016.076794
5. Hwang GJ, Tsai CC (2011) Research trends in mobile and ubiquitous learning: a review of
publications in selected journals from 2001 to 2010. Br J Edu Technol 42(4):E65–E70
6. Park SY (2009) An analysis of the technology acceptance model in understanding university
students’ behavioral intention to use e-learning. J Educ Technol Soc 12(3):150–162
7. Teo T (2009) Modelling technology acceptance in education: a study of pre-service teachers.
Comput Educ 52(2):302–312. https://doi.org/10.1016/j.compedu.2008.08.006
8. Legris P, Ingham J, Collerette P (2003) Why do people use information technology? A critical
review of the technology acceptance model. Inform Manage 40:191–204. https://doi.org/10.
1016/S0378-7206(01)00143-4
9. King W, He J (2006) Understanding the role and methods of meta-analysis in IS research.
Commun Assoc Inf Syst 16
10. Arksey H, O’Malley L (2005) Scoping studies: towards a methodological framework. Int
J Social Res Methodol Theory Pract 8(1):19–32. https://doi.org/10.1080/136455703200011
9616
11. Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of
information technology. MIS Q 13(3):319–340
12. Yuen AH, Ma WW (2008) Exploring teacher acceptance of e-learning technology. Asia-Pacific
J Teacher Educ 36:229–243
13. Teo T, Schaik P (2012) Understanding the intention to use technology by preservice teachers: an
empirical test of competing theoretical models. Int J Human–Computer Interact 28(3):178–188.
https://doi.org/10.1080/10447318.2011.581892
14. Dung DTH (2020) The advantages and disadvantages of virtual learning. IOSR J Res Method
Educ (IOSR-JRME) 10(3):45–48
15. Motiwalla LF (2007) Mobile learning: a framework and evaluation. Comput Educ 49(3):581–
596
16. Jahangir N, Begum N (2008) The role of perceived usefulness, perceived ease of use, security
and privacy, and customer attitude to engender customer adaptation in the context of electronic
banking. Afr J Bus Manage 2(2):32
17. Nevin R (2009) Supporting 21st century learning through google apps. Teach Libr 37(2):35–38
18. Roca JC, Chiu CM, Martínez FJ (2006) Understanding e-learning continuance intention: an
extension of the technology acceptance model. Int J Hum Comput Stud 64(8):683–696. https://
doi.org/10.1016/j.ijhcs.2006.01.003
19. Caudill JG (2007) The growth of m-learning and the growth of mobile computing: parallel
developments. Int Rev Res Open Distrib Learn 8(2). https://doi.org/10.19173/irrodl.v8i2.348
20. Sarrab M, Elbasir M, Alnaeli S (2016) Towards a quality model of technical aspects for mobile
learning services: an empirical investigation. Comput Hum Behav 55:100–112
21. Bonk CJ, Wisher RA, Lee JY (2004) Moderating learner-centered e-learning: Problems and
solutions, benefits and implications. In: Online collaborative learning: Theory and practice, pp
54–85. IGI Global
22. Jou M, Tennyson RD, Wang J, Huang SY (2016) A study on the usability of E-books and
APP in engineering courses: a case study on mechanical drawing. Comput Educ 92:181–193.
https://doi.org/10.1016/j.compedu.2015.10.004
23. Hsu CL, Lin JCC (2015) What drives purchase intention for paid mobile apps? An expectation
confirmation model with perceived value. Electron Commer Res Appl 14(1):46–57. https://doi.
org/10.1016/j.elerap.2014.11.003
24. Fondevila Gascon JF, Carreras Alcalde M, Seebach S, Pesqueira Zamora MJ (2015) How elders
evaluate apps: a contribution to the study of smartphones and to the analysis of the usefulness
and accessibility of ICTS for older adults. Mobile Media Commun 3(2):250–266. https://doi.
org/10.1177/2050157914560185
25. Almaiah MA, Al Mulhem A (2019) Analysis of the essential factors affecting of intention to use
of mobile learning applications: a comparison between universities adopters and non-adopters.
Educ Inform Technol 24(2):1433–1468
26. Hashim KF, Tan FB, Rashid A (2015) Adult learners’ intention to adopt mobile learning: a
motivational perspective. Br J Edu Technol 46(2):381–390
27. Alqahtani M, Mohammad H (2015) Mobile applications’ impact on student performance and
satisfaction. Turkish Online J Educ Technol-TOJET 14(4):102–112
28. Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika
16:297–334. https://doi.org/10.1007/BF02310555
29. Kaiser HF, Michael WB (1975) Domain validity and generalizability. Educ Psychol Meas
35:31–35. https://doi.org/10.1177/001316447503500103
Pedagogical Explorations in ICT:
Navigating the Educational Landscape
with Web 2.0, 3.0, and 4.0
for Transformative Learning Experiences
Abstract This study examines the substantial effects of Web 2.0, 3.0, and 4.0 on
content production, distribution, and assessment. The three main goals of this study
are to: first, thoroughly examine the development of content creation, delivery, and
evaluation from Web 2.0 to Web 4.0; second, recognize and evaluate the advantages
and drawbacks of using ICT tools in pedagogy; and third, offer forward-looking
suggestions for the future development and application of content creation, delivery,
and evaluation approaches. Five stimulating case studies are used in this paper to
further highlight the useful advantages of using Web 2.0, 3.0, and 4.0 technologies
in educational settings. These case studies are thoroughly examined to show the
transformational potential of ICT in education. The results of the study identify
various barriers to the apt use of ICT tools, such as the need for the required
technological infrastructure, in-depth teacher training, and ensuring equal access to
technology, and also propose a model, VITAL CRIMP, for study.
Keywords Web 2.0 · Web 3.0 · Web 4.0 · ICT in pedagogy · Transformative
learning
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 181
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_16
182 N. S. Nautiyal and D. Mashru
The study compiles the most recent data, offers insights into best practices, and
inspires additional research and innovation in the area of ICT in education. By
utilizing the transformative potential of ICT, educators may create efficient, inter-
esting, and individualized learning experiences that prepare students for the demands
of the twenty-first century. During this era known as Industry 4.0, many sectors
of life, including education, witnessed rapid change. Education systems must be
updated to meet the skilled workforce requirements of this dynamic process. In the
near future, it is projected that smart products, services, and business opportunities
will be widely used across all industries. The complete digital transformation of
instructional processes supports and guides the use of technical, human, organiza-
tional, and pedagogical factors. Education 4.0 aims to equip students with a range
of skills, including cognitive, social, interpersonal, and technical ones, in order to
meet the demands of the Fourth Industrial Revolution and address global concerns,
such as reducing the causes and consequences of climate change based on public
awareness.
A. To look at how Web 2.0 through Web 4.0 have changed how material is created,
delivered, and evaluated in ICT-driven education.
B. To determine the benefits and drawbacks of integrating ICT technologies into
pedagogy for the development, delivery, and assessment of content.
C. To offer suggestions for a cutting-edge method of producing, distributing, and
evaluating material in the context of Web 2.0, 3.0, and 4.0.
I. Students are more engaged and involved in ICT-driven education when Web
2.0, 3.0, and 4.0 technologies are used in content production, distribution, and
assessment.
II. The use of ICT technologies in pedagogy to create, distribute, and evaluate
content raises the quality and efficacy of learning resources and teaching
strategies.
III. The use of Web 2.0, 3.0, and 4.0 in the production, distribution, and assessment
of content has a favorable impact on students’ learning outcomes and academic
success in ICT-powered education.
2 Literature Review
Table 1 shows the existing literature with respect to the use of Information and
Communication Technologies (ICT) in pedagogy.
The comparison of Web 2.0, 3.0, and 4.0 encompasses a lively investigation of
the changing digital environment and its influence on paradigms in education. Web
2.0 dramatically changed how information is consumed and shared by introducing
user-generated content, interactivity, and collaboration. Context-aware and machine-
understandable data were introduced with the shift to Web 3.0, also known as the
Semantic Web, opening the door for more intelligent search and customized content
delivery. This stage has been crucial in developing an online environment that is more
contextualized and networked, with possible effects on improving individualized
learning experiences. The study becomes more complex when we explore Web 4.0,
which is highlighted by the Internet of Things (IoT) and growing artificial-intelligence
integration. Web 4.0 holds promise.
Table 2 includes the descriptions along with the objectives, tools, and
examples for content creation, content delivery, and content evaluation in Web 2.0,
Web 3.0, and Web 4.0.
Table 1 (continued)
Sr 8: “Comparing frameworks for twenty-first century skills” (Dede [8])
  Key findings: ICT can be a catalyst in enhancing the development of twenty-first century skills.
  Research gap identified: Need for teaching methods that can integrate these technologies efficiently.
Sr 9: “NMC Horizon Report: 2014 K-12 Edition” (Johnson, Adams Becker, Estrada & Freeman [9])
  Key findings: Web technologies can enable personalized and collaborative learning experiences.
  Research gap identified: Lack of longitudinal studies that show the long-term impacts.
Sr 10: “Digital technology and the contemporary university: Degrees of digitization” (Selwyn [10])
  Key findings: Careful consideration is needed when integrating digital tech in higher education.
  Research gap identified: Research is lacking in addressing digital inequalities in higher education.
Comparing and contrasting the obstacles, benefits, and consequences of using ICT
in content production, delivery, and assessment, along with the discussion of notable
themes or patterns discovered, is part of the analysis of the findings from the literature
study and case studies.
According to the literature study, the use of ICT in pedagogy has significantly
improved content generation, delivery, and assessment. The value of learner-centered
methodologies, collaborative learning, and individualized teaching in raising student
engagement and accomplishment was underlined by key theories and concepts. The
Table 3 (continued)
Case study 4: Mobile Learning for Language Education
  Background: This case study investigates the use of Web 2.0 and Web 3.0 technologies to develop mobile learning initiatives in language teaching.
  ICT tools and technologies: Tools for real-time translation, mobile devices, and language learning applications.
  Pedagogy and learning outcomes affected: Mobile learning made it possible to access language learning resources from anywhere at any time, enabling personalized and independent study. Students used language in a real way, worked on their communication abilities, and improved their fluency and competency.
Case study 5: AI-Powered Assessment in Higher Education
  Background: This case study, which incorporates Web 4.0 technology, looks at the usage of AI-powered evaluation tools in higher education.
  ICT tools and technologies: Natural language processing, automated evaluation systems, and related ICT tools and technologies.
  Pedagogy and learning outcomes affected: The AI-driven evaluation tools gave quick, individualized feedback, allowing for prompt interventions and customized support. Individualized instruction helped students learn more effectively and comprehend the course material more fully.
research results highlighted the necessity for efficient programmes for teacher profes-
sional development to advance pedagogical abilities and ICT competence. Chal-
lenges were also mentioned in the literature, including the requirement for technical
infrastructure, teacher preparation, and guaranteeing that all pupils have equitable
access to technology.
The case studies offered actual instances of ICT tools and technologies being success-
fully applied in content generation, distribution, and assessment. Each case study
demonstrated distinct methods and results, illustrating the adaptability of ICT
integration in various educational situations. Improved learning outcomes, personal-
ized learning experiences, and more student engagement were common benefits seen
throughout the case studies. The necessity for initial technology investment, contin-
uous technical assistance, and seamless integration of ICT into current curricula
and pedagogical practices were among the difficulties noted. The growing usage of
adaptive learning platforms, virtual reality simulations, and AI-powered evaluation
systems for more individualized and interactive learning experiences were notable
trends identified.
The investigation found that the difficulties, benefits, and results of adopting ICT in
content production, distribution, and assessment were both similar and different. The
need for technical infrastructure, teacher training, and tackling the digital gap among
pupils were common problems. Increased student involvement, better resource avail-
ability, and the possibility of individualized instruction were all benefits noted.
Results were reported in a variety of ways, including greater motivation and self-directed
learning, improved student accomplishment, and improved critical thinking
and problem-solving abilities. The combination of AI, ML, AR, and VR technologies
was a key development, providing immersive and interactive learning experiences.
The inquiry highlighted the growing trend of combining artificial intelligence (AI),
machine learning (ML), augmented reality (AR), and virtual reality (VR) technolo-
gies in content development, delivery, and evaluation, leading to more immersive and
personalized learning experiences. Because adaptive learning platforms and intelli-
gent tutoring systems are integrated, ICT can provide customized instruction and
flexible feedback. The relevance of student participation and information sharing
was highlighted by the focus on collaborative and social learning using Web 2.0
tools. The use of AI-driven assessment systems signaled a change to evaluation
practices that are more effective and data-driven. Overall, the research showed that
the incorporation of ICT in education has led to a move towards learner-centered,
interactive, and personalized methods.
By analyzing the findings from the literature review and case studies, this research
highlights the common challenges, advantages, and outcomes of using ICT in content
creation, delivery, and evaluation. The exploration of major themes sheds light on
how pedagogy is changing as well as the potential of ICT to revolutionize education.
These results add to the body of current information and provide a framework for
further study and the creation of best practices in ICT-integrated schooling.
the immersive capacities of Virtual and Augmented Reality (VR/AR) with the
sophisticated algorithms of Artificial Intelligence (AI).
What sets this model apart is its utilization of VR and AR to provide learners
with an enriched, three-dimensional interactive experience, elevating the traditional
confines of classroom learning. This immersive environment facilitates a deeper
understanding and engagement with the subject matter. Additionally, the integra-
tion of AI ensures that educational content is tailored to the individual learner’s
preferences and needs, optimizing the learning experience.
The synergy of AI and VR/AR not only amplifies interactivity and personalization
but also enhances the overall effectiveness and creativity of the learning process. It
fosters collaborative efforts, stimulates critical thinking, and presents opportunities
for innovative problem-solving within an enriched educational landscape.
In an era where technological advancements are consistently reshaping educa-
tional paradigms, the Vital CRIMP Model emerges as a leading-edge initiative.
It underscores the potential of harnessing technological integration to amplify
educational efficacy and innovation.
1. Objectives of the Model
• To integrate VR, AR, AI, and ML with ICT in pedagogy;
• To provide immersive learning experiences using VR and AR;
• To personalize learning using AI and ML;
• To facilitate communication and access to information using ICT;
• To revolutionize teaching and learning.
2. Advantages Over Existing Models
• Provides immersive learning experiences using VR and AR;
• Personalizes learning using AI and ML;
• Facilitates communication and access to information using ICT;
• Adapts to the learner’s level of understanding using ML;
• Revolutionizes teaching and learning.
To build a personalized learning route for each student, ML algorithms analyze this data over time. This
guarantees that the learning process is continually improved (Fig. 1).
7 Conclusion
This research article has examined the significant impact of Web 2.0, 3.0, and 4.0 on
the production, distribution, and assessment of content in the context of ICT-driven
education. The adoption of ICT tools and technology has significantly altered how
educators create, deliver, and evaluate educational content. This study has demon-
strated the difficulties, benefits, and effects of integrating ICT in education through a
thorough assessment of the literature and analysis of case studies. The revolutionary
potential of Web 2.0, 3.0, and 4.0 to transform education is highlighted in this study
report. By using ICT in the design, delivery, and assessment of information, it is
possible to create effective, engaging, and personalized learning experiences. This
study, by synthesizing current information, offering insights into best practices, and
encouraging more research, significantly advances the topic of ICT in education. It
aids in maximizing ICT’s educational potential. This study highlights the dynamic
nature of technology and the value of ongoing professional growth, collaborative
learning settings, and effective use of cutting-edge tools. It provides intelligent guid-
ance on how educational institutions and teachers should employ Web 2.0, 3.0, and
4.0 in the classroom.
Pedagogical Explorations in ICT: Navigating the Educational … 193
Admission Prediction for Universities
Using Decision Tree Algorithm
and Support Vector Machine
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 195
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_17
196 K. Trivedi et al.
1 Introduction
In India, many students aspire to join prestigious institutions such as IITs and NITs
but often do not apply due to low confidence in their academic performance [1]. To
assist students in determining their chances of admission, various machine learning
algorithms have been developed to predict university acceptance based on factors
such as exam scores and rankings. The accuracy of these predictors is essential for
students to make informed decisions about where to apply [1–4].
This paper aims to compare the accuracy of two machine learning algorithms,
decision tree and support vector, for predicting university admission using a dataset of
previous year’s 10th, 12th, and AIEEE (All India Engineering Entrance Examination)
exam scores and university acceptances. By training the algorithms on this dataset,
we can estimate the probability of a student’s acceptance into a reputable university.
2 Literature Survey
In the research, Patel et al. [5] explain that machine learning is a process of teaching
computers new skills through training and testing datasets, allowing them to make
predictions without explicit programming under different conditions. One of the
popular machine learning techniques is Decision Trees, which have been applied
in various industries and applications such as text extraction, medical certification,
statistical analysis, and search engines. There are several decision tree algorithms
available, including ID3, C4.5, and CART, each developed based on their accuracy
and cost-effectiveness. Selecting the most appropriate algorithm for each decision-
making scenario is crucial for efficient and accurate results.
In the research paper, Arunakumari et al. [6] explore the issue of students making
mistakes in their choice of preferred colleges, which can lead to regret later on.
Factors such as faulty college analysis, ignorance, and anxious projection can all
contribute to poor decision-making. To address this issue, the researchers developed
an automated web application prediction model for a college admission system that
utilizes data analysis and data mining techniques. By carefully reviewing the cut-off
numbers from the preceding five years, a preference list is created using inputs like
rank, category, preferred branches, desired districts, and selected universities. This
model aims to help students choose a good institution before being assigned and
make informed decisions to avoid future regrets.
In the paper by Singhal et al. [7], the authors explore the advantages of using
machine learning algorithms in accurately developing applications. While various
applications exist that claim to predict a student’s chances of getting a seat in a univer-
sity in the USA, most of them lack reliability and effectiveness. However, machine
learning offers several algorithms that can help create a reliable representation. The
objective of this study is to compare and determine which machine learning algo-
rithm—multi-linear regression, polynomial regression, or random forest—provides
the most accurate results for the given dataset. The inputs for these algorithms include
the candidate’s GRE score (Graduate Record Examinations), TOEFL score (Test of
English as a Foreign Language), and CGPA. The dataset is used to train the repre-
sentation, and the output is the percentage chance of securing a seat in a reputed
university.
In their research paper, Chithra et al. [8] address the challenges faced by students
seeking higher education in the United States, particularly those pursuing a master’s
degree. The study focuses on creating a model, called UAP, that takes into account
all the important factors that affect a student’s admission to a university in the US.
These factors include test scores, statement of purpose, letter of recommendation,
and the selection of universities to apply to. The UAP model provides a user-friendly
interface for students to access and accurately predicts their chances of admission to
the universities of their choice.
In the study, Aljasmi et al. [9] emphasize the significance of precise forecasting
of student admission for educational institutions. Using multiple machine learning
algorithms such as multiple linear regression, k-nearest neighbor, random forest, and
multilayer perceptron, the researchers determined the probability of a student being
admitted to a master’s programme. The multilayer perceptron model outperformed
the other models, providing students with essential information on their admission
prospects.
In the research paper, Rajagopal [10] explores the use of logistic regression to
predict university admittance based on various variables. Specifically, the study
focuses on predicting admittance to master’s programs, which typically receive
a high volume of applications. By statistically analyzing independent factors, the
study aims to develop predictive models that can assist in prioritizing the application
screening process and ultimately admit the most qualified applicants. The success of
this approach could have significant implications for improving the efficiency and
accuracy of the graduate school admissions process.
3 Algorithm
Decision tree and support vector machine are both supervised classification algorithms, in which the
output variable is categorical (e.g., Yes or No, True or False). We explore each algorithm in detail below.
Decision trees are a popular machine learning algorithm that is widely used in data
mining and decision-making applications [12, 13]. A decision tree consists of a root
node, branches, and leaves. The root node serves as the parent of all other nodes, and
it is the topmost node in the tree. The branches represent the possible outcomes of a
decision, while the leaves represent the final outcome or result.
The algorithm selects the best features and criteria at each node to split the dataset
into subsets, aiming to maximize information gain for classification or minimize
variance for regression.
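The split-selection criterion just described can be sketched in a few lines of Python; entropy-based information gain is shown, and the helper names (`entropy`, `information_gain`) are illustrative rather than taken from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent_labels, child_splits):
    """Entropy reduction achieved by splitting the parent into child subsets."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in child_splits)
    return entropy(parent_labels) - weighted

# A perfectly separating split recovers all the entropy of the parent node.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
print(round(gain, 4))  # 1.0
```

At each node, the algorithm evaluates candidate splits and keeps the one with the highest gain; a split that leaves the class mixture unchanged yields a gain of zero.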
As further elaborated in [12], decision trees are a type of acyclic graph with a
fixed root. Each node in the tree corresponds to an attribute in the data, and the edges
indicate a decision based on that attribute. By learning basic decision rules inferred
from the data features, the decision tree method seeks to build a model that can
predict the value of a target variable.
Each leaf node in a decision tree is assigned a class corresponding to the most
appropriate target value; alternatively, the leaf may store a probability vector giving
the likelihood of each possible value of the target attribute. To classify an instance,
we traverse from the root node to a leaf node according to the outcomes of the tests
along the path [12].
The usage of decision trees is widespread in many fields, including engineering,
medicine, and finance [12, 13]. They have several benefits, such as being easy to
understand, straightforward, and able to handle both categorical and numerical data.
They can, however, overfit and be sensitive to slight adjustments in the input data.
All things considered, decision trees are a potent machine learning technique that
may be applied to a variety of classification and regression issues. They are highly
interpretable, making them suitable for explaining the reasoning behind predictions.
They are a popular choice in the field of data science and machine learning since
they are simple to understand, analyze, and apply [5, 12, 13].
The SVM algorithm finds a separating hyperplane that maximizes the margin, that is,
the distance from the nearest points of each class to the decision boundary. The
optimal hyperplane is the one with the largest margin, achieving better generalization.
SVM uses a kernel function to translate data points into a higher dimensional
space so that they can be separated by a hyperplane in the new space. It can efficiently
handle non-linear data by transforming it into a high dimensional space using kernel
functions allowing it to separate complex patterns. The kernel function can be linear,
polynomial, radial basis function (RBF), or sigmoid, and its selection depends on
the data and the problem being addressed. SVM offers several benefits over other
classification algorithms. It is effective in high-dimensional spaces and supports a
wide range of feature types; notably, it remains useful even when the number of
dimensions exceeds the number of samples. SVM is used in various domains, including
image classification and bioinformatics, and it excels in scenarios with
high-dimensional data where clear separation between classes is essential.
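As an illustration of the kernel idea (using scikit-learn on synthetic data, not the paper's dataset), the sketch below shows an RBF kernel separating two concentric rings that no linear hyperplane can:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any hyperplane in the original
# 2-D space, but separable after the RBF kernel's implicit mapping to a
# higher-dimensional feature space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel: {linear_acc:.2f}, RBF kernel: {rbf_acc:.2f}")
```

On this data the linear kernel hovers near chance while the RBF kernel fits the rings almost perfectly, which is the motivation for choosing the kernel to match the structure of the problem.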
4 Implementation
account. In contrast, we use the SVM and Decision Tree models. The Decision Tree algorithm
is chosen for its interpretability and ability to capture non-linear relationships
within the data. In addition to Decision Trees, we employ the Support Vector
Machine algorithm, which is known for its effectiveness in handling complex
decision boundaries.
3. Model Training: For both Decision Tree and SVM, the dataset is split into training
and validation sets to train the models. We employ the training data to fit the
models and iteratively refine them through cross-validation techniques. During
this phase, feature importance analysis is performed for Decision Trees to gain
insights into the admission factors.
4. Model Evaluation: The performance of the Decision Tree and SVM models is
assessed using various evaluation metrics, including accuracy, precision, recall,
and F1-score. Statistical significance
tests are employed to ascertain the differences in performance between the two
models. This stage assists in identifying the model’s assets and liabilities and
determining whether any additional improvements are required.
5. Model Deployment: Model deployment is a crucial step in turning the trained
machine learning models into practical tools for making admission predictions.
Once the model is evaluated and deemed satisfactory, it can be deployed for
college admission prediction. Users can input their academic and personal
information, and the model will predict the college to which they are most likely
to gain admission, based on the input data (Fig. 1).
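The training, evaluation, and deployment steps above can be sketched end to end; the synthetic data, the three-feature layout (10th marks, 12th marks, AIEEE rank), and the tier labels are assumptions standing in for the paper's actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-in for the dataset: 10th %, 12th %, AIEEE rank.
X = np.column_stack([rng.uniform(50, 100, n),
                     rng.uniform(50, 100, n),
                     rng.integers(1, 100_000, n)])
# Toy target: college tier decided mostly by the entrance rank.
y = np.where(X[:, 2] < 20_000, "Tier-1", "Tier-2")

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
svm = SVC().fit(X_train, y_train)

# Cross-validation on the training split and per-feature importances.
cv_mean = cross_val_score(tree, X_train, y_train, cv=5).mean()
print("CV accuracy:", round(cv_mean, 3))
print("Feature importances:", tree.feature_importances_.round(3))

# Held-out evaluation of both models.
tree_acc = accuracy_score(y_val, tree.predict(X_val))
svm_acc = accuracy_score(y_val, svm.predict(X_val))
print("DT:", round(tree_acc, 3), "SVM:", round(svm_acc, 3))

# Deployment: predict the tier for one applicant's inputs.
print(tree.predict([[91.0, 88.5, 12_000]])[0])
```

Because the toy label depends only on the rank column, the tree's feature importances concentrate on that column, mirroring the feature-importance analysis described in step 3.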
5 Results
Based on a student’s 10th- and 12th-grade marks, as well as their AIEEE rank, the
provided code conducts a classification exercise to estimate the college to which they
can be admitted. The dataset, "College Admission Prediction Dataset.csv,"
is a CSV file loaded with pandas and processed with NumPy. Over the years from
2015 to 2019, data related to various factors such as year, 10th-grade marks, 12th-
grade marks, AIEEE rank, and college choices have been collected and analyzed.
This data likely pertains to students’ academic performance and their preferences
for engineering colleges. The purpose of collecting this data could have been to
understand trends and patterns in student performance, as well as to gain insights
into the factors influencing their college choices. The metrics module from scikit-
learn is used to determine the accuracy scores of the Decision Tree Classifier and
SVM (Support Vector Machine) models for classification. The Decision Tree model’s
accuracy score is 0.9013, which shows that it does a good job of predicting the
colleges that a student would be admitted to based on the input features. The SVM
model, on the other hand, has a substantially lower accuracy score than the Decision
Tree model, at 0.5888. This may be because the dataset was small, and small datasets
are often a poor fit for the SVM model for several reasons, such as overfitting, lack
of data diversity, and reduced model complexity.
Fig. 1 Workflow of the proposed approach: understand problem, feature selection, data quality, algorithm selection, model training (with model error feedback), model evaluation, and model deployment
6 Conclusion
In summary, the study evaluated the performance of decision tree and support vector
machine classifiers on a dataset for predicting universities that a student can get
admission into based on their 10th- and 12th-grade marks and AIEEE rank. In the
evaluation of both algorithms, the Decision Tree Classifier emerged as the superior
performer, achieving an impressive accuracy score of 0.9013. In contrast, the Support
Vector Machine (SVM) Classifier lagged behind with an accuracy score of 0.5888.
These results strongly indicate that the Decision Tree Classifier is a promising and
effective technique for predicting college admissions based on the provided academic
data. However, it’s important to note that there is room for further improvement. By
expanding the dataset with a larger and more diverse pool of academic records,
the model can be exposed to a broader range of scenarios and variations. This can
lead to a more robust and accurate predictive model. Additionally, fine-tuning the
hyperparameters of the Decision Tree Classifier can help optimize its performance
even further. Hyperparameter tuning involves systematically adjusting the settings
of the model to find the configuration that yields the best results.
In summary, while the Decision Tree Classifier has demonstrated its effectiveness
in college admission prediction, ongoing efforts to enhance its accuracy through
dataset expansion and hyperparameter tuning can unlock its full potential, making it
an even more valuable tool for this application. This approach ensures that the model
continues to evolve and provides increasingly accurate predictions for prospective
college applicants.
Disclosure of Interests The authors have no competing interests to declare that are relevant to the
content of this work.
References
14. Evgeniou T, Pontil M (2001) Support vector machines: theory and applications. In: Paliouras
G, Karkaletsis V, Spyropoulos CD (eds) Machine learning and its applications. ACAI 1999.
Lecture Notes in Computer Science, 2049. Springer, Berlin, Heidelberg. https://doi.org/10.
1007/3-540-44673-7_12. ISBN: 978-3-540-42490-1
15. Nalepa J, Kawulok M (2019) Selecting training sets for support vector machines: a review.
Artif Intell Rev 52:857–900. https://doi.org/10.1007/s10462-017-9611-1
16. Joshi VA (2020) Machine learning and artificial intelligence. Springer. ISBN: 978-3-030-26624-0.
https://doi.org/10.1007/978-3-030-26622-6
Visualization and Statistical Analysis
of Research Pillar of Top Five THE
(Times Higher Education)-Ranked
Universities for the Years 2020–2023
Abstract We conducted an analysis of the research pillar in the top five universities
ranked by THE for the years 2020–2023 using data obtained from THE website.
To derive meaningful insights, we calculated the average research data for each
year across these universities. Subsequently, we compared the research data between
the 1st-ranked university and the remaining four universities, as well as between
consecutively ranked universities for each year. Our analysis demonstrated variations
over these years. Initially, there was an upward trend in average research perfor-
mance from 2020 to 2022, followed by a decline from 2022 to 2023. Interestingly,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 205
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_18
206 S. Das et al.
Keywords The World University Rankings · Research pillar · Top five universities ·
Graphical visualization · Statistical analysis · COVID-19
1 Introduction
World university ranking has become an important tool for students, researchers,
and policymakers to evaluate the quality and impact of universities [1–7]. THE
World University Rankings, an annual publication of university rankings by the
Times Higher Education, is one of the most popular, rigorous, and widely accepted
university rankings in the world [2, 3]. The data from participating institutions,
reputation survey, and other sources (e.g., Scopus database) are grouped into five
categories or “pillars”: Teaching, Research, Citations, International Outlook, and
Industry Income [3]. The universities are then finally ranked based on their Overall
scores and Z-score normalization [3].
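As a rough illustration of the Z-score step (THE's exact procedure may differ in detail), the sketch below standardizes a set of pillar scores to zero mean and unit variance; the example values are the 2023 research scores from Table 1:

```python
import numpy as np

def z_score(scores):
    """Standardize raw scores to mean 0 and standard deviation 1."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# 2023 research scores of the top five universities (Table 1).
z = z_score([99.7, 99.0, 99.5, 96.7, 93.6])
print(z.round(2))
```

Standardizing each indicator this way puts pillars measured on different scales onto a common footing before they are combined into an Overall score.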
The research pillar plays a significant role in determining the overall ranking
of universities [1, 3, 4, 8–18]. It reflects the contribution of universities to the
advancement of knowledge and innovation in their respective fields [14]. It also
reflects the universities’ commitment to conducting high-quality research through
industry-university partnerships and international collaboration that have a significant
impact on the economy and society [15–18]. Reputation survey, research income per
academic staff (faculty member), and the number of publications per staff including
the researcher are the three THE performance indicators (metrics) for the research
pillar [3]. The volume of research output is measured by the number of research publi-
cations produced by the university [3]. The income factor looks at the total research
income generated by the university, which includes research grants and contracts [3].
The reputation of research output is assessed through a survey of academics who are
asked to rate the research quality of institutions globally [3].
We presented graphical presentations and statistical analysis of the research pillar
of the top five THE ranked universities for the years 2020–2023, using data obtained
from its website [3]. We observed notable variation in the average value of the
research pillar over the years. We further compared the pillar between the 1st and other
four ranks (universities), and between consecutive rankings (universities) each year,
and detected fluctuations among the ranks (universities). We noted that occasionally
lower ranked universities performed better than higher ranked ones. Our analyses also
included the effect of inclusion or exclusion of a university on the research pillar and
revealed substantial fluctuations of the average value. The unprecedented worldwide COVID-19 pandemic during this period may also have influenced these variations.
The overall ranks and data of the research pillar of the top five ranked universities
for the period 2020–2023 are displayed in Table 1 along with the calculated average
(mean), median, and standard deviation (σ). It’s important to note that this ranking
is derived from the Overall score, which encompasses all five pillars and involves
Z-score normalization [3]. We observed that five of the following six universities
always constitute the band of the top five: University of Oxford (Oxford), University
of Cambridge (Cambridge), Harvard University (Harvard), Stanford University
(Stanford), California Institute of Technology (CalTech), and Massachusetts Institute
of Technology (MIT) [3]. Harvard ranked 7th in 2020. Hence, we included data for
the 6th-ranked university, and for Harvard, which occupied the 7th rank solely in
2020, in Table 1 to facilitate a comprehensive analysis. Table 2 shows the research
differences (gaps) between the 1st and each of the remaining four ranks (universities)
(i.e., R12, R13, R14, and R15), as well as between consecutive ranks (universities)
(i.e., R12, R23, R34, and R45).
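Both families of gaps can be computed directly from the research scores ordered by rank; a minimal sketch reproducing the 2023 values of Table 2 (the function name is illustrative):

```python
def research_gaps(scores):
    """Given research scores ordered by overall rank (1st first), return the
    gaps from the 1st rank to each other rank, and the gaps between
    consecutively ranked universities."""
    from_first = [scores[0] - s for s in scores[1:]]
    consecutive = [a - b for a, b in zip(scores, scores[1:])]
    return from_first, consecutive

# Research scores of ranks 1-5 in 2023 (Table 1).
r1k, r_consec = research_gaps([99.7, 99.0, 99.5, 96.7, 93.6])
print([round(g, 1) for g in r1k])       # [0.7, 0.2, 3.0, 6.1]
print([round(g, 1) for g in r_consec])  # [0.7, -0.5, 2.8, 3.1]
```

Note that a consecutive gap can be negative, which is exactly the case where a lower-ranked (by overall score) university outperforms a higher-ranked one in the research pillar.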
The trends in average research, median, and standard deviation in Table 1 and
Fig. 1, and the accompanying text suggest that average research increased from 2020
to 2022 but then decreased in 2023.
In 2020, Harvard with a research score of more than 98 was not part of the top five,
while MIT with a research score of 92.40 ranked 5th. This made the widest variation
of research output between 1st ranked Oxford and 5th ranked MIT, and therefore the
maximum standard deviation (σ) of 2.78 and median of 97.2. In 2021, MIT improved
its research score to 94.4. Research scores of the other four universities remained
almost the same though Harvard with 98.80 replaced Cambridge with a score of 99.2
and occupied the 3rd rank. The average research score slightly improved from 2020
to 2021. However, the median slightly decreased due to the entry of Harvard and exit
of Cambridge (left panel of Fig. 1). The σ also further decreased in 2021 due to a
smaller variation of research scores between 1st and 5th ranks in comparison to the
year 2020 (right panel of Fig. 1).
In 2022, Cambridge improved its research score, from 99.2 in 2021 to 99.50 in
2022, and occupied the 5th rank by replacing MIT with the same research score
of 94.4 as in 2021. The entry of Cambridge and exit of MIT in 2022 improved the
average research score of the top five ranks and hence, the median. As a result, the
σ was the lowest ever during the considered period since the gap between the 1st
and 5th ranks was also lowest. However, in 2023, MIT re-took the 5th rank with a
Table 1 Overall rank and research pillar data for the period 2020–2023

2023
Rank   University   Research
1      Oxford       99.70
2      Harvard      99.00
3      Cambridge    99.50
4 (3)  Stanford     96.70
5      MIT          93.60
Total 488.50 | Average 97.70 | Median 99 | Stand. Dev. (σ) 2.58
6      CalTech      97.0

2022
Rank   University   Research
1      Oxford       99.60
2      CalTech      96.90
3 (2)  Harvard      98.90
4      Stanford     96.80
5      Cambridge    99.50
Total 491.70 | Average 98.34 | Median 98.9 | Stand. Dev. (σ) 1.38
6 (5)  MIT          94.4

2021
Rank   University   Research
1      Oxford       99.60
2      Stanford     96.70
3      Harvard      98.80
4      CalTech      96.90
5      MIT          94.40
Total 486.40 | Average 97.28 | Median 96.9 | Stand. Dev. (σ) 2.02
6      Cambridge    99.2

2020
Rank   University   Research
1      Oxford       99.60
2      CalTech      97.20
3      Cambridge    98.70
4      Stanford     96.40
5      MIT          92.40
Total 484.30 | Average 96.86 | Median 97.2 | Stand. Dev. (σ) 2.78
6      Princeton    96.3
7      Harvard      98.6
research score of 93.60, lower than the previous year (94.4 in 2022), and CalTech
with a research score of 97 exited from the top five. Consequently, the average
research score of the top five ranked universities in 2023 decreased, and σ again
became large due to a large gap in research score between the 1st and 5th ranks.
The median was found to be a maximum of 99 in 2023 (Table 1 and left panel of
Fig. 1). Therefore, it appears that the entry/exit of a particular university alters the
average research score of the top five. It is to be noted that average industry income
increased in 2021 (data are not shown), which may be attributed to collaborative
research efforts between universities and funding agencies in developing COVID-19
vaccines or another research during the pandemic [23–28]. Consequently, average
research output might have increased due to this factor.
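As a check, the summary rows of Table 1 can be reproduced with Python's statistics module; the tabulated σ matches the sample standard deviation truncated to two decimal places:

```python
import statistics

# 2023 research scores of the top five universities (Table 1).
research = [99.70, 99.00, 99.50, 96.70, 93.60]

total = sum(research)
mean = statistics.mean(research)
median = statistics.median(research)
stdev = statistics.stdev(research)  # sample standard deviation

# Table 1 reports Total 488.50, Average 97.70, Median 99, sigma 2.58.
print(round(total, 2), round(mean, 2), median, round(stdev, 3))  # 488.5 97.7 99.0 2.586
```

The same computation on the 2020–2022 columns reproduces the remaining averages, medians, and standard deviations of Table 1 under the same rounding convention.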
It is pertinent to note that COVID-19 impacted the academic environment and the
well-being of students [19–22]. Therefore, all three factors, inclusion/exclusion of
university, industry income, and COVID-19, have had a mixed impact on the average research score.
Table 2 Differences between the 1st and each of the remaining four universities (ranks), and between consecutively ranked universities, in the research pillar. See also Table 1

2023
Rank   University   Gap to 1st   Consecutive ranks   Gap
1      Oxford       0 (R11)      1 & 2 (R12)         0.7
2      Stanford     0.7 (R12)    2 & 3 (R23)         −0.5
3      Harvard      0.2 (R13)    3 & 4 (R34)         2.8
4 (3)  CalTech      3 (R14)      4 & 5 (R45)         3.1
5      MIT          6.1 (R15)    –                   –

2022
Rank   University   Gap to 1st   Consecutive ranks   Gap
1      Oxford       0 (R11)      1 & 2 (R12)         2.7
2      CalTech      2.7 (R12)    2 & 3 (R23)         −2
3 (2)  Harvard      0.7 (R13)    3 & 4 (R34)         2.1
4      Stanford     2.8 (R14)    4 & 5 (R45)         −2.7
5      Cambridge    0.1 (R15)    –                   –

2021
Rank   University   Gap to 1st   Consecutive ranks   Gap
1      Oxford       0 (R11)      1 & 2 (R12)         2.9
2      Stanford     2.9 (R12)    2 & 3 (R23)         −2.1
3      Harvard      0.8 (R13)    3 & 4 (R34)         1.9
4      CalTech      2.7 (R14)    4 & 5 (R45)         2.5
5      MIT          5.2 (R15)    –                   –

2020
Rank   University   Gap to 1st   Consecutive ranks   Gap
1      Oxford       0 (R11)      1 & 2 (R12)         2.4
2      CalTech      2.4 (R12)    2 & 3 (R23)         −1.5
3      Cambridge    0.9 (R13)    3 & 4 (R34)         2.3
4      Stanford     3.2 (R14)    4 & 5 (R45)         4
5      MIT          7.2 (R15)    –                   –
Fig. 1 Left panel: Variations of average and median of research output of the top five ranks. Right panel: Variation of standard deviation for the period 2020–2023. Solid lines are to guide the eyes
Five of the six universities always constitute the top five ranks, as mentioned at the
beginning of this article.
The upper panel of Fig. 2 shows the variation of research scores across all six
universities (See also Table 3a).
The lower panel of Fig. 2 (Table 3b) shows the vertical variation of research
performance among five ranks in a particular year, and the horizontal variation of
research performance of a particular rank across all years. It clearly indicates that
the research score fluctuated among the universities and ranks, and sometimes lower
ranked (in overall score) university performed better than higher ranked ones in the
research pillar.
The left panel of Fig. 3 displays the variation in research difference (R) between
the 1st rank and the other four ranks (labeled as R12, R13, R14, and R15) for each
year from 2020 to 2023. The values are positive, with R13 being the minimum and
R15 being the maximum, except for 2022 where R13 (0.7) is higher than R15 (0.1)
Table 3a Research performance of all six universities for the period 2020–2023, as the group of
top five comprises five of the six universities listed. Bold values indicate when a particular
university was excluded from the top five in overall scores (rankings) for that year. See also Table 1
Year   Oxford   Harvard   Cambridge   Stanford   MIT    CalTech
2020 99.6 98.6 98.7 96.4 92.4 97.2
2021 99.6 98.8 99.2 96.7 94.4 96.9
2022 99.6 98.9 99.5 96.8 94.4 96.9
2023 99.7 99 99.5 96.7 93.6 97.0
Average 99.625 98.825 99.225 96.65 93.7 97.0
Variation 0.1 0.4 0.8 0.4 2.0 0.3
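The Average and Variation rows of Table 3a can be reproduced directly from the per-university scores, interpreting Variation as the range (maximum minus minimum), which matches the tabulated values:

```python
# Research scores per university over 2020-2023 (Table 3a).
scores = {
    "Oxford":    [99.6, 99.6, 99.6, 99.7],
    "Harvard":   [98.6, 98.8, 98.9, 99.0],
    "Cambridge": [98.7, 99.2, 99.5, 99.5],
    "Stanford":  [96.4, 96.7, 96.8, 96.7],
    "MIT":       [92.4, 94.4, 94.4, 93.6],
    "CalTech":   [97.2, 96.9, 96.9, 97.0],
}

for name, s in scores.items():
    avg = sum(s) / len(s)
    variation = max(s) - min(s)
    print(f"{name}: average={avg:.3f}, variation={variation:.1f}")
```

This makes the contrast in Table 3a concrete: MIT shows the largest variation (2.0) while Oxford is nearly flat (0.1) across the four years.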
Fig. 2 The variation of the research pillar across all six universities (upper panel) and five ranks (lower panel)
as tabulated in Table 2. Both R12 and R13 decreased continuously over the years.
R12 decreased from its 2020–2022 average of approximately 2.6 (i.e., 2.4 in 2020,
2.9 in 2021, and 2.7 in 2022) to 0.7 in 2023 (left panel of Fig. 3 and Table 2). The
right panel of Fig. 3 demonstrates the year-wise fluctuation of the average research
right panel of Fig. 3 demonstrates the year-wise fluctuation of the average research
gap between the 1st ranked and the remaining four ranks for the period. It clearly
indicates that the yearly average research gap was minimal in 2022. This could be
attributed to the exclusion of MIT from the band of the top five in 2022. R13 decreased at different rates over the period, as shown in Fig. 4.
Fig. 3 Left panel: Research performance gap between the 1st and remaining ranks for 2020–2023. It is always positive. Solid lines guide the eyes. The average R12 over 2020–2022 (≈ 2.6) and R12 in 2023 (0.7) are also indicated. R12 decreased faster in 2023 than in 2020–2022, as did R13. R14 remained stable. R15 was highest in 2020 and lowest in 2022 but increased in 2023 compared to 2021 and 2022. Right panel: Year-wise variation of the average research gap (R) between the 1st and other ranks. The yearly average research gap was minimum in 2022, attributed to the exclusion of MIT from the top five ranks
1.0
Decrease of R13 over the period 2020-2023
0.8
m R13
1 = - 0.1
0.6
R13
0.4
mR13
2 = - 0.5
0.2
Fig. 4 Research gap between the 1st and 3rd ranks (R13) over the years 2020–2023. It decreases at
different rates. The blue solid line presents the linear fit for 2020–2022 with a slope (rate) of −0.1,
and the red solid line shows the linear fit for 2022–2023 with a slope of −0.5. The rate of decrease
of R13 from 2022 to 2023 is faster than over the period 2020–2022
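The piecewise rates reported in Fig. 4 are ordinary least-squares slopes. A minimal sketch follows; the yearly R13 values are hypothetical, chosen only to be consistent with the reported slopes (−0.1 and −0.5) and the 2020–2023 average of 0.65, since the paper's actual data points are in Table 2 (not reproduced here):

```python
# Ordinary least-squares slope, as used for the linear fits in Fig. 4.
# The r13 values below are hypothetical illustration data.

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

years = [2020, 2021, 2022, 2023]
r13 = [0.9, 0.8, 0.7, 0.2]                 # hypothetical values
m1 = ols_slope(years[:3], r13[:3])         # fit over 2020-2022, slope ~ -0.1
m2 = ols_slope(years[2:], r13[2:])         # fit over 2022-2023, slope ~ -0.5
```

Fitting the two sub-periods separately, rather than one line over all four years, is what exposes the faster post-2022 decline.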
[Fig. 5 plot area: R23 vs. year for 2020–2023; R23 remains negative, reaching about −2.1, with a 2020–2022 average of ≈ −1.9]
Fig. 5 Research difference between the 2nd and 3rd ranks (R23) for the period 2020–2023
Fig. 6 Comparison of R12 and R23. They follow non-linear trends, unlike R13
fluctuated between 2 and 3, whereas the research gap between the 4th and 5th ranks
varied from −2.7 to 4, as indicated in Table 2, revealing that the 5th-ranked
university sometimes performed better in the research pillar than the 4th-ranked one.
It is important to emphasize that our analysis encompassed the Overall scores, key
statistics, and five pillars of the top five ranked universities for the period 2020–
2023. However, in this presentation we have focused exclusively on the research
pillar owing to scope constraints. It is worth noting that we observed fluctuations
in average values across the other pillars over the years, and that the disparities
among ranks (universities) were not uniform. Furthermore, we identified instances
where lower-ranked universities outperformed higher-ranked ones in specific pillars.
This trend was also observed beyond the top five ranked universities, as evident from
the THE website [3].
The COVID-19 pandemic severely affected many parts of the world, economies,
and societies [20–22, 29, 35–40]. The pandemic and the associated lockdown,
societal, and economic factors had a mixed impact on the research pillar, as well
as on the other four pillars, and therefore on the rankings, since our considered
period coincided with it, as discussed above. The world is gradually recovering from
the COVID-19 pandemic and the economic slowdown [35]. Universities are gradually
reopening their campuses as travel and other restrictions are lifted, while also facing
the economic, societal, and other challenges inflicted by COVID-19 [39]. Priorities
are expected to shift [39].
3 Conclusion
Disclosure of Interests The authors have no competing interests to declare that are relevant to the
content of this article.
References
8. Dias A, Selan B (2023) How does university-industry collaboration relate to research resources
and technical-scientific activities? An analysis at the laboratory level. J Technol Transf 48:392–
415. https://doi.org/10.1007/s10961-022-09921-5
9. University Industry Collaboration – The vital role of tech companies’ support for higher
education research, THE Consultancy Report, THE (2020). https://www.timeshighereducation.com/sites/default/files/the_consultancy_university_industry_collaboration_final_report_051120.pdf
10. Fabbri A, Lai A, Grundy Q, Bero AL (2018) The influence of industry sponsorship on the
research agenda: a scoping review. Am J Public Health 108(11):e9–e16. https://doi.org/10.
2105/AJPH.2018.304677
11. Valero A, Van Reenen J (2019) The economic impact of universities: evidence from across the
globe. Econom Educ Rev 68:53–67. https://doi.org/10.1016/j.econedurev.2018.09.001
12. Selten F, Neylon C, Huang KC, Groth P (2020) A longitudinal analysis of university rankings.
Quantit Sci Stud 1(3):1109–1135. https://doi.org/10.1162/qss_a_00052
13. Sjöö K, Hellström T (2019) University–industry collaboration: a literature review and synthesis.
Ind High Educ 33(4):275–285. https://doi.org/10.1177/0950422219829697
14. Hessels KL, Mooren CE, Bergsma (2021) What can research organizations learn from their
spin-off companies? Six case studies in the water sector. Ind Higher Educ 35(3):188–200.
https://doi.org/10.1177/0950422220952258
15. Odei AM, Novak P (2023) Determinants of universities’ spin-off creations. Econom Res
36(1):1279–1298. https://doi.org/10.1080/1331677X.2022.2086148
16. Robinson-Garcia N, Torres-Salinas D, Herrera-Viedma E, Docampo D (2019) Mining univer-
sity rankings: publication output and citation impact as their basis. Res Eval 28(3):232–240.
https://doi.org/10.1093/reseval/rvz014
17. Adams J (2012) The rise of research networks. Nature 490:335–336. https://doi.org/10.1038/
490335a
18. Adams J (2013) The fourth age of research. Nature 497:557–560. https://doi.org/10.1038/497
557a
19. The impact of coronavirus on higher education. https://www.timeshighereducation.com/hub/
keystone-academic-solutions/p/impact-coronavirus-higher-education
20. Johnson PT, Feeney KM, Jung H, Frandell A, Caldarulo M, Michalegko L, Islam S, Welch WE
(2021) COVID-19 and the academy: opinions and experiences of university-based scientists
in the U.S. Human Soc Sci Commun 8:146. https://doi.org/10.1057/s41599-021-00823-9
21. Reyes-Portillo AJ, Warner MC, Kline AE, Bixter TM, Chu CB, Miranda R, Nadeem E, Nick-
erson A, Peralta OA, Reigada L, Rizvi SL, Roy KA, Shatkin J, Kalver E, Rette D, Denton E,
Jeglic LE (2022) The psychological, academic, and economic impact of COVID-19 on college
students in the epicenter of the pandemic. Emerg Adulthood 10(2):473–490. https://doi.org/
10.1177/21676968211066657
22. Gómez-García G, Ramos-Navas-Parejo M, de la Cruz-Campos JC, Rodríguez-Jiménez C
(2022) Impact of COVID-19 on university students: an analysis of its influence on
psychological and academic factors. Int J Environ Res Public Health 19:10433. https://doi.org/
10.3390/ijerph191610433
23. Jack P (2022) Covid hit to university-industry collaboration in UK ‘limited’. THE.
https://www.timeshighereducation.com/news/covid-hit-university-industry-collaboration-uk-limited
24. Webster P (2020) How is biomedical research funding faring during the COVID-19 lockdown?
Nat Med. https://doi.org/10.1038/d41591-020-00010-4
25. Editorial (2020) Safeguard research in the time of COVID-19. Nat Med 26:443. https://doi.
org/10.1038/s41591-020-0852-1
26. Crow MM et al. (2020) Support U.S. research during COVID-19. Science 370(6516):539–540.
https://doi.org/10.1126/science.abf1225
27. Mervis J (2020) U.S. academic research funding stays healthy despite pandemic. Science
368(6497):1298. https://doi.org/10.1126/science.368.6497.1298
28. Ulrichsen CT (2021) Innovating during a crisis-the effects of the COVID-19 pandemic on how
universities contribute to innovation, National Centre for Universities and Business and Univer-
sity Commercialization & Innovation (UCI) Policy Evidence Unit. https://www.ifm.eng.cam.
ac.uk/uploads/UCI/knowledgehub/documents/2021_UCI_Covid_Universities_Report2.pdf
29. Keshky ESEM, Basyouni SS, Sabban AMA (2020) Getting through COVID-19: the pandemic’s
impact on the psychology of sustainability, quality of life, and the global economy – a systematic
review. Front Psychol 11:585897. https://doi.org/10.3389/fpsyg.2020.585897
30. Woolston C (2021) Job losses and falling salaries batter US academia. Nature. https://doi.org/
10.1038/d41586-021-01183-9
31. Ahlburg AD (2020) Covid-19 and UK Universities. Polit Quarter 91(3):649–654. https://doi.
org/10.1111/1467-923X.12867
32. Gilbert N (2021) UK academics seethe over universities’ cost-cutting moves. Nature 596:307–
308. https://doi.org/10.1038/d41586-021-02163-9
33. Witze A (2020) Universities will never be the same after the coronavirus crisis: How virtual
classrooms and dire finances could alter academia: part 1 in a series on science after the
pandemic. Nature 582:162–164. https://doi.org/10.1038/d41586-020-01518-y
34. Horowitz MJ, Brown A, Minkin R (2021) A year into the pandemic, long-term financial
impact weighs heavily on many Americans. Pew Research Center. https://www.pewresearch.org/social-trends/2021/03/05/a-year-into-the-pandemic-long-term-financial-impact-weighs-heavily-on-many-americans/
35. Ramlo ES (2021) Universities and COVID-19 pandemic: comparing views about how to address
the financial impact. Innov Higher Educ 46:777–793. https://doi.org/10.1007/s10755-021-095
61-x
36. Harper L, Kalfa N, Beckers AMG, Kaefer M, Nieuwhof-Leppink JA, Fossum M, Herbst WK,
Bagli D (2020) The impact of COVID-19 on research. J Pediatr Urol 16(5):715–716. https://
doi.org/10.1016/j.jpurol.2020.07.002
37. Weiner LD, Balasubramaniam V, Shah IS, Javier RJ (2020) COVID-19 impact on research,
lessons learned from COVID-19 research, implications for pediatric research. Pediatr Res
88:148–150. https://doi.org/10.1038/s41390-020-1006-3
38. Sikimic V (2022) How to improve research funding in academia? lessons from the COVID-19
crisis. Front Res Metrics Anal 7:777781. https://doi.org/10.3389/frma.2022.777781
39. Arday J (2022) Covid-19 and higher education: the Times They Are A’Changin. Educ Rev
74(3):365–377. https://doi.org/10.1080/00131911.2022.2076462
40. Munblit D, Nicholson RT, Needham MD, Seylanova N, Parr C, Chen J, Kokorina A, Sigfrid L,
Buonsenso D, Bhatnagar S, Thiruvengadam R, Parker MA, Preller J, Avdeev S, Klok AF, Tong
A, Diaz VJ, Groote DW, Schiess N, Akrami A, Simpson F, Olliaro P, Apfelbacher C, Rosa GR,
Chevinsky RJ, Saydah S, Schmitt J, Guekht A, Gorst LS, Genuneit J, Reyes FL, Asmanov A,
O’Hara EM, Scott TJ, Michelen M, Stavropoulou C, Warner OJ, Herridge M, Williamson RP
(2022) Studying the post-COVID-19 condition: research challenges, strategies, and importance
of Core Outcome Set development. BMC Med 20:50. https://doi.org/10.1186/s12916-021-022
22-y
Assessing Machine Learning Algorithms
for Customer Segmentation:
A Comparative Study
K. S. Rao
Department of Computer Science & Engineering, B V Raju Institute of Technology, Narsapur,
Medak (District), Secunderabad, Telangana 502313, India
e-mail: [email protected]
S. Gopathoti · A. Ramakrishna
Malla Reddy College of Engineering, Dhulapally, Secunderabad, Telangana 500100, India
e-mail: [email protected]
A. Ramakrishna
e-mail: [email protected]
P. Gupta
Atal Bihari Vajpayee School of Management and Entrepreneurship, Jawaharlal Nehru University,
Delhi 110067, India
S. Potluri (B)
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Bowrampet, Hyderabad, Telangana 500043, India
e-mail: [email protected]; [email protected]
G. S. Reddy
Department of Data Science and Artificial Intelligence, Faculty of Science and Technology
(IcfaiTech), ICFAI Foundation for Higher Education, Hyderabad 501203, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_19
222 K. S. Rao et al.
1 Introduction
With the increasing utilization of the internet for online marketing, there has been
exponential growth in customer data. In today’s competitive landscape, businesses
strive to achieve objectives such as maximizing sales and profits, minimizing costs, and
enhancing customer and market satisfaction. However, failing to understand
and learn from customers can lead to failure. To effectively comprehend the
market and customers, leveraging the abundance of available data becomes crucial.
Customer segmentation is a method that can address this challenge. By dividing
customers into distinct groups based on specific traits, businesses can cluster and
group data to identify common characteristics. This enables effective communication
with different customer groups, increasing the likelihood of successful customer
engagement and purchases. For example, businesses can use social media to target
and market their brand to the teenage demographic [1–3].
1.1 Methods
2 Related Work
and accuracy. This report provides insights into various algorithms that enhance
segmentation efficiency and compares their performance to determine the most effec-
tive algorithm for our specific customer data set. Every customer is different, and
every customer journey is diverse, so a single method often isn’t going to fit all.
This is where customer segmentation becomes a valuable process [7–9]. However,
if customer segmentation is done well, it brings various commercial benefits. A
well-executed customer segmentation exercise, for example, can have a measurable
impact on your operating outcomes by:
• Improving the overall quality of your goods;
• Keeping your marketing message focused;
• Enabling your sales team to explore more high-percentage offers;
• Increasing the quality of revenue.
K-Means clustering categorizes data into a set number of clusters. The letter “K”
denotes the pre-set number of clusters to be generated. This centroid-based
methodology pairs each cluster with a centroid. The underlying objective is
to reduce the distance between each data point and its cluster centroid. The model
divides unlabeled raw data into clusters and repeats the procedure until the best
clusters are found [10].
K-means is the simplest and most popular unsupervised machine learning
algorithm; it iteratively partitions the dataset into non-overlapping subgroups.
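The K-means loop just described (assign each point to its nearest centroid, then move each centroid to its cluster mean) can be sketched in pure Python. The attribute names follow the paper's dataset, but the customer values and the fixed initial centroids below are invented for illustration:

```python
# Minimal K-means (Lloyd's algorithm) sketch on hypothetical
# (annual income, spending score) pairs. Fixed initial centroids
# keep the run deterministic; a real run would use k-means++ seeding.

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else cent
            for cl, cent in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (cust_annual_income in $k, cust_spending_score)
customers = [(15, 80), (18, 75), (70, 20), (75, 15), (40, 50)]
centroids, clusters = kmeans(customers, centroids=[(10, 90), (80, 10)])
```

With these toy points the loop converges after one pass, separating the high-spending low-income group from the low-spending high-income group.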
3 Model Comparison
We used the “Malls Customer data” dataset of 2000 records with cust_Id,
cust_gender, cust_age, cust_annual_income, and cust_spending_score as the attributes.
Pairwise correlation and exploratory data analysis of all columns in the data frame
show that all of these factors are statistically significant with respect to the spending
score. A comparison of the performance of the various clustering models for the
given data is presented below.
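The pairwise-correlation check mentioned above amounts to computing Pearson correlations against the spending score. A minimal sketch follows; the column names come from the dataset description, while the five sample rows are hypothetical:

```python
# Pearson correlation sketch for the EDA step. Column names follow the
# paper's dataset description; the sample values are hypothetical.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cust_annual_income = [15, 18, 40, 70, 75]      # $k, hypothetical
cust_spending_score = [80, 75, 50, 20, 15]     # hypothetical
r = pearson(cust_annual_income, cust_spending_score)
# Strongly negative here only because the toy rows were built that way.
```

In practice the same check would be run for every attribute column against cust_spending_score before clustering.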
Step 9: Assign the same color to the boundary points based on the nearest core
point.
Step 10: Repeat Step 5 to Step 9 for the optimal number of clusters.
Step 11: Stop.
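Steps 9 and 10 above, a DBSCAN-style pass that attaches each boundary point to the cluster of its nearest core point, can be sketched as follows. The core points and their labels are hypothetical inputs here, since the earlier steps that produce them are not shown in this excerpt:

```python
# Sketch of Step 9: give each boundary point the cluster label of its
# nearest core point. Core points and labels are hypothetical inputs.

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_boundary(core_points, core_labels, boundary_points):
    labels = []
    for b in boundary_points:
        i = min(range(len(core_points)), key=lambda i: dist2(b, core_points[i]))
        labels.append(core_labels[i])
    return labels

cores = [(1.0, 1.0), (8.0, 8.0)]       # hypothetical core points
core_labels = [0, 1]                   # their cluster labels
boundary = [(1.5, 0.5), (7.0, 8.5)]    # hypothetical boundary points
labels = assign_boundary(cores, core_labels, boundary)
```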
4 Results Comparison
Results are plotted in Figs. 1, 2, 3 and 4 with respect to age versus spending score
and annual income versus spending score for the given algorithms.
To analyze customer behavior, various significant elements are recorded by the
research community. Our recent research reveals that the customer’s age is
evidently the most essential element among them, and it can help us determine
the customer’s spending score. Customers aged between 20 and 35 years (young)
spend more time identifying and choosing products regardless of their annual
income. It is clear that the customers of the red
cluster have the lowest income and lowest spending score and the customers of the
blue cluster have the highest income and highest spending score in Fig. 5. Cluster-0
has a low spending score with low annual income. Cluster-1 has a high spending score
with higher annual income. Cluster-2 has an average spending score with an average
annual income. Cluster-3 has a low spending score with annual income just greater
than the average. Cluster-4 has a high spending score and high income compared
5 Conclusion
References
1. Vaidisha Mehta RMSV (2021) A survey on customer segmentation using machine learning
algorithms to find prospective clients. In: 2021 9th international conference on reliability,
infocom technologies and optimization, vol 1, p 4
2. Camilleri MA (2017) Market segmentation, targeting and positioning, Chapter 4. Springer,
Cham, Switzerland
3. Jüttner U, Michel S, Maklan S, Macdonald EK, Windler K (2017) Identifying the right solution
customers: a managerial methodology. Ind Mark Manage 60:173–186
4. Thakur R, Workman L (2016) Customer portfolio management (CPM) for improved customer
relationship management (CRM): are your customers platinum, gold, silver, or bronze? J Bus
Res 69(10):4095–4102
5. Smith W (1956) Product differentiation and market segmentation as alternative marketing
strategies. J Mark 1(21):3–8
6. Bahuguna S, Singh V, Choudhury T, Kansal T (2018) Customer segmentation using K-means
Clustering, IEEE (1):4
Assessing Machine Learning Algorithms for Customer Segmentation … 229
7. Meghana NM (2016) Demographic strategy of market segmentation. Indian J Appl Res 6(5):6
8. Liu H, Huang Y, Wang Z, Liu K, Hu X, Wang W (2019) Personality or value: a comparative study
of psychographic segmentation based on an online review enhanced recommender system.
MDPI
9. Goyat S (2011) The basis of market segmentation: a critical review of literature. Eur J Bus
Manage 3
10. Susilo WH (2016) An impact of behavioral segmentation to increase consumer loyalty: empir-
ical study in higher education of postgraduate institutions at Jakarta. In: 5th international
conference on leadership, technology, innovation and business management
Genre Classification of Movie Trailers
Using Audio and Visual Features:
A Comparative Study of Machine
Learning Algorithms
Abstract Movie trailers are a crucial marketing tool for the film industry and are
often used to generate audience interest and anticipation. Automatic genre classifi-
cation of movie trailers can assist filmmakers in targeting their intended audience
and help viewers in deciding which films to watch. This research paper aims to
investigate the effectiveness of various machine learning algorithms for the classi-
fication of movie genres based on audio and visual features extracted from movie
trailers. We compare the performance of several classifiers, including Support Vector
Machines (SVM), Random Forest (RF), Naive Bayes (NB), and K-Nearest Neighbors
(KNN) on a dataset of movie trailers belonging to five different genres: action,
comedy, drama, horror, and thriller. We extract both audio and visual features from
the trailers, including spectrogram features, pitch, loudness, brightness, contrast,
and color histograms. We then use these features to train and evaluate the different
classifiers. Moreover, we observed that combining both audio and visual features
improves the overall accuracy of genre classification. Our study contributes to the
field of movie genre classification by providing a comparative analysis of different
machine learning algorithms for the classification of movie trailers based on both
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_20
232 V. Vanarote et al.
audio and visual features. The findings of this research can be applied in various
domains, such as movie recommendation systems, marketing strategies, and content
analysis.
1 Introduction
With numerous films being produced each year, the film industry has long been a vital
component of the entertainment industry [1]. Movie trailers, which offer a glimpse
into the plot of the film, are crucial in luring audiences to the theatre. For viewers to
determine whether they want to watch a movie or not, they must first understand its
genre [2, 3]. However, determining a movie’s category by hand can be laborious and
arbitrary. Using machine learning algorithms to automatically categorise movies into
various genres based on their audio and visual characteristics has gained popularity
in recent years [4, 5]. These algorithms can precisely determine the genre of a movie
trailer by examining the music, dialogue, and visual components [6].
This study seeks to investigate the efficiency of various machine learning algorithms
for categorising the genre of movie trailers [7, 8]. Audio and visual features are
extracted from a dataset of movie trailers drawn from various categories [9]. The
efficacy of machine learning algorithms for genre classification, such as decision
trees, random forests, support vector machines, and neural networks, will then
be compared [10, 11]. The findings of this research could be put to use in the movie
business because genre classification helps movie marketers and producers target the
right audience for their films [12, 13]. Additionally, this study may open the door
for future advancements in automated material classification and analysis in other
spheres of the entertainment business.
2 Literature Survey
The paper entitled “Movie Genre Classification Using SVM with Audio and Video
Features” by Huang and Wang [14] presents a method for classifying movies into
different genres using both audio and video features. The significance of catego-
rizing movies by genre and earlier research in this field is covered in the opening
section of the article. In the suggested technique, movie audio and video features are
extracted and then support vector machine classifiers are used to categorise the
movies into various genres. The video features include motion, colour, and mate-
rial features, while the audio features include energy, zero-crossing rate, and Mel-
frequency cepstral coefficients. The authors compare the performance of their tech-
nique to other cutting-edge methods while evaluating it on a dataset of 900 movies.
They claim that their technique has an accuracy of 75.1%, which is higher than the
Genre Classification of Movie Trailers Using Audio and Visual Features … 233
best-performing earlier method. Overall, the paper indicates that using both audio
and video features for movie genre classification is effective and that the suggested
technique performs better than earlier approaches.
The paper entitled “On the Use of Synopsis-based Features for Film Genre Classi-
fication” by Portolese and Feltrin [15] proposes a new approach to film genre classifi-
cation that uses synopsis-based features. The paper makes the case that conventional
approaches to categorising film genres, which depend on audio and visual char-
acteristics, have trouble capturing the narrative and semantic content of a movie.
Portolese suggests a technique that utilises tools for natural language processing
to extract features from movie synopses in order to get around this limitation. The
study specifically combines text-based features like word frequency and part-of-
speech tags with semantic features like named entities and sentiment analysis. The
proposed method is assessed in the article using a dataset of more than 3,000 films,
and its performance is contrasted with that of various baseline methods. The find-
ings demonstrate that synopsis-based features outperform conventional audio and
visual features, forecasting film genre with an accuracy of 83.8%. Overall, the article
contends that adding synopsis-based characteristics to film genre classification can
increase the precision and efficacy of current techniques.
The paper entitled “Hindi Podcast Genre Prediction using Support Vector Clas-
sifier” by Mahrishi et al. [16] aims to predict the genre of Hindi podcasts using the
Support Vector Classifier (SVC) machine learning algorithm. The research opens
with an overview of the significance of podcast genre classification and its difficul-
ties. The SVC algorithm and its uses in different fields are then briefly described
by the author. The dataset for the experiment, which consisted of Hindi podcast
transcripts labelled with their various genres, is then described in the study. The
steps made to prepare the dataset for the machine learning model are then described
by the author. The process of extracting pertinent features from the podcast tran-
scripts to feed into the SVC algorithm is then described in depth in the paper. For
this, the author combines TF-IDF characteristics and bag-of-words features. The
research then presents the experiment’s findings, which reveal that the SVC algo-
rithm predicts the genre of Hindi podcasts with an accuracy of 78.2%. An analysis
of the results and suggestions for further research in this area round out the study.
The paper entitled “A Hybrid PlacesNet-LSTM Model for Movie Trailer Genre
Classification” by Jiang and Kim [17] proposes a new approach for genre classifica-
tion of movie trailers using a combination of convolutional neural network (CNN)
and long short-term memory (LSTM) techniques. The importance of categorising
movies by genre is emphasised in the opening of the piece, along with its application
to marketing plans and recommendation engines. The author continues by outlining
the drawbacks of conventional machine learning algorithms for genre categorization
before introducing the suggested hybrid model, which combines the advantages of
CNN and LSTM. The PlacesNet CNN component, which is pre-trained on a sizable
dataset of scene recognition images, is highlighted in the detailed description of the
model design. The movie trailer’s LSTM component is used to record the temporal
relationships between frames. Utilizing a dataset of 13 movie genres, the article also
contains evaluation findings and experiments. The suggested model outperforms the
3 Methodology
The system architecture for the genre classification of movie trailers using audio and
visual features would typically involve several key components:
Data Collection: Collecting a large dataset of movie trailers with their corre-
sponding genre labels would be the first step in building the system. The dataset
would need to include both audio and visual features of the trailers, such as the
soundtracks, speech, and visual content.
Feature Extraction: The next step would be to extract relevant features from the
audio and visual content of the movie trailers. For example, audio features could
include things like tempo, beat, and loudness, while visual features could include
things like color, texture, and motion.
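As a concrete instance of a visual feature, a colour histogram can be sketched in a few lines. The 4-bin layout and the toy grayscale "frame" below are arbitrary choices for illustration, not the paper's actual extraction code:

```python
# Colour-histogram sketch: count pixel intensities into a few bins.
# The toy 2x3 grayscale "frame" and the 4-bin layout are arbitrary.

def histogram(frame, bins=4, max_val=256):
    counts = [0] * bins
    width = max_val / bins  # intensity span covered by each bin
    for row in frame:
        for px in row:
            counts[min(int(px / width), bins - 1)] += 1
    return counts

frame = [
    [10, 200, 64],
    [130, 255, 70],
]
hist = histogram(frame)  # per-band pixel counts for one frame
```

A per-trailer feature would typically average such histograms over many sampled frames, and use one histogram per colour channel rather than grayscale.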
Preprocessing: Once the features have been extracted, they may need to be prepro-
cessed to remove noise or normalize the data. This could involve techniques like
scaling, normalization, or data augmentation.
Feature Fusion: After preprocessing, the audio and visual features can be fused
together to create a unified feature set that represents both modalities.
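A minimal sketch of this fusion step, assuming per-trailer audio and visual feature vectors have already been extracted (all names and values below are hypothetical), is concatenation after min-max scaling so that neither modality dominates by sheer magnitude:

```python
# Feature-fusion sketch: min-max scale each feature across trailers,
# then concatenate audio and visual vectors. All values hypothetical.

def minmax_scale(rows):
    """Scale each feature column of a list of rows to the [0, 1] range."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]

audio = [[120.0, 0.8], [90.0, 0.2]]    # e.g. tempo, loudness (hypothetical)
visual = [[0.3, 12.0], [0.9, 4.0]]     # e.g. colourfulness, motion (hypothetical)
fused = [a + v for a, v in zip(minmax_scale(audio), minmax_scale(visual))]
```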
Model Selection: Choosing an appropriate machine learning algorithm is crit-
ical for achieving good classification performance. Popular algorithms for this task
include neural networks, support vector machines (SVMs), and random forests.
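As one deliberately simple stand-in for the classifiers listed above, a 1-nearest-neighbour rule over the fused feature vectors can be sketched as follows; the training vectors and genre labels are hypothetical:

```python
# 1-NN genre classifier sketch over fused feature vectors.
# Training vectors and genre labels are hypothetical.

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_1nn(train_x, train_y, query):
    """Return the label of the training vector closest to the query."""
    i = min(range(len(train_x)), key=lambda i: dist2(query, train_x[i]))
    return train_y[i]

train_x = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9], [0.8, 0.7, 0.2]]
train_y = ["action", "drama", "action"]
genre = predict_1nn(train_x, train_y, [0.85, 0.75, 0.15])
```

An SVM or random forest would replace the distance rule here, but the surrounding pipeline (fused features in, genre label out) is the same.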
Table 1 Genre classification of movie trailers using audio and visual features
Paper title | Authors | Methodology | Key findings
Audio-visual fusion for movie genre classification | Meng et al. (2016) | Audio-visual fusion using deep neural networks | Achieved state-of-the-art results on two datasets
A multi-modal deep learning approach for movie genre classification | Chakraborty et al. (2017) | Multi-modal deep learning using audio, visual, and textual features | Outperformed traditional machine learning approaches on a large dataset
Hierarchical deep learning for movie genre classification | Chen et al. (2017) | Hierarchical deep learning using audio and visual features | Achieved state-of-the-art results on two datasets
Multi-modal deep learning for movie genre classification using audio and visual cues | Sarker et al. (2017) | Multi-modal deep learning using audio and visual features | Outperformed traditional machine learning approaches on a large dataset
Exploring audio-visual features for movie genre classification | Li et al. (2018) | Audio-visual fusion using deep neural networks | Achieved state-of-the-art results on two datasets
Multi-modal deep learning for movie genre classification using audio, visual, and textual information | Gao et al. (2018) | Multi-modal deep learning using audio, visual, and textual features | Outperformed traditional machine learning approaches on a large dataset
Audio and visual features for movie genre classification | Zia et al. (2018) | Audio-visual fusion using deep neural networks | Achieved competitive results on two datasets
Movie genre classification based on multi-modal convolutional neural networks | Wang et al. (2019) | Multi-modal convolutional neural network using audio and visual features | Outperformed traditional machine learning approaches on a large dataset
Ensemble learning for movie genre classification using audio and visual features | Gharibshah et al. (2019) | Ensemble learning using audio and visual features | Achieved state-of-the-art results on a dataset
A comparative study of deep learning approaches for movie genre classification | Ng et al. (2020) | Comparison of different deep learning approaches using audio and visual features | Identified the best-performing models for each dataset
Fig. 1 System architecture for genre classification of movie trailers using audio and visual features
4 Discussions
Genre classification of movie trailers using audio and visual features poses several
challenges, including:
Data variability: Variations in the length, style, and format of movie trailers can
affect the consistency and quality of the data. This variability can make it challenging
to extract useful features from the trailers and can also affect how well the machine
learning algorithms perform.
Feature extraction: Extracting informative features from audio and visual input can
be difficult. Visual features like colour histograms and motion histograms may not
capture all aspects of the visual content, and audio features like MFCCs and spectral
features may not capture all aspects of the sound.
Class imbalance: The distribution of movie trailers across genres may be imbalanced
in the dataset. As a consequence, a bias towards the majority class may impact the
accuracy and performance of the machine learning algorithms.
Overfitting: Machine learning algorithms may overfit the training data, leading
to poor performance on new and unseen data. This can be particularly problem-
atic in genre classification, where the boundaries between genres can be fuzzy and
subjective.
Interpretability: Although machine learning algorithms can successfully categorise
movie trailers into different genres, it can be challenging to understand the reasoning
underlying a classification. As a result, the ability of filmmakers and other parties
to comprehend and improve the genre classification process may be hampered.
The genre classification of movie trailers using audio and visual features is an
important task in the field of multimedia content analysis. This study aims to compare
the performance of several machine learning algorithms in classifying movie trailers
into different genres based on their audio and visual features. The genre classification
of movie trailers has practical applications in the movie industry, where it can be used
to recommend movies to viewers, target advertisements, and assist in the distribution
and marketing of films. The use of machine learning algorithms in this task can lead
to more accurate and efficient genre classification, which can ultimately lead to better
recommendations for viewers and more effective marketing strategies for filmmakers.
The study used a dataset consisting of 1000 movie trailers from six different
genres, including action, comedy, drama, horror, romance, and sci-fi. The audio
and visual features extracted from the trailers were then used as inputs for several
machine learning algorithms, including decision trees, random forests, support vector
machines, k-nearest neighbors, and artificial neural networks. The performance of
these algorithms was evaluated using several metrics, including accuracy, precision,
recall, and F1-score. The results of the study showed that the SVM algorithm outper-
formed the other algorithms in terms of accuracy, precision, recall, and F1-score.
The KNN algorithm also performed well in this task. These results demonstrate that
machine learning algorithms can effectively classify movie trailers into different
genres based on their audio and visual features (Table 2).
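The evaluation protocol described above can be sketched with scikit-learn. The actual dataset, feature extraction, and tuned hyperparameters are not reproduced here, so `make_classification` stands in for the 1000 trailers' audio and visual feature vectors; this is an illustration of the comparison procedure, not the authors' pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1000 "trailers", 6 genres, 60 combined audio+visual features
X, y = make_classification(n_samples=1000, n_features=60, n_informative=30,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(max_iter=300, random_state=0),
}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    # Macro-averaged precision/recall/F1 over the six genres
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    print(f"{name:13s} acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```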
Genre classification of movie trailers is a challenging task in the field of multi-
media content analysis. The traditional approach of manually classifying movie
trailers based on their content is time-consuming and prone to errors. Therefore,
machine learning algorithms have been used to automate this process. Overall, this
study provides valuable insights into the effectiveness of different machine learning
algorithms in the genre classification of movie trailers. The findings of this study
can be used to develop more accurate and efficient genre classification systems for
movie trailers, which can ultimately lead to better recommendations for viewers and
more effective marketing strategies for filmmakers.
Genre Classification of Movie Trailers Using Audio and Visual Features … 239
Table 2 Need and discussions for the study on genre classification of movie trailers using audio
and visual features

Importance of genre classification of movie trailers: Genre classification of movie trailers is an
important task in the field of multimedia content analysis. It can help users easily find and select
movie trailers of their interest, and also help content providers improve their marketing strategies.

Dataset used in the study: The study used a dataset consisting of 1000 movie trailers from six
different genres: action, comedy, drama, horror, romance, and sci-fi.

Audio features used in the study: The audio features extracted from the trailers included MFCCs,
spectral features, and statistical features.

Visual features used in the study: The visual features included color histograms, motion
histograms, and shape features.

Machine learning algorithms used in the study: Decision trees, random forests, support vector
machines (SVMs), k-nearest neighbors (KNN), and artificial neural networks (ANNs).

Evaluation metrics used in the study: Accuracy, precision, recall, and F1-score.

Best performing algorithm: The SVM algorithm outperformed the other algorithms, achieving an
accuracy, precision, recall, and F1-score of 84.6% each.

Implications of the study: Machine learning algorithms can effectively classify movie trailers
into different genres based on their audio and visual features; the SVM algorithm is particularly
effective and can serve as a reliable tool for genre classification of movie trailers. The study has
implications for multimedia content analysis, marketing strategies, and user experience.
5 Conclusions
In conclusion, the study demonstrated that machine learning algorithms can be used to
effectively classify movie trailers into different genres based on their audio and visual
features. The study compared the performance of several algorithms, including deci-
sion trees, random forests, SVMs, KNN, and ANNs, and evaluated their performance
using several metrics, including accuracy, precision, recall, and F1-score. Overall,
the study provides valuable insights into the use of machine learning algorithms for
genre classification of multimedia content. Future research can explore the use of
additional features, such as textual features, and further evaluate the performance of
machine learning algorithms in this task.
Classifying Scanning Electron
Microscope Images Using Deep
Convolution Neural Network
Abstract The research aims to classify high-temperature materials with wide appli-
cations such as electronics, re-entry vehicles, and semiconductors. The challenge is
to extract unique features, as the images are microscopic and come at different resolu-
tions. Images captured by a Scanning Electron Microscope (SEM) are classified
according to their crystal type (SiO2, CCC, silica tile, carbon fiber, CeZrO2) using a
Convolutional Neural Network (CNN), a deep learning framework. Images obtained
by X-ray diffraction (XRD) machines are classified according to their crystal structure
(crystalline, amorphous, or tetragonal) irrespective of the material. An ensemble
CNN-based classifier is designed to train on and classify the SEM and XRD images
with high accuracy.
1 Introduction
Convolutional Neural Networks (CNNs) are loosely modeled on the network of
neurons in the human brain and follow the brain's process for classifying images. CNN
applications are seen in robot training, Facebook's photo tagging, healthcare, traffic
surveillance, security, and self-driving cars. A CNN is trained on thousands or millions
of images of the same object so that the computer can recognize the object. Alex
Krizhevsky introduced a new network, 'AlexNet', trained on the ImageNet dataset, to
classify high-resolution images into 1000 different classes using max-pooling layers
and fully connected layers.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 243
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_21
244 K. Jayaram et al.
2 Implementation
The implementation was done on a system with 128 RAM, using Python 3.0 and Java
for classifying high-temperature materials, and MATLAB (with the Parallel Computing
Toolbox and Deep Learning Toolbox add-ons) for the convolutional neural network.
Extraction of text from PDF documents, data pre-processing, and supervised clas-
sification of high-temperature materials have been implemented [17]. Whole documents
are searched for terms to create training data, clustering the materials into thermal
protection system (TPS), thermal barrier system (TBS), ultra-high-temperature ceramics
(UHTC), and electronics materials. The noun phrases in the research papers are
searched, and term sequences are summarized by comparing them with Wikipedia
entries. With these document entries, four-class clustering is performed using Linear
Discriminant Analysis (LDA) and nearest-neighbor supervised methods to extract
information about high-temperature materials. The result is a dataset with a list of
materials characterized with respect to their properties.
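As a rough illustration of this supervised text-classification step (the authors' corpus, term lists, and Wikipedia comparison are not available here), the sketch below classifies invented one-line snippets into the four material classes using TF-IDF vectors and a nearest-neighbour model; every snippet and label is a placeholder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training snippets, one per material class
docs = [
    "ablative thermal protection system tiles for re-entry vehicles",
    "thermal barrier coating on turbine blades",
    "zirconium diboride based ultra high temperature ceramic",
    "silicon wafer semiconductor device fabrication",
]
labels = ["TPS", "TBS", "UHTC", "electronics"]

# TF-IDF vectorization followed by 1-nearest-neighbour classification
clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
clf.fit(docs, labels)
pred = clf.predict(["barrier coating on turbine blades"])
print(pred)
```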
The deep convolutional neural network (CNN or ConvNet) is the most commonly
applied method for analyzing images and videos for classification or clustering. A
ConvNet is a regularized form of the multilayer perceptron, a fully connected network.
The main advantage of a CNN is that it exploits the hierarchical pattern in data,
assembling complex patterns from simpler and smaller ones. The network can therefore
reach a high scale of complexity and connectedness with better performance.
The architecture of ResNet-50 is shown in Fig. 1; the last layer of the network is a fully
connected layer with learnable weights. The first step is to load and explore the image
data as an "imageDatastore" for the defined architecture, with specific training options
that govern training. Images are labeled automatically based on folder names and are
stored in the "imageDatastore" object. An image datastore stores large image datasets
for efficient reading by the convolutional neural network during training. These steps
are then used to predict labels for new data and to calculate the classification accuracy.
Pseudocode to load and explore image data is shown in Fig. 2.
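MATLAB's "imageDatastore" derives each image's label from its parent-folder name. A minimal Python analogue of that folder-labeling convention (stdlib only; the folder layout below is a made-up example, not the authors' dataset):

```python
import tempfile
from pathlib import Path

def make_image_datastore(root):
    """Return (path, label) pairs; the label is the containing folder's name."""
    exts = {".png", ".jpg", ".jpeg", ".tif", ".bmp"}
    return [(p, p.parent.name)
            for p in sorted(Path(root).rglob("*"))
            if p.suffix.lower() in exts]

# Demo tree: root/<class-name>/sample.png, mimicking the SEM class folders
root = Path(tempfile.mkdtemp())
for cls in ["CCC", "SiO2", "silica_tile"]:
    (root / cls).mkdir()
    (root / cls / "sample.png").write_bytes(b"")

store = make_image_datastore(root)
print([(p.name, label) for p, label in store])
```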
The data is divided into training and testing datasets: 75% of the images form the
training set and 25% the testing (validation) set for each label. The convolutional
neural network architecture is then defined (shown in Fig. 3a). Once the network
structure is defined, training options are specified. The network uses stochastic
gradient descent with momentum (sgdm) with an initial learning rate of 0.01 and 4
epochs. "trainingOptions" is set so that the training cycle monitors accuracy every
epoch (shown in Fig. 3b). There are several choices of optimizer (solver), such as
’adam’ (adaptive moment estimation), ’rmsprop’ (root mean square propagation), and
’sgdm’. All three were tried; since results improved with the sgdm solver, it is used
in this image classification algorithm. The solver updates a subset of the data
parameters at every step. The data is shuffled at every epoch; the software
trains the network on the training data and calculates the accuracy on the validation
data at regular intervals during training, but weights are not updated. The network is
small; hence, the number of epochs is also small for fine-tuning and transfer learning,
as the learning is already concluded.
Fig. 2 Pseudocode to load and explore image data from the folder
The ’plots’ option in "trainingOptions" creates and
displays a plot of training metrics at every iteration, estimating the network
parameters with each gradient update. The ’ValidationData’ option performs network
validation during training every 50 iterations, calculating the root mean squared
error for regression networks. The validation loss and accuracy are the cross-entropy
loss and the percentage of images the network classifies correctly (shown in Fig. 4).
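The sgdm update used above (learning rate 0.01, 4 epochs, data shuffled every epoch) can be written out in a few lines. The sketch below applies it to a toy logistic-regression problem in NumPy; the momentum value of 0.9, the batch size, and the toy problem itself are assumptions, since the chapter states only the learning rate and epoch count.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ true_w + rng.normal(scale=0.1, size=200) > 0).astype(float)

w = np.zeros(5)
velocity = np.zeros(5)                      # momentum buffer
lr, momentum, epochs, batch = 0.01, 0.9, 4, 16

for epoch in range(epochs):
    order = rng.permutation(len(X))         # shuffle at every epoch
    for i in range(0, len(X), batch):
        idx = order[i:i + batch]
        p = 1 / (1 + np.exp(-X[idx] @ w))   # sigmoid prediction
        grad = X[idx].T @ (p - y[idx]) / len(idx)
        velocity = momentum * velocity - lr * grad   # sgdm update rule
        w += velocity

acc = np.mean(((1 / (1 + np.exp(-X @ w))) > 0.5) == y)
print(f"training accuracy after {epochs} epochs: {acc:.2f}")
```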
Fig. 4 Cross-entropy loss and accuracy of the percentage of images correctly classified
Predicting validation labels with the trained network gives the fraction of labels the
network predicts correctly; in our case, it is 98%. Validation performed at regular
intervals detects under- and over-fitting of the training data: comparing the training
loss and accuracy with the corresponding validation metrics shows whether the
network is overfitting. Passing an ’augmentedImageDatastore’ to "trainNetwork"
reduces overfitting, as it performs random transformations on the input images. With
’ExecutionEnvironment’ in ’trainingOptions’, training defaults to a Graphics
Processing Unit (GPU) if one is available for the Parallel Computing Toolbox;
otherwise, the Central Processing Unit (CPU) is used for the training progress plot.
A checkpoint is set in "trainingOptions" with ’CheckpointPath’; if training is abruptly
interrupted, it can be resumed from the last saved checkpoint, which is saved as a
".mat" file.
4 Conclusion
References
1 Introduction
V. Prakash (B)
FGM Government College, Adampur (Hisar), India
e-mail: [email protected]
D. Kumar
Department of CSE, Guru Jambheshwar University of Science Technology, Hisar, Haryana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 251
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_22
252 V. Prakash and D. Kumar
Fig. 1 An illustration of 10–20 electrode (channel) placement
is in the field of neurology for the diagnosis of epilepsy. The standard EEG study
reveals abnormalities caused by epileptic activity. This is attributed to its ability to
clearly illustrate the distinct and often rhythmic patterns that precede or coincide with
initial observable behavioral changes associated with seizures. In EEG recordings,
a channel or electrode is placed on the scalp of the subject. The International 10–20
system is a universally acknowledged approach employed to identify and position
scalp electrodes in the course of an EEG test or experiment, as depicted in Fig. 1.
Automated seizure detection systems have been developed due to the high preva-
lence of epilepsy and the overwhelming workload of human specialists in identi-
fying seizures. Out of the four classification algorithms available, namely Random
Forest (RF), Decision Tree (DT) algorithm C4.5, Support Vector Machine (SVM)
combined with Random Forest (RF), and Support Vector Machine (SVM) combined
with C4.5, the most accurate ones for seizure detection have been identified [23]. In
the research reported in [26], epileptic episodes are detected by utilizing variables
such as estimated entropy and sample entropy derived through WPD. Further, the
authors utilize Support Vector Machine (SVM) for data classification. To reduce
the number of independent variables, the authors in [25] employ WPD and Kernel
Principal Component Analysis (KPCA). Recently, considerable attention has been
focused on EEG signal processing, particularly convolutional neural networks and
other advanced deep learning methodologies, owing to their remarkable achieve-
ments [4, 20]. Although PSD is not directly associated with Machine Learning or
Deep Learning, it can be regarded as a valuable technique for pre-processing and
extracting features from EEG signals.
This research compares two groups: healthy individuals and those with epilepsy. The
EEG signal of a single subject during an epileptic seizure is depicted in Fig. 2.
An Efficient Kernel-SVM-based Epilepsy Seizure Detection … 253
The study presented in this article makes a comparison between two different
approaches to distinguishing healthy people from those who have epilepsy.
To classify seizures brought on by epilepsy, one method uses frequency-domain
features extracted from the Welch power spectral density. We determined the mean,
standard deviation, minimum, and maximum of the epoch signal fluctuations using
datasets made publicly available by the Zenodo organization. Accuracy, loss, confusion
matrix, sensitivity, and specificity were calculated and compared after training
classifiers (kernel SVM, Random Forest, Naive Bayes, Decision Tree) on these features.
The kernel support vector machine (SVM) outperforms all other classifiers on the
Nigerian dataset, with an accuracy that surpasses that of the referenced work [8]. This
research aimed to compare the efficacy of methods for differentiating between epileptic
and healthy people.
The remaining portions of the paper are organized into different sections. The
literature review of the various epilepsy classification methods is mentioned in Sect. 2.
The fundamentals of power spectral density and several classifiers are explained
in Sect. 3. Section 4 defines the dataset description and performance evaluation
measures. The suggested framework and method are shown in Sect. 5. The results are
analyzed in Sect. 6, while the discussion of the classification approach’s potential
future applications is included in Sect. 7.
2 Literature Review
Power spectral density (PSD) analysis holds great significance as a simple and essen-
tial approach for processing EEG signals in the frequency domain. Consequently, it
finds extensive application in the classification of epilepsy signals. Rajaguru et al.
[17] utilize Power Spectral Density (PSD) for feature extraction and Correlation
Dimension for epilepsy classification from EEG signals and results indicate 68.88%
accuracy with an average Performance Index of 7.69%. Donos et al. [6] propose
a simple seizure detection algorithm based on intracranial EEG and random forest
classification. The algorithm has a high sensitivity of 93.84% and a low false detec-
tion rate of 0.33/h. PSD estimation methods are a crucial aspect of analyzing EEG
signals when it comes to extracting frequency domain features. The periodogram
(PD) is an essential non-parametric method that one must utilize to estimate the PSD
[11]. In their publication [9], Ghayab et al. introduced a novel approach that incor-
porates optimal allocation techniques and spectral density estimation to analyze and
classify epileptic EEG signals. They achieved a 100% overall accuracy, surpassing
previous methods by 14.1%. Wavelet transforms extract wavelet coefficients from
EEG signals, representing both the time and frequency domains; various wavelet
transforms are used in EEG signal analysis and classification in the time-frequency
domain [26]. In [14],
authors developed an automated method to detect seizures. They used permutation
entropy to extract important attributes from EEG recordings. These features were fed
into an SVM classifier, resulting in an 86.10% accuracy. In their study, Ghayab et al.
[8] employed random sampling and feature selection to represent various combi-
nations of epileptic EEG features. They evaluated these features using an LS-SVM
classifier and achieved an impressive accuracy rate of 99.9% by identifying the most
distinguishing EEG features. Dhar and Garg [5] proposed a combined approach of
power spectral density and DWT for feature extraction and classification of epilepsy
in EEG with an accuracy of 90.1%. In paper [18], Rohira et al. highlighted the
need for automatic epilepsy prediction. Feature extraction is done using PSD and
classification of seizures is performed with Random Forest, achieving an accuracy
of over 90%. Liu et al. [12] presented an approach to detect and predict epilepsy
by combining the periodic and aperiodic elements of the EEG power spectrum.
The combined features yielded an average accuracy of 99.95% when tested on the
CHB-MIT database.
3 Background
The purpose of this section is to explain the basis for power spectral density and
classifiers for classifying seizures with epileptic features.
Power spectral density (PSD) is frequently used in the fields of signal processing and
communication systems. It quantifies the distribution of power in a signal across its
frequency spectrum, providing insights into how power is allocated among various
frequencies [21]. Within the realm of frequency analysis, PSD estimation methods
are defined as in Eqs. (1) and (2). The estimation of the PSD can be determined through
two main categories of methods: non-parametric approaches and parametric approaches.
Among the non-parametric methods, the Periodogram (PD) stands out as a significant
technique [11].

P^c(f) = \frac{1}{f_s N} \left| \sum_{n=0}^{N-1} x_n^c \, e^{-j 2\pi f n} \right|^2, \quad -\frac{f_s}{2} < f < \frac{f_s}{2}   (1)

where x_n^c represents the temporal data of channel c, consisting of N samples [2]. P_W^c
is utilized to calculate the PSD of the signal in channel c within the frequency range
w = [w_1, w_2]:

P_W^c = \frac{\sum_{f=w_1}^{w_2} P^c(f)}{\sum_{f=0}^{f_s/2} P^c(f)}   (2)
The procedure for determining the Power Spectral Density of a signal consists of
the following steps:
• Signal Preprocessing: Prepare the signal for analysis. Apply techniques like
filtering, windowing, and other methods to remove noise or artifacts that may
disrupt the Power Spectral Density estimate.
• Segmentation: The signal is partitioned into smaller segments or windows,
typically with overlap, to enhance the precision of the PSD.
• Windowing: Each segment is subjected to multiplication by a window function
to mitigate spectral leakage. Windowing reduces spectral leakage via window
functions like Hamming, Hanning, Blackman, etc.
• Discrete Fourier Transform (DFT): To transform the signal from the time domain
to the frequency domain, it is imperative that the Discrete Fourier Transform is
applied to every segmented window of the signal.
• Power Calculation: For each segment, the squared magnitude of the DFT output
is used to calculate the power density.
• Averaging: To estimate the power distribution over frequencies of the signal, all
PSD estimates are averaged.
The PSD of a single subject during an epileptic seizure is depicted in Fig. 3.
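The six steps above are what Welch's method implements. A short SciPy sketch on a synthetic trace sampled at 128 Hz (the dataset's sampling rate; the 10 Hz rhythm and noise level are invented for illustration):

```python
import numpy as np
from scipy.signal import welch

fs = 128                                   # Hz, as in the dataset
t = np.arange(0, 10, 1 / fs)               # 10 s of signal
x = np.sin(2 * np.pi * 10 * t) \
    + 0.5 * np.random.default_rng(0).normal(size=t.size)

# Hanning-windowed, 50%-overlapping segments; Welch averages the segment PSDs
f, psd = welch(x, fs=fs, window="hann", nperseg=256, noverlap=128)
peak = f[np.argmax(psd)]
print(f"dominant frequency: {peak:.1f} Hz")
```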
Fig. 3 An illustration of the power spectral density of an EEG signal
3.2 Classifiers
Support Vector Machine (SVM): The Support Vector Machine (SVM) was
initially formulated by Cortes and Vapnik [3] and has gained significant popu-
larity as a classification technique. Primarily, the SVM is employed to partition
the extracted sets of features into two distinct classes by identifying an optimal
hyperplane. The author in [22] used a hybrid SVM model, including kernel-type
parameters and a regularization constant, to improve classification for the detection
of epileptic seizures. They applied genetic algorithm-based GA-SVM and particle
swarm optimization-based PSO-SVM algorithms to select parameter values and
enhance diagnostic applications. In various studies, Support Vector Machines (SVM)
demonstrated higher levels of accuracy [7, 10, 14].
Kernel-SVM: The Kernel SVM algorithm is utilized in machine learning when
the data is not able to be separated in a linear fashion. This means that a straight line
cannot be used to divide the data into distinct categories. The kernel function is a
mathematical function that takes the input data and transforms it into a new feature
space where the data can be separated by a hyperplane.
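A small scikit-learn illustration of this point: concentric rings are not linearly separable, but an RBF-kernel SVM separates them in the implicit feature space (synthetic data, not the chapter's EEG features).

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can split the classes
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)          # Gaussian kernel
print(f"linear SVM accuracy:     {linear.score(X, y):.2f}")
print(f"RBF-kernel SVM accuracy: {rbf.score(X, y):.2f}")
```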
Random Forest (RF): This methodology centered on evaluating the efficacy of
the chosen classifiers in detecting epileptic seizures. It conducts data classification
by generating numerous decision trees during the training phase [15]. Within the
Random Forest (RF) framework, each tree is considered an independent classifier,
and its weighted classification outcome contributes to the final classification using a
majority voting technique [13, 15]. The wavelet packet features technique has been
employed with the RF classifier for identifying the epilepsy state [24], achieving an
accuracy of 84.8%.
Naive Bayes (NB): Naive Bayes is a probabilistic classifier that relies on Bayesian
theory, assuming independence among the features of a specific class. The estimation
of the occurrence or absence of a particular feature in the Naive Bayes model is
determined through maximum likelihood [19]. D is a training set consisting of n
classes, each identified by its attribute vector Y and corresponding class label. The
attribute vector is assigned to the class with the highest posterior probability, where

P(C_i | Y) = P(Y | C_i) P(C_i) / P(Y)   (4)
4 Evaluation Protocol
The following section presents a description of the dataset and assesses the
performance of the proposed model using appropriate performance metrics.
4.1 Dataset
A total of 212 residents of Nigeria answered the questionnaire. The dataset contained
112 epileptic seizure-prone patients (67 male, 45 female) and 92 healthy individuals
(67 male, 25 female). A description of the dataset is detailed in Table 1. The EEG
headset features 14 channels and records at 128 Hz with 16-bit resolution. Epilepsy
patients (subjects prone to epileptic seizures) and control persons with no history of
seizures were studied separately. The 10–20 system, used all across the world,
determines where the electrodes are placed. Training and testing are split 80:20, and
training runs for ten epochs.
258 V. Prakash and D. Kumar
(TN + TP)
Accuracy = (%) (5)
(TP + TN + FN + FP)
TP
Sensitivity = (%) (6)
(TP + FN )
TN
Specificity = (%) (7)
(TN + FP)
The F1 score varies from 0 to 1, with a greater value signifying better performance:
an F1 score of 1 denotes perfect precision and recall, whereas a score of 0 denotes
inadequate performance. Precision is the proportion of accurately predicted positive
instances among all instances predicted as positive; recall is the proportion of
accurately predicted positive instances among all actual positives.
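A worked check of Eqs. (5)–(7), plus precision and F1, on a hypothetical confusion matrix (the counts TP=46, TN=40, FP=4, FN=2 are made up, not the paper's results):

```python
TP, TN, FP, FN = 46, 40, 4, 2              # hypothetical confusion-matrix counts

accuracy = (TP + TN) / (TP + TN + FP + FN) * 100      # Eq. (5)
sensitivity = TP / (TP + FN) * 100                    # Eq. (6): recall
specificity = TN / (TN + FP) * 100                    # Eq. (7)
precision = TP / (TP + FP) * 100
# F1 is the harmonic mean of precision and recall, reported on a 0..1 scale
f1 = 2 * precision * sensitivity / (precision + sensitivity) / 100

print(round(accuracy, 2), round(sensitivity, 2),
      round(specificity, 2), round(f1, 3))
```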
5 Proposed Framework
The investigation fed the classification model with features corresponding to
different time epochs of each individual's recording. The Welch power spectrum
method is used for feature extraction in the classification model. The output
parameters of four distinct classifiers (kernel SVM, Random Forest, Decision Tree,
Naive Bayes) were studied to facilitate comparison between them. The entire method
is shown as a flowchart in Fig. 4. Algorithm 1 gives the feature extraction process
using the power spectral density approach. To remove specific artefacts, the EEG
signals are pre-processed with a band-pass filter with a defined frequency range. EEG
signal segmentation is done with a set window size; to reduce spectral leakage, each
segment is processed through a window function (Hanning in this algorithm).
After establishing the frequency resolution from the length of the EEG signals, the
power spectrum of each windowed segment is computed and appended to the feature
list.
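A sketch of this feature-extraction pipeline in Python. The band edges (1–40 Hz), the window length (2 s with 50% overlap), and the synthetic signal are assumptions for illustration; the paper does not state them here.

```python
import numpy as np
from scipy.signal import butter, filtfilt, welch

fs = 128
rng = np.random.default_rng(0)
eeg = rng.normal(size=fs * 20)                        # 20 s synthetic "EEG"

# Band-pass filter for artefact removal (assumed 1-40 Hz band)
b, a = butter(4, [1, 40], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, eeg)

win = 2 * fs                                          # assumed 2 s segments
features = []
for start in range(0, len(filtered) - win + 1, win // 2):   # 50% overlap
    seg = filtered[start:start + win]
    # Hanning-windowed Welch PSD of the segment
    _, psd = welch(seg, fs=fs, window="hann", nperseg=win // 2)
    # Mean, std, min, max of the epoch's PSD, as in the paper
    features.append([psd.mean(), psd.std(), psd.min(), psd.max()])

features = np.array(features)     # rows: epochs; cols: mean/std/min/max
print(features.shape)
```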
Table 2 shows the parameter values for the different classifiers applied to the
Nigerian dataset. The accuracy of the model is also depicted through the receiver
operating characteristic (ROC) curve; the best classifier’s ROC curve and AUC (Area
Under the Curve) are displayed in Fig. 5. Fivefold cross-validation is used for
validation; the fold accuracies are [0.9919, 0.9919, 0.9913, 0.9913, 0.9913], giving
an average accuracy of 0.9916 on the Nigerian data for the kernel SVM.
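The fivefold validation can be reproduced in outline with scikit-learn; synthetic features stand in for the Nigerian EEG features, which are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder two-class feature matrix (epileptic vs. healthy)
X, y = make_classification(n_samples=400, n_features=16, random_state=0)

# Five-fold cross-validated accuracy of an RBF-kernel SVM
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print([round(s, 3) for s in scores], "mean:", round(scores.mean(), 3))
```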
The classification results show the performance of various classifiers on a dataset.
The Random Forest classifier achieved an accuracy of 89.03%, with a sensitivity
of 80.20% and precision of 87.80%. It achieved a balanced performance between
precision and sensitivity, with an F-1 score of 0.78. The Decision Tree classifier had
a slightly lower accuracy of 88.37% but performed well in terms of sensitivity with a
score of 83.04%. It accurately predicted 88.18% of the positive instances out of the
total predicted positives. The Naive Bayes classifier achieved an accuracy of 82.23%
but had the highest sensitivity of 94.76% and precision of 83.75%. The F-1 score of
0.80 suggests a lower overall performance due to lower accuracy and precision. The
Kernel-SVM classifier performed exceptionally well, with an accuracy of 93.09%,
a sensitivity 92.16%, and precision of 91.58%. Its F-1 score of 0.82 suggests a
strong overall performance with high precision. The classification results of various
performance metrics are illustrated in Figs. 6 and 7.
7 Conclusion
References
8. Ghayab HRA, Li Y, Abdulla S, Diykh M, Wan X (2016) Classification of epileptic EEG signals
based on simple random sampling and sequential feature selection. Brain Inform 3(2):85–91
9. Ghayab HRA, Li Y, Siuly S, Abdulla S (2018) Epileptic EEG signal classification using
optimum allocation based power spectral density estimation. IET Signal Proc 12(6):738–747.
https://doi.org/10.1049/iet-spr.2017.0140
10. Hassan AR, Subasi A (2016) Automatic identification of epileptic seizures from EEG signals
using linear programming boosting. Comput Methods Programs Biomed 136:65–77
11. Kiymik MK, Subasi A, Ozcalık HR (2004) Neural networks with periodogram and autoregres-
sive spectral analysis methods in detection of epileptic seizure. J Med Syst 28:511–522
12. Liu S, Wang J, Li S, Cai L (2023) Epileptic seizure detection and prediction in EEGs using
power spectra density parameterization. IEEE Trans Neural Syst Rehabil Eng
13. McDonald AD, Lee JD, Schwarz C, Brown TL (2014) Steering in a random forest: ensemble
learning for detecting drowsiness-related lane departures. Hum Factors 56(5):986–998
14. Nicolaou N, Georgiou J (2012) Detection of epileptic electroencephalogram based on
permutation entropy and support vector machines. Expert Syst Appl 39(1):202–209
15. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens
26(1):217–222
16. Polat K, Güneş S (2007) Classification of epileptiform EEG using a hybrid system based on
decision tree classifier and fast Fourier transform. Appl Math Comput 187(2):1017–1026
17. Rajaguru H, Kumar Prabhakar S (2017) Power spectral density with correlation dimension for
epilepsy classification from EEG signals. In: 2017 2nd international conference on commu-
nication and electronics systems (ICCES), pp 376–379. https://doi.org/10.1109/CESYS.2017.
8321303
18. Rohira V, Chaudhary S, Das S, Prasad Miyapuram K (2023) Automatic epilepsy detection
from EEG signals. In: Proceedings of the 6th joint international conference on data science &
management of data (10th ACM IKDD CODS and 28th COMAD). Association for Computing
Machinery, New York, NY, USA, pp 272–273. https://doi.org/10.1145/3570991.3570995
19. Sharmila A, Geethanjali P (2016) DWT based detection of epileptic seizure from EEG signals
using naive Bayes and k-NN classifiers. IEEE Access 4:7716–7727
20. Shoeibi A, Khodatars M, Ghassemi N, Jafari M, Moridian P, Alizadehsani R, Panahiazar M,
Khozeimeh F, Zare A, Hosseini-Nejad H et al (2021) Epileptic seizures detection using deep
learning techniques: a review. Int J Environ Res Public Health 18(11):5780
21. Slavič J, Mršnik M, Česnik M, Javh J, Boltežar M (2021) Signal processing. In: Vibration
fatigue by spectral methods. Elsevier, pp 51–74. https://doi.org/10.1016/b978-0-12-822190-7.
00009-8
22. Subasi A, Kevric J, Abdullah Canbaz M (2019) Epileptic seizure detection using hybrid machine
learning methods. Neural Comput Appl 31:317–325
23. Wang G, Deng Z, Choi KS (2015) Detection of epileptic seizures in EEG signals with rule-
based interpretation by random forest approach. In: Advanced intelligent computing theories
and applications: 11th international conference, ICIC 2015, Fuzhou, China, August 20–23,
2015. Proceedings, Part III 11. Springer, pp 738–744
24. Wang Y, Cao J, Lai X, Hu D (2019) Epileptic state classification for seizure prediction with
wavelet packet features and random forest. In: 2019 Chinese control and decision conference
(CCDC). IEEE, pp 3983–3987
25. Yang C, Deng Z, Choi KS, Wang S (2015) Takagi–Sugeno–Kang transfer learning fuzzy logic
system for the adaptive recognition of epileptic electroencephalogram signals. IEEE Trans
Fuzzy Syst 24(5):1079–1094
26. Zhang Y, Liu B, Ji X, Huang D (2017) Classification of EEG signals based on autoregressive
model and wavelet packet decomposition. Neural Process Lett 45:365–378
YOLO Algorithm Advancing Real-Time
Visual Detection in Autonomous Systems
Abhishek Manchukonda
Abstract This research paper presents an overview of the YOLO (You Only Look
Once) Algorithm, a pioneering object detection approach. Introduced in 2015 by
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, YOLO has
become a state-of-the-art solution for object detection. It underwent two incremental
improvements: “YOLO9000: Better, Faster, Stronger” and “YOLOv3: An
Incremental Improvement,” refining its capabilities while preserving its core concept.
The paper emphasizes the relevance of object detection for self-driving cars. Cur-
rent autonomous vehicles rely on Lidar technology, but YOLO offers a vision-based
alternative using image data, akin to human navigation, potentially improving safety
and accuracy in challenging conditions. The study delves into Convolutional Neural
Networks (CNNs), essential to the YOLO Algorithm. CNNs extract features and learn
filter values, efficiently handling large image datasets. The paper examines the tran-
sition from traditional Neural Networks to CNNs, addressing real-world computer
vision challenges. The YOLO Algorithm’s architecture is analyzed, demonstrating
simultaneous object localization and detection. The Convolutional Implementation of
Sliding Window streamlines the traditional approach, empowering YOLO to achieve
real-time performance with multiple object detection. The conclusion highlights
YOLO’s significance for future object detection and its potential impact on self-
driving cars. Real-time performance and high accuracy make YOLO essential for
safer and more efficient autonomous vehicles. As research advances, YOLO’s role
in shaping the future of autonomous driving becomes pivotal.
1 Introduction
The abbreviation “YOLO” stands for You Only Look Once and it explains the main
concept of the algorithm. Currently, YOLO is a state-of-the-art algorithm for Object
Detection problems. Since 2015, the authors have presented 2 improvements to the
original YOLO paper: “YOLO9000: Better, Faster, Stronger” and “YOLOv3: An
A. Manchukonda (B)
National Institute of Technology Warangal, Warangal, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 265
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_23
266 A. Manchukonda
Incremental Improvement.” They have described both as only minor changes to the
original idea.
Self-driving cars are the future of public transportation systems. Currently, most
autonomous vehicles use Lidar to detect objects in their surroundings. However, this
is an imperfect solution, as Lidars are expensive and can suffer from accuracy issues
in certain circumstances (e.g., very bright sunlight may cause errors). Using some
additional data would be very helpful to avoid mistakes. This additional data could
be vision (See Fig. 1)—the basic data source based on which we (humans) drive our
cars.
In the earliest approaches to the Object Detection problem [2, 3], we were limited
to specific types of objects. We had to hand-code features for each object and
then provide them to a classifier. This approach worked for some objects, like faces,
but didn’t work for all objects. With YOLO we don’t have that problem: it works for
all types of objects, regardless of shape, size, and color.
YOLO Algorithm Advancing Real-Time Visual Detection … 267
The main concept behind YOLO is the CNN (Convolutional Neural Network). This
section provides some intuition regarding how a CNN works and why we need it.
Simple computer vision problems like handwritten digit recognition (e.g., MNIST
Dataset), could be solved using traditional Neural Networks. We take an input data
matrix of size m × n, convert it into an (m ∗ n) × 1 vector, multiply it by the NN
weights, add a bias, and apply some nonlinear function (e.g., the sigmoid σ), and we
end up with the first layer of the NN. We then repeat this until we reach the last
NN layer, which represents the NN prediction. For a NN with one hidden layer, it
looks like this:
z1 = (a0 × w1) + b1
a1 = σ(z1)
z2 = (a1 × w2) + b2
a2 = σ(z2)
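The two-layer forward pass above can be sketched as follows; the 1000-unit hidden layer, 10 output classes, and small random weights are illustrative choices, not values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Flattened 28 x 28 input, a hypothetical 1000-unit hidden layer, 10 outputs.
rng = np.random.default_rng(0)
a0 = rng.random((784, 1))                                    # (m * n) x 1 vector
w1, b1 = 0.01 * rng.standard_normal((1000, 784)), np.zeros((1000, 1))
w2, b2 = 0.01 * rng.standard_normal((10, 1000)), np.zeros((10, 1))

z1 = w1 @ a0 + b1        # z1 = (a0 x w1) + b1
a1 = sigmoid(z1)         # a1 = sigma(z1)
z2 = w2 @ a1 + b2        # z2 = (a1 x w2) + b2
a2 = sigmoid(z2)         # a2 = sigma(z2): the network's prediction

# A fully connected first layer on a 1000 x 1000 RGB image would need
# 3_000_000 x 1000 = 3 billion weights in w1 alone, which is why this
# approach does not scale to real-world pictures.
print(a2.shape)
```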
For a small input such as a 28 × 28 grayscale picture, modern computers can
feasibly store and optimize this number of parameters, but what if instead we have a
1000 × 1000 RGB picture?
• It’s very hard to store 3 billion parameters in memory, most modern GPUs don’t
have enough memory.
• Optimizing 3 billion parameters is computationally expensive, training would
take a lot of time.
• Overfitting: when we have too many parameters it is really hard to find the global
optimum, so our model will work better on the data used for training than on new
data (which is an unfortunate situation).
So, what should we do instead if we want to work on real-world pictures? We will
come back to this problem later because to solve it we need to introduce the concept
of Edge Detection.
One of the oldest ways of solving computer vision problems is edge detection. The
edge detection algorithm is performed using a matrix called a mask and an operation
called convolution. Convolution expression:
g(x, y) = (ω ∗ f)(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} ω(s, t) f(x − s, y − t)   (1)
Fig. 4 Example edge detection Sobel filter for angles: 0°, 45° and 90°
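Equation (1) can be implemented directly. A small sketch with a Sobel mask applied to a synthetic vertical edge (both the image and the test values are illustrative):

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct implementation of Eq. (1): slide the flipped mask over the image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]          # convolution flips the mask
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
    return out

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
img = np.zeros((8, 8))
img[:, 4:] = 1.0                          # vertical edge at column 4
edges = convolve2d(img, sobel_x)
print(np.abs(edges).max())                # strong response along the edge
```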
Using hand-engineered filter values, we can detect horizontal or vertical edges quite
well, but what if we want to detect more sophisticated features, like the edges of a cat?
This is where deep learning comes in: Instead of using hand-engineered filter
values, we can use a self-learning algorithm that finds the right values by itself (like
in Fig. 6).
What is interesting at this point is the fact that regardless of the image size, we
have the same number of parameters to train. No matter if our image is 20 × 20 pixels
or 1000 × 1000, we need to train the same number of parameters – the number of
values in the filter mask (see Fig. 7). This is one of the main reasons why CNNs are
so popular in real-life computer vision problems.
This section gives a quick reminder of the basics of CNNs and introduces the naming
convention used in the rest of this paper.
Fig. 7 Instead of using hand-engineered filter values we use learned values [5]
This layer applies the convolution operation to the image using the filter mask. The
values of the filter mask are the parameters that we train (see Fig. 8).
The previous examples contained only one channel, but in real life we usually have
more: a color image, for example, has three channels (red, green, and blue). In that
case, every filter needs three “sub-filters”, one for each channel. We apply the
convolution operation to each channel and then sum up the results (see Fig. 9).
To detect multiple types of features in the previous layer we should use multiple
filters: for example, one filter will detect vertical edges and another horizontal ones.
The last number in the output dimensions corresponds to the number of filters used
in the convolution operation (see Fig. 10).
5.4 Padding
5.5 Stride
Stride determines the number of cells that the filter moves in the input to calculate
the cell in the output (See Fig. 12).
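Padding and stride together determine the output size. The text does not state the relation explicitly, so the standard convention output = floor((n + 2p − f) / s) + 1 is assumed here:

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output width/height of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(7, 3))              # 5: 3x3 filter, no padding, stride 1
print(conv_output_size(7, 3, padding=1))   # 7: "same" padding preserves size
print(conv_output_size(7, 3, stride=2))    # 3: stride 2 shrinks the output
```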
The second type of layer used in CNNs is the pooling layer. There are two types of
pooling layers: average pooling and max pooling. The pooling layer is primarily used
to decrease the dimensions of the outputs (see Fig. 13).
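A minimal max-pooling sketch (the 2 × 2 window and stride 2 are common defaults, assumed here):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: keep the maximum of each size x size block."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(16).reshape(4, 4)
print(max_pool(x))   # [[ 5.  7.]
                     #  [13. 15.]]
```

Average pooling is identical except that `.max()` is replaced by `.mean()`.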
The fully connected (FC) layer constitutes a stage where each neuron from the
preceding layer establishes a connection with every neuron present in the subsequent
layer. Its principle is the same as the traditional Neural Network.
Traditionally CNNs have been used to solve image recognition problems. The last
FC layer was a prediction layer. In the image below (Fig. 14), we can see LeNet-5.
6 Object Localization
where y1 = po , y2 = bx , y3 = by etc.
7 Sliding Window
If an image contains more than one object we can no longer use Object Localization,
so we need to find another way to detect objects. One of the popular approaches is
the Sliding Window Algorithm. It is very simple but provides good enough results.
In the Sliding Window Algorithm, we take part of an image (called window), feed
forward it through a Neural Network (or any other classifier) and end with a prediction
if this part of an image contains an object. We repeat these steps for each part of the
picture (that’s why we call it sliding a window) and as a result, we have predictions
for all fields in the image.
The sliding window approach is relatively simple but unfortunately it has a few
disadvantages:
• We know neither the size nor the shape of the object we are looking for;
• We need to feed-forward thousands of images through a classifier, which is
computationally expensive;
• We don’t know which stride to choose: if it is too small, we process nearly the
same image many times; if it is too big, accuracy will be really poor.
So how can we do this in a smarter way?
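The brute-force procedure above can be sketched as follows; the window size, stride, threshold, and classifier are placeholders:

```python
import numpy as np

def sliding_window(image, win=64, stride=32, classify=None):
    """Naive sliding window: run the classifier on every crop.
    `classify` is a stand-in for any window classifier."""
    detections = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            score = classify(image[y:y + win, x:x + win])
            if score > 0.5:
                detections.append((x, y, win, win, score))
    return detections

# On a 1000x1000 image, a 64-pixel window with stride 8 already means
# ~13,900 classifier calls - the computational cost the text warns about.
img = np.zeros((256, 256))
dets = sliding_window(img, classify=lambda crop: crop.mean())
print(len(dets))   # no detections on an empty image
```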
We can perform the Sliding Window Algorithm much faster and more efficiently
using its Convolution implementation. The idea of Convolutional Implementation of
a Sliding Window was first introduced in Feb. 2014 by Sermanet, Eigen, Zhang,
Mathieu, Fergus, and LeCun in the “OverFeat: Integrated Recognition, Localization
and Detection using Convolutional Networks” paper. One year later it was used in the
original YOLO paper.
Finally, we’ve reached YOLO—You Only Look Once. YOLO combines ideas from
the Convolutional Implementation of the Sliding Window and Object Localization.
Fig. 20 Predictions for (13 × 13) grid cell after Non-max suppression [11]
When we have two predictions that intersect, we need to decide whether both should
be kept, because they detect two different objects, or whether they cover the same
object and one of them should be removed. To solve that problem, we need to
introduce the idea of IoU.
IoU—Intersection over Union. As the name suggests, it is a fraction:

IoU = Intersection surface area / Union surface area
In Non-max suppression we compare the surface of the intersection with the surface
of the union, and when the IoU is bigger than some threshold value (e.g., 0.6, but it
depends on the implementation) we keep only the detection with the bigger po—the
probability that this cell contains the central point of an object. When the IoU is
smaller than the threshold value, we keep both predictions. We perform Non-max
suppression for all intersecting predictions (an example result of the algorithm is
shown in Fig. 21).
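The IoU computation and Non-max suppression described above can be sketched as follows (the corner-format boxes and the 0.6 threshold are illustrative):

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, threshold=0.6):
    """Among predictions whose IoU exceeds the threshold,
    keep only the one with the highest score (p_o)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: box 1 overlaps box 0
```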
9.3 Anchor-Box
The concept of anchor boxes was first presented in the YOLO9000 research. The
notion is fairly straightforward. Consider the scenario where two objects, differing
in shapes or sizes, share a common central point within the same grid cell. In the
original YOLO paper, only one object could be detected under such circumstances.
However, the enhanced algorithm version addresses this limitation by incorporating
a deeper final layer within the convolutional neural network (CNN), which
divides the image into grid cells. This augmented layer generates multiple predictions
instead of a singular one. The dimensions attributed to each grid cell result from the
multiplication of the count of anchor boxes with the dimensions of the original
prediction. This methodology facilitates the identification of multiple objects within
individual grid cells.
In YOLO9000, the authors employed a set of 5 anchor boxes, while YOLOv3
integrated 9 boxes. Instead of manual anchor curation, the authors applied K-means
clustering to the bounding boxes from the training dataset, enabling the automatic
discovery of well-suited anchor dimensions (example output in Fig. 22).
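A sketch of this anchor-discovery step: K-means over (width, height) pairs using 1 − IoU as the distance, run here on synthetic box shapes (the data, initialization, and k are illustrative):

```python
import numpy as np

def kmeans_anchors(wh, k=5, iters=50):
    """Cluster (width, height) pairs with 1 - IoU as the distance,
    as described for anchor discovery in YOLO9000."""
    # Spread the initial anchors across the data rows.
    anchors = wh[np.linspace(0, len(wh) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        # IoU of every box against every anchor, both treated as centred boxes.
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1] +
                 anchors[None, :, 0] * anchors[None, :, 1] - inter)
        assign = np.argmax(inter / union, axis=1)   # nearest = highest IoU
        for c in range(k):
            if np.any(assign == c):
                anchors[c] = wh[assign == c].mean(axis=0)
    return anchors

# Two synthetic box-shape clusters: tall ~(30, 60) and wide ~(100, 40).
rng = np.random.default_rng(1)
wh = np.vstack([rng.normal([30, 60], 3, (50, 2)),
                rng.normal([100, 40], 3, (50, 2))])
print(kmeans_anchors(wh, k=2))  # recovers roughly the two underlying shapes
```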
10 Conclusions
YOLO represents an instantaneous and versatile algorithm for object detection across
various contexts. It amalgamates remarkable operational speed with elevated preci-
sion (Refer to Fig. 23), thereby rendering it apt for resolving real-life challenges.
In forthcoming times, its potential application extends to the realm of autonomous
vehicles, contributing to the establishment of a more secure future for the collective
populace.
References
1. Source of many intuitions and ideas about CNNs and object detection problems; in this
paper, the source of the graphics for converting an FC layer to a convolutional layer and
the predictions for the (13 × 13) grid cell after Non-max suppression. https://www.coursera.org/learn/convolutional-neural-
networks
2. Viola-Jones Algorithm (2001) The first efficient face detector. https://www.cs.cmu.edu/~efros/
courses/LBMV07/Papers/viola-cvpr-01.pdf
3. Dalal and Triggs (2005) Histograms of oriented gradients for human detection. https://lear.inr
ialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
4. Edges detected using Sobel image. https://en.wikipedia.org/wiki/Sobel_operator#/media/File:
Bikesgraysobel.jpg
5. Original YOLO implementation, source of the object detection algorithm comparison. https://
pjreddie.com/darknet/yolo/
6. Source of first image (How self-driving cars see the world). https://towardsdatascience.com/
how-do-self-driving-cars-see-13054aee2503
7. Sobel filter. https://en.wikipedia.org/wiki/Sobel_operator
8. CNN Cat features visualizations. http://mcogswell.io/blog/why_cat_2/
9. Convolutional layer image. https://medium.freecodecamp.org/an-intuitive-guide-to-convoluti
onal-neural-networks-260c2de0a050
10. Convolution operation on volume, multiple filters, stride, padding, pooling, LeNet 5. https://
indoml.com/2018/03/07/student-notes-convolutional-neural-networks-cnn-introduction/
11. Original YOLO paper: https://arxiv.org/pdf/1506.02640v1.pdf
12. Mnist image source. https://m-alcu.github.io/blog/2018/01/13/nmist-dataset/
13. OverFeat: integrated recognition, localization and detection using convolutional networks,
source of example of conv implementation of sliding window graphics. https://arxiv.org/pdf/
1312.6229.pdf
14. YOLO9000: https://arxiv.org/pdf/1612.08242.pdf
15. YOLOv3: https://arxiv.org/pdf/1804.02767.pdf
16. First image source: https://towardsdatascience.com/how-do-self-driving-cars-see-13054a
ee2503
17. MNIST dataset. http://yann.lecun.com/exdb/mnist/
18. Convolution operation equation. https://en.wikipedia.org/wiki/Kernel_(image_processing)#
Details
19. Non-max suppression graphics. https://appsilon.com/object-detection-yolo-algorithm/
Optimizing Feature Selection in Machine
Learning with E-BPSO: A
Dimensionality Reduction Approach
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 283
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_24
284 R. Shenbaga Moorthy et al.
1 Introduction
feature selection. Thus, in this paper, K-NN has been used as a classifier to validate
the features selected by E-BPSO.
The remainder of the paper is structured as follows: Sect. 2 provides an overview
of prior research on feature selection using a range of metaheuristic algorithms.
Section 3 delves into the operation of the newly proposed E-BPSO. In Sect. 4, we
present a comparison between the proposed E-BPSO and the traditional BPSO
using benchmark datasets. Lastly, Sect. 5 offers a conclusion and discusses potential
avenues for future research.
2 Related Works
Feature selection has been performed on the KDD Cup dataset in WEKA and the
performance had been compared with the conventional way of selecting features.
The accuracy obtained was 99.794% when using decision tree as a classifier [10].
Multi-objective binary genetic algorithm integrated with an adaptive mechanism was
designed for selecting the essential features. The algorithm includes five crossover
operators, each assigned a different probability. The fitness function considered for
evaluating an individual is the error rate together with the number of features selected [4].
Improved binary particle swarm optimization was designed with the aid of improving
exploration and exploitation in selecting optimal feature subset [5]. Feature selec-
tion based on simulated annealing, hybrid particle swarm optimization, and fuzzy
K-means called FUFPS was designed to choose optimal feature subset for vari-
ous benchmarking dataset from UCI repository [11]. Modified Binary Sine Cosine
Algorithm (MBSCA) had been used for selecting the necessary features thereby
eliminating the irrelevant features. Beta and delta agents were introduced into the
conventional binary sine cosine algorithm (BSCA) and the algorithm was evaluated
on medical datasets against GA and BSCA [12].
Simple variance filter was used to maximize the accuracy of the predictive
model. Since the wrapper methods are computationally complex, the filter method
was used to select the features and the designed method was applied on gene
expression data [13]. Only necessary features had been selected using an ensem-
ble boosting framework which consists of XGBoost and two-step selection mecha-
nism [14]. Harris Hawks optimization algorithm and fruitfly optimization algorithm
were hybridized with the intention of choosing the essential features and the results
were promising than applying the conventional algorithms [15]. The performance
of vortex search algorithm for selecting the features had been improved by includ-
ing chaotic maps [16]. Dispersed foraging slime-based algorithm had been used to
access the quality of the attributes and to select only the informative features. The
method improves classification accuracy with reduced set of features [17]. Cen-
troid mutation-based search and rescue optimization algorithm were used to find the
quality and necessary features from the medical instances which intends to avoid
premature convergence of conventional algorithm [18]. Table 1 summarizes some of
the existing methodologies in feature selection.
According to the related works, stagnation and getting trapped in local optimal
solutions are the main challenges to be addressed when applying a feature selection
algorithm to a particular dataset, even though many algorithms are available for
finding the relevant features. The authors of this paper use E-BPSO to avoid early
convergence, stagnation, and entrapment in non-optimal solutions on benchmarking
datasets collected from the UCI repository.
particle swarm optimization. The working of E-BPSO for selecting an optimal subset
of features is shown in Fig. 3. Stagnation, entrapment in local optima, and early
convergence occur when using BPSO to solve the discrete feature selection problem.
To overcome this, E-BPSO has been proposed with the goal of avoiding stagnation
and finding the global optimal solution. The enhancement over the original BPSO is
the inclusion of a self-adaptive velocity for exploration and exploitation. Starting
with N particles, E-BPSO assigns each
where α and β are the weights associated with accuracy and DRR. Each particle
updates its velocity as specified in (2).

v_id(t + 1) ← P_rand(t) − P_i(t) ∗ v_id(t) + C1 ∗ rand ∗ (PBestPos_i − P_i) + C2 ∗ rand ∗ (GBestPos − P_i)   (2)
where P_rand(t) represents a random particle at the t-th time step, v_id(t + 1) denotes
the velocity of the i-th particle in the d-th dimension for iteration t + 1, PBestPos_i
represents the personal best location of the i-th particle, GBestPos represents the
global best location of the swarm, and C1 and C2 are acceleration coefficients. To
convert the continuous particle values to binary values, a V-shaped transfer function
is used, which is specified in (3).
Optimizing Feature Selection in Machine Learning with E-BPSO … 289
T-velocity ← |v_id| / √(1 + v_id²)   (3)
Based on the value of the transfer function, the particle’s position, which indicates
whether each feature is selected, is computed using (4).

P_id ← 1 if rand < T-velocity, else 0   (4)
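Equations (3)–(4) can be sketched as follows; the exact transfer function is partly garbled in the source, so the common V-shaped form |v| / √(1 + v²) is assumed here:

```python
import numpy as np

def v_transfer(v):
    """Assumed V-shaped transfer function mapping a velocity to [0, 1)."""
    return np.abs(v) / np.sqrt(1.0 + v ** 2)

def binary_position(velocity, rng=np.random.default_rng(0)):
    """A dimension (feature) is selected (1) when rand < T-velocity."""
    return (rng.random(velocity.shape) < v_transfer(velocity)).astype(int)

v = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(v_transfer(v).round(3))   # large |v| gives selection probability near 1
print(binary_position(v))       # a candidate binary feature mask
```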
4 Experimental Results
Using benchmarking datasets acquired from the UCI repository [24], the proposed
E-BPSO is contrasted with various algorithms. Table 2 contains information about
the datasets. The datasets are divided 70:30, which indicates that 70% of the data
will be used for training and 30% for testing. 1-NN is used to assess the features
selected by E-BPSO. The metrics considered for evaluation are mean fitness value,
accuracy, Root Mean Square Error (RMSE), feature selection ratio, and standard
deviation of the fitness value. Other traditional algorithms like GA and BPSO are
compared to the suggested E-BPSO. The experiment is conducted 30 times and
the average is taken into account for comparison. Parameters of algorithms taken for
experimentation are specified in Table 3.
Accuracy represents the ratio of instances that are classified correctly by 1-NN, as
defined in (5). Table 4 reports the accuracy of the K-NN classifier with K = 1.
Feature selection is used in conjunction with a number of different methods to
enhance the accuracy of 1-NN, as depicted in Fig. 5. For the lung cancer dataset,
E-BPSO improves accuracy by 17.68% and 13.59% over GA and BPSO, respectively.
The root mean square error (RMSE) is the square root of the ratio of the sum-squared
difference between the target values and the classifier’s outputs to the number of
instances in the dataset, as stated in (6). Table 5 compares the RMSE of the various
algorithms across the entire set of datasets; the best values are bold faced. It is observed
that E-BPSO achieves the minimum RMSE for the Wisconsin breast cancer and lung
cancer datasets. For the Wine dataset, BPSO achieves a lower RMSE than E-BPSO;
the RMSE of E-BPSO is 40.25% higher than that of BPSO.
RMSE ← √( Σ_{i=1}^{Num_Instances} (ŷ_i − y_i)² / Num_Instances )   (6)
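As a small check of Eq. (6) (for binary labels, the RMSE reduces to the square root of the error rate):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over all instances."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# One wrong label out of four: error rate 0.25, RMSE = sqrt(0.25) = 0.5
print(rmse([0, 1, 1, 0], [0, 1, 0, 0]))
```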
The dimensionality reduction ratio (DRR) of E-BPSO has been compared with GA
and BPSO; DRR is computed using (7). It is observed that the proposed E-BPSO
achieves a better DRR than the other algorithms, as shown in Fig. 6. For the Wine
dataset, E-BPSO reduces the dimensions by 14.28% and 85.71% relative to BPSO
and GA, respectively. This is because the proposed E-BPSO includes a self-adaptive
velocity which prevents the particles from falling into local optimal solutions.
DRR ← 1 − (selected features / total number of features)   (7)
dataset, whereas GA chooses nine features and BPSO chooses six. Also, E-BPSO
ranks first in maximizing the accuracy on the Wisconsin breast cancer dataset, as is
evident from Fig. 5.
The minimum and maximum fitness values obtained on the Wisconsin breast cancer
dataset for GA are 0.559313 and 0.832790, respectively, as shown in Fig. 7.
Although E-BPSO started with a maximum fitness of 0.838970, it converged to
0.581495 over the course of the iterations on the Wisconsin breast cancer dataset.
Similarly, BPSO had a lower maximum fitness (0.834013) than E-BPSO, but its
minimum fitness remained higher than that of E-BPSO. The fitness value converges
at roughly the 64th iteration for E-BPSO with a global optimal solution, whereas GA
converges at the 95th iteration with a local optimal solution. This shows that the
inclusion of the adaptive weight accelerates the particles in a better direction, which
tends to converge to a global optimal solution for the proposed E-BPSO. For the lung
dataset, the minimum fitness values are 0.438682, 0.411955, and 0.411955 for GA,
BPSO, and E-BPSO, respectively, as shown in Fig. 8. For the wine dataset, BPSO
and E-BPSO have nearly the same fitness, with a minimum of 0.362131 for BPSO
and 0.354568 for E-BPSO, as shown in Fig. 9. The mean and standard deviation of
fitness for BPSO, GA, and the proposed E-BPSO are given in Table 7.
5 Conclusion
References
11. Moorthy RS, Parameshwaran P (2021) A novel hybrid feature selection algorithm for optimal
provisioning of analytics as a service. In: Soft computing for problem solving, pp 511–523.
Springer, Singapore
12. Moorthy RS, Pabitha P (2022) Intelligent health care system using modified feature selection
algorithm. In: Pattern recognition and data analysis with applications, pp 777–787. Springer,
Singapore
13. Bommert A, Welchowski T, Schmid M, Rahnenführer J (2022) Benchmark of filter methods
for feature selection in high-dimensional gene expression survival data. Briefings Bioinform
23(1):bbab354
14. Alsahaf A, Petkov N, Shenoy V, Azzopardi G (2022) A framework for feature selection through
boosting. Expert Syst Appl 1(187):115895
15. Abdollahzadeh B, Gharehchopogh FS (2022) A multi-objective optimization algorithm for
feature selection problems. Eng Comput 38(3):1845–63
16. Gharehchopogh FS, Maleki I, Dizaji ZA (2022) Chaotic vortex search algorithm: metaheuristic
algorithm for feature selection. Evol Intell 15(3):1777–808
17. Hu J, Gui W, Heidari AA, Cai Z, Liang G, Chen H, Pan Z (2022) Dispersed foraging slime
mould algorithm: continuous and binary variants for global optimization and wrapper-based
feature selection. Knowl Based Syst 15(237):107761
18. Houssein EH, Saber E, Ali AA, Wazery YM (2022) Centroid mutation-based search and rescue
optimization algorithm for feature selection and classification. Expert Syst Appl 1(191):116235
19. Kareem SS, Mostafa RR, Hashim FA, El-Bakry HM (2022) An effective feature selection
model using hybrid metaheuristic algorithms for IOT intrusion detection. Sensors 22(4):1396
20. Zivkovic M, Stoean C, Chhabra A, Budimirovic N, Petrovic A, Bacanin N (2022) Novel
improved salp swarm algorithm: an application for feature selection. Sensors 22(5):1711
21. Albulayhi K, Abu Al-Haija Q, Alsuhibany SA, Jillepalli AA, Ashrafuzzaman M, Sheldon FT
(2022) IoT intrusion detection using machine learning with a novel high performing feature
selection method. Appl Sci 12(10):5015
22. Liu Y, Heidari AA, Cai Z, Liang G, Chen H, Pan Z, Alsufyani A, Bourouis S (2022) Simulated
annealing-based dynamic step shuffled frog leaping algorithm: Optimal performance design
and feature selection. Neurocomputing 7(503):325–62
23. Hu J, Gui W, Heidari AA, Cai Z, Liang G, Chen H, Pan Z (2022) Dispersed foraging slime
mould algorithm: continuous and binary variants for global optimization and wrapper-based
feature selection. Knowl Based Syst 15(237):107761
24. Kelly M, Longjohn R, Nottingham K (2023) The UCI machine learning repository. https://
archive.ics.uci.edu
CRIMO: An Ontology for Reasoning
on Criminal Judgments
Abstract Legal experts develop their draft by analyzing legal documents in order to
glean information about the criteria listed in relevant legal parts. The many criminal
cases reported in Criminal Judgments of the lawful domain explain the offense, the
accused parties, the investigation, and the ultimate verdict. Many parts of the written
code can be misinterpreted or lead to erroneous findings when applied to a criminal
case. To better assist legal reasoning, this study seeks to establish an integrated ontol-
ogy for modeling criminal law standards. The proposed criminal domain ontology
maps entities and their relationships to textual rules linked to Criminal Acts in the
Indian Penal Code of 1860 using OWL-DL in a middle-out manner and formalizes
legal rules accordingly. The purpose is to build a legal rule-based decision support
system for the Indian criminal domain utilizing SWRL rule language to generate
logic rules and integrate the criminal domain ontology.
1 Introduction
Various legal documents are produced in India annually, as depicted in Fig. 1,
including case conclusions, precedents, resolutions, decrees, and circulars. The sheer
volume of these documents highlights the complexity and scope of the Indian legal
system. The data is collected from the source https://njdg.ecourts.gov.in/njdgnew/
index.php. This data represents the count of pending cases in the district and taluka
courts of India. Nevertheless, the majority of legal information exists in textual
format, which poses challenges in automatically extracting relevant legal information
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 297
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_25
[Fig. 1 chart data: counts of pending Civil, Criminal, and Total cases in Indian district and taluka courts, grouped by pendency duration: 0 to 1, 1 to 3, 3 to 5, 5 to 10, 10 to 20, 20 to 30, and above 30 years.]
Fig. 1 Number of pending cases in the district and taluka courts of India
from natural language documents. There is therefore a need for efficient management
of legal documents for legal experts, which makes search better and more cost-efficient
through machine learning within data-driven applications for legal reasoning
and legal consulting systems. Ontologies have recently gained prominence in the
state-of-the-art Semantic Web due to their ability to capture knowledge in a certain
field. Web-based applications benefit from their emphasis on interoperability and
the establishment of a clear shared understanding among the various parties involved.
Various studies have tried to solve problems in criminal law using approaches
such as legal information generalization [1], Semantic Web technology [2], legal
ontology [3], and rule design [4]. These studies share the common objective of
building a global criminal ontology for the management of unstructured documents,
but face the problem of differing geographical laws, so improving a global criminal
ontology is essential. A sizable amount of data is frequently stored in the law
enforcement field in a convoluted and phonetically complex manner. This paper
provides an international criminal ontology that serves as well-organized knowledge,
bringing order to this chaos and making information more accessible and comprehensible.
Furthermore, reasoning can be applied to identify anomalies or irregularities in
criminal data. This can assist in detecting potential fraud, criminal behavior, or sus-
picious activities by comparing the data against predefined ontological rules and
patterns. In a criminal ontology, legal reasoning may be crucial for interpreting and
applying laws and regulations to specific cases. Reasoning can assist legal profes-
sionals in understanding the implications of various legal statutes and precedents
in the context of criminal cases. In this paper, we present a method that is well-suited
for constructing a legal ontology tailored to the unique characteristics of the legal
system in India. Specifically, we utilize data from the Indian criminal code, focusing
on offenses against the person and reputation, to build the ontology. Additionally, we
introduce a legal expert system called CRIMO (the criminal ontology) that aids legal
experts in efficiently validating the ontology model created by ontology engineers
and facilitating legal reasoning for decision-making. Our work also encompasses the
incorporation of Semantic Web Rule Language (SWRL) reasoning in the ontology
for legal documents. Overall, our research contributes to three key areas: the
development of a specialized legal ontology, the implementation of the CRIMO expert
system, and the utilization of SWRL reasoning, all of which offer significant
potential benefits in the field of legal knowledge management and decision support.
2 Related Work
Currently, there are various studies focused on criminal ontology and the Semantic
Web, exploring various approaches related to reasoning, logical rules, and ontology
design. For instance, the syntax of the Legal Knowledge Interchange Format (LKIF)
is introduced in Gordon’s study [5]. A rule language and argumentation-theoretic
semantics are also part of the ESTRELLA European project. It utilizes the Web
Ontology Language (OWL) for concept representation and includes a foundational
ontology of legal concepts that can be leveraged for reuse. The core of LKIF combines
OWL-DL and SWRL. Its primary objectives are twofold: facilitating the translation
of legal knowledge bases written in different formats and formalisms, and serving as a
knowledge representation formalism within larger architectures for developing legal
knowledge systems. Furthermore, several other studies have examined ontology and
legal reasoning systems in different regions such as China [6], Lebanon [7], Korea
[8], Malawi [7], Tunisia [9], etc. Each of these studies presents distinct ideas aimed at
aiding law enforcement agencies and criminal justice professionals in their tasks. These
systems use knowledge representation based on criminal ontology and structured
data to assist with tasks such as criminal profiling, crime analysis, and predictive
policing.
Decision-making in criminal ontology involves using ontological models and
structured knowledge to inform and direct various parts of law enforcement and
criminal justice procedures. This includes associating evidence with a notion from
a criminal ontology, making informed judgments regarding its applicability and
importance in a criminal inquiry; organizing and prioritizing criminal cases using
ontology-based decision support; and making judgments about punishment, parole,
and rehabilitation programs.
3 Theoretical Background
3.1 Ontology
SWRL (Semantic Web Rule Language) [25] is a language used in the Semantic
Web domain to express rules that can be applied to ontologies. It combines the
expressive power of OWL (Web Ontology Language) with the rule-based approach
of RuleML. SWRL allows the specification of logical rules that can be used to infer
new knowledge from existing knowledge in an ontology.
SWRL rules are expressed using a high-level abstract syntax and are typically writ-
ten in terms of OWL concepts, such as classes, properties, and individuals [26]. These
rules follow a Horn-like structure, which means they consist of a set of antecedents
(conditions) and a consequent (conclusion) [27]. When the antecedents are satisfied,
the consequent is inferred.
SWRL can be used in the context of legal documents and crime ontology to capture
and formalize legal rules, as well as to reason about criminal activities and legal
concepts. By integrating SWRL rules with a criminal domain ontology, it becomes
possible to apply logical relationships and infer new knowledge based on the existing
legal knowledge.
The use of SWRL rules in legal expert systems and legal reasoning systems has
been explored in research. For example, VNLES (Reasoning-enabled Legal Expert
System using Ontology) [28] utilizes SWRL rules to define logical relationships in
the legal domain. Similarly, the CORBS (Criminal Rule-Based System) [7] integrates
SWRL rules with a criminal domain ontology to model and reason about legal rules.
However, none of these studies has defined a crime ontology for the Indian Penal Code
(IPC); this is the specific gap in the literature that our research aims to fill.
The study uses an improved ontology architecture and an open standard ontology
language to address limitations in existing technology and techniques, such
as the cold start issue. By utilizing the criminal ontology, decision-making in the
criminal domain can be more efficient, enabling proactive policing initiatives and a
better understanding of criminal cases.
domain ontologies by domain experts [31]. This approach gives the developer the
freedom to define the scope of the ontology and to implement it through ontology
tools such as Protégé or a foundational approach. The basic pipeline to create the
legal ontology from scratch is shown in Fig. 3.
The D2MD methodology iterates over four steps: (1) purpose identification and
requirement specification for the ontology, (2) ontology development phase,
(3) evaluation and validation approach, and (4) post-development phase.
The scope for creating an ontology involves determining the domain and defining the
goals and boundaries of the ontology. The ontology-based model described in this
paper consists of both domain-dependent and domain-independent semantic rules in
the test case. This model includes contextual features that interpret accurately
captured results, exposing richer data to programs in support of conceptual
searches. The scope of the CRIMO ontology helps to establish concepts,
individuals, properties, and relationships relevant to criminal activities. It includes
classes like “Criminal”, “Crime”, “Suspect”, “Evidence”, “Victim”, “Location”, and
properties like “committedCrime”, “hasAlibi”, and “hasEvidence”. We describe the
purpose of creating CRIMO ontology in the form of the research question. Some of
these research questions are described in Table 2.
Based on the available data, we first filter out major categories like Person, Location,
Crime, Section, etc., along with the relationships between person and crime,
between crime and location, how the person is related to the crime, which section
is related to the crime, and so on. Each section is uniquely identified by a particular
name or ID, and the description of each section is defined. These are further
categorized into three parts:
• Classes.
• Object property.
• Data property.
• Conceptual Clarity: The ontology should reflect a clear and unambiguous under-
standing of the domain. Begin by defining fundamental concepts related to criminal
judgments, legal entities, and reasoning processes.
• Domain Analysis: Thoroughly understand the domain of criminal judgments, legal
reasoning, and related areas. This includes consulting legal experts, studying
legal texts, and identifying key concepts, entities, and relationships.
• Conceptual Modeling: Create a conceptual model that captures the central concepts
and relationships in the domain. Visual tools such as UML diagrams or OWL
(Web Ontology Language) may be used for this purpose.
• Hierarchy and Classification: Organize the concepts into a hierarchical structure
with subclasses and superclasses. For example, a high-level class for
“criminal judgment” may have subclasses such as “conviction”, “acquittal”,
“sentencing”, and so on.
To extract concepts from legal documents and transform them into classes in CRIMO
ontology, we follow a systematic approach, which involves natural language process-
ing techniques and legal expert validation. The top-level entities extracted from the
legal document that become the concepts for our CRIMO ontology are shown in
Table 3.
In ontology modeling, object properties and data properties are two fundamental
types of properties used to describe relationships and attributes of individuals or
instances within a domain. These properties are used to define the structure and
semantics of ontology classes and instances.
1. Object properties: Some of the object properties we defined in our ontology
are as follows.
2. Data properties: In ontology, a data property is a fundamental concept that is
used to describe the attributes or characteristics of individuals within a domain.
Data properties are distinct from object properties, which describe relationships
between individuals (Tables 4 and 5).
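As a toy illustration of this distinction, the two property kinds can be sketched with triples. The property names (committedCrime, hasName, hasID) follow the paper's examples, while the individuals and the triple-list representation are illustrative stand-ins for the OWL ontology itself:

```python
# Toy triple store: object properties relate two individuals,
# data properties attach literal values to an individual.
object_triples = []  # (individual, object property, individual)
data_triples = []    # (individual, data property, literal)

# Object property: relates one individual to another individual.
object_triples.append(("person_01", "committedCrime", "crime_17"))

# Data properties: attach literal attributes to an individual.
data_triples.append(("person_01", "hasName", "Person One"))
data_triples.append(("person_01", "hasID", "P-01"))

# Querying the object property: which individuals committed crime_17?
suspects = [s for (s, p, o) in object_triples
            if p == "committedCrime" and o == "crime_17"]
print(suspects)  # ['person_01']
```
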
These are likely to represent concepts and can be considered potential classes in the
CRIMO ontology. Duplicates and synonyms are eliminated with manual effort to
ensure that each concept is represented only once. We then organize the identified
concepts hierarchically with the help of a legal expert: some concepts are more
general (superclasses) while others are more specific (subclasses). We then create a
formal ontology
structure that represents the relationships between classes using an ontology schema
diagram as shown in Fig. 4. The box represents the concept and the value inside
the box represents the data property of the respective concept. The arrow between
the two concepts represents the connection between the concepts. The dotted arrow
shows the subclass relationship between the concepts.
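The superclass/subclass organization can be sketched as follows. The fragment below uses a few class names from the CRIMO schema, and the dictionary plus the transitive subClassOf check are a hypothetical simplification, not the full hierarchy:

```python
# Illustrative fragment of the class hierarchy: child -> parent.
SUBCLASS_OF = {
    "Criminal": "Person",
    "Judge": "Person",
    "Petitioner": "Person",
    "Respondent": "Person",
    "City": "Location",
    "State": "Location",
    "Country": "Location",
}

def is_subclass(cls, ancestor):
    """Transitive subClassOf check: walk up the hierarchy until a match."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False

print(is_subclass("Criminal", "Person"))  # True
print(is_subclass("Judge", "Location"))   # False
```
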
After reviewing the list of extracted concepts and ensuring that they make sense
in the context of the purpose of the CRIMO domain and the goals of the CRIMO
ontology, we create a formal ontology structure that represents the relationships
between classes (concepts) in an ontology language, OWL (Web Ontology Language),
using Protégé. The structure of the CRIMO ontology is shown in Fig. 5.
SWRL rules are created to express the logic of the acquisition schema. We extract the
law content from the IPC criminal law [32]. These rules are used to infer connections
and facts about crimes, people, and entities within the domain of law enforcement and
policing. SWRL is a powerful rule language that can be used to express
complex relationships and make inferences based on the knowledge stored in
an ontology.
SWRL rules are defined within the criminal ontology, expressing logical relation-
ships and constraints among its elements. For instance, if a person is identified as
a suspect in a crime without evidence proving their alibi, they can be inferred as a
potential suspect. If a crime occurred at a specific location and a person was present
during the crime, they can be inferred as a potential witness. If a person has been
convicted of a crime and there is evidence linking them to other unsolved crimes,
they can be inferred as a potential serial offender. If a person has been hurt, damaged,
or killed or has suffered, either because of the actions of the accused then they can
be inferred as a Victim.
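The witness rule above can be sketched as a Horn-style inference over ontology facts. The predicate names (occurredAt, wasPresentAt) and the individuals are illustrative stand-ins for the CRIMO vocabulary, and the set-based matching is a simplification of what an SWRL reasoner would do:

```python
# Facts as (subject, predicate, object) triples; names are illustrative.
facts = {
    ("crime_1", "occurredAt", "loc_a"),
    ("alice", "type", "Person"),
    ("alice", "wasPresentAt", "loc_a"),
    ("bob", "type", "Person"),
    ("bob", "wasPresentAt", "loc_b"),
}

def infer_potential_witnesses(facts):
    """Horn-style rule, SWRL reading:
    Crime(?c) ^ occurredAt(?c, ?l) ^ Person(?p) ^ wasPresentAt(?p, ?l)
        -> PotentialWitness(?p)"""
    crime_locations = {o for (s, p, o) in facts if p == "occurredAt"}
    witnesses = set()
    for (s, p, o) in facts:
        if (p == "wasPresentAt" and o in crime_locations
                and (s, "type", "Person") in facts):
            witnesses.add(s)
    return witnesses

print(sorted(infer_potential_witnesses(facts)))  # ['alice']
```
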
[Fig. 4 schema: the diagram connects concepts such as Agent, Organization, Person, and Location (with subclasses City, State, and Country); Person subsumes Criminal, Judge, Petitioner, Lawyer, and Respondent; further concepts include Offence, Evidence, Punishments, and Judgment, linked by properties such as hasEvidenceAgainst, hasLocation, worksIn, hasPunishments, and basedOn, with data properties such as hasName, hasID, hasDOB, hasLatitude, and hasLongitude.]
Reasoning engines or ontology reasoners can be used to process the ontology and
apply these rules to infer new information or check for consistency. Users can query
the criminal ontology to retrieve specific information or perform complex queries,
answering questions like “Who are the potential suspects for a given crime?” and
“Are there any witnesses present at a particular location during a crime?”
Fig. 6 Sample dataset for the Indian Dowry Articles according to IPC
Person 2 (Victim): A woman who was married to the accused. Death Within 7
Years of Marriage: The victim tragically passes away within 7 years of her marriage.
Evidence of Dowry Cruelty: There is substantial evidence indicating that the
victim was subjected to cruelty and harassment related to dowry during her marriage
(Fig. 6).
5 Evaluation
The evaluation of the quality of a learned ontology is determined by its level of
alignment with a manually constructed ontology, often referred to as the “gold standard”.
However, comparing two ontologies poses a notable challenge in this approach. In
practice, ontologies can be compared at two distinct levels: lexical and conceptual. To
enhance clarity, we introduce the relational level in our assessment of hierarchical and
non-hierarchical structures. The “Ontology for Reasoning on Criminal Judgments”
is a proposed ontology that aims to improve the understanding and interpretation
of criminal judgments in the legal domain. Its effectiveness will be evaluated based
on its usability, scalability, impact on legal research and decision-making, and its
potential to enhance the pursuit of justice.
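A minimal sketch of the lexical-level comparison follows, assuming case-insensitive exact matching of concept labels against the gold standard; the concept sets below are illustrative, not the actual CRIMO evaluation data:

```python
def lexical_overlap(learned, gold):
    """Lexical-level ontology comparison: case-insensitive exact label match,
    scored with precision, recall, and F1."""
    learned = {c.lower() for c in learned}
    gold = {c.lower() for c in gold}
    matched = learned & gold
    precision = len(matched) / len(learned) if learned else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

learned = {"Crime", "Criminal", "Victim", "Verdict"}
gold = {"Crime", "Criminal", "Victim", "Evidence", "Location"}
p, r, f = lexical_overlap(learned, gold)
print(round(p, 2), round(r, 2))  # 0.75 0.6
```
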
Usability and practicality are crucial factors in evaluating the ontology, as it should
align with the practical needs of the legal community. The ontology should be user-
friendly, easy to integrate into existing systems, and effective in query mechanisms.
Scalability is also vital, as it should be able to handle a diverse range of judgments
from various jurisdictions.
The ontology’s impact on legal research will be measured by its contribution
to information retrieval efficiency and effectiveness. It should assist in legal argu-
mentation, decision support, and the generation of legal conclusions. The ontology
should also be able to extract valuable insights from criminal judgments, such as
trends, patterns, and relationships within legal documents. The ontology’s impact
6 Conclusion
In this paper, we have created the legal ontology CRIMO and demonstrated its
significance in the representation, processing, and retrieval of legal information. In
the emerging landscape of the Semantic Web, its importance is expected to grow
even further. Despite numerous research projects focused on automatic construction
in this field, there is currently no standardized benchmark for evaluating the
engineering of legal ontologies.
CRIMO is a notable development in legal informatics, law, and justice. By
organizing and semantically annotating complex facts, it improves the understanding
and use of criminal judgments. The ontology's support for intelligent search, automated
legal reasoning, and the extraction of insights can change how lawyers, researchers,
politicians, and the general public interact with case law. We evaluate its
usefulness, scalability, influence on legal research, automated legal reasoning, and
commitment to justice. The ontology's contribution to the quest for justice is its
ability to advance fairness and bolster the rule of law. As it develops and grows, it
will be essential to the pursuit of justice and legal scholarship.
References
1. Valente A (2005) Types and roles of legal ontologies. In: Law and the semantic web: legal
ontologies, methodologies, legal information retrieval, and applications. Springer, pp 65–76
2. Osathitporn P, Soonthornphisaj N, Vatanawood W (2017) A scheme of criminal law knowledge
acquisition using ontology. In: 2017 18th IEEE/ACIS international conference on software
engineering, artificial intelligence, networking and parallel/distributed computing (SNPD).
IEEE, pp 29–34
3. Mezghanni IB, Gargouri F (2017) Crimar: a criminal Arabic ontology for a benchmark based
evaluation. Procedia Comput Sci 112:653–662
4. Fawei B, Pan JZ, Kollingbaum M, Wyner AZ (2019) A semi-automated ontology construction
for legal question answering. New Gener Comput 37:453–478
5. Gordon TF (2008) Constructing legal arguments with rules in the legal knowledge interchange
format (LKIF). In: Computable models of the law: languages, dialogues, games, ontologies.
Springer, pp 162–184
6. Zhang N, Pu Y-F, Yang S-Q, Zhou J-L, Gao J-K (2017) An ontological Chinese legal
consultation system. IEEE Access 5:18250–18261
7. El Ghosh M, Naja H, Abdulrab H, Khalil M (2017) Towards a legal rule-based system grounded
on the integration of criminal domain ontology and rules. Procedia Comput Sci 112:632–642
8. Soh C, Lim S, Hong K, Rhim Y-Y (2015) Ontology modeling for criminal law. In: International
workshop on AI approaches to the complexity of legal systems. Springer, pp 365–379
9. Mezghanni IB, Gargouri F (2015) Towards an Arabic legal ontology based on documents
properties extraction. In: 2015 IEEE/ACS 12th international conference of computer systems
and applications (AICCSA). IEEE, pp 1–8
10. Furtado V, Ayres L, De Oliveira M, Gustavo C, Oliveira J (2009) Towards semantic Wikicrimes.
In: AAAI spring symposium: social semantic web: where web 2.0 meets web 3.0, pp 27–32
11. Leary RM, Vandenberghe W, Zeleznikow J (2003) Towards a financial fraud ontology a legal
modelling approach
12. Zeleznikow J, Stranieri A (2001) An ontology for the construction of legal decision support
systems. In: Proceedings of the second international workshop on legal ontologies, vol 13, pp
67–76
13. Winkels R, Engers T, Bench-Capon T (2001) Proceedings of the second international workshop
on legal ontologies
14. Valente A, Breuker J, Brouwer B (1999) Legal modeling and automated reasoning with on-line.
Int J Hum-Comput Stud 51(6):1079–1125
15. Valente A (1995) Legal knowledge engineering: a modelling approach
16. Valente A, Breuker J (1994) Towards a global expert system in law. In: Bargellini G, Binazzi
S (eds) A functional ontology of law. CEDAM Publishers
17. McCarty LT (1989) A language for legal discourse i. basic features. In: Proceedings of the 2nd
international conference on artificial intelligence and law, pp 180–189
18. Sharma S, Jain S (2023) The coronavirus disease ontology (Covido). In: Semantic intelligence:
select proceedings of ISIC 2022. Springer, pp 89–103
19. Vallet D, Fernández M, Castells P (2005) An ontology-based information retrieval model.
In: The semantic web: research and applications: second European semantic web conference,
ESWC 2005, Heraklion, Crete, Greece, May 29–June 1. Proceedings 2. Springer, pp 455–470
20. Ranwez S, Duthil B, Sy MF, Montmain J, Augereau P, Ranwez V (2012) How ontology based
information retrieval systems may benefit from lexical text analysis. New Trends Res Ontol
Lexical Resourc Ideas, Projects, Syst 209–231
21. Munir K, Anjum MS (2018) The use of ontologies for effective knowledge modelling and
information retrieval. Appl Comput Informatics 14(2):116–126
22. Shanavas N, Wang H, Lin Z, Hawe G (2020) Ontology-based enriched concept graphs for
medical document classification. Inform Sci 525:172–181
23. Elhadad MK, Badran KM, Salama GI (2017) A novel approach for ontology-based dimension-
ality reduction for web text document classification. Int J Softw Innov (IJSI) 5(4):44–58
24. Lytvyn V, Vysotska V, Veres O, Rishnyak I, Rishnyak H (2017) Classification methods of text
documents using ontology based approach. In: Advances in intelligent systems and comput-
ing: selected papers from the international conference on computer science and information
technologies, CSIT 2016, September 6–10 Lviv, Ukraine. Springer, pp 229–240
25. Semantic Web Rule Language (2023). Page Version ID: 1145742736. https://en.wikipedia.org/
w/index.php?title=Semantic_Web_Rule_Language&oldid=1145742736 Accessed 10 May
2023
26. Horrocks I, Patel-Schneider PF, Boley H, Tabet S, Grosof B, Dean M et al (2004) Swrl: a
semantic web rule language combining owl and ruleml. W3C Member Submission 21(79):1–31
27. Lezcano L, Sicilia M-A, Rodríguez-Solano C (2011) Integrating reasoning and clinical
archetypes using owl ontologies and swrl rules. J Biomed Informatics 44(2):343–353
28. Dao QT, Dang TK, Nguyen TPH, Le TMC (2023) Vnles: a reasoning-enable legal expert
system using ontology modeling-based method: a case study of Vietnam criminal code. In:
2023 17th international conference on ubiquitous information management and communication
(IMCOM). IEEE, pp 1–7
29. Sharma S, Jain S (2023) Covido: an ontology for Covid-19 metadata. J Supercomput 1–30
30. Sharma S, Jain S (2024) The semantics of Covid-19 web data: ontology learning and population.
Curr Mater Sci: Formerly: Recent Patents Mater Sci 17(1):44–64
31. Jain S, Harde P, Mihindukulasooriya N (2023) Nyon: a multilingual modular legal ontology
for representing court judgements. In: Semantic intelligence: select proceedings of ISIC 2022.
Springer, pp 175–183
32. Rankin G (1944) The Indian penal code. LQ Rev 60:37
Ranking of Documents Through Smart
Crawler
Abstract With the exponential growth of information stored on the internet these
days, search engines like Google are of extreme significance. Ranking models, the
critical component of a search engine, are techniques used to find relevant pages and
rank them in decreasing order of relevance. The offline gathering of these documents
is important for offering the user more accurate and pertinent findings. When an
end-user issues a query, crawling is the process of retrieving documents from the
web. With the internet's ongoing expansion, the quantity of files that need to be
crawled has grown enormously. Because the resources for continuous crawling are
fixed, it is crucial for any academic or mid-level organization to rank the files that
need to be crawled in each iteration wisely. Algorithms are created to fit the crawling
pipeline already in place while bringing the benefits of ranking. These algorithms
ought to be quick and effective to keep them from becoming a pipeline bottleneck.
The proposed method uses the Hamming distance algorithm and incorporates parallel
processing between subtasks using Kafka. Based on the Hamming distance algorithm,
an effective smart crawler is created that ranks the pages to be downloaded in each
iteration. Compared with other existing methods, the implemented Hamming distance
technique achieves a high accuracy of 99.8%.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 317
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_26
1 Introduction
Web crawling is the process of acquiring desired information from the web. A huge
amount of objective data, such as the data that practitioners or researchers need,
are collected and indexed by web crawlers. They construct massive fields of the
gathered data by automatically collecting specified content from several websites.
Web crawlers are becoming more crucial as big data is utilized in a wider range of
industries and the amount of data available grows dramatically every year. A web
crawler automatically finds and retrieves website pages and is used to gather them.
A portion of the data collected from web pages can be used to enhance the
crawling procedure. By examining the information on the pages, a focused crawler
grabs pertinent web pages. There is a lot of research on this problem that leverages
data from online pages to enhance this task. Similar features and hierarchies can be
found on several web pages of a website. By anticipating these factors, the crawling
procedure may be optimized. Three additional pieces of data analyzed from earlier web
pages of a website are added to the information extraction method in this study
to improve it. The use of web crawler technology may lead to the development of
novel environment survey techniques and approaches while obviating the need for
significant amounts of labor, money, and time [1–3]. Additionally, complete access
to the functions of the public administrations and their associated data/documents
via the web is offered to lower the cost of administrative tasks within the public
sector. The information-gathering part of the search engine is a web crawler called
the robot or spider. Crawling is the process of acquiring useful web pages with
an interconnected structure of links in a methodical and automated way. Technology
that automatically collects desired data from websites is called web crawling; a
system that uses reinforcement learning and works with large amounts of data can
examine issues with web browsing. Due to the growth of data and the enormous
number of documents that could be found to match a particular query string, indexing
and ranking algorithms are used to discover the best-matching documents [4–7].
Crawling completeness, sometimes referred to as recall, is the proportion of the web
pages that have been crawled to the number of web pages related to the content under
evaluation. The study expands on the web crawling methodology,
which can automatically find and gather a large number of constructions from the
internet. The Uniform Resource Locator (URL) identifies these resources, and the
URLs allow connections to other resources. As a result, there is now a requirement
for effective web systems that can track URLs and their contents for indexing [8–
10]. Web data is becoming an increasingly valuable source of information due to the
quality and quantity of data that is available for automated retrieval. Because these
crawlers interact favorably with the web servers and consume online content at a
controlled speed, it is simple to identify ethical crawlers that follow the norms and
guidelines of crawlers. However, there are still some unethical crawlers who attempt
to deceive web servers and management to conceal their actual identities. Complex
and compound procedures should be used to find unethical crawlers [11, 12].
The next part is organized as follows: Sect. 2 describes a literature review of web
crawlers. The proposed methodology is explained in Sect. 3. Results and discussion
are included in Sect. 4. Section 5 includes comparative analysis. The conclusion is
detailed in Sect. 6.
2 Literature Review
Kaur and Geetha [17] implemented SIMHAR, a crawler based on hybrid technology
in which Sim+Hash and Redis hash-maps were utilized to detect duplication. The
distributed web crawler for the hidden web detects duplicates accurately and submits
the searchable forms. The Sim+Hash technique uses similarity-based crawling,
which aims to fetch pages that are similar to those that have already been crawled.
By minimizing duplicate crawling of similar pages, this method helps highlight
important information. However, the surface web, which consists of publicly
accessible web pages, is not optimized for crawling; this restriction limits its applicability
to a particular portion of the internet.
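The Sim+Hash style of duplicate detection can be sketched as follows. This is a generic 64-bit SimHash construction with a Hamming-distance comparison, not the authors' exact implementation; the token hashing via MD5 is an illustrative choice:

```python
import hashlib

def simhash(text, bits=64):
    """64-bit SimHash over whitespace tokens: sum +1/-1 per bit across
    token hashes, then keep the sign of each bit position."""
    weights = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

doc = "the quick brown fox jumps over the lazy dog"
near_duplicate = "the quick brown fox jumped over the lazy dog"
print(hamming(simhash(doc), simhash(doc)))  # 0
print(hamming(simhash(doc), simhash(near_duplicate)))
```

Near-duplicate pages share most tokens, so their fingerprints differ in only a few bits; a small Hamming threshold then flags them as duplicates without comparing full page contents.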
Murugudu and Reddy [18] implemented a novel and efficient two-phase deep
learning data crawler framework (NTPDCF). This crawler is used for intelligent
crawling of search data to produce a large variety of effectively matched content.
Using the two-phase data crawler architecture effectively collects targeted data from
deep web interfaces, improving data harvesting and enabling the successful extraction
of relevant information from various web sources. However, data crawlers must
constantly adapt to the deep web interface’s dynamic nature to efficiently harvest
data and avoid erroneous or incomplete extraction.
Capuano et al. [19] implemented an ontology-driven approach to focused crawling
based on the use of both multimedia and textual material on web pages. The
fusing the outcomes with novel technologies like linked open data and convolutional
neural networks. A high degree of generalization in various conceptual domains
was also made possible by the use of formal frameworks to express knowledge as
ontologies. However, the crawler needs to operate with a lot of online pages, so this
strategy necessitates manually labeling a sub-graph of the web, which would involve
a lot of labeling work.
3 Methodology
This section describes the methodology used in the proposed system. The architecture
of the proposed method is shown in Fig. 1; it comprises Phase 1 (indexing and user
query preferences) and Phase 2 (ranking models).
Step 1: First, check whether the given URL has already been crawled. A properly
indexed database is used for this purpose; it holds the URLs that have already been
crawled. If the URL has not been crawled, dump its data into the Kafka topic URL
data, then crawl the new URL produced in the Kafka topic URL data, and extract the
database structure form.
Because a website may appear more than once, the URL and website are indexed
together. Since each website's URL is unique, a lookup for a particular URL returns
both the website and the URL, which takes less time than simply storing the URL
value. When the crawler pulls records from a URL, the data is separated into the
website and URL for database storage.
Step 2: The second step involves consuming the URL data topic, discarding the
script files and styling components, and then extracting the headings from the
bulleted points and the H1, H2, H3, H4, H5, and H6 tags, in addition to any links to
photos or other links that may be present. The extracted subject matter is then
published to the topic-similarity topic.
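Step 2's extraction can be sketched with the standard-library HTML parser. The class name and sample markup below are illustrative assumptions, not the paper's code:

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Drops <script>/<style> content and collects H1-H6 text, bulleted
    points, image sources, and link targets, as in Step 2."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings, self.links, self.images = [], [], []
        self._capture = None   # heading/list tag currently open
        self._skip = False     # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip = True
        elif tag in self.HEADINGS or tag == "li":
            self._capture = tag
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture and not self._skip and data.strip():
            self.headings.append((self._capture, data.strip()))

p = HeadingExtractor()
p.feed("<h1>Crawling</h1><script>var x=1;</script><li>point</li>"
       "<a href='/next'>next</a><img src='pic.png'>")
print(p.headings)  # [('h1', 'Crawling'), ('li', 'point')]
```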
Step 3: In this step, the grammar is eliminated from the text data extracted from the
crawled information so that it is not search-engine optimized. The data is then
compared with the data already present in the database, and if there is a similarity of
more than 50%, it is given a unique identifier. As the search engine will not get an
exact string to compare, grouping all similar objects is critical, as it significantly cuts
down the time required to traverse the complete database and extract the
particular_id when it appears. This process is repeated for each message observed on
the topic. If a message is similar to another message, it is stored and compared with
that message.
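The 50%-similarity grouping can be illustrated with a word-level Jaccard measure. The paper does not specify which similarity function is used at this step, so the measure, threshold handling, and names here are assumptions for illustration only:

```python
import itertools

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

_counter = itertools.count(1)
database = []  # (unique_id, text) rows already stored

def assign_id(text: str) -> int:
    """Reuse an existing unique ID when similarity exceeds 50%,
    otherwise create a new one (Step 3)."""
    for uid, stored in database:
        if jaccard(text, stored) > 0.5:
            return uid
    uid = next(_counter)
    database.append((uid, text))
    return uid

a = assign_id("smart crawler ranks documents by similarity")
b = assign_id("smart crawler ranks web documents by similarity")
c = assign_id("completely unrelated sentence about cooking pasta")
print(a, b, c)  # the first two share an ID, the third gets a new one
```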
The scheduler is applied to dynamic URLs: to reintroduce a URL to the URL topic,
it first removes the URL's details from the database. For static websites that rarely
change, the scheduler can be disabled so that the first step is not repeated; this is
useful when crawling just one static web page.
The schedule time is set dynamically for dynamic websites. Consequently, if a URL
is updated within 1 min, the initial interval is 1 min. If it is not updated, the interval
for that website is increased by 2 min, so that it runs again after 2 min.
322 A. S. Dange et al.
With this architecture, consumption scales up when the number of messages in a
topic increases and scales down when it decreases. This event-driven architecture
supports both vertical and horizontal scalability. In a synchronous design, one
consumer's or producer's failure could affect the whole system; because this process
is asynchronous, uses multiple consumers, and takes less time than a synchronous
approach, an outage of either component does not affect the system. Algorithm:
1. Check whether the given URL has already been crawled.
2. If not, dump the data from the URL into the Kafka topic URL data.
3. Crawl the new URL that is produced in the Kafka topic.
4. Consume the URL data topic, eliminate script files and styling components,
and extract headings from bulleted points.
5. Remove the grammar from the text information extracted from the crawled
data.
6. Compare the information to the data already present in the database; if more
than 50% similarity is found, assign it a unique ID.
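The event-driven producer/consumer flow in the steps above can be simulated with standard-library queues standing in for the Kafka topics. All names here are illustrative; a real deployment would use actual Kafka producers and consumers between subtasks:

```python
import queue

# Stdlib queues stand in for the Kafka topics in this sketch.
url_topic = queue.Queue()       # "URL data" topic
content_topic = queue.Queue()   # extracted-content topic
crawled = set()                 # step 1: already-crawled index

def produce_url(url: str) -> None:
    """Subtask 1 (producer): dump uncrawled URLs into the URL topic."""
    if url not in crawled:
        crawled.add(url)
        url_topic.put(url)

def consume_and_extract() -> None:
    """Subtask 2 (consumer): crawl queued URLs and publish content."""
    while not url_topic.empty():
        url = url_topic.get()
        content_topic.put({"url": url, "content": f"<extracted from {url}>"})

produce_url("https://en.wikipedia.org")
produce_url("https://en.wikipedia.org")  # duplicate: filtered in step 1
produce_url("https://www.geeksforgeeks.org")
consume_and_extract()
print(content_topic.qsize())  # 2 messages: the duplicate was skipped
```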
The indexing and ranking model's implemented architecture comprises two primary
phases, each defining a specific step in the ranking and indexing process for either
offline documents or online web pages. The model can therefore be used for storing
documents offline or web pages online. The model's two primary phases are
described below.
Phase 1: The first step is receiving a search query entered by a user. Users are
offered flexibility in determining their options and ordering priority based on their
needs. Depending on their needs and preferences, each user can select any subset of
the criteria, or all of them.
After determining a user's preferences and needs, the user-query model begins
processing and applies lemmatization and stemming to reduce the inflectional, and
occasionally derivational, forms related to a single base form of each word in the
query. It then crawls online and offline files and pages and analyses the material
against the user's request. After learning a web page's URL or file path, the model
engine crawls the page to determine what is on it, a process called indexing, and the
results are then indexed. The model starts with the keyword criterion, matching user
queries in three locations: page URL, domain name, and page content. The model
engine also looks up the page's creation date in its metadata.
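The stemming step can be sketched with a toy suffix stripper. This crude rule set is an assumption for illustration only; a production system would use a Porter stemmer or a WordNet lemmatizer:

```python
def crude_stem(word: str) -> str:
    """Very rough suffix-stripping stemmer, for illustration only."""
    word = word.lower()
    for suffix in ("ation", "ings", "ing", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # collapse a doubled final consonant left behind (e.g. "runn" -> "run")
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

query = "ranking crawled documents"
print([crude_stem(w) for w in query.split()])  # ['rank', 'crawl', 'document']
```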
Phase 2: the second segment is initiated by means of a web page handler module.
It identifies types of pages or files. Also, it begins to load page contents in the builder
page which will keep track of every document or page content to compare search
query to it.
Module 1 loads the pages and documents records. Rank is calculated using rank
calculator and user’s standards. Section one findings establish the possibilities of
the person earlier than loading page content inside the web page handler. The rank
calculator receives contents at this point.
The first stage discovers a pattern to determine the creation and modification dates of
pages or files by processing the material loaded by the page handler. A user's search
query is compared with the other pages' content, and by counting the number of
hyperlinks that point to those other pages linked to the user's search query, the
number of votes for each page can also be determined.
After Module 2 runs with that feature enabled, the weight module determines the
initial weight for every criterion according to the user's preferences, and computes
the final score for every document and page by passing these values to the rank
calculator. The rank statistics module, which is responsible for the final values
displayed for every page, receives the obtained page scores.
The Hamming distance approach is used to decide which words are similar. If more
than one word is present, the word with the highest degree of similarity is accepted.
If the string lengths do not match, padding is used before computing
hammingDist(str1, str2).
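A minimal sketch of the padded Hamming-distance comparison described above (the function names are illustrative):

```python
def hamming_dist(str1: str, str2: str) -> int:
    """Hamming distance with right-padding when lengths differ,
    so positions align before comparison."""
    width = max(len(str1), len(str2))
    a, b = str1.ljust(width), str2.ljust(width)
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def most_similar(word: str, candidates: list[str]) -> str:
    """Among several candidate words, accept the one with the highest
    similarity (i.e. the lowest Hamming distance)."""
    return min(candidates, key=lambda c: hamming_dist(word, c))

print(hamming_dist("crawler", "trawler"))                # 1
print(most_similar("rank", ["rang", "ranked", "junk"]))  # 'rang'
```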
The value of each heading and bullet point is determined. To ensure that each tag
type has a priority value, as shown in Table 1, the similarity between H1 and H2
comparisons should not be greater than the contrast between H1 and H6.
Displaying the data takes less time because the cache is memory-sensitive rather
than time-sensitive: data is deleted only when the cache memory is full, provided the
record is stored in the cache. When a specific text is searched, its 100% similarity
index is saved in the cache, so that when the same text is searched again, the 100%
similarity match is retrieved and the data can be served as output. Compared with
searching the complete database, the likelihood of finding a match for a given record
is much greater, so the cached records with a higher similarity index help identify
the record quickly. Whenever records with a higher similarity index are discovered,
the old records are replaced. During search engine optimization, as shown
in Fig. 2, a search engine for crawled data is introduced.
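The memory-sensitive cache behavior above can be sketched as follows. The eviction policy, capacity, and names are simplifying assumptions, not the paper's implementation:

```python
from collections import OrderedDict

class SimilarityCache:
    """Memory-sensitive (not time-sensitive) cache sketch: entries are
    evicted only when the cache is full, and a new result replaces an
    old one only when its similarity index is higher."""
    def __init__(self, max_entries: int = 3):
        self.max_entries = max_entries
        self.store = OrderedDict()  # query text -> (similarity, record)

    def put(self, query: str, similarity: float, record: str) -> None:
        old = self.store.get(query)
        if old and old[0] >= similarity:
            return  # keep the record with the higher similarity index
        if query not in self.store and len(self.store) >= self.max_entries:
            self.store.popitem(last=False)  # evict oldest when memory is full
        self.store[query] = (similarity, record)

    def get(self, query: str):
        return self.store.get(query)

cache = SimilarityCache(max_entries=2)
cache.put("web crawler", 1.0, "doc-42")   # 100% match cached
cache.put("ranking", 0.8, "doc-7")
print(cache.get("web crawler"))  # (1.0, 'doc-42') served without a DB scan
```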
We can add ranking to the architecture, since some websites need their content given
more attention. Results from the sites crawled most often are sorted by topic priority.
Figure 3 shows the architecture of a web crawler with URL ranking added.
4 Results and Discussion
The experimental setup includes the Ubuntu 20.04 operating system and 32 GB
of RAM. The technologies used to implement the concept are Docker, Docker
Compose, Apache Kafka (as a Docker image rather than a native installation), and
the Python programming language. The implementation runs on an octa-core Intel i7
processor. Wikipedia, Udemy, Medium, and GeeksforGeeks are among the
initial URLs injected. The web crawler concentrates on headings and titles
rather than a website's complete data. If a website adopts the implemented methodology,
its primary material can always consist of headers and titles, allowing only
limited, essential content to be served.
Figure 4 shows a graphical illustration with and without a similarity index.
5 Comparative Analysis
This section contains an analysis of existing and implemented models. Table 2 shows
a comparative analysis of existing and implemented models. The analysis is done
with the help of attributes such as accuracy, precision, recall and f-measure.
As shown in the above table, the proposed method achieves 99.8% accuracy, which
is higher than that of the other methods. The values for precision, recall, and
f-measure are 99.9%, 98%, and 99%, respectively.
6 Conclusion
The proposed technique is event-driven: resources are not consumed until there is
data to process. For parallel processing, the complete task is divided into separate
subtasks, and each subtask can have a different number of entities to complete its
work. This is achieved by placing Kafka between subtasks: the first subtask acts as a
Kafka producer and the second as a consumer. Records can be evenly distributed in
a plug-and-play manner, and resources are not used until they are needed. The
proposed technique crawls faster than a regular architecture. It can handle multiple
URLs, along with horizontal and vertical scalability, which allows more data to be
processed than in other architectures. Compared with other existing methods, the
implemented Hamming distance method achieves a high accuracy of 99.8%.
References
1. Kim YY, Kim YK, Kim DS, Kim MH (2020) Implementation of hybrid P2P networking
distributed web crawler using AWS for smart work news big data. Peer-to-Peer Network Appl
13:659–670
2. Uzun E (2020) A novel web scraping approach using the additional information obtained from
web pages. IEEE Access 8:61726–61740
3. Zhang J, Zou T, Lai Y (2021) Novel method for industrial sewage outfall detection: water
pollution monitoring based on web crawler and remote sensing interpretation techniques. J
Clean Prod 312:127640
4. Bifulco I, Cirillo S, Esposito C, Guadagni R, Polese G (2021) An intelligent system for focused
crawling from Big Data sources. Expert Syst Appl 184:115560
5. Rajiv S, Navaneethan C (2021) Keyword weight optimization using gradient strategies in event
focused web crawling. Pattern Recogn Lett 142:3–10
6. Yang S, Wi S, Park JH, Cho HM, Kim S (2020) Framework for developing a building material
property database using web crawling to improve the applicability of energy simulation tools.
Renew Sustain Energy Rev 121:109665
7. Ang PS, Teo DCH, Dorajoo SR, Prem Kumar M, Chan YH, Choong CT, Phuah DST, Tan
DHM, Tan FM, Huang H, Tan MSH (2021) Augmenting product defect surveillance through
web crawling and machine learning in Singapore. Drug Saf 44(9):939–948
8. Zhao X, Zhang W, He W, Huang C (2020) Research on customer purchase behaviors in online
take-out platforms based on semantic fuzziness and deep web crawler. J Ambient Intell Hum
Comput 11:3371–3385
9. Hwang J, Kim J, Chi S, Seo J (2022) Development of training image database using web
crawling for vision-based site monitoring. Autom Constr 135:104141
10. ElAraby ME, Shams MY (2021) Face retrieval system based on elastic web crawler over cloud
computing. Multimedia Tools Appl 80:11723–11738
11. Schedlbauer J, Raptis G, Ludwig B (2021) Medical informatics labor market analysis using
web crawling, web scraping, and text mining. Int J Med Inform 150:104453
12. Attia M, Abdel-Fattah MA, Khedr AE (2022) A proposed multi criteria indexing and ranking
model for documents and web pages on large scale data. J King Saud Univ Comput Inf Sci
34(10):8702–8715
13. Sharma AK, Shrivastava V, Singh H (2021) Experimental performance analysis of web crawlers
using single and multi-threaded web crawling and indexing algorithm for the application of
smart web contents. Mater Today Proc 37:1403–1408
14. Hosseinkhani J, Taherdoost H, Keikhaee S (2021) ANTON framework based on semantic
focused crawler to support web crime mining using SVM. Ann Data Sci 8(2):227–240
15. Kaur S, Singh A, Geetha G, Cheng X (2021) IHWC: intelligent hidden web crawler for
harvesting data in urban domains. Complex Intell Syst 1–19
16. Hosseini N, Fakhar F, Kiani B, Eslami S (2019) Enhancing the security of patients’ portals and
websites by detecting malicious web crawlers using machine learning techniques. Int J Med
Inform 132:103976
17. Kaur S, Geetha G (2020) SIMHAR-smart distributed web crawler for the hidden web using
SIM+ hash and redis server. IEEE Access 8:117582–117592
18. Murugudu MR, Reddy LSS (2023) Efficiently harvesting deep web interfaces based on adaptive
learning using two-phase data crawler framework. Soft Comput 27(1):505–515
19. Capuano A, Rinaldi AM, Russo C (2020) An ontology-driven multimedia focused crawler
based on linked open data and deep learning techniques. Multimedia Tools Appl 79:7577–7598
Ensemble Learning Approaches
to Strategically Shaping Learner
Achievement in Thailand Higher
Education
Abstract Thailand faces a severe problem of students dropping out of the higher
education system. Therefore, this research has three critical objectives: (1) to study
the context of students’ academic achievement in science and technology at the
higher education level, (2) to assemble a model to predict the risk of students drop-
ping out of higher education, and (3) to evaluate a model for predicting the risk of
a student dropping out from higher education. The population and research sample
were 2361 students’ academic achievements from five educational programs of the
Faculty of Science and Technology at Rajabhat Maha Sarakham University during
the 2010–2022 academic year. The research tool utilized data mining and super-
vised machine learning techniques: Decision Tree, Naïve Bayes, Neural Networks,
Gradient Boosting, Random Forest, and Majority Voting. Model performance was
evaluated using the cross-validation approaches and confusion matrix techniques,
with four indicators: Accuracy, Precision, Recall, and F1-Score. The results showed
that learners’ context in science and technology had various learning achievements.
The educational program that needs to be monitored is the Bachelor of Science
Program in Computer Science. This research successfully developed a predictive
model for student dropout risk with an accuracy of 88.14% and an S.D. equal to
1.04. Therefore, this research greatly benefits the public and the stakeholders of
Rajabhat Maha Sarakham University, who are encouraged to support and continue
this research.
S. Bussaman · P. Nasa-Ngium
Faculty of Science and Technology, Rajabhat Maha Sarakham University, Maha Sarakham 44000,
Thailand
W. S. Nuankaew · T. Sararat · P. Nuankaew (B)
School of Information and Communication Technology, University of Phayao, Phayao 56000,
Thailand
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 329
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_27
330 S. Bussaman et al.
1 Introduction
The influence of artificial intelligence has spread into the education industry, known
as “educational data mining and learning analytics”. Educational data mining is
a space where data scientists use data relevant to learners, instructors, and other
educational contexts to develop the potential of students and teachers [1, 2], while
learning analytics is a tool for creating practical educational data mining. Learning
analytics typically consists of four components: descriptive analytics, diagnostic
analytics, predictive analytics, and prescriptive analytics [2, 3].
Descriptive analytics is similar to a survey tool, describing what is being studied
and researched. Diagnostic analytics, in comparison, diagnoses problems and their
causes related to the inspected object. Predictive analytics uses the identified causes
to shape a forecast and find future answers. Finally, prescriptive analytics introduces
a variety of practical alternatives to give researchers a clear direction. These elements
are essential ingredients for research development in the education industry [4, 5].
Many researchers are interested in using educational data mining to develop students
for various purposes: to improve student performance [6, 7], to predict learning
achievement [8, 9], to recommend relevant academic and career programs, and to
develop learning models that fit learning styles [10], etc.
In Thailand’s education context, education level is divided into two primary
classes: basic education and higher education. Thai basic education provides students
with general knowledge, while Thai higher education focuses on specialized
education. However, the major problem faced by universities in Thailand is students
dropping out of the system and not graduating as planned [11, 12]. Rajabhat Maha
Sarakham University has been affected by student dropout like other universities.
This is the principal reason that drives researchers to
carry out this research. This research has three main objectives. The first objective
is to study the problems affected by students’ dropout in the Faculty of Science
and Technology at Rajabhat Maha Sarakham University. The second objective is
to develop a predictive model for the dropout risk of students from the Faculty of
Science and Technology. The final objective is to evaluate the effectiveness of the
risk prediction model of the Faculty of Science and Technology dropout students.
The data was collected from 2361 students from five educational programs from the
Faculty of Science and Technology, Rajabhat Maha Sarakham University, during
the 2010–2022 academic year. Research tools and methodologies were CRISP-DM
and supervised machine learning tools [7, 9]: Decision Tree, Naïve Bayes, Neural
Networks, Gradient Boosting, Random Forest, and Majority Voting. Model perfor-
mance was evaluated using the cross-validation approaches and confusion matrix
techniques, with four indicators: Accuracy, Precision, Recall, and F1-Score.
In this research, the researchers are highly committed to determining the guide-
lines and successfully designing the solutions that Rajabhat Maha Sarakham Univer-
sity faces. In addition, the researchers hope that this research will continue to benefit
the public.
Ensemble Learning Approaches to Strategically Shaping Learner … 331
As for the population and research samples, researchers collected data on the learning
achievement of students in the Faculty of Science and Technology at Rajabhat Maha
Sarakham University during the academic year 2010–2022 from five educational
programs: Bachelor of Science Program in Biology, Bachelor of Science Program in
Chemistry, Bachelor of Science Program in Computer Science, Bachelor of Science
Program in Mathematics, and Bachelor of Science Program in Physics, as detailed
in Table 1.
The data collected were classified by education program and student status, as shown
in Table 1. The data used in this research will be anonymized for confidentiality and
research purposes only.
Table 1 shows the collected sample data. It has a total of 2361 students from five
educational programs. The educational program with the most significant number of
students is B.Sc. Computer Science with a total enrollment of 688 students. Moreover,
when considering the details, it was found that the B.Sc. Computer Science has the
highest issues, with 161 dropout students and 108 who graduated late. Rajabhat Maha
Sarakham University needs to pay attention to this matter urgently.
Data mining techniques were used as the research methodology and tools, with
CRISP-DM principles defining the research guidelines.
In analyzing and interpreting the findings, researchers used the last two steps of the
CRISP-DM process to guide their operations: evaluation and deployment.
Researchers divided the evaluation into two parts: the cross-validation approach and
the confusion matrix used to determine indicators. The cross-validation process
divides the collected data into equal portions called k folds. In each iteration,
several folds are used to develop the model (the training dataset) and the rest are
used to test the developed model (the testing dataset).
To determine the most efficient model, the confusion matrix is used as a metric. This
research has four metrics: accuracy, precision, recall, and F1-score. Accuracy
measures overall model performance; it is calculated as the number of correctly
predicted instances divided by the total. Precision measures the model's predictive
ability within each class; it is calculated by dividing the correctly predicted
instances of a class by the number of instances predicted as that class. Recall is the
proportion of actual class members the model predicts correctly; it is calculated as
the correctly predicted instances of a class divided by the number of members in the
class. Finally, the F1-score is a metric built on precision and recall, used as a
criterion in conjunction with accuracy. It can be calculated from Eq. (1).
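These four metrics can be illustrated for a single class with toy confusion-matrix counts. The numbers below are illustrative, not the study's data; the F1 formula used is the standard harmonic mean of precision and recall referenced as Eq. (1):

```python
# Toy 2x2 confusion-matrix counts for one class of interest
# (illustrative values, not the study's data).
tp, fp, fn, tn = 40, 5, 10, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # correct / total
precision = tp / (tp + fp)                    # correct among predicted class
recall    = tp / (tp + fn)                    # correct among actual class
f1_score  = 2 * precision * recall / (precision + recall)  # Eq. (1)

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1_score, 3))  # 0.85 0.889 0.8 0.842
```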
After obtaining the most suitable model, it goes into deployment. Deployment can
be carried out in many ways, such as developing a user manual, designing a perfor-
mance report, or developing an application. For this research, it has been proposed to
the administrators of Rajabhat Maha Sarakham University and the Faculty of Science
and Technology to determine a solution that is consistent with student behavior in a
sustainable way.
3 Research Results
The researchers summarized the results of the contextual analysis of learners using
basic statistics, including minimum, maximum, mean, mode, median, and S.D., as
detailed in Table 2.
Table 2 presents an overview of student achievements from five educational
programs of the Faculty of Science and Technology at Rajabhat Maha Sarakham
University. It was found that all learners had a moderate average academic achieve-
ment, with a mean of 2.49 out of 4.00 and S.D. equal to 0.66. However, administrators
and stakeholders should be aware that the B.Sc. Computer Science program has a low
average, with a mean of 2.24 and an S.D. of 0.63. This is consistent
with Table 1, which shows many dropout students.
The developed models based on the data mining development process are classified
by method characteristics, as summarized in Table 3.
Table 3 shows the performance test results of the model classified by technique.
It was found that the model developed with the voting technique had the highest
accuracy, with an accuracy of 88.14%. Therefore, it can be concluded that the voting
technique produces a model suitable for use and deployment. The researchers detail
the model performance in the next section.
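Hard majority voting over the base learners can be sketched as follows. The per-student predictions and the three-model subset are hypothetical illustrations; the class labels (OSC, NSC, DPO) follow those reported with Table 4:

```python
from collections import Counter

def majority_vote(predictions_per_model: list[list[str]]) -> list[str]:
    """Hard majority voting: each base model predicts a label per
    student, and the most common label per student wins."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

# Hypothetical per-student predictions from three of the base learners
tree  = ["OSC", "DPO", "OSC", "NSC"]
bayes = ["OSC", "DPO", "DPO", "OSC"]
nnet  = ["OSC", "OSC", "DPO", "NSC"]

print(majority_vote([tree, bayes, nnet]))  # ['OSC', 'DPO', 'DPO', 'NSC']
```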
The most suitable model for this research is the model developed with the voting
technique. The efficiency was tested with the confusion matrix, as detailed in Table 4.
Table 4 enumerates model performance using the confusion matrix technique and
four indicators. It was found that the model had predictive ability in all classes,
with model accuracy equal to 88.14%. In addition, the model has a high level of
predictive capability in each category, with OSC's F1-Score equal to 94.07%, NSC's
F1-Score equal to 36.30%, DPO's F1-Score equal to 85.50%, and RSD's F1-Score
equal to 66.06%. Therefore, it can be concluded that this model can be adapted and utilized
further.
4 Research Discussion
This research achieves all three objectives, and the researchers can discuss the
following results.
The researchers extracted data from five educational programs at the Faculty of
Science and Technology at Rajabhat Maha Sarakham University. It was discovered
that the educational program that required special vigilance was B.Sc. Computer
Science, as concluded in Table 1. It shows that only 55.81% of students in the program
completed the designed curriculum (384 out of 688 students). In addition, the
program's numbers of dropouts and late graduates ranked highest, representing
47.92% of all dropout students (161 out of 336 students) and 48.00% of all late
graduates (108 out of 225 students).
Such findings and observations drive researchers to find solutions to these problems.
The researchers developed a model to predict the risk of students dropping out and
failing to complete their studies. The researchers used a data mining approach and six
supervised machine learning techniques to develop the most acceptable model: Deci-
sion Tree, Naïve Bayes, Neural Networks, Gradient Boosting, Random Forest, and
Majority Voting. The model development results from each technique are summa-
rized and presented in Table 3. Overall, the researchers found that all techniques
were able to produce highly efficient models, with the most effective models being
those using the majority voting technique. It has an accuracy value of 88.14% and
an S.D. equal to 1.04. It can be interpreted as a reasonable model to implement and
deploy. The selected model was put into a detailed performance test, which is listed in
Table 4. Table 4 shows that the selected model's predictability is distributed among all
classes. There is one area with room for improvement: the model can predict
graduation not on schedule only with low accuracy (recall equal to 22.67%).
Finally, the researchers concluded that the research achieved the intended research
objectives. The research has revealed the details and context of the students to which
the Faculty of Science and Technology and Rajabhat Maha Sarakham University need
to pay special attention. Moreover, this research has studied a learning model based
on science and technology. Therefore, the researchers concluded that this research
is beneficial and appropriate and should be disseminated to the public.
5 Conclusion
In conclusion, this research found that all objectives were implemented and achieved.
The data collection consists of 2361 students’ learning achievements in the Faculty of
Science and Technology at Rajabhat Maha Sarakham University during the academic
year 2010–2022 from five educational programs: Bachelor of Science Program in
Biology, Bachelor of Science Program in Chemistry, Bachelor of Science Program
in Computer Science, Bachelor of Science Program in Mathematics, and Bachelor
of Science Program in Physics, as detailed in Table 1.
Of particular note, the researchers found that in the Bachelor of Science Program in
Computer Science, many students were at high risk of dropping out, and many more
were likely to graduate behind schedule. Data over the past decade, as shown in
Table 1, shows that the program has a graduation rate of only 55.81% (384 out of
688 students) and the highest number of dropouts, 161 students, representing
47.92% of all dropouts (161 out of 336 students). This highly emphasizes the importance
and necessity of developing a model to predict the dropout risk of students in the
Faculty of Science and Technology at Rajabhat Maha Sarakham University.
The model developed from the six supervised learning techniques showed that
the model constructed with the majority voting technique had the highest accuracy,
with an accuracy of 88.14% and an S.D. equal to 1.04, as compared in Table 3.
Moreover, the model was tested for performance by the cross-validation approach, the
confusion matrix technique, and four metrics, as detailed in Table 4. Therefore, it can
be concluded that this research was successful and deserves further dissemination.
6 Research Limitations
As for the limitations of this study, the researchers note that the data collection took
a long time, although the resulting dataset is extensive, which is an advantage of this
research. However, conducting good, accepted research requires support from
stakeholders and Rajabhat Maha Sarakham University administrators. The
researchers have great expectations that this research will be carried forward and
supported in their organization, and they are encouraged to continue it in other ways.
Acknowledgements This research project was supported by the Thailand Science Research and
Innovation Fund and the University of Phayao (Grant No. FF66-UoE002). In addition, this research
was supported by many advisors, academics, researchers, students, and staff. The authors would
like to thank all of them for their support and collaboration in making this research possible.
References
11. Nuankaew P (2019) Dropout situation of business computer students, University of Phayao.
Int J Emerg Technol Learn 14:115–131. https://doi.org/10.3991/ijet.v14i19.11177
12. Iam-On N, Boongoen T (2017) Improved student dropout prediction in Thai University using
ensemble of mixed-type data clusterings. Int J Mach Learn Cyber 8:497–510. https://doi.org/
10.1007/s13042-015-0341-x
13. Shahiri AM, Husain W, Rashid NA (2015) A review on predicting student’s performance
using data mining techniques. Proc Comp Sci 72:414–422. https://doi.org/10.1016/j.procs.
2015.12.157
14. Nosseir A, Fathy Y (2020) A mobile application for early prediction of student performance
using fuzzy logic and artificial neural networks. Int J Interact Mobile Technol 14:4–18. https://
doi.org/10.3991/ijim.v14i02.10940
Harnessing Ridge Regression and SHAP
for Predicting Student Grades:
An Approach Towards Explainable AI
in Education
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 341
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_28
V. Katkar et al.
more effectively and ultimately enhancing educational achievement. Over the years,
researchers have used statistical techniques to predict student performance with
varying degrees of success.
The advent of machine learning (ML) has provided an opportunity to enhance
prediction accuracy and gain new insights into this critical issue. A range of ML tech-
niques have been applied to educational data, from simple models like Linear Regres-
sion to more complex ones like Random Forests and Boosting techniques. While
these models have shown promise, they vary greatly in their predictive performance
and, more importantly, their interpretability [1, 2].
The ability to interpret ML models, known as Explainable AI (XAI), has become
increasingly important. As ML models are deployed in more contexts, stakeholders
need to understand the reasoning behind the model’s predictions [3, 4]. This is espe-
cially true in education, where interventions based on model predictions have direct
impacts on students’ lives. SHapley Additive exPlanations (SHAP) is one of the
methods used to interpret complex models, offering a way to attribute the contribution
of each feature to the prediction [5, 6].
This research aims to not only investigate the predictive accuracy of several ML
models for student performance but also to explore their interpretability using SHAP.
This dual focus on predictive power and interpretability fills a crucial gap in the
existing literature. By applying a variety of regression techniques and using SHAP
to interpret the most successful one, we aim to provide a comprehensive view of
student performance prediction that is both highly accurate and readily interpretable.
This work will contribute to the existing body of knowledge by presenting a
comparative analysis of the accuracy of several regression models, highlighting
the efficacy of Ridge Regression in this context. Furthermore, it will shed light on
the applicability of SHAP in interpreting Ridge Regression, thereby enhancing our
understanding of the most influential factors affecting student performance. These
findings will not only advance academic understanding in this field but also provide
educators and policy-makers with valuable, interpretable insights for improving
student outcomes.
The primary objectives of this research are as follows:
1. Investigate various machine learning regression models: The study aims to
explore the efficacy of multiple regression techniques (including Linear Regres-
sion, Ridge Regression, Lasso Regression, Elastic Net, Decision Tree Regression,
Random Forest Regression, AdaBoost, Gradient Boosting, Bagging, XGBoost,
and K-Nearest Neighbors) for predicting student academic performance.
2. Identify the most accurate model: Based on the comparative analysis of the
aforementioned models, the objective is to ascertain the model that provides the
highest predictive accuracy for student academic performance.
3. Apply Explainable AI using SHAP to the most accurate model: This study seeks
to use SHAP to interpret the predictions of the model identified as the most
accurate.
Section 2 of this paper offers a comprehensive literature review that grounds our
study within the existing body of work. In Sect. 3, we outline our methodology,
detailing the data, machine learning models used, and the application of SHAP. Our
findings are then presented in Sect. 4 (Results), which leads to a deeper interpretation
of these results in Sect. 5. Finally, Sect. 6 concludes the paper with a summary of
the key findings.
2 Literature Review
The prediction of academic performance has been a topic of research interest for
several decades, given its profound implications for educational policy, instruction,
and learning. Traditional approaches have relied heavily on statistical methods, using
factors such as students’ previous academic records, socioeconomic status, and other
personal characteristics as predictors [7, 8].
However, the emergence of machine learning (ML) methods has marked a
paradigm shift in this field. ML offers more sophisticated, non-linear modeling capa-
bilities that can capture the intricate relationships between predictors and student
performance. Numerous studies have begun to explore various ML techniques to
predict academic performance.
For instance, Kotsiantis et al. [9] applied various ML techniques, including Deci-
sion Trees, Random Forests, and Support Vector Machines (SVM), to predict student
grades in a distance learning context. They found that SVMs provided the highest
accuracy among the models tested. Similarly, Márquez-Vera et al. [10] used several
ML techniques to predict student dropouts, finding that Decision Trees and Random
Forests performed best in their study.
The application of ensemble methods, such as AdaBoost and Gradient Boosting,
has also been explored. Cortez and Silva [11] used AdaBoost for predicting student
performance and found it to be highly effective. Moreover, ensemble methods like
XGBoost have shown impressive performance in predicting student outcomes [12].
While these studies highlight the promising role of ML in predicting academic
performance, there has been less focus on the interpretability of these models.
However, this is starting to change with the advent of Explainable AI (XAI), which
seeks to make the reasoning behind ML predictions understandable to human users
[13]. The application of XAI in education is still a nascent field and constitutes a
significant gap in the literature, which our study aims to address.
3 Methodology
Figure 1 depicts the research methodology employed in this paper. The steps involved are dataset preprocessing, feature selection, model training, and model evaluation, and the success of a predictive model relies on each individual step. Several preprocessing steps were performed on our data before any machine learning techniques were applied. During this stage, a Label Encoder was used to translate ordinal attributes
into numeric attributes, and a Standard Scaler was used to standardize numeric variables such as test scores. These steps ensured that our models could learn from the data effectively.
After the data was cleaned and organized, we used a feature selection procedure to narrow the original 32 features down to a smaller subset. Each attribute was analyzed for its correlation with the dependent variable, the students' final grade. We then ranked the attributes by the absolute value of this correlation and retained the top 15.
To predict students' final grades, we employed several regression models: Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, Decision Tree Regression, Random Forest Regression, AdaBoost Regression, Gradient Boosting Regression, Bagging Regression, XGBoost Regression, and KNN Regression. We compared the results of these models, each of which takes a somewhat different approach to learning from data, to determine which one is best suited to our goal.
Finally, we used SHAP (SHapley Additive exPlanations), an Explainable AI method, to learn more about the predictions of our final model. With this approach, we could analyze how each feature affected the model's forecast. A detailed description of the methodology is given below.
3.1 Dataset
weekly study time, and other factors that could potentially affect student perfor-
mance. The target variable, which the models are trying to predict, is the final grade
of the student, represented as a numerical value.
Given the diverse nature of the dataset, careful preprocessing was essential to
ensure that our machine learning models could accurately interpret the data. The
preprocessing methods used were as follows:
1. Encoding Ordinal Attributes
2. Standardizing Numeric Attributes
3. Feature Selection Using Correlation
Encoding Ordinal Attributes The dataset contained several ordinal attributes,
such as family educational background, which required conversion to a format
suitable for our models. To accomplish this, we employed a Label Encoder. This
technique assigns a unique numeric value to each category within an attribute. It
effectively translates ordinal data into a format that our models can interpret while
preserving the ordered nature of the categories. This step was crucial, as many
machine learning algorithms require numeric input.
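As a minimal sketch (the column name and category labels below are illustrative, not the paper's actual schema), encoding an ordinal attribute with scikit-learn's LabelEncoder might look like this. Note that LabelEncoder assigns integer codes in alphabetical order of the labels, so the codes match the intended ordering only if the label names happen to sort that way:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical ordinal attribute; names are illustrative only
df = pd.DataFrame({"parent_edu": ["primary", "secondary", "higher", "primary"]})

le = LabelEncoder()
df["parent_edu_enc"] = le.fit_transform(df["parent_edu"])

# LabelEncoder sorts the labels alphabetically before assigning codes
mapping = {c: int(v) for c, v in zip(le.classes_, le.transform(le.classes_))}
print(mapping)  # {'higher': 0, 'primary': 1, 'secondary': 2}
```

For ordinal labels that do not sort alphabetically into the desired order, an explicit mapping (e.g., `OrdinalEncoder` with a `categories` list) preserves the ordering more reliably.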
Standardizing Numeric Attributes The dataset also included numeric attributes,
such as recent academic grades. Given that these attributes can vary in range and
scale, we opted to standardize them using the Standard Scaler technique. Standard
Scaler adjusts the values of each numeric attribute to have a mean of 0 and a standard
deviation of 1. This process is crucial as it brings all numeric attributes onto the same
scale, preventing attributes with larger scales from dominating those with smaller
scales. It also helps algorithms converge faster during training.
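A small sketch of this step with scikit-learn's StandardScaler (the grade values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric grade column (values are illustrative)
grades = np.array([[10.0], [12.0], [14.0], [16.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(grades)

# After scaling, the column has mean 0 and (population) standard deviation 1
print(round(float(scaled.mean()), 6), round(float(scaled.std()), 6))  # 0.0 1.0
```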
Feature Selection Using Correlation With the objective of creating the most
effective predictive model, we sought to reduce the number of features from the
original 32. To accomplish this, we applied a correlation-based feature selection
method. Correlation measures the linear relationship between two variables. In this
case, we calculated the correlation between each feature and the target variable, i.e.,
the final student score. The result is depicted in Fig. 2.
From these calculations, we selected the top 15 attributes that had the highest
absolute correlation with the final student score. This step was critical as it enabled
us to focus on the most relevant predictors and exclude features that contributed
less to our target variable, thereby improving the efficiency and performance of our
machine learning models. Reducing the dimensionality of the dataset in this way
also helped to alleviate potential issues related to overfitting and multicollinearity.
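The selection step described above can be sketched as follows, with a synthetic stand-in for the paper's 32-feature dataset and the top 2 rather than the top 15 features retained:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the student dataset (illustrative, not the paper's data)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
y = 2.0 * X["f0"] - 3.0 * X["f3"] + rng.normal(scale=0.1, size=200)

# Rank features by absolute correlation with the target and keep the top k
abs_corr = X.corrwith(y).abs().sort_values(ascending=False)
top_features = abs_corr.head(2).index.tolist()
print(top_features)
```

Only `f0` and `f3` drive the synthetic target here, so the correlation ranking recovers exactly those two columns.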
Following these preprocessing steps, the dataset was appropriately formatted and
ready for model training. This preparation was critical to ensure the success of the
subsequent model selection and evaluation process.
In order to predict the final grade of students, a variety of regression models were
selected, each of which uses different strategies to learn from the data. The following
subsections provide a brief description of each model.
Linear Regression is one of the most fundamental methods of predictive analysis. The overarching goal of regression analysis is to answer two questions:
• Does a given collection of predictor variables effectively predict a dependent variable?
• Which predictor variables matter most, and how do they affect the outcome variable?
Ridge Regression: When the independent variables in a multiple regression model are highly correlated, ridge regression can be used to estimate the coefficients of the model. In cases where the ordinary least squares approach fails or becomes unstable, ridge regression still yields reliable results.
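A minimal illustration of this property on synthetic data with two nearly collinear predictors (not the paper's dataset): the L2 penalty keeps the coefficients bounded where ordinary least squares could produce huge offsetting values.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0).fit(X, y)
# The penalty splits the weight across the correlated pair instead of
# producing large coefficients of opposite sign
print(model.coef_)
```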
Lasso Regression applies L1 regularization, which imposes a penalty proportional to the absolute size of the coefficients. This form of regularization produces sparse models with few non-zero coefficients, allowing for efficient feature selection.
In Elastic Net Regression, characteristics of the Ridge and Lasso regression
models are combined. The model is penalized by making use of both the L2-norm
and the L1-norm in order to obtain the regularization properties that Ridge is known
for. Because of this, the model could end up having zeroes for its coefficients (just
like Lasso).
Decision Tree Regression: The decision tree regression algorithm partitions the data into groups using a sequence of if–then rules, built by an algorithm that determines the most effective ways to split the dataset based on the available attributes.
Random Forest Regression is an ensemble learning approach that builds many decision trees and averages their individual predictions. This averaging counteracts the overfitting to which single decision trees are prone.
AdaBoost Regression is based on the boosting meta-algorithm developed by Yoav Freund and Robert Schapire, originally proposed for classification. It is compatible with a broad variety of base learning algorithms and can be used to enhance their performance. Gradient Boosting Regression generates a prediction model in the form of an ensemble of weaker prediction models, typically decision trees.
Bagging Regression improves prediction accuracy and reduces variance by constructing many bootstrap samples of the same dataset through repeated sampling with replacement, training a model on each sample, and averaging their predictions.
XGBoost Regression is an optimized distributed gradient boosting framework designed for performance, adaptability, and portability.
KNN Regression estimates an outcome from the similarity between data points: it finds the training samples most similar to a new data point and bases the prediction on their target values.
Following the data preprocessing and feature selection steps, we proceeded to the
training phase, where each model was trained and evaluated. We utilized a 70–30
split for our dataset, where 70% of the data was used for training our models, and
the remaining 30% was used for testing their performance. This split helps ensure that the models generalize well to unseen data rather than merely fitting patterns specific to the training set.
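The training and evaluation loop described above might be sketched as follows, with `make_regression` standing in for the student dataset and the model list truncated for brevity:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed 15-feature student data
X, y = make_regression(n_samples=300, n_features=15, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # the paper's 70-30 split
)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
print(scores)
```

On this linearly generated data the linear models naturally dominate; on real student data the ranking would have to be established empirically, as the paper does.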
For the evaluation of our models, we used a combination of metrics, since no single measure can capture all aspects of a model's performance. The following metrics were utilized:
4 Experimental Results
The RMSE for each trained model is displayed as a bar chart in Fig. 6. The graph shows that the Ridge model generates the best results, with the lowest score of 1.4623, while the Elastic Net and Gradient Boosting regression models have the highest errors, 2.3081 and 2.4126, respectively.
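As a sketch of how such error figures are computed from predicted versus actual grades (the numbers here are toy values, not the paper's):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy actual vs. predicted final grades (illustrative values)
y_true = np.array([10.0, 12.0, 8.0, 15.0])
y_pred = np.array([11.0, 11.0, 9.0, 14.0])

mae = mean_absolute_error(y_true, y_pred)           # mean absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root mean squared error
r2 = r2_score(y_true, y_pred)                       # fraction of variance explained
print(mae, rmse, round(r2, 4))  # 1.0 1.0 0.8505
```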
It is evident from Figs. 3, 4, 5 and 6 that the performance of Ridge Regression
surpassed that of the other models over a wide range of statistical measures. Based
on the obtained results, it appears that Ridge Regression exhibits favorable qualities
as a potential contender for the prediction task.
5 Explainable AI
In this research, we chose the SHAP model as our XAI model so that we could
interpret the outcomes of our regression model. SHAP offers a solid framework for
figuring out how different model characteristics influence the output of the model.
By utilizing SHAP, we are able to gain a better understanding of how each feature
contributes to the predictions made by the model, which in turn increases the read-
ability and transparency of the model. The SHAP visualizations that were produced
as a result provide us with insight into the decision-making process of the model and
offer an intuitive picture of the relevance of features. By making use of SHAP, we
can be confident that our model is both reliable and explainable, which enables us to derive credible conclusions from our data.
Figures 7 and 8 illustrate the impact that individual features have on the model's prediction for a given instance x. Bars extending to the right of the expected value E[f(x)] indicate that the corresponding feature pushes the model's prediction upward, while negative values indicate features that pull the prediction downward. The features are listed in descending order of importance from the top of the plot to the bottom, so features closer to the top of the plot contribute more to the prediction for x than those near the bottom.
Figure 7 illustrates the contribution of individual attributes to the model prediction for a given instance x. E[f(x)] equals 10.95, the expected average model prediction over the dataset, while f(x) equals 12.242, the model's prediction for the instance x in question. The difference of 1.292 between f(x) and E[f(x)] indicates that the prediction for x is substantially higher than the dataset-wide expectation. Closer inspection shows that the attributes Dalc and higher contribute most to this difference, pushing the model's prediction above the expected output.
Figure 8 illustrates the contribution of individual attributes to the model prediction for another instance x. Here E[f(x)] equals 10.394 and f(x) equals 10.706, so the difference of 0.312 between f(x) and E[f(x)] indicates that the prediction for x is only slightly above the dataset-wide expectation. Closer inspection shows that the attributes Dalc and Fedu contribute most to this difference, pushing the model's prediction above the expected output.
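The decomposition behind these figures can be checked directly for a linear model such as Ridge: with (approximately) independent features, the SHAP value of feature j for instance x reduces to coef_j · (x_j − mean(X_j)), and the base value E[f(x)] is the mean prediction over the dataset. A sketch on synthetic data (not the paper's):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)
model = Ridge(alpha=1.0).fit(X, y)

# Base value E[f(x)]: the mean model prediction over the dataset
base_value = model.predict(X).mean()
# Per-feature SHAP values for a linear model with independent features
shap_values = model.coef_ * (X - X.mean(axis=0))

# Additivity: E[f(x)] plus the SHAP values reconstructs each prediction f(x)
reconstructed = base_value + shap_values.sum(axis=1)
print(np.allclose(reconstructed, model.predict(X)))  # True
```

The SHAP library's LinearExplainer computes the same attributions and renders them as the waterfall plots discussed above.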
6 Conclusion
This study set out to predict student grades using various personal, family, and
academic attributes. Through an extensive exploration of multiple regression models,
we concluded that Ridge Regression provided the most accurate and robust predic-
tions. Notably, this study underscored the influence of specific features in predicting
academic performance, and it demonstrates the effectiveness of using machine
learning techniques in educational research.
The results of this study have practical implications for educators and policy-
makers, who may use such predictive models to identify students at risk of poor
academic performance early on, thus allowing timely intervention. Furthermore, the
use of Explainable AI, particularly SHAP, provided us with a deep and intuitive
understanding of the predictions made by our model, which will be invaluable in
translating these findings into actionable strategies.
Future research could expand on this study by exploring more complex models
or by integrating time-series data to study how students’ performance evolves over
time. Other directions for future research could include a detailed analysis of the
most influential features in predicting student performance, as understanding these
factors can guide interventions aimed at improving academic outcomes.
References
1. Yağcı M (2022) Educational data mining: prediction of students’ academic performance using
machine learning algorithms. Smart Learn Environ 9(11). https://doi.org/10.1186/s40561-022-
00192-z
2. Rastrollo-Guerrero JL, Gómez-Pulido JA, Durán-Domínguez A (2020) Analyzing and
predicting students’ performance by means of machine learning: a review. Appl Sci 10(3).
https://doi.org/10.3390/app10031042
3. Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, Guidotti
R, Del Ser J, Díaz-Rodríguez N, Herrera F (2023) Explainable artificial intelligence (XAI):
what we know and what is left to attain trustworthy artificial intelligence. Inform Fusion 99.
https://doi.org/10.1016/j.inffus.2023.101805
4. Cambria E, Malandri L, Mercorio F, Mezzanzanica M, Nobani N (2023) A survey on XAI
and natural language explanations. Inform Proces Manage 60(1). https://doi.org/10.1016/j.
ipm.2022.103111
5. Li Z (2022) Extracting spatial effects from machine learning model using local interpretation
method: an example of SHAP and XGBoost. Comput Environ Urban Syst (96). https://doi.org/
10.1016/j.compenvurbsys.2022.101845
6. Alabdullah AA, Iqbal M, Zahid M, Khan K, Amin MN, Jalal FE (2022) Prediction of rapid
chloride penetration resistance of metakaolin based high strength concrete using light GBM
and XGBoost models by incorporating SHAP analysis. Constr Build Mater 345. https://doi.
org/10.1016/j.conbuildmat.2022.128296
7. Jovanović J, Saqr M, Joksimović S, Gašević D (2021) Students matter the most in learning
analytics: the effects of internal and instructional conditions in predicting academic success.
Comput Educ 172. https://doi.org/10.1016/j.compedu.2021.104251
8. Namoun A, Alshanqiti A (2021) Predicting student performance using data mining and learning
analytics techniques: a systematic literature review. Appl Sci 11. https://doi.org/10.3390/app
11010237
9. Kotsiantis S, Pierrakeas C, Pintelas P (2004) Predicting students’ performance in distance
learning using machine learning techniques. Appl Artif Intell 18(5):411–426. https://doi.org/
10.1080/08839510490442058
10. Márquez-Vera C, Cano A, Romero C, Noaman AYM, Mousa Fardoun H, Ventura S (2016)
Early dropout prediction using data mining: a case study with high school students. Expert
Syst 33(1):107–124. https://doi.org/10.1111/exsy.12135
11. Cortez P, Silva AMG (2008) Using data mining to predict secondary school student perfor-
mance. In: Brito A, Teixeira J (eds) Proceedings of 5th future business technology conference,
Porto, Portugal, pp 5–12. hdl.handle.net/1822/8024
12. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp
785–794. https://doi.org/10.1145/2939672.2939785
13. Allgaier J, Mulansky L, Draelos RL, Pryss R (2023) How does the model make predictions?
A systematic literature review on the explainability power of machine learning in healthcare.
Artif Intell Med 143. https://doi.org/10.1016/j.artmed.2023.102616
14. Cortez P, Silva A (2008) Using data mining to predict secondary school student performance.
In: Brito A, Teixeira J (eds) Proceedings of 5th FUture BUsiness TEChnology conference
(FUBUTEC 2008), Porto, Portugal, EUROSIS, pp 5–12. ISBN 978-9077381-39-7
Applications
Convolutional Neural-Network-based
Gesture Recognition System for Air
Writing for Disabled Person
Abstract Air writing is a unique form of natural user interface that involves the
recognition of characters and words that are written in the air using the movement of
one’s hands. This technology has become increasingly prominent and has received
considerable attention due to its potential to facilitate more natural and intuitive
forms of communication, as well as its applicability to a wide range of fields such
as virtual reality, augmented reality, and wearable computing. However, air-writing
recognition remains a challenging task due to the complexity and variability of the
gestures involved. This research paper proposes an air-writing recognition model
that leverages machine learning algorithms to recognize handwritten characters and
words in real time. The model is designed to be flexible and adaptable to different
types of air-writing gestures and is evaluated using a dataset of air-writing gestures
collected from multiple users. The proposed model consists of two main components:
a gesture recognition module that pre-processes the input data and extracts relevant
features, and a machine learning model that classifies the input gestures based on these
features. Experimental results show that the proposed model achieves high levels of
accuracy in recognizing air-writing gestures, outperforming existing state-of-the-art methods. The results demonstrate the potential of
the proposed model to be used in a variety of real-world applications, such as text
input, and controlling virtual objects in augmented reality.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_29
S. K. Modi et al.
1 Introduction
1.1 Motivation
The motivation behind our research is twofold. First, air writing recognition can
enable new modes of interaction with digital content, such as writing in the air to
enter text or draw shapes in 3D space [1, 4]. This can enhance the user experience in
virtual and augmented reality environments, as well as in remote collaboration and
teleoperation scenarios. Second, air writing recognition can provide a natural and
intuitive input method for people with disabilities or injuries that limit their ability to
use traditional input devices. To achieve accurate and robust air writing recognition,
a system is proposed that combines hand tracking, gesture segmentation, feature
extraction, and classification using convolutional neural networks. A large dataset of air-writing gestures from diverse users and environments was collected, and the system was evaluated on several metrics, including recognition accuracy, latency, and user satisfaction. The results demonstrate the feasibility and effectiveness of this approach,
and suggest directions for future research, such as improving real-time performance,
adapting to individual users, and integrating with other input modalities. Overall,
this research contributes to the field of human–computer interaction by introducing
a new input method that can expand the range of applications and improve accessi-
bility for diverse users. The proposed approach can also inspire further innovations
in machine learning, computer vision, and sensor technology for gesture recognition
and natural user interfaces.
With the growth in deep learning techniques, researchers are exploring the use of
deep neural networks for recognizing air writing [6, 7]. Current research focuses
on developing more accurate and robust algorithms that can recognize air writing
in real time. As air writing recognition technology becomes more sophisticated,
concerns are being raised about the privacy and security implications of this tech-
nology. Researchers are studying the ethical and legal implications of air writing
recognition and developing frameworks to address these concerns. Researchers are
exploring how air writing recognition technology can be made more usable and
accessible for people with disabilities, such as those with motor impairments or visual
impairments. They are developing interfaces and applications that are designed to
be more accessible and user-friendly.
A key problem is the lack of accurate and reliable recognition algorithms for real-time air-writing recognition. Although there have been significant developments in the domain, the accuracy and speed of recognition remain a challenge. Current algorithms cannot accurately recognize the subtle movements involved in air writing, and this limits the potential of the technology for applications in fields such as healthcare, education, and human–computer interaction. Additionally, there are concerns about the privacy and security implications of air-writing recognition, which must be tackled before the technology can be widely adopted. Hence, additional investigation is necessary to enhance the precision and dependability of recognition algorithms, and the ethical and legal considerations associated with air-writing recognition need to be examined. A further task is to evaluate the current state of the art in air-writing recognition technology. This involves reviewing and analyzing the existing literature and research papers on the topic, as well as exploring the different algorithms and techniques used for air-writing recognition [1–4, 6–16]. The paper also identifies the limitations and challenges of current air-writing recognition technology and proposes potential solutions for improving the accuracy and speed of recognition. Finally, it considers the ethical and legal implications of air-writing recognition, including issues related to privacy and security, and proposes frameworks for addressing these concerns. Overall, the aim of this paper is to provide a comprehensive and critical analysis of the current state of air-writing recognition technology, as well as its potential for future development and application.
In the field of air writing recognition, the industry has focused on developing and
evaluating various algorithms and techniques for recognizing hand gestures in real
time. One popular approach for recognizing hand gestures is to use Convolutional Neural Networks (CNNs). CNNs have demonstrated success in numerous computer vision tasks, such as image classification and object detection, and have shown promise for recognizing air-writing gestures as well.
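The core operation a CNN layer applies to a gesture image can be sketched in a few lines of NumPy (a toy example, not the architecture used in this paper):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation: the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# Toy 'air-written' vertical stroke on a 5x5 canvas
img = np.zeros((5, 5))
img[:, 2] = 1.0
# A vertical-edge kernel responds strongly on either side of the stroke
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)
print(conv2d(img, edge_kernel))
```

A real air-writing classifier stacks many such learned kernels with non-linearities and pooling, then maps the resulting feature maps to character classes.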
Several notable research papers have contributed to advancing the understanding
and techniques in this domain. Kumar et al. (2017) presented a paper titled “3D Text Segmentation and Recognition Using Leap Motion,” [1] where they focused
on segmenting and recognizing 3D text using the Leap Motion device. Their work
emphasized the use of depth data for accurate text recognition. Fu et al. (2019)
presented a paper titled “Writing in the Air with Wi-Fi Signals for Virtual Reality
Devices,” [2] where they explored the utilization of Wi-Fi signals for air writing
in virtual reality environments. Their approach leveraged Wi-Fi signals to capture
hand motions and recognize air-drawn characters. Chen et al. (2016) contributed
to the field through their paper titled “Air-Writing Recognition—Part II: Detection
and Recognition of Writing Activity in Continuous Stream of Motion Data,” [17]
which focused on detecting and recognizing writing activities from a continuous
stream of motion data. Their work introduced techniques for accurately detecting
and recognizing air writing gestures. Collectively, these papers have made significant
contributions to air writing recognition by exploring various aspects such as 3D text
segmentation, Wi-Fi signal utilization, and continuous motion data analysis.
In the paper titled “Air-Writing with Sparse Network of Radars using Spatio-
Temporal Learning,” Arsalan et al. (2020) present a novel approach to air-writing
recognition that utilizes a sparse network of radars and spatiotemporal learning tech-
niques [18]. The study addresses the challenges of traditional radar-based air-writing
systems by proposing a solution that overcomes the limitations of trilateration algo-
rithms and the physical constraints of placing multiple radars. The authors employ
spatiotemporal learning to capture the temporal dynamics of air-written gestures,
improving the accuracy of recognition. Experimental results presented at the 25th
International Conference on Pattern Recognition (ICPR) demonstrate the effective-
ness of their approach, showcasing promising results in terms of recognition perfor-
mance. The work by Arsalan et al. contributes to the advancement of air-writing
recognition systems by introducing a novel technique that combines sparse radar
networks and spatiotemporal learning, providing insights for more accurate and
robust recognition of air-written gestures [18].
In the field of air-writing recognition, Escopete et al. (2021) presented a paper titled “Recognition of English Capital Alphabet in Air Writing Using Convolutional Neural Network and Intel RealSense D435 Depth Camera” [19]. Their study
focused on leveraging the capabilities of the Intel RealSense D435 depth camera and
Convolutional Neural Networks (CNNs) to accurately recognize air-written English
capital alphabet characters. The authors collected a dataset of air-written characters
using the depth camera and utilized CNN models for feature extraction and classifica-
tion. The results of their experiments demonstrated the effectiveness of their approach
in achieving high recognition accuracy. The paper by Escopete et al. contributes to
the advancement of air-writing recognition systems by highlighting the potential of
depth camera technology and CNNs for accurate recognition of air-written gestures
[19].
In the paper titled “Wearable Air-Writing Recognition System Employing
Dynamic Time Warping,” Luo et al. (2021) [20] propose a novel approach for air-
writing recognition using wearable devices and Dynamic Time Warping (DTW) algo-
rithms. The study focuses on addressing the challenges of recognizing air-written
gestures in a wearable context, where limited sensor data and diverse writing styles
can affect the accuracy of recognition. The authors introduce a system that leverages
DTW, a technique capable of capturing the temporal dynamics and variabilities in
air-writing gestures. The proposed system utilizes wearable devices to capture hand
movements and employs DTW algorithms for recognizing the intended characters.
Experimental results presented at the IEEE 18th Annual Consumer Communica-
tions and Networking Conference (CCNC) demonstrate the effectiveness of their
362 S. K. Modi et al.
approach. Their paper discusses the methodology, techniques, and results of the system, providing insights into the advancements in air-writing recognition technology.
Hayakawa et al. (2022) presented a paper titled “Air Writing in Japanese: A
CNN-based Character Recognition System Using Hand Tracking” at the 2022 IEEE
4th Global Conference on Life Sciences and Technologies (LifeTech) [22]. Their
research focused on developing a character recognition system specifically for air
writing in the Japanese language. The proposed system utilized hand tracking tech-
niques combined with a Convolutional Neural Network (CNN) for accurate recog-
nition of air-drawn Japanese characters. By leveraging CNN’s ability to learn and
extract features from the captured hand movements, the system aimed to provide a
robust and efficient solution for recognizing handwritten characters in the air. This
work contributes to the field by addressing the unique challenges of air writing in
Japanese [22] and exploring the application of CNN-based approaches for character
recognition in this context.
Ahmed et al. (2022) published a paper titled “Radar-Based Air-Writing Gesture
Recognition Using a Novel Multistream CNN Approach” [23] in the IEEE Internet of
Things Journal. Their research focused on radar-based air-writing gesture recognition
and proposed a novel multistream Convolutional Neural Network (CNN) approach.
The system utilized radar sensors to capture hand movements in the air and employed
a multistream CNN architecture to effectively process and analyze the captured
data. By leveraging multiple streams of information, such as range and Doppler
data, the proposed approach aimed to improve the accuracy and robustness of air-
writing gesture recognition. This work contributes to the field by addressing the
challenges associated with radar-based air-writing recognition and introducing a
novel CNN-based approach to enhance the performance of such systems.
Overall, the related work in the field of air writing recognition has demonstrated
the potential of using CNNs and image processing techniques for accurately and
efficiently recognizing air writing gestures in real time. Despite the progress made,
additional research is required to enhance the precision and pace of recognition while
also addressing the ethical and legal considerations associated with this technology.
2 Methodology
The objective of this research paper is to explore the use of Convolutional Neural
Networks (CNNs) and image processing techniques for air writing recognition.
Specifically, the paper aims to investigate the effectiveness of CNNs in detecting
and recognizing characters written by hand in the air using a camera, without the
assistance of external devices. The study will also compare various image processing
techniques to enhance the visibility and accuracy of the captured images. The ulti-
mate objective of this research is to facilitate the development of an efficient and
precise air writing recognition system that can be utilized in diverse applications,
such as virtual reality interfaces, gesture-based control, and medical rehabilitation.
Image analysis of each video frame can be conducted to extract relevant features
and classify the air writing gestures. The first step is to pre-process the video frames
by removing noise and enhancing the contrast. This can be done using techniques
such as image thresholding, adaptive histogram equalization, and Gaussian blur.
Next, the air writing gestures can be extracted from the pre-processed frames using
techniques such as edge detection, contour detection, and optical flow. The edges
can be detected using the Canny edge detector, and the contours can be extracted
using the findContours function in OpenCV. Optical flow techniques such as Lucas-
Kanade or Farneback can be used to track the movement of the air writing gestures
over time (Fig. 1).
After the gestures have been extracted, relevant features can be extracted from
them. These features can include stroke direction, stroke length, curvature, and angle
of the strokes [3, 16]. These features can be extracted using techniques such as Hough
transforms, corner detection, and image moments. The extracted features can be
used to classify the air writing gestures using a machine learning algorithm such as
a convolutional neural network. Overall, image analysis of each video frame using
OpenCV can be a powerful technique for air writing recognition, as it allows for the
extraction of relevant features from the video frames and the classification of the air
writing gestures using machine learning algorithms [11].
Feature Extraction plays a crucial role in image processing, as it involves the trans-
formation of pre-processed images through convolutional neural networks (CNNs)
and other layers to extract relevant features and patterns from the input. This process
involves a series of operations, including max pooling, dropout, flatten, hidden layers,
and SoftMax layers. The goal is to capture the essential characteristics of the images,
such as edges, textures, shapes, and colors, which are then used for various tasks,
including object recognition, image classification, and image retrieval. Training and
classification are done once the relevant features have been extracted from the pre-
processed images to train the model. This training phase is crucial as it involves the
optimization of the weights and biases associated with the different layers of the
network. Backpropagation, a technique that calculates the gradients of the model’s
parameters, is employed to propagate the error through the network. Stochastic
gradient descent is then utilized to update the weights and biases based on these
gradients, gradually minimizing the loss function. Once the model has undergone
the training process, it becomes capable of real-time classification of handwritten
characters. The input images are fed into the trained model, which then applies the
learned features and patterns to make predictions. The model assigns a specific class
label to each input character based on its understanding of the extracted features and
the patterns associated with different characters. This classification process enables
the system to decipher and recognize handwritten characters with a certain level of
accuracy.
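The training procedure described above (forward pass, cross-entropy loss, backpropagated gradients, gradient-descent updates) can be illustrated on a toy softmax classifier. This is a minimal stand-in for the full CNN, with random data and an assumed learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy batch of flattened 28x28 "images" with 26 classes (stand-in data)
X = rng.standard_normal((32, 784))
y = rng.integers(0, 26, size=32)
Y = np.eye(26)[y]                      # one-hot labels

W = np.zeros((784, 26))                # single weight matrix (not the full CNN)
lr = 0.01                              # learning rate (assumed)

for _ in range(50):                    # a few gradient-descent steps
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)     # softmax probabilities
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))    # cross-entropy loss
    grad = X.T @ (P - Y) / len(X)      # gradient via backpropagation
    W -= lr * grad                     # gradient-descent weight update
```

The loss starts at ln(26) (uniform predictions) and decreases monotonically as the weights are updated.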
Here, the model was evaluated based on two different architectural layers in
the CNN model. The detailed analysis is as follows: Fig. 2 depicts the 2-layered
architectural model representation based on CNN where multiple layers are present
[13]. The first two layers are convolutional layers A and B, with 32 and 64 filters, respectively. The layers use the Rectified Linear Unit (ReLU), a frequently utilized activation function in
neural networks. Then the image is passed through the max pooling layer for feature
extraction. Next in the sequence is the dropout layer, a widely adopted regularization
technique in neural networks. Its function is to prevent overfitting by randomly setting
a specified fraction of input units to zero during each training iteration. Then the
image is processed through the flatten layer and the main function of the flatten layer
is to convert the multidimensional output of the previous convolutional layers into
a one-dimensional vector, which can be passed as input to a fully connected layer.
For the final processing, the image is passed through the SoftMax layer. The main benefit of this layer is that it allows the model to produce a probability distribution over the predicted classes, which is useful for interpreting the model's output and making decisions based on the probabilities; it also provides a way to train the model using a loss function that accounts for the predicted probabilities.
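The flow of a single image through this 2-layered architecture can be traced shape-by-shape in plain NumPy. This sketch uses random weights and is an illustrative stand-in for the authors' trained model; dropout, which is active only during training, is skipped:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, n_filters, k=3):
    """Valid convolution with random filters plus ReLU (illustrative only)."""
    h, w, c = x.shape
    w_k = rng.standard_normal((k, k, c, n_filters)) * 0.1
    out = np.zeros((h - k + 1, w - k + 1, n_filters))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.tensordot(x[i:i + k, j:j + k], w_k, axes=3)
    return np.maximum(out, 0)          # ReLU activation

def max_pool(x, s=2):
    h, w, c = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s, c).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.standard_normal((28, 28, 1))   # one EMNIST-sized input
x = conv2d(x, 32)                      # convolutional layer A: 32 filters + ReLU
x = conv2d(x, 64)                      # convolutional layer B: 64 filters + ReLU
x = max_pool(x)                        # max pooling layer
x = x.flatten()                        # flatten layer: multidimensional -> 1-D vector
w_out = rng.standard_normal((x.size, 26)) * 0.01
probs = softmax(x @ w_out)             # SoftMax layer over 26 classes
```

The shapes trace 28×28×1 → 26×26×32 → 24×24×64 → 12×12×64 → 9216-vector → 26 class probabilities summing to one.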
Fig. 2 Representation of 2-layered architectural model
In the context of generating input data for air writing recognition, the EMNIST [11] dataset was used, and the following preprocessing steps needed to be performed. Data normalization: the EMNIST [11] dataset contains images of handwritten digits and characters that vary in size and orientation [12], so the first step is to normalize the data by resizing all the images to a fixed size (e.g., 28 × 28 pixels) and aligning them in a consistent orientation. Data augmentation: to increase the size of the dataset and reduce overfitting, augmentation techniques such as rotation, scaling, and horizontal flipping can be applied to the images. Data splitting: the dataset is divided into three sets: training, validation, and test. The training set is employed to train the model, the validation set is utilized to fine-tune hyperparameters and avoid overfitting, and the test set is utilized to evaluate the final performance of the model. Data preprocessing: the pixel values of the images are scaled to a range of 0–1, and the labels are converted to one-hot encoding to represent the different classes. Data shuffling: to prevent the model from learning the order of the data, the training and validation sets can be shuffled before each epoch. By performing these preprocessing steps on the EMNIST [11] dataset, it is possible to generate high-quality input data for training and testing air writing recognition models.
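These preprocessing steps can be sketched in NumPy. Random arrays stand in for the actual EMNIST images, and the 80/10/10 split ratio is an assumption (the paper does not state one); augmentation would typically be handled by the training framework and is omitted:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for EMNIST images/labels (real data would be loaded separately)
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 26, size=1000)

x = images.astype("float32") / 255.0        # scale pixel values to [0, 1]
y = np.eye(26, dtype="float32")[labels]     # one-hot encode the class labels

# Shuffle, then split into train / validation / test (assumed 80/10/10)
order = rng.permutation(len(x))
x, y = x[order], y[order]
n_train, n_val = int(0.8 * len(x)), int(0.1 * len(x))
x_train, y_train = x[:n_train], y[:n_train]
x_val, y_val = x[n_train:n_train + n_val], y[n_train:n_train + n_val]
x_test, y_test = x[n_train + n_val:], y[n_train + n_val:]
```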
The EMNIST [11] dataset was reshaped and its Boolean values were cast to float32. The hyperparameters were defined using 3 convolutional layers with 128 filters and 256 nodes. In the HSV color space, a filter for the color green was defined with a lower boundary of (29, 86, 6) and an upper boundary of (64, 255, 255). The following figures illustrate input generation for the model: Fig. 4 shows the valid boundary space for giving inputs, Fig. 5 shows the successful processing of the input on the blackboard screen, and Fig. 6 shows the processing of the output and the output generated after processing of the character on the blackboard screen.
Fig. 3 Representation of 3-layered architectural model
The proposed approach in this research paper introduces several novel aspects
that contribute to the advancement of air writing recognition systems. These novel
elements validate the uniqueness and effectiveness of our approach. Firstly, this study
focuses on recognizing handwritten characters and digits in the air without the need
for external hardware. This distinguishes the proposed approach from traditional
handwriting recognition systems that rely on physical input devices such as styluses
or touchscreens. By leveraging the power of CNNs and image analysis techniques,
it enables users to write in the air, providing a more intuitive and natural interaction
method. Secondly, the approach utilizes the EMNIST [11] dataset for training and testing the models. While this dataset has been used in previous research, its application specifically to air writing recognition is novel. The EMNIST [11] dataset offers a
diverse range of handwritten characters and digits, allowing the model to train on a
comprehensive set of examples. This ensures that the system is robust and capable of
recognizing a wide variety of air-written inputs. Thirdly, the exploration of different
CNN architectures, including 2-layered and 3-layered configurations, provides valu-
able insights into the optimal design for air writing recognition. The comparison of
these architectures reveals that the 3-layered CNN outperforms the 2-layered counter-
part in terms of accuracy and loss. This finding contributes to the body of knowledge
regarding the architecture selection for air writing recognition systems. Additionally,
the integration of OpenCV for image analysis of each video frame is a novel aspect of
our approach. This step allows for precise pre-processing and enhances the quality of
input data. By leveraging OpenCV’s capabilities, the overall performance of the proposed system can be improved and more accurate recognition results obtained. While the proposed approach showcases several novel elements, it is
important to acknowledge its limitations. The current model may face challenges
in recognizing complex or ambiguous gestures, which could be addressed in future
research. Furthermore, the proposed approach relies on the availability of a suitable
dataset, and the expansion of the dataset to include more diverse handwriting styles
and variations could further enhance the system’s performance.
In conclusion, the discussion validates the novelty of the proposed approach in air
writing recognition. The combination of recognizing air-written characters without
external hardware, utilizing the EMNIST dataset, exploring different CNN archi-
tectures, and integrating OpenCV for image analysis collectively contribute to the
uniqueness and effectiveness of this approach. By addressing the identified limitations and building upon the proposed model, the field of air writing recognition can continue to advance and unlock its full potential in various domains.
In order to evaluate the performance of the proposed air writing recognition system,
certain simulations were conducted comparing a 2-layered CNN and a 3-layered
CNN in terms of their accuracy and loss values. The results of the simulations revealed
that the 2-layered CNN achieved an accuracy of approximately 64% with a corre-
sponding loss value of approximately 15%. On the other hand, the 3-layered CNN
exhibited a higher accuracy of approximately 88% but had a relatively higher loss
value of approximately 37%. These simulation results clearly demonstrate that the
3-layered CNN outperformed the 2-layered CNN in terms of accuracy, showcasing
its superior capability in correctly recognizing air-written characters and digits. The
significantly higher accuracy of the 3-layered CNN suggests that it is more adept at
capturing the intricate patterns and variations present in air-written gestures, leading
to more precise recognition outcomes. However, it is important to note that the 3-
layered CNN also exhibited a higher loss value compared to the 2-layered CNN. A
lower loss value typically indicates better model performance as it reflects the degree
of error in the predictions. The higher loss of the 3-layered CNN suggests that it may
Fig. 7 Representation of accuracy data on a 2-layered model
Fig. 8 Representation of loss data on a 2-layered model
Fig. 9 Representation of accuracy data on 3-layered model
Fig. 10 Representation of loss data on 3-layered model
possibilities for intuitive and natural human–computer interaction, where users can
write in the air without the need for physical input devices. It has the potential to
revolutionize fields such as virtual reality [2], augmented reality, and accessibility,
enabling more immersive experiences and improved communication channels. To
fully realize the potential of air writing recognition, future research can explore addi-
tional aspects such as real-time implementation, optimization for different devices
and platforms, and integration with complementary technologies like depth sensing or
motion capture. These advancements will contribute to the practical deployment and
widespread adoption of air writing recognition systems. In summary, the simulation
results highlight the superior accuracy of the 3-layered CNN in air writing recog-
nition compared to the 2-layered CNN. The findings underscore the technological
potential of air writing recognition for transforming user interfaces and improving
interaction in various domains. Continued research and development are necessary
to address the model’s limitations and enhance its performance, ultimately bringing
air writing recognition closer to real-world applications.
4 Mathematical Analysis
In the context of air writing recognition using a convolutional neural network (CNN),
a mathematical analysis can be conducted to evaluate the performance of the model
with the use of the Rectified Linear Unit (ReLU) activation function and feature
extraction techniques, based on their accuracy. Let the input data be denoted as X,
and the ground truth labels be denoted as Y. Let f 1, f 2, and f 3 be the feature extraction
functions of the CNN, and w1, w2, and w3 be the corresponding weights and biases
of the convolutional and fully connected layers.
The forward pass of the CNN can be represented as

Z1 = f1(X; w1) (1)
A1 = ReLU(Z1) (2)
Z2 = f2(A1; w2) (3)
A2 = ReLU(Z2) (4)
Z3 = f3(A2; w3) (5)
A3 = SoftMax(Z3) (6)
where the ReLU activation function is used to introduce non-linearity in the model,
and the SoftMax function is used to obtain the predicted class probabilities. The aim
is to reduce the cross-entropy loss between the predicted output A and the actual
ground truth labels Y:
J(A, Y) = −∑ Y log(A) (7)
To optimize the model, the gradient descent algorithm can be used. This involves
using backpropagation to calculate the gradients of the loss function with respect to
the weights and biases. The accuracy of the model can be evaluated using a test set,
where the predicted output A can be compared with the ground truth labels Y. The
analysis outcomes suggest that utilizing the ReLU activation function and imple-
menting feature extraction techniques have the potential to enhance the accuracy of
the CNN for air writing recognition. By utilizing the ReLU activation function, non-
linearity can be introduced in the model, thereby enabling it to better capture intricate
relationships within the input data. Furthermore, feature extraction techniques can
assist in identifying distinctive features from the input data, thereby improving clas-
sification performance. Fine-tuning of hyperparameters, such as learning rate, batch
size, and number of layers, can further improve the accuracy of the model.
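The forward pass and cross-entropy loss of Eqs. (6) and (7) can be checked numerically. Here simple matrix multiplications stand in for the feature extraction functions f1, f2, and f3, and all shapes and weight scales are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal(784)            # flattened input image

# Toy weights standing in for w1, w2, w3 (shapes chosen for illustration)
w1 = rng.standard_normal((128, 784)) * 0.05
w2 = rng.standard_normal((64, 128)) * 0.05
w3 = rng.standard_normal((26, 64)) * 0.05

A1 = relu(w1 @ X)                       # first feature extraction stage + ReLU
A2 = relu(w2 @ A1)                      # second stage + ReLU
A3 = softmax(w3 @ A2)                   # predicted class probabilities, Eq. (6)

Y = np.eye(26)[7]                       # one-hot ground-truth label
J = -np.sum(Y * np.log(A3 + 1e-12))     # cross-entropy loss, Eq. (7)
```

Because A3 is a proper probability distribution, J is strictly positive unless the model predicts the true class with probability one.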
5 Conclusion
Looking ahead, there are several avenues for future research. Alternative neural
network architectures could be explored to further enhance the performance of
air writing recognition systems. Additionally, additional pre-processing techniques,
such as data augmentation or advanced noise reduction methods, may be investigated
to improve the robustness of the system. The application potential of air writing recog-
nition systems is vast, ranging from human–computer interaction to virtual reality
and augmented reality applications. The ability to input text and commands through
air writing can revolutionize user interfaces and enable new modes of communi-
cation. One area for future research is the exploration of different neural network
architectures, such as Recurrent Neural Networks (RNNs), to determine if they can
improve the accuracy and speed of air writing recognition. Additionally, incorpo-
rating advanced techniques such as transfer learning and reinforcement learning may
also be beneficial. Further research could focus on creating air writing recognition
systems that can function in real time and adapt to dynamic environments. This
could involve the use of additional sensors such as accelerometers and gyroscopes to
provide additional data for the recognition system. In summary, this research paves
the way for utilizing CNNs in air writing recognition systems and provides insights
into their strengths, limitations, and future possibilities. By addressing the identified
challenges and expanding upon the proposed model, advancements in the field of air
writing recognition can be achieved and contribute to its practical implementation in
various domains.
References
1. Kumar P, Saini R, Roy PP, Dogra DP (2017) 3D text segmentation and recognition using
leap motion. Multimed Tools Appl 76(15):16491–16510. https://doi.org/10.1007/s11042-016-
3923-z
2. Fu Z, Xu J, Zhu Z, Liu AX, Sun X (2019) Writing in the air with WiFi signals for virtual
reality devices. IEEE Trans Mob Comput 18(2):473–484. https://doi.org/10.1109/TMC.2018.
2831709
3. Kumar P, Saini R, Roy PP, Dogra DP (2017) Study of text segmentation and recognition using
leap motion sensor. IEEE Sens J 17(5):1293–1301. https://doi.org/10.1109/JSEN.2016.264
3165
4. Itaguchi Y, Yamada C, Fukuzawa K (2015) Writing in the air: contributions of finger movement
to cognitive processing. PLoS One 10(6). https://doi.org/10.1371/journal.pone.0128419
5. Ramya ST, Sakthi R, Rohitha B, Praveena D (2022) Air-writing recognition system. In: 2022
international interdisciplinary humanitarian conference for sustainability (IIHC), Bengaluru,
India, pp 910–913. https://doi.org/10.1109/IIHC55949.2022.10059943
6. Chollet F. Xception: deep learning with depthwise separable convolutions.
7. Fang Y, Xu Y, Li H, He X, Kang L (2020) Writing in the air: recognize letters using deep
learning through WiFi signals. In: Proceedings—2020 6th international conference on big
data computing and communications, BigCom 2020. Institute of Electrical and Electronics
Engineers Inc., pp 8–14. https://doi.org/10.1109/BigCom51056.2020.00008
8. Chen H, Ballal T, Muqaibel AH, Zhang X, Al-Naffouri TY (2020) Air writing via receiver array-
based ultrasonic source localization. IEEE Trans Instrum Meas 69(10):8088–8101. https://doi.
org/10.1109/TIM.2020.2991573
9. Choudhury A, Sarma KK (2021) A CNN-LSTM based ensemble framework for in-air
handwritten Assamese character recognition. Multimed Tools Appl 80(28–29):35649–35684.
https://doi.org/10.1007/s11042-020-10470-y
10. Mukherjee S, Ahmed SA, Dogra DP, Kar S, Roy PP (2019) Fingertip detection and tracking for
recognition of air-writing in videos. Expert Syst Appl 136:217–229. https://doi.org/10.1016/j.
eswa.2019.06.034
11. Cohen G, Afshar S, Tapson J, van Schaik A (2017) EMNIST: an extension of MNIST to
handwritten letters [Online]. http://arxiv.org/abs/1702.05373
12. Abadi M, et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed
systems [Online]. http://arxiv.org/abs/1603.04467
13. Chen M, AlRegib G, Juang BH (2016) Air-writing recognition—Part I: Modeling and recogni-
tion of characters, words, and connecting motions. IEEE Trans Hum Mach Syst 46(3):403–413.
https://doi.org/10.1109/THMS.2015.2492598
14. Kane L, Khanna P (2017) Vision-based mid-air unistroke character input using polar signatures.
IEEE Trans Hum Mach Syst 47(6):1077–1088. https://doi.org/10.1109/THMS.2017.2706695
15. Pedregosa F, Varoquaux G, Thirion B, Michel V, Dubourg V, Passos A, Perrot M, et al (2011)
Scikit-learn: machine learning in Python [Online]. http://scikit-learn.sourceforge.net
16. Roy P, Ghosh S, Pal U (2018) A CNN based framework for unistroke numeral recognition in
airwriting. In: Proceedings of international conference on frontiers in handwriting recognition,
ICFHR, Institute of Electrical and Electronics Engineers Inc., pp 404–409. https://doi.org/10.
1109/ICFHR-2018.2018.00077
17. Chen M, AlRegib G, Juang BH (2016) Air-writing recognition—Part II: Detection and recog-
nition of writing activity in continuous stream of motion data. IEEE Trans Hum Mach Syst
46(3):436–444. https://doi.org/10.1109/THMS.2015.2492599
18. Arsalan M, Santra A, Bierzynski K, Issakov V (2021) Air-writing with sparse network of radars
using spatio-temporal learning. In: 2020 25th international conference on pattern recognition
(ICPR), Milan, Italy, pp 8877–8884. https://doi.org/10.1109/ICPR48806.2021.9413332
19. Escopete M, Laluon C, Llarenas E, Reyes P, Tolentino R (2021) Recognition of English capital
alphabet in air writing using convolutional neural network and intel RealSense D435 depth
camera, pp 1–8. https://doi.org/10.1109/GCAT52182.2021.9587515
20. Luo Y, Liu J, Shimamoto S (2021) Wearable air-writing recognition system employing dynamic
time warping. In: 2021 IEEE 18th annual consumer communications and networking confer-
ence (CCNC), Las Vegas, NV, USA, pp 1–6. https://doi.org/10.1109/CCNC49032.2021.936
9458
21. Uysal C, Filik T (2021) RF-Wri: an efficient framework for RF-based device-free air-writing
recognition. IEEE Sens J 21(16):17906–17916. https://doi.org/10.1109/JSEN.2021.3082514
22. Hayakawa S, Goncharenko I, Gu Y (2022) Air writing in Japanese: a CNN-based character
recognition system using hand tracking. In: 2022 IEEE 4th global conference on life sciences
and technologies (LifeTech), Osaka, Japan, pp 437–438. https://doi.org/10.1109/LifeTech5
3646.2022.9754825
23. Ahmed S, Kim W, Park J, Cho SH (2022) Radar-based air-writing gesture recognition using
a novel multistream CNN approach. IEEE Internet Things J 9(23):23869–23880. https://doi.
org/10.1109/JIOT.2022.3189395
24. Tsai T-H, Hsieh J-W (2017) Air-writing recognition using reverse time ordered stroke context.
In: 2017 IEEE international conference on image processing (ICIP), Beijing, China, pp 4137–
4141. https://doi.org/10.1109/ICIP.2017.8297061
A Protection Approach for Coal Miners
Safety Helmet Using IoT
Abstract The goal of the coal mining helmet proposed in this research is to protect miners by forewarning them of danger. All of the components operate as long as the person is wearing the protective cap. The output of the cap module is updated continually, uploading real-time data to the cloud. Thanks to the Internet of Things (IoT), these wearable devices can share their data or retrieve it from other sources. If there is a threat, warnings are given to both the employer and the miner. The development of wearable computing frameworks and ubiquitous computing has greatly aided the advancement of wearable technology. As a result, this wearable device includes a wide range of sensors that allow it to connect with other components and enhance the safety of the miner. The equipment integrates data gathering, information management, and information communication components. The DHT11 temperature and humidity sensor was employed, as there are times when the heat and moisture levels in mines are high enough to be fatal to the miner. Gases released inside the mines can cause respiratory problems for anyone present, which could lead to suffocation. A notification is communicated to both the ground authority and the miner in the event that at least one of these parameters exceeds its limit.
S. Modi
Karmaveer Bhaurao Patil College of Engineering, Satara, India
e-mail: [email protected]
Y. Mali (B) · L. Sharma
G.H Raisoni College of Engineering & Management, Wagholi, Pune, Maharashtra, India
e-mail: [email protected]
P. Khairnar
Ajeenkya D Y Patil School of Engineering, Pune, India
D. S. Gaikwad · V. Borate
D Y Patil College of Engineering & Innovation, Talegoan, Pune, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 377
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_30
378 S. Modi et al.
1 Introduction
Mineral resources are diverse and abundant, and the mining sector is enormous. The mining industry has a very strong stake in proper oversight and reliable communication. Supervisors are held accountable for all injuries sustained under their supervision, so they must be aware of any potentially dangerous situations. The problem being addressed is the development of a mining head cover that raises miner safety awareness. Remaining mindful of one's surroundings while using loud gear is difficult; on the other hand, miners hardly ever take their headgear off. Miners frequently remove some of their protective gear because it is too cumbersome, hot, or uncomfortable to wear. By and large, mining safety helmets only serve to shield the miner's head from potentially harmful blows; no technology has been deployed to make head guards warn workers when a particular miner has had a dangerous episode. The project's goal is to attach a wireless sensor node network to the existing mining safety cap and make it much more secure. The goal was expanded to include developing a system that would fit within a health-protective cap and operate for an extended period of time on battery power [1]. Another challenge was to change the protective cap's appearance without compromising its functionality, and the additional weight must be kept as light as reasonably possible. Wireless sensor network (WSN) technology is used to gather information and measure the monitored parameters. The WSN is a set of sensors, each of which has its own detection range but functions as an integral component of the system. The humidity and temperature levels are displayed to the miner, a threshold is set for the gas, and a buzzer alarm is issued if a reading exceeds the threshold significantly. By incorporating sensing, a mining cap can be adapted to aid the safety of workers. When a miner takes off his headgear, he should be warned before removing his cap [2]. A miner may become unconscious if something falls on him while he is wearing his cap; if potentially fatal harm has been sustained by a worker, the framework should be able to reach that conclusion. In such instances, hazardous gases have been recognized and reported.
2 Architecture
The transmitter portion of the system comprises the temperature sensor, humidity
sensor, LDR, and power supply; the receiver portion contains the Arduino together
with an LED and a buzzer. The miner switches on the helmet hardware as soon as he
enters the mine. The DHT-11 temperature and humidity sensor constantly observes
changes in temperature and humidity, determines whether or not the worker is safe,
and reports the outcome. Safety precautions can therefore be taken when either
condition, temperature or humidity, becomes a concern for the workers; in that
case the light-emitting diode will blink to indicate that conditions are unsafe
for the worker. The MQ-2 gas sensor detects poisonous gases like ethane, methane,
butane, and others [3]. If such a gas is detected, the buzzer turns on and begins
to sound. The device must also determine whether a miner has suffered a
potentially fatal injury. Together, these measures help prevent workers from being
exposed to hazardous gases.
See Fig. 1.
4.1 MQ-2
MQ-2 is a gas sensor of the metal-oxide-semiconductor type. The gas concentration
is read out through a voltage divider network within the sensor. The sensing
element of this sensor is composed of aluminum oxide coated with tin dioxide and
is enclosed in a mesh of tempered metal. The sensing element is supported by six
interconnected legs [4]. It is heated through two of the leads, while the output
signals are taken from the other four leads. Oxygen is adsorbed on the surface
when the sensor material is heated to a high temperature in air [5]. The adsorbed
oxygen captures free electrons from the tin dioxide, which restricts the current
flow. When reducing gases are present, their molecules react with these oxygen
atoms and lower the density of the adsorbed oxygen. The current now flowing
through the sensor is used to derive output voltage values, and those voltage
measurements are used to estimate the gas concentration: the voltage levels are
higher when the gas concentration is high (Fig. 2).
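The voltage divider readout described above can be sketched in a few lines. This
is a minimal illustration, not code from the paper: the supply voltage, ADC
resolution, and load-resistor value are assumptions typical of an Arduino MQ-2
module.

```python
VCC = 5.0        # supply voltage in volts (assumed)
ADC_MAX = 1023   # 10-bit Arduino ADC full scale
R_LOAD = 10000   # load resistor in ohms (typical MQ-2 module value, assumed)

def adc_to_voltage(adc):
    """Convert a raw 10-bit ADC reading to the divider output voltage."""
    return VCC * adc / ADC_MAX

def sensor_resistance(adc):
    """Solve the divider relation Vout = VCC * RL / (Rs + RL) for Rs."""
    v_out = adc_to_voltage(adc)
    return R_LOAD * (VCC - v_out) / v_out

def gas_ratio(adc, r0):
    """Rs/R0 ratio compared against the MQ-2 datasheet curves; it falls
    as the gas concentration (and hence the output voltage) rises."""
    return sensor_resistance(adc) / r0
```

Note the inverse relation the section describes: a higher gas concentration gives
a higher output voltage, which corresponds to a lower sensing resistance Rs.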
4.2 DHT-11
DHT-11 is a low-cost digital sensor that measures temperature and humidity. Any
microcontroller, such as an Arduino or a Raspberry Pi, can communicate with this
sensor to quickly read its temperature and humidity values.
A thermistor for temperature sensing and a capacitive humidity-sensing element
make up the DHT-11 sensor. A substrate that holds moisture serves as the
dielectric between the two electrodes of the humidity-sensor capacitor, and the
capacitance changes as the humidity level changes [6]. An internal IC measures
these changed values and converts them into digital form. The thermistor used by
this sensor has a negative temperature coefficient, so its resistance drops as the
temperature rises. To provide a more consistent response across temperature
variations, this element is generally manufactured from semiconductor ceramics or
polymers. The DHT-11 has about ±2° accuracy across a temperature range of 0 to
50 °C, and its humidity range is roughly 20–90% RH with about ±5% accuracy. The
DHT-11 is a small sensor with an operating voltage of 3–5.5 V, and its sampling
frequency is 1 Hz. During a measurement, a current of no more than 2.5 mA is
drawn. The VCC, GND, data pin, and an unconnected pin make up the four pins on the
DHT-11 sensor; a pull-up resistor (typically around 5 kΩ) is required on the data
line for interfacing with the microcontroller [7] (Fig. 3).
4.3 LDR
An LDR (light-dependent resistor) is an electrical component whose behaviour
depends on light: its resistance changes abruptly when light beams strike it. An
LDR's operating principle is photoconductivity, an optical phenomenon (Fig. 4).
The material becomes more conductive the more light it absorbs. Electrons in the
material's valence band shift to the conduction band as soon as light shines on
the LDR. When the energy of the photons in the incident light is greater than the
band gap of the material, high-intensity light excites more electrons into the
conduction band, activating many charge carriers in the material [8]. As a result
of this process, the resistance of the device decreases and the current through it
begins to increase.
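In the helmet circuit the LDR reading is compared against a fixed threshold to
decide whether the surroundings are dark. A minimal sketch follows, reusing the
530 ADC threshold that appears in the algorithm of Sect. 5; the function names and
the assumption that a low reading means darkness are illustrative, not taken from
the paper.

```python
LIGHT_THRESHOLD = 530  # ADC threshold used in the algorithm (Sect. 5)

def is_dark(ldr_adc, threshold=LIGHT_THRESHOLD):
    """Assume a low ADC reading means little light reaches the LDR."""
    return ldr_adc < threshold

def lamp_state(ldr_adc):
    """Switch the helmet lamp LED on in the dark and off otherwise."""
    return "ON" if is_dark(ldr_adc) else "OFF"
```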
5 Algorithm
1. Start.
2. Import the DHT library and define DHT_PIN and DHT_TYPE as DHT-11.
3. Assign the light and LED variables to pin 13, the smoke variable to analog pin
   A0, the buzzer to pin 11, and so on.
4. In the setup function, define the input and output pins using pinMode and call
   Serial.begin to initialize the Arduino. The output pins drive the green LED and
   the buzzer signal, while the input pins are the smoke sensor pin and DHT_PIN.
5. In the loop function, read the MQ-2 sensor value using analogRead. If the
   sensor value is greater than 330, print that smoke is being detected and hold
   a delay of two seconds.
6. Read the LDR with analogRead and store the value in a light variable. If the
   value is less than 530, drive the light pin to switch the lamp on; if the value
   is greater, drive the light pin to switch it off.
7. Using the DHT library's read functions, obtain the humidity and the temperature
   and store them in separate variables called humi and tempc. If humi or tempc
   have values larger than 22 or 33, respectively, the green LED will turn on.
8. Stop
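The loop in steps 5–7 can be sketched in a platform-neutral way. Only the
thresholds (330, 530, 22, 33) come from the algorithm above; the function and the
output names are illustrative stand-ins for the analogRead and DHT library calls.

```python
SMOKE_THRESHOLD = 330   # MQ-2 ADC threshold (step 5)
LIGHT_THRESHOLD = 530   # LDR ADC threshold (step 6)
HUMIDITY_LIMIT = 22     # % RH limit (step 7)
TEMP_LIMIT = 33         # temperature limit in degrees Celsius (step 7)

def check_sensors(smoke_adc, light_adc, humi, tempc):
    """Return the set of outputs the helmet would activate in one loop pass."""
    alerts = set()
    if smoke_adc > SMOKE_THRESHOLD:
        alerts.add("BUZZER")       # smoke detected: sound the buzzer
    if light_adc < LIGHT_THRESHOLD:
        alerts.add("LAMP_ON")      # dark surroundings: switch the lamp on
    if humi > HUMIDITY_LIMIT or tempc > TEMP_LIMIT:
        alerts.add("GREEN_LED")    # unsafe temperature or humidity
    return alerts
```

On the real helmet the same decisions would run once per pass through the Arduino
loop function, with the two-second delay of step 5 between smoke alerts.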
6 Flowchart
See Fig. 5.
See Fig. 6.
Fig. 5 Flowchart
The circuit comprises an LDR, three LEDs, an MQ-2 sensor, and a DHT-11 sensor. In
addition to the standard VCC, which is connected to the reset pin on the left for
most of the devices, there is the LDR: its positive end is wired through a 200 kΩ
resistor to its pin, and its negative end is wired to ground. A blue light-emitting
diode, connected to pin 8, serves as the LDR's output indicator [9]. The MQ-2
sensor provides VCC, ground, and output pins; the indicators for its output on
pin 9 are a buzzer and a red light-emitting LED. VCC, ground, and data are the
pins that make up the DHT-11 sensor connection; the data pin is connected to
pin 7, and its output indicator is a green LED.
10 Testing Output
At the serial terminal, the MQ-2, LDR, and DHT-11 readings are each displayed
separately. Further, the buzzer will continuously warn on an MQ-2 alert, the
DHT-11 result is indicated in green, and the LDR result is indicated in red [10].
This work has designed a continuous data-monitoring framework for the underground
mine environment, supported by a wireless sensor network. It is able to handle
data transmission between mine terminals and the surface and to alert on abnormal
environmental parameters. The framework offers good adaptability and
expansibility, convenient management, and minimal installation and maintenance
costs [11].
The aim was to develop a smart mining-helmet protection that can distinguish among
three classes of likely hazardous situations: concentrations of poisonous gas,
removal of the mining helmet, and collision or impact. A miner taking the helmet
off his head is treated as a hazardous occurrence [12]. An object striking a miner
with a force above the head-damage (HDP) limit is another potentially lethal
scenario. Measuring gas concentrations covers the third (Table 1).
Table 1 Performance of temperature sensors DHT-11 and DHT-22

Main component  | DHT-11      | DHT-22
Temp check      | −0 to 10°   | ±100°
Temp range      | 25–50°      | 35–70°
Required power  | 4.3–7 V     | 4.6–10 V
Humid range     | 35%         | 43%
Size of sample  | Few minutes | Few seconds
Results in bits | 17 bits     | 24 bits
11 Future Scope
The design includes adding a Wi-Fi-based system that can collect the necessary
information and update it in a database. The database will record the time at
which the environmental details were captured, since the data will be transmitted
regularly. The database will also be made remotely accessible so that managers and
higher authorities can monitor for any alarming situation and arrange the early
availability of medical help [10]. The emergency office will benefit from the GPS
module, which helps locate the miners so that help can be delivered more quickly
in hazardous situations. The use of machine-learning models can also help improve
the framework in the future [11].
12 Conclusion
We have successfully created an intelligent worker headgear that can detect gases,
humidity, temperature, and light. The threshold values, which are manually fixed,
can be updated based on the typical conditions of the mining areas. The sensors
will identify any changes so that proper precautions can be taken against
unpredictable conditions. In case of anything risky, the miner is made aware of it
by a change in the light-emitting diode's colour and an alarm from the buzzer. For
the odd occasion that the miners are unreachable, we have also planned a GPS
module that can report the miners' location continuously.
References
1. Borate V, Mali Y, Suryawanshi V, Singh S, Dhoke V, Kulkarni A (2023) IoT based self alert generating coal miner safety helmets. In: 2023 international conference on computational intelligence, networks and security (ICCINS), Mylavaram, India, pp 1–4. https://doi.org/10.1109/ICCINS58907.2023.10450044
2. Mali YK, Darekar SA, Sopal S, Kale M, Kshatriya V, Palaskar A (2023) Fault detection of underwater cables by using robotic operating system. In: 2023 IEEE international Carnahan conference on security technology (ICCST), Pune, India, pp 1–6. https://doi.org/10.1109/ICCST59048.2023.10474270
3. Vaidya AO, Dangore M, Borate VK, Raut N, Mali YK, Chaudhari A (2024) Deep fake detection for preventing audio and video frauds using advanced deep learning techniques. In: 2024 IEEE recent advances in intelligent computational systems (RAICS), Kothamangalam, Kerala, India, pp 1–6. https://doi.org/10.1109/RAICS61201.2024.10689785
4. Bhongade A, Dargad S, Dixit A, Mali YK, Kumari B, Shende A (2024) Cyber threats in social metaverse and mitigation techniques. In: Somani AK, Mundra A, Gupta RK, Bhattacharya S, Mazumdar AP (eds) Smart systems: innovations in computing. SSIC 2023. Smart innovation, systems and technologies, vol 392. Springer, Singapore. https://doi.org/10.1007/978-981-97-3690-4_34
5. Shabina M, Sunita M, Sakshi M, Rutuja K, Rutuja J, Sampada M, Yogesh M (2024) Automated attendance monitoring system for cattle through CCTV. Revista Electronica De Veterinaria 25(1):1025–1034. https://doi.org/10.69980/redvet.v25i1.724
6. Karajgar MD et al (2024) Comparison of machine learning models for identifying malicious URLs. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–5. https://doi.org/10.1109/ICITEICS61368.2024.10625423
7. Mali Y, Pawar ME, More A, Shinde S, Borate V, Shirbhate R (2023) Improved pin entry method to prevent shoulder surfing attacks. In: 2023 14th international conference on computing communication and networking technologies (ICCCNT), Delhi, India, pp 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10306875
8. Mali YK, Mohanpurkar A (2015) Advanced pin entry method by resisting shoulder surfing attacks. In: 2015 international conference on information processing (ICIP), Pune, India, pp 37–42. https://doi.org/10.1109/INFOP.2015.7489347
9. Pawar J, Bhosle AA, Gupta P, Mehta Shiyal H, Borate VK, Mali YK (2024) Analyzing acute lymphoblastic leukemia across multiple classes using an enhanced deep convolutional neural network on blood smear. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–6. https://doi.org/10.1109/ICITEICS61368.2024.10624915
10. Naik DR, Ghonge VD, Thube SM, Khadke A, Mali YK, Borate VK (2024) Software-defined-storage performance testing using Mininet. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–5. https://doi.org/10.1109/ICITEICS61368.2024.10625153
11. Dangore M, Ghanashyam Chendke ASRA, Shirbhate R, Mali YK, Kisan Borate V (2024) Multi-class investigation of acute lymphoblastic leukemia using optimized deep convolutional neural network on blood smear images. In: 2024 MIT Art, Design and Technology School of Computing international conference (MITADTSoCiCon), Pune, India, pp 1–6. https://doi.org/10.1109/MITADTSoCiCon60330.2024.10575245
12. Chaudhari A et al (2024) Cyber security challenges in social meta-verse and mitigation techniques. In: 2024 MIT Art, Design and Technology School of Computing international conference (MITADTSoCiCon), Pune, India, pp 1–7. https://doi.org/10.1109/MITADTSoCiCon60330.2024.10575295
Face Cursor Movement Using OpenCV
Abstract Some individuals are unable to use computers due to medical conditions.
The idea of eye control is particularly advantageous for the advancement of
natural input and, crucially, for the underprivileged and the disabled, who become
able to operate the computer autonomously once a controlling system is
incorporated. Systems that let people use computers without a keyboard are needed,
and this one is especially helpful for individuals who can move the cursor with
their eyes. In this study, a camera is used to record eye movement. First, the
centre of the pupil of the eye is found; the pointer then travels differently
depending on the variations in pupil position. All of these programmes share the
fact that keyboard and mouse input is the primary technique used when operating a
personal computer. Although this is not a problem for someone in good health, it
can be an impassable barrier for those with a restricted range of motion in their
limbs. In these situations, it is better to employ input techniques that rely on
the capabilities the user retains, such as eye movements. A system that uses a
low-cost technique to operate a mouse pointer on a computing device was created to
allow such alternative input methods. The eye tracker uses photos captured by a
modified webcam to follow the motions of the user's eyes. These eye movements are
then mapped onto the computer screen to place the mouse pointer appropriately, so
the pointer position is updated automatically as the eyes move. A webcam is used
to photograph eye movement. The mouse cursor can be moved by moving the face up,
down, left, and right, and mouse actions may be controlled by speaking and
blinking the eyes. Several algorithms, including the Haar cascade algorithm,
template matching, and the Hough transformation, are utilised to carry out these
tasks. Our solution is primarily designed to enable successful computer
communication for persons with disabilities. People require artificial means of
interaction, such as a virtual keyboard, for a variety of reasons; a number of
persons, as a result of a medical condition, must move about with the aid of some
object, and for them such a system is highly beneficial.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 389
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_31
1 Introduction
Personal computers nowadays play a significant role in our daily lives, since they
are utilised for business, education, and recreation [1]. All of these programmes
share the fact that keyboard and mouse input is the primary technique used when
operating a personal computer. While this is not an issue for a healthy person, it
can be an impassable barrier for those with a restricted range of motion in their
limbs. Under these circumstances, it would be desirable to use input techniques
that rely on the capabilities the user retains, such as eye movements [2]. A
system that uses a low-cost technique to control the mouse pointer on a computer
system was created to allow such alternative input methods. The eye tracker uses
photos captured by a modified webcam to follow the motions of the user's eyes.
These eye movements are then mapped onto the computer screen to place the mouse
pointer appropriately, so the pointer position is updated automatically as the
eyes move. A webcam is used to photograph eye movement. A significant number of
people are now interested in creating natural interactions between people and
computers [3]. In ubiquitous computing, a number of studies on human–computer
interfaces have been presented. Vision-based interface technology uses an input
video picture to extract motion data without the need for expensive machinery. As
a result, developing human–computer interaction systems using a vision-based
approach is considered successful. Biometrics is a current topic in human–computer
interaction that relies on eyesight. Eye-tracking research is distinct in that it
requires interactive applications; to create a vision-based multifunctional
human–computer interaction system, both eye tracking and eye identification are
performed. Real-time eye input has been used most commonly for disabled people who
can only use their eyes for input [4, 5]. For many different reasons, people need
artificial methods of interaction, such as a virtual keyboard, and many people
need something to help them move about because of a medical issue. Incorporating a
controlling mechanism that gives them the ability to act independently is
therefore highly beneficial. The idea of visual control is
particularly advantageous for the growth of natural input and, more importantly,
for the underprivileged and disabled [6].
Most everyday devices require manual operation and are inaccessible to those with
mobility issues [7, 8]. In order for persons with motor impairments to participate
in the information revolution, it is vital to develop alternative methods of
human–computer interaction. This paper presents the development of an interface
between humans and computers for people with disabilities that uses a vision-based
system to recognize eyes and facial gestures. To exert control non-intrusively,
the proposed study includes real-time face tracking, face identification,
eye-blink recognition, speech recognition, and interpretation of blink sequences,
allowing users to interface with computers through facial expressions and eye
movements rather than the typical mouse [9]. It aims to make the use of computers
quick and straightforward for those with physical disabilities as well as those
who are handless (Fig. 1).
Eye tracking is used to examine users' attention patterns while they are doing
tasks, or to provide hands-free computer usage for those who are unable to utilise
the standard mouse and keyboard control inputs [10, 11]. As eye-tracking
technology develops, it will become increasingly preferable to employ eye tracking
instead of conventional control methods, particularly for impaired users. Eye
tracking may also be used in tasks where it naturally makes sense, such as when a
camera uses the user's eyes to focus the lens where the user is looking at the
moment [12]. The efficacy of eye-tracking technology may also vary owing to
several factors, including poor precision (Fig. 2).
The accuracy and error rates of the Eye Mouse algorithm on a test subject were
recorded as part of the testing process. To guarantee more precise detection, a
static test was conducted with a fixed separation between the camera and the
subject's face [13]. During testing, the user only needed to shift their head and
eyes while the developer noted the tracking window's accuracy level. Eye tracking
is also a crucial stage in the creation of interactive software: in the context of
games, it is a method to deepen immersion in the virtual environment [14]. In
contrast to the realism of gadgets like head-mounted electrodes used in games,
traditional interactions with a mouse, keyboard, or gamepad are constrained, and
focusing on novel player interactions with the virtual environment is an emerging
trend. For instance, some methods employ a headpiece device to track head motions
[15]. In this research study we have attempted to investigate computer vision with
the overarching goal of creating a system that can comprehend the motions of human
facial features. The main objective is to build a simple scheme for face
identification and face tracking that mimics mouse motion [16]. We create a system
that employs a camera to monitor a facial feature, such as the tip of the nose,
and uses the movements of the detected feature to directly control the mouse
cursor on a computer [17]. Other parts of the face are used to perform the mouse
click. The head's rotation and eye blinking are taken into consideration while
analysing facial motions. The head's three-dimensional location is monitored and
shown on the PC screen in 2D coordinates. Blinks that are made on purpose are
noticed and interpreted as actions [18]. The tracker operates on real-time video
of the individual seated in front of the screen.
2 Literature Survey
An eye tracker was used to capture students' eye movements while they were
debugging, in order to determine whether and how high-, medium-, and
low-performance students behave differently throughout this process. We invited 38
students studying computer science to analyse two C programmes. Sequential
analysis was applied to the students' gaze paths as they followed the programme
code to identify relevant examination sequences [19]. Next, these noteworthy
gaze-path sequences were contrasted across pupils who showed different debugging
abilities. According to the findings, high-performing students debugged programmes
in a more logical way than low-performing students, who tended to cling to a
line-by-line approach and struggled to rapidly determine the higher-level logic of
the program [20]. Lower-performing students would often skip through the program's
logic and go straight to particular suspicious lines to uncover faults; in order
to remember information, they often had to go back to earlier statements, and they
spent more time doing manual calculations.
3 Proposed Methodology
Our proposed technique employs OpenCV to monitor eyeball movement and manage
cursor movement on a computer. The camera picks up the movement of the eyeball,
which OpenCV analyses; this makes cursor control possible.
A notification will appear on the screen in case of errors [28]. Iris detection is
performed on images from the input feed, focused on the middle of the eye.
Following that, a mid-point is determined by combining the positions of the
centres of the left and right eyes [29]. For face detection, the Haar cascade
technique is used: an object is recognized using Haar cascade features [30]. Such
a feature considers adjacent rectangles at a certain spot in the detection window.
Two neighbouring rectangles located over the eye and cheek area make up the common
Haar feature for face detection [31, 32]. The eyes are then detected. Individual
eye blinks are used in place of left and right mouse clicks to open and close
objects on the screen. To open and close the application itself, we use our mouths
[33]: the programme begins to function when the mouth opens for the first time,
and it is shut off when the mouth opens for a second time (Fig. 3).
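The mouth-toggle and blink-for-click behaviour just described can be sketched as a
small state machine. The event names here are illustrative assumptions, not
identifiers from the paper, and the per-frame detection of blinks and mouth
opening is assumed to happen upstream (e.g. with OpenCV Haar cascades).

```python
class GestureController:
    """Map detected facial events to cursor actions (Sect. 3 behaviour)."""

    def __init__(self):
        self.active = False  # toggled by successive mouth-open events

    def handle(self, event):
        """Translate one facial event into an action string (or None)."""
        if event == "MOUTH_OPEN":
            # the first mouth opening starts the programme,
            # the second one shuts it off, and so on
            self.active = not self.active
            return "START" if self.active else "STOP"
        if not self.active:
            return None              # ignore gestures while inactive
        if event == "LEFT_BLINK":
            return "LEFT_CLICK"      # left-eye blink replaces a left click
        if event == "RIGHT_BLINK":
            return "RIGHT_CLICK"     # right-eye blink replaces a right click
        return None
```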
The objectives of the suggested method are:
1. To create a wireless mouse control
2. To create a vision-based system
3. To combine voice and face gestures for directing movement of the mouse
4. To give instantaneous tracking of the eyes
5. To do away with the restrictions of a stationary head
Every time a single face is found, its position is computed and sent to the
feature-identification algorithm. This method uses a face-recognition algorithm
based on Haar features of face images. First, the number of facial characteristics
and their locations are determined, and the features' original configuration is
stored in memory [16, 34]. The difference between the feature's current and
original locations is then determined for the chosen feature, in this case the tip
of the nose, and the average of all the differences is computed. Hence, as the
head moves, the tracker picks up even a small movement (Fig. 4).
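The displacement-averaging step above can be sketched as follows; the gain factor
and screen dimensions are assumptions for illustration, not values from the paper.

```python
SCREEN_W, SCREEN_H = 1920, 1080  # target screen size (assumed)
GAIN = 4.0  # cursor pixels moved per pixel of feature motion (assumed)

def average_displacement(original_pts, current_pts):
    """Average (dx, dy) of tracked features relative to their stored positions."""
    n = len(original_pts)
    dx = sum(c[0] - o[0] for o, c in zip(original_pts, current_pts)) / n
    dy = sum(c[1] - o[1] for o, c in zip(original_pts, current_pts)) / n
    return dx, dy

def move_cursor(cursor, original_pts, current_pts):
    """Shift the cursor by the amplified average displacement, clamped to screen."""
    dx, dy = average_displacement(original_pts, current_pts)
    x = min(max(cursor[0] + GAIN * dx, 0), SCREEN_W - 1)
    y = min(max(cursor[1] + GAIN * dy, 0), SCREEN_H - 1)
    return x, y
```

Averaging over several features makes the estimate robust to a single jittery
detection, which is why even small head movements are picked up smoothly.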
Pseudocode
With a 2 × 2 grid pattern, the system nonetheless produced usable results: the
total area of all four regions (top, bottom, left, and right) was large, so even
small irregular eye movements were still detected. A 3 × 3 grid design, however,
produced finer but less accurate placement of the user's eye movements relative to
where they were looking; in practice, the tracking window's regions shrank
significantly in size for every +1/−1 pixel shift. Second, erroneous pupil
detection was caused by reflection from bright objects, such as a white tracking
screen. A bright screen cast a reflection onto the pupil region that washed out
the dark areas during initialisation, which led to the problem discussed above,
because finding the pupil requires combining integral imaging and Haar cascade
features to locate the darkest region of the eye.
4 Experimental Results
Our system's goal is to provide hands-free access to the mouse using voice, eye
blinks, and facial expressions, and the technology delivers the desired result.
The outcomes are shown for eye, mouth, and facial recognition (Figs. 5, 6 and 7).
5 Conclusion
As a result of the observations and data gathered, it can be said that the
accuracy is respectable despite the use of a low-resolution web camera and the
default pre-defined classifiers offered by OpenCV. User research revealed a good
degree of precision in detecting and tracking eye movement, and the bulk of
stability assessments for the areas covered were passed. To identify dynamic
movements rather than only static ones, the pre-defined classifiers and overall
algorithm must undergo more testing and development; this will give users greater
flexibility and high precision when looking at faces and remove restrictions on
detecting and tracking the pupil. This research introduces a simple and affordable
optical apparatus that processes head motion to execute mouse functions. A camera,
a computer, and our application software make up the system. Camera images are
analysed in software to identify the person's head movement; a non-linear
transformation is then applied to this head-position data to produce a matching
screen position for the mouse pointer. Eye blinks are used for clicking
operations. The system examines how to control mouse cursor motion anywhere on
screen by utilising a person's face, eyes, and lips. Following the identification
of this problem area, comparable industrial products were evaluated and compared,
and their benefits and drawbacks examined. The apparatus was very user-friendly,
particularly when used with desktop programmes. It displays speed and precision
sufficient for numerous live applications and allows people with disabilities to
benefit from a variety of computer tasks.
References
1. Villanueva A, Cabeza R, Porta S (2011) Eye tracking system model with easy calibration. IEEE
2. Wankhede SS, Chhabria SA (2013) Controlling mouse motions using eye movements. IJAIEM
3. Mangaiyarkarasi M, Geetha A (2014) Cursor control system using facial expressions for
human-computer interaction. IJETCSE
4. Ohno T, Mukawa N, Kawato S (2011) Just blink your eyes: a head-free gaze tracking system.
IEEE
5. Wijesoma WS, Wee KS, Wee OC, Balasuriya AP, San KT, Soon KK. EOG based control of
mobile assistive platforms for the severely disabled. Proc IEEE Int Conf
6. Wu C-C, Hou T-Y (2015) Tracking students’ cognitive processes during program debugging—
an eye-movement approach. IEEE
7. Sung E, Wang J-G (2002) Study on eye gaze estimation. IEEE 32(3)
8. Ji Q, Zhu Z (2007) Novel eye gaze tracking techniques under natural head movement. IEEE
54(12)
9. Soukupova T, Cech J (2016) Real-time eye blink detection using facial landmarks. In: 21st
computer vision winter workshop, February 2016
10. Rosebrock A. Detect eyes, nose, lips, and jaw with dlib, OpenCV, and Python
11. Rosebrock A. Eye blink detection with OpenCV, Python, and dlib
12. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression
trees. In: CVPR
13. Zafeiriou S, Tzimiropoulos G, Pantic M (2015) The 300 videos in the wild (300-VW) facial
landmark tracking in-the-wild challenge. In: ICCV workshop
14. Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M (2013) 300 faces in-the-wild challenge: the
first facial landmark localization challenge. In: Proceedings of IEEE international conference
on computer vision (ICCV-W), 300 faces in-the-wild challenge (300-W), Sydney, Australia,
December 2013
15. Bhuyan HK, Ravi VK (2023) An integrated framework with deep learning for segmentation
and classification of cancer disease. Int J Artif Intell Tools (IJAIT) 32(02):2340002
16. Bhuyan HK, Chakraborty C, Pani SK, Ravi VK (2023) Feature and sub-feature selection for
classification using correlation coefficient and fuzzy model. IEEE Trans Eng Manag 70(5)
17. Bhuyan HK, Ravi VK (2021) Analysis of sub-feature for classification in data mining. IEEE
Trans Eng Manag
18. Bhuyan HK, Saikiran M, Tripathy M, Ravi V (2022) Wide-ranging approach-based feature
selection for classification. Multimed Tools Appl 1–28
19. Bhuyan HK, Ravi V, Brahma B, Kamila NK (2022) Disease analysis using machine learning
approaches in healthcare system. Health Technol 12(5):987–1005
20. Dontha MR, Sri Supriyanka N (2023) Image-based disease detection and classification of plant
using CNN. In: Marriwala N, Tripathi C, Jain S, Kumar D (eds) Mobile radio communications
and 5G networks. Lecture notes in networks and systems, vol 588. Springer, Singapore. https://
doi.org/10.1007/978-981-19-7982-8_22
21. Pullagura L, Dontha MR, Kakumanu S (2021) Recognition of fetal heart diseases through
machine learning techniques. Ann Roman Soc Cell Biol 25(6):2601–2615. https://www.ann
alsofrscb.ro/index.php/journal/article/view/5873
22. Gharge T, Chitroda C, Bhagat N, Giri K. AI-smart assistant. Int Res J Eng Technol (IRJET)
6(1). e-ISSN: 2395-0056
23. Nomura K, Rikitake K, Matsumoto R (2019) Automatic whitelist generation for SQL queries
using web application tests. In: 2019 IEEE 43rd annual computer software and applications
conference
24. Dekate A, Kulkarni C, Killedar R (2016) Study of voice controlled personal assistant device.
Int J Comput Trends Technol (IJCTT) 42(1). ISSN: 2231-2803
25. Anerao R, Mehta U, Suryawanshi A. Personal assistant for user task automation. SSRG Int J
Comput Sci Eng (SSRG-IJCSE)
26. Bais H, Machkour M, Koutti L. A model of a generic natural language interface for querying
database. Int J Intell Syst Appl 8:35–44. https://doi.org/10.5815/ijisa.2016.02.05
27. Meng F, Chu WW (1999) Database query formation from natural language using semantic
modelling and statistical keyword meaning disambiguation
28. Mahmud T, Azharul Hasan KM, Ahmed M, Chak THC (2015) A rule based approach for NLP
based query processing. In: 2015 2nd International conference on electrical information and
communication technologies (EICT), Khulna
29. Mohite A, Bhojane V (2015) Natural language interface to database using modified co-
occurrence matrix technique. In: 2015 International conference on pervasive computing (ICPC),
Pune, pp 1–4
30. Ghosh PK, Dey S, Sengupta S (2014) Automatic SQL query formation from natural language
query. In: International conference on microelectronics, circuits and systems (MICRO-2014)
31. Solangi YA, Solangi ZA, Aarain S, Abro A, Mallah GA, Shah A (2018) Review on natural
language processing (NLP) and its toolkits for opinion mining and sentiment analysis. In: 2018
IEEE 5th International conference on engineering technologies and applied sciences (ICETAS),
Bangkok, Thailand, pp 1–4
32. Bhuyan HK, Chakraborty C (2022) Explainable machine learning for data extraction across
computational social system. IEEE Trans Comput Soc Syst 1–15
33. Huang B, Zhang G, Sheu PC (2008) A natural language database interface based on a prob-
abilistic context free grammar. In: IEEE International workshop on semantic computing and
systems, Huangshan, pp 155–162
34. Uma M, Sneha V, Sneha G, Bhuvana J, Bharathi B (2019) Formation of SQL from natural
language query using NLP. In: 2019 International conference on computational intelligence
in data science (ICCIDS), Chennai, India, pp 1–5. https://doi.org/10.1109/ICCIDS.2019.886
2080
Powerpoint Slide Presentation Control
Based on Hand Gesture
Abstract HCI, which stands for “human–computer interaction,” has been increas-
ingly concerned with natural interaction methods in recent years. Various ways we
interact with computers have benefited from the development of real-time hand
gesture recognition programs. The detection of hand motions calls for the use of
a camera. The primary form of participation is the employment of a web camera
as a virtual human–computer interaction device. In this body of work, we look into
how present-day vision-based HCI methods make use of hand gestures. This
project becomes especially useful when users are unable to use or touch any
input device. Gesture recognition makes it possible to complete
an activity without physically accessing the usual input devices (mouse, keyboard,
etc.). The user may doodle with their index finger and use their index and middle fingers
together to control the pointer’s movement on the screen. The user may erase their
artwork using the tips of their index, middle, and ring fingers. Using their little finger,
A. Kumar · G. Kumar
Department of Computer Engineering and Applications, GLA University, Mathura, UP, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
K. U. Singh (B)
School of Computing, Graphic Era Hill University, Dehradun, India
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun
248007, Uttarakhand, India
e-mail: [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 401
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_32
the user may go on to the next file, and by showing their thumb and pointing to the
left, they can return to the previous file.
1 Introduction
The webcam captures images, which are then processed to identify the hand. A
computer-vision hand-tracking module is used to recognise the hand region,
determine how many fingers are raised and how far apart they are, and provide
hand bounding-box information. The approach assesses hand velocity and active
fingers from this information, which is used to control slide presentations.
2 Literature Review
computer pointer on the screen; however, hand motion can also affect its accuracy
[16].
3 Methodology
Our gesture alphabet has five hand motions to meet the application's needs: a little
finger raised, split fingers, an open hand with the fingers together, a fist, and a final
state in which the hand is not visible in the camera's field of view. These gestures
trigger the actions start, next slide, indicate location, write on slide, undo, and
previous slide. The move gesture allows left and right motions: users show their
thumb for left and their little finger for right.
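The mapping from detected finger states to presentation actions can be sketched as follows. This is an illustrative sketch rather than the authors' code; it assumes the common cvzone convention of a five-element fingers-up list ordered [thumb, index, middle, ring, little], with 1 meaning the finger is raised:

```python
# Illustrative sketch (not the chapter's actual implementation): map a
# cvzone-style fingers-up list [thumb, index, middle, ring, little] to a
# presentation action, following the gesture set described in the text.
def gesture_to_action(fingers):
    """Return the presentation action for a detected finger pattern."""
    if fingers == [0, 0, 0, 0, 1]:   # little finger only -> next slide
        return "next_slide"
    if fingers == [1, 0, 0, 0, 0]:   # thumb only -> previous slide
        return "previous_slide"
    if fingers == [0, 1, 1, 0, 0]:   # index + middle -> move the pointer
        return "pointer"
    if fingers == [0, 1, 0, 0, 0]:   # index only -> write on the slide
        return "draw"
    if fingers == [0, 1, 1, 1, 0]:   # index + middle + ring -> undo
        return "undo"
    return "none"                    # unrecognised pattern: do nothing
```

For example, `gesture_to_action([0, 0, 0, 0, 1])` returns `"next_slide"`. In the full system these patterns would come from the hand detector frame by frame, with a gesture threshold line filtering out unintended motions.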
Many gesture-to-motion transitions are shown in Fig. 1. Understanding skin tone
is vital, as problems with detection and gesture recognition are otherwise conceivable.
One of the application's hardest elements is keeping the controlling hand in the
camera's field of vision without it leaving the capture region; this is a difficult
obstacle, and user training has been shown to be effective in resolving it.
The procedure of collecting data for controlling a Powerpoint slide show with hand
gestures often entails capturing samples of hand motions while engaging with the
presentation software.
The following is an overview of the data collection procedure used in our research:
Creating the ideal conditions: In the first step, the presentation software (such as
Microsoft PowerPoint) is set up, running, and ready.
Define gesture set: Next, we determine the hand gestures that will be used to control
the presentation so that everyone is on the same page. This could contain move-
ments such as going to the next slide, the slide before that, starting or stopping the
presentation, and annotating.
Data annotation: Annotate the slides of the presentation with the hand motion labels
that match each slide. We define the hand gesture that should activate the required
action for each slide in the presentation.
Location and lighting: Check that the hand is clearly visible within the camera's
field of view. Ensure that the lighting conditions are adequate so that shadows are
kept to a minimum and the hand can be seen clearly.
Record gesture samples: While presenting the slides and carrying out the prede-
termined hand movements, we initiate the recording of video data from the
camera.
The initial processing of data: Trim the captured video data so that it contains
only the segments pertinent to each gesture.
3.2 Algorithm
The following algorithm script uses the cvzone library for hand tracking and OpenCV
for image processing. It allows one to control a presentation or slideshow by using
hand gestures.
1. Import the required libraries: cvzone.HandTrackingModule, cv2, os, and numpy.
2. Set up parameters such as webcam dimensions, frame reduction, image dimen-
sions, gesture threshold, folder path for presentation images, etc.
3. Initialize the webcam capture and set its dimensions.
4. Create a HandDetector object with a detection confidence of 0.8 and a maximum
of 1 hand.
5. Create empty lists and variables for storing images, delays, button states,
counters, drawing mode, annotation data, and image numbers.
6. Get a sorted list of image file names from the specified folder path.
7. Enter a loop to process each frame from the webcam:
• Read the current frame from the webcam and flip it horizontally.
• Load the current image from the presentation folder based on the image
number.
• Use the HandDetector to find hands and landmarks in the current frame.
• Draw a gesture threshold line on the frame.
• Check if a hand is detected:
It is possible for hand gesture recognition algorithms to make use of the Euclidean
Distance in order to measure the spatial distance between important spots or land-
marks on the hand. It is feasible to detect and categorise various hand motions by
measuring the distances between specified locations in a variety of hand positions
or gestures [17]. The Euclidean distance is the one that is utilised the vast majority
of the time in the field of computer vision. It discards image structure, however, and
cannot capture the true relationship between images: even a slight difference between
two photos can significantly increase the Euclidean distance between them. Using
the Euclidean distance formula
as shown below, one may determine the length of the path that separates two locations
in n-dimensional space. It is defined as the square root of the sum of the squared
discrepancies that exist between the two points’ respective coordinates.
$$\text{Euclidean Distance} = \|X - Y\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (1)$$
The coordinates of the two points in n-dimensional space are represented by the
arrays X and Y, respectively. The Euclidean Distance function performs an iteration
over each coordinate, computes the squared difference, and then stores the result in a
variable called distance. In the end, the square root of the total distance that has been
amassed is calculated, and this value is what is returned as the Euclidean distance
between the points X and Y.
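The computation just described can be sketched in a few lines of Python; this is an illustrative sketch of Eq. (1), since the chapter does not list its implementation:

```python
import math

def euclidean_distance(X, Y):
    """Euclidean distance between two n-dimensional points (Eq. 1):
    the square root of the sum of squared coordinate differences."""
    if len(X) != len(Y):
        raise ValueError("points must have the same dimension")
    # Accumulate the squared difference for each coordinate pair.
    distance = sum((x - y) ** 2 for x, y in zip(X, Y))
    return math.sqrt(distance)
```

For example, `euclidean_distance([0, 0], [3, 4])` returns `5.0`, matching the familiar 3-4-5 right triangle.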
In hand gesture recognition, a “bounding box” may be a rectangular box that entirely
encloses the hand or a hand-related ROI. The bounding box localises and isolates
the hand in an image or video frame, making hand motion analysis easier and more
accurate [18].
How bounding boxes may be used in hand gesture detection algorithms is
summarised below:
• Hand Detection: First, check for a hand in the incoming picture or video frame.
Hand detection may be done using computer vision techniques or a machine
learning model.
• Hand Localisation: After recognising a hand, localise the hand area inside the
frame. This may be done by taking the minimum and maximum coordinates of
the identified hand landmarks, or by using segmentation methods to isolate the
hand region.
Using either the localised hand region or the minimum and maximum coordinates,
a bounding box is produced and used to define the area of interest. The hand, or the
ROI that contains the hand is normally enclosed within the bounding box, which has
the form of a rectangle. Calculations are performed to determine the coordinates of
the top-left and bottom-right corners of the bounding box.
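Deriving a bounding box from the minimum and maximum landmark coordinates, as described above, can be sketched as follows; this is an illustrative sketch that assumes landmarks arrive as (x, y) pixel pairs:

```python
def landmarks_to_bbox(landmarks, margin=0):
    """Return (x_min, y_min, x_max, y_max) enclosing all (x, y) landmarks,
    optionally expanded by a pixel margin on every side."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    # Top-left corner from the minima, bottom-right from the maxima.
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)
```

For example, `landmarks_to_bbox([(10, 20), (30, 5), (25, 40)])` returns `(10, 5, 30, 40)`. A small margin is often useful so that fingertips near the box edge are not clipped.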
Uses of the bounding box in hand gesture recognition: The bounding box has a
variety of applications in hand gesture recognition. The following are some examples
of common applications:
Classification of gestures: The area of the hand that is included within the bounding
box can be utilised as an input for gesture classification algorithms. In these algo-
rithms, features or patterns are retrieved from the hand region in order to categorise
the gesture that is being performed.
Gesture tracking: The bounding box can be used to follow the movement of the
hand or changes in hand positions across consecutive frames. This tracking is helpful
for analysing dynamic motions as well as tracking gestures over the course of time.
Gesture segmentation: Isolating the hand within the bounding box separates the
hand from the background or other objects present in the scene. This is helpful
for further examination or processing of the hand motions.
The first position, with the little finger raised, is for the next slide, as shown in Fig. 3.
The next position is for pointing to any place on the slide, as shown in Fig. 4, where
a personal computer is pointed out.
Figure 5 shows the writing in any place on the slide. Here we have put a tick mark
on the hardware and software.
Figure 6 shows the undo position using three fingers. Here one tick mark is
removed from Fig. 5.
Figure 7 position is for the previous slide. In it, we are using the thumb for the
previous slide.
5 Conclusion
Control of the slide show can be achieved through the use of dynamic gestures.
Specific fingers, such as the little finger, index finger, and thumb, can be used to
indicate the motion of the presentation slide. Because no training process is required
to identify a hand gesture, there is no need to save any images in a database in order
to recognise hand gestures. A method that
is based on hand segmentation, hand tracking, and gesture detection from extracted
hand characteristics has been suggested. The findings of the performance evaluation
of the system have demonstrated that the users are able to make use of this low-
cost interface to substitute more conventional interaction metaphors. The use of
hand gestures can be expanded to control real-time programmes such as Paint, PDF
Reader, and other similar programmes.
References
1. Matsuzaka Y, Yashiro R (2023) AI-based computer vision techniques and expert systems. AI
4(1):289–302
2. Shan C, Wei Y, Tan T, Ojardias F (2004) Real time hand tracking by combining particle filtering
and mean shift. In: Proceedings of the sixth IEEE automatic face and gesture recognition, FG04,
pp 229–674
3. Heap T, Hogg D (1998) Wormholes in shape space: tracking through discontinuous changes
in shape. In: Proceedings of the sixth international conference on computer vision, ICCV98,
pp 344–349
4. Jiang Y, Song L, Zhang J, Song Y, Yan M (2022) Multi-category gesture recognition modeling
based on sEMG and IMU signals. Sensors 22(15):5855
5. Kane L et al (eds) (2022) Challenges and applications for hand gesture recognition. IGI Global
6. Nigam S, Shamoon M, Dhasmana S, Choudhury T (2019) A complete study of methodology
of hand gesture recognition system for smart homes. In: 2019 International conference on
contemporary computing and informatics (IC3I), Singapore, 2019, pp 289–294. https://doi.
org/10.1109/IC3I46837.2019.9055608
7. Sharma H, Choudhury T (2022) Applications of hand gesture recognition. IGI Global, pp
194–207. https://doi.org/10.4018/978-1-7998-9434-6.ch010
8. Al Farid F, Hashim N, Abdullah J, Bhuiyan MR, Isa WNSM, Uddin J, Haque MA, Husen
MN (2022) A structured and methodological review on vision-based hand gesture recognition
system. J Imag 8(6):153
9. Faisal MAA, Abir FF, Ahmed MU, Ahad MAR (2022) Exploiting domain transformation and
deep learning for hand gesture recognition using a low-cost dataglove. Sci Rep 12(1):21446
10. Tripathi KM, Kamat P, Patil S, Jayaswal R, Ahirrao S, Kotecha K (2023) Gesture-to-text
translation using SURF for Indian sign language. Appl Syst Innov 6:35. https://doi.org/10.
3390/asi6020035
11. Rajalakshmi E et al (2023) Multi-semantic discriminative feature learning for sign gesture
recognition using hybrid deep neural architecture. IEEE Access 11:2226–2238. https://doi.
org/10.1109/ACCESS.2022.3233671
12. Charan CS, Meenakshi K, Bhavani Reddy V, Kashyap V (2023) Controlling power-point
presentation using hand gestures in real-time. In: 2023 7th International conference on trends
in electronics and informatics (ICOEI). IEEE, pp 251–254
13. Kumar C (2022) Hill climb game play with webcam using OpenCV. Int J Res Appl Sci Eng
Technol 10(12):441–453
14. Ahamed SF, Sandeep P, Tushar P, Srithar S (2023) Efficient gesture-based presentation
controller using transfer learning algorithm. In: 2023 International conference on computer
communication and informatics (ICCCI). IEEE, pp 1–5
15. Yuan T, Song Y, Kraan GA, Goossens RHM (2022) Identify finger rotation angles with ArUco
markers and action cameras. J Comput Inf Sci Eng 1–25
16. Tripathi D, Srivastava A (2021) Production of holograms through laser-plasma interaction with
applications. Int J Adv Res 9(12):227–231
17. Xu J, Wang H, Zhang J, Cai L (2022) Robust hand gesture recognition based on RGB-D data
for natural human–computer interaction. IEEE Access 10:54549–54562
18. Dang TL, Tran SD, Nguyen TH, Kim S, Monet N (2022) An improved hand gesture recognition
system using keypoints and hand bounding boxes. Array 16:100251
SQL Queries Using Voice Commands
to Be Executed
R. S. M. Lakshmi Patibandla , Sai Naga Satwika Potturi,
and Namratha Bhaskaruni
Abstract Add tables, delete tables, update tuples, and remove entries all require
SQL queries. SQL query execution could appear straightforward, yet even a small
mistake could result in serious issues. Keeping track of the queries and ensuring
that they are handled flawlessly is a time-consuming, exhausting procedure; without
such care, a bad query execution results in poor data handling and eventual data loss.
A simple alternative to typing a query is to speak it aloud and clearly and let the
computer handle it. Our software accomplishes this using built-in Python methods
and straightforward techniques. Furthermore, with basic knowledge of NLP methods
and how they operate, the time complexity can be greatly decreased, and comparable
outcomes can be achieved with little subject knowledge and high productivity.
1 Introduction
Two of the most popular technologies in the world of technology today are Python
and databases. This project’s primary goal was to merge these two rapidly developing
technologies and use Python to carry it out. Instead of typing out SQL queries, we just
dictate natural speech that is then translated into an SQL query. While data is typically
entered into phones via a keyboard, voice input has become a popular alternative. The
algorithm operates by removing the query’s important terms. After that, we create the
appropriate query and run it to get the results we want. Anyone who believes voice-
based input is superior to the traditional approach, or who is unfamiliar with Python
or SQL syntax, can use this program. Queries can be handled more successfully
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 413
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_33
Traditionally, data is input into a smartphone using the keypad and a simple icon
press. Individuals constantly use their smartphones to stay in touch with the outside
world. This includes using social media and other relationships in addition to phone
calls [8–12]. Therefore, we added a voice-enabled keyboard in addition to voice
search and voice commands like “Navigate to”. You can have access to a wide range
of tools and services through the Android SDK to improve the usability of your
applications for users who are deaf or visually impaired. Frameworks for speech
recognition, Text-To-Speech (TTS), and other examples are provided. Applications
for Android can use speech input and output.
Speech input and speech output using speech recognition and text-to-speech
services are made possible via the speech package included in the Android SDK [13–
15]. The Android database-management function formats the automatically produced
query as follows (Fig. 2):
Cursor c = db.query
2.2 Cyrus
Being declarative, SQL stands a far better chance than ordinary procedural program-
ming of serving as the language for conceptual computing, and it continues to be
refined and strengthened. The viability of using SQL as a backend for natural-language
database querying is examined here, with emphasis on keyword-based search
[16].
Our method dramatically reduces SQL querying, keyword dependency, and SQL
table structure constraints. Researchers provide Cyrus, a portable voice search inter-
face for mobile devices, to subjective social datasets. Cyrus offers a wide range of
inquiry-based lectures that are suitable for a database course at the section level.
Moreover, Cyrus is not constrained to a predefined collection of catchphrases or
natural language sentence structures, allows for test database customization, and is
application-autonomous. When compared to the majority of contemporary portable
and voice-enabled frameworks, its cooperative error reporting is more natural, and
the iOS-based portable platform is also more accessible [17]. Although declarative
programming seems more natural, research nevertheless finds the transition from
natural language to SQL to be extraordinarily challenging; users frequently struggle
to formulate demanding complex queries, especially those containing nested
subqueries or GROUP BY clauses.
This technique for extracting semantic data from social web sources has been dubbed
Natural Language to SQL generation. However, the challenge lies in extracting the
underlying meaning of the query. In response, Garima Singh introduced in 2016
an algorithm that converts natural language into SQL queries
for relational databases. This approach stems from the Three-Level Engineering of
NLTSQLC. Nevertheless, this method often comes with the drawback of potentially
computationally intensive processing of certain information [18]. Natural language
is one of the key disciplines of computer science, and it focuses on the interactions
that take place between computers and human language. Some of the most appealing
aspects of human–computer interaction can be found there.
This work integrates spoken language variations with both natural language and speech.
A database management system (DBMS) is software used to store and administer
data in databases, and retrieving information from one normally requires prior
familiarity with it. In this context, individuals lacking such specialized expertise may
encounter challenges when attempting to extract information [19].
Natural language processing techniques are used to resolve this issue and make it
easier for people to engage with computers. Natural language processing has appli-
cations in a variety of industries, including tourism, where a visitor can learn about
the top attractions in a city, housing options, the best locations nearby, and more.
Our approach focuses on identifying the correct query by receiving input in the form
of speech (Fig. 3).
The spoken question is accepted as input by the system, which then sends it to a
voice recognition engine. The output of that stage is the mixed-formatted input text
query. After being extracted, the right input query is then forwarded to tokenization.
The process of breaking the statement up into its component words and storing it
in the list is known as tokenization. After storing it in the list, unwanted tokens are
eliminated. The pre-stored synonym database, which comprises the words and their
synonyms, is used to map the tokens. The polished text is then passed to the text
translator, which includes a clause extractor and a mapper; using these, an intermediate
query is produced and tokens are associated with the table name and attributes.
The SQL query is the outcome of this stage. The database is used to process this
SQL query, and accurate results are shown on the interface. The command prompt
will display the SQL query [3, 20].
Algorithm
Step 1: Accept speech input from the user.
Step 2: Convert the speech to text using a speech recognition engine.
Step 3: Discard other statements in favour of the correct form of the statement,
which is kept.
Step 4: Tokenize the input query statement by splitting it into smaller pieces and
storing them in a list.
Step 5: Delete any unnecessary tokens from the list, such as "the", "an", etc.
Step 6: Map the tokens to the database attributes and table name.
Step 7: Locate the tables that will include the data.
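Steps 4–7 above can be sketched as a toy pipeline. This is a hedged illustration under simplifying assumptions (a hand-written stop-word list, a fixed synonym map, and a single known table with its columns), not the authors' implementation:

```python
# Illustrative sketch only: the stop words, synonym map, and schema below
# are assumed for the example, not taken from the chapter's system.
STOP_WORDS = {"the", "an", "a", "of", "all", "please", "from"}
SYNONYMS = {"show": "select", "display": "select", "get": "select"}
TABLE_COLUMNS = {"students": ["name", "age", "grade"]}  # assumed schema

def speech_to_sql(text):
    """Tokenize a recognized utterance, drop stop words, map synonyms,
    and build a simple SELECT query for a known table (Steps 4-7)."""
    tokens = [t.lower() for t in text.split()]             # Step 4: tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]    # Step 5: filter
    tokens = [SYNONYMS.get(t, t) for t in tokens]          # Step 6: map
    # Step 7: locate the table, then keep tokens that name its columns.
    table = next(t for t in tokens if t in TABLE_COLUMNS)
    cols = [t for t in tokens if t in TABLE_COLUMNS[table]]
    col_sql = ", ".join(cols) if cols else "*"
    return f"SELECT {col_sql} FROM {table};"
```

For example, `speech_to_sql("show the name of all students")` yields `"SELECT name FROM students;"`. A real system would also handle WHERE clauses, multiple tables, and unrecognized utterances, which this sketch deliberately omits.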
4 Experimental Work
5 Conclusion
The “Executing SQL Queries Using Voice Commands” project outlines a proposed
API that gives users the option to speak their native language and have it translated
into the appropriate SQL query. Speech inputs are widely used and getting
increasingly complex. This has altered how people live their daily lives and created
opportunities for a wide range of intriguing inventions and useful uses. This project
serves as a fundamental building element for apps that do away with the need for
traditional learning and query execution in favour of speaking queries aloud and
letting the system handle the conversion for the user. It uses the readings from the
microphones as inputs and forecasts a potential question for it. This article provides
a thorough analysis of the most recent developments in voice recognition and how
they might be applied to diverse projects that improve and simplify life. We lay
out the fundamental ideas behind using voice recognition and transforming spoken
language into something that can be turned into another programming language.
Further extensions to the initial project service are possible.
This can be done by adding a few more options, or by offering admin privileges
through incorporating TCL and DDL commands into the programme in addition
to the select privilege. At present, only the SELECT query from the DML commands
has been implemented.
References
1. Bhuyan HK, Chakraborty C (2022) Explainable machine learning for data extraction across
computational social system. IEEE Trans Comput Soc Syst 1–15
2. Bhuyan HK, Ravi VK (2023) An integrated framework with deep learning for segmentation
and classification of cancer disease. Int J Artif Intell Tools (IJAIT) 32(02):2340002
3. Bhuyan HK, Chakraborty C, Pani SK, Ravi VK (2023) Feature and sub-feature selection for
classification using correlation coefficient and fuzzy model. IEEE Trans Eng Manag 70(5)
4. Bhuyan HK, Ravi VK (2021) Analysis of sub-feature for classification in data mining. IEEE
Trans Eng Manag
5. Bhuyan HK, Saikiran M, Tripathy M, Ravi V (2022) Wide-ranging approach-based feature
selection for classification. Multimed Tools Appl 1–28
6. Bhuyan HK, Ravi V, Brahma B, Kamila NK (2022) Disease analysis using machine learning
approaches in healthcare system. Health Technol 12(5):987–1005
7. Dontha MR, Sri Supriyanka N (2023) Image-based disease detection and classification of plant
using CNN. In: Marriwala N, Tripathi C, Jain S, Kumar D (eds) Mobile radio communications
and 5G networks. Lecture notes in networks and systems, vol 588. Springer, Singapore. https://
doi.org/10.1007/978-981-19-7982-8_22
8. Pullagura L, Dontha MR, Kakumanu S (2021) Recognition of Fetal heart diseases through
machine learning techniques. Ann Roman Soc Cell Biol 25(6):2601–2615. https://www.ann
alsofrscb.ro/index.php/journal/article/view/5873
9. Gharge T, Chitroda C, Bhagat N, Giri K. AI-smart assistant. Int Res J Eng Technol (IRJET)
6(1). e-ISSN: 2395-0056
10. Nomura K, Rikitake K, Matsumoto R (2019) Automatic whitelist generation for SQL queries
using web application tests. In: 2019 IEEE 43rd annual computer software and applications
conference
11. Dekate A, Kulkarni C, Killedar R (2016) Study of voice controlled personal assistant device.
Int J Comput Trends Technol (IJCTT) 42(1). ISSN: 2231-2803
12. Anerao R, Mehta U, Suryawanshi A. Personal assistant for user task automation. SSRG Int J
Comput Sci Eng (SSRG-IJCSE)
13. Bais H, Machkour M, Koutti L. A model of a generic natural language interface for querying
database. Int J Intell Syst Appl 8:35–44. https://doi.org/10.5815/ijisa.2016.02.05
14. Meng F, Chu WW (1999) Database query formation from natural language using semantic
modelling and statistical keyword meaning disambiguation
15. Mahmud T, Azharul Hasan KM, Ahmed M, Chak THC (2015) A rule based approach for NLP
based query processing. In: 2015 2nd International conference on electrical information and
communication technologies (EICT), Khulna
16. Mohite A, Bhojane V (2015) Natural language interface to database using modified co-
occurrence matrix technique. In: 2015 International conference on pervasive computing (ICPC),
Pune, pp 1–4
17. Ghosh PK, Dey S, Sengupta S (2014) Automatic SQL query formation from natural language
query. In: International conference on microelectronics, circuits and systems (MICRO-2014)
18. Solangi YA, Solangi ZA, Aarain S, Abro A, Mallah GA, Shah A (2018) Review on natural
language processing (NLP) and its toolkits for opinion mining and sentiment analysis. In: 2018
IEEE 5th International conference on engineering technologies and applied sciences (ICETAS),
Bangkok, Thailand, pp 1–4
19. Huang B, Zhang G, Sheu PC (2008) A natural language database interface based on a prob-
abilistic context free grammar. In: IEEE International workshop on semantic computing and
systems, Huangshan, pp 155–162
20. Uma M, Sneha V, Sneha G, Bhuvana J, Bharathi B (2019) Formation of SQL from natural
language query using NLP. In: 2019 International conference on computational intelligence
in data science (ICCIDS), Chennai, India, pp 1–5. https://doi.org/10.1109/ICCIDS.2019.886
2080
A Compatible Model for Hybrid
Learning and Self-regulated Learning
During the COVID-19 Pandemic Using
Machine Learning Analytics
Abstract Educational models and learning styles are essential and have an evolu-
tionary necessity for the education industry. As a result, the research has identified
three objectives: (1) to study the context of hybrid learning management with self-
regulated learning style strategies during the COVID-19 pandemic, (2) to develop a
data science model for hybrid learning management with self-regulated learning
style strategies during the COVID-19 pandemic, and (3) to study the students’
learning achievements with the developed model. Data were collected from 44
higher education students who applied self-regulated learning styles in hybrid
learning situations during the COVID-19 pandemic at the School of Information and
Communication Technology, the University of Phayao. The research tool consisted of
statistical and supervised machine learning tools based on descriptive and predictive
analytics principles. The model performance evaluation employed a confusion matrix
and cross-validation techniques for testing. The research findings show that learners’
contexts in the COVID-19 pandemic have different learning behaviors and achieve-
ment styles under hybrid learning management strategies. The researchers success-
fully developed a prototype model for predicting learners’ learning achievement for
hybrid learning management with self-regulated learning style strategies. The results
of this research can further be used as a guideline for educational management in
unusual situations to improve the quality of learners and the academic industry.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_34
424 P. Nuankaew et al.
1 Introduction
The COVID-19 pandemic has devastated agencies and organizations worldwide [1–4],
forcing educational institutions [4], including schools, colleges, universities, and
training institutes, to shut down on-site services. Moreover, educational organizations
have changed their operations in other ways, including limiting working time,
establishing coordination procedures within the organization, and modifying the
operating model [4–6]. The relevant agencies have introduced measures to provide
online teaching and learning for institutions serving students in the formal education
system [5, 6]. Educational organizations must use distance learning mechanisms and
technologies to support teaching and learning [4]. However, these measures affect
many students, resulting in educational inequalities, inappropriate use of school
supplies, improper learning, and increased costs for parents and instructors.
Learning behaviors and cognitive participation differ across online, offline, and
hybrid learning environments [7–9]. The concept of mixing online and offline learning
is therefore referred to as the hybrid learning model. The relationship between the
effectiveness of online and offline study is an open space for researchers after
the COVID-19 pandemic. Moreover, applying artificial intelligence and machine
learning technologies to improve the quality of education is of growing interest to
modern educators [10, 11]. They use big data to understand learner behavior, predict
learner achievement, recommend educational programs appropriate to each learner's
potential [10, 12], and more. However, educators must understand learner behavior to
design an individualized learning approach.
From the significance and origin of the research, the primary purpose is to
develop a compatible model for hybrid learning and self-regulated learning during the
COVID-19 pandemic using machine learning analytics. There are three secondary
objectives. The first objective was to study the context of hybrid learning management
with self-regulated learning style strategies during the COVID-19 pandemic among
university students. The driving factor is the impact of policies and COVID-19 situ-
ations where conventional learning cannot be managed. Therefore, learning styles
must be evolved to keep pace with the times. The second objective was to develop
a data science model for hybrid learning management with self-regulated learning
style strategies during the COVID-19 pandemic. The data science model is the
instrument for studying and understanding data using advanced statistical principles.
It provides otherwise hidden insights into the data that can be useful in planning
educational strategies. The third objective was to study the students’ learning achievements with
the developed model. This objective aims to evaluate the model’s performance and
deploy the model to find faults or deficiencies to improve the model’s efficiency.
The research scope was to study the relationship between learners’ learning
achievement through hybrid learning management. Learners can choose either of two
learning channels. The first channel is regular learning management, taught in
the classroom, where students and teachers engage in face-to-face learning activities.
In the second channel, teachers broadcast the learning activities live from the
classroom. Students in both channels learn simultaneously, a mode known as synchronous learning.
In addition, students from both channels must undertake pre-test and post-test activ-
ities to assess the knowledge gained in each lesson. Students can choose to study in
any format without compulsion.
All learning activities consist of 15 lessons—Lesson 1: Information Technology
Overview, Lesson 2: Digital Systems Fundamentals, Lesson 3: Database System
Overview, Lesson 4: Computer Software, Lesson 5: Computer Hardware, Lesson 6:
Computer Networks and Communications, Lesson 7: Internet Technology, Lesson 8:
Social Media and Search Engine, Lesson 9: Multimedia Technology and Infographic,
Lesson 10: Knowledge Management Systems, Lesson 11: Mobile and Electronic
Commerce, Lesson 12: Impact of Information Technology, Lesson 13: Ethics and
Internet Security, Lesson 14: Contemporary Information Technology, and Lesson
15: Future Trends and Technologies.
This research furnishes researchers with methodological insights into analyzing
self-regulated learning dynamics for hybrid learning. The findings from this research
could also demonstrate and deepen understanding of the complexity and regulation
of abnormal learning situations, which positively impact students’ performance and
learning achievement.
The research population was students enrolled in the course 221101 [5] Fundamental
Information Technology in Business at the School of Information and Communica-
tion Technology, the University of Phayao, during the first academic year 2022.
Sample selection used purposive sampling with the consent of the learners in the
course, who were informed of the voluntary nature of the learning activities.
Students could learn online with Microsoft Teams or choose a face-to-face learning
style in the classroom. In addition, the researchers obtained research ethics approval
from the University of Phayao: UP-HEC 1.3/022/65.
The research design follows the self-regulated learning principles. The data collection
was a record of activities that occurred according to the self-regulated learning prin-
ciple, which was divided into four main activities: pre-test exams, post-test exams,
midterm exam, and final exam activities.
Pre-test and post-test exams draw on the same set of questions, randomly distributed
during the exercise. Each test set consists of 10 multiple-choice questions with
10 min to complete. In addition, the exams are administered online, so students see
their scores immediately after finishing. During the activities in each class, learners
take the pre-test, review their scores, and then set post-test goals.
The midterm exam activity is an activity to assess the synthesis of knowledge
acquired during the period from Week 1 to Week 8. Finally, the final exam activity
is a compilation of the knowledge gained from Week 10 to Week 16.
Of the four main activities, the researchers extracted 63 attributes from seven
categories: 15 pre-test exams, 15 post-test exams, 15 pre-test duration exams, 15
post-test duration exams, midterm exam scores, final exam scores, and learning
outcomes (grades).
Research tools are divided into two parts: descriptive analytics and predictive
analytics. Descriptive analytics is a fundamental analysis that gives an overview
of the data and the relationships within it. It is used to explain what has happened
in the past and may assist in decision-making, employing statistics such as
proportions or percentages, measures of the data's central tendency, and correlations
within the dataset. The constituent tools in this section are the mean, mode, median,
maximum, minimum, standard deviation (S.D.), and percentage.
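As a concrete illustration, the descriptive tools listed above can be computed with Python's standard statistics module; the scores below are invented toy values, not the study's data.

```python
import statistics

# Toy pre-test scores out of 10 (illustrative only, not the study's data)
scores = [3, 4, 4, 5, 2, 6, 4, 5, 3, 7]

mean_score = statistics.mean(scores)       # central tendency: mean
median_score = statistics.median(scores)   # central tendency: median
mode_score = statistics.mode(scores)       # most frequent score
sd = statistics.stdev(scores)              # sample standard deviation (S.D.)
max_score, min_score = max(scores), min(scores)
passing_pct = 100 * sum(s >= 5 for s in scores) / len(scores)  # a percentage
```

Each quantity corresponds directly to one of the constituent tools named above.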
The second research tool is predictive analytics. It serves as technology that learns
from experience or previous data to predict certain behaviors that will occur in
the future. It comprises several techniques, including advanced statistics, machine
learning, and data mining. In many areas, predictive analytics models patterns
derived from historical data to identify the opportunities or risks underlying the
many decisions made daily.
Predictive analytics research tools are separated into three phases: clustering
optimal learner behavior, constructing predictive models, and performing majority
voting. Optimal clustering aims to understand clustered learning behaviors and
intra-cluster relationships. The techniques used include K-Means and K-Medoids.
K-Means is an unsupervised learning technique that is easy to understand: it groups
records by distance so that members of the same cluster are closely spaced, using
Euclidean distance to compare similarity [13, 14]. K-Medoids follows the same
principle as K-Means, differing in that it takes actual data points from the collected
dataset as the cluster centers [14, 15].
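The K-Means idea described above can be sketched with a minimal implementation in plain Python. The 2-D points below are invented toy (pre-test, post-test) pairs, not the study's records, and the real analysis relied on established tool implementations.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: assign each point to the nearest center by Euclidean
    distance, then move each center to the mean of its assigned members."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from actual points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the coordinate-wise mean of its cluster;
        # keep the old center if a cluster happens to be empty.
        centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated toy groups of (pre-test, post-test) scores.
pts = [(1, 2), (1, 3), (2, 2), (8, 9), (9, 8), (8, 8)]
centers, clusters = kmeans(pts, k=2)
```

K-Medoids would differ only in the center-update step, choosing the member point that minimizes total intra-cluster distance instead of the mean.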
Constructing predictive models was the second phase of the process, with the
researchers specifying two designs: a single model analysis and an ensemble model
analysis. The single model analysis consists of three types of learning tools:
Decision Tree, K-Nearest Neighbors (KNN), and Naïve Bayes techniques [10, 12, 14].
The ensemble model analysis combines multiple supervised learning tools to develop
the most efficient model. The techniques used in this section include Majority Vote,
Gradient Boosted Trees, and Random Forests.
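A minimal sketch of the majority-vote idea follows, assuming three hypothetical per-student prediction lists; the labels and predictions are illustrative, not the study's outputs.

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-record class predictions from several models by taking
    the most common label for each record."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*prediction_lists)]

# Hypothetical per-student predictions from three single models.
tree  = ["pass", "pass", "fail", "pass"]
knn   = ["pass", "fail", "fail", "pass"]
bayes = ["fail", "pass", "fail", "pass"]

combined = majority_vote(tree, knn, bayes)
# combined -> ["pass", "pass", "fail", "pass"]
```

The ensemble prediction for each record is whichever class most of the single models agree on, which is why a vote can outperform any one of its members.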
Please note that the researchers apply the optimal clustering results, together with
the hybrid learning styles, to develop the most appropriate model for describing their
findings. Every predictive model uses the result of each clustering technique as the
class label for each record. The most appropriate model was then used to study the
context of the learners in each cluster, whom the researchers predicted based on
actual data from the course 221101 [5] Fundamental Information Technology in
Business at the School of Information and Communication Technology, the University
of Phayao, during the first academic year 2022.
The results analysis aims to determine the model’s effectiveness developed from the
designed research process. The model performance testing process consists of two
parts. The first part is to design data partitions to develop the model and prepare
the data to test the model. The method used in this section is known as “the
cross-validation technique”. It divides the data into K equal parts, called “folds”
(K-Fold), where K is the number of groups into which the data are divided. After
partitioning, some folds are used to create the model, called “the training data set”,
and the remaining fold is used to test it, called “the testing data set”. This step
works together with “the confusion matrix” to determine the capability and potential
of the developed model.
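The fold-splitting step can be sketched in plain Python; the 10 records and 5 folds below are illustrative and are not the study's actual partitioning.

```python
def kfold_indices(n, k):
    """Split record indices 0..n-1 into k folds; each fold serves once as
    the testing set while the remaining folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, test))
    return splits

# 10 records, 5 folds: every record is tested exactly once.
splits = kfold_indices(10, 5)
```

Each of the k rounds trains on k − 1 folds and tests on the held-out fold, so the confusion matrix can be accumulated over all rounds.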
The confusion matrix is a cross-tabulation that summarizes two columns, the actual
and the predicted classes, in one table. The popularity of the confusion matrix stems
from its being elementary to create while allowing multiple statistics to be calculated
from the table. The four most commonly used classification metrics are the Accuracy,
Precision, Recall, and F1-Score indicators.
Accuracy is the overall correctness of the model. It is calculated by dividing the
number of correct predictions by the total number of cases.
Precision considers the ability to predict each class: it divides the number of
correct predictions for a class by the total number of cases predicted as that class.
Recall summarizes accurate prediction against the actual data in each class: it
divides the number of correct predictions for a class by the number of cases actually
present in that class.
Lastly, the F1-Score is the harmonic mean of precision and recall, calculated
using the formula in Eq. 1:

F1-Score = 2 × (Precision × Recall)/(Precision + Recall) (1)
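Putting the four metrics together, the following sketch uses an invented two-class confusion matrix; the counts are hypothetical and are not the study's results.

```python
# Hypothetical 2-class confusion matrix (rows = actual, columns = predicted);
# the counts are invented for illustration, not the study's results.
tp = 35   # actual "achieved", predicted "achieved"
fn = 4    # actual "achieved", predicted "not achieved"
fp = 3    # actual "not achieved", predicted "achieved"
tn = 2    # actual "not achieved", predicted "not achieved"

accuracy = (tp + tn) / (tp + tn + fp + fn)          # overall correctness
precision = tp / (tp + fp)                          # correct among predicted "achieved"
recall = tp / (tp + fn)                             # correct among actual "achieved"
f1 = 2 * precision * recall / (precision + recall)  # Eq. 1: harmonic mean
```

In a multi-class setting, precision and recall are computed per class in the same way, one class column or row at a time.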
The research results were divided into three sections corresponding to the research
objectives: the context summary of the hybrid learning management model, the
results of model development, and the results of selecting the most appropriate model.
Context Summary of Hybrid Learning Model Management
The researchers designed the learning activities according to the principle of self-
regulated learning by having the students do pre-test activities, set learning goals,
and do post-test activities. The researchers can summarize the learners’ context, as
shown in Tables 1 and 2.
Table 1 shows that learners were more interested in the post-test than in the pre-test
learning activities. The researchers found that post-test scores were significantly
higher than pre-test scores, averaging 9.22 points against a pre-test average of
4.29 points. However, the average amount of time spent on the post-test activities
tended to be lower, probably because learners come to know the answers during the
learning activities.
Discussion of the research findings is the last step, which the researchers can
undertake once the analysis and results have been obtained. This research achieved
all of its objectives, and the researchers identified three key points: the learners'
context in hybrid learning, the developed predictive models, and the model
performance, classified according to the following essential techniques.
Learners’ Context in Hybrid Learning Style
This research found that hybrid learning management during the COVID-19
pandemic in higher education achieves particularly satisfactory levels of learning
achievement, as shown in Tables 1 and 2.
Table 1 shows the analysis of the learning behavior of the learners who participated
in all 15 activities throughout the semester. The conclusion from Table 1 is that
the learners improved and scored well in the post-test, with an average of 9.22 points,
while tending to spend less time on the post-test than on the pre-test activities.
From this, it can be concluded that hybrid learning management, in which learners
can choose to study online or to participate in classroom activities, yields no
difference in results. It should therefore be promoted more widely to extend the
educational options available to learners.
Moreover, Table 2 shows that the majority of learners achieved learning achievement;
only 11.36%, or five students, failed. This group of learners was clustered and
analyzed for their probability of not achieving, as the researchers clustered the data
and created a predictive model, reported in the model development section.
Reasonable Predictive Model
The designed model was developed in two main steps, appropriate clustering and
model construction consistent with the student context, as shown in Tables 3,
4, 5 and 6.
The researchers clarified and applied a suitable cluster analysis principle by
comparing the results of the two clustering techniques, K-Means and K-Medoids, in
Tables 3 and 4, with the selection of the optimal k value shown in Table 5. They
found that the optimal number of clusters for grouping the learners was three; the
members of each cluster are distributed as in Table 5. Moreover, the K-Means
technique was found to give a more uniform member distribution. The researchers
therefore decided to use the K-Means technique, dividing the members into three
clusters, to develop the predictive model of learner characteristics in this research.
After obtaining a suitable cluster, the researchers developed a prediction model
classified into two types: a single model analysis and an ensemble model analysis.
The analytical results are shown in Table 6. The research concluded that the model
that should be utilized and developed further is the majority vote model, which has
an accuracy of 90.50%; its detailed performance is shown in Table 7.
Table 7 shows that the majority vote model distributes its predictions effectively
across all clusters. The researchers therefore concluded that this research was
successful, achieved all of its objectives, and deserves to be disseminated further.
4 Conclusion
This research studies a compatible model for hybrid learning and self-regulated
learning during the COVID-19 pandemic using the K-Means and K-Medoids tech-
niques for clustering students, the elbow technique for optimal clustering results,
and classification and ensemble techniques for creating a predictive model and the
model performance analysis. The results showed that three clusters were optimal for
describing the learners' behavior, consistent with the best-performing model: the
majority vote model, which had the highest validity with an accuracy of 90.50%.
This helps instructors manage course activities for the learners' context in hybrid
learning management with self-regulated learning style strategies and improve
students' learning achievements. In future studies, the researchers plan to apply the
approach in another course and to study the relationship between test scores and
testing time.
5 Research Limitations
The limitation of this research is that, although the researchers designed and strictly
controlled the activities, they did not use pre-test scores to analyze achievement or
grades. As a result, a few groups of students did not cooperate with the pre-test
activities, concentrating instead on the post-test activities, and the researchers had
to use means to replace the missing values in the clustering analysis. Future research
therefore requires strategies for engaging students in all of the designed activities.
Acknowledgements This research project was supported by the Thailand Science Research and
Innovation Fund and the University of Phayao (Grant No. FF66-UoE002). In addition, this research
was supported by many advisors, academics, researchers, students, and staff. The authors would
like to thank all of them for their support and collaboration in making this research possible.
References
S. Modi
Karmaveer Bhaurao Patil College of Engineering, Satara, India
e-mail: [email protected]
Y. Mali (B) · A. Pathan
G.H. Raisoni College of Engineering and Management, Wagholi, Pune, Maharashtra, India
e-mail: [email protected]
A. Pathan
e-mail: [email protected]
R. Kotwal
JSPM’s Bhivarabai Sawant Institute of Technology and Research, Pune, India
e-mail: [email protected]
V. Kisan Borate
Dr. D. Y. Patil College of Engineering and Innovation, Talegoan, Pune, India
e-mail: [email protected]
P. Khairnar
Ajeenkya D. Y. Patil School of Engineering, Pune, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_35
1 Introduction
Effective communication plays a pivotal role in the lives of all creatures, serving as the
foundation for meaningful interactions and mutual understanding among humans, as
McFarland noted [1]. It enables the establishment of relationships, fosters intimacy,
and acts as a conduit for sharing knowledge and understanding between people and
organizations. Despite its significance, over 250–300 million individuals globally
face hearing and speech impairments, according to Wikipedia. For those who are
deaf or mute, sign language stands as their primary communication method [2]. Yet,
the challenge of communicating with those who are hearing arises from the general
lack of familiarity with sign language, as highlighted by studies into sign language
translation systems. Sign language, a rich natural language, employs hand shapes,
positions, movements, and facial expressions to convey meaning, boasting its own
grammar and vocabulary akin to spoken languages [3]. Nonetheless, sign language
proficiency among hearing people is rare, and research into its translation is still
nascent in many regions.
Gesture recognition technology, crucial for bridging communication gaps,
branches into sensor-based and vision-based systems [4]. Sensor-based approaches
utilize data gloves or motion sensors to capture detailed gesture information,
providing precision but at the cost of convenience, as wearing a sensor-laden glove
can hinder natural signing flow and reduce user comfort [5]. Vision-based systems, in
contrast, rely on image processing to detect and analyze gestures, offering a less intru-
sive user experience since they require no additional wearable devices [6]. However,
this method faces its own challenges, including dealing with complex backgrounds,
variable lighting conditions, and recognizing gestures that involve more than just
hand movements.
1.1 Objectives
2 Literature Review
Hand gesture recognition technology has emerged as a vital aid for enhancing
communication for people who are deaf or speech-impaired, using computer vision
and machine learning algorithms to translate hand movements into speech or
text [7]. This literature review synthesizes recent research efforts aimed at
advancing hand gesture recognition to better support deaf and speech-impaired
individuals [8].
This survey examines studies that have investigated hand gesture recognition as
a way to facilitate communication for people who are deaf, mute, or hearing-impaired
[9]. These studies employ a range of machine learning techniques, including
convolutional neural networks (CNN), support vector machines (SVM), recurrent
neural networks (RNN), and artificial neural networks (ANN), to interpret hand
movements for communication. Key findings, methods, and contributions from each
study are highlighted [10].
Among the notable works, one presented a hand gesture recognition framework using
CNN, achieving significant accuracy in sign gesture recognition to enable effective
user communication, while another investigated an interactive system combining
speech and gesture recognition through CNN [11]. Others developed an assistive
glove enhancing communication for the deaf or hard of hearing by combining voice
and gesture modalities, focused on real-time American Sign Language (ASL) letter
recognition using SVM with promising applications in sign language translation, and
introduced a framework for recognizing hand gestures and converting them into
speech, supporting communication with non-signers [12].
Further works proposed a real-time gesture recognition framework using an LSTM
model for speech conversion and developed a bilingual sign recognition framework
using image-based methods, enabling conversations in sign language. “Simple Talk”
translates Sri Lankan Sign Language into spoken language using artificial
intelligence, facilitating interaction with non-SLSL speakers [13]. Another study
presented a CNN-based conversation engine for those with hearing and vocal
disabilities and investigated action and sign language recognition using machine
learning [14].
Further contributions from various authors include a modified model for gesture
identification, a PCA-based CNN technique for sign language recognition, and a
CNN-based framework for static hand gesture recognition [15]. One study examined
CNN-based feature fusion for recognizing dynamic sign gestures, offering insights
into feature extraction and classification for sign language recognition [16].
The reviewed studies highlight a range of machine learning applications, from CNNs
and SVMs to RNNs and ANNs, for hand gesture recognition aimed at helping people
with hearing, speech, or language impairments [17]. These advances demonstrate the
potential of assistive technologies to close the communication gap for people with
disabilities, highlighting the ongoing evolution of gesture recognition for enhanced
communicative interactions [18].
3 Proposed Methodology
In the data collection phase of our study, we leverage the OpenCV library for image
manipulation alongside the CVZone module for hand detection and gesture
classification. Our dataset encompasses 27 unique static hand gestures, with each
gesture represented by approximately 2100 images of 256 by 256 pixels. The hand
detection module uses sophisticated computer vision strategies to detect and track
human hands in video feeds in real time. This module precisely determines the
location of the hands within a frame and computes their 2D key points, which include
a total of 34 points covering the fingertips, the center of the palm, and the wrist
positions [22].
Figure 2 shows examples of some of the hand gesture classes featured in our
research, including “a, b, c, Clear, Expert Clear, and Space.”
Choosing a suitable number of epochs for training our model is a crucial decision
in our research process. Training for more epochs can improve the model's accuracy
but may extend the training duration and raise the risk of overfitting. Conversely,
too few epochs might not allow the model to learn sufficiently, leading to
underfitting and, in turn, reduced accuracy and poor performance. To address this,
we conducted numerous trial-and-error experiments, ultimately choosing to train our
model in two stages, each consisting of 43 epochs. This approach aims to balance
accuracy and training time effectively, ensuring optimal model performance [23].
expertise. It offers a straightforward platform for training machine learning models
by providing labeled examples of data.
The application facilitates the creation of three kinds of models: image
classification, audio classification, and pose classification. Image classification
models can learn to recognize different objects or patterns in images, while audio
classification models can distinguish different sounds. Pose classification models,
in turn, can recognize different body postures and gestures.
To train a model using Teachable Machine, users input labeled data examples for
the model to learn from. The program then uses machine learning techniques to train
the model on the given data. Users have the option to test and refine the model to
enhance its accuracy.
Teachable Machine serves as a valuable tool for numerous applications, including
education, creative projects, and research. Its intuitive interface and the absence of
programming prerequisites make it accessible to a broad audience, enabling a large
number of users to apply machine learning technology effectively.
In this work, we use the following Python libraries:
• CV_Zone
“CV” stands for “Computer Vision,” a branch of artificial intelligence dedicated
to enabling machines to interpret and understand visual stimuli from their
surroundings, including images and videos. The “CV_Zone” label denotes a particular
domain or application within computer vision technology. Computer vision has diverse
practical applications, including image and video recognition, facial recognition,
object tracking, autonomous vehicles, and medical image analysis, among others.
• Open_CV
Open_CV is a broad open-source library covering computer vision, machine learning,
and image-processing tasks. It offers support for multiple programming languages,
including Python, C++, and Java. With its capabilities, Open_CV can analyze videos
and photos to identify objects and individuals and even decipher human handwriting.
Integration with other libraries, such as the numerical operations library Num_Py,
expands its utility manifold: by combining Open_CV with Num_Py, one can execute all
operations feasible with Num_Py, enhancing the arsenal of tools at one's disposal.
Working through an Open_CV tutorial provides a thorough understanding of
image-processing principles, and a variety of Open_CV projects allow one to delve
into more intricate concepts, including advanced image and video manipulation.
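As a small illustration of the Open_CV-with-Num_Py idea, the snippet below performs a simple global threshold using Num_Py array operations alone; the 4 × 4 pixel array is synthetic, and a real pipeline would obtain frames through Open_CV instead.

```python
import numpy as np

# Synthetic 4x4 grayscale "image" with a dark left half and a bright right
# half (values 0-255); invented for illustration, not from our dataset.
img = np.array([[ 10,  50, 200, 220],
                [ 30,  80, 210, 240],
                [ 20,  60, 190, 230],
                [ 40,  70, 205, 250]], dtype=np.uint8)

# A global threshold expressed as a Num_Py comparison: pixels above 128
# become foreground (255), the rest become background (0).
mask = (img > 128).astype(np.uint8) * 255
```

Because Open_CV images in Python are Num_Py arrays, element-wise operations like this compose directly with Open_CV's own functions.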
Hand Gesture Recognition and Real-Time Voice Translation … 441
• T_kinter
Owing to its simplicity and adaptability, T_kinter remains a favored option for
crafting Python desktop applications with intuitive GUIs.
4 System Flowchart
See Fig. 3.
Fig. 3 Flowchart
All alphabets were successfully predicted with 97.68% accuracy. The predicted
alphabetic images are depicted in Fig. 4(i), (ii).
After completing the words, our attention shifted towards creating sentences. We
achieved real-time and optimal accuracy and efficiency in this endeavor as well.
Figure 6 showcases the results of speech conversion, sentence generation, and hand
gesture recognition.
The proposed model, using Adam as the optimizer, achieves a training accuracy of
97.85% and a validation accuracy of 92.63%. Typically, such networks comprise many
layers and a large number of filters. Table 1 presents statistical information
regarding accuracy and test size sorted by class. Figure 7a, b depict diagrammatic
representations of accuracy per epoch and loss per epoch.
6 Conclusions
This study presents the development and deployment of a framework for American
Sign Language (ASL) recognition. ASL facilitates ease of communication for people
with impairments, as it allows the direct use of the alphabet. We enhanced our model
by incorporating additional ASL signs, increasing both efficiency and accuracy. To
further refine the recognition framework, it is essential to gather more datasets
under varying lighting conditions.
IoT-Based Smart EV Power Management
for Basic Life Support Transportation
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 451
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_36
452 M. Hema Kumar et al.
1 Introduction
Petroleum-based fuels have unquestionably risen to the top of the global trans-
portation fuel market [1–3]. However, alternative fuels and propulsion systems
that can boost efficiency and lower emissions are being pursued due to consider-
ations including the United States’ rising dependency on dwindling oil sources,
environmental concerns, CAFE (corporate average fuel economy) rules, etc. The
improvement in fuel economy has several advantages.
Despite an economy that is becoming more energy efficient, the United States
still depends on foreign oil [4]. Around 11.5 million of the 19.5 million barrels of oil
that Americans use each day are imported. Almost half of the oil used in the United
States is for vehicles and trucks.
The United States has the world’s highest rate of carbon emissions, a third of which are related to transportation. One major contributing cause of the rise in carbon emissions is the failure of automakers to improve fuel efficiency.
Future automobiles will need to provide more power to enable technologies such as collision avoidance, vehicle stability control, and navigation, while consuming less gasoline and emitting less carbon dioxide, the main greenhouse gas.
There is a very high demand for energy due to global economic growth and rapid
population growth. Conventional fossil fuels, such as coal and oil, are expensive
and seriously pollute the environment. Using renewable energy sources such as biomass and co-generation, together with battery storage, to meet energy needs is essential for societal progress and sustainable growth [5–7]. Energy efficiency and the use of renewable energy sources are the two main tenets of a sustainable energy system.
The proposed hybrid systems include battery storage. In this strategy, the battery serves as a buffer, and renewable resources are used as the primary energy sources. Surplus power generated by the system is sent to the battery; once the battery is fully charged, the excess is diverted to the dump load. If the output of the renewable sources is insufficient due to weather-related problems, the battery backup supplies electricity to meet the load demand. Several energy sources are additionally connected to the bus through suitable interface circuits. The proposed hybrid system can also be easily expanded when additional generation resources become available.
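The energy-routing rules described above (surplus charges the battery, overflow goes to the dump load, and the battery backs up the load on a deficit) can be sketched as a single dispatch step. The function, parameter names, and SOC thresholds below are illustrative assumptions, not details given in the chapter:

```python
def dispatch(generation_w, load_w, soc, capacity_wh, dt_h=1.0,
             soc_max=0.95, soc_min=0.20):
    """Route hybrid-system power for one time step (illustrative sketch).

    Returns the updated state of charge, the power (W) diverted to the
    dump load, and the power (W) drawn from the battery backup.
    """
    surplus = generation_w - load_w
    dump_w = 0.0
    backup_w = 0.0
    if surplus >= 0:
        # Surplus charges the battery; any overflow goes to the dump load.
        headroom_wh = (soc_max - soc) * capacity_wh
        charge_wh = min(surplus * dt_h, headroom_wh)
        soc += charge_wh / capacity_wh
        dump_w = surplus - charge_wh / dt_h
    else:
        # A deficit is covered by the battery down to its minimum SOC.
        needed_wh = -surplus * dt_h
        available_wh = (soc - soc_min) * capacity_wh
        drawn_wh = min(needed_wh, available_wh)
        soc -= drawn_wh / capacity_wh
        backup_w = drawn_wh / dt_h
    return soc, dump_w, backup_w
```

For example, with 1000 W generated against a 400 W load and a half-charged 1 kWh battery, the battery absorbs energy up to its SOC ceiling and the remaining surplus is dumped.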
3 Literature Survey
4 Proposed Method
For monitoring SOC and SOH battery conditions in the vehicle, the suggested system
includes a number of sensors, including a current sensor, voltage sensor, and a temper-
ature sensor. Voltage and current sensors are used to continuously monitor the battery
voltage and current, respectively. The current sensor can measure currents up to 5 A,
while the voltage sensor can detect voltages up to 25 V DC. The battery’s temperature
will be measured by the temperature sensor, which will then determine the battery’s
performance. The outputs from the current, voltage, and temperature sensors are
analogue in nature, so we must convert them into digital format. To do this, we will
utilize the controller’s built-in 10-bit, 13-channel ADC. A DC–DC converter is used, and a thermoelectric cooler is placed above the battery to cool it when its temperature rises above normal. An ESP8266 NodeMCU is used as the controller (Fig. 1).
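The raw 10-bit ADC counts must be scaled into physical units for the sensor ranges stated above (0–25 V DC for voltage, up to 5 A for current). The sketch below assumes linear sensors whose full range maps onto the ADC's full input span, and a hypothetical 40 °C cooler threshold; none of these scalings are specified in the chapter:

```python
ADC_MAX = 1023  # full-scale count of a 10-bit ADC

def adc_to_voltage(count, full_scale_v=25.0):
    """Map a raw ADC count to battery voltage, assuming the sensor's
    divider linearly scales 0-25 V onto the ADC's full input range."""
    return count / ADC_MAX * full_scale_v

def adc_to_current(count, full_scale_a=5.0):
    """Map a raw ADC count to battery current for a linear 0-5 A sensor."""
    return count / ADC_MAX * full_scale_a

def cooler_on(temp_c, threshold_c=40.0):
    """Enable the thermoelectric cooler when the battery temperature
    exceeds a (hypothetical) threshold in degrees Celsius."""
    return temp_c > threshold_c
```

A full-scale count of 1023 thus reads as 25.0 V, and a mid-scale count of about 511 reads as roughly 2.5 A.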
Advantages of the proposed system are the following:
• Electric vehicles (EV) contribute significantly to lowering the carbon footprint
caused by the transportation sector’s gasoline usage.
• When an electric car is parked, the energy stored in its batteries sits idle. Tapping it provides additional advantages, including distributed generation, voltage regulation, peak shaving, and fast charging for electric vehicles. For this reason, hybrid bi-directional Dual Active Bridge (DAB)-based AC-DC converters must be used to charge the batteries of electric vehicles.
• The primary subjects of this chapter are the design and implementation of a fixed, isolated, high-efficiency SiC AC-DC converter. The clamp circuit’s negative effects on the input current are removed. High power density is made possible by using Silicon Carbide (SiC) transistors rather than conventional silicon devices.
• SiC devices provide lower switching losses than silicon devices and minimise the size of the filter components. Since they operate at high temperatures, designers can employ a smaller heat sink, minimising the cost of the system’s overall design.
The combination of conventional and clean energy production under the expected atmospheric circumstances must satisfy the hydraulic and electrical requirements. For this reason, the system will use battery power and water reservoirs to smooth out or minimise both power shortages and water shortages.
There are two methods for distributing the electricity produced by renewable sources between the electrical and hydraulic demands; the first is referred to as an “uncoupled power management technique.” Under this method, the electrical and hydraulic loads are served according to their needs alone (i.e., the demand for water and power), regardless of how much intermittent power is being generated. For instance, the need for water is satisfied by operating the motor pumps at low power (i.e., Level L1 for the first motor pump and Level L2 for the second) until the tank is full. This is compatible with the traditional method of managing such loads, which activates a pump to replenish the tank only when it dips below a specific level; it is similar to a “flushing technique.”
The intermittent nature of the renewable sources, as well as the battery capacity, places a cap on the amount of energy available. Two methods are therefore devised to control the battery’s state of charge (SOC) in accordance with the needs of the load: method 1 gives the electrical load precedence over the hydraulic load, while method 2 gives the hydraulic load precedence.
In other words, under method 1, electric loads are served first, followed by the hydraulic pumps, while the opposite holds under method 2. Neither method relies on the renewable source for load management. In this instance, the battery’s power and energy requirements must be adjusted to account for the imbalance between the source and the required loads.
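The two load-priority strategies above can be expressed as one allocation routine that serves the load types in priority order. The function name, units, and dictionary representation are illustrative assumptions for this sketch:

```python
def allocate(available_wh, electrical_wh, hydraulic_wh, strategy=1):
    """Serve the two load types from the available energy in priority order.

    Strategy 1 serves the electrical load first; strategy 2 serves the
    hydraulic load first. Returns the energy granted to each load type.
    """
    order = [("electrical", electrical_wh), ("hydraulic", hydraulic_wh)]
    if strategy == 2:
        # Hydraulic load takes precedence under strategy 2.
        order.reverse()
    served = {}
    for name, demand in order:
        grant = min(demand, available_wh)  # serve as much as remains
        served[name] = grant
        available_wh -= grant
    return served
```

With 100 Wh available against 80 Wh of electrical and 50 Wh of hydraulic demand, strategy 1 fully serves the electrical load and leaves the pumps short, while strategy 2 does the reverse.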
6 Microcontroller
The ATmega16 AVR microcontroller is an excellent and affordable solution for many control applications. It combines an 8-bit RISC CPU with in-system self-programmable flash memory [15]. A wide range of system and software development tools is available for the ATmega16 AVR, including C compilers, macro assemblers, program debuggers/simulators, in-circuit emulators, and evaluation kits.
Because of its form factor, the main product is regarded as a non-standard item. With fine-grained energy monitoring, one can tell when something important happens. The conversion begins after a little under 0.2 s. Monitoring is possible both with and without a connection to the grid, the converter, and the load (Figs. 2 and 3).
Fig. 2 Results
Fig. 3 Results
The hybrid battery system and inverter each run for 0.6 s per cycle to measure the system’s effectiveness, and the system meets the stated performance requirements. In a stand-alone hybrid battery device, a voltage sensor works with a small switch to produce a reference current when a given condition is identified.
Internal resistance increases with cell and battery ageing. To analyse whether this parameter is useful for SOH determination from on-board data, internal resistance was compared against temperature, as shown in Fig. 4. Instead of an ascending slope as the SOH decreased, this relation was observed to fall independently, showing that something more relevant than ageing was driving the behaviour: temperature, which is a key factor for SOH calculation.
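A common way to estimate internal resistance from on-board voltage and current readings is the voltage sag across a current step, R = -ΔV/ΔI. The chapter does not spell out its calculation, so the function below is an assumed sketch of that standard method:

```python
def internal_resistance(v_before, i_before, v_after, i_after):
    """Estimate internal resistance (ohms) from the voltage sag that
    accompanies a discharge-current step: R = -dV/dI.

    A larger current draw (i_after > i_before) pulls the terminal
    voltage down, so the ratio comes out positive for a healthy reading.
    """
    di = i_after - i_before
    if di == 0:
        raise ValueError("a nonzero current step is required")
    return -(v_after - v_before) / di
```

For example, if the terminal voltage drops from 12.6 V to 12.3 V as the load current steps from 1 A to 4 A, the estimate is 0.3 V / 3 A = 0.1 Ω. Tracking this estimate alongside temperature, as in Fig. 4, separates thermal effects from ageing.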
8 Conclusion
An overview is provided of vehicle sensors, some of which are also present in EVs. Last but not least, we discussed the many kinds of microfabricated sensors that have only recently reached the market as a result of MEMS-based research and that may be used for a variety of tasks, including energy harvesting, motion detection, and battery sensing. These tiny sensors will help future automobiles save money, free up space, and improve their sensing capabilities.
References