
Lecture Notes in Electrical Engineering 1258

Sarika Jain
Nandana Mihindukulasooriya
Valentina Janev
Cogan Matthew Shimizu Editors

Semantic
Intelligence
Select Proceedings of ISIC 2023
Lecture Notes in Electrical Engineering

Volume 1258

Series Editors

Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Napoli, Italy
Marco Arteaga, Departamento de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany
Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore,
Singapore
Rüdiger Dillmann, University of Karlsruhe (TH) IAIM, Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Dipartimento di Ingegneria dell’Informazione, Sede Scientifica Università degli Studi di Parma,
Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid,
Spain
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences, Warsaw,
Poland
Alaa Khamis, Department of Mechatronics Engineering, German University in Egypt El Tagamoa El Khames,
New Cairo City, Egypt
Torsten Kroeger, Intrinsic Innovation, Mountain View, USA
Yong Li, College of Electrical and Information Engineering, Hunan University, Changsha, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, USA
Ferran Martín, Departament dʼEnginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, USA
Subhas Mukhopadhyay, School of Engineering, Macquarie University, Sydney, Australia
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, USA
Toyoaki Nishida, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova,
Genova, Italy
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
Federica Pascucci, Department di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore,
Singapore
Joachim Speidel, Institute of Telecommunications, University of Stuttgart, Stuttgart, Germany
Germano Veiga, FEUP Campus, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Walter Zamboni, Department of Computer Engineering, Electrical Engineering and Applied Mathematics,
DIEM—Università degli studi di Salerno, Fisciano, Italy
Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Kowloon Tong, Hong Kong
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada
Michael Luby, Senior Editor ([email protected])
All other Countries
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
Sarika Jain · Nandana Mihindukulasooriya ·
Valentina Janev · Cogan Matthew Shimizu
Editors

Semantic Intelligence
Select Proceedings of ISIC 2023
Editors
Sarika Jain
Department of Computer Applications
National Institute of Technology
Kurukshetra, India

Nandana Mihindukulasooriya
MIT-IBM Watson AI Lab
Cambridge, MA, USA

Valentina Janev
Institute Mihajlo Pupin
Belgrade, Serbia

Cogan Matthew Shimizu
Department of Computer Science
Wright State University
Dayton, USA

ISSN 1876-1100 ISSN 1876-1119 (electronic)
Lecture Notes in Electrical Engineering
ISBN 978-981-97-7355-8 ISBN 978-981-97-7356-5 (eBook)
https://doi.org/10.1007/978-981-97-7356-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore

If disposing of this product, please recycle the paper.


Preface

The International Semantic Intelligence Conference (ISIC) is an international
platform for the Artificial Intelligence, Machine Learning, and Semantic Web
communities. It presents a forum to publish cutting-edge research results in
intelligent applications. ISIC aims to bring together researchers, practitioners, and
industry specialists to discuss, advance, and shape the future of intelligent systems
by virtue of machine learning and semantic technologies.
Semantic computing has attracted considerable attention because of the
well-documented drawbacks of using statistical approaches alone. As a conference
dedicated to the specific field of semantic intelligence, ISIC addresses a pressing
need of the artificial intelligence research community. It is not a one-time event: it
is held every year under the same name, ISIC, at different venues, and it is
international in every respect. The first edition was successfully held in India. The
second edition was planned to be held at Georgia Southern University, United
States, but due to COVID restrictions it was held in online mode. ISIC 2023
(hosted by the University of Minho, Braga, Portugal, from 17th to 19th of October
2023) was organized in parallel with the International Health Informatics
Conference (IHIC 2023, hosted by the Management Education and Research
Institute, Delhi, India) and the International EdTech Conference (IEdTC 2023,
hosted by the Annasaheb Dange College of Engineering and Technology
(ADCET), Ashta, India).
These proceedings include the papers presented at ISIC 2023 and IEdTC 2023.
The papers were peer-reviewed and approved by program committee experts with
extensive experience in research, development, and innovation in topics within the
scope of the conferences.


Chapter Preview

This book comprises 36 manuscripts from 92 authors coming from 46 different
universities and institutions across five countries, namely the United States, India,
Germany, Serbia, and Thailand.

Invited Papers

The first part consists of four keynote papers that address challenges in building
knowledge graphs, as well as applications of semantic technologies and advanced
analytical solutions in smart manufacturing environments and smart energy
management systems. Chapter "Towards Understanding the Impact of Schema on
Knowledge Graph Embeddings" applies the Deep Graph Library to two knowledge
graph schemas built over the same Wright State University CORE Scholar data.
The authors conclude that knowledge graph embedding models perform differently
when trained on a schema that is rich rather than shallow in design. In Chapter
"Fragmenting Data Strategies to Scale Up the Knowledge Graph Creation", the
authors introduce KatanaG, an engine-agnostic framework designed to enhance the
scalability of KG creation processes, especially when dealing with large and
heterogeneous data sources. Chapter "On the Potential of Sustainable Software
Solutions in Smart Manufacturing Environments" compares green computing
approaches and explains the advantages and disadvantages of semantic methods.
Chapter "Technologies and Concepts for the Next-Generation Integrated Energy
Services" proposes a solution for the efficient integration of data-driven services
and the connection of physical energy assets in future smart grids.

Trends

The second part consists of 14 papers. The Trends and Perspectives Track first
explores the state of the art of artificial intelligence techniques in different
applications: in Chapter "From Text to Voice: A Comparative Study of Machine
Learning Techniques for Podcast Synthesis" for podcast synthesis; in Chapter
"Artificial Intelligence and Legal Practice: Jurisprudential Foundations for
Analyzing Legal Text and Predicting Outcomes" for analysing legal text and
predicting outcomes; in Chapter "Unveiling the Truth: A Literature Review on
Leveraging Computational Linguistics for Enhanced Forensic Analysis" for
enhanced forensic analysis; in Chapter "Navigating the Digital Frontier: Unraveling
the Complexities and Challenges of Emerging Virtual Reality" for emerging virtual
reality; and in Chapter "An In-Depth Exploration of Anomaly Detection,
Classification, and Localization with Deep Learning: A Comprehensive Overview"
for intrusion detection in computer security services.

Additionally, this part of the book presents contributions from the IEdTC 2023
conference that deal with trends in Information and Communication Technologies
for Education. Specific challenges for India are explored in Chapters "Challenges
to Admissibility and Reliability of Electronic Evidence in India in the Age of
'Deepfakes'" and "Visualization and Statistical Analysis of Research Pillar of Top
Five THE (Times Higher Education)-Ranked Universities for the Years 2020–2023".
The use of artificial intelligence techniques for evaluation in education is presented
in several contributions, including Chapters "Dimensions of ICT-Based Student
Evaluation and Assessment in the Education Sector", "A Formula for Effective
Evaluation Practice Using Online Education Tool” and “Admission Prediction for
Universities Using Decision Tree Algorithm and Support Vector Machine”.
The impact of technological trends in the field of modern education is discussed in
Chapters "Effectiveness of Online Education System", "Deciphering the Catalysts
Influencing the Willingness to Embrace Digital Learning Applications:
A Comprehensive Exploration", and "Pedagogical Explorations in ICT: Navigating
the Educational Landscape with Web 2.0, 3.0, and 4.0 for Transformative Learning
Experiences".
Finally, in the second part, in Chapter "Comparative Analysis of Docker Image
Files Across Various Programming Environments", Docker technology is discussed
as an enabler for building innovative applications that are portable across different
computer systems.

Research

The Research Track incorporates 10 papers that analyse research gaps and offer
solutions that fill those gaps, thus contributing significantly to the advancement of
semantic intelligence and artificial intelligence. Comparative analysis of machine
learning approaches is given in Chapters “Assessing Machine Learning Algorithms
for Customer Segmentation: A Comparative Study” and “Genre Classification of
Movie Trailers Using Audio and Visual Features: A Comparative Study of Machine
Learning Algorithms”. Novel approaches are elaborated using the following:
• Deep Convolution Neural Network in Chapters “Classifying Scanning Electron
Microscope Images Using Deep Convolution Neural Network” and “YOLO
Algorithm Advancing Real-Time Visual Detection in Autonomous Systems”;
• Support Vector Machines in Chapter “An Efficient Kernel-SVM-based Epilepsy
Seizure Detection Framework Utilizing Power Spectrum Density”;
• Enhanced Binary Particle Swarm Optimization in Chapter “Optimizing Feature
Selection in Machine Learning with E-BPSO: A Dimensionality Reduction
Approach”;
• Hamming distance algorithm for information retrieval in Chapter “Ranking of
Documents Through Smart Crawler”;
• Explainable AI in Chapter "Harnessing Ridge Regression and SHAP for
Predicting Student Grades: An Approach Towards Explainable AI in Education".
Additionally, this part includes manuscripts related to innovations in criminal
judgments (Chapter "CRIMO: An Ontology for Reasoning on Criminal Judgments")
and education (Chapter "Ensemble Learning Approaches to Strategically Shaping
Learner Achievement in Thailand Higher Education").

Applications

The Applications and Deployment Track accepted papers showcasing applications
of semantic intelligence and advanced analytics techniques for:
• Gesture recognition in Chapters "Convolutional Neural-Network-based Gesture
Recognition System for Air Writing for Disabled Person", "Powerpoint Slide
Presentation Control Based on Hand Gesture" and "Hand Gesture Recognition
and Real-Time Voice Translation for the Deaf and Dumb";
• Eye tracking in Chapter “Face Cursor Movement Using OpenCV”;
• Speech recognition in Chapter “SQL Queries Using Voice Commands to Be
Executed”;
• Design of safety equipment for coal mining in Chapter “A Protection Approach
for Coal Miners Safety Helmet Using IoT”;
• Hybrid Learning in Chapter “A Compatible Model for Hybrid Learning
and Self-regulated Learning During the COVID-19 Pandemic Using Machine
Learning Analytics”;
• Smart EV power management in Chapter “IoT-Based Smart EV Power
Management for Basic Life Support Transportation”.

Kurukshetra, India Sarika Jain
Cambridge, USA Nandana Mihindukulasooriya
Belgrade, Serbia Valentina Janev
Dayton, USA Cogan Matthew Shimizu

Acknowledgments The International Semantic Intelligence Conference (ISIC 2023)
is the result of a large group of people working together. The editors express their
gratitude to the organizing team, including the track chairs, session chairs, technical
program committee members and external reviewers. We are grateful to the many
volunteers who worked tirelessly to ensure the event's success. Finally, we are
thankful to all authors who selected ISIC 2023 for the presentation of their results
and gave consent for publishing their innovative solutions and contributions to
science with Springer.
Contents

Invited Papers
Towards Understanding the Impact of Schema on Knowledge
Graph Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Brandon Dave and Cogan Shimizu
Fragmenting Data Strategies to Scale Up the Knowledge Graph
Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Enrique Iglesias, Ahmad Sakor, Philipp D. Rohde, Valentina Janev,
and Maria-Esther Vidal
On the Potential of Sustainable Software Solutions in Smart
Manufacturing Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Simon Paasche and Sven Groppe
Technologies and Concepts for the Next-Generation Integrated
Energy Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Valentina Janev, Lazar Berbakov, Marko Jelić, Dea Jelić,
and Nikola Tomašević

Trends
From Text to Voice: A Comparative Study of Machine Learning
Techniques for Podcast Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Pankaj Chandre, Viresh Vanarote, Uday Mande, Mohd Shafi Pathan,
Prashant Dhotre, and Rajkumar Patil
Artificial Intelligence and Legal Practice: Jurisprudential
Foundations for Analyzing Legal Text and Predicting Outcomes . . . . . . . 57
Ivneet Walia and Navtika Singh Nautiyal
Unveiling the Truth: A Literature Review on Leveraging
Computational Linguistics for Enhanced Forensic Analysis . . . . . . . . . . . . 71
Deepak Mashru and Navtika Singh Nautiyal


Navigating the Digital Frontier: Unraveling the Complexities
and Challenges of Emerging Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . 85
Navtika Singh Nautiyal and Archana Patel
Challenges to Admissibility and Reliability of Electronic Evidence
in India in the Age of ‘Deepfakes’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Divyansh Shukla and Anshul Pandey
An In-Depth Exploration of Anomaly Detection, Classification,
and Localization with Deep Learning: A Comprehensive Overview . . . . . 115
Kamred Udham Singh, Ankit Kumar, Gaurav Kumar, Teekam Singh,
Tanupriya Choudhury, and Ketan Kotecha
Comparative Analysis of Docker Image Files Across Various
Programming Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Kamred Udham Singh, Ankit Kumar, Gaurav Kumar, Teekam Singh,
Tanupriya Choudhury, and Ketan Kotecha
Dimensions of ICT-Based Student Evaluation and Assessment
in the Education Sector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
R. Arulmurugan, P. Balakrishnan, N. Vengadachalam,
and V. Subha Seethalakshmi
Effectiveness of Online Education System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
N. Vengadachalam, V. Subha Seethalakshmi, R. Arulmurugan,
and P. Balakrishnan
A Formula for Effective Evaluation Practice Using Online
Education Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
V. Subha Seethalakshmi, R. Arulmurugan, P. Balakrishnan,
and N. Vengadachalam
Deciphering the Catalysts Influencing the Willingness to Embrace
Digital Learning Applications: A Comprehensive Exploration . . . . . . . . . 167
Ankita Srivastava and Navtika Singh Nautiyal
Pedagogical Explorations in ICT: Navigating the Educational
Landscape with Web 2.0, 3.0, and 4.0 for Transformative Learning
Experiences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Navtika Singh Nautiyal and Deepak Mashru
Admission Prediction for Universities Using Decision Tree
Algorithm and Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Khushbu Trivedi, Jenisia Dsouza, Shivam Kumar, Vatsal Saxena,
Shravani Kulkarni, Susanta Das, Parineeta Kelkar, Piyush Bhosale,
and Ritul Dhanwade

Visualization and Statistical Analysis of Research Pillar of Top Five
THE (Times Higher Education)-Ranked Universities for the Years
2020–2023 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Susanta Das, Shravani Kulkarni, Jenisia Dsouza, Piyush Bhosale,
Ritul Dhanwade, Khushbu Trivedi, Parineeta Kelkar,
Debanjali Barman Roy, and Ranjit Kumar

Research
Assessing Machine Learning Algorithms for Customer
Segmentation: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Katta Subba Rao, Sujanarao Gopathoti, Ajmeera Ramakrishna,
Priya Gupta, Sirisha Potluri, and Gaddam Srihith Reddy
Genre Classification of Movie Trailers Using Audio and Visual
Features: A Comparative Study of Machine Learning Algorithms . . . . . . 231
Viresh Vanarote, Pankaj Chandre, Uday Mande, Pathan Mohd Shafi,
Dhanraj Dhotre, and Madhukar Nimbalkar
Classifying Scanning Electron Microscope Images Using Deep
Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Kavitha Jayaram, S. Geetha, Prakash Gopalakrishnan,
and Jayaram Vishakantaiah
An Efficient Kernel-SVM-based Epilepsy Seizure Detection
Framework Utilizing Power Spectrum Density . . . . . . . . . . . . . . . . . . . . . . . 251
Vinod Prakash and Dharmender Kumar
YOLO Algorithm Advancing Real-Time Visual Detection
in Autonomous Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Abhishek Manchukonda
Optimizing Feature Selection in Machine Learning with E-BPSO:
A Dimensionality Reduction Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Rajalakshmi Shenbaga Moorthy, K. S. Arikumar,
Sahaya Beni Prathiba, and P. Pabitha
CRIMO: An Ontology for Reasoning on Criminal Judgments . . . . . . . . . 297
Sarika Jain, Sumit Sharma, Pooja Harde, Archana Pandey,
and Ruqaiya Thakrawala
Ranking of Documents Through Smart Crawler . . . . . . . . . . . . . . . . . . . . . . 317
Amol S. Dange, B. Manjunath Swamy, and Ashwini B. Shinde
Ensemble Learning Approaches to Strategically Shaping Learner
Achievement in Thailand Higher Education . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Sittichai Bussaman, Patchara Nasa-Ngium, Wongpanya S. Nuankaew,
Thapanapong Sararat, and Pratya Nuankaew

Harnessing Ridge Regression and SHAP for Predicting Student
Grades: An Approach Towards Explainable AI in Education . . . . . . . . . . 341
Vijay Katkar, Swapnil Kadam, Juber Mulla, and Niyaj Nadaf

Applications
Convolutional Neural-Network-based Gesture Recognition System
for Air Writing for Disabled Person . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Soham Kr Modi, Manish Kumar, Sanjay Singla, Charnpreet Kaur,
Tulika Mitra, and Arnab Deb
A Protection Approach for Coal Miners Safety Helmet Using IoT . . . . . . 377
Shabina Modi, Yogesh Mali, Lakshmi Sharma, Prajakta Khairnar,
Dnyanesh S. Gaikwad, and Vishal Borate
Face Cursor Movement Using OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
R. S. M. Lakshmi Patibandla, Madupalli Manoj,
Vantharam Sai Sushmitha Patnaik, Alapati Jagadeesh,
and Bathina Sasidhar
Powerpoint Slide Presentation Control Based on Hand Gesture . . . . . . . . 401
Ankit Kumar, Kamred Udham Singh, Gaurav Kumar, Teekam Singh,
Tanupriya Choudhury, and Ketan Kotecha
SQL Queries Using Voice Commands to Be Executed . . . . . . . . . . . . . . . . . 413
R. S. M. Lakshmi Patibandla, Sai Naga Satwika Potturi,
and Namratha Bhaskaruni
A Compatible Model for Hybrid Learning and Self-regulated
Learning During the COVID-19 Pandemic Using Machine
Learning Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Pratya Nuankaew, Sittichai Bussaman, Patchara Nasa-Ngium,
Thapanapong Sararat, and Wongpanya S. Nuankaew
Hand Gesture Recognition and Real-Time Voice Translation
for the Deaf and Dumb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Shabina Modi, Yogesh Mali, Rekha Kotwal, Vishal Kisan Borate,
Prajakta Khairnar, and Apashabi Pathan
IoT-Based Smart EV Power Management for Basic Life Support
Transportation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
M. Hema Kumar, G. Nagalali, D. Karthikeyan, C. S. Poornisha,
and S. Priyanka
About the Editors

Dr. Sarika Jain has served in education for over 19 years and is currently serving
at the National Institute of Technology Kurukshetra (Institute of National
Importance), India. She has authored or co-authored over 150 publications including both
authored and edited books. Her research interests include knowledge management
and analytics, ontological engineering, knowledge graphs, and intelligent systems.
She has been the principal investigator of sponsored research projects and works
in collaboration with various researchers across the globe, including in Germany,
Austria, Australia, Malaysia, Spain, the USA, and Romania. She serves as a reviewer
for journals published by IEEE, Elsevier, and Springer. She has been involved as a
program and steering committee member at many prestigious conferences in India
and abroad. She is a senior member of the IEEE, a member of ACM, and a Life
Member of the CSI.

Dr. Nandana Mihindukulasooriya is a Senior Research Scientist at IBM Research
in New York, USA. Dr. Mihindukulasooriya holds a Ph.D. in Artificial Intelligence
(AI) from Universidad Politécnica de Madrid, Spain. His research interests include
Knowledge Representation and Reasoning (KRR), Semantic Web, Linked Data,
and Natural Language Processing (NLP). Dr. Mihindukulasooriya has participated
in several European research projects including SEALS (FP7-238975), LIDER
(FP7-610782), and 4V (TIN2013-46238). Before IBM, he was a member of the
Ontology Engineering Group, Universidad Politécnica de Madrid (Spain), renowned
for breakthrough innovations in Ontology Engineering, Linked Data, Data
Integration, and Semantic Infrastructure. Dr. Mihindukulasooriya currently serves as
a Project Management Committee (PMC) member at Apache Axis and Apache Web
Services projects of Apache Software Foundation.

Dr. Valentina Janev is a Senior Researcher at the Mihajlo Pupin Institute, Univer-
sity of Belgrade, Serbia. She has extensive experience in research, software systems
development, and maintenance in different industrial domains for clients from
Europe. She has published several conference and journal papers, books, and book
chapters on responsible knowledge management, semantic intelligence in Big Data


applications, knowledge graphs, and Big Data processing. Dr. Valentina Janev has
served as an external expert engaged by the European Commission, Research
Executive Agency, for the evaluation of EU research proposals and projects. She is a senior
member of the IEEE. Dr. Janev has acted as a reviewer of respectable international
journals including Artificial Intelligence Review (Springer), International Journal on
Semantic Web and Information Systems (IGI Global), International Journal of Digital
Earth (Taylor & Francis), Information Systems Management (Taylor & Francis),
International Journal of Intelligent Information Systems (Science Publishing Group)
and American Journal of Software Engineering and Applications (Science Publishing
Group).

Dr. Cogan Matthew Shimizu is an Assistant Professor at Wright State University,
where he leads the Knowledge and Semantic Technologies (KASTLE) Lab. His
research focuses on pattern-based methods for knowledge engineering and their
application in neurosymbolic AI. He has authored over 75 publications, including
at premier venues such as the Semantic Web journal and ISWC. He is now a
Managing Editor at the Semantic Web journal and serves on the steering committee
for the K-CAP conference series.
Abbreviations

AEs Auto Encoders
AI Artificial Intelligence
ANN Artificial Neural Networks
API Application Programming Interface
AR Augmented Reality
BERT Bidirectional Encoder Representations from Transformers
CAFÉ Corporate Average Fuel Economy
CEC Citizen Energy Communities
CL Computational Linguistics
CNNs Convolutional Neural Networks
CORBS Criminal Rule-Based System
CPUs Central Processing Units
CRIMO Indian Criminal Ontology
D2MD Data to Metadata
DAB Dual Active Bridge
DBMS Database Management Systems
DBN Deep Belief Networks
DBSCAN Density-Based Spatial Clustering of Applications with Noise
DCA Dilated Convolutional AEs
DERs Distributed Energy Resources
DIS Data Integration System
DNN Deep Neural Networks
DR Demand Response
DSS Decision Support System
E-BPSO Enhanced Binary Particle Swarm Optimization
EEG Electroencephalogram
eIDAS Electronic Identification, Authentication, and Trust Services
EPC Energy Performance Contracting
EV Electric Vehicles
FL Forensic Linguistics
GANs Generative Adversarial Networks


GDPR General Data Protection Regulation
GHG Greenhouse Gas
GPUs Graphics Processing Units
HMMs Hidden Markov Models
ICT Information Communication Technology
IoT Internet of Things
KGE Knowledge Graph Embeddings
KGs Knowledge Graphs
KNN K-Nearest Neighbors
KPCA Kernel Principal Component Analysis
KPIs Key Performance Indicators
LDA Linear Discriminant Analysis
LKIF Legal Knowledge Interchange Format
LOC Lines of Code
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MBSCA Modified Binary Sine Cosine Algorithm
ML Machine Learning
MR Mean Rank
MRR Mean Reciprocal Rank
NB Naive Bayes
NIS Network and Information Security
NLP Natural Language Processing
OBE Outcome Based Education
OCV-SOC Open Circuit Voltage-State of Charge
ODA Ontology Development Approach
PCA Principal Components Analysis
PSD Power Spectral Density
PSO Programme Specific Outcome
RBF Radial Basis Function
RBM Restricted Boltzmann Machines
RDF Resource Description Framework
RES Renewable Energy Sources
RF Random Forest
RML RDF Mapping Language
RMSE Root Mean Squared Error
RNNs Recurrent Neural Networks
RTI Right to Information
SDAE Stacked Denoising Auto Encoders
SEM Scanning Electron Microscope
SEM Smart Energy Management
SGAM Smart Energy Grid Architecture Model
SHAP SHapley Additive exPlanations
SLSL Sri Lankan Sign Language
SQL Structured Query Language

SVM Support Vector Machines
SWRL Semantic Web Rule Language
TBS Thermal Barrier System
TPS Thermal Protection System
TTS Text-to-Speech
UHTC Ultra High-Temperature Ceramics
VIF Variance Inflation Factor
VR Virtual Reality
XAI Explainable Artificial Intelligence
XRD X-Ray Diffraction
YOLO You Only Look Once
Invited Papers
Towards Understanding the Impact
of Schema on Knowledge Graph
Embeddings

Brandon Dave and Cogan Shimizu

Abstract Knowledge graphs (KGs) enable researchers to understand a set of data
within a domain of research and how different aspects of the data may connect. The
methodology used to design and develop a KG varies depending on the use case.
When designing a schema for a KG, also called an ontology, the developers can
describe data in a rich or shallow manner. A shallow approach is useful when
there is no significant data to describe with data values, whereas a rich approach
more closely mirrors reality by providing layers in the ontology for the data
description. In this paper, we examine the impact that the complexity of a KG
schema has on the corresponding knowledge graph embeddings (KGEs), where
complexity varies across shallow or rich approaches for entity-to-entity
relationships. We utilize the Deep Graph Library on two schemas over the same
Wright State University CORE Scholar data. Preliminary work has shown that there
are indeed differences in performance, but further investigation is needed to
determine the causal mechanisms, as well as to perform additional data cleaning.

1 Introduction

Domain experts are able to utilize1 the implementation of knowledge graphs for data
insights in their respective fields of study. The schematic design of a knowledge graph,
also referred to as an ontology, should be a reflection of the use case described by the
domain experts. Knowledge graphs represent data connections where a class entity
can be connected by a relationship to another entity; thus, an ontology can result in
a variety of designs depending on the data and the developer's needs.

1 https://dglke.dgl.ai/doc/.

B. Dave (B) · C. Shimizu
Wright State University, Dayton, OH, USA
e-mail: [email protected]
C. Shimizu
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_1

A rich schema design illustrates the result of identified data being represented as
class entities which will allow an ontologist to create layers to a knowledge graph
that attempts to mirror a realistic view of data and information. An ontology that
follows a shallow design allows developers to describe data without auxiliary layers
as the data may not be of significance. This direct pairing of class entities and data
values streamlines representation without a need for further abstraction. Our research
is aimed towards understanding if there is a benefit or consequence of utilizing one
design over the other when implemented with knowledge graph embedding (KGE).
The rest of this paper is organized as follows. Section 2 provides insight into the
state of the art and the foundational concepts for context of the paper, as well as the
dataset we have opted to utilize. In Sects. 3, 4, and 5, we provide our experimental
methodology, present preliminary results, and discussion thereof. Finally, in Sect. 6,
we conclude with our anticipated next steps.

2 Background

2.1 Related Work

There exists research on many facets of knowledge graphs, whether their
integration or their applications. Modular ontology modelling [5] applies software
engineering techniques to launch and maintain a knowledge graph from the ground up.
With encoding techniques, knowledge graphs can be represented within a
hyperplane, which allows KGE models to be trained for predictive tasks, such
as finding missing relationships between entities or understanding the semantics of
entities and relationships. These contributions allow further experiments to continue
in a variety of fields for ontology modelling and knowledge graph usage. Section
2.2 goes on to detail the CORE Scholar data. Section 2.3 then describes the
differences in schema conceptualization.

2.2 WSU CORE Scholar Data

The dataset we used for this research was obtained from Wright State University's
(WSU) library. Both Figs. 1 and 2 are graphical representations of the two schemas
designed to be used with the CORE Scholar data. CORE Scholar is a
public repository that celebrates research and collaboration done by WSU-affiliated
researchers. The repository contains data describing a multitude of publications,
including newsletters published by the college deans and photographs of historical
monuments to scientific research in a variety of fields. From the CORE Scholar
dataset, we focused on detailing publications, the respective publishers, and their
affiliated institutions.
Fig. 1 Rich schema. Note: Classes are represented with orange rectangles, relationships are represented by connected edges, and data types are represented in yellow ovals

Fig. 2 Shallow schema. Note: Classes are represented in orange rectangles, relationships are represented by connected edges, and datatypes are represented in yellow ovals

Table 1 Training data partition

             Training    Validation   Test
Split Data   776563      259203       259185
All Data     1942051     388260       388251

2.3 Shallow Versus Rich Schemas

An ontology is a schematic design of an applicable knowledge graph. The depth
of one's ontology can always be constrained to the use cases defined by the
development team. Figure 1 is designed for deeper querying of class identifications
for the relationships connected to a publication and for relationships of roles and
their respective personal information. The shallow schema, pictured in Fig. 2,
reflects data relationships as their respective data values attached to a class entity.

3 Methodology

Our research utilized Deep Graph Library’s DGL-KE2 in order to train and evaluate
our knowledge graphs with embedding models. DGL-KE consists of a variety of
embedding models; however, the knowledge graph embedding models we chose
to use for this preliminary research are TransR [4], TransE [1], ComplEx [7], and
DistMult [9]. Table 1 specifies the partition of triples used for training and evaluating.
Each model’s training took place over 500 steps at a learning rate of 0.25 and a gamma
rate of 19.9.
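As a rough sketch of the data partitioning step, the triples can be divided into training, validation, and test sets along the proportions suggested by Table 1 (roughly 60/20/20). The function below is our own illustration, not DGL-KE's implementation, and the fractions and seed are assumptions for the sketch:

```python
import random

def split_triples(triples, train_frac=0.6, valid_frac=0.2, seed=42):
    """Partition a list of (head, relation, tail) triples into
    train/validation/test sets. The fractions are illustrative,
    mirroring the roughly 60/20/20 split in Table 1."""
    rng = random.Random(seed)
    shuffled = triples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

triples = [(f"e{i}", "rel", f"e{i+1}") for i in range(1000)]
train, valid, test = split_triples(triples)
print(len(train), len(valid), len(test))  # 600 200 200
```

The shuffle before slicing matters: without it, triples sourced in schema order would leak whole relation types into a single partition.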

4 Preliminary Results

Table 2 lists the average losses of our models after training. DGL-KE optimizes with
respect to the average of both positive loss and negative loss values. As such, there
are no significant differences in model performance with or without data cleaning
as represented by TransE’s training. For the experiment, the remaining KGE models
were trained with clean data.

2 https://github.com/awslabs/dgl-ke.

Table 2 Trained models

Model / Dataset      Pos. loss   Neg. loss   Average loss   Avg. regularization   Training time (s)
TransE_NotClean Rich_All 0.6927409053 1.067981296 0.880361104 1.08E-02 2171.652527
Rich_Split 0.5731419444 0.8959519851 0.734546957 1.15E-02 2100.223849
Shallow_All 0.7196711075 1.094491489 0.9070813012 1.08E-02 2221.380924
Shallow_Split 0.5592314935 0.8573559636 0.7082937336 1.21E-02 2187.441092
TransE_Clean Rich_All 0.6918644512 1.052533621 0.8721990371 1.13E-02 1995.674074
Rich_Split 0.8854867172 1.40779551 1.146641114 1.01E-03 46.77071643
Shallow_All 0.6929218888 0.9684783602 0.8307001209 1.03E-02 1965.921571
Shallow_Split 0.5654308927 0.8687946874 0.7171127892 1.17E-02 2173.908379
TransR Rich_All 48.38371078 1.743245237 25.06347794 1.68E-05 117998.479
Rich_Split 43.93017563 2.182257621 23.05621658 1.67E-05 112157.6883
Shallow_All 65.76916473 0.8211680578 33.2951667 2.81E-05 75265.70867
Shallow_Split 58.20628952 0.9760909441 29.59119026 2.63E-05 92543.83346
DistMult Rich_All 0.5514617062 0.7621009398 0.656781323 6.72E-05 39.86753392
Rich_Split 0.5079857039 0.77404006 0.641012876 6.70E-05 28.16067123
Shallow_All 0.4833472383 0.8264365554 0.6548918915 1.02E-04 38.64474416
Shallow_Split 0.423595767 0.7918563342 0.607726059 8.91E-05 41.55230474
ComplEx Rich_All 0.5457397699 0.7690031171 0.6573714352 5.49E-05 44.67850161
Rich_Split 0.5078055394 0.7930349755 0.6504202604 5.37E-05 45.87114024
Shallow_All 0.4744121134 0.791073463 0.6327427793 7.89E-05 42.87777638
Shallow_Split 0.4239445174 0.8003909087 0.6121677113 7.71E-05 45.15852642

5 Discussion

We include Tables 3 and 4 to display the results of our models for the metrics Mean
Rank (MR), Mean Reciprocal Rank (MRR), and Model Prediction Hits@k, where
k is 1, 3, and 10. We note the measurable differences a model has when trained
on and evaluated with seen data. For visibility, the better performing metric results
are highlighted in bold. A Delta row is added to illustrate the difference, as an absolute
value, between the rich and shallow designs' metric results. Although there is a marginal
difference for MRR and Hits@k in KGE performance, there appears to be a larger
difference in MR-based performance.
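The ranking metrics reported above can all be computed from the rank each model assigns to the true entity in a link-prediction query. The following is a minimal sketch of our own (not DGL-KE's evaluation code), using 1-based ranks:

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute Mean Rank (MR), Mean Reciprocal Rank (MRR), and
    Hits@k from a list of 1-based ranks of the correct entity."""
    n = len(ranks)
    mr = sum(ranks) / n                                   # lower is better
    mrr = sum(1.0 / r for r in ranks) / n                 # higher is better
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mr, mrr, hits

mr, mrr, hits = ranking_metrics([1, 2, 5, 12])
print(mr, round(mrr, 3), hits[10])  # 5.0 0.446 0.75
```

Note the asymmetry visible in the tables: MR is dominated by a few badly ranked queries, while MRR and Hits@k saturate quickly, which is one reason the rich/shallow gap is largest for MR.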

6 Conclusion

In this preliminary work, we determined that there exists a difference in how knowl-
edge graph embedding models perform when trained on a schema that is rich or
shallow in design.

Table 3 KGE evaluation with split training


MR MRR H@1 H@3 H@10
TransE_NotClean Shallow_Split 138.438027 0.293748 0.249561 0.301931 0.364737
Rich_Split 176.536839 0.320323 0.274590 0.330523 0.398368
Delta 38.098812 0.026574 0.025030 0.028592 0.033631
TransE_Clean Shallow_Split 136.215533 0.297559 0.251864 0.306459 0.372850
Rich_Split 174.791412 0.326508 0.279918 0.338311 0.405980
Delta 38.575879 0.028950 0.028054 0.031852 0.033130
TransR Shallow_Split 427.716955 0.094434 0.078577 0.096875 0.115589
Rich_Split 434.585313 0.083940 0.070556 0.085262 0.100761
Delta 6.868358 0.010494 0.008021 0.011613 0.014828
DistMult Shallow_Split 292.566215 0.133801 0.096202 0.137969 0.200195
Rich_Split 375.044530 0.087382 0.065789 0.086065 0.119559
Delta 82.478315 0.046419 0.030413 0.051904 0.080636
ComplEx Shallow_Split 308.120902 0.130338 0.095243 0.133567 0.191433
Rich_Split 387.443978 0.088012 0.064216 0.087500 0.126698
Delta 79.323076 0.042327 0.031027 0.046067 0.064735

Table 4 KGE evaluation when trained and evaluated with all data
MR MRR H@1 H@3 H@10
TransE_NotClean Shallow_All 17.671754 0.510615 0.411413 0.548074 0.711626
Rich_All 37.021571 0.478205 0.391775 0.507560 0.651298
Delta 19.349817 0.032411 0.019638 0.040514 0.060327
TransE_Clean Shallow_All 16.004682 0.510416 0.409428 0.549654 0.712017
Rich_All 41.615039 0.482275 0.395826 0.511988 0.654758
Delta 25.610357 0.028141 0.013601 0.037665 0.057258
TransR Shallow_All 413.633237 0.092006 0.075774 0.095220 0.113197
Rich_All 434.098484 0.112367 0.096677 0.115643 0.134727
Delta 20.465247 0.020361 0.020903 0.020423 0.021530
DistMult Shallow_All 165.963736 0.340006 0.289562 0.353055 0.433333
Rich_All 265.549778 0.261939 0.228910 0.267384 0.320704
Delta 99.586042 0.078067 0.060652 0.085671 0.112630
ComplEx Shallow_All 171.694639 0.348654 0.300776 0.360821 0.437884
Rich_All 263.238816 0.268759 0.235774 0.274572 0.326594
Delta 91.544177 0.079895 0.065001 0.086249 0.111290

Future Work

We have identified a few next steps for this research, which will take it beyond
the preliminary work presented herein.

1. We will reproduce this analysis on other common knowledge resources, including
DBpedia [3], YAGO 4 [6], and Wikidata [8], and their filtered versions, such as
YAGO4-19K and DB93K [2].

2. We will investigate the causal mechanisms of the differing results between different
KGE techniques. That is, a formal investigation into how different KGE techniques
are impacted by different graph structures, including analysis of differences
in parameter space.

Acknowledgements This work was funded by the National Science Foundation under Grant
2333532; Proto-OKN Theme 3: An Education Gateway for the Proto-OKN. Any opinions, find-
ings, and conclusions or recommendations expressed in this material are those of the authors and
do not necessarily reflect the views of the National Science Foundation. The authors would like to
acknowledge Andrew Eells for identifying related work.

References

1. Bordes A, Usunier N, Garcia-Durán A, Weston J, Yakhnenko O (2013) Translating embeddings
for modeling multi-relational data. In: Proceedings of the 26th international conference on
neural information processing systems - Volume 2, pp 2787–2795. NIPS'13, Curran Associates
Inc., Red Hook, NY, USA
2. Hubert N, Monnin P, Brun A, Monticolo D (2023) Sem@K: is my knowledge graph embedding
model semantic-aware?
3. Lehmann J, Isele R, Jakob M, Jentzsch A, Kontokostas D, Mendes P, Hellmann S, Morsey
M, Van Kleef P, Auer S, Bizer C (2014) DBpedia—a large-scale, multilingual knowledge base
extracted from Wikipedia. Semantic Web J 6. https://doi.org/10.3233/SW-140134
4. Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015) Learning entity and relation embeddings for
knowledge graph completion. In: Proceedings of the AAAI conference on artificial intelligence
29(1). https://doi.org/10.1609/aaai.v29i1.9491
5. Shimizu C, Hammar K, Hitzler P (2023) Modular ontology modeling. Semantic Web 14:459–
489. https://doi.org/10.3233/SW-222886
6. Pellissier Tanon T, Weikum G, Suchanek FM (2020) YAGO 4: a reason-able knowledge base.
https://doi.org/10.1007/978-3-030-49461-2_34
7. Trouillon T, Welbl J, Riedel S, Gaussier É, Bouchard G (2016) Complex embeddings for
simple link prediction
8. Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledgebase. Commun
ACM 57(10):78–85. https://doi.org/10.1145/2629489
9. Yang B, Yih W, He X, Gao J, Deng L (2015) Embedding entities and relations for learning
and inference in knowledge bases
Fragmenting Data Strategies to Scale Up
the Knowledge Graph Creation

Enrique Iglesias, Ahmad Sakor, Philipp D. Rohde, Valentina Janev,
and Maria-Esther Vidal

Abstract In recent years, the exponential growth of data has necessitated a unified
schema to harmonize diverse data sources. This is where knowledge graphs (KGs)
come into play. However, the creation of KGs introduces new challenges, such as
handling large and heterogeneous input data and complex mappings. These
challenges can lead to reduced scalability due to the significant memory consumption
and extended execution times involved. We present KatanaG, a framework designed
to streamline KG creation in complex scenarios, including large data sources and
intricate mappings. KatanaG optimizes memory usage and execution time. When
applied alongside various KG creation engines, our results indicate that KatanaG
can improve the performance of these engines, reducing execution time by up to
80% and achieving up to 70% memory savings.

Keywords Knowledge graph creation · Data fragmentation

P. D. Rohde · M.-E. Vidal
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
e-mail: [email protected]
M.-E. Vidal
e-mail: [email protected]
E. Iglesias (B) · A. Sakor · P. D. Rohde · M.-E. Vidal
L3S Research Center, Hannover, Germany
e-mail: [email protected]
A. Sakor
e-mail: [email protected]
A. Sakor · P. D. Rohde · M.-E. Vidal
Leibniz University of Hannover, Hannover, Germany
V. Janev
Institute Mihailo Pupin, University of Belgrade, Belgrade, Serbia
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_2

1 Introduction

Data has grown exponentially as a result of increasingly capable data-generating
devices (e.g., sensors, MRI scanners, or DNA sequencers). However, data can be ingested in
various formats and represent the same entities using different schemas and semantics,
thus hindering data processing. Knowledge graphs (KGs) have emerged as expressive
data structures to enable interoperability, and to harmonize and integrate heterogeneous
data sources and their meaning [1]. Nevertheless, despite the increasing acceptance
of KGs in industrial [2] and academic sectors [3], KG creation still faces
challenges in scaling up to real-world applications [4, 5]. Specifically, the process of KG
creation can be impacted by multiple parameters [6] (e.g., data size and heterogeneity,
data duplicate rate). Consequently, efficient methods for creating KGs are needed
to address interoperability.
Building upon prior work in the field of databases, KGs can be conceptually
defined as a data integration system (DIS) [7]. DISs encompass an ontology denoted
as O, which describes a unified perspective, a collection of data sources S, and the
mapping rules or correspondences that establish connections between the concepts
in ontology O and the attributes of the data sources within set S. Various engines
have been developed for creating KGs, including RMLMapper [8], RocketRML [9],
SDM-RDFizer [10], and Morph-KGC [11]. These engines make use of the RDF
Mapping Language (RML) [12], which defines the structure of a KG in accordance
with the specifications of the Resource Description Framework (RDF).1 However,
it is important to note that the various parameters described by Chaves-Fraga et al. [6]
can also exert an influence on these KG creation processes, potentially limiting their
scalability.
Problem and Proposed Solution. The primary aim of this work is to address the
aforementioned challenges related to managing heterogeneous data sources during
KG creation. Our approach, KatanaG, is underpinned by an optimization principle
that emphasizes the execution of mapping rules on smaller input data fragments,
resulting in faster execution times and reduced memory consumption. This
optimization step is an integral component of the planner introduced by Iglesias et al. [13]
and has been empirically assessed within existing engines, including SDM-RDFizer,
Morph-KGC, and RMLMapper. The experimental findings from these evaluations
underscore a notable enhancement in the performance of these engines when data
fragmentation is employed. This highlights the role of planning in the pipelines for
KG creation, demonstrating the significant impact of this optimization strategy.
Contributions. This paper makes the following contributions:
(i) Data Management Techniques: We introduce data management techniques cen-
tered around data fragmentation, designed to enhance the scalability of KG
creation processes, especially when dealing with large and heterogeneous data
sources.

1 https://www.w3.org/TR/2004/REC-rdf-primer-20040210/.

(ii) The KatanaG Framework: We introduce KatanaG, an engine-agnostic framework
that optimizes data integration systems to achieve reductions in execution
time and memory usage, resulting in more efficient processes.
(iii) Experimental Validation: We conduct an extensive experimental study on
established benchmarks and state-of-the-art RML engines, demonstrating the
effectiveness of the data integration systems generated by KatanaG.
This paper is organized into five additional sections. Section 2 defines the necessary
concepts to understand this work and gives an example illustrating the scalability
issues in existing KG creation engines. Section 3 formalizes the problem and
describes the proposed solution. Next, Sect. 4 presents the empirical evaluation that
determines the performance of the proposed solution. Section 5 discusses related
work that applies data fragmentation and mapping partitioning. Finally, Sect. 6
presents the conclusions reached.

2 Preliminaries and Motivation

Knowledge graphs (KGs) are directed edge-labeled graphs that model statements as
entities and their relationships as labeled edges [1]. The creation of a KG G can be
defined as a data integration system DIS_G = ⟨O, S, M⟩, where O is a set of classes
and properties of a unified ontology, S is a set of data sources, and M corresponds to
mapping rules or assertions that follow the concepts established in O as conjunctive
queries over the sources in S. The execution of M over the data sources in S generates
the instances in G. When creating a KG, multiple factors affect the process in terms
of memory usage and execution time. Chaves-Fraga et al. [6] define these factors as
the size, heterogeneity, and number of duplicates in the raw data, and the complexity of
the mappings. For that reason, different approaches have been developed to handle these
factors. For example, planning the execution and partitioning the mappings help to
address the issues that might arise when transforming complex mappings, while data
fragmentation reduces the size of the data sources, thus reducing how much data is
processed.
Mapping Rule Languages. R2RML [14] is the W3C standard for defining mapping
rules from relational databases into Resource Description Framework (RDF)
KGs. R2RML mapping rules allow for the definition of (a) instances of a class C, or
subject definitions, (b) values of the properties of C, and (c) instances of the
predicates that relate C with other classes. The RDF Mapping Language (RML) [12]
is an extension of the W3C-standard mapping language R2RML, enhancing it
with support for logical sources (referred to as rml:logicalSource) in various
heterogeneous formats, including CSV, Relational, JSON, and XML. Similar
to the W3C-standard R2RML, a triples map in RML corresponds to a mapping
rule that defines subjects (referred to as rml:subjectMap) of an RDF class and
their properties (referred to as rr:predicateMap) with values (referred to as
rr:objectMap) sourced from logical data sources. The rr:objectMap can
also be defined as a reference or a join with the rr:subjectMap in another
triples map, known as rr:RefObjectMap and rr:joinCondition, respectively.
In general, rr:subjectMap, rr:predicateMap, and rr:objectMap
are collectively referred to as rr:TermMap and are responsible for generating RDF
terms. Figure 1 presents three triples maps: TriplesMap1, TriplesMap2, and
TriplesMap3. TriplesMap1 defines instances of the class ex:C1, one attribute
p1, and two properties p3 and p4. Table1 is the logical source of TriplesMap1,
and the properties p3 and p4 are defined as references to other triples maps; p3 refers to
TriplesMap2, which is defined over the same logical source as TriplesMap1.
On the other hand, p4 is defined in terms of TriplesMap3 and, because it is
specified over Table2, a join is required to link the two triples maps. TriplesMap2
specifies the instances of ex:C2 and the values of the predicate ex:p5. Lastly,
TriplesMap3 specifies the instances of ex:C3 and the values of ex:p6.
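To make the structure of such rules concrete, a triples map of the kind described above could be written in RML Turtle syntax roughly as follows; the IRIs, file name, and attribute names are hypothetical, echoing the running example rather than reproducing the paper's actual mappings:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/> .

<#TriplesMap1>
  rml:logicalSource [
    rml:source "Table1.csv" ;
    rml:referenceFormulation ql:CSV
  ] ;
  rr:subjectMap [
    rr:template "http://example.org/C1/{id}" ;   # instances of ex:C1
    rr:class ex:C1
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:p1 ;
    rr:objectMap [ rml:reference "a_1" ]          # attribute value
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:p4 ;
    rr:objectMap [                                # join to another triples map
      rr:parentTriplesMap <#TriplesMap3> ;
      rr:joinCondition [ rr:child "id" ; rr:parent "id" ]
    ]
  ] .
```

The rr:joinCondition is the expensive construct here: it forces the engine to match rows across two logical sources, which is exactly where data size and duplicate rate hurt scalability.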
Data Source Fragmentation. Data fragmentation divides a database into multiple
smaller databases. The fragmentation should be carried out so that the original
database can be rebuilt from the pieces. Data fragmentation can be carried out in
one of three ways: horizontal, vertical, and hybrid [15]. Horizontal fragmentation
divides a table by rows to create subsets of rows. Vertical fragmentation
splits tables by columns so that the new tables comprise a subset of columns.
Lastly, the fragmentation is hybrid when both horizontal and vertical fragmentation
are applied. Data fragmentation can also be applied to other table-like data
source formats like CSV and TSV. In this paper, we do not cover JSON and XML,
because these formats require different partitioning criteria to avoid data loss.
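The three fragmentation modes can be sketched over a table held as a list of dicts. This is our own illustration of the concept; real engines operate on files or database tables rather than in-memory rows:

```python
def vertical_fragment(rows, columns):
    """Keep only the given subset of columns (vertical fragmentation)."""
    return [{c: row[c] for c in columns} for row in rows]

def horizontal_fragment(rows, chunk_size):
    """Split the rows into chunks of at most chunk_size rows
    (horizontal fragmentation); concatenating the chunks in order
    rebuilds the original table."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

table = [{"id": i, "a_1": i * 2, "a_2": i * 3, "a_3": i * 4} for i in range(10)]
projected = vertical_fragment(table, ["id", "a_1"])  # hybrid: vertical first,
chunks = horizontal_fragment(projected, 4)           # then horizontal
print(len(chunks), [len(c) for c in chunks])  # 3 [4, 4, 2]
```

Note that both operations are lossless with respect to the selected data: the horizontal chunks concatenate back to the projected table, which is the reconstructability requirement stated above.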

Fig. 1 Mapping assertions



Table 1 The results of the motivating example

(a) Motivating example without planning: results of knowledge graph creation without planning
Engine         Execution time   Memory usage (MB)
SDM-RDFizer    119.15 sec       748.3
Morph-KGC      35.17 sec        1140.78
RMLMapper      Timeout (1 h)    2812.00

(b) Motivating example with planning: results of knowledge graph creation with planning
Engine                Execution time   Memory usage (MB)
SDM-RDFizer+Planner   70.72 sec        915.54
Morph-KGC+Planner     27.34 sec        1240.78
RMLMapper+Planner     Timeout (1 h)    2812.00

2.1 Motivating Example

We motivate our work with two real-world datasets (i.e., Dataset1 and Dataset2) that
include data about building occupancy; they correspond to two vertical fragments of
the records. The datasets (DS) have the following properties:
DS1: CSV file with a size of 19 MB and three columns: identifier, date, and hour.
DS2: CSV file with a size of 27 MB and four columns: identifier, zone, source, and
connections.
These datasets are collected in the context of the EU H2020 project PLATOON,2 and
maintain data about wind turbines. The PLATOON semantic data models are utilized
as the unified schema.3 Additionally, mapping rules for these datasets are defined in
terms of five RML triples maps partitioned into two groups. With the increase in
data generation, finding an efficient method of generating a KG from large data
sources has become necessary. For that reason, as a pre-processing method, the data
source is divided into smaller chunks, thus reducing the cost of generating the KG
and the overall burden on the system. Multiple state-of-the-art KG creation engines
have been developed, such as RMLMapper, RocketRML,4 SDM-RDFizer, and Morph-
KGC. Unfortunately, these tools present problems regarding memory usage,
especially when handling large data sources (Table 1).
Table 1a reports the results of the execution of the RML engines on these
datasets. As observed, Morph-KGC has the best execution time but presents high

2 https://platoon-project.eu/.
3 https://kgswc.org/industry-talk-1-semantic-data-models-construction-in-the-h2020-platoon-project/.
4 https://github.com/semantifyit/RocketRML.

Table 2 Initial experimental results. The effect of KatanaG on state-of-the-art RML engines;
KatanaG improves execution time

Engine         No data fragmentation (secs.)   KatanaG (secs.)   % Saving
SDM-RDFizer    50081.2                         9842.9            80.35%
Morph-KGC      3604.98                         911.26            74.72%
RMLMapper      Timeout (15 h)                  37827.3           ≥ 29.95%

Table 3 Initial experimental results. Effect of KatanaG on state-of-the-art RML engines (results
reported in MB); KatanaG reduces memory usage in all three engines

Engine         No data fragmentation (MB)   KatanaG (MB)   % Saving
SDM-RDFizer    15648.16                     5124.96        67.25%
Morph-KGC      18777.79                     5365.51        71.43%
RMLMapper      ≥ 40996.75                   21588.63       ≥ 47.34%

memory usage; this can be attributed to the fact that Morph-KGC uses the pandas
Python library, which is known to use a great deal of memory. While the SDM-
RDFizer does not have the best execution time, it presents better memory usage.
Afterward, to see if there is an improvement in the performance of the engines, the
RML-Planner is applied. The RML-Planner [13] assesses an optimized number of
partitions considering the number of data sources, the type of mapping assertions, and
the associations between different triples maps. After providing a list of partitions
and the triples maps that belong to each partition, the planner determines their execution
order. A greedy algorithm is implemented to generate the partitions' bushy tree
execution plan. Bushy tree plans are translated into operating system commands that
guide the execution of the partitions of the mappings in the order indicated by the
bushy tree. Table 1b reports an improvement in execution time but an increase in
memory usage. Unfortunately, RMLMapper cannot generate the KG in either case.
Thus, partitioning the mappings does increase the performance of the engines, but at
the cost of higher memory usage.
Another example that motivates this work is the Renewable Energy Resource dataset
(RES) [16], which contains data collected over almost 7 years from a solar array farm.
The data source has 35,263,490 rows and is presented as a MySQL table. Given the
size of the data source, it was divided into smaller chunks to test whether data fragmentation
is beneficial to the KG creation process. The table is partitioned into 36 smaller tables,
where the first 35 tables have 1,000,000 rows each, and the last table contains the final
263,490 rows. Each partitioned data source is given to a KG creation engine and
transformed into its corresponding KG. Finally, each partitioned KG is combined
into one large KG. The resulting KG contains 1,410,539,600 triples with a size of
approximately 570 GB.
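The partition arithmetic for the RES table can be checked directly, given the stated threshold of 1,000,000 rows per chunk:

```python
import math

total_rows = 35_263_490
chunk_size = 1_000_000

# Number of horizontal partitions, and the size of the final (partial) one.
n_chunks = math.ceil(total_rows / chunk_size)
last_chunk = total_rows - (n_chunks - 1) * chunk_size
print(n_chunks, last_chunk)  # 36 263490
```

This matches the 36 tables (35 full, one of 263,490 rows) described above.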

When generating a KG, the characteristics of the input mapping and its corresponding
data source affect the creation process. As seen in Chaves-Fraga et al. [6],
the parameters that influence KG creation are the size, duplicate rate, and heterogeneity
of the input data, and the complexity of the input mapping. Therefore, partitioning the
data source is necessary, since applying the transformation to the whole data source
is very expensive in terms of memory usage and execution time. When executing the
RES dataset without partitioning the data source, it was observed that after 48 hours,
not only had the creation process not ended, but the first RML triples map had still
not been completely processed. The number of triples generated from the first
triples map is 176,317,450; this amount consumes much memory, slowing the overall
creation. Eventually, it will consume all available memory, stopping the process
itself. By partitioning the data source, the created KGs are much smaller; thus, the
required memory is much less as well. Additionally, combining the smaller KGs
as they are generated at the command-line level reduces the memory usage of KG
creation engines. Therefore, this work seeks to expand on what was established with the
RML-Planner by focusing not on partitioning the mappings but on partitioning
the data source, to determine whether there can be a further increase in the performance of
an engine without increasing the cost of execution.

3 Our Approach: KatanaG

This paper tackles the problem of reducing memory usage during the creation of
a KG G specified in terms of a data integration system DIS_G = ⟨O, S, M⟩. Our
solution resorts to data fragmentation strategies to transform DIS_G into an
equivalent data integration system DIS_New_G = ⟨O, S_New, M_New⟩, where the
data sources in S_New are horizontally and vertically fragmented and the mapping
rules in M are adjusted accordingly in M_New; these techniques are implemented
in KatanaG. A data source S_i in S is vertically fragmented according to each
triples map t_i for which S_i is the logical source. For each t_i, a new copy of S_i
is created that includes only the attributes used in the mapping rules defining the
subject and predicates of t_i. Furthermore, horizontal fragmentation is performed
by partitioning S_i based on a given threshold σ that indicates how many records
are included in each partition of S_i. The triples map t_i is rewritten according to
all these partitions. The resulting partitions of S_i and the rewritten triples maps
are included in S_New and M_New, respectively.
Figure 2 illustrates with our running example how KatanaG transforms the data
sources and mapping rules of a data integration system. Figure 2a presents triples
map TriplesMap1 specified over S_1.csv; it defines instances of the class ex:C1
and the predicates ex:p1, ex:p3, and ex:p4. These predicates are, respectively,
expressed in terms of the attributes a_1, a_2, and id. Source S_1.csv comprises
not only these attributes but a_3, a_4, and a_5 as well. Therefore, KatanaG is
applied to project source S_1.csv and simplify its transformation. As seen in
Fig. 2a, vertical fragmentation is used to reduce source S_1.csv to only the
columns necessary for the transformation of TriplesMap1.
18 E. Iglesias et al.

Fig. 2 KatanaG workflow, illustrating how the data sources and mapping rules are fragmented:
a the original data source and mapping; b the results of applying vertical fragmentation; c the
results of applying horizontal fragmentation

Afterward, Fig. 2b illustrates how source S_1.csv is further fragmented, in this
case by applying horizontal fragmentation. A threshold of 1,000,000 rows is defined,
and S_1.csv is divided into five smaller data sources since the original data source
had 5,000,000 rows; the original TriplesMap1 is also rewritten into five triples
maps TriplesMap11, TriplesMap12, TriplesMap13, TriplesMap14,
and TriplesMap15. These triples maps replace TriplesMap1 and are added
to M_New, as are the new sources S_11.csv, S_12.csv, S_13.csv,
S_14.csv, and S_15.csv.
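The two fragmentation steps can be sketched as follows. This is a strongly simplified illustration of the idea, not KatanaG itself: triples maps are modeled as plain dicts whose values are {attribute} templates rather than parsed RML, and data sources as lists of row dicts:

```python
import re

def referenced_attributes(triples_map):
    """Collect the source attributes referenced by a simplified triples map,
    e.g. {"subject": "http://ex.org/C1/{id}",
          "predicates": {"ex:p1": "{a_1}", "ex:p3": "{a_2}"}}."""
    templates = [triples_map["subject"], *triples_map["predicates"].values()]
    attrs = []
    for template in templates:
        for ref in re.findall(r"{(\w+)}", template):
            if ref not in attrs:
                attrs.append(ref)
    return attrs

def fragment(rows, triples_map, threshold):
    """Vertically project `rows` (a list of dicts) onto the attributes the
    triples map needs, then split the projection horizontally into partitions
    of at most `threshold` records. One (partition, triples map) pair is
    returned per partition, mimicking how each rewritten triples map points
    to its own smaller logical source."""
    attrs = referenced_attributes(triples_map)
    projected = [{a: row[a] for a in attrs} for row in rows]   # vertical step
    pairs = []
    for start in range(0, len(projected), threshold):          # horizontal step
        partition = projected[start:start + threshold]
        pairs.append((partition, dict(triples_map)))
    return pairs
```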

4 Results and Discussion

This study aims to determine the impact of the data fragmentation techniques imple-
mented in KatanaG on the KG creation process. For that reason, the RML-compliant
engines RMLMapper, SDM-RDFizer, and Morph-KGC are executed in
combination with KatanaG. The empirical evaluation seeks to answer the following
research questions:
RQ1) How does data source fragmentation affect the performance of the state-of-
the-art RML-compliant engines during KG creation?
RQ2) What is the impact of the mapping assertions and volume of the data sources
on execution time and memory consumed by KG creation engines?
RQ3) What is the impact of data source fragmentation on the execution time of the
mapping assertions?
Benchmarks. The benchmark is built from real-world datasets created in the context
of the EU H2020 funded project PLATOON.5 The Renewable Energy Resource dataset
(RES) [16] comprises two mapping files containing five RML triples maps. One
of the mapping files consists of five mapping rules defining subjects and 20 mapping
rules defining properties using other triples maps. The other mapping file has
five mapping rules defining subjects and 10 mapping rules defining properties. The
source used is a MySQL data table containing data measured from a solar array
farm over almost 7 years; it presents measurements regarding panel temperature,
insulation, etc., and indicates from which plant it was measured and a timestamp.
The data table has 35,263,490 rows (ca. 20 GB). Only 5,000,000 rows will be used
instead of all the data source rows. This decision was made so that the experiments
could be executed in a reasonable time.
RML Engines. These are the KG creation engines used for the experiments:
RMLMapper v4.12,6 Morph-KGC v2.4.0,7 and SDM-RDFizer v4.6.7.8
Metrics. Execution time is the metric considered to determine the performance of the
RML engines. Execution time is the elapsed time required to fragment the raw data,
execute the partitions, and combine the smaller KGs to generate the
intended KG. The partitioned data sources are executed in parallel, and the execution
of the partitions accounts for most of the execution time. It is measured as the
absolute wall-clock system time, as reported by the time command of the Linux
operating system. The timeout is 15 hours. Memory usage is the maximum
memory used to generate the KG. It was measured using the Python library
malloc, which reports memory usage in kilobytes; the results
are converted to megabytes for ease of understanding.
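As a rough illustration of how such per-run metrics can be collected inside a Python process, the sketch below uses the standard time and tracemalloc modules; the exact tooling used for the experiments may differ, and tracemalloc reports bytes, which are converted here to megabytes:

```python
import time
import tracemalloc

def measure(fn, *args):
    """Run `fn(*args)` and return (result, elapsed_seconds, peak_megabytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak_bytes / (1024 * 1024)
```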

5 https://platoon-project.eu/.
6 https://github.com/RMLio/rmlmapper-java.
7 https://github.com/morph-kgc/morph-kgc.
8 https://github.com/SDM-TIB/SDM-RDFizer.

Implementation. KatanaG is implemented in Python 3.8. The code is available in our
GitHub repository.9 The experiments are executed on an Intel(R) Xeon(R) Platinum
8160 CPU @ 2.10 GHz with 24 cores and 754 GB RAM, running
Ubuntu 16.04 LTS.
Table 2 reports the execution time of the RML engines on these datasets. As
observed, Morph-KGC has the best execution time; however, it presents high memory
usage. This can be attributed to the fact that Morph-KGC uses the pandas Python
library, which is known to use a great deal of memory. While the SDM-RDFizer does
not have the best execution time, it presents better memory usage. Afterward, to see
whether there is an improvement in the performance of the engines, KatanaG is applied.
Furthermore, Table 3 presents the results on memory usage while the three engines
execute the data integration system generated by KatanaG. In terms of memory,
Morph-KGC exhibits the largest savings (a 71.43% reduction in memory usage).
As seen in Tables 2 and 3, the data fragmentation methods of KatanaG positively impact
the performance of the studied RML engines (RQ1). RMLMapper especially shows
significant improvement: it timed out when executing the whole data source, but
when using data fragmentation, it completed the KG creation process. By partitioning
the data source, the amount of uploaded data is reduced. Thus, following what is
established in Chaves-Fraga et al. [6], small data sources speed up the KG
creation process (RQ2). Finally, triples maps with small data sources are executed
faster, and these executions consume less memory than the ones with large data
sources (RQ3).

5 Related Work

5.1 Mapping Languages and KG Creation Frameworks

A KG can be generated by semantifying and integrating heterogeneous data into an
RDF data model; multiple tools and techniques have been developed for this purpose.
To allow flexible and transparent transformation, declarative mapping languages are
defined to map data into concepts of a unified schema or ontology and transform
them into RDF. R2RML [14] and its extension RDF Mapping Language (RML) [12]
are two of the most popular mapping languages. R2RML is recommended by the
World Wide Web Consortium (W3C). Multiple KG creation engines use R2RML
and RML to generate a KG such as RMLMapper [8], SDM-RDFizer [10], Rock-
etRML [9], CARML [17], and Morph-KGC [11]. All these engines utilize strategies
for generating KGs that allow them to improve certain aspects of the creation process,
like duplicate removal, join execution, etc. Unfortunately, some strategies come with
higher memory usage.

9 https://github.com/SDM-TIB/KatanaG.

5.2 Mapping Partitioning

As mentioned earlier in this work, the complexity of a triples map affects the per-
formance of the KG creation engine executing it. In other words, more complex
mappings require more resources to transform. Therefore, Mapping Partitioning
seeks to tackle the complexity of a mapping by dividing it into smaller and much
simpler ones. Different approaches have seen the benefits of using mapping parti-
tioning for KG creation. Iglesias et al. [13] apply mapping partitioning by grouping
triples maps by data sources and then generating new mapping files from the triples
maps grouping. An execution plan in the form of a bushy tree plan is defined by
determining which groupings have overlapping properties and whether joins exist
between them. The leaves represent the execution of the mappings, and the inner
nodes are union operators that combine the resulting RDF triples from the leaves.
Additionally, if there exists overlapping properties between the leaves, a duplicate
removal process is applied. Morph-KGC [11] also utilizes mapping partitioning but
in a different manner. Morph-KGC divides each triples map into as many smaller
triples maps as it has rr:predicateObjectMap properties. For example, if a
triples map has three rr:predicateObjectMap, Morph-KGC partitions the
triples map into three smaller triples maps, where each new triples map has one
rr:predicateObjectMap
from the original mapping and its rr:subjectMap. SDM-RDFizer [10] does not
apply mapping partitioning, but it does create an execution plan for the triples maps
by determining which triples maps have overlapping predicates and then executing
first those with the highest overlap.
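The overlap-based ordering used by the SDM-RDFizer can be approximated with a short sketch; the helper below is hypothetical and assumes each triples map has been reduced to the set of predicates it defines:

```python
def execution_order(triples_maps):
    """Order triples maps so that those sharing the most predicates with the
    other maps are executed first. `triples_maps` maps a name to the set of
    predicates that the triples map defines."""
    def overlap(name):
        predicates = triples_maps[name]
        # Total number of predicates shared with every other triples map.
        return sum(len(predicates & other)
                   for other_name, other in triples_maps.items()
                   if other_name != name)
    return sorted(triples_maps, key=overlap, reverse=True)
```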

5.3 Data Source Fragmentation

Data fragmentation seeks to reduce a data source into various smaller data sources.
In the context of KG creation, smaller data sources take less time and memory to
transform into a KG. Multiple approaches have surfaced over the years that utilize
data fragmentation to improve the KG creation process. MapSDI [18] utilizes a ver-
tical fragmentation method, where the data source of a triples map is projected in
such a way that only the attributes the triples map needs remain; duplicate records
are also removed from the projected data source. SDM-RDFizer [10]
applies vertical fragmentation when transforming CSV files and relational databases
when only the table name is provided. Morph-KGC [11] presents a hybrid approach
to data fragmentation. Since Morph-KGC divides each triples map into smaller triples
maps, the original data source must be adapted to the new triples map. For that rea-
son, Morph-KGC applies vertical fragmentation to project the data source. Then,
it applies horizontal fragmentation by dividing the projected data sources into sub-
groups and transforming them. Unfortunately, Morph-KGC does not have criteria
for partitioning the data source, leaving it to the Python library pandas.

6 Conclusions and Future Work

We have described KatanaG and illustrated the need for scaling up the KG creation
process in the energy domain. Albeit initial, these results put into perspective the
effects of performing optimization techniques in state-of-the-art RML engines. In the
future, we aim to formalize this process further and substantiate our proposed meth-
ods’ effectiveness. Additionally, we are committed to creating benchmarks accessi-
ble to the scientific community, facilitating the assessment of the data management
techniques implemented by the community to scale up the process of KG creation.

Acknowledgements This work has been partially supported by the EU H2020 project PLATOON
and the Federal Ministry for Economic Affairs and Energy of Germany (BMWK) in the project
CoyPu (project number 01MK21007[A-L]). Maria-Esther Vidal has been supported by the project
TrustKG-Transforming Data in Trustable Insights with grant P99/2020.

References

1. Gutiérrez C, Sequeda JF (2021) Knowledge graphs. Commun ACM 64(3):96–104
2. Noy NF, Gao Y, Jain A, Narayanan A, Patterson A, Taylor J (2019) Industry-scale knowledge
graphs: lessons and challenges. Commun ACM 62(8):36–43
3. Auer S, Kovtun V, Prinz M, Kasprzik A, Stocker M, Vidal M (2018) Towards a knowledge graph
for science. In: Akerkar R, Ivanovic M, Kim S, Manolopoulos Y, Rosati R, Savic M, Badica
C, Radovanovic M (eds) Proceedings of the 8th international conference on web intelligence,
Mining and Semantics, WIMS 2018, Novi Sad, Serbia, pp 1:1:6. ACM
4. Chaudhri VK, Baru CK, Chittar N, Dong XL, Genesereth MR, Hendler JA, Kalyanpur A,
Lenat DB, Sequeda J, Vrandecic D, Wang K (2022) Knowledge graphs: introduction, history
and perspectives. AI Mag 43(1):17–29. https://doi.org/10.1609/aimag.v43i1.19119
5. Dong XL (2023) Generations of knowledge graphs: the crazy ideas and the business impact.
Proc VLDB Endow 16(12):4130–4137
6. Chaves-Fraga D, Endris KM, Iglesias E, Corcho Ó, Vidal M (2019) What are the parameters that
affect the construction of a knowledge graph? In: OTM confederated international conferences
7. Lenzerini M (2002) Data integration: a theoretical perspective. In: Popa L, Abiteboul S, Kolaitis
PG (eds) Proceedings of the symposium on principles of database systems. https://doi.org/10.
1145/543613.543644
8. Dimou A, De Nies T, Verborgh R, Mannens E, Van de Walle R (2016) Automated metadata
generation for linked data generation and publishing workflows. In: Workshop on linked data
on the web
9. Şimşek U, Kärle E, Fensel D (2019) RocketRML—A NodeJS implementation of a use-case
specific RML mapper. Accessed 24 June 2022
10. Iglesias E, Jozashoori S, Chaves-Fraga D, Collarana D, Vidal ME (2020) SDM-RDFizer: an
RML interpreter for the efficient creation of RDF knowledge graphs. In: CIKM ’20: The
29th ACM international conference on information and knowledge management, virtual event,
Ireland, https://doi.org/10.1145/3340531.3412881
11. Arenas-Guerrero J, Chaves-Fraga D, Toledo J, Pérez MS, Corcho O (2022) Morph-KGC:
scalable knowledge graph materialization with mapping partitions. Semantic Web. https://doi.
org/10.3233/SW-223135
12. Dimou A, Vander Sande M, Colpaert P, Verborgh R, Mannens E, Van de Walle R (2014) RML: a
generic language for integrated RDF mappings of heterogeneous data. In: Workshop on linked
data on the web

13. Iglesias E, Jozashoori S, Vidal M (2023) Scaling up knowledge graph creation to large and
heterogeneous data sources. J Web Semant 75:100755. https://doi.org/10.1016/j.websem.2022.
100755
14. R2RML: rdb to rdf mapping language. https://www.w3.org/TR/r2rml/ (2012)
15. Özsu MT, Valduriez P (1999) Principles of distributed database systems. Second Edition.
Prentice-Hall
16. Janev V, Vidal ME, Pujić D, Popadić D, Iglesias E, Sakor A, Čampa A (2022) Responsible
knowledge management in energy data ecosystems. Energies 15(11). https://doi.org/10.3390/
en15113973
17. Maria P (2022) CARML: a pretty sweet rml engine. https://github.com/carml/carml
18. Jozashoori S, Vidal M (2019) Mapsdi: a scaled-up semantic data integration framework for
knowledge graph creation. In: ODBASE
On the Potential of Sustainable Software
Solutions in Smart Manufacturing
Environments

Simon Paasche and Sven Groppe

Abstract Managing and using business data is considered a key success factor
of modern companies and enterprises. Application domains range from the medi-
cal sector (cf. smart healthcare) to automated driving and digital manufacturing (cf.
smart manufacturing and German Industry 4.0). Internet of things (IoT) landscapes
are often built for data acquisition and process monitoring, in which many small dis-
tributed devices collect information about business-relevant processes. After data col-
lection, so-called data-driven applications are used, for example, to support planning
activities and strategic decisions, to optimize internal processes, or to help uncover
sources of error and thus reduce error rates during production. In the latter two cases,
the use of digital technologies also results in more sustainable resource utilization
of important raw materials. At the same time, digital technologies are responsible
for a significant share of energy consumption. Based on a country’s energy mix, ICT
applications are thus responsible for a non-negligible share of climate-damaging
emissions such as carbon dioxide (CO2). These emissions are widely considered
avoidable, as they have been shown to contribute negatively to climate change. In
our work, we highlight approaches to make ICT more sustainable. To this end, we
address, among other things, data validation in smart manufacturing and explain the
advantages and disadvantages of semantic methods.

Keywords Green computing · Data validation · Smart manufacturing

S. Paasche
Automotive Electronics, Robert Bosch Elektronik GmbH, 38228 Salzgitter, Germany
S. Paasche (B) · S. Groppe
Institute of Information Systems, University of Lübeck, 23562 Luebeck, Germany
e-mail: [email protected]
S. Groppe
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 25
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_3
26 S. Paasche and S. Groppe

Fig. 1 The number of networked devices has been growing steadily for years and exceeds the
world’s population. The figure is taken from a keynote by Prof. Dr. Sven Groppe (University of
Lübeck)

1 Introduction

Smart technologies help us to perform everyday tasks, help companies to optimize
internal processes and identify sources of error, and provide us with the informa-
tion we need to make decisions. Nowadays, so-called data-driven technologies are
used for this purpose [8]. The areas of application range from smart healthcare to
autonomous driving and intelligent manufacturing. In all these scenarios, the use of
information and communication technology (ICT) applications aims to support users
in their work.
With the ubiquity of ICT, there is an increasing trend of networked devices
(see Fig. 1). As the figure shows, the number of connected devices far exceeds the
size of the world’s population, as nearly every person today owns devices such as
smartphones, smartwatches, smart home gadgets, and streaming-capable TVs.
As the number of devices increases, the volume of data to be processed also
increases.1 While this was about 2 zettabytes in 2010, it is estimated to be about 181
zettabytes in 2025 (increase by a factor of nearly 100).
Although ICT applications simplify daily processes, the use of these applications
has a downside in terms of environmental costs that should not be underestimated.
According to Geiger et al. [1], the energy consumption of ICT has been increasing
continuously for years. In 2020, its share of global consumption was between 1
and 3.2%. This corresponds to about 236 terawatt-hours (TWh) and 756 TWh.2 For
comparison: 1 kilowatt-hour allows the preparation of about 70 cups of coffee.3 For
the year 2030, the share is projected to be about 23% [1], which, assuming an overall

1 https://www.statista.com/statistics/871513/worldwide-data-created/.
2 https://yearbook.enerdata.net/electricity/electricity-domestic-consumption-data.html.
3 https://www.verivox.de/strom/themen/1-kilowattstunde/.
On the Potential of Sustainable Software Solutions … 27

consumption of 23,653 TWh (year 2020), means over 5,000 TWh. Since energy
production is not yet climate-neutral, ICT is responsible for tons of climate-damaging
emissions, depending on the energy mix of a country.
Keeping these emissions as small as possible by reducing computing resources is
a major task we as computer scientists and software developers have to face during
the next years and decades.
Our work is structured as follows: Sect. 2 introduces general techniques and meth-
ods to integrate sustainability in software projects. Afterward, we exemplify these
methods with a use case from the smart manufacturing area. In Sect. 4, we discuss the
main development decisions of our use case. Finally, we conclude our work in Sect. 5.

2 Green Computing

One approach to saving software system resources is offered by green computing
techniques. According to Uddin et al. [9], green computing (also green IT) describes
ways to use and develop ICT in energy-efficient and eco-friendly ways.
Figure 2 shows the main areas to consider, when we aim to develop sustainable
systems. First, we should make sure that the problem is relevant and that its solution
adds value to our work. These considerations also help us to ensure that our algorithms
perform only necessary computations. This is particularly important because every
calculation requires CPU and memory.

Fig. 2 Main areas of Green Computing: Problem, Hardware, and Software

Fig. 3 Hardware trends over the past 50 years

Fig. 4 Power in Watt (W) per transistor over the past 50 years

The next step is to determine what hardware we have or require. In this process,
we should also take into account the environmental costs that the production of
new hardware entails. In general, it can be seen from Fig. 3 that hardware is steadily
becoming more efficient.4 The number of transistors and thus the chip performance has
increased almost exponentially, whereas energy consumption has remained almost
constant over the past 20 years.
This trend is also reflected in the energy consumption per transistor (see
Fig. 4).5 This has fallen exponentially over the past decades.
The final step is to design the software in a sustainable manner. This includes the
selection of the programming language, the skillful integration of existing frame-
works, and the general software architecture and design patterns. The goal should
be to use techniques that are as resource-efficient as possible (cf. semantics vs.
automaton in Sect. 3) as well as architectures that are easy to extend.

4 Adapted from: https://github.com/karlrupp/microprocessor-trend-data.
5 Derived from: https://github.com/karlrupp/microprocessor-trend-data.

3 Use Case: Smart Manufacturing

Our use case addresses data validation in manufacturing lines. Figure 5 illustrates
our problem. Smart machines continuously generate data during manufacturing. By
means of a data validator, we want to ensure that only consistent and error-free
datasets are stored, since only this clean data enables meaningful analyses. A tradeoff
between the cost to store valid data and the benefit of clean data arises [6]. In order
to address this tradeoff, we have made our initial consistency checker (CC) more
efficient step by step in an iterative development process.
Our FullCC uses ontologies to map our machine data into a semantic graph struc-
ture [3]. For this we use the Resource Description Framework6 (RDF). This graph
structure is traversed in the actual validation step by using SPARQL Protocol And
RDF Query Language7 (SPARQL) queries. The queries contain the characteristics
of the discrepancies we know about.
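Conceptually, such a SPARQL validation query scans the graph for resources that match a known discrepancy pattern. The sketch below emulates one such check in plain Python over (subject, predicate, object) tuples; the property names are invented for illustration and do not come from the paper:

```python
def find_incomplete_messages(triples, required_props):
    """Return subjects typed as :Message that lack a required property.

    `triples` is an iterable of (subject, predicate, object) tuples; this is
    the kind of discrepancy pattern a SPARQL validation query could encode,
    e.g. with a FILTER NOT EXISTS clause.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return sorted(
        s for s, props in by_subject.items()
        if props.get("rdf:type") == ":Message"
        and any(req not in props for req in required_props)
    )
```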
With our GreenCC [5], we have sacrificed accuracy and functionality by using
a heuristic approach (see Fig. 6). Our LightCC predicts the number of expected
messages and detects time frames in which inconsistencies are more likely. In these
cases, our FullCC can be activated to perform an exact check.
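The LightCC idea, comparing the observed message count per time window against the expected count and flagging deviating windows for an exact check, can be sketched as follows. This is illustrative only: the actual predictor is more elaborate, and completely empty windows would additionally require the prediction step:

```python
def suspicious_windows(timestamps, window, expected):
    """Group message timestamps into fixed-size time windows and flag every
    window whose message count deviates from `expected`, i.e. time frames
    with missing or duplicated messages in which an exact (FullCC-style)
    check would be triggered."""
    counts = {}
    for t in timestamps:
        key = int(t // window)
        counts[key] = counts.get(key, 0) + 1
    return sorted(w for w, c in counts.items() if c != expected)
```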
Fig. 5 Tradeoff between costs for data collection and benefit of gathered data

6 https://www.w3.org/RDF/.
7 https://www.w3.org/TR/sparql11-query/.

Fig. 6 The LightCC monitors an incoming stream for missing and multiple messages. In time
frames with a higher likelihood of inconsistencies, it initiates an accurate check

Table 1 Daily energy consumption in kilowatt-hours (kWh) and operating costs for a small,
medium, and large manufacturing plant

Approach                        Small plant per day      Medium plant per day     Large plant per day
Flink (all)                     1.949 kWh (24.69 Cent)   7.308 kWh (92.59 Cent)   12.180 kWh (154.32 Cent)
LightCC with Change Detection   1.229 kWh (15.57 Cent)   4.608 kWh (58.38 Cent)    7.680 kWh (97.31 Cent)
FullCC (1&2)                    1.251 kWh (15.85 Cent)   4.692 kWh (59.45 Cent)    7.820 kWh (99.08 Cent)
FullCC (all)                    1.261 kWh (15.97 Cent)   4.728 kWh (59.90 Cent)    7.880 kWh (99.84 Cent)
Automaton (all)                 0.963 kWh (12.20 Cent)   3.612 kWh (45.76 Cent)    6.020 kWh (76.27 Cent)

As the results in Table 1 show, this step has a positive effect on the energy demand
(LightCC vs. FullCC). The direct impact for a single plant is relatively small, but given
the number of validators to be deployed, the saving is significant.
In a subsequent extension, we have designed an automaton structure that enables
immediate validation and thus keeps the amount of cached data to a minimum [4].
In addition, validation of the content no longer takes place via SPARQL queries but
via template matching. These adjustments enable further energy savings.
Overall, our extensions reduce the initial daily energy requirement in a
medium-sized plant from 7.308 kWh (Flink) to 3.612 kWh (Automaton).
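A minimal finite-state validator of this kind might look as follows. The states and message types are invented for illustration and do not reflect the actual message protocol:

```python
# Allowed transitions of a miniature message-sequence validator: every cycle
# must follow START -> MEASURE* -> END; anything else is flagged immediately,
# so no message has to stay cached for a later batch check.
TRANSITIONS = {
    ("idle", "START"): "running",
    ("running", "MEASURE"): "running",
    ("running", "END"): "idle",
}

def validate_stream(messages):
    """Feed a message stream through the automaton and return the indices
    of messages that violate the expected order."""
    state, errors = "idle", []
    for i, message in enumerate(messages):
        next_state = TRANSITIONS.get((state, message))
        if next_state is None:
            errors.append(i)   # invalid transition: flag it, keep the state
        else:
            state = next_state
    return errors
```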

4 Discussion

In our discussion, we take a closer look at advantages and disadvantages of our
approaches and present possible solutions to overcome limitations. By using seman-
tic technologies, we aim to exploit the advantages of the semantic web, in particular,
a standardized and simplified access to data [2]. This makes it easy to model envi-
ronments, for example, a smart manufacturing line. The representation in a graph
explicitly shows relations and how data belong together. Environmental changes can
be easily incorporated by adding, removing, or adjusting nodes and relations, making
the semantic approach maintainable and customizable.
However, to obtain a well-structured graph, all data must be converted into seman-
tic RDF format. For a static or slightly dynamic data landscape, this does not pose any
particular difficulty. When working in highly dynamic big data landscapes, semantic
approaches quickly reach their limits. In our use case, for example, this means that
every single machine message of a continuous data stream must be transformed. The
resulting overhead can be seen in Table 1.
To increase efficiency, we have proposed an automaton structure and template
matching. In this way, data is validated as quickly as possible to keep cached data
volumes low. However, this approach increases complexity of our software and thus
affects maintainability and adaptability.
To combine the advantages of the two approaches, we propose a mapping of
semantics into a suitable automaton structure. In this way, modeling can be done
clearly using ontologies. During operation, validation is done efficiently via the
automaton. The internal complexity is hidden from the user.
Another aspect is the choice of programming language. A lightweight language,
such as C or C++, offers an enormous efficiency advantage over Python [7]. Python,
on the other hand, is easier to use and thus offers advantages in a dynamic enter-
prise landscape. To exploit both advantages, mappings from Python to C/C++ are
needed. Alternatively, the efficiency of languages like Python could be improved or
the handling of C/C++ could be simplified.

5 Conclusion

Our work presented approaches to green computing applied in a smart manufactur-
ing scenario. Overall, the results show that computing resources and thus climate-
damaging emissions can be saved through the sustainable orientation of software
projects. New methods are not always needed to increase efficiency, as the use of an
automaton structure for data validation shows.
However, with the use of ICT, we would also like to reduce the user’s workload and
support them in the development process. In our discussion, we have identified two
approaches to this for future work. In particular, we aim to combine the advantages
of efficient and easy-to-use technologies.

Acknowledgements This work has been supported by AE/MFT1 department of Robert Bosch
Elektronik GmbH.

References

1. Geiger L, Hopf T, Loring J, Renner M, Rudolph J, Scharf A, Schmidt M, Termer F (2021)
Ressourceneffiziente Programmierung
2. Kharlamov E, Mailis T, Mehdi G, Neuenstadt C, Özçep Ö, Roshchin M, Solomakhina N, Soylu
A, Svingos C, Brandt S et al (2017) Semantic access to streaming and static data at siemens. J
Web Semant 44:54–74
3. Paasche S, Groppe S (2022) Enhancing data quality and process optimization for smart man-
ufacturing lines in industry 4.0 scenarios. In: Proceedings of the international workshop on
big data in emergent distributed environments. BiDEDE ’22, Association for Computing
Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3530050.3532928
4. Paasche S, Groppe S (2023) A finite state automaton for green data validation in a real-world
smart manufacturing environment with special regard to time-outs and overtaking. Future
Internet 15(11). https://doi.org/10.3390/fi15110349, https://www.mdpi.com/1999-5903/15/
11/349
5. Paasche S, Groppe S (2023) Greencc: a hybrid approach to sustainably validate manufacturing
data in industry 4.0 environments. In: Proceedings of the 12th international conference on data
science, technology and applications (DATA), Rome, Italy
6. Paasche S, Groppe S (2023) Poster: handling inconsistent data in industry 4.0. In: Proceedings
of the 17th ACM international conference on distributed and event-based systems, pp 180–181
(2023). https://doi.org/10.1145/3583678.3603281
7. Pereira R, Couto M, Ribeiro F, Rua R, Cunha J, Fernandes JP, Saraiva J (2017) Energy efficiency
across programming languages: how do energy, time, and memory relate? In: Proceedings of the
10th ACM SIGPLAN international conference on software language engineering, pp 256–267
8. Tao F, Qi Q, Liu A, Kusiak A (2018) Data-driven smart manufacturing. J Manuf Syst 48:157–
169
9. Uddin M, Rahman AA (2012) Energy efficiency and low carbon enabler green it framework
for data centers considering green metrics. Renew Sustain Energy Rev 16(6):4078–4094
Technologies and Concepts
for the Next-Generation Integrated
Energy Services

Valentina Janev, Lazar Berbakov, Marko Jelić, Dea Jelić, and Nikola Tomašević

Abstract In recent years, as part of the European Union’s initiatives to help combat
climate change and reduce greenhouse gas emissions, the Citizen Energy Commu-
nities (CEC) concept was promoted with a primary objective to enhance the self-
consumption of locally produced renewable energy. The integration of distributed
energy resources (DERs) requires the orchestration of tools and services on edge and
cloud levels. This paper describes an approach to establish and validate an SGAM-
compliant software platform with deployed data-driven services for holistic control
and energy dispatch optimization. The platform, developed by and deployed at the
Institute Mihajlo Pupin (IMP), has been tested for a CEC from Spain in the NEON
project framework. As part of future work, additional short-, mid-, and long-term
planning services will be integrated and tested using data from the IMP campus.

Keywords SGAM architecture · Interoperability · Services · API · KPIs ·


Standards

1 Introduction

In the past few years, particularly in Europe, a significant number of measures have
been taken to develop and validate future scenarios that target the “Net Zero CO2
Emissions by 2050” goals. According to the International Energy Agency, the energy
sector is responsible for around three-quarters of global greenhouse gas (GHG)
emissions [1] and hence the uptake of all the available technologies and emissions
reduction options is crucial for the implementation of the foreseen decarbonization
scenarios.

V. Janev (B) · L. Berbakov · M. Jelić · D. Jelić · N. Tomašević


Institute Mihajlo Pupin, Volgina 15, Belgrade, Serbia
e-mail: [email protected]
M. Jelić · D. Jelić
School of Electrical Engineering, University of Belgrade, Belgrade, Serbia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_4
The focus of this paper is the electricity value chain. In the centralized system
(as it used to be in the twentieth century), electricity is produced through the gener-
ation system (see Fig. 1, part 1), transported through the transmission system (see
Fig. 1, part 2), and is distributed to the end users through the distribution system (see
Fig. 1, part 3). Nowadays, with solar and wind power on the rise and integrated with
consumption devices, there is a need for new equipment and monitoring and control
systems to make the whole power system operate flexibly. Smart Energy Management
(SEM) refers to a variety of novel concepts and technologies, serving both the energy
generation and consumption sides, such as energy efficiency, demand management,
Smart Grid, micro-grids, renewable energy sources, and other emerging solutions.
SEM tools are built upon advanced edge-cloud computing frameworks, Big Data
Analytics techniques, AI-driven methodologies, novel integration approaches based
on semantic technologies, and others. SEM solutions are deployed on the consumers’
side (buildings, districts) in order to achieve holistic optimization of the use of locally
distributed energy resources (wind, solar, EV charging stations, batteries), improve
the self-consumption, and lower the costs of electricity used from the grid. European
Union legislation refers to these initiatives as Energy Communities or Citizen Energy
Communities (CEC). CECs vary in size, configuration, and capacities in terms of
the renewable energy sources involved, as well as other devices deployed, including
energy storage batteries, energy consumption devices, and green hydrogen produc-
tion devices, among others. The primary objective shared by these initiatives is to
enhance the self-consumption of locally produced renewable energy.
This paper discusses the approach of building and deploying a software platform
that will enable and enhance monitoring and control of smart communities. It is
organized as follows. Section 2 explains the topic of smart communities; Sect. 3
presents the process of design and deployment of an SGAM-compliant platform
at the Institute Mihajlo Pupin and Sect. 4 discusses the approach for platform and
services validation.

Fig. 1 Integration of RES in the electricity value chain



2 Motivation: Smart Community

In recent years, Europe has seen a notable increase in the number of citizen-led
energy initiatives focused on producing, distributing, and consuming energy from
renewable energy sources (RES) at a local level. Grid operators (distribution and
transmission) stand to gain from these initiatives, for instance through reduced
maintenance and operation costs resulting from improved grid stability and lower
transmission losses, courtesy of the increased hosting capacity for local renewable
energy sources. However, in order to establish
a smart community, substantial involvement of end users and citizens is needed.
Service providers, which may include ICT companies specializing in integrating
various energy services, can also derive benefits from these initiatives. They may earn
service fees based on the contracted share of energy savings and receive payments for
providing unlocked flexibility and automated demand response (DR) mechanisms [2]
under Energy Performance Contracting (EPC) [3] and Pay-for-Performance (P4P)
arrangements [4] established with utilities. These initiatives often generate local jobs,
ranging from the installation and maintenance of renewable energy systems to the
development of innovative technologies and services [5]. Moreover, they encourage
entrepreneurship and foster a supportive ecosystem for local businesses, such as
renewable energy equipment suppliers, energy consultants, and energy efficiency
specialists.
Figure 1 illustrates an example of a control center established to integrate energy
services, supervise self-consumption, dispatch electricity in the smart community,
and control the export to the main grid. Examples of services that have to be deployed
in such centers are given in Table 1.

Table 1 Examples of smart energy management services and applications


Actors Services/applications Reference
Generation RES production forecast [6]
RES effects calculation
Transmission Electricity balancing [7]
AI algorithms for optimized grid planning [8]
Distribution Reactive power distribution optimization [9]
Prosumer Non-intrusive load monitoring [10]
Energy efficient buildings system [11]
Residential demand response [12]
Prosumer energy system planning [13]

3 Designing an SGAM-Compliant Platform

The Smart Grid Architecture Model (SGAM) is a three-dimensional architectural
framework that can be used to model interactions (mostly exchanges of information)
between the different entities of the smart energy arena [14]. The model does not
specify which components to apply in order to build a software platform for CEC
monitoring and control; it does, however, structure the knowledge related to the
implementation of services in the energy sector.
Hence, the Institute Mihajlo Pupin team leveraged the SGAM model to imple-
ment approaches for seamless connectivity between the physical energy assets and
integration with diverse data-driven services in the CEC ecosystem. By leveraging
advanced technologies and data analytics capabilities, the proposed software plat-
form empowers CECs to optimize their energy management strategies and enhance
the overall performance of renewable energy assets. It enables real-time monitoring
and control of energy generation, consumption, and storage systems, allowing for
efficient allocation and utilization of resources. The platform also supports the inte-
gration of emerging technologies such as demand response mechanisms, energy fore-
casting algorithms, and grid optimization tools, enabling CECs to actively participate
in grid balancing and provide valuable flexibility services.
The design of the platform architecture is a result of analysis and consideration
of various standard-enabling technologies and practices. These include cloud-based
infrastructures, service-oriented architectures, blockchain technology, flexibility and
loosely coupled design principles, interoperability, security and privacy by design,
and configuration management. By incorporating these elements, the platform archi-
tecture aims to create a robust and scalable foundation for the integration of diverse
CEC energy services.
By aligning with COSMAG and SGAM, the platform architecture ensures
compatibility and harmonization with existing smart grid infrastructures, enabling
seamless integration and interoperability between CECs and the broader energy
grid ecosystem. Moreover, an “ethics by design” approach is followed to guarantee
compliance with the European ethical and legal framework. This approach encom-
passes adherence to regulations such as the NIS (Network and Information Security)
Directive, eIDAS (electronic Identification, Authentication, and Trust Services), and
GDPR (General Data Protection Regulation). By integrating ethical considerations
from the early stages of design, the platform architecture prioritizes data protec-
tion, security, and privacy. This approach ensures that the platform safeguards the
personal and sensitive information of individuals while promoting transparency and
accountability in data handling processes.
In addition to legal and ethical compliance, the platform architecture empha-
sizes the importance of configuration management. This aspect involves effectively
managing and controlling the various configurations and settings of the platform to
ensure optimal performance and adaptability. Through robust configuration manage-
ment practices, the architecture enables efficient customization and adaptation of
the platform to suit the specific needs and requirements of different CECs, while
maintaining stability, reliability, and consistency.

Fig. 2 Platform architecture (SGAM interoperability layers)
In Fig. 2, we present the platform architecture. Business Layer encompasses the
applications and dashboards that facilitate the management and visualization of data.
This layer focuses on providing user-friendly interfaces and tools, on one side for
RES Production sizing and planning, and on the other, for CEC monitoring and
control of electricity and financial data. The financial data is related to the business
arrangements and contracting mechanisms.
Function Layer constitutes a crucial aspect of the platform architecture, as it plays a
vital role in enabling the desired energy management capabilities and services within
CECs. Example services that are part of this layer are
• Self-consumption management tool,
• RES Production forecasting,
• Flexibility forecasting,
• Non-intrusive load monitoring,
• User energy efficiency benchmarker,
• Holistic energy dispatch optimization, and
• Flexible assets consumption dispatcher.
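As a toy illustration of the simplest possible member of this layer, a persistence baseline for RES production forecasting assumes that tomorrow's hourly profile repeats the last observed day. This is an illustrative sketch only, not the forecaster deployed on the platform, and the hourly PV data is hypothetical:

```python
# Naive persistence baseline for day-ahead RES production forecasting:
# assume tomorrow's hourly profile repeats the last observed day.

def persistence_forecast(history, horizon=24):
    """Forecast `horizon` hours ahead by repeating the last 24 observed hours."""
    if len(history) < 24:
        raise ValueError("need at least one full day of hourly history")
    last_day = history[-24:]
    return [last_day[h % 24] for h in range(horizon)]

# Hypothetical hourly PV output in kW: one day, then an identical second day.
history = [0] * 6 + [1, 3, 5, 6, 6, 5, 4, 2, 1] + [0] * 9
history = history * 2

forecast = persistence_forecast(history)
print(forecast[:12])  # → [0, 0, 0, 0, 0, 0, 1, 3, 5, 6, 6, 5]
```

Such baselines are typically used only to benchmark the data-driven forecasters listed above; any practical forecaster must beat persistence to justify its complexity.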

Information Layer is responsible for managing the information used and


exchanged between different functions and services. It serves as a crucial communi-
cation hub, ensuring the seamless flow of data across various aspects of the project.
For instance, based on the analysis of existing semantic models already in use, such
as CIM [14], SAREF [15], SEAS [16], and DCAT [17], a knowledge graph has been
created; see the authors’ previous work [18].
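To illustrate the kind of information such a layer manages, the sketch below represents CEC asset readings as subject–predicate–object triples using SAREF-style terms and answers a simple pattern query. This is an illustrative toy, not the authors' knowledge graph; the `cec:` identifiers and readings are hypothetical, and a real deployment would use an RDF store rather than a Python list:

```python
# Toy triple representation of CEC asset readings using SAREF-style terms.
SAREF = "https://saref.etsi.org/core/"

triples = [
    ("cec:battery1", SAREF + "measuresProperty", "cec:stateOfCharge"),
    ("cec:battery1", SAREF + "hasValue", 0.72),
    ("cec:pvPlant1", SAREF + "measuresProperty", "cec:powerOutput"),
    ("cec:pvPlant1", SAREF + "hasValue", 14.5),   # kW, hypothetical reading
]

def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Which properties does battery1 measure?
props = match("cec:battery1", SAREF + "measuresProperty")
print(props)
```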
Communication Layer focuses on defining the protocols and mechanisms neces-
sary for the interoperable exchange of information between the different compo-
nents. This layer ensures that the various systems and devices involved can commu-
nicate and share data effectively, promoting interoperability and seamless integra-
tion. Component Layer pertains to the physical distribution of all the participating
components within the smart grid context. This layer encompasses the deployment
of hardware and software components across the CECs, enabling the realization of
the CEC project’s goals in a tangible and practical manner.

4 Platform Validation

The SGAM-compliant platform was validated within the EU project NEON (Next-
Generation Integrated Energy Services fOr Citizen Energy CommuNities) for the
POLÍGONO INDUSTRIAL LAS CABEZAS CEC (Spain), and will be validated in the
OMEGA-X (Orchestrating an interoperable sovereign federated Multi-vector Energy
data space built on open standards and ready for GAia-X) project for the Institute
Mihajlo Pupin (IMP) R&D Campus.
To assess and measure the performance of the pilot sites during operation, it is
crucial to evaluate how the goals and objectives of the pilot sites are achieved. This
evaluation is carried out using scientific methodologies to provide accurate and reli-
able results. Key Performance Indicators (KPIs) provided means to quantify different
metrics and gain insights into the specific and overall performance of the CECs. The
use of KPIs allowed for a standardized and systematic approach to measuring and
evaluating the effectiveness of the solutions. The identified KPIs were categorized
into several key areas:
• Energy Efficiency KPIs account for the optimization of users’ energy usage
through the exploitation of demand flexibility and multi-carrier energy efficiency
opportunities. They focus on the benefits derived from the holistic
cooperative Demand Response (DR) strategy implemented within the CECs.
• The Economic KPIs evaluate the economic savings resulting from changes in
user behavior as a result of their engagement and energy usage following the
recommendations and services provided for the CECs and the platform.
• The Comfort KPIs assess the benefits experienced by end users in terms of
their indoor environment. They measure the improvements in comfort levels
resulting from the implementation of energy efficiency services.

• User Engagement KPIs are designed to describe the behavior and interaction of
users with the CEC services and the platform. These KPIs provide insights into
the level of engagement and participation of users within the CEC ecosystem.
• The Social KPIs explore how the required levels of flexibility intersect with social
norms and everyday practices, such as routines and family life. They also consider
the effects of CECs on health and well-being, emphasizing the social impact of
energy services/solutions.
• Environmental KPIs evaluate the impact of NEON solutions on the local envi-
ronment, focusing on aspects such as carbon footprint reduction, greenhouse gas
emissions, and other environmental indicators.
• The technical category encompasses KPIs that evaluate different technical char-
acteristics of the CEC services and systems. These KPIs provide insights into the
performance, reliability, and functionality of the technical infrastructure.
By defining and measuring these diverse categories of KPIs, one can comprehen-
sively evaluate the performance and impact of the proposed solutions. This allows
for evidence-based decision-making, continuous improvement, and the refinement
of the platform and services to ensure optimal outcomes within the CECs.
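As a concrete example of an Energy Efficiency indicator, the self-consumption ratio measures the share of locally produced energy consumed on site rather than exported, which matches the communities' primary objective stated earlier. Below is a minimal sketch; the formula is the common (production − export) / production definition and the hourly data is hypothetical, not figures from the pilot evaluation:

```python
# Self-consumption ratio: share of local production consumed on site.

def self_consumption_ratio(production_kwh, export_kwh):
    """(production - export) / production, with 0 for zero production."""
    if production_kwh <= 0:
        return 0.0
    return (production_kwh - export_kwh) / production_kwh

production = [0, 0, 1.2, 3.5, 5.0, 4.1, 2.0, 0.3]   # hourly PV output, kWh
export     = [0, 0, 0.2, 1.5, 2.8, 1.9, 0.4, 0.0]   # hourly grid export, kWh

scr = self_consumption_ratio(sum(production), sum(export))
print(f"Self-consumption ratio: {scr:.2%}")  # ≈ 57.76%
```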

5 Discussion

The platform was designed, installed, and tested at the Institute Mihajlo Pupin
premises in the NEON project framework, and has been adopted for the forthcoming
activities in SINERGY [19] and OMEGA-X projects [20].
Example from Spain:
In the NEON framework, this installation serves as a crucial step in the develop-
ment and validation of the platform’s capabilities. During the testing phase, services
for energy dispatch optimization, demand, and production forecasting have been
put to the test. These services focus on optimizing the dispatch and distribution of
energy resources within the platform. By analyzing the available data and utilizing
advanced algorithms for production and demand forecasting and optimization, the
energy dispatch optimization service aims to maximize the efficiency and effective-
ness of energy distribution. The data utilized in the testing process is sourced from
Spanish CEC, providing a real-world context for evaluating the performance and
functionality of the platform. Overall, the installation of the platform at the Institute
premises and the subsequent testing using data from Spain represents a significant
milestone in the development and evaluation of the NEON project.
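The decision such a dispatch service makes can be illustrated with a deliberately simplified greedy battery schedule: cover each hour's deficit from the battery first, store any PV surplus up to capacity, and import the remainder from the grid. This is a toy sketch under these simplifying assumptions, not the NEON optimizer, and all figures are hypothetical:

```python
# Greedy toy dispatch for a single battery; all quantities in kWh.

def dispatch(demand, pv, capacity, soc=0.0):
    """Return hourly grid import and the battery's final state of charge."""
    grid = []
    for d, p in zip(demand, pv):
        net = d - p                         # > 0: deficit, < 0: surplus
        if net > 0:
            discharge = min(net, soc)       # battery covers the deficit first
            soc -= discharge
            grid.append(net - discharge)
        else:
            soc = min(capacity, soc - net)  # store surplus, clip at capacity
            grid.append(0.0)
    return grid, soc

grid_import, final_soc = dispatch(demand=[2, 2, 3, 4], pv=[0, 5, 4, 1], capacity=3.0)
print(grid_import, final_soc)  # → [2.0, 0.0, 0.0, 0.0] 0.0
```

With a 3 kWh battery, the hour-2 surplus covers the hour-4 deficit entirely, so grid import occurs only in the first hour; an actual optimizer would additionally weigh prices, forecasts, and asset constraints over the whole horizon.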
Example from Serbia:
Activities in SINERGY and OMEGA-X frameworks contribute to the refinement and
enhancement of the platform’s capabilities, ensuring its suitability for deployment
within Citizen Energy Communities (CECs) and promoting the efficient management
and utilization of renewable energy resources. The IMP team is looking for strategies
to (1) reduce emissions and optimize costs, by focusing on the installation of on-site
renewable electricity and storage solutions, as well as (2) methods for integration of
EV chargers. Thermal and electric storage solutions will complement the existing
installation to maximize the use of locally produced electricity. In the scenario, a
combined district modeling with a prospective scenario of the Serbian electricity
mix and hourly electricity prices has been used.

6 Conclusion

In this paper, we have discussed the proposed solution for an SGAM-compliant


platform for integrating data-driven services and connecting physical energy assets
within CECs. The design of the platform architecture was elaborated, considering
standard-enabling technologies, interoperability solutions, and ethical and legal
compliance.
The success of an Energy Communities or Citizen Energy Communities project
depends on many factors. Besides the technical aspects discussed in this paper, the
outcome relies on the legal and regulatory frameworks in place, as well as the specific
structure and goals of each community energy project (including the citizens’ engage-
ment). Energy communities may partner with external entities, such as local govern-
ments or private companies, to facilitate the development and operation of renewable
energy projects.
As part of the future work in the case study from Serbia, additional short-, mid-,
and long-term planning services will be integrated and tested with the platform using
data from the IMP campus.

Acknowledgements This work was supported by the EU H2020 funded projects SINERGY
(Capacity building in Smart and Innovative eNERGY management, GA No. 952140), NEON (Next-
Generation Integrated Energy Services fOr Citizen Energy CommuNities, GA No. 101033700); and
OMEGA-X (Orchestrating an interoperable sovereign federated Multi-vector Energy data space
built on open standards and ready for GAia-X, GA No. 101069287).

References

1. https://www.iea.org/reports/global-energy-and-climate-model/net-zero-emissions-by-2050-scenario-nze
2. Jelić M, Batić M, Tomašević N (2021) Demand-side flexibility impact on prosumer energy system planning. Energies 14(21):7076
3. Shang T, Zhang K, Liu P, Chen Z (2017) A review of energy performance contracting business models: status and recommendation. Sustain Cities Soc 34:203–210. https://doi.org/10.1016/J.SCS.2017.06.018
4. Szinai J et al (2017) Putting your money where your meter is: a study of pay-for-performance energy efficiency programs in the United States
5. Yiasoumas G, Berbakov L, Janev V, Asmundo A, Olabarrieta E, Vinci A, Baglietto G, Georghiou GE (2023) Key aspects and challenges in the implementation of energy communities. Energies 16:4703. https://doi.org/10.3390/en16124703
6. Čampa A et al (2023) Advanced analytics at the edge. In: 2023 30th international conference on systems, signals and image processing (IWSSIP), Ohrid, North Macedonia, pp 1–5. https://doi.org/10.1109/IWSSIP58668.2023.10180252
7. Janev V, Jakupović G (2020) Electricity balancing: challenges and perspectives. In: 2020 28th telecommunications forum (TELFOR), Belgrade, Serbia, pp 1–4. https://doi.org/10.1109/TELFOR51502.2020.9306549
8. Omitaomu OA, Niu H (2021) Artificial intelligence techniques in smart grid: a survey. Smart Cities 4:548–568. https://doi.org/10.3390/smartcities4020029
9. Yang M, Li J, Du R, Li J, Sun J, Yuan X, Xu J, Huang S (2022) Reactive power optimization model for distribution networks based on the second-order cone and interval optimization. Energies 15:2235. https://doi.org/10.3390/en15062235
10. Pujić D, Tomašević N, Batić M (2023) A semi-supervised approach for improving generalization in non-intrusive load monitoring. Sensors 23:1444. https://doi.org/10.3390/s23031444
11. Berbakov L, Batić M, Tomašević N (2019) Smart energy manager for energy efficient buildings. In: IEEE EUROCON 2019—18th international conference on smart technologies, Novi Sad, Serbia, pp 1–4
12. Esnaola-Gonzalez I, Jelić M, Pujić D, Diez FJ, Tomašević N (2021) An AI-powered system for residential demand response. Electronics 10:693. https://doi.org/10.3390/electronics10060693
13. Jelić M, Batić M, Tomašević N (2021) Demand-side flexibility impact on prosumer energy system planning. Energies 14:7076. https://doi.org/10.3390/en14217076
14. Common Information Model (CIM), https://ontology.tno.nl/IEC_CIM/
15. Smart Appliances REFerence ontology (SAREF), https://saref.etsi.org/saref4ener/v1.1.2/
16. Smart Energy-Aware Systems (SEAS), https://w3id.org/seas/
17. Data Catalog Vocabulary (DCAT), https://www.w3.org/TR/vocab-dcat-2/
18. Popadić D, Iglesias E, Sakor A, Janev V, Vidal ME (2023) Toward a solution for an energy knowledge graph. In: Jain S, Groppe S, Bhargava BK (eds) Semantic intelligence. Lecture notes in electrical engineering, vol 964. Springer, Singapore. https://doi.org/10.1007/978-981-19-7126-6_1
19. SINERGY Project Pilot 3, https://project-sinergy.org/Pilot-3
20. OMEGA-X Project Pilots, https://omega-x.eu/pilots/
Trends
From Text to Voice: A Comparative
Study of Machine Learning Techniques
for Podcast Synthesis

Pankaj Chandre , Viresh Vanarote , Uday Mande ,


Mohd Shafi Pathan , Prashant Dhotre , and Rajkumar Patil

Abstract Podcasts have become an increasingly popular medium for delivering


content in recent years. However, creating high-quality podcasts can be a time-
consuming and resource-intensive task. One solution to this problem is to use machine
learning techniques to automate the process of podcast synthesis from written text. In
this paper, we present a comparative study of different machine learning techniques
for converting text to voice in the context of podcast synthesis. Specifically, we
investigate the performance of several state-of-the-art approaches, including deep
learning models such as convolutional neural networks (CNNs) and recurrent neural
networks (RNNs). We also explore the impact of various factors on the performance
of these techniques, such as the size and complexity of the input text, the quality
of the speech synthesis models, and the availability of training data. The ultimate
goal of this study is to provide insights into the effectiveness and limitations of
different machine learning techniques for podcast synthesis. This research paper
provides a comprehensive analysis of the different machine learning techniques used
for podcast synthesis from text, and their relative strengths and weaknesses. The
results of this study can help to guide the selection of appropriate techniques for
different podcasting applications.

P. Chandre (B) · V. Vanarote · U. Mande · M. S. Pathan · P. Dhotre · R. Patil


Department of Computer Science and Engg, MIT School of Computing, MIT Art Design and
Technology University, Loni Kalbhor, Pune, India
e-mail: [email protected]
V. Vanarote
e-mail: [email protected]
U. Mande
e-mail: [email protected]
M. S. Pathan
e-mail: [email protected]
P. Dhotre
e-mail: [email protected]
R. Patil
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_5

Keywords Machine learning · Podcasts · Text-to-voice · Convolutional neural


networks · Recurrent neural networks

1 Introduction

Podcasts have grown in popularity as a means of delivering news, entertainment, and


information to audiences all over the globe. The ease with which listeners can obtain
content while on the go makes accessibility one of the main reasons why podcasts
have become so popular. Writing, recording, editing, and publishing each install-
ment of a podcast, however, can be a time-consuming and labor-intensive process
[1]. This has sparked interest in automated methods for making podcasts, like using
machine learning to create audio from writing [2, 3]. However, creating high-quality
podcasts necessitates a sizable time and resource commitment, including funding for
tools, editing, and post-production [4, 5]. Machine learning has become a promising
tool for creating podcasts automatically in recent years, cutting down on production
expenses and time [6, 7]. Large amounts of data can be used to teach machine learning
algorithms to understand the nuances and patterns of human speech, enabling them
to produce synthetic voices that are nearly indistinguishable from human speakers
[8]. We offer a comparative analysis of various machine learning methods for text-to-
voice synthesis in the setting of podcast production in this paper [9]. On a dataset of
podcast transcripts, we specifically assess the efficacy of three cutting-edge machine
learning models, including recurrent neural networks, transformer models, and GPT-
style models [10]. Our goal is to examine the advantages and disadvantages of each
model in terms of audio clarity, speech naturalness, and resemblance to human
speakers [11]. In order to determine the best model for podcast synthesis, we also
examine the trade-offs between computational complexity and efficiency.

2 Literature Survey

The paper entitled “Computational intelligence in processing of speech acoustics: a


survey” by Amitoj Singh et al. [12] provides a comprehensive overview of the
application of computational intelligence techniques in the analysis and processing
of speech acoustics. The significance of speech acoustics analysis in areas like speech
recognition, speaker identification, and emotion recognition is discussed at the outset
of the essay. The author then examines how the variability and complexity of speech
signals have made the use of computational intelligence methods necessary due to
the limitations of conventional signal processing techniques. The fundamental ideas
of computational intelligence, such as neural networks, fuzzy logic, and evolutionary
algorithms, are covered in the following part of the paper. Following that, the author
provides a thorough analysis of the different computational intelligence methods used
in speech acoustics analysis, including artificial neural networks, hidden Markov
models, support vector machines, genetic algorithms, and particle swarm optimi-
sation. The application of computational intelligence methods to speech acoustics
analysis is also covered in the paper. These applications include speech recognition,
speaker identification, emotion recognition, and speech synthesis. The author offers
a thorough analysis of the literature in each of these fields, emphasising the benefits
and drawbacks of various approaches. The discussion of potential future paths for
computational intelligence research in speech acoustics analysis brings the paper to a
close. The author recommends that in order to increase the precision and robustness
of speech analysis systems, future research should concentrate on the development
of hybrid techniques that combine various computational intelligence techniques.
Overall, the paper provides a valuable resource for researchers and practitioners
working in the field of speech acoustics analysis, as well as for those interested
in the application of computational intelligence techniques in other areas of signal
processing.
The paper entitled “FastSpeech: Fast, Robust and Controllable Text to Speech” by
Yi Ren et al. [13] provides a detailed description of a new text-to-speech (TTS) system
that utilises a feed-forward transformer network to produce high-quality and control-
lable speech output. The significance of TTS technology in a variety of applications,
such as virtual assistants, audiobooks, and language learning, is discussed at the
outset of the paper. The author then discusses the drawbacks of current TTS systems,
including their slow processing speeds and absence of speech output control. The
suggested FastSpeech system’s design is covered in the following part of the paper.
This system is made up of three modules: an encoder, a duration predictor, and a mel-
spectrogram predictor. The duration predictor determines the length of each phoneme
in the input text, while the encoder turns the input text into a series of concealed repre-
sentations. A mel-spectrogram is produced by the mel-spectrogram predictor using
expected phoneme durations and hidden representations. The paper also discusses
a number of methods for enhancing the system’s performance and robustness, such
as data augmentation, teacher-student training, and a post-processing algorithm that
modifies the output speech’s pitch and speed. The author then offers a thorough anal-
ysis of the FastSpeech system, contrasting its effectiveness with several other TTS
systems that are already in use on different datasets. The findings demonstrate that
compared to current TTS systems, FastSpeech is quicker, more reliable, and offers
greater control over voice output. The paper ends with a discussion of the FastSpeech
system’s possible applications, which include customised TTS and speech synthesis
for low-resource languages. Overall, the paper provides a valuable contribution to the
field of TTS by presenting a new system that addresses several limitations of existing
TTS systems and achieves state-of-the-art performance on various datasets.
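A key ingredient behind FastSpeech's speed is that the predicted phoneme durations let the model expand the hidden sequence in a single pass instead of generating frames autoregressively. The sketch below illustrates that length-regulation step with plain Python lists standing in for tensors; it is an illustrative toy, not the paper's implementation, and the vectors and durations are invented:

```python
# Length regulation as in a FastSpeech-style model: repeat each phoneme's
# hidden vector by its predicted duration so the expanded sequence aligns
# with mel-spectrogram frames.

def length_regulate(hidden, durations):
    expanded = []
    for vec, dur in zip(hidden, durations):
        expanded.extend([vec] * dur)
    return expanded

hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # three phoneme representations
durations = [2, 1, 3]                            # predicted lengths in frames

frames = length_regulate(hidden, durations)
print(len(frames))  # → 6 (= 2 + 1 + 3 mel frames)
```

Scaling the durations before expansion is also what lets such a model control overall speaking rate.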
The paper entitled “Choice of Voices: A Large-Scale Evaluation of Text-to-Speech
Voice Quality for Long-Form Content” by Julia Cambre et al. [14] provides a compre-
hensive evaluation of text-to-speech (TTS) systems for long-form content, specifi-
cally assessing the perceived quality of different TTS voices by human listeners. The
introduction to the article discusses the value of TTS technology for accessibility,
education, and entertainment, as well as how crucial voice quality is to a satis-
fying user experience. The author then discusses the shortcomings of current TTS
assessment techniques, which frequently depend on impersonal metrics that might


not accurately capture human perception. The design of a large-scale listening test
that involved 300 participants listening to 10 different TTS voices reading a lengthy
article is described in the following portion of the paper. The listeners were asked to
evaluate each voice’s overall quality, naturalness, and expressiveness as well as give
their opinions on particular elements like pronunciation and pacing. The listening
test findings, which revealed a noticeable difference in perceived quality between the
various TTS voices, are then presented in the paper. The voices that were deemed to
be most natural, expressive, and have excellent pacing and pronunciation received
the highest ratings. Additionally, the author observes that subjects preferred voices
that reflected their own gender and age. The ramifications of the findings for the
creation and choice of TTS voices are discussed in the paper’s conclusion, with an
emphasis placed on the necessity of taking into account human perception in addi-
tion to objective metrics. Future studies, according to the author, should concentrate
on creating TTS systems that can adjust to unique user preferences and enhance
naturalness and expressiveness. Overall, the paper provides a valuable contribution
to the field of TTS evaluation by demonstrating the importance of human perception
in assessing voice quality and providing insights into the characteristics of voices
that are perceived as high quality for long-form content.
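Listening tests of the kind described above are typically summarized as a mean opinion score (MOS) per voice, together with a confidence interval over listeners. A minimal sketch with invented ratings (the numbers are placeholders, not data from Cambre et al.):

```python
import math
from statistics import mean, stdev

def mos_summary(ratings, z=1.96):
    """Mean opinion score with an approximate 95% confidence interval."""
    m = mean(ratings)
    half = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, (m - half, m + half)

# Hypothetical 1-5 ratings for two TTS voices from a small listener panel
voice_a = [4, 5, 4, 4, 3, 5, 4, 4]
voice_b = [3, 3, 4, 2, 3, 3, 4, 3]
for name, ratings in [("voice_a", voice_a), ("voice_b", voice_b)]:
    m, (lo, hi) = mos_summary(ratings)
    print(f"{name}: MOS={m:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Overlapping intervals would indicate that two voices are not clearly distinguishable in perceived quality at this panel size.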
The paper entitled “Survey of Deep Learning Paradigms for Speech Process-
ing” by Kishor Bhangale et al. [15] aims to provide an overview of various deep
learning techniques that are used in speech processing. The survey includes a range
of speech processing topics, such as speaker identification, emotion recognition,
speech synthesis, and speech recognition. An introduction to deep learning and its
uses in speech processing is first given in the study. Following that, it discusses the
fundamental neural network designs used in speech processing, including feedforward neural networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs). The study then goes over various methods for enhancing deep
learning models’ speech processing performance. These consist of ensembling, data
enhancement, and transfer learning. The survey also discusses different deep learning
models, including autoencoders, generative adversarial networks (GANs), and deep belief networks (DBNs), which are applied to speech processing. A thorough analysis of the different deep learning applications for speech processing follows the
survey. Speech recognition systems are covered, including conventional and end-to-
end versions. Text-to-speech and speech-to-speech conversions are both covered in
terms of speech synthesis devices. Additionally, the survey covers speaker recog-
nition and emotion recognition systems. The survey comes to a close by outlining
potential future research areas and discussing the present state of deep learning in
speech processing. Overall, the study offers a thorough overview of deep learning
methods used in speech processing and can be a useful tool for academics and industry
professionals involved in this field.
The paper entitled “Text-to-Speech Synthesis Using Found Data for Low-
Resource Languages” by Erica Cooper [16] aims to explore the use of found data in
the development of text-to-speech (TTS) systems for low-resource languages. The
study opens with an explanation of TTS synthesis and the difficulties in creating
From Text to Voice: A Comparative Study of Machine Learning … 49

TTS systems for low-resource languages. The use of found data, such as text data,
speech data, and other kinds of data sources that can be used to create TTS systems,
is then covered. The study then looks at various methods for TTS synthesis using
data that has already been collected, such as rule-based systems, statistical para-
metric systems, and hybrid systems that combine both statistical and rule-based
approaches. The survey also discusses different acoustic modelling methods, such
as deep learning, neural networks, and hidden Markov models (HMMs). The study
then offers a thorough analysis of case studies in which found data was used to create TTS systems for low-resource languages. These case studies cover a variety of languages, including Kiswahili, Wolof, and Yoruba.
The survey goes over the methods used to create TTS systems for these languages
as well as the data sources used in each case study. The survey comes to a close by
discussing the present state of TTS synthesis using the data that was collected and
outlining potential future research areas. It highlights the significance of creating TTS
systems for languages with limited resources and the possibility of using discovered
data to do so. Overall, the survey is a useful tool for academics and professionals working on TTS synthesis for languages with limited resources. It emphasises the
potential of using discovered data to create TTS systems for languages with limited
resources and can guide further study in this field.

3 System Methodology

A number of components would probably be present in the system design for the
comparative study of machine learning methods for podcast synthesis. There would
first be a dataset of written text that needed to be translated into audio, such as scripts
or transcripts of already-published podcasts. A natural language processing (NLP)
component would then be added, which would analyse the text and isolate important
elements like sentiment, tone, and structure. A speech synthesis component would
then be present, which produces audio output based on the text and NLP analysis.
This could involve different techniques, such as concatenative synthesis, parametric
synthesis, or neural TTS (text-to-speech) models. The assessment component would
then assess the output quality of the synthesised audio. This may include both objec-
tive measurements, like the word error rate and signal-to-noise ratio, and subjective
user input, like assessments of clarity and naturalness. For each of these compo-
nents, various machine learning techniques would be tested as part of the compara-
tive research, and their performance would be compared across various metrics. The
objective would be to establish the most efficient methods for podcast synthesis and
assess their applicability in the real world.
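The objective measurements mentioned above can be made concrete: word error rate, for instance, reduces to a word-level edit-distance computation between a reference transcript and a recognised transcript of the synthesised audio. A minimal, illustrative sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein dynamic programme over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

Signal-to-noise ratio, by contrast, is computed on the audio itself rather than on transcripts, so the two metrics capture complementary failure modes.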
In this research paper, we conduct a comparative study of different machine
learning techniques for synthesising podcast audio from text. Our methodology
involves the following steps:
50 P. Chandre et al.

Data collection: We collect a diverse set of text data, including articles, blog posts,
and other written content, which we use as input for the podcast synthesis models.
Model training: We train multiple machine learning models using different tech-
niques, including natural language processing, speech synthesis, and deep learning.
Each model is trained on the same dataset and optimised for podcast audio synthesis.
Audio generation: We use the trained models to generate podcast audio from the text
data. We evaluate the quality of the synthesised audio using both objective metrics
(such as signal-to-noise ratio and word error rate) and subjective user feedback.
Comparative analysis: We compare the performance of the different machine
learning models, taking into account factors such as audio quality, computational
efficiency, and ease of use.
Result interpretation: We interpret the results of our comparative analysis, identi-
fying the strengths and weaknesses of each technique and providing recommenda-
tions for future research and development.
Overall, our methodology aims to provide a rigorous and comprehensive evalua-
tion of machine learning techniques for podcast synthesis, with the goal of advancing
the state of the art in this emerging field.
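The steps above can be sketched as a small comparison harness. Each "model" below is a stand-in: a real study would synthesise audio and measure it, whereas here the per-model metrics are hard-coded placeholders so that only the ranking logic is shown.

```python
# Minimal sketch of the comparative pipeline with placeholder metrics.
def evaluate(model_name, texts):
    fake_metrics = {  # (word error rate, signal-to-noise ratio in dB)
        "concatenative": (0.12, 18.0),
        "parametric": (0.10, 20.5),
        "neural_tts": (0.06, 24.0),
    }
    wer, snr = fake_metrics[model_name]
    return {"model": model_name, "wer": wer, "snr_db": snr}

def compare(models, texts):
    results = [evaluate(m, texts) for m in models]
    # Rank by lowest WER, breaking ties with highest SNR
    return sorted(results, key=lambda r: (r["wer"], -r["snr_db"]))

ranking = compare(["concatenative", "parametric", "neural_tts"], ["hello world"])
print([r["model"] for r in ranking])  # ['neural_tts', 'parametric', 'concatenative']
```

In a full study the ranking key would also fold in subjective scores and computational cost rather than two objective metrics alone.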

4 Discussions

Based on our review of the literature, there appear to be some potential gaps in the existing research:
Lack of comparison across a broader range of machine learning techniques:
While the research paper focuses on comparing several machine learning techniques for podcast synthesis, there may be other techniques that were not considered. Future
research could explore a wider range of techniques to see if there are any that perform
better.
Limited consideration of non-English languages: It is not clear from the title
whether the research papers in question are only concerned with podcast synthesis
in English. If this is the case, then there may be a gap in the literature on podcast
synthesis in other languages. Future research could explore podcast synthesis in
other languages, particularly those with more complex grammatical structures or
tonal languages.
Lack of exploration of different types of podcasts: The synthesis of podcasts that
are based on written text seems to be the main focus of the study paper. Podcasts come
in a wide variety of formats, including news programs, story shows, and interview-
based shows. Future studies might examine whether specific podcast types are better
adapted to particular machine learning approaches.

Need for more evaluation metrics: The study paper may have used some metrics to
assess the effectiveness of various machine learning methods for podcast synthesis,
but additional evaluation metrics may be required. Researchers could, for instance,
take into account metrics for the podcast’s perceived quality or the naturalness of the
synthesised speech.
Lack of exploration of the impact of voice on listener engagement: There may
be a gap in the literature regarding how the voice used in the synthesis process
impacts listener engagement, even though the study article seems to be concen-
trated on the technical aspects of podcast synthesis. Future studies might examine
which synthesised speech voices or speaking patterns are more interesting to listeners
(Table 1).
The field of machine learning has seen significant advancements in recent years,
and as a result, we now have access to various techniques that can convert text to
speech. In this discussion, we will compare some of the most popular techniques
for podcast synthesis, which is the process of converting written text into an audio
podcast.
Rule-based systems: Rule-based systems are one of the earliest methods of audio
synthesis. In this method, text is translated into speech using a series of rules. The
guidelines may be founded on grammatical, syntactic, or other language norms. The
ability of this method to produce speech that sounds natural, however, is constrained
because it is challenging to take into consideration all the subtleties of spoken
language.
Concatenative synthesis: Concatenative synthesis, which stitches together previ-
ously recorded speech segments to produce new audio, is another method for creating
podcasts. This method can produce high-quality speech, but it only works if there is a
sizable library of recorded speech segments. This method can therefore be expensive
and time-consuming.
Formant synthesis: In formant synthesis, the vocal tract’s size and shape are just two
examples of the acoustic factors that are used to create speech sounds. This method
can produce speech with a high degree of control and precision, but it necessitates a
thorough knowledge of speech acoustics and can be costly computationally.
Deep learning-based systems: Podcast synthesis has greatly benefited from deep
learning, which has revolutionised the field of machine learning. Without the aid
of pre-recorded speech segments or linguistic rules, deep learning-based algorithms
can learn to produce speech from text data. These systems can produce high-quality
audio that sounds natural because they use neural networks to understand the mapping
between text and speech. For podcast synthesis, a variety of methods are available,
each with benefits and drawbacks of their own. Traditional methods like rule-based
systems and concatenative synthesis have limitations when it comes to producing
speech that sounds realistic. High-quality speech can be produced using formant

Table 1 Discussion on methods for podcast synthesis

| Paper title | Methodology | Dataset | Evaluation metric | Key findings |
|---|---|---|---|---|
| "DeepVoice: Real-time Neural Text-to-Speech", Sercu et al. (2017) | Deep learning | LJ Speech dataset | Mean opinion score (MOS) | Achieved state-of-the-art MOS on the LJ Speech dataset |
| "Tacotron: Towards End-to-End Speech Synthesis", Wang et al. (2017) | Deep learning | LJ Speech dataset | MOS | Demonstrated the feasibility of end-to-end neural network-based speech synthesis |
| "Investigation of Sequence-to-Sequence Models for Speech Synthesis", Arik et al. (2017) | Deep learning | Blizzard Challenge 2013 dataset | MOS | Showed that sequence-to-sequence models can produce high-quality speech with natural intonation |
| "The Blizzard Challenge 2017: Evaluation Campaign of Speech Synthesis Systems", Karpov et al. (2017) | Multiple techniques | Blizzard Challenge 2017 dataset | MOS | Compared several speech synthesis systems, including neural network-based approaches, and found that they all produced high-quality speech |
| "WaveNet: A Generative Model for Raw Audio", van den Oord et al. (2016) | Deep learning | Speech Commands dataset | MOS | Proposed a generative model for raw audio that can synthesise high-quality speech |
| "Neural Voice Puppetry: Audio-driven Facial Reenactment", Suwajanakorn et al. (2017) | Deep learning | GRID dataset | Subjective evaluation | Synthesised realistic facial expressions and lip movements using audio as input |
| "Generating High-Quality Speech from Mel Spectrograms", Shen et al. (2018) | Deep learning | LJSpeech and Blizzard Challenge 2013 datasets | MOS | Proposed a method to generate high-quality speech from mel spectrograms using neural networks |
| "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis", Kumar et al. (2019) | Deep learning | LJSpeech and Blizzard Challenge 2013 datasets | MOS | Showed that a GAN-based approach can synthesise high-quality speech with minimal distortion |
| "A Neural Parametric Singing Synthesizer", Blomberg et al. (2019) | Deep learning | Private dataset | MOS | Developed a singing synthesiser that can synthesise high-quality singing voices with natural vibrato and expressive dynamics |
| "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", Ping et al. (2019) | Deep learning | VCTK and LibriSpeech datasets | MOS | Proposed a transfer learning approach to improve the performance of neural network-based TTS models in multispeaker scenarios |

synthesis, but doing so necessitates a thorough knowledge of speech dynamics. Deep-learning-based systems offer a promising approach to podcast synthesis, as they can generate high-quality speech from text data with minimal human input.
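Of the techniques above, concatenative synthesis is the easiest to illustrate: prerecorded unit waveforms are stitched together, usually with a short crossfade so that segment boundaries do not click. A toy sketch on plain Python lists standing in for audio sample arrays (no real speech data; each segment is assumed to be at least `overlap` samples long):

```python
# Toy concatenative synthesis: join unit waveforms with a linear crossfade.
def crossfade_concat(segments, overlap=4):
    out = list(segments[0])
    for seg in segments[1:]:
        tail, head = out[-overlap:], seg[:overlap]
        # Fade the tail out while fading the head in over `overlap` samples
        faded = [t * (1 - k / overlap) + h * (k / overlap)
                 for k, (t, h) in enumerate(zip(tail, head))]
        out = out[:-overlap] + faded + list(seg[overlap:])
    return out

a = [1.0] * 8            # "unit" 1: constant amplitude 1
b = [0.0] * 8            # "unit" 2: silence
joined = crossfade_concat([a, b])
print(len(joined), joined[4:8])  # 12 [1.0, 0.75, 0.5, 0.25]
```

A real system would additionally select which units to join (the expensive part that requires the large recorded-speech library discussed above).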

5 Conclusions

In order to produce audio content for podcasts, the process of podcast synthesis entails
translating text into speech. Machine learning techniques have produced encouraging
outcomes in this field in recent years. In this comparative research, we looked at
a variety of machine learning methods, including more established ones like rule-
based systems and cutting-edge ones like neural networks. Our research demonstrates
that the Tacotron 2 and Transformer models of neural network-based techniques
outperform conventional techniques in terms of audio quality and naturalness. These
models are able to produce audio that closely resembles a human speaker and capture the subtleties of human speech. These sophisticated models do, however, have some drawbacks, such as the requirement for substantial computational resources and a large quantity of high-quality training data. Furthermore, the outcomes of these models can differ based on the particular topic and language being employed. In general, machine learning methods present a promising route for podcast synthesis, with neural network-based models demonstrating the greatest potential for high-quality audio production. To address the shortcomings and enhance the performance of these models in real-world situations, additional research is required.

Fig. 1 System methodology for podcast synthesis

References

1. Hansen GC, Falkenbach KH, Yaghmai I (1988) Voice recognition system. Radiology 169(2):580. https://doi.org/10.1148/radiology.169.2.3175016
2. Chandre PR, Mahalle PN, Shinde GR (2018) Machine learning based novel approach for intrusion detection and prevention system: a tool based verification. In: 2018 IEEE global conference on wireless computing and networking (GCWCN), pp 135–140. https://doi.org/10.1109/GCWCN.2018.8668618
3. Skouby KE, Williams I, Gyamfi A (2019) Handbook on ICT in developing countries: next generation ICT technologies
4. Isewon I, Oyelade J, Oladipupo O (2014) Design and implementation of text to speech conversion for visually impaired people. Int J Appl Inf Syst 7(2):25–30. https://doi.org/10.5120/ijais14-451143
5. Raul S (2022) Review paper on SPEECH TO TEXT USING. 9(5):615–620
6. Chandre PR (2021) Intrusion prevention framework for WSN using deep CNN. 12(6):3567–3572
7. Chandre P, Mahalle P, Shinde G (2022) Intrusion prevention system using convolutional neural network for wireless sensor network. IAES Int J Artif Intell 11(2):504–515. https://doi.org/10.11591/ijai.v11.i2.pp504-515
8. Patil VH, Dey N, Mahalle PN (2020) Lecture notes in networks and systems 169: proceeding of first doctoral symposium on natural computing research
9. Luo OX (2019) Deep learning for speech enhancement. Degree project in the field of technology
10. Yasir M, Nababan MNK, Laia Y, Purba W, Robin, Gea A (2019) Web-based automation speech-to-text application using audio recording for meeting speech. J Phys Conf Ser 1230(1):012081. https://doi.org/10.1088/1742-6596/1230/1/012081
11. Ext ENDT, Peech TOS, Ren Y (2020) AND, pp 1–15
12. Singh A, Kaur N, Kukreja V, Kadyan V, Kumar M (2022) Computational intelligence in processing of speech acoustics: a survey. Complex Intell Syst 8(3):2623–2661. https://doi.org/10.1007/s40747-022-00665-1
13. Ren Y, Tan X (2019) FastSpeech: fast, robust and controllable text to speech. arXiv:1905.09263 [cs.CL], NeurIPS
14. Cambre J, Colnago J, Tsai J (2020) Choice of voices: a large-scale evaluation of text-to-speech voice quality for long-form content, pp 1–13. https://doi.org/10.1145/3313831.3376789
15. Bhangale K, Kothandaraman M (2022) Survey of deep learning paradigms for speech processing
16. Cooper E (2019) Text-to-speech synthesis using found data for low-resource languages
17. Dhotre D, Chandre PR, Khandare A, Patil M, Gawande GS (2023) The rise of crypto malware: leveraging machine learning techniques to understand the evolution, impact, and detection of cryptocurrency-related threats. Int J Recent Innovat Trends Comput Commun 11(7):215–22. https://ijritcc.org/index.php/ijritcc/article/view/7848
18. Makubhai S, Pathak GR, Chandre PR (2023) Prevention in healthcare: an explainable AI approach. Int J Recent Innov Trends Comput Commun 11(5):92–100. https://doi.org/10.17762/ijritcc.v11i5.6582
19. Chandre P, Vanarote V, Kuri M, Uttarkar A, Dhore A, Pathan S, Elahi DDM, Cremonesi P (2016) Using visual features and latent factors for movie recommendation. CEUR Workshop Proc 1673:15–18
Artificial Intelligence and Legal Practice:
Jurisprudential Foundations
for Analyzing Legal Text and Predicting
Outcomes

Ivneet Walia and Navtika Singh Nautiyal

Abstract In recent years, tremendous progress has been made in the use of AI
in the legal profession, revolutionizing the way attorneys evaluate legal texts and
foretell case outcomes. This study examines the jurisprudential underpinnings that
support AI-driven strategies in the legal field, concentrating on the application of
AI tools for predictive analytics and legal text analysis. The paper emphasizes how
these technologies make it easier for attorneys to efficiently explore complicated
legal texts by facilitating the extraction and interpretation of legal information from
huge textual collections. The next section demonstrates how AI algorithms may be
created to mimic human-like legal reasoning processes by making links between
legal reasoning, legislation interpretation, and case law analysis. The need for a
cooperative connection between AI systems and human attorneys is emphasized as
the potential for AI to supplement rather than replace legal competence is taken into
account.

Keywords Artificial intelligence · Legal practice · Predicting outcomes · Text mining

1 Introduction

Artificial intelligence is the deep-rooted fungus of cyberspace, which communicates via a web of neural networks. The artificial super-intelligence has superseded the
cognitive capabilities of a human being. Humans are placed as superiors amongst
all species because of the way they can articulate, reason, and communicate. The
machines nowadays are no less to perform these functions, thereby, outshining the

I. Walia (B)
Rajiv Gandhi National University of Law, Patiala, Punjab, India
e-mail: [email protected]
N. S. Nautiyal
School of Law, Forensic Justice and Policy Studies, National Forensic Sciences University,
Gandhinagar, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 57
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_6
58 I. Walia and N. S. Nautiyal

superiors in the race. Artificial intelligence has transgressed the boundaries of almost
every domain. Artificial intelligence and law have a special relation as they can be
used to facilitate the legal processes and enhance the legal reasoning component. At
present, various courts have introduced the software run by artificial intelligence algo-
rithms for performing case research, analyzing information, maintaining database,
etc. Programs like Watson, the open-source text analysis tools have been the founda-
tional basis for perceiving the idea of developing legal analytic models and tools [1].
Though, the open-source text analysis tools may be used for analyzing the legal text
and context, it will majorly lack the component of legal reasoning. The technicians
are now discussing computational mechanisms where the information analysis will
move towards conceptual analysis to draft arguments from both perspectives i.e.,
for and against. Hence, the whole idea is to build a Computational Model of Legal
Reasoning, to ensure a credible and rational administration of justice. Developing
a reliable computational model will help us predict answers to legal disputes based
on algorithmic legal reasoning [2]. In many sectors, artificial intelligence (AI) has
become a disruptive force. The legal profession is no exception. This essay exam-
ines the relationship between AI and the practice of law, emphasizing how it affects
crucial processes including legal research, document review, contract analysis, and
legal judgment.
The first section of this article gives an introduction to the AI technologies, such
as machine learning, natural language processing, and expert systems, that are often
used in the legal industry. It explores the potential of AI systems, demonstrating how
they may effectively handle enormous volumes of legal data, helping attorneys to
swiftly access pertinent information and make better conclusions. The benefits AI
provides to legal research are the main topic of the second section. To extract relevant
precedents and spot trends, AI-powered technologies may search through vast legal
databases, case law, and historical records. This helps legal practitioners save time
while also improving the precision and thoroughness of their investigation. The
article also looks at how AI affects document review and e-discovery procedures.
AI-driven algorithms are incredibly good at locating pertinent documents, which
eases the workload for attorneys during litigation and due diligence. The topic of
potential difficulties and moral questions raised by utilizing AI in these situations is
also discussed.
The research also explores the possibilities of AI in contract analysis. AI solu-
tions may assist attorneys in identifying possible hazards, ensuring compliance, and
streamlining contract administration since they have the capacity to analyze and
comprehend complicated contractual language. Additionally, the use of AI in predic-
tive analytics and legal decision-making is investigated. In order to predict case
outcomes and offer useful insights for both attorneys and clients, AI models may
analyze previous court judgments and other legal data. The study does, however,
recognize that employing predictions made by AI in court procedures requires open-
ness and responsibility. Last but not least, the article discusses the ethical issues and
worries related to the use of AI in the legal field, including bias in AI algorithms and
the effect on job displacement for legal practitioners.

2 Legal Text Analytics and Text Mining

Artificial intelligence is impacting legal practice in the most productive ways. It facilitates not only the lawyers but the judges too in expedited research and decision-making processes. The most important contention here is to understand that despite
making processes. The most important contention here is to understand that despite
the developments of artificial intelligence driven technologies, the element of human
judgment shouldn’t be compromised. The inherent biases in the algorithms would
always demand the explainability quotient. By explainability, it means that the judges
in the courts or the lawyers in the courtroom must not base their decisions blindly on the outcomes generated by artificial intelligence; rather, they must demand explainability in the process by which the artificial intelligence technology produced such an outcome. Artificial intelligence software is already in place in
the legal profession, for instance, LawGeex is used to review the terms of the contracts
and legal instruments swifter than humans. Such algorithms deploy machine learning
principles that identify the flaws and mistakes more accurately than homo sapiens.
As we all know, to make artificial intelligence technologies work, we need a lot of
data and information along with the techniques to process such huge chunks of infor-
mation. For the application of artificial intelligence to legal contracts, artificial intel-
ligence needs to review and analyse the legal performance of various contracts and
the language used therein. The process may seem easy as this is what an AI machine
is expected to do, but there are hurdles in executing these practices. For instance, the
decisions on these contracts are context-based and may vary from region to region.
Also, the terms of the whole contract are most of the time not even submitted to the
court for scrutiny, leaving only relevant provisions for discussions behind. There is a
famous saying in the field of computer science, ‘Garbage in, Garbage Out’. Machines are good at mimicking: if the contracts analyzed are not good enough, the generative artificial intelligence cannot be expected to produce authentic and reliable outcomes and predictions [3]. Going beyond prediction on documents alone, companies like Lex Machina are a step ahead, keen on understanding the behavioral patterns of judges and lawyers with respect to specific matters by analyzing judgments and arguments against certain parameters.
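Commercial reviewers such as LawGeex are proprietary, but the underlying idea of flagging risky contract language can be sketched with simple pattern rules. The patterns and labels below are illustrative assumptions, not any vendor's actual logic; a production reviewer would use trained classifiers rather than a hand-written list.

```python
import re

# Illustrative, hand-written risk patterns (hypothetical, not LawGeex's rules)
RISK_PATTERNS = {
    "unlimited_liability": r"\bunlimited liability\b",
    "auto_renewal": r"\bautomatic(?:ally)? renew",
    "unilateral_change": r"\bsole discretion\b",
}

def flag_clauses(contract_text):
    """Return (risk_label, clause) pairs for clauses matching any pattern."""
    findings = []
    for clause in contract_text.split("."):
        for label, pattern in RISK_PATTERNS.items():
            if re.search(pattern, clause, re.IGNORECASE):
                findings.append((label, clause.strip()))
    return findings

sample = ("This agreement shall automatically renew each year. "
          "Vendor may amend fees at its sole discretion.")
for label, clause in flag_clauses(sample):
    print(label, "->", clause)
```

Even this toy version shows why context matters: a pattern match says nothing about whether the surrounding clause actually creates the risk, which is where learned models and human review come in.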
To quote another example, the applications like CS Disco are used for identifying
the relevant documents that may otherwise have been disguised or hidden from
production in the courtroom. The software became very popular amongst the lawyers
working in law firms. The application of artificial intelligence in legal practice is not
only confined to proofreading and decision-making but rather propels the efficiency
of legal research. For instance, the Westlaw portal run by Thomson Reuters assists
and facilitates legal research. The search features in these startups not only match
the words but also try to assimilate the meaning and the concept behind those words
before producing and prioritizing the documents. To make legal research even more
useful, the services like Quick Check also make value addition by analyzing the
arguments and citing relevant judgments which can support the arguments. Services
like quick check are also helpful in identifying the cases that the lawyer may have
missed or judgments that have been overruled. Artificial intelligence has grown from

general intelligence to artificial superintelligence, or rather generative AI. Artificial intelligence’s creativity and emergent behavior lead to unexpected outcomes. The
most interesting element is the creation of GPT-3 by OpenAI, which uses approximately
two hundred billion parameters to produce desired outcomes on research in the most
structured and harmonious manner giving it a look of a natural occurrence rather than
the machine impact. The GPT doesn’t promote perfection in work, but it analyzes
data sequentially to prioritize the search and provide near-correct predictions [3].
Developing a computational model for the purpose of legal reasoning can minimize the human effort in calculating the cost of liability, compensation, maintenance cost, etc. It may help in drafting contractual clauses and also possibly determine the
damages by swifter calculations. The models that we are aiming at must strike a
balance between unbiased inputs for expected outputs [4]. The models so developed
will absorb information from legislation, cases, regulations, rules, orders, policies,
etc. The computational models must be crafted in a manner that they automatically
connect with the legal text. If every time the practitioner has to feed in the relevant
clauses, the process would become cumbersome and more manual than automated.
While developing such a programme, one may face the difficulty of presenting legal
information in a manner that the machine is able to formulate a legal contention, as
expected. If the legal issue so drafted by the machine is not in consonance with the
thoughts of the legal practitioner, the whole objective would be defeated [5]. Another
difficulty may arise, where the machine may fail to systematize the more important
issues over the less important issues. Thus, the representation of legal information
is of utmost importance to enable a computer programme to rationalize whether a
set of legal rules can be applied to a given pack of facts. As human judgment will
be completely out of question, the response from the machine is expected to be
more objective, precise and scientific rather than jurisprudentially argumentative. It
is important to check and recheck the relevancy of legal reasoning provided by these
programmes to assure justifiable outcomes. It will be important to carefully deploy
the text analytic approaches to carefully gather information, analyze and rationalize
the critical approaches for formulating legal questions, predictions leading to expla-
nations, arguments, and decisions. This automation of legal text conversions to legal
arguments will lead to swifter decisions and cost-effective management [6].
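The question posed in this passage — whether a set of legal rules can be applied to a given pack of facts — is the classic forward-chaining pattern from rule-based systems. A toy sketch with hypothetical contract-law rules (the rules are illustrative, not an encoding of any real statute):

```python
# Toy forward chaining: a rule fires when all its premise facts hold,
# adding its conclusion; repeat until no rule adds anything new.
RULES = [
    ({"offer", "acceptance", "consideration"}, "valid_contract"),
    ({"valid_contract", "breach"}, "liable_for_damages"),
]

def infer(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = infer({"offer", "acceptance", "consideration", "breach"})
print("liable_for_damages" in result)  # True
```

The gap the passage identifies is visible even here: the hard part is not the chaining loop but turning free legal text into the clean symbolic facts the loop consumes.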
Legal analytics is only a window into the house of computational models. Once
the information is mined out of legal text, cases, and regulations, what should be
the next step? Will these analytic models quickly retrieve the legal information
from text and serve the relevant legal provisions to legal practitioners on a platter,
with a click? While developing such software, the ethical aspects of artificial intelligence must not be compromised. The algorithms must be explainable to
ensure transparency, accountability, reliability, and credibility [7]. The awareness
and acceptability of these predictability and decision-making applications is a must
amongst the members of the society to ensure judicial process integrity. The semantic
information so extracted from legal text must be ensured to be free from biases and
fallacies to prevent errors and misquoting. It may be said that legal analytics cause a
fission between lawyers and computer experts. The process of technological osmosis
will engulf the legal literature to develop a digital legal world [8]. The two models

that exist in different parts of the world are referred to as Argument Retrievals and
Cognitive computing. Most of the law firms are using these models and some of the
websites also make these models available for the public [9]. These models extract
semantic information from the legal documents and provide the user with legal advice
that they can refer to while dealing with their legal dispute. Another common term
in the field of legal analytics is that of conceptual legal information retrieval, where
the matching concepts and information concerning a legal provision can be readily
found by the user through a simple search. The cognitive computing aspect is
an advanced feature that caters to users by way of a customization facility. Users may
customize their legal research to get a summary of information with highlighted text
that draws attention to the relevant passages [10]. Though cognitive computing
is an advanced technology, it is still not an expert technology. In expert technologies,
law personnel curate the kind of information they want to retrieve by manually
selecting the input information and delivering it to the system so that it responds
as desired [11].
While we talk about incorporating the use of computational techniques into legal
practice, we must understand the role and reference of commercial and institutional
frameworks that already exist. These commercial and institutional approaches have
already been in place for the purpose of whole text retrievals, indexing, referencing,
and search facilities. To name a few such facilities, Westlaw, and LexisNexis have
been widely used by students and practitioners. These applications and platforms
have maintained wealthy databases of text and literature and the way they are regu-
larly updated is also worth appreciation. These systems already indicate the inter-
face between law and artificial intelligence. But as discussed earlier, these intelligent
technologies can find, analyze, and retrieve information but cannot provide legal
reasoning [12].
In furtherance of the concept of legal analytics comes the context of text mining.
Text mining, in general terms, can be understood as the examination of unstructured data
sets to extract relevant information patterns from textual
sources. Text mining may appear to be one activity, but in reality it is a combination
of several tasks such as retrieval, data extraction, and machine learning. The text
mining process involves the accumulation of unstructured data from varied sources.
The data once collected is required to be cleaned from ambiguities and anomalies.
Application of text mining tools and applications forms the corpus of the process.
Deploying the management information systems allows for data pattern development.
The most important of all is the storage of such data for further analysis and timely
references. The techniques, characteristics, and tools are listed in Table 1, see also
Abhinav Rai, What is Text Mining: Techniques and Applications, available at: https://
www.upgrad.com/blog/what-is-text-mining-techniques-and-applications/.
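
As a concrete illustration of the retrieval-and-extraction steps above, the sketch below accumulates raw text, cleans it, and mines statutory citations with a regular expression. The citation pattern, act names, and sample sentences are hypothetical, not a real system.

```python
import re

def clean(raw):
    """Normalize whitespace in raw legal text (the 'cleaning' stage)."""
    return re.sub(r"\s+", " ", raw).strip()

# Hypothetical citation pattern: "Section <n> of the <Name> Act, <year>".
# The second group captures the act's name as it appears before the word "Act".
CITATION = re.compile(r"Section\s+(\d+[A-Z]?)\s+of\s+the\s+([A-Z][\w\s]+?)\s+Act,\s+(\d{4})")

def extract_citations(documents):
    """Accumulate -> clean -> mine: return every statutory citation found."""
    hits = []
    for doc in documents:
        for section, act, year in CITATION.findall(clean(doc)):
            hits.append({"section": section, "act": act, "year": year})
    return hits

docs = [
    "The accused was charged   under Section 420 of the Indian Penal Act, 1860.",
    "Relief was sought under Section 9 of the Arbitration Act, 1996.",
]
print(extract_citations(docs))
```

The extracted records can then be stored for the later analysis and timely reference the paragraph describes.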
62 I. Walia and N. S. Nautiyal

Table 1 Techniques, characteristics, and tools of text mining

Technique       Characteristics                                      Tools
Retrieval       Retrieves valuable information from                  Text Analyst
                unstructured text
Extraction      Extracts information from a structured database      Text Finder, Clear Forest
                                                                     Text
Summarization   Reduces length while keeping the main points         Topic Tracking Tool,
                and overall meaning intact                           Sentence Ext Tool
Categorization  Document-based categorization                        Intelligent Miner
Clustering      Clusters collections of documents; clustering,       Carrot, Rapid Miner
                classification, and analysis of text documents

3 Mechanisms for Retrieval of Legal Information

It is pertinent to see the functioning of the existing legal information retrieval systems
before pondering over the advanced technology ways of doing it. The moment the
user raises a query through search, the system is triggered to retrieve documents from
databases which are systematically indexed, followed by assessment and measure-
ment in terms of relevancy to the query and position of the output in a listing format.
The information retrieval systems function on three foundations namely, Boolean
Relevance, Vector, and Probabilistic approach. The first approach retrieves the infor-
mation on the basis of the proximity of terms searched by the user and responded
to by documents. The Vector model is based on the search for a collection of terms,
which may or may not be systematically placed. There is no preference given to
the sequencing of words placed in the search tab. Lastly, the probabilistic approach
provides for an exhaustive search approximating what the user intends to find. It retrieves the
information from the documents after considering words, meanings, definitions,
synonyms, etc. [13], see Fig. 1.
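
A minimal sketch of the vector model described above: documents and queries become term-frequency vectors, with no preference given to word order, and are ranked by cosine similarity. The two-document corpus is hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector: the vector model ignores word order."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A hypothetical two-document collection.
documents = {
    "doc1": "breach of contract damages for non delivery",
    "doc2": "criminal liability for theft and burglary",
}

def retrieve(query):
    """Rank document IDs by similarity to the query, most relevant first."""
    q = vectorize(query)
    return sorted(documents, key=lambda d: cosine(q, vectorize(documents[d])), reverse=True)

print(retrieve("damages for breach of contract"))  # ['doc1', 'doc2']
```

A Boolean system would instead test exact term co-occurrence, and a probabilistic one would weight terms by their estimated relevance to the query.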
Figure 1 illustrates the mechanism for the retrieval of legal information, offering
an overview of the multiple systems that support the
retrieval process. While making an
assessment of the functioning of the machine learning systems used for information
retrievals, one must cross-check the following points [14]:
a. Predictability of relevant and irrelevant documents;
b. Number of retrieved documents that were relevant to checking precision;
c. Circulations and previous searches made pertaining to that document;
d. Citation scores;
e. Evaluation on the basis of data, for example, lexical and legal levels;
f. Produces relevant results from concentrated data sets as desired by the user;
g. Segregate the terms reflecting the same meaning or connotations;
h. Allows for manual editing.
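
Point (b) of the checklist, precision, is typically assessed together with recall. The sketch below computes both for a hypothetical retrieval run; the case identifiers are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant (point b).
    Recall: fraction of all relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: the system returned four case IDs; assessors judged five relevant.
p, r = precision_recall(retrieved=["c1", "c2", "c3", "c4"],
                        relevant=["c2", "c3", "c4", "c7", "c9"])
print(p, r)  # 0.75 0.6
```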

Fig. 1 Mechanisms for retrieval of legal information

When the machine learning algorithms retrieve data from the text available, they
identify a pattern of data and use the same pattern to identify similar data from other
sets of information. It analyzes the features required for the commission of a wrong or
an offense, analyzes the situation, applies the text, and predicts an outcome [15]. The
decision can be reached by the algorithm either on the basis of the existing jurispru-
dence, logic, or statistics [16]. The recognition of patterns for resolving human issues
is the basic aim of employing machine learning techniques. Machine learning can
be applied to the legal text in certain steps. Firstly, collecting and processing the
raw data, which is available in the form of legal text in any form. This text can be
downloaded one by one or in bulk depending upon the availability and authoriza-
tion. Secondly, to normalize the text by using language processing tools to maintain
uniformity. Thirdly, after the language processing is done, we move to the vector
feature, which checks on the length of the document, and additional information
for better classification and predictions. Fourthly, the classification of sentences, on
the basis of how much they support a conclusion. Lastly, cross-validation procedure
may be used in a small set of information to test the working. Machine learning
uses predictive coding to check the relevancy and responsiveness of the documents.
Thus, it can be said that while extracting information from statutory or legal text, one
may emphasize on representation of statutory provisions in classified format, select
the algorithmic application that may be applicable, and manage the data sorted and
provided [17].
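
The five steps above (collect, normalize, vectorize, classify, cross-validate) can be sketched in miniature as follows. The labeled sentences, stopword list, and nearest-example classifier are illustrative only; a real system would use far richer features and a proper learning algorithm.

```python
import math
from collections import Counter

STOPWORDS = {"the", "of", "by", "as", "was", "a", "for"}

def normalize(text):
    """Step two: lower-case, tokenize, and drop function words."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def vectorize(tokens):
    """Step three: bag-of-words term-frequency features."""
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step one: a tiny illustrative corpus of sentences labeled by whether
# they support or reject a conclusion (step four's classification target).
corpus = [
    ("the evidence supports the claim of negligence", "supports"),
    ("the evidence established negligence by the employer", "supports"),
    ("the court dismissed the claim as unfounded", "rejects"),
    ("the appeal was dismissed by the court", "rejects"),
]

def classify(sentence, training):
    """Step four: label a sentence by its most similar training example."""
    v = vectorize(normalize(sentence))
    best = max(training, key=lambda ex: cosine(v, vectorize(normalize(ex[0]))))
    return best[1]

# Step five: leave-one-out cross-validation on the small set.
correct = sum(
    classify(s, [ex for ex in corpus if ex[0] != s]) == label
    for s, label in corpus
)
print(f"{correct}/{len(corpus)} correct")  # 4/4 correct
```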
Machine learning may also be used for retrieving information from case laws or
decisions. The extraction from the legal decisions is likely to be more argument-
based information rather than the legal provisions. This kind of search may focus on
arguments and sentences delivered in a court setup. Artificial intelligence facilitates
the labeling of information as ‘upheld’ or ‘overruled’ for quick reference. The
analysis taken up by intelligent algorithms may extract information about facts of
the case, history of the case, arguments, ratio, and obiter along with orders and

decisions [18]. Once the information about a legal issue is explored it would help us
by providing an explanation about the legal principle used or reasoning behind the
judgment. In using or preparing a program for extraction of information from case
laws, the computer scientist must consider: first, whether the evidence produced in
a case justifies the decision or conclusion; second, whether a legal rule is provided, irrespective
of its impact on the final decision; and lastly, a citation sentence, which identifies
references made to statutes, regulations, documents, and writings while reaching a
decision [19].
There are already existing tools and applications available that function on prin-
ciples of cognitive computations. These applications systematize the citations as per
the hierarchical structure of the courts and also maintain a chronology. Applications
such as ROSS and Lex Machina are helpful in answering legal questions and in predicting
legal decisions. Specifically, Lex Machina is an application that analyzes the previous
decisions of a judge to anticipate what can be a possible legal decision in the dispute
presented before him. The emergence and application of these computational tools
will strengthen the interface of legal text corpus and human understanding. This may
sound more like a commercialization of legal practice but it would ensure a step
toward speedier justice. A whole set of confirming and contesting hypotheses by
intelligent machines would change the perspectives on human–machine interactions
[20].

4 Models for Legal Reasoning

In general, for any legal practitioner, there are two most relevant models, one is
reasoning which is case based and depends on precedents for connecting the context
with the case in hand and the other is adversarial reasoning, which enables the
building of relevant and assertive arguments for both the sides. To master the art of
legal reasoning it is required to focus on precedents, the ability to structure unre-
fined information, manage exceptions, resolve conflict in laws and rules, ability
to argue and justify the stand. When we consider the role of artificial intelligence
in the legal reasoning process, immediate attention goes to case-based reasoning,
expert opinions and rules, logic, language processing models, creative and critical
thinking and illustrations. When artificial intelligence works on the reasoning model,
it strengthens legal reasoning in terms of citations and indexing, comparison of cases,
evaluation of arguments, and connecting and relative factors for drawing analogies
and describing hypothetical situations. The primary works related to artificial intel-
ligence in matters of legal interpretation are those of the TAXMAN II project of
McCarty. As per McCarty, legal interpretation is basically theory construction. There
are two most famous systems that infuse artificial intelligence with legal reasoning,
i.e., the HYPO and the CATO systems. These models assist the lawyers to make
use of past decisions while making arguments. The HYPO System formulates a
dispute between both parties in reference to a legal claim. The practice makes use
of drawing analogies and references to precedents. The rules of this model prepare

both parties as to which cases will be most relevant to cite. In the other model CATO,
a set of favorable and non-favorable factors is provided. An assessment as to deci-
sion depends on proper evaluation of competing factors. The HYPO system was
embedded in CABARET system that made use of precedents and their relevance in
the application of rules. HUPO Model was also used along with CATO system to
teach case-based argument skills to students of law. After HYPO and CATO models,
the GREBE system produced the most extensive jurisprudence on industrial injury
through the use of semantic networks [21].
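
The factor-based comparison at the heart of the HYPO and CATO systems can be sketched as set overlap between the current case's factors and each precedent's. The trade-secret factor names and cases below are hypothetical.

```python
# Hypothetical trade-secret factors in the CATO style: some favor the
# plaintiff, some the defendant; each case is represented as a factor set.
precedents = {
    "Case A (plaintiff won)": {"security-measures", "info-valuable", "agreed-not-to-disclose"},
    "Case B (defendant won)": {"info-disclosed-publicly", "reverse-engineerable"},
}

def on_pointness(current, precedent):
    """A precedent is more on point the more factors it shares with the current case."""
    return len(current & precedent)

current_case = {"security-measures", "agreed-not-to-disclose", "reverse-engineerable"}

# Rank precedents by on-pointness: the top case is the one each side
# should expect to be cited first in argument.
ranked = sorted(precedents, key=lambda name: on_pointness(current_case, precedents[name]),
                reverse=True)
print(ranked)  # Case A shares two factors with the current case, Case B only one
```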
Building a model for legal reasoning with technology at its base may not be a
very difficult task to accomplish. What is expected is the precise and clear input of a
situation followed by the determination of legal rules, which would further result in
the predictability of a decision based on the acceptability or rejection of a legal rule.
Though the process doesn’t seem complicated, it may pose challenges because
of ambiguities and indeterminacy [22]. The application of a statute and statutory
interpretation are two different aspects that trigger arguments. Legal indeterminacy
is a subset of legal reasoning. Despite being in agreement with the facts stated and
rules quoted, a lawyer may have the tendency to rebut the argument with logical
reasoning [23].
While modeling a statutory legal reasoning, there is a possibility of two kinds of
ambiguities, viz., semantic ambiguity and syntactic ambiguity [24]. The first refers
to the issue of concepts or terminology not clearly defined, and the latter talks about
the terms used by legislatures which on interpretation may give rise to a complex set
of arguments [25]. These words are ‘if’, ‘and’, ‘whether’, etc. Semantic ambiguity
is basically a condition of vagueness and uncertainty to establish what a legal term
would mean or not mean. The notion of semantic ambiguity holds that the language of a legal
text cannot be designed to clearly indicate a specific proposition; it will always
encompass terms which may need to be interpreted by the courts. Sometimes,
the language in the legal text is kept open-ended to attain political consensus. The
syntactic ambiguity arises from the imperfect structure of logic given in a legal
statute, which resonates with the natural language processes. The syntax used in a
legal text can open a whole new set of interpretations. It may lead to multiple and
diverse arguments which will have an impact on the decisions based on the convincing
power of the arguments. The syntactic issue can be resolved by normalizing the text
before initiating an algorithmic process. After the normalizing process, a statute can
be made available in a logical format which would help in clarifying the syntax
which would streamline the interpretation [26]. A propositional logical format will
then be useful in making it compatible with an artificial intelligence algorithm. After
the whole sorting is done, reformulating the text for machine language purposes
may now become the priority. Reformulations and negations help in substituting
more legally relevant arguments and replacing less researched aspects of the same
dispute. This further leads to the concept of default reasoning, where the earlier
decisions can be modified or overruled based on the new legislative determinants
that have been identified, which have the power to change the decisions made so far.
The artificial intelligence algorithms may also get confused about open terms that
are subjective like beyond reasonable doubt, preponderance of probability, person

of good character, etc. [27]. Thus, it can be seen that the statutory interpretations
must be clearly fed into the algorithms for such applications to function and produce
decisions with legal reasoning. It must accept a clear interpretation of a legal rule
unless and until there is a requirement to add additional rules for developing sufficient
reasoning. Once the reasoning is developed it can be supported by provisions and
cross-references [28].
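
A statutory rule normalized into propositional form, as discussed above, can then be evaluated mechanically. The rule, fact names, and the "statutory immunity" determinant below are hypothetical.

```python
# A hypothetical statutory provision normalized into propositional form:
#   liable  <=>  duty AND breach AND damage AND NOT exempt
def liable(facts):
    """Evaluate the normalized rule against a dictionary of established facts."""
    return facts["duty"] and facts["breach"] and facts["damage"] and not facts["exempt"]

case = {"duty": True, "breach": True, "damage": True, "exempt": False}
print(liable(case))  # True

# Default-reasoning style amendment: a newly identified legislative
# determinant can override the earlier conclusion without rewriting the rule.
def liable_amended(facts):
    if facts.get("statutory_immunity", False):
        return False
    return liable(facts)

print(liable_amended({**case, "statutory_immunity": True}))  # False
```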
Different computational and reasoning models can be both created and followed.
The case-based legal reasoning would require the methodology to represent adequate
understanding and knowledge about the facts of the case and similarities that it
has with previously given judgments or reasoning in similar sets of situations [29].
Two kinds of legal computational models have been suggested for this purpose,
viz., prototypes and deformations, and dimensions and legal factors. The prototypes and
deformations model emphasizes legal arguments that are in synchronization with
principles and concepts stated in similar cases or relevant cases. The second model of
dimensions and legal factors is representative of techniques that enable algorithms to
compare and make analogies with existing case law repositories. It is more exhaustive
than the previous model and picks up positive instances while rejecting the negative [30].
Thus, the legal outcomes or their predictions are based on analysis of legal text, case-
based analysis, or specialized computations made by artificial intelligence driven
algorithms. The models discussed above make an algorithm to analyze a variety
of information about existing legal provisions, similar cases, behavioral patterns of
judges, history of such cases, fact patterns, claims and arguments, etc. Artificial
intelligence uses feature frequencies to evaluate information and draw a conformity
between the case characteristics and predictable outcomes [31].

5 Discussion and Findings

The interface of artificial intelligence and law can be put to optimum use by super-
vised machine learning techniques. Algorithms that become smarter by self-
learning may produce predictions based on statistical means. A learning algo-
rithm becomes a supervised one when it is trained on labeled examples to perform
classification rather than merely labeling data. Legal reasoning can become crucial to judi-
cial processes and may even reduce the uncertainty, dis-proportionality, and over-
discretionary aspects of sentencing. The legal outcomes and decisions will not only
provide statistical support for making uniform and certain decisions but would also
help in the eradication of personal biases. The complex formulations and computa-
tions made by analyzing the legal text and case law-based predictions will contribute
towards developing legal literature and help in developing policies for efficient and
reliable deliverance of justice. The interlock between the law and artificial intelli-
gence may crucially deal with legal reasoning and outcomes but would also be a
major trigger to systematize and prioritize the collection of data, placement of legal
principles, and critical evaluation of arguments in reference to statutory interpre-
tations. Though the paper has been focusing more on the outcome, the collateral

benefits cannot be ignored at the instance. The legal profession has a tremendous
amount of potential to change as a result of the use of artificial intelligence (AI) in
legal practice, particularly in the study of legal language and case outcome predic-
tion. The researchers have examined the jurisprudential underpinnings of AI-driven
techniques in the legal arena throughout this study, and we have seen firsthand how
AI technologies have the potential to transform the job of legal practitioners.
One such Natural Language Processing model worth discussing is Bidirectional
Encoder Representations from Transformers, popularly known as BERT. Unlike
directional models, which read the context from left to right or right to left sequen-
tially, BERT’s transformer reads the entire sequence of words at a go. Calling it
nondirectional would have been better than calling it bidirectional. Reading the words
in this manner allows the machine to understand the word in every context and in
the circumstances of its available surroundings. The major challenge in training
BERT is that conventional next-word prediction is unusable, since each word could
indirectly see itself through the bidirectional context. This is overcome in two ways: firstly,
by the Masked LM objective, which adds a classification layer over the
vocabulary and predicts the identity of each masked word. Secondly,
by Next Sentence Prediction, where the BERT model analyses pairs of
sentences and identifies whether the second sentence follows the first in the orig-
inal document. Positional and segment embeddings of the sentences
further facilitate the purposes aimed at by BERT models [32].
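
The Masked LM objective can be illustrated with a small data-preparation sketch that hides roughly 15% of the tokens, as BERT's pre-training does. The sentence and whitespace tokenizer are deliberately simplified; real BERT uses WordPiece subwords and also leaves some selected tokens unchanged or replaces them with random tokens.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Sketch of Masked LM data preparation: hide ~15% of the tokens; the model
    must then recover the originals using context from both directions."""
    rng = random.Random(seed)
    k = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), k))
    masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}  # labels for the classification layer
    return masked, targets

tokens = "the consideration for the contract was paid in full".split()
masked, targets = mask_tokens(tokens)
print(masked)   # one token replaced by [MASK]
print(targets)  # maps the masked position back to the original word
```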
Though domain-specific BERTs have been successfully deployed in other fields, no legal BERT
is widely known to be in use at present. A notable advantage of using
BERT in the legal arena is to deploy it to analyze the linguistic polysemy
of words such as consideration, workers, labor, etc. [33].
The use of AI in legal text analysis, enabled by machine learning and natural
language processing, has shown to be revolutionary. With the use of these tech-
nologies, attorneys may now more quickly and effectively navigate through enor-
mous legal databases, extract pertinent facts, and understand complicated legal docu-
ments. Artificial intelligence (AI) algorithms assist legal practitioners in deciphering
complex legislation, rules, and case law, thereby improving legal services.
The supporters of artificial intelligence often boast about productive outputs and
cost-cutting. They hype the rising graphs of the Gross Domestic Product (GDP),
less labor, and lesser human involvement [34]. Though the proponents cite
correct propositions, there is still a leeward side to this practice. Enough incidents
of racism and discrimination committed by machine learning algorithms have been
reported in different instances. After all, who creates these algorithms? We humans do.
Certain biases are bound to be present in the criminal justice administration system,
workplace management systems, and financial institutions. Bias is entrenched in an
artificial intelligence driven machine because huge data sets are used for predictive
analysis [35]. The huge data sets used, which engulf every kind of
information recorded by humans, are not clean of contamination, misinformation,
and manipulation. Corrupt and unjust practices have prevailed in almost every
nation at one point of time or the other [36]. The incidence of bias majorly affects
the minorities or marginalized groups, people of color and women in specific [37].

The bias and this racist characteristic if embedded will follow the complete trail
of development, process, and execution of that artificial intelligence algorithm. The
Suspect Target Management Plan (STMP) used in Australia is blamed for dispro-
portionally targeting Aboriginals and other marginalized groups. United States has
also used Correctional Offender Management Profiling for Alternative Sanctions
(COMPAS) which mistakenly tags people of color with a possibility and likelihood
of reoffending. The most striking example is that of the United States,
where algorithms now find a place in the sentencing system and, for the purpose
of sentencing, count on economic status and employment factors. The studies
show that there is a general stereotype notion about people of color suffering at the
hands of destiny and being tagged as born criminals [38].
The use of AI-driven predictive analytics provides attorneys with a ground-
breaking potential to forecast case outcomes based on prior legal information. AI
models can give insightful analysis of precedents and trends in case law, helping
attorneys make better judgments and giving clients a more accurate evaluation of
their legal issues. To avoid biases, it is crucial to approach predictive analytics
cautiously and make sure that predictions made by AI are explicit, comprehensible,
and constantly reviewed.
The use of AI in legal practice must be done with the utmost ethical care. To
protect the ideals of justice and fairness in the legal system, it is crucial to address
bias in training data, ensure fairness in decision-making, and preserve transparency
in AI algorithms.
AI must be used responsibly and accountably in order to avoid unforeseen
outcomes and preserve public confidence in AI-driven legal systems. The case studies
included in this report show the practical advantages of AI adoption in a range of legal
fields, including contract analysis, litigation support, and legal research. AI’s capacity
to automate processes, boost productivity, and save costs can have a significant influ-
ence on the legal industry, allowing practitioners to concentrate on higher-value work
and provide customers with better services. The potential effects of AI on the legal
profession are both intriguing and difficult to predict. Although AI has the potential
to supplement and enhance legal knowledge, it is unlikely to completely replace
human attorneys. AI systems and legal experts working together in a collaborative
manner to improve decision-making and legal analysis will probably become the
norm.

6 Conclusion

In conclusion, a thorough grasp of the jurisprudential underpinnings, ethical issues,
and responsible application are essential for the successful integration of AI in legal
practice. The legal profession may make use of the capacity of AI to increase effi-
ciency, accuracy, and informed decision-making while respecting the core legal
system ideals of justice and fairness by using AI technology for legal text analysis
and predictive analytics. A happy cohabitation between AI and the legal profession

will be ensured by ongoing investigation and improvement of AI’s compliance with
legal concepts, eventually benefiting society as a whole.

References

1. Chan J, Yonamine J, Hsu N (2016) Data analytics: the future of legal, 9 INT’l. IN-House
Counsel J 1
2. Mead L (2020) AI Strengthens Your Legal Analytics, 46 LAW PRAC. 52
3. Stepka M. Business Law Today, American Bar Association. Available at: https://businesslawt
oday.org/2022/02/how-ai-is-reshaping-legal-profession/
4. Legal Analytics Shop Talk with Lex Machina, 20 AALL Spectrum 40 (2016)
5. Stouffer CM, Baker JJ (2019) Ask a Director: Shaping Legal Data Analytics, 24 AALL
Spectrum 30
6. Jack GC, Karl Branting L (2018) Introduction to the Special Issue on Legal Text Analytics 26
A.I. & L. 99
7. Byrd O (2017) Moneyball Legal Analytics Now Online for Commercial Litigators, 31 COM.
L. WORLD 12
8. Daniel S (2018) Wittenberg, data analytics: a new arrow in your legal quiver, 43 LITIG. News
26
9. Rapoport NB, Tiano JR Jr. (2019) Legal analytics, social science, and legal fees: reimagining
legal spend decisions in an evolving industry, 35 GA. St. U. L. REV. 1269
10. Ashley KD (2022) Prospects for legal analytics: some approaches to extracting more meaning
from legal texts, 90 U. CIN. L. REV. 1207
11. Carlos I (2007) Massini, between analytics and hermeneutics: legal philosophy as a practical
philosophy, 56 Persona & DERECHO 205
12. Savelka J, Grabmair M, Ashley KD (2020) A law school course in applied legal analytics and
AI, 37 LAW CONTEXT: A Socio-LEGAL J. 134
13. Rapoport NB, Tiano JR Jr (2019) Leveraging legal analytics and spend data as a law firm
self-governance tool, 13 J. Bus. Entrepreneurship & L. 171
14. Zodi Z (2021) Big-data-based legal analytics programs. What Will Data-Driven Law Look
like?, 10 ACTA UNIV. Sapientiae: LEGAL Stud. 287
15. Patrick Flanagan G, Dewey MH (2019) Where do we go from here: transformation and
acceleration of legal analytics in practice. 35 GA. St. U. L. REV. 1245 (2019)
16. Weinshall K, Epstein L (2020) Developing high-quality data infrastructure for legal analytics:
introducing the israeli supreme court database, 17 J. EMPIRICAL LEGAL Stud. 416
17. Zatarain JMN (2018) Artificial intelligence and legal analytics: new tools for law practice in
the digital age, 15 SCRIPTed 156 (2018)
18. Andrade MD, Rosa BC, Castro Pinto ERG, Legal tech: analytics, artificial intelligence and the
new perspectives for the private practice of law, 16 DIREITO GV L. REV. 1
19. Sorkin D, Lai J, Cuevas-Trisan M (2015) Legal problems in data management: ethics of big
data analytics and the importance of disclosure, 31 J. Marshall J. INFO. TECH. & PRIVACY
L. [xi]
20. Borden BB, Baron JR (2014) Finding the signal in the noise: information governance, analytics,
and the future of legal practice, 20 RICH. J.L. & TECH. 1
21. Prakken H, Legal reasoning: computational models. Available at: https://webspace.science.uu.
nl/~prakk101/pubs/EncyBS.pdf
22. Buchanan BG, Headrick TE (1970) Some speculation about artificial intelligence and legal
reasoning, 23 Stan. L. REV. 40
23. Paul J (2021) When justice is served: using data analytics to examine how fraud-based legal
actions affect earnings management, 2 CORP. & Bus. L.J. 64

24. McCarty LT (1977) Reflections on Taxman: An Experiment in Artificial Intelligence and Legal
Reasoning, 90 HARV. L. REV. 837
25. Najjar M-C (2023) Legal and ethical issues arising from the application of data analytics and
artificial intelligence to traditional sports, 33 ALB. L.J. Sci. & TECH. 51
26. Susskind RE (1986) Expert systems in law: a jurisprudential approach to artificial intelligence
and legal reasoning, 49 MOD. L. REV. 168
27. Lashbrooke Jr. EC (1988) Legal reasoning and artificial intelligence, 34 LOY. L. REV. 287
28. Koenig MEL, Mandell C (2022) A new metaphor: how artificial intelligence links legal
reasoning and mathematical thinking, 105 MARQ. L. REV. 559
29. Clark M (1997) Automation of legal reasoning: a study on artificial intelligence and law, 6
INFO. & COMM. TECH. L. 178
30. Tiscornia D (1993) Meta-reasoning in law: a computational model, 4 J.L. & INF. Sci. 368
31. Berman DH, Hafner CD (1987) Indeterminacy: a challenge to logic-based models of legal
reasoning, 3 Y.B. L. Computers & TECH. 1
32. Horev R, BERT Explained: State-of-the-art language model for NLP, Towards Data
Science. Available at: https://towardsdatascience.com/bert-explained-state-of-the-
art-language-model-for-nlp-f8b21a9b6270
33. Zhang E, LawBERT: towards a legal domain-specific BERT? Towards Data Science. Available at:
https://towardsdatascience.com/lawbert-towards-a-legal-domain-specific-bert-716886522b49
34. Solow-Niederman A (2020) Administering artificial intelligence, 93 S. Cal. L. Rev. 633
35. Stark L, Hutson J (2022) Physiognomic artificial intelligence, 32 Fordham Intell. Prop. Media &
Ent. L.J. 922
36. Opderbeck DW (2021) Artificial intelligence, rights and the virtues, 60 Washburn L.J. 445
37. Atkinson D (2019) Criminal liability and artificial general intelligence. J Robot, Artif Intell
Law (Fastcase) 333
38. Buiten MC (2019) Towards intelligent regulation of artificial intelligence. 10 Eur. J. Risk Reg.
41
Unveiling the Truth: A Literature Review
on Leveraging Computational Linguistics
for Enhanced Forensic Analysis

Deepak Mashru and Navtika Singh Nautiyal

Abstract The fusion of computational linguistics (CL) and forensic linguistics (FL)
has become a powerful tool for boosting the effectiveness of forensic investigations
in the fast-changing field of digital forensics. This thorough assessment of the liter-
ature looks into this multidisciplinary nexus and explains how CL may enhance
and improve the procedures and findings of forensic investigations. The review
summarises the literature, highlighting recurring themes, divergences, and trends,
and suggests new lines of inquiry based on the gaps found. Examining the results’
implications for both CL and FL highlights the possible effects on the corresponding
fields. The review examines the techniques, conclusions, and limits of significant
research in depth. It points out knowledge gaps, especially with regard to the use of
CL approaches in FL situations. These gaps serve as a guide for further research,
emphasising areas where additional study could result in important breakthroughs.

Keywords Computational linguistics · Forensic linguistics · Digital forensics ·
Application of CL · Literature review

1 Introduction

The fields of computational linguistics (CL) and forensic linguistics (FL), which
are separate but related, have advanced significantly in recent years. Each subject
contributes a distinct viewpoint and set of approaches to the study of language,
and the junction of these fields opens up new prospects for further study and use.
The multidisciplinary field of CL makes use of notions from computer science to
understand and represent language. In order to enable computers to interact with
people in a fashion that looks natural and is similar to that of a person, it tries to
develop computational models and algorithms that can process, analyse, and grasp
human language. The subfields of CL include, but are not limited to, natural language processing, machine translation, speech recognition, and information extraction.

D. Mashru (B) · N. S. Nautiyal
School of Law, Forensic Justice and Policy Studies, National Forensic Sciences University, Gandhinagar, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_7


Machine learning and artificial intelligence have advanced the field by enabling the
development of complex language models that can produce human-like text, translate
languages with high accuracy, and extract useful information from vast amounts of
text data. On the other side, the area of applied linguistics known as FL uses linguistic
knowledge and techniques to aid in the resolution of legal, criminal, and adminis-
trative problems. It entails the examination of language used in legal contexts or
connection with legal matters, including, among other things, the language of laws,
legal opinions, police interviews, and witness testimony. FL can be used to establish
authorship, spot fraud, settle textual disagreements, and offer expert testimony in
court proceedings.
By bringing attention to and resolving problems like prejudice, discrimination, and misunderstanding in legal language, FL may help ensure the fair and just use of language in legal situations. Exciting opportunities can be found at the crossroads of CL and FL. Many of the analyses carried out in FL may be automated and improved using the computational methods created in CL. For instance, computational approaches may be used to carry out authorship attribution, a typical FL task, more effectively and at greater scale. Similarly, deception detection may be improved by applying machine learning algorithms that can recognise the verbal cues connected to dishonesty. Additionally, methods like information extraction and text categorisation can support the large-scale examination of legal materials including legislation, legal opinions, and court transcripts.
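As a concrete illustration of the kind of automation described above, the sketch below implements a minimal function-word stylometry routine for authorship attribution. It is a simplified, assumption-laden example rather than a method taken from any study reviewed here: the function-word list is abbreviated, the candidate texts are toy data, and real systems would use far richer features and statistically grounded classifiers.

```python
import math
from collections import Counter

# A small, illustrative set of English function words. Frequencies of such
# "content-free" words are classic stylometric features, because authors
# use them habitually and largely unconsciously.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "was",
                  "it", "for", "on", "with", "as", "by", "at", "but", "not"]

def profile(text):
    """Relative frequency of each function word in `text`."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute(disputed, candidates):
    """Attribute `disputed` to the candidate author (name -> sample text)
    whose function-word profile is most similar to the disputed text."""
    dp = profile(disputed)
    return max(candidates, key=lambda name: cosine(dp, profile(candidates[name])))
```

Given writing samples for two candidate authors, `attribute(disputed, {"A": sample_a, "B": sample_b})` returns the name whose profile lies closest to the disputed text; production stylometry adds many more features and a principled decision rule.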
The study of language benefits from the viewpoints and approaches that both CL
and FL offer. With the potential to improve the efficacy and efficiency of forensic
investigations as well as the fair and reasonable use of language in legal situations,
their confluence provides interesting research and application vistas. As a result, in
our increasingly digitised and linguistically varied world, the study of CL, FL, and
their junction is crucial and relevant.

2 Importance and Relevance of the Study in the Context of Forensic Investigations

In the field of forensic investigations, the fusion of computational linguistics (CL) and forensic linguistics (FL) is highly significant and relevant. The importance of
language as a key piece of evidence in judicial and criminal processes has grown as we
go through an era marked by the growth of digital technology. Textual information,
whether it comes from modern forms of communication like emails and social media
postings or from more formal formats like letters and papers, frequently plays a key
role in revealing the truth behind illegal activity. The importance of this study, which
intends to examine the synergistic potential of CL and FL in boosting the efficacy of
forensic investigations, resides in just this.
The growing volume and complexity of linguistic data used in forensic investigations highlight the importance of this work. Such
data processing by hand can be laborious, time-consuming, and prone to human
mistakes. CL provides the capacity to automate and simplify the process of linguistic
analysis, improving the effectiveness and accuracy of forensic investigations. CL
does this via the use of complex algorithms and computational models. The use of
CL in FL can go beyond efficiency improvements. It can offer brand-new skills and
insights that are challenging, if not impossible, to attain through manual analysis.
For example, machine learning algorithms may be trained to spot small correlations
and patterns in linguistic data that human analysts would miss.
Such patterns are useful for tasks that are frequently at the heart of forensic investigations, such as authorship attribution, deception detection, and language profiling.
The application of language as evidence in judicial and criminal procedures has
wider societal and ethical ramifications, which makes this work relevant in these
areas as well. It highlights crucial considerations about how language should be used
fairly and justly in legal situations and about the potential role that computational
tools may play in addressing problems like prejudice, discrimination, and misun-
derstanding in legal language. This study’s significance and relevance come from
its ability to improve the efficiency of forensic investigations, support the fair and
just use of language in legal settings, and inspire more research and innovation at
the nexus of CL and FL. As a result, this study is important for scholars and practitioners in the disciplines of CL and FL as well as for lawyers, law enforcement, and
the general public.

3 Purpose of the Study and the Research Question

This work aims to explore the relationship between computational linguistics (CL)
and forensic linguistics (FL), illuminating how CL could improve the effectiveness and precision of forensic investigations. In order to develop knowledge in both
domains, it strives to give a thorough grasp of how the methodology and approaches
of CL may be applied to the problems and difficulties of FL. The underlying research
question that informs this work is "How can the principles and techniques of computational linguistics be leveraged to enhance the effectiveness and accuracy of forensic linguistics in the context of forensic investigations?" The study will conduct a critical analysis of the corpus of prior research in the area, identify knowledge gaps, and provide solutions to close these gaps in order to respond to this research question. Additionally, it will look at particular instances when CL has been applied to FL, evaluating the efficiency of these techniques and their consequences for both domains.
In short, this study's goal and its central research question are to reveal the revolutionary potential of CL in improving FL and open the door to forensic investigations that are more effective, precise, and all-encompassing. For scholars and practitioners in the domains of CL and FL, as well as for attorneys, law enforcement officials, and society at large, this endeavour is very valuable.
4 Research Methodology

In order to ensure thorough coverage of the pertinent research in the disciplines of computational linguistics and forensic linguistics, the literature search method
for this study was created to be exhaustive and systematic. ACM Digital Library,
Google Scholar, JSTOR, and PubMed were among the academic databases and digital
libraries that were searched.
Terms and phrases related to computational linguistics and forensic linguistics were combined to perform the search. The keywords included "Computational Linguistics", "Forensic Linguistics", "Authorship Identification", "Deception Detection", "Hate Speech Analysis", "Machine Learning", and "Deep Learning". To guarantee a thorough examination of the subject, including both foundational works and the most recent advancements, the search was not restricted to a particular time period.
The search approach also included a manual search of the highlighted papers’
reference lists to locate more pertinent research. This method of “snowballing” helps
to ensure that no important investigations are overlooked.
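The "snowballing" step described above can be pictured as a breadth-first traversal over reference lists. The sketch below illustrates the idea on an invented citation map; the paper identifiers are placeholders, not real studies from this review.

```python
from collections import deque

# Hypothetical citation data: paper id -> ids of the papers it cites.
# These identifiers are placeholders for illustration only.
references = {
    "seed1": ["p1", "p2"],
    "p1": ["p3"],
    "p2": [],
    "p3": [],
}

def snowball(seeds, references, max_rounds=2):
    """Backward snowballing: starting from seed papers, repeatedly collect
    every paper cited by papers already collected, for `max_rounds` passes."""
    found = set(seeds)
    frontier = deque(seeds)
    rounds = 0
    while frontier and rounds < max_rounds:
        next_frontier = deque()
        while frontier:
            paper = frontier.popleft()
            for ref in references.get(paper, []):
                if ref not in found:
                    found.add(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
        rounds += 1
    return found
```

Here `snowball(["seed1"], references)` collects the seed plus everything reachable in two rounds of reference-list checking; in an actual review the "citation map" is assembled by hand from the bibliographies of the retrieved papers.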

5 A Description of the Standards for Including and Excluding Studies

To guarantee the applicability and calibre of the chosen literature, inclusion and
exclusion criteria for the studies were established. Studies that satisfied the following
requirements were considered for the review:
1. The study concentrated on forensic linguistics’ use of computational linguistics.
2. The study’s publication in a peer-reviewed journal or conference proceedings
attested to the study’s calibre and objectivity.
3. Since English was the language of the review, the study was also available in that
language.
The following exclusion standards were established:
1. Language restrictions prevented us from including studies that were not available
in English.
2. To assure the calibre of the included research, papers that were not published in
peer-reviewed journals or conference proceedings were omitted.
To keep the review’s emphasis on the junction of computational and forensic
linguistics, studies that had no direct bearing on this topic were disregarded.
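Applied mechanically, the criteria above amount to a simple conjunctive filter over candidate studies. The sketch below shows that screening step on hypothetical bibliographic records; the field names and records are illustrative assumptions, not the review's actual screening data.

```python
# Hypothetical bibliographic records; field names are assumptions for
# illustration, not the review's actual screening spreadsheet.
studies = [
    {"title": "CL methods for authorship attribution in FL",
     "peer_reviewed": True, "language": "en", "on_cl_in_fl": True},
    {"title": "A non-peer-reviewed report on forensic phonetics",
     "peer_reviewed": False, "language": "en", "on_cl_in_fl": True},
    {"title": "A peer-reviewed study unavailable in English",
     "peer_reviewed": True, "language": "es", "on_cl_in_fl": True},
]

def passes_screening(study):
    """A study is included only if it satisfies all three criteria:
    topical fit, peer review, and availability in English."""
    return (study["on_cl_in_fl"]
            and study["peer_reviewed"]
            and study["language"] == "en")

included = [s["title"] for s in studies if passes_screening(s)]
```

Only the first record survives all three criteria, mirroring how each inclusion rule independently narrows the candidate pool.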
6 Specifics of the Literature Analysis and Synthesis Process

The process of reading, analysing, and synthesising the literature involved several steps. First, crucial details, such as the authors, year of publication, research methodologies, important findings, and research gaps, were extracted after thoroughly reading the chosen articles. To aid in the analysis, this data was kept in a tabular format.
After that, a thematic analysis of the literature was conducted. This involved identifying recurring themes, differences, and patterns within the research. Themes were developed and refined iteratively as more material was read.
In order to present a thorough overview of the area, the results from the various
investigations were integrated as part of the literature synthesis. This included talking
about recurring themes and patterns, contrasting and comparing the results of various
research, and pinpointing knowledge gaps. The synthesis included a critical evaluation of the present status of the field, recommendations for future study, and a
discussion of the implications of the results for both computational linguistics and
forensic linguistics.

7 Literature Review

Computational linguistics, a vibrant intersection of technology and language study, holds transformative potential in forensic analysis. This literature review delves into
key advancements, methodologies, and applications of computational linguistics in
forensic contexts, shedding light on its profound implications in areas like deception
detection, hate speech, and ethical considerations (Table 1).

8 Overview of Current Computational Linguistics Research and How It is Used in Forensic Linguistics

Over the past several decades, research in computational linguistics (CL) has significantly
increased, driven by advances in technology and artificial intelligence that have sped
up the creation of complex computer models and language processing algorithms.
Forensic linguistics (FL) is one of the many fields in which these advancements have
found use.
Research on the use of CL in FL has mostly concentrated on a few important topics. One of the best known is authorship attribution, in which computational approaches analyse linguistic features such as word usage, syntactic structures, and stylistic patterns to identify the author of a document. Numerous studies have shown that these techniques are successful
Table 1 Details of literature review along with research area and key findings

1. Almela et al. [1], Automatic Classifier for Deception in Spanish: Developed an SVM classifier to identify deception in Spanish written communication; emphasised the gap in research for languages other than English.
2. Church and Liberman [2], Major Shifts in Computational Linguistics: Offered a comprehensive discussion on the evolving landscape of computational linguistics and provided insights for budding researchers in the field.
3. Solovyev et al. [3], Linguistic Complexology Paradigms and Methods: Presented a detailed overview of the paradigms and methods in linguistic complexology and underscored the need for refining complexity prediction metrics.
4. Ophir et al. [4], Computational Linguistics in Suicide Prevention: Explored the integration of computational linguistics for suicide prevention, bringing attention to both ethical dilemmas and methodological challenges.
5. Moura et al. [5], Automatic Classifier for Deception Detection in Portuguese: Developed an automatic classifier targeting Portuguese deception; emphasised a similar research gap as observed in Spanish-language deception studies.
6. Simon and Nyitrai [6], Linguistic Fingerprints in Decision-making and Investigation: Highlighted the pivotal role linguistic fingerprints play in authoritative decisions, aiding investigative bodies in their work.
7. Almela et al. [7], Quantitative Analysis of Lying in Psychopathic Discourse: Provided quantitative insights into the linguistic nuances observed in deceptive communication within psychopathic discourse.
8. Alshahrani et al. [8], Deep-Learning-Based Intent Detection for Natural Language: Proposed a deep learning approach for intent detection in natural language understanding and discussed potential ethical ramifications.
9. Donatelli and Koller [9], Evolution of Computational Linguistics: Presented an in-depth overview of the historical changes, varying motivations, evolving methods, and diverse applications in computational linguistics.
10. Tsujii [10], Forensic Linguistics in Epidemic Crime and Fake News: Emphasised the increasing relevance and critical role of forensic linguistics in tackling challenges like epidemic crime and the spread of fake news.
11. Silva [11], Argumentation in Hate Speech on Facebook: Explored the structure and nature of argumentation present in hate speech on Facebook, illuminating patterns and potential motivations.
12. Choobbasti et al. [12], CL-DLBIDC for Natural Language Understanding: Proposed a novel methodology that synergises computational linguistics and deep learning for enhanced natural language understanding.
13. Gurram et al. [13], Fast Native Language Identification Techniques: Introduced state-of-the-art NLI techniques leveraging string kernels, addressing efficiency and speed in identifying native languages.
14. Alduais [14], Comparative Research Approaches in Language Study: Offered a holistic comparative review of various research methodologies adopted in the realm of language studies.
15. Abdalla [15], Role of Forensic Linguistics in Crime Investigation: Detailed the significant contributions and applications of forensic linguistics in the investigation and solving of crimes.
16. Kuznetsov [16], History and Current State of Forensic Linguistics: Presented a chronological narrative detailing the evolution, current methodologies, and future prospects of forensic linguistics.
17. Sari et al. [17], Hate Speech Acts on Social Media: Conducted an exhaustive study on the manifestations, patterns, and repercussions of hate speech acts on various social media platforms.
18. Orr et al. [18], Ethical Role of Computational Linguistics in Suicide Prevention: Investigated the ethical considerations and responsibilities when leveraging computational linguistics tools and techniques in suicide prevention efforts.

at accurately identifying authors even when there are many potential authors or large datasets. Deception detection is a key field of research as well. Machine learning algorithms have been utilised in several studies to find linguistic indicators connected to dishonesty in spoken and written language. These studies have demonstrated that deceitful language frequently possesses particular linguistic characteristics and that computational tools can be useful in identifying them.
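The "linguistic indicators" mentioned above are often simple lexical cue rates. The sketch below computes a few such rates; the cue lexicons are loosely inspired by the deception-cue literature but are abbreviated assumptions, and a real detector would feed validated lexicon scores into a trained classifier rather than read the raw rates directly.

```python
import re
from collections import Counter

# Abbreviated, illustrative cue lexicons; real work would use validated
# resources (e.g. full psycholinguistic word lists) rather than these.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"no", "not", "never", "none", "cannot"}
EXCLUSIVES = {"but", "except", "without", "although"}

def cue_features(text):
    """Per-token rates of three cue classes often examined in deception
    research (self-reference, negation, and exclusive words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    counts = Counter(tokens)
    return {
        "first_person": sum(counts[w] for w in FIRST_PERSON) / n,
        "negation": sum(counts[w] for w in NEGATIONS) / n,
        "exclusive": sum(counts[w] for w in EXCLUSIVES) / n,
    }
```

For a statement such as "I did not take it, but I was not there", the routine returns the proportion of tokens falling in each cue class; a classifier would then weigh these rates, alongside many other features, against labelled truthful and deceptive samples.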
The use of CL for linguistic profiling, which uses computer techniques to develop
thorough profiles of people based on their language use, has also been studied. These
profiles can offer useful details about an unknown author’s background in terms of
demographics, which can help identify them.
Despite these developments, applying CL to FL still presents a number of difficulties and opportunities. Significant obstacles need to be overcome due to the
complexity and diversity of languages, the complexities of human communication,
and the ethical and legal ramifications of utilising language as evidence in court
cases.

9 Determining Knowledge Gaps in the Present State of Knowledge

Although the corpus of existing research has made great gains in the application of
computational linguistics (CL) to forensic linguistics (FL), there are still a number
of knowledge gaps that need to be filled.
First, most existing research has focused on English or other widely spoken languages. Research on the application of CL to FL in the context of minority or less widely used languages is lacking. Given the world's linguistic diversity, the need to include more languages in this research is critical. Second, while extensive research has been done on authorship attribution and deception detection, less has been done on other possible uses of CL in FL. For instance, additional study is required on the application of CL to tasks like spotting threats, detecting
hate speech, or examining the language of legal writings. Third, many of the prior investigations have relied on rather basic computational models or linguistic features. Research that examines more intricate linguistic features and makes use of cutting-edge computational models, including deep learning models, is required.
Another research gap is that existing studies do not rigorously examine the ethical and legal ramifications of employing CL in FL. As computational approaches are used more often in forensic investigations, addressing these challenges is critical to ensuring their fair and reasonable application.

10 A Description of How the Current Study Attempts to Bridge Such Gaps

By performing an extensive literature evaluation and suggesting areas for future research, this study seeks to close the gaps in the existing body of information. This
project will examine the existing research on lesser-known or minority languages in
order to fill the research gap on these subjects. It will also look for chances to apply CL
to FL in these situations. It will also suggest ways to get around problems with certain
languages, such as the scarcity of linguistic resources or the difficulty of the language.
This investigation will look into more possible CL uses in FL in order to broaden the
research’s application beyond authorship attribution and deception detection. It will
evaluate the prior research on these applications and suggest fresh lines of inquiry
in light of the problems and opportunities found. This study will evaluate the prior
research that has utilised complicated language elements and cutting-edge computer
models in order to promote their utilisation. It will point out the advantages and
disadvantages of these strategies and make suggestions for how to make them better.
Finally, this study will analyse the current literature on these concerns in order to
address the ethical and legal consequences of utilising CL in FL. It will list the main
moral and legal issues that need to be addressed, discuss how they affect the use of CL in FL, and propose solutions.
In sum, by broadening the scope of the research, advancing the use of sophisticated computational models and complex linguistic features, and addressing the moral and legal ramifications of utilising CL in FL, this study aims to enhance both fields.
11 Discussion and Analysis

11.1 Common Themes

The use of computational tools in forensic linguistics is one of the most prevalent
themes in the literature, notably in authorship determination, deception detection, and hate speech analysis. These applications use computational linguistics to
analyse massive volumes of data, spot patterns, and make predictions, giving forensic
investigators useful resources.
The application of deep learning and machine learning methods in forensic
linguistics is another recurring subject. Numerous studies, including that of
Alshahrani et al. [8], discuss the use of these sophisticated computational algorithms for natural language understanding and show how they might improve forensic investigations.
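The machine learning applications these studies describe typically reduce to supervised text categorisation. As a toy illustration, the sketch below trains a minimal multinomial Naive Bayes classifier on invented sentences; the class labels, training snippets, and bag-of-words features are all assumptions for illustration, and real forensic systems rely on validated training data, far richer features (or neural models such as those discussed by Alshahrani et al. [8]), and careful evaluation.

```python
import math
from collections import Counter, defaultdict

class TinyNB:
    """A minimal multinomial Naive Bayes text classifier with add-one
    smoothing, shown only to make supervised text categorisation concrete."""

    def __init__(self):
        self.class_counts = Counter()                 # label -> document count
        self.word_counts = defaultdict(Counter)       # label -> word counts
        self.vocab = set()

    def fit(self, pairs):
        """Train on (text, label) pairs using whitespace tokenisation."""
        for text, label in pairs:
            self.class_counts[label] += 1
            for w in text.lower().split():
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        """Return the label with the highest smoothed log-probability."""
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_lp = None, float("-inf")
        for label in self.class_counts:
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label
```

Trained on a handful of invented "threat" and "neutral" snippets, the classifier assigns new sentences to whichever class makes their words most probable; the same mechanism, scaled up, underlies categorisation tasks such as hate speech or threat detection.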

11.2 Disagreements

Despite these common themes, there are disagreements in the literature. One point of contention is the viability and moral ramifications of using computational methods in forensic linguistics. While computational tools can offer valuable insights, some researchers,
including those in the study by Orr et al. [18], contend that they shouldn’t take the
place of human discretion and knowledge. Additionally, they raise worries about
privacy and possible abuse of these technologies.

11.3 Trends

According to trends in the literature, advanced machine learning techniques, such as deep learning, are being used in forensic linguistics. Additionally, there is a tendency
towards the creation of increasingly complicated and nuanced models that can more
accurately depict the richness of human language and behaviour.
This is evident from the works of Donatelli and Koller [9], as well as Tsujii [10],
which show the evolution of the goals, methods, and applications of computational
linguistics.
There is a substantial body of work in both forensic linguistics and computational
linguistics that sheds light on the potential of computational techniques to support
forensic investigations. It also highlights the relevance of ongoing discussion and
research in this area and the necessity of carefully considering the ethical implications
of these technologies.
11.4 Research Gap

Despite the fact that numerous studies have looked at how computational linguistics
may be used in different facets of forensic linguistics, comprehensive research that
focuses on the fusion of these two domains is still lacking. This paper fills this
vacuum by offering a thorough examination of the ways in which computational
linguistics might advance forensic linguistics, utilising a variety of sources to present
a comprehensive picture of the subject.
The literature study also showed that there is a need for more complex analyses
of the moral ramifications of using computational methods in forensic linguistics.
A more thorough and in-depth investigation of this subject is needed, even though certain studies, like the one by Orr et al. [18], have briefly touched on these problems.
This paper addresses this gap by giving a fair assessment of the possible advantages and hazards and devoting a sizeable amount of the discussion to the ethical
issues of utilising computational tools in forensic investigations. The literature study
discovered a trend in forensic linguistics towards the employment of sophisticated machine-learning methods. However, there is a dearth of studies addressing
the real-world difficulties and potential drawbacks of these methods. This work fills
this vacuum by critically examining the application of machine learning in forensic
linguistics and outlining its possible advantages, drawbacks, and difficulties.
Last but not least, the research fills a gap in the literature by offering a thorough assessment of the current state of the field, including the most recent
trends and advancements. This study offers a broader perspective on the subject
than many others that concentrate on particular applications or methods, making it
an important tool for academics, practitioners, and students interested in the nexus
between computational linguistics and forensic linguistics.
By filling up a number of highlighted gaps and offering a thorough, fair, and current
analysis of the nexus between computational linguistics and forensic linguistics, this
work significantly adds to the body of literature.

12 Findings

The results of the literature study have important ramifications for both forensic and
computational linguistics.
The results highlight the potential for Computational Linguistics to contribute to
useful, real-world applications, such as forensic investigations. The use of compu-
tational approaches in hate speech analysis, deceit detection, and authorship identi-
fication, as emphasised in the works of Simon et al. [6] and Moura et al. [5], show
the usefulness of computational linguistics. Additionally, the use of sophisticated
machine learning algorithms provides promising potential for additional study and
development in computational linguistics, as mentioned in Alshahrani et al. [8].
This is consistent with findings by Gurram et al. [13], who showed the value of
string kernel-based methods for Native Language Identification (NLI), a crucial field
in forensic linguistics.
The results also show that computational linguists must think about the ethical
ramifications of their work, especially when it is applied in delicate situations like
forensic investigations. The privacy issues and potential abuse of computational
approaches brought up in the study by Orr et al. [18] highlight how crucial ethical
considerations are in computational linguistics. The work of Azhniuk (2022), who
examined the methodological approaches to forensic linguistic study on the judicial assessment of speech acts, further echoes this.
The results show how useful it is for forensic linguists to use computational
methods in their job. These methods can offer priceless information and instruments
that can improve the precision and effectiveness of forensic investigations. The work
of Abdalla [15] shows how computational linguistics has the potential to dramatically
improve the skills of forensic linguists by helping to identify authorship, uncover
fraud, and analyse hate speech.
The results also demonstrate the necessity for forensic linguists to be judicious
users of these methods. To guarantee their ethical and efficient usage, it is essential
to be aware of their constraints and potential hazards. The relevance of this critical
viewpoint is shown by the worries raised in the literature over the efficacy and moral
ramifications of these tactics. The work of Kuznetsov [16], who explored the history
of forensic linguistics’ evolution and its present status and emphasised the necessity
for a thorough grasp of the discipline, serves as additional evidence for this.
The results of the literature review have important ramifications for both forensic
linguistics and computational linguistics, underlining the possible advantages, difficulties, and ethical issues related to the interdisciplinary study of these two domains.
The publications cited offer a rich tapestry of ideas that emphasise these consequences
even more.

13 Limitations

The primary limitation of this review is its dependency on the provided sources.
The year of publication and the name of journals/books from which the studies
originated were not provided, potentially limiting the context in which the findings
should be interpreted. Furthermore, the rapidly evolving nature of both computational
linguistics and forensic analysis suggests that newer advancements might not be
included in this review.
14 Conclusion

This study’s major goal was to examine how computational linguistics and forensic
linguistics relate to one another, with a focus on the ways that computational tools
might aid forensic investigations. The comprehensive literature review’s findings
provided a complex tapestry of viewpoints on this relationship. In forensic linguistics,
computational linguistics has a variety of applications, including the analysis of hate
speech and the identification of authors. It also highlighted the growing application
of complex machine learning techniques in forensic linguistics. However, the study
also found some inconsistencies in the literature, particularly with regard to the moral
and practical consequences of these uses. The findings of the literature review have
significant implications for both computational linguistics and forensic linguistics.
The results highlight the potential for Computational Linguistics to contribute to
useful, real-world applications, such as forensic investigations. Additionally, they
emphasise how crucial it is for computational linguists to think about the ethical
ramifications of their work, especially when it is applied in delicate situations like
forensic investigations.

References

1. Almela A, Valencia-García V, Cantos P (2013) Seeing through deception: a computational approach to deceit detection in Spanish written communication. Linguistic Evidence in Security, Law and Intelligence 1(1)
2. Church K, Liberman M (2021) The future of computational linguistics: on beyond alchemy. Front Artif Intell 4
3. Solovyev V, Solnyshkina M, McNamara D (2022) Computational linguistics and discourse
complexology: paradigms and research methods. Comput Linguist Discourse Complexol
26(2):275–316
4. Ophir Y, Tikochinski R, Klomek AB, Reichart R (2021) The Hitchhiker’s guide to computa-
tional linguistics in suicide prevention
5. Moura R, Sousa-Silva R, Cardoso HL (2021) Automated fake news detection using computa-
tional forensic linguistics. Springer, Cham
6. Simon GU, Nyitrai AE (2021) The phenomena of epidemic crime, deepfakes, fake news, and the role of forensic linguistics. Inform Társadalom XXI(2):86–101
7. Almela A, Alcaraz-Mármol G, Cantos P (2015) Analysing deception in a psychopath’s speech:
a quantitative approach. D.E.L.T.A, pp 559–572
8. Alshahrani HJ, Tarmissi K, Alshahrani H, Elfaki MA, Yafoz A, Alsini R, Alghushairy O,
Hamza MA (2022) Computational linguistics with deep-learning-based intent detection for
natural language understanding. Appl Sci 12(17):2022
9. Donatelli L, Koller A (2023) Compositionality in computational linguistics. Annu Rev Linguist 9:463–481
10. Tsujii J (2021) Natural language processing and computational linguistics. Comput Linguist 47(4):707–727
11. Silva WP (2021) Argumentation in hate speech on Facebook: a contributive categoriza-
tion to Forensic Linguistics and Computational Linguistics. REVISTA DE ESTUDOS DA
LINGUAGEM 4(29)
12. Choobbasti AJ, Gholamian ME, Vaheb A, Safavi S (2018) JSpeech: a multi-lingual conver-
sational speech corpus. Athens, Greece: IEEE Spoken Language Technology Workshop
(SLT)
13. Gurram VK, Sanil J, Anoop VS, Asharaf S (2023) String Kernel-based techniques for native
language identification. Hum-Cent Intell Syst
14. Alduais A (2012) A comparative and contrastive account of research approaches in the study
of language. Int J Learn Develop 2
15. Abdalla AE (2020) Forensic linguistics and its role in crime investigation: descriptive study.
JALL | J Arabic Linguist Literat 2(2):55–75
16. Kuznetsov VO (2021) Forensic linguistics as a form of application of specialized linguistic
knowledge in legal proceedings: development history and current state. Theory Practice Foren
Sci 4(16):17–25
17. Sari PLP, Supiatman L, Aryni Y (2022) Hate speech acts on social media (Forensic Linguistics
Study). English Teach Linguist J 2(3)
18. Orr M, Kessel, Parry D (2022) The ethical role of computational linguistics in digital
psychological formulation and suicide prevention. In: Proceedings of the eighth workshop
on computational linguistics and clinical psychology. https://doi.org/10.18653/v1/2022.clpsyc
h-1.2
Navigating the Digital Frontier:
Unraveling the Complexities
and Challenges of Emerging Virtual
Reality

Navtika Singh Nautiyal and Archana Patel

Abstract Virtual Reality (VR) technology’s rapid development has ushered in a
revolutionary period in human–computer interaction, offering immersive experiences
that break through conventional barriers. This article explores the nuances of this
changing environment, illuminating the many difficulties and problems brought on
by emerging virtual reality. The study offers a thorough review of the complexities
that researchers, developers, and stakeholders must manage, with a focus on the
technological, experiential, ethical, and social components. The technical challenges
include demanding hardware requirements as well as the necessity of controlling
latency to ensure user comfort and avoid motion sickness. Difficulties with user
experience arise from the need to balance content adaptation for VR with preserving
user comfort, particularly where users engage in prolonged usage. In a period of
growing data collection, ethical problems include privacy and data security issues as
well as the social and psychological effects of prolonged immersion in virtual
environments. The social implications of VR raise further issues, such as inclusiveness
in the age of the digital divide and the thin line between productive involvement and
escapism. By addressing these issues and suggesting how they could be resolved, this
article aims to provide a thorough overview of the complex environment around
emerging virtual reality, so that VR can be integrated responsibly into our digital
society.

Keywords Virtual reality · Digital world · Virtual environments · Privacy · Data security

N. S. Nautiyal (B)
National Forensic Sciences University, Delhi 110085, India
e-mail: [email protected]
A. Patel
National Forensic Sciences University, Gandhinagar, Gujarat, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_8

1 Introduction

Over the past few years, virtual reality (VR) technology has developed rapidly,
offering immersive experiences that go beyond traditional media and transforming how
people interact with digital environments. This progress has opened new opportunities
across numerous sectors, including gaming, education, healthcare, architecture, and
training simulations, and interest in the possibilities of VR continues to grow as new
devices, applications, and experiences emerge.
To fully realize the promise of this transformative technology, however, researchers
and developers must work through the range of complexities and obstacles that these
breakthroughs have brought with them. These span technical difficulties, user
experience problems, ethical dilemmas, societal implications, and accessibility
restrictions, all of which must be thoroughly recognized and handled.
This article presents an in-depth analysis of this complicated emerging virtual
reality landscape. It examines the technological constraints, user experience
difficulties, and ethical issues raised by immersive environments, and it considers
VR’s societal effects and accessibility restrictions. While there is no denying that VR
has the potential to alter society, its ethical issues must be carefully considered before
implementation; the article therefore also highlights the significance of a moral
framework to direct VR’s application.

2 Complexities Related to Virtual Reality

The immersive qualities of VR, made possible by the combination of cutting-edge
hardware and software, offer remarkable prospects alongside substantial obstacles that
call for careful investigation. This section examines the intertwined complexities of
virtual reality: its technological intricacies, cognitive issues, moral conundrums, and
significant societal effects, which together shape the landscape of this fast-changing
field. By examining the details that lie behind immersive digital worlds, we aim to
develop a better understanding of VR’s profound effects on both individuals and
society as a whole.
User Privacy and Data Security: The collection and storage of user data within
VR ecosystems raise significant privacy concerns. VR applications often require
personal information and behavioral data, which can be susceptible to breaches and
unauthorized access [1]. Striking a balance between providing personalized expe-
riences and safeguarding user privacy requires robust data protection measures and
informed consent mechanisms.
Content Integrity and Responsibility: Virtual environments can host a wide array
of content, including user-generated material, simulations, and entertainment experi-
ences. Ensuring that these environments uphold ethical standards and do not promote
harm or misinformation is a crucial consideration [2]. Ethical guidelines should be
established to govern the creation and dissemination of VR content, particularly in
cases where the lines between reality and simulation are blurred.
Psychological Well-being and User Experience: The immersive nature of VR can
impact users’ psychological well-being, potentially leading to phenomena like
cybersickness or psychological distress [3]. Designers and developers must prioritize user
experience by minimizing adverse effects and optimizing comfort. Ethical respon-
sibility involves creating experiences that enhance well-being while minimizing
negative psychological outcomes.
Societal Implications and Accessibility: VR has the potential to exacerbate existing
societal inequalities if not made accessible to all segments of the population [4]. The
technology can lead to a “digital divide” where certain demographics are excluded
from its benefits due to economic or accessibility constraints. Ethical considerations
call for efforts to bridge this divide and ensure equitable access to VR experiences.

3 Technological Intricacies with Respect to Virtual Reality

One of the most significant complexities of new virtual reality technology lies in
its demanding hardware requirements. High-quality VR experiences necessitate
powerful computing components such as advanced graphics processing units (GPUs),
fast central processing units (CPUs), and substantial memory capacity [5, 6]. The
computational load required for rendering detailed 3D environments and tracking
user movements in real time poses a significant barrier to entry for individuals with
limited resources; this hardware barrier not only limits accessibility but also hampers
the widespread adoption of VR experiences across diverse user groups.
The technological underpinnings of VR have accordingly garnered significant
attention in the literature. Slater [7] underscores the critical role of hardware
capabilities, noting that factors such as computational power, resolution, and latency
intricately shape the quality of immersive experiences. Liu [8] delves into the
balancing act faced by developers, who strive to deliver high-fidelity graphics while
ensuring compatibility with accessible and affordable hardware resources, since
creating realistic virtual environments requires sophisticated graphics rendering that
further increases the demand on hardware. Steuer [9] defines the dimensions that
contribute to telepresence, which has implications for the effectiveness of VR
experiences. Balancing the need for high-fidelity visuals with accessible and
affordable hardware remains an ongoing challenge.
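To make the rendering load concrete, a back-of-the-envelope calculation can compare the pixel throughput a VR headset demands with that of a conventional monitor. The resolution and refresh-rate figures below are illustrative assumptions, not the specifications of any particular device:

```python
# Rough GPU fill-rate estimate: pixels the renderer must shade per second.
def pixels_per_second(width: int, height: int, refresh_hz: int, eyes: int = 2) -> int:
    """Raw per-second pixel throughput for a display pipeline."""
    return width * height * refresh_hz * eyes

# A hypothetical per-eye 2160x2160 panel refreshed at 90 Hz:
vr_load = pixels_per_second(2160, 2160, 90)

# A conventional 1080p monitor at 60 Hz, for comparison:
desktop_load = pixels_per_second(1920, 1080, 60, eyes=1)

print(vr_load, desktop_load, vr_load / desktop_load)
```

Under these assumed figures the VR pipeline must shade several times as many pixels per second as the desktop one, which is the barrier to entry the text describes.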
Latency and Motion Sickness: Latency, the delay between a user’s input and the
corresponding system response, remains a major problem for VR technology. Even
minute delays can break the sense of immersion and cause discomfort, disorientation,
and motion sickness, with symptoms such as nausea [11]. Delivering a comfortable
and engaging VR experience therefore requires achieving low-latency performance
[10], which depends on the synchronization of several elements: tracking sensors,
display refresh rates, and rendering pipelines. The difficulty is amplified in wireless
VR installations, where these additional factors can further affect latency and the
overall quality of the experience.
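The latency discussion can be made concrete with a simple budget calculation over the pipeline stages named above. The stage timings and the 20 ms comfort threshold in this sketch are illustrative assumptions, not measured values from any particular system:

```python
# Illustrative motion-to-photon latency budget for a VR pipeline.
# Every number here is an assumed, not measured, value.
PIPELINE_MS = {
    "tracking_sensor": 2.0,   # sensor sampling and fusion
    "pose_prediction": 1.0,   # extrapolating head pose
    "simulation": 4.0,        # application/game logic
    "render": 8.0,            # GPU frame rendering
    "display_scanout": 5.5,   # panel refresh and scanout
}

COMFORT_THRESHOLD_MS = 20.0  # a commonly cited comfort target, used here as an assumption

def motion_to_photon_ms(stages):
    """Total delay from user motion to photons reaching the eye."""
    return sum(stages.values())

total = motion_to_photon_ms(PIPELINE_MS)
print(total, "OK" if total <= COMFORT_THRESHOLD_MS else "over budget")
```

With these assumed timings the pipeline lands just over the threshold, showing why every stage, including the display itself, must be optimized together.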
Content Design and Adaptation: Converting conventional content into a VR
format can be difficult [12]. Translating linear media, such as films and webpages, into
interactive and immersive experiences raises both creative and technological
difficulties. Narrative structures must be reimagined for storytelling in VR so as to
support user agency and exploration, and content makers must strike a balance
between retaining compelling narratives that connect with audiences and exploiting
the immersive potential of the medium.

4 User Experience Challenges

The user experience and its psychological ramifications become a focus of inquiry
as people move across this digital frontier. User experience is a major concern
because VR has the potential to cause cybersickness, disorientation, and other
negative impacts that degrade the overall experience [13]. Discomfort and decreased
immersion are driven by factors including the field of view, motion-tracking precision,
and the vergence-accommodation conflict [13]. Psychological elements also shape
user acceptance and emotional engagement, notably the sense of presence and the
“uncanny valley” phenomenon, which concerns the relationship between realism and
comfort in VR settings [14]. Addressing these issues requires a multidisciplinary
strategy incorporating psychology, human–computer interaction, and neuroscience.
Comfort and Fatigue: User comfort is of the utmost importance in the
development of VR [15]. Long-term use of virtual environments can cause discomfort,
weariness, and health problems such as motion sickness, headaches, and eye strain.
Creating VR experiences that address these issues demands a deep understanding of
human physiology and psychology. Designers must prioritize ergonomic design
principles, optimize the field of view, and distribute the weight of VR headsets evenly.
Mitigating discomfort frequently also requires optimizing rendering methods,
lowering motion-to-photon latency, and designing user-friendly locomotion mechanics
that reduce sensory conflicts, so that users can interact with VR content without
experiencing negative physical impacts.
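In practice, comfort measures of this kind are often operationalized as adjustable comfort options that are tightened when a user reports discomfort. The sketch below is a hypothetical illustration; the option names and thresholds are assumptions, not a prescribed design:

```python
# Hypothetical adaptive comfort controller: map a self-reported
# discomfort score onto conservative rendering/locomotion options.
def comfort_settings(discomfort: int) -> dict:
    """discomfort: user-reported score from 0 (fine) to 10 (severe)."""
    return {
        # Narrow the effective field of view as discomfort rises (vignetting).
        "vignette_strength": min(1.0, discomfort / 10),
        # Teleport locomotion avoids the optical flow that triggers sickness.
        "locomotion": "teleport" if discomfort >= 5 else "smooth",
        # Snap turning likewise removes continuous rotational flow.
        "turn_mode": "snap" if discomfort >= 3 else "smooth",
    }

print(comfort_settings(7))  # conservative settings for a struggling user
```

The design choice illustrated is simply that comfort options degrade gracefully per user rather than being fixed globally, echoing the individual-differences point above.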
Interaction and Realism: Constructing realistic virtual worlds that replicate the
complexity of the actual world is a major technological problem. Producing
photorealistic imagery, accurate physics simulations, and lifelike animations requires
substantial computing power. Furthermore, enabling natural and intuitive interaction
within these settings calls for advanced input devices and gesture-detection
algorithms. The creation of haptic feedback systems that convincingly recreate tactile
sensations remains an open challenge, as it entails elaborate sensor arrays and
complex algorithms.

5 Ethical Considerations in Virtual Reality

The digital frontier’s ethical environment encompasses significant concerns about
content integrity, privacy, and responsible content creation. Kavoori [1] throws light
on the shadow side of VR by exploring concerns about user privacy and data security
inside VR ecosystems: VR systems collect massive volumes of user data, which raises
questions regarding unauthorized access and possible abuse. In a 2018 article, Gunkel
[17] engages in a philosophical examination of virtual reality as a medium, posing
concerns about the moral implications of immersive media and how it could affect
people’s perceptions and actions. A further concern is the influence of violent or
ethically dubious content on users when immersive experiences blur the lines between
reality and simulation [2]; the debate over how to strike a balance between creative
freedom and ethical content production is still going on.

5.1 Data Security and Privacy

VR equipment frequently incorporates cameras and sensors that record user
motions, gestures, and even the actual surroundings, which raises data security and
user privacy concerns [17]: such devices can capture private information about users’
movements and physical environments. Important measures for ensuring user privacy
in the VR ecosystem include strong encryption of user data, transparent data-usage
regulations, and giving users control over how much data is shared. Such measures
are essential to allay these worries and foster confidence within the VR ecosystem.
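As a minimal illustration of the data-protection measures described above, the sketch below pseudonymizes a user identifier with a keyed hash before motion telemetry is stored, and drops the most sensitive field outright. The field names, the example record, and the key handling are hypothetical assumptions, not a reference implementation:

```python
import hashlib
import hmac
import os

# In practice this would be a persistent, access-controlled secret;
# here it is generated fresh purely for the sketch.
SECRET_KEY = os.urandom(32)

def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a raw identifier with a keyed hash, so stored motion
    traces cannot be linked back to a user without the key."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

# A hypothetical telemetry record prepared for storage:
record = {
    "user": pseudonymize("alice@example.com"),  # no raw identifier stored
    "head_pose": (0.1, 1.62, -0.3),             # retained for analytics
    "room_scan": None,                          # dropped entirely: too sensitive
}
```

A keyed hash (rather than a plain hash) is used so that an attacker who obtains the stored records cannot simply re-hash candidate identifiers to de-anonymize them.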

5.2 Psychological and Societal Impact

Because VR experiences are so intense, moral concerns have been raised about
their possible psychological and social effects [18]. Extended usage of virtual
environments may blur the line between the virtual and real worlds, which may have
an impact on users’ mental health. Additionally, VR experiences can affect user
perceptions and attitudes, raising questions about the possibility of immersive
propaganda or exposure to dangerous information. Overcoming these ethical
difficulties requires creating rules for ethical VR content production and consumption.

5.3 Accessibility and Social Implications

Virtual reality has societal ramifications in areas including education, interpersonal
communication, and urban planning. By creating immersive learning environments,
the technology has the potential to revolutionize remote learning [19]. The possibility
of unequal distribution of VR experiences and gear also raises questions regarding
accessibility discrepancies [20]. Cultural sensitivity must also be taken into account,
because virtual settings may unintentionally support stereotypes or cultural
appropriation [16].

5.4 Digital Divide and Inclusivity

The digital divide remains a substantial obstacle to the broad adoption of VR
technology [5]. Socioeconomic differences that restrict access to high-quality VR
experiences and gear can worsen inequalities already present in society. Ensuring
inclusion requires significant efforts to address pricing issues, increase the availability
of VR in educational settings, and develop content that appeals to a variety of user
groups.

5.5 Reality and Escapism

The ability of VR to produce immersive and compelling experiences begs the
question of where to draw the line between constructive participation and escapism.
Overuse of virtual reality for enjoyment may cause people to withdraw from
obligations and relationships in the real world. Finding a harmonious balance between
VR’s experiential potential and its possible effects on social dynamics is a persistent
societal problem that needs careful study.
Virtual reality thus presents a wide range of issues related to technology,
psychology, ethics, and society. Addressing them requires collaboration between
technologists, psychologists, ethicists, legislators, and cultural specialists. With VR
continuing to influence many facets of contemporary life, a deep awareness of its
complexity is crucial in order to maximize its benefits and minimize its drawbacks.

6 Case Studies: Navigating the Complexities of Emerging Virtual Reality

These case studies underscore the intricate challenges faced when navigating the
complexities of emerging virtual reality technologies. Each case study exemplifies
the multi-faceted nature of the digital frontier, where solutions are born from
collaborative efforts, innovative strategies, and ethical considerations. The analysis of
the case studies is shown in Table 1.
The complexities and difficulties of emerging virtual reality cover a wide range of
topics, such as how it affects personality, who is responsible for creating material,
crimes committed in virtual worlds, and the need for proper legal frameworks.
Stakeholders must work together to solve these issues as VR technology develops, in
order to fully realize its promise while preserving users’ rights, ethics, and well-being.
• Identity and personality in virtual reality: Questions about how immersive
experiences affect personality and identity arise as people explore the virtual
reality (VR) frontier. In the “Proteus effect,” users build avatars or other digital
representations of themselves that may differ from who they really are, and the
appearance of those avatars can in turn affect their behavior and attitudes [21].
This raises moral questions concerning the degree to which VR might alter one’s
behavior and self-perception, affecting both one’s identity and one’s interactions
in virtual worlds.
• Platforms and Content Creators’ Ethical Responsibility: Virtual reality content
developers and platforms carry a substantial ethical obligation. Content makers
must take into account the possible psychological effects of their works, since the
immersive nature of VR encounters can make it difficult to distinguish between
reality and simulation [2]. It is crucial to ensure that content complies with
responsible content rules, respects cultural sensitivities, and refrains from
supporting negative stereotypes. To stop the spread of harmful or improper
information, VR platforms should employ effective content moderation methods
[16].

Table 1 Analysis of case studies

1. The inclusive learning initiative. Institution: Virtual Learning University (VLU).
Situation: VLU, a leading online education platform, aims to revolutionize distance
learning through immersive VR experiences, but faces the challenge of ensuring
equitable access to VR education experiences across socioeconomic backgrounds.
Solution: VLU collaborates with non-profit organizations and educational institutions
to provide subsidized VR kits to underprivileged students [20]. They also develop
content that caters to diverse learning styles and language preferences, ensuring
inclusivity. Outcome: The initiative leads to a significant increase in the participation
of students from disadvantaged backgrounds, enriching their educational experiences
and reducing the digital divide.

2. Ethical content curation. Institution: Immerse VR Studios. Situation: Immerse VR
Studios, a content creation company specializing in VR experiences, grapples with
the challenge of responsibly curating content so that its immersive simulations do not
perpetuate stereotypes or harmful narratives. Solution: Immerse VR Studios
collaborates with ethicists and cultural experts to establish content guidelines that
prioritize diversity and cultural sensitivity [16]. It implements AI-driven filters that
identify and flag potentially offensive content for review. Outcome: The company has
gained a reputation for producing culturally enriching and responsible content,
attracting a broader audience and contributing positively to the VR ecosystem.

3. Overcoming cybersickness. Institution: VR Health Solutions (healthcare). Situation:
VR Health Solutions develops VR applications for pain management and physical
therapy, but faces the challenge of users experiencing cybersickness and discomfort
during prolonged VR sessions. Solution: The company invests in research to
understand the causes of cybersickness and incorporates user feedback to refine its
applications [13]. Outcome: Through iterative design and continuous improvement,
VR Health Solutions minimizes the occurrence of cybersickness, leading to higher
user satisfaction and improved treatment outcomes.
• Virtual Offences’ Challenges: The digital frontier is not free of virtual offenses
such as cybercrimes, bullying, and harassment. According to Boukhechba [22],
virtual reality settings can open new doors for harmful behaviors that transcend
physical boundaries. Such offenses may have a greater impact because of VR’s
immersive features, which can cause psychological injury and emotional
suffering. The problem lies in developing efficient strategies to prevent and
resolve virtual offenses as users navigate these virtual spaces, demanding
cooperation between developers, law enforcement, and legal professionals.
• Challenges with Jurisdiction and Legal Frameworks: VR’s legal environment
is complicated, especially with regard to responsibility and jurisdiction. Virtual
offenses that take place in virtual settings may involve people from many
geographical areas, making it more difficult to determine the rules and
regulations that apply [23]. Furthermore, legal frameworks that take into account
VR’s particular characteristics are needed for concerns relating to user consent,
intellectual property rights, and data privacy in virtual environments.
Policymakers and legal professionals must ensure the preservation of users’
rights while adjusting current laws to the digital frontier.
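The content moderation methods called for above, and the AI-driven flagging filters of case study 2 in Table 1, can be sketched in miniature as a flag-for-human-review workflow. A production system would rely on trained classifiers; this keyword screen, with placeholder terms, is only a hypothetical illustration of the pipeline shape:

```python
# Hypothetical placeholder terms; a real deployment would maintain
# curated, culturally reviewed term lists and ML classifiers instead.
FLAG_TERMS = {"slur_example", "graphic_violence"}

def flag_for_review(description: str) -> bool:
    """Return True if a content description should be queued for
    human moderator review rather than published automatically."""
    words = {w.strip(".,!?").lower() for w in description.split()}
    return not FLAG_TERMS.isdisjoint(words)

# Only flagged items land in the moderators' queue:
submissions = ["calm museum tour", "scene with graphic_violence"]
review_queue = [d for d in submissions if flag_for_review(d)]
```

The key design point is that automated filters only triage: flagged content goes to human review, keeping the final ethical judgment with people rather than the filter.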
The complexities and difficulties of emerging virtual reality thus go beyond the
technical to include personality, responsibility, criminal activity, and legal problems.
Stakeholders must maintain continuing discussions as the digital frontier develops, in
order to create moral standards, responsible content policies, and regulatory
frameworks that secure VR’s positive potential while minimizing its negative effects.

7 Navigating the Complexities of Emerging Virtual Reality and Its Solutions

The aspects of complexities and challenges related to emerging Virtual Reality (VR)
technology in terms of personality, responsibility, offenses, and laws are shown in
Table 2.
As the digital frontier unfolds, navigating the complexities and challenges inherent
in emerging Virtual Reality (VR) technologies requires a multi-pronged approach
that draws on technological innovations, ethical considerations, and societal
inclusiveness. The integration of these solutions can pave the way for the responsible
and sustainable development and usage of VR.
Table 2 Complexities and challenges of virtual reality

1. Personality and psychological impact. Complexity: VR experiences have the
potential to evoke strong emotional responses, impacting individuals’ personalities
and psychological well-being; the immersive nature of VR can influence mood and
emotions and even alter perceptions of self. Challenge: The challenge lies in
understanding how different personality traits interact with VR stimuli. Introverted
individuals might experience heightened discomfort in socially interactive VR
scenarios [14]. Moreover, prolonged exposure to VR could lead to a phenomenon
called “post-VR reality,” where individuals struggle to transition from the virtual to
the real world, potentially affecting personality dynamics.

2. Responsibility in content creation. Complexity: The vast diversity of VR content,
from educational simulations to entertainment experiences, poses a challenge in
maintaining responsible content creation practices; VR content creators face ethical
dilemmas in ensuring that their creations are both engaging and free from harmful
narratives or stereotypes. Challenge: Striking the balance between artistic freedom
and responsible content creation is a major challenge [2]. Creators must consider the
potential impact of their content on users’ beliefs and behaviors, and ethical
guidelines and moderation mechanisms need to be established to prevent offensive or
inappropriate content from being disseminated.

3. Offenses and ethical dilemmas. Complexity: VR environments can serve as
platforms for both positive and negative interactions. Just as in any digital space,
there is a potential for offensive behavior, harassment, and even virtual “crimes”
within the VR landscape. Challenge: Determining the appropriate response to
offenses committed within virtual environments presents an ethical challenge. While
the impact might be less tangible than in the physical world, emotional harm and
psychological distress are still very real [16]. Establishing a system for reporting and
addressing such incidents, while respecting the unique nature of virtual interactions,
is essential.

4. Laws and legal frameworks. Complexity: The fast-paced development of VR
technology often outpaces the establishment of legal frameworks to address related
challenges; VR can blur the lines between virtual and real, making it difficult to
apply existing laws to novel situations. Challenge: Legislators and legal experts
grapple with the challenge of crafting laws that adequately address offenses and
disputes arising within VR environments. Intellectual property rights, data privacy
concerns, and liability issues are just a few of the legal aspects that need clarification
[1]. A proactive approach that anticipates potential legal challenges is crucial to
ensuring a fair and just virtual space.
96 N. S. Nautiyal and A. Patel

Continuous innovation is essential if VR is to overcome its technological challenges. The demands on processing power and rendering capability can be reduced through advancements in hardware efficiency brought about by collaboration between hardware makers, software developers, and researchers [8]. While assuring compatibility with more accessible hardware resources, investments in cutting-edge graphics processing units and optimization techniques can improve the quality of images [7]. Developments in real-time motion tracking technology can also reduce motion sickness and improve user comfort Jerald (2016).
i. Giving user-centered design and well-being first priority
The user's well-being must be prioritized in the human-centered design process in order to address user experience and psychological aspects. Iterative design approaches that incorporate user feedback can result in interfaces that are simple and comfortable to interact with [14]. Design professionals may shape experiences to reduce discomfort and raise overall satisfaction by taking into account individual differences in perception and tolerance. According to Felnhofer [24], adding adaptive VR material that changes in response to user reactions might also lessen detrimental psychological impacts and increase emotional involvement.
ii. Ethical Guidelines and Reliable Content Production
Addressing ethical issues in VR requires the development of strong frameworks that protect user privacy and preserve content integrity. Within VR ecosystems, user privacy may be preserved by adhering to the principles of informed consent and data anonymization [1]. Inappropriate or hazardous content may be recognized and restricted by putting in place content moderation systems and AI-driven filters, guaranteeing responsible content distribution [2]. Guidelines that encourage artistic freedom while limiting the perpetuation of negative stereotypes can be developed through collaboration between content producers, ethicists, and cultural specialists [16].
iii. Accessibility and Inclusivity of Experiences
Strategies for societal inclusivity are essential to addressing the potential digital divide. Efforts should be made to make VR experiences and hardware available to a wider range of consumers [4]. This may entail collaborations with educational institutions to offer VR experiences to underprivileged communities. A more inclusive VR environment may also be fostered by the production of culturally sensitive material that respects diverse viewpoints. A comprehensive strategy that incorporates technological breakthroughs, user-centric design, ethical concerns, and social inclusivity is required to navigate the challenges of new virtual reality technologies. To lead VR towards a responsible and transformational digital frontier, academics, developers, ethicists, and policymakers must work together continuously.
Navigating the Digital Frontier: Unraveling the Complexities … 97

8 Conclusion

Virtual reality, the newest frontier, has enormous potential, offering previously unheard-of capabilities for communication, entertainment, and education. However, the complexity and difficulties brought on by VR technology cannot be ignored. VR researchers and developers encounter a variety of challenges, from technical difficulties such as system requirements and latency to user experience issues such as comfort and content adaptation. The situation is made more complicated by ethical worries about privacy, psychological effects, and societal ramifications.
A multidisciplinary strategy involving cooperation between engineers, psychologists, ethicists, and legislators is required to address these complexities and obstacles. Finding creative solutions to these problems will be essential for realizing the full promise of virtual reality while guaranteeing its ethical, fair, and responsible incorporation into our daily lives as the VR ecosystem develops.

References

1. Author F (2016) Article title. Journal 2(5):99–110


2. Author F, Author S (2016) Title of a proceedings paper. In: Editor F, Editor S (eds)
CONFERENCE 2016, LNCS, vol 9999. Springer, Heidelberg, pp 1–13
3. Author F, Author S, Author T (1999) Book title, 2nd edn. Publisher, Location
4. Author F (2010) Contribution title. In: 9th international proceedings on proceedings, pp 1–2.
Publisher, Location
5. LNCS Homepage, http://www.springer.com/lncs. Last accessed 21 Nov 2016
Challenges to Admissibility
and Reliability of Electronic Evidence
in India in the Age of ‘Deepfakes’

Divyansh Shukla and Anshul Pandey

Abstract If any electronic evidence, in the form of videos or images, is produced before the court, judges automatically attach a lot of probative weight to such
electronic evidence. With the advancement of technological capabilities of AI that
produces Deepfakes, this standard rule might or rather should change over a period
of time. ‘Deepfakes’ is a sophisticated form of technology that enables people to
create fabricated videos of real people and make them say and do things that they
never said or did. With its advent in 2017, Deepfake technology is taking long strides
and has reached a point where it will be difficult for the naked eye to identify minute
signs that the image or video is doctored and not real. This is a cause for worry for
every stakeholder who is involved with the justice delivery system in India as it will
be difficult to conclude the authenticity of the electronic form of evidence produced
by parties to the suit. India follows an adversarial system of justice, and the Indian
Evidence Act, in both civil and criminal cases, serves as a final gatekeeper to funnel
relevant and authentic evidence. This paper attempts to explore the potential impact
of deepfakes in the courtroom. The paper discusses why and how deepfakes pose
a new and unique threat to the justice delivery system and tries to analyse whether Indian legislation such as the 'Indian Evidence Act' and the 'Information Technology Act' is tailored to cope with 'deepfakes' as a new frontier of false evidence. The
article also analyses the responsibility of ‘Examiner of Electronic Evidence’ as an
Expert to identify fake content in any digital media and give an expert opinion under
Sect. 45A of the Indian Evidence Act.

Keywords Deepfake · Artificial intelligence · Reliability of electronic evidence

D. Shukla (B) · A. Pandey


Chhatrapati Shahu Ji Maharaj University, Kanpur, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 99
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_9

1 Introduction

For how long does “seeing is believing” apply? Images and videos currently carry
a lot of probative weight in society. They are regarded as prima facie proof that
the reported occurrence actually took place. The introduction of alleged “deepfake”
videos could alter that. Artificial intelligence advancements have made it possible to
produce videos that appear to be real but actually contain fake actions and statements
made by real people [1].
Deepfakes will undoubtedly start cropping up in court settings as AI apps increas-
ingly control our lives, and it is inevitable that the evidence required to successfully
deal with civil matters and criminal trials will include facts generated by this uncanny
technology [2]. In this article, the effects of deepfakes on courtroom procedures are
discussed, along with the authenticity and admissibility of such electronic evidence
in the context of India.
Fortunately, the phenomenon of falsification and evidence tampering is not new to courts. To help weed out fakes, the rules of evidence have long-established authentication requirements. We contend that these standards, as they stand, are insufficient to combat deepfakes, and that the standard for authenticating video evidence should be strengthened in order to meet the challenges posed by AI-based technology.
Although Indian courts have previously dealt with inauthentic evidence presented before them, the challenge posed by this AI-based technology should not be viewed through the same lens. Moreover, there is no published court opinion in India that considers the issues regarding AI admissibility in any depth.

2 What Are Deepfakes?

Emerging technologies are enablers of a better future; they hold the power to shape the future of humanity. The advent of Artificial Intelligence (further referred to as AI) has given birth to numerous other branches of technology through advances in data science. Deepfakes are part of this family tree and are considered a form of synthetic media. Deepfake is a technology that uses AI to recreate, through audio-visual cues, a synthetically augmented video or image of a real person, making them appear to act or speak in ways they never did in reality. Through machine learning, the algorithms and systems can process collections of data and information so that any body part, including the face and other visual attributes, can be regenerated on screen in a way that seems real but is not [3].
Deepfakes are a derivative of deep learning, a branch of AI; the technology works on the basis of neural networks, which are frequently built from input/output structures. The algorithm consists of two related components, known as the generator and the discriminator. These components are significant because together they distinguish fake content from real. The generator creates the fake content, while the discriminator tries to identify the features that were faked, thereby authenticating the material. After detecting such features, the discriminator reports back to the generator so that the fake content can be perfected more and more and brought into line with the real instance; the system thus improves itself through this feedback. Output that is closer to the real image scores higher, essentially acting as a scale of success that determines the degree of correctness.
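The generator/discriminator feedback loop described above can be illustrated with a deliberately simplified numeric sketch. This is not an actual GAN (no neural networks are involved); the "real" data distribution, learning rate, and update rule below are invented purely for illustration of the feedback principle:

```python
import random

# Toy illustration of the generator/discriminator feedback loop: the "real"
# data is a stream of numbers around 10.0, and the generator tries to produce
# values the discriminator cannot distinguish from real ones.
random.seed(0)

def real_sample():
    """An authentic data point: numbers clustered around 10.0."""
    return 10.0 + random.uniform(-0.5, 0.5)

def discriminator(value, reference):
    """Scores how 'fake' a candidate looks: its distance from a real sample."""
    return abs(value - reference)

generator_output = 0.0   # the generator starts far from realistic output
learning_rate = 0.1

for _ in range(200):
    reference = real_sample()
    error = discriminator(generator_output, reference)
    # Feedback step: the discriminator's error signal nudges the generator
    # towards output that looks more like the real data.
    if generator_output < reference:
        generator_output += learning_rate * error
    else:
        generator_output -= learning_rate * error

print(round(generator_output, 1))  # settles close to the real mean of 10.0
```

In a real deepfake pipeline both components are deep neural networks trained jointly, but the principle is the same: each round of discriminator feedback makes the generated output harder to tell apart from the genuine material.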
The underlying technology can overlay face images, create facial motions, switch faces, manipulate facial expressions, generate faces, and synthesize the speech of a target individual onto a video of a source person in order to create a video of the target individual acting like the source person. The resulting impersonation is often practically indistinguishable from the original [4].
Obama [5], Donald Trump [6], Nancy Pelosi [7], Russian President Vladimir Putin
[8], Ukrainian President Volodymyr Zelenskyy [9], Economic Affairs Minister of
Malaysia [10], Tom Cruise [11], Facebook CEO Mark Zuckerberg [12], American
President Richard Nixon [13], and Queen Elizabeth [14] are few deepfake incidents
that reflect the endless creativity of this technology. India, for the first time, witnessed
the emergence of Deepfake manipulation in the 2020 Delhi Assembly Elections when
a deepfake video of Manoj Tiwari, the state president of Bhartiya Janta Party (BJP),
was widely circulated through WhatsApp [15].
AI-assisted technology may be used to create videos showing corrupt authorities, atrocities committed by the military, immoral presidential candidates, and emergency professionals warning of a terrorist attack [16]. The diverse domains in which harm can be caused by deepfake videos or images can best be understood from the Deepfakes Accountability Act [17]. The Act imposes criminal liability if the advanced technology is used to create a false personation with the intent to
• feature a person in sexual activity,
• cause violence or physical harm, or incite armed or diplomatic conflict,
• interfere in an official proceeding,
• commit fraud, including securities fraud,
• influence a domestic policy debate,
• interfere in a Federal, state, or territorial election.
The threat of deepfakes is strengthened by the fact that they are extraordinarily precise, easy to create, and have adverse effects on viewers. Moreover, since the apparent realism of deepfake videos will keep improving over time [18], it will become difficult for unaided humans to detect fakes, and as a result it will be difficult for people, and for AI systems themselves, to differentiate real videos from fake ones [19]. This threat makes the detection of deepfakes a continuing problem.
Governments all around the world, the tech industry [20], and other stakeholders have made efforts to develop technology for the detection of deepfakes. The main goal of deepfake detection is to determine the authenticity of video recordings and to find out whether a video has been manipulated in any manner.

3 Deepfakes’ Impact on the Courtroom

Although the harms associated with the prevalent use of deepfakes strike at many levels of society, in this research article we confine the discussion to deepfakes as an imminent threat to courtroom integrity and the investigative process.
Deepfake technology can be used by a complainant to produce fabricated audio-visual evidence in order to obtain a judgement in its favour; on the other hand, a defence lawyer can plant seeds of doubt in the judge's mind about the authenticity of digital evidence produced by other parties, even though he or she knows that the evidence produced is genuine. Therefore, even in situations where there are no fake videos, the simple fact that deepfakes exist will make it more difficult to verify the veracity of actual evidence. In the long run, this can create bias and scepticism in the minds of judges regarding the admissibility of audio-visual evidence in general.
So far, there are two major instances where deepfakes have negatively impacted legal proceedings.
i. Deepfake evidence was produced in a British court in one case, where the mother used doctored threatening audio of the father in order to obtain custody of the child [21].
ii. Another recent example shows how mere allegations of the production of a deepfake video can negatively impact a trial. In March 2021, Raffaela Spone was detained and charged with harassing her daughter's cheerleading rivals by reportedly creating deepfakes that showed them nude, drinking, and vaping. However, Spone denied the charges of deepfake creation. Experts in forensics and technology believed the video was real and not a fabrication, but the poor video quality and dearth of additional evidence prevented them from reaching a definitive conclusion. Later, the prosecutor's office announced that the lead officer on the case had concluded, on a "naked eye" inspection, that the video was fake, and hence they were no longer pursuing the deepfake video as the basis for the accusation [22].
The first case shows that parties, by producing easily created fabricated evidence, can make the role of judges burdensome, as judges have to take additional measures to determine the authenticity of the evidence produced. Luckily, in that case the husband was able to find the original audio that had been doctored, compare the metadata, and prove to the court that the audio was manipulated; but it would not be so easy for the parties, or the judges, in every case to determine the authenticity of evidence produced in the absence of the original file and technical aid from experts. Moreover, the nature of evidence like video or audio recordings makes them seem so trustworthy that they are taken at face value by judges.
The second case is a perfect example of how a defence lawyer can manipulate proceedings by creating doubt in the minds of judges that even an authentic video is a deepfake. In that situation, Spone had already suffered harm by the time the prosecution revised its strategy. Spone was overwhelmed by unfavourable attention. According to her lawyer, her reputation was destroyed; she received death threats and was mocked and bullied in her neighbourhood and online [23].

In addition to the above two cases, in the United States deepfake evidence has been produced as proof in defamation cases [24], a federal civil rights action [25], child pornography [26], and assault with attempt to murder [27]. Hence, as the sophistication of the technology increases with time, deepfake evidence is likely to become a central focus of litigation [16].
These cases from other jurisdictions are a notable warning for India to frame and amend laws to meet the growing sophistication of deepfake technology. It is only a matter of time before such fake evidence is presented in the district courts of India, and the adversarial system of justice may actually fail before this technology in the absence of a relevant detection system and expert opinion. Although existing Indian law contains a procedure for the authentication of digital evidence, it falls short, largely because the rules were developed before the emergence of deepfake technology. Hence, in a digital age in which video and audio recordings will frequently be presented as evidence, there is a need to revisit and amend the evidentiary standards in India for authenticating video and audio evidence to counter the impact of deepfake technology.

4 Analysis of the Existing Indian Laws to Address the Challenge of Fake 'Electronic Evidence' in the Courtroom

Every aspect of trial preparation and practice will be impacted by deepfakes, including attorneys' attempts to introduce or exclude videos as evidence, judges' rulings on whether a video is admissible, expert and lay witnesses' requests to testify about the video, and judges' consideration of the evidence when making a decision [19].
With the increase of opportunities for all people to access any information in the
twenty-first century, reliance on electronic means of communication also increased.
The Apex Court of India noted that we are living in a time of technological advance-
ments. Technologies are developing quickly. Given the possibility of evidence manip-
ulation, caution must be exercised when accepting electronic evidence. Making
complete rules is impossible because they might not hold true in the fast-paced world
of technological advancement. In the current technological age, new approaches and tools are emerging, so it is important to take these into account when admitting electronic evidence [28]. This created the need to amend the law relating to the admission of electronic evidence in the court of law.
Information with probative value that is saved or transmitted in binary form and
can be used as evidence in court is known as electronic evidence [29]. Section 3
of the Indian Evidence Act defines ‘Evidence’ as “all documents including elec-
tronic record produced for the inspection of court”. Moreover, the definition of
‘electronic form evidence’ can be understood from the Explanation of Sect. 79A of
the Information Technology Act. The explanation provides that for the purpose of this

section, Electronic form evidence means “any information of probative value that
is either stored or transmitted in electronic form and includes computer evidence,
digital audio, digital video, cell phones, digital fax machines". Hence, the definition is inclusive enough to bring within its ambit videos or audio created by deepfake technology.
This section will analyse whether existing Indian legislation is adequately equipped to address the challenges of the production, identification, and admission of deepfake electronic evidence in the courtroom.

4.1 Presumption as to Electronic Record

Section 85B of the Indian Evidence Act creates a presumption as to alteration of ‘elec-
tronic record’. It is significant to understand that the term ‘electronic record’ would
include within its definition ‘electronic evidence’. The term ‘Electronic Record’ has
been defined under Sect. 2(t) of the Information Technology Act as “data, record
or data generated, image or sound stored, received or sent in an electronic form or
micro film or computer-generated micro fiche". Although the definition of electronic record does not specifically mention video, a video is nothing but data stored in an electronic form. Hence, it is difficult to exclude electronic evidence from the ambit of electronic records.
Section 85B provides that “In any proceedings involving a secure electronic
record, the Court shall presume unless contrary is proved, that the secure electronic
record has not been altered since the specific point of time to which the secure status
relates”. Moreover, sub-Sect. 2 of Sect. 85B provides that, “nothing in this section
shall create any presumption, relating to authenticity and integrity of the electronic
record, except in the case of a secure electronic record”.
This section is relevant in the context of deepfake manipulation, and two different conclusions can be drawn from its interpretation:
i. Firstly, it provides that if the electronic record is secure, then the court shall presume that there is no manipulation or alteration in the evidence produced, unless proved otherwise. But if the electronic evidence is not secure, then there is no presumption of any nature, positive or negative. Hence, as mentioned earlier, when judges take video or audio recordings at face value, that practice is not something supported by law; it rests rather on the belief or human psychology of judges, owing to how trustworthy such evidence appears.
ii. Secondly, the law has created a demarcation between ‘electronic record’ and
‘secure electronic record’. The definition of ‘Secure electronic record’ has been
surprisingly mentioned in two different places under the Information Technology
Act. One at Sect. 14 of the IT Act and the other at Rule 3 of The Information
Technology (Security Procedure) Rules, 2004. Section 14 provides that “Where
any security procedure has been applied to an electronic record at a specific
point of time, then such record shall be deemed to be a secure electronic record from such point of time to the time of verification". However, Rule 3 of The
Information Technology (Security Procedure) Rules, 2004 mentions that “An
electronic record shall be deemed to be a secure electronic record for the purposes
of the Act if it has been authenticated by means of a secure digital signature”.
Both definitions lead to the conclusion that an electronic record, to be considered secure, must be digitally signed.
In sum, if any party to the case produces any digital evidence for consideration before the court, the court shall presume that the digital or electronic evidence is not fabricated only if such digital evidence is authenticated by means of a secure digital signature. Otherwise, the court shall not make any presumption regarding authenticity.
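At its core, the "secure electronic record" mechanism is a cryptographic integrity check: a tag is computed over the record at a specific point of time, and any later alteration makes verification fail. A minimal sketch of the idea, using Python's standard hmac module as a simplified symmetric stand-in for a true (asymmetric) digital signature; the key and record bytes below are purely illustrative:

```python
import hmac
import hashlib

# Simplified sketch of the tamper-evidence behind a "secure electronic
# record". A real digital signature uses asymmetric key pairs, but the
# principle is the same: verification succeeds only if the record is
# byte-for-byte unchanged since it was signed.
signing_key = b"examiner-secret-key"           # hypothetical key
record = b"frame data of the original video"   # hypothetical record

def sign(data: bytes) -> bytes:
    return hmac.new(signing_key, data, hashlib.sha256).digest()

def verify(data: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign(data), tag)

tag = sign(record)                  # computed when the record was "secured"

print(verify(record, tag))          # True: the untampered record verifies
print(verify(record + b"!", tag))   # False: any alteration is detected
```

This is why Sect. 85B can safely presume the integrity of a secure electronic record: an altered record simply fails verification.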
Securing electronic evidence by means of a digital signature seems like a simple option. It would confirm that a particular scene was actually observed by a physical camera and not digitally created. But it is neither a viable nor a practical solution, and it would raise numerous other issues, as discussed below:
i. To counter deepfake technology, if the concept of a digital signature in videos were enforced through law, then every camera device of any specification, be it a mobile camera, professional camera, or spy camera, would have to be embedded with this verification technology by its manufacturer. Manufacturers of such devices might not consent to adding a digital signature feature to their cameras due to privacy concerns, because integrating this feature into the device could fulfil the surveillance ambitions of any state [30].
ii. Another technical problem with integrating digital signatures into camera devices is that the technology does not work well with videos. Changing the format of a video from MPEG4 to MPEG2 would completely change the hash value of the original video, thus suggesting fabrication, although not in the actual sense. John Collomosse, professor of computer vision at the University of Surrey and project leader for Archangel, states: "Publishers run their document through a cryptographic algorithm such as SHA256, MD5, or Blowfish, which produces a 'hash,' a brief string of bytes that represents the content of that file and serves as its digital signature. Running the same file through the hashing algorithm at any time will produce the same hash if its contents haven't changed. Hashes are extremely sensitive to changes in the source file's binary structure. When you change only one byte in the hashed file and rerun the procedure, the outcome would be completely different. But while hashes work well for text files and applications, they present challenges for videos, which can be stored in different formats" [31].
iii. This solution is also not viable in cases where recorded videos depict only the conclusion of an incident, accompanied by an incorrect description, and not the entire incident leading up to it (social media users may post videos showing police handcuffing and shooting a suspect in the leg, and then claim that the man was an unarmed innocent pedestrian who was the victim of drive-by police brutality). Adding digital signatures to cell phone cameras would not address this common source of false videographic narrative, as the issue is not whether the footage is real or fake, but rather whether it captures the entire situation and whether the description assigned to it represents what the video actually depicts [31].
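The hash sensitivity that point ii turns on is easy to demonstrate. A small sketch using Python's standard hashlib; the byte strings below merely stand in for file contents:

```python
import hashlib

# Demonstrates why hashes work as tamper-evidence but are brittle for video:
# flipping a single byte of the input yields a completely different digest,
# even though the underlying content is perceptually unchanged.
original = b"\x00\x01\x02" * 1000          # stand-in for a media file
reencoded = bytearray(original)
reencoded[0] ^= 0x01                       # change just one byte

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(reencoded)).hexdigest()
h3 = hashlib.sha256(original).hexdigest()  # same bytes, same hash

print(h1 == h3)   # True: hashing is deterministic
print(h1 == h2)   # False: one changed byte changes the whole digest
```

This is the practical obstacle Collomosse describes: re-encoding a video from one container or codec to another changes its hash even when no frame has been visually altered, so a hash mismatch alone cannot prove fabrication.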

4.2 Relevancy and Admissibility of Electronic Records as Evidence

Section 136 of the Indian Evidence Act mentions that “When either party proposes to
give evidence of any fact, the Judge may ask the party proposing to give the evidence
in what manner the alleged fact, if proved, would be relevant; and the Judge shall
admit the evidence if he thinks that the fact, if proved, would be relevant, and not
otherwise”. Therefore, before determining the admissibility of electronic records,
judges must decide upon the relevancy of the electronic record in question.
As per Sect. 3 of the Indian Evidence Act, “One fact is said to be relevant to
another when the one is connected with the other in any of the ways referred to in
the provisions of this Act relating to the relevancy of facts”. Deepfake videos and
audios can easily be created to make them relevant under any of the principles of the
Indian Evidence Act, for example, fake video or audio may be presented as evidence
to show motive, intention, state of mind, or to depict the bad character, or to create
estoppel, or to impeach the credit of witness or any other circumstances which cannot
be anticipated, as we have seen from two cases discussed above.
After determining the relevancy of the evidence produced, the court shall look into
the procedural aspect of admission of electronic evidence, i.e., whether the evidence
produced in question is admissible or not. The admissibility of electronic evidence
is greatly affected by reliability and authenticity. Authentication entails persuading
the court that (a) the record’s contents have not changed, (b) the information in the
record actually came from its alleged source, whether a person or a machine, and
(c) extraneous information, like the record’s apparent date, is accurate. Sections 65A
and 65B of the Indian Evidence Act of 1872 were amended in order to accomplish
everything that has been stated above [32].
Any documentary evidence by way of an electronic record under the Evidence
Act, in view of Sects. 59 and 65A, can be proved only in accordance with the
procedure prescribed under Sect. 65B. The admissibility of the electronic record
is covered under Sect. 65B. These clauses are meant to legalize the production of
computer-generated secondary evidence in electronic form.
Section 65B(4) requires the production of a certificate that, among other things,
identifies the electronic record containing the statement, describes how it was created,
and gives specifics of the device used in its creation in order to show that an electronic
record was created using a computer. This certificate must be presented by someone
who is either in charge of the relevant device's management or operates it in an official capacity.
However, in matters where relevant digital evidence may be created by the use of deepfake technology and produced before the court, Sect. 65B of the Indian Evidence Act is no longer useful. The main reason behind this argument is the fact that deepfakes can be created in real time [33]. This can be shown in the following two ways:
i. The conditions of Sect. 65B are meant to certify that the secondary evidence being produced before the court has not been manipulated as against the original primary electronic record. However, with deepfake technology, fake videos and audios can be created in real time. This in turn shows that if the original video is itself fake, a Sect. 65B certificate as to the authenticity of the secondary evidence is no longer valid or useful.
ii. Secondly, if the fabricated original video or audio (created in real time) is itself presented in the courtroom, Sect. 65B will not be applicable; as seen in Preeti Jain vs Kunal Jain & Anr [34], where the Court held that compliance with Sect. 65B was not necessary because clippings from the hard disk of a spy camera constituted primary evidence.

4.3 Expert Opinion of ‘Examiner of Electronic Evidence’

In Anvar P.V. v. P.K. Basheer, the Supreme Court of India held that "opinion of an
examiner of electronic records under Sect. 45A could only be obtained once the
secondary electronic evidence has been produced in compliance with Sect. 65-B”.
Apex Court opined that “all these safeguards are taken to ensure the source and
authenticity, which are the two hallmarks pertaining to electronic record sought to be
used as evidence. Electronic records being more susceptible to tampering, alteration,
transposition, excision, etc. without such safeguards, the whole trial based on proof of
electronic records can lead to travesty of justice" [35]. However, only if the electronic record is duly produced in terms of Sect. 65B of the IEA would the question arise as to the genuineness thereof, and in that situation resort can be made to Sect. 45A, IEA (opinion of examiner of electronic evidence).
Section 4.2 describes the erosion of the usefulness of Sect. 65B of the Indian Evidence Act in the context of the production of deepfake evidence. In light of the above points, the significance of Sect. 45A in determining relevancy increases manifold, provided we move away from the settled law laid down in the Anvar case.
Section 45A of IEA talks about the ‘Opinion of Examiner of Electronic Evidence’.
It mentions that “When in a proceeding, the court has to form an opinion on any
matter relating to any information transmitted or stored in any computer resource
or any other electronic or digital form, the opinion of the Examiner of Electronic
Evidence referred to in Sect. 79A of the Information Technology Act, 2000 (21 of 2000), is a relevant fact. Explanation.—For the purposes of this section, an Examiner of Electronic Evidence shall be an expert”.

Section 79A of the Information Technology Act provides that “The Central Govern-
ment may, for the purposes of providing expert opinion on electronic form evidence
before any court or other authority specify, by notification in the Official Gazette,
any Department, body or agency of the Central Government or a State Government
as an Examiner of Electronic Evidence”.
A combined reading of these two sections specifies that the Central Government may notify any authority as an ‘Examiner of Electronic Evidence’, and the opinion of such authority shall be considered relevant before the court in the form of expert opinion.
Although these two sections were added in 2009 through amendment, it was only eight years later, in 2017, that the Ministry of Electronics and Information Technology (MeitY) designed the mechanism to assess and notify the Examiner of Electronic Evidence [36], and nine years after the amendment, in 2018, the Central Government for the first time notified the Forensic Science Laboratory, Sector 14, Rohini, New Delhi, under the Government of the National Capital Territory of Delhi, as an Examiner of Electronic Evidence within India [37]. Since that first notification, a total of 15 Central or State Government agencies have been notified by the Ministry as ‘Examiner of Electronic Evidence’ [38].
An analysis of the scope of work of these 15 agencies shows that none of them is eligible to provide expert opinions on matters related to deepfake technology. While there is no doubt about the technical competency of these agencies, the credibility of their expert opinion can be challenged because of the limited scope of activities for which they were notified as competent. This argument is substantiated below.
The scope of approval is outlined in the scheme’s second part. It states that any department, body, or organization of the Central Government or a State Government that wishes to be recognized as an examiner of electronic evidence may submit an application to the Ministry of Electronics and Information Technology (MeitY) for one of the activities listed below:
i. Computer (Media) Forensics
ii. Network (Cyber) Forensics
iii. Mobile Devices Forensics
iv. Digital Video/Image & CCTV Forensics
v. Digital Audio Forensics
vi. Device Specific Forensics
vii. Digital Equipment/Machines (having embedded firmware)
viii. Any other.
Even though there are eight specific areas of activity in which any forensic lab
or agency can get notified by the MeitY, all 15 of such agencies, interestingly, were
notified as Examiner of Electronic Evidence only in two areas of activity, namely:
i. Computer (Media) Forensics, and
ii. Mobile Devices Forensics.
This shows that the scope of activity in which such agencies can be considered experts competent to provide opinions before the court is very limited. Moreover, deepfake videos and audio, which would fall under the domains of ‘Digital Video Forensics’ and ‘Digital Audio Forensics’, are not covered under any agency’s scope of work. Hence, if deepfake evidence is produced before a court in the near future, there is, on paper, no competent Central Government Examiner of Electronic Evidence whose opinion could be considered relevant; put differently, if such expert electronic evidence is produced in a court of law, the opinion might not be admissible because its authenticity can be challenged.
However, in the author’s discussion with a competent expert in this field [39], the expert did not fully support the above view. He was of the opinion that even if a scientific laboratory has not been declared technically competent under Sect. 79A of the IT Act, its expert opinion on the authenticity of a deepfake video or audio would still be considered relevant, provided the expert has testified to and defended that opinion before the judge on scientific and technical grounds. He also observed that digital video and audio forensics is a very dynamic field in which competent experts are hard to find.
The scheme for notifying the Examiner of Electronic Evidence also requires laboratories to submit a certain list of documents to MeitY annually. The list includes the “Number of cases referred to by the prosecuting agency/court (details along with case title), Number of cases handled and reports filed before the court, Number of times examiner appeared before the court as an expert (give case title), Observation passed by the Courts”. The author requested information on all the above parameters from MeitY through the Right to Information (RTI) [40] (Appendix A) for the purpose of analysing the cases, if any, in which an examiner’s opinion on video and audio forensics was considered relevant. However, the appeal was disposed of on the ground that the information could not be shared with the applicant because a fiduciary relationship exists and the information is confidential. This reasoning makes no sense, as a list of cases in which expert opinion was provided cannot in any way be termed ‘confidential’.
Sections 4.1, 4.2, and 4.3 show that a certain void exists in the Indian Evidence Act and the Information Technology Act in terms of the admissibility and relevancy of the expert opinion of the examiner of electronic evidence. The Indian legislature must address the challenges of the admission of fabricated deepfake evidence produced in the courtroom through the upcoming Bharatiya Sakshya Bill and Digital India Bill.

5 Suggestions and Conclusion

Firstly, the Bharatiya Sakshya Bill must recognise that there are procedural roadblocks to the presumption of secure electronic evidence under Sect. 85B of the Indian Evidence Act. Secondly, with the advancement of AI technology, the P.V. Anvar judgment appears technically infeasible, as the requirements for admissibility of evidence under Sect. 65B of the Indian Evidence Act cannot be met in the case of deepfake evidence. Lastly, the Central Government must notify a competent Forensic Science Laboratory that can technically determine the authenticity of deepfake evidence produced anywhere in India, taking help if needed from software companies in the field, so that its expert opinion can be trusted in a court of law.
The Supreme Court has often acknowledged that electronic records are more prone to manipulation, alteration, transposition, excision, and other tampering, and that, without safeguards, a whole trial based on proof of electronic records can lead to a travesty of justice.
Electronic Evidence produced through deepfake technology poses an even bigger
threat to the already established legislative safeguards and potentially challenges the
justice delivery system to the core. Examples cited by Homeland Security [41] of
the United States are proof of the fact that deepfake issues are to be taken care of
seriously and they deserve the due attention from the legislators.
Indian law lags behind in tackling the potential harms associated with the emerging technological capabilities of AI that produce deepfakes. Law is necessary to address the difficulties brought about by new social developments, and the concept and application of law must evolve together with society if it is to remain relevant.
The Bharatiya Sakshya Bill, 2023, recently introduced in the Parliament of India, failed to amend or evolve the law relating to electronic evidence in light of the changing contours of society. The proposed Digital India Bill, a draft of which has long been pending, hints at replacing the 22-year-old Information Technology Act; it acknowledges the worries related to the challenges posed by deepfake technology but provides no solid solution yet.
The Digital India Bill provides hope that the legislature will recognise that AI-assisted technologies must be treated differently from age-old ways of fabricating information. This is the right time for the Bharatiya Sakshya Bill, the Bharatiya Nyaya Sanhita, and the Bharatiya Nagarik Suraksha Sanhita to be integrated with the Digital India Bill so as to define the word ‘deepfake’ and to contain a separate chapter on the harms associated with deepfakes.
India has not yet witnessed a case in which a deepfake video or audio was presented before a court; alternatively, such evidence may already have been produced in a court of law at the district level and caught no one’s attention owing to unawareness of the subject matter, because ‘seeing is believing’.

Appendix A

References

1. Pfefferkorn R (2020) “Deepfakes” in the courtroom. Public Interest Law J 245–275
2. Grimm PW, Grossman MR, Cormack GV (2021) Artificial intelligence as evidence. Northwestern J Technol Intell Property 10–105
3. Cover R (2022) Deepfake culture: the emergence of audio-video deception as an object of social anxiety and regulation. J Media Cultural Studies 4–12
4. Gaur L (2023) DeepFakes: creation, detection, and impact. Taylor & Francis
5. Business Insider India. https://www.businessinsider.in/tech/a-video-that-appeared-to-show-
obama-calling-trump-a-dipsh-t-is-a-warning-about-a-disturbing-new-trend-called-deepfakes/
articleshow/63807263.cms. Last accessed 02 Sept 2023
6. The Guardian. https://www.theguardian.com/technology/ng-interactive/2019/jun/22/the-rise-
of-the-deepfake-and-the-threat-to-democracy. Last accessed 02 Sept 2023
7. Forbes. https://www.forbes.com/sites/charlestowersclark/2019/05/31/mona-lisa-and-nancy-
pelosi-the-implications-of-deepfakes/?sh=5e46695e4357. Last accessed 02 Sept 2023
8. MIT Technology review. https://www.technologyreview.com/2020/09/29/1009098/ai-dee
pfake-putin-kim-jong-un-us-election/. Last accessed 02 Sept 2023
9. NPR. https://www.npr.org/2022/03/16/1087062648/deepfake-video-zelenskyy-experts-war-
manipulation-ukraine-russia. Last accessed 02 Sept 2023
10. SBS News. https://www.sbs.com.au/news/a-gay-sex-tape-is-threatening-to-end-the-political-careers-of-two-men-in-malaysia. Last accessed 02 Sept 2023
11. Australian Broadcasting Corporation News. https://www.abc.net.au/news/2021-06-24/tom-cru
ise-deepfake-chris-ume-security-washington-dc/100234772. Last accessed 02 Sept 2023
12. The Independent. https://www.independent.co.uk/tech/mark-zuckerberg-deepfake-ai-meta-
b2236388.html. Last accessed 02 Sept 2023
13. CNET. https://www.cnet.com/science/mit-releases-deepfake-video-of-nixon-announcing-
nasa-apollo-11-disaster/. Last accessed 02 Sept 2023
14. CNN. https://edition.cnn.com/2020/12/25/uk/deepfake-queen-speech-christmas-intl-gbr/
index.html. Last accessed 02 Sept 2023
15. MIT Technology review. https://www.technologyreview.com/2020/02/19/868173/an-indian-
politician-is-using-deepfakes-to-try-and-win-voters/. Last accessed 02 Sept 2023
16. Delfino RA (2022) Deepfakes on trial: a call to expand the trial judge’s gatekeeping role to protect legal proceedings from technological fakery. Loyola Law School, 1–6
17. Congress.gov. https://www.congress.gov/bill/117thcongress/house-bill/2395/text. Last
accessed 24 Aug 2023
18. The New York Times. https://www.nytimes.com/2019/11/24/technology/tech-companies-dee
pfakes.html. Last accessed 24 Aug 2023
19. Pfefferkorn R (2020) “Deepfakes” in the courtroom. Public Interest Law J 245–250
20. Intel Newsroom. https://www.intel.com/content/www/us/en/newsroom/news/intel-introduces-
real-time-deepfake-detector.html#gs.notfp6. Last accessed 13 Sept 2023
21. The Telegraph. https://www.telegraph.co.uk/news/2020/01/31/deepfake-audio-used-custody-
battle-lawyer-reveals-doctored-evidence/. Last accessed 13 Sept 2023
22. BBC News. https://www.bbc.com/news/technology-56404038. Last accessed 13 Sept 2023
23. The Washington Post. https://www.washingtonpost.com/technology/2021/05/14/deepfake-
cheer-mom-claims-dropped/. Last accessed 13 Sept 2023
24. In re Woori Bank, 2021 WL 2645812, p. *1–2 (N.D. Cal. 2021) (plaintiff sought discovery
from social media platform to support his defamation action based on claim that a “deepfake”
image of the plaintiff engaging in an improper intimate act had been posted on a social media
platform)
25. Hohsfield v. Staffieri, 2021 WL 5086367, p. *1 (N.J. 2021) (plaintiff brought a 42 USC 1983
action against police officers, claiming that they created a deepfake photo of him engaging in
a lewd act to frame him and justify his arrest)
26. Schaffer v. Shinn, 2021 WL 6101435, p.*7 (Ariz. 2021) (defendant attacked sufficiency of
the evidence supporting sentencing enhancement arguing that the pornograph image was a
deepfake)
27. People v. Smith, __ N.W.2d __, 2021 WL 641725, p* (Mich. 2021) (defendant challenged the
admission of Facebook posts belong to others which purportedly included his image and gang
moniker, suggesting that they were fake)
28. Tukaram S. Dighole v Manikrao Shivaji Kokate [(2010) 4 SCC 329]
29. Dubey V (2017) Admissibility of electronic evidence: an Indian perspective. Foren Res Criminol Int J 58
30. Forbes. https://www.forbes.com/sites/kalevleetaru/2018/09/09/why-digital-signatures-wont-
prevent-deep-fakes-but-will-help-repressive-governments/?sh=23f94e835295. Last accessed
13 Sept 2023
31. PCMagUK. https://uk.pcmag.com/opinion/121370/can-anything-protect-us-from-deepfakes.
Last accessed 13 Sept 2023
32. Vinod V (2020) Snag of electronic evidence. Ram Manohar Lohia National Law Univ J 166
33. Gerstner CR, Farid H (2022) Detecting real-time deep-fake videos using active illumination. In: IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW) 2022, pp 53–60, University of California, Berkeley
34. AIR 2016 Rajasthan 153
35. [(2014) 10 SCC 473]

36. Government of India, Ministry of Electronics & Information Technology (MeitY), Scheme
for Notifying Examiner of Electronic Evidence under Section 79A of Information Technology
Act 2000, https://www.meity.gov.in/writereaddata/files/annexure-i-pilot-scheme-for-notify
ing-examiner-of-electronic-evidence-under-section-79a-of-the-information-technology-act-
2000.pdf. Last accessed 13 Sept 2023
37. Ministry of Electronics & Information Technology Notification, https://www.meity.gov.in/wri
tereaddata/files/12.eGazetteeNotification_FSL%20Rohini_Delhi.pdf. Last accessed 13 Sept
2023
38. Notification of Forensic labs as ‘Examiner of Electronic Evidence’ under Section 79A of
the Information Technology Act 2000. https://www.meity.gov.in/notification-forensic-labs-
‘examiner-electronic-evidence’-under-section-79a-information-technology. Last accessed 13
Sept 2023
39. Dr. Nilay Mistry, Associate Professor, School of Cyber Security & Digital Forensics, National
Forensic Science University, Gandhinagar, India
40. RTI Application No. DITEC/A/E/23/00017
41. Homeland Security, Increasing Threat of Deepfake Identities, https://www.dhs.gov/sites/def
ault/files/publications/increasing_threats_of_deepfake_identities_0.pdf. Last accessed 13 Sept
2023
An In-Depth Exploration of Anomaly
Detection, Classification,
and Localization with Deep Learning:
A Comprehensive Overview

Kamred Udham Singh, Ankit Kumar, Gaurav Kumar, Teekam Singh, Tanupriya Choudhury, and Ketan Kotecha

Abstract The ability to identify patterns in data in which one subset deviates from the rest is a core task of data mining. Anomaly detection has made it possible to identify and prevent malware as well as several other unlawful practices. Traditional detection strategies have shown strong results; however, as deep learning has progressed, important new findings have emerged over the past few years. To summarize existing and state-of-the-art fraud and intrusion detection strategies, we organize them by the kind of neural network involved, from deep to shallow. This paper provides an analysis of the published techniques for anomaly detection, with emphasis on the contribution of deep learning to detection. Methods were sorted according to the kind of DNN included in this study.

K. U. Singh (B)
School of Computer Science, University of Petroleum and Energy Studies, (UPES),
Dehradun 248007, India
e-mail: [email protected]
A. Kumar · G. Kumar
Department of Computer Science & Engineering, Symbiosis Institute of Technology, Symbiosis
International University, Lavale Campus, Pune, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
Department of Computer Science & Engineering, Symbiosis Institute of Technology, Symbiosis
International University, Lavale Campus, Pune, India
e-mail: [email protected]; [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 115
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_10

These classes helped us to categorize the deep learners by how often they have been used, both for data representation and for differentiating between various types of anomalies. In addition, deep neural networks applied to specific anomaly detection tasks provide strong evidence of their effective implementation.

Keywords Deep learning based anomaly detection · Fraud detection · Intrusion detection · Deep learning

1 Introduction

Anomaly detection is widely applied to both computer and financial security; an illustration of the latter is the analysis of financial and banking records, in which unconventional pattern recognition enables unexpected trends to be identified. When unexpected activity occurs on a computer network, it is possible that sensitive data is being sent to an unauthorized location. Since various actions could be taken during the payment card data entry phase, there is a greater chance of fraud. Anomalies in a spacecraft’s internal equipment can indicate an error [1]. Changes in pixel intensity at unpredictable locations can show the existence of potentially deadly tumors. Any behavior that deviates from the expected is classified as deviant, and detecting it can hold new attacks and tumors at bay. Figure 1 shows some instances in which abnormalities are discovered. Commercial fraud identification, according to Ref. [1], covers banks, lending institutions, telecommunications, and capital exchange, among others. Some intrusion detection programs aim to track illegal behavior inside a data system, while others also target non-malicious activity in order to keep the system safe [2]. Outlier analysis can be done in two separate forms: misuse detection and anomaly detection [3]. Misuse detection is limited to known attacks and routines and is evaded by novel (or abnormal) actions [4]. Because the environment is volatile, we concentrate on the identification of deviations, since this accommodates the unpredictable existence of data shifts.
Anomalies have been defined in various ways [5]. The authors of Ref. [6] state that anomalies are observations that appear doubtful in the eyes of the analyst. Grubbs [7] defined an anomaly as a statistically aberrant observation that differs from the rest of the population. Heron [8] characterized an anomaly as being highly deviant or difficult to fit to a pattern, i.e., inconsistent with the rest of the results. According to Ref. [9], an exception is something that sticks out among the others and thereby becomes suspicious.
Reference [10] describes an exception as anything that is distinct from the rest of the points. All of these definitions rest on the same principle. Thus, we adopt the following working definition: a data set will often contain data that is out of the ordinary; in reality, there are just two kinds of data in a data set, ordinary and exceptional.

Fig. 1 Representation of anomaly detection through its fields of applications

Anomaly detection approaches can be characterized along three major axes: the intrinsic characteristics of the input data, the type of anomaly the method identifies, and the form of output that is yielded [11]. The input data may be one- or multidimensional and may represent many kinds of instances, such as objects, documents, points, or patterns [12]. Anomalies themselves may be point, contextual, or collective [13]. Detection methods can be categorized as supervised, semi-supervised, or unsupervised [14]. Detected abnormalities are returned in the form of scores or labels [15]. Finally, a wide variety of methodologies, such as statistical processing, machine learning, data science, and information theory, have been applied to the detection task [16].
Among machine learning techniques, deep learning has become very popular in the scientific community due to the very good results achieved in dissimilar topics such as image processing, face recognition, digit recognition, and email and text classification [17]. These results motivate the use of this method in our studies and experiments.
Algorithms based on the deep learning technique are motivated by the field of
artificial intelligence and try to emulate the cognitive ability of the human brain
[18]. These algorithms commonly make use of the data structure known as the
neural network [19], to which modifications have been made creating new types
of networks destined to work with different types of data or specific functionalities.
Among these new structures, we can mention: Autoencoders (AEs), Deep Neural
Networks (DNN) [20], Restricted Boltzmann Machines (RBM) [21], the Deep Belief
Networks (DBN) [22], the Convolutional Neural Networks (CNN) [23], and the
Recurrent Neural Networks (RNN) [24]. Although these structures are different,

they are all neural networks because they maintain the basic structure of neurons,
layers, and connections between neurons using linear and nonlinear activation functions. When many activation and representation layers are stacked [25], these networks can process complicated data at increasing levels of abstraction [26]. These networks may be used individually, but greater efficiency is often achieved when they are used in combination. One of the most commonly employed combinations is the GAN (generative adversarial network), which consists of a generator network and a discriminator network: the generator produces samples in the space of the training data that are difficult for the discriminator to classify as fake. This interactive relationship between the two networks achieves simultaneous optimization through a two-player minimax game.
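A common thread among anomaly detectors built from these structures is to train a network to reconstruct normal data only and to score new inputs by their reconstruction error. The following sketch is purely illustrative (NumPy only, a linear single-hidden-unit autoencoder on synthetic 2-D data; all values and names are invented for the example), not an implementation from any of the surveyed papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data: 2-D points scattered tightly around the line y = x.
t = rng.normal(0.0, 1.0, size=(500, 1))
normal = np.hstack([t, t]) + rng.normal(0.0, 0.05, size=(500, 2))

# A linear autoencoder with one hidden unit: encode 2-D -> 1-D -> decode to 2-D.
W_enc = rng.normal(0.0, 0.1, size=(2, 1))
W_dec = rng.normal(0.0, 0.1, size=(1, 2))
lr = 0.01

for _ in range(2000):
    h = normal @ W_enc        # encode
    err = h @ W_dec - normal  # reconstruction error
    # Gradient descent on the mean squared reconstruction error.
    grad_dec = h.T @ err / len(normal)
    grad_enc = normal.T @ (err @ W_dec.T) / len(normal)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def score(x):
    """Anomaly score: squared reconstruction error per point."""
    recon = (x @ W_enc) @ W_dec
    return np.sum((recon - x) ** 2, axis=1)

# Threshold chosen from the scores of the normal training data.
threshold = np.percentile(score(normal), 99)

test = np.array([[1.0, 1.0],     # lies on the normal manifold
                 [1.0, -1.0]])   # lies far off it
flags = score(test) > threshold  # only the second point is flagged
```

The deep variants surveyed below replace the linear maps with stacked nonlinear layers, but the scoring principle (reconstruction error compared against a threshold learned from normal data) is the same.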
The purpose of this document is to survey deep learning methods that could be useful for anomaly detection, in particular novel approaches to the prevention and detection of fraud and malicious software, reflecting the current state of the art of our understanding of the topic.

2 Materials and Methods

This section reviews the most recent work related to anomaly detection, specifically fraud and intrusion detection based on deep learning techniques [43–48]. It begins with a brief explanation of how anomaly detection methods operate. Feature extraction is the starting point of the processing flow in anomaly detection methods: the data is represented in a manner that allows the algorithm to distinguish between usual and irregular situations, and a model trained on that representation is then used to classify new cases.
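The flow just described — extract features, fit a model of usual behaviour, then classify new cases as usual or irregular — can be made concrete with a minimal statistical baseline. This is a toy sketch, not one of the surveyed deep methods; the feature names and the 3-sigma rule are invented for illustration:

```python
import numpy as np

def extract_features(transactions):
    """Toy feature extraction: amount and hour-of-day for each transaction."""
    return np.array([[t["amount"], t["hour"]] for t in transactions], dtype=float)

def fit(features):
    """'Training': record the per-feature mean and spread of normal behaviour."""
    return features.mean(axis=0), features.std(axis=0)

def is_anomalous(x, mean, std, k=3.0):
    """Flag a case in which any feature lies more than k std-devs from the mean."""
    return bool(np.any(np.abs((x - mean) / std) > k))

rng = np.random.default_rng(1)
history = [{"amount": a, "hour": h}
           for a, h in zip(rng.normal(50, 10, 1000), rng.integers(8, 20, 1000))]
mean, std = fit(extract_features(history))

typical = is_anomalous(np.array([55.0, 14.0]), mean, std)    # False
extreme = is_anomalous(np.array([5000.0, 3.0]), mean, std)   # True
```

Deep learning methods replace both hand-crafted steps: the representation is learned from data, and the decision boundary is learned by the network.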
Fraud detection. According to the Association of Fraud Examiners, occupational fraud involves using one’s job and company property for personal gain by fraudulent means [27]. In addition, the Concise Oxford Dictionary describes fraud as criminal deception. There are two ways to deter fraud: prevent it before it occurs, or detect it once it has occurred [28]. Fraud prevention seeks to stop fraud before it happens, whereas fraud detection seeks to identify fraud as soon as possible after it has been committed [29]. Credit card fraud, cell phone fraud, insurance fraud, and securities fraud have been widely researched [30] (Fig. 2).
AEs have been very useful for unsupervised fraud detection, which is why they have been used in several studies [31]. A method based on a cost-sensitive learning approach was proposed in which a type of AE known as Stacked Denoising Autoencoders (SDAE) is used [32] to identify fraudulent transactions in a financial fraud detection problem. In this work, a basic selection of instances is carried out in the feature extraction step, taking into account the number of non-null attributes of the transactions. In addition, the authors introduce a modification to the cost function of the SDAE in order to minimize the cost of misclassification. In this way, fraudulent transactions are identified effectively and efficiently. The authors of [33] proposed a method for detecting credit card fraud. This method consists of

Fig. 2 Flow of an anomaly detection method taking credit card transactions as data

classifying a bank transfer request in real time using an AE, which is trained to take
into account the information of transactions carried out previously. The authors of
[34] propose three methods for the detection of fraud in banking transactions using AEs. The first combines an AE for feature extraction with a traditional classifier; the other two are AE–AE and AE–SDAE combinations under the GAN strategy, in which the first network acts as the feature extractor and the second
as the classifier. In Ref. [35] a method is proposed that uses an AE in the feature
extraction step and follows a GAN strategy for fraud detection. In this work, an AE
is used to achieve a representation of non-malicious users taking into account their
activity online. They then generate another fictitious representation of non-malicious
users using a DNN that is used as the generating network for the GAN. Finally, using
another DNN (known as the GAN discriminator), the method learns to identify real non-malicious users. In this way, by processing the actual data, the method is able to
separate non-malicious users from the rest. Another work built a testing platform for payment card abuse, adding a new feature based on entropy gain over time to the analysis. The authors generated a feature matrix from seven classical features, from which this entropy is derived. Because the data is highly imbalanced, resampling (repopulating the minority-class transactions) is applied; this reweighting aims to avoid over-training the network towards only one class. The resulting feature matrices are used as input to a CNN, which classifies transactions as abnormal or normal.
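The entropy-over-time feature mentioned above can be illustrated with a small sketch. This is not the cited authors’ implementation; it simply computes the Shannon entropy of a card’s recent transaction categories, a quantity whose sudden rise can signal an unusual mix of activity (the category names are invented):

```python
import math
from collections import Counter

def shannon_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution of `events`."""
    counts = Counter(events)
    n = len(events)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_feature(history, window=5):
    """Entropy of the last `window` transaction categories for one card."""
    return shannon_entropy(history[-window:])

habitual = ["grocery", "grocery", "fuel", "grocery", "fuel"]
erratic = ["grocery", "jewelry", "electronics", "travel", "casino"]

low = entropy_feature(habitual)   # about 0.97 bits: repetitive behaviour
high = entropy_feature(erratic)   # log2(5) = 2.32 bits: maximally mixed
```

A classifier such as the CNN described above would consume this value alongside the other features rather than use it as a threshold on its own.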
Other networks, such as Restricted Boltzmann Machines (RBMs), have also been considered for detecting fraud [36]. A technique built in [37] uses an RBM to detect credit card fraud: by applying RBMs to the previous transaction history, it verifies bank transfers in real time. The authors of [38] carry out a comparative study between some traditional classification methods (multinomial logistic regression, multilayer perceptron, and support vector machine) and a method based on a DBN with an RBM. This work showed the superior efficiency of the RBM-based method for the classification of credit card fraud.
In [39], a framework was proposed for the detection of fraud in auto insurance
through a combination of a text mining technique based on LDA [40], categorical data

information and numeric data, as well as a DNN. In this framework, a word segmen-
tation technique is used for text processing, and an LDA model for the extraction of
topics from segmented texts. With these topics, categorical and numerical informa-
tion, the characteristics that are passed to the DNN are made so that it learns from
them. In this way, it is identified if an auto accident claim is fraudulent.

3 Intrusion Detection

Detecting interference with data and devices is a priority for computer security services [41]. Early identification enables intrusion detection systems to prevent further attacks from causing substantial harm [42]. Attacks take several forms. In a denial-of-service (DoS) attack, all services and the computer’s bandwidth are flooded with false requests so that no one can access network resources. Probe attacks scan for exploitable vulnerabilities. In a remote-to-local (R2L) attack, packets are sent by a remote user in order to gain local access and privileges on the target system. User-to-root (U2R) attacks start with a user account and advance to full device control.
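As a toy illustration of the flooding behaviour that characterizes DoS attacks (again, not a method from the surveyed papers), a sliding-window request-rate check can flag a source that floods a service with false requests; the limit and window values are arbitrary:

```python
from collections import deque

class RateMonitor:
    """Flag a source exceeding `limit` requests within a `window` of seconds."""

    def __init__(self, limit=100, window=1.0):
        self.limit, self.window = limit, window
        self.times = {}  # per-source timestamps of recent requests

    def request(self, source, now):
        """Record one request; return True if the source looks like a flood."""
        q = self.times.setdefault(source, deque())
        q.append(now)
        while q and now - q[0] > self.window:  # drop requests outside the window
            q.popleft()
        return len(q) > self.limit

mon = RateMonitor(limit=100, window=1.0)
# A normal client: 10 requests spread over one second, never flagged.
normal_hits = [mon.request("client", i / 10) for i in range(10)]
# A flooding source: 500 requests within the same second, flagged.
flood_hits = [mon.request("bot", i / 500) for i in range(500)]
```

Real detectors learn such thresholds, and far richer traffic features, from data, which is where the deep models discussed next come in.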
An intrusion detection and defence approach is proposed in [43], in which learned
encodings of the traffic are used to identify DoS-type assaults. It has been argued
that the decoder of a deep AE may be removed without affecting the quality of the
learned representation; AEs modified in this way were named non-symmetric deep
autoencoders (NDAEs). The final architecture chains two NDAEs, with the outputs
of one feeding into the other; after the data are represented through this chain, a
Random Forest classifier is employed to detect intrusions. AEs can also identify
U2R-type assaults [44]. In that study, SDAEs are utilized to represent the data with
minimal dimensionality: the SDAE chains three AEs, trained unsupervised and then
fine-tuned, and attacks are identified with a Softmax classifier. Similarly, in Ref. [45]
an SDAE reduces the data and a support vector machine classifies threats in network
traffic, as with the PU-IDS dataset [46]. A greedily trained deep AE is used in
Ref. [47]; training the layers greedily, one at a time, helps prevent overfitting and
local optima, so attacks can be categorized efficiently.
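The core NDAE idea, training an autoencoder and then discarding the decoder so the encoder alone serves as a feature extractor, can be sketched with a toy single-stage numpy implementation on random data. This is only an illustration of the mechanism under those assumptions; the cited work chains two such stages and feeds the result to a Random Forest:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 12))  # toy stand-in for preprocessed traffic features

def train_stage(X, n_hidden=4, lr=0.02, epochs=300):
    """Train one AE stage; only the encoder (W1, b1) is kept afterwards."""
    n = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encoder
        E = (H @ W2 + b2) - X           # linear decoder: reconstruction error
        dH = (E @ W2.T) * (1 - H**2)    # backprop through tanh
        W2 -= lr * H.T @ E / len(X);  b2 -= lr * E.mean(0)
        W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(0)
    return W1, b1

W1, b1 = train_stage(X)
Z = np.tanh(X @ W1 + b1)  # decoder discarded: low-dimensional representation
```

In the chained variant, `Z` would itself be fed to a second stage, and the final representation passed to a downstream classifier.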
DNNs may also be employed for intrusion prevention. To minimize threat propaga-
tion, such a network was tested in attack classification systems using DDoS (distributed
denial of service) methods [48]. The hidden layers employ an appropriate number of
neurons, a number determined by a swarm-based genetic algorithm, which enhances
network learning; a probabilistic neural network classifier then identifies each type
of network assault. In [49], a probabilistic classifier is employed at the end of a DBN
network to fine-tune the classification of the data, with the hidden- and output-layer
neurons set to double the quantity required in the previous study.
The authors of [50] used CNNs for intrusion detection. Combinations of CNNs with
sequential data models were tested to analyze and classify network assaults [51]:
CNN-RNN, CNN-LSTM, and CNN-GRU were utilized, and the best combination
was a CNN-LSTM whose CNN had three hidden layers.

An In-Depth Exploration of Anomaly Detection, Classification … 121

A CNN variation called dilated convolutional AEs (DCA), which combines stacked
autoencoders and CNNs, is proposed in [54]. That study uses convolution and
deconvolution to decompose the data, and dilated convolutional layers replace the
pooling layers of the network. Because this variant does not require labeled data for
training, a Softmax layer was used to fine-tune it for attack categorization.
Gated recurrent computation also appears in neural intrusion detection [55]. The
authors employ gated activation functions within the RNN to avoid the loss of
information when the gradient approaches zero, and present their findings with a
multi-layer perceptron combined with an RNN-LSTM variant; the attacks considered
fall into three categories, and the method detects DDoS, injection, and malware
attacks. Tang et al. [56] integrated an RNN with a sequential data modeling approach:
their GRU-RNN efficiently categorizes intrusion assaults in software-defined
networks. An RNN was also employed as a classifier without modification to identify
various intrusions. Finally, they combined an RNN network with a sequential data
model and an LSTM, training with a stochastic gradient descent optimizer and then
tuning with a Nadam optimizer.
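The gated recurrent unit (GRU) at the heart of such GRU-RNN classifiers can be written out directly: the update gate z interpolates between the old and candidate states, which is what limits the loss of information when gradients shrink toward zero. The dimensions and weights below are toy values, and the final state would feed a small classifier over attack classes:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 6, 8  # toy input and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One input weight and one recurrent weight per gate: update, reset, candidate
p = {k: rng.normal(scale=0.1, size=(d_in, d_h)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ("Uz", "Ur", "Uh")})

def gru_step(x, h, p):
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])             # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])             # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1 - z) * h + z * h_cand                    # gated interpolation

h = np.zeros(d_h)
for x in rng.normal(size=(20, d_in)):  # a 20-step toy feature sequence
    h = gru_step(x, h, p)
# h now summarizes the sequence and could feed a softmax over attack classes
```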

4 Results and Discussion

Deep learning has improved markedly, owing to its usefulness in forecasting for
network protection. Many implementations in pattern recognition and data mining
have taken advantage of this form of learning, which has increased the effectiveness
of anomaly detection techniques as well as of other activities. When choosing a
detection tool, the origin of the deviations must be considered, since it changes as a
process develops. Solving the detection problem demands a great deal of adaptability,
which deep neural networks provide; their different models have repeatedly shown
the capacity to find anomalies. A large number of these works have utilized neural
networks for dimensionality reduction and network fraud prevention, in addition to
other features.
On the other hand, the GAN strategy has been little used despite the good results
achieved with it, owing to the complexity involved in its implementation and training.
The use of various types of networks in combination has nevertheless been shown,
as where a deep AE and an SDAE are combined, or where an AE is combined with
a DNN to separate out non-malicious users.

5 Conclusions

Most of the articles in this study utilize deep autoencoders. With respect to data
classification, most of these AEs were employed for binary classification, while only
a few were used for rule creation; so far there is not enough evidence to justify
multi-class identification for this form of intrusion detection. It can therefore be
concluded that substantial advances are still possible in data mining, especially in
anomaly detection, through supervised and unsupervised learning. Although deep
neural networks continue to do well in mining tasks, it is extremely doubtful that
their use alone will have a dramatic impact on mining algorithms. No comparable
study of anomalous behavior in the context of data mining using deep learning was
found, so this survey serves as a starting point for further investigation of the problem.

References

1. Studiawan H, Sohel F, Payne C (2020) Anomaly detection in operating system logs with deep learning-based sentiment analysis. In: IEEE transactions on dependable and secure computing. https://doi.org/10.1109/TDSC.2020.3037903
2. Ahmed, Sajan KS, Srivastava A, Wu Y (2021) Anomaly detection, localization and classification using drifting synchrophasor data streams. In: IEEE transactions on smart grid. https://doi.org/10.1109/TSG.2021.3054375
3. Ahn H (2020) Deep learning based anomaly detection for a vehicle in swarm drone system. In: 2020 international conference on unmanned aircraft systems (ICUAS), Athens, Greece, 2020, pp 557–561. https://doi.org/10.1109/ICUAS48674.2020.9213880
4. Park H, Park D-H, Kim S-H (2020) Deep learning-based method for detecting anomalies of operating equipment dynamically in livestock farms. In: 2020 international conference on information and communication technology convergence (ICTC), Jeju, Korea (South), pp 1182–1185. https://doi.org/10.1109/ICTC49870.2020.9289351
5. Naseer S et al (2018) Enhanced network anomaly detection based on deep neural networks. IEEE Access 6:48231–48246. https://doi.org/10.1109/ACCESS.2018.2863036
6. Garg S, Kaur K, Kumar N, Rodrigues JJPC (2019) Hybrid deep-learning-based anomaly detection scheme for suspicious flow detection in SDN: a social multimedia perspective. IEEE Trans Multimedia 21(3):566–578. https://doi.org/10.1109/TMM.2019.2893549
7. Munir M, Chattha MA, Dengel A, Ahmed S (2019) A comparative analysis of traditional and deep learning-based anomaly detection methods for streaming data. In: 2019 18th IEEE international conference on machine learning and applications (ICMLA), Boca Raton, FL, USA, 2019, pp 561–566. https://doi.org/10.1109/ICMLA.2019.00105
8. Qian K, Jiang J, Ding Y, Yang S (2020) Deep learning based anomaly detection in water distribution systems. In: 2020 IEEE international conference on networking, sensing and control (ICNSC), Nanjing, China, 2020, pp 1–6. https://doi.org/10.1109/ICNSC48988.2020.9238099
9. Zhang G, Qiu X, Gao Y (2019) Software defined security architecture with deep learning-based network anomaly detection module. In: 2019 IEEE 11th international conference on communication software and networks (ICCSN), Chongqing, China, 2019, pp 784–788. https://doi.org/10.1109/ICCSN.2019.8905304
10. Dong Y, Wang R, He J (2019) Real-time network intrusion detection system based on deep learning. In: 2019 IEEE 10th international conference on software engineering and service science (ICSESS), Beijing, China, 2019, pp 1–4. https://doi.org/10.1109/ICSESS47205.2019.9040718
11. Kavousi-Fard A, Dabbaghjamanesh M, Jin T, Su W, Roustaei M (2020) An evolutionary deep learning-based anomaly detection model for securing vehicles. In: IEEE transactions on intelligent transportation systems. https://doi.org/10.1109/TITS.2020.3015143
12. Garg S, Kaur K, Kumar N, Kaddoum G, Zomaya AY, Ranjan R (2019) A hybrid deep learning-based model for anomaly detection in cloud datacenter networks. IEEE Trans Netw Serv Manage 16(3):924–935. https://doi.org/10.1109/TNSM.2019.2927886
13. Fernández Maimó L, Perales Gómez ÁL, García Clemente FJ, Gil Pérez M, Martínez Pérez G (2018) A self-adaptive deep learning-based system for anomaly detection in 5G networks. IEEE Access 6:7700–7712. https://doi.org/10.1109/ACCESS.2018.2803446
14. Li X, Chen P, Jing L, He Z, Yu G (2020) SwissLog: robust and unified deep learning based log anomaly detection for diverse faults. In: 2020 IEEE 31st international symposium on software reliability engineering (ISSRE), Coimbra, Portugal, 2020, pp 92–103. https://doi.org/10.1109/ISSRE5003.2020.00018
15. Alrawashdeh K, Purdy C (2018) Fast activation function approach for deep learning based online anomaly intrusion detection. In: 2018 IEEE 4th international conference on big data security on cloud (BigDataSecurity), IEEE international conference on high performance and smart computing (HPSC) and IEEE international conference on intelligent data and security (IDS), Omaha, NE, USA, 2018, pp 5–13. https://doi.org/10.1109/BDS/HPSC/IDS18.2018.00016
16. Dong L, Zhang Y, Wen C, Wu H (2016) Camera anomaly detection based on morphological analysis and deep learning. In: 2016 IEEE international conference on digital signal processing (DSP), Beijing, China, 2016, pp 266–270. https://doi.org/10.1109/ICDSP.2016.7868559
17. Lee W-Y, Wang Y-CF (2020) Learning disentangled feature representations for anomaly detection. In: 2020 IEEE international conference on image processing (ICIP), Abu Dhabi, United Arab Emirates, 2020, pp 2156–2160. https://doi.org/10.1109/ICIP40778.2020.9191201
18. Manimurugan S, Al-Mutairi S, Aborokbah MM, Chilamkurti N, Ganesan S, Patan R (2020) Effective attack detection in internet of medical things smart environment using a deep belief neural network. IEEE Access 8:77396–77404. https://doi.org/10.1109/ACCESS.2020.2986013
19. Fernández GC, Xu S (2019) A case study on using deep learning for network intrusion detection. In: MILCOM 2019—2019 IEEE military communications conference (MILCOM), Norfolk, VA, USA, 2019, pp 1–6. https://doi.org/10.1109/MILCOM47813.2019.9020824
20. Lin M, Zhao B, Xin Q (2020) ERID: a deep learning-based approach towards efficient real-time intrusion detection for IoT. In: 2020 IEEE eighth international conference on communications and networking (ComNet), Hammamet, Tunisia, pp 1–7. https://doi.org/10.1109/ComNet47917.2020.9306110
21. Haselmann M, Gruber DP, Tabatabai P (2018) Anomaly detection using deep learning based image completion. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, FL, USA, pp 1237–1242. https://doi.org/10.1109/ICMLA.2018.00201
22. Malaiya RK, Kwon D, Suh SC, Kim H, Kim I, Kim J (2019) An empirical evaluation of deep learning for network anomaly detection. IEEE Access 7:140806–140817. https://doi.org/10.1109/ACCESS.2019.2943249
23. Haider S, Akhunzada A, Ahmed G, Raza M (2019) Deep learning based ensemble convolutional neural network solution for distributed denial of service detection in SDNs. In: 2019 UK/China emerging technologies (UCET), Glasgow, UK, 2019, pp 1–4. https://doi.org/10.1109/UCET.2019.8881856
24. Miau S, Hung W-H (2020) River flooding forecasting and anomaly detection based on deep learning. IEEE Access 8:198384–198402. https://doi.org/10.1109/ACCESS.2020.3034875
25. Potluri S, Diedrich C (2019) Deep learning based efficient anomaly detection for securing process control systems against injection attacks. In: 2019 IEEE 15th international conference on automation science and engineering (CASE), Vancouver, BC, Canada, 2019, pp 854–860. https://doi.org/10.1109/COASE.2019.8843140
26. Abeyrathna D, Huang P, Zhong X (2019) Anomaly proposal-based fire detection for cyber-physical systems. In: 2019 international conference on computational science and computational intelligence (CSCI), Las Vegas, NV, USA, 2019, pp 1203–1207. https://doi.org/10.1109/CSCI49370.2019.00226
27. Ma N, Peng Y, Wang S, Liu D (2018) Hyperspectral image anomaly targets detection with online deep learning. In: 2018 IEEE international instrumentation and measurement technology conference (I2MTC), Houston, TX, USA, 2018, pp 1–6. https://doi.org/10.1109/I2MTC.2018.8409615
28. Ding K, Ding S, Morozov A, Fabarisov T, Janschek K (2019) On-line error detection and mitigation for time-series data of cyber-physical systems using deep learning based methods. In: 2019 15th European dependable computing conference (EDCC), Naples, Italy, 2019, pp 7–14. https://doi.org/10.1109/EDCC.2019.00015
29. Ma X, Shi W (2020) AESMOTE: adversarial reinforcement learning with SMOTE for anomaly detection. In: IEEE transactions on network science and engineering. https://doi.org/10.1109/TNSE.2020.3004312
30. Maggipinto M, Beghi A, Susto GA (2019) A deep learning-based approach to anomaly detection with 2-dimensional data in manufacturing. In: 2019 IEEE 17th international conference on industrial informatics (INDIN), Helsinki, Finland, 2019, pp 187–192. https://doi.org/10.1109/INDIN41052.2019.8972027
31. Fang X et al (2020) Sewer pipeline fault identification using anomaly detection algorithms on video sequences. IEEE Access 8:39574–39586. https://doi.org/10.1109/ACCESS.2020.2975887
32. Aygün RC, Yavuz AG (2017) A stochastic data discrimination based autoencoder approach for network anomaly detection. In: 2017 25th signal processing and communications applications conference (SIU), Antalya, Turkey, 2017, pp 1–4. https://doi.org/10.1109/SIU.2017.7960410
33. Hussain B, Du Q, Ren P (2018) Deep learning-based big data-assisted anomaly detection in cellular networks. In: 2018 IEEE global communications conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 2018, pp 1–6. https://doi.org/10.1109/GLOCOM.2018.8647366
34. Marsiano FD, Soesanti I, Ardiyanto I (2019) Deep learning-based anomaly detection on surveillance videos: recent advances. In: 2019 international conference of advanced informatics: concepts, theory and applications (ICAICTA), Yogyakarta, Indonesia, 2019, pp 1–6. https://doi.org/10.1109/ICAICTA.2019.8904395
35. Togo R, Saito N, Ogawa T, Haseyama M (2019) Estimating regions of deterioration in electron microscope images of rubber materials via a transfer learning-based anomaly detection model. IEEE Access 7:162395–162404. https://doi.org/10.1109/ACCESS.2019.2950972
36. Nie L, Zhao L, Li K (2020) Glad: global and local anomaly detection. In: 2020 IEEE international conference on multimedia and expo (ICME), London, UK, pp 1–6. https://doi.org/10.1109/ICME46284.2020.9102818
37. Miller J, Wang Y, Kesidis G (2018) Anomaly detection of attacks (ADA) on DNN classifiers at test time. In: 2018 IEEE 28th international workshop on machine learning for signal processing (MLSP), Aalborg, Denmark, 2018, pp 1–6. https://doi.org/10.1109/MLSP.2018.8517069
38. Perera P, Patel VM (2019) Learning deep features for one-class classification. IEEE Trans Image Process 28(11):5450–5463. https://doi.org/10.1109/TIP.2019.2917862
39. Salama R, Al-Turjman F, Bordoloi D, Yadav SP (2023) Wireless sensor networks and green networking for 6G communication—an overview. In: 2023 international conference on computational intelligence, communication technology and networking (CICTN), Ghaziabad, India, 2023, pp 830–834. https://doi.org/10.1109/CICTN57981.2023.10141262
40. Aygun RC, Yavuz AG (2017) Network anomaly detection with stochastically improved autoencoder based models. In: 2017 IEEE 4th international conference on cyber security and cloud computing (CSCloud), New York, NY, USA, 2017, pp 193–198. https://doi.org/10.1109/CSCloud.2017.39
41. Masood U, Asghar A, Imran A, Mian AN (2018) Deep learning based detection of sleeping cells in next generation cellular networks. In: 2018 IEEE global communications conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 2018, pp 206–212. https://doi.org/10.1109/GLOCOM.2018.8647689
42. Qin Y, Wei J, Yang W (2019) Deep learning based anomaly detection scheme in software-defined networking. In: 2019 20th Asia-Pacific network operations and management symposium (APNOMS), Matsue, Japan, 2019, pp 1–4. https://doi.org/10.23919/APNOMS.2019.8892873
43. Mishra M, Sarkar T, Choudhury T et al (2022) Allergen30: detecting food items with possible allergens using deep learning-based computer vision. Food Anal Methods 15:3045–3078. https://doi.org/10.1007/s12161-022-02353-9
44. Sayyad S, Kumar S, Bongale A, Kotecha K, Abraham A (2023) Remaining useful-life prediction of the milling cutting tool using time–frequency-based features and deep learning models. Sensors 23:5659. https://doi.org/10.3390/s23125659
45. Choudhury T, Anggarwal A, Tomar R (2020) A deep learning approach to helmet detection for road safety. J Sci Ind Res (India) 79(June):509–512
46. Rajendran A et al (2022) Detecting extremism on Twitter during U.S. Capitol Riot using deep learning techniques. IEEE Access 10:133052–133077. https://doi.org/10.1109/ACCESS.2022.3227962
47. Natarajan B et al (2022) Development of an end-to-end deep learning framework for sign language recognition, translation, and video generation. IEEE Access 10:104358–104374. https://doi.org/10.1109/ACCESS.2022.3210543
48. Khanna A, Sah A, Choudhury T (2020) Intelligent mobile edge computing: a deep learning based approach. In: Singh M, Gupta P, Tyagi V, Flusser J, Ören T, Valentino G (eds) Advances in computing and data sciences. ICACDS 2020. Communications in computer and information science, vol 1244. Springer, Singapore. https://doi.org/10.1007/978-981-15-6634-9_11
Comparative Analysis of Docker Image
Files Across Various Programming
Environments

Kamred Udham Singh, Ankit Kumar, Gaurav Kumar, Teekam Singh,
Tanupriya Choudhury, and Ketan Kotecha

Abstract Docker technology plays an essential part in improving the process of
software development while also efficiently solving the issues related to deployment
and hosting. Docker is able to work independently of the underlying hardware infras-
tructure since it uses operating-system-level virtualisation as its mode of operation,
making use of its own collection of code to carry out the implementations of applications.
The concept of enclosing the whole of an application’s environment together with
its source code is fundamental to Docker’s approach. Docker successfully avoids the
problems that are often associated with having a variety of working environments
by using this method. This method, on the other hand, might result in a larger file
size since it includes the whole environment in its representation. Within the Docker
environment, these data files are referred to as “Images,” and the considerable size
of these images may be affected by a wide variety of variables. In the course of this
study, a number of tests were carried out, each of which included the use of Docker to

K. U. Singh (B)
School of Computing, Graphic Era Hill University, Dehradun, India
e-mail: [email protected]
A. Kumar · G. Kumar
Department of Computer Engineering &, Applications GLA University, Mathura, UP, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi,
Dehradun, Uttarakhand 248007, India
e-mail: [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 127
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_11
128 K. U. Singh et al.

run the same lines of code on a variety of different platforms. The major aim was to
improve the feasibility of conducting a thorough comparison study. Software engi-
neers were given the ability to evaluate and compare results by exploiting Docker’s
capabilities, which finally enabled them to arrive at well-informed conclusions about
the most appropriate course of action. The software development community is able
to get useful insights into the benefits and considerations of using Docker on multiple
platforms as a result of this research, which paves the way for a journey that is more
streamlined in terms of both development and deployment.

Keywords Dockers · Cloud programming · Dockers image · Comparative analysis

1 Introduction of Dockers

1.1 Docker

Docker is a platform that is both open and free to use, and it was created with the
intention of simplifying the whole of the software application lifecycle. This includes
the phases of development, distribution, and operation. Docker's features let you
compartmentalise your apps separately from their underlying infrastructure [1, 2].
This separation enables quick and efficient deployment of applications.
ushers in a new paradigm in which the administration of your infrastructure should
reflect the strategy you use for the management of your apps. Utilising Docker’s
optimised procedures, which include quick code delivery, testing, and deployment,
results in a considerable decrease in the amount of time that elapses between the
production of new code and its actual application in the real world. Containers are the
fundamental building block of Docker’s architecture [3]. Containers are standardised
file formats that include software that has been pre-packaged and all of the essen-
tial dependencies that have been painstakingly assembled for smooth application
execution. Notably, containers keep the execution contexts of various programmes
separate while still allowing them to share essential components of the operating
system, which is a significant benefit of using containers [4, 5]. These containers,
typically measured in megabytes, use fewer resources than conventional virtual
machines (VMs) and start much faster. Because of their efficiency, they may be
packed tightly onto the same hardware and can be collectively started or terminated
with a minimum amount of work and overhead [6, 7].
Building software components into contemporary application and service stacks is
made much easier with the foundation that containers offer. These stacks are crucial
in the modern commercial world. Additionally, they simplify the process of regularly
updating and maintaining the system with a high level of granularity. Both Tianshuo
Yang (2019) and Yao (2018) agree that Docker is a helpful tool for creating and
deploying software, further evidence of the platform's value.

1.2 Docker File

The Dockerfile is a crucial part of Docker, a container management system, and
serves as its basis. A Dockerfile is a text document containing the instructions needed
to build a Docker image, a fundamental component of the container ecosystem [8].
Written in a human-readable syntax, the Dockerfile is executed to generate the image.
In its most basic form, a Dockerfile serves as a blueprint: it describes the base
operating system upon which the container is built, and extends to the definition of
programming languages, environment variables, file paths, network port settings,
and a range of other critical components that together form the context of the
container [9, 10]. Beyond these technical qualities, a Dockerfile also conveys the
role that the resulting container will perform once it is brought into use. This
viewpoint, presented by David Jaramillo in 2016, highlights the important part
Dockerfiles play in conducting the complex symphony of Docker containers: they
provide a thorough guide for creating the component parts of containerised
applications.
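As an illustration, a minimal Dockerfile for a small Python service might read as follows; the base image, file names, and port are assumptions for this sketch, not examples taken from the cited works:

```dockerfile
# Base operating-system/runtime layer
FROM python:3.11-slim
# Environment variable and working directory inside the container
ENV APP_ENV=production
WORKDIR /app
# Application files and dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
# Network port the container will listen on, and its role when started
EXPOSE 8000
CMD ["python", "app.py"]
```

Each instruction maps onto one of the elements listed above: the base system (FROM), environment variables (ENV), file paths (WORKDIR, COPY), network ports (EXPOSE), and the role the container performs when brought into use (CMD).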

1.3 Docker Image

After you have carefully created your Dockerfile, the next step is to use the docker
build tool to materialise an image from the blueprint defined inside it [11]. The
resulting Docker image is a self-contained entity that houses the specifications
governing the software components containers will run, and dictates how these
components work together harmoniously; the Dockerfile serves only as the roadmap
instructing the build process on how to piece the image together. Docker images act
as portable files, enabling application environments to be moved across different
types of environments without any difficulty [12]. Because the Dockerfile often
contains instructions to fetch certain software packages from online repositories,
great attention must be paid to specifying the correct versions; ignoring this can
result in unintended differences in the images that are produced, depending on the
time at which the Docker build process is invoked.
Once created, an image undergoes no further transformation. This feature
emphasises the static nature of Docker images [13]. The dynamic journey started by
the Dockerfile culminates in a static but powerful artefact that encapsulates the core
of the application's design as well as its requirements, ready to be instantiated into
containers for efficient deployment and operation [14].
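Concretely, the step from Dockerfile to image, and from image to container, is a short sequence of standard commands; the tag name here is a placeholder:

```shell
# Build an image from the Dockerfile in the current directory
docker build -t myapp:1.0 .

# List the resulting image (ID, tag, size); rebuilding later can yield a
# different image if package versions in the Dockerfile are not pinned
docker images myapp

# Instantiate the immutable image as a running container
docker run myapp:1.0
```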

2 Experiment

In this part of the article, we will begin an investigation into the performance of
Docker by digging into several instances in which different technologies provide
different results for the same desired purpose. In the Docker ecosystem, the procedure
begins with the creation of an image file, which is then deposited into a repository.
When the process is moved to a different computer, the image file is obtained, which
prepares the way for the operation to be carried out. CPU utilisation, the size of
the image file, and the total number of lines of code are three of the many aspects
that come into play across the spectrum of technologies.
The complex dynamic between all of these factors is a contributor to the wide range
of possible outcomes. This research focuses on doing a comparative examination
of the programming languages Python and Java within the context of the Docker
environment in order to shed light on the relative strengths and weaknesses of each
language [15]. Our work is set to shed light on the complex performance differ-
ences, which are driven by the technology used, and give insights into how Docker
interacts with various languages. We want to achieve this goal by doing painstaking
research on facets such as computing efficiency, image size, and code complexity
in an effort to understand the complicated relationships that exist between Docker
and other technological options [16]. In the end, this examination should enhance
our knowledge of Docker’s flexibility as well as the subtleties that influence its inter-
action with a variety of technologies, which will highlight the intricacies of current
software development and deployment.

2.1 Docker Image File with Java

A methodical procedure is required for the generation of an image file for Java while
working inside the Docker framework. The process of creating an image begins
with the drafting of a Java program and the subsequent storing of that file with
the .java extension; this signals the beginning of the voyage. The resulting image file
contains an environment that allows the Java program to operate smoothly in its
entirety, together with all of the necessary components [17, 18]. The Java runtime
environment, crucial libraries, auxiliary files, and, of course, the Java source code
itself are all considered part of the environment; the word "environment" [28–32]
encompasses the whole constellation of these components. This conglomeration of
requirements and resources comes together to generate a unified image file in the
Java ecosystem, which is then ready for deployment in the Docker
ecosystem. A tangible example of this may be seen when a Java file is created, which
is often given a designation such as “f100.” This critical phase lays the groundwork
for the eventual generation of the image file, which will be referred to throughout
the process as the “f100” image [19]. The figure that accompanies this explanation
provides a visual representation of the process, capturing the spirit of the Java-
centric image development that occurs inside the Docker ecosystem. This technique,
in its most basic form, encompasses the transformation of a Java program into
an independent entity optimised for the containerisation offered by Docker.
The result is an enclosed Docker image, ready to run the Java program while
coordinating the necessary runtime components and dependencies (Fig. 1).
The visual representation provides a vivid insight into the image file named
"f100." This image file is distinctly identified by its image ID, 5166a3ba961b, and
has a substantial size of 514 MB. Notably, the image bears the default tag "latest,"
signifying its current iteration. The depiction offers a glimpse into the tangible
outcome of executing this image file. Execution is initiated with the command
"docker run [image ID]"; remarkably, merely the first four digits of the image ID
are needed to trigger the execution of the image file. This visual portrayal provides
an at-a-glance understanding of the intricate interplay of Docker images, tags, and
execution, rendering a comprehensive overview of the operational dynamics within
the Docker ecosystem [20].

Fig. 1 Access the docker image with Java support

The streamlined approach of running images, as well as the associated iden-
tification and execution processes, is elucidated, aiding both novices and seasoned
practitioners in navigating Docker’s intricacies with ease and precision.
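As a concrete illustration of this Java-centric image development, a Dockerfile for an image like "f100" might look as follows. This is a minimal sketch: the base image, file name, and class name are assumptions made for the example, not details taken from the paper.

```dockerfile
# Hypothetical Dockerfile for a Java image such as "f100".
# Base image, file name, and class name are illustrative assumptions.
FROM eclipse-temurin:17-jdk
WORKDIR /app
COPY F100.java .
RUN javac F100.java          # compile the source inside the image
CMD ["java", "F100"]         # run the program when the container starts
```

Building it with `docker build -t f100 .` yields an image tagged `f100:latest`, which is listed by `docker images` with its image ID and size; as described above, `docker run` accepts any unambiguous prefix of that ID, e.g. `docker run 5166`.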

2.2 Docker Image File with Python

Python includes its own self-contained environment, streamlined to ensure that
Python files can be executed without a hitch. In a separate effort aimed at achieving
the same results as those attained with Java, we develop a Python file customised
to produce identical output. Once the Python program is up and running, attention
turns to producing the corresponding image file [21, 22]. This image file is then
given a home in a designated repository. In alignment with the visual component of
the presentation, the image file relevant to the Python program takes the spotlight.
It carries a unique image ID, which acts as its distinguishing identifier, and a specific
size, a parameter significant for gauging its scale. It is important to note that the
default tag applied to this image file is "latest", highlighting the most recent rendition
available in the repository [23, 24]. The description brings to light the intrinsic
symmetry between the Java and Python image-production processes. The illustration
highlights the robust environment Python provides, optimised to run Python files
quickly, while also drawing attention to the crucial role image files play within the
Docker architecture. This visual narrative deepens our understanding of Docker's
adaptability in supporting a wide variety of programming languages and of the
relationships and opportunities inherent in today's software development and
deployment procedures (Fig. 2).

Fig. 2 Access the Docker image with image ID and tag


Comparative Analysis of Docker Image Files Across Various … 133

2.3 Comparative Results

Our investigation up to this point has brought to light two distinct image files: the
"f100" file, built for Java, and the "python" file, built for Python. Regardless of the
differences in their underlying programming, both image files produced the same
output, demonstrating an exciting convergence of results across different
programming environments. In our in-depth analysis we scrutinised a range of
important parameters, including CPU utilisation, image size, the number of lines of
code (LOC), and memory (runtime) utilisation. This analytical approach enables us
to gain insights into the performance of Python and Java within the context of
Docker [25]. Using the information collected, we now give a detailed comparison.
Factor                   Python    Java
CPU utilisation          20%       25%
Lines of code (LOC)      Less      More
Image size               855 MB    515 MB
Runtime utilisation      12%       8%

The insights obtained from this comparison reveal fascinating similarities and
differences between Python and Java inside the Docker environment. Python has a
smaller number of lines of code (LOC), which translates into more succinct scripts,
while Java often exhibits a larger LOC count. The image size, on the other hand,
reveals a fascinating paradox: despite Python's more succinct script style, Python's
image size is much greater than Java's. Python's CPU utilisation shows a modest
edge, while its runtime utilisation is somewhat higher than Java's.
In essence, this exhaustive comparison highlights the intricate interplay between
programming languages, image characteristics, and the environment Docker
provides [26, 27]. This realisation paves the way for informed decisions that align
particular technology choices with the needs of individual applications, in turn
propelling contemporary software development practices forward.

3 Conclusion

Using Docker technology enables the building of all-encompassing runtime
environments that are easily portable across different computer systems. During the
course of this endeavour, a striking realisation became apparent: the enormous
quantity of files produced as a consequence. This finding inevitably sparks a
penetrating question: what implications lie behind the different file sizes generated
by comparable program outputs written in different programming languages?
Accordingly, this research investigates the Java and Python programming
languages, which both produce identical results. Since programmers are given the
option of using a different language, an in-depth comparative examination of the
whole system follows, with the goal of determining which programming language
comes out on top. This investigation began with a focus on file size, but it has
expanded to include a wide variety of auxiliary factors that affect the overall
programming environment.
The study dives into a deep investigation, illuminating the complex dynamic that
exists between programming languages and the Docker environment. Beyond
simple concerns about file size, the research identifies a variety of other aspects that
contribute to the formation of the holistic programming environment. By
unravelling the complexities that govern the selection of programming languages
within Docker's domain, this analysis provides programmers with essential insights
that empower them to make educated decisions, ultimately helping to advance
current software development techniques.

4 Future Scope

Docker's usefulness extends to a variety of platform types, which makes it an
adaptable tool for many different kinds of programs. Notably, it functions as a
platform that can accommodate numerous programming environments, a significant
advantage. This property makes it possible to conduct a vast number of
combinatorial studies, providing a novel vantage point from which to assess and
improve the effectiveness of the system in software development endeavours.

By putting Docker's versatility to use across a variety of programming
environments, one can carry out exhaustive studies that investigate a number of
different permutations. Such research has the potential to uncover optimal setups
that match programming languages with the capabilities of Docker, resulting in
more streamlined and effective software development processes. Docker's
flexibility, in its purest form, provides fertile ground for experimentation, permitting
the selection of programming-language and Docker configurations that
harmoniously combine to generate greater productivity and creativity in the arena
of software development.

References

1. AbdelBaky M, Diaz-Montes J, Parashar M (2017) Towards distributed software-defined envi-


ronments. In: Proceedings—2017 17th IEEE/ACM international symposium on cluster, cloud
and grid computing, CCGRID 2017, 2017, pp 703–706. https://doi.org/10.1109/CCGRID.201
7.30
2. Anton V, Ramon-Cortes C, Ejarque J, Badia RM (2017) Transparent execution of task-based
parallel applications in docker with COMP superscalar. In: Proceedings—2017 25th Euromicro

international conference on parallel, distributed and network-based processing, PDP 2017,


2017, pp 463–467. https://doi.org/10.1109/PDP.2017.26
3. Braun N, Hauth T, Pulvermacher C, Ritter M (2017) An interactive and comprehensive working
environment for high-energy physics software with python and Jupyter notebooks. In: J Phys
Conf Ser 898(7). https://doi.org/10.1088/1742-6596/898/7/072020
4. Gopu A, Hayashi S, Young MD, Kotulla R, Henschel R, Harbeck D (2016) Trident: scal-
able compute archives: Workflows, visualisation, and analysis. In: Proceedings of SPIE—the
international society for optical engineering, vol. 9913. https://doi.org/10.1117/12.2233111
5. Dong L, Han K, Liang L, Niu B, Zhao S, Zhu Z (2019) On application-aware and on-demand
service composition in heterogenous NFV environments. In: 2019 IEEE global communica-
tions conference, GLOBECOM 2019—Proceedings. https://doi.org/10.1109/GLOBECOM3
8437.2019.9013916
6. Elkholy M, Marzok MA (2022) Light weight serverless computing at fog nodes for internet of
things systems. Indones J Electr Eng Comput Sci 26(1):394–403. https://doi.org/10.11591/ije
ecs.v26.i1.pp394-403
7. Kim B, Ali T, Lijeron C, Afgan E, Krampis K (2017) Bio-docklets: virtualization containers
for single-step execution of NGS pipelines. GigaScience 6(8):Art no. gix048. https://doi.org/
10.1093/gigascience/gix048
8. Kim J, Jun TJ, Kang D, Kim D, Kim D (2018) GPU enabled serverless computing frame-
work. In: Proceedings—26th euromicro international conference on parallel, distributed, and
network-based processing, PDP 2018, pp 533–540. https://doi.org/10.1109/PDP2018.2018.
00090
9. Melo L, Wiese I, Dramorim M (2021) Using docker to assist QA forum users. IEEE Trans
Softw Eng 47(11):2563–2574. https://doi.org/10.1109/TSE.2019.2956919
10. Lubomski P, Kalinowski A, Krawczyk H (2016) Multi-level virtualization and its impact on
system performance in cloud computing. Commun Comput Inform Sci 608:247–259. https://
doi.org/10.1007/978-3-319-39207-3_22
11. Perez A, Risco S, Naranjo DM, Caballer M, Molto G (2019) On-premises serverless computing
for event-driven data processing applications. In: IEEE international conference on cloud
computing, CLOUD, pp 414–421. https://doi.org/10.1109/CLOUD.2019.00073
12. Rawat P, Bajaj M, Vats S, Sharma V, Gopal L, Kumar R (2023) Optimizing hypothyroid
diagnosis with physician-supervised feature reduction using machine learning techniques.
2023 International conference on computational intelligence, communication technology and
networking (CICTN), Ghaziabad, India, 2023, pp 711–715. https://doi.org/10.1109/CICTN5
7981.2023.10140459
13. Pérez A, Caballer M, Moltó G, Calatrava A (2019) A programming model and middleware for
high throughput serverless computing applications. In: Proceedings of the ACM symposium
on applied computing, 2019, vol. Part F147772, pp 106–113. https://doi.org/10.1145/3297280.
3297292
14. Owsiak M et al (2017) Running simultaneous Kepler sessions for the parallelization of para-
metric scans and optimization studies applied to complex workflows. J Comput Sci 20:103–111.
https://doi.org/10.1016/j.jocs.2016.12.005
15. Yadav SP, Gupta A, Dos Santos Nascimento C, de Albuquerque VHC, Naruka MS, Singh
Chauhan S (2023) Voice-based virtual-controlled intelligent personal assistants. In: 2023 inter-
national conference on computational intelligence, communication technology and networking
(CICTN), Ghaziabad, India, pp 563–568. https://doi.org/10.1109/CICTN57981.2023.101
41447
16. Yadav DP (2021) Feature fusion based deep learning method for leukemia cell classification.
In: 2021 5th international conference on information systems and computer networks (ISCON),
Mathura, India, 2021, pp 1–4. https://doi.org/10.1109/ISCON52037.2021.9702440
17. Nguyen N, Bein D (2017) Distributed MPI cluster with Docker Swarm mode. In: 2017 IEEE
7th annual computing and communication workshop and conference, CCWC 2017. https://doi.
org/10.1109/CCWC.2017.7868429

18. Pittard WS, Li S (2020) The essential toolbox of data science: python, R, git, and docker.
Methods Mol Biol 2104:265–311. https://doi.org/10.1007/978-1-0716-0239-3_15
19. Yadav DP, Kishore K, Gaur A, Kumar A, Singh KU, Singh T, Swarup C (2022) A novel
multi-scale feature fusion-based 3scnet for building crack detection. Sustainability 14:16179
20. Rahman M, Chen Z, Gao J (2015) A service framework for parallel test execution on a devel-
oper’s local development workstation. In: Proceedings—9th IEEE international symposium on
service-oriented system engineering, IEEE SOSE 2015, vol. 30, pp 153–160. https://doi.org/
10.1109/SOSE.2015.45
21. Ruan B, Huang H, Wu S, Jin H (2016) A performance study of containers in cloud environ-
ment. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics), vol. 10065 LNCS, pp 343–356. https://doi.
org/10.1007/978-3-319-49178-3_2
22. Saklani R, Purohit K, Vats S, Sharma V, Kukreja V, Yadav SP (2023) Multicore Implementation
of K-Means Clustering Algorithm. In: 2023 2nd international conference on applied artificial
intelligence and computing (ICAAIC), Salem, India, 2023, pp 171–175. https://doi.org/10.
1109/ICAAIC56838.2023.10140800
23. Ramon-Cortes C, Serven A, Ejarque J, Lezzi D, Badia RM (2018) Transparent orchestration of
task-based parallel applications in containers platforms. J Grid Comput 16(1):137–160. https://
doi.org/10.1007/s10723-017-9425-z
24. Sochat V (2018) The scientific filesystem. GigaScience, 7(5). https://doi.org/10.1093/gigasc
ience/giy023
25. Shukla A (2015) A modified bat algorithm for the quadratic assignment problem. In: 2015
IEEE congress on evolutionary computation (CEC), Sendai, Japan, 2015, pp 486–490. https://
doi.org/10.1109/CEC.2015.7256929
26. Sipek M, Muharemagic D, Mihaljevic B, Radovan A (2020) Enhancing performance of cloud-
based software applications with GraalVM and quarkus. In: 2020 43rd international convention
on information, communication and electronic technology, MIPRO 2020—Proceedings, pp
1746–1751. https://doi.org/10.23919/MIPRO48935.2020.9245290
27. Špaček F, Sohlich R, Dulík T (2015) Docker as platform for assignments evaluation. Procedia
Eng 100:1665–1671. https://doi.org/10.1016/j.proeng.2015.01.541
28. Singh BK, Danish M, Choudhury T, Sharma DP (2021) Autonomic resource management in
a cloud-based infrastructure environment. In: Choudhury T, Dewangan BK, Tomar R, Singh
BK, Toe TT, Nhu NG (eds) Autonomic computing in cloud resource management in industry
4.0. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.
org/10.1007/978-3-030-71756-8_18
29. Ahmad F et al (2022) Levelized multiple workflow allocation strategy under precedence
constraints with task merging in IaaS cloud environment. IEEE Access 10:92809–92827.
https://doi.org/10.1109/ACCESS.2022.3202651
30. Jain D, Zaidi N, Bansal R, Kumar P, Choudhury T (2018) Inspection of fault tolerance in cloud
environment. In: Bhateja V, Nguyen B, Nguyen N, Satapathy S, Le DN (eds) Information
systems design and intelligent applications. Advances in intelligent systems and computing,
vol 672. Springer, Singapore. https://doi.org/10.1007/978-981-10-7512-4_103
31. Dewangan BK, Agarwal A, Choudhury T, Pasricha A (2021) Workload aware autonomic
resource management scheme using grey wolf optimization in cloud environment. IET
Commun 15(14):1869–1882
32. Pachouly J et al (2022) SDPTool: a tool for creating datasets and software defect predictions.
SoftwareX 18:101036. ISSN 2352-7110. https://doi.org/10.1016/j.softx.2022.101036
Dimensions of ICT-Based Student
Evaluation and Assessment
in the Education Sector
R. Arulmurugan , P. Balakrishnan, N. Vengadachalam,
and V. Subha Seethalakshmi

Abstract Outcome-based education is assessed through direct and indirect
assessment. Direct assessment covers all the courses from semester one to eight.
Each course has outcomes, which are evaluated and achieved through various
teaching and learning activities. Every activity has rubrics and assessments. The
assessment should include each student's individual rubric evaluation; however,
when a teacher conducts an activity, preparing the assessment is a complicated task,
because managing the time for conducting the activity and clarifying doubts
consumes many minutes, so assessing every student individually becomes an
increasingly complicated task for the whole teaching community. Finally, preparing
the result analysis and the weak-and-bright student analysis is the most complicated
part. Information and Communication Technology (ICT) tools play a major role in
addressing these difficulties. These tools not only support the assessment method
but also help to prevent student malpractice and reduce the hours spent preparing
result analyses and identifying weak and bright students. This article gives a
detailed description of ICT-based assessment and its outcomes. The ICT-based
method encourages participants to perform at a higher level compared to the
conventional method. In this article, detailed conduct, execution, and results are
showcased.

Keywords Peer group learning · ICT tool-based assessment · Result analysis ·


Weak and bright analysis · Activity-based learning

R. Arulmurugan (B) · P. Balakrishnan · N. Vengadachalam · V. S. Seethalakshmi


Dept. of EE, Annasaheb Dange College of Engineering and Technology, Ashta, Maharashtra,
India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 137
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_12

1 Introduction

Teaching and learning activities are enhanced year by year. In earlier days a
teacher-centric approach built around a single concept prevailed, but it has slowly
been replaced by a student-centric approach. The teacher-centric approach has many
drawbacks: communication is one-sided, students are afraid or shy to ask questions
in class, and the teacher does not understand the learner's level. Due to these
drawbacks, exam results suffered. It was replaced when Outcome Based Education
(OBE) came into the picture, which requires stating the expected outcome for each
programme, course, event, etc. The outcome of an event is written before it begins,
so the organizer and guest lecturer define the content of the delivery and activity. In
earlier days the objective of the course drove the syllabus and programme events:
the instructor defined the objective for the event, course, and programme, and
problems were solved based on that objective. With that method, the students'
learning levels and the attainment of the course objective could not be identified.
This is overcome by OBE. OBE clearly requires defining the outcome of each
course, and the outcome of the programme in terms of Programme Outcomes (PO)
and Programme-Specific Outcomes (PSO). Based on the PO and PSO, the
Programme Educational Objectives (PEO) of the concerned programme are
defined, and the PEOs in turn inform the department's vision and mission, creating
one closed circle. The attainment of the PO and PSO is calculated from direct and
indirect attainment. Indirect attainment is calculated from guest lecture and
workshop feedback, parents' feedback, graduate feedback or programme exit
surveys, alumni surveys, employer feedback, recruiter feedback, etc. This feedback
or survey maps the PO and PSO to questions, and participants assign a value
between three and one; from the analysis, the indirect attainment is calculated.
Direct attainment is calculated from the summation of all the theory and practical
courses for the concerned batch of students, covering semesters one to eight. Direct
attainment is given a weightage of 80% or 90% and indirect attainment 20% or
10%. Through this calculation the attainment level of the concerned programme
outcome is known.

2 Literature Survey

Students can be developed and engaged through experiential learning: SFIMAR
carried out a few case studies of experiential teaching pedagogy. Experiential
learning stimulates the learner's curiosity and enhances the level of understanding
[1]. The learner-centric approach is at the forefront. NEP 2020 suggested critical
thinking for employability and fostering experiential learning as an objective.
Learner practice is enhanced by experiential learning, and recent NAAC and NBA
accreditations expect experiential learning and ICT-based learning activities [2]. A
higher level of cross-cultural awareness and understanding is created when
education is delivered in a project manner instead of by memorizing data.
Experiential learning activities include business games, role-playing, virtual reality,
and computer-based simulation. Experiential learning enhances the importance of
the university in supporting educational and pedagogical tools. Deeper learning is
another popular form of experiential learning, known as project-based learning [3].
Through such activities students gain deeper technical knowledge of the
corresponding topics. It concentrates on four main concepts: social interactions,
activity building, cognitive tools, and contextual learning. Project-based learning
comes in several forms and encourages students to ask questions in the relevant
domain; through question and answer, learners gain motivation and retention [4].
Project learning helps to minimize drawbacks such as ineffective participation
resulting in missed teaching time, student-driven projects getting off topic, and
failure to meet academic requirements. Project-based learning fosters motivation in
studies and enhances students' problem-solving ability and confidence level.
Through experiential learning, faculty give fast feedback on students' learning
levels and understand them better; on the other side, the activities increase students'
self-learning pace, as well as their enthusiasm and confidence [5]. Experiential
activities improved results in every consecutive year, and while conducting an
activity the tutor learned its shortfalls, which helped to improve the activity in a
better way [6–8]. It was noticed that learners need clear objectives and a picture of
how to perform the activity [7]. Project-based learning highly encourages students'
critical thinking, innovation, and creative skills.

3 ICT-Based Student Evaluation Methodology

Information and Communication Technology (ICT) plays a vital role in supporting
the teaching and learning process. Students are engaged through various
brainstorming activities such as peer-group learning, think-pair-share, collaborative
learning, diagram recap activities, etc. Assessing these activities is complicated for
a single teacher: each activity has a rubric assessment, and evaluating more than
sixty students individually is a complicated task for one teacher. At the same time,
managing the time to complete the activity within an hour puts further pressure on
the teacher, so the teacher cannot evaluate properly or accurately. These drawbacks
are overcome by ICT-enabled assessment, which is very supportive of the teaching
system: it generates the assessment with a single click. ICT-based assessment
encourages students' participation during the activity and, on the other side, reveals
the students' weak areas or portions. If a number of students are weak in a particular
area, the report helps the teacher decide to teach the concept again. Suppose fifty
percent of the students are identified by the report as weak in one particular area;
the teacher can then conduct peer-group activities to help the lagging students
improve.

ICT-based activity assessment creates interest among the students while they
participate in the activity, and ICT does not support only the teacher's assessment.
Some ICT tools, for example Google Forms, send the students' progress reports
with a single click, containing each student's performance, the number of correct
and wrong answers, and which answer was correct, etc. ICT-based assessment
reduces time consumption in the classroom on one side; on the other side, the
teacher needs to spend more than an hour preparing the questions and answers on
the ICT tool's website. These days many free tools are offered for conducting
ICT-based assessment, such as Quizizz, the Google Forms add-on Fibonacci, the
Moodle quiz, Kahoot, etc. The Moodle quiz has various features: the possibility to
shuffle questions, an option to avoid repeating questions, time-based closing of the
form, etc. Quizizz provides background music while students participate in the
activity, creating a boost rather than vague participation. On the teacher's screen,
Quizizz shows the score values second by second; through this on-screen
assessment it is easy to identify candidates who lack participation. Once such
candidates are identified, the teacher reaches out to the students to encourage them
or to find the reason for the lack of participation, which helps them learn.
Sometimes students have not prepared for the activity; in that case, they can be
asked to read through the possible answers to the questions. Various methods are
used to make them learn the content.
Before conducting an ICT-based assessment, the teacher should have a clear vision
of the activity. The activity may be think-pair-share, peer-group learning,
collaborative learning, a poster activity, an assignment activity, a diagram activity,
etc. A learning activity followed by assessment through ICT is very helpful. At the
least, inform the students early, after the completion of the chapter, and ask them to
prepare for the next day's assessment activity. Sometimes the ICT-based assessment
activity takes twenty to thirty minutes for some questions. At the end of the activity,
the top-scoring candidate is called onto the stage for a prize and applause. This
creates energy among the students to prepare well for the upcoming activity.

Figure 1 shows the involvement of the students in ICT-based assessment
activities. These types of activities do not require more classroom space, meaning
there is no need to separate the students, because the questions are shuffled. Another
possibility is that students ask a friend for the answer, but this too is eradicated by
the timing of each question. The time for each question is around ten seconds,
during which students need to think and select; if a student talks with others, the
time to answer the corresponding question elapses. So, during the activity, students
cannot commit any malpractice. On the other side, the teacher observes the students'
performance through a monitoring screen, as shown in Fig. 2. It clearly shows each
student's percentage of completion, the number of minutes completed, who
participated well, and who did not, etc.

The teacher gives a running commentary to the students while they perform.
Once the time elapses, the teacher asks the students to close the activity by clicking
the end button in the top right corner of the screen. Once the teacher clicks the end
button, the activity comes to an end, after which students cannot participate further.
The ICT assessment shows the results in points. The top scorers are asked to come
onto the dais for appreciation and applause, creating a spark in all the participants,
as shown in Fig. 3.
Fig. 1 Students' participation in the ICT activity

Fig. 2 ICT assessment monitoring screen

Fig. 3 Appreciation event of the activity

Figure 4 shows the screen preview of an individual question and the students'
responses. That screen shows the percentage of accuracy in answering the question,
the average time taken to answer it, and how many players chose each option.
Through these details, the students' level of performance on the particular topic is
assessed.
Fig. 4 Individual question and students' response performance screen preview

Figure 5 shows the students' live performance level and the overall score of the
class. Figure 5 shows 60% accuracy for the corresponding topic, with the number
of correct answers shown in green and the number of wrong answers shown in red.
Not only the overall result but also individual students' performance charts are
shown, in a slide-bar format and in points.

Once the teacher clicks the activity's end button, the screen shown in Fig. 6
appears with the three top-scoring student candidates. This creates a spark in all the
candidates for active preparation for the upcoming activity.
Figure 7 shows the ICT assessment by question. For each question, the numbers
of students who answered correctly and wrongly are highlighted in green and red.
A question shown in red indicates that students are weak or poor on that particular
topic. Using the chart, the teacher conducts a revision class or activity according to
the score of each question.

Fig. 5 Live students’ performance chart



Fig. 6 Top three ranks declared

Fig. 7 ICT assessment with respect to the question

The ICT online quiz method not only encourages students' performance but also
helps to reduce documentation and produce micro result analysis. Figure 7 shows
the automatic micro result analysis of the concerned test sample, Fig. 8 shows the
individual performance on all the questions, and Fig. 5 shows the overall class
average marks. This real-time statistical analysis helps to identify the question areas
in which students are weak. The ICT tool method creates interest in participating in
the quiz activity and enhances students' learning levels.

Fig. 8 ICT assessment for individual students

4 Conclusion

This article described an ICT-based evaluation method. In the earlier conventional
teaching and learning methods, assessing each student is complicated; in particular,
identifying weak and bright students and performing micro result analysis are
difficult, and rubrics-based assessment was not included in the practical
examination. These problems were addressed when activity-based learning,
course-outcome-based content, and rubrics-based assessment came into the picture.
However, rubrics-based evaluation takes a lot of time, and effective evaluation of
each individual is complicated and less accurate. These problems are overcome by
ICT-based assessment evaluation, which encourages participants to perform at a
higher level compared to the conventional method. In this article, detailed conduct,
execution, and results were showcased. Further work extends to generating the
results of all the tests in a single-page analysis. The ICT tools were used to conduct
the activities in a friendly manner, find students' weaknesses while they perform,
and analyze students' skills easily. These days various free tools support conducting
ICT-based assessment, each with its own merits. At the end of the ICT assessment,
students were enthusiastic, energetic, and interested in learning the concept.

References

1. Banerjee S et al (2023) Effectiveness of experiential learning as a pedagogy in higher education: a study of SFIMAR. Indian J Adult Educ 84(1):37–51
2. Yadav S (2023) Reflective practices in adult education for lifelong learning. Indian J Adult Educ 84(1):52–61. ISSN: 0019-5006
3. Faris Muhammed MK, Chitturu S (2023) Understanding the culture of sports: a study of Malabar region in Kerala. Indian J Adult Educ 84(1):20–36. ISSN: 0019-5006
4. Banerjee S, George A, Kadbane A (2023) Effectiveness of experiential learning as a pedagogy
in higher education: a study of SFIMAR. Indian J Adult Educ 84(1):37–51. ISSN: 0019-5006
5. Bennett AG, Cassin F, van der Merwe M (2017) How design education can use generative play to innovate for social change: a case study on the design of South African children's health education toolkits. Int J Design 11(2):57–72
6. Legg R, Recipe M, Athena KS, Mani Mina, Iowa State University, Ames, Iowa (2005) Solving multidimensional problems through a new perspective: the integration of design for sustainability and engineering education. In: Proceedings of the 2005 American Society for Engineering Education Annual Conference & Exposition, American Society for Engineering Education
7. Kleinsmann M, Valkenburg R, Sluijs J (2017) Capturing the value of design thinking in different
innovation practices. Int J Design 11(2):25–40
8. Na J, Choi Y, Harrison D (2017) The design innovation spectrum: an overview of design
influences on innovation for manufacturing companies. Int J Design 11(2):13–24
Effectiveness of Online Education System

N. Vengadachalam, V. Subha Seethalakshmi, R. Arulmurugan, and P. Balakrishnan

Abstract The online education system became popular during the COVID-19 pandemic. Before the pandemic, the utilization of online education systems was very low. Information and Communication Technology (ICT) offers various online education tools, such as online meetings for oral interaction, online assignment collection through Google Forms, online evaluation through Google Spreadsheet, enhancement of the students' learning, appreciation of the students' assignments, and creation of e-study materials in the form of e-audio and e-video content. It also supports assessing the students' learning level through poll activities, enhancing student creativity through mind-map activities, and developing student critical thinking using brainstorming activities. Especially for individual assessments, monitoring student participation is very easy compared to the conventional teaching method. The online method converts the teaching methodology into a teaching-and-learning method. This article discusses the various kinds of effectiveness required of the education system, such as refreshment, motivation, and enthusiasm for participation, especially when some physical activity is conducted in the afternoon to activate the brain before starting the class. E-audio content helps to recap a concept after many days, and this type of content strongly supports slow learners. Smiley and thumbs-up emojis motivate the students to participate actively in the work. Finally, the poll activity is discussed as a way to re-learn missed and misunderstood concepts.

Keywords Meditation · Mind map · Brainstorm · e-content development

N. Vengadachalam (B) · V. S. Seethalakshmi · R. Arulmurugan · P. Balakrishnan
Dept. of EE, Annasaheb Dange College of Engineering and Technology, Ashta, Maharashtra, India
e-mail: [email protected]
R. Arulmurugan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_13

1 Introduction

Online education systems and culture became popular during the COVID-19 pandemic. Before that, only a few faculty members had an online classroom platform such as Moodle. The pandemic forced a shift from conventional teaching methods to online platforms. Various ICT tools help to conduct the class more effectively than offline classrooms. The only drawback is that the personal touch on the student's side is missing; otherwise, interaction with the students, time management, activity conduction, and the evaluation of individual students are all highly possible during online classes. Analysis of the students' learning level at the end of the session, with a 5-min poll activity, shows what percentage of the students understood the topics. Online platforms are used to enhance the students' level because most students are addicted to mobile phones; the question is whether the candidate uses them in an effective way. These online class sessions help to create e-materials such as audio materials, PowerPoint materials, and video materials, and I fully enjoyed this material preparation. Once a class is completed, the audio material is shared in the online classroom instead of WhatsApp, because the online classroom keeps the content for a long time, whereas content shared on WhatsApp may be erased after some days. This e-content helped the absentees and slow learners very much. Preparing e-content such as a video of the lesson was easy while taking the class itself. Through this method, any number of video materials can be developed and easily shared through the YouTube platform, which also stores the e-video materials. In this article, various innovative methods are discussed in detail.

2 Literature Survey

To develop and engage students through experiential learning, SFIMAR (St. Francis Institute of Management and Research) carried out a few case studies on experiential teaching pedagogy. Experiential learning stimulates the learner's curiosity and enhances the level of understanding [1]. The learner-centric approach is at the forefront: NEP 2020 lists critical thinking for employability and the fostering of experiential learning among its objectives, and learning by practice is enhanced by experiential learning. The recent NAAC and NBA accreditations expect experiential-based learning and ICT-based learning activities [2]. Education conducted in a project manner, instead of memorizing data, creates a higher level of cross-cultural awareness and understanding. Experiential learning activities include business games, role-playing, virtual reality, and computer-based simulation. Experiential-based learning enhances the importance of the university in supporting educational and teaching pedagogical tools. Deeper learning is another popular form of experiential learning, known as project-based learning [3]. Through the activity, students gain deeper technical knowledge of the corresponding topics. It concentrates on four main concepts: social interactions, activity building, cognitive tools, and contextual learning. Project-based learning comes in several forms and encourages the students to ask questions in the relevant domain; through question and answer, learners gain motivation and retention [4]. Project learning helps to minimize drawbacks such as ineffective participation resulting in missed teaching time, student-driven projects getting off topic, and failure to meet academic requirements. Project-based learning motivates the studies and enhances the students' problem-solving ability and confidence level. Through experiential learning, faculty give fast feedback on the students' learning level and understand them better; on the other side, the activity increases the students' self-learning pace, and with it their enthusiasm and confidence level [5]. The experiential activity improves the results every consecutive year. While conducting the activity, the tutor learns the shortfalls of the activity, which helps to improve it in a better way [6–8]. It is noticed that the learner needs clear objectives and a picture of how to perform the activity [7]. Project-based learning highly encourages the students' critical thinking, innovation, and creative skills.

3 ICT-Based Portfolios Preparation Methodology

3.1 Meditation and Enthusiastic Learning Creation

Creating interest in learning is very important for every course. For example, some school children say that "the mathematics subject is not interesting", and some students from rural backgrounds say that "English is one of the toughest subjects", while city students find the same English subject very easy. The reason behind this is a lack of knowledge, practice, and available guidance. The available guidance, i.e., the teacher, plays a very important role. How do teachers teach the subject? Some teachers teach complicated subjects in the simplest way; some teach simple concepts in a complicated way. On the other side, the teacher's attitude is very important in conveying to the students whether a subject is easy or not. Even when some teachers are not strong in subject knowledge, their way of treating students touches the students' hearts; such students become interested in studying the subject and take those teachers as role models. If you ask the students who the best teacher is, the one who behaves in a friendly manner, connects with the students, and touches their hearts gets good feedback. So creating interest in the subject is very important: on entering the class, one should not start with the syllabus, but discuss general things related to the subject and link them to it, which is called the analogy concept. Creating interest and enthusiasm for the course is thus very essential. The second major problem is low motivation or lack of confidence: a large percentage of students have an inferiority complex and lack confidence about their careers. The third problem is concentration on the subject. There is no gap between the first and second period; due to the continuous classes in the forenoon and afternoon, students lose their concentration. Without interest in the subject, the students never concentrate on it.
These three are major problems across the education sector. To avoid them, I habitually conducted the following activities.
If my classes were in the forenoon session, I habitually conducted a five-minute meditation activity. It was very easy and effective during online class sessions; during offline classes it was somewhat accepted but not at a 100% satisfactory level, whereas in the online sessions it was a great success. I was in the habit of playing a five-minute prayer song, sung by a blind singer. The outcomes of the song are:
a. If you have a skill you can win, so develop your skills.
b. Students get inspired: the blind singer succeeds and lives without sight, so what about you?
c. Students get relaxed by hearing the song.
d. A stressed mind returns to normal.
In the end, the students became enthusiastic and interested in attending the class. The five-minute short break refreshes the mind. Figure 1 shows a picture of the prayer song sung by a blind girl. During offline sessions this type of activity is a little difficult because of arranging the system, playback, and speaker availability. Instead, I asked the students to close their eyes for three to five minutes; sometimes I instructed them to observe the surrounding noise or their own breathing. In the afternoon session they were asked to perform short puzzle, mind, and physical games, for example:

Fig. 1 Meditation and enthusiastic prayer song (Source: https://www.youtube.com/watch?v=VwWfb3FXwjc)

a. The students were asked to repeat what I was saying, e.g., stand up, sit down, clap, jump, walk, hi-fi.
b. In level two they were asked to do the reverse, e.g., when I say "stand up" the students need to sit down, and when I say "clap" the students need to jump, and so on.
c. Make eye-to-eye contact with a friend, etc.
As research says, songs act as a kind of drug; such songs and sounds can mesmerize a person. So, these types of methods are used to enhance the students' concentration level during the class period.
Figures 2 and 3 show the brain and physical activities. These activities refresh the mind and body.
During the afternoon session, after lunch, the students' concentration level is very low even when they try to take more interest in the subject, because of the climate and the food consumed. Except for high-IQ students, the remaining students concentrate on a particular session for a maximum of seven to fifteen minutes; after that, the mind starts to wander to the past or the future. Due to this kind of mental disturbance, concentration on the subject or class is very low. It is overcome by some physical activity, through which the students remember the topic discussed at the time. Some physical activities also help to avoid laziness and sleepiness during class hours. Figure 3 shows one type of physical hand activity, through which the students' left and right brains start to function. Most of the time the right brain is dormant, and this type of activity helps to create freshness in the students. In earlier days the word thoppukaranam was used in the Tamil language; it is called super brain yoga. The super brain yoga method creates a spark in left- and right-brain activity.

Fig. 2 Brain gym exercises activity



Fig. 3 Brain and physical activity

4 Developed and Shared E-Audio and Video Content

The e-audio and e-video subject content is very helpful for recapping a concept after a long time. Every semester has a minimum of 105 days, or three and a half months. Content stored in the human mind can be erased after seven to ten days; that is why the teaching faculty start to prepare the course content before handling the period. Every time, a lot of effort, examples, and analogies are used to explain a concept, but after some days the concepts evaporate from the mind; without frequent recaps, all the concepts are erased. These problems are overcome by the creation of e-audio and e-video content. For example, after completing a unit, I prepare one audio file with a short explanation of the entire unit and share it once the unit is completed. During free time or exam time, this type of audio content helps the students very much. The video content was uploaded to YouTube, and the link was shared with the students through the Moodle classroom and WhatsApp.
Figure 4 shows a screenshot of homework content shared through the online classroom My-one-Note page. This page shows the content for tomorrow's homework, a related video, and an instruction audio clip. The audio clip gives a short brief about the homework, and the details are in the video content. The recap of the content is prepared in the audio file.
Figure 5 shows the pre-requisite content for the next chapter. After completing a chapter, one prepares an audio file recapping the related content and asks the students to listen to it and revise the Chap. 1 content, followed by the recap activity through question and answer, as shown in Fig. 6.

Fig. 4 Shared e-audio and video content to the online classroom

Fig. 5 Pre-requisites of the next chapter

Figure 6 shows the possible questions listed for Chapter 1. Students need to answer the questions themselves to know their learning level, and then click the audio recording file to listen to hints about the content.
Figure 7 shows a sample screenshot of the recap activity. The One-Note screen contains some balloons and smiley icons. These icons create energy in the students' minds. For example, if you look at school children's hands, they come home with stars on them: a single star, three stars, etc. These stars indicate effective participation in the classroom activity and the scoring level, and the children show them to their parents with enthusiasm. A similar pattern is continued here with the smiley and thumbs-up indications.

Fig. 6 Screenshot of recap page

Fig. 7 Sample of recap activity

Figure 8 shows a screenshot of the homework image and related video content. The homework was conducted through the activity, and it clearly shows what the students have to do. Step 1 shares a video of today's class without audio, i.e., the video shows only pictures and an explanation of the images without sound. Step 2 asks the students to prepare a dubbing voice for the video content. Step 3 asks them to upload it into a Google Form. Through the activity, the students enhance their memory and communication skills. These written instructions were also provided as oral audio files, which the students click to listen to.

Fig. 8 Screenshot of the homework

Figure 9 shows the tutor screen-sharing window and the participant viewing window. The tutor screen has an option to be seen by the students: once it is clicked, the link is copied and shared in the students' WhatsApp group or Google Classroom. When the students click the link, they can see the current and past pages of the subject, but they cannot edit the screen; only the tutor can edit. The students can click any number of times to listen to and watch the video contents.

5 Summary of the Content Through Poll Activity

The poll activity is very helpful for summarizing the session. The last five minutes of the session are allotted to a summary activity on the day's content. This summary helps the participating candidates recap the learned topic; on the other side, it lets the teacher know the students' learning level. The summary is performed by conducting a poll: write a question with four options and ask the students to answer by clicking A, B, C, or D. Once the allotted time is over, stop the poll to display the percentage of students who answered A, B, C, and D. The percentage of students who scored gives an understanding of the students' learning level. At the end of the poll, the correct and wrong options are explained so that the topic is understood in depth, as shown in Fig. 10.
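The poll summary described above is a simple frequency count over the clicked options. The sketch below shows the idea with a hypothetical list of responses; it is not output from any specific meeting or poll tool.

```python
# Sketch of the end-of-session poll summary: count the A/B/C/D
# clicks and display the percentage for each option. The responses
# list is hypothetical sample data, one click per student.
from collections import Counter

responses = ["A", "C", "B", "C", "C", "D", "A", "C"]
counts = Counter(responses)

# Percentage of the class that chose each option.
for option in "ABCD":
    pct = 100 * counts.get(option, 0) / len(responses)
    print(f"{option}: {pct:.1f}%")

# Share of students who chose the correct option, here assumed to be "C".
correct_pct = 100 * counts["C"] / len(responses)
print(f"answered correctly: {correct_pct:.0f}%")  # answered correctly: 50%
```

The final percentage is what the teacher uses to decide whether the topic needs to be explained again before moving on.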

Fig. 9 The tutor screen sharing window and participant viewing screen window

Fig. 10 Discussion and summary poll activity



6 Conclusion

The offline education system is more effective for one-to-one interaction, personal touch, and connection with student participation, but assessing the students' learning level as per rubrics and enhancing the students' critical thinking, brainstorming activity, etc. are complicated tasks there. The young budding engineer is addicted to and interested in using the smartphone. Some students flow in the correct direction, using the internet and YouTube for healthy education; on the other side, around fifty percent of school and college students are addicted to playing video games, watching movies, short videos, etc. Sometimes students go to extreme conditions, losing games and losing money on playing, so a teacher needs to create enthusiasm, encouragement, and motivation in their life and courses by playing short meditation prayers and motivational videos, so as to turn the students' addiction into a valuable life. In this session, various online ICT tools were utilized for effective monitoring and to encourage the students' performance.

References

1. Banerjee S et al (2023) Effectiveness of experiential learning as a pedagogy in higher education: a study of SFIMAR. Indian J Adult Educ 84(1):37–51
2. Yadav S (2023) Reflective practices in adult education for lifelong learning. Indian J Adult Educ
84(1):52–61. ISSN: 0019-5006
3. Faris Muhammed MK, Chitturu S (2023) Understanding the culture of sports: a study of Malabar
region in Kerala. Indian J Adult Educ 84(1):20–36. ISSN: 0019-5006
4. Banerjee S, George A, Kadbane A (2023) Effectiveness of experiential learning as a pedagogy
in higher education: a study of SFIMAR. Indian J Adult Educ 84(1):37–51. ISSN 0019-5006
5. Bennett AG, Cassin F, van der Merwe M (2017) How design education can use generative play to innovate for social change: a case study on the design of South African children's health education toolkits. Int J Design 11(2):57–72
6. Legg R, Recipe M, Athena KS, Mani Mina, Iowa State University, Ames, Iowa (2005) Solving multidimensional problems through a new perspective: the integration of design for sustainability and engineering education. In: Proceedings of the 2005 American Society for Engineering Education Annual Conference & Exposition, American Society for Engineering Education
7. Kleinsmann M, Valkenburg R, Sluijs J (2017) Capturing the value of design thinking in different
innovation practices. Int J Design 11(2):25–40
8. Na J, Choi Y, Harrison D (2017) The design innovation spectrum: an overview of design
influences on innovation for manufacturing companies. Int J Design 11(2):13–24
A Formula for Effective Evaluation
Practice Using Online Education Tool

V. Subha Seethalakshmi, R. Arulmurugan, P. Balakrishnan, and N. Vengadachalam

Abstract The COVID-19 pandemic taught us to take classes through online mode. It forced us to move from the conventional chalk-and-board method to online platforms such as Google Meet, Zoom, and Microsoft Teams. These platforms help to interact with the entire class of students on a single screen, deliver the content, and interact with the students. Some free apps provide limited features compared to the paid apps; the fully paid apps include session recording and poll conduction. The evaluation of student performance needs Information and Communication Technology (ICT) tools. These tools help to enhance the students' critical thinking, learning level, view beyond the content, etc. In addition, these tools make it easy to evaluate the students' learning level. The outcomes of the mind-map and brainstorming activities help to recap the content after many days. To perform the mind-map activity, students need to use all of Bloom's levels, for example remembering, understanding, applying, analyzing, evaluating, and creating.

Keywords Mind-Map activity · Brainstorm activity · Encouraged assignment activity

V. S. Seethalakshmi (B) · R. Arulmurugan · P. Balakrishnan · N. Vengadachalam
Dept. of EE, Annasaheb Dange College of Engineering and Technology, Ashta, Maharashtra, India
e-mail: [email protected]
R. Arulmurugan
e-mail: [email protected]
P. Balakrishnan
e-mail: [email protected]
N. Vengadachalam
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_14

1 Introduction

Online education systems and culture became popular during the COVID-19 pandemic. Before that, only a few faculty members had an online classroom platform such as Moodle. The pandemic forced a shift from conventional teaching methods to online platforms. Various ICT tools help to conduct the class more effectively than offline classrooms [1–5]. The only drawback is that the physical presence on the student's side is missing; otherwise, interaction with the students, time management, and activity conduction are highly possible during online classes [2–6], as is the evaluation of individual students by conducting a poll activity. Analysis of the students' learning level at the end of the session, with 5 min of poll activity, shows what percentage of the students understood the topics [3]. Online platforms are used to enhance the students' level because most students are addicted to mobile phones; the question is whether the candidate uses them in an effective way [4]. These online class sessions help to create e-materials such as audio materials, PowerPoint materials, and video materials, and I fully enjoyed this material preparation [7]. Once a class is completed, the audio material is shared in the online classroom instead of WhatsApp, because the online classroom keeps the content for a long time, whereas content shared on WhatsApp may be erased after some days. This e-content helped the absentees and slow learners very much. Preparing e-content such as a video of the lesson was easy while taking the class itself [8]. Through this method, any number of video materials can be developed and easily shared through the YouTube platform, which also stores the e-video materials. This article discusses various innovative methods used to attract the participants and enhance their learning level.

2 Literature Survey

To develop and engage students through experiential learning, SFIMAR (St. Francis Institute of Management and Research) carried out a few case studies on experiential teaching pedagogy. Experiential learning stimulates the learner's curiosity and enhances the level of understanding [1]. The learner-centric approach is at the forefront: NEP 2020 lists critical thinking for employability and the fostering of experiential learning among its objectives, and learning by practice is enhanced by experiential learning. The recent NAAC and NBA accreditations expect experiential-based learning and ICT-based learning activities [2]. Education conducted in a project manner, instead of memorizing data, creates a higher level of cross-cultural awareness and understanding. Experiential learning activities include business games, role-playing, virtual reality, and computer-based simulation. Experiential-based learning enhances the importance of the university in supporting educational and teaching pedagogical tools. Deeper learning is another popular form of experiential learning, known as project-based learning [3]. Through the activity, students gain deeper technical knowledge of the corresponding topics. It concentrates on four main concepts: social interactions, activity building, cognitive tools, and contextual learning. Project-based learning comes in several forms and encourages the students to ask questions in the relevant domain; through question and answer, learners gain motivation and retention [4]. Project learning helps to minimize drawbacks such as ineffective participation resulting in missed teaching time, student-driven projects getting off topic, and failure to meet academic requirements. Project-based learning motivates the studies and enhances the students' problem-solving ability and confidence level. Through experiential learning, faculty give fast feedback on the students' learning level and understand them better; on the other side, the activity increases the students' self-learning pace, and with it their enthusiasm and confidence level [5]. The experiential activity improves the results every consecutive year. While conducting the activity, the tutor learns the shortfalls of the activity, which helps to improve it in a better way [6–8]. It is noticed that the learner needs clear objectives and a picture of how to perform the activity [7]. Project-based learning highly encourages the students' critical thinking, innovation, and creative skills.

3 ICT-Based Portfolios Preparation Methodology

3.1 Mind Map Creating Activity

Mind maps make students think creatively and critically. Through the activity, the students' remembering, communication, analysis, design, and application skills are enhanced. miro.com offers a tool for creating mind-map diagrams. At the beginning, one sample mind map is shown, with how each node links to the previous and next ones, to give a basic idea of the mind map, as shown in Fig. 1. The students are then asked to draw a map for the assigned topic. Through the activity, the students start to recap the concept, understand the basics, design the mapping, analyze the next and previous mapping content, and finally apply the content to the mapping.

4 Brainstorm Activity

An online interactive brainstorming activity is conducted to enhance the students' thinking levels. These activities enhance the students' out-of-the-box thinking and learning from others' thoughts. For example, ask a simple question such as "What do you want to become in the future?" Different possible answers will come from everyone; at the end of the time, share the various answers thought up by the same age group. Through the activity, the students' brainstorming level and thought process are enhanced, as shown in Fig. 2.

Fig. 1 Sample screenshot of mapping content

Fig. 2 Sample screenshot of the brainstorming activity

5 Assignment Encouragement Activity

Every student expects credit from the teacher. As teachers, we need to encourage the students to enhance their confidence level. Simple appreciation and encouragement improve a student's performance better than the conventional method. In recent days, the instructions given to all the teachers are as follows:

Fig. 3 Assignment activity announcement sheet

a. Do not shout at a student in front of others.
b. Do not use degrading words towards the students.
c. Understand the reasons behind a student's shortcomings and provide personal counseling.
d. Do not throw chalk pieces or student notes outside.
e. Do not punish latecomers.
f. Teach the recent trends related to today's topic.
g. Show some video animations related to the topic.
h. Give seminar topics to interested students.
i. Do not show any partiality among the students.
This generation of students is more sensitive; they go to extremely low and extremely high levels. So, as teachers, we have a great responsibility to correct society. Figure 3 shows the smiley symbols used to attract the students and encourage their assignments.

6 Attractive Online Classroom Platform Activity

The online classroom platform helps to share content and hold it for a long time. In recent days, a lot of free online classroom platforms have come into the picture, such as Google Classroom, Edmodo, and Moodle. Moodle classrooms are offered by various domains such as moodlecloud.com, gnomio.com, etc. In addition, some free websites also help to create an online classroom; Google too offers a free website-creation facility, and in earlier days wordpress.com, Webs, etc. offered free website platforms. Figure 4 shows the Moodle classroom screenshot. The Moodle classroom is used to access the content without the need to log in, join the class, attend the quiz activity, or get permission from the course handler. This Moodle classroom helps all the students access the material from anywhere in the world.

Fig. 4 Moodle Class screenshot

7 Conclusion

The article showed various online education tools utilized to assess students'
performance effectively. It described assessment tools that help
enhance students' critical thinking, brainstorming activity, and engagement with
assignments. These tools and methods are used to enhance and attract student
participation in the class: they help students remember the studied content through
team discussion, understand its pros and cons, enhance creativity through novel
design or tree-diagram methods, analyse the content, and finally apply it all in a
concept map. The conventional teaching method does not encourage such learning and
poses several challenges for evaluating students' performance. E-learning platforms
attract and motivate a student-centric approach to learning methodology.
Deciphering the Catalysts Influencing
the Willingness to Embrace Digital
Learning Applications: A Comprehensive
Exploration

Ankita Srivastava and Navtika Singh Nautiyal

Abstract The rapid advancement of technology has revolutionized the education


sector, ushering in a new era of digital learning applications. In recent years, the
adoption of these applications has witnessed significant growth due to their poten-
tial to enhance learning experiences, promote accessibility, and adapt to individual
needs. The research methodology employed a mixed-methods approach, combining
quantitative surveys and qualitative interviews. The study sample consisted of a
diverse group of students and educators from various educational institutions, repre-
senting different age groups, backgrounds, and fields of study. Furthermore, the
study found that the attitudes and perceptions of peers and instructors also exerted a
considerable influence on the intention to adopt digital learning applications. Positive
endorsements and recommendations from influential peers and respected educators
positively impacted participants’ attitudes toward adoption.

Keywords Digital learning · Online learning · Learning process · TRA & TAM

1 Introduction

Learning on a digital platform is one of the key contributions to digital revolu-


tions [1]. Information and communication technology is also essential for improving
the teaching and learning process and supports a number of duties carried out by
educational institutions [2]. The demonstrative method of teaching and learning has
changed following the modern age of digitization and innovations, and new learning
tools have been integrated into the teaching and learning process [3]. According to
Ref. [4], one of these technological advancements that has significantly changed the

A. Srivastava · N. S. Nautiyal (B)


National Forensic Sciences University, Gujarat, India
e-mail: [email protected]
A. Srivastava
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_15
way that people learn, particularly in higher education, is mobile learning. Various
studies have defined the term “digital learning” [5].
Due to the fact that mobile applications for digital learning can be used anytime,
anywhere, even in remote locations, and that learners can benefit from them, many
educational institutions have realized the potential of digital learning as a teaching
tool for their students and have incorporated it into the distance learning environ-
ment [6]. Though the majority of research has focused on the value and applicability
of e-learning, distance learning, and digital learning in the context of the adoption
of distance education technologies, few studies have investigated the primary
motivation and intention behind why students in higher education
choose to adopt digital learning applications. Understanding users' intentions to use
technology has grown to be one of the most difficult issues for information system
researchers, according to Teo [7]. The literature has demonstrated that characteristics
connected to the acceptability of technology were used to identify researchers’ inter-
ests in information system studies [8, 9]. Because of this, information system experts
have created intention models to aid in forecasting and elucidating the adoption of
technology across numerous fields.
According to research, students are technologically knowledgeable and desire
to use mobile applications to access the learning opportunities provided by their
institution [10]. Reference [11] used TRA and TAM to study users' acceptance of
computer technology, Ref. [12] adopted TAM to explore teacher acceptance of
e-learning technology, and Ref. [13] applied both TRA and TAM to examine
student–teachers' intention to use computer technology. Despite the extensive
application of TRA and TAM in research
studies, few, if any, have explored an integration of TRA and TAM to predict and
explain students’ intention to use (mobile) m-learning in developing countries, since
m-learning is fairly new in educational environments in these countries. With the
increase in tendencies to adopt digital learning, learners are expanding their bound-
aries with the use of numerous mobile apps. Past studies on virtual learning have
identified a few fundamental constructs and benefits of digital learning: Technical
progression, utility, quality, flexibility, learning at own pace, cost-saving, secured
content, and increased outreach [14].
Several scholars have shown their curiosity about the safe and disruptive learning
potential of mobile learning [15] which has gained global popularity. Some empirical
studies have been done by various researchers to investigate the effect of perceived
usefulness, ease of use, and perceived security on digital or mobile learning because
these elements are considered important factors for the adoption of digital learning
[16].
Despite numerous studies in this area, no study has been conducted taking a
comprehensive framework of all variables that helps to identify factors affecting
the intention of adoption of mobile learning apps. The rest of the paper has been
organized into following sections: Literature review, research methodology, data
analysis, findings and discussion, and conclusion.
2 Literature Review

Nowadays, with the advancement in technology, educational institutes are imple-


menting digital learning in the teaching and learning approach. With technological
advancements, learning techniques are also changing day by day and the popularity
of digital learning apps is also becoming a more important part of the learning process
Digital learning is implemented by schools, colleges, and universities across the
world for various reasons, with digital learning through apps supporting
twenty-first-century learners in better knowledge enhancement [17], but unresolved
questions remain about the intention to adopt digital learning applications.
So, this literature review will explore the core factors influencing
learners' intention to adopt digital learning applications.
Technology advancement can be one of the elements for the adoption of e-learning,
while many other aspects influence behaviour and intention to use digital learning
apps [6]. Technological improvement can speed up the learning process by using
computers, mobile devices, and a variety of applications [7]. In contrast, digital
learning that is supported by technology affects how satisfied a user is with the
perceived value and accuracy of the knowledge [18]. Access to the network at all
times, security, and data storage can be seen as the key determinants of how well
mobile learning apps are adopted [19]; adoption additionally depends on security,
privacy, and data storage [20].
Virtual education initially enables institutions to reach out to students who want to
obtain education remotely. As a result, it benefits the institution by expanding student
enrollment. The university can provide instruction to a large number of students who
participate in online lectures and video conferences. In some circumstances, it aids in
lowering the cost of staffing as well as the budget for employee travel [14]. Responsiveness
is an important criterion when experiencing and adopting mobile learning apps.
Quick learning assessment, including learning feedback, timely replies to users'
problems, and responses that evaluate learner progress, can significantly enhance
users' learning confidence and learning effect [21]. Quick assessment and feedback
are also considered an important part of teaching and learning. Therefore, a
well-designed learning support and responsiveness approach will benefit learners and
mobile app users and improve students' learning [5].
Digital learning apps have their own utility in performing various tasks; a study of
engineering courses found them useful for practical teaching, especially while
performing empirical investigations [22]. It has been observed that there is huge
potential for mobile learning applications in the current academic environment due
to their usefulness, perceived value, and the satisfaction gained from using paid
apps [23]. Digital learning with mobile applications is useful for any age group, and
the age factor does not influence the perception, experience, and use of the newest
technological advancements [24]. So, users' learning confidence, quick access to
learning materials, and updated knowledge as per the latest syllabus can be
determinants for the adoption of digital learning apps. Quality content and quality
services can be better strategies for encouraging the adoption and acceptance of
digital learning [25]. Quality content provides better understanding through digital
learning apps and motivates the adoption of digital learning [26]. In addition,
digital learning apps help users read, search, and collect subject-related content or
data, which provides satisfaction in studying [27]. So, rich graphic elements, a
proper content structure, up-to-date learning materials, refined and concise content,
and an attractive interface can also be major determinants for the adoption of
digital learning apps. In light of the above discussion, the study needs to identify
the factors behind the intention to adopt mobile learning apps and their impact on
the performance of students.

3 Research Methodology

The study has used a structured questionnaire to collect the responses on the different
statements/items used to explore the factors affecting the intention of adoption of
mobile learning apps and their impact on performance of students. The respondents
are the students pursuing master’s level courses at a Central university of western
India and a census has been done. The central University has been selected as it is
believed to have the best resources available. The questionnaire was developed on a
five-point Likert scale where 1 stands for strongly disagree and 5 stands for strongly
agree. The questionnaire has been floated to the students of the university. The popu-
lation was 300 but a total of 280 questionnaires were floated as 20 students were
absent that day. Out of 280, only 249 questionnaires were complete in all aspects
and were suitable to proceed for data analysis. Further exploratory factor analysis
has been conducted to identify the factors and then multiple regression analysis has
been used to measure the impact of those factors on the performance of students.
The dependent variable is the performance of students and the independent variables
are utility, flexibility, cost-effectiveness, technical advancement, quality perception
and accessibility (gained through exploratory factor analysis). The questionnaire
was tested for reliability and content validity. The reliability was tested using Cron-
bach alpha and Guttman’s split-half test. Both these tests are for measuring internal
consistency.
Cronbach alpha: It is the most common measure of internal consistency for
reliability; it requires only a single test administration yet provides an estimate of a
test's reliability [28, 29]. The analysis resulted in an overall Cronbach alpha score
of 0.82.
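As an illustration of how this coefficient is computed (a minimal pure-Python sketch, not the SPSS procedure used in the study; the item scores below are hypothetical), Cronbach's alpha is k/(k − 1) · (1 − Σ item variances / variance of total scores):

```python
def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of the total score), computed over columns of item scores."""
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)                                  # number of items
    n = len(items[0])                               # number of respondents
    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Three perfectly consistent hypothetical items yield an alpha of about 1.0
alpha = cronbach_alpha([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
```

Less consistent item columns lower the coefficient, which is why 0.82 over the 22 statements indicates good internal consistency.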
Guttman's split-half reliability: A fundamental assumption of split-half reliability
is that the two halves of the test should yield similar true scores and error
variances when the test items focus on the same construct; that is, it measures
the extent to which all parts of the tool contribute uniformly to the content being
measured. To use split-half reliability, the items were divided into two equal halves,
the halves were administered to study participants, and analyses were
Fig. 1 Research methodology: questionnaire development (22 statements); sample selection (population N = 300 MBA students of a central university; 280 questionnaires floated as 20 students were absent; 249 complete responses received); reliability of questionnaire (Cronbach = 0.82, Guttman split-half = 0.70); data analysis (exploratory factor analysis and regression analysis)

run between the two respective “split-halves”. A Spearman’s rho correlation was run
between the two halves of the instrument. Then SPSS software was used to conduct
the split-half reliability. The range of this coefficient varies from 0 to 1.0. In this
study, the value of split-half reliability is 0.70 which indicates the high reliability of
the instrument (Fig. 1).
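A sketch of the split-half coefficient described above (the Guttman lambda-4 form commonly reported for split halves; the half-test scores below are hypothetical, not the study's data):

```python
def guttman_split_half(half_a, half_b):
    """Guttman split-half (lambda-4): 2 * (1 - (Va + Vb) / Vt), where Va and
    Vb are the variances of the two half-test scores and Vt is the variance
    of the full-test score."""
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    total = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (var(half_a) + var(half_b)) / var(total))

# Identical halves give the maximum coefficient of 1.0; real halves fall below it
coef = guttman_split_half([10, 12, 14, 16], [10, 12, 14, 16])
```

The coefficient approaches 1.0 as the two halves agree; the study's value of 0.70 is on the acceptable end of this range.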

4 Data Analysis

Firstly, the statements were analyzed to explore factors via factor analysis. This is
also known as a dimension reduction technique, as it summarizes statements based
on their similarities. Before performing it, we need to conduct the KMO measure and
Bartlett's test of sphericity. The results suggest an adequate sample and permit us to
proceed with factor analysis.
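For illustration, Bartlett's statistic can be sketched as follows (a simplified pure-Python version of the standard formula χ² = −(n − 1 − (2p + 5)/6) · ln|R|; the 2 × 2 correlation matrix below is hypothetical, not the study's 22-variable matrix):

```python
import math

def bartlett_sphericity(corr, n):
    """Bartlett's test of sphericity: chi2 = -(n - 1 - (2p + 5) / 6) * ln(det(R))
    with p(p-1)/2 degrees of freedom, where R is the p x p correlation matrix."""
    def det(m):  # Laplace expansion; fine for small matrices
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j]
                   * det([row[:j] + row[j + 1:] for row in m[1:]])
                   for j in range(len(m)))
    p = len(corr)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det(corr))
    df = p * (p - 1) // 2
    return chi2, df

# Two hypothetical items correlated at r = 0.5, with n = 249 respondents
chi2, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 249)
```

An identity correlation matrix (no shared variance) gives χ² = 0, so a large significant χ², as in Table 1, indicates the correlations are strong enough to factor.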
Table 1 KMO and Bartlett’s test


Kaiser–Meyer–Olkin measure of sampling adequacy 0.413
Bartlett’s test of sphericity
Approx. Chi-square 2883.382
Level of significance 0.000

The results are given in Table 1. The KMO measure and Bartlett's test turned out to
be significant, which suggests that the sample size is adequate to perform factor
analysis.
The factor analysis results are presented in Table 2.
The total variance explained suggests that the extracted factors account for
approximately 75% of the variation. Further, the principal component method has
been used on the 22 statements to obtain the factors, and Varimax rotation with
Kaiser normalization in SPSS 25 has been applied in order to get optimal factor
loadings (Table 3).
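The percentages in Table 2 follow directly from the eigenvalues: with 22 standardized items, each component explains eigenvalue/22 of the total variance, and six components exceed the Kaiser criterion of an eigenvalue above 1. A sketch using the six retained eigenvalues (values taken from Table 2):

```python
def variance_explained(eigenvalues, n_items):
    """Percent of variance per component (eigenvalue / n_items * 100) and the
    running cumulative percentage, for standardized items."""
    pct = [ev / n_items * 100 for ev in eigenvalues]
    cum, running = [], 0.0
    for p in pct:
        running += p
        cum.append(running)
    return pct, cum

# Eigenvalues of the six retained components (Table 2), over 22 items
eigs = [5.264, 3.825, 2.262, 2.007, 1.867, 1.227]
pct, cum = variance_explained(eigs, 22)
# pct[0] is about 23.93 and cum[-1] about 74.78, matching Table 2
```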
The variables are grouped on the basis of characteristics. The similar ones are
grouped together. On the basis of their characteristics, the variables are named in
Table 4.
Further, a regression analysis was conducted to see the impact of the indepen-
dent variables obtained through factor analysis on the performance of students. The
dependent variable is the performance of students and the independent variables are
utility, flexibility, cost-effectiveness, technical advancement, quality perception, and
accessibility. The results are presented below.
The ANOVA table suggests the model fit. The significant value of F statistics
(4154.456) at one percent significance suggests the model is a good fit (Table 5).
Furthermore, the coefficient diagnosis has been conducted and it suggests five
variables to be significant at a one percent level of significance. The significant
variables are utility, flexibility, cost-effectiveness, technical advancement, and
quality perception, with coefficient values of −0.023, −0.025, −0.025, 1.016, and
−0.021, respectively. Four of the coefficients have negative signs, which means they
negatively affect the performance of students. The major reason for this negative
association can be attributed to the distractions that are common while using mobile
learning apps. The VIF (variance inflation factor) values are low enough to suggest no
multi-collinearity among the independent variables (Table 6).
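Reading Table 6 as a fitted linear model, a student's predicted performance follows from the six factor scores (a sketch; it assumes the factor scores are standardized regression scores with mean zero, so the intercept 4.020 is the prediction for an average student):

```python
# Unstandardized coefficients from Table 6: intercept and factor scores 1-6
INTERCEPT = 4.020
COEFFS = [-0.023, -0.025, -0.025, 1.016, -0.021, 0.003]

def predict_performance(factor_scores):
    """Predicted student performance under the fitted regression model."""
    return INTERCEPT + sum(b * f for b, f in zip(COEFFS, factor_scores))

# An average student (all factor scores at their mean of 0)
predict_performance([0, 0, 0, 0, 0, 0])  # equals the intercept, 4.020
```

The dominance of factor 4 (technical advancement, coefficient 1.016) over the small negative coefficients is visible here: a one-standard-deviation rise in factor 4 moves the prediction far more than any other factor.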

5 Findings and Discussions

The cost-effectiveness of digital learning, primarily when facilitated by mobile apps,
sets it apart from customary computer or laptop learning. The exploratory factor
analysis yielded six factors: quality perception, utility, cost-effectiveness,
accessibility, flexibility, and technological advancement. These factors are either
Table 2 Total variance explained
Component Initial eigenvalues Extraction sums of squared loadings Rotation sums of squared loadings
Total % of variance Cumulative % Total % of variance Cumulative % Total % of variance Cumulative %
1 5.264 23.926 23.926 5.264 23.926 23.926 4.826 21.937 21.937
2 3.825 17.385 41.310 3.825 17.385 41.310 3.349 15.222 37.159
3 2.262 10.282 51.593 2.262 10.282 51.593 2.492 11.328 48.486
4 2.007 9.125 60.717 2.007 9.125 60.717 2.037 9.258 57.745
5 1.867 8.489 69.206 1.867 8.489 69.206 2.026 9.211 66.956
6 1.227 5.576 74.782 1.227 5.576 74.782 1.722 7.826 74.782
7 0.985 4.479 79.261
8 0.753 3.425 82.686
9 0.637 2.894 85.580
10 0.518 2.354 87.934
11 0.503 2.285 90.219
12 0.410 1.864 92.083
13 0.372 1.692 93.775
14 0.321 1.460 95.236
15 0.309 1.405 96.640
16 0.288 1.309 97.950
17 0.237 1.078 99.028
18 0.208 0.947 99.975
19 0.005 0.025 100.000
20 9.108E−17 4.140E−16 100.000
21 7.755E−17 3.525E−16 100.000
22 −1.512E−17 −6.873E−17 100.000
Table 3 Rotated component matrix


1 2 3 4 5 6
VAR00001 0.026 0.025 0.488 0.030 −0.034 0.367
VAR00002 −0.022 −0.025 −0.024 0.994 −0.021 0.003
VAR00003 0.947 −0.006 0.040 −0.067 0.089 0.082
VAR00004 0.944 −0.006 0.035 −0.073 0.086 0.082
VAR00005 −0.035 0.949 0.062 −0.037 0.033 0.125
VAR00006 −0.047 0.856 0.066 −0.013 −0.006 0.084
VAR00007 0.011 −0.017 −0.331 0.092 −0.131 0.115
VAR00008 0.051 0.213 0.062 0.015 0.011 0.856
VAR00009 0.037 0.848 0.023 0.025 −0.055 0.085
VAR00010 −0.013 0.163 0.181 −0.020 0.002 0.827
VAR00011 −0.035 0.949 0.062 −0.037 0.033 0.125
VAR00012 0.087 −0.063 −0.010 −0.006 0.803 −0.052
VAR00013 0.108 0.051 0.003 −0.065 0.818 −0.044
VAR00014 0.138 0.028 0.817 −0.030 −0.012 0.234
VAR00015 0.947 −0.006 0.040 −0.067 0.089 0.082
VAR00016 0.079 0.010 0.084 0.032 0.809 0.100
VAR00017 0.862 −0.026 0.128 −0.028 0.019 −0.085
VAR00018 −0.022 −0.025 −0.024 0.994 −0.021 0.003
VAR00019 0.182 0.041 0.821 0.086 −0.060 0.107
VAR00020 0.771 −0.016 0.122 0.130 0.050 −0.039
VAR00021 0.157 0.104 0.825 −0.006 0.024 0.027
VAR00022 0.831 −0.026 0.170 0.033 0.061 −0.024

Table 4 Name of factors
Utility (1): Q3, Q4, Q15, Q17, Q20, Q22
Flexibility (2): Q5, Q6, Q9, Q11
Cost-effective (3): Q1, Q7, Q14, Q19, Q21
Technical advancement (4): Q2, Q18
Quality perception (5): Q12, Q13, Q16
Accessibility (6): Q8, Q10

Table 5 ANOVA
Model Sum of Squares df Mean Square F Sig
1 Regression 256.410 6 42.735 4154.456 0.000b
Residual 2.489 242 0.010
Total 258.900 248
Table 6 Coefficients
Model Unstandardized Standardized t Sig Collinearity
coefficients coefficients statistics
B Std. error Beta Tolerance VIF
1 (Constant) 4.020 0.006 625.460 0.000
REGR −0.023 0.006 −0.022 −3.559 0.000 1.000 1.000
factor
score 1
REGR −0.025 0.006 −0.025 −3.902 0.000 1.000 1.000
factor
score 2
REGR −0.025 0.006 −0.024 −3.833 0.000 1.000 1.000
factor
score 3
REGR 1.016 0.006 0.994 157.712 0.000 1.000 1.000
factor
score 4
REGR −0.021 0.006 −0.021 −3.281 0.001 1.000 1.000
factor
score 5
REGR 0.003 0.006 0.003 0.422 0.673 1.000 1.000
factor
score 6

inherent to the apps themselves or are advantages derived from the body of existing
research on mobile learning apps, verified using reliability statistics. Nevertheless,
the study demonstrates that opinions regarding whether or not mobile learning apps
are beneficial to students differ significantly. Because digital learning is more
affordable than just using computers or laptops, especially when done through mobile
apps, it is regarded as a revolution in its own right. The survey does, however,
unequivocally show that opinions about how useful and helpful digital learning apps
are to students differ greatly: in this regard, the majority of the variables, while
significant, have a negative impact on students' performance.
This undesirable association is largely due to distractions that frequently accompany
the use of mobile learning applications, for instance, games and frequent
advertisements. The problem can be summarised from the perspective of usefulness:
such applications offer numerous benefits when used properly, but if handled
carelessly they can undermine learning entirely. Different factors come together to
influence people’s decision to use digital learning apps as educational tools, and
these factors also determine people’s readiness to use them. Based on the previously
presented data, the following conclusions can be drawn about the factors influencing
the propensity to use digital learning applications:
Perceived Usefulness: People's inclination to use digital learning apps is
significantly influenced by their perceived benefits. If people believe that using
technology will improve and add value to their educational experience, they are
more likely to adopt it.
Technological Proficiency: New digital learning resources are more likely to
be welcomed by those who are more at ease with technology. The willingness
to use digital learning apps is strongly influenced by one’s level of technological
competency.
Perceived Ease of Use: Technology that is navigable, intuitive, and easy to use
positively influences adoption intentions. People's willingness to use digital
learning apps is significantly influenced by how simple they believe them to be to
use.
Social Influence: People tend to adopt technology that those around them support or
endorse. Social factors, like peer endorsements, celebrity endorsements, or teacher
support, can have a big impact on people's decisions to use digital learning tools.
Institutional Support: People are more likely to adopt new technology when it is
supported and encouraged in schools. The adoption of digital learning applications
can be greatly influenced by legislative and educational support.
Infrastructure and Accessibility: Access to devices and reliable internet
connectivity is an essential piece of technological infrastructure that can support
or impede the desire to adopt digital learning applications.
These factors need to be fully addressed for the adoption of digital learning apps
to be successful. To establish an environment that supports the advantages of digital
learning, provides users with the resources and training they require, and guarantees
a seamless and safe experience, cooperation between educational institutions, legis-
lators, and technology suppliers is required. By doing this, we can make the most
of the revolutionary potential of digital learning apps to revolutionize the educa-
tional landscape and provide excellent, engaging learning opportunities for students
everywhere. Educational stakeholders can facilitate the seamless and efficient inte-
gration of digital learning apps into the classroom by recognizing and addressing
these issues.

6 Conclusion

In conclusion, many factors together determine individuals' willingness to adopt
these innovative learning tools. An important part is the value users perceive in
these applications for improving learning outcomes and acquiring new skills.
Another factor promoting adoption is the availability and accessibility of the
necessary technological infrastructure. Cost-effectiveness, flexibility, and
customization also influence people's willingness to embrace digital learning. In
addition, social influence, institutional
support, awareness and education are necessary to maintain positive attitudes towards
these applications. Trust and security concerns must be addressed to instil confidence
in potential customers. Many factors influence an individual’s willingness to adopt
digital learning applications, such as perceived value, usability, technical expertise,
social impact, accessibility, institutional support, personal creativity, and cost. To
ensure the widespread adoption of digital learning applications, developers, educa-
tors, and policymakers must consider these variables and manage them appropriately.
By understanding what factors influence adoption intentions, they can develop and
market digital learning applications that better meet the needs and preferences of
teachers and students.

References

1. Costley KC (2014) The positive effects of technology on teaching and student learning. ERIC. https://eric.ed.gov/?id=ED554557
2. Akour H (2009) Determinants of mobile learning acceptance: an empirical investigation in higher education. Dissertation, Oklahoma State University, Oklahoma
3. Lillejord S, Børte K, Nesje K, Ruud E (2018) Learning and teaching with technology in higher education – a systematic review. Knowledge Centre for Education, Oslo. www.kunnskapssenter.no
4. Klimova B, Poulová P (2016) Surveying university teaching and students' learning styles. Int J Innov Learn 19:444–458. https://doi.org/10.1504/IJIL.2016.076794
5. Hwang GJ, Tsai CC (2011) Research trends in mobile and ubiquitous learning: a review of publications in selected journals from 2001 to 2010. Br J Edu Technol 42(4):E65–E70
6. Park SY (2009) An analysis of the technology acceptance model in understanding university students' behavioral intention to use e-learning. J Educ Technol Soc 12(3):150–162
7. Teo T (2009) Modelling technology acceptance in education: a study of pre-service teachers. Comput Educ 52(2):302–312. https://doi.org/10.1016/j.compedu.2008.08.006
8. Legris P, Ingham J, Collerette P (2003) Why do people use information technology? A critical review of the technology acceptance model. Inform Manage 40:191–204. https://doi.org/10.1016/S0378-7206(01)00143-4
9. King W, He J (2006) Understanding the role and methods of meta-analysis in IS research. Commun Assoc Inf Syst 16
10. Arksey H, O'Malley L (2005) Scoping studies: towards a methodological framework. Int J Soc Res Methodol 8(1):19–32. https://doi.org/10.1080/1364557032000119616
11. Davis FD (1989) Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q 13(3):319–340
12. Yuen AH, Ma WW (2008) Exploring teacher acceptance of e-learning technology. Asia-Pac J Teach Educ 36:229–243
13. Teo T, van Schaik P (2012) Understanding the intention to use technology by preservice teachers: an empirical test of competing theoretical models. Int J Hum-Comput Interact 28(3):178–188. https://doi.org/10.1080/10447318.2011.581892
14. Dung DTH (2020) The advantages and disadvantages of virtual learning. IOSR J Res Method Educ 10(3):45–48
15. Motiwalla LF (2007) Mobile learning: a framework and evaluation. Comput Educ 49(3):581–596
16. Jahangir N, Begum N (2008) The role of perceived usefulness, perceived ease of use, security and privacy, and customer attitude to engender customer adaptation in the context of electronic banking. Afr J Bus Manage 2(2):32
17. Nevin R (2009) Supporting 21st century learning through Google Apps. Teach Libr 37(2):35–38
18. Roca JC, Chiu CM, Martínez FJ (2006) Understanding e-learning continuance intention: an extension of the technology acceptance model. Int J Hum Comput Stud 64(8):683–696. https://doi.org/10.1016/j.ijhcs.2006.01.003
19. Caudill JG (2007) The growth of m-learning and the growth of mobile computing: parallel developments. Int Rev Res Open Distrib Learn 8(2). https://doi.org/10.19173/irrodl.v8i2.348
20. Sarrab M, Elbasir M, Alnaeli S (2016) Towards a quality model of technical aspects for mobile learning services: an empirical investigation. Comput Hum Behav 55:100–112
21. Bonk CJ, Wisher RA, Lee JY (2004) Moderating learner-centered e-learning: problems and solutions, benefits and implications. In: Online collaborative learning: theory and practice. IGI Global, pp 54–85
22. Jou M, Tennyson RD, Wang J, Huang SY (2016) A study on the usability of E-books and APP in engineering courses: a case study on mechanical drawing. Comput Educ 92:181–193. https://doi.org/10.1016/j.compedu.2015.10.004
23. Hsu CL, Lin JCC (2015) What drives purchase intention for paid mobile apps? An expectation confirmation model with perceived value. Electron Commer Res Appl 14(1):46–57. https://doi.org/10.1016/j.elerap.2014.11.003
24. Fondevila Gascón JF, Carreras Alcalde M, Seebach S, Pesqueira Zamora MJ (2015) How elders evaluate apps: a contribution to the study of smartphones and to the analysis of the usefulness and accessibility of ICTs for older adults. Mob Media Commun 3(2):250–266. https://doi.org/10.1177/2050157914560185
25. Almaiah MA, Al Mulhem A (2019) Analysis of the essential factors affecting of intention to use of mobile learning applications: a comparison between universities adopters and non-adopters. Educ Inf Technol 24(2):1433–1468
26. Hashim KF, Tan FB, Rashid A (2015) Adult learners' intention to adopt mobile learning: a motivational perspective. Br J Edu Technol 46(2):381–390
27. Alqahtani M, Mohammad H (2015) Mobile applications' impact on student performance and satisfaction. Turk Online J Educ Technol 14(4):102–112
28. Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika 16:297–334. https://doi.org/10.1007/BF02310555
29. Kaiser HF, Michael WB (1975) Domain validity and generalizability. Educ Psychol Meas 35:31–35. https://doi.org/10.1177/001316447503500103
Pedagogical Explorations in ICT:
Navigating the Educational Landscape
with Web 2.0, 3.0, and 4.0
for Transformative Learning Experiences

Navtika Singh Nautiyal and Deepak Mashru

Abstract This study examines the substantial effects of Web 2.0, 3.0, and 4.0 on
content production, distribution, and assessment. The three main goals of this study
are to: first, thoroughly examine the development of content creation, delivery, and
evaluation from Web 2.0 to Web 4.0; second, recognize and evaluate the advantages
and drawbacks of using ICT tools in pedagogy; and third, offer forward-looking
suggestions for the future development and application of content creation, delivery,
and evaluation approaches. Five stimulating case studies are used in this paper to
further highlight the useful advantages of using Web 2.0, 3.0, and 4.0 technologies
in educational settings. These case studies are thoroughly examined to show the
transformational potential of ICT in education. The results of the study identify
various barriers to the effective use of ICT tools, such as gaps in technological
infrastructure, the need for in-depth teacher training, and the challenge of ensuring
equal access to technology, and propose a model for study, VITAL CRIMP.

Keywords Web 2.0 · Web 3.0 · Web 4.0 · ICT in pedagogy · Transformative
learning

1 Introduction

Education has undergone significant changes as a result of the development of
information and communication technology (ICT). The change from Web 2.0 to Web 4.0,
in particular, has had a substantial impact on the creation, distribution, and evaluation
of content in ICT-powered education. Web 2.0 marked a turning point in education
by ushering in a new era of user-generated content and collaboration. Teachers and
students have given learners the ability to actively take part in the development and
sharing of data through the use of platforms like wikis, blogs, and social media. This
shift from passive consumption to active participation transformed the development

N. S. Nautiyal (B) · D. Mashru


National Forensic Sciences University, Gandhinagar, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 181
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_16

and delivery of instructional resources. Students became co-creators of knowledge,
creating collaborative learning environments and promoting peer knowledge
exchange. The Semantic Web, which is often known as Web 3.0, further altered
teaching as the digital world developed. This time period saw the introduction of
sophisticated algorithms and technology that could analyze and comprehend data.
The transmission of personalized and flexible content was made possible by the inte-
gration of these technologies. Utilizing learner data, learning management systems,
recommendation engines, and intelligent tutoring systems offer personalized learning
opportunities that take into account each student’s preferences, learning styles,
and proficiency levels. More engaging and productive educational experiences have
been produced as a result of the move towards personalized learning.
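A minimal sketch can make the personalization idea above concrete. Everything below (the tag vocabulary, the learner’s interest weights, and the resource catalogue) is invented for illustration and is not drawn from the study or from any real learning management system; it simply shows one common way a recommendation engine can rank content against a learner profile, using cosine similarity over tag weights.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse tag-weight dicts."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(learner_profile, resources, top_n=2):
    """Rank learning resources by similarity to the learner's interests."""
    scored = [(cosine(learner_profile, tags), name) for name, tags in resources.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]

# Hypothetical learner who engages mostly with video content on algebra
learner = {"algebra": 0.9, "video": 0.8, "geometry": 0.2}

resources = {
    "Algebra video lecture": {"algebra": 1.0, "video": 1.0},
    "Geometry worksheet":    {"geometry": 1.0, "text": 1.0},
    "Algebra quiz":          {"algebra": 1.0, "quiz": 1.0},
}

print(recommend(learner, resources))  # ['Algebra video lecture', 'Algebra quiz']
```

A production system would learn the profile from behavioural data rather than hard-coding it, but the ranking step is essentially the same.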
Pedagogy has expanded with the arrival of Web 4.0, often known as the Intelligent
Web. In order to revolutionize content generation, distribution, and assessment, this
age integrates cutting-edge technology including artificial intelligence (AI), machine
learning (ML), augmented reality (AR), and virtual reality (VR). By automating
procedures, producing intelligent feedback, and developing adaptable learning mate-
rials, AI and ML algorithms improve content development. With the use of AR and
VR technology, students may interact with information in a three-dimensional envi-
ronment while experiencing immersive and engaging learning experiences. Incor-
porating these tools has broadened pedagogy, increasing students’ creative thinking,
critical thinking skills, and problem-solving abilities.
The use of ICT technology has transformed content creation, distribution, and
evaluation by enabling personalized, interactive, and immersive learning experiences.
By studying how pedagogy has evolved in the digital age and offering inspiration for
new developments in instructional tactics, this research aims to enhance the use of
ICT in education. There are several advantages and problems that have come with the
shift from Web 2.0 to Web 4.0 in ICT-driven education. Technology infrastructure,
lecture preparation, and equal access to technology must all be carefully taken into
account before ICT technologies can be adopted. The benefits, though, are numerous.
Through cooperative content creation, peer learning and the development of digital
literacy skills are fostered. Learner motivation and engagement are increased when
tailored and interactive materials are delivered. By providing insights into learners’
growth, learning analytics and data-driven evaluation enable targeted interventions
and well-informed decision-making.
This study aims to investigate how content creation, distribution, and assessment
have changed from Web 2.0 to Web 4.0 in ICT-driven education. This study aims
to shed light on the advantages and disadvantages of using ICT in the classroom
through a review of the literature and analysis of well-known case studies. The
results will aid in understanding how teaching and learning practices have changed
as a result of Web 2.0, 3.0, and 4.0. This research will also provide suggestions for a
cutting-edge method of developing, presenting, and evaluating material in order to
optimize the use of ICT tools and technologies to improve learning outcomes and
student engagement. Pedagogy has changed as a result of the introduction of Web
2.0, 3.0, and 4.0, and educators and students now have greater options to participate
in cutting-edge teaching and learning activities.

The study compiles the most recent data, offers insights into best practices, and
inspires additional research and innovation in the area of ICT in education. By
utilizing the transformative potential of ICT, educators may create efficient, inter-
esting, and individualized learning experiences that prepare students for the demands
of the twenty-first century. During this era known as Industry 4.0, many sectors
of life, including education, witnessed rapid change. Education systems must be
updated to meet the skilled workforce requirements of this dynamic process. In the
near future, it is projected that smart products, services, and business opportunities
will be widely used across all industries. The complete digital transformation of
instructional processes supports and guides the use of technical, human, organiza-
tional, and pedagogical factors. Education 4.0 aims to equip students with a range
of skills, including cognitive, social, interpersonal, and technical ones, in order to
meet the demands of the Fourth Industrial Revolution and address global concerns,
such as reducing the causes and consequences of climate change based on public
awareness.

1.1 Objectives of the Study

A. To examine how the shift from Web 2.0 to Web 4.0 has changed how material is
created, delivered, and evaluated in ICT-driven education.
B. To determine the benefits and drawbacks of integrating ICT technologies into
pedagogy for the development, delivery, and assessment of content.
C. To offer suggestions for a cutting-edge method of producing, distributing, and
evaluating material in the context of Web 2.0, 3.0, and 4.0.

1.2 Hypotheses of the Study

I. Students are more engaged and involved in ICT-driven education when Web
2.0, 3.0, and 4.0 technologies are used in content production, distribution, and
assessment.
II. The use of ICT technologies in pedagogy to create, distribute, and evaluate
content raises the quality and efficacy of learning resources and teaching
strategies.
III. The use of Web 2.0, 3.0, and 4.0 in the production, distribution, and assessment
of content has a favorable impact on students’ learning outcomes and academic
success in ICT-powered education.

2 Literature Review

A paradigm shift in contemporary education is underlined by the literature on the
use of Information and Communication Technologies (ICT) in pedagogy. The use of
Web 2.0, 3.0, and 4.0 technologies has drawn a lot of interest because of its potential
to completely change the way that people learn. The literature also emphasizes the
significance of tackling issues like uneven levels of digital literacy and establishing
pedagogical congruence in order to fully reap the rewards of modern technologies.
This literature review exposes the complex effects of ICT integration in pedagogy
by combining ideas from numerous academic investigations, and it provides a thor-
ough framework for comprehending the changing environment of transformational
educational practices.

2.1 Table of the Literature

Table 1 shows the existing literature with respect to the use of Information and
Communication Technologies (ICT) in pedagogy.

3 Comparative Analysis of Web 2.0, 3.0, and 4.0

The comparison of Web 2.0, 3.0, and 4.0 encompasses a lively investigation of
the changing digital environment and its influence on paradigms in education. Web
2.0 dramatically changed how information is consumed and shared by introducing
user-generated content, interactivity, and collaboration. Context-aware and machine-
understandable data were introduced with the shift to Web 3.0, also known as the
Semantic Web, opening the door for more intelligent search and customized content
delivery. This stage has been crucial in developing an online environment that is more
contextualized and networked, with possible effects on improving individualized
learning experiences. The analysis becomes more intricate as we turn to Web 4.0,
which is marked by the Internet of Things (IoT) and growing artificial intelligence
integration, and which holds considerable promise for education.
Table 2 compares content creation, content delivery, and content evaluation across
Web 2.0, Web 3.0, and Web 4.0.

Table 1 Summarized literature review

1. Selwyn [1], “ICT in Pedagogy: Unleashing the Power of Web 2.0, 3.0, and 4.0 for Transformative Learning Experiences”. Key findings: need for a critical approach in studying the impact of Web 2.0 in pedagogy, considering social and cultural factors. Research gap: lack of critical, in-depth studies on how technology integrates with pedagogy in varied contexts.

2. Voogt and Roblin [2], “A comparative analysis of international frameworks for twenty-first century competences: implications for national curriculum policies”. Key findings: importance of integrating ICT know-how and twenty-first century competences into national curriculum policies. Research gap: need for concrete guidelines and professional development for teachers.

3. Hew and Cheung [3], “Use of Web 2.0 technologies in K-12 and higher education: the search for evidence-based practice”. Key findings: need for more rigorous research to establish evidence-based practices for Web 2.0 in education. Research gap: scarcity of studies providing robust evidence on the effectiveness of these technologies.

4. Bower [4], “Augmented reality in education: cases, places, and potentials”. Key findings: AR can provide interactive and immersive environments, but effective use requires proper design and infrastructure. Research gap: limited practical case studies on AR’s efficacy in real-world classrooms.

5. Prensky [5], “Digital natives, digital immigrants”. Key findings: educational methods need to adapt to the tech-savvy “digital natives”. Research gap: lack of pedagogical approaches that can bridge the generational divide.

6. Kukulska-Hulme and Traxler [6], “Designing for mobile and wireless learning”. Key findings: potential for mobile and wireless learning to transform pedagogy. Research gap: need for studies focusing on design principles for mobile learning.

7. Hew and Brush [7], “Integrating technology into K-12 teaching and learning: current knowledge gaps and recommendations for future research”. Key findings: significant gaps in understanding effective technology integration in K-12 settings. Research gap: more research is needed to understand the impact of varied ICT tools.

8. Dede [8], “Comparing frameworks for twenty-first century skills”. Key findings: ICT can be a catalyst in enhancing the development of twenty-first century skills. Research gap: need for teaching methods that can integrate these technologies efficiently.

9. Johnson, Adams Becker, Estrada, and Freeman [9], “NMC Horizon Report: 2014 K-12 Edition”. Key findings: web technologies can enable personalized and collaborative learning experiences. Research gap: lack of longitudinal studies that show the long-term impacts.

10. Selwyn [10], “Digital technology and the contemporary university: degrees of digitization”. Key findings: careful consideration is needed when integrating digital technology in higher education. Research gap: research is lacking in addressing digital inequalities in higher education.

4 Case Studies Related to Learning Platforms

The integration of learning platforms has opened up transformational pedagogical
opportunities in the field of modern education, as demonstrated by incisive case
studies that highlight their numerous effects (Table 3).
These case studies show how, in the context of ICT integration, content develop-
ment, distribution, and assessment have evolved. They emphasize the transforma-
tional potential of Web 2.0, 3.0, and 4.0 in education by showcasing the effective use
of ICT tools and technology and highlighting the beneficial influence on pedagogy
and learning outcomes.

5 Identification of Research Gap Findings and Analysis

The analysis of the findings from the literature study and the case studies compares
and contrasts the obstacles, benefits, and consequences of using ICT in content
production, delivery, and assessment, and discusses the notable themes and patterns
discovered.

Table 2 The comparative analysis of Web 2.0, 3.0, and 4.0

Web 2.0
• Content creation: Web 2.0 introduced user-generated content platforms such as wikis, blogs, and social media, allowing students and educators to actively create and share content. This has promoted collaborative learning and knowledge sharing among learners.
• Content delivery: With Web 2.0, content delivery became more interactive and learner-centered. Tools like discussion forums, video-sharing platforms, and online learning management systems enable multimedia-rich and personalized content delivery.
• Content evaluation: Web 2.0 facilitated peer assessment and feedback through collaborative platforms, enabling learners to engage in active evaluation and reflection. It also allowed for immediate feedback through online quizzes, assessments, and discussion forums.

Web 3.0
• Content creation: Web 3.0, also known as the Semantic Web, introduced intelligent algorithms and technologies that can process and understand the meaning of data. This has enabled advanced content creation tools, including AI-generated content and intelligent tutoring systems, which can adapt to individual learners’ needs.
• Content delivery: Web 3.0 has enhanced content delivery through personalized recommendations and adaptive learning platforms. Intelligent systems can analyze learners’ data, preferences, and learning styles to provide tailored content, ensuring a more personalized and engaging learning experience.
• Content evaluation: Web 3.0 has expanded the capabilities of content evaluation through learning analytics. Advanced data analytics and visualization techniques can provide insights into learners’ progress, engagement, and performance, allowing educators to make data-driven decisions and interventions.

Web 4.0
• Content creation: Web 4.0, also referred to as the Intelligent Web, integrates emerging technologies like AI, ML, AR, and VR into content creation. It enables the development of immersive and interactive learning materials, simulations, and virtual experiences that foster deeper understanding and engagement.
• Content delivery: Web 4.0 leverages AI, ML, AR, and VR technologies to offer highly interactive and personalized content delivery. Learners can engage in virtual reality-based simulations, augmented reality applications, and adaptive learning platforms that provide real-time feedback, guidance, and support.
• Content evaluation: Web 4.0 enhances content evaluation by leveraging AI algorithms for automated assessment, intelligent feedback, and learning analytics. Intelligent systems can assess complex projects, provide personalized feedback, and generate detailed performance analytics, enabling more comprehensive evaluation.
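The content-evaluation column above repeatedly invokes learning analytics. As a toy illustration, assuming a hypothetical event log of (student, activity, score) triples (an invented format, not a real LMS export), per-student engagement and progress metrics of the kind described might be computed like this:

```python
from collections import defaultdict

# Hypothetical event log: (student, activity, score) — fabricated for illustration
events = [
    ("asha", "quiz-1", 0.6), ("asha", "quiz-2", 0.8), ("asha", "quiz-3", 0.9),
    ("ben",  "quiz-1", 0.5), ("ben",  "quiz-2", 0.4),
]

def analytics(log):
    """Per-student engagement (attempt count) and progress (first-to-last score change)."""
    scores = defaultdict(list)
    for student, _activity, score in log:
        scores[student].append(score)
    return {
        s: {"attempts": len(v), "progress": round(v[-1] - v[0], 2)}
        for s, v in scores.items()
    }

report = analytics(events)
print(report)  # asha shows positive progress; ben's scores are slipping
```

Real learning-analytics dashboards add time-on-task, visualization, and cohort comparisons, but they reduce to aggregations of event logs like this one.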

5.1 Findings from Literature Review

According to the literature study, the use of ICT in pedagogy has significantly
improved content generation, delivery, and assessment. The value of learner-centered
methodologies, collaborative learning, and individualized teaching in raising student
engagement and accomplishment was underlined by key theories and concepts. The

Table 3 Analysis of the case studies

Case study 1: Collaborative Learning Platform in a High School. Background: this case study examines how a collaborative learning platform was implemented in a high school environment using Web 2.0 tools like wikis and discussion boards. ICT tools and technologies: wiki sites, chat boards, and software for creating multimedia. Pedagogy and learning outcomes: the collaborative learning environment encouraged cooperation, information exchange, and student participation; students actively participated in the design of the curriculum, which facilitated greater comprehension and enhanced learning outcomes.

Case study 2: Personalized Adaptive Learning in Higher Education. Background: this case study looks into the use of Web 3.0 technologies and intelligent algorithms to create personalized adaptive learning systems in higher education institutions. ICT tools and technologies: learning analytics, AI algorithms, and adaptive learning systems. Pedagogy and learning outcomes: by offering specialized material and flexible feedback, the personalized adaptive learning platforms improved student engagement and created more customized learning opportunities; students showed greater success levels and enhanced information retention.

Case study 3: Virtual Reality Simulations for Medical Training. Background: this case study, which incorporates Web 4.0 technology, focuses on the use of VR simulations in medical training programmes. ICT tools and technologies: VR headgear, VR games, and haptic technology. Pedagogy and learning outcomes: VR simulations gave medical students engaging and lifelike learning experiences, enabling practical training in a secure and regulated setting, improving skill acquisition and decision-making, and boosting confidence.

Case study 4: Mobile Learning for Language Education. Background: this case study investigates the use of Web 2.0 and Web 3.0 technologies to develop mobile learning initiatives in language teaching. ICT tools and technologies: mobile devices, real-time translation tools, and language learning applications. Pedagogy and learning outcomes: mobile learning made it possible to access language learning resources anywhere at any time, enabling personalized and independent study; students used the language authentically, worked on their communication abilities, and improved their fluency and proficiency.

Case study 5: AI-Powered Assessment in Higher Education. Background: this case study, which incorporates Web 4.0 technology, looks at the use of AI-powered evaluation tools in higher education. ICT tools and technologies: natural language processing and automated evaluation systems. Pedagogy and learning outcomes: the AI-driven evaluation tools gave quick, individualized feedback, allowing for prompt interventions and customized support; individualized instruction helped students learn more effectively and comprehend the course material more fully.

research results highlighted the necessity for efficient programmes for teacher profes-
sional development to advance pedagogical abilities and ICT competence. Chal-
lenges were also mentioned in the literature, including the requirement for technical
infrastructure, teacher preparation, and guaranteeing that all pupils have equitable
access to technology.

5.2 Case Study Analysis

The case studies offered actual instances of ICT tools and technologies being success-
fully applied in content generation, distribution, and assessment. Each case study
demonstrated distinct methods and results, demonstrating the adaptability of ICT
integration in various educational situations. Improved learning outcomes, personal-
ized learning experiences, and more student engagement were common benefits seen
throughout the case studies. The necessity for initial technology investment, contin-
uous technical assistance, and seamless integration of ICT into current curricula
and pedagogical practices were among the difficulties noted. The growing usage of
adaptive learning platforms, virtual reality simulations, and AI-powered evaluation
systems for more individualized and interactive learning experiences were notable
trends identified.

5.3 Comparison and Contrast

The investigation revealed both similarities and differences in the difficulties,
benefits, and results of adopting ICT in content production, distribution, and
assessment. The
need for technical infrastructure, teacher training, and tackling the digital gap among
pupils were common problems. Increased student involvement, better resource avail-
ability, and the possibility of individualized instruction were all benefits noted.
Results were reported in a variety of ways, including greater motivation and self-
directed learning, improved student accomplishment, and improved critical thinking
and problem-solving abilities. The combination of AI, ML, AR, and VR technologies
was a key development, providing immersive and interactive learning experiences.

5.4 Discussion of Significant Trends

The inquiry highlighted the growing trend of combining artificial intelligence (AI),
machine learning (ML), augmented reality (AR), and virtual reality (VR) technolo-
gies in content development, delivery, and evaluation, leading to more immersive and
personalized learning experiences. Because adaptive learning platforms and intelli-
gent tutoring systems are integrated, ICT can provide customized instruction and
flexible feedback. The relevance of student participation and information sharing
was highlighted by the focus on collaborative and social learning using Web 2.0
tools. The use of AI-driven assessment systems signaled a change to evaluation
practices that are more effective and data-driven. Overall, the research showed that
the incorporation of ICT in education has led to a move towards learner-centered,
interactive, and personalized methods.
By analyzing the findings from the literature review and case studies, this research
highlights the common challenges, advantages, and outcomes of using ICT in content
creation, delivery, and evaluation. The exploration of major themes sheds light on
how pedagogy is changing as well as the potential of ICT to revolutionize education.
These results add to the body of current information and provide a framework for
further study and the creation of best practices in ICT-integrated schooling.

6 Proposed Model: Vital Crimp

“VITAL CRIMP” stands for Virtual and Augmented Reality Technologies and AI
Leveraging Communication, Revolutionizing ICT in Machine Learning Pedagogy. The
“Vital CRIMP Model” represents an innovative approach to modern educational
methodologies, particularly within the realm of technology. This model integrates
the immersive capacities of Virtual and Augmented Reality (VR/AR) with the
sophisticated algorithms of Artificial Intelligence (AI).
What sets this model apart is its utilization of VR and AR to provide learners
with an enriched, three-dimensional interactive experience, elevating the traditional
confines of classroom learning. This immersive environment facilitates a deeper
understanding and engagement with the subject matter. Additionally, the integra-
tion of AI ensures that educational content is tailored to the individual learner’s
preferences and needs, optimizing the learning experience.
The synergy of AI and VR/AR not only amplifies interactivity and personalization
but also enhances the overall effectiveness and creativity of the learning process. It
fosters collaborative efforts, stimulates critical thinking, and presents opportunities
for innovative problem-solving within an enriched educational landscape.
In an era where technological advancements are consistently reshaping educa-
tional paradigms, the Vital CRIMP Model emerges as a leading-edge initiative.
It underscores the potential of harnessing technological integration to amplify
educational efficacy and innovation.
1. Objectives of the Model
• To integrate VR, AR, AI, and ML with ICT in pedagogy;
• To provide immersive learning experiences using VR and AR;
• To personalize learning using AI and ML;
• To facilitate communication and access to information using ICT;
• To revolutionize teaching and learning.
2. Advantages Over Existing Models
• Provides immersive learning experiences using VR and AR;
• Personalizes learning using AI and ML;
• Facilitates communication and access to information using ICT;
• Adapts to the learner’s level of understanding using ML;
• Revolutionizes teaching and learning.

6.1 Step-by-Step Working of the Model

1. Delivery of material: To offer personalized material, AI-powered systems
examine learner data (such as learning preferences, styles, and performance). This
guarantees that the information is interesting and relevant to each learner.
2. Immersive Learning: VR and AR technologies create immersive learning
experiences that enhance understanding and retention. For instance, AR may make
textbook material come to life while VR can be utilized for virtual field excursions.
3. Adaptive Assessment: ML algorithms use learner data (such as performance on
prior assessments) to modify upcoming tests to the learner’s level of comprehension.
This guarantees that exams are difficult but manageable.
4. Feedback and Improvement: AI gives learners immediate feedback, assisting
them in identifying areas in which they may develop. In order to customize the
learning route for each student, ML algorithms analyze this data over time. This
guarantees that the learning process is always being improved (Fig. 1).

Start
↓
1. Delivery of Material: AI systems examine learner data to offer personalized and relevant material
↓
2. Immersive Learning: VR and AR technologies create immersive experiences that enhance understanding and retention
↓
3. Adaptive Assessment: ML algorithms analyze learner data to adapt future assessments to the learner’s level of understanding
↓
4. Feedback and Improvement: AI provides instant feedback, and ML algorithms analyze this data over time to adapt the learning path for each learner
↓
End

Fig. 1 Flowchart of the model
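Steps 3 and 4 of the model can be sketched in a few lines. The mastery update rule and the difficulty thresholds below are illustrative inventions, not part of the VITAL CRIMP specification: an exponentially weighted average of recent correctness stands in for an ML model, and the difficulty rule keeps items “difficult but manageable”.

```python
def update_mastery(mastery, correct, rate=0.3):
    """Exponentially weighted estimate of the learner's mastery, in [0, 1]."""
    return (1 - rate) * mastery + rate * (1.0 if correct else 0.0)

def next_difficulty(mastery, levels=("easy", "medium", "hard")):
    """Pick the hardest question level the current mastery estimate supports."""
    if mastery < 0.4:
        return levels[0]
    if mastery < 0.75:
        return levels[1]
    return levels[2]

# Simulate a learner who answers a run of questions correctly:
# mastery climbs and the assigned difficulty rises with it.
mastery = 0.5
for correct in [True, True, True, True]:
    mastery = update_mastery(mastery, correct)
    print(round(mastery, 3), next_difficulty(mastery))
```

A deployed system would replace the running average with a trained model (e.g. item response theory or a neural knowledge tracer), but the feedback loop of estimate, select, and re-estimate is the same.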

7 Conclusion

This research article has examined the significant impact of Web 2.0, 3.0, and 4.0 on
the production, distribution, and assessment of content in the context of ICT-driven
education. The adoption of ICT tools and technology has significantly altered how
educators create, deliver, and evaluate educational content. This study has demon-
strated the difficulties, benefits, and effects of integrating ICT in education through a
thorough assessment of the literature and analysis of case studies. The revolutionary
potential of Web 2.0, 3.0, and 4.0 to transform education is highlighted in this study
report. By using ICT in the design, delivery, and assessment of information, it is
possible to create effective, engaging, and personalized learning experiences. This
study, by synthesizing current information, offering insights into best practices, and
encouraging more research, significantly advances the topic of ICT in education. It
aids in maximizing ICT’s educational potential. This study highlights the dynamic
nature of technology and the value of ongoing professional growth, collaborative
learning settings, and effective use of cutting-edge tools. It provides intelligent guid-
ance on how educational institutions and teachers should employ Web 2.0, 3.0, and
4.0 in the classroom.

References

1. Selwyn N (2010) ICT in pedagogy: unleashing the power of Web 2.0, 3.0, and 4.0 for
transformative learning experiences. J Comput Assist Learn 26(1):65–73
2. Voogt J, Roblin NP (2012) A comparative analysis of international frameworks for 21st century
competences: implications for national curriculum policies. J Curric Stud 44(3):299–321
3. Hew KF, Cheung WS (2013) Use of Web 2.0 technologies in K-12 and higher education: the
search for evidence-based practice. Educ Res Rev 9:47–64
4. Bower M (2015) Augmented reality in education: cases, places, and potentials. Educ Media
Int 52(1):1–15
5. Prensky M (2001) Digital natives, digital immigrants. On the Horizon 9(5):1–6
6. Kukulska-Hulme A, Traxler J (2007) Designing for mobile and wireless learning. Routledge
7. Hew KF, Brush T (2007) Integrating technology into K-12 teaching and learning: current
knowledge gaps and recommendations for future research. Education Tech Research Dev
55(3):223–252
8. Dede C (2010) Comparing frameworks for 21st century skills. Harvard Educ Rev 80(2):76–108
9. Johnson L, Adams Becker S, Estrada V, Freeman A (2014) NMC Horizon Report: 2014 K-12
Edition. The New Media Consortium
10. Selwyn N (2016) Digital technology and the contemporary university: Degrees of digitization.
Routledge
Admission Prediction for Universities
Using Decision Tree Algorithm
and Support Vector Machine

Khushbu Trivedi, Jenisia Dsouza, Shivam Kumar, Vatsal Saxena, Shravani Kulkarni,
Susanta Das, Parineeta Kelkar, Piyush Bhosale, and Ritul Dhanwade

Abstract Nowadays, educational institutions rely on data analytics, probability
models, and a few weighted combination models to forecast which universities
students are likely to gain admission to, especially prestigious institutions like IITs.
The admission process considers various factors such as school scores and perfor-
mance in competitive exams. Numerous machine learning algorithms have been
developed to aid in admission predictions. In this study, decision tree and support
vector machine algorithms were used to predict which universities students are likely
to be accepted to based on the previous year’s data analysis. We also discussed the
accuracy of both models.

K. Trivedi · J. Dsouza · S. Kumar · V. Saxena · S. Kulkarni · S. Das (B) · P. Kelkar · P. Bhosale ·


R. Dhanwade
School of Engineering, Ajeenkya DY Patil University, Charoli Bk. Via Lohegaon, Pune,
Maharashtra 412105, India
e-mail: [email protected]
K. Trivedi
e-mail: [email protected]
J. Dsouza
e-mail: [email protected]
S. Kumar
e-mail: [email protected]
V. Saxena
e-mail: [email protected]
S. Kulkarni
e-mail: [email protected]
P. Kelkar
e-mail: [email protected]
P. Bhosale
e-mail: [email protected]
R. Dhanwade
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 195
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_17
196 K. Trivedi et al.

Keywords Decision tree algorithm · Support vector algorithm · Admission


prediction

1 Introduction

In India, many students aspire to join prestigious institutions such as IITs and NITs
but often do not apply due to low confidence in their academic performance [1]. To
assist students in determining their chances of admission, various machine learning
algorithms have been developed to predict university acceptance based on factors
such as exam scores and rankings. The accuracy of these predictors is essential for
students to make informed decisions about where to apply [1–4].
This paper aims to compare the accuracy of two machine learning algorithms,
decision tree and support vector machine, for predicting university admission using a dataset of
previous year’s 10th, 12th, and AIEEE (All India Engineering Entrance Examination)
exam scores and university acceptances. By training the algorithms on this dataset,
we can estimate the probability of a student’s acceptance into a reputable university.

2 Literature Survey

In the research, Patel et al. [5] explain that machine learning is a process of teaching
computers new skills through training and testing datasets, allowing them to make
predictions without explicit programming under different conditions. One of the
popular machine learning techniques is Decision Trees, which have been applied
in various industries and applications such as text extraction, medical certification,
statistical analysis, and search engines. There are several decision tree algorithms
available, including ID3, C4.5, and CART, each developed based on their accuracy
and cost-effectiveness. Selecting the most appropriate algorithm for each decision-
making scenario is crucial for efficient and accurate results.
In the research paper, Arunakumari et al. [6] explore the issue of students making
mistakes in their choice of preferred colleges, which can lead to regret later on.
Factors such as faulty college analysis, ignorance, and anxious projection can all
contribute to poor decision-making. To address this issue, the researchers developed
an automated web application prediction model for a college admission system that
utilizes data analysis and data mining techniques. By carefully reviewing the cut-off
numbers from the preceding five years, a preference list is created using inputs like
rank, category, preferred branches, desired districts, and selected universities. This
model aims to help students choose a good institution before being assigned and
make informed decisions to avoid future regrets.
In the paper by Singhal et al. [7], the authors explore the advantages of using
machine learning algorithms in accurately developing applications. While various

applications exist that claim to predict a student’s chances of getting a seat in a univer-
sity in the USA, most of them lack reliability and effectiveness. However, machine
learning offers several algorithms that can help create a reliable representation. The
objective of this study is to compare and determine which machine learning algo-
rithm—multi-linear regression, polynomial regression, or random forest—provides
the most accurate results for the given dataset. The inputs for these algorithms include
the candidate’s GRE score (Graduate Record Examinations), TOEFL score (Test of
English as a Foreign Language), and CGPA. The dataset is used to train the repre-
sentation, and the output is the percentage chance of securing a seat in a reputed
university.
In their research paper, Chithra et al. [8] address the challenges faced by students
seeking higher education in the United States, particularly those pursuing a master’s
degree. The study focuses on creating a model, called UAP, that takes into account
all the important factors that affect a student’s admission to a university in the US.
These factors include test scores, statement of purpose, letter of recommendation,
and the selection of universities to apply to. The UAP model provides a user-friendly
interface for students to access and accurately predicts their chances of admission to
the universities of their choice.
In the study, Aljasmi et al. [9] emphasize the significance of precise forecasting
of student admission for educational institutions. Using multiple machine learning
algorithms such as multiple linear regression, k-nearest neighbor, random forest, and
multilayer perceptron, the researchers determined the probability of a student being
admitted to a master’s programme. The multilayer perceptron model outperformed
the other models, providing students with essential information on their admission
prospects.
In the research paper, Rajagopal [10] explores the use of logistic regression to
predict university admittance based on various variables. Specifically, the study
focuses on predicting admittance to master’s programs, which typically receive
a high volume of applications. By statistically analyzing independent factors, the
study aims to develop predictive models that can assist in prioritizing the application
screening process and ultimately admit the most qualified applicants. The success of
this approach could have significant implications for improving the efficiency and
accuracy of the graduate school admissions process.

3 Algorithm

Machine Learning can be broadly classified into three categories: supervised


learning, unsupervised learning, and reinforcement learning [11]. Supervised
Learning deals with labeled data, while unsupervised learning deals with unlabeled
data. Supervised Learning is further classified into two subcategories: Classification
and Regression. On the other hand, unsupervised learning is divided into two types:
Clustering and Association. Both Decision Tree and Support Vector Machines are

classification algorithms, where the output variable is categorical, such as Yes or No,
True or False, etc. Let’s explore these algorithms in detail.

3.1 Decision Tree Algorithm

Decision trees are a popular machine learning algorithm that is widely used in data
mining and decision-making applications [12, 13]. A decision tree consists of a root
node, branches, and leaves. The root node serves as the parent of all other nodes, and
it is the topmost node in the tree. The branches represent the possible outcomes of a
decision, while the leaves represent the final outcome or result.
The algorithm selects the best features and criteria at each node to split the dataset
into subsets, aiming to maximize information gain for classification or minimize
variance for regression.
As further elaborated in [12], decision trees are a type of acyclic graph with a
fixed root. Each node in the tree corresponds to an attribute in the data, and the edges
indicate a decision based on that attribute. By learning basic decision rules inferred
from the data features, the decision tree method seeks to build a model that can
predict the value of a target variable.
Each leaf node in a decision tree is assigned a class corresponding to the most
appropriate target value; alternatively, the leaf may store a probability vector
giving the probability of each possible value of the target attribute. To classify
an instance, we traverse from the tree's root node to a leaf node according to the
outcomes of the tests along the path [12].
The usage of decision trees is widespread in many fields, including engineering,
medicine, and finance [12, 13]. They have several benefits, such as being easy to
understand, straightforward, and able to handle both categorical and numerical data.
They can, however, overfit and be sensitive to slight adjustments in the input data.
All things considered, decision trees are a potent machine learning technique that
may be applied to a variety of classification and regression issues. They are highly
interpretable, making them suitable for explaining the reasoning behind predictions.
They are a well-liked option in the field of data science and machine learning since
they are simple to comprehend, analyze, and apply [5, 12, 13].
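The ideas above can be sketched with scikit-learn. The toy data below (school marks and an entrance-exam rank mapped to a college tier) is a made-up illustration, not the paper's dataset:

```python
# Minimal decision tree sketch on hypothetical toy data (not the paper's dataset).
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [10th-grade %, 12th-grade %, entrance-exam rank]
X = [[85, 80, 1200], [92, 90, 300], [60, 55, 90000],
     [88, 84, 800], [70, 65, 40000], [95, 93, 150]]
y = ["Tier2", "Tier1", "Tier3", "Tier2", "Tier3", "Tier1"]  # admitted college tier

# Splits are chosen to maximize information gain (criterion="entropy").
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

print(clf.predict([[95, 94, 100]]))  # classify a new applicant
# The learned rules are human-readable, illustrating interpretability.
print(export_text(clf, feature_names=["marks10", "marks12", "rank"]))
```

Printing the tree with `export_text` makes the point about interpretability concrete: each prediction can be traced through a small set of threshold tests.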

3.2 Support Vector Machine

A common supervised learning method used in classification and regression anal-


ysis is called Support Vector Machines (SVM) [14–16]. It has been widely used in
a variety of industries, including bioinformatics, image classification, pattern recog-
nition, and text classification. Finding the ideal hyperplane or decision border that
categorizes data points into multiple classes is the main objective of SVM. The deci-
sion boundary is established by maximizing the distance between the nearest data

points from each class to the border. The optimal hyperplane is the one with the
largest margin, achieving better generalization.
SVM uses a kernel function to map data points into a higher-dimensional space in
which they can be separated by a hyperplane; this allows it to handle non-linear
data and separate complex patterns efficiently. The kernel function can be linear,
polynomial, radial basis function (RBF), or sigmoid, and its selection depends on
the data and the problem being addressed. SVM offers several benefits over other
classification algorithms. It performs well in high-dimensional spaces and remains
effective even when the number of dimensions exceeds the number of samples. SVM
is used in various domains, including image classification and bioinformatics,
and it excels where the data are high-dimensional and a clear separation between
classes exists.
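A minimal sketch of these ideas with scikit-learn's `SVC` follows. The toy data and the RBF-kernel choice are illustrative assumptions, and the inputs are standardized first because SVM margins are sensitive to feature scales:

```python
# Minimal SVM sketch on hypothetical toy data (not the paper's dataset).
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = [[85, 80, 1200], [92, 90, 300], [60, 55, 90000],
     [88, 84, 800], [70, 65, 40000], [95, 93, 150]]
y = ["admit", "admit", "reject", "admit", "reject", "admit"]

# kernel can be "linear", "poly", "rbf", or "sigmoid"; the right choice
# depends on the data, as discussed above.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X, y)
print(model.predict([[95, 92, 200]]))
```

The `StandardScaler` step matters here: without it, the rank feature (on a scale of tens of thousands) would dominate the kernel distances and drown out the marks.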

4 Implementation

In this section, we present the practical implementation of the Admission Predic-


tion system using two prominent machine learning algorithms: Decision Tree and
Support Vector Machine (SVM). We will briefly review the steps involved in
data preprocessing, model training, hyperparameter tuning, and model evaluation
[14–16]:
1. Data preprocessing
a. Understanding the Problem: Before anything else, it is important to consider
the concerns students face before enrolling in a university. The aim of the
study should then be to address those concerns.
b. Data Collection: Data can be collected from different sources such as Kaggle,
the UCI Machine Learning Repository, Yocket, etc. After the relevant data is
collected, it is pre-processed and cleaned. Pre-processing is the set of actions
that transform raw, real-world data, which is typically noisy, incomplete, and
inconsistent, into a clean dataset suitable for analysis.
c. Feature Selection: The process of selecting pertinent features from the pre-
processed data that significantly affect the result is known as feature selec-
tion. This step helps to reduce the complexity of the model and improve its
accuracy.
2. Algorithm Selection: This step allows for the performance measures of various
machine learning methods, including decision trees, random forest, support
vector machines (SVM), artificial neural networks (ANN), and others, to be
assessed. Metrics like accuracy, precision, recall, F1-score, etc. may be taken into
account. In this study, we use the SVM and Decision Tree models. The Decision Tree algorithm
is chosen for its interpretability and ability to capture non-linear relationships
within the data. In addition to Decision Trees, we employ the Support Vector
Machine algorithm, which is known for its effectiveness in handling complex
decision boundaries.
3. Model Training: For both Decision Tree and SVM, the dataset is split into training
and validation sets to train the models. We employ the training data to fit the
models and iteratively refine them through cross-validation techniques. During
this phase, feature importance analysis is performed for Decision Trees to gain
insights into the admission factors.
4. Model Evaluation: The performance of the Decision Tree and SVM models is
assessed using evaluation metrics such as accuracy, precision, recall, and
F1-score. Statistical significance
tests are employed to ascertain the differences in performance between the two
models. This stage assists in identifying each model's strengths and weaknesses
and determining whether any additional improvements are required.
5. Model Deployment: Model deployment is a crucial step in turning the trained
machine learning models into practical tools for making admission predictions.
Once the model is evaluated and deemed satisfactory, it can be deployed for
college admission prediction. Users can input their academic and personal infor-
mation, and the model will predict the college to which they are most likely to
gain admission, based on the input data (Fig. 1).
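The steps above can be sketched end-to-end. Because the paper's CSV is not reproduced here, the sketch substitutes synthetic data with hypothetical column names and a hypothetical labeling rule:

```python
# End-to-end sketch of the pipeline above, using synthetic data in place of the
# paper's CSV; the column names and labeling rule are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "marks_10th": rng.uniform(50, 100, n).round(1),
    "marks_12th": rng.uniform(50, 100, n).round(1),
    "aieee_rank": rng.integers(1, 100_000, n),
})
# Hypothetical rule standing in for historical admission outcomes.
df["college"] = np.where(df["aieee_rank"] < 10_000, "TierA",
                np.where(df["marks_12th"] > 75, "TierB", "TierC"))

X_train, X_test, y_train, y_test = train_test_split(
    df[["marks_10th", "marks_12th", "aieee_rank"]], df["college"],
    test_size=0.2, random_state=0)

results = {}
for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                    ("SVM", SVC())]:
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name]:.4f}")
```

On data like this, the unscaled AIEEE rank dominates the SVM's kernel distances, so the decision tree tends to score much higher, echoing the accuracy gap reported in the Results section below.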

5 Results

Based on a student’s 10th- and 12th-grade marks, as well as their AIEEE rank, the
provided code conducts a classification exercise to estimate the college to which they
can be admitted. The dataset, called “College Admission Prediction Dataset.csv,”
is a CSV file that is read with pandas and manipulated with NumPy. The data,
collected over the years 2015 to 2019, cover factors such as year, 10th-grade
marks, 12th-grade marks, AIEEE rank, and college choices, capturing students'
academic performance and their preferences for engineering colleges. Collecting
these data makes it possible to understand trends and patterns in student
performance and to gain insight into the factors influencing college
choices. The metrics module from scikit-
learn is used to determine the accuracy scores of the Decision Tree Classifier and
SVM (Support Vector Machine) models for classification. The Decision Tree model’s
accuracy score is 0.9013, which shows that it does a good job of predicting the
colleges that a student would be admitted to based on the input features. The SVM
model, on the other hand, has a substantially lower accuracy score than the Decision
Tree model, at 0.5888. This may be because the dataset was small, and small
datasets are a poor fit for the SVM model for several reasons: overfitting, lack
of data diversity, reduced model complexity, a limited margin for error, and
generalization issues.

Fig. 1 Steps of the model evaluation process (flowchart nodes: understand
problem, data quality, feature selection, algorithm selection, model training,
model error, model evaluation, model deployment)

Fig. 2 Comparison of decision tree and SVM algorithms

To address these challenges, it is


essential to either gather more data or consider using simpler models that are better
suited for small datasets.
While testing the algorithms, the user can enter their grades and rank to obtain
a forecast of the college they can get into. User input was gathered with
Python's input() function and then passed to the Decision Tree Classifier for
prediction.
Using the matplotlib library, a bar plot is produced to show the performance of
the two models. This graphical representation allows users to easily discern the
accuracy ratings of both models. The comparison reveals that the Decision Tree
model surpasses the SVM model in terms of accuracy, making it the more accurate
choice for predicting college admissions based on the given academic data.
The graph in Fig. 2 displays a comparison of the algorithms: the X-axis shows
the algorithm, and the Y-axis shows the accuracy.
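A plot along the lines of Fig. 2 can be produced with matplotlib, using the two accuracy scores reported above (the styling choices are illustrative, not the paper's exact script):

```python
# Sketch of the Fig. 2 comparison bar chart, using the reported accuracy scores.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; call plt.show() interactively
import matplotlib.pyplot as plt

scores = {"Decision Tree": 0.9013, "SVM": 0.5888}
fig, ax = plt.subplots()
ax.bar(list(scores), list(scores.values()), color=["tab:blue", "tab:orange"])
ax.set_xlabel("Algorithm")
ax.set_ylabel("Accuracy")
ax.set_title("Decision Tree vs. SVM accuracy")
fig.savefig("comparison.png")
```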

6 Conclusion

In summary, the study evaluated the performance of decision tree and support vector
machine classifiers on a dataset for predicting universities that a student can get
admission into based on their 10th- and 12th-grade marks and AIEEE rank. In the
evaluation of both algorithms, the Decision Tree Classifier emerged as the superior
performer, achieving an impressive accuracy score of 0.9013. In contrast, the Support
Vector Machine (SVM) Classifier lagged behind with an accuracy score of 0.5888.
These results strongly indicate that the Decision Tree Classifier is a promising and
effective technique for predicting college admissions based on the provided academic
data. However, it’s important to note that there is room for further improvement. By
expanding the dataset with a larger and more diverse pool of academic records,

the model can be exposed to a broader range of scenarios and variations. This can
lead to a more robust and accurate predictive model. Additionally, fine-tuning the
hyperparameters of the Decision Tree Classifier can help optimize its performance
even further. Hyperparameter tuning involves systematically adjusting the settings
of the model to find the configuration that yields the best results.
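As a sketch of such a search (the grid values and the synthetic stand-in data are illustrative assumptions, not the paper's setup), scikit-learn's `GridSearchCV` systematically evaluates each configuration with cross-validation:

```python
# Hyperparameter tuning sketch with GridSearchCV on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))
y = (X[:, 2] < 30).astype(int)  # hypothetical admit/reject rule

param_grid = {"max_depth": [2, 4, 6, None],
              "min_samples_split": [2, 5, 10],
              "criterion": ["gini", "entropy"]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)           # the best configuration found
print(round(search.best_score_, 4))  # its mean cross-validated accuracy
```

The refit best estimator is then available as `search.best_estimator_` for making predictions.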
Overall, while the Decision Tree Classifier has demonstrated its effectiveness
in college admission prediction, ongoing efforts to enhance its accuracy through
dataset expansion and hyperparameter tuning can unlock its full potential, making it
an even more valuable tool for this application. This approach ensures that the model
continues to evolve and provides increasingly accurate predictions for prospective
college applicants.

Disclosure of Interests The authors have no competing interests to declare that are relevant to the
content of this work.

References

1. Raut R, Abnave J, Dikondwar S, Pandita S, Marudwar A (2023) Undergraduate college admis-


sion prediction system using decision tree classifier. In: Sharma KD, Peng S-L, Sharma R, Jeon
G (Eds.) Micro-electronics and telecommunications engineering (proceedings of 6th ICMETE
2022). Lecture Notes in Networks and Systems. Springer Nature
2. Liu Y-S, Lee L (2022) Evaluation of college admissions: a decision tree guide to provide
information for improvement. Human Social Sci Commun 9:390. https://doi.org/10.1057/s41599-022-01413-z
3. Basu K, Basu T, Buckmire R, Lal N (2019) Predictive models of student college commitment
decisions using machine learning. Data 4:65. https://doi.org/10.3390/data4020065
4. Reddy SB, Pallavi B, Shruthi B, Rohini T (2022) Machine learning framework for prediction
of admission in engineering college. J Crit Rev 10(01):133–143. ISSN: 2394-5125
5. Patel HH, Prajapati P (2018) Study and analysis of decision tree based classification algorithms.
Int J Comput Sci Eng 6(10):74–78
6. Arunakumari BN, Vishnu SHK, Sheetal N, Shashidhar R (2021) An automated predic-
tion model for college admission system. Ilkogretim Online-Elementary Education Online
20(6):1172–1180
7. Singhal S, Sharma A (2020) Prediction of admission process for graduate studies using AI
algorithm. Eur J Mol Clin Med 7(4):116–120
8. Chithra ADA, Malepati CN, Rohith P, Bindu SS, Swaroop S (2020) Prediction for university
admission using machine learning. Int J Recent Technol Eng 8(6):2277–3878
9. Aljasmi S, Nassif BA, Shahin I, Elnagar A (2020) Graduate admission prediction using machine
learning. Int J Comput Commun 14:79–83. https://doi.org/10.46300/91013.2020.14.13
10. Rajagopal PKS (2020) Predicting student university admission using logistic regression. Eur J
Comput Sci Inform Technol 8(3):46–56
11. Jhaveri HR, Revathi A, Ramana K, Raut R, Dhanaraj KR (2022) A review on machine
learning strategies for real-world engineering applications. Mobile Inform Syst 2022 (article ID 1833507):1–26. https://doi.org/10.1155/2022/1833507
12. Rokach L, Maimon O (2005) Decision trees. In: Maimon O, Rokach L (eds) Data mining and
knowledge discovery handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_9. ISBN: 978-0-387-24435-8
13. Blockeel H, Devos L, Frenay B, Nanfack G, Nijssen S (2023) Decision trees: from efficient
prediction to responsible AI. Front Artif Intell 6:1124553. https://doi.org/10.3389/frai.2023.1124553

14. Evgeniou T, Pontil M (2001) Support vector machines: theory and applications. In: Paliouras
G, Karkaletsis V, Spyropoulos CD (eds) Machine learning and its applications. ACAI 1999.
Lecture Notes in Computer Science, 2049. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44673-7_12. ISBN: 978-3-540-42490-1
15. Nalepa J, Kawulok M (2019) Selecting training sets for support vector machines: a review.
Artif Intell Rev 52:857–900. https://doi.org/10.1007/s10462-017-9611-1
16. Joshi VA (2020) Machine learning and artificial intelligence. Springer. ISBN: 978-3-030-26624-0. https://doi.org/10.1007/978-3-030-26622-6
Visualization and Statistical Analysis
of Research Pillar of Top Five THE
(Times Higher Education)-Ranked
Universities for the Years 2020–2023

Susanta Das, Shravani Kulkarni, Jenisia Dsouza, Piyush Bhosale,


Ritul Dhanwade, Khushbu Trivedi, Parineeta Kelkar,
Debanjali Barman Roy, and Ranjit Kumar

Abstract We conducted an analysis of the research pillar in the top five universities
ranked by THE for the years 2020–2023, using data obtained from the THE website.
To derive meaningful insights, we calculated the average research data for each
year across these universities. Subsequently, we compared the research data between
the 1st-ranked university and the remaining four universities, as well as between
consecutively ranked universities for each year. Our analysis demonstrated variations
over these years. Initially, there was an upward trend in average research perfor-
mance from 2020 to 2022, followed by a decline from 2022 to 2023. Interestingly,

S. Das (B) · S. Kulkarni · J. Dsouza · P. Bhosale · R. Dhanwade · K. Trivedi · P. Kelkar ·


D. B. Roy · R. Kumar
School of Engineering, Ajeenkya DY Patil University, Charoli Bk.Via Lohegaon, District Pune,
Maharashtra 412105, India
e-mail: [email protected]
S. Kulkarni
e-mail: [email protected]
J. Dsouza
e-mail: [email protected]
P. Bhosale
e-mail: [email protected]
R. Dhanwade
e-mail: [email protected]
K. Trivedi
e-mail: [email protected]
P. Kelkar
e-mail: [email protected]
D. B. Roy
e-mail: [email protected]
R. Kumar
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 205
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_18

we observed instances where lower-ranked universities outperformed higher-ranked


ones in specific years. Furthermore, we noted that the inclusion or exclusion of a
particular university had a noteworthy impact on the research pillar, leading to fluc-
tuations in the average values, both upward and downward. In our investigation, we
also considered the potential influence of the COVID-19 pandemic on the research
pillar and consequently on university rankings.

Keywords The World University ranking · Research pillar · Top five university ·
Graphical visualization · Statistical analysis · COVID-19

1 Introduction

World university ranking has become an important tool for students, researchers,
and policymakers to evaluate the quality and impact of universities [1–7]. THE
World University Rankings, an annual publication of university rankings by the
Times Higher Education, is one of the most popular, rigorous, and widely accepted
university rankings in the world [2, 3]. The data from participating institutions,
reputation survey, and other sources (e.g., Scopus database) are grouped into five
categories or “pillars”- Teaching, Research, Citations, International Outlook, and
Industry Income [3]. The universities are then finally ranked based on their Overall
scores and Z-score normalization [3].
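THE's exact normalization procedure is not reproduced here, but z-score standardization in general rescales each indicator to mean 0 and standard deviation 1 so that indicators on different scales can be combined. A textbook sketch, using purely for illustration the 2023 research scores that appear later in Table 1:

```python
# Generic z-score standardization; a textbook sketch, not THE's exact procedure.
from statistics import mean, stdev

def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

research_2023 = [99.70, 99.00, 99.50, 96.70, 93.60]  # 2023 scores from Table 1
print([round(z, 2) for z in z_scores(research_2023)])
```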
The research pillar plays a significant role in determining the overall ranking
of universities [1, 3, 4, 8–18]. It reflects the contribution of universities to the
advancement of knowledge and innovation in their respective fields [14]. It also
reflects the universities’ commitment to conducting high-quality research through
industry-university partnerships and international collaboration that has a significant
impact on the economy and society [15–18]. The reputation survey, research income per
academic staff member (faculty member), and the number of publications per staff
member (including researchers) are the three THE performance indicators (metrics) for the research
pillar [3]. The volume of research output is measured by the number of research publi-
cations produced by the university [3]. The income factor looks at the total research
income generated by the university, which includes research grants and contracts [3].
The reputation of research output is assessed through a survey of academics who are
asked to rate the research quality of institutions globally [3].
We presented graphical presentations and statistical analysis of the research pillar
of the top five THE ranked universities for the years 2020–2023, using data obtained
from its website [3]. We observed notable variation in the average value of the
research pillar over the years. We further compared the pillar between the 1st and other
four ranks (universities), and between consecutive rankings (universities) each year,
and detected fluctuations among the ranks (universities). We noted that occasionally
lower ranked universities performed better than higher ranked ones. Our analyses also
included the effect of inclusion or exclusion of a university on the research pillar and
revealed the substantial fluctuations of average value. The unprecedented worldwide

lockdown due to the coronavirus disease (COVID-19) pandemic upended society


and disrupted the eco-system of academics [19–22]. Therefore, we further studied
the influence of COVID-19 on this pillar since the period 2020–2023 coincided with
the worldwide lockdown. Our qualitative analysis suggests that the pandemic may
have had a mixed role in affecting the pillars and consequently the ranking of the
universities.

2 Results and Discussion

The overall ranks and data of the research pillar of the top five ranked universities
for the period 2020–2023 are displayed in Table 1 along with the calculated average
(mean), median, and standard deviation (σ). It’s important to note that this ranking
is derived from the Overall score, which encompasses all five pillars and involves
Z-score normalization [3]. We observed that, in each year, five among six universities, University of Oxford (Oxford), University of Cambridge (Cambridge), Harvard University (Harvard), Stanford University (Stanford), California Institute of Technology (CalTech), and Massachusetts Institute of Technology (MIT), constitute the top-five band [3].
6th ranked university and Harvard, which occupied the 7th rank solely for the year
2020, in Table 1 to facilitate a comprehensive analysis. Table 2 shows the research
differences (gaps) between the 1st and each of the remaining four ranks (universities)
(i.e., R12, R13, R14, and R15), as well as between consecutive ranks (universities)
(i.e., R12, R23, R34, and R45).
The trends in the average, median, and standard deviation of the research scores,
shown in Table 1 and Fig. 1, indicate that the average research score increased
from 2020 to 2022 but then decreased in 2023.
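These summary statistics can be reproduced directly from the research scores in Table 1; the sketch below checks the 2023 and 2020 figures (the σ column appears to truncate rather than round the sample standard deviation, e.g., 2.586 → 2.58):

```python
# Reproduce the 2023 and 2020 averages, medians, and standard deviations of Table 1.
from statistics import mean, median, stdev

research = {
    2023: [99.70, 99.00, 99.50, 96.70, 93.60],
    2020: [99.60, 97.20, 98.70, 96.40, 92.40],
}
for year, scores in research.items():
    print(year, f"mean={mean(scores):.2f}", f"median={median(scores):.1f}",
          f"sigma={stdev(scores):.3f}")
```

This yields means of 97.70 and 96.86 and medians of 99.0 and 97.2, matching Table 1, with sample standard deviations of about 2.586 and 2.789.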
In 2020, Harvard, with a research score of more than 98, was not part of the top
five, while MIT, with a research score of 92.40, ranked 5th. This produced the
widest variation in research scores between 1st-ranked Oxford and 5th-ranked MIT,
and therefore the maximum standard deviation (σ) of 2.78; the median was 97.2.
its research score to 94.4. Research scores of the other four universities remained
almost the same though Harvard with 98.80 replaced Cambridge with a score of 99.2
and occupied the 3rd rank. The average research score slightly improved from 2020
to 2021. However, the median slightly decreased due to the entry of Harvard and exit
of Cambridge (left panel of Fig. 1). The σ also further decreased in 2021 due to a
smaller variation of research scores between 1st and 5th ranks in comparison to the
year 2020 (right panel of Fig. 1).
In 2022, Cambridge improved its research score, from 99.2 in 2021 to 99.50 in
2022, and occupied the 5th rank by replacing MIT with the same research score
of 94.4 as in 2021. The entry of Cambridge and exit of MIT in 2022 improved the
average research score of the top five ranks and hence, the median. As a result, the
σ was the lowest ever during the considered period since the gap between the 1st
and 5th ranks was also lowest. However, in 2023, MIT re-took the 5th rank with a

Table 1 Overall rank and research pillar data for the period 2020–2023

2023
Rank University Research
1 Oxford 99.70
2 Harvard 99.00
3 Cambridge 99.50
4 (3) Stanford 96.70
5 MIT 93.60
Total 488.50
Average 97.70
Median 99.00
Stand. Dev. (σ) 2.58
6 CalTech 97.00

2022
Rank University Research
1 Oxford 99.60
2 CalTech 96.90
3 (2) Harvard 98.90
4 Stanford 96.80
5 Cambridge 99.50
Total 491.70
Average 98.34
Median 98.90
Stand. Dev. (σ) 1.38
6 (5) MIT 94.40

2021
Rank University Research
1 Oxford 99.60
2 Stanford 96.70
3 Harvard 98.80
4 CalTech 96.90
5 MIT 94.40
Total 486.40
Average 97.28
Median 96.90
Stand. Dev. (σ) 2.02
6 Cambridge 99.20

2020
Rank University Research
1 Oxford 99.60
2 CalTech 97.20
3 Cambridge 98.70
4 Stanford 96.40
5 MIT 92.40
Total 484.30
Average 96.86
Median 97.20
Stand. Dev. (σ) 2.78
6 Princeton 96.30
7 Harvard 98.60

research score of 93.60, lower than the previous year (94.4 in 2022), and CalTech
with a research score of 97 exited from the top five. Consequently, the average
research score of the top five ranked universities in 2023 decreased, and σ again
became large due to a large gap in research score between the 1st and 5th ranks.
The median was found to be a maximum of 99 in 2023 (Table 1 and left panel of
Fig. 1). Therefore, it appears that the entry/exit of a particular university alters the
average research score of the top five. It is to be noted that average industry income increased in 2021 (data not shown), which may be attributed to collaborative research efforts between universities and funding agencies on COVID-19 vaccines and other research during the pandemic [23–28]. Consequently, average research output might have increased due to this factor.
It is pertinent to note that COVID-19 impacted the academic environment and the
well-being of students [19–22]. Therefore, all three factors, inclusion/exclusion of
university, industry income, and COVID-19, have had a mixed impact on the average
Visualization and Statistical Analysis of Research Pillar of Top Five … 209

Table 2 Differences between the 1st and each of the remaining four universities (ranks), and between consecutively ranked universities, in the research pillar. See also Table 1

2023 (1 Oxford, 2 Harvard, 3 Cambridge, 4 (3) Stanford, 5 MIT)
Gap to 1st:   R12 = 0.7, R13 = 0.2, R14 = 3.0, R15 = 6.1
Consecutive:  R12 = 0.7, R23 = −0.5, R34 = 2.8, R45 = 3.1

2022 (1 Oxford, 2 CalTech, 3 (2) Harvard, 4 Stanford, 5 Cambridge)
Gap to 1st:   R12 = 2.7, R13 = 0.7, R14 = 2.8, R15 = 0.1
Consecutive:  R12 = 2.7, R23 = −2.0, R34 = 2.1, R45 = −2.7

2021 (1 Oxford, 2 Stanford, 3 Harvard, 4 CalTech, 5 MIT)
Gap to 1st:   R12 = 2.9, R13 = 0.8, R14 = 2.7, R15 = 5.2
Consecutive:  R12 = 2.9, R23 = −2.1, R34 = 1.9, R45 = 2.5

2020 (1 Oxford, 2 CalTech, 3 Cambridge, 4 Stanford, 5 MIT)
Gap to 1st:   R12 = 2.4, R13 = 0.9, R14 = 3.2, R15 = 7.2
Consecutive:  R12 = 2.4, R23 = −1.5, R34 = 2.3, R45 = 4.0

research output. Post-pandemic, as the universities slowly re-opened their campuses, the professors/researchers had to balance their regular academic and administrative activities with research while also maintaining COVID-19 protocols, such as distancing to avoid the spread of the virus, and encountered other societal and economic factors [29–35].
Research performance across all universities and ranks for the period 2020–2023
is shown in Table 3. Bold values in Table 3a indicate that the corresponding university
was excluded from the top five in that particular year. These values are shown for
a better understanding of their performance in the research pillar as five out of six

[Figure 1 near here. Left panel: average research score and median of the top five ranks versus year (range 96.5–99.5). Right panel: standard deviation versus year (range 1.2–3.0).]

Fig. 1 Left panel: Variations of average and median of research output of top five ranks. Right panel: Variation of standard deviation for the period 2020–2023. Solid lines are to guide the eyes

universities always constitute the top five ranks as mentioned at the beginning of this
article.
The upper panel of Fig. 2 shows the variation of research scores across all six
universities (See also Table 3a).
The lower panel of Fig. 2 (Table 3b) shows the vertical variation of research
performance among five ranks in a particular year, and the horizontal variation of
research performance of a particular rank across all years. It clearly indicates that the research score fluctuated among the universities and ranks, and a lower-ranked (in overall score) university sometimes performed better than higher-ranked ones in the research pillar.
The left panel of Fig. 3 displays the variation in research difference (R) between
the 1st rank and the other four ranks (labeled as R12, R13, R14, and R15) for each
year from 2020 to 2023. The values are positive, with R13 being the minimum and
R15 being the maximum, except for 2022 where R13 (0.7) is higher than R15 (0.1)

Table 3a Research performance of all six universities for the period 2020–2023, since the group of the top five always comprises five of the six universities listed. Bold values indicate when a particular university was excluded from the top five in overall scores (rankings) for that year. See also Table 1
Year University
Oxford Harvard Cambridge Stanford MIT CalTech
2020 99.6 98.6 98.7 96.4 92.4 97.2
2021 99.6 98.8 99.2 96.7 94.4 96.9
2022 99.6 98.9 99.5 96.8 94.4 96.9
2023 99.7 99 99.5 96.7 93.6 97.0
Average 99.625 98.825 99.225 96.65 93.7 97.0
Variation 0.1 0.4 0.8 0.4 2.0 0.3

[Figure 2 near here. Upper panel: research scores (range 92–100) of all six universities. Lower panel: research scores (range 92–100) of ranks 1–5 across 2020–2023.]

Fig. 2 The variation of research pillar across all six universities (upper panel) and five ranks (lower panel)

as tabulated in Table 2. R13 decreased continuously over the years, and R12 also fell overall. R12 decreased from an average of about 2.6 over 2020–2022 (2.4 in 2020, 2.9 in 2021, and 2.7 in 2022) to 0.7 in 2023 (left panel of Fig. 3 and Table 2). The
right panel of Fig. 3 demonstrates the year-wise fluctuation of the average research
gap between the 1st ranked and the remaining four ranks for the period. It clearly
indicates that the yearly average research gap was minimal in 2022. This could be
attributed to the exclusion of MIT from the band of the top five in 2022. R13 decreased

Table 3b Research performance of top-five ranks for the period 2020–2023 (only the top five ranks are considered)

Year       Rank 1   Rank 2   Rank 3   Rank 4   Rank 5
2020       99.6     97.2     98.7     96.4     92.4
2021       99.6     96.7     98.8     96.9     94.4
2022       99.6     96.9     98.9     96.8     99.5
2023       99.7     99.0     99.5     96.7     93.6
Average    99.625   97.45    98.975   96.7     94.975
Variation  0.1      2.3      0.8      0.5      7.1

at a faster rate (slope −0.5) from 2022 (R13 = 0.7) to 2023 (R13 = 0.2) than over the period 2020–2022 (slope −0.1), as shown in Fig. 4.
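The two slopes quoted for R13 are consistent with an ordinary least-squares fit of the yearly R13 values over each sub-period; the fitting method is an assumption here, since the paper does not state it. A stdlib-only sketch:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# R13 values by year, read off Table 2
years = [2020, 2021, 2022, 2023]
r13 = [0.9, 0.8, 0.7, 0.2]

m1 = ols_slope(years[:3], r13[:3])   # fit over 2020-2022, slope ~ -0.1
m2 = ols_slope(years[2:], r13[2:])   # fit over 2022-2023, slope ~ -0.5
print(round(m1, 2), round(m2, 2))
```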
The research difference between the 2nd and 3rd ranks (R23) for each year for
the period 2020–2023 from Table 2 is graphically presented in Fig. 5. We found that
R23 was negative, demonstrating that the 3rd rank has been producing more output than the 2nd in the research pillar. However, the magnitude of the difference shrank by 2023. R23
in 2023 was found to be −0.5 in comparison to the average R23 of ~−1.9 during the
period 2020–2022 as presented in Fig. 5.
A comparative study of the research gaps among the first three consecutive ranks showed a non-linear trend for R12 and R23 (Fig. 6), unlike the linear R13 (Fig. 4). The research gap between the 3rd and 4th ranks (R34) was positive and

[Figure 3 near here. Left panel: year-wise research differences R12–R15 between the 1st and the other ranks for 2020–2023, annotated with the averages R12 (2020–2022) ≈ 2.6, R12 (2023) = 0.7, R12 (2020–2023) ≈ 2.18, R13 (2020–2023) = 0.65, R14 (2020–2023) ≈ 2.93, and R15 (2020–2023) = 4.65. Right panel: yearly average research gap between the 1st and the other ranks: 3.42 (2020), 2.9 (2021), 1.57 (2022), 2.5 (2023).]

Fig. 3 Left panel: Research performance gap between 1st and remaining ranks for 2020–2023. It is always positive. Solid lines guide the eyes. R12 decreased faster in 2023 than in 2020–2022, as did R13. R14 remained stable. R15 was highest in 2020 and lowest in 2022 but increased in 2023 compared to 2021 and 2022. Right panel: Year-wise variation of the average research gap (R) between 1st and other ranks. The yearly average research gap was minimum in 2022, attributed to the exclusion of MIT from the top five ranks

[Figure 4 near here: R13 versus year for 2020–2023, with linear fits of slope −0.1 over 2020–2022 and −0.5 over 2022–2023.]

Fig. 4 Research gap between 1st and 3rd ranks (R13) over the years 2020–2023. It decreases at different rates: the blue solid line is the linear fit for 2020–2022 with a slope of −0.1, and the red solid line is the linear fit for 2022–2023 with a slope of −0.5. The rate of decrease of R13 from 2022 to 2023 is faster than during 2020–2022

[Figure 5 near here: R23 versus year for 2020–2023; the values are negative, averaging about −1.9 over 2020–2022 and reaching −0.5 in 2023.]

Fig. 5 Research difference between the 2nd and 3rd ranks (R23) for the period 2020–2023

[Figure 6 near here: R12 and R23 versus year for 2020–2023.]

Fig. 6 Comparison of R12 and R23. They follow non-linear trends, unlike R13

fluctuated between 2 and 3, whereas the research gap between the 4th and 5th ranks varied from −2.7 to 4 (Table 2), revealing that the 5th-ranked university sometimes performed better in the research pillar than the 4th-ranked one.
It is important to emphasize that our analysis encompassed Overall scores, key
statistics, and five pillars of the top five ranked universities for the period 2020–
2023. However, in this presentation, we have focused exclusively on the research
pillar due to the constraints of scope. It’s worth noting that we observed fluctuations
in average values across the other pillars over the years, and the disparities among
ranks (universities) were not uniform. Furthermore, we identified instances where
lower-ranked universities outperformed higher-ranked ones in specific pillars. This
trend was also observed beyond the top five ranked universities, as evident from THE
website [3].
The COVID-19 pandemic severely affected economies and societies in many parts of the world [20–22, 29, 35–40]. The pandemic and the associated lockdown, societal, and economic factors had a mixed impact on the research pillar as well as the other four pillars, and therefore on the rankings, since our considered period coincided with it, as discussed above. The world is gradually recovering from the
COVID-19 pandemic and economic slowdown [35]. Universities are opening their
campuses gradually as travel and other restrictions are being lifted and are also facing
economic, societal, and other challenges inflicted by COVID-19 [39]. The priorities
are expected to shift [39].

3 Conclusion

We conducted a comprehensive analysis of the research pillar within the context of


the top five universities as ranked by THE (Times Higher Education) for the years
spanning from 2020 to 2023, utilizing data sourced from THE’s official website. Our
primary objective was to extract meaningful insights from this dataset. To achieve
this, we computed the annual average research performance metrics across these
universities. Subsequently, we conducted comparative assessments between the 1st-
ranked university and the remaining four institutions. Additionally, we scrutinized
the research data trends between universities with consecutive rankings for each
year. Our meticulous analysis unveiled notable fluctuations over this 4-year period.
Initially, there was an upward trajectory in average research performance from 2020
to 2022, followed by a subsequent decline from 2022 to 2023. Intriguingly, we iden-
tified instances where lower-ranked universities exhibited superior research perfor-
mance compared to their higher-ranked counterparts in specific years. Furthermore,
we observed that the inclusion or exclusion of a specific university significantly
influenced the research pillar metrics, resulting in fluctuating average values, both
upward and downward. Throughout our investigation, we also considered the poten-
tial impact of the COVID-19 pandemic on research activities, and by extension, on
university rankings. After the pandemic, the world is adopting new social trends and norms, such as safe distancing and mask-wearing, to limit the spread of the virus. It
will be fascinating to observe how these trends develop in the coming years and how
universities adapt to changing priorities and challenges.

Disclosure of Interests The authors have no competing interests to declare that are relevant to the
content of this article.

References

1. University rankings: A closer look for research leaders, https://www.elsevier.com/research-int


elligence/university-rankings-guide
2. Global University Rankings (2014) A critical assessment. Eur J Educ 49(1):1–158. onlineli-
brary.wiley.com/toc/14653435/2014/49/1
3. THE (Times Higher Education) World University Rankings; World University Rankings
2023-Methodology. https://www.timeshighereducation.com/world-university-rankings/2023/
world-ranking, https://www.timeshighereducation.com/world-university-rankings/world-uni
versity-rankings-2023-methodology
4. University-industry collaboration: A closer look for research leaders (2021). https://www.els
evier.com/research-intelligence/university-industry-collaboration
5. Mateus AM, Acosta JA (2022) Reputation in higher education: a systematic review. Front Educ
7:925117. https://doi.org/10.3389/feduc.2022.925117
6. Valero A, Reenen VJ (2019) The economic impact of universities: evidence from across the
globe. Econom Educ Rev 68:53–67. https://doi.org/10.1016/j.econedurev.2018.09.001
7. Denson N, Zhang S (2010) The impact of student experiences with diversity on developing
graduate attributes. Stud High Educ 35:529–543. https://doi.org/10.1080/03075070903222658

8. Dias A, Selan B (2023) How does university-industry collaboration relate to research resources
and technical-scientific activities? An analysis at the laboratory level. J Technol Transf 48:392–
415. https://doi.org/10.1007/s10961-022-09921-5
9. University Industry Collaboration – The vital role of tech companies’ support for higher
education research, THE Consultancy Report, THE (2020), https://www.timeshighereduc
ation.com/sites/default/files/the_consultancy_university_industry_collaboration_final_rep
ort_051120.pdf
10. Fabbri A, Lai A, Grundy Q, Bero AL (2018) The influence of industry sponsorship on the
research agenda: a scoping review. Am J Public Health 108(11):e9–e16. https://doi.org/10.
2105/AJPH.2018.304677
11. Valero A, Reenen VJ (2019) The economic impact of universities: evidence from across the
globe. Econom Educ Rev 68:53–67. https://doi.org/10.1016/j.econedurev.2018.09.001
12. Selten F, Neylon C, Huang KC, Groth P (2020) A longitudinal analysis of university rankings.
Quantit Sci Stud 1(3):1109–1135. https://doi.org/10.1162/qss_a_00052
13. Sjöö K, Hellström T (2019) University–industry collaboration: a literature review and synthesis.
Ind High Educ 33(4):275–285. https://doi.org/10.1177/0950422219829697
14. Hessels KL, Mooren CE, Bergsma (2021) What can research organizations learn from their
spin-off companies? Six case studies in the water sector. Ind Higher Educ 35(3):188–200.
https://doi.org/10.1177/0950422220952258
15. Odei AM, Novak P (2023) Determinants of universities’ spin-off creations. Econom Res
36(1):1279–1298. https://doi.org/10.1080/1331677X.2022.2086148
16. Robinson-Garcia N, Torres-Salinas D, Herrera-Viedma E, Docampo D (2019) Mining univer-
sity rankings: publication output and citation impact as their basis. Res Eval 28(3):232–240.
https://doi.org/10.1093/reseval/rvz014
17. Adams J (2012) The rise of research networks. Nature 490:335–336. https://doi.org/10.1038/
490335a
18. Adams J (2013) The fourth age of research. Nature 497:557–560. https://doi.org/10.1038/497
557a
19. The impact of coronavirus on higher education. https://www.timeshighereducation.com/hub/
keystone-academic-solutions/p/impact-coronavirus-higher-education
20. Johnson PT, Feeney KM, Jung H, Frandell A, Caldarulo M, Michalegko L, Islam S, Welch WE
(2021) COVID-19 and the academy: opinions and experiences of university-based scientists
in the U.S. Human Soc Sci Commun 8:146. https://doi.org/10.1057/s41599-021-00823-9
21. Reyes-Portillo AJ, Warner MC, Kline AE, Bixter TM, Chu CB, Miranda R, Nadeem E, Nick-
erson A, Peralta OA, Reigada L, Rizvi SL, Roy KA, Shatkin J, Kalver E, Rette D, Denton E,
Jeglic LE (2022) The psychological, academic, and economic impact of COVID-19 on college
students in the epicenter of the pandemic. Emerg Adulthood 10(2):473–490. https://doi.org/
10.1177/21676968211066657
22. Gómez-García G, Ramos-Navas-Parejo M, de la Cruz-Campos JC, Rodríguez-Jiménez C (2022) Impact of COVID-19 on university students: an analysis of its influence on
psychological and academic factors. Int J Environ Res Public Health 19:10433. https://doi.org/
10.3390/ijerph191610433
23. Jack P (2022) Covid hit to university-industry collaboration in UK ‘limited’. THE
(2022), https://www.timeshighereducation.com/news/covid-hit-university-industry-collabora
tion-uk-limited
24. Webster P (2020) How is biomedical research funding faring during the COVID-19 lockdown?
Nat Med. https://doi.org/10.1038/d41591-020-00010-4
25. Editorial (2020) Safeguard research in the time of COVID-19. Nat Med 26:443. https://doi.
org/10.1038/s41591-020-0852-1
26. Crow MM et al. (2020) Support U.S. research during COVID-19. Science 370(6516):539–540.
https://doi.org/10.1126/science.abf1225
27. Mervis J (2020) U.S. academic research funding stays healthy despite pandemic. Science
368(6497):1298. https://doi.org/10.1126/science.368.6497.1298

28. Ulrichsen CT (2021) Innovating during a crisis-the effects of the COVID-19 pandemic on how
universities contribute to innovation, National Centre for Universities and Business and Univer-
sity Commercialization & Innovation (UCI) Policy Evidence Unit. https://www.ifm.eng.cam.
ac.uk/uploads/UCI/knowledgehub/documents/2021_UCI_Covid_Universities_Report2.pdf
29. Keshky ESEM, Basyouni SS, Sabban AMA (2020) Getting through COVID-19: the pandemic’s
impact on the psychology of sustainability, quality of life, and the global economy – a systematic
review. Front Psychol 11:585897. https://doi.org/10.3389/fpsyg.2020.585897
30. Woolston C (2021) Job losses and falling salaries batter US academia. Nature. https://doi.org/
10.1038/d41586-021-01183-9
31. Ahlburg AD (2020) Covid-19 and UK Universities. Polit Quarter 91(3):649–654. https://doi.
org/10.1111/1467-923X.12867
32. Gilbert N (2021) UK academics seethe over universities’ cost-cutting moves. Nature 596:307–
308. https://doi.org/10.1038/d41586-021-02163-9
33. Witze A (2020) Universities will never be the same after the coronavirus crisis: How virtual
classrooms and dire finances could alter academia: part 1 in a series on science after the
pandemic. Nature 582:162–164. https://doi.org/10.1038/d41586-020-01518-y
34. Horowitz MJ, Brown A, Minkin R (2021) A year into the pandemic, long-term financial
impact weighs heavily on many Americans. Pew Research Center (2021). https://www.pew
research.org/social-trends/2021/03/05/a-year-into-the-pandemic-long-term-financial-impact-
weighs-heavily-on-many-americans/
35. Ramlo ES (2021) Universities and COVID-19 pandemic: comparing views about how to address
the financial impact. Innov Higher Educ 46:777–793. https://doi.org/10.1007/s10755-021-095
61-x
36. Harper L, Kalfa N, Beckers AMG, Kaefer M, Nieuwhof-Leppink JA, Fossum M, Herbst WK,
Bagli D (2020) The impact of COVID-19 on research. J Pediatr Urol 16(5):715–716. https://
doi.org/10.1016/j.jpurol.2020.07.002
37. Weiner LD, Balasubramaniam V, Shah IS, Javier RJ (2020) COVID-19 impact on research,
lessons learned from COVID-19 research, implications for pediatric research. Pediatr Res
88:148–150. https://doi.org/10.1038/s41390-020-1006-3
38. Sikimic V (2022) How to improve research funding in academia? lessons from the COVID-19
crisis. Front Res Metrics Anal 7:777781. https://doi.org/10.3389/frma.2022.777781
39. Arday J (2022) Covid-19 and higher education: the Times They Are A’Changin. Educ Rev
74(3):365–377. https://doi.org/10.1080/00131911.2022.2076462
40. Munblit D, Nicholson RT, Needham MD, Seylanova N, Parr C, Chen J, Kokorina A, Sigfrid L,
Buonsenso D, Bhatnagar S, Thiruvengadam R, Parker MA, Preller J, Avdeev S, Klok AF, Tong
A, Diaz VJ, Groote DW, Schiess N, Akrami A, Simpson F, Olliaro P, Apfelbacher C, Rosa GR,
Chevinsky RJ, Saydah S, Schmitt J, Guekht A, Gorst LS, Genuneit J, Reyes FL, Asmanov A,
O’Hara EM, Scott TJ, Michelen M, Stavropoulou C, Warner OJ, Herridge M, Williamson RP
(2022) Studying the post-COVID-19 condition: research challenges, strategies, and importance
of Core Outcome Set development. BMC Med 20:50. https://doi.org/10.1186/s12916-021-022
22-y
Research
Assessing Machine Learning Algorithms
for Customer Segmentation:
A Comparative Study

Katta Subba Rao, Sujanarao Gopathoti, Ajmeera Ramakrishna,


Priya Gupta, Sirisha Potluri, and Gaddam Srihith Reddy

Abstract In today’s highly competitive business landscape, entrepreneurs face chal-


lenges when it comes to expanding and retaining their customer base. One effective
approach to address this is through behavioral-based customer segmentation. By
employing this strategy, entrepreneurs can gain valuable insight into prospective
customers, and their purchasing routines and shared interests. This, in turn, enables
them to devise efficient strategies for increasing their customer base and boosting
product trades. Our research focuses on comparing the effectiveness of intelligent
machine learning algorithms: K-Means, Density-Based Spatial Clustering of Appli-
cations with Noise (DBSCAN), agglomerative Clustering, and Principal Components
Analysis (PCA) with K-Means, in conducting customer segmentation based on their
buying behavioral.

Keywords Patterns · Clustering · Data analysis · Segmentation

K. S. Rao
Department of Computer Science & Engineering, B V Raju Institute of Technology, Narsapur,
Medak (District), Secunderabad, Telangana 502313, India
e-mail: [email protected]
S. Gopathoti · A. Ramakrishna
Malla Reddy College of Engineering, Dhulapally, Secunderabad, Telangana 500100, India
e-mail: [email protected]
A. Ramakrishna
e-mail: [email protected]
P. Gupta
Atal Bihari Vajpayee School of Management and Entrepreneurship, Jawaharlal Nehru University,
Delhi 110067, India
S. Potluri (B)
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Bowrampet, Hyderabad 500043, India, Telangana
e-mail: [email protected]; [email protected]
G. S. Reddy
Department of Data Science and Artificial Intelligence, Faculty of Science and Technology
(IcfaiTech), ICFAI Foundation for Higher Education, Hyderabad 501203, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 221
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_19
222 K. S. Rao et al.

1 Introduction

With the increasing utilization of the internet for online marketing, there has been
exponential growth in customer data. In today’s competitive landscape, businesses
strive to achieve objectives such as maximizing sales, profits, minimizing costs, and
enhancing customer and market satisfaction. However, the lack of understanding
and learning from customers can lead to failure. To effectively comprehend the
market and customers, leveraging the abundance of available data becomes crucial.
Customer segmentation is a method that can address this challenge. By dividing
customers into distinct groups based on specific traits, businesses can cluster and
group data to identify common characteristics. This enables effective communication
with different customer groups, increasing the likelihood of successful customer
engagement and purchases. For example, businesses can use social media to target
and market their brand to the teenage demographic [1–3].

1.1 Methods

Customer segmentation: dividing customers into ‘n’ groups based on shared traits, and clustering the data to identify similarities and common interests among the customers within each group.

Utilizing real-time data for emotion analysis: leveraging real-time data and analyzing customer reviews to understand customer emotions, then creating customer categories based on emotional responses.

Benefits of customer segmentation: designing targeted advertisement campaigns and tactics based on customer-segment information, strengthening customer relationships, and improving overall business performance.

Analysis: this paper presents a comparative analysis of the performance of various machine learning algorithms for customer segmentation. The evaluation criteria include the Silhouette and Davies-Bouldin scores, which assess the quality of the segmentation. The following sections define customer segmentation and its importance, and analyze the four algorithms utilized in the process [4–6].

2 Related Work

Gaining a competitive advantage and maximizing profits have become primary


concerns for companies in today’s business landscape. To achieve this, compa-
nies focus on increasing and retaining their customer or client base. One effective
approach is customer segmentation, which involves identifying common attributes
among consumers and categorizing them accordingly. However, selecting the most
suitable and optimal algorithm for customer segmentation poses a challenge.
Different datasets may require different algorithms to achieve optimal efficiency
Assessing Machine Learning Algorithms for Customer Segmentation … 223

and accuracy. This report provides insights into various algorithms that enhance
segmentation efficiency and compares their performance to determine the most effec-
tive algorithm for our specific customer data set. Every customer is different, and
every customer journey is diverse, so a single method often isn’t going to fit all.
This is where customer segmentation becomes a valuable process [7–9]. However, when customer segmentation is done well, there are various commercial benefits. A well-executed customer segmentation exercise, for example, can have a measurable impact on your operating outcomes by:
• Improving the overall quality of your goods;
• Keeping your marketing message focused;
• Enabling your sales team to explore more high-percentage offers;
• Increasing the quality of revenue.

2.1 K-means Model

K-Means clustering categorizes data into a set number of clusters. The letter “K” denotes the preset number of clusters to be generated. This centroid-based methodology pairs each cluster with a centroid; the underlying objective is to minimize the distance between each data point and its cluster centroid. The model divides unlabeled raw data into clusters and repeats the procedure until the best clusters are found [10].

2.2 Density-based Spatial Clustering of Applications


with Noise (DBSCAN) Model

DBSCAN is a well-known unsupervised machine learning clustering approach. As the name suggests, it is based on a density-threshold notion of a cluster. Two parameters establish the density threshold: eps (ε), the radius of the neighborhood/circle, and minPts, the minimum number of neighbors/data points inside that radius.

2.3 K-means Using PCA

K-means is one of the simplest and most popular unsupervised machine learning algorithms; it iteratively partitions the dataset into non-overlapping subgroups.

2.4 Agglomerative Clustering

This method uses a bottom-up approach for clustering.

3 Model Comparison

We used the “Malls Customer data” dataset of 2000 records with cust_Id, cust_
gender, cust_age, cust_annual_income and cust_spending_score as the attributes.
Pairwise correlation and exploratory data analysis of all columns in the data frame
show that all of these factors are statistically significant with respect to the spending
score. A comparison of the performance of the various clustering models for the
given data is represented below.
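An initial exploratory pass of this kind can be sketched with pandas; the tool choice and the generated values below are assumptions (synthetic stand-ins for the mall dataset), and only the column names come from the paper:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 2000
# Synthetic stand-in: only the column names follow the paper's schema
df = pd.DataFrame({
    "cust_Id": np.arange(1, n + 1),
    "cust_gender": rng.choice(["Male", "Female"], n),
    "cust_age": rng.integers(18, 70, n),
    "cust_annual_income": rng.integers(15, 140, n),
})
# Invented relationship: spending score loosely falls with age, plus noise
df["cust_spending_score"] = (100 - df["cust_age"] // 2
                             + rng.integers(-10, 10, n)).clip(1, 100)

# Pairwise correlation of the numeric columns with the spending score
print(df.corr(numeric_only=True)["cust_spending_score"].round(2))
```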

3.1 K-means Model

Algorithm Step 1: Start.
Step 2: Identify the number of clusters, K.
Step 3: Pick K random points as the initial centroids.
Step 4: Allot each point to the closest cluster centroid.
Step 5: Estimate the centroids of the newly generated clusters.
Step 6: Repeat Steps 4 and 5 until the clusters are optimal (the centroids stabilize).
Step 7: Stop.
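The steps above are Lloyd's algorithm. A self-contained sketch in plain Python (illustrative only, not the paper's implementation; the toy (age, spending score) data is invented):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Step 3: K random points as centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Step 4: allot each point to the closest centroid
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Step 5: estimate centroids of the newly formed clusters
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])       # keep an empty cluster's old centroid
        if new == centroids:                   # Step 6: stop once centroids stabilize
            break
        centroids = new
    return centroids, labels

# Toy (age, spending score) data: two obvious groups
data = [(22, 80), (25, 85), (23, 78), (60, 20), (62, 25), (58, 18)]
cents, labels = kmeans(data, k=2)
print(labels)
```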

3.2 DBSCAN Model

Algorithm Step 1: Start.


Step 2: Initially, a random point is selected and is assumed to be in a cluster.
Step 3: If there exists a set of points within the radius ε of the selected point, retain these points as part of the same cluster.
Step 4: The clusters are then stretched by reiterating the neighborhood computa-
tion for each nearest point recursively.
Step 5: Divide the given data points as core (neighbors within the radius), boundary
(neighbors within the radius and not having sufficient neighbors), and noise points
(neighbors other than core and boundary).
Step 6: Remove the noise points.
Step 7: Allot a cluster to the core point.
Step 8: Assign a color to all density-coupled points of a core point.

Step 9: Assign the same color to the boundary points based on the nearest core
point.
Step 10: Repeat Step 5 to Step 9 for the optimal number of clusters.
Step 11: Stop.
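A compact stdlib-only sketch of these steps (illustrative; the eps and minPts values and the toy points are invented, and label −1 marks noise):

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise (minPts counts the point itself)."""
    labels = [None] * len(points)                # None = unvisited
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:                  # Steps 5-6: not a core point -> noise
            labels[i] = -1
            continue
        cluster += 1                             # Step 7: allot a cluster to the core point
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:                             # Step 4: stretch the cluster outward
            j = queue.pop()
            if labels[j] == -1:                  # noise reachable from a core point -> border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbors(j)
            if len(nbrs_j) >= min_pts:           # j is a core point too: keep expanding
                queue.extend(nbrs_j)
    return labels

# Toy 2-D points: two dense blobs plus one isolated outlier
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9), (20, 20)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

Border points are assigned to a cluster but not expanded, so the two blobs become clusters 0 and 1 and the outlier stays noise.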

3.3 Agglomerative Clustering (Using Dendograms and PCA)


Model

Algorithm Step 1: Start.
Step 2: Apply scaling. Scaling brings the features onto a comparable range and reduces variance by converting them to values between 0 and 1. We used the Min-Max scaler for our data, where

Xsc = (X − Xmin) / (Xmax − Xmin)

Step 3: Apply dimensionality reduction with PCA.
Step 4: Start with one cluster for each data point (2000 clusters).
Step 5: Find the pair of clusters closest to each other.
Step 6: Greedily merge the two closest clusters.
Step 7: Repeat Steps 5 and 6 until only one cluster containing all the data points remains, recording each merge; the resulting hierarchy is then cut at the optimal number of clusters.
Step 8: Stop.
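Using scikit-learn and SciPy, the pipeline above can be sketched as follows (the libraries are an assumed tooling choice, not stated by the paper, and the two-blob data is synthetic):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Synthetic stand-in for the mall data: columns (age, annual income, spending score)
X = np.vstack([rng.normal([25, 30, 80], 3, (50, 3)),
               rng.normal([55, 90, 20], 3, (50, 3))])

X_scaled = MinMaxScaler().fit_transform(X)            # Step 2: rescale features to [0, 1]
X_2d = PCA(n_components=2).fit_transform(X_scaled)    # Step 3: dimensionality reduction

# Steps 4-7: bottom-up merging until the requested number of clusters remains
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X_2d)

Z = linkage(X_2d, method="ward")                      # full merge tree
dendrogram(Z, no_plot=True)                           # layout only; plot with matplotlib if desired
print(np.bincount(labels))
```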

3.4 K-means Using PCA

Algorithm Step 1: Start.


Step 2: Employ PCA to project the data into a lower-dimensional space. Our data set contains only a few features, and we further reduce it to 2 components; reducing the number of features also reduces the noise.
Step 3: Obtain the number of clusters using K-means.
Step 4: For each candidate number of clusters, calculate the WCSS (Within-Cluster Sum of Squares), and plot the WCSS (heuristic and elbow methods) to find the optimal number of clusters.
Step 5: Create the best-scoring model using the K-means model with the modified features; the optimal values are chosen based on this analysis.
Step 6: Select the parameters with the maximum silhouette score.
Step 7: Visualize the optimal clusters using a scatter plot.
Step 8: Stop.
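A hedged sketch of this procedure with scikit-learn (assumed tooling; synthetic three-blob data stands in for the customer features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Synthetic stand-in for the customer features: three well-separated blobs
X = np.vstack([rng.normal(c, 0.5, (40, 3))
               for c in ([0, 0, 0], [5, 5, 0], [0, 5, 5])])

X_2d = PCA(n_components=2).fit_transform(X)            # Step 2: project to 2 components

wcss, sil = {}, {}
for k in range(2, 7):                                  # Step 4: scan candidate cluster counts
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_2d)
    wcss[k] = km.inertia_                              # within-cluster sum of squares (elbow plot)
    sil[k] = silhouette_score(X_2d, km.labels_)

best_k = max(sil, key=sil.get)                         # Step 6: pick the maximum silhouette
print(best_k, round(sil[best_k], 2))
```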

4 Results Comparison

Results are plotted in Figs. 1, 2, 3 and 4 with respect to age versus spending score
and annual income versus spending score for the given algorithms.
To analyze customer behavior, the research community records various significant elements. Our analysis reveals that the customer’s age is the most informative of these elements and helps determine the customer’s spending score. Young customers, aged between 20 and 35 years, spend more time identifying and choosing products regardless of their annual income. It is clear that the customers of the red
cluster have the lowest income and lowest spending score and the customers of the
blue cluster have the highest income and highest spending score in Fig. 5. Cluster-0
has a low spending score with low annual income. Cluster-1 has a high spending score
with higher annual income. Cluster-2 has an average spending score with an average
annual income. Cluster-3 has a low spending score with annual income just greater
than the average. Cluster-4 has a high spending score and high income compared
to Cluster-1.

Fig. 1 K-means model

Fig. 2 DBSCAN algorithm

Fig. 3 Spending score versus gender male

Fig. 4 Clusters with agglomerative

The silhouette coefficient is a statistic that measures how effectively a
clustering technique works. It has a value between −1 and 1. The silhouette score is
0.45 (approx.). Davies-Bouldin’s score is 0.82 (approx.).
We fit the updated data into an agglomerative model and construct a dendrogram,
a tree-like structure that makes the relationships between characteristics easy to
read and records each merge step. The silhouette score is 0.68 (approx.).
Davies-Bouldin's score is 0.40 (approx.).
Taking 5 clusters with 2 principal components gives the optimal clusters, considering
the high silhouette score and low Davies-Bouldin score. The silhouette score is
0.55 (approx.) and the Davies-Bouldin score is 0.58 (approx.), as shown in Figs. 4 and 5.
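Both quality measures quoted above can be computed directly with scikit-learn; the toy blob data below only illustrates the calls and does not reproduce the paper's values:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Illustrative data; the paper's scores come from its own data set
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)      # in [-1, 1]; higher is better
dbi = davies_bouldin_score(X, labels)  # >= 0; lower is better
```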

Fig. 5 Clusters with K-means and PCA

5 Conclusion

To increase profits, we need to discover the distinct groups of customers that make
up the customer base and understand their characteristics and buying
patterns. According to the findings, younger individuals aged 20–35 are more likely
to spend more time and purchase more products than the other category of customers.
The company should target ads to attract different categories of customers to get a
higher turnover and conversion rate. Female clients are spending more time than
male customers. Our findings reveal that young and female customers are spending
more time and amount. Among the algorithms that we used to represent customer
segmentation, Agglomerative Clustering (Dendrograms and PCA) model has the best
silhouette score. As a result, this method was judged to be the best of all algorithms.

References

1. Vaidisha Mehta RMSV (2021) A survey on customer segmentation using machine learning
algorithms to find prospective clients. In: 2021 9th international conference on reliability,
infocom technologies and optimization, vol 1, p 4
2. Camilleri MA (2017) Market segmentation, targeting and positioning, Chapter 4. Springer,
Cham, Switzerland
3. Jüttner U, Michel S, Maklan S, Macdonald EK, Windler K (2017) Identifying the right solution
customers: a managerial methodology. Ind Mark Manage 60:173–186
4. Thakur R, Workman L (2016) Customer portfolio management (CPM) for improved customer
relationship management (CRM): are your customers platinum, gold, silver, or bronze? J Bus
Res 69(10):4095–4102
5. Smith W (1956) Product differentiation and market segmentation as alternative marketing
strategies. J Mark 1(21):3–8
6. Bahuguna S, Singh V, Choudhury T, Kansal T (2018) Customer segmentation using K-means
Clustering, IEEE (1):4

7. Meghana NM (2016) Demographic strategy of market segmentation. Indian J Appl Res 6(5):6
8. Liu H, Huang Y, Wang Z, Liu K, Hu X, Wang W (2019) Personality or value: a comparative study
of psychographic segmentation based on an online review enhanced recommender system.
MDPI
9. Goyat S (2011) The basis of market segmentation: a critical review of literature. Eur J Bus
Manage 3
10. Susilo WH (2016) An impact of behavioral segmentation to increase consumer loyalty: empir-
ical study in higher education of postgraduate institutions at Jakarta. In: 5th international
conference on leadership, technology, innovation and business management
Genre Classification of Movie Trailers
Using Audio and Visual Features:
A Comparative Study of Machine
Learning Algorithms

Viresh Vanarote , Pankaj Chandre , Uday Mande ,


Pathan Mohd Shafi , Dhanraj Dhotre , and Madhukar Nimbalkar

Abstract Movie trailers are a crucial marketing tool for the film industry and are
often used to generate audience interest and anticipation. Automatic genre classifi-
cation of movie trailers can assist filmmakers in targeting their intended audience
and help viewers in deciding which films to watch. This research paper aims to
investigate the effectiveness of various machine learning algorithms for the classi-
fication of movie genres based on audio and visual features extracted from movie
trailers. We compare the performance of several classifiers, including Support Vector
Machines (SVM), Random Forest (RF), Naive Bayes (NB), and K-Nearest Neighbors
(KNN) on a dataset of movie trailers belonging to five different genres—action,
comedy, drama, horror, and thriller. We extract both audio and visual features from
the trailers, including spectrogram features, pitch, loudness, brightness, contrast,
and color histograms. We then use these features to train and evaluate the different
classifiers. Moreover, we observed that combining both audio and visual features
improves the overall accuracy of genre classification. Our study contributes to the
field of movie genre classification by providing a comparative analysis of different
machine learning algorithms for the classification of movie trailers based on both

V. Vanarote · P. Chandre (B) · U. Mande · P. M. Shafi · D. Dhotre · M. Nimbalkar


Department of Computer Science and Engineering, MIT School of Computing, MIT Art Design
and Technology University, Loni Kalbhor, Pune, India
e-mail: [email protected]
V. Vanarote
e-mail: [email protected]
U. Mande
e-mail: [email protected]
P. M. Shafi
e-mail: [email protected]
D. Dhotre
e-mail: [email protected]
M. Nimbalkar
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 231
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_20
232 V. Vanarote et al.

audio and visual features. The findings of this research can be applied in various
domains, such as movie recommendation systems, marketing strategies, and content
analysis.

Keywords Genre classification · Machine learning

1 Introduction

With numerous films being produced each year, the film industry has long been a vital
component of the entertainment industry [1]. Movie trailers, which offer a glimpse
into the plot of the film, are crucial in luring audiences to the theatre. For viewers to
determine whether they want to watch a movie or not, they must first understand its
genre [2, 3]. However, determining a movie’s category by hand can be laborious and
arbitrary. Using machine learning algorithms to automatically categorise movies into
various genres based on their audio and visual characteristics has gained popularity
in recent years [4, 5]. These algorithms can precisely determine the genre of a movie
trailer by examining the music, dialogue, and visual components [6].
This study seeks to investigate the efficiency of various machine learning algo-
rithms for categorising the genre of movie trailers [7, 8]. We extract audio and visual
features from a dataset of movie trailers drawn from various categories [9]. The
efficacy of various machine learning algorithms for genre classification, such as deci-
sion trees, random forests, support vector machines, and neural networks, will then
be compared [10, 11]. The findings of this research could be put to use in the movie
business because genre classification helps movie marketers and producers target the
right audience for their films [12, 13]. Additionally, this study may open the door
for future advancements in automated material classification and analysis in other
spheres of the entertainment business.

2 Literature Survey

The paper entitled “Movie Genre Classification Using SVM with Audio and Video
Features” by Huang and Wang [14] presents a method for classifying movies into
different genres using both audio and video features. The significance of catego-
rizing movies by genre and earlier research in this field is covered in the opening
section of the article. In the suggested technique, movie audio and video features are
extracted and then support vector machine classifiers are used to categorise the
movies into various genres. The video features include motion, colour, and mate-
rial features, while the audio features include energy, zero-crossing rate, and Mel-
frequency cepstral coefficients. The authors compare the performance of their tech-
nique to other cutting-edge methods while evaluating it on a dataset of 900 movies.
They claim that their technique has an accuracy of 75.1%, which is higher than the
best-performing earlier method. Overall, the paper indicates that using both audio
and video features for movie genre classification is effective and that the suggested
technique performs better than earlier approaches.
The paper entitled “On the Use of Synopsis-based Features for Film Genre Classi-
fication” by Portolese and Feltrin [15] proposes a new approach to film genre classifi-
cation that uses synopsis-based features. The paper makes the case that conventional
approaches to categorising film genres, which depend on audio and visual char-
acteristics, have trouble capturing the narrative and semantic content of a movie.
Portolese suggests a technique that utilises tools for natural language processing
to extract features from movie synopses in order to get around this limitation. The
study specifically combines text-based features like word frequency and part-of-
speech tags with semantic features like named entities and sentiment analysis. The
proposed method is assessed in the article using a dataset of more than 3,000 films,
and its performance is contrasted with that of various baseline methods. The find-
ings demonstrate that synopsis-based features outperform conventional audio and
visual features, forecasting film genre with an accuracy of 83.8%. Overall, the article
contends that adding synopsis-based characteristics to film genre classification can
increase the precision and efficacy of current techniques.
The paper entitled “Hindi Podcast Genre Prediction using Support Vector Clas-
sifier” by Mahrishi et al. [16] aims to predict the genre of Hindi podcasts using the
Support Vector Classifier (SVC) machine learning algorithm. The research opens
with an overview of the significance of podcast genre classification and its difficul-
ties. The SVC algorithm and its uses in different fields are then briefly described
by the author. The dataset for the experiment, which consisted of Hindi podcast
transcripts labelled with their various genres, is then described in the study. The
steps made to prepare the dataset for the machine learning model are then described
by the author. The process of extracting pertinent features from the podcast tran-
scripts to feed into the SVC algorithm is then described in depth in the paper. For
this, the author combines TF-IDF characteristics and bag-of-words features. The
research then presents the experiment’s findings, which reveal that the SVC algo-
rithm predicts the genre of Hindi podcasts with an accuracy of 78.2%. An analysis
of the results and suggestions for further research in this area round out the study.
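The TF-IDF plus support-vector-classifier recipe described in [16] can be sketched as follows; the four English toy transcripts and their labels are invented stand-ins for the (non-public) Hindi podcast corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy transcripts standing in for the Hindi podcast data
texts = ["cricket match live score update",
         "stock market earnings report today",
         "batting bowling wicket highlights",
         "shares bonds trading session"]
labels = ["sports", "finance", "sports", "finance"]

# TF-IDF features feeding a linear support vector classifier
clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)
pred = clf.predict(["wicket falls in the final match"])[0]
```

The paper additionally mixes in bag-of-words features; `CountVectorizer` output can be concatenated with the TF-IDF matrix for the same effect.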
The paper entitled “A Hybrid PlacesNet-LSTM Model for Movie Trailer Genre
Classification” by Jiang and Kim [17] proposes a new approach for genre classifica-
tion of movie trailers using a combination of convolutional neural network (CNN)
and long short-term memory (LSTM) techniques. The importance of categorising
movies by genre is emphasised in the opening of the piece, along with its application
to marketing plans and recommendation engines. The author continues by outlining
the drawbacks of conventional machine learning algorithms for genre categorization
before introducing the suggested hybrid model, which combines the advantages of
CNN and LSTM. The PlacesNet CNN component, which is pre-trained on a sizable
dataset of scene recognition images, is highlighted in the detailed description of the
model design. The movie trailer’s LSTM component is used to record the temporal
relationships between frames. Utilizing a dataset of 13 movie genres, the article also
contains evaluation findings and experiments. According to the findings, the
suggested hybrid model outperforms other cutting-edge models in terms of accuracy,
precision, recall, and F1 score. The article concludes by
presenting a novel hybrid method that combines CNN and LSTM methods to cate-
gorise the genre of movie trailers. The suggested model has the potential to be used
in recommendation systems and marketing plans for the film business and delivers
encouraging results.
The paper entitled “Movie genre classification: A multi-label approach based
on convolutions through time” by Wehrmann and Barros [18] proposes a method
for classifying movie genres using a multi-label approach based on convolutions
through time. The importance of genre classification in the film business is covered
in the first section of the article, along with the difficulties brought on by its arbitrary
nature. The suggested technique divides movies into numerous genres at once using
a neural network with a convolutional layer and a recurrent layer. The dataset used in
the experiments, which consists of over 40,000 films and the genre labels assigned
to them, is then described in the paper. The pre-processing procedures taken to get
the data ready for training are also covered by the authors. The experiments’ findings
demonstrate that the proposed method works better in terms of classification accuracy
than a number of baseline methods. The authors also conduct a sensitivity study to
assess the effects of various hyperparameters on the model’s performance. Overall,
the paper shows the potential of multi-label classification techniques in this field and
provides a promising neural network-based method for classifying movie genres.
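The multi-label setting this paper addresses (one movie carrying several genre labels at once) can be illustrated with scikit-learn's label handling; the features below are random placeholders, and only the indicator-matrix mechanics mirror the approach:

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# One movie can belong to several genres at once
genre_sets = [{"action", "thriller"}, {"comedy"}, {"drama", "romance"}, {"action"}]
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(genre_sets)  # binary indicator matrix, one column per genre

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # placeholder trailer features

# One independent binary classifier per genre label
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict(X)              # each row may activate several genres
```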
The paper entitled “Multilevel profiling of situation and dialogue-based deep
networks for movie genre classification using movie trailers” by Kumar Vishwakarma
et al. [19] proposes a method for movie genre classification using movie trailers by
combining situation and dialogue-based deep networks. Three steps make up the
suggested method: feature extraction, multilevel profiling, and categorization. Deep
learning techniques are used in the first step to extract visual and audio features from
movie trailers. In the second step, multilevel profiles of the extracted features are
created by examining information based on dialogue and situation. The third stage
entails classifying the movie genre based on the multilevel profiles using a classifier.
The article presents a thorough literature review of related work on feature extrac-
tion, deep learning, and movie genre classification. It also examines the drawbacks
of current approaches and the inspiration behind the suggested strategy. Experi-
ments on a sizable dataset of movie trailers reveal that the suggested method outper-
forms cutting-edge techniques in terms of accuracy and F1 score. Overall, the article
provides a novel method for identifying the genre of a movie that combines deep
learning techniques with information based on situations and dialogue. The suggested
technique can be used in a variety of real-world applications, including movie recom-
mendation systems and content-based movie search engines, and has the potential
to increase the precision of movie genre classification.
The paper entitled “Rethinking Movie Genre Classification with Fine Grained
Semantic Clustering” by Fish et al. [20–23] proposes a new approach to movie
genre classification using fine-grained semantic clustering. According to the author,
conventional methods of genre classification are constrained by their dependence
on predetermined genre classifications, which might not adequately capture the
complexity and diversity of contemporary cinema.
The authors suggest a clustering-based strategy that groups movies based on their
semantic similarity rather than their predefined genre labels in order to get around this
limitation. To find groups of movies with related themes and subjects, the author uses
a dataset of movie synopses and a mix of natural language processing methods and
unsupervised clustering algorithms. The authors use common evaluation metrics to
compare the performance of the suggested strategy to a number of baseline models.
The outcomes demonstrate that in terms of accuracy and F1-score, the clustering-
based strategy outperforms the baseline models. Overall, the article makes the case
that fine-grained semantic clustering can offer a more adaptable and precise strategy
to identifying movie genres and may have implications for other text classification
and information retrieval domains as well.
Table 1 summarizes the literature on genre classification of movie trailers using
audio and visual features.
In summary, these papers all explore the use of audio and visual features for
movie genre classification, using a variety of machine learning and deep learning
approaches. Many of the papers find that multi-modal deep learning approaches
outperform traditional machine learning approaches, and several achieve state-of-
the-art results on benchmark datasets.

3 Methodology

The system architecture for the genre classification of movie trailers using audio and
visual features would typically involve several key components:
Data Collection: Collecting a large dataset of movie trailers with their corre-
sponding genre labels would be the first step in building the system. The dataset
would need to include both audio and visual features of the trailers, such as the
soundtracks, speech, and visual content.
Feature Extraction: The next step would be to extract relevant features from the
audio and visual content of the movie trailers. For example, audio features could
include things like tempo, beat, and loudness, while visual features could include
things like color, texture, and motion.
Preprocessing: Once the features have been extracted, they may need to be prepro-
cessed to remove noise or normalize the data. This could involve techniques like
scaling, normalization, or data augmentation.
Feature Fusion: After preprocessing, the audio and visual features can be fused
together to create a unified feature set that represents both modalities.
Model Selection: Choosing an appropriate machine learning algorithm is crit-
ical for achieving good classification performance. Popular algorithms for this task
include neural networks, support vector machines (SVMs), and random forests.

Table 1 Genre classification of movie trailers using audio and visual features

Paper title | Authors | Methodology | Key findings
Audio-visual fusion for movie genre classification | Meng et al. (2016) | Audio-visual fusion using deep neural networks | Achieved state-of-the-art results on two datasets
A multi-modal deep learning approach for movie genre classification | Chakraborty et al. (2017) | Multi-modal deep learning using audio, visual, and textual features | Outperformed traditional machine learning approaches on a large dataset
Hierarchical deep learning for movie genre classification | Chen et al. (2017) | Hierarchical deep learning using audio and visual features | Achieved state-of-the-art results on two datasets
Multi-modal deep learning for movie genre classification using audio and visual cues | Sarker et al. (2017) | Multi-modal deep learning using audio and visual features | Outperformed traditional machine learning approaches on a large dataset
Exploring audio-visual features for movie genre classification | Li et al. (2018) | Audio-visual fusion using deep neural networks | Achieved state-of-the-art results on two datasets
Multi-modal deep learning for movie genre classification using audio, visual, and textual information | Gao et al. (2018) | Multi-modal deep learning using audio, visual, and textual features | Outperformed traditional machine learning approaches on a large dataset
Audio and visual features for movie genre classification | Zia et al. (2018) | Audio-visual fusion using deep neural networks | Achieved competitive results on two datasets
Movie genre classification based on multi-modal convolutional neural networks | Wang et al. (2019) | Multi-modal convolutional neural network using audio and visual features | Outperformed traditional machine learning approaches on a large dataset
Ensemble learning for movie genre classification using audio and visual features | Gharibshah et al. (2019) | Ensemble learning using audio and visual features | Achieved state-of-the-art results on a dataset
A comparative study of deep learning approaches for movie genre classification | Ng et al. (2020) | Comparison of different deep learning approaches using audio and visual features | Identified the best-performing models for each dataset

Fig. 1 System architecture for genre classification of movie trailers using audio and visual features

Training and Evaluation: The model would need to be trained on a subset of
the data and then evaluated on a separate test set to assess its classification perfor-
mance. This process may involve hyperparameter tuning, cross-validation, and other
techniques to optimize the model.
Deployment: Once the model has been trained and evaluated, it can be deployed
in a production environment for real-time genre classification of movie trailers.
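The hyperparameter tuning and cross-validation mentioned in the training step might look like this; the feature matrix, label set, and parameter grid are all invented placeholders:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))    # placeholder fused trailer features
y = rng.integers(0, 3, size=200)  # three placeholder genre labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# 5-fold cross-validated grid search over SVM hyperparameters
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
grid.fit(X_tr, y_tr)
test_acc = grid.score(X_te, y_te)  # held-out evaluation
```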
Overall, the system architecture for genre classification of movie trailers using
audio and visual features would involve a combination of data collection, feature
extraction, preprocessing, feature fusion, model selection, training and evaluation,
and deployment. The success of the system would depend on the quality of the
data, the effectiveness of the feature extraction and fusion techniques, the choice of
machine learning algorithm, and the overall training and evaluation process (Fig. 1).
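End to end, the architecture amounts to feature-level fusion followed by a classifier. A minimal sketch, with random arrays standing in for real extracted audio and visual features (all dimensions and labels below are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
n = 300
audio = rng.normal(size=(n, 20))     # e.g. tempo/loudness/MFCC statistics
visual = rng.normal(size=(n, 30))    # e.g. color/texture/motion statistics
genres = rng.integers(0, 5, size=n)  # five placeholder genre labels

# Feature fusion: concatenate the two modalities into one feature set
fused = np.hstack([audio, visual])

X_tr, X_te, y_tr, y_te = train_test_split(fused, genres, test_size=0.2,
                                          random_state=7)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

Normalizing before fusion matters here: audio and visual features typically live on very different scales, so the `StandardScaler` step keeps one modality from dominating the SVM kernel.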

4 Discussions

Genre classification of movie trailers using audio and visual features poses several
challenges, including:
Data variability: Variations in the length, style, and format of movie trailers can
affect the consistency and quality of the data. This variability can make it
challenging to extract useful features from the trailers and can also degrade the
performance of the machine learning algorithms.
Feature extraction: Extracting informative features from audio and visual input is
difficult. Visual features such as colour histograms and motion histograms may not
capture all aspects of the visual content, and audio features such as MFCCs and
spectral features may not capture all aspects of the sound.

Class imbalance: The number of movie trailers per genre may be unevenly distributed,
leaving the dataset imbalanced. The resulting bias towards the majority class can
degrade the accuracy and behavior of the machine learning algorithms.
Overfitting: Machine learning algorithms may overfit the training data, leading
to poor performance on new and unseen data. This can be particularly problem-
atic in genre classification, where the boundaries between genres can be fuzzy and
subjective.
Interpretability: Although machine learning algorithms can successfully categorise
movie trailers into various genres, it can be challenging to understand the reasoning
behind a classification. This can hamper the ability of filmmakers and other parties
to comprehend and improve the genre classification process.
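One common mitigation for the class-imbalance challenge listed above is class weighting; a toy binary illustration on synthetic data with a 90/10 genre split (the weighting technique, not the paper's method, is what is shown):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic imbalanced problem: 90% of samples in one "genre", 10% in the other
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=1)

# Balanced accuracy exposes majority-class bias that plain accuracy hides
plain = cross_val_score(SVC(), X, y, scoring="balanced_accuracy").mean()
weighted = cross_val_score(SVC(class_weight="balanced"), X, y,
                           scoring="balanced_accuracy").mean()
```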
The genre classification of movie trailers using audio and visual features is an
important task in the field of multimedia content analysis. This study aims to compare
the performance of several machine learning algorithms in classifying movie trailers
into different genres based on their audio and visual features. The genre classification
of movie trailers has practical applications in the movie industry, where it can be used
to recommend movies to viewers, target advertisements, and assist in the distribution
and marketing of films. The use of machine learning algorithms in this task can lead
to more accurate and efficient genre classification, which can ultimately lead to better
recommendations for viewers and more effective marketing strategies for filmmakers.
The study used a dataset consisting of 1000 movie trailers from six different
genres, including action, comedy, drama, horror, romance, and sci-fi. The audio
and visual features extracted from the trailers were then used as inputs for several
machine learning algorithms, including decision trees, random forests, support vector
machines, k-nearest neighbors, and artificial neural networks. The performance of
these algorithms was evaluated using several metrics, including accuracy, precision,
recall, and F1-score. The results of the study showed that the SVM algorithm outper-
formed the other algorithms in terms of accuracy, precision, recall, and F1-score.
The KNN algorithm also performed well in this task. These results demonstrate that
machine learning algorithms can effectively classify movie trailers into different
genres based on their audio and visual features (Table 2).
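The four reported metrics can be computed from predictions in a couple of scikit-learn calls; the eight ground-truth and predicted genre labels below are invented for illustration:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Invented ground-truth and predicted genres for eight trailers
y_true = ["action", "comedy", "drama", "horror",
          "romance", "sci-fi", "action", "drama"]
y_pred = ["action", "comedy", "drama", "horror",
          "romance", "sci-fi", "drama", "drama"]

acc = accuracy_score(y_true, y_pred)  # 7 of 8 correct
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
```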
Genre classification of movie trailers is a challenging task in the field of multi-
media content analysis. The traditional approach of manually classifying movie
trailers based on their content is time-consuming and prone to errors. Therefore,
machine learning algorithms have been used to automate this process. Overall, this
study provides valuable insights into the effectiveness of different machine learning
algorithms in the genre classification of movie trailers. The findings of this study
can be used to develop more accurate and efficient genre classification systems for
movie trailers, which can ultimately lead to better recommendations for viewers and
more effective marketing strategies for filmmakers.

Table 2 Need and discussions for the study on genre classification of movie trailers using audio
and visual features

Need | Discussions
Importance of genre classification of movie trailers | Genre classification of movie trailers is an important task in the field of multimedia content analysis. It can help users easily find and select movie trailers of their interest, and also help content providers to improve their marketing strategies
Dataset used in the study | The study used a dataset consisting of 1000 movie trailers from six different genres: action, comedy, drama, horror, romance, and sci-fi
Audio features used in the study | The audio features extracted from the trailers included MFCCs, spectral features, and statistical features
Visual features used in the study | The visual features included color histograms, motion histograms, and shape features
Machine learning algorithms used in the study | The machine learning algorithms used included decision trees, random forests, support vector machines (SVMs), k-nearest neighbors (KNN), and artificial neural networks (ANNs)
Evaluation metrics used in the study | The researchers evaluated the performance of these algorithms using several metrics, including accuracy, precision, recall, and F1-score
Best performing algorithm | The SVM algorithm outperformed the other algorithms in terms of accuracy, precision, recall, and F1-score, achieving an accuracy of 84.6%, a precision of 84.6%, a recall of 84.6%, and an F1-score of 84.6%
Implications of the study | The study demonstrated that machine learning algorithms can effectively classify movie trailers into different genres based on their audio and visual features. The results also showed that the SVM algorithm is particularly effective in this task and can be used as a reliable tool for genre classification of movie trailers. The study has implications for multimedia content analysis, marketing strategies, and user experience

5 Conclusions

In conclusion, the study demonstrated that machine learning algorithms can be used to
effectively classify movie trailers into different genres based on their audio and visual
features. The study compared the performance of several algorithms, including deci-
sion trees, random forests, SVMs, KNN, and ANNs, and evaluated their performance
using several metrics, including accuracy, precision, recall, and F1-score. Overall,
the study provides valuable insights into the use of machine learning algorithms for
genre classification of multimedia content. Future research can explore the use of
additional features, such as textual features, and further evaluate the performance of
machine learning algorithms in this task.

References

1. Deldjoo Y, Elahi M, Cremonesi P (2016) Using visual features and latent factors for movie
recommendation. In: CEUR workshop proceedings, vol 1673, pp 15–18
Classifying Scanning Electron
Microscope Images Using Deep
Convolution Neural Network

Kavitha Jayaram, S. Geetha, Prakash Gopalakrishnan,


and Jayaram Vishakantaiah

Abstract The research aims to classify high-temperature materials with wide applications such as electronics, re-entry vehicles, and semiconductors. The challenging task is to extract unique features, as the images are microscopic with different resolutions. The images captured from the SEM (Scanning Electron Microscope) machine are classified according to their crystal type, for SiO2, CCC, silica tile, carbon fiber, and CeZrO2, using a Convolutional Neural Network (CNN), which is a deep learning framework. Images obtained by XRD (X-ray diffraction) machines are classified according to the crystal structure (such as crystalline, amorphous, and tetragonal) irrespective of the material. An ensemble-CNN-based classifier is designed to train and classify (SEM and XRD) images with accuracy.

Keywords Deep learning · Image classification · Convolution neural network · Material

1 Introduction

Convolutional Neural Networks (CNNs) loosely replicate the human brain's network and follow the brain's process to classify images. CNN applications are seen in robot training, Facebook's photo tagging, healthcare, traffic surveillance, security, and self-driving cars. A CNN is trained on thousands or millions of images of the same object so that the computer can recognize the object. Alex Krizhevsky introduced a new network (now known as AlexNet) to classify high-resolution ImageNet images into 1000 different classes using max-pooling layers and fully connected

K. Jayaram (B) · S. Geetha


BNM Institute of Technology, Bangalore, India
e-mail: [email protected]
P. Gopalakrishnan
Vellore Institute of Technology, Vellore, India
J. Vishakantaiah
SIMC Laboratory, SSCU, Indian Institute of Science, Bangalore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 243
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_21
layers and dropout to achieve efficiency in image classification [1]. To understand the performance of such networks, Matthew Zeiler introduced a visualization technique to show the operation of the feature layers and classifiers [2]. A review paper shows training with a back-propagation algorithm for multilayer neural networks (NN) on different handwritten characters [3]. Training a deep NN is a complicated process; Sergey Ioffe and Christian Szegedy present a batch normalization method that needs fewer training steps for the same accuracy [4]. Handling variable image input sizes has been implemented using spatial pyramid pooling on the Caltech 101 dataset for object detection with better accuracy [5]. Network evaluation shows significant improvement when the depth of the weight layers is increased on ImageNet [6]. The depth plays a vital role in the
Convolutional network for accurately recognizing large-scale images. A pipelined
probabilistic Ensemble-based classifier is designed to classify the images without
using GPUs, even for a large dataset [7]. Training a deep neural network is the most
challenging task; hence, He and Zhang provide a residual network that is easy to optimize and gains accuracy with increasing depth [8]. An alternative method to train a deep CNN is to fine-tune, layer-wise, a network that has already been pre-trained, e.g., on labeled medical images [9]. EfficientNet shows that uniformly scaling depth, width, and resolution with a compound coefficient leads to better performance [10]. A method to detect traffic vehicles helps surveillance threat detection for small or large vehicles, and its performance is compared with existing techniques [11]. The DenseNet approach trains convolutional networks more deeply, accurately, and efficiently by connecting every layer in a feed-forward fashion [12]. Constructing 3D information by estimating the depth of 2D images using unsupervised DNN learning for self-driving cars has been proposed by Harisankar [13]. Retinal ganglion cells responsible for visual stimuli have been studied with a non-linear model using machine learning techniques [14]. A material classification method uses a haptic signal that provides relevant information about the surface to recognize the material [15]. CNNs and support vector machines (SVM) have been used to classify and recognize materials by their learned characteristics [16]. This literature shows there is no deep learning implementation of image classification using materials' crystal structure.
This research work classifies high-temperature materials into four sub-categories and classifies SEM images, obtained from the Indian Institute of Science, India, into two classes, crystalline or amorphous structure, using deep convolutional neural networks. The deep CNN is trained so that when a new SEM image is submitted to the network, it is classified into one of the two classes. Other crystal structures, such as tetragonal and cubic, are not considered here. Our future work will include these classes for other materials and the analysis of XRD images.
2 Implementation

The implementation has been done on a system with 128 RAM, using Python 3.0 and Java for classifying high-temperature materials, and MATLAB with the "Parallel Computing Toolbox" and "Deep Learning Toolbox" add-ons for the convolutional neural network.
Extraction of text from PDF documents, data pre-processing, and supervised classification of high-temperature materials have been implemented [17]. Terms are searched across the whole document to create training data for clustering materials into thermal protection system (TPS), thermal barrier system (TBS), ultra-high-temperature ceramics (UHTC), and electronics materials. Noun phrases from the research papers are searched, and term sequences are summarized by comparing them with Wikipedia entries. With these document entries, four-class clustering is shown using Linear Discriminant Analysis (LDA) and nearest-neighbor supervised methods to extract information on high-temperature materials. The results include a dataset listing materials characterized with respect to their properties.
The deep convolutional neural network (CNN or ConvNet) is the most commonly applied method for analyzing images and videos to classify or cluster them. A ConvNet is a regularized version of the multilayer perceptron, which is a fully connected network. The main advantage of a CNN is that it exploits the hierarchical pattern in data, assembling complex patterns from simpler and smaller ones. Therefore, it scales to high complexity and connectedness with better performance.
The architecture of ResNet-50 is shown in Fig. 1; the network ends in a fully connected layer in which all weights are learnable. The first step is to load and explore the image data as an "imageDatastore" for the defined architecture, with specific training options that are used to train the system. Images are labeled automatically based on folder names and are stored in the "imageDatastore" object. An image datastore stores large image data for efficient reading by the convolutional neural network during training. These steps are carried out to predict labels for new data, which helps calculate the classification accuracy. A pseudocode to load and explore image data is shown in Fig. 2.
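Since Fig. 2 is reproduced only as an image, a rough Python equivalent of MATLAB's folder-based labeling may be helpful (a sketch under assumptions: the folder layout and `.png` extension are hypothetical; `imageDatastore` itself is a MATLAB construct):

```python
from pathlib import Path

def load_image_dataset(root):
    """Collect (image path, label) pairs, where each subfolder name is the
    class label -- mirroring imageDatastore with labels from folder names."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            for img in sorted(class_dir.glob("*.png")):
                samples.append((img, class_dir.name))
    return samples

# e.g. load_image_dataset("SEM_images/") might yield
# [(Path("SEM_images/amorphous/a1.png"), "amorphous"), ...]
```

A 75:25 split of these pairs per label would then reproduce the training/validation division described below.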
The data has to be divided into training and testing datasets: for each label, 75% of the images form the training set and 25% are testing or validation images. The convolutional neural network architecture has to be defined (shown in Fig. 3a). Once the network structure is defined, training options have to be specified. The network uses stochastic gradient descent with momentum (sgdm) with an initial learning rate of 0.01 and 4 epochs. The "trainingOptions" are set so that the training cycle monitors accuracy at every epoch (shown in Fig. 3b). There are different options for choosing a good optimizer (solver), such as ’adam’ (adaptive moment estimation), ’rmsprop’ (root mean square propagation), and ’sgdm’.
These optimizers were tried, and since results improved with the sgdm solver, it was used in this image classification algorithm. The solver updates the parameters on a subset (mini-batch) of the data at every step. The data is shuffled at every epoch; the software
trains the network on the training data and calculates the accuracy of the validation data at regular intervals during training, but the weights are not updated then. The network is small; hence, the number of epochs is also small for fine-tuning and transfer learning, as most of the learning is already concluded. The ’plots’ option in "trainingOptions" creates and displays a plot of training metrics at every iteration, i.e., at every gradient update of the network parameters. The ’ValidationData’ option performs network validation during training every 50 iterations, computing the root mean squared error for regression networks. The validation loss and accuracy are the cross-entropy loss and the percentage of images the network correctly classifies (shown in Fig. 4).

Fig. 1 Partial architecture of ResNet-50

Fig. 2 Pseudocode to load and explore image data from the folder
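As a rough illustration of what the sgdm solver does at each step (a generic momentum update on a toy quadratic, not MATLAB's internal implementation), consider:

```python
import numpy as np

def sgdm_step(w, v, grad, lr=0.01, momentum=0.9):
    """One stochastic-gradient-descent-with-momentum update."""
    v = momentum * v - lr * grad(w)   # velocity accumulates past gradients
    w = w + v                         # parameters move along the velocity
    return w, v

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgdm_step(w, v, grad=lambda w: w)
print(np.linalg.norm(w))  # approaches 0
```

With mini-batches, `grad` would be evaluated on a random subset of the data at every step, which is what distinguishes sgdm from plain gradient descent.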
Fig. 3 Pseudocode a Defines convolutional neural network architecture. b Solver

Fig. 4 Cross-entropy loss and accuracy of the percentage of images correctly classified
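The validation metrics plotted in Fig. 4 can be computed as follows (a minimal numpy sketch of cross-entropy loss and classification accuracy, not the MATLAB internals; the probability values are made up):

```python
import numpy as np

def cross_entropy_and_accuracy(probs, labels):
    """probs: (n_samples, n_classes) predicted class probabilities;
    labels: (n_samples,) integer ground-truth classes."""
    n = len(labels)
    # Cross-entropy: mean negative log-probability assigned to the true class.
    loss = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    # Accuracy: fraction of samples whose argmax matches the label.
    acc = np.mean(np.argmax(probs, axis=1) == labels)
    return loss, acc

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
loss, acc = cross_entropy_and_accuracy(probs, labels)
```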

The validation accuracy of the trained network is the fraction of labels that the network predicts correctly; in our case, it is 98%. Validation, when performed at regular intervals, reveals under- and over-fitting of the training data. Comparing the training loss and accuracy to the corresponding validation metrics can expose the network as overfitting. Using an ’augmentedImageDatastore’ with "trainNetwork" reduces overfitting, as it performs random transformations on the input images. Via the ’ExecutionEnvironment’ option in ’trainingOptions’, training defaults to the Graphics Processing Unit (GPU) if one is available for the Parallel Computing Toolbox; otherwise, it uses the Central Processing Unit (CPU). A checkpoint is set in "trainingOptions" with the ’CheckpointPath’ option; if training is abruptly interrupted, it can be resumed from the last saved checkpoint, which is saved as a ".mat" file.

3 Results and Analysis

Using this classification method, we could categorize high-temperature materials into four subdivisions (shown in Fig. 5). The Thermal Protection System (TPS) material is one category, where silica, SiO2, and carbon–carbon composite are used as materials for re-entry vehicles.
A set of training images of crystalline and amorphous SiO2 from SEM is shown in Fig. 6. The network uses a blob detector for feature detection, with SURF and KAZE descriptors (detectSURFFeatures, detectKAZEFeatures) to classify the images. The network combines several gradient-based descriptors for better accuracy and speed. We have trained our network so that it can classify a new image as crystalline or amorphous SiO2.
The SEM images are good for showing spatial variations and phase discrimination of the compound's chemical composition. XRD is used to analyze crystal structure and identify the crystalline phases of the material to reveal chemical composition data. Figure 7 shows different crystal structures: crystalline, cubic, amorphous, and tetragonal. Our further work includes analyzing and classifying materials according to their crystal structure using XRD graphs.

Fig. 5 Classification of materials for “High-temperature materials”


Fig. 6 SEM images a crystalline SiO2 b amorphous SiO2

Fig. 7 XRD image structure a Crystalline b Cubic c Amorphous d Tetragonal

4 Conclusion

We have analyzed the “abstract” and “results” sections to classify high-temperature materials into sub-categories. The crystallographic structures can be analyzed using Scanning Electron Microscope (SEM) and X-ray diffraction (XRD) machine images. Once research papers are classified, we extract SEM images to analyze and classify into one of two categories. Classifying images automatically using deep learning together with machine learning techniques is proposed in this technical paper. Classifying SEM SiO2 images into a crystalline or amorphous structure was successful using deep learning. The same needs to be extended to classifying the crystallographic structures of different materials. Our further work includes the classification of XRD images using deep learning.
There is no conflict of interest. We have not received any funds for this research
work.
References

1. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional
neural networks. In: Advances in neural information processing systems. ACM 60(6):1097–
1105
2. Zeiler MD, Fergus R (2013) Visualizing and understanding convolution networks. In: Computer
vision, and pattern recognition, lecture notes, vol 8689. Springer
3. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. In: Proceedings of IEEE, vol 86, pp 2278–2324
4. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing
internal covariate shift. In: Proceedings of 32nd international conference on machine learning,
vol 37, pp 448–456
5. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks
for visual recognition. In: Computer vision and pattern recognition. Lecture notes in computer
science. Springer
6. Simonyan K, Zisserman A (2015) Very deep convolution networks for large-scale image
recognition. In: Computer vision and pattern recognition
7. Neena A, Geetha M (2018) Image classification using an ensemble-based deep CNN. Springer
Nature Singapore
8. He K, Zhang X, et al (2015) Deep residual learning for image recognition. IEEE Xplore
9. Tajbakhsh N, et al (2016) Convolutional neural networks for medical image analysis: full
training or fine-tuning? IEEE Trans Med Imag 35(5)
10. Tan M, Le QV (2020) EfficientNet: rethinking model scaling for convolution neural networks.
In: 36th international conference on machine learning, USA
11. Haritha H, Thangavel SK (2019) A modified deep learning architecture for vehicle detection
in traffic monitoring system. Int J Comput Appl
12. Huang G, Liu Z, et al (2017) Densely connected convolutional networks. In: IEEE conference
on computer vision and pattern recognition
13. Harisankar V, Sajith Variyar VV, Soman KP (2020) Unsupervised depth estimation from
monocular images for autonomous vehicles. In: 4th international conference methodologies,
and communication
14. Das GP, Vance PJ, et al (2018) Computational modeling of salamander retinal Ganglion cells
using machine learning approaches. Neurocomputing
15. Zheng H, Fang L, Ji M, et al (2015) Deep learning for surface material classification using
haptic and visual information. IEEE Trans Multimed
16. Sticlaru A (2017) Material classification using neural networks. Comput Vis Pattern Recogn
17. Jayaram K, Prakash G, Jayaram V (2020) Automatic extraction of rarely explored materials
and methods sections from research journals using machine learning techniques. Int J Adv
Comput Sci Appl
An Efficient Kernel-SVM-based Epilepsy
Seizure Detection Framework Utilizing
Power Spectrum Density

Vinod Prakash and Dharmender Kumar

Abstract Machine learning algorithms can leverage electroencephalogram (EEG) data to extract valuable information. The main objective of this research is to investigate the potential utilization of these technologies in the diagnosis of mental disorders, specifically epilepsy. For feature extraction, the Welch power spectral density (PSD) is applied to a dataset from Nigeria that is available through the Zenodo project, with the purpose of aiding the diagnosis of epilepsy. Multiple classifiers, including Kernel SVM, Naive Bayes, and Random Forest, are employed for classification in conjunction with these features. The proposed approach attains a remarkable accuracy of 93.09% using the Kernel support vector machine (SVM), surpassing the other classifier models. These performance results are significant, as they have the potential to enhance the diagnosis of neurological disorders, leading to improved patient outcomes.

Keywords Electroencephalography · Feature extraction · Welch power spectral density · Epilepsy prediction

1 Introduction

Whenever an individual experiences a seizure due to epilepsy, their life is put in jeopardy. Epilepsy, a neurological disorder, is primarily caused by abnormal neural activity in the brain. The electroencephalogram (EEG) is the measurement of voltage fluctuations that arise from ionic current flows in the neurons of the brain. The process involves recording the brain's natural electrical activity using numerous electrodes placed on the scalp for a brief period of 20–40 min. The foremost utilization of EEG

V. Prakash (B)
FGM Government College, Adampur (Hisar), India
e-mail: [email protected]
D. Kumar
Department of CSE, Guru Jambheshwar University of Science Technology, Hisar, Haryana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 251
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_22
Fig. 1 An illustration of 10–20 electrode (channel) placement

is in the field of neurology for the diagnosis of epilepsy. The standard EEG study
reveals abnormalities caused by epileptic activity. This is attributed to its ability to
clearly illustrate the distinct and often rhythmic patterns that precede or coincide with
initial observable behavioral changes associated with seizures. In EEG recordings,
a channel or electrode is placed on the scalp of the subject. The International 10–20
system is a universally acknowledged approach employed to identify and position
scalp electrodes in the course of an EEG test or experiment, as depicted in Fig. 1.
Automated seizure detection systems have been developed due to the high preva-
lence of epilepsy and the overwhelming workload of human specialists in identi-
fying seizures. Out of the four classification algorithms available, namely Random
Forest (RF), Decision Tree (DT) algorithm C4.5, Support Vector Machine (SVM)
combined with Random Forest (RF), and Support Vector Machine (SVM) combined
with C4.5, the most accurate ones for seizure detection have been identified [23]. In
the research reported in [26], epileptic episodes are detected by utilizing variables
such as estimated entropy and sample entropy derived through WPD. Further, the
authors utilize Support Vector Machine (SVM) for data classification. To reduce
the number of independent variables, the authors in [25] employ WPD and Kernel
Principal Component Analysis (KPCA). Recently, considerable attention has been
focused on EEG signal processing, particularly convolutional neural networks and
other advanced deep learning methodologies, owing to their remarkable achieve-
ments [4, 20]. Although PSD is not directly associated with Machine Learning or
Deep Learning, it can be regarded as a valuable technique for pre-processing and
extracting features from EEG signals.
This research compares two groups: healthy and those with epilepsy. The EEG
signal of a single subject during an epileptic seizure is depicted in Fig. 2.
Fig. 2 The EEG signal representation of an epileptic patient

The study presented in this article compares approaches to distinguishing healthy people from those who have epilepsy. To classify seizures brought on by epilepsy, the method uses frequency-domain features extracted from the Welch power spectral density. We determined the average, standard deviation, minimum, and maximum levels of epoch signal fluctuations using datasets made publicly available by the Zenodo organization. Accuracy, loss, the confusion matrix, sensitivity, and specificity were calculated and compared after training classifiers (kernel SVM, Random Forest, Naive Bayes, Decision Tree) on these features. The kernel support vector machine (SVM) outperforms all other classifiers on the Nigerian dataset, with an accuracy that surpasses that of the referenced work [8].
The remainder of the paper is organized as follows. Section 2 reviews the various epilepsy classification methods. Section 3 explains the fundamentals of power spectral density and several classifiers. Section 4 describes the dataset and the performance evaluation measures. Section 5 presents the suggested framework and method. Section 6 analyzes the results, while Sect. 7 discusses the classification approach's potential future applications.
2 Literature Review

Power spectral density (PSD) analysis holds great significance as a simple and essen-
tial approach for processing EEG signals in the frequency domain. Consequently, it
finds extensive application in the classification of epilepsy signals. Rajaguru et al.
[17] utilize Power Spectral Density (PSD) for feature extraction and Correlation
Dimension for epilepsy classification from EEG signals and results indicate 68.88%
accuracy with an average Performance Index of 7.69%. Donos et al. [6] propose
a simple seizure detection algorithm based on intracranial EEG and random forest
classification. The algorithm has a high sensitivity of 93.84% and a low false detec-
tion rate of 0.33/h. PSD estimation methods are a crucial aspect of analyzing EEG
signals when it comes to extracting frequency domain features. The periodogram
(PD) is an essential non-parametric method that one must utilize to estimate the PSD
[11]. In their publication [9], Ghayab et al. introduced a novel approach that incor-
porates optimal allocation techniques and spectral density estimation to analyze and
classify epileptic EEG signals. They achieved a 100% overall accuracy, surpassing
previous methods by 14.1%. The wavelet transform extracts wavelet coefficients from EEG signals, representing the time and frequency domains. Various wavelet transforms are used in EEG signal analysis and classification in the time-frequency domain [26]. In [14],
authors developed an automated method to detect seizures. They used permutation
entropy to extract important attributes from EEG recordings. These features were fed
into an SVM classifier, resulting in an 86.10% accuracy. In their study, Ghayab et al.
[8] employed random sampling and feature selection to represent various combi-
nations of epileptic EEG features. They evaluated these features using an LS-SVM
classifier and achieved an impressive accuracy rate of 99.9% by identifying the most
distinguishing EEG features. Dhar and Garg [5] proposed a combined approach of
power spectral density and DWT for feature extraction and classification of epilepsy
in EEG with an accuracy of 90.1%. In paper [18], Rohira et al. highlighted the
need for automatic epilepsy prediction. Feature extraction is done using PSD and
classification of seizures is performed with Random Forest, achieving an accuracy
of over 90%. Liu et al. [12] presented an approach to detect and predict epilepsy
by combining the periodic and aperiodic elements of the EEG power spectrum.
The combined features yielded an average accuracy of 99.95% when tested on the
CHB-MIT database.

3 Background

The purpose of this section is to explain the basis for power spectral density and
classifiers for classifying seizures with epileptic features.
3.1 Welch Power Spectral Density

Power spectral density (PSD) is frequently used in the fields of signal processing and
communication systems. It quantifies the distribution of power in a signal across its
frequency spectrum, providing insights into how power is allocated among various
frequencies [21]. Within the realm of frequency analysis, PSD estimation follows Eqs. 1 and 2. The PSD can be estimated through two main categories of methods: non-parametric and parametric approaches. Among the non-parametric methods, the periodogram (PD) stands out as a significant technique [11].
$$P^c(f) = \frac{1}{f_s N}\left|\sum_{n=0}^{N-1} x_n^c\, e^{-j2\pi f n}\right|^2, \quad -\frac{f_s}{2} < f < \frac{f_s}{2} \tag{1}$$

where $x_n^c$ represents the temporal data of channel c, consisting of N samples [2]. $P_W^c$ is utilized to calculate the relative PSD of the signal in channel c within the frequency range w = [w1, w2]:

$$P_W^c = \frac{\sum_{f=w_1}^{w_2} P^c(f)}{\sum_{f=0}^{f_s/2} P^c(f)} \tag{2}$$
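As an illustration (not code from the paper), Eqs. 1 and 2 can be computed directly with numpy: the periodogram from the squared FFT magnitude, and the relative band power as the ratio of band power to total power. The 128 Hz sampling rate matches the dataset described later; the 10 Hz test tone is made up.

```python
import numpy as np

def periodogram(x, fs):
    """Eq. 1: P(f) = |DFT(x)|^2 / (fs * N), over the positive frequencies."""
    N = len(x)
    X = np.fft.rfft(x)
    P = (np.abs(X) ** 2) / (fs * N)
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    return f, P

def relative_band_power(x, fs, band):
    """Eq. 2: power in [band[0], band[1]] divided by total power up to fs/2."""
    f, P = periodogram(x, fs)
    in_band = (f >= band[0]) & (f <= band[1])
    return P[in_band].sum() / P.sum()

# A 10 Hz sinusoid sampled at 128 Hz: almost all power falls in the 8-12 Hz band.
fs = 128
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 10 * t)
ratio = relative_band_power(x, fs, (8, 12))
```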

The procedure for determining the Power Spectral Density of a signal consists of
the following steps:
• Signal Preprocessing: Prepare the signal for analysis. Apply techniques like filtering, windowing, and other methods to remove noise or artifacts that may disrupt the power spectral density estimation.
• Segmentation: The signal is partitioned into smaller segments or windows,
typically, with overlapping, to enhance the precision of PSD.
• Windowing: Each segment is subjected to multiplication by a window function
to mitigate spectral leakage. Windowing reduces spectral leakage via window
functions like Hamming, Hanning, Blackman, etc.
• Discrete Fourier Transform (DFT): To transform the signal from the time domain
to the frequency domain, it is imperative that the Discrete Fourier Transform is
applied to every segmented window of the signal.
• Power Calculation: For each segment, the squared magnitude of the DFT output
is used to calculate the power density.
• Averaging: To estimate the power distribution over frequencies of the signal, all
PSD estimates are averaged.
The PSD of a single subject during an epileptic seizure is depicted in Fig. 3.
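The steps above are exactly what Welch's method automates; a minimal sketch using scipy on a synthetic signal (an assumption for illustration — the paper does not specify its toolchain, and the 10 Hz rhythm is a stand-in for a real EEG channel):

```python
import numpy as np
from scipy.signal import welch

fs = 128                        # sampling rate of the EEG dataset (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic "EEG": a 10 Hz rhythm plus noise.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

# Welch PSD: segmentation, Hann windowing, per-segment DFT, power
# calculation, and averaging are all handled internally (50% overlap).
f, Pxx = welch(x, fs=fs, nperseg=256)
peak = f[np.argmax(Pxx)]        # dominant frequency, expected near 10 Hz
```

Statistics of `Pxx` (mean, standard deviation, minimum, maximum) per epoch are then the kind of features fed to the classifiers below.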
Fig. 3 An illustration of the power spectral density of an EEG signal

3.2 Classifiers

Support vector machine (K-SVM): The Support Vector Machine (SVM) was
initially formulated by Cortes and Vapnik [3] and has gained significant popu-
larity as a classification technique. Primarily, the SVM is employed to partition
the extracted sets of features into two distinct classes by identifying an optimal
hyperplane. The author in [22] used a hybrid SVM model, including kernel-type
parameters and a regularization constant, to improve classification for the detection
of epileptic seizures. They applied genetic algorithm-based GA-SVM and particle
swarm optimization-based PSO-SVM algorithms to select parameter values and
enhance diagnostic applications. In various studies, Support Vector Machines (SVM)
demonstrated higher levels of accuracy [7, 10, 14].
Kernel-SVM: The kernel SVM algorithm is used in machine learning when the data is not linearly separable, i.e., when a straight line (or flat hyperplane) cannot divide the data into distinct categories. The kernel function is a mathematical function that implicitly maps the input data into a new feature space in which the data can be separated by a hyperplane.
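A minimal scikit-learn sketch of this idea (illustrative only; the paper does not disclose its implementation): an RBF-kernel SVM separates two concentric rings, which no straight line can split.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)     # kernel trick: implicit feature map

# The RBF kernel separates the rings; a linear hyperplane cannot.
print(linear.score(X, y), rbf.score(X, y))
```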
Random Forest (RF): This methodology centered on evaluating the efficacy of
the chosen classifiers in detecting epileptic seizures. It conducts data classification
by generating numerous decision trees during the training phase [15]. Within the
Random Forest (RF) framework, each tree is considered an independent classifier,
and its weighted classification outcome contributes to the final classification using a
majority voting technique [13, 15]. The wavelet packet features technique has been
employed, along with the adoption of the RF classifier, as the classifier for identifying
the epilepsy state [24] and gained an accuracy of 84.8%.
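The majority-voting step that combines the trees' outputs can be sketched in plain Python (a generic illustration with made-up votes, not the paper's code):

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class predictions into a final RF decision."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g., five trees vote on one EEG epoch; 'seizure' wins 3-2.
votes = ["seizure", "healthy", "seizure", "seizure", "healthy"]
final = majority_vote(votes)  # "seizure"
```

A weighted variant would multiply each tree's vote by its weight before counting, as described above.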
Naive Bayes (NB): Naive Bayes is a probabilistic classifier that relies on Bayes'
theorem, assuming independence among the features of a given class. The presence
or absence of a particular feature in the Naive Bayes model is estimated through
maximum likelihood [19]. Let D be a training set consisting of n classes, where
each class is identified by its attribute vector Y and corresponding class label.
The attribute vector Y is assigned to the class with the highest posterior
probability, calculated using Eqs. 3 and 4 [19].

P(ci | Y) > P(cj | Y) for 1 ≤ j ≤ n, j ≠ i (3)

where

P(Ci | Y) = P(Y | Ci) P(Ci) / P(Y) (4)

In the given equation, P(Ci) is the prior probability of class Ci, P(Ci | Y) is the
posterior probability of Ci given Y, P(Y) is the prior probability of Y, and
P(Y | Ci) is the likelihood of Y with respect to Ci.
Decision Tree (DT): Decision tree learning is a technique for approximating
discrete-valued functions, in which the learned function is represented as a
decision tree. Alternatively, learned trees can be expressed as collections of
if-then rules to enhance human comprehensibility [1, 16]. These learning methods
are widely favored as inductive inference algorithms and have demonstrated
their effectiveness in various domains, including the acquisition of diagnostic
skills in the field of medicine.
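The four classifiers described above can be compared side by side. The sketch below uses scikit-learn with synthetic stand-in features; the study's actual PSD features and tuned hyperparameters are not reproduced here, so the scores are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for PSD feature vectors (two classes: seizure / non-seizure)
X, y = make_classification(n_samples=204, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

classifiers = {
    "Kernel SVM": SVC(kernel="rbf", C=1.0),       # RBF kernel for non-linear data
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(name, round(clf.score(X_te, y_te), 3))
```

On real EEG data the kernel choice and the regularization constant C would be selected by cross-validation, as the hybrid GA/PSO approaches cited above do automatically.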

4 Evaluation Protocol

The following section presents a description of the dataset and assesses the
performance of the proposed model using appropriate performance metrics.

4.1 Dataset

A total of 204 residents of Nigeria answered the questionnaire. There were 112
(67 male, 45 female) epileptic seizure-prone patients and 92 (67 male, 25 female)
healthy individuals in the dataset. A description of the dataset is given in
Table 1. The EEG device features 14 channels and records at 128 Hz with 16-bit
resolution. Epilepsy patients (subjects prone to epileptic seizures) and control
persons with no history of seizures were studied separately. The internationally
used 10-20 system was applied to determine electrode placement. Training and
testing are split 80:20, and training runs for ten epochs.
258 V. Prakash and D. Kumar

Table 1 Dataset description


Field Description
Dataset name Nigeria epilepsy dataset
Data details Epilepsy:112, Control:92
Number of subjects 204 (123/81)
Electrode count 14 channel
Sampling rate 128 Hz
Task Epilepsy detection
Dataset link https://zenodo.org/record/1252

4.2 Performance Metrics

The effectiveness of the proposed approach is measured using the performance
metrics in Eqs. 5, 6, 7 and 8:

Accuracy = (TP + TN) / (TP + TN + FN + FP) (5)

Sensitivity = TP / (TP + FN) (6)

Specificity = TN / (TN + FP) (7)

where TP = True Positive, FN = False Negative, FP = False Positive, and
TN = True Negative.
F1 Score: The F1 score is a measure of a model's accuracy on a dataset:

F1 Score = 2 * (Recall * Precision) / (Recall + Precision) (8)

The F1 score varies from 0 to 1, with a greater value signifying better
performance. Perfect precision and recall are denoted by an F1 score of 1,
whereas a score of 0 denotes inadequate performance.
Precision is the fraction of predicted positive instances that are truly
positive, while recall is the fraction of actual positive instances that are
correctly predicted.
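Eqs. 5-8 can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts purely for illustration:

```python
def metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. 5-8 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # Eq. 5
    sensitivity = tp / (tp + fn)                      # Eq. 6 (recall)
    specificity = tn / (tn + fp)                      # Eq. 7
    precision = tp / (tp + fp)
    f1 = 2 * (sensitivity * precision) / (sensitivity + precision)  # Eq. 8
    return accuracy, sensitivity, specificity, f1

# Hypothetical counts, for illustration only
print(metrics(90, 80, 10, 20))
```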
An Efficient Kernel-SVM-based Epilepsy Seizure Detection … 259

5 Proposed Framework

The classification model was fed with features corresponding to different time
periods of an individual's EEG recordings. The Welch power spectrum method
supports the feature extraction stage of the classification model. The output
parameters of four distinct classifiers (Kernel SVM, Random Forest, Decision
Tree, Naive Bayes) were studied in order to facilitate comparisons between them.
The entire method is shown as a flowchart in Fig. 4, and Algorithm 1 details
the feature extraction process using the power spectral density approach.
To remove specific artefacts, the EEG signals are pre-processed with a band-pass
filter over a defined frequency range. The EEG signals are segmented with a
fixed window size, and to reduce spectral leakage each segment is multiplied
by a window function (a Hann window in this algorithm).

Algorithm 1 Proposed power spectral density feature extraction with DFT

1: Input: EEG signals EEG, window size Ws, sampling rate Fs, overlap O
2: Output: power spectra POWSEP, frequency resolution FRES
3: Begin
4: POWSEP = [ ], FRES = [ ]
5: /* Preprocess EEG */
6: for all EEG do
7: EEG[F] = bandpass(EEG)
8: end for
9: /* Segment and window EEG */
10: for all EEG[F] do
11: SEG = EEG_SEG(EEG[F], Ws, O)
12: WSEG = hanning(SEG)
13: end for
14: /* Calculate power spectra */
15: for all WSEG do
16: CofDFT = fft(WSEG)
17: POWSEP.append(|CofDFT|^2)
18: end for
19: /* Compute frequency resolution */
20: FRES = Fs / len(EEG)
21: Return POWSEP
22: Return FRES

Fig. 4 Flowchart activity of power spectral density method

The power spectrum of each windowed segment is computed and appended to the
list, and the frequency resolution is obtained from the sampling rate and the
length of the EEG signal.
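A minimal sketch of the Welch-style pipeline of Algorithm 1, using `scipy.signal.welch`, which internally performs the segmentation, Hann windowing, FFT, and periodogram averaging. The band-pass pre-processing step is omitted, and the random signal is only a placeholder for a real 128 Hz EEG channel:

```python
import numpy as np
from scipy.signal import welch

fs = 128                               # sampling rate of the dataset (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal(fs * 10)     # placeholder 10 s single-channel signal

# Hann-windowed, overlapping segments, averaged periodograms (Welch's method)
freqs, psd = welch(eeg, fs=fs, window="hann", nperseg=256, noverlap=128)
print(freqs.shape, psd.shape)          # one power value per frequency bin
```

The resulting `psd` vector (one non-negative power value per frequency bin up to fs/2) is the kind of feature vector fed to the classifiers.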

6 Results and Analysis

Table 2 shows the parameter values obtained for the different classifiers on
the Nigerian dataset. Additionally, the accuracy of the model is depicted
through the receiver operating characteristic (ROC) curve; the best classifier's
ROC curve and AUC (Area Under Curve) value are displayed in Fig. 5. Fivefold
cross-validation is used for validation, yielding fold accuracies of [0.9919,
0.9919, 0.9913, 0.9913, 0.9913] and an average accuracy of 0.9916 for the
Nigerian data in the case of Kernel SVM.
The classification results show the performance of the various classifiers on
the dataset. The Random Forest classifier achieved an accuracy of 89.03%, with
a sensitivity of 80.20% and a precision of 87.80%; its F-1 score of 0.78
indicates a balanced trade-off between precision and sensitivity. The Decision
Tree classifier had a slightly lower accuracy of 88.37% but performed well in
terms of sensitivity, with a score of 83.04%, and accurately predicted 88.18%
of the positive instances out of the total predicted positives. The Naive Bayes
classifier achieved an accuracy of 82.23% but had the highest sensitivity of
94.76%, with a precision of 83.75%. The F-1 score of

Table 2 Performance evaluation for the proposed approach


Classifier Acc.(%) Sen.(%) Prec(%) F-1 Score
Random forest 89.03 80.20 87.80 0.78
Decision tree 88.37 83.04 88.18 0.76
Naive Bayes 82.23 94.76 83.75 0.80
Proposed K-SVM 93.09 92.16 91.58 0.82

Fig. 5 ROC curve with


AUC value for Kernel-SVM

0.80 reflects its high sensitivity, although overall performance is limited by
the lower accuracy and precision. The Kernel-SVM classifier performed
exceptionally well, with an accuracy of 93.09%, a sensitivity of 92.16%, and a
precision of 91.58%; its F-1 score of 0.82 indicates a strong overall
performance with high precision. The classification results for the various
performance metrics are illustrated in Figs. 6 and 7.

Fig. 6 Accuracy and sensitivity



Fig. 7 Precision and F1 score

7 Conclusion

Detecting epilepsy by identifying significant seizure-related features in EEG
data shows promise when power spectral density and classification methods are
used. Customized treatment strategies based on an individual's seizure patterns
can significantly improve their quality of life. The Kernel SVM is highly
accurate, achieving a 93.09% success rate in distinguishing between seizure and
non-seizure activity. However, accuracy could be further improved by expanding
the dataset, exploring deep learning algorithms, and integrating other
physiological signals. This research is a critical advancement in medical
diagnostics and paves the way for creating personalized epilepsy detection
systems.

References

1. Almustafa KM (2020) Classification of epileptic seizure dataset using different machine


learning algorithms. Inform Med Unlocked 21:100444
2. Birjandtalab J, Heydarzadeh M, Nourani M (2017) Automated EEG-based epileptic seizure
detection using deep neural networks. In: 2017 IEEE international conference on healthcare
informatics (ICHI). IEEE. https://doi.org/10.1109/ichi.2017.55
3. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.
org/10.1007/bf00994018
4. Craik A, He Y, Contreras-Vidal JL (2019) Deep learning for electroencephalogram (EEG)
classification tasks: a review. J Neural Eng 16(3):031001
5. Dhar P, Garg VK (2023) Detection of epileptic seizure using a combination of discrete wavelet
transform and power spectral density. In: International conference on innovative computing
and communications. Springer Nature Singapore, pp 637–646
6. Donos C, Dümpelmann M, Schulze-Bonhage A (2015) Early seizure detection algorithm based
on intracranial EEG and random forest classification. Int J Neural Syst 25(05):1550023
7. Fasil O, Rajesh R (2019) Time-domain exponential energy for epileptic EEG signal classifica-
tion. Neurosci Lett 694:1–8

8. Ghayab HRA, Li Y, Abdulla S, Diykh M, Wan X (2016) Classification of epileptic EEG signals
based on simple random sampling and sequential feature selection. Brain Inform 3(2):85–91
9. Ghayab HRA, Li Y, Siuly S, Abdulla S (2018) Epileptic EEG signal classification using
optimum allocation based power spectral density estimation. IET Signal Proc 12(6):738–747.
https://doi.org/10.1049/iet-spr.2017.0140
10. Hassan AR, Subasi A (2016) Automatic identification of epileptic seizures from EEG signals
using linear programming boosting. Comput Methods Programs Biomed 136:65–77
11. Kiymik MK, Subasi A, Ozcalık HR (2004) Neural networks with periodogram and autoregres-
sive spectral analysis methods in detection of epileptic seizure. J Med Syst 28:511–522
12. Liu S, Wang J, Li S, Cai L (2023) Epileptic seizure detection and prediction in EEGs using
power spectra density parameterization. IEEE Trans Neural Syst Rehabil Eng
13. McDonald AD, Lee JD, Schwarz C, Brown TL (2014) Steering in a random forest: ensemble
learning for detecting drowsiness-related lane departures. Hum Factors 56(5):986–998
14. Nicolaou N, Georgiou J (2012) Detection of epileptic electroencephalogram based on
permutation entropy and support vector machines. Expert Syst Appl 39(1):202–209
15. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens
26(1):217–222
16. Polat K, Güneş S (2007) Classification of epileptiform EEG using a hybrid system based on
decision tree classifier and fast Fourier transform. Appl Math Comput 187(2):1017–1026
17. Rajaguru H, Kumar Prabhakar S (2017) Power spectral density with correlation dimension for
epilepsy classification from EEG signals. In: 2017 2nd international conference on commu-
nication and electronics systems (ICCES), pp 376–379. https://doi.org/10.1109/CESYS.2017.
8321303
18. Rohira V, Chaudhary S, Das S, Prasad Miyapuram K (2023) Automatic epilepsy detection
from EEG signals. In: Proceedings of the 6th joint international conference on data science &
management of data (10th ACM IKDD CODS and 28th COMAD). Association for Computing
Machinery, New York, NY, USA, pp 272–273. https://doi.org/10.1145/3570991.3570995
19. Sharmila A, Geethanjali P (2016) DWT based detection of epileptic seizure from EEG signals
using naive Bayes and k-NN classifiers. IEEE Access 4:7716–7727
20. Shoeibi A, Khodatars M, Ghassemi N, Jafari M, Moridian P, Alizadehsani R, Panahiazar M,
Khozeimeh F, Zare A, Hosseini-Nejad H et al (2021) Epileptic seizures detection using deep
learning techniques: a review. Int J Environ Res Public Health 18(11):5780
21. Slavič J, Mršnik M, Česnik M, Javh J, Boltežar M (2021) Signal processing. In: Vibration
fatigue by spectral methods. Elsevier, pp 51–74. https://doi.org/10.1016/b978-0-12-822190-7.
00009-8
22. Subasi A, Kevric J, Abdullah Canbaz M (2019) Epileptic seizure detection using hybrid machine
learning methods. Neural Comput Appl 31:317–325
23. Wang G, Deng Z, Choi KS (2015) Detection of epileptic seizures in EEG signals with rule-
based interpretation by random forest approach. In: Advanced intelligent computing theories
and applications: 11th international conference, ICIC 2015, Fuzhou, China, August 20–23,
2015. Proceedings, Part III 11. Springer, pp 738–744
24. Wang Y, Cao J, Lai X, Hu D (2019) Epileptic state classification for seizure prediction with
wavelet packet features and random forest. In: 2019 Chinese control and decision conference
(CCDC). IEEE, pp 3983–3987
25. Yang C, Deng Z, Choi KS, Wang S (2015) Takagi–Sugeno–Kang transfer learning fuzzy logic
system for the adaptive recognition of epileptic electroencephalogram signals. IEEE Trans
Fuzzy Syst 24(5):1079–1094
26. Zhang Y, Liu B, Ji X, Huang D (2017) Classification of EEG signals based on autoregressive
model and wavelet packet decomposition. Neural Process Lett 45:365–378
YOLO Algorithm Advancing Real-Time
Visual Detection in Autonomous Systems

Abhishek Manchukonda

Abstract This research paper presents an overview of the YOLO (You Only Look
Once) Algorithm, a pioneering object detection approach. Introduced in 2015 by
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, YOLO has
become a state-of-the-art solution for object detection. It underwent two
incremental improvements: "YOLO9000: Better, Faster, Stronger" and "YOLOv3: An
Incremental Improvement," refining its capabilities while preserving its core concept.
The paper emphasizes the relevance of object detection for self-driving cars. Cur-
rent autonomous vehicles rely on Lidar technology, but YOLO offers a vision-based
alternative using image data, akin to human navigation, potentially improving safety
and accuracy in challenging conditions. The study delves into Convolutional Neural
Networks (CNNs), essential to the YOLO Algorithm. CNNs extract features and learn
filter values, efficiently handling large image datasets. The paper examines the tran-
sition from traditional Neural Networks to CNNs, addressing real-world computer
vision challenges. The YOLO Algorithm’s architecture is analyzed, demonstrating
simultaneous object localization and detection. The Convolutional Implementation of
Sliding Window streamlines the traditional approach, empowering YOLO to achieve
real-time performance with multiple object detection. The conclusion highlights
YOLO’s significance for future object detection and its potential impact on self-
driving cars. Real-time performance and high accuracy make YOLO essential for
safer and more efficient autonomous vehicles. As research advances, YOLO’s role
in shaping the future of autonomous driving becomes pivotal.

1 Introduction

The abbreviation “YOLO” stands for You Only Look Once and it explains the main
concept of the algorithm. Currently, YOLO is a state-of-the-art algorithm for Object
Detection problems. Since 2015, the authors have presented 2 improvements to the
original YOLO paper: “YOLO9000: Better, Faster, Stronger” and “YOLOv3: An

A. Manchukonda (B)
National Institute of Technology Warangal, Warangal, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 265
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_23

Fig. 1 World from the point of view of self-drive cars [1]

Incremental Improvement." They have described both as only minor changes to the
original idea.

2 Why We Need Object Detection in the Context of Self-Driving Cars

Self-driving cars are the future of public transportation systems. Currently, most
autonomous vehicles use Lidar to detect objects in their surroundings. However, this
is an imperfect solution, as Lidars are expensive and can suffer from accuracy issues
in certain circumstances (e.g., very bright sunlight may cause errors). Using some
additional data would be very helpful to avoid mistakes. This additional data could
be vision (See Fig. 1)—the basic data source based on which we (humans) drive our
cars.

3 What Types of Objects Could be Detected Using YOLO

In the earliest approaches to the object detection problem [2, 3], detectors
were limited to specific types of objects: features had to be hand-coded for
each object and then provided to a classifier. This approach worked for some
objects, such as faces, but did not generalize to all objects. With YOLO we do
not have that problem; it works for all types of objects, regardless of shape,
size, and color.

4 Intuitions Behind Convolutional Neural Networks

The main concept behind YOLO is the CNN (convolutional neural network). This
section provides some intuition regarding how a CNN works and why we need it.

4.1 Why Do We Need CNN

Simple computer vision problems, like handwritten digit recognition (e.g., the
MNIST dataset), can be solved using traditional neural networks. We take an
input data matrix of size m × n, convert it into an (m · n) × 1 vector, multiply
it by the NN weights, add a bias, apply some nonlinear function (e.g., the
sigmoid σ), and obtain the first layer of the NN. We repeat this until we reach
the last NN layer, which represents the NN prediction. For a NN with one hidden
layer, this looks as follows:

z1 = (a0 × w1) + b1
a1 = σ(z1)
z2 = (a1 × w2) + b2
a2 = σ(z2)

Given an input vector a0 , it undergoes a transformation in the first hidden layer


(z1 ) before non-linear activation. The result, a1 , represents the output after activation.
This transformation involves a weight matrix w1 connecting a0 and a1 , accompanied
by bias b1 . The same process repeats for z2 and a2 using analogous parameters.
We can solve the MNIST problem using a traditional NN because it contains a
relatively small number of features. Let us assume that the first layer contains
1000 nodes. In that case, for the first layer, we will have to train 785,000
parameters (Fig. 2):

(28 × 28) image × 1000 hidden-layer nodes + 1000 biases
= 784 × 1000 + 1000 = 785,000 parameters

With modern computers it is feasible to store and optimize that number of
parameters, but what if instead of a 28 × 28 gray picture we have a
1000 × 1000 RGB picture?

(1000 × 1000) image × 3 RGB channels × 1000 hidden nodes
= 3,000,000,000 parameters

Such a large number of parameters causes problems such as:

Fig. 2 Neural network for MNIST dataset [4]

• It is very hard to store 3 billion parameters in memory; most modern GPUs do
not have enough memory.
• Optimizing 3 billion parameters is computationally expensive; training would
take a lot of time.
• Overfitting: with too many parameters it is really hard to find the "global
optimum", so the model will work better on data used for training than on new
data (which is an unfortunate situation).
So, what should we do instead if we want to work on real-world pictures? We will
come back to this problem later because to solve it we need to introduce the concept
of Edge Detection.

4.2 Edge Detection

One of the oldest ways of solving computer vision problems is edge detection.
The edge detection algorithm is performed using a matrix called a mask and an
operation called convolution. The convolution expression is:

g(x, y) = (ω ∗ f)(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} ω(s, t) f(x − s, y − t) (1)

Example cells from the detected edge matrix (Fig. 3):

f(1, 2) = (0 × −1) + (0 × −1) + (0 × −1) + (10 × 0) + (10 × 0) + (10 × 0)
+ (10 × 1) + (10 × 1) + (10 × 1)
= 0 + 0 + 0 + 0 + 0 + 0 + 10 + 10 + 10 = 30

f(3, 4) = (10 × −1) + (10 × −1) + (10 × −1) + (10 × 0) + (10 × 0) + (10 × 0)
+ (10 × 1) + (10 × 1) + (10 × 1)
= −10 − 10 − 10 + 0 + 0 + 0 + 10 + 10 + 10 = 0

Fig. 3 Example convolution operation [5]

Fig. 4 Example edge detection Sobel filter for angles: 0°, 45° and 90°
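The worked example above can be reproduced with a small NumPy sketch. Note that, like most deep-learning frameworks, `correlate2d` slides the unflipped mask over the image (cross-correlation), which matches the element-wise products in the example; true convolution per Eq. 1 would flip the mask first. The test image is an assumed 6 × 6 array with a horizontal edge:

```python
import numpy as np
from scipy.signal import correlate2d

# 6x6 test image: top two rows 0, the rest 10 (a horizontal edge)
img = np.vstack([np.zeros((2, 6)), 10 * np.ones((4, 6))])

# Horizontal edge mask (Prewitt-style), as in the worked example
mask = np.array([[-1, -1, -1],
                 [ 0,  0,  0],
                 [ 1,  1,  1]])

# 'valid' slides the mask over every fully covered position
edges = correlate2d(img, mask, mode="valid")
print(edges)   # rows near the edge give 30, uniform regions give 0
```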

4.3 Masks Examples

In traditional computer vision algorithms, filter values were hand-engineered. The


most basic examples of hand-engineered filters are: Sobel Filter or Haar Filter. They
allow us to detect relatively simple image features (see Figs. 4 and 5).

4.4 What if We Don’t Know Filter Values

Using hand-engineered filter values we can detect horizontal or vertical edges
quite well, but what if we want to detect more sophisticated features, like the
edges of a cat? This is where deep learning comes in: instead of using
hand-engineered filter values, we can use a self-learning algorithm that finds
the right values by itself (as in Fig. 6).
What is interesting at this point is the fact that regardless of the image size, we
have the same number of parameters to train. No matter if our image is 20 × 20 pixels
or 1000 × 1000, we need to train the same number of parameters – the number of

Fig. 5 Detected edges after applying Sobel filter [6]

Fig. 6 Cat features recognized by individual CNN layers [7]

values in the filter mask (see Fig. 7). This is one of the main reasons why
CNNs are so popular in real-life computer vision problems.

5 Architecture of Convolutional Neural Network

Quick reminder about the basics of CNN and introduction of the naming convention
used further in this paper.

Fig. 7 Instead of using hand-engineered filter values we use learned values [5]

Fig. 8 Convolution operation example [8]

5.1 Convolution Layer

This layer applies the convolution operation to the image using the filter
mask. The values of the filter mask are the parameters that we train (see
Fig. 8).

5.2 Convolution Operation on Volume

The previous examples contained only one channel, but in real life we usually
have more: a color image has three channels (red, green, and blue). In that
case, every filter needs three "sub-filters", one per channel. We apply the
convolution operation to each channel and then sum up the results (see Fig. 9).

Fig. 9 Convolution operation on volume example [9]

Fig. 10 Example of using 2 filters [9]

5.3 Multiple Filters

To detect multiple types of features in the previous layer, we should use
multiple filters. For example, one filter will detect vertical edges and
another horizontal edges. The last number in the output dimensions corresponds
to the number of filters employed in the convolution operation (see Fig. 10).

5.4 Padding

To avoid under-representation of edge pixels, it is worth adding one extra
layer of pixels around the original image (see Fig. 11).

Fig. 11 Example of 1-pixel padding [9]

Fig. 12 Example of stride = 2 [9]

5.5 Stride

Stride determines the number of cells that the filter moves in the input to calculate
the cell in the output (See Fig. 12).
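Padding and stride together determine the spatial size of a layer's output through the standard formula floor((n + 2p − f)/s) + 1, where n is the input size, f the filter size, p the padding, and s the stride. A small sketch (the input sizes below are illustrative, not taken from the figures):

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Spatial output size of a convolution: floor((n + 2p - f)/s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(7, 3, p=0, s=2))   # stride 2 shrinks a 7-wide input to 3
print(conv_output_size(6, 3, p=1, s=1))   # 1-pixel padding keeps a 6-wide input at 6
```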

5.6 Pooling Layer

The second type of layer used in CNNs is the pooling layer. There are two types
of pooling layers: average pooling and max pooling. The pooling layer is
primarily used to decrease the dimensions of the outputs (see Fig. 13).

Fig. 13 Example of pooling layer [9]

Fig. 14 Example CNN—LeNet 5 from 1998 [9]

5.7 Fully Connected Layer

The fully connected (FC) layer constitutes a stage where each neuron from the
preceding layer establishes a connection with every neuron present in the subsequent
layer. Its principle is the same as the traditional Neural Network.

5.8 Example CNN Architecture

Traditionally CNNs have been used to solve image recognition problems. The last
FC layer was a prediction layer. In the image below (Fig. 14), we can see LeNet-5.

5.9 Convert the FC Layer to the Convolutional Layer

An interesting "trick" is that we can implement the FC layer using convolution
filters. The number of filters should correspond to the number of nodes that we
wanted to achieve by applying the FC layer. Mathematically it is exactly the
same; we have the same number of features to train, but we will see later that
this "trick" is very useful for the YOLO algorithm (see Fig. 15).

Fig. 15 Example of FC layer converted into convolution [5]

6 Object Localization

Object localization is a simplified version of the object detection problem, in
which an image contains only one big object of some class. The goal of the NN
is to predict whether an image contains any object and, if so, to predict the
coordinates of the central point, the width, and the height. A very similar
concept will be used further in this paper to describe how the YOLO algorithm
works. For the purposes of this paper, to avoid problems with the size of an
image, let us assume that the top-left corner is described as point (0, 0) and
the bottom-right corner as point (1, 1) (example outputs in Fig. 16).
A "?" means that the corresponding element of the vector does not matter, so it
can hold any value. The loss function can be designed like this:

L(ŷ, y) = (ŷ1 − y1)² + · · · + (ŷn − yn)²  if y1 = 1
L(ŷ, y) = (ŷ1 − y1)²                        if y1 = 0 (2)

where y1 = po , y2 = bx , y3 = by etc.
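Eq. 2 can be sketched as follows. The 8-element vector layout (po, bx, by, bw, bh, followed by class probabilities) mirrors the example outputs of Fig. 16, and the numeric values below are hypothetical:

```python
import numpy as np

def localization_loss(y_hat, y):
    """Squared-error loss of Eq. 2: all components are penalized when an
    object is present (y1 = 1); only the objectness term when absent (y1 = 0)."""
    if y[0] == 1:
        return float(np.sum((y_hat - y) ** 2))
    return float((y_hat[0] - y[0]) ** 2)

# y = [po, bx, by, bw, bh, c1, c2, c3]; hypothetical target and prediction
y_true = np.array([1.0, 0.5, 0.5, 0.3, 0.4, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.4, 0.5, 0.3, 0.4, 0.1, 0.8, 0.1])
print(localization_loss(y_pred, y_true))
```

When y1 = 0 the remaining components are the "?" entries, so the loss simply ignores them.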

7 Sliding Window

If an image contains more than one object we can no longer use Object Localization,
so we need to find another way to detect objects. One of the popular approaches is
the Sliding Window Algorithm. It is very simple but provides good enough results.

Fig. 16 Example object localization output

7.1 How Algorithm Works

In the Sliding Window Algorithm, we take part of an image (called a window),
feed it forward through a Neural Network (or any other classifier), and end up
with a prediction of whether this part of the image contains an object. We
repeat these steps for each part of the picture (that is why we call it sliding
a window) and, as a result, we have predictions for all fields in the image.

7.2 Why It is Not Perfect

The sliding window approach is relatively simple, but unfortunately it has a
few disadvantages:
• We know neither the size nor the shape of the object we are looking for.
• We need to feed-forward thousands of image crops through a classifier, which
is computationally expensive.
• We do not know which stride to choose: if we choose one too small, we will
process nearly the same image many times; if we choose one too big, accuracy
will be really poor.
So how can we do this in a smarter way?
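The cost concern above is easy to quantify: a naive sliding window enumerates a large number of crops, each of which must be classified. A minimal sketch (the image, window, and stride sizes are illustrative):

```python
def sliding_windows(height, width, win, stride):
    """Enumerate the top-left corner of every window position."""
    return [(r, c)
            for r in range(0, height - win + 1, stride)
            for c in range(0, width - win + 1, stride)]

# A 224x224 image with a 64x64 window produces many crops to classify
print(len(sliding_windows(224, 224, 64, 8)))    # small stride: many crops
print(len(sliding_windows(224, 224, 64, 32)))   # large stride: cheaper, coarser
```

Each position would require a full forward pass through the classifier, which is exactly the redundancy the convolutional implementation below removes.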

Fig. 17 Example of conv implementation of sliding window [10]

8 Convolutional Implementation of Sliding Window

We can perform the Sliding Window Algorithm much faster and more efficiently
using its Convolution implementation. The idea of Convolutional Implementation of
a Sliding Window was first introduced in Feb. 2014 by Sermanet,1 Eigen,2 Zhang,3
Mathieu,4 Fergus,5 LeCun6 in “OverFeat: Integrated Recognition, Localization and
Detection using Convolutional Networks” paper. One year later it was used in the
original YOLO paper.

8.1 How It Works

Using convolutions, we can share a lot of computation. We apply a convolution
operation to the original image and "accumulate" knowledge about individual
regions in the cells of the next layer. Then we apply the next convolution,
"accumulate the accumulated" knowledge, and so on. We end up having information
about whole regions in a single cell of the last output. More on that can be
found in the original paper (see Fig. 17).

1 New York University, [email protected].


2 New York University, [email protected].
3 New York University, [email protected].
4 New York University, [email protected].
5 New York University, [email protected].
6 New York University, [email protected].

9 You Only Look Once

Finally, we’ve reached YOLO—You Only Look Once. YOLO combines ideas from
Convolution Implementation of Sliding Window and Object Localization.

9.1 Working Procedure

The YOLO Algorithm employs a Convolution Implementation of Sliding Window on


the input image, resulting in the segmentation of the image into a grid cell structure.
Each cell amalgamates contextual information from its own vicinity. These cells
represent vectors akin to those utilized in Object Localization. Encoded within are:
the probability of the cell containing the centroid of an object, the coordinates of
this centroid (adopting a coordinate system, where the top-left corner of each grid
cell corresponds to (0, 0) and the bottom-right corner corresponds to (1, 1)), the
dimensions of the object (which may exceed unity due to the cell’s cognizance of
neighboring cells), and the probabilities associated with various object classes (as
depicted in the example output for a single cell in Fig. 19). For example, in
the original YOLO architecture the last output is 7 × 7 × 30 (see Fig. 18),
which means that we have 49 grid cells, each described by a vector of 30
elements.
Following the convolutional neural network (CNN) propagation of our image,
predictions are generated for each individual cell. Achieving the ultimate predictions
necessitates the execution of two sequential procedures (Refer to the outcome of these
two steps depicted in Fig. 20).
• Exclude all predictions where the value of po is below a certain predefined
threshold. Therefore, eliminate any predictions where the probability that a cell
contains the central point of an object is less than e.g., 50%.

Fig. 18 Original YOLO architecture [11]



Fig. 19 Example cell output for (3 × 3) × 8 output layer

Fig. 20 Predictions for (13 × 13) grid cell after Non-max suppression [11]

• Employ non-maximum suppression to eliminate any remaining redundant
predictions.

9.2 Non-max Suppression

When two predictions intersect, we need to decide whether both should be kept
(because they detect two different objects) or whether they cover the same
object and one of them should be removed. To solve this problem, we introduce
the idea of IoU.
IoU (Intersection over Union) is, as the name suggests, the fraction:
intersection surface area / union surface area.
In non-max suppression we compare the surface of the intersection with the
surface of the union; when the IoU is bigger than some threshold value (e.g.,
0.6, depending on the implementation), we keep only the detection with the
bigger po, the probability that the cell contains a central point of an object.
When the IoU is smaller than the threshold value, we keep both predictions. We
perform non-max suppression for all intersecting predictions (an example result
of the algorithm is shown in Fig. 21).
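IoU and non-max suppression can be sketched as follows. The boxes use an assumed (x1, y1, x2, y2) corner format, the 0.6 threshold mentioned above, and hypothetical scores:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, thresh=0.6):
    """Keep the highest-scoring box, drop overlaps above thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # the overlapping pair collapses to one
```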

Fig. 21 Example of non-max suppression algorithm result [12]

9.3 Anchor-Box

The concept of anchor boxes was first presented in the YOLO9000 research. The
notion is fairly straightforward. Consider the scenario where two objects, differing
in shapes or sizes, share a common central point within the same grid cell. In the
original YOLO paper, only one object could be detected under such circumstances.
However, the enhanced algorithm version addresses this limitation by incorporating
a more ’deep’ final layer within the convolutional neural network (CNN), which
divides the image into grid cells. This augmented layer generates multiple predictions
instead of a singular one. The dimensions attributed to each grid cell result from the
multiplication of the count of anchor boxes with the dimensions of the original
prediction. This methodology facilitates the identification of multiple objects within
individual grid cells.
In YOLO9000, the scholars employed a set of 5 anchor boxes, while YOLOv3
integrated 9 boxes. Instead of manual anchor curation, the authors applied K-means
clustering to the bounding boxes from the training dataset, enabling the automatic
discovery of well-suited anchor dimensions (example output in Fig. 22).
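A minimal sketch of that clustering idea, assuming each training box is reduced to a (width, height) pair and using 1 − IoU as the distance measure, as in YOLO9000. The function names and the simple mean-based centroid update are illustrative assumptions:

```python
import random

def iou_wh(wh, centroid):
    """IoU of two boxes that share a corner, so only width/height matter."""
    inter = min(wh[0], centroid[0]) * min(wh[1], centroid[1])
    union = wh[0] * wh[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance = 1 - IoU to find anchor dimensions."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # assign every box to the centroid with the highest IoU (lowest 1 - IoU)
        clusters = [[] for _ in range(k)]
        for wh in boxes:
            best = max(range(k), key=lambda j: iou_wh(wh, centroids[j]))
            clusters[best].append(wh)
        # recompute each centroid as the mean width/height of its cluster
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```

Running this on the training-set bounding boxes yields k anchor shapes that match the dataset's typical object aspect ratios, instead of hand-picked ones.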

10 Conclusions

YOLO is a real-time, versatile algorithm for object detection across various contexts. It combines remarkable operational speed with high precision (refer to Fig. 23), making it apt for solving real-life challenges. In the future, its potential applications extend to autonomous vehicles, contributing to a safer future for the public.
YOLO Algorithm Advancing Real-Time Visual Detection … 281

Fig. 22 Example output for CNN with 2 anchor boxes

Fig. 23 Comparison of the best object detection algorithms [13]



References

1. Source of many intuitions and ideas about CNNs and object detection problems; also the source of the graphics for converting an FC layer to a convolutional layer and of the predictions for a (13 × 13) grid cell after non-max suppression. https://www.coursera.org/learn/convolutional-neural-networks
2. Viola-Jones algorithm (2001) The first efficient face detector. https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/viola-cvpr-01.pdf
3. Dalal and Triggs (2005) Histograms of oriented gradients for human detection. https://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
4. Edges detected using the Sobel operator. https://en.wikipedia.org/wiki/Sobel_operator#/media/File:Bikesgraysobel.jpg
5. Original YOLO implementation; source of the object detection algorithm comparison. https://pjreddie.com/darknet/yolo/
6. Source of the first image (How self-driving cars see the world). https://towardsdatascience.com/how-do-self-driving-cars-see-13054aee2503
7. Sobel filter. https://en.wikipedia.org/wiki/Sobel_operator
8. CNN cat feature visualizations. http://mcogswell.io/blog/why_cat_2/
9. Convolutional layer image. https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050
10. Convolution operation on volume, multiple filters, stride, padding, pooling, LeNet-5. https://indoml.com/2018/03/07/student-notes-convolutional-neural-networks-cnn-introduction/
11. Original YOLO paper. https://arxiv.org/pdf/1506.02640v1.pdf
12. MNIST image source. https://m-alcu.github.io/blog/2018/01/13/nmist-dataset/
13. OverFeat: integrated recognition, localization and detection using convolutional networks; source of the example of the convolutional implementation of sliding windows graphics. https://arxiv.org/pdf/1312.6229.pdf
14. YOLO9000. https://arxiv.org/pdf/1612.08242.pdf
15. YOLOv3. https://arxiv.org/pdf/1804.02767.pdf
16. First image source. https://towardsdatascience.com/how-do-self-driving-cars-see-13054aee2503
17. MNIST dataset. http://yann.lecun.com/exdb/mnist/
18. Convolution operation equation. https://en.wikipedia.org/wiki/Kernel_(image_processing)#Details
19. Non-max suppression graphics. https://appsilon.com/object-detection-yolo-algorithm/
Optimizing Feature Selection in Machine
Learning with E-BPSO: A
Dimensionality Reduction Approach

Rajalakshmi Shenbaga Moorthy , K. S. Arikumar ,


Sahaya Beni Prathiba , and P. Pabitha

Abstract In the era of informatics, the effectiveness of machine learning models


is compromised due to the challenge of dimensionality in the data. The presence
of redundant and irrelevant features significantly increases computational complex-
ity, posing a central obstacle in the extraction of valuable insights from the extensive
dataset. Any machine learning model's performance suffers because of the curse of dimensionality. To improve the classifier's performance, feature selection is applied before applying the machine learning model. Feature selection is accomplished using Enhanced Binary Particle Swarm Optimization (E-BPSO) with the aim of boosting the performance of the K-Nearest Neighbor (K-NN) classifier, and is evaluated on benchmark real-world datasets. The conventional BPSO suffers from poor exploration, which leads to premature convergence. To overcome this drawback, E-BPSO is proposed. The enhancement is made by integrating a self-adaptive velocity that drives each particle so as to balance exploration and exploitation. The performance of the proposed
E-BPSO is evaluated against the traditional binary particle swarm optimization algo-
rithm and genetic algorithm, considering metrics like accuracy, fitness, root mean
square error, and dimensionality reduction ratio.

Keywords Enhanced binary particle swarm optimization · K-nearest neighbor ·


Feature selection · Curse of dimensionality · Binary particle swarm optimization

R. Shenbaga Moorthy (B)


Sri Ramachandra Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher
Education and Research, Chennai 600 116, India
e-mail: [email protected]
K. S. Arikumar
Department of Data Science and Business Systems, SRM University – Kattankulathur, Chennai,
India
S. B. Prathiba
Centre for Cyber Physical Systems, School of Computer Science and Engineering, Vellore Institute
of Technology, Chennai 600127, India
P. Pabitha
Department of Computer Technology, Anna University, Madras Institute of Technology Campus,
Chennai 600044, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 283
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_24
284 R. Shenbaga Moorthy et al.

1 Introduction

In this digital era, a significant amount of information is produced as dimensions expand due to factors such as Internet of Things (IoT) devices, distributed computing, social computing, and similar sources. These data pose a serious problem when analyzing them for insights. The visual depiction of analyzing the data is shown in Fig. 1. As the data has huge dimensions, it challenges the performance of
the machine learning algorithm. Feature selection is a promising solution for handling
data with high dimensions [1, 2]. Most important features are selected by applying a
criterion function which eventually helps the learning algorithm to improve its accu-
racy. Also, feature selection aids in compressing the data thereby reducing the space
occupied by the original dataset. Feature selection is a prime process in the data
preprocessing task which helps machine learning algorithms for producing better
insights. Feature selection is a challenging task in image recognition, text classification, spam classification, etc. The problem of high computation and complexity in the above-mentioned techniques made researchers focus on metaheuristic algorithms for feature selection. As metaheuristic algorithms have a broad spectrum of applications, including bioinformatics, the expectation that they can find an optimal subset of features is high. Thus, researchers believe that applying metaheuristic algorithms for feature selection will be a more promising solution [3].
algorithms like genetic algorithm [4], particle swarm optimization [5], forest opti-
mization [6], ant colony optimization [7], and dragonfly optimization algorithm [8]
exist for selecting the best subset of features. Various mechanisms available for feature selection are specified in Fig. 2. In accordance with the principles outlined in the No Free Lunch Theorem [9], no single metaheuristic algorithm best identifies the optimal feature subset for every problem. This motivated the authors to enhance the performance of BPSO for selecting the optimal subset of features. From the
literature [5], it is noted that the performance of K-NN can be enhanced through

Fig. 1 Analyzing the data


Optimizing Feature Selection in Machine Learning with E-BPSO … 285

Fig. 2 Various feature selection mechanisms

feature selection. Thus, in this paper, K-NN has been used as a classifier to validate
the features selected by E-BPSO.
The remainder of the paper is structured as follows: Sect. 2 provides an overview
of prior research on feature selection using a range of metaheuristic algorithms.
Section 3 delves into the operation of the newly proposed E-BPSO. In Sect. 4, we
present a comparison between the proposed E-BPSO and the traditional BPSO
using benchmark datasets. Lastly, Sect. 5 offers a conclusion and discusses potential
avenues for future research.

2 Related Works

Feature selection has been performed on the KDD Cup dataset in WEKA and the
performance had been compared with the conventional way of selecting features.
The accuracy obtained was 99.794% when using decision tree as a classifier [10].
Multi-objective binary genetic algorithm integrated with adaptive mechanism was
designed for selecting the essential features. The algorithm includes five crossover

probabilities, each assigned a different value. The fitness function considered for evaluating an individual is based on the error rate and the number of features selected [4].
Improved binary particle swarm optimization was designed with the aid of improving
exploration and exploitation in selecting optimal feature subset [5]. Feature selec-
tion based on simulated annealing, hybrid particle swarm optimization, and fuzzy
K-means called FUFPS was designed to choose optimal feature subset for vari-
ous benchmarking dataset from UCI repository [11]. Modified Binary Sine Cosine
Algorithm (MBSCA) had been used for selecting the necessary features thereby
eliminating the irrelevant features. Beta and delta agents were introduced into the
conventional binary sine cosine algorithm (BSCA) and the algorithm was evaluated
on medical datasets against GA and BSCA [12].
Simple variance filter was used to maximize the accuracy of the predictive
model. Since the wrapper methods are computationally complex, the filter method
was used to select the features and the designed method was applied on gene
expression data [13]. Only necessary features had been selected using an ensem-
ble boosting framework which consists of XGBoost and two-step selection mecha-
nism [14]. Harris Hawks optimization algorithm and fruitfly optimization algorithm
were hybridized with the intention of choosing the essential features and the results
were more promising than applying the conventional algorithms [15]. The performance of the vortex search algorithm for selecting features had been improved by including chaotic maps [16]. A dispersed foraging slime mould algorithm had been used to assess the quality of the attributes and to select only the informative features; the method improves classification accuracy with a reduced set of features [17]. A centroid mutation-based search and rescue optimization algorithm was used to find the quality and necessary features from medical instances, with the intention of avoiding the premature convergence of the conventional algorithm [18]. Table 1 summarizes some of
the existing methodologies in feature selection.
According to the related works, stagnation and convergence to local optimal solutions are the main challenges to be addressed when applying a feature selection algorithm to a particular dataset, even though many algorithms are available for finding the relevant features. The authors of this paper use E-BPSO to avoid early convergence, stagnation, and getting stuck in non-optimal solutions on benchmarking datasets collected from the UCI repository.

3 Enhanced Binary Particle Swarm Optimization


(E-BPSO)

A common combinatorial optimization problem in knowledge engineering is fea-


ture selection, which seeks to create a model with fewer features while enhancing
performance. The intention of the feature selection algorithm is to obtain the finest
subgroup of features without sacrificing the model’s accuracy. This research study
intends to identify a subset of features as best as possible using enhanced binary

Table 1 State-of-the-art methods in feature selection


References   Feature selection method                    Classifier                                  Dataset                                     Performance
[19]         Gorilla troops optimizer on bird swarms     K-NN                                        NSL-KDD, CICIDS-2017, UNSW-NB15, NoT-IoT    Accuracy 95.5% (NSL-KDD), 98.7% (CICIDS-2017), 81.5% (UNSW-NB15), 81.5% (NoT-IoT)
             (GTO-BSA)
[20]         Salp swarm algorithm                        K-NN                                        Datasets from UCI                           97.4% (average accuracy)
[21]         Information gain and gain ratio             Bagging, multilayer perceptron, J48,        IoT intrusion dataset (IoTID20), NSL-KDD    Accuracy 99.97%
                                                         and K-NN
[22]         Shuffled frog leaping algorithm             K-NN                                        Datasets from UCI                           Friedman test score = 1.4156E+00
[23]         Dispersed foraging slime mold algorithm     K-NN                                        Datasets from UCI                           Rank 1 in minimizing the error rate

particle swarm optimization. The working of E-BPSO for selecting the optimal subset of features is shown in Fig. 3. Stagnation, entrapment in local optima, and early convergence occur when BPSO is applied to the discrete feature selection problem. To overcome this, E-BPSO has been proposed, with the goal of avoiding stagnation and finding the global optimal solution. The enhancement over the original BPSO is the inclusion of a self-adaptive velocity that balances exploration and exploitation. Starting with N particles, E-BPSO assigns each

Fig. 3 Proposed E-BPSO



Fig. 4 Working of E-BPSO

particle a value of either 1 or 0, where 1 denotes the existence of a feature and 0


denotes its absence. Each particle's performance is assessed through a 1-NN classifier.
The flowchart representing the working of E-BPSO is shown in Fig. 4. The fitness
of each particle is evaluated based on accuracy and Dimensionality Reduction Ratio
(DRR) which is specified in (1).

fit_Pi ← α ∗ Accuracy + β ∗ DRR    (1)

where α and β are the weights associated with accuracy and DRR, respectively. Each particle updates its velocity as specified in (2).

v_id(t + 1) ← (P_rand(t) − P_i(t)) ∗ v_id(t) + C1 ∗ rand ∗ (PBestPos_i − P_i) + C2 ∗ rand ∗ (GBestPos − P_i)    (2)

where P_rand(t) represents a random particle at the t-th time step, v_id(t + 1) is the velocity of the i-th particle in the d-th dimension at iteration t + 1, PBestPos_i represents the personal best location of the i-th particle, GBestPos represents the global best location of the swarm, and C1 and C2 are acceleration coefficients. To convert the continuous particle values to binary values, a V-shaped transfer function is used, as specified in (3).

T-velocity ← |v_id / √(1 + v_id²)|    (3)

Based on the value of the transfer function, the particle's position is computed using (4).

P_id ← 1 if rand < T-velocity, 0 otherwise    (4)

The algorithm depicting the working of E-BPSO is specified in Algorithm 1:

Algorithm 1: E-BPSO(D)

input : Dataset D
output: Feature subset FS
1  Swarm ← Generate N particles randomly ∈ {0, 1}
2  Initialize PBest, GBest as ∞
3  Initialize PBestPos and GBestPos as 0
4  while t < Max_Iter do
5      for each particle Pi ∈ P do
6          Compute fitness using (1)
7          if fit_Pi ≤ PBest_Pi then
8              PBest_Pi ← fit_Pi
9          end
10     end
11     GBest ← argmin(PBest)
12     Update velocity with the use of (2)
13     Update T-velocity with the use of (3)
14     Update particle's position using (4)
15 end
16 Return GBest
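One iteration of the update rules (2)–(4) can be sketched as follows. This is an illustrative reading of the equations rather than the authors' code; the fitness evaluation via 1-NN (Eq. 1) is left outside the function, and the absolute value in the transfer function is our assumption to keep the probability nonnegative:

```python
import math
import random

def ebpso_step(positions, velocities, pbest_pos, gbest_pos, c1=2.0, c2=2.0):
    """One E-BPSO iteration over binary particles, following Eqs. (2)-(4).

    positions: list of N bit lists (1 = feature selected).
    The (P_rand - P_i) factor is the self-adaptive inertia term of Eq. (2).
    """
    n, d = len(positions), len(positions[0])
    for i in range(n):
        p_rand = list(random.choice(positions))  # random particle P_rand(t)
        for j in range(d):
            v = ((p_rand[j] - positions[i][j]) * velocities[i][j]
                 + c1 * random.random() * (pbest_pos[i][j] - positions[i][j])
                 + c2 * random.random() * (gbest_pos[j] - positions[i][j]))
            velocities[i][j] = v
            # V-shaped transfer function, Eq. (3); abs keeps it in [0, 1)
            t = abs(v) / math.sqrt(1 + v * v)
            # Eq. (4): probabilistically flip the bit based on the transfer value
            positions[i][j] = 1 if random.random() < t else 0
    return positions, velocities
```

In the full algorithm, this step runs inside the Max_Iter loop, with PBest/GBest bookkeeping (lines 5–11 of Algorithm 1) between iterations.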

4 Experimental Results

Using benchmarking datasets acquired from the UCI repository [24], the proposed
E-BPSO is contrasted with various algorithms. Table 2 contains information about
the datasets. The datasets are divided 70:30, which indicates that 70% of the data
will be used for training and 30% for testing. 1-NN is used to assess the features
selected by E-BPSO. The metrics considered for evaluation are mean fitness value,
accuracy, Root Mean Square Error (RMSE), feature selection ratio, and standard
deviation of the fitness value. Other traditional algorithms like GA and BPSO are
compared to the suggested E-BPSO. The experiment is conducted 30 times and the average is taken into account for comparison. Parameters of the algorithms taken for
experimentation are specified in Table 3.

Table 2 Description of dataset taken for experimentation


Dataset                    #Features   #Instances   #Class
Wisconsin breast cancer    10          699          2
Lung cancer                56          32           3
Wine                       13          178          3

Table 3 Parameters of algorithms


Parameter                            E-BPSO, BPSO   GA
#Particles                           10             10
Max iteration                        100            100
Acceleration coefficients C1, C2     2              –
Crossover rate                       –              0.8
Mutation rate                        –              0.1

4.1 Comparison of Accuracy

Accuracy represents the ratio of instances which are classified correctly by 1-NN, as given in (5). Table 4 reports the accuracy of the K-NN classifier (K = 1) without feature selection. Feature selection is used in conjunction with the different methods to enhance the accuracy of 1-NN, as depicted in Fig. 5. For the lung cancer dataset, E-BPSO improves accuracy by 17.68% and 13.59% over GA and BPSO, respectively.

Accuracy ← Instances Correctly Classified / Total number of Instances    (5)

Table 4 Accuracy without feature selection


Dataset                    Without feature selection
Wisconsin breast cancer    95.146
Lung cancer                61.32
Wine                       94.812

Fig. 5 Accuracy of various algorithms

Table 5 Comparison of RMSE


Dataset \ Algorithms       Without FS   GA      BPSO    E-BPSO
Wisconsin breast cancer    0.210        0.223   0.202   0.186
Lung cancer                0.359        0.591   0.543   0.272
Wine                       0.192        0.173   0.092   0.154

4.2 Comparison of Root Mean Square Error (RMSE)

The square root of the ratio of the sum-squared difference between the target values and the classifier outputs to the number of instances in the dataset, as stated in (6), is known as the root mean square error, or RMSE. Table 5 presents the comparison of RMSE for the various algorithms on each dataset. The best values are boldfaced. It is observed that E-BPSO achieves the minimum RMSE for the Wisconsin breast cancer and lung cancer datasets. For the Wine dataset, BPSO achieves a lower RMSE than E-BPSO, and the RMSE of E-BPSO is 40.25% higher than that of BPSO.

RMSE ← √( Σ_{i=1}^{Num_Instances} (ŷ_i − y_i)² / Num_Instances )    (6)

4.3 Comparison of Dimensionality Reduction Ratio

The dimensionality reduction ratio of E-BPSO has been compared with that of GA and BPSO. DRR is computed using (7); a higher DRR means fewer features are retained. It has been observed that the proposed E-BPSO achieves a better DRR than the other algorithms, as shown in Fig. 6. For the Wine dataset, E-BPSO reduces the dimensions by 14.28% and 85.71% relative to BPSO and GA, respectively. This is because the proposed E-BPSO includes a self-adaptive velocity which prevents the particles from falling into local optimal solutions.

DRR ← 1 − selected features / Total number of features    (7)
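The three evaluation metrics in Eqs. (5)–(7) are straightforward to compute; a minimal sketch (the function names are ours):

```python
import math

def accuracy(y_true, y_pred):
    """Eq. (5): fraction of correctly classified instances."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def rmse(y_true, y_pred):
    """Eq. (6): root mean square error between targets and classifier outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def drr(n_selected, n_total):
    """Eq. (7): dimensionality reduction ratio; higher means fewer features kept."""
    return 1 - n_selected / n_total
```

For example, selecting 4 of the 10 Wisconsin breast cancer features (as E-BPSO does in Table 6) gives a DRR of 0.6.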

4.4 Comparison of Features Selected

Features selected by various methodologies are represented in Table 6. The cell’s


numeric entries represent the feature indices of the associated dataset. It is clear
from Table 6 that various algorithms select a unique subset of features which has
impact over the performance of the algorithm. But, E-BPSO chooses minimal subset
of features which results in reduction in storage space without compromising the
accuracy. For example, the E-BPSO chooses four features for Wisconsin breast cancer

Fig. 6 Comparison of dimensionality reduction ratio



Table 6 Comparison of features selected


Dataset \ Algorithms       GA                                       BPSO                   E-BPSO
Wisconsin breast cancer    1, 2, 3, 4, 5, 7, 8, 9, 10               1, 3, 5, 6, 7, 8       2, 3, 5, 9
Lung cancer                1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13,     6, 7, 19, 20, 22, 24   6, 17, 19, 20, 21
                           21, 23, 24, 25, 26, 27, 32, 41, 52
Wine                       1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13   1, 2, 3, 4, 5, 9, 10   1, 3, 4, 7, 9, 10

Table 7 Comparison of mean and standard deviation


Dataset \ Algorithms       GA (Mean / SD)    BPSO (Mean / SD)   E-BPSO (Mean / SD)
Wisconsin breast cancer    0.675 / 0.074     0.605 / 0.086      0.581 / 0.094
Lung cancer                0.472 / 0.045     0.452 / 0.043      0.452 / 0.043
Wine                       0.508 / 0.064     0.449 / 0.111      0.426 / 0.103

dataset whereas GA chooses nine features and BPSO chooses six features. Also, E-
BPSO ranks first in maximizing accuracy of the Wisconsin breast cancer dataset
which is evident from Fig. 5.

4.5 Comparison of Rate of Convergence

The minimum and maximum fitness values obtained for the Wisconsin breast cancer dataset for GA are 0.559313 and 0.832790, respectively, as represented in Fig. 7. Though E-BPSO started with a maximum fitness of 0.838970, it converged to 0.581495 over the course of the iterations for the Wisconsin breast cancer dataset. Similarly, BPSO had a lower maximum fitness of 0.834013 than E-BPSO, but its minimum fitness remains higher than that of E-BPSO. The fitness value converges at nearly the 64th iteration for E-BPSO with a global optimal solution, whereas GA converges at the 95th iteration with a local optimal solution. This shows that the inclusion of the adaptive weight accelerates the particles in a better direction, which tends to converge to the global optimal solution for the proposed E-BPSO. For the lung dataset, the minimum fitness values are 0.438682, 0.411955, and 0.411955 for GA, BPSO, and E-BPSO, respectively, as represented in Fig. 8. In the case of the wine dataset, both BPSO and
E-BPSO have nearly the same fitness, but the minimum fitness of BPSO is 0.362131
and E-BPSO is 0.354568 which is represented in Fig. 9. The mean and standard
deviation of fitness for BPSO, GA, and proposed E-BPSO are represented in Table 7.

Fig. 7 Convergence for Wisconsin dataset

Fig. 8 Rate of convergence for lung cancer dataset

5 Conclusion

A classifier’s performance can be greatly enhanced by maximizing accuracy and


reducing error rate, and this is where feature selection comes in. It has been suggested
that E-BPSO be used to choose the best subgroup of features in order to get around the
problems with traditional BPSO. The suggested E-BPSO strengthens the exploration and exploitation ability in finding the global optimal solution by avoiding local best solutions through adaptive alteration of the velocity. The current velocity is multiplied by the difference between the location of a random particle and that of the current particle, which moves the particle in the direction of the best outcome.

Fig. 9 Rate of convergence


for wine dataset

The suggested E-BPSO’s performance is compared with that of BPSO and GA


in experiments conducted on benchmarking datasets. According to the results, E-
BPSO works better than other algorithms with regard to accuracy, dimensionality
reduction ratio, root mean square error, and mean fitness. Future research will focus
on datasets with class imbalances, and the efficacy of feature selection algorithms
on these datasets will be examined.

References

1. Agrawal P, Abutarboush HF, Ganesh T, Mohamed AW (2021) Metaheuristic algorithms on


feature selection: a survey of one decade of research (2009–2019). IEEE Access 2(9):26766–91
2. Dhal P, Azad C (2021) A comprehensive survey on feature selection in the various fields of
machine learning. Appl Intell 23:1–39
3. Nouri-Moghaddam B, Ghazanfari M, Fathian M (2021) A novel multi-objective forest
optimization algorithm for wrapper feature selection. Expert Syst Appl 1(175):114737
4. Xue Y, Zhu H, Liang J, Słowik A (2021) Adaptive crossover operator based multi-
objective binary genetic algorithm for feature selection in classification. Knowl Based Syst
5(227):107218
5. Moorthy RS, Pabitha P (2022) Accelerating analytics using improved binary particle swarm
optimization for discrete feature selection. Comput J 65(10):2547–69
6. Ghaemi M, Feizi-Derakhshi MR (2016) Feature selection using forest optimization algorithm.
Pattern Recogn 1(60):121–9
7. Kabir MM, Shahjahan M, Murase K (2012) A new hybrid ant colony optimization algorithm
for feature selection. Expert Syst Appl 39(3):3747–63
8. Mafarja MM, Eleyan D, Jaber I, Hammouri A, Mirjalili S (2017) Binary dragonfly algorithm
for feature selection. In: 2017 International conference on new trends in computing sciences
(ICTCS), pp 12–17. IEEE
9. Adam SP, Alexandropoulos SA, Pardalos PM, Vrahatis MN (2019) No free lunch theorem: a review. In: Approximation and optimization, pp 57–82
10. Chae HS, Jo BO, Choi SH, Park TK (2013) Feature selection for intrusion detection using
NSL-KDD. Recent Adv Comput Sci 20132:184–7

11. Moorthy RS, Parameshwaran P (2021) A novel hybrid feature selection algorithm for optimal
provisioning of analytics as a service. In: Soft computing for problem solving, pp 511–523.
Springer, Singapore
12. Moorthy RS, Pabitha P (2022) Intelligent health care system using modified feature selection
algorithm. In: Pattern recognition and data analysis with applications, pp 777–787. Springer,
Singapore
13. Bommert A, Welchowski T, Schmid M, Rahnenführer J (2022) Benchmark of filter methods
for feature selection in high-dimensional gene expression survival data. Briefings Bioinform
23(1):bbab354
14. Alsahaf A, Petkov N, Shenoy V, Azzopardi G (2022) A framework for feature selection through
boosting. Expert Syst Appl 1(187):115895
15. Abdollahzadeh B, Gharehchopogh FS (2022) A multi-objective optimization algorithm for
feature selection problems. Eng Comput 38(3):1845–63
16. Gharehchopogh FS, Maleki I, Dizaji ZA (2022) Chaotic vortex search algorithm: metaheuristic
algorithm for feature selection. Evol Intell 15(3):1777–808
17. Hu J, Gui W, Heidari AA, Cai Z, Liang G, Chen H, Pan Z (2022) Dispersed foraging slime
mould algorithm: continuous and binary variants for global optimization and wrapper-based
feature selection. Knowl Based Syst 15(237):107761
18. Houssein EH, Saber E, Ali AA, Wazery YM (2022) Centroid mutation-based search and rescue
optimization algorithm for feature selection and classification. Expert Syst Appl 1(191):116235
19. Kareem SS, Mostafa RR, Hashim FA, El-Bakry HM (2022) An effective feature selection
model using hybrid metaheuristic algorithms for IOT intrusion detection. Sensors 22(4):1396
20. Zivkovic M, Stoean C, Chhabra A, Budimirovic N, Petrovic A, Bacanin N (2022) Novel
improved salp swarm algorithm: an application for feature selection. Sensors 22(5):1711
21. Albulayhi K, Abu Al-Haija Q, Alsuhibany SA, Jillepalli AA, Ashrafuzzaman M, Sheldon FT
(2022) IoT intrusion detection using machine learning with a novel high performing feature
selection method. Appl Sci 12(10):5015
22. Liu Y, Heidari AA, Cai Z, Liang G, Chen H, Pan Z, Alsufyani A, Bourouis S (2022) Simulated
annealing-based dynamic step shuffled frog leaping algorithm: Optimal performance design
and feature selection. Neurocomputing 7(503):325–62
23. Hu J, Gui W, Heidari AA, Cai Z, Liang G, Chen H, Pan Z (2022) Dispersed foraging slime
mould algorithm: continuous and binary variants for global optimization and wrapper-based
feature selection. Knowl Based Syst 15(237):107761
24. Kelly M, Longjohn R, Nottingham K (2023) The UCI machine learning repository. https://archive.ics.uci.edu
CRIMO: An Ontology for Reasoning
on Criminal Judgments

Sarika Jain, Sumit Sharma, Pooja Harde, Archana Pandey,


and Ruqaiya Thakrawala

Abstract Legal experts develop their drafts by analyzing legal documents in order to glean information about the criteria listed in the relevant legal sections. The many criminal cases reported in criminal judgments of the legal domain describe the offense, the accused parties, the investigation, and the final verdict. Many parts of the written code can be misinterpreted or lead to erroneous findings when applied to a criminal
case. To better assist legal reasoning, this study seeks to establish an integrated ontol-
ogy for modeling criminal law standards. The proposed criminal domain ontology
maps entities and their relationships to textual rules linked to Criminal Acts in the
Indian Penal Code of 1860 using OWL-DL in a middle-out manner and formalizes
legal rules accordingly. The purpose is to build a legal rule-based decision support
system for the Indian criminal domain utilizing SWRL rule language to generate
logic rules and integrate the criminal domain ontology.

Keywords Criminal ontology · Semantic Web · SWRL · Criminal justice · Rule


modeling · Owl-dl reasoning

1 Introduction

Various legal documents are produced in India annually, as depicted in Fig. 1, including case conclusions, precedents, resolutions, decrees, and circulars. The sheer volume of these documents highlights the complexity and scope of the Indian legal system. The data is collected from https://njdg.ecourts.gov.in/njdgnew/index.php and represents the count of pending cases in the district and taluka courts of India. Nevertheless, the majority of legal information exists in textual format, which poses challenges in automatically extracting relevant legal information

S. Jain · S. Sharma (B) · P. Harde · A. Pandey · R. Thakrawala


National Institute of Technology Kurukshetra, Thanesar, India
e-mail: [email protected]
S. Jain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 297
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_25
298 S. Jain et al.

[Bar charts: counts of pending civil, criminal, and total cases in the district and taluka courts of India, grouped by case age: 0–1, 1–3, 3–5, 5–10, 10–20, 20–30, and above 30 years]
Fig. 1 Number of pending cases in the district and taluka courts of India

from natural language documents. So, there is a need for efficient management of legal documents for legal experts, which helps to make search better and more cost-efficient using machine learning within data-driven applications for legal reasoning and legal consulting systems. Ontologies have recently gained prominence in the
state-of-the-art Semantic Web due to their ability to capture knowledge in a certain
field. Web-based applications do benefit from their emphasis on interoperability and
the establishment of a clear shared understanding among the various parties involved.
Various studies tried to solve the problem in criminal law in this domain topic
using the approach of legal information generalization [1], Semantic Web technology
[2], legal ontology [3], and rule designs [4]. All these studies share the common objective of building a global criminal ontology for the management of unstructured documents, but face the problem of differing geographical laws, so improving a global criminal ontology is essential. A sizable amount of data is frequently stored in the law enforcement field in a convoluted and linguistically complex manner. This paper provides an international criminal ontology that serves as a piece of well-organized knowledge which can bring order to this chaos and make information more accessible and comprehensible.
Furthermore, reasoning can be applied to identify anomalies or irregularities in
criminal data. This can assist in detecting potential fraud, criminal behavior, or sus-
picious activities by comparing the data against predefined ontological rules and
patterns. In a criminal ontology, legal reasoning may be crucial for interpreting and
applying laws and regulations to specific cases. Reasoning can assist legal profes-
sionals in understanding the implications of various legal statutes and precedents
in the context of criminal cases. In this paper, we present a method that is well-suited
for constructing a legal ontology tailored to the unique characteristics of the legal
system in India. Specifically, we utilize data from the Indian criminal code, focusing
on offenses against the person and reputation, to build the ontology. Additionally, we
introduce a legal expert system called CRIMO (the criminal ontology) that aids legal
experts in efficiently validating the ontology model created by ontology engineers
and facilitating legal reasoning for decision-making. Our work also encompasses the
incorporation of Semantic Web Rule Language (SWRL) reasoning in the ontology
for legal documents. Overall, our research contributes to three key areas: the devel-
opment of a specialized legal ontology, the implementation of the CRIMO expert
CRIMO: An Ontology for Reasoning on Criminal Judgments 299

system, and the utilization of SWRL reasoning, all of which offer significant poten-
tial benefits in the field of legal knowledge management and decision support. The
highlights of the key contributions are

1. Creation of an ontology specifically designed to model legal documents. We capture the
semantics and meaning of various legal concepts, terms, and relationships, enabling a
more precise and standardized representation of legal knowledge that is
machine-understandable and supports automated analysis of legal texts.
2. A solution for constructing a legal expert system that incorporates reasoning
capabilities. By leveraging SWRL rules, legal systems can automate tasks including
legal analysis, decision support, and compliance checking, saving time and effort for
legal professionals and improving the accuracy and consistency of legal reasoning.
3. A case study that examines the application of ontology-based information extraction
in analyzing criminal judgments in India. The study highlights the effectiveness of
utilizing ontologies to extract relevant information from legal documents, thereby
improving the understanding and analysis of legal events.

In this regard, semantic representation of legal documents is a novel research direction
for gathering information on the Indian Penal Code (IPC) and analyzing court judgments,
which are growing explosively and are unstructured. We mainly focus on extracting concepts
and relations from legal documents to develop a new ontology. The paper is organized as
follows: an introduction to the topic
is discussed in Sect. 1. In Sect. 2, the background information is reviewed. Section
3 presents our knowledge acquisition scheme. The design of the knowledge base
is illustrated in Sect. 4. Section 5 outlines our approach for acquiring SWRL rules.
Finally, Sect. 6 presents the concluding remarks of the paper.

2 Related Work

Currently, various studies focus on criminal ontology and the Semantic Web, exploring
different approaches to reasoning, logical rules, and ontology
design. For instance, the syntax of the Legal Knowledge Interchange Format (LKIF)
is introduced in Gordon’s study [5]. A rule language and argumentation-theoretic
semantics are also part of the ESTRELLA European project. It utilizes the Web
Ontology Language (OWL) for concept representation and includes a foundational
ontology of legal concepts that can be leveraged for reuse. The core of LKIF combines
OWL-DL and SWRL. Its primary objectives are twofold: facilitating the translation
of legal knowledge bases written in different formats and formalisms, and serving as a
knowledge representation formalism within larger architectures for developing legal
knowledge systems. Furthermore, several other studies have examined ontology and
legal reasoning systems in different regions such as China [6], Lebanon [7], Korea
[8], Malawi [7], Tunisia [9], etc. Each of these studies presents distinct ideas aimed at
300 S. Jain et al.

improving automated legal reasoning, specifically tackling the challenges prevalent


in their respective local legal systems. Table 1 provides a summary of these studies,
highlighting their diverse methodologies and functionalities within the domain of
ontology and legal reasoning.
Based on the summary of the literature review in Table 1, we have identified the gap that
laws between countries can vary due to several factors, including historical, cultural,
and political differences [16]. Comparing the existing literature, none of the studies has
defined a crime ontology for the Indian Penal Code (IPC); this is the specific gap that
our research aims to fill.
On the other hand, legal interpretation does not focus only on the elements of the crime.
Facts related to the offender are considered as well because of their importance to the
crime sentencing process [15].
For example, in the context of dowry cases, when comparing the legal codes of the
United States and Vietnam to those of other countries, the most significant distinction
lies in the level of detail within each set of rules. The United States Code tends to
provide relatively generic information regarding different types of dowry-related
offenses. It primarily addresses the criminal act itself but does not delve deeply
into the surrounding circumstances. In contrast, the Indian legal code goes to great
lengths to define specific details about the circumstances that are associated with the
dowry-related offense.
One crucial difference stems from the fact that the Indian Code explicitly outlines
the exact punishment duration for each rule, whereas the United States Code tends
to employ a more generic approach, usually stipulating punishments such as the
death penalty or life imprisonment. These variations have important implications: the
Indian Code’s detailed approach lends itself to effective rule definition in India’s legal
system, especially when it comes to specific cases and rule definitions. In contrast,
the United States Code’s more generic approach simplifies ontology design, making
it more suitable for broader category definitions. Legal norms in the context of dowry
cases can be represented as obligation rules. These obligation rules follow a conditional
form: IF a set of conditions (operative facts) holds true, THEN a specific legal effect or
obligation arises. For example, when examining dowry-related cases, we can use SWRL
expressions to describe specific actions: (a) might refer to the nature and extent of the
dowry, (b) could pertain to records related to the victim, and (c) may represent the
circumstances surrounding the offense.
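The IF–THEN obligation form described above can be sketched as a minimal rule evaluator. All fact and effect names below are illustrative assumptions, not taken from any statute:

```python
# Sketch of an obligation rule: IF all operative facts hold, THEN a
# legal effect (obligation) arises. Names are illustrative only.

def obligation_rule(conditions, effect):
    """Return a rule that yields the effect when every condition holds."""
    def apply(facts):
        if all(facts.get(c, False) for c in conditions):
            return effect
        return None
    return apply

# (a) nature/extent of the dowry, (b) victim records, (c) offence circumstances
dowry_rule = obligation_rule(
    conditions=["dowry_demand_established",
                "victim_record_on_file",
                "offence_circumstances_met"],
    effect="obligation_to_prosecute_under_dowry_provisions",
)

case_facts = {
    "dowry_demand_established": True,
    "victim_record_on_file": True,
    "offence_circumstances_met": True,
}
print(dowry_rule(case_facts))  # the legal effect arises
```

If any operative fact is missing, the rule yields no effect, mirroring the conditional semantics of the obligation form.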

3 Theoretical Background

Legal ontologies are used in various applications, including knowledge-based and


information-based systems. The level of detail in knowledge representation directly
correlates with the level of intelligence needed. Applications include information
retrieval, simplifying documents, categorizing and depicting, and decision manage-
ment. Criminal ontology can be used to create Decision Support Systems (DSS) to

Table 1 Examination of existing legal ontologies and their scope of coverage


Furtado et al. [10] (2009). Scope: ontology facilitates WikiCrimes data exchange. Type: crime ontology with a KB of crime reports. Description: discusses WikiCrimes, a Web 2.0 application that aims to increase transparency and citizen prevention of crimes through ontologies and multi-agent systems. Reasoning: yes, this ontology uses reasoning.
Vandenberghe and Financial [11] (2003). Scope: ontology for representing financial fraud cases. Type: KB for semantic indexing and search. Description: formulation of the fraud problem; describes the legal model applied to the case of online investment fraud. Reasoning: no reasoning.
Zeleznikow and Stranieri [12] (2001). Scope: provide a KB for legal knowledge. Type: represents a KB, moderately structured. Description: uses inference to evaluate the importance of the legal document. Reasoning: yes, the ontology provides reasoning.
Boer et al. [13] (2001). Scope: provide a legal advice system for maritime law. Type: knowledge base in Protégé and RDF, moderately structured. Description: developed the CLIME ontology for a legal advice system. Reasoning: yes, this ontology uses reasoning.
Valente et al. [14] (1999). Scope: develop legal expert systems that can assist lawyers and legal professionals in Brazilian territory. Type: types of knowledge that are typical for legal reasoning. Description: discusses the use of ontologies in e-Court, a platform for online dispute resolution, where ontologies support legal reasoning and information management. Reasoning: yes, this ontology uses reasoning.
Valente [15] (1995). Scope: representing normative knowledge. Type: reflects how different categories of legal knowledge are interrelated. Description: outlines a comprehensive approach to modeling legal knowledge systems using the ON-LINE architecture. Reasoning: yes, this ontology uses reasoning.
Valente and Breuker [16] (1994). Scope: ontology for the law domain, reasoning, and problem-solving. Description: the ontology is basically a set of interconnected primitive categories and sub-categories of legal knowledge proposed under a teleological and functional view of the law system. Reasoning: yes, this ontology uses reasoning.
McCarty [17] (1989). Scope: KB expressing legal knowledge. Type: knowledge representation, highly structured. Description: discusses the development of a language for legal discourse called LLD, its features, integration of modalities, and implementation progress. Reasoning: yes, this ontology uses reasoning.

aid law enforcement agencies and criminal justice professionals in their tasks. These
systems use knowledge representation based on criminal ontology and structured
data to assist with tasks such as criminal profiling, crime analysis, and predictive
policing.
Decision-making in criminal ontology involves using ontological models and
structured knowledge to inform and direct various parts of law enforcement and
criminal justice procedures. This includes associating evidence with a notion from
a criminal ontology, making informed judgments regarding its applicability and
importance in a criminal inquiry; organizing and prioritizing criminal cases using
ontology-based decision support; and making judgments about punishment, parole,
and rehabilitation programs.

3.1 Ontology

In the field of computer science, ontology pertains to a structured and systematic


depiction of knowledge or information pertaining to a certain topic. Ontology engi-
neering is a specialized area within the fields of artificial intelligence and knowl-
edge engineering. Its primary objective is to establish a systematic and machine-
interpretable framework for delineating the fundamental ideas, interconnections, and
principles inside a given domain. Ontologies are used to model and express knowl-
edge in a way that both people and machines can comprehend. This entails defining
ideas, attributes, and connections among domain entities [18]. The Semantic Web,
which is an extension of the World Wide Web, is a significant application area for
ontologies. Ontologies are utilized to semantically annotate web content, thereby
making it more accessible and machine-understandable. The Resource Description
Framework (RDF) and the Web Ontology Language (OWL) are essential standards
for the development of Semantic Web ontologies. To create, manage, and reason
with ontologies, numerous tools and languages have been developed. Protégé, a
popular ontology editor and framework for knowledge engineering, is one exam-
ple of such a tool. RDF is a data paradigm used to represent information about
web resources. OWL (Web Ontology Language) is a more expressive, RDF-based
language for constructing ontologies with complex semantics. These standards are
indispensable for the evolution of the Semantic Web. There are several applications
in the domain of natural language processing where ontologies can provide a better
improvement and the same has been provided in studies like enhancement in infor-
mation retrieval [19–21], document classification [22–24], knowledge modeling [3],
information extraction [19], knowledge discovery [2], etc.
Ontology can serve as a means to transform the domain expertise of criminal
investigation into a standardized language, while also enabling the creation of a
conceptual model that is comprehensible and useful for both humans and computers.
An example of an ontology in the legal domain is shown in Fig. 2. This bridges domain
knowledge to a machine-understandable language for investigation and integration.

Fig. 2 Sample triple of a concept that is related to another concept with a relation
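A triple of this kind can be represented directly as a (subject, relation, object) tuple. A minimal sketch follows, with illustrative concept and relation names (drawn loosely from the CRIMO vocabulary, not an exact excerpt):

```python
# A triple links one concept to another through a named relation,
# e.g. (Criminal, committedCrime, Crime). Names are illustrative.

triples = {
    ("Criminal", "committedCrime", "Crime"),
    ("Crime", "hasPlaceOfOccurence", "Location"),
    ("Victim", "hasEvidenceAgainst", "Criminal"),
}

def relations_of(subject, kb):
    """All (relation, object) pairs stated for a subject concept."""
    return {(p, o) for (s, p, o) in kb if s == subject}

print(relations_of("Criminal", triples))  # {('committedCrime', 'Crime')}
```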

3.2 SWRL Reasoning

SWRL (Semantic Web Rule Language) [25] is a language used in the Semantic
Web domain to express rules that can be applied to ontologies. It combines the
expressive power of OWL (Web Ontology Language) with the rule-based approach
of RuleML. SWRL allows the specification of logical rules that can be used to infer
new knowledge from existing knowledge in an ontology.
SWRL rules are expressed using a high-level abstract syntax and are typically writ-
ten in terms of OWL concepts, such as classes, properties, and individuals [26]. These
rules follow a Horn-like structure, which means they consist of a set of antecedents
(conditions) and a consequent (conclusion) [27]. When the antecedents are satisfied,
the consequent is inferred.
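The Horn-like structure can be illustrated with a small forward-matching sketch: antecedent triple patterns (with ?-prefixed variables) are matched against a knowledge base, and each satisfied binding produces the consequent. The vocabulary (suspectIn, hasAlibi, potentialSuspectIn) is assumed for illustration:

```python
# Minimal sketch of a Horn-like rule: when every antecedent triple
# pattern matches the knowledge base, the consequent is inferred.
# Variables start with '?'; the vocabulary is illustrative.

def match(pattern, triple, binding):
    """Extend a variable binding if the pattern fits the triple, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:
                return None      # variable already bound to a different value
        elif p != t:
            return None          # constant mismatch
    return new

def apply_rule(antecedents, consequent, kb):
    """Return all consequent triples whose antecedents are satisfied in kb."""
    bindings = [{}]
    for pat in antecedents:
        bindings = [b2 for b in bindings for t in kb
                    if (b2 := match(pat, t, b)) is not None]
    return {tuple(b.get(term, term) for term in consequent) for b in bindings}

kb = {("alice", "suspectIn", "case1"), ("alice", "hasAlibi", "none")}
inferred = apply_rule(
    antecedents=[("?x", "suspectIn", "?c"), ("?x", "hasAlibi", "none")],
    consequent=("?x", "potentialSuspectIn", "?c"),
    kb=kb,
)
print(inferred)  # {('alice', 'potentialSuspectIn', 'case1')}
```

A real SWRL reasoner additionally handles OWL semantics and built-ins; this sketch only conveys the antecedent-to-consequent mechanics.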

3.3 Legal Documentation and Ontology Reasoning

SWRL can be used in the context of legal documents and crime ontology to capture
and formalize legal rules, as well as to reason about criminal activities and legal
concepts. By integrating SWRL rules with a criminal domain ontology, it becomes
possible to apply logical relationships and infer new knowledge based on the existing
legal knowledge.
The use of SWRL rules in legal expert systems and legal reasoning systems has
been explored in research. For example, the VNLES (Reasoning-enabled Legal Expert
System using Ontology) [28] utilizes SWRL rules to define logical relationships in
the legal domain. Similarly, the CORBS (Criminal Rule-Based System) [7] integrates
SWRL rules with a criminal domain ontology to model and reason about legal rules.
However, none of these studies has defined a crime ontology for the Indian Penal Code
(IPC); this is the specific gap in the literature that our research aims to fill.

3.4 Advantages/Application of Legal Ontology Reasoning

The application of legal ontology reasoning offers several advantages in various


domains, including law, legal informatics, and legal expert systems. Here are some
key advantages

• Formalization of Legal Knowledge: Legal ontology reasoning allows for the


formalization and representation of legal knowledge in a structured and machine-
readable format. By defining legal concepts, relationships, and rules using ontolo-
gies, legal knowledge can be organized and made accessible for automated
reasoning and decision-making processes.
• Automated Reasoning and Inference: Legal ontology reasoning enables auto-
mated reasoning and inference based on the defined legal rules and relationships.
By applying logical rules and inference mechanisms, legal systems can analyze
legal cases, statutes, and regulations to derive new insights, make predictions, and
support decision-making processes.
• Legal Expert Systems: Legal ontology reasoning forms the foundation for the
development of legal expert systems. These systems utilize ontologies and rea-
soning mechanisms to provide legal advice, answer legal queries, and assist in
legal decision-making. By capturing and reasoning with legal knowledge, expert
systems can enhance the efficiency and accuracy of legal processes.
• Legal Information Retrieval: Legal ontology reasoning can improve the retrieval
and organization of legal information. By structuring legal knowledge using
ontologies, legal documents, and cases can be indexed, categorized, and retrieved
more effectively. This facilitates efficient legal research and information retrieval
for legal professionals and researchers.
• Knowledge Management and Reusability: Legal ontologies facilitate knowl-
edge management and reusability. By representing legal concepts and relation-
ships in a modular and reusable manner, ontologies can be applied across dif-
ferent projects, institutions, or applications. This reduces redundancy, promotes
consistency, and accelerates the development of new legal systems.

The study uses an improved ontology architecture and an open standard ontol-
ogy language to address limitations in existing technology and techniques, such
as the cold-start issue. By utilizing a criminal ontology, decision-making in criminal
investigations can be more efficient, enabling proactive policing initiatives and a better
understanding of criminal cases.

4 Indian Criminal Ontology

The Indian Criminal Ontology (CRIMO) was developed through an interdisciplinary


approach called data to metadata (D2MD) [29], an Ontology Development Approach
(ODA) created from scratch [30], which has been successfully used to develop several

[Pipeline: an input scenario (e.g., presumption of abetment in a dowry-related suicide case) passes through data acquisition and data preprocessing (data cleaning, NLP processing, entity recognition, relation extraction), populates the legal ontology (concept hierarchy, validated by a legal expert), and is combined with SWRL rules in a reasoner/inference engine serving the user application]

Fig. 3 The basic framework to create CRIMO (legal ontology)

domain ontologies by domain experts [31]. This approach gives the developer the freedom to
define the scope of the ontology, handle related difficulties encountered, and implement
it through ontological tools such as Protégé or a foundational approach. The basic
pipeline to create the legal ontology from scratch is shown in Fig. 3.
The D2MD methodology iterates over four steps: (1) purpose identification and
requirement specification for the CRIMO ontology, (2) ontology development phase,
(3) evaluation and validation approach, and (4) post-development phase.

4.1 Purpose Identification to Create CRIMO

The scope for creating an ontology involves determining the domain and defining the goals
and boundaries of the ontology. The ontology-based model described in this paper consists
of both domain-dependent and domain-independent semantic rules in the test case. The model
includes contextual features that accurately interpret captured results, exposing richer
data to programs that support conceptual searches. The scope of the CRIMO ontology helps
to establish the concepts, individuals, properties, and relationships relevant to criminal
activities. It includes classes like “Criminal”, “Crime”, “Suspect”, “Evidence”, “Victim”,
and “Location”, and properties like “committedCrime”, “hasAlibi”, and “hasEvidence”. We
describe the purpose of creating the CRIMO ontology in the form of research questions,
some of which are listed in Table 2.
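Once the ontology is populated, a competency question such as no. 2 in Table 2 can be answered by a conjunctive query over instance triples, much as a SPARQL query would do. The individuals and property names below are illustrative assumptions:

```python
# Sketch: answering "find the names of individuals accused in dowry cases
# with charges of dowry harassment" by filtering instance triples.
# Individuals and property names are illustrative, not from the real KB.

facts = {
    ("case42", "hasAccused", "personA"),
    ("case42", "hasCharge", "DowryHarassment"),
    ("case43", "hasAccused", "personB"),
    ("case43", "hasCharge", "Theft"),
}

accused_in_dowry_harassment = {
    accused
    for (case, p1, accused) in facts if p1 == "hasAccused"
    for (case2, p2, charge) in facts
    if p2 == "hasCharge" and case2 == case and charge == "DowryHarassment"
}
print(accused_in_dowry_harassment)  # {'personA'}
```

The join on the shared case identifier is exactly what a SPARQL basic graph pattern with two triple patterns would perform.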

4.2 Ontology Development Phase (Develop CRIMO)

Based on the available data we first filter out major categories like Person, Location,
Crime, Section, etc., and the relationship between persons and crime, the relationship

Table 2 Some competency questions to determine the scope of CRIMO


S.No Competency question
1. Retrieve the dates and details of dowry cases where abatement of suicide is a charge
2. Find the names of individuals who have been accused in dowry cases with charges of
dowry harassment
3. Retrieve the names of individuals accused in dowry cases with charges of “cruelty for
dowry”
4. Who are the parties involved in the case, including the accused, victims, and any
witnesses?
5. What evidence has been collected or presented in support of the case, and how reliable is
it?
6. Has there been any prior history or disputes between the parties involved that may be
relevant to the case?
7. Are there any constitutional issues that need to be considered in this case, such as search
and seizure violations?
8. What are the specific charges filed against the accused, and what penalties may be
associated with those charges?
9. Has the accused been informed of their legal rights, including the right to an attorney and
the right to remain silent?
10. Are there any potential witnesses or experts who will be called to testify, and what is their
relevance to the case?
11. Have there been any plea negotiations or discussions of a possible settlement between the
parties involved?
12. What is the expected timeline for the case, including key dates for hearings, trial, and
other legal proceedings?
13. What is the nature of the alleged crime, and can you provide a brief description of the
events leading up to the incident?

between crime and location, how a person is related to a crime, which section applies to a
crime, and so on. Each section is uniquely identified by a particular name or ID, and the
description of each section is defined. These are further categorized into three parts:

• Classes.
• Object property.
• Data property.

4.2.1 Identifying Classes

Identifying classes in a criminal ontology involves defining the various categories or


types of entities and concepts that are relevant to the domain of criminal activity and
law enforcement. Identifying classes has the following procedure:

• Conceptual Clarity: The ontology should reflect a clear and unambiguous understanding
of the domain. Begin by defining fundamental concepts related to criminal judgments,
legal entities, and reasoning processes.
• Domain Analysis: Thoroughly understand the space of criminal judgments, legal
reasoning, and related areas. This includes consulting legal experts, studying legal
texts, and identifying key concepts, entities, and relationships.
• Conceptual Modeling: Create a conceptual model that represents the central concepts
and relationships in the domain. Visual tools such as UML diagrams or OWL (Web
Ontology Language) may be used for this purpose.
• Hierarchy and Classification: Organize the concepts into a hierarchical structure
with subclasses and superclasses. For example, a top-level class for “criminal
judgment” may have subclasses like “conviction”, “acquittal”, “sentencing”, and so on.
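The hierarchy-and-classification step can be sketched as a transitive subclass-of check; the class names below are illustrative and may differ from the final CRIMO taxonomy:

```python
# Sketch of the hierarchy step: subclass links organized so that
# membership propagates upward to superclasses. Names are illustrative.

SUBCLASS_OF = {
    "Conviction": "CriminalJudgment",
    "Acquittal": "CriminalJudgment",
    "Sentencing": "CriminalJudgment",
    "CriminalJudgment": "LegalDocument",
}

def is_subclass(cls, ancestor):
    """Walk the subclass chain upward (transitive subclass-of)."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False

print(is_subclass("Conviction", "LegalDocument"))  # True
```

In OWL, an RDFS/OWL reasoner performs this transitive closure automatically over rdfs:subClassOf assertions.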

To extract concepts from legal documents and transform them into classes in CRIMO
ontology, we follow a systematic approach, which involves natural language process-
ing techniques and legal expert validation. The top-level entities extracted from the
legal document that become the concepts for our CRIMO ontology are shown in
Table 3.

4.2.2 Identifying Properties

In ontology modeling, object properties and data properties are two fundamental
types of properties used to describe relationships and attributes of individuals or
instances within a domain. These properties are used to define the structure and
semantics of ontology classes and instances.

1. Object properties: Some of the object properties we have defined in our ontology are
listed in Table 4.
2. Data properties: In an ontology, a data property is a fundamental concept used to
describe the attributes or characteristics of individuals within a domain. Data
properties are distinct from object properties, which describe relationships between
individuals; the data properties we have defined are listed in Table 5.
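The distinction can be sketched as a simple domain/range validator: object properties relate an individual to another typed individual, while data properties relate an individual to a literal. The sample entries loosely follow Tables 4 and 5; the individuals are assumptions:

```python
# Sketch: object properties (individual -> individual) vs. data
# properties (individual -> literal), with a domain/range check.
# Property entries loosely follow the paper's tables; individuals invented.

OBJECT_PROPS = {"hasJudge": ("Case", "Person"), "hasEvidence": ("Agent", "Evidence")}
DATA_PROPS = {"hasCourtName": ("Case", str), "hasSectionNum": ("Case", int)}

types = {"case1": "Case", "judgeX": "Person"}

def check(subject, prop, value):
    """Validate an assertion against the property's domain and range."""
    if prop in OBJECT_PROPS:
        dom, rng = OBJECT_PROPS[prop]
        return types.get(subject) == dom and types.get(value) == rng
    if prop in DATA_PROPS:
        dom, rng = DATA_PROPS[prop]
        return types.get(subject) == dom and isinstance(value, rng)
    return False

print(check("case1", "hasJudge", "judgeX"))   # True: individual-to-individual
print(check("case1", "hasSectionNum", 302))   # True: individual-to-literal
print(check("case1", "hasCourtName", 7))      # False: literal type mismatch
```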

4.3 Schema Diagram and Construction of CRIMO

These are likely to represent concepts and can be considered potential classes in the
CRIMO ontology. Duplicates and synonyms are eliminated manually to ensure that each
concept is represented only once. We then organize the identified concepts hierarchically
with the help of a legal expert; some concepts are more general (superclasses) while
others are more specific (subclasses). Then we create a formal ontology

Table 3 Extracted entities that become the concepts in CRIMO


S. no Class/concept Description
1. Persons: This is a class of Thing which is a superclass of all classes. This person class
consists following subclasses
a. Criminals or suspects: This class includes individuals who have committed crimes or are suspected
of committing crimes. It can also include information about their
characteristics, criminal records, and affiliations
b. Victims: This class represents the individuals or entities who have been harmed or
affected by criminal activities. It may include information about their injuries,
losses, and personal details
c. Criminal justice system: This class includes judges, prosecutors, defense attorneys, and other
professionals involved in the criminal justice system. It may also include their
roles, responsibilities, and qualifications
2. Sections: This class contains all 511 sections of the IPC and their definitions, i.e., which
section covers which type of crime
3. Punishments and penalty This class includes the penalties and sentences associated with different
crimes, such as imprisonment, fines, probation, and community service
4. Locations: This class represents geographical areas, such as cities, states, and countries,
as well as the legal jurisdictions and boundaries that determine where crimes
are prosecuted
5. Legislation and Laws: This class includes the laws, statutes, and regulations relevant to criminal
activity. It may also include information about amendments, repeal, and
historical legal documents
6. Weapons: This class includes the type of weapon used in a particular crime, such as a
knife, gun, poison, or rope
7. Evidence and Forensics: This class includes physical evidence, digital evidence, and forensic analysis.
It may encompass types of evidence like DNA, fingerprints, surveillance
footage, and digital records

Table 4 Extracted object properties and their domain and range


Property Domain Range Property Domain Range
hasEvidenceAgainst Agent Agent hasInvolvedRelative Crime Person
hasClient Case Person hasCourtCity Case Place
hasLocation Agent Location hasDoneOffence Agent Offence
hasEvidence Agent Evidence intentionallyCommits Person Crime
hasIntent Person Crime hasJudge Case Person
hasJudgment Case Section hasDoneViolenceAgainst Person Person
hasPlaceOfOccurence Case Location worksIn Person Organization
hasPunished Judgment Punishments hasJudgment Case Judgment
hasA Case Lawyer hasA Case Judge

Table 5 Extracted data properties and their domain and range


Property Domain Range Property Domain Range
hasCourtName Case String hasDateOfOccurence Case Date
hasSectionDesc Case Integer hasSectionNum Case Integer
hasRelatedSECID Case Integer hasSectionId Case Integer
hasDateOfJudgment Case DateTime hasPunishment Crime String
hasName Agent String hasLatitude Location String
hasLongitude Location String hasDOB Person DateTime
judgeName Case String caseID Case String
regDate Case DateTime hasPunishment Crime String

structure that represents the relationships between classes using an ontology schema
diagram as shown in Fig. 4. The box represents the concept and the value inside
the box represents the data property of the respective concept. The arrow between
the two concepts represents the connection between the concepts. The dotted arrow
shows the subclass relationship between the concepts.
After reviewing the list of extracted concepts and ensuring that they make sense in the
context of the identified purpose of the CRIMO domain and the goals of the CRIMO ontology,
we create a formal ontology structure that represents the relationships between classes
(concepts) in an ontology language, OWL (Web Ontology Language), using Protégé. The
structure of the CRIMO ontology is shown in Fig. 5.

4.4 Legal Reasoning

SWRL rules are created to represent the logic of the acquisition schema. We extract the
law content from the IPC criminal law [32]. These rules are used to infer relationships
and facts about crimes, people, and entities within the domain of law enforcement and
policing. SWRL is a powerful rule language that can be used to express complex
relationships and draw inferences from the knowledge stored in an ontology.
SWRL rules are defined within the criminal ontology, expressing logical relation-
ships and constraints among its elements. For instance, if a person is identified as
a suspect in a crime without evidence proving their alibi, they can be inferred as a
potential suspect. If a crime occurred at a specific location and a person was present
during the crime, they can be inferred as a potential witness. If a person has been
convicted of a crime and there is evidence linking them to other unsolved crimes,
they can be inferred as a potential serial offender. If a person has been hurt, damaged,
or killed or has suffered, either because of the actions of the accused then they can
be inferred as a Victim.
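The witness pattern described above can be sketched as a join between crime locations and recorded presence; all individuals and locations below are illustrative assumptions:

```python
# Sketch of one inference pattern from the text: anyone present at the
# location of a crime is inferred as a potential witness. Names invented.

crime_location = {"crime7": "market_square"}
presence = {("personC", "market_square"), ("personD", "station")}

potential_witnesses = {
    person
    for crime, loc in crime_location.items()
    for (person, where) in presence
    if where == loc
}
print(potential_witnesses)  # {'personC'}
```

In the ontology, the same pattern would be an SWRL rule joining hasPlaceOfOccurence with a presence property.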

[Schema: classes include Agent (with subclasses Organization and Person; Person has subclasses Criminal, Judge, Petitioner, Lawyer, Respondent, Witness, Suspect, and Victim), Location (with subclasses City, State, and Country), Offence, Evidence, Punishments, Judgment, IPC, Section, Act, and Case (with parts such as Case_Details, Fir_Details, Case_Status, History_Of_Case_Hearing, Interimorder, and Notice), connected by object properties such as hasEvidenceAgainst, hasLocation, worksIn, hasDoneOffence, hasPunishments, hasEvidence, hasJudgment, and basedOn]
Fig. 4 Schema diagram for the CRIMO ontology

Reasoning engines or ontology reasoners can be used to process the ontology and
apply these rules to infer new information or check for consistency. Users can query
the criminal ontology to retrieve specific information or perform complex queries,
answering questions like “Who are the potential suspects for a given crime?” and
“Are there any witnesses present at a particular location during a crime?”

4.4.1 Case Study: Dowry Sample Case

Scenario: Presumption of Abetment in a Dowry-Related Suicide Case


In this scenario, we have three individuals involved: the accused, the victim, and a
death that occurred within 7 years of marriage. The rule will be applied to determine
if there is a presumption of the accused being guilty of abetment of suicide based on
evidence of dowry-related cruelty or harassment.
Person 1 (Accused): A man who is the husband of the victim.

Fig. 5 The structure view of the CRIMO ontology in Protégé

Fig. 6 Sample dataset for the Indian Dowry Articles according to IPC

Person 2 (Victim): A woman who was married to the accused.
Death Within 7 Years of Marriage: The victim tragically passes away within 7 years of her marriage.
Evidence of Dowry Cruelty: There is substantial evidence indicating that the
victim was subjected to cruelty and harassment related to dowry during her marriage
(Fig. 6).
312 S. Jain et al.

Table 6 SWRL rules and their descriptions

Rule-1: Person(?perpetrator), Person(?victim), AbetmentOfSuicide(?perpetrator, ?victim), CommittedSuicide(?victim), SubjectedToCrueltyForDowry(?victim) → ChargedWithAbetment(?perpetrator)
Description: If a perpetrator abetted the suicide of a victim who committed suicide and was subjected to cruelty for dowry, the perpetrator is charged with abetment.

Rule-2: Person(?x), Person(?y), Evidence(?e), Crime_Type(?d), Harm_Caused_By(?a), hasDoneViolence(?x, ?y), hasDoneAssault(?x, ?y), Physcologyical_torment(?x, ?y), hasEvidenceAgainst(?y, ?x), hasInvolvedRelative(?x, ?n) → Suspect(?x), Victim(?y), 498(?n), 498(?x)
Description: If person x has done violence, assault, and psychological torment to person y, person y holds evidence against x, and x has an involved relative n, then x is a suspect, y is a victim, and x and n are linked to the 498 crime.

Rule-3: Person(?accused), Person(?victim), AbetmentOfSuicide(?accused, ?victim), DeathWithinSevenYearsOfMarriage(?victim), EvidenceOfDowryCruelty(?victim) → PresumptionOfGuiltyAbetment(?accused)
Description: If an accused person is connected to the victim through abetment of suicide, the victim died within seven years of marriage, and there is evidence of dowry cruelty, the accused is presumed guilty of abetment.

Rule-4: Person(?x), Person(?y), Evidence(?e), Crime_Type(?d), Harm_Caused_By(?a), hasDoneViolence(?x, ?y), hasDoneAssault(?x, ?y), Physcologyical_torment(?x, ?y), hasEvidenceAgainst(?y, ?x), hasInvolvedRelative(?x, ?n) → Suspect(?x), Victim(?y), 498(?n), 498(?x)
Description: If person x has evidence-linked harmful actions (violence, assault, psychological torment) against person y and they share a relative n, then x is a suspect, y is a victim, and both are linked to the 498 crime.

4.4.2 SWRL (Semantic Web Rule for the Dowry Article)

Based on this scenario, the SWRL rule will be applied as follows:


Person(?accused) corresponds to the accused husband. Person(?victim) corresponds
to the deceased victim. AbetmentOfSuicide(?accused, ?victim) represents the
accusation of the husband abetting the wife's suicide. DeathWithinSevenYearsOfMarriage(?victim)
indicates that the victim passed away within 7 years of her marriage.
EvidenceOfDowryCruelty(?victim) signifies the presence of compelling evidence
that the victim suffered cruelty and harassment related to dowry during her
marriage. If all these conditions are met in the scenario, the SWRL rule results
in the presumption that the accused husband is guilty of abetment of suicide as per
Section 113(A) of the IPC. This rule, defined in Table 6, helps establish the legal
presumption in cases where such conditions exist, and it plays a crucial role in the
legal process for dowry-related suicide cases.
By applying the rule, we can infer new information, such as who the suspect and
the victim are in the given dowry scenario. Based on the given facts, such as the
persons involved, the evidence, the weapon used, the type of crime, and the criminal
action, we can now infer which section of the Indian Penal Code should be applied
to the accused in this case and what the punishment would be. The reasoner validates
the rule and provides an explanation, as shown in Fig. 7; in this scenario,
Section 498(A) of the IPC should be applied.
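As an illustration only, the inference performed by Rule-3 of Table 6 can be mimicked in plain Python over a toy set of fact triples. The triple representation and the function name below are assumptions for exposition, not the actual OWL/SWRL encoding used in Protégé.

```python
# Illustrative only: Rule-3 of Table 6 over a toy set of (subject,
# predicate, object) fact triples standing in for ontology assertions.
def presumption_of_guilty_abetment(facts):
    """Return the accused presumed guilty of abetment under Rule-3."""
    presumed = set()
    for subject, predicate, obj in facts:
        if predicate != "AbetmentOfSuicide":
            continue
        accused, victim = subject, obj
        # Rule body: death within 7 years of marriage AND dowry cruelty.
        if ((victim, "DeathWithinSevenYearsOfMarriage", True) in facts
                and (victim, "EvidenceOfDowryCruelty", True) in facts):
            presumed.add(accused)
    return presumed

facts = {
    ("husband", "AbetmentOfSuicide", "wife"),
    ("wife", "DeathWithinSevenYearsOfMarriage", True),
    ("wife", "EvidenceOfDowryCruelty", True),
}
print(presumption_of_guilty_abetment(facts))  # {'husband'}
```

In the ontology itself, the same conclusion is drawn by the reasoner firing the SWRL rule over the asserted individuals rather than by procedural code.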

Fig. 7 Explanation of SWRL reasoning through the reasoner

5 Evaluation

The evaluation of the quality of learned ontology is determined by its level of align-
ment with a manually constructed ontology, often referred to as the “gold standard”.
However, comparing two ontologies poses a notable challenge in this approach. In
practice, ontologies can be compared at two distinct levels: lexical and conceptual. To
enhance clarity, we introduce the relational level in our assessment of hierarchical and
non-hierarchical structures. The “Ontology for Reasoning on Criminal Judgments”
is a proposed ontology that aims to improve the understanding and interpretation
of criminal judgments in the legal domain. Its effectiveness will be evaluated based
on its usability, scalability, impact on legal research and decision-making, and its
potential to enhance the pursuit of justice.
Usability and practicality are crucial factors in evaluating the ontology, as it should
align with the practical needs of the legal community. The ontology should be user-
friendly, easy to integrate into existing systems, and effective in query mechanisms.
Scalability is also vital, as it should be able to handle a diverse range of judgments
from various jurisdictions.
The ontology’s impact on legal research will be measured by its contribution
to information retrieval efficiency and effectiveness. It should assist in legal argu-
mentation, decision support, and the generation of legal conclusions. The ontology
should also be able to extract valuable insights from criminal judgments, such as
trends, patterns, and relationships within legal documents.

Table 7 Comparison of existing work with our system

Paper Rules Ontology Reasoning Multi-Docs
China [6] No Yes Yes No
Korea [8] Yes Yes No No
Lebanon [7] Yes Yes Yes No
Tunisia [9] No Yes No No
Malawi [7] No No Yes No
CRIMO Yes Yes Yes Yes

The ontology's impact on decision-making processes within legal institutions
should be studied, including whether it aids judges, lawyers, and policymakers
in making more informed decisions, leading to more equitable and efficient legal
outcomes. The ontology's
success will be measured by its contribution to the pursuit of justice, promoting
fairness, transparency, and the rule of law within the legal system.
In conclusion, the evaluation of the “CRIMO” should involve a multifaceted
approach that considers usability, scalability, impact on legal research and decision-
making, and its role in enhancing the pursuit of justice as shown in Table 7.

6 Conclusion

In this paper, we have created the legal ontology CRIMO and demonstrated its
significance in the representation, processing, and retrieval of legal information. In
the emerging landscape of the Semantic Web, its importance is expected to grow
even further. Despite numerous research projects focused on automatic construction
in this field, there is currently no standardized benchmark for evaluating the
engineering of legal ontologies.
CRIMO is a notable development in legal informatics, law, and justice. By
organizing and semantically annotating complex facts, it improves the understanding
and use of criminal judgments. Its support for intelligent search, automated legal
reasoning, and the extraction of insights will change how lawyers, researchers,
policymakers, and the general public interact with case law. We have evaluated its
usefulness, scalability, influence on legal research, automated legal reasoning, and
commitment to justice. The ontology's contribution to the quest for justice lies in its
ability to advance fairness and bolster the rule of law. As it develops and grows, it
will remain essential to the pursuit of justice and to legal scholarship.

References

1. Valente A (2005) Types and roles of legal ontologies. In: Law and the semantic web: legal
ontologies, methodologies, legal information retrieval, and applications. Springer, pp 65–76
2. Osathitporn P, Soonthornphisaj N, Vatanawood W (2017) A scheme of criminal law knowledge
acquisition using ontology. In: 2017 18th IEEE/ACIS international conference on software
engineering, artificial intelligence, networking and parallel/distributed computing (SNPD).
IEEE, pp 29–34
3. Mezghanni IB, Gargouri F (2017) Crimar: a criminal Arabic ontology for a benchmark based
evaluation. Procedia Comput Sci 112:653–662
4. Fawei B, Pan JZ, Kollingbaum M, Wyner AZ (2019) A semi-automated ontology construction
for legal question answering. New Gener Comput 37:453–478
5. Gordon TF (2008) Constructing legal arguments with rules in the legal knowledge interchange
format (LKIF). In: Computable models of the law: languages, dialogues, games, ontologies.
Springer, pp 162–184
6. Zhang N, Pu Y-F, Yang S-Q, Zhou J-L, Gao J-K (2017) An ontological Chinese legal
consultation system. IEEE Access 5:18250–18261
7. El Ghosh M, Naja H, Abdulrab H, Khalil M (2017) Towards a legal rule-based system grounded
on the integration of criminal domain ontology and rules. Procedia Comput Sci 112:632–642
8. Soh C, Lim S, Hong K, Rhim Y-Y (2015) Ontology modeling for criminal law. In: International
workshop on AI approaches to the complexity of legal systems. Springer, pp 365–379
9. Mezghanni IB, Gargouri F (2015) Towards an Arabic legal ontology based on documents
properties extraction. In: 2015 IEEE/ACS 12th international conference of computer systems
and applications (AICCSA). IEEE, pp 1–8
10. Furtado V, Ayres L, De Oliveira M, Gustavo C, Oliveira J (2009) Towards semantic Wikicrimes.
In: AAAI spring symposium: social semantic web: where web 2.0 meets web 3.0, pp 27–32
11. Leary RM, Vandenberghe W, Zeleznikow J (2003) Towards a financial fraud ontology a legal
modelling approach
12. Zeleznikow J, Stranieri A (2001) An ontology for the construction of legal decision support
systems. In: Proceedings of the second international workshop on legal ontologies, vol 13, pp
67–76
13. Winkels R, Engers T, Bench-Capon T (2001) Proceedings of the second international workshop
on legal ontologies
14. Valente A, Breuker J, Brouwer B (1999) Legal modeling and automated reasoning with on-line.
Int J Hum-Comput Stud 51(6):1079–1125
15. Valente A (1995) Legal knowledge engineering: a modelling approach
16. Valente A, Breuker J (1994) Towards a global expert system in law. In: Bargellini G, Binazzi
S (eds) A functional ontology of law. CEDAM Publishers
17. McCarty LT (1989) A language for legal discourse i. basic features. In: Proceedings of the 2nd
international conference on artificial intelligence and law, pp 180–189
18. Sharma S, Jain S (2023) The coronavirus disease ontology (Covido). In: Semantic intelligence:
select proceedings of ISIC 2022. Springer, pp 89–103
19. Vallet D, Fernández M, Castells P (2005) An ontology-based information retrieval model.
In: The semantic web: research and applications: second European semantic web conference,
ESWC 2005, Heraklion, Crete, Greece, May 29–June 1. Proceedings 2. Springer, pp 455–470
20. Ranwez S, Duthil B, Sy MF, Montmain J, Augereau P, Ranwez V (2012) How ontology based
information retrieval systems may benefit from lexical text analysis. New Trends Res Ontol
Lexical Resourc Ideas, Projects, Syst 209–231
21. Munir K, Anjum MS (2018) The use of ontologies for effective knowledge modelling and
information retrieval. Appl Comput Informatics 14(2):116–126
22. Shanavas N, Wang H, Lin Z, Hawe G (2020) Ontology-based enriched concept graphs for
medical document classification. Inform Sci 525:172–181
23. Elhadad MK, Badran KM, Salama GI (2017) A novel approach for ontology-based dimension-
ality reduction for web text document classification. Int J Softw Innov (IJSI) 5(4):44–58

24. Lytvyn V, Vysotska V, Veres O, Rishnyak I, Rishnyak H (2017) Classification methods of text
documents using ontology based approach. In: Advances in intelligent systems and comput-
ing: selected papers from the international conference on computer science and information
technologies, CSIT 2016, September 6–10 Lviv, Ukraine. Springer, pp 229–240
25. Semantic Web Rule Language (2023). Page Version ID: 1145742736. https://en.wikipedia.org/w/index.php?title=Semantic_Web_Rule_Language&oldid=1145742736. Accessed 10 May 2023
26. Horrocks I, Patel-Schneider PF, Boley H, Tabet S, Grosof B, Dean M et al (2004) Swrl: a
semantic web rule language combining owl and ruleml. W3C Member Submission 21(79):1–31
27. Lezcano L, Sicilia M-A, Rodríguez-Solano C (2011) Integrating reasoning and clinical
archetypes using owl ontologies and swrl rules. J Biomed Informatics 44(2):343–353
28. Dao QT, Dang TK, Nguyen TPH, Le TMC (2023) Vnles: a reasoning-enable legal expert
system using ontology modeling-based method: a case study of Vietnam criminal code. In:
2023 17th international conference on ubiquitous information management and communication
(IMCOM). IEEE, pp 1–7
29. Sharma S, Jain S (2023) Covido: an ontology for Covid-19 metadata. J Supercomput 1–30
30. Sharma S, Jain S (2024) The semantics of Covid-19 web data: ontology learning and population.
Curr Mater Sci: Formerly: Recent Patents Mater Sci 17(1):44–64
31. Jain S, Harde P, Mihindukulasooriya N (2023) Nyon: a multilingual modular legal ontology
for representing court judgements. In: Semantic intelligence: select proceedings of ISIC 2022.
Springer, pp 175–183
32. Rankin G (1944) The Indian penal code. LQ Rev 60:37
Ranking of Documents Through Smart
Crawler

Amol S. Dange, B. Manjunath Swamy, and Ashwini B. Shinde

Abstract With the exponential growth in the amount of information stored on the
internet, search engines are of extreme importance today. A critical component of a
search engine is its ranking model, the technique used to find relevant pages and
order them by decreasing relevance. The offline gathering of these documents is
important for providing the user with more accurate and pertinent results. Crawling
is the process of retrieving documents from the web before an end-user issues a
query. With the internet's ongoing expansion, the number of documents that need
to be crawled has grown enormously. Because the resources available for continuous
crawling are fixed, it is crucial for any academic or mid-sized organization to
wisely rank the documents to be crawled in each iteration. The algorithms created
here fit into the crawling pipeline already in place while bringing the benefits of
ranking; they must be quick and effective to avoid becoming a pipeline bottleneck.
The proposed method applies the Hamming distance algorithm and incorporates
parallel processing by using Kafka between subtasks. Based on the Hamming
distance algorithm, an effective smart crawler is designed that ranks the pages to
be downloaded in each iteration. Compared with other existing methods, the
implemented Hamming distance technique achieves a high accuracy of 99.8%.

Keywords Web crawling · Ranking · Hamming distance

A. S. Dange · A. B. Shinde (B)


Department of Computer Science & Engineering, Annasaheb Dange College of Engineering &
Technology, Ashta, Maharashtra, India
e-mail: [email protected]
B. M. Swamy
Department of Computer Science & Engineering, Don Bosco Institute of Technology, Bangalore,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 317
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_26
318 A. S. Dange et al.

1 Introduction

Web crawling is the process of acquiring desired information from the web. A huge
amount of target data, such as the data that practitioners or researchers need, is
collected and indexed by web crawlers. They build massive stores of the gathered
data by automatically collecting specified content from many websites. Web
crawlers are becoming more crucial as big data is used in a wider range of
industries and the amount of available data grows dramatically every year. A web
crawler automatically finds and retrieves website pages. A portion of the data
collected from web pages can be used to enhance the crawling procedure itself: by
examining the information on the pages, a focused crawler grabs pertinent web
pages, and much research on this problem leverages data from online pages to
improve the task. Similar features and hierarchies can be found across the web
pages of a website, and anticipating these factors can optimize the crawling
procedure. In this study, three additional pieces of data analyzed from earlier web
pages of a website are added to the information extraction method to improve it.
The use of web crawler technology may lead to the development of novel
environment-survey techniques and approaches while obviating the need for
significant amounts of labor, money, and time [1–3]. Additionally, complete web
access to the functions of public administrations and their associated data and
documents is offered to lower the cost of administrative tasks within the public
sector. The information-gathering part of a search engine is a web crawler, also
called a robot or spider. Crawling is the methodical and automated acquisition of
useful web pages along with their interconnected link structure. Web crawling
technology automatically collects desired data from websites, and systems using
reinforcement learning work with large amounts of data to examine issues with
online browsing. Due to the growth of data and the enormous number of documents
that may match a particular query string, indexing and ranking algorithms are used
to discover the best-matching documents [4–7]. Crawling completeness, sometimes
referred to as recall, is the proportion of web pages that have been crawled to the
number of web pages connected to the content under evaluation. This study expands
on the web crawling methodology, which can automatically find and gather a large
number of resources from the internet. A Uniform Resource Locator (URL)
identifies each resource, and URLs link to other resources. As a result, there is now
a requirement for effective web systems that can track URLs and their contents for
indexing [8–10]. Web data is becoming an increasingly valuable source of
information due to the quality and quantity of data available for automated retrieval.
Ethical crawlers that follow crawler norms and guidelines are simple to identify,
because they interact favorably with web servers and consume online content at a
controlled speed. However, some unethical crawlers still attempt to deceive web
servers and administrators to conceal their actual identities, and complex,
compound procedures are needed to find them [11, 12].
Ranking of Documents Through Smart Crawler 319

The remainder of the paper is organized as follows: Sect. 2 presents a literature
review of web crawlers. The proposed methodology is explained in Sect. 3. Results
and discussion are included in Sect. 4. Section 5 presents a comparative analysis.
The conclusion is given in Sect. 6.

2 Literature Review

Sharma et al. [13] implemented an experimental performance evaluation of web
crawlers that uses multi- and single-threaded web crawling and indexing algorithms
for smart web content applications. The simulation addresses the key parameters
for the hierarchical, single-threaded, and multithreaded clustering strategies,
including execution time and harvest ratio. The implemented method performs
better than the existing self-adaptive, ANN, and probabilistic methods and takes
less time to investigate.
Hosseinkhani et al. [14] implemented the ANTON framework, based on a
semantic-focused crawler, to support web crime mining using SVM. The enhanced
criminal ontology employed in the framework, which borrows from biological
studies of ant foraging behavior, was developed using an ant-miner-focused
crawler. The ANTON framework employs SVM to classify crime-related content,
increasing precision and decreasing false positives in crime mining. The results
also show that this work provides efficient solutions through crime ontologies and
ant-based servers. However, due to predefined semantic information sources and
frequent changes, the ANTON framework has trouble responding to changing
crime patterns.
Kaur et al. [15] implemented an intelligent hidden web crawler (IHWC) for
harvesting data in urban domains, handling relevant issues including prioritizing
URLs, domain classification, and avoiding exhaustive searching. The crawler
operates well while collecting data on pollutants: hidden web sources are
effectively searched, visits are minimized, and crawling resources are saved. Using
rejection rules, the crawler selects pertinent websites and ignores unimportant
ones. The implemented framework was accurate and had a high harvest rate; it
effectively collects hidden web interfaces from large-scale sites and outperforms
other crawlers in terms of harvest rate. However, the IHWC had trouble capturing
and processing real-time data, because doing so necessitates continuous page
refreshing.
Hosseini et al. [16] implemented an SVM-based classifier for detecting crawlers.
Numerous types of known text and non-text crawlers are categorized, as well as
unknown and harmful ones, to protect patient privacy and web server security.
SVMs are highly effective at binary classification tasks, which makes them suitable
for accurately distinguishing trustworthy from harmful web crawlers, and the
crawlers were detected with high precision. However, due to the heterogeneous
behavior of crawlers and insufficient labeled data, it was difficult to compile a
representative collection of malicious web crawlers.

Kaur and Geetha [17] implemented SIMHAR, a crawler based on hybrid
technology in which Sim+Hash and Redis hash-maps are utilized to detect
duplication. The distributed hidden-web crawler detects duplicates accurately and
submits the searchable forms. The Sim+Hash technique uses similarity-based
crawling, which aims to fetch pages similar to those already crawled; by
minimizing duplicate crawling of similar pages, this method helps highlight
important information. However, the crawler is not optimized for the surface web,
which consists of publicly accessible web pages; this restriction limits its
applicability to a particular portion of the internet.
Murugudu and Reddy [18] implemented a novel and efficient two-phase deep
learning data crawler framework (NTPDCF). This crawler is used for intelligent
crawling of search data to produce a large variety of effectively matched content.
The two-phase data crawler architecture effectively collects targeted data from
deep web interfaces, improving data harvesting and enabling the successful
extraction of relevant information from various web sources. However, data
crawlers must constantly adapt to the dynamic nature of deep web interfaces to
harvest data efficiently and avoid erroneous or incomplete extraction.
Capuano et al. [19] implemented an ontology-driven approach to focused crawling
based on the use of both multimedia and textual material on web pages. The
framework was employed in a system that improves crawling activities by fusing
the outcomes with novel technologies such as linked open data and convolutional
neural networks. A high degree of generalization across conceptual domains was
also made possible by using formal frameworks to express knowledge as
ontologies. However, the crawler needs to operate on a large number of web pages,
and this strategy necessitates manually labeling a sub-graph of the web, which
involves substantial labeling work.

3 Methodology

This section describes the methodology used in the proposed system. The
architecture of the proposed method is shown in Fig. 1; it comprises Phase 1
(indexing and user query preferences) and Phase 2 (ranking models).
Step 1: First we check whether the given URL has already been crawled. A
properly indexed database is used for this purpose; it contains information on the
URLs that have already been crawled. If the URL has not been crawled, we move
on to the next step: dump the data from the URL into the Kafka topic URL-data,
then continue to crawl the new URLs produced in the Kafka topic URL-data, and
extract the structured form for the database.
Because a website may appear more than once, the URL and the website are
indexed together. Since each website's URL is unique, when we look up a
particular URL the index returns both the website and the URL, which takes less
time than simply storing the URL value.

Fig. 1 Architecture of the proposed method

When a crawler pulls records from a URL, the data is separated into website and
URL for database storage.
Step 2: The second step takes the URL-data topic, removes the script files and
styling components, and then extracts the headings from the H1, H2, H3, H4, H5,
and H6 tags and the bulleted points, in addition to any links to images or other
pages that may be there. The extracted subject matter is then published to the
topic-similarity topic.
Step 3: In this step, the grammar is removed from the text extracted from the
crawled data so that it is normalized for search. The data is then compared with the
data already present in the database, and if there is a similarity of more than 50%,
it is assigned a unique ID. Since the search engine will not always receive an exact
string to compare, grouping all similar items is critical, as it significantly cuts down
the time required to traverse the complete database and extract the unique ID when
it appears. This process is repeated for each message found on the topic; if a
message is similar to another message, it is stored and compared with that message.
The scheduler is applied to dynamic URLs: to reintroduce a URL to the URL topic,
its details are first removed from the database. For static websites that do not
change often, the scheduler can be disabled so that the URL does not go through
the first step again; this is useful when crawling just one static web page.
The schedule time can be set dynamically for dynamic websites. If a URL is
found updated within 1 min, the initial interval remains 1 min; if it is not updated,
the interval for that website is doubled to 2 min, and likewise doubled again if it is
set to run after 2 min. With this architecture, we can increase consumption when
the number of messages in the topic grows and decrease it when the number of
messages shrinks; this event-driven architecture aids both vertical and horizontal
scalability. The failure of one consumer or producer could affect the whole system;
however, because there are multiple consumers and the process is asynchronous,
taking less time to handle than a synchronous approach, the outage of either of
them does not impact the system. The algorithm is as follows:
1. Check whether the given URL has already been crawled.
2. If not, dump the data from the URL into the Kafka topic URL-data.
3. Crawl the new URL that is produced in the Kafka topic.
4. Take the URL-data topic, eliminate script files and styling components, and
extract the headings and bulleted points.
5. Remove the grammar from the text extracted from the crawled data.
6. Compare the information with the data already present in the database; if more
than 50% similarity is found, assign it a unique ID.
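The six steps above can be sketched in memory as follows. A set stands in for the indexed database of crawled URLs, a list stands in for the Kafka URL-data topic, and the word-overlap similarity measure is an assumption for illustration; only the 50% threshold is taken from the text.

```python
# In-memory sketch of steps 1-6 (Kafka and the database are stubbed out).
import re

seen_urls = set()     # stand-in for the crawled-URL index (step 1)
topic = []            # stand-in for the Kafka topic URL-data (step 2)
documents = {}        # unique ID -> cleaned text already stored

def similarity(a, b):
    """Naive Jaccard word overlap in [0, 1] (illustrative measure)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def enqueue(url, raw_html):
    """Steps 1-2: skip already-crawled URLs, else publish to the topic."""
    if url in seen_urls:
        return False
    seen_urls.add(url)
    topic.append(raw_html)
    return True

def consume():
    """Steps 4-6: strip markup, normalize, deduplicate by similarity."""
    while topic:
        raw = topic.pop(0)
        text = re.sub(r"<[^>]+>", " ", raw)        # drop tags (step 4)
        text = re.sub(r"\s+", " ", text).strip().lower()
        for uid, existing in documents.items():
            if similarity(text, existing) > 0.5:   # step 6: reuse this ID
                break
        else:
            documents[len(documents)] = text       # assign a new unique ID

enqueue("https://example.org/a", "<h1>Web crawling basics</h1>")
enqueue("https://example.org/a", "<h1>ignored duplicate</h1>")  # already crawled
consume()
print(documents)  # {0: 'web crawling basics'}
```

In the actual pipeline, the producer and consumer would run as separate processes connected by Kafka rather than a shared list, which is what allows the asynchronous scaling described above.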
The implemented architecture of the indexing and ranking model consists of two
primary phases, each defining a specific step in the ranking and indexing process
for either offline documents or online web pages. The model can be used for
storing documents offline or web pages online. The model's two primary phases
are described below.
Phase 1: The first step is receiving a search query entered by a user. We offer users
flexibility in specifying their preferences and ordering priority based on their
needs. Depending on those needs and preferences, each user may select any subset
of the criteria, or all of them.
After determining a user's preferences and needs, the query model begins
processing and applies lemmatization and stemming to reduce the inflectional and
sometimes derivational forms of words in the query to a single base form. It then
crawls online and offline files and pages and analyzes the material against the
user's request. After learning a web page's URL or file path, the engine crawls the
page to determine what is on it, a process called indexing, and the results are then
indexed. The model starts with the keyword criterion, matching user queries in
three locations: page URL, domain name, and page content. The engine also looks
up the page's creation date in its metadata.
Phase 2: The second phase is initiated by a page handler module. It identifies the
type of each page or file and begins loading page contents into the builder page,
which keeps track of every document or page so that the search query can be
compared against it.
Module 1 loads the page and document records. Rank is calculated using the rank
calculator and the user's criteria. The Phase 1 findings establish the user's
preferences before page content is loaded into the page handler. The rank calculator
receives the contents at this point.

The first stage discovers a pattern to determine the creation and modification dates
of pages or files by processing the material loaded by the page handler. The user's
search query is compared with each page's content, and by counting the number of
hyperlinks pointing to the other pages related to the search query, it is also possible
to determine the number of votes for each page.
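The link-vote idea can be sketched as a toy in-link counter: a page's vote count is the number of other relevant pages linking to it. The function name and graph representation below are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of vote counting over an adjacency-list link graph.
def vote_counts(link_graph, relevant_pages):
    """Count in-links ("votes") received by each relevant page."""
    votes = {page: 0 for page in relevant_pages}
    for src, targets in link_graph.items():
        for target in targets:
            if target in votes and src != target:  # ignore self-links
                votes[target] += 1
    return votes

graph = {"a": ["b", "c"], "b": ["c"], "c": []}
votes = vote_counts(graph, {"b", "c"})
assert votes["c"] == 2 and votes["b"] == 1  # 'c' receives the most votes
```

A production ranker would typically extend this simple count with iterative link-analysis methods, but the vote count alone already orders pages by how often relevant pages point to them.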
After Module 2, the weight module determines the initial weight for each criterion
according to the user's preferences and computes the final score for every
document and page by passing these values to the rank calculator. The rank
statistics component, which is responsible for the final values displayed for each
page, receives the resulting page scores.
The Hamming distance approach is used to determine similar words. If more than
one word is present, the word with the highest degree of similarity is accepted. If
the string lengths passed to hammingDist(str1, str2) do not match, padding is used.
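A minimal sketch of this comparison follows. The space padding character and picking the minimum-distance candidate are assumptions, since the paper does not specify them.

```python
# Hamming distance with right-padding for unequal string lengths.
def hamming_dist(str1, str2):
    n = max(len(str1), len(str2))
    a, b = str1.ljust(n), str2.ljust(n)      # pad the shorter string
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def most_similar(query, candidates):
    # Lower Hamming distance means a higher degree of similarity.
    return min(candidates, key=lambda word: hamming_dist(query, word))

print(hamming_dist("karolin", "kathrin"))   # 3
print(hamming_dist("abc", "abcd"))          # 1 (after padding)
print(most_similar("crawler", ["crawling", "crawler", "ranker"]))  # crawler
```

Padding keeps the positional comparison well defined; without it, the Hamming distance is undefined for strings of different lengths.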
A value is assigned to each heading and bullet point. Each type has a priority value,
as shown in Table 1, so that, for example, a match found in an H1 tag outweighs
one found in an H6 tag.
Data takes less time to display because the cache is memory-sensitive rather than
time-sensitive: entries are deleted only when the cache memory is full, so a record
persists in the cache until then. When a specific text is searched, its 100% similarity-
index result is saved in the cache, so that when the identical text is searched again
the cached result is returned and the data can be offered as output. Compared to
searching through the complete database, the cost of finding a match for a given
record is too great, so caching the records with a higher similarity index helps
identify matching records quickly. When records with a higher similarity index are
discovered, the old records are replaced. During search engine optimization, as shown
in Fig. 2, a search engine for crawled data is introduced.
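The memory-sensitive eviction behavior described above can be sketched with an OrderedDict acting as the cache. The capacity value and the least-recently-used eviction rule are assumptions for illustration; the paper only states that entries are deleted when cache memory is full.

```python
from collections import OrderedDict


class SimilarityCache:
    """Memory-sensitive cache for 100% similarity-index results.

    When the cache is full, the least-recently-used entry is evicted
    (an assumed policy); a repeated query is answered from the cache
    instead of re-searching the whole database.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, query):
        if query not in self._store:
            return None                  # miss: fall back to a full search
        self._store.move_to_end(query)   # mark as recently used
        return self._store[query]

    def put(self, query, result):
        if query in self._store:
            self._store.move_to_end(query)
        self._store[query] = result
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the oldest entry
```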
Ranking can be added to the architecture because some websites need to give their
content more attention. Once a site has been crawled the most times, results are sorted
by priority topic. Figure 3 shows the architecture of a web crawler with the induction
of URL ranking.

Table 1 Priority types


Type H1 H2 H3 H4 H5 H6 Bullet points
Priority 1 2 3 4 5 6 7
324 A. S. Dange et al.

Fig. 2 Search engine for crawled data

Fig. 3 Web crawler architecture with induction of ranking of URL



Fig. 4 Crawling result with and without similarity index

4 Result and Discussion

The experimental setup includes the Ubuntu 20.04 operating system and 32 GB
of RAM. The technologies utilized to implement the concept are Docker, Docker
Compose, Apache Kafka (as a Docker image instead of a native installation), and the
Python programming language. The implemented concept is powered by an Intel i7
octa-core processor. Wikipedia, Udemy, Medium, and GeeksforGeeks are among the
initial URLs that are injected. The web crawler concentrates on headings and titles
rather than the complete website's data. If a website adopts the implemented method-
ology, the primary material can always be made up of headings and titles, allowing
only restricted and essential content to be provided.
Figure 4 shows a graphical illustration with and without a similarity index.

5 Comparative Analysis

This section contains an analysis of existing and implemented models. Table 2 shows
a comparative analysis of existing and implemented models. The analysis is done
with the help of attributes such as accuracy, precision, recall and f-measure.
As shown in Table 2, the proposed method achieves 99.8% accuracy, which is higher
than that of the other methods. The values for precision, recall, and F-measure are
99.9%, 98%, and 99%, respectively.

Table 2 Comparative analysis of existing and implemented models


Author                   Method            Accuracy (%)  Precision (%)  Recall (%)  F-measure (%)
Kaur et al. [15]         IHWC              90            84.62          81.06       –
Hosseini et al. [16]     SVM               99.08         99.8           96          97.79
Murugudu and Reddy [18]  NTPDCF            95            91             72          –
Kaur and Geetha [17]     SIMHAR            99.4          99             97          98
Proposed method          Hamming distance  99.8          99.9           98          99

6 Conclusion

The proposed technique is event-driven: resources are not engaged until there is
data for processing. For parallel processing, the complete task has been divided into
separate subtasks, and each individual subtask can have a different number of worker
entities to complete the required work. This is accomplished by placing Kafka
between subtasks: the first subtask acts as a producer and the second subtask as a
consumer of Kafka. Records can be distributed evenly in a plug-and-play manner,
and resources are not used until they are needed. The proposed technique crawls
faster than the regular architecture. With the proposed technique, multiple URLs can
be handled along with horizontal and vertical scalability, which permits handling
more data than other architectures. Compared with other existing methods, the
implemented Hamming Distance method achieves a high accuracy of 99.8%.
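The producer/consumer decoupling between subtasks can be illustrated offline with Python's standard queue module standing in for a Kafka topic. In the deployed system the first subtask would publish to Kafka and the second would consume from it; the task names, seed values, and lower-casing step below are placeholders, not the paper's actual processing.

```python
import queue
import threading

# A bounded in-process queue stands in for the Kafka topic between the
# two subtasks; in the real system this would be a producer/consumer pair.
topic = queue.Queue(maxsize=100)
ranked = []


def producer_subtask(urls):
    # Subtask 1 (producer): publish each crawled record to the topic.
    for url in urls:
        topic.put(url)
    topic.put(None)  # sentinel: no more records


def consumer_subtask():
    # Subtask 2 (consumer): process records as they arrive.
    while True:
        record = topic.get()  # blocks until a record is available
        if record is None:
            break
        ranked.append(record.lower())


worker = threading.Thread(target=consumer_subtask)
worker.start()
producer_subtask(["Wikipedia.org", "Udemy.com", "Medium.com"])
worker.join()
```

Because the consumer blocks on the queue, it consumes no resources until data arrives, mirroring the event-driven behavior described above.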

References

1. Kim YY, Kim YK, Kim DS, Kim MH (2020) Implementation of hybrid P2P networking
distributed web crawler using AWS for smart work news big data. Peer-to-Peer Network Appl
13:659–670
2. Uzun E (2020) A novel web scraping approach using the additional information obtained from
web pages. IEEE Access 8:61726–61740
3. Zhang J, Zou T, Lai Y (2021) Novel method for industrial sewage outfall detection: water
pollution monitoring based on web crawler and remote sensing interpretation techniques. J
Clean Prod 312:127640
4. Bifulco I, Cirillo S, Esposito C, Guadagni R, Polese G (2021) An intelligent system for focused
crawling from Big Data sources. Expert Syst Appl 184:115560
5. Rajiv S, Navaneethan C (2021) Keyword weight optimization using gradient strategies in event
focused web crawling. Pattern Recogn Lett 142:3–10
6. Yang S, Wi S, Park JH, Cho HM, Kim S (2020) Framework for developing a building material
property database using web crawling to improve the applicability of energy simulation tools.
Renew Sustain Energy Rev 121:109665

7. Ang PS, Teo DCH, Dorajoo SR, Prem Kumar M, Chan YH, Choong CT, Phuah DST, Tan
DHM, Tan FM, Huang H, Tan MSH (2021) Augmenting product defect surveillance through
web crawling and machine learning in Singapore. Drug Saf 44(9):939–948
8. Zhao X, Zhang W, He W, Huang C (2020) Research on customer purchase behaviors in online
take-out platforms based on semantic fuzziness and deep web crawler. J Ambient Intell Hum
Comput 11:3371–3385
9. Hwang J, Kim J, Chi S, Seo J (2022) Development of training image database using web
crawling for vision-based site monitoring. Autom Constr 135:104141
10. ElAraby ME, Shams MY (2021) Face retrieval system based on elastic web crawler over cloud
computing. Multimedia Tools Appl 80:11723–11738
11. Schedlbauer J, Raptis G, Ludwig B (2021) Medical informatics labor market analysis using
web crawling, web scraping, and text mining. Int J Med Inform 150:104453
12. Attia M, Abdel-Fattah MA, Khedr AE (2022) A proposed multi criteria indexing and ranking
model for documents and web pages on large scale data. J King Saud Univ Comput Inf Sci
34(10):8702–8715
13. Sharma AK, Shrivastava V, Singh H (2021) Experimental performance analysis of web crawlers
using single and multi-threaded web crawling and indexing algorithm for the application of
smart web contents. Mater Today Proc 37:1403–1408
14. Hosseinkhani J, Taherdoost H, Keikhaee S (2021) ANTON framework based on semantic
focused crawler to support web crime mining using SVM. Ann Data Sci 8(2):227–240
15. Kaur S, Singh A, Geetha G, Cheng X (2021) IHWC: intelligent hidden web crawler for
harvesting data in urban domains. Complex Intell Syst 1–19
16. Hosseini N, Fakhar F, Kiani B, Eslami S (2019) Enhancing the security of patients’ portals and
websites by detecting malicious web crawlers using machine learning techniques. Int J Med
Inform 132:103976
17. Kaur S, Geetha G (2020) SIMHAR-smart distributed web crawler for the hidden web using
SIM+ hash and redis server. IEEE Access 8:117582–117592
18. Murugudu MR, Reddy LSS (2023) Efficiently harvesting deep web interfaces based on adaptive
learning using two-phase data crawler framework. Soft Comput 27(1):505–515
19. Capuano A, Rinaldi AM, Russo C (2020) An ontology-driven multimedia focused crawler
based on linked open data and deep learning techniques. Multimedia Tools Appl 79:7577–7598
Ensemble Learning Approaches
to Strategically Shaping Learner
Achievement in Thailand Higher
Education

Sittichai Bussaman, Patchara Nasa-Ngium, Wongpanya S. Nuankaew,
Thapanapong Sararat, and Pratya Nuankaew

Abstract Thailand faces a severe problem of students dropping out of the higher
education system. Therefore, this research has three critical objectives: (1) to study
the context of students’ academic achievement in science and technology at the
higher education level, (2) to assemble a model to predict the risk of students drop-
ping out of higher education, and (3) to evaluate a model for predicting the risk of
a student dropping out from higher education. The population and research sample
were 2361 students’ academic achievements from five educational programs of the
Faculty of Science and Technology at Rajabhat Maha Sarakham University during
the 2010–2022 academic year. The research tool utilized data mining and super-
vised machine learning techniques: Decision Tree, Naïve Bayes, Neural Networks,
Gradient Boosting, Random Forest, and Majority Voting. Model performance was
evaluated using the cross-validation approaches and confusion matrix techniques,
with four indicators: Accuracy, Precision, Recall, and F1-Score. The results showed
that learners’ context in science and technology had various learning achievements.
The educational program that needs to be monitored is the Bachelor of Science
Program in Computer Science. This research successfully developed a predictive
model for student dropout risk with an accuracy of 88.14% and an S.D. equal to
1.04. Therefore, this research greatly benefits the public and stakeholders of
Rajabhat Maha Sarakham University, who should be encouraged to continue this
research.

Keywords Academic achievement model · Education problem solving ·
Educational data mining · Learning analytics · Students’ dropout model

S. Bussaman · P. Nasa-Ngium
Faculty of Science and Technology, Rajabhat Maha Sarakham University, Maha Sarakham 44000,
Thailand
W. S. Nuankaew · T. Sararat · P. Nuankaew (B)
School of Information and Communication Technology, University of Phayao, Phayao 56000,
Thailand
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 329
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_27

1 Introduction

The influence of artificial intelligence has spread into the education industry, known
as “educational data mining and learning analytics”. Educational data mining is
a space where data scientists use data relevant to learners, instructors, and other
educational contexts to develop the potential of students and teachers [1, 2], while
learning analytics is a tool for creating practical educational data mining. Learning
analytics typically consists of four components: descriptive analytics, diagnostic
analytics, predictive analytics, and prescriptive analytics [2, 3].
Descriptive analytics is similar to a survey tool, describing what is being studied
and researched, while diagnostic analytics diagnoses problems and causes related
to the object under inspection. Predictive analytics uses those identified causes to
shape a forecast and find future answers. Finally, prescriptive analytics introduces a
variety of practical alternatives to give researchers a clear direction. These elements
are essential ingredients for research development in the education industry [4, 5].
Many researchers are interested in using educational data mining to develop students
for various purposes: to improve student performance [6, 7], to predict learning
achievement [8, 9], to recommend relevant academic and career programs, and to
develop learning models that fit learning styles [10], etc.
In Thailand’s education context, education is divided into two primary
levels: basic education and higher education. Thai basic education provides students
with general knowledge, while Thai higher education focuses on specialized educa-
tion. However, the major problem faced by universities in Thailand is students
dropping out of the system and failing to graduate as designed [11, 12]. Rajabhat Maha
Sarakham University has been affected by student dropout like other universities.
This is the principal reason that drives the researchers to
carry out this research. This research has three main objectives. The first objective
is to study the problems affected by students’ dropout in the Faculty of Science
and Technology at Rajabhat Maha Sarakham University. The second objective is
to develop a predictive model for the dropout risk of students from the Faculty of
Science and Technology. The final objective is to evaluate the effectiveness of the
risk prediction model of the Faculty of Science and Technology dropout students.
The data was collected from 2361 students from five educational programs from the
Faculty of Science and Technology, Rajabhat Maha Sarakham University, during
the 2010–2022 academic year. Research tools and methodologies were CRISP-DM
and supervised machine learning tools [7, 9]: Decision Tree, Naïve Bayes, Neural
Networks, Gradient Boosting, Random Forest, and Majority Voting. Model perfor-
mance was evaluated using the cross-validation approaches and confusion matrix
techniques, with four indicators: Accuracy, Precision, Recall, and F1-Score.
In this research, the researchers are highly committed to determining the guide-
lines and successfully designing the solutions that Rajabhat Maha Sarakham Univer-
sity faces. In addition, the researchers hope that this research will continue to benefit
the public.

Table 1 Data collection


Educational program      Graduated (on schedule)  Graduated (not on schedule)  Dropped out  Resigned  Total
B.Sc. Biology 570 10 49 15 644
B.Sc. Chemistry 313 2 0 11 326
B.Sc. Computer Science 384 108 161 35 688
B.Sc. Mathematics 251 56 90 38 435
B.Sc. Physics 165 49 36 18 268
Total 1683 225 336 117 2361

2 Materials and Methods

2.1 Population and Research Samples

As for the population and research samples, researchers collected data on the learning
achievement of students in the Faculty of Science and Technology at Rajabhat Maha
Sarakham University during the academic year 2010–2022 from five educational
programs: Bachelor of Science Program in Biology, Bachelor of Science Program in
Chemistry, Bachelor of Science Program in Computer Science, Bachelor of Science
Program in Mathematics, and Bachelor of Science Program in Physics, as detailed
in Table 1.

2.2 Data Collection

The data collected were classified by education program and student status, as shown
in Table 1. The data used in this research will be anonymized for confidentiality and
research purposes only.
Table 1 shows the collected sample data. It has a total of 2361 students from five
educational programs. The educational program with the most significant number of
students is B.Sc. Computer Science with a total enrollment of 688 students. Moreover,
when considering the details, it was found that the B.Sc. Computer Science has the
highest issues, with 161 dropout students and 108 who graduated late. Rajabhat Maha
Sarakham University needs to pay attention to this matter urgently.

2.3 Research Methodology and Tools

The data mining techniques have been used as the research methodology and tools
to define research guidelines using CRISP-DM principles to determine research
procedures: business understanding, data understanding, data preparation, modeling,
evaluation, and deployment.
Firstly, the problem needs to be defined and understood, which is the business
understanding stage. The researchers found that many students at the Faculty of
Science and Technology at Rajabhat Maha Sarakham University have dropped out
of the system in the past decade. At the same time, graduates often fail to graduate
as designed, as shown in Table 1. Secondly, the researchers interpret the collected
data, which is the data understanding step.
To understand the data, the researchers found that all educational programs
required students to enroll in 30 credits of general education courses and approxi-
mately 90 credits of specialized courses related to the educational program. There-
fore, the researchers used only general education subjects to analyze the data and
develop a prediction model. It consists of thirteen courses: 1,102,002 English Reading
and Writing, 1,109,001 Using Information for Learning, 1,200,001 Art Appreciation,
1,200,004 Meaning of Life, 1,200,005 Human Security, 1,200,006 Human Behavior
and Self Development, 1,300,001 Natural Resources and Environmental Manage-
ment of Thai, 1,300,002 Local Studies, 1,400,001 Life and Environment, 1,400,002
Science for Quality of Life, 1,400,003 Mathematics and Statistics for Decision-
Making, 1,400,004 Information Technology for Life, and 1,400,005 Exercises for
Quality of Life.
The third step is data preparation. The researchers found that the collected data
was highly coarse and sparse. Therefore, the researchers eliminated and trimmed
the improper and inconsistent data to provide appropriate data for developing the
model. After getting the data ready, the researchers set out to develop the model in
two phases: the first phase, single-model development using supervised learning
techniques, and the second phase, an ensemble approach for efficient model analysis.
Modeling tools used in this section include Decision Tree, Naïve Bayes, Neural
Networks, Gradient Boosting, Random Forest, and Majority Voting. A decision tree
[9] is an inverted tree structure. It consists of a root node which serves as the initiation
of the model and branches, which serve as the model’s condition. Another component
is the leaf node, which acts as an alternative and answer model. Decision tree models
are popular because they are easy to apply.
Naïve Bayes is a prevalent technique for classification in data mining with compu-
tational probabilities [9, 13]. The advantage of Naïve Bayes is that it can work
conveniently and quickly in multiple classes.
Neural Networks [13, 14] are one of the most powerful algorithms in machine
learning, known as Deep Learning, advanced networks with many layers. The work
of biological neural networks inspired Neural Networks. The advantages of neural
networks are that the model is very flexible and does not limit the layer size. In
addition, it can be used to estimate continuous functions (Universal approximation
theorem) and learn various data features by oneself.

Gradient boosting makes each new classifier instance progressively more accurate
by learning from the accumulated discrepancies in the predictions of previous
instances. The advantage of gradient boosting is that the model learns from its
mistakes and improves on them.
Random forest belongs to a group of models called ensemble learning, whose
principle is to train the same model multiple times (instances) on the same data
set, with each training session selecting a different qualified part of the data; the
resulting models then vote on which class is chosen most often. Majority voting is
also known as hard voting: all classifiers vote for a class, and the class that receives
the most votes is returned as the model’s response.
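Hard voting as described above can be sketched in a few lines. The base-classifier votes below are made-up placeholders, and Counter's first-encountered tie-breaking is a simplification of a real tie-breaking rule.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: return the class predicted by the most classifiers.

    Counter.most_common breaks ties by insertion order, which is a
    simplification; a production system would need an explicit rule.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical votes from the base classifiers for one student.
votes = ["On schedule", "Dropped out", "On schedule",
         "On schedule", "Resigned", "On schedule"]
print(majority_vote(votes))  # -> On schedule
```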
The fourth step is evaluation. In developing the model, it is necessary to have a
guideline to determine the criteria for selecting the best model. Researchers use cross-
validation and confusion matrix techniques as tools to evaluate effective models.
The last step is deployment. The researchers have evaluated the model and set
guidelines for actual implementation to present to organizational executives.

2.4 Research Analysis and Interpretation

In analyzing and interpreting the findings, researchers used the last two steps of the
CRISP-DM process to guide their operations: evaluation and deployment.
The researchers divided the evaluation into two parts: the cross-validation
approach and the confusion matrix used to determine indicators. The cross-
validation process divides the collected data into equal portions called k folds. Some
of these portions, called the training dataset, are used to develop a model, and the
rest, called the testing dataset, are used to test the developed model.
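The k-fold split just described can be sketched without any machine-learning library. The round-robin fold assignment, the fold count, and the stand-in records are illustrative choices, not details from the paper.

```python
def k_fold_splits(data, k):
    """Yield (training, testing) pairs; each fold serves as the test
    set exactly once. Folds are formed round-robin for simplicity."""
    folds = [data[i::k] for i in range(k)]  # k roughly equal portions
    for i in range(k):
        testing = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i
                    for x in fold]
        yield training, testing


records = list(range(10))                  # stand-in for student records
splits = list(k_fold_splits(records, k=5))
```

With 10 records and k = 5, each split trains on 8 records and tests on the remaining 2.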
To determine the most efficient model, it is necessary to use the confusion matrix
as a metric. This research has four metrics: accuracy, precision, recall, and f1-score.
Accuracy is the value used to determine the overall model performance, and it is
calculated as the number of correctly predicted data divided by the total data. Preci-
sion reflects the model’s predictive ability within each class; it is calculated by
dividing the correctly predicted data for a class by the total number of predictions
made for that class. Recall is the proportion of actual class members the model
predicts correctly; it is calculated as the correctly predicted data for a class divided
by the number of actual members of that class. Finally, the F1-Score is a metric built
on Precision and Recall to be used as a criterion in conjunction with accuracy. It can
be calculated from Eq. (1).

F1-Score = 2 × (Precision × Recall)/(Precision + Recall) (1)
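These four indicators can be checked against the confusion matrix reported later in Table 4; the sketch below recomputes the headline figures from the counts taken directly from that table.

```python
# Confusion-matrix counts taken from Table 4
# (outer keys: predicted class; inner keys: true class).
matrix = {
    "OSC": {"OSC": 1675, "DPO": 24,  "RSD": 7,  "NSC": 172},
    "DPO": {"OSC": 3,    "DPO": 283, "RSD": 48, "NSC": 2},
    "RSD": {"OSC": 1,    "DPO": 18,  "RSD": 72, "NSC": 0},
    "NSC": {"OSC": 4,    "DPO": 1,   "RSD": 0,  "NSC": 51},
}
classes = list(matrix)

total = sum(sum(row.values()) for row in matrix.values())
accuracy = 100 * sum(matrix[c][c] for c in classes) / total


def precision(c):
    # Correct predictions of class c / all predictions of class c.
    return 100 * matrix[c][c] / sum(matrix[c].values())


def recall(c):
    # Correct predictions of class c / all actual members of class c.
    return 100 * matrix[c][c] / sum(matrix[p][c] for p in classes)


def f1(c):
    # Eq. (1).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


print(round(accuracy, 2))   # -> 88.14
print(round(f1("OSC"), 2))  # -> 94.07
```

The recomputed values match the paper: overall accuracy 88.14%, OSC precision 89.19%, OSC recall 99.52%, and NSC recall 22.67%.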

After obtaining the most suitable model, it goes into deployment. Deployment can
be carried out in many ways, such as developing a user manual, designing a perfor-
mance report, or developing an application. For this research, it has been proposed to

Table 2 Summary of the context of the learner


Educational program Academic achievement
Min Max Mean Mode Median S.D
B.Sc. Biology 0.15 3.92 2.73 2.64 2.76 0.61
B.Sc. Chemistry 0.26 3.90 2.73 2.70 2.73 0.57
B.Sc. Computer Science 0.11 3.60 2.24 2.21 2.33 0.63
B.Sc. Mathematics 0.10 3.61 2.28 2.39 2.40 0.66
B.Sc. Physics 0.22 3.90 2.54 2.43 2.59 0.69
Overall 0.10 3.92 2.49 2.53 2.56 0.66

the administrators of Rajabhat Maha Sarakham University and the Faculty of Science
and Technology to determine a solution that is consistent with student behavior in a
sustainable way.

3 Research Results

3.1 Learner Context

The researchers summarized the results of the contextual analysis of learners using
basic statistics, including minimum, maximum, mean, mode, median, and S.D., as
detailed in Table 2.
Table 2 presents an overview of student achievements from five educational
programs of the Faculty of Science and Technology at Rajabhat Maha Sarakham
University. It was found that all learners had a moderate average academic achieve-
ment, with a mean of 2.49 out of 4.00 and S.D. equal to 0.66. However, administrators
and stakeholders should be aware that the B.Sc. Computer Science program has a low
average achievement, with a mean of 2.24 and S.D. equal to 0.63. This is consistent
with Table 1, which shows many dropout students.

3.2 Model Development Results

The developed models based on the data mining development process are classified
by method characteristics, as summarized in Table 3.
Table 3 shows the performance test results of the model classified by technique.
It was found that the model developed with the voting technique had the highest
accuracy, with an accuracy of 88.14%. Therefore, it can be concluded that the voting
technique produces a model suitable for use and deployment. The researchers detail
the model performance in the next section.

Table 3 Summary of model performance classified by technique


Classifiers/Class Accuracy S.D Precision Recall F1-Score
Single classifiers
Decision tree 82.80 1.65
• On schedule 86.71 97.68 91.87
• Not on schedule 38.10 7.11 11.99
• Dropped out 73.42 79.76 76.46
• Resigned 46.55 23.08 30.86
Naïve Bayes 82.17 1.00
• On schedule 88.80 94.24 91.44
• Not on schedule 23.74 14.67 18.13
• Dropped out 83.50 73.81 78.36
• Resigned 52.52 62.39 57.03
Neural networks 80.77 1.65
• On schedule 87.43 94.65 90.90
• Not on schedule 17.78 7.11 10.16
• Dropped out 72.27 80.65 76.23
• Resigned 36.49 23.08 28.27
Ensemble approach
Gradient boosting 86.95 1.19
• On schedule 88.88 98.81 93.58
• Not on schedule 78.57 19.05 30.66
• Dropped out 79.21 89.58 84.08
• Resigned 83.33 40.54 54.55
Random forest 87.08 0.98
• On schedule 90.02 99.11 94.35
• Not on schedule 84.06 26.98 40.85
• Dropped out 76.69 84.00 80.18
• Resigned 65.75 37.21 47.52
Majority voting 88.14 1.04
• On schedule 89.19 99.52 94.07
• Not on schedule 91.07 22.67 36.30
• Dropped out 84.23 86.81 85.50
• Resigned 79.12 56.69 66.06

Table 4 Selected model efficiency


True OSC True DPO True RSD True NSC Class precision
Pred. OSC 1675 24 7 172 89.19
Pred. DPO 3 283 48 2 84.23
Pred. RSD 1 18 72 0 79.12
Pred. NSC 4 1 0 51 91.07
Class recall 99.52 86.81 56.69 22.67
OSC = On schedule, NSC = Not on schedule, DPO = Dropped Out, RSD = Resigned

3.3 Selected Model Efficiency

The most suitable model for this research is the model developed with the voting
technique. The efficiency was tested with the confusion matrix, as detailed in Table 4.
Table 4 enumerates model performance using the confusion matrix technique and
four indicators. It was found that the model had predictive ability in all classes,
with model accuracy equal to 88.14%. In addition, the model has a high level of
predictive capability in each category, with OSC’s F1-Score equal to 94.07%, NSC’s
F1-Score equal to 36.30%, DPO’s F1-Score equal to 85.50%, RSD’s F1-Score equal
to 66.06%. Therefore, it can be concluded that this model can be adapted and utilized
further.

4 Research Discussion

This research achieves all three objectives, and the researchers can discuss the
following results.
The researchers extracted data from five educational programs at the Faculty of
Science and Technology at Rajabhat Maha Sarakham University. It was discovered
that the educational program that required special vigilance was B.Sc. Computer
Science, as concluded in Table 1. It shows that only 55.81% of students in the program
completed the designed curriculum (384 out of 688 students). In addition, the
program ranked highest in both dropouts and late graduations, representing 47.92%
of all dropout students (161 out of 336 students) and 48.00% of all students who
graduated not on schedule (108 out of 225 students).
Such findings and observations drive researchers to find solutions to these problems.
The researchers developed a model to predict the risk of students dropping out and
failing to complete their studies. The researchers used a data mining approach and six
supervised machine learning techniques to develop the most acceptable model: Deci-
sion Tree, Naïve Bayes, Neural Networks, Gradient Boosting, Random Forest, and
Majority Voting. The model development results from each technique are summa-
rized and presented in Table 3. Overall, the researchers found that all techniques

were able to produce highly efficient models, with the most effective models being
those using the majority voting technique. It has an accuracy value of 88.14% and
an S.D. equal to 1.04. It can be interpreted as a reasonable model to implement and
deploy. The selected model was put into a detailed performance test, which is listed in
Table 4. Table 4 shows that the selected model’s predictability is distributed among all
classes. One area still needs improvement: the model predicts graduation not on
schedule with low accuracy (recall equal to 22.67%).
Finally, the researchers concluded that the research achieved the intended research
objectives. The research has learned the details and the context of the students that the
Faculty of Science and Technology and Rajabhat Maha Sarakham University need
to pay special attention. Moreover, this research has studied a learning model based
on science and technology. Therefore, the researchers concluded that this research
is beneficial and appropriate and should be disseminated to the public.

5 Conclusion

In conclusion, this research found that all objectives were implemented and achieved.
The data collection consists of 2361 students’ learning achievements in the Faculty of
Science and Technology at Rajabhat Maha Sarakham University during the academic
year 2010–2022 from five educational programs: Bachelor of Science Program in
Biology, Bachelor of Science Program in Chemistry, Bachelor of Science Program
in Computer Science, Bachelor of Science Program in Mathematics, and Bachelor
of Science Program in Physics, as detailed in Table 1.
Of particular note, the researchers found that in the Bachelor of Science
Program in Computer Science, many students were at high risk of dropping out, and
many were likely to graduate not on schedule. Data over the past decade, as shown
in Table 1, shows
that the number of students in such programs has a graduation rate of only 55.81%
(384 out of 688 students), and dropout students are the highest with 161 students,
representing 47.92% (161 out of 336 students). It highly emphasizes the importance
and necessity of developing a model to predict the dropout risk of students in the
Faculty of Science and Technology at Rajabhat Maha Sarakham University.
The model developed from the six supervised learning techniques showed that
the model constructed with the majority voting technique had the highest accuracy,
with an accuracy of 88.14% and an S.D. equal to 1.04, as compared in Table 3.
Moreover, the model was tested for performance by the cross-validation approach, the
confusion matrix technique, and four metrics, as detailed in Table 4. Therefore, it can
be concluded that this research was successful and deserves further dissemination.

6 Research Limitations

As for the limitations of this study, the researchers note that the data collection
took a long time and can be considered extensive; this breadth is, in fact, an advan-
tage of the research. However, accepted research results require support from stake-
holders and Rajabhat Maha Sarakham University administrators. The researchers
have great expectations that this research will be carried out and supported within
their organization and are encouraged to continue it in other ways.

Acknowledgements This research project was supported by the Thailand Science Research and
Innovation Fund and the University of Phayao (Grant No. FF66-UoE002). In addition, this research
was supported by many advisors, academics, researchers, students, and staff. The authors would
like to thank all of them for their support and collaboration in making this research possible.

Conflict of Interest The authors declare no conflict of interest.

References

1. Hernández-Blanco A, Herrera-Flores B, Tomás D, Navarro-Colorado B (2019) A systematic
review of deep learning approaches to educational data mining. Complexity 2019:e1306039.
https://doi.org/10.1155/2019/1306039
2. Aldowah H, Al-Samarraie H, Fauzy WM (2019) Educational data mining and learning analytics
for 21st century higher education: a review and synthesis. Telematics Inform 37:13–49. https://
doi.org/10.1016/j.tele.2019.01.007
3. Calvet Liñán L, Juan Pérez ÁA (2015) Educational data mining and learning analytics: differ-
ences, similarities, and time evolution. Int J Educ Technol High Educ 12:98–112. https://doi.
org/10.7238/rusc.v12i3.2515
4. Sahni J (2023) Is learning analytics the future of online education?: assessing student engage-
ment and academic performance in the online learning environment. Int J Emerg Technol Learn
(iJET) 18:33–49. https://doi.org/10.3991/ijet.v18i02.32167
5. Quadri AT, Shukor NA (2021) The benefits of learning analytics to higher education institutions:
a scoping review. Int J Emerg Technol Learn (iJET) 16:4–15. https://doi.org/10.3991/ijet.v16
i23.27471
6. Zhiyenbayeva N, Belyanova E, Petunina I, Dmitrichenkova S, Dolzhich E (2021) Personal-
ized computer support of performance rates and education process in high school: case study
of engineering students. Int J Eng Pedagogy (iJEP) 11:135–153. https://doi.org/10.3991/ijep.
v11i2.19451
7. Dabhade P, Agarwal R, Alameen KP, Fathima AT, Sridharan R, Gopakumar G (2021) Educa-
tional data mining for predicting students’ academic performance using machine learning
algorithms. Mater Today Proc 47:5260–5267. https://doi.org/10.1016/j.matpr.2021.05.646
8. Yu J (2021) Academic performance prediction method of online education using random forest
algorithm and artificial intelligence methods. Int J Emerg Technol Learn (iJET) 16:45–57.
https://doi.org/10.3991/ijet.v16i05.20297
9. Nayak P, Vaheed SK, Gupta S, Mohan N (2023) Predicting students’ academic performance
by mining the educational data through machine learning-based classification model. Educ Inf
Technol. https://doi.org/10.1007/s10639-023-11706-8
10. Seghroucheni YZ, Chekour M (2023) How learning styles can withstand the demands of mobile
learning environments? Int J Interact Mobile Technol (iJIM) 17:84–99. https://doi.org/10.3991/
ijim.v17i05.36403

11. Nuankaew P (2019) Dropout situation of business computer students, University of Phayao.
Int J Emerg Technol Learn 14:115–131. https://doi.org/10.3991/ijet.v14i19.11177
12. Iam-On N, Boongoen T (2017) Improved student dropout prediction in Thai University using
ensemble of mixed-type data clusterings. Int J Mach Learn Cyber 8:497–510. https://doi.org/
10.1007/s13042-015-0341-x
13. Shahiri AM, Husain W, Rashid NA (2015) A review on predicting student’s performance
using data mining techniques. Proc Comp Sci 72:414–422. https://doi.org/10.1016/j.procs.
2015.12.157
14. Nosseir A, Fathy Y (2020) A mobile application for early prediction of student performance
using fuzzy logic and artificial neural networks. Int J Interact Mobile Technol 14:4–18. https://
doi.org/10.3991/ijim.v14i02.10940
Harnessing Ridge Regression and SHAP
for Predicting Student Grades:
An Approach Towards Explainable AI
in Education

Vijay Katkar, Swapnil Kadam, Juber Mulla, and Niyaj Nadaf

Abstract This paper presents a comparative analysis of several regression techniques
for predicting student academic performance, with a particular focus on
Ridge Regression and its interpretability through SHAP, a method of Explain-
able Artificial Intelligence (XAI). The models evaluated include Linear Regression,
Ridge Regression, Lasso Regression, Elastic Net, Decision Tree Regression, Random
Forest Regression, AdaBoost, Gradient Boosting, Bagging, XGBoost, and K-Nearest
Neighbors. Using a dataset encompassing student family background, personal infor-
mation, and recent academic grades, we trained and evaluated these models to predict
future academic performance. Experimental results demonstrated that Ridge
Regression outperformed all other models in predictive accuracy with the highest R2 score
of 0.9036 and lowest RMSE of 1.4623. Furthermore, we employed SHAP values
to interpret the predictions made by the Ridge Regression model, revealing key
feature contributions to the model’s output. This research underscores the poten-
tial of Ridge Regression and SHAP in building predictive and interpretable models
in the educational domain. This fusion of predictive accuracy and interpretability
provides invaluable insights for educators and policy-makers alike in understanding
and enhancing student academic achievement.

Keywords Predictive modeling · Student academic performance · Machine
learning · Ridge regression · Explainable AI · SHAP analysis

1 Introduction

In the field of educational research, accurately predicting student academic
performance is a topic of immense importance. The ability to predict student outcomes
can provide valuable insights for educators, allowing them to target interventions

V. Katkar (B) · S. Kadam · J. Mulla · N. Nadaf
Department of Computer Science and Engineering, Annasaheb Dange College of Engineering and
Technology, Sangli, Maharashtra, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 341
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_28

more effectively and ultimately enhancing educational achievement. Over the years,
researchers have used statistical techniques to predict student performance with
varying degrees of success.
The advent of machine learning (ML) has provided an opportunity to enhance
prediction accuracy and gain new insights into this critical issue. A range of ML tech-
niques have been applied to educational data, from simple models like Linear Regres-
sion to more complex ones like Random Forests and Boosting techniques. While
these models have shown promise, they vary greatly in their predictive performance
and, more importantly, their interpretability [1, 2].
The ability to interpret ML models, known as Explainable AI (XAI), has become
increasingly important. As ML models are deployed in more contexts, stakeholders
need to understand the reasoning behind the model’s predictions [3, 4]. This is espe-
cially true in education, where interventions based on model predictions have direct
impacts on students’ lives. SHapley Additive exPlanations (SHAP) is one of the
methods used to interpret complex models, offering a way to attribute the contribution
of each feature to the prediction [5, 6].
This research aims to not only investigate the predictive accuracy of several ML
models for student performance but also to explore their interpretability using SHAP.
This dual focus on predictive power and interpretability fills a crucial gap in the
existing literature. By applying a variety of regression techniques and using SHAP
to interpret the most successful one, we aim to provide a comprehensive view of
student performance prediction that is both highly accurate and readily interpretable.
This work will contribute to the existing body of knowledge by presenting a
comparative analysis of the accuracy of several regression models, highlighting
the efficacy of Ridge Regression in this context. Furthermore, it will shed light on
the applicability of SHAP in interpreting Ridge Regression, thereby enhancing our
understanding of the most influential factors affecting student performance. These
findings will not only advance academic understanding in this field but also provide
educators and policy-makers with valuable, interpretable insights for improving
student outcomes.
The primary objectives of this research are as follows:
1. Investigate various machine learning regression models: The study aims to
explore the efficacy of multiple regression techniques (including Linear Regres-
sion, Ridge Regression, Lasso Regression, Elastic Net, Decision Tree Regression,
Random Forest Regression, AdaBoost, Gradient Boosting, Bagging, XGBoost,
and K-Nearest Neighbors) for predicting student academic performance.
2. Identify the most accurate model: Based on the comparative analysis of the
aforementioned models, the objective is to ascertain the model that provides the
highest predictive accuracy for student academic performance.
3. Apply Explainable AI using SHAP to the most accurate model: This study seeks
to use SHAP to interpret the predictions of the model identified as the most
accurate.
Section 2 of this paper offers a comprehensive literature review that grounds our
study within the existing body of work. In Sect. 3, we outline our methodology,

detailing the data, machine learning models used, and the application of SHAP. Our
findings are then presented in Sect. 4 (Results), which leads to a deeper interpretation
of these results in Sect. 5. Finally, Sect. 6 concludes the paper with a summary of
the key findings.

2 Literature Review

The prediction of academic performance has been a topic of research interest for
several decades, given its profound implications for educational policy, instruction,
and learning. Traditional approaches have relied heavily on statistical methods, using
factors such as students’ previous academic records, socioeconomic status, and other
personal characteristics as predictors [7, 8].
However, the emergence of machine learning (ML) methods has marked a
paradigm shift in this field. ML offers more sophisticated, non-linear modeling capa-
bilities that can capture the intricate relationships between predictors and student
performance. Numerous studies have begun to explore various ML techniques to
predict academic performance.
For instance, Kotsiantis et al. [9] applied various ML techniques, including Deci-
sion Trees, Random Forests, and Support Vector Machines (SVM), to predict student
grades in a distance learning context. They found that SVMs provided the highest
accuracy among the models tested. Similarly, Márquez-Vera et al. [10] used several
ML techniques to predict student dropouts, finding that Decision Trees and Random
Forests performed best in their study.
The application of ensemble methods, such as AdaBoost and Gradient Boosting,
has also been explored. Cortez and Silva [11] used AdaBoost for predicting student
performance and found it to be highly effective. Moreover, ensemble methods like
XGBoost have shown impressive performance in predicting student outcomes [12].
While these studies highlight the promising role of ML in predicting academic
performance, there has been less focus on the interpretability of these models.
However, this is starting to change with the advent of Explainable AI (XAI), which
seeks to make the reasoning behind ML predictions understandable to human users
[13]. The application of XAI in education is still a nascent field and constitutes a
significant gap in the literature, which our study aims to address.

3 Methodology

Figure 1 depicts the research methodology employed in this paper. The steps involved
are dataset preprocessing, feature selection, model training, and model evaluation,
and the success of the predictive model depends on each of them. Several preprocessing
steps were performed on the data before any machine learning techniques were
applied. During this stage, a Label Encoder was used to translate ordinal attributes

Fig. 1 Proposed methodology

into numeric attributes, and a Standard Scaler was used to standardize numeric vari-
ables like test scores. These measures guaranteed that our models could efficiently
assimilate new information.
After the data was cleaned and organized, we applied a feature selection procedure
to reduce the original 32 features. Each attribute was analyzed for its correlation
with the dependent variable, the students' final grade, and the 15 attributes with
the highest absolute correlation with the final grade were retained.
To predict students' final grades, we employed several regression models. Different
types of regression models, such as Linear Regression, Ridge Regression, Lasso
Regression, Elastic Net Regression, Decision Tree Regression, Random Forest
Regression, AdaBoost Regression, Gradient Boost Regression, Bagging Regres-
sion, XGBoost Regression, and KNN Regression, were considered. We compared
the results of many models, each of which takes a somewhat different approach to
learning from data, to determine which one is most suited to accomplish our goal.
Finally, we used SHAP (SHapley Additive exPlanations), a method in Explainable
AI, to learn more about the prediction performance of our final model. With this
strategy, we could analyze how each feature affected the model’s forecast. Detailed
description of the methodology is given below.

3.1 Dataset

The dataset [14] comprises student achievement in secondary education of two
Portuguese schools. It contains a total of 32 attributes for each student, including both
school-related and personal characteristics. School-related features include course
subject (Mathematics or Portuguese), number of absences, test scores, etc. Personal
and social attributes cover aspects such as family size, parents’ education level,

weekly study time, and other factors that could potentially affect student perfor-
mance. The target variable, which the models are trying to predict, is the final grade
of the student, represented as a numerical value.

3.2 Data Preprocessing

Given the diverse nature of the dataset, careful preprocessing was essential to
ensure that our machine learning models could accurately interpret the data. The
preprocessing methods used were as follows:
1. Encoding Ordinal Attributes
2. Standardizing Numeric Attributes
3. Feature Selection Using Correlation
Encoding Ordinal Attributes The dataset contained several ordinal attributes,
such as family educational background, which required conversion to a format
suitable for our models. To accomplish this, we employed a Label Encoder. This
technique assigns a unique numeric value to each category within an attribute. It
effectively translates ordinal data into a format that our models can interpret while
preserving the ordered nature of the categories. This step was crucial, as many
machine learning algorithms require numeric input.
Standardizing Numeric Attributes The dataset also included numeric attributes,
such as recent academic grades. Given that these attributes can vary in range and
scale, we opted to standardize them using the Standard Scaler technique. Standard
Scaler adjusts the values of each numeric attribute to have a mean of 0 and a standard
deviation of 1. This process is crucial as it brings all numeric attributes onto the same
scale, preventing attributes with larger scales from dominating those with smaller
scales. It also helps algorithms converge faster during training.
Feature Selection Using Correlation With the objective of creating the most
effective predictive model, we sought to reduce the number of features from the
original 32. To accomplish this, we applied a correlation-based feature selection
method. Correlation measures the linear relationship between two variables. In this
case, we calculated the correlation between each feature and the target variable, i.e.,
the final student score. The result is depicted in Fig. 2.
From these calculations, we selected the top 15 attributes that had the highest
absolute correlation with the final student score. This step was critical as it enabled
us to focus on the most relevant predictors and exclude features that contributed
less to our target variable, thereby improving the efficiency and performance of our
machine learning models. Reducing the dimensionality of the dataset in this way
also helped to alleviate potential issues related to overfitting and multicollinearity.
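The selection rule described above — rank all attributes by the absolute value of their Pearson correlation with the final grade and keep the best k — can be sketched in a few lines. The helper names (`pearson`, `top_k_features`) and the toy feature columns are illustrative, not taken from the study:

```python
from math import sqrt

def pearson(x, y):
    # Sample Pearson correlation coefficient between two equal-length columns.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_k_features(features, target, k):
    # Rank features by |correlation| with the target and keep the k best.
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

# Toy data: G2 tracks the final grade closely, absences weakly, noise barely
features = {
    "G2":       [10, 12, 14, 16, 18],
    "absences": [4, 1, 3, 2, 0],
    "noise":    [5, 5, 5, 6, 5],
}
final_grade = [11, 12, 15, 16, 19]
print(top_k_features(features, final_grade, k=2))   # → ['G2', 'absences']
```

The study applies the same idea with k = 15 over the 32 original attributes.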
Following these preprocessing steps, the dataset was appropriately formatted and
ready for model training. This preparation was critical to ensure the success of the
subsequent model selection and evaluation process.

Fig. 2 Correlation between attributes

3.3 Regression Models Used for Experimentation

In order to predict the final grade of students, a variety of regression models were
selected, each of which uses different strategies to learn from the data. The following
subsections provide a brief description of each model.
Linear Regression is one of the most fundamental methods of predictive analysis.
Regression analysis addresses two broad questions:
• Does a given collection of predictor variables effectively predict a dependent
variable?
• Which specific predictor variables matter, and how do they affect the outcome
variable?
Ridge Regression: When the independent variables in a multiple regression
model are highly correlated, ridge regression can be used to estimate the coeffi-
cients of the model. In certain cases where the least squares approach would just fail,
ridge regression yields reliable results.
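For intuition, the one-predictor case has a simple closed form that shows how the L2 penalty α shrinks the coefficient. This is a hedged illustration of the mechanism, not the paper's multi-feature setup, and the grade values are made up:

```python
def ridge_fit_1d(x, y, alpha):
    # Closed-form ridge for a single centered predictor:
    #   beta = <x_c, y_c> / (<x_c, x_c> + alpha)
    # The penalty alpha in the denominator shrinks beta toward zero, which
    # is what keeps the estimate stable where plain least squares becomes
    # ill-conditioned.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    beta = sum(a * b for a, b in zip(xc, yc)) / (sum(a * a for a in xc) + alpha)
    intercept = my - beta * mx   # the intercept itself is not penalized
    return beta, intercept

x = [10, 12, 14, 16, 18]   # hypothetical prior-grade feature
y = [11, 12, 15, 16, 19]   # hypothetical final grades
for alpha in (0.0, 1.0, 100.0):
    beta, _ = ridge_fit_1d(x, y, alpha)
    print(alpha, round(beta, 4))   # beta shrinks as alpha grows
```

With α = 0 this reduces to ordinary least squares; larger α trades a little bias for much lower variance.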
In Lasso Regression, L1 regularization is carried out, which imposes a penalty
proportional to the absolute size of the coefficients. Sparse models with minimal
coefficients are produced by this method of regularization, which allows for efficient
feature selection.
In Elastic Net Regression, characteristics of the Ridge and Lasso regression
models are combined. The model is penalized by making use of both the L2-norm

and the L1-norm in order to obtain the regularization properties that Ridge is known
for. Because of this, the model could end up having zeroes for its coefficients (just
like Lasso).
Decision Tree Regression: The Decision Tree Regression algorithm separates the
data into groups using a sequence of if–then rules. The tree is built by an algorithm
that repeatedly chooses the split of the dataset that most effectively separates the
target values.
Random Forest Regression is an ensemble learning approach that builds many
decision trees during training; the final prediction is the average of the individual
trees' predictions. Averaging over many trees curbs the overfitting to which single
decision trees are prone.
AdaBoost Regression is based on the statistical meta-algorithm developed by
Yoav Freund and Robert Schapire, originally formulated for classification. It is
compatible with a broad variety of learning algorithms and boosts their performance
by iteratively reweighting the training examples. Gradient Boost Regression likewise
generates a prediction model as an ensemble of weak prediction models, typically
decision trees.
Bagging Regression improves accuracy and reduces variance by training multiple
copies of a model on bootstrap samples (random subsets drawn with replacement
from the same data) and averaging their predictions.
The XGBoost Regression package is an optimized distributed gradient boosting
framework designed for efficiency, flexibility, and portability.
KNN Regression estimates an outcome from the similarity between data points:
it finds the training samples most comparable to the new data point and bases the
prediction on their target values.

3.4 Model Training and Evaluation

Following the data preprocessing and feature selection steps, we proceeded to the
training phase, where each model was trained and evaluated. We utilized a 70–30
split for our dataset, where 70% of the data was used for training our models, and
the remaining 30% was used for testing their performance. This split helps ensure
that our models were able to generalize well to unseen data and were not just fitting
to the specific patterns in the training data.
For the evaluation of our models, we used a combination of metrics, since no
single measure captures every aspect of a model's performance. The following
metrics were utilized:

R2 Score: The coefficient of determination is a statistical measure of how much of
the variation in the dependent variable can be explained by the independent variables.
It indicates how closely the model's predictions match the observed values; R2 values
closer to 1 indicate that the model provides a more exact fit to the data.
Mean Absolute Error (MAE) is a statistical measure of the average magnitude of
errors over a set of predictions, without regard to their direction. The absolute
differences between predictions and observations are summed over the test sample,
with each difference given equal weight.
Mean Squared Error (MSE) measures the quality of an estimator; it is non-negative,
and values closer to zero are better. Because it is based on the squared differences,
it is more sensitive to large errors than the mean absolute error.
Root Mean Squared Error (RMSE) is the square root of the mean squared error
and expresses the typical size of a prediction error in the units of the target variable.
It indicates how well the data match the fitted model.
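All four metrics follow directly from the residuals. The following self-contained sketch (with made-up grade values) computes them the standard way:

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    # Compute R2, MAE, MSE, and RMSE from true and predicted values.
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = sqrt(mse)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}

y_true = [12, 14, 10, 16, 18]   # hypothetical final grades
y_pred = [11, 15, 10, 15, 19]   # hypothetical model predictions
print(regression_metrics(y_true, y_pred))
# → R2 = 0.9, MAE = 0.8, MSE = 0.8, RMSE ≈ 0.894
```

Note that RMSE is always the square root of MSE, so the two metrics always rank models identically; MAE and R2 can disagree with them when error magnitudes are unevenly distributed.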
Using these metrics allowed us to comprehensively evaluate the performance of
our models and make a more informed decision about which model performs the
best on our task of predicting student grades. The model with the best performance
across these metrics was then selected as our final model.

4 Experimental Results

The experiments were conducted using the Python programming language on a
machine with an i5 processor. The dataset underwent a process of data preprocessing,
following which many regression models were trained. The analysis and visualization
of the trained models’ performance is depicted in Figs. 3, 4, 5, and 6.
The R2 score for each training model is displayed graphically as a bar chart in
Fig. 3. The graph shows that the Ridge model generates the best results, with an R2
score of 0.9036, whilst the Elastic Net and Gradient Boosting regression models get
the lowest scores.
The MAE for each training model is displayed graphically as a bar chart in Fig. 4.
The graph shows that the Ridge model generates the best results, with the lowest MAE
score of 0.9113, whilst the Elastic Net and Gradient Boosting regression models get
the highest scores: 1.2999 and 1.4738, respectively.
The MSE for each training model is displayed graphically as a bar chart in Fig. 5.
The graph shows that the Ridge model generates the best results, with the lowest MSE
of 2.1382, whilst the Elastic Net and Gradient Boosting regression models get
the highest scores: 5.3272 and 5.8208, respectively.

Fig. 3 R2 score of different regression models

Fig. 4 MAE of different regression models

The RMSE for each training model is displayed graphically as a bar chart in
Fig. 6. The graph shows that the Ridge model generates the best results, with the
lowest RMSE of 1.4623, whilst the Elastic Net and Gradient Boosting regression
models get the highest scores: 2.3081 and 2.4126, respectively.
It is evident from Figs. 3, 4, 5 and 6 that the performance of Ridge Regression
surpassed that of the other models over a wide range of statistical measures. Based
on the obtained results, it appears that Ridge Regression exhibits favorable qualities
as a potential contender for the prediction task.

Fig. 5 MSE score of different regression models

Fig. 6 RMSE score of different regression models

5 Explainable AI

In this research, we chose the SHAP model as our XAI model so that we could
interpret the outcomes of our regression model. SHAP offers a solid framework for
figuring out how different model characteristics influence the output of the model.
By utilizing SHAP, we are able to gain a better understanding of how each feature
contributes to the predictions made by the model, which in turn increases the read-
ability and transparency of the model. The SHAP visualizations that were produced
as a result provide us with insight into the decision-making process of the model and
offer an intuitive picture of the relevance of features. By making use of SHAP, we

may be confident that our model is both reliable and explainable, which enables
us to derive credible conclusions from the data.
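For a linear model such as Ridge Regression, SHAP values have an exact closed form under the feature-independence assumption: the contribution of feature i is coef_i · (x_i − E[x_i]), and these contributions plus the base value E[f(X)] sum to the prediction f(x). A small sketch with hypothetical coefficients and rows (not the study's fitted model):

```python
def linear_shap_values(coefs, intercept, x, background):
    # Exact SHAP values for a linear model, assuming independent features:
    #   phi_i = coef_i * (x_i - mean_i)
    # Together with the base value E[f(X)], they sum to the prediction f(x).
    means = [sum(col) / len(col) for col in zip(*background)]
    base = intercept + sum(c * m for c, m in zip(coefs, means))
    phis = [c * (xi - m) for c, xi, m in zip(coefs, x, means)]
    return base, phis

coefs = [0.9, -0.5]                       # hypothetical Ridge coefficients
intercept = 1.0
background = [[10, 1], [14, 3], [12, 2]]  # rows used to estimate E[x]
x = [15, 1]                               # the instance being explained

base, phis = linear_shap_values(coefs, intercept, x, background)
prediction = intercept + sum(c * xi for c, xi in zip(coefs, x))
print(base, phis)
print(base + sum(phis) == prediction)     # additivity: contributions recover f(x)
```

This additivity is exactly the f(x) versus E[f(x)] decomposition that the SHAP plots visualize, with each bar corresponding to one phi_i.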
Figures 7 and 8 illustrate the impact of individual features on the model's prediction
for a given instance x. Bars extending to the right of the expected value E[f(x)]
indicate features that push the model's prediction upward; bars extending to the
left indicate features that pull it downward. In each plot, the features are ordered
from top to bottom by decreasing importance to the prediction for x, so features
nearer the plot's upper edge are more influential than those nearer its lower edge.
The contribution of individual attributes to the model prediction for the given
instance x is illustrated in Fig. 7. E[f(x)] equals 10.95, the expected average model
prediction over the dataset, while f(x) equals 12.242, the model's prediction for the
instance x in question. The difference of 1.292 between f(x) and E[f(x)] shows that
the model's forecast for x departs markedly from the prediction expected over the
dataset as a whole. Closer inspection reveals that the attributes Dalc and higher
contribute most to this difference, pushing the model forecast above the expected
output.

Fig. 7 SHAP explanation of the Ridge Regression prediction for a sample instance



Fig. 8 SHAP explanation of the Ridge Regression prediction for a second sample instance

The contribution of individual attributes to the model prediction for the given
instance x is illustrated in Fig. 8. E[f(x)] equals 10.394, the expected average model
prediction over the dataset, while f(x) equals 10.706, the model's prediction for the
instance x in question. The difference of 0.312 between f(x) and E[f(x)] shows that
the model's forecast for x departs only slightly from the prediction expected over
the dataset as a whole. Closer inspection reveals that the attributes Dalc and Fedu
contribute most to this difference, pushing the model forecast above the expected
output.

6 Conclusion

This study set out to predict student grades using various personal, family, and
academic attributes. Through an extensive exploration of multiple regression models,
we concluded that Ridge Regression provided the most accurate and robust predic-
tions. Notably, this study underscored the influence of specific features in predicting
academic performance, and it demonstrates the effectiveness of using machine
learning techniques in educational research.
The results of this study have practical implications for educators and policy-
makers, who may use such predictive models to identify students at risk of poor

academic performance early on, thus allowing timely intervention. Furthermore, the
use of Explainable AI, particularly SHAP, provided us with a deep and intuitive
understanding of the predictions made by our model, which will be invaluable in
translating these findings into actionable strategies.
Future research could expand on this study by exploring more complex models
or by integrating time-series data to study how students’ performance evolves over
time. Other directions for future research could include a detailed analysis of the
most influential features in predicting student performance, as understanding these
factors can guide interventions aimed at improving academic outcomes.

References

1. Yagcı M (2022) Educational data mining: prediction of students’ academic performance using
machine learning algorithms. Smart Learn Environ 9(11). https://doi.org/10.1186/s40561-022-
00192-z
2. Rastrollo-Guerrero JL, Gómez-Pulido JA, Durán-Domínguez A (2020) Analyzing and
predicting students’ performance by means of machine learning: a review. Appl Sci 10(3).
https://doi.org/10.3390/app10031042
3. Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, Guidotti
R, Del Ser J, Díaz-Rodríguez N, Herrera F (2023) Explainable artificial intelligence (XAI):
what we know and what is left to attain trustworthy artificial intelligence. Inform Fusion 99.
https://doi.org/10.1016/j.inffus.2023.101805
4. Cambria E, Malandri L, Mercorio F, Mezzanzanica M, Nobani N (2023) A survey on XAI
and natural language explanations. Inform Proces Manage 60(1). https://doi.org/10.1016/j.
ipm.2022.103111
5. Li Z (2022) Extracting spatial effects from machine learning model using local interpretation
method: an example of SHAP and XGBoost. Comput Environ Urban Syst (96). https://doi.org/
10.1016/j.compenvurbsys.2022.101845
6. Alabdullah AA, Iqbal M, Zahid M, Khan K, Amin MN, Jalal FE (2022) Prediction of rapid
chloride penetration resistance of metakaolin based high strength concrete using light GBM
and XGBoost models by incorporating SHAP analysis. Constr Build Mater 345. https://doi.
org/10.1016/j.conbuildmat.2022.128296
7. Jovanović J, Saqr M, Joksimović S, Gašević D (2021) Students matter the most in learning
analytics: the effects of internal and instructional conditions in predicting academic success.
Comput Educ 172. https://doi.org/10.1016/j.compedu.2021.104251
8. Namoun A, Alshanqiti A (2021) Predicting student performance using data mining and learning
analytics techniques: a systematic literature review. Appl Sci 11. https://doi.org/10.3390/app
11010237
9. Kotsiantis S, Pierrakeas C, Pintelas P (2004) Predicting students’ performance in distance
learning using machine learning techniques. Appl Artif Intell 18(5):411–426. https://doi.org/
10.1080/08839510490442058
10. Márquez-Vera C, Cano A, Romero C, Noaman AYM, Mousa Fardoun H, Ventura S (2016)
Early dropout prediction using data mining: a case study with high school students. Expert
Syst 33(1):107–124. https://doi.org/10.1111/exsy.12135
11. Cortez P, Silva AMG (2008) Using data mining to predict secondary school student perfor-
mance. In: Brito A, Teixeira J (eds) Proceedings of 5th future business technology conference,
Porto, Portugal, pp 5–12. hdl.handle.net/1822/8024
12. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp
785–794. https://doi.org/10.1145/2939672.2939785

13. Allgaier J, Mulansky L, Draelos RL, Pryss R (2023) How does the model make predictions?
A systematic literature review on the explainability power of machine learning in healthcare.
Artif Intell Med 143. https://doi.org/10.1016/j.artmed.2023.102616
14. Cortez P, Silva A (2008) Using data mining to predict secondary school student performance.
In: Brito A, Teixeira J (eds) Proceedings of 5th FUture BUsiness TEChnology conference
(FUBUTEC 2008), Porto, Portugal, EUROSIS, pp 5–12. ISBN 978-9077381-39-7
Applications
Convolutional Neural-Network-based
Gesture Recognition System for Air
Writing for Disabled Person

Soham Kr Modi, Manish Kumar, Sanjay Singla, Charnpreet Kaur,
Tulika Mitra, and Arnab Deb

Abstract Air writing is a unique form of natural user interface that involves the
recognition of characters and words that are written in the air using the movement of
one’s hands. This technology has become increasingly prominent and has received
considerable attention due to its potential to facilitate more natural and intuitive
forms of communication, as well as its applicability to a wide range of fields such
as virtual reality, augmented reality, and wearable computing. However, air-writing
recognition remains a challenging task due to the complexity and variability of the
gestures involved. This research paper proposes an air-writing recognition model
that leverages machine learning algorithms to recognize handwritten characters and
words in real time. The model is designed to be flexible and adaptable to different
types of air-writing gestures and is evaluated using a dataset of air-writing gestures
collected from multiple users. The proposed model consists of two main components:
a gesture recognition module that pre-processes the input data and extracts relevant
features, and a machine learning model that classifies the input gestures based on these
features. Experimental results show that the proposed model achieves high levels of
accuracy in recognizing air-writing gestures, outperforming existing state-of-the-art
methods. The results demonstrate the potential of
the proposed model to be used in a variety of real-world applications, such as text
input, and controlling virtual objects in augmented reality.

Keywords Air-writing recognition · Image recognition · Convolutional neural networks · Depth analysis · Image classification · Gesture analysis · Deep learning · Handwriting recognition

S. K. Modi · M. Kumar · S. Singla (B) · C. Kaur · T. Mitra · A. Deb


Department of Computer Science and Engineering, Chandigarh University, NH-95
Chandigarh-Ludhiana Highway, Mohali, Punjab 140413, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 357
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_29

1 Introduction

Gesture-based communication and control of electronic devices have become increasingly popular with the rise of smart homes and the Internet of Things (IoT).
As these technologies continue to evolve, they have enabled the development of
various user interfaces, including touchscreens, voice recognition, and gesture recog-
nition. Of these, gesture recognition is becoming increasingly popular due to its
non-invasive and intuitive nature. Gesture recognition is the process of identifying
and interpreting human movements and gestures for controlling devices or commu-
nicating with machines [1]. This technology captures, processes, and analyzes data
from various sensors, such as cameras and accelerometers, to detect and recognize hand
movements and gestures. The technology has a wide range of applications, including
gaming, VR [2], robotics, and healthcare. Air-writing recognition is an emerging area
of gesture recognition that involves detecting and recognizing hand movements and
gestures in the air to input text or commands. Air-writing recognition offers a new
level of interactivity and control, enabling users to write or draw in the air and
convert it into digital text or commands. This technology has a lot of potential in
various fields, including health care, education, and smart homes. To develop an air-
writing recognition system, several challenges need to be addressed, including accu-
rate hand tracking and gesture recognition, real-time processing, and user-friendly
interfaces. The proposed method aims to address these challenges and provide a
robust and efficient air-writing recognition system. The model involves capturing
and processing data from various sensors [3], including cameras and accelerom-
eters. Machine learning algorithms are then used to recognize and interpret hand
movements and gestures. The system’s interface is intended to be user-friendly and
intuitive, allowing users to easily write or draw in the air [4]. The proposed air-writing
recognition framework has several potential applications. One of the primary appli-
cations is in healthcare for patients with motor impairments. Air-writing recognition
technology can be used to help patients with disabilities to communicate more effi-
ciently and effectively. This technology can also be used in education for interactive
learning. Air-writing recognition can help students learn by engaging them in inter-
active activities and providing real-time feedback. Finally, air-writing recognition
technology can be used in smart homes for controlling electronic devices. This tech-
nology provides a new level of convenience and control, enabling users to control
their devices using hand gestures.
Air-writing recognition technology has the potential to revolutionize the way
we interact with technology. It offers a non-invasive and intuitive way of
controlling electronic devices and communicating with machines. This technology
has significant potential in various fields, including health care, education, and smart
homes. As the technology matures, we can anticipate additional inventive
applications of air-writing recognition in the future. Air-writing
recognition technology is an emerging area of gesture recognition that has significant
potential in various fields [5]. The proposed model aims to address the challenges
associated with air writing recognition and provide a robust and efficient air writing

recognition system. The technology has significant potential in healthcare, education, and smart homes and could transform the way we interact with machines; as it progresses, we can anticipate more ground-breaking applications in the future.

1.1 Motivation

The motivation behind our research is twofold. First, air writing recognition can
enable new modes of interaction with digital content, such as writing in the air to
enter text or draw shapes in 3D space [1, 4]. This can enhance the user experience in
virtual and augmented reality environments, as well as in remote collaboration and
teleoperation scenarios. Second, air writing recognition can provide a natural and
intuitive input method for people with disabilities or injuries that limit their ability to
use traditional input devices. To achieve accurate and robust air writing recognition,
a system is proposed that combines hand tracking, gesture segmentation, feature
extraction, and classification using convolutional neural networks. A large dataset of
air-writing gestures from diverse users and environments was collected, and the system
was evaluated on several metrics, including recognition accuracy, latency, and user
satisfaction. The results demonstrate the feasibility and effectiveness of this approach,
and suggest directions for future research, such as improving real-time performance,
adapting to individual users, and integrating with other input modalities. Overall,
this research contributes to the field of human–computer interaction by introducing
a new input method that can expand the range of applications and improve accessi-
bility for diverse users. The proposed approach can also inspire further innovations
in machine learning, computer vision, and sensor technology for gesture recognition
and natural user interfaces.

1.2 Relevant Contemporary Issue

With the growth in deep learning techniques, researchers are exploring the use of
deep neural networks for recognizing air writing [6, 7]. Current research focuses
on developing more accurate and robust algorithms that can recognize air writing
in real time. As air writing recognition technology becomes more sophisticated,
concerns are being raised about the privacy and security implications of this tech-
nology. Researchers are studying the ethical and legal implications of air writing
recognition and developing frameworks to address these concerns. Researchers are
exploring how air writing recognition technology can be made more usable and
accessible for people with disabilities, such as those with motor impairments or visual
impairments. They are developing interfaces and applications that are designed to
be more accessible and user-friendly.

1.3 Identification of Problem and Measures

The central problem is the lack of accurate and reliable algorithms for real-time
air writing recognition. Although there have been significant developments in the domain
of air writing recognition, the accuracy and speed of recognition remain a chal-
lenge. Current algorithms are not able to accurately recognize the subtle movements
involved in air writing, and this limits the potential of this technology for applications
in fields such as healthcare, education, and human–computer interaction. Addition-
ally, there are concerns around the privacy and security implications of air writing
recognition, which must be tackled before air writing recognition technology can
be extensively adopted. Hence, additional investigation is necessary to enhance the
precision and dependability of recognition algorithms. Additionally, ethical and legal
considerations associated with air writing recognition need to be examined. A further
task is to evaluate the current state of the art in air writing recognition technology. This involves
reviewing and analyzing existing literature and research papers on the topic, as well
as exploring the different algorithms and techniques used for air writing recognition
[1–4, 6–16]. The paper should also identify the limitations and challenges of current
air writing recognition technology and propose potential solutions for improving
accuracy and speed of recognition. Finally, the paper should consider the ethical
and legal implications of air writing recognition, including issues related to privacy
and security, and propose frameworks for addressing these concerns. Overall, the
task of the research paper is to provide a comprehensive and critical analysis of the
current state of air writing recognition technology, as well as its potential for future
development and application.

1.4 Related Work

In the field of air writing recognition, the industry has focused on developing and
evaluating various algorithms and techniques for recognizing hand gestures in real
time. One popular approach for recognizing hand gestures is to use convolutional
neural networks (CNNs), which have demonstrated success in numerous computer
vision tasks, such as classifying images and
detecting objects, and have shown promise for recognizing air writing gestures as
well.
Several notable research papers have contributed to advancing the understanding
and techniques in this domain. Kumar et al. (2017) proposed a paper titled “3D
Text Segmentation and Recognition Using Leap Motion,” [1] where they focused
on segmenting and recognizing 3D text using the Leap Motion device. Their work
emphasized the use of depth data for accurate text recognition. Fu et al. (2019)
presented a paper titled “Writing in the Air with Wi-Fi Signals for Virtual Reality
Devices,” [2] where they explored the utilization of Wi-Fi signals for air writing
in virtual reality environments. Their approach leveraged Wi-Fi signals to capture

hand motions and recognize air-drawn characters. Chen et al. (2016) contributed
to the field through their paper titled “Air-Writing Recognition—Part II: Detection
and Recognition of Writing Activity in Continuous Stream of Motion Data,” [17]
which focused on detecting and recognizing writing activities from a continuous
stream of motion data. Their work introduced techniques for accurately detecting
and recognizing air writing gestures. Collectively, these papers have made significant
contributions to air writing recognition by exploring various aspects such as 3D text
segmentation, Wi-Fi signal utilization, and continuous motion data analysis.
In the paper titled “Air-Writing with Sparse Network of Radars using Spatio-
Temporal Learning,” Arsalan et al. (2020) present a novel approach to air-writing
recognition that utilizes a sparse network of radars and spatiotemporal learning tech-
niques [18]. The study addresses the challenges of traditional radar-based air-writing
systems by proposing a solution that overcomes the limitations of trilateration algo-
rithms and the physical constraints of placing multiple radars. The authors employ
spatiotemporal learning to capture the temporal dynamics of air-written gestures,
improving the accuracy of recognition. Experimental results presented at the 25th
International Conference on Pattern Recognition (ICPR) demonstrate the effective-
ness of their approach, showcasing promising results in terms of recognition perfor-
mance. The work by Arsalan et al. contributes to the advancement of air-writing
recognition systems by introducing a novel technique that combines sparse radar
networks and spatiotemporal learning, providing insights for more accurate and
robust recognition of air-written gestures [18].
In the field of air-writing recognition, Escopete et al. (2021) proposed a research
paper titled “Recognition of English Capital Alphabet in Air Writing Using Convo-
lutional Neural Network and Intel RealSense D435 Depth Camera.” [19] Their study
focused on leveraging the capabilities of the Intel RealSense D435 depth camera and
Convolutional Neural Networks (CNNs) to accurately recognize air-written English
capital alphabet characters. The authors collected a dataset of air-written characters
using the depth camera and utilized CNN models for feature extraction and classifica-
tion. The results of their experiments demonstrated the effectiveness of their approach
in achieving high recognition accuracy. The paper by Escopete et al. contributes to
the advancement of air-writing recognition systems by highlighting the potential of
depth camera technology and CNNs for accurate recognition of air-written gestures
[19].
In the paper titled “Wearable Air-Writing Recognition System Employing
Dynamic Time Warping,” Luo et al. (2021) [20] propose a novel approach for air-
writing recognition using wearable devices and Dynamic Time Warping (DTW) algo-
rithms. The study focuses on addressing the challenges of recognizing air-written
gestures in a wearable context, where limited sensor data and diverse writing styles
can affect the accuracy of recognition. The authors introduce a system that leverages
DTW, a technique capable of capturing the temporal dynamics and variabilities in
air-writing gestures. The proposed system utilizes wearable devices to capture hand
movements and employs DTW algorithms for recognizing the intended characters.
Experimental results presented at the IEEE 18th Annual Consumer Communica-
tions and Networking Conference (CCNC) demonstrate the effectiveness of their

approach, showcasing promising results in terms of recognition accuracy. The work


by Luo et al. contributes to the field of air-writing recognition by proposing a wear-
able system that employs DTW algorithms, providing insights for improving the
accuracy of recognizing air-written gestures in wearable applications [20].
Fang et al. (2020) presented a paper titled “Writing in the Air: Recognize Letters
Using Deep Learning Through Wi-Fi Signals,” [7] where they explored the use
of deep learning techniques and Wi-Fi signals to recognize air-drawn letters. Their
approach leveraged deep learning algorithms to capture and analyze Wi-Fi signals for
accurate letter recognition. These papers contribute to the field by introducing novel
deep learning approaches, such as Xception architecture and Wi-Fi signal-based
recognition, for improving the accuracy and efficiency of air writing recognition
systems.
Other contributions to the development of novel techniques and frameworks
include a paper presented by Chen et al. (2020) titled “Air Writing via Receiver
Array-Based Ultrasonic Source Localization,” [8] where they proposed a method
that utilizes receiver array-based ultrasonic source localization for air writing recog-
nition. Their approach focused on accurately localizing the position of the hand
using ultrasonic signals, enabling precise recognition of air-drawn characters. Choud-
hury and Sarma (2021) introduced a paper titled “A CNN-LSTM Based Ensemble
Framework for In-Air Handwritten Assamese Character Recognition,” [9] which
presented a framework that combined Convolutional Neural Networks (CNNs) and
Long Short-Term Memory (LSTM) networks for recognizing in-air handwritten
Assamese characters. Their ensemble approach leveraged the strengths of CNNs and
LSTMs to improve the accuracy of character recognition. These papers contribute to
the field by introducing innovative techniques such as ultrasonic source localization
and ensemble frameworks using CNNs and LSTMs, expanding the possibilities and
effectiveness of air writing recognition systems.
Uysal and Filik (2021) presented a paper titled “RF-Wri: An Efficient Frame-
work for RF-Based Device-Free Air-Writing Recognition” [21] in the IEEE Sensors
Journal. Their work focused on developing an efficient framework for recognizing
device-free air writing using radio frequency (RF) signals. The proposed framework,
called RF-Wri, utilized RF signals to capture the movements of the hand in the air and
accurately recognize the written characters. By leveraging the unique characteristics
of RF signals [21], such as signal strength and phase information, the framework
achieved efficient and robust air writing recognition. The paper contributes to the
field by introducing a novel approach that eliminates the need for wearable devices
or physical contact, offering a more practical and convenient solution for air writing
recognition.
Ramya et al. (2022) presented a paper titled “Air-Writing Recognition System”
[5] at the 2022 International Interdisciplinary Humanitarian Conference for Sustain-
ability (IIHC). Their research focused on developing an air writing recognition
system. The proposed system aimed to recognize handwritten characters and digits
drawn in the air, offering an intuitive and natural human–machine interface [5]. While
the abstract provides limited details, this work contributes to the field by exploring
the design and implementation of an air writing recognition system. The paper likely

discusses the methodology, techniques, and results of their system, providing insights
into the advancements in air writing recognition technology.
Hayakawa et al. (2022) presented a paper titled “Air Writing in Japanese: A
CNN-based Character Recognition System Using Hand Tracking” at the 2022 IEEE
4th Global Conference on Life Sciences and Technologies (LifeTech) [22]. Their
research focused on developing a character recognition system specifically for air
writing in the Japanese language. The proposed system utilized hand tracking tech-
niques combined with a Convolutional Neural Network (CNN) for accurate recog-
nition of air-drawn Japanese characters. By leveraging CNN’s ability to learn and
extract features from the captured hand movements, the system aimed to provide a
robust and efficient solution for recognizing handwritten characters in the air. This
work contributes to the field by addressing the unique challenges of air writing in
Japanese [22] and exploring the application of CNN-based approaches for character
recognition in this context.
Ahmed et al. (2022) published a paper titled “Radar-Based Air-Writing Gesture
Recognition Using a Novel Multistream CNN Approach” [23] in the IEEE Internet of
Things Journal. Their research focused on radar-based air-writing gesture recognition
and proposed a novel multistream Convolutional Neural Network (CNN) approach.
The system utilized radar sensors to capture hand movements in the air and employed
a multistream CNN architecture to effectively process and analyze the captured
data. By leveraging multiple streams of information, such as range and Doppler
data, the proposed approach aimed to improve the accuracy and robustness of air-
writing gesture recognition. This work contributes to the field by addressing the
challenges associated with radar-based air-writing recognition and introducing a
novel CNN-based approach to enhance the performance of such systems.
Overall, the related work in the field of air writing recognition has demonstrated
the potential of using CNNs and image processing techniques for accurately and
efficiently recognizing air writing gestures in real time. Despite the progress made,
additional research is required to enhance the precision and pace of recognition while
also addressing the ethical and legal considerations associated with this technology.

2 Methodology

The objective of this research paper is to explore the use of Convolutional Neural
Networks (CNNs) and image processing techniques for air writing recognition.
Specifically, the paper aims to investigate the effectiveness of CNNs in detecting
and recognizing characters written by hand in the air using a camera, without the
assistance of external devices. The study will also compare various image processing
techniques to enhance the visibility and accuracy of the captured images. The ulti-
mate objective of this research is to facilitate the development of an efficient and
precise air writing recognition system that can be utilized in diverse applications,
such as virtual reality interfaces, gesture-based control, and medical rehabilitation.

2.1 Classification of Handwritten Characters

The proposed method is to develop a system for the recognition of handwritten characters in the air using CNNs and other layers [13, 16]. The system begins with data pre-processing: images of air writing are captured and then pre-processed using techniques such as normalization, smoothing, and binarization to enhance their quality and remove noise.
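As an illustration of this pre-processing step, the following minimal sketch performs min-max normalization followed by fixed-threshold binarization on a hard-coded 4 × 4 intensity grid that stands in for a real camera frame; it is a didactic simplification, not the authors' actual pipeline.

```python
def normalize(frame):
    """Scale pixel intensities to the range [0, 1] (min-max normalization)."""
    lo = min(min(row) for row in frame)
    hi = max(max(row) for row in frame)
    span = (hi - lo) or 1  # avoid division by zero on a flat frame
    return [[(p - lo) / span for p in row] for row in frame]

def binarize(frame, threshold=0.5):
    """Map each normalized pixel to 0 (background) or 1 (stroke)."""
    return [[1 if p >= threshold else 0 for p in row] for row in frame]

# A toy "frame": bright pixels trace the stroke, dark pixels are background.
frame = [
    [ 10,  12, 200, 210],
    [ 11, 180, 220,  14],
    [  9, 190,  13,  12],
    [200,  15,  11,  10],
]
binary = binarize(normalize(frame))
```

A real system would apply the same idea per video frame, typically with smoothing (e.g., a Gaussian blur) before thresholding.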

2.2 Image Preprocessing

Image analysis of each video frame can be conducted to extract relevant features
and classify the air writing gestures. The first step is to pre-process the video frames
by removing noise and enhancing the contrast. This can be done using techniques
such as image thresholding, adaptive histogram equalization, and Gaussian blur.
Next, the air writing gestures can be extracted from the pre-processed frames using
techniques such as edge detection, contour detection, and optical flow. The edges
can be detected using the Canny edge detector, and the contours can be extracted
using the findContours function in OpenCV. Optical flow techniques such as Lucas-
Kanade or Farneback can be used to track the movement of the air writing gestures
over time (Fig. 1).
After the gestures have been extracted, relevant features can be extracted from
them. These features can include stroke direction, stroke length, curvature, and angle
of the strokes [3, 16]. These features can be extracted using techniques such as Hough
transforms, corner detection, and image moments. The extracted features can be

Fig. 1 Representation of depth analysis



used to classify the air writing gestures using a machine learning algorithm such as
a convolutional neural network. Overall, image analysis of each video frame using
OpenCV can be a powerful technique for air writing recognition, as it allows for the
extraction of relevant features from the video frames and the classification of the air
writing gestures using machine learning algorithms [11].
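To make the stroke-level features concrete, here is a sketch that computes stroke length, net direction, and a crude curvature estimate (path length divided by endpoint distance) from a list of fingertip coordinates; the feature definitions are illustrative assumptions, not the exact descriptors used in the paper.

```python
import math

def stroke_features(points):
    """Compute simple geometric features from a fingertip trail:
    total stroke length, net direction (radians), and a curvature
    proxy (path length / endpoint distance; 1.0 for a straight stroke)."""
    length = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length += math.hypot(x1 - x0, y1 - y0)
    (sx, sy), (ex, ey) = points[0], points[-1]
    direction = math.atan2(ey - sy, ex - sx)
    chord = math.hypot(ex - sx, ey - sy) or 1.0
    curvature = length / chord
    return length, direction, curvature

# A straight horizontal stroke vs. an L-shaped (bent) stroke.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
bent = [(0, 0), (2, 0), (2, 2)]
```

Features like these, computed per stroke segment, would then be fed to the classifier alongside image-derived features.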
Feature Extraction plays a crucial role in image processing, as it involves the trans-
formation of pre-processed images through convolutional neural networks (CNNs)
and other layers to extract relevant features and patterns from the input. This process
involves a series of operations, including max pooling, dropout, flatten, hidden layers,
and SoftMax layers. The goal is to capture the essential characteristics of the images,
such as edges, textures, shapes, and colors, which are then used for various tasks,
including object recognition, image classification, and image retrieval. Training and
classification are done once the relevant features have been extracted from the pre-
processed images to train the model. This training phase is crucial as it involves the
optimization of the weights and biases associated with the different layers of the
network. Backpropagation, a technique that calculates the gradients of the model’s
parameters, is employed to propagate the error through the network. Stochastic
gradient descent is then utilized to update the weights and biases based on these
gradients, gradually minimizing the loss function. Once the model has undergone
the training process, it becomes capable of real-time classification of handwritten
characters. The input images are fed into the trained model, which then applies the
learned features and patterns to make predictions. The model assigns a specific class
label to each input character based on its understanding of the extracted features and
the patterns associated with different characters. This classification process enables
the system to decipher and recognize handwritten characters with a certain level of
accuracy.
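The training procedure described above can be illustrated at its smallest scale: a linear layer followed by a SoftMax output, updated by stochastic gradient descent with the cross-entropy gradient (predicted probabilities minus the one-hot label). This toy sketch omits the convolutional layers entirely and serves only to show the update rule, not the paper's actual network.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def sgd_step(W, b, x, y, lr=0.1):
    """One SGD update of a linear+softmax classifier on a single
    (feature vector x, class index y) example. The cross-entropy
    gradient of each logit is simply p[k] - one_hot(y)[k]."""
    z = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
         for row, b_k in zip(W, b)]
    p = softmax(z)
    for k in range(len(W)):
        grad = p[k] - (1.0 if k == y else 0.0)
        b[k] -= lr * grad
        for i in range(len(x)):
            W[k][i] -= lr * grad * x[i]
    return p  # probabilities before this step's update

# Repeatedly fitting one toy example drives its class probability up.
W = [[0.0, 0.0], [0.0, 0.0]]   # 2 classes x 2 features
b = [0.0, 0.0]
for _ in range(200):
    probs = sgd_step(W, b, [1.0, 0.0], 0)
```

In the real system the same gradient flows backward through the convolutional and dense layers via backpropagation.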
Here, the model was evaluated with two different CNN architectures. The
detailed analysis is as follows: Fig. 2 depicts the 2-layered
architectural model representation based on CNN where multiple layers are present
[13]. The first two layers are convolutional layers A and B, with 32 and 64
filters, respectively. The functioning of the layers is based
on Rectified Linear Unit (ReLU), which is a frequently utilized activation function in
neural networks. Then the image is passed through the max pooling layer for feature
extraction. Next in the sequence is the dropout layer, a widely adopted regularization
technique in neural networks. Its function is to prevent overfitting by randomly setting
a specified fraction of input units to zero during each training iteration. Then the
image is processed through the flatten layer and the main function of the flatten layer
is to convert the multidimensional output of the previous convolutional layers into
a one-dimensional vector, which can be passed as input to a fully connected layer.
For the final processing, the image is processed through the SoftMax layer and the
main benefit of using this layer is that it allows the model to produce a probability
distribution over the predicted classes, which can be useful for interpreting the output
of the model and making decisions based on the probabilities, and it provides a way
to train the model using a loss function that accounts for the predicted probabilities.
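The ReLU, max pooling, and flatten operations named above can be sketched in a few lines of plain Python on a single 4 × 4 feature map. This is a conceptual illustration of what each layer computes, not the framework implementation used in the experiments.

```python
def relu(feature_map):
    """Rectified Linear Unit: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling, halving each spatial dimension."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled

def flatten(feature_map):
    """Turn a 2-D feature map into the 1-D vector a dense layer expects."""
    return [v for row in feature_map for v in row]

fmap = [
    [-1.0,  2.0,  0.5, -0.5],
    [ 3.0, -2.0,  1.0,  4.0],
    [ 0.0, -1.0, -3.0,  2.0],
    [ 5.0,  1.0,  0.0, -4.0],
]
vec = flatten(max_pool_2x2(relu(fmap)))
```

Dropout, by contrast, acts only during training, randomly zeroing a fraction of these activations to discourage overfitting.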

Fig. 2 Representation of
2-layered architectural model

Figure 3 depicts the 3-layered architectural model based on CNN, where a third
convolutional layer C was added with 128
filters to achieve a higher accuracy. Also, in this architecture, the hidden layer was
given 256 nodes instead of 128 nodes to provide a higher level of abstraction and
representation power, which can lead to better predictive performance, and it can be
trained effectively using standard optimization techniques.

2.3 Dataset Analysis

In the context of generating input data for air writing recognition, the EMNIST [11]
dataset was used, and the following preprocessing steps were performed. Data
normalization: the EMNIST dataset contains images of handwritten
digits and characters that vary in size and orientation [12]. Therefore, the first step is
to normalize the data by resizing all the images to a fixed size (e.g., 28 × 28 pixels)
and aligning them in a consistent orientation. Data augmentation: to increase
the size of the dataset and reduce overfitting, augmentation techniques such as
rotation, scaling, and horizontal flipping can be applied to the images. Data splitting
involves dividing the dataset into three sets: training, validation, and test. The training
set is employed to train the model, while the validation set is utilized to fine-tune
hyperparameters and avoid overfitting. Lastly, the test set is utilized to evaluate the
final performance of the model. Data preprocessing: the pixel values
of the images are scaled to a range of 0–1, and the labels are converted to one-hot encoding
to represent the different classes. Data shuffling: to prevent the model from
learning the order of the data, the training and validation sets are shuffled before
each epoch. By performing these preprocessing steps on the EMNIST [11] dataset,
it is possible to generate high-quality input data for training and testing air writing
recognition models.
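A minimal sketch of the splitting, shuffling, and one-hot encoding steps is shown below, using placeholder sample identifiers and an assumed 80/10/10 split ratio (the paper does not state its exact proportions).

```python
import random

def one_hot(label, num_classes):
    """Encode a class index as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def split_dataset(samples, train=0.8, val=0.1, seed=42):
    """Shuffle and split samples into train/validation/test subsets.
    A fixed seed makes the split reproducible across runs."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

# Placeholder (image id, class index) pairs; 26 classes as for letters.
samples = [(f"img_{i}", i % 26) for i in range(100)]
train_set, val_set, test_set = split_dataset(samples)
```

Per-epoch shuffling of the training set would reuse the same `shuffle` call with a fresh seed each epoch.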

2.4 Input Generation

The EMNIST [11] dataset was reshaped, and its Boolean values were cast to float32.
The hyperparameters were defined using 3
convolutional layers with 128 filters and 256 nodes. An HSV color-space filter
was used, with the lower boundary defined as (29, 86, 6) and the upper boundary as
(64, 255, 255) for the color green. The following figures represent the input generation
from the model. Figure 4 represents the valid boundary space for giving inputs,
Fig. 5 represents the successful processing of the input on the blackboard screen,
and Fig. 6 represents the output generated after processing the character on the
blackboard screen.
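The HSV color filter can be expressed as a simple per-pixel range test, mimicking the behavior of an OpenCV-style `inRange` mask with the green boundaries quoted above; the pure-Python form below is for illustration only.

```python
# Green marker boundaries from the text, in OpenCV-style HSV (H in 0-179).
GREEN_LOWER = (29, 86, 6)
GREEN_UPPER = (64, 255, 255)

def in_green_range(pixel):
    """True when an (H, S, V) pixel falls inside the green boundaries,
    i.e. it would be kept by an inRange-style mask."""
    return all(lo <= p <= hi
               for p, lo, hi in zip(pixel, GREEN_LOWER, GREEN_UPPER))

def mask(frame):
    """Binary mask over a frame of HSV pixels: 1 where the marker is seen."""
    return [[1 if in_green_range(px) else 0 for px in row] for row in frame]
```

The centroid of the masked region per frame yields the fingertip trail that is drawn onto the blackboard screen.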

Fig. 3 Representation of
3-layered architectural model

Fig. 4 Representation of the boundary for input writing

Fig. 5 Representation of character

Fig. 6 Image of output generated

3 Results and Discussion

The proposed approach in this research paper introduces several novel aspects
that contribute to the advancement of air writing recognition systems. These novel
elements validate the uniqueness and effectiveness of our approach. Firstly, this study
focuses on recognizing handwritten characters and digits in the air without the need
for external hardware. This distinguishes the proposed approach from traditional
handwriting recognition systems that rely on physical input devices such as styluses
or touchscreens. By leveraging the power of CNNs and image analysis techniques,
it enables users to write in the air, providing a more intuitive and natural interaction
method. Secondly, utilization of the EMNIST [11] dataset for training and testing
the models. While this dataset has been used in previous research, its application
specifically for air writing recognition is novel. The EMNIST [11] dataset offers a

diverse range of handwritten characters and digits, allowing the model to train on a
comprehensive set of examples. This ensures that the system is robust and capable of
recognizing a wide variety of air-written inputs. Thirdly, the exploration of different
CNN architectures, including 2-layered and 3-layered configurations, provides valu-
able insights into the optimal design for air writing recognition. The comparison of
these architectures reveals that the 3-layered CNN outperforms the 2-layered counter-
part in terms of accuracy and loss. This finding contributes to the body of knowledge
regarding the architecture selection for air writing recognition systems. Additionally,
the integration of OpenCV for image analysis of each video frame is a novel aspect of
our approach. This step allows for precise pre-processing and enhances the quality of
input data. By leveraging OpenCV’s capabilities, the overall performance of the
proposed system is improved and more accurate recognition results
are obtained. While the proposed approach showcases several novel elements, it is
important to acknowledge its limitations. The current model may face challenges
in recognizing complex or ambiguous gestures, which could be addressed in future
research. Furthermore, the proposed approach relies on the availability of a suitable
dataset, and the expansion of the dataset to include more diverse handwriting styles
and variations could further enhance the system’s performance.
In conclusion, the discussion validates the novelty of the proposed approach in air
writing recognition. The combination of recognizing air-written characters without
external hardware, utilizing the EMNIST dataset, exploring different CNN archi-
tectures, and integrating OpenCV for image analysis collectively contribute to the
uniqueness and effectiveness of this approach. By addressing the identified limita-
tions and building upon the proposed model, continuation to advance the field of air
writing recognition can be done and unlock its full potential in various domains.
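To make the hardware-free pipeline concrete, the sketch below rasterizes a hypothetical trail of tracked fingertip coordinates into a 28 × 28 grid, the input size of EMNIST-style models. The trail, grid size, and helper names are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch: convert a tracked fingertip trail (normalized
# (x, y) points in [0, 1]) into a 28x28 binary image, the input size
# expected by an EMNIST-trained CNN. The trail below is hypothetical.

GRID = 28

def rasterize(trail, size=GRID):
    """Map normalized fingertip coordinates onto a size x size grid."""
    image = [[0] * size for _ in range(size)]
    for x, y in trail:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1  # mark the pixel the fingertip passed through
    return image

# Hypothetical trail approximating a vertical stroke (like the digit "1")
trail = [(0.5, i / 20) for i in range(21)]
img = rasterize(trail)
ink = sum(sum(row) for row in img)
print(ink)  # number of "ink" pixels in the rasterized stroke
```

In a full system, the binary grid would be normalized and fed to the trained CNN in place of a scanned image.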

3.1 Result Analysis

In order to evaluate the performance of the proposed air writing recognition system,
certain simulations were conducted comparing a 2-layered CNN and a 3-layered
CNN in terms of their accuracy and loss values. The results of the simulations revealed
that the 2-layered CNN achieved an accuracy of approximately 64% with a corre-
sponding loss value of approximately 15%. On the other hand, the 3-layered CNN
exhibited a higher accuracy of approximately 88% but had a relatively higher loss
value of approximately 37%. These simulation results clearly demonstrate that the
3-layered CNN outperformed the 2-layered CNN in terms of accuracy, showcasing
its superior capability in correctly recognizing air-written characters and digits. The
significantly higher accuracy of the 3-layered CNN suggests that it is more adept at
capturing the intricate patterns and variations present in air-written gestures, leading
to more precise recognition outcomes. However, it is important to note that the 3-
layered CNN also exhibited a higher loss value compared to the 2-layered CNN. A
lower loss value typically indicates better model performance as it reflects the degree
of error in the predictions. The higher loss of the 3-layered CNN suggests that it may
Convolutional Neural-Network-based Gesture Recognition System … 371

face challenges in accurately predicting certain instances, leading to a larger margin
of error. This indicates the presence of potential areas for improvement in the model’s
learning process.
Figures 7 and 8 represent the accuracy and loss of model in 2-layered architecture.
Figures 9 and 10 represent the accuracy and loss of the model in 3-layered
architecture.
Considering these simulation results, it is evident that the 3-layered CNN architec-
ture shows promising potential for enhancing the accuracy of air writing recognition
systems. Further research and analysis can delve into refining the model’s architec-
ture, optimizing hyperparameters, or exploring alternative configurations to strike a
balance between accuracy and loss. Moreover, from a technological perspective, the
implications of air writing recognition are significant. This technology opens new

Fig. 7 Representation of accuracy data on a 2-layered model

Fig. 8 Representation of loss data on a 2-layered model

Fig. 9 Representation of accuracy data on 3-layered model

Fig. 10 Representation of loss data on 3-layered model

possibilities for intuitive and natural human–computer interaction, where users can
write in the air without the need for physical input devices. It has the potential to
revolutionize fields such as virtual reality [2], augmented reality, and accessibility,
enabling more immersive experiences and improved communication channels. To
fully realize the potential of air writing recognition, future research can explore addi-
tional aspects such as real-time implementation, optimization for different devices
and platforms, and integration with complementary technologies like depth sensing or
motion capture. These advancements will contribute to the practical deployment and
widespread adoption of air writing recognition systems. In summary, the simulation
results highlight the superior accuracy of the 3-layered CNN in air writing recog-
nition compared to the 2-layered CNN. The findings underscore the technological
potential of air writing recognition for transforming user interfaces and improving
interaction in various domains. Continued research and development are necessary

to address the model’s limitations and enhance its performance, ultimately bringing
air writing recognition closer to real-world applications.
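The accuracy and loss figures discussed above can be computed from model outputs as sketched below; the probability vectors and labels are made-up toy values, not the experiment's data.

```python
import math

def accuracy(probs, labels):
    """Fraction of samples whose argmax class matches the label."""
    correct = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return correct / len(labels)

def cross_entropy(probs, labels):
    """Average negative log-probability of the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical softmax outputs for three samples over three classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
labels = [0, 1, 2]

print(round(accuracy(probs, labels), 2))  # 0.67
print(round(cross_entropy(probs, labels), 2))
```

Note that, as the simulation results illustrate, a model can score higher on accuracy (an argmax metric) while still accruing a larger cross-entropy loss on the instances it gets wrong.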

4 Mathematical Analysis

In the context of air writing recognition using a convolutional neural network (CNN),
a mathematical analysis can be conducted to evaluate the performance of the model
with the use of the Rectified Linear Unit (ReLU) activation function and feature
extraction techniques, based on their accuracy. Let the input data be denoted as X,
and the ground truth labels as Y. Let f1, f2, and f3 be the feature extraction
functions of the CNN, and w1, w2, and w3 the corresponding weights and biases
of the convolutional and fully connected layers.
The forward pass of the CNN can be represented as

Z1 = f1(X, w1) (1)

A1 = ReLU(Z1) (2)

Z2 = f2(A1, w2) (3)

A2 = ReLU(Z2) (4)

Z3 = f3(A2, w3) (5)

A3 = SoftMax(Z3) (6)

where the ReLU activation function is used to introduce non-linearity in the model,
and the SoftMax function is used to obtain the predicted class probabilities. The aim
is to reduce the cross-entropy loss between the predicted output A3 and the actual
ground truth labels Y:

J(A3, Y) = −Σ Y · log(A3) (7)

To optimize the model, the gradient descent algorithm can be used. This involves
using backpropagation to calculate the gradients of the loss function with respect to
the weights and biases. The accuracy of the model can be evaluated using a test set,
where the predicted output A can be compared with the ground truth labels Y. The
analysis outcomes suggest that utilizing the ReLU activation function and imple-
menting feature extraction techniques have the potential to enhance the accuracy of

the CNN for air writing recognition. By utilizing the ReLU activation function, non-
linearity can be introduced in the model, thereby enabling it to better capture intricate
relationships within the input data. Furthermore, feature extraction techniques can
assist in identifying distinctive features from the input data, thereby improving clas-
sification performance. Fine-tuning of hyperparameters, such as learning rate, batch
size, and number of layers, can further improve the accuracy of the model.
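A pure-Python sketch of the forward pass in Eqs. (1)–(6) and the loss in Eq. (7) may make the notation concrete. The toy input, weights, and layer sizes below are illustrative assumptions, not the paper's trained parameters, and each f_i is modeled as a simple fully connected layer.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]  # subtract max for stability
    s = sum(exps)
    return [e / s for e in exps]

def dense(x, w, b):
    """A fully connected layer standing in for the f_i feature extractors."""
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(w, b)]

def cross_entropy(a, y):
    """J(A3, Y) = -sum(Y * log(A3)) for a one-hot label Y."""
    return -sum(yi * math.log(ai) for ai, yi in zip(a, y))

# Toy input and weights (illustrative, not the paper's trained parameters)
x = [0.5, -0.2, 0.8]
w1, b1 = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]], [0.0, 0.1]
w2, b2 = [[0.3, 0.4], [-0.2, 0.6]], [0.05, 0.05]
w3, b3 = [[0.7, -0.1], [0.2, 0.9]], [0.0, 0.0]

a1 = relu(dense(x, w1, b1))       # Eqs. (1)-(2)
a2 = relu(dense(a1, w2, b2))      # Eqs. (3)-(4)
a3 = softmax(dense(a2, w3, b3))   # Eqs. (5)-(6)
loss = cross_entropy(a3, [1.0, 0.0])  # Eq. (7), one-hot label
print(round(sum(a3), 6))  # probabilities sum to 1
```

Backpropagation would then compute gradients of the loss with respect to each w_i, as described above, which libraries such as TensorFlow automate.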

5 Conclusion

This research encompasses an in-depth exploration of Convolutional Neural
Networks (CNNs) for the purpose of air writing recognition. The central aim was to
construct a reliable and precise system capable of identifying handwritten characters
and digits traced in the air, devoid of external hardware dependency. To attain this
objective, the EMNIST dataset [11] was meticulously pre-processed, representing
a diverse collection of images containing handwritten characters and digits. This
dataset served as the foundation for training and evaluating various CNN models, each
experimented with distinct architectural configurations, including both 2-layered and
3-layered CNNs.
Throughout the experiments, optimization of the models was achieved through
the application of the ReLU activation function and utilization of feature extrac-
tion techniques. The results consistently favored the 3-layered CNN architecture,
which showed superior accuracy, albeit at a higher loss, in comparison to the 2-layered
counterpart. Notably, the integration of the ReLU activation function and feature
extraction techniques significantly bolstered the overall performance of the models.
Additionally, the integration of OpenCV for comprehensive image analysis of each
video frame was explored, offering valuable insights into the essential pre-processing
steps required to generate high-quality input data for air writing recognition.
In conclusion, this research demonstrates the efficacy of CNNs in the domain
of air writing recognition and underscores their substantial potential for real-world
applications. The proposed model exhibits several features contributing to its success,
including the optimized CNN architecture, activation function, and feature extraction
techniques. However, it’s crucial to acknowledge the limitations of this study. The
proposed model’s efficacy heavily relies on the availability of an appropriate dataset,
and further enhancements can be realized through dataset expansion to encompass a
broader spectrum of handwriting styles and variations. Additionally, the model may
face challenges when tasked with recognizing intricate or ambiguous gestures.
This research significantly contributes to the field of air writing recognition by
presenting a tangible demonstration of CNN’s effectiveness and introducing a model
with promising capabilities. The findings pave the way for future advancements,
emphasizing the need for ongoing exploration and innovation in this dynamic and
evolving domain.

5.1 Future Work

There are several avenues for future research. Alternative neural
network architectures could be explored to further enhance the performance of
air writing recognition systems. Additionally, further pre-processing techniques,
such as data augmentation or advanced noise reduction methods, may be investigated
to improve the robustness of the system. The application potential of air writing recog-
nition systems is vast, ranging from human–computer interaction to virtual reality
and augmented reality applications. The ability to input text and commands through
air writing can revolutionize user interfaces and enable new modes of communi-
cation. One area for future research is the exploration of different neural network
architectures, such as Recurrent Neural Networks (RNNs), to determine if they can
improve the accuracy and speed of air writing recognition. Additionally, incorpo-
rating advanced techniques such as transfer learning and reinforcement learning may
also be beneficial. Further research could focus on creating air writing recognition
systems that can function in real time and adapt to dynamic environments. This
could involve the use of additional sensors such as accelerometers and gyroscopes to
provide additional data for the recognition system. In summary, this research paves
the way for utilizing CNNs in air writing recognition systems and provides insights
into their strengths, limitations, and future possibilities. By addressing the identified
challenges and expanding upon the proposed model, advancements in the field of air
writing recognition can be achieved and contribute to its practical implementation in
various domains.

References

1. Kumar P, Saini R, Roy PP, Dogra DP (2017) 3D text segmentation and recognition using
leap motion. Multimed Tools Appl 76(15):16491–16510. https://doi.org/10.1007/s11042-016-
3923-z
2. Fu Z, Xu J, Zhu Z, Liu AX, Sun X (2019) Writing in the air with WiFi signals for virtual
reality devices. IEEE Trans Mob Comput 18(2):473–484. https://doi.org/10.1109/TMC.2018.
2831709
3. Kumar P, Saini R, Roy PP, Dogra DP (2017) Study of text segmentation and recognition using
leap motion sensor. IEEE Sens J 17(5):1293–1301. https://doi.org/10.1109/JSEN.2016.264
3165
4. Itaguchi Y, Yamada C, Fukuzawa K (2015) Writing in the air: contributions of finger movement
to cognitive processing. PLoS One 10(6). https://doi.org/10.1371/journal.pone.0128419
5. Ramya ST, Sakthi R, Rohitha B, Praveena D (2022) Air-writing recognition system. In: 2022
international interdisciplinary humanitarian conference for sustainability (IIHC), Bengaluru,
India, pp 910–913. https://doi.org/10.1109/IIHC55949.2022.10059943
6. Chollet F. Xception: deep learning with depthwise separable convolutions.
7. Fang Y, Xu Y, Li H, He X, Kang L (2020) Writing in the air: recognize letters using deep
learning through WiFi signals. In: Proceedings—2020 6th international conference on big
data computing and communications, BigCom 2020. Institute of Electrical and Electronics
Engineers Inc., pp 8–14. https://doi.org/10.1109/BigCom51056.2020.00008

8. Chen H, Ballal T, Muqaibel AH, Zhang X, Al-Naffouri TY (2020) Air writing via receiver array-
based ultrasonic source localization. IEEE Trans Instrum Meas 69(10):8088–8101. https://doi.
org/10.1109/TIM.2020.2991573
9. Choudhury A, Sarma KK (2021) A CNN-LSTM based ensemble framework for in-air
handwritten Assamese character recognition. Multimed Tools Appl 80(28–29):35649–35684.
https://doi.org/10.1007/s11042-020-10470-y
10. Mukherjee S, Ahmed SA, Dogra DP, Kar S, Roy PP (2019) Fingertip detection and tracking for
recognition of air-writing in videos. Expert Syst Appl 136:217–229. https://doi.org/10.1016/j.
eswa.2019.06.034
11. Cohen G, Afshar S, Tapson J, van Schaik A (2017) EMNIST: an extension of MNIST to
handwritten letters [Online]. http://arxiv.org/abs/1702.05373
12. Abadi M, et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed
systems [Online]. http://arxiv.org/abs/1603.04467
13. Chen M, AlRegib G, Juang BH (2016) Air-writing recognition—Part I: Modeling and recogni-
tion of characters, words, and connecting motions. IEEE Trans Hum Mach Syst 46(3):403–413.
https://doi.org/10.1109/THMS.2015.2492598
14. Kane L, Khanna P (2017) Vision-based mid-air unistroke character input using polar signatures.
IEEE Trans Hum Mach Syst 47(6):1077–1088. https://doi.org/10.1109/THMS.2017.2706695
15. Pedregosa F, Varoquaux G, Thirion B, Michel V, Dubourg V, Passos A, Perrot M, et al (2011)
Scikitlearn: machine learning in Python [Online]. http://scikit-learn.sourceforge.net
16. Roy P, Ghosh S, Pal U (2018) A CNN based framework for unistroke numeral recognition in
airwriting. In: Proceedings of international conference on frontiers in handwriting recognition,
ICFHR, Institute of Electrical and Electronics Engineers Inc., pp 404–409. https://doi.org/10.
1109/ICFHR-2018.2018.00077
17. Chen M, AlRegib G, Juang BH (2016) Air-writing recognition—Part II: Detection and recog-
nition of writing activity in continuous stream of motion data. IEEE Trans Hum Mach Syst
46(3):436–444. https://doi.org/10.1109/THMS.2015.2492599
18. Arsalan M, Santra A, Bierzynski K, Issakov V (2021) Air-writing with sparse network of radars
using spatio-temporal learning. In: 2020 25th international conference on pattern recognition
(ICPR), Milan, Italy, pp 8877–8884. https://doi.org/10.1109/ICPR48806.2021.9413332
19. Escopete M, Laluon C, Llarenas E, Reyes P, Tolentino R (2021) Recognition of English capital
alphabet in air writing using convolutional neural network and intel RealSense D435 depth
camera, pp 1–8. https://doi.org/10.1109/GCAT52182.2021.9587515
20. Luo Y, Liu J, Shimamoto S (2021) Wearable air-writing recognition system employing dynamic
time warping. In: 2021 IEEE 18th annual consumer communications and networking confer-
ence (CCNC), Las Vegas, NV, USA, pp 1–6. https://doi.org/10.1109/CCNC49032.2021.936
9458
21. Uysal C, Filik T (2021) RF-Wri: an efficient framework for RF-based device-free air-writing
recognition. IEEE Sens J 21(16):17906–17916. https://doi.org/10.1109/JSEN.2021.3082514
22. Hayakawa S, Goncharenko I, Gu Y (2022) Air writing in Japanese: a CNN-based character
recognition system using hand tracking. In: 2022 IEEE 4th global conference on life sciences
and technologies (LifeTech), Osaka, Japan, pp 437–438. https://doi.org/10.1109/LifeTech5
3646.2022.9754825
23. Ahmed S, Kim W, Park J, Cho SH (2022) Radar-based air-writing gesture recognition using
a novel multistream CNN approach. IEEE Internet Things J 9(23):23869–23880. https://doi.
org/10.1109/JIOT.2022.3189395
24. Tsai T-H, Hsieh J-W (2017) Air-writing recognition using reverse time ordered stroke context.
In: 2017 IEEE international conference on image processing (ICIP), Beijing, China, pp 4137–
4141. https://doi.org/10.1109/ICIP.2017.8297061
A Protection Approach for Coal Miners
Safety Helmet Using IoT

Shabina Modi, Yogesh Mali, Lakshmi Sharma, Prajakta Khairnar,
Dnyanesh S. Gaikwad, and Vishal Borate

Abstract The goal of the coal mining helmet proposed in this research is to protect
miners by forewarning them of danger. All of the components operate as long as the
person is wearing the protective helmet. The output of the helmet module is updated
continually, keeping the cloud supplied with real-time data. These wearable devices
can share their data or retrieve it from other sources thanks to the Internet of Things
(IoT). If there is a threat, warnings are given to the employer and the miner. The
creation of wearable computing frameworks and ubiquitous computing has greatly
aided the advancement of wearable technology. As a result, this wearable device
includes a wide range of sensors that allow it to connect with other parts and enhance
the safety of the miner. The equipment integrates data gathering, information
management, and communication components. The DHT11 temperature and humidity
sensor was employed because heat and moisture levels in mines are sometimes so
high that the miner can die. Gases released inside the mines can cause respiratory
problems for anyone present, which could lead to suffocation. A notification is
communicated to both the ground authority and the miner in the unlikely event that
at least one of these readings goes beyond the threshold.

Keywords Organic light-emitting diode · Humidity sensor · MQ-2

S. Modi
Karmaveer Bhaurao Patil College of Engineering, Satara, India
e-mail: [email protected]
Y. Mali (B) · L. Sharma
G.H Raisoni College of Engineering & Management, Wagholi, Pune, Maharashtra, India
e-mail: [email protected]
P. Khairnar
Ajeenkya D Y Patil School of Engineering, Pune, India
D. S. Gaikwad · V. Borate
D Y Patil College of Engineering & Innovation, Talegoan, Pune, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 377
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_30

1 Introduction

Mineral resources are diverse and abundant, and the mining sector is enormous. The
mining industry has a very strong stake in proper oversight and reliable communications.
Supervisors are held accountable for all injuries sustained under their supervision, so
they must be aware of any potentially dangerous situations. The problem being addressed
is the development of a mining helmet that raises miner safety awareness. Staying
mindful of one's surroundings when using loud gear is typically challenging. On the
other hand, miners hardly ever take their helmets off. Miners frequently remove some
of their protective gear because it is too cumbersome, hot, or uncomfortable to wear.
By and large, mining safety helmets only serve to shield the miner's head from
potentially harmful blows; no technology has been developed to make helmets warn
workers when a particular miner has had a dangerous episode. The project's goal is to
attach a wireless sensor node network to the existing mining safety helmet and make
it much more secure. The goal was expanded to include developing a system that
would fit within a safety helmet and operate for an extended period of time on
battery power [1]. Another challenge was to change the helmet's appearance without
compromising its functionality. The additional weight must be kept as light as
reasonably possible. Utilizing WSN technology, information is gathered and the
environmental parameters are measured. The WSN is a set of sensors, each of
which has its own detection range but functions as part of the whole system. The
humidity and temperature levels are displayed to the miner on an OLED display, and
for the gas a threshold is configured, with a buzzer alarm issued if readings exceed
the threshold notably. By incorporating sensing, a mining helmet can be modified to
aid the safety of miners. When a miner takes off his helmet, the miner should be
warned before opening it [2]. A miner may be knocked unconscious if something
falls on him while he is wearing his helmet. If potentially fatal harm has been
sustained by a miner, the framework should reach that conclusion. In such instances,
hazardous gases have been recognized and reported.

2 Architecture

The transmitter portion of the framework includes the transmitter module, temperature
sensor, humidity sensor, LDR, and power supply. The receiver portion comprises an
Arduino, followed by an LED and a buzzer. The miner turns on the helmet hardware
as soon as he enters. The temperature and humidity sensor DHT-11 constantly observes
changes in temperature and humidity, determines whether or not the worker is safe,
and reports the outcomes. Therefore, safety precautions can be executed when
temperature or humidity, one of the monitored conditions, becomes a concern for the
workers. In that case the light-emitting diode will blink to indicate that conditions are
unsafe for the worker. The MQ-2 gas sensor detects poisonous gases like ethane,
methane, butane, and others [3]. If such a gas is detected, the buzzer turns ON and
begins to sound; this helps prevent workers from being exposed to risky gases. The
device also needs a way to determine whether a miner has suffered a potentially
fatal injury.

3 Data Transfer and Signal Transfer Structure

See Fig. 1.

Fig. 1 Transfer structure



Fig. 2 Gas detection component

4 Overview of Different Components

4.1 MQ-2

MQ-2 is a gas sensor of the metal oxide semiconductor type. The gas concentration
is measured by means of a voltage divider network within the sensor. The sensing
element of this sensor is composed of aluminum oxide coated with tin dioxide. It is
enclosed in a mesh of hardened steel. The sensing element is supported by six
interconnected legs [4]. The sensing element is heated through two of the leads,
while the output signals are handled by the other four leads. Oxygen is adsorbed on
the surface when the sensor material is heated to a high temperature in air [5]. The
donor electrons in the tin dioxide are then attracted to the oxygen, which prevents
current from flowing. When reducing gases are present, those gases interact with the
adsorbed oxygen atoms and reduce the density of the adsorbed oxygen. The current
flowing through the sensor is then used to calculate the output voltage values. Those
voltage measurements are used to evaluate the gas concentration. The voltage levels
are higher when the gas concentration is high (Fig. 2).
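The voltage-divider relationship described above can be sketched as follows; the supply voltage, load resistor, and example resistances are assumptions typical of MQ-2 breakout modules, not values taken from this paper.

```python
# Illustrative sketch of the MQ-2 voltage-divider relationship: the
# sensor's resistance Rs forms a divider with a load resistor RL, so a
# higher gas concentration (lower Rs) yields a higher output voltage.
# Component values here are assumptions, not the paper's.

VCC = 5.0      # supply voltage (V)
RL = 10_000.0  # load resistor (ohms), a common choice for MQ-2 modules

def output_voltage(rs):
    """Divider output for a given sensor resistance Rs."""
    return VCC * RL / (rs + RL)

def sensor_resistance(vout):
    """Invert the divider to recover Rs from a measured output voltage."""
    return RL * (VCC - vout) / vout

clean_air = output_voltage(50_000)  # high Rs -> low Vout
gassy_air = output_voltage(5_000)   # low Rs  -> high Vout
print(round(clean_air, 3), round(gassy_air, 3))
```

This is why a simple analog threshold on the output voltage, as used later in the algorithm, suffices to flag smoke.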

4.2 DHT-11

DHT-11 is a low-cost digital sensor that measures temperature and humidity. Almost
any microcontroller, including an Arduino or a Raspberry Pi, can communicate with
this sensor to quickly measure temperature and humidity.
A thermistor for temperature sensing and a capacitive humidity-sensing element
make up the DHT-11 sensor. A moisture-holding substrate serves as the dielectric
between the two electrodes of the humidity sensor's capacitor. The value of
capacitance changes as humidity levels change [6].

Fig. 3 Application circuit

The IC stage analyzes these changed resistance values, turning them into digital
form. The thermistor used by this sensor has a negative temperature coefficient, so
its resistance drops as the temperature rises. To provide a large change in resistance
even for small temperature variations, this detector is generally manufactured from
semiconductor ceramics or polymers. The DHT-11 offers roughly ±2 °C accuracy
over a temperature range of 0–50 °C and covers a humidity range of about 20–90%
with ±5% accuracy. The DHT-11 is a small sensor with an operating voltage of
3–5.5 V, and its sampling frequency is 1 Hz. During measurement, a current of no
more than 2.5 mA is drawn. The VCC, GND, and data pins, plus one unconnected
pin, are the four pins on the DHT-11 sensor for interfacing with the microcontroller.
A pull-up resistor of roughly 5 kΩ is used on the data line [7] (Fig. 3).
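As background to the single-wire interface described above, a DHT-11 transmits a 40-bit frame: humidity integer byte, humidity decimal byte, temperature integer byte, temperature decimal byte, then a checksum byte equal to the low 8 bits of the sum of the first four. The decoder below is an illustrative sketch with a made-up frame, not the paper's firmware.

```python
# Illustrative decoder for the 40-bit DHT-11 data frame. The sample
# frame is hypothetical.

def decode_dht11(frame):
    """frame: five bytes as read from the sensor's data line."""
    h_int, h_dec, t_int, t_dec, checksum = frame
    if (h_int + h_dec + t_int + t_dec) & 0xFF != checksum:
        raise ValueError("checksum mismatch - noisy read, retry")
    humidity = h_int + h_dec / 10.0
    temperature = t_int + t_dec / 10.0
    return humidity, temperature

# Hypothetical frame: 45.0 % RH, 28.0 degrees C, with a valid checksum
frame = [45, 0, 28, 0, (45 + 0 + 28 + 0) & 0xFF]
print(decode_dht11(frame))  # (45.0, 28.0)
```

Library routines such as Arduino's readHumidity/readTemperature perform this decoding internally; the checksum guards against glitches on the shared data line.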

4.3 LDR

A light-dependent resistor (LDR) is an electrical component whose behavior depends
on light. Its resistance changes abruptly when light rays strike it. An LDR's operating
principle is photoconductivity, an optical phenomenon (Fig. 4).

Fig. 4 Application overview

The more light the material absorbs, the more conductive it becomes. Electrons in
the material's valence band rapidly shift to the conduction band as soon as light
shines on the LDR. When the photons in the incident light carry more energy than
the band gap, high-intensity light causes more electrons to jump to the conduction
band, producing many charge carriers in the device [8]. As a result of this process,
the resistance of the device lessens and more current begins to flow.
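The photoconductive behavior described above is typically read through a voltage divider on a microcontroller's ADC. The sketch below is illustrative, with assumed resistor values and light/dark resistances rather than the circuit used in this work.

```python
# Illustrative sketch of reading an LDR through a voltage divider on a
# 10-bit ADC (0-1023), as on an Arduino analog pin. The fixed resistor
# value and the light/dark LDR resistances are assumptions.

VCC = 5.0
R_FIXED = 10_000.0  # fixed divider resistor (ohms)

def adc_reading(r_ldr):
    """10-bit ADC count for the divider's midpoint voltage."""
    v_out = VCC * R_FIXED / (r_ldr + R_FIXED)
    return int(v_out / VCC * 1023)

bright = adc_reading(1_000)    # LDR resistance drops in bright light
dark = adc_reading(200_000)    # and rises sharply in darkness
print(bright, dark)
```

Because the count falls as darkness increases, a simple "below threshold" test (as in the algorithm that follows) is enough to decide when to switch the helmet lamp on.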

5 Algorithm

1. Start.
2. Import the DHT library and define DHTPIN and DHTTYPE as DHT11.
3. Assign the light and green-LED variables to pin 13, the smoke sensor variable to
   A0, the buzzer to pin 11, and so on.
4. In the setup function, define the input and output pins using pinMode and call
   Serial.begin to start the Arduino. The output pins are the green LED and buzzer,
   while the input pins are smokeA0 and DHTPIN.
5. In the loop function, read the sensor value of the MQ-2 using analogRead. If the
   sensor value is more than 330, print that smoke is being detected and hold a delay
   of two seconds between readings.
6. Read the LDR value with analogRead and store it in a light variable. If the value
   of light is less than 530, drive the light pin LOW and turn the lamp on; if the value
   of light is greater, drive the light pin HIGH and turn it off.
7. For the DHT-11, use readHumidity to get the humidity and readTemperature to
   get the temperature, and store them in separate variables called humi and tempc.
   If humi or tempc have values larger than 22 or 33, respectively, the green LED
   will turn on.
8. Stop
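The threshold logic in the steps above can be simulated in plain Python as below; the thresholds follow the algorithm (330 for smoke, 530 for light, 22 for humidity, 33 for temperature), while the sample readings and function names are hypothetical.

```python
# Plain-Python simulation of the helmet's decision logic described in
# the algorithm above. Thresholds follow the listed steps; the sample
# readings are hypothetical.

SMOKE_LIMIT = 330   # MQ-2 analog reading above which smoke is flagged
LIGHT_LIMIT = 530   # LDR reading below which the lamp is switched on
HUMID_LIMIT = 22    # % relative humidity
TEMP_LIMIT = 33     # degrees Celsius

def evaluate(smoke, light, humidity, temperature):
    """Return the set of actions the helmet would take for one reading."""
    actions = set()
    if smoke > SMOKE_LIMIT:
        actions.add("buzzer")     # smoke detected -> sound the buzzer
    if light < LIGHT_LIMIT:
        actions.add("lamp_on")    # too dark -> turn the helmet lamp on
    if humidity > HUMID_LIMIT or temperature > TEMP_LIMIT:
        actions.add("green_led")  # unsafe climate -> warn via LED
    return actions

print(sorted(evaluate(smoke=400, light=200, humidity=45, temperature=30)))
```

On the Arduino itself each branch drives a pin instead of returning a label, but the comparisons are the same.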

6 Flowchart

See Fig. 5.

7 Circuit Diagram and Working

See Fig. 6.

Fig. 5 Flowchart

The circuit comprises an LDR, three LEDs, an MQ-2 sensor, and a DHT-11 sensor.
In addition to the standard VCC, which is linked to the reset pin on the left for most
of the devices, the LDR's positive end is wired to a 200 kΩ resistor at the pin, and
its negative end is wired to ground. A blue light-emitting diode, connected to pin 8,
serves as the LDR's output indicator [9]. The MQ-2 sensor has pins for VCC,
ground, and output. The indicators for the output pin and pin 9 are a buzzer and a
red light-emitting LED, respectively. VCC, ground, and data are among the pins that
make up the DHT-11 sensor. The data pin is connected to output pin 7, and the
output indicator is a green LED.

Fig. 6 Connection overview

8 Signal Flow Directions

See Figs. 7 and 8.

Fig. 7 Transfer of signals with direction



Fig. 8 Hardware sector

9 Experimental Result and Discussion

See Figs. 9 and 10.

10 Testing Output

At the serial terminal, each MQ-2, LDR, and DHT-11 result is displayed separately.
Further, the buzzer will continuously warn or alert for the MQ-2, the DHT-11 result
is indicated by the green LED, and the LDR result by the red LED [10].
This framework provides continuous data monitoring for the underground
environment of a mine, supported by a remote sensor network. It can display data
transmitted between mine terminals and the mine and alert about abnormal
environmental conditions. The framework offers good adaptability and expansibility,
convenient structure management, and minimal installation and maintenance
charges [11].

Fig. 9 Interface of hardware

Fig. 10 Real-time lookout

The aim was to develop a smart mining helmet that can distinguish among three
classes of potentially hazardous situations: concentrations of poisonous gas, mining
helmet removal, and collision or impact. A miner removing the mining helmet from
their head is treated as a hazardous occurrence [12]. An object striking a miner with
a force of a couple of thousand on the HDP (head damage) scale is another probable
lethal scenario. Estimating gas concentrations is also possible (Table 1).

Table 1 Performance of temperature sensors DHT-11 and DHT-22

Main component    DHT-11        DHT-22
Temp check        −0 to 10°     ±100°
Temp range        25–50°        35–70°
Required power    4.3–7 V       4.6–10 V
Humid range       35%           43%
Size of sample    Few minutes   Few seconds
Results in bits   17 bits       24 bits

11 Future Scope

The design includes adding a Wi-Fi-based system that can collect the necessary
information and update it in the database. The database will record the time of
detection along with details of the environmental components, because the data will
be transmitted regularly. The database will consequently be made remotely accessible
so that managers and higher authorities can screen for any alarming circumstances
and arrange the early availability of medical help [10]. The emergency office will
benefit from the GPS module. Use of some AI models can also help to improve the
framework in the future [11]. To deliver help more rapidly in case of hazardous
circumstances, the miners can be located.

12 Conclusion

We have successfully created an intelligent worker helmet that can detect gases, humidity, temperature, and light. The threshold values, which are fixed manually, can be updated to suit the typical conditions of the mining area. The sensors will identify any changes so that proper precautions can be taken against unpredictable conditions. In case of anything hazardous, the miner is alerted by a change in the light-emitting diode colour and an alarm from the buzzer. In the event that the miners are unreachable, we have also provided a GPS module that can report the miners' location at all times.
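The threshold-based alerting summarised here can be sketched as a small decision function; the sensor names and limit values below are illustrative assumptions, not values from the paper.

```python
# Assumed per-sensor limits; in the helmet these are fixed manually and can
# be updated to suit the conditions of a particular mining area.
THRESHOLDS = {
    "gas_ppm": 300.0,       # assumed toxic-gas limit
    "temperature_c": 45.0,  # assumed safe upper temperature
    "humidity_pct": 70.0,   # assumed safe upper humidity
}

def evaluate(readings):
    """Return (led_colour, buzzer_on, violated_sensors) for one set of readings."""
    violated = [name for name, limit in THRESHOLDS.items()
                if readings.get(name, 0.0) > limit]
    if violated:
        return ("red", True, violated)   # alert: LED turns red, buzzer sounds
    return ("green", False, [])          # all readings within limits
```

For example, a reading of 50 °C with safe gas and humidity levels would trigger the red LED and buzzer with only the temperature sensor flagged.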

References

1. Borate V, Mali Y, Suryawanshi V, Singh S, Dhoke V, Kulkarni A (2023) IoT based self alert generating coal miner safety helmets. In: 2023 International conference on computational intelligence, networks and security (ICCINS), Mylavaram, India, pp 01–04. https://doi.org/10.1109/ICCINS58907.2023.10450044
388 S. Modi et al.

2. Mali YK, Darekar SA, Sopal S, Kale M, Kshatriya V, Palaskar A (2023) Fault detection of underwater cables by using robotic operating system. In: 2023 IEEE international Carnahan conference on security technology (ICCST), Pune, India, pp 1–6. https://doi.org/10.1109/ICCST59048.2023.10474270
3. Vaidya AO, Dangore M, Borate VK, Raut N, Mali YK, Chaudhari A (2024) Deep fake detection for preventing audio and video frauds using advanced deep learning techniques. In: 2024 IEEE recent advances in intelligent computational systems (RAICS), Kothamangalam, Kerala, India, pp 1–6. https://doi.org/10.1109/RAICS61201.2024.10689785
4. Bhongade A, Dargad S, Dixit A, Mali YK, Kumari B, Shende A (2024) Cyber threats in social metaverse and mitigation techniques. In: Somani AK, Mundra A, Gupta RK, Bhattacharya S, Mazumdar AP (eds) Smart systems: innovations in computing. SSIC 2023. Smart innovation, systems and technologies, vol 392. Springer, Singapore. https://doi.org/10.1007/978-981-97-3690-4_34
5. Shabina M, Sunita M, Sakshi M, Rutuja K, Rutuja J, Sampada M, Yogesh M (2024) Automated attendance monitoring system for cattle through CCTV. Revista Electronica De Veterinaria 25(1):1025–1034. https://doi.org/10.69980/redvet.v25i1.724
6. Karajgar MD et al (2024) Comparison of machine learning models for identifying malicious URLs. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–5. https://doi.org/10.1109/ICITEICS61368.2024.10625423
7. Mali Y, Pawar ME, More A, Shinde S, Borate V, Shirbhate R (2023) Improved pin entry method to prevent shoulder surfing attacks. In: 2023 14th international conference on computing communication and networking technologies (ICCCNT), Delhi, India, pp 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10306875
8. Mali YK, Mohanpurkar A (2015) Advanced pin entry method by resisting shoulder surfing attacks. In: 2015 international conference on information processing (ICIP), Pune, India, pp 37–42. https://doi.org/10.1109/INFOP.2015.7489347
9. Pawar J, Bhosle AA, Gupta P, Mehta Shiyal H, Borate VK, Mali YK (2024) Analyzing acute lymphoblastic leukemia across multiple classes using an enhanced deep convolutional neural network on blood smear. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–6. https://doi.org/10.1109/ICITEICS61368.2024.10624915
10. Naik DR, Ghonge VD, Thube SM, Khadke A, Mali YK, Borate VK (2024) Software-defined-storage performance testing using Mininet. In: 2024 IEEE international conference on information technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India, pp 1–5. https://doi.org/10.1109/ICITEICS61368.2024.10625153
11. Dangore M, Ghanashyam Chendke ASRA, Shirbhate R, Mali YK, Kisan Borate V (2024) Multi-class investigation of acute lymphoblastic leukemia using optimized deep convolutional neural network on blood smear images. In: 2024 MIT Art, Design and Technology School of Computing international conference (MITADTSoCiCon), Pune, India, pp 1–6. https://doi.org/10.1109/MITADTSoCiCon60330.2024.10575245
12. Chaudhari A et al (2024) Cyber security challenges in social meta-verse and mitigation techniques. In: 2024 MIT Art, Design and Technology School of Computing international conference (MITADTSoCiCon), Pune, India, pp 1–7. https://doi.org/10.1109/MITADTSoCiCon60330.2024.10575295
Face Cursor Movement Using OpenCV

R. S. M. Lakshmi Patibandla, Madupalli Manoj, Vantharam Sai Sushmitha Patnaik, Alapati Jagadeesh, and Bathina Sasidhar

Abstract Some individuals are unable to use computers due to medical conditions. Eye control is particularly advantageous for the advancement of natural input and, crucially, for the underprivileged and the disabled: by incorporating a controlling system, such users can operate a computer autonomously. It is especially helpful for people with disabilities, who can use a computer without a keyboard and move the cursor with their eyes. In this study, a camera is used to document eye movement.
First, find the centre of the pupil of the eye. The pointer will then travel differently
depending on the multiple variations in pupil position. All of these programmes share
the fact that keyboard and mouse input is the primary technique used while using a
personal computer. Although this wouldn’t be a problem for someone in good health,
it can be an impassable barrier for those with a restricted range of motion in their
limbs. In these situations, it would be better to employ input techniques that rely on
the brain region’s stronger capabilities, such as eye movements. A system that uses
a low-cost technique to operate a mouse pointer on a computing device was created
to allow such alternative input methods. The eye tracker uses photos captured by a
modified webcam to follow the motions of the user's eyes. These eye movements are then mapped onto the computer screen to place the mouse pointer appropriately, so the pointer position updates automatically as the eyes move. A webcam is used to photograph eye movement. The mouse cursor can be moved by moving
the face up, down, left, and right, and mouse actions may be controlled by speaking
and blinking the eyes. Several algorithms, including the Haar Cascade algorithm,
Template Matching, and Hough transformation, are utilised to carry out these tasks.
Our solution is primarily designed to enable successful computer communication
for persons with disabilities. People require artificial means of interaction, such as a virtual keyboard, for a variety of reasons; many persons, as a result of a medical condition, must move about with the aid of some object. Also, it is highly beneficial

R. S. M. Lakshmi Patibandla (B)


Department of CSE, Koneru Lakshmaiah Education Foundation, Guntur, AP, India
e-mail: [email protected]
M. Manoj · V. S. S. Patnaik · A. Jagadeesh · B. Sasidhar
Department of IT and CA, School of Computing and Informatics, VFSTR Deemed to Be
University, Guntur, AP, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 389
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_31

to incorporate a control strategy that allows them to act independently. Face control is a promising concept for the future of natural input and, more significantly, for people with disabilities. Face and eye extraction are just two of the image processing methods the system uses. A standard webcam is used to capture the input picture; the camera records the motions of the head, mouth, nose, and eyes. First, the nose tip is located at the centre of the face. Then, different variations in face position are translated into unique commands for the virtual keyboard. The signals travel via the motor driver to communicate with the virtual keyboard itself; the motor driver's control over direction and speed lets the virtual keyboard move forward, left, and right, and halt.

Keywords Eyeball movement · Face recognition · Location matching · Eye extraction

1 Introduction

Personal computers play a significant role in our daily lives nowadays, since they are used for business, education, and recreation [1]. All of these programmes share
the fact that keyboard and mouse input is the primary technique used while using
a personal computer. While this is not an issue for a healthy person, it can be an
impassable barrier for those with a restricted range of motion in their limbs. Under
these circumstances, it would be desirable to use input techniques that rely on the
region’s greater capabilities, such as eye movements [2]. A system that uses a low-
cost technique to control the mouse pointer on a computer system was created to allow
such alternative input methods. The eye tracker uses photos captured by a modified
webcam to follow the motions of the user's eyes. These eye movements are then mapped onto the computer screen to place the mouse pointer appropriately, so the pointer position updates automatically as the eyes move. A webcam is used to photograph eye movement. A significant number of people are now interested in creating
natural interactions between people and computers [3]. In universal computing, a
number of research for human–computer interfaces are presented. The vision-based
interface technology uses an input video picture to extract motion data without the need for expensive machinery. As a result, developing human–computer interaction systems using a vision-based approach is considered successful. Biometrics is
a current topic in human–computer interaction that relies on eyesight. Eye-tracking
research is distinct since it requires interactive applications. Nevertheless, to create
a vision-based multifunctional interaction between humans and computers system,
tracking the eyes and their identification are performed. Real-time eye input has been
used most commonly for disabled people who can only use their eyes for input [4, 5]. For many different reasons, people need artificial methods of interaction such as a virtual keyboard; many people need something to help them move about because of a medical issue. Incorporating a controlling mechanism that gives them the ability to act independently is therefore highly beneficial. The idea of visual control is

Fig. 1 Extraction of the


eyeball location

particularly advantageous for the growth of human inputs and, more importantly, for
the underprivileged and disabled [6].
Almost every everyday device requires manual operation and is inaccessible to those with mobility issues [7, 8]. For persons with motor impairments to participate in the information revolution, it is vital to develop alternative methods of human–computer interaction. This paper presents the development of an interface between humans and computers for people with disabilities that uses a vision-based system to recognize eyeballs and facial gestures. To exert control in a non-intrusive way, the proposed study includes face tracking, face identification, eyeblink recognition, speech recognition, and interpretation of blink sequences in real time for human–computer interaction. The goal is to interface with computers using facial expressions and eye movements rather than the typical mouse [9], making the use of computers quick and straightforward for those with physical disabilities as well as those who are handless (Fig. 1).
Eye tracking is used to examine users’ attention patterns while they are doing
tasks or to provide hands-free computer usage for those who are unable to utilise
the standard mouse and keyboard-based control inputs [10, 11]. It will become more
obvious as eye-tracking technology develops in the future because it is preferable
to employ eye-tracking instead of conventional control methods, particularly for
impaired users. Eye tracking may sometimes be used in tasks where it makes sense,
such as when a camera uses the user’s eyes to focus the lens where the user is looking
at the moment [12]. The efficacy of eye-tracking technology may also vary owing to
several reasons, including poor precision (Fig. 2).
The accuracy and error rates of the Eye Mouse algorithm on a test subject were
recorded as part of the testing process. To guarantee a more precise detection, a
static test was conducted using a set separation between the camera and the subject’s
face [13]. During testing, the user just required to shift their head and eyes, and the
developer noted the tracking window’s accuracy level. A crucial stage in the creation
of interactive Software. It is a method to deepen immersion in the virtual environ-
ment in the context of games [14]. In contrast to the realism of gadgets like head-
mounted electrodes used in games, traditional interactions with a mouse, keyboard,
or gamepad are constrained. Focusing on novel player-player interactions with the

Fig. 2 The search region


does not include the top 30%
and bottom 40%

virtual environment is an emerging trend. For instance, some methods employ a head-
piece device to track head motions [15]. We have attempted to investigate computer
vision in this research study with the overarching goal of creating a system that can
comprehend the motions of human face characteristics. The main objective motive is
in order to build a simple concept for face identification and face tracking that mimics
mouse motion [16]. We create a system that employs a camera to monitor a facial
feature, such as the nose’s tip, and utilises the movements of the detected feature
to control directly the mouse cursor on a computer [17]. Other parts of the face are
used to perform the mouse click. The head’s rotation and eye blinking are taken into
consideration while analysing facial motions. The head’s three-dimensional location
is monitored and shown on the pc screen in 2D coordinates. Blinks that are done on
purpose are noticed and taken as actions [18]. The real-time video of the individual
seated in front of the screen is how the tracker operates uniquely.

2 Literature Survey

An eye tracker is used to capture students' eye movements while they are debugging, in order to determine whether and how high-, medium-, and low-performance students behave differently throughout this process. We invited 38 students studying computer science to analyse two C programs. Sequential analysis was applied to the students' gaze paths as they followed program code to identify relevant examination sequences [19]. These noteworthy gaze-path sequences were then contrasted with those of pupils who showed different debugging abilities. According to the findings, high-performing students debugged programs in a more logical way than low-performing students, who tended to cling to a line-by-line approach and struggled to rapidly determine the higher-level logic of the program [20]. Lower-performing students would also often skip through the program's logic and go straight to particular suspicious lines to uncover defects. In order to remember information, they often had to go back to earlier statements, and they spent more time doing manual calculations.

Real-time driver distraction detection is the cornerstone of several distraction remedies and a requirement for designing a driver-centred driving assistance system. Although data-driven approaches provide promising detection performance, lowering the high cost of labelled data collection remains a particular problem. To reduce the expense of labelling training data, one study investigated semi-supervised techniques for driver attention detection in actual driving situations [21].
Feature-based and image-based methods are used for face detection, which is a
crucial task [22]. Geometric analysis is done to determine the face features’ place-
ments, areas, and separations from one another using the feature-based method [23].
The image-based approach involves scanning the target picture with a window that hunts for faces at various sizes and angles.
The face recognised by this approach is then used with template matching. Regression, Bayesian, and discriminative techniques have been employed as general methods for eye detection [24]; these methods produce results by minimising the gap between actual and estimated eye positions. The user must sit in front of a PC with a camera set upon the monitor to observe the user's eyes [25]. The PC continuously analyses the video and detects whether the user is looking at the screen, without anything being fastened to the user's head or body. To "choose" a key, the user must focus on it for a precise amount of time, whereas to "push" a key, the user needs only blink an eye [26, 27]. There is no need for a calibration process with this device, no external hardware connection is required, and entering input is the simplest part of the system.
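The dwell-to-select behaviour described above can be sketched as follows; the 800 ms dwell threshold and the sample format are assumptions for illustration, not values from the paper.

```python
def dwell_select(samples, key, dwell_ms=800):
    """Return True once the gaze rests on `key` for `dwell_ms` milliseconds.

    `samples` is a time-ordered list of (timestamp_ms, key_under_gaze) pairs,
    as a gaze tracker might emit each frame. Looking away resets the dwell
    timer, so only an uninterrupted fixation selects the key."""
    start = None
    for t, k in samples:
        if k == key:
            if start is None:
                start = t            # fixation on the key begins
            if t - start >= dwell_ms:
                return True          # dwelled long enough: select the key
        else:
            start = None             # gaze left the key: reset
    return False
```

A blink-driven "push" would then be a separate, instantaneous event layered on top of this dwell-based "choose".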
The eye provides the input to the camera. The streaming video from the camera is decomposed into frames. After receiving the frames, the system examines the lighting conditions, since cameras need sufficient illumination from external sources before eye locations can be predicted. It then learns the appearance of eyes and pseudo-eyes and treats the problem as one of attribute classification.

3 Proposed Methodology

Our proposed technique employs the OpenCV to monitor eyeball movement and
manage cursor movement on a computer. The camera picks up the movement of the
eyeball, which OpenCV analyses. This makes cursor control possible.
A notification will appear on the screen in case of errors [28]. Iris detection is performed on images from the input source, focused on the centre of the eye. Following that, a midpoint is determined by averaging the positions of the centres of the left and right eyes [29]. For face detection, the Haar cascade technique is used.
Objects are recognized using Haar cascade features [30]. Each feature considers adjacent rectangles at a certain position in the detection window. The common Haar feature for face detection consists of two adjacent rectangles located over the eye and cheek areas [31, 32]. The eyes are then detected. Individual eye blinks are used in place of left and right mouse clicks to open and close objects on the screen. To open and close this application, we utilise

Fig. 3 Block schematic

our mouths [33]. This programme begins to function when our mouth opens for the
first time, and it is shut off when our mouth opens for a second time (Fig. 3).
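The eye-centre midpoint step can be sketched as below; the helper names are ours, and Haar-cascade detection (e.g. OpenCV's `cv2.CascadeClassifier.detectMultiScale`) is assumed to have already produced `(x, y, w, h)` bounding boxes for the two eyes.

```python
def centre(box):
    """Centre of an (x, y, w, h) bounding box, e.g. one returned by a Haar cascade."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def eye_midpoint(left_eye, right_eye):
    """Midpoint between the two eye centres, used as the gaze reference point."""
    (lx, ly), (rx, ry) = centre(left_eye), centre(right_eye)
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)
```

The resulting midpoint gives a single stable reference point whose displacement can drive the cursor, rather than tracking either eye alone.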
The suggested method's objectives are as follows:
1. To create a wireless mouse control
2. To create a vision-based system
3. To combine voice and face gestures for directing movement of the mouse
4. To give instantaneous tracking of the eyes
5. To do away with the restrictions of a stationary head
Every time a single face is found, its position is computed and sent to the feature-identification algorithm. This method uses a face detection algorithm based on Haar features. The number of facial features and their initial locations are determined first, and the original configuration of the features is stored in memory [16, 34]. The difference between the current and original locations is then computed for the chosen feature, in this case the tip of the nose, and the average of all the differences is taken. Hence, as the head rolls, the tracker picks up even a small movement (Fig. 4).
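The displacement-averaging step can be sketched as a pure function; the `gain` factor is an assumed sensitivity constant, since the paper does not give one.

```python
def cursor_delta(original_pts, current_pts, gain=3.0):
    """Average displacement of tracked facial features, scaled to a cursor move.

    `original_pts` holds the stored initial (x, y) feature positions and
    `current_pts` the positions in the latest frame; averaging over all
    features smooths out jitter in any single detection."""
    dxs = [c[0] - o[0] for o, c in zip(original_pts, current_pts)]
    dys = [c[1] - o[1] for o, c in zip(original_pts, current_pts)]
    n = len(dxs)
    return (gain * sum(dxs) / n, gain * sum(dys) / n)
```

For example, if two features each drift 2 px right while one also drops 4 px, the averaged, gain-scaled delta moves the cursor 6 px right and 6 px down.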

Fig. 4 Mouse architecture

Pseudocode

Fig. 5 Extracted face,


mouth and eyes based on
facial expressions

With a 2 × 2 pixel grid pattern, the system nonetheless produced usable findings: since the total area of all four regions (the top, bottom, left, and right corners) was large, even small irregular eye movements were still detected. A 3 × 3 pixel grid design, however, yielded slightly inaccurate placement of the user's eye movements relative to where they were looking. In practice, the tracking window's region size changed significantly for every +1/−1 pixel shift. Second, erroneous pupil detection was caused by reflection from bright objects, such as a white tracking screen. A bright screen cast a reflection onto the pupil region that removed the dark areas during initialisation, which led to the problem discussed above, because finding the pupil required combining integral imaging and Haar cascade features to locate the darkest region of the eye.
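The grid mapping discussed above can be sketched as a pure function that maps a normalised pupil position inside the tracking window to a grid cell; the 2 × 2 default mirrors the configuration that produced usable findings, and the normalisation convention is an assumption.

```python
def grid_region(nx, ny, rows=2, cols=2):
    """Map a normalised pupil position (nx, ny, each in [0, 1]) within the
    tracking window to a (row, col) grid cell.

    With rows=cols=2 this yields the four coarse regions (top/bottom x
    left/right); a finer 3x3 grid trades larger cells for more precise,
    but in practice less reliable, placement."""
    col = min(int(nx * cols), cols - 1)  # clamp nx == 1.0 into the last column
    row = min(int(ny * rows), rows - 1)  # clamp ny == 1.0 into the last row
    return (row, col)
```

Larger cells tolerate the small irregular eye movements noted above, which is why the coarser 2 × 2 layout behaved more robustly than 3 × 3.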

4 Experimental Results

Our system's goal is to provide hands-free access to the mouse using voice, eye blinks, and facial expressions, and the technology delivers the desired result. The outcomes are shown for eye, mouth, and facial recognition (Figs. 5, 6 and 7).

Fig. 6 Mouse clicks and eye blinks

Fig. 7 Seeing cursor movements

5 Conclusion

As a result of the observations and data gathered, it can be said that the accuracy is respectable despite the use of a low-resolution web camera and the default pre-trained classifiers offered by OpenCV. User research revealed a good degree of precision in detecting and tracking eye movement, and the bulk of the stability assessments for the areas covered were passed. To identify dynamic movements as opposed to only static ones, the pre-defined classifiers and overall algorithm must undergo more testing and development; this will give users greater flexibility and higher precision when looking at faces and remove restrictions on detecting and tracking the pupil. This research introduces a simple and affordable optical apparatus for processing head motion to execute mouse functions. A camera, a computer, and our application software make up the system. Software analyses the camera images to identify the person's head movement, and a non-linear transformation is then applied to this head-position data to produce a matching screen position for the mouse pointer. Eye blinks are used for clicking operations. The system examines how to control mouse cursor motion by utilising a person's face, eyes, and lips. Following the identification of this problem area, comparable industrial products

were evaluated and compared, and their benefits and drawbacks examined. The apparatus was very user-friendly, particularly when used with desktop programmes. It displays speed and precision sufficient for many real-time applications and allows people with disabilities to benefit from a variety of computer tasks.
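The non-linear head-to-screen transformation mentioned above could take many forms; the sketch below uses an assumed power-law gain curve, since the paper does not specify the exact function, and the screen size and constants are illustrative.

```python
def head_to_screen(dx, dy, screen_w=1920, screen_h=1080, gain=2.0, power=1.5):
    """Map a head displacement (camera pixels relative to the rest position)
    to a clamped screen coordinate for the mouse pointer.

    The power-law curve keeps small head motions gentle near the screen
    centre while letting larger motions reach the edges quickly."""
    def warp(d, span):
        s = gain * (abs(d) ** power) * (1 if d >= 0 else -1)
        pos = span / 2 + s                    # rest position maps to the centre
        return max(0, min(span - 1, int(pos)))  # clamp to the screen bounds
    return (warp(dx, screen_w), warp(dy, screen_h))
```

With these constants, a resting head sits at the screen centre and a 100 px head shift saturates at the screen edge; tuning `gain` and `power` trades pointing precision against reachable range.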

References

1. Villanueva A, Cabeza R, Porta S (2011) Eye tracking system model with easy calibration. IEEE
2. Wankhede SS, Chhabria SA (2013) Controlling mouse motions using eye movements. IJAIEM
3. Mangaiyarkarasi M, Geetha A (2014) Cursor control system using facial expressions for
human-computer interaction. IJETCSE
4. Ohno T, Mukawa N, Kawato S (2011) Just blink your eyes: a head-free gaze tracking system.
IEEE
5. Wijesoma WS, Wee KS, Wee OC, Balasuriya AP, San KT, Soon KK. EOG based control of
mobile assistive platforms for the severely disabled. Proc IEEE Int Conf
6. Wu C-C, Hou T-Y (2015) Tracking students’ cognitive processes during program debugging—
an eye-movement approach. IEEE
7. Sung E, Wang J-G (2002) Study on eye gaze estimation. IEEE 32(3)
8. Ji Q, Zhu Z (2007) Novel eye gaze tracking techniques under natural head movement. IEEE
54(12)
9. Soukupova T, Cech J (2016) Real-time eye blink detection using facial landmarks. In: 21st
computer vision winter workshop, February 2016
10. Rosebrock A. Detect eyes, nose, lips, and jaw with dlib, OpenCV, and Python
11. Rosebrock A. Eye blink detection with OpenCV, Python, and dlib
12. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression
trees. In: CVPR
13. Zafeiriou S, Tzimiropoulos G, Pantic M (2015) The 300 videos in the wild (300-VW) facial
landmark tracking in-the-wild challenge. In: ICCV workshop
14. Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M (2013) 300 faces in-the-wild challenge: the
first facial landmark localization challenge. In: Proceedings of IEEE international conference
on computer vision (ICCV-W), 300 faces in-the-wild challenge (300-W), Sydney, Australia,
December 2013
15. Bhuyan HK, Ravi VK (2023) An integrated framework with deep learning for segmentation
and classification of cancer disease. Int J Artif Intell Tools (IJAIT) 32(02):2340002
16. Bhuyan HK, Chakraborty C, Pani SK, Ravi VK (2023) Feature and sub-feature selection for
classification using correlation coefficient and fuzzy model. IEEE Trans Eng Manag 70(5)
17. Bhuyan HK, Ravi VK (2021) Analysis of sub-feature for classification in data mining. IEEE
Trans Eng Manag
18. Bhuyan HK, Saikiran M, Tripathy M, Ravi V (2022) Wide-ranging approach-based feature
selection for classification. Multimed Tools Appl 1–28
19. Bhuyan HK, Ravi V, Brahma B, Kamila NK (2022) Disease analysis using machine learning
approaches in healthcare system. Health Technol 12(5):987–1005
20. Dontha MR, Sri Supriyanka N (2023) Image-based disease detection and classification of plant
using CNN. In: Marriwala N, Tripathi C, Jain S, Kumar D (eds) Mobile radio communications
and 5G networks. Lecture notes in networks and systems, vol 588. Springer, Singapore. https://doi.org/10.1007/978-981-19-7982-8_22
21. Pullagura L, Dontha MR, Kakumanu S (2021) Recognition of fetal heart diseases through
machine learning techniques. Ann Roman Soc Cell Biol 25(6):2601–2615. https://www.annalsofrscb.ro/index.php/journal/article/view/5873

22. Gharge T, Chitroda C, Bhagat N, Giri K. AI-smart assistant. Int Res J Eng Technol (IRJET)
6(1). e-ISSN: 2395-0056
23. Nomura K, Rikitake K, Matsumoto R (2019) Automatic whitelist generation for SQL queries
using web application tests. In: 2019 IEEE 43rd annual computer software and applications
conference
24. Dekate A, Kulkarni C, Killedar R (2016) Study of voice controlled personal assistant device.
Int J Comput Trends Technol (IJCTT) 42(1). ISSN: 2231-2803
25. Anerao R, Mehta U, Suryawanshi A. Personal assistant for user task automation. SSRG Int J
Comput Sci Eng (SSRG-IJCSE)
26. Bais H, Machkour M, Koutti L. A model of a generic natural language interface for querying
database. Int J Intell Syst Appl 8:35–44. https://doi.org/10.5815/ijisa.2016.02.05
27. Meng F, Chu WW (1999) Database query formation from natural language using semantic
modelling and statistical keyword meaning disambiguation
28. Mahmud T, Azharul Hasan KM, Ahmed M, Chak THC (2015) A rule based approach for NLP
based query processing. In: 2015 2nd International conference on electrical information and
communication technologies (EICT), Khulna
29. Mohite A, Bhojane V (2015) Natural language interface to database using modified co-
occurrence matrix technique. In: 2015 International conference on pervasive computing (ICPC),
Pune, pp 1–4
30. Ghosh PK, Dey S, Sengupta S (2014) Automatic SQL query formation from natural language
query. In: International conference on microelectronics, circuits and systems (MICRO-2014)
31. Solangi YA, Solangi ZA, Aarain S, Abro A, Mallah GA, Shah A (2018) Review on natural
language processing (NLP) and its toolkits for opinion mining and sentiment analysis. In: 2018
IEEE 5th International conference on engineering technologies and applied sciences (ICETAS),
Bangkok, Thailand, pp 1–4
32. Bhuyan HK, Chakraborty C (2022) Explainable machine learning for data extraction across
computational social system. IEEE Trans Comput Soc Syst 1–15
33. Huang B, Zhang G, Sheu PC (2008) A natural language database interface based on a prob-
abilistic context free grammar. In: IEEE International workshop on semantic computing and
systems, Huangshan, pp 155–162
34. Uma M, Sneha V, Sneha G, Bhuvana J, Bharathi B (2019) Formation of SQL from natural
language query using NLP. In: 2019 International conference on computational intelligence
in data science (ICCIDS), Chennai, India, pp 1–5. https://doi.org/10.1109/ICCIDS.2019.8862080
Powerpoint Slide Presentation Control
Based on Hand Gesture

Ankit Kumar, Kamred Udham Singh, Gaurav Kumar, Teekam Singh, Tanupriya Choudhury, and Ketan Kotecha

Abstract HCI, which stands for “human–computer interaction,” has been increas-
ingly concerned with natural interaction methods in recent years. Various ways we
interact with computers have benefited from the development of real-time hand
gesture recognition programs. The detection of hand motions calls for the use of
a camera. The primary form of participation is the employment of a web camera
as a virtual human–computer interaction device. In this body of work, we look into
how present-day vision-based HCI methods make use of hand gestures. This project becomes incredibly useful when users are unable to use or touch any input device.
an activity without physically accessing the usual input devices (mouse, keyboard,
etc.). The user may doodle with his index finger and use his index and middle fingers
together to control the pointer’s movement on the screen. The user may erase their
artwork using the tips of their index, middle, and ring fingers. Using their little finger,

A. Kumar · G. Kumar
Department of Computer Engineering and Applications, GLA University, Mathura, UP, India
e-mail: [email protected]
G. Kumar
e-mail: [email protected]
K. U. Singh (B)
School of Computing, Graphic Era Hill University, Dehradun, India
e-mail: [email protected]
T. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to Be University,
Dehradun, India
e-mail: [email protected]
T. Choudhury (B)
School of Computer Science, University of Petroleum and Energy Studies (UPES), Dehradun
248007, Uttarakhand, India
e-mail: [email protected]
K. Kotecha
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology,
Symbiosis International University, Pune 411045, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 401
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_32

the user may go on to the next file, and by pointing the thumb to the left, they can return to the previous file.

Keywords Hand gesture · Human–computer interaction · Segmentation · Powerpoint presentation

1 Introduction

Most human–computer interaction (HCI) nowadays uses mechanical input devices, including keyboards, mice, joysticks, and gamepads. Recent interest in computational vision has grown due to its capacity to recognise human movements [1].
One of the key draws is computational vision's ability to improve 3D models. Some gesture-recognition systems require the user to wear coloured gloves or markers on their hands and fingers [2], which smooths the process: the tracked hand is rapidly and precisely located against a controlled background [3]. These two requirements, however, constrain both the user and the interface designer. Since our PowerPoint slide control application relies on bare-hand gestures, we eliminated solutions that need coloured gloves, markers, or a controlled background.
Gesture recognition has gained attention in recent years due to its growing importance. Many applications can be gesture-controlled [4]: mouse control, gaming, and more employ hand movements, while face gestures such as lip movements define language. In hand-gesture control of a robot, the robot follows human hand gestures, which a camera captures and presents to the robot. Understanding the technique that lets the robot recognise hand movements from the image is intriguing. Each gesture maps to a command, and specific instructions direct the robot's actions. Hand gestures [5–7] convey many meanings: one finger may signify stop; two, move forward; and three, four, and five, turn right, turn left, and reverse.
Gestures are recognised using gloves, markers, and other tools. Gloves or markers
may raise system costs, but gestures enhance human–machine interaction. Two
cameras are required to record a 3D hand picture and distinguish hand gestures
[8].
Hand gestures may be static or dynamic [9]. Static gestures involve a still hand: the system can carry out the user-selected function from a fixed finger pose. Dynamic gestures are identified by hand movement; for example, a rightward motion in a hand-controlled media player such as VLC may increase the volume.
Applications may save photos in databases using hand motion detection. These
apps may require a complex algorithm to match database photos with camera
photos and perform the correct actions. Since such software stores movements in
its databases, users should be familiar with them before using them. We designed a
Python-based Hand Gesture Recognition System for Slide Show Navigation without
database image storage.
This study introduces a non-invasive real-time hand tracking and gesture detection [10, 11] approach for PowerPoint slide presentation. In this configuration, the webcam takes images, which are segmented to identify the hand. The Hand Tracking Module of the computer vision library is used to recognise hand regions, determine how many fingers are up and how far apart they are, and provide hand bounding-box information. The approach assesses hand velocity and active fingers using this information, and with it we handle slide presentations.

2 Literature Review

One of the most popular and promising approaches to improving human–computer interaction is the recognition of hand movements.
Charan et al. [12] present research to translate the presenter’s spoken signal into
the text as well as control the Powerpoint presentation using hand gestures. The
Python interface that is utilised with Microsoft Powerpoint provides the presenter
with a significant amount of leeway in terms of managing the slides of Powerpoint
through the use of hand gestures.
Kumar [13] compares two feature extraction methods—hand contour and complex
moments—in an effort to recognise hand gestures. Each approach’s key benefits and
drawbacks will be emphasised to achieve this goal. An artificial neural network is
created for the task of classification using the back-propagation learning technique.
Ahamed et al. [14] operate different functions of PowerPoint slideshows by using
different hand gestures. This work makes an attempt to map minute changes in move-
ments by making use of some straightforward Python scripts that manage Power-
point slide shows. The motions were identified with the use of machine learning. The
method being offered is intended to assist presenters in delivering more successful
and interactive presentations by improving the presenters’ natural touch with the
computer. The strategy that is advocated, in particular, is more practical than using
a laser pen since the hand is clearer and has the potential to capture the attention of
the audience in a more effective manner.
Yuan et al. [15] process the captured image of the hand motion in three stages: pre-processing, feature extraction, and classification. In order
to prepare the image of the hand motion for the feature extraction stage, a number of
methods are used in the pre-processing step to separate the gesture from its context.
The form of the hand is considered in the initial approach to fixing the problems
of scale and translation. This indicates that the proposed technique has promise
as a means of recognising gestures from a variety of perspectives. The system’s
functionality has been deemed satisfactory and beneficial.
To improve the efficiency and dependability of the interface, we provide a vision-
based method for controlling the different mouse functions, including left- and right-
clicking. This study introduces a vision-based interface that can be used to control
a computer mouse through the use of two-dimensional hand gestures. The hand
motion detector is a camera and color detection-based system. The major focus of
this approach is the use of a webcam to simulate an HCI device. Each input image’s
centroid is calculated. This is the principal sensor that controls the position of the
computer pointer on the screen; however, hand motion can also affect its accuracy [16].

3 Methodology

Our gesture alphabet has five hand motions to meet the application's needs: a little finger with the other fingers split, an opened hand with fingers together, a fist, and a final gesture performed while the hand is not visible in the camera's field of view. These gestures map to starting the presentation, moving to the next slide, indicating a location, writing on the slide, and undoing an annotation. The move gesture allows left and right motions: users show the thumb to go left and the little finger to go right.
Many gesture-to-motion transitions are shown in Fig. 1. Understanding skin tone is vital, since it can cause problems for detection and gesture recognition. One of the application's hardest elements is keeping the controlling hand in the camera's field of vision without entering the capture region; this is a difficult obstacle to overcome, and user training has been shown to be effective in resolving it.

3.1 Data Collection

The procedure of collecting data for controlling a Powerpoint slide show with hand
gestures often entails capturing samples of hand motions while engaging with the
presentation software.
The following is an overview of the data collection procedure used in our research:

Fig. 1 Hand tracking and gesture recognition examples



Creating the ideal conditions: In the first step, the presentation software (such as Microsoft PowerPoint) is set up, running, and ready.
Define gesture set: Next, we determine the hand gestures that will be used to control
the presentation so that everyone is on the same page. This could contain move-
ments such as going to the next slide, the slide before that, starting or stopping the
presentation, and annotating.
Data annotation: Annotate the slides of the presentation with the hand motion labels
that match each slide. We define the hand gesture that should activate the required
action for each slide in the presentation.
Location and lighting: Check that the hand is unmistakably visible within the camera's field of view, and ensure that the lighting conditions are adequate so that shadows are kept to a minimum and the hand can be seen clearly.
Record gesture samples: While presenting the slides and carrying out the prede-
termined hand movements, we initiate the recording of video data from the
camera.
Initial data processing: The captured video data was trimmed so that it contained only the segments pertinent to each gesture.

3.2 Algorithm

The following algorithm script uses the cvzone library for hand tracking and OpenCV
for image processing. It allows one to control a presentation or slideshow by using
hand gestures.
1. Import the required libraries: cvzone.HandTrackingModule, cv2, os, and numpy.
2. Set up parameters such as webcam dimensions, frame reduction, image dimen-
sions, gesture threshold, folder path for presentation images, etc.
3. Initialize the webcam capture and set its dimensions.
4. Create a HandDetector object with a detection confidence of 0.8 and a maximum
of 1 hand.
5. Create empty lists and variables for storing images, delays, button states,
counters, drawing mode, annotation data, and image numbers.
6. Get a sorted list of image file names from the specified folder path.
7. Enter a loop to process each frame from the webcam:
• Read the current frame from the webcam and flip it horizontally.
• Load the current image from the presentation folder based on the image
number.
• Use the HandDetector to find hands and landmarks in the current frame.
• Draw a gesture threshold line on the frame.
• Check if a hand is detected:
– Get the first detected hand and its center coordinates.


– Extract landmark information and finger states from the hand.
– Convert landmark coordinates to match the dimensions of the presentation
screen.
– Check if the hand is above the gesture threshold:
If the thumb is extended while other fingers are closed, interpret it as a
“Left” gesture and decrement the image number if possible.
If the little finger is extended while other fingers are closed, interpret it
as a “Right” gesture, and increment the image number if possible.
– Check if the index finger is extended while the middle finger is closed:
Draw a circle on the current presentation image at the index finger’s
position.
– Check if the index finger is extended while other fingers are closed:
If the annotation start flag is false, create a new annotation by incre-
menting the annotation number and appending an empty list to the
annotations.
Add the index finger’s position to the current annotation.
Draw a circle on the current presentation image at the index finger’s
position.
– If the above conditions are not met, reset the annotation start flag.
– Check if the index, middle, ring, and little fingers are extended while the
thumb is closed:
If there are annotations available, remove the last annotation by popping
it from the annotations list, decrement the annotation number, and set
the button pressed flag to true.
– If no hand is detected, reset the annotation start flag.
– Iterate over the annotations and draw lines between consecutive points in
each annotation.
– Show the current presentation image and the webcam frame.
– Check if the ‘q’ key is pressed to exit the loop.
8. Release the webcam and close all windows.
The above algorithm steps are shown in a flowchart in Fig. 2.
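The finger-state checks in step 7 above can be sketched as a small pure-Python function. This is an illustrative assumption, not the authors' exact code: the list follows the cvzone fingersUp() convention of [thumb, index, middle, ring, little] with 1 meaning extended, and the action labels are our own naming.

```python
# Illustrative mapping of finger states to slide-control actions (assumed labels).
def classify_gesture(fingers):
    """Map a 5-element finger-state list [thumb, index, middle, ring, little]
    (1 = extended, 0 = closed) to a slide-control action."""
    if fingers == [1, 0, 0, 0, 0]:
        return "previous"   # thumb only: go back one slide ("Left" gesture)
    if fingers == [0, 0, 0, 0, 1]:
        return "next"       # little finger only: advance one slide ("Right" gesture)
    if fingers == [0, 1, 1, 0, 0]:
        return "pointer"    # assumed pointer pose: show a circle on the slide
    if fingers == [0, 1, 0, 0, 0]:
        return "draw"       # index only: annotate the slide
    if fingers == [0, 1, 1, 1, 1]:
        return "undo"       # four fingers up, thumb closed: remove last annotation
    return "none"           # no recognised gesture
```

In the loop above, a hand found above the gesture threshold would have its finger-state list passed through such a function on every frame.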

Fig. 2 Flowchart of the used process

3.3 Euclidean Distance

It is possible for hand gesture recognition algorithms to make use of the Euclidean
Distance in order to measure the spatial distance between important spots or land-
marks on the hand. It is feasible to detect and categorise various hand motions by
measuring the distances between specified locations in a variety of hand positions
or gestures [17]. The Euclidean distance is the one that is utilised the vast majority
of the time in the field of computer vision. It will throw away the image structures
and will be unable to portray the real relationship that exists between the images. If
there is even a slight difference between the two photos, then the Euclidean distance
between them will be significantly increased. Using the Euclidean Distance formula shown below, one may determine the length of the path that separates two locations in n-dimensional space. It is defined as the square root of the sum of the squared differences between the two points' respective coordinates:

$$\text{Euclidean Distance} = |X - Y| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$

The coordinates of the two points in n-dimensional space are represented by the
arrays X and Y, respectively. The Euclidean Distance function performs an iteration
over each coordinate, computes the squared difference, and then stores the result in a
variable called distance. In the end, the square root of the total distance that has been
amassed is calculated, and this value is what is returned as the Euclidean distance
between the points X and Y.
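Equation (1) translates directly into a few lines of Python; the helper below is a minimal sketch using only the standard library.

```python
import math

def euclidean_distance(X, Y):
    """Length of the path between two points in n-dimensional space, per
    Eq. (1): the square root of the summed squared coordinate differences."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))
```

For two 2-D hand landmarks, euclidean_distance((0, 0), (3, 4)) evaluates to 5.0.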

3.4 Bounding Box

In hand gesture recognition, a “bounding box” may be a rectangular box that entirely
encloses the hand or a hand-related ROI. The bounding box localises and isolates
the hand in an image or video frame, making hand motion analysis easier and more
accurate [18].
How bounding boxes may be used in hand gesture detection algorithms is
summarised below:
• Hand Detection: First, check for a hand in the incoming picture or video frame.
Hand detection may be done using computer vision techniques or a machine
learning model.
• Hand Localisation: After recognising a hand, localise the hand region within the frame. This may be done by taking the minimum and maximum coordinates of the identified hand landmarks, or by using segmentation methods to segment the hand area.
Using either the localised hand region or the minimum and maximum coordinates,
a bounding box is produced and used to define the area of interest. The hand or the
ROI that contains the hand is normally enclosed within the bounding box, which has
the form of a rectangle. Calculations are performed to determine the coordinates of
the top-left and bottom-right corners of the bounding box.
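The min/max construction described above can be sketched in a few lines of Python; the landmark format (a list of (x, y) pairs) is an assumption for illustration.

```python
def bounding_box(landmarks):
    """Return ((x1, y1), (x2, y2)): the top-left and bottom-right corners of
    the rectangle enclosing a set of (x, y) hand-landmark coordinates."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return (min(xs), min(ys)), (max(xs), max(ys))
```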

Uses of the bounding box in hand gesture recognition: The bounding box has a variety of applications; some common examples follow:
Classification of gestures: The area of the hand that is included within the bounding
box can be utilised as an input for gesture classification algorithms. In these algo-
rithms, features or patterns are retrieved from the hand region in order to categorise
the gesture that is being performed.
Gesture tracking: The bounding box can be used to follow the movement of the
hand or changes in hand positions across consecutive frames. This tracking is helpful
for analysing dynamic motions as well as tracking gestures over the course of time.
Gesture segmentation: Isolating the hand within the bounding box separates it from the background or other objects present in the scene, which is helpful for further examination or processing of the hand motions.
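One common way to implement the tracking use described above is intersection-over-union (IoU) between bounding boxes in consecutive frames; the sketch below is our illustrative addition, not part of the paper's system.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    A high value suggests the boxes enclose the same hand across frames."""
    # Corners of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```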

4 Results and Discussion

The first position, with the little finger extended, is for the next slide, as shown in Fig. 3.
The next position is used to point out any place on the slide; in Fig. 4, "personal computer" is pointed out.
Figure 5 shows writing at any place on the slide. Here we have put a tick mark on "hardware" and "software".
Figure 6 shows the undo position using three fingers. Here one tick mark is
removed from Fig. 5.

Fig. 3 Next slide position



Fig. 4 Pointing out position

Fig. 5 Writing on slide position

The position in Fig. 7 is for the previous slide; here we use the thumb.

Fig. 6 Undo position

Fig. 7 Previous slide position

5 Conclusion

Control of the slide show can be achieved through the use of dynamic gestures. Specific fingers, such as the little finger, index finger, and thumb, can be used to indicate the motion of the presentation slide. Because no training process is required to identify a hand gesture, there is no need to save any images in a database in order to recognise hand gestures. A method that
is based on hand segmentation, hand tracking, and gesture detection from extracted
hand characteristics has been suggested. The findings of the performance evaluation
of the system have demonstrated that the users are able to make use of this low-
cost interface to substitute more conventional interaction metaphors. The use of
hand gestures can be expanded to control real-time programmes such as Paint, PDF
Reader, and other similar programmes.

References

1. Matsuzaka Y, Yashiro R (2023) AI-based computer vision techniques and expert systems. AI
4(1):289–302
2. Shan C, Wei Y, Tan T, Ojardias F (2004) Real time hand tracking by combining particle filtering
and mean shift. In: Proceedings of the sixth IEEE automatic face and gesture recognition, FG04,
pp 229–674
3. Heap T, Hogg D (1998) Wormholes in shape space: tracking through discontinuous changes
in shape. In: Proceedings of the sixth international conference on computer vision, ICCV98,
pp 344–349
4. Jiang Y, Song L, Zhang J, Song Y, Yan M (2022) Multi-category gesture recognition modeling
based on sEMG and IMU signals. Sensors 22(15):5855
5. Kane L et al (eds) (2022) Challenges and applications for hand gesture recognition. IGI Global
6. Nigam S, Shamoon M, Dhasmana S, Choudhury T (2019) A complete study of methodology
of hand gesture recognition system for smart homes. In: 2019 International conference on
contemporary computing and informatics (IC3I), Singapore, 2019, pp 289–294. https://doi.org/10.1109/IC3I46837.2019.9055608
7. Sharma H, Choudhury T (2022) Applications of hand gesture recognition. IGI Global, pp
194–207. https://doi.org/10.4018/978-1-7998-9434-6.ch010
8. Al Farid F, Hashim N, Abdullah J, Bhuiyan MR, Isa WNSM, Uddin J, Haque MA, Husen
MN (2022) A structured and methodological review on vision-based hand gesture recognition
system. J Imag 8(6):153
9. Faisal MAA, Abir FF, Ahmed MU, Ahad MAR (2022) Exploiting domain transformation and
deep learning for hand gesture recognition using a low-cost dataglove. Sci Rep 12(1):21446
10. Tripathi KM, Kamat P, Patil S, Jayaswal R, Ahirrao S, Kotecha K (2023) Gesture-to-text
translation using SURF for Indian sign language. Appl Syst Innov 6:35. https://doi.org/10.3390/asi6020035
11. Rajalakshmi E et al (2023) Multi-semantic discriminative feature learning for sign gesture
recognition using hybrid deep neural architecture. IEEE Access 11:2226–2238. https://doi.org/10.1109/ACCESS.2022.3233671
12. Charan CS, Meenakshi K, Bhavani Reddy V, Kashyap V (2023) Controlling power-point
presentation using hand gestures in real-time. In: 2023 7th International conference on trends
in electronics and informatics (ICOEI). IEEE, pp 251–254
13. Kumar C (2022) Hill climb game play with webcam using OpenCV. Int J Res Appl Sci Eng
Technol 10(12):441–453
14. Ahamed SF, Sandeep P, Tushar P, Srithar S (2023) Efficient gesture-based presentation
controller using transfer learning algorithm. In: 2023 International conference on computer
communication and informatics (ICCCI). IEEE, pp 1–5
15. Yuan T, Song Y, Kraan GA, Goossens RHM (2022) Identify finger rotation angles with ArUco
markers and action cameras. J Comput Inf Sci Eng 1–25
16. Tripathi D, Srivastava A (2021) Production of holograms through laser-plasma interaction with
applications. Int J Adv Res 9(12):227–231
17. Xu J, Wang H, Zhang J, Cai L (2022) Robust hand gesture recognition based on RGB-D data
for natural human–computer interaction. IEEE Access 10:54549–54562
18. Dang TL, Tran SD, Nguyen TH, Kim S, Monet N (2022) An improved hand gesture recognition
system using keypoints and hand bounding boxes. Array 16:100251
SQL Queries Using Voice Commands to Be Executed

R. S. M. Lakshmi Patibandla, Sai Naga Satwika Potturi, and Namratha Bhaskaruni

Abstract Adding tables, deleting tables, updating tuples, and removing entries all require SQL queries. SQL query execution may appear straightforward, yet even a small mistake can result in serious issues. Keeping track of queries and ensuring that they are handled flawlessly is a time-consuming, exhausting procedure; without it, a bad query execution results in poor data handling and eventual data loss. Speaking a query aloud and clearly, and letting the computer handle it, is a simple alternative to typing it in. Our software accomplishes this using built-in Python methods and straightforward techniques. Furthermore, with basic knowledge of NLP methods and how they operate, the time complexity can be greatly decreased, and comparable outcomes can be achieved with little subject knowledge and high productivity.

Keywords NLP · Machine learning · Voice commands · API · Querying

1 Introduction

Python and databases are two of the most popular technologies today. This project's primary goal was to merge these two rapidly developing technologies and to carry it out in Python. Instead of typing out SQL queries, we just
dictate natural speech that is then translated into an SQL query. While data is typically
entered into phones via a keyboard, voice input has become a popular alternative. The
algorithm operates by extracting the query's important terms; after that, we create the appropriate query and run it to get the desired results. Anyone who believes voice-based input is superior to the traditional approach, or who is unfamiliar with Python or SQL syntax, can use this program. Queries can be handled more successfully
R. S. M. Lakshmi Patibandla (B)


Department of CSE, Koneru Lakshmaiah Education Foundation, Guntur, AP, India
e-mail: [email protected]
S. N. S. Potturi · N. Bhaskaruni
Department of IT and CA, School of Computing and Informatics, VFSTR Deemed to Be
University, Guntur, AP, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 413
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_33
414 R. S. M. Lakshmi Patibandla et al.

Fig. 1 Problem description

using databases and voice-based interfaces thanks to recent advancements in voice-based interfaces like Siri from Apple and Google Assistant [1–3]. Comparing this
suggested approach to traditional query structures, its main advantage is that it is
fully hands-free. Even if declarative programming is arguably more straightforward,
the move from imperative to declarative programming is still extremely difficult for
trainees. This project’s goal is to help those learning SQL queries, who aren’t highly
experienced in their execution (Fig. 1).
The primary objective of processing natural language queries is for computers to
comprehend English sentences and carry out the correct actions. Despite the challenges involved, language processing is one of the most promising and crucial domains in computer science [4–7]. Computers will possess the capability to understand
human language, provide accurate and instant language translations, and condense
information from diverse data sources according to users’ commands. These repre-
sent only a handful of the potential applications that will become feasible once natural
language processing capabilities reach their full potential.

2 Background and Related Work

2.1 Creating a Select Query in Mobile Applications

Traditionally, data is inputted into a smartphone using the keypad and a simple icon press. Individuals constantly use their smartphones to stay in touch with the outside
world. This includes using social media and other relationships in addition to phone
calls [8–12]. Therefore, we added a voice-enabled keyboard in addition to voice
search and voice commands like “Navigate to”. You can have access to a wide range
of tools and services through the Android SDK to improve the usability of your
applications for users who are deaf or visually impaired. Frameworks for speech
recognition, Text-To-Speech (TTS), and other examples are provided. Applications
for Android can use speech input and output.
Speech input and speech output using speech recognition and text-to-speech
services are made possible via the speech package included in the Android SDK [13–
15]. In Android database management, the query() function formats this automatically produced query as follows (Fig. 2):

Cursor c = db.query(table, columns, selection, selectionArgs, groupBy, having, orderBy);

Fig. 2 Querying using voice commands

2.2 Cyrus

Being declarative, SQL stands a far better chance than natural language programming of serving as the language for conceptual computing. The viability of using SQL as a backend for natural language database querying is examined, with emphasis on keyword-based search [16].
Our method dramatically reduces SQL querying effort, keyword dependency, and SQL table structure constraints. The researchers provide Cyrus, a portable voice search interface for mobile devices over relational datasets. Cyrus supports a wide range of query types suitable for a section-level database course. Moreover, Cyrus is not constrained to a predefined collection of keywords or natural language sentence structures, allows for test database customization, and is application-independent. Compared with the majority of contemporary portable and voice-enabled frameworks, its cooperative error reporting is more natural, and the iOS-based mobile platform is also more accessible [17]. Although declarative programming seems more natural, research nevertheless finds the transition from natural language to SQL extraordinarily challenging; learners frequently struggle to formulate demanding complex queries, especially those that contain nested subqueries or GROUP BY features.

3 A SQL Query Using Natural Language Based on Voice

This technique for extracting semantic data from social web sources has been dubbed
Natural Language to SQL generation. However, the challenge lies in extracting the
underlying meaning of the query. In response, Garima Singh introduced an algorithm in 2016 that converts natural language into SQL queries for relational databases, based on the three-level architecture of NLTSQLC. Nevertheless, this method often comes with the drawback of potentially computationally intensive processing of certain inputs [18]. Natural language processing is one of the key disciplines of computer science, focusing on the interactions between computers and human language; it contains some of the more appealing areas of human–computer interaction.
This integrates spoken language variations with both natural language and speech.
To retrieve information from a database, prior familiarity with database management
systems (DBMS) is necessary. A software known as a database management system
(DBMS) is utilized for the storage and administration of data in databases. In this
context, individuals lacking specialized expertise might encounter challenges when
attempting to extract information [19].
Natural language processing techniques are used to resolve this issue and make it
easier for people to engage with computers. Natural language processing has appli-
cations in a variety of industries, including tourism, where a visitor can learn about
the top attractions in a city, housing options, the best locations nearby, and more.
Our approach focuses on identifying the correct query by receiving input in the form
of speech (Fig. 3).
The spoken question is accepted as input by the system, which then sends it to a
voice recognition engine. The output of that stage is the mixed-formatted input text
query. After being extracted, the right input query is then forwarded to tokenization.
The process of breaking the statement up into its component words and storing it
in the list is known as tokenization. After storing it in the list, unwanted tokens are
eliminated. The pre-stored synonym database, which comprises the words and their
synonyms, is used to map the tokens. The text translator receives the polished text
next. A clause extractor and mapper are included in the text translator, through which an intermediate query is produced and tokens are associated with the table name and attributes.
The SQL query is the outcome of this stage. The database is used to process this
SQL query, and accurate results are shown on the interface. The command prompt
will display the SQL query [3, 20].
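Once the SQL string is produced, executing it against the database is straightforward with Python's built-in sqlite3 module; the "students" table and sample rows below are illustrative assumptions, not the paper's dataset.

```python
import sqlite3

# In-memory database with an illustrative "students" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, "Asha", 82), (2, "Ravi", 74)])

def run_query(sql):
    """Execute the SQL produced by the text translator and return all rows."""
    return conn.execute(sql).fetchall()
```

For example, run_query("SELECT name FROM students ORDER BY id;") returns [("Asha",), ("Ravi",)].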

Algorithm
Step 1: Accepting speech input from the user is the first step.
Step 2: Using a speech recognition engine, the speech is turned into text.
Step 3: Other statements are discarded in favour of the correct form of the statement,
which is kept.

Fig. 3 Diagram of database interaction using automatic speech recognition

Step 4: Tokenize the input query statement by splitting it into smaller pieces and
storing them in a list.
Step 5: Delete any tokens from the list that are not necessary, such as "the", "an", etc.
Step 6: Map the tokens to the database attributes and table name.
Step 7: Locate the tables that will include the data.
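Steps 4 through 7 can be approximated end-to-end (after the speech has been transcribed to text) in pure Python; the stop-word list, synonym map, and "students" schema below are illustrative assumptions for a SELECT-only sketch.

```python
# Minimal sketch of steps 4-7; vocabulary and schema are assumed for illustration.
STOP_WORDS = {"the", "a", "an", "of", "all", "me", "please"}
SYNONYMS = {"show": "select", "display": "select", "get": "select"}
TABLES = {"students": ["id", "name", "marks"]}

def text_to_sql(sentence):
    tokens = sentence.lower().split()                     # Step 4: tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # Step 5: drop fillers
    tokens = [SYNONYMS.get(t, t) for t in tokens]         # map via synonym database
    table = next((t for t in tokens if t in TABLES), None)  # Steps 6-7: find table
    if "select" in tokens and table:
        cols = [t for t in tokens if t in TABLES[table]]  # tokens that are attributes
        return f"SELECT {', '.join(cols) if cols else '*'} FROM {table};"
    return None
```

For example, text_to_sql("show me all the students") yields "SELECT * FROM students;".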

4 Experimental Work

4.1 Speech Recognition

The captivating field of computational linguistics advances techniques and resources that empower computers to identify spoken regional languages and convert them
into text. It is also known as “Automatic Speech Recognition” (ASR) or “Speech
To Text” (STT). This discipline amalgamates insights and research from phonetics,
computer science, and electrical engineering. Specific speech recognition systems
require a process referred to as “preparation” or “Enrollment,” where a speaker reads
text or specific vocabulary items into the system.

4.2 Speech to Text

A speech-to-text converter is a technique for converting spoken words or sounds into text; the technique is commonly known as speech recognition. The term "speech recognition" also covers speech understanding, the more comprehensive process of extracting content from speech. The names "voice recognition" or "speaker recognition" are inappropriate here, since they usually refer to the process of identifying someone by their voice. Most systems make use of an articulation model; you can concentrate on the models if you want to receive translations of the highest caliber (Figs. 4, 5, and 6).

Fig. 4 Output when instructed to show every record in the table



Fig. 5 Output when a certain circumstance occurs

Fig. 6 Only retrieving the necessary columns

5 Conclusion

The "Executing SQL Queries Using Voice Commands" project outlines a proposed API that gives users the option to speak their native language and have it translated into the appropriate SQL query. Speech inputs are widely used and getting
increasingly complex. This has altered how people live their daily lives and created
opportunities for a wide range of intriguing inventions and useful uses. This project
serves as a fundamental building block for apps that do away with the need for traditional learning and query execution in favour of speaking queries aloud and letting the system handle the conversion for the user. It uses the readings from the microphone as input and predicts the intended query from them. This article provides
a thorough analysis of the most recent developments in voice recognition and how
they might be applied to diverse projects that improve and simplify life. We lay
forth the fundamental ideas behind using voice recognition and transforming spoken
language into something that can be turned into another programming language.
Further extensions to the initial project service are possible.
This can be done by including a few more options, or by offering admin privileges through incorporating TCL and DDL commands into the programme in addition to the SELECT privilege. So far, we have only executed the SELECT query from the DML commands.

References

1. Bhuyan HK, Chakraborty C (2022) Explainable machine learning for data extraction across
computational social system. IEEE Trans Comput Soc Syst 1–15
2. Bhuyan HK, Ravi VK (2023) An integrated framework with deep learning for segmentation
and classification of cancer disease. Int J Artif Intell Tools (IJAIT) 32(02):2340002
3. Bhuyan HK, Chakraborty C, Pani SK, Ravi VK (2023) Feature and sub-feature selection for
classification using correlation coefficient and fuzzy model. IEEE Trans Eng Manag 70(5)
4. Bhuyan HK, Ravi VK (2021) Analysis of sub-feature for classification in data mining. IEEE
Trans Eng Manag
5. Bhuyan HK, Saikiran M, Tripathy M, Ravi V (2022) Wide-ranging approach-based feature
selection for classification. Multimed Tools Appl 1–28
6. Bhuyan HK, Ravi V, Brahma B, Kamila NK (2022) Disease analysis using machine learning
approaches in healthcare system. Health Technol 12(5):987–1005
7. Dontha MR, Sri Supriyanka N (2023) Image-based disease detection and classification of plant
using CNN. In: Marriwala N, Tripathi C, Jain S, Kumar D (eds) Mobile radio communications
and 5G networks. Lecture notes in networks and systems, vol 588. Springer, Singapore. https://
doi.org/10.1007/978-981-19-7982-8_22
8. Pullagura L, Dontha MR, Kakumanu S (2021) Recognition of Fetal heart diseases through
machine learning techniques. Ann Roman Soc Cell Biol 25(6):2601–2615. https://www.annalsofrscb.ro/index.php/journal/article/view/5873
9. Gharge T, Chitroda C, Bhagat N, Giri K. AI-smart assistant. Int Res J Eng Technol (IRJET)
6(1). e-ISSN: 2395-0056
10. Nomura K, Rikitake K, Matsumoto R (2019) Automatic whitelist generation for SQL queries
using web application tests. In: 2019 IEEE 43rd annual computer software and applications
conference
11. Dekate A, Kulkarni C, Killedar R (2016) Study of voice controlled personal assistant device.
Int J Comput Trends Technol (IJCTT) 42(1). ISSN: 2231-2803
12. Anerao R, Mehta U, Suryawanshi A. Personal assistant for user task automation. SSRG Int J
Comput Sci Eng (SSRG-IJCSE)
13. Bais H, Machkour M, Koutti L. A model of a generic natural language interface for querying
database. Int J Intell Syst Appl 8:35–44. https://doi.org/10.5815/ijisa.2016.02.05
14. Meng F, Chu WW (1999) Database query formation from natural language using semantic
modelling and statistical keyword meaning disambiguation
15. Mahmud T, Azharul Hasan KM, Ahmed M, Chak THC (2015) A rule based approach for NLP
based query processing. In: 2015 2nd International conference on electrical information and
communication technologies (EICT), Khulna
16. Mohite A, Bhojane V (2015) Natural language interface to database using modified co-
occurrence matrix technique. In: 2015 International conference on pervasive computing (ICPC),
Pune, pp 1–4
17. Ghosh PK, Dey S, Sengupta S (2014) Automatic SQL query formation from natural language
query. In: International conference on microelectronics, circuits and systems (MICRO-2014)

18. Solangi YA, Solangi ZA, Aarain S, Abro A, Mallah GA, Shah A (2018) Review on natural
language processing (NLP) and its toolkits for opinion mining and sentiment analysis. In: 2018
IEEE 5th International conference on engineering technologies and applied sciences (ICETAS),
Bangkok, Thailand, pp 1–4
19. Huang B, Zhang G, Sheu PC (2008) A natural language database interface based on a prob-
abilistic context free grammar. In: IEEE International workshop on semantic computing and
systems, Huangshan, pp 155–162
20. Uma M, Sneha V, Sneha G, Bhuvana J, Bharathi B (2019) Formation of SQL from natural
language query using NLP. In: 2019 International conference on computational intelligence
in data science (ICCIDS), Chennai, India, pp 1–5. https://doi.org/10.1109/ICCIDS.2019.8862080
A Compatible Model for Hybrid
Learning and Self-regulated Learning
During the COVID-19 Pandemic Using
Machine Learning Analytics

Pratya Nuankaew, Sittichai Bussaman, Patchara Nasa-Ngium,
Thapanapong Sararat, and Wongpanya S. Nuankaew

Abstract Educational models and learning styles are essential and have an evolu-
tionary necessity for the education industry. As a result, the research has identified
three objectives: (1) to study the context of hybrid learning management with self-
regulated learning style strategies during the COVID-19 pandemic, (2) to develop a
data science model for hybrid learning management with self-regulated learning
style strategies during the COVID-19 pandemic, and (3) to study the students’
learning achievements with the developed model. Data were collected from 44
higher education students who applied self-regulated learning styles in hybrid
learning situations during the COVID-19 pandemic at the School of Information and
Communication Technology, the University of Phayao. The research tool consisted of
statistical and supervised machine learning tools based on descriptive and predictive
analytics principles. The model performance evaluation employed a confusion matrix
and cross-validation techniques for testing. The research findings show that learners’
contexts in the COVID-19 pandemic have different learning behaviors and achieve-
ment styles under hybrid learning management strategies. The researchers success-
fully developed a prototype model for predicting learners’ learning achievement for
hybrid learning management with self-regulated learning style strategies. The results
of this research can further be used as a guideline for educational management in
unusual situations to improve the quality of learners and the academic industry.

Keywords Academic achievement model · Educational data mining · Hybrid
learning model · Self-regulated learning model · Student model

P. Nuankaew · W. S. Nuankaew (B)
School of Information and Communication Technology, University of Phayao, Mae Ka,
Phayao 56000, Thailand
e-mail: [email protected]
S. Bussaman · P. Nasa-Ngium · T. Sararat
Faculty of Science and Technology, Rajabhat Maha Sarakham University, Maha Sarakham 44000,
Thailand

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_34

1 Introduction

The COVID-19 pandemic has devastated agencies and organizations worldwide [1–
4], forcing educational institutions [4], including schools, colleges, universities, and
training institutes, to shut down their services. Moreover, educational organizations
have changed their operations in other ways, including limiting working time, estab-
lishing coordination procedures within the organization, and modifying the operating
model [4–6]. Related agencies have introduced measures to provide online teaching
and learning for institutions serving students in the formal education system [5, 6].
Educational organizations must use distance learning mechanisms and technologies
to support teaching and learning [4]. However, these measures affect many students,
resulting in educational inequalities, inappropriate use of school supplies, improper
learning, and increased costs for parents and instructors.
Learning behaviors and cognitive participation have been studied in online-offline
and hybrid learning environments [7–9]. The concept of mixed online and offline
learning is, therefore, referred to as the hybrid learning model. The study of the
relationship between online and offline study effectiveness is an open space for
researchers after the COVID-19 pandemic. Moreover, applying artificial intelli-
gence and machine learning technologies to improve the quality of education is of
growing interest to modern educators [10, 11]. They use big data to understand learner
behavior, predict learner achievement, recommend educational programs appropriate
to each learner's potential [10, 12], and so on. However, educators must understand
learner behavior to design an individualized learning approach.
From the significance and origin of the research, the primary purpose is to
develop a compatible model for hybrid learning and self-regulated learning during the
COVID-19 pandemic using machine learning analytics. There are three secondary
objectives. The first objective was to study the context of hybrid learning management
with self-regulated learning style strategies during the COVID-19 pandemic among
university students. The driving factor is the impact of policies and COVID-19 situ-
ations where conventional learning cannot be managed. Therefore, learning styles
must be evolved to keep pace with the times. The second objective was to develop
a data science model for hybrid learning management with self-regulated learning
style strategies during the COVID-19 pandemic. The data science model is a tool
for studying and understanding data using advanced statistical principles; it
uncovers insights in the data that can be useful in planning educational
strategies. The third objective was to study the students’ learning achievements with
the developed model. This objective aims to evaluate the model’s performance and
deploy the model to find faults or deficiencies to improve the model’s efficiency.
The research scope was to study the relationship between learners’ learning
achievement through hybrid learning management. Learners can choose any of the
two learning channels. The first channel is regular learning management, teaching in
the classroom where students and teachers have had face-to-face learning activities.
On the second channel, teachers had live broadcast learning activities from the class-
room. Students from both channels will learn simultaneously, known as synchronous.

In addition, students from both channels must undertake pre-test and post-test activ-
ities to assess the knowledge gained in each lesson. Students can choose to study in
any format without compulsion.
All learning activities consist of 15 lessons—Lesson 1: Information Technology
Overview, Lesson 2: Digital Systems Fundamentals, Lesson 3: Database System
Overview, Lesson 4: Computer Software, Lesson 5: Computer Hardware, Lesson 6:
Computer Networks and Communications, Lesson 7: Internet Technology, Lesson 8:
Social Media and Search Engine, Lesson 9: Multimedia Technology and Infographic,
Lesson 10: Knowledge Management Systems, Lesson 11: Mobile and Electronic
Commerce, Lesson 12: Impact of Information Technology, Lesson 13: Ethics and
Internet Security, Lesson 14: Contemporary Information Technology, and Lesson
15: Future Trends and Technologies.
This research furnishes researchers with methodological insights into analyzing
self-regulated learning dynamics for hybrid learning. The findings from this research
could also demonstrate and deepen understanding of the complexity and regulation
of abnormal learning situations, which positively impact students’ performance and
learning achievement.

2 Material and Methods

2.1 Population and Sample

The research population was students enrolled in the course 221101 [5] Fundamental
Information Technology in Business at the School of Information and Communica-
tion Technology, the University of Phayao, during the first academic year 2022.
Sample selection is a purposive sampling method with the consent of the learners
in the course. They were informed about the voluntary nature of learning activ-
ities. Students can learn online with Microsoft Teams and choose a face-to-face
learning style in the classroom. Apart from that, researchers have passed the process
of requesting research ethics from the University of Phayao: UP-HEC 1.3/022/65.

2.2 Research Design and Data Collection

The research design follows the self-regulated learning principles. The data collection
was a record of activities that occurred according to the self-regulated learning prin-
ciple, which was divided into four main activities: pre-test exams, post-test exams,
midterm exam, and final exam activities.
Pre-test and post-test exams are the same set of questions but were randomly
distributed during the exercise. Each test set consists of 10 multiple-choice questions
with 10 min to complete. In addition, the exam is taken online; students can
know the score immediately after the exam. During activities in each class, learners
take pre-tests and know their scores, after which learners set post-test goals.
The midterm exam activity is an activity to assess the synthesis of knowledge
acquired during the period from Week 1 to Week 8. Finally, the final exam activity
is a compilation of the knowledge gained from Week 10 to Week 16.
Of the four main activities, the researchers extracted 63 attributes from seven
categories: 15 pre-test exams, 15 post-test exams, 15 pre-test duration exams, 15
post-test duration exams, midterm exam scores, final exam scores, and learning
outcomes (grades).

2.3 Research Tools and Model Development

Research tools are divided into two parts: descriptive analytics and predictive
analytics. Descriptive analytics is a fundamental analysis that gives an overview
of the data and the relationships within it. Descriptive analytics is used to
explain what has happened in the past and may assist in decision-making.
It may use statistics such as finding proportions or percentages, measuring the data’s
central tendency, and finding the dataset’s correlation. The constituent tools in this
section are means, mode, median, maximum, minimum, standard deviation (S.D.),
and percentage.
The second research tool is predictive analytics. It serves as technology that learns
from experience or previous data to predict certain behaviors that will occur in
the future. It comprises several techniques, including advanced statistics, machine
learning, and data mining. In many areas, predictive analytics models patterns
derived from historical data to identify the opportunities or risks underlying the
many decisions made daily.
Predictive analytics research tools are separated into three phases: clustering
optimal learner behavior, constructing predictive models, and performing majority
voting. Optimum clustering aims to understand cluster learning behaviors and intra-
cluster relationships. The techniques used include K-Means and K-Medoids. K-
Means is an unsupervised learning technique that is easy to understand because it
assigns records to clusters by distance, so that data in the same group are closely
spaced [13, 14]; it uses Euclidean distance calculations to compare similarity.
K-Medoids follows the same principle as K-Means, differing in that it uses actual
data points from the collected dataset as the cluster centers [14, 15].
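A minimal sketch of K-Means with Euclidean distance, together with the average within-centroid distance used as the elbow-style criterion in Tables 3 and 4, might look as follows. This is illustrative only: the 2-D points are made up, and a real analysis would rely on a library implementation.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-Means with Euclidean distance (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids as cluster means (keep old one if a cluster empties).
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def avg_within_centroid_distance(points, k):
    """Elbow criterion: mean distance of each point to its cluster centroid."""
    centroids, clusters = kmeans(points, k)
    total = sum(
        math.dist(p, centroids[i]) for i, cl in enumerate(clusters) for p in cl
    )
    return total / len(points)

# Two well-separated groups of 2-D points; the elbow should appear near k = 2.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
for k in (1, 2, 3):
    print(k, round(avg_within_centroid_distance(data, k), 3))
```

K-Medoids differs only in the centroid-update step, which picks the actual data point minimizing total distance within the cluster instead of the mean.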
Constructing predictive models was the second phase of the process, with the
researchers specifying two model designs: a single-model analysis and an
ensemble-model analysis. The single-model analysis consists of three types of learning
tools: Decision Trees, K-Nearest Neighbors (KNN), and Naïve Bayes techniques [10,
12, 14]. The ensemble model analysis combines multiple supervised learning tools
to develop the most efficient model. The techniques used in this section include
Majority Vote, Gradient Boosted Trees, and Random Forests.
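A majority-vote combiner of the kind used in the ensemble analysis can be sketched as below. The per-sample predictions are hypothetical stand-ins for the outputs of trained base models such as a Decision Tree, KNN, and Naïve Bayes classifier; this is not the authors' code.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the predictions of several base classifiers for one sample.

    `predictions` is a list of class labels, one per base classifier.
    Ties are broken in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-sample outputs from three base models:
sample_preds = [
    ["Cluster_0", "Cluster_0", "Cluster_1"],
    ["Cluster_2", "Cluster_2", "Cluster_2"],
    ["Cluster_1", "Cluster_0", "Cluster_1"],
]
print([majority_vote(p) for p in sample_preds])
```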

Please note that researchers apply optimal clustering results with hybrid learning
styles to develop the most appropriate model for describing their findings. Every
predictive model uses the results of each clustering technique to identify each record
as the class. The most appropriate model results were used to study the context
of learners in each cluster who the researchers predicted based on actual data that
had already happened in the course 221101 [5] Fundamental Information Tech-
nology in Business at the School of Information and Communication Technology,
the University of Phayao, during the first academic year 2022.

2.4 Research Analysis and Interpretation

The results analysis aims to determine the model’s effectiveness developed from the
designed research process. The model performance testing process consists of two
parts. The first part designs the data partitioning used to develop the model and
prepare the data to test it, a method known as “the cross-validation technique”.
It divides the data into equal parts called folds (“K-Fold”), where K is the number
of groups the data are divided into. After obtaining the required number of folds,
part of the data is used to create the model, called “the training data set”, and
the remainder is used for testing, called “the testing data set”. This step works
together with “the confusion matrix” to determine the capability and potential of the
developed model.
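The K-fold partitioning described here can be sketched as a small helper (illustrative, not the authors' code; the sample count and fold count are arbitrary):

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation.

    The first n_samples % k folds receive one extra sample, so every index
    appears in exactly one test fold and the folds are as equal as possible.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 10 samples split into 3 folds -> test folds of sizes 4, 3, 3.
for train, test in k_fold_splits(10, 3):
    print(len(train), test)
```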
The confusion matrix is a cross-tabulation that summarizes actual versus predicted
classes in a single table. Its popularity stems from being elementary to create
while allowing multiple statistics to be calculated from it. The four most commonly
used classification metrics are the Accuracy, Precision, Recall, and F1-Score
indicators.
Accuracy is the overall correctness of the model, calculated by dividing the number
of correct predictions by the total number of instances.
Precision considers the predictive ability for each class. It is calculated by
dividing the number of instances correctly predicted as a class by the total number
of instances predicted as that class.
Recall summarizes how much of the actual data in each class is predicted
correctly. It is calculated by dividing the number of correctly predicted instances
by the number of instances actually present in the class.
Lastly, F1-Score is the harmonic mean of precision and recall. It is calculated
using the formula in Eq. 1.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)    (1)
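As an illustration, these four indicators can be computed directly from a confusion matrix laid out as rows = actual and columns = predicted. The 3-class matrix below is hypothetical, not the one reported in Table 7:

```python
def classification_metrics(cm, labels):
    """Accuracy plus per-class precision/recall/F1 from a confusion matrix.

    `cm[i][j]` counts samples whose actual class is labels[i] and whose
    predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    return report

# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
cm = [[8, 1, 1],
      [0, 9, 1],
      [2, 0, 8]]
print(classification_metrics(cm, ["Cluster_0", "Cluster_1", "Cluster_2"]))
```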



3 Results and Discussion

3.1 Research Results

The research results were divided into three sections corresponding to the research
objectives: the context summary of the hybrid learning management model, the
results of model development, and the results of selecting the most appropriate model.
Context Summary of Hybrid Learning Model Management
The researchers designed the learning activities according to the principle of self-
regulated learning by having the students do pre-test activities, set learning goals,
and do post-test activities. The researchers can summarize the learners’ context, as
shown in Tables 1 and 2.
Table 1 shows that learners were more engaged in post-test than in pre-test learning
activities. The researchers found that post-test scores were significantly higher
than pre-test scores, averaging 9.22 points against a pre-test average of
4.29 points. However, the average amount of time spent on post-test activities
tended to be lower, possibly because learners had learned the answers during the
learning activities.

Table 1 Data collection


Activity Pre-test Post-test
Min Max Mean S.D. A.T. Min Max Mean S.D. A.T.
Activity 01 2.00 10.00 5.95 2.34 4:18 7.00 10.00 9.58 0.84 1:38
Activity 02 2.00 10.00 5.77 1.89 4:13 6.00 10.00 9.18 1.17 3:41
Activity 03 0.00 8.00 3.90 2.18 4:08 3.00 10.00 8.82 1.75 2:27
Activity 04 2.00 10.00 4.57 1.87 3:38 2.00 10.00 8.41 1.83 3:04
Activity 05 1.00 10.00 4.64 2.21 4:20 6.00 10.00 9.13 1.21 2:09
Activity 06 0.00 8.00 4.11 2.06 3:10 6.00 10.00 9.37 1.05 2:29
Activity 07 1.00 9.00 4.49 2.01 4:10 5.00 10.00 9.58 1.06 2:07
Activity 08 1.00 7.00 4.39 1.41 4:08 6.00 10.00 9.55 1.04 2:07
Activity 09 1.00 8.00 4.68 2.11 4:09 4.00 10.00 9.18 1.45 1:54
Activity 10 0.00 10.00 3.70 2.21 4:26 8.00 10.00 9.60 0.65 2:16
Activity 11 1.00 8.00 3.71 1.47 4:18 4.00 10.00 8.84 1.69 2:55
Activity 12 0.00 10.00 3.38 2.47 3:36 4.00 10.00 9.24 1.22 2:17
Activity 13 1.00 8.00 4.68 2.11 4:09 4.00 10.00 9.18 1.45 1:54
Activity 14 1.00 6.00 2.71 1.13 3:00 4.00 10.00 9.42 1.54 2:05
Activity 15 1.00 9.00 3.67 1.71 3:30 2.00 10.00 9.15 1.53 2:43
Average 0.93 8.73 4.29 1.95 3:56 4.73 10.00 9.22 1.30 2:23
A.T. average time

Table 2 Context of learners’ achievement


Grade Score range Students Grade Score range Students
A 80.00–100.00 7 (15.91%) B+ 75.00–79.99 7 (15.91%)
B 70.00–74.99 5 (11.36%) C+ 65.00–69.99 6 (13.64%)
C 60.00–64.99 6 (13.64%) D+ 55.00–59.99 4 (9.09%)
D 50.00–54.99 4 (9.09%) F 0.00–49.99 5 (11.36%)

The researchers summarized the learning achievement as shown in Table 2. It was
found that the majority of students achieved the learning outcomes; only 11.36%,
or 5 students, did not.
From both tables, it can be concluded that the hybrid learning management was
successful: learners achieved their learning outcomes regardless of the learning
method chosen.
Model Development
Model development results were divided into two parts: the results of the
appropriate-cluster analysis, reported in Tables 3, 4, and 5, and the results of
the supervised learning techniques, both single and ensemble, reported in Table 6.

Table 3 Summary of calculation of average within centroid distance for K-Means


k-Value ACD DPP k-Value ACD DPP k-Value ACD DPP
2 54.77 4.92 7 39.52 3.74 12 30.76 2.15
3* 50.24* 4.53* 8 37.83 1.69 13 29.06 1.70
4* 47.48* 2.76* 9 35.92 1.91 14 26.68 2.38
5 44.86 2.62 10 33.90 2.02 15 25.97 0.71
6 43.26 1.60 11 32.91 0.99 16 24.58 1.39
ACD average within centroid distance, DPP distance from the previous point
Asterisks indicate the best optimal values for model deployment

Table 4 Summary of calculation of average within centroid distance for K-Medoids


k-Value ACD DPP k-Value ACD DPP k-Value ACD DPP
2 84.41 28.09 7 64.77 3.41 12 50.73 0.70
3* 79.30* 5.11* 8 62.07 2.70 13 48.52 2.21
4 75.73 3.57 9 57.93 4.14 14 46.57 1.95
5 71.27 4.45 10 55.00 2.93 15 42.77 3.80
6 68.18 3.09 11 51.43 3.57 16 42.16 0.61
ACD average within centroid distance, DPP distance from the previous point
Asterisks indicate the best optimal values for model deployment

Table 5 Members in each cluster were classified according to each technique


Cluster Items
K-Means K-Medoids
Cluster_0 10 items (22.73%) 10 items (22.73%)
Cluster_1 21 items (47.73%) 4 items (9.09%)
Cluster_2 13 items (29.55%) 30 items (68.18%)
Total 44 items (100%) 44 items (100%)

Table 6 Model analysis results


Classifier/Cluster Single model
Decision tree KNN Naïve Bayes
Prec. Recall F1 Prec. Recall F1 Prec. Recall F1
Cluster_0 66.67 40.00 50.00 76.92 100 86.96 100 19.61 32.79
Cluster_1 72.00 85.71 78.26 94.44 80.95 87.18 60.00 90.24 72.08
Cluster_2 53.85 53.85 53.85 92.31 92.31 92.31 66.67 48.00 55.81
Accuracy 66.00 88.00 62.33
Ensemble model
Majority vote GBT Random forest
Prec. Recall F1 Prec. Recall F1 Prec. Recall F1
Cluster_0 90.00 90.00 90.00 66.67 60.00 63.16 85.71 56.60 68.18
Cluster_1 91.30 100 95.45 70.83 80.95 75.56 76.92 95.24 85.11
Cluster_2 90.91 76.92 83.33 72.73 61.54 66.67 81.82 69.23 75.00
Accuracy 90.50 70.50 80.00
Prec. Precision, KNN K-Nearest Neighbors, GBT Gradient Boosted Trees

3.2 Research Discussion

Discussion of the research findings is the final step, in which the researchers
draw conclusions once the analysis and results have been obtained. This research
achieved all of its objectives, and the researchers identified three key points:
the learners' context in hybrid learning, the developed predictive models, and the
model performance classified by technique, as follows.
Learners’ Context in Hybrid Learning Style
This research found that hybrid learning management during the COVID-19
pandemic in higher education achieves particularly satisfactory levels of learning
achievement, as shown in Tables 1 and 2.
Table 1 shows the analysis of the learning behavior of learners who participated
in all 15 activities throughout the semester. The conclusion from Table 1 is that
the learners improved and were able to score well in the post-test with an average
of 9.22 points and tended to spend less time on the post-test compared to the
pre-test activities. From these findings, it can be concluded that under hybrid
learning management, choosing to study online or participating in classroom
activities produces no difference in results. It should therefore be promoted more
widely to extend learners' educational options.
Moreover, Table 2 shows that the majority of learners achieved learning achieve-
ment, and only 11.36% or five students failed in learning achievement. However, this
group of learners was clustered and analyzed for their probability of not achieving
learning success, with the resulting predictive model reported in the model
development section.
Reasonable Predictive Model
The designed model was developed in two main steps, appropriate clustering and
the construction of a model consistent with the student context, as shown in
Tables 3, 4, 5 and 6.
The researchers established a suitable cluster-analysis principle by comparing the
results of the two clustering techniques: the K-Means and K-Medoids results are in
Tables 3 and 4, and the resulting cluster memberships in Table 5. The researchers
found that the optimal number of clusters for grouping the learners was three. The
members of each cluster are distributed as shown in Table 5. Moreover, the K-Means
technique produced the more uniform member distribution. Therefore, the researchers
decided to use the K-Means technique, dividing the members into three clusters, to
develop the predictive model of learner characteristics in this research.
After obtaining a suitable cluster, the researchers developed a prediction model
classified into two types: a single model analysis and an ensemble model analysis.
The analytical results are shown in Table 6. The research results concluded that the
model that should be utilized and extended is the majority vote model, which has
an accuracy of 90.50%; detailed model performance is shown in Table 7.
Table 7 shows that the majority vote model effectively distributes the predic-
tions across all clusters. The researchers, therefore, concluded that this research was

Table 7 Majority vote model performance


Accuracy = Actual Cluster_0 Actual Cluster_1 Actual Cluster_2 Class precision
90.50% (%)
Pred. 9 0 1 90.00
Cluster_0
Pred. 0 21 2 91.30
Cluster_1
Pred. 1 0 10 90.91
Cluster_2
Class recall 90.00 100.00 76.92
(%)

successful and achieved all research objectives, and that its approach deserves to
be disseminated further.

4 Conclusion

This research studies a compatible model for hybrid learning and self-regulated
learning during the COVID-19 pandemic using the K-Means and K-Medoids tech-
niques for clustering students, the elbow technique for optimal clustering results,
and classification and ensemble techniques for creating a predictive model and the
model performance analysis. The results showed that three clusters were optimal for
grouping learners' behavior, consistent with the best-performing model, the
majority vote model, which achieved the highest validity with an accuracy of
90.50%. This helps instructors manage course activities for the learners' context
in hybrid learning management with self-regulated learning style strategies and
improve students' learning achievement. In future studies, the researchers plan to
apply the approach in another course and to study the relationship between test
score and testing time.

5 Research Limitations

The limitation of this research is that, although the researchers designed and
strictly controlled the activities, pre-test scores were not used in the analysis
of achievement or grades. As a result, a few groups of students did not cooperate
with the pre-test activities and concentrated only on the post-tests, so the
researchers had to substitute mean values for the missing data in the cluster
analysis. Future research therefore requires strategies for engaging students in
all of the designed activities.

Acknowledgements This research project was supported by the Thailand Science Research and
Innovation Fund and the University of Phayao (Grant No. FF66-UoE002). In addition, this research
was supported by many advisors, academics, researchers, students, and staff. The authors would
like to thank all of them for their support and collaboration in making this research possible.

Conflict of Interest The authors declare no conflict of interest.



References

1. Srisubat A, Thanasitthichai S, Kongsaengdao S, Maneeton N, Maneeton B, Akksilp S (2023) Effectiveness of Favipiravir monotherapy in the treatment of COVID-19: real world data analysis from Thailand. In: The Lancet regional health—Southeast Asia, vol 11, 100166. https://doi.org/10.1016/j.lansea.2023.100166
2. Schultz CM, Burke LA, Kent DA (2023) A systematic review and meta-analysis of the initial
literature regarding COVID-19 symptoms in children in the United States. J Pediatr Health
Care. https://doi.org/10.1016/j.pedhc.2023.02.006
3. Daniel, Cenggoro TW, Pardamean B (2023) A systematic literature review of machine learning
application in COVID-19 medical image classification. Procedia Comput Sci 216:749–756.
https://doi.org/10.1016/j.procs.2022.12.192
4. Abuhammad S (2020) Barriers to distance learning during the COVID-19 outbreak: a qualitative
review from parents’ perspective. Heliyon 6:e05482. https://doi.org/10.1016/j.heliyon.2020.e05482
5. Abdel-Basset M, Chang V, Nabeeh NA (2021) An intelligent framework using disruptive
technologies for COVID-19 analysis. Technol Forecast Soc Chang 163:120431. https://doi.org/10.1016/j.techfore.2020.120431
6. Almarzooq ZI, Lopes M, Kochar A (2020) Virtual learning during the COVID-19 pandemic: a
disruptive technology in graduate medical education. J Am Coll Cardiol 75:2635–2638. https://
doi.org/10.1016/j.jacc.2020.04.015
7. Amin S, Sumarmi S, Bachri S, Susilo S, Bashith A (2020) The effect of problem-based hybrid
learning (PBHL) models on spatial thinking ability and geography learning outcomes. Int J
Emerg Technol Learn 15:83–94. https://doi.org/10.3991/ijet.v15i19.15729
8. Kazu İY, Yalçın CK (2022) Investigation of the effectiveness of hybrid learning on academic
achievement: a meta-analysis study. Int J Prog Educ 18:249–265. https://doi.org/10.29329/ijpe.2022.426.14
9. Li M (2022) Learning behaviors and cognitive participation in online-offline hybrid learning
environment. Int J Emerg Technol Learn 17:146–159. https://doi.org/10.3991/ijet.v17i01.28715
10. Aldowah H, Al-Samarraie H, Fauzy WM (2019) Educational data mining and learning analytics
for 21st century higher education: a review and synthesis. Telematics Inform 37:13–49. https://
doi.org/10.1016/j.tele.2019.01.007
11. Baker RS, Martin T, Rossi LM (2016) Educational data mining and learning analytics. In: The
Wiley handbook of cognition and assessment. Wiley, pp 379–396. https://doi.org/10.1002/9781118956588.ch16
12. Nuankaew P, Nasa-Ngium P, Kunasit T, Nuankaew WS (2023) Implementation of data analytics
and machine learning in Thailand education sector. Int J Emerg Technol Learn (iJET) 18:175–
191. https://doi.org/10.3991/ijet.v18i05.36871
13. Hamerly G, Drake J (2015) Accelerating Lloyd’s algorithm for k-means clustering. In: Celebi
ME (ed) Partitional clustering algorithms. Springer International Publishing, Cham, pp 41–78.
https://doi.org/10.1007/978-3-319-09259-1_2
14. Shahiri AM, Husain W, Rashid NA (2015) A review on predicting student’s performance
using data mining techniques. Procedia Comput Sci 72:414–422. https://doi.org/10.1016/j.
procs.2015.12.157
15. Sureja N, Chawda B, Vasant A (2022) An improved K-medoids clustering approach based on
the crow search algorithm. J Comput Math Data Sci 3:100034. https://doi.org/10.1016/j.jcmds.
2022.100034
16. Yuan C, Yang H (2019) Research on K-value selection method of K-means clustering algorithm.
2:226–235. https://doi.org/10.3390/j2020016
17. Marutho D, Hendra Handaka S, Wijaya E, Muljono (2018) The determination of cluster number
at k-mean using elbow method and purity evaluation on headline news. In: 2018 International
seminar on application for technology of information and communication, pp 533–538. https://
doi.org/10.1109/ISEMANTIC.2018.8549751
Hand Gesture Recognition
and Real-Time Voice Translation
for the Deaf and Dumb

Shabina Modi, Yogesh Mali , Rekha Kotwal, Vishal Kisan Borate ,


Prajakta Khairnar, and Apashabi Pathan

Abstract People communicate in various ways, including spoken and written


language, as well as through nonverbal signals. For individuals who are deaf and
mute, sign language becomes the crucial medium of interaction. Yet, when engaging
with others unfamiliar with sign language, communication hurdles can arise, leading
to frustration and a diminished ability to convey feelings accurately. This issue
becomes even more acute during emergencies, where clear communication is vital.
To tackle this challenge, researchers have been investigating methods to convert
hand gestures into text and sound. Two key methodologies for gesture recognition are
sensor-based, which employs wearable sensors, and vision-based, which utilizes cameras.
This research concentrates on a vision-based strategy, implementing a gesture
recognition system through artificial neural networks. This technology aims to identify
hand movements, facilitating ongoing communication. Additionally, the study eval-
uates the advantages and limitations of recognizing hand gestures. Ultimately, this
research endeavors to overcome the communication obstacles encountered by sign

S. Modi
Karmaveer Bhaurao Patil College of Engineering, Satara, India
e-mail: [email protected]
Y. Mali (B) · A. Pathan
G.H. Raisoni College of Engineering and Management, Wagholi, Pune, Maharashtra, India
e-mail: [email protected]
A. Pathan
e-mail: [email protected]
R. Kotwal
JSPM’s Bhivarabai Sawant Institute of Technology and Research, Pune, India
e-mail: [email protected]
V. Kisan Borate
Dr. D. Y. Patil College of Engineering and Innovation, Talegoan, Pune, India
e-mail: [email protected]
P. Khairnar
Ajeenkya D. Y. Patil School of Engineering, Pune, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 435
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_35

language users, enhancing their capability to interact effectively, especially in critical


situations.

Keywords Hand gesture · Communication media · Pyttsx3 · Tkinter library

1 Introduction

Effective communication plays a pivotal role in the lives of all creatures, serving as the
foundation for meaningful interactions and mutual understanding among humans, as
McFarland noted [1]. It enables the establishment of relationships, fosters intimacy,
and acts as a conduit for sharing knowledge and understanding between people and
organizations. Despite its significance, an estimated 250–300 million individuals globally
live with hearing and speech impairments, according to Wikipedia. For those who are
deaf or mute, sign language stands as their primary communication method [2]. Yet,
the challenge of communicating with those who are hearing arises from the general
lack of familiarity with sign language, as highlighted by studies into sign language
translation systems. Sign language, a rich natural language, employs hand shapes,
positions, movements, and facial expressions to convey meaning, boasting its own
grammar and vocabulary akin to spoken languages [3]. Nonetheless, sign language
proficiency among hearing people is rare, and research into its translation is still
nascent in many regions.
Gesture recognition technology, crucial for bridging communication gaps,
branches into sensor-based and vision-based systems [4]. Sensor-based approaches
utilize data gloves or motion sensors to capture detailed gesture information,
providing precision but at the cost of convenience, as wearing a sensor-laden glove
can hinder natural signing flow and reduce user comfort [5]. Vision-based systems, in
contrast, rely on image processing to detect and analyze gestures, offering a less intru-
sive user experience since they require no additional wearable devices [6]. However,
this method faces its own challenges, including dealing with complex backgrounds,
variable lighting conditions, and recognizing gestures that involve more than just
hand movements.

1.1 Objectives

• To build a framework that recognizes hand gestures.
• To develop a framework that converts hand gestures into speech.
• To implement an easy-to-use and effective framework for people who are deaf and
mute, using deep learning algorithms.

2 Literature Review

Hand gesture recognition technology has emerged as a vital aid for enhancing communication
for individuals who are deaf or speech impaired, using computer vision
and machine learning algorithms to translate hand movements into speech or
text [7]. This literature review synthesizes recent research efforts aimed at
advancing hand gesture recognition to better support deaf and
speech-impaired individuals [8].

This survey examines studies that have investigated hand gesture
recognition as a way to facilitate communication for people who are deaf,
mute, or hearing impaired [9]. These studies employ a range of
machine learning techniques, including convolutional neural networks (CNN), support vector
machines (SVM), recurrent neural networks (RNN), and artificial neural networks
(ANN), to interpret hand movements for communication. Key findings,
methods, and contributions from each study are highlighted [10].

Among the notable works, one presented a hand gesture recognition system
using CNN, achieving significant accuracy in sign gesture recognition to
enable effective user communication; another investigated an interactive system combining
speech and gesture recognition through CNN [11]. Others developed an assistive
glove enhancing communication for the deaf or hard of hearing by combining
voice and gesture modalities; focused on real-time American Sign
Language (ASL) letter recognition using SVM, with promising
applications in sign language interpretation; and introduced a system
for recognizing hand gestures and converting them into speech, supporting
communication with non-signers [12].

Further studies proposed a real-time gesture recognition system using an
LSTM model for speech conversion; developed a bilingual sign recognition
system using image-based methods, enabling conversations in
sign language; and built "Easy Talk", which translates Sri Lankan Sign
Language into spoken language using artificial intelligence,
facilitating interaction with non-SLSL speakers [13]. Other work presented a CNN-based
conversation engine for those with hearing and vocal disabilities and investigated action
and sign language recognition using machine learning [14].

Further contributions from various authors include a modified model for gesture
identification, a PCA-based CNN technique for sign language
recognition, and a CNN-based framework for static hand gesture recognition
[15]. One study examined CNN-based feature fusion for recognizing dynamic sign
gestures, offering insights into feature extraction and classification for
sign language recognition [16].

The reviewed studies highlight a range of machine learning applications, from CNNs
and SVMs to RNNs and ANNs, for hand gesture recognition aimed at
helping people with hearing, speech, or language impairments [17]. These advances
signal the potential of assistive technologies to close the communication gap
for those with disabilities, highlighting the ongoing evolution of gesture
recognition for enhanced communicative interaction [18].

3 Proposed Methodology

The system acquires video frames through a webcam, using a hand-tracking
module to locate the hand within the frame. It then isolates a cropped region of
interest (ROI) surrounding the hand for further analysis.
This ROI is resized to a uniform dimension and normalized
to improve classification accuracy. The architecture of the proposed method is shown in
the schematic block diagram presented in Fig. 1.

Hand gesture recognition is carried out by a pre-trained deep
learning model exported from Teachable Machine. This model processes the resized
region of interest (ROI) and assigns a hand gesture
label from a predetermined set of labels. The recognized gesture label is then
displayed on the video frame, accompanied by a bounding box that highlights the hand's location
[19].
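The crop-resize-normalize step just described can be sketched as a small helper. This is illustrative only: the square padding, the 256-pixel target size, and the [0, 1] range are assumptions, and in the real pipeline OpenCV's cv2.resize would replace the nearest-neighbour sampling used here to keep the sketch dependency-light.

```python
import numpy as np

def preprocess_roi(frame: np.ndarray, bbox: tuple, size: int = 256) -> np.ndarray:
    """Crop the hand bounding box (x, y, w, h) from a frame, pad it to a
    square so the aspect ratio is preserved, resize it by nearest-neighbour
    sampling, and normalize pixel values to [0, 1] for the classifier."""
    x, y, w, h = bbox
    roi = frame[max(y, 0):y + h, max(x, 0):x + w]
    side = max(roi.shape[0], roi.shape[1])
    square = np.zeros((side, side, roi.shape[2]), dtype=roi.dtype)
    square[:roi.shape[0], :roi.shape[1]] = roi
    idx = np.arange(size) * side // size   # nearest-neighbour index map
    resized = square[idx][:, idx]
    return resized.astype(np.float32) / 255.0
```

In deployment, a function of this kind would run on every webcam frame before the gesture label is predicted.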
Furthermore, the system incorporates a sentence buffer designed to assemble
a series of hand gestures into a meaningful ASL sentence. This buffer is dynamically
updated with each recognized hand gesture, and the completed sentence is
displayed on the video frame. To enhance the user experience, the system includes
text-to-speech capability, converting the assembled sentence into audible
speech for auditory feedback [20].
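The sentence accumulation step described above can be sketched as a small buffer keyed on the special labels from the training set (SPACE, CLEAR, MASTER CLEAR). The pyttsx3 hookup in speak() is a hypothetical wiring for the text-to-speech stage, not the paper's published code.

```python
class SentenceBuffer:
    """Accumulates recognized gesture labels into a displayable sentence."""

    def __init__(self):
        self.text = ""

    def push(self, label: str) -> str:
        if label == "SPACE":            # word-separator gesture
            self.text += " "
        elif label == "CLEAR":          # undo the last character
            self.text = self.text[:-1]
        elif label == "MASTER CLEAR":   # reset the whole sentence
            self.text = ""
        else:                           # ordinary alphabet gesture
            self.text += label.lower()
        return self.text

def speak(sentence: str) -> None:
    """Read the finished sentence aloud (assumes pyttsx3 is installed)."""
    import pyttsx3
    engine = pyttsx3.init()
    engine.say(sentence)
    engine.runAndWait()
```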
This arrangement offers real-time visual and auditory feedback on recognized hand
gestures and their corresponding ASL sentences, enabling users to communicate through
hand gestures captured by a webcam. The system finds utility in a range
of applications, including assistive communication devices for the hearing impaired,
interactive sign language educational tools, and interfaces for
human-computer interaction [21].

Fig. 1 Block schematic of the proposed system



Fig. 2 Gestures from the training set: the alphabets a, b, c, and the spacing between words, clean, and clear all gestures

3.1 Dataset Description

In the data collection phase of our study, we leverage the OpenCV library
for image manipulation alongside the CVZone module for hand detection and gesture classification.
Our dataset comprises 27 distinct static hand gestures, with each gesture represented by
approximately 2100 images of 256 by 256 pixels. The hand detection module
uses computer vision techniques to detect and track human hands in video
feeds in real time. This module precisely determines the location of the hands
within a frame and computes their 2D key points, which include a total of 34
points covering the fingertips, the center of the palm, and wrist positions [22].
Figure 2 shows examples of some of the hand gesture classes featured in
our research, including "a, b, c, Clear, Master Clear, and Space."
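For illustration, each labeled capture could be stored on disk per class as follows; the dataset/&lt;label&gt;/&lt;index&gt;.png layout is an assumption for the sketch, since the paper does not describe its on-disk format.

```python
import os

def save_sample(image_bytes: bytes, label: str, index: int,
                root: str = "dataset") -> str:
    """Write one captured ROI under root/<label>/, so each of the 27
    gesture classes accumulates its ~2100 training images."""
    folder = os.path.join(root, label)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"{index:05d}.png")
    with open(path, "wb") as f:
        f.write(image_bytes)
    return path
```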
Choosing a suitable number of epochs for training our model is a crucial decision
in our research process. Training for more epochs can improve the model's accuracy
but may extend the training duration and raise the risk of overfitting.
Conversely, too few epochs may not allow the model to learn
sufficiently, leading to underfitting and, consequently, reduced accuracy and poor
performance. To address this, we conducted numerous trials through experimentation,
ultimately choosing to train our model in two stages, each consisting of 43 epochs. This
approach aims to balance accuracy and training time effectively, ensuring optimal
model performance [23].

3.2 Model Training

Teachable Machine, developed by Google, is an easy-to-use web application
that enables people to create custom machine learning models without any prior coding
expertise. It offers a straightforward platform for training machine learning models by providing labeled examples
of data.

The application supports the creation of three kinds of models: image classification,
audio classification, and pose classification. Image classification models can learn
to recognize different objects or patterns in images, while audio classification
models can distinguish different sounds. Pose classification models, in turn, can
recognize different body postures and gestures.

To train a model using Teachable Machine, users input labeled data
examples for the model to learn from. The program then uses machine learning
techniques to train the model on the given data. Users have the
option to test and refine the model to improve its accuracy.

Teachable Machine serves as a valuable tool for various applications,
including education, creative projects, and research. Its intuitive interface
and absence of programming prerequisites make it accessible to a
broad audience, enabling a large number of users to leverage machine learning technology effectively.

3.3 Use of Different In-built Libraries

In this work, we use the following Python libraries:
• CVZone
"CV" stands for "Computer Vision," a branch of artificial intelligence dedicated
to enabling machines to interpret and understand visual stimuli from their surroundings,
including images and videos. "CVZone" accordingly denotes a particular
domain or application within computer vision technology. Computer vision has diverse practical
applications, including image and video recognition, facial recognition, object
tracking, autonomous vehicles, and medical image analysis, among others.
• OpenCV
OpenCV is a broad open-source library covering computer vision, machine learning, and
image-processing tasks. It offers support for multiple programming languages,
including Python, C++, and Java. With its capabilities, OpenCV can analyze videos
and photos to identify objects and individuals, and even decipher human handwriting.
Integration with other libraries, such as the numerical computing
library NumPy, expands its utility manifold. By combining OpenCV with
NumPy, one can execute all operations feasible with NumPy, enhancing the arsenal
of tools at one's disposal. An OpenCV tutorial provides a thorough
understanding of image-processing principles, and through a variety of OpenCV
projects one can delve into more intricate concepts, including advanced image and
video manipulation.

• NumPy (v1.18.9)

The Python library NumPy is employed for array manipulation, offering a rich
suite of tools for matrix operations, linear algebra, and Fourier transforms. Travis
Oliphant created NumPy in 2005, and it gained prominence as an acronym
for "Numerical Python."
• TensorFlow
TensorFlow is a comprehensive open-source machine learning platform. This flexible
ecosystem furnishes a workflow with high-level APIs, alongside an extensive
array of resources, libraries, and tools. Users can select from a
diverse range of abstraction levels within the framework, facilitating the creation and
deployment of machine learning models.
• Epoch
During the training phase of a neural network model, there are multiple epochs, with
each epoch representing a complete iteration through the entire training dataset. Put
simply, an epoch signifies the number of times the model has processed the entire
training data throughout the learning procedure.
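The definition above can be made concrete with a minimal training-loop skeleton, where one epoch is exactly one complete pass over the dataset; the model update itself is a placeholder, not any particular framework's API.

```python
def train(model_step, dataset, epochs):
    """Run `epochs` full passes over `dataset`, calling `model_step`
    (the forward + backward pass in a real framework) on every batch.
    Returns the number of batches processed in each epoch."""
    history = []
    for _ in range(epochs):
        batches = 0
        for batch in dataset:
            model_step(batch)
            batches += 1
        history.append(batches)
    return history
```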
• Keras
The high-level neural network library Keras is built upon foundational frameworks
such as TensorFlow, CNTK, and Theano. With Keras, deep learning tasks can be
efficiently and swiftly executed, with seamless execution on both CPU and GPU.
Developed in the Python programming language,
known for its debugging-friendly nature and resilience, Keras offers a user-friendly
and robust environment for neural network development.
• Pyttsx3
Pyttsx3 is a Python package designed to seamlessly integrate text-to-speech functionality
into Python applications. Working across multiple platforms including Windows,
Linux, and macOS, it offers broad support for different voices and languages. With
pyttsx3, developers can easily convert text input into real-time speech output. The
library also offers a straightforward API, enabling developers to tailor
speech characteristics such as speed, volume, and pitch according to their needs.
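A small wrapper around pyttsx3's standard calls (init, setProperty, say, runAndWait) might look like this. The rate and volume defaults are arbitrary illustrative choices, and the engine parameter exists only so the sketch can be exercised without audio hardware.

```python
def speak_text(text, engine=None, rate=150, volume=0.9):
    """Convert `text` to audible speech via pyttsx3, returning the text
    so callers can also display it on screen."""
    if engine is None:
        import pyttsx3            # real dependency, imported lazily
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)      # speaking speed (words per minute)
    engine.setProperty("volume", volume)  # 0.0 (mute) to 1.0 (full)
    engine.say(text)
    engine.runAndWait()                   # block until speech finishes
    return text
```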
• Tkinter
Tkinter, a Python module, provides an array of utilities for constructing graphical
user interfaces (GUIs) within programs. It enjoys extensive usage and is conveniently
included in the standard Python distribution, making it readily available to developers.
Tkinter facilitates the incorporation of diverse widgets such as buttons, labels, and entry
fields, while enabling event-driven programming through callbacks. Renowned for
its simplicity and adaptability, Tkinter remains a favored option for crafting Python
desktop applications with intuitive GUIs.
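A minimal Tkinter sketch of the kind of window used for the GUI in Sect. 5.2; the widget names and layout are illustrative, not the paper's actual interface.

```python
def build_gui():
    """Open a window with a label for the recognized sentence and a
    button that would trigger speech output in the full system."""
    import tkinter as tk   # stdlib; imported here so the sketch reads standalone
    root = tk.Tk()
    root.title("Sign Language Translator")
    sentence = tk.StringVar(master=root, value="")
    tk.Label(root, textvariable=sentence, font=("Arial", 24)).pack(padx=20, pady=10)
    tk.Button(root, text="Speak", width=12).pack(pady=10)
    root.mainloop()        # hand control to the Tkinter event loop
```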

4 System Flowchart

See Fig. 3.

Fig. 3 Flowchart

5 Results Analysis and Calculations

A Teachable Machine model was trained on the prepared dataset for over
43 epochs, enabling the model to recognize hand gestures. The outcome of
the recognition is determined by variables such as the number of layers,
optimizers, and filters used. The coordinated workflow of the
proposed research is illustrated in Fig. 3.
Initially, we focused on our 29 classes, spanning a to x, y, z, CLEAR, MASTER
CLEAR, and SPACE. Achieving 95% accuracy for each class, we proceeded to
construct words using these classes. Alphabet characters were used to form
words, with 'Clear' correcting any misrecognized letters. This stage yielded a
remarkable accuracy of 96.98%. Subsequently, we embarked on the creation of sentences to improve
communication between people with hearing and speech impairments and those
without. Through this stage, we achieved real-time and exceptionally fast
results, facilitating seamless communication.

5.1 Predicting Alphabets

All alphabets were successfully predicted with 97.68% accuracy. The predicted
alphabetic images are depicted in Fig. 4(i), (ii).

5.2 Predicting Words on the Tkinter GUI

Utilizing the aforementioned alphabets, we formed words, employing the Python
library Tkinter to design a graphical user interface (GUI). Figure 5a, b showcases
the depiction of sign language translation and voice recognition for individuals with
hearing and speech impairments.

5.3 Making Sentences Through Hand Gestures

After completing the words, our attention shifted towards creating sentences. We
achieved real-time and optimal accuracy and efficiency in this endeavor as well.
Figure 6 showcases the results of speech conversion, sentence generation, and hand
gesture recognition.
The recommended model, using Adam as the optimizer, achieves a training
accuracy of 97.85% and a validation accuracy of 92.63%. Typically, these
networks comprise many layers and a large number of filters. Table 1 presents
statistical data on accuracy and sample size, sorted by class. Figure 7a,
b give diagrammatic representations of accuracy per epoch and loss per epoch.

Fig. 4 (i) Alphabet A hand gesture. (ii) Alphabet B hand gesture

6 Conclusions

This study presents the development and deployment of a system for American
Sign Language (ASL) recognition. ASL facilitates ease
of communication for individuals with impairments, as it allows the
direct use of alphabets. We enhanced our model by incorporating additional signs
in ASL, increasing both efficiency and accuracy. To further refine the recognition
system, it is essential to collect more datasets under varying lighting
conditions. By amassing image-based datasets, the system can effectively distinguish
static signs and undergo real-time testing. Moreover, improving the system's
accuracy involves conducting experiments in Teachable Machine, using multiple classes
containing 2100 images each across 43 epochs. Our proposed system achieves
a remarkable accuracy of 97.85% in real-time hand gesture recognition with
voice conversion.

Fig. 5 a Real-time hand gestures. b Hand gesture for word INDIA

Fig. 6 Real-time view of key points

Table 1 Accuracy per class


Class Accuracy #Samples
A 1.00 152
B 1.00 99
C 1.00 80
D 1.00 197
E 1.00 164
F 1.00 304
G 1.00 77
H 1.00 208
I 1.00 185
J 1.00 116
K 1.00 254
L 1.00 236
M 1.00 135
N 1.00 125
O 1.00 236
P 1.00 135
Q 1.00 80
R 1.00 299
S 0.99 159
T 1.00 166
U 1.00 356
V 0.99 195
W 1.00 209
SPACE 1.00 150
MASTER CLEAR 1.00 129
CLEAR 1.00 469
Y 1.00 235
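The per-class figures in Table 1 can be reproduced from raw predictions with a helper of the following kind (a sketch; the paper does not publish its evaluation code):

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return {class: (accuracy, sample_count)} computed over paired
    ground-truth and predicted labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {c: (correct[c] / total[c], total[c]) for c in total}
```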

Fig. 7 a Accuracy per epoch. b Loss per epoch



References

1. Vaidya AO, Dangore M, Borate VK, Raut N, Mali YK, Chaudhari (2024) A deep fake detection
for preventing audio and video frauds using advanced deep learning techniques. In: 2024 IEEE
Recent advances in intelligent computational systems (RAICS), Kothamangalam, Kerala, India,
pp 1–6. https://doi.org/10.1109/RAICS61201.2024.10689785
2. Karajgar MD (2024) Comparison of machine learning models for identifying malicious URLs.
In: 2024 IEEE International conference on information technology, electronics and intelligent
communication systems (ICITEICS), Bangalore, India, pp 1–5. https://doi.org/10.1109/ICITEI
CS61368.2024.10625423
3. Naik DR, Ghonge VD, Thube SM, Khadke A, Mali YK, Borate VK (2024) Software-defined-
storage performance testing using mininet. In: IEEE International conference on information
technology, electronics and intelligent communication systems (ICITEICS), Bangalore, India,
pp 1–5. https://doi.org/10.1109/ICOEI48184.2020.9143031
4. Chaudhari A (2024) Cyber security challenges in social meta-verse and mitigation tech-
niques. In: MIT art, design and technology school of computing international conference
(MITADTSoCiCon), Pune, India, pp 1–7. https://doi.org/10.1109/MITADTSoCiCon60330.
2024.10575295
5. Mali Y, Pawar ME, More A, Shinde S, Borate V, Shirbhate R (2023) Improved pin entry
method to prevent shoulder surfing attacks. In: 14th International conference on computing
communication and networking technologies (ICCCNT), Delhi, India, pp 1–6. https://doi.org/
10.1109/ICCCNT56998.2023.10306875
6. Modi S, Mali YK, Borate V, Khadke A, Mane S, Patil G (2023) Skin impedance technique to
detect hand-glove rupture. In: 2023 OITS International conference on information technology
(OCIT), Raipur, India, pp 309–313. https://doi.org/10.1109/OCIT59427.2023.10430992
7. Chaudhari A, Dargad S, Mali YK, Dhend PS, Hande VA, Bhilare SS (2023) A technique
for maintaining attribute-based privacy implementing blockchain and machine learning. In:
IEEE international Carnahan conference on security technology (ICCST), Pune, India, pp 1–4.
https://doi.org/10.1109/ICCST59048.2023.10530511
8. Mali Y, Pawar ME, More A, Shinde S, Borate V, Shirbhate R (2023) Improved pin entry method
to prevent shoulder surfing attacks. In: 2023 14th International conference on computing
communication and networking technologies (ICCCNT), Delhi, India, pp 1–6. https://doi.
org/10.1109/ICCCNT56998.2023.10306875
9. Bhongade A, Dargad S, Dixit A, Mali YK, Kumari B, Shende A (2024) Cyber threats in social
metaverse and mitigation techniques. In: Somani AK, Mundra A, Gupta RK, Bhattacharya S,
Mazumdar AP (eds) Smart systems: innovations in computing. SSIC 2023. Smart innovation,
systems and technologies, vol 392. Springer, Singapore. https://doi.org/10.1007/978-981-97-
3690-4_34
10. Mali YK, Mohanpurkar A (2014) Advanced pin entry method by resisting shoulder surfing
attacks. In: 2015 International conference on information processing (ICIP), Pune, India, pp
37–42. https://doi.org/10.1109/INFOP.2015.7489347
11. Mali Y, Chapte V (2014) Grid based authentication system. Int J Adv Res Comput Sci Manag
Stud 2(10):93–99
12. Borate V, Mali Y, Suryawanshi V, Singh S, Dhoke V, Kulkarni A (2023) IoT based self alert
generating coal miner safety helmets. In: 2023 International conference on computational
intelligence, networks and security (ICCINS), Mylavaram, India, pp 01–04. https://doi.org/10.
1109/ICCINS58907.2023.10450044
13. Mali Y, Sawant N (2023) Smart helmet for coal mining. Int J Adv Res Sci Commun Technol
(IJARSCT) 3(1). https://doi.org/10.48175/IJARSCT-8064
14. Pawar J, Bhosle AA, Gupta P, Mehta Shiyal H, Borate VK, Mali YK (2024) Analyzing acute
lymphoblastic leukemia across multiple classes using an enhanced deep convolutional neural
network on blood smear. In: IEEE International conference on information technology, elec-
tronics and intelligent communication systems (ICITEICS), Bangalore, India, pp. 1–6. https://
doi.org/10.1109/ICITEICS61368.2024.10624915

15. Lonari P, Jagdale S, Khandre S, Takale P, Mali Y (2021) Crime awareness and registration
system. Int J Sci Res Comput Sci Eng Inf Technol (IJSRCSEIT) 8(3):287–298. ISSN: 2456-
3307
16. Pathak J, Sakore N, Kapare R, Kulkarni A, Mali Y (2019) Mobile rescue robot. Int J Sci Res
Comput Sci Eng Inf Technol (IJSRCSEIT) 4(8):10–12. ISSN: 2456-3307
17. Dhote D, Rai P, Deshmukh S, Jaiswal A, Mali Y (2019) A survey: analysis and estimation of
share market scenario. Int J Sci Res Comput Sci Eng Inf Technol (IJSRCSEIT) 4(8):77–80.
ISSN: 2456-3307
18. Asreddy R, Shingade A, Vyavhare N, Rokde A, Mali Y (2019) A survey on secured data
transmission using RSA algorithm and steganography. Int J Sci Res Comput Sci Eng Inf
Technol (IJSRCSEIT) 4(8):159–162. ISSN: 2456-3307
19. Chougule S, Bhosale S, Borle V, Chaugule V, Mali Y (2020) Emotion recognition based
personal entertainment robot using ML and IP. Int J Sci Res Sci Technol (IJSRST) 5(8):73–75.
Print ISSN: 2395-6011, Online ISSN: 2395-602X
20. Lokre A, Thorat S, Patil P, Gadekar C, Mali Y (2020) Fake image and document detection using
machine learning. Int J Sci Res Sci Technol (IJSRST) 5(8):104–109. Print ISSN: 2395-6011,
Online ISSN: 2395-602X
21. Hajare R, Hodage R, Wangwad O, Mali Y, Bagwan F (2021) Data security in cloud. Int J Sci
Res Comput Sci Eng Inf Technol (IJSRCSEIT) 8(3):240–245. ISSN: 2456-3307
22. Mali Y, Upadhyay T (2023) Fraud detection in online content mining relies on the random
forest algorithm. SWB 1(3):13–20. https://doi.org/10.61925/SWB.2023.1302
23. Mali YK, Darekar SA, Sopal S, Kale M, Kshatriya V, Palaskar A (2023) Fault detection of
underwater cables by using robotic operating system. In: 2023 IEEE International Carnahan
conference on security technology (ICCST), Pune, India, pp 1–6. https://doi.org/10.1109/ICC
ST59048.2023.10474270
IoT-Based Smart EV Power Management
for Basic Life Support Transportation

M. Hema Kumar, G. Nagalali, D. Karthikeyan, C. S. Poornisha,


and S. Priyanka

Abstract Because electric vehicles offer a more sustainable, effective, and environmentally
friendly transportation option than traditional fossil-fuel-driven vehicles, they
have drawn a lot of interest in recent years. As the most significant part of the
electric car powertrain, lithium-ion batteries need precise monitoring and management.
The expense of batteries, anxiety about range, safety concerns, and reliability
are just a few of the difficulties that remain in the mass manufacture of electric
vehicles. By utilizing a powerful battery management system, these difficulties can be
considerably reduced. The battery management system is in charge of determining
the battery's state of charge, health, and remaining useful life in real time, in addition
to interfacing with other parts and subsystems of the vehicle. A high-fidelity battery
model and a precise, reliable estimation approach must cooperate under a variety of
power demands, temperatures, and stages of life for the battery management system
to successfully carry out these responsibilities. In this study, lithium-ion batteries are
considered. As they can simulate lithium diffusion, electrochemical models
are an appealing strategy for these batteries. The lithium potentials and concentrations
inside the batteries and solutions are processed and monitored. Due to
their connection to the real battery processes, electrochemical models are therefore
favoured over other modelling approaches for battery state-of-charge and state-of-health
evaluation.

M. Hema Kumar · C. S. Poornisha (B) · S. Priyanka


Sona College of Technology, Salem, India
e-mail: [email protected]
M. Hema Kumar
e-mail: [email protected]
S. Priyanka
e-mail: [email protected]
G. Nagalali
Mahendra Engineering College, Namakkal, India
e-mail: [email protected]
D. Karthikeyan
Vellore Institute of Technology, Vellore, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 451
S. Jain et al. (eds.), Semantic Intelligence, Lecture Notes in Electrical Engineering 1258,
https://doi.org/10.1007/978-981-97-7356-5_36

Keywords Battery · Electric vehicle · Cooling system

1 Introduction

Petroleum-based fuels have unquestionably risen to the top of the global trans-
portation fuel market [1–3]. However, alternative fuels and propulsion systems
that can boost efficiency and lower emissions are being pursued due to consider-
ations including the United States’ rising dependency on dwindling oil sources,
environmental concerns, CAFE (corporate average fuel economy) rules, etc. The
improvement in fuel economy has several advantages.
Despite an economy that is becoming more energy efficient, the United States
still depends on foreign oil [4]. Around 11.5 million of the 19.5 million barrels of oil
that Americans use each day are imported. Almost half of the oil used in the United
States is for vehicles and trucks.
The world’s greatest rate of carbon emissions is in the United States. A third of
them have to do with transportation. One major contributing cause to the rise in
carbon emissions is the failure of automakers to improve fuel efficiency.
Future automobiles will need to provide more power to enable technologies
like collision avoidance systems, vehicle stability control, and navigation, while
consuming less gasoline and producing fewer emissions, since carbon dioxide is
the main greenhouse gas.

2 Need for Battery Energy System

There is very high demand for energy due to global economic growth and rapid population growth. Conventional fossil fuels, such as coal and oil, are expensive and seriously pollute the environment. Meeting energy needs with renewable energy sources, together with battery storage, biomass, and co-generation, is essential for societal progress and sustainable growth [5–7]. Energy efficiency and the use of renewable energy sources are the two main tenets of a sustainable energy system.
The proposed hybrid systems therefore include battery storage. In this strategy, the battery acts as a buffer while renewable resources serve as the primary energy sources. Surplus power generated by the system charges the battery; once the battery is fully charged, additional power is diverted to the dump load. If the output of the renewable energy system is insufficient, for example because of weather conditions, the battery backup supplies electricity to meet the load demand. The individual energy sources are connected to the bus through suitable interface circuits, and the suggested hybrid system can easily be expanded when additional generation resources become available.
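The dispatch rule above (surplus generation charges the battery, overflow goes to the dump load, deficits are covered by battery backup) can be sketched as a single time-step energy balance. This is a minimal illustration under assumed names and units, not the chapter's implementation:

```python
def dispatch(generation_kw, load_kw, soc, capacity_kwh, dt_h=1.0):
    """One step of the hybrid dispatch rule described above.

    Surplus generation charges the battery; once the battery is full,
    the remainder goes to the dump load.  A deficit is covered by the
    battery backup.  All names and units are illustrative.
    """
    surplus_kwh = (generation_kw - load_kw) * dt_h
    stored_kwh = soc * capacity_kwh
    dump_kwh = 0.0
    unmet_kwh = 0.0
    if surplus_kwh >= 0:
        headroom = capacity_kwh - stored_kwh
        charged = min(surplus_kwh, headroom)
        stored_kwh += charged
        dump_kwh = surplus_kwh - charged   # excess after battery is full
    else:
        needed = -surplus_kwh
        discharged = min(needed, stored_kwh)
        stored_kwh -= discharged
        unmet_kwh = needed - discharged    # load the backup could not serve
    return stored_kwh / capacity_kwh, dump_kwh, unmet_kwh
```

With a nearly full battery and surplus generation, the overflow lands in the dump load; with an empty battery and a deficit, part of the load goes unserved.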
IoT-Based Smart EV Power Management for Basic Life Support … 453

3 Literature Survey

Transportation applications require adaptive methods for state-of-charge (SOC) prediction, because a precise, dependable, and robust estimate is needed to ensure safety and to alleviate drivers' range anxiety [8–10]. The literature separates adaptive SOC estimation approaches into three groups: fuzzy logic, artificial neural networks, and filter/observer-based techniques (such as Kalman filters). This section focuses on filter/observer-based studies.
One recommended estimation method is founded on a reduced-order electrochemical battery model. A Kalman filter is used to calculate the SOC, the terminal potentials, and the concentration gradients. Estimates are compared with experimental results for a 6 Ah electric-car battery cell. Even for small currents, the filter provides precise estimates. However, significant errors appear at very high discharge rates (C-rates) because the method does not account for sizable variations in the electrolyte concentration near the electrodes [11]. The complexity and low order of the filter model are comparable to those of other circuit-based techniques, and its computational efficiency makes the method suitable for real-time applications such as the on-board battery management system.
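As a rough illustration of this kind of filter-based estimation, the sketch below runs a scalar Kalman filter over a deliberately simplified cell model: coulomb counting as the process model and a linearised OCV plus ohmic drop as the measurement model. All parameter values (OCV slope and offset, resistance, noise covariances) are invented for the sketch and are not taken from the cited work:

```python
# Scalar Kalman filter for SOC on an illustrative linear cell model:
#   soc[k+1] = soc[k] - I*dt/Q          (coulomb counting, process model)
#   V[k]     = A*soc[k] + B - R*I[k]    (linearised OCV + ohmic drop)
Q_AS = 6 * 3600.0          # 6 Ah cell capacity in ampere-seconds
A, B, R = 0.8, 3.3, 0.05   # assumed OCV slope/offset (V) and resistance (ohm)

def kf_step(soc, P, current_a, v_meas, dt=1.0, q=1e-7, r=1e-3):
    # Predict: coulomb counting with process-noise variance q
    soc_pred = soc - current_a * dt / Q_AS
    P_pred = P + q
    # Update: compare predicted terminal voltage with the measurement
    v_pred = A * soc_pred + B - R * current_a
    K = P_pred * A / (A * A * P_pred + r)   # Kalman gain
    soc_new = soc_pred + K * (v_meas - v_pred)
    P_new = (1 - K * A) * P_pred
    return soc_new, P_new
```

Iterated over samples, the voltage correction pulls a wrong initial SOC toward the value implied by the measured terminal voltage, which is exactly the advantage such filters have over pure coulomb counting.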
An observer-based strategy with output-error injection is also advised for state estimation. In this approach, the concentrations and potentials of the solid and electrolyte phases are described by a condensed set of partial differential-algebraic equations [12]. Modeling and testing with practical operating profiles, such as the urban dynamometer driving schedule (UDDS), demonstrate the value of the suggested method. Another work introduces a technique that uses a linearized battery model to determine a battery's state of charge. To work around the nonlinear behaviour of battery models, the open-circuit-voltage/state-of-charge (OCV-SOC) relationship was divided into piecewise segments, and the model parameters were estimated for each segment separately. The resulting linear model is then used to estimate the SOC with an observer [14]. This technique has been tested using Ah lithium polymer batteries.
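The piecewise-linear OCV-SOC idea can be illustrated as follows: the nonlinear OCV curve is stored as breakpoints, and the SOC is recovered by inverting only the segment that contains the measured open-circuit voltage. The breakpoint values below are assumptions for the sketch, not the cited model's parameters:

```python
import bisect

# Illustrative OCV-SOC breakpoints (volts at 0%, 25%, 50%, 75%, 100% SOC).
SOC_PTS = [0.0, 0.25, 0.5, 0.75, 1.0]
OCV_PTS = [3.0, 3.45, 3.65, 3.80, 4.2]

def soc_from_ocv(ocv):
    """Invert the piecewise-linear OCV-SOC map, one segment at a time."""
    ocv = min(max(ocv, OCV_PTS[0]), OCV_PTS[-1])      # clamp to table range
    i = max(1, bisect.bisect_left(OCV_PTS, ocv))      # segment index
    v0, v1 = OCV_PTS[i - 1], OCV_PTS[i]
    s0, s1 = SOC_PTS[i - 1], SOC_PTS[i]
    return s0 + (ocv - v0) * (s1 - s0) / (v1 - v0)    # linear within segment
```

Because each segment is linear, the observer built on top of this map stays linear as well, which is the point of the segmentation.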
Scope of the Project
• To charge electric vehicles rapidly;
• To achieve efficiency greater than 98%;
• To provide voltage, current, and thermal sensing of the EV battery for full protection;
• To ensure constant voltage stability and improved thermal stability.

4 Proposed Method

For monitoring the SOC and SOH of the battery in the vehicle, the suggested system includes a number of sensors: a current sensor, a voltage sensor, and a temperature sensor. The voltage and current sensors continuously monitor the battery voltage and current, respectively; the current sensor can measure currents up to 5 A, while the voltage sensor can detect voltages up to 25 V DC. The temperature sensor measures the battery's temperature, from which the battery's performance is determined. The outputs of the current, voltage, and temperature sensors are analogue in nature, so they must be converted into digital form. To do this, we utilize the controller's built-in 10-bit, 13-channel ADC. A DC-DC converter is used, and a thermoelectric cooler is placed above the battery to cool it when its temperature rises above normal. An ESP8266 NodeMCU is used (Fig. 1).
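As an illustration of the sensor scaling implied above, a 10-bit ADC returns counts from 0 to 1023, which are mapped onto the stated sensor ranges (25 V and 5 A full scale); a simple threshold then decides when the thermoelectric cooler should run. The linear scaling and the 40 °C limit are assumptions for the sketch, not the system's calibrated values:

```python
# Illustrative scaling of raw 10-bit ADC counts to physical units.
ADC_MAX = 1023           # 10-bit ADC full-scale count (from the text)
V_FULL_SCALE = 25.0      # voltage-sensor range, volts (from the text)
I_FULL_SCALE = 5.0       # current-sensor range, amperes (from the text)
TEMP_LIMIT_C = 40.0      # assumed "normal" temperature ceiling

def counts_to_voltage(counts):
    return counts / ADC_MAX * V_FULL_SCALE

def counts_to_current(counts):
    return counts / ADC_MAX * I_FULL_SCALE

def cooler_needed(temp_c, limit_c=TEMP_LIMIT_C):
    # Thermoelectric cooler runs only when the battery exceeds the limit.
    return temp_c > limit_c
```

In practice the current sensor may be bidirectional with a midpoint offset; the unidirectional mapping here is a simplification.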
Advantages of the proposed system are the following:
• Electric vehicles (EVs) contribute significantly to lowering the carbon footprint caused by the transportation sector's gasoline usage.
• When an electric car is parked, the energy stored in its batteries sits idle. Making it available provides additional advantages including distributed generation, voltage control, peak shaving, and speedy charging for electric vehicles. For this, hybrid bi-directional Dual Active Bridge (DAB)-based AC-DC converters must be used to charge the batteries of electric vehicles.
• The primary subjects of this chapter are the design and implementation of a fixed, isolated, high-efficiency SiC AC-DC converter. The clamp circuit's negative effects on the input current are removed. High power density is made possible by using Silicon Carbide (SiC) transistors rather than conventional silicon devices.
• SiC devices provide lower switching losses than silicon devices and, because they operate at high temperatures, minimise the size of the filter components, allowing designers to employ a smaller heat sink and reduce the overall system cost.

Fig. 1 Proposed block diagram



5 Power Demand Management Strategies

The combination of conventional and clean energy production generated under the expected atmospheric conditions must satisfy the hydraulic and electrical requirements. The system therefore uses battery power and water reservoirs to smooth out both the power supply and the water supply, minimising shortages.
There are two methods for distributing the electricity produced by renewable sources between the electrical and hydraulic demands. The first is referred to as an "uncoupled power management technique." Using this method, the electrical and hydraulic loads are fulfilled in accordance with their needs (i.e., the demand for power and water) regardless of the quantity of intermittent power generation. For instance, when the tank is full, the need for water is satisfied by operating the motor pumps at low power (i.e., level L1 for the first motor pump and level L2 for the second motor pump). This is compatible with the traditional method of managing hydraulic loads, which only activates a pump to replenish the tank when it dips below a specific level; it is similar to a "flushing technique."
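The "flushing technique" described above is a hysteresis rule: the pump turns on when the tank level dips below a threshold and keeps running until the tank is full. A minimal sketch, with illustrative thresholds:

```python
def pump_command(level, pump_on, low=0.2, high=1.0):
    """Hysteresis ('flushing') rule for the tank pump.

    Start the pump when the normalised tank level dips below `low`,
    keep pumping until the tank reaches `high`, otherwise keep the
    previous state.  Thresholds are assumptions for the sketch.
    """
    if level <= low:
        return True
    if level >= high:
        return False
    return pump_on   # between thresholds: no change of state
```

The dead band between the thresholds prevents the pump from chattering on and off around a single trigger level.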
The intermittent nature of the renewable sources, as well as the battery capacity, places a cap on the amount of energy that is available. Two methods are therefore created to control the battery's state of charge (SOC) in accordance with the needs of the loads: whereas Strategy 1 gives the electrical load precedence over the hydraulic load, Strategy 2 gives the hydraulic load precedence. In other words, under Strategy 1, electric loads are served first, followed by the hydraulic pumps, while the opposite is true for Strategy 2. Neither strategy relies on the renewable source alone for load management; in both cases it is necessary to adjust the battery's power and energy contribution to take into account the imbalance between the source and the required loads.
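The two strategies can be sketched as a simple priority split of the available power. The function and its signature are illustrative, not the chapter's controller:

```python
def allocate(available_kw, electrical_kw, hydraulic_kw, strategy=1):
    """Split limited power between the two loads.

    Strategy 1 serves the electrical load first and gives the hydraulic
    pumps whatever remains; Strategy 2 reverses the priority.  Names and
    units are assumptions for the sketch.
    """
    first, second = ((electrical_kw, hydraulic_kw) if strategy == 1
                     else (hydraulic_kw, electrical_kw))
    to_first = min(available_kw, first)
    to_second = min(available_kw - to_first, second)
    if strategy == 1:
        return to_first, to_second    # (electrical, hydraulic) power served
    return to_second, to_first
```

With 5 kW available against 4 kW electrical and 3 kW hydraulic demand, Strategy 1 fully serves the electrical load and curtails the pumps, while Strategy 2 does the reverse.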

6 Microcontroller

Description: The ATmega16 is a low-power 8-bit CMOS microcontroller based on the AVR RISC architecture. Because each of the 32 working registers is directly connected to the ALU, two independent registers can be accessed in a single instruction executed in one clock cycle, making the architecture up to ten times faster than conventional CISC microcontrollers. The device is based on Atmel's high-density non-volatile memory technology [14]. The flash program memory can be reprogrammed in-system through the SPI serial interface, by a conventional non-volatile memory programmer, or by an on-chip boot program running on the AVR core. The boot program can download the application software through any interface; true read-while-write operation is provided because the boot flash section can continue to execute while the application flash section is updated. Combining an 8-bit RISC CPU with in-system self-programmable flash memory on a monolithic chip [15], the ATmega16 is thus a powerful microcontroller that provides an excellent and affordable solution for many control applications. The ATmega16 AVR is supported by many system- and software-development tools, including C compilers, macro assemblers, program debuggers/simulators, in-circuit emulators, and evaluation kits.

7 Results and Discussions

Because of its waveform, the main output is regarded as non-standard. With good energy-control capability, significant events can be identified; the conversion begins a little under 0.2 s after start-up. Operation can be monitored both with and without the supply, the converter, and the load connected (Figs. 2 and 3).

Fig. 2 Results

Fig. 3 Results

Fig. 4 Internal resistance versus temperature

The hybrid battery system and inverter are run for 0.6 s per cycle to measure the system's effectiveness, and the system complies with the execution requirements in this configuration. In the stand-alone hybrid battery device, a voltage sensor collaborates with a small switch to produce a reference current when a fault condition is identified.
Internal resistance increases with cell and battery ageing. To analyse whether this parameter is useful for SOH determination with on-board data, internal resistance was compared against temperature, as shown in Fig. 4. Instead of an ascending slope as the SOH decreased, this relation was observed to go down; this showed that something more relevant than ageing was forcing the behaviour, namely temperature, which is a key factor for SOH calculation.
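Internal resistance of the kind plotted in Fig. 4 is commonly estimated from a step change in current, R = -ΔV/ΔI. A minimal sketch; the sign convention and the minimum-step tolerance are assumptions:

```python
def internal_resistance(v_before, v_after, i_before, i_after):
    """Estimate ohmic internal resistance from a current step.

    R = -dV/dI with positive current meaning discharge, so the terminal
    voltage drops when the discharge current rises.  Illustrative only.
    """
    d_i = i_after - i_before
    if abs(d_i) < 1e-9:
        raise ValueError("current step too small to estimate resistance")
    return -(v_after - v_before) / d_i
```

For example, a 0.1 V drop in terminal voltage when the discharge current steps from 0 A to 2 A corresponds to roughly 50 mΩ.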

8 Conclusion

By displacing today's conventional vehicles, electric automobiles offer enormous promise for the future of transportation and communication. By reducing the greenhouse-gas emissions created by today's cars, EVs will become much more environmentally friendly and help to slow global warming. This chapter covers the various sensor-based EV configuration technologies in detail, and an in-depth discussion is provided of vehicle sensors, some of which are also present in EVs. Finally, we discussed the many kinds of microfabricated sensors that have only lately reached the market as a consequence of MEMS-based research and that may be utilized for a variety of tasks, including energy harvesting, motion detection, and battery sensing. These tiny sensors will help future automobiles save money, free up space, and enhance their detection capabilities.

References

1. Wang C, Nehrir MH (2008) Power management of a stand-alone wind/photovoltaic/fuel cell energy system. IEEE Trans Energy Convers 23(3):957–967
2. Krishnan MS, Ramkumar MS, Sownthara M (2014) Power management of hybrid renewable energy system by frequency deviation control. Int J Innov Res Sci Eng Technol 3(3):763–769
3. Krishnan MS, Ramkumar MS, Amudha A (2017) Frequency deviation control in hybrid renewable energy system using FCUC. Int J Control Theory Appl 10(2):333–344
4. Nehrir MH et al (2011) A review of hybrid renewable/alternative energy systems for electric power generation: configurations, control, and applications. IEEE Trans Sustain Energy 2(4):392–403
5. Baloch MH et al (2017) A research on electricity generation from wind corridors of Pakistan (two provinces): a technical proposal for remote zones. Sustainability 9(9):1611
6. Baloch MH et al (2015) Feasible wind power potential from coastal line of Sindh Pakistan. Res J Appl Sci Eng Technol 10(4):393–400
7. Siddique MN et al (2015) Optimal integration of hybrid (wind–solar) system with diesel power plant using HOMER. Turk J Electr Eng Comput Sci 23(6):1547–1557
8. Devarajan R, Kumaran A (2019) Stability analysis of hybrid battery and battery system with super capacitor storage. Int J Intell Adv Res Eng 7:3033–3036
9. Devarajan R, Mayilsamy A (2019) Design and implementation of neuro-fuzzy logic control based maximum power point tracking for battery system. J Appl Sci Comput 6:333–341
10. Selvam P, Venkatesan G (2019) Power quality improvement in high current with low utility voltage power generation system through IoT. J Appl Sci Comput 6(4):3641–3647. ISSN: 1076-5131
11. Akshaya V, Sivanantham S, Hema Kumar M, Velmurugan AK, Deepa K (2023) Application of convolutional neural network for cancer disease diagnosis—a deep learning based approach. J Complement Med Res 14(1):69–75. ISSN: 2146-8397
12. Nagalalli G, Ravi G (2023) A novel MegaBAT optimized intelligent intrusion detection system in wireless sensor networks. J Intell Autom Soft Comput 35(1):475–490
13. Hema Kumar M, Mohanraj V, Suresh Y, Senthilkumar J, Nagalalli G (2021) Real time two hop neighbour strategic secure routing with attribute specific blockchain encryption scheme for improved security in wireless sensor networks. Int J Comput Netw Appl 4(4). ISSN: 2395-0455
14. Hema Kumar M, Sobiya SK, Sriram M (2023) IoT enabled hand gesture controlled wheelchair for disabled people. In: 5th International conference on inventive research in computing applications (ICIRCA), 3–5 August 2023
15. Patra RK, Hema Kumar M, Srinivas K, Chandra Sekhar P, Subhashini SJ (2023) User-segregation based channel estimation in the MIMO system. Phys Commun 56. ISSN: 1874-4907
