
Lecture Notes in Electrical Engineering 749

E. S. Gopi Editor

Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication
Proceedings of MDCWC 2020
Lecture Notes in Electrical Engineering

Volume 749

Series Editors

Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán,
Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore,
Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology,
Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid,
Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität
München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA,
USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Junjie James Zhang, Charlotte, NC, USA
Yong Li, Hunan University, Changsha, Hunan, China
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in electrical engineering, quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS

For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the
Publishing Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada:
Michael Luby, Senior Editor ([email protected])
All other Countries:
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **

More information about this series at http://www.springer.com/series/7818


E. S. Gopi
Editor

Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication
Proceedings of MDCWC 2020
Editor
E. S. Gopi
Department of Electronics and Communication
Engineering
National Institute of Technology Tiruchirappalli
Tiruchirappalli, Tamil Nadu, India

ISSN 1876-1100 ISSN 1876-1119 (electronic)


Lecture Notes in Electrical Engineering
ISBN 978-981-16-0288-7 ISBN 978-981-16-0289-4 (eBook)
https://doi.org/10.1007/978-981-16-0289-4

© Springer Nature Singapore Pte Ltd. 2021


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
I dedicate this book to my mother,
the late Mrs. E. S. Meena.
Preface

Due to the feasibility of collecting huge volumes of data from mobile and wireless
networks, there are many opportunities to use machine learning, deep learning and
computational intelligence to interpret the collected data and to mine knowledge
from it. The workshop aims to consolidate experimental results on integrating
machine learning, deep learning and computational intelligence for wireless
communication and related topics. This book consists of the reviewed papers grouped
under the following topics: (a) machine learning, deep learning and computational
intelligence algorithms, (b) wireless communication systems and (c) mobile data
applications. I thank all those directly and indirectly involved in successfully
executing the online event MDCWC 2020, held from 22 to 24 October 2020.
Thanks

Tiruchirappalli, India
October 2020

E. S. Gopi
Programme Chair, MDCWC 2020

Organization

Machine Learning, Deep Learning and Computational Intelligence for Wireless
Communication (MDCWC 2020) is the first international online workshop organized
by the Pattern Recognition and Computational Intelligence Division, Department
of Electronics and Communication Engineering, National Institute of Technology
Tiruchirappalli. It was conducted entirely in virtual mode from 22 to 24
October 2020.
The keynote speakers included the following: (a) Prof. K. K. Biswas (Retired
Professor), Indian Institute of Technology Delhi, on “deep learning”; (b) Prof. Emre
Celebi, Professor and Chair of the Department of Computer Science, University of
Central Arkansas, on “data clustering and K-means algorithm”; (c) Prof. Dush Nalin
Jayakody, Professor, School of Computer Science and Robotics, National Tomsk
Polytechnic University, Russia, on “age of information and energism on the future
wireless networks”; (d) Dr. Jithin Jagannath, Director, Marconi-Rosenblatt AI/ML
Innovation Lab, New York, USA, on “how will machine learning revolutionize
wireless communication”. The invited talks included the following: (a) Dr. Lakshmanan
Nataraj, Senior Research Staff, Mayachitra Inc., Santa Barbara, CA, USA, on
“detection of GAN-generated images and deepfakes”; (b) Dr. Lalit Kumar Singh,
Scientist, NPCIL-BARC, Department of Atomic Energy, Government of India, on
“reliability analysis of safety critical systems using machine learning”; (c) Dr. Shyam
Lal, Faculty in National Institute of Technology Karnataka, on “deep learning for IoT
applications”; (d) Dr. Gaurav Purohit, Scientist, CSIR-CEERI, Pilani, on “intelligent
data analytics for prediction of air quality on pre- and post-COVID India dataset”;
(e) Mr. Abhinav K. Nair, Senior Engineer (R&D), Radisys India Pvt. Ltd., on “6G—
why should we talk about it now?”; (f) Mr. Mahammad Shaik, Qualcomm Engineer,
Hyderabad, on “computational intelligence for efficient transmission policy for
energy harvesting and spectral sensing in cognitive radio system”.
The paper review process was executed using EasyChair. All the selected papers
were presented under three different tracks, (a) machine learning, deep learning
and computational intelligence algorithms (MLDLCI), (b) wireless communication
(WC) and (c) mobile data applications (MDA).
The selected papers cover topics such as deep learning to predict the number of
antennas in a massive MIMO setup, the black widow algorithm for AVR systems, LSTM
for hotspot detection, GANs to estimate channel coefficients, self-interference
cancellation in full-duplex systems, RF-VLC underwater communication systems, and
mobile data applications such as SRGAN for super-resolution of satellite images,
glaucoma diagnosis from fundus images, hyperspectral image classification, etc.

Technical Programme Committee

Abhinav K. Nair, Radisys India Private Limited
Akhil Gupta, Lovely Professional University, Phagwara, Punjab
Anand Kulkarni, Symbiosis Institute of Technology, Pune
K. Aparna, National Institute of Technology Surathkal
K. K. Biswas, Retired Professor, Indian Institute of Technology Delhi
Deep Gupta, Visvesvaraya National Institute of Technology
Dushantha Nalin K. Jayakody, National Tomsk Polytechnic University (TPU), Russia
Emre Celebi, Professor and chair of the Department of Computer Science, University
of Central Arkansas, USA
Gaurav Purohit, Scientist, CSIR-CEERI, Pilani
Hariharan Muthusamy, NIT, Srinagar
Jithin Jagannath, Director, Marconi-Rosenblatt AI/ML Innovation Lab, New York,
USA
Lakshmanan Nataraj, Senior Research scientist, Mayachitra Deep learning solutions,
Santa Barbara, CA, USA
Lakshmi Sutha, National Institute of Technology Puducherry
Lalit Singh, Indian Institute of Technology Bhubaneswar
Mandeep Singh, National Institute of Technology Surathkal
Mohammad Shaik, Qualcomm, Hyderabad
Murugan, National Institute of Technology Silchar, Assam
A. V. Narasimhadhan, National Institute of Technology Surathkal
Rajarshi Bhattacharya, National Institute of Technology Patna
Rangababu, National Institute of Technology Meghalaya, Shillong
Sanjay Dhar Roy, National Institute of Technology Durgapur
Sankar Nair, Qualcomm, Chennai
B. Sathyabama, Thiagarajar College of Engineering, Madurai
Satyasai Nanda, Malaviya National Institute of Technology Jaipur
Shravan Kumar Bandari, National Institute of Technology Meghalaya, Shillong
Shilpi Gupta, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat
Shyam Lal, National Institute of Technology Karnataka
Shrishail Hiremath, National Institute of Technology Rourkela
Sudakar Chauhan, National Institute of Technology Kurukshetra
R. Swaminathan, Indian Institute of Technology Indore
Shweta Shah, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat
Smrithi Agarwal, Motilal Nehru National Institute of Technology Allahabad
Tajinder Singh Arora, National Institute of Technology, Uttarakhand

Umesh C. Pati, National Institute of Technology Rourkela


Vineetha Yogesh, Qualcomm, Bangalore
Wasim Arif, National Institute of Technology Silchar

Executive Committee Members

Patron
Professor Mini Shaji Thomas, Director, NITT
Co-patron
Dr. Muthuchidambaranathan, Head of the ECE Department, NITT
Coordinator and Programme Chair
Dr. E. S. Gopi, Associate Professor, Department of ECE, NITT
Co-coordinators
Dr. B. Rebekka, Assistant Professor, Department of ECE, NITT
Dr. G. Thavasi Raja, Assistant Professor, Department of ECE, NITT

Session Chairs

Anand Kulkurni
Ashish
Gaurav Purohit
Gopi E. S.
Lakshmanan Nataraj
Maheswaran
Narasimhadhan A. V.
Rebekka B.
Sathyabama
Satyasai Jagannath Nanda
Shravan Kumar Bandari
Shyam Lal
Smrithi Agarwal
Sudhakar
Sudharson
Thavasi Raja G.

Referees

Anand Kulkarni
Aparna P.
Ashish Patil
Gangadharan G. R.
Gaurav Purohit
Janet Barnabas
Koushik Guha
Lakshmanan Nataraj
Lakshmi Sutha G.
Lalit Singh
Mahammad Shaik
Maheswaran Palani
Mandeep Singh
Murugan R.
Rebekka Balakrishnan
Sanjay Dhar Roy
Sankar Nair
Sathya Bama B.
Satyasai Jagannath Nanda
Shilpi Gupta
Shravan Kumar Bandari
Shrishail Hiremath
Shweta Shah
Shyam Lal
Smriti Agarwal
Sudakar Chauhan
Sudha Vaiyamalai
Sudharsan Parthasarathy
Swaminathan Ramabadran
Thavasi Raja G.
Umesh C. Pati
Varun P. Gopi
Venkata Narasimhadhan Adapa
Vineetha Yogesh

Supporting Team Members

Rajasekharreddy Poreddy, Research scholar


G. Jaya Brindha, Research scholar
Vinodha Kamaraj, Research scholar
Contents

Machine Learning, Deep Learning and Computational Intelligence Algorithms
Deep Learning to Predict the Number of Antennas in a Massive
MIMO Setup Based on Channel Characteristics . . . . . . . . . . . . . . . . . . . . . . 3
Sharan Chandra, E. S. Gopi, Hrishikesh Shekhar, and Pranav Mani
Optimal Design of Fractional Order PID Controller for AVR
System Using Black Widow Optimization (BWO) Algorithm . . . . . . . . . . 19
Vijaya Kumar Munagala and Ravi Kumar Jatoth
LSTM Network for Hotspot Prediction in Traffic Density
of Cellular Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
S. Swedha and E. S. Gopi
Generative Adversarial Network and Reinforcement Learning
to Estimate Channel Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Pranav Mani, E. S. Gopi, Hrishikesh Shekhar, and Sharan Chandra
Novel Method of Self-interference Cancelation in Full-Duplex
Radios for 5G Wireless Technology Using Neural Networks . . . . . . . . . . . 59
L. Yashvanth, V. Dharanya, and E. S. Gopi
Dimensionality Reduction of KDD-99 Using Self-perpetuating
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Swapnil Umbarkar and Kirti Sharma
Energy-Efficient Neighbor Discovery Using Bacterial Foraging
Optimization (BFO) Algorithm for Directional Wireless Sensor
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Sagar Mekala and K. Shahu Chatrapati
Auto-encoder—LSTM-Based Outlier Detection Method for WSNs . . . . . 109
Bhanu Chander and Kumaravelan Gopalakrishnan


An Improved Swarm Optimization Algorithm-Based Harmonics
Estimation and Optimal Switching Angle Identification . . . . . . . . . . . . . 121
M. Alekhya, S. Ramyaka, N. Sambasiva Rao, and Ch. Durga Prasad
A Study on Ensemble Methods for Classification . . . . . . . . . . . . . . . . . . . . . 127
R. Harine Rajashree and M. Hariharan
An Improved Particle Swarm Optimization-Based System
Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Pasila Eswari, Y. Ramalakshmanna, and Ch. Durga Prasad
Channel Coverage Identification Conditions for Massive MIMO
Millimeter Wave at 28 and 39 GHz Using Fine K-Nearest Neighbor
Machine Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Vankayala Chethan Prakash, G. Nagarajan, and N. Priyavarthan
Flip Flop Neural Networks: Modelling Memory for Efficient
Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
S. Sujith Kumar, C. Vigneswaran, and V. Srinivasa Chakravarthy

Wireless Communication Systems
Selection Relay-Based RF-VLC Underwater Communication
System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Mohammad Furqan Ali, Tharindu D. Ponnimbaduge Perera,
Vladislav S. Sergeevich, Sheikh Arbid Irfan,
Unzhakova Ekaterina Viktorovna, Weijia Zhang,
Ândrei Camponogara, and Dushantha Nalin K. Jayakody
Circular Polarized Octal Band CPW-Fed Antenna Using Theory
of Characteristic Mode for Wireless Communication Applications . . . . . 193
Reshmi Dhara
Massive MIMO Pre-coders for Cognitive Radio Network
Performance Improvement: A Technological Survey . . . . . . . . . . . . . . . . . . 211
Mayank Kothari and U. Ragavendran
Design of MIMO Antenna Using Circular Split Ring Slot Defected
Ground Structure for ISM Band Applications . . . . . . . . . . . . . . . . . . . . . . . . 227
F. B. Shiddanagouda, R. M. Vani, and P. V. Hunagund
Performance Comparison of Arduino IDE and Runlinc IDE
for Promotion of IoT STEM AI in Education Process . . . . . . . . . . . . . . . . . 237
Sangay Chedup, Dushantha Nalin K. Jayakody, Bevek Subba,
and Hassaan Hydher
Analysis of Small Loop Antenna Using Numerical EM Technique . . . . . . 255
R. Seetharaman and Chaitanya Krishna Chevula

A Monopole Octagonal Sierpinski Carpet Antenna with Defective
Ground Structure for SWB Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
E. Aravindraj, G. Nagarajan, and R. Senthil Kumaran
DFT Spread C-DSLM for Low PAPR FBMC with OQAM Systems . . . . 281
K. Ayappasamy, G. Nagarajan, and P. Elavarasan
Secure, Efficient, Lightweight Authentication in Wireless Sensor
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
Bhanu Chander and Kumaravelan Gopalakrishnan
Performance Evaluation of Logic Gates Using Magnetic Tunnel
Junction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Jyoti Garg and Subodh Wairya
Medical IoT—Automatic Medical Dispensing Machine . . . . . . . . . . . . . . . . 323
C. V. Nisha Angeline, S. Muthuramlingam, E. Rahul Ganesh,
S. Siva Pratheep, and V. Nishanthan
Performance Analysis of Digital Modulation Formats in FSO . . . . . . . . . . 331
Monica Gautam and Sourabh Sahu
High-Level Synthesis of Cellular Automata–Belousov Zhabotinsky
Reaction in FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
P. Purushothaman, S. Srihari, and S. Deivalakshmi
IoT-Based Calling Bell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
Sundara Babu Maddu, Gaddam Venu Gopal, Ch. Lasya Sarada,
and B. Bhargavi

Mobile Data Applications
Development of an Ensemble Gradient Boosting Algorithm
for Generating Alerts About Impending Soil Movements . . . . . . . . . . . . . . 365
Ankush Pathania, Praveen Kumar, Priyanka, Aakash Maurya,
K. V. Uday, and Varun Dutt
Seam Carving Detection and Localization Using Two-Stage Deep
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Lakshmanan Nataraj, Chandrakanth Gudavalli,
Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran,
and B. S. Manjunath
A Machine Learning-Based Approach to Password Authentication
Using Keystroke Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Adesh Thakare, Shreyas Gondane, Nilesh Prasad, and Siddhant Chigale
Attention-Based SRGAN for Super Resolution of Satellite Images . . . . . . 407
D. Synthiya Vinothini and B. Sathya Bama

Detection of Acute Lymphoblastic Leukemia Using Machine
Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Pradeep Kumar Das, Ayush Pradhan, and Sukadev Meher
Computer-Aided Classifier for Identification of Renal Cystic
Abnormalities Using Bosniak Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 439
P. R. Mohammed Akhil and Menka Yadav
Recognition of Obscure Objects Using Super Resolution-Based
Generative Adversarial Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
B. Sathyabama, A. Arunesh, D. SynthiyaVinothini,
S. Anupriyadharsini, and S. Md. Mansoor Roomi
Low-Power U-Net for Semantic Image Segmentation . . . . . . . . . . . . . . . . . 473
Vennelakanti Venkata Bhargava Narendra, P. Rangababu,
and Bunil Kumar Balabantaray
Electrocardiogram Signal Classification for the Detection
of Abnormalities Using Discrete Wavelet Transform and Artificial
Neural Network Back Propagation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 493
M. Ramkumar, C. Ganesh Babu, and R. Sarath Kumar
Performance Analysis of Optimizers for Glaucoma Diagnosis
from Fundus Images Using Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . 507
Poonguzhali Elangovan and Malaya Kumar Nath
Machine Learning based Early Prediction of Disease with Risk
Factors Data of the Patient Using Support Vector Machines . . . . . . . . . . . 519
Usharani Chelladurai and Seethalakshmi Pandian
Scene Classification of Remotely Sensed Images using Ensembled
Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
P. Deepan and L. R. Sudha
Fuzziness and Vagueness in Natural Language Quantifiers:
Searching and Systemizing Few Patterns in Predicate Logic . . . . . . . . . . . 551
Harjit Singh
An Attempt on Twitter ‘likes’ Grading Strategy Using Pure
Linguistic Feature Engineering: A Novel Approach . . . . . . . . . . . . . . . . . . . 569
Lovedeep Singh and Kanishk Gautam
Groundwater Level Prediction and Correlative Study
with Groundwater Contamination Under Conditional Scenarios:
Insights from Multivariate Deep LSTM Neural Network Modeling . . . . . 579
Ahan Chatterjee, Trisha Sinha, and Rumela Mukherjee
A Novel Deep Hybrid Spectral Network for Hyperspectral Image
Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
K. Priyadharshini @ Manisha and B. Sathya Bama

Anomaly Prognostication of Retinal Fundus Images Using
EALCLAHE Enhancement and Classifying with Support Vector
Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
P. Raja Rajeswari Chandni
Analysis of Pre-earthquake Signals Using ANN: Implication
for Short-Term Earthquake Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
Ramya Jeyaraman, M. Senthil Kumar, and N. Venkatanathan
A Novel Method for Plant Leaf Disease Classification Using Deep
Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
R. Sangeetha and M. Mary Shanthi Rani
About the Editor

Dr. E. S. Gopi has authored eight books, of which seven have been published by
Springer. He has also contributed eight book chapters to books published by Springer.
He has several papers in international journals and conferences to his credit, and
20 years of teaching and research experience. He is the coordinator of the Pattern
Recognition and Computational Intelligence Laboratory and is currently Associate
Professor in the Department of Electronics and Communication Engineering, National
Institute of Technology, Trichy, India. His books are widely used all over the world.
His Springer book “Pattern Recognition and Computational Intelligence Using Matlab”
was recognized as one of the best ebooks in the “pattern recognition” and “Matlab”
categories by BookAuthority, the world’s leading site for book recommendations by
thought leaders. His research interests include machine intelligence, pattern
recognition, signal processing and computational intelligence. He is the series
editor for the Springer series “Signals and Communication Technology”. The India
International Friendship Society (IFS) has awarded him the “Shiksha Rattan Puraskar”
for his meritorious services in the field of education; the award was presented by
Dr. Bhishma Narain Singh, former Governor of Assam and Tamil Nadu, India. He was
also awarded the “Glory of India Gold Medal” by the International Institute of
Success Awareness, presented by Shri Syed Sibtey Razi, former Governor of Jharkhand,
India, as well as “Best Citizens of India 2013” by The International Publishing House
and the Life Time Golden Achievement Award 2021 by Bharat Rattan Publishing House.

Machine Learning, Deep Learning
and Computational Intelligence Algorithms
Deep Learning to Predict the Number
of Antennas in a Massive MIMO Setup
Based on Channel Characteristics

Sharan Chandra , E. S. Gopi , Hrishikesh Shekhar , and Pranav Mani

Abstract Deep learning (DL) solutions learn patterns from data and exploit the
knowledge gained in learning to generate optimum case-specific solutions that
outperform pre-defined generalized heuristics. With an increase in computational
capabilities and availability of data, such solutions are being adopted in a wide
array of fields, including wireless communications. Massive MIMO is expected to be
a major catalyst in enabling 5G wireless access technology. The fundamental
requirement is to equip base stations with arrays of many antennas, which are used
to serve many users simultaneously. Mutual orthogonality between user channels in
multiple-input multiple-output (MIMO) systems is highly desired to facilitate
effective detection of user signals sent during uplink. In this paper, we present
potential deep learning applications in massive MIMO networks. In theory, an
infinite number of antennas at the base station ensures mutual orthogonality
between each user’s channel state information (CSI). We propose the use of
artificial neural networks (ANN) to predict the practical number of antennas
required for mutual orthogonality given the variances of the user channels. We then
present an analysis to obtain the practical number of antennas required for
convergence of the signal-to-interference-noise ratio (SINR) to its limiting value,
for the case of perfect CSI. Further, we train a deep learning model to predict the
required number of antennas for the SINR to converge to its limiting value, given
the variances of the channels. We then extend the study to show the convergence of
SINR for the case of imperfect CSI.

Keywords Multiple-input multiple-output (MIMO) · Mutual orthogonality · Channel state information (CSI) · Signal-to-interference-noise ratio (SINR) · Artificial neural networks (ANN) · Deep learning (DL)

S. Chandra (B) · E. S. Gopi · H. Shekhar · P. Mani
National Institute of Technology, Tiruchirappalli, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_1

1 Introduction

With increasing demand for faster and more reliable data transmission, traditional
data transfer schemes may no longer satisfy growing requirements. Several
technologies have been developed to meet these resource-heavy demands, and massive
MIMO [1] is among the leaders of this race. Massive MIMO has proved both effective
and efficient in its use of energy and spectrum, promising performance boosts of
10–100 times over existing MIMO systems [2].
The concept of MIMO can be boiled down to transmitting and receiving multiple
signals over a single channel simultaneously. Although there is no prescribed
criterion for classifying a system as massive MIMO, these systems generally utilize
tens or even hundreds of antennas, as opposed to three or four in traditional MIMO
systems. The key advantage of a massive MIMO system is that it can bring up to a
50-fold increase in capacity without a significant increase in spectrum
requirements. Further, owing to its large data rates, massive MIMO is expected to
play a major role in launching and sustaining 5G technology.
A typical massive MIMO system consists of several hundred antennas at the base
station and many users, interconnected to form a dense network (refer Fig. 1).
Equations (1) and (2) describe the mathematical model governing the massive MIMO
system [3].

Y = HX + N    (1)

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_M \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & \cdots & h_{1N} \\
h_{21} & h_{22} & \cdots & h_{2N} \\
h_{31} & h_{32} & \cdots & h_{3N} \\
h_{41} & h_{42} & \cdots & h_{4N} \\
\vdots & \vdots & \ddots & \vdots \\
h_{M1} & h_{M2} & \cdots & h_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_N \end{bmatrix}
+
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \\ \vdots \\ n_M \end{bmatrix}
\tag{2}
\]
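As a concrete illustration of the model in (1) and (2), the following sketch simulates one uplink snapshot in NumPy. The dimensions (M base-station antennas, N users), the Rayleigh-fading channel statistics and the QPSK user symbols are illustrative assumptions, not values taken from this paper:

```python
import numpy as np

# Hypothetical dimensions for illustration: M base-station antennas, N users.
M, N = 64, 8
rng = np.random.default_rng(0)

# Rayleigh-fading channel matrix H (i.i.d. unit-variance complex Gaussian
# entries), QPSK user symbols x, and additive white Gaussian noise n.
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
x = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
noise_std = 0.1
n = noise_std * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)

# Uplink model of Eq. (1): each of the M received samples is a noisy
# superposition of all N user signals, weighted by the channel coefficients.
y = H @ x + n
print(y.shape)  # (64,)
```

The base station's detection task is to recover x from y given knowledge of H, which is where the channel orthogonality discussed next becomes important.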

In massive MIMO, mutual orthogonality between each user’s channel state
information (CSI) is fundamental to the task of message signal detection. This
orthogonality between different channels relies upon an underlying assumption of an
infinite number of base station antennas [3]. However, in practical cases, this is
not feasible.

Fig. 1 Schematic of a generalized massive MIMO system, where Y_i represents the
ith antenna in the base station and X_i represents the ith user. The blue lines
indicate the channel coefficients corresponding to user 1

Hence, in Sect. 3 of this paper, an in-depth study is conducted on the number of
antennas in the base station required to guarantee a low margin of error in
reconstructing the message signal sent by a particular user. Further, a deep
learning (DL) [4] system is built to predict an optimal number of base station
antennas for effective deciphering of the message at the base station, given the
variances of the channels in consideration.
Section 4 investigates how the signal-to-interference-noise-ratio (SINR) of a
signal being transmitted in a MIMO system varies with the number of antennas. In
theory, this value converges to a constant under the assumption of infinite
base station antennas. A study is carried out in this section to determine the practical
number of antennas required in the base station for which the SINR converges to the
calculated constant value, to within a threshold, for perfect CSI. We then deploy a
deep learning (DL) model whose objective is to predict the number of base station
antennas required to ensure convergence of the practical value of SINR to its limiting
value, within a threshold of 0.01.
In Sect. 5, we perform a similar analysis for imperfect CSI [3], where we
demonstrate the practical number of base station antennas required for convergence of
SINR to its limiting value under different threshold conditions. We show that a
similar DL model can be applied to the case of imperfect CSI as well.

2 Contributions of the Paper

We propose a deep learning (DL) model to predict the practical number of antennas to
be installed at the base station to ensure mutual orthogonality between user channels.
We use data generated through Monte Carlo simulation [5] to train our artificial neural
6 S. Chandra et al.

network (ANN) [6]. The model learns a precise mapping between the variances of
the user channel coefficients and the required number of antennas. It is found that
the model predicted values quickly and accurately, thereby allowing for potential
deployment in a massive MIMO system. For perfect channel state information (CSI),
we realize that the practical value of antennas required to drive the SINR to its
limiting value is well within the capabilities of a massive MIMO system. We use a
deep learning model to predict this value. This model is trained using data generated
through a Monte Carlo simulation. However, the more impactful observation is that
this is possible even for the imperfect CSI case, which is of more practical utility. A
similar DL approach can, therefore, be extended to predict the number of antennas
required to obtain the convergent value of SINR, for imperfect CSI.

3 Mutual Orthogonality

Mutually orthogonal channels are highly desirable in MIMO systems. Consider a
system of four antennas and two users. Pre-multiplying (1) by h_1^H, we get
h_1^H Y = h_1^H HX + h_1^H N, where h_1 is the channel vector corresponding to user 1
(the first column of the H matrix). We see that, if the channel vectors are mutually
orthogonal, we have h_{11}^* h_{12} + h_{21}^* h_{22} + h_{31}^* h_{32} + h_{41}^* h_{42} = 0.
Hence, if noise is ignored, the matched filter response can be used to detect the
signal corresponding to x_1 (user 1).
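As a quick numerical illustration of the matched-filter argument (an illustrative sketch of ours, not code from the paper), the snippet below constructs two exactly orthogonal channel vectors for a four-antenna, two-user system and shows that, with noise ignored, h_1^H Y / ||h_1||^2 recovers x_1 exactly:

```python
# Matched-filter detection sketch for a 4-antenna, 2-user system.
# The channel vectors are chosen to be exactly orthogonal, so the
# interference term (h1^H h2) * x2 vanishes and x1 is recovered perfectly.

h1 = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]    # channel vector of user 1
h2 = [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j]  # channel vector of user 2 (h1^H h2 = 0)

x1, x2 = 2 + 1j, -1 + 3j                 # transmitted symbols

# Received vector y = h1 * x1 + h2 * x2 (noise ignored).
y = [a * x1 + b * x2 for a, b in zip(h1, h2)]

# Matched filter: x1_hat = h1^H y / ||h1||^2.
num = sum(a.conjugate() * yi for a, yi in zip(h1, y))
den = sum(abs(a) ** 2 for a in h1)
x1_hat = num / den
print(x1_hat)  # equals x1 exactly, since h1^H h2 = 0
```

With randomly drawn channels the orthogonality only holds approximately, which is what the next subsection quantifies.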

3.1 Trend Analysis

In this section, we analyse the trend in orthogonality between each user’s channel
state information (CSI) with the number of antennas at the base station (refer Fig. 2).
We show, using the Weak Law of Large Numbers [7], that as the number of antennas in
the base station increases, the channel state information vectors corresponding to
individual users become mutually orthogonal:

Since h_1 and h_2 follow independent complex Gaussian distributions with zero mean,

\lim_{M \to \infty} \frac{h_{11} h_{12}^* + h_{21} h_{22}^* + h_{31} h_{32}^* + \cdots + h_{M1} h_{M2}^*}{M} = E\left[h_1 h_2^*\right] = E[h_1]\, E\left[h_2^*\right] = 0 \qquad (3)
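The limit in (3) is easy to probe numerically. In the sketch below (our own illustration; the choices β1 = β2 = 1, the trial count, and the two values of M are arbitrary), the magnitude of the normalized inner product is averaged over several independent trials:

```python
import random
import math

def complex_gauss(var):
    """Draw one zero-mean complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2.0)
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def normalized_inner_product(m, beta1=1.0, beta2=1.0):
    """|h1^H h2| / M for M-dimensional channel vectors."""
    h1 = [complex_gauss(beta1) for _ in range(m)]
    h2 = [complex_gauss(beta2) for _ in range(m)]
    return abs(sum(a.conjugate() * b for a, b in zip(h1, h2))) / m

random.seed(0)
trials = 30
small_m = sum(normalized_inner_product(100) for _ in range(trials)) / trials
large_m = sum(normalized_inner_product(10_000) for _ in range(trials)) / trials
print(small_m, large_m)  # the M = 10000 average is far smaller
```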

Procedure 1 Monte Carlo Simulation for Orthogonality Data

1: Mmax = 10001
2: Threshold = 0.01
3: Trials = 100
4: for Beta1 = 0.02, 0.04, . . . , 2 do
5:   for Beta2 = 0.02, 0.04, . . . , 4 do
6:     Mavg = 0
7:     for trial = 1, 2, . . . , Trials do
8:       for m = 1, 2, . . . , Mmax do
9:         H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
10:        H2 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta2
11:        Result = |H1^H H2| / m
12:        if Result < Threshold then
13:          Mavg = Mavg + m
14:          Break
15:        end if
16:      end for
17:    end for
18:    Add [Beta1, Beta2] to training data input list
19:    Add (Mavg / Trials) to training data output list
20:  end for
21: end for
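A runnable, heavily scaled-down rendering of Procedure 1 might look as follows; the variance grid, trial count, and Mmax here are placeholders chosen for speed, not the values listed above:

```python
import random
import math

def complex_gauss(var):
    """Draw one zero-mean complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2.0)
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def antennas_for_orthogonality(beta1, beta2, m_max=2000, threshold=0.01, trials=10):
    """Average (over trials) smallest M with |h1^H h2| / M below threshold."""
    m_total = 0
    for _ in range(trials):
        for m in range(1, m_max + 1):
            h1 = [complex_gauss(beta1) for _ in range(m)]
            h2 = [complex_gauss(beta2) for _ in range(m)]
            result = abs(sum(a.conjugate() * b for a, b in zip(h1, h2))) / m
            if result < threshold:
                m_total += m
                break
        else:
            m_total += m_max  # threshold never met within m_max
    return m_total / trials

random.seed(1)
inputs, outputs = [], []
for beta1 in (0.2, 0.5):
    for beta2 in (0.2, 0.5):
        inputs.append((beta1, beta2))
        outputs.append(antennas_for_orthogonality(beta1, beta2))
print(list(zip(inputs, outputs)))
```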

In order to investigate the degree to which mutual orthogonality holds true in
the practical sense, we adopted the Monte Carlo simulation technique. Monte Carlo
algorithms [5] are a class of algorithms which use repeated sampling of data from
random variables to model and predict the overall behaviour of a system.
Each user's channel vector can be modelled as an M-dimensional complex vector
sampled from a zero-mean Gaussian distribution with fixed variance. We proceeded
by varying both the variances of the channels and the number of antennas in the
base station, sampling values for our channel vectors at each setting. The procedure
we followed is depicted as Procedure 1. Utilizing the results obtained from one
observation is futile, as the value obtained may represent an unlikely outcome, leading
to a very noisy output. Hence, to avoid this problem, for a fixed pair of variances, we
repeat the experiment several times and average our results over all these trials.
Following this approach, we were able to plot the number of antennas required for
orthogonality against the variances of each user's channel vector. The graph obtained
is depicted in Sect. 6.1 (refer Fig. 5).

Fig. 2 We can see how the value, depicted in (3), approaches zero with increasing number of
antennas. In these figures, β2 is fixed at 1 for each β1

3.2 Deep Learning Architecture to Predict Number of Antennas Required for Orthogonality

Using the dataset generated in Sect. 3.1, we leverage the power of a deep learning
(DL) model whose objective is to predict the number of antennas required to safely
assume orthogonality between channel vectors of individual users.
We make use of an artificial neural network (ANN) that takes the variances of
any two channel vectors, as input, and predicts the number of antennas required to
ensure orthogonality between each user’s channel vector.
The network utilized for the purpose of this study (refer Fig. 3) consists of an
input layer fed with variances β1 and β2 and an output layer that gives the number of
antennas, M, required to ensure orthogonality between two channels with characteristics
β1 and β2. The leaky version of the rectified linear unit (ReLU) activation function is
applied to each layer, and the learning rates are tweaked as per the Adam optimization
algorithm. Batch gradient descent [8] is used to feed-forward randomized batches of
input variances through the network. The mean-squared error (MSE) [9] is used to
compute the gradients and modify the weights of the network.
The network successfully learns to map the relationship between the output
(number of antennas, M) and the input (variances β1 and β2) (refer Fig. 6).
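The ingredients named above can be made concrete. The sketch below is purely illustrative (the paper does not specify layer sizes or trained weights): it shows the leaky ReLU activation, the MSE loss, and a single forward pass of a toy two-input regressor with fixed, hypothetical weights:

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: passes positives through, scales negatives by alpha."""
    return x if x > 0 else alpha * x

def mse(y_true, y_pred):
    """Mean-squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# One forward pass of a toy (2 -> 3 -> 1) regressor: inputs are the two
# channel variances, output stands in for an (untrained) antenna count.
w1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # hidden-layer weights (3 x 2)
w2 = [0.7, -0.5, 0.9]                        # output-layer weights (1 x 3)

def predict(beta1, beta2):
    hidden = [leaky_relu(r[0] * beta1 + r[1] * beta2) for r in w1]
    return leaky_relu(sum(w * h for w, h in zip(w2, hidden)))

print(predict(0.5, 0.5))
print(mse([125.0, 130.0], [120.0, 135.0]))  # -> 25.0
```

In training, the MSE gradient would be back-propagated through these layers and the weights updated with Adam; that loop is omitted here.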

Fig. 3 Model of the artificial neural network (ANN), used to predict the number of antennas
required

4 Perfect CSI

The process of detection of a user signal in massive MIMO relies upon the
multiplication of the received signal with the corresponding user's channel vector. However,
for this, the channel vector for each user should be determined, and this calculated
value should not deviate considerably in time until the channel vectors are
recalculated.
To extract the channel vector values, we generally send pilot data consisting of
an identity matrix to isolate each of our channel vectors individually. However, the
assumption made here is that there is no noise corrupting the channel vectors during
our estimation. The noise arising from corrupted channel vectors is accounted for in
Sect. 5, imperfect CSI.

4.1 Signal-to-Interference-Noise-Ratio (SINR) Analysis

For simplicity, let us once again consider a system of four antennas at the base
station and two users (as in Sect. 3). From the RHS of h_1^H Y = h_1^H HX + h_1^H N, we have
h_{11}^*(h_{11} x_1 + h_{12} x_2 + n_1) + \cdots + h_{41}^*(h_{41} x_1 + h_{42} x_2 + n_4). When we consider
detection of the signal corresponding to user 1, it is understood that [h_{11} h_{21} h_{31} h_{41}] are known
and [h_{12} h_{22} h_{32} h_{42}] are complex Gaussian random variables with zero mean and
variance β2. Further, let us consider noise variance E[N^2] = σ_N^2 = 1 and power
allocated to each user, P_u. Hence, we have the theoretical limiting value of SINR [3],
\lim_{M \to \infty} \text{SINR} = \lim_{M \to \infty} \frac{\frac{E_u}{M}\,\|h_1\|^2}{\frac{E_u}{M} \sum_{i=2}^{N} \beta_i + \sigma_N^2} = \lim_{M \to \infty} \frac{E_u \,\|h_1\|^2}{M \,\sigma_N^2} = \frac{E_u \beta_1}{\sigma_N^2} \qquad (4)

where the power allocated to each user, P_u = E_u / M, is expressed in terms of E_u,
the energy allotted to each user, and M, the number of antennas at the base station.
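The limit in (4) can be checked by direct simulation. The sketch below is our own check (the values β1 = 0.5, Σβi = 2, Eu = 5, and σN² = 1 are arbitrary choices): it computes the finite-M SINR and compares it with Eu β1 / σN²:

```python
import random
import math

def complex_gauss(var):
    """Draw one zero-mean complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2.0)
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def sinr(m, beta1, sum_beta, e_u):
    """Finite-M SINR with per-user power Pu = Eu / M and sigma_N^2 = 1."""
    h1_sq = sum(abs(complex_gauss(beta1)) ** 2 for _ in range(m))
    num = h1_sq * e_u / m            # received signal power term
    den = sum_beta * e_u / m + 1.0   # interference + noise (sigma_N^2 = 1)
    return num / den

random.seed(2)
beta1, sum_beta, e_u = 0.5, 2.0, 5.0
limit = e_u * beta1   # Eu * beta1 / sigma_N^2 with sigma_N^2 = 1
value = sinr(20_000, beta1, sum_beta, e_u)
print(value, limit)   # the simulated SINR sits close to the limit 2.5
```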

Fig. 4 Plot of calculated SINR and expected SINR value versus the number of antennas at the base station. For a we have β1 = 0.1 and Σ_{i=2}^{N} βi = 0.5, and for b we have β1 = 1.5 and Σ_{i=2}^{N} βi = 8

The theoretical limiting value of SINR, shown above, works well under the
assumption that there is an infinite number of base station antennas. However, it is
unknown to what extent the above equation holds for practical numbers of base
station antennas. Hence, a study was conducted to analyse how the SINR of the
received signal varies with the number of base station antennas (refer Fig. 4). Please
note that, for the purpose of this study, we have considered noise variance σ_N^2 = 1.

Procedure 2 Monte Carlo Simulation for Perfect CSI

1: Mmax = 3000
2: Trials = 100
3: Threshold = 0.01
4: Beta1max = 6
5: SumBetamax = 20   (maximum value of SumBeta = Σ_{i=2}^{N} βi)
6: Euser = 1
7: for Beta1 = 0.05, 0.1, . . . , Beta1max do
8:   for SumBeta = Beta1 + 0.05, Beta1 + 0.1, . . . , SumBetamax do
9:     Mavg = 0
10:    SINRexpected = Beta1 × Euser
11:    for trial = 1, 2, . . . , Trials do
12:      for m = 1, 2, . . . , Mmax do
13:        H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
14:        Num = ‖H1‖² × Euser / m
15:        Den = SumBeta × Euser / m + 1
16:        SINR = Num / Den
17:        if |SINR − SINRexpected| < Threshold then
18:          Mavg = Mavg + m
19:          Break
20:        end if
21:      end for
22:    end for
23:    Add [Beta1, SumBeta] to training data input list
24:    Add (Mavg / Trials) to training data output list
25:  end for
26: end for
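The inner search of Procedure 2, i.e. finding the smallest M at which the simulated SINR falls within the threshold of its limiting value, can be sketched as follows (a scaled-down illustration for a single, arbitrarily chosen variance pair and a reduced trial count):

```python
import random
import math

def complex_gauss(var):
    """Draw one zero-mean complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2.0)
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def antennas_for_sinr(beta1, sum_beta, e_user=1.0, m_max=3000,
                      threshold=0.01, trials=5):
    """Average smallest M with |SINR - beta1 * Euser| below threshold."""
    expected = beta1 * e_user
    m_total = 0
    for _ in range(trials):
        for m in range(1, m_max + 1):
            h1_sq = sum(abs(complex_gauss(beta1)) ** 2 for _ in range(m))
            s = (h1_sq * e_user / m) / (sum_beta * e_user / m + 1.0)
            if abs(s - expected) < threshold:
                m_total += m
                break
        else:
            m_total += m_max  # threshold never met within m_max
    return m_total / trials

random.seed(3)
m_needed = antennas_for_sinr(beta1=0.3, sum_beta=1.0)
print(m_needed)
```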

The results in Fig. 4 were obtained by fixing E_u = 5 and noise variance σ_N^2 = 1.
From Fig. 4a, we can infer that by fixing just 127 antennas at the base station, the
received signal SINR converges to the expected value within a threshold of 0.01.
Similarly, from Fig. 4b, we understand that by fixing 1441 antennas and 3499
antennas at the base station, convergence occurs within an applied threshold of 0.3
and 0.1, respectively. However, this high value of antennas is obtained for a large
cumulative value of user variances (β1 = 1.5 and Σ_{i=2}^{N} βi = 8).
Having demonstrated the existence of a potential trend, we once again employ a
Monte Carlo-based approach to simulate practical systems and determine the
relationship governing the variances of the channel vectors (β1 and Σ_{i=2}^{N} βi) and the
number of antennas at the base station (M). We note that the convergent SINR value
depends only on the sum of the variances of the other channels and not on the
individual variances themselves. Hence, by varying the sum over a range of values, we
mimic multiple user scenarios in a practical massive MIMO system. Following the
steps shown in Procedure 2, we calculate simulated SINR values and compare them
with the expected SINR at infinity, which is E_u β1 (when we consider the signal
corresponding to user 1).

4.2 Deep Learning Architecture to Predict Number of Antennas Required for Convergence of SINR

Using the data generation method mentioned in Sect. 4.1, we create a dataset that
maps the variances of the channels to the number of antennas required for the SINR to
converge to its limiting value. We use this dataset to train an artificial neural network
(ANN) to learn the mapping. We use a threshold of 0.01 for training.
For the purpose of this study, we use a network that consists of an input layer
fed with variances β1 and Σ_{i=2}^{N} βi, where β1 is the variance of the channel whose
SINR we are interested in predicting and Σ_{i=2}^{N} βi is the sum of the variances of all
other channels. The output layer predicts the minimum number of antennas, M, that
ensures the SINR of the channel with variance β1 is within a threshold of its convergent value.
The leaky version of the rectified linear unit (ReLU) activation function is applied to
each layer and the learning rates are tweaked as per the Adam optimization algorithm.
Batch gradient descent is used to feed-forward randomized batches of input variances
through the network. The mean-squared error (MSE) [9] is used to compute the
gradients and modify the weights of the network.

5 Imperfect CSI: An SINR Analysis

Section 4 deals with the ideal case, where the channel information observed is not
corrupted by noise. However, in reality, this is not the case. Similar to the previous
case, the channel vector values are determined using the "pilot" data sent across the
channel. However, the observed channel vector ĥ has an added noise component,
i.e. ĥ = h + error. As a result, the theoretical value of SINR for imperfect CSI
becomes (5), with limiting value (6).

\text{SINR} = \frac{P_u \,\|h_1\|^4}{\frac{P_u}{P_p}\,\|h_1\|^2 + P_u \sum_{i=2}^{N} \beta_i \,\|\hat{h}_1\|^2 + \|\hat{h}_1\|^2} \qquad (5)

When we substitute P_u = \frac{E_u}{\sqrt{M}} and P_p = N P_u, and let M \to \infty, we have:

\Rightarrow \text{SINR} = N E_u^2 \beta_1^2 \qquad (6)

Procedure 3 Monte Carlo Simulation for Imperfect CSI

1: Mmax = 20001
2: Trials = 100
3: Beta1 = 0.1
4: SumBeta = 0.5   (value of Σ_{i=2}^{N} βi)
5: Euser = 5
6: N = 4
7: for m = 1, 2, . . . , Mmax do
8:   SINRavg = 0
9:   for trial = 1, 2, . . . , Trials do
10:    H1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance Beta1
11:    e = matrix of dimensions m × 1 with values drawn from a Gaussian distribution with mean 0 and variance 1
12:    Ĥ1 = H1 + e / √(N × Euser / √m)
13:    Num = ‖H1‖⁴ × Euser / √m
14:    Den = ‖H1‖² / N + (SumBeta × Euser / √m) × ‖Ĥ1‖² + ‖Ĥ1‖²
15:    SINR = Num / Den
16:    SINRavg = SINRavg + SINR
17:  end for
18:  SINRavg = SINRavg / Trials
19:  Add SINRavg to list of SINRs
20: end for
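The limiting value in (6) can be probed numerically. The sketch below follows a standard MRC-style analysis with pilot power Pp = N·Pu and channel estimate ĥ1 = h1 + e/√Pp; it is our reconstruction rather than a verbatim transcription of Procedure 3. The parameters match Sect. 6.5 (Eu = 5, β1 = 0.1, N = 4), for which N Eu² β1² = 1:

```python
import random
import math

def complex_gauss(var):
    """Draw one zero-mean complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2.0)
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

def sinr_imperfect(m, beta1, sum_beta, e_u, n_users, trials=10):
    """Average SINR with noisy channel estimate h1_hat = h1 + e / sqrt(Pp)."""
    p_u = e_u / math.sqrt(m)   # per-user power, scaled as Eu / sqrt(M)
    p_p = n_users * p_u        # pilot power Pp = N * Pu
    total = 0.0
    for _ in range(trials):
        h1 = [complex_gauss(beta1) for _ in range(m)]
        e = [complex_gauss(1.0) for _ in range(m)]
        h1_hat = [h + err / math.sqrt(p_p) for h, err in zip(h1, e)]
        h1_sq = sum(abs(h) ** 2 for h in h1)
        h1_hat_sq = sum(abs(h) ** 2 for h in h1_hat)
        num = p_u * h1_sq ** 2
        den = (p_u / p_p) * h1_sq + p_u * sum_beta * h1_hat_sq + h1_hat_sq
        total += num / den
    return total / trials

random.seed(4)
value = sinr_imperfect(20_000, beta1=0.1, sum_beta=0.5, e_u=5.0, n_users=4)
limit = 4 * 5.0 ** 2 * 0.1 ** 2   # N * Eu^2 * beta1^2 = 1
print(value, limit)
```

At M = 20,000 the simulated value is already within about 0.05 of the limit, consistent with the slow convergence reported in Sect. 6.5.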

As shown in (6), the SINR converges to a constant value when the number of
antennas in the base station tends to infinity. However, we proceed to study at what
realistic value of base station antennas this expected SINR is obtained.

The steps utilized for this purpose are similar to those used for perfect CSI and are
given as Procedure 3. The only difference is the formula used for calculating the
SINR. Following the steps shown in Procedure 3, we calculate simulated SINR
values and compare them with the expected SINR as the number of antennas tends
to infinity.

6 Results

6.1 Mutual Orthogonality Simulation Data

Figure 5 indicates that for an average variance of β1 = 0.5 and β2 = 0.5, the practical
number of antennas required for orthogonality is 125. Also, the trend indicates that
the required number of antennas to guarantee a margin of error of 0.1% is only
feasible up to β1 = 3.26 and β2 = 3.81.
On the basis of Fig. 5, we can confirm a conclusive trend between the number
of antennas in the base station (M) and the variances of the two user channels (β1, β2)
required for the CSI dot product to fall below a threshold of 0.01. It can also be seen
from Fig. 5b that the change in the number of antennas, M, is equally sensitive to
changes in both β1 and β2. It can be inferred that as the variances of the two user
channels are increased, there is a general increase in the required number of antennas.

Fig. 5 Figures a and b are two-dimensional and three-dimensional representations of the plot between β1, β2 and the number of antennas, M. A well-defined, directly proportional relationship can be observed between the independent features, β1 and β2, and the dependent variable, the number of antennas, M

6.2 Predicting Number of Base Station Antennas Required for Orthogonality

The artificial neural network (ANN) approach was able to accurately predict the
required number of antennas within a margin of error of about 20 antennas. This
number is not significant from the perspective of massive MIMO. Further, it is
apparent that this error can be reduced further, given more data for the network to train
on.
It is evident from Fig. 6 that the network is able to learn the relationship governing
the variances and the number of required antennas. However, an important point to
be noted is that the above study is confined to two users. Hence, in the practical
scenario, the required number of antennas must be calculated for all pairs of users
and the maximum value obtained, over all pairs, must be considered.

6.3 Perfect CSI-SINR Convergence Simulation Data

In Sect. 4.1, pertaining to perfect CSI, we make use of Procedure 2 to generate data for
the number of antennas required for SINR convergence to the expected value. Here,
the parameters β1 and Σ_{i=2}^{N} βi are varied and the corresponding required number
of antennas is recorded. The resulting values resemble a triangular plane with a
well-defined inclination angle.
Figure 7a highlights the dependency of the required number of antennas on β1. As
can be observed, the trend is almost linearly increasing for fixed values of Σ_{i=2}^{N} βi.
Similarly, from Fig. 7b, on analysing the variation of the required number of antennas
with Σ_{i=2}^{N} βi, we once again observe a significantly proportional trend. However,
from Fig. 7c, we can see that the increase in the number of antennas is steeper for β1

Fig. 6 The testing data was generated by drawing random variance values between 0 and 4, and then, using Procedure 1, we obtain the number of antennas required for convergence. For about 2000 test data points, sorted in increasing order of antennas required, the testing curve and actual values are plotted

Fig. 7 Three-dimensional plot between β1, Σ_{i=2}^{N} βi and the number of antennas, M. Different viewing angles of the plot are shown in a–d. A direct dependence can be observed between β1, Σ_{i=2}^{N} βi and the number of antennas, M

as opposed to Σ_{i=2}^{N} βi. From Fig. 7d, we can infer that there is a general increase in
the required number of antennas with increasing β1 and Σ_{i=2}^{N} βi values.
Further, we note that on fixing a representative value of β1 = 2 and Σ_{i=2}^{N} βi = 10,
we observe that the required number of antennas is 115. Analysing the extreme
case, by fixing β1 = 6 and Σ_{i=2}^{N} βi = 20, we get a required number of antennas of
330. From this, we can infer that these observations are within the implementation
capabilities of a practical massive MIMO system. Hence, the use of this data to train
a deep learning (DL) model to predict the number of antennas is of high practical
use.

6.4 Perfect CSI-Predicting Number of Base Station Antennas Required for Convergence of SINR

Utilizing the deep learning (DL) model proposed in Sect. 4.2, the required number
of antennas is estimated given a fixed value of β1 and Σ_{i=2}^{N} βi. The predicted value
of the number of antennas is plotted against the true value of the required number of
antennas. The estimate, on average, is within 19 antennas of the required number

Fig. 8 The testing data was generated by drawing random variance values between 0 and 6 for β1 and between β1 + 0.05 and 20 for Σ_{i=2}^{N} βi. Using Procedure 2, we then obtain the number of antennas required for convergence of SINR. For about 4000 test data points, sorted in increasing order of antennas required, the testing curve and actual values are plotted

of antennas, which can be inferred from the calculated mean-squared error (MSE =
368.64688).
As can be seen in Fig. 8, the network is able to effectively learn the trend in the
input data. The performance can be improved by using larger datasets for training.

6.5 Imperfect CSI—Analysing the Number of Antennas Required for Convergence of SINR

Similar to perfect CSI, the SINR is calculated using a Monte Carlo approach and
averaged over a number of trials. The resulting SINR is plotted alongside the expected
SINR, as shown in Fig. 9.
These results were obtained by fixing E_u = 5, β1 = 0.1, Σ_{i=2}^{N} βi = 0.3 and
noise variance σ_N^2 = 1. The study indicates that by fixing a (i) threshold of 0.1, we

Fig. 9 Imperfect CSI: plot of calculated SINR and expected SINR value versus the number of antennas at the base station

require 411 antennas, (ii) a threshold of 0.05, we require 4516 antennas, and (iii) a
threshold of 0.01, we require 18,585 antennas for the received SINR to converge to
the expected value. Hence, depending on the accuracy required, these values can be
used as lower bounds.

7 Conclusions

Often, finding the number of antennas to be installed in the base station for effective
and accurate deciphering of received signals is a challenging task. To solve this issue,
we make use of a deep learning model to obtain the number of antennas required,
given the maximum practically possible pair of input variances. Once at least the
predicted number of antennas are available for use at the base station, the channel
state information (CSI) of users are approximately orthogonal. Further, the latency
of the neural network is small enough to handle dynamic CSI. Accordingly, we
can activate the required number of antennas. The signal-to-interference-noise-ratio
(SINR), in case of perfect channel state information (CSI), is said to converge to
a constant value when the number of antennas, M → ∞. This signifies that even
though power allocated to each user Pu → 0, the SINR does not down-scale to 0.
We realize that for practical values of channel variances, the SINR converges, and the
number of antennas required to ensure this convergence is around 200–300.
number of antennas for the convergence of SINR to the expected values. Similarly,
for imperfect channel state information, the number of required antennas increases to
around 450. These observations have potential applications in realizing the limiting
value of SINR, in massive MIMO systems. Further, these observations invite deep
learning (DL) solutions to predict the number of antennas required in the base station
to ensure convergence of SINR, in case of imperfect CSI.

References

1. Lu L, Li GY, Swindlehurst AL, Ashikhmin A, Zhang R (2014) An overview of massive MIMO:
benefits and challenges. IEEE J Sel Top Sig Process 8(5):742–758
2. Van Chien T, Björnson E (2017) Massive MIMO communications. In: Xiang W, Zheng K,
Shen X (eds) 5G mobile communications. Springer, Cham
3. Gopi ES (2015) Digital signal processing for wireless communication using Matlab, 1st ed.
Springer Publishing Company
4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.
1038/nature14539
5. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, 3rd edn. Wiley
Publishing
6. Hassoun MH (1995) Fundamentals of artificial neural networks, 1st edn. MIT Press,
Cambridge
7. Weisstein EW, Weak law of large numbers. From MathWorld–A Wolfram Web Resource.
https://mathworld.wolfram.com/WeakLawofLargeNumbers.html
8. Ruder S (2016) An overview of gradient descent optimization algorithms. ArXiv
abs/1609.04747 n. pag
9. Schluchter MD (2014) Mean square error, in Wiley StatsRef: statistics reference online. Wiley,
New York
Optimal Design of Fractional Order PID
Controller for AVR System Using Black
Widow Optimization (BWO) Algorithm

Vijaya Kumar Munagala and Ravi Kumar Jatoth

Abstract A new technique to improve the fractional order proportional integral
derivative (FOPID) controller parameters for the AVR system is proposed. The
proposed technique uses the meta-heuristic Black Widow Optimization (BWO)
algorithm for FOPID controller tuning. An AVR system without a controller tends to
generate fluctuations in the output terminal voltage, so a controller is needed to
reduce these fluctuations and produce a stable voltage. The BWO-FOPID controller
is used in the system to reduce the output fluctuations of the terminal voltage.
Moreover, the two additional tuning parameters of the FOPID controller greatly improve
the reliability of the system. Simulation results show that the algorithm performs
better than other optimization-based tuning algorithms based on PSO, CS, GA, and
C-YSGA. The system's rise time and settling time values are improved significantly
when compared with other controllers. Finally, a robustness analysis is carried out
on the designed system to check the reliability of the controller.

Keywords AVR system · FOPID controller · BWO optimization

1 Introduction

Electrical power generation systems are responsible for the production of electricity
using various natural resources. These systems incorporate generators that
convert mechanical energy into electrical energy. During the conversion process, the
systems tend to oscillate about the equilibrium state because of vibrations in the
moving parts, load variations, and various external disturbances. To overcome this, the
synchronous generators are often driven through exciters. The exciters control
the input to the generators in such a way as to hold the output voltage at a stable level.

V. K. Munagala (B) · R. K. Jatoth


Department of ECE, National Institute of Technology, Warangal, India
e-mail: [email protected]
R. K. Jatoth
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 19


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_2
20 V. K. Munagala and R. K. Jatoth

In this process, AVR systems are used in the control loop to maintain a stable signal
level at the input of the exciter so that the generator maintains the constant output
voltage at the terminals.
Various control techniques have been proposed, such as optimal control, robust
control, H∞/H2 control, and predictive control. All these control strategies use a
proportional integral derivative (PID) controller as a basic control element. Modern
industrial controllers still use the PID controller because of its simple structure, ease
of understanding its operation, and, importantly, its robustness under different
operating conditions. Nowadays, effort has increased to improve the performance of PID
controllers using evolving mathematical concepts. One such technique is the use
of fractional order calculus to design PID controllers. Such a method adds
additional parameters, called the order of integration (λ) and the order of differentiation (μ),
to the PID controller, and these controllers are called fractional order PID (FOPID)
controllers. For the optimal design of an FOPID controller, the parameters proportional
gain (K p ), integral gain (K i ), differential gain (K d ), λ, and μ should be carefully tuned.
These extra parameters give additional advantages to FOPID controllers [1]. In
the past, FOPID controllers were used in many applications, including speed control
of a DC motor [2], control of a servo press system [3], control of water flow in an
irrigation canal [4], a boost converter [5], control of a flight vehicle [6], and
temperature control [7].
Various objective functions have been utilized in the literature to tune the parameters
of FOPID/PID controllers. The basic cost function, the integral of absolute error (IAE),
was used in [8] along with an improved artificial bee colony (ABC) algorithm for
tuning FOPID controllers. FOPID controller design using a genetic algorithm was
discussed in [9]. The FOPID controller tuning problem was solved using the particle
swarm optimization (PSO) algorithm [10]. Another cost function, the integrated squared
error (ISE), was utilized in the tuning of FOPID controllers [11]. Minimization of
ITSE was used to identify the best-tuned parameters of an FOPID controller [12]. Zwee
Lee Gaing proposed an objective function [13] for the optimum design of the FOPID
controller, and this function was used for PID controller tuning [14, 15]. In addition to
the techniques mentioned, a variety of objective functions were created by combining
ITAE, IAE, ISE, and ITSE with weighted combinations of settling time, rise time,
steady-state error, and overshoot [12, 16, 17]. Chaotic map-based algorithms have
also been used in the literature [14, 18, 19]. The advantage of chaotic maps is that they
improve the performance of existing algorithms, which further optimizes the objective
functions. Along with these techniques, authors have used multi-objective
optimization [20, 21] by combining more than one objective function. Here, a set of Pareto
solutions is generated, from which a suitable solution is identified. Unknown
parameters of the FOPID controller were identified using salp swarm optimization [22]
and the cuckoo search optimization algorithm [23]. A brief comparison of various
optimization algorithms for AVR system controllers was discussed in [24]. The design of
FOPID controllers for the AVR system using various optimization algorithms was
discussed in [25–30]. A frequency-domain approach for optimal tuning of the FOPID
controller was discussed in [31, 32].

The paper is organized into the following sections. Section 2 discusses the
operation of the AVR system and the analysis of system response parameters. A brief
overview of fractional differ-integrals and FOPID controllers is given in Sect. 3. A brief
description of the Black Widow Optimization algorithm and its working
methodology is given in Sect. 4. The tuning of FOPID controller parameters using
the BWO algorithm [33] is described in Sect. 5. The performance of the BWO-FOPID
controller is compared with other optimization-based FOPID controllers,
and a robustness analysis of the proposed controller is made in Sect. 6.

2 Overview of Automatic Voltage Regulator (AVR) System

Synchronous generators are commonly used in power generation systems. Due to
variations in the load or sudden changes in power usage, the generators produce
oscillations at the output for a significant amount of time. These oscillations may
lead to system instability and can cause catastrophes. To improve terminal voltage
stability, generators are controlled by excitation systems and AVR systems. The
constituents of the AVR system are the amplifier, exciter, generator, and sensor [14]. The
interconnections of the various system blocks are shown in Fig. 1.
Initially, the generator terminal voltage V t (s) is given to the sensor circuit, which
converts the terminal voltage into a proportional voltage signal V s (s). This signal is then
subtracted from the reference voltage V ref (s), and the error signal V e (s) is
generated. The error signal strength is improved by the amplifier, and the
output of the amplifier connects to the exciter. The exciter converts the input signal
to a signal that is suitable to drive the generator. The corresponding mathematical
representations of the various blocks in the AVR system are given by the following
equations.
The transfer function of the amplifier is represented as

G_a(s) = \frac{K_a}{1 + s\tau_a} \qquad (1)

[Block diagram: Vref(s) → summing junction (minus Vs(s)) → Amplifier (Ga) → Exciter (Ge) → Generator (Gg) → Vt(s), with feedback through the Sensor (Hs)]

Fig. 1 Components of the AVR system



where K a is the amplifier gain, which takes values in the range [100, 400], and τ a is the
time constant of the amplifier, which lies in the range [0.02, 0.1].
The transfer function of the exciter is represented as

Ke
G e (s) = (2)
1 + sτe

where K e is exciter gain which has values in the range [10,400] and τ e is the time
constant of the exciter and lies in the range [0.5, 1.0].
The transfer function of the generator is represented as

$$
G_g(s) = \frac{K_g}{1 + s\tau_g} \qquad (3)
$$

where K g is generator gain which has values in the range [0.7, 1.0], and τ g is the
time constant of the generator and lies in the range [1.0, 2.0].
The transfer function of the sensor is represented as

$$
G_s(s) = \frac{K_s}{1 + s\tau_s} \qquad (4)
$$

where K_s is the sensor gain, which has values in the range [1.0, 2.0], and τ_s is the time constant of the sensor and lies in the range [0.001, 0.06].
To understand the behavior and dynamics of the system, the step response is plotted in Fig. 2 and its key performance parameters are identified. Table 1 shows the variation of these key parameters with a change in the K_g value. Since the terminal voltage varies with load changes, different values of K_g in the range [0.7, 1.0] were considered for the step
Fig. 2 AVR system unit step response (output magnitude vs. time for the input step and K_g = 1, 0.9, 0.8, 0.7)


Optimal Design of Fractional Order PID Controller … 23

Table 1 Identified key parameters for AVR system

Parameter            Kg = 0.7    Kg = 0.8    Kg = 0.9    Kg = 1
Rise time            32.021      29.7327     27.8725     26.331
Settling time        423.0789    472.1487    520.1147    624.964
Steady-state error   0.125       0.111       0.1         0.091
Peak overshoot       47.85       52.94       57.61       61.97
Gain margin          1.7365      1.3928      1.1254      0.9116
Phase margin         17.6058     9.7074      3.1952      −2.3247

Fig. 3 Bode plot of AVR system without controller (magnitude in dB and phase in degrees vs. frequency in rad/s, for K_g = 1, 0.9, 0.8, 0.7)

response. The gain margin and phase margin were calculated from the Bode plot shown in Fig. 3.
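The behavior summarized in Fig. 2 and Table 1 can be reproduced numerically by integrating the four first-order blocks of Fig. 1 in a loop. The sketch below uses forward-Euler integration; the specific gains and time constants are nominal values commonly assumed in AVR studies, not numbers taken from this chapter, which only specifies admissible parameter ranges.

```python
def avr_step_response(Ka=10.0, ta=0.1, Ke=1.0, te=0.4, Kg=1.0, tg=1.0,
                      Ks=1.0, tsens=0.01, t_end=10.0, dt=1e-4):
    """Unit-step response of the closed AVR loop of Fig. 1, simulated by
    forward-Euler integration of the four first-order blocks.
    The default gains/time constants are assumed nominal values."""
    xa = xe = xg = xs = 0.0   # amplifier, exciter, generator, sensor states
    vt = []
    for _ in range(int(t_end / dt)):
        e = 1.0 - xs                      # Ve = Vref - Vs, unit reference
        xa += dt * (Ka * e - xa) / ta     # amplifier lag
        xe += dt * (Ke * xa - xe) / te    # exciter lag
        xg += dt * (Kg * xe - xg) / tg    # generator lag: xg is Vt
        xs += dt * (Ks * xg - xs) / tsens # sensor lag
        vt.append(xg)
    return vt
```

With these values the closed-loop DC gain is K_a K_e K_g / (1 + K_a K_e K_g K_s) = 10/11 ≈ 0.909, and the uncontrolled response overshoots well above 1 before settling, which is exactly the oscillatory behavior the chapter sets out to correct.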

3 Fractional Calculus and Fractional Order Controllers

3.1 Fractional Calculus

Fractional calculus deals with the evaluation of real-order integro-differential equations. Here, integration and differentiation are denoted by a common differ-integral operator (D-operator) $_aD_t^{\alpha}$, where a and t represent the limits of integration and α is the order of differentiation.
$$
{}_aD_t^{\alpha} = \begin{cases} \dfrac{d^{\alpha}}{dt^{\alpha}}, & \Re(\alpha) > 0, \\ 1, & \Re(\alpha) = 0, \\ \displaystyle\int_a^{t} (d\tau)^{-\alpha}, & \Re(\alpha) < 0 \end{cases} \qquad (5)
$$

The D-operator has two well-known definitions, the Grunwald–Letnikov (GL) and the Riemann–Liouville (RL) definitions [34].
According to GL, the D-operation is defined as

$$
{}_aD_t^{\alpha} f(t) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{k=0}^{\left[\frac{t-a}{h}\right]} (-1)^{k} \binom{\alpha}{k} f(t - kh) \qquad (6)
$$

where h is the computation step size.
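The GL sum of Eq. (6) is straightforward to evaluate for a fixed step h, because the weights (−1)^k·C(α, k) obey the recurrence c_k = c_{k−1}(1 − (α + 1)/k). The helper below is an illustrative sketch (the function name is ours, not the chapter's):

```python
import math

def gl_fractional_derivative(f, alpha, t, a=0.0, h=1e-3):
    """Grunwald-Letnikov differ-integral of order alpha of f at time t,
    with lower terminal a, approximated with a fixed step h (Eq. 6)."""
    coeff = 1.0                  # (-1)^0 * binom(alpha, 0)
    acc = coeff * f(t)
    for k in range(1, int((t - a) / h) + 1):
        coeff *= 1.0 - (alpha + 1.0) / k   # weight recurrence
        acc += coeff * f(t - k * h)
    return acc / h ** alpha
```

For f(t) = t the half-order derivative approaches 2√t/√π, the value given by the RL formula below, and for α = 1 the sum collapses to the ordinary first difference.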


The RL definition of the D-operator for a function f(t) is

$$
{}_aD_t^{\alpha} f(t) = \frac{1}{\Gamma(n-\alpha)} \frac{d^{n}}{dt^{n}} \int_a^t \frac{f(\tau)}{(t-\tau)^{\alpha-n+1}} \, d\tau \qquad (7)
$$

where α ∈ (n − 1, n) and Γ(·) represents the Gamma function.


Fractional order dynamic systems are represented in linear terms of D-operators by the equation

$$
a_n D^{\alpha_n} y(t) + a_{n-1} D^{\alpha_{n-1}} y(t) + \cdots + a_0 D^{\alpha_0} y(t) = b_m D^{\beta_m} u(t) + b_{m-1} D^{\beta_{m-1}} u(t) + \cdots + b_0 D^{\beta_0} u(t) \qquad (8)
$$

where $D^{\xi} = {}_0D_t^{\xi}$, n ∈ (0, 1, 2, …, k), m ∈ (0, 1, 2, …, k), and α_k and β_k (k = n, n − 1, …, 0) are arbitrary real numbers.

Eq. (8) can also be represented in the more standard state-space form

$$
{}_aD_t^{q} x(t) = A \cdot x(t) + B \cdot u(t) \qquad (9)
$$

$$
y(t) = C \cdot x(t) \qquad (10)
$$

where u ∈ R^r, x ∈ R^n and y ∈ R^p are the input signal, state and output signal of the fractional order system, A ∈ R^{n×n}, B ∈ R^{n×r}, C ∈ R^{p×n}, and q represents the fractional commensurate order.

3.2 Fractional Order Controller

The fractional order controllers have two additional parameters (λ and μ), which give extra flexibility in tuning. The different forms of the fractional controller

Fig. 4 Different forms of FOPID controller in the (λ, μ) plane: P (λ = 0, μ = 0), PI (λ = 1, μ = 0), PD (λ = 0, μ = 1), PID (λ = 1, μ = 1), with the points (λ = 2, μ = 0) and (λ = 0, μ = 2) on the axes and the FOPID controller covering the whole plane
are shown in Fig. 4. From the figure, it is observed that all the integer-order controllers are special cases of the FOPID controller.
The generalized equation representing the fractional PID controller is given by

$$
G_C(s) = \frac{U(s)}{E(s)} = K_p + \frac{K_i}{s^{\lambda}} + K_d s^{\mu} \qquad (11)
$$

U(s) represents output and E(s) represents the input of the controller. Generally,
the input for any controller is the difference between the desired signal and the
response signal. Correspondingly, in the time domain, the equation can be represented
as
$$
u(t) = K_p e(t) + K_i D_t^{-\lambda} e(t) + K_d D_t^{\mu} e(t) \qquad (12)
$$

Therefore, from Eqs. (11) and (12), the real terms λ and μ make Gc (s) an infinite
order filter because of the real differentiation and integration.
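In discrete time, the two fractional terms of Eq. (12) can be realised with the same GL weight recurrence: order −λ yields the fractional integral and order μ the fractional derivative. The sketch below is a plain GL discretisation for illustration only — the chapter's results were produced with MATLAB/FOMCON, not this code — and with λ = μ = 1 it collapses to an ordinary PID law.

```python
def gl_weights(order, n):
    """Weights (-1)^k * binom(order, k) for k = 0..n, via their recurrence."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (order + 1.0) / k))
    return w

def fopid_output(err_hist, Kp, Ki, Kd, lam, mu, h):
    """One FOPID control sample (Eq. 12), GL-discretised.
    err_hist holds the sampled error, oldest first; h is the sample time."""
    n = len(err_hist) - 1
    e_rev = err_hist[::-1]        # e(t), e(t-h), e(t-2h), ...
    wi = gl_weights(-lam, n)      # fractional integral of order lam
    wd = gl_weights(mu, n)        # fractional derivative of order mu
    integral = h ** lam * sum(w * e for w, e in zip(wi, e_rev))
    derivative = h ** -mu * sum(w * e for w, e in zip(wd, e_rev))
    return Kp * err_hist[-1] + Ki * integral + Kd * derivative
```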

4 Black Widow Optimization

The algorithm was developed by Vahideh et al. [33] based on the lifestyle of the black widow spider. Generally, the female spiders are more dominant than the male spiders and are mostly active at night. Whenever a female spider wants to mate, she puts pheromone on the web and males are attracted to it. After mating, the female spider consumes the male spider. The female spider then lays eggs, as shown in Fig. 5, and they mature in 8–11 days. The hatched black widow spiderlings exhibit sibling cannibalism because of competition and a lack of food. In some special cases, the spiderlings also consume the mother slowly. Because of this, only strong black widow spiders survive their life cycle.

Fig. 5 Female black widow spider with eggs in her web [33]

4.1 Initial Population

In the algorithm, a candidate solution in the search space is called a widow and represents a black widow spider. Therefore, for an n-dimensional problem, the widow vector consists of n elements.

Widow = [x1 , x2 , x3 , . . . , xn ] (13)

The fitness values of the population are represented as

Fitness = f (x1 , x2 , x3 , . . . , xn ) (14)

The optimization algorithm starts by generating the required candidate widow matrix of size n_pop × n as the initial population. In the next stage, the next generation is produced from the initial population using procreation.

4.2 Procreate

To produce the next generation population, the male and female spiders mate in their corresponding webs. To implement this, an alpha matrix of the same length as the widow matrix is used, whose elements are random numbers. The corresponding equations are shown in (15) and (16), where x_1 and x_2 represent the parents and y_1 and y_2 represent the children.

y1 = α ∗ x1 + (1 − α) ∗ x2 (15)

y2 = α ∗ x2 + (1 − α) ∗ x1 (16)

The entire process is reiterated for n/2 times and lastly, the parents and spiderlings
are combined and sorted according to their fitness values. The number of parents
participating in procreation is decided by the procreation rate (PR).
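Equations (15) and (16) describe an arithmetic (blend) crossover with an independent random α per coordinate: each child coordinate lies between the corresponding parent coordinates, and the pair of children preserves the parents' coordinate-wise sums. A minimal sketch (the function name is illustrative):

```python
import random

def procreate(x1, x2):
    """BWO reproduction step of Eqs. (15)-(16): a random alpha vector
    blends two parent widows into two children."""
    alpha = [random.random() for _ in x1]
    y1 = [a * p + (1.0 - a) * q for a, p, q in zip(alpha, x1, x2)]
    y2 = [a * q + (1.0 - a) * p for a, p, q in zip(alpha, x1, x2)]
    return y1, y2
```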

4.3 Cannibalism

The black widow spiders exhibit three types of cannibalism. In sexual cannibalism,
after mating, the female spider consumes the male spider. In sibling cannibalism,
stronger spiderlings eat weaker spiderlings. In the third type, sometimes spiderlings
eat their mother. In the algorithm, this behavior is implemented as a selection of the population according to fitness values; the fraction of the population retained is decided by the cannibalism rating (CR).

4.4 Mutation

From the population, Mutepop spiders are selected, and mutation is applied by exchanging the values at two randomly selected positions within each spider. The process of mutation is shown in Fig. 7. The number of spiders to be mutated (Mutepop) is selected according to the mutation rate (PM).
In the algorithm, the procreation rate was chosen as 0.6, the cannibalism rate as 0.44, and the mutation rate as 0.4. The complete flow of the black widow optimization algorithm is shown in Fig. 6.
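Putting the stages together, the whole loop can be sketched as below. This is a deliberately simplified illustration of the algorithm in [33]: uniform random initialisation replaces the logistic-map initialisation of Fig. 6 and the offspring counts are chosen for brevity, while the rates PP = 0.6, CR = 0.44 and PM = 0.4 match the values quoted above.

```python
import random

def bwo_minimize(fitness, dim, lb, ub, n_pop=20, n_iter=50,
                 pp=0.6, cr=0.44, pm=0.4, seed=0):
    """Simplified Black Widow Optimization sketch.  pp/cr/pm are the
    procreation, cannibalism and mutation rates quoted in the text."""
    rng = random.Random(seed)
    new_widow = lambda: [rng.uniform(lb, ub) for _ in range(dim)]
    pop = sorted((new_widow() for _ in range(n_pop)), key=fitness)
    for _ in range(n_iter):
        parents = pop[:max(2, int(pp * n_pop))]   # best widows procreate
        offspring = []
        for _ in range(len(parents) // 2):
            x1, x2 = rng.sample(parents, 2)
            for _ in range(dim // 2 + 1):         # several alpha blends per pair
                a = [rng.random() for _ in range(dim)]
                offspring.append([ai * p + (1 - ai) * q
                                  for ai, p, q in zip(a, x1, x2)])
                offspring.append([ai * q + (1 - ai) * p
                                  for ai, p, q in zip(a, x1, x2)])
        # sibling cannibalism: only the strongest fraction of children survives
        offspring.sort(key=fitness)
        survivors = offspring[:max(1, int(cr * len(offspring)))]
        # mutation: exchange two randomly chosen positions of some widows
        for _ in range(int(pm * n_pop)):
            w = list(rng.choice(pop))
            if dim > 1:
                i, j = rng.sample(range(dim), 2)
                w[i], w[j] = w[j], w[i]
            survivors.append(w)
        pop = sorted(pop + survivors, key=fitness)[:n_pop]
    return pop[0]
```

On a simple 2-D sphere function, the elitist selection plus blend crossover drives the best widow toward the minimum.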

5 Proposed BWO-FOPID Controller

The FOPID controller provides two additional degrees of freedom, which allows designing a robust controller for a given application. The process of tuning the FOPID controller using the BWO algorithm is shown as a block diagram in Fig. 8. V_ref(s) is the reference voltage that should be maintained by the AVR system at its terminals. V_t(s) is the actual terminal voltage produced by the system. V_e(s) is the error voltage, the difference between V_ref(s) and V_t(s). For each iteration of the BWO algorithm, a population of K_p, K_i, K_d, λ, and μ values is generated and substituted into the objective function. The FOPID controller takes V_e(s) as the input signal and produces the corresponding control signal. For this signal, the AVR system terminal voltage and error are calculated. The process is repeated until the termination

Fig. 6 Flow chart of black widow optimization (BWO) algorithm [33]: generate the initial population using a logistic map, evaluate the fitness of the individuals, then repeat procreation (with randomly selected parents), cannibalism, mutation and the population update until the stop condition is met

Fig. 7 Mutation in BWO algorithm

criteria are met, using the method mentioned in Sect. 2. Finally, the best values of the parameters are identified and used to design the optimum FOPID controller.
The designed controller is then inserted into the system. The controller output is given as input to the AVR system, which produces the corresponding terminal voltage. The terminal voltage is again compared with the reference voltage and the error signal is produced. This process is repeated until the error signal becomes zero. When the desired level is reached, the controller produces a constant U(s) to hold the output at the desired terminal voltage.

Fig. 8 Block diagram of BWO-FOPID controller: the BWO algorithm tunes the parameters K_P, K_I, K_D, λ and μ of the controller K_P + K_I s^{−λ} + K_D s^{μ} in the AVR loop (amplifier, exciter, generator, sensor) by minimizing the objective function J = ZLG

During the FOPID controller design, the ZLG optimization function was used to tune the parameters. Although various standard optimization functions like IAE, ISE, ITAE, and ITSE are available, the ZLG criterion is reported to produce better results. The equation for the ZLG optimization function [13] is given in Eq. (17).

$$
\mathrm{ZLG} = (1 - e^{-\beta})(M_p + E_{ss}) + e^{-\beta}(T_s - T_r) \qquad (17)
$$

The term M_p represents the maximum peak overshoot, E_ss is the steady-state error, and T_s and T_r represent the settling time and rise time of the system, respectively. β is an adjustment parameter and is generally taken as 1 [13]. The identified values of the controller parameters are given in Table 3. The convergence curve of the BWO-FOPID algorithm during parameter identification is shown in Fig. 9. The range of parameters considered for the optimization process is given in Table 2.
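Evaluating Eq. (17) requires extracting M_p, E_ss, T_s and T_r from each candidate controller's step response. The sketch below shows one plausible way to do so; the 10–90% rise-time and 2% settling-band conventions are our assumptions, since the chapter does not state which definitions it uses.

```python
import math

def step_metrics(t, y, ref=1.0, tol=0.02):
    """Percent overshoot Mp, steady-state error Ess, settling time Ts
    (last exit from the 2% band) and 10-90% rise time Tr from a sampled
    step response; assumes the response has essentially settled by t[-1]."""
    y_final = y[-1]
    mp = max(0.0, (max(y) - y_final) / y_final * 100.0)
    ess = abs(ref - y_final)
    t10 = next(ti for ti, yi in zip(t, y) if yi >= 0.1 * y_final)
    t90 = next(ti for ti, yi in zip(t, y) if yi >= 0.9 * y_final)
    ts = 0.0
    for ti, yi in zip(t, y):
        if abs(yi - y_final) > tol * y_final:
            ts = ti
    return mp, ess, ts, t90 - t10

def zlg_cost(mp, ess, ts, tr, beta=1.0):
    """ZLG criterion of Eq. (17): overshoot and steady-state error traded
    off against settling-minus-rise time through beta."""
    return (1.0 - math.exp(-beta)) * (mp + ess) + math.exp(-beta) * (ts - tr)
```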

6 Results and Discussions

All the simulations were performed using MATLAB/Simulink (with the FOMCON toolbox) version 8.1a on a computer with an Intel i5 processor @ 3.00 GHz and 8 GB RAM. For the BWO optimization algorithm, the total population was chosen as 50 and 35 iterations were performed.

6.1 Step Response

The tuned values of the FOPID parameters are given in Table 3. A comparison of FOPID controllers designed using different optimization algorithms was made

Fig. 9 Convergence curve of the BWO algorithm (cost value vs. iterations; final cost 4.307 at iteration 35)

Table 2 Range of K_p, K_i, K_d, λ, and μ

Parameter   Lower value   Upper value
K_p         0.1           3
K_i         0.1           1
K_d         0.1           0.5
λ           0.5           1.5
μ           0.5           1.5

Table 3 Obtained values of K_p, K_i, K_d, λ, and μ

Algorithm-controller     K_p      K_i      K_d      λ        μ
BWO-FOPID (proposed)     2.6597   0.7462   0.4263   1.0106   1.3442
C-YSGA-FOPID [19]        1.7775   0.9463   0.3525   1.206    1.1273
PSO-FOPID [10]           1.5338   0.6523   0.9722   1.209    0.9702
CS-FOPID [15]            2.549    0.1759   0.3904   1.38     0.97
GA-FOPID [12]            0.9632   0.3599   0.2816   1.8307   0.5491

using the step responses displayed in Fig. 10. From the figure, it can be observed that the BWO-FOPID controller produces a lower overshoot than the others. To further investigate the controller performance, T_s, T_r, and E_ss were calculated and compared with those of the other FOPID controllers.
The BWO algorithm produces the best parameter values because of the cannibalism stage, in which the weak solutions are automatically omitted and only strong solutions survive. It is observed that the BWO-FOPID controller has a better settling time of 0.1727 s and an overshoot of only 1.2774%, and produces a very low steady-state error. The rise time of the controller is slightly higher than that of the PSO-FOPID and GA-FOPID

Fig. 10 Step response comparison of controllers (C-YSGA, PSO, CS, GA, BWO)

Table 4 Performance measures for various controllers

Algorithm-controller    Rise time (s)   Settling time (s)   Overshoot (%)   Steady-state error
BWO-FOPID (proposed)    0.1127          0.1727              1.2774          1.8972E−04
C-YSGA-FOPID [19]       0.1347          0.2                 1.89            0.009
PSO-FOPID [10]          0.0614          1.3313              22.58           0.0175
CS-FOPID [15]           0.0963          0.9774              3.56            0.0321
GA-FOPID [12]           1.3008          1.6967              6.99            0.0677

controllers. Moreover, the PSO-FOPID controller produced the highest overshoot of


22.58%, whereas the GA-FOPID controller produced a high rise time of 1.3008 s
and settling time of 1.6967 s.
Since overshoot causes more severe problems than a slow rise time in voltage regulator systems, more importance should be given to minimizing overshoot. If the operating environment of the system is strictly constrained, then a tradeoff can be made between rise time and overshoot. The performance parameters of the different FOPID controllers are compared in Table 4.

6.2 Robust Analysis

To assess the reliability of the designed controller, a robustness analysis was performed by changing the time constants of the various subsystems in the range of −20% to +20%. Step responses were plotted for variations in the τ_a, τ_e, τ_g, and τ_s values. From Fig. 11a–d, it is observed that the BWO-FOPID controller performs well even when the parameter values change over this 40% span.

Fig. 11 Step responses of the terminal voltage for −20%, −10%, +10% and +20% variation of the AVR time constants a τ_a, b τ_e, c τ_g, d τ_s

7 Conclusion

A new meta-heuristic optimization-based approach for FOPID controller design to regulate the terminal voltage of the AVR system was presented. The method uses the Black Widow Optimization algorithm to identify the optimum parameter values of the proposed controller. The results show that the BWO-tuned fractional controller produced better parameter values, owing to the cannibalism exhibited by the population, in which only the stronger individuals survive. As a result, the controller improved the rise time and settling time of the overall system with acceptable overshoot. To check the controller performance, it was compared with the C-YSGA-FOPID, PSO-FOPID, CS-FOPID, and GA-FOPID controllers. To study the behavior of the controller under parameter uncertainty, a robustness analysis was performed, and the results show that the proposed BWO-tuned FOPID controller performs decently.

Acknowledgements This work is funded by the Department of Science and Technology, DST-ICPS
division, Govt. of India, under the grant number DST/ICPS/CPS-INDIVIDUAL/2018/433(G).

References

1. Shah P, Agashe S (2016) Review of fractional PID controller. Mechatronics 38:29–41


2. Petráš I (2009) Fractional-order feedback control of a DC motor. J Electr Eng 60:117–128
3. Fan H, Sun Y, Zhang X (2007) Research on fractional order controller in servo press control
system. In: International conference on mechatronics and automation, ICMA, pp 2934–2938
4. Domingues J, Valerio D, da Costa JS (2009) Rule-based fractional control of an irrigation
canal. In: Proceedings of 35th annual conference of IEEE industrial electronics IECON ’09,
pp 1712–1717
5. Tehrani K, Amirahmadi A, Rafiei S, Griva G, Barrandon L, Hamzaoui M, Rasoanarivo I,
Sargos F (2010) Design of fractional order PID controller for boost converter based on multi-
objective optimization. In: Proceedings of 14th international power electronics and motion
control conference (EPE/PEMC), pp 179–185
6. Changmao Q, Naiming Q, Zhiguo S (2010) Fractional PID controller design of hypersonic
flight vehicle. In: Proceedings of international conference on computer, mechatronics, control
and electronic engineering (CMCE), pp 466–469
7. Ahn HS, Bhambhani V, Chen YQ (2009) Fractional-order integral and derivative controller for
temperature profile tracking. Sadhana 34:833–850
8. Zhang D-L, Tang Y-G, Guan X-P (2014) Optimum design of fractional order PID controller
for an AVR system using an improved artificial bee colony algorithm. Acta Automatica Sin
40:973–979
9. Li M, Dingyu X (2009) Design of an optimal fractional-order PID controller using multi-
objective GA optimization. In: 2009 Chinese control and decision conference, Guilin, pp 3849–
3853
10. Zamani M, Karimi-Ghartemani M, Sadati N, Parniani M (2009) Design of a fractional order
PID controller for, an AVR using particle swarm optimization. Control Eng Pract 17:1380–1387
11. Lee CH, Chang FK (2010) Fractional-order PID controller optimization via improved
electromagnetism-like algorithm. Expert Syst Appl 37:8871–8878
12. Pan I, Das S (2012) Chaotic multi-objective optimization based design of fractional order
PIλ Dμ controller in AVR system. Int J Electr Power Energy Syst 43:393–407
13. Gaing ZL (2004) A particle swarm optimization approach for optimum design of PID controller
in AVR system. IEEE Trans Energy Convers 19:384–391
14. Tang Y, Cui M, Hua C, Lixiaong L, Yang Y (2012) Optimum design of fractional order PIλ Dμ
controller for AVR system using chaotic ant swarm. Expert Syst Appl 39:6887–6896
15. Sikander A, Thakur P, Bansal RC, Rajasekar S (2018) A novel technique to design cuckoo
search based FOPID controller for AVR in power systems. Comput Electr Eng 70:261–274
16. Zeng GQ, Chen J, Dai YX, Li LM, Zheng CW, Chen MR (2015) Design of fractional order
PID controller for automatic regulator voltage system based on multi-objective extremal
optimization. Neurocomputing 160:173–184
17. Ortiz-Quisbert ME, Duarte-Mermoud MA, Milla F, Castro-Linares R, Lefranc G (2018)
Optimal fractional order adaptive controllers for AVR applications. Electr Eng 100:267–283
18. Pan I, Das S (2013) Frequency domain design of fractional order PID controller for AVR system
using chaotic multi-objective optimization. Int J Electr Power Energy Syst 51:106–118
19. Micev M, Ćalasan M, Oliva D (2020) Fractional order PID controller design for an AVR system using chaotic yellow saddle goatfish algorithm. Mathematics, MDPI, pp 1–21
20. Zhang H, Zhou J, Zhang Y, Fang N, Zhang R (2013) Short term hydrothermal scheduling using
multi-objective differential evolution with three chaotic sequences. Int J Electr Power Energy
Syst 47:85–99
21. Dos Coelho LS, Alotto P (2008) Multi-objective electromagnetic optimization based on a
nondominated sorting genetic approach with a chaotic crossover operator. IEEE Trans Magn
44:1078–1081
22. Khan IA, Alghamdi AS, Jumani TA, Alamgir A, Awan AB, Khidrani A (2019) Salp swarm
optimization algorithm-based fractional order PID controller for dynamic response and stability
enhancement of an automatic voltage regulator system. Electronics 8:1472

23. Bingul Z, Karahan O (2018) A novel performance criterion approach to optimum design of
PID controller using cuckoo search algorithm for AVR system. J Frankl Inst 355:5534–5559
24. Mosaad AM, Attia MA, Abdelaziz AY (2018) Comparative performance analysis of AVR
controllers using modern optimization techniques. Electr Power Compon Syst 46:2117–2130
25. Ekinci S, Hekimoglu B (2019) Improved kidney-inspired algorithm approach for tuning of PID
controller in AVR System. IEEE Access 7:39935–39947
26. Mosaad AM, Attia MA, Abdelaziz AY (2019) Whale optimization algorithm to tune PID and
PIDA controllers on AVR system. Ain Shams Eng J 10:755–767
27. Blondin MJ, Sanchis J, Sicard P, Herrero JM (2018) New optimal controller tuning method for
an AVR system using a simplified ant colony optimization with a new constrained Nelder–Mead
algorithm. Appl Soft Comput J 62:216–229
28. Calasan M, Micev M, Djurovic Z, Mageed HMA (2020) Artificial ecosystem-based optimiza-
tion for optimal tuning of robust PID controllers in AVR systems with limited value of excitation
voltage. Int J Electr Eng Educ 1:1–25
29. Al Gizi AJH, Mustafa MW, Al-geelani NA, Alsaedi MA (2015) Sugeno fuzzy PID tuning by genetic-neural for AVR in electrical power generation. Appl Soft Comput J 28:226–236
30. Daniel Z, Bernardo M, Alma R, Arturo V-G, Erik C, Marco P-C (2018) A novel bio-inspired
optimization model based on Yellow Saddle Goatfish behavior. BioSystems 174:1–21
31. Monje CA, Vinagre BM, Chen YQ, Feliu V, Lanusse P, Sabatier J (2004) Proposals for fractional
PIλ Dμ tuning. In: Proceedings of 1st IFAC workshop on fractional derivatives and applications,
Bordeaux, France
32. Monje CA, Vinagre BM, Feliu V, Chen Y (2008) Tuning and auto-tuning of fractional order
controllers for industry applications. Control Eng Pract 16:798–812
33. Hayyolalam V, Kazem AAP (2020) Black widow optimization algorithm: a novel meta-heuristic approach for solving engineering optimization problems. Eng Appl Artif Intell 87:103249, pp 1–28
34. Monje CA, Chen Y, Vinagre BM, Xue D, Feliu-Batlle V (2010) Fractional-order systems and
controls: fundamentals and applications. Springer, Berlin
LSTM Network for Hotspot Prediction
in Traffic Density of Cellular Network

S. Swedha and E. S. Gopi

Abstract This paper implements long short-term memory (LSTM) network to pre-
dict hotspot parameters in traffic density of cellular networks. The traffic density
depends on numerous factors like time, location, number of mobile users connected
and so on. It exhibits spatial and temporal relationships. However, only certain regions
have higher data rates, known as hotspots. A hotspot is defined as a circular region
with a particular centre and radius where the traffic density is the highest compared
to other regions at a given timestamp. Forecasting traffic density is very important,
especially in urban areas. Prediction of hotspots using LSTM would result in better
resource allocation, beam forming, hand overs and so on. We propose two meth-
ods, namely log likelihood ratio (LLR) method and cumulative distribution function
(CDF) method to compute the hotspot parameters. On comparing the performances
of the two methods, it can be concluded that the CDF method is more efficient and
less computationally complex than the LLR method.

Keywords Hotspot · LSTM · LLR · CDF · Traffic density · Cellular networks

1 Introduction

Wireless cellular networks consist of several base stations in a given geographical


area. The traffic density of a particular network in a specific area depends on numer-
ous factors like time, location, number of base stations, users connected and so on.
Traffic density of cellular networks can be computed as the number of users access-
ing or packets/bytes transmitted by every base station in a given area. The temporal
autocorrelation and spatial correlation among neighbouring base stations of cellular
network data are nonzero (refer [1, 2]). However, at a given time, only certain loca-
tions in the given area have a high influx of traffic density. We term such locations as
‘hotspots’. Forecasting the traffic density of mobile users with high accuracy, espe-

S. Swedha (B) · E. S. Gopi


National Institute of Technology, Tiruchirappalli, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 35


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_3
36 S. Swedha and E. S. Gopi

cially in urban areas, is the need of the hour (refer [3]). The aim of the paper is to
identify such hotspots using two different methods and predict the future hotspot in
the given area.
Spatio-temporal neural network architectures have also been proposed by previous
scientists using deep neural networks to address the same. It has been concluded that
long-term prediction is enhanced through such methods. Thus, the importance of deep
learning algorithms in mobile and wireless networking and even hotspot prediction
has been further imprinted (refer [4]).
Emerging hybrid deep learning models for temporal and spatial modelling with
the help of LSTM and auto-encoder-based deep model, respectively, have been
researched upon previously (refer [1]). Our motivation to employ LSTM is further
confirmed through references such as [1–3, 5–8]. For instance, sequence learning
using LSTM networks incorporating a general end-to-end approach to predict tar-
get variables of very long sequences has been presented with minimal assumptions
about the data sequence [6]. It has been concluded that LSTM network’s perfor-
mance is high even for very long sequences. Reference [5] reviews an illustrative
benchmark problem wherein conventional LSTM outperforms RNN. The shortcom-
ings of LSTM networks are addressed by proposing forget gates (refer [5]). It solves
the problem of processing input streams that are continuous and do not have marked
sequence ends. Thus, we employ LSTM for prediction and compare the performance
of the two proposed methods.
The data is represented as images for better visualization and interpretation (refer
[9]). There are some areas in these images that are more dense than others, which
form the hotspot region at a given timestamp.
In the first method, called the log likelihood ratio (LLR) method (refer [10]), we
find the centre and radius of hotspot using LLR. We consider two hypotheses: the
null hypothesis, H0 , represents the assumption that the traffic density to be uniformly
distributed and the other hypothesis, H1 , represents the actual distribution of traffic
density. We find the LLR for the two hypotheses and maximize it to obtain the hotspot
parameters. We train an LSTM network with input as sequence of raw data for 10
consecutive timestamps and target variable as hotspot parameters of 11th timestamp.
In the second method, called the cumulative distribution function (CDF) method,
we find the CDF starting from the centre of the hotspot found through LLR method
by increasing the radius from a minimum radius to the maximum radius that will
cover the entire image. We use CDF to compute the expectation value in each contour
which is the area between two concentric circles and plot it as an image. Using the
CDF, we determine the radius of hotspot as the least radius whose CDF is greater
than a threshold value fixed by us depending on the data. We train an LSTM network
with input as sequence of CDF for 10 consecutive timestamps and target variable as
hotspot parameters of 11th timestamp, where the radius of hotspot is same as radius
computed using CDF.
The proposed methods differ from the existing methods in the sense that it predicts
the hotspot parameters using LSTM. The second proposed method in fact makes use
of CDF to reduce complexity and compute the hotspot parameters. Further, the data
LSTM Network for Hotspot Prediction in Traffic … 37

is visualized as images with the hotspot region plotted on the image in order to better
understand the physical implication of the data attained.
In Sect. 2, we will briefly discuss how the data representing traffic density can be
visualized as images. We use a matrix to store the data and scale it to 255 in order to
represent it as images. We use MATLAB to plot the images. Section 3 discusses at
length about the LLR algorithm and its implementation to find the hotspot parameters.
Section 4 describes the CDF method’s algorithm to compute hotspot parameters and
each contour’s expectation value. In Sect. 5, the results of the two proposed methods
are compared. The various applications and extensions of the work presented in this
paper are discussed in Sect. 6.

2 Dataset Collection and Representation

The dataset contains the traffic density at a city level scale for one week, collected
at each hour (refer [11]). It consists of base station number and the number of users,
bytes and packets accessing that particular base station at a given hour. The corre-
sponding latitude and longitude of each base station is given in a separate file.
In order to represent the given data as an image for every timestamp, we fix the
image size as 151 × 151 (see Fig. 1). The latitude and longitude values are normalized using their minimum and maximum values and scaled to 150, with latitude serving as the x coordinate and longitude serving as the y coordinate of the image. A matrix M, of dimension
151 × 151, is formed, and corresponding to the location of base station, the appro-
priate element of the matrix is assigned as the number of users. It must be noted that
the number of packets and bytes is not used to represent the data as image. If more
than one base station’s location corresponds to the same pixel coordinates, then we
add the data value to the already existing value at the corresponding element of M.
It is then scaled to 255 to represent it as image. Finally, 100 times the logarithm
of the matrix is represented as an image using MATLAB functions. This process is
repeated for all the timestamps.
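The construction of M described above can be sketched as follows. The function name, the +1 offset inside the logarithm (to keep empty pixels at zero) and the peak-based normalisation are our assumptions; the chapter itself states only that the matrix is scaled to 255 and that 100 times its logarithm is displayed.

```python
import math

def traffic_image(records, lat_rng, lon_rng, size=151):
    """Accumulate per-base-station user counts for one timestamp into a
    size x size matrix M, then apply the 255 scaling and 100*log display
    transform described in Sect. 2.  records: (lat, lon, users) triples;
    lat_rng/lon_rng: (min, max) tuples used for normalisation."""
    M = [[0.0] * size for _ in range(size)]
    for lat, lon, users in records:
        x = round((lat - lat_rng[0]) / (lat_rng[1] - lat_rng[0]) * (size - 1))
        y = round((lon - lon_rng[0]) / (lon_rng[1] - lon_rng[0]) * (size - 1))
        M[x][y] += users     # co-located base stations accumulate
    peak = max(max(row) for row in M) or 1.0
    return [[100.0 * math.log(1.0 + v / peak * 255.0) for v in row]
            for row in M]
```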

3 Hotspot Prediction Using LLR Method

In this proposed solution, we define and identify a hotspot using log likelihood ratio
(LLR) method as to be discussed in Sect. 3.1. After identifying the hotspot parameters
which are the centre and radius of hotspot, we proceed to train the LSTM network
to predict the hotspot parameters of future timestamp. For this, we reshape the entire
matrix of dataset after normalization and use it as input to the LSTM network. We
give raw data values of 10 consecutive timestamps as input to the LSTM network and
use 11th timestamp’s hotspot parameters calculated through LLR method as target
variables.

Fig. 1 Representation of traffic density data as image for 115th timestamp

3.1 Algorithm to Find Hotspot Using LLR

A hotspot is defined as a region where the traffic density is high. It can be of any shape.
Based on the previous work done in [10], it has been concluded that circular hotspots
are better than ring hotspots. Thus, in this paper, we consider it to be circular. To
identify the coordinates of the centre and radius of hotspot, we employ ‘log likelihood
ratio’ (LLR) method.
Consider the total region to be represented as S. At a given location on the image
with pixel coordinates (x, y), consider a circular region R of radius r . At a given
timestamp, let J be the total number of users in the entire region, K be the expected number of users within the circular region R assuming uniform distribution, where K = J × area(R)/area(S), and L be the actual number of users within the circular region R. Let z be a random variable vector with 22,801 (151 × 151) elements, where each element
represents each pixel point of the image, taking values 0 and 1. We assume that the
elements are independent of each other. A zero represents that the the pixel is outside
the circle constructed, and one represents that the pixel is within the circle. In other
words, the values 0 and 1 represent whether or not the pixel lies outside the hotspot
region chosen.
Consider the null hypothesis H0 as a uniform distribution of traffic density across the given area for the given circular region R. Under the null hypothesis H0, the probability that a pixel is within the hotspot is K/J and the probability that a pixel is outside the hotspot is (J − K)/J. Since L users are within the hotspot R, the probability that z follows H0 is

$$
p(z|H_0) = \left(\frac{K}{J}\right)^{L} \left(\frac{J-K}{J}\right)^{J-L}
$$
Consider hypothesis H1 as the actual nonuniform distribution of traffic density
across the given area. At a given location on the image with pixel coordinates (x, y),
consider a circular region R of radius r . For a given circular region R with L users
inside it and considering the hypothesis H1 , the probability that a pixel is within the
LSTM Network for Hotspot Prediction in Traffic … 39

Fig. 2 a Image of 115th timestamp, b Hotspot computed using LLR method for 115th timestamp

hotspot is L/J, and the probability that a pixel is outside the hotspot is (J − L)/J. Since
L users are within the hotspot R, the probability that z follows H1 is p(z|H1) =
(L/J)^L ((J − L)/J)^(J−L).
Thus, the log likelihood ratio is defined as LLR = log(p(z|H1)/p(z|H0)) = L log(L/K) +
(J − L) log((J − L)/(J − K)). In order to find the hotspot, we need to maximize the LLR. We
increase the radius r from 4 to 8 units in steps of 0.1 unit and traverse every pixel
location. The circle having the maximum LLR value is labelled as the hotspot (see Fig. 2).
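A brute-force NumPy sketch of this search follows. The function name and grid handling are our own assumptions; K uses the continuous circle area π r², and degenerate circles (L ≤ K, L ≥ J) are skipped since their LLR cannot be maximal:

```python
import numpy as np

def find_hotspot(density, radii):
    """Exhaustive LLR search for a circular hotspot over all pixel centres.

    density: 2-D array of user counts per pixel.
    Returns (best LLR, radius, (x, y) centre) of the maximising circle.
    """
    side = density.shape[0]
    J = density.sum()                             # total users in region S
    ys, xs = np.mgrid[0:side, 0:side]
    best = (-np.inf, None, None)
    for r in radii:
        K = np.pi * r * r * J / (side * side)     # expected users under H0
        for cx in range(side):
            for cy in range(side):
                mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= r * r
                L = density[mask].sum()           # actual users in the circle
                if 0 < K < L < J:                 # skip degenerate circles
                    llr = L * np.log(L / K) + (J - L) * np.log((J - L) / (J - K))
                    if llr > best[0]:
                        best = (llr, r, (cx, cy))
    return best
```

On the paper's 151 × 151 images with radii swept from 4 to 8 in steps of 0.1, this naive triple loop is O(pixels² × radii); a vectorized or integral-image implementation would be preferable in practice.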

3.2 LSTM Architecture to Predict Future Hotspot Using LLR

Matrix M is reshaped into a 22,801-dimensional vector, which is normalized. A
sequence of vectors from 10 consecutive timestamps is given as input to the LSTM
network. The target variable consists of the 11th timestamp's normalized hotspot
parameters (r, x, y), where r is the radius of the circle and (x, y) is its centre. The
LSTM architecture (see Fig. 3) consists of two layers. It uses the default functions,
namely the hyperbolic tangent function and the sigmoid function, at the input gate, forget
gate and output gate (refer [12]). x_t represents the input, which in our case is 10
consecutive vectors of size 22,801 each, and y_t represents the output of the LSTM cell.
y_{t−1} represents the previous output from the cell. ig_t, fg_t, og_t and cs_t are the input
gate, forget gate, output gate and cell state vectors, respectively, at the current instant
t. P and Q are the weight matrices, and b denotes the gate biases. The equation at
the input gate is ig_t = sigma(P_i y_{t−1} + Q_i x_t + b_i). The equation at the forget
gate is fg_t = sigma(P_f y_{t−1} + Q_f x_t + b_f). The equation at the output
gate is og_t = sigma(P_o y_{t−1} + Q_o x_t + b_o). The cell state vector is
given by cs_t = fg_t .* cs_{t−1} + ig_t .* tanh(P_cs y_{t−1} + Q_cs x_t + b_cs), and the equation
of the output y_t is given by y_t = og_t .* tanh(cs_t), where .* represents element-wise

Fig. 3 LSTM cell diagram

Fig. 4 Traffic density values of 10 consecutive traffic density images (time 105 to time 114) as
plotted in Fig. 1 are reshaped into a vector each of size 22,801. The raw data is given as input
sequence to the LSTM. It predicts the hotspot parameters of the 11th timestamp (time115). We
have plotted the circle found through LLR computation (red) and LSTM-LLR predicted circle
(black). The zoomed portion of the predicted hotspot can be seen in the 12th subplot

vector product (refer [12]). The ten consecutive timestamps and its prediction of
hotspot parameters of 11th timestamp through LSTM-LLR can be seen in Fig. 4 for
timestamp 115.
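The gate equations above can be written directly in NumPy. This is a minimal sketch of a single cell update; the weight shapes, dictionary keys, and function names are our own conventions:

```python
import numpy as np

def sigma(a):
    """Logistic sigmoid used at the input, forget and output gates."""
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, y_prev, cs_prev, P, Q, b):
    """One LSTM cell update following the equations of Sect. 3.2.

    For each gate k in {'i', 'f', 'o', 'cs'}: P[k] is an (h, h) recurrent
    weight matrix, Q[k] an (h, d) input weight matrix, b[k] an (h,) bias.
    """
    ig = sigma(P['i'] @ y_prev + Q['i'] @ x_t + b['i'])   # input gate
    fg = sigma(P['f'] @ y_prev + Q['f'] @ x_t + b['f'])   # forget gate
    og = sigma(P['o'] @ y_prev + Q['o'] @ x_t + b['o'])   # output gate
    # cell state: forget part of the old state, add gated new candidate
    cs = fg * cs_prev + ig * np.tanh(P['cs'] @ y_prev + Q['cs'] @ x_t + b['cs'])
    y_t = og * np.tanh(cs)                                # cell output
    return y_t, cs
```

Running this step over the 10 input vectors (with the final dense mapping to (r, x, y) omitted here) reproduces the recurrence the two-layer network unrolls during training.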

4 Hotspot Prediction Using CDF Method

In the first proposed solution, we are reshaping the entire normalized matrix of dataset
and using it as input to the LSTM layer. This has higher complexity due to its large
size. Hence, our aim is to predict the hotspot parameters through a simpler approach.
One such approach is to use ‘cumulative distribution function (CDF)’. It provides a
better representation of the dataset. In this method, we compute expectation values
within contours which are concentric circles from the centre of hotspot calculated
through LLR method and represent it as an image. We give CDF values appended

with centre coordinates (for each timestamp) of 10 consecutive timestamps as input to
the LSTM network and use the 11th timestamp's hotspot parameters calculated through
CDF method as target variables.

4.1 Algorithm to Find Hotspot Using Cumulative Distribution Function

4.1.1 Computation of CDF

We take the centre of hotspot computed using the LLR method. We consider a radius of 4
units and increase it by 1 unit until all the pixels are covered. Since it is a square image,
the maximum radius would be √2 times the length of the side of the square. This is because
the maximum radius would occur when the hotspot is at the corner of the square.
Thus, the radius is incremented from 4 to √2 × 151, which is approximately
213 units, in steps of 1 unit. We then add the values at each pixel within the circle
considered at each iteration and divide it by the sum of all values at all pixels. We
store this value in a 210-dimensional vector. The CDF is computed in this manner.
As one can expect, the last element of CDF vector will always be 1. We take the least
radius as hotspot radius for which the CDF value is greater than 0.1. We have taken
the value as 0.1 after experimentation. At this value, the radius calculated through
LLR method and radius calculated through CDF method almost coincide (see Fig. 5
as an example). This value may differ depending on the dataset.
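A NumPy sketch of this CDF computation and the threshold rule for the radius follows. The function names are our own, and pixel membership uses the squared-distance mask as in the LLR step:

```python
import numpy as np

def cdf_vector(density, centre, r_min=4, r_max=213):
    """Cumulative fraction of traffic inside circles of growing radius about
    the LLR hotspot centre (Sect. 4.1.1): 210 values for radii 4..213.
    """
    side = density.shape[0]
    ys, xs = np.mgrid[0:side, 0:side]
    d2 = (xs - centre[0]) ** 2 + (ys - centre[1]) ** 2
    total = density.sum()
    radii = np.arange(r_min, r_max + 1)
    cdf = np.array([density[d2 <= r * r].sum() / total for r in radii])
    return radii, cdf

def hotspot_radius(radii, cdf, threshold=0.1):
    """Least radius whose CDF exceeds the empirically chosen threshold."""
    return radii[np.argmax(cdf > threshold)]
```

As the text notes, the last CDF element is always 1 once the circle covers the whole image, and the 0.1 threshold is data-dependent.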

4.1.2 Computation of Values of Contours

We start from the centre of hotspot computed using LLR method. We increase the
radius from 4 to 213 units by steps of 1 unit. Consider a 151 × 151 dimensional
matrix Contour to store values of each contour. We count the number of pixels
within a ring of inner radius r − 1 and outer radius r where r > 4 and the number of
pixels within a circle of radius r = 4. For the pixels inside the ring, the corresponding
elements of Contour matrix are assigned as difference between CDF value at r and
CDF value at r − 1 multiplied by the number of pixels inside the ring when r > 4.
For r = 4 and pixels inside a circle of radius r = 4, the corresponding elements of
Contour matrix are assigned as CDF value at r multiplied by the number of pixels
inside the circle of radius r = 4. Once all the elements of Contour matrix have been
assigned, 100 times logarithm of Contour matrix is plotted along with the hotspots
from both methods (see Fig. 5).
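The contour assignment can be sketched as follows, applying the text's rule directly: the innermost circle gets its CDF value times its pixel count, and each ring gets the CDF increment times the ring's pixel count. The helper name is illustrative:

```python
import numpy as np

def contour_matrix(density, centre, radii, cdf):
    """Build the Contour matrix of Sect. 4.1.2 from a precomputed CDF vector.

    radii[0] (= 4) fills its whole circle; each larger radius r fills the
    ring (r-1, r] with (CDF(r) - CDF(r-1)) x (number of pixels in the ring).
    """
    side = density.shape[0]
    ys, xs = np.mgrid[0:side, 0:side]
    d2 = (xs - centre[0]) ** 2 + (ys - centre[1]) ** 2
    contour = np.zeros((side, side))
    for i, r in enumerate(radii):
        if i == 0:
            mask = d2 <= r * r                          # innermost circle
            value = cdf[0] * mask.sum()
        else:
            mask = ((r - 1) ** 2 < d2) & (d2 <= r * r)  # ring (r-1, r]
            value = (cdf[i] - cdf[i - 1]) * mask.sum()
        contour[mask] = value
    return contour
```

Plotting 100 × log of this matrix, as described above, then yields the contour images of Fig. 5.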

Fig. 5 (i) and (ii) represent timestamps 115 and 116, respectively. a Contour images. The red
and green circles represent the hotspot regions obtained through LLR and CDF methods, respec-
tively. Note that in the 115th timestamp, the two circles have coincided. b Cumulative distribution
function’s stem graphs

4.2 LSTM Architecture to Predict Future Hotspot Using CDF

CDF vectors of all timestamps are already in the normalized form. A sequence of
vectors with CDF value of each timestamp appended with the normalized x and y
coordinates of hotspot of that timestamp from 10 consecutive timestamps is given
as input to the LSTM network. The target variable consists of 11th timestamp’s
normalized hotspot parameters, (r, x, y) where r is the radius of circle and (x, y)
is the centre of circle. Thus, the input size becomes 210 + 2, which is 212. The
LSTM architecture consists of two layers (see Fig. 6). It uses the default functions,
namely hyperbolic tangent function and sigmoid function, at the input gate, forget
gate and output gate (refer [12]). xt represents the input, which in our case is 10
consecutive vectors of size 212 each and yt represents output of the LSTM cell. yt−1
represents the previous output from the LSTM cell. The equations of the LSTM cell

Fig. 6 LSTM cell diagram

Fig. 7 The CDF values of 10 consecutive contour images (time 105 to time 114) as plotted in
Fig. 5 are initialized as a vector each of size 210. It is appended with normalized values of centre
of hotspot parameters of the 10 consecutive timestamps. It is given as input sequence (each of size
212) to the LSTM network. It predicts the hotspot parameters of the 11th timestamp (time115).
We have plotted the circle found through LLR computation (red) and CDF computation (green) and
LSTM-CDF prediction (yellow). In this case, the circles have coincided. Hence, only the LSTM-
CDF predicted circle (yellow) can be seen. The zoomed portion of the predicted hotspot can be
seen in the 12th subplot

are same as those in Sect. 3.2. The ten consecutive timestamps and its prediction of
hotspot parameters of 11th timestamp through LSTM-CDF can be seen in Fig. 7 for
timestamp 115.
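The 212-dimensional input construction can be sketched as below; the helper name is our own:

```python
import numpy as np

def cdf_inputs(cdfs, centres, window=10):
    """LSTM-CDF input sequences (Sect. 4.2): each timestamp's 210-element CDF
    vector is appended with its normalized hotspot centre (x, y) -> 212 dims.

    cdfs: (T, 210), centres: (T, 2). Returns (T - window, window, 212).
    """
    feats = np.concatenate([cdfs, centres], axis=1)   # (T, 212) per timestamp
    return np.stack([feats[t:t + window]
                     for t in range(len(feats) - window)])
```

Compared with the 22,801-dimensional raw-pixel inputs of the LSTM-LLR method, this reduces the per-timestamp input size by roughly two orders of magnitude, which is the complexity advantage the paper highlights.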

5 Results

Different stages of implementation of the two methods discussed can be seen in
Fig. 8. The zoomed portions of Fig. 8 can be seen in Fig. 9. It shows how close
the circles computed through CDF method and predicted using LSTM-LLR and
LSTM-CDF are with that computed through LLR method. The hotspot values can
be seen in Fig. 12. Columns 3 and 5 represent the radius in terms of kilometres and
centre coordinates (x, y) in terms of latitude and longitude, respectively. When the

Fig. 8 (i)–(v) represent timestamps 113–117, respectively. a Representation of traffic density as
images as described in Fig. 1. b Hotspot predicted by LLR method as described in Fig. 2. c Contour
images obtained through CDF method along with hotspot regions predicted by LLR and CDF
method as described in Fig. 5. d Hotspot regions predicted by LSTM-LLR (black circle) and LSTM-
CDF (yellow circle) methods along with the hotspot regions computed through LLR (red (see Fig. 2))
and CDF (green (see Fig. 5)) methods

Fig. 9 Represents the zoomed portions of each timestamp (113–117) in Fig. 8d



Fig. 10 (i) loss function of LSTM network (ii) normalized error in prediction of radius for testing
data (iii) normalized error in prediction of x coordinate of centre of hotspot for testing data (iv)
normalized error in prediction of y coordinate of centre of hotspot for testing data for a LLR method
b CDF method

Fig. 11 Represents the average error in hotspot parameters when different methods (LLR and CDF)
are compared with LSTM-LLR and LSTM-CDF predictions

Fig. 12 Represents the hotspot parameter values found through LLR and CDF computation and
their corresponding values in the real world for the given data set

radius differs by 1 unit on the image, it translates to a difference of 0.221 km (see
columns 2, 3, 4 and 5 of row 1 in Fig. 12). The comparison of the performance of
the LSTM networks of both methods can be seen in Figs. 10 and 11. The average
error in predicting the hotspot parameters can be seen in Fig. 11. The first column
describes which method and prediction are compared. The LSTM-CDF prediction
method (with radius computed using LLR value or CDF threshold) performs better
than LSTM-LLR prediction method in predicting the centre of hotspot as its average

error between predicted and actual values of x- and y-coordinates of the centre of
hotspot is lesser than those of LSTM-LLR prediction method (see rows 1 and 2 of
Figs. 11 and 10), although the average error between predicted and actual values of
radius predicted by the LSTM-CDF prediction method is more than that through
LSTM-LLR prediction method (see row 1, column 1 of Figs. 11 and 10). However,
the total average error in predicting hotspot parameters using LSTM-CDF prediction
using both CDF threshold and LLR value on comparison with LSTM-LLR method
is lesser (see Fig. 11). This can be further verified by seeing Figs. 4, 7 and 9. It
can be noted that prediction of centre of hotspot region is important for efficient
resource allocation and LSTM-CDF prediction even with the radius computed using
CDF threshold performs well. The LSTM architecture in CDF method is of lesser
complexity as the input size is lesser than the LSTM architecture for LLR method
(see Figs. 3 and 6). Depending on the application, the contour rings after a certain
CDF value need not be considered. This is because the CDF value tends to become
nearly the same, with marginal difference beyond a certain radius. This need not be
represented in the image if that particular application does not require them (Fig. 12).

6 Conclusions

The prediction of hotspot and computation of contour can be used to steer the antenna
to the desired direction. This would help in better reception of signals and increase
signal-to-noise ratio. It would result in better allocation of bandwidth. Further, the
beamforming capabilities are enhanced through the prediction of the hotspot, as the
antenna can focus better on the denser regions. In fact, the contour images give
a better description of traffic density, as they also estimate an expected value for each
contour. In this manner, a more efficient resource allocation takes place [13]. Thus,
when a mobile user is handed over from one base station to another, the result is a
smooth handover.
As it can be seen from the training data’s images, there is only one hotspot in
every timestamp. However, this might not be the case for a different geographical
region. In such cases, the image can be divided into parts (say 4), and for each part,
a local hotspot can be computed using the algorithms mentioned in this paper. The
overall hotspot of the timestamp would be the one that corresponds to the highest
LLR value. While training the LSTM network, the target variables would be the
parameters of all local hotspots.

References

1. Wang J, Tang J, Xu Z, Wang Y, Xue G, Zhang X, Yang D (2017) Spatiotemporal modeling and
prediction in cellular networks: a big data enabled deep learning approach. In: IEEE INFOCOM
2017—IEEE conference on computer communications

2. Feng J, Chen X, Gao R, Zeng M, Li Y (2018) DeepTP: an end-to-end neural network for mobile
cellular traffic prediction. IEEE Netw 32(6):108–115
3. Zhang C, Patras P (2018) Long-term mobile traffic forecasting using deep spatio-temporal
neural networks. In: Mobihoc ’18: proceedings of the eighteenth ACM international symposium
on mobile ad hoc networking and computing, pp 231–240
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a
survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Gers FA, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with
LSTM. In: Proceedings of ICANN’99 international conference on artificial neural networks
(Edinburgh, Scotland), vol. 2. IEE, London, pp 850–855
6. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
Adv Neural Inf Process Syst J
7. Huang C-W, Chiang C-T, Li Q (2017) A study of deep learning networks on mobile traffic
forecasting. In: 2017 IEEE 28th annual international symposium on personal, indoor, and
mobile radio communications (PIMRC)
8. Chen L, Yang D, Zhang D, Wang C, Li J, Nguyen T-M-T (2018) Deep mobile traffic forecast
and complementary base station clustering for C-RAN optimization. J Netw Comput Appl
00:1–12
9. Zhang C, Zhang H, Yuan D, Zhang M (2018) Citywide cellular traffic prediction based on
densely connected convolutional neural networks. IEEE Commun Lett 22(8):1656–1659
10. Nair SN, Gopi ES (2019) Deep learning techniques for crime hotspot detection. In: Optimization
in machine learning and applications, algorithms for intelligent systems, pp 13–29
11. Chen X, Jin Y, Qiang S, Hu W, Jiang K (2015) Analyzing and modeling spatio-temporal
dependence of cellular traffic at city scale. In: 2015 IEEE international conference on
communications (ICC)
12. Lu Y (2016) Empirical evaluation of a new approach to simplifying long short-term memory
(LSTM). In: arXiv:1612.03707 [cs.NE]
13. Alawe I, Ksentini A, Hadjadj-Aoul Y, Bertin P (2018) Improving traffic forecasting for 5G
core network scalability: a machine learning approach. IEEE Netw 32(6):42–49
Generative Adversarial Network
and Reinforcement Learning to Estimate
Channel Coefficients

Pranav Mani , E. S. Gopi , Hrishikesh Shekhar , and Sharan Chandra

Abstract The emergence of massive multiple-input multiple-output (MIMO)
systems throughout the world, due to the promise of enhanced data rates, has led
to an increasing need to guarantee accuracy. There is little value in large data rates if
the channel state information (CSI) is subject to frequent contamination. In the con-
text of massive MIMO systems, error in decoding the signal is introduced mainly due
to two key factors: (i) intercell interference (ii) intracell interference. The problem
of extracting the information signal from the contaminated signal can be interpreted
as a signal separation problem where all the signals involved are Gaussian. A two-
step approach is proposed to achieve this. First, a generative adversarial network
(GAN) is used, to learn the distribution of three Gaussian sources (the desired signal,
interference, and noise) from their mixture. The learnt distributions yield the mean
and variances of three Gaussian signals. The variances predicted by the GAN, along
with the sum, are used to generate the three original signals. This is interpreted as a
reinforcement learning (RL) problem. Such an interpretation provides for life-long
learning with decreasing error in the estimated signals. As a result, the desired signal
is recovered from a corrupted signal, in a two-step process.

Keywords Multiple-input multiple-output (MIMO) · Channel state information
(CSI) · Artificial neural networks (ANNs) · Deep learning (DL) · Generative
adversarial networks (GANs) · Reinforcement learning (RL)

1 Introduction

Massive MIMO systems are one of the primary emerging 5G wireless communi-
cation technologies [1] and continue to grow in popularity. These systems rely on
orthogonality of a user’s channel vector with the received signal, hence allowing a

All authors have given equal contribution.

P. Mani (B) · E. S. Gopi · H. Shekhar · S. Chandra


Department of Electronics and Communication Engineering, National Institute of Technology,
Tiruchirappalli 620015, India
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021 49
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_4

computationally inexpensive operation to recover the message signal [2]. However,
recovering the required user’s channel coefficient is challenging. In practice, a “pilot”
data is sent across the channel to calculate the required channel coefficient values,
but the values observed are corrupted by both inter- and intracell interference, as well
as noise. In recent times, deep learning (DL) algorithms [3] have played a pivotal
role in mobile and wireless networking [4] and hence, the authors look to leverage
the power of DL-based methodologies to attack this problem.
The disturbances produced due to the inter- and intracell interference can be
modeled in the form of Gaussian distributions. Hence, the problem of extracting
the channel coefficient can be reduced to the separation of three Gaussian variables.
This paper deals with the separation of the observed channel coefficients at the base
station into the intended signals, intracell interference and intercell interference. The
technique utilized in this paper to deduce the value of the channel coefficient is
broken down into two key parts. The first part deals with estimating the individual
distributions given the sum. The subsequent part deals with sampling values from
the distributions modelled in the first step, to estimate values of each of the signals
that comprise the sum.
Generative adversarial networks (GANs) [5] have shown promise in wireless
channel modeling [6]. In this paper, the authors look to tap into this potential and
hence in Sect. 3.2, a GAN approach is proposed to explore isolation of the constituent
Gaussian distributions observed in the received signal. The use of GANs is based
on their ability to learn underlying data space distributions without explicit parame-
terization of output density functions. The sum is fed into three separate GANs and
each GAN is tasked with learning one of the three constituent distributions. In this
paper, the authors collectively refer to this approach of using an array of GANs as a
“Bank of GANs” approach. Utilizing the bank of GANs, the constituent parameters
of the Gaussian distributions are derived. In this case, these parameters are the mean
and variance.
Section 3.3 explores the use of a neural network with an MMSE loss function,
which can be interpreted as a reinforcement learning (RL) [7] problem, to sample
three signals from the learnt distributions conditioned on the value of the sum of the
signals which is available at the receiver’s side. In our context, the state of the agent
is made up of three variances deduced by the GANs along with the observed value
of the channel coefficient, i.e., the Gaussian sum or the mixture signal. An action
then consists of using this environment state information to estimate the values of
the original uncontaminated channel coefficient, intercell interference signal, and
intracell interference signal. The reward is set as a negative of the mean squared
error (MSE). Maximization of the reward is then equivalent to minimization of the
MSE loss function between the predicted signals and the actual signals that are
present in the training data. It can also be noted that an interpretation as a single-
step RL problem can be extended to enable lifelong learning. This would then allow
for the estimate of the signals to adapt to changes in the physical properties of the
channel with time.

2 Contributions of the Paper

In this paper, the authors explore a two-step approach to obtain corrected channel
coefficients from corrupted channel coefficients. The first step attempts to separate
the underlying distributions that make up the corrupted channel coefficients, which
consists of three separate Gaussian distributions, using a “Bank of GANs” approach.
This is followed up by extracting the true value of the channel coefficient by employ-
ing an RL agent. It is shown that the GAN-based approach is able to extract the source
distributions. Further, it is seen that a lifelong learning [8] RL system is capable of
picking up trends from the underlying data conditioned on the variance they are
drawn from and their sum. The authors also point out the idea that lifelong learning
allows adaptability to changes in the physical properties of the channel.

3 Signal Source Separation

3.1 Using Generative Adversarial Networks and Reinforcement Learning

In practical scenarios, there exists multiple cells or base stations. In such scenarios,
an antenna in a base station receives a mixture of three signals, which are the intended
signal (message), intracell interference (interference from other users in that channel),
and intercell interference (between two cells). In order to extract the intended signal,
we need to separate the sources from the mixture. In the massive MIMO scenario,
we can model all the sources as being Gaussian distributed. That is, if X1, X2, X3
are three Gaussian distributed signals and we have X = X1 + X2 + X3, we need to
generate estimates X̂1, X̂2, X̂3 of the source signals.
A two-step approach is proposed to achieve this (refer Fig. 1). First, a deep gener-
ative model is applied to learn the underlying distribution of the data, without param-
eterizing the output. A generative adversarial network (GAN) is used to achieve this.
This is shown in Sect. 3.2. Subsequently, a reinforcement learning-based approach
is used to obtain estimates of the source signals from the learnt distributions. This is
shown in Sect. 3.3.

3.2 Generative Adversarial Networks for Data Space Distribution Modelling

Generative adversarial networks (GANs) consist of a generator and a discriminator.
The generator tries to generate data that follows the underlying probability distribution
of the training samples without explicitly parameterizing the output density.

Fig. 1 Schematic representation of the proposed system to estimate channel coefficients. Here, the
output from each of the GANs is a random variable which follows the distribution X i ∼ N (0, βi )
where N (0, βi ) represents a Gaussian distribution with zero mean and variance of βi

The role of the discriminator is to identify real and fake data. In order to model
the discriminator and the generator, neural networks are used. A neural network G
(z, θ 1 ) is used to model the generator and it maps input z to the data space x (the
space in which the training samples of the desired distribution lie). The discriminator
neural network D (x, θ 2 ) gives the probability a vector x from the dataspace is real.
Therefore, the discriminator network weights need to be trained so as to maximize
D(x, θ 2 ) when x belongs to the real dataset, and 1−D(x, θ 2 ) when x belongs to fake
data (generated by the generator network), that is x = G(z, θ 1 ). Thus, we can interpret
the discriminator and the generator as two agents playing a minimax game on the
following objective function, V (while using binary cross entropy loss):
    
V(D, G) = min_G max_D { E_{x∼p_data(x)}[log(D(x))] + E_{z∼p_z(z)}[log(1 − D(G(z)))] }    (1)
where pdata is the distribution over the real data and pz is the distribution over the
input to the generator.
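For a fixed discriminator, the value in Eq. (1) can be estimated from batches of discriminator outputs. The small helper below is our own sketch (not part of the paper's implementation) of the Monte-Carlo estimate that the discriminator ascends and the generator descends:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the binary cross-entropy objective in Eq. (1).

    d_real: discriminator outputs D(x) on real samples, values in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

A perfectly confused discriminator (outputs of 0.5 everywhere) yields 2 log 0.5 ≈ −1.386, the equilibrium value; a discriminator that better separates real from fake pushes the value up.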
The proposed algorithm uses a “Bank of GANs” approach consisting of three
GAN networks. The output of the ith generator is values drawn from the distribution
corresponding to the ith signal. Using the trained GAN, the mean and variance of
each Gaussian source are obtained (refer Fig. 2). In Sect. 3.3, this learnt distribution is
used to sample estimates of the original source signals using a reinforcement learning
approach. The procedure used to learn the distributions is presented as Algorithm 1.

Fig. 2 Collective output from the “Bank of GANs” model is plotted. The GANs were fed with
signals from Gaussian distributions having variances a 1, 2.25, 4 and b 2, 10, 10. The predicted
variances are a 0.88, 2.323, 4.115 and b 2.08, 10.147, 11.038

Algorithm 1 Bank of GAN(s)


1: Define neural network architecture for the generator and discriminator
2: Initialize three generator(G 1 , G 2 , G 3 ) and discriminator networks(D1 , D2 , D3 ) for learning the
distributions of the three signals
3: Generate data from three Gaussian distributions to represent signal, intercell and intracell inter-
ference
4: Add them to form the mixture signal
5: for each epoch do
6: for each batch do
7: for i = 1, 2, 3 do
8: Sample a batch of X i source signals from real(training data)
9: Sample batch of mixture signals and forward through G i to obtain fake data.
10: Forward the real and fake data through Di to obtain predictions of probabilities of
inputs being real.
11: Feed this probability to negative binary cross entropy log function.
12: Use this prediction to obtain the gradients with respect to the weights of the discrimi-
nator network.
13: Update the weights of the discriminator using the optimizer chosen.
14: Forward the fake data through the discriminator to obtain probability that generator
output is classified as real by the discriminator.
15: Feed this probability to the negative binary cross entropy loss.
16: Use this prediction to obtain gradient of the objective function with respect to param-
eters of the generator network.
17: Update the weights of the generator
18: end for
19: end for
20: end for
21: Use the trained networks to forward on the mixture signal to produce outputs.
22: Collect outputs and compute estimates of variance and mean.

3.3 Reinforcement Learning-Based Sampling Technique for Signal Estimation

In the previous section, a generative adversarial network was used to learn the under-
lying distribution of the source signals given the mixture signal. In this section, a
single-step RL method is proposed to sample the learnt distribution given the mixture
signal, so as to extract the original source signals. This method can be interpreted
as a reinforcement learning agent which represents its environment using the mix-
ture signal and the variances of the three source signals (obtained using the GANs).
From this state, an action simply consists of sampling three signals from the pre-
dicted Gaussians. This is done using a neural network whose outputs are the required
estimates.
It can be noted that, during training, a batch of mixture signals of batch size,
m, is fed as input and the reward is defined to be the negative of the collective
mean squared error. This allows the network to understand trends that are typical of
the signal and noise data. Also, a less noisy training period is observed. The reward
function was designed such that it acts as a measure of how close the sampled signals
are to the original source signals. The next task is to perform gradient ascent on this
reward. Equivalently, an attempt is made to minimize the negative of the reward,
R = −Σ_{i=1}^{3} (1/m) ||x_i − x̂_i||², where x_i is the actual ith batch of source signals
and x̂_i is the corresponding batch of source signals sampled by the RL agent. It
can be observed from the results shown in Fig. 3 that the predicted signals learn the
trend of the original source signals sampled from the dataspace. The algorithm used
for training this agent is described in Algorithm 2.
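The reward computation above can be sketched directly (the function name is our own):

```python
import numpy as np

def rl_reward(x_true, x_pred):
    """Negative collective MSE reward of Sect. 3.3.

    x_true, x_pred: (3, m) arrays - the actual and sampled batches of the
    three source signals. Maximizing this reward minimizes the MSE loss.
    """
    m = x_true.shape[1]
    return -np.sum((x_true - x_pred) ** 2) / m
```

Gradient ascent on this reward is exactly gradient descent on the summed per-signal mean squared errors, which is how Algorithm 2 trains the sampling network.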

Algorithm 2 Single-Step RL for Sampling


1: Initialize parameters of a neural network to mathematically model sampling policy
2: Initialize parameters for learning such as learning rate, optimizer, batch size, number of epochs,
etc.
3: Generate data drawn from three Gaussians of a range of variances
4: Add them to form the mixture signal
5: for each epoch do
6: for each iteration through batches do
7: Sample batch of mixture signals and variances of inputs and forward through network to
obtain predictions of source signals.
8: Compute gradients with respect to the parameters of the neural network of the objective
function: −Σ_{i=1}^{3} (1/m) ||x_i − x̂_i||²
9: Update the weights of the neural network based on the optimizer used.
10: end for
11: end for
12: Use the trained network for obtaining samples while inferencing

Fig. 3 Actual value of each signal is plotted in red while the predicted distribution is plotted in
yellow. A and B depict two instances of test results with variances for A being in the range of 1–5
for the desired signal and 5–15 for interference and noise. Variances for B are in the range of 2–6
for the desired signal and 7–15 for interference and noise. Mapping for sum, X 3 , X 2 and X 1 are
depicted in (a)–(d), respectively

4 Results

In this section, the results and experiments corresponding to the study carried out in
each section are presented.

4.1 Extraction of Distributions Using GAN

The approach explored in Sect. 3.2 is implemented using data drawn from Gaussian
distributions with mean zero and variances (a) 1, 2.25, 4 and (b) 2, 10, 10. The
resulting distributions mapped by the GANs are illustrated in Fig. 2.
It can be seen from Fig. 2 that the distributions predicted by the ‘Bank of GANs’
all exhibit Gaussian-like bell curves. The means and variances are calculated using
the outputs from the generators. The variances obtained have a mean squared error
of 0.0109 and 0.3684 in Fig. 2a, b, respectively. These variances are then fed into the
RL network, for sampling the source signal’s values.
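The variance-estimation step can be sketched as below. Since the trained generators are not reproduced here, their outputs are stood in for by direct Gaussian draws; the sample count and variances are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
true_vars = np.array([1.0, 2.25, 4.0])   # case (a) variances

# Stand-in for the three generator outputs: in practice these would be
# samples drawn from each trained GAN in the bank.
gan_samples = [rng.normal(0.0, np.sqrt(v), 10000) for v in true_vars]

# Means and variances calculated from the generator outputs
est_means = np.array([s.mean() for s in gan_samples])
est_vars = np.array([s.var() for s in gan_samples])

# Mean squared error between estimated and true variances,
# the metric reported for Fig. 2
mse = np.mean((est_vars - true_vars) ** 2)
```

The estimated variances are what get fed into the RL network for sampling the source signals.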

4.2 Estimating True Values of Channel Coefficients

In this section, the results of training a neural network are highlighted. The network
utilizes a leaky version of the rectified linear activation function [9] with a negative
slope of 0.2, for the hidden layers. For the output layer, a linear activation function
is used. For training, the network applies an RMSProp [10] optimizer with an initial
learning rate of 0.0001. In order to generate training data, the Monte Carlo [11]
approach is employed to generate signals from Gaussian distributions. The results of
training are shown on two different variance ranges for the intended and interference
signals: intended from 1 to 5, interference from 5 to 15, and intended from 2 to 6,
interference from 7 to 15. The results of testing on these models are shown in Fig. 3.
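The hidden-layer activation described above can be sketched as follows; the layer widths and the two-dimensional input are assumptions of this illustration, not the paper's exact architecture.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Leaky ReLU with negative slope 0.2, as used for the hidden layers."""
    return np.where(x >= 0.0, x, slope * x)

rng = np.random.default_rng(7)
W1, b1 = 0.1 * rng.normal(size=(8, 2)), np.zeros(8)   # hidden layer (sizes assumed)
W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros(1)   # output layer

def forward(v):
    h = leaky_relu(W1 @ v + b1)   # hidden layer with leaky ReLU
    return W2 @ h + b2            # linear activation at the output

out = forward(np.array([1.0, -1.0]))
```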
The test data is generated from Gaussians whose variances themselves are drawn
from a uniform random distribution. The predictions of the agent and the true value
of the coefficients are plotted in Fig. 3 for each of the three components along with
the sum of the three components. It can be observed that even though the signals used
to generate test data are sampled randomly, and conformed to lie within a limited
range, the trained network is able to capture the trend in the signals. Although the
RL network is able to capture the general trend in the signals, there still exists scope
for improvement in mapping the exact values of the distribution. However, the actual
mixture signal is mapped closely by the sum of the three predicted signals.
The graphs in Fig. 3 indicate that the reinforcement learning agent is able to
replicate the general trends of the input distribution with reasonable accuracy. The
observed mean squared error loss on the test data in Fig. 3a is: 0.006919, 0.81324,
2.08179, and 1.87812 for Sum, X 1 , X 2 , and X 3 , respectively, and the observed mean

squared error loss on the test data in Fig. 3b is: 0.002142, 1.74944, 2.30441, and
2.17069 for Sum, X 1 , X 2 , and X 3 , respectively.
The mixture of signals, along with the variances learnt by the GAN, is also fed to the RL agent to obtain estimates of the original signals. The observed MSE is 0.86326, 1.91800, 1.91943 and 0.00425 for X1, X2, X3 and Sum, respectively.

5 Conclusions

Massive MIMO systems offer highly improved communication performance. One


of the fundamental impediments to the use of these systems is pilot contamination
[12]. The performance of the system is limited by the accuracy and complexity of
the techniques used to estimate the channel coefficients based on the pilot signals.
Considering the ability of generative adversarial networks to learn data space
distribution, their usage in signal source separation is explored. It is seen that this
approach can have practical significance in separating the distributions correspond-
ing to the various signals being received at the base station, given a mixture of
the individual distributions. It is known that independent component analysis [13]
depends on the increased Gaussianity of a mixture as opposed to the original signals.
Therefore, their usage here is ruled out when all signals are Gaussian. The authors
have therefore presented the idea of using a two-step approach to estimating channel
coefficients of a massive MIMO system from the contaminated coefficients, with the
second step, involving sampling of values using a single-step RL approach. In doing
so, the corrected channel coefficients could be obtained from the corrupted channel
coefficients within a reasonable degree of accuracy.
By using larger datasets, more accurate results can be achieved. It can be noted
that the use of Wasserstein GAN [14], increased dataset sizes, and sophisticated network architectures are potential directions for improving on the accuracy of this
two-step concept. An important application of the correction of channel coefficients
in MIMO systems is enabling scaled-down power allocation at the sender side while
still achieving nonzero SINR. The realization of such a system offers the potential to
reduce power consumption while establishing finite channel capacity. In such systems, however, with large numbers of antennas, the number of channel coefficients increases, and estimating them accurately is crucial. Further, with today's usage levels of mobile and hand-held wireless devices, there exists a surplus of data which
contain underlying patterns in a wide array of applications including power consump-
tion, message characteristics, etc. It is therefore important to investigate data-driven
methods such as the one proposed in this paper.

References

1. Wang CX, Haider F, Gao X, You XH, Yang Y, Yuan D, Hepsaydir E (2014) Cellular archi-
tecture and key technologies for 5G wireless communication networks. IEEE Commun Mag
52(2):122–130
2. Gopi ES (2016) Digital signal processing for wireless communication using Matlab. Springer
International Publishing
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a
survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio
Y (2014) Generative adversarial nets. In: Advances in neural information processing systems,
pp 2672–2680
6. Yang Y, Li Y, Zhang W, Qin F, Zhu P, Wang CX (2019) Generative-adversarial-network-based
wireless channel modeling: challenges and opportunities. IEEE Commun Mag 57(3):22–27
7. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press
8. Thrun S (1998) Lifelong learning algorithms. In: Learning to learn. Springer, Boston, pp 181–
209
9. Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolu-
tional network. arXiv preprint arXiv:1505.00853
10. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint
arXiv:1609.04747
11. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol. 10. Wiley,
New York
12. Elijah O, Leow CY, Rahman TA, Nunoo S, Iliya SZ (2015) A comprehensive survey of pilot
contamination in massive MIMO-5G system. IEEE Commun Surv Tutor 18(2):905–923
13. Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications.
Neural Netw 13(4–5):411–430
14. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875
Novel Method of Self-interference
Cancelation in Full-Duplex Radios for
5G Wireless Technology Using Neural
Networks

L. Yashvanth , V. Dharanya , and E. S. Gopi

Abstract Full-duplex communication is a promising technique which guarantees


an enhanced spectral efficiency in modern 5G wireless communications. In this technique, the same set of frequency channels is used for simultaneous uplink and downlink signal transmissions, and hence it is termed full-duplex (FD) communication or
full-duplex radios. However, a major shortcoming of this technique is the presence
of self-interference (SI), which arises due to the presence of both transmitters and
receivers in close proximity and in fact several solutions have been proposed to mit-
igate it. In this paper, we give a new insight on the applicability of neural networks
in solving (linear and nonlinear) SI problems using hybrid cancelations.

Keywords Full-duplex radios · Self-interference · Hybrid cancelation · Neural


networks

1 Introduction

It is a known fact that 5G wireless technology has a lot of defined innovative wire-
less principles and algorithms to provide maximized benefits to users in terms of
high data rate, reduced power consumption, high spectral efficiency, etc. [1]. In
such a scenario, “In-band Full-Duplex (FD)” communication is one among these
methods, that seeks to achieve better spectral efficiency. It exploits the same set of

All authors have given equal contribution.

L. Yashvanth (B) · V. Dharanya · E. S. Gopi


Department of Electronics and Communication Engineering, National Institute of Technology,
Tiruchirappalli 620015, India
e-mail: [email protected]
V. Dharanya
e-mail: [email protected]
E. S. Gopi
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 59


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_5

Fig. 1 Full-duplex
communication scenario

frequency resource channels for simultaneous uplink and downlink signal transmis-
sions [2], unlike distinct forward and reverse channels, that happens in conventional
half-duplex communication (3G/4G). The scenario is pictured in Fig. 1 for commu-
nication between two nodes in FD mode with all signals being transmitted/received
in carrier frequency band f 1 .
But, owing to this technique, there is a very high possibility that the receiver receives signals from its own transmitter, which is formally called self-interference (SI) (red highlighted signals in Fig. 1) [3–5]. SI is an artifact that arises
in FD scenarios which is highly undesirable and thus has the potential to corrupt the
receiving signal (or) received signal of interest (SOI) to a large extent. There have
been many works being reported as of today, to solve the problem of SI. One among
them, which proves to be effective is the hybrid cancelation of SI [2, 5]. This method
utilizes successive SI cancelations in analog and as well in digital domain. Whereas
such robust hybrid cancelation of SI do exist, in this paper, we give a new direction
to tackle the problem by means of artificial intelligence (AI) using neural networks,
which shall be proved to perform well even for nonlinear SI cancelation.

2 Signal Modeling

In this section, an attempt is made to model the complete transceiver at baseband


level of transmission. The complete setup along with SI cancelation mechanism is
shown in Fig. 2. As shown in figure, let x(nTs ) be the actual digital baseband data
to be transmitted. Subsequently after passing it through DAC and a power amplifier
(typically Class C power amplifier), let the final analog baseband signal that is ready to
be modulated and transmitted through antenna be denoted as x1 (t). Note that, for the
sake of simplicity, bandpass processing blocks such as modulator and demodulators
are not shown in this figure.
On the other hand, let the receiving antenna receive a signal from another host with
same carrier frequency that it used for transmitting its data to the same host. Thus,
the picture renders to an “In-band Full-Duplex” communication. As a result, let the

Fig. 2 Transceiver setup signal modeling

self-interference (SI) from the transmitter affect the receiver of the same transceiver.
Hence, the overall received signal at the input of receiver block at baseband level is
assumed to be $y(t) + \alpha_1 x_1(t-\beta) + \alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta) + g(t)$, where $y(t)$ is the actual signal of interest (SOI), with $\alpha_1 x_1(t-\beta)$ and $\alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta)$ representing the linear SI and nonlinear SI (neglecting higher-order
terms greater than 5th harmonic), respectively, from transmitter [2, 4]. It should be
noted that due to the presence of RF circuit application in the transceiver such as
Power amplifiers etc., it is quite possible for them to generate these higher-order
harmonics of their inputs [6]. Hence, these terms manifest as nonlinear SI to the
receiver. Furthermore, let α and β represent the possible scaling and delay factors
incurring to the transmitting signal to manifest as SI at receiver. Additionally, let g(t)
represent the additive channel noise.
Subsequently, first stage SI cancelation is performed in analog domain (discussed
in upcoming section) which tries to perform partial linear SI cancelation only. Further,
passing it through LNA and ADC, digital cancelation is also performed which makes
the resultant output free from both linear and nonlinear SI. Thus, this method of
employing both analog and digital cancelation of SI is commonly referred to as the
hybrid cancelation of SI.

3 Solutions for Self-Interference (SI) Cancelation

3.1 Outline of Hybrid SI Cancelation

3.1.1 Passive Analog Cancelation

In the passive analog cancelation, an RF component subdues the SI. This can be
realized with the help of a circulator, antenna separation, antenna cancelation, or an
isolator. One of the main limitations of this technique is that it cannot suppress the
SI reflected from the environment. More details can be found in [2, 5].

3.1.2 Active Analog Cancelation

The residual SI from passive analog cancelation is alleviated by the active analog
cancelation. As has been mentioned earlier, this attempt is made only to suppress the
linear SI from the composite signal, which otherwise would lead to saturation at the
ADC block leading to SOI distortion. Accordingly, let the composite received signal
as discussed be

$y_3(t) = y(t) + \alpha_1 x_1(t-\beta) + \alpha_2^3 x_1^3(t-\beta) + \alpha_3^5 x_1^5(t-\beta) + g(t)$   (1)

with symbols having same meanings. Thus, active analog cancelation (or simply
call it analog cancelation) attempts to remove linear term, namely αx1 (t − β) before
processing the signal by LNA. As the transmitted signal x1 (t) is known to receiver
(because it is the same node which transmits the transmitting signal), active analog
cancelation tries to generate an estimate of αx1 (t − β) and hence removes it from the
received signal, leaving the partial SOI. The literature by Kim et al. [2] mentions that this estimate can be predicted as a linear combination of time-shifted versions of the transmitting signal x1(t). An attempt is made to understand this logic, and it is
mathematically worked as follows:

From Nyquist Sampling-Reconstruction theorem,

$\alpha \hat{x}_1(t) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s)\,\mathrm{sinc}\!\left(\frac{t}{T_s} - n\right)$   (2)

It can be shown that [7],

$\alpha \hat{x}_1(t+\tau) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s+\tau)\,\mathrm{sinc}\!\left(\frac{t}{T_s} - n\right)$   (3)

$\Longrightarrow \alpha \hat{x}_1(\tau-\beta) = \alpha \sum_{n=-\infty}^{\infty} x_1(nT_s+\tau)\,\mathrm{sinc}\!\left(\frac{-\beta}{T_s} - n\right)$   (4)

Replacing τ with t and introducing a variable cn , the equation is modified as:

$\alpha \hat{x}_1(t-\beta) = \sum_{n=-\infty}^{\infty} c_n\, x_1(t + nT_s)$   (5)

Hence, once an estimate of the linear SI is computed, it is subtracted from the


composite received signal to obtain a partially suppressed SI signal. i.e.,

$y_2(t) = y_3(t) - \alpha \hat{x}_1(t-\beta)$   (6)
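A discrete-time sketch of this analog-cancelation logic follows. For simplicity it assumes an integer delay and uses circular shifts; the scale, delay, and SOI stand-in are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)               # known transmitted baseband signal x1
alpha, d = 0.8, 3                    # assumed scale and integer delay
si = alpha * np.roll(x, d)           # linear SI term: alpha * x1(t - beta)
soi = 0.1 * rng.normal(size=n)       # weak stand-in for the signal of interest
received = soi + si

# Eq. (5): model the SI as a linear combination of shifted copies of x1
shifts = range(-8, 9)
A = np.stack([np.roll(x, k) for k in shifts], axis=1)
c, *_ = np.linalg.lstsq(A, received, rcond=None)

si_hat = A @ c                       # estimate of the linear SI
y2 = received - si_hat               # Eq. (6): partially SI-suppressed signal
```

The recovered weight at the true delay approximates the scale factor, and the residual y2 is dominated by the SOI.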

3.1.3 Digital Cancelation

As mentioned, the main challenge in the SI cancelation is the linear distortion caused
by multipath delay. Apart from this, the nonlinear distortion caused by the nonlin-
earity of the transmitter power amplifier (PA) at high transmit power, quantization
noise, and phase noise [2, 4] also hinder efficient SI cancelation. For simplicity, let us
ignore quantization noise, because it is almost impossible to prevent it in the regime
of digital signal processing. The purpose of digital SI cancelation is to completely
suppress the residual SI from the analog cancelation techniques.
In digital cancelation technique, an attempt is made to model the SI channel filter,
whose output (which is SI) is then merely subtracted from the ADC output [4, 5, 8].
The linear interference component can be modeled as:

$I_1(n) = \sum_{k=-m+1}^{m} h(k)\, y_1(n-k)$   (7)

where I1 (n) is the linear SI component and h(k) constitutes the corresponding linear
SI channel filter parameters.
Similarly, let,

$I_2(n) = \sum_{t=3,5,7}^{2r+1} \sum_{k=-m+1}^{m} y_1(n-k)\,|y_1(n-k)|^{t-1}\, h_t(k)$   (8)

where I2 (n) is the nonlinear interference component and h t (k) constitutes the coef-
ficients of the tth order nonlinear SI channel filter model.
Thus, the calculated linear and nonlinear components can be removed from the
received signal y1 (n) as y1 (n) − I1 (n) − I2 (n) to estimate the signal of interest y(n)
as ŷ(n). In order to obtain I1 (n) and I2 (n), the filter coefficients h(k) and h t (k) have
to be estimated. For estimating the filter coefficients, let us define the associated cost
functions, J1 and J2 formulated based on the least squares setup and seek them to be
minimized. Thus, J1 and J2 , defined from [2] are as follows :

$J_1 = \sum_{n=0}^{p-1} \left| I_{1p}(n) - \sum_{k=-m+1}^{m} y_{1p}(n-k)\, h(k) \right|^2$   (9)

where $p$ is assumed to be the number of pilot symbols. (Thus, it is assumed that, in the pilot phase, the receiving SOI is absent, which means $y_{1p}(n)$ is

the actual raw transmitting signal (before being launched by the transmitting antenna), i.e., $y_{1p}(n) = x(n)\;\forall n \in \{0, 1, \ldots, p-1\}$, with $I_{1p}(n)$ serving as the linear SI at the output of the analog cancelation block (digital version)). And from [9],

$J_2 = \sum_{n=0}^{p-1} \left| I_{2p}(n) - \sum_{t=3,5,7}^{2r+1} \sum_{k=-m+1}^{m} y_{1p}(n-k)\,|y_{1p}(n-k)|^{t-1}\, h_t(k) \right|^2$   (10)

with symbols having meanings similar to those in (9). These equations are solved using the pseudo-inverse technique (obtained from the method of least squares). This phase of estimating the filter coefficients from the above equations can be precisely termed SI channel estimation.
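The pseudo-inverse estimation of the filter coefficients in (9) and (10) can be sketched jointly as follows. The pilot length, tap range, and single-tap ground-truth gains are illustrative assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 200, 4                                # pilot length and tap range (assumed)
x = rng.normal(size=p)                       # pilot phase: y_1p(n) = x(n)
# Assumed ground truth: one tap per order t (at k = 0 only), for illustration
si = 0.9 * x + 0.05 * x * np.abs(x) ** 2 + 0.01 * x * np.abs(x) ** 4

orders = (1, 3, 5)                           # t = 1 covers the linear term of Eq. (9)
taps = list(range(-m + 1, m + 1))
# One column per (order t, tap k) pair, as in Eqs. (9)-(10)
cols = [np.roll(x, k) * np.abs(np.roll(x, k)) ** (t - 1)
        for t in orders for k in taps]
A = np.stack(cols, axis=1)

h = np.linalg.pinv(A) @ si                   # pseudo-inverse (least-squares) solution
```

Because the SI lies exactly in the column space of the design matrix, the pseudo-inverse recovers the per-order gains.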

3.2 Proposed Solution for Implementing Digital Cancelation


Using Neural Networks

The proposition is to model the digital cancelation block using a neural network
model. Here, a feed-forward, back-propagating neural network [10] is used to esti-
mate the SI channel (comprising linear and nonlinear components jointly) by solving
the minimization problems in (9) and (10). The model can be approximated with three
hidden layers as shown in Fig. 3. Let the number of nodes in the input layer be N and
all three hidden layers contain same number of nodes as the input layer. The output
layer contains a single node for predicting the SI sample values.
Thus, the associated overall loss function based on mean square error (MSE) per
epoch would be of the form:

$\frac{1}{N_b} \sum_{n=0}^{N_b - 1} |z(n) - \hat{z}(n)|^2$   (11)

Fig. 3 Architecture of the neural network used



where, with the assumption of 19 “effective” weights $\{h_k\}_{k=-9}^{k=9}$,

$\hat{z}(n) = h_{-9}\, y_1(n-9) + h_{-8}\, y_1(n-8) + \cdots + h_0\, y_1(n) + \cdots + h_8\, y_1(n+8) + h_9\, y_1(n+9)$   (12)

and z(n) representing the desired SI samples. Here, Nb represents the number of
iterations. To be very precise, Nb = p − N + 2. Also, it is reinforced that y1 (n) is
same as x(n) in the training phase.
At this point, it is worth realizing that the novelty of this paper is to utilize the same set of neural network weights (in analogy with the coefficients of the filter modeling the SI channel) to cancel both linear and nonlinear SI from the composite signal. Hence, the proposed method is a better technique than the conventional approach, wherein two distinct filters are employed to cancel linear and nonlinear SI separately. Thus, the authors claim that the computational complexity of the proposed approach is lower than that of the conventional solutions.
The trained models are then used to find the linear and nonlinear interference com-
ponents from the received signal during testing phase. These interference components
are then removed from the received signal to obtain the signal of interest, ŷ(n).

3.2.1 Implementation Details

Let the additional specifics of the neural network be initialized as follows:


• Activation function for each layer—rectified linear unit (ReLU).
• Total number of input nodes, N (hence, number of “effective” weights) = 19.
• The weights are updated after each batch containing 19 samples per row is pro-
cessed. Also, total number of training samples per iteration is 19.
• The model is trained with 83 iterations per epoch.
• Number of Epochs = 400.
• Weights are updated using equations governed by Adam optimization technique
(instead of classical stochastic gradient descent). Initially the weights were initial-
ized to null values.
• For tuning the hyperparameters of the model k-fold cross-validation (k = 5) is
used instead of a separate validation dataset. Table 1 summarizes the details of the
dataset used for training and testing.
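Under these settings, the digital-cancelation filter can be sketched as a single 19-tap filter trained on the MSE of Eq. (11). This is a minimal sketch, not the authors' implementation: full-batch plain gradient descent replaces the Adam/batch schedule above, the pilot is lengthened to 500 synthetic samples for a cleaner illustration, and the desired SI samples z(n) come from an assumed ground-truth filter.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 19                                    # number of "effective" weights h_k
p = 500                                   # synthetic pilot length (assumption;
                                          # the paper uses p = 100)
x = rng.normal(size=p)                    # pilot samples, y1(n) = x(n) in training
w_true = 0.3 * rng.normal(size=N)         # assumed ground-truth SI filter taps
X = np.stack([np.roll(x, 9 - k) for k in range(N)], axis=1)
z = X @ w_true                            # desired SI samples z(n)

w = np.zeros(N)                           # weights initialized to null values
lr = 0.05
for epoch in range(400):
    z_hat = X @ w                         # Eq. (12): filter output
    grad = (2.0 / p) * X.T @ (z_hat - z)  # gradient of the MSE in Eq. (11)
    w -= lr * grad

mse = np.mean((z - X @ w) ** 2)
```

After training, the learned taps are applied as a fixed filter in the testing phase, exactly as described in the next paragraph of Sect. 3.2.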

4 Results and Discussions

In this section, we describe the implementation details and corresponding results


with suitable interpretations and discussions in performing SI cancelation using pro-
posed neural networks. In order to illustrate the proposed solution, in this paper, the
following scenario is considered.

Table 1 Dataset details


Purpose No. of sample sets Description
Training 83 100 symbols out of 1000
symbols are used as pilot
symbols (p = 100), i.e., for
training the model. A total of
83 sample sets is formed out of
100 symbols with each set
containing 19 symbols
Testing 883 Remaining 900 symbols are
used for testing which forms
883 sample sets

With the set of specifications mentioned in Sect. 3.2.1, an MSE of 0.1 was obtained.
1. Transmitting signal, x(n)—Baseband signal of 10s duration (sampling frequency,
f s = 100 Hz) with spectral content between 15 and 25 Hz. The signal is created
using FIR coefficients by frequency sampling technique.
2. Receiving signal, y(n)—Baseband signal of 10s duration (sampling frequency,
f s = 100 Hz) with spectral content between 35 and 45 Hz. The signal is generated
as a periodic random Gaussian signal.
3. Channel noise, g(n)—Additive white Gaussian noise with resulting SNR = 0 dB.
The relevant time-domain plots of transmitting and receiving signals are given,
respectively, in Figs. 4 and 5. The corresponding frequency domain plots are depicted
in Figs. 6 and 7. Further, in order to mimic the presence of analog versions of above
signals, the signals are defined with a higher sampling frequency, say 10 times its
original f s , i.e., 1000 Hz. Successively, as per (1), a composite signal to model the
received signal embedded in SI is formed with β = 100 and with random values for
αi ∀i ∈ {1, 2, 3}. This assumption is taken because in general, a wireless channel is
time-varying in nature [11].
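The two band-limited test signals can be mimicked with FFT masking, a stand-in for the FIR frequency-sampling design used by the authors; the helper name bandlimited_noise and the masking approach are assumptions of this sketch.

```python
import numpy as np

fs, dur = 100, 10                    # sampling frequency (Hz) and duration (s)
n = fs * dur
rng = np.random.default_rng(5)

def bandlimited_noise(f_lo, f_hi):
    """Gaussian noise confined to [f_lo, f_hi] Hz by zeroing FFT bins
    (an illustrative stand-in for FIR frequency-sampling design)."""
    spec = np.fft.rfft(rng.normal(size=n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n)

x = bandlimited_noise(15, 25)        # transmitting signal (SI source), 15-25 Hz
y = bandlimited_noise(35, 45)        # receiving signal (SOI), 35-45 Hz
g = np.std(y) * rng.normal(size=n)   # AWGN scaled for approximately 0 dB SNR
```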
As per the sequence defined in Sect. 3.1, the foremost step is SI suppression via passive analog cancelation. As this method is accomplished by means of physical structures, only the successive two steps are accounted for in this paper. Accordingly, the next step is analog
cancelation. In accordance with (5), an estimate of linear SI is constructed with the
help of linear combination of 40 shifted versions of the transmitting signal. Let this
estimate be subtracted from received composite signal to obtain y2 (t). The resultant
signal which passes through a LNA and an ADC, is now ready to be processed by
the trained neural network as described in Sect. 3.2.
Once the neural network is trained, the network is then employed in testing phase,
acting as a mere filter with defined weights obtained in training phase. Thus, the resul-
tant signal is now filtered and subsequently, the output is subtracted from the signal
that was partially SI free (the digital signal at input of neural network).

Fig. 4 Transmitting Signal (SI signal) – Time domain waveform (data tip: peak value 5.237 at sample 501)

Fig. 5 Receiving Signal (SOI) – Time domain waveform



Fig. 6 Transmitting Signal (SI signal) – Frequency domain – Magnitude and phase response

Fig. 7 Receiving Signal (SOI) – Frequency domain – Magnitude and phase response

Fig. 8 Superimposed Receiving Signal (SOI) and SI canceled signal – 0 to 100 samples

Fig. 9 Superimposed Receiving Signal (SOI) and SI canceled signal – 100 to 200 samples

The resultant extracted signals are compared with the SOI and the relevant plots
are sketched after averaging over 5 Monte Carlo simulations in Figs. 8, 9, 10, 11, 12,
13, 14, 15, 16, and 17.

Fig. 10 Superimposed Receiving Signal (SOI) and SI canceled signal – 200 to 300 samples

Fig. 11 Superimposed Receiving Signal (SOI) and SI canceled signal – 300 to 400 samples

Fig. 12 Superimposed Receiving Signal (SOI) and SI canceled signal – 400 to 500 samples

Fig. 13 Superimposed Receiving Signal (SOI) and SI canceled signal – 500 to 600 samples

Fig. 14 Superimposed Receiving Signal (SOI) and SI canceled signal – 600 to 700 samples

Fig. 15 Superimposed Receiving Signal (SOI) and SI canceled signal – 700 to 800 samples

Fig. 16 Superimposed Receiving Signal (SOI) and SI canceled signal – 800 to 900 samples

Fig. 17 Superimposed Receiving Signal (SOI) and SI canceled signal – 900 to 1000 samples

Further, an attempt is made to visualize the efficiency of SI cancelation in fre-


quency domain. In Figs. 18 and 19, while the former characterizes the spectral infor-
mation of the net received signal (1), the latter depicts the frequency domain informa-
tion of final SI-free extracted signal. While Fig. 18 contains substantial information
across entire baseband from 15 to 45 Hz, Fig. 19 depicts the significant information
only in the frequency range specified by SOI. Thus, it is evident that, the proposed
solution indeed suppresses the nonlinear SI very well from the SOI just with the help
of one filter (neural network), with merely one set of weights.
Also, to look at the SI cancelation in more detail, consider Fig. 4, which suggests that for the 1000-sample transmitting signal, a peak in its amplitude occurs at approximately half the duration (≈ 501st sample) of the signal. However, by virtue of (1) and the choice of β as 100, it is of interest to visualize the different signal sample values at the 601st sample. This illustration is shown in Fig. 20.
Thus, as seen from the figure, SI is very well suppressed in the extracted signal.
Furthermore, the correlation between the extracted signal and the desired SOI is found to be profoundly high.
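The reported correlation between the extracted signal and the SOI can be computed as a zero-lag normalized cross-correlation; the two signals below are synthetic stand-ins, as the paper's test signals are not reproduced here.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-lag normalized cross-correlation between two signals."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(6)
soi = rng.normal(size=900)                    # stand-in for the receiving signal
extracted = soi + 0.1 * rng.normal(size=900)  # stand-in for the SI-canceled output

rho = normalized_correlation(soi, extracted)
```

A value of rho near 1 indicates that the extracted signal closely tracks the SOI.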

Fig. 18 Composite signal (received by receiver) – Frequency domain – Magnitude and phase response

Fig. 19 SI canceled signal – Frequency domain – Magnitude and phase response

Fig. 20 Illustration of SI Cancelation: Superimposed Receiving signal (SOI), SI canceled signal and SI signal (Transmitting signal) (data tips at sample 601: 12.57 for the SI signal, with 0.7048 and −0.7835 for the other two signals)

5 Conclusions

As described in Sect. 3.1, an effective conventional method of curbing the self-interference that arises in in-band full-duplex communication is sought using the hybrid cancelation technique. So far, it demands the employment of two separate optimum filters to suppress the linear and nonlinear SI components, respectively, in the digital domain. However, the trivial means of constructing them individually by solving (9) and (10) are computationally expensive. Hence, the proposed method, using a single neural network in place of the aforementioned optimum filters, greatly reduces the computational expense without compromising the quality of results.
This is because the neural network-based technique seeks to suppress both lin-
ear and nonlinear SI components jointly in a single step. The justification of this
above statement clearly lies in the demonstrated results from the preceding section.
It should be noted that the suppression of SI is well achieved in both the time and frequency domains. However, it is worth noting that only harmonics up to the 5th, which have the greatest potential to cause significant SI, are considered as nonlinear terms throughout this paper, and hence are the ones canceled. Other higher-order terms can safely be neglected.

Dimensionality Reduction of KDD-99
Using Self-perpetuating Algorithm

Swapnil Umbarkar and Kirti Sharma

Abstract In this digitized world, a massive amount of data is available on networks, yet it is not safe from the increasingly sophisticated techniques of attackers. These threats create the need for intrusion detection systems (IDSs). The KDD-99 dataset is used as a standard benchmark for IDS research, but it suffers from the curse of dimensionality, as both the number of features and the total number of instances in the dataset are very large. In this paper, a self-perpetuating algorithm built on individually analyzed feature selection techniques is proposed. The proposed algorithm yields a reduced feature subset of up to 14 features with reduced time, accuracy increased by 0.369%, and the number of features decreased by 66.66% with the J48 algorithm.

Keywords Feature selection · KDD-99 dataset · J48 algorithm · Classification · Dimensionality reduction

1 Introduction

Intrusion is the most damaging threat to any network, and attacks spread in various forms. Research on network intrusion is therefore a major concern: new attacks are observed on our systems and networks every single day. From small firms to large organizations, all are exposed to these attacks. Attackers devise sophisticated techniques every day to penetrate networks, defeating the security tools of even large firms. Because of this, systems must be developed to fight every recent type of attack. To counter these threats, intrusion detection systems are built to detect the attacks. Based on what they monitor, intrusion detection systems (IDSs) are categorized into two types, namely host-based IDSs and network-based IDSs. Host-based IDSs scan and examine a computer system's files and OS processes, while network-based IDSs do the same over network traffic. To develop such IDSs, training of our system is to be

S. Umbarkar (B) · K. Sharma


Computer Science and Engineering Department, Parul Institute of Engineering and Technology,
Parul University, Vadodara, India

© Springer Nature Singapore Pte Ltd. 2021 79


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_6

done so that efficient outputs are drawn without any information loss. In this paper, the KDD-99 dataset is considered to analyze the results. KDD-99 specifies the attack types broadly in five classes: (a) DoS, (b) U2R, (c) R2L, (d) Probe, and (e) normal. The pitfall of the KDD-99 dataset is its high dimensionality, approximately 42 features × 400 K instances. This high dimensionality increases the time complexity of IDSs.
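The grouping of raw labels into these five broad classes can be sketched as follows (a partial, illustrative mapping, not the authors' preprocessing code; KDD-99's denial-of-service class is usually written DoS, and raw labels in the data files carry a trailing dot):

```python
# Partial, illustrative mapping from raw KDD-99 labels to the five
# broad classes; the full dataset contains more attack labels.
CATEGORY = {
    'smurf': 'DoS', 'neptune': 'DoS', 'back': 'DoS',
    'ipsweep': 'Probe', 'portsweep': 'Probe', 'nmap': 'Probe',
    'guess_passwd': 'R2L', 'ftp_write': 'R2L',
    'buffer_overflow': 'U2R', 'rootkit': 'U2R',
    'normal': 'normal',
}

def broad_class(raw_label):
    # Raw KDD-99 labels end with a dot (e.g. 'smurf.'); strip it first.
    return CATEGORY.get(raw_label.rstrip('.'), 'unknown')

print(broad_class('smurf.'), broad_class('rootkit'))  # DoS U2R
```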

1.1 Feature Selection Methods

Before training any IDS, feature selection is performed so that the system can be trained to detect attacks. Targeting the feature selection process and decreasing time complexity are the major concerns of the research in this paper. Feature selection techniques generally fall into three classes: (a) filter methods, (b) wrapper methods, and (c) embedded methods. Our main goal is to uncover the technique that yields the best features by analyzing all types of feature selection techniques. Filter methods generally select features by analyzing their statistical dependence on the class variable; wrapper methods assess the adequacy of a feature subset by actually training a learning algorithm on it; and embedded methods derive features by analyzing each iteration of the learning algorithm itself. In this paper, the following feature selection techniques [1] are first analyzed individually:
1. CfsSubsetEval
2. ClassifierAttributeEval
3. ClassifierSubsetEval
4. GainRatioAttributeEval
5. InfoGainAttributeEval
6. OneRattributeEval
7. SymmetricalUncertAttributeEval
8. WrapperSubsetEval
After analyzing these algorithms individually, the proposed algorithm is applied to find the best feature selection technique with a minimum number of features and reduced time complexity.
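The filter/wrapper distinction can be made concrete with a small sketch (toy data, not KDD-99; the criterion shown is the information gain underlying InfoGainAttributeEval):

```python
import math
from collections import Counter

# Toy dataset of (features, label) rows -- an illustrative stand-in for
# KDD-99, not the real data. Feature 0 is informative; 1 and 2 are weak.
data = [
    ((0, 0, 0), 'normal'), ((0, 1, 0), 'normal'),
    ((0, 0, 1), 'normal'), ((0, 1, 1), 'attack'),
    ((1, 0, 1), 'attack'), ((1, 1, 1), 'attack'),
    ((1, 0, 0), 'attack'), ((1, 1, 0), 'attack'),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(f):
    # Filter criterion: entropy drop after splitting on feature f,
    # computed without training any classifier. A wrapper method would
    # instead train a classifier per candidate subset and compare
    # accuracies, which is slower but accounts for feature interactions.
    labels = [y for _, y in data]
    remainder = 0.0
    for v in {x[f] for x, _ in data}:
        part = [y for x, y in data if x[f] == v]
        remainder += len(part) / len(data) * entropy(part)
    return entropy(labels) - remainder

ranking = sorted(range(3), key=info_gain, reverse=True)
print(ranking)  # the informative feature 0 ranks first
```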

2 Related Work

This section sheds light on previous studies of feature reduction methods and the classification methods used to increase efficiency and reduce time complexity. In 2018, Umbarkar and Shukla [1] proposed heuristic-based feature reduction techniques for dimensionality reduction of the KDD-99 dataset. They considered only three feature selection techniques, viz. information gain, gain
ratio, and correlation coefficient, and achieved an accuracy of 92.60% using the C4.5 classification algorithm; however, they analyzed only those three techniques, although eight feature selection techniques are available. In 2020, Kasongo and Sun [2] proposed wrapper-based feature extraction for a wireless intrusion detection system, in which their WFEU technique produced a reduced feature set of 22 attributes; using SVM as the classification technique, they obtained an accuracy of only 77.16%. In 2015, Bjerkestrand et al. [3] evaluated various feature selection algorithms and proposed three of them with different attribute evaluators and search methods, indicating that classifier performance is not affected when the number of attributes is reduced. In 2020, Li et al. [4] proposed a method that considers only the weakly correlated features; they divided the original training dataset into four parts and applied a CNN, obtaining high accuracy and low complexity on the NSL-KDD dataset. In 2019, Mohammadi et al. [5] used filter and wrapper methods for feature selection, applying feature grouping based on the linear correlation coefficient with the cuttlefish algorithm (FGLCC-CFA) to obtain a reduced subset of 15 features, with FGLCC providing high speed and CFA the best accuracy. In 2019, Selvakumar and Muneeswaran [6] used filter and wrapper-based methods with the firefly algorithm to reduce the large feature set of the KDD-99 dataset; using C4.5 and a Bayesian network, they produced a reduced set of only 10 features with improved accuracy. In 2020, Alazzam et al. [7] introduced an algorithm named the pigeon-inspired optimizer, evaluated on three datasets: KDD-99, NSL-KDD, and UNSW-NB15. It showed the best efficiency in terms of TPR, FPR, accuracy, and F-score in comparison to other algorithms, but only one class of feature selection (wrapper methods) was used, even though the other classes could also be analyzed for comparison. In 2019, Hakim et al. [8] analyzed the information gain, gain ratio, chi-squared, and relief selection methods with the J48, random forest, Naïve Bayes, and KNN classification algorithms; their results showed significant improvement in feature selection but only a small increment in accuracy. In 2011, Nziga [9] used two dimensionality reduction techniques, one linear (principal component analysis) and one nonlinear (multidimensional scaling), and found a lower-dimensional optimal set of features from the KDD-99 dataset; to compare classification techniques, the J48 classifier and the Naïve Bayes approach were used, giving 4 and 12 feature dimensions, but the feature selection techniques remained limited to two. In 2015, Chabathula et al. [10] used the principal component analysis (PCA) feature selection technique and applied it to different classifiers such as random forest, KNN, J48, and SVM, obtaining the best result with the random forest algorithm; here too, the authors remained restricted to one feature selection technique. In 2012, Das and Nayak [11] proposed a generic divide-and-conquer approach instead of the available feature selection algorithms; it gave us the idea for our self-perpetuating algorithm, namely that a generic approach can lead to an algorithm too.
After reviewing all the research papers related to and required for our work, we highlighted the above studies as our references. The previous studies

focused on increasing accuracy, but the set of features remained the same; no reduction was made. Although some papers also presented work on feature reduction, we take a general approach by considering all the feature selection techniques and extracting the best among them using the derived self-perpetuating algorithm.

3 Proposed Work

3.1 Basic Idea Behind Self-perpetuating Algorithm

Intrusion detection systems (IDSs) are designed to detect attacks in the network by training the system with a predefined dataset and then applying the IDS to the system. The KDD-99 dataset is considered the standard model, and the same dataset is used here to propose the self-perpetuating algorithm. The main issue arises in the training phase of the system: the KDD-99 dataset, with 42 features, has a huge dimensionality, which makes the training phase time-consuming. Studies show that not all features in the dataset are of equal importance, so the set of attributes can be reduced while efficiency improves. Many researchers have already proposed algorithms to reduce the dimensionality of the dataset, but they studied a particular class of feature selection, leaving out the analysis of all the classes of feature selection techniques.

3.2 The Proposed Algorithm

In this paper, a self-perpetuating algorithm is proposed by analyzing both filter and wrapper methods of feature selection. The major feature selection methods considered for analysis are: (a) CfsSubsetEval, (b) ClassifierAttributeEval, (c) ClassifierSubsetEval, (d) Information Gain, (e) Gain Ratio, (f) OneRAttributeEval, (g) SymmetricalUncertAttributeEval, and (h) WrapperSubsetEval. The main principle of the self-perpetuating algorithm is to analyze the individual feature selection techniques and then combine the best one with the rest, so that no features are lost and a more efficient algorithm can be derived. The benchmark against which the accuracy and time complexity of every analyzed method are compared is the accuracy (A) and time complexity (T) obtained on the full dataset. At the initial stage of the self-perpetuating algorithm, the dataset is passed to all the feature selection techniques successively.
Algorithm: Self-perpetuating algorithm
Input: KDD-99 dataset
Output: Optimized reduced feature set
A = Accuracy on the full dataset
T = Time complexity on the full dataset
Ai = Accuracy of feature set FSi
Ti = Time complexity of feature set FSi
FSi: CfsSubsetEval, ClassifierAttributeEval, ClassifierSubsetEval, Information Gain, Gain Ratio, OneRAttributeEval, SymmetricalUncertAttributeEval, WrapperSubsetEval
1. for each FSi
2. do
3.   Apply C4.5 to the derived feature set obtained from FSi
4.   if (Ai > A) && (Ti < T)
5.     FSbest_individual = FSi
6.   end if
7. end for
8. for each (FSi − FSbest_individual)
9. do
10.   FScombine_features = FSbest_individual Union FSi
11.   Apply C4.5 to the derived feature set obtained from FScombine_features
12.   if (Acombine_features > A) && (Tcombine_features < T)
13.     FSbest_combine = FScombine_features
14.   end if
15. end for
16. end
After applying the C4.5 classification algorithm to each technique, the resulting accuracy Ai and time complexity Ti are compared with the benchmark A and T. If the accuracy is higher and the time is lower than the benchmark, that technique is the superior one, giving the best accuracy in less time with a reduced feature set (FSbest_individual). The next phase of the self-perpetuating algorithm is to unite FSbest_individual with each of the remaining feature selection techniques and again compare accuracy and time complexity with A and T by applying the C4.5 classification algorithm. After all the iterations of the second phase, FSbest_combine is derived.
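The two phases can be sketched generically. This is an illustrative reconstruction, not the authors' Weka code: the feature subsets are toy inputs, and evaluate() is a stub standing in for training J48/C4.5 and measuring accuracy and time.

```python
# Illustrative reconstruction of the self-perpetuating algorithm's two
# phases. `methods` maps each technique to the subset it selects;
# `evaluate` stands in for training J48/C4.5 and measuring accuracy (%)
# and time (s) -- here a stub with invented numbers.

def self_perpetuating(methods, evaluate, baseline_acc, baseline_time):
    # Phase 1: keep the method that beats the full-dataset benchmark
    # (higher accuracy AND lower time).
    best_name, best_subset, best_acc = None, None, baseline_acc
    for name, subset in methods.items():
        acc, t = evaluate(subset)
        if acc > best_acc and t < baseline_time:
            best_name, best_subset, best_acc = name, subset, acc
    if best_subset is None:
        return [], baseline_acc  # no method beat the benchmark

    # Phase 2: union the winner with every other subset and keep the
    # best combination that still beats the benchmark.
    best_combo, combo_acc = best_subset, best_acc
    for name, subset in methods.items():
        if name == best_name:
            continue
        combined = sorted(set(best_subset) | set(subset))
        acc, t = evaluate(combined)
        if acc > combo_acc and t < baseline_time:
            best_combo, combo_acc = combined, acc
    return best_combo, combo_acc

def evaluate(subset):
    # Stub: accuracy grows with a few "useful" features, time with size.
    useful = {3, 5, 12, 31}
    return 90.0 + 0.5 * len(set(subset) & useful), 0.1 * len(subset)

methods = {
    'CfsSubsetEval': [3, 11, 12, 31, 37],
    'WrapperSubsetEval': [3, 5, 30, 35, 36],
    'Gain_Ratio': [12, 11, 14, 22, 9],
}
best, acc = self_perpetuating(methods, evaluate,
                              baseline_acc=91.0, baseline_time=4.2)
print(best, acc)
```

With this stub, phase 1 selects the CfsSubsetEval subset and phase 2 improves on it by uniting it with the WrapperSubsetEval subset.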

4 Experimental Setup

For this study, the KDD-99 dataset is considered. The system has 8 GB of RAM and the Windows 10 operating system, and Weka 3.8.4 is used to analyze the techniques. The feature selection techniques listed in [1] are applied to the entire KDD-99 training dataset, and the resulting feature rankings are shown in Table 1. After obtaining the feature ranking for each feature selection algorithm, the most important features are selected for each category of feature selection technique; the rest are considered unpotential features, as they provide less information. The comparison of potential and unpotential attributes is shown in Fig. 1 (Table 2).

Table 1 Feature ranking

Feature selection method Feature ranking
CfsSubsetEval 3,11,12,31,37
ClassifierAttributeEval 41,13,12,20,14,15,16,17,18,11,10,9,4,2,3,5,8,6,7,19,21,40,34,33,22,35,36,37,38,39,32,31,30,25,23,24,26,29,27,28,1
ClassifierSubsetEval 3,5,11,23,32,34,35,36
Information_Gain 5,23,3,6,24,12,36,32,2,37,33,35,34,31,30,29,38,39,25,4,26,1,40,41,27,28,10,22,16,19,13,17,11,8,14,18,9,15,7,20,21
Gain_Ratio 12,11,14,22,9,37,3,2,31,17,32,6,18,36,5,19,16,1,10,15,23,38,25,35,39,24,26,30,4,34,33,41,40,27,29,13,28,8,7,20,21
OneRattributeEval 5,23,3,6,12,36,32,37,24,31,35,33,34,2,1,39,41,38,40,30,29,27,25,26,4,10,16,28,19,22,17,11,13,18,14,15,9,7,20,8,21
SymmetricalUncertAttributeEval 12,3,37,5,6,2,36,32,23,31,24,35,34,38,33,1,25,39,30,4,26,29,41,40,27,28,10,22,16,19,13,17,11,8,14,18,9,15,7,20,21
WrapperSubsetEval 3,5,30,35,36

Fig. 1 Attribute comparison

Table 2 Comparison of size and attributes of different reduced feature subset


Feature selection methods  Volume (Mb)  No. of selected attributes  No. of unpotential attributes
KDD full dataset 46.6 42 0
Cfs subset Eval 6.73 5 37
Classifier attribute Eval 10.8 10 32
Classifier subset Eval 12.9 8 34
Info gain attribute Eval 15 10 32
Gain ratio attribute Eval 11.4 10 32
OneR attribute Eval 15.1 10 32
Symmetrical uncert attribute Eval 14.4 10 32
Wrapper subset Eval 8.37 5 37

For each selected feature subset, the J48 classification algorithm is applied to the KDD-99 training dataset in Weka 3.8.4; the resulting training accuracy and training time are shown in Table 3.
Figures 2 and 3 compare the different feature selection algorithms by their training accuracies and training times, respectively. Clearly, from both figures, WrapperSubsetEval achieves higher accuracy and lower training time than the original KDD-99 dataset with 42 features.
In the next stage, the feature subset selected by each feature selection method is considered, and the J48 classification algorithm is trained with each reduced feature set. In the next

Table 3 Comparison of feature selection methods


Feature selection methods  Total attributes  Selected attributes  Classification algorithm  Training accuracy (%)  Training time (s)
Cfs subset Eval 42 5 J48 97.140 0.800
Classifier attribute Eval 42 10 J48 93.892 1.760
Classifier subset Eval 42 8 J48 99.961 0.990
Info gain attribute Eval 42 10 J48 99.930 2.300
Gain ratio attribute Eval 42 10 J48 98.740 5.560
OneR attribute Eval 42 10 J48 99.933 0.970
Symmetrical uncert attribute Eval 42 10 J48 99.330 1.140
Wrapper subset Eval 42 5 J48 99.952 0.830

Fig. 2 Comparison of training accuracy of different feature selection methods

phase, accuracy and time complexity are calculated on the testing dataset. Table 4 gives the accuracy and time complexity of the J48 algorithm for the different reduced feature sets of the feature selection methods.
From Fig. 4, WrapperSubsetEval has higher accuracy, i.e., 92.218%, than the original KDD-99 dataset with 42 features. The accuracy obtained from the reduced feature subset is 0.133% more than the original accuracy of KDD-99 calculated on 42 features.

Fig. 3 Comparison of training time of different feature selection methods

Table 4 Comparison of testing accuracy and time complexity of different feature selection methods
 Accuracy (%) Time (s)
All features 92.085 14.77
CfsSubsetEval 87.174 129.2
ClassifierAttributeEval 85.411 10.278
ClassifierSubsetEval 92.049 123.64
Information_Gain 91.829 5.234
Gain_Ratio 91.807 11.678
OneRAttributeEval 91.763 142.43
SymmetricalUncertAttributeEval 91.828 123.11
WrapperSubsetEval 92.218 138.15

In addition, the number of features is reduced from 42 to 5, i.e., a decrement of 88.095% of the overall feature volume (Fig. 1).
In the next phase, the reduced feature set obtained from WrapperSubsetEval is combined with all the other reduced feature subsets of the feature selection methods. The J48 classification algorithm is trained with the combined subsets, and accuracy and time complexity are calculated for the testing dataset, as shown in Table 5.
From Fig. 6, WrapperSubsetEval combined with Gain_Ratio has higher accuracy, i.e., 92.454%, than the original KDD-99 dataset with 42 features. The accuracy obtained from the reduced feature subset is 0.369% more than the original accuracy of KDD-99 calculated on 42 features. In addition, the number of features is reduced from 42 to 14, i.e., a decrement of 66.66% of the overall feature volume (Fig. 1).
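The 14-feature result can be reproduced from Table 1 (a small sketch; feature indices are copied from the table, assuming the Gain_Ratio subset consists of its ten top-ranked features, consistent with the subset size in Table 3):

```python
# WrapperSubsetEval subset and the ten top-ranked Gain_Ratio features,
# as listed in Table 1.
wrapper_subset = {3, 5, 30, 35, 36}
gain_ratio_top10 = {12, 11, 14, 22, 9, 37, 3, 2, 31, 17}

combined = wrapper_subset | gain_ratio_top10
reduction = (42 - len(combined)) / 42 * 100

print(sorted(combined))      # 14 features
print(round(reduction, 2))   # 66.67, reported as 66.66% in the text
```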

Fig. 4 Comparison of testing accuracy of different feature selection methods

Fig. 5 Comparison of testing time of different feature selection methods

5 Conclusion and Future Work

In this paper, with the help of the proposed self-perpetuating algorithm, we reduced the feature subset to 14 features. On the reduced feature subset, testing accuracy and time complexity are calculated, and both are better than those of the original dataset having 42 features. The testing accuracy is increased by 0.369% and the number of features

Table 5 Testing accuracy and time complexity of feature selection techniques with
WrapperSubsetEval
Accuracy (%) Time (s)
ClassifierSubsetEval Union WrapperSubsetEval 92.064 1949.55
ClassifierSubsetEval Union Info_Gain 91.93 243.68
CfsSubsetEval Union WrapperSubsetEval 92.204 2687.01
WrapperSubsetEval Union Info_Gain 92.014 159.35
WrapperSubsetEval Union Gain_Ratio 92.454 161.29
WrapperSubsetEval Union OneRAttributeEval 92.025 154.45
WrapperSubsetEval Union SymmetricalUncertAttributeEval 91.99 144.32
WrapperSubsetEval Union ClassifierAttributeEval 92.0792 228.9

Fig. 6 Comparison of testing accuracy of different feature selection methods

decreased by 66.66%. Thus, the proposed algorithm successfully reduced the dimensionality of the KDD-99 dataset. In the future, this work can be extended by applying variations of mathematical operations to obtain the reduced feature set. Further comparison of feature selection methods can be done by considering different classification algorithms such as decision tree, Naïve Bayes, KNN, etc.

Fig. 7 Comparison of testing time of different feature selection methods

References

1. Umbarkar S, Shukla S (2018) Analysis of heuristic-based feature reduction method in intrusion detection system. In: 2018 5th international conference on signal processing and integrated networks (SPIN), Noida, pp 717–720
2. Kasongo SM, Sun Y (2020) A deep learning method with wrapper based feature extraction for wireless intrusion detection system. Comput Secur
3. Bjerkestrand T, Tsaptsinos D, Pfluegel E (2015) An evaluation of feature selection and reduction algorithms for network IDS data. In: 2015 international conference on cyber situational awareness, data analytics and assessment (CyberSA), London, pp 1–2
4. Li Y, Xu Y, Liu Z, Hou H, Zheng Y, Xin Y, Cui L (2020) Robust detection for network intrusion of industrial IoT based on multi CNN fusion. Measurement 154
5. Mohammadi S, Mirvaziri H, Ghazizadeh-Ahsaee M, Karimipour H (2019) Cyber intrusion detection by combined feature selection algorithm. J Inf Secur Appl 44
6. Selvakumar B, Muneeswaran K (2019) Firefly algorithm based feature selection for network intrusion detection. Comput Secur 81
7. Alazzam H, Sharieh A, Sabri KE (2020) A feature selection algorithm for intrusion detection system based on pigeon inspired optimizer. Expert Syst Appl 148
8. Hakim L, Fatma R, Novriandi (2019) Influence analysis of feature selection to network intrusion detection system performance using NSL-KDD dataset. In: 2019 international conference on computer science, information technology, and electrical engineering (ICOMITEE), Jember, Indonesia, pp 217–220
9. Nziga J (2011) Minimal dataset for network intrusion detection systems via dimensionality reduction. In: 2011 sixth international conference on digital information management, Melbourne, QLD, pp 168–173. https://doi.org/10.1109/ICDIM.2011.6093368
10. Chabathula KJ, Jaidhar CD, Ajay Kumara MA (2015) Comparative study of principal component analysis based intrusion detection approach using machine learning algorithms. In: 2015 3rd international conference on signal processing, communication and networking (ICSCN), Chennai, pp 1–6. https://doi.org/10.1109/ICSCN.2015.7219853
11. Das A, Nayak RB (2012) A divide and conquer feature reduction and feature selection algorithm in KDD intrusion detection dataset. In: IET Chennai 3rd international conference on sustainable energy and intelligent systems (SEISCON), Tiruchengode, pp 1–4. https://doi.org/10.1049/cp.2012.2241
Energy-Efficient Neighbor Discovery
Using Bacterial Foraging Optimization
(BFO) Algorithm for Directional
Wireless Sensor Networks

Sagar Mekala and K. Shahu Chatrapati

Abstract In directional wireless sensor networks (WSNs), the existing neighbor discovery methods involve high latency and energy consumption compared to the block design-based methods. Moreover, the duty cycle schedule of nodes has to be addressed to increase the network lifetime. In this paper, an energy-efficient collaborative neighbor discovery mechanism using the bacterial foraging optimization (BFO) algorithm is recommended. In this computation, each node with a directional antenna performs beamforming using the BFO algorithm, with the sector number and beam direction as the fitness function. Finally, appropriate active nodes with higher energy levels are selected from the neighbors during data transmission. The obtained results show that the recommended model minimizes power consumption and delay and enhances the network lifetime.

Keywords WSN · Energy · BFO · Algorithm · Neighbor

1 Introduction

Typically, WSNs consist of a finite set of resource-constrained tiny devices, such as sensors and actuators, deployed in a field to investigate physical and environmental phenomena of interest. These small devices are equipped with limited power, little storage, short-range radio transceivers, and limited processing, so they have not only sensing capability but also data processing and communication capabilities. In sensor networks, nodes are densely distributed in a field to cooperatively carry out allotted functions such as environment monitoring (for example, temperature, air quality, noise, humidity, animal movements, water quality, or pollutants), industrial process control, battlefield surveillance, healthcare monitoring, home intelligence, and security and surveillance intelligence [1]. Traditional WSNs contain
S. Mekala (B)
Department of CSE, Mahatma Gandhi University, Nalgonda, Telangana, India
K. Shahu Chatrapati
Department of CSE, JNTUH CEM, Peddapalli, Telangana, India

© Springer Nature Singapore Pte Ltd. 2021 93


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_7

many sensor devices that coordinate to accomplish a common task within an environmental area, each containing a small microcontroller for communications [2]. Effective route discovery models can be used to increase the network lifetime by consuming limited power during communication activities [3–5].
A fundamental operation in WSNs is discovering all the neighbors of a sensor node, called neighbor discovery, which is one of the primitive functionalities for many sensor networking applications. It is a challenging operation, since the number of neighbors of a node cannot be predicted accurately. Neighbor discovery with limited energy consumption is the essence of network formation: it regulates sensor network setup and normal operations (such as routing and topology control) and prolongs the lifetime of WSNs. To address the design problems of energy-efficient neighbor discovery, the recent literature classifies neighbor discovery processes into three categories: probabilistic, deterministic, and quorum-based. Regular neighbor discovery mechanisms focus on reducing power consumption by limiting the active periods of sensor nodes [1].
A neighbor discovery protocol (NDP) is a representative scheme for finding neighbor nodes. In the basic concept of symmetric neighbor discovery, every node in the WSN has a common duty cycle; in asymmetric approaches, nodes use independent duty cycles. Since nodes in WSNs operate on limited battery power in resource-constrained environments, NDPs need to handle both asymmetric and symmetric duty cycles efficiently; the lack of support for neighbor discovery in asymmetric networks is therefore a considerable limitation of the block design-based NDPs [4, 6]. In WSNs, sensor nodes operate in three modes and conserve energy by adopting a low duty cycle (i.e., a node's active period is shorter than its sleep period). Two primary metrics are used to evaluate the energy efficiency of nodes in the sensor network: discovery latency and duty cycle. However, a low-duty-cycled node remains in standby mode for a particular period of time before its neighbors wake up, which leads to considerable delay; in general, a small duty cycle causes a longer discovery latency and vice versa. Furthermore, in some WSN applications, the dynamic nature of the devices causes constant changes in the network topology, so the set of neighbor nodes changes from time to time. Hence, finding neighbor nodes with limited power consumption and low discovery latency is a challenging problem [5, 7].
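The duty-cycle/latency trade-off can be illustrated with a toy slotted model (hypothetical schedules for illustration, not the actual design of Disco or any other NDP; the prime-period idea below is only loosely in the spirit of prime-based schedules):

```python
# Toy slotted model of duty-cycled neighbor discovery: node A is awake
# at slots divisible by pa, node B at slots where (t + offset) % pb == 0.
# Discovery latency is the first slot in which both nodes are awake.

def discovery_latency(pa, pb, offset, horizon=100_000):
    for t in range(horizon):
        if t % pa == 0 and (t + offset) % pb == 0:
            return t
    return None  # the schedules never overlap within the horizon

# Identical periods can miss each other forever under a bad offset...
print(discovery_latency(10, 10, 5))   # None
# ...while coprime (prime) periods are guaranteed to overlap
# within pa * pb slots by the Chinese remainder theorem.
print(discovery_latency(11, 13, 5))   # 99
```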

1.1 Problem Identification

As discussed in [2], the neighbor discovery methods U-Connect, Disco, and quorum involve high latency and energy consumption compared to the block design-based neighbor discovery methods, but block designs involve high complexity and computation overhead, since a new block design must be constructed.
The collaborative neighbor discovery mechanism for directional WSNs [8] applies the beamforming technique to neighbor discovery, but during beamforming, the appropriate sector number and beam direction must be chosen for better results

[8–11]. Moreover, this technique did not address the node energy level or duty cycle schedule to prolong the network lifetime.
To solve these issues, an energy-efficient collaborative neighbor discovery mechanism using BFOA is proposed.

2 Related Works

The resource-constrained, self-organized, small devices in WSNs bear the great responsibility of discovering neighbor nodes efficiently in many time-critical applications. Many neighbor discovery techniques have emerged in sensor networks with the goals of efficient power utilization and data delivery while adjusting the duty cycle and latency [1–3]. In the following, we explore the existing models that influenced the design of the model proposed in this paper.
To avoid unnecessary wakeup schedules in the neighbor discovery process and to address the issues of block design-based duty cycle schedules in various IoT applications, Lee et al. [4] proposed a neighbor discovery scheme that merges different duty cycles and block designs with the help of an exclusive-OR NDP. The implemented model operates remarkably well in both asymmetric and symmetric duty cycle environments to conserve power.
To optimize the performance of the neighbor discovery process, Nur et al. [8] established a model called the collaborative neighbor discovery (COND) appliance for directional wireless sensor networks (DSNs), achieving a low-duty-cycle neighbor discovery technique through distributed message sampling to find neighbor nodes. The simulation results of indirect-discovery COND show that it significantly reduces the discovery latency in the network.
Numerous neighbor discovery algorithms have been proposed in the literature, and most of them emphasize pairwise schedule discovery. Chen et al. [9] implemented neighbor discovery for mobile sensor networks using a group-based design that works on top of existing pairwise schedule discovery mechanisms, carefully designing a schedule reference model among all nodes in the network to prolong the lifetime and reduce discovery latency.
Considering node constraints, packet overhead, and traffic patterns in WSNs, Amiri et al. [12] suggested a bio-inspired model based on the foraging behavior of ants, i.e., ant colony optimization. The model, combined with fuzzy logic, governs route discovery from the source node to the sink node in multihop communication. The simulation results of the fuzzy logic ant colony optimization routing algorithm (FACOR) show that it significantly prolongs the network lifetime through careful maintenance of nodes' power consumption, although node failure, mobility, and the arrival of new nodes in high-density networks still need to be addressed.
To enhance data packet delivery over unreliable wireless communication, Meghashree et al. [13] offered a model adopting a reactive route discovery protocol in WSNs, introducing a biased back-off scheme in the route discovery stage to reduce discovery overhead.

An energy-efficient shortest path (EESP) model for WSNs was proposed by Lingam et al. [14]. EESP pursues the hierarchical approaches of DSR and energy-efficient AODV along with uniform load distribution among network nodes, and it discovers an optimal path by considering the energy of intermediate nodes. The obtained results show that it maximizes the average lifetime of nodes in WSNs where nodes have the same capabilities.
Based on the proposed algorithm, only a certain number of nodes keep their radios ON in the network at any given time; not all nodes in a specific region of the network are in the active period at all times. Hence, the active schedules of the nodes in the network are minimized. There is nevertheless a high possibility of discovering at least a few neighboring devices in a WSN with the same duty cycle. As mentioned in the previous section, it is quite simple and efficient to implement an asynchronous protocol for WSNs, and in this paper we combine COND with BFOA to prolong the network lifetime [4–6, 9].

3 Energy-Efficient Neighbor Discovery Using BFOA

3.1 Overview

This paper presents an energy-efficient collaborative neighbor discovery mechanism based on BFOA. In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbors in its sector. The sector number and beam direction are used in the fitness function for BFOA. During the polling stage, each neighbor sends a REPLY to the HELLO message containing its node ID, remaining energy, and duty cycle schedule. On receiving the reply messages, the polling node obtains complete neighborhood information, including the neighbors' duty cycle schedules and energy levels. Hence, during data transmission, the optimal active nodes with sufficient battery energy can be selected from among the neighbors.

3.2 Fundamentals of Optimization Algorithm

In recent years, a new stochastic technique, the bacterial foraging algorithm (BFA), was proposed for optimization problems. It is based on the natural behavior of Escherichia coli (E. coli) bacteria residing in the intestines of their hosts. BFA is an optimization algorithm based on computational intelligence. It has been widely adopted in various engineering problems, including directional antennas, power consumption mechanisms, controller design, and artificial neural networks, because of the social behavior of the E. coli bacterium. We were therefore inspired to use the basic properties of bacterial foraging optimization by
Energy-Efficient Neighbor Discovery Using Bacterial … 97

applying its three processes: chemotaxis, reproduction, and elimination-dispersal. In general, an E. coli bacterium moves in two different ways, tumbling and swimming, to perform various operations such as finding food, nesting, brooding, protecting, and guarding [5, 11].
These two modes of movement are performed randomly; this process, called chemotaxis, lets the bacterium find nutrients. In a real bacterium, the tumbling movement during foraging is performed by a collection of stretched flagella, which guide the E. coli bacterium during its movement. The bacterium increases its swimming speed when the flagella rotate counter-clockwise [11]. During chemotaxis, the bacteria strive to move toward good food sources and away from harmful environments; once they have received sufficient nutrient gradients, they increase their population and divide accordingly to form the reproduction stage. In the elimination-dispersal phase, all the bacteria in a specific region are destroyed, or a complete group is dispersed into a new region of the environment [11].
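The three phases described above can be sketched as a generic BFO loop. The function names, parameter values, and the toy objective below are illustrative choices for exposition, not taken from the chapter.

```python
import random

def bfo_minimize(objective, dim=2, n_bacteria=10, n_chemo=20,
                 n_repro=4, n_disperse=2, p_disperse=0.25,
                 step=0.1, seed=42):
    """Minimal bacterial foraging optimization sketch:
    chemotaxis (tumble/swim), reproduction, elimination-dispersal."""
    rng = random.Random(seed)
    rand_pos = lambda: [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    bacteria = [rand_pos() for _ in range(n_bacteria)]
    best_pos, best_cost = None, float("inf")

    for _ in range(n_disperse):                  # elimination-dispersal events
        for _ in range(n_repro):                 # reproduction steps
            health = [0.0] * n_bacteria          # accumulated cost per bacterium
            for _ in range(n_chemo):             # chemotactic steps
                for i in range(n_bacteria):
                    cost = objective(bacteria[i])
                    # tumble: pick a random unit direction
                    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
                    norm = sum(x * x for x in d) ** 0.5 or 1.0
                    new = [p + step * x / norm
                           for p, x in zip(bacteria[i], d)]
                    # swim: keep the move only if it lowers the cost
                    if objective(new) < cost:
                        bacteria[i] = new
                        cost = objective(new)
                    health[i] += cost
                    if cost < best_cost:
                        best_cost, best_pos = cost, list(bacteria[i])
            # reproduction: the healthiest half splits, the weakest half dies
            order = sorted(range(n_bacteria), key=lambda i: health[i])
            half = [list(bacteria[i]) for i in order[: n_bacteria // 2]]
            bacteria = half + [list(p) for p in half]
        # elimination-dispersal: relocate some bacteria at random
        for i in range(n_bacteria):
            if rng.random() < p_disperse:
                bacteria[i] = rand_pos()
    return best_pos, best_cost

# toy objective: squared distance from the origin
pos, cost = bfo_minimize(lambda p: sum(x * x for x in p))
```

In the paper's setting, the toy objective would be replaced by the discovery fitness of Sect. 3.3, with each bacterium encoding a candidate sector and beam direction.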

3.3 Estimation of Metrics

a. Fitness Function
The fitness function for BFOA is

F = F_switch × (|W| − 1) × δ    (1)

where F_switch is the switch delay, W is the set of disks (layers), and δ is the delay tuning parameter, taken proportional to a constant ρ whose value is selected so that the latency in every layer is reduced.

When neighbor discovery is guaranteed for a node in a given layer w ∈ W of the directional network, the node only needs to wait for a small period, and vice versa; otherwise, unnecessary discovery overhead results. Consequently, the selected value of ρ dynamically controls the wake-up time slot of a node in a particular layer w ∈ W for neighbor discovery.
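As a small numerical illustration of Eq. (1); the function name and the example values below are made up for demonstration.

```python
def discovery_fitness(f_switch, num_layers, delta):
    """Eq. (1): F = F_switch * (|W| - 1) * delta -- the latency fitness
    for a node sweeping num_layers layers with switch delay f_switch
    and delay tuning parameter delta."""
    return f_switch * (num_layers - 1) * delta

# e.g. 4 layers, unit switch delay, delta = 0.5
F = discovery_fitness(f_switch=1.0, num_layers=4, delta=0.5)  # 1.5
```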
b. Node Residual Energy
The residual energy E_res of a node after one data packet transmission/reception in a sensor network is

E_res = E_i − (E_tx + E_rx)    (2)

where E_i is the initial energy level, E_tx is the energy consumed for transmission, and E_rx is the energy consumed for reception.
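Eq. (2) is a simple per-packet bookkeeping step. The sketch below reuses the Table 2 figures as per-packet energies, which is an assumption, since the table lists them as powers (W) rather than energies (J).

```python
def residual_energy(e_init, e_tx, e_rx):
    """Eq. (2): E_res = E_i - (E_tx + E_rx), the energy left after one
    packet transmission and one packet reception."""
    return e_init - (e_tx + e_rx)

# Table 2 values reused as per-packet energies (illustrative assumption)
e_res = residual_energy(12.0, 0.660, 0.395)  # 10.945 J remaining
```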

c. Node Duty Cycle Schedule
This section describes the duty cycle schedule, which governs the energy budget of nodes in the network. The duty cycle is the fraction of time a node spends in its work period. Nodes in the network can operate in three modes: work period, sleep period, and listen period. The following equations characterize a node's duty cycle in the network.
The number of data packets aggregated by the receiver node during a time slot T is estimated as

A1 = ∫_{t_ini}^{t_ini+T} DPR(t) dt    (3)

The amount of data broadcast by a node during its transmission window T_tx is

A2 = ∫_{t_ini+T}^{t_ini+T+T_tx} DPT(t) dt    (4)

The total number of packets aggregated by the receiver node, excluding the transmission window T_tx, is

A3 = ∫_{t_ini}^{t_ini+T−T_tx} DPR(t) dt    (5)

We consider DPT(t) and DPR(t) as low mean arrival rates. The duty cycle can then be bounded by

T < (Z − n + DPT(t) T_tx) / DPR(t)    (6)

T < (Z − n) / DPR(t) + T_tx    (7)

where 0 < T_tx < T_d < T_sl.


Consequently, to avoid transmission and reception inaccuracy in the network, the duty cycle also accounts for the clock drift time (T_d), as presented in Eq. (8):

T < min{ (Z − n + DPT(t) T_tx) / DPR(t), (Z − n) / DPR(t) + T_tx, T_d, T_max }    (8)

where

T_d = 1 / (BW × ε)    (9)

with BW the channel bandwidth (channel utilization) and ε the clock drift parameter.
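Eqs. (6)–(9) combine into a single upper bound on the work period, which can be sketched directly. The reading of Z as receiver buffer capacity and n as packets already queued is an interpretation, since the chapter does not define these symbols; all numbers below are illustrative.

```python
def drift_time(bandwidth, drift):
    """Eq. (9): T_d = 1 / (BW * drift)."""
    return 1.0 / (bandwidth * drift)

def duty_cycle_bound(z, n, dpt, dpr, t_tx, t_d, t_max):
    """Eq. (8): upper bound on the work period T.

    z, n  : buffer capacity and packets already queued (interpretation),
    dpt   : mean transmit packet rate DPT(t),
    dpr   : mean receive packet rate DPR(t),
    t_tx  : transmission window, t_d: clock drift time, t_max: hard cap.
    """
    return min((z - n + dpt * t_tx) / dpr,   # Eq. (6)
               (z - n) / dpr + t_tx,         # Eq. (7)
               t_d,
               t_max)

t_d = drift_time(bandwidth=1e6, drift=1e-4)             # 0.01 s
bound = duty_cycle_bound(z=100, n=20, dpt=40, dpr=50,
                         t_tx=0.2, t_d=2.0, t_max=5.0)  # 1.76 s
```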
d. Polling Neighbors using BFO
In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbors in its layer. The layer number and beam direction are used in the fitness function for BFOA.

Assume N deployed nodes in the network.
Let X be the number of layers in the network.
Let C_c be the number of chemotactic steps, with corresponding probability P_c.
Let R_r be the number of reproduction iterations, with corresponding probability P_r.
Let D_d be the number of elimination-dispersal phases, with corresponding probability P_d.
Let S(i) be the chemotactic step size, where i = 1, 2, …, N.
Let α_i be the location of node i in the network, where i = 1, 2, …, N.
Let F_α be the combined effect of the attractants and repellents of the devices, where F signifies the gradient:
F_α < 0 indicates that the nodes in the network are in attractive (favorable) conditions;
F_α = 0 indicates that the nodes in the network are in neutral conditions;
F_α > 0 indicates that the nodes in the network are in repellent (unfavorable) conditions.

Let K(c, r, d) = {α_i(c, r, d) | i = 1, 2, 3, …, N}    (10)

Equation (10) gives the location of every member of the node population at the same instant.
Let LT_N be the node lifetime in the sensor network, measured over the chemotactic stages.
Let C > 0 denote the elementary chemotactic step size, which describes the distance covered between successive tumbles.
Let Δα be a unit-length random direction vector; using it, the direction of movement after a tumble can be assessed.
The following sequence of steps elaborates the design of the optimization algorithm:

Arrange the chemotactic parameters S(i) and the X nodes in ascending order of C_i^ACTIVE.
The nodes in the network with the highest measured values of C_i^ACTIVE die, and the available neighbor nodes with adequate energy values are each split into two.
Note: During the reproduction phase, the population is sorted so that the least-energy nodes die, while the fittest nodes are split into two halves and placed in the same environment.
If r < R, the specified number of reproduction phases has not yet been reached, so move to the elimination-dispersal phase.
Elimination-Dispersal
For every i from 1 to X, with probability P_d, eliminate the node and disperse it to an arbitrary position in the search area.
If e < P_d, then move to step 1; otherwise, the process terminates.
Note: Chemotactic events occur more frequently than reproduction events, which in turn occur more frequently than elimination-dispersal events.
Thus, based on the sector number and beam direction, the neighbors in each sector are polled.

3.4 Energy-Efficient Neighbor Discovery

The sequence of steps in this algorithm is as follows:
In the polling phase, every sensor node S_i broadcasts a beacon (short HELLO message) to its set of neighboring sensor devices:

S_i —beacon→ neighbor sensor node set

Table 1 Beacon format of a node

Node_ID | Frame_ID | Remaining energy | Duty cycle

The frame format of the beacon is shown in Table 1.


In the next section, we study the parameter in the beacon (HELLO message),
including the remaining energy and duty cycle.
Each node sends a REPLY to the beacon (HELLO message).

REPLY
Si −→ Neighboring Nodes

On receiving the reply message, the polling node can obtain the complete details
of neighbors along with their duty schedule and energy levels.
During data transmission, the active nodes with high residual energy are selected
from the neighbors.
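The polling exchange above can be sketched as follows. The Beacon fields mirror Table 1, while `select_relays` and its `energy_floor` parameter are hypothetical names for the selection rule, which the chapter does not spell out.

```python
from dataclasses import dataclass

@dataclass
class Beacon:
    """Fields carried in a HELLO/REPLY frame (mirroring Table 1)."""
    node_id: int
    frame_id: int
    remaining_energy: float  # joules left at the neighbor
    duty_cycle: float        # fraction of time the neighbor is awake

def select_relays(replies, energy_floor):
    """Keep only awake neighbors whose residual energy clears a floor,
    ordered best-energy first (a sketch of the selection rule)."""
    awake = [r for r in replies
             if r.duty_cycle > 0.0 and r.remaining_energy >= energy_floor]
    return sorted(awake, key=lambda r: r.remaining_energy, reverse=True)

# the polling node gathers three REPLY frames, then picks forwarders
replies = [Beacon(1, 0, 11.2, 0.5),
           Beacon(2, 0, 4.0, 0.5),    # too little energy left
           Beacon(3, 0, 11.8, 0.0)]   # asleep for this slot
chosen = select_relays(replies, energy_floor=5.0)
```

Only node 1 qualifies in this example: node 2 falls below the energy floor and node 3 is asleep.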

4 Simulation Results

4.1 Simulation Setup

The proposed energy-efficient neighbor discovery using bacterial foraging optimization (EENDBFO) algorithm is simulated in NS2 and compared with the collaborative neighbor discovery (COND) model [8]. The performance metrics are neighbor discovery delay, neighbor discovery ratio, packets received, and average node residual energy (Table 2).

4.2 Simulation Results and Analysis

Impact of varying the number of nodes To analyze the effect of node density in a sensor network, we vary the number of nodes in the area from 50 to 200.
Figure 1 shows the discovery delay of EENDBFO and COND as the number of nodes is varied over 50, 100, 150, and 200. As shown in Fig. 1, the discovery delay of EENDBFO decreases from 7.5 to 3.2 s, while that of COND decreases from 10.0 to 5.1 s; overall, the discovery delay of EENDBFO is 36% smaller than that of COND.
Figure 2 depicts the discovery ratio of EENDBFO and COND as the number of nodes varies. As the number of nodes grows from 50 to 200, the discovery ratio of EENDBFO grows from 0.40 to 0.54, and that of COND grows from 0.20 to 0.45. The analysis shows that the discovery ratio of EENDBFO is 33% larger than that of COND.

Table 2 Simulation metrics

Number of nodes deployed: 50, 100, 150, and 200
Size of deployment area: 1300 × 1300
Deployment type: Uniform random
MAC protocol: IEEE 802.11b
Traffic type: CBR
Data transmission rate: 50 kb
Propagation model: Free space
Antenna type: Directional antenna
Modulation: BPSK
Number of directions: 4
Transmission range: 200–400 m
Slot duration: 1 ms
Initial node energy: 12.0 J
Transmission energy consumption: 0.660 W
Reception energy consumption: 0.395 W

Fig. 1 Neighbor discovery delay for varying the nodes

Fig. 2 Neighbor discovery ratio for varying the nodes

Fig. 3 Packets received for varying the nodes

Figure 3 shows the number of data packets received in EENDBFO and COND as the number of nodes is varied over 50, 100, 150, and 200. The simulation results show that the packets received by EENDBFO extend from 1233 to 1688, while those received by COND extend from 809 to 1381. Hence, EENDBFO receives 24% more packets than COND.
The simulation in Fig. 4 shows the average residual energy of EENDBFO and COND as the number of nodes varies. In EENDBFO, the average remaining energy of a node decreases from 11.8 to 11.4 J, while in COND it decreases from 7.4 to 5.1 J; the average residual energy of EENDBFO is thus 44% larger than that of COND.

Impact of varying the communication range of nodes To analyze the impact of the communication range of nodes in the deployed area, we vary the transmission and reception range from 200 to 400 m. Figure 5 shows the discovery delay of EENDBFO and COND as the range is varied over 200, 250, 300, 350, and 400 m. As shown in Fig. 5, the discovery delay of EENDBFO increases from 3.8 to 4.8 s, and the discovery

Fig. 4 Average residual energy for varying the nodes

Fig. 5 Neighbor discovery delay for varying the communication range

delay of COND increases from 4.6 to 7.0 s; overall, the discovery delay of EENDBFO is 26% smaller than that of COND.
Figure 6 depicts the node discovery ratio of EENDBFO and COND as the communication range is varied. In the simulation results, as the range grows from 200 to 400 m, the discovery ratio of EENDBFO falls from 0.70 to 0.59, while that of COND falls from 0.43 to 0.26. The analysis clearly shows that the discovery ratio of EENDBFO is 50% higher than that of COND.
Figure 7 shows the number of data packets received in EENDBFO and COND as the range is varied over 200, 250, 300, 350, and 400 m. The simulation results show that the packets received by EENDBFO extend from 1757 to 2149, while those received by COND extend from 1439 to 1750. Hence, EENDBFO receives 20% more packets than COND.
Simulation results in Fig. 8 show the average residual energy of EENDBFO and COND as the communication range varies. In EENDBFO, the average remaining energy of a node increases from 10.4 to

Fig. 6 Neighbor discovery ratio for varying the range

Fig. 7 Packets received for varying the range

Fig. 8 Average residual energy for varying the range

11.4 J, while in COND it increases from 5.5 to 6.9 J. The average residual energy of EENDBFO is 44% larger than that of COND.

5 Conclusion

In this paper, we have developed an EENDBFOA for directional WSNs. In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbours along its sector. During the polling stage, the appropriate active nodes with higher energy levels are selected from the neighbours for data transmission. Simulation results show that the proposed EENDBFOA minimizes discovery delay and energy consumption and increases the discovery ratio.

References

1. Manir SB (2015) Collective neighbor discovery in wireless sensor network. Int J Comput Appl
(0975–8887), 131(11)
2. Choi S, Lee W, Song T, Youn J-H (2015) Block design-based asynchronous neighbor discovery
protocol for wireless sensor networks. J Sens 2015. Article ID 951652, 12 p
3. Selva Reegan A, Baburaj E (2015) An effective model of the neighbor discovery and energy
efficient routing method for wireless sensor networks. Indian J Sci Technol 8(23). https://doi.org/10.17485/ijst/2015/v8i23/79348, Sept 2015
4. Lee W, Song T-S, Youn J-H (2017) Asymmetric neighbor discovery protocol for wireless sensor
networks using block design. Int J Control Autom 10(1):387–396
5. Sun W, Yangy Z, Wang K, Liuy Y (2014) Hello: a generic flexible protocol for neighbor
discovery. IEEE
6. Qiu Y, Li S, Xu X, Li Z (2016) Talk more listen less: energy-efficient neighbor discovery in
wireless sensor networks. IEEE
7. Karthikeyan V, Vinod A, Jeyakumar P (2014) An energy-efficient neighbour node discovery
method for wireless sensor networks. arXiv preprint arXiv:1402.3655, 2014
8. Nur FN, Sharmin S, Ahsan Habib M, Abdur Razzaque M, Shariful Islam M, Almogren A,
Mehedi Hassan M, Alamri A (2017) Collaborative neighbor discovery in directional wireless
sensor networks: algorithm and analysis. EURASIP J Wireless Commun Netw 2017:119
9. Chen L, Shu Y, Gu Y, Guo S, He T, Zhang F, Chen J (2015) Group-based neighbor discovery
in low-duty-cycle mobile sensor networks. IEEE Trans Mobile Comput
10. Agarwal R, Banerjee A, Gauthier V, Becker M, Kiat Yeo C, Lee BS (2011) Self-organization
of nodes using bio-inspired techniques for achieving small-world properties. IEEE
11. Das S, Biswas A, Dasgupta S, Abraham A (2009) Bacterial foraging optimization algorithm:
theoretical foundations, analysis, and applications. Foundations of computational intelligence,
vol 3. Springer, Berlin, pp 23–55
12. Amiri E, Keshavarz H, Alizadeh M, Zamani M, Khodadadi T (2014) Energy efficient routing
in wireless sensor networks based on fuzzy ant colony optimization. Int J Distrib Sensor Netw
2014. Article ID 768936, 17 p
13. Meghashree M, Uma S (2015) Providing efficient route discovery using reactive routing in
wireless sensor networks. Int J Res Comput Appl Robot 3(4):145–151
14. Sathees Lingam P, Parthasarathi S, Hariharan K (2017) Energy efficient shortest path routing
protocol for wireless sensor networks. Int J Innov Res Adv Eng (IJIRAE) 4(06):2349–2163
Auto-encoder—LSTM-Based Outlier Detection Method for WSNs

Bhanu Chander and Kumaravelan Gopalakrishnan

Abstract Wireless sensor networks (WSNs) have attracted tremendous interest from various real-life applications, in particular environmental ones. For sensors deployed over long periods, it is difficult to check the features and quality of the raw sensed data. After deployment, sensor nodes may be exposed to harsh conditions that cause them to stop working or to send inaccurate data. If such faults are not detected, the quality of the sensor network can be greatly reduced. Outlier detection ensures the quality of the sensed data through safe and sound monitoring and the consistent detection of interesting and important events. In this article, we propose a novel method, the smooth auto-encoder, to learn strong and discriminative feature representations, and the reconstruction error between the input and output of the smooth auto-encoder is used as an activation signal for outlier detection. Moreover, we employ an LSTM bidirectional RNN with majority voting for collective outlier detection.

Keywords WSNs · Outlier · Smooth auto-encoder · LSTM-RNN

1 Introduction

Recent advances in computer networks, wireless machinery, and information and communication technologies have produced a novel technology named wireless sensor networks (WSNs). Earlier wired nodes and networks produced very limited results, but the development of wireless networks has made it possible to build large networks with high outcomes. Since their introduction, WSNs have been widely employed in numerous real-life applications in the industrial, academic, civil, and military fields. The major objective of the deployed nodes is to collect valuable raw sensed data from the real world and forward them to expert systems, where they are analyzed for appropriate decision making [1–4]. But,

B. Chander (B) · K. Gopalakrishnan


Department of Computer Science and Engineering, Pondicherry University, Pondicherry 609605,
India

© Springer Nature Singapore Pte Ltd. 2021 109


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_8
110 B. Chander and K. Gopalakrishnan

depending on the application, sensor nodes are often deployed in harsh surroundings, and the constrained capabilities of sensors, such as bandwidth, energy, CPU performance, and memory, make WSNs vulnerable to different types of misbehavior or outliers. An outlier or anomaly is a measurement that deviates extensively from the common patterns of the sensed data. The term was first defined by the well-known researcher Grubbs in 1960: "An outlying observation, or outlier, is one that deviates markedly from other members of the sample in which it occurs." In 2015, Titouna defined it as "an observation that deviates a lot from other observations and can be generated by a different mechanism," and Van Vuong in 2017 defined an outlier as "data items which don't conform to an expected pattern or other items in the data set." Outliers in sensor data usually correspond to node software or hardware malfunction, reading errors, malicious attacks, or strange events. Hence, it is vital to classify outliers in the sensor data efficiently and accurately to ensure data quality, safe and sound monitoring, and consistent recognition of interesting and important events. In the WSN environment, outliers are one of the factors that significantly affect the collected data, while high-quality decision making in expert systems requires high-quality data [3–12]. A further motive for detecting outliers in sensed data is that outliers are sometimes more interesting than normal patterns, since they may contain essential hidden information; by finding them, we can detect upcoming events before they occur.
Outlier or anomaly detection is an extremely important problem for numerous research domains, such as data mining, health care, medicine, and sensor networks, and it has been studied in different settings and applications. Any outlier detection task aims to identify patterns that do not conform to the expected pattern; such out-of-the-ordinary patterns are defined as outliers. Moreover, outlier detection is useful for discovering noise, fraud, defects, intrusions, errors, and so on [1–3]. The question of how to detect outliers or anomalies has become increasingly significant in disease diagnosis, machine health, credit card fraud detection, environmental events, network intrusion detection, and other areas. Outlier detection approaches ensure the quality of sensor data. Effective and well-organized outlier detection methods designed for WSNs must not only identify outliers in a distributed and online manner with high detection precision and a low false alarm rate, but also satisfy WSN resource limitations in terms of bandwidth, memory, communication, and computation. However, the situation of sensor networks and the nature of sensor data make the design of a proper outlier detection system difficult, and the unpredictable environment of a WSN further impacts outlier detection approaches. From the literature, sensor nodes are constrained in computational power and memory, so the approaches developed for outlier detection should have low computational complexity and occupy modest memory space [4–6, 13–15]. Additionally, labeled or preprocessed data are complicated to acquire in WSNs, so outlier or anomaly detection for WSNs should be able to operate on unlabeled data. From the above discussion, we can conclude that the key challenge of outlier or anomaly detection in WSNs is to identify outliers with high
Auto-encoder—LSTM-Based Outlier Detection … 111

precision at the same time as consuming nominal resources of the sensor node or the
network [7, 8, 14–16].
Research on outlier detection has produced numerous techniques at a mature stage for sensor networks. Machine learning (ML) models produce excellent outcomes with high accuracy when prearranged (labeled) datasets are available; in WSNs, which operate in real-time applications, it is difficult to obtain labeled data. Deep learning (DL) is a subdivision of ML; with the help of numerous nonlinear transformations, DL permits networks to automatically learn representations from raw sensed data. Traditional ML methods generally require considerable domain expertise and time to select first-rate features from the raw sensed data; DL simplifies this artificial feature-extraction process and thereby overcomes the limitations of traditional ML models [3–6, 13, 14]. In 2006, Hinton drew researchers' attention to DL. In his work, Hinton proposed a technique to train a deep neural network (DNN): first, greedy layer-wise pre-training was employed to locate a set of moderately good parameters, followed by a minor fine-tuning of the complete network, which effectively avoids the problem of gradient loss.
Owing to these advances, DL and representation learning have been employed in the field of outlier detection in WSNs as well. In contrast with traditional ML, DL offers more capability and shows promising progress for WSN modernization. One advantage is high prediction accuracy: ML cannot analyze all the complex parameters, such as channel variation and obstructions, but DL can efficiently abstract all of them layer by layer. In addition, there is no need to preprocess the input data, because DL typically selects the feature parameters directly from the set-up; this advantage reduces the design complexity and increases the forecast precision [7, 8, 10–12, 14–16]. Instead of designing features manually, it is more helpful for a model to learn efficient feature representations automatically from raw data through representation learning. For a variety of computer vision tasks, an ideal feature representation should be robust to small variations, smooth so as to preserve data structures, and discriminative for classification-related tasks. DL provides a higher success rate in WSN applications, since it tolerates incomplete or erroneous raw sensed input data, easily handles large amounts of input information, and has the capability to make control decisions [1–4, 7, 8, 14–16].
In this manuscript, we offer an auto-encoder modification, the smooth auto-encoder (SmAE), designed to learn strong, robust, and discriminative feature representations. It is completely different from standard AEs, which reconstruct every example from its own encoding; we instead use the encoding of every instance to reconstruct its local neighbors. In this way, the learned representations are consistent, invariant with respect to local neighbors, and moreover robust to small deviations of the inputs.

2 Related Literature Work

In [15], the authors designed a novel outlier detection model that learns spatio-temporal relationships among dissimilar sensors and uses the learned representation to recognize outliers. They utilized SODESN, a distributed RNN architecture, along with a learning method to train it. The authors simulated the designed model with real-world collected data, and the outcomes show excellent detection even with inadequate link qualities. In [16], the authors proposed two outlier detection approaches, LADS and LADQA, especially for WSNs. They employed QS-SVM and converted the problem to a sorting problem to decrease linear computation complexity. The experimental outcomes confirm that the proposed approaches have lower computation with high outlier detection accuracy. The authors of [7] took a different approach, proposing a deep auto-encoder to discover outliers in the spectrum of sensor nodes by comparing the reconstruction of normal data with a fixed threshold value. Evaluation with various numbers of hidden layers shows improved performance. The authors of [8] fabricated a model for heterogeneous WSNs to detect outliers with the help of cloud data analysis. The experimental evaluation of the proposed process was performed on both edge and cloud tests with real data obtained in an indoor building environment and then corrupted with a series of artificial impairments. The obtained outcomes show that the proposed process can self-adapt to environmental deviations and properly classify the outliers. The authors of [9] prepared a novel outlier detection method, Toeplitz support vector data description (TSVDD), for efficient outlier exposure; they utilized the Toeplitz matrix for random feature mapping, which decreases both space and time complexity. Moreover, a new model selection criterion was employed to keep the model stable with lower-dimensional features. The experimental results on the IBRL dataset reveal that TSVDD reaches higher precision and lower time complexity in comparison with existing methods. Reference [10] proposed a one-class collective outlier detection method based on an LSTM-RNN neural network. The model is trained with normal time series data; a prediction error over a certain number of the most recent time steps that is higher than the threshold value indicates a collective outlier. The model was evaluated on a time series version of the KDD-1999 dataset, and simulations show that it can detect collective anomalies efficiently.
The authors of [11] planned a model that forecasts the subsequent short-range frame from the preceding frames by employing an LSTM with a denoising auto-encoder. Here, the reconstruction error between the input and output of the auto-encoder is employed as an activation signal to sense novel events. In [12], the authors proposed a technique in which a deep auto-encoder is deployed as the central classifier, the model is trained with a cross-entropy loss function, and momentum factors are included in the back-propagation model to resolve weight-update issues. Laboratory experiments on datasets have shown that the proposal achieves high-quality precision of feature extraction. In [17], the authors employed a novel LSTM for the detection of outliers in temporal data a number of time steps ahead. The prediction error of a single point was subsequently computed by forming its prediction error vector and fitting a multivariate Gaussian distribution, which was employed to evaluate the likelihood of outlier behavior. The authors of [18] proposed a model merging predictive auto-encoders with LSTMs for acoustic outlier detection. They computed a novel reconstruction error on the auto-encoder, and any data instance above the threshold is flagged as a novel event. The design of [18] is also utilized in [19], where LSTM-RNNs are engaged to predict short-range frames.

3 Proposed Model

3.1 Auto-encoder Preliminaries

Deep learning approaches learn multiple layers of non-linear transformations from input to output representations. Such models generally outperform long-established approaches at feature extraction. As discussed in the sections above, DL approaches are able to capture more abstract features at higher layers; representative DL models such as the stacked auto-encoder (SAE), convolutional neural network (CNN) and deep belief network (DBN) have reportedly achieved great success in object tracking, event recognition, image classification, computer vision, pattern recognition, etc. In comparison with these DL models, however, auto-encoders can directly learn the feature mapping task by minimizing the reconstruction error between the input and its encoding. Following LeCun (1987), some probabilistic approaches may introduce intermediate variables whose posterior can be construed as a representation. The auto-encoder framework comes under this category; it starts by explicitly modeling the feature-extraction task in a definite parameterized closed form. This function is named the encoder and is referred to as f_θ, an efficient computation of a feature vector h = f_θ(x) from an input x. For every data instance x(t) in a data set {x(1), ..., x(T)}, we define h(t) = f_θ(x(t)); here, h(t) is the representation or code computed from x(t). A second parameterized function, the decoder g_θ, maps from feature space back to input space and generates a reconstruction r = g_θ(h). Auto-encoders are thus parameterized end to end by their encoder and decoder, which may be trained under dissimilar training principles. The parameters θ of both encoder and decoder are learned on the task of reconstructing, as closely as possible, the original input, i.e., they attempt to minimize the reconstruction error L(x, r), a measure of the deviation between x and its reconstruction, on average over a training set. For the minimization of the reconstruction error to capture the structure of the data-generating distribution, it is essential that something in the learning stage, a regularizer or the parameterization itself, prevents the AE from reaching zero reconstruction error through a trivial identity mapping.
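The encoder/decoder notation above can be made concrete with a minimal NumPy sketch; the dimensions, sigmoid activations, and random weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 8, 3                         # input and code sizes (assumed)
W_e = rng.normal(0.0, 0.1, (d_h, d_in))  # encoder weights
b_e = np.zeros(d_h)
W_d = rng.normal(0.0, 0.1, (d_in, d_h))  # decoder weights
b_d = np.zeros(d_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    # h = f_theta(x): feature vector (code) computed from the input x
    return sigmoid(W_e @ x + b_e)

def decode(h):
    # r = g_theta(h): map the code back to input space
    return sigmoid(W_d @ h + b_d)

def reconstruction_error(x):
    # L(x, r): squared deviation between x and its reconstruction;
    # training minimizes its average over the training set
    r = decode(encode(x))
    return float(np.sum((x - r) ** 2))

x = rng.random(d_in)
print(encode(x).shape, reconstruction_error(x))
```

Without a regularizer (sparsity, noise, or a contraction penalty, as described next), an over-capacity auto-encoder could drive this error to zero with an identity-like mapping, which is why the constrained variants below exist.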
Moreover, over the development of auto-encoders, numerous regularization terms have been proposed. Sparse auto-encoders penalize the hidden-unit activations with an L1 penalty or a Kullback–Leibler (KL) divergence, while denoising auto-encoders (DAE)
114 B. Chander and K. Gopalakrishnan

train to be robust against small random perturbations of the input. Contractive auto-encoders (CAE) reduce the number of effective degrees of freedom of the representation by adding an analytic contractive penalty. Both DAE and CAE are thus robust to minute alterations of the inputs between training exemplars.

3.1.1 Smooth Auto-encoder

In comparison with other auto-encoders, smooth auto-encoders (SmAE) are quite different: they powerfully learn nonlinear feature representations. For every input, SmAE aims to reconstruct its designated target neighbors, instead of reconstructing the input itself as conventional auto-encoder variants do.
The objective function of SmAE is described as:

J_smAE(θ) = Σ_{i=1}^{n} Σ_{j=1}^{k} w_n(x_j, x_i) L(x_j, g(f(x_i))) + β Σ_{j=1}^{d_h} KL(ρ ‖ ρ_j)

Here, w(·,·) denotes the weight function, characterized through a smoothing kernel w(x_j, x_i) = (1/Z) K(d(x_j, x_i)), where the normalization constant Z ensures Σ_{j=1}^{k} w_n(x_j, x_i) = 1 for every i.
k is the number of target neighbors of x_i, and d(·,·) is a distance that measures the relationship in the original feature space. The first term of the above equation pushes neighboring input examples to have similar representations; as a result, the produced features are not only robust to local variations but also flexible across input examples on various datasets. The second term regularizes model complexity via KL sparsity. Depending on the application, dissimilar kernels can be applied to map nonlinearly separable data instances to high dimensions. Here, we applied the radial basis function (RBF) kernel, since its low computational complexity is very helpful to increase the network lifetime; another reason to choose the RBF kernel is that its parameters gamma (γ) and cost (c) play a key role. In the same way, various distance measures can be adopted based on metric learning; here, we employed the Mahalanobis distance, because it uses group means and variances for every variable and thus accounts for correlations. Since a target neighbor can exhibit different variations in the training data, we select some k nearest neighbors (kNN) under the Mahalanobis distance; these kNN serve as the k target neighbors, and the corresponding distances are used to compute the
weight assignment. In this article, the designed model relies on the reconstruction error that the SmAE assigns when recreating an output not seen at training time. With the weighted target x̄_i = Σ_{j=1}^{k} w_n(x_j, x_i) x_j, the weighted reconstruction error with cross-entropy loss for a sample x_i can be simplified to

L_ce(x̄_i, g(f(x_i))) = − x̄_i · log(g(f(x_i))) − (1 − x̄_i) · log(1 − g(f(x_i)))

and the corresponding objective function is written as:

J_smAE(θ) = Σ_{i=1}^{n} L_ce(x̄_i, g(f(x_i))) + β Σ_{j=1}^{d_h} KL(ρ ‖ ρ_j)
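A sketch of the SmAE weighting scheme described above: RBF-kernel smoothing weights over Mahalanobis distances to the k target neighbors, normalized to sum to 1, together with the weighted reconstruction term of J_smAE. All numeric values, the gamma parameter, and the squared-error stand-in for L(·,·) are assumptions for illustration:

```python
import numpy as np

def mahalanobis(u, v, VI):
    # d(x_j, x_i) with VI the inverse covariance of the data
    d = u - v
    return float(np.sqrt(d @ VI @ d))

def smoothing_weights(x_i, neighbors, VI, gamma=1.0):
    # w_n(x_j, x_i) = K(d(x_j, x_i)) / Z with an RBF kernel K;
    # Z normalizes the k weights so they sum to 1 for each i
    k = np.array([np.exp(-gamma * mahalanobis(x_j, x_i, VI) ** 2)
                  for x_j in neighbors])
    return k / k.sum()

def weighted_recon_term(x_i, neighbors, w, reconstruct):
    # first term of J_smAE for one sample:
    # sum_j w_n(x_j, x_i) * L(x_j, g(f(x_i))), squared error as L
    r = reconstruct(x_i)
    return float(sum(wj * np.sum((x_j - r) ** 2)
                     for wj, x_j in zip(w, neighbors)))

# toy data: a point and its k = 3 target neighbors (hypothetical values)
x_i = np.array([0.0, 0.0])
neighbors = [np.array([0.1, 0.0]), np.array([0.0, 0.2]), np.array([0.5, 0.5])]
VI = np.linalg.inv(np.cov(np.stack(neighbors + [x_i]).T))
w = smoothing_weights(x_i, neighbors, VI)
print(w)  # normalized weights; the nearest neighbor gets the largest one
```

The nearer neighbors dominate the reconstruction target, which is what pushes neighboring inputs toward similar representations.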

3.2 BLSTM-RNN Preliminaries

Over the past few years, the long short-term memory recurrent neural network (LSTM-RNN) has been applied to represent the association between current and preceding events, and it handles time series issues efficiently. In general, an LSTM-RNN is not just trained on standard data; it is also able to forecast quite a few time steps ahead of an input. Most methods in the related work estimate outliers at the individual level rather than at the collective level; moreover, both standard and outlier data are applied in the training phase. Regarding the design of an LSTM, it holds an input layer, an LSTM hidden layer, and an output layer. An input node takes the input data, and the output applies some transform (sigmoid, tanh, etc.). The LSTM hidden layer is formed from a number of gated memory nodes that are fully connected to the input and output nodes. Gradient descent with back-propagation is the well-known technique utilized to optimize the loss function and update the parameters. As discussed above, an LSTM can have normal behavior integrated into it by teaching it with standard data, so that the network acts as a representative for variants of the data. In detail, a prediction is made with two characteristics: first, the value of an example, and second, its position at a definite time. This implies that two identical input values at dissimilar times may result in two dissimilar outputs, because the LSTM-RNN is stateful: it has a memory that varies in reaction to the inputs.
We therefore designed a fresh collective outlier detection technique based on LSTM with bidirectional RNNs. Here, the LSTM-RNN exploits the correlation between preceding and current time steps to approximate an outlier score for every time step, which supports extending time series outlier detection. In addition, bidirectional RNNs are used to access context from both temporal directions. This is done by processing the input data in both directions through two separate hidden layers and then feeding them to the output layer. The arrangement of bidirectional RNNs with LSTM memory blocks leads to the bidirectional-LSTM set-up, in which context from both temporal directions is exploited. This helps develop collective outlier exposure over progressions of solitary data points based on their outlier scores. We train an LSTM-RNN on standard data to learn ordinary behavior, and this trained model is confirmed on standard validation sets to estimate the model parameters. The resulting classifier is then used to compute the outlier score for a particular data instance at every time step. The outlier score of a series of time steps is aggregated from the contribution of every individual step. With the help of a fixed threshold, a series of solitary time steps is flagged as a collective outlier if its outlier score is superior to the threshold.
For better accuracy, we made a series of initial assessments to find the finest network, varying the hidden layers and their sizes. The finest network draft for RNNs holds three hidden layers with 156-256-156 LSTM units, while the finest BRNN layout contains six hidden layers, three for each direction, with 216 LSTM units each. Network weights are iteratively updated with standard gradient descent through back-propagation of the sum of squared errors (SSE). The gradient descent technique requires the network weights to be initialized with nonzero values; as a result, we initialize the weights from a random Gaussian distribution with mean 0 and standard deviation 0.1.
Threshold value
In the designed model, both the input and output layers of the network contain 54 units. The trained auto-encoder is thus proficient at recreating every example, and novel events are exposed by processing the reconstruction error with an adaptive threshold. For each time step, the Euclidean distance between every matching input value and the network output is calculated. The distances are summed and divided by the number of coefficients to represent the reconstruction error of each time step with a single value. For the best possible event exposure, a threshold θ_th is applied to obtain a binary signal. Here, the threshold is proportional to the median of the error signal of a sequence e_0 through a multiplicative coefficient β, constrained to the range from β_min = 1 to β_max = 2:

θ_th = β · median(e_0)     (1)
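The adaptive threshold of Eq. (1) can be sketched as follows; the frame count, coefficient count, and injected anomaly are hypothetical, and the per-step error follows the description above (per-coefficient distances summed and divided by the number of coefficients):

```python
import numpy as np

def reconstruction_error_per_step(X, X_hat):
    # per-time-step error: deviation between each input frame and the
    # network output, summed over coefficients and divided by their number
    return np.abs(X - X_hat).sum(axis=1) / X.shape[1]

def novelty_signal(err, beta):
    # Eq. (1): theta_th = beta * median(e0), beta constrained to [1, 2]
    assert 1.0 <= beta <= 2.0
    theta_th = beta * np.median(err)
    return (err > theta_th).astype(int), theta_th

rng = np.random.default_rng(1)
X = rng.random((6, 4))                      # 6 time steps, 4 coefficients
X_hat = X + rng.normal(0.0, 0.01, X.shape)  # near-perfect reconstruction
X_hat[4] += 1.0                             # badly reconstructed step 4
err = reconstruction_error_per_step(X, X_hat)
flags, theta_th = novelty_signal(err, beta=1.5)
print(flags[4])  # 1: step 4 is flagged as a novel event
```

Because the median is robust to a few large errors, a handful of novel frames barely moves θ_th, so they stand out against it.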

4 Experimental Results

For the experimental results, we consider a benchmark data set collected from the WSN deployed at the Intel Berkeley Research Laboratory (IBRL). The data are gathered with the TinyDB in-network query processing method, which is built on the TinyOS platform. The deployed WSN includes 54 Mica2Dot sensor nodes sited in the IBRL for 30 days, nearly 720 h. The sensors collect data with five dimensions (voltage in volts, light in lux, temperature in degrees Celsius, humidity ranging from 0 to 100%, and network topology position) at 30 s intervals. In the IBRL set-up, node 0 is considered the starting node, and the remaining nodes broadcast data over several hops to node 0; the farthest nodes deliver their sensed data over a maximum of 10 hops. Over the 720 h, these 54 nodes collected almost 2.3 million readings. For the experiment on the proposed model, we prepared a labeled testing set, because the original environmental data did not contain any labels as to which data are normal and which are outliers. Here, we choose three dimensions: humidity, temperature, and voltage. We employed k-fold cross-validation to reduce the samples to half the size. Each of these dimensions holds 5000 samples for training, 1000 samples for validation, plus 2000 samples for testing. In our technique, we apply the unsupervised target neighbors to characterize the weight function; furthermore, the network is fine-tuned with the RBF kernel. The hyper-parameters such as the layer sizes, the sparsity penalty, and the kernel bandwidths were found through the validation set. In Table 1, we report the model accuracy and precision along

Table 1 Accuracy and error rate of SmAE compared with existing methods

Method      AE     DAE    CAE    SmAE   AE-2   DAE-2  CAE-2  SmAE-2
Accuracy    95.24  94.68  92.49  96.17  97.12  96.46  97.98  99.26
Error rate  1.98   1.58   1.46   1.18   1.64   1.15   1.10   0.82

with its error rates; moreover, we compared the proposed model with the existing AE, DAE, CAE and with AE-2, DAE-2, CAE-2, and SmAE-2 (here, the suffix 2 indicates versions built by stacking 2 hidden layers). The results show that SmAE attains high-quality accuracy as well as a low error rate.
Two labels, normal and outlier, are prepared, and this data set holds nearly 5000 normal and 400 abnormal samples. We employed k-fold cross-validation to compress the samples to half the size. After various testing procedures, we settled on the best network model, trained with a momentum of 0.9, learning rates l = {1e−3 to 1e−7}, and dissimilar noise sigma values σ = {0.25, 0.5}. 54–20–54, 54–54–54, and 54–128–54 are the best network topologies, so we maintained the same network set-up for every test, for the fairest comparison. Each of the network topologies is trained and evaluated for 50 epochs.
Table 2 clearly shows the overall evaluation of the projected method against other accessible up-to-date techniques; our projected method shows the most excellent results in terms of precision, recall, and F-measure, up to 96.89, 94.43, and 95.90 with an input noise standard deviation of 0.5 (see Table 3). We conducted numerous experiments with numerous network layouts for each network style; however, we report only the most excellent standard network layout results. With input noise deviations of 0.1 and 0.25, both BLSTM-AE and LSTM-SmAE produce higher precision values of nearly 91.89, 93.46, and 92.24,

Table 2 Performance evaluation of the designed method with various network layouts and existing methods

Method (layouts 54–54–54, 54–128–54)   Precision      Recall         F-measure
LSTM-AE / BLSTM-AE                     89.1, 90.24    86.90, 88.29   85.24, 86.32
LSTM-DAE / BLSTM-DAE                   92.23, 92.85   91.63, 91.45   93.48, 92.69
LSTM-CAE / BLSTM-CAE                   94.41, 94.90   92.81, 93.14   93.43, 94.17
LSTM-SmAE / BLSTM-SmAE                 96.77, 96.89   93.32, 94.43   95.73, 95.90

Table 3 Comparison of the precision of the designed method with existing models
[Bar chart in the original; the precision axis ranges from 84 to 98. Individual bar values are not recoverable from this source.]

94.64. In the end, the achieved results show that employing the smooth auto-encoder with the different BLSTM layouts is valuable; moreover, a momentous performance progression with respect to the state of the art was obtained.
For collective outlier detection, we observe the prediction errors of particular successive data points. For this, we calculate the relative error, the collective error, and the prediction error. For the relative error, we analyze the error between the real value and its prediction from the BLSTM-RNN at each time step; in equation form, RE(x, x̂) = |x − x̂|. The prediction error threshold (PET) decides whether the value at a particular time stamp is considered standard or a point of a possible collective outlier: if the RE is larger than the calculated PET, the point is marked as part of a collective outlier. Finally, the collective range identifies collective outliers based on the minimum number of outliers that appear in succession in a network flow.
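The relative error / PET / collective-range logic described above can be sketched in plain Python; the threshold and range values below are hypothetical:

```python
def collective_outliers(actual, predicted, pet, collective_range):
    """Flag runs of time steps whose relative error RE(x, x_hat) = |x - x_hat|
    exceeds the prediction error threshold (PET); a run of at least
    `collective_range` consecutive flagged steps is a collective outlier."""
    flags = [abs(a - p) > pet for a, p in zip(actual, predicted)]
    runs, start, count = [], 0, 0
    for t, f in enumerate(flags):
        if f:
            if count == 0:
                start = t
            count += 1
        else:
            if count >= collective_range:
                runs.append((start, t - 1))
            count = 0
    if count >= collective_range:
        runs.append((start, len(flags) - 1))
    return runs

# hypothetical series where the BLSTM-RNN prediction fails on steps 3-5
actual = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 1.0]
predicted = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(collective_outliers(actual, predicted, pet=0.5, collective_range=3))
# -> [(3, 5)]
```

Isolated single-step errors are ignored; only sustained runs of errors, the signature of a collective outlier, are reported.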

5 Conclusion

Outlier detection in WSNs is a challenging task, and researchers are continuously working on it for the best results. In this article, we planned a model for outlier detection by employing a smooth auto-encoder-based LSTM-bidirectional RNN. We were motivated to explore SmAE for its robust learning of target neighbor representations and the LSTM-RNN for time series issues, and we adapted both techniques to detect group outliers. The designed model is evaluated on the benchmark IBRL dataset. Experimental analysis proves that the projected method has superior accuracy and recall compared to existing techniques.

References

1. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional


neural networks. In: NIPS
2. Ji S, Xu W, Yang M, Yu K (2013) 3d convolutional neural networks for human action
recognition. IEEE Trans Pattern Anal Mach Intell 35:221–231
3. Wang N, Yeung DY (2013) Learning a deep compact image representation for visual tracking.
In: NIPS
4. Ngiam J, Coates A, Lahiri A, Prochnow B, Le QV, Ng AY (2011) On optimization methods
for deep learning. In: ICML
5. Ranzato M, Boureau YL, LeCun Y (2007) Sparse feature learning for deep belief networks.
In: NIPS
6. Xie J, Xu L, Chen E (2012) Image denoising and inpainting with deep neural networks. In:
NIPS
7. Feng Q, Zhang Y, Li C, Dou Z, Wang J (2016) Anomaly detection of spectrum in wireless
communication via deep auto-encoders. J Supercomput. https://doi.org/10.1007/s11227-017-
2017-7
8. Cauteruccio F, Fortino G, Guerrieri A, Liotta A, Mocanu DC, Perra C, Terracina G, Vega
MT (2019) Short-long term anomaly detection in wireless sensor networks based on machine
learning and multi-parameterized edit distance. Inf Fus 52:13–30
9. Huan Z, Wei C, Li G-H (2018) Outlier detection in wireless sensor networks using model
selection based support vector data description. Sensors 18
10. Thi NN, Cao VL, Le-Khac N-A (2016) One-class collective anomaly detection based on
LSTM–RNNs. IEEE
11. Marchi E, Vesperini F, Weninger F, Eyben F, Squartini S, Schuller B (2015) Non-linear prediction with LSTM recurrent neural networks for acoustic novelty detection. IEEE
12. Zhu J, Ming Y, Song Y, Wang S (2017) Mechanism of situation element acquisition based on
deep auto-encoder network in wireless sensor networks. Int J Distrib Sensor Netw 13(3)
13. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust
features with denoising autoencoders. In: ICML
14. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising auto-
encoders: learning useful representation in a deep network with a local denoising criterion. J
Mach Learn Res 11:3371–3408
15. Obst O (2013) Distributed fault detection in sensor networks using a recurrent neural network.
Neural Process Lett. https://doi.org/10.1007/s11063-013-9327-4
16. Cheng P, Zhu M (2015) Lightweight anomaly detection for wireless sensor networks. Int J
Distrib Sensor Netw 2015. Article ID 653232
17. Malhotra P, Vig L, Shroff G, Agarwal P (2015) Long short term memory networks for anomaly
detection in time series. In: Proceedings. Presses universitaires de Louvain, p 89
18. Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic
acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural
networks. In: 2015 IEEE international conference on acoustics, speech and signal processing
(ICASSP), pp 1996–2000. IEEE
19. Marchi E, Vesperini F, Weninger F, Eyben F, Squartini S, Schuller B (2015) Non-linear predic-
tion with LSTM recurrent neural networks for acoustic novelty detection. In: 2015 International
joint conference on neural networks (IJCNN). IEEE, pp 1–7
An Improved Swarm Optimization
Algorithm-Based Harmonics Estimation
and Optimal Switching Angle
Identification

M. Alekhya, S. Ramyaka, N. Sambasiva Rao, and Ch. Durga Prasad

Abstract In this paper, harmonic parameters are estimated using an improved particle swarm optimization (IPSO) algorithm, and the concept is extended to the identification of correct switching angles of inverters to minimize the total harmonic content. Initially, a power system voltage signal with multiple harmonic components is considered in the presence of noise, and parameters such as amplitude (A) and phase angle (ϕ) are estimated using conventional PSO and IPSO. An objective function is then framed for such a voltage for the cascaded H-bridge inverter to identify the precise switching angles which reduce the overall harmonic content. Comparisons show the effectiveness of IPSO in both cases in identifying optimal solutions.

Keywords PSO · Harmonics · Optimal switching · Inertia weight

1 Introduction

The structural changes in integrated power systems, with renewable energy resources, converters, and inverters along with highly nonlinear loads, inject harmonics and lead to poor electrical power quality [1]. These injected harmonics need to be estimated and mitigated with proper solutions, since they result in adverse effects on the regular functioning of relays and other devices. The estimation of harmonics in the

M. Alekhya (B) · S. Ramyaka · N. Sambasiva Rao


Department of Electrical and Electronics Engineering, NRI Institute of Technology, Vijayawada,
India
e-mail: [email protected]
S. Ramyaka
e-mail: [email protected]
N. Sambasiva Rao
e-mail: [email protected]
Ch. Durga Prasad
Department of Electrical and Electronics Engineering, SRKR Engineering College, Bhimavaram,
India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 121


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_9

power system signals and identification of optimal switching angles of inverters


to minimize the injected harmonics during DC to AC conversion are achieved
by intelligent optimization techniques in a better way compared to conventional
approaches.
Some of the approaches available in the literature are as follows. Utilizing the structural properties of a voltage signal injected with harmonics and noise, a genetic algorithm (GA) was applied together with the least squares technique in [3] for the estimation of nonlinear parameters. Since the convergence rate of GA is slow, PSO was applied in 2008 [4] to the same estimation problem, following the process in [3]. Compared to GA, PSO yields a better fitness value, and its estimation results, closer to the actual values, created a research line for the application of intelligent optimization algorithms. Later, an improved version of PSO was applied to this harmonic estimation problem to obtain more accurate results with a fast convergence rate [5]. However, this improved PSO is more complex in structure than PSO, and its computational time is also large. An artificial bee colony (ABC) algorithm hybridized with least squares was applied in [6], in line with the earlier articles, for better results with higher accuracy in the presence of noise. The objective of the aforementioned stochastic, population-based search algorithms is to estimate the amplitude and phase of distorted signals with fast convergence and high accuracy. Other optimization algorithms applied in the same domain are available in [7–10]. The harmonics injected by power electronics devices are minimized by the optimal switching concept; a few works are available in the literature on identifying the optimal switching times of various inverters, and several optimization techniques have been applied to identify the switching patterns [11].
In this paper, a simple improved version of PSO is used for the harmonics estimation of distorted voltage signals and, further, for the identification of optimal switching times to reduce the harmonic content. This improved PSO produces globally optimal values and reduces the additional burden of variable selection. It also provides highly accurate, fast-converged results compared to PSO.

2 Harmonic Estimation and Switching Angles Identification

The voltage signal with multiple harmonics (ω_h = h·2πf_0) and noise (μ(t)) is expressed in the time domain, with fundamental frequency f_0, as


v(t) = Σ_{h=1}^{N} A_h sin(ω_h t + ϕ_h) + μ(t)     (1)

In Eq. (1), N is the total number of harmonics. The representation of Eq. (1) in discrete form with sampling period T_s, used for computing the errors, is given by


v(k) = Σ_{h=1}^{N} A_h sin(ω_h kT_s + ϕ_h) + μ(k)     (2)

 
Let the estimated amplitude and phase parameters be Â_h and ϕ̂_h, respectively. The distorted signal with the estimated parameters is represented as


v̂(k) = Σ_{h=1}^{N} Â_h sin(ω_h kT_s + ϕ̂_h)     (3)

Once the actual and estimated signals are available, an objective function is framed with the help of the error; it attains its minimum when the estimated signal closely matches the actual signal. Therefore, the first objective function used for harmonic component estimation [3–6], accumulated over the sampled instants k, is given by


J_1 = min Σ_k (v(k) − v̂(k))²     (4)
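A sketch of the signal model (1)-(3) and the objective (4); the fundamental frequency, sampling period, and sample count are assumed values, and phases are taken in degrees as in the test signal of Sect. 4:

```python
import numpy as np

f0, Ts, N = 50.0, 1e-4, 500   # fundamental, sampling period, samples (assumed)

def v(params, k):
    # Eqs. (1)-(3): sum of harmonics (h, A_h, phi_h) with omega_h = h*2*pi*f0
    return sum(A * np.sin(h * 2 * np.pi * f0 * k * Ts + np.deg2rad(phi))
               for h, A, phi in params)

def J1(est_params, true_params):
    # Eq. (4): squared error between the actual and estimated signals,
    # which an optimizer such as PSO/IPSO minimizes over (A_h, phi_h)
    k = np.arange(N)
    return float(np.sum((v(true_params, k) - v(est_params, k)) ** 2))

true = [(1, 1.5, 80), (3, 0.5, 60), (5, 0.2, 45)]
print(J1(true, true))                                 # 0.0 at the true values
print(J1([(1, 1.4, 80), (3, 0.5, 60), (5, 0.2, 45)], true) > 0.0)  # True
```

Any deviation of an estimated amplitude or phase from its true value raises J1, which is exactly the property the swarm exploits during its search.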

Later, the harmonics generated by signal conversion activities and nonlinear load participation are minimized by identifying suitable filter parameters and/or switching angle patterns. In this case, a second objective function is framed for the cascaded H-bridge inverter to find the optimal switching angles so that the output signal contains non-dominated harmonic content [11].
J_2 = min_{(δ_1, δ_2, δ_3)} [ 100·((V_1* − V_1)/V_1*)^4 + 50·((1/5)·(V_5/V_1))² + 50·((1/7)·(V_7/V_1))² ]     (5)

In Eq. (5), V_1, V_5 and V_7 are the harmonic components, whose expressions are available in [11]. At the optimal switching angles δ_1, δ_2, δ_3, the objective function attains its minimum.
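Since this excerpt defers the expressions for V_1, V_5, and V_7 to the cited work, the sketch below assumes the standard SHE-PWM Fourier expressions for a three-cell (seven-level) cascaded H-bridge, V_h = (4·V_dc/(hπ))·(cos(hδ_1) + cos(hδ_2) + cos(hδ_3)), and a fundamental reference V_1* = m·12·V_dc/π; both are assumptions for illustration, not formulas stated in this paper:

```python
import numpy as np

def Vh(h, deltas, Vdc=1.0):
    # assumed SHE-PWM harmonic magnitude for three cascaded H-bridges
    return 4.0 * Vdc / (h * np.pi) * sum(np.cos(h * d) for d in deltas)

def J2(deltas_deg, m=0.6, Vdc=1.0):
    # Eq. (5): penalize fundamental tracking error and the 5th/7th harmonics
    d = np.deg2rad(np.asarray(deltas_deg))
    V1_ref = m * 12.0 * Vdc / np.pi          # assumed definition of V1*
    V1, V5, V7 = Vh(1, d), Vh(5, d), Vh(7, d)
    return (100.0 * ((V1_ref - V1) / V1_ref) ** 4
            + 50.0 * (V5 / (5.0 * V1)) ** 2
            + 50.0 * (V7 / (7.0 * V1)) ** 2)

# the switching angles reported in Table 2 drive this objective near zero
print(J2([33.506, 54.757, 67.110]))
```

Under these assumed expressions, the Table 2 angles make V_5 and V_7 nearly vanish while V_1 tracks its reference, consistent with them minimizing (5).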

3 Improved Particle Swarm Optimization Algorithm

Among the aforementioned intelligent optimization techniques, the PSO algorithm is simple and produces globally optimal function values with a good convergence rate. However, the selection of control parameters plays a key role in the search process [12–15], and conventional PSO, which operates with constant control parameters, suffers from premature convergence. Several variants have been proposed, but these variants are more complex than the parent PSO. Therefore, in this paper, the velocity equation of the particles is readjusted with the damped quantities shown in Eq. (6); the automatically updated position vector is shown in Eq. (7).
   
v_n^{i+1} = ωω_d·v_n^i + c_1c_d·r_1·(pbest_i − p_n^i) + c_2c_d·r_2·(gbest_i − p_n^i)     (6)

p_n^{i+1} = p_n^i + v_n^{i+1}     (7)

All the terms in Eqs. (6) and (7) are the same as in PSO, and ω_d and c_d are the damping values inserted for each control parameter. The improvements in the results for both estimation and mitigation with the proposed method are presented in the subsequent sections through comparison with the parent PSO.
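One common reading of Eqs. (6)-(7) is to multiply the inertia weight and acceleration coefficients by the damping factors at every iteration, so exploration decays automatically and no hand-picked constant inertia weight is needed. The sketch below follows that reading; the swarm size, iteration count, initial coefficients, and damping values are assumptions, and the sphere function stands in for the paper's J1/J2:

```python
import numpy as np

def ipso(f, dim, bounds, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    p = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    vel = np.zeros((n_particles, dim))
    w, c1, c2 = 0.9, 2.0, 2.0                     # initial control parameters
    w_d, c_d = 0.99, 0.999                        # damping factors (assumed)
    pbest = p.copy()
    pbest_f = np.array([f(x) for x in p])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Eq. (6): velocity update with damped omega, c1, c2
        vel = w * vel + c1 * r1 * (pbest - p) + c2 * r2 * (gbest - p)
        p = np.clip(p + vel, lo, hi)              # Eq. (7): position update
        w *= w_d; c1 *= c_d; c2 *= c_d            # damp the control parameters
        fp = np.array([f(x) for x in p])
        better = fp < pbest_f
        pbest[better], pbest_f[better] = p[better], fp[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, float(pbest_f.min())

best_x, best_f = ipso(lambda x: float(np.sum(x ** 2)), dim=3, bounds=(-5.0, 5.0))
print(best_f)  # close to 0 on the sphere function
```

In use, `f` would be replaced by Eq. (4) or Eq. (5), with `dim` set to the number of estimated amplitudes/phases or to the three switching angles.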

4 Simulation Results

Initially, a voltage signal injected with harmonics and corrupted with noise, along with a decaying zero-frequency (DC) component, is considered for estimating the harmonic components with both PSO and IPSO. This test signal, containing the fundamental and the 3rd, 5th, 7th, and 11th harmonics, is generated in MATLAB. The mathematical expression of the signal, in the form of Eq. (1), is

x(t) = 1.5 sin(ωt + 80) + 0.5 sin(3ωt + 60) + 0.2 sin(5ωt + 45) + 0.15 sin(7ωt + 36) + 0.1 sin(11ωt + 30) + 0.5 exp(−5t)

In the estimation problem, additive noise is also included. First, the harmonic components along with the decaying DC component are estimated using conventional PSO with a constant inertia weight strategy. Four values of the inertia weight are considered for this purpose, since there is no specific procedure for selecting this control parameter. IPSO is then applied to the same problem starting from the worst inertia weight values, and it obtains the global best values. All these results are reported in Table 1.

Table 1 Estimated harmonic amplitudes (A) and phases (ϕ) for different inertia weights and the proposed method
PSO case Parameter 1st 3rd 5th 7th 11th Zero
ω = 0.9 A 1.4084 0.4906 0.1569 0.0010 0.0612 0.7666
ϕ 80.685 63.579 13.307 32.076 71.326 –
ω = 0.8 A 1.4994 0.5006 0.1986 0.1490 0.0512 0.5116
ϕ 79.966 60.039 44.079 35.967 90.238 −
ω = 0.7 A 1.5002 0.5004 0.1998 0.1501 0.0999 0.5108
ϕ 79.993 59.911 44.809 35.732 30.199 −
ω = 0.6 A 1.5000 0.4997 0.2000 0.1502 0.0999 0.5096
ϕ 80.017 60.003 44.938 36.174 29.682 −
ω = 0.3 A 1.4980 0.4515 0.1832 0.1356 0.0764 0.5053
ϕ 79.865 34.239 66.939 64.989 69.663 −
Proposed A 1.4993 0.5008 0.2003 0.1501 0.1006 0.5105
ϕ 79.995 60.025 45.326 35.918 29.844 −

Fig. 1 Fitness function J1 values for different inertia weights (bar chart; the plotted values, in the order listed in the source, are 30.998, 30.52, 3.742, 0.0212, 0.0209 and 0.0216 across ω = 0.9, 0.8, 0.7, 0.6, 0.3 and the proposed method)

Table 2 Optimal switching angles (in degrees) identified by the proposed method and by [11]

Method          δ1       δ2       δ3
Reference [11]  33.498   54.759   67.103
Proposed        33.506   54.757   67.110

For all PSO runs at different inertia weights, the values of the fitness function at the end of the final iteration are plotted in Fig. 1, from which it is observed that the proposed dynamic control parameter concept removes the burden of selecting control parameters for finding globally optimal solutions. The same PSO strategy is applied to identify the optimal switching values in order to minimize the total harmonic distortion (THD). For this purpose, Eq. (5) is considered, and the results at a modulation index (m) of 0.6 are reported in Table 2.

5 Conclusions

In this paper, harmonic component estimation and optimal switching patterns are identified using an improved PSO algorithm and compared with standard PSO. The comparisons revealed the importance of control variable selection in the original PSO; the simple mechanism adopted in the improved PSO eliminates this additional selection burden. Accurate results are achieved with fast convergence by the proposed technique, without increasing the computational burden.

References

1. Harris FJ (1978) On the use of windows for harmonic analysis with the discrete Fourier
transform. Proc IEEE 66(1):51–83
2. Ren Z, Wang B (2010) Estimation algorithms of harmonic parameters based on the FFT. In:
2010 Asia-pacific power and energy engineering conference. IEEE, Mar 2010, pp 1–4

3. Bettayeb M, Qidwai U (2003) A hybrid least squares-GA-based algorithm for harmonic


estimation. IEEE Trans Power Deliv 18(2):377–382
4. Lu Z, Ji TY, Tang WH, Wu QH (2008) Optimal harmonic estimation using a particle swarm
optimizer. IEEE Trans Power Deliv 23(2):1166–1174
5. Yin YN, Lin WX, Li WL (2010). Estimation amplitude and phase of harmonic based on
improved PSO. In: IEEE ICCA 2010. IEEE, June 2010, pp 826–831
6. Biswas S, Chatterjee A, Goswami SK (2013) An artificial bee colony-least square algorithm
for solving harmonic estimation problems. Appl Soft Comput 13(5):2343–2355
7. Kabalci Y, Kockanat S, Kabalci E (2018) A modified ABC algorithm approach for power
system harmonic estimation problems. Electric Power Syst Res 154:160–173
8. Singh SK, Kumari D, Sinha N, Goswami AK, Sinha N (2017) Gravity search algorithm
hybridized recursive least square method for power system harmonic estimation. Eng Sci
Technol Int J 20(3):874–884
9. Singh SK, Sinha N, Goswami AK, Sinha N (2016) Power system harmonic estimation using
biogeography hybridized recursive least square algorithm. Int J Electr Power Energy Syst
83:219–228
10. Singh SK, Sinha N, Goswami AK, Sinha N (2016) Robust estimation of power system
harmonics using a hybrid firefly based recursive least square algorithm. Int J Electr Power
Energy Syst 80:287–296
11. Kundu S, Burman AD, Giri SK, Mukherjee S, Banerjee S (2017) Comparative study between
different optimization techniques for finding precise switching angle for SHE-PWM of three-
phase seven-level cascaded H-bridge inverter. IET Power Electron 11(3):600–609
12. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95—
International Conference on Neural Networks, vol 4. IEEE, Nov 1995, pp 1942–1948
13. Nagaraju TV, Prasad CD (2020) Swarm-assisted multiple linear regression models for
compression index (Cc) estimation of blended expansive clays. Arabian J Geosci 13(9)
14. Prasad CD, Biswal M, Nayak PK (2019) Wavelet operated single index based fault detection
scheme for transmission line protection with swarm intelligent support. Energy Syst 1–20
15. Nagaraju TV, Prasad CD, Raju MJ (2020) Prediction of California bearing ratio using particle
swarm optimization. In: Soft computing for problem solving. Springer, Singapore, pp 795–803
A Study on Ensemble Methods for
Classification

R. Harine Rajashree and M. Hariharan

Abstract Classification is the most common task in machine learning, which aims at categorizing the input into a set of known labels. Numerous techniques have evolved over time to improve the performance of classification. Ensemble learning is one such technique, which focuses on improving performance by combining a diverse set of learners that work together to provide better stability and accuracy. Ensemble learning is used in various fields, including medical data analysis, sentiment analysis and banking data analysis. The proposed work focuses on surveying the techniques used in ensemble learning, covering stacking, boosting and bagging, improvements in the field and challenges addressed in ensemble learning for classification. The motivation is to understand the role of ensemble methods in classification across various fields.

Keywords Machine learning · Ensemble learning · Boosting · Bagging

1 Introduction

Machine learning is one of the ways to gain artificial intelligence. Machine learning
focuses on equipping the machine to learn on itself without being explicitly pro-
grammed. This, in turn, leads way to gain intelligence. Machine learning is widely
classified into types, namely supervised learning and unsupervised learning. Clas-
sification is a prominent machine learning task which works into mapping input to
output. It is an supervised learning which does mapping of provided input to an
output. It basically finds the class to which an input data might possibly belong. It
is supervised learning since the data used to train the model which approximates

R. Harine Rajashree (B)


Department of CSE, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
M. Hariharan
Post Graduate Programme in Management, Indian Institute of Management,
Tiruchirappalli, Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2021 127


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_10
128 R. Harine Rajashree and M. Hariharan

Fig. 1 Types of machine learning

the mapping function are labelled with correct classes. Figure 1 depicts the types of
machine learning along with applications.
In addition to classification, regression is also a supervised learning which is also
applied in various fields like risk assessment and stock value analysis. Classification is
a predictive modelling where the class label of an input data is predicted.The model is
trained with numerous data which is already labelled. Some classic examples include
spam/non-spam mail classification, handwritten character classification. Binomial
and multi-class are two diverse types of classifications. Many popular algorithms are
involved to perform the classification task. Few well known are
• K-nearest neighbours
• Naive Bayes
• Decision trees
• Random forest
Although the performance of the algorithms was commendable, there is consistent
necessity to improve the performance. Ensemble learning is one familiar technique to
improve the accuracy. Ensemble learning, in turn, has many approaches to improve
the accuracy of classification. The idea of using ensemble models is to combine
numerous weak learners to act as a single strong learner. The work presented analyses
the various methods of ensemble techniques, its application in the fields and also
experimental analysis on the effect of ensemble learning.

The rest of the paper is organized as follows: Sect. 2 studies the related work,
Sect. 3 where the various ensemble techniques are discussed, Sect. 4 discusses the
application of ensemble techniques and Sect. 5 discusses the use of ensemble tech-
niques in deep learning, Sect. 6 provides an experimentation, and Sect. 7 concludes
the proposed survey.
A Study on Ensemble Methods for Classification 129

2 Related Work

Ensemble is an evergreen research area where many studies have been proposed.
The work proposed by Sagi et al. [1] provides a detailed study covering the advent of
ensemble learning besides explaining the history of every ensemble technique. The
key take away from the proposed work is the idea to refine the algorithms to fit big
data. Another mention is to carry future work of combining deep learning with ensem-
ble learning. Gomes et al. [2] proposed a survey specifically on the use of ensemble
learning in context of data stream. The author has studied over 60 algorithms to
provide a taxonomy for ensemble learners in the context. The study concludes with
evidence that data stream ensemble learning models help in overcoming challenges
like concept drift, and it tends to wok in real-time scenarios. Dietterich et al. [3]
initiated a better understanding in their work which explains a plenty of questions
such as why ensemble works, basic methods for constructing ensembles. The paper
describes the algorithms that form a single hypothesis to perform input to output
mapping and suffer from three main losses, namely
• Statistical problem
• Computational problem
• Representation problem
These problems cause high variance and bias, thus explaining why ensembles can
reduce the bias and variance. Ren et al. [4] discussed the theories that lead to ensemble
learning which includes bias variance decomposition and diversity. The paper also
categorizes the algorithms under different classes. It also discusses distinct methods
like fuzzy-based ensemble methods and deep learning-based methods. The work
focuses equally on regression tasks besides analysing classification tasks. In this
paper, various approaches involved in classification tasks, the application of ensemble
learning and the challenges are discussed.

3 Ensemble Learning Approaches

Ensemble learning works by forming a set of hypothesis, whereas classic algorithms


work to find a single hypothesis. The ensemble then makes the hypothesis vote in
some manner to predict the output class. There are numerous explanations on how
to categorize the ensemble methods. Few articles categorize them into sequential
ensemble techniques and parallel ensemble techniques. The author in [3] classifies
the techniques based on the hypothesis design. The model can be independently
constructed hypothesis that is diverse and accurate. The other methods are where
hypotheses are constructed as an additive model. The author in [4] does a very
specific category like decomposition-based ensemble methods and fuzzy ensemble
methods. The description from [1] is the most simple categorization.
• Dependent framework
• Independent framework
130 R. Harine Rajashree and M. Hariharan

Dependent framework is when the result of each learner affects the next learner.
The learning of the next learner is affected by the previous learning. Independent
framework is constructed independently from each other. Any ensemble technique
would fall under these two categories. The prime types of ensemble techniques are
• Bagging
• Boosting
• Stacking
These techniques have numerous algorithms working under them to achieve the goals
of ensemble.

3.1 Bagging

One major key to ensemble models is diversity. Diversity plays crucial role in
improving the performance of ensemble model. Bagging is one of the approaches
to implement diversity. Bagging is bootstrap aggregating which works by training
each inducer on a subset of instances. Each inducer gets trained on different subsets,
thereby generating different hypotheses. Then a voting technique is used to deter-
mine the prediction of the test data. Bagging often contains homogeneous learners
and implements data diversity by working on samples of data. In [5], Dietterich dis-
cusses in detail why ensembles perform better than individual learners. The author
claims that bagging is the most straightforward method to construct an ensemble by
manipulating training examples. He also states that this method of ensemble works
better on unstable algorithms. These algorithms are affected by major changes even
when the manipulation is small. Tharwat et al. [6] propose a plant identification
model which uses bagging classifier. The work proposes the usage of bagging classi-
fier on a fused feature vector model for better accuracy. Decision tree learner is used
as base learner, and the results show that the accuracy gets increased with increase
in number of learners. The paper also finds that the accuracy rate was proportional
to the number of training data and size of the ensemble. Wu et al. in [7] propose an
intelligent ensemble machine learning method based on bagging for thermal percep-
tion prediction. The author shows the performance of the ensemble against SVM and
ANN. The ensemble outperformed the classic algorithms in prediction of thermal
comfort and many other measures. Many improvements were suggested in bagging
some of which include improved bagging algorithm. The algorithm is improvised by
assigning an entropy to each sample. Jiang et al. in [8] used the algorithm for pattern
recognition to recognize ultra-high-frequency signals. The model showed improved
performance against many algorithms. Another interesting variant is wagging which
is weight aggregation. It works by assigning weights to samples. However, in the
work by Bauer et al. [9], there was no significant improvements shown in results. But,
bagging has shown improved performance by decreasing the error. In many exper-
iments along with Naive Bayes and MC4, the error has been significantly reduced.
A Study on Ensemble Methods for Classification 131

In [10], Kotsiantis et al. categorize the variants of bagging algorithms into eight
categories which include
• Methods using alternative sampling techniques.
• Methods using different voting rule.
• Methods adding noise and inducing variants.

3.2 Boosting

Boosting is an iterative technique which works by adjusting the weights of the obser-
vation made by previous classifier. This ensures that the instances which are not
classified properly are picked more often than the correctly predicted instances. This
makes boosting a well-known technique under dependent framework. There are many
algorithms in boosting technique of which the following are discussed.
• AdaBoost
• Gradient boosting.
AdaBoost AdaBoost was the pioneer in boosting technique. It works to combine
weak learners to form a strong learner. It uses weights and assigns them in such a
way that weights of wrongly classified instances are increased, and for the correctly
classified ones the weights are decreased. Thus, the weights make the successive
learners concentrate more on wrongly classified instances. In [11], the author dis-
cusses AdaBoost in a very detailed manner. Various aspects of AdaBoost have been
explained forming an expansive theory on the algorithm. Prabhakar et al. in [12] pro-
posed a model combining dimensionality reduction technique and AdaBoost classi-
fier for classification of Epilepsy using EEG signals. The classification is improved
with more than 90% accuracy. Haixing et al. [13] used an AdaBoost-kNN ensemble
learning model for classification of multi-class imbalanced data. The model uses
kNN as base learner and incorporates AdaBoost. The results showed 20% increase
in accuracy than classic kNN model. Many other works involved usage of AdaBoost
along with feature selection techniques for increased accuracy.
Gradient Boosting Gradient boosting machine (GBM) works in an additive sequen-
tial model. The major difference between AdaBoost and GBM is the way they manage
the drawbacks of the previous learner. While AdaBoost uses weights, GBM uses gra-
dients to compensate the drawbacks in succeeding learners. One prominent advantage
of using GBM is that it allows user to optimize user-specified cost function instead
of unrealistic loss function. Many literatures use an improvement of gradient boost
which is extreme gradient boost (XGB). Shi et al. [14] propose a weighted XGB
which is used for ECG heartbeat classification. The model was used in classifying
heartbeats under four categories like normal and ventricular, and the work concludes
saying the method is suitable for clinical application.
132 R. Harine Rajashree and M. Hariharan

3.3 Stacking

Stacking combines multiple learners by employing a meta learner. The base level
learners are trained on the training data set, and the meta-learner trains on the base
learner features. The significance of stacking is that it can reap the benefits of well-
performing models by learning a meta-learner on it. The learners are heterogeneous,
and unlike boosting only a single learner is used to learn from base learners. Stack-
ing can also happen in multiple levels, but they might be data and time expensive.
Ghasem et al. [15] used a stacking-based ensemble approach for implementing an
automated system for melanoma classification. The author also proposed a hierar-
chical structure-based stacking approach which showed better results besides the
stacking approach.

3.4 Random Forest

Random forest is a very popular ensemble method which is bagging method where
trees are fit on bootstrap samples. Random forest adds randomness by selecting best
features out of random subset of features. Sampling over features gives the added
advantage that the trees do not have to look at the same features to make decisions.
Lakshmanaprabhu et al. [16] proposed a random forest classifier approach for big
data classification. The work exhibits how ensemble techniques work with big data.
The model is implemented on health data, and RFC is used to classify the same.
The results showed maximum precision of 94% and showed improvement against
existing methods. Paul et al. [17] proposed an improvised random forest algorithm
which iteratively reduces the features which are considered unimportant. The paper
aimed to reduce the number of trees and features while still maintaining the accuracy.
It could prove that the addition of trees or further reduction of features does not have
effect on accuracy.

4 Application of Ensemble Techniques

Ensembles are employed due to their ability to mitigate a lot of problems that might
occur while using machine learning. Many literatures [1, 4] discuss the advantages
and disadvantages. The significant benefits of using ensemble are discussed below.
• Class imbalance: When the data have majority of instances belonging to a single
class, then it is said to be class imbalance. Machine learning algorithms thereby
might develop an inclination towards that class. Employing ensemble methods
can mitigate this issue by performing balanced sampling, or employing learners
that would cancel the inclinations of the previous learner. In [13], it is shown how
ensemble is used on imbalanced data.
A Study on Ensemble Methods for Classification 133

Table 1 Findings from the literature stated above


Data set Bagging Boosting Random forest
Letter data set 94.90 96.74 96.84
Led-24 73.57 71.43 74.93
Iris 94.67 94.67 94.67
Sonar 77.14 81.43 81.90

• Bias Variance Error: Ensemble methods tackle the bias or variance error that might
occur in the base learners. For instance, bagging reduces the errors associated with
random fluctuations in training samples.
• Concept drift: Concept drift is the change in the underlying relationships due to
change in the labels over time. Ensembles are used as a remedy since diversity in
the ensembles usually reduces the error that might occur due to the drift.
Similar to the benefits, there are certain limitations in using ensembles. Few of them
are
• Storage expensive
• Time expensive
• Understanding the effect of parameters like size of ensemble and selection of
learners on the accuracy.
At few places, smaller ensembles work better, whereas in some literature the increase
in accuracy is stated to be proportional to the number of learners. Robert et al. [18]
conducted an experimental analysis by comparing the ensemble methods against 34
data sets. Some significant outcomes of the analysis are mentioned in Table 1.
From the findings, it is visible that at some cases ensembles can also perform poor,
whereas in most cases according to the literature the accuracy is better. Although
few questions are still open, ensembles have widely been employed for improved
performance.

5 Deep Learning and Ensemble Techniques

The emergence of deep learning has resulted in enormous growth in various domains.
Deep learning paves way for improvements in artificial intelligence to get par with
humans. Architectures like densely connected neural network and convolutional neu-
ral network are very popular. Deep learning plays an important role in speech recog-
nition, object detection and so on. Aggregation of multiple deep learning models is a
simple way to employ ensemble in deep learning. Other way is to employ ensemble
inside the network. Dropouts and residual blocks are an improvement in such models.
They tend to create variations in the network, thereby improving accuracy. Numerous
literatures show how neural networks employ ensemble methods for classification
134 R. Harine Rajashree and M. Hariharan

purposes. Liu et al. [19] proposed an ensemble of convolutional neural networks with
different architectures for vehicle-type classification. The results show that the mean
precision increased by 2% than single models. Zheng et al. [20] in the work proposed
an ensemble deep learning approach to extract EEG features. It uses bagging along
with LSTM model and showed higher accuracy when compared with techniques
like RNN. Such works exhibit how ensemble is marching forward in the artificial
intelligence era.

6 Experimentation

To understand the effect of ensemble methods, the ensemble methods have been
employed on an open banking data set which studies the attributes of an user to clas-
sify if he/she would take a loan from the bank. The findings from the experimentation
are listed in the table below.

Method Accuracy
Stacking 88.5
AdaBoost 89.01
GBM 88.27
Bagging 88.72
Random forest 86.91

AdaBoost exhibits highest accuracy of 89% with decision tree as base learner.
Random forest shows 86.91% accuracy. The 3% increase in accuracy is still promis-
ing to help the task. On the other hand, another data set trying to classify default
credit payments was experimented. Random forest showed the highest accuracy of
98.76%, whereas the other techniques showed accuracy around 75% only. This shows
the advantage of ensemble technique while also expressing the limitation of effect of
parameters on accuracy. The effect of feature selection can also be studied in future
for reaping much higher performance.

7 Conclusion

Classification tasks aim to predict the class label of unknown test data. Ensem-
ble learning is a popular technique to improve the performance of classification. It
combines numerous learners to form a single strong learner. Many techniques and
algorithms are present in ensemble learning. Bagging, boosting and stacking are the
very popular ensemble techniques. These algorithms are discussed along with their
applications in various fields. Besides, the future of ensemble learning in the field
of artificial intelligence , advantages and disadvantages in application of ensemble
learning are explained in detail. As a future work, analysis can be made on how
A Study on Ensemble Methods for Classification 135

ensemble learning can be used efficiently to fit big data and in the direction of build-
ings models that are simpler and effective in terms of cost and time.

References

1. Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl
Discov 8(4):
2. Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data
stream classification. ACM Comput Surv (CSUR) 50(2):1–36
3. Dietterich TG (2002) Ensemble learning. The handbook of brain theory and neural networks
2:110–125
4. Ren Y, Zhang L, Suganthan PN (2016) Ensemble classification and regression-recent develop-
ments, applications and future directions. IEEE Comput Intell Mag 11(1):41–53
5. Dietterich TG (2000) Ensemble methods in machine learning. International workshop on mul-
tiple classifier systems. Springer, Berlin, Heidelberg, pp 1–15
6. Tharwat A, Gaber T, Awad YM, Dey N, Hassanien AE (2016) Plants identification using
feature fusion technique and bagging classifier. The 1st international conference on advanced
intelligent system and informatics (AISI2015), 28–30 Nov 2015, Beni Suef. Egypt. Springer,
Cham, pp 461–471
7. Wu Z, Li N, Peng J, Cui H, Liu P, Li H, Li X (2018) Using an ensemble machine learn-
ing methodology-Bagging to predict occupants-thermal comfort in buildings. Energy Build
173:117–127
8. Jiang T, Li J, Zheng Y, Sun C (2011) Improved bagging algorithm for pattern recognition in
UHF signals of partial discharges. Energies 4(7):1087–1101
9. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: Bag-
ging, boosting, and variants. Mach Learn 36(1–2):105–139
10. Kotsiantis SB (2014) Bagging and boosting variants for handling classifications problems: a
survey. Knowl Eng Rev 29(1):78
11. Schapire RE (2013) Explaining adaboost. Empirical inference. Springer, Berlin, Heidelberg,
pp 37–52
12. Prabhakar SK, Rajaguru H (2017) Adaboost classifier with dimensionality reduction techniques
for epilepsy classification from EEG. International conference on biomedical and health infor-
matics. Springer, Singapore, pp 185–189
13. Haixiang G, Yijing L, Yanan L, Xiao L, Jinling L (2016) BPSO-adaboost-KNN ensemble
learning algorithm for multi-class imbalanced data classification. Eng Appl Artif Intell 49:176–
193
14. Shi H, Wang H, Huang Y, Zhao L, Qin C, Liu C (2019) A hierarchical method based on
weighted extreme gradient boosting in ECG heartbeat classification. Comput Methods Progr
Biomed 171:1–10
15. Ghalejoogh GS, Kordy HM (2020) Ebrahimi F (2020) A hierarchical structure based on Stack-
ing approach for skin lesion classification. Expert Syst Appl 145:
16. Lakshmanaprabu SK, Shankar K, Ilayaraja M, Nasir AW, Vijayakumar V, Chilamkurti N (2019)
Random forest for big data classification in the internet of things using optimal features. Int J
Mach Learn Cybern 10(10):2609–2618
17. Paul A, Mukherjee DP, Das P, Gangopadhyay A, Chintha AR, Kundu S (2018) Improved
random forest for classification. IEEE Trans Image Process 27(8):4012–4024
18. Banfield RE, Hall LO, Bowyer KW, Bhadoria D, Philip Kegelmeyer W, Eschrich S (2004)
A comparison of ensemble creation techniques. International workshop on multiple classifier
systems. Springer, Berlin, Heidelberg, pp 223–232
19. Liu W, Zhang M, Luo Z, Cai Y (2017) An ensemble deep learning method for vehicle type
classification on visual traffic surveillance sensors. IEEE Access 5:24417–24425
136 R. Harine Rajashree and M. Hariharan

20. Zheng X, Chen W, You Y, Jiang Y, Li M, Zhang T (2020) Ensemble deep learning for automated
visual classification using EEG signals. Patt Recogn 102:
An Improved Particle Swarm
Optimization-Based System
Identification

Pasila Eswari, Y. Ramalakshmanna, and Ch. Durga Prasad

Abstract An improved particle swarm optimization (IPSO) is used to identify infi-


nite impulse response (IIR) system based on error minimization concept. Since
the parameter selection of conventional PSO influences searching process, dynamic
control parameters are inserted in the mechanism to avoid premature solutions. This
modification helps to final global optimal values even the initial control parameters
are worst in nature. The method is tested for two standard IIR systems of third- and
fourth-order models to show the improvements. Finally, comparative results show
the effectiveness of the dynamic nature of the control parameters of PSO in order to
find close parameter values of unknown systems.

Keywords IIR filter · Particle swarm optimization · Control parameters

1 Introduction

The elimination of specific band of frequencies is achieved by digital filters in digital


processors. Linear and nonlinear filter are the broad classification for such digital
filters. For better filtering, IIR filter is widely used instead of FIR filters in control,
signal processing, and communication related fields. The objective is to estimate the
actual parameters of the unknown system for patterns of different input and outputs
[1, 2].
Several gradients-based learning methods [3–5], intelligent approaches [6], evolu-
tionary and swarm optimization techniques [7–13] were used to estimate the
filter/system parameters. In past, gradient approaches were used for filters to esti-
mate frequency [3, 4]. Quaternion algebra concept is introduced later in [5] to reduce
the complexity in the design of IIR filter, and a learning algorithm for its training
was proposed. Since the prediction of the adaptive IIR algorithms is more difficult,

P. Eswari (B) · Y. Ramalakshmanna


Department of ECE, SRKR Engineering College, Bhimavaram, Andhra Pradesh, India
Ch. Durga Prasad
Department of EEE, SRKR Engineering College, Bhimavaram, Andhra Pradesh, India

© Springer Nature Singapore Pte Ltd. 2021 137


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_11
138 P. Eswari et al.

intelligent approaches provide alternate solutions with less complexity and with high
convergence [6]. Particle swarm optimization (PSO) was introduced in [7] for IIR
filter coefficients identification. To reconstruct missing elements of N-dimensional
data, this PSO was adopted in [7]. Earlier to this, directly the parameters were esti-
mated using PSO under ideal conditions in [8]. Evolutionary-based algorithms such
as genetic and differential evolution algorithms [9, 10] were also tried in digital
filter (IIR) coefficients identification. These aforementioned intelligent approaches
need control parameters, and their selection influences the convergence and hence
nonparametric-type algorithms were also applied for identification of parameters of
IIR and FIR filters. Teaching and learning-based optimization (TLBO) applied in
[11] to identify filter parameters. Other mathematical-based algorithms were also
available in literature for estimation of filters in noise conditions [12, 13]. Recently
cat behavior-oriented optimization algorithm (CSO) [1], gravitational search-based
technique (GSA) [2], and recent algorithms [14] were applied in estimation problem
for better convergence.
For accurate identification of filter coefficients with fast convergence rate,
improved PSO is used in this paper since the PSO algorithm is easy to implement and
fast to execute. However, the control parameters influenced on final results are mini-
mized by damping nature and achieved close values to exact solution. The efficacy
is tested with few well-defined models discussed in consequent sections.

2 Problem Formulation

Identification of the exact modal parameters of the unknown system from the obser-
vations of output and input patterns is known as system identification. This task is
completed by parameters substitutions of the model for the set of standard inputs so
that its output matches the system actual outputs. The schematic representation of
system identification is given in Fig. 1 in line with definition where the parameters
are identified by optimization algorithm.
The input (x)–output (y) relation can be described in terms of the following Eq. (1)


N 
M
bi y(n − i) = ak x(n − k) (1)
i=0 k=0

In Eq. (1), N (≥ M) is the filter’s order. The transfer function of the filter described
in Eq. (1) is given by
M
Y (z) ak z −k
H (z) = = k=0 (2)
X (z) N
i=0 bi z
−i

Suppose b0 = 1 the adaptive IIR filter transfer function is,


An Improved Particle Swarm Optimization-Based System … 139

Actual
IIR System

Model
IIR System

Optimization
Algorithm

Fig. 1 Approach diagram for system identification

M −k
k=0 ak z
H (z) = N (3)
1 + i=1 bi z −i

In detail, Eq. (3) is rewritten as

Y (z) a0 + a1 z −1 + a2 z −2 + · · · + a M z −M
H (z) = = (4)
X (z) 1 + b1 z −1 + b2 z −2 + · · · + b N z −N

The estimated filter model is given by


M −k
k=0 âk z
He (z) = N
1 + i=1 b̂i z −i

To identify the correct parameters of the actual system, an error is calculated from
the known and unknown systems outputs using the equation given by

Error, e(k) = y(k) − ye (k) (5)

For nearer parameters estimation, the error defined in Eq. (5) is approaching zero
for the entire time scale. Identification of such exact parameters is achieved using
population search-based techniques where the objective function is framed with the
help of error shown in Eq. (5) is given by


N
J = min e(k)2 (6)
k=1

At optimal solution, J is approaching zero and process is converged. For this


purpose, an improved PSO is used with dynamic control parameters.
140 P. Eswari et al.

3 Improved Particle Swarm Optimization Algorithm

PSO is a popular search-based intelligent algorithm implemented from the food


searching mechanism of birds [15]. The primary solution of the optimization problem
is randomly generated in the search space known as initial solution. Each solution is
represented as ‘position’. With the help of best position of individual and group, new
position is updated with the help of ‘velocity’ calculation. This velocity is calculated
for individual birds using the current position, local best position, and global best
position of particles along with other parameters known as control parameters. These
control parameters selection influences overall search process. The wise selection
of suitable control parameters is a difficult task and hence new algorithms were
proposed in later stages. However, an improved PSO is used in this by considering
the simple architecture and fast convergence of the PSO algorithm [16–18]. This
improved version used dynamic control parameters which produces more reliable
solutions irrespective of initial selection of control parameter values. Using this, the
position and velocity equations of particles are given by
   
vni+1 = ωωd vni + c1 cd r1 pbesti − pni + c2 cd r2 gbesti − pni (7)

pni+1 = pni + vni+1 (8)

All the terms in Eqs. (7), (8) are as same as PSO ( pni represents position and vni
represents velocity) and ωd and cd are the damping values inserted for each control
parameter. Since the acceleration coefficients are mixing with random number, only
inertia weight is more influenced parameter. Therefore, the dynamic change is consid-
ered only for inertia weight for the rest of the paper. The improvements in the results
with the proposed method are reported in Sect. 4.

4 Simulation Results

To analyze the performance of the improved PSO for the estimation parameters, two
case studies have been taken.
Test system 1: The transfer function of the fourth-order plant (fourth-order IIR
filter) is given by

a0 + a1 z −1 + a2 z −2 + a3 z −3
H (z) = (9)
1 − b1 z −1 − b2 z −2 − b3 z −3 − b4 z −4

In Eq. (9), the actual coefficients of the unknown system are presented in first row
of Table 1. As stated in Sect. 3, initially the identification of the filter parameters
is checked using PSO with constant control parameters. At ω = 0.8, the minimum
value of the objective function is achieved. The estimated parameters are reported in
An Improved Particle Swarm Optimization-Based System … 141

Table 1 Estimated parameters of unknown test system 1


Case a1 a2 a3 b1 b2 b3 b4
Actual −0.9000 0.8100 −0.7290 −0.0400 −0.2775 0.2101 −0.1400
ω = 0.2 −0.8973 0.8068 −0.7285 −0.0422 −0.2773 0.2116 −0.1392
ω = 0.6 −0.8961 0.8058 −0.7284 −0.0435 −0.2776 0.2112 −0.1395
ω = 0.8 −0.8709 0.7677 −0.7189 −0.0620 −0.2680 0.2171 −0.1269
Proposed −0.8972 0.8066 −0.7283 −0.0422 −0.2777 0.2110 −0.1391

Table 2 Estimated parameters of unknown test system 2


Case a0 a1 a2 b1 b2 b3
Actual −0.2 −0.4 0.5 −0.6 −0.25 0.2
Proposed −0.1964 −0.4095 0.5052 −0.5928 −0.2418 0.1960

the second row of Table 1. At other constant values of the inertia weight parameter,
the optimal values achieved with PSO are also presented in Table 1. Among all cases,
the best solution is achieved when the ω = 0.2. However, with the proposed dynamic
natured ω, similar estimated values are achieved with worst initialization of control
parameters. The final objective function values at three different inertia weights 0.2,
0.6, and 0.8 are −56.51, −52.61, and −31.58 dB, respectively. However, −55.56 dB
is the function value for the proposed method where the initial value is started at 0.8
inertia weight.
Test system 2: The transfer function of the third-order plant is given by

a0 + a1 z −1 + a2 z −2
H (z) = (10)
1 − b1 z −1 − b2 z −2 − b3 z −3

In Eq. (10), the actual coefficients of the unknown system are presented in the first
row of Table 2. The improved PSO is applied to identify the unknown plant parameters,
and the final solution values reported in Table 2 are close to the actual values.
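The identification loop can be sketched in a few lines of Python. The following is an illustrative re-creation (not the authors' code) of PSO-based IIR system identification on the third-order plant of Eq. (10) with the Table 2 coefficients; the swarm size, iteration count, acceleration constants, velocity clamp, white-noise excitation, and linearly decreasing inertia schedule are all assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def iir_output(num, den, x):
    """Direct-form IIR: y[n] = sum_k num[k]*x[n-k] + sum_k den[k]*y[n-1-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n >= k)
        acc += sum(den[k] * y[n - 1 - k] for k in range(len(den)) if n >= k + 1)
        y[n] = acc
    return y

# Third-order plant of Eq. (10), coefficients from Table 2
num_true = np.array([-0.2, -0.4, 0.5])    # a0, a1, a2
den_true = np.array([-0.6, -0.25, 0.2])   # b1, b2, b3 (feedback)
x = rng.standard_normal(120)              # white-noise excitation (assumed)
d = iir_output(num_true, den_true, x)     # desired (plant) output

def mse(theta):
    return float(np.mean((d - iir_output(theta[:3], theta[3:], x)) ** 2))

# PSO with a linearly decreasing inertia weight (one possible "dynamic" schedule)
P, D, T = 20, 6, 40
pos = rng.uniform(-1.0, 1.0, (P, D))
vel = np.zeros((P, D))
pbest, pbest_val = pos.copy(), np.array([mse(p) for p in pos])
g = int(np.argmin(pbest_val))
gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
initial_val = gbest_val
for t in range(T):
    w = 0.9 - (0.9 - 0.4) * t / T                    # inertia weight schedule
    r1, r2 = rng.random((P, D)), rng.random((P, D))
    vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
    pos = pos + np.clip(vel, -0.5, 0.5)              # clamped step
    vals = np.array([mse(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    g = int(np.argmin(pbest_val))
    if pbest_val[g] < gbest_val:
        gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
```

With enough particles and iterations the estimated vector `gbest` approaches the Table 2 values; the constants above are illustrative choices, not those of the paper.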

5 Conclusions

In this paper, an improved PSO is applied for the estimation of unknown plant
parameters. This method avoids the additional burden of selecting the control
parameters of the conventional PSO and produces near-globally-optimal values
irrespective of the initialization process. The results for two higher-order models
show the advantages of the improved PSO in the system identification process.
142 P. Eswari et al.

References

1. Panda G, Pradhan PM, Majhi B (2011) IIR system identification using cat swarm optimization.
Expert Syst Appl 38(10):12671–12683
2. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2011) Filter modeling using gravitational search
algorithm. Eng Appl Artif Intell 24(1):117–122
3. Chicharo JF, Ng TS (1990) Gradient-based adaptive IIR notch filtering for frequency estimation.
IEEE Trans Acoust Speech Sig Process 38(5):769–777
4. Netto SL, Diniz PS, Agathoklis P (1995) Adaptive IIR filtering algorithms for system
identification: a general framework. IEEE Trans Educ 38(1):54–66
5. Took CC, Mandic DP (2010) Quaternion-valued stochastic gradient-based adaptive IIR
filtering. IEEE Trans Sig Process 58(7):3895–3901
6. Cho C, Gupta KC (1999) EM-ANN modeling of overlapping open-ends in multilayer microstrip
lines for design of bandpass filters. In: IEEE antennas and propagation society international
symposium 1999 Digest. Held in conjunction with: USNC/URSI National Radio Science
Meeting (Cat. No. 99CH37010), vol 4. IEEE, pp 2592–2595
7. Hartmann A, Lemos JM, Costa RS, Vinga S (2014) Identifying IIR filter coefficients using
particle swarm optimization with application to reconstruction of missing cardiovascular
signals. Eng Appl Artif Intell 34:193–198
8. Durmuş B, Gün A (2011) Parameter identification using particle swarm optimization.
In: Proceedings, 6th international advanced technologies symposium, (IATS 11), Elazığ,
Turkey, pp 188–192
9. Ma Q, Cowan CF (1996) Genetic algorithms applied to the adaptation of IIR filters. Sig Process
48(2):155–163
10. Karaboga N (2005) Digital IIR filter design using differential evolution algorithm. EURASIP
J Adv Sig Process 2005(8):856824
11. Singh R, Verma HK (2013) Teaching–learning-based optimization algorithm for parameter
identification in the design of IIR filters. J Inst Eng (India): Ser B 94(4):285–294
12. DeBrunner VE, Beex AA (1990) An informational approach to the convergence of output
error adaptive IIR filter structures. In: International conference on acoustics, speech, and signal
processing. IEEE, pp 1261–1264
13. Wang Y, Ding F (2017) Iterative estimation for a non-linear IIR filter with moving average
noise by means of the data filtering technique. IMA J Math Control Inf 34(3):745–764
14. Zhao R, Wang Y, Liu C, Hu P, Jelodar H, Yuan C, Li Y, Masood I, Rabbani M, Li H, Li B
(2019) Selfish herd optimization algorithm based on chaotic strategy for adaptive IIR system
identification problem. Soft Comput, 1–48
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-
international conference on neural networks, vol 4. IEEE, pp 1942–1948
16. Nagaraju TV, Prasad CD (2020) Swarm-assisted multiple linear regression models for
compression index (Cc) estimation of blended expansive clays. Arab J Geosci 13(9)
17. Prasad CD, Biswal M, Nayak PK (2019) Wavelet operated single index based fault detection
scheme for transmission line protection with swarm intelligent support. Energy Syst, 1–20
18. Nagaraju TV, Prasad CD, Raju MJ (2020) Prediction of California bearing ratio using particle
swarm optimization. In: Soft computing for problem solving. Springer, Singapore, pp 795–803
Channel Coverage Identification
Conditions for Massive MIMO
Millimeter Wave at 28 and 39 GHz Using
Fine K-Nearest Neighbor Machine
Learning Algorithm

Vankayala Chethan Prakash, G. Nagarajan, and N. Priyavarthan

Abstract A massive MIMO millimeter wave (mm-wave) system integrates various
technologies together with hundreds of antennas to support many devices
simultaneously. In mm-wave communication, the signal degrades due to atmospheric
absorption, and the pencil beam that is formed is liable to attenuate due to
obstacles present in the propagation paths. With such a huge bandwidth on offer, a
greater number of devices can be interconnected. However, in order to provide a
seamless connection of devices, the channel conditions need to be identified and
analyzed. From the channel analysis, a channel characterization is obtained for
classifying signal paths into Line of Sight (LoS) and Non-Line of Sight (NLoS). An
energy detector is used to perceive signals above 10 dB. These signals are analyzed
for channel conditions such as pathloss and power delay profile. In this work, an
independent identically distributed AWGN channel is considered. Based on this, a
dataset is constructed, and a machine learning algorithm, namely K-nearest neighbor
(K-NN), is applied for efficient channel characterization into LoS and NLoS. An
accuracy of 96.3 and 94.3% is obtained for pathloss, and an accuracy of 94.5 and
93.3% is obtained for power delay profile at 28 and 39 GHz, respectively.

Keywords LoS · Massive MIMO · mm-wave · NLoS · Pathloss and power delay
profile

1 Introduction

The technical aspects of massive MIMO have brought the integration of various
networks together, namely the fifth generation. Massive MIMO with mm-wave has
gained research importance in handling a large number of transceiving terminals
efficiently.

V. C. Prakash (B) · G. Nagarajan
Department of ECE, Pondicherry Engineering College, Puducherry, India
e-mail: [email protected]

N. Priyavarthan
Department of CSE, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2021 143
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_12

To support this greater number of devices, a huge amount of bandwidth is
required. For such high bandwidth, the system must be operated at higher
frequencies, such as mm-wave frequencies. However, signals experience heavy
distortions due to their shorter wavelength; when obstructed, they create multiple
paths and the quality of the signal degrades. For wireless sensor networks, the
Internet of Things, human-centric applications, and device-centric applications,
there is a demand for such higher bandwidth. In order to provide coverage to these
devices, the channel conditions with respect to the propagating environment are
analyzed.
Massive MIMO operates in TDD or FDD mode: in TDD mode, one of the uplink or
downlink channels is estimated and the other is obtained by Hermitian transpose,
whereas in FDD mode the uplink and downlink channels are estimated separately. The
obstacles present in the propagation path between the transmitter and the receiver
degrade the signal further with respect to the operating frequency of the system.
To enhance the signals intended for the users, device localization is of prime
importance. In general, it is widely known that localization happens through the
global positioning system (GPS).
Many techniques are in use based on distance, signal arrival, and geometric
relations. In some scenarios, these techniques are used in combination with GPS,
which makes the localization procedure more reliable. GPS and the above-said
techniques fail to provide accurate localization due to factors
present in the propagation environment. Due to the wide spread of devices, there is a
need to localize and provide signal toward the intended device. In 5G and beyond
5G networks, there is a need for massive coverage enhancement for various IoT
applications such as agriculture, medicine, where these IoT devices are employed in
a distributed manner both in urban and rural areas. For better quality of service, a
distributed massive MIMO system is a better solution [1]. A beam aligned framework
is proposed based on Bayesian decision where the coordination between the base
station and the user equipment is taken into account [2]. A distance-based localization
and mapping are proposed with extended Kalman filter irrespective of propagation
environment and position of base station [3]. A fingerprinting database positioning
is done with the received signal strength based on Gaussian regression. It reduces
complexity in analyzing the position of an individual user terminal in comparison with
range-based or angle-based techniques [4]. A two-step localization procedure is
followed where the angle of arrival (AOA) and triangulation is proposed for local-
ization. To reduce the complexity of localization compressed sensing is utilized for
the identification of LoS and NLoS conditions [5].
For a massive MIMO mm-wave system, a mixed analog–digital convertor is
designed to enhance the overall system performance [6]. Localization is carried
with angle of departure (AOD) and received signal strength (RSS) for each indi-
vidual user terminal at the base station for beamforming signals that uses orthogonal
frequency division multiplexing (OFDM) with reduced peak-to-average power ratio
(PAPR) [7]. Situational awareness in a massive MIMO mm-wave system for prop-
agation environment where a model is designed in which the statistical channel
conditions and the position with respect to clock offset are considered [8]. For high
data rate communication in a massive MIMO mm-wave system, a joint position
estimation and orientation is performed for 5G networks [9]. An enhanced coop-
erative group localization is performed for coverage connectivity where the RSS
and AOA which increases the localization accuracy for both massive MIMO and
device 2 device networks [10]. A hybrid RSS-AOA technique with uniform cylin-
drical arrangement of antennas in massive MIMO mm-wave system is considered for
localization. A channel compression method is proposed where the received signal
vector dimensions are reduced by maintaining the same accuracy [11].
A massive MIMO system with uniform cylindrical array is considered where the
received signal vectors are transformed into beam space vector where a linear relation
is maintained based on the direction of arrival with low complexity search algorithm
[12]. A sparsity-based error correcting localization, i.e., on the residual update by
generalized orthogonal matching pursuit algorithm which reduces complexity in
large-scale arrays [13]. A fingerprint localization for a single site massive MIMO-
OFDM system where an angle delay channel matrix is obtained from the instan-
taneous channel condition, with clustering algorithm the complexity is reduced
[14].
An angle delay Doppler power spectrum is extracted from the channel state infor-
mation, a fingerprint is built, and it is compared with pre-collected reference points. This
fingerprinting database is used with a distance-based kernel method thus increases
the localization accuracy [15]. In a massive MIMO, LTE user channel access the
location information is obtained with the synchronization when connected to the
network. In order to improve the allocation of radio resource to the user, the location
information is used for beamforming that is designed for both LoS and multi-path
environments [16].
A mobile cloud computation and massive MIMO are integrated together in 5G
networks. Direction of arrival (DOA) technique is used for localization of real-time
monitoring of patients [17]. A multiple hypothesis testing for power delay profile is
performed where the maximum likelihood and Bayesian estimation are proposed for
the identification of NLoS from LoS conditions [18]. Propagation environment with
obstacles is examined for pathloss, power delay profile, RMS delay spread, and mean
excess delay in a massive MIMO mm-wave system [19]. A hybrid RSS-TOA-based
localization is proposed for a massive MIMO mm-wave frequency for 32 and 64
antennas at the base station with 4 and 8 user terminals. An energy detector is utilized
for channel densification process which reduces channel complexity [20]. The chan-
nels are studied for mm-wave frequencies at 4.5, 28, and 38 GHz, the pathloss
is studied for vertical–vertical, vertical-horizontal, and vertical-omni polarizations
in an indoor environment [21]. An IoT-based healthcare system in an indoor and
outdoor is considered, where a design methodology is proposed with accelerometer
and magnetometer for localization of patient with identification of patient activities
such as walking, standing, sleeping are monitored and transmitted to the concerned
staff [22]. A home remote monitoring system is designed for patients in IoT environ-
ments, the protocol conversion to the ISO/IEEE 11073 standard, an M2M protocol, and
a scheduling algorithm is proposed for medical data transmission to staff at hospitals.
In spite of data transmission, secure storage and authorization of data are also taken

into account [23]. In a dense forest environment, the future wireless sensor networks
and IoT devices are deployed and examined for pathloss at 2.4 GHz. Two scenarios
were considered for simulation and measurements, namely (a) free space zone and
(b) diffraction zone, where the delay spread values are also presented [24].
In a 5G network, mm-wave frequencies with IoT devices are considered for higher
bandwidth. Pathloss is analyzed for 38 GHz in an outdoor environment for charac-
terization of LoS and NLoS for antenna polarizations, namely vertical–vertical and
vertical-horizontal. Parameters such as cell throughput, edge throughput, spectral
efficiency, and fairness index are evaluated [25]. IoT in industrial applications needs a wide range
of bandwidth with seamless connectivity, where a need for localization is a must.
Normally, narrow band IoT is used for industrial and healthcare applications, and
GPS fails to localize for such a low power IoT device. Based on the distance, an
analytical model is designed for geometric probabilistic analysis [26]. Indoor posi-
tioning system for an IoT environment is considered, and a wi-fi trilateration method
is proposed for position of users with respect to the reference points [27]. In IoT,
localization of devices has gained importance for quality of experience.
A localization technique is designed as part of the Butler project under the EU
FP-7 programme, one of the most notable EU projects [28]. The feasibility
of massive MIMO in industrial IoT is analyzed by placing massive antennas in
datacenter for seamless connectivity with large number of devices [29]. Massive
MIMO with IoT is analyzed for connectivity under two generic schemes, namely
massive machine-type communication and ultra-reliable low latency communication.
For physical layer technologies between massive MIMO and IoT, a strong integration
is needed in terms of protocol design [30].

2 Network Architecture

A distributed massive MIMO mm-wave system is considered for identification of the
coverage area based on the received signal at the user equipment. Figure 1 shows the
architecture of distributed massive MIMO. The radio towers are deployed with
massive antennas to provide connectivity to number of user equipment. For such
number of devices, a huge bandwidth is required. For seamless connectivity, the
propagating channel conditions are analyzed for LoS and NLoS.

3 Simulation Methodology

In 5G communication networks such as mobile networks, wireless sensor networks,
cognitive radio networks, and device-to-device communication, there is a need to study
on the propagation environment. For a next generation of wireless communication, a
greater number of devices are to be connected together. Most of the literature
reveals that massive MIMO mm-wave cellular communication is operated at 28 and
39 GHz. To extend support for such a huge number of devices, massive MIMO with
mm-wave frequencies is considered as a use case.

Fig. 1 Distributed massive MIMO connectivity
A distributed massive MIMO system with 128 transmitting antennas and 4 receiving
antennas operating at 28 and 39 GHz is considered. As multiple copies of signals
arrive at the receiver, an energy detector is utilized, and signals above 10 dB are
allowed. Based on these channels, measurements are made on the uplink channel.
With the principle of channel reciprocity, the downlink channel measurements are
obtained. Parameters such as pathloss and power delay profile are extracted for
LoS and NLoS scenarios. A dataset of 1000 samples is constructed from simulations
for both parameters. A fine K-NN algorithm is trained, and tenfold cross-validation
is performed on the dataset: the full dataset is divided into ten parts, where nine
parts are used for training and one part is used for testing.
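The tenfold partitioning described above can be sketched as follows (an illustrative Python sketch only; the paper's experiments were run in MATLAB 2019a):

```python
import numpy as np

n_samples, n_folds = 1000, 10
idx = np.random.default_rng(1).permutation(n_samples)  # shuffled sample indices
folds = np.array_split(idx, n_folds)                   # ten disjoint parts of 100

splits = []
for i in range(n_folds):
    test_idx = folds[i]                                              # one part for testing
    train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])  # nine for training
    splits.append((train_idx, test_idx))
```

Each of the ten splits uses 900 samples for training and 100 for testing, and every sample appears in exactly one test fold.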

4 Simulation Measurements

An independent identically distributed AWGN channel is considered, where the
uplink channel at the base station is analyzed for pathloss and power delay profile at
28 and 39 GHz. As the channel operates in TDD mode, the uplink and the downlink
channels can be estimated simultaneously via the Hermitian transpose. The channel
at the uplink is estimated in the frequency domain [31]:

$$Y(f) = H(f) \cdot x(f) + d(f) \quad (1)$$

where
d(f) is the IID noise vector at the receiver with zero mean and unit variance,
x(f) is the signal transmitted from the RRH antennas to the user equipment, and
H(f) represents the channel frequency response.

Table 1 Simulation parameters

Entities                        Remarks
Simulation tool                 MATLAB 2019a
Frequency                       28, 39 GHz
No. of transmitting antennas    128
No. of receiving antennas       4
Channel                         Indoor, urban
Environment                     AWGN
Operating mode                  TDD
With channel reciprocity, the downlink channel is obtained as the Hermitian
transpose of the uplink channel, given as

$$Y(f) \propto H^{H}(f) \cdot x(f) + d(f) \quad (2)$$

With the inverse fast Fourier transform, the channel frequency response is converted
into the channel impulse response (the simulation parameters are listed in Table 1):

$$h(t) = \mathrm{IFFT}\,(H(f)) \quad (3)$$
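Equations (1)-(3) can be exercised on a small synthetic channel; the array sizes and the i.i.d. complex Gaussian entries below are assumptions of this sketch, reduced from the paper's 128 x 4 configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_rx, n_tx = 64, 4, 8   # subcarriers, receive and transmit antennas (illustrative)

# Uplink channel frequency response H(f): one n_rx x n_tx matrix per subcarrier
H_ul = (rng.standard_normal((n_sub, n_rx, n_tx))
        + 1j * rng.standard_normal((n_sub, n_rx, n_tx))) / np.sqrt(2)

# Received uplink signal per Eq. (1): Y(f) = H(f) x(f) + d(f)
x = rng.standard_normal((n_sub, n_tx)) + 1j * rng.standard_normal((n_sub, n_tx))
d = rng.standard_normal((n_sub, n_rx)) + 1j * rng.standard_normal((n_sub, n_rx))
Y = np.einsum('frt,ft->fr', H_ul, x) + d

# TDD reciprocity, Eq. (2): the downlink response is the Hermitian transpose
H_dl = np.conj(np.transpose(H_ul, (0, 2, 1)))

# Eq. (3): channel impulse response via IFFT across the frequency axis
h = np.fft.ifft(H_ul, axis=0)
```

Applying the forward FFT to `h` recovers `H_ul`, which is the consistency check behind Eq. (3).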

5 Pathloss

Pathloss is defined as the ratio of transmitted power to received power (in dB, the
difference between them). The received signal power varies with distance: as the
distance increases, the received signal power reduces, degrading the signal. In
comparison with LoS, NLoS suffers greater degradation due to the presence of
obstacles. Figure 2 shows the pathloss in the LoS environment at 28 and 39 GHz, and
Fig. 3 shows the pathloss in the NLoS environment at the same frequencies. From the
figures, it is clear that as the operating frequency increases, the pathloss also
increases.
The received power at the antenna is given by

$$P_R = \frac{P_T}{4\pi D^{\beta}}\, A_r \quad (4)$$
where
P_T is the transmitting power,
D is the distance between the transmitter and receiver,
β is the pathloss exponent, and
A_r is the aperture area of the receiver.

Fig. 2 Pathloss at LoS environment

Fig. 3 Pathloss at NLoS environment
The aperture area of the receiver is given by
$$A_r = \frac{\alpha^{2}}{4\pi}\, G_r \quad (5)$$

where $G_r$ is the receiver gain and

$$\alpha = \frac{c}{f} \quad (6)$$

The gains of both the transmitter and receiver antennas are taken as one, since
isotropic antennas are assumed. The pathloss incurred between the transmitter and
receiver with respect to distance is given by

$$PL = P_T - P_R \quad (7)$$

$$PL = 20 \log\left(\frac{4\pi f}{c}\right) \quad (8)$$
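A quick numerical check of Eq. (8), with the distance term implied by Eq. (4) added explicitly; the 10 m distance and β = 2 below are illustrative assumptions, not the paper's measurement setup:

```python
import math

C = 3.0e8  # speed of light, m/s

def pathloss_db(f_hz, d_m, beta=2.0):
    # Eq. (8) frequency term plus the 10*beta*log10(D) distance term from Eq. (4)
    return 20 * math.log10(4 * math.pi * f_hz / C) + 10 * beta * math.log10(d_m)

pl_28 = pathloss_db(28e9, 10.0)   # pathloss at 28 GHz, 10 m
pl_39 = pathloss_db(39e9, 10.0)   # pathloss at 39 GHz, 10 m
```

As the text observes from Figs. 2 and 3, the higher operating frequency yields the higher pathloss; the gap here is exactly 20 log10(39/28) ≈ 2.9 dB.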

6 Power Delay Profile

The power delay profile gives the average received signal power with respect to
time delay. Figure 4 represents the power delay profile, where the time-varying
channels are examined. From the figure, it is clearly visible that the channel
operating at 39 GHz experiences more distortion, which appears higher in NLoS than
in LoS conditions in comparison with 28 GHz. The most negative received signal
power values are evident in the NLoS condition.
The power delay profile is given by

$$\mathrm{PDP}(t) = |h(t)|^{2} \quad (9)$$

Fig. 4 Power delay profile
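Equation (9) applied to a synthetic impulse response (the 64-tap random channel below is an illustrative assumption, not measured data):

```python
import numpy as np

rng = np.random.default_rng(3)
H = (rng.standard_normal(64) + 1j * rng.standard_normal(64)) / np.sqrt(2)  # H(f), 64 subcarriers
h = np.fft.ifft(H)                    # Eq. (3): channel impulse response
pdp = np.abs(h) ** 2                  # Eq. (9): power delay profile
pdp_db = 10 * np.log10(pdp + 1e-30)   # usually inspected in dB versus delay, as in Fig. 4
```

The PDP is nonnegative by construction, and its total power equals the mean power of the frequency response (Parseval's relation with numpy's 1/N IFFT normalization).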

7 Fine-KNN

Fine K-nearest neighbor is a nonparametric algorithm in which the entire training
dataset is retained and used at classification time. A test point is classified by
computing the Euclidean distance between it and every training point, sorting these
distances in ascending order, and selecting the nearest training points. The class
that occurs most frequently among these neighbors is assigned to the test point.
In this way, the classification into LoS and NLoS is performed.
The distance between data points $A = [a_1, a_2, \ldots, a_n]$ and $B = [b_1, b_2, \ldots, b_n]$ is
represented as

$$d(A, B) = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + \cdots + (b_n - a_n)^2} \quad (10)$$
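A minimal fine K-NN (k = 1) built directly on Eq. (10) can be run on a synthetic two-class stand-in for the pathloss dataset; the Gaussian clusters below are assumptions for illustration, not the measured data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in: LoS (class 1) with lower pathloss-like features than NLoS (class 0)
X = np.vstack([rng.normal(75.0, 3.0, (200, 2)),    # LoS samples
               rng.normal(95.0, 3.0, (200, 2))])   # NLoS samples
y = np.array([1] * 200 + [0] * 200)

def knn_predict(X_tr, y_tr, X_te, k=1):
    preds = []
    for p in X_te:
        dist = np.sqrt(((X_tr - p) ** 2).sum(axis=1))   # Euclidean distance, Eq. (10)
        nearest = y_tr[np.argsort(dist)[:k]]            # k smallest distances
        preds.append(np.bincount(nearest).argmax())     # majority vote among neighbors
    return np.array(preds)

# Simple hold-out split standing in for the paper's tenfold cross-validation
order = rng.permutation(len(X))
tr, te = order[:320], order[320:]
acc = float((knn_predict(X[tr], y[tr], X[te], k=1) == y[te]).mean())
```

On well-separated classes like these, k = 1 ("fine" K-NN) achieves near-perfect accuracy; the measured pathloss and PDP features of the paper overlap more, giving the 93-96% range reported.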

The scatterplot represents the relationship of the variables in the datasets.
Figures 5 and 6 show the classification of datapoints into LoS and NLoS at 28 and
39 GHz, respectively. The red data points represent the LoS condition, and the blue
data points represent the NLoS condition; the crossed data points mark misclassified
LoS and NLoS samples. Both figures exhibit a positive correlation, with the x-axis
and y-axis values increasing together linearly.
The confusion matrix shows the accuracy of classification in terms of the true and
predicted values, and thus reflects the accuracy of the machine learning algorithm.
Figures 7 and 8 denote the true positive, false positive, false negative, and true
negative values of pathloss at 28 and 39 GHz. These values are displayed based on
the conditions between the observations and the predictions for classes 1 and 0.

Fig. 5 Scatterplot of pathloss at 28 GHz

Fig. 6 Scatterplot of pathloss at 39 GHz

From the dataset of 1000 samples, true positive value is 530, false positive value
is 21, false negative value is 16, and the true negative value is 433. The true positive
rate for class 0 is 0.96 and for class 1 is 0.96, whereas the false negative rate is 0.04
for both the classes, respectively. The positive predictive value of class 1 and 0 is 0.95
and 0.97, and the false discovery rate of class 1 and 0 is 0.05 and 0.03.

Fig. 7 Confusion matrix of pathloss at 28 GHz

Fig. 8 Confusion matrix of pathloss at 39 GHz

For 39 GHz, the true positive value is 521, false positive value is 29, false negative value is 22, and
true negative value is 428. The true positive rate for class 1 and 0 is 0.95 and false
negative rate for both classes 1 and 0 is 0.05. The positive predictive value of class 1
and 0 is 0.94 and 0.96, and the false discovery rate of class 1 and 0 is 0.06 and 0.04.
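The reported confusion-matrix entries tie directly back to the headline accuracy; using the 28 GHz pathloss values quoted above:

```python
def rates(tp, fp, fn, tn):
    tpr = tp / (tp + fn)   # true positive rate (sensitivity)
    fpr = fp / (fp + tn)   # false positive rate
    ppv = tp / (tp + fp)   # positive predictive value
    return tpr, fpr, ppv

tp, fp, fn, tn = 530, 21, 16, 433        # pathloss at 28 GHz, as quoted above
acc = (tp + tn) / (tp + fp + fn + tn)    # -> 0.963, the 96.3% reported in the abstract
tpr, fpr, ppv = rates(tp, fp, fn, tn)
```

The same arithmetic applied to the PDP entries reported below (518, 32, 23, 427) reproduces the 94.5% figure for 28 GHz.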
Figures 9 and 11 show the receiver operating characteristics of pathloss at 28 and
39 GHz for positive class 1.

Fig. 9 ROC of pathloss at 28 GHz

The performance of the classifier is determined with the ROC curve and is
represented with a red point on the curve. The accuracy of the
classifier is denoted by the area under the curve. As the area under the curve increases,
the accuracy of the classifier also increases. Figure 9 shows the curve between true
positive rate and false positive rate where TPR is 0.96 and the FPR is 0.04. From
Fig. 11, the curve depicts a TPR of 0.95 and an FPR of 0.05. Figures 10 and 12 show
the ROC curves of pathloss at 28 and 39 GHz for positive class 0. From Fig. 10, the
TPR is 0.96 and the FPR is 0.04, and from Fig. 12, the TPR is 0.95 and the FPR is
0.05.
Scatterplot represents the relationship between the variables and its correlation.
Figures 13 and 14 show the scatterplot for PDP dataset of classes 1 and 0, i.e., LoS
and NLoS conditions. The red data points depict the LoS condition, and the blue
data points depict the NLoS conditions. Misclassification such as LoS into NLoS
conditions and vice versa is marked with red- and blue-colored cross-markings.
However, the scatterplot shows no correlation between the variables in the datasets.
The confusion matrix explains the prediction accuracy of the machine learning
algorithms. It displays the values between the observations and the predictions, that
are depicted in Figs. 15 and 16. It provides the true positive, false positive, false
negative, and true negative values for PDP datasets operating at 28 and 39 GHz.
From the dataset of 1000 samples, the true positive value is 518, false positive
value is 32, false negative value is 23, and true negative value is 427 for the
dataset operating at 28 GHz.

Fig. 10 ROC of pathloss at 39 GHz

The true positive rate for class 1 and 0 is 0.95 and 0.94, and
the false negative rate is 0.05 and 0.06. The false discovery rate for class 1 and 0
is 0.07 and 0.04, and the positive predictive value for class 1 and 0 is 0.93 and 0.96.
For 39 GHz, the true positive value is 515, the false positive value is 35, the false
negative value is 32, and the true negative value is 418; the true positive rate for
class 1 and 0 is 0.93 and 0.94, and the false negative rate is 0.07 and 0.06. The
positive predictive value is 0.93 and 0.96 for class 1 and 0, whereas the false
discovery rate is 0.07 and 0.04, respectively.
Figures 17 and 18 show the ROC of PDP (positive class 1) at 28 and 39 GHz,
respectively. The performance of the classifier is analyzed with the ROC curves,
denoted with red points in the graphs. The area under the curve indicates the
accuracy of the algorithm: the higher the area under the curve, the higher the
accuracy. The graph is plotted for true positive rate and false positive rate, where
the true positive rate is 0.95 and the false positive rate is 0.06 at 28 GHz. For
39 GHz, the true positive rate is 0.94 and the false positive rate is 0.05.
Figures 19 and 20 show the ROC of PDP at 28 and 39 GHz (positive class 0). They
represent the prediction accuracy, indicated by the red dot and the area under the
curve; the larger the area under the curve, the higher the accuracy of the algorithm.
The curve is plotted for true positive rate and false positive rate, where the TPR is
0.94 and the FPR is 0.05 for 28 GHz, and for 39 GHz the TPR is 0.93 and the FPR
is 0.07.

Fig. 11 ROC of pathloss at 28 GHz

8 Conclusion

In an indoor massive MIMO system operating at mm-wave frequencies, signals
experience greater degradation with operating frequency and obstacles present in the
propagation environment. The channel conditions are analyzed for LoS and NLoS
environment with an energy detector, and a dataset with 1000 samples is built. A
machine learning algorithm, namely K-NN, is used for classification of LoS and
NLoS conditions. A tenfold cross-validation is performed where the accuracy of the
classification is analyzed with the testing and training data. An accuracy of about 96.3
and 94.3% is obtained for pathloss, and an accuracy of 94.5 and 93.3% is obtained
for power delay profile at 28 and 39 GHz, respectively.

Fig. 12 ROC of pathloss at 39 GHz

Fig. 13 Scatterplot of PDP at 28 GHz



Fig. 14 Scatterplot of PDP at 39 GHz

Fig. 15 Confusion matrix of PDP at 28 GHz

Fig. 16 Confusion matrix of PDP at 39 GHz

Fig. 17 ROC of PDP at 28 GHz



Fig. 18 ROC of PDP at 39 GHz



Fig. 19 ROC of PDP at 28 GHz

Fig. 20 ROC of PDP at 39 GHz



References

1. Chen X, Kwan Ng DW, Yu W, Larsson EG, Al Dhahir N, Schober R (2020) Massive access
for 5G and beyond. arXiv preprint arXiv:2002.03491, pp 1–21
2. Maschietti F, Gesbert D, de Kerret P, Wymeersch H (2017) Robust location-aided beam
alignment in millimeter wave massive MIMO. In: IEEE Global Communications Conference
3. Li X, Leitinger E, Oskarsson M, Astrom K, Tufvesson F (2019) Massive MIMO based localiza-
tion and mapping exploiting phase information of multipath components. IEEE Trans Wireless
Commun 18(9):4254–4267
4. Savic V, Larsson EG (2015) Fingerprinting based positioning in distributed massive MIMO
systems. In: IEEE 82nd vehicular technology conference
5. Garcia N, Wymeersch H, Larsson EG, Haimovich AM, Coulon M (2017) Direct localization
for massive MIMO. IEEE Trans Signal Process 65(10):2475–2487
6. Zhang J, Dai L, Li X, Liu Y, Hanzo L (2018) On low resolution ADCs in practical 5G
millimeter-wave massive MIMO systems. IEEE Commun Mag 56(7):205–211
7. Mahyiddin WA, Mazuki ALA, Dimyati K, Othman M, Mokhtar N, Arof H (2019) Localization
using joint AOD and RSS method in massive MIMO system. Radioengineering 28(4):749–756
8. Mendrzik R, Meyer F, Bauch G, Win MZ (2019) Enabling situational awareness in millimeter
wave massive MIMO systems. IEEE J Sel Top Signal Process 13(5):1196–1211
9. Shahmansoori A, Garcia GE, Destino G, Grandos G, Wymeersch H (2015) 5G position and
orientation estimation through millimeter wave MIMO. IEEE Globecom Workshops
10. Leila G, Najjar L (2020) Enhanced cooperative group localization with identification of
LOS/NLOS BSs in 5G dense networks. Ad Hoc Netw 88–96
11. Lin Z, Lv T, Mathiopoulos PT (2018) 3-D indoor positioning for millimeter-wave massive
MIMO systems. IEEE Trans Commun 66(6):2472–2486
12. Lv T, Tan F, Gao H, Yang S (2016) A beamspace approach for 2-D localization of incoherently
distributed sources in massive MIMO systems. Signal Process 30–45
13. Abhishek, Sah AK, Chaturvedi AK (2016) Improved sparsity behaviour and error localization
in detectors for large MIMO systems. IEEE Globecom Workshops
14. Sun X, Gao X, Ye Li G, Han W (2018) Single-site localization based on a new type of fingerprint
for massive MIMO-OFDM systems. IEEE Trans Veh Techn 67(7), 6134–6145
15. Zhang X, Zhu H, Luo X (2018) MIDAR: massive MIMO based detection and ranging. In:
IEEE Global Communication Conference
16. Fedorov A, Zhang H, Chen Y (2018) User localization using random access channel signals
in LTE networks with massive MIMO. In: IEEE 27th International Conference on Computer
Communication and Networks (ICCCN)
17. Wan L, Han G, Shu L, Feng N (2018) The critical patients localization algorithm using sparse
representation for mixed signals in emergency healthcare system. IEEE Syst J 12(1):52–63
18. Prakash VC, Nagarajan G, Ramanathan P (2019) Indoor channel characterization with multiple
hypothesis testing in massive multiple input multiple output. J Comput Theor Nanosci
16(4):1275–1279
19. Prakash VC, Nagarajan G, Batmavady S (2019) Channel analysis for an indoor massive MIMO
mm-wave system. In: International conference on artificial intelligence, smart grid and smart
city applications
20. Prakash VC, Nagarajan G (2019) A hybrid RSS-TOA based localization for distributed indoor
massive MIMO systems. In: International conference on emerging current trends in computing
and expert technology. Springer, Berlin
21. Majed MB, Rahman TA, Aziz OA, Hindia MN, Hanafi E (2018) Channel characterization and
path loss modeling in indoor environment at 4.5, 28 and 38 GHz for 5G cellular networks. Int
J Antennas Propag Hindawi 1–14
22. Dziak D, Jachimczyk B, Kulesza WJ (2017) IoT-based information system for healthcare
application: design methodology approach. Appl Sci MDPI 7(6):596
23. Park K, Park J, Lee JW (2017) An IoT system for remote monitoring of patients at home. Appl
Sci MDPI 7(3):260
Channel Coverage Identification Conditions for Massive MIMO Millimeter … 163

24. Iturri P, Aguirre E, Echarri M, Azpilicueta L, Eguizabal A, Falcone F, Alejos A (2019) Radio
channel characterization in dense forest environments for IoT-5G. Proceedings, MDPI 4(1)
25. Qamar F, Hindia MHDN, Dimyati K, Noordin KA, Majed MB, Rahman TA, Amiri IS (2019)
Investigation of future 5G-IoT Millimeter-wave network performance at 38 GHz for urban
microcell outdoor environment. Electronics, MDPI 8(5):495
26. Tong F, Sun Y, He S (2019) On positioning performance for the narrow-band internet of things:
how participating eNBs impact? IEEE Trans Ind Inf 15(1):423–433
27. Rusli ME, Ali M, Jamil N, Md Din M (2016) An improved indoor positioning algorithm based
on RSSI-trilateration technique for internet of things. In: IOT, International conference on
computer and communication engineering (ICCCE)
28. Macagnano D, Destino G, Abreu G (2014) Indoor positioning: a key enabling technology for
IoT applications. IEEE World Forum on Internet of Things
29. Lee BM, Yang H (2017) Massive MIMO for industrial internet of things in cyber-physical
systems. IEEE Trans Ind Inf 14(6):2641–2652
30. Bana A-S, Carvalho ED, Soret B, Abrao T, Marinello JC, Larsson EG, Popovski P (2019)
Massive MIMO for Internet of Things (IoT) connectivity. Phys Commun 1–17
31. Li J, Ai B, He R, Wang Q, Yang M, Zhang B, Guan K, He D, Zhong Z., Zhou T, Li N (2017)
Indoor massive multiple-input multiple-output channel characterization and performance
evaluation. Front Inf Technol Electr Eng 18(6):773–787
Flip Flop Neural Networks: Modelling Memory for Efficient Forecasting

S. Sujith Kumar, C. Vigneswaran, and V. Srinivasa Chakravarthy
Abstract Flip flop circuits can memorize information with the help of their bi-stable
dynamics. Inspired by the flip flop circuits used in digital electronics, in this
work we define a flip flop neuron and construct a neural network endowed with
memory. Flip flop neural networks (FFNNs) function like recurrent neural networks
(RNNs) and are therefore capable of processing temporal information. To validate
the competency of FFNNs on sequential processing, we solve benchmark time series
prediction and classification problems from different domains. Three datasets are used
for time series prediction: (1) household power consumption, (2) flight passenger
prediction and (3) stock price prediction. As an instance of time series classification,
we select the indoor movement classification problem. The FFNN performance
is compared with RNNs consisting of long short-term memory (LSTM) units. In
all the problems, the FFNNs show either superior or nearly equal performance
compared to LSTM. Flip flops can also potentially be used for harder sequential
problems, like action recognition and video understanding.

Keywords Flip flops · LSTM · Memory

1 Introduction

Efficient prediction and forecasting of time series data involve capturing patterns
in the history of the data. Feed-forward networks process data in a single instance
and therefore cannot solve time series prediction problems unless data history is

S. Sujith Kumar · V. Srinivasa Chakravarthy (B)
Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of
Technology Madras, Chennai 600036, India
e-mail: [email protected]

S. Sujith Kumar
e-mail: [email protected]

C. Vigneswaran
School of Computing, SASTRA Deemed University, Thanjavur, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_13

explicitly presented to the network through tapped delay lines and other techniques
for representing temporal features. Alternatively, a neural network with loops can
recognize patterns in the data, comprehend information over multiple time steps
and process temporal data by virtue of its memory property [1–3]. Flip flops are
basic electronic circuits with a memory property. Based on their input conditions,
they can hold on to information through time or simply allow it to pass through [4].
In this paper, we show that by using neuron models that emulate electronic flip flops
it is possible to construct neural networks with excellent temporal processing
properties. It will be demonstrated that such networks show high levels of performance
in prediction and classification of time series. This paper describes an implementation
of flip flop neural networks for solving benchmark sequential problems. It also
presents a brief comparison of the results with LSTM-based models.

2 Previous Work

Holla and Chakravarthy [5] have described a deep neural network consisting of a
hidden layer of flip flop neurons. The network compared favourably with LSTM
and other RNN models on long-delay decision-making problems. In this paper, we
use a variation of the flip flop neural network described in [5] and apply it to problems
pertaining to prediction and classification of time series data. The flip flop
model is compared with the popular RNN variant, long short-term memory (LSTM),
and the observations of the comparative study are described in Sect. 4.

2.1 Long Short-Term Memory (LSTM)

Long short-term memory (LSTM) is a popular and dominant variant of RNNs and
one of the most widely employed memory-based units [1, 2]. LSTMs are commonly
used for sequential and time series-based problems. They have the ability to retain
information that is highly discriminative in the final decision-making process,
and also the ability to forget or discard information that contributes less to
the performance of the model. The LSTM operates through a gating mechanism,
introduced predominantly to overcome the issues of catastrophic forgetting and
vanishing gradients associated with long-term memory. Essentially, the task of
the gates is to purge information that would only serve as noise to the model and
to utilize information that proves to be crucial. This mechanism of remembering
and forgetting, achieved through the gating mechanism, is implemented by training
the gating parameters over the set of input features. The input gate of the LSTM
decides what new patterns of data are preserved in the long-term memory: it filters
a combination of the current input and the short-term memory and transmits it to
the downstream structures. The forget gate of the LSTM decides which patterns from
the long-term memory are preserved and which are discarded, by multiplying the
long-term memory with forget vectors obtained from the current input.
The output gate produces the short-term memory, which is used by the next LSTM
cell as memory from the previous time step.
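The gating mechanism described above corresponds to the standard LSTM cell equations, sketched below in NumPy. The weight packing, shapes and initialization are our own illustrative choices, not the configuration used in the paper's experiments.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4*H, D+H), b has shape (4*H,),
    packing the input, forget and output gates plus the candidate cell."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])        # input gate: what new information to store
    f = sigmoid(z[H:2*H])      # forget gate: what long-term memory to keep
    o = sigmoid(z[2*H:3*H])    # output gate: what to expose as short-term memory
    g = np.tanh(z[3*H:4*H])    # candidate values for the cell state
    c = f * c_prev + i * g     # updated long-term memory (cell state)
    h = o * np.tanh(c)         # updated short-term memory (cell output)
    return h, c

# Tiny usage example with random parameters
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The forget gate multiplies the old cell state elementwise, and the input gate scales the candidate values, exactly as described in the prose above.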

3 Model Architecture

Flip flops are electrical latch circuits that can store information related to the
previous time steps. They have two stable states: one that stores the information of
the previous time steps and the other that clears the state. The current state of a flip
flop depends on the input states and the previous state. SR, JK, D and T flip
flops are the types of flip flops widely used in the field of digital electronics.
We work with the SR flip flop in our simulation experiments as it is the
simplest implementation of the bi-stable latch. The JK flip flop is a more generalized
version of the SR flip flop with the ability to avoid the undefined state when both
inputs are high. The SR flip flop is a bi-stable latch circuit that consists of two
competing inputs, S and R, to SET and RESET, respectively. The output of the circuit
at the current time step (Qt) has a value of 0 or 1 depending on the states of the S
and R inputs. A feedback mechanism helps to model memory in this circuitry. Thus, SET
(S), RESET (R) and the output of the previous time step (Qt−1) are given as input
to the flip flop at a particular time step. Table 1 shows the truth table of the simple
bi-stable SR flip flop.
From Table 1, we can see that the inputs (S and R) are crucial in
determining the state at the current time step (Qt). The equivalent algebraic equation
of the SR flip flop is given below:

Qt = S + R′ · Qt−1 (1)

Since the current output depends on the last state, the SR flip flop has the memory
property. The complete architecture of a flip flop-based neural network is given in
Fig. 1.
The network depicted in Fig. 1 has five layers, of which the third is the
flip flop layer, which plays the role of memory. The input to the flip flop layer from the
previous layer is divided in half to obtain the set and reset inputs of the flip flops. The output

Table 1 SR flip flop truth table realization

Set (S)  Reset (R)  Feedback (Qt−1)  Output (Qt)
1        0          X                1
0        1          X                0
0        0          Qt−1             Qt−1
1        1          X                Undefined
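Table 1 and Eq. (1) can be reproduced with a few lines of code; returning `None` for the forbidden S = R = 1 input is our own convention, not part of the original circuit definition.

```python
def sr_flip_flop(s, r, q_prev):
    """Bi-stable SR latch: Qt = S + R'.Qt-1 (Eq. 1); S = R = 1 is undefined."""
    if s == 1 and r == 1:
        return None                       # forbidden input combination
    return s or ((1 - r) and q_prev)      # SET dominates; else hold/clear

# Reproduce the rows of Table 1
for s, r, q_prev in [(1, 0, 0), (0, 1, 1), (0, 0, 1), (0, 0, 0)]:
    print(s, r, q_prev, "->", sr_flip_flop(s, r, q_prev))
```

With S = R = 0 the latch simply holds its previous value, which is the feedback row of Table 1.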

Fig. 1 Flip flop neural network consisting of five layers; the flip flop layer consists of five flip
flops

at the previous time step, Qt−1, is fed back as input to the flip flops to obtain Qt, the
output at the next step. The final step is the propagation of the output Qt through the
following linear layers up to the last output layer.
The forward propagation of the layers involving conventional neurons is given as

Z = W · X + b (2)

where W denotes the weights of the network initialized via Xavier initialization, X
is the input data and b is the bias term.
The output obtained after application of the activation function is given as

A = tanh(Z) (3)

Let Ni be the total number of nodes preceding the flip flops; the set (XS) and reset
(XR) inputs are obtained by

XS = X[Ni mod 2 == 0] (even neurons) (4)

XR = X[Ni mod 2 == 1] (odd neurons) (5)

where X[k] is the output of the kth neuron (zero indexed) from the previous layer
and mod denotes the modulus function giving the remainder.
The weights projecting from the previous layer to the flip flops are modelled as
one-to-one connections, so that the dimension of the previous layer is twice the
number of flip flops:

WS = W[Ni mod 2 == 0] (6)

WR = W[Ni mod 2 == 1] (7)

Thus, the weighted inputs S and R are given by

S = XS · WS (8)

R = XR · WR (9)

The final state V(t + 1) of the flip flop layer at time step t + 1 is given by the equation

V(t + 1) = S + (1 − R) ∗ V(t) − S ∗ (1 − R) ∗ V(t) (10)

where V(t) is the previous state of the flip flop layer. Backpropagation through the
flip flop layer is defined as

∂E/∂wS = (∂E/∂OFF) · (∂OFF/∂S) · (∂S/∂wS) (11)

∂E/∂wR = (∂E/∂OFF) · (∂OFF/∂R) · (∂R/∂wR) (12)

Thus, the partial derivatives are given by

∂OFF/∂S = 1 − (1 − R) ∗ V(t) (13)

∂OFF/∂R = −(1 − S) ∗ V(t) (14)

∂S/∂wS = XS (15)

∂R/∂wR = XR (16)
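Equations (4)–(10) amount to the following forward pass for the flip flop layer. This is a minimal NumPy sketch with illustrative shapes and random initialization, not the authors' implementation: even-indexed outputs of the preceding layer drive S, odd-indexed ones drive R, through one-to-one weights.

```python
import numpy as np

def flip_flop_layer(x_prev, w, v_prev):
    """One time step of the flip flop layer.

    x_prev : outputs of the preceding layer, length 2*F (F flip flops)
    w      : one-to-one weights from that layer, length 2*F
    v_prev : previous flip flop state V(t), length F
    """
    x_s, x_r = x_prev[0::2], x_prev[1::2]   # even -> set, odd -> reset (Eqs. 4-5)
    w_s, w_r = w[0::2], w[1::2]             # matching one-to-one weights (Eqs. 6-7)
    s, r = x_s * w_s, x_r * w_r             # weighted S and R inputs (Eqs. 8-9)
    # State update, Eq. (10): V(t+1) = S + (1-R)V(t) - S(1-R)V(t)
    return s + (1 - r) * v_prev - s * (1 - r) * v_prev

rng = np.random.default_rng(1)
F = 5                                       # five flip flops, as in Fig. 1
x = np.tanh(rng.normal(size=2 * F))         # tanh outputs of the previous layer
w = rng.normal(size=2 * F)
v = np.zeros(F)                             # initial flip flop state
v = flip_flop_layer(x, w, v)
print(v.shape)  # (5,)
```

Starting from V(t) = 0, Eq. (10) reduces to V(t + 1) = S, so the first step simply loads the weighted set inputs into the state.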

4 Experiments

The FFNN architecture described above is applied to three time series prediction
problems and one time series classification problem. The prediction problems
considered are:

1. Household power consumption.
2. Flight passenger prediction.
3. Stock price prediction.

The time series classification problem considered is indoor movement classification.
For the different problem statements, different architectures are used depending on the
complexity of the data. The changes to the architecture are mainly made to the initial
input layer to match the input features. Since the experiments carried out relate
to prediction and binary classification, the final output layer has a single node. The
aforementioned time series problems were also tackled using LSTM networks so that
a clear comparison could be obtained against a benchmark. The results of the model's
predictions, along with the predictions obtained through LSTMs, are given below. The
results were obtained empirically by narrowing down the set of hyperparameters
that yielded the most efficient results for both the flip flop network and the
LSTM network.

4.1 Household Power Consumption

In the household power consumption problem, the dataset was obtained from Kaggle
[6] and contains measurements gathered between December 2006 and November
2010, for a total of 47 months. It comprises seven features: the global active power,
submetering 1, submetering 2, submetering 3, voltage, global intensity and the global
reactive power. The problem was framed such that the model must predict
the global active power of the future months, provided that it is trained on the historic
data of the aforementioned seven features. The architecture followed for the flip flop
model comprises three hidden layers, similar to Fig. 1, with dimensions 10, 5 and
10. The input layer and output layer are set to sizes 7 and 1, respectively. The LSTM
model, on the other hand, comprises a hidden layer of size 30. Both models used a
window size of 60 during training to represent the history of the data. The Adam
optimizer is used to optimize the model parameters, and the surrogate loss was
calculated using mean squared error (MSE) as the training criterion for both models.
Figure 2 shows the predictions obtained through an FFNN and an LSTM network,
and Table 2 presents the MSE of the models on the test dataset. From both Fig. 2 and
Table 2, it is clearly evident that the flip flop network is more efficient than the LSTM
in its ability to predict the power consumption pattern.
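The 60-step history window used during training can be built with a generic sliding-window transform. This is our own sketch, not the authors' preprocessing code; the assumption that feature 0 is the prediction target is illustrative.

```python
import numpy as np

def make_windows(series, window=60):
    """Turn a (T, F) multivariate series into (T-window, window, F) inputs
    and (T-window,) targets, where each target is the next value of
    feature 0 (here, the global active power)."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])      # `window` past time steps as input
        y.append(series[t + window, 0])     # next value of the target feature
    return np.array(X), np.array(y)

# Toy usage: 100 time steps, 7 features as in the power consumption dataset
data = np.random.rand(100, 7)
X, y = make_windows(data, window=60)
print(X.shape, y.shape)  # (40, 60, 7) (40,)
```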

Mean Square Error (MSE) = (1/N) Σ_{i=1}^{N} (Xi − Yi)²

where N is the total number of inputs, Xi is the ground truth label for that input,
and Yi is the output predicted by the FFNN.
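As a quick sketch, the criterion above in code:

```python
import numpy as np

def mse(x, y):
    """Mean squared error between ground truth x and predictions y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1) / 3 ≈ 0.4167
```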

Fig. 2 Predictions made by the flip flop network and LSTM on the power consumption test dataset

Table 2 MSE on test data by trained flip flop network and LSTM

Model              Training epochs  Mean squared error
Flip flop network  100              0.00073 (7.3e−4)
LSTM               100              0.00133 (1.3e−3)

4.2 Flight Passenger Prediction

The international airline passenger dataset obtained through Kaggle [7] contains the
number of passengers who travelled internationally every month. The task is to predict
this univariate time series, representing the number of passengers that would travel in
the subsequent months. The architecture of the flip flop network used is similar to that
for the power consumption dataset, except that the input layer is set with a single neuron;
the LSTM consists of 20 hidden units. The same loss function and optimizer
setup from the previous experiment was used. From Fig. 3 and Table 3, it can be easily

Fig. 3 Predictions given by the flip flop network and LSTM on the test data of the flight
passenger dataset

Table 3 MSE on test data by trained flip flop network and LSTM

Model              Training epochs  Mean squared error
Flip flop network  100              0.0010367 (1.0e−3)
LSTM               100              0.0023718 (2.3e−3)

concluded that the flip flop network clearly outperforms LSTM and is more effective
in capturing the relevant temporal information from the history to predict accurately.

4.3 Stock Price Prediction

We use the Apple stock price dataset for the stock price prediction experiment,
which covers Apple's stock in the period from January 2010 to February 2020 [8].
This is a multivariate time series prediction problem, as there are four features: open,
high, low and close. The task of the model is to predict the open prices of the stock
in future days (the test set) based on training on the past data. The FFNN used for this
dataset has input and output layers of dimensions 4 and 1, respectively,
and hidden layers with the same number of neurons as in the previous
two experiments, whereas the LSTM is modelled with 30 hidden units. A window size
of 60 days is used to capture the history. Figure 4 shows the predictions made for the
subsequent 1200 days by the flip flop network and LSTM on test data. Table 4 presents
the MSE on test data for both models. Although the FFNN predicts the correct
pattern of the stock's opening price, this time the predictions are not as accurate as

Fig. 4 Predictions given by the flip flop model and LSTM on the Apple stock test dataset

Table 4 MSE loss on test data by trained flip flop network and LSTM

Model              Training epochs  Mean squared error
Flip flop network  100              0.0048732 (4.8e−3)
LSTM               100              0.0033890 (3.3e−3)

those by the LSTM; moreover, the predictions made by the LSTM are less noisy compared
to the FFNN predictions.

4.4 Indoor Movement Classification

The ‘indoor user movement’ dataset is a benchmark dataset for time series classification,
retrieved from the UCI repository [9]. The dataset was collected by placing
four wireless sensors in an environment with a moving subject; on the basis of the
subject's movement, the wireless sensors recorded a series of signal strengths over
time. Depending on the recorded signal strength from the sensors, the movement
is binary classified as −1 or +1, where −1 and +1 represent no transition and a
transition between the rooms, respectively. The architecture utilized for the FFNN
consists of 4 neurons and 1 neuron in the input and output layers, respectively, with
no changes to the hidden layers used in the previous experiments. Further, the LSTM is
set with 30 hidden units, and binary cross-entropy (BCE) loss is used as the empirical
loss during training. A window size of 70 was used to look back at data of previous time
steps during the training and validation phases. Figure 5 shows the validation accuracy
at every 10 epochs on the validation dataset, which is shuffled and split in a
70:30 ratio from the original dataset. It is noted that at the end of training, the FFNN
acquired an accuracy of 91.01%, whereas the LSTM reached a lower accuracy of 88.06%.
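Since the labels are −1/+1 while binary cross-entropy expects {0, 1} targets, the loss involves a small label mapping. The sketch below is our own illustration of that step, not the authors' training code.

```python
import numpy as np

def bce_loss(y_true_pm1, y_prob, eps=1e-12):
    """Binary cross-entropy for -1/+1 labels: map to {0, 1}, then average BCE."""
    y = (np.asarray(y_true_pm1) + 1) / 2           # -1 -> 0 (no transition), +1 -> 1
    p = np.clip(np.asarray(y_prob), eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Usage: three windows, predicted transition probabilities 0.1, 0.9, 0.8
print(bce_loss([-1, 1, 1], [0.1, 0.9, 0.8]))  # ≈ 0.1446
```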

Fig. 5 Performance of the flip flop network and LSTM in terms of validation accuracy

5 Conclusion

Flip flops modelled as neural networks prove to be an effective way of holding on to
previous patterns in memory and utilizing them for predictions at future
time steps. Experiments applying FFNNs to the domains of time series prediction
and classification show that flip flop models give performance that is comparable,
if not superior, to that of LSTMs, which are the current state-of-the-art
models for solving temporal problems. The application of flip flops can also be extended
to more complex domains such as scene analysis, video analysis and understanding.

References

1. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term
memory (LSTM) network. Phys D Nonlinear Phenomena 404:132306
2. Santhanam S (2020) Context based text-generation using LSTM networks. arXiv preprint
arXiv:2005.00048
3. Wu W et al (2019) Using gated recurrent unit network to forecast short-term load considering
impact of electricity price. Energy Procedia 158:3369–3374
4. Chakrabarty R et al (2018) A novel design of flip-flop circuits using quantum dot cellular
automata (QCA). In: 2018 IEEE 8th annual computing and communication workshop and
conference (CCWC). IEEE
5. Holla P, Chakravarthy S (2016) Decision making with long delays using networks of
flip-flop neurons. In: 2016 International joint conference on neural networks (IJCNN), pp 2767–2773
6. UCI Machine Learning (2016) Household electric power consumption, Version 1, Aug 2016.
Retrieved from www.kaggle.com/uciml/electric-power-consumption-data-set/metadata
7. Andreazzini D (2017) International airline passengers, Version 1, June 2017. Retrieved from
www.kaggle.com/andreazzini/international-airline-passengers/metadata
8. Nandakumar R, Uttamraj KR, Vishal R, Lokeshwari YV (2018) Stock price prediction using
long short-term memory. Int Res J Eng Technol (IRJET) 3362–338
9. Bacciu D, Barsocchi P, Chessa S et al (2014) An experimental characterization of reservoir
computing in ambient assisted living applications. Neural Comput Appl 24:1451–1464.
https://doi.org/10.1007/s00521-013-1364-4
Wireless Communication Systems
Selection Relay-Based RF-VLC Underwater Communication System

Mohammad Furqan Ali, Tharindu D. Ponnimbaduge Perera, Vladislav S. Sergeevich,
Sheikh Arbid Irfan, Unzhakova Ekaterina Viktorovna, Weijia Zhang,
Ândrei Camponogara, and Dushantha Nalin K. Jayakody

Abstract Visible light communication (VLC) has recently attracted renewed interest
for communication in the underwater environment. However, the deployment
of underwater applications and oceanographic data collection is more challenging
than terrestrial communication. In this regard, a more sophisticated communication
system needs to be deployed in the harsh aqueous medium. Afterward, the collected
data is transmitted to the inland base station for further analysis. The necessity of
real-time data streaming for military and scientific purposes calls for dual-hop
hybrid cooperative communication. In this work, a dual-hop hybrid RF and
underwater visible light communication (UVLC) relayed system is developed under
strong turbulence channel conditions along with misalignment of the transceivers.
The RF link is modeled by the Nakagami-m fading distribution, while the UVLC
link is modeled by the Gamma-Gamma distribution for strong turbulence channel
conditions. Furthermore, amplify-and-forward (AF) and decode-and-forward (DF)
protocols are considered to assist information transmission with the

M. Furqan Ali (B) · T. D. Ponnimbaduge Perera · V. S. Sergeevich · S. Arbid Irfan ·
U. Ekaterina Viktorovna · W. Zhang · D. N. K. Jayakody
School of Computer Science and Robotics, National Research Tomsk Polytechnic University,
Tomsk, Russia
e-mail: [email protected]

T. D. Ponnimbaduge Perera
e-mail: [email protected]

V. S. Sergeevich
e-mail: [email protected]

S. Arbid Irfan
e-mail: [email protected]

D. N. K. Jayakody
e-mail: [email protected]

Â. Camponogara
Federal University of Juiz de Fora, Juiz de Fora, Brazil
e-mail: [email protected]

D. N. K. Jayakody
School of Postgraduate Studies, Sri Lanka Technological Campus, Padukka, Sri Lanka

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_14

underwater-based destination. The simulation results are used to analyze the RF-UVLC
link combination and the bit error rate (BER) performance through both AF and DF
relaying protocols in different waters, along with the large- and small-scale factors in
highly turbid water mediums and pointing errors of the propagated light beam. A
Monte Carlo approach is used to obtain the best-fitting curves for the simulation results.

Keywords Cooperative communication · Hybrid underwater visible light
communication (HUVLC) · Underwater wireless communication (UWC) · Visible
light communication (VLC)

1 Introduction

Underwater wireless communication (UWC) has become a promising future technology
for ocean observation. Wireless underwater signaling has become an attractive
field for exploring unknown undersea sectors and phenomena, and it encourages
growing human interest in the underwater environment. Additionally, UWC technology
is a significant approach to enable the realization of many potential applications.
Numerous underwater applications have been reported in the existing literature, in
terms of observing and monitoring marine life, water pollution control, early warning
of tsunamis and earthquakes, natural resource exploration and detection of many
natural hazards [1]. Tsunamis and earthquakes are sudden and unexpected events
which are highly impossible to control. However, these natural disasters arise from
imbalances in water phenomena, which can be monitored and detected early by
deploying detection techniques such as UWC [2]. Moreover, the necessity of
commercial and military applications requires a sophisticated hybrid UWC methodology
with exceptionally secure data. Currently, a large number of underwater wireless
applications are deployed based on acoustic signaling. However, acoustic waves are
less suitable to fill the communication gap: in underwater acoustic signaling, the
waves propagate at a very low speed of approximately 1500 m/s, and suffer from
signal delay, low data rates of a few kbps and low bandwidth [3]. Therefore, the
deployment of underwater applications requiring high speed and real-time monitoring
over an acoustic link is challenging. In addition, ocean water has higher density,
permittivity and electrical conductivity compared to the terrestrial channel, and these
factors affect signal propagation [4]. On the other hand, electromagnetic waves
propagate over only very short distances underwater and attenuate easily due to the
intrinsic (physio-chemical) properties of water [5].
Thus, in a cooperative hybrid communication scenario, underwater visible light
communication (UVLC) is the wireless candidate to fulfill the desired communication
requirements when combined with electromagnetic waves (in the RF range). Indeed,
VLC has shown impressive performance against traditional acoustic communication,
with higher bandwidth, lower time delay and latency, higher data rate and better
security, especially for real-time video streaming and underwater mapping for
geographical data collection [6]. Furthermore, the VLC link has shown superiority over
existing traditional wireless candidates underwater, since a VLC setup is easier
to install and very cost effective for the deployment of various underwater applications
over short distances [7].
VLC has attracted research interest for deployment for communication purposes and
opens the door to future opportunities for signal transmission over long distances.
Hence, VLC technology has drawn the attention of many researchers worldwide,
mainly due to its potential for next-generation communication systems (i.e.,
5G and 6G). Additionally, VLC technology based on light emitting diodes (LEDs)
plays a major role in broadband wireless communication technology nowadays.
In terrestrial communication, VLC has attracted research interest toward the high
data rates of various deployable applications. It is a solution to the increasing demand
for high data traffic and an alternative communication medium for indoor applications. It
shows high performance, especially for indoor wireless communication using LED
lamps. LEDs also have many advantages: low electrical power consumption, tiny
size, reliability over a long lifetime, cost effectiveness and very low heat radiation [8].
Additionally, current VLC technology possesses various other merits, such as no
electromagnetic interference or radiation and highly secure data transmission without
delay. Another approach is to complement VLC with an FSO link to enable high data
rates in multimedia services, improving system quality of service (QoS) and
performance [9].
RF communication is used as a potential wireless signal carrier on a terrestrial basis
over long distances. Hence, a dual-hop hybrid communication link is investigated for
improving system quality over long ranges under different channel conditions and
requirements. A combined RF-underwater optical communication (UWOC) system
has been proposed in the literature [10–12]. Despite these facts, few works have
addressed the turbulence and pointing error phenomena for UVLC. In
[13], the authors investigated a vertical UVLC system model using the Gamma-Gamma
probability distribution, assumed strong turbulence conditions of the water channel,
and formulated a closed-form expression to calculate the BER performance at the
undersea destination. Similarly, in [14], the authors designed a UVLC system over
the log-normal distribution and derived closed-form expressions for the asymptotic
BER performance. Another impressive study proposed a multi-input multi-output
(MIMO)-based UVLC channel model and widely analyzed the diversity gain of the
system in the presence of the turbulence properties of the aqueous medium [15].
Throughout this work, we investigate a combined hybrid link with two different
communication hops for information transmission under different channel conditions.
This study focuses on the underwater VLC link. Underwater channels are
highly complex for deploying a communication setup and for the choice of modulation
techniques. Calculating the BER performance using the on-off-keying (OOK)
modulation technique simplifies the system performance analysis compared to
higher-order modulation techniques. Motivated by this, we investigate the
suitable relay for RF-VLC hybrid dual-hop communication under strong channel
conditions along with misalignment of the transceivers. Additionally, we compare the

BER performance of the RF-UVLC link, considering the Nakagami-m fading factor with
the VLC link impaired by strong turbulence channel conditions, through amplify-and-forward
(AF) and decode-and-forward (DF) relay protocols in different types
of water media. To the best of our knowledge, there are only a few studies in the
literature that have investigated VLC signaling in different water mediums. In this
regard, the main contribution of this work is to propose the concept of a cooperative
hybrid relay RF-UVLC communication system model in highly turbid water channel
conditions with different relay protocols.

1.1 Paper Structure

The remainder of this study is organized as follows: In Sect. 2, we propose a model of a
dual-hop hybrid communication system considering different channel impairments.
The overall BER performance calculation based on the OOK modulation scheme is
summarized in Sect. 3. Then, numerical results of the proposed system model are
presented and discussed in Sect. 4. Finally, in Sect. 5, we state some concluding
remarks.

2 Proposed System Model

The proposed system model is a hybrid RF-UVLC link in which a single-antenna
source node s broadcasts a signal and communicates with the underwater destination
node d through an AF relay node r, which is equipped with two antennas
for reception. We consider signal transmission through the AF relay (see Fig. 1)
and evaluate the system performance. In a second setup, we use a DF relay to
assist the transmission of information, so as to determine the suitable relay protocol
for the hybrid RF-UVLC system in different waters. The RF link is modeled by
Nakagami-m distribution fading, while the VLC link is modeled by Gamma-Gamma and
exponential Gaussian distributed random variables. Moreover, the relay has two
directional antennas: one directed toward the source for receiving the signal via the RF
link, and the other directed toward the destination, responsible for transmitting
information to the underwater destination through the VLC link. The AF relay
node receives the signal from s, amplifies it and then forwards it to the undersea
destination d; the whole system concept is depicted in Fig. 1. The DF relay, in
contrast, receives the signal from s, regenerates it and then forwards it to the undersea
destination d. It is noteworthy that s and d are identical and each mounted with a single
antenna. The whole system works in half-duplex mode. The source broadcasts the
signal to the relay, which further forwards the regenerated or amplified information,
with a fixed amplification gain factor in the AF case, to the undersea destination.
Selection Relay-Based RF-VLC Underwater Communication System 181

Fig. 1 Proposed system model of dual-hop hybrid cooperative RF-VLC underwater wireless com-
munication, where the source communicates with the destination through a relay in different com-
munication links along with different channel conditions

2.1 Source-Relay (s − r) Hop

On the terrestrial RF link, s broadcasts signals with an average electrical signal
power E_{sr}. Thus, the received signal y_{sr} at the relay can be written as

y_{sr} = \sqrt{E_{sr}}\, h_{sr} x + n_{sr},                                  (1)

where x denotes the transmitted information signal and n_{sr} is additive white
Gaussian noise (AWGN) with zero mean and variance \sigma_{sr}^2. Moreover, h_{sr}
models the channel coefficient between source and relay, s − r, which follows the
Nakagami-m distribution; throughout, subscripts satisfy w, z ∈ {s, r, d}, and the
distance between communication nodes is represented by d_{wz}. In addition, the
system assumes the intensity modulation and direct-detection (IM/DD) technique with
the OOK modulation format. Thus, the signal-to-noise ratio (SNR) at the relay is
given by

\gamma_{sr} = \frac{E_{sr} |h_{sr}|^2}{\sigma_{sr}^2}.                       (2)

182 M. Furqan Ali et al.

The average SNR of the s − r link is denoted by \bar{\gamma}_{sr}, so that
\gamma_{sr} = \bar{\gamma}_{sr} |h_{sr}|^2, and can be expressed as

\bar{\gamma}_{sr} = \frac{E_{sr}}{\sigma_{sr}^2}.                            (3)

As aforementioned, the RF link is modeled by Nakagami-m flat fading. Thus, the
probability density function (PDF) of \gamma_{sr} is Gamma distributed and given
as [16]

f_{sr}(\gamma) = \frac{\Omega^m \gamma^{m-1}}{\Gamma(m)} \exp(-\Omega \gamma),   (4)

where \Gamma(\cdot) represents the Gamma function, \Omega = m / \bar{\gamma}_{sr}
denotes the ratio of the Nakagami-m fading factor to the average SNR, and
m \ge 1/2.
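As a quick numerical check of Eqs. (2)–(4), the sketch below (Python, with assumed illustrative values for m and the average SNR) draws Nakagami-m fading gains and verifies that the resulting instantaneous SNR is Gamma distributed with mean \bar{\gamma}_{sr} and variance \bar{\gamma}_{sr}^2 / m:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 2.0           # Nakagami-m fading figure (m >= 1/2), assumed value
gamma_bar = 10.0  # average SNR, gamma_bar = E_sr / sigma_sr^2, assumed value

# Nakagami-m amplitude: |h| = sqrt(G) with G ~ Gamma(shape=m, scale=1/m),
# normalized so that E[|h|^2] = 1
g = rng.gamma(shape=m, scale=1.0 / m, size=200_000)
h = np.sqrt(g)

# Instantaneous SNR of the s-r hop, Eq. (2)
gamma_sr = gamma_bar * h**2

# Eq. (4): gamma_sr ~ Gamma(shape=m, rate=m/gamma_bar), whose mean is
# gamma_bar and whose variance is gamma_bar^2 / m
print(gamma_sr.mean())  # close to 10.0
print(gamma_sr.var())   # close to 50.0
```

The shape/scale parameterization above is the one used by NumPy's `Generator.gamma`; the rate Ω of Eq. (4) is the reciprocal of the scale.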

2.2 Relay-Destination (r − d) Hop

In order to model strong turbulence-induced fading, we adopt the Gamma-Gamma
distribution for the link between relay and destination (r − d), as proposed in
[17]. To model the VLC link, the channel coefficient h_{rd} is considered as the
combination of path loss h_l, water turbidity h_t and pointing error h_p.
Consequently, the channel coefficient for the r − d link is defined as
h_{rd} = h_l h_t h_p. Note that h_l is deterministic, while h_t and h_p are random
variables following the Gamma-Gamma and exponential Gaussian distributions,
respectively [18]. Furthermore, the RF signal received at the relay is converted
into an optical signal using the sub-carrier intensity modulation (SIM) scheme and
then transmitted to the undersea destination [19]. In this way, the received VLC
signal y_{rd} at the undersea destination can be expressed as

y_{rd} = \sqrt{E_{rd}}\, \eta r_s h_{rd} \bar{x} + n_{rd},                   (5)

in which E_{rd} is the average electrical signal power, \eta is the
electrical-to-optical conversion efficiency, r_s is the photo-detector
responsivity, \bar{x} is the information from the source regenerated by the relay,
and n_{rd} is additive white Gaussian noise with zero mean and variance
\sigma_{rd}^2. When the AF protocol is used in the dual-hop hybrid RF-UVLC system
model, the received signal x at the relay is instead amplified and then forwarded
to the undersea destination. Consequently, the received signal at the undersea
destination is given by

y_{rd} = \sqrt{E_{rd}}\, \eta r_s \rho (y_{sr}) h_{rd} + n_{rd},             (6)

where \rho = \sqrt{E_{rd} / (E_{sr} |h_{sr}|^2 + N_0)} denotes the amplifying
factor. Replacing y_{sr} by its value from (1), the received signal can be
specified as

y_{rd} = \sqrt{E_{rd}}\, \eta r_s \rho \left( \sqrt{E_{sr}}\, h_{sr} x + n_{sr} \right) h_{rd} + n_{rd}   (7)

       = \underbrace{\sqrt{E_{sr} E_{rd}}\, \eta r_s \rho h_{sr} h_{rd} x}_{P_s}
         + \underbrace{\sqrt{E_{rd}}\, \eta r_s \rho h_{rd} n_{sr} + n_{rd}}_{P_n},   (8)

in which P_s and P_n denote the received signal power and additive noise power,
respectively. Thus, the SNR at the destination for the r − d link can be written as

\gamma_{rd} = \frac{E_{rd} E_{sr} d_{sr}^{-t} \eta^2 r_s^2 |h_{sr}|^2 |h_{rd}|^2}
{E_{rd} \eta^2 r_s^2 |h_{rd}|^2 \sigma_{sr}^2 + E_{sr} d_{sr}^{-t} |h_{sr}|^2 \sigma_{rd}^2 + \sigma_{sr}^2 \sigma_{rd}^2}.   (9)

2.3 Underwater Attenuation Coefficient Model

The optical link suffers due to the physio-chemical properties of the water channel
and colored dissolved organic matter (CDOM). Additionally, the suspended
small-scale and large-scale particles are also responsible for optical signal
fading. The VLC signal is directly affected by the absorption and scattering
phenomena in the underwater environment. In our investigated system, considering
the r − d link, the path loss is modeled using the extinction coefficient
c(\lambda), which is the total sum of the absorption a(\lambda) and scattering
b(\lambda) coefficients. The expected numerical values of a(\lambda), b(\lambda)
and c(\lambda) used in the simulation results for different waters are given in
Table 1. The extinction coefficient, which varies according to the type of water,
is described as

c(\lambda) = a(\lambda) + b(\lambda).                                        (10)

Table 1 Expected experimental values of absorption, scattering and extinction
coefficients in different water mediums [20]

Water type for UWC   | a(\lambda) (10^{-3}) | b(\lambda) (10^{-3}) | c(\lambda) (10^{-3})
Pure seawater        | 53                   | 3                    | 56
Clear ocean water    | 69                   | 80                   | 150
Coastal ocean water  | 88                   | 216                  | 305

If the r and d nodes are separated by a given vertical distance d_t and the
Beer-Lambert expression is adopted, the path loss of the UVLC link is given
by [21]

h_l = \exp(-c(\lambda)\, d_t).                                               (11)

The VLC link requires proper alignment over the beam length. Additionally, a
necessary condition of the link arrangement is that the receiver should be in the
field of view (FOV) for proper signal transmission. The modified channel
attenuation coefficient is described in terms of path loss and geometrical losses.
The geometrical losses depend on the physical constraints of the setup, i.e., the
aperture diameter, the full-width transmitter beam divergence angle, and a
correction coefficient. If the signal is transmitted through a collimated light
source, such as a laser diode, then the geometrical losses are negligible and, as
a consequence, the signal depends only on the path loss. The geometrical losses
are, however, taken into account for diffused and semi-collimated sources, i.e.,
LEDs and diffused LDs [22]. Thus, the overall attenuation of the optical link in
terms of path loss and geometrical losses is described as [23]

h_l \approx h_{pl} + h_{gl} \approx \left( \frac{D_r}{\theta_F} \right)^2 d_t^{-2}
\exp\!\left( -c \left( \frac{D_r}{\theta_F} \right)^{\tau} d_t^{1-\tau} \right),   (12)

where D_r, \theta_F and \tau represent the receiver aperture diameter, the
full-width transmitter beam divergence angle and the correction coefficient, while
h_{pl} and h_{gl} are the path loss and geometrical losses, respectively.
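A minimal sketch of the Beer-Lambert path loss of Eq. (11), assuming the Table 1 extinction coefficients are expressed in m^{-1} (i.e., the tabulated 10^{-3} values) and using a 15 m relay-to-destination depth as in Table 2:

```python
import math

# Extinction coefficients c(lambda) from Table 1 (assumed units: m^-1)
C_EXT = {
    "pure_sea": 0.056,
    "clear_ocean": 0.150,
    "coastal_ocean": 0.305,
}

def path_loss(c: float, d_t: float) -> float:
    """Beer-Lambert path loss of the UVLC hop, Eq. (11): h_l = exp(-c * d_t)."""
    return math.exp(-c * d_t)

d_t = 15.0  # relay-to-destination vertical distance in metres (Table 2)
for water, c in C_EXT.items():
    print(water, path_loss(c, d_t))
```

The geometrical-loss term of Eq. (12) is omitted here, since it needs the setup-specific D_r, θ_F and τ values; turbid coastal water unsurprisingly yields the smallest h_l at the same depth.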

2.4 Water Turbidity Channel Modeling

As the proposed system model pays greater attention to the underwater VLC link,
complex channel conditions on the RF link are excluded from the scope of this
research; the s − r link is simply modeled by Nakagami-m fading. The VLC link, in
turn, is modeled under heavy turbulence channel conditions combined with pointing
errors. According to [24], the VLC link under strong channel conditions follows
the Gamma-Gamma probability distribution, which can be expressed as

f_{h_t}(h_t) = \frac{2 (\alpha_{rd} \beta_{rd})^{(\alpha_{rd}+\beta_{rd})/2}}
{\Gamma(\alpha_{rd}) \Gamma(\beta_{rd})}\, h_t^{(\alpha_{rd}+\beta_{rd})/2 - 1}
K_{\alpha_{rd}-\beta_{rd}}\!\left( 2 \sqrt{\alpha_{rd} \beta_{rd} h_t} \right),   (13)

where \Gamma(\cdot) is the Gamma function and the modified Bessel function of the
second kind is denoted by K_{\alpha_{rd}-\beta_{rd}}(\cdot). The large-scale
\alpha_{rd} and small-scale \beta_{rd} parameters are, respectively, given by
Elamassie et al. [13] as
\alpha_{rd} = \left[ \exp\!\left( \frac{0.49\, \sigma_{h_t}^2}
{\left( 1 + 0.56 (1-\Theta)\, \sigma_{h_t}^{12/5} \right)^{7/6}} \right) - 1 \right]^{-1}   (14)

\beta_{rd} = \left[ \exp\!\left( \frac{0.51\, \sigma_{h_t}^2}
{\left( 1 + 0.69\, \sigma_{h_t}^{12/5} \right)^{5/6}} \right) - 1 \right]^{-1}.   (15)

In (14) and (15), \sigma_{h_t}^2 denotes the scintillation index for the
plane-wave model, known as the Rytov variance. The Rytov variance can be defined
as \sigma_{h_t}^2 = 1.23\, C_n^2 k^{7/6} L^{11/6}, where k = 2\pi/\lambda is the
wave number, C_n^2 the refractive-index structure constant, and L the
corresponding link length.
The variation of the \alpha_{rd} and \beta_{rd} parameters is shown in Fig. 2.
Both parameters decrease as the scintillation index increases and vice versa,
although \alpha_{rd} varies far more steeply than \beta_{rd} with increasing
scintillation index.
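Equations (14) and (15) can be evaluated directly; the sketch below takes Θ = 0 in (14) as an illustrative assumption (the paper does not state the value used) and writes \sigma_{h_t}^{12/5} as (\sigma_{h_t}^2)^{6/5}, since \sigma_{h_t}^2 is the variance:

```python
import math

def alpha_rd(sigma2: float, theta: float = 0.0) -> float:
    """Large-scale parameter of Eq. (14); sigma2 is the Rytov variance."""
    denom = (1.0 + 0.56 * (1.0 - theta) * sigma2 ** (6.0 / 5.0)) ** (7.0 / 6.0)
    return 1.0 / (math.exp(0.49 * sigma2 / denom) - 1.0)

def beta_rd(sigma2: float) -> float:
    """Small-scale parameter of Eq. (15); sigma2 is the Rytov variance."""
    denom = (1.0 + 0.69 * sigma2 ** (6.0 / 5.0)) ** (5.0 / 6.0)
    return 1.0 / (math.exp(0.51 * sigma2 / denom) - 1.0)

# Both parameters shrink as the turbulence strength grows, as in Fig. 2
for sigma2 in (0.1, 1.0, 10.0):
    print(sigma2, alpha_rd(sigma2), beta_rd(sigma2))
```

This reproduces the qualitative trend of Fig. 2: large and small-scale factors both fall monotonically with the scintillation index.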

[Figure: large-scale factor \alpha_{rd} and small-scale factor \beta_{rd} plotted
against the log-intensity variance.]

Fig. 2 Analysis of the large- and small-scale factors in underwater medium



2.5 Pointing Error in Underwater VLC Link

Another source of signal fading in optical communication is the deviation of the
transmitter optical beam from the receiver aperture, which is named pointing
error. It occurs due to malposition and inclination of the relay buoy and/or
receiver, caused by the motion of ocean currents and waves. The pointing error
depends on the equivalent beam width, which is described as
w_{zeq} = 2 \sigma_d \zeta. The parameter \zeta is the ratio between the
equivalent beam width radius and the pointing error displacement standard
deviation \sigma_d. The random radial displacement R_d is calculated as
R_d = \sqrt{R_{dx}^2 + R_{dy}^2}, where R_{dx} and R_{dy} denote the displacements
along the horizontal and elevation axes, respectively. Moreover, the collected
power fraction is denoted by A_p. Thus, the pointing error can be expressed
as [17]

h_p \approx A_p \exp\!\left( -\frac{2 R_d^2}{w_{zeq}^2} \right).             (16)
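A Monte Carlo sketch of the pointing-error gain of Eq. (16); A_p, \sigma_d and \zeta are illustrative assumed values (the paper itself lists only an equivalent beam width of 2.5 in Table 2), and the displacements are taken as independent zero-mean Gaussians:

```python
import numpy as np

rng = np.random.default_rng(1)

A_p = 1.0      # collected power fraction at zero displacement (assumed)
sigma_d = 0.1  # displacement standard deviation in metres (assumed)
zeta = 2.5     # beam-radius-to-sigma_d ratio (assumed value)
w_zeq = 2.0 * sigma_d * zeta  # equivalent beam width, w_zeq = 2 * sigma_d * zeta

# Radial displacement R_d = sqrt(R_dx^2 + R_dy^2), with independent Gaussian
# displacements along the horizontal and elevation axes
R_dx = rng.normal(0.0, sigma_d, 100_000)
R_dy = rng.normal(0.0, sigma_d, 100_000)
R_d = np.hypot(R_dx, R_dy)

# Pointing-error gain of Eq. (16); its mean works out to 1 / (1 + zeta**-2)
h_p = A_p * np.exp(-2.0 * R_d**2 / w_zeq**2)
print(h_p.mean())
```

Since R_d^2 is exponentially distributed here, the average gain has the closed form 1/(1 + \zeta^{-2}), which the sample mean should approach.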

3 BER Performance of the System

In the proposed dual-hop communication system model, based on (5) and (6), a
single-carrier OOK modulation technique is used to transmit information, and the
BER performance of the received signal is calculated. In the hybrid RF-UVLC
communication link, the instantaneous end-to-end SNR at the destination employing
the AF protocol with a fixed-gain relay can be calculated as

\gamma_d = \frac{\gamma_{sr} \gamma_{rd}}{\gamma_{rd} + C},                  (17)

where C denotes the fixed-gain amplifying constant. The overall system BER for the
OOK modulation technique over the AWGN channel can be calculated as [25]

\mathrm{BER}_d = Q\!\left( \sqrt{\gamma_d / 2} \right).                      (18)
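Equations (17) and (18) translate directly into code; the sketch below evaluates them for assumed illustrative SNR values and a unit amplifying constant:

```python
import math

def q_function(x: float) -> float:
    """Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def end_to_end_snr(gamma_sr: float, gamma_rd: float, C: float) -> float:
    """Fixed-gain AF end-to-end SNR, Eq. (17)."""
    return gamma_sr * gamma_rd / (gamma_rd + C)

def ook_ber(gamma_d: float) -> float:
    """OOK bit error rate over AWGN, Eq. (18)."""
    return q_function(math.sqrt(gamma_d / 2.0))

# Example: both hop SNRs at 20 dB with a unit amplifying constant (assumed)
g = 10.0 ** (20.0 / 10.0)
print(ook_ber(end_to_end_snr(g, g, 1.0)))
```

Note that \gamma_d is upper-bounded by \gamma_{sr}, so the BER floor at the destination is set by the weaker hop.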

4 Numerical Results

This section covers the numerical analysis of the BER of the proposed dual-hop
hybrid RF-UVLC system model considering distinct water mediums. Unless otherwise
stated, the physical constraints of the setup are the photo-detector aperture
diameter D_r, the full-width transmitter beam divergence angle \theta, the
distance d_{sr} between the base station and the relay, and the vertical depth
d_t of the destination from the sea surface. The numeric values used in the
simulation are summarized in Table 2. We target the BER performance at the
destination, which is located vertically in the underwater environment as
depicted in Fig. 1. To simulate the results, we calculate \alpha_{rd} and
\beta_{rd} as the ocean water temperature and salinity vary; in our simulation,
\alpha_{rd} and \beta_{rd} correspond to a water temperature of 5 °C and a
salinity of 20 practical salinity units (PSU).

Table 2 Numerical values adopted in the simulation

Symbol and description                        | Numeric value
Aperture diameter (D_r)                       | 5 cm
Divergence angle (\theta)                     | 6°
Distance between source and relay (d_{sr})    | 200 m
Distance between relay and destination (d_t)  | 15 m
Laser diode photo-detector efficiency (\eta)  | 0.5
Photo-detector responsivity (r_s)             | 0.28
Large-scale factor (\alpha_{rd})              | 5.9645
Small-scale factor (\beta_{rd})               | 4.3840
Equivalent beam width                         | 2.5
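The Monte Carlo approach used for the results below can be sketched for a single Nakagami-m faded hop: draw instantaneous SNRs and average Eq. (18) over them. This is a semi-analytic simplification of the full dual-hop chain, with an assumed m = 2 and assumed average SNR values:

```python
import math

import numpy as np

rng = np.random.default_rng(2)

def mc_ber_ook_nakagami(m: float, gamma_bar_db: float, n: int = 100_000) -> float:
    """Semi-analytic Monte Carlo BER of OOK over one Nakagami-m faded hop:
    draw instantaneous SNRs and average Q(sqrt(gamma / 2)) from Eq. (18)."""
    gamma_bar = 10.0 ** (gamma_bar_db / 10.0)
    h2 = rng.gamma(m, 1.0 / m, n)  # |h|^2 with E[|h|^2] = 1
    gamma = gamma_bar * h2
    # Q(x) = 0.5 * erfc(x / sqrt(2)), applied to x = sqrt(gamma / 2)
    args = np.sqrt(gamma / 2.0) / math.sqrt(2.0)
    return float(np.mean([0.5 * math.erfc(a) for a in args]))

# BER falls monotonically with the average SNR of the hop
for snr_db in (0, 10, 20):
    print(snr_db, mc_ber_ook_nakagami(2.0, snr_db))
```

Extending this to the full system means also drawing h_t (Gamma-Gamma), h_p (pointing error) and applying Eq. (17) before averaging, which is the structure behind Figs. 3, 4, 5 and 6.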

[Figure: BER versus SNR (dB) curves for the DF-relayed link in pure sea, clear
ocean and coastal ocean water, with and without turbulence and pointing errors.]

Fig. 3 Decode-and-forward (DF) relayed BER performance of the hybrid RF-VLC
communication link in different water mediums with and without pointing errors

In Fig. 3, we simulate the BER performance of the DF relay, which is used to
assist the information transmission to the underwater destination. In the hybrid
DF-relayed RF-UVLC link, the best BER performance is achieved in pure seawater.
The BER performance without turbulence in pure seawater is also analyzed, and the
BER performance under very strong turbulence, with and without pointing errors, is
likewise depicted in Fig. 3. Even with pointing errors in highly turbid
conditions, pure seawater still performs better than clear ocean water. It is
clearly seen that coastal ocean water shows poor performance compared to the pure
and clear ocean waters due to its high turbidity and the randomness of water
currents.
In Fig. 4, the simulation results show the BER performance of the AF-relayed
hybrid RF-VLC link under both strong turbulence and pointing error conditions. It
is clearly seen that pure seawater and clear ocean water have superior performance
in high-SNR channel conditions. Moreover, the AF relay outperforms the DF-relayed
communication link in the proposed channel model at low SNR. Targeting a BER of
1 × 10^{-4} at high SNR, the AF-relayed model shows superior performance
regardless of the water type compared to the DF-relayed communication link; in
clear ocean water in particular, the AF-relayed RF-VLC combination achieves
comparatively good BER performance at low SNR values.

[Figure: BER versus SNR (dB) curves for the AF-relayed link in pure sea, clear
ocean and coastal ocean water, with and without turbulence and pointing errors.]

Fig. 4 Amplify-and-forward (AF) relayed BER performance of the RF-VLC
communication link in different water mediums with and without pointing errors

A more detailed comparison between the AF- and DF-relayed RF-VLC hybrid
communication links in different waters is depicted in Fig. 5. Both relayed links
perform better in pure seawater under strong channel conditions at low SNR values,
while coastal ocean water shows poor BER performance for both combined
communication links.

[Figure: BER versus SNR (dB) curves comparing AF and DF relaying in pure sea,
clear ocean and coastal ocean water.]

Fig. 5 Detailed comparison of both AF and DF relayed BER performance of the
RF-VLC hybrid communication link in different water mediums
The comparison of the BER performance of the dual-hop communication link with
pointing errors in different waters is depicted in Fig. 6, where the AF and DF
relays are analyzed in highly turbid channel conditions together with pointing
error impairments. The AF relay shows superior BER performance to the DF relay in
all channel conditions at low SNR, while the DF relay reaches almost the same
performance only at relatively higher SNR. Thus, to achieve a good BER in
different water mediums at high SNR, the RF-UVLC link performs well as a combined
hybrid wireless communication candidate.

[Figure: BER versus SNR (dB) curves comparing AF and DF relaying in pure sea,
clear ocean and coastal ocean water with pointing errors.]

Fig. 6 Detailed comparison of both AF and DF relayed BER performance of the
RF-VLC hybrid communication link in different water mediums considering only
pointing error

5 Conclusion

A hybrid communication system is a promising technique for achieving reliable data
rates under different channel conditions regardless of the water medium, and VLC
is a key enabling technology for acquiring high data rates in different waters.
Thus, in this paper, we provided simulation results using a Monte Carlo approach
to investigate and verify the BER performance of the RF-UVLC link under different
relay protocols. A dual-hop channel model consisting of an RF link and an
underwater VLC link was considered, in which a floating buoy acts as a relay and
assists the information transmission between the onshore base station and the
undersea node through different protocols. We then calculated the BER performance
at the underwater destination and investigated the proposed system model in
different water types. The simulation results clearly show that the AF relay
achieves better BER performance than DF relay-based communication in lower-SNR
conditions.

Acknowledgements This work was funded by the framework of the Competitiveness Enhancement
Program of the National Research Tomsk Polytechnic University grant No. VIU-ISHITR-180/2020.

References

1. Ali MF, Jayakody DNK, Chursin YA, Affes S, Dmitry S (2019) Recent advances and future
directions on underwater wireless communications. Arch Comput Methods Eng, 1–34
2. Ali MF, Jayakody NK, Perera TDP, Krikidis I (2019) Underwater communications: recent
advances. In: ETIC2019 international conference on emerging technologies of information
and communications (ETIC), pp 1–6
3. Zeng Z, Fu S, Zhang H, Dong Y, Cheng J (2016) A survey of underwater optical wireless
communications. IEEE Commun Surv Tutor 19(1):204–238
4. Dautta M, Hasan MI (2017) Underwater vehicle communication using electromagnetic fields
in shallow seas. In: 2017 international conference on electrical, computer and communication
engineering (ECCE). IEEE, pp 38–43
5. Kaushal H, Kaddoum G (2016) Underwater optical wireless communication. IEEE Access
4:1518–1547
6. Awan KM, Shah PA, Iqbal K, Gillani S, Ahmad W, Nam Y (2019) Underwater wireless sensor
networks: a review of recent issues and challenges. Wirel Commun Mobile Comput
7. Majumdar AK (2014) Advanced free space optics (FSO): a systems approach, vol 186. Springer,
Berlin
8. Singh S, Kakamanshadi G, Gupta S (2015) Visible light communication-an emerging wire-
less communication technology. In: 2015 2nd international conference on recent advances in
engineering & computational sciences (RAECS). IEEE, pp 1–3
9. Gupta A, Sharma N, Garg P, Alouini M-S (2017) Cascaded fso-vlc communication system.
IEEE Wirel Commun Lett 6(6):810–813
10. Zhang J, Dai L, Zhang Y, Wang Z (2015) Unified performance analysis of mixed
radio frequency/free-space optical dual-hop transmission systems. J Lightwave Technol
33(11):2286–2293
11. Ansari IS, Yilmaz F, Alouini M-S (2013) Impact of pointing errors on the performance of
mixed rf/fso dual-hop transmission systems. IEEE Wirel Commun Lett 2(3):351–354
12. Charles JR, Hoppe DJ, Sehic A (2011) Hybrid rf/optical communication terminal with spherical
primary optics for optical reception. In: 2011 international conference on space optical systems
and applications (ICSOS). IEEE, pp 171–179
13. Elamassie M, Sait SM, Uysal M (2018) Underwater visible light communications in cascaded
gamma-gamma turbulence. In: IEEE globecom workshops (GC Wkshps). IEEE, pp 1–6
14. Elamassie M, Al-Nahhal M, Kizilirmak RC, Uysal M (2019) Transmit laser selection for under-
water visible light communication systems. In: IEEE 30th annual international symposium on
personal, indoor and mobile radio communications (PIMRC). IEEE, pp 1–6
15. Yilmaz A, Elamassie M, Uysal M (2019) Diversity gain analysis of underwater vertical mimo
vlc links in the presence of turbulence. In: 2019 IEEE international black sea conference on
communications and networking (BlackSeaCom). IEEE, pp 1–6
16. Illi E, El Bouanani F, Da Costa DB, Ayoub F, Dias US (2018) Dual-hop mixed RF-UOW
communication system: a PHY security analysis. IEEE Access 6:55345–55360
17. Elamassie M, Uysal M (2019) Vertical underwater vlc links over cascaded gamma-gamma
turbulence channels with pointing errors. In: 2019 IEEE international black sea conference on
communications and networking (BlackSeaCom). IEEE, pp 1–5
18. Farid AA, Hranilovic S (2007) Outage capacity optimization for free-space optical links with
pointing errors. J Lightwave Technol 25(7):1702–1710
19. Song X, Cheng J (2012) Optical communication using subcarrier intensity modulation in strong
atmospheric turbulence. J Lightwave Technol 30(22):3484–3493
20. Hanson F, Radic S (2008) High bandwidth underwater optical communication. Appl Opt
47(2):277–283
21. Mobley CD, Gentili B, Gordon HR, Jin Z, Kattawar GW, Morel A, Reinersman P, Stamnes
K, Stavn RH (1993) Comparison of numerical models for computing underwater light fields.
Appl Opt 32(36):7484–7504

22. Elamassie M, Uysal M (2018) Performance characterization of vertical underwater vlc links
in the presence of turbulence. In: 11th international symposium on communication systems,
networks & digital signal processing (CSNDSP). IEEE, pp 1–6
23. Elamassie M, Miramirkhani F, Uysal M (2018) Channel modeling and performance charac-
terization of underwater visible light communications. In: 2018 IEEE international conference
on communications workshops (ICC workshops). IEEE, pp 1–5
24. Sandalidis HG, Tsiftsis TA, Karagiannidis GK (2009) Optical wireless communications with
heterodyne detection over turbulence channels with pointing errors. J Lightwave Technol
27(20):4440–4445
25. Grubor J, Randel S, Langer K-D, Walewski JW (2008) Broadband information broadcasting
using led-based interior lighting. J Lightwave Technol 26(24):3883–3892
Circular Polarized Octal Band CPW-Fed
Antenna Using Theory of Characteristic
Mode for Wireless Communication
Applications

Reshmi Dhara

Abstract An innovative design for a multipurpose printed antenna with coplanar
waveguide (CPW) feed supporting circular polarization (CP) is presented in this
manuscript. The theory of characteristic modes (TCM) is used to investigate
octal-band circular polarization. The TCM analysis shows that the whole radiator
contributes to exciting electric and magnetic modes, which also yields broadband
impedance performance. To find the resonating CP frequencies and the radiating
behavior, seven characteristic modes are excited using an asymmetric CPW-fed
technique. The antenna radiator consists of a hexagonal ring connected with an
annular ring at its leftmost corner, which generates wide circular polarization.
The implemented antenna design yields a broad impedance bandwidth (IBW) spanning
from 1.5 GHz to beyond 14 GHz. Additionally, the simulated 3 dB axial ratio
bandwidths (ARBWs) for the octal bands are 310 MHz (3.13–3.34 GHz, with CP
resonating frequency f_c = 3.2 GHz), 310 MHz (6.45–6.76 GHz, f_c = 6.6 GHz),
40 MHz (8.08–8.12 GHz, f_c = 8.1 GHz), 120 MHz (8.63–8.74 GHz, f_c = 8.7 GHz),
180 MHz (9.49–9.67 GHz, f_c = 9.5 GHz), 30 MHz (11.69–11.72 GHz, f_c = 11.7 GHz),
40 MHz (12.19–12.23 GHz, f_c = 12.2 GHz) and 140 MHz (12.57–12.71 GHz,
f_c = 12.6 GHz), respectively.

Keywords Octal band antenna · Circular polarized · CPW-fed · Theory of
characteristic modes (TCMs)

1 Introduction

Presently, printed monopole antennas are widely utilized because of their many
attractive features, such as omnidirectional radiation patterns, wide impedance
bandwidth, ease of fabrication, low cost, and light weight. Moreover, monopole
antennas are well matched to the integrated circuitry of wireless communication
R. Dhara (B)
Department of Electronics and Communication Engineering, National Institute of Technology
Sikkim, Ravangla 737139, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 193


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_15

devices because of their easy feed techniques. Additionally, most monopole
antennas aim to support linearly polarized (LP) radiation. CP antennas are more
beneficial for generating and receiving circularly polarized EM waves and are
comparatively less sensitive to exact positioning. CP is habitually produced by
stimulating two nearly degenerate orthogonal resonant modes of equal amplitude,
so if CP is generated by a monopole antenna, its performance may be significantly
improved. CP antennas can also provide polarization diversity by creating both
left-hand circular polarization (LHCP) and right-hand circular polarization
(RHCP).
Circular polarization can be generated by a single feed with a slotted loop for
L-band communication [1]; there, an extremely large antenna achieved an IBW and
ARBW of 11.1% each (140 MHz, f_c = 1.262 GHz). A rectangular microstrip antenna
of size 24 × 16 × 1.5875 mm³ with a slotted ground plane attained linearly
polarized (LP) IBWs of 5.125–5.395 and 5.725–5.985 GHz [2]. A planar antenna with
the large dimensions 30 × 30 × 1.6 mm³ achieved an LP IBW of 220 MHz, i.e.,
8.9% [3]. Another large triple-strip antenna of size 50 × 50 × 1.6 mm³ gave dual
ARBWs of 70 MHz at the lower band (1.57 GHz) and 60 MHz at the upper band
(2.33 GHz) within an IBW spanning 1.43–3.29 GHz [4]. A further design concept for
obtaining both wide IBW and wide ARBW has been discussed in Ref. [5].
The papers cited above produced only dual or triple CP bands with narrower IBW.
In this paper, the achieved IBW is very wide, the octal CP bands are numerous
compared to earlier reports, and the structure of the antenna is very simple. The
proposed CPW-fed antenna generates an ultra-wide band with superior impedance
matching over a wide frequency range and is able to excite multiple CP bands. It
simultaneously offers reasonably higher gain, wider bandwidth, and multi-CP
characteristics in comparison to the earlier cited antennas. The scarcity of
antenna designs generating octal CP bands motivated us to focus our work on
planning a compact antenna giving octal or more CP bands.
However, TCM analysis is lacking in previously reported literature on
wideband/UWB antennas. Here, the proposed CP antenna also utilizes TCM analysis
tools [6] to explain the broad impedance band and the octal CP band response.
In this paper, the proposed antenna is designed using Eqs. (i)–(viii), following
some related existing designs [7–9]. The primary goal of this work was to design
a multi-CP-band antenna for small-form-factor devices: a circularly polarized
compact planar monopole antenna with a single feed for octal-band CP
applications, which would remove the need for multiple circularly polarized
antennas.
The implemented antenna is planned taking 1.5 GHz as the theoretical lower
resonating frequency, so that it can cover all of the Wi-Fi, WLAN, and UWB bands.
After optimization, the simulated impedance bandwidth of the proposed antenna
still starts at 1.5 GHz with a smaller size than the theoretical one, an
excellent result fulfilling the criteria for miniaturization. Our designed
antenna exhibits octal-band CP characteristics in addition to a broad IBW: a
hexagonal ring connected with an annular ring at its leftmost corner gives wide
CP bands (AR ≤ 3 dB) inside the IBW range. To our knowledge, in comparison with
related studies, this is one of the best results achieved. An FR4-epoxy substrate
is used here, which produces some extra complications beyond 12 GHz; this places
a restriction on the proposed antenna, which cannot be used for applications
beyond the microwave frequency band. Simulation was done using ANSYS Electronics
Desktop 2020R1. For the proposed antenna, the simulated IBW spans from 1.5 GHz to
beyond 14 GHz. In addition, the simulated ARBWs for the octal bands are 310 MHz
(3.13–3.34 GHz), 310 MHz (6.45–6.76 GHz), 40 MHz (8.08–8.12 GHz), 120 MHz
(8.63–8.74 GHz), 180 MHz (9.49–9.67 GHz), 30 MHz (11.69–11.72 GHz), 40 MHz
(12.19–12.23 GHz), and 140 MHz (12.57–12.71 GHz). The size of the antenna is
55 × 56 × 1.6 mm³, and a 23.24% size reduction is possible.
The paper is organized as follows. Section 2: theory of characteristic modes
analysis; Sect. 3: antenna design procedure; Sect. 4: experimental results and
discussion; Sect. 5: conclusion.

2 Theory of Characteristic Modes Analysis (TCM)

Here, the CMA operation for the implemented antenna is demonstrated. Figure 1
shows the CMA analysis for this octal-band circularly polarized antenna.
Figure 1a shows the implemented antenna configuration and the eigenvalue versus
frequency plot of the fundamental characteristic modes. Modes 2, 3, 4, 6, 7, 8
and 10, whose eigenvalues cross zero (\lambda_n = 0), are dominant modes; no mode
is inductive (very high \lambda_n > 0), whereas modes 1, 5, and 9 are capacitive,
as they have large negative eigenvalues (\lambda_n < 0).
Figure 1b shows the characteristic angle versus frequency plot of the fundamental
characteristic modes. Here, modes 2, 3, 4, 7, 6, 8 and 10 cross the 180° axis
line at the resonant frequencies 12.36, 11.8, 10.65, 9.93, 8.27, 8.09 and
3.90 GHz, respectively, and are therefore dominant modes, whereas modes 1, 5 and
9 are non-resonant modes, as they do not cross the 180° axis line.
Similarly, Fig. 1c shows that the modal significance is large (around 1) at the
resonant frequencies of modes 2, 3, 4, 7, 6, 8 and 10, while the modal
significance of modes 1, 5 and 9 remains below 0.43, so these modes are
non-resonant.
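The three quantities plotted in Fig. 1 are related by the standard TCM definitions, modal significance MS_n = |1/(1 + j\lambda_n)| and characteristic angle \alpha_n = 180° − tan^{-1}\lambda_n, which the sketch below evaluates (the numeric eigenvalues here are illustrative, not taken from the paper):

```python
import math

def modal_significance(eigenvalue: float) -> float:
    """MS_n = |1 / (1 + j*lambda_n)|; equals 1 for a resonant mode."""
    return 1.0 / abs(complex(1.0, eigenvalue))

def characteristic_angle(eigenvalue: float) -> float:
    """alpha_n = 180 deg - atan(lambda_n); crosses 180 deg at resonance."""
    return 180.0 - math.degrees(math.atan(eigenvalue))

print(modal_significance(0.0), characteristic_angle(0.0))  # resonant: 1.0 180.0
print(modal_significance(5.0))    # inductive mode, weak radiator
print(modal_significance(-5.0))   # capacitive mode, weak radiator
```

This makes the three panels of Fig. 1 consistent by construction: a mode with \lambda_n = 0 simultaneously has a characteristic angle of 180° and a modal significance of 1, while the |\lambda_n| of modes 1, 5 and 9 is large enough to push their modal significance below the 0.43 threshold cited above.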
Existing modes for generation of the octal CP bands:
It is well known that a resonant CP mode requires two degenerate modes of equal
amplitude with a 90° phase difference. Here, the CMA analysis of the antenna is
done without the feeding structure; the substrate and ground plane are considered
infinite, and the radiator is modeled as a zero-thickness PEC [10, 11].
Figure 2a–h depicts the modal far-field radiation patterns of the radiator for
the modes at their CP resonating frequencies. Figure 2a shows that modes 10 and 8
are the fundamental modes in the x-direction and y-direction, respectively, and
radiate in the +z-direction at f_{c1} = 3.2 GHz. Circular polarization is
produced because these two orthogonal modes have a phase difference of 90°.
Figure 2b shows that modes 10 and 6 are the fundamental modes in the y-direction
and x-direction, respectively, and radiate in the +z-direction at
f_{c2} = 6.6 GHz. CP is produced because this pair of orthogonal modes has a
phase difference of 90°. Modes 8 and 7 lead to cancelation of the electric field
in the far-field zone in the +z-direction.

Fig. 1 TCM analysis for seven modes: a eigenvalues, b characteristic angle, and
c modal significance

[Figure: modal current distributions and far-field patterns at the CP resonating
frequencies (a) f_{c1} = 3.2 GHz, (b) f_{c2} = 6.6 GHz, (c) f_{c3} = 8.1 GHz,
(d) f_{c4} = 8.7 GHz, (e) f_{c5} = 9.5 GHz, (f) f_{c6} = 11.7 GHz,
(g) f_{c7} = 12.2 GHz, (h) f_{c8} = 12.6 GHz.]

Fig. 2 Modal distribution of current and modal field (far-field radiation
pattern) for the modes at the CP resonating frequencies
Figure 2c shows that modes 8 and 6 are the dominant modes in the y-direction and
x-direction, respectively, and radiate in the +z-direction at f_{c3} = 8.1 GHz.
CP is formed because this pair of modes has a phase difference of 90°. Modes 10,
7, 4 and 3 lead to cancelation of the electric field in the far-field zone in the
+z-direction.

Figure 2d shows that modes 7 and 6 are the fundamental modes in the y-direction
and x-direction, respectively, and radiate in the +z-direction at
f_{c4} = 8.7 GHz. Again, CP is produced because these two orthogonal modes have a
phase difference of 90°. Modes 10, 8, 4 and 3 lead to cancelation of the electric
field in the far-field zone in the +z-direction.
Figure 2e shows that modes 7 and 6 are the fundamental modes in the y-direction
and x-direction, respectively, and radiate in the +z-direction at
f_{c5} = 9.5 GHz. Here also, CP is produced because these two modes are
orthogonal with a 90° phase difference. Modes 10, 8, 4 and 3 lead to cancelation
of the electric field in the far-field zone in the +z-direction.
Figure 2f shows that modes 4 and 3 are the fundamental modes in the y-direction
and x-direction, respectively, and radiate in the +z-direction at
f_{c6} = 11.7 GHz. Circular polarization is produced because these two orthogonal
modes have a phase difference of 90°. Modes 10, 8, 7, 6 and 2 lead to cancelation
of the electric field in the far-field zone in the +z-direction.
Figure 2g shows that modes 3 and 2 are the fundamental modes in the x-direction
and y-direction, respectively, and radiate in the +z-direction at
f_{c7} = 12.2 GHz. Circular polarization is produced because these two orthogonal
modes have a phase difference of 90°. Modes 10, 8, 7, 6 and 4 lead to cancelation
of the electric field in the far-field zone in the +z-direction.
Figure 2h shows that modes 7 and 3 are the fundamental modes in the x-direction
and y-direction, respectively, and radiate in the +z-direction at
f_{c8} = 12.6 GHz. Circular polarization is produced because these two orthogonal
modes have a phase difference of 90°. Modes 10, 8, 6, 4 and 2 lead to cancelation
of the current in the far-field zone in the +z-direction.

3 Antenna Design Procedure

A. Antenna configuration
The design of the antenna is shown in Fig. 3. The antenna is built on an FR4-epoxy substrate with relative permittivity εr = 4.4 and loss tangent tan δ = 0.02. The overall size of the antenna is 55 × 56 × 1.6 mm³. As Fig. 3 shows, a 50 Ω feed line of length Lf and width Wf is coupled to an impedance transformer. The hexagonal ring monopole is joined at its left corner to an annular ring; the width Ws is the same for both rings. As an alternative to a multi-fed structure, the designed single-fed antenna is based on a dual-loop monopole structure without additional feeding parts to produce the 90° phase difference between the two orthogonally polarized modes. In evaluating the antenna's performance, the radiator can be regarded as a combination of two perturbed rings; this perturbation causes the radiation of CP waves in the desired bands. To widen the CP bandwidth, the monopole has been shifted from the center of the feed line to the left. Under the condition of a fixed
Fig. 3 Geometry of the proposed antenna (Wsub = 56, Lsub = 55, h = 1.6, L1 = 15.5, R1 = 14.3, R2 = 7.75, g = 0.9, Lt = 13.4, Wt = 1.8, Lf = 8.6, Wf = 3, Ws = 1.8, d = −1.4; dimensions in mm)

antenna size, the parameters Ws, d, Wsub, and L1 are the key parameters affecting the bandwidth of the proposed antenna.
B. Design of the antenna parameters at the resonating frequency (fr1) of 1.5 GHz [7–9]:
Considering the dielectric constant εr = 4.4 and thickness h = 1.6 mm of the FR-4 epoxy substrate, and the resonant frequency fr1 = 1.5 GHz, the conventional design procedure gives the theoretical design parameters of the antenna through the following equations.
I. Width of the patch (W):

$$W = \frac{1}{2 f_{r1}\sqrt{\mu_0\varepsilon_0}}\sqrt{\frac{2}{\varepsilon_r+1}} \qquad (i)$$

II. Effective dielectric constant (εreff):

$$\varepsilon_{reff} = \frac{\varepsilon_r+1}{2} + \frac{\varepsilon_r-1}{2}\left(1+12\frac{h}{W}\right)^{-1/2} \qquad (ii)$$

III. Guided wavelength (λg):

$$\lambda_g = \frac{1}{f_{r1}\sqrt{\mu_0\varepsilon_0}} \times \frac{1}{\sqrt{\varepsilon_{reff}}} \qquad (iii)$$

IV. Effective length (Leff):

$$L_{eff} = \frac{\lambda_g}{2} \qquad (iv)$$

V. Length extension (ΔL):

$$\frac{\Delta L}{h} = 0.412\,\frac{(\varepsilon_{reff}+0.3)\left(\frac{W}{h}+0.264\right)}{(\varepsilon_{reff}-0.258)\left(\frac{W}{h}+0.8\right)} \qquad (v)$$

VI. Actual length of the patch (L):

$$L = L_{eff} - 2\Delta L \qquad (vi)$$

VII. Length of the substrate:

$$L_{sub} = L + 6h \qquad (vii)$$

VIII. Width of the substrate:

$$W_{sub} = W + 6h \qquad (viii)$$

Substituting εr = 4.4 and fr1 = 1.5 GHz yields Lsub = 56.98 mm and Wsub = 70.42 mm. After optimization, Lsub = 55 mm and Wsub = 56 mm; thus a 23.24% size reduction is achieved at the same lower resonating frequency of 1.5 GHz.
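Steps (i)–(viii) can be collected into a short script. The sketch below (the function name and structure are our own) uses c = 1/√(μ0 ε0) and reproduces the pre-optimization values quoted in the text, Lsub = 56.98 mm and Wsub = 70.42 mm:

```python
import math

def patch_design(eps_r, h_mm, fr_hz):
    """Conventional microstrip patch design procedure, Eqs. (i)-(viii)."""
    c = 2.99792458e8                       # 1/sqrt(mu0*eps0), in m/s
    h = h_mm * 1e-3
    W = c / (2 * fr_hz) * math.sqrt(2 / (eps_r + 1))                         # (i)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5  # (ii)
    lam_g = c / (fr_hz * math.sqrt(eps_eff))                                 # (iii)
    L_eff = lam_g / 2                                                        # (iv)
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))                                 # (v)
    L = L_eff - 2 * dL                                                       # (vi)
    return {"L_sub_mm": (L + 6 * h) * 1e3,                                   # (vii)
            "W_sub_mm": (W + 6 * h) * 1e3}                                   # (viii)

print(patch_design(4.4, 1.6, 1.5e9))
# L_sub ~ 56.98 mm and W_sub ~ 70.42 mm, the pre-optimization values above
```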
C. Operating principle
Figure 4 depicts the step-by-step improvement process of the antenna. Figures 5 and 6 compare the simulated IBW and ARBW at each stage of the proposed antenna.
Antenna 1 uses a hexagonal patch with a quarter-wave impedance transformer and a CPW-fed ground plane [12–15]. From Figs. 5 and 6, it is clear that the impedance bandwidth is good but the ARBW is very poor. To improve the IBW and ARBW further, a hexagonal ring is used in place of the hexagonal patch, but it still does not satisfy the 3 dB ARBW criterion. Therefore, in order to create the perturbation

Fig. 4 Four stages in the development of the proposed antenna: a Antenna 1, b Antenna 2, c Antenna 3, d Antenna 4

Fig. 5 Reflection coefficient (simulated) for the proposed antenna improvement process

Fig. 6 Axial ratio bandwidth (simulated) for the proposed antenna improvement process

of the electric field, the hexagonal ring is shifted to the left of the center by a distance d. From Fig. 6, it can be seen that the ARBW improves in the higher frequency band. To further improve the ARBW in the lower band as well, an annular ring is added at the left corner of the hexagonal ring. From Figs. 5 and 6, Antenna 4 shows the greatest improvement in IBW and ARBW compared with the other structures. Since its results were better than those of the related designs, Antenna 4 was chosen for the final design.
D. Parametric study
To obtain the optimized dimensions of the monopole antenna, the performance under various parameter variations is analyzed. The CPW-fed ground plane length was studied first, as presented in Fig. 7.

Additionally, a significant characteristic of the designed antenna is the effect on impedance matching of the coupling between the CPW-fed ground plane and the patch. For this reason, the effects of the CPW-fed ground length

Fig. 7 Return loss (simulated) versus frequency for the proposed monopole antenna for various values of L1

Fig. 8 Return loss (simulated) versus frequency for the proposed monopole antenna for various values of Ws

L1 = 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, and 16.0 mm on the performance of the proposed monopole antenna were studied, as presented in Fig. 7. The results clearly demonstrate that the lowest resonant frequency shifts toward the higher band as the height L1 decreases, while the higher resonant frequencies also shift upward and gradually disappear. This is because increasing the feed gap height considerably increases the total parallel capacitance, lowers the quality factor, and raises the resonance frequency [16]. The IBW changes considerably with this length because the impedance matching is sensitive to the feed gap. The ground plane, serving as an impedance-matching circuit, adjusts the input impedance and the operating bandwidth as the length varies. In summary, the optimized length is found to be L1 = 15.5 mm.

Figure 8 shows the simulated IBW for different values of the strip width Ws. The graph shows that the change in IBW is negligible, whereas Ws has a strong impact on the ARBW, as shown in Fig. 9; Ws = 1.8 mm gives a wide CP band owing to better impedance matching at this value.
Figure 10 depicts the IBW for varying d. Again the change in IBW is almost negligible, while d strongly influences the ARBW, as shown in Fig. 11; d = −1.4 mm gives a wide CP band, since this position perturbs the electric field into two components of equal amplitude with a 90° phase difference.
Figure 12 depicts the IBW for varying substrate width. The change in IBW is negligible, while the effect on the ARBW is large, as shown in Fig. 13; Wsub = 56 mm gives a wide CP band through better impedance matching.

Fig. 9 ARBW (simulated) versus frequency for the proposed monopole antenna for various values of Ws

Fig. 10 Return loss (simulated) versus frequency for the proposed monopole antenna for various values of d

Fig. 11 ARBW (simulated) versus frequency for the proposed monopole antenna for various values of d

Fig. 12 Return loss (simulated) versus frequency for the proposed monopole antenna for various values of Wsub

Fig. 13 ARBW (simulated) versus frequency for the proposed monopole antenna for various values of Wsub

4 Experimental Results and Discussion

Simulations were performed using ANSYS Electronics Desktop 2020R1. From Fig. 14 it is evident that the implemented antenna has a simulated −10 dB IBW from 1.5 GHz to beyond 14.0 GHz, i.e., an IBW of 12.5 GHz (161% with respect to the center frequency of 7.75 GHz). The simulated VSWR of the implemented antenna is below 2 over the whole IBW region, as shown in Fig. 15.
Figure 16 shows the simulated ARBW for the proposed antenna, which exhibits eight CP bands within the IBW. The CP bands are 310 MHz (3.13–3.34 GHz, fc = 3.2 GHz), 310 MHz (6.45–6.76 GHz, fc = 6.6 GHz), 40 MHz (8.08–8.12 GHz, fc = 8.1 GHz), 120 MHz (8.63–8.74 GHz, fc = 8.7 GHz), 180 MHz (9.49–9.67 GHz, fc = 9.5 GHz), 30 MHz (11.69–11.72 GHz, fc = 11.7 GHz), 40 MHz (12.19–12.23 GHz, fc = 12.2 GHz), and 140 MHz (12.57–12.71 GHz, fc = 12.6 GHz), respectively, which makes the antenna useful for wireless communication applications.

Fig. 14 Reflection coefficient (simulated) of the proposed antenna

Fig. 15 Simulated VSWR of the proposed antenna

Fig. 16 Simulated ARBW of the proposed antenna

Fig. 17 Radiation patterns (simulated) for the proposed antenna in the a XZ (ϕ = 0°) and b YZ (ϕ = 90°) planes
Figure 17 shows the radiation patterns simulated in the XZ (ϕ = 0°) and YZ (ϕ = 90°) planes at 3.2 GHz. The simulated patterns in Fig. 17a, b show that the cross-polarization levels are 19 dB lower than the co-polarization levels in the broadside direction.
Figure 18 shows that the simulated radiation pattern is RHCP in the broadside direction; the same behavior appears at the other CP resonating frequencies (fc). When the structure is mirrored, the opposite polarization is obtained.
The simulated peak gain of the implemented antenna is 8.37 dBi at 12.7 GHz, as illustrated in Fig. 19, and the peak gains at the center frequencies of the CP bands are −2.35 dBi at 3.2 GHz, 1.25 dBi at 6.6 GHz, 3.55 dBi at 8.1 GHz, 3.62 dBi at 8.7 GHz, 3.79 dBi at 9.5 GHz, 4.76 dBi at 11.7 GHz, 5.97 dBi at 12.2 GHz, and 7.74 dBi at 12.6 GHz, respectively.
Figure 20 shows the simulated radiation efficiency of the implemented antenna versus frequency. Over all the CP bands, the simulated efficiency lies between 65 and 98%. The highest efficiency is 97.87% at 1.6 GHz, and the efficiencies at the center frequencies of the CP bands are 95.68% at 3.2 GHz, 94.26% at 6.6 GHz, 90.25% at 8.1 GHz, 87.25% at 8.7 GHz, 85.16% at 9.5 GHz, 72.21% at 11.7 GHz, 70.29% at 12.2 GHz, and 69.74% at 12.6 GHz, respectively, which is very good for practical purposes.

Fig. 18 Simulated radiation patterns (LHCP and RHCP) at 3.2 GHz in the a XZ (ϕ = 0°) and b YZ (ϕ = 90°) planes

Fig. 19 Peak gain (simulated) for the proposed antenna

Fig. 20 Simulated radiation efficiency for the proposed antenna
Table 1 compares the proposed antenna with recently reported multiband antennas [17–19]. The proposed design demonstrates the widest impedance band and octal CP characteristics, together with a compact size and good gain.

Table 1 Comparison of the proposed antenna with recently developed multiband antennas

Ref. (year) | Antenna size | IBW | ARBW at center frequency fc | No. of CP bands
15 (2017) | 21 × 21 × 0.8 mm³ | 1.7127–2.0393 GHz, 2.303–2.509 GHz, 3.235–3.843 GHz, 4.933–5.629 GHz, 7.5–7.7 GHz | 1.65–1.95 GHz at 1.9 GHz | 01
16 (2019) | 0.26λ0 × 0.26λ0 × 0.024λ0 (λ0 is the wavelength at 1.2 GHz) | 2.5% at 1.575 GHz, 2.2% at 1.227 GHz, 1.38% at 1.176 GHz, 2.7% at 2.3 GHz | 1.1% at 1.575 GHz, 1.0% at 1.227 GHz, 4.1% at 1.176 GHz, 1.5% at 2.3 GHz | 04
17 (2018) | 47 × 40 × 1.57 mm³ | 2.1–2.87 GHz, 2.91–3.08 GHz, 3.39–3.95 GHz, 4.08–4.51 GHz, 4.62–4.67 GHz, 4.68–6.96 GHz | 2.78–2.87 GHz at 2.84 GHz, 2.89–2.96 GHz at 2.91 GHz, 3.87–3.99 GHz at 3.89 GHz, 5.07–5.1 GHz at 5.085 GHz, 5.34–5.40 GHz at 5.378 GHz, 5.50–5.53 GHz at 5.51 GHz, 5.59–5.65 GHz at 5.624 GHz | 07
Our work | 56 × 55 × 1.6 mm³ | 1.5 GHz to beyond 14 GHz | 3.13–3.34 GHz at 3.2 GHz, 6.45–6.76 GHz at 6.6 GHz, 8.08–8.12 GHz at 8.1 GHz, 8.63–8.74 GHz at 8.7 GHz, 9.49–9.67 GHz at 9.5 GHz, 11.69–11.72 GHz at 11.7 GHz, 12.19–12.23 GHz at 12.2 GHz, 12.57–12.71 GHz at 12.6 GHz | 08

5 Conclusion

An octal-band CPW-fed monopole antenna has been presented. The designed antenna is uncomplicated and simple to fabricate. In addition, the implemented antenna achieves a wide impedance bandwidth along with eight CP bands, satisfying the requirements of current multiband CP wireless communication devices. Through the asymmetric CPW-fed technique, only modes with symmetric currents can be excited. These modes maintain a constant phase difference over their respective wide frequency ranges, generating the eight CP bands. This approach offers great convenience for octal-band CP antenna design, since the radiator and the feeding configuration can be designed separately. The TCM tool provides a much simpler way to design a CP antenna compared with earlier techniques.

References

1. Qing X, Chia YWM (1999) A novel single-feed circular polarized slotted loop antenna. In:
Antennas and propagation society international symposium, vol. 1. IEEE, pp 248–251, July
1999
2. Chakraborty U, Kundu A, Chowdhury SK, Bhattacharjee AK (2014) Compact dual-band
microstrip antenna for IEEE 802.11 a WLAN application. IEEE Antennas Wirel Propag Lett
13:407–410
3. Suma MN, Raj RK, Joseph M, Bybi PC, Mohanan P (2006) A compact dual band planar
branched monopole antenna for DCS/2.4-GHz WLAN applications. IEEE Microwave Wirel
Compon Lett 16(5):275–277
4. Hsu CW, Shih MH, Wang CJ (2016) A triple-strip monopole antenna with dual-band circular
polarization. In: Antennas and propagation (APCAP), 2016 IEEE 5th Asia-Pacific conference
on IEEE, pp 137–138, July 2016
5. Wu JW, Ke JY, Jou CF, Wang CJ (2010) A microstrip-fed broadband circularly polarized
monopole antenna. IET Microw Antennas Propag 4(4):518–525
6. Dhara R, Yadav S, Sharma MM, Jana SK, Govil MC (2021) A circularly polarized quad-
band annular ring antenna with asymmetric ground plane using theory of characteristic modes.
Progress Electromagn Res M 100:51–68. https://doi.org/10.2528/PIERM20102006
7. Balanis CA (2016) Antenna theory: analysis and design. Wiley, New York. ISBN-978-1-118-
64206
8. Pozar DM (1992) Microstrip antennas. Proc IEEE 80(1):79–91. https://doi.org/10.1109/5.119568
9. Guo Y-X, Bian L, Quan Shi X (2009) Broadband circularly polarized annular-ring microstrip
antenna. IEEE Trans Antennas Propag 57(8):2474–2477. https://doi.org/10.1109/TAP.2009.2024584
10. Dhara R, Mitra M (2020) A triple-band circularly polarized annular ring antenna with asym-
metric ground plane for wireless applications. Eng Rep 2(4):e12150. https://doi.org/10.1002/eng2.12150
11. Dhara R, Jana SK, Mitra M (2020) Tri-band circularly polarized monopole antenna for wireless
communication application. Radioelectron Commun Syst 63(4):213–222
12. Dhara R (2020) Quad-band circularly polarized CPW-fed G-shaped printed antenna with square
slot. Radioelectron Commun Syst 63(7):376–385
13. Dhara R, Jana SK, Mitra M (2020) CPW-fed triple-band circularly polarized printed inverted
C-shaped monopole antenna with closed-loop and two semi-hexagonal notches on ground
plane. In: Optical and wireless technologies. Springer, Singapore, pp 161–175
14. Dhara R, Kundu T (2020) A compact inverted Y-shaped circularly polarized wideband
monopole antenna with open loop. Eng Rep 2:e12326. https://doi.org/10.1002/eng2.12326
15. Garbacz R, Turpin R (1971) A generalized expansion for radiated and scattered fields. IEEE
Trans Antennas Propag 19(3):348–358. https://doi.org/10.1109/TAP.1971.1139935
16. Alam M, Kanaujia BK, Beg MT, Kumar S, Rambabu K (2019) A hexa-band dual-sense
circularly polarized antenna for WLAN/Wi-MAX/SDARS and C-band applications. Int J RF
Microwave Comput Aided Eng 29(4):e21599

17. Chen Y, Wang CF (2015) Characteristic modes: theory and applications in antenna engineering.
Wiley, New York
18. Pedram K, Nourinia J, Ghobadi C, Karamirad M (2017) A multiband circularly polarized
antenna with simple structure for wireless communication system. Microwave Opt Technol
Lett 59(9):2290–2297
19. Falade OP, Ur-Rehman M, Yang X, Safdar GA, Parini CG, Chen X (2020) Design of a compact
multiband circularly polarized antenna for global navigation satellite systems and 5G/B5G
applications. Int J RF Microwave Comput Aided Eng 30(6):e22182
Massive MIMO Pre-coders for Cognitive
Radio Network Performance
Improvement: A Technological Survey

Mayank Kothari and U. Ragavendran

Abstract Cognitive radio (CR) is the most reliable technology for efficient spectrum usage. In a cognitive radio network (CRN), primary users share their frequency band with secondary users: secondary users relay the traffic of the primary users, while the primary users grant the secondary users restricted access to the spectrum. When a concurrent communication link is established between primary and secondary users, interference between the two links reduces the performance of the CRN. Multiple-input multiple-output (MIMO) techniques can overcome the inter-user interference (IUI) of a CRN in underlay mode and establish concurrent communication links. Massive MIMO systems utilize a large number of antennas at the base station to concurrently serve a group of user equipments (UEs) in the same frequency band. This technology will help improve channel capacity and throughput in 5G and beyond-5G wireless communication systems, and it can achieve low power consumption at the base station. In massive-MIMO-based CRNs, pre-coding techniques help mitigate inter-user interference when transmitting information to primary and secondary users at the same time and frequency. In this review, pre-coding techniques are classified into linear, nonlinear, and constant-envelope pre-coding, where truncated polynomial expansion, a linear pre-coder, gives better performance than zero-forcing, minimum mean square error, and regression-based linear pre-coders.

Keywords Adaptive massive MIMO · Intelligence CRN · Dynamic spectrum utilization · Supervised pre-coding · IUI coordination

M. Kothari (B) · U. Ragavendran
SVKM's NMIMS (Deemed To Be University), Shirpur, India
e-mail: [email protected]

U. Ragavendran
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication, Lecture Notes in Electrical Engineering 749, https://doi.org/10.1007/978-981-16-0289-4_16

1 Introduction

Everything is going wireless, whether it is mobile phones, earphones, or smart watches connected to our phones via Bluetooth. Wireless communication technology has gained much attention in recent years. With the number of mobile subscriptions exceeding the world's population and real-time streaming of bandwidth-hungry video, there is a huge need for wireless communication systems that can efficiently utilize the frequency spectrum as well as enhance data rate and throughput. Wireless traffic has grown rapidly over a long period of time, and this trend will continue, driven by new innovations and technologies such as artificial intelligence, the Internet of things, and augmented reality. Researchers are investigating new techniques for the next-generation mobile networks (NGMN), also known as fifth generation (5G), which will incorporate technologies such as evolving radio access techniques, hyper-dense small-cell deployment, self-organizing networks, machine-type communication, and cognitive radio for dynamic spectrum sharing, and which have great potential to fulfill the demands of future applications and services [1]. Cognitive radio (CR) in NGMN is foreseen to enable major improvements, such as dynamic radio spectrum access in accordance with local conditions and spectrum utilization. The enormous complexity of 5G, arising from constrained conditions, heterogeneous networking, and the availability of multiple access techniques, cannot be handled optimally by manual input or by conventional algorithms that lack adaptability to environmental changes. Hence, incorporating CR into 5G will deliver adaptability and manageability under those constrained conditions with greatly reduced human intervention [2].
The cognitive radio network (CRN) is the most reliable technology for enhancing the utilization of the frequency spectrum [3]; restricted mutual interference among users and flexible coexistence with different radio access technologies are its principal advantages [4]. The objective of cognitive radio is that the primary (licensed) user grants spectrum access dynamically to the secondary (unlicensed) user under the condition that the quality of service of the primary users (PUs) must not be degraded by interference from the secondary users (SUs) [5]. Spectrum overlay, underlay, and interweave are the three basic approaches used for concurrent communication in a CRN [6]. In the underlay approach, one prominent solution for reducing IUI is to exploit spatial diversity using multiple-antenna techniques [7]. This not only reduces interference but also gives the SU an opportunity to enhance its data rate.
Multiple-antenna systems are universally adopted in wireless communication standards such as UMTS (two antennas), 4G LTE-A (eight antennas), and 802.11n [8]. Massive MIMO technology has great potential for improving transmission capacity by exploiting the benefits of spatial multiplexing [9]. Pre-coding can maximize the data rate of multiple transmission streams at the transmitter by suppressing the interference between different users in a massive MIMO CRN [10].

2 System Background

2.1 Cognitive Radio Network (CRN)

According to the Federal Communications Commission (FCC), rigid spectrum assignment for licensed services is highly inefficient: a few portions of the spectrum are highly congested, while other licensed bands remain unused 90% of the time [11, 12]. This inefficiency of spectrum assignment has opened a new research direction. An intelligent CRN can find congested and free spectrum and dynamically select its transmission waveform, channel access approach, and networking protocol to meet the required network quality and application needs [13]. The functionality of a CRN can be understood with the help of Fig. 1.
In spectrum sensing, the CRN periodically searches for available spectrum holes to share with the SUs; spectrum holes are unutilized or underutilized frequency channels during the sensing period. Spectrum decision selects the best channel from among the spectrum holes based on the quality-of-service requirements of the SUs; the channel selection should also account for channel state information, channel losses, opportunistic scheduling, and interference avoidance at the PU. Spectrum sharing coordinates the access of the SUs by avoiding mutual interference between users. The spectrum can be shared in interweave, overlay, and underlay modes. In interweave mode, a channel left vacant by an idle PU is utilized by the secondary network, while in underlay mode the SUs can
Fig. 1 Functions of cognitive radio networks [2]



communicate simultaneously with the PUs under an interference temperature limit. If the signal level of an SU exceeds this limit, it produces interference in the communication of the PUs. Based on interference, two types of CR network may be established: the non-interfering CRN and the interference-tolerant CRN.
A non-interfering cognitive radio works in interweave mode. In this mode, the SUs produce no interference that affects the communication of the primary systems; they only reuse spectrum holes not currently used by the PUs. The wireless regional area network (WRAN) is an example of a non-interfering CRN. In an interference-tolerant CRN, the primary systems share the whole spectrum with the SUs. In this kind of network, the SUs must keep their signal level below the interference temperature limit without causing any interruption of primary operation; primary network intervention is required to determine the interference temperature limit. Figure 2 shows the different types of interference that may exist in an interference-tolerant CRN.
To improve performance, interference mitigation techniques are required that can mitigate interference among PUs, among SUs, and between PUs and SUs. The interference mitigation techniques categorized in Table 1 can be applied at the transmitter or at the receiver: mitigation applied at the receiver is called interference cancelation or suppression, while mitigation applied at the transmitter is known as interference avoidance.
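The underlay constraint described above can be sketched as a simple transmit-power cap at the SU. In this sketch, the symbols g_sp (SU-to-PU channel gain) and i_max_w (interference temperature limit, in watts) are illustrative assumptions, not quantities defined in this survey:

```python
def max_su_power(p_desired_w, g_sp, i_max_w):
    """Cap the SU transmit power so that the interference it creates
    at the PU receiver (p_tx * g_sp) stays below the limit i_max_w."""
    cap = i_max_w / g_sp          # largest transmit power respecting the limit
    return min(p_desired_w, cap)

# SU wants 1 W; channel gain toward the PU is 0.01; the PU tolerates 5 mW
print(max_su_power(1.0, 0.01, 5e-3))   # 0.5 W
```

Any power below this cap keeps the SU within the interference temperature limit, which is the condition for underlay coexistence described in the text.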

Fig. 2 Types of interference in a CRN [14]



Table 1 Interference cancelation and avoidance techniques [15]

Region | Techniques
White space | No cancelation needed
Gray and black space | Cancelation techniques at the SU receiver: receive beamforming; filtering; transform-based; cyclostationarity
All regions | Avoidance at the SU transmitter: spectrum shaping; predistortion filtering; spread spectrum; transmit beamforming (for a single data stream); pre-coding (for multiple data streams)

2.2 MIMO System

In wireless communication, information is transmitted over distances from a few meters to thousands of kilometers without any physical connection through electrical conductors, wires, or cables. Voice/data connections in wireless communication have followed Martin Cooper's law, doubling every 2.5 years since the beginning of wireless. For reliable wireless communication, the biggest challenge is to protect the transmitted signal from fading and shadowing caused by multipath propagation and by large obstacles between transmitter and receiver. The reliability of a wireless communication system can be improved by improving its throughput, which depends on bandwidth, cell density, and spectral efficiency.

Fig. 3 Classification of wireless communication systems



Wireless communication systems can be classified into single-antenna and multi-antenna systems based on the number of antennas at the transmitter and receiver, as shown in Fig. 3. Multi-antenna systems can be divided into three types. Single-input multiple-output (SIMO) systems have one antenna at the transmitter and more than one at the receiver, whereas multiple-input single-output (MISO) systems have multiple antennas at the transmitter and a single antenna at the receiver. MIMO systems, which have multiple antennas at both the transmitter and the receiver, have attracted much attention because their channel capacity increases linearly with the number of antennas, whereas in SIMO and MISO systems the capacity increases only logarithmically [16]. In an SU-MIMO system, the multiple antennas of the BS serve a single user at any instant, whereas in an MU-MIMO system they are shared by more than one user at a time. If the ratio between the number of antennas at the BS and the number of users connected to it exceeds eight, the system is termed a massive MIMO system. Massive MIMO supports time division duplexing to exploit channel reciprocity [17], while MU-MIMO supports both time division and frequency division duplexing. Another advantage of massive MIMO over MU-MIMO is that the link quality does not vary with time and frequency [18].
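The linear-versus-logarithmic capacity growth mentioned above can be illustrated with the classical Shannon expressions: an idealized N × N MIMO link with full multiplexing gain gives C ≈ N log2(1 + SNR), while a 1 × N SIMO link with receive combining gives only C = log2(1 + N·SNR). The toy comparison below is a sketch under these idealized i.i.d.-channel assumptions, not a result from this survey:

```python
import math

def mimo_capacity(n, snr):
    """Idealized N x N MIMO: capacity grows linearly in N (multiplexing gain)."""
    return n * math.log2(1 + snr)

def simo_capacity(n, snr):
    """1 x N SIMO with receive combining: only the array SNR grows, so the
    capacity grows logarithmically in N."""
    return math.log2(1 + n * snr)

for n in (1, 2, 4, 8):
    print(n, round(mimo_capacity(n, 10), 2), round(simo_capacity(n, 10), 2))
```

Running the loop shows the MIMO figure doubling with each doubling of N while the SIMO figure creeps up by roughly one bit, which is the contrast the text draws between MIMO and SIMO/MISO systems.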

2.3 CRN MIMO

CR and MIMO are both physical-layer wireless communication technologies. CR is used across different bands to improve spectrum usage, while the MIMO concept improves the spectral efficiency of a single band by enhancing the throughput or reducing the inter-user interference (IUI) of a communication link. By combining the benefits of both technologies, a new CRN MIMO model can be developed that supports dynamic spectrum selection as well as effective utilization of the selected spectrum through spatial multiplexing [19]. To exploit the benefits of spatial multiplexing, channel state information (CSI) can be acquired with supervised or unsupervised algorithms. In the supervised approach, the channel is learned after every coherence period by transmitting pilot symbols between transmitter and receiver, which reduces system throughput and spectral efficiency [20]. In the unsupervised approach, information is extracted from the received signal without knowledge of the channel. Supervised pre-coding is designed by combining both approaches [21].
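The supervised (pilot-based) CSI acquisition described above can be sketched as a least-squares estimate over a block of known pilot symbols. The dimensions, pilot length, and noise level below are illustrative assumptions, not parameters from this survey:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx, n_pilot = 4, 2, 16

# True channel, known pilot block, and additive receiver noise
H = (rng.standard_normal((n_rx, n_tx)) +
     1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
P = (rng.standard_normal((n_tx, n_pilot)) +
     1j * rng.standard_normal((n_tx, n_pilot)))
N = 0.01 * (rng.standard_normal((n_rx, n_pilot)) +
            1j * rng.standard_normal((n_rx, n_pilot)))

Y = H @ P + N                      # received pilot block during one coherence period
# Least-squares channel estimate: H_hat = Y P^H (P P^H)^{-1}
H_hat = Y @ P.conj().T @ np.linalg.inv(P @ P.conj().T)

print(np.linalg.norm(H - H_hat) / np.linalg.norm(H))  # small relative error
```

The pilot block occupies symbols that could otherwise carry data, which is exactly the throughput cost of the supervised approach noted in the text.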

2.4 Massive MIMO Systems

A massive multiple-input multiple-output (MIMO) system utilizes the same frequency band concurrently to serve multiple user equipments (UEs) with a very large number of antenna elements at the base station (BS) [22]. The spectral efficiency and link reliability of massive MIMO are improved manyfold in contrast to existing MIMO technology [23, 24], with reductions in operational power and BS hardware costs as well [25, 26].

Fig. 4 Massive MIMO system components [28]

This advancement increases the computational complexity at the BS compared with existing systems.
Figure 4 shows the components of a massive MIMO system. Such a system generally operates in time division duplex (TDD) mode on a single frequency. TDD mode has the benefits that channel information is required only at the BS and that the uplink estimation overhead does not depend on the number of active antennas, only on the number of terminals [27].

3 Pre-coding in Massive MIMO

Pre-coding is a generalization of beamforming that enables the transmission and reception of multiple data streams with reduced interference. Transmit pre-coding aligns the transmitted signal toward the desired terminal and separates it from the signals of the other terminals during combining at the receiver. The reliability of the received signal increases with the number of antennas [29]. Pre-coding is applied at the BS to transmit multiple data streams from multiple transmit antennas with independent, appropriately chosen weightings such that the link throughput is maximized.
In massive MIMO, to estimate the CSI, each user transmits a signal containing reference frames to the BS. Let x1, x2, …, xk be the signals of k different users, each transmitted at the same time and frequency to the BS. The signal received at the BS antennas is

y = H1 x1 + H2 x2 + · · · + Hk xk (1)

where H1, H2, ..., Hk are the channel parameters applied to the signals. These parameters are utilized at the BS for estimating the channel. Let S1, S2, ..., Sk be the actual data that the BS transmits to the users. After applying pre-coding, the BS transmits Z

Z = P1 S1 + P2 S2 + · · · + Pk Sk (2)
218 M. Kothari and U. Ragavendran

where P1, P2, ..., Pk are the pre-coding vectors multiplied with the actual signals in such a way that the IUI is reduced. Assuming channel reciprocity in time division duplexing, the data received at the users are

u1 = H1^H Z + n (3)

u2 = H2^H Z + n (4)

Similarly,

uk = Hk^H Z + n (5)

where u1, u2, ..., uk are the data received by the users.
For user 1,

u1 = H1^H P1 S1 + Σ_{j≠1} H1^H Pj Sj + n (6)

where the first term is the desired signal and the second term is the inter-user interference.

The pre-coder should be designed such that

Hi^H Pj = 1 for i = j (7)

and

Hi^H Pj = 0 for i ≠ j (8)
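Conditions (7) and (8) are met, for instance, by a zero-forcing pre-coder built from the channel matrix. A minimal NumPy sketch with illustrative dimensions (not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 64, 4          # BS antennas and single-antenna users (illustrative)

# Channel: column k is user k's channel vector h_k
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

# Zero-forcing pre-coder P = H (H^H H)^{-1}, so that h_i^H p_j = delta_ij
P = H @ np.linalg.inv(H.conj().T @ H)

G = H.conj().T @ P    # effective gain matrix, ideally the identity
assert np.allclose(G, np.eye(K), atol=1e-10)

# Transmit Z = sum_k p_k s_k as in Eq. (2); each user recovers its own symbol
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
Z = P @ s
u = H.conj().T @ Z    # noiseless reception, Eqs. (3)-(5) with n = 0
assert np.allclose(u, s, atol=1e-10)
```

With the interference nulled, each user's received sample equals its own data symbol up to noise.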

Pre-coding techniques may be categorized as cooperative and non-cooperative pre-coding, or as linear, nonlinear, and constant envelope pre-coding. Linear pre-coding is preferable over nonlinear pre-coding as the number of antennas at the BS increases. Zero forcing pre-coding, minimum mean square error [30], maximum ratio transmission [31], and conjugate beamforming are linear pre-coding techniques, while dirty paper pre-coding (DPC) is one of the popular nonlinear techniques. Nonlinear pre-coding adequately disposes of the interference between various clients and achieves near-ideal performance. The computational complexity of a linear pre-coder is less than that of a nonlinear pre-coder, but nonlinear pre-coding has less capability to enhance the data rate as compared to linear pre-coding. In the constant envelope (CE) pre-coding scheme, the amplitude of the transmitted signal is restricted to a certain level to reduce power. In this technique, a CE signal is transmitted by each antenna, and the phases are used at the receiver to regenerate the desired information signals [32].
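The CE constraint itself is easy to visualize: every antenna transmits with the same fixed amplitude, and only the per-antenna phases vary. The sketch below keeps only the phases of a conventionally pre-coded vector; it is a rough illustration of the constraint, not the optimized CE designs of [32]:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 32                              # BS antennas (illustrative)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # e.g. a ZF pre-coded vector

P_total = 1.0                       # total transmit power budget (illustrative)
# Constant-envelope signal: every antenna radiates the same amplitude,
# so only the per-antenna phases carry the information.
x_ce = np.sqrt(P_total / M) * np.exp(1j * np.angle(x))

assert np.allclose(np.abs(x_ce), np.sqrt(P_total / M))     # constant envelope
assert np.isclose(np.linalg.norm(x_ce) ** 2, P_total)      # power budget met
```

The fixed per-antenna amplitude is what allows highly efficient, nonlinear power amplifiers to be used at each antenna.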

3.1 Linear Pre-coding Techniques

A simplified linear pre-coding technique for massive MIMO systems is implemented with pre- or post-processing, where the bit error rate (BER) is determined for 32 × 8, 64 × 8, and 128 × 8 massive MIMO systems using pre-coding. The zero forcing transmitter (ZFT) and matched filter bound (MFB) give the same performance during interference cancelation, but they differ during post-processing due to noise. Under the same frequency, the zero forcing (ZF) algorithm depends on the pseudo-inverse of the channel matrix, while MRT/MRC and EGT/EGC depend on the generated interference, which can be removed by an iterative interference canceller [33].
The throughput of conventional regularized zero forcing (RZF) pre-coding is high, but the required channel matrix inversion carries high computational and implementation complexity. Truncated polynomial expansion (TPE) is a low complexity linear pre-coding scheme used to improve the complexity and throughput of massive MIMO systems. TPE pre-coding enables simple hardware implementation by approximating the inversion with a truncated polynomial expansion. For comparing the complexity of RZF and TPE pre-coding, operations are divided into two timescales: the first includes operations that take place only once per coherence period, and the second includes operations performed every time in the downlink transmission. As per the analysis, RZF pre-coding is more suitable when the coherence time is long relative to the number of transmit antennas and single-antenna users, while TPE pre-coding is more suitable when the coherence time is small relative to them. The performance of the RZF and TPE pre-coders is similar at low signal-to-noise ratios (SNRs) and with extensive CSI errors. For high SNR and accurate channel information, the performance of TPE pre-coding is poor in comparison with RZF pre-coding when the TPE order is low, and close to RZF pre-coding when the TPE order is high, for the single cell scenario [34].
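The TPE idea can be sketched numerically: the matrix inverse needed by RZF is replaced by a truncated matrix polynomial (a truncated Neumann series). A minimal NumPy illustration; the dimensions and the choice of the scaling factor are illustrative and not taken from [34]:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, alpha, J = 64, 8, 1.0, 30   # BS antennas, users, regularization, TPE order (illustrative)

H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
A = H.conj().T @ H + alpha * np.eye(K)        # K x K matrix that RZF must invert

# Exact RZF pre-coder: one matrix inversion per coherence period
P_rzf = H @ np.linalg.inv(A)

# TPE: A^{-1} ~= kappa * sum_{l=0}^{J} (I - kappa*A)^l  (truncated series).
# For simplicity kappa uses the exact extremal eigenvalues; practical TPE picks
# it from random-matrix approximations to avoid any eigendecomposition.
lam = np.linalg.eigvalsh(A)
kappa = 2.0 / (lam[0] + lam[-1])
A_inv_tpe = np.zeros_like(A)
term = np.eye(K, dtype=complex)
for _ in range(J + 1):
    A_inv_tpe += kappa * term
    term = term @ (np.eye(K) - kappa * A)
P_tpe = H @ A_inv_tpe

# A higher TPE order J brings the approximation closer to exact RZF
rel_err = np.linalg.norm(P_tpe - P_rzf) / np.linalg.norm(P_rzf)
assert rel_err < 1e-2
```

Each polynomial term costs only matrix–vector style operations, which is the source of the complexity saving over an explicit inversion.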
The minimum mean square error (MMSE) linear pre-coding proposed in [35] is designed under a transmit power constraint. In the simulations, the channel matrix elements are independent and identically distributed Gaussian random variables with zero mean and unit variance. The sum rate is plotted against the number of antennas for linear feedback, perfect CSI, and the scheme proposed by the authors. As per the characteristics, the sum rate increases linearly as the number of antennas grows to about 15, increases sublinearly from 15 to 250, and tends toward saturation beyond 250. The sum rate of the proposed scheme is higher than that of the other schemes [35].
A downlink linear pre-coder suited to quadrature amplitude modulation (QAM) for massive MIMO systems is designed for uniform linear arrays (ULA) and uniform planar arrays (UPA) at the BS. For maintaining orthogonality between users, a spatial virtual channel model (VCM) representation is used in the downlink channels for dividing the distant users into spatial sectors. The per-group pre-coding approach enhances the downlink gain (throughput increases by 60%, or 200% in some cases), and the pre-coding complexity of transmitter and receiver decays exponentially [36].
A low complexity compensation algorithm for tracking the outer pre-coder under a two-tier pre-coding scheme and time-varying channels attempts to battle implementation challenges such as the enormous pilot symbol and feedback overhead, the real-time global CSI requirement, and the high computational complexity raised in massive MIMO systems. In [37], the pre-coder at the BS is divided into two sections: an inner and an outer pre-coder. The function of the outer pre-coder is to reduce inter-cell and inter-cluster interference, and the function of the inner pre-coder is spatial multiplexing of intra-cluster users. An iterative algorithm decreases the computational complexity of the outer pre-coder, which is derived by determining the global optimal solution on the Grassmann manifold. A compensation algorithm is used for convergence to the global optimal solution. Plots of mean cell throughput versus SNR and mean cell throughput versus speed under different baselines show that two-tier pre-coding with the compensation algorithm achieves better throughput with lower complexity than two-tier pre-coding with block diagonalization. Key parameters of the pre-coding techniques are compared in Tables 2 and 3.
A two-stage subspace constrained pre-coder proposed for massive MIMO cellular systems uses the spatial channel correlation structure. At each BS, this pre-coder is separated into an inner pre-coder and a transmit subspace control matrix. This pre-coding technique is designed for reducing intra-cell and inter-cell interference by following a non-trivial design optimization. In this pre-coding scheme, the inner

Table 2 Pre-coding techniques comparison

Techniques | Number of users | Antennas at user terminal (mobile station) | Performance metric
Truncated polynomial expansion [38] | Multiple users | Single | User rate
Conjugate beam forming [39] | Single and multiple users | Multiple | Average rate
Zero forcing (ZF) and maximum ratio transmission [31] | Multiple users | Single | Downlink transmit power, achievable sum rate
Minimum mean-squared error [30] | Multiple users | Single | Achievable sum square error
Regularized zero forcing [34] | Multiple users | Single | Throughput, spectral efficiency
Cell edge aware [40] | Multiple users | Single | Achievable sum rate per cell, coverage probability

pre-coder is based on ZF pre-coding, while the transmit subspace control is based on quality of service (QoS) optimization. Because of the dependency of the inner ZF pre-coder on the transmit subspace control variable and the non-convex nature of that variable, the existing approximation methods face a few constraints. For improving the QoS optimization, a biconvex approximation approach is used. This approach contains three steps: an interference approximation step, an SINR chance constraint restriction step, and a semi-definite relaxation step. The performance of this pre-coding technique is compared with fractional frequency reuse (baseline 1), clustered CoMP (baseline 2), and joint spatial division and multiplexing per-group processing (JSDM-PGP) (baseline 3). This pre-coding scheme performs better than baselines 1 and 3, and also better than baseline 2 with a backhaul latency of 10 ms [41].

3.2 Nonlinear Pre-coding Techniques

The proposed low complexity hybrid pre-coder design with fewer feedback bits utilizes the block sparsity structure, which is similar to the virtual channel. This hybrid pre-coding algorithm is divided into two steps: the first step, "preliminary block-support set identification," uses greedy sequence clustering for finding the relation between the current element and existing elements, and the second step is "complete block-support set identification." Simulation results plot spectral efficiency against SNR for optimal performance, orthogonal matching pursuit, and the greedy sequence clustering-based sparse pre-coding scheme, which lessens the number of quantization bits for the analog pre-coder by arranging the columns according to the block sparsity structure [42].
The proposed full-dimensional massive MIMO systems use multi-layer pre-coding for productively managing various types of interference and exploiting the large-scale channel characteristics. A three-layer pre-coding technique is used for reducing inter-cell interference and intra-cell multiuser interference while enhancing the effective signal power. This technique gives optimal performance under one-ring channel models for cell interior users and under single-path channels for every user [43].

3.3 Constant Envelope Pre-coding Techniques

Box-constrained regression techniques are utilized for the transmission of CE signals. An exceptionally versatile systolic design is implemented for sixteen clients, with processing elements (PEs) used for pre-coding. For smaller area ratios, the CE pre-coder has a lower sum rate than ZF because of multi-client interference, but expanding the number of antennas enhances the performance of CE due to the large degree of freedom. This systolic design brings about a high throughput with a gate count of 14 K and a power consumption of 3.96 mW per antenna per user per iteration for each

Table 3 Key points of linear pre-coding techniques

Linear pre-coding technique | Remark
Minimum mean square error (MMSE) [5] | Sum rate increases until the number of antennas at the BS reaches 250 and then saturates; sum rate increases linearly up to ten users, after which the increment slows
VMC decomposition of the downlink channels [36] | Better performance as compared to MMSE
Truncated polynomial expansion [34] | Number of transmitting antennas at base station = 256; number of users = 64
PE in 65 nm technology. For hardware optimization, clock gating, hardware reuse, and pipelining techniques are used [44].
A low complexity per-antenna CE pre-coding algorithm is proposed for frequency-selective channels, which provides useful information about the power consumption of a massive MIMO system. For a Rayleigh fading channel, a 3 dB reduction in total transmit power can be achieved by doubling the number of BS antennas. The performance of the Rayleigh fading frequency-selective channel for CE pre-coding is better under a total average power constraint (TAPC) than under a uniform average power delay profile [45].
An omnidirectional pre-coding (OP)-based transmission for public channels in massive MIMO systems leads to a noteworthy reduction in the downlink pilot overhead. In this system, a Rayleigh flat-fading channel is considered. System performance is measured in terms of achievable ergodic rate, outage probability, and peak-to-average power ratio (PAPR). This pre-coding method maximizes the achievable ergodic rate and achievable diversity order and minimizes the outage probability for independent and identically distributed (i.i.d.) and spatially correlated channels in the large-scale array regime, while keeping the PAPR of the transmitted signal under control [46].
An omnidirectional pre-coding and omnidirectional combining-based synchronization technique is proposed in [47] for millimeter wave massive MIMO systems. Pre-coding matrices are devised at the BS, and combining matrices are devised at the user terminal for initial downlink synchronization. For these two matrices, constant amplitude is required, and the transmission power averaged over the total K time slots ought to be consistent for any spatial direction, because a phase shifter network in the analog domain is used to implement the architecture of both matrices. A generalized likelihood ratio test (GLRT)-based synchronization detector is used to study the false alarm (FA) and missed detection (MD) probabilities of the pre-coding and combining matrices for i.i.d. and single-path channels. The paper compares the performance of omnidirectional, quasi-omnidirectional, and random pre-coding and combining methods by plotting MD probability against SNR with the channel path and time slot kept constant. From the characteristics, it can be concluded that the omnidirectional
pre-coding and combining method has the best performance among the three for omnidirectional coverage at the BS and user terminal.

3.4 Pre-coding in MIMO CRN

Two subspace projection-based pre-coding schemes, full projection and partial projection pre-coding, are proposed in [14] for nullifying the interference between the PU and SU and improving the SU's throughput in CR MIMO systems.
An opportunistic interference alignment (OIA) pre-coding technique improves the data rate of the SU while mitigating interference between the PU and SU. This technique is proposed for multi-SU-MIMO underlay CRN under Rayleigh, Nakagami-m, and Rician channel models. The SU's achievable rate increases with the diversity order, while it decreases due to the non-availability of spatial dimensions [48].

4 Conclusion

The CRN massive MIMO system has the capability to improve the efficiency of the spectrum as well as the spectral efficiency within the band by reducing inter-user interference with the help of a pre-coder before transmission and a detector at the receiver. This detailed review covered the constant envelope, omnidirectional, multi-layer, two-stage subspace, and hybrid pre-coders with their specifications, merits, and demerits. In this review, it was found that the multi-layer pre-coder extracts the maximum signal at the first layer, and its performance can be improved by decontaminating the pilot signal. For increasing the number of users served concurrently at the base station while enhancing the reliability and throughput of the system, the number of antennas at the base station should be increased; but at the same time, interference between channels may also increase. Pre-coding and detection techniques should be adaptable to client growth for maintaining the required bit error rate. Minimum mean square, VMC decomposition, and regression-based linear pre-coding techniques give better performance for 24 mobile users with 64 or 128 antennas at the base station, while truncated polynomial expansion gives better performance for 64 mobile users with 256 antennas at the base station, but the performance degrades when the number of transmitting antennas increases. Further, implementing a pre-coding algorithm with a ratio between users and antennas at the base station of less than eight is a new challenge for massive MIMO-CRN.

References

1. Alliance NGMN (2016) Perspectives on vertical industries and implications for 5G. White
Paper (June 2016)
2. Kusaladharma S, Tellambura C (1999) An overview of cognitive radio networks. Wiley
Encyclopedia Electr Electron Eng 1–17
3. Mitola J, Maguire GQ (1999) Cognitive radio: making software radios more personal. IEEE
Personal Commun 6(4):13–18
4. Abdulkadir Y, Simpson O, Sun Y (2019) Interference alignment for cognitive radio commu-
nications and networks: a survey. J Sens Actuat Netw 8(4):50
5. Haykin S (2005) Cognitive radio: brain-empowered wireless communications. IEEE J Sel
Areas Commun 23(2):201–220
6. Seyfi M, Muhaidat S, Liang J (2013) Relay selection in cognitive radio networks with
interference constraints. IET Commun 7(10):922–930
7. Mathuranathan V (2014) MIMO—diversity and spatial multiplexing. https://www.gaussianwaves.com/2014/08/mimo-diversity-and-spatial-multiplexing/
8. Perahia E (2008) IEEE 802.11n development: history, process, and technology. IEEE Commun
Mag 46(7):48–55
9. Fu L, Zhang YJA, Huang J (2013) Energy efficient transmissions in MIMO cognitive radio
networks. IEEE J Sel Areas Commun 31(11):2420–2431
10. Nguyen V-D, Tran L-N, Duong TQ, Shin O-S, Farrell R (2016) An efficient precoder design
for multiuser MIMO cognitive radio networks with interference constraints. IEEE Trans Veh
Technol 66(5):3991–4004
11. FCC Spectrum Policy Task Force (2002) Report of the spectrum efficiency working group. https://www.fcc.gov/sptf/files/SEWGFinalReport_1.pdf
12. Islam MH, Koh CL, Oh SW, Qing X, Lai YY, Wang C, Liang Y-C et al (2008) Spectrum survey
in Singapore: occupancy measurements and analyses. In: 2008 3rd International conference on
cognitive radio oriented wireless networks and communications (CrownCom 2008), pp 1–7.
IEEE
13. Mitola J (1999) Cognitive radio for flexible mobile multimedia communications. In: 1999
IEEE international workshop on mobile multimedia communications (MoMuC’99) (Cat. No.
99EX384), 3–10. IEEE
14. Chen Z, Wang C-X, Hong X, Thompson J, Vorobyov SA, Zhao F, Ge X (2013) Interference
mitigation for cognitive radio MIMO systems based on practical precoding. Phys Commun
9:308–315
15. Chen Z (2011) Interference modelling and management for cognitive radio networks. PhD
dissertation, Heriot-Watt University
16. Paulraj AJ, Gore DA, Nabar RU, Bolcskei H (2004) An overview of MIMO communications—a
key to gigabit wireless. Proceedings IEEE 92(2):198–218
17. Björnson E, Larsson EG, Marzetta TL (2016) Massive MIMO: ten myths and one critical
question. IEEE Commun Mag 54(2):114–123
18. Björnson E (2017) Six differences between MU-MIMO and massive MIMO. https://ma-mimo.ellintech.se/2017/10/17/six-differences-between-mu-mimo-and-massive-mimo/
19. Dapena A, Castro PM, Labrador J (2010) Combination of supervised and unsupervised algo-
rithms for communication systems with linear precoding. In: The 2010 international joint
conference on neural networks (IJCNN), pp 1–8. IEEE
20. Datta A, Mandloi M, Bhatia V (2019) Reliability feedback-aided low-complexity detection in
uplink massive MIMO systems. Int J Commun Syst 32(15):e4085
21. Gao C, Shi Y, Thomas Hou Y, Kompella S (2011) On the throughput of MIMO-empowered
multihop cognitive radio networks. IEEE Trans Mobile Comput 10(11):1505–1519
22. Marzetta TL (2010) Noncooperative cellular wireless with unlimited numbers of base station
antennas. IEEE Trans Wireless Commun 9(11):3590–3600

23. Huh H, Caire G, Papadopoulos HC, Ramprashad SA (2012) Achieving massive MIMO spectral
efficiency with a not-so-large number of antennas. IEEE Trans Wireless Commun 11(9):3226–
3239
24. Rusek F, Persson D, Lau BK, Larsson EG, Marzetta TL, Edfors O, Tufvesson F (2012) Scaling
up MIMO: opportunities and challenges with very large arrays. IEEE Signal Process Mag
30(1):40–60
25. Ngo HQ, Larsson EG, Marzetta TL (2013) Energy and spectral efficiency of very large multiuser
MIMO systems. IEEE Trans Commun 61(4):1436–1449
26. Hassan N, Fernando X (2017) Massive MIMO wireless networks: an overview. www.mdpi.com/journal/electronics
27. Mandloi M, Bhatia V (2016) Low-complexity near-optimal iterative sequential detection for
uplink massive MIMO systems. IEEE Commun Lett 21(3):568–571
28. Manshaei MH, Félegyházi M, Freudiger J, Hubaux J-P, Marbach P (2007) Spectrum sharing
games of network operators and cognitive radios. In: Cognitive wireless networks. Springer,
Dordrecht, pp 555–578
29. Mandloi M, Hussain MA, Bhatia V (2017) Improved multiple feedback successive interference
cancellation algorithms for near-optimal MIMO detection. IET Commun 11(1):150–159
30. Li X, Bjornson E, Larsson EG, Zhou S, Wang J (2015) A multi-cell MMSE detector for
massive MIMO systems and new large system analysis. In: 2015 IEEE global communications
conference (GLOBECOM), pp 1–6. IEEE
31. Gao X, Edfors O, Rusek F, Tufvesson F (2011) Linear pre-coding performance in measured
very-large MIMO channels. In: 2011 IEEE vehicular technology conference (VTC Fall), 1–5.
IEEE
32. Pan J, Ma W-K (2014) Constant envelope precoding for single-user large-scale MISO channels:
efficient precoding and optimal designs. IEEE J Sel Top Signal Process 8(5):982–995
33. Da Silva MM, Dinis R (2017) A simplified massive MIMO implemented with pre or post-
processing. Phys Commun 25:355–362
34. Mueller A, Kammoun A, Björnson E, Debbah M (2016) Linear precoding based on polynomial
expansion: reducing complexity in massive MIMO. EURASIP J Wireless Commun Netw
2016(1), 63
35. Ge Z, Haiyan W (2017) Linear precoding design for massive MIMO based on the minimum
mean square error algorithm. EURASIP J Embedded Syst 2017(1):1–6
36. Ketseoglou T, Ayanoglu E (2018) Downlink precoding for massive MIMO systems exploiting
virtual channel model sparsity. IEEE Trans Commun 66(5):1925–1939
37. Chen J, Lau VKN (2014) Two-tier precoding for FDD multi-cell massive MIMO time-varying
interference networks. IEEE J Sel Areas Commun 32(6):1230–1238
38. Kammoun A, Müller A, Björnson E, Debbah M (2014) Linear precoding based on polynomial
expansion: large-scale multi-cell MIMO systems. IEEE J Sel Top Signal Process 8(5):861–875
39. Yue D-W, Li GY (2014) LOS-based conjugate beamforming and power-scaling law in massive-
MIMO systems. arXiv preprint arXiv:1404.1654
40. Yang HH, Geraci G, Quek TQS, Andrews JG (2017) Cell-edge-aware precoding for downlink
massive MIMO cellular networks. IEEE Trans Signal Process 65(13):3344–3358
41. Liu A, Lau VKN (2015) Two-stage subspace constrained precoding in massive MIMO cellular
systems. IEEE Trans Wireless Commun 14(6):3271–3279
42. Liu X, Zou W (2018) Block-sparse hybrid precoding and limited feedback for millimeter wave
massive MIMO systems. Phys Commun 26:81–86
43. Alkhateeb A, Leus G, Heath RW (2017) Multi-layer precoding: a potential solution for full-
dimensional massive MIMO systems. IEEE Trans Wireless Commun 16(9):5810–5824
44. Prabhu H, Rusek F, Rodrigues JN, Edfors O (2015) High throughput constant envelope pre-
coder for massive MIMO systems. In: 2015 IEEE international symposium on circuits and
systems (ISCAS), pp 1502–1505. IEEE
45. Mohammed SK, Larsson EG (2013) Constant-envelope multi-user precoding for frequency-
selective massive MIMO systems. IEEE Wireless Commun Lett 2(5):547–550

46. Xia X-G, Gao X (2016) A space-time code design for omnidirectional transmission in massive
MIMO systems. IEEE Wireless Commun Lett 5(5):512–515
47. Meng X, Gao X, Xia X-G (2017) Omnidirectional precoding and combining based synchro-
nization for millimeter wave massive MIMO systems. IEEE Trans Commun 66(3):1013–1026
48. Garg S, Jain M, Gangopadhyay R, Rawal D (2016) Opportunistic interference alignment in
multi-user MIMO cognitive radio networks for different fading channels. In: 2016 Twenty
second national conference on communication (NCC), pp 1–6. IEEE
Design of MIMO Antenna Using
Circular Split Ring Slot Defected Ground
Structure for ISM Band Applications

F. B. Shiddanagouda, R. M. Vani, and P. V. Hunagund

Abstract In this work, a systematic approach is presented for the design of a MIMO antenna using a circular split ring slot defected ground structure for Industrial, Scientific and Medical (ISM) band applications. The overall MIMO antenna is printed on a flame retardant fiber glass epoxy (FR-4) substrate with dimensions of 60 × 62.8 × 1.6 mm³. The elements of the MIMO antenna are patch antennas defected with circular split ring slot defected ground structures (CSRSDGS). The CSRSDGS are used for patch antenna miniaturization for ISM band applications. The dimension of an individual patch antenna element is 11.35 × 15.25 mm². The proposed MIMO antenna resonates at 5.725 GHz with a bandwidth of 265 MHz and a mutual coupling coefficient (MCC) of −22.42 dB, which makes it suitable for ISM band applications.

Keywords MIMO antenna · CSRSDGS · Bandwidth · MCC

1 Introduction

Recent growth in wireless systems has made high data rate and low latency the most demanding requirements. To meet these requirements, new wireless standards were made, in which the multiple-input multiple-output (MIMO) system is an emerging mechanism [1]. In the literature, several MIMO antennas have been reported, and each of the reported papers uses different kinds of techniques to enhance MIMO antenna performance. A MIMO antenna is an assembly of multiple antennas on the same ground plane. When

F. B. Shiddanagouda (B)
Department of ECE, Vignan Institute of Technology and Science, Hyderabad 508284, Telangana,
India
e-mail: [email protected]
R. M. Vani
Department of USIC, Gulbarga University, Kalaburagi 585106, Karnataka, India
P. V. Hunagund
Department of Applied Electronics, Gulbarga University, Kalaburagi 585106, Karnataka, India

© Springer Nature Singapore Pte Ltd. 2021 227


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_17
228 F. B. Shiddanagouda et al.

more than one antenna element is placed on the same ground plane, the excitation of surface waves can lead to a high level of mutual coupling. Therefore, it becomes challenging to achieve a high data rate and low error rate with the deployment of multiple antennas in a limited space while reducing mutual coupling in the operating frequency bands [2]. To achieve those requirements, this paper presents a compact four element MIMO antenna using circular split ring slot defected ground structures for ISM band applications. The following sections of the paper describe the proposed MIMO antenna design and the results and discussions, followed by the conclusion.

2 Antenna Design

The prototype of the simulated conventional MIMO antenna (CMA) is shown in Fig. 1. This antenna has four identical rectangular patch antenna elements with a separation of λ/4. A separate 50 Ω microstrip feed line excites each of the four patch antenna elements; their calculated dimensions are given in Table 1 [3].
A study was carried out on the conventional MIMO antenna (CMA) loaded with a circular split ring slot defected ground structure (CSRSDGS) to improve the CMA parameters; the resulting design is named the proposed MIMO antenna (PMA). Figure 2 shows the optimized unit cell of the CSRSDGS; its geometry was obtained through parametric analysis, and the final dimensions are given in Table 2.

Fig. 1 Geometry of CMA

Table 1 Dimensions of CMA

Parameter | Ls | Ws | 1 | 2
Dimension (mm) | 60 | 62.8 | 11.35 | 15.25
Parameter | 3 | 4 | 5 | 6
Dimension (mm) | 4.9 | 0.5 | 6.15 | 3.06
Design of MIMO Antenna Using Circular Split … 229

Fig. 2 Geometry of CSRSDGS

Table 2 Dimensions of CSRSDGS

Parameter | Dimension (mm)
Radius of the inner circle r1 | 2.1
Radius of the outer circle r2 | 3.3
Circle width Cw | 0.6
Spacing between the two rings Cg | 0.6
Split ring gap g | 0.4

Here, four CSRSDGS are etched exactly beneath the rectangular patches of the CMA with a separation of λ/4. Figure 3 shows the geometry of the PMA. A complete investigation was done for the PMA, and a significant size reduction, wide bandwidth, and mutual coupling reduction are obtained.

Fig. 3 Geometry of PMA



3 Results and Discussions

In this work, the antennas were designed using the ANSYS HFSS 15.0 electromagnetic simulation software. The conventional MIMO antenna (CMA) resonates at 5.9 GHz with a bandwidth of 204 MHz, and a return loss of −21.3 dB is obtained at the resonating frequency, as shown in Fig. 4.
The proposed MIMO antenna (PMA) resonates at 5.725 GHz with a bandwidth of 265 MHz and a minimum return loss of −19.3 dB, as shown in Fig. 5. Loading the four CSRSDGS in the ground plane suppresses unwanted surface waves and controls harmonics in the PMA, which enhances the bandwidth compared with the CMA and also yields a virtual size reduction. The virtual size reduction of the antenna is calculated from Eq. (1) [4].
 
Virtual size reduction (%) = ((L_C − L_RA) / L_C) × 100 (1)

where L_RA is the patch length of the reference (conventional) antenna and L_C is the patch length of a conventional antenna resonating at the reduced resonant frequency (that of the proposed antenna). The width of the patch is the same at both the designed and actual resonating frequencies. By using the CSRSDGS, a virtual size reduction of 4.6% is obtained for the proposed MIMO antenna (PMA).
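Equation (1) is straightforward to evaluate. In the sketch below, only L_RA = 11.35 mm comes from Table 1; the value of L_C is hypothetical, chosen so that the formula reproduces the reported 4.6%:

```python
def virtual_size_reduction(L_C: float, L_RA: float) -> float:
    """Eq. (1): percentage virtual size reduction of the patch."""
    return (L_C - L_RA) / L_C * 100.0

L_RA = 11.35   # patch length of the conventional antenna (Table 1), mm
L_C = 11.90    # hypothetical length of a conventional patch resonating at 5.725 GHz, mm
print(round(virtual_size_reduction(L_C, L_RA), 1))  # prints 4.6
```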
The mutual coupling coefficient (MCC) is a major factor to be considered while designing MIMO antennas because it degrades the performance of the system. The conventional MIMO antenna (CMA) gives a mutual coupling between port-1 and port-2 of −20.9 dB at 5.9 GHz, as depicted in Fig. 6, and the proposed MIMO antenna
Fig. 4 Return loss of CMA



Fig. 5 Return loss of PMA

Fig. 6 MCC of CMA

(PMA) gives very low mutual coupling, i.e., −22.42 dB at 5.725 GHz, as depicted in Fig. 7.
The envelope correlation coefficient (ECC) decides how well the communication channels are isolated. The ECC of individual elements can be estimated from the S-parameters [5]. From Eq. (2), the CMA achieves an ECC of 0.001 at 5.9 GHz, as shown in Fig. 8, and the PMA achieves an ECC of 0.002 at 5.725 GHz, as shown in Fig. 9.
ρ = |S11* S12 + S21* S22|² / [(1 − |S11|² − |S21|²)(1 − |S22|² − |S12|²)] (2)

Fig. 7 MCC of PMA

Fig. 8 ECC of CMA

Fig. 9 ECC of PMA



Diversity gain is a critical parameter that must be taken into account while evaluating MIMO antenna performance. The diversity gain (DG) has been calculated from the ECC using Eq. (3) [6]. The obtained diversity gain is 9.9 dB for the CMA and also 9.9 dB for the PMA.
 
DG = 10 √(1 − |ρ|²) (3)

Therefore, the values of ECC and DG confirm that the PMA is acceptable for MIMO operation.
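Equations (2) and (3) can be evaluated directly from measured S-parameters. The sketch below uses hypothetical real-valued S-parameters derived from the reported return loss and mutual coupling levels, so the resulting ECC differs from the simulated 0.001–0.002 values:

```python
import numpy as np

def ecc(S11, S12, S21, S22):
    """Envelope correlation coefficient from S-parameters, Eq. (2)."""
    num = abs(np.conj(S11) * S12 + np.conj(S21) * S22) ** 2
    den = (1 - abs(S11) ** 2 - abs(S21) ** 2) * (1 - abs(S22) ** 2 - abs(S12) ** 2)
    return num / den

def diversity_gain(rho):
    """Diversity gain in dB from the ECC, Eq. (3)."""
    return 10.0 * np.sqrt(1 - abs(rho) ** 2)

# Hypothetical S-parameters for two well-matched, well-isolated ports
S11 = 10 ** (-19.3 / 20)     # |S11| from a -19.3 dB return loss
S22 = S11
S21 = 10 ** (-22.42 / 20)    # |S21| from a -22.42 dB mutual coupling
S12 = S21

rho = ecc(S11, S12, S21, S22)
dg = diversity_gain(rho)
assert rho < 0.01            # nearly uncorrelated ports
assert 9.9 < dg <= 10.0      # close to the ideal 10 dB
```

Low port-to-port coupling drives the ECC toward zero, which in turn pushes the diversity gain toward its 10 dB ceiling, consistent with the 9.9 dB reported here.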
The peak gain of an antenna decides the coverage area and link budget of the system. The CMA peak gain is 5.69 dB at 6 GHz, as shown in Fig. 10, and the PMA peak gain is 4.65 dB at 5.25 GHz, as shown in Fig. 11.
The radiation pattern decides how the antenna propagates electromagnetic energy. The radiation pattern at the CMA resonating frequency of 5.9 GHz is a broadside radiation, as shown in Fig. 12. The PMA radiation pattern is also studied at its resonating frequency; the pattern at 5.725 GHz is likewise a broadside radiation, as shown in Fig. 13.
The results obtained from the CMA and PMA are summarized in Table 3. It has been observed that the PMA parameters are within an acceptable limit across the operating band for ISM band applications.

Fig. 10 Total peak gain of CMA



Fig. 11 Total peak gain of PMA

Fig. 12 Radiation pattern of CMA



Fig. 13 Radiation pattern of PMA

Table 3 Summarized results of the CMA and PMA

Parameters                      CMA        PMA
Resonating frequency (GHz)      5.9        5.725
Return loss (dB)                −21.3      −19.3
Bandwidth (MHz)                 204        265
MCC (dB)                        −20.9      −22.42
Total peak gain (dB)            5.69       4.65
Virtual size reduction (%)      −          4.6
ECC                             0.001      0.002
DG (dB)                         9.9        9.9

4 Conclusion

This paper presented the design of a MIMO antenna using a circular split ring slot defected ground structure for ISM band applications. The proposed MIMO antenna (PMA) resonates at 5.725 GHz and offers 265 MHz bandwidth with a total peak gain of 4.65 dB. With the ground plane defected by CSRSDGS unit cells, the mutual coupling between the antenna elements is reduced to better than −22.42 dB, and a virtual size reduction of 4.6% is obtained. The envelope correlation coefficient, diversity gain, total peak gain, and radiation pattern show that the proposed MIMO antenna is suitable for ISM band applications.

Acknowledgements We would like to thank the Indian Institute of Technology Kharagpur and Gulbarga University Kalaburagi for providing licensed versions of the HFSS simulation software.

References

1. Patel R, Desai A (2018) An electrically small antenna using defected ground structure for RFID, GPS, and IEEE 802.11 a/b/g/s applications. J Prog Electromagn Res Lett 75:75–81
2. Fizzah S, Abid M (2018) Design and analysis of UWB MIMO with enhanced isolation. In: Proceedings of international electrical engineering conference
3. Balanis A (1993) Theory of antennas. IEEE Trans Antenna Propag AP-41(9)
4. Shiddanagouda FB, Vani RM, Hunagund PV (2019) Design and analysis of MIMO antenna for next generation wireless applications. In: IEEE Xplore
5. Han MS, Choi J (2010) Compact multiband MIMO antenna for next generation USB dongle applications. IEEE Trans Antenna Propag AP-2(10)
6. Chi Y-J, Chen F-C (2012) 4-port quadric-polarization diversity antenna with novel feeding network. In: Proceedings of the Antenna and Propagation Conference 2012
Performance Comparison of Arduino
IDE and Runlinc IDE for Promotion
of IoT STEM AI in Education Process

Sangay Chedup, Dushantha Nalin K. Jayakody, Bevek Subba,


and Hassaan Hydher

Abstract Giving early access to knowledge and skills in Internet of Things (IoT) and artificial intelligence (AI) technologies would lead to early innovations and inventions. Introducing such technology to primary and high school students is considered significant for harnessing the creativity of youths faster and earlier; to do so, it is important to give them access to friendly and easy technologies, which will eventually help them realize the potential of IoT and AI. This project studies two platforms, Arduino IDE and runlinc IDE, with respect to their user-friendliness and ease of IoT STEM AI application development. A system model-based experimental comparison was carried out, with the microcontrollers and sensors as independent variables and the required lines of program code, the time taken to develop the code, and the parameters controlled by the sensors as dependent variables. A user experience survey was conducted to supplement the experimental findings. The respondents are primary and high school students, university students and teachers, professionals, and researchers, largely those with experience of using both Arduino

This work is supported by the Sri Lanka Technological Campus through the Responsive Research Seed Grant with Grant ID RRSG/2020/B15. It is a joint project titled COVID-19 Online BabySitting: Engaging Children through Learning STEM AI and IoT Technologies between the Sri Lanka Technological Campus, Sri Lanka, and the Jigme Namgyel Engineering College, Royal University of Bhutan, Bhutan.

S. Chedup (B) · B. Subba


Jigme Namgyel Engineering College, Royal University of Bhutan, Deothang, Bhutan
e-mail: [email protected]
B. Subba
e-mail: [email protected]
D. N. K. Jayakody
Centre for Telecommunications, School of Engineering, Sri Lanka Technological Campus,
Padukka, Sri Lanka
e-mail: [email protected]
H. Hydher
Sri Lanka Technological Campus, Padukka, Sri Lanka
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 237


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_18

and runlinc. Both the experimental results and the survey showed that runlinc makes the development of IoT and AI applications easier and faster to realize.

Keywords IoT · AI · Arduino IDE · STEMSEL · Runlinc · Education

1 Introduction

Living through the Fourth Industrial Revolution (4IR), technological advancement has brought innovations and inventions that have benefited mankind manyfold. The impact of 4IR is already felt in many dimensions, and it is expected to grow further with the ever-continuing technological and socioeconomic evolution taking place around us [1]. On the other hand, 4IR has also proved to be a disruptive technology, to the extent of world leaders calling for it to be abandoned [2, 3] on the grounds that AI in particular could expedite human extinction on planet earth. Industrial revolutions have also led to wealth concentration, and AI in particular has intensified the risk of dehumanization to the point where humans now face an existential threat in both environmental and humanitarian terms [3]. Leading into the Fifth Industrial Revolution (5IR), and in contrast to 4IR, technology and innovation best practices are being bent back toward the service of humanity by the champions of 5IR [3]. The tide is turning toward AI fighting AI, transforming it for the benefit of humanity by harnessing its positive potential, and the scientific community is focused on harnessing emerging technologies for rapid development across the full spectrum of applications.
Dr. George Land's creativity test for NASA and similar studies remain relevant even today [4–7]. As per the authors of [7–10], early access to knowledge and skills in technology is important for harnessing creative thinking, which eventually results in early innovations and inventions. As per [8], only around 28 percent of high school entrants declare interest in a STEM-related field, and 57% of these students lose interest in STEM by the time they graduate from high school. The need for STEM literacy in finding global solutions, and the prospect of fostering its impact at all levels of education, have led to the development of a STEM curriculum [9]. This is expected to transform STEM education, preparing future citizens through a collaborative and interdisciplinary approach. The importance of STEM education is recognized, and initiatives have been taken [10, 11] to meet the STEM skills required by the job market in higher education. Java is the fastest of the languages compared in [12], a performance advantage that in turn plays a vital role in promoting STEM and computer science education. As a measure to promote project-based learning toward finding real-world solutions, [13] recommends that school administrations encourage STEM programs at the K-12 level of education, which has a larger impact on the number of STEM courses offered at the university level [13, 14]. However, studies on the prospect of greater impact from emerging technologies when they are introduced early are very limited. STEM education based on technologies such as the Arduino and Raspberry Pi microcontrollers is largely prevalent in higher education, research, and professional applications. The

access to these technologies is almost non-existent in primary and high school education, particularly in disadvantaged societies. Considering this situation and the prospect of these technologies driving innovation, awareness, knowledge, and recognition of the right tool are important. Providing knowledge and skills early means harnessing and unlocking the creativity of youths, thereby bringing higher-education skills to the grassroots much earlier.
To realize the potential of AI and IoT in enhancing the quality of education, and the significance of transformed STEM education reaching the grassroots, this paper examines two basic AI and IoT development platforms, Arduino IDE and runlinc IDE. The work compares the usability, accessibility, user-friendliness, and potential of these two platforms for better understanding and development of AI and IoT applications. The comparison is drawn from an experiment-based case study carried out by the authors on ease of coding, time-based performance, and interactivity of the platforms. A user experience survey was carried out to corroborate the experimental findings. Survey respondents include students between the ages of 10 and 15, university students, and professionals with first-hand experience of both platforms. The work is arranged as follows: Sect. 2 outlines the opportunities and challenges of IoT and AI for enhancing the quality of education through a literature review. Section 3 presents the case study carried out through an experimental setup in which an AI- and IoT-enabled smart home system model is considered. In Sect. 4, the platforms for AI and IoT application development, Arduino IDE for Arduino and runlinc IDE for STEMSEL, are examined on the basis of ease of coding and usage. In Sect. 5, the two platforms are studied through experimental analysis. In Sect. 6, the experimental results and the survey data are further analyzed. General findings and potential future work are presented in Sect. 7.

2 Significance of IoT and AI in Education: Opportunities and Challenges

IoT has the potential to connect 28 billion devices to the Internet by 2020 [15]. Further, as per [16], 127 new IoT devices are connected to the web every second, with an estimated 31 billion IoT devices installed during 2020. In all these IoT applications, a variety of microchips or microcomputers is used as the backbone. In [17], UNESCO outlines AI for sustainable development along with its challenges and opportunities; however, the report remains only a prospect of AI for sustainable development, as very little has been achieved thus far. Leveraging AI and IoT technologies can lead to the achievement of the United Nations Sustainable Development Goals (UN SDGs), particularly in poverty elimination, gender balance, and enhancing the quality of education. The executive summary of [18] analyzes how AI can be used to improve learning outcomes, with priority given to accomplishing Sustainable Development Goal 4: Equitable and Quality Education for All. Reference [18] also outlines six challenges in the context of AI technology for quality education,

where the main challenges are examined in four categories. References [19, 20] examine the prospect of a positive impact of IoT on transforming technical higher education, further stating that the impact of IoT will be greater in higher education, particularly in universities. The smart education concept is thoroughly analyzed from a theoretical point of view through a literature review in [21]; research opportunities in AI and IoT are outlined for making learning more creative and attractive for students by incorporating the development of hands-on skills, and the capacity to control or operate everyday appliances, into educational settings. Rajput [22] found that, despite the tremendous technological progress made in past years, the incorporation of technology to ensure smart classroom systems has been delayed for various reasons; it is important to leverage modern emerging technologies in teaching and learning. The authors of [22] further point to the significance of changing classroom settings based on emerging technologies so that teaching and learning become more interactive. Through such measures, students should be able to identify problems and come up with ideas to solve them. "IoT can reform the education system considering many benefits such as active engagements, enhancing quality of instruction and greater efficiency" [22].
The potential of IoT for developing assistive technology in tertiary education for people with disabilities is theoretically reviewed in [20]. Banica et al. [23] concluded that the concept of IoT has great potential to remove all barriers to education, such as physical location, geography, language, and economic development. The combination of technology and education would lead to faster and simpler learning and improve the level of knowledge and the quality of students. However, as with any newly emerged concept, IoT still has no widespread functional models and standards; moreover, universities are not prepared to accept all the changes it proposes for the educational sector. To work toward achieving the full potential of IoT and AI, penetration of these technologies into primary and high schools is important, not only into tertiary education and research. Introducing these technologies at an early stage of education has greater potential to unlock the imagination and creativity of youths in developing better technology for sustainable development. While much research has happened and continues to happen on emerging technologies, the introduction of AI and IoT platforms remains largely uncommon for young minds, particularly for youths in disadvantaged societies. STEM education is perceived as demanding and difficult by high school graduates, which has led to decreased enrollment against the need for a creative, innovative, and talented workforce [10]. In [9], it was noted that secondary teachers are more reluctant toward transforming STEM education compared with the positive response from the early-years and elementary cohorts. Incentives for faculty involved in STEM education and outreach, and their potential impact, were studied in [11]. Reference [24] identifies the lack of soft skills and of formal training of educators in STEM concepts as a major obstacle to STEM-based education. The need for better understanding, better teaching methods, and more help in STEM education was identified, and recommendations to advance STEM education were provided [15]. Further, the authors of [25–30] examined IoT platforms and applications; however, their impact on the education process is not strongly mentioned.

Providing hands-on skill development and experiential learning to harness imagination and creativity for innovation and invention is significant. It is possible by exposing students to AI, IoT, and other emerging technologies and providing access to the platforms at their early learning stage. This will have immense benefit for the fulfillment of the UN SDGs, particularly quality of education and poverty reduction through entrepreneurship skills derived from transforming Science Technology Engineering Mathematics (STEM) education into STEM AI Social Business (SASB). The authors of [31] outlined that IoT will affect every part of society, particularly educational institutions and universities, by transforming the learning environment toward hands-on experience and experiment-based methods. Reference [32] proposed an education learning mechanism named intelligence of learning things (IoLT) and concluded that "IoT will have huge impact on higher education with potential on cost reduction, time saving, enriched safety and improved collaboration". References [33–39] discuss the key role that AI can play in leading the path of innovation within IoT for Education 4.0. Reference [33] further outlines that Education 4.0, influenced by IoT and other emerging trends imposed by Industry 4.0, means the future job market will expect workers who embrace emerging technologies, not merely know about them. The authors of [36, 37, 39] outline the importance of universities in providing a platform for educating the future generation and for innovation in response to Industrial Revolution 4.0; a concerted effort is required to make digital and data literacy more accessible to students regardless of discipline.

However, to the best of the authors' knowledge, no study has examined which platforms can better bring IoT and AI technology to the grassroots, and the impact of bringing such technologies to the K-12 level of education has not been studied either. Therefore, among the many platforms available, runlinc IDE is compared with Arduino IDE with respect to the prospect of bringing IoT and AI technology to the K-12 level of education and its benefit for early creative thinking and innovation.

3 System Model: Smart Home

An experimental setup, a subset of the larger IoT-based smart home in Fig. 1, is built by deploying three basic sensors to collect four different parameters for a comparison on the basis of coding. The smart home model is chosen mainly to understand and experience a real working scenario of how homes can be automated and made smart by deploying smart sensors. The home devices are also controlled remotely via the Internet, making this an IoT deployment. The deployed sensors detect environmental conditions; for example, the light-dependent resistor (LDR) detects the light intensity when exposed to different weather conditions. Analog data is generated, and a threshold is set for the control of the LED/lamp.
If the light intensity is less than or equal to the threshold value, the microcontroller turns the LED/lamp ON, and vice versa, automatically. At the same time, the same data is available on the Internet. Through the Internet, program code is

Fig. 1 System model

Table 1 Peripheral devices used to build the system model

Component                   Function/Purpose
LDR                         Generates analog data on light intensity as it is exposed to different lighting conditions
Temperature sensor (LM35)   Generates analog data on temperature as it is exposed to different environment/surrounding conditions
Thermistor                  Generates analog data depending on the environment/surrounding conditions
Temperature sensor (TMP36)  Used in the Tinkercad virtual simulation to perform the same functions as the LM35 and thermistor
Relay                       2-relay SRD-05VDC-SL-C module used for switching high-power lamps
DC motor                    2-pin 5 V DC motor used to drive the fan depending on the sensor data

sent to the microcontroller board. Simultaneously, the same web page is opened on a remote device (mobile), and the control command for the output device (LED/lamp) is sent via the Internet to the microcontroller, thereby turning the LED/lamp ON or OFF remotely. The same procedure applies to the other sensors deployed in the system model. The smart sensors and their functionality are described in Table 1 as part of the architecture. The model is used only by the authors to experimentally compare the two platforms.
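The automatic LDR-to-lamp rule described above can be sketched as a small decision function. The sketch below is a platform-neutral illustration (neither Arduino nor runlinc code); the threshold value and ADC range are assumptions, and real hardware would write a GPIO pin instead of printing.

```python
LIGHT_THRESHOLD = 300  # assumed threshold on a 10-bit (0-1023) ADC reading

def lamp_should_be_on(ldr_reading, threshold=LIGHT_THRESHOLD):
    """Return True (lamp ON) when ambient light is at or below the threshold."""
    return ldr_reading <= threshold

def control_step(ldr_reading):
    """One pass of the automatic control loop: read sensor, drive the lamp."""
    state = "ON" if lamp_should_be_on(ldr_reading) else "OFF"
    # On a real board this would drive a relay/LED pin rather than print.
    print(f"LDR={ldr_reading} -> lamp {state}")
    return state

control_step(120)  # dark room: lamp ON
control_step(800)  # bright room: lamp OFF
```

The same shape of rule applies to the temperature sensors driving the fan, with only the threshold and output device changed.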

Fig. 2 Methodology

Further, to corroborate the authors' experimental findings, a user experience Google survey on Arduino and runlinc was incorporated into the study. The sample includes 8 students below the age of 15, 67 university students between the ages of 15 and 25, and 25 teaching and research professionals above the age of 25, all with experience of both platforms. Respondents are from Bhutan, Australia, Malaysia, Uganda, Nigeria, Cameroon, and the Philippines. The waterfall methodology in Fig. 2 was used to ensure the key stages of the study were maintained and fulfilled.

4 Results and Discussion

5 Microcontrollers

5.1 Arduino UNO and Arduino IDE

Arduino [27, 40, 41] is an open-source electronics platform for making interactive projects. The Arduino microcontroller acts as a central processing unit (CPU) that senses the environment through the many sensors integrated with it. The CPU activates peripheral output devices on receiving inputs from these sensors, thereby affecting its surroundings by controlling lights, motors, and other actuators. The Arduino programming environment is a convenient abstraction layer over the C/C++ coding commonly used for programming microcontrollers. This abstraction layer has made a splash by allowing microcontrollers to be programmed through a simplified IDE, while the standardized hardware and an active open-source community that contributes code libraries simplify the use of many hardware peripherals such as sensors and motors. Using Arduino is much easier and quicker than programming microcontrollers directly, and this ease of use has allowed many students, electronics hobbyists, and programming enthusiasts to learn and apply microcontrollers to a broad array of applications, including the cutting-edge domains of IoT and AI. Arduino is one step removed from programming microcontrollers directly; even so, the leap between programming Arduino and directly programming an ATmega328 IC is quite large.
The open-source Arduino software (IDE) provides the platform to write code and upload it to the board [35]. It is compatible with Windows, Mac OS X, and Linux; the IDE itself is written in Java, building on Processing and other open-source software, and it can be used with any Arduino board. An Arduino UNO is used in this experimental setup. An Arduino-compatible ESP8266 Wi-Fi module must be added separately to connect to the Internet so that users can operate the system from anywhere in the world. The module has 8 pins [42], including TXD and RXD, two GPIO pins (GPIO0 and GPIO2), Reset, VCC, and Ground. TX and RX are the transmitter and receiver pins used to flash the embedded code, and an AMS1117 regulator supplies 3.3 V to the ESP8266 Wi-Fi module [36, 37].

5.2 STEMSEL and Runlinc IDE

Runlinc IDE is a web page-based AI and IoT application development platform. It works in any browser on a computer (Windows, Mac, Chromebook, Linux), pad, or phone, using JavaScript blocks. It supports the Science Technology Engineering Maths Social Enterprise Learning (STEMSEL) microcontroller. The STEMSEL board carries a Wi-Fi microchip that interfaces it with the runlinc IDE in the web page. This web page-based tool provides huge potential for remotely controlling application devices deployed in an IoT network. The STEMSEL Wi-Fi module comes with the STEMSEL microcontroller in two versions, V1.1 and V1.0; network setup requires a user ID with the password runlinc1234 for V1.1, or the password Hartley2018 with the same user name for V1.0. Once the module is mounted on the STEMSEL kit, users can link to the runlinc web page via USB plug-in, with no separate installation required. An IoT application developed on runlinc works in two ways: on the Internet and on the device chip.

6 Implementation

The same parameters are used to rig up circuits on both the Arduino UNO and the STEMSEL. Program code to control the output devices (LED, motor, fan) from the data generated by the analog input devices (LDR, thermistor, and LM35) is developed in both Arduino IDE and runlinc IDE. The time taken to write complete code for each component, and the corresponding number of lines of code required to make the device operational, are noted for analysis. Program code deploying all the system components to build the required system model is then developed, and the corresponding time taken and number of lines of code are again noted. The time taken to build the physical circuit is not considered in the comparison, as it was noted to be almost the same on both platforms. The experimental arrangements, results, and observations are given in Figs. 3 and 4, and the overall results and observations from the experiment are presented in Table 2.
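The lines-of-code metric used in the comparison can be made precise with a counting rule. The sketch below shows one simple convention (count non-blank lines that are not pure comments); this exact rule is an assumption, since the paper does not state how its line counts were taken.

```python
def count_loc(source, comment_prefixes=("//", "#")):
    """Count non-blank lines that are not pure comment lines."""
    loc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefixes):
            loc += 1
    return loc

# Hypothetical Arduino-style fragment used only to exercise the counter.
sketch = """
// toggle an LED (illustrative)
digitalWrite(LED, HIGH);
delay(1000);
digitalWrite(LED, LOW);
"""
print(count_loc(sketch))  # prints 3
```

Applying one fixed rule like this to both platforms keeps the Table 2 comparison on the same yardstick.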

Fig. 3 Expt. 1—Control LED/lamp using Arduino UNO and STEMSEL

Fig. 4 Expt. 2—Control LED/lamp with the input from sensors using Arduino UNO and STEMSEL

Table 2 Overall results and observations from the experimental setup

S. No. | Task performed                                                   | Lines of code (STEMSEL / Arduino) | Time taken (STEMSEL / Arduino) | Remarks
1      | Putting LED ON and OFF                                           | 3 / 10                            | <1 min / >2 min                | –
2      | LDR to control LED (using if-else condition)                     | 5 / 21                            | <2 min / >5 min                | –
3      | Thermistor to sense the surrounding temperature and turn on fan  | 6 / 16                            | <3 min / <5 min                | –
4      | Thermistor and LDR to control motor and LED                      | 12 / 21                           | <4 min / <10 min               | –
5      | LM35 temperature sensor to put ON fan                            | 4 / 24                            | <2 min / >1 h                  | Error correction takes even more time
6      | Incorporating all the sensors into one system model (LDR, LM35)  | 13 / 36                           | <7 min / >30 min               | Without error checking
7      | Creating ON/OFF button on the web page                           | 2 / 63 [36]                       | <2 min / >2 h                  | Arduino time approximated considering the lengthy code, which can take even more
8      | DC motor control                                                 | 1 / 12                            | <1 min / <5 min                | –

The basic difference between the two platforms is that Arduino is programmed on a computer, whereas runlinc is programmed on a web page over the Internet. The same parameters are considered so that the same yardstick is applied to both, and the time and complexity of building the physical or virtual circuit are not considered in the comparison. As per the experimental results in Table 2, the number of lines of code, which correlates with the time taken, is far lower in runlinc for the same result. As program complexity grows with the number of parameters incorporated into the system, the number of lines of code required in Arduino increases almost exponentially, and the difference in lines of code and time taken is even more drastic when the concept of IoT is incorporated. Creating an ON/OFF button on runlinc requires a mere 2 lines of code, whereas it takes many times more in Arduino: including the procedure to create the web server in the Arduino IDE, about 64 lines of code are needed for an ON/OFF button [43].

Fig. 5 Graph represents the age range of respondents with experience on Arduino IDE and runlinc IDE

The analysis of the survey responses is presented graphically in Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14. The sample consists largely of university students, as represented in Figs. 5 and 6, with relatively good experience of the specified platforms. A large percentage of university students have good knowledge of AI and IoT applications developed using either platform, as indicated by Figs. 7 and 8, and a large section of the sample has experience of doing AI and IoT projects or research on Arduino and runlinc. On further analysis, users' experience of doing AI and IoT projects is better with runlinc than with Arduino, as indicated in Fig. 8, especially for beginners.
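Whatever the line count, the web-page ON/OFF button on either platform ultimately maps an HTTP request to a device state. The sketch below is a platform-neutral illustration of that mapping (it is neither Arduino nor runlinc code; the request paths are assumed names):

```python
def handle_request(path, current_state):
    """Map a button press (request path) to the new lamp state."""
    if path == "/on":
        return True
    if path == "/off":
        return False
    return current_state  # unknown path: leave the device unchanged

# A remote browser pressing ON, querying status, then pressing OFF:
state = False
for path in ("/on", "/status", "/off"):
    state = handle_request(path, state)
    print(path, "->", "ON" if state else "OFF")
```

Runlinc exposes this mapping directly from the web page, while on Arduino the surrounding web server, Wi-Fi setup, and HTML must all be written by hand, which is where the extra lines of code in Table 2 come from.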

7 Conclusion and Future Works

On comparing the Arduino IDE and runlinc IDE platforms, primarily on ease of understanding, interactivity, and complexity of program code, the latter provides the easier option. With runlinc development being web page based, understanding and realization of AI and IoT are easier and faster. It is also found to be easier in demonstration

Fig. 6 The graph represents the qualification and profession of respondents with experience on
Arduino IDE and runlinc IDE

Fig. 7 Users knowledge and experience in AI and IoT technology



Fig. 8 Statistics of respondents who used Arduino or STEMSEL in doing their study/project/research

Fig. 9 Users Knowledge on Arduino and runlinc



Fig. 10 Users experience on doing projects/research on Arduino and runlinc

Fig. 11 Respondents’ comparison of two platforms on usage



Fig. 12 User preferences of IoT and AI platforms

Fig. 13 Users feedback on the ease of coding which is dependent on the number of lines of code

Fig. 14 Users response on the potential/prospect of IoT STEM AI technology for quality education

of a real Internet of Things scenario, since it enables controlling devices over the Internet as well as on-chip, thereby making it easier and more effective to teach AI and IoT technologies to beginners. On the other hand, the lengthy, complex, and laborious program code makes Arduino challenging, especially for beginners, despite the abundance of freely available open-source resources for the different versions of Arduino microcontrollers supported by the Arduino IDE. A few standout features of runlinc IDE and STEMSEL are the ability to create sound, vision, and motion with only one line of code, and to build ON/OFF buttons on a web page with two lines of code to control any device remotely from anywhere. Considering the simplicity, speed, and smaller amount of coding involved, runlinc provides a good alternative to Arduino for reaching out and realizing the great potential of AI and IoT from primary schools to universities. Future research could focus on how these technologies can transform STEM education into an educational tool for teaching IoT and AI technologies. Considering the short execution time required, the platform can also be used to develop smart systems in epidemic situations, and it holds potential for high-end research in AI and IoT.

Acknowledgements The authors would like to thank Dr. Miroslav Kostecki, Technical Director, STEMSEL Foundation, Adelaide, Australia, for helping us with capacity development on runlinc IDE and STEMSEL through several workshops and demonstrations via Skype. The authors also express their gratitude to Mr. Michael Cheich for his continued support of Arduino learning by sharing Arduino learning resources provided by the Programming Electronics Academy, USA.

References

1. Morrar RA, Saeed HA (2017) The fourth industrial revolution (Industry 4.0): a social innovation
perspective. Technol Innov Manage Rev 7(11):12–20
2. Yunus M (2018) Yunus Warns of survival threat from artificial intelligence. The
Economic Times. https://www.thequint.com/news/hot-news/yunus-warns-of-survival-threat-
from-artificial-intelligence. Accessed 9 Mar 2020
3. Gauri P (2019) What the fifth industrial revolution is and why it matters. World Economic Forum. https://europeansting.com/2019/05/16/what-the-fifth-industrial-revolution-is-and-why-it-matters/. Accessed 12 Sept 2020
4. Venkatraman R (2020) You’re 96 percent less creative than you were as a child. Here’s How to
reverse that sure, you can’t be a kid again, but you can think like one. INC. Accessed 23 May
2020
5. The waste of creative talents. George Land’s creativity test. In: LIFE, 16 Jan 2015. Accessed
24 May 2020
6. Land G. Evidence that children become less creative over time (and how to fix it). In: TED
talk. Accessed 23 May 2020
7. Robinson K (2006) Do schools kill creativity? TED ideas worth spreading
8. Bues D (2019) STEM education: how best to illuminate the lamp of learning. In: 2019 IEEE
integrated STEM education conference (ISEC). IEEE
9. Francis K et al (2018) Forming and transforming STEM teacher education: a follow up to pioneering STEM education. In: 2018 IEEE global engineering education conference (EDUCON). IEEE
10. Vasiu R, Andone D (2019) An analyze and actions to increase the quality in STEM higher
education. In: 2019 IEEE integrated STEM education conference (ISEC). IEEE
11. Miorelli J et al (2015) Improving faculty perception of and engagement in STEM education.
In: 2015 IEEE frontiers in education conference (FIE). IEEE
12. Huang A (2015) Comparison of programming performance: promoting STEM and computer
science education. In: 2015 IEEE integrated STEM education conference. IEEE
13. Forawi S (2018) Science, technology, engineering and mathematics (STEM) education: meaningful learning contexts and frameworks. In: 2018 International conference on computer, control, electrical, and electronics engineering (ICCCEEE). IEEE
14. Thibaut L et al (2018) The influence of teachers’ attitudes and school context on instructional
practices in integrated STEM education. Teach Teacher Educ 71(2018): 190–205
15. Goldman Sachs (2014) The Internet of Things: making sense of the next mega-trend, vol 201
16. Maayan D (2020) The IoT rundown for 2020: stats, risks, and solutions. Security Today. Accessed 8 Apr 2020
17. UNESCO. Artificial intelligence for sustainable development: challenges and opportunities for
UNESCO’s science and engineering programmes. Principles for artificial intelligence towards
a humanistic approach?
18. Pedro F et al (2019) Artificial intelligence in education: challenges and opportunities for sustainable development
19. Aldowah H et al (2017) Internet of Things in higher education: a study on future learning. J
Phys: Conf Seri 892(1) (IOP Publishing)
20. Hollier S, Abou-Zahra S (2018) Internet of Things (IoT) as assistive technology: potential
applications in tertiary education. In: Proceedings of the internet of accessible things, pp 1–4
21. Martín AC et al (2019) Smart education: A review and future research directions. Multidisc
Digit Publ Inst Proc 31(1)
22. Rajput M (2020) Use of IoT in education sector and why it’s a good idea. IoT for all, 31 Dec
2019. Accessed 9 Mar 2020
23. Banica L et al (2017) The impact of internet-of-things in higher education. Sci Bull-Econ Sc
16(1):53–59
24. Goodwin M et al (2017) Strategies to address major obstacles to STEM-based education. In:
2017 IEEE integrated STEM education conference (ISEC). IEEE
254 S. Chedup et al.

25. Ray Partha Pratim (2016) A survey of IoT cloud platforms. Future Comput Inf J 1(1–2):35–46
26. Ganguly P (2016) Selecting the right IoT cloud platform. In: 2016 International conference on
Internet of Things and applications (IOTA). IEEE
27. Novák M et al (2018) Use of the Arduino platform in teaching programming. In: 2018 IV
international conference on information technologies in engineering education (Inforino). IEEE
28. Singh KJ, Kapoor DS (2017) Create your own internet of things: a survey of IoT platforms.
IEEE Consumer Electron Mag 6(2):57–68
29. Pflanzner T, Kertész A (2018) A taxonomy and survey of IoT cloud applications. EAI Endorsed
Trans Internet of Things 3(12) (Terjedelem-14)
30. Tayeb S et al (2017) A survey on IoT communication and computation frameworks: an industrial perspective. In: 2017 IEEE 7th annual computing and communication workshop and conference (CCWC). IEEE
31. Sruthi M, Kavitha BR (2016) A survey on IoT platform. Int J Sci Res Mod Educ (IJSRME).
ISSN (online) 2455-5630
32. Satu MS et al (2018) IoLT: An IoT based collaborative blended learning platform in higher
education. In: 2018 International conference on innovation in engineering and technology
(ICIET). IEEE
33. Ciolacu MI et al (2019) Education 4.0—Jump to innovation with IoT in higher education. In:
2019 IEEE 25th international symposium for design and technology in electronic packaging
(SIITME). IEEE
34. Ciolacu MI, Binder L, Popp H (2019) Enabling IoT in education 4.0 with biosensors from
wearables and artificial intelligence. In: 2019 IEEE 25th international symposium for design
and technology in electronic packaging (SIITME). Cluj-Napoca, Romania, pp 17–24
35. Sani RM (2019) Adopting Internet of Things for higher education. In: Redesigning higher
education initiatives for industry 4.0. IGI Global, pp 23–40
36. The duke perspective, impact of industry 4.0 on education, 21 Mar 2019. Accessed 16 March
2020
37. Hurtuk J et al (2017) The Arduino platform connected to education process. In: 2017 IEEE
21st international conference on intelligent engineering systems (INES). IEEE
38. Herger ML, Bodarky M (2015) Engaging students with open source technologies and Arduino.
In: 2015 IEEE integrated STEM education conference. IEEE
39. Yoo W, Pattaparla SR, Sameer AS (2016) Curriculum development for computing education
academy to enhance high school students’ interest in computing. In: 2016 IEEE integrated
STEM education conference (ISEC). IEEE
40. Gandhi PL, Himanshu SM. Smartphone-FPGA based balloon payload using COTS components
41. Shlibek M, Mhereeg M (2019) Comparison between Arduino based wireless and wire methods
for the provision of power theft detection. Eur J Eng Sci Technol 2(4):45–59
42. Srivastava P et al (2018) IOT based controlling of hybrid energy system using ESP8266. In:
2018 IEEMA engineer infinite conference (eTechNxT). IEEE
43. Santos R (2020) ESP8266 web server with Arduino IDE. In: Bench test Arduino server 2 LEDs
pc1.pdf, web. 17 Mar 2020
Analysis of Small Loop Antenna Using
Numerical EM Technique

R. Seetharaman and Chaitanya Krishna Chevula

Abstract Maxwell’s equation on an integral note with time dependency as a condi-


tion and angular frequency as a tool forms application in many areas of electro-
magnetism. Maxwell’s equation in phasor and integral forms aids in surface area
and volume measurements in terms of contour representation. This paper focuses
on improving the aspects of a less resonant small loop antenna which is a charac-
teristic of magnetization. Electric field features with an inherent property of plane
waves that pervades in all directions within the surface become an important tool for
analyzing various parameters of this loop antenna. The laid-out boundary conditions
over the surface form an important part in solving the problem. Problem proceeds
with an initial value of directivity and existing radiation pattern. Application of finite
element method to this small loop antenna is the theme of the paper. This will help
in solving the computational domain over the surface area by assigning triangular
meshes and truncation schemes. Further application of conjugate gradient method to
the elements on the surface will show the improved performance of the loop antenna.

Keywords Time dependent Maxwell's equation · Small loop antenna · Finite element mesh · Conjugate gradient method

1 Introduction

Time dependent integral equations, in an amalgamated sense, formed the original Maxwell's equations. The central part of studying these equations lies in the frequency domain, where the fields are represented as harmonic variations. The frequency that makes this happen is referred to as

A_F(ω) = 2πf  (rad/s)    (1)

R. Seetharaman (B) · C. K. Chevula


Department of Electronics and Communication Engineering, CEG Campus, Anna University,
Chennai 600025, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 255


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_19

Table 1 Frequency levels for the dimension of the system

  System       A_F(ω) (rad/s)   Wavelength (m)   D_Si(P) (m)
  Power line   376              5 × 10^6         5 × 10^5
  AM_lf        6.28 × 10^6      300              30
  FM_hf        6.28 × 10^8      3                0.3
  M[μ]f        6.28 × 10^10     0.03             0.003

where A_F(ω) is called the angular frequency. Table 1 shows an application of A_F(ω) to a system whose power radiation must be infinitesimal, with a slowly varying current. This is achieved by fixing A_F(ω) equal to 2πc/D_Si(P), where D_Si(P) is the dimension of the system. The fields associated with the system then lie near the system itself; the problem is in the near-field context.

In Table 1, for the power line, the dimension of the system is 300 miles. AM_lf stands for the low-frequency range of the amplitude modulated radio wave, FM_hf for the high-frequency range of the frequency modulated wave, and M[μ]f for the microwave frequency range.
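The entries of Table 1 follow from A_F(ω) = 2πf of Eq. (1) and λ = c/f. A short sketch reproduces them; the operating frequencies below (60 Hz power line, 1 MHz AM, 100 MHz FM, 10 GHz microwave) are assumed nominal values, not stated explicitly in the paper:

```python
import math

C = 3e8  # speed of light (m/s)

def angular_frequency(f_hz):
    """A_F(omega) = 2*pi*f, in rad/s (Eq. 1)."""
    return 2 * math.pi * f_hz

def wavelength(f_hz):
    """Free-space wavelength lambda = c/f, in metres."""
    return C / f_hz

# Nominal frequencies assumed for the rows of Table 1.
for name, f in [("Power line", 60.0), ("AM_lf", 1e6),
                ("FM_hf", 1e8), ("M[u]f", 1e10)]:
    print(f"{name:10s}  A_F = {angular_frequency(f):.3g} rad/s"
          f"  lambda = {wavelength(f):.3g} m")
```

For the power line row this gives A_F ≈ 377 rad/s and λ = 5 × 10^6 m, matching the table to rounding.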
The relationship between the time harmonic and the time dependent electric field is given by

E_h(x_hr, y_hr, z_hr; t) = Re[E_t(x_t, y_t, z_t) e^{jA_F(ω)t}]    (2)

In Eq. (2), the left hand side can be written component-wise as

x̂ E_xt0 cos(A_F(ω)t + φ_x) + ŷ E_yt0 cos(A_F(ω)t + φ_y) + ẑ E_zt0 cos(A_F(ω)t + φ_z)    (2a)

In both Eqs. (2) and (2a), j = √−1. Following Eqs. (2) and (2a), the electric field in terms of the phasor can be written as

E_∠φ(x_∠φ, y_∠φ, z_∠φ) = x̂ E_xt0 e^{jφ_x} + ŷ E_yt0 e^{jφ_y} + ẑ E_zt0 e^{jφ_z}    (3)
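The phasor relationship in Eqs. (2) and (3) can be verified numerically for a single Cartesian component; the amplitude, phase, and frequency below are arbitrary illustrative values:

```python
import cmath
import math

A_F = 2 * math.pi * 1e6   # angular frequency for f = 1 MHz (rad/s)
E0, phi = 2.5, 0.7        # arbitrary amplitude (V/m) and phase (rad)

phasor = E0 * cmath.exp(1j * phi)  # one Cartesian component of Eq. (3)

# Re[phasor * e^{j*A_F*t}] must equal E0*cos(A_F*t + phi), as in Eq. (2a).
for n in range(8):
    t = n / (8 * 1e6)  # sample points across one period
    instantaneous = (phasor * cmath.exp(1j * A_F * t)).real
    reference = E0 * math.cos(A_F * t + phi)
    assert math.isclose(instantaneous, reference, abs_tol=1e-12)
print("phasor and time-domain forms agree")
```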

In Eq. (3), E_∠φ is referred to as the electric field phasor. Bringing E_∠φ into the time dependent Maxwell's equations yields

∇ × H = J + jA_F(ω)εE
∇ × E = −M − jA_F(ω)μH
∇ · (μH) = ρ_m                    (4)
∇ · (εE) = ρ

where E is the electric field intensity (V/m), H is the magnetic field intensity (A/m), J is the electric current density (A/m²), and M is the magnetic current density (V/m²).

Equation (4) has charge phasors, which are scalar quantities: ρ, the electric charge density, expressed in C/m³, and ρ_m, the magnetic charge density, expressed in Wb/m³. The last two equations of Eq. (4) help to write Maxwell's equations in the phasor form as

∇ × H∠φ = Ji + j A F (ω)εE ∠φ ⎪



∇ × E ∠φ = −Mi − jA F (ω)μH∠φ ⎬
    (5)
∇ · μH∠φ = − ∇ · Mi∠φ /jA F (ω)⎪

  ⎪

∇· = − ∇ · Ji∠φ /jA F (ω) ⎭

where ε = ε₀ε_r. Here, ε₀ represents the free space permittivity, 8.854 × 10⁻¹² F/m, and ε_r represents the medium's relative permittivity. Similarly, μ = μ₀μ_r, where μ₀ represents the free space permeability, 4π × 10⁻⁷ H/m, and μ_r represents the medium's relative permeability.
Also, the subscript i in Eq. (5) represents impressed currents. Equation (5) leads to the integral form of Maxwell's equations:

∮_{C_B} H_∠φ · dl = ∬_{S_C} (J_i + jA_F(ω)εE_∠φ) · dS

∮_{C_B} E_∠φ · dl = −∬_{S_C} (M_i + jA_F(ω)μH_∠φ) · dS

∯_{S_V} μH_∠φ · dS = −∭_{V_eS} (∇ · M_i∠φ)/(jA_F(ω)) dV = ∭_{V_eS} ρ_m∠φ dV    (6)

∯_{S_V} εE_∠φ · dS = −∭_{V_eS} (∇ · J_i∠φ)/(jA_F(ω)) dV = ∭_{V_eS} ρ_∠φ dV

where C_B represents the boundary contour, S_C represents the surface enclosed by the contour, S_V denotes the surface of the volume, and V_eS denotes the volume of the enclosed surface. In Eq. (6), the line and surface integrals are carried out over closed contours and closed surfaces, respectively.

2 Loop Antennas

The electric field present at a point x⃗ from the origin can be written as an integral of plane waves:

E_Pw(x⃗) = ∬_Ω F_Pw(Ω) e^{i k⃗·x⃗} dΩ    (7)

where E_Pw(x⃗) is the electric field at x⃗, taking into consideration all real angles present in the domain Ω; Ω is the solid angle that takes into consideration both elevation and azimuth angles; and e^{i k⃗·x⃗} is the plane wave representation. In Eq. (7), dΩ is formulated as dΩ = sin ξ dξ dθ, and the wavenumber vector k⃗ is given by

k⃗ = −k(x̂ sin ξ cos θ + ŷ sin ξ sin θ + ẑ cos ξ)    (7a)

Geometrically, the electric field in Eq. (7) can now be written as

E_Pw(r̂) = ∫₀^{2π} ∫₀^{π} F_Pw(ξ, θ) e^{i k⃗·r⃗} sin ξ dξ dθ    (8)

Equation (7) contains the angular spectrum component F_Pw(Ω), which is expressed as

F_Pw(Ω) = ξ̂ F_ξPw(Ω) + θ̂ F_θPw(Ω)    (9)

where ξ̂ and θ̂ are orthogonal to each other in a vector sense and are also orthogonal to k⃗. The quantities F_ξPw and F_θPw are complex and can be written, respectively, as

F_ξPw(Ω) = F_ξrPw(Ω) + i F_ξiPw(Ω)    (9a)

F_θPw(Ω) = F_θrPw(Ω) + i F_θiPw(Ω)    (9b)

The average power dissipated over the surface in a period of time is

P_dt = Re ∮_{S_en} (E × H*) · n̂ dS    (10)

which can be further reduced to

P_dt = Re ∮_{S_en} (n̂ × E) · H* dS    (10a)

In Eqs. (10) and (10a), S_en is the enclosed surface, and dS is the elemental surface. With the surface S subjected to the required boundary conditions, we have

n̂ × E ≅ ζH    (11)

Equation (11) holds for the surface S. The Laplacian of the electric field intensity can be written as

∇²E = x̂ ∇²E_x + ŷ ∇²E_y + ẑ ∇²E_z    (12)

Under these circumstances, the angular frequency can also be written as

ω = √(E/(με))    (13)

where E is the eigenvalue of the electric field. In Eq. (11), the quantity ζ is defined as

ζ = √(ωμ₀/(iσ_s))    (14)

where σ_s represents the conductivity of the surface.

Figure 1 illustrates how a loop antenna is designed for practical purposes. Here, the antenna is centered along the z-axis and lies in the xy-plane. The antenna has a directivity D with a predefined value associated with it. The direction of the current flow is indicated along the surface of the loop. The length of the loop antenna is indicated by the factor l; usually, l for a loop antenna is defined in terms of the wavelength. The electric and magnetic fields lie in the respective directions as shown.
Maxwell’s curl equation is

∇ × E = iωB    (15)

where B is the magnetic flux density in Tesla. From Eqs. (7) and (15), we can write

H(r⃗) = (1/(iA_F(ω)μ)) ∇ × E_Pw(r⃗)    (16)

Fig. 1 Diagrammatic representation of the small loop antenna

which can be further solved to

H(r⃗) = (1/η) ∬_Ω k̂ × F_Pw(Ω) e^{i k⃗·r⃗} dΩ    (16a)

where η is the characteristic impedance of free space. The relationship between the mean square magnetic field and the mean square electric field is given as

|H_msq(r⃗₁)|² = |E_msq(r⃗₂)|²/η²    (17)

Directivity is defined by

D(r̂) = 4π P_E(θ, φ) / ∮ P_E(θ, φ) dΩ    (18)

where P_E(θ, φ) is the power emitted toward the elevation and azimuth angles, and ∮ P_E(θ, φ) dΩ is the total power emitted by the system. Under special circumstances, we have

D_r(θ, φ) = (4π/λ²) A_p(θ, φ)    (19)

where A_p(θ, φ) = Power density/Flux density.

Fig. 2 Illustration of azimuth and elevation angle application to small loop antenna
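As a sanity check on Eq. (18), the directivity of the textbook small-loop pattern P_E ∝ sin²ξ can be integrated numerically. The sin² pattern is the standard small-loop assumption rather than something derived in this paper, and the expected value is D = 1.5:

```python
import math

def directivity(pattern, n=2000):
    """Eq. (18): D = 4*pi * max(P_E) / integral of P_E over the sphere."""
    d_xi = math.pi / n
    total = 0.0
    peak = 0.0
    for i in range(n):
        xi = (i + 0.5) * d_xi                 # midpoint elevation sample
        p = pattern(xi)
        peak = max(peak, p)
        # azimuthal symmetry: the theta integral contributes 2*pi
        total += p * math.sin(xi) * d_xi * 2 * math.pi
    return 4 * math.pi * peak / total

D = directivity(lambda xi: math.sin(xi) ** 2)
print(f"small-loop directivity D = {D:.4f}")  # expect ~1.5
```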
Figure 2 gives a physical interpretation by placing the small loop antenna as z-axis
centric and lying in xy-plane in the angular point of view as (ξ, θ ). This gives the
component-wise treatment for receiving Eq. (20) as

C_θ = iA_F(ω)μA_l sin ξ/(2ηR_r)
C_ξ = 0                                   (20)
where Al corresponds to the area of the loop,


Cθ denotes the component of the elevation angle,
Cξ represents the component of the azimuth angle,
Rr denotes the radiation resistance of the loop.
Since the azimuth component of the magnetic component is orthogonal to the
central axis of the loop, we have Cξ = 0. This also happens due to the fact that the
elevation component of the electrical field lies orthogonal to the conducting loop
lying in the xy-plane.
Determining C_θ depends upon the magnetic flux that passes through the loop, the induced voltage, which is determined with the help of −(iω), and the induced current, which is found with the assistance of the radiation resistance. The radiation resistance is given by
 
R_r = (2πη/3)(kA_l/λ)²    (21)
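Equation (21) agrees term-by-term with the familiar small-loop result R_r = 320π⁴(A_l/λ²)² once η = 120π is substituted; a quick numerical cross-check, with an arbitrary loop size and frequency:

```python
import math

ETA = 120 * math.pi  # free-space impedance (ohms)

def rr_eq21(area, lam):
    """Eq. (21): R_r = (2*pi*eta/3) * (k*A_l/lambda)^2."""
    k = 2 * math.pi / lam
    return (2 * math.pi * ETA / 3) * (k * area / lam) ** 2

def rr_textbook(area, lam):
    """Standard small-loop form: R_r = 320*pi^4*(A_l/lambda^2)^2."""
    return 320 * math.pi ** 4 * (area / lam ** 2) ** 2

lam = 30.0                 # wavelength for f = 10 MHz (m)
area = math.pi * 0.5 ** 2  # loop of radius 0.5 m
print(rr_eq21(area, lam))  # ~0.024 ohm; matches rr_textbook
```

The tiny resistance value illustrates why electrically small loops are poor radiators unless matched carefully.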

The polarization factor is given by

P_lr = E_Pw(r⃗) ω²μ²A_l/(12η²R_r)    (22)

3 Finite Element Mesh

Figure 3 illustrates the idea of applying the triangular mesh to the small loop antenna in this problem. Each unit sphere consists of the triangular mesh arrangement. The triangular mesh is chosen because it makes the computational tasks easier. For better performance of this small loop antenna, the finite element mesh method is applied. Complex tasks can be completed, because this problem involves both 3D and 2D cases, thanks to the fast processing of the triangular mesh. It forms the basis functions to be solved within the computational domain. This is aided by the fact that the finite element method needs less memory compared to other techniques and can handle geometric shapes without altering them.
Fields and field patterns are present in all unit spheres with the loop antenna. This defines the computational domain. Dividing this domain into smaller domains with associated boundary conditions paves the way for the next application of the finite element method (FEM). The domain has areas where fields are not only marked, but also areas where fields have to be found.

Fig. 3 Mesh application to loop antenna
 
A_SM0SM[E_Pw(r⃗), H(r⃗)] Υ_E,H = K_BC    (23)

where A_SM0SM denotes a symmetric matrix with few nonzero elements, Υ_E,H represents the characteristic of field patterns, and K_BC is laid out based on the boundary conditions.

A unique aspect of A_SM0SM is the sparsity of the matrix, which helps in the interaction among the elements within the sub-domains and brings out a relation between them. Υ_E,H is supplied with the variables which will be used for finding the solution. This has to be solved under the restriction of the boundary conditions given by K_BC.
Discretization of the computational domain is the key part of FEM. Meshes in the domain are defined in terms of fractions of a wavelength. Weights become part of the mesh elements, which are usually built as a series of expansion functions. This helps to solve for the shape functional. In general, the number of unknown parameters to be determined in FEM depends on this expansion function, as does the memory required for solving the FEM system.

4 Methodology

E_∠φ helps to get the value of the electric field over the surface, while the respective electric and magnetic components of Maxwell's phasor set of equations help to arrive at the result for the surface area and volume of the material. E_Pw(r⃗) gives the plane wave component of the electric field present at a point on the loop antenna. Then, E and H can be calculated over a solid angle. P_dt gives the power available over a particular area. With P_dt, the problem gets solved by calculating the directivity aspect of the small loop antenna. Component-wise results for E and H become the next part of the calculation. With these available parameters, the loop antenna's polarization factor is given by P_lr. These parameters provide the data in this problem.
Minimization over the mesh with the help of the conjugate gradient method is the pivotal part. The conjugate gradient method additionally demands that A_SM0SM be positive definite. Initially, the problem gets solved by setting

A_SM0SM[E_Pw(r⃗), H(r⃗)] Υ_E,H − K_BC = Υ̃_E,H  (the residual)    (24)

The conjugate gradient method solves (24) by setting

Υ_E,H = min Υ̃_E,H    (25)

where Υ̃_E,H denotes the residual of Eq. (24). Equation (25) states that Eq. (24) is solved by a minimization procedure. This takes the problem to the new level of setting

∇Υ̃_E,H = 0    (26)

Equation (26) solves the problem with the aid of successive iterations of the loops, by obtaining the direction values graphically and assigning newer minimized values of Υ_E,H during each iteration with the assistance of coefficients of the directional vector. This cumulative effort of running N iterations over the entire vector space of Eq. (24) works out the minimized values of Υ_E,H. This truncation scheme, running over the entire computational domain, finalizes the small antenna's shape for superior results in communication systems.
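The minimization of Eqs. (24)–(26) is the classical conjugate gradient iteration for a symmetric positive definite system A x = b. A generic sketch follows, with a small dense matrix standing in for the sparse FEM matrix A_SM0SM:

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (lists of lists)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                       # initial residual (Eq. 24 analogue)
    p = r[:]                       # initial search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:           # residual gradient ~ 0 (Eq. 26 analogue)
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

# Small SPD example standing in for the sparse FEM system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # approximately [1/11, 7/11]
```

For an n-dimensional SPD system, CG converges in at most n iterations in exact arithmetic, which is why it pairs well with the sparse symmetric matrices produced by FEM assembly.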

5 Antennas-Gradient Methods

The topological derivative, which is one of the gradient methods, is used for obtaining sliced sections of an image. High-frequency waves can be fine-tuned for operations in the space atmosphere with the help of the Sturm–Liouville operator, which is a differential operator. Weak derivative and distributional derivative methods are used for obtaining splayed-out results in medical images. Piezoelectric material used in ultrasound equipment for medical imaging can also be used for designing antennas.

Diffusion operators can also be used for gradient solutions. Intelligent antennas are used for short-range communications in radio frequency applications [1]. Short antennas can also be used for broadcasting amplitude modulated waves in the mid-frequency range [2]. The short pulse antenna, a type of short antenna, is used for analyzing the radiation of a source using spherical waves [3].
The conical helical antenna, another type of short antenna, can be used not only for communication purposes but also for imaging purposes [4]. Calculating the near-zone magnetic field of a small loop antenna with good accuracy is possible; this magnetic field can be used as a reference for calibrating other field meters [5]. The same is calculated by taking into account the polarization factor [6].

Analyzing the small loop antenna in the range of 3–10 MHz forms an interesting study of its behavioral pattern [7]. Small loop antennas can also be used as probes for investigating magnetic fields [8]. The study of a small loop antenna's radiation efficiency when it is fabricated from a superconductor is quite interesting [9]. A polarization diversity study with the help of a loop antenna forms a gripping application of loop-type antennas [10].

6 Conclusion and Discussion

The phasor form of Maxwell’s equation, its integral form, and plane wave component
of electric field are all taken into account for improving the aspects of small loop
antenna in terms of directivity, power radiation, etc., with the help of I-V angles.
Further treatment of finite element method and conjugate gradient method helps
to achieve superior performance of small loop antennas. Matrix methods help to
solve design elements for small loop antennas and can also be used for other type of
antennas. Electric field expressed in terms of phase is taken into account for the rest
of the problem. Time dependent Maxwell’s equation in the phasor form is considered
for application to the electromagnetic radiating material. Integral form of Maxwell’s
equation gives the required result for calculating the surface area and volume of
the radiating loop antenna. Plane wave component of the electric field, directivity,
and polarization factor further helps in analyzing the performance of the small loop
antenna. Assignment of positive symmetric linear matrix equation for improving
the performance of the loop antenna is the highlight of this problem. Finite element
method is applied to the shape of antenna by simultaneously handling the topology
of small loop antenna. Conjugate gradient method helps to solve the problem for
improved designs of small antennas.

References

1. Mikko S, Pekka KV (2009) Apparatus and method for controlling diverse short range antennas
of a near field communication circuit. US 7,541,930 B2
2. Trainotti V (2001) Short medium frequency AM antennas. IEEE Trans Broadcast 47(3):263–
284
3. Shlivinski A, Heyman E (1999) Time domain near field analysis of pulsed short pulsed antennas.
IEEE Trans Anten Propag 47(2):271–279
4. Nenzi P, Varlamava V, Marzano FS, Fabrizio P (2013) U-Helix: on chip short conical antenna. In: Proceedings of the 7th European conference on antennas and propagation (EuCAP), Gothenburg, Sweden, pp 1289–1293
5. Frank MG (1967) The near-zone magnetic field of a small circular-loop antenna. J Res National
Bureau Standards—C Eng Instru 71C(4):319–326
6. Bhattacharyya BK (1964) Electromagnetic fields of a small loop antenna on the surface of a
polarizable medium. GeoPhysics 29(5):814–831
7. Boswell A, Tyler AJ, White A (2005) Performance of a small loop antenna in the 3–10 MHz
band. IEEE Anten Propag Mag 47(2):51–56
8. Whiteside H, King R (1964) The loop antenna as a probe. IEEE Trans Anten Propag 12(3):291–
297
9. Wu Z, Mehler MJ, Maclean TSM, Lancaster MJ, Gough CJ (1989) High TC superconducting
small loop antenna. Phys C Superconduct Appl 162(01):385–386
10. Kim DS, Hyung Ahn C, Yun T, Sung JL, Kwang CL, Wee Sang P (2007) A windmill-shaped
loop antenna for polarization diversity. In: Proceedings of IEEE antennas and propagation
society international symposium, Honolulu, HI, pp 361–364
A Monopole Octagonal Sierpinski Carpet
Antenna with Defective Ground
Structure for SWB Applications

E. Aravindraj, G. Nagarajan, and R. Senthil Kumaran

Abstract In this article, a notched octagonal-shaped microstrip patch antenna with a Sierpinski Carpet fractal and a defective ground structure is presented. For enhancement, four major alterations are made to a conventional microstrip patch antenna: the octagonal radiating patch enhances the width of the radiating band; the Sierpinski fractal structure results in miniaturization of the antenna size; notches provide a better fringing effect to the radiating element; and the symmetrical defective ground structure (DGS) perturbs the EM fields around the defected area to improve the capacitive and inductive effects. The miniaturized antenna design comprises four iterative Sierpinski Carpet fractal levels in a printed octagonal patch with a symmetrical DGS on a 30 × 30 × 1.6 mm³ dimensional area. This super-wideband (SWB) antenna structure operates at frequencies between 4.1 and 19.8 GHz (S11 ≤ −10 dB; VSWR < 2). Over this band, peak gain and directivity values of 6.1 dBi and 6.45 dBi have been obtained in the simulation analysis made using the Ansys HFSS EM solver 17.2. The antenna is built by the photo-lithographic method and analyzed with a ZNB-20 vector network analyzer. The simulated design offers a maximum fractional bandwidth (FBW) of 131.38% with a bandwidth ratio of 4.82:1. Hence, the proposed antenna covers the C-band (4–8 GHz), X-band (8–12 GHz), and Ku-band (12–18 GHz) and also partially covers the ultra-wideband (UWB) spectrum (3.1–10.6 GHz).

Keywords Bandwidth · DGS · Gain · Octagonal patch · Sierpinski Carpet fractal · Size · SWB

E. Aravindraj (B) · G. Nagarajan


Pondicherry Engineering College, Puducherry, India
e-mail: [email protected]
G. Nagarajan
e-mail: [email protected]
R. S. Kumaran
IFET College of Engineering, Villupuram, Tamilnadu, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 267


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_20

1 Introduction

The demand for wireless communication range has grown to a great extent in day-to-day scenarios. Radiators which operate at very low and very high frequencies with a single-piece element are growing tremendously in demand. Miniaturized versions of conventional antennas will not be effective at higher frequencies. Here, patch antennas are good candidates: they have a low profile and can be operated over a very high frequency range at low cost. The advancement of patch antennas has been taken to a great extent by overcoming limitations such as poor gain and narrow bandwidth [1]. Certain techniques are used at various levels to make the antenna usage effective. In recent years, vendors and operators of various upcoming technologies have been exclusively interested in working with very low and very high frequencies simultaneously. This leads researchers to experiment with super-wideband (SWB) frequencies using a miniaturized low-profile antenna. An antenna that radiates with a ratio of bandwidth above a decade, i.e., 10:1, is known as an SWB antenna [2]. In SWB technology, both low and high frequencies are operated at a high bandwidth dimension ratio (BDR) with small electrical dimensions [3]. Miniaturized versions of antennas suit this kind of advanced usage of frequencies well. A tentative implementation of fractals in conventional patches will make them operate at high and low frequencies in their miniaturized versions [4]. The installation of fractals should remain tentative; it should not go to a great extent, as it affects the easy manufacturability of the antenna. There are many fractal versions available, such as Sierpinski, Minkowski, Koch curves, and Antipodal Vivaldi [5, 6].
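The SWB figures reported for this design follow directly from the band edges: with f_l = 4.1 GHz and f_h = 19.8 GHz, the fractional bandwidth 2(f_h − f_l)/(f_h + f_l) and the bandwidth ratio f_h/f_l can be reproduced as:

```python
f_low, f_high = 4.1e9, 19.8e9  # band edges from the S11 <= -10 dB span (Hz)

fbw = 2 * (f_high - f_low) / (f_high + f_low) * 100  # fractional bandwidth (%)
ratio = f_high / f_low                               # bandwidth ratio

print(f"FBW = {fbw:.2f}%  ratio = {ratio:.2f}:1")
```

This reproduces the quoted FBW of 131.38%; the exact ratio is about 4.83, which the abstract states, truncated, as 4.82:1.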
The most predominantly used one is the Sierpinski fractal; in particular, Sierpinski Carpet fractals are well used in the miniaturization of polygonal shapes. The shape of the patch antenna also plays a vital role in improving the frequency range, and added notch structures improve the current distribution on the printed radiating element. Among the different patch shapes, the octagon specifically cuts off all the corners and helps provide a good fringing effect [7]. The fringing fields are the initiators of the radiation, which is emitted from each corner of the printed patch with the help of the distributed electric field. The fringing effect depends on the width of the patch and the feeding technique. There are various feeding techniques available, namely microstrip inset feed, proximity feed, aperture feed, coplanar feed, and co-axial feed [8]. However, the microstrip inset feed is capable of maintaining decent impedance over the whole range. A defective ground structure (DGS) is a defect integrated into the microwave planar circuit to make the bandwidth wider with a better gain. Although bandwidth and gain are inversely proportional to each other, certain parameters lead both to improve to some extent. This emerging technique is also used for attaining circular polarization and suppressing the higher-mode harmonics [9, 10]. The DGS underneath the microstrip line creates mutual bonding between the feed and the ground. As represented in Fig. 1, the intended structure consists of a Sierpinski Carpet fractal loaded monopole octagonal-shaped patch printed on a flame retardant (FR-4) dielectric medium and a DGS with two truncations on either end of the ground [11].

Fig. 1 Overall design of octagonal Sierpinski Carpet fractal antenna
In addition, fractal structures can be applied as loads, ground structures, counterpoises, etc. Fractal resonators are models which have recently evolved in wideband systems with a negative refractive index; they are familiarly known as metamaterials. A metamaterial structure with close-packed fractal resonators will incorporate a wide band of microwave frequencies. Fractal structures are also introduced in filters for size miniaturization and better rejection. The paper comprises the fractal antenna design methodology in Sect. 2; simulation results such as S11, VSWR, bandwidth, gain, and directivity in Sect. 3; Sect. 4 concludes the paper.

2 Antenna Design and Analysis

On designing a patch antenna, certain design methodologies and principles must be followed, as given below.

2.1 Methodology

The design methodology of the patch antenna involves certain predefined values: the operating frequency (f_r = 5.2 GHz), the thickness of the substrate (t = 1.6 mm), the permittivity of the dielectric material (ε_r = 4.4), and the velocity of light (c = 3 × 10⁸ m/s). The width and length of the patch are essential values for designing a patch antenna [12, 13].

Table 1 Optimized geometry of octagonal patch antenna

  Antenna parts   Specifications    Dimensions (mm)
  Patch           Width (W)         15
                  Length (L)        15
  Ground plane    Width (Wg)        30
                  Length (Lg)       5
  Substrate       Length (Ls)       30
                  Width (Ws)        30
                  Thickness (t)     1.6
  Feed            Width (Wf)        1.8
                  Length (Lf)       6.5

W = c/(2f_r √((ε_r + 1)/2))    (1)

L = c/(2f_r √ε_eff) − 2 × [0.412h(ε_eff + 0.3)(W/h + 0.264)]/[(ε_eff − 0.258)(W/h + 0.8)]    (2)

As represented in Table 1, the patch width is denoted as W and the patch length as L [14, 15]. Figure 2a and b represents the length and width of the substrate and ground plane, denoted as (W s and L s) and (W g and L g), respectively, where the effective dielectric constant ε_eff is given as

ε_eff = (ε_r + 1)/2 + ((ε_r − 1)/2) [1 + 12 h/W]^(−1/2)          (3)
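The design equations above can be checked numerically. The following Python sketch is purely illustrative (the paper's own analysis uses an EM simulator); it evaluates Eqs. (1)–(3) for the stated design values f r = 5.2 GHz, εr = 4.4, t = 1.6 mm:

```python
import math

def patch_dimensions(f_r, eps_r, h, c=3e8):
    """Analytical patch width, effective permittivity and length, Eqs. (1)-(3)."""
    W = c / (2 * f_r * math.sqrt((eps_r + 1) / 2))                          # Eq. (1)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5  # Eq. (3)
    # Fringing-field length extension, then Eq. (2)
    dL = 0.412 * h * (eps_eff + 0.3) * (W / h + 0.264) / ((eps_eff - 0.258) * (W / h + 0.8))
    L = c / (2 * f_r * math.sqrt(eps_eff)) - 2 * dL
    return W, eps_eff, L

W, eps_eff, L = patch_dimensions(f_r=5.2e9, eps_r=4.4, h=1.6e-3)
print(round(W * 1e3, 2), round(eps_eff, 3), round(L * 1e3, 2))  # 17.56  3.875  13.2
```

These analytical values (W ≈ 17.6 mm, L ≈ 13.2 mm) are the starting point; the 15 × 15 mm patch of Table 1 is the optimized geometry.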

Fig. 2 Optimized specifications of octagonal patch antenna: a front view (patch), b back view (ground plane)
A Monopole Octagonal Sierpinski Carpet Antenna with Defective … 271

Fig. 3 Dimensions of designed antenna: a front view, b back view

2.2 Configuration

The front view and the back view design of the intended antenna are pictured in
Fig. 3a, b accordingly with respect to geometric representations [16, 17]. The octag-
onal patch is traced on a FR-4 dielectric material with permittivity (Er = 4.4), loss
tangent (tan δ = 0.02), and thickness (t = 1.6 mm). The antenna design occupies
compact area in the circuit board [18–20].

2.3 Operating Principle

Major modifications must be made in the conventional patch antenna to miniaturize the size and enhance the bandwidth. As explained in the last section, a symmetric octagonal patch has been designed. Then, the Sierpinski Carpet fractal structure is introduced in increasing-order iterations as shown in Fig. 4 [21, 22]. Octagon-shaped cuts are made on the patch. Depending on the level of iteration, the octagons get multiplied. The iteration levels are introduced using the iteration function system

Fig. 4 Iterations (0)–(4) of the Sierpinski Carpet fractal patch



Table 2 Dimension of slots in Sierpinski Carpet fractal structure for each iteration

Iteration       Dimensions
0th iteration   None
1st iteration   3.6 × 3.6 mm2
2nd iteration   1.8 × 1.8 mm2
3rd iteration   0.9 × 0.9 mm2
4th iteration   0.45 × 0.45 mm2

(IFS) manner. As represented in Table 2, the dimensions of the octagons and the
space between the octagons are decided using IFS. The Hausdorff dimension of the
carpet is represented as

a_{i+1} = (d / 3^k) a_i          (4)

S = log d / log 3          (5)

where
a_i = (d/3^k)^i   denotes the area at iteration i
k                 number of iterations
d                 number of slots in that iteration
S                 size of the particular slot.
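The iteration scaling can be sketched numerically. In the Python sketch below (illustrative only), the first-iteration slot size of 3.6 mm and the factor-of-2 scaling are read off Table 2, and the log d/log 3 expression of Eq. (5) is evaluated with d = 8, the classic Sierpinski Carpet that keeps 8 of 9 sub-squares at each level:

```python
import math

# Slot size per iteration; 3.6 mm and the halving per level are taken from Table 2.
def slot_size(iteration, s1=3.6):
    return s1 / 2 ** (iteration - 1)

sizes = [slot_size(i) for i in range(1, 5)]
print(sizes)  # [3.6, 1.8, 0.9, 0.45] as in Table 2

# Eq. (5) with d = 8: Hausdorff dimension of the classic Sierpinski Carpet
S = math.log(8) / math.log(3)
print(round(S, 4))  # 1.8928
```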

L_n = 1 / (4π² f_0n² C_n)          (6)

C_n = f_c / [2 Z_0 · 2π (f_0n² − f_c²)]          (7)

C_P = 1 / (2π f_T X_{n(n−1)})          (8)

L_sn = (X_n2 − X_n1) / (2π f_T) + L_i / ((f_r / f_0n)² − 1)          (9)

As shown in Fig. 5, the DGS in the ground induces a parallel combination of capacitance (C n) and inductance (L n) due to the dielectric slit between the metal layers. The slots made under the microstrip line provide a parallel capacitance. A series inductance is also induced; as the frequency increases, the reactance of the transmission line increases and that of the capacitance decreases accordingly [23–25]. The

Fig. 5 Equivalent circuit of defective ground structure

defective ground structure will provide better impedance matching. Due to the varia-
tion in capacitance, the electric field gets changes. Due to surface waves and fringing
fields, L s1 and L s2 enable some inductance. The series inductance and parallel capac-
itance initiate attenuation pole and eliminate certain frequency signal, which leads
multiple successive resonance frequencies of L n and C n resonators. Therefore, due
to the current distribution in two paths in both the side of transmission line, the
bandwidth of the antenna enhances to high signal region [26, 27].
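The one-pole equivalent-circuit relations of Eqs. (6)–(7) (the standard DGS circuit model, cf. [10]) can be checked with a quick numeric sketch. The example frequencies below are hypothetical, not taken from the paper:

```python
import math

def dgs_lc(f0, fc, Z0=50.0):
    """Parallel L and C of a one-pole DGS equivalent circuit (cf. Eqs. (6)-(7)).

    f0: attenuation-pole (resonance) frequency; fc: 3-dB cutoff frequency.
    """
    C = fc / (4 * math.pi * Z0 * (f0 ** 2 - fc ** 2))  # capacitance, farads
    L = 1 / (4 * math.pi ** 2 * f0 ** 2 * C)           # inductance, henries
    return L, C

# Hypothetical example: attenuation pole at 10 GHz, cutoff at 7 GHz
L, C = dgs_lc(f0=10e9, fc=7e9)
print(round(L * 1e9, 3), "nH,", round(C * 1e12, 3), "pF")

# Sanity check: by construction the parallel L-C tank resonates back at f0.
f_res = 1 / (2 * math.pi * math.sqrt(L * C))
```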

3 Results and Discussion

Comparing the four iterative levels of the Sierpinski fractal structure in terms of the S11 parameter (i.e., return loss versus frequency), it is understood that as the iteration increases, the S11 output characteristics also improve. Both the impedance bandwidth and the bandwidth dimension ratio increase comparatively due to the effect of the ground plane and the fractal structure (Fig. 6).
Voltage standing wave ratio is another form of the reflection coefficient; its value is always greater than or equal to 1. It is a critical benchmark to determine how much radio energy is reflected back toward the source. Like the reflection coefficient criterion, the VSWR value should be less than 2 for efficient radiation, which shows that iteration 4 has a wider bandwidth and a smaller reflection coefficient (Fig. 7).
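The two matching thresholds quoted throughout (S11 ≤ −10 dB and VSWR < 2) are tied together by the standard relation VSWR = (1 + |Γ|)/(1 − |Γ|); a short illustrative Python helper makes this explicit:

```python
def vswr_from_s11_db(s11_db):
    """VSWR from return loss: |Gamma| = 10**(S11/20), VSWR = (1+|Gamma|)/(1-|Gamma|)."""
    gamma = 10 ** (s11_db / 20)
    return (1 + gamma) / (1 - gamma)

# S11 = -10 dB corresponds to VSWR ~ 1.92, i.e. below the VSWR < 2 criterion
print(round(vswr_from_s11_db(-10), 2))  # 1.92
print(round(vswr_from_s11_db(-20), 2))  # 1.22
```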
The gain at different iterative levels is shown in Fig. 8, which represents that an increase in iterative level improves the gain values. Directivity, the ability to concentrate radiated energy in a particular direction, is represented in Fig. 9. There are different resonance frequencies such as 5.2, 7.5, 9.9, 12.9, 16.1, and 18.6 GHz. Among all these resonance frequencies, the simulation is carried out at the 10 GHz band for better performance. The designed antenna structure produces a peak gain of 6.28 dBi at 19 GHz and a peak directivity of 6.66 dBi at 17 GHz. Figure 10
represents the radiation pattern at azimuth and elevation angles for all five iterations.
The bandwidth of the antenna increases gradually with the iterative level, as presented in Table 3. The bandwidth ratio and fractional bandwidth are also analyzed. The bandwidth ratio represents the ratio of the higher frequency

Fig. 6 S11 versus frequency

Fig. 7 VSWR versus frequency

to the lower frequency radiated by the antenna. The fractional bandwidth implies the
ratio between the impedance bandwidth and center frequency.
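These two figures of merit follow directly from the band edges f L and f H. A small illustrative Python check reproduces the 4th-iteration row of Table 3:

```python
def bandwidth_figures(f_low_ghz, f_high_ghz):
    """Bandwidth ratio f_H : f_L and fractional bandwidth about the centre frequency."""
    bw_ratio = f_high_ghz / f_low_ghz
    fbw = (f_high_ghz - f_low_ghz) / ((f_high_ghz + f_low_ghz) / 2) * 100  # percent
    return bw_ratio, fbw

# 4th-iteration band edges from Table 3
ratio, fbw = bandwidth_figures(4.1, 19.8)
print(round(ratio, 2), round(fbw, 2))  # 4.83 131.38 (the paper quotes 4.82:1)
```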
The current distribution in the antenna for different iterative levels is compared at the 10 GHz resonance frequency. Figure 11 represents the current distribution concentrated on the external and internal margins of the octagonal slots in the patch (radiating element). This feature affects the matching of the antenna impedance with the transmission line. Thus, the DGS, with slots etched in the ground plane, mitigates this matching problem.

Fig. 8 Gain versus frequency

Fig. 9 Directivity versus frequency

The antenna model has been fabricated and examined with an R&S vector analyzer (ZNB-20), which is capable of measuring from 100 Hz to 20 GHz. The measured antenna shows promising agreement with the simulated antenna design. Figure 12a, b shows the fabricated model of the monopole notched octagonal Sierpinski fractal antenna with DGS for SWB applications. Figure 13 shows the output waveform obtained on measuring the antenna with the ZNB-20 vector analyzer (100 Hz to 20 GHz).

Fig. 10 Radiation pattern at different iterative levels: a 0th iteration, b 1st iteration, c 2nd iteration, d 3rd iteration, and e 4th iteration

Table 3 Bandwidth comparison between different iterations


Antenna f L (GHz) f H (GHz) BW ratio FBW (%)
0th iteration 4.5 19.8 4.4:1 125.95
1st iteration 4.1 19.3 4.7:1 129.91
2nd iteration 4.3 19.6 4.55:1 128.03
3rd iteration 4.4 19.8 4.5:1 127.27
4th iteration 4.1 19.8 4.82:1 131.38

The measured S11 output of the antenna is shown in Fig. 13. The measured results give almost the same values in terms of S11, namely 4.1–19.8 GHz (S11 ≤ −10 dB; VSWR ≤ 2). The fabricated model occupies a total area of only 30 × 30 × 1.6 mm3 but covers a huge frequency range of around 15.7 GHz bandwidth. Since it is fabricated with FR-4, it is reliable and durable in the environment.
Table 4 presents a comparison with some recent developments in SWB antenna design with fractal structures. The size is miniaturized to 30 × 30 mm2, which gives the antenna a compact design, and good bandwidth and gain values of 15.7 GHz and 6.281 dBi, respectively, are obtained.

Fig. 11 Current distribution at different iterative levels: a 0th iteration, b 1st iteration, c 2nd iteration, d 3rd iteration, and e 4th iteration

Fig. 12 Fabricated design model of monopole notched octagonal Sierpinski fractal antenna with DGS: a front view, b back view

4 Conclusion

In this work, a notched octagonal-shaped microstrip patch antenna with a Sierpinski Carpet fractal and defective ground structure is presented. The miniaturized antenna design has a 30 × 30 × 1.6 mm3 dimensional area. This SWB antenna structure operates at frequencies between 4.1 and 19.8 GHz (S11 ≤ −10 dB; VSWR < 2).

Fig. 13 Measured output of the antenna prototype in ZNB-20 vector analyzer

Table 4 Comparison between some recent developments in SWB antenna design and proposed work

Reference no   Normalized size (mm3)   Frequency range (GHz)   Fractional BW (%)   Peak gain (dBi)   BW ratio
[16]           120 × 120 × 1.6         0.70–4.71               148.14              3.9               16.72:1
[17]           28 × 28 × 1.6           3.5–15.1                124.73              3.5               4.31:1
[18]           24 × 22 × 1.57          3.1–10.9                111.42              4.1               3.406
[19]           18.5 × 39 × 1.6         3.2–12                  115.78              4                 3.66:1
[20]           30 × 24.8 × 1.6         2.6–10.8                122.38              1.2               4.15:1
[21]           40 × 30 × 3             2.23–3.1                32.64               3.6               1.39:1
[22]           52 × 42 × 0.94          0.96–13.98              174.29              4.2               14.56
[23]           52 × 42 × 1.575         0.96–10.9               167.62              3.1               11.47
[24]           52 × 46 × 1.6           0.95–13.8               174.23              4.5               14.2:1
[25]           150 × 150 × 0.5         0.64–1.6                84.61               5.3               2.5:1
[26]           40 × 38 × 1.6           2.25–11.05              132.33              5.05              4.91:1
Prop. work     30 × 30 × 1.6           4.1–19.8                131.38              6.1               4.82:1

Between the respective frequencies, peak gain and directivity values of 6.1 dBi and 6.45 dBi are obtained. The antenna is built by the photolithographic method and analyzed with a ZNB-20 vector analyzer (100 Hz to 20 GHz). The measured antenna shows good agreement with the simulated antenna design. The intended antenna model offers a maximum fractional bandwidth of 131.38% with a bandwidth ratio of 4.82:1, within which various applications can be accommodated. Hence, the proposed antenna covers the C-band (4–8 GHz), X-band (8–12 GHz), and Ku-band (12–18 GHz) and also partially covers the ultra-wideband (UWB) spectrum (3.1–10.6 GHz).

References

1. Karmakar (2020) Fractal antennas and arrays: a review and recent developments. Int J
Microwave Wireless Technol 12(7):1–25
2. Rahman SU, Cao Q, Ullah H (2018) Compact design of trapezoid shape monopole antenna for
SWB application. Microwave Opt Technol Lett 61(8):1931–1937
3. Darimireddy NK, Ramana Reddy R (2018) A miniaturized hexagonal-triangular fractal antenna
for wide-band applications. Int J Antenna Propag 60(2):101–110
4. Dong Y, Hong W, Liu L (2009) Performance analysis of a printed super-wideband antenna.
Microwave Opt Technol Lett 51(4):949–956
5. Moosazadeh M, Kharkovsky S (2017) Antipodal Vivaldi antenna with improved radiation
characteristics for civil engineering applications. IET Microwaves Antennas Propag 11(6):796–
803
6. Wang Z, Yin Y, Wu J, Lian R (2015) A miniaturized CPW-fed antipodal vivaldi antenna with
enhanced radiation performance for wideband applications. IEEE Antennas Wireless Propag
Lett 15(3):16–19
7. Rahman MM, Islam MR (2019) A compact design and analysis of a fractal microstrip antenna
for ultra wideband applications. American J Eng Res 8(10):45–49
8. Zaidi NI, Ali MT, Abd Rahman NH (2019) Analysis of different feeding techniques on textile
antenna. In: 2019 International symposium on antennas and propagation, IEEE Xplore, Xi’an,
China, pp 1–3
9. Khandelwal MK (2017) Defected ground structure: fundamentals, analysis, and applications
in modern wireless trends. Int J Antenna Propag 17(2):1–23
10. Hong JS, Karyamapudi BM (2005) A general circuit model for defected ground structures in
planar transmission lines. IEEE Microwave Wireless Components Lett 15(10):706–708
11. Ali T, Mohammad Saadh AW (2017) A miniaturized metamaterial slot antenna for wireless
applications. AEU Int J Electron Commun 82(12):368–382
12. Aravindraj E, Ayyappan K (2017) Design of slotted H-shaped patch antenna for 2.4 GHz
WLAN applications. In: International Conference on Computer Communication Information
IEEE Xplore, Coimbatore, India, pp 1–5
13. Aravindraj E, Ayyappan K, Kumar R (2017) Performance analysis of rectangular MPA using
different substrate materials For WLAN application. ICTACT J Commun Technol 8(1):1447–
1452
14. Jena MR, Mishra GP (2019) Fractal geometry and its application to antenna designs. Int J Eng
Adv Technol 9(1):3726–3743
15. Balanis CA Antenna theory: analysis and design, pp 811–882
16. Wang F, Bin F, Sun Q (2017) A Compact UHF antenna based on complementary fractal
technique. IEEE Access Multidiscip 10(9):21118–21125
17. Ali T, Subhash BK (2018) A miniaturized decagonal Sierpinski UWB fractal antenna. Prog
Electromag Res 85(7):161–174

18. Soleimani H, Orazi H (2017) Miniaturization of UWB triangular slot antenna by the use of
dual-reverse-arrow fractal. IET Microwaves Antennas Propag 11(4):450–456
19. Gorai A, Pal M, Ghatak R (2017) A Compact fractal shaped antenna for ultra wideband and
bluetooth wireless systems with WLAN rejection functionality. IEEE Antennas Wirel Propag
Lett 16(5):2163–2166
20. Ali T, Mohammad Saadh AW (2018) A miniaturized slotted ground structure UWB antenna
for multiband application. Microwave Opt Technol Lett 60(8):2060–2068
21. Sur D, Sharma A (2019) A novel wideband Minkowski fractal antenna with assistance of
triangular dielectric resonator elements. Int J RF Microwave Comput Aided Eng 29(2):1–8
22. Okas P, Sharma A, Gangwar RK (2017) Circular base loaded modified rectangular monopole
radiator for super wideband application. Microwave Opt Technol Lett 59(10):2421–2428
23. Okas P, Sharma A, Das G, Gangwar RK (2018) Elliptical slot loaded partially segmented
circular monopole antenna for super wideband application. Int J Electron Commun 5(88):63–69
24. Okas P, Sharma A, Gangwar RK (2018) Super-wideband CPW fed modified square monopole
antenna with stabilized radiation characteristics. Microwave Opt Technol Lett 60(3):568–575
25. Dong Y, Hong W, Liu L, Zhang Y, Kuai Z (2019) Performance analysis of a printed super-
wideband antenna. Microwave Opt Technol Lett 51(4):949–956
26. Syeed MAA, Samsuzzaman M (2018) Polygonal shaped patch with circular slotted ground
antenna for ultra-wideband applications. In: 2018 International conference on computer,
communication, chemical, material and electronic engineering (IC4ME2). IEEE Xplore,
Rajshahi, Bangladesh, pp 1–4
27. Aravindraj E, Nagarajan G, Senthil Kumaran R (2020) Design and analysis of recursive square
fractal antenna for WLAN applications. In: 2020 International conference on emerging trends
in information technology and engineering. IEEE Xplore, Vellore, India, pp 1–5
DFT Spread C-DSLM for Low PAPR
FBMC with OQAM Systems

K. Ayappasamy, G. Nagarajan, and P. Elavarasan

Abstract Filter bank multicarrier (FBMC) is a multicarrier scheme that segments a large frequency spectrum into several narrow subchannels. During the coherent addition at the transmitter, the peak power can reach N times the average power of all N subcarriers, which drives the high-power amplifier out of its linear region. To quantify this, the ratio of peak power to average power (PAPR) is evaluated, and the selective mapping (SLM) scheme is used to reduce it. However, the computational complexity of SLM is found to be impractically high. Therefore, a discrete Fourier transform (DFT) spread, conversion-vector-based, dispersive SLM, named DFT spread C-DSLM, is proposed for lower PAPR and lower computational complexity in FBMC with offset QAM. Simulation results show that parameters such as PAPR, BER, and computational complexity are improved, so the proposed DFT spread C-DSLM scheme offers better performance than the existing one.

Keywords Complexity · DFT spread · FBMC with OQAM · PAPR

1 Introduction

Cellular mobile wireless communication systems must support a variety of application scenarios simultaneously within an unrestricted framework. The filter bank multicarrier system is one of the potential systems to cater to the needs of wireless communication. High peak power is the major issue in the FBMC scheme [1–3]. Partially transmitted sequences (PTS) [4, 5] and SLM [6–8] are two distinctive schemes to reduce PAPR in FBMC. Pruned DFT spread FBMC schemes have been found better [9,
K. Ayappasamy (B) · G. Nagarajan


Department of ECE, Pondicherry Engineering College, Puducherry 605014, India
G. Nagarajan
e-mail: [email protected]
P. Elavarasan
Department of ECE, RGCET, Puducherry 607403, India

© Springer Nature Singapore Pte Ltd. 2021 281


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_21
282 K. Ayappasamy et al.

10]. In addition, when a smaller number of subcarriers is used, a cyclic prefix (CP) is still needed to reduce the ISI. In general, FBMC suffers from high complexity [11, 12] and high PAPR [13]. Therefore, the PAPR and complexity must be reduced for a better system.
There are various PAPR reduction methods [14, 15], categorized as signal distortion methods [16, 17] and non-signal distortion classes [18–23]. The modified PTS-based methods are primarily included in the probabilistic schemes [24–26], along with SLM-based methods [27, 28]. The dispersion-based selective mapping (DSLM) method was proposed in order to reduce PAPR. Also, the C-DSLM method was proposed to generate candidate signals by multiplying the original signal with cyclically shifted conversion vectors [29, 30]. The remainder of the paper is organized as follows: a brief FBMC with OQAM model and its problems are presented in Sect. 2. The conventional SLM method is described in Sect. 3. Section 4 describes the proposed DFT spread C-DSLM method. The comparison of simulated results of the existing and proposed schemes is given in Sect. 5. Section 6 concludes and summarizes the work.

2 System Model

The FBMC with OQAM system has proved better than OFDM, which suffers from low frequency utilization because of the cyclic prefix and poor out-of-band suppression; it is also a strong candidate for 5G systems.

2.1 FBMC with OQAM System Model

The signal model of FBMC with OQAM using an FIR prototype filter h(t) is represented as

x(t) = Σ_{n1=0}^{N−1} Σ_{m∈Z} a_{n1,m} h(t − m T/2) e^{j2πn1t/T} e^{jθ_{n1,m}}          (1)

The filter length L is a multiple of K and N, where N is the number of subcarriers and K is the overlapping factor. The phase angle θ_{n1,m} = π/2 (n + m) is positioned between information symbols and subcarriers [2, 3]. In x(t), the symbol interval is T/2. The subcarriers are split by a PPN and fed to a parallel-to-serial convertor, and the signal vectors are equal in length to the filter, as in Fig. 1a, b.
DFT Spread C-DSLM for Low PAPR FBMC with OQAM Systems 283

Fig. 1 a Transmitter of FBMC scheme. b Receiver of FBMC scheme

2.2 Overlapping Structured FBMC with OQAM Signals

The FBMC with OQAM signal consists of M symbols (M even) on N subcarriers, combined by the superposition principle. The real and imaginary parts produced by OQAM are converted from serial to parallel using the vectors G = [d_0, d_1, d_2, …, d_{(2M/4)−1}] ∈ C^{N×(2M/4)}, where d_m = [d_0^m, d_1^m, …, d_{N−1}^m] and d_n^m = a_n^m + j b_n^m combines the real part a_n^m and the imaginary part b_n^m. The data matrix G is redefined with elements d_n^m allocated by the following equation:

d_{n1}^m = a_{n1}^{m/2}          for m even, up to M − 2
         = b_{n1}^{(m−1)/2}      for m odd, up to M − 1          (2)

Fig. 2 Power comparison of FBMC with OQAM and OFDM symbols

The continuous-time FBMC with OQAM signal s(t) is given as

s(t) = Σ_{m=0}^{M−1} Σ_{n1=0}^{N−1} d_{n1}^m e^{jπ(m+n1)/2} e^{j2πn1t/T} g(t − m T/2)          (3a)

where

g_{m,n1}(t) = e^{jπ(m+n1)/2} e^{j2πn1t/T} g(t − m T/2)          (3b)

Typically, L_g is the product of L and N, where L represents the overlapping factor and is assigned even values greater than 4. The value of N should be greater than L_g to avoid overlapping. The comparison of overlapping for different symbols of OFDM and FBMC, with respect to mean power, is shown in Fig. 2. Also, FBMC symbols overlapping with T/2 duration are shown in Fig. 3, where T is one FBMC symbol period.

2.3 The PAPR Description in FBMC with OQAM System

The transmitted FBMC signal s(t) is divided into multiple segments with a time period T. Then, the PAPR is computed as

Fig. 3 Overlapping in FBMC with OQAM (L = 4)

PAPR(s(t)) = 10 log10 [ max_{iT ≤ t ≤ (i+1)T} |s(t)|² / E{|s(t)|²} ]  (dB)          (4)

The signal is oversampled by an oversampling factor O ≥ 4 in order to capture the variability and peak excursions of the transmitted signal and obtain a precise PAPR value. The complementary cumulative distribution function (CCDF) is a commonly used statistical indicator to describe the PAPR of a signal. It represents the probability that the PAPR exceeds a threshold Z.
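Eq. (4) and the empirical CCDF can be illustrated with a short Monte-Carlo sketch. This Python example is illustrative only (the paper's simulations use MATLAB and the full FBMC waveform; here a plain oversampled multicarrier segment with random QPSK subcarriers is used):

```python
import numpy as np

rng = np.random.default_rng(0)

def papr_db(x):
    """Eq. (4): peak-to-average power ratio of one signal segment, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

# Illustrative multicarrier segments: N random QPSK subcarriers on the lower
# band of an O*N-point IFFT (zero padding acts as time-domain oversampling).
N, O, trials = 64, 4, 2000
paprs = np.empty(trials)
for i in range(trials):
    X = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], N)
    x = np.fft.ifft(X, O * N)
    paprs[i] = papr_db(x)

# Empirical CCDF value at a threshold z: Pr{PAPR > z}
z = 8.0
ccdf_at_z = float(np.mean(paprs > z))
print(round(float(np.median(paprs)), 1), "dB median PAPR;", ccdf_at_z, "= CCDF at 8 dB")
```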

3 Selective Mapping (SLM) Scheme

This scheme is a well-established method to reduce PAPR in multicarrier modulation.

Fig. 4 Conventional SLM scheme

3.1 Conventional SLM Scheme

The message stream x = [x_1[0], x_2[1], …, x_n[N − 1]] is multiplied by U distinct phase factor vectors ph^u = [ph^u_0, ph^u_1, …, ph^u_{N−1}], where u varies from 1 to U, p^u_v = e^{jθ^u_v} for v from 0 to N − 1, and θ^u_v lies in [0, 2π). This produces U altered sequences x^u = P^u ⊙ x = [P^u_0 · X[0], P^u_1 · X[1], …, P^u_{N−1} · X_n[N − 1]]. The time-domain symbols, denoted x^u = [x^u_0[0], x^u_1[1], …, x^u_n[N − 1]], are obtained by applying the IFFT to the uth independent frequency-domain arrangement x^u. The FBMC signal x^ũ with the lowest PAPR is selected; the parameter ũ is determined as in Eq. (5):

ũ = arg min_{u=1,2,…,U} PAPR(x^u)          (5)

In order to recover the transmitted signal at the receiving end, the index ũ of the selected phase factor p^ũ is sent as side information, as shown in Fig. 4.
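A minimal sketch of the conventional SLM selection of Eq. (5) is given below (illustrative Python; random QPSK data, phase factors from {±1, ±j}, and the all-ones vector included as the first candidate so the selected PAPR never exceeds that of the original signal):

```python
import numpy as np

rng = np.random.default_rng(1)

def papr_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def slm(X, U, rng):
    """Conventional SLM (Eq. (5)): try U phase-rotation vectors, keep the
    candidate with the lowest PAPR. Candidate 0 uses the all-ones vector."""
    best_papr, best_u, best_x = np.inf, 0, None
    for u in range(U):
        p = np.ones(len(X)) if u == 0 else np.exp(1j * rng.integers(0, 4, len(X)) * np.pi / 2)
        x = np.fft.ifft(p * X)   # time-domain candidate x^u
        val = papr_db(x)
        if val < best_papr:
            best_papr, best_u, best_x = val, u, x
    return best_papr, best_u, best_x

N, U = 64, 8
X = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], N)
baseline = papr_db(np.fft.ifft(X))
papr_sel, u_sel, x_sel = slm(X, U, rng)
print(round(baseline, 2), "->", round(papr_sel, 2), "dB; side information u =", u_sel)
```

The index `u_sel` is the side information ũ that must reach the receiver.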

3.2 The SLM with Converse Vectors (C-SLM)

In the C-SLM scheme, the candidate signals are produced using a cyclically shifted product, performing the frequency-domain phase rotation directly in the time domain as denoted in Fig. 5, which corresponds to the following equations:

x = IFFT_N{x} = F x          (6)

x^u = IFFT_N{p^u ⊙ x} = F Q^u x          (7)

Fig. 5 Conversion vector-based SLM (C-SLM)

        ⎛ 1   1             1              …   1              ⎞
        ⎜ 1   W_N^{−1}      W_N^{−2}       …   W_N^{−(N−1)}   ⎟
F = 1/N ⎜ 1   W_N^{−2}      W_N^{−4}       …   W_N^{−2(N−2)}  ⎟          (8)
        ⎜ ⋮   ⋮             ⋮              ⋱   ⋮              ⎟
        ⎝ 1   W_N^{−(N−1)}  W_N^{−2(N−1)}  …   W_N^{−(N−1)²}  ⎠

where W_N = e^{−j2π/N}. The inverse of (6) is denoted in (9):

x = F −1 x (9)

where F^{−1} is the fast Fourier transform; Eq. (7) is then rewritten as

x^u = F Q^u F^{−1} x = C^u x          (10)

where Q^u denotes a square matrix whose elements are the phase rotation factors, and F denotes the N-point IFFT matrix shown in Eq. (8).
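The equivalence stated by Eq. (10) — phase rotation in the frequency domain versus circular convolution with the conversion vector c^u = IFFT(p^u) in the time domain — can be verified numerically. An illustrative Python sketch with a small N and random QPSK data:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
X = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], N)  # frequency-domain symbols
p = rng.choice(np.array([1, -1, 1j, -1j]), N)          # phase factors p^u

xu_freq = np.fft.ifft(p * X)        # candidate via frequency-domain rotation, Eq. (7)

c = np.fft.ifft(p)                  # conversion vector c^u
n = np.arange(N)
Cu = c[(n[:, None] - n[None, :]) % N]   # circulant matrix: rows are cyclic shifts of c^u
xu_time = Cu @ np.fft.ifft(X)       # candidate via time-domain convolution, Eq. (10)

print(np.allclose(xu_freq, xu_time))  # True: both routes give the same candidate
```

This is just the DFT convolution theorem: elementwise multiplication in one domain is circular convolution in the other.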

3.3 Conversion Vector-Based Dispersive SLM (C-DSLM) Schemes with FBMC with OQAM

A conversion-vector-based dispersive SLM (C-DSLM) method is used for low PAPR. In C-DSLM, the phase factors are produced in the same manner as in the normal SLM method; for the present mth sequence, the best phase factors are determined over an interval T_0 using the following equations. The oversampled past symbols are denoted as

 
s_o(t) = Σ_{m′=0}^{2m−1} Σ_{n=0}^{N−1} x^{μmin}_{m′,n} e^{j(2π/T)nt} e^{j∅_{m′,n}} g(t − m′ T/2)          (11a)

The current symbols are denoted as

s_c(t) = Σ_{m′=2m}^{2m+1} Σ_{n=0}^{N−1} x^{u}_{m′,n} e^{j(2π/T)nt} e^{j∅_{m′,n}} g(t − m′ T/2)          (11b)

The selection of conversion vectors depends on the sum of the oversampled past symbols and the current symbols.

4 Proposed DFT Spread C-DSLM for FBMC–OQAM System

The DFT spread C-DSLM scheme for minimum PAPR in FBMC with OQAM is proposed. DFT spreading is explained in Sect. 4.1. Then, the conversion vectors are designed and constructed, and the precise steps of the method are described.

4.1 DFT Spreading

The DFT spreading applied to the conventional C-DSLM method is shown in Fig. 6. The outputs of the OQAM modulation are fed to the inputs of the discrete Fourier transform (DFT), and the outputs of the DFT are applied to the conversion vectors module. The extended implementation of DFT spreading is shown in Fig. 7. The incoming data d_{(0:ND−1),m} are the mth real- and imaginary-sequenced vectors of length N·D. The inverse DFT with a poly-phase network is used to produce the output sequences; in general, the poly-phase network performs the summation operations.
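The envelope-smoothing effect of DFT spreading can be seen in a toy Python comparison (illustrative only; with full N-point spreading onto N subcarriers and no oversampling, the spread signal collapses back to the single-carrier QPSK stream, whose envelope is constant, i.e. 0 dB PAPR — partial subcarrier mapping, oversampling, or pulse shaping gives intermediate values):

```python
import numpy as np

rng = np.random.default_rng(3)

def papr_db(x):
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

N, trials = 64, 500
plain = np.empty(trials)
spread = np.empty(trials)
for i in range(trials):
    d = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], N)
    plain[i] = papr_db(np.fft.ifft(d))                 # symbols straight onto subcarriers
    spread[i] = papr_db(np.fft.ifft(np.fft.fft(d)))    # DFT-spread before the IFFT

# Full spreading undoes the IFFT, leaving the constant-envelope QPSK stream:
print(round(float(plain.mean()), 2), "dB vs", round(float(spread.mean()), 2), "dB")
```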

4.2 The Conversion Vectors and Their Design

The elements of the conversion vector c^u are obtained by taking the discrete Fourier transform of the phase factors p^u, which lie in the set {±1, ±j}. The conversion vector c^u must satisfy the following conditions:
a. The number of nonzero elements in c^u should be less than or equal to 4.
b. The complex values of the nonzero elements in c^u should be selected from the set {1, −1, 0}.

Fig. 6 DFT spread C-DSLM scheme

Fig. 7 DFT spreading method

The matrix C^3 is given in Eq. (12):

C^3 = 0.5 ×
⎡ 1   0   0   0   j   0   0   0   1   0   0   0  −j   0   0   0 ⎤
⎢ 0   1   0   0   0   j   0   0   0   1   0   0   0  −j   0   0 ⎥
⎢ 0   0   1   0   0   0   j   0   0   0   1   0   0   0  −j   0 ⎥
⎢ 0   0   0   1   0   0   0   j   0   0   0   1   0   0   0  −j ⎥
⎢−j   0   0   0   1   0   0   0   j   0   0   0   1   0   0   0 ⎥
⎢ 0  −j   0   0   0   1   0   0   0   j   0   0   0   1   0   0 ⎥
⎢ 0   0  −j   0   0   0   1   0   0   0   j   0   0   0   1   0 ⎥
⎢ 0   0   0  −j   0   0   0   1   0   0   0   j   0   0   0   1 ⎥
⎢ 1   0   0   0  −j   0   0   0   1   0   0   0   j   0   0   0 ⎥
⎢ 0   1   0   0   0  −j   0   0   0   1   0   0   0   j   0   0 ⎥
⎢ 0   0   1   0   0   0  −j   0   0   0   1   0   0   0   j   0 ⎥
⎢ 0   0   0   1   0   0   0  −j   0   0   0   1   0   0   0   j ⎥
⎢ j   0   0   0   1   0   0   0  −j   0   0   0   1   0   0   0 ⎥
⎢ 0   j   0   0   0   1   0   0   0  −j   0   0   0   1   0   0 ⎥
⎢ 0   0   j   0   0   0   1   0   0   0  −j   0   0   0   1   0 ⎥
⎣ 0   0   0   j   0   0   0   1   0   0   0  −j   0   0   0   1 ⎦          (12)

Table 1 lists the phase rotation vectors and the corresponding conversion vectors. By repeating the vector N/D times, the expanded phase rotation vector p^u is obtained as

p^u = [(p̃^u)^T, (p̃^u)^T, …, (p̃^u)^T]^T          (13)

Using p̃3 = {1, −1, 1, 1}, an example yields

P 3 = [1, −1, 1, 1, 1, −1, 1, 1 . . . , 1 − 1, 1, 1] (14)

C 3 = [1, 0, 0 . . . 0, − j, 0, 0 . . . 0, 1, 0, 0, . . . 0, j, 0, 0, . . . 0] (15)
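The repeating nonzero pattern of Eq. (15) can be reproduced by taking the IFFT of p̃3, following the IFFT convention of Eq. (18). A short illustrative Python check (the factor of 2 merely rescales the entries onto {±1, ±j, 0}):

```python
import numpy as np

# Phase factors p~3 from the text; the conversion vector is its IFFT (cf. Eq. (18)).
p3 = np.array([1, -1, 1, 1])
c3 = 2 * np.fft.ifft(p3)       # scaled by 2 so the entries land on {+-1, +-j, 0}
print(np.round(c3, 12))        # the 1, -j, 1, j pattern of Eq. (15)

# Condition (a): at most 4 nonzero elements in the conversion vector.
print(np.count_nonzero(np.round(c3, 12)))  # 4
```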

One symbol (e.g., 1, 0, 0 … 0) spans a duration of ((N − 4)/4). The above progression is expressed as the product of s and C_u, i.e., s^u = C_u s, where C_u, the convolution matrix corresponding to c^u, is given by

C_u = [(c^u)^⟨0⟩, (c^u)^⟨1⟩, …, (c^u)^⟨N−2⟩, (c^u)^⟨N−1⟩]          (16)

where (c^u)^⟨n⟩ denotes the downward cyclic shift of c^u by n elements. Equation (12) displays the downward cyclic shift structure of C^3.

Table 1 Phase rotation vectors and the corresponding conversion vectors

Index   Rotation of phase vectors   Conversion vectors
1       P1 = −[1 −1 −1 1]           −[0  1−j  0  1+j]
2       P2 = −[−1 1 −1 1]           −[0  0  −1  0]
3       P3 = −[1 1 −1 1]            −[1  1  −1  1]
4       P4 = −[−1 −1 1 1]           −[0  −1−j  0  −1+j]
5       P5 = −[1 −1 1 1]            −[1  −j  1  j]
6       P6 = −[−1 1 1 1]            −[1  −1  −1  −1]
7       P7 = −[1 1 1 1]             −[1  0  0  0]
8       P8 = [1 1 1 1]              [1  0  0  0]
9       P9 = [−1 1 1 1]             [1  −1  −1  −1]
10      P10 = [1 −1 1 1]            [1  −j  1  j]
11      P11 = [−1 −1 1 1]           [0  −1−j  0  −1+j]
12      P12 = [1 1 −1 1]            [1  1  −1  −1]
13      P13 = [−1 1 −1 1]           [0  0  −1  0]
14      P14 = [1 −1 −1 1]           [−1  j  −1  j]
15      P15 = [−1 −1 −1 1]          −[−1  −j  −1  j]
16      P16 = −[−1 −1 −1 1]         −[0  1−j  0  1+j]

4.3 Proposed DFT Spread Converse Vectors with DSLM Method (C-DSLM)

The operation of the DFT spread C-DSLM method is described below.

Method 1: Initialization. The parameters N, M, U, L, and O are set, and the symbol index m is initialized to 1. The conversion matrix C_u is obtained using Eqs. (17)–(19):

P^u = [(p̃^u)^T, (p̃^u)^T, …, (p̃^u)^T]^T ∈ C^{1×ON}          (17)

C^u = IFFT(P^u) ∈ C^{1×ON}          (18)

C_u = [(C^u)^⟨0⟩, (C^u)^⟨1⟩, …, (C^u)^⟨ON−1⟩] ∈ C^{ON×ON}          (19)

where u ∈ {1, 2, …, U} indexes the phase rotation vectors p^u in (17), given in Table 1.
Method 2: Modulation with Conversion Vectors. N zeros are inserted at the central positions to form the vector x = [x_1, x_2, …, x_{N/2}].

S^u = C_u · s^1 ∈ R^{1×ON},   u = 2, 3, …, U          (20)

The vector S^u is repeated L times and multiplied with g, which gives

S_L^u = [(S^u)^T, …, (S^u)^T]^T ∈ R^{1×L·ON}          (21)

where [(S^u)^T, …, (S^u)^T]^T repeats S^u L times.
Calculation of PAPR: The ratio of peak power to average power is calculated for s^u(t) over a certain interval T_0:

PAPR^u_{T0} = max_{t∈T0} |S^u(t)|² / [(1/T_0) ∫_{T0} |S^u(t)|² dt],   u ∈ {1, 2, …, U}          (22)

The choice of T_0 affects the PAPR reduction of the proposed scheme, where T_0 = [mT, mT + 4T].
Method 3: Selection. The minimum PAPR is identified and its index recorded by the following formula:

u_min = arg min_{0≤u≤U−1} PAPR^u_{T0}          (23)

Method 4: Update. The present overlapping input symbol vector is updated as

S^{u_min}_{m′+1} = S^{u_min}(t)          (24)
The index u_min is stored in a vector SI, which is sent as side information to the receiver:

SI = [SI_{u_min}]          (25)

Then, m is incremented by 1, and Method 2 is repeated until m = M.

Table 2 Computational complexity calculation

Scheme                       Number of multiplications   Number of additions
C-DSLM                       UM(ON/2) log2(ON)           UMON log2(ON)
Proposed DFT spread C-DSLM   M(ON/2) log2(ON)            MON log2(ON) + 3(U − 1)MON

4.4 Analysis of Computational Complexity

The complexity of the proposed DFT spread C-DSLM scheme is measured and compared with existing methods in terms of the number of multiplication and addition operations. The computational complexity reduction ratio (CCRR) is defined by (Table 2)

CCRR = [1 − (complexity of DFT spread C-DSLM) / (complexity of C-DSLM)] × 100%          (26)
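Applying Eq. (26) to the multiplication counts reported in Table 4 reproduces the quoted CCRR percentages. A short illustrative Python check:

```python
def ccrr(proposed, existing):
    """Eq. (26): computational complexity reduction ratio, in percent."""
    return (1 - proposed / existing) * 100

# Multiplication counts reported in Table 4 (N = 64, M = 100, O = 4)
existing = {4: 123_301, 8: 246_603, 16: 493_158}
proposed = 30_822
for U, count in existing.items():
    print(f"U = {U}: CCRR = {ccrr(proposed, count):.1f}%")
# U = 4: 75.0%, U = 8: 87.5%, U = 16: 93.8%
```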

5 Performance Evaluation

The results for the proposed scheme are simulated using MATLAB R2018a. The proposed DFT spread C-DSLM scheme achieves both low PAPR and low computational complexity. The simulated results given in Table 5 are compared with the existing C-DSLM scheme designed for FBMC with OQAM signals.

5.1 Simulation Environment

The number of subchannels N taken for simulation is 64, 256, 512, and 1024, and the number of symbols is set to 100. The subcarrier spacing is 15 kHz. The oversampling factor O and overlap factor L are set to 4. The phase rotation vectors U are set to 4, 8, and 16. 4-QAM modulation is used, and the sampling period T s is 0.4 µs. Multipath fading channels are used to test the performance. The parameters assigned for the simulation are given in Table 3.

Table 3 Simulation parameters

Simulation attributes        Remarks
Tool used for simulation     MATLAB R2018a
Subchannels N considered     64, 256, 512 and 1024
Period for sampling Ts       Ts = 0.4 µs
Number of symbols M          100
Oversampling factor O        4
Phase rotation vectors U     4, 8, or 16
Modulation                   4 QAM (4 OQAM)
Real-valued symbols          3 × 10^4 OQAM
Overlap factor L             4
Channel                      Multipath fading channels
Subcarrier spacing           15 kHz

5.2 Calculation of Computational Complexity

To assess the complexity of the proposed and conventional methods, the number of subcarriers N is set to 64, M to 100, and O to 4. The number of phase rotation vectors U is 4, 8, or 16, and the expressions for the numbers of complex additions and multiplications are given in Table 2. The complexity values calculated with these settings are given in Table 4. The proposed method requires only 30,822 multiplications for all of U = 4, 8, and 16, whereas the existing C-DSLM requires 123,301, 246,603, and 493,158 multiplications for U = 4, 8, and 16, respectively. The corresponding CCRR values are 75%, 87.51%, and 93.8%. On the other hand, the number of additions of the proposed method is higher than that of the existing C-DSLM, by 18.4%, 21.2%, and 23% for U = 4, 8, and 16, respectively, a modest cost against the large multiplication savings. Overall, the proposed method offers better performance than the existing C-DSLM scheme (Table 4).
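The multiplication CCRR percentages in Table 4 follow from the standard definition CCRR = (1 − C_proposed / C_existing) × 100%; a quick Python check with the counts taken directly from Table 4 reproduces them:

```python
# Complex-multiplication counts from Table 4.
existing_c_dslm = {4: 123_301, 8: 246_603, 16: 493_158}
proposed = 30_822  # DFT spread C-DSLM, same count for all U

def ccrr(c_existing, c_proposed):
    """Computational complexity reduction ratio, in percent."""
    return (1 - c_proposed / c_existing) * 100

for u, c_ex in existing_c_dslm.items():
    print(f"U = {u:2d}: CCRR = {ccrr(c_ex, proposed):.1f}%")
# Gives 75.0%, 87.5%, and 93.8%, matching the 75%, 87.51%, and 93.8% reported in Table 4.
```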

Table 4 Computational complexity

| Phase rotation vectors | Type of calculation    | Existing C-DSLM | Proposed method | CCRR in % |
| U = 4                  | No. of multiplications | 123,301         | 30,822 (less)   | 75        |
| U = 4                  | No. of additions       | 246,579         | 292,044         | 18.4      |
| U = 8                  | No. of multiplications | 246,603         | 30,822 (less)   | 87.51     |
| U = 8                  | No. of additions       | 493,158         | 599,244         | 21.2      |
| U = 16                 | No. of multiplications | 493,158         | 30,822 (less)   | 93.8      |
| U = 16                 | No. of additions       | 986,316         | 1,213,644       | 23        |
DFT Spread C-DSLM for Low PAPR FBMC with OQAM Systems 295

Table 5 Comparison of results

| Parameter                        | DSLM as in [1] | C-DSLM  | Proposed DFT spread C-DSLM | CCRR % |
| PAPR (at CCDF = 10^-4)           | 8 dB           | 7.1 dB  | 6.3 dB                     | –      |
| SNR for BER = 10^-15             | 23 dB          | 24.5 dB | 15 dB                      | –      |
| Multiplications, U = 4           | 409,600        | 123,301 | 30,822 (less)              | 75%    |
| Multiplications, U = 8           | 819,200        | 246,603 | 30,822 (less)              | 87.51% |
| Multiplications, U = 16          | 1,638,400      | 493,158 | 30,822 (less)              | 93.8%  |
| Additions, U = 4                 | 819,200        | 246,579 | 292,044                    | 18.4%  |
| Additions, U = 8                 | 1,638,400      | 493,158 | 599,244                    | 21.2%  |
| Additions, U = 16                | 3,276,800      | 986,316 | 1,213,644                  | 23%    |

5.3 Transmission and Reception of the DFT Spread C-DSLM Scheme

The outputs at various stages of transmission and reception in the proposed DFT spread C-DSLM scheme are shown in Figs. 8, 9, 10, 11, 12, 13, 14, 15 and 16. These are obtained with the number of subcarriers set to 1024, the carrier count to 64, bits per symbol to 8, and symbols per carrier to 16, at an SNR of 30 dB.
Carrier Magnitude Analysis. The carrier magnitude at transmission and reception is shown in Figs. 8 and 13. The magnitude level is 1 for the given 64 subcarriers, and there is zero phase deviation of the subcarriers between transmitter and receiver.
Carrier Phase Analysis. The carrier phase at transmission and reception for the assigned subcarriers is shown in Figs. 9 and 14. There is no phase deviation of the subcarriers between the input and output sides.
Time Domain Signal Analysis. Figures 10 and 16 show the transmitted and received time domain signals, which are the same in amplitude.

Fig. 8 Carrier magnitude analysis of the proposed system at transmission (magnitude versus subcarriers)

Fig. 9 Carrier phase analysis at transmission (phase in degrees versus subcarriers)

Peak Power and Spectrum. The proposed DFT spread C-DSLM gives a lower PAPR than the conventional system, as shown in Fig. 11: the peak amplitude is limited to within ±0.05 V for the assigned 1024 subcarriers. The limited received spectrum with the subcarrier powers is shown in Fig. 12, with the frequency axis normalized so that f_s/2 = 0.5. The angles of the phase rotation factors are shown in Fig. 15.
PAPR Comparison. The PAPR of the proposed DFT spread scheme and the existing schemes is simulated and compared in Fig. 17. At a CCDF of 10^-4, the proposed method achieves 6.3 dB, lower than the 7.1 dB and 8 dB of C-DSLM and DSLM, respectively. The proposed system therefore offers lower PAPR than the other conventional systems, owing to the selection of the relevant phase factors.

Fig. 10 Time domain signal of DFT spread C-DSLM transmission at one symbol period (amplitude versus subcarriers)

Fig. 11 Peak-power-limited signal for up to 1024 subcarriers (amplitude in volts versus subcarriers)

Fig. 12 Spectrum of the received signal (magnitude in dB versus normalized frequency, 0.5 = f_s/2)



Fig. 13 Carrier magnitude analysis at reception (magnitude versus subcarriers)

Fig. 14 Carrier phase analysis at reception (phase in degrees versus subcarriers)

BER Performance of Different Schemes. The bit error rate performance of DSLM, C-DSLM, and the proposed DFT spread C-DSLM is simulated and shown in Fig. 18. The proposed scheme requires only 15 dB of SNR for a bit error rate of 10^-15, whereas the existing DSLM and C-DSLM require 23 and 24.5 dB of SNR, respectively, for the same bit error rate. The proposed method thus offers better bit error rate performance than the existing schemes (Table 5).

6 Conclusion

A DFT spread conversion-vector-based, low-complexity dispersive SLM method is implemented to reduce PAPR and computational complexity in the FBMC with OQAM system.

Fig. 15 Phase factor analysis (polar plot of the phase rotation factors)

Fig. 16 Received time domain signal of the DFT spread C-DSLM scheme (amplitude in V versus subcarriers)

An estimation of the complexity and the simulation results of the proposed scheme are presented and compared with the existing dispersive SLM and C-DSLM schemes. The proposed system offers better PAPR reduction, 6.3 dB, whereas the existing methods show 7.1 and 8 dB at a CCDF of 10^-4. The proposed scheme also offers lower computational complexity, achieving a 93.8% computational complexity reduction ratio (CCRR) for complex multiplications when the number of phase rotation factors U = 16, compared with the conventional schemes. Therefore, the proposed DFT spread C-DSLM scheme provides improved performance compared with the existing DSLM and C-DSLM schemes.

Fig. 17 PAPR comparison for different schemes: CCDF P(PAPR > z) versus z in dB for FBMC-OQAM, SLM-FBMC, DSLM-FBMC, C-DSLM FBMC, and the proposed DFT spread C-DSLM FBMC

Fig. 18 BER performance for different PAPR reduction schemes: bit error rate versus SNR in dB for DSLM, C-DSLM, and the proposed DFT spread C-DSLM

References

1. Cheng X, Shi W, Zhao Y (2020) A novel conversion vector-based low-complexity SLM scheme
for PAPR reduction in FBMC/OQAM systems. IEEE Trans Broadcast 3:1–11
2. Na D, Choi K (2020) DFT spreading-based low PAPR FBMC with embedded side information.
IEEE Trans Commun 68:1–15
3. Jinwei J, Ren G, Zhang H (2015) A semi-blind SLM scheme for PAPR reduction in OFDM
systems with low-complexity transceiver. IEEE Trans Veh Technol 64:2698–2703
4. Chen D, Tian Y, Qu D, Jiang T (2018) OQAM-OFDM in future internet of things: a survey on key technologies and challenges. IEEE Internet Things J 5:3788–3809

5. Choi J, Oh Y, Lee H, Seo J (2017) Pilot-aided channel estimation utilizing intrinsic interference
for FBMC/OQAM systems. IEEE Trans Broadcast 63(4):644–655
6. Li D, Chen D, Qu Y, Zhang TJ (2019) Receiver design for Alamouti coded FBMC system in
highly frequency selective channels. IEEE Trans Broadcast 65(3):601–608
7. Zhang L, Xiao P, Zafar A, Quddus AU, Tafazolli R (2017) FBMC system: an insight into
doubly dispersive channel impact. IEEE Trans Veh Technol 66:3942–3956
8. Tian Y, Chen D, Luo K, Jiang T (2019) Prototype filter design to minimize stop-band energy
with constraint on channel estimation performance for OQAM/FBMC systems. IEEE Trans
Broadcast 65:260–269
9. Al-Dweik A, Younis S, Hazmi A, Tsimenidis C, Sharif B (2012) Efficient OFDM symbol
timing estimator using power difference measurements. IEEE Trans Veh Technol 61:509–520
10. Ayappasamy K, Nagarajan G, Elavarasan P (2020) Decision feedback equalizers and Alamouti
coded DFT spread for low PAPR FBMC-OQAM system. In: IEEE international conference on
emerging trends in information technology and engineering (ic-ETITE). IEEE explore digital
library electronic. ISBN: 978-1-7281-4142-8
11. Cheng X, Liu D, Wang C, Yan S, Zhu Z (2019) Deep learning based channel estimation and
equalization scheme for FBMC/OQAM systems. IEEE Wirel Commun Lett 8:881–884
12. Li W, Qu D, Jiang T (2018) An efficient preamble design based on comb-type pilots for channel
estimation in FBMC/OQAM systems. IEEE Access 6:64698–64707
13. Elavarasan P, Nagarajan G (2012) Optimal phase selection factor for PTS using GPW and RPW
in OFDM systems. J Comput Sci 6:140–147
14. Elavarasan P, Nagarajan G (2014) A summarization on PAPR techniques for OFDM systems.
Int J Inst Eng Ser B 96:381–389
15. Ayappasamy K, Nagarajan G, Elavarasan P (2019) FBMC OQAM-PTS with virtual symbols and DFT spreading techniques. J Adv Appl Math Sci 18:817–825
16. Li XD, Cimini LJ (1998) Effect of clipping and filtering on the performance of OFDM. IEEE
Commun Lett 2:131–133
17. Wang X, Tjhung TT, Ng CS (1999) Reduction of peak-to-average power ratio of OFDM system
using a companding technique. IEEE Trans Broadcast 45:303–307
18. Muller SH, Huber JB (1997) OFDM with reduced peak-to-average power ratio by optimum
combination of partial transmit sequences. Electron Lett 33:368–369
19. Bauml RW, Fischer RFH, Huber JB (1996) Reducing the peak-to average power ratio of
multicarrier modulation by selected mapping. Electron Lett 32:2056–2057
20. Elavarasan P, Nagarajan G (2015) Peak-power reduction using improved partial transmit sequence in orthogonal frequency division multiplexing systems. Int J Comput Electr Eng 44:80–90
21. Jones AE, Wilkinson TA, Barton SK (1994) Block coding scheme for reduction of peak to
mean envelope power ratio of multicarrier transmission schemes. Electron Lett 30:2098–2099
22. Wulich D (1996) Reduction of peak to mean ratio of multicarrier modulation using cyclic
coding. Electron Lett 32:432–433
23. Jiang T, Li X (2010) Using fountain codes to control the peak-to-average power ratio of OFDM
signals. IEEE Trans Veh Technol 59:3779–3785
24. Kollar Z, Horvath P (2012) PAPR reduction of FBMC by clipping and its iterative compensation. J Comput Netw Commun 2012, Art no 382736
25. You Z, Lu I-T, Yang R, Li JL (2013) Flexible companding design for PAPR reduction in OFDM
and FBMC systems. In: Proceedings of the international conference on computer networks and
communications (ICNC). San Diego, CA, USA, pp 408–412
26. Qu D, Lu S, Jiang T (2013) Multi-block joint optimization for the peak to-average power ratio
reduction of FBMC-OQAM signals. IEEE Trans Sig Process 61:1605–1613
27. Ye C, Li Z, Jiang T, Ni C, Qi Q (2014) PAPR reduction of OQAM-OFDM signals using segmental PTS scheme with low complexity. IEEE Trans Broadcast 60:141–147

28. He Z, Zhou L, Chen Y, Ling X (2018) Low-complexity PTS scheme for PAPR reduction in
FBMC-OQAM systems. IEEE Commun Lett 22:2322–2325
29. Goff SYL, Al-Samahi SS, Khoo BK, Tsimenidis CC, Sharif BS (2009) Selected mapping
without side information for PAPR reduction in OFDM. IEEE Trans Wirel Commun 8:3320–
3325
30. Hong E, Kim H, Yang K, Har D (2013) Pilot-aided side information detection in SLM-based
OFDM system. IEEE Trans Wirel Commun 12:3140–3147
Secure, Efficient, Lightweight Authentication in Wireless Sensor Networks

Bhanu Chander and Kumaravelan Gopalakrishnan

Abstract In recent years, communication security has become important for the progression of sensor node appliances in wireless sensor networks (WSNs). The nodes deployed in unattended regions sense various types of raw information, depending on the application, and dispatch it to the base station. An intruder or challenger can insert forged data records into the network and compromise the routing records among nodes. For trustworthy communication of data, confidentiality along with integrity are the two foremost security requirements, which eradicate illegal sensors from communication. In the literature, numerous models have been suggested for authentication in WSNs, but most of them suffer from high communication cost, limited storage, and short node lifespan. In this article, a secure, efficient authentication mechanism is projected, which applies MAC and digital certificate schemes for authentication with other nodes to initiate their communication practice. Message authentication is one of the most effective ways to prevent illegal and degraded messages from being forwarded in WSNs.

Keywords Sensor node · Base station · Cluster head · Message authentication code · Digital certificate

1 Introduction

A wireless sensor network (WSN) is mostly employed to observe physical or environmental circumstances like pressure, temperature, sound, etc. Sensor nodes cooperatively collaborate to transport their observed data records through a wireless communication network to a central location or base station (BS) [1, 2]. Military appliances like battlefield surveillance supported the expansion of WSNs; in the present day, such networks are exploited in various industrial and end-user applications like building safety, the expansion of IoT and smart cities, wearable body sensors for health monitoring and medical research, machine health monitoring, industrial
B. Chander (B) · K. Gopalakrishnan


Department of Computer Science and Engineering, Pondicherry University, Pondicherry 609605,
India

© Springer Nature Singapore Pte Ltd. 2021 303


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_22

product monitoring, and space and under-water applications [1–3]. Through the collection and processing of the sensed data from the coverage area, WSNs allow users to access detailed and consistent information at any time and any place, which is an omnipresent sensing capability. Most of these appliances involve large-scale WSNs; as a result, the quantity of raw sensed data is very large, while the sensor nodes have very tight resource constraints such as limited bandwidth, battery life, computing power, small storage space, and communication resources. In WSNs, limited battery life is one of the essential constraints because it is difficult to exchange the batteries, and there is no opportunity to recharge the randomly deployed sensor nodes. For this reason, energy efficiency is an indispensable part of WSNs. On the other hand, WSNs regularly function in hazardous atmospheres, so an adversary has the chance to straightforwardly capture a sensor node from the target area and extract the entire sensed records from its memory, since nodes are in general not tamper-resistant due to cost considerations [4–7].
WSNs have been broadly utilized by researchers, academicians, and industrialists in dissimilar personal and organizational appliances. In recent times, WSN has turned out to be a most important field, and it has moreover drawn the attention of researchers to setting up a secure network against malevolent nodes [4, 5, 8]. The safety measures of a WSN can be subjected to threats by foes. Given the inadequate resources of a sensor node, it is hard to apply conventional mechanisms, which produce great results in other networks, to mitigate the various threats. Intruders or attackers come up with several methods to access sensible, confidential raw data from low-powered sensor nodes. Thus, it is necessary to design solid security measures for sensible WSNs, which can otherwise be damaged by a mixture of attacks [2–5, 8].

2 Authentication in WSN

There is a strong requirement to protect the sensed readings and data in WSNs. Authentication is one of the security mechanisms that defend WSNs from an ample range of security attacks. Authentication is a procedure by which the individuality of a sensor in the setup is confirmed, and it promises that the records or messages originate from an authentic basis. In simple words, authentication vouches for the data that can be propelled or accessed by every node in the setup [1–3, 8]. Besides, it is significant for averting and catching faulty records from illegal abusers. At hand, there are numerous authentication methods proposed in WSN security, and these are categorized under authentication using unicast, multicast, and broadcast messages, cryptographic keys, and static, mobile, or both aspects of WSN. Unicast performs point-to-point authentication with unicast messages, with no involvement of any other sources. In broadcast, messages are directly taken from reliable sources and cannot be altered during transit. Inspecting the identity of the source from where the message originated confirms the message reliability, making sure of message uniqueness and guarding against falsification, impersonation, etc.; these are some of the basic works of broadcast authentication [1–4, 8]. Coming to cryptographic-based authentication, it can be either a symmetric or an asymmetric method. In the symmetric case, the initiator and receiver use the same key for authentication as well as certification. In the asymmetric case, the initiator makes his signature employing the private key, and the beneficiary authenticates it through the particular public key.
Hence, the design of suitable security solutions for WSN has been an active research challenge. Some protocols show good results in other networks, but adopting them into resource-constrained WSNs is not practicable because WSNs inherit a few limitations. Foremost, there are power restrictions owing to the compactness of the sensor. Next, there are vulnerability restrictions owing to the shortage of physical guarding, along with the open nature of wireless communication channels. Based on the resource boundaries of sensors, symmetric key techniques are a superior preference in comparison with asymmetric key techniques, although symmetric key cryptographic practices are insecure to deploy. To combat these confines, homomorphic encryption along with message authentication codes (MAC) brings secrecy and validation with reliability into WSNs. Considering the above-mentioned issues, this article proposes an authentication protocol based on MAC and digital certificate exchange among sensor nodes and the BS. Also, both node mobility and re-authentication have been taken into consideration.

3 Related Work

In [7], the authors designed a network that uses a novel, well-organized source anonymous message authentication scheme (SAMA), performed on elliptic curve cryptography (ECC), to produce unconditional source anonymity. The scheme allows in-between nodes to authenticate the message, and thus any tainted message can be noticed and dropped to preserve sensor power. In [9], the developed SDAACA protocol holds a pair of algorithms: secure data fragmentation (SDF) along with node joining authorization (NJA). Here, SDF protects the data records from being spoiled by attackers via fragmenting them into small pieces, and NJA authorizes any new node that wants to connect with the network. In [10], the authors analyzed two-factor authentication drawbacks and planned a novel three-factor certification protocol for WSNs in IoT. Here, fuzzy models extract the abuser's biometric data for password verification, and the protocol is then verified with BAN logic. The simulation part shows that the protocol achieves free password change, detects known attacks, and quickly recognizes informal logins. The authors of [11] designed a model to avoid attacks at the node and cluster level; they proposed a new procedure which contains two algorithms: first, a key renewal scheme performed at the cluster, followed by the node authentication process. According to the authors, nodes can be authenticated at some stage in the key establishment step and the key refurbished sporadically. In [12], the author offered a speedy confirmation of vBNN-IBS, a pairing-free identity-based signature with condensed signature magnitude. The speeding-up procedure is intended to decrease the energy utilization and thus expand the system lifespan by diminishing the computation overhead owed to signature certification. The authors in [13] developed a trivial authentication practice

for node-node, node-base station, and base station-node communication. All of these protocols follow ECC and hidden generator theory along with a Hash chain. In [14], two protocols are presented: the first takes care of the probability of the proposed model having a trusted sensor node, while the second analyzes the energy consumption of the model, and an improved protocol authenticates newly joined nodes in WSNs with the help of a trusted proposal. In [15], the authors project improvised three-factor authentication and verify it with ProVerif; the results show the protocol is secure against both formal and informal attacks and has high robustness. In [16], digital certificate-based node authentication was developed, where every node in the network assumes that the BS is the trusted third party that provides the digital certificate to all legitimate nodes; by verifying the details stored in the certificate, nodes authenticate each other. The work in [17] proposes a simple authentication and key distribution scheme among sensor nodes; moreover, the authors developed a re-authentication set of rules by considering node mobility. The work in [18] designed a new certificate-less authentication scheme to avoid man-in-the-middle assault, and the simulation outcome proves that the projected design reduces energy consumption by 6–15%. In [19], the authors analyzed three-factor authentication, developed a secured protocol, and verified it against numerous attacks. In [20], the authors fabricated an authentication and key management scheme (AKMS) for WSNs that uses symmetric keys with keyed Hash functions along with a bidirectional procedure for message reliability and faithfulness.

4 Proposed Authentication Mechanism

4.1 Simple Cluster Head (CH) Development

First of all, each sensor node holds some unused energy, and after waiting a specified time instance, a sensor node forwards HELLO packets to all available sensor nodes. Here, the time depends on the user and is specified at the time of network deployment; if a node does not receive any initial packet-related notification within this time, it declares itself as CH. After receiving HELLO packets, each sensor node compares the received signal strengths, and the node with the highest value is elected as CH. A non-CH node finds the minimum hop distance (HD_min) between itself and its consequent CH with the help of the received power (p_r) from the CH and the communication range of the sensor nodes (n_t). Additionally, each sensor node announces its location with the help of a non-persistent CSMA approach, so sensors that are extremely far away from the CH can utilize intermediate nodes to transfer their sensed data to the CH through multi-hop communications.
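The election rule above can be sketched in a few lines (hypothetical Python; the node IDs, RSSI values, and the `elect_cluster_head` helper are illustrative, not from the paper):

```python
# Each non-CH node compares the received signal strength of the HELLO
# packets it heard and elects the strongest sender as its cluster head.
hello_rssi = {          # sender id -> received power p_r (dBm), illustrative values
    "n3": -61.0,
    "n7": -48.5,
    "n9": -72.3,
}

def elect_cluster_head(rssi_table):
    """Return the sender with the highest received signal strength."""
    if not rssi_table:
        return None     # no HELLO heard within the wait time -> declare self as CH
    return max(rssi_table, key=rssi_table.get)

print(elect_cluster_head(hello_rssi))   # -> n7 (strongest HELLO)
```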

4.2 Proposed Protocol

In this protocol, we assume that the base station (BS) and cluster heads (CH) have the highest resources, so that an adversary is not able to perform any kind of attack on them. The BS stores the identities of each sensor node and cluster head in its memory storage. The base station forwards primary secret keys to every single sensor node and cluster head; for instance, K_AB is the primary secret key shared among sensor nodes A and B, and K′_AB is the secret key shared among A and B to calculate the MAC, and so on. Similarly, the BS and CH share the public key. Every sensor node generates its own public and private keys; the shared keys are used for encryption, and MAC calculation is done with them. The keys used in this scheme work independently of one another. As a result, even if an attacker can crack or recognize any one of the keys, it is not useful for future operations, because the attacker cannot derive any new keys, since the computations use independent keys.
Notations employed in the proposed procedure:
A, B — Sensor nodes
BS — Base station
N — Nonce values
CH — Cluster head
PU_a, PU_b — Public keys of nodes A and B
PR_a, PR_b — Private keys of nodes A and B
PU_CH, PR_CH — Public and private keys of CH
PU, PR — Public and private keys of BS
K_AB — Secret key shared among A and B
K′_AB — Secret key shared among A and B, used to calculate the MAC
K_A,CH — Secret key shared among A and CH
K′_A,CH — Secret key shared among A and CH, used to calculate the MAC
K_B,CH — Secret key shared among B and CH
K′_B,CH — Secret key shared among B and CH, used to calculate the MAC
K_CH,BS — Secret key shared among CH and BS
K′_CH,BS — Secret key shared among CH and BS, used to calculate the MAC
K_chu — Cluster authentication key
ID_a — Identity of node A
C — Digital certificate
m — Message
R — Digital certificate request message
{M}_K_AB — Memo M encrypted with key K_AB
MAC{M}_K′_AB — MAC of M calculated with MAC key K′_AB
At the primary level, sensor node A initiates a certificate request message to its corresponding cluster head. The CH validates the uniqueness of the sensor node and forwards the request to the BS. The BS then creates a certificate for the requesting node and forwards it to the CH, which receives the digital certificate and sends it to the appropriate sensor node.
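A minimal sketch of this kind of MAC-protected request (Step 1 below) can be written with Python's standard `hmac` module; the field encoding, key size, and SHA-256 as the MAC primitive are assumptions for illustration — the protocol itself does not fix a concrete MAC algorithm:

```python
import hmac, hashlib, os

def make_mac(key: bytes, *fields: bytes) -> bytes:
    """MAC over the concatenated message fields, keyed with the shared MAC key."""
    return hmac.new(key, b"|".join(fields), hashlib.sha256).digest()

# Shared MAC key K'_A,CH, preloaded before deployment (illustrative value).
k_a_ch_mac = os.urandom(32)

# Step 1: A -> CH : PU_a, ID_a, N_a, R | MAC(PU_a, ID_a, N_a, R)
pu_a = b"<public-key-of-A>"   # placeholder public key bytes
id_a = b"node-A"
n_a  = os.urandom(8)          # nonce N_a
req  = b"CERT-REQUEST"        # certificate request message R
m1   = (pu_a, id_a, n_a, req)
tag  = make_mac(k_a_ch_mac, *m1)

# CH side: recompute the MAC over the received fields and compare in constant time.
ok = hmac.compare_digest(tag, make_mac(k_a_ch_mac, *m1))
print(ok)                     # True for an unmodified message
```

Any modification of a field by an intruder makes the recomputed MAC differ, so the CH drops the message.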

Step 1: Sensor Node A to CH

A → CH : M1 = PU_a, ID_a, N_a, R | MAC(PU_a, ID_a, N_a, R)_K′_A,CH

In step 1, sensor node A forwards a message M1 toward the CH containing a certificate request message R along with its identity ID_a, public key PU_a, and a nonce N_a; besides, it concatenates the MAC of M1 computed with the MAC secret key shared among A and CH.
Step 2: Cluster Head (CH) to Base Station (BS)

CH → BS : {ID_CH, N_ch, M1 | MAC(N_ch, M1)_K′_CH,BS}_PR_CH

When the CH receives the message M1, it adds its ID and a new nonce value to the message and appends the MAC of this message for the BS. Before sending it to the BS, the entire message is encrypted with the private key of the CH.
Step 3: Base Station (BS) to Cluster Head (CH)

BS → CH : {ID_CH, N_a, N_ch, C | MAC(C, N_a, N_ch, ID_CH)_K′_CH,BS}_PR

After receiving the message, the BS decrypts it with the CH public key, which was stored in its database at the moment of network deployment, and verifies the MAC value. If verification is successful, the BS generates a digital certificate C for node A, concatenates it with the MAC of this message, and encrypts the whole with its own private key PR. The certificate contains a version number, serial number, issuer name, life span, and extensions.
Step 4: Cluster Head (CH) to Sensor Node (A)

CH → A : ID_A, N_a, N_ch, C | MAC(ID_A, N_a, N_ch, C)_K′_CH,A

The CH decrypts the received message with the BS public key, which was stored in its database before the network setup. After verifying the MAC value, the CH transfers the certificate to node A, protected with the shared MAC key.
Step 5: Sensor Node (A) to Sensor Node (B)

A → B : (ID_A, N_a, C)_K_A,B

Upon receiving the certificate from the CH, sensor node A stores it in its database. For authentication purposes, node A transmits the certificate to node B, encrypted with the secret key shared among both nodes. After receiving the certificate, node B completes further functions: first, it checks the node ID against the lists in its memory database. If it matches, node A is a legitimate node. Then, it checks the validity of the digital certificate by scrutinizing the life epoch of the certificate and the

public-key algorithm that composed the signature on it. If all checks are fulfilled, the node is legitimate, live, and certified by an official BS.
Step 6: Sensor Node (B) to Sensor Node (A)

B → A : (ID_B, N_b, C)_K_B,A

Sensor node B follows the same procedure to authenticate with sensor node A.
Cluster authentication key for re-authentication of nodes: In WSNs, sensor nodes are employed in hazardous, terrible, risky, and unsafe environments. Due to unexpected situations that happen in such fields, some of the nodes may lose their connectivity with the appropriate sensors. For re-authentication, the CH shares a cluster authentication key (K_chu) with the non-CH nodes ahead of the network deployment. For instance, if node A loses its connectivity and wants to reconnect with node B, it forwards the identities, a nonce, and the cluster authentication key, with a concatenated MAC value. Node B then decrypts and verifies the MAC value, and if verification is successful, A is a legitimate node. Here, we save battery power by decreasing the communication that takes place in the initial authentication procedure, since the CH itself forwards the cluster re-authentication key.

A → B : ID_A, ID_B, {K_chu}_K_AB | MAC(ID_A, ID_B, {K_chu}_K_AB)_K′_AB
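A hedged sketch of node B's side of this re-authentication check (Python; the seen-nonce cache for replay protection and the byte encoding are illustrative details consistent with the nonce usage above, not spelled out in the protocol, and the encryption of K_chu is omitted):

```python
import hmac, hashlib, os

k_ab_mac = os.urandom(32)   # shared MAC key K'_AB (illustrative value)
k_chu    = os.urandom(16)   # cluster authentication key preloaded by the CH
seen_nonces = set()         # B's cache of already-accepted nonces

def verify_reauth(id_a, id_b, nonce, enc_kchu, tag):
    """B re-authenticates A: constant-time MAC check plus nonce freshness."""
    expected = hmac.new(k_ab_mac,
                        b"|".join([id_a, id_b, nonce, enc_kchu]),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False        # forged or modified message
    if nonce in seen_nonces:
        return False        # replayed message
    seen_nonces.add(nonce)
    return True

# Sender side (node A): enc stands in for {K_chu}_K_AB; encryption is omitted here.
n_a = os.urandom(8)
enc = k_chu
tag = hmac.new(k_ab_mac, b"|".join([b"node-A", b"node-B", n_a, enc]),
               hashlib.sha256).digest()
print(verify_reauth(b"node-A", b"node-B", n_a, enc, tag))   # True on first receipt
print(verify_reauth(b"node-A", b"node-B", n_a, enc, tag))   # False: nonce replayed
```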

5 AVISPA Tool Simulation Results, Security Analysis, and BAN Logic

Automated Validation of Internet Security Protocols and Applications (AVISPA) is a popular tool for the automatic verification of Internet safety and privacy sets of rules. It affords a collection of functions to build and evaluate formal security protocols written in the High-Level Protocol Specification Language (HLPSL). There are four back-ends, with many more to come, and every back-end gives different results. The offered protocol is verified by the OFMC and CL-AtSe checkers, and the simulation results in Fig. 1 show that the protocol executes without any interruptions. Part (1) shows the outcome in OFMC, which builds a fixed hierarchy characterized by protocol scrutiny in a demand-driven approach. It is used for demonstrating protocol confirmation over a bounded number of sessions, in particular bounding the messages that can be engendered by the trespasser. The CL-AtSe checker converts whatever security protocol is written in the intermediate format into a set of constraints that can be resourcefully exploited to trace attacks on procedures. Part (2) shows the CL-AtSe result window: the projected protocol is secure against recognized attacks and achieves the goals as specified.

Fig. 1 Simulation outcome of the proposed protocol in AVISPA: (1) OFMC, (2) CL-AtSe

In any kind of authentication protocol or mechanism, an analysis of the security
properties is essential. The designed protocol achieves the following fundamental
security properties. Integrity and man-in-the-middle attack: message reliability is
accomplished with both MAC and digital certificate verification; hence, the protocol
is secure against man-in-the-middle attacks. Key freshness: by verifying the nonce
values, it is confirmed that keys were generated in the current session.
Confidentiality: initial authentication among node-node, node-CH and CH-BS uses a
shared MAC key, so no other entity learns the communicated information; in this way,
the protocol also resists replay attacks. Masquerade attack: each communication
message is encrypted with a preloaded shared secret key (a MAC-based shared secret
key) which cannot be generated by an attacker, and since a digital certificate is
used in authentication, there is no way for an attacker to forge a request message.
DoS attack: in the proposed scheme, the BS creates a digital certificate, and the
remaining authentication is made by a node on the basis of a request message from the
sensors through the CH. In the rare case that one node is jammed, the overall network
performance is not affected.
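The MAC-based integrity and freshness checks above can be sketched as follows; this is an illustrative sketch using HMAC-SHA256, and the key and message layout are assumptions, not part of the protocol specification.

```python
import hmac
import hashlib
import secrets

def make_mac(shared_key: bytes, message: bytes) -> bytes:
    # Compute a MAC over the message with the preloaded shared key.
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def verify_mac(shared_key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(make_mac(shared_key, message), tag)

# Illustrative node -> CH exchange; a fresh nonce gives key freshness.
key = secrets.token_bytes(32)       # preloaded shared secret (assumed 256-bit)
nonce = secrets.token_bytes(16)     # fresh value for this session
msg = b"cert-request|" + nonce
tag = make_mac(key, msg)

assert verify_mac(key, msg, tag)             # genuine message accepted
assert not verify_mac(key, msg + b"x", tag)  # tampered message rejected
```

A receiver that also checks the nonce against previously seen values rejects replayed messages, which is the replay-resistance argument made above.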

5.1 BAN Logic Analysis: Logical Rules of BAN Logic

Message meaning rule (Rule 1): if P believes that the key K is shared with Q and P
sees X encrypted under K, then P believes Q once said X. Random number (nonce)
verification rule (Rule 2): if P believes X is fresh and P believes Q once said X,
then P believes Q believes X. Jurisdiction rule (Rule 3): if P believes Q has
jurisdiction over X and P believes Q believes X, then P believes X. Fresh
transmission rule (Rule 4): if P believes fresh(X), then P believes fresh(X, Y).
Trust polymerization and trust projection rule (Rule 5): if P believes X and P
believes Y, then P believes (X, Y); if P believes (X, Y), then P believes X. Seeing
rule (Rule 6): P can decrypt a message it receives if the message is encrypted with
P's own public key.
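In standard BAN notation (P |≡ X for "P believes X", P ◁ X for "P sees X", P |∼ X for "P once said X", #(X) for "X is fresh"), the first three rules can be written as inference rules; the rendering below is a notational sketch, not taken from the original text:

```latex
% Message meaning rule (Rule 1)
\[ \frac{P \mid\equiv P \overset{K}{\leftrightarrow} Q,\qquad P \triangleleft \{X\}_{K}}
        {P \mid\equiv Q \mid\sim X} \]
% Nonce verification rule (Rule 2)
\[ \frac{P \mid\equiv \#(X),\qquad P \mid\equiv Q \mid\sim X}
        {P \mid\equiv Q \mid\equiv X} \]
% Jurisdiction rule (Rule 3)
\[ \frac{P \mid\equiv Q \Rightarrow X,\qquad P \mid\equiv Q \mid\equiv X}
        {P \mid\equiv X} \]
```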

1. Proof: A believes A ←K_{A,B}→ B

(a) Because A believes A ←K_{A,CH}→ CH and A sees {N_CH, C}_{K_{CH,A}}, based on
Rule 1, A believes CH said 'C'.
(b) Because A believes fresh(N_CH), based on Rules 2 and 4, we get: A believes CH
believes 'C'.
(c) Because A believes CH controls 'C' (Rule 3) and A believes fresh(N_CH) (Rule 4),
we get: A believes 'C'.
(d) Because A believes BS controls 'C' (Rule 3) and A believes fresh(N_CH) (Rule 4),
we get: A believes 'C'.
(e) In conclusion, according to Rule 5, we get: A believes A ←K_{A,B}→ B.

2. Proof: B believes B ←K_{A,B}→ A

(a) Because B believes B ←K_{B,CH}→ CH and B sees {N_CH, C}_{K_{CH,B}}, based on
Rule 1, B believes CH said 'C'.
(b) Because B believes fresh(N_CH), based on Rules 2 and 4, we get: B believes CH
believes 'C'.
(c) Because B believes CH controls 'C' (Rule 3) and B believes fresh(N_CH) (Rule 4),
we get: B believes 'C'.
(d) Because B believes BS controls 'C' (Rule 3) and B believes fresh(N_CH) (Rule 4),
we get: B believes 'C'.
(e) In conclusion, according to Rule 5, we get: B believes B ←K_{B,A}→ A.

6 Conclusion

In WSNs, security plays a major role, and sensed data records should not be easily
accessible to unauthorized nodes. Many of the attacks on WSN deployments are due to
weak authentication schemes. Several verification and authentication schemes exist in
the literature, but most of them need substantial computational resources, whereas
nodes in WSNs operate under tight resource constraints, so the proposed scheme should
be simple, lightweight and consume less power. Here, we designed a secure
authentication scheme: we first perform a simple clustering technique, after which a
sensor node requests a digital certificate from the BS through the CH. Moreover,
re-authentication is done with a cluster re-authentication key, which increases the
lifetime of the node by avoiding unnecessary computation for a new certificate. A MAC
is applied to each message to increase integrity and authenticity. The proposed model
is shown to be secure by evaluation with AVISPA, a model-checking tool for testing
authentication protocols, and with BAN logic.

References

1. Binh HTT, Dey N (eds) Soft computing in wireless sensor networks. CRC Press
2. Dogra H, Kohli J (2016) Secure data transmission using cryptography techniques in wireless
sensor networks: a survey. Indian J Sci Technol 9(47)
3. Rajeswari SR, Seenivasagam V (2016) Comparative study on various authentication protocols
in wireless sensor networks. Sci World J 3
4. Lu Z, Qu G, Liu Z (2019) A survey on recent advances in vehicular network security, trust,
and privacy. IEEE Trans Intel Transp Syst 20(2):760–776
5. Azees M, Vijayakumar P, Deborah LJ (2016) Comprehensive survey on security services in
vehicular ad-hoc networks. IET Intel Transp Syst 10(6):379–388
6. Chander B (2020) Clustering and Bayesian networks. In: Handbook of research on big data
clustering and machine learning. IGI Global, pp 50–73
7. Choukimath RC, Ayyannavar VV (2014) Secure and efficient intermediate node authentication
in wireless sensor networks. Int J Sig Process Syst 1(3):71–74
8. Fu Z, Huang F, Ren K, Weng J, Wang C (2017) Privacy-preserving smart semantic search
based on conceptual graphs over encrypted outsourced data. IEEE Trans Inf Forensics Secur
12:1874–1884
9. Razaque A, Rizv SS (2017) Secure data aggregation using access control and authentication
for wireless sensor networks. Comput Secur 70:532–545
10. Li X, Niu J, Kumari S, Wu F (2018) A three-factor anonymous authentication scheme for
wireless sensor networks in internet of things environments. J Netw Comput Appl 103:194–204
11. Lee S, Kim K (2015) Key renewal scheme with sensor authentication under clustered wireless
sensor networks. Electron Lett 51(4):368–371
12. Benzaid C, Lounis K, Al-Nemrat AB, Nadjib AM (2016) Fast authentication in wireless sensor
networks. Fut Gener Comput Syst 55:362–375
13. Moon AHU (2016) Authentication protocols for WSN using ECC and hidden generator. Int J
Comput Appl 133(13):42–47
14. Yussoff YM, Kamarudin (2017) Lightweight trusted authentication protocol for wireless sensor
network (WSN). Int J Commun 2:130–136
15. Jawad KM (2019) An improved three-factor anonymous authentication protocol for WSNs-
based IoT systems using symmetric cryptography. In: International conference on communication
technologies, ComTech 2019, pp 53–59
16. Bhanu C, Kumaravelan (2018) Simple and secure authentication in WSNs using digital
certification. Int J Pure Appl Math 119(16):137–143
17. El Dayem A, Rizk SS, Mokhtar MA (2016) An efficient authentication protocol and key estab-
lishment in dynamic WSN. In: Proceedings of the 6th international conference on information
communication and management, ICICM 2016, pp 178–182
18. Gaur SS, Mohapatra (2017) An efficient certificate less authentication encryption for WSN
based on clustering algorithm. Int J Appl Eng Res 12(14):4184–4190
19. Jung J, Moon J (2017) Efficient and security enhanced anonymous authentication with key
agreement scheme in wireless sensor networks. Sensors 17(3)
20. Qin D, Jia S (2016) A lightweight authentication and key management scheme for wireless
sensor networks. J Sens
Performance Evaluation of Logic Gates
Using Magnetic Tunnel Junction

Jyoti Garg and Subodh Wairya

Abstract In today's advanced computing systems, power consumption and speed
are important factors for any device. As technology advances, the channel length of
the MOSFET shrinks, which causes an increase in leakage current. Spintronics is an
emerging technology with low power consumption, high reliability, and high endurance
that can overcome these issues. In this paper, magnetic tunnel junction-based
AND/NAND and OR/NOR logic gates are proposed. The proposed method consumes less
power than other approaches.

Keywords Spintronics · Magnetic tunnel junction · Logic gates · Nonvolatile logic device

1 Introduction

VLSI design faces many challenges: shrinking transistor dimensions and lower supply
voltages cause an increase in leakage current, which diminishes device performance.
Many technologies, like QCA, CNFET, SET, and nano-magnetic devices, address these
issues [1–3], and spintronics is one of the emerging technologies among them. There
are also several circuit techniques for low-power design, like adiabatic circuits,
non-conventional CMOS, and energy storage in a capacitor [4].
In spintronics, the state of a device can be altered merely by flipping the spin
of electrons. It therefore offers standby power that is almost zero, high speed, and
good compatibility with CMOS. Spin-transfer torque offers high density and virtually
limitless endurance. Scalability is an essential feature, because only the scaling of
CMOS into the deep sub-micron region

J. Garg (B)
Department of Electronics Engineering, Dr. A.P.J. Abdul Kalam Technical University, Lucknow,
Uttar Pradesh, India
S. Wairya
Department of Electronics and Communication Engineering, IET, Lucknow, Uttar Pradesh, India

© Springer Nature Singapore Pte Ltd. 2021 313


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_23

causes the flow of leakage current [5]. For the last few decades, logic gates have
been built using CMOS technology. Logic gates using MTJs have been proposed to avoid
leakage current, and many such designs exist. The existing MTJ-based designs help to
reduce power and delay, but they also have several drawbacks [6, 7]. The MTJ combines
storage and processing, which helps to reduce memory and delay. In logic design using
MTJs, a sense amplifier is always needed to read the data and process it further.
In [8], a magnetic XOR gate was designed with six MTJs and transistors. The design
has a small area, but the transistor count increases, and with it the write energy,
which is a significant drawback. In [9], additional circuitry is also required, which
increases area, power consumption, and delay. In a few designs, a single MTJ has been
used for the implementation of linear functions [10]; a nonlinear function must then
be composed from linear functions, which is a multi-stage process. In [11], a
spin-diode logic family was proposed in which dynamic power dissipation exceeds write
power dissipation, and dynamic power plays a vital role in digital circuit design.
The adiabatic technique is one way to design low-power circuits: in adiabatic
circuits, the charging and discharging of the load capacitor are controlled to reduce
power dissipation [12]. The remainder of this paper is organized as a brief review of
the magnetic tunnel junction, the proposed circuit description, analysis, results,
and conclusion.

2 Magnetic Tunnel Junction

As per the latest research, the central key element of MRAM is the magnetic tunnel
junction (MTJ) (Fig. 1).
An MTJ has two ferromagnetic layers, a free layer and a reference layer, separated by
a thin oxide layer through which electrons tunnel. Information is stored as a bit,
'0' (low resistance state) or '1' (high resistance state), determined by the
magnetization of the free layer relative to that of the reference layer. If the
magnetic moments of the free and reference layers point in opposite directions, the
MTJ is in the high resistance state and is read as logic high. If the free layer's
magnetic moment points in the same direction as the reference layer's, the MTJ is in
the low resistance state and is read as logic low [13]. Figure 1 shows the MTJ, and
Fig. 2 shows its low and high resistance states.
In an MTJ, the write current depends inversely on the device size: if the size of the
MTJ decreases, the required write current increases, and vice versa. A large current
is required to write a small MTJ, which was a bottleneck in conventional MRAM. To
avoid this problem, STT-MRAM came into existence. The basic STT-MRAM cell has one
transistor and one MTJ, and is called 1T1MTJ [14].
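The two resistance states can be modeled through the tunnel magnetoresistance (TMR) ratio, R_AP = R_P(1 + TMR); a minimal sketch, where the numeric values are illustrative assumptions rather than measured device parameters:

```python
def mtj_resistances(r_p: float, tmr: float) -> tuple:
    """Return (parallel, antiparallel) resistances for a given TMR ratio."""
    r_ap = r_p * (1.0 + tmr)
    return r_p, r_ap

def read_bit(resistance: float, r_ref: float) -> int:
    """Read '1' (antiparallel, high resistance) or '0' (parallel, low)."""
    return 1 if resistance > r_ref else 0

# Illustrative values: R_P = 2 kOhm, TMR = 150 %.
r_p, r_ap = mtj_resistances(2e3, 1.5)
r_ref = (r_p + r_ap) / 2            # mid-point reference for sensing
print(read_bit(r_p, r_ref), read_bit(r_ap, r_ref))  # -> 0 1
```

The mid-point reference plays the same role as the reference branch of the sense amplifier described in the next section.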

Fig. 1 Representation of magnetic tunnel junction

Fig. 2 States of MTJ ‘0’ low resistance state, ‘1’ high resistance state

3 Proposed Approach

In this section, the magnetic tunnel junction-based logic circuit is proposed. A
magnetic tunnel junction-based logic circuit has three parts, as shown in Fig. 3
[15]: a sense amplifier, a CMOS logic part, and an MTJ part. The sense amplifier
forms the pull-up network, while the CMOS logic and the MTJs together form the
pull-down network. Programming is done as in STT-MRAM. The logic function is
implemented with MOS devices in the CMOS logic part, and the sense amplifier is

Fig. 3 Structure of a
MTJ-based circuit

the final stage that produces the output. The sense amplifier design in this paper is
taken from [15].
Figures 4 and 5 show the magnetic tunnel junction-based circuit diagrams of the
NAND/AND gate and NOR/OR gate, respectively. In Fig. 4, both AND and NAND

Fig. 4 NAND/AND gate using MTJ

Fig. 5 NOR/OR gate using MTJ

logic are implemented, taking both inputs together with their complementary forms. The
sense amplifier contains eight transistors: P1, P2, P3, P4, N1, N2, N3, and N7.
Transistors N4, N5, and N6 form the CMOS logic. Input 1 (A or A-bar) is applied to the
CMOS logic, and Input 2 (B or B-bar) is applied to the MTJs. When MTJ1 is set to R_AP
and MTJ2 to R_P, input B is considered logic 1, and vice versa. Taking Input A as 0 and
Input B as 1, the working is explained for the input '01'. When the clk signal is '0'
(precharge phase), P1 and P4 are on, and P2, P3, and N7 are off. When the clk signal is
'1' (evaluation phase), P1 and P4 turn off and N7 turns on. Transistors N4 and N5 are
off, MTJ2 is in the parallel state, and MTJ1 is in the antiparallel state. The left
branch of the circuit is cut off, and the output of the AND logic is grounded, while
the NAND node is charged to the maximum supply VDD. A discharge signal, which may take
different voltages in the precharge and evaluation phases, is applied at transistor N3.
Because MTJ1 is in the antiparallel state, the NAND branch has a higher resistance and
discharges slowly, whereas the AND branch discharges quickly; as a result, transistor
P2 crosses its threshold voltage, turning on transistor P3, and transistor N3 reaches
the full supply. As the AND branch discharges past the threshold voltage of transistor
N1, the AND node voltage is pulled low, while transistor N3, being on, pulls the NAND
node voltage high. That

gives output 0 at the AND gate logic and 1 at the NAND gate logic. Similarly, the
OR/NOR gate logic circuits can be explained.
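The intended input-output behavior of the complementary gate can be checked with a simple behavioral model (a logical sketch of the truth table, not a circuit simulation; the function name is hypothetical):

```python
def hybrid_and_nand(a: int, b: int) -> tuple:
    """Behavioral model of the complementary AND/NAND outputs.

    'a' drives the CMOS logic part; 'b' is encoded in the MTJ pair
    (b = 1 -> MTJ1 antiparallel, MTJ2 parallel).
    """
    and_out = a & b
    return and_out, 1 - and_out    # NAND is the complement of AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, hybrid_and_nand(a, b))
# Input '01' gives (AND, NAND) = (0, 1), matching the walkthrough above.
```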

4 Simulation of Logic Circuits and Comparison

Simulations of the MTJ-based NAND/AND and NOR/OR logic gates have been carried out
for performance evaluation. The simulation results of the NAND/AND and NOR/OR logic
gates are illustrated in Figs. 6 and 7, respectively.

Fig. 6 Simulation results of NAND/AND gate using MTJ

All the designs have been simulated in the HSPICE tool with 32-nm CMOS technology
[16] and the MTJ model of [17].
Table 1 shows the performance evaluation of the MTJ-based NAND circuit against a
conventional CMOS NAND logic design. Table 2 shows the performance evaluation of the
MTJ-based NOR circuit against a conventional CMOS NOR logic design.

Fig. 7 Simulation results of NOR/OR Gate using MTJ


320 J. Garg and S. Wairya

Table 1 Performance evaluation of NAND/AND gate using MTJ versus conventional design

                        MTJ design    Conventional design
Dynamic power (nW)      1.8           4.7
Propagation delay (ns)  34            29
Standby power (nW)      0.5           3

Table 2 Performance evaluation of NOR/OR gate using MTJ versus conventional design

                        MTJ design    Conventional design
Dynamic power (nW)      2.8           5.2
Propagation delay (ns)  42            38
Standby power (nW)      0.9           4.6

Fig. 8 Comparison of power consumption of hybrid NAND/NOR gate with conventional CMOS
NAND/NOR gate

From the above results, as shown in Tables 1 and 2, it can be concluded that logic
circuits using MTJ show better performance than the conventional design (Fig. 8).
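The savings implied by Tables 1 and 2 can be quantified directly; a small sketch using the tabulated values:

```python
def reduction(mtj: float, conventional: float) -> float:
    """Percentage power reduction of the MTJ design vs. conventional CMOS."""
    return 100.0 * (conventional - mtj) / conventional

# Dynamic power (nW) from Tables 1 and 2
print(f"NAND dynamic: {reduction(1.8, 4.7):.0f}%")   # ~62%
print(f"NOR dynamic:  {reduction(2.8, 5.2):.0f}%")   # ~46%
# Standby power (nW)
print(f"NAND standby: {reduction(0.5, 3.0):.0f}%")   # ~83%
print(f"NOR standby:  {reduction(0.9, 4.6):.0f}%")   # ~80%
```

Note that the delay figures in the tables move in the opposite direction: the MTJ designs trade a few nanoseconds of propagation delay for the power savings.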

5 Conclusion

This research paper has tried to show that logic circuits using magnetic tunnel
junctions give better results than conventional designs. The impact of the magnetic
tunnel junction on power has also been studied. In this paper, NAND and NOR circuits
have been simulated, and a significant reduction in power consumption, more than 50%,
is obtained with the MTJ design. In future work, a novel model file can be created
for the MTJ so that better results can be achieved.

References

1. Goswami M, Kumar B, Tibrewal H, Mazumdar S (2014) Efficient realization of digital logic


circuit using QCA multiplexer. In: 2nd international conference on business and information
management (ICBIM). Durgapur, India, pp 165–170
2. Sahoo R, Sahoo SK, Sankisa KC (2015) Design of an efficient CNTFET using optimum number
of CNT in channel region for logic gate implementation. In: International conference on VLSI
systems architecture, technology and applications (VLSI-SATA) Bangalore India, pp 1–4
3. Rajasekaran S, Sundari G (2017) Design and analysis of logic gates using single electron nano-
devices. In: International conference on advances in electrical technology for green energy
(ICAETGT). Coimbatore, India, pp 64–68
4. Padmavathi B, Geetha BT, Bhuvaneshwari K (2017) Low power design techniques and imple-
mentation strategies adopted in VLSI circuits. In: IEEE international conference on power,
control, signals and instrumentation engineering (ICPCSI), Chennai, India, pp. 1764–1767
5. Kim J, Paul A, Crowell PA, Koester SJ, Sapatnekar SS, Wang JP (2015) Spin-based computing:
device concepts, current status, and a case study on a high-performance microprocessor. Proc
IEEE 103:106–130
6. Deng E, Zhang Y, Klein JO, Ravelosona D, Chappert C, Zhao W (2013) Low power magnetic
full-adder based on spin transfer torque MRAM. IEEE Trans Magn 49:4982–4987
7. Zhang D, Zeng L, Gao T, Gong F, Qin X, Kang W, Zhang Y, Klein JO, Zhao W (2017)
Reliability-enhanced separated pre-charge sensing amplifier for hybrid CMOS/MTJ logic
circuits. IEEE Trans Magn 53(9):1–5
8. Trinh HP, Zhao W, Klein JO, Zhang Y, Ravelsona D, Chappert C (2013) Magnetic adder based
on racetrack memory. IEEE Trans. Circ Syst I 60(6):1469–1477
9. Gupta MK, Hasan M (2016) A low-power robust easily cascaded PentaMTJ-based combina-
tional and sequential circuits. IEEE Trans Very Large Scale Integr (VLSI) Syst 24(1):218–222
10. Mahmoudi H, Windbacher T, Sverdlov V, Selberherr S (2013) Implication logic gates using
spin-transfer-torque-operated magnetic tunnel junctions for intrinsic logic-in-memory.
Solid-State Electron 191–197
11. Friedman JS, Rangaraju N, Ismail YI, Wessels BW (2012) A spin-diode logic family. IEEE
Trans Nanotechnol 11(5):1026–1032
12. Agrawal A, Gupta TK, Dadoria AK, Kumar D (2016) A novel efficient adiabatic logic design
for ultra low power. In: International conference on ICT in business industry & government
(ICTBIG), Indore, pp 1–7
13. Ikegawa S, Mancoff FB, Janesky J, Aggarwal S (2020) Magnetoresistive random access memory:
present and future. IEEE Trans Electron Dev 67(4):1407–1419
14. Garzón Es, De Rose R, Fe C, Li T, Lanuzza M (2019) Assessment of STT-MRAM perfor-
mance at nanoscaled technology nodes using a device-to-memory simulation framework. J
Microelectron Eng 215
15. Barla P, Shet D, Joshi VK, Bhat S (2020) Design and analysis of LIM hybrid MTJ/CMOS logic
gates. In: 5th international conference on devices circuits and systems (ICDCS), Coimbatore
India, pp 41–45
16. Zhao W, Cao Y (2006) New generation of predictive technology model for sub-45nm early
design exploration. IEEE Trans Electron Dev 53(11):2816–2823
17. Kim J, Chen A, Behin-Aein A, Kumar S, Wang JP, Kim CH (2015) A technology-agnostic MTJ
SPICE model with user-defined dimensions for STT-MRAM scalability studies. In: Custom
integrated circuits conference (CICC), IEEE, pp 1–4
Medical IoT—Automatic Medical
Dispensing Machine

C. V. Nisha Angeline , S. Muthuramlingam , E. Rahul Ganesh,


S. Siva Pratheep, and V. Nishanthan

Abstract Internet of things (IoT) is playing a vital role in the development of various
high-performance smart systems, and much research is being done to improve the quality
of human life in various ways. One such area is hospital management. Following the
recent COVID situation, which has led to the concepts of social distancing and
contactless transactions, we propose a centralized hospital management system using
IoT. The proposed system includes a mobile app that the doctor can use to access the
patient history from the centralized database. The doctor can then make an
E-prescription based on the diagnosis, and the E-prescription is generated as a QR
code in the patient-side app. The patient shows the QR code to the automatic medical
dispensing machine (AMDM), which dispenses the prescribed medicines after matching
them against the QR code. This helps to avoid 70% of the medical errors due to manual
prescription and achieves social distancing and contactless transactions.

Keywords IoT · QR code · COVID · Social distancing · Dispenser · Android app · Medicines

1 Introduction

The coronaviruses are a large family of viruses that cause several respiratory
infections, varying from the common cold to more severe diseases such as severe acute
respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). About 80% of
the people who become infected by COVID-19 recover without needing hospital
treatment, but the other 20% become critical due to difficulty in breathing. The
elderly and those who already have medical problems like diabetes, heart and lung
disease, high blood pressure or cancer are at a greater risk of

C. V. Nisha Angeline (B) · S. Muthuramlingam · E. R. Ganesh · S. S. Pratheep · V. Nishanthan


Thiagarajar College of Engineering, Madurai, India
S. Muthuramlingam
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 323


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_24

becoming critical. The World Health Organization (WHO) says that COVID-19 is
transmitted through droplets from the nose or mouth expelled when a COVID-19-infected
person coughs, sneezes, or speaks. A healthy person could catch COVID-19 by breathing
in these droplets, and hence it is important to maintain social distancing. These
droplets can also stick to objects and surfaces, through which people can become
infected; hence, contactless transactions are essential. Many hospitals are finding
it challenging to maintain regular services during the pandemic [1]. People who
suffer from chronic illnesses and need regular hospital visits are left helpless as a
result. According to National Health Mission administrative data, many people have
been infected with COVID by going to a hospital for regular treatment. Hence, in the
past few months, hospitals have denied normal consultancies and postponed surgeries
and operations to prevent patients from getting infected by the COVID-19 virus. There
is a growing need for automation to prevent this transmission while still providing
regular health services. Hence, we propose an automated medical dispensing machine
that avoids human contact both in visiting the hospital and during the distribution
of medicine.

2 Challenges Faced During COVID-19 Situation

The hospital management systems in India were not prepared to contain the current
COVID situation. Hospitals in India are still at a developmental stage and face many
challenges, like inadequate numbers of doctors and nurses, medication, and equipment.
In the current pandemic situation, with no vaccine yet discovered, the best chance is
to keep people from getting infected. Doctors working in hospitals have refrained
from going to their own homes and have isolated themselves in separate accommodation
for fear of infecting their family members. The same is the case for non-COVID
patients who visit the hospital: many cases were reported of patients who caught
COVID-19 during a recent doctor visit for a chronic illness, which is literally a
from-the-frying-pan-into-the-fire situation. The healthcare sector must now look for
ways to provide general services for regular patients while preventing the infection
from spreading.

3 Need for Automatic Medical Dispensing Machine

Whether rural or urban, there is a critical need to avoid meeting physically while
still providing regular hospital services. Consider the example of a diabetic patient
who visits the hospital on a regular basis. Every month, the doctor might need to
check his blood sugar levels, provide consultation, and prescribe medication
according to the patient's condition. In a regular visit, the patient would
physically visit the doctor to get these steps done. The same can be initiated virtually

using a mobile application along with the sensors in the device. If any patient needs
medicines, they can be dispensed through an automated machine without any physical
contact. Hence, the automatic medical dispensing machine helps:
• To ensure social distancing
• To ensure contactless transaction for payment and distribution of medication.

4 Existing Solutions

4.1 Semi-automated Dispensing Machine Using Barcode

A barcode-assisted system was developed for medication administration and dispensing
that helps to automate the ordering and distribution of medicines [2]. The system
helped in managing physicians' orders, retrieving historical information about
patients' prior medications, recognizing patients' identities, and assigning skilled
labor jobs. However, the system was only semi-automated: in the study, the nurses had
to scan the barcode on the wristbands or badges of the patients, take note of the
medications prescribed, and manually fetch them from the pharmacy. It was also not
considered cost-efficient, as each nurse assigned to this task had to be provided
with a laptop with seamless wireless connectivity to fetch the medication data from
the hospital server.

4.2 Drug Data Transfer System

At Brigham and Women's Hospital (BWH), the physician enters the medication details
into computerized physician order entry software [3]. The medications entered by the
physician are sent electronically to the pharmacy information system. Very commonly
prescribed medicines were stocked in semi-automated medical dispensing machines,
while the least commonly prescribed medicines were kept under the monitoring of the
nursing unit. This system does not record the medications given to the patient, and
without this historical medication data, it is prone to the previously mentioned drug
duplication problem.

4.3 Computerized Physician Order Entry (CPOE)


for Neonatal Ward

The Sayan-HIS (hospital information system) [4] is used by the physician to automate
the process of prescription for medicines and also for lab tests. The patient need

Fig. 1 System design

not carry any paper bills or prescriptions, even for lab tests; everything will be
available in the system at the time of the patient's visit. The HIS also checks the
prescriptions made by the physician by validating the dosage that can be administered
for a medicine.

5 Proposed Solution

The proposed solution is to have a centralized hospital database across the country.
Every citizen is linked to the database as a doctor/patient using their Aadhar id,
and any citizen visiting any hospital across the country will be asked for their
Aadhar id for registration. The DocHelp app is used by the doctor to go through the
previous medical history of the patient from the centralized database, which stores
all the previous consultations of the patient with any doctor. The doctor can
(a) View Profile
(b) View Medical History
(c) View Drug History
(d) Prescribe Medicines.
For accessing the patient records, the doctor must scan the QR code from the
patient's MediHelp app. Once the consultation is done, the doctor issues an
E-prescription in the MediHelp app against the patient's id. The E-prescription is
automatically available to the patient, and the payment for the consultancy can also
be made online. The above-mentioned steps apply to a normal consultation. In a
pandemic situation such as COVID-19, where visits to the hospital are restricted,
doctors can allot slots in their calendars for patients to book virtual assistance.
During the slot, the doctor can virtually attend to patients and consult them via the
same apps, which provide options for video calling. In case further assistance is
required, the doctor can allot a slot for the patient's visit such that crowding in
the hospital is avoided. The E-prescription can either be ordered online for home
delivery or dispensed via the automated medical dispensing machine (AMDM), which,
like an ATM, is placed in many locations. When the patient shows the QR code to the
panel of the AMDM, the machine dispenses the medicines as per the prescription. The
overall database of the machines, the hospitals, and the patients is maintained in
the AMDM Admin app (Fig. 1).
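One way the signed E-prescription QR payload could be structured is sketched below; the field names, key handling, and helper functions are illustrative assumptions, not the actual app implementation:

```python
import json
import hmac
import hashlib
import base64

SECRET = b"hospital-server-key"   # illustrative; in practice held server-side

def make_prescription_payload(patient_id: str, medicines: list) -> str:
    """Encode an E-prescription as signed JSON, suitable as QR-code content."""
    body = json.dumps({"patient": patient_id, "medicines": medicines},
                      sort_keys=True)
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    wrapper = json.dumps({"body": body, "sig": sig})
    return base64.b64encode(wrapper.encode()).decode()

def dispense(qr_content: str) -> list:
    """AMDM side: verify the signature before dispensing."""
    wrapper = json.loads(base64.b64decode(qr_content))
    expected = hmac.new(SECRET, wrapper["body"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, wrapper["sig"]):
        raise ValueError("invalid prescription")
    return json.loads(wrapper["body"])["medicines"]

qr = make_prescription_payload("AADHAR-XXXX",
                               [{"name": "metformin", "qty": 30}])
print(dispense(qr))   # -> [{'name': 'metformin', 'qty': 30}]
```

Signing the payload means a forged or tampered QR code is rejected at the machine, which is what allows dispensing without any human check at the panel.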

5.1 Modules in Our Solution

• Automated medical dispensing machine (AMDM)
• DocHelp app
• MediHelp app
• AMDM Admin app.

The automated medical dispensing machine (AMDM) consists of a display panel and a
webcam, which captures the QR code from the patient app and feeds it to the ZBar
barcode reader. Once the prescription is scanned, the panel displays the list of
medicines and their prices. Just like in online shopping, the patient can alter the
order as per his choice and then tap OK in the patient app; he need not touch the
panel of the machine, thereby providing a contactless transaction. Once the patient
approves the list of medicines, the payment is deducted through an online payment
mode in the patient-side app, and the medicines are dispensed by the machine. The
software for the machine runs on a Raspberry Pi, a minicomputer, which connects the
machine to the Internet via WiFi.
Firebase
The centralized database for the mobile apps is maintained using Firebase, which
provides a Backend-as-a-Service (BaaS). The use of Firebase makes the apps more
efficient, as they connect to the database over a WebSocket rather than through
normal HTTP calls; the single persistent socket connection optimizes the performance
of the mobile apps to a great extent. All our apps use Firebase as a BaaS for the
database and for user authentication. Firebase provides security features of its own,
and hence the app data is secured.
ZXing
We have used ZXing, an open-source image processing library for scanning
multi-format 1D/2D barcodes in Java. This library is used to generate and read QR
codes within the app.
Material Design
Google's Material Design is followed across all the apps to make the user interface
standard and easier to interact with.

Fig. 2 Use case diagram for MediHelp and DocHelp

System Workflow
See Figs. 2, 3, 4 and 5.

6 Conclusion and Future Scope

The proposed AMDM project avoids human interaction during a pandemic situation such
as COVID-19. When compared to traditional manual systems, the

Fig. 3 Swim lane diagram depicting the working of AMDM

Fig. 4 Screenshots of MediHelp app (patient-side app)

proposed AMDM may help in contactless medicine dispensing and consultation to a
great extent in hospitals. It is planned to extend the machine as a standalone device
for rural places, where a phone-call facility will be made available through the AMDM
to the doctors, so that patients in remote locations can be consulted immediately and
medicines dispensed quickly and on time. As a future enhancement, the machine will
have sensors that can check the body temperature and the oxygen saturation level.

Fig. 5 Screenshots of DocHelp app (doctor-side app)

References

1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
2. Poon EG, Cina JL, Churchill W, Patel N, Featherstone E, Rothschild JM, Keohane CA, Whittemore
AD, Bates DW, Gandhi TK (2006) Medication dispensing errors and potential adverse
drug events before and after implementing bar code technology in the pharmacy. Ann Intern
Med 145:426–434
3. Maviglia SM, Yoo JY, Franz C, Featherstone E, Churchill W, Bates DW, Gandhi TK, Poon
EG (2007) Cost–benefit analysis of a hospital pharmacy bar code solution. Arch Intern Med
167:788–794
4. Kazemi A, Ellenius J, Pourasghar F, Tofighi S, Salehi A, Amanati A, Fors UG (2011) The effect
of computerized physician order entry and decision support system on medication errors in the
neonatal ward: experiences from an Iranian teaching hospital
Performance Analysis of Digital
Modulation Formats in FSO

Monica Gautam and Sourabh Sahu

Abstract Free space optics (FSO) is a technology in which information is transferred
from one end to another by propagating optical signals through the atmosphere.
Its working is similar to that of optical fiber, but no physical link is needed
to establish a connection between the transmitter and the receiver. FSO
communication is both faster and more cost-effective than conventional optical
fiber communication. In the proposed work, On–Off keying (OOK) and phase-shift
keying (PSK) techniques have been compared. The most important parameters
considered are the quality factor (Q), bit error rate (BER), and eye height. The
study of modulation schemes here addresses three important goals: efficiency of
data transmission, resistance to nonlinearity, and ease of implementation. The
complete study is performed using electrical and optical models.

Keywords OOK · PSK · BER · FSO · DPSK · O-QPSK · CW · NRZ · RZ · MZ

1 Introduction

FSO is a wireless communication technology that uses an optical signal to transmit
information from a transmitter to a receiver through free space as the channel [1]. The field
of nonlinear optics in silica fibers is over 20 years old [2]. FSO can be used
to achieve a greater bandwidth [3].
Modulation is the process of changing a physical property of a carrier according
to the signal containing the data; it involves a process of data mapping
[4]. PSK has greater efficiency, which helps to increase communication capacity:
the data is carried on the phase of the optical signal [5]. Changing the phase
according to the data generates PSK.
Depending upon the number of partitions of phase, it is classified into various
types, such as binary phase-shift keying (BPSK) and quadrature phase-shift keying

M. Gautam (B) · S. Sahu


E&TC Department, JEC, Jabalpur, Madhya Pradesh, India

© Springer Nature Singapore Pte Ltd. 2021 331


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_25
(QPSK). BPSK uses two carrier phases separated by 180° [5]. In BPSK modulation,
the first phase denotes “1”, whereas the second phase denotes “0” [5]; when the
signal changes level, the phase alternates between 0° and 180°. QPSK modulation
uses four different phase values separated by 90°, and makes room for other users
because it carries the same data at half the symbol rate; QPSK is also known as
quadriphase PSK. BPSK carries one bit per symbol, whereas QPSK carries two bits
per symbol [4]. Multi-level modulation formats can also be used, because they may
increase the bit rate and capacity of fiber communication [6].
At the beginning of optical fiber communication, digital information is first
converted into the digits “0” and “1” [7, 8]. Amplitude-shift keying (ASK) is widely
used in optical systems and networks of many scales, but PSK has demonstrated
advantages over ASK in long-haul transmission systems [9]. Depending on these
demands, the receiver can be designed using an optical delay-line interferometer
and a photodetector. Digital modulation techniques can be implemented for optical
communication using optical devices such as delay-line interferometers, which are
typically Mach–Zehnder (MZI) or Michelson-type interferometers based on multiple-beam
interference, in which one beam is time-delayed with respect to the other by the
desired interval.
Lithium niobate (LiNbO3) has a very high intrinsic modulation bandwidth, and
the device switching speed is limited by a variety of physical constraints [10]. A
Mach–Zehnder (MZ) modulator is therefore the better option. An MZ modulator has
three ports: the first is for the modulating signal, the second for the continuous-wave
(CW) laser, and the third gives the output [11]. At the receiver side, a delay-line
interferometer based on silicon photonics can also be built for demodulation, to
maintain low-cost direct detection [12]. Delay-line interferometers are also known
as optical differential phase-shift keying (ODPSK) demodulators. As applied to
DPSK, the delay-line interferometer converts a phase-keyed signal into an
amplitude-keyed signal by exploiting the interference that occurs when the delayed
and undelayed light waves recombine.

2 Implementation of Modulation

Khajwal et al. [13] have studied various compensation techniques related to FSO.
The authors have analyzed the performance of FSO single-input single-output
(SISO) and FSO wavelength division multiplexing (WDM) systems under several
atmospheric situations. The quality of the received signal has been studied as a
function of transmission range and power in different weather conditions.
Elayoubi et al. [14] have demonstrated 50% duty-cycle return-to-zero DPSK
(RZ-DPSK), which proved to perform better than differential phase-shift keying
(DPSK) and non-return-to-zero on–off keying (NRZ-OOK) from satellite to ground
at 40 Gbps bit rates. However, the electrical NRZ-to-RZ converter in the transmitter
remains essential as supplementary apparatus, which is a drawback [15]. The scheme
has been assessed using simulation software.
Alifdal et al. [16] have studied the optical signal-to-noise ratio (OSNR) for different
chromatic dispersion values. Based on offset quadrature phase-shift keying (OQPSK)
modulation, the authors have demonstrated the advantages of wavelength division
multiplexing (WDM). Using MATLAB and OptiSystem simulations, they have shown
an improvement in BER and OSNR when the dispersion coefficient is high, and have
observed the effectiveness of the given system in terms of decreased BER.
Sanou et al. [17] have demonstrated orthogonal frequency division multiplexing
combined with OQPSK filtering. In similar studies, this method can send signals
over a greater distance and does not require equalization up to a certain distance.
In their work, bandwidth is maximized because no cyclic prefix is used; moreover,
the transmitter and receiver remain compact in complexity.

3 Digital Schemes

Usually, an On–Off keying transmitter is realized by intensity modulation driven by
an electrical data signal: On–Off keying switches the carrier on and off according
to the signal [18]. Typically, OOK of the laser source is used to communicate data
through the optical domain [8]. The pulse phases may additionally be randomly
modulated so that unwanted carrier components are suppressed [19]. Erbium-doped
fiber amplifiers are the most significant advance in optical fiber system development [20],
but their operating levels may fluctuate over time due to aging and other
effects [21].

4 Differential Phase-Shift Keying

In DPSK, a phase shift occurs only for logic “1”. Based on numerical simulation,
conversion from NRZ-OOK to RZ-BPSK and RZ-QPSK may be possible for systems up to
40 Gbps [14]. Therefore, other formats can be converted into DPSK to gain its
advantages.

4.1 DPSK System

The basic setup of the electrical system is given in Fig. 1. The system can be separated
into two important segments: the transmitter and the receiver. First, the DPSK
electrical signal is generated using a pseudorandom binary sequence (PRBS) with a
sequence length of 256 bits and an RZ wave, and is passed through an electrical link.
The receiver consists of a DPSK pulse generator, in which the bit rate is 30.375 Mbps
and the sample rate is 19.4 GHz; here, 64 samples per bit are used. The quadrature detector
Fig. 1 DPSK electrical modulation using OptiSystem

is operated at 550 MHz with a gain of 2 dB. Figure 2 shows the eye diagram for
the electrical DPSK system; the BER showed error-free operation [22].

Fig. 2 Eye diagram for DPSK electrical


4.2 DPSK Optical System

The design of the DPSK system with an optical link is shown in Fig. 3. In this
method, the signal is optically modulated in DPSK format and then passed
through fiber. The transmitter consists of a CW laser source with 4 mW power
and a frequency of 193.1 THz; a non-return-to-zero (NRZ) waveform is generated
from the binary values of a PRBS of length 128 bits and is provided to the MZ modulator
at a data rate of 40 Gbps. For long-haul transmission, six optical loops are used. At
the receiver side, a low-pass filter operated at 0.8 × bit rate is used. The bit
rate is a function of total system length [22]. The output of the system is observed at
60 km. Figure 4 shows the eye diagram of this system.

5 Offset Quadrature Phase-Shift Keying

The direct and shifted sequences are combined with the carrier and then added to
generate a digital OQPSK signal. Usually, an OQPSK modulator dissipates more energy
and occupies more area [23].

Fig. 3 DPSK optical system by OptiSystem


Fig. 4 Eye diagram for DPSK optical

5.1 OQPSK System

Figure 5 shows the basic setup for the OQPSK electrical system, in which the OQPSK
signal is generated electrically and then sent through the electrical channel. It
mainly consists of two segments: the transmitter and the receiver. In the
transmitter, the OQPSK signal is generated electrically using a PRBS of length
256 bits and an RZ wave, which is passed through an OQPSK modulator. The
quadrature detector is operated at 550 MHz with a gain of 2 dB, in which the bit rate

Fig. 5 OQPSK electrical modulation using OptiSystem



Fig. 6 Eye diagram for OQPSK electrical in-phase and quadrature phase

is 30.375 Mbps and the sample rate is 19.4 GHz. Here, 64 samples are taken per bit.
The receiver mainly consists of an OQPSK pulse generator. Figure 6 shows the eye
diagrams for the I and Q outputs.

5.2 OQPSK Optical System

The design of the OQPSK optical system is shown in Fig. 7. The binary data generates
waveforms, which are separated by 45°; finally, the desired signal is sent by adding
the two phases. The transmitter consists of a CW laser source with 4 mW
power and a frequency of 193.1 THz; an NRZ waveform is produced from the binary
values of a PRBS of length 256 bits and is provided to the MZ modulators at a data rate
of 40 Gbps. The two modulators have a 90° phase difference between them, and
together they modulate the optical signal to produce the I and Q parts. The receiver
consists of a demodulator working on the principle of a 90° optical hybrid coupler. The received

Fig. 7 OQPSK optical modulation using OptiSystem


Fig. 8 Eye diagram for OQPSK optical in-phase and quadrature phase

signal is demodulated and separated into I and Q parts, as shown in Fig. 8. Here, 32
samples are taken per bit, and the response is observed at 10 km. The BER and
OSNR are visualized using an OSNR analyzer and a BER analyzer.
After running the simulation, the results can be read from the eye diagrams.

6 Results and Discussions

RZ-DPSK proved to perform better than DPSK and NRZ-OOK, largely because
of the inter-symbol interference (ISI) reduction obtained when RZ pulses are used [14].
In this paper, we focus on various approaches for the mitigation of FSO
channel impairments and the effect of optical propagation limitations. We have
described the comparison of DPSK and OQPSK. Table 1 shows the comparison
between the different parameters used for the DPSK and OQPSK techniques. In general,
the maximum acceptable bit error rate is about 10⁻⁹ [24].

Table 1 Comparison between the methods

Digital scheme     DPSK                        OQPSK
Parameters         Electrical   Optical        Electrical        Optical
                                               I        Q        I          Q
Bit rate (bps)     30.375 M     40 G           30.375 M          40 G
Frequency (Hz)     550 M        193 T          550 M             193 T
Distance (km)      0            60             0                 10
Quality factor     0            36.0093        0        0        44.1117    43.854
Eye opening (au)   0            0.0049397      0        0        0.0013262  0.0013204
Bit error rate     1            4.94066e−324   1        1        0          0

Usha et al. [24] suggested a new approach for OQPSK, comparing an existing
multiplier design with a Booth-multiplier method; that paper focused on area
reduction. The work did not provide any calculation of the error, quality factor,
or eye height, and it did not compare the method with other PSK formats in terms
of the most commonly used parameters.

7 Conclusion

In the proposed work, we have compared the most popular modulation systems.
An FSO transmission system contains a transmitter and a receiver, and is a
technique used to send information on visible or infrared (IR) light. The
fast growth of Internet-based services and mobile telephony, as well as the advent
of multimedia services, will lead to a huge increase in traffic, requiring a massive
extension of the transport capacity of public networks. It has been identified that
a modulator which consumes low power may have greater efficiency; since the OQPSK
modulator consumes more power, DPSK is preferred when power is a significant
parameter. It is also observed that by using the laser beam combination technique
in a multibeam FSO system, the effect of atmospheric turbulence is significantly
reduced.

References

1. Kaur M, Anuranjana SK, Kesarwani A, Vohra PS (2018) Analyzing the internal parameters of
free space optical communication. In: 2018 7th international conference on Reliability, Infocom
Technologies and Optimization (trends and future directions) (ICRITO), Noida, India, 2018,
pp 298–301. https://doi.org/10.1109/ICRITO.2018.8748589
2. Chraplyvy AR (1994) Impact of nonlinearities on lightwave systems. Opt Photon News 5:16–21
3. Bloom S, Korevaar E, Schuster J, Willebrand H (2003) Understanding the performance of
free-space optics [Invited]. J Opt Netw 2:178–200
4. Prakash SA, Banu AT, Raghul EB, Prakash P (2018) Multilevel modulation format conversion
using delay-line filter. In: 2018 IEEE world symposium on communication engineering
(WSCE), Singapore, pp 1–4. https://doi.org/10.1109/WSCE.2018.8690527
5. Kishikawa H, Seddighian P, Goto N, Yanagiya S, Chen LR (2011) All-optical modulation
format conversion from binary to quadrature phase-shift keying using delay line interferometer.
In: IEEE Photonic Society 24th Annual Meeting, Arlington, VA, pp 513–514. https://doi.org/
10.1109/PHO.2011.6110647
6. Kikuchi N (2005) Amplitude and phase modulated 8-ary and 16-ary multilevel signaling tech-
nologies for high-speed optical fiber communication. In: Proceedings of SPIE 6021, optical
transmission, switching, and subsystems III, 602127 (9 December 2005). https://doi.org/10.
1117/12.636406
7. Charlet G (2006) Progress in optical modulation formats for high-bit rate WDM transmis-
sions. IEEE J Sel Top Quantum Electron 12(4):469–483. https://doi.org/10.1109/JSTQE.2006.
876185
8. Gumaste A, Antony T (2002) DWDM network designs and engineering solutions. Cisco Press,
Indianapolis
9. Yan C et al (2006) All-optical format conversion from NRZ to BPSK using a single saturated
SOA. IEEE Photon Technol Lett 18(22):2368–2370. https://doi.org/10.1109/LPT.2006.885633
10. Wooten EL et al (2000) A review of lithium niobate modulators for fiber-optic communications
systems. IEEE J Sel Top Quant Electron 6(1):69–82. https://doi.org/10.1109/2944.826874
11. El-Nahal FI, Salha MA (2013) Comparison between OQPSK and DPSK bidirectional radio
over fiber transmission systems. Univers J Electr Electron Eng 1(4):129–133. https://doi.org/
10.13189/ujeee.2013.010405
12. Zheng L, Du J, Xu K, Wu X, Tsang HK, He Z (2017) High speed DPSK modulation up to
30 Gbps for short reach optical communications using a silicon microring modulator. In: 2017
16th international conference on optical communications and networks (ICOCN), Wuzhen, pp
1–3. https://doi.org/10.1109/ICOCN.2017.8121199
13. Khajwal TN, Mushtaq A, Kaur S (2020) Performance analysis of FSO-SISO and FSO-WDM
systems under different atmospheric conditions. In: 2020 7th international conference on signal
processing and integrated networks (SPIN), Noida, India, pp 312–316. https://doi.org/10.1109/
SPIN48934.2020.9071116
14. Elayoubi K, Rissons A, Lacan J, St. Antonin L, Sotom M, Le Kernec A (2017) RZ-DPSK optical
modulation for free space optical communication by satellites. In: 2017 Opto-electronics and
communications conference (OECC) and photonics global conference (PGC), Singapore, pp
1-2. https://doi.org/10.1109/OECC.2017.8115015
15. Okamura Y, Hanawa M (2012) All-optical generation of optical BPSK/QPSK signals inter-
leaved with reference light. IEEE Photon Technol Lett 24(20):1789–1791. https://doi.org/10.
1109/LPT.2012.2209867
16. Alifdal H, Abdi F, Abbou FM (2017) Performance analysis of an 80 Gb/s WDM system
using OQPSK modulation under FWM effect and chromatic dispersion. In: 2017 international
conference on wireless technologies, embedded and intelligent systems (WITS), Fez, pp 1–6.
https://doi.org/10.1109/WITS.2017.7934663
17. Sanou SR, Zougmore F, Koalaga Z (2014) Performances of OFDM/OQPSK modulation for
optical high-speed transmission in long haul fiber over 1600 Km. Glob J Res Eng Gener Eng
14(2):9–14
18. Vanmathi P, Sulthana AKT (2019) Hybrid optical amplifier performance in OAF using OOK
and BPSK modulations. In: 2019 international conference on intelligent computing and control
systems (ICCS), Madurai, India, pp 695–699. https://doi.org/10.1109/ICCS45141.2019.906
5900
19. Mishina K, Kitagawa S, Maruta A (2007) All-optical modulation format conversion from on-
off-keying to multiple-level phase-shift-keying based on nonlinearity in optical fiber. Opt Expr
15:8444–8453
20. Zhou Y, Lord A, Sikora S (2002) Ultra-Long-Haul WDM transmission systems. BT Technol J
20:61–70. https://doi.org/10.1023/A:1021386818577
21. Novak S, Moesle A (2002) Analytic model for gain modulation in EDFAs. J Lightwave Technol
20:975. https://www.osapublishing.org/jlt/abstract.cfm?URI=jlt-20-6-975
22. Sato K (2002) Semiconductor light sources for 40-Gb/s transmission systems. J Lightwave
Technol 20(12):2035–2043. https://doi.org/10.1109/JLT.2002.806763
23. Chraplyvy AR, Tkach RW (1993) What is the actual capacity of single-mode fibers in ampli-
fied lightwave systems? IEEE Photon Technol Lett 5(6):666–668. https://doi.org/10.1109/68.
219704
24. Usha SM, Mahesh HB (2019) Low power and area optimized architecture for OQPSK modu-
lator. In: 2019 international conference on wireless communications signal processing and
networking (WiSPNET), Chennai, India, pp 123–126. https://doi.org/10.1109/WiSPNET45
539.2019.9032723
25. Shaina, Gupta A (2016) Comparative analysis of free space optical communication system for
various optical transmission windows under adverse weather conditions. Procedia Comput Sci
89:99–106
High-Level Synthesis of Cellular
Automata–Belousov Zhabotinsky
Reaction in FPGA

P. Purushothaman, S. Srihari, and S. Deivalakshmi

Abstract The Belousov Zhabotinsky (BZ) reaction is a chemical reaction that oscillates
in space and time. The reaction involves complex mechanisms and steps;
however, simple mathematical models based on cellular automata (CA) can be used to
replicate it. Cellular automata have long been used to simulate various physical
processes, and the simulations can reveal the mathematical details involved in a
seemingly complicated procedure. CA design prototypes are used to model diffusion
processes and chemical reactions, and CA models are implemented in FPGAs to
achieve accelerated results compared to CPU-based architectures. In this research,
we implement the BZ reaction on a Xilinx FPGA using a high-level synthesis
methodology with Vivado HLS.

Keywords Belousov Zhabotinsky reaction · Cellular automata · High-level synthesis

1 Introduction

1.1 Cellular Automata

The idea of CA was introduced by von Neumann [1] and Stanislaw Ulam in the 1950s
as a discrete computational model used to represent complex and nonlinear
dynamic systems. It has a variety of real-world applications [2, 3], from modeling
complex biological systems [4] and developing better cryptography algorithms [5]
to providing a framework for modeling the impact of socio-economic practices on
the environment [6]. According to Stephen Wolfram [7], cellular automata are
primarily classified into four types, as automata in which:

P. Purushothaman (B) · S. Srihari · S. Deivalakshmi
National Institute of Technology, Tiruchirappalli 620015, TN, India
e-mail: [email protected]
S. Srihari
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 341


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_26
• Patterns stabilize into homogeneity
• Patterns evolve into oscillating structures
• Patterns evolve in a chaotic manner
• Patterns become extraordinarily complex and last for a long time with stable local
structures.
The BZ reaction belongs to the fourth type, where the stable local structures repeat
themselves for a very long duration of time, and the overall pattern is confoundingly
convoluted.
CAs were one of the first use cases of FPGAs as hardware accelerators [8, 9].
Improvements in FPGA technology and scalability make FPGAs the right choice for
implementing CA models. However, with newer algorithms and operations being
introduced for CAs, developers face the difficulty of writing them in lower-level
languages such as Verilog and VHDL.

1.2 High-Level Synthesis

High-level synthesis (HLS) is a method for programming FPGAs. It provides the
freedom of implementing FPGA architectures in higher-level languages such as C/C++
and helps designers implement complex algorithms in a shorter time. Even though
the development time is reduced, the hardware implementation is not guaranteed to
be as optimized as with conventional methods. Designers can overcome these issues
by using the compiler directives of the HLS software, which help in allocating
resources, pipelining, and various other optimizations. Xilinx provides the Vivado
HLS tool [10] for programming their FPGAs using a high-level synthesis flow with
many optimization directives.

2 Belousov Zhabotinsky Reaction

The Belousov Zhabotinsky reaction (BZ reaction) [12], shown in Fig. 1, is a common
example of a time-oscillatory chemical reaction that follows nonlinear chemical
dynamics. When the reaction happens on a two-dimensional plane, self-organized
spirals of the different reagents are formed; the reagents have distinctive colors
that help in distinguishing them from each other. Numerous computer models have
been proposed to simulate the spiraling pattern generated by the reaction. In this
work, we implement a simplified version of the BZ reaction presented by Ball [13].
Fig. 1 Belousov Zhabotinsky chemical reaction [11]

Fig. 2 Spirals in a simulated BZ reaction cellular automaton

2.1 Simplified Reaction Mechanism

The simplified form of the BZ reaction from [13] can be represented by a sequence
of reactions.

X + Y → 2X (1)

Y + Z → 2Y (2)

Z + X → 2Z (3)

Equation (1) means that, given that a sufficient quantity of Y exists, the creation of X
is autocatalyzed. Similarly, the reactions in Eqs. (2) and (3) are also autocatalytic,
consuming Z and X, respectively. Summing all three equations, we can observe that the
reaction forms a complete cycle. With these equations, we can now model the reaction’s
cellular automaton.
Let X_t, Y_t, Z_t represent the quantity/concentration of reagents X, Y, Z at time t.
The quantities/concentrations of the reagents at a near-future time t + Δ can be written
as

X_{t+Δ} = X_t + X_t(Y_t − Z_t) (4)

Y_{t+Δ} = Y_t + Y_t(Z_t − X_t) (5)

Z_{t+Δ} = Z_t + Z_t(X_t − Y_t) (6)

At every instant, the future concentration of any reagent depends upon the concentrations
of the remaining two. Additional parameters could be introduced to modify the
individual reaction rates, but we ignore them here for the sake of simplicity.
The above reactions are valid only for a single spatial location and can be modeled
without a CA. For representing an oscillating reaction on a two-dimensional surface,
a cellular automaton is needed: the concentrations are averaged over a 3 × 3 neighborhood
window to introduce diffusion between reagents, and the results are applied
in Eqs. (4)–(6). Thus, the reaction at a location is influenced by eight of its neighbors. More
sophisticated models can be created by considering a broader neighborhood (Fig. 2).

2.2 Reaction Surface

We have implemented the cellular automaton on a von Neumann grid, also known
as a toroidal grid. In a toroidal grid, the tessellations are always continuous, and
there are no edges. To implement the toroidal grid in practice, we used a
two-dimensional array where every edge wraps around to the opposite edge.

3 Programming the Automata in FPGA

In the previous section, we explained the reaction mechanism; in this section,
we explain how the CA is implemented in C++ and the optimizations introduced for
the FPGA implementation.
The concentrations of the reagents are represented as real values and stored in
fixed-point arrays. Two such arrays are created to represent the current and future
states of the concentrations. The initial concentration states are assigned random
values; in the real-life scenario, this mimics the sites which start the chemical
reaction. After this, we compute the average concentrations of the reagents over
the neighborhood and proceed with the reaction equations. Two commonly used
neighborhood windows are the Moore neighborhood and the von Neumann neighborhood,
as shown in Fig. 3. If the new concentration of a reagent reaches above one or below
Fig. 3 Moore and von Neumann grid with extended neighborhood schemes

zero, it is clipped to 1 or 0, respectively. This ensures that no single reagent saturates
and dominates the reaction. The future state then becomes the current state, and the
current-state variables are reused for storing the next future state. The flowchart (Fig. 4)
Fig. 4 Flowchart

explains the process. We initialize the concentrations randomly in the first step. In
the subsequent steps, we update the future concentrations based upon the current
concentrations and swap the current state to future state and vice-versa.

3.1 Optimizations Specific to FPGA

Even though the algorithm written in C++ can be implemented directly with Vivado
HLS, it can be further optimized for FPGA implementation, as stated earlier. In this
section, we list the optimizations introduced to the native C++ code. The cellular
automaton model is implemented as a 128 × 128 pixel two-dimensional von Neumann
grid, and we chose the Xilinx Zynq Z720 series platform for the purpose. The latency
of the CA model architecture can be decreased further by pipelining the execution:
pipelined architectures club independent operations together and execute them in
parallel. Pipelining can easily be invoked through compiler directives in Vivado
HLS; the tool accommodates pipelining to the extent possible, beyond which the
designer is warned to modify the program flow. The concentration variables are
stored in two-port RAM to improve the pipeline performance.

4 Results

4.1 Simulation

The simulation output from Vivado HLS was converted into a picture-sequence
format using additional Python scripts, which helped us visualize the reaction
better. Figures 5, 6 and 7 show the progress of the simulated reaction.

Fig. 5 Random initialization of concentrations in the beginning
Fig. 6 Loops and curves start to appear after 40 cycles

Fig. 7 Loops and curves (highlighted) become stable and sustain themselves after 100 cycles

4.2 Synthesis

The HLS code was synthesized after being successfully simulated, with the compiler
directives for pipelining and code optimization included. The resource usage of the
synthesized HDL is shown in Fig. 8 and given in Table 1. The maximum frequency
achieved for the design was 119.55 MHz.

5 Conclusions

Thus, in this research article, we presented a simplified representation of the BZ
reaction. The simplified model was successfully simulated and implemented in a Xilinx
Zynq FPGA fabric using a high-level synthesis tool. The tool provided the advantage
of optimizing hardware resources without the designers having to focus on the
low-level implementation.
Fig. 8 Resource utilization report from VIVADO HLS

Table 1 Resource utilization

                 Utilized blocks    Available blocks
BRAM 18K         0                  280
DSP 48E          15                 220
Flip flops       4763               106,400
Lookup tables    7940               53,200

References

1. von Neumann J (1966) Theory of self-reproducing automata. University of Illinois Press


2. Sarkar P (2000) A brief history of cellular automata. ACM Comput Surv 32(1):80–107. https://
doi.org/10.1145/349194.349202
3. Harao, Noguchi (1978) On some dynamical properties of finite cellular automaton. IEEE Trans
Comput C-27(1):42–52. https://doi.org/10.1109/TC.1978.1674951
4. Hwang M et al (2009) Rule-based simulation of multi-cellular biological systems—a review
of modeling techniques. Cell Mol Bioeng 2(3):285–294. https://doi.org/10.1007/s12195-009-
0078-2
5. Nandi S, Kar BK, Pal Chaudhuri P (1994) Theory and applications of cellular automata in
cryptography. IEEE Trans Comput 43(12):1346–1357. https://doi.org/10.1109/12.338094
6. Engelen G, White R, Uljee I et al (1995) Using cellular automata for integrated modelling of
socio-environmental systems. Environ Monit Assess 34:203–214. https://doi.org/10.1007/BF0
0546036
7. Wolfram S (1984) Universality and complexity in cellular automata. Phys D: Nonlinear
Phenomena 10(1–2):1–35. ISSN 0167-2789. https://doi.org/10.1016/0167-2789(84)90245-8.
https://www.sciencedirect.com/science/article/pii/0167278984902458
8. Halbach M, Hoffmann R (2004) Implementing cellular automata in FPGA logic. In: 18th
international parallel and distributed processing symposium. In: Proceedings. IEEE
9. Hochberger C et al (2000) The cellular processor architecture CEPRA-1X and its configuration
by CDL. In: International parallel and distributed processing symposium. Springer, Berlin
High-Level Synthesis of Cellular Automata–Belousov … 349

10. Feist T (2012) Vivado design suite. Xilinx Inc., San Jose, White Paper 5, p 30
11. Recreating one of the weirdest reactions, NileRed, YouTube. Link: https://www.youtube.com/watch?v=LL3kVtc-4vY
12. BZ reactions—Chemistry, by Michael Rogers, Stephen Morris
13. Ball P (1996) Designing the molecular world: chemistry at the frontier. Princeton University Press, vol 19
IoT-Based Calling Bell

Sundara Babu Maddu, Gaddam Venu Gopal, Ch. Lasya Sarada, and B. Bhargavi

Abstract The IoT-based calling bell is used to maintain security and to let the owner
know immediately who has visited the house. Whenever a person at the door presses the
calling bell button, a call is immediately made to the registered phone number. The
owner receives the call and can inform the visitor, for instance, that he or she is out
of the house and will be back in a few minutes, or ask the visitor to come at some other
convenient time. The system also sends a message to the owner's phone number
informing them that someone has visited the house. This IoT device
is built to improve the efficiency and functioning of the bell with software.

Keywords IoT bell · Arduino · Security

1 Introduction

The owner needs to register his/her details, such as name and phone number. The IoT-based
calling bell is installed at the owner's house, and two buttons are
provided that the visitor can press. If one button is
pressed, the system sends an SMS to the owner stating that someone has visited the house, and
if the other button is pressed, the system makes a call to the owner's registered number. The
owner is also given the option of keeping only one button, which
alternates between sending an SMS and making a call. This lets the owner know
that someone has visited the house and provides an efficient security system for
the house. Existing calling bells merely allow the user to press a button that
rings inside the house; if the bell is pressed in the absence of people at home, the owner is unable to identify who has visited
the house. In this context, such a device is of little use to the owner.
The proposed system offers the advantages of higher security and convenient, efficient communication with the visitor. In order to alert the owner, the stranger presses the

S. B. Maddu (B) · G. Venu Gopal · Ch. L. Sarada · B. Bhargavi


P V P Siddhartha Institute of Technology, Kanuru, Vijayawada, Andhra Pradesh, India

© Springer Nature Singapore Pte Ltd. 2021 351


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_27

bell at the door; a message is then sent to the owner, and the communication is forwarded to the given phone number through an automatic call. The owner
receives the message that someone is at the door, answers the call, and
can inform the visitor, for instance, that he or she is out of the house and
will be back in a few minutes, or ask the visitor to come at some other
convenient time.

2 Proposed Method

First, the owner needs to register his/her details, such as name and phone number. The
IoT-based calling bell is installed at the owner's house. Two buttons are
provided that allow the visitor to press the bell. If one button
is pressed, the system sends an SMS to the owner stating that someone has visited the house.
If the other button is pressed, the system makes a call to the owner's registered number.
The owner is also given the option of keeping only one button, which
alternates between sending an SMS and making a call. This lets the owner know
that someone has visited the house and provides an efficient security system for
the house.

3 Arduino Specific Instructions

When setting up the Arduino software, go back to the home screen
and select the desired board from the list in the right column of the page. In this
context, AI Thinker, the manufacturer of the ESP8266 modules, has launched the A6 GSM module.
• The module is less expensive than the SIM900 and can be connected easily. The diagrams show how to connect it with the Arduino for making a call and sending an SMS.
• To power the A6 GSM module, a mobile adapter is used; the VCC pin of the GSM module is wired to the PWR_KEY pin, which acts as a chip enable. It can be connected or disconnected whenever necessary. To start the module, a HIGH level must be applied to the PWR_KEY pin.
• Then, a suitable SIM is inserted into the module, whose slot is designed for a micro SIM. When a nano SIM is used, a converter is required to fit the SIM in the slot.
• The RxD pin of A6 GSM is connected to Tx of Arduino.
• The TxD pin of A6 goes to Rx of Arduino.
• GND pin of A6 to GND of Arduino (Fig. 1).
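For illustration, the call and SMS operations described above are typically driven by standard GSM AT commands sent over the Tx/Rx serial lines. The sketch below only builds the command strings; the phone number, function names, and omission of actual serial I/O are our own assumptions, and the exact command set supported by the A6 should be verified against its datasheet.

```python
# Sketch of the AT commands an Arduino would write to the A6 GSM module
# over serial (TxD/RxD). Function names and the number are illustrative.

def make_call_command(number):
    """ATD dials a number; the trailing ';' selects a voice call."""
    return "ATD" + number + ";\r\n"

def make_sms_commands(number, text):
    """AT+CMGF=1 selects SMS text mode; AT+CMGS addresses the message;
    the body is terminated with Ctrl+Z (0x1A)."""
    return [
        "AT+CMGF=1\r\n",
        'AT+CMGS="' + number + '"\r\n',
        text + "\x1a",
    ]

if __name__ == "__main__":
    print(make_call_command("+911234567890"))
    for cmd in make_sms_commands("+911234567890", "Someone visited your house"):
        print(repr(cmd))
```

On the actual device, these strings would be written to the A6 module via the Arduino's serial port, with the module's "OK" responses checked between commands.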

Fig. 1 Arduino board and pin diagram

4 Screens

See Figs. 2, 3 and 4.

Fig. 2 GSM module interfacing with Arduino



Fig. 3 Making a call module

Fig. 4 SMS module



5 System Test

Here, test cases are decided based on the specification of the system. The software or
module to be tested is treated as a black box; hence, this is also called
black-box testing (Figs. 5, 6, 7, 8, 9, 10, 11 and 12; Tables 1, 2, 3, 4, 5, 6, 7 and 8).

Fig. 5 Test case for power supply to Arduino

Fig. 6 Test case for no proper power supply to Arduino



Fig. 7 Test case for proper power supply to GSM module

Fig. 8 Test case for no proper power supply to GSM module

6 Conclusion

In conclusion, the owner of the house is required to register
information such as name and phone number. The IoT-based calling bell with two
buttons can be installed, which facilitates the visitor pressing the bell buttons. If
one button is pressed, the system sends an SMS to the owner when someone
visits the house. If the other button is pressed, the system makes a call to the
owner's registered number. The owner is also given the option of sending an SMS
and making a call alternately via a single button, which informs him about
the identity of the visitor. It works as an effective security system when it is

Fig. 9 Test case for receiving SMS from GSM module to given phone number

connected with a mike, speaker, and camera. This enables the system to make a video call
through which the owner is able to communicate with the visitor.

Fig. 10 Test case for not receiving SMS from GSM module to given phone number

Fig. 11 Test case for making a call from GSM module to given phone number

Fig. 12 Test case for not being able to make a call from GSM module to given phone number

Table 1 Checking for proper power supply to Arduino board

Trial no.  Key in                   Projected performance  Experimental performance  Condition (P = Yes, F = No)
1          Power supply to Arduino  Proper supply          Proper supply             Yes

Table 2 Checking for improper power supply to Arduino board

Trial no.  Key in                   Projected performance  Experimental performance  Condition (P = Yes, F = No)
2          Power supply to Arduino  No proper supply       No proper supply          Yes

Table 3 Checking for proper power supply to GSM module

Trial no.  Key in                      Projected performance  Experimental performance  Condition (P = Yes, F = No)
3          Power supply to GSM module  Proper supply          Proper supply             Yes

Table 4 Checking for improper power supply to GSM module

Trial no.  Key in                      Projected performance  Experimental performance  Condition (P = Yes, F = No)
4          Power supply to GSM module  No proper supply       No proper supply          Yes

Table 5 Checking for message reception

Trial no.  Key in                           Projected performance  Experimental performance  Condition (P = Yes, F = No)
5          Sending SMS to given phone number  Message received     Message received          Yes

Table 6 Checking for no message reception

Trial no.  Key in                           Projected performance  Experimental performance  Condition (P = Yes, F = No)
6          Sending SMS to given phone number  SMS not received     SMS not received          Yes

Table 7 Checking for making a call

Trial no.  Key in                             Projected performance  Experimental performance  Condition (P = Yes, F = No)
7          Making a call to given phone number  Made a call          Made a call               Yes

Table 8 Checking for not making a call

Trial no.  Key in                             Projected performance  Experimental performance  Condition (P = Yes, F = No)
8          Making a call to given phone number  No call made         No call made              Yes

Mobile Data Applications
Development of an Ensemble Gradient
Boosting Algorithm for Generating
Alerts About Impending Soil Movements

Ankush Pathania, Praveen Kumar, Priyanka, Aakash Maurya, K. V. Uday, and Varun Dutt

Abstract Natural disasters such as landslides are the source of immense damage
to life and property. However, little is known about how one could generate accurate
alerts against landslides sufficiently ahead of time. The major objective of this
research is to develop and cross-validate a novel ensemble gradient boosting algorithm
for generating specific warnings about impending movements of soil at an
actual landslide site. Data about soil movements at 10-min intervals were collected
via a landslide monitoring system deployed at a real-world landslide site
situated on the Gharpa Hill, Mandi, India. A new ensemble support vector machine–
extreme gradient boosting (SVM-XGBoost) algorithm was developed, where the
alert predictions of an SVM algorithm were fed into an XGBoost classifier to predict
the alert severity 10 min ahead of time. The performance of the SVM-XGBoost
algorithm was compared to other algorithms, including Naïve Bayes (NB), decision
trees (DTs), random forest (RF), SVM, XGBoost, and different new XGBoost variants
(NB-XGBoost, DT-XGBoost, and RF-XGBoost). Results revealed that the new
SVM-XGBoost algorithm significantly outperformed the other algorithms in
correctly predicting soil movement alerts 10 min ahead of time. We highlight the
utility of developing newer ensemble-based machine learning algorithms for alert
generation against impending landslides in the real world.

Keywords Landslides · Machine learning · Decision tree · Random forest ·


Support vector machine · Naïve Bayes · Extreme gradient boost · Classification ·
Alerts

A. Pathania · P. Kumar · Priyanka · A. Maurya · V. Dutt (B)


Applied Cognitive Science Lab, Indian Institute of Technology Mandi Kamand, Mandi, India
e-mail: [email protected]
K. V. Uday
Geohazard Studies Laboratory, Indian Institute of Technology Mandi Kamand, Mandi, India

© Springer Nature Singapore Pte Ltd. 2021 365


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_28

1 Introduction

Natural disasters are a source of vast damage to property and lives [1]. Landslides are
among the most common natural disasters in hilly regions [2]. These landslides cause problems
such as roadblocks, as well as other forms of damage to lives and property [2]. If people
and authorities could be alerted about soil movements on hills sufficiently in advance,
then these alerts may help to timely evacuate people from landslide sites as well as
divert traffic on roads about to be blocked by a landslide [3]. To generate alerts from
landslide sites, one needs to develop and deploy landslide monitoring systems. Recent
research has developed and used the internet of things (IoT)-based landslide moni-
toring systems on real-world landslide sites [4]. For example, the developed system
is capable of recording soil movements and logging them into a cloud server at 10-
min intervals. The data recorded consists of readings of five three-axis accelerometer
sensors placed vertically beneath the soil sub-surface 1-m apart from each other at a
landslide site. The soil displacement values are computed using these accelerations,
and the displacement values (soil movements) are then used to monitor soil move-
ments and impending landslides. Thus, data collected by the deployed system could
then be used to generate timely alerts via SMSes on cellphones with some lead time
[4–7]. However, the generation of alerts of different severity ahead of time may need
the involvement of machine learning (ML) algorithms.
A study of the literature reveals that many ML algorithms have been used for landslide
applications [4–15]. For example, Bui et al. used support vector machine (SVM),
decision tree (DT), and Naïve Bayes (NB) ML algorithms for landslide suscepti-
bility mapping in Vietnam [11]. Similarly, Chen et al. used support vector machine
(SVM), random forest (RF), and a logistic model tree (LMT) for landslide suscep-
tibility mapping in the long county area (China) [12]. Also, Kumar et al. compared
ensemble and non-ensemble ML algorithms to predict the amount of debris flow at
the Tangni landslide in India. Non-ensemble algorithms (sequential minimal opti-
mization (SMO), and autoregression (AR)) and ensemble algorithms (random forest,
bagging, stacking, and voting) involving the non-ensemble algorithms have also been
proposed to predict weekly debris flow at the Tangni landslide (India) [6]. Similarly,
Sahin et al. have proposed gradient boosting machines (GBM), extreme gradient
boosting (XGBoost), and random forest (RF) algorithms for landslide susceptibility
mapping for the Ayancik District of Sinop Province (Turkey) [16].
The different ML applications detailed above have either been for susceptibility
mapping or the prediction of the amount of debris flow on a landslide. However,
less attention has been given on the generation of alerts from landslide sites ahead in
time based upon soil displacements occurring currently and in the recent past. The
prime objective of this paper is to address this gap in the literature and to develop
ML algorithms for generating alerts about the severity of soil movements ahead in
time by relying upon recent soil movements. Specifically, in this paper, we propose
a new ML algorithm, support vector machine–extreme gradient boosting (SVM-
XGBoost), where the movement severity prediction of the SVM algorithm is first
obtained. Then it is fed into an XGBoost algorithm to derive the final predictions

about the severity of soil movements. These predictions could then be used for
generating alerts on cellphones of people living in the landslide-prone area. For
benchmarking the performance of the new SVM-XGBoost algorithm, we evaluated
the soil movement severity predictions from single algorithms (e.g., SVM [17], DT
[18], RF [19], NB [20], and XGBoost [21]) and other variants of the new ensemble
gradient algorithm (NB-XGBoost, DT-XGBoost, and RF-XGBoost).
In what follows, we first discuss the related work and the data used for this study.
Then, we briefly describe the different ML algorithms that we used for classification
in this study. Finally, we report results from the different ML algorithms and conclude
the paper by highlighting the implications of our findings for prediction and alert
generation for impending soil movements.

2 Background

ML algorithms have been used in the past for the prediction of natural phenomena,
including soil movements and associated landslides [4–15]. Landslides are a result
of excessive soil movements, and their occurrence is a rare event. Hence,
the application of ML to landslide prediction is a class imbalance problem [13].
Such problems may thus require the use of precision, recall or true positive (TP) rate,
false positive (FP) rate, F1 score, receiver operating characteristic (ROC), area under
the ROC curve (AUC), and sensitivity index (d′) instead of traditional measures like
accuracy [22].
Some ML approaches have been developed for landslide susceptibility mapping
[6, 7, 11–16]. For example, Ref. [11] used SVM, DT, NB for landslide susceptibility
mapping in Hoa Binh Province (Vietnam). Results showed that the SVM algorithm
performed the best, followed by DT and NB algorithms. Reference [12] used SVM
and RF and logistic model tree (LMT) for landslide susceptibility mapping in the
Long County area (China). Results showed that the RF algorithm outperformed the
other two algorithms. Due to the class imbalance datasets, Refs. [11, 12] used ROC
curves and AUCs to analyze different algorithms.
Reference [13] compared logistic regression, DT, SVM, RF, and multilayer
perceptron (MLP) algorithms to classify the landslide susceptibility mapping using
the rainfall and previous instances of landslide between 2011 and 2015 on National
Highway NH-21 between Mandi and Manali. They tackled the imbalanced class
problem using oversampling techniques to enhance the predictions. These algorithms
were validated using tenfold cross-validation, and the sensitivity index (d′) was used to
evaluate the scores. The best performing algorithm was the RF, followed by DT and
logistic regression algorithms.
Reference [16] used gradient boosting machines (GBM), extreme gradient
boosting (XGBoost), and random forest (RF) algorithms to map landslide suscep-
tibility in the Ayancik District of Sinop Province, situated in the Black Sea region
of Turkey. 105 landslide locations in the area and 15 landslide causative factors
were used for this study [16]. The performance of the ensemble algorithms was

validated using different accuracy metrics, including AUC, overall accuracy (OA),
root mean square error (RMSE), and Kappa coefficient. Results showed that the
XGBoost method produced higher accuracy results and thus performed better than
other ensemble methods [16].
There is also literature on the use of ML for debris flow prediction [6]. For example,
Ref. [6] compared non-ensemble ML algorithms (sequential minimal optimization
(SMO), and autoregression) and ensemble ML algorithms (RF, bagging, stacking,
and voting) involving the non-ensemble algorithms to predict the weekly debris
flow at the Tangni landslide, India between 2013 and 2014. Results revealed that
the ensemble algorithms (RF, bagging, and stacking) performed better compared to
non-ensemble algorithms [6].
As explained above, both non-ensemble and ensemble ML algorithms were
proposed in literature either for landslide susceptibility mapping or for the prediction
of debris flow [6, 7, 11–16]. However, less attention has been given on the generation
of alerts ahead in time based upon soil displacement severity at landslide sites. This
research addresses this literature gap by considering the prediction of soil displace-
ment severity and soil movement alerts via a novel ensemble ML algorithm, support
vector machine–extreme gradient boosting (SVM-XGBoost).
Specifically, in this paper, we develop the SVM-XGBoost algorithm, where the
movement severity prediction of the former algorithm is first obtained; it is then fed
into the latter algorithm to derive the final predictions about the severity of soil
movements. We compare the performance of the SVM-XGBoost algorithm with other
single algorithms (SVM, DT, RF, NB, and XGBoost) as well as different novel
ensemble variants (NB-XGBoost, DT-XGBoost, and RF-XGBoost). For bench-
marking the performance of these new algorithms, we evaluated the soil movement
severity prediction from single algorithms like SVM [17], DT [18], RF [19], NB [20],
and XGBoost [21] using the tenfold cross-validation procedure [20]. The choice of
the single algorithms was based on their performance for landslide susceptibility
mapping or debris flow predictions in prior literature.

3 Methodology

3.1 Data

The dataset analyzed in this paper belongs to the Gharpa landslide in Mandi district
of Himachal Pradesh, India. This landslide is located on Mandi-Bajaura road, which
is a connecting route between Mandi and Kullu districts of Himachal Pradesh, India.
This road is of considerable significance as it is used as an alternate route during
monsoon when the national highway between Mandi and Kullu is blocked due to
heavy rains. Data on soil movements were collected from the Gharpa landslide on
a 10-min scale between July 26, 2019, and October 6, 2019, across five different
sensors in a single borehole. The borehole contained five sensors S1, S2, S3, S4,

and S5 at depths of 1 m, 2 m, 3 m, 4 m, and 5 m beneath the ground,


respectively [4] (see Figs. 1 and 2). The soil movement data were measured using
an MPU6050 sensor, which is a three-axis accelerometer. The sensor nodes were
placed in a single borehole each 1-m apart (Fig. 2). Each sensor node consists of
MPU6050, a capacitive soil moisture sensor, and a force sensor. The accelerations
acting on different sensors were converted into displacements using the following
kinematical equation:

S = ut + (1/2)at²    (1)

Here u is the initial velocity, a is the acceleration, t is the time over which the
acceleration acts on the sensor, and S is the displacement. Here, u is zero, as there
is no initial velocity component. The acceleration values are the values given by
the MPU6050 accelerometer sensor; and, t is 0.001, which is one millisecond (time
taken by the accelerometer to read the values of accelerations) [4].
Equation 1 was used to derive the displacement along the three-axes, and the resul-
tant displacement of these three mutually perpendicular displacements was calculated
as the actual displacement for a particular sensor.
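As a small sketch of this computation (function and variable names are ours, not from the deployed system), the per-axis displacement from Eq. 1 with u = 0 and t = 0.001 s, and the resultant of the three mutually perpendicular axis displacements, can be written as:

```python
import math

T = 0.001  # time (s) over which each acceleration reading acts, per the text

def axis_displacement(a, t=T, u=0.0):
    # Eq. 1: S = u*t + (1/2)*a*t^2, with initial velocity u = 0
    return u * t + 0.5 * a * t * t

def resultant_displacement(ax, ay, az):
    # Resultant of the three mutually perpendicular axis displacements
    sx, sy, sz = (axis_displacement(a) for a in (ax, ay, az))
    return math.sqrt(sx ** 2 + sy ** 2 + sz ** 2)
```

For example, an acceleration of 2 m/s² along one axis over 1 ms yields a displacement of 0.5 × 2 × (0.001)² = 10⁻⁶ m, i.e., 1 µm, which matches the micrometre scale of the displacements reported later.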
It was observed that sensor S3 possessed 1004 nonzero displacement instances
out of a total of 2344 zero and nonzero displacement instances, which were the
most among all five sensors. Thus, sensor S3 was the one that was closest to the

Fig. 1 Location of the landslide at Gharpa Hill, Mandi District, India



Fig. 2 Sensor placement in the borehole containing five sensors at regular depths of 1 m

landslide failure plane, and data of this sensor has been used for comparing different
algorithms.
We classified actual displacement for each instance of the sensor S3 into four
classes of displacements based upon their severity. The no displacement class (class
0) represented absolute zero displacements. The low displacement class (class 1),
moderate displacement class (class 2), and high displacement class (class 3) repre-
sented displacements based upon their percentiles (see Table 1). For computing these
percentiles, all S3 sensor displacements were converted into a Z-score (using mean
= 3.248 µm and standard deviation = 2.271 µm). Next, the Z-score ranges (see
Table 1) were used to compute different percentiles. Once data were divided into
different classes, it was split into 10-parts for a tenfold cross-validation procedure.
In the cross-validation procedure, each algorithm was trained on 9-parts and tested
on 1-part. This procedure was repeated 10-times, i.e., once for each tested part. The
training data were used to find the best values of parameters in different algorithms,
whereas the test data were used to test the algorithms with the best parameter values
found during training.
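The tenfold split described above can be sketched with a simple index partition (a minimal stand-in for library utilities such as scikit-learn's KFold; the function name is ours):

```python
def ten_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous parts; each part serves once
    as the test fold while the remaining k-1 parts form the training folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each algorithm is then fit on the 9 training parts (to tune parameters) and evaluated on the held-out part, repeated once per fold.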

Table 1 Range of percentiles and their corresponding classes

Class                                  Range of percentiles  Z-score range
Class 0 (No displacement class)        DNE^a                 DNE^a
Class 1 (Low displacement class)       0 to 25th             Less than −0.674
Class 2 (Moderate displacement class)  25th to 75th          −0.674 to +0.674
Class 3 (High displacement class)      75th to 100th         More than +0.674

^a Does not exist, as Class 0 contains absolute zero displacement instances; therefore, it is not related to the standard normal distribution of nonzero displacement instances

Table 2 Composition of each class across data

Class                                  Number of instances
Class 0 (No displacement class)        1340
Class 1 (Low displacement class)       377
Class 2 (Moderate displacement class)  406
Class 3 (High displacement class)      221

For doing ML analyses, it is required to have two or more classes, generally


referred to as the positive class and the negative class. In these data, we had four
different classes. Thus, for example, when the actual value of the class was class 2,
then class 2 was taken as positive, and all other classes (class 0, class 1, and class
3) were considered as negative. Based upon prior research [6], we used the actual
displacements (in µm) of the four successive previous 10-min instances (including
the current instance) as input for the evaluated algorithms; the predicted output was the displacement
class for the next 10-min interval. Table 2 shows the composition of each class across
data. As can be seen in Table 2, there were fewer instances of the low, moderate, and
high movement classes compared to the no-movement class. Thus, the dataset was
class imbalanced. In the next section, we will discuss the methodology followed,
keeping in mind the class imbalance in data.

3.2 Measures for Evaluating ML Algorithms

The most basic way of evaluating the performance of an ML algorithm is error rate
or accuracy of the algorithm. Accuracy is the rate at which a classifier classifies
correctly. However, evaluating the performance by accuracy can give ambiguous
assumptions for class-imbalanced data [5]. For example, 99% of instances of the
data belong to a particular class, now if a classifier is trained on this dataset, then it
can easily get an accuracy of 99% by simply classifying every instance as belonging

to that class. Therefore, in a class-imbalanced dataset it is important to consider
performance measures that assess the classifier's performance class-wise. Based
upon the literature [22], we have chosen precision, recall or true positive (TP) rate, false
positive (FP) rate, F1 score, and sensitivity index (d′) as measures for performance
evaluation of the different ML algorithms. For a binary classification, i.e., classification
on a dataset containing only two classes, positive and negative, the precision, recall or
TP rate, FP rate, and F1 score are defined as:

Precision = (number of TP)/(number of TP + number of FP) (2)

Recall or TP rate = (number of TP)/(number of TP + number of FN) (3)

FP rate = (number of FP)/(number of FP + number of TN) (4)

F1 score = 2 ∗ (Recall ∗ Precision)/(Recall + Precision) (5)

where TP is the number of positive-class instances identified as positive,
FP is the number of negative-class instances identified as positive, true negatives (TN)
are the number of negative-class instances identified as negative, and false negatives (FN)
are the number of positive-class instances classified as negative.
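Equations (2)-(5) translate directly into code; the counts are for one class treated as positive against the rest, and the function names are ours:

```python
def precision(tp, fp):
    return tp / (tp + fp)             # Eq. (2)

def recall(tp, fn):
    return tp / (tp + fn)             # Eq. (3), also the TP rate

def fp_rate(fp, tn):
    return fp / (fp + tn)             # Eq. (4)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)        # Eq. (5), harmonic mean of p and r
```

For example, with TP = 8, FP = 2, FN = 2, and TN = 8, precision and recall are both 0.8, the FP rate is 0.2, and the F1 score is 0.8.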
The sensitivity index has been used as the primary performance metric for the evaluation
of different classifiers in this research. The sensitivity index (d′) represents
the separation between the means of the correctly classified instances and the incorrectly
classified instances [17]. It is calculated as per the following equation:

d′ = Z(TP rate) − Z(FP rate) (6)

where the function Z(p), p ∈ [0, 1], is the inverse of the cumulative distribution function
of the standard Gaussian distribution. The greater (and more positive) the value of the sensitivity
index (d′) of an ML algorithm, the better the algorithm's performance
compared to other algorithms.
We used the sensitivity index for evaluating the performance of different sets of
hyperparameters across algorithms using the tenfold cross-validation procedure.

3.3 Different Algorithms Used for Classification

1. Support vector machine. It is a supervised learning algorithm constructed on


statistical learning theory and the structural risk minimization principle [17]. It

uses the train data to indirectly map the given input space into a high-dimensional
feature space [23]. Further, in this high-dimensional feature space, the optimal
hyperplane which separates the classes are calculated by minimizing the classifi-
cation errors and maximizing the margins of class boundaries [23]. The objective
function is used to penalize the model for instances that are either misclassified
or lie within the margin [23]. The regularization parameter is a degree of impor-
tance, which is given to misclassifications while finding the optimal hyperplane.
The kernel parameter is used to vary the shape of this hyperplane [23].
2. Decision tree. It is a hierarchical algorithm composed of decision rules that
recursively split independent variables into zones. Split is done in such a way;
the maximum homogeneity for a node is achieved after every split [18]. Homo-
geneity of a sample is measured as its entropy (0 for completely homogenous
and 1 for equally heterogeneous) and based on the decrease in entropy after a
dataset is split on an attribute the information gain of that attribute is defined
[24]. The dataset is divided into branches by finding the attribute with the largest
information gain and repeating this process until the termination condition on
every branch gives our decision tree [24]. The min samples split parameter is
the minimum number of samples required to split an internal node, and the max
depth parameter is the maximum number of edges from the root to the leaf of
the tree [25].
3. Random forest. It is an ensemble algorithm that exploits many classification
trees (a “forest”) to stabilize the algorithm predictions [19]. It exploits binary
trees that use a randomly generated subset of the data which contain subset of
total attributes generated through bootstrapping techniques [17]. Every tree is
made in order to minimize classification errors, and an ensemble of multiple
trees is used to maximize the algorithm's stability [21]. The number of estimators
parameter is the number of trees in the forest, the min samples split parameter
is the minimum number of samples required to split an internal node of a tree,
and the max depth parameter is the maximum number of edges from the root to
the leaf of the tree [26].
4. Naive Bayes. It is a probabilistic algorithm based on Bayes’ theorem. Using the
training data, it develops a posterior conditional probability for classification
into a particular class, given the feature instances, likelihood function, and a
prior probability [20]. This algorithm has no such hyperparameters.
5. XGBoost. Extreme gradient boosting (XGBoost) is a supervised learning
method based on decision tree boosting [27]. It uses an ensemble technique that
sequentially adds several decision trees and optimizes the loss function by
using the gradient descent boosting method [27]. It also uses a variety of other
ways to avoid overfitting and reduce time to completion [27]. The number of
estimators parameter is the number of trees in the model, and the max depth
parameter is the maximum number of edges from the root to the leaf for a tree
[28].
6. SVM-XGBoost. Support vector machine–extreme gradient boosting (SVM-
XGBoost) is the novel ensemble ML algorithm that we propose in this paper,
which is based on the ensemble of two algorithms, the former SVM and the
374 A. Pathania et al.

latter XGBoost. The movement severity prediction of the former algorithm is
first obtained. It is then fed into the latter algorithm to derive the final predictions
about the severity of soil movements. We have also used the DT-XGBoost,
NB-XGBoost, and RF-XGBoost variants of these novel ensemble ML
algorithms, which can be defined similarly.

3.4 Model Calibration

1. Support vector machine. In SVM, the kernel hyperparameter was varied
between linear, polynomial, and radial basis function (RBF) kernels. For the
polynomial kernel, the degree of the polynomial was varied between 1, 2, and
3. The gamma parameter for RBF and polynomial kernel was set to “scale,”
which is the default value in the scikit-learn library of python used to imple-
ment SVM. The regularization parameter (C) was varied logarithmically from
0.1 to 1000. The best set of hyperparameters was RBF kernel with 1000 as a
regularization parameter.
2. Decision tree. In DT, the min samples split parameter was varied between 2,
3, and 4. The max depth parameter was set to its default value as in the scikit-
learn library in Python used to implement the decision tree, which means nodes
are expanded until all leaves are pure or the min samples split criterion is violated. The
best set of hyperparameters was the one with two as minimum samples split
parameter.
3. Random forest. In RF, the number of estimators parameter was varied between
50, 100, 150, and 1000. The min samples split parameter was varied between
2, 3, and 4, and the max depth parameter was set to its default value as in the
scikit-learn library in Python used to implement random forest, which means
nodes are expanded until all leaves are pure or the min samples split criterion
is violated. The best set of hyperparameters was two as the minimum samples
split and 150 as the number of estimators.
4. Naive Bayes. NB algorithm does not have any such hyperparameters.
5. XGBoost. In XGBoost, the number of estimators parameter was varied between
50, 100, 150, and 1000. The max depth parameter was varied between 2, 5,
and 10, whereas the other parameters were kept at their default values as in
the scikit-learn API of Python used to implement XGBoost. The best set of
hyperparameters was ten as max depth and 150 as the number of estimators.
6. SVM-XGBoost. For all the novel ensemble-based ML algorithms that we
proposed in this paper, we used the best set of hyperparameters for every single
algorithm which was obtained while running each algorithm individually. That
means the XGBoost algorithm used in SVM-XGBoost and its three variants, i.e.,
DT-XGBoost, RF-XGBoost, and NB-XGBoost algorithms, had 150 as the number of
estimators and ten as the max depth hyperparameters. The SVM algorithm in the
SVM-XGBoost had a radial basis function (RBF) kernel and 1000 as regular-
ization hyperparameter. The DT algorithm used in the DT-XGBoost had two as
Development of an Ensemble Gradient Boosting Algorithm … 375

min samples split hyperparameter. The RF algorithm used in the RF-XGBoost
had 150 as the number of estimators and two as the min samples split hyperparam-
eter. In contrast, the NB algorithm has no such hyperparameters, so it was used
as is in NB-XGBoost.
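The grid searches above can be sketched with scikit-learn's GridSearchCV. This is a hedged sketch: the paper does not state which search utility was used, the `calibrate` helper is ours, and only the SVM and DT grids reported in the text are shown.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hyperparameter grids as described in Sect. 3.4 (values from the text).
PARAM_GRIDS = {
    "svm": (SVC(), {"kernel": ["linear", "poly", "rbf"],
                    "C": [0.1, 1, 10, 100, 1000]}),
    "dt": (DecisionTreeClassifier(), {"min_samples_split": [2, 3, 4]}),
}

def calibrate(name, X, y, cv=10):
    """Exhaustively search the grid with cross-validation and return
    the best hyperparameters and the refit estimator."""
    model, grid = PARAM_GRIDS[name]
    search = GridSearchCV(model, grid, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_estimator_
```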

4 Results

Table 3 illustrates the performance of the novel ensemble-based ML algorithms,
SVM-XGBoost and its three variants, i.e., DT-XGBoost, NB-XGBoost, and RF-
XGBoost algorithms, which we propose in this paper alongside standard XGBoost
algorithm. These algorithms involve an ensemble of two individual algorithms where
the prediction of soil movement severity from the first algorithm is first calculated
and fed as an additional attribute for the second algorithm, which predicts the final
soil movement severity. Therefore, these algorithms have two sets of best hyperpa-
rameters (one for each algorithm), which are shown in Table 4. As the dataset was
class-imbalanced, the classifying algorithm’s performance was primarily compared

Table 3 Tenfold cross-validation comparisons of the different new classification algorithms on the landslide dataset

Classifier     Accuracy  Precision  TP rate or recall  FP rate  F1 score  Sensitivity index (D′)
SVM-XGBoost    90.46     0.942      0.837              0.054    0.881     2.553
DT-XGBoost     90.23     0.940      0.823              0.055    0.877     2.528
NB-XGBoost     90.11     0.941      0.822              0.056    0.877     2.516
RF-XGBoost     90.00     0.938      0.819              0.056    0.874     2.502
XGBoost        74.32     0.652      0.555              0.149    0.599     1.179

Table 4 Best hyperparameters for the different new classification algorithms on the landslide dataset

Classifier     Best hyperparameters for the first algorithm     Best hyperparameters for XGBoost in each new algorithm
SVM-XGBoost    SVM: (Kernel: RBF, C: 1000)                      XGBoost: (N estimators: 150, Max. depth: 10)
DT-XGBoost     DT: (Min. samples split: 2)                      XGBoost: (N estimators: 150, Max. depth: 10)
NB-XGBoost     NB: (No such hyperparameters)                    XGBoost: (N estimators: 150, Max. depth: 10)
RF-XGBoost     RF: (N estimators: 150, Min. samples split: 2)   XGBoost: (N estimators: 150, Max. depth: 10)
XGBoost        XGBoost: (N estimators: 150, Max. depth: 10)     –
by the sensitivity index (D-prime) of each algorithm. We have also displayed the TP
rate or recall, FP rate, precision, F1 score, and accuracy of each algorithm for
further reference.
The sensitivity index for the novel ensemble-based ML algorithms was highest
for SVM-XGBoost, followed by DT-XGBoost, followed by NB-XGBoost, followed
by RF-XGBoost. The sensitivity index for all of these new algorithms was more
than twice the sensitivity index of the standard XGBoost algorithm. The TP rate
and FP rate of SVM-XGBoost algorithm were 0.837 and 0.054, respectively, when
compared to the TP rate and FP rate of standard XGBoost, i.e., 0.555 and 0.149,
there was 50.82% improvement (increase) to the TP rate and 63.76% improvement
(decrease) to the FP rate of standard XGBoost, respectively. The accuracy and f1 score
jumped from 74.32% and 0.599, respectively, of standard XGBoost to 90.46% and
0.881, respectively, for the SVM-XGBoost algorithm. These highlighted the signif-
icant improvement in performance when the novel ensemble-based ML algorithms
proposed in this paper were used.
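The sensitivity index values reported in Tables 3 and 5 are consistent with the standard detection-theoretic definition d′ = Z(TP rate) − Z(FP rate) [17], where Z is the inverse of the standard normal CDF. A quick numerical check, assuming SciPy:

```python
from scipy.stats import norm

def sensitivity_index(tp_rate, fp_rate):
    # d' = Z(hit rate) - Z(false-alarm rate), with Z the normal quantile.
    return norm.ppf(tp_rate) - norm.ppf(fp_rate)

# XGBoost row of Table 3: TP = 0.555, FP = 0.149 -> d' ~ 1.179
# SVM row of Table 5:     TP = 0.555, FP = 0.157 -> d' ~ 1.146
```

The reported values for the remaining rows agree up to rounding of the tabled TP and FP rates.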
Table 5 displays the performance of the different traditional ML classifiers along-
side the new SVM-XGBoost algorithm. The sensitivity index for traditional algo-
rithms apart from the XGBoost algorithm was highest for SVM, followed by decision
trees, followed by random forest. Naive Bayes performed the worst out of
the conventional algorithms and had a sensitivity index of 0.306, which was nearly
3.75 times lower than the sensitivity index of the SVM algorithm. The sensitivity index
increased from 1.146 for the best traditional classifier listed in this table, SVM, to
2.553 for SVM-XGBoost, about a 123% improvement; compared to the sensitivity
index of Naive Bayes, SVM-XGBoost represents about a 734% improve-
ment. Table 6 shows the best hyperparameters of the different traditional classifiers on
the landslide dataset.

Table 5 Tenfold cross-validation comparisons of different traditional classifiers on the landslide dataset

Classifier      Accuracy  Precision  TP rate or recall  FP rate  F1 score  Sensitivity index (D′)
SVM             73.59     0.636      0.555              0.157    0.593     1.146
Decision tree   74.15     DNE^a      0.534              0.153    DNE^a     1.111
Random forest   74.36     DNE^a      0.532              0.152    DNE^a     1.108
Naive Bayes     57.68     DNE^a      0.357              0.251    DNE^a     0.306
SVM-XGBoost     90.46     0.942      0.837              0.054    0.881     2.553

^a Does not exist, as a division of zero by zero occurred in the calculation
Table 6 Best hyperparameters of different traditional classifiers on the landslide dataset

Classifier      Best hyperparameters
SVM             SVM: (Kernel: RBF, C: 1000)
Decision tree   DT: (Min. samples split: 2)
Random forest   RF: (N estimators: 150, Min. samples split: 2)
Naive Bayes     NB: (No such hyperparameters)
SVM-XGBoost     SVM: (Kernel: RBF, C: 1000), XGBoost: (N estimators: 150, Max. depth: 10)

5 Discussion and Conclusion

The primary emphasis of ML algorithms in the landslide domain is to predict
soil movements promptly so that people can be alerted about impending landslides.
When warning people about impending landslides, it is essential to note that class-
based alert messages reflecting the soil movement severity make more sense
than reporting the exact soil movement. In this work, we applied support vector
machines (SVM), decision trees (DT), random forest (RF), naive Bayes (NB),
extreme gradient boosting (XGBoost), and proposed four novel ensemble-based
ML algorithms (SVM-XGBoost, DT-XGBoost, NB-XGBoost, RF-XGBoost). We
mainly use the prediction of the former algorithm as an additional attribute for the
latter algorithm, which predicts the final class of soil movement
over ten-minute time intervals at the Gharpha landslide in Himachal Pradesh, India. The
entire dataset was classified into four classes based on the soil movement severity of each
instance: class 0 (no displacement), class 1 (low displacement), class 2 (moderate
displacement), and class 3 (high displacement), using the training dataset. Classifying
algorithms were used to generate the class of soil movements in the subsequent 10-
min interval given the prior history of movements, i.e., four successive ten-minute
intervals, and tenfold cross-validation was performed to evaluate the performance of
all algorithms. We used precision, recall, or true positive rate (TP rate), false posi-
tive rate (FP rate), F1 score, and the sensitivity index (d′) to compare the results.
Our results revealed that the new algorithms (SVM-XGBoost, DT-XGBoost, NB-
XGBoost, RF-XGBoost) performed significantly better compared to the traditional
algorithms. Among these new algorithms, SVM-XGBoost was the best amongst all
the algorithms used.
First, the new ensemble-based gradient boosting algorithms performed better
compared to traditional algorithms in the soil movement class prediction exercise.
A likely reason for this result could be that the ensemble approach using predic-
tion from the former algorithm as an attribute for the XGBoost algorithm helped in
better prediction of the XGBoost algorithm. For example, the results of Ref. [16]
showed that the XGBoost algorithm was the optimum algorithm, achieving
lower prediction error and higher accuracy than the other ensemble methods.
But combining the two algorithms improved the performance of the new algorithms
further.
Second, it was found that XGBoost performed best amongst the traditional algo-
rithms we used, followed by SVM. XGBoost produces a prediction algorithm in the
form of a boosting ensemble of weak classification trees by a gradient descent that
optimizes the loss function [29]. The algorithm is highly effective in reducing the
processing time, and can be used for both regression and classification tasks [29].
Our results have several implications for predicting soil movement in the real
world and alerting people about impending landslides. The novel ensemble-based
ML algorithms that we propose in this paper performed significantly better than the
existing traditional algorithms. Thus, people living in or visiting landslide-prone areas
can benefit from alerts of impending landslides ahead of time using this research.
Policymakers such as governments could deploy these sensors at various stations and
use the newer ensemble-based ML algorithms to generate alerts for the affected people
and prepare for damage repair, such as roadblock repairs, ahead of time.
There are several ways forward in the future in this research program. First, it
would be useful to replicate our algorithm results across many landslide sites in the
Himalayan mountains. Second, in this paper, although we used univariate data, i.e.,
prediction of soil movements was based only on previous soil movements. These
data could next be correlated with multivariate data about weather and rain to make
more precise predictions about soil movements even in the longer temporal horizon.
Third, motivated by the performance of these new algorithms, it would be good to
evaluate approaches that take the predictions of more than one algorithm and use them
as additional attributes for our main algorithm. We plan to incorporate some of these
ideas soon in our research program on soil movement predictions.

References

1. Pande RK (2006) Landslide problems in Uttaranchal, India: issues and challenges. Disaster
Prevent Manage: Int J
2. Parkash S (2011) Historical records of socio-economically significant landslides in India. J
South Asia Disaster Stud
3. Liu JW, Shih CS, Chu ETH (2012) Cyberphysical elements of disaster-prepared smart
environments. Computer 46(2):69–75
4. Pathania A, Kumar P, Priyanka A, Singh R, Chaturvedi P, Uday KV, Dutt V (2020) Development
of a low cost, sub-surface IoT framework for landslide monitoring, warning, and prediction.
In: International conference on advances in computing, communication, embedded, and secure
systems (ACCESS 2020)
5. Pathania A, Kumar P, Priyanka, Maurya A, Kumar M, Singh R, Chaturvedi P, Uday KV, Dutt V
(in press) Predictions of soil movements using persistence, auto-regression, and neural network
models: a case-study in Mandi, India. In: International conference on paradigms of computing,
communication and data sciences (PCCDS-2020)
6. Kumar P, Sihag P, Pathania A, Agarwal S, Mali N, Chaturvedi P, Singh R, Uday KV, Dutt V
(2019) Landslide debris-flow prediction using ensemble and non-ensemble machine-learning
methods
7. Kumar P, Sihag P, Pathania A, Agarwal S, Mali N, Singh R, Chaturvedi P, Uday KV, Dutt V
(2019, September) Predictions of weekly soil movements using moving-average and support-
vector methods: a case-study in Chamoli, India. In: International conference on information
technology in geo-engineering. Springer, Cham, pp 393–405
8. Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation
by random forests technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci
13(11):2815–2831
9. Goetz JN, Brenning A, Petschko H, Leopold P (2015) Evaluating machine learning and
statistical prediction techniques for landslide susceptibility modeling. Comput Geosci 81:1–11
10. Ließ M, Glaser B, Huwe B (2011) Functional soil-landscape modeling to estimate slope stability
in a steep Andean mountain forest region. Geomorphology 132(3–4):287–299
11. Bui DT, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in
Vietnam using support vector machines, decision tree, and Naive Bayes Models. Math Prob
Eng
12. Chen W, Xie X, Wang J, Pradhan B, Hong H, Bui DT, Duan Z, Ma J (2017) A comparative
study of the logistic model tree, random forest, and classification and regression tree models
for spatial prediction of landslide susceptibility. Catena 151:147–160
13. Agrawal K, Baweja Y, Dwivedi D, Saha R, Prasad P, Agrawal S, Kapoor S, Chaturvedi P,
Mali N, Uday VK, Dutt V (2017) A comparison of class imbalance techniques for real-world
landslide predictions. In: 2017 International conference on machine learning and data science
(MLDS)). IEEE, pp 1–8
14. Stanley TA, Kirschbaum DB, Sobieszczyk S, Borak JS, Slaughter SL (2019) A landslide climate
indicator from machine learning
15. Coimbra CF, Pedro HT (2013) Stochastic learning methods. In: Solar energy forecasting and
resource assessment. Elsevier, pp 383–406
16. Sahin EK (2020) Assessing the predictive capability of ensemble tree methods for landslide
susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN
Appl Sci 2(7):1–17
17. Macmillan N, Creelman C (2010) Detection theory. Psychology Press, New York, NJ, USA
18. Vapnik V (1998) Statistical learning theory. Wiley, New York
19. Quinlan J (1986) Induction of decision trees. Mach Learn 1(1):81–106
20. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
21. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the
22nd ACM SIGKDD international conference on knowledge discovery and data mining pp
785–794
22. Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC
plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432.
https://doi.org/10.1371/journal.pone.0118432
23. Towards data science. https://towardsdatascience.com/support-vector-machine-introduction-
to-machine-learning-algorithms-934a444fca47. Last accessed 05 Aug 2020
24. Sayad S. https://www.saedsayad.com/decision_tree.htm. Last accessed 05 Aug 2020
25. Scikitlearn. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassi
fier.html. Last accessed 05 Aug 2020
26. Scikitlearn. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForest
Classifier.html. Last accessed 05 Aug 2020
27. Irina R (2001) An empirical study of the naive Bayes classifier. In: IJCAI 2001 workshop on
empirical methods in artificial intelligence, vol 3, no 22, pp 41–46
28. XGBoost read the docs. https://xgboost.readthedocs.io/en/latest/parameter.html. Last accessed
05 Aug 2020
29. Cui Y, Cai M, Stanley HE (2017) Comparative analysis and classification of cassette exons and
constitutive exons. Biomed Res Int. https://doi.org/10.1155/2017/7323508
Seam Carving Detection and Localization
Using Two-Stage Deep Neural Networks

Lakshmanan Nataraj, Chandrakanth Gudavalli,


Tajuddin Manhar Mohammed, Shivkumar Chandrasekaran,
and B. S. Manjunath

Abstract Seam carving is a method to resize an image in a content-aware fashion.


However, this method can also be used to carve out objects from images. In this paper,
we propose a two-step method to detect and localize seam carved images. First, we
build a detector to detect small patches in an image that has been seam carved. Next,
we compute a heatmap on an image based on the patch detector’s output. Using these
heatmaps, we build another detector to detect if a whole image is seam carved or
not. Our experimental results show that our approach is effective in detecting and
localizing seam carved images.

Keywords Image forensics · Seam carving detection · Fake images · Object removal

1 Introduction

With new cameras, mobile phones and digital tablets, the amount of digital images
has had an exponential increase. Social media platforms have also contributed to
their increased distribution. At the same time, software for manipulating these dig-
ital images has also significantly evolved. These software tools make it trivial for
people to manipulate these digital images. The objective of media forensics is to
identify these manipulations and detect these doctored images. Over the years, many
techniques have been proposed to identify image manipulations. These include digi-
tal artifacts based on camera forensics, resampling characteristics, compression, and
others. A common operation in image tampering is removing certain image regions
in a “content-aware” way. In this regard, seam carving is a popular technique for
“content-aware” image resizing [1, 39]. In seam carving, the “important content” in
an image is left unaffected when the image is resized and it is generally assumed that
the “important content” is not characterized by the low energy pixels. Since seam

L. Nataraj (B) · C. Gudavalli · T. Manhar Mohammed · S. Chandrasekaran · B. S. Manjunath


Mayachitra Inc., Santa Barbara, CA, USA
URL: https://mayachitra.com

© Springer Nature Singapore Pte Ltd. 2021 381


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_29
382 L. Nataraj et al.

Fig. 1 Illustration of seam carving detection and localization: a original image, b object marked
in red to be removed and object marked in green to be preserved, c seam carved image with object
removed, d seam carving detection heatmap using proposed approach (red pixels are areas where
seams were likely removed)

carving-based object removal involves non-traditional ways of removing objects, it
is a challenge to detect doctored images that have been seam carved. In this paper, we
propose a novel method to detect and localize seam carved images using two stages
of convolutional neural networks (CNNs): one for detection and one for localization.
First, we train a CNN to identify patches that have been seam carved. For every pixel
in an image, we then compute the detection score which results in a heatmap for the
whole image that can be used for localization. Finally, we train another CNN on the
heatmaps which gives a score at the image level to determine if an image has been
seam carved or not. Fig. 1 illustrates the proposed approach.
The rest of the paper is organized as follows. Sect. 2 presents the related work
in this area, and Sect. 3 introduces seam carving and seam insertion on images. The
methodology to detect seam carving is presented in Sect. 4, while the experiments
are detailed in Sect. 5. Finally, the conclusion is presented in Sect. 6.

2 Related Work

There have been several works proposed to detect digital image manipulations. These
include detection of splicing, morphing, resampling artifacts, copy-move, seam carv-
ing, computer-generated (CG) images, JPEG artifacts, inpainting, compression arti-
facts, to name a few. Many methods have been proposed to detect copy-move [11, 21],
resampling [6, 8, 15, 20, 29, 33, 34, 36], splicing [2, 18, 37], and inpainting-based
object removal [23, 44]. Other approaches exploit JPEG compression artifacts [7, 13,
24, 28] or artifacts arising from artificial intelligence (AI) generated images [3, 16,
30, 48]. In recent years, deep learning-based methods have shown better performance
in detecting image manipulations [4, 5, 8, 35].
Seam Carving Detection and Localization Using Two-Stage Deep Neural Networks 383

Several methods have been proposed over the past decade to detect seam carving-
based manipulations [9, 14, 17, 19, 22, 25–27, 38, 40–43, 46, 47]. These include
methods using steganalysis [38], hashing [14, 27], local binary pattern [46, 47], and
deep learning-based methods [10, 31, 32, 45]. In this paper, our approach to detect
seam carving-based manipulations is also based on deep learning.

3 Seam Carving and Seam Insertion

A seam is defined as an optimal 8-connected path of pixels on an image either
from top-to-bottom or left-to-right. In seam carving, the seams are removed and
the image dimension is reduced by a column or a row. In seam insertion, a seam
is first removed and two pixels are inserted at the position where the seam was
removed. Figs. 2 and 3 illustrate the processes of seam carving and seam insertion. An
energy function computed for all points along a seam is considered for the optimality
criterion for seam selection. This choice of seams helps in maintaining the image
quality during the resizing process. We consider the seam carved/inserted image
as a tampered image because the image dimensions and its content are altered.
Hence, the problem of detecting seam carving/insertion is important from an image
forensics perspective. Interpolation kernel-based methods for re-sampling detection
will fail when the resizing in the doctored image is done using seam carving/insertion.
Though it was initially proposed for automatic image resizing while maintaining a
good perceptual quality of the resized image, seam carving has also been used for

Fig. 2 Example of seam carving: a 4 × 5 matrix a is seam carved, and a 4 × 4 matrix b results
from the removal of a single seam. At locations left of the seam carving path, ai,j = bi,j; at
locations right of it, ai,j+1 = bi,j
Fig. 3 Example of seam insertion: (i) a and (ii) b are the 4 × 5 and the 4 × 6 image matrices before
and after seam insertion, respectively. Pixels are changed on either side of the seam insertion path;
for the first row: b1,1 = a1,1, b1,2 = a1,2, b1,3 = round((a1,2 + a1,3)/2), b1,4 = round((a1,3 + a1,4)/2),
b1,5 = a1,4, b1,6 = a1,5

removal of certain image regions. It is to be noted that seam carving can discard
or retain certain regions, depending on the weights we assign to them.
For example, for an object removal problem, we may need to ensure that certain
image regions are left unaffected as distorting them may cause significant perceptual
distortion. We first explain how seam carving is used for object removal and then
discuss the interesting problems involved.
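As a concrete illustration of the seam removal step, the following is a minimal sketch (ours, not the authors' implementation) of removing one minimal-energy vertical seam from a grayscale image, using a simple gradient-magnitude energy and dynamic programming:

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimal-energy 8-connected vertical seam from a
    2-D grayscale image (H x W -> H x (W - 1))."""
    f = img.astype(float)
    # Simple energy: sum of absolute gradients in both directions.
    energy = np.abs(np.gradient(f, axis=0)) + np.abs(np.gradient(f, axis=1))
    h, w = energy.shape
    cost = energy.copy()
    for i in range(1, h):  # cumulative minimum cost, row by row
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack the optimal seam from the bottom row upwards.
    seam = [int(np.argmin(cost[-1]))]
    for i in range(h - 2, -1, -1):
        j = seam[-1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam.append(lo + int(np.argmin(cost[i, lo:hi])))
    seam.reverse()  # seam[i] is the column removed in row i
    return np.array([np.delete(img[i], seam[i]) for i in range(h)])
```

For object removal, the energy of the marked region would be set to a low (or high, for protection) value before the dynamic program runs, as discussed in Sect. 5.6.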

4 Detection of Seam Carving

In order to detect and localize seam carving in images, we propose a two-stage detec-
tion approach: one for detection of seam carved patches and the other for localizing
seam carving in an image by generating a heatmap. First, we train a deep neural
network to identify whether patches in an image have been seam carved or not. We
then divide an image into patches, and for every patch, we compute the detection
score which results in a heatmap for the whole image. This heatmap can be used for
localization of seam carving. Finally, we train another deep neural network with the
heatmaps as input which gives a score at the image level to determine whether an
image has been seam carved or not. The entire block schematic is shown in Fig. 4.
Fig. 4 Two-stage approach: a Stage 1, b Stage 2

5 Experiments

5.1 Experimental Setup

We first extract 64 × 128 patches from images belonging to RAISE dataset [12].
From these patches, we form two classes of image patches: first class where the
patches are further cropped to 64 × 64, and the second class where the patches are
seam carved horizontally by 50% to obtain 64 × 64 seam carved patches. In this
way, we obtained 16,000 patches from the RAISE dataset (8000 in each class) and
40,000 patches from the Dresden dataset (20,000 in each class). These were further
randomly divided into 80% training, 10% testing, and 10% validation splits (Fig. 5).

Fig. 5 Screenshot of: a non-seam carved patches, b seam carved patches

5.2 Learning

The patches are trained using a multilayer deep convolutional neural network that
consists of: a convolution layer with 32 3 × 3 filters followed by a ReLU layer; a
convolution layer with 32 5 × 5 filters followed by a max pooling layer; a convolution
layer with 64 3 × 3 filters followed by a ReLU layer; a convolution layer with 64
5 × 5 filters followed by a max pooling layer; a convolution layer with 128 3 × 3
filters followed by a ReLU layer; a convolution layer with 128 5 × 5 filters followed
by a max pooling layer; and finally two 256-unit dense layers followed by a sigmoid
layer. We train this model until high training and validation accuracies are obtained.
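The architecture described above can be sketched in Keras as follows. This is a hedged reconstruction: the text does not specify strides, padding, input channels, optimizer, or loss, so the defaults below (valid padding, RGB input, Adam, binary cross-entropy) are our assumptions.

```python
from tensorflow.keras import layers, models

def build_patch_detector(input_shape=(64, 64, 3)):
    """Patch-level seam-carving detector sketched from the text."""
    m = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(32, 5),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Conv2D(64, 5),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.Conv2D(128, 5),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # seam carved vs. not
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
    return m
```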

5.3 Detection Heatmaps

Using the trained model on the patches, the probability of a pixel being seam carved or
not is computed on overlapping patches in an image. Figure 6 shows the heatmaps on
non-seam carved and seam carved images. As we can see, the heatmaps on the seam
carved images have more red regions than the images on non-seam carved images.
Fig. 6 Detection heatmaps on images that have a not been seam carved and b seam carved

Even for an image containing blue sky, the heatmaps can be clearly distinguished for
the seam carved and non-seam carved images. This motivated us to train the
heatmaps with another CNN which takes the heatmaps as input (Fig. 4b) and outputs
the probability of whether an image has been seam carved or not. As we can see from
Fig. 7, we obtained high accuracy when trained on the heatmaps.
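The per-pixel heatmap computation can be sketched as follows, assuming a `patch_model` object exposing a `predict` method that returns the seam-carved probability of a single patch; the stride and the per-pixel averaging over overlapping patches are our assumptions, since the text does not specify them.

```python
import numpy as np

def seam_carving_heatmap(img, patch_model, patch=64, stride=16):
    """Average patch-level seam-carving probabilities over every
    overlapping patch that covers each pixel."""
    h, w = img.shape[:2]
    scores = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = patch_model.predict(img[y:y + patch, x:x + patch])
            scores[y:y + patch, x:x + patch] += float(p)
            counts[y:y + patch, x:x + patch] += 1
    # Avoid division by zero for pixels no patch covered.
    return scores / np.maximum(counts, 1)
```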

5.4 Robustness to Percentage of Seams Removed

In this experiment, we varied the percentage of seams removed in the testing set and
evaluated the model which was trained with 50% seams removed, in order to check
the robustness of the model for different amounts of seams removed. The area under
the curve (AUC) is the evaluation metric for varying percentage of seams removed.
The results are given in Table 1. We observe that the AUC is very high for percentages
around 50% and decreases for lower percentages of seams removed. This shows that
the model is generalizable for most percentages of seams removed. In the future, we
will train another model for lower percentages.
Fig. 7 ROC curve of seam carving detection on the model trained on the heatmaps

Table 1 Robustness to percentage of seams removed

Percentage   Area under the curve (AUC)
1            0.6464
2            0.7838
5            0.9274
8            0.9540
10           0.9724
20           0.9866
30           0.9919
40           0.9937
50           0.9916
60           0.9502
70           0.9150
80           0.8670
90           0.8223

5.5 Robustness to JPEG Compression

In this experiment, we evaluated the robustness of our proposed approach against
JPEG compression. We varied the JPEG quality factors (QFs) of test images from 100
to 50. The model was trained on seam carved and non-seam carved patches and images,
which were also JPEG compressed at quality factors between 70 and 100. The area
Table 2 Robustness to JPEG compression

JPEG quality factor (QF)   Area under the curve (AUC)
100                        0.9376
90                         0.9160
80                         0.8578
70                         0.8027
60                         0.7658
50                         0.7332

under the curve (AUC) is chosen as the evaluation metric. The results are given in
Table 2. We observe that the AUC is high when the QF is high (compression is low)
and the AUC reduces as the QF decreases (compression increases). However, even
at a QF of 50, the AUC is still reasonably high.
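The JPEG augmentation used during training (random quality factors between 70 and 100) can be sketched with Pillow; the helper names are ours, and the in-memory round-trip is one common way to obtain the compressed pixels.

```python
import io
import random
from PIL import Image

def jpeg_compress(img, qf):
    """Round-trip an image through in-memory JPEG at quality `qf`."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=qf)
    buf.seek(0)
    return Image.open(buf)

def augment(img):
    """Compress at a random quality factor in [70, 100], as in training."""
    return jpeg_compress(img, random.randint(70, 100))
```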

5.6 Explainability on Object Removed Images

Here, we evaluate our approach in a practical scenario where objects are removed from images using seam carving. We choose an object or a region in an image that has to be removed. The weights of this region are set to a low value so that the seam carving algorithm is forced to pass through the region, thus removing the object from the image. When our approach was evaluated on these images, we observed that our model is able to localize the removed region as well as the paths taken by the seam carving algorithm, as shown in Fig. 8.
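The forced-seam trick described above (lowering the weights of a marked region so that minimum-energy seams are routed through it) can be illustrated with a toy vertical-seam search. This is a minimal plain-Python sketch, not the authors' implementation, and the energies and mask in the usage example are illustrative:

```python
def min_vertical_seam(energy):
    """Dynamic programming: returns seam[y], the column removed in row y."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w - 1, x + 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # backtrack from the minimum-cost entry in the bottom row
    seam = [min(range(w), key=lambda x: cost[h - 1][x])]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(w - 1, x + 1)
        seam.append(min(range(lo, hi + 1), key=lambda c: cost[y][c]))
    seam.reverse()
    return seam

def force_object_removal(energy, mask, bias=-1e6):
    """Assign a very low weight to the masked region so that every
    minimum-energy seam is forced to pass through the object."""
    h, w = len(energy), len(energy[0])
    biased = [[bias if mask[y][x] else energy[y][x] for x in range(w)]
              for y in range(h)]
    return min_vertical_seam(biased)
```

On a uniform-energy image with the mask covering column 2, the returned seam runs straight down that column, mirroring the behavior shown in Fig. 8.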
The detection heatmaps also exhibit explainability as shown in Fig. 9 where an
object is marked for removal in red. While this object is removed successfully, a
person’s leg in the foreground also gets removed (top row). To prevent this, another
area is marked in green (bottom row) by giving high weights so that the person’s legs
are not removed. As we can see from the heatmaps computed on the seam carved
images (top and bottom row), the path showing the possible seams also changes near
the person’s legs, thus exhibiting explainability.

5.7 Extension to Seam Insertion Detection

Finally, we also extend the seam carving detection approach to detecting seam insertion. We first extract patches from images belonging to the RAISE dataset [12]. From these patches, we form two classes of image patches: a first class in which the patches are cropped to 64 × 64, and a second class in which patches are seam inserted from 64 × 32 to 64 × 64 dimensions. In this way, we

Fig. 8 Detection heatmaps on images where objects have been removed using seam carving: a
original image, b heatmap computed on original image, c object marked for removal in red, d
image with object removed using seam carving, e heatmap computed on object removed image
showing the possible seam paths

obtained 16,000 patches from the RAISE dataset (8000 in each class). These were
further randomly divided into 80% training, 10% testing and 10% validation. The
patches are used to train a multilayer convolutional neural network as explained in Sect. 5.2. We train this model until high training accuracy and validation accuracy
are obtained. Using the trained model on the patches, the probability of a pixel being
seam inserted or not is computed on overlapping patches in an image to produce a
heatmap. Another model is trained on the heatmaps to determine if an image has
seam insertions or not. As we can see from Fig. 10, we obtained high accuracy when
trained on the heatmaps.
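The overlapping-patch heatmap computation described above might be sketched as follows; here `classify_patch` is a stand-in stub for the trained CNN, and the patch size and stride are illustrative, not the paper's settings:

```python
def patch_heatmap(image, classify_patch, patch=4, stride=2):
    """Slide overlapping patches over the image, score each with the
    patch classifier, and average the probabilities per pixel."""
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            window = [row[x:x + patch] for row in image[y:y + patch]]
            p = classify_patch(window)  # probability the patch is seam carved
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    acc[yy][xx] += p
                    cnt[yy][xx] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else 0.0 for x in range(w)]
            for y in range(h)]
```

With a stub classifier that returns the mean intensity of its window, bright regions of the image produce high heatmap values, which is the aggregation pattern the second-stage model consumes.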

6 Conclusion and Future Work

In this paper, we presented an approach to detect seam carved images. Using two
stages of CNNs, we detect and localize areas in an image that have been seam carved.
In the future, we will focus on making our detections more robust, combining seam
carving and insertions, and also extend to other object removal methods such as
inpainting.

Fig. 9 Explainability in the heatmaps: a original image, b heatmap computed on original image,
c object marked for removal in red and area preserved in green (bottom row), d image with object
removed using seam carving (in the top row—person’s leg is removed while preserved in the
bottom row), e heatmap computed on object removed image showing the possible seam paths with
explainability. The seam paths change on the top row and bottom row near the person’s leg

Fig. 10 ROC curves at image level to detect seam inserted images



Acknowledgements This research was developed with funding from the Defense Advanced
Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of
the author and should not be interpreted as representing the official views or policies of the Depart-
ment of Defense or the US Government. The paper is approved for public release and distribution
unlimited.

References

1. Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graph
26(3):10
2. Bappy JH, Simons C, Nataraj L, Manjunath B, Roy-Chowdhury AK (2019) Hybrid lstm and
encoder-decoder architecture for detection of image forgeries. IEEE Trans Image Process
28(7):3286–3300
3. Barni M, Kallas K, Nowroozi E, Tondi B (2020) CNN detection of GAN-generated face images
based on cross-band co-occurrences analysis. arXiv:2007.12909
4. Bayar B, Stamm MC. Design principles of convolutional neural networks for multimedia foren-
sics. In: The 2017 IS&T international symposium on electronic imaging: media watermarking,
security, and forensics. IS&T Electronic Imaging
5. Bayar B, Stamm MC (2016) A deep learning approach to universal image manipulation detec-
tion using a new convolutional layer. In: Proceedings of the 4th ACM workshop on information
hiding and multimedia security, pp 5–10
6. Bayar B, Stamm MC (2017) On the robustness of constrained convolutional neural networks
to jpeg post-compression for image resampling detection. In: The 42nd IEEE international
conference on acoustics, speech and signal processing
7. Bianchi T, De Rosa A, Piva A (2011) Improved DCT coefficient analysis for forgery localization
in JPEG images. In: 2011 IEEE international conference on Acoustics, speech and signal
processing (ICASSP). IEEE, pp 2444–2447
8. Bunk J, Bappy JH, Mohammed TM, Nataraj L, Flenner A, Manjunath B, Chandrasekaran S,
Roy-Chowdhury AK, Peterson L (2017) Detection and localization of image forgeries using
resampling features and deep learning. In: 2017 IEEE conference on computer vision and
pattern recognition workshops (CVPRW), pp 1881–1889
9. Chang WL, Shih TK, Hsu HH (2013) Detection of seam carving in JPEG images. In: 2013
International joint conference on awareness science and technology & Ubi-Media computing
(iCAST 2013 & UMEDIA 2013). IEEE, pp 632–638
10. Cieslak LFS, Da Costa KA, PauloPapa J (2018) Seam carving detection using convolutional
neural networks. In: 2018 IEEE 12th international symposium on applied computational intel-
ligence and informatics (SACI). IEEE, pp 000195–000200
11. Cozzolino D, Poggi G, Verdoliva L (2015) Efficient dense-field copy-move forgery detection.
IEEE Trans Inf Forens Secur 10(11):2284–2297
12. Dang-Nguyen DT, Pasquini C, Conotter V, Boato G (2015) Raise: a raw images dataset for
digital image forensics. In: Proceedings of the 6th ACM multimedia systems conference. ACM,
pp 219–224
13. Farid H (2009) Exposing digital forgeries from JPEG ghosts. IEEE Trans Inf Forens Secur
4(1):154–160
14. Fei W, Gaobo Y, Leida L, Ming X, Dengyong Z (2015) Detection of seam carving-based video
retargeting using forensics hash. Secur Commun Netw 8(12):2102–2113
15. Feng X, Cox IJ, Doerr G (2012) Normalized energy density-based forensic detection of resam-
pled images. IEEE Trans Multimedia 14(3):536–545
16. Goebel M, Nataraj L, Nanjundaswamy T, Mohammed TM, Chandrasekaran S, Manjunath B
(2020) Detection, attribution and localization of gan generated images. arXiv:2007.10466

17. Gong Q, Shan Q, Ke Y, Guo J (2018) Detecting the location of seam and recovering image for
seam inserted image. J Comput Methods Sci Eng 18(2):499–509
18. Guillemot C, Le Meur O (2014) Image inpainting: overview and recent advances. Signal Process
Mag 31(1):127–144
19. Han R, Ke Y, Du L, Qin F, Guo J (2018) Exploring the location of object deleted by seam-
carving. Expert Syst Appl 95:162–171
20. Kirchner M (2008) On the detectability of local resampling in digital images. In: Security,
forensics, steganography, and watermarking of multimedia contents X, vol 6819, issue, 1, p
68190F. http://link.aip.org/link/?PSI/6819/68190F/1
21. Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection
scheme. IEEE Trans Inf Forens Secur 10(3):507–518
22. Li Y, Xia M, Liu X, Yang G (2020) Identification of various image retargeting techniques using
hybrid features. J Inf Secur Appl 51:102459
23. Liang Z, Yang G, Ding X, Li L (2015) An efficient forgery detection algorithm for object
removal by exemplar-based image inpainting. J Visual Commun Image Represent 30:75–85
24. Lin Z, He J, Tang X, Tang CK (2009) Fast, automatic and fine-grained tampered JPEG image
detection via dct coefficient analysis. Pattern Recogn 42(11):2492–2501
25. Liu Q, Chen Z (2014) Improved approaches with calibrated neighboring joint density to ste-
ganalysis and seam-carved forgery detection in JPEG images. ACM Trans Intell Syst Technol
(TIST) 5(4):1–30
26. Liu Q, Cooper PA, Zhou B (2013) An improved approach to detecting content-aware scaling-
based tampering in JPEG images. In: 2013 IEEE China summit and international conference
on signal and information processing. IEEE, pp 432–436
27. Lu W, Wu M (2011) Seam carving estimation using forensic hash. In: Proceedings of the
thirteenth ACM multimedia workshop on multimedia and security, pp. 9–14
28. Luo W, Huang J, Qiu G (2010) JPEG error analysis and its applications to digital image
forensics. IEEE Trans Inf Forens Security 5(3):480–491
29. Mahdian B, Saic S (2008) Blind authentication using periodic properties of interpolation. Inf
Forens IEEE Trans Secur 3(3):529–538
30. Marra F, Gragnaniello D, Cozzolino D, Verdoliva L (2018) Detection of GAN-generated fake
images over social networks. In: 2018 IEEE conference on multimedia information processing
and retrieval (MIPR). IEEE, pp. 384–389
31. Nam SH, Ahn W, Mun SM, Park J, Kim D, Yu IJ, Lee HK (2019) Content-aware image
resizing detection using deep neural network. In: 2019 IEEE international conference on image
processing (ICIP). IEEE, pp 106–110
32. Nam SH, Ahn W, Yu IJ, Kwon MJ, Son M, Lee HK (2020) Deep convolutional neural network
for identifying seam-carving forgery. arXiv:2007.02393
33. Nataraj L, Sarkar A, Manjunath BS (2010) Improving re-sampling detection by adding noise.
In: SPIE, media forensics and security, vol 7541. http://vision.ece.ucsb.edu/publications/
lakshman_spie_2010.pdf
34. Popescu AC, Farid H (2005) Exposing digital forgeries by detecting traces of resampling. IEEE
Trans Signal Process 53(2):758–767
35. Rao Y, Ni J (2016) A deep learning approach to detection of splicing and copy-move forgeries in
images. In: 2016 IEEE international workshop on information forensics and security (WIFS).
IEEE, pp 1–6
36. Ryu SJ, Lee HK (2014) Estimation of linear transformation by analyzing the periodicity of
interpolation. Pattern Recogn Lett 36:89–99
37. Salloum R, Ren Y, Kuo CCJ (2018) Image splicing localization using a multi-task fully convolutional network (MFCN). J Visual Commun Image Represent 51:201–209
38. Sarkar A, Nataraj L, Manjunath BS (2009) Detection of seam carving and localization of seam
insertions in digital images. In: Proceedings of the 11th ACM workshop on multimedia and
security. ACM, pp 107–116
39. Shamir A, Avidan S (2009) Seam carving for media retargeting. Commun ACM 52(1):77–85

40. Sheng G, Gao T (2016) Detection of seam-carving image based on Benford's law for forensic applications. Int J Digital Crime Forens (IJDCF) 8(1):51–61
41. Sheng G, Li T, Su Q, Chen B, Tang Y (2017) Detection of content-aware image resizing based on Benford's law. Soft Comput 21(19):5693–5701
42. Wattanachote K, Shih TK, Chang WL, Chang HH (2015) Tamper detection of JPEG image
due to seam modifications. IEEE Trans Inf Forens Secur 10(12):2477–2491
43. Wei JD, Lin YJ, Wu YJ, Kang LW (2013) A patch analysis approach for seam-carved image detection. In: ACM SIGGRAPH 2013 posters, pp 1–1
44. Wu Q, Sun SJ, Zhu W, Li GH, Tu D (2008) Detection of digital doctoring in exemplar-based
inpainted images. In: 2008 International conference on machine learning and cybernetics, vol 3.
IEEE, pp 1222–1226
45. Ye J, Shi Y, Xu G, Shi YQ (2018) A convolutional neural network based seam carving detection
scheme for uncompressed digital images. In: International workshop on digital watermarking.
Springer, pp 3–13
46. Zhang D, Li Q, Yang G, Li L, Sun X (2017) Detection of image seam carving by using Weber local descriptor and local binary patterns. J Inf Secur Appl 36:135–144
47. Zhang D, Yang G, Li F, Wang J, Sangaiah AK (2020) Detecting seam carved images using
uniform local binary patterns. Multimedia Tools Appl 79(13):8415–8430
48. Zhang X, Karaman S, Chang SF (2019) Detecting and simulating artifacts in GAN fake images.
In: 2019 IEEE international workshop on information forensics and security (WIFS). IEEE,
pp 1–6
A Machine Learning-Based Approach
to Password Authentication Using
Keystroke Biometrics

Adesh Thakare, Shreyas Gondane, Nilesh Prasad, and Siddhant Chigale

Abstract Keystroke authentication systems are becoming an increasingly common method of data/network access control. Keystroke dynamics refers to the automatic process of recognizing or verifying an individual's identity based on the manner and rhythm of typing on a keyboard. It allows individuals to be authenticated through the way they type their password or free text. In this paper, we analyze machine learning algorithms on a keystroke dynamics data set with features such as Hold time, Keyup-Keydown time and Keydown-Keydown time. Our methodology uses support vector machines (polynomial and radial basis kernels), the random forest algorithm and artificial neural networks to recognize users based on their keystroke patterns. The analysis shows strong results in user identification based on keystroke patterns, with artificial neural networks performing best among the three implemented algorithms at 91.8% accuracy.

Keywords Keystroke · Biometrics · Machine learning · Classification · Authentication

1 Introduction

The authentication method based on biometrics is typically divided into two groups. The first type uses various physiological features such as fingerprints, face and voice recognition. This type of system faces issues such as attacks that can compromise its biometric security [1], the expense of additional system requirements [2] and the acceptance of misinformation by users, which keeps users constantly aware of their
A. Thakare (B) · N. Prasad · S. Chigale


Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India
S. Gondane
Department of Electronics & Telecommunication, Vishwakarma Institute of Technology,
Pune, India

© Springer Nature Singapore Pte Ltd. 2021 395


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_30

changing properties [3]. Such drawbacks motivate a switch to authentication schemes built on behavioral features that are not the same for everyone, such as typing pace, swipe gestures, mouse dynamics, etc. Authentication based on the way a password is typed is known as the keystroke dynamics technique. Since no additional hardware devices are needed, it can be implemented by simply installing software on the system. Keystroke dynamics features are obtained from the timing information of KeyPress/Release/Hold events. Keystroke dynamics is the process by which a user's typing is analyzed. It examines keyboard activity to recognize unauthenticated access, focusing on the way different people type. It is mainly used in cybersecurity and considers the delay between keystrokes, tapping periods, positioning of fingers, key pressure, etc. Authentication is done according to a person's typing style. Keystroke testing can usually be divided into static and dynamic techniques. In static testing, the analysis happens only at specific instances, such as user login. In dynamic authentication, the keyboard behavior of the user is continuously collected and measured. To handle various types of keyboards and remote access, an improved dynamic keystroke authentication scheme is required. Even if the user repeatedly uses the same password, the rhythm and speed of typing can differ [4]. Moreover, keystroke dynamics performance suffers from various factors such as user fatigue, marked variation in typing styles, injury, typing skill, keyboard hardware variation, etc.
In this paper, we review how machine learning methods such as random forest classification and deep learning techniques such as artificial neural networks can be used to identify a user based on keystroke patterns. We review how the data were collected and how results are obtained using the different techniques. The paper is organized as follows: Sect. 2 focuses on the research done so far and the many approaches to keystroke dynamics authentication systems, with details on the feature characteristics and the parameters involved in evaluation. Section 3 presents the models involved in the comparative analysis, based on accuracy, for implementing a keystroke authentication system. Section 4 describes the specifications of the data set used and the features taken into consideration. Section 5 presents the evaluation of the algorithms and the accuracy results for the keystroke application.

2 Keystroke Dynamics

2.1 Types of Authentication Systems

The keystroke dynamics authentication method comes in two variants, called the static and the dynamic authentication model. The dynamic authentication mechanism can further be classified into periodic and continuous authentication systems. The static model operates with

Fig. 1 Authentication systems

one defined phrase such as a username or password. The degree of similarity between the stored typing pattern of the static password and the actual typing rhythm of the same password is measured, and the acceptance or rejection of the user is determined. The model is strictly tied to the timing of training attempts on the same phrase. Combining username and password can help to improve the static authentication process. M. S. Obaidat et al. provide a detailed evaluation of static keystroke identification on personal data [5]. This method can reject a legitimate user because of irregular typing speed, requiring several login attempts to get authenticated. A periodic dynamic authentication system is expected to solve the problem of multiple login attempts. The methodology also tends to lift the constraint of authenticating only unique or predefined text: varied inputs can be subject to periodic authentication. This system does not rely on fixed text entry and is capable of authenticating any input (Fig. 1).
Continuous keystroke analysis is an improvement over standard keystroke analysis. It records keystroke characteristics over the entire length of the authentication session. An imposter may thus be observed at an earlier stage than in a periodically monitored implementation. The additional analysis is the main drawback of this strategy: it makes the solution more complex and affects device efficiency. Continuous keystroke analysis is helpful when the user keeps using the keyboard after signing in for multiple tasks such as browsing Web sites, typing text, chatting, etc., while periodic analysis is preferred when the user's tracking time is quite short, such as the time span in which the user enters a username and password.

2.2 Timing Features

User keystroke dynamics are initially gathered to create a security profile for each individual. The keystroke dynamics system assesses user typing habits and matches the pattern against the record of the associated profile.

Fig. 2 Timing features

To obtain these features, the particular typing rhythm of a user is extracted. In this section, we summarize the different features that researchers have investigated and analyzed [6].
Keystroke dynamics studies a series of key events and the time each one takes, whether the time of a KeyPress or of a KeyRelease. Some of the most frequently collected features are centered on KeyPress and KeyRelease events. In Fig. 2, the KeyPress time is displayed as P, and the KeyRelease time is displayed as R. These attributes are listed below.
• Dwell Time: the delay between the KeyPress (P) and KeyRelease (R) events, as seen in D1, D2 and D3 in Fig. 2.
• Flight Time: the time between a KeyRelease (R) event and the next KeyPress (P) event, as represented by F1 and F2 in Fig. 2.
• Digraph: denoted by Di1 in Fig. 2, the latency measured between the middle events of two consecutive keys.
• Trigraph: denoted by Tri1 in Fig. 2, the delay across every three keys in succession.
• n-graphs: the latency across n key events.
Various researchers have used different sets of features for authenticating users with keystroke dynamics; the roles of the different features are illustrated above [7]. A digraph usually means a two-character sequence, comprising letters, punctuation, spaces and numerals. Digraphs are used for multiple purposes.

One line of work assumes that each key, when pressed, gives a distinct, user-dependent acoustic signal. This allowed an alphabet to be learned by clustering test keystroke sounds, and the digraph latencies were then used to generate the ranking within pairs of the virtual letters. Studies show that little research has been performed using unusual features such as use of the Shift key, the Caps Lock key, number keys, or the left versus right Shift key for user authentication.

2.3 Evaluation Parameters

The keystroke dynamics methodologies are analyzed using the false acceptance rate (FAR) and the false rejection rate (FRR). FAR is defined as the percentage of instances in which a biometric security system falsely accepts an access attempt by an unauthorized user. FRR is defined as the percentage of instances in which the biometric protection system erroneously rejects an approved access attempt by a user.
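Given genuine and impostor match scores and an acceptance threshold, the two rates can be computed directly. A minimal sketch (the score values in the usage example are made up for illustration):

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor attempts accepted (score >= threshold).
    FRR: fraction of genuine attempts rejected (score < threshold)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr
```

Sweeping the threshold trades FAR against FRR; the point where the two curves cross is the equal error rate (EER) often reported for keystroke systems.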

3 Models Used

A comparative analysis of three algorithms is performed here to evaluate the authentication system based on the timing features and evaluation parameters discussed above. We take into consideration support vector machines (SVM), the random forest algorithm (RF) and artificial neural networks (ANN) to determine computational efficiency and accuracy on the data set provided.

3.1 Support Vector Machines (SVM)

By constructing an n-dimensional hyperplane, SVM performs classification and maximizes the margin to achieve the best classification result. SVMs are based on hyperplane, i.e., linearly separating, classifiers. Suppose we have n training data points (x1, y1), (x2, y2), . . . , (xn, yn), where xi ∈ Rm and yi ∈ {−1, +1}. Consider a hyperplane (w, b), where w is the weight and b is the bias.
Classification can be given by

D(x) = sign(w · x + b) = sign( Σ_{i=1..n} ai yi (xi · x) + b )    (1)

Fig. 3 Decision boundary with hyperplanes

where w represents the hyperplane, and weight vector direction gives us the class
expected. The data points that are similar to the hyperplane, which are called the
support vectors, have a minimum distance to the decision boundary as shown in
Fig. 3.
SVM has the limitation that it incurs high processing costs and can give unreliable results when the data set is characterized by a wide variety of features and the training data are limited. This can be addressed by inserting a kernel function in place of the inner product of two transformed data vectors. A kernel function corresponds to a dot product of two feature vectors in some expanded feature space. Two extensively used kernel functions in such processes are:
• Polynomial kernel function:

K(xi, xj) = (xi · xj + 1)^p    (2)

• RBF kernel function:

K(xi, xj) = exp[ −γ ||xi − xj||² ]    (3)
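The two kernels of Eqs. (2) and (3) translate directly into code. A plain-Python sketch, where the hyperparameter values p and gamma are illustrative defaults rather than the paper's settings:

```python
import math

def poly_kernel(xi, xj, p=2):
    """Polynomial kernel, Eq. (2): K(xi, xj) = (xi . xj + 1)^p."""
    return (sum(a * b for a, b in zip(xi, xj)) + 1) ** p

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF kernel, Eq. (3): K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)
```

Note that the RBF kernel of a vector with itself is always 1.0, while the polynomial kernel grows with the inner product, e.g. `poly_kernel([1, 0], [1, 0], p=2)` gives 4.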

3.2 Random Forest Algorithm (RF)

Random forest is an ensemble learning method based on decision trees. RF consists of a collection of tree classifiers, where every tree is composed of nodes and edges. The resulting ensemble classifies new data points through a majority vote over the predictions of the individual classification models, as shown in Fig. 4.

Fig. 4 Random forest classifier

This approach combines bagging (bootstrap aggregation) with a set of random splits. Each tree is grown from a separate bootstrap sample of the data set, and each tree classifies the data; the final outcome is a majority vote among the trees. The random forest algorithm is defined by the following steps:
• Construct k bootstrap samples of the data.
• For each of the bootstrap samples, grow an unpruned tree.
• Randomly sample n-try of the predictors at each node, and pick the best split among those factors.
• Predict new data through a combination of the k tree predictions.
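The bagging-and-voting steps above can be sketched with one-split decision stumps standing in for full trees; this is a deliberate simplification on synthetic data, not the configuration used in the experiments:

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split 'tree': exhaustively pick the feature, threshold
    and sign with the fewest training errors on this bootstrap sample."""
    best = None
    for f in range(len(sample[0][0])):
        for x, _ in sample:
            for sign in (1, -1):
                errs = sum((1 if sign * (xi[f] - x[f]) >= 0 else 0) != yi
                           for xi, yi in sample)
                if best is None or errs < best[0]:
                    best = (errs, f, x[f], sign)
    _, f, t, sign = best
    return lambda v: 1 if sign * (v[f] - t) >= 0 else 0

def random_forest(data, k=11, seed=0):
    """Bagging: k bootstrap samples, one stump per sample, majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data]) for _ in range(k)]
    return lambda v: Counter(s(v) for s in stumps).most_common(1)[0][0]
```

On toy one-dimensional data separable around 0.5, the majority vote of the stumps classifies points at the extremes correctly, illustrating why the ensemble is more stable than any single bootstrap-trained model.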

3.3 Artificial Neural Network (ANN)

An artificial neural network is a form of information processing inspired by the way biological systems of neurons operate. A neural network typically includes a large number of nodes running in parallel and organized in tiers. Analogous to the optic nerves in human visual processing, the first tier receives the raw input information. Each successive tier receives the output from the tier before it rather than the raw input, in the same way that neurons further from the optic nerve receive signals from those closer to it. The last tier of the network produces the output of the system, as shown in Fig. 5. Artificial neural networks (ANN) performed best on our data set. We set the number of epochs to 400. It is a six-layer neural network with 56 nodes on the output layer.

Fig. 5 Artificial neural network architecture

Training: There are several methods for training a neural network with backpropagation. The most common approaches are the gradient descent technique and the conjugate gradient process. The backpropagation algorithm uses first-order gradient descent as its learning procedure. In gradient descent, the parameters are updated in the direction of the negative gradient of the error surface. For training a neural network with gradient descent, the choice of parameters such as the learning rate and the momentum rate is important. Classic backpropagation is very sensitive to its parameters: if the learning rate is too low, learning is slow, and if it is too high, the algorithm does not stabilize, so selecting the learning rate is critical. In addition, the initial set of neural weights affects convergence.
The conjugate gradient method is a second-order minimization process. Other second-order minimization techniques, such as Newton and quasi-Newton methods, can also be used to train neural networks. Among them, the conjugate gradient method is the simplest and fastest, and for this reason it is the most widely used second-order approach to neural network training. Conjugate gradient descent does not proceed straight down the gradient; instead, it proceeds in a direction conjugate to that of the previous step. In other words, the gradient of the current step remains perpendicular to the directions of all previous steps. Each step from the same point is at least as good as steepest descent. The steps are non-interfering, so the minimization carried out in one step will not be partly undone by the next. The synaptic weight Wk is updated
as follows:
Wk+1 = Wk + ak · dk    (4)

where ak, the learning rate, is chosen such that

SSE(Wk + ak · dk) ≤ SSE(Wk)    (5)

and the search direction dk is given by

dk = −gk                     if k = 0
dk = −gk + βk · dk−1         otherwise    (6)

where gk is the gradient and βk is the gradient scaling factor.
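The update rule of Eqs. (4)-(6) can be demonstrated on a toy error surface. In this sketch, SSE is a simple quadratic, the step size is found by backtracking until Eq. (5) holds strictly, and βk uses the Fletcher-Reeves formula — an assumption, since the paper does not specify which βk variant is used:

```python
def sse(w):
    # toy error surface with its minimum at w = (1, -2)
    return (w[0] - 1) ** 2 + (w[1] + 2) ** 2

def grad(w):
    return [2 * (w[0] - 1), 2 * (w[1] + 2)]

w = [0.0, 0.0]
g = grad(w)
d = [-gi for gi in g]                        # first direction: steepest descent
for k in range(50):
    if sum(gi * gi for gi in g) < 1e-18:     # gradient vanished: converged
        break
    a = 1.0                                  # backtrack until Eq. (5) holds
    while a > 1e-10 and sse([wi + a * di for wi, di in zip(w, d)]) >= sse(w):
        a *= 0.5
    w = [wi + a * di for wi, di in zip(w, d)]               # Eq. (4)
    g_new = grad(w)
    beta = sum(gi * gi for gi in g_new) / sum(gi * gi for gi in g)  # Fletcher-Reeves
    d = [-gn + beta * di for gn, di in zip(g_new, d)]       # Eq. (6)
    g = g_new
```

On this quadratic the iterate converges to (1, −2), the minimum of the toy SSE.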
Activation functions: Different activation functions used in ANN were Softmax,
ReLU and Sigmoid.
Softmax: The sigmoid function is easy to apply, and ReLUs do not saturate during training; however, neither helps much when you need a probability distribution over classes for a classification problem. Like a sigmoid function, the softmax function squashes the output of each unit to between 0 and 1. But it also divides each output such that the cumulative sum of the outputs is equal to 1. The output of the softmax function is therefore analogous to a categorical probability distribution, telling you how likely each of the classes is to be correct (Fig. 6).
ReLU: Instead of sigmoid, most recent deep learning networks use rectified linear units (ReLUs) for the hidden layers. A rectified linear unit outputs 0 if the input is less than 0, and the raw input otherwise; that is, if the input is greater than 0, the output is equal to the input. ReLUs' machinery is more like that of a real biological neuron, and ReLU is the simplest non-linear activation function available. When the input is positive, the derivative is just 1, so there is no squashing effect on back-propagated errors as there is with the sigmoid function.
Fig. 6 a Standard neural net, b after dropout



Research has shown that ReLUs result in much faster training for large networks.
Most frameworks like TensorFlow and TF Learn make it simple to use ReLUs on
the hidden layers, so you won’t need to implement them yourself.
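The two activations can be written out directly; a minimal plain-Python sketch, where the max subtraction in softmax is the standard numerical-stability trick:

```python
import math

def relu(v):
    """ReLU: 0 for negative inputs, the raw input otherwise."""
    return [max(0.0, x) for x in v]

def softmax(z):
    """Squash outputs to (0, 1) so that they sum to 1."""
    m = max(z)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]
```

For example, `softmax([1.0, 2.0, 3.0])` sums to 1 with the largest logit receiving the largest probability, which is what makes it suitable for the 56-way output layer described above.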
Regularization: Dropout regularization is one of the methods used. The crucial idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. Dropout samples from an exponential number of different "thinned" networks during training. At test time, the effect of averaging the predictions of all these thinned networks can be approximated by simply using a single unthinned network with smaller weights. This greatly reduces overfitting and provides substantial advantages over other types of regularization.
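A minimal sketch of training-time ("inverted") dropout, assuming a drop probability p and scaling the kept activations by 1/(1 − p) so that no rescaling is needed at test time; this is an illustrative variant, not necessarily the exact scheme used in the experiments:

```python
import random

def dropout(activations, p=0.5, rng=random):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1/(1-p); at test time the layer is used unchanged,
    approximating the average over the thinned-network ensemble."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in activations]
```

Each output is therefore either 0 (the unit was dropped) or the input scaled by 1/(1 − p).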

4 Dataset

For this paper, the training and test data are the CMU Keystroke Dynamics Benchmark Data Set provided by Killourhy et al. [8]. It contains the keystroke information for 51 users, with each user typing the password ".tie5Roanl" 400 times. The data were gathered over multiple sessions with at least one day between sessions, so that day-to-day differences in a user's typing can be captured. Besides the 51 users provided by the CMU data set, we have appended the project members' keystroke records. The three most widely used features for keystroke dynamics are as follows:
• Hold time: period from the press to the release of a key.
• Keydown-Keydown time: period between successive key presses.
• Keyup-Keydown time: period from one KeyRelease to the next KeyPress.
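From raw (key, press time, release time) events, the three features can be derived as follows; this is a sketch, and the tuple layout and millisecond timestamps in the usage example are illustrative rather than the benchmark's file format:

```python
def keystroke_features(events):
    """events: list of (key, press_t, release_t) tuples in typing order.
    Returns per-key hold times plus down-down and up-down latencies
    for each consecutive pair of keys."""
    hold = [r - p for _, p, r in events]
    dd = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    ud = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return hold, dd, ud
```

For example, the two events ("t", 0, 80) and ("i", 150, 230) yield hold times [80, 80], a Keydown-Keydown time of 150 and a Keyup-Keydown time of 70.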

5 Results

A Linux Computer was used with Python and an open-source library named PyX-
Hook built on the same to monitor the keystrokes of project group members. Template
was written in Python to log keystrokes. The recordings are first stored in JSON
format after positive processing, and then another script is created for adding the
recordings in the initial data set. After preparing the data set, each user’s 300 records
are used for training and the remaining 100 for testing. The keystroke dynamics
efficiency was evaluated with respect to accuracy of models implemented.
For comparative analysis, we considered SVM, RF and ANN in terms of model
accuracy. The data set was implemented on two SVM kernel functions, namely RBF
and polynomial for analysis, where polynomial kernel has shown better performance
mainly due to the data set size. Random forest classification comes close in terms
A Machine Learning-Based Approach to Password Authentication . . . 405

Table 1 Model performance

Algorithm | Accuracy (%)
SVM (RBF) | 84.2
SVM (Poly) | 87.3
Random forest | 91.2
Artificial neural network | 91.8

of accuracy to artificial neural networks with the latter outperforming all algorithms
taken into consideration as shown in Table 1.

6 Conclusion

In this work, we showed that keystroke dynamics can be used to authenticate users
within an application; moreover, using only three features already yields a promising
result. To improve the performance further, we would like to create more features
that can be added for user authentication. One possible extension is not to confine
the system to a single login password, so that users would be able to use different
passwords. Keystroke dynamics has a range of benefits, one of which is low cost: it is
non-invasive and requires no extra tools, because users do not need to employ any new
hardware. This work achieves a high accuracy of about 92% using an ANN. Incorporating
this technology into real-time password protection now becomes future work, and this
research can be continued by comparing how keystroke patterns vary across different
devices.

References

1. Uludag U et al (2004) Biometric cryptosystems: issues and challenges. Proc IEEE 92:948–960
2. Jain AK et al (2005) Biometric template security: challenges and solutions. In: Signal processing
conference, pp 1–4
3. Moody J (2004) Public perceptions of biometric devices: the effect of misinformation on accep-
tance and use. In: Issues Inform Sci Inf Technol
4. Umphress D, Williams G (1985) Identity verification through keyboard characteristics. Int J
Man-Mach Stud 23:263–273
5. Obaidat MS et al (1999) Estimation of pitch period of speech signal using a new dyadic wavelet
algorithm. Inf Sci 119:21–39
6. Gunetti D et al (2005) Keystroke analysis of free text. ACM Trans Inf Syst Secur (TISSEC)
8:312–347
7. Mondal S, Bours P (2015) A computational approach to the continuous authentication biometric
system. Inf Sci 304:28–53
406 A. Thakare et al.

8. Killourhy KS, Maxion RA (2009) Comparing anomaly-detection algorithms for keystroke
dynamics. In: 2009 IEEE/IFIP international conference on dependable systems & networks,
Lisbon, pp 125–134. https://doi.org/10.1109/DSN.2009.5270346
Attention-Based SRGAN for Super
Resolution of Satellite Images

D. Synthiya Vinothini and B. Sathya Bama

Abstract Single image super resolution plays a vital role in satellite image
processing as the observed satellite image generally has low resolution due to the
bottleneck in imaging sensor equipment and the communication bandwidth. Deep
learning provides a better solution to improve its resolution compared to many sophis-
ticated algorithms; hence, a deep attention-based SRGAN network is proposed. The
GAN network consists of an attention-based SR generator to hallucinate the missing
fine texture detail and a discriminator to judge how realistic the generated image is.
The SR generator consists of a feature reconstruction network and attention mecha-
nism. Feature reconstruction network consists of residually connected RDB blocks
to reconstruct HR feature. The attention mechanism acts as a feature selector to
enhance high-frequency details and suppress undesirable components in uniform
region. The reconstructed HR feature and enhanced high-frequency information
are fused together for better visual perception. The experiment is conducted on
WorldView-2 satellite data using Google's free cloud GPU service, Google Colab.
The proposed deep network performs better than the other conventional methods.

Keywords Super resolution · Satellite image · Deep learning · Generative


adversarial network

1 Introduction

Single image super resolution (SISR) has enticed the attention of many researchers
and AI companies. The fundamental principle of SISR is to reconstruct a high-
resolution image from a low-resolution image. This is obviously an ill-posed problem
since a number of HR solutions can be derived for a given set of LR data. Conversely,
it is considered as an underdetermined inverse problem, where the solution is not
exclusive. Over the past decades, enormous works have been carried out to address
this issue. Basically, interpolation-based methods are simple and fast, yet smooth the

D. Synthiya Vinothini (B) · B. Sathya Bama


Thiagarajar College of Engineering, Madurai 625015, Tamil Nadu, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 407


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_31

data, resulting in jaggy and ringing effects. Reconstruction-based methods rely
on different smoothing priors and constraints but still remain inept at regions like
textures and edges. To improve the efficiency of the super resolution (SR) algorithm,
prior information like sparsity, self-similarity, and exemplar priors were learned
from the images. These learning-based methods formulate the coefficients between
the LR and HR image training pair either by learning sparse coefficients, self-similar
structures, or exemplar images.

1.1 Deep Learning for Super Resolution

Deep learning is currently progressing in many computer vision fields. With available
large datasets and computation power, deep learning achieves good accuracy by an
end-to-end learning. With the advent of SR based on the convolutional neural network
(SRCNN), deep learning is dynamically increasing the SR performance. SRCNN is
a three-layer shallow network that directly learns an end-to-end nonlinear mapping
function. The network learns the upscaling filter parameters directly. Subsequently,
deeply recursive convolutional network architecture has a small model parameter yet
permits pixel dependencies for a long range. Dilated convolutional neural network
uses dilated convolutions also known as atrous convolution which is a vivid method
to increase the receptive field of the network exponentially with linear parameter
accretion. Several studies [1, 2] show that increasing the depth of the network can
efficiently increase the model’s accuracy as they are potential to model high complex
mapping. Such deep networks can be efficiently trained using batch normalization
[3]. The learning ability of CNN is made powerful with skip connections [4–6] and
residual blocks [5], where instead of identity learning the network learns the residue.
This design choice has relived the network from vanishing gradient problem which
remained a bottleneck in training deep networks. The performance is fueled by the
right choice of architectural modules that increase the depth, width, and growth rate
of the network. VDSR [4] has increased network depth by piling more convolutional
layers with residual learning. Enhanced deep residual network SR (EDSR) [7] and
multiscale deep SR (MDSR) system [7] use the residual block to build a wide and
deep network with residual scaling [8], respectively. SRResNet [9] also takes the
benefit of residual learning and adopts the efficient sub-pixel convolution layer, while
the advantage of dense connection that is a direct connection from the previous
layers is adopted in SRDenseNet [10]. Residual dense network [11] uses hierarchical
features from the LR image using residual dense blocks. The residual in residual dense
block [11] strategy improves the perceptual quality of the reconstructed image. This
network also shows that a higher growth rate can improve the model’s performance.

1.2 Generative Adversarial Network-Based Deep Learning for SR

Recently, the generative adversarial network (GAN) has been actively explored in
the SR paradigm. A GAN framework consists of a generator to produce upscaled
images, a discriminator to differentiate between the available and generated image,
and finally, the choice of the loss function. The two models are simultaneously trained
to compete with each other. The generator tries to generate images that are not
identified by the discriminator as fake, while the latter tries to judge the fake
images correctly. The perception-preserving approaches improve the visual quality
of SR images compared to the content-preserving approaches. The content-driven
approaches try to preserve the pixel value, which introduces smoothing artifacts in the
SR image. To avoid this overly smoothing problem introduced by the per-pixel loss
function, Ledig [9] proposed a generative adversarial network with perceptual loss
function. Perceptual loss function enhances the visual image quality by decreasing the
error in feature space than the pixel space itself. Johnson [12] proposed a perceptual
loss function using the pre-trained 16-layer VGG network feature. SRGAN model
uses perceptual loss and adversarial loss to generate realistic textures similar to
natural images. Cycle in Cycle GAN (CinCGAN) [13] upsamples the LR image by a
tuned EDSR model to obtain the SR image. Zhang [14] proposed an SR-based GAN
framework where the generator uses a sub-pixel convolution layer for upscaling.
Enhanced SRGAN is improved by adopting the RRDB network as a generator. The
relativistic average GAN (RaGAN) increases the discriminator’s accuracy by iden-
tifying whether the image is more realistic than the other. Self-attention SRGAN
(SASRGAN) considers local and global dependencies to enhance the detail textures
and structures [15].

1.3 Motivation and Contribution

Recently, the generative adversarial network has gained popularity in generating
images and realistic textures. The capability of GAN architectures to generate fine
texture details has motivated its application for image super resolution. The success
of the SR network depends on its architecture, training, and optimization objective
function. Further, these networks reconstruct the textures without understanding its
actual positions. This work proposes an attention-based GAN architecture to improve
the performance of an SR network. This attention mechanism provides information
about the textural region.
The main contribution of the work is as follows.
• This work proposes an attention-based SRGAN model to differentiate texture
and smooth area, i.e., high-frequency and low-frequency regions. Once high-
frequency details are identified, it is enhanced for better performance.

• This work also proposes a generator, i.e., an SR network that hallucinates the
missing fine texture detail. The SR network consists of feature reconstruction
and attention generating network. The feature reconstruction network consists
of residual connection of dense modules, while the attention generating network
comprises of dense connection of residual blocks.
• A discriminator is proposed to relatively guess how realistic is the generated
image. This network has a residual in residual connection of dense modules.

2 Attention-Based SRGAN Model

The single image super resolution is an ill-posed problem that generates a high-
resolution super resolved image I SR from a low-resolution image I LR , where I LR
is a downsampled, blurred, and noisy version of the original high-resolution image
I HR . It is represented as

I LR = D B I HR + ε (1)

where D is the downsampling factor, B is the blurring operator, and ε is the additive
noise. If the tensor size of I LR is H × W × M, where H, W, M denotes the height,
width, and number of color channels in the I LR image, then the tensor size of I SR
will be of dimension DH × DW × M. Reconstruction does not always guarantee the
original I HR but an image I SR which is more similar to it. To obtain such an image,
it is not always advisable to check the content, i.e., the pixel value but the percep-
tual quality of the reconstructed image. GAN framework is an excellent choice to
improve the perceptual quality of the image. The success of a GAN network depends
on the architecture of the generator, discriminator, and the choice of loss functions.
Its general architecture is given in Fig. 1. While training, the HR data is downsampled
to a LR data. The generator of a GAN network upsamples the LR data into SR data,
which is compared with the available HR database by the discriminator to identify
the truthfulness of the generated SR data for natural texture. The loss is then calcu-
lated and back propagated to train the generator and the discriminator. Generator is
trained such that it generates more realistic image to fool the discriminator, mean-
while discriminator is trained to intelligibly identify the generated images. Aiming

Fig. 1 General architecture for GAN network



to enhance the overall visual quality of the reconstructed SR image, this section first
proposes a novel network design for generator and discriminator and then the loss
functions.
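The degradation model of Eq. (1) can be sketched in NumPy; the 3 × 3 box blur, scale factor, and noise level are illustrative choices, not necessarily the exact operators used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

def degrade(hr, d=2, noise_sigma=0.01):
    # Blur B: 3x3 box filter via shifted views of an edge-padded image.
    pad = np.pad(hr, 1, mode="edge")
    h, w = hr.shape
    blurred = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    lr = blurred[::d, ::d]                              # decimation D
    return lr + rng.normal(0.0, noise_sigma, lr.shape)  # additive noise eps

hr = rng.random((256, 256))
lr = degrade(hr, d=2)
print(lr.shape)  # (128, 128)
```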

2.1 Network Structure

To solve the super resolution problem, we propose an attention-based model. The
proposed model has two basic network building blocks, viz., the residual dense block
(RDB) and the dense residual block (DRB). Each basic building block generates
hierarchical features from its shallow feature input.
Residual Dense Block. The RDB block consists of a dense block (DB), i.e., a dense
connection of D convolution blocks ($F_{n,c}^{d}$), dense feature fusion ($F_{RDF}$), and a
residual connection over the DB ($F_{RL}^{n}$). The network structure consists of N RDB
blocks connected sequentially such that the input to an RDB block is the output of its
preceding RDB block, and its structure is shown in Fig. 2. Its functionality is
represented as $H_{RDB}$.

$F_{n} = H_{RDB,n}(F_{n-1});\quad 1 \le n \le N$ (2)

where $F_{n-1}$ and $F_{n}$ are the input and output features of the nth RDB block.
$H_{RDF}$ represents the functionality of the fusion of dense features within the RDB
block and is given by the following equation.
 
$F_{RDF}^{n} = H_{RDF}\left(\left[F_{n-1}, F_{n,c}^{1}, F_{n,c}^{2}, \ldots, F_{n,c}^{d}, \ldots, F_{n,c}^{D}\right]\right);\quad (1 \le d \le D)$ (3)

The features from the D convolution blocks are concatenated and subjected to a
1 × 1 convolution layer. $F_{n,c}^{d}$ is the output of the dth convolution block of the
nth RDB block. Each convolution block consists of a convolution layer followed by
batch normalization and ReLU, and its functionality is denoted by $H_{CBR}$.

Fig. 2 Architecture of residual dense block network


 
$F_{n,c}^{d} = H_{CBR}\left(\left[F_{n-1}, F_{n,c}^{1}, \ldots, F_{n,c}^{d-1}\right]\right)$ (4)

where [·, ·] denotes the concatenation of features. The residual learning feature,
which is the output of the nth RDB block, is given by

$F_{RL}^{n} = F_{RDF}^{n} + F_{n-1}$ (5)

Finally, this residual feature is given to a convolution block to yield the RDB
block output.

$F_{n} = H_{CBR}\left(F_{RL}^{n}\right)$ (6)
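The channel bookkeeping implied by Eqs. (3)-(6) can be verified with a small sketch (illustrative widths, assuming input width G0 and growth rate G): the 1 × 1 fusion restores the input width so that the residual addition in Eq. (5) is valid.

```python
G0, G, D = 64, 64, 8  # illustrative input width, growth rate, conv blocks

def rdb_channels(g0, g, d):
    # Dense connection: conv block i sees the block input plus all
    # preceding conv outputs, each of width g.
    in_widths = [g0 + i * g for i in range(d)]
    concat = g0 + d * g   # concatenated features fed to the 1x1 fusion conv
    fused = g0            # fusion restores g0, so F_RDF + F_{n-1} is valid
    return in_widths, concat, fused

widths, concat, fused = rdb_channels(G0, G, D)
print(widths[0], widths[-1], concat, fused)  # 64 512 576 64
```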

Dense Residual Block. The DRB block consists of a residual block (RB), i.e., a residual
connection of d convolution layers ($F_{m,r,c}^{d}$), and a dense connection of RBs
($F_{DRB}^{m}$). Its structure is shown in Fig. 3. Its functionality is represented as
$H_{DRB}$.
$F_{DRB\,out}^{m} = H_{DRB}\left(F_{DRB\,in}^{m}\right);\quad 1 \le m \le M$ (7)

Fig. 3 Architecture of dense residual block network



where $F_{DRB\,in}^{m}$ and $F_{DRB\,out}^{m}$ represent the input and output features
of the mth DRB block. $H_{DF}$ represents the functionality of the fusion of dense
features within the DRB block, while $H_{RB}$ and $H_{C}$ denote the functionality of
the residual block and the convolution layer, respectively.

$F_{DRB\,out}^{m} = H_{DF}\left(\left[F_{DRB\,in}^{m}, F_{m,RB}^{1}, F_{m,RB}^{2}, \ldots, F_{m,RB}^{r}, \ldots, F_{m,RB}^{R}\right]\right);\quad (1 \le r \le R)$ (8)

where $F_{m,RB\,in}^{r}$ represents the input to the rth residual block in the mth DRB
block, and $F_{m,r,c}^{D}$ denotes the output of the Dth convolution layer in the rth
residual block of the mth DRB block.

$F_{m,RB}^{r} = H_{RB}\left(F_{m,RB\,in}^{r}, F_{m,r,c}^{D}\right);\quad (1 \le r \le R)$ (9)

$= F_{m,RB\,in}^{r} + F_{m,r,c}^{D}$ (10)

2.2 Generator

The basic idea behind the SRGAN architecture is that the generative model G is trained
to fool the discriminator D, which is trained to differentiate real from super resolved
images. Thus, the generator learns to generate high-resolution images that are closer
to real images and indistinguishable by the discriminator, so the SR network generates
perceptually plausible images.
The generator architecture to generate $I^{SR}$ is illustrated in Fig. 4. Its aim is to
learn a generating or mapping function G in an end-to-end manner, which hallucinates
an HR image for a given LR image. The generator network comprises two stages: feature
reconstruction and attention generation. The feature reconstruction structure intends
to reconstruct the HR information, while the attention structure generates the
weightage for the high-frequency information to be restored.
The feature reconstruction network is a fully convolutional structure that reconstructs
high-frequency details to be injected into the interpolated LR image. To predict
plausible pixels in HR space, a large receptive field is required. This need motivates
the use of deep cascaded blocks for extracting hierarchical features. It comprises
three modules: an initial section for shallow feature extraction, a hierarchical
feature extraction module (HFM) using the residual RDB module, and finally
the upscaling module (UM).
The shallow features ($F_{SF}$) are given by

$F_{SF} = H_{SFE}\left(I^{LR}\right)$ (11)

Fig. 4 Architecture of the proposed attention-based SR generator network

 
$= H_{C}\left(I^{LR}\right)$ (12)

where $H_{SFE}(\cdot)$ denotes the initial shallow feature extraction process. These
low-level shallow features contain significant information to restore the HR image.
They are fed to the residual RDB module to extract the hierarchical features
($F_{HF}$), which are given by

$F_{HF} = H_{HFE}(F_{SF})$ (13)

$= H_{RRDB}(F_{SF})$ (14)

$= F_{SF} + H_{SRDB}(F_{SF})$ (15)

$= F_{SF} + F_{DF}$ (16)

where H HFE (.) denotes the hierarchical feature extraction process, which is the sum of
shallow and dense features (F DF ) extracted by the residual RDB whose functionality
is given by H RRDB (.). The residual RDB module is arranged such that N number of
RDB blocks are arranged sequentially so that input to each RDB block is the output
of its preceding block. Thus, the super resolution network will be benefitted by the
collective information at various levels. Then, this structure is connected with a skip
connection to form a residual network. This connection increases the movement of
gradient and information over the network and thus reduces the vanishing gradient
problem in training deeper networks. The functionality of the sequential RDB is
represented by HSRDB (.) and can be given by the following equation
   
$F_{DF} = H_{SRDB}(F_{SF}) = H_{RDB,N}\left(\ldots H_{RDB,n}\left(\ldots H_{RDB,2}\left(H_{RDB,1}(F_{SF})\right)\right)\right)$ (17)

Finally, the upscaled feature ($F_{UM}$) is obtained from the upscaling function
($H_{UM}$) and is represented as

$F_{UM} = H_{UM}(F_{HF})$ (18)

The upscaled feature is then enhanced by the attention features produced by the
attention mechanism. The attention network utilizes the U-Net [16] architecture, where
the convolution layers are replaced by densely connected residual blocks. Since the
dense structure effectively reuses information, it reduces the number of parameters
and alleviates the vanishing gradient problem, thus making the deep network easier to
train with less computational complexity and memory requirement. This structure
helps the SR network effectively identify the high-frequency regions for selective
enhancement of textural regions, improving the network's performance.
The bicubic interpolation ($F_{BC}$) is fed as input to the attention mechanism,
which contains an ordered sequence of M DRB blocks. M is an odd value, and the input
to each DRB block is defined as follows:

$F_{DRB\,in}^{m} = \begin{cases} H_{MP}(H_{C}(F_{BC})) & m = 1 \\ H_{AP}\left(F_{DRB\,out}^{m-1}\right) & 2 \le m \le \frac{M+1}{2} \\ H_{UP}\left(\left[F_{DRB\,out}^{m-1}, F_{DRB\,out}^{M-m+1}\right]\right) & \frac{M+1}{2} < m \le M \end{cases}$ (19)

where $H_{MP}$, $H_{AP}$, and $H_{UP}$ denote the functionality of max pooling,
average pooling, and upsampling, respectively.
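For M = 5 blocks, the routing of Eq. (19) can be enumerated directly; this sketch reproduces only the indexing, not the pooling and upsampling operators themselves.

```python
M = 5  # number of DRB blocks, odd as required

def drb_input(m):
    # Which operator feeds block m, and which earlier outputs it consumes.
    mid = (M + 1) // 2
    if m == 1:
        return ("maxpool", None)             # H_MP on the conv of F_BC
    if m <= mid:
        return ("avgpool", m - 1)            # encoder: H_AP on previous output
    return ("upsample", (m - 1, M - m + 1))  # decoder: H_UP with encoder skip

print([drb_input(m) for m in range(1, M + 1)])
# [('maxpool', None), ('avgpool', 1), ('avgpool', 2),
#  ('upsample', (3, 2)), ('upsample', (4, 1))]
```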
The structure consists of an encoding and a decoding path, where feature sizes
shrink and grow, respectively. In the encoding path, pooling is applied to reduce
the data dimension and increase the receptive field so as to predict the high-frequency
region. Meanwhile, in the decoding path, the encoded features are upsampled using a
deconvolution layer. This path also takes advantage of integrating the low-level
features from the encoding path.
The significant information from the low-level feature is reused. Thus, the
combined feature can specifically identify the textural region that needs more weigh-
tage or attention by the feature reconstruction network. The final attention feature
has the size of the HR image to be generated with a single channel output. It uses
sigmoid activation to limit its value between 0 and 1. The texture regions will have
more weightage or attention, and its feature value will be more close to 1.
The functionality of the attention mechanism is represented as $H_{AF}$, and the
attention feature is $F_{AF}$.

$F_{AF} = H_{AF}(F_{BC})$ (20)

Finally, the interpolated image is added with its residual. This residual feature
($F_{EH}$) is the enhanced textural feature obtained as the product of the upsampled
feature ($F_{UM}$) from the feature reconstruction structure and its corresponding attention
feature ($F_{AF}$).

$I^{SR} = F_{BC} + F_{EH}$ (21)

$I^{SR} = F_{BC} + (F_{UM} \ast F_{AF})$ (22)
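The fusion of Eqs. (21)-(22) amounts to gating the reconstructed residual with the sigmoid attention map before adding it to the bicubic image; a toy NumPy sketch with random stand-in features:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

f_bc = rng.random((8, 8))                # bicubic-interpolated input, F_BC
f_um = rng.normal(size=(8, 8))           # upscaled residual feature, F_UM
f_af = sigmoid(rng.normal(size=(8, 8)))  # attention map F_AF, values in (0, 1)

i_sr = f_bc + f_um * f_af                # Eq. (22)
print(i_sr.shape, bool(f_af.min() > 0.0 and f_af.max() < 1.0))
```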

2.3 Discriminator

The discriminator network is trained to distinguish the real HR images from the
generated SR images. The network structure is given in Fig. 5. The network consists
of four stages: a shallow feature extraction, hierarchical feature extraction, flattening,
and discrimination. The feature extraction structure of the discriminator follows the
same architecture as that of the generator. Once the hierarchical feature is extracted,
it is flattened by a dense network. Then, the dense network is reduced to one neuron to
give a unique value, which is finally fed to a sigmoid activation (σ ) to limit the value
between 0 and 1. The discriminator performance is enhanced based on the relativistic
average discriminator (RaD) [17]. A standard discriminator estimates the probability
for an input image to be realistic and natural, whereas the RaD tries to estimate the
probability that a real image $x_n$ is relatively more natural than the generated image
$x_g$, and vice versa, and is thus expressed as
 
$D_{Ra}(x_n) = \sigma\left(D(x_n) - \mathbb{E}_{x_g}\left[D(x_g)\right]\right)$ (23)

   
$D_{Ra}(x_g) = \sigma\left(D(x_g) - \mathbb{E}_{x_n}\left[D(x_n)\right]\right)$ (24)

where $D(\cdot)$ denotes the non-transformed discriminator output, σ represents the
sigmoid function, and $\mathbb{E}_{x_g}[\cdot]$ and $\mathbb{E}_{x_n}[\cdot]$ represent
the mean over all the generated and real images in the mini-batch, respectively.
Equations (23) and (24) compute the distance to measure the relative realness and
fakeness.
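Eqs. (23)-(24) can be sketched on raw (pre-sigmoid) mini-batch scores; the score values below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rad(d_real, d_fake):
    # Relativistic average discriminator: each sample is compared with the
    # mean raw score of the opposite class in the mini-batch.
    d_ra_real = sigmoid(d_real - d_fake.mean())  # Eq. (23)
    d_ra_fake = sigmoid(d_fake - d_real.mean())  # Eq. (24)
    return d_ra_real, d_ra_fake

d_real = np.array([2.0, 1.5, 2.5])    # raw scores for real images
d_fake = np.array([-1.0, -0.5, 0.0])  # raw scores for generated images
ra_real, ra_fake = rad(d_real, d_fake)
print(bool(ra_real.mean() > 0.5), bool(ra_fake.mean() < 0.5))  # True False -> real rated more realistic
```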

Fig. 5 Architecture of the discriminator network for attention-based SRGAN



2.4 Loss Function

The loss function for the discriminator of the relativistic average standard GAN
(RaSGAN) is defined as
     
$L_{D}^{RaSGAN} = \mathbb{E}_{x_n}\left[\log\left(D_{Ra}(x_n)\right)\right] + \mathbb{E}_{x_g}\left[\log\left(1 - D_{Ra}(x_g)\right)\right]$ (25)

The adversarial loss for its generator is defined as


     
$L_{G}^{RaSGAN} = \mathbb{E}_{x_g}\left[\log\left(D_{Ra}(x_g)\right)\right] + \mathbb{E}_{x_n}\left[\log\left(1 - D_{Ra}(x_n)\right)\right]$ (26)

To improve the training of the GAN network, we propose to use the following
loss function in addition with the adversarial loss. The total generator loss (L G ) is
given as

$L_{G} = \alpha L_{G}^{RaSGAN} + L_{perceptual} + \beta L_{1}$ (27)

where α and β are constants that regulate the loss functions. $L_{1}$ is the content
loss that computes the 1-norm distance between the generated SR image and its
reference HR image.

$L_{1} = \mathbb{E}_{x_m}\left[\left\|G(x_m) - Ref(x_m)\right\|_{1}\right]$ (28)

where $G(x_m)$, $Ref(x_m)$, and $\mathbb{E}_{x_m}$ are the generated SR image for
input LR image $x_m$ in the mini-batch, its HR reference image, and the mean over all
input LR images in the mini-batch, respectively. Content loss may preserve the
information but often fails to maintain the high-frequency information, which leads
to smooth textures and unpleasant results. To enhance the perceptual quality of the
generated image, a perceptual loss ($L_{perceptual}$) is used. Instead of providing a
pixel-wise loss measure, this gives
feature-wise measure. It is the VGG loss obtained from the activation layers of the
pre-trained 19 layer VGG network. It is measured as the distance between the VGG
perceptual feature (ϕ) of the generated SR image and its HR counterpart.

$L_{perceptual} = \mathbb{E}_{x_m}\left[\left\|\varphi(G(x_m)) - \varphi(Ref(x_m))\right\|\right]$ (29)
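The combined objective can be sketched as follows; the alpha and beta values and the stand-in adversarial and perceptual terms are illustrative, not the settings used in the experiments.

```python
import numpy as np

def l1_loss(sr, hr):
    # Content loss of Eq. (28): mean absolute error over the image.
    return float(np.mean(np.abs(sr - hr)))

def total_generator_loss(adv, perceptual, sr, hr, alpha=5e-3, beta=1e-2):
    # Eq. (27): weighted adversarial + perceptual + weighted content loss.
    return alpha * adv + perceptual + beta * l1_loss(sr, hr)

sr, hr = np.zeros((4, 4)), np.ones((4, 4))
loss = total_generator_loss(adv=0.7, perceptual=0.2, sr=sr, hr=hr)
print(round(loss, 4))  # 0.2135
```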

3 Results and Discussion

This section analyzes the performance of the proposed method for super resolution
of satellite imagery as well as other recently developed methods in the field. The
experimental simulation is conducted on WorldView-2 images. Super resolution is
a problem of recovering an image from its decimated, blurred, warped, and noisy

version. This work considers reconstruction from decimated data. The size of all
images considered for the experimental simulation is 256 × 256. The original image
was spatially downsampled by a downsampling factor $f_{ds}$ to obtain a low-resolution
image.
The experiment is conducted for WorldView-2 satellite images, and the visual
comparison of its result is exhibited in Fig. 6. WorldView-2 provides a high-resolution
(0.46 m) panchromatic and eight multispectral bands with a spatial resolution of
1.84 m. Out of the eight bands, four represents the standards color channels viz., red,
green, blue and near infrared1, and four extra band viz., coastal, yellow, red edge, and

Fig. 6 Visual comparison of different super resolution methods on worldview2 data. a Original HR
satellite data. b Downsampled version of a. c CC. d ICBI. e IWF. f DCC. g Adaptive polynomial
regression (proposed). h Quad gradient method (proposed). i SCN. j SRGAN. k Attention-based
SR and l Attention-based SRGAN (proposed)

near infrared 2 for improved spectral analysis, mapping, exploring, and monitoring.
Its orbit allows it to revisit any place on Earth in 1.1 days. For our experiment, we
use a WorldView-2 dataset acquired over the Madurai region of South India on
04 June 2010.
For training the deep learning-based SR network, a large dataset is required. This is
satisfied by extracting non-overlapping patches from the large satellite image tile.
For this experiment, 20,000 HR/LR patch pairs of size 256 × 256 are extracted. This
dataset is split into 90/10% for training/validation. Instead of the classical stochastic
gradient descent procedure, this work uses the adaptive moment estimation (Adam)
optimization procedure, as it can efficiently solve practical deep learning problems
that use large models and datasets. The configuration parameters alpha, beta1, beta2,
and epsilon of the Adam optimizer are set as 10^−4, 0.9, 0.999, and 10^−8, respectively.
Alpha is the learning rate or step size; beta1 and beta2 are the exponential decay rates
for the first- and second-moment estimates, respectively; and epsilon is a very small
constant to prevent division by zero. The network is trained for 100 epochs using
a mini-batch size of 64.
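A single Adam update with the stated configuration can be sketched in NumPy (a textbook implementation for illustration, not the framework optimizer actually used):

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([0.5]), m, v, t=1)
# The first step has magnitude ~alpha regardless of the gradient scale.
print(round(float(1.0 - theta[0]), 6))  # 0.0001
```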
For the attention-based SRGAN, the network parameters are as follows: the number of
RDB blocks (N), the number of DRB blocks (M), the number of convolution layers (D)
within each RDB and each Res block of the DRB network, the number of Res blocks (R)
within each DRB block, and the growth rate (G), i.e., the width of each convolutional
layer. These parameters are set as N = 5, M = 5, D = 8, R = 5, and G = 64.
The model is trained using Google Colab, which provides an Nvidia Tesla K80 GPU.
Figure 7 explores the effect of the network growth rate on PSNR. Based

Fig. 7 Effect of growth rate on PSNR for attention-based SR network



on the observed result, it can be concluded that a growth rate of 64 is optimal. A
network with a higher growth rate yields only a saturated performance.
For convenience, $f_{ds}$ is set to $2^n$, where n is the number of iterations the
classical algorithm is applied to obtain the super resolved image. The super resolution
is performed on the downsampled image. The methods other than deep learning are
iterated (n − 1) times. The final super resolved image is obtained after n iterations
and is of the same size as the original HR image.
Then, the super resolved image is compared with the original HR image for
quality assessment of the algorithm. The proposed super resolution method was
compared with several state-of-the-art methods, including cubic convolution (CC),
iterative curvature-based interpolation (ICBI), inverse Wiener filter (IWF),
directional cubic convolution (DCC), sparse coding-based network (SCN), and super
resolution generative adversarial network (SRGAN). Further, the results of the
attention-based mechanism are compared for the general SR network and the SRGAN
network. The results are compared to determine their discrepancies and other valuable
measures.
The original image has been downsampled with factors $f_{ds}$ = 2 and 4 using a bicubic
kernel. The performance of the algorithms is discussed based on the results of the
quantitative measures. To evaluate the truthfulness of the proposed algorithm, quality
metrics such as PSNR, RMSE, degree of distortion, correlation coefficient, and
structural similarity (SSIM) are measured and given in Tables 1 and 2.
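Of the metrics reported below, RMSE and PSNR follow standard definitions and can be sketched directly (a peak value of 255 is assumed here for 8-bit imagery):

```python
import numpy as np

def rmse(ref, test):
    # Root-mean-square error between reference and reconstructed images.
    return float(np.sqrt(np.mean((ref.astype(float) - test.astype(float)) ** 2)))

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB.
    return 20.0 * np.log10(peak / rmse(ref, test))

ref = np.full((8, 8), 128.0)
rec = ref + 4.0                  # constant error of 4 gray levels
print(round(rmse(ref, rec), 2), round(psnr(ref, rec), 2))  # 4.0 36.09
```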
Comparing the metric values of adaptive polynomial regression with the other
interpolation-based methods like CC and ICBI, it gives better performance, but it
still has a higher distortion rate. The SSIM value also shows that it requires edge
enhancement.

Table 1 Quantitative performance evaluation at $f_{ds}$ = 2 of various methods on the WorldView-2 dataset

Methods | PSNR | RMSE | Degree of distortion | Correlation coefficient | SSIM
CC | 23.6874 | 16.6790 | 9.5703 | 0.9325 | 0.8677
ICBI | 24.9978 | 14.3434 | 9.3341 | 0.9499 | 0.8995
IWF | 25.8817 | 12.9555 | 8.0236 | 0.9607 | 0.9158
DCC | 25.2362 | 13.9550 | 9.0782 | 0.9529 | 0.8988
Adaptive polynomial | 25.1680 | 14.0650 | 8.6237 | 0.9533 | 0.9057
Quad gradient | 26.7507 | 11.7221 | 7.3827 | 0.9692 | 0.9256
SCN | 28.1661 | 9.9594 | 5.9717 | 0.9768 | 0.9535
SRGAN | 28.6940 | 9.3722 | 6.1091 | 0.9794 | 0.9539
Attention-based SR | 30.0603 | 8.0080 | 4.8451 | 0.9855 | 0.9675
Attention-based SRGAN | 30.5908 | 7.5336 | 5.0229 | 0.9865 | 0.9717

Table 2 Quantitative performance evaluation at $f_{ds}$ = 4 of various methods on the WorldView-2 dataset

Methods | PSNR | RMSE | Degree of distortion | Correlation coefficient | SSIM
CC | 19.4568 | 20.9548 | 11.9556 | 0.8612 | 0.7856
ICBI | 20.6595 | 20.0351 | 11.5689 | 0.8654 | 0.8064
IWF | 20.9556 | 19.9567 | 11.0215 | 0.8832 | 0.8234
DCC | 21.2156 | 19.5486 | 11.3546 | 0.8745 | 0.8152
Adaptive polynomial | 21.9987 | 19.2456 | 11.2253 | 0.8845 | 0.8356
Quad gradient | 22.5465 | 18.7845 | 10.2354 | 0.8953 | 0.8575
SCN | 23.9564 | 16.9865 | 9.0112 | 0.9074 | 0.8696
SRGAN | 24.4870 | 14.8946 | 9.5648 | 0.9132 | 0.8858
Attention-based SR | 25.5488 | 13.2356 | 8.0564 | 0.9198 | 0.8905
Attention-based SRGAN | 25.9865 | 12.9856 | 8.6265 | 0.9256 | 0.8996

Hence, a gradient-based technique is proposed. Usually, gradient-based approaches
fill the missing pixel values in the dominant gradient direction, but DCC fills pixel
values in two orthogonal directions, so its performance is better compared to CC. Our
proposed quad-gradient approach fills the value based on four directions by weighting
their gradients and shows improved performance compared to DCC. But its performance
is still not satisfactory; hence, an attention-based SRGAN is developed. The attention
mechanism identifies the high-frequency region and enhances it; this is verified by the
improved metric value in terms of SSIM. The attention mechanism is better compared to
the SCN and SRGAN networks, and adding attention to SRGAN can produce an even more
visually plausible image. In each case, it is obvious from the tabulated quality metrics
that the proposed attention-based SRGAN method outperforms the other methods.
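The direction-weighted filling idea can be illustrated with a small sketch. The inverse-gradient weighting below is a simple illustrative choice, not the paper's exact quad-gradient formulation:

```python
import numpy as np

def fill_pixel(img, r, c):
    """Estimate a missing pixel from its four axial neighbours, weighting
    each direction inversely by the local gradient magnitude, so that the
    smooth (low-gradient) directions dominate the estimate."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    vals, weights = [], []
    for dr, dc in offsets:
        near = float(img[r + dr, c + dc])
        far = float(img[r + 2 * dr, c + 2 * dc])
        vals.append(near)
        weights.append(1.0 / (abs(near - far) + 1e-6))  # small gradient -> big weight
    w = np.array(weights)
    return float(np.dot(w, np.array(vals)) / w.sum())

img = np.array([[10, 10, 10, 10, 10],
                [20, 20, 20, 20, 20],
                [50, 50,  0, 50, 50],   # strong vertical gradient, smooth rows
                [70, 70, 70, 70, 70],
                [90, 90, 90, 90, 90]], dtype=np.float64)
# Filled along the smooth horizontal direction (about 50),
# not the plain four-neighbour mean (47.5).
print(fill_pixel(img, 2, 2))
```

This captures why directional filling preserves edges better than isotropic interpolation: the value is taken from along the edge rather than across it.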

4 Conclusion

In this work, a deep learning-based method is proposed to serve the purpose of
resolution enhancement in satellite images. Deep learning provides a better solution
than many sophisticated algorithms; hence, this work proposes a deep attention-based
SRGAN. The GAN consists of an SR generator to hallucinate the missing fine texture
detail and a discriminator to judge how realistic the generated image is. An
attention-based SR network is proposed for the SR generator. The SR generator consists
of a feature reconstruction network and an attention mechanism. The feature
422 D. Synthiya Vinothini and B. Sathya Bama

reconstruction network consists of residually connected RDB blocks to reconstruct the
HR feature. The attention mechanism identifies high-frequency information and enhances
it. The reconstructed HR feature and the enhanced high-frequency information are fused
together for better visual perception. The experiment is conducted on WorldView-2
satellite data using Google's free cloud-computing GPU platform, Google Colab.
The proposed deep network performs better than the other conventional methods.

References

1. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: International conference on learning representations (ICLR), pp 1–14. arXiv
preprint arXiv:1409.1556
2. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: IEEE conference on computer vision and pattern
recognition (CVPR), pp 1–9
3. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing
internal covariate shift. In: Proceedings of the 32nd international conference on machine
learning (ICML), pp 448–456
4. Kim J, Kwon Lee J, Mu Lee K (2015) Accurate image super-resolution using very deep
convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 1646–1654
5. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European
conference on computer vision (ECCV), Springer, pp 630–645
6. Kim J, Kwon Lee J, Mu Lee K (2016) Deeply-recursive convolutional network for image super-
resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 1637–1645
7. Lim B, Son S, Kim H, Nah S, Mu Lee K (2017) Enhanced deep residual networks for single
image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern
recognition workshops, pp 136–144
8. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the
impact of residual connections on learning. In: Thirty-first AAAI conference on artificial
intelligence, pp 4278–4284
9. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Shi W (2017) Photo-
realistic single image super-resolution using a generative adversarial network. In: Proceedings
of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
10. Tong T, Li G, Liu X, Gao Q (2017) Image super-resolution using dense skip connections. In:
Proceedings of the IEEE international conference on computer vision, pp 4799–4807
11. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y (2018) Residual dense network for image super-
resolution. In: International IEEE conference on computer vision and pattern recognition
(CVPR), pp 2472–2481
12. Johnson J, Alahi A, Li F (2016) Perceptual losses for real-time style transfer and super-
resolution. In: European conference on computer vision (ECCV), Springer, pp 694–711
13. Zhang Y, Liu S, Dong C, Zhang X, Yuan Y (2019) Multiple cycle-in-cycle generative adversarial
networks for unsupervised image super-resolution. IEEE Trans Image Process 29:1101–1112
14. Zhang D, Shao J, Hu G, Gao L (2017) Sharp and real image super-resolution using generative
adversarial network. In: International conference on neural information processing, Springer,
Cham, pp 217–226
15. Zong L, Chen L (2019) Single image super-resolution based on self-attention. In: IEEE international conference on unmanned systems and artificial intelligence (ICUSAI), Xi'an, China, pp 56–60. https://doi.org/10.1109/ICUSAI47366.2019.9124791

16. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Medical image computing and computer-assisted intervention – MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_28
17. Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from
standard GAN. arXiv preprint arXiv:1807.00734
Detection of Acute Lymphoblastic
Leukemia Using Machine Learning
Techniques

Pradeep Kumar Das, Ayush Pradhan, and Sukadev Meher

Abstract Automatic detection of acute lymphoblastic leukemia (ALL) is an essential
as well as a challenging job. Recently, machine learning and deep learning-based
classification have emerged as an esteemed approach in medical image analysis. In the
presence of small medical datasets, transfer learning dominates over traditional deep
learning methods. In this work, we have presented an effective and computationally
efficient ALL detection technique. We have presented three models by introducing
fully connected layers and/or dropout layers in the ResNet50 architecture. Out of these
three models, the model that demonstrates the best training performance is selected to
extract features efficiently. Finally, we have applied logistic regression, support vector
machine (SVM), and random forest to classify ALL and compare their performances.

Keywords Classification · Deep learning · Leukemia · Machine learning ·


Transfer learning

1 Introduction

Leukemia (blood cancer) is caused by the defective functioning of white blood cells
(WBCs) [1]. It affects the immune system of the body. ALL is a blood cancer that
causes overproduction of lymphoblasts (immature lymphocytes) [1–10]. Microscopic
blood-cell analysis is an efficient and cost-effective approach for the early
diagnosis of haematological disorders [1–5, 7–11]. Microscopic analysis is generally

(Invited paper)

P. K. Das (B) · A. Pradhan · S. Meher


National Institute of Technology Rourkela, Rourkela 769008, India
e-mail: [email protected]
A. Pradhan
e-mail: [email protected]
S. Meher
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 425


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_32
426 P. K. Das et al.

coupled with other tests, such as blood smear tests and bone marrow aspiration, to
make the classification more accurate. However, microscopic blood-cell analysis has
a crucial role in the early screening/diagnosis of disease [1–3, 11]. Notwithstanding
the lengthy process involved, the tests also require expert opinion from experienced
doctors, proving that, apart from being time-consuming, the methods involved are
also expensive.
In a bid to find a workaround for the problem mentioned above, researchers
generally use the concept of computer vision and its various techniques to automate
the manual tasks of classification. Mohapatra et al. [9] have suggested a fuzzy-based
segmentation approach for efficient ALL detection. In [4, 9, 12], texture, shape,
and color features are extracted in the feature extraction stage. Then, SVM [13] is
employed to efficiently classify ALL.
In [1], AdaBoost with random forest is used to properly classify ALL. On the
other hand, Narjim et al. [10] have suggested an ensemble classifier-based ALL
classification approach. Rawat et al. have suggested a hybrid classifier-based ALL
detection approach [6].
Currently, transfer learning has gained an important role in medical image analysis
due to its outstanding performance on small datasets [2, 14]. Vogado et al. [2] have
proposed a transfer learning-based feature extraction and SVM [13]-based ALL
classification method. They have used AlexNet [15], CaffeNet [16], Vgg-f [17], and an
ensemble of all three networks to extract efficient features and compare their
performances.
In this work, we have presented three models by introducing fully connected layers
and/or dropout layers in the ResNet50 architecture [18, 19]. Using the best model of the
lot, we have extracted efficient features. Then, we have employed machine learning
techniques for efficient ALL classification.

2 Datasets

In this work, we use the ALL-IDB2 dataset [20] for the validation of the proposed ALL
classification approach. It contains 260 images: 130 images of ALL and 130
images of healthy cells. Some of these images are displayed in Fig. 1.

3 Proposed Method

The proposed ALL classification method is presented in Fig. 2. The aim
of this work is to present an effective and computationally efficient ALL detection
technique.
Here, we have suggested a transfer learning-based feature extraction technique,
since it is preferred over traditional CNNs, particularly when we have small datasets
like ALL-IDB2. We have presented three transfer learning models by introducing fully
connected layers and/or dropout layers in the ResNet50 architecture [18, 19]. Finally,
we have applied machine learning techniques to classify ALL effectively. The details
of the work are discussed as follows.

Fig. 1 Sample images: a–c represent healthy cells; d–f are unhealthy (ALL) cells

Fig. 2 Block diagram of the proposed ALL classification method
Transfer learning is an adaptive learning method in which the weights of a model
trained for one task are reutilized to perform some other task without retraining the
model from scratch [2, 14, 21, 22]. The basic idea is that a model is first trained over
a large and diverse dataset; it can then be repurposed and fine-tuned to perform the
required task on a specific dataset without having to be retrained from scratch. This
helps reduce the training and execution time, reduces the requirement for a large
amount of data, and

eases the hardware dependencies without having to do any trade-off with accuracy
and efficiency of the performance [2, 14, 21, 22].
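The reuse idea can be sketched in miniature. Below, a frozen random projection stands in for the pretrained convolutional layers (purely illustrative; the actual work uses ResNet50 features), and only a lightweight logistic-regression head is trained on the extracted features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for pretrained
# layers whose weights are NOT updated (hypothetical stand-in).
W_frozen = rng.normal(size=(8, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # fixed ReLU features

# Tiny synthetic two-class dataset (a stand-in for ALL-IDB2 images).
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(float)

# Only this lightweight logistic-regression head is trained.
F = extract_features(X[:150])
w, b = np.zeros(16), 0.0
for _ in range(1000):                      # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-np.clip(F @ w + b, -30, 30)))
    grad = p - y[:150]
    w -= 0.1 * F.T @ grad / 150
    b -= 0.1 * grad.mean()

F_test = extract_features(X[150:])
pred = (F_test @ w + b > 0).astype(float)
acc = (pred == y[150:]).mean()
print(f"held-out accuracy: {acc:.2f}")
```

The backbone is never touched during training, which is exactly what saves the time and data that full retraining would require.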
The key difference between the various deep-net architectures such as AlexNet
[15], VGG-16 [23], and ResNet50 [18] is the number of parameters involved, the depth
of the layers, the architecture design, etc. In this experiment, we have slightly modified
the ResNet50 architecture [18, 19] by introducing fully connected layers and/or dropout
layers and suggested three models as follows.
It is a general understanding that the accuracy and complexity of feature handling
increase with an increase in a deep neural network's depth, i.e., as the depth
increases, accuracy also increases. But this is not always true. With the initial
increase in depth, accuracy increases because more complex features can be handled,
but after a certain depth, accuracy starts decreasing again because of the vanishing
gradient problem. In the proposed experiment, the challenge was to incorporate layers
deep enough to handle various feature complexities without any trade-off in accuracy.
The authors in [18] found a solution to this: a deep-net architecture known as ResNet
[18], which stands for residual network. In it, they propose the concept of the identity
shortcut (skip) connection. The identity shortcut works on the principle that an
identity feature vector from an earlier layer can be stacked with a forward layer's
features. The skip connection gives an additional path for the gradient flow and hence
solves the vanishing gradient problem. This allows room to increase the depth of the
deep-net architecture without worrying about the vanishing gradient problem. Figure 3
displays the ResNet50 architecture [18, 19]. The convolutional block and identity
block used here are shown in Figs. 4 and 5, respectively.
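The identity shortcut can be written in a few lines; a minimal numpy sketch of a residual block (dense layers standing in for the convolutional ones):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def identity_block(x, w1, w2):
    """Minimal residual (identity shortcut) block: the input x skips over
    two weight layers and is added back before the final ReLU, giving the
    gradient a direct path around the transformation."""
    f = relu(x @ w1) @ w2        # residual branch F(x)
    return relu(f + x)           # skip connection: F(x) + x

x = np.ones(4)
w_zero = np.zeros((4, 4))
# With zero residual weights the block reduces to the identity mapping,
# which is what makes very deep stacks trainable.
print(identity_block(x, w_zero, w_zero))  # [1. 1. 1. 1.]
```

Because the block can always fall back to the identity, adding more such blocks never has to hurt the representational power of the network.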
For this experiment, we have tried to customize the ResNet50 architecture
(restricting ourselves to changes only in the dense layer section) to suit the required
purpose. We have proposed three different customized models, as shown in Figs. 6, 7,
and 8. Based on the training results, a comparative analysis among them has been
made to select the best of the lot to carry out further operations.

Fig. 3 ResNet50 architecture [18, 19]

Fig. 4 Convolutional block

Fig. 5 Identity block

Fig. 6 Model 1

Fig. 7 Model 2

Fig. 8 Model 3

The algorithm for selecting the best model is designed as shown in Fig. 9. In each
epoch, the validation accuracy of the model is checked; if it is greater than the current
best model's validation accuracy, the best validation accuracy is updated, and the
corresponding weights of the architecture are saved. The best model at the last epoch
is then used for feature extraction in the testing phase. In the feature extraction
process, features are extracted from the penultimate (fc2) layer, as shown in Fig. 10.
Finally, we have employed logistic regression [24], SVM [13], and random forest
[25] to classify ALL and compare their performances.
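The epoch-wise checkpointing described above can be sketched as follows (the epoch history below is hypothetical):

```python
# Sketch of the best-model selection in Fig. 9: after every epoch the
# validation accuracy is compared with the best seen so far, and the
# weights are checkpointed only on improvement.
def select_best(epoch_results):
    """epoch_results: list of (val_accuracy, weights) tuples, one per epoch."""
    best_acc, best_weights = -1.0, None
    for val_acc, weights in epoch_results:
        if val_acc > best_acc:           # strictly better -> update checkpoint
            best_acc, best_weights = val_acc, weights
    return best_acc, best_weights

history = [(0.71, "w1"), (0.88, "w2"), (0.84, "w3"), (0.92, "w4"), (0.90, "w5")]
print(select_best(history))  # (0.92, 'w4')
```

Note that the checkpoint kept at the end is the best model over all epochs, not the model from the final epoch, which protects against late-epoch overfitting.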

Fig. 9 Flowchart for updating the best weight and selecting the model having the best training performance

Fig. 10 Features are extracted from dense layer and fed to different classifiers

4 Result and Discussion

This section deals with a comparative training performance analysis of the three
models; then, the classification performances of logistic regression, SVM, and random
forest are highlighted. Figures 11 and 12 present the performance of Model 1 in the
training phase. On the other hand, the performance of Model 2 in the training phase
is shown in Figs. 13 and 14. Similarly, Figs. 15 and 16 represent the performance of
Model 3 in the training phase. From these figures, we notice that in all three models,
the training and validation accuracies improve with increasing epochs. We also notice
that the training and validation losses decrease with increasing epochs.

Fig. 11 Variation of training and validation accuracies of Model 1 with respect to the number of epochs

Fig. 12 Variation of training and validation losses of Model 1 with respect to the number of epochs

Fig. 13 Variation of training and validation accuracies of Model 2 with respect to the number of epochs

Fig. 14 Variation of training and validation losses of Model 2 with respect to the number of epochs

Fig. 15 Variation of training and validation accuracies of Model 3 with respect to the number of epochs

Fig. 16 Variation of training and validation losses of Model 3 with respect to the number of epochs

From these figures, we observe that Model 1 yields superior performance: it
displays the best training and validation characteristic curves, achieving a validation
accuracy of 92.3%. The validation accuracies of Model 2 and Model 3 are 76.3% and
86.5%, respectively. Hence, Model 1 is selected for feature extraction in the testing
phase.
Table 1 presents the classification performance. From the table, we see that
logistic regression and SVM achieve identical performance with the best sensitivity.
All three methods attain 96.15% accuracy. On the other hand, random forest achieves
the best performance in terms of specificity, precision, and F1 score.

Table 1 Classification performance

Method               Sensitivity (%)  Specificity (%)  Accuracy (%)  Precision (%)  F1 score (%)
Logistic regression  100.00           93.55            96.15         91.30          95.45
Random forest         95.65           96.55            96.15         95.65          95.65
SVM                  100.00           93.55            96.15         91.30          95.45

The best performances are highlighted in bold
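The metrics in Table 1 follow the standard confusion-matrix definitions. As a sketch, the counts below (21 true positives, 2 false positives, 29 true negatives, 0 false negatives) are inferred from the reported SVM percentages assuming a 52-image test split; they are not stated in the paper:

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as percentages."""
    sens = tp / (tp + fn)                        # sensitivity (recall)
    spec = tn / (tn + fp)                        # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)        # accuracy
    prec = tp / (tp + fp)                        # precision
    f1 = 2 * prec * sens / (prec + sens)         # F1 score
    return {k: round(100 * v, 2) for k, v in
            dict(sensitivity=sens, specificity=spec, accuracy=acc,
                 precision=prec, f1=f1).items()}

# Counts consistent with the SVM row of Table 1 (inferred, hypothetical).
print(metrics(tp=21, fp=2, tn=29, fn=0))
```

Running this reproduces the SVM row of Table 1: sensitivity 100.00, specificity 93.55, accuracy 96.15, precision 91.30, and F1 score 95.45.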

5 Conclusion

Notwithstanding the advancement in medical image analysis, the detection of ALL
is still a challenging task. In this work, we have presented an automatic ALL detection
approach. Here, we have suggested a transfer learning-based feature extraction
followed by a machine learning-based classification approach. We have proposed
three transfer learning-based models in the feature extraction stage by introducing
fully connected layers and/or dropout layers in the ResNet50 architecture. The model
which has the best training performance is applied for efficient feature extraction.
We have employed logistic regression, SVM, and random forest to classify ALL and
compare their classification performances.

References

1. Mishra S, Majhi B, Sa PK (2019) Texture feature based classification on microscopic blood smear for acute lymphoblastic leukemia detection. Biomed Signal Process Control 47:303–311. https://doi.org/10.1016/j.bspc.2018.08.012
2. Vogado LH, Veras RM, Araujo FH, Silva RR, Aires KR (2018) Leukemia diagnosis in blood slides using transfer learning in CNNs and SVM for classification. Eng Appl Artif Intell 72:415–422. https://doi.org/10.1016/j.engappai.2018.04.024
3. Al-Dulaimi K, Banks J, Nguyen K, Al-Sabaawi A, Tomeo-Reyes I, Chandran V (2020) Segmentation of white blood cell, nucleus and cytoplasm in digital haematology microscope images: a review of challenges, current and future potential techniques. IEEE Rev Biomed Eng. https://doi.org/10.1109/RBME.2020.3004639
4. Putzu L, Caocci G, Di Ruberto C (2014) Leucocyte classification for leukaemia detection using image processing techniques. Artif Intell Med 62:179–191. https://doi.org/10.1016/j.artmed.2014.09.002
5. Rawat J, Singh A, Bhadauria HS, Virmani J (2015) Computer aided diagnostic system for detection of leukemia using microscopic images. Procedia Comput Sci 70:748–756. https://doi.org/10.1016/j.procs.2015.10.113
6. Rawat J, Singh A, Bhadauria H (2017) Classification of acute lymphoblastic leukaemia using hybrid hierarchical classifiers. Multimed Tools Appl 76:19057–19085
7. El Houby EMF (2018) Framework of computer aided diagnosis systems for cancer classification based on medical images. J Med Syst 42(8):1–11. https://doi.org/10.1007/s10916-018-1010-x
8. Rahman A, Hasan MM (2018) Automatic detection of white blood cells from microscopic images for malignancy classification of acute lymphoblastic leukemia. In: 2018 International conference on innovation in engineering and technology (ICIET). IEEE, pp 1–6
9. Mohapatra S, Samanta SS, Patra D, Satpathi S (2011) Fuzzy based blood image segmentation for automated leukemia detection. In: 2011 International conference on devices and communications (ICDeCom). https://doi.org/10.1109/icdecom.2011.5738491
10. Narjim S, Al Mamun A, Kundu D (2020) Diagnosis of acute lymphoblastic leukemia from microscopic image of peripheral blood smear using image processing technique. In: International conference on cyber security and computer science. Springer, pp 515–526. https://doi.org/10.1007/978-3-030-52856-0-41
11. Das PK, Meher S, Panda R, Abraham A (2020) A review of automated methods for the detection of sickle cell disease. IEEE Rev Biomed Eng 13:309–324. https://doi.org/10.1109/RBME.2019.2917780
12. Agaian S, Madhukar M, Chronopoulos AT (2018) A new acute leukaemia-automated classifi-
cation system. Comput Meth Biomech Biomed Eng: Imaging Vis 6 (3):303–314
13. Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300. https://doi.org/10.1023/A:1018628609742
14. Gong Y, Zhang Y, Zhu H, Lv J, Cheng Q, Zhang H, He Y, Wang S (2020) Fetal congenital heart
disease echocardiogram screening based on DGACNN: adversarial one-class classification
combined with video transfer learning. IEEE Trans Med Imaging 39(4):1206–1222. https://
doi.org/10.1109/TMI.2019.2946059
15. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. In: Proceedings of advances in neural information, pp 1097–1105
16. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014)
Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM
international conference on multimedia, MM’14. ACM, New York, NY, USA, pp 675–678
17. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details:
delving deep into convolutional nets. In: British machine vision conference
18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
19. Ji Q, Huang J, He W, Sun Y (2019) Optimized deep convolutional neural networks for identifi-
cation of macular diseases from optical coherence tomography images. Algorithms 12(3):1–12
20. Labati RD, Piuri V, Scotti F (2011) All-IDB: The acute lymphoblastic leukemia image database
for image processing. In 2011 18th IEEE international conference on image processing. IEEE,
pp 2045-2048
21. Wang S, Zhang L, Zuo W, Zhang B (2020) Class-specific reconstruction transfer learning for visual recognition across domains. IEEE Trans Image Process 29:2424–2438. https://doi.org/10.1109/TIP.2019.2948480
22. Han N, Wu J, Fang X, Xie S, Zhan S, Xie K, Li X (2020) Latent elastic-net transfer learning. IEEE Trans Image Process 29:2820–2833. https://doi.org/10.1109/TIP.2019.2952739
23. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: ICLR, pp 1–14
24. Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression. Wiley, p 398
25. Liaw A, Wiener M (2002) Classification and regression by random Forest. R News 2(3):18–22
Computer-Aided Classifier
for Identification of Renal Cystic
Abnormalities Using Bosniak
Classification

P. R. Mohammed Akhil and Menka Yadav

Abstract In this chapter, an effective computer-aided detection and diagnosis
system is proposed which identifies cystic abnormalities in kidneys from 2D abdominal
CT scan images. The model differentiates between a normal and an abnormal kidney
and, if found abnormal, grades the level of cancer in the kidney. This model-based
system is developed to assist physicians and radiologists in arriving at an accurate
preliminary diagnosis, which would help in appropriate treatment and follow-up. The
proposed system can also be used as a preliminary diagnostic tool in rural and under-
developed areas of the world where quick and easy access to efficient radiologists
is limited. Building a classifier to classify the tumor into normal, grade 1, grade 2,
grade 3, and grade 4 is the key differentiator of this system compared to the existing
tumor identification methodologies available. The concept of Bosniak classification
of renal tumors, the most popular preliminary diagnosis methodology preferred by
radiologists, was used to develop the model using a feed-forward artificial neural
network (ANN). The neural network was trained using 82 CT images, and the trained
network was tested using 30 CT images. The results showed a malignancy classification
rate of 93.75% and a grading accuracy of 86.67% when compared to the clinical
results following biopsy of the tumor.

Keywords Kidney · Cancer · Bosniak classification · MATLAB · Image


processing · Artificial Neural network (ANN) · Computed tomography

P. R. Mohammed Akhil
NVIDIA, Bangalore, India
P. R. Mohammed Akhil · M. Yadav (B)
NIT, Tiruchirappalli, Tamil Nadu, India
e-mail: [email protected]
M. Yadav
ECE Department, MNIT, Jaipur, Rajasthan, India

© Springer Nature Singapore Pte Ltd. 2021 439


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_33
440 P. R. Mohammed Akhil and M. Yadav

1 Introduction

Renal cell carcinoma (RCC) is the ninth most common cancer in men and the 14th
most common cancer in women [1]. RCC accounts for approximately 90% of all
renal malignancies. Traditionally, 30–40% of patients with RCC have died due to
the disease, as compared to 20% mortality rates associated with prostate and urinary
bladder cancers [2]. Overall, the lifetime risk of developing kidney cancer is
about 1 in 48 for men and 1 in 83 for women. Identifying renal tumors correctly
at the right time is crucial. Proper follow-up based on the preliminary diagnosis
will help in reducing complications for patients. Early-stage detection coupled with
proper treatment can help drastically reduce mortality rates related to renal tumors.
Medical imaging plays a crucial role in the detection and diagnosis of tumors.
Prescribing the appropriate imaging technique for a patient is crucial for proper
detection of the tumor. Ultrasound is generally the first step for patients with
suspected renal disease because of its low cost, availability, and lesser radiation
effects [3]. But the role of ultrasound in identifying renal tumors is limited, as
there could be overlap between complex cystic masses and solid lesions. The most
commonly used method to evaluate renal masses is contrast-enhanced CT [4]. It has
also proven to be the first choice for grading renal tumors, with high accuracies in
both early and advanced stages. High resolution, reproducibility, and moderate cost
allow CT to be the primary choice for imaging. CT has a sensitivity of about 90%
for small renal masses, approaching 100% for larger ones [5]. MRI is also used
extensively for the evaluation of intermediate renal masses and for grading of renal
cancer [6]. Traditionally, MRI is preferred when the contrast enhancement in CT is
questionable and the radiologist is unable to make a confirmed preliminary diagnosis.
MRI is also preferred in case of pregnancies or allergies, or for follow-up to reduce
the effects of radiation.
Medical images are most susceptible to salt-and-pepper noise compared to other
types of noise generally found in digital images. The median filter is the most
efficient and common filter used to remove salt-and-pepper noise in digital images.
T. Huang et al. propose a fast two-dimensional median filtering method based
on storing the gray-level histogram values of the pixels in the filtering window to
reduce sorting time compared to conventional methods [7]. But this tends to be
memory-intensive, as the values have to be stored prior to processing. H. Hwang et al.
proposed an adaptive median filter which uses a variable window size, providing
better performance while maintaining sharpness [8]. R. H. Chan et al. modify the
adaptive median filter by dividing the filtering into two stages, namely adaptive
filtering and image regularization [9], showing significant improvement with respect
to edge preservation and noise suppression.
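As a baseline for the adaptive variants discussed above, a plain 3×3 median filter can be sketched as:

```python
import numpy as np

def median_filter3(img):
    """Plain 3x3 median filter (border pixels left unchanged); the
    baseline that the adaptive variants discussed above improve on."""
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = np.median(img[r - 1:r + 2, c - 1:c + 2])
    return out

img = np.full((5, 5), 80, dtype=np.uint8)
img[2, 2] = 255          # an isolated salt-noise pixel
clean = median_filter3(img)
print(clean[2, 2])       # 80: the impulse is removed
```

Because the median of the neighbourhood ignores the single outlier, the impulse disappears without blurring the surrounding intensities, which is exactly why the median filter is preferred over mean filtering for salt-and-pepper noise.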
Accurate and quick segmentation of the kidney is highly essential in computer-aided
diagnosis. Jun Xie et al. came up with a segmentation of the kidney based on texture
and shape priors in ultrasound images [10]. In ultrasound images, region-growing
methods cannot be used because of the large amount of speckle noise; hence, they use
shape

and texture priors. But using prior models is computationally exhaustive in the case of
CT images, because active contour models give good accuracy in such images without
the use of any prior models and with much less computation or memory usage. D. W.
Lin et al. divide the different kidney segmentation techniques into threshold-based,
knowledge-based, and region-growing-based [11]. They propose a computer-aided
kidney segmentation which relies on anatomical structure to develop a coarse-to-fine
segmentation structure. This method is applicable to kidneys of different sizes,
as it makes use of the relative distance of the two kidneys from the spine. Even though
the method seems promising, the entire methodology works on the assumption that
the spine is visible in the (m/2)th slice of the CT images, where m is the total number
of slices. This could very well vary from person to person, and hence, the method
could achieve only about 88% accuracy in segmentation, which demands other methods
that can result in better accuracy. S. A. Tuncer et al. developed an Android application
for mobile devices for the segmentation of kidneys in abdominal images
[12]. In their work, the vertebral column is first determined by applying pre-processing
to the images. Later, connected component labeling is used to obtain the kidney
areas. The results generated on the PC were then transferred to the mobile device. This
method, even though it looks more promising, could attain an accuracy of only 85%,
which demands yet another alternative. N. Farzaneh et al. proposed an automated
kidney segmentation for traumatically injured patients using machine learning and
active contour modeling [13]. This method first develops a 3D initialization mask
within the abdominal cavity, then further divides that cavity into small patches
and extracts multiple features. The features are then used by a random forest classifier
to detect potential initialization voxels. This is followed by adaptive region growing
on both the left and right kidneys to segment them out individually, and the two
results are then combined to form the final segmented image. This method showed a
slightly higher accuracy of 88.9% compared to Lin et al. However, since the main aim
of the proposed method is to provide highly accurate classification of tumors rather
than segmentation, a modification of Lin's methodology is preferred, as it is
computationally faster compared to Farzaneh et al., the latter involving 3D alignment
and machine learning.
Extracting the correct and adequate number of features is crucial to developing an
efficient classification: the more features, the better the chances of developing
a foolproof classification. R. M. Haralick et al. identified texture as one of the most
important features for image classification [14]. They assumed that the texture content
in an image is contained in the overall or average spatial relationship which the gray
tones in an image have to one another. They developed the gray-tone spatial-dependence
matrix, or gray-level co-occurrence matrix (GLCM), which depicts the relationship
between adjacent gray tones in an image. Using the dependence matrix, they developed
a certain set of features, which were tested for accuracy by the classification of
two different datasets, one being five different kinds of sandstones and the other
aerial photographs of eight land-use categories. M. Galloway developed a set
of texture features based on gray-level run lengths [15]. A gray-level run is a set
of consecutive, collinear pixels having the same gray level. The length of the run is the
442 P. R. Mohammed Akhil and M. Yadav
number of pixels in the run. Galloway suggested five texture features based on the
gray-level run length matrix (GLRLM).
Galloway observed that in a coarse texture, relatively long gray-level runs occur
more often, whereas a fine texture primarily contains shorter runs. Later, the
research trend started to shift toward combining both GLRLM and GLCM texture
features to develop more accurate classification methodologies. Recognition of image
patterns independent of size, position, orientation, and reflection is very crucial for
the accurate analysis of medical images. Ming-Kuei Hu derived moment invariants
which could achieve that goal [16].
Accurate classification is the key for any tumor detection system to be efficient.
M.G. Linguraru et al. proposed a computer-assisted radiology tool to assess renal
tumors in triple-phase contrast-enhanced abdominal CT [17]. He classified the renal
tumors into normal cysts, Von Hippel–Lindau syndrome (VHL) lesions, and heredi-
tary papillary renal carcinomas (HPRC). M.G. Linguraru et al. also proposed another
computer-aided renal cancer classification tool from contrast-enhanced CT for proper
management and classification of renal tumors [18]. From the segmented lesions,
classification of different lesion types was done using histograms of curve-related
features using random sampling. Using histogram curve-related features (HCF), the
structural differences were quantifiable which helped in classifying between cysts
and cancers. In this method, five types of lesions were analyzed, namely benign cysts,
Von Hippel–Lindau syndrome (VHL), Birt-Hogg-Dube’(BHD) syndromes, heredi-
tary papillary renal carcinomas (HPRC), and hereditary leiomyomatosis and renal
cell cancers (HLRCC). T. Mangayarkarasi et al. proposed a computer-assistive tool
for classification of different renal pathologies from ultrasound kidney images [19].
Global thresholding is applied for segmentation of the kidneys. From the segmented
kidneys, first-order statistical features such as the mean, entropy, and standard
deviation are extracted and used as inputs to the classifier. A probabilistic neural
network is used to classify the kidney images into normal kidney, kidney stone,
normal cyst, or tumorous cyst. M. Khoshdeli et al. developed a model using
convolutional neural networks for tumor grading from hematoxylin and eosin (H&E) stained
sections of kidney [20]. Using the deep learning model, they were able to classify
the sample images into six categories of normal, fat, blood, stroma, low-grade gran-
ular tumor, and high-grade clear cell carcinoma. All these methods based on the
classification of abdominal CT images simply classify the images as either benign
or malignant. Some methods classify them into different types of tumor diseases. But
there has been no methodology to identify whether a particular kidney is tumorous
and at the same time to know what stage of the tumor it is currently in, which
would help in suggesting appropriate follow-up. This is the technology gap which
is being addressed in this project. Morton A. Bosniak proposed a robust classifica-
tion which helps in differentiating between complex renal cysts and renal tumors
[21]. He proposed four stages of classification of the kidney based on septa formation,
calcification, contrast enhancement, and thickened irregular walls. His classification
of renal tumors is used as the standard reference tool by radiologists worldwide
for the initial diagnosis of renal cysts. In this paper, a computer-aided
diagnosis system, which can classify the kidneys from abdominal CT into normal,
Computer-Aided Classifier for Identification of Renal … 443
Grade 1, Grade 2, Grade 3, and Grade 4, is proposed. Classification is based on the
Bosniak classification using a feed-forward Artificial Neural Network (ANN) [22],
which is trained with a data set comprising a combination of GLCM, GLRLM, and
Hu's moment features extracted from the segmented kidneys.
The remainder of this paper is organized in the following manner. In Sect. 2, the
proposed method is presented. The databases used for evaluation and experimental
setup along with the results are detailed in Sect. 3. Finally, the main conclusions are
presented in Sect. 4.

2 Methodology

Figure 1 presents an overview of the proposed approach. There are four main stages:
pre-processing, kidney segmentation, feature extraction, and classification. Figure 2
represents the flow chart of the proposed CAD system. First, the input scan image
is read into the system. Most image processing operations are performed on grayscale
images; hence, the input scan image is converted to grayscale if necessary, using the
luminosity method [23]. 2D scan images can have different frame sizes, varying with
the patient, time of measurement, presence of contrast, and other factors. In order to
maintain uniformity in the input frame size for the proposed methodology, the input
scan image is resized to 256 × 256 pixels. CT scan images are highly susceptible to
salt-and-pepper noise [24]. To ensure the processing is not affected by ambient input
noise, a median filter is used to remove any noise present [2]. This marks the end of the pre-processing
involved in our system. Both the kidney locations are identified and segmented out
using prior anatomy knowledge and adaptive rectangular contour region growing.
The necessary features are extracted from the kidneys for classification. These are

Fig. 1 Block diagram of the proposed system. The input image, after being pre-processed, undergoes
a series of processing steps; the scan image is finally classified as a normal kidney or into
different grades if found abnormal

Fig. 2 Flow chart explaining the design flow of the system starting from reading the input to the
final classification

fed to a classifier which identifies abnormalities if present and then grades them into
four different grades, namely grade 1, grade 2, grade 3, or grade 4.
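As a rough illustration of these pre-processing steps, a pure-Python sketch of the luminosity conversion and resizing might look as follows (the luminosity weights 0.21R + 0.72G + 0.07B and nearest-neighbor resampling are assumptions; the paper does not specify these implementation details):

```python
def to_gray_luminosity(rgb):
    """Convert an RGB image (nested lists of (r, g, b) tuples) to grayscale
    using the luminosity method: 0.21 R + 0.72 G + 0.07 B (assumed weights)."""
    return [[0.21 * r + 0.72 * g + 0.07 * b for (r, g, b) in row] for row in rgb]

def resize_nearest(img, out_h, out_w):
    """Resize a 2D grayscale image to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

rgb = [[(100, 150, 200), (0, 0, 0)], [(255, 255, 255), (50, 50, 50)]]
gray = to_gray_luminosity(rgb)
small = resize_nearest(gray, 4, 4)   # in the paper the target size is 256 x 256
```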

2.1 Pre-processing

CT images are susceptible to a lot of noise, particularly salt-and-pepper noise. We use
an adaptive median filter, which removes the noise effectively while also preserving
edges well [8].
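A plain 3 × 3 median filter (the non-adaptive baseline; the adaptive variant additionally adjusts its window size when the median itself looks like an impulse) can be sketched as:

```python
def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2D grayscale image (list of lists).
    Border pixels are handled by clamping coordinates to the image edge."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            window.sort()
            out[i][j] = window[4]          # median of the 9 window values
    return out

# A single salt-noise pixel (255) in a flat region is removed entirely.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
clean = median_filter_3x3(noisy)
```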

2.2 Kidney Segmentation

Once the image is de-noised and resized, both the kidneys need to be properly
segmented out for further processing. A modification of Lin’s method of kidney
segmentation is adopted in which prior anatomy knowledge is leveraged to locate
the kidneys, and adaptive region growing is carried out to accurately segment out the
kidneys [11]. Once the spine is located using prior anatomical knowledge, the approximate
location of each kidney is narrowed down relative to the spine. As opposed to the adap-
tive seed growing adopted in Lin's method, an adaptive rectangular contour growing
is used here. Using the approximate location of the kidney based on prior knowledge,
an adaptive rectangular contour is formed. The key differentiator compared
to the existing methods is that the size of the initial rectangular contour for region
growing varies from image to image. The algorithm is summarized as follows:
Algorithm 1: Kidney Segmentation
1. Locate the spine using prior anatomical knowledge
2. Fix an initial seed point for reference based on relative distance from the spine
3. Perform fuzzy c-means clustering on the image [25]
4. Perform a rough segmentation of the kidney region
5. Determine the four corner pixel values of the roughly segmented kidney (x1, x2, x3, x4)
6. Determine the rectangular contour corner pixel values (x1 − 5, x2 − 5, x3 − 5, x4 − 5)
7. Perform the region growing operation until the boundary condition is satisfied [11]
8. Segment out the detected kidney area.
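Step 7 above relies on region growing. A minimal sketch, assuming a simplified intensity-tolerance boundary condition in place of the paper's adaptive criterion, is:

```python
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed` (row, col): a pixel joins the region if it is
    4-connected to it and within `tol` of the seed intensity (a simplified
    boundary condition; the paper's adaptive rectangular-contour criterion
    is more elaborate)."""
    h, w = len(img), len(img[0])
    base = img[seed[0]][seed[1]]
    mask = [[False] * w for _ in range(h)]
    queue = deque([seed])
    mask[seed[0]][seed[1]] = True
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and not mask[ni][nj] \
                    and abs(img[ni][nj] - base) <= tol:
                mask[ni][nj] = True
                queue.append((ni, nj))
    return mask

img = [[9, 9, 1, 1],
       [9, 8, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 9]]
mask = region_grow(img, (0, 0), tol=2)   # grows over the connected 8/9 blob only
```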

2.3 Feature Extraction

Once the kidneys are segmented out, the features required for accurate classification
of the renal abnormalities need to be extracted. There are two considerations in the
selection of features: (a) the more features used, the more robust the classification
becomes, and (b) the extracted features should support classification irrespective of
size, translation, rotation, and/or reflection.
Our extraction strategy includes determining three sets of features: (a) the gray-level
co-occurrence matrix, (b) the gray-level run length matrix, and (c) Hu's moments.
Gray-Level Co-occurrence Matrix (GLCM)
GLCM represents the angular and spatial relationship over an image sub-region of
specific size. Analysis of the GLCM helps in understanding the textural features of the
image. Once the GLCM is calculated, a total of 20 features are extracted
from the matrix.
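As an illustration, a symmetric, normalized GLCM for one pixel offset, together with two of the Haralick features (contrast and energy), can be computed as follows (a toy sketch, not the paper's 20-feature implementation):

```python
from collections import defaultdict

def glcm(img, di, dj, levels):
    """Build a symmetric, normalized gray-level co-occurrence matrix for the
    pixel offset (di, dj) over a 2D image with the given number of gray levels."""
    counts = defaultdict(int)
    h, w = len(img), len(img[0])
    for i in range(h):
        for j in range(w):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:
                counts[(img[i][j], img[ni][nj])] += 1
                counts[(img[ni][nj], img[i][j])] += 1   # make it symmetric
    total = sum(counts.values())
    return [[counts[(a, b)] / total for b in range(levels)] for a in range(levels)]

def glcm_contrast(P):
    """Haralick contrast: sum of P[a][b] * (a - b)^2."""
    return sum(P[a][b] * (a - b) ** 2 for a in range(len(P)) for b in range(len(P)))

def glcm_energy(P):
    """Angular second moment (energy): sum of squared matrix entries."""
    return sum(p * p for row in P for p in row)

img = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]   # tiny 2-level image
P = glcm(img, 0, 1, levels=2)             # horizontal neighbor offset
```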

Gray-Level Run Length Matrix (GLRLM)

GLRLM gives information regarding the number of pixels in a run. A run can be
defined as a set of consecutive, collinear pixels having the same gray level. GLRLM
also helps in understanding the textural features which are crucial for the classification
of patterns/images [15]. A total of 11 features are extracted from the GLRLM. Studies
have shown that GLRLM outperforms GLCM in terms of accuracy in the classification
of images [26]. However, to increase the accuracy of the classification, a combination of
both feature sets is used [27].
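A horizontal run-length matrix and one of Galloway's features (short-run emphasis) can be sketched as follows (illustrative only; a full feature set uses several orientations):

```python
def glrlm_horizontal(img, levels):
    """Gray-level run-length matrix for horizontal runs: R[g][l-1] counts runs
    of gray level g having length l."""
    max_len = max(len(row) for row in img)
    R = [[0] * max_len for _ in range(levels)]
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                R[run_val][run_len - 1] += 1
                run_val, run_len = v, 1
        R[run_val][run_len - 1] += 1    # close the last run of the row
    return R

def short_run_emphasis(R):
    """One of Galloway's five features: weights each run by 1 / length^2,
    so images dominated by short runs score higher."""
    n_runs = sum(c for row in R for c in row)
    return sum(c / (l + 1) ** 2 for row in R for l, c in enumerate(row)) / n_runs

img = [[0, 0, 1], [1, 1, 1]]
R = glrlm_horizontal(img, levels=2)   # runs: 0 (len 2), 1 (len 1), 1 (len 3)
```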
Hu’s Moments
Hu’s moment invariants represent a set of features that remain the same irrespective
of size, translation, rotation and/or reflection. This set of features is crucial for our
system since each input CT scan image is unique and will vary from person to person
depending on age, time of exposure and related factors. Hu’s moments helps us in
developing a general feature set that stays true in all varying conditions.
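For illustration, the first Hu invariant φ1 = η20 + η02 can be computed from normalized central moments as below; translating the pattern leaves the value unchanged:

```python
def hu_phi1(img):
    """First Hu invariant phi1 = eta20 + eta02, computed from the normalized
    central moments of a grayscale image (list of lists)."""
    h, w = len(img), len(img[0])
    m00 = sum(img[y][x] for y in range(h) for x in range(w))
    cx = sum(x * img[y][x] for y in range(h) for x in range(w)) / m00
    cy = sum(y * img[y][x] for y in range(h) for x in range(w)) / m00
    mu20 = sum((x - cx) ** 2 * img[y][x] for y in range(h) for x in range(w))
    mu02 = sum((y - cy) ** 2 * img[y][x] for y in range(h) for x in range(w))
    eta20 = mu20 / m00 ** 2   # normalization exponent 1 + (p+q)/2 = 2 here
    eta02 = mu02 / m00 ** 2
    return eta20 + eta02

# Translating the same pattern inside the frame leaves phi1 unchanged.
a = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```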
Most systems which classify tumors or abnormalities generally use a combination
of GLCM and GLRLM [27], GLCM and Hu's moments [28], or GLRLM and Hu's
moments [29]. In the proposed work, the primary intention is to develop a classifier
which makes use of a combination of all three feature sets. This makes it a unique
feature set as compared to the existing methodologies.

2.4 Classification

The main aim of the proposed system is to accurately identify the abnormali-
ties and also to effectively grade them into different stages. For this, the concept
of Bosniak classification is adopted. Bosniak classification helps in differentiating
between different types of cysts using traits such as calcification, septa formation,
presence of high-density fluid in cysts, and irregularity of walls or solid elements.
Using these traits, the cysts are classified into four main categories:
Category 1
Simple cyst with imperceptible wall and well-rounded shape falls into this category.
These cysts are approximately 0% malignant, and hence, no follow-up is required.
Category 2
A minimally complex cyst with a few <1 mm thin septa or thin calcification falls into
this category. The lesions are non-enhancing under contrast and hence approximately
0% malignant. Follow-up for this category is likewise not required.
Category 3
Intermediately complex cyst with thick, nodular multiple septa with measurable
enhancement under contrast falls into this category. The treatment includes partial
nephrectomy or radio frequency ablation in elderly or poor surgical candidates. The
malignancy risk for cysts falling under this category is approximately 55%.
Category 4
Clearly malignant solid mass with a large cystic or necrotic component falls into this
category. The treatment includes partial or total nephrectomy. The malignancy risk
for cysts falling under this category is 100%.
Using the above as the classification benchmark, a system is developed to effectively
classify the cysts once detected. We use a feed-forward Artificial Neural Network
(ANN) for the multi-stage classification [30].
ANN
Neural networks are the most commonly used model for pattern recognition and
classification. Selection of a particular type of neural network is subject to the appli-
cation in which it is used. Radial basis function neural network is mostly used in
power restoration systems [31]. Kohonen self-organizing neural network is used to
identify patterns in data. It is widely used in medical analysis to cluster the data
into different categories [32]. Recurrent neural network is mostly used in text to
speech (TTS) conversion models [33]. Convolutional neural networks are extensively
used in signal and image processing applications like facial recognition and various
computer vision techniques [34]. The feed-forward neural network is one of the simplest
forms of ANN, where data flows in only one direction. It is one of the most preferred
models in computer vision and pattern recognition when the classification of target
classes is quite difficult [35]. Since the proposed system requires accurate multi-stage
classification, a feed-forward ANN is used for developing the classifier.
In a feed-forward ANN, the data passes through the input nodes and exits through
the output nodes. The presence of hidden layers is optional, subject to their necessity
and the complexity of the data set used in the system. Each layer in the
network comprises various computational units called neurons, which connect
to the adjacent layers. Each neuron multiplies its inputs by weights, sums the
results coming from different neurons, adjusts the result by a bias, and then
produces a normalized output through an activation function.
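The neuron computation described above (weighted sum, bias, activation) amounts to the following sketch of a single dense layer (toy weights, not the trained network):

```python
def dense_forward(x, W, b, act):
    """One feed-forward layer: for each neuron k, compute
    act(sum_j W[k][j] * x[j] + b[k])."""
    return [act(sum(wkj * xj for wkj, xj in zip(row, x)) + bk)
            for row, bk in zip(W, b)]

relu = lambda z: max(0.0, z)

x = [1.0, 2.0]                   # two input features (toy values)
W = [[0.5, -0.25], [1.0, 1.0]]   # two neurons, two inputs each
b = [0.1, -0.5]
h = dense_forward(x, W, b, relu)  # hidden-layer activations
```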
The following concepts are important in the context of ANN:
Initialization
The initial values of the weights and biases greatly affect the convergence of the network.
The aim is to achieve convergence within a minimum time duration. Xavier's random
weight initialization algorithm is used [36]. This method not only
reduces the chances of running into gradient problems, but also helps the network
converge to the least error faster.
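A sketch of Xavier initialization, assuming the Glorot–Bengio uniform variant U(−limit, +limit) with limit = √(6 / (fan_in + fan_out)) (the paper does not state which variant is used):

```python
import math
import random

def xavier_init(fan_in, fan_out, rng=random.Random(0)):
    """Xavier/Glorot uniform initialization: each weight is drawn from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)).
    A seeded RNG is shared across calls for reproducibility of this sketch."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

W = xavier_init(38, 30)   # e.g. 38 input features -> 30 hidden neurons
```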

Activation Function
The activation function is responsible for nonlinear mapping between the inputs
and response variable. For the inputs and hidden layers, we use rectifier linear units
(ReLU), defined as

f(x) = max(0, x)   (1)

ReLU is found to achieve better results than the classical sigmoid or hyperbolic
tangent functions, and it speeds up training [37, 38]. One of the limitations of
ReLU is that some of the gradients can be fragile during training and can die,
resulting in a weight update that prevents the unit from ever activating on any input
again. This limitation is overcome by using the leaky rectifier linear unit (LReLU) [39],
which introduces a small slope that keeps the updates alive. This function is defined as

f(x) = max(0, x) + α min(0, x)   (2)

where α is the leakiness parameter. For the output layer, softmax activation function
is used [40].
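Equations (1) and (2), together with the softmax output activation, can be written directly as:

```python
import math

def relu(x):
    """Rectifier linear unit, Eq. (1)."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.333):
    """Leaky ReLU, Eq. (2); alpha = 0.333 as listed in Table 1."""
    return max(0.0, x) + alpha * min(0.0, x)

def softmax(z):
    """Output-layer activation: exponentiate (shifted for numerical
    stability) and normalize into a probability distribution."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0])   # one score per output class
```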
Regularization
It is used to avoid overfitting. Regularization works by penalizing the coefficients.
We use dropout [41, 42] for the regularization of our network. In this method, nodes
are selected at random and removed, along with all their incoming and outgoing
connections, at every iteration. This ensures randomness in the output produced. The
probability of dropping a node is given by p, which is tuned for better performance
by a grid search [43]. Once sufficient randomness is achieved during training,
all the nodes are used while testing.
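A minimal dropout sketch; the 1/(1 − p) rescaling of surviving nodes ("inverted" dropout) is an assumption, since the paper only states that all nodes are used at test time:

```python
import random

def dropout(activations, p, rng=random.Random(0), training=True):
    """Inverted dropout: during training each activation is zeroed with
    probability p and the survivors are scaled by 1/(1-p); at test time
    all nodes are used unchanged."""
    if not training:
        return list(activations)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

h = [0.5, 1.2, 0.0, 2.0, 0.7]
h_train = dropout(h, p=0.4)                    # p = 0.4 as in Table 1
h_test = dropout(h, p=0.4, training=False)     # identity at test time
```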
Loss Function
The main aim during training is to minimize the loss function. The mean squared error
(MSE) is used, as given in Eq. (3):

L = (1/n) Σ_{i=1}^{n} (y^(i) − ŷ^(i))^2   (3)

where (y^(i) − ŷ^(i)) is termed the residual, and the target of the MSE
loss function is to minimize the residual sum of squares.
To train the ANN, the loss function must be minimized, but it is highly nonlinear.
The Levenberg–Marquardt algorithm is used for optimization [44]. Figure 3 represents a
state diagram for training a neural network with the Levenberg–Marquardt algorithm.
The first step is to calculate the loss, the gradient, and the Hessian approximation. Then,
the damping factor is adjusted so as to reduce the loss function at each iteration.
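The damping-factor logic of the Levenberg–Marquardt loop can be illustrated on a toy one-parameter least-squares problem (a hypothetical model y = a·x, not the paper's network):

```python
def lm_fit(xs, ys, a0=0.0, mu=1e-3, iters=20):
    """Levenberg-Marquardt for the one-parameter model y = a * x.
    The Gauss-Newton Hessian approximation is H = sum(x^2); the damping
    factor mu is relaxed when a step reduces the loss and increased when
    it does not, matching the accept/reject logic of Fig. 3."""
    a = a0

    def loss(a):
        return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

    for _ in range(iters):
        r = [y - a * x for x, y in zip(xs, ys)]        # residuals
        g = sum(-x * ri for x, ri in zip(xs, r))       # half the loss gradient
        H = sum(x * x for x in xs)                     # Hessian approximation
        step = -g / (H + mu)                           # damped normal equation
        if loss(a + step) < loss(a):
            a, mu = a + step, mu * 0.5                 # accept step, relax damping
        else:
            mu *= 10.0                                 # reject step, damp harder
    return a

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # data generated with a = 2
a_hat = lm_fit(xs, ys)
```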

Fig. 3 Levenberg–Marquardt algorithm workflow

3 Results and Discussion

3.1 Database

A total of 112 2D abdominal CT scan images are used as the database for developing
and validating this system. The scan images were acquired from The Cancer Imaging
Archive (TCIA). The data set was acquired from patients aged 15 to 82 and has a
gender composition of 67 male and 45 female patients. This ensures good diversity
and authenticity for the proposed methodology.

3.2 Experiment Setup

The entire setup can be divided into two: training and validation. Training refers to
the development of the proposed classification system and validation refers to testing
the accuracy and authenticity of the developed model.
Training
A set of 82 CT scan images is used for training the feed-forward neural network for
classification. The training data set comprises 5 normal kidney scans, 6 grade
1 scans, 34 grade 2 scans, 32 grade 3 scans, and 5 grade 4 scans. The diagnosis
of each of the training images is obtained after preliminary scans, followed by a
biopsy if needed. For training, each image undergoes pre-processing, segmentation,
and feature extraction. The extracted features are stored in a matrix called 'Data',
which has a dimension of i × j, where i corresponds to the number of features being
extracted and j corresponds to the number of images used for training. The dimension
of the data matrix in our case is hence 38 × 82. The diagnosis result corresponding

Table 1  Hyperparameters for the proposed method

Stage            Hyperparameter   Value
Initialization   Bias             0.1
                 Weights          Xavier
Leaky ReLU       α                0.333
Dropout          p                0.4
Training         Epochs           26
                 Gradient         0.00098142
                 Mu               1e-08

to each training image is stored in another matrix called 'Target', which has a
dimension of k × l, where k corresponds to the number of classification stages and l
corresponds to the number of images used for training. The target matrix formed is
used as the desired output while training the neural network.
Two hidden layers are used in the feed-forward neural network, each with
30 neurons. The choice of the number of hidden layers and the number of neurons
per layer is based on the concept of ensembles of neural networks [45]. Table 1 shows
the hyperparameters used in the classifier network. The values obtained are the
result of repeated iterations until optimum performance is reached.
Testing and Validation
A set of 30 images was used for testing the performance and accuracy of the devel-
oped system. All the images being tested underwent all processing stages of the
proposed system. The result obtained from the developed system was compared
against the diagnosis result of the tumor samples after biopsy. The average time
taken for pre-processing, kidney segmentation, and feature extraction cumulatively
was 41.3 s. The classifier was observed to converge to the final result in an average
of 11 s after 26 iterations. Figure 7 depicts the box plots of the features extracted
from the segmented kidneys. It represents the range of each feature for a particular
grade of classification, along with the mean value and standard error. This helps us
in understanding where each grade of tumor stands from a statistical point of view.
Figure 6 depicts the performance characteristics of the developed classifier. Best
performance is achieved at epoch 26, and the classifier terminates after that.
Figure 4 shows the simulation steps of the designed system for a grade 2 type
tumor. Figure 4a represents the resized scan image ready for processing. The initial
rectangular contour developed on the filtered image is depicted in Fig. 4b; Fig. 4c
represents the final boundary detected by our region growing algorithm, which finally
results in the segmented kidneys of Fig. 4d. 3D volumetric analysis of the segmented
kidneys helps in better understanding the textural distribution, as shown in Fig. 4e.
Figure 5 shows the simulation results of the same workflow for a grade 3 type tumor.

Fig. 4 Simulation work flow of the proposed method for grade 2 type tumor. a Input CT scan
image after resizing to 256 × 256 dimension. b Adaptive rectangular contour created based on
prior knowledge of kidney. c Boundary of kidneys identified using region growing algorithm.
d Segmented kidneys. e 3D textural distribution of the segmented kidneys

Fig. 5 Simulation work flow of the proposed method for grade 3 type tumor. a Input CT scan
image after resizing to 256 × 256 dimension. b Adaptive rectangular contour created based on
prior knowledge of kidney. c Boundary of kidneys identified using region growing algorithm.
d Segmented kidneys. e 3D textural distribution of the segmented kidneys

Fig. 6 Training performance characteristics of the classifier. The classifier attains least error from
the desired output at epoch 26

Fig. 7 Box plot representing the range of the different features extracted for each grade of tumor.
Each graph depicts the mean, first quartile, and third quartile values of a particular feature extracted
corresponding to each grade of tumor

The results obtained are validated using standard performance measures. True
positive fraction and true negative fraction are used to calculate accuracy, sensitivity,
and specificity of the proposed system.
Based on the available literature [46]:
• True positive fraction (TPF) is the ratio between the number of positive
observations and the number of true positive conditions.
• False positive fraction (FPF) is the ratio between number of positive observations
and the number of true negative conditions.
• True negative fraction (TNF) is the ratio between number of negative observations
and the number of true negative conditions.
• False negative fraction (FNF) is the ratio between number of negative observations
and the number of true positive conditions.
Analysis of the database of kidney images received:
Ntot = number of examination cases = 30
Ntp = true positive conditions = 15
Ntn = true negative conditions = 17
Notp = number of positive observations from Ntp = 14
Nofn = number of negative observations from Ntp = 1
Notn = number of negative observations from Ntn = 1
Nofp = number of positive observations from Ntn = 16.
Computation of the final performance measures is given as follows:

Accuracy for normal images = Nofp / Ntn   (4)

Accuracy for abnormal images = Notp / Ntp   (5)

Sensitivity = TP / (TP + FN)   (6)

Specificity = TN / (TN + FP)   (7)

where
• TP = True positive—Predicts abnormal as abnormal.
• FP = False positive—Predicts normal as abnormal.
• TN = True negative—Predicts normal as normal.
• FN = False negative—Predicts abnormal as normal.
Classifier performance (rate of classification) is given by

CR = ((Notp + Nofp) / Ntot) × 100   (8)
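The standard measures can be computed directly from confusion counts; the sketch below uses hypothetical counts (14 true positives, 1 false positive, 16 true negatives, 1 false negative) purely for illustration:

```python
def performance(tp, fp, tn, fn):
    """Compute sensitivity (Eq. 6), specificity (Eq. 7), and overall
    accuracy from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts, not the paper's reported data.
sens, spec, acc = performance(tp=14, fp=1, tn=16, fn=1)
```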

Table 3 depicts the performance measures and classification rate of the proposed
method.
Table 2 shows the comparison of the malignancy classification rate between our
proposed method and other existing methods. The proposed method outperformed all
existing methods, with a classification rate of 93.75%. Out of the 30 images being
tested, once an abnormality was identified, 24 were correctly identified as the respec-
tive grade whereas 4 were misclassified. Out of these 4 images, 3 images were graded

Table 2  Comparison of tumor classification rate with previously reported work

S. no.  Author                     Modality           Methodology used                             Classification rate
1       Boukeroui, D. et al.       Abdominal CT       Automatic region of interest identification  92.81% for simple kidneys and
                                                      using a statistical framework                65.81% for difficult kidneys
2       Kim, D. Y. et al.          Abdominal CT       Texture analysis using histogram features    92.307%
                                                      and seed growing
3       Mangayarkarasi, T. et al.  Kidney ultrasound  PNN-based classifier                         93.5%
4       Proposed method            Abdominal CT       ANN-based classifier using Bosniak           93.75%
                                                      classification

Table 3  Validation of results

S. no.  Image classes  Accuracy (%)  Specificity (%)  Sensitivity (%)
1       Normal         93.33         93.33            86.67
2       Abnormal       94.11         95.12            87.5

Classification rate = 93.75%

to the nearest other grade, while one image was graded as grade 4 although it was
reported as grade 2. The grading accuracy of the developed system is 86.67%.

4 Conclusions

In summary, a novel adaptive rectangular contour-based kidney segmentation,
followed by identification of renal abnormalities and grading of the stage of the detected
tumor, is proposed. The design flow starts with pre-processing, consisting of color-to-
grayscale conversion followed by median filtering and image resizing. After that,
using prior anatomy knowledge, the approximate location of the kidneys is found,
and an adaptive rectangular contour is calculated which grows until the boundary of
the kidneys is determined. The required features are extracted from the segmented kidneys,
and a feed-forward ANN is trained using them to detect and grade the tumor if
present.
In designing the proposed method, the heterogeneity and diversity of the scan images
used for developing the model were taken into account. The adaptive
rectangular contour ensures efficient segmentation regardless of the size and shape
of a patient's kidney. This makes the proposed method more universally applicable.
Factors like irregular size, reflection, and rotation were taken into consideration while

training the neural network classifier using Hu's moments to ensure more accurate
classification. The ANN used is built with two hidden layers with 30 neurons in
each layer. The performance was tested using different configurations of the neural
network, but the current configuration delivered the best performance for the given
data set. It was verified that Bosniak classification can be effectively implemented
in computer-aided diagnosis and that our system could be used as a second opinion
by expert radiologists.
The proposed method was developed using a database from The Cancer Imaging Archive (TCIA).
The method showed a 93.75% classification rate in detecting the presence of
abnormalities in the kidney, which makes our design superior to other existing method-
ologies. The accuracy of grading the tumors into different stages was found to be
86.67%. The accuracy of any neural network depends on the data set used for
training: the larger the data set, the better the chances of the developed system being
accurate. Our design was developed on 112 images; this data set is rather
small, which justifies the lower accuracy in grading. The grading accuracy can be improved
in the future by acquiring a much bigger data set and training the neural network
with it. However, the architecture of the existing classifier might need some changes,
along with the hyperparameters, to ensure better performance.

Acknowledgements We are thankful to the ECE department of NIT Tiruchirappalli for providing
resources to do this work. We are also thankful to MHRD for providing scholarship during Masters.

Conflict of Interest There is no conflict of interest.

References

1. Rini BI, Campbell SC, Escudier B (2009) Renal cell carcinoma. The Lancet 373(9669):1119–
1132
2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics
2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
CA: Cancer J Clinic 68(6):394–424
3. Brascho DJ, Bryan JM, Wilson EE (1977) Diagnostic ultrasound to determine renal size and
position for renal blocking in radiation therapy. Int J Radiat Oncol Biol Phys 2(11):1217–1220
4. Conti P, Strauss L (1991) The applications of pet in clinical oncology. J Nucl Med 32(4):623–
648
5. Warshauer DM, McCarthy SM, Street L, Bookbinder M, Glickman M, Richter J, Hammers
L, Taylor C, Rosenfield A (1988) Detection of renal masses: sensitivities and specificities of
excretory urography/linear tomography, us, and ct. Radiology 169(2):363–365
6. Reznek RH (2004) CT/MRI in staging renal cell carcinoma. Cancer Imag Off Publ Int Cancer
Imag Soc 4:S25–S32
7. Huang T, Yang G, Tang G (1979) A fast two-dimensional median filtering algorithm. IEEE
Trans Acoust Speech Signal Process 27(1):13–18
8. Hwang H, Haddad RA (1995) Adaptive median filters: new algorithms and results. IEEE Trans
Image Process 4(4):499–502
9. Chan RH, Ho C-W, Nikolova M (2005) Salt-and-pepper noise removal by median-type noise
detectors and detail-preserving regularization. IEEE Trans Image Process 14(10):1479–1485

10. Xie J, Jiang Y, Tsui H-T (2005) Segmentation of kidney from ultrasound images based on
texture and shape priors. IEEE Trans Med Imaging 24(1):45–57
11. Lin D-T, Lei C-C, Hung S-W (2006) Computer-aided kidney segmentation on abdominal CT
images. IEEE Trans Inf Technol Biomed 10(1):59–65
12. Tuncer SA, Alkan A (2017) Segmentation of kidneys and abdominal images in mobile devices
with the android operating system by using the connected component labeling method. In:
Proceedings of Electronics and Microelectronics (MIPRO) 2017 40th international convention
information and communication technology, pp 1094–1097
13. Farzaneh N, Soroushmehr SMR, Patel H, Wood A, Gryak J, Fessell D, Najarian K (2018)
Automated kidney segmentation for traumatic injured patients through ensemble learning and
active contour modeling. In: Proceedings of 40th annual international conference of the IEEE
engineering in medicine and biology society (EMBC), pp 3418–3421
14. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification.
IEEE Trans Syst Man Cybern SMC-3(6):610–621
15. Galloway MM (1974) Texture analysis using grey level run lengths. NASA STI/Recon
Technical Report N, vol 75
16. Hu M-K (1962) Visual pattern recognition by moment invariants. IRE Trans Inform Theor
8(2):179–187
17. Linguraru MG, Gautam R, Peterson J, Yao J, Linehan WM, Summers RM (2009) Renal
tumor quantification and classification in triple-phase contrast-enhanced abdominal CT. In:
Proceedings of IEEE international symposium biomedical imaging: from nano to macro, pp
1310–1313
18. Linguraru MG, Yao J, Gautam R, Peterson J, Li Z, Linehan WM, Summers RM (2009) Renal
tumor quantification and classification in contrast-enhanced abdominal ct. Pattern Recogn
42(6):1149–1161
19. Mangayarkarasi T, Jamal DN (2017) PNN-based analysis system to classify renal pathologies
in kidney ultrasound images. In: Proceedings of 2nd international conference computing and
communications technologies (ICCCT), pp 123–126
20. Khoshdeli M, Borowsky A, Parvin B (2018) Deep learning models differentiate tumor grades
from H&E stained histology sections. In: Proceedings of 40th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC), pp 620–623
21. Bosniak MA (1986) The current radiological approach to renal cysts. Radiology 158:1–10
22. Lippmann R (1987) An introduction to computing with neural nets. IEEE ASSP Mag 4(2):4–22
23. Smith K, Landes P-E, Thollot J, Myszkowski K (2008) Apparent greyscale: a simple and fast
conversion to perceptually accurate images and video. In: Computer graphics forum, vol 27,
no 2. Wiley Online Library, pp 193–200
24. Verma R, Ali J (2013) A comparative study of various types of image noise and efficient noise
removal techniques. Int J Adv Res Comput Sci Softw Eng 3(10)
25. Bezdek JC, Ehrlich R, Full W (1984) Fcm: The fuzzy c-means clustering algorithm. Comput
Geosci 10(2–3):191–203
26. Singh K (2016) A comparison of gray-level run length matrix and gray-level co-occurrence
matrix towards cereal grain classification. Int J Comput Eng Technol (IJCET) 7(6):9–17
27. Mohanty AK, Beberta S, Lenka SK (2011) Classifying benign and malignant mass using glcm
and glrlm based texture features from mammogram. Int J Eng Res Appl (IJERA) 1(3):687–693
28. Alegre E, GonzáLez-Castro V, Alaiz-RodríGuez R, GarcíAOrdáS MT (2012) Texture and
moments-based classification of the acrosome integrity of boar spermatozoa images. Comput
Methods Programs Biomed 108(2):873–881
29. Chaieb R, Kalti K (2018) Feature subset selection for classification of malignant and benign
breast masses in digital mammography. Pattern Anal Appl 1–27
30. Li Y, Fan F (2005) Classification of schizophrenia and depression by EEG with anns*. In:
Proceedings of IEEE Engineering in Medicine and Biology 27th Annual Conference, pp 2679–
2682
31. Sadeghkhani I, Ketabi A, Feuillet R (2012) Radial basis function neural network application
to power system restoration studies. Comput Intell Neurosci 3(10)
Computer-Aided Classifier for Identification of Renal … 457

32. Van Biesen W, Sieben G, Lameire N, Vanholder R (1998) Application of kohonen neural
networks for the non-morphological distinction between glomerular and tubular renal disease.
Nephrol Dial Transplant 13(1):59–66
33. Arik SO, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X, Miller J, Ng A,
Raiman J et al (2017) Deep voice: real-time neural text-to-speech. arXiv preprint arXiv:1702.
07825
34. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
35. Jian Z, Wu WX (2011) The application of feed-forward neural network for the x-ray image
fusion. J Phys Conf Se 312(6):062005
36. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feed forward neural
networks. In: Proceedings of the thirteenth international conference on artificial intelligence
and statistics, pp 249–256
37. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. In: Advances in neural information processing systems, pp 1097–1105
38. Jarrett K, Kavukcuoglu K, LeCun Y et al (2009) What is the best multistage architecture for
object recognition? In: 2009 IEEE 12th international conference on in computer vision. IEEE,
pp 2146–2153
39. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic
models. In: Proceedings of ICML, vol 30, no 1, p 3
40. Dunne RA, Campbell NA (1997) On the pairing of the softmax activation and cross-entropy
penalty functions and the derivation of the softmax activation function. In: Proceedings of 8th
Aust. conference on the neural networks, Melbourne, vol 181. Citeseer, p 185s
41. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple
way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
42. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving
neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.
0580,
43. Gal Y, Hron J, Kendall A (2017) Concrete dropout. In: Advances in neural information
processing systems, pp 3581–3590
44. Moré JJ (1978) The Levenberg-Marquardt algorithm: implementation and theory. In: Numerical
analysis. Springer, Berlin, pp 105–116
45. Hansen L, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell
12(10):993–1001
46. Diakides M, Bronzino JD, Peterson DR (2012) Medical infrared imaging: principles and
practices. CRC Press, Boca Raton
Recognition of Obscure Objects Using
Super Resolution-Based Generative
Adversarial Networks
B. Sathyabama, A. Arunesh, D. SynthiyaVinothini, S. Anupriyadharsini,
and S. Md. Mansoor Roomi

Abstract Object recognition has achieved good progress in computer vision, but it remains a difficult task for low-resolution images, because the discriminant features that are visible at high resolution usually disappear at low resolution. In surveillance systems, the regions of interest get blurred due to the distance between the camera and the object and also due to illumination effects. In low-resolution images, objects appear very small and blurred, making their recognition tedious. Super resolution of natural images is a classic and difficult problem in image and video processing, but rapid developments in deep learning have recently sparked interest in image super resolution. In this paper, the generative adversarial network (GAN), which has been successfully employed in generating images and realistic textures with fine details, is extended to the application of image super resolution. The paper aims at improving the resolution of obscure objects to improve the classification accuracy of the system. This is done by detecting obscure objects using RCNN, improving their resolution using a GAN, and finally classifying the improved images using AlexNet. The experiment is conducted using the MSCOCO dataset and a collected shoe database in Google COLAB. The super resolved images increase the classification accuracy by 16%.

Keywords Generative adversarial network (GAN) · Super resolution (SR) · Region-based convolutional neural network (RCNN)

1 Introduction

Object detection is a challenging task in computer vision. It is very difficult because


of the significant amount of variation between the images that belong to the same

B. Sathyabama (B) · A. Arunesh · D. SynthiyaVinothini · S. Md. Mansoor Roomi


Thiagarajar College of Engineering, Madurai, Tamilnadu, India
e-mail: [email protected]
S. Anupriyadharsini
PSG College of Technology, Coimbatore, Tamilnadu, India

© Springer Nature Singapore Pte Ltd. 2021 459


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_34

object category. Other factors that complicate object detection are viewpoint and scale, partial occlusions, illumination and multiple instances. Object detection is the process of finding the position of every object of interest present in an image, i.e., of finding a bounding box for each object. One approach is to use a sliding window to scan the image in scale space and to classify each window individually. According to Dalal, on the INRIA dataset the detection performance is 90% when the image resolution is 640 × 480 and the sliding-window and object size is 64 × 128, but it drops to 40% when the sliding window is 16 × 32. Since the proper size of a pedestrian in the image is not known in advance, detection is difficult when the resolution is low. Objects at low resolution are mostly missed by traditional detection methods, as shown in Fig. 1. It can be noted that traditional detectors detect humans, which are larger in the images, but miss obscure objects like cars. The objects bounded by red bounding boxes are not clear, which makes them hard to recognize. In Fig. 1 [1], the pedestrians marked 5 and 6 cannot be recognized properly because of their low resolution; hence, objects 5 and 6 are classified as obscure objects. Similarly, the cars marked with red boundaries in Fig. 2 are not recognized.
Object detection is important at both low and high resolution, but it is more difficult in low-resolution than in high-resolution images. Image super resolution recovers a single image or a sequence of high-resolution images from a single image or a sequence of low-resolution images. It has several practical applications in real-world problems across a wide range of fields such as satellite and remote sensing imaging, medical imaging, computer vision, security, biometrics and forensics.

Fig. 1 Human detection



Fig. 2 Car detection

1.1 Image Super Resolution

Image super resolution recovers a single image or a sequence of high-resolution images from a single image or a sequence of low-resolution images. Based on the number of input images, SR can be either single-image or multi-image SR. SR algorithms have been developed using frequency domain-based methods, probability-based methods, reconstruction-based methods, regularization-based methods and learning-based methods, including machine learning and deep learning. Single image super resolution (SISR) [2] often uses learning algorithms and attempts to predict the missing information of the reconstructed image by learning the mapping between LR and HR pairs in the training dataset. In today's digital era, enormous data is available for training, but traditional machine learning algorithms struggle to learn the complex high-dimensional mappings required for such massive raw data. With the advent of deep learning [3], powerful deep networks have been trained to achieve state-of-the-art performance; they learn hierarchical representations of the data to extract high-level abstractions that link the LR and HR spaces.

1.2 Deep Learning for Super Resolution

Deep learning is currently making progress in many computer vision fields. With large datasets and computation power available, deep learning achieves good accuracy by end-to-end learning. Since the advent of SR based on the convolutional neural network (SRCNN), deep learning has steadily increased SR performance [3]. SRCNN is a three-layer shallow network that learns an end-to-end nonlinear mapping function. Subsequently, the deeply recursive convolutional network (DRCN) architecture achieved a small model parameter count while still capturing long-range pixel dependencies. The dilated convolutional neural network (DCNN) uses dilated convolutions, also known as atrous convolutions, a method to increase the receptive field of the network exponentially with only linear parameter growth. Increasing the depth of the network can efficiently increase the model's accuracy, as deeper networks have the potential to model highly complex mappings; such deep networks can be trained efficiently using batch normalization. The learning ability of CNNs is made more powerful with skip connections and residual blocks, where instead of an identity mapping the network learns the residue. This design choice relieves the network of the vanishing gradient problem, which remained a bottleneck in training deep networks. VDSR increased network depth by stacking more convolution layers with residual learning. EDSR and MDSR use residual blocks with residual scaling to build wide and deep networks, respectively.
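The residual-learning idea (the network learns only the residue F(x), and a skip connection adds the input back) can be sketched in a few lines. This is a toy NumPy stand-in, not the actual VDSR/EDSR layers; a matrix product takes the place of a real convolution:

```python
import numpy as np

def residual_block(x, weight):
    """Residual learning: the layers learn only the residue F(x);
    the skip connection adds the identity back: y = x + F(x)."""
    fx = np.maximum(0.0, x @ weight)   # toy layer: linear + ReLU
    return x + fx                      # skip connection

x = np.ones((2, 4))
w = np.zeros((4, 4))                   # zero weights, so F(x) = 0
y = residual_block(x, w)
print(np.array_equal(y, x))  # True: with no learned residue, the input passes through unchanged
```

Because gradients flow through the skip connection unchanged, stacking many such blocks mitigates the vanishing gradient problem mentioned above.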

2 Proposed Methodology

Even after the development of deep convolutional networks, detection of small or obscure objects has remained a hard task. A particular object is considered small if it occupies less than 1% of the total image area. This problem appears in many applications today, such as traffic signals, detection of people in shopping malls and detection of cars on roads. Detecting small objects amid generic background clutter is difficult, and small objects become smaller with each pooling layer as the image passes through a CNN architecture like VGG16. For example, if the size of an object is 32 × 32, it is represented by at most one pixel after five pooling layers of VGG16. Another complicating factor is the limited availability of large datasets for small objects; in addition, some small objects have very simple shapes. MS COCO and VOC 2012 contain some instances of small objects, but there are no large datasets dedicated to small objects.
Most existing works focus on recognition of single or multiple clear objects; obscure objects such as license plates, shoes and wall clocks are not recognized. From the human perception viewpoint, an object is recognized if it is classified under a particular class or its information is correctly extracted. Obscure objects may be small, rotated or translated, and not visible due to poor lighting conditions. The common problem in recognizing obscure objects is their low resolution. To improve the recognition accuracy for obscure objects, a super resolution deep learning model is therefore developed. To increase the resolution of the detected object, a generative adversarial network (GAN) is used, with the object detected by region-based convolutional neural networks (RCNN). This method minimizes the generator and discriminator losses and thereby improves the accuracy of the network. The proposed methodology is shown in Fig. 3.
The proposed system for obscure object recognition is super resolution-based deep learning using generative adversarial networks (GAN). The obscure objects are trained on, and their features are extracted, using neural networks. The obtained features are used to detect the objects with a region-based convolutional neural network (RCNN). The region of interest of each detected object is then extracted and super resolved using a generative adversarial network, and the super resolved ROI is finally recognized by AlexNet.

Fig. 3 Proposed methodology (flowchart): input image → object detection using region-based convolutional neural networks (RCNN) → extraction of the region of interest of the obscure object → super resolution of the ROI using generative adversarial networks → recognition/classification of the super resolved ROI (obscure object) using AlexNet → stop

2.1 Stages of Recognition

In the proposed method, for the identification of multiple objects using a super resolution-based deep learning approach, three networks (RCNN, GAN and AlexNet) are used in three stages, as shown in Fig. 4.
Stage I Object Detection using RCNN. Convolutional neural networks (ConvNets
or CNNs) have proven very effective in recognition and classification of images.
RCNN helps in localizing the objects. RCNN consists of three simple steps. The
input image is scanned for possible objects to generate region proposals. CNN runs
on top of each of the regions. The output of each CNN is fed into a classifier that

Fig. 4 Three stages of the proposed work: input image with obscured objects → RCNN (Stage I, detection) → GAN (Stage II, super resolution) → AlexNet (Stage III, classification)

classifies the region, and then a linear regression is done to tighten the bounding box of the object. For an obscure object such as a license plate image, the regions of the objects (number plate, numbers, fonts) are extracted and fed as input to the CNN. The network trains on and extracts features for the objects; the result is a trained network that can be used in the testing stage.
Stage II. Super Resolution Using Generative Adversarial Network. The boundary of the detected object is now extracted, and the region of interest is cropped and fed as input to the generative adversarial network for super resolution [4]. A GAN usually consists of two networks, a generator and a discriminator, and different architectural styles can be used in each.
To enhance the overall quality of the reconstructed SR image, this section first describes the network design for the generator and discriminator and then the improved loss function. Training starts with an HR version of the input image along with its lower-resolution version. The generator is given the low-resolution image and trained to produce an output close to the high-resolution version; the obtained output is the super resolved image. The discriminator is then trained to distinguish between the two kinds of images. The generator network uses a set of residual blocks comprising convolution layers, BatchNorm and ReLUs; after the low-resolution image passes through these blocks, two deconvolution layers increase the resolution. The discriminator has eight convolutional layers leading to a sigmoid activation function that outputs the probability of the image being a real high-resolution image rather than an artificial super resolved one. The architecture of the GAN is shown in Fig. 5 [4].
Loss Functions. The overall loss is a weighted sum of individual loss functions.
Content Loss. This is the Euclidean distance between the feature maps of the reconstructed image (i.e., the output) and the actual high-resolution training image.
Adversarial Loss. This encourages outputs that lie on the original data distribution, via the negative log likelihood. With this loss, the generator outputs larger-resolution images that look natural while remaining close in pixel space to the low-resolution version.
l^{SR} = l_X^{SR} + 10^{-3} l_{Gen}^{SR}    (1)

Fig. 5 Architecture of GAN

Mean Square Error Loss. This is the summed squared difference between the generated image and the target image; it is minimized when the generated image is close to the target.

l_{MSE}^{SR} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I_{x,y}^{HR} - G_{\theta_G}(I^{LR})_{x,y} \right)^2    (2)

VGG Loss. This is the summed squared difference computed in the feature space of the VGG network rather than over the pixels; features are matched instead of pixel values. It makes the generator much more capable of producing natural-looking images than pure pixel matching alone.

l_{VGG/i,j}^{SR} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}(G_{\theta_G}(I^{LR}))_{x,y} \right)^2    (3)

Thus, the ROI is super resolved by using generative adversarial networks and
further provided for classification.
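As a shape-level illustration of Eqs. (1) and (2), the following NumPy sketch computes the MSE content loss on an r-times-upscaled grid and combines it with a placeholder adversarial term. The images and the discriminator probability are random stand-ins, not outputs of the actual networks:

```python
import numpy as np

r, W, H = 4, 25, 25                       # upscaling factor and LR size
rng = np.random.default_rng(0)
I_HR = rng.random((r * H, r * W))         # ground-truth HR image (placeholder)
I_SR = rng.random((r * H, r * W))         # generator output G(I_LR) (placeholder)

# Eq. (2): pixel-wise MSE over the rW x rH high-resolution grid.
l_mse = np.sum((I_HR - I_SR) ** 2) / (r ** 2 * W * H)

# Eq. (1): perceptual loss = content loss + 1e-3 * adversarial loss, where the
# adversarial term is -log D(G(I_LR)) with a placeholder probability of 0.7.
l_gen = -np.log(0.7)
l_sr = l_mse + 1e-3 * l_gen
print(l_mse < l_sr)  # True: the adversarial term adds a small positive amount
```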
Stage III Classification Using AlexNet. AlexNet is a pre-trained convolutional neural network model. As a result of learning, the network has rich feature representations for a wide range of images. It takes an input image and outputs a label for each object along with the corresponding probabilities, and it supports transfer learning [5].
This model has 25 layers, comprising five convolutional layers and three fully connected layers. Normalization is done after each convolutional layer, and ReLU is applied after each convolutional and fully connected layer, as shown in Fig. 6 [5].
Dropout is applied before the first and the second fully connected layer. Dropout

Fig. 6 AlexNet architecture

and normalization are the two layers added to the AlexNet model beyond the four common CNN layers. Dropout is a regularization technique that reduces overfitting in neural networks; overfitting occurs when a function fits a limited set of data points too closely. Dropout prevents complex co-adaptations on the training data [5] and is an effective way to perform model averaging with neural networks. The normalization layer applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.
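The effect of these two layers can be sketched with NumPy; this is a simplified stand-in for the actual AlexNet layers, assuming inverted dropout and a plain whitening transform:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: randomly zero units during training and rescale
    the survivors so the expected activation is unchanged at inference."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def normalize(x, eps=1e-5):
    """Shift and scale activations to roughly zero mean and unit std."""
    return (x - x.mean()) / (x.std() + eps)

a = rng.normal(5.0, 3.0, size=1000)     # raw activations
n = normalize(a)
d = dropout(n, rate=0.5)
print(abs(n.mean()) < 1e-9, round(n.std(), 2))  # True 1.0
```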
So, first, the obscure object is detected using RCNN, the detected image is super resolved using the generative adversarial network, and finally the super resolved image is classified using AlexNet.

3 Results and Discussion

There are three stages of training and testing. For stage I, object detection using RCNN, the MSCOCO dataset is collected from Google, and object detection is implemented in Keras (Python).

3.1 Database for Stage I (RCNN)

COCO (Common Objects in Context) is a large-scale object detection, segmentation and captioning dataset. It contains 123,287 images with 886,284 annotated instances. An example from the COCO dataset is shown in Fig. 7 [6].
Since the COCO dataset covers a wide variety of objects, a model trained on MS COCO is able to detect many kinds of obscure objects; hence, the MS COCO dataset is used for obscure object detection [6].

Fig. 7 MSCOCO Dataset

Table 1 Shoe database

  S. No.   Type       No. of subclasses   No. of images
  1        Boots      5                   13,393
  2        Sandals    3                   6089
  3        Shoes      10                  30,169
  4        Slippers   3                   1069

3.2 Stage II-GAN

In stage II, super resolution using generative adversarial networks, a shoe database of 50,025 images is collected for training. The shoe is taken as one of the obscure objects, and the super resolved shoe image is classified into eight classes (sandals, shoes, slippers, pre-walker boots, over the knee, mid-calf, ankle and boots) by AlexNet in stage III. The database images containing the obscure object (shoes) are summarized in Table 1. In total, 50,025 shoe images across these four categories are collected from the UT Zappos 50K dataset, as shown in Fig. 8 [7].

3.3 Stage III-ALEXNET

In order to train AlexNet to recognize the super resolved images, the shoe images (2100 images) are categorized into eight classes, namely ankle, knee calf, mid-calf, over the knee, pre-walker boots, sandals, slippers and shoes, and trained for 500 iterations.

Fig. 8 Shoe dataset

3.4 RCNN-Based Object Detection

For the RCNN, the input is the region of the object in the image: the region is extracted, labeled and stored, and is used as input to the network. Since the region of the object is given as the input, this convolutional neural network is known as a region-based convolutional neural network. This extraction of regions enables recognition of objects even against complex backgrounds. Here, a Mask RCNN model [6] pre-trained on the MSCOCO dataset is used for object detection; its result is shown in Fig. 9. All objects such as humans, airplanes and cars can be detected using the MS COCO classes. The objects that cannot be recognized are defined as obscure objects, and GAN-based super resolution is applied to them.
Thus, this pre-trained model [6] is used to detect obscure objects in an image. The boundary of the obscure object, which is the region of interest (a shoe is considered here), is extracted. Using the bounding-box coordinates of the ROI, the object is cropped and given as input to the generative adversarial network (GAN). Mask RCNN provides good accuracy compared to other RCNN networks as it involves an instance segmentation process.
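Cropping the detected ROI from the bounding-box coordinates reduces to array slicing. A minimal NumPy sketch follows; the frame and the box are placeholder values, not actual Mask RCNN output:

```python
import numpy as np

def crop_roi(image, box):
    """Crop a region of interest given (y1, x1, y2, x2) bounding-box
    coordinates, in the pixel convention used by detectors like Mask RCNN."""
    y1, x1, y2, x2 = box
    return image[y1:y2, x1:x2]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder input frame
box = (100, 200, 180, 320)                       # hypothetical detection
roi = crop_roi(frame, box)
print(roi.shape)  # (80, 120, 3)
```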

3.5 GAN-Based Super Resolution

For the GAN, two phases of training are carried out with different datasets of images; Table 2 presents the training stages of the GAN.

Fig. 9 Obscure objects detection in RCNN

Table 2 Training stages (GAN)

                  Phase I (SRGAN ×4)   Phase II (SRGAN ×4)
  Total images    682                  50,025
  Training set    650                  951
  Test set        31                   249
  Dimensions      200 × 200 × 3        100 × 100 × 3
  Epochs          500                  500
  Batch size      4                    4

Input to Generator
First, the 100 × 100 image is downsampled to a 25 × 25 image and resized back to 100 × 100 to obtain the low-resolution input.
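Assuming nearest-neighbour resampling, this LR-image preparation can be sketched with NumPy: pixel skipping for the downsample, pixel repetition for the resize back up:

```python
import numpy as np

hr = np.arange(100 * 100, dtype=float).reshape(100, 100)  # toy 100 x 100 image

# Downsample 100 x 100 -> 25 x 25 by keeping every 4th pixel,
down = hr[::4, ::4]

# then resize back to 100 x 100 by repeating each pixel in a 4 x 4 block,
# which yields the blurry low-resolution input for the generator.
lr = np.kron(down, np.ones((4, 4)))
print(down.shape, lr.shape)  # (25, 25) (100, 100)
```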
AlexNet-based Classification
The first layer, the image input layer, requires input images of size 227 × 227 × 3, where 3 is the number of color channels. In the original implementation of AlexNet, to split the network across two limited-memory GPUs for training, some convolutional layers use filter groups: the filters are split into two groups, the layer splits its input into two sections along the channel dimension, applies each filter group to a different section, and then concatenates the two resulting sections to produce the output. For example, in the second convolutional layer of AlexNet, the weights are split into two groups of 128 filters, each filter having 48 channels. The input to the layer has 96 channels and is split into two 48-channel sections; the layer applies each group of filters to a different section, producing two outputs with 128 channels each, and then concatenates them to give a final output with 256 channels. The network has almost 62.3 million parameters, which define the mapping to the labels of each class. This AlexNet model is used for training. In total, 2100 images of eight classes are trained under different categories after creating a CSV file, and three different datasets are tested to evaluate the classification accuracy (Fig. 10).
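The filter-group bookkeeping of that second convolutional layer can be checked at the shape level. This is a NumPy sketch; the weights are random placeholders, and a per-pixel channel projection stands in for the actual convolution:

```python
import numpy as np

x = np.random.rand(96, 27, 27)               # input activation, 96 channels
sections = np.split(x, 2, axis=0)            # two (48, 27, 27) sections

# Two filter groups, each with 128 filters of 48 channels (placeholders).
groups = [np.random.rand(128, 48) for _ in range(2)]

# Apply each group to its own section (1x1-style channel mixing here).
outputs = [np.einsum('oc,chw->ohw', g, s)    # (128, 27, 27) per group
           for g, s in zip(groups, sections)]

# Concatenate the two group outputs into the final 256-channel map.
y = np.concatenate(outputs, axis=0)
print(y.shape)  # (256, 27, 27)
```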
It can be seen that the low-resolution images are classified with an accuracy of 68%, while the super resolved images are classified with a higher accuracy of 84%. Hence, super resolution helps in the recognition and classification of obscure objects.
Performance Metrics
Table 3 shows the performance analysis with the PSNR and SSIM metrics, and Table 4 shows the AlexNet accuracy results.

4 Conclusion

This paper has implemented an obscure object recognition system in which obscure objects are detected using RCNN and then super resolved using generative adversarial networks. The resolution-improved images are then classified by a standard classifier, AlexNet. The experiment is conducted in two phases, one using the MSCOCO dataset and the other using a collected shoe dataset. The proposed work is simulated in the Google Colab framework with the Keras API (TensorFlow backend). The experimental results show that the classification accuracy increases by 16% for the super resolved obscure objects. This task finds direct applications in footprint detection and recognition, abnormal event detection from surveillance videos, license plate recognition, and extracting information from

Fig. 10 a Phase I results (A and B). b Phase II results (A and B). Each row compares the LR image, the generated image and the HR image



Table 3 Performance metrics (GAN)

            Peak signal-to-noise ratio (PSNR)          Structural similarity (SSIM)
  Phase I   Maximum value: 43.4 dB (range 40–43 dB)    Maximum value: 0.356
  Phase II  Maximum value: 48.2 dB (range 43–48 dB)    Maximum value: 0.422
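The PSNR figures in Table 3 follow the standard definition, 10 log10(MAX^2 / MSE). A small NumPy helper, exercised here on toy images rather than the paper's data:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference image and a
    reconstructed (e.g. super resolved) image."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float('inf')               # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 200.0)
noisy = ref + 10.0                        # constant error of 10 gray levels
print(round(psnr(ref, noisy), 2))  # 28.13
```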

Table 4 AlexNet accuracy results

  Test set batch                        No. of images in test set   No. correctly classified   Accuracy (%)
  LR images                             50                          34                         68
  Super resolved images (GAN output)    100                         84                         84
  HR images                             420                         392                        93.3

satellite imagery. The proposed method helps in better recognition of obscure objects [1, 8].

References

1. https://www.cis.upenn.edu/%7Ejshi/ped_html/
2. Huang J-B, Singh A, Ahuja N (2015) Single image super resolution from transformed
self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp 5197–5206
3. Fu L-C, Liu CY (2001) Computer vision based object detection and recognition for vehicle
driving. In: Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automa-
tion (Cat. No.01CH37164), Seoul, South Korea, pp 2634–2641, vol 3. https://doi.org/10.1109/
ROBOT.2001.933020
4. Ledig C et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, pp 105–114. https://doi.org/10.1109/CVPR.2017.19
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks
6. https://github.com/matterport/Mask_RCNN
7. http://vision.cs.utexas.edu/projects/finegrained/utzap50k/
8. Girshick R, Donahue J, Darrell T, Malik J (2016) Region-based convolutional networks for
accurate object detection and segmentation. IEEE Trans Pattern Anal Machine Intell 38(1):142–
158. https://doi.org/10.1109/TPAMI.2015.2437384
Low-Power U-Net for Semantic Image
Segmentation

Vennelakanti Venkata Bhargava Narendra, P. Rangababu,


and Bunil Kumar Balabantaray

Abstract Digital image segmentation, especially semantic segmentation of RGB images, is a computationally intensive task in digital image processing, as it needs low-level pixel-wise information. Image segmentation finds applications in science as well as in industry. Deep learning-based segmentation is by now robustly established as a strong mechanism for image segmentation. Nowadays, graphics processing units (GPUs) are widely used to accelerate deep learning applications, but GPUs are often costly and power hungry. Field programmable gate arrays (FPGAs) provide a low-cost, low-power and power-efficient alternative to GPUs for inference of deep learning architectures. U-Net is one popular deep learning architecture for semantic image segmentation among the many architectures used for segmentation. This work demonstrates the use of an end-to-end system to develop and deploy a low-power deep learning architecture on an FPGA; for this purpose, U-Net, a popular convolutional neural network for semantic segmentation of images, is chosen. When tested on images of size 1024 × 512, the network shows that the FPGA consumes roughly one-third of the power consumed by the GPU, with minimal reduction in accuracy. The network uses the Xilinx® deep learning processor unit (DPU) for implementing convolutional neural networks on the programmable logic of the FPGA.

Keywords Convolutional neural network · Deep learning processing unit · Image segmentation · Field programmable gate array · Graphics processing unit

This work is supported by TEQIP-III project at NIT Meghalaya and was funded by World Bank,
NPIU, and MHRD, Govt. of India.

V. Venkata Bhargava Narendra · P. Rangababu (B)


Department of Electronics and Communication Engineering, National Institute of Technology
Meghalaya, Shillong 793003, India
e-mail: [email protected]
P. Rangababu · B. K. Balabantaray
Department of Computer Science and Engineering, National Institute of Technology Meghalaya,
Shillong 793003, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 473


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_35

1 Introduction

Image segmentation is the task of dividing an image into various regions correspond-
ing to the distinct characteristics of the pixels. Semantic image segmentation is the
task of labeling each pixel of an image to a corresponding class. Let S be the complete
contiguous region occupied by an image. Then, image segmentation can be viewed as a process that divides S into n sub-regions, S_1, S_2, …, S_n, such that

1. S_1 ∪ S_2 ∪ … ∪ S_n = S
2. S_i is a connected set, for i = 1, 2, …, n
3. S_i ∩ S_j = ∅ for all i and j, i ≠ j
4. F(S_i) = TRUE for i = 1, 2, …, n
5. F(S_i ∪ S_j) = FALSE for any adjacent regions S_i and S_j

where F(S_i) is a logical predicate defined over the points in set S_i, and ∅ is the null set. The basic problem in segmentation is to divide an image into regions that satisfy
the above conditions. Humans draw on ample prior knowledge when performing segmentation,
but putting that knowledge into effect would require substantial human effort, computation
time, and a database with substantial domain knowledge. Deep neural network (DNN)
segmentation succeeds in dealing with these problems by extracting the domain
understanding from a database of labeled pixels. An image segmentation neural network
can process small areas of an image to extract simple characteristics. A decision-making
mechanism or another neural network can then integrate these characteristics to label the
areas of the image accordingly. Compared to other deep learning techniques, convolutional
neural networks (CNNs) have shown exceptional performance in various computer vision
problems such as segmentation [14] and object detection [19].
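Of the five conditions above, connectivity (condition 2) is the one worth checking mechanically: on a discrete label map it amounts to a flood fill per region. A minimal sketch, not taken from the chapter (the function name and the 4-neighbourhood choice are our assumptions; conditions 1 and 3 hold by construction on a label map, since every pixel carries exactly one label):

```python
from collections import deque

def regions_connected(labels):
    """Condition 2: each region S_i must be a connected set.
    Flood-fill (4-neighbourhood) from one pixel of every region and
    verify that it reaches all pixels carrying the same label."""
    h, w = len(labels), len(labels[0])
    pixels = {}
    for y in range(h):
        for x in range(w):
            pixels.setdefault(labels[y][x], set()).add((x, y))
    for region, pts in pixels.items():
        start = next(iter(pts))
        seen, queue = {start}, deque([start])
        while queue:
            x, y = queue.popleft()
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= nx < w and 0 <= ny < h
                        and labels[ny][nx] == region and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    queue.append((nx, ny))
        if seen != pts:          # some pixels of S_i were unreachable
            return False
    return True

print(regions_connected([[0, 0, 1],
                         [0, 1, 1]]))   # True: both regions are connected
print(regions_connected([[0, 1, 0],
                         [1, 1, 1]]))   # False: region 0 is split in two
```

The predicate F (conditions 4 and 5) is application-specific, e.g., "all pixels in Si share the same class label", and would be checked separately.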

1.1 Convolutional Neural Networks

CNNs play a major role in the development of computer vision with deep learning.
They are broadly similar to ordinary neural networks but specialize in capturing the
temporal and spatial structure of their inputs. A CNN/ConvNet makes the explicit
assumption that the inputs are images, which makes it easier to embed useful
properties into the architecture. This reduces the number of variables in the network,
making CNNs more efficient than feed-forward neural nets.
CNNs contain filters/kernels and biases that learn various patterns/objects in parallel
Low-Power U-Net for Semantic Image Segmentation 475

specific to the images during the training phase and use that knowledge to identify
the objects during inference. The basic building blocks of a CNN are the
convolutional layer, the pooling layer, and the flattening layer [9].

1.2 Quantized Neural Network

The motivation behind quantizing neural networks is to make the models compact
with little or no loss in accuracy. With quantized neural networks, bitwise
operators can be used rather than floating point operations to perform the forward and
backward passes. In particular, using fixed point operations saves energy [7] and is
well suited where power consumption is a critical factor. The components that can
be quantized in a neural network are the weights, activations, and gradients. Gradients are
quantized to reduce the communication cost between processing units during training,
whereas weights and activations are quantized to reduce the computational intensity
and the network memory footprint during inference. Quantized neural networks (QNNs)
can have independent data representations for inputs, weights, and output activations,
as well as different bit widths in different layers of the same network.
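As a concrete illustration of weight quantization, here is a minimal sketch of symmetric fixed-point quantization. This generic scheme is our assumption for illustration only; the quantizer actually used later in this chapter is the Vitis AI tool, whose internals may differ:

```python
def quantize_symmetric(weights, bits=8):
    """Symmetric fixed-point quantization: map floats onto signed
    integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1] with a single
    scale factor; dequantization is simply code * scale."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    codes = [round(w / scale) for w in weights]
    return codes, scale

codes, scale = quantize_symmetric([0.4, -0.2, 0.1, -1.0], bits=8)
print(codes)                                       # [51, -25, 13, -127]
reconstructed = [c * scale for c in codes]         # error at most scale/2 per weight
```

Dropping to bits=4 leaves only 15 distinct levels per sign range, which is consistent with the drastic mIOU loss reported for the INT4 model later in this chapter.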

1.3 Need of Field Programmable Gate Array (FPGA) for Inference of CNNs

An FPGA is a semiconductor integrated circuit that can be reprogrammed to the desired
application requirements even after manufacturing; in fact, its functionality
can be changed on every power-up. The user can take advantage of properties
such as choosing the degree of parallelism for the application, and of other architectural
benefits, to accelerate the application and/or to reduce power consumption. The benefits of
FPGAs are flexibility, acceleration, integration, and total cost of ownership. CNNs are
naturally computationally intensive models: as a network becomes complex, billions
of arithmetic operations, millions of variables, and considerable computational
power are needed to train and carry out inference of extensive CNNs [14, 19].
Mostly, GPUs are used as hardware accelerators for CNNs [21] during both the training
and inference phases because of their high throughput and memory bandwidth,
as they are well suited for floating point matrix-based operations. But GPUs
are often costly, consume a considerable amount of power, and suffer from poor
power efficiency. Hence, operating GPUs in a power-constrained environment, or
for applications which require high power efficiency, becomes difficult. It has been
well established that FPGAs provide an alternative to GPUs and, when carefully
programmed, can surpass GPUs in performance with less power consumption [13].

2 Related Works

Ronneberger et al. [14] proposed a network that can give accurate segmentations
even when trained on a very small number of images (on the order of tens). The
network consists of a contracting path (downsampling/subsampling path), where
convolutions are applied to the input image and features are extracted, and an expansive
path (upsampling path), which uses the features extracted in the previous step to
construct a segmented image using up-convolutions. This network is more robust
to noise and is especially suitable for medical image segmentation. Moons et al.
[12] presented a methodology to reduce the energy consumption of embedded
neural networks by introducing quantized neural networks; a hardware energy
model is also presented for topology selection. In [11], the authors presented wide reduced-
precision networks (WRPN) for quantization of weights and activations. It was found
that the activations occupy more memory than the weights; hence, the authors adopted
a strategy of using an increased number of filters in each layer to compensate for the
accuracy degradation due to quantization. DNNs have a large number of variables, but not
all of them are equally significant, and DNNs are quite immune to noise [18]. Random
noise functions act as regularizers, and adding noise to inputs or weights can
occasionally attain more desirable performance. In a quantized neural network (QNN),
low-precision operations can be considered as random noise. Recent theories like the
ones discussed in [2] suggest that QNNs still preserve many significant properties
of their full-precision equivalents. Implementation of CNNs on FPGAs emerged
in the 1990s, when the virtual image processor was introduced by Cloutier et al. [3].
The virtual image processor is a single-instruction stream multiple-data stream (SIMD)
multiprocessor architecture. It achieves better performance by using approximate
computing to reduce computational intensity. Umuroglu et al. [20] introduced a neural
network library for fast and easy implementation of neural networks on FPGAs.
The authors employed a novel set of optimizations that allowed robust mapping of
binary neural networks (BNNs) to hardware. They implemented the fully connected,
convolution, and pooling layers, and achieved significant performance at
low power on the MNIST, CIFAR-10, and SVHN datasets. Nurvitadhi et
al. [13] recently assessed emerging DNN algorithms on modern graphics
processing units (GPUs) and FPGAs. The results show that the present trends in
CNNs may favor FPGAs; in some cases, FPGAs may offer greater performance
than GPUs. Suda et al. [17] implemented extensible layers on FPGA and proposed a
systematic design space exploration methodology to increase the throughput of an
OpenCL-based FPGA accelerator for a given CNN model, taking into account the
FPGA resource limitations. Farabet et al. [5] presented a coherent execution of CNNs
on low-end digital signal processor (DSP)-oriented FPGAs. The implementation exploits
the intrinsic parallelism of CNNs and takes full advantage of the multiply-accumulate
(MAC) units on the FPGA. They demonstrated that with proper memory bandwidth
to an external memory, interesting performance can be achieved. Faraone et al. [6]
argue that instead of concentrating on developing efficient designs to speed up
well-known low-precision CNNs, we should also attempt to alter the network to befit

the FPGA. They developed a fully automated tool-flow that concentrates on
altering the network architecture through filter pruning. Su et al. [16] proposed
a quantization training strategy that enables quantized neural network inference with a
reduced memory footprint and competitive model accuracy. They also investigated
the accuracy-throughput trade-off for different parameter precisions applied to various
types of neural network architectures. This work provides valuable insight into
the correlation between data representation (precision) and hardware efficiency.

3 Methodology

3.1 Vitis™ AI Development Kit

The Vitis™ AI development environment comprises the Vitis-AI development kit for
AI inference on Xilinx® hardware platforms. It consists of IP cores optimized
for selected Zynq®-7000 SoC and Zynq® UltraScale+™ MPSoC devices;
tools to quantize, compile, and analyze the performance of the network; libraries
that make it easier to develop the application; and models and example designs that
can be deployed instantly. The development environment allows models to be developed
in familiar frameworks such as Caffe and TensorFlow and facilitates porting
of the models into an FPGA-suitable format with little to no effort. The design is
highly efficient and easy to use, thus unlocking the full potential of AI
acceleration on Xilinx® FPGAs and adaptive compute acceleration platforms (ACAPs).
This enables users to develop deep learning inference applications without in-depth
knowledge of FPGAs, by abstracting away the complexity of the underlying FPGA
and ACAP devices. The Vitis-AI stack is shown in Fig. 1.

3.2 Development Flow

• Choose an application where machine learning is desirable.


• Develop a machine learning architecture using any of the frameworks shown in
Fig. 1 and train the network.
• Quantize the network to the required precision using artificial intelligence (AI)
quantizer.
• Choose the device of your choice and integrate deep learning processor unit (DPU)
with the processing system of the corresponding FPGA in Vivado Design Suite.
• From the design created in the above step, generate a DPU configuration file.
• Now, compile the quantized network using the DPU configuration file with AI
compiler.
• Generate boot files and software development kit (SDK) using the Xilinx Petalinux
tool.

Fig. 1 Vitis-AI stack [18]

• Finally, using SDK generated, develop a software application project in C++/


Python so that the network can run on DPU.
• Copy the software application, boot files and other necessary data into the SD card
and run the DPU application and evaluate the performance.

The flow is summarized in Fig. 2.

3.3 Deep Learning Processing Unit

The Xilinx® DPU is a configurable computation engine dedicated to CNNs. It
has a highly optimized instruction set and supports most CNNs. The
implementation of the DPU is limited to selected Zynq®-7000 SoC and Zynq® Ultra-
Scale+™ MPSoC devices due to the limited number of resources in the programmable logic
(PL) of the devices [1]. The user should provide the instructions required
to run the DPU, as well as memory accessible by the DPU for reading and
writing images and storing temporary data. A program running on the application
processing unit is also required for servicing interrupts and coordinating
data transfers. Figure 3 shows the top-level diagram of the DPU. The configuration of
the DPU used when it is connected to the processing system of the Zynq®-7000 SoC
is shown in Table 1.

Fig. 2 Flowchart representation of the development flow (choose a problem and prepare a dataset;
choose a network; train it until the expected mIoU is reached; quantize the network; evaluate the
mIoU of the quantized model and iterate until there is negligible or no reduction; compile the model;
export the compiled model into the SDK and create a Linux application; copy the boot files,
application, and necessary data onto the SD card and deploy on the FPGA; run the application and
copy the results back to the host PC for postprocessing)

Fig. 3 Deep learning processing unit

Table 1 DPU configuration


Number of cores 1
DPU architecture 1152
RAM usage Low
Channel augmentation Enabled
Depthwise convolution Enabled
Average pool Enabled
ReLU type ReLU + Leaky ReLU + ReLU6
Number of SFM cores 0

3.4 Network Architecture

U-Net is a popular CNN. The architecture builds upon the fully convolutional network
(FCN) [15] so that it can work even with very few training images. The
network comprises a downsampling path and an upsampling path. The downsampling
path is made up of repeated 3 × 3 sliding-window convolutions, each followed by a
nonlinearity (ReLU) and a 2 × 2 max pooling unit with stride 2 for subsampling.
At every subsampling step, the number of feature maps is doubled. The upsampling path
mirrors the downsampling path, except that the pooling layers are replaced with
deconvolutions (up-convolutions) to improve the resolution of the feature maps.
High-resolution feature maps, i.e., feature maps from the second convolution layer
in each step of the downsampling path, are concatenated with the output of the
upsampling path. The network is implemented using the Caffe framework [8] with some
minor modifications in the architecture so that the network can be deployed on
FPGA.1 The modifications are as follows:
• Replacing the un-pooling layer with a deconvolution layer in the expansive path
• Replacing all PReLU with ReLU

1 The code used is taken from Vitis-AI github Repository.



• Replacing all BatchNorm layers with a merged BatchNorm + Scale layer
• Inserting BatchNorm with scaling layers before the ReLU layers, as the DPU does
not support data flow from a convolution to both the concatenation and the ReLU
at the same time.
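The resolution-halving, channel-doubling bookkeeping of the contracting path can be tabulated directly; a small sketch for the 512 × 512 input used in this work (the helper name is ours):

```python
def contracting_path_shapes(size=512, channels=64, levels=5):
    """Feature-map shapes along the U-Net downsampling path: each
    2x2 max-pool (stride 2) halves the spatial size, and the number
    of feature maps is doubled at every subsampling step."""
    shapes = []
    for _ in range(levels):
        shapes.append((size, size, channels))
        size //= 2          # 2x2 max pooling, stride 2
        channels *= 2       # twice as many filters at the next level
    return shapes

print(contracting_path_shapes())
# [(512, 512, 64), (256, 256, 128), (128, 128, 256),
#  (64, 64, 512), (32, 32, 1024)]
```

The expansive path retraces these shapes in reverse, concatenating the stored high-resolution maps at each level.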

3.5 Dataset

The data used to train the network is obtained from [4]. The dataset2 concentrates on
semantic understanding of urban street scenes. It consists of 5000 images
with high-quality annotations and 20,000 images with coarse annotations, captured in 50
cities over a span of several months during daytime. The dataset contains 30
classes that are both instance-wise and densely annotated, so it can be used for
instance segmentation and semantic segmentation, respectively.
The images with high-quality annotations are selected for training the network.

3.6 Training

The network is trained on 2975 images with high-quality annotations using the Caffe
framework. The Adam optimizer [10], a stochastic gradient descent optimization
strategy, is used to achieve convergence. The images are resized to 512 × 512 and
normalized per channel on the fly (Fig. 4).
Training the network depends greatly on the choice of hyperparameters; a wrong
selection can lead to overfitting, underfitting, or failure to train at all. The most
important hyperparameter is the learning rate (0.0005), which tells the optimizer how far
to move the weights in the direction of the gradient. As described in [14], a high
momentum (0.9) is used so that a large number of previously seen samples
influence the update in the current optimization step. Large
weights are penalized by a factor of 0.0005. The "softmax with loss" layer of the
Caffe framework is employed to calculate the cost function. The network processes
eight (batch size) RGB training images per iteration. The network is trained for
2000 iterations on two Nvidia Quadro RTX 5000 GPUs simultaneously (multi-GPU
training). To reduce the training complexity, the network is trained on 19 of the
30 available classes. The classes and their corresponding legends are shown
in Fig. 5.
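The momentum and weight-decay terms described above combine into the familiar update rule; a minimal sketch of one step in its generic Caffe-style form (our formulation for illustration; the solver's exact bookkeeping may differ):

```python
def sgd_momentum_step(w, v, grad, lr=0.0005, momentum=0.9, decay=0.0005):
    """One momentum-SGD step: weight decay penalizes large weights by
    adding decay * w to the gradient, and the momentum term lets the
    samples already seen during training keep steering the update."""
    v_new = [momentum * vi - lr * (gi + decay * wi)
             for wi, vi, gi in zip(w, v, grad)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new

w, v = sgd_momentum_step([1.0], [0.0], grad=[0.2])
# the weight takes a small step against the gradient: w[0] ≈ 0.9999
```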

2 The dataset can be found at https://www.cityscapes-dataset.com/.



Fig. 4 U-Net architecture [14]. Input 512 × 512 × 3; the contracting path applies repeated
Conv2d 3 × 3 + BN + ReLU blocks with 2 × 2 pooling, doubling the feature maps at each level
(64 → 128 → 256 → 512 → 1024), while the expansive path applies 2 × 2 upsampling with
Conv2d 3 × 3 + BN + ReLU blocks and concatenated skip connections; a final Conv2d 1 × 1 +
Softmax produces the 512 × 512 output. The input shape of a layer is shown on the top left of
each box and the input channels on the top right; for the upsampling path they are on the bottom
left and bottom right, respectively



Fig. 5 Class-legend (RGB format)

4 Experiments and Results

4.1 Processing System—Programmable Logic System

The ZedBoard™ is a low-cost development kit that uses the Xilinx Zynq®-7000
APSoC. The board comprises all the necessary connections, ports, and supporting
functions to facilitate a variety of uses. The expandable characteristics of the board make
it well suited for quick prototyping and proof-of-concept development. The PS-PL
connections of the ZedBoard3 are made as shown in Fig. 6. The hardware utilization
reports are shown in Table 2.

4.2 Network Inference and Quantization

As the training progressed, regular mIOU measurements were taken to score the
models against the Cityscapes validation dataset (500 images). The mIOU plot is
shown in Fig. 7. From Fig. 7, it is observed that after 2000 iterations, the mIOU

3 The scripts required to develop the PS-PL system can be found at "LFAR: Porting the ResNet-50
CNN application to a ZedBoard".

Fig. 6 PL-PS connections for Zynq® -7000 SoC



Table 2 Resource utilization


Site type Used Fixed Available Util%
Slice LUTs 33,079 0 53,200 62.18
LUT as logic 31,328 0 53,200 58.89
LUT as memory 1751 0 17,400 10.06
Slice registers 62,912 0 106,400 59.13
Register as flip flop 62,912 0 106,400 59.13
F7 muxes 269 0 26,600 1.01
F8 muxes 19 0 13,300 0.14
Block RAM tile 123 0 140 87.86
RAM36/FIFO 112 0 140 80.00
RAMB18 22 0 280 7.86
DSP 212 0 220 96.36

Fig. 7 mIoU plot

achieved is 0.27124. This model is used for quantization. Three quantized
models are generated, each with a different precision:
• INT8 (both the weights and activations are 8-bit quantized)
• A6W8 (activations are 6-bit quantized, and weights are 8-bit quantized)
• INT4 (both the weights and activations are 4-bit quantized).
The mIOU of the floating point (FP) model and the quantized models is shown in
Table 3. The quantized models were scored against the Cityscapes validation dataset.
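The scoring metric can be stated compactly: per-class IoU = TP / (TP + FP + FN), averaged over the classes present. A minimal sketch on flattened label arrays (the helper name is ours, not part of the evaluation code used in this chapter):

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes that occur in
    either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        tp = sum(p == c and t == c for p, t in zip(pred, target))
        fp = sum(p == c and t != c for p, t in zip(pred, target))
        fn = sum(p != c and t == c for p, t in zip(pred, target))
        if tp + fp + fn:                      # skip absent classes
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

pred   = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, 3), 4))    # 0.5
```

For the full evaluation, pred and target would be the flattened 512 × 512 label maps over all validation images, with num_classes = 19.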

Table 3 meanIoU of floating point model and quantized models


Floating point INT8 A6W8 INT4
mIOU 0.27214 0.270442 0.231647 0.056643

From Table 3, it can be observed that there is almost no loss of mIOU for 8-bit
quantization of weights and activations, while there is a drastic reduction of mIOU
for 4-bit quantization of the weights and activations; the loss of the A6W8
quantized model lies between those of the INT8 and INT4 quantized models. The
8-bit quantized model is compiled for the development of the application. Finally,
a multithreaded segmentation application was developed and deployed on the FPGA. All
500 images of the validation dataset could not fit in the buffer due to hardware
limitations, so the network was tested on five images; a sample
image and the inference results of GPU and FPGA are shown in Fig. 8. The performance
profiling information of the 8-bit neural network on FPGA is shown in Table 4.
The profiler gives the following details for each layer:
• Workload: the computation workload of the DPU kernel, in MOP.
• Memory: the memory space in MB required by the DPU for the
feature maps of the hidden layers.
• RunTime: the time taken to execute each layer.
• Performance: the computation efficiency, in GOPS.
• Utilization: the effective use of the DPU, in %.
• MB/S: the average bandwidth of DDR memory access.
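The Performance column follows directly from the Workload and RunTime columns; a small sketch of the arithmetic (the helper name is ours), checked against the average row of Table 4:

```python
def perf_gops(workload_mop, runtime_ms):
    """Computation efficiency: operations performed per second,
    expressed in GOPS (workload given in MOP, run time in ms)."""
    return (workload_mop * 1e6) / (runtime_ms * 1e-3) / 1e9

# Average row of Table 4: 204,814.16 MOP executed in 2251.94 ms
print(round(perf_gops(204814.16, 2251.94), 1))   # ≈ 91.0 GOPS, as tabulated
```

Utilization is this figure expressed as a fraction of the DPU's peak throughput (about 87.7% on average in Table 4).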
The power consumption and execution time of the network on the graphics processing
unit (Nvidia Quadro RTX 5000) and the FPGA (ZedBoard) are shown in Table 5.
The ZedBoard execution time includes reading/preparing the input and writing
the output, whereas the GPU measurement only includes the forward inference time
of the model.

5 Discussion

The resource utilization of the system deployed on the programmable logic
part of the ZedBoard is shown in Table 2. As the network is large, we can observe
from Table 2 that almost all the block RAMs and DSPs are utilized, along with a
significant portion of the other resources. U-Net is developed using the Caffe
framework. The network is trained for 2000 iterations on two Nvidia Quadro RTX
GPUs and achieved an mIOU of 0.272214. The average forward pass time when
the network is deployed on GPU is 193.986 ms. The power consumed by GPU

(a) Sample Image [?]

(b) Floating point model result evaluated on host

(c) Fixed point model result evaluated on FPGA

Fig. 8 Inference output



Table 4 Profiling information


[DNNDK] Performance profile—DPU Kernel “segmentation_0” DPU Task “segmentation_0-3”
ID Node name Workload (MOP) Mem (MB) Run time (ms) Perf (GOPS) Utilization (%) MB/S
1 conv_d0a_b 452.98 8.44 8.96 50.5 48.7 941.5
2 conv_d0b_c 63.68 16.11 118.42 81.6 78.7 136.1
3 BatchNorm_d0c 16.78 16.02 17.12 1.0 0.9 935.7
4 pool_d0c_1a 0.00 10.03 6.91 0.0 0.0 1451.1
5 conv_d1a_b 4831.84 6.10 54.34 88.9 85.8 112.3
6 conv_d1b_c 9663.68 8.20 99.56 97.1 93.6 82.3
7 BatchNorm_d1c 8.39 8.01 4.40 1.9 1.8 1820.2
8 pool_d1c_2a 0.00 5.01 3.45 0.0 0.0 1453.6
9 conv_d2a_b 4831.84 3.32 49.76 97.1 93.7 66.7
10 conv_d2b_c 9663.68 4.63 99.91 96.7 93.3 46.3
11 BatchNorm_d2c 4.19 4.01 2.22 1.9 1.8 1801.1
12 pool_d2c_3a 0.00 2.51 1.74 0.0 0.0 1438.2
13 conv_d3a_b 4831.84 2.67 48.77 99.1 95.6 54.7
14 conv_d3b_c 9663.68 4.34 96.74 99.9 96.3 44.8
15 BatchNorm_1 2.10 2.00 1.06 2.0 1.9 1899.1
16 pool_d3c_4a 0.00 1.25 0.89 0.0 0.0 1412.3
17 conv_d4a_b 4831.84 5.31 48.55 99.5 96.0 109.4
18 conv_d4b_c 9663.68 10.21 106.26 90.9 87.7 96.1
19 upconv_d4c_u3a 147.48 3.67 23.12 92.9 89.6 158.6
20 conv_u3b_c 19,327.35 7.87 200.08 96.6 93.2 39.4
21 conv_u3c_d 9663.68 4.34 96.75 99.9 96.3 44.8
22 upconv_u3d_u2a 4294.97 6.27 44.12 97.3 93.9 142.0
23 concat_d2cc_u2a_b 0.00 12.01 15.45 0.0 0.0 777.9
24 conv_u2b_c 28,991.03 10.05 293.98 98.6 95.1 34.2
25 conv_u2c_d 9663.68 4.63 99.91 96.7 93.3 46.3
26 upconv_u2d_u1a 2147.48 6.26 22.90 93.8 90.5 273.3
27 concat_d1cc_u1a_b 0.00 16.02 20.79 0.0 0.0 770.8
28 conv_u1b_c 19,327.35 12.41 199.36 96.9 93.5 62.3
29 conv_u1c_d 9663.68 8.20 99.57 97.1 93.6 82.3
30 upconv_u1d_u0a_NEW 2147.48 12.20 24.91 86.2 83.1 489.6
31 conv_u0b_c_New 19327.35 24.21 217.09 89.0 85.9 111.5
32 conv_u0c_d_New 9663.68 16.11 118.42 81.6 78.7 136.1
33 conv_u0d_score_New 318.77 10.42 6.43 49.6 47.8 1619.6
Total (all nodes), Avg. 204,814.16 286.07 2251.94 91.0 87.7 127.0

Table 5 Power and time comparison


GPU (Floating point model) FPGA (INT8 model)
Power 178 Wa 55.5 Wb
Execution time 193.986 ms 2251.94 ms
a The GPU power was estimated using the “nvidia-smi” command while running forward inference.
b The total board power consumed by the ZedBoard is obtained by measuring the voltage across
pins 1 and 2 of J21, dividing by the 10 mΩ shunt resistance to get the input current, and
multiplying by the measured input voltage (12 V).

for forward inference is 178 W. The network is 8-bit quantized and deployed on the
ZedBoard. The time taken for execution of the network (including reading/preparing and
saving/writing the image) is 2251.94 ms. The power consumed by the ZedBoard
is 55.5 W, which is approximately one-third of the power consumed by the GPU. The
lack of acceleration of the application can be attributed to the insufficient
resources on the ZedBoard. One possible way of achieving acceleration is to prune
the network and remove redundant variables; this further compresses the network,
may yield acceleration, and can further reduce power consumption.
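The board-power figure in Table 5 is a simple Ohm's-law computation across the current-sense resistor (reading the footnote's "divide by 10 m" as a 10 mΩ shunt); a small sketch with a hypothetical shunt reading chosen to reproduce the measured 55.5 W:

```python
def board_power(v_shunt, v_in=12.0, r_shunt=0.010):
    """Board power: the voltage across the current-sense resistor
    (pins 1 and 2 of J21) divided by its 10 milliohm resistance gives
    the input current, which is then multiplied by the input voltage."""
    return v_in * (v_shunt / r_shunt)

# A shunt reading of about 46.25 mV corresponds to the measured 55.5 W:
print(round(board_power(0.04625), 1))   # 55.5
```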

6 Conclusion

The use of an end-to-end system provided by Xilinx for deploying convolutional
neural networks on FPGAs is studied and demonstrated by implementing U-Net on a
ZedBoard. The architecture of U-Net is modified so that it fits on the FPGA. The
network is trained on two GPUs simultaneously (multi-GPU training), deployed
on both GPU and FPGA, and the performance is studied. The resource utilization of the
implemented IP (DPU) is shown in Table 2. The performance profiling of the network
on FPGA shows that the network consumes almost one-third of the power consumed
by the GPU; the comparison is shown in Table 5, and detailed per-layer
performance statistics are shown in Table 4. The reduction in the power consumption
of the network when deployed on FPGA once again shows the advantage
of quantizing neural networks, and of using FPGAs in power-critical
applications.

References

1. Xilinx. Vitis AI user guide (ug1414). https://www.xilinx.com/support/documentation/sw_


manuals/vitis_ai/1_0/ug1414-vitis-ai.pdf
2. Anderson AG, Berg CP (2017) The high-dimensional geometry of binary neural networks.
arXiv:1705.07199

3. Cloutier J, Cosatto E, Pigeon S, Boyer FR, Simard PY (1996) VIP: An FPGA-based processor
for image processing and neural networks. In: Proceedings of fifth international conference
on microelectronics for neural networks. IEEE, pp 330–336. https://doi.org/10.1109/MNNFS.
1996.493811
4. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele
B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of
the IEEE conference on computer vision and pattern recognition, pp 3213–3223. https://doi.
org/10.1109/CVPR.2016.350
5. Farabet C, Poulet C, Han JY, LeCun Y (2009) CNP: An FPGA-based processor for convolu-
tional networks. In: 2009 International conference on field programmable logic and applica-
tions. IEEE, pp 32–37. https://doi.org/10.1109/FPL.2009.5272559
6. Faraone J, Gambardella G, Fraser N, Blott M, Leong P, Boland D (2018) Customizing low-
precision deep neural networks for FPGAs. In: 2018 28th International conference on field
programmable logic and applications (FPL). IEEE, pp 97–973. https://doi.org/10.1109/FPL.
2018.00025
7. Guo Y (2018) A survey on methods and theories of quantized neural networks. arXiv preprint
arXiv:1808.04752
8. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T
(2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd
ACM international conference on Multimedia, pp 675–678. https://doi.org/10.1145/2647868.
2654889
9. Kalade S (2019) Deep learning. In: Exploring Zynq MPSoC: With PYNQ and machine learning
applications, Chap. 20, 1 edn. Strathclyde Academic Media, pp 481–508
10. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint
arXiv:1412.6980
11. Mishra A, Nurvitadhi E, Cook JJ, Marr D (2017) WRPN: wide reduced-precision networks.
arXiv preprint arXiv:1709.01134
12. Moons B, Goetschalckx K, Van Berckelaer N, Verhelst M (2017) Minimum energy quantized
neural networks. In: 2017 51st Asilomar conference on signals, systems, and computers. IEEE,
pp 1921–1925. https://doi.org/10.1109/ACSSC.2017.8335699
13. Nurvitadhi E, Venkatesh G, Sim J, Marr D, Huang R, Ong Gee Hock J, Liew YT, Srivatsan
K, Moss D, Subhaschandra S, Boudoukh G (2017) Can FPGAs beat GPUs in accelerating
next-generation deep neural networks? In: Proceedings of the 2017 ACM/SIGDA interna-
tional symposium on field-programmable gate arrays. FPGA’17. Association for Computing
Machinery, New York, NY, USA, pp 5–14. https://doi.org/10.1145/3020078.3021740
14. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. In: International conference on medical image computing and computer-assisted
intervention. Springer, pp 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
15. Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation.
IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683
16. Su J, Fraser NJ, Gambardella G, Blott M, Durelli G, Thomas DB, Leong PH, Cheung PY (2018)
Accuracy to throughput trade-offs for reduced precision neural networks on reconfigurable
logic. In: International symposium on applied reconfigurable computing. Springer, pp 29–42.
https://doi.org/10.1007/978-3-319-78890-6_3
17. Suda N, Chandra V, Dasika G, Mohanty A, Ma Y, Vrudhula S, Seo JS, Cao Y (2016)
Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neu-
ral networks. In: Proceedings of the 2016 ACM/SIGDA international symposium on field-
programmable gate arrays. FPGA’16. Association for Computing Machinery, New York, NY,
USA, pp 16–25. https://doi.org/10.1145/2847263.2847276
18. Sung W, Shin S, Hwang K (2015) Resiliency of deep neural networks under quantization. arXiv
preprint arXiv:1511.06488
19. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594

20. Umuroglu Y, Fraser NJ, Gambardella G, Blott M, Leong P, Jahre M, Vissers K (2017) FINN:
a framework for fast, scalable binarized neural network inference. In: Proceedings of the 2017
ACM/SIGDA international symposium on field-programmable gate arrays, pp 65–74. https://
doi.org/10.1145/3020078.3021744
21. Yazdanbakhsh A, Park J, Sharma H, Lotfi-Kamran P, Esmaeilzadeh H (2015) Neural acceler-
ation for GPU throughput processors. In: Proceedings of the 48th international symposium on
microarchitecture, pp 482–493. https://doi.org/10.1145/2830772.2830810
Electrocardiogram Signal Classification
for the Detection of Abnormalities Using
Discrete Wavelet Transform
and Artificial Neural Network Back
Propagation Algorithm

M. Ramkumar, C. Ganesh Babu, and R. Sarath Kumar

Abstract Analysis of electrocardiogram signals for detecting cardiac abnormalities
using the discrete wavelet transform (DWT) and an artificial neural
network with the back propagation algorithm (BPA) is presented in this study. The
proposed algorithm detects the abnormality condition in an electrocardiogram
signal sample and classifies the ECG signal into two classes,
normal and abnormal. The ECG sample data have been acquired
from the MIT-BIH arrhythmia database. Among the 48 files in the MIT-BIH cardiac
arrhythmia database, 45 files of one minute of recording each have been chosen for
acquisition; of these 45 files, 25 are determined to be of the normal category of
ECG signal and the remaining 20 of the abnormal category. Once the ECG is
acquired from the database, the signal is preprocessed to remove noise. After
denoising, feature extraction is carried out in two different sections: one comprises
the morphological features of the electrocardiogram signal, and the other the features
selected on the basis of the discrete wavelet transform; both are jointly given as
input parameters to the classifier section. An artificial neural network using the
back propagation algorithm serves as the classifier of ECG signals. Its classification
performance is analyzed by calculating the sensitivity, specificity, positive
predictivity, and accuracy percentage. The classification into normal and abnormal
categories resulted in an accuracy of 98.21% using the artificial neural network back
propagation algorithm.

M. Ramkumar (B) · R. Sarath Kumar


Department of Electronics and Communication Engineering, Sri Krishna College of Engineering
and Technology, Coimbatore, India
R. Sarath Kumar
e-mail: [email protected]
C. Ganesh Babu
Bannari Amman Institute of Technology, Sathyamangalam, India

© Springer Nature Singapore Pte Ltd. 2021 493


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_36
494 M. Ramkumar et al.

Keywords ECG · MIT-BIH arrhythmia database · Artificial neural networks ·


DWT · Feature extraction

1 Introduction

The electrical activity of the cardiac muscle, produced by its periodic contraction and relaxation, is recorded as the electrocardiogram signal, abbreviated as the ECG waveform. Analysis of the electrocardiogram signal is used for the diagnosis of various heart diseases and abnormalities; electrocardiography is thus an essential tool for assessing the condition of the cardiac muscle. The stages of ECG signal processing comprise preprocessing (removal of noise), removal of baseline wander, extraction of features, selection of the extracted features, and finally detection of cardiac arrhythmias, that is, classification of the abnormal state of the heart versus the normal state. An ECG signal possesses five basic peaks, denoted the P, Q, R, S, and T peaks; rarely, a U peak may also be present. The P peak in the ECG waveform represents atrial depolarization, the Q, R, and S peaks, usually grouped as the QRS complex, represent ventricular depolarization, and the T peak represents ventricular repolarization [1]. One of the most essential aspects of analyzing the ECG waveform depends purely on the shape of the QRS complex. The measured ECG waveform may differ for the same subject, since the heart beat is observed over various time durations, and similar waveforms may be acquired by the ECG device from different subjects [2]. The pacemaker cells within the sinoatrial (SA) node, located at the top of the right atrium, generate and periodically regulate the heart rhythm. The normal rhythmic beat of the cardiac muscle is highly regular, and depolarization of the atria is followed by depolarization of the ventricles. In an arrhythmic state, by contrast, the cardiac rhythm is irregular and may be either too fast or too slow.
A number of methods for analyzing the ECG signal to detect cardiac arrhythmias have been implemented over the years in order to enhance the sensitivity and accuracy of classification. These methodologies include computational intelligence algorithms such as autoregressive coefficient modeling [3], wavelet coefficient techniques [4], radial basis function neural networks [5], self-organizing maps (SOM) [5, 6], and fuzzy C-means clustering techniques [7].
Electrocardiogram Signal Classification for the Detection … 495

Fig. 1 Functional block diagram of ECG classification using ANN-BPA

2 Proposed Method for Classifying ECG Signals

The functional block diagram of the proposed ECG waveform classification system is shown in Fig. 1. The overall technique is divided into three sections: preprocessing of the ECG signals, feature extraction from the ECG signals, and classification of the ECG signals.
The raw electrocardiogram signal is acquired from the MIT-BIH arrhythmia database. Initially, preprocessing is carried out to remove noise from the electrocardiogram signals, along with removal of the baseline wander by thresholding; these components are unwanted in ECG signal processing. After the preprocessing stage is complete, morphological features and features based on the discrete wavelet transform are extracted and passed to the classification stage as usable data. In the final stage, the extracted features are selected and undergo classification by an artificial neural network using the BPA. The classification output assigns the signal to either the normal or the abnormal ECG category.

3 Collection of Database

In this study, the ECG waveforms were acquired from the MIT-BIH cardiac arrhythmia database, available through the PhysioNet archive.

The ECG records in the MIT-BIH cardiac arrhythmia database were collected at the Beth Israel Hospital laboratory. The database consists of forty-eight files, each a recording of half an hour (thirty minutes), split into two groups: the first contains twenty-three files numbered from 100 to 124 (excluding a few numbers in between), and the second contains twenty-five files numbered from 200 to 234 (again excluding a few numbers in between) [8, 9].
This arrhythmia database comprises about 109,000 beat labels in total. Each ECG record acquired from the MIT-BIH PhysioNet arrhythmia database is read through its header file in text format, its binary signal file, and its binary annotation file. The header file gives a brief description of the ECG record: the total count of samples, the sampling frequency, the waveform format, the ECG lead type, the total count of leads, the history of the patient or subject from whom the signal was acquired, and the relevant medical data of the patient. The ECG signals are stored in format 212, in which each sample is multiplexed across the leads and stored as 12-bit values, while the beat annotations are held in the binary annotation file [9].

4 Preprocessing of Electrocardiogram Signal

The initial stage of ECG processing is preprocessing, in which noise must be eliminated from the acquired input signals using the discrete wavelet transform (DWT). In ECG preprocessing, noise arising from various sources is eliminated using different strategies [10]. Preprocessing is carried out so that noise-free features can be extracted from the denoised ECG component, which enhances the efficiency of the classification system [11]. ECG preprocessing mainly comprises two processes: denoising of the ECG signal and removal of the baseline wander, using the multiresolution wavelet transform technique.
a. ECG Signal Denoising

At this stage, the various types of noise are eliminated using the fourth-order Daubechies wavelet (db4). The ECG denoising procedure comprises three main steps [10]. The wavelet transform yields two sets of coefficients: approximation coefficients and detail coefficients. For one-dimensional discrete signals, the detail coefficients capture the higher frequencies at the initial levels, whereas the approximation coefficients capture the lower frequencies. To denoise the ECG component using the DWT, the signal is decomposed into components at various scales. In the first step, an appropriate wavelet is selected and the signal is decomposed to the Nth level. In the second step, a threshold is selected; different techniques can be used, and in this study an automatic thresholding technique is applied. The selected soft threshold is then applied to the detail coefficients at every level. In the final step, the signal is reconstructed from the Nth-level approximation coefficients and the detail coefficients modified at levels 1 through N [12]. The original ECG signal is shown in Fig. 2, and the denoised (filtered) ECG signal is shown in Fig. 3.
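The three-step denoising procedure described above can be sketched in Python. This is an illustrative sketch rather than the authors' implementation: to keep it dependency-free it uses a Haar wavelet in place of the paper's db4, and the universal (Donoho) soft threshold in place of the paper's automatic threshold selection; the function names are ours.

```python
import numpy as np

def haar_dwt(x, level):
    """Multi-level Haar DWT.

    Returns [approx_N, detail_N, ..., detail_1]; len(x) must be
    divisible by 2**level.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
    return [a] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild the signal from [approx_N, detail_N, ..., detail_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        x = np.empty(2 * len(a))
        x[0::2] = (a + d) / np.sqrt(2)
        x[1::2] = (a - d) / np.sqrt(2)
        a = x
    return a

def denoise_ecg(signal, level=4):
    """Soft-threshold the detail bands and reconstruct (steps 1-3 in the text)."""
    coeffs = haar_dwt(signal, level)
    # Robust noise estimate from the finest detail band; universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
    # Soft-threshold every detail band; the level-N approximation is kept intact.
    coeffs[1:] = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0) for d in coeffs[1:]]
    return haar_idwt(coeffs)
```

Applied to a noisy test trace, the reconstruction retains the low-frequency content carried by the approximation band while suppressing the noise in the detail bands.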
b. Removal of Baseline Wander

Baseline wander is a noise artifact that commonly affects electrocardiogram signals. It normally originates from respiration and lies between 0.15 and 0.30 Hz. Hence, baseline wander must be eliminated when analyzing the electrocardiogram signal, to reduce irregularities in the morphological realization of the ECG beats. In our proposed study, baseline wander is removed from the electrocardiogram signal by first loading the raw ECG waveform and then smoothing the data in the vector column

Fig. 2 Representation of raw original ECG signal



Fig. 3 Representation of filtered ECG signal

of Y by using a moving average filter; the results are then obtained from vector column Y [12]. The span chosen for the smoothing in this work is 150, and at the final stage, the smoothed electrocardiogram signal is subtracted from the acquired raw ECG component. The signal is thereby obtained in a pure form, free of baseline wander.
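The baseline removal step above can be sketched as follows, using the same moving-average span of 150 samples; the function name and the edge handling (np.convolve with mode="same") are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def remove_baseline_wander(ecg, span=150):
    """Estimate the baseline with a moving-average filter and subtract it.

    The span of 150 samples matches the value chosen in the text.
    Returns the baseline-free signal and the estimated baseline.
    """
    kernel = np.ones(span) / span
    baseline = np.convolve(ecg, kernel, mode="same")  # smoothed ECG = baseline estimate
    return ecg - baseline, baseline
```

For a slowly drifting input, the interior of the corrected signal is driven close to zero, with residual error only near the record edges where the averaging window is truncated.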

5 Feature Extraction of ECG Signal

Once denoising and baseline wander removal have been applied, the next stage of ECG signal classification is extracting the features from the ECG component and proceeding with the analysis. Representing the ECG data in the compressed form of processing parameters, extracted using the WT technique and denoted as features, is one of the most essential capabilities of the system. Feature extraction is one of the most essential steps in pattern recognition, and ECG features can be extracted in several forms. In this study, two types of features are extracted from the ECG component. They are as follows:
(a) ECG signal morphological features.
(b) Features on the basis of wavelet coefficients.
Electrocardiogram Signal Classification for the Detection … 499

Fig. 4 Representation of approximation and the detail coefficients of ECG signal

The processes of extracting features and selecting the extracted features play an essential role in pattern recognition. The coefficients obtained from the discrete wavelet transform (DWT) describe the distribution of the signal over the frequency and time domains [13]. Figure 4 depicts the approximation and detail coefficients of the ECG component. The computed detail and approximation coefficients of the ECG wavelet decomposition are therefore taken as the feature vector representing the signal. Using the wavelet coefficients directly as the input set of the neural network classifier would increase the number of neurons in the hidden layer, which harms the computational performance of the neural network. Hence, dimensionality reduction must be implemented at the feature extraction stage. For this dimensionality reduction, a statistical (probabilistic) analysis of the wavelet coefficients is carried out. The features arising from this statistical representation of the frequency-time distribution of the ECG signal component are as follows:
(a) Mean of the detail and approximation sub-band coefficients.
(b) Standard deviation of the detail and approximation sub-band coefficients.
(c) Variance of the detail and approximation sub-band coefficients at each level.
Hence, for each ECG signal, forty-eight wavelet-based features are acquired. Along with these probabilistic (statistical) features, morphological ECG features are also acquired for processing by the classifier [11].

The features related to ECG morphology are the standard deviation of the R–R interval; the P–R, P–T, S–T, T–T, and Q–T intervals; the maximum amplitudes of the P, Q, R, S, and T peaks and of the QRS complex; and the total count of R peaks. In total, 64 features are thus acquired and processed as input to the artificial neural network classifier [14]. Since the feature magnitudes can be quite variant, normalization is needed to standardize all features to the desired level. Normalizing by the mean and standard deviation allows the artificial neural network to treat each input as equally important over its specified value range.
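The dimensionality reduction above, three statistics per sub-band, and the subsequent mean/standard-deviation normalization can be sketched as follows; the function names are illustrative, and the z-score form shown is one common way to realize the normalization the text describes.

```python
import numpy as np

def wavelet_statistics(subbands):
    """Mean, standard deviation, and variance of each DWT sub-band.

    `subbands` is a list of coefficient arrays (level-N approximation plus
    the detail bands); the three statistics per band form the reduced
    wavelet feature vector described in the text.
    """
    feats = []
    for band in subbands:
        band = np.asarray(band, dtype=float)
        feats.extend([band.mean(), band.std(), band.var()])
    return np.array(feats)

def zscore_normalize(feature_matrix):
    """Standardize each feature column to zero mean and unit variance."""
    mu = feature_matrix.mean(axis=0)
    sigma = feature_matrix.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard constant features against division by zero
    return (feature_matrix - mu) / sigma
```

Stacking `wavelet_statistics` outputs with the morphological measurements row-per-record, then applying `zscore_normalize` column-wise, yields the standardized feature matrix fed to the classifier.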

6 Artificial Neural Network Classifier Using Back


Propagation Algorithm

An artificial neural network (ANN) is one of the best-known computational intelligence techniques, motivated by the structure of biological neural networks. A neural network consists of an interconnected cluster of artificial neurons. This study uses a neural network for pattern recognition, in which the input units represent the feature vector and the output units represent the pattern classes among which the classification is to be made. Each feature vector, presented as an input vector, is sent to the input layer, and each unit's output is the corresponding element of the vector. Each unit of the hidden layer computes the weighted sum of its inputs to form a scalar net activation. The net activation is defined as the inner product of the weight vector and the input vector at that hidden layer unit [15].
a. Back Propagation Algorithm
An artificial neural network using the back propagation algorithm learns, from the data, the mapping between input and output information within a multilayer network. With back propagation, the ANN carries out a gradient descent search to reduce the mean square error (MSE), measured between the desired output and the output actually obtained from the network, by adjusting the weights. The back propagation algorithm attains high precision and has been adapted for most classification applications in which the system works with generalized information and designated rules.
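A minimal back-propagation trainer matching this description, one sigmoid hidden layer, MSE loss, batch gradient descent, can be sketched as below. The layer sizes, learning rate, and epoch count here are illustrative defaults, not the paper's settings; targets are coded as two-element vectors, as in the (0, 1)/(1, 0) scheme used later in the text.

```python
import numpy as np

def train_bpa(X, Y, hidden=20, lr=0.5, epochs=2000, seed=0):
    """Back propagation on a single-hidden-layer network with MSE loss.

    X: (samples, features); Y: (samples, outputs). Returns a prediction
    function. Hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    n = len(X)
    for _ in range(epochs):
        H = sig(X @ W1 + b1)              # hidden-layer activations
        O = sig(H @ W2 + b2)              # network outputs
        dO = (O - Y) * O * (1.0 - O)      # MSE gradient through output sigmoid
        dH = (dO @ W2.T) * H * (1.0 - H)  # error back-propagated to hidden layer
        W2 -= lr * (H.T @ dO) / n
        b2 -= lr * dO.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Xn: sig(sig(Xn @ W1 + b1) @ W2 + b2)
```

On a simple two-class toy problem, the trained predictor separates the classes by taking the argmax of the two output neurons.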

7 Simulated Results and Discussions of ANN Classifier


Using Back Propagation Algorithm

The MIT-BIH cardiac arrhythmia database is partitioned into two classes: normal ECG and abnormal ECG. A 60-second recording is acquired as data from each file and assigned to one of the two classes on the basis of the total count of heart beats and the maximum peak attained. Out of the forty-eight ECG recordings, each of half an hour, only forty-five were chosen for processing as mentioned earlier (twenty-five records acquired from the normal class and the remaining twenty from the abnormal class). Of the forty-eight files in the PhysioNet database, the subject files 102, 217, and 107 were not considered in this classification study. Table 1 shows the record numbers used from the MIT-BIH PhysioNet database. In all, sixty-four features are used, split into the two groups mentioned above: features based on the DWT and features obtained from the ECG morphology. Among the 64 features, forty-eight are based on the discrete wavelet transform, and the remaining sixteen are morphological ECG features. These features are given as input to the artificial neural network using the back propagation algorithm. To proceed with the simulation, the neural network is first trained on the data. By combining the extracted features, a 64 × 26 matrix was formed as the training input data, and 19 records were used for testing the neural network.
The simulation results were obtained using the ANN classifier with the back propagation algorithm. In total, 20 neurons were used for training and testing on the ECG component. Two neurons are used in the output layer of the neural network, denoted (0, 1) and (1, 0), corresponding to the normal and the abnormal category, respectively.
Sensitivity, specificity, positive predictivity, and accuracy are the performance metrics used to evaluate the classification. The formulas for these metrics are given in the following equations.

Sensitivity(Se)% = TP/(TP + FN) ∗ 100 (1)

Table 1 Record distribution from the MIT-BIH cardiac arrhythmia physionet database
S. no. Classification Numbering of records
1 Normal category 100, 101, 103, 105, 106, 112, 113, 114, 115, 116, 117, 121, 122,
123, 201, 202, 205, 209, 213, 215, 219, 220, 222, 234
2 Abnormal category 104, 108, 109, 111, 118, 119, 124, 200, 203, 207, 208, 210, 212,
214, 217, 221, 223, 228, 230, 231, 232

Specificity(Sp)% = TN/(TN + FP) ∗ 100 (2)

Positive Predictivity(Pp) = TP/(TP + FP) ∗ 100 (3)

Accuracy(ACC) = (TP + TN)/(TP + FN + FP + TN) ∗ 100 (4)

where TP denotes true positive, TN true negative, FP false positive, and FN false negative:
FP: normal clusters classified as abnormal.
TP: abnormal clusters classified as abnormal.
FN: abnormal clusters classified as normal.
TN: normal clusters classified as normal.
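Equations (1)–(4) translate directly into code. As a sanity check, plugging in the counts implied by Table 2 (TP = 20 abnormal files correctly classified, TN = 24, FP = 1, FN = 0) reproduces the reported overall accuracy of about 97.8%; the per-class averages in the table are computed differently and are not reproduced here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, positive predictivity, accuracy (Eqs. 1-4), in %."""
    se = tp / (tp + fn) * 100.0
    sp = tn / (tn + fp) * 100.0
    pp = tp / (tp + fp) * 100.0
    acc = (tp + tn) / (tp + fn + fp + tn) * 100.0
    return se, sp, pp, acc
```

With TP = 20, TN = 24, FP = 1, FN = 0 this gives an accuracy of 44/45, which rounds to 97.8%.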
The confusion matrix obtained by the neural network classifier is given in Table 2. The accuracy percentage is determined as the ratio of the correctly classified ECG samples to the total count of samples. The neural network training window is shown in Fig. 6, and the training state and best training performance are given in Figs. 7 and 8, respectively.
As the confusion matrix reveals, one normal record was wrongly classified as abnormal; that is, out of the 25 normal ECG files, twenty-four were classified correctly. In the abnormal category, all ECG files were correctly classified. The classification accuracy is thus 96% for the normal category and one hundred percent for the abnormal category, and the overall classification performance averages 97.8% accuracy. Along with the accuracy, the average sensitivity, specificity, and positive predictivity are 95%, 97.65%, and 98%, respectively. The performance metric evaluation of the accuracy, sensitivity, specificity, and positive predictivity percentages is shown in Fig. 5.

Table 2 Confusion matrix representation of ANN-BPA classifier

Method of classification: ANN using back propagation

Output/target        Normal  Abnormal  Accuracy  Sensitivity  Specificity  Positive
                                       (%)       (%)          (%)          predictivity (%)
Normal category      24      0         96        90           95.3         96
Abnormal category    1       20        100       100          100          100
Total                25      20        97.8      95           97.65        98


Fig. 5 Performance metric evaluation of normal and abnormal category

Fig. 6 Neural network training window



Fig. 7 Neural network training state

Fig. 8 Best training performance of neural network

8 Conclusion

This proposed work has demonstrated the detection of abnormal states in the ECG signal on the basis of the DWT and an ANN using the back propagation algorithm (BPA). It attains an average sensitivity, specificity, positive predictivity, and accuracy of 95%, 97.65%, 98%, and 97.8%, respectively. The classification was performed on electrocardiogram signals acquired from the MIT-BIH cardiac arrhythmia database on the basis of the electrocardiogram heart beats aligned with them. By considering 45 ECG files from the PhysioNet database and a total of 64 features, including both statistical and morphological features, the computational intelligence system achieved an overall optimized accuracy of 97.8% in categorization. As future work, the optimization could be improved for classifying cardiac abnormalities and detecting abnormal conditions from real-time ECG acquisitions.

References

1. Maglaveras N, Stamkapoulos T, Diamantaras K, Pappas C, Strintzis M (1998) ECG pattern recognition and classification using non-linear transformations and neural networks: a review. Int J Med Inform 52:191–208
2. Osowski S, Linh TH (2001) ECG beat recognition using fuzzy hybrid neural network. IEEE
Trans Biomed Eng 48:1265–1271
3. Srinivasan N, Ge DF, Krishnan SM (2002) Autoregressive modeling and classification of cardiac arrhythmias. In: Proceedings of the second joint conference, Houston, TX, USA, 23–26 October 2002
4. de Chazal P, Celler BG, Rei RB (2000) Using wavelet coefficients for the classification of
the electrocardiogram. In: Proceedings of the 22nd Annual EMBS international conference,
Chicago IL, 23–28 July 2000
5. Hussain H, Fatt LL (2007) Efficient ECG signal classification using sparsely connected radial
basis function neural network. In: Proceeding of the 6th WSEAS international conference on
circuits, systems, electronics, control and signal processing, December 2007, pp 412–416
6. Risk MR, Sobh JF, Philip Saul J (1997) Beat detection and classification of ECG using self organizing maps. In: Proceedings of the 19th international conference of the IEEE/EMBS, Chicago, IL, USA, October 30–November 2, 1997
7. Ozbay Y, Ceylan R, Karlik B (2011) Integration of type-2 fuzzy clustering and wavelet transform in a neural network based ECG classifier. Expert Syst Appl 38:1004–1010
8. Stamkopoulos T, Maglaveras N, Diamantaras K, Strintzis M (1998) ECG Analysis using
nonlinear PCA neural networks for ischemia detection. IEEE Trans Signal Process 46(11)
9. Dokur Z, Olmez T (2001) ECG beat classification by a novel hybrid neural network. Comput
Methods Prog Biomed 66:167–181
10. Zhou J (2003) Automatic detection of premature ventricular contraction using quantum neural
networks. In: Proceedings of the Third IEEE symposium on bioinformatics and bioengineering
11. Jadhav SM, Nalbalwar SL, Ghatol AA (2010) ECG arrhythmia classification using modular neural network model. In: IEEE EMBS conference on biomedical engineering & sciences (IECBES 2010), Kuala Lumpur, Malaysia, November 30–December 2, 2010
12. Inan OT, Giovangrandi L, Kovacs GTA (2006) Robust neural-network-based classification of
premature ventricular contractions using wavelet transform and timing interval features. IEEE
Trans Biomed Eng 53(12)
13. Jiang W, Kong SG (2007) Block-based neural networks for personalized ECG signal
classification. IEEE Trans Neural Netw 18(6)
14. Llamedo M, Martínez JP (2011) Heartbeat classification using feature selection driven by database generalization criteria. IEEE Trans Biomed Eng 58(3)
15. Melgani F, Bazi Y (2008) Classification of electrocardiogram signals with support vector
machines and particle swarm optimization. IEEE Trans Inf Technol Biomed 12(5)
Performance Analysis of Optimizers for
Glaucoma Diagnosis from Fundus
Images Using Transfer Learning

Poonguzhali Elangovan and Malaya Kumar Nath

Abstract Transferring the weights from the pre-trained model results in faster and
easier training than training the network from scratch. The proper choice of optimizer
may improve the performance of the deep neural networks for image classification
problems. This paper analyzes and compares three standard first-order optimizers like
stochastic gradient descent with momentum (SGDM), adaptive moment estimation
(Adam), and root mean square propagation (RMSProp), particularly for detecting
glaucoma from fundus images using different CNN architectures like AlexNet, VGG-
19, and ResNet-101. Experimental results show that updating the network parameters with the Adam optimizer yields better results on most of the databases. Among the models,
VGG-19 has obtained the highest classification accuracy of 91.71, 87.8, and 97.12%,
in DRISHTI-GS1, RIM-ONE(2), and LAG databases, respectively. ResNet-101 has
outperformed other networks in ORIGA and ACRIMA databases, with the highest
classification accuracy of 80.5% and 98.5%, respectively.

Keywords Fundus image · Glaucoma · Optimizers · Transfer learning

1 Introduction

The optic nerve is a bundle of nerve fibers located at the back of the human eye. It carries visual information from the retina to the brain. Among the various retinal disorders, glaucoma is the most common disorder affecting the optic nerve. Due to increased
intraocular pressure, the optic nerve may get compressed and damaged. This results
in loss of peripheral vision. Normally, glaucoma has no symptoms at the initial
stages. Diagnosis of this retinal disorder at an early stage is a challenging task. Some
traditional techniques like intraocular pressure measurement, optic nerve head eval-
uation, and visual field testing may have certain limitations, which can be overcome
by computer-aided diagnosis (CAD) approaches.

P. Elangovan (B) · M. K. Nath


Department of ECE, National Institute of Technology Puducherry, Karaikal, India

© Springer Nature Singapore Pte Ltd. 2021 507


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_37
508 P. Elangovan and M. K. Nath

Fundus imaging, optical coherence tomography (OCT), retinal thickness analyzer


(RTA), scanning laser polarimetry (SLP), Heidelberg retinal tomography (HRT) are
commonly used imaging modalities for glaucoma diagnosis. Among these modali-
ties, fundus photography is simple and reliable for identifying various pathologies
related to different retinal diseases. A fundus image is a two-dimensional representation of the interior surface of the human eye. The brightest portion of the fundus image is the optic disk, which is considered the region of interest for detecting glaucoma. The optic cup (OC) is the central part of the optic disk (OD), and a progressive increase in optic cup size is a clinical sign of glaucoma [1].
Computer-aided diagnosis of glaucoma from fundus images mainly comprises feature extraction, feature reduction, and classification. Parameters like the cup to disk ratio (CDR), the ISNT rule, cup entropy, rim entropy, etc., are calculated from the segmented OD and OC [2–4]. Based on the computed parameters, glaucoma can be detected from the fundus images. Discriminative features have also been extracted using the wavelet transform [5, 6], with classification done by a suitable classifier. Researchers have achieved promising results in both domains, but the major drawback is that these techniques depend mainly on hand-crafted features. Since the invention of the convolutional neural network (CNN) by LeCun [7], there has been tremendous growth in the field of deep neural networks, especially in glaucoma detection. In [8], the segmentation of OD and OC was performed using a deep neural network, whereas in [9, 10], deep neural networks were developed for the classification of glaucomatous and normal images.
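As an illustration of the hand-crafted route, the cup-to-disk ratio can be computed once OD and OC segmentation masks are available. The sketch below (vertical CDR from the row extents of binary masks) is illustrative and not tied to any specific method cited above; the mask representation and function names are assumptions.

```python
import numpy as np

def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical cup-to-disk ratio from binary segmentation masks.

    cup_mask / disc_mask are 2-D boolean arrays (True inside the region);
    the vertical diameter is the number of image rows the region spans.
    """
    def vertical_diameter(mask):
        rows = np.where(np.any(mask, axis=1))[0]
        return rows[-1] - rows[0] + 1 if rows.size else 0
    return vertical_diameter(cup_mask) / vertical_diameter(disc_mask)
```

A CDR well above roughly 0.5 is commonly treated as a sign of glaucoma risk, which is how threshold-based hand-crafted pipelines turn this ratio into a decision.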
The learnable parameters (weights and biases) in a deep neural architecture are updated using optimization algorithms, which work together with standard backpropagation and can have a significant influence on network training. Generally, optimization algorithms fall into three groups: first-order, higher-order, and derivative-free. First-order techniques such as SGDM, Adam, RMSProp, AdaGrad, etc., are gradient-based methods which use only first-derivative gradient information; the network parameters are updated in the negative direction of the gradients. Higher-order optimizers such as quasi-Newton and Hessian-free methods update the network parameters based on second-order gradient information. Derivative-free optimization methods are employed in cases where the derivative of the objective function does not exist; Bayesian optimization, genetic algorithms, particle swarm optimization, etc., are commonly used derivative-free methods.
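The three first-order update rules compared in this paper can be written out explicitly. The sketch below shows one parameter-update step for each of SGDM, RMSProp, and Adam; the hyperparameter defaults are the commonly used ones, not values tied to any experiment in this work.

```python
import numpy as np

def sgdm_step(w, grad, v, lr=0.01, momentum=0.9):
    """SGD with momentum: the velocity accumulates gradients; w moves along -v."""
    v = momentum * v + grad
    return w - lr * v, v

def rmsprop_step(w, grad, s, lr=0.001, rho=0.9, eps=1e-8):
    """RMSProp: step scaled by a running average of squared gradients."""
    s = rho * s + (1.0 - rho) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first (m) and second (v) moment estimates; t starts at 1."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Minimizing the toy objective f(w) = w², whose gradient is 2w, each rule drives w toward zero; they differ in how the raw gradient is smoothed (momentum) and scaled (running second moments), which is exactly what the comparison in this paper probes at the scale of full CNNs.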
The main objective of this work is to compare the performance of various pre-trained
models (AlexNet, VGG-19, and ResNet-101) using different first-order optimizers
(SGDM, Adam, and RMSProp). Publicly available databases, DRISHTI-GS1,
RIM-ONE(2), ORIGA, ACRIMA, and LAG, are used. Performance of the networks is
evaluated using accuracy, sensitivity, specificity, and precision.
The structure of the paper is as follows: Sect. 2 describes the related work, Sect. 3
illustrates the method employed for classification, Sect. 4 provides the simulation
results with discussions, and Sect. 5 provides the conclusions.
Performance Analysis of Optimizers for Glaucoma Diagnosis . . . 509

2 Related Work

Recently, convolutional neural networks (CNNs) have been widely applied in various
image processing domains. CNNs can learn features at different levels of abstraction,
which makes them appropriate for glaucoma detection. Studies from the literature
reveal that glaucoma diagnosis using CNNs has achieved better performance.
Researchers have utilized either pre-trained models or standard CNNs for classification.
Chen et al. [11] classified glaucoma and normal images using a CNN model
comprising four convolutional layers and two fully connected layers. During training,
the learnable parameters are updated using the SGDM optimizer. The ORIGA and
Singapore Chinese Eye Study (SCES) databases are considered in their work.
Different architectures such as VGG-19, GoogLeNet, ResNet-50, and DeNet are
implemented by Gomez-Valverde et al. [12] for automatic detection of glaucoma
from fundus images. In addition, a standard 16-layer CNN is also developed for
glaucoma classification. Optimization of parameters is achieved with the SGD optimizer.
Around 2313 fundus images are used. The authors conclude that VGG-19 outperforms
the other models.
In Diaz-Pinto et al. [13], the performance of various pre-trained models such as
VGG-16, VGG-19, Inception-v3, ResNet-50, and Xception is analyzed and compared.
For all pre-trained models, the SGDM optimizer is used. They use 1707 images and
conclude that the Xception model performs better in the classification task.
Eight standard deep learning architectures (VGG-16, VGG-19, ResNet, DenseNet,
Inception-v3, InceptionResNet, Xception, and NASNetMobile) are investigated for
glaucoma detection in Zhen et al. [14]. All the pre-trained models are optimized
using the SGDM optimizer. They report that the highest classification accuracy
is obtained with the DenseNet model.
Raghavendra et al. [15] have developed a 20-layer CNN model to extract the
discriminative features for classification. The learnable parameters are optimized
using SGDM optimizer. They have evaluated their algorithm on 1426 images.
In Li et al. [16], attention-based CNN (AG-CNN) is employed for glaucoma
detection. Large-scale attention-based glaucoma (LAG) database is developed. Adam
optimizer is used during training. Around 5824 fundus images are used in their work.
It is evident from the literature that SGDM or Adam optimizers are mostly used
for updating the parameters during training. With a suitable choice of optimizer,
network performance may improve, and the goal of classifying glaucomatous and
normal images with reduced false rates may be achieved. This motivated us
to analyze and compare the performance of different first-order optimizers
for classifying glaucoma and normal images using various pre-trained models such
as AlexNet, VGG-19, and ResNet-101. Experimental results reveal that the pre-trained
VGG-19 model outperforms the other architectures in classifying glaucoma and
normal fundus images.
510 P. Elangovan and M. K. Nath

3 Methodology

Automatic detection of glaucoma using deep neural networks has gained popularity
in recent years. Training a neural network from scratch is time consuming and
requires an effective hyperparameter selection technique. Instead, transferring the
weights from a standard pre-trained network is easy and provides better performance
for classification problems. Figure 1 gives the block diagram of transfer
learning-based glaucoma detection from fundus images. The fundus images are
resized to the standard input size of the pre-trained network. As deep neural networks
work well with larger numbers of images, data augmentation using rotation is performed in
the pre-processing stage. The initial layers and network weights are transferred from
the selected model. The discriminative features of the fundus images are extracted
by the selected network, and classification is done by modifying the final layers.

3.1 Transfer Learning

Transfer learning is a typical deep learning approach in which the weights from pre-trained
models are transferred to a new classification problem. This results in faster
and easier training. In this work, the AlexNet, VGG-19, and ResNet-101 architectures
are used. AlexNet [17] comprises eight learnable layers (five convolutional and
three fully connected layers). VGG-19 [18] is one of the standard pre-trained
models effectively used for image classification due to its deep architecture.
It comprises forty-seven layers, which include sixteen convolutional layers, five

Fig. 1 Transfer learning-based classification using a pre-trained model (pipeline: input fundus images → pre-processing with image resizing and data augmentation → feature extraction and classification using transfer learning: load pre-trained model (AlexNet/VGG-19/ResNet-101), replace final layers, re-train the model → glaucoma/normal)



max pooling layers, and three fully connected layers. ResNet-101 [19] from residual
networks overcomes the vanishing gradient problem by incorporating the residual
connections in the network.

3.2 Optimization Algorithms

Gradient descent is the standard first-order optimization algorithm which iteratively
updates the learnable parameters of a neural network in order to minimize the loss.
The gradient gives the direction in which the loss function has the steepest rate of
change, and each learnable parameter is updated in the negative direction of the
gradient with a suitable step size, called the learning rate. Mathematically, the
update equation is represented as
cally, the update equation is represented as

W = W − η (∂L/∂W),    (1)
where W represents the learnable parameter vector, η is the step size (learning rate), and L is the
loss function. Depending on the number of data samples employed for gradient
computation, gradient descent has three major variants: batch gradient
descent (BGD), stochastic gradient descent (SGD), and mini-batch gradient descent
(MBGD). In BGD, the gradient of the loss function is computed over the
complete training dataset, whereas SGD performs a parameter update for each training
sample. In MBGD, the entire training dataset is divided into mini-batches and the parameters
are updated for every mini-batch. BGD results in slow training
and redundant computations. In contrast, SGD is faster, but frequent updates with
high variance result in fluctuations. Mini-batch gradient descent greatly reduces the
variance of the parameter updates, which can lead to more stable convergence than
the other two variants.
Stochastic gradient descent with momentum (SGDM) Stochastic gradient descent
with momentum [20] is an extension of SGD algorithm. It incorporates the past
gradients in each dimension. The momentum term reduces the undesired oscillations
and makes the algorithm to attain convergence at faster speed.

Algorithm 1 SGDM optimizer


1: Select the initial parameter vector W0 and objective function f (W )
2: Select the step size η and decay rate of moving average γ
3: Initialize first moment vector M0 to zero
4: while Wi not converged do
5: i = i + 1
6: Compute the gradients at timestep i, gi .
7: Update the biased first moment estimate using Mi = γ Mi−1 + (1 − γ) gi
8: Update the parameters using Wi = Wi−1 − η Mi
9: end while
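A minimal sketch of Algorithm 1 on a toy quadratic loss; the matrix A, the step size η, the decay rate γ, and the step count are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy quadratic loss f(W) = 0.5 * W^T A W, whose gradient is A W.
A = np.diag([10.0, 1.0])            # ill-conditioned loss surface

def sgdm(eta=0.05, gamma=0.9, steps=200):
    W = np.array([1.0, 1.0])        # initial parameter vector W0
    M = np.zeros_like(W)            # first moment vector M0 = 0
    for _ in range(steps):
        g = A @ W                             # step 6: gradient g_i
        M = gamma * M + (1.0 - gamma) * g     # step 7: biased first moment estimate
        W = W - eta * M                       # step 8: parameter update
    return W

W_final = sgdm()
print("W after SGDM:", W_final)
```

The momentum term averages past gradients, damping oscillation along the steep axis while keeping progress along the shallow one.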

Adaptive moment estimation (Adam) Adaptive moment estimation (Adam) is a
widely used method for stochastic optimization. Adaptive learning rates for each
parameter are computed from the mean and variance of the gradients. It incorporates
the advantages of the adaptive gradient algorithm (AdaGrad) and the root mean square
propagation (RMSProp) optimization algorithm [21]. The update rule is given as:
Wi+1 = Wi − (η / (√V1i + ζ)) M1i .    (2)

M1i is the bias-corrected mean and V1i is the bias-corrected variance, described by

M1i = Mi / (1 − γ1^i),    (3)

V1i = Vi / (1 − γ2^i),    (4)

where Mi and Vi are the mean and variance of the gradients, respectively. Mathematically, Mi and Vi are described as

Mi = γ1 Mi−1 + (1 − γ1) gi ,    (5)

Vi = γ2 Vi−1 + (1 − γ2) gi² ,    (6)

Algorithm 2 Adam optimizer


1: Select the initial parameter vector W0 and objective function f (W )
2: Select the step size η, exponential decay rates for the moment estimates γ1 and γ2
3: Initialize first moment vector M0 to zero
4: Initialize second moment vector V0 to zero
5: while Wi not converged do
6: i = i + 1
7: Compute the gradients at timestep i, gi .
8: Update the biased first moment estimate using Equation (5) and biased second moment
estimate using Equation (6)
9: Compute the M1i and V 1i using Equation (3) and Equation (4), respectively.
10: Update the parameters using Equation (2)
11: end while
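Algorithm 2 and Eqs. (2)-(6) can be sketched as follows on the same kind of toy quadratic loss; η, γ1, γ2, ζ, and the step count are illustrative assumptions.

```python
import numpy as np

# Toy quadratic loss with gradient A W.
A = np.diag([10.0, 1.0])

def adam(eta=0.05, gamma1=0.9, gamma2=0.999, zeta=1e-8, steps=500):
    W = np.array([1.0, 1.0])
    M = np.zeros_like(W)                            # first moment M0 = 0
    V = np.zeros_like(W)                            # second moment V0 = 0
    for i in range(1, steps + 1):
        g = A @ W
        M = gamma1 * M + (1 - gamma1) * g           # Eq. (5)
        V = gamma2 * V + (1 - gamma2) * g**2        # Eq. (6)
        M1 = M / (1 - gamma1**i)                    # Eq. (3): bias-corrected mean
        V1 = V / (1 - gamma2**i)                    # Eq. (4): bias-corrected variance
        W = W - eta * M1 / (np.sqrt(V1) + zeta)     # Eq. (2): update rule
    return W

W_final = adam()
print("W after Adam:", W_final)
```

The bias correction matters in the first iterations, when the zero-initialized moment estimates would otherwise be biased toward zero.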

Root mean square propagation (RMSProp) Root mean square propagation
(RMSProp) is another optimization algorithm which improves network training
by utilizing adaptive learning rates. The moving average of the squared gradients for
each parameter is maintained, and it mainly restricts oscillations in the
vertical direction.

Algorithm 3 RMSProp optimizer


1: Select the initial parameter vector W0 and objective function f (W )
2: Select the step size η and decay rate γ1
3: Initialize second moment vector V0 to zero
4: while Wi not converged do
5: i = i + 1
6: Compute the gradients at timestep i, gi .
7: Update the biased second moment estimate using Vi = γ1 Vi−1 + (1 − γ1) gi²
8: Update the parameters using Wi = Wi−1 − η gi / √Vi
9: end while
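A corresponding sketch of Algorithm 3; the small constant eps added to the denominator (a common implementation detail to avoid division by zero) and the other hyperparameter values are assumptions for this illustration.

```python
import numpy as np

# Toy quadratic loss with gradient A W.
A = np.diag([10.0, 1.0])

def rmsprop(eta=0.01, gamma1=0.9, eps=1e-8, steps=1000):
    W = np.array([1.0, 1.0])
    V = np.zeros_like(W)                            # second moment V0 = 0
    for _ in range(steps):
        g = A @ W
        V = gamma1 * V + (1 - gamma1) * g**2        # step 7: squared-gradient average
        W = W - eta * g / (np.sqrt(V) + eps)        # step 8: normalized update
    return W

W_final = rmsprop()
print("W after RMSProp:", W_final)
```

Dividing by the root of the running average equalizes the effective step size across parameters with very different gradient magnitudes.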

4 Results and Discussion

Several retinal databases, DRISHTI-GS1, RIM-ONE(2), ORIGA, ACRIMA, and LAG, are considered for evaluating the performance of the deep neural networks.
Figure 2 describes the sample glaucoma and normal images from various databases.
The first row and second row represent glaucoma and normal images from DRISHTI-
GS1, ORIGA, RIM-ONE(2), LAG, and ACRIMA, respectively. The description of
retinal databases is provided in Table 1.
In this work, the AlexNet, VGG-19, and ResNet-101 networks are trained and tested
on the various databases for the classification of glaucoma and normal images. For
each database, the pre-trained model is trained with 70% of the images and tested with
the remaining 30%. Training is done with different optimizers: Adam, SGDM, and
RMSProp. The performance metrics used in this work are described in Table 2.
Proper selection of hyperparameters like batch size, maximum epochs, and step
size may improve the performance of a deep neural network. A batch size of 32 is
selected for training the pre-trained models. As pre-trained network weights are
transferred, a small step size of 0.00001 and 20 epochs may result in better network
performance. For each database, network parameters are updated

Fig. 2 Sample glaucoma and normal images from various databases



Table 1 Description of databases

| Database | Original images | Training images | Testing images | Dimension and format |
| DRISHTI-GS1 [22] | Gl-89, Nor-12 | Gl-1240, Nor-160 | Gl-540, Nor-40 | 2896 × 1944, PNG |
| RIM-ONE(2) [23] | Gl-200, Nor-255 | Gl-700, Nor-895 | Gl-300, Nor-380 | 290 × 290 to 1375 × 1654, JPEG |
| ORIGA [24] | Gl-168, Nor-482 | Gl-1176, Nor-1687 | Gl-504, Nor-723 | 3072 × 2048, JPEG |
| LAG [16] | Gl-2392, Nor-3432 | Gl-1198, Nor-2200 | Gl-513, Nor-943 | 500 × 500, JPEG |
| ACRIMA [13] | Gl-396, Nor-309 | Gl-2770, Nor-2160 | Gl-1190, Nor-930 | 400 × 400 to 1156 × 1156, JPEG |

Gl indicates glaucoma images and Nor represents normal images

Table 2 Performance metrics

| S. no. | Metrics | Mathematical expression |
| 1 | Accuracy | ACC = (TP + TN) / (TP + TN + FP + FN) |
| 2 | Sensitivity | SN = TP / (TP + FN) |
| 3 | Specificity | SP = TN / (TN + FP) |
| 4 | Precision | PR = TP / (TP + FP) |

using three standard first-order optimizers (SGDM, Adam, and RMSProp), and
the performance metrics are noted. It is observed from the simulation results that
the Adam optimizer yields better results than the other two optimizers.
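The metrics defined in Table 2 follow directly from confusion-matrix counts; the counts used in this sketch are made-up illustrative numbers, not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Compute the four metrics of Table 2 from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),   # accuracy
        "SN": tp / (tp + fn),                     # sensitivity
        "SP": tn / (tn + fp),                     # specificity
        "PR": tp / (tp + fp),                     # precision
    }

# Hypothetical counts for a glaucoma-vs-normal test split.
m = metrics(tp=260, tn=330, fp=50, fn=40)
print({k: round(v, 3) for k, v in m.items()})
```

As the paper's choice of metric per database suggests, precision is the more informative figure when the classes are unbalanced, since accuracy can be inflated by the majority class.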
Tables 3, 4, 5, 6, and 7 describe the results obtained for the different databases using
the AlexNet, VGG-19, and ResNet-101 models. Precision is considered for the
unbalanced databases (DRISHTI-GS1 and ORIGA) and classification accuracy for the
balanced databases (RIM-ONE(2), ACRIMA, and LAG). The learnable parameters
updated using the Adam optimizer yield better results on the RIM-ONE(2) and LAG
databases: the highest classification accuracies of 87.8% and 97.12% are obtained on
RIM-ONE(2) and LAG, respectively, using the VGG-19 model. For the ACRIMA
database, AlexNet and VGG-19 yield better results with the Adam optimizer, while
ResNet-101 with SGDM performs best, with the highest accuracy of 98.5%. A best
precision of 92.91% is obtained for DRISHTI-GS1 with VGG-19 using the Adam
optimizer, whereas for the ORIGA database, ResNet-101 with the RMSProp optimizer
gives the highest precision of 81.2%. From this analysis, it is evident that the
performance of a deep neural network depends mainly on the databases and
optimizers employed.

Table 3 Performance metrics obtained in DRISHTI-GS1 database

| Network | SGDM (ACC/SN/SP/PR) | Adam (ACC/SN/SP/PR) | RMSProp (ACC/SN/SP/PR) |
| AlexNet | 88.93/96.17/40.14/91.57 | 88.70/96.10/38.80/91.40 | 89.50/96.50/42.50/91.90 |
| VGG-19 | 91.77/98.09/49.17/92.87 | 90.38/96.30/50.42/92.91 | 90.91/99.80/29.58/90.62 |
| ResNet-101 | 89.40/99.80/18.80/89.20 | 91.50/99.70/33.80/91.10 | 90.80/99.80/30.00/90.60 |

The bold in the original indicates best values

Table 4 Performance metrics obtained in RIM-ONE(2) database

| Network | SGDM (ACC/SN/SP/PR) | Adam (ACC/SN/SP/PR) | RMSProp (ACC/SN/SP/PR) |
| AlexNet | 83.78/79.03/87.58/83.28 | 84.90/81.30/87.60/83.80 | 84.90/77.30/90.80/86.90 |
| VGG-19 | 87.30/86.11/88.26/85.34 | 87.80/86.44/88.86/86.00 | 84.41/86.67/82.63/80.36 |
| ResNet-101 | 83.70/83.00/84.20/80.60 | 85.40/83.70/86.80/83.40 | 81.90/87.30/77.60/75.50 |

The bold in the original indicates best values

Table 5 Performance metrics obtained in ORIGA database

| Network | SGDM (ACC/SN/SP/PR) | Adam (ACC/SN/SP/PR) | RMSProp (ACC/SN/SP/PR) |
| AlexNet | 75.65/57.09/88.96/78.41 | 71.70/61.00/79.30/67.50 | 68.20/53.70/78.30/63.60 |
| VGG-19 | 75.84/58.04/88.24/77.58 | 76.60/60.91/87.48/77.00 | 74.33/50.20/91.15/79.81 |
| ResNet-101 | 76.40/62.50/86.10/76.00 | 74.30/58.80/85.10/73.50 | 80.50/68.60/88.80/81.20 |

The bold in the original indicates best values

Table 6 Performance metrics obtained in LAG database

| Network | SGDM (ACC/SN/SP/PR) | Adam (ACC/SN/SP/PR) | RMSProp (ACC/SN/SP/PR) |
| AlexNet | 91.49/86.66/94.09/88.90 | 95.60/92.40/97.30/95.00 | 95.30/90.40/98.00/96.10 |
| VGG-19 | 93.70/90.80/95.20/91.20 | 97.12/96.82/97.28/95.10 | 96.04/92.98/97.70/95.70 |
| ResNet-101 | 90.50/86.20/92.90/86.80 | 95.90/95.70/96.00/92.80 | 95.70/92.40/97.50/95.20 |

The bold in the original indicates best values

Table 7 Performance metrics obtained in ACRIMA database

| Network | SGDM (ACC/SN/SP/PR) | Adam (ACC/SN/SP/PR) | RMSProp (ACC/SN/SP/PR) |
| AlexNet | 95.47/93.59/97.80/98.22 | 96.70/96.60/96.80/97.50 | 95.00/92.80/98.00/98.30 |
| VGG-19 | 96.32/95.21/97.74/98.18 | 96.67/95.38/98.33/98.65 | 94.15/91.68/97.31/97.76 |
| ResNet-101 | 98.50/99.10/97.70/98.30 | 98.30/98.70/97.80/98.30 | 98.10/99.70/96.00/97.00 |

The bold in the original indicates best values

Fig. 3 Comparison of results for DRISHTI-GS1 and ORIGA (bar charts of ACC, SN, SP, and PR for Chen, Juan, Ragavendra, AlexNet, VGG-19, and ResNet-101)

Fig. 4 Comparison of results for RIM-ONE(2) and ACRIMA (bar charts of ACC, SN, SP, and PR for Chen, Juan, Ragavendra, AlexNet, VGG-19, and ResNet-101)

In addition, the CNN architectures described in the literature (Chen et al. [11],
Gomez-Valverde et al. [12], Raghavendra et al. [15]) are also implemented and their
results are compared with the pre-trained models. Figures 3, 4, and 5 present the
performance metrics obtained by the different CNN architectures on the various databases.
For the pre-trained models, the optimizer giving the best performance metrics is considered
for comparison. The comparison reveals that, except on the ORIGA database, the
pre-trained models perform better at classifying glaucoma and normal images than
the networks trained from scratch.

5 Conclusion

In this paper, the performance of various optimizers (Adam, SGDM, and
RMSProp) is analyzed and compared for the automatic detection of glaucoma
from fundus images using a transfer learning technique. Three standard models,
AlexNet, VGG-19, and ResNet-101, are considered for extracting discriminative
features from the original images. The DRISHTI-GS1, RIM-ONE(2), ORIGA, ACRIMA,
and LAG retinal databases are used to evaluate the network performance.

Fig. 5 Comparison of results for LAG (bar chart of ACC, SN, SP, and PR for Chen, Juan, Ragavendra, AlexNet, VGG-19, and ResNet-101)

The VGG-19 model with parameters updated using the Adam optimizer obtains the
highest classification accuracies of 87.8% and 97.12% and the highest precision of
92.91% on RIM-ONE(2), LAG, and DRISHTI-GS1, respectively. An overall classification
accuracy of 98.5% and a precision of 81.2% are obtained using the SGDM and RMSProp
optimizers, respectively, with the ResNet-101 model. Compared to networks trained from
scratch, pre-trained models perform better in classifying glaucoma and normal images.

References

1. Elangovan P, Ravichandran G, Nath MK, Acharya OP (2018) Review on localization of optic


disc in retinal fundus images. In: 2018 International conference on applied electromagnetics,
signal processing and communication (AESPC). Bhubaneswar, India, pp 1–7. https://doi.org/
10.1109/AESPC44649.2018.9033304
2. Joshi GD, Sivaswamy J, Krishnadas SR (2011) Optic disk and cup segmentation from monoc-
ular color retinal images for glaucoma assessment. IEEE Trans Med Imaging 30(6):1192–1205
3. Mittapalli PS, Kande GB (2016) Segmentation of optic disk and optic cup from digital fundus
images for the assessment of glaucoma. Biomed Signal Process Control 24:34–46
4. Elangovan P, Nath MK, Mishra M (2020) Statistical parameters for glaucoma detection from
color fundus images. In: Third international conference on computing and network communication,
Procedia Computer Science, vol 171, pp 2675–2683
5. Nath MK, Dandapat S (2017) Differential entropy in wavelet sub-band for assessment of
glaucoma. Int J Imaging Syst Technol 22(3):161–165
6. Maheshwari S, Pachori RB, Acharya UR (2017) Automated diagnosis of glaucoma using
empirical wavelet transform and correntropy features extracted from fundus images. IEEE J
Biomed Health Inf 21(3):803–813
7. Lecun Y (1989) Generalization and network design strategies. Technical report. Zurich,
Switzerland

8. Zilly J, Buhmann JM, Mahapatra D (2017) Glaucoma detection using entropy sampling and
ensemble learning for automatic optic cup and disc segmentation. Comput Med Imaging Graph
55:28–41
9. Bajwa MN, Malik MI, Siddiqui SA, Dengel A, Shafait F, Neumeier W, Ahmed S (2019) Two-
stage framework for optic disc localization and glaucoma classification in retinal fundus images
using deep learning. BMC Med Inf Decis Mak 19(1):1–16
10. Elangovan P, Nath MK (2020) Glaucoma assessment from color fundus images using convo-
lutional neural network. Int J Imaging Syst Technol 1–17. https://doi.org/10.1002/ima.22494
11. Chen X, Xu Y, Wong D, Wong T-Y, Liu J (2015) Glaucoma detection based on deep convolu-
tional neural network. In: Annual international conference of the IEEE engineering in medicine
and biology society (EMBC), Milan, Italy, pp 715–718
12. Gomez-Valverde J, Anton A, Fatti G, Liefers B, Herranz A, Santos A, Sanchez C, Ledesma-
Carbayo M (2019) Automatic glaucoma classification using color fundus images based on
convolutional neural networks and transfer learning. Br J Ophthalmol 10(2):892–913
13. Diaz-Pinto A, Morales S, Naranjo V, Kohler T, Mossi J, Navea A (2019) CNNs for automatic
glaucoma assessment using fundus images: an extensive validation. BioMed Eng OnLine
18(29):1–19
14. Zhen Y, Wang L, Liu H, Zhang J, Pu J (2005) Performance assessment of the deep learning
technologies in grading glaucoma severity. Medical Image Anal 9(4):297–314
15. Raghavendra U, Fujita H, Bhandary SV, Gudigar A, Hong Tan J, Rajendra Acharya U (2018)
Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus
images. Inf Sci 144(29):41–49
16. Li L, Xu M, Wang X, Jiang L, Liu H (2019) Attention based glaucoma detection: a large-scale
database and CNN model. In: The IEEE conference on computer vision and pattern recognition
(CVPR), pp 1–10
17. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. Neural Inf Process Syst 25(2):1097–1105
18. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition.
CoRR abs/1409.1556
19. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE
conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, pp 770–778
20. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Netw
12:145–151
21. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International confer-
ence on learning representations, San Diego, CA, pp 1–15
22. Sivaswamy J, Krishnadas SR, Joshi GD, Jain M, Tabish AUS (2014) Drishti-gs: retinal image
dataset for optic nerve head (ONH) segmentation. In: 2014 IEEE 11th international symposium
on biomedical imaging (ISBI), Beijing, China, pp 53–56
23. Fumero F, Alayon S, Sanchez J, Sigut J, Gonzalez-Hernandez M (2011) Rim-one: an open
retinal image database for optic nerve evaluation. In: IEEE symposium on computer based
medical systems (CBMS), Bristol, UK, pp 1–6
24. Zhang Z, Yin F, Liu J, Wong W, Tan N, Lee B-H, Cheng J, Wong T-Y (2010) Origa(-light): an
online retinal fundus image database for glaucoma analysis and research. In: Annual interna-
tional conference of the IEEE engineering in medicine and biology, Buenos, Aires, pp 3065–
3068
Machine Learning based Early
Prediction of Disease with Risk Factors
Data of the Patient Using Support Vector
Machines

Usharani Chelladurai and Seethalakshmi Pandian

Abstract Early detection of diseases plays an important role in improving the quality
of healthcare and can help people avoid dangerous health conditions. Early detection
of chronic disease is a critical task in health data analysis. This paper
proposes a novel fuzzy logic-based SVM machine learning technique for patient-centered
healthcare analytics, for early prediction of chronic diseases such as
hypertension, hypothyroidism, and obesity based on individual patient risk-factor
data. The proposed method consists of preprocessing, feature selection, feature
extraction, fuzzy SVM classification, and post-processing, and predicts the severity of the
disease. The proposed fuzzy-based support vector machine (SVM) is designed
to classify important features by applying machine learning techniques to ensure
accuracy in the prediction of chronic disease. The SVM is a margin-based classifier
that maps input data onto a high-dimensional space and classifies it with a
linear approximation. This technique combines outputs from different classification
models and has shown the highest accuracy compared to previous techniques. None
of the previous studies have integrated fuzzy logic with SVM classifiers for chronic disease
datasets. The proposed machine learning-based disease prediction model for early
diagnosis and timely treatment of non-communicable/chronic diseases investigates
the risk-factor data from the patient treatment log. This model provides early
risk detection and helps doctors to follow appropriate precautions and measures to
minimize risk and prevent a patient from reaching critical phases of the disease. The
system could significantly decrease human mortality rates and strengthen health
services.

Keywords Classification · Early disease prediction · Feature selection ·
Fuzzy-based SVM · Machine learning · RStudio analytics

U. Chelladurai (B) · S. Pandian


Departments of Computer Science and Engineering and Electronics and Communication
Engineering, University College of Engineering, BIT Campus, Anna University, Tiruchirappalli,
Tamil Nadu, India

© Springer Nature Singapore Pte Ltd. 2021


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_38
520 U. Chelladurai and S. Pandian

1 Introduction

The World Health Organization has reported that the global burden and threat
of non-communicable diseases constitute a major public health challenge that
undermines social and economic development worldwide [1]. Non-communicable diseases
(NCDs), also known as chronic diseases, tend to be of long duration and result
from a combination of genetic, physiological, environmental, and behavioral factors [2].
The main types of NCDs are cardiovascular diseases (such as heart attacks and stroke),
cancers, chronic respiratory diseases (such as chronic obstructive pulmonary disease
and asthma), and diabetes [3]. Harmful use of tobacco and alcohol, physical
inactivity, and unhealthy diets all increase the risk of dying from an NCD [4].
People of all age groups, regions, and countries are affected by NCDs [5, 6].
These conditions are often related to older age groups, but evidence shows that 15
million of all deaths attributed to NCDs occur between the ages of 30 and 69 years.
Of these “premature” deaths, over 85% are estimated to occur in low- and middle-
income countries. Children, adults, and the elderly are all vulnerable to the risk
factors contributing to NCDs, whether from unhealthy diets, physical inactivity, and
exposure to tobacco smoke or the harmful use of alcohol. These diseases are driven
by forces that include the rapid growth of globalization, unplanned urbanization,
unhealthy lifestyles, and population aging. Unhealthy diets and a lack of physical
activity may show up in people as raised blood pressure, increased blood glucose,
elevated blood lipids, and obesity. These are called metabolic risk factors that can
lead to cardiovascular disease, the leading NCD in terms of premature deaths [7, 8].
Approximately 639 million adults in developing countries suffer from hypertension,
and this number is estimated to reach nearly 1 billion by 2025. WHO projections show
that NCDs will be responsible for a significantly increased total number of deaths
in the next decade; NCD deaths are projected to increase by 15% globally between
2010 and 2020 (to 44 million deaths) [9, 10]. With growing awareness of the risk
of NCDs, several recent studies have utilized machine learning models as a decision-making
technique for early detection and timely treatment based on an individual's
risk-factor data, so that appropriate care can be taken at an earlier stage [11, 12].
In healthcare, machine learning is helping to streamline administrative processes in
hospitals, plan and treat infectious diseases, and personalize medical treatments [13, 14].
It can help hospitals and health systems improve efficiency while reducing the cost of care.
In many previous studies, the optimization of the classifier is done with the "test
set," which may result in an unintended influence of the test data on the classifier
and thus higher sensitivity and specificity than would be experienced in
real-world conditions [15, 16]. To provide more realistic estimates of sensitivity and
specificity, the test data must not influence the training of the algorithm [17]. Therefore,
we employ a double cross-validation method, where data are divided into a
training set and a test set, and the training set is further subdivided into a learning
set and a validation set. The fuzzy-based SVM classification model is trained on the
learning and validation datasets and tested on a dataset that is not touched in training
[18, 19]. Because the test dataset is left completely out of training, the results with
this experimental design can more accurately reflect the expected prediction rate in
real-world conditions [20].
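The double cross-validation protocol above can be sketched as follows; the toy one-dimensional threshold classifier and the split sizes are illustrative assumptions chosen only to keep the example self-contained.

```python
import numpy as np

# Toy 1-D data with a threshold classifier standing in for the FSVM.
rng = np.random.default_rng(2)
X = rng.normal(size=100)
y = (X > 0.2).astype(int)

idx = rng.permutation(len(X))
test, train = idx[:30], idx[30:]          # outer split: test set left out entirely
valid, learn = train[:20], train[20:]     # inner split of the training set

def accuracy(threshold, ids):
    return float(np.mean((X[ids] > threshold).astype(int) == y[ids]))

# "Training": candidate thresholds are built from the learning set only.
xs = np.sort(X[learn])
candidates = (xs[:-1] + xs[1:]) / 2.0

# Model selection uses the validation set only.
best = max(candidates, key=lambda t: accuracy(t, valid))

# The untouched test set yields the final, unbiased performance estimate.
test_acc = accuracy(best, test)
print("selected threshold:", round(float(best), 3), "test accuracy:", test_acc)
```

The point of the design is visible in the code: the `test` indices never appear in the training or selection steps, so `test_acc` is an honest estimate.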
Our Contribution
The main objective of this proposed work is to develop algorithms and architectures
for an implantable device that can reliably provide disease prediction with sufficient
time to trigger treatment. The specific task in this paper is to investigate the feasi-
bility of a patient-log-specific classification approach with the FSVM to distinguish
between risk factors attributes and normal attributes. The algorithm has been tested
on a dataset from the hospital NCD database, which has been made available for
comparing the results of different algorithms on the same datasets.
The remainder of the paper is organized as follows: Sect. 2 describes the literature
review, Sect. 3 presents the proposed system methodology, Sect. 4 presents the
experimental setup and results, Sect. 5 discusses and highlights the proposed
system, and conclusions and future work are drawn in Sect. 6.

2 Literature Review

Luis Eduardo et al. [21] constructed a machine learning model for cardiac hybrid
imaging based on complex assumptions; machine learning is able to determine and
understand complex data structures in order to solve the challenges of estimation and
classification. Alarsan and Younes [22] suggested a machine learning-based ECG
(electrocardiogram) classification method using a variety of ECG features to detect
cardiac ECG abnormalities. Golino et al. [23] described a customized diabetic patient
health monitoring system built on BLE-based and G-based sensors. Alfian described
forecasting high blood pressure using machine learning approaches, but these
methods are not classified as chronic disease models.
In [24, 25], the authors discussed how machine learning can be utilized as an early
disease prediction model to predict the severity of chronic diseases based on an
individual's current risk-factor data. Several studies have utilized machine learning
models and revealed significant results for predicting severe illness from diabetes,
hypertension, hypothyroidism, and obesity risk factors.
In [26–28], the authors described the hybrid machine learning approach, one of
the well-known and widely used machine learning models. In [29, 30], the authors
presented an approach whose main idea is to combine two machine learning models
to help reduce bias and variance and hence improve prediction results. Previous
studies have also used hybrid approaches and shown significant outcomes for
improving medical decision making and diagnosis, predicting the severity of heart
diseases and NCDs, and identifying their risk factors.

Fig. 1 Proposed disease prediction model for non-communicable diseases

3 Proposed System

3.1 Overview of the Proposed System

In this subsection, the design view of the fuzzy-based SVM machine learning
technique for disease prediction is presented. Figure 1 illustrates the framework
architecture and entities of the proposed system. The proposed system consists of
(A) data collection, (B) data transformation, (C) data storage and security, (D) data
modeling, and (E) data visualization and knowledge discovery.

3.2 Proposed System Implementation

This section briefly explains the inputs and outputs of each entity of the proposed
method. The proposed system deals with data collection, dataset creation, prepro-
cessing, feature selection, feature extraction, and applying our proposed algorithm
fuzzy-based SVM classification for the prediction of diseases. The input of the proposed system is a chronic disease dataset, and the output is the predicted possibility of diseases, identifying patients in risky conditions. Each component of the proposed system is discussed in detail in the subsections below.
A. Data Collection/Health dataset Creation
The first step of the proposed method includes the processing of data from
various sources in different formats. The proposed system uses a compilation
of medical data obtained from the hospital.

Machine Learning based Early Prediction of Disease … 523

The dataset includes 3000 patient records with 75 attributes of various data types, such as statistical, unambiguous, and image data. The dataset includes information on the patient’s
log/medical history of care that has been maintained for the past several years.
The system extracts the dataset from patients who have been affected by non-
communicable diseases such as hypertension, hypothyroid, and obesity. The
categorized patient health dataset is maintained separately in a database called the NCD/chronic dataset. The chronic dataset contains around 1000 patient records. Once the chronic disease data has been obtained, the system verifies each and every entry in the dataset for the formatting process; dataset formatting is a key step in the transformation of the data.
B. Data Transformation
Preprocessing
Once the data is available, the next step of the proposed system is to filter the dataset, because the collected data may contain noise. Data transformation is required to improve the quality of the data prior to the analysis or modeling process. In the proposed system, the health dataset is fed into machine learning preprocessing, which involves removing artifacts such as outliers, noise, errors, and missing values. Any of these artifacts changes the value of the stored data and affects the subsequent testing and training process, so the artifacts are removed in the preprocessing stage before further analysis. If artifacts are present in the dataset, they are marked as Not Available (NA) and the entire row containing them is permanently excluded from the dataset. The system then normalizes the values of each instance by fuzzy measuring its values against a threshold, which produces a fuzzy-based NCD dataset for feature selection.
Table 1 illustrates the dataset description with the attribute name, description,
data type, and range of values.
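The NA-removal and fuzzy-thresholding steps described above can be sketched as follows. The column names, value ranges, and cut points are illustrative assumptions, not the authors' exact clinical rules.

```python
import numpy as np
import pandas as pd

# Hypothetical fragment of the raw health dataset.
records = pd.DataFrame({
    "AGE": [41.0, 56.0, 999.0, 57.0],      # 999 is an impossible (artifact) value
    "RBP": [118.0, 142.0, 135.0, np.nan],  # resting blood pressure, mmHg
})

# Mark out-of-range ages as missing, then drop every row with a missing value,
# mirroring the "NA rows are permanently excluded" rule described above.
records.loc[~records["AGE"].between(40, 85), "AGE"] = np.nan
clean = records.dropna().reset_index(drop=True)

# Fuzzify RBP into the nominal range 0-2 used in Table 1
# (the cut points here are assumptions, not the authors' thresholds).
clean["RBP"] = pd.cut(clean["RBP"], bins=[0, 120, 140, 300], labels=[0, 1, 2])
print(clean)
```

Only the two artifact-free rows survive, each with a fuzzified blood pressure grade.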

Feature Selection and Feature extraction


Outlier detection is performed by the system, and the dataset is then ready for feature selection and feature extraction. A total of 21 features were extracted from the 75 raw features of
the original dataset. Feature selection is based on highly correlated attributes such
as age, blood pressure, blood sugar, thyroid, ECG, LDL, HDL, triglyceride, body
mass index, hereditary of disease, heart beat rate, hemoglobin count, and operation
attained.
Feature selection is the automatic selection of the attributes in our dataset that are most relevant to the predictive modeling problem; it is the process of selecting a subset of relevant features for model construction. NCD disease prediction is based on the values of these risk factor attributes.
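Correlation-driven selection of the kind described above can be sketched as follows. The synthetic data, attribute names, and the 0.3 cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Build a toy dataset where two attributes track the disease label and one
# does not, then keep attributes highly correlated with the label.
rng = np.random.default_rng(0)
n = 200
label = rng.integers(0, 2, n)
data = pd.DataFrame({
    "AGE":  label * 10 + rng.normal(50, 5, n),  # correlated with the label
    "FBS":  label + rng.normal(0, 0.5, n),      # correlated with the label
    "HEMO": rng.normal(14, 1, n),               # unrelated attribute
    "LABEL": label,
})

# Absolute Pearson correlation of every attribute with the class label.
corr = data.corr()["LABEL"].drop("LABEL").abs()
selected = corr[corr > 0.3].index.tolist()
print(selected)
```

The correlated attributes survive the cutoff while the unrelated one is dropped, which is the behaviour the text describes for attributes such as age and fasting blood sugar.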
C. Data Storage and Security
In this phase, the transformed data is ready for secure storage. The collected health data is private and sensitive, and it is important to take the necessary protective measures during transformation. To ensure the protection of the health data, a secure storage solution for the entire dataset is essential. There are

Table 1 Health dataset attribute description

Attribute name   Description                    Range   Data type
GEN              Patient gender (male/female)   0–1     Nominal
AGE              Patient age                    40–85   Numeric
ELDP             Elderly patient                0–1     Nominal
BMI              Body mass index                0–2     Nominal
FOH              Food habit                     0–1     Nominal
SMO              Smoking habit                  0–1     Nominal
ALC              Alcoholic habit                0–1     Nominal
RBP              Resting blood pressure         0–2     Nominal
FBS              Fasting blood sugar            0–1     Nominal
TSH              Thyroid stimulating hormone    0–1     Nominal
CPT              Chest pain type                0–3     Numeric
LDL              Low density lipoprotein        0–1     Nominal
HDL              High density lipoprotein       0–1     Nominal
TRY              Triglyceride                   0–1     Nominal
CHOL             Total cholesterol              0–1     Nominal
RECG             Resting electrocardiograph     0–1     Nominal
HBR              Heart beat rate                0–1     Nominal
HEMO             Hemoglobin count               0–1     Nominal
EXER             Exercise                       0–1     Nominal
HD               Hereditary of diseases         0–1     Nominal
OPER             Operation attained             0–3     Numeric

many secure storage solutions available, but we suggest Blockchain technology as one that provides more security than the others. It offers increased safety of medical records, immutable and patient-centric patient records, secure and verifiable medical records, and availability of the massive amount of medical data anytime and anywhere. The technology also addresses challenges such as interoperability, integration with existing systems, and technological and adoption barriers. Blockchain is a decentralized peer-to-peer architecture and a distributed ledger technology: participants in the distributed network record digital transactions into a shared ledger, each participant stores the same copy of the shared ledger, and changes to the shared ledger are reflected in all copies in the network. Blockchain technology is itself a data repository; it provides security to health data and privacy to patients.
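The append-only, tamper-evident ledger property described above can be illustrated with a minimal hash chain. This is a toy sketch (no consensus protocol, networking, or signatures), not a production blockchain.

```python
import hashlib
import json

def block_hash(block):
    # Hash only the block's contents and its link to the predecessor.
    body = {"record": block["record"], "prev_hash": block["prev_hash"]}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(record, prev_hash):
    block = {"record": record, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

def verify(chain):
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False                      # block contents were altered
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False                      # link to predecessor broken
    return True

genesis = make_block({"patient": "P001", "RBP": 1}, prev_hash="0")
chain = [genesis, make_block({"patient": "P001", "FBS": 0}, genesis["hash"])]
print(verify(chain))                          # True
chain[0]["record"]["RBP"] = 2                 # tamper with a stored health record
print(verify(chain))                          # False
```

Because each block stores the hash of its predecessor, altering any stored health record invalidates every later block, which is the immutability property the text attributes to blockchain storage.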

D. Data Modeling
Once the data has been collected, transformed, and stored in a secure storage solution, data processing and analysis are performed to generate useful knowledge. In the proposed system, a rule-based evaluation has been applied to the class attributes; these attributes are highly affected by and correlated with the class attribute. The features were selected and extracted to form a new dataset of 1000 instances with 21 features. Table 2 shows the fuzzy values of each attribute for 25 samples. Among the 21 attributes, the attributes most highly correlated with disease prediction are gender, age, bmi, fbs, foh, rbp, chol, ldl, hdl, and try. Figures 3, 4, 5, 6, and 7 clearly show how these correlated attributes influence the diseases.

Fuzzy-based SVM classification and Rule-based validation


Given training data D = {(xᵢ, yᵢ)}, i = 1, 2, …, n, where xᵢ ∈ R^d is the i-th feature vector and yᵢ ∈ {−1, +1} is its target label, the linear SVM finds the optimal hyperplane of the form f(x) = wᵀx + b, where w is a d-dimensional coefficient vector and b is an offset. This is done by solving the following optimization problem:

    min_{w, b, ξ}  (1/2)‖w‖² + C Σᵢ₌₁ⁿ ξᵢ                              (1)

    s.t.  yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,   ξᵢ ≥ 0,   ∀i ∈ {1, 2, …, n}         (2)
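A fuzzy-based SVM additionally assigns each training sample a membership value that scales its slack penalty, so unreliable or borderline samples influence the margin less. The sketch below approximates this in scikit-learn via per-sample weights; the distance-to-class-mean membership rule is an assumption for illustration, not necessarily the authors' fuzzification scheme.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic classes standing in for "risk" / "no risk" patients.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# Fuzzy membership s_i in (0, 1]: samples far from their own class mean
# count less. Effectively the slack penalty for sample i becomes C * s_i.
memberships = np.empty(len(X))
for cls in (-1, 1):
    d = np.linalg.norm(X[y == cls] - X[y == cls].mean(axis=0), axis=1)
    memberships[y == cls] = 1.0 - d / (d.max() + 1e-9)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=memberships)   # per-sample fuzzy weighting
print(clf.score(X, y))
```

Passing the membership values as `sample_weight` reproduces the per-sample penalty scaling in Eq. (1) without modifying the solver itself.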

Metrics of Sensitivity and Specificity


Sensitivity denotes the true positive rate: the number of true positives (at-risk patients correctly identified) divided by the sum of true positives and false negatives (at-risk patients missed).

Sensitivity = TruePositive/(TruePositive + FalseNegative)

The complementary measure for the negative class is specificity; a correct prediction of the negative class is called a true negative.

Specificity = TrueNegative/(FalsePositive + TrueNegative)

In an imbalanced classification, the specificity might be lower than the sensitivity. Sensitivity and specificity can be combined into a single score that balances both concerns, called the geometric mean or G-Mean (Table 3).

G−Mean = sqrt(Sensitivity ∗ Specificity)

The true positive rate is the sensitivity


• TruePositiveRate = TruePositive/(TruePositive + FalseNegative)

Table 2 Preprocessed fuzzy-based health dataset


GEN AGE ELDP BMI FOH SMO ALC DISH CPT RBP CHOL LDL HDL TRY FBS RECG TSH HEMO HBR EXER OPER
0 41 0 0 1 0 0 0 1 1 1 1 2 0 0 0 0 2 0 0 2
1 56 0 1 1 0 1 0 1 0 1 1 2 0 1 1 1 1 0 0 2
0 57 0 1 1 0 1 0 0 0 2 1 2 1 0 1 1 0 0 1 2
1 57 0 0 1 0 1 0 0 2 0 0 0 0 0 1 0 0 0 0 1
0 56 0 1 1 0 0 0 1 2 2 1 2 1 0 0 1 0 0 0 2
1 52 0 2 1 1 0 0 2 4 0 0 2 1 1 1 1 0 0 0 3
1 57 0 1 1 0 1 0 2 3 0 0 2 0 0 1 1 0 0 0 2
1 54 0 1 1 1 0 0 0 2 1 1 2 0 0 1 1 0 0 0 2
1 64 1 1 1 0 0 0 3 0 1 1 2 0 0 0 1 0 0 1 2
0 58 0 0 1 1 0 0 3 3 2 1 2 1 1 0 0 0 0 0 2
0 50 0 0 1 1 1 0 2 0 1 1 2 1 0 1 0 0 0 0 2
0 58 0 1 1 0 0 0 2 0 2 1 2 1 0 1 1 0 0 0 2
0 66 1 2 1 1 0 0 3 3 1 1 2 1 0 1 1 0 0 0 2
0 69 1 2 1 0 0 0 3 2 1 1 2 0 0 1 0 0 0 0 2
1 59 0 2 1 0 0 0 0 2 1 1 2 0 0 1 1 2 0 0 3
1 61 1 1 1 0 0 1 2 3 2 1 2 1 0 1 1 0 0 1 2
1 40 0 2 1 0 0 0 3 2 0 0 0 1 0 1 1 0 0 1 3
0 71 1 0 1 1 0 0 1 3 2 1 2 1 0 1 0 0 0 0 2
1 59 0 0 0 0 0 0 2 3 1 0 2 1 1 1 0 0 0 0 2
1 51 0 0 1 0 1 0 2 0 0 0 0 0 0 1 0 0 0 0 2
0 65 1 0 1 0 1 0 2 2 2 1 2 1 1 0 0 0 0 0 2

Table 3 Confusion matrix of FSVM algorithm

                  Positive prediction    Negative prediction
Positive class    True positive (TP)     False negative (FN)
Negative class    False positive (FP)    True negative (TN)

The false positive rate is calculated as:


• FalsePositiveRate = FalsePositive/(FalsePositive + TrueNegative)
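The metrics defined above follow directly from the confusion matrix counts. The counts used here are illustrative, not the paper's reported results.

```python
# Sensitivity, specificity, and G-Mean as defined in the text.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (fp + tn)

def g_mean(sens, spec):
    return (sens * spec) ** 0.5

# Illustrative confusion-matrix counts (hypothetical, not the paper's run).
tp, fn, fp, tn = 35, 5, 2, 38

sens = sensitivity(tp, fn)           # 35 / 40 = 0.875
spec = specificity(tn, fp)           # 38 / 40 = 0.95
print(round(g_mean(sens, spec), 4))  # 0.9117
```

Note how the G-Mean penalizes a model that scores well on one class but poorly on the other, which is why it is preferred for imbalanced data.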

SVM Confusion Matrix and Statistics


p3     no   yes
  no   35    0
  yes   0   40

Accuracy : 1
95% CI : (0.952, 1)
No Information Rate : 0.5333
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 1
Mcnemar’s Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.4667
Detection Rate : 0.4667
Detection Prevalence : 0.4667
Balanced Accuracy : 1.0000
The proposed fuzzy-based SVM algorithm generates two groups of class values: True Positive = 35 and True Negative = 40, with False Positive = 0 and False Negative = 0, where 75 rows were used for testing to predict which patients need early care and which are not in the risk region. Initially, the testing process was applied to only 75 patients from our dataset; the proposed system is then used to classify the entire dataset. In this system, FSVM distinguishes two typical error types: false positives (FPs), abnormal samples with a possibility of severe illness, and false negatives (FNs), normal samples that do not require any emergency care.
E. Knowledge Discovery and Data Visualization
Disease Prediction and Risk Identification
The modeling phase produces new information and valuable knowledge to be used by decision-makers. Several previous studies have trained multiple prediction models on typical patient datasets using patient risk factor data, but they fail to generate high rates in real conditions. In our system, however, the precision on the test data signifies the concrete precision in real life. To attain

Table 4 Results accuracy, sensitivity, and specificity of various models

Algorithm            Accuracy %   Classification error %   Detection   Prevalence   Sensitivity   Specificity
Naive Bayes          97.3         2.7                      35.14       36.49        96.3          97.8
Random forest        100          0                        36.67       36.67        100           100
SVM                  100          0                        46.67       46.67        100           100
Decision tree        100          0                        32.2        32.2         100           100
KNN                  81.36        18.64                    28.81       42.37        68.0          91.18
Logistic regression  100          0                        32.2        32.2         100           100

a fairer prediction rate, the proposed system uses rule-based attribute optimization over the test data samples. In the chronic dataset, if an instance has several diseases at a time, such as hypertension, hypothyroidism, and obesity, it is identified as a highest-priority instance. The system randomly selects 20% of the data for testing and 80% for training, repeated across the different algorithms. Table 4 shows the accuracy, classification error rate, detection values, prevalence, sensitivity, and specificity of the different algorithms compared with the SVM technique. Once the model has been well trained, the prediction rate is evaluated on the test set; this process is repeated until the average prediction rate, which determines the accuracy, is calculated. Compared with the proposed system, Naive Bayes reaches accuracy = 97.3% and sensitivity = 96.3%; random forest reaches accuracy = 100% and sensitivity = 100%, but its detection and prevalence values are lower than those of the proposed system. Decision tree reaches accuracy = 100% and sensitivity = 100%, again with lower detection and prevalence values than the proposed FSVM system. KNN reaches accuracy = 81.36% and sensitivity = 68.0%, and logistic regression reaches accuracy = 100% and sensitivity = 100%. Compared with the existing algorithms on the given dataset for predicting immature-stage diseases, FSVM generates higher accuracy than the other techniques.
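The 80/20 evaluation protocol described above can be sketched as follows. The synthetic data stands in for the non-public chronic dataset, so the printed accuracies will not match Table 4; the model list is a subset for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic patients: 8 numeric attributes, label driven by two risk factors.
rng = np.random.default_rng(7)
X = rng.normal(0, 1, (500, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Random 80/20 train/test split, as described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

models = {
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(random_state=7),
    "SVM": SVC(kernel="linear"),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Repeating this split-train-score loop and averaging the test accuracies gives the "average prediction rate" the text describes.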

4 Experiments and Results

4.1 Mobile Application for Health Monitoring

RStudio has been used for the machine learning and disease prediction analysis. We have tested a patient-centric fuzzy classification algorithm for disease prediction on the NCD dataset of 1000 patients with 21 extracted features. To evaluate the algorithm, we have implemented the proposed method in a mobile application to show the

Fig. 2 Mobile application machine learning-based health checker

possibility of our proposed system in live applications. Figure 2 shows the mobile application interface for our chronic disease dataset, developed using Android Studio. The interface retrieves the health parameters according to the PatientID, verifies the parameters against a threshold value, and then applies the rule-based SVM to predict the patient's condition as either normal or abnormal. Our proposed model significantly enhanced the prediction rate and correctly predicted the severity of diseases such as hypertension, hypothyroidism, diabetes, and obesity in 716 of 1000 patients, with higher sensitivity than other models. The interface shows whether the patient is in a risky condition or not, what types of diseases are possible, and finally which diseases are most likely. The developed mobile application uses the chronic disease dataset for testing the immature state of disease. The patient can easily monitor the current status and previous illnesses and accordingly seek earlier care and take appropriate drugs. The developed health checker application provides e-consultation in the pandemic situation and offers the facility of e-medicine with homecare.

4.2 R-Studio Data Visualization

Data visualization is done through RStudio; after applying the machine learning algorithms, the results are plotted and presented in Figs. 3, 4, 5, 6 and 7 using the RStudio data

Fig. 3 Distribution of correlated attributes BMI, TSH, AGE, RBP, CHOL in bar plot

plotting technique. Figure 3 shows the distribution of the health dataset used in this research; the distribution of the predicted attributes with high accuracy is presented in Fig. 4; the dataset's outlier samples are presented in Fig. 5; the measurements of the selected features are presented in Fig. 6; and the separation of the class variables is presented in Fig. 7.

5 Discussion

The results of the proposed system have been compared with other existing machine learning algorithms, and the performance measures of the existing algorithms are summarized in Table 4. The proposed algorithm has shown the highest sensitivity compared with the others. When comparing specificity with KNN and random forest, the proposed algorithm performs well and achieves higher sensitivity than the others. The experimental results show that the proposed fuzzy-based SVM technique with reduced attributes improves the classification accuracy

Fig. 4 Distribution of predicted attributes in histogram plot with accuracy

Fig. 5 Outlier samples in dataset using box and whisker plot



Fig. 6 Distribution and measurements of selected features using density plot

Fig. 7 Separation of class variables using scatter plot



and generates fair results. While some studies may report greater sensitivity than our proposed technique, their algorithms were trained and tested on the same datasets; therefore, the results are not directly comparable.

6 Conclusion

In this paper, a machine learning-based early prediction of diseases from a patient's risk factor data using a support vector machine is proposed. The work was carried out by applying a rule-based FSVM algorithm after the removal of artifacts, since irrelevant items in health datasets may harmfully affect the disease prediction process and generate poor results. After applying the proposed algorithm, an improved prediction is achieved by selecting the essential features for each patient. The identified risk factor attributes in the health database, such as age, fasting blood sugar, resting blood pressure, cholesterol, BMI, TSH, and ECG, are separated for disease prediction. Prediction is further improved by adding more features; including cross-correlation attributes and discriminating features also improves the classification rate and enhances accuracy. Early prediction, disease identification, and assessment of the possibility of diseases are done through the proposed system. A mobile application has been developed for chronic disease management and tested with the available chronic dataset for early care, elderly care, and homecare.

References

1. World Health Organization (online). https://www.who.int/news-room/fact-sheets/detail/noncommunicable-diseases
2. World Health Organization (online). https://www.who.int/gho/ncd/en/
3. WHO (online). https://www.who.int/nmh/publications/ncd_report_full_en.pdf
4. Weka: Data Mining Software in Java. [Online]. Available: https://www.cs.waikato.ac.nz/ml/weka/
5. Machine Learning Mastery (online), 2019
6. Alloubani A, Saleh A, Abdelhafiz I (2018) Hypertension and diabetes mellitus as a predictive
risk factors for stroke, diabetes metabolic syndrome. Clin Res Rev 12(4):577–584
7. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245–271
8. Aggarwal CC (ed) (2014) Data classification: algorithms and applications. CRC Press, Boca
Raton, FL, USA
9. Ozcift A, Gulten A (2011) Classifier ensemble construction with rotation forest to improve
medical diagnosis performance of machine learning algorithms. Comput Methods Prog Biomed
104(3):443–451
10. Ijaz MF, Alfian G, Syafrudin M, Rhee J (2018) Hybrid prediction model for type 2 diabetes
and hypertension using DBSCAN-based outlier detection, synthetic minority over sampling
technique (SMOTE), and random forest. Appl Sci 8(8):1325
11. Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2017) Development of disease prediction model based on ensemble learning approach for diabetes and hypertension. IEEE Access, special section on data-enabled intelligence for digital health, vol 7
534 U. Chelladurai and S. Pandian

12. Harliman R, Uchida K (2018) Data- and algorithm-hybrid approach for imbalanced data
problems in deep neural network. Int J Mach Learn Comput 8(3):208–213
13. Han J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, San Diego, CA, USA
14. Mohan S, Thirumalai C, Srivastava G Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2923707
15. Alfian G, Syafrudin M, Ijaz M, Syaekhoni M, Fitriyani N, Rhee J (2018) A personalized
healthcare monitoring system for diabetic patients by utilizing BLE-based sensors and real-time
data processing. Sensors 18(7):2183
16. Lemaitre G, Nogueira F, Aridas CK (2017) Imbalanced-learn: A Python toolbox to tackle the
curse of imbalanced datasets in machine learning. J Mach Learn Res 18(1):559–563
17. Nai-arun N, Moungmai R (2015) Comparison of classifiers for the risk of diabetes prediction. Procedia Comput Sci 69:132–142
18. UCI Machine Learning Repository (2015) Chronic_Kidney_Disease Data Set. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
19. Wu H, Yang S, Huang Z, He J, Wang X (2018) Type 2 diabetes mellitus prediction model based
on data mining. Inform Med Unlocked 10:100–107
20. Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2019) Development of DPM based on ensemble learning approach for diabetes and hypertension. IEEE Access 7:144777–144787. https://doi.org/10.1109/ACCESS.2019.2945129
21. Juarez-Orozco LE, Martinez-Manzanera O, Nesterov SV, Kajander S, Knuuti J (2018) The machine learning horizon in cardiac hybrid imaging. Springer Open Eur J Hybrid Imag. https://doi.org/10.1186/s41824-018-0033-3
22. Alarsan FI, Younes M (2019) Analysis and classification of heart diseases using heartbeat features and machine learning algorithms. J Big Data 6:81. https://doi.org/10.1186/s40537-019-0244-x
23. Golino H (2013) Women's dataset from the 'predicting increased blood pressure using machine learning' study. Figshare. [Online]. Available: https://doi.org/10.6084/m9.figshare.845664.v1
24. Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, Laramie JM, Mardekian
J, Piper BA, Willke RJ, Rublee DA (2016) Reverse engineering and evaluation of prediction
models for progression to type 2 diabetes: an application of machine learning using electronic
health records. J Diabetes Sci Technol 10(1):6–18
25. Sakr S, Elshawi R, Ahmed A, Qureshi WT, Brawner C, Keteyian S, Blaha MJ, Al-Mallah MH
(2018) Using machine learning on cardiorespiratory fitness data for predicting hypertension:
The Henry Ford ExercIse Testing (FIT) project. PLoS ONE 13(4) (Art. no. e0195344)
26. Sun J, McNaughton CD, Zhang P, Perer A, Gkoulalas-Divanis A, Denny JC, Kirby J, Lasko T,
Saip A, Malin BA (2014) Predicting changes in hypertension control using electronic health
records from a chronic disease management program. J Amer Med Inform Assoc 21(2):337–344
27. Singh N, Singh P, Bhagat D (2019) A rule extraction approach from support vector machines
for diagnosing hypertension among diabetics. Expert Syst Appl 130:188–205
28. Calheiros RN, Ramamohanarao K, Buyya R, Leckie C, Versteeg S (2017) On the effectiveness
of isolation-based anomaly detection in cloud data centers. Concurrency Comput Pract Expert
29(18):e4169
29. Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for
balancing machine learning training data. ACM SIGKDD Explore Newslett 6(1):20–29
30. Goel G, Maguire L, Li Y, McLoone S (2013) Evaluation of sampling methods for learning from
imbalanced data. In: Huang D-S, Bevilacqua V, Figueroa JC, Premaratne P (eds) Intelligent
computing theories, vol 7995. Springer, Berlin, Germany, pp 392–401
Scene Classification of Remotely Sensed
Images using Ensembled Machine
Learning Models

P. Deepan and L. R. Sudha

Abstract Classification of remote sensing images (RSIs) is a challenging task and


has become an active research topic in the remote sensing community. Over the past six decades, a variety of machine learning algorithms, such as logistic regression (LR), K-nearest neighbours (K-NN), random forest (RF), support vector machine (SVM) and multilayer perceptron (MLP), have been applied to scene classification. In order to improve robustness over a single model, we introduce a hybrid approach called ensembling, which trains multiple models instead of a single one and combines the predictions from these models. Five different ensemble methods, namely AdaBoost, bagging, majority voting, weighted voting and stacking, are evaluated in this paper. For evaluating the proposed approach, we collected 8000 remote sensing images from the PatternNet dataset and found that the ensembling majority voting technique applied with MLP, SVM-linear, SVM-kernel and RF classifiers achieves 93.5% accuracy, which is higher than
the individual classifiers.

Keywords Remotely sensed images · Scene classification · Ensemble classifier ·


Voting rule · Bagging · Boosting and stack generalization

1 Introduction

Remote sensing is the process of acquiring information about an earth surface object without any physical contact, usually by means of sonar or electromagnetic radiation. Classification of objects in remote sensing images (RSI) is a fundamental and challenging task in the remote sensing community, used for earth observation applications such as agricultural monitoring, civilian monitoring in the military, city planning, crop planning, geomorphology, land use and land cover, meteorology, and soil mapping [1]. Remote sensing image information can be gathered from different sources such as unmanned aerial vehicles (UAVs), satellites and airplanes.

P. Deepan (B) · L. R. Sudha


Department of Computer Science and Engineering, Annamalai University, Chidambaram, Tamil
Nadu, India

© Springer Nature Singapore Pte Ltd. 2021 535


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_39

There are three types of satellite imagery: panchromatic, multispectral and hyperspectral images [2]. Over the past three decades, machine learning (ML) has been a success story of artificial intelligence (AI) that involves the study and development of computational models of the learning process. It has been used in various applications [3] such as image recognition, speech recognition, image classification, object detection and web page ranking.
Supervised and unsupervised learning are the main research areas in the field of machine learning. In unsupervised learning [4], model generation is based on a set of training data without target (output) values. The aim of an unsupervised learning algorithm is to organize the data in some way, to group it into clusters, or to find different ways of looking at complex data so that it appears simpler or more organized. It is mainly used in knowledge discovery, parameter determination and preprocessing. k-means clustering and the Apriori algorithm are two kinds of unsupervised learning methods.
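As a small illustration of the unsupervised setting, the sketch below clusters unlabelled points with k-means; the two-blob data is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of unlabelled 2-D points.
rng = np.random.default_rng(3)
points = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])

# k-means groups the points into k clusters purely from their geometry,
# with no target values involved.
km = KMeans(n_clusters=2, n_init=10, random_state=3).fit(points)
centres = sorted(km.cluster_centers_[:, 0])
print(centres)   # one centre lands near each blob
```

The algorithm recovers one cluster centre per blob without ever seeing a label, which is the defining property of unsupervised learning described above.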
In supervised learning [5], model generation is based on a set of training data with target (output) values. The model is a mapping between the input and target values, and is finally used to predict instances with unlabelled data. The purpose of supervised learning is to provide a model that has low prediction error on future data. Support vector machine (SVM), artificial neural network (ANN), k-nearest neighbour (k-NN), logistic regression, decision tree [6], random forest, multilayer perceptron, naive Bayes and ensemble learning are different kinds of supervised learning methods. In the past few decades, many researchers have reported the application of multiple combined classifiers (MCC) to produce a single classification of RSI images. The resulting classifier is referred to as an ensemble classifier and is usually more accurate than any of the individual or base classifiers. An ensemble classifier combines the decisions of a set of classifiers by hard voting or soft voting to classify unknown examples.
The main objective of this paper is to propose a highly efficient classification
model for the classification of remote sensing images using ensembling techniques.
Ensemble models have exhibited great potential in recent decades for improving the classification accuracy and reliability of remote sensing image scene classification. We have combined the decisions of several base classifiers for the classification of remote sensing images by the techniques, namely AdaBoost, Bagging, Hard
Voting, Soft Voting and Stack Generalization, and the results of each member classifier are evaluated. The rest of this paper is organized as follows: Sect. 2 provides a review of the related literature. Section 3 presents the architecture of the proposed work. Section 4 describes the performance metrics of the proposed approach. Section 5 discusses the experimental results and analysis. Finally, the conclusion is given in Sect. 6.

2 Related Works

In the past few decades, researchers have been working on improving the classification accuracy [7]. However, classification accuracy is affected by the quality of the training data used, and real-world data suffers from many problems that may degrade the interpretability of remote sensing data [8]. Various traditional machine learning algorithms have been applied by researchers with the aim of solving the classification problems and improving existing methods to handle complex factors. Mountrakis et al. [9] discussed the applications of
SVMs in remote sensing. In many cases, the SVM classifiers have better accuracy,
stability and robustness when compared with other classifiers such as neural network
and k-nearest neighbour. Thanh Noh et al. presented an approach that compared
land cover classification of RSIs with nonparametric classifiers such as SVM, k-
NN and random forest algorithms. In their work, they chose fourteen classes of data and compared them with the above-mentioned three classifiers. Jeevitha et al. [10]
discussed spatial information-based image classification using SVM, focusing on image classification with an active learning approach; a comparison of the classifiers' performance was made. Ayhan et al. [11] analysed various image classification methods for RSI images. In this study, the researchers compared artificial neural
networks, standard maximum likelihood classifier, and the fuzzy logic method. Based
on the comparison, ANN classification is more robust than the other two classifiers. Cavallaro et al. [12] developed an image classification model for huge amounts of data using the support vector machine; a comparison between k-NN and random forest was also presented in the study.
McInerney et al. [13] showed that the random forest classifier achieved higher accuracy than the k-NN classifier for scene classification of remote sensing images.
Hagar et al. [14] used a hybrid algorithm combining k-NN and ANN: ANN was used for testing and extracting features, while k-NN was used for image segmentation and classification. They achieved 92% accuracy with k-NN, better than the 89.2% achieved by ANN. David et al. [15] discussed nonparametric regression and classification ML algorithms for geosciences. The aim of
this approach is solving classification problems in the area of geosciences. Belgiu
et al. [16] reviewed random forest in remote sensing. The random forest classifier
can successfully handle the high dimensional data. Pal et al. [17] developed random
forest classifier for scene classification of remote sensing images that compared its
performance with SVM in terms of accuracy, training time and user-defined parameters. Zhao et al. [18] proposed a multiple-based bag-of-visual-words model (multi-BOVW), a two-phase classification method that outperformed the traditional score-level fusion-based multi-BOVW approach. Zanaty et al. [19] conducted a comparison study of SVM and multilayer perceptron for
classification of data. Blanzieri et al. [20] proposed a new variant of k-NN classifier
based on the maximum margin principle. Wang et al. [21] proposed remote sensing
image classification based on SVM and modified binary coded ACO algorithm.
Huaifei Shen et al. [7] proposed a multiple classifier system using different
538 P. Deepan and L. R. Sudha
voting methods for RSI image classification. Yunqi Miao et al. introduced an MCS
for RSI image scene classification; the MCS can successfully classify
the RSIs with higher accuracy and reduced computation cost. The above traditional
classifier techniques are not efficient, because they give low performance on training and
test data. Taking these disadvantages into consideration, our contribution is to
propose an ensemble model for scene classification of RSI images using SVM,
random forest and the multilayer perceptron.

3 Proposed Works

This section presents the feature extraction technique, the traditional classifiers used
for ensembling, which are also called base classifiers, and the proposed ensemble
classifier. In the first stage, the speeded-up robust feature technique is used to extract
features from the dataset. The extracted feature values are then fed to the base
classifiers, namely support vector machine, decision tree, logistic regression, random
forest, multilayer perceptron, Naive Bayes and k-nearest neighbours. Based on the
results, the best three base classifiers are ensembled to improve the accuracy of
RSI scene classification.

3.1 Speed-Up Robust Feature (SURF) Extraction

In computer vision, the SURF technique, developed by Bay et al. [22], is composed of
two parts, namely a local feature detector and a descriptor. The standard version of SURF
is an advancement of SIFT [23]. It is much faster and more robust, since it uses invariant
features of local similarity for image matching. SURF's initial stage is the gener-
ation of key points. The next stage defines the invariant descriptors of these key
points, which are further used for various applications such as image classification,
image registration, camera calibration, and correspondence determination between
two images of the same object. As shown in Fig. 1, the SURF feature extraction
technique consists of four stages.
The first step of the SURF process is forming the integral image, an efficient way
of calculating the sum of values over rectangular regions of an input image. It can also
be used to measure the average intensity of the image. Afterwards, points of interest
are searched for [24]. A point at which the direction of the boundary or edge of an
object changes rapidly is called a point of interest. An input image and the corresponding
points of interest are shown in Fig. 2a, b, respectively. The Harris corner
detector is a familiar and widely used corner detector, but it is not scale invariant. The
Hessian matrix with automatic scale selection solves this problem: for
point detection in an image, SURF uses a Hessian matrix approximation that is both
scale and rotation invariant. After finding the feature candidates of the image, key point
candidates are selected by the non-maxima suppression method. The last step of SURF is
Scene Classification of Remotely Sensed Images using Ensembled … 539

Fig. 1 Feature extraction technique

Fig. 2 a Sample input image, b points of interest image

to describe the obtained key points. The process ends by finding the pixel distribution
of the neighbours around each key point, which generates the SURF feature vector for an
input image.
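The integral-image step above admits a compact sketch. The following pure-Python code is an illustration only (the names `integral_image` and `box_sum` are hypothetical, not the authors' implementation): it builds the summed-area table and then reads off an arbitrary rectangular sum with four table look-ups.

```python
def integral_image(img):
    """Compute the integral image (summed-area table) of a 2-D grid.

    ii[y][x] holds the sum of all pixels img[0..y][0..x], so any
    rectangular sum can later be read off in constant time.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] via four table look-ups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

With the table in hand, box filters of the kind used by SURF's Hessian approximation cost the same regardless of their size, which is why the integral image is computed first.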

3.2 Ensemble Classifier Learning Systems

Ensemble classification [25] is also called committee-based learning. It is an effec-
tive method to increase the accuracy rate of a classification system. The ensemble
model is able to enhance weak learners that are only somewhat better than random
guessing into strong aggregated learners which can make more accurate predictions.
Moreover, combining the decisions from multiple base classifier models yields a much
stronger model and improves its performance. In most cases, an ensemble
classifier is a supervised learning algorithm, because it is trained with labeled data
and then used to predict unseen data. An ensemble model consists of two steps, namely
generating the base learners and then combining them [26]. In general,
base learners are generated from training data by a base learning algorithm,
which can be a decision tree, a neural network or any other kind of ML algorithm. Ensemble
classifier methods have already achieved success in real-world applications such
as image recognition, medical diagnosis and remote sensing. In this work, we focus
on four standard ensemble methods, namely voting, bagging, boosting and stack gener-
alization. The ensemble classifiers are mainly used for improving the performance
and increasing the prediction rate in RSI image classification.

3.2.1 Bagging Classifier

Bagging is the abbreviation of bootstrap aggregating. It is a simple and the most
familiar ensemble model in machine learning [27]. This model
combines the predictions from multiple base or weak classifiers to achieve
better prediction accuracy than a single base classifier model. The bagging
model uses a statistical resampling approach, improving a single estimate
by combining many classifiers. These models construct n trees by
bootstrap sampling of the training data and combine their predictions to produce the
final prediction. Bagging is a common procedure that can reduce the variance of
algorithms having high variance; in general, decision tree algorithms have
high variance.
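The bootstrap-plus-vote procedure described above can be sketched in pure Python. This is an illustrative toy, not the paper's implementation: a one-dimensional decision stump stands in for the high-variance base learner, and all function names (`fit_stump`, `bagging_fit`, `bagging_predict`) are assumptions.

```python
import random

def fit_stump(X, y):
    """Fit a 1-D decision stump: predict 1 when x >= threshold, else 0."""
    best = None
    for t in sorted(set(X)):
        preds = [1 if x >= t else 0 for x in X]
        err = sum(p != yi for p, yi in zip(preds, y))
        if best is None or err < best[1]:
            best = (t, err)
    t = best[0]
    return lambda x: 1 if x >= t else 0

def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train n stumps, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Majority vote over the bootstrap-trained base models."""
    votes = sum(m(x) for m in models)
    return 1 if votes * 2 > len(models) else 0
```

Because each stump sees a different bootstrap sample, their individual thresholds vary, and averaging the votes reduces that variance, which is the effect the section describes.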

3.2.2 Boosting Classifier

Boosting is a sequential method that constructs a strong classifier from a set of base
classifiers or weak classifiers [28]. Boosting is performed by constructing a
model from the training data and then creating a second model that attempts to
rectify the errors of the first; models are added until the training
set is perfectly predicted. Bagging has been shown to minimize the variance of the
classification, whereas boosting decreases both the variance and the bias. In
general, boosting can achieve higher classification accuracy than the bagging
algorithm. Boosting classifiers can be represented in the form:

Fig. 3 AdaBoost ensemble classifier


F_N(X) = \sum_{t=1}^{N} f_t(X)    (1)

where f_t(x) is the base classifier that takes the test sample set X and returns the
corresponding class, and N represents the number of base classifiers. The computational
time of boosting algorithms is greater than that of bagging algorithms. There are three kinds
of boosting ensemble algorithms, namely AdaBoost, gradient tree boosting
and XGBoost. AdaBoost, which is also called Adaptive Boosting, is one
of the most famous ensemble algorithms developed for classification.
In some cases, the AdaBoost classifier fails to improve the performance of
the base classifier due to overfitting. The boosting ensemble classifier
operations are shown in Fig. 3.
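The additive model of Eq. (1) can be illustrated with a minimal AdaBoost sketch over one-dimensional decision stumps. This is a hedged toy implementation following the standard AdaBoost recipe (weighted-error stump selection, alpha = 0.5 ln((1 - err)/err), exponential re-weighting), not the configuration used in the paper's experiments; all names are hypothetical.

```python
import math

def adaboost_fit(X, y, rounds=3):
    """Minimal AdaBoost with 1-D stumps; X: floats, y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []                      # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in set(X):
            for pol in (1, -1):        # polarity flips the stump's direction
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if (pol if x >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Increase the weight of misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * (pol if x >= t else -pol))
             for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the weighted vote, matching the additive form of Eq. (1)."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round focuses the next stump on the samples the previous stumps got wrong, which is how boosting reduces bias as well as variance.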

3.2.3 Voting Classifier

The voting classifier is one of the most familiar ensemble classifiers; it combines
or ensembles traditional classifiers based on a voting rule [29]. There
are two types of voting, namely soft voting and hard voting. In hard voting,
the final prediction is the class that receives the majority of the
votes [30]; hard voting is therefore also known as majority voting. In soft
voting, the predicted class label is the one with the highest average class
probability. For example, consider three traditional classifiers, namely SVM, k-NN
and MLP. In hard voting, with 2 votes in favour and 1 against, the score is 2 of 3, so the
sample is classified as positive. Similarly, if soft voting gives an average positive-class
probability of 0.6, the sample is likewise classified as positive. Figure 4 shows the block
diagram of the voting rule classifier.
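The 2-of-3 hard-voting and 0.6 soft-voting example above can be reproduced directly; the helper names and the sample probabilities below are hypothetical illustrations.

```python
def hard_vote(votes):
    """Majority (hard) voting over predicted class labels."""
    return max(set(votes), key=votes.count)

def soft_vote(probs, threshold=0.5):
    """Soft voting: average the predicted positive-class probabilities."""
    avg = sum(probs) / len(probs)
    label = "positive" if avg >= threshold else "negative"
    return label, avg
```

For instance, three classifiers voting `["positive", "positive", "negative"]` yield "positive" under hard voting, and probabilities such as 0.9, 0.5 and 0.4 average to 0.6, again a positive classification under soft voting.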

Fig. 4 Voting rule classifier

3.2.4 Stack Generalization

The concept of stacking was introduced by Wolpert [31], who showed that
stacking works by deducing the biases of the generalizers with respect to the learning set.
Breiman discussed stacked regression, using cross-validation to find a good
combination. The idea of stacking is to stack the predictions p1 , p2 , … pm by a linear
combination with weights ai , i ∈ 1,2, … , m:


P_stacking(x) = \sum_{i=1}^{m} a_i p_i(x)    (2)

where the weight vector a is learned by a meta-learner. In this stacking technique, a
series of individual first-level base classifiers (support vector machine, random
forest, logistic regression and multilayer perceptron) is trained on a dataset, and
a second-level meta-classifier (kernel SVM) is trained to predict new test samples
based on the predictions of the individual first-level base classifiers.
Typically, the use of stacking increases the precision of the individual base classifiers.
The stack generalization process is shown in Fig. 5.
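A minimal sketch of Eq. (2): the meta-weights a_i can be learned, for example, by gradient descent on the squared error of the stacked prediction against held-out labels. Note this plain linear meta-learner is an illustrative assumption; the paper's actual second-level meta-classifier is a kernel SVM, and all names below are hypothetical.

```python
def fit_stack_weights(base_preds, y, epochs=500, lr=0.1):
    """Learn the linear meta-weights a_i of Eq. (2) by gradient descent.

    base_preds[i][j]: prediction of base classifier i on holdout sample j
    (e.g. a positive-class probability); y[j]: the true label (0/1).
    """
    m, n = len(base_preds), len(y)
    a = [1.0 / m] * m                         # start from uniform weights
    for _ in range(epochs):
        for j in range(n):
            pred = sum(a[i] * base_preds[i][j] for i in range(m))
            err = pred - y[j]
            for i in range(m):                # squared-error gradient step
                a[i] -= lr * err * base_preds[i][j]
    return a

def stack_predict(a, preds_for_x, threshold=0.5):
    """P_stacking(x) = sum_i a_i * p_i(x), thresholded into a class."""
    score = sum(ai * pi for ai, pi in zip(a, preds_for_x))
    return 1 if score >= threshold else 0
```

A useful base classifier ends up with a larger weight than an uninformative one, which is exactly the behaviour the meta-learner is meant to capture.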

4 Performance Evaluation Metrics

Performance evaluation metrics are used to evaluate the performance of the proposed
model. Different metrics such as precision, recall,
accuracy and F1-measure are used to evaluate traditional machine learning algorithms. These
metrics are calculated using the confusion matrix shown in Fig. 6. In this matrix,
actual classes appear in the columns and predicted values in the rows. Precision
is a measure of quality or exactness, whereas recall is a measure of quantity or

Fig. 5 Stack generalization classifier

Fig. 6 Confusion matrix (columns: actual P/N; rows: prediction Y/N; cells: True Positive and False Positive in the first row, False Negative and True Negative in the second)

completeness. These two measures depend on the true positives (TP) in the confusion
matrix.
Let TP, TN, FP and FN denote true positive, true negative, false positive and
false negative, respectively. A TP is a result in which the model predicts the
positive class correctly, and a TN is a result in which the model predicts the negative
class correctly. An FP is a result in which the positive class is incorrectly predicted
by the model, and an FN is a result in which the negative class is incorrectly predicted
by the model.

4.1 Precision

The precision metric measures the proportion of positive predictions of the
proposed ensemble classification model that are correct. It is determined as the number
of true positive results divided by the number of positive results predicted by the classifier.

PRE = TruePositive / (TruePositive + FalsePositive)    (3)

4.2 Recall

Recall measures the proportion of positives that are correctly detected. It is
calculated as the number of correct positive results divided by the number of all
relevant samples.

REC = TruePositive / (TruePositive + FalseNegative)    (4)

4.3 Accuracy

The accuracy measure can be calculated by dividing the number of correct predictions
by the total number of input samples.

Acc = (TruePositive + TrueNegative) / (TruePositive + FalsePositive + FalseNegative + TrueNegative)    (5)

4.4 F1-Score

The F1-measure (the harmonic mean of precision and recall) balances the precision and recall
measures. It is calculated as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall)    (6)
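Equations (3) to (6) follow directly from the four confusion-matrix counts. The small helper below is an illustrative sketch (the function name is an assumption):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1-score from confusion counts,
    matching Eqs. (3)-(6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}
```

For example, with tp = 8, fp = 2, fn = 2 and tn = 8, all four metrics evaluate to 0.8.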

5 Results and Discussions

5.1 Dataset Description

The proposed ensemble models (AdaBoost, bagging, voting and stack generaliza-
tion) have been developed with Python and the Anaconda IDE tools. The models were applied
to the PatternNet dataset [32]. It contains 38 classes and 30,400 satellite images in
total, with each class consisting of 800 images. Each image has a size of
256 × 256 pixels in RGB colour space, and the spatial resolution ranges from
0.062 to 4.69 m. We have randomly selected ten classes for our proposed work, namely
airplane, baseball field, beach, bridge, forest, harbour, overpass, river, storage

tank and tennis court, labelled 0–9, respectively. Some sample images
from the PatternNet dataset for RSI scene classification are shown in Fig. 7; each
row shows sample images of one class. The dataset was divided into independent
training and testing sets: 80% of the dataset is used for training the proposed ensemble
model and 20% for testing.
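The 80%/20% split can be sketched as a simple shuffled partition. This is an illustration only; the function name and the fixed seed are assumptions, not the paper's code.

```python
import random

def split_dataset(samples, labels, train_frac=0.8, seed=42):
    """Shuffle a labelled dataset and split it, e.g. 80% train / 20% test."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    cut = int(len(idx) * train_frac)
    train = [(samples[i], labels[i]) for i in idx[:cut]]
    test = [(samples[i], labels[i]) for i in idx[cut:]]
    return train, test
```

Fixing the seed makes the split reproducible across runs, which matters when comparing several ensemble models on the same train/test partition.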

5.2 Experimental Analysis of Base Classifiers

In this section, we analyse the performance of the base classifiers SVM,
decision tree, logistic regression, random forest, multilayer perceptron, Naive Bayes
and k-NN. The average accuracy, precision, recall and F1-score of each base clas-
sifier were assessed and are summarized in Table 1. The support vector machine and the multi-
layer perceptron produced the highest accuracy of 92%, and random forest
took second place with 91% accuracy. The classification accuracy
of the decision tree was the lowest among all eight base classifiers, at
70.75%.
In order to improve the performance of the individual base classifiers, we
combine them with other classifiers. In our ensemble work, we have
developed the following five ensemble learning models.
1. We have used bagging with the random forest classifier.
2. With stack generalization, we have ensembled random forest, SVM-linear, MLP,
SVM-kernel and logistic regression.
3. We have applied the AdaBoost method to the random forest classifier.
4. For the weighted voting method, we have ensembled three base classifiers, namely
MLP, SVM and random forest.
5. Finally, for the majority voting method, we have ensembled the base classifiers MLP,
SVM and random forest.
Table 2 and Fig. 9 show the performance of the different ensembling techniques on
the base models, namely SVM-linear, random forest, multilayer perceptron, SVM-
kernel and logistic regression. It is inferred from the table that the ensemble models
show better performance even when the performance of an individual base classifier
is comparatively low. We also found that the majority voting technique applied to
MLP, SVM-linear and random forest gives the highest accuracy of 93.5%, while the bagging
technique applied to random forest gives the lowest accuracy of 91.25%; nevertheless, it is
still better than all five base classifiers used. Confusion matrices of the ensemble models
are shown in Fig. 8.

Fig. 7 Sample dataset images of PatternNet


Table 1 Performance analysis of base classifiers


S. No. Base classifiers Accuracy Precision Recall F1-score
1 SVM-linear 92.0 92 92 92
2 Random forest 91 91 91 91
3 MLP 92 91 91 91
4 SVM-kernel 90.75 91 91 91
5 Logistic regression 90.25 90 90 90
6 K-NN 83.25 87 83 82
7 Naive Bayes 78.5 81 79 78
8 Decision tree 70.75 71 71 71

Table 2 Performance analysis of ensemble classifiers


S. No. Base classifiers Accuracy Precision Recall F1-Score
1 RF + AdaBoost 92.25 92 92 92
2 RF + Bagging 92 91 91 91
3 MLP + RF + SVM + Majority Voting 93.50 94 94 93
4 MLP + RF + SVM + Weighted Voting 93 93 93 93
5 RF + SVM + SVM-k + LR + MLP + Stack Generalization 92 92 92 92

6 Conclusion

Scene classification in remote sensing images is a challenging problem because
objects of the same category often have a diverse appearance. To get good results,
we have introduced a hybrid approach called ensembling, which is nothing but
training multiple models instead of a single model and combining the predictions from
these models. Five different ensemble methods, namely AdaBoost, bagging, majority
voting, weighted voting and stacking, are evaluated in this paper. We have observed
that majority voting shows the best performance and bagging the least
among the proposed ensemble models, though the latter is still better than the best base classifier models,
SVM and MLP. However, there is still scope for improvement. To handle large image
datasets, deep learning techniques can be ensembled instead of machine learning tech-
niques, reducing the complexity and improving the classification
performance.

Fig. 8 Confusion matrix for our ensemble model


Fig. 9 Comparative analysis of ensemble model

References

1. Cheng G, Han J, Lu X (2017) Remote sensing image scene classification: benchmark and state
of the art. In: Proceedings of the IEEE, pp 1–19
2. Ghamisi P, Plaza J, Chen Y, Li J (2017) Advanced supervised classifiers for hyperspectral
images: a review. IEEE Geosci Remote Sens 5:1–23
3. Maxwell AE, Warner TA, Fang F (2018) Implementation of machine-learning classification in
remote sensing: an applied review. Int J Remote Sens 2784–2817
4. Cheriyadat AM (2014) Unsupervised feature learning for aerial scene classification. IEEE
Trans Geosci Remote Sens 1–12
5. Deepan P, Sudha LR (2020) Object classification of remote sensing image using deep convo-
lutional neural network. In: The cognitive approach in cloud computing and internet of things
technologies for surveillance tracking systems, pp 107–120. https://doi.org/10.1016/B978-0-
12-816385-6.00008-8
6. Akbulut Y, Sengur A, Guo Y, Smarandache F (2017) NS-k-NN: neutrosophic set-based k-
nearest neighbors classifier. Symmetry 9:1–12
7. Deepan P, Sudha LR (2019) Fusion of deep learning models for improving classification
accuracy of remote sensing images. Mech Continua Math Sci 14:189–201
8. Deepan P, Sudha LR (2020) Remote sensing image scene classification using dilated
convolutional neural networks. Int J Emerg Trends Eng Res 8(7):3622–3630
9. Mountrakis G, Im J, Ogole C (2011) Support vector machines in remote sensing: a review.
ISPRS J Photogram Remote Sens 247–259
10. Thanh Noi P, Kappas M (2018) Comparison of random forest, k-nearest neighbor, and support
vector machine classifiers for land cover classification using sentinel-2 imagery. J Sci Technol
Sens 1–20
11. Jeevitha P, Ganesh Kumar P (2014) Spatial information based image classification using support
vector machine. Int J Innov Res Comput Commun Eng 14–22
12. Ayhan E, Kansu O (2012) Analysis of image classification methods for remote sensing
experimental techniques. Soc Exp Mech 18–25
13. Cavallaro G, Riedel M, Richerzhagen M (2013) On understanding big data impacts in remotely
sensed image classification using support vector machine methods. IEEE J Select Top Appl
Earth Observ Remote Sens 1–13
14. McInerney DO, Nieuwenhuis M (2007) A comparative analysis of kNN and decision tree
methods for the irish national forest inventory. Int J Remote Sens 4937–4955

15. Hagar HME, Mahmoud HA, Mousa FA (2015) Bovines muzzle classification based on machine
learning techniques. Procedia Comput Sci 65:864–871
16. David J, Alavi H, Gandomi H (2015) Machine learning in geosciences and remote sensing.
Geosci Front 1–9
17. Belgiu M, Drăguţ L (2016) Random forest in remote sensing: a review of applications and
future directions. ISPRS J Photogram Remote Sens 24–31
18. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens
217–222
19. Zhao L, Tang P, Hue L (2016) Feature significance-based multi bag-of-visual-words model for
remote sensing image scene classification. J Appl Remote Sens 1–9
20. Zanaty EA (2012) Support vector machines (SVMs) versus multilayer perceptron (MLP) in
data classification. Egypt Inf J 177–183
21. Blanzieri E, Melgani F (2008) Nearest neighbor classification of remote sensing images with
the maximal margin principle. IEEE Trans Geosci Remote Sens 1804–1811
22. Wang M, Wan Y, Ye Z (2017) Remote sensing image classification based on the optimal
support vector machine and modified binary coded ant colony optimization algorithm. Inf Sci
1–22
23. Miao Y, Hainan WH, Zhang B (2018) Multiple Classifier System for Remote Sensing Images
Classification, pp 491–501, Springer Nature Switzerland AG
24. Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. In: Computer
vision–ECCV, pp 404–417, Springer, Berlin
25. Krig S (2014) Interest point detector and feature descriptor survey. In: Computer vision metrics,
pp 217–282, Springer, Berlin
26. Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction
techniques in machine learning. In: International conference on science and information, pp
372–378
27. Han M, Zhu X, Yao W (2012) Remote sensing image classification based on neural network
ensemble algorithm. Int J Neuro Comput 33–138
28. Chen Y, Dou P, Yang X (2017) Improving land use/cover classification with a multiple classifier
system using adaboost integration technique. J Remote Sens 1–20
29. Galar M, Fernandez E, Bustince H, Herrera F (2012) A review on ensembles for the class imbal-
ance problem: bagging, boosting and hybrid-based approaches. IEEE Trans Comp Package
Manuf Technol 463–484
30. Dietterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems,
pp 1–15, Springer, Berlin
31. Shen H, Lin Y, Tian Q (2018) A comparison of multiple classifier combinations using different
voting-weights for remote sensing image classification. Int J Remote Sens 3705–3722
32. Kuncheva LI, Rodríguez JJ (2012) A weighted voting framework for classifiers ensembles. Int
J Knowl Inf Syst 1–17
33. Wolpert DH (1992) Stacked generalization. Neural Networks 5(2):241–259
34. PatternNet Dataset is available at https://sites.google.com/view/zhouwx/dataset
35. Mohandes M, Deriche M, Aliyu S (2018) Classifiers combination techniques: a comprehensive
review. IEEE Access 1–14
Fuzziness and Vagueness in Natural
Language Quantifiers: Searching
and Systemizing Few Patterns
in Predicate Logic

Harjit Singh

Abstract The study of quantification has become a significant area of research
among logicians, philosophers, linguists, computer scientists, and others. Within
quantification, the form of generalized quantifiers is commonly found in natural
language. However, another type of quantifier, called fuzzy or vague,
always needs some sort of proper treatment and resolution. Many proposals,
such as 'the fuzzy inclusion relation' and 'monotonicity in the ith argument', have
been introduced to deal with this issue. Here, we focus especially on a natural
language (Punjabi) to study a few important fuzzy quantifier cases that are not found
in Hindi. We select 29 predicates and compare them in two data sets with the
fuzzy quantifiers (ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]; and ਮਸਾ-ਕੁ [məsɑ-kʊ]),
and suggest a mapping plan. After comparing both languages
(Punjabi and Hindi), we claim that Punjabi has comparatively strong quantification
and that it can be discussed and analyzed in a predicate logic.

Keywords Quantifiers · Fuzzy forms · Punjabi · Hindi · Predicate logic

1 Introduction

The term ‘quantifier’ is derived from the Latin word ‘quantitas,’ which conveys
a sense of quantity. According to Aristotle, quantifiers are basically expressions of
universal and existential quantification. From the beginning, logicians sought not only to define
quantifiers but also to place them in a grammatical framework by analyzing new
quantification structures. The topic has occupied a significant place in traditional and
modern logic studies. In general, quantifiers are defined as quantity-carrying
words in the linguistics discipline. In fact, they are treated as nouns, names, and noun
phrases, and on the other hand, they appear with verbs or verb phrase structures.
They are generally symbolized with ‘e’ to indicate entities like nouns and noun
phrases. In English, noun phrases are defined under ‘generalized quantifiers’ that

H. Singh (✉)
Indira Gandhi National Tribal University, Amarkantak, Madhya Pradesh 484887, India

© Springer Nature Singapore Pte Ltd. 2021 551

E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_40

have specific properties. Further, the study of noun phrases usually brings up the
concept of the types e, ⟨e, t⟩ and ⟨⟨e, t⟩, t⟩ to analyse English noun phrases [1].
On theoretical assumptions about quantifiers in the context of a natural language, we
find that there is a model M = (E, [[ ]]), where E is the domain of
discourse and [[ ]] indicates the assignment function for quantifiers in the model
M. Here, it is important to notice that the
denotation property of the nominal category is represented through the quantifier [2].
This paper has seven sections in total. Section 1 discusses the introductory
part of quantifiers. Section 2 surveys quantifiers and fuzzy quantifiers in gen-
eral. Section 3 presents the aims and objectives of the study. Section 4 deals with a
contrastive study of quantifiers in both Punjabi and Hindi. Section 5 describes the
vague nature of Punjabi quantifiers and predicate logic. Section 6 presents the
results of the study. Section 7 concludes that the nature and function of fuzzy
quantifiers is significant not only for a language-specific purpose but also for gen-
eralizing the mapping plan and strategies.

2 Related Works

The idea of ‘quantification’ is related to Aristotle, who brought the ‘syllogism’
system to define expressions like (all, not, not all, some) in a natural language. It
is described by (Fig. 1).1
Here, Quantifier1, Quantifier2, and Quantifier3 stand in opposition; later on,
the proposals of opposition and contradiction attracted the logicians and
mathematicians who brought new logical investigations and founded
modern logic [3]. It has been investigated that variables like x, y, z may be filled
up with names and can represent the objects within a discourse. We find
that ‘x’ is a variable that can be replaced with a name like ‘cat,’ ‘dog,’ or ‘horse’ in an animal
discourse. Further, such information can also be restructured with the universal
quantifier ‘∀’ in ‘anything,’ ‘everything,’ and ‘all things’ contexts.

‘x’ … ‘x’  [variable]
‘x’ … ‘cat’/‘dog’/‘horse’  [names]
  [+animal living objects]
‘x’ … ∀x
x is a cat/dog … ∀x, x is a cat/dog

1
Note that the first combination (A and E) shows the universality of quantifiers, and the next one
(I and O) gives the combination of particular quantifiers. Affirmation within quantifiers is iden-
tified with the group of A and I, while the E and O group becomes fused to create negative
quantifiers. This determines the relations, particularly ‘the binary relation between the sets’ of
quantifiers.

Fig. 1 An opposition within quantifiers

Similarly, the existential quantifier ‘∃’ contains ‘some’ for certain kinds of senses,
defining the variable x in the form of an object, either animal or human, in a given
discourse. It is interesting to check the availability of animal objects with such a
quantifier [4, pp. 57–60].

‘x’ is a horse
∃x (x is a horse)
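Over a finite domain of discourse, the truth of ∀x P(x) and ∃x P(x) can be checked mechanically. The following sketch mirrors the horse example above; the three-animal toy model is an assumption for illustration, not data from the study.

```python
def forall(domain, predicate):
    """Universal quantification: ∀x P(x) holds when P is true of every entity."""
    return all(predicate(x) for x in domain)

def exists(domain, predicate):
    """Existential quantification: ∃x P(x) holds when P is true of some entity."""
    return any(predicate(x) for x in domain)

# A toy domain of discourse for the animal example (hypothetical).
animals = ["cat", "dog", "horse"]
```

Here `exists(animals, lambda x: x == "horse")` is true, while the universal reading of the same predicate fails, matching the ∃/∀ contrast in the text.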

Linguistically, we know that quantifiers contain syntactic and semantic infor-
mation. We can say that these are two important elements used to define
quantifiers.
Q = Q[syntactic]: DETs as expressions
Q = Q[semantic]: DETs as (n + 1)-ary quantifiers, n ≥ 1

In standard predicate logic, we find that DETs such as (every, a, the)
are determiners that define binary relations in a natural language. Sometimes,
we have seen that DETs behave like denotations [5]. Each lexical item belonging
to a word class like (noun, verb, adjective, adverb, preposition, etc.) carries a
specific meaning in a natural language. Due to the various contexts of a single lexical
item, it may sometimes create ambiguity in a discourse. Such situations may
appear fuzzy or vague within a logic, and they can be dealt with using
fuzzy sets2 [6].

2
The class of objects together with degrees of membership may generally be discussed under a fuzzy set.
We generally assume that a set is a combination of objects and things that carry the value either 0
or 1, and we also discuss union, intersection, and complement-like features under a fuzzy set.
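The footnote's union, intersection and complement operations are standardly defined pointwise on membership grades as max, min and 1 − μ (Zadeh's definitions). The sketch below is illustrative; the membership dictionaries are hypothetical examples, not data from the study.

```python
def fuzzy_union(mu_a, mu_b):
    """Membership of A ∪ B: pointwise max of the two membership grades."""
    keys = set(mu_a) | set(mu_b)
    return {x: max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in keys}

def fuzzy_intersection(mu_a, mu_b):
    """Membership of A ∩ B: pointwise min of the two membership grades."""
    keys = set(mu_a) | set(mu_b)
    return {x: min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in keys}

def fuzzy_complement(mu_a):
    """Membership of the complement: 1 minus each membership grade."""
    return {x: 1.0 - m for x, m in mu_a.items()}
```

Unlike a crisp set, an element can belong to a fuzzy set to degree 0.3 or 0.9, which is exactly what makes these sets suitable for the vague lexical items the section describes.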

3 Objectives of the Study

• To survey the quantifiers and fuzzy quantifiers in general
• To present the comparative study of Punjabi and Hindi quantifiers
• To collect the data of fuzzy quantifiers in Punjabi
• To investigate the place of quantifiers in a predicate logic
• To develop the mapping procedure for fuzzy quantifiers in a predicate logic.

4 Quantifiers in Punjabi and Hindi

In this section, we first compare quantifiers in Punjabi and Hindi.
Secondly, we observe a phonetic matter in the context of Punjabi
only and propose an additional inventory of a few fuzzy quantifiers that are
not similar to Hindi.

4.1 Punjabi Quantifiers

Punjabi is a modern Indo-Aryan language, written in the Gurumukhi script, which
has developed a tonal system. In general, it follows the subject–object–verb
(SOV) word order, with a lot of scope for scrambling. Morphologically, it
permits derivational and inflectional forms. It has singular and plural number and
masculine and feminine gender. It has open (noun, verb, adjective, adverb,
pronoun) and closed (postposition, conjunction, interjection, etc.) word classes.
It also has certain types of quantifiers, some of which have a fuzzy nature in
Punjabi. Tables 1, 2, and 3 show certain types of quantifiers in Punjabi (Table 1).
In Punjabi discourse, we do not have exact number and gender features related
to such quantifiers. On the other hand, the plural markers ‘ਏ’ and ‘ਆਂ’ cannot
be applied to all quantifier forms, as represented in Table 2.
Table 2 shows that only a few quantifier forms, namely (ਥੌੜਾ → ਥੌੜੇ/
ʈhoɽɑ → ʈhoɽe; ਬਹੁਤ → ਬਹੁਤੇ/bohətə → bohəte; and ਕਈ → ਕਈਆਂ/kəi → kəiɑ),
appear with the ‘ਏ’ and ‘ਆਂ’ plural markers in Punjabi. Along with such forms, Punjabi is
also rich in a few other quantifier forms that are vague and fuzzy; see Table 3.
In Table 3, we may argue that the above quantifiers (ਮਾੜਾ-ਜਾ/[mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ/
[ɟəmɑ-i]; ਮਸਾ-ਕੁ/[məsɑ-kʊ]; and ਭੌਰਾ-ਕੁ/[pòrɑ-kʊ]) appear only in Punjabi.3

3
Note that such quantifiers (ਮਾੜਾ-ਜਾ/[mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ/[ɟəmɑ-i]; ਮਸਾ-ਕੁ/[məsɑ-kʊ]; and ਭੌਰਾ-ਕੁ/
[pòrɑ-kʊ]) may be considered fuzzy quantifiers in Punjabi. Mostly, linguists have thought that
Punjabi and Hindi are similar; however, they ignore the language-specific aspect here. Regarding
these quantifiers in Punjabi, we note that Hindi speakers do not have any transliteration for them.

Table 1 Punjabi quantifiers

Table 2 Quantifiers with plurality markers ‘ਏ’ and ‘ਆਂ’

4.2 Hindi Quantifiers

Hindi is a New Indo-Aryan (NIA) language with the Devanagari script. It has a
subject–object–verb word order and is rich in open and closed word classes. It
has singular and plural number, masculine and feminine gender properties,
and types of cases that are marked on nominal classes. It is sometimes found similar
to Punjabi; however, there are many differences between them. Like Punjabi, it
also has an inventory of a few quantifiers. Table 4 shows the list [7].
Table 4 shows that 10 types of quantifiers are found in Hindi. Of these,
हर/hər [every] and कुछ/kʊʃə [something] are types of universal and existential
quantifiers. We can compare quantifiers like थोड़ा/ʈhoɽɑ [a little]; लगभग/ləɡəbhəɡə [approxi-
mately]; and सेर/ser [approximately two pounds in weight] with Punjabi.

Table 3 Comparison of fuzzy quantifiers in Punjabi and Hindi

As we have already seen in Table 3, four types are considered fuzzy
quantifiers. They may or may not be similar to Hindi quantifiers, or it is possible that
the difference arises from phonetics only. Here, we must argue that
Punjabi already shares the phonetics and usage of these Hindi quantifiers.
However, the fuzzy quantifiers discussed in Table 3 are addi-
tional forms in Punjabi; they cannot be assumed even phonetically and are not
possible from a translation point of view in Hindi. Thus, we can say that even though the
semantics could be matched for fuzzy quantifiers in both languages, Hindi
quantifiers like थोड़ा/ʈhoɽɑ [a little]; लगभग/ləɡəbhəɡə [approximately]; and सेर/
ser [approximately two pounds in weight] cannot be compared with the Punjabi
fuzzy quantifiers, because it is not a matter of phonetics only. In fact, it is a
language-specific matter that grants such variety only to Punjabi.

5 Vague Nature for Punjabi Quantifiers: Some Investigations in a Predicate Logic

We have argued that the list of Punjabi quantifiers in Table 3 is not found anywhere
in the Hindi language. All of them are vague in nature. Before discussing them, we
would like to begin with generalized quantifiers such as ‘ਹਰ’ (every) and ‘ਕੁੱਝ’
(some) in Punjabi. Table 5 shows them in relation to predicate logic.
Table 5 demonstrates that the variable ‘x’ stands for a man, and it also repre-
sents the property of honesty in the contexts of both the universal quantifier and the
existential quantifier. On the other hand, it is significant to notice that it is bound
by both quantifiers in Punjabi [8].
Punjabi has a few quantifiers (like ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ
[ɟəmɑ-i]; and ਮਸਾ-ਕੁ [məsɑ-kʊ]) that are generally regarded as fuzzy. They are

Table 4 Hindi quantifiers

comparatively important in spoken discourse; however, they have attracted less attention.

We can try to generalize them according to predicate logic.4
Table 6 shows the usage of fuzzy quantifiers in simple sentences. We
find that the last sentence, ਰਮੇਸ਼ ਮਸਾ-ਕੁ ਨਾਰਾਜ਼ ਹੈ, is less acceptable with the ਮਸਾ-ਕੁ
[məsɑ-kʊ] quantifier, because it does not have a sense similar to the other quantifiers.
Moreover, the name Ramesh and the state of being annoyed are used
in all the sentences. We can treat them as an individual constant and a predicate con-
stant, respectively: Ramesh is the individual constant and annoyed is the predicate
constant. In other words, annoyed is a property of the individual
(Ramesh). See Table 7.
Table 7 indicates that ‘Ramesh’ is the individual constant and ‘a bit annoyed’ is
the predicate constant. In predicate logic, the state of being annoyed is treated as a property
of an individual, and the individual can be replaced with a variable ‘x’. For a predicate
variable, we can use the symbol U, represented as follows.

4
In predicate logic, both definites and indefinites play a significant role, and they are generally
identified with the type symbols <e, t/t> so that each and every expression is considered a type [10].

Table 5 ‘ਹਰ’ and ‘ਕੁੱਝ’ in predicate logic

R(Ab/c) = (a bit/completely annoyed is a property of) R
R(x) = (x does not denote a particular individual; it can be anyone)
U(x) = (U is a predicate variable)
R(x) → A(x) = (Ramesh is x, and the annoyed property is also associated with x).
Thus, it is clear that both simple and complex sentences contain predicates.
They are combinations of individual constants and predicate constants that can
be substituted with variables, and they also admit universal quantifiers, exis-
tential quantifiers, and fuzzy quantifiers. The purpose is that this gives a formal
representation of predicates, allowing a concept to be understood and analyzed in a
systematic way [9: 58–71].

5.1 Structuring Fuzzy Quantifiers

We have selected a total of 29 verbs (in their basic, imperfective, and perfective forms) to
structure such fuzzy quantifiers in Punjabi. We may notice the difference between
all four types (ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]; and ਮਸਾ-
ਕੁ [məsɑ-kʊ]) when they occur with predicates. See Table 8.
Table 8 presents the data set for Punjabi fuzzy quantifiers in relation to
a number of predicates. We find that only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] quantifier
appears successfully with all the listed predicates, while ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] shows
no correspondence with eight of the predicates in the
table. Here, we can also see grammatical changes in the verbal predicates due to
the direct influence of the fuzzy quantifiers.
Table 9 presents the grammatical (present and past perfect) informa-
tion and negation contexts of the fuzzy quantifiers in Punjabi. As already seen
in Table 8, only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] quantifier qualifies in both perfective and

Table 6 Fuzzy quantifiers

Table 7 ਰਮੇਸ਼ ਭੌਰਾ-ਕੁ/ਮਾੜਾ-ਜਾ/ਜਮ੍ਹਾ-ਈ/ਮਸਾ-ਕੁ ਨਾਰਾਜ਼ ਹੈ in a predicate logic

negative contexts, while the remaining quantifiers do not fit the grammatical
information; we find them unadjustable and out of context
here.

5.2 Mapping Plan for Fuzzy Quantifiers

Based on the above data set, we may assume a few steps to generalize and map the
quantifiers in a natural language like Punjabi. The following steps may be
thought of as an algorithm.
S1 Input: a simple sentence or group of sentences in a text/discourse
S2 Search: manually find all sorts of quantifiers, with special focus on fuzzy
forms
S3 Slots: keep the initially identified quantifiers in separate slots a, b, c, etc.
S4 Replacement: go to step 1 and replace each quantifier with another in the same text/
discourse
S5 Check the results: note the results before and after the replacement
S6 Single slot: after getting satisfactory results, keep only one slot for the
resulting quantifiers (Fig. 2).
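The replacement step (S4) can be sketched in code. The following is a minimal illustration, using romanized stand-ins for the four Punjabi fuzzy quantifiers (an assumption for readability; the actual acceptability judgment in S5 remains a manual, native-speaker task):

```python
# Romanized stand-ins for the four fuzzy quantifiers discussed above.
FUZZY_QUANTIFIERS = ["bhaura-ku", "mara-ja", "jamha-i", "masa-ku"]

def quantifier_variants(sentence, found):
    """S4: replace the quantifier found in a sentence with each of the
    other fuzzy quantifiers, producing candidate sentences whose
    acceptability is then judged manually (S5)."""
    return [sentence.replace(found, q)
            for q in FUZZY_QUANTIFIERS if q != found]

# Romanized version of the example sentence from Tables 6 and 7.
variants = quantifier_variants("ramesh mara-ja naraz hai", "mara-ja")
```

Quantifiers that survive replacement in every tested context would then be collected into the single slot of step S6.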

Table 8 Predicates with fuzzy quantifiers

6 Discussions and Results

Vagueness and the fuzzy character of quantifiers are commonly found in natural
language. Sometimes they lend monotone effects to spontaneous speech;
however, they are not always widely recognized because of certain contextual linkages. In the pre-
vious sections, we have seen the availability of a few fuzzy quantifiers with predicates
and their occurrences in the grammar. Secondly, we have also intuitively tried to draw
a mapping plan to search for them and finally keep them in only one slot. When we
look carefully at Table 8, we find that the four selected categories of fuzzy
quantifiers give different results. See Fig. 3.
Figure 3 shows that ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] occurs with all 29 predicates (29/29 = 100%).
Secondly, ਮਸਾ-ਕੁ/ਈ [məsɑ-kʊ/ɪ] has (26/29 = 89.65%)

Table 9 Grammatical sketches of fuzzy quantifiers

successfully appeared with the predicates. ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] receives the
lowest percentage (20/29 = 68.96%) in relation to the predicates.
Further, we have studied these four fuzzy quantifiers in grammatical con-
texts. We selected four predicates (ਚੱਲੀਂ/tʃəli:; ਖਾਈਂ/khɑi:; ਦਈਂ/dəi:; and ਆਈਂ/ɑ:i),
and we find that ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] does not tolerate (perfect or negative) contexts
and gives a zero result, whereas the other quantifiers appear and adjust with the
predicates. Study Figs. 4, 5, 6, and 7.
Figure 4 shows that the selected predicates can be mapped onto ਮਾੜਾ-ਜਾ
[mɑɽɑ-ɟɑ]. It appears with every predicate form (4/4 = 100%) and is
fully adjustable.
Figure 5 shows that ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] appears only with the predicates ਚੱਲੀਂ/tʃəli:,
ਖਾਈਂ/khɑi:, and ਦਈਂ/dəi:, giving a (3/4 = 75%) result. On the
other hand, it does not occur with the predicate ਆਈਂ/ɑ:i.
Figure 6 suggests that ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] cannot appear or adjust with the
above-mentioned four predicates, so it gives a (0/4 = 0%) result.
Figure 7 shows that ਮਸਾ-ਕੁ/ਈ [məsɑ-kʊ/ɪ] can appear with all of them except ਖਾਈਂ/
khɑi:. It gives the same result (3/4 = 75%) as in the case of ਭੌਰਾ-ਕੁ [pòrɑ-kʊ].
In this way, we have found that only ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ], a type of fuzzy quantifier,
qualifies with (29/29 = 100%) and (4/4 = 100%) results in both of the selected

Fig. 2 Steps of an algorithm

data sets of Tables 8 and 9. In brief, Figs. 8 and 9 summarize the same results in a
more concrete way.
Both Figs. 8 and 9 clearly show the data entries of each fuzzy quantifier
against each predicate in total numbers.

Fig. 3 Fuzzy quantifiers in predicates

Fig. 4 ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]



Fig. 5 ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]

Fig. 6 ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]



Fig. 7 ਮਸਾ-ਕੁ/ਈ [məsɑ-kʊ/ɪ]

Fig. 8 Fuzzy quantifiers with few predicates



Fig. 9 Predicates and fuzzy quantifiers (total in numbers)

7 Conclusion and Future Endeavors

The study of fuzzy quantifiers in predicate logic goes back a long way. It relates
to linguistics, philosophy, mathematics, computer science, and engi-
neering. While discussing natural languages, we primarily focused on Punjabi
and turned to Hindi to generalize fuzzy quantifier forms like ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ];
ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]; and ਮਸਾ-ਕੁ [məsɑ-kʊ], and we
have prepared a mapping plan. It should be remembered that this is a first, initial-
level observation for Punjabi, so the selection of the data sets and the mapping plan
may be revised in future. Based on two types of data sets, (1) a data set with 29
predicates and 4 fuzzy quantifiers and (2) a data set with 4 predicates and 4 fuzzy
quantifiers, we have found that only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] type successfully
achieves 100% results, whereas the ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] type does not appear a
single time with any predicate in the second data set.

Acknowledgements I would like to thank Vanita Chawadha (a Ph.D. scholar at JNU), who
helped me compare the data sets and obtain the results.

References

1. Bach E, Jelinek E, Kratzer A, Partee BB (eds) (2013) Quantification in natural languages, vol
54. Springer Science and Business Media
2. Gutiérrez-Rexach J (ed) (2003) Semantics: generalized quantifiers and scope, vol 2. Taylor
and Francis

3. Peters S, Westerståhl D (2006) Quantifiers in language and logic. Oxford University Press
4. Epstein RL (2011) Classical mathematical logic: the semantic foundations of logic. Princeton
University Press
5. Wick M, Gabbay DM, Guenthner F (2007) Handbook of philosophical logic, vol 14
6. Glöckner I (2008) Fuzzy quantifiers: a computational theory, vol 193. Springer
7. Kachru Y (2006) Hindi, vol 12. John Benjamins Publishing
8. Barwise J, Etchemendy J, Allwein G, Barker-Plummer D, Liu A (2000) Language, proof and
logic. CSLI publications
9. Allwood J, Andersson GG, Andersson LG, Dahl O (1977) Logic in linguistics. Cambridge
University Press
10. Coppock E, Champollion L (2019) Invitation to formal semantics. Manuscript, Boston
University and New York University. eecoppock.info/semantics-boot-camp.pdf
An Attempt on Twitter ‘likes’ Grading
Strategy Using Pure Linguistic Feature
Engineering: A Novel Approach

Lovedeep Singh and Kanishk Gautam

Abstract Twitter is one of people’s favorite social media platforms, used for
sharing thoughts on different aspects of life, be they emotional (‘love’, ‘motivation’,
‘dedication’), business (‘marketing’, ‘startup’, ‘blogging’), or health (‘gym’,
‘fitness’, ‘food’) and similar areas. People follow hashtags for topics in their interest.
Agreement with a tweet can be measured by likes or retweets. This paper works with
pure linguistic features rather than embeddings in a vector space via TFIDF
or Doc2Vec: it deals with a collection of tweets on such hashtags and
classifies, in the form of a grade, the level of likes a tweet will get using pure
linguistic features.

Keywords Twitter · Likes · Hashtags · Popularity · Agreement · Natural language
processing (NLP) · Classification using pure linguistic features

1 Introduction

With the advent of social media platforms, most of the population engages with these
platforms in day-to-day life. Twitter has been one of the most preferred platforms
for expressing opinions about primitive spheres of daily life. Twitter is considered
more convenient by people owing to its easy and light user interface, with little to
no complex features on offer. People use Twitter to connect in real time with people of similar
interests. Hashtags are added to a tweet so that community members
can take part in the conversation. Twitter has also been preferred as a platform for
marketing and has recently been utilized by the education sector. People have the
choice to retweet, like, or comment on a tweet. Retweets are often used to promote
content on Twitter.
Previous literature has mostly focused on specific areas like politics, celebrities,
and sports. Its aim has been to study the popularity of a tweet with the retweet
as the popularity measure. These studies have considered all aspects like user details, which
L. Singh (B) · K. Gautam


Computer Science and Engineering, Punjab Engineering College, Chandigarh, India

© Springer Nature Singapore Pte Ltd. 2021 569


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_41
570 L. Singh and K. Gautam

include user followers and whether the user is verified, and tweet details such as the time of the tweet,
comments, likes, and the text of the tweet (either via a vector representation or linguistic features),
to train a machine learning model for predicting retweets. This paper tries to use only
linguistic features of the text, without going into vector space representations, to predict
the possible range of the number of likes a tweet will get. It focuses on tweets related
to primitive aspects encompassing emotional, health, and other mundane areas. We
collect recent tweets from January 2019 to December 2019 and use classification algorithms to
predict the like_grade (the bracket of the number of likes) for a tweet.
In this paper, we try different combinations of features and classification
algorithms on the dataset.

2 Related Work

The popularity of a tweet can be measured using the number of ‘retweets’, ‘likes’, and
‘comments’. Earlier work [1] has predicted popularity using ‘retweets’
as the measure of popularity. Researchers [2] have also tried to analyze user influence
on Twitter using various complex techniques. We use ‘likes’ as the agreement
measure: ‘retweets’ can be more attributed to the promotion of content, whether in
favor, against, or simply for business purposes. People retweet
posts to introduce their followers to someone new; a retweet is used when the goal
is to amplify a message. People like a tweet when they are in favor
of it. It appears that ‘likes’ is the more appropriate indicator of agreement.

3 Dataset

We kept the data limited to recent tweets from one year (January 2019 to December 2019) rather than
older tweets, to keep up with the latest writing style and trends on Twitter. We did
not restrict our domain to any specific area, to keep the study generic and open.
Data have been collected based on common hashtags and are not specific to any sector.

3.1 Collection

We collected a total of 96 K tweets based on the following 65 hashtags—


( ‘travel’, ‘marketing’, ‘giveaway’, ‘win’, ‘quote’, ‘art’, ‘startup’, ‘socialmedia’,
‘Love_Yours_’, ‘JIMIN’, ‘BTS’, ‘POTUS’, ‘WorldCup’, ‘WorldPizzaDay’, ‘Game-
ofThrones’, ‘TBT’, ‘Blogging’, ‘love’, ‘followback’, ‘Twitterers’, ‘tweegram’,
‘photooftheday’, ‘20likes’, ‘amazing’, ‘smile’, ‘follow4follow’, ‘like4like’, ‘look’,
‘instalike’, ‘igers’, ‘picoftheday’, ‘food’, ‘instadaily’, ‘instafollow’, ‘followme’,
‘girl’, ‘instagood’, ‘bestoftheday’, ‘instacool’, ‘colorful’, ‘style’, ‘swag’, ‘comment’,
An Attempt on Twitter ‘likes’ Grading Strategy Using Pure … 571

‘blackandwhite’, ‘health’, ‘fitness’, ‘fit’, ‘flex’, ‘gymlife’, ‘mcm’, ‘wcw’, ‘fitfam’,


‘quotes’, ‘motivation’, ‘active’, ‘grind’, ‘focus’, ‘dedication’, ‘car’, ‘cars’, ‘driver’,
‘food’, ‘foodporn’, ‘happynewyear’). We used twitterscraper1 to collect tweets
on these hashtags. We kept the language restricted to English and the time of the tweets
restricted to the year 2019, and retrieved the contents in the form of a CSV. A
sample query is shown below.
$ twitterscraper "#happynewyear" --lang en -c -bd 2019-01-01 -ed
2019-12-31 -o happynewyear.csv

3.2 Filtering

As the paper focuses on using linguistic features, we filtered out tweets
that had external links in the form of images, videos, or websites. This reduced the
number of tweets to about 12 K records.

3.3 Preprocessing

We processed the data for duplicate rows and removed them. The number
of records after the removal of duplicates came to about 11 K. The reduction can be
attributed to duplication across multiple hashtags: a tweet could have appeared multiple
times under different hashtags.

4 Methodology

Figure 1 shows the complete flow, starting from the data in the form of tweets through to
the final machine learning model training and predictions.
We studied the available approaches for dealing with textual data in natural language
processing (NLP). One popular strategy is to represent text as a vector in
multiple dimensions and use classical machine learning algorithms like support vector
machines (SVM) and K-nearest neighbors (KNN). Although this approach has been
successful to some degree, it is likely to lose linguistic features such as user sentiment.
Therefore, we approach the problem using linguistic features rather than TFIDF
[3] or Doc2Vec [4], which convert text into vectors. Researchers [5] have studied linguistic
features and have discussed their importance for text quality.

1 https://github.com/taspinar/twitterscraper.

Fig. 1 Complete flow of the methodology

4.1 Linguistic Features

We use the following features, which depict the user’s mood, writing tone, style of
writing, etc.
F1-Text Sentiment (t-sentiment)
We use VADER sentiment analysis2 [6] to get the sentiment of the text of the tweet. We
use the compound sentiment from the API. It is a number between −1 and +1, exclusive. It
depicts the tone of the writer, whether positive, negative, or neutral, depending
on the number.
F2-Hashtag Sentiment (h-sentiment)
Hashtags are an important part of a tweet, used to share it among people with
similar interests. The sentiment of the hashtags is an important predictor of the kind
of writer and of the followers of the hashtag. We use this along with the text sentiment,
again via VADER sentiment analysis.

2 https://pypi.org/project/vaderSentiment/.

F3-Punctuation Count (p-count)


Punctuation is an important part of language and gives valuable insight into the type
of content, the user’s writing style, and other hidden features. We use the total count of
punctuation marks as one of the features.
F4-Hashtags Count (h-count)
Hashtags are used to show a tweet in the feeds of users following the hashtag. More
hashtags mean that the tweet is likely to be seen by more people
and to get more attention, and perhaps more likes. Therefore, we also use the
number of hashtags as a feature.
F5-Readability
Readability is a yardstick that reflects how easy the written text is to comprehend. It
depends primarily on the vocabulary and syntax of the text. It is an important
aspect that can affect whether a user reads the tweet or simply skips
it because of a large amount of unfamiliar jargon. Hence, we also keep readability as
a feature, calculated using textstat3.
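The count-based and readability features (F3, F4, F5) can be sketched with the standard library alone. This is an illustrative approximation, not the exact textstat or VADER implementation; note the design choice (an assumption here) that ‘#’ itself counts as punctuation, and that the syllable counter is a crude vowel-run heuristic:

```python
import re
import string

def punctuation_count(text):
    """F3: total number of punctuation characters in the tweet
    (note: '#' itself counts as punctuation under this definition)."""
    return sum(1 for ch in text if ch in string.punctuation)

def hashtag_count(text):
    """F4: number of #hashtags in the tweet."""
    return len(re.findall(r"#\w+", text))

def naive_syllables(word):
    """Rough syllable estimate via vowel runs (a stand-in for the
    syllable counting done inside a library such as textstat)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """F5: Flesch Reading Ease score; higher means easier to read."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syll = sum(naive_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syll / n_words)
```

The sentiment features F1 and F2 would come from the VADER API as described above, applied to the tweet text and to the concatenated hashtags, respectively.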

4.2 Problem Formulation

We transform the number of likes into a grade (like_grade). Each grade
corresponds to a bracket of the number of likes, obtained dynamically from the given
dataset. We use the following method to obtain the grade brackets rather than
hardcoded values.

3 https://pypi.org/project/textstat/.

x = mean(likes)
sd = std(likes)
A1 = x + sd
A2 = x + 0.95 · sd
A3 = x + 0.9 · sd
⋮
A21 = x
A22 = max(0, x − 0.05 · sd)
A23 = max(0, x − 0.1 · sd)
⋮
A41 = max(0, x − sd)

where A1, A2, …, A41 are the grades, x is the arithmetic mean of the number of likes
in the given dataset, and sd is the standard deviation of the number of likes in the given
dataset. We keep a large number of grades to keep the bracket size small.
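A minimal sketch of this bracketing scheme follows. Two details the paper does not spell out are assumed here: the population standard deviation is used, and anything below the lowest boundary is also binned as A41:

```python
import statistics

def grade_thresholds(likes):
    """Boundaries A1..A41: A1 = x + sd down to A41 = max(0, x - sd),
    in steps of 0.05 * sd, with A21 equal to the mean x."""
    x = statistics.mean(likes)
    sd = statistics.pstdev(likes)  # assumption: population std deviation
    upper = [x + (20 - i) * 0.05 * sd for i in range(21)]        # A1..A21
    lower = [max(0.0, x - i * 0.05 * sd) for i in range(1, 21)]  # A22..A41
    return upper + lower

def like_grade(n_likes, thresholds):
    """Assign the first (highest) grade whose boundary n_likes reaches;
    anything below the lowest boundary is also binned as A41."""
    for idx, t in enumerate(thresholds):
        if n_likes >= t:
            return f"A{idx + 1}"
    return f"A{len(thresholds)}"
```

For example, a like count at or above mean + sd maps to A1, a count exactly at the mean maps to A21, and very low counts fall into A41.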

4.3 Machine Learning Models

Classification has been one of the primitive areas of machine learning, and a number
of algorithms have evolved in this area. We use the following popular
classification models in our study: support vector machine (SVM),
K-nearest neighbors (KNN) with 50 neighbors, decision tree, and random forest with 300
estimators.
Each of these models can be used for multi-class classification problems
and is not limited to binary classification like logistic regression. Our train–
test split of the data is 4:1; we use 80% of the data for training and 20% for testing the
models. Since the distribution of grades can be highly random, we want to have
sufficient data for training, but we also want enough data for testing purposes,
so 4:1 seems a good fit.

5 Feature Engineering

Our supervised machine learning model has to learn a function ‘f ’ such that it is able
to predict ‘y’ (like_grade) based on ‘x’ (the features) (Fig. 2).

Fig. 2 Features F1, F2, F3, F4, F5, an ML model which will train on the dataset and learn f (a
mapping) from X (a combination of these features) to Y (like_grade)

We have a classical classification problem based on certain features derived
from the text. We try different combinations of features with the aim of figuring out
which features play the most important role in predicting the like_grade.

6 Results

The results of the experiments with all four machine learning models are
given in Tables 1, 2, 3 and 4. Each table shows the input features used for

Table 1 Results for SVM model

Features       Accuracy (%)
t-sentiment    90
h-sentiment    90
p-count        90
h-count        90
Readability    90
All features   90

Table 2 Results for KNN model

Features       Accuracy (%)
t-sentiment    90
h-sentiment    90
p-count        90
h-count        90
Readability    90
All features   90

Table 3 Results for decision tree model

Features       Accuracy (%)
t-sentiment    85
h-sentiment    90
p-count        89
h-count        90
Readability    89
All features   78

Table 4 Results for random forest model

Features       Accuracy (%)
t-sentiment    87
h-sentiment    90
p-count        90
h-count        90
Readability    89
All features   89

that model and the corresponding accuracy obtained using those features. We trained the
models using individual features and also using all the features together.

7 Conclusion

All the algorithms performed fairly decently in terms of accuracy. SVM and KNN
were the best algorithms in terms of accuracy; the decision tree was not as accu-
rate as the other three algorithms. Although the algorithms performed well in terms of accuracy,
they were not able to predict like_grade successfully across
all grade types in the test dataset. They failed to recog-
nize like_grades that occurred less often and mostly predicted A22, which
had the maximum number of tweets. The decision tree model did not perform best
in terms of accuracy but was able to assign grades to a few classes other than A22

(0, mean(likes)), whereas in the other models A22 dominated because of the large number of tweets
with like_grade A22.

8 Limitations and Future Work

We could further reduce the bracket size of the like_grade. This is challenging since,
by reducing it further, we slowly drift from classification to regression. It is
difficult to strike a balance where most of the tweets do not fall into a single bracket
while the problem still remains in the proximity of classification.
Another solution to the dominance of one grade is to obtain a big enough
dataset with ample tweets falling in each like_grade bracket. This would help the
model learn better and differentiate between different like_grades.
We can also try to build an artificial neural network (ANN), which might eventually
yield better results that are not biased toward a single like_grade.
We omitted tweets with external links during the filtering phase. These
tweets could be used if we incorporate the effect of external links within our models.
We could have simple features such as link_count (count of links) and has_img (whether an
image is present), or complex ones such as using a convolutional neural network
(CNN) on those images.
The complete code and dataset are available on GitHub4 and can be used for
further experimentation.

References

1. Huang D, Zhou J, Mu D, Yang F (2014) Retweet behavior prediction in Twitter. In: 2014 seventh
international symposium on computational intelligence and design, Hangzhou, 2014, pp 30–33
2. Riquelme F, González-Cantergiani P (2016) Measuring user influence on Twitter: a survey. J Inf
Process Manag 52. https://doi.org/10.1016/j.ipm.2016.04.003
3. Shahmirzadi O, Lugowski A, Younge K (2018) Text similarity in vector space models: a
comparative study
4. Quoc L, Tomas M (2014) Distributed representations of sentences and documents. In:
Proceedings of the 31st international conference on international conference on machine
learning—volume 32 (ICML’14). JMLR.org, II–1188–II–1196
5. McNamara D, McCarthy P (2010) Linguistic features of writing quality. Written Communication
27:57–86. https://doi.org/10.1177/0741088309351547
6. Hutto CJ, Gilbert EE (2014) VADER: a parsimonious rule-based model for sentiment analysis of
social media text. In: Eighth international conference on weblogs and social media (ICWSM-14).
Ann Arbor, MI

4 https://github.com/singh-l/Twitter_Like_Grade.
Groundwater Level Prediction
and Correlative Study with Groundwater
Contamination Under Conditional
Scenarios: Insights from Multivariate
Deep LSTM Neural Network Modeling

Ahan Chatterjee, Trisha Sinha, and Rumela Mukherjee

Abstract Groundwater is the primary source of drinking water and irrigation in
India, and over the last few years, due to the population burst across the nation, there has
been a sharp decline in the groundwater level (availability). A constant pressure
balance between groundwater and seawater prevents contaminated water from
seeping in, and the lowering level creates an alarming situation for water contam-
ination across India. In this paper, we aim to find the relation between groundwater
level and ground contamination conditions through LSTM predictive modeling. The
proposed algorithm for groundwater prediction is based on a conditional approach
through deep LSTM modeling, and ground contamination is calculated using an
aggregated scoring approach modeled on the Euclidean distance concept. Lastly, a
correlative study is provided to analyze the relation between the said vari-
ables. There is a high negative correlation between them, indicating that the loss of
groundwater level is increasing the contamination level across the studied zones. The
experiment has been carried out on data across three eastern Indian states,
viz. West Bengal, Odisha, and Bihar, for the time span from 2004 to 2017.

Keywords LSTM · Neural network · Euclidean distance · Groundwater level ·
Groundwater contamination

1 Introduction

It is a well-known fact that India is an agriculture-based country where a fair amount
of the population depends on agriculture. For its water supply, India is widely depen-
dent on water pumped out of aquifers. Aquifers are the rocks where groundwater
is stored and are made up of gravel, sandstone or limestone, and sand. It has been esti-
mated that more than 90% of groundwater in India is used for agricultural purposes,
A. Chatterjee (B) · R. Mukherjee


Department of Computer Science and Engineering, The Neotia University, Sarisha, India
T. Sinha
Department of Robotics Engineering, The Neotia University, Sarisha, India

© Springer Nature Singapore Pte Ltd. 2021 579


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_42
580 A. Chatterjee et al.

mainly irrigation. The remaining groundwater, nearly 24 million cubic
meters, is used for drinking. As a whole, 80% of India’s population utilizes ground-
water for irrigation and drinking. Rainfall contributes 68% of India’s groundwater
resource, making it the major source for maintaining the groundwater level. There
are also other sources, like canal seepage, recharge from tanks and ponds, and water
conservation systems, which together contribute 32% of the country’s groundwater
[1, 2].
Due to the increasing population of India, there is an alarming decrease in ground-
water quantity every year. According to experts, India is rapidly moving toward
a shortage of groundwater due to excessive use and population growth. Groundwater
quality is degrading mainly due to contamination by geogenic and anthropogenic
activities. The quality has deteriorated to an extent where it can even be hazardous
for living beings. Fairly high concentrations of fluoride, nitrate, iron,
arsenic, and other toxic metal ions have been observed throughout many states in
India. The salinity and hardness of the groundwater have also increased, which makes
it unfit for use. It has been noticed that high fluoride content in groundwater, beyond
the permissible limit of 1.5 mg/L, is one of the major causes of health problems
in India. Another highly poisonous element, arsenic, has been found widely in the
groundwater of eastern India, mainly West Bengal.
A few boards have been formed to maintain the quantity and quality of
groundwater in India, the Central Ground Water Board (CGWB) being one among
them. It is entrusted with the tasks of estimating groundwater resources, developing
technologies for the sustainable use of groundwater, and monitoring and implementing
policies for the proper management of groundwater resources. With access to
an adequate amount of data, many hydrologists opt for machine learning approaches
to forecast groundwater availability and resources. Machine
learning models help to develop a relationship between the input parameters and the
output parameters through an iterative learning process [3–5].
Artificial neural networks (ANNs), especially recurrent neural networks (RNNs),
have been widely used for evaluating time series data concerning the groundwater table.
The LSTM approach is highly effective for time series forecasting, as its feedback
connections give it the ability to process entire sequences of data in addition to
single data points. It can learn the context required for prediction
instead of working with a prespecified, fixed context. LSTM also offers flexibility
for modeling problems and can easily process more than one input parameter.
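The gating mechanism behind this is compact enough to sketch. The following scalar, single-unit LSTM step (plain Python with illustrative weight names, not the multivariate network used later in the paper) shows how the forget, input, and output gates update the cell state that carries long-range context:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell (scalar weights, illustrative).
    w holds weights/biases for the forget (f), input (i), and output (o)
    gates and the candidate cell value (g)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g    # new cell state: kept memory plus new input
    h = o * math.tanh(c)      # new hidden state (the step's output)
    return h, c
```

Iterating this step over a yearly sequence of groundwater readings is what lets the network carry context across the whole series rather than reacting to single data points.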
In this paper, we aim to find the relation between the groundwater availability factor and
the water contamination level across three states of eastern India, viz. West Bengal,
Odisha, and Bihar. We have examined the trend of groundwater level availability
from 2004 to 2017 using an LSTM model, and based on the model we have
predicted the availability of groundwater for upcoming years under five circumstan-
tial scenarios, i.e., controlled water usage with good rainfall, controlled water usage
with poor rainfall, uncontrolled water usage with good rainfall, uncontrolled water
usage with poor rainfall, and usual general circumstances. The scenarios have
been created with a dependency on rainfall, as it is the primary source of ground-
water refilling. Similarly, the trend of the water contamination level has also been
Groundwater Level Prediction and Correlative Study … 581

examined from the year 2004–2017. The major cause of water contamination is the
decreasing water pressure barrier of groundwater level and sinking of seawater into
it. Afterward, a correlation study has been carried out with two factors, viz. ground-
water level (G.W. Level) and water contamination level, predicting the upcoming
conditions of these states under the stated circumstantial scenarios.
The novelty of the work lies in predicting the upcoming condition of the G.W. level in the stated states under different scenarios, along with the condition of the water pressure barrier against saline water, whose failure will eventually lead to water contamination problems across eastern India.
The paper is structured as follows: Sect. 2 contains the literature review, followed by LSTM modeling of groundwater level prediction along with the simulation results of different scenarios in Sect. 3. Section 4 contains predictive modeling of water contamination across the states, and Sect. 5 contains the correlation study between groundwater level availability and groundwater contamination conditions. Lastly, Sect. 6 contains the concluding remarks and further scope of study [7–12].

2 Literature Review

Sarkar and Pandey [1] estimated and predicted stream water quality using an ANN with three layers, predicting the DO concentration downstream of Mathura city. Behzat et al. [2] used an SVM and an ANN to predict the groundwater level across a river bed and showed that the SVM provides better results than the ANN when data are scarce or class-imbalanced. Galavi et al. [3] proposed a modified ARIMA model to predict the groundwater level and showed that it outperformed both the traditional ARIMA model and an adaptive network-based fuzzy inference system. Wang et al. [4] proposed a hybrid model coupling ARIMA with ensemble empirical mode decomposition (EEMD-ARIMA), and their results showed that it easily outperformed the traditional ARIMA model. Yao et al. [5] optimized an Elman-RNN using a genetic algorithm, and the results showed significant improvement in accuracy metrics. Fan et al. [6] used multivariable linear regression to model the relationship and predict outcomes on the Yangtze River bank. Zang et al. [7] proposed a PCA-based multivariate autoregressive time series analysis of the data. Yang et al. [8] presented time series forecasting with a random forest regression model. Seo et al. [9] presented a wavelet-based ANN model to forecast the groundwater level. Rashid et al. [10] developed a benchmark modified LSTM model, optimizing it with various metaheuristic algorithms, viz. harmony search (HS), gray wolf optimization (GWO), and ant lion optimization; comparative results against a traditional RNN showed the hybrid model performing better. Nawi et al. [11] proposed a data classifier model using the cuckoo search algorithm as its optimizer; the hybrid RNN and cuckoo search showed significant results.

3 Multivariate LSTM Modeling Under Circumstantial Scenarios of Groundwater Prediction

One of the most promising algorithms for predicting time series data is the Elman-RNN, or simply RNN. An RNN works by taking the previous state output as part of its next input; thus, the state at time t depends on t − 1. Though highly effective, it has one major drawback: the vanishing gradient problem. The long short-term memory (LSTM) algorithm overcomes this drawback of the RNN by introducing gates that are capable of remembering the output states.
a. Theoretical Framework
The LSTM architecture diminishes the major problem of the RNN architecture, i.e., the vanishing gradient problem. It minimizes the error by routing a constant error flow through the hidden cells rather than through the activation function. In the LSTM architecture, there are four gates through which memory allocation occurs: the forget gate f, the input gate i, the input modulation gate g, and the output gate o [6].
The forget gate processes the output of the last state h_{t−1} and mainly removes, or forgets, irrelevant data present in the information passed through the model. The activation function most commonly used in the forget gate is the sigmoid function.
The input gate adds the information required for further computation. A sigmoid activation function is used to update the gate value, while a tanh activation function is used for the candidate value and scaling. The equations for the gates are given below.
   
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (1)

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (2)

C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)    (3)

C_t = f_t × C_{t−1} + i_t × C̃_t    (4)

The output through the sigmoid activation function is shown in Eq. 5.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (5)
Groundwater Level Prediction and Correlative Study … 583

Fig. 1 LSTM cell architecture. Source Supretha et al. [12]

h_t = o_t × tanh(C_t)    (6)

The LSTM network is trained by backpropagation through time, similar to the RNN: the generated error term is distributed through the layers across the time series data. The architecture is shown in Fig. 1 [13–15].
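As a concrete illustration of Eqs. (1)–(6), the following is a minimal NumPy sketch of a single LSTM cell forward step. The weight shapes, the dictionary layout, and the name `lstm_step` are our own illustrative choices, not part of the paper's model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step implementing Eqs. (1)-(6).

    W, b: dicts of weight matrices/biases acting on [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (1)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (2)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, Eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update, Eq. (4)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)                 # hidden state, Eq. (6)
    return h_t, c_t

# Tiny example: 2 hidden units, 3 input features
rng = np.random.default_rng(0)
n_h, n_x = 2, 3
W = {k: rng.normal(scale=0.1, size=(n_h, n_h + n_x)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_x), np.zeros(n_h), np.zeros(n_h), W, b)
```

In practice a framework implementation (e.g., a Keras LSTM layer) performs this same computation for every time step of the sequence.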
b. Empirical Model for Groundwater Level Prediction
In this paper, we consider conditional cases to predict groundwater level availability across three eastern Indian states (three were taken to keep the computational complexity low), viz. West Bengal, Bihar, and Odisha. The major influencing factors taken into consideration while formulating the model are rainfall (both pre- and post-monsoon), groundwater used in irrigation, and net availability of water in the previous year. The yearly data has been interpolated into monthly form to obtain more time steps, so that each batch has enough data for computation. A time lag of 5 months has been formulated: t_m, t_m−1, t_m−2, t_m−3, t_m−4. The lag considers the output for the present time step t_m for an input variable x_{t_m}; the output is the next monthly time step, i.e., t_m+1.
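The monthly interpolation and 5-month lag windowing described above can be sketched as follows (the function names and toy series are illustrative, not the paper's data):

```python
import numpy as np

def monthly_interpolate(yearly):
    """Linearly interpolate yearly observations onto a monthly grid."""
    years = np.arange(len(yearly))
    months = np.linspace(0, len(yearly) - 1, (len(yearly) - 1) * 12 + 1)
    return np.interp(months, years, yearly)

def make_lagged(series, n_lags=5):
    """Windows [t_m-4, ..., t_m] as inputs, the next step t_m+1 as target."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return np.array(X), np.array(y)

# Toy yearly levels -> monthly grid -> supervised (X, y) pairs for the LSTM
monthly = monthly_interpolate(np.array([10.0, 9.5, 9.8, 9.1]))
X, y = make_lagged(monthly, n_lags=5)
```

Each row of `X` is one 5-month input window, and the matching entry of `y` is the level one month ahead, which is the supervised form an LSTM trainer expects.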
In India, our work is divided across two seasons, pre-monsoon and post-monsoon. The major rainfall occurs during the monsoon, and there is retreating rainfall in the later part of the year. The model therefore assumes that the yearly recharge of groundwater R is associated with rainfall infiltrating through the soil and recharge from artificial sources, while the annual water discharge W is associated with groundwater usage and water used in agriculture.
584 A. Chatterjee et al.

Other minor recharge and discharge terms are taken as constant, as their values are negligible in the analysis but are retained for equation balancing. Since the season is split into two major parts, biannual modeling can be used for the analysis. Thus, for a biannual time step t_b, R and W are calculated using the following equations [15–19] (Fig. 2).

R_{t_b → t_b+1} = H_{t_b+1} − H_{t_b}    (7)

where t_b → t_b + 1 is the biannual interval post-monsoon.

W_{t_b → t_b+1} = H_{t_b} − H_{t_b+1}    (8)

where t_b → t_b + 1 is the biannual interval pre-monsoon, and H_{t_b}, H_{t_b+1} are the groundwater levels.
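Under Eqs. (7) and (8), R and W reduce to signed differences of successive biannual level readings; a small sketch (the helper name and toy level values are assumptions):

```python
def recharge_discharge(levels):
    """Given alternating biannual groundwater levels
    [H_pre1, H_post1, H_pre2, H_post2, ...], return yearly
    recharge values (Eq. 7) and discharge values (Eq. 8)."""
    # Eq. (7): pre-monsoon -> post-monsoon rise is recharge
    R = [levels[i + 1] - levels[i] for i in range(0, len(levels) - 1, 2)]
    # Eq. (8): post-monsoon -> next pre-monsoon drop is discharge
    W = [levels[i] - levels[i + 1] for i in range(1, len(levels) - 1, 2)]
    return R, W

R, W = recharge_discharge([8.0, 9.2, 8.5, 9.0])  # toy two-year sequence
```

The sketch simply keeps the biannual bookkeeping explicit: every odd-indexed gap contributes to R and every even-indexed gap to W.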

The model predicts one time step ahead, and the proposed algorithm is iterative in nature. Initially, four time steps are fed in; then the successive groundwater level is predicted under the given conditional circumstances, and, depending on that, the algorithm flows into its next iteration with the updated variables.

Fig. 2 Recharge and discharge of water in graphical representation. Source Created by Author

Conditional Scenarios
Good Rainfall and Controlled Water Usage (C.S. 1). In this circumstantial scenario, we assume that heavy rainfall (better than the mean) has occurred and water recharge is more than usual. The other parameter is groundwater usage; here we assume usage is controlled, i.e., no wastage is recorded for the year.
Good Rainfall and Uncontrolled Water Usage (C.S. 2). In this case, heavy rainfall (10% higher than the mean) is assumed along with high wastage of groundwater throughout the year. Thus, there will be a shortage of groundwater in the next year, as water is heavily misused.
Poor Rainfall and Controlled Water Usage (C.S. 3). In this case, poor rainfall (20% lower than the mean) is assumed along with controlled water usage across the year.
Poor Rainfall and Uncontrolled Water Usage (C.S. 4). In this case, poor rainfall (20% lower than the mean) is assumed along with high wastage of water recorded across the year [20].
Proposed Algorithm

Input: R, W as parameters
Output: Prediction for conditional scenarios
Begin
  Feed: time series and input parameters into the LSTM network
  Time series analysis (T.S.A.) computation in the LSTM network
  c = Output(H_{t_b}, H_{t_b+1})
  if (c > Aq.B) and (seasonal rainfall (s.r.) mean > c)
    Implement C.S. 1
    Feed C.S. 1 condition to LSTM cell
  else if (c > Aq.B) and (s.r. mean < c)
    Implement C.S. 2
    Feed C.S. 2 condition to LSTM cell
  else if (c < Aq.B) and (s.r. mean > c)
    Implement C.S. 3
    Feed C.S. 3 condition to LSTM cell
  else if (c < Aq.B) and (s.r. mean < c)
    Implement C.S. 4
    Feed C.S. 4 condition to LSTM cell
  else
    Implement G.T.
End
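The branching in the algorithm above can be written as ordinary control flow. In the sketch below, `c` is the predicted level, `aq_b` stands for Aq.B, and `sr_mean` for the seasonal-rainfall mean (all names are illustrative):

```python
def select_scenario(c, aq_b, sr_mean):
    """Map a predicted level c to a conditional scenario (C.S. 1-4),
    falling back to the general trend (G.T.) otherwise."""
    if c > aq_b and sr_mean > c:
        return "C.S.1"  # good rainfall, controlled usage
    elif c > aq_b and sr_mean < c:
        return "C.S.2"  # good rainfall, uncontrolled usage
    elif c < aq_b and sr_mean > c:
        return "C.S.3"  # poor rainfall, controlled usage
    elif c < aq_b and sr_mean < c:
        return "C.S.4"  # poor rainfall, uncontrolled usage
    return "G.T."       # boundary cases fall through to the general trend
```

Each branch would then condition the next LSTM iteration on the scenario it selects.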

3.1 Simulation Results of Conditional Scenarios

Conditional Scenario 1. In this scenario, we observe that groundwater availability increases steadily due to good rainfall, while water usage has also been kept under control. In the coming years, we see a rise in water level availability. This can be taken as our best-case scenario (Fig. 3).
Conditional Scenario 2. In this scenario, we observe a loss of groundwater in the future, as water usage has been uncontrolled and water has been wasted at a rapid pace; nevertheless, good rainfall moderates the condition somewhat. This can be considered an average-case scenario [20, 21] (Fig. 4).
Conditional Scenario 3. In this scenario, we observe a lack of rainfall across the eastern part of the country; the major source of water recharge is thus halted. Water usage in this case is controlled, and the loss is somewhat compensated. This can also be treated as an average-case scenario (Fig. 5).
Conditional Scenario 4. In this scenario, we observe a lack of rainfall along with uncontrolled water usage across the considered states. This leads to a serious lack of groundwater availability across the zones; the lower availability of groundwater allows contaminated seawater to seep into the zone, making the available water unusable (Fig. 6).

Fig. 3 Predictive curve for Conditional Scenario 1. Source Created by Author



Fig. 4 Predictive curve for Conditional Scenario 2. Source Created by Author

Fig. 5 Predictive curve for Conditional Scenario 3. Source Created by Author



Fig. 6 Predictive curve for Conditional Scenario 4. Source Created by Author

The simulation results show a lack of groundwater in the worst-case scenario for the upcoming period, as the trend is heading downward. The best-case scenario shows promising water content, as it rises above the past years' trend. The average-case scenarios follow the usual trend with minor changes in the curve [22–24].
Evaluation Metrics. The deep LSTM model has been fitted to time series data, so the model is regressed along the variables; we therefore use the root mean-square error for model evaluation, represented in Eq. 9. Table 1 shows our RMSE value (Fig. 7).

RMSE = √( (1/N) · Σ_{i=1}^{N} (y_i − ŷ_i)² )    (9)

Table 1 Results of model


Metrics name Metrics value Parameters
RMSE 0.00324 25 epochs, 32 batch size, 8 time step (multivariate)
Source Created by Author
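Eq. (9) amounts to a one-line NumPy computation; a quick sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean-square error, Eq. (9)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Applied to the held-out groundwater levels and the LSTM predictions, this yields the value reported in Table 1.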

Fig. 7 Loss curve for the model. Source Created by Author

4 Groundwater Contamination and Predictive Modeling

Groundwater contamination is a major problem persisting across the villages of India; around 1 lakh (100,000) people die in India due to arsenic contamination of groundwater. The figures are alarming and growing steeply.
Water contamination is directly related to the lowering of the groundwater level. There is a pressure balance between seawater, which cannot be used for regular activities, and groundwater. Lowering the groundwater level can upset this balance, letting seawater seep in and contaminate the groundwater.
In this section, we aim to draw the trend of groundwater contamination across three states of eastern India, namely West Bengal, Odisha, and Bihar. We have taken eight parameters, namely temperature, pH, conductivity, BOD, nitrate level, faecal coliform, total coliform, and fluoride level. Using these parameters, we generate a total contamination score through mathematical modeling to compare the trend cumulatively and to relate the changing trend to the variation of the groundwater level. The water quality index values fluctuate considerably over time, so data preprocessing is needed before the prediction step [25].

4.1 Data Preprocessing and Empirical Modeling

The quality parameters are measured mostly through sensors, so it is safer to preprocess the data onto the same scale before fitting the model. We have taken data across various districts of the said states, so upper and lower values vary.
In various cases, data was missing or changed drastically; to increase the robustness of the predictive modeling, linear interpolation is used. The relationship between two known data points and one unknown point is treated as linear and formulated using Eq. 10.
 
x_{k+i} = x_k + i · (x_{k+j} − x_k) / j    (10)

where
x_{k+i} = missing data
x_k = known data before
x_{k+j} = known data after.
After interpolating the data, error handling is checked. The computation of corrected versus erroneous data is shown in Eq. 11.

x_k = f(x) = (x_{k−1} + x_{k+1}) / 2, if |x_k − x_{k−1}| > β_1 or |x_k − x_{k+1}| > β_2; otherwise x_k    (11)
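A small sketch of the gap-filling step of Eq. (10) and the spike-correction step of Eq. (11); the β thresholds and function names here are arbitrary illustrative choices:

```python
def fill_gap(x_k, x_kj, i, j):
    """Eq. (10): linear interpolation of the i-th missing point between
    the known samples x_k and x_{k+j}, which are j steps apart."""
    return x_k + i * (x_kj - x_k) / j

def correct_spike(prev, cur, nxt, beta1=2.0, beta2=2.0):
    """Eq. (11): replace cur by the mean of its neighbours if it jumps
    by more than beta1 / beta2 relative to either neighbour."""
    if abs(cur - prev) > beta1 or abs(cur - nxt) > beta2:
        return (prev + nxt) / 2
    return cur
```

Sweeping `correct_spike` over each interior sample of a sensor series smooths out drastic one-sample jumps while leaving plausible readings untouched.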

The collected data has various upper and lower limits. We therefore form an aggregate index comprising the eight factors, normalizing each of them, since the composite has different units of measurement. Each parameter is normalized by Eq. 12.

X_i' = (X_i − X_min) / (X_max − X_min)    (12)

where
X_i' = normalized value, lying in [0, 1]
X_min = minimum value observed
X_max = maximum value observed.
We compute the relationship between groundwater level and water contamination level, for which an aggregate scoring method is needed. We give equal weights to all the parameters taken for analysis. The eight indices may be represented in eight dimensions, each with minimum value 0 and maximum value 1.
The aggregate water contamination score uses the weighted Euclidean distance from the ideal point (1, 1, 1, 1, 1, 1, 1, 1), as calculated in Eq. 13.

A_gw = 1 − √( [ (1 − T)² + (1 − pH)² + (1 − C)² + (1 − B)² + (1 − N)² + (1 − Fc)² + (1 − Tc)² + (1 − F)² ] / 8 )    (13)
where
T = temperature
pH = pH of water
C = conductivity
B = BOD
N = nitrate level
Fc = faecal coliform
Tc = total coliform
F = fluoride level.
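Eqs. (12) and (13) together collapse the eight normalized indices into one score in [0, 1]; a minimal sketch (function names are illustrative):

```python
import math

def min_max(x, x_min, x_max):
    """Eq. (12): scale an index value into [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def aggregate_score(params):
    """Eq. (13): aggregate contamination score from the eight
    normalized parameters (T, pH, C, B, N, Fc, Tc, F), as the
    complement of the distance from the ideal point (1, ..., 1)."""
    assert len(params) == 8
    dist = math.sqrt(sum((1 - p) ** 2 for p in params) / 8)
    return 1 - dist
```

A sample at the ideal point scores 1 (maximal contamination under this indexing), while one at the origin scores 0.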
The data has been preprocessed, and we now build the empirical model that will feed the data into our neural network. Let the water quality be taken at a fixed place and time, with parameter number j; then we can write:

S_{i,n} = {(Y_{i,1}, T_1), …, (Y_{i,n}, T_n)}    (14)

The fed data will be linearly imputed, computed using Eq. 15.

L(t) = Y_{i,u} + ((Y_{i,u} − Y_{i,v}) / (T_{i,u} − T_{i,v})) · (t − T_{i,u})    (15)

4.2 Model Simulation and Discussion

The model has been simulated on the normalized and processed dataset, and predictions are made for the following years to gain insight into how the water contamination level varies across the years in the said states. As the data is normalized and aggregated using Eqs. 12 and 13, respectively, 1 denotes the highest level of water contamination and 0 the lowest (Figs. 8, 9 and Table 2).

5 Correlated Study Between Groundwater Level and Groundwater Contamination

Through our modeling in Sect. 3, we built a multivariate LSTM neural network, modeled it under different conditional scenarios, and found from the output curves that under the best-case scenario the groundwater level is much better

Fig. 8 Predictive curve for groundwater contamination level. Source Created by Author

Fig. 9 Loss curve for the model. Source Created by Author



Table 2 Results of model


Metrics name Metrics value Parameters
RMSE 0.00004235 500 epochs, 32 batch size, 8 time step (multivariate)
Source Created by Author

Table 3 Pearson correlation results

Pearson correlation | Groundwater contamination
Groundwater level | −0.6754

Source Created by Author

than under the worst-case scenario, while the average case follows the usual trend. In Sect. 4, we predicted the water quality level from eight water contamination parameters; the entire dataset was processed, and an aggregated final score was generated using data normalization and the Euclidean distance formulation.
We know there is a constant water pressure balance between groundwater and seawater that keeps seawater from seeping in and making the water unusable. In this section, we present a correlation study between these two factors, viz. groundwater level and groundwater contamination. We analyze how strongly these factors are correlated across the years and how they influence each other. Along with this, we examine how much of the change in water contamination is caused by the change in groundwater level.
First, we examine the correlation between the two variables using Pearson's correlation coefficient (Table 3).
The correlation value shows a strong negative correlation between the variables. This indicates that our assumption of water pressure balancing holds: the water is getting contaminated as the groundwater level gradually decreases, leading to increased water contamination.
To solidify this claim, we use an OLS model with the water level as the independent variable and the contamination level as the dependent variable; from the result, we can conclude how much the contamination level increases per unit change of the groundwater level (Table 4).
It is clear from the correlation result that these two variables are negatively related; the steady loss of groundwater is upsetting the water balance, and water contamination is increasing. The OLS model suggests that for every 1-unit loss of groundwater, there is a 0.0356-unit increase in the water contamination level.
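Both statistics can be reproduced with NumPy alone; the toy series below merely mimics the negative relationship reported in Tables 3 and 4 and is not the paper's data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ols_slope(x, y):
    """Slope of the simple OLS fit y = a + b * x."""
    return float(np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)[0])

gw_level = np.array([10.0, 9.0, 8.0, 7.0, 6.0])           # illustrative levels
contamination = np.array([0.20, 0.24, 0.27, 0.31, 0.34])  # illustrative scores
r = pearson_r(gw_level, contamination)   # strongly negative, as in Table 3
b = ols_slope(gw_level, contamination)   # contamination change per unit level
```

A negative `r` together with a negative OLS slope is exactly the pattern the tables report: contamination rises as the level falls.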
Comparing the two graphs, i.e., Figs. 10 and 11, it can be concluded that during the period in which the groundwater level is decreasing, the contamination level is

Table 4 OLS model results


Coefficient from OLS model −0.0356
Source Created by Author

Fig. 10 Future prediction curve for groundwater level. Source Created by Author

Fig. 11 Future prediction curve for aggregated groundwater contamination. Source Created by
Author

evidently increasing. Even from the correlation factor, it is observed that these factors are negatively correlated. Hence, with an increasing groundwater level, the contamination decreases over the given period. If the predicted plots are considered for both factors, it can be observed that, in the upcoming period as well, the contamination level will decrease with an increasing groundwater level.

6 Concluding Remarks and Future Scope of Study

The decreasing groundwater level is a serious situation across India and, together with the rising level of groundwater contamination, poses a threat to human health and survival. In this paper, we have analyzed and predicted the upcoming trend of groundwater availability using LSTM modeling under different circumstantial scenarios; we assumed scenarios that affect the resulting water level, and the simulation results were taken for analysis. In the next section, we predicted the trend of the groundwater contamination level: the data was heavily preprocessed, and an aggregate score was generated using the Euclidean distance method. The trend was analyzed, and a correlative study was then presented between the groundwater level and the groundwater contamination level. We observed a negative correlation between the variables, indicating that the lowering groundwater level is allowing unusable seawater to seep in, contaminating the remaining water. Lastly, we showed predictions for the coming years using an interpolated moving-average value of the time series.
The study could be extended by taking water levels for more states or across the nation. Optimizer functions such as the ABC and LA algorithms could be used to check the trend style. The time span could also be extended by 10 more years for a better curve result.

References

1. Sarkar A, Pandey P (2015) River water quality modelling using artificial neural network
technique. Aquatic Procedia 4:1070–1077
2. House PLA, Chang H (2011) Urban water demand modeling: review of concepts, methods,
and organizing principles. Water Res Res 47(5)
3. Gwaivangmin BI, Jiya JD (2017) Water demand prediction using artificial neural network for
supervisory control. Nigerian J Technol 36(1):148–154
4. Coulibaly P, Anctil F, Aravena R, Bobée B (2001) Artificial neural network modeling of water
table depth fluctuations. Water Resour Res 37(4):885–896
5. Gulati A, Banerjee P (2016) Emerging water crisis in India: key issues and way forward. Indian
J Econ Special Centennial Issue 681–704
6. Barzegar R, Adamowski J, Moghaddam AA (2016) Application of wavelet-artificial intel-
ligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran.
Stochastic Environ Res Risk Assess 30(7):1797–1819

7. Adamowski J, Karapataki C (2010) Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: evaluation of different ANN learning algorithms. J Hydrol Eng 15(10):729–743
8. Benítez R, Ortiz-Caraballo C, Preciado JC, Conejero JM, Sánchez Figueroa F, Rubio-Largo A
(2019) A short-term data based water consumption prediction approach. Energies 12(12):2359
9. Res EAM Incoming water crisis in India: as understood by others
10. Rojek I (2008) Neural networks as prediction models for water intake in water supply system.
In: International conference on artificial intelligence and soft computing. Springer, Berlin,
Heidelberg, pp 1109–1119
11. Saravanan M, Sridhar A, Bharadwaj KN, Mohanavalli S, Srividhya V (2015) River network
optimization using machine learning. In: International conference in swarm intelligence.
Springer, Cham, pp 409–420
12. Shenoy N, Nayak P (2019) Lion algorithm-optimized long short-term memory network for
groundwater level forecasting in Udupi District, India. arXiv preprint arXiv:1912.05934
13. Pan M, Zhou H, Cao J, Liu Y, Hao J, Li S, Chen CH (2020) Water level prediction model based
on GRU and CNN. IEEE Access 8:60090–60100
14. Tsanis IK, Coulibaly P, Daliakopoulos IN (2008) Improving groundwater level forecasting
with a feedforward neural network and linearly regressed projected precipitation. J Hydroinf
10(4):317–330
15. Suhag R (2019) Overview of ground water in India. PRS
16. van der Lugt BJ, Feelders AJ (2019) Conditional forecasting of water level time series
with RNNs. In: International workshop on advanced analysis and learning on temporal data.
Springer, Cham, pp 55–71
17. Satishkumar U, Kulkarni P (2018) Simulation of groundwater level using recurrent neural
network (RNN) in Raichur District, Karnataka, India. Int J Curr Microbiol App Sci 7(12):3358–
3367
18. Mohanty S, Jha MK, Kumar A, Sudheer KP (2010) Artificial neural network modeling
for groundwater level forecasting in a river island of eastern India. Water Resour Manage
24(9):1845–1865
19. Halder S, Roy MB, Roy PK (2020) Analysis of groundwater level trend and groundwater
drought using standard groundwater level index: a case study of an eastern river basin of West
Bengal, India. SN Appl Sci 2(3):1–24
20. Ghosh NC, Singh RD (2009) Groundwater arsenic contamination in India: vulnerability and
scope for remedy
21. Gokhale R, Sohoni M (2015) Detecting appropriate groundwater-level trends for safe
groundwater development. Current Sci 395–404
22. Hu Z, Zhang Y, Zhao Y, Xie M, Zhong J, Tu Z, Liu J (2019) A Water quality prediction
method based on the deep LSTM network considering correlation in smart mariculture. Sensors
19(6):1420
23. Liu P, Wang J, Sangaiah AK, Xie Y, Yin X (2019) Analysis and prediction of water quality
using LSTM deep neural networks in IoT environment. Sustainability 11(7):2058
24. Bowes BD, Sadler JM, Morsy MM, Behl M, Goodall JL (2019) Forecasting groundwater table
in a flood prone coastal city with long short-term memory and recurrent neural networks. Water
11(5):1098
25. Daliakopoulos IN, Coulibaly P, Tsanis IK (2005) Groundwater level forecasting using artificial
neural networks. J Hydrol 309(1–4):229–240
A Novel Deep Hybrid Spectral Network
for Hyperspectral Image Classification

K. Priyadharshini @ Manisha and B. Sathya Bama

Abstract Image classification is the process of allocating land cover classes to picture elements. Hyperspectral image classification of land cover is difficult owing to the high variability among training samples and the large number of spectral bands. Deep learning networks are capable of learning from unstructured or unlabeled data without supervision. The convolutional neural network (CNN) is one of the most widely employed deep learning approaches for visual data processing. To classify the hyperspectral data, a hybrid network with 2D and 3D CNNs is developed. The spectral information and spatial information are used together for hyperspectral image analysis, which enhances the experimental results considerably. With the pixel as the basic analysis unit, a convolutional neural network classification technique has been developed. Principal component analysis (PCA) is implemented to lower the dimensionality of the hyperspectral data; it reduces the feature size and increases computational efficiency. The first principal component plays a superior role, as it has the highest variance of all the components. Three hyperspectral datasets are used for the analysis: Pavia University, Indian Pines, and Salinas scene. The Indian Pines hyperspectral data was acquired over northwest Indiana in June 1992 by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS); it has a size of 145 × 145 pixels. The Pavia University data was obtained by the Reflective Optics System Imaging Spectrometer (ROSIS) over Pavia, northern Italy, in 2001; it has 103 spectral bands with a size of 610 × 340 pixels. The Salinas scene was obtained by AVIRIS over Salinas Valley, CA, USA, in 1998, with a 512 × 217 spatial dimension. The hybrid CNN is computationally efficient compared with the 3D CNN and provides enhanced performance with minimal training data.

Keywords Hybrid spectral network · Convolutional neural network · Principal component analysis · Support vector machine

K. Priyadharshini @ Manisha (B) · B. Sathya Bama


Thiagarajar College of Engineering, Madurai, India

© Springer Nature Singapore Pte Ltd. 2021 597


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_43
598 K. Priyadharshini and B. Sathya Bama

1 Introduction

Hyperspectral images (HSIs) comprise numerous near-contiguous spectral bands, imparting rich spectral and spatial detail simultaneously. Hyperspectral devices receive energy from several overlapping narrow spectral channels of the electromagnetic spectrum; i.e., hyperspectral sensors capture information as a collection of images covering hundreds of narrow and contiguous spectral bands over a broad range of the spectrum, allowing the detection of accurate spectral signatures for different materials.
Hyperspectral imaging produces a spatial map of spectral variance, making it a useful tool for many applications. The huge number of bands allows objects to be identified by their corresponding output in the spectral domain. However, this huge number of spectral bands is also what makes the analysis techniques complex. Hyperspectral imaging cameras collect radiance data from either airborne or spaceborne platforms; before analysis, the radiance data must be converted to apparent surface reflectance.
HSIs are obtained by scanning different spectral bands within the same area. Such spectral domain data may contain some degree of similarity, implying that two consecutive bands may show nearly identical observations; still, recognizing these hyperspectral associations is beneficial. A precise reflectance or radiance distribution may be recorded at each pixel. The resulting hyperspectral image (HSI) can be used to locate objects, classify different components, and detect processes in various fields of application, such as military, agricultural, and mineralogical applications.

2 Related Works

HSI technology was primarily used in many challenging Earth observation and remote sensing applications such as vegetation monitoring, urbanization research, farm and field technology, and surveillance. The need for quick and reliable authentication and object recognition methods has intensified the interest in applying hyperspectral imaging for quality control in the agriculture, medicinal, and food industries. Indeed, a physical material with rich spectral detail has its own characteristic reflectance or radiance signature.
Hyperspectral remote sensors have a superior discriminating capability, particularly for materials that are visually similar. These distinctive features enable numerous uses in computer vision and remote sensing, e.g., military target identification, inventory control, and medical diagnosis. There is therefore a tradeoff between high spectral resolution and spatial accuracy. The benefits of hyperspectral imaging over conventional approaches include reduced sample processing, non-destructive design, fast acquisition times, and simultaneous estimation of the spatial distribution of various chemical compositions [1].
A Novel Deep Hybrid Spectral Network for Hyperspectral … 599

One of the main problems is how HSI features can be extracted effectively. Spectral-spatial features are now commonly used, and the efficiency of HSI classification has gradually increased by moving from spectral features alone to combined spectral-spatial features [2]. Deep learning models have been developed for HSI classification to extract spectral-spatial characteristics. The core idea of deep learning is to derive conceptual features from the original input using superimposed multilayer representations.
For SAE-LR, the testing time improves in comparison with KNN and SVM, but the training takes much longer [3]. Methods such as WI-DL and QBC require more time for both testing and training. Traditional image priors need to be integrated into the DHSIS method to improve its accuracy and performance [4]. The longest training time is observed on the Salinas scene dataset for the proposed CNN [5].
Band selection is vital: choosing the salient bands before fusion with the extracted hashing codes decreases training time and saves storage space. The major drawback is the requirement of a large amount of properly labeled data for model preparation.

3 Methodology

A deep hybrid spectral network is developed using 2D and 3D convolutional layers. A 3D CNN extracts both the spatial and the spectral data, but at increased computational cost. A 2D CNN alone is not suitable for handling the spectral information, as it captures only the spatial information.
The hyperspectral data obtained from the AVIRIS and ROSIS hyperspectral sensors are used for the analysis.

Hyperspectral datacube: I ∈ R^(M×N×D) (1)

I—Original input
M—Width of the input
N—Height of the input
D—Number of spectral bands
Principal component analysis (PCA) is applied for dimensionality reduction, after which the spectral bands are reduced from D to B.

PCA-reduced datacube: X ∈ R^(M×N×B) (2)

X—Modified input after PCA

Using a 2D CNN or a 3D CNN alone results in a very complex model, mainly because hyperspectral data are volumetric. The 2D CNN alone cannot extract good discriminating maps along the spectral dimension of the
600 K. Priyadharshini and B. Sathya Bama

Fig. 1 Network structure for deep hybrid spectral network

feature cube. Correspondingly, a 3D CNN is more computationally complicated and tends to perform worse on its own over several spectral bands for classes with similar textures. Figure 1 represents the deep hybrid network for the classification of hyperspectral images.
For the hybrid spectral network, the 3D CNN and 2D CNN layers are constructed
in such a way that they make full use of both spectral and spatial feature maps to
achieve optimum accuracy. The dataset is divided into overlapping small 3D patches
and the calculations are tabulated in Table 2.
The total number of 3D patches extracted from X is given by

(M − S + 1) × (N − S + 1) (3)

The 3D patch at location (α, β), denoted Pα,β, covers the width from

α − (S − 1)/2 to α + (S − 1)/2, (4)

and height from,

β − (S − 1)/2 to β + (S − 1)/2 (5)
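Equations (3)-(5) can be checked numerically. The sketch below (with an assumed B = 30 reduced bands) counts the overlapping windows and extracts one patch:

```python
import numpy as np

def num_patches(M, N, S):
    # Total overlapping S x S windows over an M x N scene, Eq. (3)
    return (M - S + 1) * (N - S + 1)

def extract_patch(X, alpha, beta, S):
    # 3D patch P_(alpha,beta) spanning alpha±(S-1)/2 and beta±(S-1)/2, Eqs. (4)-(5)
    h = (S - 1) // 2
    return X[alpha - h:alpha + h + 1, beta - h:beta + h + 1, :]

X = np.zeros((145, 145, 30))        # B = 30 reduced bands is an assumption
print(num_patches(145, 145, 25))    # 14641, matching Table 2 for Indian pines
print(extract_patch(X, 72, 72, 25).shape)  # (25, 25, 30)
```

The same formula reproduces the Pavia and Salinas entries of Table 2.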

In the 2D CNN, the input data is convolved with 2D kernels. The convolution is computed as the sum of the dot products between the input data and the kernel. The filter is strided over the input data to cover the full spatial dimension. The 3D convolution is achieved by convolving the 3D data with a 3D kernel. In the hybrid model, the feature maps of the convolution layer are created for HSI data by applying the 3D kernel over multiple contiguous bands in the input layer.

The 2D CNN is applied once before the flatten layer so that the spatial information within the different spectral bands is strongly discriminated without significant loss of information from the spectral domain, which is essential for HSI data. In the hybrid model, the total number of parameters depends on the number of classes in a dataset.
Table 1 represents the calculation of total trainable parameters of the proposed
hybrid CNN. The network consists of four convolutional layers and three dense
layers. Of the four convolutional layers, three are 3D convolutional layers and the
remaining one is the 2D convolutional layer (Table 2).
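Under an assumed 25 × 25 × 30 input (window S = 25 and B = 30 PCA bands, both illustrative), the layer shapes and parameter counts of this four-conv, three-dense stack can be walked through in plain Python; the totals reproduce the Indian pines column of Table 1:

```python
# Walk the layer shapes and trainable-parameter counts of the hybrid network.
# Kernel sizes and filter counts follow Table 1; the input size is assumed.
def conv3d(shape, filters, k):
    # 'valid' 3D convolution: new shape, and weights f*(kd*kh*kw*cin) + f
    out = tuple(s - ks + 1 for s, ks in zip(shape[:3], k)) + (filters,)
    return out, filters * (k[0] * k[1] * k[2] * shape[3]) + filters

shape, total = (25, 25, 30, 1), 0
for f, k in [(8, (3, 3, 7)), (16, (3, 3, 5)), (32, (3, 3, 3))]:
    shape, p = conv3d(shape, f, k)
    total += p                      # 512, 5776, 13,856 as in Table 1

# Reshape (19, 19, 18, 32) -> (19, 19, 576), then one 2D convolution with
# 64 filters of 3 x 3: 64*(3*3*576) + 64 = 331,840 params, output 17 x 17 x 64
total += 64 * (3 * 3 * shape[2] * shape[3]) + 64
flat = (shape[0] - 2) * (shape[1] - 2) * 64     # 17 * 17 * 64 = 18,496 units

# Three dense layers: 256, 128, and 16 outputs (16 classes for Indian pines)
for n_in, n_out in [(flat, 256), (256, 128), (128, 16)]:
    total += n_in * n_out + n_out

print(flat, total)  # 18496 5122176
```

The flatten size of 18,496 is exactly the Dense 1 input listed in Table 1.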
In the hybrid network, the number of trainable weight parameters for the Indian pines dataset is 5,122,176 and the number of patches is 14,641. The weights are initialized randomly and trained by backpropagation using the Adam optimizer.

4 Dataset

The experiments were conducted on three hyperspectral datasets: Pavia university, Indian pines, and Salinas scene. In 1992, the AVIRIS sensor acquired the Indian pines (IP) dataset over the Indian pines test site in northwest Indiana. IP has images with a spatial dimension of 145 × 145 pixels and 224 spectral bands with wavelengths varying from 400 to 2500 nm; 24 bands were omitted. The IP dataset contains 16 mutually exclusive vegetation classes. Nearly 50% (10,249) of the total 21,025 pixels include ground truth information from the 16 different classes.
The Pavia university dataset was acquired by the ROSIS sensor over Northern Italy in 2001. It consists of 610 × 340 pixels, and the spectral information is recorded at 1.3 m/pixel spatial resolution in 103 bands with wavelengths varying from 430 to 860 nm. The ground truth comprises nine classes of urban land cover. Approximately 20% of the total 207,400 picture elements include ground truth information.
The Salinas scene dataset was collected in 1998 over the Salinas Valley, CA, USA by the 224-band AVIRIS sensor; the images have 512 × 217 spatial dimensions, and the spectral information is encoded in 224 bands with wavelengths varying from 360 to 2500 nm. For both Salinas scene and Indian pines, 20 spectral bands were discarded due to water absorption.

5 Results

The plots of accuracy and loss versus epoch indicate useful properties of model training, such as the speed of convergence over epochs (slope). The classified images obtained from the hybrid CNN are shown in Fig. 2.

Table 1 Trainable parameters

Indian pines:
Conv3D 1: 8 × (3 × 3 × 7 × 1) + 8 = 512
Conv3D 2: 16 × (3 × 3 × 5 × 8) + 16 = 5776
Conv3D 3: 32 × (3 × 3 × 3 × 16) + 32 = 13,856
Conv2D: 64 × (3 × 3 × 576) + 64 = 331,840
Dense 1: (18,496 × 256) + 256 = 4,735,232
Dense 2: (256 × 128) + 128 = 32,896
Dense 3: (128 × 16) + 16 = 2064
Total trainable parameters = 5,122,176

Pavia university:
Conv3D 1: 8 × (3 × 3 × 7 × 1) + 8 = 512
Conv3D 2: 16 × (3 × 3 × 5 × 8) + 16 = 5776
Conv3D 3: 32 × (3 × 3 × 3 × 16) + 32 = 13,856
Conv2D: 64 × (3 × 3 × 96) + 64 = 55,360
Dense 1: (18,496 × 256) + 256 = 4,735,232
Dense 2: (256 × 128) + 128 = 32,896
Dense 3: (128 × 9) + 9 = 1161
Total trainable parameters = 4,844,793

Salinas scene:
Conv3D 1: 8 × (3 × 3 × 7 × 1) + 8 = 512
Conv3D 2: 16 × (3 × 3 × 5 × 8) + 16 = 5776
Conv3D 3: 32 × (3 × 3 × 3 × 16) + 32 = 13,856
Conv2D: 64 × (3 × 3 × 96) + 64 = 55,360
Dense 1: (18,496 × 256) + 256 = 4,735,232
Dense 2: (256 × 128) + 128 = 32,896
Dense 3: (128 × 16) + 16 = 2064
Total trainable parameters = 4,845,696

Table 2 Calculation of number of patches


Dataset M N S No. of patches
Indian pines 145 145 25 14,641
Pavia university 610 340 19 190,624
Salinas scene 512 217 19 98,306

Fig. 2 Indian pines, Pavia university and Salinas scene, respectively—classification

Table 3 Accuracy comparison

Network          Accuracy (%)
SVM 88.18
2D CNN 89.48
3D CNN 90.40
Hybrid network 99.79

The number of epochs considered for the proposed hybrid CNN is 100. The loss convergence and accuracy values are obtained for the validation and training samples.
Table 3 represents the accuracy comparison of the proposed hybrid network with
the 3D CNN, 2D CNN, and the support vector machine. The hybrid spectral network
provides an accuracy of 99.79% (Fig. 3).

6 Conclusion

Hyperspectral image classification is not easy, as the ratio between the number of bands in the spectral domain and the number of training samples is adverse. Three benchmark hyperspectral datasets, Salinas scene, Pavia university, and Indian pines, are used for the classification. The 3D or 2D convolution single-handedly cannot capture highly discriminative features as well as hybrid 3D and 2D

Fig. 3 Plot of Accuracy versus Epoch and loss versus Epoch for Indian pines, Pavia University and Salinas Scene

convolutions. The proposed model is more beneficial than the 2D CNN and the 3D CNN. The 25 × 25 spatial window used is most suitable for the proposed method.
The experimentation is carried out on three hyperspectral datasets to analyze and compare the performance metrics. Classification of hyperspectral data using SVM provides an accuracy of 88.18%. The proposed model outperforms the 2D CNN (89.48%) and 3D CNN (90.40%) by providing an accuracy of 99.79%. The hybrid CNN is computationally efficient compared to the 3D CNN, and for minimal training data it provides enhanced performance.

References

1. Chang C-I (2003) Hyperspectral imaging: techniques for spectral detection and classification.
Springer Science and Business Media, vol 1
2. Camps-Valls G, Tuia D, Bruzzone L, Benediktsson JA (2014) Advances in hyperspectral image classification: Earth monitoring with statistical learning methods. IEEE Signal Process Mag 31(1):45–54
3. Liu P, Zhang H, Eom KB (2017) Active deep learning for classification of hyperspectral images.
IEEE J Select Top Appl Earth Observ Remote Sens 10(2)
4. Dian R, Li S, Guo A, Fang L (2018) Deep hyperspectral image sharpening. In: IEEE transactions
on neural networks and learning systems, vol 29, no 11
5. Yu C, Zhao M, Song M, Wang Y, Li F, Han R, Chang C-I (2019) Hyperspectral image classi-
fication method based on CNN architecture embedding with hashing semantic feature. IEEE J
Select Top Appl Earth Observ Remote Sens 12(6)
Anomaly Prognostication of Retinal
Fundus Images Using EALCLAHE
Enhancement and Classifying
with Support Vector Machine

P. Raja Rajeswari Chandni

Abstract Ophthalmic diseases are generally not serious but can become sight-threatening. Even though genetic eye disorders have their own significant effect across generations, man-made disorders due to certain unhealthy practices can induce serious conditions such as vision loss, retinal damage, and macular degeneration caused in young adults by smoking. Despite the odds, detecting diseases well before they become threatening makes it easier to avoid major damage. The proposed system focuses on providing a first-level investigation for detecting ophthalmic diseases, assisting subjects in identifying anomalous behavior earlier and initiating remedial measures. The retinal fundus images used undergo pre- and post-processing stages and are then trained, tested, and classified based on disorders such as vitelliform macular dystrophy (VMD), retinal artery and vein occlusion (RAVO), Purtscher's retinopathy (PR), and diabetic macular edema (ME).

Keywords SVM · Edge-aware local contrast adaptive histogram equalization


(EALCLAHE) · Color features · Image processing

1 Introduction

Ophthalmic diseases pose no threat to human beings initially; however, changes over time cause considerable effects on the subject. A human visual system model (HVSM) is employed by computer vision experts in diagnosing diseases through digital image and video processing in CAD systems. These experts provide simplified models that are easy to understand and work on for further processing, exploration, and identification of abnormalities. Manual assessments can often be inaccurate while correlating the actual symptoms. Thus, improved and quality-enhanced systems are needed to provide a correct diagnosis at the right time, which is where medical image processing (MIP) algorithms come into existence [1]. The MIP strategy for working with images includes producing images from a normal fundus image in an upgraded
work with images includes producing images from normal fundus image to upgraded

P. Raja Rajeswari Chandni (B)


Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, India

© Springer Nature Singapore Pte Ltd. 2021 605


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_44
606 P. Raja Rajeswari Chandni

form: improving quality, differentiating regions, identifying edges, correcting non-uniformity, measuring entropy, color balancing, etc., and it has a significant effect on the processing of medical images [2, 3]. The MIP strategy also helps the diagnosis of retinal abnormalities acquired with the help of color photographs, where the raw image is unsuitable for identifying all types of retinal disease. Effective preprocessing techniques are therefore needed to produce a contrast-enhanced, color-balanced image while avoiding non-uniform illumination [4]. To denote the differences between the processed and original images, many evaluators are available, such as NIQE and RIQA, to assess the output images. However, processing color images is not as easy as processing grayscale ones; thus, in this paper, an effective algorithm for reducing non-uniform illumination while preserving edges is implemented to identify diseases and help society treat them before they worsen.

2 Related Literary Work

Sakthi Karthi Durai et al. [1] discuss the various diseases detected from retinal fundus images: age-related macular degeneration (AMD), cataract, hypertensive retinopathy, and diabetic retinopathy. Various classifiers and preprocessing methods were reviewed, of which adaptive histogram equalization plays the major role in preprocessing, and SVM gave the best output.
Kandpal and Jain [5] and Sarika et al. [6] deal with various methods of enhancing the color texture features used in preprocessing retinal fundus images; the CLAHE method with edges and dominant orientations proved the best of all. The technique simply suppresses non-textured pixels and enhances textured ones so that a better-quality image is obtained, as estimated with the help of BRISQUE.
Shailesh et al. [7], in their review paper, compare the techniques employed so far, from the preprocessing to the classification stage. The initial preprocessing stage mostly used CLAHE together with color moments for an effective image retrieval process. CLAHE proved flexible in detecting non-textured regions. The segmentation techniques deployed were ROI, green channel extraction, quantization, thresholding, etc. A series of classifiers was incorporated, among which support vector machine (SVM), radial basis function neural network (RBFNN), artificial neuro fuzzy (ANF), artificial neural network (ANN), random forest (RF), and decision tree performed well.
Onaran et al. [8] discuss the findings on Purtscher's retinopathy (PR), which is caused mainly by a traumatic state and manifests as a small series of cotton wool spots. Two image types are used: OCT and fundus. On comparing the results between the two, fundus images provided a good start for locating the accumulation of small bilateral cotton wool spots, mainly on the posterior poles of the retina.
Anomaly Prognostication of Retinal Fundus Images Using EALCLAHE … 607

Xiao et al. [9] discuss in depth the vector quantization technique, which is effective in detecting macular edema (ME). As the color images are 24-bit, the clustering vectors divide the image into eight levels, with the seven threshold values injected into the 8 × 2 × 2 segmentation process. Thus, most of the information is extracted from this segmentation technique for processing the RGB image.
Anusha et al. [10] explain the feature extraction techniques used to obtain the color moments, mainly for image retrieval, along with the mean, standard deviation, and skewness of the moment distributions. Textured features such as entropy and energy are also obtained to understand the features well. Thus, the image retrieval process is effectively fast using color moments.

3 Proposed System

The proposed system utilizes CAD systems for the diagnosis of physical abnormalities from medical data. Firstly, the retinal fundus images are preprocessed with the help of a histogram technique and are also quantized to obtain the intermediate-level color information. Secondly, segmentation of the images is done using thresholding and quantization, mainly for the purpose of post-processing, to avoid missing any information that could be useful in detecting abnormality of the retinal surface. Thirdly, features are extracted to obtain the texture details, feature differentiation, energy, mean amplitude, median, and standard deviation, making it easy for the algorithm to reach a firm decision in detecting diseases. Finally, all the above-mentioned steps are coupled to the training step for classification using the SVM classifier, and the system is tested for accuracy (Fig. 1).

4 Description of the Schematic Diagram

4.1 Dataset

Retinal fundus images were initially collected from online platform like MESSIDOR,
DRIVE, STARE, KAGGLE, CHASE, ARIA, ADCIS, etc., which aim at providing
datasets for research purposes. The process is initialized with the acquisition of
images through a digital camera or fundus images obtained from the dataset
repository.

Fig. 1 Schematic overview of the proposed system

4.2 Stage I—Preprocessing

Retinal fundus images are prone to periodic noises like salt-and-pepper and Gaussian noise; thus, a firm preprocessing technique is required to strengthen the overall region of the image. The introduced EALCLAHE technique initially processes the RGB components separately, along with gamma correction and a G-matrix, to produce the RGB dash components. The clip limit is set with a Rayleigh distribution, upon setting the edge threshold limit to leave strong edges with minimum intensity amplitude intact, and initiating the amount of enhancement for smoothing the local contrast. Further, the enhanced images are denoised using denoising convolutional neural networks (DnCNN), which have pretrained nets and offer better noise reduction than the alternatives. The flowchart of the preprocessing step is given in Fig. 2. For testing the image quality, the PSNR, SSIM, NIQE, and BRISQUE quality metrics are used (Fig. 3).
The testing parameters for the noise removal used in this proposed method are:
• Peak Signal-to-Noise Ratio (PSNR):
Peak signal-to-noise ratio (PSNR) [11] is employed to calculate the variation between the original and denoised images of size M × N. It is estimated using the SNR equation and is expressed in decibels (dB). The original and denoised images are represented as r(x, y) and t(x, y), respectively.

Fig. 2 Schematic flow of the preprocessing system

PSNR = 10 log10(255² / MSE) (1)

where 255 is the highest intensity value in the grayscale image and MSE is the mean-squared error, given by

Fig. 3 Results of the preprocessing step


MSE = (1/(M × N)) Σ_{x=1}^{M} Σ_{y=1}^{N} [r(x, y) − t(x, y)]² (2)
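A minimal numerical check of Eqs. (1)-(2), using toy 8-bit images rather than real fundus data:

```python
import numpy as np

def psnr(r, t):
    # Eq. (2) then Eq. (1): MSE between original r and denoised t, PSNR in dB
    mse = np.mean((r.astype(float) - t.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

r = np.full((64, 64), 100, dtype=np.uint8)   # toy "original" image
t = np.full((64, 64), 101, dtype=np.uint8)   # toy "denoised" image, MSE = 1
print(round(psnr(r, t), 2))                  # 10*log10(255^2) ≈ 48.13 dB
```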
• Structural Similarity Index Measure (SSIM):
The SSIM index calculated for a selected pair of windows x and y of size N × N is given by

SSIM = [(2 μx μy + C1)(2 σxy + C2)] / [(μx² + μy² + C1)(σx² + σy² + C2)] (3)

Table 1 List of quality metrics for four different test images


Evaluation parameters Sample image 1 Sample image 2 Sample image 3 Sample image 4
PSNR 21.2276 23.5478 22.7431 23.9875
SSIM 0.8631 0.8976 0.9054 0.9127
NIQE 3.0017 3.0006 2.9871 3.0076
BRISQUE 25.0067 23.1549 24.5479 24.3103

where
μx = average value of x
μy = average value of y
σx² = variance of x
σy² = variance of y
σxy = covariance of x and y
C1 = (k1 L)² and C2 = (k2 L)² are variables that stabilize the division with a weak denominator
L is the dynamic range
The default values are k1 = 0.01 and k2 = 0.03.
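A single-window version of Eq. (3) can be sketched as follows (real SSIM implementations average this statistic over many local windows; the arrays below are illustrative):

```python
import numpy as np

def ssim_global(x, y, L=255, k1=0.01, k2=0.03):
    # Eq. (3) evaluated over one window; means, variances, covariance of x, y
    C1, C2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

a = np.arange(64, dtype=float).reshape(8, 8)
print(ssim_global(a, a))   # identical windows give a score of 1.0
```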
• Naturalness Image Quality Evaluator (NIQE):
The naturalness (no-reference) image quality score is a nonnegative scalar value that measures the distance between the natural scene statistics calculated from image A and those of the input model.
Score = niqe(A);

• Blind/Referenceless Image Spatial Quality (BRISQUE):


BRISQUE predicts the score of an image using support vector regression models trained on sets of images annotated with differential mean opinion score values. BRISQUE scores usually vary from 0 to 100; the lower the value, the better the perceptual quality.
Score = brisque(A);

The testing parameters of the proposed system for quality metrics are tabulated
below (Table 1 and Fig. 4).

4.3 Stage II—Segmentation

Though the previous step improves the image data by suppressing undesirable distortions, segmentation is also necessary to fine-tune the properties of the image.
The next step is the conversion of the RGB color space to HSV (hue, saturation, value) in order to obtain the luminance values of each color, which provide better information for the feature extraction process. The individual threshold values for each color are obtained based on range-set segmentation and are listed below in Table

Fig. 4 a, d, g Original images, b, e, h luminance enhanced images, c, f, i images with proposed


EALCLAHE output

Table 2 List of the threshold values obtained and set for the quantization level

Images        Threshold value of H   Threshold value of S   Threshold value of V
Test image 1  0.5                    0.5                    0.3
Test image 2  0.45                   0.5                    0.4
Test image 3  0.5                    0.45                   0.04

2. Then, the images are quantized into eight levels, with the aim of obtaining the complete color information regarding the hue, saturation, and value of the RGB color space (Fig. 5).
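The eight-level quantization can be sketched with NumPy; the uniformly spaced thresholds below are illustrative stand-ins for the tuned values of Table 2:

```python
import numpy as np

# Eight-level quantization of an HSV channel with seven thresholds, mirroring
# the behavior of MATLAB's imquantize used for Fig. 5.
h = np.random.rand(64, 64)                  # hue channel scaled to [0, 1]
thresholds = np.linspace(0.125, 0.875, 7)   # 7 thresholds -> 8 levels
labels = np.digitize(h, thresholds)         # integer level 0..7 per pixel
print(labels.min(), labels.max())
```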

4.4 Stage III—Feature Extraction

Feature extraction is the step that forms the base of predicting diseases through classification.

Fig. 5 Segmented image using imquantize ( )

The proposed system utilizes color texture analysis using Gabor filters. Gabor filters are so flexible that they offer higher degrees of freedom than Gaussian derivatives. During texture analysis, Gabor features are extracted to analyze a particular frequency of an image in a specified direction around the region of analytical interest. The wavelet transform is also used to support texture analysis in obtaining a combined set of feature vectors. Further, image retrieval allows the diagnostic process to differentiate images more effectively based on their color features (Table 3).
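A hedged sketch of Gabor-energy texture features follows; the kernel parameters, random image, and four orientations are illustrative choices, not the system's actual settings:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    # Real Gabor kernel: isotropic Gaussian envelope times an oriented cosine
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * \
           np.cos(2 * np.pi * freq * xr)

# Mean filter-response energy at four orientations as a small texture vector
img = np.random.rand(64, 64)
feats = [float(np.mean(convolve2d(img, gabor_kernel(0.25, t), mode='valid') ** 2))
         for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
print(len(feats))
```

Each orientation contributes one energy value, so responses at several frequencies and orientations form the texture feature vector.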

4.5 Stage IV—Classification

This step helps in differentiating between similar features. The features like color moments, mean amplitude, standard deviation, energy, entropy, and color textures obtained from the previous step are fed to the classifier to perform at its best.
SVMs are efficient classifiers, widely used in machine learning research. There are several kernel approaches, including polynomial, radial basis function, neural networks, etc. The linear SVM classifier maps points into separated categories such that they are divided by a wide gap. Thus, a hyperplane is selected to classify the provided dataset, and the plane must satisfy the condition

Yi [(w · xi ) + b] ≥ 1 − εi , εi ≥ 0 (4)

where
W = the weight vector

Table 3 List of the feature extraction techniques used

Energy: Σx Σy p(x, y)², with x, y = 0, …, N − 1
Correlation: Σx Σy (x − μx)(y − μy) p(x, y) / (σx σy)
Entropy: −Σx Σy p(x, y) log(p(x, y))
Contrast: Σx Σy |x − y|² p(x, y)
Mean: (1/n) Σi,j r_ij p(r_ij)
Standard deviation: √(Σi (x_ij − x̄)² / N)

b = the bias
εi = the slack variable.
The prompted system is classified effectively using a support vector machine (SVM). The system uses a 7:3 ratio for training and testing of the image database. Further, the classified images are inspected with the help of a confusion matrix in order to examine the classifier's performance. The accuracy of the system is represented graphically in the figure below.
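The 7:3 SVM pipeline can be sketched with scikit-learn on synthetic features; the feature vectors and class offsets below are hypothetical stand-ins for the extracted color-texture features:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Hypothetical 6-dimensional feature vectors (color moments, energy, entropy,
# ...) for four classes standing in for VMD, RAVO, PR, ME
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = rng.integers(0, 4, size=200)
X += y[:, None] * 1.5               # give each class a separable offset

# 7:3 train/test split, linear SVM, confusion matrix on the held-out 30%
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel='linear').fit(Xtr, ytr)
cm = confusion_matrix(yte, clf.predict(Xte), labels=[0, 1, 2, 3])
print(cm.shape, cm.trace() / cm.sum())
```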

4.6 Results and Conclusion

The classified images are successfully tested with the help of the confusion matrix; the number of misclassified images is also available in the confusion matrix, and the accuracies are plotted in the graph. The accuracy is estimated using the TP, TN, FP, and FN parameters. The overall accuracy of the system is 94.3%. Using various other comparison techniques can further improve the quality of the system (Figs. 6, 7 and Table 4).

Accuracy % = (TP + TN) / (TP + FP + TN + FN) × 100 (5)
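Equation (5) applied per class to a hypothetical confusion matrix (the counts below are invented for illustration, not the system's actual results):

```python
import numpy as np

def per_class_accuracy(cm, k):
    # TP, FP, FN, TN for class k from a multiclass confusion matrix, Eq. (5)
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    tn = cm.sum() - tp - fp - fn
    return (tp + tn) / (tp + fp + tn + fn) * 100

# Hypothetical 4-class confusion matrix (rows = true VMD, RAVO, PR, ME)
cm = np.array([[45, 3, 1, 1],
               [2, 46, 1, 1],
               [3, 2, 43, 2],
               [1, 1, 2, 46]])
print(per_class_accuracy(cm, 0))   # (45 + 144) / 200 * 100 = 94.5
```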

Fig. 6 Confusion matrix for the proposed system

Fig. 7 Graphical representation of the accuracies obtained for each class



Table 4 Accuracy values for individual classes

Category   Accuracy
VMD        0.913
RAVO       0.924
PR         0.851
ME         0.924

5 Conclusion and Future Scope

The ability to detect various diseases at an early stage is very useful work for society. This system can be useful in the healthcare domain, especially in routine check-ups, so diseases can be caught in their initial stages. It helps recognize a person's disease and minimizes the cost of diagnosis. The proposed system is based on digital image processing, combining retinal color, shape, and texture features to form a feature vector for texture analysis and then predicting the disease using the supervised SVM algorithm. A future enhancement is to implement this project with a hardware setup to overcome some limitations of image processing, and to further develop the model into a product or an Android application that can conduct tests without human supervision and with no undue cost for diagnosis.

References

1. Sakthi Karthi Durai B et al (2020) A research on retinal diseases prediction in image processing. Int J Innov Technol Explor Eng 9(3S):384–388. https://doi.org/10.35940/ijitee.c1082.0193s20
2. Vonghirandecha P, Karnjanadecha M, Intajag S (2019) Contrast and color balance enhancement for non-uniform illumination retinal images. Tehnički glasnik 13(4):291–296. https://doi.org/10.31803/tg-20191104185229
3. Rupail B (2019) Color image enhancement with different image segmentation techniques. Int J Comput Appl 178(8):36–40. https://doi.org/10.5120/ijca2019918790
4. Jiménez-García J, Romero-Oraá R, García M, López-Gálvez M, Hornero R (2019) Combination of global features for the automatic quality assessment of retinal images. Entropy 21(3):311. https://doi.org/10.3390/e21030311
5. Kandpal A, Jain N (2020) Retinal image enhancement using edge-based texture histogram equalization. In: 2020 7th international conference on signal processing and integrated networks (SPIN), Noida, India, pp 477–482. https://doi.org/10.1109/SPIN48934.2020.9071108
6. Sarika BP, Patil BP (2020) Automated macula proximity diagnosis for early finding of diabetic macular edema. In: Research on biomedical engineering. Springer, Berlin. https://doi.org/10.1007/s42600-020-00065-9
7. Shailesh K, Shashwat P, Basant K (2020) Automated detection of eye related diseases using digital image processing. In: Handbook of multimedia information security: techniques and applications, pp 513–544. https://doi.org/10.1007/978-3-030-15887-3_25
8. Onaran Z, Akbulut Y, Tursun S, Oğurel T, Gökçınar N, Alpcan A (2019) Purtscher-like retinopathy associated with synthetic cannabinoid (Bonzai) use. Turkish J Ophthalmol 49(2):114–116. https://doi.org/10.4274/tjo.galenos.2018.67670
9. Xiao W, He L, Mao Y, Yang H (2018) Multimodal imaging in Purtscher retinopathy. Retina 38:1. https://doi.org/10.1097/IAE.0000000000002218
10. Anusha V, Reddy V, Ramashri T (2014) Content based image retrieval using color moments and texture. Int J Eng Res Technol 3
11. Gonzalez R, Woods R. Digital image processing
Analysis of Pre-earthquake Signals Using
ANN: Implication for Short-Term
Earthquake Forecasting

Ramya Jeyaraman, M. Senthil Kumar, and N. Venkatanathan

Abstract Earthquakes are complex physical phenomena. The heterogeneous nature of the earth's interior is the reason for the unpredictable nature of earthquake occurrence. In recent years, scientists across the world have been trying to develop a model using multiparameter earthquake precursors. In this paper, we discuss the association of abnormal irregularities in solid earth tides (SET) and anomalous transient changes in outgoing longwave radiation (OLR) with major earthquakes, and utilize a neural network to forecast the occurrence of notable earthquakes. We consider the Simeulue, Indonesia region and earthquakes of magnitude >5.0 that took place during the period from 2004 to 2014. Earthquake parameters for the Simeulue region have been taken for analysis: the solid earth tide anomaly date (with weights assigned for continual anomaly days), the OLR anomaly date, distance, day of OLR anomaly, latitude, longitude, and anomaly index appearing before the earthquake are selected as input parameters, whereas the date of occurrence of the earthquake, latitude, longitude, depth, and magnitude are selected as output parameters for the neural network. We use an Elman backpropagation neural network (EBPNN) model for forecasting the above output parameters. Analysis of the results given by the EBPNN has shown reasonable accuracy. Even though the results have to be tested in other regions, the EBPNN has shown encouraging signs toward developing an effective short-term earthquake forecasting model.

Keywords Earthquake forecasting · Solid earth tides · Outgoing longwave


radiation · Artificial neural network

1 Introduction

Scientists have associated solid earth tides with the occurrence of earthquakes, as
the displacement produced by earth tides affects the motion of the tectonic plates;
the results suggest that the big earthquakes are triggered by the abnormal irregularity

R. Jeyaraman · M. Senthil Kumar · N. Venkatanathan (B)


SASTRA Deemed to be University, Thanjavur, Tamil Nadu, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 619


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_45
620 R. Jeyaraman et al.

in solid earth tides (SET) and anomalous transient changes in outgoing longwave radiation (OLR). Ide [1] confirmed that the majority of higher magnitude earthquakes are likely to happen under high tidal stress, limited to specific regions or circumstances. Similarly, transient thermal abnormalities occurring before destructive earthquakes were detected by Russian scientists during the late 1980s through the use of satellite technology. By understanding atmospheric earthquake signals scientifically with advanced remote sensing instruments, satellite thermal imaging data can be used as an effective tool in the detection of OLR anomalies [2].

1.1 Concepts Involved

Machine learning, a subgroup of artificial intelligence (AI), offers computational
and statistical tools to explore data and to train a model by analyzing it with
complex algorithms. Deep learning, a subgroup of machine learning techniques,
attempts to imitate the anatomy of the human brain. It relies on multi-layer neural
network architectures and focuses on predictive analytics, constructing complicated
models and algorithms to generate predictive analyses. In making reliable decisions,
predictive analytical models provide an upward drive and expose “complex and secret
perspectives” by learning from past patterns and historical data relationships.

1.2 Introduction to Neural Networks

Artificial neural networks (ANNs) are used for processing tabular datasets and for
solving classification and regression prediction problems.
Moustra et al. [3] developed an artificial neural network for time series analysis
of seismic electric signals, in which the input data is the magnitude, the
corresponding output is the next day's magnitude, and a performance evaluation has
been made.
Recurrent neural networks (RNNs) are a class of ANN used when the information is in
the form of time series data and exhibits temporal dynamic behavior. Convolutional
neural networks (CNNs) are used when there is a need to map image data to an output
variable; they work well with data that has a spatial relationship. Asim et al. [4]
predicted earthquakes using seismic features with a hybrid neural network prediction
system combined with a support vector regressor, and the performance was measured
for a particular region. Vardaan et al. [5] discussed forecasting earthquake trends
using a series of past earthquakes. Long short-term memory (LSTM), one of the
categories of RNN, is used for modeling the series of earthquakes. The model is
Analysis of Pre-earthquake Signals Using ANN: Implication … 621

trained to predict the future trend of earthquakes. It is contrasted with a
feed-forward neural network (FFNN), and as a result, LSTM was found to be better
than FFNN.

1.3 Elman Backpropagation Neural Network

The Elman neural network is a type of dynamic RNN that follows a varied feed-forward
topology. Its architecture comprises an input layer, followed by a hidden layer, and
finally an output layer. The particular strength of the Elman network lies in its
context input nodes, which memorize the previous data of the hidden nodes. This
makes the Elman NN applicable in the fields of dynamic system identification and
prediction control, and these properties are used here for creating the Elman
backpropagation neural network.
The historical dataset of precursory parameters of the earthquakes that occurred in
the Simeulue, Indonesia region from 2004 to 2018 is considered for training, and
subsequent testing is done on the historical dataset of precursory parameters of the
earthquakes that occurred in the region during 2004–2014. We have used the United
States Geological Survey (USGS) earthquake catalogs. The number of iterations is
varied to forecast the earthquakes with respect to spatial variables such as
latitude, longitude, magnitude, and date/time of occurrence with reasonable
accuracy, so as to achieve optimization. These precursors were observed several days
to months before the occurrence of big earthquakes; hence, we have used an ANN to
forecast the occurrence of earthquakes.

2 Study Area

Indonesia is prone to earthquakes due to its location on the Ring of Fire, an arc of
volcanoes and fault lines in the Pacific Ocean basin. This horseshoe-shaped belt
extends 40,000 km (25,000 miles) and is where most of the world's earthquakes occur.
Several large earthquakes have struck the Indonesian region, as Indonesia's
tectonics is highly complex: many tectonic plates, such as the Eurasian Plate, the
Australian Plate, the Philippine Sea Plate, and the Pacific Plate, meet at this
point [6]. Sumatra sits above the convergent plate boundary, where the Australia
Plate is subducted beneath the Sunda Plate along the Sunda megathrust. The
convergence on this section of the boundary is strongly oblique, and the strike-slip
portion of the plate movement is accommodated along the right-lateral Great Sumatran
Fault. The Sunda megathrust activity has triggered several tremendous earthquakes.
The February 20, 2008 magnitude 7.4 Simeulue, Indonesia earthquake occurred
as a result of a thrust fault on the border between the Australia and Sunda plates.

The Australia plate travels north-northeast toward the Sunda plate at a pace of about
55 mm/year at the location of this earthquake [7] (Table 1).
In this study, 28 earthquakes of the Simeulue, Indonesia region are analyzed in
terms of magnitude (Fig. 1).

Table 1 List of earthquakes that occurred in the Simeulue, Indonesia region since 2004 with
magnitude ≥5.0 (data provided by USGS https://earthquake.usgs.gov)
Event date    Origin time (UTC)    Latitude    Longitude    Mag    Depth (km)    Place
25-07-2012 00:27:45.260Z 2.707 96.045 6.4 22 Simeulue, Indonesia
26-01-2011 15:42:29.590Z 2.205 96.829 6.1 23 Simeulue, Indonesia
09-12-2009 21:29:02.890Z 2.759 95.91 6 21 Simeulue, Indonesia
29-03-2008 17:30:50.150Z 2.855 95.296 6.3 20 Simeulue, Indonesia
20-02-2008 08:08:30.520Z 2.768 95.964 7.4 26 Simeulue, Indonesia
22-12-2007 12:26:17.470Z 2.087 96.806 6.1 23 Simeulue, Indonesia
29-09-2007 05:37:07.260Z 2.9 95.523 6 35 Simeulue, Indonesia
07-04-2007 09:51:51.620Z 2.916 95.7 6.1 30 Simeulue, Indonesia
11-08-2006 20:54:14.370Z 2.403 96.348 6.2 22 Simeulue, Indonesia
19-11-2005 14:10:13.030Z 2.164 96.786 6.5 21 Simeulue, Indonesia
08-06-2005 06:28:10.920Z 2.17 96.724 6.1 23.5 Simeulue, Indonesia
28-04-2005 14:07:33.700Z 2.132 96.799 6.2 22 Simeulue, Indonesia
30-03-2005 16:19:41.100Z 2.993 95.414 6.3 22 Simeulue, Indonesia
26-02-2005 12:56:52.620Z 2.908 95.592 6.8 36 Simeulue, Indonesia
27-12-2004 20:10:51.310Z 2.93 95.606 5.8 28.9 Simeulue, Indonesia
28-12-2004 03:52:59.230Z 2.805 95.512 5 24.9 Simeulue, Indonesia
29-12-2004 10:52:52.000Z 2.799 95.566 5.4 23.2 Simeulue, Indonesia
01-01-2005 01:55:28.460Z 2.91 95.623 5.7 24.5 Simeulue, Indonesia
05-02-2005 04:09:53.640Z 2.325 95.065 5.1 30 Simeulue, Indonesia
09-02-2005 01:02:26.190Z 2.278 95.156 5 28.7 Simeulue, Indonesia
24-02-2005 07:35:50.460Z 2.891 95.729 5.6 30 Simeulue, Indonesia
28-03-2005 12:56:52.620Z 2.335 96.596 5.4 28.9 Simeulue, Indonesia
28-03-2005 16:34:40.570Z 2.087 96.503 5.5 30.6 Simeulue, Indonesia
28-03-2005 16:44:29.780Z 2.276 96.183 5.1 30 Simeulue, Indonesia
28-03-2005 17:03:34.430Z 2.751 96.049 5.4 30 Simeulue, Indonesia
28-03-2005 18:48:53.500Z 2.467 96.758 5.1 26.8 Simeulue, Indonesia
28-03-2005 19:54:01.090Z 2.889 96.411 5.6 29.2 Simeulue, Indonesia
28-03-2005 23:37:31.350Z 2.914 96.387 5.4 28.6 Simeulue, Indonesia

Fig. 1 A location map of the Simeulue, Indonesia region. The red color indicates the epicenters
of the 2004–2018 earthquakes over the region which have a magnitude above 6, as listed in Table 1

3 Methodology

3.1 Anomalous Outgoing Longwave Radiation

Outgoing longwave radiation (OLR) at the top of the atmosphere is the thermal energy
representing the sum of the energy released into space by the earth's surface and
atmosphere; it is part of the earth's radiation budget. OLR values are observed by
NOAA polar-orbiting satellites and expressed as an energy flux (W/m²) leaving the
surface of the earth. The data is centered on equatorial regions, from 160°E to
160°W longitude. The raw data is translated into a regular index of anomalies. The
information is obtained at a grid resolution of 1° × 1° (latitude × longitude). OLR
data is time series data spanning the entire planet. Variations in the anomaly were
found 3–60 days, or up to seven months, before devastating earthquakes.
Anomalous variations of the OLR flux have been determined from the mean OLR flux of
the past 10 years,

\bar{E}_{lm\tau} = \frac{1}{t} \sum_{\tau'=1}^{t} E_{lm\tau'}    (1)

where "t" is the number of predefined previous years over which the mean OLR flux is
determined for a given location (l, m) and time (τ).

Flux index: \Delta E_{lm\tau} = \frac{E_{lm\tau} - \bar{E}_{lm\tau}}{\sigma_{lm\tau}}    (2)

where
\Delta E_{lm\tau}: flux index value for latitude l, longitude m, and data acquisition time τ.
E_{lm\tau}: current OLR flux value determined for spatial coordinates (l, m) and time τ.
\bar{E}_{lm\tau}: mean OLR flux value determined for spatial coordinates (l, m) and time τ.
\sigma_{lm\tau}: standard deviation of the OLR flux determined for spatial coordinates (l, m) and time τ.

The anomalous part of the flux index of energy, [\Delta E_{lm\tau}]^{*}, can be
determined by removing the flux index values below the +2σ level of the mean OLR
flux; since the flux index is already standardized, this corresponds to the
threshold +2. Thresholding in this way preserves the duration of the anomalous flux
observed.

If \Delta E_{lm\tau} \ge +2, then [\Delta E_{lm\tau}]^{*} = \Delta E_{lm\tau}; else [\Delta E_{lm\tau}]^{*} = 0    (3)

where
[\Delta E_{lm\tau}]^{*}: anomalous energy flux index observed for the given location and time.
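The flux-index computation in Eqs. (1)–(3) can be sketched in a few lines of NumPy (our illustration, not the authors' code; the array names and shapes are assumptions):

```python
import numpy as np

def flux_index(current, past_years):
    """Standardized OLR flux index, Eq. (2): deviation of the current
    flux from the multi-year mean, in units of its standard deviation."""
    mean = past_years.mean(axis=0)  # Eq. (1): mean over the t previous years
    std = past_years.std(axis=0)
    return (current - mean) / std

def anomalous_index(index, threshold=2.0):
    """Eq. (3): retain only flux-index values at or above the +2-sigma
    level; everything below the threshold is set to zero."""
    return np.where(index >= threshold, index, 0.0)
```

For a 1° × 1° grid, `past_years` would be an array of shape (10, n_lat, n_lon) holding the previous 10 years of OLR flux at the same acquisition time.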

3.2 Elman Backpropagation Neural Network

Studies on earthquake prediction using recurrent neural networks (RNNs) were carried
out on the pre-earthquake scenario: the seismic, solid earth tide, and atmospheric
parameters of earthquakes that occurred in the Simeulue, Indonesia region over the
past 14 years (2004–2018) are given as input for the machine to learn. The use of
the neural network in the learning environment proceeds in two stages, namely
training and testing.
A retrospective analysis has been made on the earthquakes of the Simeulue, Indonesia
region. In the present analysis, we looked at the stress that builds up during the
syzygy from the perspective of its effect on seismic activity through solid earth
tides.
The recurrent tidal deceleration (dy/dt) causes interlocking of the tectonic plate
interface, leading to increased tidal stresses. Such rapid deformation leads to a
shift in the state of stress over the entire seismic zone, leading to the release of
maximum energy and thus increasing the risk of earthquakes. Given this

Fig. 2 Elman backpropagation network

triggering effect of solid earth tides, earthquakes of greater magnitude are likely to
occur.
Our current work focuses on the creation of a neural network model using a recurrent
neural network (Fig. 2), and the performance of the Elman backpropagation neural
network is investigated with the corresponding input parameters to determine the
accuracy of the network.

4 Findings and Discussions

Florido et al. [8] analyzed earthquake magnitude prediction using the
Levenberg–Marquardt backpropagation algorithm to train a feed-forward ANN, and the
obtained results were compared with those of the simple backpropagation algorithm.
The total number of neurons present in the hidden layer was established empirically
[9]. To find the degree of similarity between the tidal and OLR occurrences and the
magnitude of earthquakes, cross-correlation is made.
In the present work, we identified a relationship between SET, OLR, and the time of
earthquake occurrence by analyzing the earthquakes between 2004 and 2014. The
results indicate that solid earth tides (SET) contribute to the interlocking of
tectonic plates, resulting in the release of huge amounts of thermal radiation due
to the heat of transformation phenomenon, which manifests as irregular tectonic
activity.
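The cross-correlation mentioned above can be estimated with a normalized cross-correlation; a minimal sketch (our illustration, assuming two equal-length, equally sampled series):

```python
import numpy as np

def normalized_cross_correlation(x, y):
    """Normalized cross-correlation of two equal-length series.
    Values lie in [-1, 1]; the lag of the peak shows by how many
    samples one signal leads the other."""
    x = (x - x.mean()) / (x.std() * len(x))
    y = (y - y.mean()) / y.std()
    return np.correlate(x, y, mode="full")
```

With `mode="full"` the zero-lag value sits at index `len(x) - 1` of the result.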

4.1 Training Function

Based on the network training function named trainlm, a neural network model for
earthquake prediction has been developed. Trainlm is a supervised neural network
training algorithm which works according to Levenberg–Marquardt optimization by
updating the weight and bias values.
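The Levenberg–Marquardt step that trainlm performs can be written in its standard textbook form (stated here for completeness, not taken from the paper; J is the Jacobian of the network errors e with respect to the weights and biases w, and μ is an adaptive damping factor):

```latex
w_{k+1} = w_k - \left( J^{T} J + \mu I \right)^{-1} J^{T} e
```

When μ is large, the update approaches gradient descent with a small step size; when μ is small, it approaches the Gauss–Newton method.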

4.2 Layers and Description of Nodes

The input layer comprises eight variables: two solid earth tide variables, namely
the date of the solid earth tide anomaly and the weights allocated for continual
anomaly days, together with the date of the atmospheric OLR anomaly, the distance,
the day of the OLR anomaly, the latitude, the longitude, and the pre-earthquake
anomaly index. There is no fixed approach for fixing the optimum number of hidden
nodes.
4.3 Input Nodes

A thermal abnormality in the area is obtained by analyzing the deviation index.
Thermal radiation irregularities have been found to occur in the epicenter area
several months or days before the event [10]. Rising tectonic activity drives radon
gas emissions and the upward drift of H+ ions from the earth's interior, which
induces an abnormal decrease in relative humidity at the surface of the earth and a
rise in OLR. Because of the upward acceleration, the anomaly in OLR flux is due to
latent heat release [11].

4.4 Variables Involved

The seismic information used in this work is derived from all the USGS
instrumentally reported quakes that occurred in Simeulue, Indonesia. After removing
aftershocks and foreshocks, earthquakes with a magnitude greater than 6 on the
Richter scale are considered. The input parameters consist of three spatial
variables related to the earthquake's spatial characteristics, a single time
variable, and two anomaly value-related variables.

4.5 Spatial Parameters

Longitude, latitude, and depth of earthquake are three parameters that are allocated
to each event.

4.6 Time Variable

The day and time elapsed between the anomaly and the event are considered; both the
minimum peak date and the maximum peak date enter into this variable. The peak date
difference is calculated as the difference between the acceleration and the
deceleration that occurred between the anomaly days (Table 2).

Table 2 List of parameters used for predicting the magnitude of earthquakes that occurred in the
Simeulue, Indonesia region since 2004 with magnitude >6 (data provided by USGS
https://earthquake.usgs.gov)

Parameters    Value
Network style    Elman backprop
Training function    trainlm
Adaption learning function    learngdm
Performance function    MSE
Number of layers    4
Number of neurons    10
Transfer function    tansig
Training algorithm    Levenberg–Marquardt
Data division    Random
Number of epochs    10
Since the earth is a highly heterogeneous medium, the task of predicting earthquakes
remains difficult. To ease the difficulty, the neural network method is used to
understand the precursors that emerge, due to various physical changes, before the
occurrence of major earthquakes.
The Elman network, a category of recurrent neural network, has a simple structure
and is considered to be an effective tool for solving time sequence problems; it can
also reflect the dynamic behavior of a system. Bhatia et al. [12] discussed several
techniques for evaluating the multilayer perceptron for different input parameters
and sets of hyperparameters, and showed that time series analysis can be done
effectively using LSTM with different inputs and different sets of hyperparameters.
In this work, a neural network is developed utilizing the Elman backpropagation
network. The input from the input layer is fed into the hidden layer, and the output
from the hidden layer is then passed to the output layer. The activation function is
applied at the intermediate layers of the network to produce the output. The
measured output error is then backpropagated toward the input layer, and the weights
are updated accordingly. The same procedure continues until the output matches the
target data or the output error is minimized.
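The forward pass just described, with context nodes feeding the previous hidden state back into the hidden layer, can be sketched as follows (a minimal NumPy illustration using the paper's eight inputs and ten hidden neurons; the weight initialization, output size, and tanh in place of tansig are our assumptions, not the MATLAB implementation used in the study):

```python
import numpy as np

class ElmanNetwork:
    """Minimal Elman RNN: input -> hidden (+ context) -> output.
    The context nodes memorize the hidden activations of the
    previous time step and feed them back into the hidden layer."""

    def __init__(self, n_in=8, n_hidden=10, n_out=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)

    def step(self, x):
        # Hidden state depends on the current input and the context
        # (previous hidden state); tanh plays the role of tansig.
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h  # store for the next time step
        return self.W_out @ h
```

In the study's setting, the eight inputs are the precursory parameters and the outputs are the forecast targets; the training itself (backpropagation with trainlm) is omitted from this sketch.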
From Fig. 3, the observed and predicted largest events using the recurrent neural
network are compared by keeping a threshold value of ±0.3 for latitude and
longitude. Here, out of 28 earthquakes, the outputs of 18 earthquakes hold good for
the neural network; the forecasted output is thereby 64.29% efficient compared with
the actual values for latitude, and the error percentage is 35.7.
Out of 28 earthquakes, only five earthquakes hold good when compared with the actual
values for longitude. Hence, the network shows a lower efficiency of 17.85% for
longitude, with an error percentage of 82.14.
Figure 4a shows the expected versus threshold values for the depth parameter,
inferred from the graph by keeping a threshold of ±2.5 for depth and ±0.45 for
magnitude. For the depth value, the forecasted output of 21 out of 28 earthquakes holds

[Bar chart omitted: error calculation for latitude and longitude, with counts binned as above -0.3, 0 to -0.3, 0 to 0.3, and above +0.3]
Fig. 3 Comparison of the expected versus actual predicted graph for spatial coordinates

[Bar charts omitted: a error counts for depth, binned at ±2.5; b error counts for magnitude, binned at ±0.45]

Fig. 4 Comparison of the expected versus actual predicted graph for a depth value, b magnitude

good, which is 75% efficient compared with the actual values, with an error
percentage of 25.
For magnitude, out of 28 earthquakes, the outputs of 14 earthquakes hold good for
the neural network; the forecasted output is thereby 50% efficient compared with the
actual values, and the error percentage is 50. The accuracy can be improved by
adding more data to the neural network.
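The efficiency figures above follow a simple scoring rule: a forecast counts as a hit when its error falls within the chosen threshold. A sketch of that rule (our illustration, not the authors' code):

```python
import numpy as np

def hit_rate(predicted, observed, threshold):
    """Percentage of events whose absolute prediction error is within
    +/- threshold (e.g. 0.3 for latitude/longitude, 2.5 for depth,
    0.45 for magnitude)."""
    hits = np.abs(np.asarray(predicted) - np.asarray(observed)) <= threshold
    return 100.0 * hits.sum() / hits.size
```

For example, 18 hits out of 28 events gives 100 × 18/28 ≈ 64.29%, matching the latitude efficiency reported above.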
The retrospective research findings follow the general consensus in the field of
seismology:
Input: The precursory earthquake parameters for the Simeulue, Indonesia region have
been taken for analysis; the date of the solid earth tide anomaly with the weights
assigned for the continual anomaly days, the OLR anomaly date, the distance, the day
of the OLR anomaly, the latitude, the longitude, and the anomaly index appearing
before the earthquake are chosen as the input parameters.
Output: The latitude, longitude, depth, and magnitude are the output parameters for
the neural network.
The network is modeled with four hidden layers. During training, a predefined
desired target output is compared with the actual output, and the difference is
termed the error, expressed as a percentage.

From the retrospective analysis, it is inferred that it is possible to forecast
earthquakes (at least in this study region) with reasonable accuracy. The output
reveals that the error percentage is minimal and the predicted output holds good for
some of the input versus output parameters mentioned above. Although at present
earthquake prediction cannot be made with a high degree of certainty, this research
offers a scientific method for assessing the short-term seismic hazard potential of
an area.

5 Conclusion

The significance of the anomalies obtained in SET evidences a very high impact of
OLR on earthquake triggering. Hence, tidal amplitude irregularities of SET trigger
plate tectonics and thereby lead to OLR anomalies, which act as a short-term
precursor for detecting the time of occurrence of earthquakes. When the tidal
triggering is found to be stronger, a larger magnitude earthquake will occur.
Through the analysis, with higher reliability, we have identified a strong link
between the precursors and the location of the devastating earthquakes, and between
outgoing longwave radiation and the magnitude of the earthquakes. The results we
obtained strongly suggest that a neural network model using multiparameter
earthquake precursors can be developed into a short-term earthquake-forecasting
model. In this paper, the correlation of peculiar anomalies in solid earth tides
(SET) and anomalous transient shifts in outgoing longwave radiation (OLR) with major
earthquakes is obtained. We use a neural network to predict large earthquakes for
the Simeulue, Indonesia region, considering earthquakes with a magnitude greater
than 5.0 that occurred during the period from 2004 to 2014. We discuss the issue of
anticipating the spatial variables of an earthquake by finding a pattern, through a
neural network, in the history of earthquakes. Preliminary outcomes of this research
are discussed. Although the technique is capable of achieving effectiveness, further
efforts are being made to reach a stringent conclusion. Also, since the networks
tend to miss significant aftershocks and pre-shocks, it is expected that the results
can be improved by including more data in the neural network to achieve better
efficiency.

Acknowledgements We are greatly indebted to the Ministry of Earth Sciences for financial assis-
tance (Project No: MoES/P. O(seismo)/1(343)/2018). We thank National Oceanic and Atmospheric
Administration for providing data for Outgoing Longwave radiation to the user community.

References

1. Ide S, Yabe S, Tanaka Y (2016) Earthquake potential revealed by tidal influence on earthquake
size frequency statistics. Nat Geosci Lett. https://doi.org/10.1038/NGEO2796
2. Carreno E, Capote R, Yague A (2001) Observations of thermal anomaly associated to seismic
activity from remote sensing. General Assembly of European Seismology Commission,
Portugal, pp 265–269
3. Moustra M, Avraamides M, Christodoulou C (2011) Artificial neural networks for earthquake
prediction using time series magnitude data or seismic electric signals. Expert Syst Appl
38:15032–15039. https://doi.org/10.1016/j.eswa.2011.05.043
4. Asim KM, Idris A, Iqbal T, Martínez-Álvarez F (2018) Earthquake prediction model using
support vector regressor and hybrid neural networks. PLoS ONE 13(7):e0199004. https://doi.org/10.1371/journal.pone.0199004
5. Vardaan K, Bhandarkar T, Satish N, Sridhar S, Sivakumar R, Ghosh S (2019) Earthquake trend
prediction using long short-term memory RNN. Int J Electr Comput Eng (IJECE) 9(2):1304–1312.
ISSN: 2088-8708. https://doi.org/10.11591/ijece.v9i2.pp1304-1312
6. Boen T (2006) Structural damage in the March 2005 Nias-Simeulue earthquake. Earthq Spectra
22. https://doi.org/10.1193/1.2208147
7. Borrero J, McAdoo BG, Jaffe B, Dengler L, Gelfenbaum G, Higman B, Hidayat R, Moore A,
Kongko W, Lukijanto L, Peters R, Prasetya G, Titov V, Yulianto E (2005) Field survey of the
March 28, 2005 Nias-Simeulue earthquake and tsunami. Pure Appl Geophys 168:1075–1088.
https://doi.org/10.1007/s00024-010-0218-6
8. Florido E et al (2016) Earthquake magnitude prediction based on artificial neural networks: a
survey
9. Wang Q, Jackson DD, Kagan YY (2009) California earthquakes, 1800–2007: a unified catalog
with moment magnitudes, uncertainties, and focal mechanisms. Seismol Res Lett 80(3):446–457
10. Jing F, Shen X, Kang C (2012) Outgoing longwave radiation variability feature prior to the
Japan M9.0 earthquake on March 11, 2011. In: IEEE international geoscience and remote
sensing symposium, Munich, pp 1162–1165. https://doi.org/10.1109/IGARSS.2012.6351341
11. Natarajan V, Bobrovskiy V, Shopin S (2019) Satellite and ground-based observation of
pre-earthquake signals—a case study on the Central Italy region earthquakes. Indian J Phys
12. Bhatia AA, Pasari S, Mehta A (2018) Earthquake forecasting using artificial neural networks.
In: The international archives of the photogrammetry, remote sensing and spatial information
sciences, vol XLII-5
A Novel Method for Plant Leaf Disease
Classification Using Deep Learning
Techniques

R. Sangeetha and M. Mary Shanthi Rani

Abstract Agricultural productivity is one of the important sectors that influence the
Indian economy. One of the greatest challenges that affect agricultural productivity
is plant disease which is quite prevalent in almost all crops. Hence, plant disease
detection has become a hot research area to enhance agricultural productivity. Auto-
mated detection of plant diseases is hugely beneficial to farmers as it reduces the
manual workload of monitoring and detection of the symptoms of diseases at a very
early stage itself. In this work, an innovative method to categorize the tomato and
maize plant leaf diseases has been presented. The efficiency of the proposed method
has been analyzed with plant village dataset.

Keywords Agricultural productivity · Classification · Plant leaf disease

1 Introduction

Agriculture is one of the significant sectors that have a great influence on the
economy of developing countries. The main occupation of 60% of the rural populace is
agriculture, and the livelihood of farmers depends solely on their agricultural
productivity. The greatest challenge faced by farmers is the prevention and
treatment of plant diseases. Despite the hard and sustained efforts of farmers,
productivity is affected by crop diseases, which need to be addressed.
With the remarkable innovations in sensor and communication technologies, the
agricultural sector is becoming digital, with automated farm practices such as water
management, crop disease monitoring, pest control, and precision farming.
Classification and identification of plant diseases is one of the important
applications of machine learning.
Machine learning deals with the development of algorithms that perform tasks
mimicking human intelligence. It learns abstractions from data just like human

R. Sangeetha · M. Mary Shanthi Rani (B)


Department of Computer Science and Applications, The Gandhigram Rural Institute (Deemed To
Be University), Gandhigram, Dindigul, Tamil Nadu, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021 631


E. S. Gopi (ed.), Machine Learning, Deep Learning and Computational Intelligence
for Wireless Communication, Lecture Notes in Electrical Engineering 749,
https://doi.org/10.1007/978-981-16-0289-4_46

beings learn from experience and observations. Machine learning has become a
trending research area with a growing number of computer vision applications in the
fields of medicine, agriculture, remote sensing, forensics, law enforcement, etc.
Deep learning is a subset of machine learning which learns data associations using
deep neural networks [1]. Several researchers have explored the utilization of deep
neural networks in plant disease classification and detection. In this paper, a
convolutional neural network (CNN) model has been constructed for fast and accurate
categorization of diseases affecting tomato and maize leaves. As tomato is one of
the income-generating crops of farmers in rural South India, this work has been
developed to help them in the early detection of disease.
In this research work, a pre-trained CNN model, a smaller VGG16 net, was used to
classify the leaf diseases of various plants from the image dataset. The review of
the literature is presented in Sect. 2. Our proposed plant leaf disease
classification method is described in Sect. 3. The experimental setup and a
discussion of the results in terms of accuracy are presented in Sect. 4. The
conclusion and future enhancements are discussed in Sect. 5.

2 Literature Review

Kawasaki et al. [2] developed a new deep convolutional neural network method to
distinguish healthy cucumber leaves from infected ones. The CNN model was used to
identify two injurious viral infections: melon yellow spot virus (MYSV) and zucchini
yellow mosaic virus (ZYMV). The model's accuracy is 94.9% for cucumber leaf disease.
Mohanty et al. [3] described a CNN model to classify leaf diseases using three
variants of a plant leaf dataset: colored images, gray-scaled images, and segmented
leaves. Two standard architectures, AlexNet and GoogLeNet, are used for
classification. The highest accuracy is 0.9927 for AlexNet and 0.9934 for GoogLeNet,
obtained by transfer learning.
Sladojevic et al. [4] developed a deep convolutional neural network method to
classify plant diseases. Image transformations were used to increase the dataset
size. The accuracy of this model is 96.3% with fine-tuning and 95.8% without
fine-tuning.
Nachtigall et al. [5] discussed the application of CNNs to detect and categorize
images of apple trees, using AlexNet to categorize the diseases. They compared a
shallow technique, a multilayer perceptron, against the deep convolutional neural
network. The accuracy of the CNN model is 97.3% for apple tree leaves.
Brahimi et al. [6] described the convolutional neural network for classifying
tomato leaf disease. The tomato leaves are split into nine classes of diseases. The
method uses two standard architectures, AlexNet and GoogLeNet, with learning either
from scratch or by transfer learning. GoogLeNet improves the

accuracy from 97.71 to 99.18%, and AlexNet improves the accuracy from 97.35 to
98.66%.
DeChant et al. [7] applied a deep learning method to classify diseases of maize
plants. Three phases were suggested in this model: in the first phase, several
models were trained; in the second phase, a heat map was produced to indicate the
probability of infection in each image; and in the final phase, the heat map was
used to classify the image. The total accuracy for maize leaves was 96.7%.
Lu et al. [8] described applying a CNN model to classify rice leaf diseases. They
collected 500 images from the field to build a dataset. AlexNet was used to create a
rice disease classifier, with an overall accuracy of 95.48% for rice leaves.
Kulkarni et al. [9] discussed an artificial neural network (ANN) methodology for
plant disease detection and classification. A Gabor filter is used for feature
extraction, which gives better recognition results. An ANN classifier classifies the
various kinds of plant diseases and also identifies combinations of color and leaf
features.
Konstantinos et al. [10] described CNN models to detect both diseased and
non-diseased leaves. Several model architectures were trained, with the best
achieving a 99.53% success rate in disease identification.
Fujita et al. [11] applied a CNN classifier model to cucumber diseases. The dataset
consists of seven different classes, including a healthy class. The work is based on
the AlexNet architecture to classify cucumber diseases. The accuracy of the proposed
work was 82.3%.

3 Materials and Methods

The major objective of this research work is to effectively build a convolutional
neural network for the classification of tomato and maize leaf diseases. Tomato
leaves are affected by seven common diseases: target spot, mosaic virus, yellow leaf
curl virus, bacterial spot, early blight, late blight, and septoria leaf spot [12].
The three common diseases that affect maize leaves are northern leaf blight, brown
spot, and round spot [13].
A CNN is a category of artificial neural network well suited to object detection and
classification problems, specifically in computer vision; it is also referred to as
a ConvNet. It is a deep learning technique consisting of three kinds of layers: the
input layer, which is the starting node; the output layer, which is the ending node;
and the hidden layers present between the input and output layers, of which there
can be several. The hidden layers contain convolution layers, ReLU activations,
pooling layers (here, max pooling), and fully connected layers. The following steps
are carried out in the convolution layer:
• Divide the image into patches the size of the filter,
• Multiply each image patch element-wise by the corresponding filter weights,
• Add up the resulting products,
• Divide by the total number of pixels in the patch.
634 R. Sangeetha and M. Mary Shanthi Rani

The pooling layer is used to reduce the spatial dimensions of an image. Batch
normalization allows every layer of the network to learn somewhat more independently
of the other layers.
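As a sketch, batch normalization standardizes each feature over the minibatch and then applies a learnable scale and shift (the identity values of gamma and beta and the epsilon below are illustrative defaults, not values from the paper):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature of a batch to zero mean and unit variance,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 samples, 2 features
normalized = batch_norm(batch)
print(normalized.mean(axis=0))  # approximately [0, 0]
```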
The proposed work utilizes a smaller VGG16 deep CNN with thirteen layers for
classifying various kinds of diseases in tomato and maize leaves.
Rectified linear unit (ReLU) is an activation function that passes the positive part
of its input and maps negative values to zero. It is the most commonly used activation
function, as it learns faster than other functions and is computationally less intensive.
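The ReLU function itself is a one-liner; a NumPy sketch with illustrative inputs:

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged; replace negatives with zero."""
    return np.maximum(0.0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(out.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```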
Figures 1 and 2 display the workflow of the proposed model for tomato and maize leaves, respectively.
Detailed information regarding the classes and the number of images
used in each dataset is given in Tables 1 and 2.
Figure 3 shows a visual representation of the ten types of diseases along with
healthy leaves.
The proposed method involves the following three main stages:
1. Preprocessing
This step involves the selection and fine-tuning of the relevant dataset.
2. Training
This stage is the core of the deep learning process; it trains the CNN model
to categorize diseases using the preprocessed dataset.
3. Testing
The trained model is validated with the test dataset, and the accuracy of the
model is calculated in this stage.

Fig. 1 Flow diagram of our proposed work for tomato leaves


A Novel Method for Plant Leaf Disease Classification … 635

Fig. 2 Pictorial representation of our proposed work for maize leaves

Table 1 Dataset summary

Classes for tomato leaves   # Images
Early blight                1246
Bacterial spot              2127
Target spot                 1404
Septoria spot               1771
Yellow leaf curl virus      1963
Late blight                 1909
Mosaic virus                1246
Tomato healthy leaf         1591
Total                       13,257

3.1 Preprocessing

One of the most vital elements of any deep learning application is the dataset used
to train the model. In the proposed work, images are taken from the plant village dataset.
It consists of 13,257 images of tomato leaves and 3150 images of maize leaves,

Table 2 Dataset summary

Classes for maize leaves   # Images
Northern leaf blight       854
Brown spot                 857
Round spot                 855
Maize healthy leaf         584
Total                      3150

Fig. 3 a Healthy leaf. b Bacterial spot. c Early blight. d Late blight. e Mosaic virus. f Septoria leaf
spot. g Target spot. h Yellow leaf curl virus. i Northern leaf blight. j Brown spot. k Round spot

including both healthy and non-healthy leaves. The dataset is initially divided in
the ratio of 80:20 or 70:30 between the training phase and the test phase. The
accuracy of the network depends on the size and proportion of the data taken
for training and testing: overfitting results in a high test error, while
underfitting leads to high training and test errors.
In the proposed method, the dataset is divided 80:20. All the images are
resized to 256 × 256 as a preprocessing step, to reduce the time complexity of the
training phase.
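A sketch of this preprocessing step, assuming nearest-neighbour resizing and a seeded 80:20 shuffle (the array shapes and the seed are illustrative, not the authors' exact procedure):

```python
import numpy as np

def resize_nearest(img, size=256):
    """Nearest-neighbour resize of an H x W x C image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

def split_80_20(samples, seed=0):
    """Shuffle the samples and split them 80% train / 20% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(0.8 * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

img = np.zeros((300, 400, 3))            # placeholder leaf image
print(resize_nearest(img).shape)         # (256, 256, 3)
train_set, test_set = split_80_20(list(range(100)))
print(len(train_set), len(test_set))     # 80 20
```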

3.2 Training

In the training phase, the dataset is trained using the smaller VGG16 model with the
ReLU activation function. One important feature of ReLU is that it eliminates negative
values in the input by replacing them with zero. This model uses binary cross-entropy
rather than categorical cross-entropy.
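Binary cross-entropy averages the per-label log losses; a NumPy sketch (the clipping epsilon and the example predictions are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all labels and predictions."""
    p = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0])      # illustrative ground-truth labels
y_pred = np.array([0.9, 0.1, 0.8])      # illustrative predicted probabilities
loss = binary_cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.1446
```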
A Novel Method for Plant Leaf Disease Classification … 637

Fig. 4 Sample model for smaller VGG16 net architecture

3.2.1 Smaller VGG16 Net

Simonyan and Zisserman introduced the VGG network architecture. The proposed
model uses a pretrained smaller VGG16 net with thirteen convolution layers, each
followed by a ReLU layer. Max pooling is applied after some convolution layers to
trim down the dimensions of the image. Batch normalization helps the network learn
faster and achieve higher overall accuracy; both the ReLU activation function and
batch normalization are applied in all experiments. Dropout is a technique used to
reduce overfitting in the model during training. The softmax function is used in the
final layer of the deep learning-based classifier. The training phase using the VGG16
network is shown in Fig. 4.
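The paper does not enumerate the thirteen convolution layers, so the configuration below is only one plausible VGG16-style arrangement (filter counts and pooling positions are assumptions; per the text, each conv layer would be followed by ReLU and batch normalization):

```python
# Each tuple: (layer type, parameter). For "conv" the parameter is the
# number of filters; for "pool" it is the max-pooling window size.
SMALLER_VGG16 = [
    ("conv", 64), ("conv", 64), ("pool", 2),
    ("conv", 128), ("conv", 128), ("pool", 2),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool", 2),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool", 2),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool", 2),
]

def count_convs(spec):
    """Count the convolution layers in a layer specification."""
    return sum(1 for kind, _ in spec if kind == "conv")

n_conv = count_convs(SMALLER_VGG16)
print(n_conv)  # 13
```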

3.3 Testing

In this stage, the validation set is used to estimate the performance of the classifier
by predicting whether a leaf is healthy or unhealthy along with its disease name.
Fine-tuning helps to improve the classification accuracy by making small modifications
to the hyperparameters and increasing the number of layers.

4 Results and Discussion

The experimental results of our model VGG16 for the plant village dataset are given
in Table 3. It lists the classification accuracy of each of the seven diseases along with

Table 3 Classification accuracy of various tomato leaf diseases using smaller VGG16 net

No. of images  Bacterial spot  Early blight  Late blight  Septoria spot  Target spot  Yellow curl virus  Mosaic virus
400            72.49           68.24         63.29        60.35          40.86        78.97              61
728            84.86           91.59         82.29        75.86          80.75        96.76              91.4
953            90.26           94.86         93.12        85.45          91.5         85.82              78.65
1246           99.94           98.69         98.71        98.46          98.4         99.91              99.74

healthy leaves. Accuracy is defined as the number of correctly classified images
divided by the total number of images in the dataset. The dataset contains more than
1000 images under each disease class, with a maximum of 1246 images.
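The accuracy definition above translates directly into code (the class labels below are illustrative):

```python
def accuracy(predicted, actual):
    """Correctly classified images divided by total images, as a percentage."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

predicted = ["early_blight", "late_blight", "healthy", "mosaic_virus"]
actual = ["early_blight", "late_blight", "mosaic_virus", "mosaic_virus"]
acc = accuracy(predicted, actual)
print(acc)  # 75.0
```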
The graphical illustration of the accuracy of the model for tomato leaves is shown in
Fig. 5, and that for maize leaves in Fig. 6.
Table 5 presents the influence of batch size on the classification accuracy. It is
obvious from Table 5 that accuracy increases with minibatch size. It is also worth
noting that there is not much increase in accuracy for batch sizes 16, 32, and 64,
while there is a steep rise in accuracy from batch size 2 to 8.

Fig. 5 Classification accuracy for tomato leaf



Fig. 6 Classification accuracy for maize leaf

Table 4 Classification accuracy of various maize leaf diseases using smaller VGG16 net

No. of images  Northern leaf blight  Brown spot  Round spot
200            62.94                 68.24       63.29
400            74.52                 78.23       76.12
600            83.75                 84.91       83.08
850            95.17                 96.45       94.65

Table 5 also shows that our model achieves good accuracy, above 97%, at larger batch
sizes for early blight, yellow curl virus, and mosaic virus.
The graphical representation of Table 5 is shown in Fig. 7.

Table 5 Classification accuracy for various batch sizes

Batch size  Bacterial spot  Early blight  Late blight  Septoria spot  Target spot  Yellow curl virus  Mosaic virus
2           72.49           68.24         63.29        68.11          73.58        69.95              81
8           91.97           70.14         93.1         93.78          88.48        97.82              98.21
16          95.39           97.55         94.57        95.95          91.91        98.84              98.91
32          98.7            98.41         97.96        98.16          95.8         99.5               98.24
64          99.05           99            98.25        98.73          98.78        99.12              99.74

Fig. 7 Analysis of classification accuracy for various batch sizes

Table 6 presents the influence of batch size on the classification accuracy of maize
leaves. Table 6 clearly demonstrates that accuracy increases with minibatch size. It
is also worth noting that there is not much increase in accuracy for batch sizes 16,
32, and 64, while there is a steep rise in accuracy from batch size 2 to 8.
The graphical representation of Table 6 is shown in Fig. 8.
Figure 9 presents the outputs of the proposed model for test images of tomato and
maize leaf diseases using the smaller VGG16 net. It is observable from Fig. 9 that our
trained model has achieved 98% accuracy in classifying tomato leaf diseases and
maize leaf diseases.

Table 6 Classification accuracy for various batch sizes

Batch size  Northern leaf blight  Brown spot  Round spot
2           69.27                 67.87       65.45
8           89.62                 70.53       87.76
16          90.46                 92.47       90.35
32          93.65                 94.01       93.43
64          95.31                 94.56       94.86

Fig. 8 Analysis of classification accuracy for various batch sizes

5 Conclusion

In this paper, a smaller VGG16 net has been used to classify the diseases affecting
tomato and maize leaves using the plant village dataset. The model uses thirteen layers
instead of the 16 layers in VGG16. The results demonstrate that the model has
achieved 99.18% accuracy for tomato leaves and 94.91% for maize leaves. Classification
accuracy is evaluated with 13,257 images of healthy and unhealthy tomato leaves
and 3150 images of maize leaves. The performance of the model has been analyzed
for different minibatch sizes and numbers of tomato and maize images. This paper
focused on classifying the diseases in tomato and maize leaves; in the future, this
could be extended to classify diseases of other leaves as well.

Fig. 9 Testing accuracy comparison between tomato and maize leaf diseases

Acknowledgements The experiments are carried out at Advanced Image Processing Laboratory,
Department of Computer Science and Application, The Gandhigram Rural Institute (Deemed to be
University), Dindigul, and funded by DST-FIST.

References

1. Kalpana Devi M, Mary Shanthi Rani M (2020) A review on detection of diabetic retinopathy.
Int J Sci Technol Res 9(2). ISSN: 2277-8616
2. Kawasaki Y, Uga H, Kagiwada S, Iyatomi H (2015) Basic study of automated diagnosis of viral
plant diseases using convolutional neural networks. In: International symposium on visual
computing, pp 638–645. Springer, Cham
3. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease
detection. Front Plant Sci 7:1419
4. Sladojevic S, Arsenovic M, Anderla A, Culibrk D, Stefanovic D (2016) Deep neural networks
based recognition of plant diseases by leaf image classification. Comput Intell Neurosci
5. Nachtigall LG, Araujo RM, Nachtigall GR (2016) Classification of apple tree disorders using
convolutional neural networks. In: IEEE 28th international conference on tools with artificial
intelligence (ICTAI), pp 472–476
6. Brahimi M, Boukhalfa K, Moussaoui A (2017) Deep learning for tomato diseases: classification
and symptoms visualization. Appl Artif Intell 31:299–315
7. DeChant C, Wiesner-Hanks T, Chen S, Stewart EL, Yosinski J, Gore MA, Nelson RJ, Lipson
H (2017) Automated identification of northern leaf blight-infected maize plants from field
imagery using deep learning. Phytopathology 107:1426–1432
8. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep
convolutional neural networks. Neurocomputing 267:378–384
9. Kulkarni AH, Patil RK (2012) Applying image processing technique to detect
plant diseases. Int J Modern Eng Res 2(5):3661–3664
10. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput
Electron Agric 145:311–318
11. Fujita E, Kawasaki Y, Uga H, Kagiwada S, Iyatomi H (2016) Basic investigation on a robust and
practical plant diagnostic system. In: 15th IEEE international conference on machine learning
and applications, ICMLA, pp 989–992
12. Sangeetha R, Mary Shanthi Rani M (2019) Tomato leaf disease prediction using convolutional
neural network. Int J Innov Technol Explor Eng 9(1):1348–1352
13. Zhang X, Qiao Y, Meng F, Fan C, Zhang M (2018) Identification of maize leaf diseases using
improved deep convolutional neural networks. IEEE Access 6:30370–30377
