Machine Learning, Deep Learning and Computational Intelligence For Wireless Communication
E. S. Gopi Editor
Machine Learning,
Deep Learning and
Computational
Intelligence
for Wireless
Communication
Proceedings of MDCWC 2020
Lecture Notes in Electrical Engineering
Volume 749
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Naples, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán,
Mexico
Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore,
Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology,
Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Università di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid,
Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität
München, Munich, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA,
USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany
Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China
Junjie James Zhang, Charlotte, NC, USA
Yong Li, Hunan University, Changsha, Hunan, China
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering - quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact [email protected].
To submit a proposal or request further information, please contact the
Publishing Editor in your country:
China
Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor ([email protected])
USA, Canada:
Michael Luby, Senior Editor ([email protected])
All other Countries:
Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
I dedicate this book to my mother,
the late Mrs. E. S. Meena.
Preface
Because mobile and wireless networks make it feasible to collect huge volumes of data, there are many opportunities to apply machine learning, deep learning and computational intelligence to interpret the collected data and mine knowledge from it. The workshop aims to consolidate experimental results that integrate machine learning, deep learning and computational intelligence with wireless communication and related topics. This book consists of the reviewed papers grouped under the following topics: (a) machine learning, deep learning and computational intelligence algorithms, (b) wireless communication systems and (c) mobile data applications. I thank all those directly and indirectly involved in successfully executing the online event MDCWC 2020, held from 22 to 24 October 2020.
Thanks
Organization
Patron
Professor Mini Shaji Thomas, Director, NITT
Co-patron
Dr. Muthuchidambaranathan, Head of the ECE Department, NITT
Coordinator and Programme Chair
Dr. E. S. Gopi, Associate Professor, Department of ECE, NITT
Co-coordinators
Dr. B. Rebekka, Assistant Professor, Department of ECE, NITT
Dr. G. Thavasi Raja, Assistant Professor, Department of ECE, NITT
Session Chairs
Anand Kulkarni
Ashish
Gaurav Purohit
Gopi E. S.
Lakshmanan Nataraj
Maheswaran
Narasimhadhan A. V.
Rebekka B.
Sathyabama
Satyasai Jagannath Nanda
Shravan Kumar Bandari
Shyam Lal
Smrithi Agarwal
Sudhakar
Sudharson
Thavasi Raja G.
Referees
Anand Kulkarni
Aparna P.
Ashish Patil
Gangadharan G. R.
Gaurav Purohit
Janet Barnabas
Koushik Guha
Lakshmanan Nataraj
Lakshmi Sutha G.
Lalit Singh
Mahammad Shaik
Maheswaran Palani
Mandeep Singh
Murugan R.
Rebekka Balakrishnan
Sanjay Dhar Roy
Sankar Nair
Sathya Bama B.
Satyasai Jagannath Nanda
Shilpi Gupta
Shravan Kumar Bandari
Shrishail Hiremath
Shweta Shah
Shyam Lal
Smriti Agarwal
Sudakar Chauhan
Sudha Vaiyamalai
Sudharsan Parthasarathy
Swaminathan Ramabadran
Thavasi Raja G.
Umesh C. Pati
Varun P. Gopi
Venkata Narasimhadhan Adapa
Vineetha Yogesh
Dr. E. S. Gopi has authored eight books, of which seven have been published by
Springer. He has also contributed eight book chapters to books published by Springer.
He has several papers in international journals and conferences to his credit. He has
20 years of teaching and research experience. He is the coordinator for the pattern
recognition and the computational intelligence laboratory. He is currently Associate
Professor, Department of Electronics and Communication Engineering, National
Institute of Technology, Trichy, India. His books are widely used all over the world.
His book “Pattern Recognition and Computational Intelligence Using Matlab” (Springer) was recognized as one of the best ebooks in the “pattern recognition” and “Matlab” categories by BookAuthority, the world’s leading site for book recommendations by thought leaders. His research interests include machine intelligence, pattern recognition, signal processing and computational intelligence. He is the series editor for the Springer series “Signals and Communication Technology”. The India International Friendship Society (IFS) awarded him the “Shiksha Rattan Puraskar Award” for his meritorious services in the field of education. The award was presented by Dr. Bhishma Narain Singh, former Governor of Assam and Tamil Nadu, India. He was also awarded the “Glory of India Gold Medal” by the International Institute of Success Awareness, presented by Shri Syed Sibtey Razi, former Governor of Jharkhand, India. He also received the “Best Citizens of India 2013” award from The International Publishing House and the Lifetime Golden Achievement Award 2021 from Bharat Rattan Publishing House.
Machine Learning, Deep Learning
and Computational Intelligence Algorithms
Deep Learning to Predict the Number
of Antennas in a Massive MIMO Setup
Based on Channel Characteristics
Abstract Deep learning (DL) solutions learn patterns from data and exploit knowl-
edge gained in learning to generate optimum case-specific solutions that outperform
pre-defined generalized heuristics. With an increase in computational capabilities
and availability of data, such solutions are being adopted in a wide array of fields,
including wireless communications. Massive MIMO is expected to be a major cat-
alyst in enabling 5G wireless access technology. The fundamental requirement is
to equip base stations with arrays of many antennas, which are used to serve many
users simultaneously. Mutual orthogonality between user channels in multiple-input
multiple-output (MIMO) systems is highly desired to facilitate effective detection
of user signals sent during uplink. In this paper, we present potential deep learning
applications in massive MIMO networks. In theory, an infinite number of antennas
at the base station ensures mutual orthogonality between each user’s channel state
information (CSI). We propose the use of artificial neural networks (ANN) to predict the practical number of antennas required for mutual orthogonality given the
variances of the user channels. We then present an analysis to obtain the practical
value of antennas required for convergence of the signal-to-interference-noise ratio
(SINR) to its limiting value, for the case of perfect CSI. Further, we train a deep
learning model to predict the required number of antennas for the SINR to converge
to its limiting value, given the variances of the channels. We then extend the study
to show the convergence of SINR for the case of imperfect CSI.
1 Introduction
With an increasing demand for faster and more reliable data transmission, traditional
data transfer schemes may no longer be able to satisfy growing demands. Several
technologies have been developed to satisfy these resource-heavy requirements, but
massive MIMO [1] is among the leaders of this race. Massive MIMO has proved to be
both effective and efficient in its use of energy and spectrum, promising performance
boosts of 10–100 times over existing MIMO systems [2].
The concept of MIMO can be boiled down to transmitting and receiving multiple
signals over a single channel simultaneously. Although there is no prescribed crite-
rion to classify what a massive MIMO system is, generally, these systems tend to
utilize tens or even hundreds of antennas as opposed to three or four antennas in tra-
ditional MIMO systems. The key advantage of a massive MIMO system is that it can
bring up to a 50-fold increase in capacity without a significant increase in spectrum
requirement. Further, owing to large data rates, massive MIMO is expected to play
a major role in launching and sustaining 5G technology.
A typical massive MIMO system consists of several hundreds of antennas in the
base station and many users, interconnected to form a dense network (refer Fig. 1).
Equations (1) and (2) give the mathematical model governing a massive MIMO system [3].
$$Y = HX + N \quad (1)$$

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_M \end{bmatrix} =
\begin{bmatrix}
h_{11} & h_{12} & \cdots & h_{1N} \\
h_{21} & h_{22} & \cdots & h_{2N} \\
h_{31} & h_{32} & \cdots & h_{3N} \\
h_{41} & h_{42} & \cdots & h_{4N} \\
\vdots & \vdots & \ddots & \vdots \\
h_{M1} & h_{M2} & \cdots & h_{MN}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ \vdots \\ x_N \end{bmatrix} +
\begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \\ \vdots \\ n_M \end{bmatrix} \quad (2)$$
In massive MIMO, mutual orthogonality between each user’s channel state information (CSI) is fundamental to the task of message signal detection. This orthogonality between different channels relies on an underlying assumption of an infinite number of base station antennas [3]. However, in practical cases, this is not feasible.
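The system model in (1) and (2) can be sketched numerically. This is a minimal illustration, not the paper's simulation: the antenna count M, user count N, per-user variances, and BPSK symbols are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

M, N = 64, 4                 # illustrative: 64 base-station antennas, 4 users
beta = np.array([0.5, 1.0, 1.5, 2.0])   # one channel variance per user

# Channel matrix H: column k has i.i.d. CN(0, beta_k) entries
H = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) * np.sqrt(beta / 2)

x = rng.choice([-1.0, 1.0], size=N).astype(complex)  # user symbols (BPSK)
n = (rng.normal(size=M) + 1j * rng.normal(size=M)) * np.sqrt(0.5)  # CN(0, 1)

y = H @ x + n                # received vector, as in (1)/(2)
print(y.shape)
```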
Fig. 1 Schematic of a generalized massive MIMO system, where Y_i represents the ith antenna in the base station and X_i represents the ith user. The blue lines indicate the channel coefficients corresponding to user 1
We propose a deep learning (DL) model to predict the practical number of antennas to
be installed at the base station to ensure mutual orthogonality between user channels.
We use data generated through Monte Carlo simulation [5] to train our artificial neural
6 S. Chandra et al.
network (ANN) [6]. The model learns a precise mapping between the variances of
the user channel coefficients and the required number of antennas. It is found that
the model predicted values quickly and accurately, thereby allowing for potential
deployment in a massive MIMO system. For perfect channel state information (CSI),
we realize that the practical value of antennas required to drive the SINR to its
limiting value is well within the capabilities of a massive MIMO system. We use a
deep learning model to predict this value. This model is trained using data generated
through a Monte Carlo simulation. However, the more impactful observation is that
this is possible even for the imperfect CSI case, which is of more practical utility. A
similar DL approach can, therefore, be extended to predict the number of antennas
required to obtain the convergent value of SINR, for imperfect CSI.
3 Mutual Orthogonality
In this section, we analyse the trend in orthogonality between each user’s channel
state information (CSI) with the number of antennas at the base station (refer Fig. 2).
We show, using the Weak Law of Large Numbers [7], that as the number of antennas at the base station increases, the channel state information vectors corresponding to individual users become orthogonal to each other:
Fig. 2 The value depicted in (3) approaches zero with an increasing number of antennas. In these figures, β2 is fixed at 1 for each β1
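The trend in Fig. 2 follows from the weak law of large numbers: the magnitude of the normalized inner product |h1^H h2| / M between two independent channel vectors shrinks toward zero as M grows. A minimal Monte Carlo sketch (variances and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def normalized_inner_product(M, beta1, beta2):
    """|h1^H h2| / M for independent channels with variances beta1, beta2."""
    h1 = (rng.normal(size=M) + 1j * rng.normal(size=M)) * np.sqrt(beta1 / 2)
    h2 = (rng.normal(size=M) + 1j * rng.normal(size=M)) * np.sqrt(beta2 / 2)
    return abs(np.vdot(h1, h2)) / M

means = {}
for M in (10, 100, 1000, 10000):
    means[M] = float(np.mean([normalized_inner_product(M, 1.0, 1.0)
                              for _ in range(200)]))
    print(M, means[M])   # decreases roughly like 1/sqrt(M)
```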
Using the dataset generated in Sect. 3.1, we leverage the power of a deep learning
(DL) model whose objective is to predict the number of antennas required to safely
assume orthogonality between channel vectors of individual users.
We make use of an artificial neural network (ANN) that takes the variances of
any two channel vectors, as input, and predicts the number of antennas required to
ensure orthogonality between each user’s channel vector.
The network utilized for the purpose of this study (refer Fig. 3) consists of an input layer fed with the variances β1 and β2 and an output layer that gives the number of antennas, M, required to ensure orthogonality between two channels with characteristics β1 and β2. The leaky version of the rectified linear unit (ReLU) activation function is applied to each layer, and the learning rates are adapted by the Adam optimization algorithm. Batch gradient descent [8] is used to feed randomized batches of input variances through the network. The mean-squared error (MSE) [9] is used to compute the gradients and update the weights of the network.
The network successfully learns to map the relationship between the output (number of antennas, M) and the input (variances β1 and β2) (refer Fig. 6).
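The training loop described above can be sketched end to end with a tiny NumPy network using leaky ReLU activations, Adam updates, randomized batches and an MSE loss. The layer sizes, learning rate, and the synthetic (β1, β2) → M target below are illustrative stand-ins, not the paper's dataset or architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

# Synthetic stand-in for the (beta1, beta2) -> M dataset of Sect. 3.1
X = rng.uniform(0.1, 4.0, size=(512, 2))
y = (50.0 * (X[:, 0] + X[:, 1])).reshape(-1, 1)   # hypothetical target trend

# One hidden layer: 2 -> 16 -> 1
W1 = rng.normal(0.0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)
params = [W1, b1, W2, b2]

def predict(Xb):
    return leaky_relu(Xb @ W1 + b1) @ W2 + b2

mse_before = float(np.mean((predict(X) - y) ** 2))

# Adam state
m_t = [np.zeros_like(p) for p in params]
v_t = [np.zeros_like(p) for p in params]
lr, b1a, b2a, eps, t = 1e-2, 0.9, 0.999, 1e-8, 0

for epoch in range(300):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), 64):             # randomized mini-batches
        batch = idx[start:start + 64]
        Z1 = X[batch] @ W1 + b1
        A1 = leaky_relu(Z1)
        pred = A1 @ W2 + b2
        dpred = 2.0 * (pred - y[batch]) / len(batch)   # gradient of MSE
        dZ1 = (dpred @ W2.T) * leaky_relu_grad(Z1)
        grads = [X[batch].T @ dZ1, dZ1.sum(0), A1.T @ dpred, dpred.sum(0)]
        t += 1
        for i, (p, g) in enumerate(zip(params, grads)):
            m_t[i] = b1a * m_t[i] + (1 - b1a) * g       # Adam moment updates
            v_t[i] = b2a * v_t[i] + (1 - b2a) * g * g
            p -= lr * (m_t[i] / (1 - b1a ** t)) / (
                np.sqrt(v_t[i] / (1 - b2a ** t)) + eps)

mse_after = float(np.mean((predict(X) - y) ** 2))
print(mse_before, mse_after)
```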
Fig. 3 Model of the artificial neural network (ANN), used to predict the number of antennas
required
4 Perfect CSI
The process of detecting a user signal in massive MIMO relies on the multiplication of the received signal with the corresponding user’s channel vector. However, for this, the channel vector for each user must be determined, and this calculated value should not deviate considerably in time until the channel vectors are next recalculated.
To extract the channel vector values, we generally send pilot data consisting of an identity matrix to isolate each of the channel vectors individually. The assumption made here is that no noise corrupts the channel vectors during estimation. The noise arising from corrupted channel vectors is accounted for in Sect. 5, imperfect CSI.
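The pilot step just described can be sketched directly: with an identity-matrix pilot, each received column is one user's channel vector plus noise, giving the estimate ĥ = h + error used in Sect. 5. All sizes and variances here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 32, 4                              # illustrative sizes
beta = np.array([0.5, 1.0, 1.5, 2.0])

H = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) * np.sqrt(beta / 2)

# Pilot phase: users transmit the columns of the N x N identity matrix,
# so each received column exposes one channel vector corrupted by noise.
pilot = np.eye(N)
noise = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) * np.sqrt(0.5)
Y_pilot = H @ pilot + noise

H_hat = Y_pilot                           # h_hat = h + error
err_power = float(np.mean(np.abs(H_hat - H) ** 2))
print(err_power)                          # close to the noise variance (1.0)
```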
For simplicity, let us once again consider a system of four antennas at the base station and two users (as in Sect. 3). From the RHS of $h_1^H Y = h_1^H H X + h_1^H N$, we have $h_{11}^{*}(h_{11}x_1 + h_{12}x_2 + n_1) + \cdots + h_{41}^{*}(h_{41}x_1 + h_{42}x_2 + n_4)$. When we consider detection of the signal corresponding to user 1, it is understood that $[h_{11}\; h_{21}\; h_{31}\; h_{41}]$ are known and $[h_{12}\; h_{22}\; h_{32}\; h_{42}]$ are complex Gaussian random variables with zero mean and variance β2. Further, let us take the noise variance $E[|N|^2] = \sigma_N^2 = 1$ and the power allocated to each user as $P_u = E_u/M$. Hence, we have the theoretical limiting value of SINR [3],

$$\lim_{M\to\infty} \text{SINR} = \lim_{M\to\infty} \frac{\frac{E_u}{M}\,\|h_1\|^2}{\frac{E_u}{M}\sum_{i=2}^{N}\beta_i + \sigma_N^2} = \lim_{M\to\infty} \frac{E_u\,\|h_1\|^2}{M\,\sigma_N^2} = \frac{E_u\,\beta_1}{\sigma_N^2} \quad (4)$$
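The convergence in (4) can be checked with a quick Monte Carlo run, assuming the uplink power scaling P_u = E_u/M used in the limit; the parameter values match those used later in the study (E_u = 5, β1 = 0.1, Σ_{i=2}^N β_i = 0.5, σ_N² = 1):

```python
import numpy as np

rng = np.random.default_rng(4)
Eu, beta1, sum_beta, sigma2 = 5.0, 0.1, 0.5, 1.0

def mrc_sinr(M):
    """One draw of the pre-limit SINR expression in (4)."""
    h1 = (rng.normal(size=M) + 1j * rng.normal(size=M)) * np.sqrt(beta1 / 2)
    g = float(np.sum(np.abs(h1) ** 2))     # ||h1||^2, concentrates at M*beta1
    return (Eu / M * g) / (Eu / M * sum_beta + sigma2)

limit = Eu * beta1 / sigma2                # = 0.5 here
est = {}
for M in (10, 100, 1000):
    est[M] = float(np.mean([mrc_sinr(M) for _ in range(500)]))
    print(M, est[M], limit)
```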
Fig. 4 Plot of calculated SINR and expected SINR value versus the number of antennas at the base station. For (a), β1 = 0.1 and Σ_{i=2}^N β_i = 0.5; for (b), β1 = 1.5 and Σ_{i=2}^N β_i = 8
The theoretical limiting value of SINR, shown above, holds under the assumption of an infinite number of base station antennas. However, it is unclear to what extent the above equations hold for practical numbers of base station antennas. Hence, a study was conducted to analyse how the SINR of the received signal varies with the number of base station antennas (refer Fig. 4). Note that, for the purpose of this study, we have taken the noise variance $\sigma_N^2 = 1$.
6: E_user = 1
7: for β1 = 0.05, 0.1, . . . , β1_max do
8:   for Σβ = β1 + 0.05, β1 + 0.1, . . . , Σβ_max do   (Σβ denotes Σ_{i=2}^N β_i)
9:     M_avg = 0
10:    SINR_expected = β1 × E_user
11:    for trial = 1, 2, . . . , Trials do
12:      for m = 1, 2, . . . , M_max do
13:        H_1 = matrix of dimensions m × 1 with values drawn from a complex Gaussian distribution with mean 0 and variance β1
14:        Num = ‖H_1‖² × E_user / m
15:        Den = Σβ × E_user / m + 1
16:        SINR = Num / Den
17:        if |SINR − SINR_expected| < Threshold then
18:          M_avg = M_avg + m
19:          break
20:        end if
21:      end for
22:    end for
23:    Add β1 and Σβ to the training-data input list
24:    Add M_avg / Trials to the training-data output list
25:  end for
26: end for
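Procedure 2 translates directly into Python. The `trials`, `M_max`, and `threshold` settings below are illustrative, not the values used to generate the paper's dataset:

```python
import numpy as np

rng = np.random.default_rng(5)

def antennas_for_convergence(beta1, sum_beta, E_user=1.0, threshold=0.05,
                             trials=20, M_max=2000):
    """Average over trials of the smallest m at which the simulated SINR
    falls within `threshold` of its limiting value (sigma^2 = 1)."""
    sinr_expected = beta1 * E_user
    m_total = 0
    for _ in range(trials):
        for m in range(1, M_max + 1):
            h1 = (rng.normal(size=m) + 1j * rng.normal(size=m)) * np.sqrt(beta1 / 2)
            num = float(np.sum(np.abs(h1) ** 2)) * E_user / m
            den = sum_beta * E_user / m + 1.0
            if abs(num / den - sinr_expected) < threshold:
                m_total += m
                break
    return m_total / trials

m_needed = antennas_for_convergence(0.1, 0.5)
print(m_needed)
```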
The results in Fig. 4 were obtained by fixing E_u = 5 and noise variance σ_N² = 1. From Fig. 4a, we can infer that with just 127 antennas at the base station, the received signal SINR converges to the expected value within a threshold of 0.01. Similarly, from Fig. 4b, we see that with 1441 antennas and 3499 antennas at the base station, convergence occurs within thresholds of 0.3 and 0.1, respectively. This high number of antennas, however, corresponds to a large cumulative value of user variances (β1 = 1.5 and Σ_{i=2}^N β_i = 8).
Having demonstrated the existence of a potential trend, we once again employ a Monte Carlo-based approach to simulate practical systems and uncover the relationship between the variances of the channel vectors (β1 and Σ_{i=2}^N β_i) and the number of antennas at the base station (M). We note that the convergent SINR value depends only on the sum of the variances of the other channels and not on the individual variances themselves. Hence, by varying this sum over a range of values, we mimic multiple user scenarios in a practical massive MIMO system. Following the steps shown in Procedure 2, we calculate expected SINR values and compare them with the expected SINR at infinity, which is E_u β1 (when we consider the signal corresponding to user 1).
Using the data generation method described in Sect. 4.1, we create a dataset that maps the variances of the channels to the number of antennas required for the SINR to converge to its limiting value. We use this dataset to train an artificial neural network (ANN) to learn the mapping, with a threshold of 0.01 for training.
For the purpose of this study, we use a network that consists of an input layer fed with the variances β1 and Σ_{i=2}^N β_i, where β1 is the variance of the channel whose SINR we are interested in predicting and Σ_{i=2}^N β_i is the sum of the variances of all other channels. The output layer predicts the minimum number of antennas, M, required to ensure that the SINR of the channel with variance β1 is within a threshold of its convergent value. The leaky version of the rectified linear unit (ReLU) activation function is applied to each layer, and the learning rates are adapted by the Adam optimization algorithm. Batch gradient descent is used to feed randomized batches of input variances through the network. The mean-squared error (MSE) [9] is used to compute the gradients and update the weights of the network.
Section 4 deals with the ideal case, where the observed channel information is not corrupted by noise. However, in reality, this is not the case. As in the previous case, the channel vector values are determined using the “pilot” data sent across the channel. However, the observed channel vector ĥ has an added noise component, i.e. ĥ = h + error. As a result, the theoretical value of SINR for imperfect CSI becomes (6).
$$\text{SINR} = \frac{P_u\,\|\hat{h}_1\|^4}{\frac{P_u}{P_p}\,\|\hat{h}_1\|^2 + P_u\,\|\hat{h}_1\|^2 \sum_{i=2}^{N} \beta_i + \|\hat{h}_1\|^2} \quad (5)$$

$$\Rightarrow \lim_{M\to\infty}\text{SINR} = N\,E_u^2\,\beta_1^2 \quad (6)$$
As shown in (6), the SINR converges to a constant value as the number of antennas at the base station tends to infinity. However, we proceed to study at what realistic number of base station antennas this expected SINR is attained.
The steps utilized for this purpose are similar to those used for perfect CSI and are given as Procedure 3; the only difference is the formula used for calculating the expected SINR. Following the steps shown in Procedure 3, we calculate expected SINR values and compare them with the expected SINR as the number of antennas tends to infinity.
6 Results
Figure 5 indicates that for an average variance of β1 = 0.5 and β2 = 0.5, the practical number of antennas required for orthogonality is 125. The trend also indicates that the number of antennas required to guarantee a margin of error of 0.1% is feasible only up to β1 = 3.26 and β2 = 3.81.
On the basis of Fig. 5, we can confirm a conclusive trend between the number of antennas at the base station (M) and the variances of the two user channels (β1, β2) required for the CSI dot product to fall below a threshold of 0.01. It can also be seen from Fig. 5b that the change in the number of antennas, M, is equally sensitive to changes in both β1 and β2. As the variances of the two user channels increase, the required number of antennas generally increases.
Fig. 5 Figures a and b are two-dimensional and three-dimensional representations of the plot between β1, β2 and the number of antennas, M. A well-defined, directly proportional relationship can be observed between the independent features, β1 and β2, and the dependent variable, the number of antennas, M
The artificial neural network (ANN) approach was able to predict the required number of antennas within a margin of error of about 20 antennas. This number is not significant from the perspective of massive MIMO. Further, it is apparent that this error can be reduced, given more data for the network to train on.
It is evident from Fig. 6 that the network is able to learn the relationship governing
the variances and the number of required antennas. However, an important point to
be noted is that the above study is confined to two users. Hence, in the practical
scenario, the required number of antennas must be calculated for all pairs of users
and the maximum value obtained, from all pairs, must be considered.
In Sect. 4.1, pertaining to perfect CSI, we make use of Procedure 2 to generate data for the number of antennas required for SINR convergence to the expected value. Here, the parameters β1 and Σ_{i=2}^N β_i are varied and the corresponding required number of antennas is recorded. The generated values resemble a triangular plane with a well-defined inclination angle.
Figure 7a highlights the dependency of the required number of antennas on β1. As can be observed, the trend is almost linearly increasing for fixed values of Σ_{i=2}^N β_i. Similarly, from Fig. 7b, on analysing the variation of the required number of antennas with Σ_{i=2}^N β_i, we once again observe a significantly proportional trend. However, from Fig. 7c, we can see that the increase in the number of antennas is steeper for β1 than for Σ_{i=2}^N β_i. From Fig. 7d, we can infer that there is a general increase in the required number of antennas with increasing β1 and Σ_{i=2}^N β_i values.
Fig. 7 Three-dimensional plot between β1, Σ_{i=2}^N β_i and the number of antennas, M. Different viewing angles of the plot are shown in a–d. A direct dependence can be observed between β1, Σ_{i=2}^N β_i and the number of antennas, M
Further, we note that on fixing a representative value of β1 = 2 and Σ_{i=2}^N β_i = 10, the required number of antennas is 115. Analysing the extreme case, by fixing β1 = 6 and Σ_{i=2}^N β_i = 20, we get a required number of antennas of 330. From this, we can infer that these values are within the implementation capabilities of a practical massive MIMO system. Hence, using this data to train a deep learning (DL) model to predict the number of antennas is of high practical use.
Utilizing the deep learning (DL) model proposed in Sect. 4.2, the required number of antennas is estimated for given values of β1 and Σ_{i=2}^N β_i. The predicted number of antennas is plotted against the true required number. The estimate, on average, is within 19 antennas of the required number, as can be inferred from the calculated mean-squared error (MSE = 368.64688, so the root-mean-squared error is about 19.2 antennas).
As can be seen in Fig. 8, the network is able to learn the trend in the input data effectively. The performance can be improved by using larger datasets for training.
Similar to perfect CSI, the SINR is calculated using a Monte Carlo approach and
averaged over a number of trials. The resulting SINR is plotted alongside the expected
SINR, as shown in Fig. 9.
These results were obtained by fixing E_u = 5, β1 = 0.1, Σ_{i=2}^N β_i = 0.3 and noise variance σ_N² = 1. The study indicates that with a threshold of (i) 0.1, we require 411 antennas; (ii) 0.05, we require 4516 antennas; and (iii) 0.01, we require 18,585 antennas for the received SINR to converge to the expected value. Hence, according to the accuracy required, these values can be used as lower bounds.
7 Conclusions
Finding the number of antennas to be installed at the base station for effective and accurate detection of received signals is often a challenging task. To solve this issue, we use a deep learning model to obtain the number of antennas required, given the maximum practically possible pair of input variances. Once at least the predicted number of antennas is available at the base station, the channel state information (CSI) vectors of the users are approximately orthogonal. Further, the latency of the neural network is small enough to handle dynamic CSI, so we can activate the required number of antennas accordingly. The signal-to-interference-noise ratio (SINR), in the case of perfect channel state information (CSI), converges to a constant value as the number of antennas M → ∞. This signifies that even though the power allocated to each user P_u → 0, the SINR does not scale down to 0. We find that for practical values of channel variances the SINR converges, and the number of antennas required to ensure this convergence is around 200–300. This is followed by using a deep learning model to predict the required number of antennas for the convergence of SINR to the expected values. Similarly, for imperfect channel state information, the number of required antennas increases to around 450. These observations have potential applications in realizing the limiting value of SINR in massive MIMO systems, and they invite deep learning (DL) solutions to predict the number of antennas required at the base station to ensure convergence of SINR in the case of imperfect CSI.
References
7. Weisstein EW. Weak law of large numbers. From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/WeakLawofLargeNumbers.html
8. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv:1609.04747
9. Schluchter MD (2014) Mean square error. In: Wiley StatsRef: statistics reference online. Wiley, New York
Optimal Design of Fractional Order PID
Controller for AVR System Using Black
Widow Optimization (BWO) Algorithm
1 Introduction
Electrical power generation systems are responsible for producing electricity from various natural resources. These systems incorporate generators that convert mechanical energy into electrical energy. During the conversion process, the systems tend to oscillate about the equilibrium state because of vibrations in the moving parts, load variations, and various external disturbances. To overcome this, the synchronous generators are often fed with the help of exciters. The exciters control the input to the generators so as to hold the output voltage at a stable level. In this process, AVR systems are used in the control loop to maintain a stable signal level at the input of the exciter so that the generator maintains a constant output voltage at its terminals.
Various control techniques have been proposed, such as optimal control, robust control, H∞/H2 control, and predictive control. All these control strategies use a proportional integral derivative (PID) controller as a basic control element. Modern industrial controllers still use the PID controller because of its simple structure, ease of understanding its operation, and, importantly, its robustness under different operating conditions. Nowadays, increasing effort is devoted to improving the performance of PID controllers using evolving mathematical concepts. One such technique is the use of fractional order calculus to design PID controllers. This method adds additional parameters, the order of integration (λ) and the order of differentiation (μ), to the PID controller; such controllers are called fractional order PID (FOPID) controllers. For the optimal design of a FOPID controller, the parameters proportional gain (K_p), integral gain (K_i), differential gain (K_d), λ, and μ should be carefully tuned. These extra parameters give FOPID controllers additional advantages [1]. In the past, FOPID controllers have been used in many applications, including speed control of a DC motor [2], control of a servo press system [3], control of water flow in an irrigation canal [4], a boost converter [5], control of a flight vehicle [6], and temperature control [7].
Various objective functions were utilized in the literature to tune the parameters
of FOPID/PID controllers. The basic cost function, integral of absolute error (IAE)
was used in [8] along with an improved artificial bee colony algorithm (ABC) for
tuning FOPID controllers. FOPID controller design using genetic algorithm was
discussed in [9]. FOPID controller tuning problem was solved using particle swarm
optimization algorithm (PSO) [10]. Another cost function called integrated squared
error (ISE) was utilized in the tuning of FOPID controllers [11]. Minimization of
ITSE was used to identify the best-tuned parameters of FOPID controller [12]. Zwee
Lee Gaing proposed an objective function [13] for the optimum design of FOPID
controller and this function was used for PID controller tuning [14, 15]. In addition to
the techniques mentioned, a variety of objective functions were created by combining
ITAE, IAE, ISE, and ITSE with weighted combinations of settling time, rise time,
steady-state error, and overshoot [12, 16, 17]. Chaotic map-based algorithms were
also used in the literature [14, 18, 19]. The advantage of chaotic maps is that they
improve the performance of existing algorithms, which in turn better optimizes the objective
functions. Along with these techniques, authors have used multi-objective optimiza-
tion [20, 21] by combining more than one objective function. Here, a set of Pareto
solutions is generated, from which a suitable solution is identified. Unknown
parameters of FOPID controller were identified using salp swarm optimization [22],
cuckoo search optimization algorithm [23]. A brief comparison of various optimiza-
tion algorithms for AVR system controllers was discussed in [24]. The design of
FOPID controllers for the AVR system using various optimization algorithms was
discussed in [25–30]. A frequency-domain approach for optimal tuning of FOPID
controller was discussed by [31, 32].
Optimal Design of Fractional Order PID Controller … 21
The paper is organized as follows. Section 2 discusses the operation of the AVR
system and the analysis of the system response parameters. A brief overview
of fractional differ-integrals and FOPID controllers is given in Sect. 3. A brief
description of the black widow optimization algorithm and its working methodology
is given in Sect. 4. The tuning of the FOPID controller parameters using
the BWO algorithm [33] is described in Sect. 5. The performance of the BWO-
FOPID controller is compared with other optimization-based FOPID controllers,
and a robustness analysis for the proposed controller is also presented in Sect. 6.
The transfer function of the amplifier was represented as

G_a(s) = K_a / (1 + s τ_a)    (1)

where K_a is the amplifier gain, which has values in the range [100, 400], and τ_a is the
time constant of the amplifier, which lies in the range [0.02, 0.1].
The transfer function of the exciter was represented as
G_e(s) = K_e / (1 + s τ_e)    (2)

where K_e is the exciter gain, which has values in the range [10, 400], and τ_e is the time
constant of the exciter, which lies in the range [0.5, 1.0].
The transfer function of the generator was represented as
G_g(s) = K_g / (1 + s τ_g)    (3)

where K_g is the generator gain, which has values in the range [0.7, 1.0], and τ_g is the
time constant of the generator, which lies in the range [1.0, 2.0].
The transfer function of the sensor was represented as
G_s(s) = K_s / (1 + s τ_s)    (4)

where K_s is the sensor gain, which has values in the range [1.0, 2.0], and τ_s is the time
constant of the sensor, which lies in the range [0.001, 0.06].
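The closed loop formed by Eqs. (1)-(4), with the sensor output fed back to the amplifier input, can be simulated directly. The following is a minimal sketch using forward-Euler integration; the gains and time constants are illustrative values from the classic AVR benchmark (e.g. K_a = 10) and need not match this chapter's settings.

```python
# Closed-loop AVR step response by forward-Euler integration of the four
# first-order blocks in Eqs. (1)-(4). Parameter values are illustrative only.

def avr_terminal_voltage(Ka=10.0, ta=0.1, Ke=1.0, te=0.5, Kg=1.0, tg=1.0,
                         Ks=1.0, ts=0.01, vref=1.0, T=30.0, dt=1e-4):
    xa = xe = xg = xs = 0.0        # amplifier, exciter, generator, sensor states
    for _ in range(int(T / dt)):
        e = vref - xs              # error fed to the amplifier
        xa += dt * (Ka * e - xa) / ta
        xe += dt * (Ke * xa - xe) / te
        xg += dt * (Kg * xe - xg) / tg
        xs += dt * (Ks * xg - xs) / ts
    return xg                      # terminal voltage V_t at t = T

vt = avr_terminal_voltage()
# without a controller the loop settles at Ka*Ke*Kg / (1 + Ka*Ke*Kg*Ks) = 10/11
```

This steady-state offset from the reference (10/11 instead of 1) is exactly what the controller in the feedback loop is meant to remove.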
To understand the behavior and dynamics of the system, a step response is plotted
in Fig. 2 and its key performance parameters are identified. Table 1 shows the variation
of the key parameters with a change in K_g value. Since the terminal voltage varies with
load changes, different values of K_g in the range [0.7, 1.0] were considered for the step
response.

Fig. 2 Step responses of the AVR system for K_g = 0.7, 0.8, 0.9, and 1.0

Fig. 3 Bode plots of the AVR system for K_g = 0.7, 0.8, 0.9, and 1.0

The gain margin and phase margin were calculated from the Bode plot shown in Fig. 3.
Fractional calculus deals with the evaluation of real-order integro-differential equations.
Here, integration and differentiation are denoted by a common differ-integral
operator (D-operator) aD_t^α, where a and t represent the limits of integration and α is the
order of differentiation.
24 V. K. Munagala and R. K. Jatoth
aD_t^α = d^α/dt^α,           R(α) > 0
       = 1,                  R(α) = 0     (5)
       = ∫_a^t (dτ)^(−α),    R(α) < 0
aD_t^α f(t) = lim_{h→0} (1/h^α) Σ_{k=0}^{[(t−a)/h]} (−1)^k C(α, k) f(t − kh)    (6)
aD_t^α f(t) = (1/Γ(n − α)) (d^n/dt^n) ∫_a^t f(τ)/(t − τ)^(α−n+1) dτ    (7)
where D^ξ ≡ 0D_t^ξ, n ∈ {0, 1, 2, …, k}, m ∈ {0, 1, 2, …, k}, and α_k, β_k (k = n, n − 1, …,
0) are arbitrary real numbers.
Eq. (8) can also be represented in a more standard form as

aD_t^q x(t) = A x(t) + B u(t)    (9)

where u ∈ R^r, x ∈ R^n, and y ∈ R^p are the input signal, state, and output signal of the
fractional order system, A ∈ R^(n×n), B ∈ R^(n×r), C ∈ R^(p×n), and q represents the fractional
commensurate order.
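The Grünwald-Letnikov sum in Eq. (6) can be evaluated numerically for finite step size h. A minimal sketch follows (the test function and step size are illustrative); the binomial weights (−1)^k C(α, k) are generated with the standard recurrence w_0 = 1, w_k = w_{k−1}(1 − (α + 1)/k).

```python
# Numerical Grünwald-Letnikov approximation of aDt^α f(t) from Eq. (6).

def gl_derivative(f, t, alpha, h=1e-3, a=0.0):
    n = int((t - a) / h)              # number of terms [(t - a)/h]
    w, acc = 1.0, f(t)                # k = 0 term, weight w_0 = 1
    for k in range(1, n + 1):
        w *= 1.0 - (alpha + 1.0) / k  # update (-1)^k C(alpha, k)
        acc += w * f(t - k * h)
    return acc / h**alpha

# Half derivative of f(t) = t at t = 1 is t^{1/2} / Γ(3/2) ≈ 1.1284
d_half = gl_derivative(lambda t: t, 1.0, 0.5)
```

The first-order accuracy in h of this scheme is usually sufficient for simulation; production implementations truncate the memory (short-memory principle) to keep the sum cheap.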
Fig. 4 Integer-order controllers as special cases of the FOPID controller in the (λ, μ) plane: P (λ = 0, μ = 0), PI (λ = 1, μ = 0), PD (λ = 0, μ = 1), and PID (λ = 1, μ = 1)

The relations between these controllers in the (λ, μ) plane were shown in Fig. 4. From
the figure, it is observed that all other integer-order controllers are special cases of the
FOPID controller.
The generalized equation representing the fractional PID controller was given by

G_C(s) = U(s)/E(s) = K_p + K_i/s^λ + K_d s^μ    (11)

U(s) represents the output and E(s) represents the input of the controller. Generally,
the input of any controller is the difference between the desired signal and the
response signal. Correspondingly, in the time domain, the equation can be represented
as

u(t) = K_p e(t) + K_i D_t^(−λ) e(t) + K_d D_t^(μ) e(t)    (12)

Therefore, from Eqs. (11) and (12), the real orders λ and μ make G_C(s) an infinite-
order filter because of the real differentiation and integration.
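A discrete version of the control law in Eq. (12) can be sketched with the same Grünwald-Letnikov weight recurrence, using order −λ for the fractional integral and μ for the fractional derivative. The gains, orders, and step size below are illustrative values, not tuned parameters from the paper.

```python
# Discrete FOPID output from an error history, following Eq. (12).

def gl_weights(alpha, n):
    """Grünwald-Letnikov weights (-1)^k C(alpha, k) for k = 0..n."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def fopid_output(err, h, Kp, Ki, Kd, lam, mu):
    """Control signal u at the last sample of the error history `err`."""
    n = len(err) - 1
    wi = gl_weights(-lam, n)          # weights for D^{-lambda} (integral)
    wd = gl_weights(mu, n)            # weights for D^{mu} (derivative)
    integ = h**lam * sum(wi[k] * err[n - k] for k in range(n + 1))
    deriv = h**-mu * sum(wd[k] * err[n - k] for k in range(n + 1))
    return Kp * err[n] + Ki * integ + Kd * deriv

# With lam = mu = 1 this collapses to an ordinary PID: for a unit-step error
# held for 1 s, u is approximately Kp + Ki * 1.
h = 1e-3
err = [1.0] * (int(1.0 / h) + 1)
u = fopid_output(err, h, Kp=2.0, Ki=3.0, Kd=1.0, lam=1.0, mu=1.0)
```

Setting non-integer λ and μ in the same call is what gives the FOPID its extra tuning freedom.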
The algorithm was developed by Hayyolalam et al. [33] based on the black widow spider
lifestyle. Generally, the female spiders are more dominant than the male spiders and are
mostly active at night. Whenever female spiders want to mate, they put pheromone on
the web and males are attracted to it. After mating, the female spider consumes the
male spider. The female spiders then lay eggs as shown in Fig. 5, and the eggs mature
in 8–11 days. The hatched black widow spiderlings exhibit sibling cannibalism because
of competition and lack of food. In some special cases, the spiderlings also consume
the mother slowly. Because of this, only strong black widow spiders survive their life
cycle.
In the algorithm, a candidate solution is called a widow and is modeled as a black
widow spider. Therefore, for an n-dimensional problem, the widow consists of n
elements.
4.2 Procreate
To produce the next generation, the male and female spiders mate in their
corresponding webs. To implement this, a matrix with the same length as the
widow matrix, called the alpha matrix, is used, whose elements are random numbers.
The corresponding equations are shown in (15) and (16), where x_1 and x_2 represent
the parents and y_1 and y_2 represent the children.

y_1 = α × x_1 + (1 − α) × x_2    (15)

y_2 = α × x_2 + (1 − α) × x_1    (16)
The entire process is repeated n/2 times, and lastly the parents and spiderlings
are combined and sorted according to their fitness values. The number of parents
participating in procreation is decided by the procreation rate (PR).
4.3 Cannibalism
The black widow spiders exhibit three types of cannibalism. In sexual cannibalism,
after mating, the female spider consumes the male spider. In sibling cannibalism,
stronger spiderlings eat weaker spiderlings. In the third type, sometimes spiderlings
eat their mother. In the algorithm, this behavior was implemented as a selection of
the population according to their fitness values. The population was selected based
on cannibalism rating (CR).
4.4 Mutation
From the population, the Mutepop number of spiders is selected and mutation is
applied for any two randomly selected positions for each spider. The process of
mutation was shown in Fig. 7. The number of spiders to be mutated (Mutepop) is
selected according to the mutation rate (PM).
In the algorithm, the procreation rate was chosen as 0.6, the cannibalism rate was
chosen as 0.44, and the mutation rate was taken as 0.4. The complete flow of the
black widow optimization algorithm was shown in Fig. 6.
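The procreation (Eqs. (15)-(16)), cannibalism, and mutation stages described above can be sketched as follows. The cost function, population size, and iteration count are illustrative; the rates follow the text (PR = 0.6, CR = 0.44, PM = 0.4).

```python
import random

def bwo(cost, dim, bounds, pop_size=20, iters=60, pr=0.6, cr=0.44, pm=0.4):
    lo, hi = bounds
    rand_widow = lambda: [random.uniform(lo, hi) for _ in range(dim)]
    pop = sorted((rand_widow() for _ in range(pop_size)), key=cost)
    for _ in range(iters):
        parents = pop[:max(2, int(pr * pop_size))]       # procreation rate PR
        children = []
        for _ in range(pop_size // 2):
            x1, x2 = random.sample(parents, 2)
            a = random.random()                          # alpha entry
            children.append([a*u + (1-a)*v for u, v in zip(x1, x2)])  # Eq. (15)
            children.append([a*v + (1-a)*u for u, v in zip(x1, x2)])  # Eq. (16)
        # cannibalism: keep only the fitter part of parents + spiderlings
        pool = sorted(parents + children, key=cost)
        pop = pool[:max(2, int((1 - cr) * len(pool)))]
        # mutation: reset two random positions of some widows (best is spared)
        for w in random.sample(pop[1:], max(1, int(pm * len(pop)))):
            for i in random.sample(range(dim), min(2, dim)):
                w[i] = random.uniform(lo, hi)
        pop = sorted(pop, key=cost)
        while len(pop) < pop_size:                       # refill population
            pop.append(rand_widow())
    return min(pop, key=cost)

random.seed(7)
best = bwo(lambda x: sum(v * v for v in x), dim=5, bounds=(-3.0, 3.0))
```

Sparing the best widow from mutation keeps the best cost monotonically non-increasing, a small elitism detail that the verbal description leaves open.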
The BWO-FOPID controller provides two additional degrees of freedom, which allows
designing a robust controller for a given application. The process of tuning the FOPID
controller using the BWO algorithm is shown as a block diagram in Fig. 8.
V_ref(s) is the reference voltage that should be maintained by the AVR system at
its terminals. V_t(s) is the actual terminal voltage produced by the system. V_e(s) is
the error voltage, i.e., the difference between V_ref(s) and V_t(s). For each iteration
of the BWO algorithm, a population of K_p, K_i, K_d, λ, and μ values is generated and
substituted into the objective function. The FOPID controller takes V_e(s) as the input
signal and produces the corresponding control signal. For this signal, the AVR system
terminal voltage and error are calculated. The process is repeated until the termination
criteria are met using the method mentioned in Sect. 2. Finally, the best values of the
parameters will be identified and are used to design the optimum FOPID controller.

Fig. 6 Flowchart of the black widow optimization algorithm: generate the population using a logistic map, evaluate the fitness of individuals, procreate, apply cannibalism and mutation, and update the population until the stop condition is met
The designed controller was inserted into the system. The controller output is
given as input to the AVR system, which produces the corresponding terminal voltage.
The terminal voltage is again compared with the reference voltage and the error
signal is produced. The process is repeated until the error signal becomes zero. When
the desired level is reached, the controller produces a constant U(s) to hold the
output at the terminal voltage level.
Fig. 8 Block diagram of FOPID controller parameter tuning for the AVR system using the ZLG objective function J
During the FOPID controller design, the ZLG objective function was used to tune the
parameters. Although various standard objective functions like IAE, ISE,
ITAE, and ITSE are available, the ZLG function produces better results.
The equation for the ZLG optimization function [13] was given in Eq. (17).
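Eq. (17) did not survive extraction; for the reader's convenience, the ZLG criterion as given in the cited reference [13] (Gaing's paper) is usually written as below, where β is a weighting factor, M_p the overshoot, E_ss the steady-state error, t_s the settling time, and t_r the rise time. This is reproduced from the citation, not recovered from this chapter.

```latex
W(K) = \left(1 - e^{-\beta}\right)\left(M_p + E_{ss}\right) + e^{-\beta}\left(t_s - t_r\right)
```

Larger β emphasizes the time-domain terms (t_s − t_r); smaller β emphasizes overshoot and steady-state error.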
Fig. 9 Convergence curve of the BWO algorithm (cost value versus iterations; best cost value 4.307 reached at iteration 35)
Table 2 Range of K_p, K_i, K_d, λ, and μ

Parameter   Lower value   Upper value
K_p         0.1           3
K_i         0.1           1
K_d         0.1           0.5
λ           0.5           1.5
μ           0.5           1.5
using the step response displayed in Fig. 10. From the figure, it can be observed that
the BWO-FOPID controller produces a lower overshoot than the others. To
further investigate the controller performance, T_s, T_r, and E_ss were calculated and
compared with the other FOPID controllers.
The BWO algorithm produces the best parameter values because of the cannibalism
stage, in which weak solutions are automatically omitted and only strong
solutions survive. It is observed that the BWO-FOPID controller has a better settling
time of 0.1727 s and an overshoot of 1.2774%, and it produced a very low steady-state error.
The rise time of the controller is slightly higher than that of the PSO-FOPID and GA-FOPID
controllers.

Fig. 10 Step responses of the AVR system with FOPID controllers tuned by the C-YSGA, PSO, CS, GA, and BWO algorithms
To understand the reliability of the designed controller, a robustness analysis was
performed by changing the time constants of the various subsystems in the range of
−20% to +20%. Step responses were plotted for variations in the τ_a, τ_e, τ_g, and τ_s values.
From Fig. 11a–d, it was observed that the BWO-FOPID controller performs well
even when the parameter values change over a 40% range.
Fig. 11 Step responses (terminal voltage versus time) of the BWO-FOPID controlled AVR system for −20%, −10%, +10%, and +20% variations in a τ_a, b τ_e, c τ_g, d τ_s
7 Conclusion
Acknowledgements This work is funded by the Department of Science and Technology, DST-ICPS
division, Govt. of India, under the grant number DST/ICPS/CPS-INDIVIDUAL/2018/433(G).
References
23. Bingul Z, Karahan O (2018) A novel performance criterion approach to optimum design of
PID controller using cuckoo search algorithm for AVR system. J Frankl Inst 355:5534–5559
24. Mosaad AM, Attia MA, Abdelaziz AY (2018) Comparative performance analysis of AVR
controllers using modern optimization techniques. Electr Power Compon Syst 46:2117–2130
25. Ekinci S, Hekimoglu B (2019) Improved kidney-inspired algorithm approach for tuning of PID
controller in AVR System. IEEE Access 7:39935–39947
26. Mosaad AM, Attia MA, Abdelaziz AY (2019) Whale optimization algorithm to tune PID and
PIDA controllers on AVR system. Ain Shams Eng J 10:755–767
27. Blondin MJ, Sanchis J, Sicard P, Herrero JM (2018) New optimal controller tuning method for
an AVR system using a simplified ant colony optimization with a new constrained Nelder–Mead
algorithm. Appl Soft Comput J 62:216–229
28. Calasan M, Micev M, Djurovic Z, Mageed HMA (2020) Artificial ecosystem-based optimiza-
tion for optimal tuning of robust PID controllers in AVR systems with limited value of excitation
voltage. Int J Electr Eng Educ 1:1–25
29. Al Gizi AJH, Mustafa MW, Al-geelani NA, Alsaedi MA (2015) Sugeno fuzzy PID tuning, by
genetic-neural for AVR in electrical power generation. Appl Soft Comput J 28: 226–236
30. Daniel Z, Bernardo M, Alma R, Arturo V-G, Erik C, Marco P-C (2018) A novel bio-inspired
optimization model based on Yellow Saddle Goatfish behavior. BioSystems 174:1–21
31. Monje CA, Vinagre BM, Chen YQ, Feliu V, Lanusse P, Sabatier J (2004) Proposals for fractional
PIλ Dμ tuning. In: Proceedings of 1st IFAC workshop on fractional derivatives and applications,
Bordeaux, France
32. Monje CA, Vinagre BM, Feliu V, Chen Y (2008) Tuning and auto-tuning of fractional order
controllers for industry applications. Control Eng Pract 16:798–812
33. Hayyolalam V, Kazem AAP (2020) Black widow optimization algorithm (BWO): a novel meta-heuristic approach for
solving engineering optimization problems. Eng Appl Artif Intell 87:103249
34. Monje CA, Chen Y, Vinagre BM, Xue D, Feliu-Batlle V (2010) Fractional-order systems and
controls: fundamentals and applications. Springer, Berlin
LSTM Network for Hotspot Prediction
in Traffic Density of Cellular Network
Abstract This paper implements long short-term memory (LSTM) network to pre-
dict hotspot parameters in traffic density of cellular networks. The traffic density
depends on numerous factors such as time, location, and the number of connected mobile
users, and it exhibits spatial and temporal relationships. Only certain regions
have very high data rates; these are known as hotspots. A hotspot is a circular region
with a particular centre and radius where the traffic density is the highest compared
to other regions at a given timestamp. Forecasting traffic density is very important,
especially in urban areas. Prediction of hotspots using LSTM would result in better
resource allocation, beamforming, handovers and so on. We propose two methods,
namely the log likelihood ratio (LLR) method and the cumulative distribution function
(CDF) method, to compute the hotspot parameters. On comparing the performances
(CDF) method to compute the hotspot parameters. On comparing the performances
of the two methods, it can be concluded that the CDF method is more efficient and
less computationally complex than the LLR method.
1 Introduction
Forecasting traffic density, especially in urban areas, is the need of the hour (refer [3]). The aim of the paper is to
identify such hotspots using two different methods and predict the future hotspot in
the given area.
Spatio-temporal neural network architectures based on deep neural networks have also
been proposed in previous work to address the same problem. It has been concluded that
long-term prediction is enhanced through such methods. Thus, the importance of deep
learning algorithms in mobile and wireless networking, and even in hotspot prediction,
has been further underscored (refer [4]).
Emerging hybrid deep learning models for temporal and spatial modelling with
the help of LSTM and auto-encoder-based deep model, respectively, have been
researched upon previously (refer [1]). Our motivation to employ LSTM is further
confirmed through references such as [1–3, 5–8]. For instance, sequence learning
using LSTM networks incorporating a general end-to-end approach to predict tar-
get variables of very long sequences has been presented with minimal assumptions
about the data sequence [6]. It has been concluded that LSTM network’s perfor-
mance is high even for very long sequences. Reference [5] reviews an illustrative
benchmark problem wherein conventional LSTM outperforms RNN. The shortcom-
ings of LSTM networks are addressed by proposing forget gates (refer [5]). It solves
the problem of processing input streams that are continuous and do not have marked
sequence ends. Thus, we employ LSTM for prediction and compare the performance
of the two proposed methods.
The data is represented as images for better visualization and interpretation (refer
[9]). There are some areas in these images that are more dense than others, which
form the hotspot region at a given timestamp.
In the first method, called the log likelihood ratio (LLR) method (refer [10]), we
find the centre and radius of hotspot using LLR. We consider two hypotheses: the
null hypothesis, H0 , represents the assumption that the traffic density to be uniformly
distributed and the other hypothesis, H1 , represents the actual distribution of traffic
density. We find the LLR for the two hypotheses and maximize it to obtain the hotspot
parameters. We train an LSTM network with input as sequence of raw data for 10
consecutive timestamps and target variable as hotspot parameters of 11th timestamp.
In the second method, called the cumulative distribution function (CDF) method,
we find the CDF starting from the centre of the hotspot found through LLR method
by increasing the radius from a minimum radius to the maximum radius that will
cover the entire image. We use CDF to compute the expectation value in each contour
which is the area between two concentric circles and plot it as an image. Using the
CDF, we determine the radius of hotspot as the least radius whose CDF is greater
than a threshold value fixed by us depending on the data. We train an LSTM network
with input as sequence of CDF for 10 consecutive timestamps and target variable as
hotspot parameters of 11th timestamp, where the radius of hotspot is same as radius
computed using CDF.
The proposed methods differ from the existing methods in the sense that they predict
the hotspot parameters using LSTM. The second proposed method in fact makes use
of CDF to reduce complexity and compute the hotspot parameters. Further, the data
is visualized as images with the hotspot region plotted on the image in order to better
understand the physical implication of the data attained.
In Sect. 2, we will briefly discuss how the data representing traffic density can be
visualized as images. We use a matrix to store the data and scale it to 255 in order to
represent it as images. We use MATLAB to plot the images. Section 3 discusses at
length about the LLR algorithm and its implementation to find the hotspot parameters.
Section 4 describes the CDF method’s algorithm to compute hotspot parameters and
each contour’s expectation value. In Sect. 5, the results of the two proposed methods
are compared. The various applications and extensions of the work presented in this
paper are discussed in Sect. 6.
The dataset contains the traffic density at a city level scale for one week, collected
at each hour (refer [11]). It consists of base station number and the number of users,
bytes and packets accessing that particular base station at a given hour. The corre-
sponding latitude and longitude of each base station is given in a separate file.
In order to represent the given data as an image for every timestamp, we fix the
image size as 151 × 151 (see Fig. 1). The minimum and maximum latitude and longitude
are normalized and scaled to 150, with latitude serving as the x coordinate
and longitude serving as the y coordinate of the image. A matrix M of dimension
151 × 151 is formed, and, corresponding to the location of each base station, the
appropriate element of the matrix is assigned the number of users. It must be noted that
the number of packets and bytes is not used to represent the data as an image. If more
than one base station's location corresponds to the same pixel coordinates, then we
add the data value to the already existing value at the corresponding element of M.
The matrix is then scaled to 255 to represent it as an image. Finally, 100 times the logarithm
of the matrix is displayed as an image using MATLAB functions. This process is
repeated for all the timestamps.
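The text uses MATLAB for plotting; the binning and scaling step itself can be sketched in pure Python as below. The coordinates and user counts are toy values, and `log(1 + x)` replaces the bare logarithm so that empty pixels stay finite, a small deviation from the text.

```python
import math

def density_image(lats, lons, users, size=151):
    """Bin base-station user counts into a size x size matrix and log-scale it."""
    la0, la1 = min(lats), max(lats)
    lo0, lo1 = min(lons), max(lons)
    M = [[0.0] * size for _ in range(size)]
    for la, lo, u in zip(lats, lons, users):
        x = round((la - la0) / (la1 - la0) * (size - 1))   # latitude -> x
        y = round((lo - lo0) / (lo1 - lo0) * (size - 1))   # longitude -> y
        M[x][y] += u           # co-located base stations accumulate
    peak = max(max(row) for row in M)
    # scale to 255, then take 100 * log for display, as described in the text
    return [[100 * math.log(1 + v / peak * 255) for v in row] for row in M]

img = density_image([12.90, 12.95, 13.00], [77.50, 77.55, 77.60], [40, 10, 25])
```

One such matrix is produced per timestamp, giving the image sequence that the LSTM later consumes.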
In this proposed solution, we define and identify a hotspot using the log likelihood ratio
(LLR) method, as discussed in Sect. 3.1. After identifying the hotspot parameters,
which are the centre and radius of hotspot, we proceed to train the LSTM network
to predict the hotspot parameters of future timestamp. For this, we reshape the entire
matrix of dataset after normalization and use it as input to the LSTM network. We
give raw data values of 10 consecutive timestamps as input to the LSTM network and
use 11th timestamp’s hotspot parameters calculated through LLR method as target
variables.
38 S. Swedha and E. S. Gopi
A hotspot is defined as a region where the traffic density is high. It can be of any shape.
Based on the previous work done in [10], it has been concluded that circular hotspots
are better than ring hotspots. Thus, in this paper, we consider it to be circular. To
identify the coordinates of the centre and radius of hotspot, we employ ‘log likelihood
ratio’ (LLR) method.
Consider the total region to be represented as S. At a given location on the image
with pixel coordinates (x, y), consider a circular region R of radius r . At a given
timestamp, let J be the total number of users in the entire region, K be the expected
number of users within the circular region R assuming uniform distribution, where

K = (area(R)/area(S)) × J

and L be the actual number of users within the circular region R. Let z
be a random variable vector with 22,801 (151 × 151) elements, where each element
represents a pixel point of the image and takes the value 0 or 1. We assume that the
elements are independent of each other. A zero represents that the pixel is outside
the circle constructed, and a one represents that the pixel is within the circle. In other
words, the values 0 and 1 represent whether or not the pixel lies outside the hotspot
region chosen.
Consider the null hypothesis H0 as uniform distribution of traffic density across the
given area for the given circular region R. Under the null hypothesis H0, the probability
that a pixel is within the hotspot is K/J and the probability that a pixel is outside
the hotspot is (J − K)/J. Since L users are within the hotspot R, the probability that z
follows H0 is

p(z|H0) = (K/J)^L × ((J − K)/J)^(J − L)
Consider hypothesis H1 as the actual nonuniform distribution of traffic density
across the given area. At a given location on the image with pixel coordinates (x, y),
consider a circular region R of radius r. For a given circular region R with L users
inside it and considering the hypothesis H1, the probability that a pixel is within the
hotspot is L/J and the probability that a pixel is outside the hotspot is (J − L)/J. Since
L users are within the hotspot R, the probability that z follows H1 is

p(z|H1) = (L/J)^L × ((J − L)/J)^(J − L)

Thus, the log likelihood ratio is defined as

LLR = log(p(z|H1)/p(z|H0)) = L log(L/K) + (J − L) log((J − L)/(J − K))

Fig. 2 a Image of 115th timestamp, b Hotspot computed using LLR method for 115th timestamp

In order to find the hotspot, we need to maximize the LLR. We
increase the radius r from 4 to 8 units by steps 0.1 unit and traverse at every pixel
location. The circle having the maximum LLR value is labelled as hotspot (see Fig. 2).
Fig. 4 Traffic density values of 10 consecutive traffic density images (time 105 to time 114) as
plotted in Fig. 1 are reshaped into a vector each of size 22,801. The raw data is given as input
sequence to the LSTM. It predicts the hotspot parameters of the 11th timestamp (time115). We
have plotted the circle found through LLR computation (red) and LSTM-LLR predicted circle
(black). The zoomed portion of the predicted hotspot can be seen in the 12th subplot
vector product (refer [12]). The ten consecutive timestamps and its prediction of
hotspot parameters of 11th timestamp through LSTM-LLR can be seen in Fig. 4 for
timestamp 115.
In the first proposed solution, we reshape the entire normalized matrix of the dataset
and use it as input to the LSTM layer. This has high complexity due to its large
size. Hence, our aim is to predict the hotspot parameters through a simpler approach.
One such approach is to use the 'cumulative distribution function (CDF)'. It provides a
better representation of the dataset. In this method, we compute expectation values
within contours, which are regions between concentric circles centred at the hotspot
centre calculated through the LLR method, and represent them as an image. We give CDF values appended
We take the centre of hotspot computed using the LLR method. We consider a radius of 4
units and increase it by 1 unit until all the pixels are covered. Since it is a square image,
the maximum radius would be √2 times the length of the side of the square. This is because
the maximum radius would occur when the hotspot is at the corner of the square.
Thus, the radius is incremented from 4 to √2 × 151, which is approximately
213 units, by steps of 1 unit. We then add the values at each pixel within the circle
considered at each iteration and divide it by the sum of all values at all pixels. We
store this value in a 210-dimensional vector. Thus, the CDF is computed in this manner.
As one can expect, the last element of the CDF vector will always be 1. We take the
hotspot radius as the least radius for which the CDF value is greater than 0.1. We chose
the value 0.1 after experimentation. At this value, the radius calculated through
the LLR method and the radius calculated through the CDF method almost coincide (see Fig. 5
as an example). This value may differ depending on the dataset.
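The radius selection can be sketched as below. The grid and traffic values are toy data, and the minimum radius defaults to 1 here instead of the 4 units used for the real 151 × 151 image.

```python
def cdf_radius(M, centre, r_min=1, threshold=0.1):
    """Grow the radius from the LLR centre; return the first radius whose
    cumulative traffic fraction exceeds `threshold`, plus the whole CDF."""
    n = len(M)
    cx, cy = centre
    total = sum(map(sum, M))
    r_max = int(2 ** 0.5 * n) + 1          # sqrt(2) * side reaches any corner
    cdf = []
    for r in range(r_min, r_max + 1):
        inside = sum(M[x][y] for x in range(n) for y in range(n)
                     if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
        cdf.append(inside / total)
    radius = next(r_min + i for i, v in enumerate(cdf) if v > threshold)
    return radius, cdf

M = [[1] * 21 for _ in range(21)]          # uniform toy traffic
M[10][10] = 200                            # spike at the centre
radius, cdf = cdf_radius(M, (10, 10))
```

As the text notes, the last CDF entry is always 1, since the largest circle covers the whole image.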
We start from the centre of hotspot computed using LLR method. We increase the
radius from 4 to 213 units by steps of 1 unit. Consider a 151 × 151 dimensional
matrix Contour to store values of each contour. We count the number of pixels
within a ring of inner radius r − 1 and outer radius r where r > 4 and the number of
pixels within a circle of radius r = 4. For the pixels inside the ring, the corresponding
elements of Contour matrix are assigned as difference between CDF value at r and
CDF value at r − 1 multiplied by the number of pixels inside the ring when r > 4.
For r = 4 and pixels inside a circle of radius r = 4, the corresponding elements of
Contour matrix are assigned as CDF value at r multiplied by the number of pixels
inside the circle of radius r = 4. Once all the elements of Contour matrix have been
assigned, 100 times logarithm of Contour matrix is plotted along with the hotspots
from both methods (see Fig. 5).
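The contour assignment described above can be sketched as follows. A synthetic CDF vector is used, and the 100·log display scaling from the text is omitted to keep the example minimal.

```python
def contour_image(cdf, centre, r_min, n):
    """Fill each ring with its CDF increment times the ring's pixel count,
    as described in the text; the innermost disc gets the full CDF at r_min."""
    cx, cy = centre
    out = [[0.0] * n for _ in range(n)]
    for idx, r in enumerate(range(r_min, r_min + len(cdf))):
        if r == r_min:   # innermost disc of radius r_min
            ring = [(x, y) for x in range(n) for y in range(n)
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
            inc = cdf[idx]
        else:            # ring between radii r - 1 and r
            ring = [(x, y) for x in range(n) for y in range(n)
                    if (r - 1) ** 2 < (x - cx) ** 2 + (y - cy) ** 2 <= r * r]
            inc = cdf[idx] - cdf[idx - 1]
        for x, y in ring:
            out[x][y] = inc * len(ring)
    return out

img = contour_image([0.2, 0.5, 1.0], centre=(5, 5), r_min=1, n=11)
```

Every pixel falls in exactly one ring, so the assignment is unambiguous and the resulting image is piecewise constant along concentric rings.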
Fig. 5 (i) and (ii) represent timestamps 115 and 116, respectively. a Contour images. The red
and green circles represent the hotspot regions obtained through LLR and CDF methods, respec-
tively. Note that in the 115th timestamp, the two circles have coincided. b Cumulative distribution
function’s stem graphs
CDF vectors of all timestamps are already in the normalized form. A sequence of
vectors with CDF value of each timestamp appended with the normalized x and y
coordinates of hotspot of that timestamp from 10 consecutive timestamps is given
as input to the LSTM network. The target variable consists of 11th timestamp’s
normalized hotspot parameters, (r, x, y) where r is the radius of circle and (x, y)
is the centre of circle. Thus, the input size becomes 210 + 2, which is 212. The
LSTM architecture consists of two layers (see Fig. 6). It uses the default functions,
namely the hyperbolic tangent function and the sigmoid function, at the input gate, forget
gate and output gate (refer [12]). x_t represents the input, which in our case is 10
consecutive vectors of size 212 each, and y_t represents the output of the LSTM cell. y_{t-1}
represents the previous output from the LSTM cell. The equations of the LSTM cell
are the same as those in Sect. 3.2. The ten consecutive timestamps and the prediction of
hotspot parameters of the 11th timestamp through LSTM-CDF can be seen in Fig. 7 for
timestamp 115.

Fig. 7 The CDF values of 10 consecutive contour images (time 105 to time 114) as plotted in Fig. 5 are initialized as a vector each of size 210. It is appended with the normalized values of the centre of hotspot of the 10 consecutive timestamps. It is given as an input sequence (each of size 212) to the LSTM network. It predicts the hotspot parameters of the 11th timestamp (time 115). We have plotted the circle found through LLR computation (red), CDF computation (green) and LSTM-CDF prediction (yellow). In this case, the circles have coincided. Hence, only the LSTM-CDF predicted circle (yellow) can be seen. The zoomed portion of the predicted hotspot can be seen in the 12th subplot
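A single step of the LSTM cell with the sigmoid and tanh gates mentioned above can be sketched with scalar weights; a real layer uses weight matrices over the 212-dimensional inputs, and all values below are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One scalar LSTM step; W holds (wf, wi, wo, wc) input weights and
    (uf, ui, uo, uc) recurrent weights (biases omitted for brevity)."""
    wf, wi, wo, wc, uf, ui, uo, uc = W
    f = sigmoid(wf * x + uf * h_prev)                     # forget gate
    i = sigmoid(wi * x + ui * h_prev)                     # input gate
    o = sigmoid(wo * x + uo * h_prev)                     # output gate
    c = f * c_prev + i * math.tanh(wc * x + uc * h_prev)  # cell state
    h = o * math.tanh(c)                                  # hidden output y_t
    return h, c

W = (0.5, 0.4, 0.6, 0.9, 0.1, 0.2, 0.1, 0.3)
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:                                 # a short input sequence
    h, c = lstm_step(x, h, c, W)
```

The forget gate is the addition highlighted in reference [5]: it lets the cell reset its state on continuous input streams without marked sequence ends.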
5 Results
Fig. 10 (i) loss function of LSTM network (ii) normalized error in prediction of radius for testing
data (iii) normalized error in prediction of x coordinate of centre of hotspot for testing data (iv)
normalized error in prediction of y coordinate of centre of hotspot for testing data for a LLR method
b CDF method
Fig. 11 Represents the average error in hotspot parameters when different methods (LLR and CDF)
are compared with LSTM-LLR and LSTM-CDF predictions
Fig. 12 Represents the hotspot parameter values found through LLR and CDF computation and
their corresponding values in the real world for the given data set
error between the predicted and actual values of the x- and y-coordinates of the centre of
the hotspot is smaller than that of the LSTM-LLR prediction method (see rows 1 and 2 of
Figs. 11 and 10), although the average error between the predicted and actual values of
the radius predicted by the LSTM-CDF prediction method is larger than that of the
LSTM-LLR prediction method (see row 1, column 1 of Figs. 11 and 10). However,
the total average error in predicting the hotspot parameters using LSTM-CDF prediction,
using both the CDF threshold and the LLR value, is smaller than that of the LSTM-LLR
method (see Fig. 11). This can be further verified from Figs. 4, 7 and 9. It
can be noted that prediction of the centre of the hotspot region is important for efficient
resource allocation, and LSTM-CDF prediction performs well even with the radius computed
using the CDF threshold. The LSTM architecture in the CDF method is of lower
complexity, as its input size is smaller than that of the LSTM architecture for the LLR method
(see Figs. 3 and 6). Depending on the application, the contour rings beyond a certain
CDF value need not be considered. This is because the CDF value becomes
nearly the same, with only marginal differences beyond a certain radius. These rings need not be
represented in the image if the particular application does not require them (Fig. 12).
6 Conclusions
The prediction of the hotspot and the computation of its contour can be used to steer the
antenna in the desired direction. This helps in better reception of signals, increases the
signal-to-noise ratio, and results in better allocation of bandwidth. Further, beamforming
capabilities are enhanced through hotspot prediction, as the antenna can focus better on the
denser regions. In fact, the contour images give a better description of traffic density, since
they also estimate an expected value for each contour. In this manner, more efficient
resource allocation takes place [13], and when a mobile user moves from one base station
to another, the result is a smooth handover.
As can be seen from the training data's images, there is only one hotspot in every
timestamp. However, this might not be the case for a different geographical region. In
such cases, the image can be divided into parts (say 4), and for each part a local hotspot
can be computed using the algorithms mentioned in this paper. The overall hotspot of the
timestamp would be the one that corresponds to the highest LLR value. While training the
LSTM network, the target variables would be the parameters of all local hotspots.
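The quadrant-splitting idea above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the per-cell LLR score grid (`llr_map`) is assumed to be precomputed by the LLR procedure, and each quadrant's local hotspot is taken as its peak-LLR cell.

```python
import numpy as np

def local_hotspots(llr_map):
    """Split an LLR score grid into four quadrants and return, per quadrant,
    the (row, col) of its peak-LLR cell together with that peak value."""
    h, w = llr_map.shape
    hotspots = []
    for r0, r1, c0, c1 in [(0, h // 2, 0, w // 2), (0, h // 2, w // 2, w),
                           (h // 2, h, 0, w // 2), (h // 2, h, w // 2, w)]:
        quad = llr_map[r0:r1, c0:c1]
        r, c = np.unravel_index(np.argmax(quad), quad.shape)
        hotspots.append(((r0 + r, c0 + c), quad[r, c]))
    return hotspots

def overall_hotspot(llr_map):
    # The overall hotspot of the timestamp is the local one with highest LLR.
    return max(local_hotspots(llr_map), key=lambda hc: hc[1])[0]
```

During LSTM training, the parameters of all four local hotspots would then form the target vector, as described above.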
References
1. Wang J, Tang J, Xu Z, Wang Y, Xue G, Zhang X, Yang D (2017) Spatiotemporal modeling and
prediction in cellular networks: a big data enabled deep learning approach. In: IEEE INFOCOM
2017—IEEE conference on computer communications
LSTM Network for Hotspot Prediction in Traffic … 47
2. Feng J, Chen X, Gao R, Zeng M, Li Y (2018) DeepTP: an end-to-end neural network for mobile
cellular traffic prediction. IEEE Netw 32(6):108–115
3. Zhang C, Patras P (2018) Long-term mobile traffic forecasting using deep spatio-temporal
neural networks. In: Mobihoc ’18: proceedings of the eighteenth ACM international symposium
on mobile ad hoc networking and computing, pp 231–240
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a
survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Gers FA, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with
LSTM. In: Proceedings of ICANN’99 international conference on artificial neural networks
(Edinburgh, Scotland), vol. 2. IEE, London, pp 850–855
6. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
Adv Neural Inf Process Syst
7. Huang C-W, Chiang C-T, Li Q (2017) A study of deep learning networks on mobile traffic
forecasting. In: 2017 IEEE 28th annual international symposium on personal, indoor, and
mobile radio communications (PIMRC)
8. Chen L, Yang D, Zhang D, Wang C, Li J, Nguyen T-M-T (2018) Deep mobile traffic forecast
and complementary base station clustering for C-RAN optimization. J Netw Comput Appl
00:1–12
9. Zhang C, Zhang H, Yuan D, Zhang M (2018) Citywide cellular traffic prediction based on
densely connected convolutional neural networks. IEEE Commun Lett 22(8):1656–1659
10. Nair SN, Gopi ES (2019) Deep learning techniques for crime hotspot detection. In: Optimization
in machine learning and applications, algorithms for intelligent systems, pp 13–29
11. Chen X, Jin Y, Qiang S, Hu W, Jiang K (2015) Analyzing and modeling spatio-temporal
dependence of cellular traffic at city scale. In: 2015 IEEE international conference on
communications (ICC)
12. Lu Y (2016) Empirical evaluation of a new approach to simplifying long short-term memory
(LSTM). In: arXiv:1612.03707 [cs.NE]
13. Alawe I, Ksentini A, Hadjadj-Aoul Y, Bertin P (2018) Improving traffic forecasting for 5G
core network scalability: a machine learning approach. IEEE Netw 32(6):42–49
Generative Adversarial Network and Reinforcement Learning to Estimate Channel Coefficients
1 Introduction
Massive MIMO systems are one of the primary emerging 5G wireless communication
technologies [1] and continue to grow in popularity. These systems rely on the
orthogonality of each user's channel vector, which allows the base station to separate the
signals of individual users from the received signal.
In this paper, the authors explore a two-step approach to obtain corrected channel
coefficients from corrupted channel coefficients. The first step attempts to separate
the underlying distributions that make up the corrupted channel coefficients, which
consists of three separate Gaussian distributions, using a “Bank of GANs” approach.
This is followed by extracting the true value of the channel coefficient using an RL agent.
It is shown that the GAN-based approach is able to extract the source distributions.
Further, a lifelong-learning [8] RL system is seen to be capable of picking up trends from
the underlying data, conditioned on the variances they are drawn from and their sum. The
authors also point out that lifelong learning allows adaptability to changes in the physical
properties of the channel.
In practical scenarios, there exist multiple cells or base stations. In such scenarios, an
antenna at a base station receives a mixture of three signals: the intended signal (message),
intracell interference (interference from other users in that channel), and intercell
interference (between two cells). To extract the intended signal, we need to separate the
sources from the mixture. In the massive MIMO scenario, we can model all the sources as
Gaussian distributed. That is, if X₁, X₂, X₃ are three Gaussian-distributed signals and
X = X₁ + X₂ + X₃ is observed, we need to generate estimates of the source signals
X₁, X₂, X₃.
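The mixture model can be simulated in a few lines. This is a hedged illustration (the variances 1, 2.25, 4 are taken from the experiment in Sect. 4, and the sources are assumed independent); it shows that the mixture variance is the sum of the source variances.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Three independent zero-mean Gaussian sources (variances are illustrative).
beta = [1.0, 2.25, 4.0]
x1, x2, x3 = (rng.normal(0.0, np.sqrt(b), n) for b in beta)
x = x1 + x2 + x3  # observed mixture at the antenna

# For independent sources the mixture variance is the sum of the variances,
# so Var(X) should be close to 1 + 2.25 + 4 = 7.25.
mixture_var = x.var()
```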
A two-step approach is proposed to achieve this (refer Fig. 1). First, a deep gener-
ative model is applied to learn the underlying distribution of the data, without param-
eterizing the output. A generative adversarial network (GAN) is used to achieve this.
This is shown in Sect. 3.2. Subsequently, a reinforcement learning-based approach
is used to obtain estimates of the source signals from the learnt distributions. This is
shown in Sect. 3.3.
Fig. 1 Schematic representation of the proposed system to estimate channel coefficients. Here, the
output from each of the GANs is a random variable which follows the distribution X i ∼ N (0, βi )
where N (0, βi ) represents a Gaussian distribution with zero mean and variance of βi
The role of the discriminator is to identify real and fake data. Neural networks are used to
model both the discriminator and the generator. A neural network G(z, θ₁) models the
generator; it maps an input z to the data space x (the space in which the training samples of
the desired distribution lie). The discriminator network D(x, θ₂) gives the probability that
a vector x from the data space is real. Therefore, the discriminator network weights are
trained to maximize D(x, θ₂) when x belongs to the real dataset and 1 − D(x, θ₂) when x
is fake data generated by the generator network, that is, x = G(z, θ₁). Thus, the
discriminator and the generator can be interpreted as two agents playing a minimax game
on the following objective function V (using binary cross-entropy loss):
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1)
where pdata is the distribution over the real data and pz is the distribution over the
input to the generator.
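The value of the objective in (1) can be estimated by Monte Carlo from samples. The sketch below is illustrative (the toy discriminator and generator are assumptions, not the paper's networks); it demonstrates the textbook equilibrium property that when G matches p_data and D outputs 1/2 everywhere, V = 2 log(1/2) ≈ −1.386.

```python
import numpy as np

def gan_objective(D, G, real, z):
    """Monte Carlo estimate of V(D, G) from Eq. (1):
    E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    return np.mean(np.log(D(real))) + np.mean(np.log(1.0 - D(G(z))))

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.5, 10_000)       # "real" channel samples
z = rng.normal(0.0, 1.0, 10_000)          # generator input noise

blind_D = lambda x: np.full_like(x, 0.5)  # discriminator that cannot tell
G = lambda z: 1.5 * z                     # generator matching the real std

# At equilibrium (G matches p_data, D outputs 1/2), V = 2 log(1/2).
v = gan_objective(blind_D, G, real, z)
```

In training, the discriminator ascends this objective while the generator descends it, which is the minimax game stated in (1).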
The proposed algorithm uses a “Bank of GANs” approach consisting of three
GAN networks. The output of the ith generator is values drawn from the distribution
corresponding to the ith signal. Using the trained GAN, the mean and variance of
each Gaussian source are obtained (refer Fig. 2). In Sect. 3.3, this learnt distribution is
used to sample estimates of the original source signals using a reinforcement learning
approach. The procedure used to learn the distributions is presented as Algorithm 1.
Fig. 2 Collective output from the “Bank of GANs” model is plotted. The GANs were fed with
signals from Gaussian distributions having variances a 1, 2.25, 4 and b 2, 10, 10. The predicted
variances are a 0.88, 2.323, 4.115 and b 2.08, 10.147, 11.038
In the previous section, a generative adversarial network was used to learn the underlying
distribution of the source signals given the mixture signal. In this section, a single-step RL
method is proposed to sample from the learnt distributions given the mixture signal, so as
to extract the original source signals. The method can be interpreted as a reinforcement
learning agent that represents its environment by the mixture signal and the variances of
the three source signals (obtained using the GANs). From this state, an action simply
consists of sampling three signals from the predicted Gaussians. This is done using a
neural network whose outputs are the required estimates.
It can be noted that, during training, a batch of mixture signals of batch size m is fed as
input, and the reward is defined as the negative of the collective mean squared error. This
allows the network to learn trends that are typical of the signal and noise data; a less noisy
training period is also observed. The reward function was designed to act as a measure of
how close the sampled signals are to the original source signals. The next task is to
perform gradient ascent on this reward. Equivalently, an attempt is made to minimize the
negative of the reward, R = − ∑_{i=1}^{3} (1/m) |xᵢ − x̂ᵢ|², where xᵢ is the actual ith batch of source signals
and x̂i is the ith corresponding batch of source signals sampled by the RL agent. It
can be observed from the results shown in Fig. 3 that the predicted signals learn the
trend of the original source signals sampled from the dataspace. The algorithm used
for training this agent is described in Algorithm 2.
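As a point of comparison for the RL agent (not the authors' method), the Gaussian model of Sect. 2 admits a closed-form baseline: for independent zero-mean Gaussians, E[Xᵢ | X = x] = (σᵢ² / ∑ⱼ σⱼ²) · x, i.e., the MMSE estimate of each source is a variance-proportional share of the mixture. A sketch, with illustrative variances:

```python
import numpy as np

def mmse_split(x, variances):
    """MMSE estimate E[X_i | sum = x] for independent zero-mean Gaussians:
    each source gets a share of the mixture proportional to its variance."""
    v = np.asarray(variances, dtype=float)
    return np.outer(v / v.sum(), x)  # shape: (num_sources, len(x))

rng = np.random.default_rng(2)
beta = [1.0, 2.25, 4.0]                        # variances learnt by the GANs
sources = np.stack([rng.normal(0, np.sqrt(b), 50_000) for b in beta])
mix = sources.sum(axis=0)

est = mmse_split(mix, beta)
mse = np.mean((sources - est) ** 2, axis=1)    # per-source reconstruction error
# The estimates reproduce the mixture exactly: est.sum(axis=0) equals mix.
```

Under this model the theoretical per-source MMSE is σᵢ²(σ² − σᵢ²)/σ², comparable in magnitude to the per-component MSEs reported in Sect. 4; an agent that captures the trend should approach this floor.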
Fig. 3 The actual value of each signal is plotted in red, while the predicted distribution is plotted in
yellow. A and B depict two instances of test results, with variances for A in the range 1–5 for the
desired signal and 5–15 for interference and noise, and variances for B in the range 2–6 for the
desired signal and 7–15 for interference and noise. Mappings for the sum, X₃, X₂, and X₁ are
depicted in (a)–(d), respectively
56 P. Mani et al.
4 Results
In this section, the results and experiments corresponding to each part of the study are
presented.
The approach explored in Sect. 3.2 is implemented using data drawn from Gaussian
distributions with mean zero and variances (a) 1, 2.25, 4 and (b) 2, 10, 10. The resulting
distributions mapped by the GANs are illustrated in Fig. 2.
It can be seen from Fig. 2 that the distributions predicted by the "Bank of GANs" all
exhibit Gaussian-like bell curves. The means and variances are calculated from the
generator outputs. The predicted variances have a mean squared error of 0.0109 and
0.3684 in Fig. 2a, b, respectively. These variances are then fed into the RL network for
sampling the source signals' values.
In this section, the results of training the neural network are highlighted. The network uses
a leaky version of the rectified linear activation function [9] with a negative slope of 0.2
for the hidden layers and a linear activation function for the output layer. It is trained with
the RMSProp [10] optimizer with an initial learning rate of 0.0001. Training data is
generated via the Monte Carlo [11] approach, drawing signals from Gaussian
distributions. Results are shown for two different variance ranges for the intended and
interference signals: intended 1–5 with interference 5–15, and intended 2–6 with
interference 7–15. The results of testing these models are shown in Fig. 3.
The test data is generated from Gaussians whose variances are themselves drawn from a
uniform distribution. The predictions of the agent and the true values of the coefficients
are plotted in Fig. 3 for each of the three components, along with their sum. It can be
observed that even though the signals used to generate the test data are sampled randomly
(constrained to lie within a limited range), the trained network is able to capture the trend
in the signals. Although the RL network captures the general trend, there is still scope for
improvement in mapping the exact values of the distribution. However, the actual mixture
signal is mapped closely by the sum of the three predicted signals.
The graphs in Fig. 3 indicate that the reinforcement learning agent is able to
replicate the general trends of the input distribution with reasonable accuracy. The
observed mean squared error loss on the test data in Fig. 3a is: 0.006919, 0.81324,
2.08179, and 1.87812 for Sum, X 1 , X 2 , and X 3 , respectively, and the observed mean
squared error loss on the test data in Fig. 3b is: 0.002142, 1.74944, 2.30441, and
2.17069 for Sum, X 1 , X 2 , and X 3 , respectively.
The mixture of signals, along with the variances learnt by the GAN, is also fed to the RL
agent to obtain estimates of the original signals. The observed MSE is 0.86326, 1.91800,
1.91943, and 0.00425 for X₁, X₂, X₃, and Sum, respectively.
5 Conclusions
References
1. Wang CX, Haider F, Gao X, You XH, Yang Y, Yuan D, Hepsaydir E (2014) Cellular archi-
tecture and key technologies for 5G wireless communication networks. IEEE Commun Mag
52(2):122–130
2. Gopi ES (2016) Digital signal processing for wireless communication using Matlab. Springer
International Publishing
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
4. Zhang C, Patras P, Haddadi H (2019) Deep learning in mobile and wireless networking: a
survey. IEEE Commun Surv Tutor 21(3):2224–2287
5. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio
Y (2014) Generative adversarial nets. In: Advances in neural information processing systems,
pp 2672–2680
6. Yang Y, Li Y, Zhang W, Qin F, Zhu P, Wang CX (2019) Generative-adversarial-network-based
wireless channel modeling: challenges and opportunities. IEEE Commun Mag 57(3):22–27
7. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT press
8. Thrun S (1998) Lifelong learning algorithms. In: Learning to learn. Springer, Boston, pp 181–
209
9. Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolu-
tional network. arXiv preprint arXiv:1505.00853
10. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint
arXiv:1609.04747
11. Rubinstein RY, Kroese DP (2016) Simulation and the Monte Carlo method, vol. 10. Wiley,
New York
12. Elijah O, Leow CY, Rahman TA, Nunoo S, Iliya SZ (2015) A comprehensive survey of pilot
contamination in massive MIMO-5G system. IEEE Commun Surv Tutor 18(2):905–923
13. Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications.
Neural Netw 13(4–5):411–430
14. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875
Novel Method of Self-interference Cancelation in Full-Duplex Radios for 5G Wireless Technology Using Neural Networks
1 Introduction
It is well known that 5G wireless technology defines many innovative wireless principles
and algorithms to provide maximal benefits to users in terms of high data rate, reduced
power consumption, high spectral efficiency, etc. [1]. "In-band Full-Duplex (FD)"
communication is one such method, and it seeks to achieve better spectral efficiency. It
exploits the same set of frequency resource channels for simultaneous uplink and
downlink signal transmissions [2], unlike the distinct forward and reverse channels of
conventional half-duplex communication (3G/4G). The scenario is pictured in Fig. 1 for
communication between two nodes in FD mode, with all signals transmitted/received in
carrier frequency band f₁.
Fig. 1 Full-duplex communication scenario
However, owing to this technique, there is a very high possibility of the receiver
receiving signals from its own transmitter, which is formally called self-interference (SI)
(red highlighted signals in Fig. 1) [3–5]. SI is a highly undesirable artifact of FD
scenarios and has the potential to corrupt the received signal of interest (SOI) to a large
extent. Many works have been reported to solve the problem of SI. One that proves to be
effective is hybrid cancelation of SI [2, 5], which applies successive SI cancelations in the
analog and digital domains. While such robust hybrid SI cancelation methods exist, this
paper gives a new direction for tackling the problem by means of artificial intelligence
(AI) using neural networks, which is shown to perform well even for nonlinear SI
cancelation.
2 Signal Modeling
Self-interference (SI) from the transmitter affects the receiver of the same transceiver.
Hence, the overall received signal at the input of the receiver block at baseband level is
assumed to be y(t) + α₁x₁(t − β) + α₂x₁³(t − β) + α₃x₁⁵(t − β) + g(t), where y(t) is the
actual signal of interest (SOI), α₁x₁(t − β) is the linear SI, and
α₂x₁³(t − β) + α₃x₁⁵(t − β) is the nonlinear SI (neglecting harmonics above the 5th) from
the transmitter [2, 4]. It should be noted that RF circuits in the transceiver, such as power
amplifiers, can generate these higher-order harmonics of their inputs [6]; these terms
manifest as nonlinear SI at the receiver. Furthermore, the αᵢ and β represent the scaling
and delay factors the transmitted signal incurs before manifesting as SI at the receiver.
Additionally, g(t) represents the additive channel noise.
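The baseband model above can be simulated directly. The sketch below is illustrative: the SOI, the transmitted signal, the delay, and the coefficient ranges are arbitrary stand-ins, chosen only to exercise the y(t) + α₁x₁(t − β) + α₂x₁³(t − β) + α₃x₁⁵(t − β) + g(t) structure.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 1000                       # "analog" rate used to mimic continuous time
t = np.arange(0, 10, 1 / fs)
beta = 100                      # delay in samples (illustrative)
a1, a2, a3 = rng.uniform(0.1, 1.0, 3)   # random scaling factors

y = rng.normal(0, 1, t.size)            # stand-in SOI
x1 = np.cos(2 * np.pi * 20 * t)         # stand-in transmitted signal
x1_d = np.roll(x1, beta)                # x1(t - beta)
g = rng.normal(0, 0.1, t.size)          # additive channel noise

# Baseband composite: SOI + linear SI + 3rd/5th-order nonlinear SI + noise.
received = y + a1 * x1_d + a2 * x1_d**3 + a3 * x1_d**5 + g
```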
Subsequently, first-stage SI cancelation is performed in the analog domain (discussed in
an upcoming section), which attempts only partial linear SI cancelation. After the signal
passes through the LNA and ADC, digital cancelation is also performed, which makes the
resultant output free from both linear and nonlinear SI. This method of employing both
analog and digital cancelation of SI is commonly referred to as hybrid cancelation of SI.
In the passive analog cancelation, an RF component subdues the SI. This can be
realized with the help of a circulator, antenna separation, antenna cancelation, or an
isolator. One of the main limitations of this technique is that it cannot suppress the
SI reflected from the environment. More details can be found in [2, 5].
62 L. Yashvanth et al.
The residual SI from passive analog cancelation is alleviated by active analog cancelation.
As mentioned earlier, this stage attempts only to suppress the linear SI from the composite
signal, which would otherwise saturate the ADC block and distort the SOI. Accordingly,
let the composite received signal, as discussed, be

y₁(t) = y(t) + αx₁(t − β) + α₂x₁³(t − β) + α₃x₁⁵(t − β) + g(t)    (2)
with symbols having the same meanings as before. Thus, active analog cancelation (or
simply analog cancelation) attempts to remove the linear term, namely αx₁(t − β), before
the signal is processed by the LNA. As the transmitted signal x₁(t) is known to the
receiver (because it is the same node that transmits it), active analog cancelation generates
an estimate of αx₁(t − β) and removes it from the received signal, leaving the partial SOI.
Kim et al. [2] mention that this estimate can be predicted as a linear combination of
time-shifted versions of the transmitted signal x₁(t). This logic can be worked out
mathematically as follows:
α x̂₁(t + τ) = α ∑_{n=−∞}^{∞} x₁(nT_s + τ) sinc(t/T_s − n)    (3)

⟹ α x̂₁(τ − β) = α ∑_{n=−∞}^{∞} x₁(nT_s + τ) sinc(−β/T_s − n)    (4)
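Equations (3)-(4) are the classical sinc (Shannon) reconstruction of a bandlimited signal from its samples, evaluated at a fractionally delayed instant. A minimal sketch, truncating the infinite sum to the available samples (an approximation, with the tone frequency and delay chosen purely for illustration):

```python
import numpy as np

def sinc_interp(samples, Ts, t):
    """Reconstruct x(t) from uniform samples x(n*Ts) via a truncated
    version of the sum in (3): x(t) = sum_n x(n*Ts) * sinc(t/Ts - n)."""
    n = np.arange(samples.size)
    return np.sum(samples * np.sinc(t / Ts - n))

fs, Ts = 100.0, 0.01
n = np.arange(400)
x = np.sin(2 * np.pi * 5 * n * Ts)       # 5 Hz tone, well below Nyquist

beta = 0.0037                            # a fractional delay, not a multiple of Ts
t0 = 200 * Ts                            # evaluate away from the edges
estimate = sinc_interp(x, Ts, t0 - beta) # plays the role of x1_hat(t - beta)
truth = np.sin(2 * np.pi * 5 * (t0 - beta))
```

The estimate agrees closely with the true delayed value, which is why a linear combination of shifted copies of x₁(t) suffices to synthesize the delayed SI replica.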
As mentioned, the main challenge in SI cancelation is the linear distortion caused by
multipath delay. In addition, the nonlinear distortion caused by the nonlinearity of the
transmitter power amplifier (PA) at high transmit power, quantization noise, and phase
noise [2, 4] also hinder efficient SI cancelation. For simplicity, quantization noise is
ignored here, since it is practically unavoidable in the digital signal processing regime.
The purpose of digital SI cancelation is to completely suppress the residual SI left by the
analog cancelation techniques.
In digital cancelation technique, an attempt is made to model the SI channel filter,
whose output (which is SI) is then merely subtracted from the ADC output [4, 5, 8].
The linear interference component can be modeled as:
I₁(n) = ∑_{k=−m+1}^{m} h(k) y₁(n − k)    (7)
where I1 (n) is the linear SI component and h(k) constitutes the corresponding linear
SI channel filter parameters.
Similarly, let

I₂(n) = ∑_{t=3,5,7}^{2r+1} ∑_{k=−m+1}^{m} y₁(n − k) |y₁(n − k)|^{t−1} h_t(k)    (8)
where I2 (n) is the nonlinear interference component and h t (k) constitutes the coef-
ficients of the tth order nonlinear SI channel filter model.
Thus, the calculated linear and nonlinear components can be removed from the received
signal y₁(n) as y₁(n) − I₁(n) − I₂(n) to estimate the signal of interest y(n) as ŷ(n). In
order to obtain I₁(n) and I₂(n), the filter coefficients h(k) and h_t(k) have to be estimated.
To this end, the associated cost functions J₁ and J₂ are formulated in a least-squares setup
and minimized. Thus, J₁ and J₂, defined from [2], are as follows:
J₁ = ∑_{n=0}^{p−1} |I₁ₚ(n) − ∑_{k=−m+1}^{m} y₁ₚ(n − k) h(k)|²    (9)
where y₁ₚ(n) is the actual raw transmitted signal (before being launched by the
transmitting antenna), i.e., y₁ₚ(n) = x(n) ∀n ∈ {0, 1, …, p − 1}, and I₁ₚ(n) serves as the
linear SI at the output of the analog cancelation block (digital version). And from [9],
J₂ = ∑_{n=0}^{p−1} |I₂ₚ(n) − ∑_{t=3,5,7}^{2r+1} ∑_{k=−m+1}^{m} y₁ₚ(n − k) |y₁ₚ(n − k)|^{t−1} h_t(k)|²    (10)
with symbols having meanings analogous to those in (9). These equations are solved
using the pseudo-inverse technique (obtained from the method of least squares). This
phase of estimating the filter coefficients from the above equations can be termed SI
channel estimation.
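The least-squares estimation of h(k) and h_t(k) from (9)-(10) can be sketched as one stacked regression solved by the pseudo-inverse. The filter length m, the orders, and the synthetic data below are illustrative assumptions, and the wrap-around shift is a simplification of proper edge handling:

```python
import numpy as np

def basis(y1, m, orders=(1, 3, 5)):
    """Regression matrix whose columns are y1(n-k) * |y1(n-k)|^(t-1)
    for k = -m+1..m and each order t (t = 1 gives the linear taps of (7))."""
    cols = []
    for t in orders:
        for k in range(-m + 1, m + 1):
            shifted = np.roll(y1, k)           # crude y1(n-k) with wrap-around
            cols.append(shifted * np.abs(shifted) ** (t - 1))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(4)
m, orders = 2, (1, 3, 5)
y1 = rng.normal(0, 1, 2000)
h_true = rng.normal(0, 1, len(orders) * 2 * m)   # ground-truth filter taps
A = basis(y1, m, orders)
si = A @ h_true                                  # synthetic SI, linear + nonlinear

# Least-squares / pseudo-inverse estimate of the SI channel coefficients.
h_hat, *_ = np.linalg.lstsq(A, si, rcond=None)
```

On this noiseless synthetic data, the recovered taps match the ground truth, which is what the pseudo-inverse solution of (9)-(10) guarantees when the regression matrix has full column rank.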
The proposition is to model the digital cancelation block using a neural network. Here, a
feed-forward, back-propagating neural network [10] is used to estimate the SI channel
(comprising the linear and nonlinear components jointly) by solving the minimization
problems in (9) and (10). The model is approximated with three hidden layers, as shown
in Fig. 3. Let the number of nodes in the input layer be N; all three hidden layers contain
the same number of nodes as the input layer, and the output layer contains a single node
predicting the SI sample values.
Thus, the associated overall loss function, based on the mean squared error (MSE) per
epoch, has the form

(1/N_b) ∑_{n=0}^{N_b−1} |z(n) − ẑ(n)|²    (11)

with ẑ(n) denoting the network output and z(n) the desired SI samples. Here, N_b
represents the number of iterations; to be precise, N_b = p − N + 2. Also, note that y₁(n)
is the same as x(n) in the training phase.
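A minimal sketch of training such a network on the loss (11) by plain gradient descent. This is not the authors' model: the activation (tanh), learning rate, window length N, and the synthetic SI target are all assumptions; only the structure (N-node input, three N-node hidden layers, single linear output node, MSE loss) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(5)

def init(sizes):
    # One (weights, bias) pair per layer; sizes like [N, N, N, N, 1].
    return [(rng.normal(0, 1 / np.sqrt(a), (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)  # linear output
    return acts

def sgd_step(params, X, z_true, lr=0.01):
    acts = forward(params, X)
    z_hat = acts[-1]
    grad = 2 * (z_hat - z_true) / X.shape[0]     # d(MSE)/d(output), per Eq. (11)
    for i in range(len(params) - 1, -1, -1):
        W, b = params[i]
        gW = acts[i].T @ grad
        gb = grad.sum(axis=0)
        # Backpropagate through the tanh of the layer below (none below input).
        grad = (grad @ W.T) * (1 - acts[i] ** 2) if i > 0 else None
        params[i] = (W - lr * gW, b - lr * gb)
    return np.mean((z_hat - z_true) ** 2)

# Toy training: windows of the transmitted signal -> a synthetic SI sample.
N = 8
x = rng.normal(0, 1, 500)
X = np.stack([x[i:i + N] for i in range(x.size - N)])
z = (0.5 * X[:, -1] + 0.1 * X[:, -1] ** 3)[:, None]   # assumed SI target

params = init([N, N, N, N, 1])
losses = [sgd_step(params, X, z) for _ in range(300)]
```

The decreasing loss trace is the per-epoch MSE of (11); in the paper the target z(n) would instead be the residual SI sample and the input window would come from x(n).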
At this point, it is worth realizing that the novelty of this paper is to utilize a single set of
neural network weights (analogous to the coefficients of the filter modeling the SI
channel) to cancel both linear and nonlinear SI from the composite signal. Hence, the
proposed method is a better technique than the conventional approach, wherein two
distinct filters are employed to cancel linear and nonlinear SI separately. Thus, the authors
claim that the computational complexity of the proposed approach is indeed less than that
of conventional solutions.
The trained model is then used to find the linear and nonlinear interference components of
the received signal during the testing phase. These interference components are then
removed from the received signal to obtain the signal of interest ŷ(n).
With the set of specifications mentioned in Sect. 3.2.1, an MSE of 0.1 was obtained.
1. Transmitting signal, x(n)—baseband signal of 10 s duration (sampling frequency
fs = 100 Hz) with spectral content between 15 and 25 Hz. The signal is created from FIR
coefficients designed by the frequency-sampling technique.
2. Receiving signal, y(n)—baseband signal of 10 s duration (sampling frequency
fs = 100 Hz) with spectral content between 35 and 45 Hz. The signal is generated as a
periodic random Gaussian signal.
3. Channel noise, g(n)—additive white Gaussian noise with resulting SNR = 0 dB.
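Item 1's frequency-sampling FIR design can be sketched as follows: sample the desired bandpass magnitude (15-25 Hz at fs = 100 Hz) on the DFT grid, impose linear phase, and inverse-DFT to obtain the taps. The tap count (101) and the white-noise excitation are illustrative assumptions.

```python
import numpy as np

def fir_freq_sampling(numtaps, fs, f_lo, f_hi):
    """FIR bandpass taps via the frequency-sampling technique: sample the
    desired magnitude on the DFT grid, impose linear phase, inverse-DFT."""
    M = (numtaps - 1) / 2
    k = np.arange(numtaps)
    f = k * fs / numtaps
    # Passband plus its mirror band, so the impulse response comes out real.
    mag = ((f >= f_lo) & (f <= f_hi)) | ((fs - f >= f_lo) & (fs - f <= f_hi))
    H = mag * np.exp(-2j * np.pi * k * M / numtaps)    # linear phase
    return np.real(np.fft.ifft(H))

fs = 100
h = fir_freq_sampling(101, fs, 15, 25)

rng = np.random.default_rng(6)
x = np.convolve(rng.normal(0, 1, 1000), h, mode="same")  # transmitting signal
```

Filtering white noise through these taps concentrates the spectral content of x(n) in the 15-25 Hz band, as specified.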
The relevant time-domain plots of the transmitting and receiving signals are given,
respectively, in Figs. 4 and 5, and the corresponding frequency-domain plots in Figs. 6
and 7. Further, to mimic the analog versions of the above signals, the signals are defined
with a higher sampling frequency, 10 times the original fs, i.e., 1000 Hz. Successively, as
per (1), a composite signal modeling the received signal embedded in SI is formed with
β = 100 and random values for αᵢ, i ∈ {1, 2, 3}. Random values are used because, in
general, a wireless channel is time-varying in nature [11].
As per the sequence defined in Sect. 3.1, the foremost step is passive analog SI
cancelation. As this method is accomplished by means of physical structures, only the two
subsequent steps are accounted for in this paper. The next step is active analog
cancelation. In accordance with (5), an estimate of the linear SI is constructed as a linear
combination of 40 shifted versions of the transmitting signal. This estimate is subtracted
from the received composite signal to obtain y₂(t). The resultant signal, after passing
through an LNA and an ADC, is ready to be processed by the trained neural network
described in Sect. 3.2.
Once trained, the network is employed in the testing phase as a mere filter with the
weights obtained during training. The resultant signal is filtered, and the output is
subtracted from the partially SI-free signal (the digital signal at the input of the neural
network).
Fig. 4 Transmitting signal (SI signal) – Time domain
Fig. 5 Receiving signal (SOI) – Time domain
Fig. 6 Transmitting Signal (SI signal) – Frequency domain - Magnitude and phase response
Fig. 7 Receiving Signal (SOI) – Frequency domain – Magnitude and phase response
Fig. 8 Superimposed Receiving Signal (SOI) and SI canceled signal – 0 to 100 samples
Fig. 9 Superimposed Receiving Signal (SOI) and SI canceled signal – 100 to 200 samples
The resultant extracted signals are compared with the SOI and the relevant plots
are sketched after averaging over 5 Monte Carlo simulations in Figs. 8, 9, 10, 11, 12,
13, 14, 15, 16, and 17.
Fig. 10 Superimposed Receiving Signal (SOI) and SI canceled signal – 200 to 300 samples
Fig. 11 Superimposed Receiving Signal (SOI) and SI canceled signal – 300 to 400 samples
Fig. 12 Superimposed Receiving Signal (SOI) and SI canceled signal – 400 to 500 samples
Fig. 13 Superimposed Receiving Signal (SOI) and SI canceled signal – 500 to 600 samples
Fig. 14 Superimposed Receiving Signal (SOI) and SI canceled signal – 600 to 700 samples
Fig. 15 Superimposed Receiving Signal (SOI) and SI canceled signal – 700 to 800 samples
Fig. 16 Superimposed Receiving Signal (SOI) and SI canceled signal – 800 to 900 samples
Fig. 17 Superimposed Receiving Signal (SOI) and SI canceled signal – 900 to 1000 samples
Fig. 18 Composite signal (received by receiver) – Frequency domain – Magnitude and phase
response
Fig. 19 SI canceled signal – Frequency domain – Magnitude and phase response
Fig. 20 Superimposed SI signal, Receiving Signal (SOI), and Extracted Signal – zoomed view around sample 600
5 Conclusions
As described in Sect. 3.1, an effective conventional method of curbing the
self-interference that arises in in-band full-duplex communication is the hybrid
cancelation technique. It demands two separate optimum filters to suppress the linear and
nonlinear SI components, respectively, in the digital domain, and constructing them
individually by solving (9) and (10) is computationally expensive. Hence, the proposed
method, using a single neural network in place of the aforementioned optimum filters,
greatly reduces the computational expense without compromising the quality of the
results. This is because the neural network-based technique suppresses both linear and
nonlinear SI components jointly in a single step, as the results of the preceding section
demonstrate. Note that the suppression of SI is well achieved both in the time and
frequency domains. It is also worth noting that only harmonics up to the 5th, which have
the most potential to cause significant SI, are considered as nonlinear terms throughout
this paper and are hence canceled; other higher-order terms can safely be neglected.
Dimensionality Reduction of KDD-99
Using Self-perpetuating Algorithm
Abstract In this digitized world, a massive amount of data is available on networks, yet it is not safe and secure from the ever more sophisticated techniques of attackers. These threats create the need for intrusion detection systems (IDSs). The KDD-99 dataset is the standard benchmark for research on IDSs, but it suffers from the curse of dimensionality, as both the number of features and the total number of instances in the dataset are very large. In this paper, a self-perpetuating algorithm over individually analyzed feature selection techniques is proposed. The proposed algorithm produces a reduced subset of up to 14 features with reduced time; with the J48 algorithm, accuracy increases by 0.369% while the number of features decreases by 66.66%.
1 Introduction
Intrusion is the most damaging threat to any network, and attacks are spread in many forms. Research on network intrusion is therefore a major concern: new attacks are observed on systems and networks every single day, and organizations from small firms to large enterprises fall into their trap. Attackers devise ever more sophisticated techniques to penetrate networks, defeating the security tools of even large firms. There is consequently a need for systems that can fight every recent type of attack. To counter these threats, intrusion detection systems are built to detect attacks. Based on what they monitor, intrusion detection systems (IDSs) are categorized into two types, namely host-based IDSs and network-based IDSs. Host-based IDSs scan and examine a computer system's files and OS processes, while network-based IDSs do the same over network traffic. To develop such IDSs, the system must be trained so that efficient outputs are drawn without information loss. In this paper, the KDD-99 dataset is used to analyze the results. KDD-99 specifies attack types broadly in five classes: (a) DDoS, (b) U2R, (c) R2L, (d) Probe, and (e) normal. The pitfall of the KDD-99 dataset is its high dimensionality, approximately 42 features × 400 K instances, which leads to high time complexity for IDSs.
Before training any IDS, feature selection is performed on the data that will train the system to detect attacks. Targeting the feature selection process and decreasing time complexity is the major concern of the research in this paper. Feature selection techniques generally fall into three classes: (a) filter methods, (b) wrapper methods, and (c) embedded methods. Our goal is to uncover the technique that yields the best features by analyzing all three classes. Filter methods select features by analyzing their statistical dependence on the class variable; wrapper methods assess the adequacy of a feature subset by actually training an algorithm on it; and embedded methods produce features by analyzing each iteration of the learning algorithm itself. In this paper, the following feature selection techniques [1] are first analyzed individually:
1. CfsSubsetEval
2. ClassifierAttributeEval
3. ClassifierSubsetEval
4. GainRatioAttributeEval
5. InfoGainAttributeEval
6. OneRattributeEval
7. SymmetricalUncertAttributeEval
8. WrapperSubsetEval
After analyzing these algorithms individually, the proposed algorithm is applied to find the best feature selection technique with the fewest features and reduced time complexity.
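As a concrete illustration of the filter-style ranking that underlies techniques such as InfoGainAttributeEval, the following sketch ranks discrete features by information gain on a tiny invented stand-in for KDD-99; the dataset, feature values, and labels are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, f):
    """Information gain of discrete feature index f with respect to the labels."""
    base = entropy(labels)
    n = len(rows)
    # Partition instances by the feature's value and weight each child's entropy.
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[f], []).append(y)
    cond = sum(len(ys) / n * entropy(ys) for ys in parts.values())
    return base - cond

# Toy stand-in for KDD-99: 3 discrete features, binary class ("normal"/"attack").
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1)]
labels = ["normal", "normal", "attack", "attack", "normal", "attack"]

ranking = sorted(range(3), key=lambda f: info_gain(rows, labels, f), reverse=True)
print(ranking)  # → [0, 1, 2]: features 0 and 1 predict the class; feature 2 is noise
```

The highest-ranked features form the "potential" subset; the rest are discarded as unpotential, which is the step the proposed algorithm then iterates over the different evaluators.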
2 Related Work
This section sheds light on previous studies of feature reduction methods and the classification methods used to increase efficiency and reduce time complexity. In 2018, Umbarkar and Shukla [1] proposed heuristic-based feature reduction techniques for dimensionality reduction of the KDD-99 dataset. They considered only three feature selection techniques, viz. information gain, gain ratio,
Dimensionality Reduction of KDD-99 Using Self-perpetuating … 81
focused on increasing the accuracy, but the set of features taken remained the same; no reduction was made there. Although some papers presented work on feature reduction too, we take a general approach by considering all the feature selection techniques and extracting the best of them using the derived self-perpetuating algorithm.
3 Proposed Work
Intrusion detection systems (IDSs) are designed to detect attacks in the network by training the system on a predefined dataset and then applying the IDS to the system. The KDD-99 dataset is the standard benchmark, and the same dataset is used here to develop the self-perpetuating algorithm. The issue arises in the training phase of the system: the KDD-99 dataset, with 42 features, has huge dimensionality, which makes the training phase time-consuming. Studies show that not all features in the dataset are of equal importance, so the set of attributes can be reduced while even improving efficiency. Many researchers have already proposed algorithms to reduce the dimensionality of the dataset, but they studied a particular class of feature selection, leaving the other classes of feature selection techniques unanalyzed.
4 Experimental Setup
For the study, the KDD-99 dataset is used on a system with 8 GB of RAM running the Windows 10 operating system. Weka 3.8.4 is used to analyze the techniques. The feature selection techniques listed in [1] are applied to the entire KDD-99 training dataset. After applying the different feature selection techniques, a ranking of the features is derived, as shown in Table 1. From the feature rankings of the different feature selection algorithms, the most important features are selected for each category of feature selection technique, and the rest are treated as unpotential features, as they provide less information. The comparison of potential and unpotential attributes is shown in Fig. 1 (Table 2).
For the selected feature subsets, the J48 classification algorithm is applied to the KDD-99 training dataset. For each selected feature subset, after applying the classification algorithm in Weka 3.8.4, the resulting training accuracy and training time are shown in Table 3.
Figures 2 and 3 compare the different feature selection algorithms by their training accuracies and training times, respectively. Clearly, from both figures, WrapperSubsetEval achieves higher accuracy and lower training time than the original KDD-99 dataset with 42 features.
In the next stage, the feature subset selected by each feature selection method is considered and the J48 classification algorithm is trained on the reduced feature set. In the next
86 S. Umbarkar and K. Sharma
phase, accuracy and time complexity are calculated on the testing dataset. Table 4 gives the accuracy and time complexity of the J48 algorithm for the different reduced feature sets of the feature selection methods.
From Fig. 4, WrapperSubsetEval achieves higher accuracy, i.e., 92.218%, than the original KDD-99 dataset with 42 features. The accuracy obtained with the reduced feature subset is 0.133% higher than the original accuracy of KDD-99 calculated on all 42 features.
Table 4 Comparison of testing accuracy and time complexity of different feature selection methods

                                  Accuracy (%)   Time (s)
All features                      92.085         14.77
CfsSubsetEval                     87.174         129.2
ClassifierAttributeEval           85.411         10.278
ClassifierSubsetEval              92.049         123.64
Information_Gain                  91.829         5.234
Gain_Ratio                        91.807         11.678
OneRAttributeEval                 91.763         142.43
SymmetricalUncertAttributeEval    91.828         123.11
WrapperSubsetEval                 92.218         138.15
Moreover, the number of features is reduced from 42 to 5, i.e., a decrease of 88.095% in overall feature volume (Fig. 1).
In the next phase, the reduced feature set obtained from WrapperSubsetEval is combined with each of the other reduced feature subsets of the feature selection methods. The J48 classification algorithm is trained on the combined subsets, and accuracy and time complexity are calculated on the testing dataset, as shown in Table 5.
From Fig. 6, WrapperSubsetEval combined with Gain_Ratio achieves higher accuracy, i.e., 92.454%, than the original KDD-99 dataset with 42 features. The accuracy obtained with the reduced feature subset is 0.369% higher than the original accuracy of KDD-99 calculated on all 42 features. Moreover, the number of features is reduced from 42 to 14, i.e., a decrease of 66.66% in overall feature volume (Fig. 1).
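The union step itself is straightforward set arithmetic, sketched below. The member indices of each subset are hypothetical placeholders (the paper does not list which of the 42 features were selected); only the cardinalities are chosen to reproduce the reported 5-feature wrapper subset and 14-feature combined subset.

```python
# Hypothetical reduced subsets (indices into the 42 KDD-99 features); the
# actual subsets chosen by WEKA's evaluators are not listed in the text.
wrapper_subset = {3, 5, 23, 24, 36}                         # WrapperSubsetEval, 5 features
gain_ratio_subset = {2, 3, 4, 12, 23, 25, 26, 29, 30, 33, 35, 36}

combined = wrapper_subset | gain_ratio_subset               # union, as in Table 5
reduction = (1 - len(combined) / 42) * 100
print(len(combined), round(reduction, 2))                   # → 14 66.67
```

The combined subset is then handed back to J48 for testing, which is how the best row of Table 5 (WrapperSubsetEval Union Gain_Ratio) is obtained.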
In this paper, with the help of the proposed self-perpetuating algorithm, we reduced the feature subset to 14 features. On the reduced feature subset, testing accuracy and time complexity are calculated and are better than those of the original dataset with 42 features. The testing accuracy is increased by 0.369% and the number of features
Table 5 Testing accuracy and time complexity of feature selection techniques with WrapperSubsetEval

                                                          Accuracy (%)   Time (s)
ClassifierSubsetEval Union WrapperSubsetEval              92.064         1949.55
ClassifierSubsetEval Union Info_Gain                      91.93          243.68
CfsSubsetEval Union WrapperSubsetEval                     92.204         2687.01
WrapperSubsetEval Union Info_Gain                         92.014         159.35
WrapperSubsetEval Union Gain_Ratio                        92.454         161.29
WrapperSubsetEval Union OneRAttributeEval                 92.025         154.45
WrapperSubsetEval Union SymmetricalUncertAttributeEval    91.99          144.32
WrapperSubsetEval Union ClassifierAttributeEval           92.0792        228.9
decreased by 66.66%. Thus, the proposed algorithm successfully reduced the dimensionality of the KDD-99 dataset. In the future, this work can be extended by applying variations of mathematical operations to obtain the reduced feature set. Further comparison of feature selection methods can be done by considering different classification algorithms such as decision tree, Naïve Bayes, KNN, etc.
1 Introduction
Typically, WSNs consist of a finite set of resource-constrained tiny devices, such as sensors and actuators, deployed in a field to investigate physical and environmental phenomena of interest. These small devices are equipped with limited power, little storage, short-range radio transceivers, and limited processing; they therefore have not only sensing capability but also data processing and communication capabilities. In sensor networks, nodes are densely distributed in a field and cooperatively carry out an allotted function such as environmental monitoring (for example, temperature, air quality, noise, humidity, animal movements, water quality, or pollutants), industrial process control, battlefield surveillance, healthcare control, home intelligence, and security and surveillance intelligence [1]. Traditional WSNs contain
S. Mekala (B)
Department of CSE, Mahatma Gandhi University, Nalgonda, Telangana, India
K. Shahu Chatrapati
Department of CSE, JNTUH CEM, Peddapalli, Telangana, India
many sensor devices that coordinate to accomplish a common task within an environmental area. Each sensor contains a small microcontroller that handles its communications [2]. Effective route discovery models can be used to increase the network lifetime by consuming limited power during communication activities [3–5].
A fundamental operation in WSNs is discovering all the neighbors of a sensor node, called neighbor discovery, which is one of the primitive functionalities for many sensor networking applications. It is a challenging operation, since the number of neighbors of a node cannot be predicted accurately. Neighbor discovery under limited energy is the essence of network formation: it governs sensor network setup and normal operations (such as routing and topology control) and prolongs the lifetime of WSNs. To address the design problems of energy-efficient neighbor discovery, recent literature classifies the neighbor discovery process into three categories: probabilistic, deterministic, and quorum-based. Typical neighbor discovery mechanisms focus on reducing power consumption by limiting the active periods of sensor nodes in the network [1].
A neighbor discovery protocol (NDP) is a scheme for finding neighbor nodes. In symmetric neighbor discovery, every node in the WSN shares a common duty cycle, whereas in asymmetric approaches nodes use independent duty cycles. In WSNs, nodes operate on limited battery power in resource-constrained environments; hence, NDPs need to handle both asymmetric and symmetric duty cycles efficiently. The lack of support for neighbor discovery in asymmetric networks is therefore a considerable limitation of block-design-based NDPs [4, 6]. In WSNs, sensor nodes operate in several modes by adopting a low duty cycle (i.e., nodes have a shorter active period than sleep period). Two primary metrics are used to evaluate the energy efficiency of nodes in the sensor network: discovery latency and duty cycle. However, a low-duty-cycled sensor node stays in standby mode for a particular period before waking its neighbors, which leads to considerable delay; in general, a small duty cycle causes a longer discovery latency and vice versa. Moreover, in some WSN applications, the dynamic nature of the devices causes constant changes in network topology, so the neighbor set changes from time to time. Hence, finding neighbor nodes with limited power consumption and low discovery latency is a difficult problem [5, 7].
As discussed in [2], the neighbor discovery methods U-Connect, Disco, and quorum-based schemes incur high latency and energy consumption compared with block-design-based neighbor discovery methods; however, the latter involve high complexity and computation overhead, since a new block design must be constructed. The collaborative neighbor discovery mechanism for directional WSNs [8] applies beamforming for neighbor discovery, but during beamforming the appropriate sector number and beam direction must be chosen for better results
Energy-Efficient Neighbor Discovery Using Bacterial … 95
[8–11]. Moreover, this technique does not address scheduling the duty cycle according to the nodes' energy levels to prolong the lifetime of the network. To solve these issues, an energy-efficient collaborative neighbor discovery mechanism using BFOA is proposed.
2 Related Works
3.1 Overview
a. Fitness Function
The fitness function for BFOA is expressed in terms of the switch delay F_switch, the set of disks W, and a delay tuning parameter of the form ρ × (·), where ρ is selected such that the latency in every layer is reduced.
E_res = E_i − (E_tx + E_rx)    (2)

A1 = ∫_{t_ini}^{t_ini + T} DPR(t) dt    (3)

A2 = ∫_{t_ini}^{t_ini + T_s + T_tx} DPT(t) dt    (4)

The following equation gives the whole packet volume aggregated by the receiver node, excluding T_tx:

A3 = ∫_{t_ini}^{T − T_tx} DPR(t) dt    (5)

We consider DPT(t) and DPR(t) as a low arrival cost. Then the following equations estimate the duty cycle by considering the mean arrival cost:

T < (Z − n + DPT(t) · T_tx) / DPR(t)    (6)

T < (Z − n) / DPR(t) + T_tx    (7)

where

T_d = 1 / (BW · κ)    (9)

with BW the channel utilization (bandwidth) and κ the clock moment cost.
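To make the bookkeeping concrete, here is a small numeric sketch of Eqs. (2) and (7). The symbol meanings (Z and n as packet counts, DPR as a mean reception rate) and all numeric values are assumptions for illustration, since some of the definitions are missing from the text.

```python
def residual_energy(e_init, e_tx, e_rx):
    # Eq. (2): energy left in a node after one transmit/receive exchange.
    return e_init - (e_tx + e_rx)

def wakeup_period_bound(z, n, dpr, t_tx):
    # Eq. (7): upper bound on the wake-up period T so the receiver can drain
    # the backlog of Z - n packets at mean reception rate DPR, plus one
    # transmission slot T_tx. Interpretations of Z, n, DPR are assumed.
    return (z - n) / dpr + t_tx

print(residual_energy(100.0, 2.5, 1.5))        # → 96.0 (J remaining)
print(wakeup_period_bound(50, 10, 8.0, 0.4))   # → 5.4 (T must stay below this)
```

A node would compare its chosen duty-cycle period against this bound to avoid packet backlog while still sleeping as long as possible.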
d. Polling Neighbors using BFO
In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbors along its layer. The layer number and beam direction are taken as the inputs of the fitness function for BFOA.

Let K(c, r, d) = α^i(c, r, d), i = 1, 2, 3, …, N    (10)

The above equation gives the location of every member of the population of S nodes at a given instant.
Let LTN be the node lifetime in the sensor network, measured in chemotactic stages.
Let C > 0 denote the elementary chemotactic step size, which describes the distance covered between stages in the course of tumbles.
Let α be the unit-length random direction vector; using a unit length, the direction of movement after a tumble can be assessed.
The following sequence of steps is involved in the design of the optimization algorithm:
100 S. Mekala and K. Shahu Chatrapati
Si --(beacon)--> neighbor sensor node set
Si <--(REPLY)-- neighboring nodes
On receiving the reply messages, the polling node obtains the complete details of its neighbors along with their duty schedules and energy levels. During data transmission, the active nodes with high residual energy are selected from among the neighbors.
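The chemotactic search that drives the polling can be sketched as below. This is a generic bacterial-foraging tumble-and-swim loop over a toy two-dimensional fitness standing in for (layer number, beam direction); the reproduction and elimination-dispersal phases, and all parameter values, are simplifications for illustration, not the authors' exact algorithm.

```python
import random

def chemotaxis(cost, dim, n_bact=6, steps=40, step_size=0.1, n_swim=4, seed=1):
    """Minimal bacterial-foraging chemotaxis loop (tumble + swim only;
    reproduction and elimination-dispersal are omitted for brevity)."""
    rng = random.Random(seed)
    bacteria = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_bact)]
    for _ in range(steps):
        for b in bacteria:
            j_old = cost(b)
            # Tumble: pick a unit-length random direction (the vector alpha).
            d = [rng.gauss(0, 1) for _ in range(dim)]
            norm = sum(x * x for x in d) ** 0.5 or 1.0
            d = [x / norm for x in d]
            # Swim: keep stepping while the fitness keeps improving.
            for _ in range(n_swim):
                cand = [bi + step_size * di for bi, di in zip(b, d)]
                if cost(cand) < j_old:
                    b[:] = cand
                    j_old = cost(cand)
                else:
                    break
    return min(bacteria, key=cost)

# Toy fitness standing in for (layer, beam-direction) quality: squared
# distance to an assumed ideal setting (0.5, -0.2).
best = chemotaxis(lambda p: (p[0] - 0.5) ** 2 + (p[1] + 0.2) ** 2, dim=2)
print(best)  # should land near (0.5, -0.2)
```

In the protocol, the fitness would instead score how many undiscovered neighbors a (layer, direction) beam reaches, and the winning setting is used for the beacon poll.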
4 Simulation Results
Impact of varying the number of nodes To analyze the effect of node density in the sensor network, we vary the number of nodes in the area from 50 to 200.
Figure 1 shows the discovery delay of EENDBFO and COND when the number of nodes is varied over 50, 100, 150, and 200. As the figure shows, the discovery delay of EENDBFO decreases from 7.5 to 3.2, while that of COND decreases from 10.0 to 5.1; the discovery delay of EENDBFO is 36% smaller than that of COND.
Figure 2 depicts the discovery ratio of EENDBFO and COND as the number of nodes in the network varies. As the nodes grow from 50 to 200, the discovery ratio of EENDBFO grows from 0.40 to 0.54 and that of COND from 0.20 to 0.45. The analysis shows that the discovery ratio of EENDBFO is 33% larger than that of COND.
Figure 3 shows the number of data packets received by EENDBFO and COND when the number of nodes is varied over 50, 100, 150, and 200. The simulation results show that packet reception for EENDBFO extends from 1233 to 1688, while that for COND extends from 809 to 1381; hence EENDBFO receives 24% more packets than COND.
Figure 4 shows the average residual energy of EENDBFO and COND as the number of nodes varies. In EENDBFO, the average remaining energy of a node decreases from 11.8 to 11.4 J, while in COND it decreases from 7.4 to 5.1 J. The average residual energy of EENDBFO is thus 44% larger than that of COND.
the discovery delay of COND increased from 4.6 to 7.0 s. The discovery delay of EENDBFO is 26% smaller than that of COND.
Figure 6 depicts the node discovery ratio of EENDBFO and COND when the number of nodes in the sensor network is varied. In the simulation results, as the nodes range from 200 to 400, the node discovery ratio of EENDBFO decreases from 0.70 to 0.59, while that of COND decreases from 0.43 to 0.26. The analysis clearly shows that the discovery ratio of EENDBFO is 50% higher than that of COND.
Figure 7 shows the number of data packets received by EENDBFO and COND when the number of nodes is varied over 200, 250, 300, 350, and 400. The simulation results show that packet reception for EENDBFO extends from 1757 to 2149 and that for COND from 1439 to 1750; hence EENDBFO receives 20% more packets than COND.
Figure 8 shows the average residual energy of EENDBFO and COND as the number of nodes varies. In EENDBFO, the average remaining power of a node increases from 10.4 to 11.4 J, while in COND it increases from 5.5 to 6.9 J. The average residual energy of EENDBFO is 44% larger than that of COND.
5 Conclusion
In this paper, we have developed EENDBFOA for directional WSNs. In this algorithm, each node with a directional antenna performs beamforming using BFO to poll the neighbours along its sector. During the polling stage, the appropriate active nodes with higher energy levels can be selected from the neighbours for data transmission. Simulation results show that the proposed EENDBFOA minimizes discovery delay and energy consumption and increases the discovery ratio.
References
1. Manir SB (2015) Collective neighbor discovery in wireless sensor network. Int J Comput Appl
(0975–8887), 131(11)
2. Choi S, Lee W, Song T, Youn J-H (2015) Block design-based asynchronous neighbor discovery
protocol for wireless sensor networks. J Sens 2015. Article ID 951652, 12 p
3. Selva Reegan A, Baburaj E (2015) An effective model of the neighbor discovery and energy efficient routing method for wireless sensor networks. Indian J Sci Technol 8(23). https://doi.org/10.17485/ijst/2015/v8i23/79348, Sept 2015
4. Lee W, Song T-S, Youn J-H (2017) Asymmetric neighbor discovery protocol for wireless sensor
networks using block design. Int J Control Autom 10(1):387–396
5. Sun W, Yangy Z, Wang K, Liuy Y (2014) Hello: a generic flexible protocol for neighbor
discovery. IEEE
6. Qiu Y, Li S, Xu X, Li Z (2016) Talk more listen less: energy-efficient neighbor discovery in
wireless sensor networks. IEEE
7. Karthikeyan V, Vinod A, Jeyakumar P (2014) An energy-efficient neighbour node discovery method for wireless sensor networks. arXiv preprint arXiv:1402.3655, 2014
8. Nur FN, Sharmin S, Ahsan Habib M, Abdur Razzaque M, Shariful Islam M, Almogren A,
Mehedi Hassan M, Alamri A (2017) Collaborative neighbor discovery in directional wireless
sensor networks: algorithm and analysis. EURASIP J Wireless Commun Netw 2017:119
9. Chen L, Shu Y, Gu Y, Guo S, He T, Zhang F, Chen J (2015) Group-based neighbor discovery
in low-duty-cycle mobile sensor networks. IEEE Trans Mobile Comput
10. Agarwal R, Banerjee A, Gauthier V, Becker M, Kiat Yeo C, Lee BS (2011) Self-organization
of nodes using bio-inspired techniques for achieving small-world properties. IEEE
11. Das S, Biswas A, Dasgupta S, Abraham A (2009) Bacterial foraging optimization algorithm:
theoretical foundations, analysis, and applications. Foundations of computational intelligence,
vol 3. Springer, Berlin, pp 23–55
12. Amiri E, Keshavarz H, Alizadeh M, Zamani M, Khodadadi T (2014) Energy efficient routing
in wireless sensor networks based on fuzzy ant colony optimization. Int J Distrib Sensor Netw
2014. Article ID 768936, 17 p
13. Meghashree M, Uma S (2015) Providing efficient route discovery using reactive routing in
wireless sensor networks. Int J Res Comput Appl Robot 3(4):145–151
14. Sathees Lingam P, Parthasarathi S, Hariharan K (2017) Energy efficient shortest path routing
protocol for wireless sensor networks. Int J Innov Res Adv Eng (IJIRAE) 4(06):2349–2163
Auto-encoder—LSTM-Based Outlier
Detection Method for WSNs
Abstract Wireless sensor networks (WSNs) have attracted tremendous interest from various real-life applications, in particular environmental applications. With such long-deployed sensors, it is difficult to check the features and quality of the raw sensed data. After deployment, sensor nodes may be exposed to harsh conditions, which can cause sensors to stop working or to send inaccurate data. If such faults are not detected, the quality of the sensor network can be greatly reduced. Outlier detection ensures the quality of the sensor data through safe and sound monitoring and consistent detection of interesting and important events. In this article, we propose a novel method, called the smooth auto-encoder, to learn strong and discriminative feature representations; the reconstruction error between the input and output of the smooth auto-encoder is utilized as an activation signal for outlier detection. Moreover, we employ an LSTM-bidirectional RNN with majority voting for collective outlier detection.
1 Introduction
precision while consuming minimal resources of the sensor node or the network [7, 8, 14–16].
Research on outlier detection has produced numerous techniques, and the area is in a stage of great improvement for sensor networks. Machine learning (ML) models produce excellent outcomes with high accuracy when given prepared, labeled datasets; in WSNs, which are placed in real-time applications, it is difficult to obtain labeled data. Deep learning (DL) is a subdivision of ML; with the help of numerous nonlinear transformations, DL permits networks to automatically learn representations from raw sensed data. Traditional ML methods generally need considerable domain expertise as well as time to select first-rate features from the raw sensed data. DL simplifies this process of manual feature engineering, overcoming the limitations of traditional ML models [3–6, 13, 14].
In 2006, Hinton drew researchers' attention to DL. In his work, Hinton proposed a technique to train a deep neural network (DNN): first, unsupervised greedy layer-wise pre-training was employed to locate a set of reasonably good parameters; then the complete network was fine-tuned, which effectively avoids the problem of vanishing gradients.
Given these advances, DL and representation learning have been employed in the field of outlier detection in WSNs as well. In contrast to traditional ML, DL offers more capability and shows promising progress for WSNs. One advantage is high prediction accuracy: ML cannot analyze all the complex parameters, such as channel variation and obstructions, but DL can efficiently abstract all of this layer by layer. In addition, there is no need to preprocess the input data, because DL typically extracts the feature parameters directly from the raw input; this advantage of DL reduces design complication and increases forecast precision [7, 8, 10–12, 14–16]. Instead of designing features manually, it is more helpful for a model to learn efficient feature representations directly from raw data through representation learning. For a variety of computer vision tasks, an ideal feature representation should be robust to small variations, smooth so as to preserve data structures, and discriminative for classification-related tasks. DL provides a high success rate in WSN applications, since DL tolerates incomplete or erroneous raw sensed data, can easily handle a large amount of input information, and has the capability to make control decisions [1–4, 7, 8, 14–16].
In this manuscript, we present an auto-encoder modification, the smooth auto-encoder (SmAE), intended to learn strong, robust, and discriminative feature representations. It differs from standard AEs, which reconstruct every example from its own encoding; instead, we utilize the encoding of every instance to reconstruct its local neighbors. In this way, the learned representations are consistent with local neighbors and robust to small deviations of the inputs.
112 B. Chander and K. Gopalakrishnan
In [15], the authors designed a novel outlier detection model that learns the spatio-temporal relationships among dissimilar sensors and employs the learned representation for recognizing outliers. They utilized a SODESN-based distributed RNN architecture along with a learning method to train SODESN. The authors simulated the designed model with real-world collected data, and the outcomes show excellent detection even with inadequate link qualities. In [16], the authors proposed two outlier detection approaches, LADS and LADQA, especially for WSNs. They employed QS-SVM and converted the problem to a sorting problem to decrease linear computation complications. The experimental outcomes confirm that the proposed approaches have lower computation with high accuracy of outlier detection. The authors of [7] took a different approach: they proposed a deep auto-encoder to discover outliers from the spectrum of sensor nodes by comparison against normal data with a fixed threshold value. Evaluation was done with various numbers of hidden layers, and the results achieve better performance. The authors of [8] built a model for varied WSNs to detect outliers automatically with the help of cloud data analysis. The experimental evaluation of the proposed process was performed on both edge and cloud tests on real data obtained in an indoor building environment and then corrupted with a series of synthetic impairments. The obtained outcomes show that the proposed process can self-adapt to environmental deviations and properly classify the outliers. The authors of [9] prepared a novel outlier detection method, Toeplitz support vector data description (TSVDD), for efficient outlier detection; they utilized the Toeplitz matrix for random feature mapping, which decreased both space and time complications. Moreover, a new model selection criterion was employed to make the model stable with lower-dimensional features. The experimental results on IBRL datasets reveal that TSVDD reaches higher precision and lower time complexity in comparison with existing methods. Reference [10] proposed a one-class collective outlier detection method based on an LSTM-RNN neural network. The model is trained with standard time series data; a prediction error above the threshold for a definite number of the most recent time steps is a sign of a collective outlier. The model is evaluated on a time series version of the KDD-1999 dataset, and simulations show that the proposed model can detect collective anomalies efficiently.
The authors of [11] designed a model that forecasts the subsequent short-range frame from the preceding frames by employing an LSTM with a denoising auto-encoder. Here, the reconstruction error between the input and output of the auto-encoder is employed as an activation signal to sense novel events. In [12], the authors proposed a novel technique in which a deep auto-encoder is deployed as a central classifier, the model is trained with a cross-entropy loss function, and momentum factors are included in the back-propagation model to resolve the issues of weight updating. Lab experiments on datasets have shown that the proposal has high-quality precision of feature extraction. In [17], the authors employed a novel LSTM for detection of outliers from time-based data a number of time steps ahead. The prediction
Auto-encoder—LSTM-Based Outlier Detection … 113
error of a solitary point was subsequently calculated by forming its forecast error vector and fitting a multivariate Gaussian distribution, which was employed to evaluate the probability of outlier behavior. The authors of [18] proposed a model merging predictive auto-encoders with an LSTM for acoustic outlier events. They computed a novel reconstruction error on the auto-encoder; a data instance above the threshold is flagged as a novel event. The design of [18] is also utilized in [19], where LSTM-RNNs are engaged to predict short-range frames.
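The prediction-error-thresholding idea recurring in [10, 17, 18] can be sketched as follows. For self-containment, the predictor here is simple exponential smoothing rather than an actual LSTM-RNN, and the data, threshold, and minimum run length are invented for illustration.

```python
def prediction_errors(series, alpha=0.5):
    """One-step-ahead prediction errors from exponential smoothing -- a
    lightweight stand-in for the LSTM-RNN predictor used in the cited work."""
    pred, errs = series[0], []
    for x in series[1:]:
        errs.append(abs(x - pred))
        pred = alpha * x + (1 - alpha) * pred
    return errs

def collective_outliers(series, threshold, min_run=3):
    """Flag a collective outlier when >= min_run consecutive prediction
    errors exceed the threshold (cf. the LSTM-RNN scheme of [10])."""
    errs = prediction_errors(series)
    runs, start = [], None
    for t, e in enumerate(errs):
        if e > threshold:
            start = t if start is None else start
        else:
            if start is not None and t - start >= min_run:
                runs.append((start, t))
            start = None
    if start is not None and len(errs) - start >= min_run:
        runs.append((start, len(errs)))
    return runs

# Normal readings around 20.0, then a sustained burst (a collective outlier).
data = [20.0, 20.1, 19.9, 20.0, 35.0, 36.0, 35.5, 36.2, 20.0, 20.1]
print(collective_outliers(data, threshold=2.0))  # → [(3, 9)]
```

Note that the flagged run extends slightly past the burst because the smoother lags when readings return to normal; a trained sequence model would recover faster.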
3 Proposed Model
J_SmAE(θ) = Σ_{i=1}^{n} Σ_{j=1}^{k} w_n(x_j, x_i) · L(x_j, g(f(x_i))) + β Σ_{j=1}^{d_h} KL(ρ ‖ ρ_j)
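A minimal numeric sketch of the SmAE objective defined above follows, under assumed choices (sigmoid encoder f, linear decoder g, Gaussian neighbor weights w_n, squared-error loss L, and arbitrary small fixed weights); none of these specifics are given in the text.

```python
import math

def sigmoid(v):
    return [1 / (1 + math.exp(-x)) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kl(rho, rho_hat):
    # Sparsity penalty term KL(rho || rho_j) on each hidden unit's mean activation.
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))

def smae_loss(X, W, W2, k=2, beta=0.1, rho=0.05, sigma=1.0):
    """J_SmAE: each x_i's code reconstructs its k nearest neighbors,
    weighted by a Gaussian kernel w_n, plus a KL sparsity penalty."""
    H = [sigmoid(matvec(W, x)) for x in X]          # encoder f
    recon = [matvec(W2, h) for h in H]              # decoder g
    n, total = len(X), 0.0
    for i in range(n):
        nbrs = sorted(range(n), key=lambda j: sq_dist(X[j], X[i]))[:k]
        w = [math.exp(-sq_dist(X[j], X[i]) / sigma) for j in nbrs]
        s = sum(w)
        total += sum((wj / s) * sq_dist(X[j], recon[i]) for wj, j in zip(w, nbrs))
    d_h = len(H[0])
    rho_hat = [sum(h[j] for h in H) / n for j in range(d_h)]
    return total / n + beta * sum(kl(rho, r) for r in rho_hat)

# Tiny illustrative data: 4 samples, 3 inputs, 2 hidden units, fixed small weights.
X = [[0.0, 0.1, 0.2], [0.1, 0.1, 0.2], [1.0, 0.9, 0.8], [0.9, 1.0, 0.8]]
W = [[0.1, -0.2, 0.1], [0.05, 0.1, -0.1]]
W2 = [[0.2, -0.1], [0.1, 0.1], [-0.1, 0.2]]
print(smae_loss(X, W, W2))  # positive scalar objective
```

Training would minimize this objective over W and W2; at test time, the same neighbor-weighted reconstruction error serves as the activation signal for outlier detection.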
Over the past few years, the long short-term memory recurrent neural network (LSTM-RNN) has been applied to represent the association between current and preceding events, and it handles time-series problems efficiently. In general, an LSTM-RNN is not only trained on normal data; it is also able to forecast quite a few time steps ahead of an input. Most of the methods in the related work estimate outliers at the individual level rather than at the collective level; moreover, they use both normal and outlier data in the training phase. Architecturally, an LSTM consists of an input layer, an LSTM hidden layer and an output layer. The input nodes take the input data, and the output applies some transform (sigmoid, tanh, etc.). The LSTM hidden layer is formed from a number of memory cells that are fully connected to the input and output nodes. Gradient descent with back-propagation is among the well-known techniques used to optimize the loss function and update the parameters. As discussed above, an LSTM can integrate behaviour into a network by training it on normal data, so that the network becomes representative of the variants of the data. In detail, a prediction is made from two characteristics: first, the value of an example, and second, its position at a definite time. This implies that two identical input values at different times may produce two different outputs, because the LSTM-RNN is stateful: it has a memory that varies in response to its inputs.
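The statefulness just described (a memory that varies in response to the inputs, so identical inputs at different times can yield different outputs) comes from the cell state. A minimal NumPy sketch of a single LSTM step illustrates this; the weight matrices W, U and bias b are hypothetical placeholders stacking the four gate blocks, not the paper's trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell; W, U, b stack the
    input, forget, output and candidate gate blocks."""
    z = W @ x_t + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g       # cell state: the memory carried over time
    h = o * np.tanh(c)           # hidden output for this time step
    return h, c
```

Feeding the same x_t with two different carried cell states c_prev produces two different outputs h, which is exactly the stateful behaviour the model relies on.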
We therefore design a new collective outlier detection technique based on an LSTM with a bidirectional RNN. The LSTM-RNN exploits the correlation between preceding and current time steps to approximate an outlier score for every time step, which extends time-series outlier detection. In addition, bidirectional RNNs are used to access context from both temporal directions: the input data are processed in both directions through two separate hidden layers and then forwarded to the output layer. Combining bidirectional RNNs with LSTM memory blocks yields a bidirectional-LSTM set-up in which the perspective from both temporal directions is exploited. This supports collective outlier detection over sequences of individual data points based on their outlier scores. We train an LSTM-RNN on normal data to learn ordinary behaviour, and the trained model is verified on normal validation sets to estimate the model parameters. The resulting classifier is then used to score the outlier likelihood of a particular data instance at every time step. The outlier score of a sequence of time steps is aggregated from the contribution of every individual step. Using a fixed threshold, a sequence of individual time steps is labelled a collective outlier if its outlier score exceeds the threshold.
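The scoring-and-aggregation step can be sketched as follows; the trained bidirectional-LSTM auto-encoder is abstracted here as a generic reconstruct function, and the sequence length and threshold are illustrative placeholders rather than the paper's calibrated values:

```python
import numpy as np

def collective_outlier_flags(x, reconstruct, seq_len=10, threshold=1.0):
    """Per-step outlier scores from reconstruction error, aggregated over
    sliding sequences and compared against a fixed threshold."""
    x = np.asarray(x, dtype=float)
    errors = np.mean((x - reconstruct(x)) ** 2, axis=-1)   # score per time step
    n = len(errors) - seq_len + 1
    agg = np.array([errors[i:i + seq_len].sum() for i in range(n)])
    return errors, agg > threshold
```

A sequence is flagged as a collective outlier only when the aggregated contribution of its individual steps crosses the threshold, even if no single step is extreme on its own.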
For better accuracy, we made a number of initial assessments to find the best network configuration, varying the number of hidden layers and their sizes. The best network draft for the RNNs holds three hidden layers with 156-256-156 LSTM units, and the best BRNN layout contains six hidden layers, three for each direction, with 216 LSTM units each.
116 B. Chander and K. Gopalakrishnan
Network weights are iteratively updated with standard gradient descent through back-propagation of the sum of squared errors (SSE). The gradient descent technique requires the network weights to be initialized with nonzero values; as a result, we initialize the weights from a random Gaussian distribution with mean 0 and standard deviation 0.1.
Threshold value
In the designed model, both the input and output layers of the network contain 54 units, so the trained auto-encoder is able to reconstruct every example, and novel events are detected by processing the reconstruction error with an adaptive threshold. For each time step, the Euclidean distance between every input value and the corresponding network output is calculated. The distances are summed and divided by the number of coefficients to represent the reconstruction error of each time step with a single value. For the best possible event detection, a threshold θth is applied to obtain a binary signal. The threshold is relative to the median of the error signal of a sequence e0, scaled by a multiplicative coefficient β constrained to the range βmin = 1 to βmax = 2:
θth = β · median(e0)
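A small sketch of this thresholding rule under the stated reading (per-step error = per-coefficient distances summed and divided by the number of coefficients; threshold = β times the median of the error signal, with β clamped to [1, 2]); the value of β used here is illustrative:

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # Per-coefficient distances summed and divided by the number of
    # coefficients, giving one error value per time step.
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float))
    return d.sum(axis=-1) / d.shape[-1]

def adaptive_threshold(e0, beta):
    """theta_th = beta * median(e0), with beta clamped to [1, 2]."""
    beta = float(np.clip(beta, 1.0, 2.0))
    return beta * np.median(e0)

def binary_events(e0, beta=1.5):
    # Binary event signal: True where the error exceeds the threshold.
    return e0 > adaptive_threshold(e0, beta)
```

Because the threshold tracks the median of the error sequence itself, it adapts to the overall noise level of each sequence rather than relying on one global constant.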
4 Experimental Results
For the experimental results, we consider a benchmark data set collected from the WSN deployed at the Intel Berkeley Research Laboratory (IBRL). The data were gathered with the TinyDB in-network query-processing system, which is built on the TinyOS platform. The deployment comprises 54 Mica2Dot sensor nodes placed in the IBRL for 30 days, nearly 720 h. The sensors collect data in five dimensions (voltage in volts, light in lux, temperature in degrees Celsius, humidity ranging from 0 to 100%, and network-topology position) at 30 s intervals. In the IBRL set-up, node 0 is the gateway node, and the remaining nodes broadcast their data to node 0 over one or more hops; the farthest nodes deliver their sensed data over at most 10 hops. Over the 720 h, the 54 nodes collected almost 2.3 million readings.
For the experiment on the proposed model, we prepared a labelled test set, because the original environmental data carry no labels indicating which readings are normal and which are outliers. We chose three dimensions: humidity, temperature and voltage. We employed k-fold cross-validation to reduce the samples to half their size. Each dimension holds 5000 samples for training, 1000 samples for validation and 2000 samples for testing. In our technique, we apply the unsupervised target-neighbour scheme to exemplify the weight function; furthermore, the network is fine-tuned with an RBF kernel. Hyper-parameters such as the layer sizes, the sparsity penalty and the kernel bandwidths were chosen on the validation set. Table 1 reports the model accuracy, precision and error rates; moreover, we compare the proposed model with the existing AE, DAE, CAE, AE-2, DAE-2, CAE-2 and SmAE-2 models (the suffix 2 indicates the stacked version built with two hidden layers). The results show that the proposed model has high accuracy as well as a low error rate.
Two labels, normal and outlier, are prepared; the data set holds nearly 5000 normal and 400 abnormal samples. We employed k-fold cross-validation to compress the samples to half their size. After various testing runs, we settled on the best network model, trained with a momentum of 0.9, learning rates l = {1e−3 to 1e−7} and different noise sigma values σ = {0.25, 0.5}. 54–20–54, 54–54–54 and 54–128–54 are the best network topologies, so we maintain the same network set-ups across every test for a fair comparison. Each network topology is trained and evaluated for 50 epochs.
Table 2 shows the overall evaluation of the proposed method against other accessible up-to-date techniques; the proposed method gives the best results, with precision, recall and F-measure up to 96.89, 94.43 and 95.90 at an input noise standard deviation of 0.5 (see Table 3). We conducted numerous experiments with numerous network layouts for each network style; however, we report only the best standard layout results. With input noise deviations of 0.1 and 0.25, both BLSTM-AE and LSTM-SmAE produce higher precision values of nearly 91.89, 93.46, and 92.24,
Table 2 Performance evaluation of the designed method with various network layouts and existing methods

Method (54–54–54, 54–128–54)      Precision       Recall          F-measure
LSTM-AE / BLSTM-AE                89.1, 90.24     86.90, 88.29    85.24, 86.32
LSTM-DAE / BLSTM-DAE              92.23, 92.85    91.63, 91.45    93.48, 92.69
LSTM-CAE / BLSTM-CAE              94.41, 94.90    92.81, 93.14    93.43, 94.17
LSTM-SmAE / BLSTM-SmAE            96.77, 96.89    93.32, 94.43    95.73, 95.90
[Figure: Precision comparison of the evaluated network layouts (values ranging roughly from 84 to 98)]
94.64. In the end, the achieved results show that employing the smooth auto-encoder with the different BLSTM proposals is valuable; moreover, a significant performance improvement with respect to the state of the art is obtained.
For collective outlier detection, we observe the prediction errors of successive data points. For this, we calculate the relative error, the collective error and the prediction error. The relative error measures the discrepancy between a real value x and its prediction x̂ from the BLSTM-RNN at each time step; in equation form, RE(x, x̂) = |x − x̂|. The prediction error threshold (PET) determines whether a particular time-stamp value is considered normal or a candidate point of a collective outlier: if the RE exceeds the calculated PET, the point is marked as part of a collective outlier. Finally, the collective range identifies collective outliers based on the minimum number of outlier points appearing in succession in a network flow.
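Under these definitions, the final step, flagging any run of at least a minimum number of consecutive points whose relative error exceeds the PET, can be sketched as follows; the PET and minimum run length used here are illustrative values, not the paper's calibrated ones:

```python
import numpy as np

def collective_outliers(values, preds, pet, min_run=3):
    """Flag runs of at least `min_run` consecutive points whose relative
    error RE = |x - x_hat| exceeds the prediction error threshold (PET)."""
    re = np.abs(np.asarray(values, dtype=float) - np.asarray(preds, dtype=float))
    flagged = re > pet
    runs, start = [], None
    for i, f in enumerate(flagged):
        if f and start is None:
            start = i                       # a run of outlier points begins
        elif not f and start is not None:
            if i - start >= min_run:
                runs.append((start, i))     # run is long enough to report
            start = None
    if start is not None and len(flagged) - start >= min_run:
        runs.append((start, len(flagged)))  # run extends to the end
    return runs
```

Isolated threshold crossings shorter than the collective range are ignored, which is what separates collective outlier detection from point-wise detection.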
5 Conclusion
Outlier detection in WSNs is a challenging task, and researchers are continuously working on it for better results. In this article, we plan a model for outlier detection that employs a smooth auto-encoder-based LSTM-bidirectional RNN. We were motivated to pursue this approach by the robust learning ability of the SmAE for target-neighbour representation and by the suitability of the LSTM-RNN for time-series issues, and we adapt both techniques to detect group outliers. The designed model is evaluated on the benchmark IBRL dataset. Experimental analysis shows that the proposed method has superior accuracy and recall compared to existing techniques.
References
1 Introduction
The structural changes in integrated power system with renewable energy resources,
converters, and inverters along with highly nonlinear loads inject harmonics and lead
to poor quality of electrical power [1]. These injected harmonics need to be estimated
and mitigated with proper solutions since they will result in some adverse effects on
regular functioning of relays and other devices. The voltage signal with multiple harmonics (ωh = h·2πf0) and noise μ(t) is expressed in the time domain with fundamental frequency f0 as
v(t) = \sum_{h=1}^{N} A_h \sin(\omega_h t + \varphi_h) + \mu(t)   (1)

v(k) = \sum_{h=1}^{N} A_h \sin(\omega_h k T_s + \varphi_h) + \mu(k)   (2)

Let the estimated amplitude and phase parameters be \hat{A}_h and \hat{\varphi}_h, respectively. The distorted signal with the estimated parameters is represented as

\hat{v}(k) = \sum_{h=1}^{N} \hat{A}_h \sin(\omega_h k T_s + \hat{\varphi}_h)   (3)
Once the actual and estimated signals are available, an objective function is framed from the error; it attains its minimum when the estimated signal closely matches the actual signal. Therefore, the first objective function used for harmonic-component estimation [3–6] is given by

J_1 = \min \sum_{k=1}^{N} \big(v(k) - \hat{v}(k)\big)^2   (4)
Later, the harmonics generated by signal-conversion activities and nonlinear-load participation are minimized by identifying suitable filter parameters and/or switching-angle patterns. In this case, a second objective function is framed for the cascaded H-bridge inverter to find optimal switching angles so that the output signal contains only non-dominant harmonic content [11].
J_2 = \min_{(\delta_1, \delta_2, \delta_3)} \left[ 100 \cdot \left( \frac{V_1^* - V_1}{V_1^*} \right)^4 + 50 \cdot \left( \frac{1}{5} \frac{V_5}{V_1} \right)^2 + 50 \cdot \left( \frac{1}{7} \frac{V_7}{V_1} \right)^2 \right]   (5)
In Eq. (5), V1, V5 and V7 are the harmonic components whose expressions are available in [R]. At the optimal switching angles δ1, δ2, δ3, the objective function attains its minimum.
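Since the expressions of [R] are not reproduced here, the following sketch assumes the textbook Fourier coefficients of a three-cell cascaded H-bridge, V_h = (4 V_dc / (π h)) Σ cos(h δ_i); under that assumption, and with an illustrative per-cell DC voltage, Eq. (5) can be evaluated for a candidate angle set as:

```python
import numpy as np

VDC = 1.0                            # per-cell DC-link voltage (illustrative)
V1_REF = 0.6 * 3 * 4 * VDC / np.pi   # desired fundamental V1* at m = 0.6

def vh(h, deltas):
    """Assumed textbook CHB coefficient for odd harmonic h."""
    return 4 * VDC / (np.pi * h) * np.sum(np.cos(h * np.asarray(deltas)))

def j2(deltas, v1_ref=V1_REF):
    # Eq. (5): penalize fundamental tracking error and the 5th/7th harmonics
    v1, v5, v7 = vh(1, deltas), vh(5, deltas), vh(7, deltas)
    return (100.0 * ((v1_ref - v1) / v1_ref) ** 4
            + 50.0 * (v5 / (5.0 * v1)) ** 2
            + 50.0 * (v7 / (7.0 * v1)) ** 2)
```

With this objective in hand, any minimizer over (δ1, δ2, δ3) in [0, π/2] can search for the angle set that tracks V1* while suppressing the 5th and 7th harmonics.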
All the terms in Eqs. (6) and (7) are the same as in PSO, and ωd and cd are the damping values inserted for each control parameter. The improvements obtained with the proposed method for both estimation and mitigation are presented in the subsequent sections through comparisons with the parent PSO.
4 Simulation Results
Initially, a voltage signal injected with harmonics, corrupted with noise and containing a zero-frequency (DC decaying) component is considered for estimating the harmonic components with both PSO and IPSO. This test signal, consisting of the fundamental and the 3rd, 5th, 7th and 11th harmonics, is generated in MATLAB. In the form of Eq. (1), the mathematical expression of the signal is

x(t) = 1.5 \sin(\omega t + 80^\circ) + 0.5 \sin(3\omega t + 60^\circ) + 0.2 \sin(5\omega t + 45^\circ) + 0.15 \sin(7\omega t + 36^\circ) + 0.1 \sin(11\omega t + 30^\circ) + 0.5 e^{-5t}
In the estimation problem, additive noise is also included. First, the harmonic components along with the DC decaying component are estimated using conventional PSO with a constant-inertia-weight strategy. Several values of the inertia weight are considered for this purpose, since there is no specific procedure for selecting such a control parameter. Later, IPSO is applied to the same problem starting from the worst inertia-weight values, and the global best values are obtained. All these results are reported in Table 1.
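A minimal PSO for the estimation objective of Eq. (4) can be sketched as follows; the swarm parameters (inertia w, acceleration coefficients c1, c2) and the toy single-component signal are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_minimize(f, bounds, n_particles=30, iters=200, w=0.6, c1=1.5, c2=1.5):
    """Minimal global-best PSO; w, c1, c2 are illustrative settings."""
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognitive + social terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

# Toy instance of Eq. (4): estimate amplitude and phase of one 50 Hz component.
f0, Ts = 50.0, 1e-4
k = np.arange(200)
signal = 1.5 * np.sin(2 * np.pi * f0 * k * Ts + np.deg2rad(80))

def j1(theta):
    A, phi = theta
    return np.sum((signal - A * np.sin(2 * np.pi * f0 * k * Ts + phi)) ** 2)

best, best_err = pso_minimize(j1, [(0.0, 3.0), (0.0, np.pi)])
```

The IPSO variant of the text would additionally damp w, c1 and c2 over the iterations instead of keeping them constant.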
Table 1 Estimated amplitude and phase values of the harmonic components obtained with PSO (various inertia weights) and the proposed method
PSO case Parameter 1st 3rd 5th 7th 11th Zero
ω = 0.9 A 1.4084 0.4906 0.1569 0.0010 0.0612 0.7666
ϕ 80.685 63.579 13.307 32.076 71.326 –
ω = 0.8 A 1.4994 0.5006 0.1986 0.1490 0.0512 0.5116
ϕ 79.966 60.039 44.079 35.967 90.238 −
ω = 0.7 A 1.5002 0.5004 0.1998 0.1501 0.0999 0.5108
ϕ 79.993 59.911 44.809 35.732 30.199 −
ω = 0.6 A 1.5000 0.4997 0.2000 0.1502 0.0999 0.5096
ϕ 80.017 60.003 44.938 36.174 29.682 −
ω = 0.3 A 1.4980 0.4515 0.1832 0.1356 0.0764 0.5053
ϕ 79.865 34.239 66.939 64.989 69.663 −
Proposed A 1.4993 0.5008 0.2003 0.1501 0.1006 0.5105
ϕ 79.995 60.025 45.326 35.918 29.844 −
An Improved Swarm Optimization Algorithm-Based … 125
[Fig. 1: Final-iteration fitness values for inertia weights 0.9, 0.8, 0.7, 0.6, 0.3 and the proposed method: 30.998, 30.52, 3.742, 0.0212, 0.0209 and 0.0216, respectively]
For all PSO runs at the different inertia weights, the fitness values at the end of the final iteration are plotted in Fig. 1. From this figure, it is observed that the proposed dynamic control-parameter concept removes the burden of selecting control parameters when searching for globally optimal solutions. The same PSO strategy is then applied to identify the optimal switching angles that minimize the total harmonic distortion (THD). For this purpose, Eq. (5) is considered, and the results at a modulation index (m) of 0.6 are reported in Table 2.
5 Conclusions
References
1. Harris FJ (1978) On the use of windows for harmonic analysis with the discrete Fourier
transform. Proc IEEE 66(1):51–83
2. Ren Z, Wang B (2010) Estimation algorithms of harmonic parameters based on the FFT. In:
2010 Asia-pacific power and energy engineering conference. IEEE, Mar 2010, pp 1–4
126 M. Alekhya et al.
Abstract Classification is the most common task in machine learning; it aims at categorizing inputs into a set of known labels. Numerous techniques have evolved over time to improve the performance of classification. Ensemble learning is one such technique: it improves performance by combining a diverse set of learners that work together to provide better stability and accuracy. Ensemble learning is used in various fields, including medical data analysis, sentiment analysis and banking data analysis. The proposed work surveys the techniques used in ensemble learning, covering stacking, boosting and bagging, improvements in the field and the challenges addressed by ensemble learning for classification. The motivation is to understand the role of ensemble methods in classification across various fields.
1 Introduction
Machine learning is one of the ways to achieve artificial intelligence. It focuses on equipping the machine to learn on its own without being explicitly programmed, which in turn leads the way to intelligence. Machine learning is broadly classified into two types, namely supervised learning and unsupervised learning. Classification is a prominent machine learning task that maps inputs to outputs. It is supervised learning: it finds the class to which an input data point most likely belongs, and the data used to train the model that approximates the mapping function are labelled with the correct classes. Figure 1 depicts the types of machine learning along with applications.
In addition to classification, regression is also a supervised learning task, applied in various fields such as risk assessment and stock-value analysis. Classification is a predictive modelling task in which the class label of an input data point is predicted. The model is trained with numerous data that are already labelled. Some classic examples include spam/non-spam mail classification and handwritten character classification. Binomial and multi-class are the two broad types of classification. Many popular algorithms perform the classification task; a few well-known ones are
• K-nearest neighbours
• Naive Bayes
• Decision trees
• Random forest
Although the performance of these algorithms is commendable, there is a consistent need to improve it further. Ensemble learning is one familiar technique for improving accuracy, and it in turn offers many approaches for classification. The idea of ensemble models is to combine numerous weak learners so that they act as a single strong learner. The work presented here analyses the various ensemble techniques, their applications in different fields and an experimental analysis of the effect of ensemble learning.
The rest of the paper is organized as follows: Sect. 2 studies the related work; Sect. 3 discusses the various ensemble techniques; Sect. 4 discusses applications of ensemble techniques; Sect. 5 discusses the use of ensemble techniques in deep learning; Sect. 6 provides an experiment; and Sect. 7 concludes the proposed survey.
A Study on Ensemble Methods for Classification 129
2 Related Work
Ensemble learning is an evergreen research area in which many studies have been proposed. The work by Sagi et al. [1] provides a detailed study covering the advent of ensemble learning and explains the history of every ensemble technique. Its key takeaway is the idea of refining the algorithms to fit big data; another is the future work of combining deep learning with ensemble learning. Gomes et al. [2] proposed a survey specifically on the use of ensemble learning in the context of data streams. The authors studied over 60 algorithms to provide a taxonomy of ensemble learners in this context. The study concludes with evidence that data-stream ensemble learning models help in overcoming challenges such as concept drift and tend to work in real-time scenarios. Dietterich et al. [3] initiated a better understanding in their work, which answers a number of questions such as why ensembles work and what the basic methods for constructing them are. The paper describes how algorithms that form a single hypothesis for input-to-output mapping suffer from three main problems, namely
• Statistical problem
• Computational problem
• Representation problem
These problems cause high variance and bias, which explains why ensembles can reduce both. Ren et al. [4] discussed the theories that lead to ensemble learning, including bias-variance decomposition and diversity. The paper also categorizes the algorithms under different classes and discusses distinct methods such as fuzzy-based and deep learning-based ensemble methods. That work focuses equally on regression tasks besides analysing classification. In the present paper, the various approaches to classification tasks, the applications of ensemble learning and the open challenges are discussed.
3 Ensemble Techniques

A dependent framework is one in which the result of each learner affects the next learner: the learning of the next learner is influenced by the previous one. In an independent framework, the learners are constructed independently of each other. Any ensemble technique falls under one of these two categories. The prime types of ensemble techniques are
• Bagging
• Boosting
• Stacking
These techniques have numerous algorithms working under them to achieve the goals
of ensemble.
3.1 Bagging
One major key to ensemble models is diversity, which plays a crucial role in improving the performance of an ensemble. Bagging is one approach to implementing diversity. Bagging (bootstrap aggregating) works by training each inducer on a subset of instances; each inducer is trained on a different subset, thereby generating a different hypothesis, and a voting technique then determines the prediction for the test data. Bagging often contains homogeneous learners and implements data diversity by working on samples of the data. In [5], Dietterich discusses in detail why ensembles perform better than individual learners. He claims that bagging is the most straightforward method of constructing an ensemble by manipulating the training examples, and that this method works better on unstable algorithms, which are strongly affected even by small manipulations of the data. Tharwat et al. [6] propose a plant identification model that uses a bagging classifier on a fused feature-vector model for better accuracy. A decision-tree learner is used as the base learner, and the results show that accuracy increases with the number of learners; the paper also finds that the accuracy rate is proportional to the amount of training data and the size of the ensemble. Wu et al. [7] propose an intelligent ensemble machine learning method based on bagging for thermal-perception prediction and compare it against SVM and ANN; the ensemble outperformed the classic algorithms in predicting thermal comfort and many other measures. Many improvements to bagging have been suggested, including an improved bagging algorithm in which an entropy is assigned to each sample. Jiang et al. [8] used this algorithm for pattern recognition of ultra-high-frequency signals, and the model showed improved performance against many algorithms. Another interesting variant is wagging (weight aggregation), which works by assigning weights to samples; however, in the work by Bauer et al. [9], no significant improvements were shown in the results. Bagging itself, though, has shown improved performance by decreasing the error: in many experiments with Naive Bayes and MC4, the error was significantly reduced.
In [10], Kotsiantis et al. categorize the variants of bagging algorithms into eight
categories which include
• Methods using alternative sampling techniques.
• Methods using different voting rule.
• Methods adding noise and inducing variants.
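The core bagging procedure (bootstrap sampling, training one inducer per sample, majority voting) can be sketched with a simple decision-stump base learner; the stump and the toy data are illustrative, not taken from any of the surveyed works:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(X, y):
    """Exhaustively fit a one-feature threshold classifier (weak learner)."""
    best, best_err = (0, 0.0, 1), np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) > 0, 1, -1)
                err = np.mean(pred != y)
                if err < best_err:
                    best, best_err = (f, t, pol), err
    return best

def stump_predict(stump, X):
    f, t, pol = stump
    return np.where(pol * (X[:, f] - t) > 0, 1, -1)

def bagging_fit(X, y, n_estimators=25):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), len(X))
        models.append(fit_stump(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Majority vote over the bootstrap-trained stumps.
    votes = np.sum([stump_predict(m, X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)
```

Because every inducer sees a different bootstrap sample, the individual hypotheses differ, and the vote averages out their instability, which is exactly the data-diversity argument made above.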
3.2 Boosting
Boosting is an iterative technique that works by adjusting the weights of the observations made by the previous classifier. This ensures that instances which are not classified properly are picked more often than correctly predicted instances, making boosting a well-known technique under the dependent framework. There are many boosting algorithms, of which the following are discussed:
• AdaBoost
• Gradient boosting.
AdaBoost AdaBoost was the pioneering boosting technique. It combines weak learners to form a strong learner, assigning weights so that the weights of wrongly classified instances are increased and those of correctly classified ones are decreased. The weights thus make the successive learners concentrate more on the wrongly classified instances. In [11], the author discusses AdaBoost in a very detailed manner, explaining various aspects of the algorithm and forming an expansive theory around it. Prabhakar et al. [12] proposed a model combining a dimensionality-reduction technique with an AdaBoost classifier for the classification of epilepsy from EEG signals; the classification is improved to more than 90% accuracy. Haixing et al. [13] used an AdaBoost-kNN ensemble learning model for the classification of multi-class imbalanced data. The model uses kNN as the base learner and incorporates AdaBoost, and the results showed a 20% increase in accuracy over the classic kNN model. Many other works use AdaBoost along with feature-selection techniques for increased accuracy.
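The weight-update scheme described above can be sketched as follows, using a weighted decision stump as the weak learner; this is a generic AdaBoost sketch, not the exact configuration of any cited work:

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Threshold stump minimizing the weighted classification error."""
    best, best_err = (0, 0.0, 1), np.inf
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (f, t, pol), err
    return best, best_err

def adaboost_fit(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_weighted_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        f, t, pol = stump
        pred = np.where(pol * (X[:, f] - t) > 0, 1, -1)
        # Raise the weights of misclassified samples, lower the rest.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * np.where(p * (X[:, f] - t) > 0, 1, -1)
                for a, (f, t, p) in ensemble)
    return np.where(score >= 0, 1, -1)
```

Each round the misclassified samples gain weight, so the next stump is forced to focus on them; the final prediction is a weighted vote of all rounds.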
Gradient Boosting The gradient boosting machine (GBM) works as an additive sequential model. The major difference between AdaBoost and GBM is the way they manage the drawbacks of the previous learner: while AdaBoost uses weights, GBM uses gradients to compensate for the drawbacks in succeeding learners. One prominent advantage of GBM is that it allows the user to optimize a user-specified cost function instead of an unrealistic generic loss function. Much of the literature uses an improvement of gradient boosting, extreme gradient boosting (XGB). Shi et al. [14] propose a weighted XGB for ECG heartbeat classification; the model was used to classify heartbeats into four categories, such as normal and ventricular, and the work concludes that the method is suitable for clinical application.
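The gradient-based alternative can be sketched for squared loss, where the negative gradient is simply the residual; the regression stump and the learning rate are illustrative choices:

```python
import numpy as np

def fit_reg_stump(X, y):
    """Least-squares regression stump: split minimizing the SSE."""
    best, best_sse = None, np.inf
    for t in np.unique(X):
        left, right = y[X <= t], y[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best, best_sse = (t, left.mean(), right.mean()), sse
    return best

def gbm_fit(X, y, rounds=50, lr=0.1):
    f0 = y.mean()                         # initial constant model
    models, pred = [], np.full_like(y, f0, dtype=float)
    for _ in range(rounds):
        resid = y - pred                  # negative gradient of squared loss
        t, lmean, rmean = fit_reg_stump(X, resid)
        pred = pred + lr * np.where(X <= t, lmean, rmean)
        models.append((t, lmean, rmean))
    return f0, models

def gbm_predict(f0, models, X, lr=0.1):
    pred = np.full(len(X), f0)
    for t, lmean, rmean in models:
        pred += lr * np.where(X <= t, lmean, rmean)
    return pred
```

Swapping the residual for the gradient of another differentiable loss is what gives GBM its flexibility with user-specified cost functions.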
132 R. Harine Rajashree and M. Hariharan
3.3 Stacking
Stacking combines multiple learners by employing a meta-learner. The base-level learners are trained on the training data set, and the meta-learner trains on the base learners' outputs. The significance of stacking is that it can reap the benefits of well-performing models by learning a meta-learner on top of them. The learners are heterogeneous, and unlike boosting, only a single learner is used to learn from the base learners. Stacking can also happen in multiple levels, but this may be data- and time-expensive. Ghasem et al. [15] used a stacking-based ensemble approach to implement an automated system for melanoma classification; the authors also proposed a hierarchical structure-based stacking approach, which showed better results than the plain stacking approach.
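A toy sketch of the meta-learning step: the base-learner outputs are simulated here (each flips a disjoint block of labels, so no single learner is perfect), and a least-squares meta-learner combines them into a majority-style rule that corrects every individual learner's mistakes; the data and learners are entirely hypothetical:

```python
import numpy as np

# Simulated base-learner outputs on n samples with labels in {-1, +1}.
n = 60
y = np.where(np.arange(n) % 2 == 0, 1, -1)

def noisy_copy(labels, flip_from, flip_to):
    b = labels.copy()
    b[flip_from:flip_to] *= -1          # this learner errs on one block
    return b

base_preds = np.stack([noisy_copy(y, 0, 5),
                       noisy_copy(y, 5, 10),
                       noisy_copy(y, 10, 15)], axis=1)

# Meta-learner: least-squares weights over the base-learner outputs.
w, *_ = np.linalg.lstsq(base_preds.astype(float), y.astype(float), rcond=None)
meta = np.where(base_preds @ w >= 0, 1, -1)
```

Because the learners' error regions are disjoint, a weighted vote learned by the meta-level is perfect even though every base learner is imperfect, which is the benefit stacking aims for.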
Random forest is a very popular ensemble method: a bagging method in which trees are fit on bootstrap samples. Random forest adds further randomness by selecting the best features out of a random subset of features. Sampling over features gives the added advantage that the trees do not all have to look at the same features to make decisions. Lakshmanaprabhu et al. [16] proposed a random forest classifier (RFC) approach for big-data classification, exhibiting how ensemble techniques work with big data; the model is implemented on health data, and the RFC is used to classify it. The results showed a maximum precision of 94% and an improvement over existing methods. Paul et al. [17] proposed an improvised random forest algorithm that iteratively removes the features considered unimportant. The paper aimed to reduce the number of trees and features while still maintaining accuracy, and it could prove that adding trees or further reducing features has no effect on accuracy.
Ensembles are employed due to their ability to mitigate many problems that can occur when using machine learning. Much of the literature [1, 4] discusses the advantages and disadvantages. The significant benefits of using ensembles are discussed below.
• Class imbalance: When the data have majority of instances belonging to a single
class, then it is said to be class imbalance. Machine learning algorithms thereby
might develop an inclination towards that class. Employing ensemble methods
can mitigate this issue by performing balanced sampling, or employing learners
that would cancel the inclinations of the previous learner. In [13], it is shown how
ensemble is used on imbalanced data.
• Bias Variance Error: Ensemble methods tackle the bias or variance error that might
occur in the base learners. For instance, bagging reduces the errors associated with
random fluctuations in training samples.
• Concept drift: Concept drift is the change in the underlying relationships due to
change in the labels over time. Ensembles are used as a remedy since diversity in
the ensembles usually reduces the error that might occur due to the drift.
Similar to the benefits, there are certain limitations to using ensembles. A few of them are
• Storage expensive
• Time expensive
• Understanding the effect of parameters like size of ensemble and selection of
learners on the accuracy.
In a few places, smaller ensembles work better, whereas some literature states that the increase in accuracy is proportional to the number of learners. Robert et al. [18] conducted an experimental analysis comparing ensemble methods across 34 data sets; some significant outcomes of the analysis are mentioned in Table 1. From the findings, it is visible that in some cases ensembles can also perform poorly, whereas in most cases, according to the literature, the accuracy is better. Although a few questions are still open, ensembles have been widely employed for improved performance.
The emergence of deep learning has resulted in enormous growth across various domains, paving the way for improvements that bring artificial intelligence closer to par with humans. Architectures such as densely connected neural networks and convolutional neural networks are very popular, and deep learning plays an important role in speech recognition, object detection and so on. Aggregating multiple deep learning models is a simple way to employ ensembles in deep learning; another way is to employ the ensemble inside the network. Dropout and residual blocks are improvements of this kind: they tend to create variations in the network, thereby improving accuracy. Numerous works show how neural networks employ ensemble methods for classification
purposes. Liu et al. [19] proposed an ensemble of convolutional neural networks with different architectures for vehicle-type classification; the results show that the mean precision increased by 2% over single models. Zheng et al. [20] proposed an ensemble deep learning approach to extract EEG features; it uses bagging along with an LSTM model and showed higher accuracy compared with techniques such as RNN. Such works exhibit how ensembles are marching forward in the artificial intelligence era.
6 Experimentation
To understand the effect of ensemble methods, they have been employed on an open banking data set that records the attributes of users in order to classify whether a user would take a loan from the bank. The findings from the experimentation are listed in the table below.
Method          Accuracy (%)
Stacking        88.5
AdaBoost        89.01
GBM             88.27
Bagging         88.72
Random forest   86.91
AdaBoost exhibits the highest accuracy of 89% with a decision tree as the base learner, while random forest shows 86.91% accuracy; the increase of about 2% in accuracy is still promising for the task. On another data set, classifying default credit payments, random forest showed the highest accuracy of 98.76%, whereas the other techniques reached only around 75%. This shows the advantage of ensemble techniques while also exposing the limitation that parameter choices strongly affect accuracy. The effect of feature selection can also be studied in future for reaping much higher performance.
7 Conclusion
Classification tasks aim to predict the class label of unknown test data. Ensemble
learning is a popular technique to improve classification performance. It combines
numerous learners to form a single strong learner. Many techniques and algorithms
exist in ensemble learning; bagging, boosting, and stacking are the most popular.
These algorithms are discussed along with their applications in various fields.
Besides, the future of ensemble learning in the field of artificial intelligence,
along with the advantages and disadvantages of applying ensemble learning, is
explained in detail. As a future work, analysis can be made on how
A Study on Ensemble Methods for Classification 135
ensemble learning can be used efficiently to fit big data and in the direction of
building models that are simpler and more effective in terms of cost and time.
References
1. Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl
Discov 8(4):
2. Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data
stream classification. ACM Comput Surv (CSUR) 50(2):1–36
3. Dietterich TG (2002) Ensemble learning. The handbook of brain theory and neural networks
2:110–125
4. Ren Y, Zhang L, Suganthan PN (2016) Ensemble classification and regression-recent develop-
ments, applications and future directions. IEEE Comput Intell Mag 11(1):41–53
5. Dietterich TG (2000) Ensemble methods in machine learning. International workshop on mul-
tiple classifier systems. Springer, Berlin, Heidelberg, pp 1–15
6. Tharwat A, Gaber T, Awad YM, Dey N, Hassanien AE (2016) Plants identification using
feature fusion technique and bagging classifier. The 1st international conference on advanced
intelligent system and informatics (AISI2015), 28–30 Nov 2015, Beni Suef. Egypt. Springer,
Cham, pp 461–471
7. Wu Z, Li N, Peng J, Cui H, Liu P, Li H, Li X (2018) Using an ensemble machine learn-
ing methodology-Bagging to predict occupants-thermal comfort in buildings. Energy Build
173:117–127
8. Jiang T, Li J, Zheng Y, Sun C (2011) Improved bagging algorithm for pattern recognition in
UHF signals of partial discharges. Energies 4(7):1087–1101
9. Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: Bag-
ging, boosting, and variants. Mach Learn 36(1–2):105–139
10. Kotsiantis SB (2014) Bagging and boosting variants for handling classifications problems: a
survey. Knowl Eng Rev 29(1):78
11. Schapire RE (2013) Explaining adaboost. Empirical inference. Springer, Berlin, Heidelberg,
pp 37–52
12. Prabhakar SK, Rajaguru H (2017) Adaboost classifier with dimensionality reduction techniques
for epilepsy classification from EEG. International conference on biomedical and health infor-
matics. Springer, Singapore, pp 185–189
13. Haixiang G, Yijing L, Yanan L, Xiao L, Jinling L (2016) BPSO-adaboost-KNN ensemble
learning algorithm for multi-class imbalanced data classification. Eng Appl Artif Intell 49:176–
193
14. Shi H, Wang H, Huang Y, Zhao L, Qin C, Liu C (2019) A hierarchical method based on
weighted extreme gradient boosting in ECG heartbeat classification. Comput Methods Progr
Biomed 171:1–10
15. Ghalejoogh GS, Kordy HM, Ebrahimi F (2020) A hierarchical structure based on stacking
approach for skin lesion classification. Expert Syst Appl 145
16. Lakshmanaprabu SK, Shankar K, Ilayaraja M, Nasir AW, Vijayakumar V, Chilamkurti N (2019)
Random forest for big data classification in the internet of things using optimal features. Int J
Mach Learn Cybern 10(10):2609–2618
17. Paul A, Mukherjee DP, Das P, Gangopadhyay A, Chintha AR, Kundu S (2018) Improved
random forest for classification. IEEE Trans Image Process 27(8):4012–4024
18. Banfield RE, Hall LO, Bowyer KW, Bhadoria D, Philip Kegelmeyer W, Eschrich S (2004)
A comparison of ensemble creation techniques. International workshop on multiple classifier
systems. Springer, Berlin, Heidelberg, pp 223–232
19. Liu W, Zhang M, Luo Z, Cai Y (2017) An ensemble deep learning method for vehicle type
classification on visual traffic surveillance sensors. IEEE Access 5:24417–24425
136 R. Harine Rajashree and M. Hariharan
20. Zheng X, Chen W, You Y, Jiang Y, Li M, Zhang T (2020) Ensemble deep learning for automated
visual classification using EEG signals. Patt Recogn 102:
An Improved Particle Swarm
Optimization-Based System
Identification
1 Introduction
Intelligent approaches provide alternate solutions with less complexity and high
convergence [6]. Particle swarm optimization (PSO) was introduced for IIR filter
coefficient identification in [7], where it was adopted to reconstruct missing
elements of N-dimensional data. Earlier, the parameters were estimated directly
using PSO under ideal conditions in [8]. Evolutionary algorithms such as genetic
and differential evolution algorithms [9, 10] were also tried for digital IIR
filter coefficient identification. These intelligent approaches need control
parameters whose selection influences the convergence; hence, nonparametric
algorithms were also applied to identify the parameters of IIR and FIR filters.
Teaching and learning-based optimization (TLBO) was applied in [11] to identify
filter parameters. Other mathematics-based algorithms are also available in the
literature for the estimation of filters under noisy conditions [12, 13]. Recently,
the cat swarm optimization algorithm (CSO) [1], the gravitational search algorithm
(GSA) [2], and other recent algorithms [14] have been applied to the estimation
problem for better convergence.
For accurate identification of filter coefficients with a fast convergence rate, an
improved PSO is used in this paper, since the PSO algorithm is easy to implement and
fast to execute. The influence of the control parameters on the final results is
minimized by their damped nature, which yields values close to the exact solution.
The efficacy is tested on a few well-defined models discussed in the subsequent
sections.
2 Problem Formulation
Identification of the exact model parameters of an unknown system from observations
of its input and output patterns is known as system identification. This task is
accomplished by substituting parameters into the model for a set of standard inputs
so that its output matches the actual system outputs. The schematic representation
of system identification, in line with this definition, is given in Fig. 1, where
the parameters are identified by an optimization algorithm.
The input (x)–output (y) relation can be described by the following Eq. (1):

\sum_{i=0}^{N} b_i y(n-i) = \sum_{k=0}^{M} a_k x(n-k)   (1)
In Eq. (1), N (≥ M) is the filter's order. The transfer function of the filter
described in Eq. (1) is given by

H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} a_k z^{-k}}{\sum_{i=0}^{N} b_i z^{-i}}   (2)
[Fig. 1 Schematic of system identification: the actual IIR system and the IIR model
are driven by the same input, and the optimization algorithm tunes the model
parameters from the output error]
With b_0 normalized to one, the transfer function becomes

H(z) = \frac{\sum_{k=0}^{M} a_k z^{-k}}{1 + \sum_{i=1}^{N} b_i z^{-i}}   (3)

H(z) = \frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_M z^{-M}}{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_N z^{-N}}   (4)
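In the time domain, Eq. (4) is the difference equation y(n) = Σ_k a_k x(n−k) − Σ_{i≥1} b_i y(n−i), which can be sketched directly (the coefficient values in the usage example are illustrative, not those of the paper's test systems):

```python
def iir_filter(a, b, x):
    """Direct-form IIR filter for Eq. (4).
    a = [a0..aM] (numerator), b = [b1..bN] (denominator, b0 = 1 implied):
    y(n) = sum_k a[k]*x(n-k) - sum_i b[i]*y(n-1-i)."""
    y = []
    for n in range(len(x)):
        acc = sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        acc -= sum(b[i] * y[n - 1 - i] for i in range(len(b)) if n - 1 - i >= 0)
        y.append(acc)
    return y
```

For example, a = [1.0] and b = [-0.5] realise H(z) = 1/(1 − 0.5 z⁻¹), whose impulse response decays geometrically.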
To identify the correct parameters of the actual system, an error is calculated from
the known and unknown systems' outputs using the equation given by

e(k) = y(k) - \hat{y}(k)   (5)

where y(k) is the actual system output and \hat{y}(k) is the model output.
As the estimated parameters approach the actual ones, the error defined in Eq. (5)
approaches zero over the entire time scale. Identification of such exact parameters
is achieved using population search-based techniques, where the objective function,
framed with the help of the error in Eq. (5), is given by
J = \min \sum_{k=1}^{N} e(k)^2   (6)
All the terms in Eqs. (7) and (8) are the same as in standard PSO (p_n^i represents
position and v_n^i represents velocity), and ω_d and c_d are the damping values
inserted for each control parameter. Since the acceleration coefficients are
multiplied by random numbers, the inertia weight is the most influential parameter.
Therefore, the dynamic change is considered only for the inertia weight in the rest
of the paper. The improvements in the results with the proposed method are reported
in Sect. 4.
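The idea can be sketched in pure Python: a standard PSO minimising the objective J of Eq. (6), with only the inertia weight damped every iteration. The particle count, damping factor w_damp = 0.99, and first-order test model below are illustrative assumptions, not the paper's exact settings:

```python
import math
import random

def simulate(a0, b1, x):
    """First-order IIR model y(n) = a0*x(n) + b1*y(n-1)."""
    y, prev = [], 0.0
    for xn in x:
        prev = a0 * xn + b1 * prev
        y.append(prev)
    return y

def pso_identify(x, y_ref, particles=30, iters=200, seed=1):
    """PSO minimising J = sum_k e(k)^2 (Eq. 6); the inertia weight w is
    damped by w_damp each iteration instead of being hand-tuned."""
    rng = random.Random(seed)
    w, c1, c2, w_damp = 0.8, 2.0, 2.0, 0.99  # w_damp is an assumed value

    def cost(p):
        return sum((yr - ye) ** 2
                   for yr, ye in zip(y_ref, simulate(p[0], p[1], x)))

    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(particles)]
    vel = [[0.0, 0.0] for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pcost = [cost(p) for p in pos]
    g = min(range(particles), key=lambda i: pcost[i])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # keep the search inside a stable coefficient range
                pos[i][d] = max(-0.99, min(0.99, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pcost[i]:
                pbest[i], pcost[i] = pos[i][:], c
                if c < gcost:
                    gbest, gcost = pos[i][:], c
        w *= w_damp  # the damping that replaces manual tuning of w
    return gbest, gcost
```

Identifying a plant with a0 = 0.5, b1 = 0.8 from 60 samples of a sine input recovers both coefficients to within a few percent without any manual retuning of w.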
4 Simulation Results
To analyze the performance of the improved PSO for parameter estimation, two case
studies have been taken.
Test system 1: The transfer function of the fourth-order plant (a fourth-order IIR
filter) is given by

H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3}}{1 - b_1 z^{-1} - b_2 z^{-2} - b_3 z^{-3} - b_4 z^{-4}}   (9)
In Eq. (9), the actual coefficients of the unknown system are presented in the first
row of Table 1. As stated in Sect. 3, the identification of the filter parameters is
first checked using PSO with constant control parameters. At ω = 0.8, the minimum
value of the objective function is achieved. The estimated parameters are reported in
the second row of Table 1. The optimal values achieved with PSO at other constant
values of the inertia weight are also presented in Table 1. Among all cases, the
best solution is achieved when ω = 0.2. However, with the proposed dynamic ω,
similar estimated values are achieved even with the worst initialization of the
control parameters. The final objective function values at the three inertia weights
0.2, 0.6, and 0.8 are −56.51, −52.61, and −31.58 dB, respectively, whereas −55.56 dB
is the function value for the proposed method when started at an inertia weight
of 0.8.
Test system 2: The transfer function of the third-order plant is given by

H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2}}{1 - b_1 z^{-1} - b_2 z^{-2} - b_3 z^{-3}}   (10)
In Eq. (10), the actual coefficients of the unknown system are presented in the
first row of Table 2. The improved PSO is applied to identify the unknown plant
parameters, and the final solution values, reported in Table 2, are close to the
actual values.
5 Conclusions
In this paper, an improved PSO is applied for the estimation of unknown plant
parameters. This method avoids the additional burden of selecting the control
parameters of conventional PSO and produces values close to the global optimum
irrespective of the initialization process. The results for two higher-order models
show the advantages of the improved PSO in the system identification process.
142 P. Eswari et al.
References
1. Panda G, Pradhan PM, Majhi B (2011) IIR system identification using cat swarm optimization.
Expert Syst Appl 38(10):12671–12683
2. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2011) Filter modeling using gravitational search
algorithm. Eng Appl Artif Intell 24(1):117–122
3. Chicharo JF, Ng TS (1990) Gradient-based adaptive IIR notch filtering for frequency estimation.
IEEE Trans Acoust Speech Sig Process 38(5):769–777
4. Netto SL, Diniz PS, Agathoklis P (1995) Adaptive IIR filtering algorithms for system
identification: a general framework. IEEE Trans Educ 38(1):54–66
5. Took CC, Mandic DP (2010) Quaternion-valued stochastic gradient-based adaptive IIR
filtering. IEEE Trans Sig Process 58(7):3895–3901
6. Cho C, Gupta KC (1999) EM-ANN modeling of overlapping open-ends in multilayer microstrip
lines for design of bandpass filters. In: IEEE antennas and propagation society international
symposium 1999 Digest. Held in conjunction with: USNC/URSI National Radio Science
Meeting (Cat. No. 99CH37010), vol 4. IEEE, pp 2592–2595
7. Hartmann A, Lemos JM, Costa RS, Vinga S (2014) Identifying IIR filter coefficients using
particle swarm optimization with application to reconstruction of missing cardiovascular
signals. Eng Appl Artif Intell 34:193–198
8. Durmuş B, Gün A (2011) Parameter identification using particle swarm optimization.
In: Proceedings, 6th international advanced technologies symposium, (IATS 11), Elazığ,
Turkey, pp 188–192
9. Ma Q, Cowan CF (1996) Genetic algorithms applied to the adaptation of IIR filters. Sig Process
48(2):155–163
10. Karaboga N (2005) Digital IIR filter design using differential evolution algorithm. EURASIP
J Adv Sig Process 2005(8):856824
11. Singh R, Verma HK (2013) Teaching–learning-based optimization algorithm for parameter
identification in the design of IIR filters. J Inst Eng (India): Ser B 94(4):285–294
12. DeBrunner VE, Beex AA (1990) An informational approach to the convergence of output
error adaptive IIR filter structures. In: International conference on acoustics, speech, and signal
processing. IEEE, pp 1261–1264
13. Wang Y, Ding F (2017) Iterative estimation for a non-linear IIR filter with moving average
noise by means of the data filtering technique. IMA J Math Control Inf 34(3):745–764
14. Zhao R, Wang Y, Liu C, Hu P, Jelodar H, Yuan C, Li Y, Masood I, Rabbani M, Li H, Li B
(2019) Selfish herd optimization algorithm based on chaotic strategy for adaptive IIR system
identification problem. Soft Comput, 1–48
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95-
international conference on neural networks, vol 4. IEEE, pp 1942–1948
16. Nagaraju TV, Prasad CD (2020) Swarm-assisted multiple linear regression models for
compression index (Cc) estimation of blended expansive clays. Arab J Geosci 13(9)
17. Prasad CD, Biswal M, Nayak PK (2019) Wavelet operated single index based fault detection
scheme for transmission line protection with swarm intelligent support. Energy Syst, 1–20
18. Nagaraju TV, Prasad CD, Raju MJ (2020) Prediction of California bearing ratio using particle
swarm optimization. In: Soft computing for problem solving. Springer, Singapore, pp 795–803
Channel Coverage Identification
Conditions for Massive MIMO
Millimeter Wave at 28 and 39 GHz Using
Fine K-Nearest Neighbor Machine
Learning Algorithm
Keywords LoS · Massive MIMO · mm-wave · NLoS · Pathloss and power delay
profile
1 Introduction
The technical aspects of massive MIMO have brought the integration of various
networks together, namely in the fifth generation (5G). Massive MIMO with mm-wave
has gained attention [23]. In a dense forest environment, future wireless sensor
networks and IoT devices are deployed and examined for pathloss at 2.4 GHz. Two
scenarios
were considered for simulation and measurements, namely (a) free space zone and
(b) diffraction zone, where the delay spread values are also presented [24].
In a 5G network, mm-wave frequencies with IoT devices are considered for higher
bandwidth. Pathloss is analyzed at 38 GHz in an outdoor environment to characterize
LoS and NLoS under two antenna polarizations, namely vertical–vertical and
vertical–horizontal. Parameters such as cell throughput, edge throughput, spectral
efficiency, and fairness index are evaluated [25]. IoT in industrial applications
needs a wide range
of bandwidth with seamless connectivity, where localization is a must. Normally,
narrow-band IoT is used for industrial and healthcare applications, and GPS fails
to localize such low-power IoT devices. Based on distance, an analytical model is
designed for geometric probabilistic analysis [26]. An indoor positioning system
for an IoT environment is considered, and a Wi-Fi trilateration method is proposed
to position users with respect to reference points [27]. In IoT,
localization of devices has gained importance for quality of experience.
A localization technique was designed as part of the Butler project under the EU
FP-7 programme, one of the most notable EU projects in this area [28]. The feasibility
of massive MIMO in industrial IoT is analyzed by placing massive antennas in
a data center for seamless connectivity with a large number of devices [29]. Massive
MIMO with IoT is analyzed for connectivity under two generic schemes, namely
massive machine-type communication and ultra-reliable low latency communication.
For physical layer technologies between massive MIMO and IoT, a strong integration
is needed in terms of protocol design [30].
2 Network Architecture
3 Simulation Methodology
operated at 28 and 39 GHz frequencies. To extend support for such huge number of
devices, massive MIMO with mm-wave frequencies is considered as a use case.
A distributed massive MIMO system with 128 transmitting antennas and 4 receiving
antennas operating at 28 and 39 GHz is considered. As multiple copies of the signal
arrive at the receiver, an energy detector is utilized, and only signals above 10 dB
are allowed. Measurements are made on the uplink channel, and, by the principle of
channel reciprocity, the downlink channel measurements are obtained. Parameters such
as pathloss and power delay profile are extracted for the LoS and NLoS scenarios,
and a dataset of 1000 samples is constructed from the simulations for both
parameters. A fine K-NN algorithm is trained, and tenfold cross-validation is
performed on the dataset: the full dataset is divided into ten parts, where nine
parts are used for training and one part is used for testing.
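The tenfold protocol can be sketched as follows (a generic illustration; the fold assignment used in the actual experiments may differ):

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for
    10-fold cross-validation: nine folds train, one fold tests."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

For a 1000-sample dataset this yields ten 900/100 train/test partitions, and every sample is tested exactly once.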
4 Simulation Measurements
Y(f) = H(f) \cdot x(f) + d(f)   (1)
where
d(f) is the IID noise vector at the receiver with zero mean and unit variance,
148 V. C. Prakash et al.
Table 1 Simulation parameters

Entities                        Remarks
Simulation tool                 MATLAB 2019a
Frequency                       28, 39 GHz
No. of transmitting antennas    128
No. of receiving antennas       4
Channel                         Indoor, urban
Environment                     AWGN
Operating mode                  TDD
x(f) is the signal transmitted from the RRH antennas to the user equipment, and
H(f) represents the channel frequency response.
With channel reciprocity, the downlink channel is obtained as the Hermitian
transpose of the uplink channel, given as

Y(f) \propto H^{H}(f) H(f) \cdot x(f) + d(f)   (2)
With the inverse fast Fourier transform, the channel frequency response is converted
into the channel impulse response (Table 1):

h(t) = \mathrm{IFFT}(H(f))   (3)
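This frequency-to-time conversion can be illustrated with a small inverse DFT (a generic sketch using Python's cmath, not the MATLAB routine used in the simulations; the two-tap channel in the example is an assumption):

```python
import cmath

def idft(H):
    """Inverse DFT: impulse response h(n) from frequency samples H(k),
    h(n) = (1/N) * sum_k H(k) * exp(j*2*pi*k*n/N)."""
    N = len(H)
    return [sum(H[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)) / N
            for n in range(N)]
```

A two-tap channel h = [1, 0.5, 0, 0] round-trips through the forward DFT and back to the same taps.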
5 Pathloss
Pathloss is defined as the ratio of transmitted power to received power. The
received signal power varies with distance: as the distance increases, the received
signal power reduces, degrading the signal. In comparison with LoS, NLoS suffers
greater degradation due to the presence of obstacles. Figure 2 shows the pathloss
in the LoS environment at 28 and 39 GHz, and Fig. 3 shows the pathloss in the NLoS
environment at the same operating frequencies. From the figures, it is clear that
as the operating frequency increases, the pathloss also increases.
The received power of the antenna is given by

P_R = \frac{P_T}{4 \pi D^{\beta}} A_r   (4)
where
PT Transmitting power
D distance between the transmitter and receiver
β Pathloss exponent
Ar Aperture area of the receiver
The aperture area of the receiver is given by

A_r = G_r \frac{\alpha^2}{4 \pi}   (5)
where G_r is the receiver gain and

\alpha = \frac{c}{f}   (6)
The gains of both the transmitter and receiver antennas are taken as one, since
isotropic antennas are assumed. The pathloss incurred between the transmitter and
receiver is given by

PL = P_T - P_R   (7)

PL = 20 \log_{10}\left(\frac{4 \pi f}{c}\right)   (8)
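Evaluating Eq. (8) confirms the trend seen in the figures: pathloss grows with carrier frequency. A sketch (interpreting Eq. (8) as the free-space loss at a 1 m reference distance is an assumption):

```python
import math

C = 299_792_458.0  # speed of light in m/s

def pathloss_db(f_hz):
    """Free-space pathloss of Eq. (8), PL = 20*log10(4*pi*f/c),
    i.e. the loss at a 1 m reference distance."""
    return 20.0 * math.log10(4.0 * math.pi * f_hz / C)
```

The difference pathloss_db(39e9) − pathloss_db(28e9) equals 20·log10(39/28) ≈ 2.9 dB, so the 39 GHz channel always incurs more loss than the 28 GHz channel.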
The power delay profile gives the average received signal power as a function of
time delay. Figure 4 represents the power delay profile of the time-varying channels
examined. From the figure, it is clearly visible that the channel operating at
39 GHz experiences more distortion than at 28 GHz, and that the distortion is higher
in NLoS than in LoS conditions. The received signal power tends to be negative in
the NLoS condition. The power delay profile is computed from the magnitude squared
of the channel impulse response, P(\tau) = |h(\tau)|^2.
7 Fine-KNN
From the 1000-sample pathloss dataset at 28 GHz, the true positive value is 530, the
false positive value is 21, the false negative value is 16, and the true negative
value is 433. The true positive rate is 0.96 for both class 0 and class 1, whereas
the false negative rate is 0.04 for both classes. The positive predictive values of
classes 1 and 0 are 0.95 and 0.97, and the false discovery rates of classes 1 and 0
are 0.05 and 0.03. For 39 GHz,
the true positive value is 521, the false positive value is 29, the false negative
value is 22, and the true negative value is 428. The true positive rate for
classes 1 and 0 is 0.95, and the false negative rate for both classes is 0.05. The
positive predictive values of classes 1 and 0 are 0.94 and 0.96, and the false
discovery rates of classes 1 and 0 are 0.06 and 0.04.
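All of the rates quoted in this section derive from the four confusion-matrix counts; a small helper makes the definitions explicit (generic formulas, independent of the rounding used in the text):

```python
def confusion_rates(tp, fp, fn, tn):
    """Derive the reported rates from confusion-matrix counts:
    TPR (sensitivity), FNR, PPV (precision) and FDR."""
    return {
        "tpr": tp / (tp + fn),        # true positive rate
        "fnr": fn / (tp + fn),        # false negative rate
        "ppv": tp / (tp + fp),        # positive predictive value
        "fdr": fp / (tp + fp),        # false discovery rate
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

For the 28 GHz counts above (tp = 530, fp = 21, fn = 16, tn = 433), this gives an overall accuracy of 0.963 and a precision of about 0.96.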
Figures 9 and 11 show the receiver operating characteristics of pathloss at 28 and
39 GHz for positive class 1. The performance of the classifier is determined with
the ROC curve and is represented by a red point on the curve. The accuracy of the
classifier is denoted by the area under the curve: as the area under the curve
increases, so does the accuracy of the classifier. Figure 9 shows the curve of true
positive rate against false positive rate, where the TPR is 0.96 and the FPR is
0.04; from Fig. 11, the curve depicts a TPR of 0.95 and an FPR of 0.05. Figures 10
and 12 show the ROC curves of pathloss at 28 and 39 GHz for positive class 0. From
Fig. 10, the TPR is 0.96 and the FPR is 0.04, and from Fig. 12, the TPR is 0.95 and
the FPR is 0.05.
A scatterplot represents the relationship between the variables and their
correlation. Figures 13 and 14 show the scatterplots for the PDP dataset of
classes 1 and 0, i.e., the LoS and NLoS conditions. The red data points depict the
LoS condition, and the blue data points depict the NLoS condition.
Misclassifications, such as LoS into NLoS and vice versa, are marked with red and
blue cross-markings. However, the scatterplot shows no correlation between the
variables in the datasets.
The confusion matrix explains the prediction accuracy of the machine learning
algorithm. It displays the values between the observations and the predictions, as
depicted in Figs. 15 and 16, providing the true positive, false positive, false
negative, and true negative values for the PDP datasets operating at 28 and 39 GHz.
From the dataset of 1000 samples, the true positive value is 518, the false positive
value is 32, the false negative value is 23, and the true negative value is 427 for
the dataset
operating at 28 GHz. The true positive rates for classes 1 and 0 are 0.95 and 0.94,
and the false negative rates are 0.05 and 0.06. The false discovery rates for
classes 1 and 0 are 0.07 and 0.04, and the positive predictive values are 0.93 and
0.96. For 39 GHz, the true positive value is 515, the false positive value is 35,
the false negative value is 32, and the true negative value is 418; the true
positive rates for classes 1 and 0 are 0.93 and 0.94, and the false negative rates
are 0.07 and 0.06. The positive predictive values are 0.93 and 0.96 for classes 1
and 0, whereas the false discovery rates are 0.07 and 0.04, respectively.
Figures 17 and 18 show the ROC of the PDP (positive class 1) for 28 and 39 GHz,
respectively. The performance of the classifier is analyzed with the ROC curves,
denoted with red points in the graphs. The area under the curve shows the accuracy
of the algorithm: the higher the area under the curve, the higher the accuracy. The
graph plots the true positive rate against the false positive rate, where the true
positive rate is 0.95 and the false positive rate is 0.06 at 28 GHz; for 39 GHz,
the true positive rate is 0.94 and the false positive rate is 0.05.
Figures 19 and 20 show the ROC of the PDP for 28 and 39 GHz (positive class 0).
They represent the prediction accuracy via the red dot and the area under the
curve: the larger the area under the curve, the higher the accuracy of the
algorithm. The curve is plotted for the true positive rate against the false
positive rate, where the
TPR is 0.94 and the FPR is 0.05 for 28 GHz, and for 39 GHz, the TPR is 0.93 and
the FPR is 0.07.
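The classification rule behind these results is compact: the "fine" KNN preset corresponds to a very small neighbourhood (k = 1), which can be sketched as follows (a generic sketch, not the MATLAB toolbox implementation; the feature vectors in the example are made up):

```python
def fine_knn_predict(train_x, train_y, query, k=1):
    """k-nearest-neighbour vote with Euclidean distance (k = 1: 'fine' KNN)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    # take the k training samples closest to the query
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda t: dist(t[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

With pathloss/PDP feature vectors labelled 1 (LoS) and 0 (NLoS), a query simply takes the label of its nearest training sample.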
8 Conclusion
References
1. Chen X, Kwan Ng DW, Yu W, Larsson EG, Al Dhahir N, Schober R (2020) Massive access
for 5G and beyond. arXiv preprint arXiv:2002.03491, pp 1–21
2. Maschietti F, Gesbert D, de Kerret P, Wymeersch H (2017) Robust location-aided beam
alignment in millimeter wave massive MIMO. In: IEEE Global Communications Conference
3. Li X, Leitinger E, Oskarsson M, Astrom K, Tufvesson F (2019) Massive MIMO based localiza-
tion and mapping exploiting phase information of multipath components. IEEE Trans Wireless
Commun 18(9):4254–4267
4. Savic V, Larsson EG (2015) Fingerprinting based positioning in distributed massive MIMO
systems. In: IEEE 82nd vehicular technology conference
5. Garcia N, Wymeersch H, Larsson EG, Haimovich AM, Coulon M (2017) Direct localization
for massive MIMO. IEEE Trans Signal Process 65(10):2475–2487
6. Zhang J., Dai L, Li X, Liu Y, Hanzo L (2018) On low resolution ADCs in practical 5G
millimeter-wave massive MIMO systems. IEEE Commun Mag 56(7):205–211
7. Mahyiddin WA, Mazuki ALA, Dimyati K, Othman M, Mokhtar N, Arof H (2019) Localization
using joint AOD and RSS method in massive MIMO system. Radioengineering 28(4):749–756
8. Mendrzik R, Meyer F, Bauch G, Win MZ (2019) Enabling situational awareness in millimeter
wave massive MIMO systems. IEEE J Sel Top Signal Process 13(5):1196–1211
9. Shahmansoori A, Garcia GE, Destino G, Grandos G, Wymeersch H (2015) 5G position and
orientation estimation through millimeter wave MIMO. IEEE Globecom Workshops
10. Leila G, Najjar L (2020) Enhanced cooperative group localization with identification of
LOS/NLOS BSs in 5G dense networks. Ad Hoc Netw 88–96
11. Lin Z, Lv T, Mathiopoulos PT (2018) 3-D indoor positioning for millimeter-wave massive
MIMO systems. IEEE Trans Commun 66(6):2472–2486
12. Lv T, Tan F, Gao H, Yang S (2016) A beamspace approach for 2-D localization of incoherently
distributed sources in massive MIMO systems. Signal Process 30–45
13. Abhishek, Sah AK, Chaturvedi AK (2016) Improved sparsity behaviour and error localization
in detectors for large MIMO systems. IEEE Globecom Workshops
14. Sun X, Gao X, Ye Li G, Han W (2018) Single-site localization based on a new type of fingerprint
for massive MIMO-OFDM systems. IEEE Trans Veh Techn 67(7), 6134–6145
15. Zhang X, Zhu H, Luo X (2018) MIDAR: massive MIMO based detection and ranging. In:
IEEE Global Communication Conference
16. Fedorov A, Zhang H, Chen Y (2018) User localization using random access channel signals
in LTE networks with massive MIMO. In: IEEE 27th International Conference on Computer
Communication and Networks (ICCCN)
17. Wan L, Han G, Shu L, Feng N (2018) The critical patients localization algorithm using sparse
representation for mixed signals in emergency healthcare system. IEEE Syst J 12(1):52–63
18. Prakash VC, Nagarajan G, Ramanathan P (2019) Indoor channel characterization with multiple
hypothesis testing in massive multiple input multiple output. J Comput Theor Nanosci
16(4):1275–1279
19. Prakash VC, Nagarajan G, Batmavady S (2019) Channel analysis for an indoor massive MIMO
mm-wave system. In: International conference on artificial intelligence, smart grid and smart
city applications
20. Prakash VC, Nagarajan G (2019) A hybrid RSS-TOA based localization for distributed indoor
massive MIMO systems. In: International conference on emerging current trends in computing
and expert technology. Springer, Berlin
21. Majed MB, Rahman TA, Aziz OA, Hindia MN, Hanafi E (2018) Channel characterization and
path loss modeling in indoor environment at 4.5, 28 and 38 GHz for 5G cellular networks. Int
J Antennas Propag Hindawi 1–14
22. Dziak, Jachimczyk, Kulesza (2017) IoT-based information system for healthcare
application: design methodology approach. Appl Sci MDPI 7(6):596
23. Park K, Park J, Lee JW (2017) An IoT system for remote monitoring of patients at home. Appl
Sci MDPI 7(3):260
24. Iturri P, Aguirre E, Echarri M, Azpilicueta L, Eguizabal A, Falcone F, Alejos A (2019) Radio
channel characterization in dense forest environments for IoT-5G. Proceedings, MDPI 4(1)
25. Qamar F, Hindia MHDN, Dimyati K, Noordin KA, Majed MB, Rahman TA, Amiri IS (2019)
Investigation of future 5G-IoT Millimeter-wave network performance at 38 GHz for urban
microcell outdoor environment. Electronics, MDPI 8(5):495
26. Tong F, Sun Y, He S (2019) On positioning performance for the narrow-band internet of things:
how participating eNBs impact? IEEE Trans Ind Inf 15(1):423–433
27. Rusli ME, Ali M, Jamil N, Md Din M (2016) An improved indoor positioning algorithm based
on RSSI-trilateration technique for internet of things. In: IOT, International conference on
computer and communication engineering (ICCCE)
28. Macagnano D, Destino G, Abreu G (2014) Indoor positioning: a key enabling technology for
IoT applications. IEEE World Forum on Internet of Things
29. Lee BM, Yang H (2017) Massive MIMO for industrial internet of things in cyber-physical
systems. IEEE Trans Ind Inf 14(6):2641–2652
30. Bana A-S, Carvalho ED, Soret B, Abrao T, Marinello JC, Larsson EG, Popovski P (2019)
Massive MIMO for Internet of Things (IoT) connectivity. Phys Commun 1–17
31. Li J, Ai B, He R, Wang Q, Yang M, Zhang B, Guan K, He D, Zhong Z., Zhou T, Li N (2017)
Indoor massive multiple-input multiple-output channel characterization and performance
evaluation. Front Inf Technol Electr Eng 18(6):773–787
Flip Flop Neural Networks: Modelling
Memory for Efficient Forecasting
Abstract Flip flop circuits can memorize information with the help of their
bi-stable dynamics. Inspired by the flip flop circuits used in digital electronics,
in this work we define a flip flop neuron and construct a neural network endowed
with memory. Flip flop neural networks (FFNNs) function like recurrent neural
networks (RNNs) and are therefore capable of processing temporal information. To
validate the competency of FFNNs on sequential processing, we solve benchmark time
series prediction and classification problems from different domains. Three datasets
are used for time series prediction: (1) household power consumption, (2) flight
passenger prediction and (3) stock price prediction. As an instance of time series
classification, we select the indoor movement classification problem. The FFNN
performance is compared with RNNs consisting of long short-term memory (LSTM) units.
In all the problems, the FFNNs show either superior or near-equal performance
compared to LSTM. Flip flops can also potentially be used for harder sequential
problems, such as action recognition and video understanding.
1 Introduction
Efficient prediction and forecasting of time series data involve capturing patterns
in the history of the data. Feed-forward networks process data in a single instance
and therefore cannot solve time series prediction problems unless data history is
explicitly presented to the network through tapped delay lines or other techniques
for representing temporal features. Alternatively, a neural network with loops can
recognize patterns in the data, integrate information over multiple time steps, and
process temporal data by virtue of its memory property [1–3]. Flip flops are basic
electronic circuits with a memory property. Based on their input conditions, they
can hold on to information through time or simply let it pass through [4]. In this
paper, we show that, using neuron models that emulate electronic flip flops, it is
possible to construct neural networks with excellent temporal processing properties.
It will be demonstrated that such networks show high levels of performance in the
prediction and classification of time series. This paper describes an implementation
of flip flop neural networks for solving benchmark sequential problems and presents
a brief comparison of the results with LSTM-based models.
2 Previous Work
Holla and Chakravarthy [5] described a deep neural network consisting of a hidden
layer of flip flop neurons. The network compared favourably with LSTM and other RNN
models on long-delay decision-making problems. In this paper, we use a variation of
the flip flop neural network described in [5] and apply it to problems pertaining
to the prediction and classification of time series data. The flip flop model is
compared with the popular RNN variant, long short-term memory (LSTM), and the
observations of the comparative study are described in Sect. 3.
Long short-term memory (LSTM) is a popular and dominant variant of RNNs and
one of the most widely employed memory-based units [1, 2]. They are widely used
for sequential and time series-based problems. They have the ability to retain infor-
mation that would be highly discriminative in the final decision-making process
and also exhibit the ability to forget or discard information that contributes less to
the performance of the model. The LSTM operates through a gating mechanism,
introduced predominantly to overcome the catastrophic forgetting and vanishing
gradient issues associated with learning long-term dependencies. Basically, the
task of the gates is to purge information that would only end up serving as noise to
the model and utilize information that would prove to be crucial. This mechanism of
remembering and forgetting information, achieved through the gating mechanism,
is implemented by training the gating parameters over the set of input features. The
input gate of the LSTM decides what new patterns of data will be preserved in the
long-term memory. Thus, the input gate filters the combination of the current
input and the short-term memory and transmits it to the downstream structures. The
Flip Flop Neural Networks: Modelling Memory … 167
forget gate of the LSTM decides which patterns from the long-term memory will be
preserved and which will be discarded, by multiplying the long-term memory
with forget vectors obtained from the current input.
The output gate produces the short-term memory, which the next LSTM cell uses
as the memory from the previous time step.
3 Model Architecture
Flip flops are electrical latch circuits that can store information related to previous
time steps. They have two stable states: one that stores the information of previous
time steps and another that clears the state. The current state of a flip flop depends
on its inputs and its previous state. SR, JK, D and T flip flops are the types
widely used in the field of digital electronics.
We will work with the SR flip flop in our simulation experiments, as it is the
simplest implementation of the bi-stable latch. The JK flip flop is a more generalized
version of the SR flip flop that avoids the undefined state when both inputs
are high. The SR flip flop is a bi-stable latch circuit with two competing
inputs, S and R, which SET and RESET the output, respectively. The output of the circuit at the
current time step (Q_t) will have a value of 0 or 1 depending on the states of the S and R
inputs. The feedback mechanism models memory in this circuitry. Thus, SET (S),
RESET (R) and the output of the previous time step (Q_{t−1}) are given as inputs
to the flip flop at each time step. Table 1 shows the truth table of the simple
bi-stable SR flip flop.
From Table 1, we can see that changes to the inputs (S and R) are crucial in
determining the state at the current time step (Q_t). The equivalent algebraic
(characteristic) equation of the SR flip flop is given below.

Q_t = S + R̄ Q_{t−1}    (1)
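As a quick sanity check, the SR behaviour in Table 1 and Eq. (1) can be reproduced in a few lines of Python (a sketch; S = R = 1 is the forbidden input of the bi-stable latch):

```python
def sr_next(s, r, q_prev):
    """Characteristic equation of the SR flip flop: Q_t = S OR (NOT R AND Q_(t-1))."""
    assert not (s and r), "S = R = 1 is the forbidden input of an SR latch"
    return int(s or ((not r) and q_prev))

# Hold (S = 0, R = 0) keeps the previous state: the memory property.
assert sr_next(0, 0, 1) == 1 and sr_next(0, 0, 0) == 0
# SET forces the output high; RESET forces it low, regardless of the past.
assert sr_next(1, 0, 0) == 1 and sr_next(0, 1, 1) == 0
```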
Since the current output depends on the last state, the SR flip flop has the memory
property. The complete architecture of a flip flop-based neural network is given in
Fig. 1.
The network depicted in Fig. 1 has five layers, of which the third is the flip flop
layer, which plays the role of memory. The input to the flip flop layer from the
previous layer is split into two halves to obtain the set and reset inputs of the flip
flops. The output at the previous time step, Q_{t−1}, is fed back as input to the flip
flops to obtain Q_t, the output at the current step. The final step is the propagation
of the output Q_t through a linear layer and on to the final output layer.
Fig. 1 Flip flop neural network consisting of five layers, where the flip flop layer
consists of five flip flops
The forward propagation of the layers involving conventional neurons is given as

Z = W · X + b    (2)

where 'W' denotes the weights of the network initialized via Xavier initialization, 'X'
is the input data and 'b' is the bias term.
The output obtained after applying the activation function is given as

A = tanh(Z)    (3)
Let N_i be the total number of nodes preceding the flip flops. The set input (X_S) and
reset input (X_R) are obtained by splitting those outputs by parity, X_S = {X[k] : k mod 2 = 0}
and X_R = {X[k] : k mod 2 = 1} for k = 0, …, N_i − 1, where X[k] is the output of the
kth neuron (zero indexed) from the previous layer and mod denotes the modulus function.
The weights projecting from the previous layer to the flip flops are modelled as
one-to-one connections, so the dimension of the layer feeding the flip flop layer
is twice that of the flip flop layer,
S = X S · WS (8)
R = X R · WR (9)
The final state V(t + 1) of the flip flop layer at time step t + 1 is given by the
equation

V(t + 1) = O_FF = S + (1 − S)(1 − R) V(t),    (10)

where V(t) is the previous state of the flip flop layer. Backpropagation through the
flip flop layer is defined as

∂E/∂w_S = (∂E/∂O_FF) · (∂O_FF/∂S) · (∂S/∂w_S)    (11)

∂E/∂w_R = (∂E/∂O_FF) · (∂O_FF/∂R) · (∂R/∂w_R)    (12)

∂O_FF/∂S = 1 − (1 − R) V(t)    (13)

∂O_FF/∂R = −(1 − S) V(t)    (14)

∂S/∂w_S = X_S    (15)

∂R/∂w_R = X_R    (16)
4 Experiments
In the household power consumption problem, the dataset was obtained from Kaggle
[6]; it contains measurements gathered between December 2006 and November
2010, a period of almost four years. It comprises seven features: the global active power,
submetering 1, submetering 2, submetering 3, voltage, global intensity and the global
reactive power. The problem was framed in such a way that the model must predict
the global active power of the future months provided that it is trained on the historic
data of the aforementioned seven features. The architecture followed for the flip flop
model comprises three hidden layers, as in Fig. 1, with dimensions 10, 5 and
10. The input and output layers have 7 and 1 neurons, respectively. The LSTM
model, on the other hand, comprises a hidden layer of size 30. Both models used a
window size of 60 during training to represent the history of the data. The Adam optimizer
is used to optimize the model parameters, and the surrogate loss was calculated using
mean squared error (MSE) as a training criterion for both the models. Figure 2 shows
the predictions obtained through an FFNN and an LSTM network. Table 2 presents
the MSE of the model on the test dataset. From both Fig. 2 and Table 2, it is clearly
evident that the flip flop network is more effective than the LSTM at predicting
the power consumption pattern.
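The 60-step sliding-window framing used for both models can be sketched as follows; the array names and synthetic data are illustrative stand-ins for the seven household features:

```python
import numpy as np

def make_windows(series, window=60):
    """Frame a (T, n_features) series as (samples, window, n_features) inputs
    with next-step targets taken from the first feature (global active power)."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window, 0])
    return np.array(X), np.array(y)

data = np.random.rand(200, 7)      # 200 time steps of 7 features
X, y = make_windows(data)
# X.shape == (140, 60, 7); y.shape == (140,)
```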
Mean Square Error (MSE) = (1/N) Σ_{i=1}^{N} (X_i − Y_i)²
where 'N' is the total number of inputs, X_i is the ground truth value for input i,
and Y_i is the output predicted by the FFNN.
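The criterion is straightforward to compute directly, e.g. with NumPy:

```python
import numpy as np

def mse(x_true, y_pred):
    """Mean squared error between ground truth X and FFNN predictions Y."""
    x_true = np.asarray(x_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((x_true - y_pred) ** 2)

assert mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
assert mse([0.0, 0.0], [1.0, 3.0]) == 5.0   # (1 + 9) / 2
```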
Fig. 2 Predictions done by flip flop network and LSTM on the power consumption test dataset
The international airline passenger dataset obtained through Kaggle [7] contains the
number of passengers travelling internationally each month. The task is to predict
this univariate time series, i.e., the number of passengers that would travel in
the subsequent months. The architecture of the flip flop network is similar to that
used for the power consumption dataset, except that the input layer has a single neuron;
the LSTM consists of 20 hidden units. The same loss function and optimizer setup
from the previous experiment were used. From Fig. 3 and Table 3, it can be readily
Fig. 3 Depicts the prediction given by the flip flop network and LSTM on the test data of flight
passenger dataset
172 S. Sujith Kumar et al.
concluded that the flip flop network clearly outperforms LSTM and is more effective
in capturing the relevant temporal information from the history to predict accurately.
We have taken the Apple stock price dataset for the stock price prediction experiment,
which consists of Apple's stock prices from January 2010 to February 2020 [8].
This is a multivariate time series prediction problem with four features: open,
high, low and close. The task of the model is to predict the opening prices of the stock
on future days (the test set) after training on the past data. The FFNN used for this
dataset has input and output layers of dimensions 4 and 1, respectively,
and hidden layers with the same numbers of neurons as in the previous
two experiments, whereas the LSTM is modelled with 30 hidden units. A window size
of 60 days is used to capture the history. Figure 4 shows the predictions made for the
subsequent 1200 days by flip flop network and LSTM on test data. Table 4 presents
the MSE on the test data for both models. Although the FFNN predicts the correct
pattern of the stock's opening price, this time the predictions are not as accurate as
those of the LSTM, and the LSTM predictions are also less noisy than those of the
FFNN.
Fig. 4 Depicts the predictions given by the flip flop model and LSTM on the Apple stock test dataset
'Indoor user movement' is a benchmark dataset for time series classification,
retrieved from the UCI repository [9]. The dataset was collected by placing
four wireless sensors in an environment with a moving subject; as the subject
moved, the wireless sensors recorded a series of signal strengths over
time. Depending on the recorded signal strengths from the sensors, the movement
was binary classified as −1 or +1, where −1 and +1 represent no transition and
a transition between rooms, respectively. The architecture utilized for the FFNN
has 4 neurons and 1 neuron in the input and output layers, respectively, with
the hidden layers unchanged from the previous experiments. Further, the LSTM has
30 hidden units; binary cross-entropy (BCE) loss is used as the empirical loss
during training. A window size of 70 was used to look back at previous time
steps during the training and validation phases. Figure 5 shows the validation accuracy
at every 10 epochs on the validation dataset, which is shuffled and split in a
70:30 ratio from the original dataset. At the end of training, the FFNN
attained an accuracy of 91.01%, whereas the LSTM reached a lower accuracy of 88.06%.
5 Conclusion
Flip flops modelled as neural networks prove to be an effective way of retaining
previous patterns in memory and utilizing them for predictions at future
time steps. Experiments applying FFNNs to time series prediction
and classification show that flip flop models give performance that is comparable,
if not superior, to that of LSTMs, which are the current state-of-the-art
models for solving temporal problems. The application of flip flops can also be extended
to more complex domains such as scene analysis and video analysis and understanding.
References
1. Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term
memory (LSTM) network. Phys D Nonlinear Phenomena 404:132306
2. Santhanam S (2020) Context based text-generation using LSTM networks. arXiv preprint arXiv:
2005.00048
3. Wu W et al (2019) Using gated recurrent unit network to forecast short-term load considering
impact of electricity price. Energy Procedia 158:3369–3374
4. Chakrabarty R et al (2018) A novel design of flip-flop circuits using quantum dot cellular
automata (QCA). In: 2018 IEEE 8th annual computing and communication workshop and
conference (CCWC). IEEE
5. Pawan Holla F, Chakravarthy S (2016) Decision making with long delays using networks of
flip-flop neurons. In: 2016 International joint conference on neural networks (IJCNN), pp 2767–
2773
6. UCI Machine Learning (2016) Household electric power consumption, Version 1, Aug 2016.
Retrieved from www.kaggle.com/uciml/electric-power-consumption-data-set/metadata
7. Andreazzini D (2017) International airline passengers, Version 1, June 2017. Retrieved from
www.kaggle.com/andreazzini/international-airline-passengers/metadata
8. Nandakumar R, Uttamraj KR, Vishal R, Lokeshwari YV (2018) Stock price prediction using
long short-term memory. Int Res J Eng Technol (IRJET) 3362–338
9. Bacciu D, Barsocchi P, Chessa S et al (2014) An experimental characterization of reservoir
computing in ambient assisted living applications. Neural Comput Appl 24:1451–1464. https://
doi.org/10.1007/s00521-013-1364-4
Wireless Communication Systems
Selection Relay-Based RF-VLC
Underwater Communication System
1 Introduction
geographical data collection [6]. Furthermore, the VLC link has shown superiority over
existing traditional wireless candidates underwater, since a VLC setup is easier
to install and very cost effective for deploying various underwater applications
over short distances [7].
VLC has attracted research interest for communication purposes and opens the door
to future opportunities for signal transmission over long distances.
Hence, VLC technology has drawn the attention of many researchers worldwide,
mainly due to its potential for next-generation communication systems (i.e.,
5G and 6G). Additionally, VLC technology based on light emitting diodes (LEDs)
plays a major role in broadband wireless communication technology nowadays.
In terrestrial communication, VLC has attracted research interest toward high
data rates for various deployable applications. It is a solution to the increasing demand
for high data traffic and an alternative communication medium for indoor applications. It
shows high performance, especially for indoor wireless communication using LED
lamps. LEDs also have many advantages: low electrical power consumption, tiny
size, reliability over a long lifetime, cost effectiveness and very low heat
radiation [8]. Additionally, current VLC technology possesses various other merits,
such as freedom from electromagnetic interference and radiation, with highly secure,
low-latency data transmission. Another approach is to complement
VLC with an FSO link to enable high data rates in multimedia
services, improving system quality of service (QoS) and performance [9].
RF communication is used as a potential wireless signal carrier over long terrestrial
distances. A dual-hop hybrid communication link is therefore investigated for
improving system quality over long ranges under different channel conditions and
requirements. Combined RF-underwater optical communication (UWOC) systems
have been proposed in the literature [10–12]. Although turbulence and pointing
error phenomena are widely addressed elsewhere, few works cover them for UVLC. In
[13], the authors investigated a vertical UVLC system model using the Gamma-Gamma
probability distribution, assumed strong turbulence conditions of the water
channel and formulated a closed-form expression for the BER performance at the
undersea destination. Similarly, in [14], the authors designed a
UVLC system over the log-normal distribution and derived closed-form expressions for
the asymptotic BER performance. Another impressive study proposed a
multi-input multi-output (MIMO)-based UVLC channel model
and analyzed the diversity gain of the system in the presence of the turbulence
properties of the aqueous medium [15]. Throughout this work, we investigate a combined
hybrid link with two different communication hops for information transmission under
different channel conditions.
This study focuses on the underwater VLC link. Underwater channels make
communication setups highly complex to deploy and constrain the choice of modulation
techniques. Calculating the BER performance with the on-off-keying (OOK)
modulation technique keeps the system performance analysis simpler than
higher-order modulation techniques do. Motivated by this, we investigate the
suitable relay for RF-VLC hybrid dual-hop communication under strong channel
conditions along with misalignment of transceivers. Additionally, we compare the
180 M. Furqan Ali et al.
BER performance of the RF-UVLC link, considering a Nakagami-m fading factor
for the RF hop and a VLC link impaired by strong turbulence channel conditions, for
amplify-and-forward (AF) and decode-and-forward (DF) relay protocols in different types
of water media. To the best of our knowledge, only a few
studies in the literature have investigated VLC signaling in different water mediums. In this
regard, the main contribution of this work is to propose a cooperative
hybrid relay RF-UVLC communication system model in highly turbid water channel
conditions with different relay protocols.
The proposed system model is a hybrid RF-UVLC link in which a single-antenna
source node s broadcasts a signal and communicates with an underwater destination
node d through an AF relay node r equipped with two antennas for reception.
We consider signal transmission through the AF relay (see Fig. 1)
and evaluate the system performance. In a second configuration, a DF relay
assists the transmission of information, and we determine the more suitable relay protocol
for the hybrid RF-UVLC system in different waters. The RF link is modeled by Nakagami-m
distributed fading, while the VLC link is modeled by Gamma-Gamma and
exponential-Gaussian distributed random variables. Moreover, the relay has
two directional antennas: one toward the source for receiving the signal
via the RF link and the other toward the destination, responsible for transmitting
information to the underwater destination through the VLC link. The AF relay
node receives the signal from s, amplifies it with a fixed gain factor and then forwards
it to the undersea destination d; the whole system concept is depicted in Fig. 1. The DF
relay, in contrast, receives the signal from s, regenerates it and then forwards it to the
undersea destination d. It is noteworthy that s and d are each mounted with a single
antenna, and the whole system works in half-duplex mode.
Fig. 1 Proposed system model of dual-hop hybrid cooperative RF-VLC underwater wireless
communication, where the source communicates with the destination through a relay in different
communication links along with different channel conditions
where x denotes the transmitted information signal and n_sr models the noise, taken
as additive white Gaussian noise (AWGN) with zero mean and variance σ_sr².
Moreover, h_sr models the channel coefficient of the source-relay link s − r,
which follows the Nakagami-m distribution, with w, z ∈ {s, r, d}. Similarly,
the distance between communication nodes is represented by d_wz. In addition, the
The average SNR for the s − r link is γ_sr = γ̄_sr |h_sr|² and can be expressed via

γ̄_sr = E_sr / σ_sr².    (3)
= E_sr E_rd η_rs ρ h_sr h_rd x + E_rd η_rs ρ h_rd n_sr + n_rd,    (8)

in which P_s and P_n denote the received signal power and additive noise power,
respectively (the first term of (8) carries the signal, the remaining terms the noise).
Thus, the SNR at the destination for the r − d link can be written as

γ_rd = (E_rd E_sr d_sr^(−t) η_rs² |h_sr|² |h_rd|²) / (E_rd η_rs² |h_rd|² σ_sr² + E_sr² d_sr^(−t) |h_sr|² σ_rd² + σ_sr² σ_rd²).    (9)
The optical link suffers due to the physio-chemical properties of the water channel and
colored dissolved organic matter (CDOM). Additionally, the suspended small-scale
and large-scale particles are also responsible for optical signal fading. The VLC signal
is directly affected by absorption and scattering phenomena in the underwater
environment. In our investigated system, considering the r − d link, the path loss
is modeled using the extinction coefficient c(λ), which is the sum of the
absorption a(λ) and scattering b(λ) coefficients. The expected numerical values of
a(λ), b(λ) and c(λ) used in the simulation results for different waters are given in Table
1. The extinction coefficient, which varies according to the type of water, is
described as
c(λ) = a(λ) + b(λ), (10)
If the r and d nodes are separated by a given vertical distance d_t and the Beer-Lambert
expression is adopted, the path loss of the UVLC link is given by [21]

h_l = exp(−c(λ) d_t).    (11)
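Equations (10)-(11) are simple to evaluate directly. The absorption/scattering pairs below are commonly cited illustrative values in 1/m for different water types, not necessarily the exact entries of Table 1:

```python
import math

def path_loss(a, b, d_t):
    """Beer-Lambert path loss of the UVLC link, Eqs. (10)-(11)."""
    c = a + b                      # extinction = absorption + scattering
    return math.exp(-c * d_t)

# Illustrative (a, b) pairs in 1/m, evaluated over a 10 m vertical link.
waters = {"pure sea": (0.053, 0.003),
          "clear ocean": (0.114, 0.037),
          "coastal ocean": (0.179, 0.219)}
for name, (a, b) in waters.items():
    print(f"{name}: h_l = {path_loss(a, b, 10.0):.3e}")
```

As expected, the loss grows rapidly with turbidity: the more scattering the water, the smaller h_l for the same depth.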
The VLC link requires proper alignment of the beam. Additionally, a necessary
condition of the link arrangement is that the receiver should be in the field of view
(FOV) for proper signal transmission. The modified channel attenuation coefficient
is described in terms of path loss and geometrical losses. The geometrical losses
depend on the physical constraints of the setup, i.e., the aperture diameter, the full-width
transmitter beam divergence angle and a correction coefficient. If the
signal is transmitted through a collimated light source, such as a laser diode, the
geometrical losses are negligible and, as a consequence, the signal depends only
on the path loss. The geometrical losses are taken into account for
diffused and semi-collimated sources, i.e., LEDs and diffused LDs [22]. Thus, the
overall attenuation of the optical link in terms of path loss and geometrical losses is
described as [23]
h_l ≈ h_pl h_gl ≈ (D_r/θ_F)² d_t^(−2) exp(−c (D_r/θ_F)^τ d_t^(1−τ)),    (12)
where D_r, θ_F and τ represent the receiver aperture diameter, the full-width transmitter
beam divergence angle and the correction coefficient, while h_pl and h_gl are the path loss
and geometrical losses, respectively.
As the proposed system model pays greater attention to the underwater VLC link,
complex channel conditions on the RF link are excluded from the scope of this
research. Therefore, the s − r link is simply modeled by Nakagami-m fading.
Furthermore, the VLC link is modeled considering heavy turbulence channel conditions
combined with pointing errors. According to [24], the VLC link under strong channel
conditions follows the Gamma-Gamma probability distribution and can be expressed
as

f_{h_t}(h_t) = [2 (α_rd β_rd)^((α_rd+β_rd)/2) / (Γ(α_rd) Γ(β_rd))] h_t^((α_rd+β_rd)/2 − 1) K_{α_rd−β_rd}(2 √(α_rd β_rd h_t)),    (13)
where Γ(·) is the Gamma function and the modified Bessel function of the second kind
is denoted by K_{α_rd−β_rd}(·). The large-scale parameter α_rd and small-scale parameter β_rd are,
respectively, given by Elamassie et al. [13] as
α_rd = [exp(0.49 σ_ht² / (1 + 0.56 (1 − Θ) σ_ht^(12/5))^(7/6)) − 1]^(−1)    (14)

β_rd = [exp(0.51 σ_ht² / (1 + 0.69 σ_ht^(12/5))^(5/6)) − 1]^(−1)    (15)
In (14) and (15), σ_ht² is the scintillation index for the plane-wave model, known as the
Rytov variance. The Rytov variance can be defined as 1.23 C_n² k^(7/6) L^(11/6), where
k = 2π/λ is the wave number, C_n² is the refractive-index structure parameter and L is the
corresponding link length.
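Given a Rytov variance σ_ht², the parameters of (14) and (15) can be computed as below (a sketch assuming the plane-wave forms with exponents 12/5, 7/6 and 5/6; the beam parameter Θ is set to 0 here):

```python
import math

def gg_params(sigma2, theta=0.0):
    """Large-scale alpha and small-scale beta of the Gamma-Gamma model,
    cf. Eqs. (14)-(15), for Rytov variance sigma2 (plane wave)."""
    s = math.sqrt(sigma2)          # sigma_ht, so s ** 2.4 == sigma_ht ** (12/5)
    alpha = 1.0 / (math.exp(
        0.49 * sigma2 / (1.0 + 0.56 * (1.0 - theta) * s ** 2.4) ** (7.0 / 6.0)) - 1.0)
    beta = 1.0 / (math.exp(
        0.51 * sigma2 / (1.0 + 0.69 * s ** 2.4) ** (5.0 / 6.0)) - 1.0)
    return alpha, beta

alpha, beta = gg_params(1.0)       # roughly alpha ≈ 2.95, beta ≈ 2.56
```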
The variation of the α_rd and β_rd parameters is shown in Fig. 2. As the scintillation
index increases, the parameters first decrease; at large scintillation indices, the α_rd
parameter grows roughly exponentially compared with the β_rd parameter.
Fig. 2 Variation of the large-scale (α_rd) and small-scale (β_rd) factors with the log intensity variance
In the proposed dual-hop communication system model, a single-carrier OOK
modulation technique is used to transmit information, and the BER of the received
signal is calculated based on (5) and (6). In the RF-UVLC hybrid communication
link, the instantaneous SNR of the whole system at the destination employing the AF
protocol with a fixed-gain relay can be calculated as

γ_d = γ_sr γ_rd / (γ_rd + C),    (17)

where C denotes the fixed-gain amplifying constant. The overall system BER
for the OOK modulation technique over an AWGN channel can be calculated
as [25]

BER_d = Q(√(γ_d / 2)).    (18)
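Equations (17)-(18) map the per-hop SNRs to the destination BER; the Q-function is written here via the complementary error function. The gain constant C and the SNR values are illustrative:

```python
import math

def q_func(x):
    """Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ook_ber(gamma_sr, gamma_rd, C=1.0):
    """End-to-end OOK BER of the fixed-gain AF link, Eqs. (17)-(18)."""
    gamma_d = gamma_sr * gamma_rd / (gamma_rd + C)   # Eq. (17)
    return q_func(math.sqrt(gamma_d / 2.0))          # Eq. (18)

# BER falls monotonically as either hop's SNR improves.
assert ook_ber(100.0, 100.0) < ook_ber(10.0, 10.0) < 0.5
```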
4 Numerical Results
This section covers the numerical analysis of the BER of the proposed dual-hop hybrid
RF/UVLC system model considering distinct water mediums. Unless otherwise stated,
the physical constraints of the setup are the photo-detector aperture diameter D_r, the full-width
transmitter beam divergence angle θ_F, the distance between base station and relay
d_sr and the vertical depth d_t of the destination from the sea surface. The numeric values
used in the simulation are summarized in Table 2. We aim to calculate the BER performance
at the destination, which is located vertically in the underwater environment, as depicted
in Fig. 1. To simulate the results, we calculate α_rd and β_rd when the ocean water
temperature and salinity vary. In our simulation, α_rd and β_rd are taken at a water
temperature of 5 °C and a salinity of 20 practical salinity units (PSU). The corresponding
values used in the simulation are summarized in Table 2.
Fig. 3 BER versus SNR (dB)
Fig. 4 BER versus SNR (dB)
Fig. 5 Detailed comparison of both AF and DF relayed BER performance of RF-VLC hybrid
communication link in different water mediums
communication link. In clear ocean water, comparatively, the AF-relayed RF-VLC
combination achieves better BER performance at low SNR values.
A more detailed comparison between the AF- and DF-relayed RF-VLC hybrid
communication links in different waters is depicted in Fig. 5. It is clearly seen that
both relayed links perform better in pure seawater under strong channel conditions at
low SNR values, whereas the BER performance in coastal ocean water is the poorest
for both combined communication links.
The comparison of BER performance in different waters for the dual-hop
communication link with pointing errors is depicted in Fig. 6, where a more detailed
comparison is summarized for both relayed configurations. In Fig. 6, the performances
of the AF and DF relays are analyzed in highly turbid channel conditions along with
pointing error impairments. The AF relay clearly shows superior BER performance
to the DF relay under all types of channel conditions at low SNR, while the DF relay
shows almost the same performance only at relatively higher SNR. Thus, achieving
low BER in different water mediums at high SNR, the RF-UVLC link performs well
as a combined hybrid wireless communication candidate.
Fig. 6 Detailed comparison of both AF and DF relayed BER performance of RF-VLC hybrid
communication link in different water mediums considering only pointing error
5 Conclusion
Acknowledgements This work was funded by the framework of the Competitiveness Enhancement
Program of the National Research Tomsk Polytechnic University grant No. VIU-ISHITR-180/2020.
References
1. Ali MF, Jayakody DNK, Chursin YA, Affes S, Dmitry S (2019) Recent advances and future
directions on underwater wireless communications. Arch Comput Methods Eng, 1–34
2. Ali MF, Jayakody NK, Perera TDP, Krikdis I (2019) Underwater communications: recent
advances. In: ETIC2019 international conference on emerging technologies of information
and communications (ETIC), pp 1–6
3. Zeng Z, Fu S, Zhang H, Dong Y, Cheng J (2016) A survey of underwater optical wireless
communications. IEEE Commun Surv Tutor 19(1):204–238
4. Dautta M, Hasan MI (2017) Underwater vehicle communication using electromagnetic fields
in shallow seas. In: 2017 international conference on electrical, computer and communication
engineering (ECCE). IEEE, pp 38–43
5. Kaushal H, Kaddoum G (2016) Underwater optical wireless communication. IEEE. Access
4:1518–1547
6. Awan KM, Shah PA, Iqbal K, Gillani S, Ahmad W, Nam Y (2019) Underwater wireless sensor
networks: a review of recent issues and challenges. Wirel Commun Mobile Comput
7. Majumdar AK (2014) Advanced free space optics (FSO): a systems approach, vol 186. Springer,
Berlin
8. Singh S, Kakamanshadi G, Gupta S (2015) Visible light communication-an emerging wire-
less communication technology. In: 2015 2nd international conference on recent advances in
engineering & computational sciences (RAECS). IEEE, pp 1–3
9. Gupta A, Sharma N, Garg P, Alouini M-S (2017) Cascaded fso-vlc communication system.
IEEE Wirel Commun Lett 6(6):810–813
10. Zhang J, Dai L, Zhang Y, Wang Z (2015) Unified performance analysis of mixed
radio frequency/free-space optical dual-hop transmission systems. J Lightwave Technol
33(11):2286–2293
11. Ansari IS, Yilmaz F, Alouini M-S (2013) Impact of pointing errors on the performance of
mixed rf/fso dual-hop transmission systems. IEEE Wirel Commun Lett 2(3):351–354
12. Charles JR, Hoppe DJ, Sehic A (2011) Hybrid rf/optical communication terminal with spherical
primary optics for optical reception. In: 2011 international conference on space optical systems
and applications (ICSOS). IEEE, pp 171–179
13. Elamassie M, Sait SM, Uysal M (2018) Underwater visible light communications in cascaded
gamma-gamma turbulence. In: IEEE globecom workshops (GC Wkshps). IEEE, 1–6
14. Elamassie M, Al-Nahhal M, Kizilirmak RC, Uysal M (2019) Transmit laser selection for under-
water visible light communication systems. In: IEEE 30th annual international symposium on
personal, indoor and mobile radio communications (PIMRC). IEEE, 1–6
15. Yilmaz A, Elamassie M, Uysal M (2019) Diversity gain analysis of underwater vertical mimo
vlc links in the presence of turbulence. In: 2019 IEEE international black sea conference on
communications and networking (BlackSeaCom). IEEE, pp 1–6
16. Illi E, El Bouanani F, Da Costa DB, Ayoub F, Dias US (2018) Dual-hop mixed rf-uow
communication system: a phy security analysis. IEEE Access 6:55345–55360
17. Elamassie M, Uysal M (2019) Vertical underwater vlc links over cascaded gamma-gamma
turbulence channels with pointing errors. In: 2019 IEEE international black sea conference on
communications and networking (BlackSeaCom). IEEE, pp 1–5
18. Farid AA, Hranilovic S (2007) Outage capacity optimization for free-space optical links with
pointing errors. J Lightwave Technol 25(7):1702–1710
19. Song X, Cheng J (2012) Optical communication using subcarrier intensity modulation in strong
atmospheric turbulence. J Lightwave Technol 30(22):3484–3493
20. Hanson F, Radic S (2008) High bandwidth underwater optical communication. Appl Opt
47(2):277–283
21. Mobley CD, Gentili B, Gordon HR, Jin Z, Kattawar GW, Morel A, Reinersman P, Stamnes
K, Stavn RH (1993) Comparison of numerical models for computing underwater light fields.
Appl Opt 32(36):7484–7504
22. Elamassie M, Uysal M (2018) Performance characterization of vertical underwater vlc links
in the presence of turbulence. In: 11th international symposium on communication systems,
networks & digital signal processing (CSNDSP). IEEE, pp 1–6
23. Elamassie M, Miramirkhani F, Uysal M (2018) Channel modeling and performance charac-
terization of underwater visible light communications. In: 2018 IEEE international conference
on communications workshops (ICC workshops). IEEE, pp 1–5
24. Sandalidis HG, Tsiftsis TA, Karagiannidis GK (2009) Optical wireless communications with
heterodyne detection over turbulence channels with pointing errors. J Lightwave Technol
27(20):4440–4445
25. Grubor J, Randel S, Langer K-D, Walewski JW (2008) Broadband information broadcasting
using led-based interior lighting. J Lightwave Technol 26(24):3883–3892
Circular Polarized Octal Band CPW-Fed
Antenna Using Theory of Characteristic
Mode for Wireless Communication
Applications
Reshmi Dhara
1 Introduction
Presently, printed monopole antennas are widely utilized because of their many
attractive features, such as omnidirectional radiation patterns, wide impedance
bandwidth, ease of fabrication, low cost and light weight. Moreover, monopole
antennas are well matched with the integrated circuitry of wireless communication
R. Dhara (B)
Department of Electronics and Communication Engineering, National Institute of Technology
Sikkim, Ravangla 737139, India
e-mail: [email protected]
devices because of their easy feed techniques. Additionally, most monopole
antennas aim to support linearly polarized (LP) radiation. Utilizing
CP antennas is more beneficial for creating and receiving CP EM waves, as they are
comparatively less dependent on exact positioning. CP is habitually produced by
stimulating two nearly degenerate orthogonal resonant modes of equal amplitude.
So, if CP is generated by a monopole antenna, its performance may improve
significantly. CP antennas can provide polarization diversity by creating both left-hand
circular polarization (LHCP) and right-hand circular polarization (RHCP).
Circular polarization can be generated by a single feed with a slotted loop for L-band communication [1]. There, a fairly large antenna achieved an IBW and ARBW of 11.1% each (140 MHz, f_c = 1.262 GHz). A rectangular microstrip antenna of size 24 × 16 × 1.5875 mm³ with a slotted ground plane attained linearly polarized (LP) IBWs of 5.125–5.395 and 5.725–5.985 GHz [2]. A planar antenna with a large footprint of 30 × 30 × 1.6 mm³ achieved an LP IBW of 220 MHz, i.e., 8.9% [3]. Another large triple-strip antenna of size 50 × 50 × 1.6 mm³ gave dual ARBWs of 70 MHz at the lower band (1.57 GHz) and 60 MHz at the upper band (2.33 GHz) within an IBW spanning 1.43–3.29 GHz [4]. Another design concept for obtaining both wide IBW and wide ARBW is discussed in Ref. [5].
The designs cited above produce only dual or triple CP bands with relatively narrow IBW. The antenna presented in this paper achieves a much wider IBW and eight CP bands, more than earlier reports, with a very simple structure. The proposed antenna, which uses a CPW feed, generates an ultra-wideband response with good impedance matching over a wide frequency range and excites multiple CP bands. It simultaneously offers reasonably high gain, wider bandwidth, and multi-CP characteristics in comparison with the antennas cited above. Antenna designs generating multiple CP bands motivated us to focus this work on a compact antenna providing eight or more CP bands.
However, theory of characteristic modes (TCM) analysis is lacking in previously reported literature on wideband/UWB antennas. Here, TCM tools [6] are also used to analyze the broad impedance band and octal CP band response of the proposed CP antenna.
In this paper, the proposed antenna is designed using Eqs. (i)–(viii), following related existing designs [7–9]. The primary goal of this work was to design a multi-CP-band antenna for small-form-factor devices: a compact, single-fed, circularly polarized planar monopole supporting eight CP bands, which removes the need for multiple circularly polarized antennas.
The implemented antenna is designed with 1.5 GHz as the theoretical lower resonant frequency so that it covers all of the Wi-Fi, WLAN, and UWB bands. After optimization, the simulated impedance bandwidth of the proposed antenna still begins at 1.5 GHz, but with smaller dimensions than the theoretical size, an excellent result that fulfills the miniaturization criterion. The designed antenna provides octal-band CP characteristics in addition to a broad IBW. A hexagonal ring connected to an annular ring at its left corner gives wide CP bands (AR ≤ 3 dB) inside the IBW range. To the best of our knowledge, this is among the best results achieved in related studies. An FR4-epoxy substrate is
used here, which introduces additional complications beyond 12 GHz; this restricts the proposed antenna to applications within the microwave band. Simulation was carried out with ANSYS Electronics Desktop 2020R1. The simulated IBW of the proposed antenna spans from 1.5 GHz to beyond 14 GHz. In addition, the simulated ARBWs of the eight bands are 310 MHz (3.13–3.34 GHz), 310 MHz (6.45–6.76 GHz), 40 MHz (8.08–8.12 GHz), 120 MHz (8.63–8.74 GHz), 180 MHz (9.49–9.67 GHz), 30 MHz (11.69–11.72 GHz), 40 MHz (12.19–12.23 GHz), and 140 MHz (12.57–12.71 GHz). The antenna size is 55 × 56 × 1.6 mm³, and a 23.24% size reduction is achieved.
The paper is organized as follows: Sect. 2 presents the theory of characteristic modes analysis; Sect. 3 the antenna design procedure; Sect. 4 the experimental results and discussion; and Sect. 5 the conclusion.
2 Theory of Characteristic Modes Analysis

This section demonstrates the characteristic mode analysis (CMA) of the implemented antenna. Figure 1 shows the CMA of this octal-band circularly polarized antenna. Figure 1a presents the implemented antenna configuration and the eigenvalue versus frequency plot of the fundamental characteristic modes. Modes 2, 3, 4, 6, 7, 8, and 10 are dominant modes, with eigenvalues close to zero (λ_n ≈ 0); no mode is inductive (very high λ_n > 0); and modes 1, 5, and 9 are capacitive modes (λ_n < 0).
Figure 1b presents the characteristic angle versus frequency plot of the same modes. Modes 2, 3, 4, 7, 6, 8, and 10 cross the 180° line at resonant frequencies of 12.36, 11.8, 10.65, 9.93, 8.27, 8.09, and 3.90 GHz, respectively, and are therefore dominant, whereas modes 1, 5, and 9 do not cross the 180° line and are non-resonant.
Similarly, Fig. 1c shows that the modal significance is close to 1 at the resonant frequencies of modes 2, 3, 4, 7, 6, 8, and 10, while the modal significance of modes 1, 5, and 9 stays below 0.43, so these modes are non-resonant.
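The eigenvalue, characteristic-angle, and modal-significance criteria used above are connected by the standard TCM relations; a minimal sketch follows, with illustrative eigenvalues rather than the simulated values:

```python
import numpy as np

# Modal significance from characteristic-mode eigenvalues:
# MS_n = |1 / (1 + j*lambda_n)|. A mode is near resonance (dominant)
# when MS_n -> 1, i.e., lambda_n -> 0; lambda_n > 0 indicates an
# inductive mode and lambda_n < 0 a capacitive mode.
def modal_significance(eigenvalues):
    lam = np.asarray(eigenvalues, dtype=float)
    return np.abs(1.0 / (1.0 + 1j * lam))

# Characteristic angle: alpha_n = 180 deg - arctan(lambda_n);
# resonance corresponds to the 180 deg crossing seen in Fig. 1b.
def characteristic_angle_deg(eigenvalues):
    lam = np.asarray(eigenvalues, dtype=float)
    return 180.0 - np.degrees(np.arctan(lam))

lam = [0.0, 2.5, -3.0]            # illustrative eigenvalues, not simulation data
print(modal_significance(lam))     # the lambda = 0 mode gives MS = 1.0
print(characteristic_angle_deg(lam))
```

For the zero eigenvalue the functions return MS = 1 and a 180° characteristic angle, matching the dominant-mode criteria of Fig. 1.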
Existing modes for generation of the octal CP bands: It is well known that a resonant CP response requires two degenerate orthogonal modes of equal amplitude with a 90° phase difference. The CMA here is performed on the antenna without the feeding structure; the substrate and ground plane are considered infinite, and the radiator is modeled as a zero-thickness perfect electric conductor (PEC) [10, 11].
Figure 2a–h depicts the modal far-field radiation patterns of the radiator at the CP resonant frequencies. Figure 2a shows that modes 10 and 8 are the fundamental modes in the x- and y-directions, respectively, and radiate in the +z-direction at f_c1 = 3.2 GHz. Circular polarization is produced because these two orthogonal modes have a 90° phase difference.
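The AR ≤ 3 dB criterion applied throughout this paper follows from the polarization ellipse formed by two orthogonal field components. A small sketch of that standard relation (the amplitudes and phases below are illustrative, not taken from the simulations):

```python
import numpy as np

# Axial ratio (in dB) of the wave formed by two orthogonal modes with
# field amplitudes a, b and phase difference delta (standard
# polarization-ellipse result). AR = 0 dB is ideal CP; an infinite AR
# corresponds to linear polarization.
def axial_ratio_db(a, b, delta_rad):
    s = np.sqrt(a**4 + b**4 + 2 * a**2 * b**2 * np.cos(2 * delta_rad))
    num = a**2 + b**2 + s
    den = a**2 + b**2 - s
    if den <= 0:                      # degenerate case: purely linear polarization
        return np.inf
    return 10 * np.log10(num / den)   # equals 20*log10(major/minor axis ratio)

print(axial_ratio_db(1.0, 1.0, np.pi / 2))   # equal amplitudes, 90 deg: 0.0 dB
print(axial_ratio_db(1.0, 0.8, np.pi / 2))   # ~1.9 dB, still inside AR <= 3 dB
```

Equal amplitudes with exactly 90° of phase give 0 dB, while moderate amplitude imbalance still satisfies the 3 dB criterion, which is why near-degenerate mode pairs yield usable CP bands.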
Figure 2b shows that modes 10 and 6 are the fundamental modes in the y- and x-directions, respectively, and radiate in the +z-direction at f_c2 = 6.6 GHz. CP is produced because this pair of orthogonal modes has a 90° phase difference. Modes 8 and 7 lead to cancelation of the electric field in the far-field zone in the +z-direction.

Fig. 1 TCM analysis of seven modes: a eigenvalues, b characteristic angle, and c modal significance

Fig. 2 Modal current distributions and modal far-field radiation patterns at the CP resonant frequencies, including (f) f_c6 = 11.7 GHz, (g) f_c7 = 12.2 GHz, and (h) f_c8 = 12.6 GHz
Figure 2c shows that modes 8 and 6 are the dominant modes in the y- and x-directions, respectively, and radiate in the +z-direction at f_c3 = 8.1 GHz. CP is formed because this pair of modes has a 90° phase difference. Modes 10, 7, 4, and 3 lead to cancelation of the electric field in the far-field zone in the +z-direction.
Figure 2d shows that modes 7 and 6 are the fundamental modes in the y- and x-directions, respectively, and radiate in the +z-direction at f_c4 = 8.7 GHz. Again, CP is produced because these two orthogonal modes have a 90° phase difference. Modes 10, 8, 4, and 3 lead to cancelation of the electric field in the far-field zone in the +z-direction.
Figure 2e shows that modes 7 and 6 are the fundamental modes in the y- and x-directions, respectively, and radiate in the +z-direction at f_c5 = 9.5 GHz. Here also, CP is produced because the two orthogonal modes differ in phase by 90°. Modes 10, 8, 4, and 3 lead to cancelation of the electric field in the far-field zone in the +z-direction.
Figure 2f shows that modes 4 and 3 are the fundamental modes in the y- and x-directions, respectively, and radiate in the +z-direction at f_c6 = 11.7 GHz. Circular polarization is produced because these two orthogonal modes have a 90° phase difference. Modes 10, 8, 7, 6, and 2 lead to cancelation of the electric field in the far-field zone in the +z-direction.
Figure 2g shows that modes 3 and 2 are the fundamental modes in the x- and y-directions, respectively, and radiate in the +z-direction at f_c7 = 12.2 GHz. Circular polarization is produced because these two orthogonal modes have a 90° phase difference. Modes 10, 8, 7, 6, and 4 lead to cancelation of the electric field in the far-field zone in the +z-direction.
Figure 2h shows that modes 7 and 3 are the fundamental modes in the x- and y-directions, respectively, and radiate in the +z-direction at f_c8 = 12.6 GHz. Circular polarization is produced because these two orthogonal modes have a 90° phase difference. Modes 10, 8, 6, 4, and 2 lead to cancelation of the electric field in the far-field zone in the +z-direction.
3 Antenna Design Procedure

A. Antenna configuration
The antenna geometry is shown in Fig. 3. The antenna is built on an FR4-epoxy substrate with relative permittivity ε_r = 4.4 and loss tangent tan δ = 0.02. The overall size of the antenna is 55 × 56 × 1.6 mm³. As Fig. 3 demonstrates, a 50 Ω feed line of length L_f and width W_f is coupled to an impedance transformer. The hexagonal ring monopole is joined at its left corner to an annular ring, and the width W_s is the same for both rings. As an alternative to a multi-fed structure, the designed single-fed antenna is based on a dual-loop monopole that requires no additional feeding parts to obtain a 90° phase difference between the two orthogonal polarized modes. In evaluating the antenna's performance, the radiator can be examined as a combination of two perturbed rings; this perturbation causes CP radiation in the desired bands. To increase the CP bandwidth, the monopole is shifted from the center of the feed line to the left. For a fixed antenna size, W_s, d, W_sub, and L_1 are the key parameters affecting the bandwidth of the proposed antenna.
B. Design of the antenna parameters at the resonant frequency (f_r1) of 1.5 GHz [7–9]:
Taking the dielectric constant ε_r = 4.4 and thickness h = 1.6 mm of the FR4-epoxy substrate, and the resonant frequency f_r1 = 1.5 GHz, the conventional design procedure gives the theoretical antenna parameters through the following equations.
I. Width of the patch (W):

W = (1 / (2 f_r √(μ_0 ε_0))) √(2 / (ε_r + 1))   (i)

II. Effective dielectric constant:

ε_reff = (ε_r + 1)/2 + ((ε_r − 1)/2) (1 + 12 h/W)^(−1/2)   (ii)

III. Guided wavelength:

λ_g = 1 / (f_r1 √(μ_0 ε_0) √ε_reff)   (iii)

IV. Effective length:

L_eff = λ_g / 2   (iv)

V. Length extension:

ΔL/h = 0.412 (ε_reff + 0.3)(W/h + 0.264) / ((ε_reff − 0.258)(W/h + 0.8))   (v)

VI. Length of the patch:

L = L_eff − 2ΔL   (vi)

VII–VIII. Substrate dimensions:

L_sub = L + 6 × h   (vii)

W_sub = W + 6 × h   (viii)
Substituting ε_r = 4.4 and f_r1 = 1.5 GHz gives L_sub = 56.98 mm and W_sub = 70.42 mm. After optimization, L_sub = 55 mm and W_sub = 56 mm, so a 23.24% size reduction is achieved at the same lower resonant frequency of 1.5 GHz.
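As a quick numeric check, Eqs. (i)–(viii) can be evaluated directly; the sketch below simply reproduces the theoretical substrate dimensions quoted above:

```python
import math

# Conventional microstrip design equations (i)-(viii) evaluated for the
# theoretical patch at f_r1 = 1.5 GHz on FR4 (eps_r = 4.4, h = 1.6 mm).
c = 299792458.0        # speed of light, m/s (= 1/sqrt(mu0*eps0))
f_r = 1.5e9            # lower resonant frequency, Hz
eps_r = 4.4            # FR4 relative permittivity
h = 1.6e-3             # substrate thickness, m

W = (c / (2 * f_r)) * math.sqrt(2 / (eps_r + 1))                         # (i)
eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / W)  # (ii)
lam_g = c / (f_r * math.sqrt(eps_eff))                                   # (iii)
L_eff = lam_g / 2                                                        # (iv)
dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)
                  / ((eps_eff - 0.258) * (W / h + 0.8)))                 # (v)
L = L_eff - 2 * dL                                                       # (vi)
L_sub = L + 6 * h                                                        # (vii)
W_sub = W + 6 * h                                                        # (viii)

print(f"L_sub = {L_sub*1e3:.2f} mm, W_sub = {W_sub*1e3:.2f} mm")
# -> L_sub = 56.98 mm, W_sub = 70.42 mm (before optimization to 55 x 56 mm)
```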
C. Operating principle

Figure 4 depicts the step-by-step evolution of the antenna, and Figs. 5 and 6 compare the simulated IBW and ARBW at each step.
Antenna 1 uses a hexagonal patch with a quarter-wave transformer and a CPW-fed ground plane [12–15]. From Figs. 5 and 6 it is clear that its impedance bandwidth is good but its ARBW is very poor. To further improve the IBW and ARBW, a hexagonal ring is used instead of the hexagonal patch, but this still does not satisfy the 3 dB ARBW criterion. To create a perturbation of the electric field, the hexagonal ring is then shifted to the left of the center by a distance d; Fig. 6 shows that the ARBW improves in the higher frequency band. To improve the ARBW in the lower band as well, an annular ring is added at the left corner of the hexagonal ring. Figures 5 and 6 show the greatest improvement in IBW and ARBW for Antenna 4 compared with the other structures. Since these results were better than those of related studies, Antenna 4 was chosen for the final design.

Fig. 5 Reflection coefficient (simulated) for each step of the proposed antenna improvement process

Fig. 6 Axial ratio bandwidth (simulated) for each step of the proposed antenna improvement process
D. Parametric study

To obtain the optimized dimensions of the monopole antenna, the effect of various parameters on performance is analyzed. The length of the CPW-fed ground plane was studied first, as presented in Fig. 7.

Fig. 7 Return loss (simulated) versus frequency for the proposed monopole antenna for various lengths L_1

Fig. 8 Return loss (simulated) versus frequency for the proposed monopole antenna for various strip widths W_s
The effects of L_1 = 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, and 16.0 mm on the performance of the proposed monopole antenna were studied and are presented in Fig. 7. The acquired results clearly show that the lowest resonant frequency shifts toward the higher band as the height L_1 decreases, while the higher resonant frequency also shifts upward and gradually disappears. This is because increasing the feed-gap height considerably increases the total parallel capacitance, lowers the quality factor, and raises the resonant frequency [16]. The IBW changes considerably with this length because the impedance matching is sensitive to the feed gap: the ground plane, serving as an impedance-matching circuit, adjusts the input impedance and the operating bandwidth as its length varies. In summary, the optimized length is found to be L_1 = 15.5 mm.
Figure 8 shows the simulated IBW for different strip widths. The change in IBW is negligible, but the strip width has a great impact on the ARBW, as shown in Fig. 9: W_s = 1.8 mm gives a wide CP band owing to better impedance matching.
Figure 10 depicts the IBW for varying offset d. Again, the change in IBW is almost negligible, while d has a great influence on the ARBW, as shown in Fig. 11: d = −1.4 mm gives a wide CP band, since this position perturbs the electric field into components of equal amplitude with a 90° phase difference.
Figure 12 depicts the IBW for varying substrate width. The change in IBW is negligible, but the substrate width has a great effect on the ARBW, as shown in Fig. 13: W_sub = 56 mm gives a wide CP band with better impedance matching.
Fig. 9 ARBW (simulated) versus frequency for the proposed monopole antenna for various W_s

Fig. 10 Return loss (simulated) versus frequency for the proposed monopole antenna for various d

Fig. 11 ARBW (simulated) versus frequency for the proposed monopole antenna for various d

Fig. 12 Return loss (simulated) versus frequency for the proposed monopole antenna for various W_sub

Fig. 13 ARBW (simulated) versus frequency for the proposed monopole antenna for various W_sub
Fig. 17 Radiation pattern (simulated) for the proposed antenna in the a XZ (ϕ = 0°) plane and
b YZ (ϕ = 90°) plane
(12.19–12.23 GHz, f_c = 12.2 GHz), and 140 MHz (12.57–12.71 GHz, f_c = 12.6 GHz), respectively, which can be useful for wireless communication applications.
Figure 17 shows the simulated radiation patterns in the XZ plane (ϕ = 0°) and YZ plane (ϕ = 90°) at 3.2 GHz. The simulated patterns in Fig. 17a, b show that the cross-polarization levels are 19 dB below the co-polarization levels in the broadside direction.
Figure 18 shows that the simulated radiation pattern is RHCP in the broadside direction. Likewise, the radiation patterns at the other CP resonant frequencies (f_c) are RHCP at broadside. When the structure is inverted, the opposite polarization is obtained.
The simulated peak gain of the implemented antenna is 8.37 dBi at 12.7 GHz, as illustrated in Fig. 19. The peak gains at the other CP band center frequencies are −2.35 dBi at 3.2 GHz, 1.25 dBi at 6.6 GHz, 3.55 dBi at 8.1 GHz, 3.62 dBi at 8.7 GHz, 3.79 dBi at 9.5 GHz, 4.76 dBi at 11.7 GHz, 5.97 dBi at 12.2 GHz, and 7.74 dBi at 12.6 GHz.
Figure 20 shows the simulated radiation efficiency of the implemented antenna versus frequency. Over all the CP bands, the simulated efficiency lies between 65% and 98%. The highest efficiency is 97.87% at 1.6 GHz, and the efficiencies at the CP band center frequencies are 95.68% at 3.2 GHz, 94.26% at 6.6 GHz, 90.25% at 8.1 GHz, 87.25% at 8.7 GHz, 85.16% at 9.5 GHz, 72.21% at 11.7 GHz, 70.29% at 12.2 GHz, and 69.74% at 12.6 GHz, which is very good for practical purposes.
Table 1 compares the proposed antenna with recently designed multiband antennas [17–19]. The comparison shows that the proposed work demonstrates the widest impedance band and octal CP characteristics, in addition to compact size and good gain.
5 Conclusion

An octal-band CPW-fed monopole antenna has been presented. The designed antenna is uncomplicated and simple to fabricate. The implemented antenna achieves a wide impedance bandwidth along with octal CP bands, which satisfies the requirements of current multiband CP wireless communication devices. Through the asymmetric CPW-fed technique, only modes with symmetric currents can be excited. These modes maintain a constant phase variation over their respective wide frequency ranges, generating the octal CP bands. This geometry offers great convenience for octal-band CP antenna design by allowing the radiator and the feeding configuration to be designed individually. Compared with former techniques, the TCM tools make simulating and assessing a CP antenna a great deal simpler.
References
1. Qing X, Chia YWM (1999) A novel single-feed circular polarized slotted loop antenna. In:
Antennas and propagation society international symposium, vol. 1. IEEE, pp 248–251, July
1999
2. Chakraborty U, Kundu A, Chowdhury SK, Bhattacharjee AK (2014) Compact dual-band
microstrip antenna for IEEE 802.11a WLAN application. IEEE Antennas Wirel Propag Lett
13:407–410
3. Suma MN, Raj RK, Joseph M, Bybi PC, Mohanan P (2006) A compact dual band planar
branched monopole antenna for DCS/2.4-GHz WLAN applications. IEEE Microwave Wirel
Compon Lett 16(5):275–277
4. Hsu CW, Shih MH, Wang CJ (2016) A triple-strip monopole antenna with dual-band circular
polarization. In: Antennas and propagation (APCAP), 2016 IEEE 5th Asia-Pacific conference
on IEEE, pp 137–138, July 2016
5. Wu JW, Ke JY, Jou CF, Wang CJ (2010) A microstrip-fed broadband circularly polarized
monopole antenna. IET Microw Antennas Propag 4(4):518–525
6. Dhara R, Yadav S, Sharma MM, Jana SK, Govil MC (2021) A circularly polarized quad-
band annular ring antenna with asymmetric ground plane using theory of characteristic modes.
Progress Electromag Res M, 100:51–68. https://doi.org/10.2528/PIERM20102006
7. Balanis CA (2016) Antenna theory: analysis and design. Wiley, New York. ISBN-978-1-118-
64206
8. Pozar DM (1992) Microstrip antennas. Proc IEEE 80(1):79–91. https://doi.org/10.1109/5.
119568
9. Guo Y-X, Bian L, Quan Shi X (2009) Broadband circularly polarized annular-ring microstrip
antenna. IEEE Trans Antennas Propag 57(8):2474–2477. https://doi.org/10.1109/TAP.2009.
2024584
10. Dhara R, Mitra M (2020) A triple-band circularly polarized annular ring antenna with asym-
metric ground plane for wireless applications. Eng Rep 2(4):e12150. https://doi.org/10.1002/
eng2.12150
11. Dhara R, Jana SK, Mitra M (2020) Tri-band circularly polarized monopole antenna for wireless
communication application. Radioelectron Commun Syst 63(4):213–222
12. Dhara R (2020) Quad-band circularly polarized CPW-fed G-shaped printed antenna with square
slot. Radioelectron Commun Syst 63(7):376–385
13. Dhara R, Jana SK, Mitra M (2020) CPW-fed triple-band circularly polarized printed inverted
C-shaped monopole antenna with closed-loop and two semi-hexagonal notches on ground
plane. In: Optical and wireless technologies. Springer, Singapore, pp 161–175
14. Dhara R, Kundu T (2020) A compact inverted Y-shaped circularly polarized wideband
monopole antenna with open loop. Eng Rep 2:e12326. https://doi.org/10.1002/eng2.12326
15. Garbacz R, Turpin R (1971) A generalized expansion for radiated and scattered fields. IEEE
Trans Antennas Propag 19(3):348–358. https://doi.org/10.1109/TAP.1971.1139935
16. Alam M, Kanaujia BK, Beg MT, Kumar S, Rambabu K (2019) A hexa-band dual-sense
circularly polarized antenna for WLAN/Wi-MAX/SDARS and C-band applications. Int J RF
Microwave Comput Aided Eng 29(4):e21599
17. Chen Y, Wang CF (2015) Characteristic modes: theory and applications in antenna engineering.
Wiley, New York
18. Pedram K, Nourinia J, Ghobadi C, Karamirad M (2017) A multiband circularly polarized
antenna with simple structure for wireless communication system. Microwave Opt Technol
Lett 59(9):2290–2297
19. Falade OP, Ur-Rehman M, Yang X, Safdar GA, Parini CG, Chen X (2020) Design of a compact
multiband circularly polarized antenna for global navigation satellite systems and 5G/B5G
applications. Int J RF Microwave Comput Aided Eng 30(6):e22182
Massive MIMO Pre-coders for Cognitive Radio Network Performance Improvement: A Technological Survey

M. Kothari and U. Ragavendran
Abstract Cognitive radio (CR) is a highly reliable technology for efficient spectrum usage. In a cognitive radio network (CRN), primary users share a frequency band with secondary users: secondary users relay the traffic of primary users, while primary users grant secondary users restricted access to the spectrum. When concurrent communication links are established between primary and secondary users, interference between the links reduces CRN performance. Multiple-input multiple-output (MIMO) systems can overcome the inter-user interference (IUI) of a CRN in underlay mode and establish concurrent communication links. Massive MIMO systems utilize a large number of antennas at the base station to concurrently serve a group of user equipments (UEs) in the same frequency band. This technology will help improve channel capacity and throughput in 5G and beyond-5G wireless communication systems, and it can achieve low power consumption at the base station. In massive MIMO-based CRNs, pre-coding techniques help mitigate inter-user interference when transmitting information to primary and secondary users at the same time and frequency. In this review, linear, nonlinear, and constant-envelope pre-coding techniques are classified; among the linear pre-coders, truncated polynomial expansion gives better performance than zero forcing, minimum mean square error, and regression-based linear pre-coding.
1 Introduction
Pre-coding techniques have the capability of maximizing the data rate of multiple transmission streams at the transmitter by suppressing the interference between different users in a massive MIMO-CRN [10].
2 System Background
CR and MIMO are both wireless communication technologies operating at the physical layer. CR technology is used across different bands to improve spectrum usage, while the MIMO concept improves the spectral efficiency of a given band by enhancing throughput or reducing the inter-user interference (IUI) of the communication link. By combining the benefits of both technologies, a new MIMO-CRN model can be developed that supports dynamic spectrum selection as well as effective utilization of the selected spectrum by exploiting spatial multiplexing [19]. To exploit spatial multiplexing, channel state information (CSI) can be acquired with supervised or unsupervised algorithms. In the supervised approach, the channel is learned after every coherence period by transmitting pilot symbols between the transmitter and receiver, which reduces system throughput and spectral efficiency [20]. In the unsupervised approach, channel information is extracted from the received signal without prior knowledge of the channel. Pre-coding can also be designed by combining both approaches [21].
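A minimal sketch of the supervised (pilot-based) CSI acquisition just described, with illustrative dimensions and orthogonal pilots chosen for this example; noise is omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, T = 16, 4, 8                 # BS antennas, users, pilot length (T >= K)

# Orthogonal pilot sequences: K rows of a scaled T-point DFT matrix,
# so that X_p @ X_p^H = I.
X_p = np.fft.fft(np.eye(T))[:K, :] / np.sqrt(T)

# Unknown channel (M x K) and the noiseless pilot observations at the BS.
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
Y = H @ X_p

# Least-squares channel estimate: H_hat = Y X_p^H (X_p X_p^H)^{-1}.
H_hat = Y @ X_p.conj().T @ np.linalg.inv(X_p @ X_p.conj().T)
print(np.allclose(H_hat, H))       # True: pilots recover the channel exactly
```

In a real system the pilot symbols occupy part of every coherence interval, which is exactly the throughput cost of the supervised approach noted above.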
A massive multiple-input multiple-output (MIMO) system uses a large number of antennas at the base station (BS) to serve multiple user equipments (UEs) concurrently in the same frequency band [22]. The spectral efficiency and link reliability of massive MIMO improve many-fold compared with existing MIMO technology [23, 24], with reductions in operational power and BS hardware costs as well [25, 26].
The received uplink signal at the BS is

y = H_1 x_1 + H_2 x_2 + · · · + H_k x_k (1)

where H_1, H_2, …, H_k are the channel parameters associated with each user's signal; the BS uses them to estimate the channel. Let S_1, S_2, …, S_k be the actual user data that the BS transmits. After applying pre-coding, the BS transmits

Z = P_1 S_1 + P_2 S_2 + · · · + P_k S_k (2)

The signals received by the users are

u_1 = H_1^H Z + n (3)

u_2 = H_2^H Z + n (4)

Similarly,

u_k = H_k^H Z + n (5)

Substituting (2) into (5) separates the received signal of user k into its two components:

u_k = H_k^H P_k S_k + Σ_{i≠k} H_k^H P_i S_i + n (6)

where the first term is the desired signal and the second term is the inter-user interference.
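Under the model of Eqs. (1)–(6), a zero-forcing pre-coder makes the inter-user interference term of Eq. (6) vanish. A minimal noiseless sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 64, 4                      # BS antennas, single-antenna users

# Downlink channels h_k stacked as columns of H (M x K); user k
# receives u_k = h_k^H Z + n, matching Eqs. (3)-(5).
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

# Zero-forcing pre-coder: P = H (H^H H)^{-1}, so that H^H P = I and the
# inter-user interference term of Eq. (6) vanishes.
P = H @ np.linalg.inv(H.conj().T @ H)

S = rng.standard_normal(K) + 1j * rng.standard_normal(K)   # user symbols
Z = P @ S                                                  # transmitted vector, Eq. (2)
u = H.conj().T @ Z                                         # noiseless received symbols

print(np.allclose(u, S))   # True: each user recovers only its own symbol
```

This interference-free recovery is what the linear pre-coders compared in this survey (zero forcing, MMSE, truncated polynomial expansion) approximate at different computational costs.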
The proposed low-complexity hybrid pre-coder design with fewer feedback bits exploits the block-sparsity structure, which is similar to that of the virtual channel. The hybrid pre-coding algorithm is divided into two steps: the first step, "preliminary block-support set identification," uses greedy sequence clustering to find the relation between the current element and existing elements, and the second step is "complete block-support set identification." Simulation results plot spectral efficiency against SNR for the optimal performance bound, orthogonal matching pursuit, and the greedy-sequence-clustering-based sparse pre-coding scheme, which reduces the number of quantization bits for the analog pre-coder by arranging the columns according to the block-sparsity structure [42].
The proposed full-dimensional massive MIMO systems use multi-layer pre-coding to productively manage various types of interference and to exploit the large-scale channel attributes. A three-layer pre-coding technique is used to reduce inter-cell interference and intra-cell multiuser interference while enhancing the effective signal power. This technique gives optimal performance under one-ring channel models for cell-interior users and under single-path channels for every user [43].
This pre-coding and combining method has the best performance compared with the other two methods for omnidirectional coverage at the BS and the user terminal.
4 Conclusion
A CRN massive MIMO system has the capability to improve spectrum-usage efficiency as well as the spectral efficiency within a band by reducing inter-user interference, using a pre-coder before transmission and a detector at the receiver. This detailed review covered the constant-envelope, omnidirectional, multi-layer, two-stage subspace, and hybrid pre-coders with their specifications, merits, and demerits. It was found that the multi-layer pre-coder extracts the maximum signal in its first layer, and its performance can be further improved by decontaminating the pilot signal. To serve more users concurrently at the base station while enhancing the reliability and throughput of the system, the number of antennas at the base station should be increased; at the same time, however, the interference between channels may also increase. Pre-coding and detection techniques should adapt to the growing number of clients while maintaining the required bit error rate. Minimum mean square error, VMC decomposition, and regression-based linear pre-coding techniques give better performance for 24 mobile users with 64 or 128 antennas at the base station, while truncated polynomial expansion gives better performance for 64 mobile users with 256 antennas at the base station, although performance degrades as the number of transmit antennas increases further. Implementing pre-coding algorithms with a base-station antenna-to-user ratio of less than eight remains a new challenge for massive MIMO-CRN.
References
1. Alliance NGMN (2016) Perspectives on vertical industries and implications for 5G. White
Paper (June 2016)
2. Kusaladharma S, Tellambura C (1999) An overview of cognitive radio networks. Wiley
Encyclopedia Electr Electron Eng 1–17
3. Mitola J, Maguire GQ (1999) Cognitive radio: making software radios more personal. IEEE
Personal Commun 6(4):13–18
4. Abdulkadir Y, Simpson O, Sun Y (2019) Interference alignment for cognitive radio commu-
nications and networks: a survey. J Sens Actuat Netw 8(4):50
5. Haykin S (2005) Cognitive radio: brain-empowered wireless communications. IEEE J Sel
Areas Commun 23(2):201–220
6. Seyfi M, Muhaidat S, Liang J (2013) Relay selection in cognitive radio networks with
interference constraints. IET Commun 7(10):922–930
7. Mathuranathan V (2014) MIMO: diversity and spatial multiplexing. https://www.gaussianwaves.com/2014/08/mimo-diversity-and-spatial-multiplexing/
8. Perahia E (2008) IEEE 802.11n development: history, process, and technology. IEEE Commun
Mag 46(7):48–55
9. Fu L, Zhang YJA, Huang J (2013) Energy efficient transmissions in MIMO cognitive radio
networks. IEEE J Sel Areas Commun 31(11):2420–2431
10. Nguyen V-D, Tran L-N, Duong TQ, Shin O-S, Farrell R (2016) An efficient precoder design
for multiuser MIMO cognitive radio networks with interference constraints. IEEE Trans Veh
Technol 66(5):3991–4004
11. FCC Spectrum Policy Task Force (2002) Report of the spectrum efficiency working group. https://www.fcc.gov/sptf/files/SEWGFinalReport_1.pdf
12. Islam MH, Koh CL, Oh SW, Qing X, Lai YY, Wang C, Liang Y-C et al (2008) Spectrum survey
in Singapore: occupancy measurements and analyses. In: 2008 3rd International conference on
cognitive radio oriented wireless networks and communications (CrownCom 2008), pp 1–7.
IEEE
13. Mitola J (1999) Cognitive radio for flexible mobile multimedia communications. In: 1999
IEEE international workshop on mobile multimedia communications (MoMuC’99) (Cat. No.
99EX384), 3–10. IEEE
14. Chen Z, Wang C-X, Hong X, Thompson J, Vorobyov SA, Zhao F, Ge X (2013) Interference
mitigation for cognitive radio MIMO systems based on practical precoding. Phys Commun
9:308–315
15. Chen Z (2011) Interference modelling and management for cognitive radio networks. PhD
dissertation, Heriot-Watt University
16. Paulraj AJ, Gore DA, Nabar RU, Bolcskei H (2004) An overview of MIMO communications—a
key to gigabit wireless. Proceedings IEEE 92(2):198–218
17. Björnson E, Larsson EG, Marzetta TL (2016) Massive MIMO: ten myths and one critical
question. IEEE Commun Mag 54(2):114–123
18. Björnson E (2017) Six differences between MU-MIMO and massive MIMO. https://ma-mimo.ellintech.se/2017/10/17/six-differences-between-mu-mimo-and-massive-mimo/
19. Dapena A, Castro PM, Labrador J (2010) Combination of supervised and unsupervised algo-
rithms for communication systems with linear precoding. In: The 2010 international joint
conference on neural networks (IJCNN), pp 1–8. IEEE
20. Datta A, Mandloi M, Bhatia V (2019) Reliability feedback-aided low-complexity detection in
uplink massive MIMO systems. Int J Commun Syst 32(15):e4085
21. Gao C, Shi Y, Thomas Hou Y, Kompella S (2011) On the throughput of MIMO-empowered
multihop cognitive radio networks. IEEE Trans Mobile Comput 10(11):1505–1519
22. Marzetta TL (2010) Noncooperative cellular wireless with unlimited numbers of base station
antennas. IEEE Trans Wireless Commun 9(11):3590–3600
Massive MIMO Pre-coders for Cognitive Radio Network … 225
23. Huh H, Caire G, Papadopoulos HC, Ramprashad SA (2012) Achieving massive MIMO spectral
efficiency with a not-so-large number of antennas. IEEE Trans Wireless Commun 11(9):3226–
3239
24. Rusek F, Persson D, Lau BK, Larsson EG, Marzetta TL, Edfors O, Tufvesson F (2012) Scaling
up MIMO: opportunities and challenges with very large arrays. IEEE Signal Process Mag
30(1):40–60
25. Ngo HQ, Larsson EG, Marzetta TL (2013) Energy and spectral efficiency of very large multiuser
MIMO systems. IEEE Trans Commun 61(4):1436–1449
26. Noha Hassan ID, Fernando X (2017) Massive MIMO wireless networks. An overview. www.
mdpi.com/journal/electronics
27. Mandloi M, Bhatia V (2016) Low-complexity near-optimal iterative sequential detection for
uplink massive MIMO systems. IEEE Commun Lett 21(3):568–571
28. Manshaei MH, Félegyházi M, Freudiger J, Hubaux J-P, Marbach P (2007) Spectrum sharing
games of network operators and cognitive radios. In: Cognitive wireless networks. Springer,
Dordrecht, pp 555–578
29. Mandloi M, Hussain MA, Bhatia V (2017) Improved multiple feedback successive interference
cancellation algorithms for near-optimal MIMO detection. IET Commun 11(1):150–159
30. Li X, Bjornson E, Larsson EG, Zhou S, Wang J (2015) A multi-cell MMSE detector for
massive MIMO systems and new large system analysis. In: 2015 IEEE global communications
conference (GLOBECOM), pp 1–6. IEEE
31. Gao X, Edfors O, Rusek F, Tufvesson F (2011) Linear pre-coding performance in measured
very-large MIMO channels. In: 2011 IEEE vehicular technology conference (VTC Fall), 1–5.
IEEE
32. Pan J, Ma W-K (2014) Constant envelope precoding for single-user large-scale MISO channels:
efficient precoding and optimal designs. IEEE J Sel Top Signal Process 8(5):982–995
33. Da Silva MM, Dinis R (2017) A simplified massive MIMO implemented with pre or post-
processing. Phys Commun 25:355–362
34. Mueller A, Kammoun A, Björnson E, Debbah M (2016) Linear precoding based on polynomial
expansion: reducing complexity in massive MIMO. EURASIP J Wireless Commun Netw
2016(1), 63
35. Ge Z, Haiyan W (2017) Linear precoding design for massive MIMO based on the minimum
mean square error algorithm. EURASIP J Embedded Syst 2017(1):1–6
36. Ketseoglou T, Ayanoglu E (2018) Downlink precoding for massive MIMO systems exploiting
virtual channel model sparsity: IEEE Trans Commun 66(5):1925–1939
37. Chen J, Lau VKN (2014) Two-tier precoding for FDD multi-cell massive MIMO time-varying
interference networks. IEEE J Sel Areas Commun 32(6):1230–1238
38. Kammoun A, Müller A, Björnson E, Debbah M (2014) Linear precoding based on polynomial
expansion: large-scale multi-cell MIMO systems. IEEE J Sel Top Signal Process 8(5):861–875
39. Yue D-W, Li GY (2014) LOS-based conjugate beamforming and power-scaling law in massive-
MIMO systems. arXiv preprint arXiv, 1404.1654
40. Yang HH, Geraci G, Quek TQS, Andrews JG (2017) Cell-edge-aware precoding for downlink
massive MIMO cellular networks. IEEE Trans Signal Process 65(13):3344–3358
41. Liu A, Lau VKN (2015) Two-stage subspace constrained precoding in massive MIMO cellular
systems. IEEE Trans Wireless Commun 14(6):3271–3279
42. Liu X, Zou W (2018) Block-sparse hybrid precoding and limited feedback for millimeter wave
massive MIMO systems. Phys Commun 26:81–86
43. Alkhateeb A, Leus G, Heath RW (2017) Multi-layer precoding: a potential solution for full-
dimensional massive MIMO systems. IEEE Trans Wireless Commun 16(9):5810–5824
44. Prabhu H, Rusek F, Rodrigues JN, Edfors O (2015) High throughput constant envelope pre-
coder for massive MIMO systems. In: 2015 IEEE international symposium on circuits and
systems (ISCAS), pp 1502–1505. IEEE
45. Mohammed SK, Larsson EG (2013) Constant-envelope multi-user precoding for frequency-
selective massive MIMO systems. IEEE Wireless Commun Lett 2(5):547–550
226 M. Kothari and U. Ragavendran
46. Xia X-G, Gao X (2016) A space-time code design for omnidirectional transmission in massive
MIMO systems. IEEE Wireless Commun Lett 5(5):512–515
47. Meng X, Gao X, Xia X-G (2017) Omnidirectional precoding and combining based synchro-
nization for millimeter wave massive MIMO systems. IEEE Trans Commun 66(3):1013–1026
48. Garg S, Jain M, Gangopadhyay R, Rawal D (2016) Opportunistic interference alignment in
multi-user MIMO cognitive radio networks for different fading channels. In: 2016 Twenty
second national conference on communication (NCC), pp 1–6. IEEE
Design of MIMO Antenna Using
Circular Split Ring Slot Defected Ground
Structure for ISM Band Applications
Abstract In this work, a systematic approach is presented for the design of a MIMO antenna using a circular split ring slot defected ground structure for Industrial, Scientific and Medical (ISM) band applications. The overall MIMO antenna is fabricated on a flame-retardant fiberglass epoxy (FR-4) substrate with dimensions of 60 × 62.8 × 1.6 mm³. The elements of the MIMO antenna are patch antennas loaded with circular split ring slot defected ground structures (CSRSDGS). The CSRSDGS are used to miniaturize the patch antennas for ISM band applications. The dimension of an individual patch antenna element is 11.35 × 15.25 mm². The proposed MIMO antenna resonates at 5.725 GHz with a bandwidth of 265 MHz and a mutual coupling coefficient (MCC) of −22.42 dB, which makes it suitable for ISM band applications.
1 Introduction
Recent growth in wireless systems has made high data rates and low latency the most demanding requirements. To meet these requirements, new wireless standards have been made. In these standards, the multiple-input multiple-output (MIMO) system is an emerging technology for achieving those requirements [1].
In the literature, several MIMO antennas have been reported, each using a different technique to enhance MIMO antenna performance. A MIMO antenna is an assembly of multiple antennas on the same ground plane. When
F. B. Shiddanagouda (B)
Department of ECE, Vignan Institute of Technology and Science, Hyderabad 508284, Telangana,
India
e-mail: [email protected]
R. M. Vani
Department of USIC, Gulbarga University, Kalaburagi 585106, Karnataka, India
P. V. Hunagund
Department of Applied Electronics, Gulbarga University, Kalaburagi 585106, Karnataka, India
more than one antenna element is placed on the same ground plane, the excitation of surface waves can lead to a high level of mutual coupling. It is therefore challenging to achieve a high data rate and a low error rate while deploying multiple antennas in limited space and reducing mutual coupling in the operating frequency bands [2]. To meet these requirements, this paper presents a compact four-element MIMO antenna using circular split ring slot defected ground structures for ISM band applications. The remainder of the paper presents the proposed MIMO antenna design, results and discussions, and finally the conclusion.
2 Antenna Design
Fig. 2 Geometry of
CSRSDGS
Table 2 Dimensions of CSRSDGS

Parameter                             Dimension (mm)
Radius of the inner circle (r1)       2.1
Radius of the outer circle (r2)       3.3
Circle width (Cw)                     0.6
Spacing between the two rings (Cg)    0.6
Split ring gap (g)                    0.4
Here, four CSRSDGS are etched exactly beneath the rectangular patch of the CMA with a separation of λ/4. Figure 3 shows the geometry of the PMA. A complete investigation was carried out for the PMA, and a significant size reduction, wide bandwidth, and mutual coupling reduction were obtained.
The proposed MIMO antenna (PMA) gives very low mutual coupling, i.e., −22.42 dB at 5.725 GHz, as depicted in Fig. 7.
The envelope correlation coefficient (ECC) indicates how well the communication channels are isolated. The ECC of the individual elements can be estimated from the S-parameters [5]. From Eq. (2), the CMA achieves an ECC of 0.001 at 5.9 GHz, as shown in Fig. 8, and the PMA achieves an ECC of 0.002 at 5.725 GHz, as shown in Fig. 9.
\rho = \frac{\left| S_{11}^{*} S_{12} + S_{21}^{*} S_{22} \right|^{2}}{\left( 1 - |S_{11}|^{2} - |S_{21}|^{2} \right)\left( 1 - |S_{22}|^{2} - |S_{12}|^{2} \right)} \quad (2)
232 F. B. Shiddanagouda et al.
Diversity gain is a critical parameter that must be taken into account while evaluating MIMO antenna performance. The diversity gain (DG) is calculated from the ECC using Eq. (3) [6]. The obtained diversity gain is 9.9 dB for both the CMA and the PMA.

DG = 10 \sqrt{1 - |\rho|^{2}} \quad (3)
Therefore, the value of ECC and DG can confirm PMA is acceptable for MIMO
operation.
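To make Eqs. (2) and (3) concrete, the sketch below computes the ECC and the resulting diversity gain from a set of two-port S-parameters. The numerical S-parameter values are illustrative assumptions, not measurements from the proposed antenna:

```python
import numpy as np

# Illustrative two-port S-parameters at the resonant frequency
# (assumed values; the chapter does not list the raw S-parameters).
S11, S12 = 0.10 + 0.05j, 0.02 - 0.01j
S21, S22 = 0.02 - 0.01j, 0.12 + 0.04j

# Eq. (2): envelope correlation coefficient from S-parameters
numerator = abs(np.conj(S11) * S12 + np.conj(S21) * S22) ** 2
denominator = (1 - abs(S11) ** 2 - abs(S21) ** 2) * \
              (1 - abs(S22) ** 2 - abs(S12) ** 2)
ecc = numerator / denominator

# Eq. (3): diversity gain from the ECC
dg = 10 * np.sqrt(1 - ecc ** 2)

print(f"ECC = {ecc:.6f}, DG = {dg:.2f} dB")
```

A well-isolated two-element antenna gives an ECC close to zero, so the diversity gain approaches its 10 dB upper bound, consistent with the 9.9 dB reported here for the CMA and PMA.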
The peak gain of an antenna decides the coverage area and the link budget of the system. The CMA peak gain is 5.69 dB at 6 GHz, as shown in Fig. 10, and the PMA peak gain is 4.65 dB at 5.25 GHz, as shown in Fig. 11.
The radiation pattern decides how the antenna radiates electromagnetic energy. The radiation pattern is studied at the CMA resonating frequency of 5.9 GHz, where it is a broadside pattern, as shown in Fig. 12. The PMA radiation pattern is also studied at its resonating frequency; the pattern at 5.725 GHz, which is likewise broadside, is shown in Fig. 13.
The results obtained for the CMA and PMA are summarized in Table 3. The PMA parameters are within acceptable limits across the operating band for ISM band applications.
4 Conclusion
This paper presented the design of a MIMO antenna using a circular split ring slot defected ground structure for ISM band applications. The proposed MIMO antenna (PMA) resonates at 5.725 GHz. The antenna offers a 265 MHz bandwidth and a total peak gain of 4.65 dB. Because the ground plane is defected with CSRSDGS unit cells, the mutual coupling between the antenna elements is reduced to better than −22.42 dB, and a virtual size reduction of 4.6% is obtained. The envelope correlation coefficient, diversity gain, total peak gain, and radiation pattern show that the proposed MIMO antenna is acceptable for ISM band applications.
References
1. Patel R, Desai A (2018) An electrically small antenna using defected ground structure for RFID,
GPS, and IEEE802.11 a/b/g/s applications. J Prog Electromagn Res Lett 75:75–81
2. Fizzah S, Abid M (2018) Design and analysis of UWB MIMO with enhanced isolation. In:
Proceedings of international electrical engineering conference
3. Balanis A (1993) Theory of antennas. IEEE Trans Antenna Propag AP-41(9)
4. Shiddanagouda FB, Vani RM, Hunagund PV (2019) Design and analysis of MIMO antenna for
next generation wireless applications. IEEE Explore 978-5386-7070-6/18/$31.0@2018IEEE
5. Han MS, Choi J (2010) Compact multiband MIMO antenna for next generation USB dongle
applications. IEEE Trans Antenna Propag AP-2(10)
6. Chi Y-J, Chen F-C (2012) 4-port quadric-polarization diversity antenna with novel feeding
network. In: Proceedings of the antenna and propagation Conference 2012
Performance Comparison of Arduino
IDE and Runlinc IDE for Promotion
of IoT STEM AI in Education Process
Abstract Giving early access to knowledge and skills in Internet of Things (IoT) and artificial intelligence (AI) technologies would lead to early innovations and inventions. Introducing such technology to primary and high school students is considered significant for harnessing the creativity of youths faster and earlier. To do so, it is important to give them access to friendly and easy technologies, which will eventually help them realize the potential of IoT and AI. This project studies two platforms, Arduino IDE and runlinc IDE, in terms of user-friendliness and ease of IoT STEM AI application development. A system model-based experimental comparison was carried out. The microcontrollers and sensors are the independent variables; the required program code in lines of statements, the time taken to develop the program code, and the parameters controlled by sensors are the dependent variables. A user experience survey was conducted to supplement the experimental findings. The respondents are primary and high school students, university students and teachers, professionals, and researchers, largely those with experience of using both Arduino
This work is supported by Sri Lanka Technological Campus through the Responsive Research
Seed Grant with Grant ID RRSG/2020/B15. It is a joint project titled COVID-19 Online
BabySitting: Engaging Children through Learning STEM AI and IoT Technologies between the
Sri Lanka Technological Campus, Sri Lanka, and the Jigme Namgyel Engineering College,
Royal University of Bhutan, Bhutan.
and runlinc. Both the experimental results and the survey showed that runlinc is easier and faster for realizing the development of IoT and AI applications.
1 Introduction
Living through the Fourth Industrial Revolution (4IR), technological advancement has brought innovations and inventions that have benefited mankind manifold. The impact of 4IR is already felt in many dimensions, and it is expected to grow further with the ever-growing technological and socioeconomic evolution taking place around us [1]. On the other hand, 4IR has also proved to be a disruptive technology, to the extent of world leaders calling for a move to abandon it [2, 3], because AI in particular would expedite human extinction on planet earth. Industrial revolutions have also led to wealth concentration. AI in particular has intensified the risks of dehumanization, to the point that humans now face an existential threat in both environmental and humanitarian terms [3]. Leading into the Fifth Industrial Revolution (5IR), and in contrast to 4IR, technology and innovation best practices are being bent back toward the service of humanity by the champions of 5IR [3]. The tide is changing toward AI fighting AI, transforming it for the benefit of humanity by harnessing its positive potential. The scientific community is focused on harnessing the potential of emerging technologies for rapid development across the whole spectrum of applications.
Dr. George Land's creativity test for NASA and similar studies remain relevant even today [4–7]. According to the authors of [7–10], early access to knowledge and skills in technology is important to harness creative thinking, which would eventually result in early innovations and inventions. As per [8], only around 28% of high school entrants declare interest in a STEM-related field, and 57% of these students lose interest in STEM fields by the time they graduate from high school. The need for STEM literacy in finding global solutions and the prospect of fostering its impact at all levels of education have led to the development of a STEM curriculum [9]. This is expected to transform STEM education in preparing future citizens through a collaborative and interdisciplinary approach. The importance of STEM education is recognized, and initiatives are taken, as per [10, 11], to meet the STEM skills required in the job market through higher education. In the programming performance comparison of [12], Java is found to be the fastest language, which in turn plays a vital role in promoting STEM and computer science education. As a measure to promote project-based learning toward finding real-world solutions, it is recommended in [13] that school administrations encourage STEM programs at the K12 level of education. This will have a bigger impact on the number of STEM courses offered in university education [13, 14]. However, studies on the prospect of greater impact if such emerging technologies are introduced early are very limited. STEM education based on technology such as Arduino and Raspberry Pi microcontrollers is largely prevalent in higher education, research, and professional applications. The
Performance Comparison of Arduino IDE and Runlinc IDE . . . 239
access to these technologies is almost non-existent in primary and high school education, particularly in disadvantaged societies. Considering this prevailing situation and the prospect of these technologies reaching innovators, awareness, knowledge, and recognition of the right tool are important. Providing the knowledge and skills early would mean harnessing and unlocking the creativity of youths, thereby bringing higher-education skills to the grassroots much earlier.
To realize the potential of AI and IoT in enhancing the quality of education, and the significance of transformed STEM education reaching the grassroots, this paper examines two basic AI and IoT development platforms, Arduino IDE and runlinc IDE. This work compares the usability, accessibility, user-friendliness, and potential of these two platforms for better understanding and development of AI and IoT applications. The comparison of the two platforms is drawn from an experiment-based case study carried out by the authors on ease of coding, performance based on time, and interactivity of the platforms. A user experience survey was carried out to ascertain the experimental findings. Survey respondents include students between the ages of 10 and 15, university students, and professionals with first-hand experience of both platforms. The work is arranged in the following sections: Sect. 2 outlines the opportunities and challenges of IoT and AI for enhancing the quality of education through a literature review. Section 3 presents the case study carried out through an experimental setup where an AI- and IoT-enabled smart home system model is considered. In Sect. 4, the platforms for AI and IoT application development, Arduino IDE for Arduino and runlinc IDE for STEMSEL, are studied on the basis of ease of coding and usage. In Sect. 5, the two platforms are studied through experimental analysis. In Sect. 6, the results of the experiment and the data collected from the survey are further analyzed. The general findings and potential future works are presented in Sect. 7.
IoT had the potential to connect 28 billion devices to the Internet by 2020 [15]. Further, as per [16], 127 new IoT devices are connected to the web every second, with an estimated 31 billion IoT devices installed during 2020. In all these IoT applications, a variety of microchips or microcomputers are used as the backbone. In [17], UNESCO outlines AI for sustainable development with its challenges and opportunities. However, the UNESCO report remains only a prospect of AI for sustainable development, where very little has been achieved thus far. Leveraging AI and IoT technologies can lead to the achievement of the United Nations Sustainable Development Goals (UN SDGs), particularly in poverty elimination, gender balance, and enhancing the quality of education. The executive summary in [18] outlines an analysis of how AI can be used to improve learning outcomes, with the priority of accomplishing Sustainable Development Goal 4: Equitable and Quality Education for All. Reference [18] also outlines six challenges in the context of AI technology for quality education
240 S. Chedup et al.
where the main challenges are examined in four categories. References [19, 20] examine the prospect of a positive impact of IoT in transforming technical higher education. Further, they state that the impact of IoT will be greater in higher education, particularly in universities. The smart education concept is thoroughly analyzed through a literature review in [21] from a theoretical point of view. Research opportunities in AI and IoT are outlined to make learning more creative and attractive for students by incorporating into educational settings the development of hands-on skills and the capacity to control or operate appliances used in everyday life. Rajput [22] found that despite the tremendous progress made in technological advancements in past years, the incorporation of technology to ensure smart classroom systems has been delayed for various reasons. It is important to leverage modern emerging technologies in teaching and learning. The authors in [22] further emphasized the significance of changing classroom settings based on emerging technologies to make teaching and learning more interactive. Through such measures, students should be able to identify problems and come up with ideas to solve them. “IoT can reform the education system considering many benefits such as active engagements, enhancing quality of instruction and greater efficiency” [22].
The potential of IoT for developing assistive technology in tertiary education for people with disabilities is theoretically reviewed in [20]. Banica et al. [23] concluded that the concept of IoT has great potential to remove all barriers to education, such as physical location, geography, language, and economic development. The combination of technology and education would lead to faster and simpler learning and improve the level of knowledge and the quality of students. But, like any newly emerged concept, it still has no widespread functional models and standards; moreover, universities are not prepared to accept all the changes proposed by IoT in the educational sector. To work toward achieving the full potential of IoT and AI, penetration of these technologies is important in primary and high schools, not only in tertiary education and research. Introducing these technologies at an early stage of education has greater potential to unlock the imagination and creativity of youths in developing better technology for sustainable development. While much research has happened and continues to happen on emerging technologies, the introduction of AI and IoT platforms is more or less uncommon among young minds, particularly youths in disadvantaged societies. STEM education is perceived as more demanding and difficult among high school graduates, and this has led to a decrease in enrollment against the need for a creative, innovative, and talented workforce [10]. In [9], it was noted that secondary teachers are more reluctant toward transforming STEM education compared with the positive response from the early-years and elementary cohorts. Incentives for faculty involved in STEM education, outreach, and its potential impact were studied in [11]. Reference [24] identifies the lack of soft skills and formal training of educators in the concepts of STEM education as a major obstacle to STEM-based education. The need for better understanding, better teaching methods, and more help in STEM education was identified, and recommendations were provided to advance STEM education [15]. Further, the authors in [25–30] examined IoT platforms and applications; however, their impact on the education process is not strongly addressed.
Fig. 2 Methodology
Further, to ascertain the authors' experimental findings, a user experience Google survey on Arduino and runlinc was incorporated into the findings. The sample includes 8 students below the age of 15, 67 university students between the ages of 15 and 25, and 25 professionals from teaching and research above the age of 25. The sample includes users with experience of both platforms. Respondents are from Bhutan, Australia, Malaysia, Uganda, Nigeria, Cameroon, and the Philippines. The waterfall methodology in Fig. 2 was used to ensure that the key stages of the study were maintained and fulfilled accordingly.
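Since the survey sample above happens to total 100 respondents, the group counts map directly onto percentage shares; a quick check:

```python
# Respondent counts from the survey sample described above
sample = {"school students (<15)": 8,
          "university students (15-25)": 67,
          "professionals (>25)": 25}

total = sum(sample.values())
shares = {group: 100 * count / total for group, count in sample.items()}

print(total)    # 100 respondents in all
print(shares)   # each count is also its percentage share
```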
5 Microcontrollers
Arduino [27, 40, 41] is an open-source electronics platform for making interactive projects. The Arduino microcontroller acts as a central processing unit (CPU) which senses the environment through many integrated sensors. Receiving inputs from these sensors, the CPU activates peripheral output devices, thereby affecting its surroundings by controlling lights, motors, and other actuators. The Arduino programming environment is a convenient abstraction layer over the C/C++ coding commonly used for programming microcontrollers. The Arduino abstraction layer has made a splash by allowing microcontrollers to be programmed through a simplified IDE. Standardized hardware and an active open-source community that contributes code libraries simplify the use of many hardware peripherals such as sensors and motors. Using Arduino is much easier and quicker than programming microcontrollers directly. This ease of use has allowed many students, electronics hobbyists, and programming enthusiasts to learn and apply microcontrollers to a broad array of applications, including the cutting-edge domains of IoT and AI. Arduino is one step removed from programming microcontrollers directly; yet, the leap between programming Arduino and directly programming an ATMEGA328 IC is quite large. The open-source Arduino software (IDE) provides a platform to write code and upload it to the board [35]. It is compatible with Windows, Mac OS X, and Linux. The IDE itself is written in Java and is based on Processing and other open-source software. The IDE can be used with any Arduino board; an Arduino UNO is used in this experimental setup. An Arduino-compatible ESP8266 Wi-Fi module must be installed separately to connect to the Internet so that users can operate the system from anywhere in the world. The module has 8 pins [42]: one TXD and one RXD pin, two GPIO pins (GPIO0 and GPIO2), Reset, VCC, and Ground. TX and RX are the transmitter and receiver pins used to flash the embedded code. An AMS1117 regulator supplies 3.3 V to the ESP8266 Wi-Fi module [36, 37].
6 Implementation
The same parameters are considered to rig up a circuit on both the Arduino UNO and STEMSEL. Program code to control the output devices (LED, motor, fan) from the data generated by the analog input devices (LDR, thermistor, and LM35) is developed in the Arduino IDE and the runlinc IDE. The time taken to write complete code for each component and the corresponding number of lines of code required to make the device operational are noted for analysis. The program code deploying all the system components to build the required system model is then developed, and the corresponding time taken and number of lines of code required are noted. The time taken to build a physical circuit on the two platforms is not considered for comparison, as it was noted to be almost the same for both. The experimental arrangements, results, and observations are shown in Figs. 3 and 4. The overall results and observations from the experiment are represented in Table 2.
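Independently of the platform, the experiment's control logic (drive the LED, fan, or motor from LDR, thermistor, and LM35 readings) reduces to simple threshold rules on the analog inputs. A minimal platform-neutral sketch, in which the threshold values and the 10-bit ADC scaling are assumptions chosen for illustration, not the values used in the experiment:

```python
LM35_MV_PER_DEG_C = 10.0  # the LM35 outputs 10 mV per degree Celsius

def lm35_to_celsius(adc_value, vref_mv=5000.0, adc_max=1023):
    """Convert a 10-bit ADC reading of an LM35 into degrees Celsius."""
    return adc_value * vref_mv / adc_max / LM35_MV_PER_DEG_C

def control_outputs(ldr_adc, lm35_adc, dark_threshold=300, fan_on_celsius=30.0):
    """Return (led_on, fan_on) decisions from raw LDR and LM35 ADC readings.

    Assumes the LDR divider is wired so that a darker room gives a lower
    reading; both thresholds are illustrative assumptions.
    """
    led_on = ldr_adc < dark_threshold
    fan_on = lm35_to_celsius(lm35_adc) >= fan_on_celsius
    return led_on, fan_on

# Dark room (LDR reading 120) at roughly 40 C (LM35 reading 82)
print(control_outputs(ldr_adc=120, lm35_adc=82))  # prints (True, True)
```

On either platform the same decision rules apply; what the experiment compares is how many lines of code, and how much time, each IDE needs to express them.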
Fig. 4 Expt. 2—Control LED/lamp with the input from sensors using Arduino UNO and STEMSEL
The basic difference between the two platforms is that Arduino is programmed on a computer, whereas runlinc is programmed on a web page over the Internet. The same parameters are considered to ensure that the same yardstick is maintained for better results. The time taken and the complexity of building the physical or virtual circuit are not considered for the comparison. As per the experimental results in Table 5, the number of lines of code, which is equated with time, is much smaller in runlinc to achieve the same result. As the complexity of the program grows with the number of parameters incorporated into the system, the number of lines of code required in Arduino increases almost exponentially. The number of lines of code and the time taken grow even more drastically when incorporating the concept of IoT. For instance, creating an ON/OFF button in runlinc requires a mere 2 lines of code, whereas it is many fold in Arduino: leaving aside the procedure to create a web server in the Arduino IDE, it requires 64 lines of code to create an ON/OFF button [43].

Fig. 5 Graph representing the age range of respondents with experience of Arduino IDE and runlinc IDE

The analysis of the survey responses is presented graphically in Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14. The sample largely consists of university students, represented in Figs. 5 and 6, with relatively good experience in the usage of the specified platforms. It is found that a large percentage of university students have good knowledge of AI and IoT applications developed using either of the platforms, as indicated by Figs. 7 and 8. It is also found that a larger section of the sample has good experience of doing AI and IoT projects/research on Arduino and runlinc. On further analysis, the user experience of doing AI and IoT projects is better with runlinc than with Arduino, as indicated in Fig. 8, especially for beginners.
Fig. 6 The graph represents the qualification and profession of respondents with experience on
Arduino IDE and runlinc IDE
Fig. 13 Users feedback on the ease of coding which is dependent on the number of lines of code
Fig. 14 Users response on the potential/prospect of IoT STEM AI technology for quality education
of a real scenario of the Internet of Things, since it enables controlling devices over the Internet as well as on-chip, thereby making it easier and more effective to teach AI and IoT technologies to beginners. On the other hand, lengthy, complex, and laborious program code has made Arduino challenging, especially for beginners, despite the abundance of open-source resources freely available for the different versions of Arduino microcontrollers supported by the Arduino IDE. A few standout features of runlinc IDE and STEMSEL are the ability to create sound, vision, and motion with only one line of code, and to build ON/OFF buttons on a web page with two lines of code to control any device anywhere remotely. Considering the simplicity, speed, and reduced coding involved, it provides a good alternative to Arduino for reaching out and realizing the great potential of AI and IoT from primary schools to universities. Future research could focus on how these technologies can transform STEM education into an educational tool for teaching IoT and AI technologies. Considering the short time required for execution, it can also be used for developing smart systems in epidemic situations. It also has potential for high-end research in AI and IoT.
References
1. Morrar RA, Saeed HA (2017) The fourth industrial revolution (Industry 4.0): a social innovation
perspective. Technol Innov Manage Rev 7(11):12–20
2. Yunus M (2018) Yunus Warns of survival threat from artificial intelligence. The
Economic Times. https://www.thequint.com/news/hot-news/yunus-warns-of-survival-threat-
from-artificial-intelligence. Accessed 9 Mar 2020
3. Gauri P (2019) What the fifth industrial revolution is and why it matters. World Economic Forum. https://europeansting.com/2019/05/16/what-the-fifth-industrial-revolution-is-and-why-it-matters/. Accessed 12 Sept 2020
4. Venkatraman R (2020) You're 96 percent less creative than you were as a child. Here's how to reverse that: sure, you can't be a kid again, but you can think like one. INC. Accessed 23 May 2020
5. The waste of creative talents. George Land’s creativity test. In: LIFE, 16 Jan 2015. Accessed
24 May 2020
6. Land G. Evidence that children become less creative over time (and how to fix it). In: TED
talk. Accessed 23 May 2020
7. Robinson K (2006) Do schools kill creativity? TED ideas worth spreading
8. Bues D (2019) STEM education: how best to illuminate the lamp of learning. In: 2019 IEEE
integrated STEM education conference (ISEC). IEEE
9. Francis K et al (2018) Forming and transforming STEM teacher education: A follow up to pio-
neering STEM education. In: 2018 IEEE global engineering education conference (EDUCON).
IEEE
10. Vasiu R, Andone D (2019) An analyze and actions to increase the quality in STEM higher
education. In: 2019 IEEE integrated STEM education conference (ISEC). IEEE
11. Miorelli J et al (2015) Improving faculty perception of and engagement in STEM education.
In: 2015 IEEE frontiers in education conference (FIE). IEEE
12. Huang A (2015) Comparison of programming performance: promoting STEM and computer
science education. In: 2015 IEEE integrated STEM education conference. IEEE
13. Forawi S (2018) Science, technology, engineering and mathematics (STEM) education: mean-
ingful learning contexts and frameworks. In: 2018 International conference on computer, con-
trol, electrical, and electronics engineering (ICCCEEE). IEEE
14. Thibaut L et al (2018) The influence of teachers’ attitudes and school context on instructional
practices in integrated STEM education. Teach Teacher Educ 71(2018): 190–205
15. Goldman Sachs (2014) The Internet of Things: making sense of the next mega-trend, vol 201
16. Maayan D (2020) The IoT rundown for for 2020: stats, risks, and solutions. security today.
Accessed 8 Apr 2020
17. UNESCO. Artificial intelligence for sustainable development: challenges and opportunities for
UNESCO’s science and engineering programmes. Principles for artificial intelligence towards
a humanistic approach?
18. Pedro F et al (2019) Artificial intelligence in education: challenges and opportunities for sus-
tainable development
19. Aldowah H et al (2017) Internet of Things in higher education: a study on future learning. J
Phys: Conf Seri 892(1) (IOP Publishing)
20. Hollier S, Abou-Zahra S (2018) Internet of Things (IoT) as assistive technology: potential
applications in tertiary education. In: Proceedings of the internet of accessible things, pp 1–4
21. Martín AC et al (2019) Smart education: A review and future research directions. Multidisc
Digit Publ Inst Proc 31(1)
22. Rajput M (2020) Use of IoT in education sector and why it’s a good idea. IoT for all, 31 Dec
2019. Accessed 9 Mar 2020
23. Banica L et al (2017) The impact of internet-of-things in higher education. Sci Bull-Econ Sc
16(1):53–59
24. Goodwin M et al (2017) Strategies to address major obstacles to STEM-based education. In:
2017 IEEE integrated STEM education conference (ISEC). IEEE
254 S. Chedup et al.
25. Ray Partha Pratim (2016) A survey of IoT cloud platforms. Future Comput Inf J 1(1–2):35–46
26. Ganguly P (2016) Selecting the right IoT cloud platform. In: 2016 International conference on
Internet of Things and applications (IOTA). IEEE
27. Novák M et al (2018) Use of the Arduino platform in teaching programming. In: 2018 IV
international conference on information technologies in engineering education (Inforino). IEEE
28. Singh KJ, Kapoor DS (2017) Create your own internet of things: a survey of IoT platforms.
IEEE Consumer Electron Mag 6(2)57–68
29. Pflanzner T, Kertész A (2018) A taxonomy and survey of IoT cloud applications. EAI Endorsed
Trans Internet of Things 3(12) (Terjedelem-14)
30. Tayeb S et al (2017) A survey on IoT communication and computation frameworks: An indus-
trial perspective. In: 2017 IEEE 7th annual computing and communication workshop and
conference (CCWC). IEEE
31. Sruthi M, Kavitha BR (2016) A survey on IoT platform. Int J Sci Res Mod Educ (IJSRME).
ISSN (online) 2455-5630
32. Satu MS et al (2018) IoLT: An IoT based collaborative blended learning platform in higher
education. In: 2018 International conference on innovation in engineering and technology
(ICIET). IEEE
33. Ciolacu MI et al (2019) Education 4.0—Jump to innovation with IoT in higher education. In:
2019 IEEE 25th international symposium for design and technology in electronic packaging
(SIITME). IEEE
34. Ciolacu MI, Binder L, Popp H (2019) Enabling IoT in education 4.0 with biosensors from
wearables and artificial intelligence. In: 2019 IEEE 25th international symposium for design
and technology in electronic packaging (SIITME). Cluj-Napoca, Romania, pp 17–24
35. Sani RM (2019) Adopting Internet of Things for higher education. In: Redesigning higher
education initiatives for industry 4.0. IGI Global, pp 23–40
36. The duke perspective, impact of industry 4.0 on education, 21 Mar 2019. Accessed 16 March
2020
37. Hurtuk J et al (2017) The Arduino platform connected to education process. In: 2017 IEEE
21st international conference on intelligent engineering systems (INES). IEEE
38. Herger ML, Bodarky M (2015) Engaging students with open source technologies and Arduino.
In: 2015 IEEE integrated STEM education conference. IEEE
39. Yoo W, Pattaparla SR, Sameer AS (2016) Curriculum development for computing education
academy to enhance high school students’ interest in computing. In: 2016 IEEE integrated
STEM education conference (ISEC). IEEE
40. Gandhi PL, Himanshu SM. Smartphone-FPGA based ballon payload using cots components.
Memory 32.72KByte: 8KByte
41. Shlibek M, Mhereeg M (2019) Comparison between Arduino based wireless and wire methods
for the provision of power theft detection. Eur J Eng Sci Technol 2(4):45–59
42. Srivastava P et al (2018) IOT based controlling of hybrid energy system using ESP8266. In:
2018 IEEMA engineer infinite conference (eTechNxT). IEEE
43. Santos R (2020) ESP8266 web server with Arduino IDE. In: Bench test Arduino server 2 LEDs
pc1.pdf, web. 17 Mar 2020
Analysis of Small Loop Antenna Using
Numerical EM Technique
1 Introduction
In Eq. (2), the left-hand side can be written component-wise as
$$\hat{x}\,E_{xt0}\cos(A_F(\omega)t+\phi_x) + \hat{y}\,E_{yt0}\cos(A_F(\omega)t+\phi_y) + \hat{z}\,E_{zt0}\cos(A_F(\omega)t+\phi_z) \qquad (2a)$$
In both Eqs. (2) and (2a), $j=\sqrt{-1}$. Following Eqs. (2) and (2a), the electric field in terms of the phasor can be written as
$$E_{\angle\phi}\left(x_{\angle\phi}, y_{\angle\phi}, z_{\angle\phi}\right) = \hat{x}\,E_{xt0}\,e^{j\phi_x} + \hat{y}\,E_{yt0}\,e^{j\phi_y} + \hat{z}\,E_{zt0}\,e^{j\phi_z} \qquad (3)$$
where $\varepsilon = \varepsilon_0\varepsilon_r$. Here, $\varepsilon_0$ represents the free-space permittivity, $8.854\times10^{-12}$ F/m, and $\varepsilon_r$ represents the medium's relative permittivity. Similarly, $\mu = \mu_0\mu_r$.
where $C_B$ represents the boundary of the contour, $S_C$ the surface enclosed by the contour, $S_V$ the surface of the volume, and $V_{eS}$ the volume of the enclosed surface. In Eq. (6), all integrations are carried out over closed surfaces.
258 R. Seetharaman and C. K. Chevula
2 Loop Antennas
The electric field at a point $\vec{x}$ from the origin can be written as an integral of plane waves:
$$\vec{E}_{Pw}(\vec{x}) = \iint_{4\pi} \vec{F}_{Pw}(\Omega)\, e^{i\vec{k}\cdot\vec{x}}\, d\Omega \qquad (7)$$
where $\vec{E}_{Pw}(\vec{x})$ is the electric field at $\vec{x}$ taking into consideration all real angles in the domain $\Omega$, $\Omega$ is the solid angle that covers both elevation and azimuth angles, and $e^{i\vec{k}\cdot\vec{x}}$ is the plane wave representation. In Eq. (7), $d\Omega = \sin\xi\, d\xi\, d\theta$, and the wavenumber vector $\vec{k}$ is given by
$$\vec{k} = -k\left(\hat{x}\sin\xi\cos\theta + \hat{y}\sin\xi\sin\theta + \hat{z}\cos\xi\right) \qquad (7a)$$
$$\vec{E}_{Pw}(\hat{r}) = \int_0^{2\pi}\!\!\int_0^{\pi} \vec{F}_{Pw}(\xi,\theta)\, e^{i\vec{k}\cdot\vec{r}}\, \sin\xi\, d\xi\, d\theta \qquad (8)$$
Equation (7) contains the angular spectrum component $\vec{F}_{Pw}(\Omega)$, where $\hat{\xi}$ and $\hat{\theta}$ are orthogonal to each other in the vector sense and are also orthogonal to $\vec{k}$. The quantities $F_{\xi Pw}$ and $F_{\theta Pw}$ are complex and are expressed in Eqs. (10) and (10a), respectively.
In Eqs. (10) and (10a), Sen is the enclosed surface, and dS is the elemental surface.
With the surface S subjected to the required boundary conditions, we have
$$\hat{n}\times\vec{E} \cong \zeta\vec{H} \qquad (11)$$
Equation (11) holds for the surface $S$. The Laplacian of the electric field intensity can be written as
$$\omega = \frac{E}{\sqrt{\mu\varepsilon}} \qquad (13)$$
$$\nabla\times\vec{E} = i\omega\vec{B} \qquad (15)$$
where $\vec{B}$ is the magnetic flux density in tesla. From Eqs. (7) and (15), we can write
$$\vec{H}(\vec{r}) = \frac{1}{i A_F(\omega)\mu}\,\nabla\times\vec{E}_{Pw}(\vec{r}) \qquad (16)$$
Fig. 1 Diagrammatic representation of the small loop antenna (showing the loop axis, $\vec{E}$, $\vec{H}$, and the current $I$)
where $\eta$ is the characteristic impedance of free space. The relationship between the mean-square magnetic field and the mean-square electric field is given as
$$H_{msq}^2(\vec{r}_1) = \frac{E_{msq}^2(\vec{r}_2)}{\eta^2} \qquad (17)$$
Directivity is defined by
$$D(\vec{r}) = \frac{4\pi P_E(\theta,\phi)}{\oint P_E(\theta,\phi)\, d\Omega} \qquad (18)$$
where $P_E(\theta,\phi)$ is the power emitted toward the elevation and azimuth angles and $\oint P_E(\theta,\phi)\,d\Omega$ is the total power emitted by the system. Under special circumstances, we have
$$D_r(\theta,\phi) = \frac{4\pi}{\lambda^2}\, A_p(\theta,\phi) \qquad (19)$$
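Equation (19) relates directivity to the effective aperture. A quick numerical sketch with an assumed operating frequency and aperture (illustrative values only, not parameters of this antenna):

```python
import math

c = 3e8                      # speed of light (m/s)
f = 10e9                     # assumed operating frequency (Hz)
lam = c / f                  # wavelength (m)
Ap = 0.01                    # assumed effective aperture (m^2)

# Eq. (19): directivity from effective aperture
D = 4 * math.pi * Ap / lam**2
D_dBi = 10 * math.log10(D)
print(f"D = {D:.1f} ({D_dBi:.2f} dBi)")
```

This shows how directivity grows as the square of frequency for a fixed aperture.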
Fig. 2 Illustration of azimuth and elevation angle application to the small loop antenna
$$P_{lr} = \frac{\vec{E}_{Pw}(\vec{r})\,\omega^2\mu^2 A_l}{12\,\eta^2 R_r} \qquad (22)$$
Figure 3 illustrates the idea of applying a triangular mesh to the small loop antenna in this problem. Each unit sphere consists of the triangular mesh arrangement; the triangular mesh is chosen because it makes the computational tasks easier. For better performance of this small loop antenna, the finite element mesh method is applied. Because the problem involves both 3D and 2D cases, the fast processing of the triangular mesh allows complex tasks to be completed. The mesh forms the basis functions to be solved within the computational domain. This is aided by the fact that the finite element method needs less memory than other techniques and can handle geometric shapes without altering them.
Fields and field patterns are present in all unit spheres containing the loop antenna; this defines the computational domain. Dividing this domain into smaller subdomains with associated boundary conditions paves the way for the next step, the application of the finite element method (FEM). The domain contains areas where the fields are known as well as areas where the fields have to be found.
$$A_{SM_0SM}\left[\vec{E}_{Pw}(\vec{r}),\, \vec{H}(\vec{r})\right]\Upsilon_{E,H} = K_{BC} \qquad (23)$$
4 Methodology
$E_{\angle\phi}$ helps to get the value of the electric field over the surface, while the electric and magnetic components of Maxwell's phasor equations help to arrive at the result on the surface area and volume of the material. $\vec{E}_{Pw}(\vec{r})$ gives the plane wave component of the electric field at a point on the loop antenna. Then, $\vec{E}$ and $\vec{H}$ can be calculated over a solid angle.
$\int P\,dt$ gives the power available over a particular area. With this, the problem is solved by calculating the directivity of the small loop antenna. Component-wise results for $\vec{E}$ and $\vec{H}$ form the next part of the calculation. With these parameters available, the loop antenna's polarization factor is given by $P_{lr}$. These parameters provide the data for this problem.
Minimization over the mesh with the help of the conjugate gradient method is the pivotal part. The conjugate gradient method additionally demands that $A_{SM_0SM}$ be positive definite. Initially, the problem is solved by setting
$$A_{SM_0SM}\left[\vec{E}_{Pw}(\vec{r}),\, \vec{H}(\vec{r})\right]\Upsilon_{E,H} - K_{BC} = \Upsilon_{E,H} \qquad (24)$$
$$\Upsilon_{E,H} = \min\,\Upsilon_{E,H} \qquad (25)$$
Equation (25) states that Eq. (24) is solved by a minimization procedure. This takes the problem to the next level of setting
$$\nabla \min \Upsilon_{E,H} = 0 \qquad (26)$$
Equation (26) solves the problem through successive iterations of the loops: search directions are obtained, and newer minimized values of $\Upsilon_{E,H}$ are assigned during each iteration with the aid of coefficients applied to the directional vectors. This cumulative effort of running $N$ iterations over the entire vector space of Eq. (24) works out the minimized values of $\Upsilon_{E,H}$. This truncation scheme, run over the entire computational domain, finalizes the small antenna's shape for superior performance in communication systems.
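The iteration described above is, in essence, a conjugate gradient minimization over a symmetric positive definite system. A minimal sketch of that procedure, with a small toy system standing in for the actual FEM matrices $A_{SM_0SM}$ and $K_{BC}$ (the matrix and vector values here are hypothetical):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by minimizing
    0.5*x^T A x - b^T x along successive conjugate search directions."""
    x = np.zeros_like(b)
    r = b - A @ x              # residual: the mismatch being driven to zero
    p = r.copy()               # first search direction
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:  # gradient ~ 0: the minimum has been reached
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)  # step length along direction p
        x = x + alpha * p
        r = r - alpha * Ap
        beta = (r @ r) / rr    # coefficient producing the next direction
        p = r + beta * p
    return x

# Toy SPD system standing in for the discretized field problem
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

For an $n \times n$ SPD system, exact arithmetic converges in at most $n$ iterations, which is why positive definiteness of $A_{SM_0SM}$ is demanded.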
5 Antennas-Gradient Methods
The topological derivative, one of the gradient methods, is used for obtaining sliced sections of an image. High-frequency waves can be fine-tuned for operation in the space atmosphere with the help of the Sturm-Liouville operator, which is a differential operator. Weak-derivative and distributional-derivative methods are used for obtaining splayed-out results in medical images. The piezoelectric material used in the ultrasound equipment of medical imaging can also be used for designing antennas.
Diffusion operators can also be used for gradient solutions. Intelligent antennas are used for short-range communications in radio frequency applications [1]. Short antennas can also be used for broadcasting amplitude-modulated waves in the mid-frequency range [2]. The short pulse antenna, a type of short antenna, is used for analyzing the radiation of a source using spherical waves [3].
The conical helical antenna, another type of short antenna, can be used not only for communication but also for imaging [4]. Calculating the near-zone magnetic field of a small loop antenna with good accuracy is possible; this magnetic field can be used as a reference for calibrating other field meters [5]. The same is calculated by taking the polarization factor into account [6].
Analyzing the small loop antenna in the range of 3–10 MHz forms an interesting study of its behavioral pattern [7]. Small loop antennas can also be used as probes for investigating magnetic fields [8]. The radiation efficiency of a small loop antenna fabricated from a superconductor is also of considerable interest [9]. A polarization diversity study with the help of a loop antenna forms a gripping application of loop-type antennas [10].
The phasor form of Maxwell's equations, their integral form, and the plane wave component of the electric field are all taken into account for improving the small loop antenna in terms of directivity, power radiation, etc., with the help of I-V angles. Further treatment with the finite element method and the conjugate gradient method helps to achieve superior performance of small loop antennas. Matrix methods help to solve design elements for small loop antennas and can also be used for other types of antennas. The electric field expressed in terms of phase is taken into account for the rest of the problem. The time-dependent Maxwell's equations in phasor form are considered for application to the electromagnetic radiating material. The integral form of Maxwell's equations gives the required result for calculating the surface area and volume of the radiating loop antenna. The plane wave component of the electric field, the directivity, and the polarization factor further help in analyzing the performance of the small loop antenna. The assignment of a positive symmetric linear matrix equation for improving the performance of the loop antenna is the highlight of this problem. The finite element method is applied to the antenna shape while simultaneously handling the topology of the small loop antenna. The conjugate gradient method helps to solve the problem for improved small-antenna designs.
References
1. Mikko S, Pekka KV (2009) Apparatus and method for controlling diverse short range antennas
of a near field communication circuit. US 7,541,930 B2
2. Trainotti V (2001) Short medium frequency AM antennas. IEEE Trans Broadcast 47(3):263–
284
3. Shlivinski A, Heyman E (1999) Time domain near field analysis of pulsed short pulsed antennas.
IEEE Trans Anten Propag 47(2):271–279
4. Nenzi P, Varlamava V, Marzano FS, Fabrizio P (2013) U-Helix: on-chip short conical antenna. In: Proceedings of the 7th European conference on antennas and propagation (EuCAP), Gothenburg, Sweden, pp 1289–1293
5. Frank MG (1967) The near-zone magnetic field of a small circular-loop antenna. J Res National
Bureau Standards—C Eng Instru 71C(4):319–326
6. Bhattacharyya BK (1964) Electromagnetic fields of a small loop antenna on the surface of a
polarizable medium. GeoPhysics 29(5):814–831
7. Boswell A, Tyler AJ, White A (2005) Performance of a small loop antenna in the 3–10 MHz
band. IEEE Anten Propag Mag 47(2):51–56
8. Whiteside H, King R (1964) The loop antenna as a probe. IEEE Trans Anten Propag 12(3):291–
297
9. Wu Z, Mehler MJ, Maclean TSM, Lancaster MJ, Gough CJ (1989) High TC superconducting
small loop antenna. Phys C Superconduct Appl 162(01):385–386
10. Kim DS, Hyung Ahn C, Yun T, Sung JL, Kwang CL, Wee Sang P (2007) A windmill-shaped
loop antenna for polarization diversity. In: Proceedings of IEEE antennas and propagation
society international symposium, Honolulu, HI, pp 361–364
A Monopole Octagonal Sierpinski Carpet
Antenna with Defective Ground
Structure for SWB Applications
1 Introduction
(FR-4) dielectric medium and a DGS with two truncations on either ends of the
ground [11].
In addition, fractal structures can be applied as loads, ground structures, counterpoises, etc. Fractal resonators are models that have recently evolved in wideband systems with negative refractive index, familiarly known as metamaterials. A metamaterial structure with closely packed fractal resonators covers a wide band of microwave frequencies. Fractal structures are also introduced in filters for size miniaturization and better rejection. The paper comprises the fractal antenna design methodology in Sect. 2; simulation results such as S11, VSWR, bandwidth, gain, and directivity in Sect. 3; Sect. 4 concludes the paper.
In designing a patch antenna, certain design methodologies and principles must be followed, as given below.
2.1 Methodology
$$W = \frac{c}{2 f_r \sqrt{\dfrac{\varepsilon_r + 1}{2}}} \qquad (1)$$
$$L = \frac{c}{2 f_r \sqrt{\varepsilon_{eff}}} - 2\,\frac{0.412\,h\,(\varepsilon_{eff}+0.3)\left(\dfrac{W}{h}+0.264\right)}{(\varepsilon_{eff}-0.258)\left(\dfrac{W}{h}+0.8\right)} \qquad (2)$$
As represented in Table 1, the patch width is denoted as $W$ and the patch length as $L$ [14, 15]. Figure 2a and b represents the $L$ and $W$ of the substrate and ground plane, denoted as ($W_s$, $L_s$) and ($W_g$, $L_g$), respectively, where the effective dielectric constant $\varepsilon_{eff}$ is represented as
$$\varepsilon_{eff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left(1 + 12\frac{h}{W}\right)^{-1/2} \qquad (3)$$
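Equations (1)-(3) can be checked numerically. A short sketch using the FR-4 parameters quoted for this antenna ($\varepsilon_r$ = 4.4, h = 1.6 mm); the resonant frequency of 4.1 GHz is an assumption taken from the lower band edge reported later, not a stated design input:

```python
import math

c = 3e8          # speed of light (m/s)
er = 4.4         # FR-4 relative permittivity
h = 1.6e-3       # substrate thickness (m)
fr = 4.1e9       # assumed resonant frequency (Hz)

# Eq. (1): patch width
W = c / (2 * fr * math.sqrt((er + 1) / 2))

# Eq. (3): effective dielectric constant
eeff = (er + 1) / 2 + (er - 1) / 2 * (1 + 12 * h / W) ** -0.5

# Eq. (2): half guided wavelength minus twice the fringing extension
dL = 0.412 * h * (eeff + 0.3) * (W / h + 0.264) / ((eeff - 0.258) * (W / h + 0.8))
L = c / (2 * fr * math.sqrt(eeff)) - 2 * dL

print(f"W = {W*1000:.2f} mm, eeff = {eeff:.3f}, L = {L*1000:.2f} mm")
```

These closed-form values are the usual starting point before full-wave optimization of the final octagonal shape.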
Fig. 2 Optimized specifications of the octagonal patch antenna: (a) front view (patch), (b) back view (ground plane)
2.2 Configuration
The front and back views of the intended antenna are pictured in Fig. 3a and b, respectively, with their geometric representations [16, 17]. The octagonal patch is traced on an FR-4 dielectric material with permittivity ($\varepsilon_r$ = 4.4), loss tangent (tan δ = 0.02), and thickness (t = 1.6 mm). The antenna design occupies a compact area on the circuit board [18–20].
(IFS) manner. As represented in Table 2, the dimensions of the octagons and the
space between the octagons are decided using IFS. The Hausdorff dimension of the
carpet is represented as
$$a_{i+1} = \frac{d}{3^k}\,a_i \qquad (4)$$
$$S = \frac{\log d}{\log 3} \qquad (5)$$
where $a_i = \left(\frac{d}{3^k}\right)^i$ denotes the area at iteration $i$, $k$ is the number of iterations, $d$ is the number of slots in that iteration, and $S$ is the size of the particular slot.
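The dimension and shrinking-area behaviour above can be illustrated for the classic square Sierpinski carpet, where each iteration keeps d = 8 of the 9 sub-squares (a hypothetical check of the general relations; the octagonal variant here uses different geometry):

```python
import math

d = 8   # sub-squares kept per iteration (classic Sierpinski carpet)

# Eq. (5) form: Hausdorff (similarity) dimension of the carpet
S = math.log(d) / math.log(3)

# Iterative area behaviour: the retained area shrinks by d/9 each level
area = 1.0
areas = []
for i in range(4):
    area *= d / 9
    areas.append(area)

print(f"dimension = {S:.4f}, areas = {[round(a, 4) for a in areas]}")
```

The dimension log 8 / log 3 ≈ 1.893 sits between a curve and a filled surface, which is what gives carpet-type fractals their multiband, space-filling character.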
$$L_n = \frac{1}{4\pi^2 f_{0n}^2\, C_n} \qquad (6)$$
$$C_n = \frac{f_c}{2 Z_0 \cdot 2\pi\left(f_{n}^2 - f_{01}^2\right)} \qquad (7)$$
$$C_P = \frac{r^2}{2\pi f_T\, X_{n(n-1)}} \qquad (8)$$
$$L_{sn} = \frac{X_{n2} - X_{n1}}{2\pi f_T} + \frac{L_i}{\left(\dfrac{f_r}{f_{0n}}\right)^{2} - 1} \qquad (9)$$
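The first relation above is the standard parallel-LC resonance condition rearranged for the inductance. A small numerical sketch (the resonant frequency and capacitance values are assumptions for illustration, not the values extracted for this fabricated antenna):

```python
import math

f0 = 10e9        # assumed DGS resonant frequency (Hz)
Cn = 0.5e-12     # assumed extracted slot capacitance (F)

# Eq. (6): inductance of the equivalent parallel resonator
Ln = 1 / (4 * math.pi**2 * f0**2 * Cn)

# Consistency check: the parallel LC must resonate back at f0
f_check = 1 / (2 * math.pi * math.sqrt(Ln * Cn))
print(f"Ln = {Ln*1e9:.3f} nH, resonates at {f_check/1e9:.2f} GHz")
```

Each slot in the DGS contributes one such L-C pair, and their staggered resonances are what stitch together the wide band.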
As shown in Fig. 5, the DGS in the ground induces a parallel combination of capacitance ($C_n$) and inductance ($L_n$) due to the dielectric slit between the metal layers. The slots are made under the microstrip line and provide a parallel capacitance. A series inductance is also induced; as the frequency increases, the reactance of the transmission line increases while the capacitive reactance decreases accordingly [23–25]. The defective ground structure provides better impedance matching. Due to the variation in capacitance, the electric field changes. Due to surface waves and fringing fields, $L_{s1}$ and $L_{s2}$ contribute some inductance. The series inductance and parallel capacitance initiate an attenuation pole and eliminate certain frequency components, which leads to multiple successive resonance frequencies of the $L_n$ and $C_n$ resonators. Therefore, due to the current distribution along two paths on both sides of the transmission line, the bandwidth of the antenna extends toward the higher-frequency region [26, 27].
The bandwidth ratio implies the ratio of the higher frequency to the lower frequency radiated by the antenna. The fractional bandwidth implies the ratio between the impedance bandwidth and the center frequency.
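Both figures of merit can be checked against the band this antenna achieves (4.1-19.8 GHz, as reported in the measurement results):

```python
f_low, f_high = 4.1, 19.8          # measured band edges (GHz)

bw = f_high - f_low                # impedance bandwidth
f_center = (f_high + f_low) / 2    # center frequency
fbw = 100 * bw / f_center          # fractional bandwidth (%)
ratio = f_high / f_low             # bandwidth ratio

print(f"BW = {bw:.1f} GHz, FBW = {fbw:.2f} %, ratio = {ratio:.2f}:1")
```

The result reproduces the 15.7 GHz bandwidth, 131.38% fractional bandwidth, and 4.82:1 ratio quoted for the proposed design.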
The current distribution in the antenna for different iterative levels is compared at the 10 GHz resonance frequency. Figure 11 represents the current distribution concentrated on the external and internal margins of the octagonal slots in the patch (radiating element). This feature affects the matching of the antenna impedance to the transmission line. Thus, the DGS introduced in the system resolves the matching problem through the slots etched in the ground plane.
The antenna model has been fabricated and examined with an R&S vector network analyzer (ZNB-20), which is capable of measuring from 100 Hz to 20 GHz. The measured antenna shows promising agreement with the simulated antenna design. Figure 12a, b shows the fabricated design model of the monopole notched octagonal Sierpinski fractal antenna with DGS for SWB applications. Figure 13 shows the output waveform obtained by the antenna when measured with the ZNB-20 vector analyzer (100 Hz to 20 GHz).
276 E. Aravindraj et al.
Fig. 10 Radiation pattern at different iterative levels: (a) 0th iteration, (b) 1st iteration, (c) 2nd iteration, (d) 3rd iteration, and (e) 4th iteration
The measured S11 output of the antenna is shown in Fig. 13. The measured results give almost the same values in terms of S11, namely 4.1–19.8 GHz (S11 ≤ −10 dB; VSWR ≤ 2). The fabricated model occupies a total area of only 30 × 30 × 1.6 mm³ yet covers a huge frequency range of around 15.7 GHz bandwidth. Since it is fabricated with FR-4, it will be reliable and durable in the environment.
Table 4 presents a comparison of some recent developments in SWB antenna design with fractal structures. The miniaturization in size down to 30 × 30 mm² gives the antenna a compact design, and good bandwidth and gain values of 15.7 GHz and 6.281 dBi, respectively, are obtained.
Fig. 11 Current distribution at different iterative levels: (a) 0th iteration, (b) 1st iteration, (c) 2nd iteration, (d) 3rd iteration, and (e) 4th iteration
Fig. 12 Fabricated design model of the monopole notched octagonal Sierpinski fractal antenna with DGS: (a) front view, (b) back view
4 Conclusion
Table 4 Comparison between some recent developments in SWB antenna design and the proposed work

| Reference no. | Normalized size (mm³) | Frequency range (GHz) | Fractional BW (%) | Peak gain (dBi) | BW ratio |
|---|---|---|---|---|---|
| [16] | 120 × 120 × 1.6 | 0.70–4.71 | 148.14 | 3.9 | 16.72:1 |
| [17] | 28 × 28 × 1.6 | 3.5–15.1 | 124.73 | 3.5 | 4.31:1 |
| [18] | 24 × 22 × 1.57 | 3.1–10.9 | 111.42 | 4.1 | 3.406 |
| [19] | 18.5 × 39 × 1.6 | 3.2–12 | 115.78 | 4 | 3.66:1 |
| [20] | 30 × 24.8 × 1.6 | 2.6–10.8 | 122.38 | 1.2 | 4.15:1 |
| [21] | 40 × 30 × 3 | 2.23–3.1 | 32.64 | 3.6 | 1.39:1 |
| [22] | 52 × 42 × 0.94 | 0.96–13.98 | 174.29 | 4.2 | 14.56 |
| [23] | 52 × 42 × 1.575 | 0.96–10.9 | 167.62 | 3.1 | 11.47 |
| [24] | 52 × 46 × 1.6 | 0.95–13.8 | 174.23 | 4.5 | 14.2:1 |
| [25] | 150 × 150 × 0.5 | 0.64–1.6 | 84.61 | 5.3 | 2.5:1 |
| [26] | 40 × 38 × 1.6 | 2.25–11.05 | 132.33 | 5.05 | 4.91:1 |
| Prop. work | 30 × 30 × 1.6 | 4.1–19.8 | 131.38 | 6.1 | 4.82:1 |
Between the respective frequencies, peak gain and directivity values of 6.1 dBi and 6.45 dBi are obtained. The antenna is built by the photolithographic method and analyzed with a ZNB-20 vector analyzer (100 Hz to 20 GHz). The measured antenna shows good agreement with the simulated antenna design. The intended antenna model offers a maximum fractional bandwidth of 131.38% with a bandwidth ratio of 4.82:1, within which various applications can be served. Hence, the proposed antenna covers the C-band (4–8 GHz), X-band (8–12 GHz), and K-band (12–18 GHz) and also partially covers the ultra-wideband (UWB) spectrum (3.1–10.6 GHz).
References
1. Karmakar (2020) Fractal antennas and arrays: a review and recent developments. Int J
Microwave Wireless Technol 12(7):1–25
2. Rahman SU, Cao Q, Ullah H (2018) Compact design of trapezoid shape monopole antenna for
SWB application. Microwave Opt Technol Lett 61(8):1931–1937
3. Darimireddy NK, Ramana Reddy R (2018) A miniaturized hexagonal-triangular fractal antenna
for wide-band applications. Int J Antenna Propag 60(2):101–110
4. Dong Y, Hong W, Liu L (2009) Performance analysis of a printed super-wideband antenna.
Microwave Opt Technol Lett 51(4):949–956
5. Moosazadeh M, Kharkovsky S (2017) Antipodal Vivaldi antenna with improved radiation
characteristics for civil engineering applications. IET Microwaves Antennas Propag 11(6):796–
803
6. Wang Z, Yin Y, Wu J, Lian R (2015) A miniaturized CPW-fed antipodal vivaldi antenna with
enhanced radiation performance for wideband applications. IEEE Antennas Wireless Propag
Lett 15(3):16–19
7. Rahman MM, Islam MR (2019) A compact design and analysis of a fractal microstrip antenna
for ultra wideband applications. American J Eng Res 8(10):45–49
8. Zaidi NI, Ali MT, Abd Rahman NH (2019) Analysis of different feeding techniques on textile
antenna. In: 2019 International symposium on antennas and propagation, IEEE Xplore, Xi’an,
China, pp 1–3
9. Khandelwal MK (2017) Defected ground structure: fundamentals, analysis, and applications
in modern wireless trends. Int J Antenna Propag 17(2):1–23
10. Hong JS, Karyamapudi BM (2005) A general circuit model for defected ground structures in
planar transmission lines. IEEE Microwave Wireless Components Lett 15(10):706–708
11. Ali T, Mohammad Saadh AW (2017) A miniaturized metamaterial slot antenna for wireless
applications. AEU Int J Electron Commun 82(12):368–382
12. Aravindraj E, Ayyappan K (2017) Design of slotted H-shaped patch antenna for 2.4 GHz
WLAN applications. In: International Conference on Computer Communication Information
IEEE Xplore, Coimbatore, India, pp 1–5
13. Aravindraj E, Ayyappan K, Kumar R (2017) Performance analysis of rectangular MPA using
different substrate materials For WLAN application. ICTACT J Commun Technol 8(1):1447–
1452
14. Jena MR, Mishra GP (2019) Fractal geometry and its application to antenna designs. Int J Eng
Adv Technol 9(1):3726–3743
15. Balanis CA Antenna theory: analysis and design, pp 811–882
16. Wang F, Bin F, Sun Q (2017) A Compact UHF antenna based on complementary fractal
technique. IEEE Access Multidiscip 10(9):21118–21125
17. Ali T, Subhash BK (2018) A miniaturized decagonal Sierpinski UWB fractal antenna. Prog
Electromag Res 85(7):161–174
280 E. Aravindraj et al.
18. Soleimani H, Orazi H (2017) Miniaturization of UWB triangular slot antenna by the use of
dual-reverse-arrow fractal. IET Microwaves Antennas Propag 11(4):450–456
19. Gorai A, Pal M, Ghatak R (2017) A Compact fractal shaped antenna for ultra wideband and
bluetooth wireless systems with WLAN rejection functionality. IEEE Antennas Wirel Propag
Lett 16(5):2163–2166
20. Ali T, Mohammad Saadh AW (2018) A miniaturized slotted ground structure UWB antenna
for multiband application. Microwave Opt Technol Lett 60(8):2060–2068
21. Sur D, Sharma A (2019) A novel wideband Minkowski fractal antenna with assistance of
triangular dielectric resonator elements. Int J RF Microwave Comput Aided Eng 29(2):1–8
22. Okas P, Sharma A, Gangwar RK (2017) Circular base loaded modified rectangular monopole
radiator for super wideband application. Microwave Opt Technol Lett 59(10):2421–2428
23. Okas P, Sharma A, Das G, Gangwar RK (2018) Elliptical slot loaded partially segmented
circular monopole antenna for super wideband application. Int J Electron Commun 5(88):63–69
24. Okas P, Sharma A, Gangwar RK (2018) Super-wideband CPW fed modified square monopole
antenna with stabilized radiation characteristics. Microwave Opt Technol Lett 60(3):568–575
25. Dong Y, Hong W, Liu L, Zhang Y, Kuai Z (2019) Performance analysis of a printed super-
wideband antenna. Microwave Opt Technol Lett 51(4):949–956
26. Syeed MAA, Samsuzzaman M (2018) Polygonal shaped patch with circular slotted ground
antenna for ultra-wideband applications. In: 2018 International conference on computer,
communication, chemical, material and electronic engineering (IC4ME2). IEEE Xplore,
Rajshahi, Bangladesh, pp 1–4
27. Aravindraj E, Nagarajan G, Senthil Kumaran R (2020) Design and analysis of recursive square
fractal antenna for WLAN applications. In: 2020 International conference on emerging trends
in information technology and engineering. IEEE Xplore, Vellore, India, pp 1–5
DFT Spread C-DSLM for Low PAPR
FBMC with OQAM Systems
1 Introduction
10]. In addition, when a small number of subcarriers is used, even the cyclic prefix (CP) is needed to reduce the ISI. In general, FBMC suffers from high complexity [11, 12] and high PAPR [13]. Therefore, the PAPR and complexity are reduced for a better system.
There are various PAPR reduction methods [14, 15]. These are categorized as signal distortion methods [16, 17] and non-signal-distortion classes [18–23]. The modified PTS-based methods are primarily included in the probabilistic schemes [24–26], as are the SLM-based methods [27, 28]. The dispersion-based selective mapping (DSLM) method is proposed in order to reduce PAPR. Also, the C-DSLM method is proposed to generate candidate signals by taking the product of the original signal with cyclically shifted conversion vectors [29, 30]. The rest of the paper is organized as follows: a brief FBMC with OQAM model and its problems are presented in Sect. 2. The conventional SLM method is described in Sect. 3. Section 4 presents the proposed DFT spread C-DSLM method. Comparisons of the simulated results of the existing and proposed schemes are given in Sect. 5. Section 6 concludes and summarizes the paper.
2 System Model
FBMC with OQAM has been shown to perform better than OFDM, which suffers from low frequency utilization because of the cyclic prefix and from poor out-of-band suppression; FBMC also makes a stronger claim in 5G systems.
The signal model of FBMC with OQAM using an FIR prototype filter $h(t)$ is represented as
$$x(t) = \sum_{n1=1}^{N-1} \sum_{m} a_{n1,m}\, h\!\left(t - m\frac{T}{2}\right) e^{j\frac{2\pi n1 t}{T}}\, e^{j\theta_{n1,m}} \qquad (1)$$
The FBMC with OQAM signal consists of $M$ symbols ($M$ even) on $N$ subcarriers combined by superposition. The real and imaginary parts produced by OQAM are converted from serial to parallel form using the vectors $G = [d_0, d_1, d_2, \ldots, d_{(2M/4)-1}]$, where each $d_m = [d_0^m, d_1^m, \ldots, d_{N-1}^m]$ is a vector and $d_n^m$ is the combination of real and imaginary parts, $d_n^m = a_n^m + j b_n^m$, with $a_n^m$ the real and $b_n^m$ the imaginary part. The data matrix $G$ is redefined with elements $d_n^m$ allocated by the succeeding equation:
$$d_{n1}^{m} = a_{n1}^{m/2}, \quad m \text{ even, up to } M-2; \qquad d_{n1}^{m} = b_{n1}^{(m-1)/2}, \quad m \text{ odd, up to } M-1 \qquad (2)$$
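The staggering rule in Eq. (2) maps each complex QAM symbol into two real-valued OQAM symbols, real parts on even time slots and imaginary parts on odd slots. A minimal sketch of that mapping for one subcarrier (the symbol values are illustrative):

```python
# Complex QAM symbols d_n^m = a + jb for one subcarrier
qam = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]

# Eq. (2): even time slots carry real parts, odd slots carry imaginary parts
oqam = []
for d in qam:
    oqam.append(d.real)   # m even: a_{n1}^{m/2}
    oqam.append(d.imag)   # m odd:  b_{n1}^{(m-1)/2}

print(oqam)
```

Doubling the symbol rate with half-period offsets in this way is what lets OQAM keep orthogonality under the non-rectangular FBMC prototype filter.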
284 K. Ayappasamy et al.
The continuous-time FBMC with OQAM signal $s(t)$, with $m$ varying from 0 to $M-1$ and $n1$ from 0 to $N-1$, is given as
$$s(t) = \sum_{m=0}^{M-1} \sum_{n1=0}^{N-1} d_{n1}^{m}\, e^{j\frac{\pi}{2}(m+n1)}\, e^{j 2\pi n1 t / T}\, g\!\left(t - m\frac{T}{2}\right) \qquad (3a)$$
where
$$g_{m,n1}(t) = e^{j\frac{\pi}{2}(m+n1)}\, e^{j 2\pi n1 t / T}\, g\!\left(t - m\frac{T}{2}\right) \qquad (3b)$$
The transmitted FBMC signal $s(t)$ is divided into numerous segments with a time period $T$. Then, the PAPR is calculated for each segment.
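The PAPR of a segment is conventionally the peak instantaneous power over the mean power, usually quoted in dB. A hedged sketch of that computation (the paper's exact per-segment definition is not reproduced here; the QPSK/IFFT test signal is illustrative):

```python
import numpy as np

def papr_db(s):
    """Peak-to-average power ratio of a complex baseband segment, in dB."""
    power = np.abs(s) ** 2
    return 10 * np.log10(power.max() / power.mean())

# Example: a multicarrier-like segment built from random QPSK subcarriers
rng = np.random.default_rng(0)
symbols = rng.choice(np.array([1+1j, 1-1j, -1+1j, -1-1j]), size=64)
segment = np.fft.ifft(symbols)           # time-domain superposition
print(f"PAPR = {papr_db(segment):.2f} dB")
```

A constant-envelope signal gives 0 dB; superposing many subcarriers drives the peak well above the mean, which is the problem SLM-type methods attack.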
In order to recover the actual transmitted signal at the receiving end, the index $\tilde{u}$ of the designated phase factors $p^u$ is transmitted as shown in Fig. 4.
In this C-SLM scheme, the candidate data are produced using the cyclically shifted product method, which permits the conversion from the frequency domain to the time domain as denoted in Fig. 5 and envisaged by the subsequent equations:
$$x^{u} = \mathrm{IFFT}_N\{p^{u} \cdot x\} = F Q^{u} x \qquad (7)$$
$$F = \frac{1}{N}\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N^{-1} & W_N^{-2} & \cdots & W_N^{-(N-1)} \\
1 & W_N^{-2} & W_N^{-4} & \cdots & W_N^{-2(N-2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{-(N-1)} & W_N^{-2(N-1)} & \cdots & W_N^{-(N-1)(N-1)}
\end{pmatrix} \qquad (8)$$
$$\tilde{x} = F^{-1} x \qquad (9)$$
$$x^{u} = F Q^{u} F^{-1} x = C^{u} x \qquad (10)$$
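The identity in Eq. (10), namely that rotating phases in one domain equals multiplying by $C^u = FQ^uF^{-1}$ in the other (a circular convolution with the conversion vector), can be verified numerically. A small sketch, assuming $F$ is the inverse-DFT matrix of Eq. (8) and using random signal and phase vectors:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # time-domain block
p = rng.choice(np.array([1, -1, 1j, -1j]), size=N)        # phase factors p^u

F = np.fft.ifft(np.eye(N), axis=0)    # inverse-DFT matrix as in Eq. (8)
Q = np.diag(p)                        # diagonal matrix of phase factors Q^u

# Eq. (10): the matrix C^u = F Q^u F^{-1} acting on x
xu = F @ Q @ np.linalg.inv(F) @ x

cu = np.fft.ifft(p)                   # conversion vector c^u = IDFT of p^u
conv = np.array([sum(cu[(n - k) % N] * x[k] for k in range(N))
                 for n in range(N)])  # circular convolution c^u * x

print(np.allclose(xu, conv))
```

This equivalence is what lets C-DSLM generate candidate signals directly in the time domain without one IFFT per candidate.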
A conversion-vector dispersive SLM (C-DSLM) method is executed for low PAPR. In C-DSLM, the manner of producing phase factors is the same as in the normal SLM method; for example, for the present $m$th sequence, the finest phase factors are determined using the succeeding equation over an interval $T_0$.
The oversampled past symbols are denoted as
$$O^{s}(t) = \sum_{m'=0}^{2m-1} \sum_{n=0}^{N-1} x_{m',n}^{\mu_{\min}}\, e^{j\frac{2\pi}{T} n t}\, e^{j\phi_{m',n}}\, g\!\left(t - m'\frac{T}{2}\right) \qquad (11a)$$
$$c^{s}(t) = \sum_{m'=2m}^{2m+1} \sum_{n=0}^{N-1} x_{m',n}^{\mu}\, e^{j\frac{2\pi}{T} n t}\, e^{j\phi_{m',n}}\, g\!\left(t - m'\frac{T}{2}\right) \qquad (11b)$$
The selection of conversion vectors depends on the sum of the oversampled past symbols and the current symbols.
The DFT spread C-DSLM scheme for minimum PAPR in FBMC with OQAM is proposed. The DFT spreading is clarified in Sect. 4.1. Then, the conversion vectors are designed and constructed, and the precise steps are given.
The elements of the conversion vector c^u are found by taking the discrete Fourier transform of the phase factors p^u, whose entries lie in the set {±1, ±j}. The conversion vector c^u should satisfy the following conditions:
a. The number of nonzero elements in c^u should be less than or equal to 4.
b. The complex values of the nonzero elements in c^u should be selected from the set {1, −1, j, −j}.
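Condition (a) follows from the structure of the phase factors: when p^u is built by repeating a length-4 pattern, its DFT is nonzero only at multiples of N/4, hence at most four positions. A small NumPy check (the pattern chosen here is illustrative):

```python
import numpy as np

N = 16
# Phase factor vector p built by repeating a length-4 pattern drawn from {±1, ±j}
pattern = np.array([1, 1j, 1, -1j])
p = np.tile(pattern, N // 4)

# Conversion vector: the DFT of the phase factors
c = np.fft.fft(p)

nonzero = np.flatnonzero(np.abs(c) > 1e-9)
print(nonzero)   # nonzero entries appear only at multiples of N/4
assert len(nonzero) <= 4
```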
$$C = 0.5\begin{pmatrix}
1&0&0&0&j&0&0&0&1&0&0&0&-j&0&0&0\\
0&1&0&0&0&j&0&0&0&1&0&0&0&-j&0&0\\
0&0&1&0&0&0&j&0&0&0&1&0&0&0&-j&0\\
0&0&0&1&0&0&0&j&0&0&0&1&0&0&0&-j\\
-j&0&0&0&1&0&0&0&j&0&0&0&1&0&0&0\\
0&-j&0&0&0&1&0&0&0&j&0&0&0&1&0&0\\
0&0&-j&0&0&0&1&0&0&0&j&0&0&0&1&0\\
0&0&0&-j&0&0&0&1&0&0&0&j&0&0&0&1\\
1&0&0&0&-j&0&0&0&1&0&0&0&j&0&0&0\\
0&1&0&0&0&-j&0&0&0&1&0&0&0&j&0&0\\
0&0&1&0&0&0&-j&0&0&0&1&0&0&0&j&0\\
0&0&0&1&0&0&0&-j&0&0&0&1&0&0&0&j\\
j&0&0&0&1&0&0&0&-j&0&0&0&1&0&0&0\\
0&j&0&0&0&1&0&0&0&-j&0&0&0&1&0&0\\
0&0&j&0&0&0&1&0&0&0&-j&0&0&0&1&0\\
0&0&0&j&0&0&0&1&0&0&0&-j&0&0&0&1
\end{pmatrix} \qquad (12)$$
As per the inverse of the theorem, Table 1 is obtained. By repeating the vector by the ratio of the N and D values, the desired phase factor rotation vector p^u is found as

$$p^{u} = [(\tilde{p}^{u})^{T}, (\tilde{p}^{u})^{T}, \ldots, (\tilde{p}^{u})^{T}]^{T} \qquad (13)$$
$$C_3 = [1, 0, 0, \ldots, 0, -j, 0, 0, \ldots, 0, 1, 0, 0, \ldots, 0, j, 0, 0, \ldots, 0] \qquad (15)$$
One segment (e.g., 1, 0, 0, …, 0) has duration (N − 4)/4. The above operation is expressed as the product of s and C^u, i.e., s^u = C^u s, where C^u, the convolution matrix corresponding to c^u, is denoted as
$$C^{u} = \left[(c_x)^{\langle 0\rangle}, (c_x)^{\langle 1\rangle}, \ldots, (c_x)^{\langle N-2\rangle}, (c_x)^{\langle N-1\rangle}\right] \qquad (16)$$
where (c_x)^⟨n⟩ denotes the downward cyclic shift of c_x by n elements. Equation (12) shows the structure of the downward cyclic shift for C_3.
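The cyclic-shift construction in (16) can be sketched numerically; multiplying by this matrix is exactly circular convolution with c_x. The sparse vector below is an illustrative example, not one of the paper's tabulated vectors:

```python
import numpy as np

def convolution_matrix(c):
    """Circulant matrix whose columns are downward cyclic shifts of c (cf. Eq. 16)."""
    N = len(c)
    return np.column_stack([np.roll(c, n) for n in range(N)])

rng = np.random.default_rng(1)
N = 8
c = np.zeros(N, dtype=complex)
c[[0, N // 4, N // 2, 3 * N // 4]] = [1, -1j, 1, 1j]   # sparse conversion vector

Cu = convolution_matrix(c)
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Multiplying by the circulant matrix equals circular convolution of c and s
ref = np.fft.ifft(np.fft.fft(c) * np.fft.fft(s))
assert np.allclose(Cu @ s, ref)
```

Because c^u is sparse, the product C^u s costs only a few additions per sample, which is the source of the complexity savings claimed later.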
$$C_u = \left[(C^{u})^{0}, (C^{u})^{1}, \ldots, (C^{u})^{(ON-1)}\right] \in \mathbb{C}^{ON \times ON} \qquad (19)$$

where u ∈ {1, 2, …, U} indexes the phase rotation values p^u in (17), which are given in Table 1.
Method 2: Modulation with Conversion Vectors. Inserting N zeros at the central position of the products yields the vector x = [x_1, x_2, …, x_{N/2}].
$$S^{u} = C^{u} \cdot s_1 \in \mathbb{R}^{1\times ON}, \quad u = 2, 3, \ldots, U \qquad (20)$$

The value s^u is repeated L times and multiplied with g, which gives

$$S_L^{u} = [(S^{u})^{T}, \ldots, (S^{u})^{T}]^{T} \in \mathbb{R}^{1\times L\cdot ON} \qquad (21)$$

where (S^u)^T is repeated L times.
Calculation of PAPR: The ratio of peak power to average power is calculated for s^u(t) over a certain interval T_0:

$$\mathrm{PAPR}^{u}_{T_0} = \frac{\max_{t\in T_0} |S^{u}(t)|^{2}}{\frac{1}{T}\int_{T_0} |S^{u}(t)|^{2}\,dt}, \quad u \in \{1, 2, \ldots, U\} \qquad (22)$$

The choice of T_0 affects the PAPR reduction of the proposed scheme, where T_0 = [mT, mT + 4T].
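Evaluated on discrete samples, Eq. (22) reduces to a one-line function. A minimal sketch (the test signals are illustrative):

```python
import numpy as np

def papr_db(s):
    """Peak-to-average power ratio of a sampled signal segment, in dB (cf. Eq. 22)."""
    power = np.abs(s) ** 2
    return 10 * np.log10(power.max() / power.mean())

# A constant-envelope signal has 0 dB PAPR
t = np.arange(256)
tone = np.exp(2j * np.pi * 0.1 * t)
print(round(papr_db(tone), 6))   # 0.0

# A multicarrier sum of many tones has a much higher PAPR
rng = np.random.default_rng(0)
X = np.exp(2j * np.pi * rng.random(64))
multicarrier = np.fft.ifft(X, 256)
assert papr_db(multicarrier) > papr_db(tone)
```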
Method 3: Selection. The minimum PAPR is observed and recorded using the following formula:
Table 2 Computational complexity calculation

Schemes | Number of multiplications | Number of additions
C-DSLM | UM(ON/2) log2(ON) | UM ON log2(ON)
Proposed DFT spread C-DSLM | M(ON/2) log2(ON) | M ON log2(ON) + 3(U − 1) M ON
A complexity analysis is performed for the proposed DFT spread C-DSLM scheme and compared with existing methods. Complexity is measured by the number of multiplication and addition operations. The CCRR, the computational complexity reduction ratio, is defined by (Table 2)

$$\mathrm{CCRR} = \left(1 - \frac{\text{complexity of DFT spread C-DSLM}}{\text{complexity of C-DSLM}}\right) \times 100\% \qquad (26)$$
5 Performance Evaluation
The results are simulated for the proposed scheme using MATLAB R2018a. The proposed DFT spread C-DSLM scheme is well suited for low PAPR and low computational complexity. The simulated results given in Table 5 are compared with the existing C-DSLM scheme designed for FBMC with OQAM signals.
The number of subchannels N taken for simulation is 64, 256, 512, and 1024, and the number of symbols is assigned as 100. The subcarrier spacing is assigned as 15 kHz. The oversampling factor O and overlap factor L are both assigned as 4. The phase rotation vectors U are assigned as 4, 8, and 16. 4-QAM modulation is used, and the sampling period T_s is assigned as 0.4 µs. Multipath fading channels are used to test the performance. The parameters assigned for the simulation are given in Table 3.
Table 3 Simulation parameters

Simulation attributes | Remarks
Tool used for simulation | MATLAB R2018a
Subchannels N considered | 64, 256, 512 and 1024
Period for sampling T_s | T_s = 0.4 µs
Number of symbols M | 100
Oversampling factor O | 4
Phase rotation vectors U | 4, 8, or 16
Modulation | 4 QAM (4 OQAM)
Real-valued symbols | 3 × 10^4 OQAM
Overlap factor L | 4
Channel | Multipath fading channels
Subcarrier spacing | 15 kHz
To assess the complexity of the proposed and conventional methods, the number of subcarriers N is 64, M is 100, and O is 4. The phase rotation vectors U are assigned as 4, 8, or 16; the numbers of complex additions and multiplications are given in Table 2. The complexity values calculated for this specification are given in Table 4. The multiplication count for the proposed method is 30,822 for all of U = 4, 8, and 16, whereas the counts for the existing C-DSLM are 123,301, 246,603, and 493,158 for U = 4, 8, and 16, respectively. The corresponding CCRR values for U = 4, 8, and 16 are 75%, 87.51%, and 93.8%. The addition counts of the proposed method likewise yield a favorable CCRR. Hence, the proposed method offers better performance than the existing C-DSLM scheme (Table 4).
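The quoted CCRR percentages can be re-derived from the multiplication counts above using (26):

```python
def ccrr(proposed, existing):
    """Computational complexity reduction ratio, in percent (cf. Eq. 26)."""
    return (1 - proposed / existing) * 100

proposed_mults = 30_822                             # same for U = 4, 8, 16
existing_mults = {4: 123_301, 8: 246_603, 16: 493_158}

for U, m in existing_mults.items():
    print(U, round(ccrr(proposed_mults, m), 2))
# U = 4 gives about 75%, U = 8 about 87.5%, U = 16 about 93.75%
```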
[Figure: magnitude and phase (degrees) versus subcarriers]
Peak Power and Spectrum. The proposed DFT spread C-DSLM gives lower PAPR than the conventional system, as shown in Fig. 11. The peak power is limited within ±0.05 V for the assigned 1024 subcarriers. The band-limited received spectrum with subcarrier powers is shown in Fig. 12. The normalized frequency axis is defined such that f_s/2 = 0.5. The angles of the phase rotation factors are shown in Fig. 15.
PAPR Comparison. The PAPR of the proposed DFT spread scheme and the existing schemes is simulated and compared in Fig. 17. At a CCDF of 10^−4, the proposed method achieves 6.3 dB, whereas C-DSLM and DSLM show 7.1 dB and 8 dB, respectively, at the same CCDF. Thus the proposed system offers lower PAPR than the other conventional systems, owing to the selection of relevant phase factors.
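The CCDF curves referred to here can be estimated empirically. The sketch below uses a plain oversampled multicarrier symbol as a simplified stand-in for the full FBMC with OQAM chain, so the exact numbers will differ from Fig. 17:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 64, 2000
qam = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

paprs = np.empty(trials)
for i in range(trials):
    X = rng.choice(qam, N)                 # 4-QAM data on N subcarriers
    s = np.fft.ifft(X, 4 * N)              # oversampled time-domain symbol
    p = np.abs(s) ** 2
    paprs[i] = 10 * np.log10(p.max() / p.mean())

# Empirical CCDF: probability that the PAPR exceeds a threshold z (in dB)
for z in (6, 8, 10):
    print(z, np.mean(paprs > z))
```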
[Figure: amplitude versus subcarriers]

[Figure: amplitude (volts) versus subcarriers]

[Figure: power spectrum (dB) versus normalized frequency (0.5 = f_s/2)]

[Figure: magnitude and phase (degrees) versus subcarriers]
6 Conclusion
A low-complexity DFT spread conversion-vector dispersive SLM (C-DSLM) method is implemented for reduced PAPR and low complexity in the FBMC with OQAM system.
[Figure: polar plot of the received signal]

[Figure: received signal amplitude (V) versus subcarriers]
An estimation of the complexity and the simulation results of the proposed scheme are presented and compared with the existing dispersive SLM and C-DSLM schemes. The proposed system offers better PAPR reduction, 6.3 dB, whereas the existing methods show 7.1 and 8 dB at a CCDF of 10^−4. The proposed scheme also offers less computational complexity, achieving a 93.8% computational complexity reduction ratio (CCRR) for complex multiplications when the phase rotation factor U = 16, compared to the conventional schemes. Therefore, the proposed DFT spread C-DSLM scheme offers improved performance compared to the existing DSLM and C-DSLM schemes.
[Figure: CCDF versus z (dB)]

[Figure: bit error rate versus SNR (dB) for DSLM, C-DSLM, and the proposed DFT spread C-DSLM]
Secure, Efficient, Lightweight Authentication in Wireless Sensor Networks
1 Introduction
product monitoring, space and under-water applications [1–3]. Through the collection and processing of sensed data from the coverage area, WSNs allow users to access detailed and consistent information at any time and any place, an omnipresent sensing capability. Most of these applications involve large-scale WSNs, so the quantity of raw sensed data is very large, yet sensor nodes operate under severe resource constraints such as limited bandwidth, battery life, computing power, storage space, and communication resources. In WSNs, limited battery life is one of the essential constraints because it is difficult to exchange batteries, and there is no opportunity to recharge randomly deployed sensor nodes. For this reason, energy efficiency is an indispensable aspect of WSNs. On the other hand, WSNs regularly operate in hazardous environments, so an adversary may directly capture a sensor node from the target field and extract all sensed records from its memory, since nodes are generally not tamper-resistant owing to cost considerations [4–7].
WSNs have been broadly utilized by researchers, academics, and industry practitioners in diverse personal and organizational applications. In recent times, WSN has become a prominent field and has drawn the attention of researchers toward setting up networks secure against malicious nodes [4, 5, 8]. The security of a WSN can be threatened by adversaries. Owing to the inadequate resources of a sensor node, it is hard to apply conventional techniques, which produce great results in other networks, to mitigate the various threats. Intruders and attackers come up with several methods to access sensitive, confidential raw data from low-powered sensor nodes. Thus, it is necessary to design solid security mechanisms to protect sensitive WSNs from a variety of attacks [2–5, 8].
2 Authentication in WSN
There is a strong requirement to protect the sensed readings and data in WSNs. Authentication is one of the security mechanisms that defends WSNs from a wide range of security attacks. Authentication is a procedure by which the identity of a sensor in the setup is confirmed and which guarantees that the records or messages originate from an authentic source. In simple words, authentication ensures that data can be sent or accessed only by legitimate nodes in the network [1–3, 8]. Besides, it is significant for preventing false records from illegal users. Numerous authentication methods have been proposed for WSN security; they are categorized by authentication using unicast, multicast, and broadcast messages, by cryptographic keys, and by static, mobile, or mixed WSN deployments. Unicast authentication performs point-to-point authentication with unicast messages, with no involvement of other sources. In broadcast authentication, messages are taken directly from reliable sources and cannot be altered during transmission. Inspecting the identity of the source from which a message originated confirms the message's reliability, ensuring message uniqueness and detecting falsification, impersonation, and so on; these are some of the basic tasks of broadcast authentication [1–4, 8]. Coming to cryptography-based authentication, it could be an
Secure, Efficient, Lightweight Authentication in Wireless … 305
3 Related Work
In [7], the authors designed a network that uses a novel, well-organized source anonymous message authentication scheme (SAMA) based on elliptic curve cryptography (ECC) to provide unconditional source anonymity. The scheme allows intermediate nodes to authenticate the message, so any tainted message can be detected and dropped to preserve sensor power. In [9], the developed SDAACA protocol comprises a pair of algorithms: secure data fragmentation (SDF) and node joining authorization (NJA). Here, SDF protects data records from being corrupted by attackers via fragmentation into small pieces, while NJA authorizes any new node that wants to join the network. In [10], the authors analyzed the drawbacks of two-factor authentication and proposed a novel three-factor authentication protocol for WSNs in the IoT. Fuzzy models extract users' biometric data for password verification, and the protocol is verified with BAN logic. The simulations show that the protocol achieves free password change, detects known attacks, and quickly recognizes unauthorized logins. The authors of [11] designed a model to avoid attacks at the node and cluster levels via a procedure containing two algorithms: first, a key renewal scheme performed at the cluster, followed by the node authentication process. According to the authors, nodes can be authenticated during the key establishment step and the key refreshed periodically. In [12], the author offered a fast verification of vBNN-IBS, a pairing-free identity-based signature with reduced signature size. The speed-up is intended to decrease energy consumption, and thus extend the network lifespan, by diminishing the computation overhead owed to signature verification. The authors of [13] developed a lightweight authentication protocol
306 B. Chander and K. Gopalakrishnan
for node-to-node, node-to-base-station, and base-station-to-node authentication. All of these protocols employ ECC and hidden generator theory along with hash chains. In [14], two protocols are presented: one addresses the probability of the proposed model having a trusted sensor node, while the second analyzes the energy consumption of the model, and an improved protocol authenticates newly joined nodes in WSNs with the help of a trusted proposal. In [15], the authors propose an improved three-factor authentication and verify it with ProVerif; the results show the protocol is secure against both formal and informal attacks and has high robustness. In [16], a digital certificate-based node authentication scheme is developed, where every node in the network assumes that the BS is the trusted third party that provides digital certificates to all legitimate nodes; by verifying the details stored in the certificate, nodes authenticate each other. The authors of [17] propose a simple authentication and key distribution scheme among sensor nodes; moreover, they developed re-authentication rules that consider node mobility. In [18], a new certificateless authentication scheme is designed to avoid man-in-the-middle attacks; simulation outcomes prove that the design reduces energy consumption by 6–15%. In [19], the authors analyzed three-factor authentication, developed a secured protocol, and verified it against numerous attacks. In [20], the authors built an authentication and key management scheme (AKMS) for WSNs using symmetric keys with keyed hash functions, along with a bidirectional procedure for message reliability and integrity.
First of all, each sensor node holds some unused energy, and after waiting a specified time instance, a sensor node forwards HELLO packets to all available sensor nodes. The waiting time is user-defined and specified at the time of network deployment; if a node does not receive any initial packet-related notification within this time, it declares itself as CH. After receiving HELLO packets, each sensor node compares the received signal strengths, and the node with the highest value is elected as CH. A non-CH node finds the minimum hop distance (HDmin) between itself and its corresponding CH with the help of the received power (p_r) from the CH and the communication range of the sensor nodes (n_t). Additionally, each sensor node announces its location using a non-persistent CSMA approach, so sensors far away from the CH can utilize intermediate nodes to transfer their data to the CH through multi-hop communication.
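The signal-strength-based election rule described above can be sketched as follows; the log-distance RSSI model and all numeric values are illustrative assumptions, not parameters from the paper:

```python
import math
import random

def rssi(distance, tx_power_dbm=0.0, path_loss_exp=2.5):
    """Toy log-distance received-signal-strength model (illustrative only)."""
    return tx_power_dbm - 10 * path_loss_exp * math.log10(max(distance, 1e-3))

def elect_cluster_head(node_positions, ch_positions):
    """Each node compares the received signal strengths of HELLO packets
    from the candidate CHs and joins the one heard most strongly."""
    assignment = {}
    for node, pos in node_positions.items():
        strengths = {ch: rssi(math.dist(pos, ch_pos))
                     for ch, ch_pos in ch_positions.items()}
        assignment[node] = max(strengths, key=strengths.get)
    return assignment

random.seed(0)
nodes = {f"n{i}": (random.uniform(0, 100), random.uniform(0, 100)) for i in range(6)}
chs = {"CH1": (25.0, 25.0), "CH2": (75.0, 75.0)}
assignment = elect_cluster_head(nodes, chs)
print(assignment)
```

With a monotone path-loss model, picking the strongest HELLO is equivalent to joining the nearest CH.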
In this protocol, we assume that the base station (BS) and cluster head (CH) have the highest resources, so an adversary is not able to perform any kind of attack on them. The BS stores the identities of each sensor node and cluster head in its memory. The base station forwards primary secret keys to every sensor node and cluster head; for instance, K_AB is the primary secret key shared between sensor nodes A and B, and K′_AB is the secret key shared between A and B to calculate the MAC, and so on (a prime is used here to distinguish MAC keys from encryption keys). Similarly, the BS and CH share public keys. Every sensor node generates its own public and private keys; shared keys are used for encryption, and MAC calculation is performed with them. The keys used in this scheme are mutually independent. As a result, even if an attacker cracks or learns any one key, it is of little use for future operations, because the attacker cannot derive any other key from it.
Notations employed in the proposed procedure:

A, B – Sensor nodes
BS – Base station
N – Nonce values
CH – Cluster head
PUa, PUb – Public keys of nodes A and B
PRa, PRb – Private keys of nodes A and B
PUCH, PRCH – Public and private keys of the CH
PU, PR – Public and private keys of the BS
K_AB – Secret key shared between A and B
K′_AB – Secret key shared between A and B, used to calculate the MAC
K_A,CH – Secret key shared between A and the CH
K′_A,CH – Secret key shared between A and the CH, used to calculate the MAC
K_B,CH – Secret key shared between B and the CH
K′_B,CH – Secret key shared between B and the CH, used to calculate the MAC
K_CH,BS – Secret key shared between the CH and the BS
K′_CH,BS – Secret key shared between the CH and the BS, used to calculate the MAC
K_chu – Cluster authentication key
IDa – Identity of node A
C – Digital certificate
m – Message
R – Digital certificate request message
{M}K_AB – Message M encrypted with K_AB
MAC{M}K′_AB – MAC of M calculated with MAC key K′_AB
At the primary level, sensor node A initiates a certificate request message to its corresponding cluster head. The CH validates the identity of the sensor node and forwards the request to the BS. The BS then creates a certificate for the requesting node and forwards it to the CH; the CH receives the digital certificate and sends it to the appropriate sensor node.
In step 1, sensor node A forwards a message M1 to the CH containing a certificate request message R, its identity IDa, its public key PUa, and a nonce N1, concatenated with the MAC of M1 computed with the MAC secret key shared between A and the CH.
Step 2: Cluster Head (CH) to Base Station (BS)

CH → BS : {IDCH, Nch, M1 | MAC(Nch, M1)K′_CH,BS}PRCH

When the CH receives the message M1, it adds its ID and a new nonce value, appends the MAC of this message for the BS, and, before sending it, encrypts the entire message with the private key of the CH.
Step 3: Base Station (BS) to Cluster Head (CH)

BS → CH : {IDCH, Na, Nch, C | MAC(C, Na, Nch, IDCH)K′_CH,BS}PR

After receiving the message, the BS decrypts it with the CH public key, which was stored in its database at the time of network deployment, and verifies the MAC value. If verification is successful, the BS generates a digital certificate C for node A, concatenates the MAC of this message, and encrypts the result with its own private key PR. The certificate contains a version number, serial number, issuer name, lifespan, and extensions.
Step 4: Cluster Head (CH) to Sensor Node (A)

CH → A : IDA, Na, Nch, C | MAC(IDA, Na, Nch, C)K′_CH,A

The CH decrypts the received message with the BS public key stored in its database before network setup. After verifying the MAC value, the CH forwards the certificate to node A, protected with the shared MAC key.
Step 5: Sensor Node (A) to Sensor Node (B)

Upon receiving the certificate from the CH, sensor node A stores it in its database. For authentication purposes, node A transmits the certificate to node B, encrypted with the secret key shared between the two nodes. After receiving the certificate, node B performs further checks: first, it compares the node ID with the lists in its memory database; a match means node A is a legitimate node. Then, it checks the validity of the digital certificate by scrutinizing the certificate's life period and the public-key algorithm that signed it. If all checks are fulfilled, the node is legitimate, alive, and certified by an authorized BS.
Step 6: Sensor Node (B) to Sensor Node (A)
Sensor node B follows the same procedure to authenticate itself to sensor node A.
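The MAC-then-send pattern used throughout the steps above can be sketched with Python's hmac module; HMAC-SHA256 and the key sizes are stand-ins, since the paper does not fix concrete primitives:

```python
import hmac
import hashlib
import secrets

K_A_CH_MAC = secrets.token_bytes(16)    # shared MAC key between A and the CH

def mac(key, message):
    return hmac.new(key, message, hashlib.sha256).digest()

# Step 1: A -> CH : R, ID_a, PU_a, N_1 | MAC(M_1)
ID_a, R, PU_a = b"node-A", b"CERT-REQ", b"<A public key>"
N1 = secrets.token_bytes(8)             # fresh nonce against replay
M1 = b"|".join([R, ID_a, PU_a, N1])
packet = (M1, mac(K_A_CH_MAC, M1))

# CH verifies the MAC before forwarding the request to the BS
M1_rx, tag_rx = packet
assert hmac.compare_digest(tag_rx, mac(K_A_CH_MAC, M1_rx))
print("MAC verified; CH forwards request to BS")
```

Any modification of the packet in transit would make the recomputed MAC differ, so the CH would drop the request.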
Cluster authentication key for re-authentication of nodes: In WSNs, sensor nodes are deployed in hazardous, harsh, and unsafe environments. Due to unexpected situations in such fields, some nodes may lose their connectivity with the appropriate sensors. For re-authentication, the CH shares a cluster authentication key (K_chu) with non-CH nodes ahead of network deployment. For instance, if node A loses its connectivity and wants to reconnect with node B, it forwards the identities, a nonce, the cluster authentication key, and a concatenated MAC value. Node B then decrypts and verifies the MAC value; if verification succeeds, A is a legitimate node. Here, battery power is saved by reducing the communication required by the initial authentication procedure; the CH itself forwards the cluster re-authentication key.
A → B : IDA, IDB, {K_chu}K_AB | MAC(IDA, IDB, {K_chu}K_AB)K′_AB

Fig. 1 Simulation outcome of the proposed protocol in AVISPA: (1) OFMC, (2) CL-AtSe
Message meaning rule (Rule 1): P believes Q has said X if P believes the key K is shared with Q and P sees X encrypted by K.
Nonce verification rule (Rule 2): P believes Q believes X if P believes X was sent recently (is fresh) and Q has said X.
Jurisdiction rule (Rule 3): P believes X if P believes Q has jurisdiction over X and P believes Q believes X.
Fresh transmission rule (Rule 4): P believes fresh(X) / P believes fresh(X, Y).
Trust polymerization and trust projection rule (Rule 5): P believes X, P believes Y / P believes (X, Y); and P believes (X, Y) / P believes X.
See rule (Rule 6): P can decrypt a message it obtained if P received the message encrypted with its own public key.
6 Conclusion
In WSNs, security plays a major role, and sensed data records should not be accessible to unauthorized nodes. Many attacks on WSN applications are due to insecure authentication schemes. Several verification and authentication schemes exist in the literature, but most of them require substantial computational resources, whereas nodes in WSNs operate under tight resource constraints, so a proposed scheme should be simple, lightweight, and power-efficient. Here, we designed a secure authentication scheme: we first perform a simple clustering technique, after which a sensor node requests a digital certificate from the BS through the CH. Moreover, re-authentication is done with a cluster re-authentication key, which extends node lifetime by avoiding unnecessary computation for a new certificate. A MAC is applied to each message to increase integrity and authenticity. The proposed model is shown to be secure through evaluation with AVISPA, a model-checking tool for testing authentication protocols, and with BAN logic.
References
1. Binh HTT, Dey N (eds) Soft computing in wireless sensor networks. CRC Press
2. Dogra H, Kohli J (2016) Secure data transmission using cryptography techniques in wireless
sensor networks: a survey. Indian J Sci Technol 9(47)
3. Rajeswari SR, Seenivasagam V (2016) Comparative study on various authentication protocols
in wireless sensor networks. Sci World J 3
4. Lu Z, Qu G, Liu Z (2019) A survey on recent advances in vehicular network security, trust,
and privacy. IEEE Trans Intel Transp Syst 20(2):760–776
5. Azees M, Vijayakumar P, Deborah LJ (2016) Comprehensive survey on security services in
vehicular ad-hoc networks. IET Intel Transp Syst 10(6):379–388
6. Chander B (2020) Clustering and Bayesian networks. In: Handbook of research on big data
clustering and machine learning. IGI Global, pp 50–73
7. Choukimath RC, Ayyannavar VV (2014) Secure and efficient intermediate node authentication
in wireless sensor networks. Int J Sig Process Syst 1(3):71–74
8. Fu Z, Huang F, Ren K, Weng J, Wang C (2017) Privacy-preserving smart semantic search
based on conceptual graphs over encrypted outsourced data. IEEE Trans Inf Forensics Secur
12:1874–1884
9. Razaque A, Rizv SS (2017) Secure data aggregation using access control and authentication
for wireless sensor networks. Comput Secur 70:532–545
10. Li X, Niu J, Kumari S, Wu F (2018) A three-factor anonymous authentication scheme for
wireless sensor networks in internet of things environments. J Netw Comput Appl 103:194–204
11. Lee S, Kim K (2015) Key renewal scheme with sensor authentication under clustered wireless
sensor networks. Electron Lett 51(4):368–371
12. Benzaid C, Lounis K, Al-Nemrat AB, Nadjib AM (2016) Fast authentication in wireless sensor
networks. Fut Gener Comput Syst 55:362–375
13. Moon AHU (2016) Authentication protocols for WSN using ECC and hidden generator. Int J
Comput Appl 133(13):42–47
14. Yussoff YM, Kamarudin (2017) Lightweight trusted authentication protocol for wireless sensor
network (WSN). Int J Commun 2:130–136
15. Jawad KM (2019) An improved three-factor anonymous authentication protocol for WSNs-based IoT systems using symmetric cryptography. In: International conference on communication technologies, ComTech 2019, pp 53–59
16. Bhanu C, Kumaravelan (2018) Simple and secure authentication in WSNs using digital
certification. Int J Pure Appl Math 119(16):137–143
17. El Dayem A, Rizk SS, Mokhtar MA (2016) An efficient authentication protocol and key estab-
lishment in dynamic WSN. In: Proceedings of the 6th international conference on information
communication and management, ICICM 2016, pp 178–182
18. Gaur SS, Mohapatra (2017) An efficient certificate less authentication encryption for WSN
based on clustering algorithm. Int J Appl Eng Res 12(14):4184–4190
19. Jung J, Moon J (2017) Efficient and security enhanced anonymous authentication with key
agreement scheme in wireless sensor networks. Sensors 17(3)
20. Qin D, Jia S (2016) A lightweight authentication and key management scheme for wireless
sensor networks. J Sens
Performance Evaluation of Logic Gates Using Magnetic Tunnel Junction
1 Introduction
J. Garg (B)
Department of Electronics Engineering, Dr. A.P.J. Abdul Kalam Technical University, Lucknow,
Uttar Pradesh, India
S. Wairya
Department of Electronics and Communication Engineering, IET, Lucknow, Uttar Pradesh, India
causes the flow of leakage current [5]. For the last few decades, logic gates have
been built in CMOS technology. Logic gates using MTJ have been proposed to avoid
leakage current, and many such designs exist. The existing MTJ-based designs help
to reduce power and delay, but they also have several other drawbacks [6, 7]. The
MTJ combines storage and processing in one device, which helps to reduce memory
overhead and delay. In logic design using MTJ, a sense amplifier is always needed
to read the data and process it further.
In [8], a magnetic XOR gate has been designed using six MTJs and transistors.
That design has a small area, but the transistor count increases, and with it the
write energy, which is a significant drawback. The design in [9] also requires
additional circuitry, which increases area, power consumption, and delay. A few
designs use a single MTJ to implement linear functions [10]; a nonlinear function
must then be built from linear functions in a multi-stage process. In [11], a
spin-diode logic family has been proposed in which dynamic power dissipation
exceeds write power dissipation, and dynamic power plays a vital role in digital
circuit design. Adiabatic switching is one technique for designing low-power
circuits: in adiabatic circuits, the charging and discharging of the load capacitor
are controlled to reduce power dissipation [12]. The remainder of this paper is
arranged as a brief review of the magnetic tunnel junction, the proposed circuit
description, analysis, results, and conclusion.
2 Magnetic Tunnel Junction

As per the latest research, the central key element of MRAM is the magnetic tunnel
junction (MTJ) (Fig. 1).
An MTJ has two ferromagnetic layers: a free layer and a reference layer, separated
by a thin oxide layer that serves as a tunneling barrier. Information is stored as
a bit, ‘0’ (low resistance state) or ‘1’ (high resistance state), determined by the
magnetization of the free layer relative to that of the reference layer. If the
magnetic moments of the free and reference layers point in opposite directions,
the MTJ is in the high resistance state and is read as logic high. If they point
in the same direction, the MTJ is in the low resistance state and is read as logic
low [13]. Figure 1 shows the MTJ, and Fig. 2 shows its low and high resistance
states.
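The read-out rule above can be sketched in a few lines. This is a behavioral sketch only; the resistance and TMR values are illustrative placeholders, not taken from the paper.

```python
# Sketch: reading an MTJ's stored bit from the relative magnetization of its
# free and reference layers. Parallel -> low resistance -> logic '0';
# antiparallel -> high resistance -> logic '1'.
R_P = 1.0e3             # parallel (low) resistance in ohms -- illustrative
TMR = 1.0               # tunnel magnetoresistance ratio   -- illustrative
R_AP = R_P * (1 + TMR)  # antiparallel (high) resistance

def read_mtj(free_dir, ref_dir):
    """Return (resistance, logic bit) for the given layer magnetizations."""
    if free_dir == ref_dir:       # moments aligned: parallel state
        return R_P, 0             # low resistance, read as logic '0'
    return R_AP, 1                # antiparallel: high resistance, logic '1'

print(read_mtj("+x", "+x"))  # parallel state
print(read_mtj("-x", "+x"))  # antiparallel state
```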
In conventional MRAM, the write current depends inversely on the size of the MTJ:
as the MTJ shrinks, the required write current grows, and vice versa. The large
current needed to write small MTJs was a bottleneck in MRAM. To avoid this problem,
STT-MRAM came into existence; its basic cell consists of one transistor and one
MTJ, called 1T1MTJ [14].
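The favorable scaling of STT writing can be illustrated as follows. In STT switching the critical write current is roughly I_c = J_c × A, so it shrinks with the junction area as the cell scales down; the current-density value below is an illustrative order of magnitude, not a figure from the paper.

```python
import math

# Sketch: STT write current scales with MTJ area, easing the write-current
# bottleneck of field-driven MRAM as cells shrink.
J_C = 2.0e6  # critical current density in A/cm^2 -- illustrative typical value

def stt_write_current(diameter_nm):
    """Critical STT write current I_c = J_c * A for a circular MTJ."""
    area_cm2 = math.pi * (diameter_nm * 1e-7 / 2) ** 2  # nm diameter -> cm^2
    return J_C * area_cm2  # amperes

for d in (90, 45):  # scaling the MTJ down halves the diameter
    print(f"{d} nm MTJ: {stt_write_current(d) * 1e6:.1f} uA")
```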
Fig. 2 States of MTJ ‘0’ low resistance state, ‘1’ high resistance state
3 Proposed Approach
In this section, the magnetic tunnel junction-based logic circuit is proposed. Magnetic
tunnel junction-based logic circuits have three parts, as shown in Fig. 3 [15]. The
first part is a sense amplifier, the second part is CMOS logic, and the third one is
the MTJ part. The sense amplifier forms the pull-up network, while the CMOS logic
and the MTJs together form the pull-down network. Programming is done as in STT-MRAM.
In the CMOS logic part, the logic function is implemented with MOS devices, and the
sense amplifier is the final stage that produces the output. The sense amplifier
design used in this paper is taken from [15].

Fig. 3 Structure of an MTJ-based circuit
Figures 4 and 5 show the magnetic tunnel junction-based circuit diagrams of the
NAND/AND gate and the NOR/OR gate, respectively. In Fig. 4, both AND and NAND
logic are implemented by taking both inputs together with their complements. The
sense amplifier contains eight transistors: P1, P2, P3, P4, N1, N2, N3, and N7.
Transistors N4, N5, and N6 form the CMOS logic. Input 1 (A or Abar) is applied to
the CMOS logic, and Input 2 (B or Bbar) is applied to the MTJs. When MTJ1 is set
to Rap and MTJ2 to Rp, input B is treated as logic 1, and vice versa. The operation
is explained below for the input combination ‘01’ (Input A = 0, Input B = 1).
When the clk signal is ‘0’ (precharge phase), P1 and P4 are on, while P2, P3, and
N7 are off. When clk goes to ‘1’ (evaluation phase), P1 and P4 turn off and N7
turns on. Transistors N4 and N5 are off. MTJ2 is in the parallel state and MTJ1
in the antiparallel state, so the left branch of the circuit is cut off and the
AND output node is pulled to ground, while the NAND node charges toward the supply
voltage VDD. A discharge signal is applied at transistor N3, whose voltage differs
between the precharge and evaluation phases. Because MTJ1 is antiparallel, the
NAND branch has the higher resistance and discharges slowly, whereas the AND
branch discharges quickly; this drives transistor P2 past its threshold, turning
on transistor P3, and transistor N3 reaches the full supply. As the AND branch
discharges past the threshold of transistor N1, the AND node is pulled low, while
the conducting transistor N3 holds the NAND node high. This gives output 0 at the
AND node and 1 at the NAND node. The OR/NOR gate logic circuits can be explained
similarly.
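The race between the two discharge branches described above can be summarized by a small behavioral model. This sketch models only the logical outcome of the resistance race (which branch discharges faster), not the transistor-level dynamics of the paper's circuit.

```python
# Behavioral sketch of the MTJ-based NAND/AND cell: input B selects which
# branch carries the higher (antiparallel) MTJ resistance, and the branch
# that discharges faster wins the race, latching complementary outputs.
def mtj_nand_and(a, b):
    """Return (and_out, nand_out) for inputs a, b in {0, 1}."""
    # The AND branch loses the race (discharges fast) unless both the CMOS
    # input a and the MTJ state selected by b keep it high-resistance.
    and_branch_discharges_fast = not (a and b)
    if and_branch_discharges_fast:
        return 0, 1   # AND node pulled low, NAND node latched high
    return 1, 0       # AND node latched high, NAND node pulled low

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mtj_nand_and(a, b))
```

For the worked input ‘01’ in the text (A = 0, B = 1), the model gives AND = 0 and NAND = 1, matching the circuit explanation.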
4 Results and Analysis

Simulation of the NAND/AND and NOR/OR logic gate circuits using MTJ has been
carried out for performance evaluation. The simulation results of the NAND/AND
and NOR/OR logic gates using MTJ are illustrated in Figs. 6 and 7, respectively.
All the designs have been simulated in the HSPICE tool with 32 nm CMOS technology
[16] and the MTJ model of [17].
Table 1 shows the performance evaluation of the MTJ-based NAND design against the
conventional CMOS NAND logic design. Table 2 shows the corresponding comparison
for the NOR design.
Table 1 Performance evaluation of NAND/AND gate using MTJ versus conventional design

                         MTJ design   Conventional design
Dynamic power (nW)       1.8          4.7
Propagation delay (ns)   34           29
Standby power (nW)       0.5          3

Table 2 Performance evaluation of NOR/OR gate using MTJ versus conventional design

                         MTJ design   Conventional design
Dynamic power (nW)       2.8          5.2
Propagation delay (ns)   42           38
Standby power (nW)       0.9          4.6
Fig. 8 Comparison of power consumption of hybrid NAND/NOR gate with conventional CMOS
NAND/NOR gate
From the results in Tables 1 and 2, it can be concluded that the MTJ-based logic
circuits consume considerably less dynamic and standby power than the conventional
designs, at the cost of a slightly larger propagation delay (Fig. 8).
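As a quick cross-check, the relative changes can be computed directly from the tabulated values; a positive percentage means the MTJ design is lower (better), and the delay rows come out negative, i.e., the MTJ designs are slower.

```python
# Percentage change of the MTJ designs relative to the conventional CMOS
# designs, computed from the values in Tables 1 and 2.
tables = {
    "NAND/AND": {"dynamic_nW": (1.8, 4.7), "delay_ns": (34, 29), "standby_nW": (0.5, 3.0)},
    "NOR/OR":   {"dynamic_nW": (2.8, 5.2), "delay_ns": (42, 38), "standby_nW": (0.9, 4.6)},
}

for gate, params in tables.items():
    for name, (mtj, conv) in params.items():
        change = 100 * (conv - mtj) / conv  # positive = MTJ value is lower
        print(f"{gate} {name}: {change:+.1f}% vs conventional")
```

The dynamic-power reduction is about 61.7% for the NAND/AND gate and about 46.2% for the NOR/OR gate, so the "more than 50%" figure quoted in the conclusion holds for most, but not all, of the tabulated parameters.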
5 Conclusion

This paper has shown that logic circuits using the magnetic tunnel junction give
better results than conventional designs. The impact of the magnetic tunnel
junction on power has also been studied. NAND and NOR circuits have been simulated,
and the results show a significant reduction in power consumption with the MTJ
designs, exceeding 50% in most cases. In future work, an improved model file for
the MTJ could be created to achieve still better results.
References

Medical IoT—Automatic Medical Dispensing Machine

Abstract The Internet of things (IoT) is playing a vital role in the development
of various high-performance smart systems, and much research aims to improve the
quality of human life in various ways. One such area is hospital management.
Motivated by the recent COVID situation, which has led to the concepts of social
distancing and contactless transactions, we propose a centralized IoT-based
hospital management system. The proposed system includes a mobile app with which
the doctor can access the patient's history from the centralized database and
issue an E-prescription based on the diagnosis. The E-prescription is generated
as a QR code in the patient-side app. The patient shows the QR code to the
automatic medical dispensing machine (AMDM), which validates it and dispenses the
prescribed medicines. This helps to avoid 70% of the medical errors due to manual
prescription while achieving social distancing and contactless transactions.
1 Introduction
Coronaviruses are a large family of viruses that cause respiratory infections
ranging from the common cold to more severe diseases such as severe acute
respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). About
80% of the people infected by COVID-19 recover without needing hospital treatment,
but the other 20% become critical due to difficulty in breathing. The elderly and
those already having medical problems such as diabetes, heart and lung disease,
high blood pressure, or cancer are at greater risk of becoming critical. The
World Health Organisation (WHO) states that COVID-19 is transmitted through
droplets expelled from the nose or mouth when an infected person coughs, sneezes,
or speaks. A person can catch COVID-19 by breathing in these droplets, and hence
it is important to maintain social distancing. The droplets can also settle on
objects and surfaces, through which people can become infected; hence, contactless
transactions are essential. Many hospitals are finding it challenging to maintain
regular services during the pandemic [1], leaving people with chronic illnesses
who need regular hospital visits helpless. According to National Health Mission
administrative data, many people have caught COVID through hospital visits for
regular treatment. Hence, in the past few months, hospitals have denied normal
consultancies and postponed surgeries and operations to prevent patients from
being infected by the COVID-19 virus. There is a growing need for automation that
prevents this transmission while still providing regular health services. We
therefore propose an automated medical dispensing machine that avoids human
contact both in coming to the hospital and during the distribution of medicine.
Hospital management systems in India are not prepared to contain the current
COVID situation. Hospitals in India are still in a developmental stage and face
many challenges, such as inadequate numbers of doctors and nurses and shortages
of medication and equipment. In the current pandemic, with no vaccine yet
available, the best option is to prevent people from getting infected. Doctors
working in hospitals have refrained from going home, isolating themselves in
separate accommodation for fear of infecting their family members. The situation
is similar for non-COVID patients who visit the hospital: many have reportedly
caught COVID-19 during doctor visits for chronic illnesses, a case of jumping
from the frying pan into the fire. The healthcare sector must therefore find
ways to provide general services for regular patients while preventing the
infection from spreading.
Whether rural or urban, there is a critical need to avoid physical meetings while
still providing regular hospital services undauntedly. Consider a diabetic patient
who visits the hospital on a regular basis. Every month the doctor may need to
check the patient's blood sugar levels, provide consultation, and prescribe
medication according to the patient's condition. In a regular visit, the patient
performs these steps in person; the same steps can instead be carried out
virtually using a mobile application together with sensors in the device, and any
medicines needed can be obtained through an automated machine without physical
contact. The automatic medical dispensing machine thus helps:
Medical IoT—Automatic Medical Dispensing Machine 325
• To ensure social distancing
• To ensure contactless transaction for payment and distribution of medication.
4 Existing Solutions
At Brigham and Women's Hospital (BWH), the physician enters the medication details
into computerized physician order entry software [3]. The medications entered by
the physician are sent electronically to the pharmacy information system. Commonly
prescribed medicines are stocked in semi-automated medical dispensing machines,
while the least commonly prescribed medicines are kept under the monitoring of
the nursing unit. This system does not record the medications actually given to
the patient, and without this medication history it is prone to the drug-duplication
problem mentioned earlier.
The Sayan-HIS (hospital information system) [4] is used by the physician to
automate the prescription process for medicines and lab tests. The patient need
not carry any paper bills or prescriptions, even for lab tests; everything is
available in the system at the time of the patient's visit. The HIS also validates
the physician's prescriptions by checking the dosage that may be administered for
each medicine.
326 C. V. Nisha Angeline et al.
5 Proposed Solution
The proposed solution is to have a centralized hospital database across the
country. Every citizen is linked to the database as doctor or patient using their
Aadhaar ID, and any citizen visiting any hospital across the country is asked for
their Aadhaar ID at registration. The DocHelp app lets the doctor go through the
patient's previous medical history from the centralized database, which stores
all the patient's previous consultations with any doctor. The doctor can
(a) View Profile
(b) View Medical History
(c) View Drug History
(d) Prescribe Medicines.
To access the patient's records, the doctor scans the QR code from the patient's
MediHelp app. Once the consultation is done, the doctor issues an E-prescription
against the patient's ID, which becomes automatically available to the patient in
the MediHelp app. Payment for the consultancy can also be made online. The steps
above describe a normal consultation. In the current COVID-19 pandemic, where
hospital visits are restricted, the doctor can allot slots in his calendar for
patients to book virtual assistance; during a slot, the doctor attends to patients
virtually through the same apps, which provide video calling. If further
assistance is required, the doctor can allot a slot for the patient's visit in a
way that avoids crowding in the hospital. The E-prescription can either be
ordered online for home delivery or dispensed via the automated medical
dispensing machine (AMDM), which, like an ATM, will be placed in many locations.
When the patient shows the QR code to the panel of the AMDM, the machine dispenses
the medicines as per the prescription. The overall database of the machines, the
hospitals, and the patients is maintained in the AMDM Admin app (Fig. 1).
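The dispensing step can be sketched as a lookup-and-match against the centralized database. This is a hedged illustration only: the data structures, field names, and hashing scheme below are assumptions for the sketch, not details from the paper.

```python
import hashlib
import json

# Sketch: doctor issues an E-prescription into a centralized store; the AMDM
# decodes the patient's QR payload and dispenses only on an exact match.
central_db = {}  # prescription_id -> record; stands in for the central database

def issue_eprescription(patient_id, medicines):
    """Doctor side: store the prescription and return the QR payload string."""
    record = {"patient": patient_id, "medicines": medicines}
    pid = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()[:12]
    central_db[pid] = record
    return json.dumps({"prescription_id": pid, "patient": patient_id})

def amdm_dispense(qr_payload):
    """Machine side: validate the QR claim against the database before dispensing."""
    claim = json.loads(qr_payload)
    record = central_db.get(claim.get("prescription_id"))
    if record is None or record["patient"] != claim.get("patient"):
        return None                 # no match: refuse to dispense
    return record["medicines"]      # match: dispense these items

qr = issue_eprescription("patient-001", ["metformin 500 mg x 30"])
print(amdm_dispense(qr))
```

Because the machine dispenses only what the database says was prescribed, the manual transcription step (a major source of medication errors) is removed from the loop.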
System Workflow
See Figs. 2, 3, 4 and 5.
The proposed AMDM project overcomes the need for human interaction during a
pandemic situation such as COVID-19. When compared to traditional manual systems, the
References
1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
2. Poon EG, Cina JL, Churchill W, Patel N, Featherstone E, Rothschild JM, Keohane CA, Whit-
temore AD, Bates DW, Gandhi TK (2006) Medication dispensing errors and potential adverse
drug events before and after implementing bar code technology in the pharmacy. Ann Intern
Med 145:426–434
3. Maviglia SM, Yoo JY, Franz C, Featherstone E, Churchill W, Bates DW, Gandhi TK, Poon
EG (2007) Cost–benefit analysis of a hospital pharmacy bar code solution. Arch Intern Med
167:788–794
4. Kazemi A, Ellenius J, Pourasghar F, Tofighi S, Salehi A, Amanati A, Fors UG (2011) The effect
of computerized physician order entry and decision support system on medication errors in the
neonatal ward: experiences from an Iranian teaching hospital
Performance Analysis of Digital
Modulation Formats in FSO
Abstract Free space optics (FSO) is a technology in which information is
transferred from one end to another by propagating optical signals through the
atmosphere. Its operation is similar to that of optical fiber, but no physical
link is needed to establish a connection between the transmitter and the receiver.
FSO communication is both faster and more cost-effective than conventional
optical fiber communication. In the proposed work, on–off keying (OOK) and
phase-shift keying (PSK) techniques are compared. The most important parameters
considered are the quality factor (Q), bit error rate (BER), and eye height. The
modulation schemes are evaluated against three important criteria: data-transmission
efficiency, resilience to nonlinearity, and ease of implementation. The complete
study is performed using electrical and optical models.
1 Introduction
(QPSK). BPSK uses two carrier phases separated by 180° [5]: the first phase
denotes "1" and the second denotes "0", so the signal phase alternates between 0°
and 180° [5]. QPSK modulation uses four phase values separated by 90°; for a given
data rate, it halves the required symbol rate, leaving room for other users. QPSK
is also known as quadriphase PSK. BPSK carries one bit per symbol, whereas QPSK
carries two bits per symbol [4]. Multi-level modulation formats can also be used,
as they may increase the bit rate and capacity of fiber communication [6].
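The phase mappings just described can be written down directly: BPSK maps one bit to two antipodal phases, while QPSK maps a bit pair to one of four phases 90° apart. The Gray-coded mapping below is a common convention, assumed here for illustration.

```python
import cmath
import math

def bpsk_symbol(bit):
    """Map one bit to a carrier phase of 0 or 180 degrees (antipodal points)."""
    return cmath.exp(1j * math.pi * bit)

def qpsk_symbol(bits):
    """Map a bit pair (Gray-coded) to one of four phases spaced 90 degrees apart."""
    gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
    return cmath.exp(1j * (math.pi / 4 + gray[bits] * math.pi / 2))

print(bpsk_symbol(0), bpsk_symbol(1))   # the two antipodal BPSK points
print(abs(qpsk_symbol((0, 1))))         # all QPSK symbols have unit energy
```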
At the beginning of optical fiber communication, digital information is first
converted into the binary digits "0" and "1" [7, 8]. Amplitude-shift keying (ASK)
is widely used in optical systems and networks of many scales, but PSK has
demonstrated advantages over ASK in long-haul transmission systems [9]. The
receiver can be designed according to requirements, using an optical delay line
interferometer and a photodetector. Digital modulation techniques can be
implemented for optical communication with optical devices such as delay line
interferometers, typically Mach–Zehnder (MZI) or Michelson-type interferometers
based on multiple-beam interference, in which one beam is time-delayed with
respect to the other by the desired interval.
Lithium niobate (LiNbO3) has a very high intrinsic modulation bandwidth, and the
device switching speed is limited by a variety of physical constraints [10], so
the Mach–Zehnder (MZ) modulator is the better option. The MZ modulator has three
ports: the first for the modulating signal, the second for the continuous wave
(CW) laser, and the third for the output [11]. At the receiver side, a delay line
interferometer based on silicon photonics can be built for demodulation while
maintaining low-cost direct detection [12]. Delay line interferometers are also
known as optical differential phase-shift keying (ODPSK) demodulators: as applied
to DPSK, the delay line interferometer converts a phase-keyed signal into an
amplitude-keyed signal by exploiting the interference of the light waves in the
MZI.
2 Implementation of Modulation
Khajwal et al. [13] have studied various compensation techniques for FSO. The
authors analyzed the performance of FSO single-input single-output (SISO) and FSO
wavelength division multiplexing (WDM) systems under several atmospheric
conditions, studying the received-signal quality as a function of transmission
range and power in different weather conditions.
Elayoubi et al. [14] have demonstrated 50% duty-cycle return-to-zero DPSK
(RZ-DPSK), shown to perform better than plain differential phase-shift keying
(DPSK) and non-return-to-zero on–off keying (NRZ-OOK) for satellite-to-ground
links at 40 Gbps. However, the electrical NRZ-to-RZ converter in the transmitter
remains necessary as supplementary apparatus, which is a drawback [15]. The
scheme has been assessed using simulation software.
Alifdal et al. [16] have studied the optical signal-to-noise ratio (OSNR) for
different chromatic dispersion values. Based on offset quadrature phase-shift
keying (OQPSK) modulation, the authors describe the advantages of wavelength
division multiplexing (WDM). Using MATLAB and Optisystem simulations, they show
improvements in BER and OSNR even when the dispersion coefficient is high, and
demonstrate the effectiveness of the system in terms of reduced BER.
Sanou et al. [17] have demonstrated orthogonal frequency division multiplexing
combined with OQPSK filtering. Compared with similar studies, this method can
send signals over a greater distance and does not require equalization up to a
certain distance. Bandwidth is maximized because no cyclic prefix is used, while
the complexity of the transmitter and receiver remains modest.
3 Digital Schemes
In DPSK, a phase shift occurs for logic "1" only. Based on numerical simulation,
conversion from NRZ-OOK to RZ-BPSK and RZ-QPSK is possible for systems up to
40 Gbps [14]; other formats can therefore be converted into DPSK to gain its
advantages.
The basic setup of the electrical system is given in Fig. 1. The system can be
separated into two main segments: the transmitter and the receiver. First, the
DPSK electrical signal is generated using a pseudorandom binary sequence (PRBS)
of length 256 bits and an RZ waveform, and passed through an electrical link. The
receiver consists of a DPSK pulse generator with a bit rate of 30.375 Mbps and a
sample rate of 19.4 GHz; 64 samples per bit are used. The quadrature detector
334 M. Gautam and S. Sahu
is operated at 550 MHz with a gain of 2 dB. Figure 2 shows the eye diagram for
the electrical DPSK system; the BER showed error-free operation [22].
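The differential encoding behind a DPSK transmitter can be sketched in a few lines: each data bit is conveyed as the change between consecutive symbol phases, which is why the receiver only needs to compare each symbol with its one-bit-delayed copy (the role of the delay line interferometer). This is a generic logic-level sketch, not the simulated system's exact signal chain.

```python
def dpsk_encode(bits, init=0):
    """Differentially encode: the output phase bit flips when the data bit is 1."""
    out, state = [], init
    for b in bits:
        state ^= b        # a '1' toggles the carrier phase (180-degree shift)
        out.append(state)
    return out

def dpsk_decode(phase_bits, init=0):
    """Decode by XOR-ing each symbol with the previous one (delay-and-compare)."""
    prev, out = init, []
    for p in phase_bits:
        out.append(p ^ prev)
        prev = p
    return out

data = [1, 0, 1, 1, 0]
print(dpsk_decode(dpsk_encode(data)) == data)  # prints True: round trip recovers the data
```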
The design of the DPSK signal with an optical link is shown in Fig. 3. In this
method, the signal is optically modulated in DPSK format and then passed through
fiber. The transmitter consists of a CW laser source with 4 mW power and a
frequency of 193.1 THz; a non-return-to-zero (NRZ) waveform is generated from the
binary values of a 128-bit PRBS and provided to the MZ modulator at a data rate
of 40 Gbps. For long-haul transmission, six optical loops are used. At the
receiver side, a low-pass filter operated at 0.8 × bit rate is used; the bit rate
is a function of the total system length [22]. The output of the system is
observed at 60 km. Figure 4 shows the eye diagram of this system.
The direct and half-symbol-shifted sequences are combined with the carrier and
then added to generate the digital OQPSK signal. The OQPSK modulator usually
dissipates more energy and occupies more area [23].
Figure 5 shows the basic setup for the electrical OQPSK system, in which the
OQPSK signal is generated electrically and then sent through an electrical
channel. It consists mainly of two segments: the transmitter and the receiver.
In the transmitter, the OQPSK signal is generated electrically using a 256-bit
PRBS and an RZ waveform, which is passed through an OQPSK modulator. The
quadrature detector is operated at 550 MHz with a gain of 2 dB; the bit rate is
30.375 Mbps, the sample rate is 19.4 GHz, and 64 samples are taken per bit. The
receiver consists mainly of an OQPSK pulse generator. Figure 6 shows the eye
diagrams for the I and Q outputs.

Fig. 6 Eye diagram for OQPSK electrical in-phase and quadrature phase
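The defining feature of OQPSK used in these setups is the half-symbol offset between the I and Q bit streams: because the two channels never switch at the same instant, the carrier phase never jumps by 180° at once. The sketch below illustrates this timing property; the bit-splitting convention is an assumption for illustration.

```python
def split_iq(bits):
    """Even-indexed bits drive the I channel, odd-indexed bits drive Q."""
    return bits[0::2], bits[1::2]

def oqpsk_transitions(i_bits, q_bits):
    """List (time, channel) transition instants; Q is offset by half a symbol."""
    events = []
    for n in range(1, len(i_bits)):
        if i_bits[n] != i_bits[n - 1]:
            events.append((float(n), "I"))   # I may switch on symbol edges
        if q_bits[n] != q_bits[n - 1]:
            events.append((n + 0.5, "Q"))    # Q switches half a symbol later
    return sorted(events)

i, q = split_iq([0, 0, 1, 1, 1, 0, 0, 1])
print(oqpsk_transitions(i, q))  # no two transitions share the same instant
```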
The design of the optical OQPSK system is shown in Fig. 7. The binary data
generate waveforms separated by 45°, and the desired signal is transmitted by
adding the two phase components. The transmitter consists of a CW laser source
with 4 mW power and a frequency of 193.1 THz; an NRZ waveform is produced from
the binary values of a 256-bit PRBS and provided to the MZ modulators at a data
rate of 40 Gbps. The two MZ modulators, with a 90° phase difference between them,
modulate the optical signal to produce the I and Q parts. The receiver consists
of a demodulator working on the principle of a 90° optical hybrid coupler. The
received signal is demodulated and separated into its I and Q parts, as shown in
Fig. 8. Here, 32 samples are taken per bit, and the response is observed at 10 km.
BER and OSNR are visualized using an OSNR analyzer and a BER analyzer.

Fig. 8 Eye diagram for OQPSK optical in-phase and quadrature phase
After running the simulation, the results can be visualized from the eye diagrams.
RZ-DPSK is shown to perform better than DPSK and NRZ-OOK, largely because of the
reduction in inter-symbol interference (ISI) when RZ pulses are used [14].
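The eye-diagram quality factor and the BER reported by the analyzers are linked by the standard Gaussian-noise relation BER = ½ erfc(Q/√2); Q ≈ 6 corresponds to roughly the 10⁻⁹ BER commonly taken as the acceptability threshold. A minimal sketch:

```python
import math

def ber_from_q(q):
    """BER for a given linear quality factor Q under the Gaussian-noise model."""
    return 0.5 * math.erfc(q / math.sqrt(2))

for q in (3, 6, 7):
    print(f"Q = {q}: BER = {ber_from_q(q):.2e}")
```

This is why comparing Q factors and eye heights across the DPSK and OQPSK setups is effectively a comparison of achievable error rates.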
In this paper, we focus on various approaches for mitigating FSO channel
impairments and the effects of optical propagation limitations. We have compared
DPSK and OQPSK; Table 1 shows the comparison of the parameters used for the two
techniques. In general, the maximum acceptable bit error rate is about 10⁻⁹ [24].
Usha et al. [23] suggested a new approach for OQPSK, comparing an existing
multiplier with a Booth-multiplier method. That work focused on area reduction;
it did not report calculations of error, quality factor, or eye height, nor did
it compare the method against the other PSK formats in terms of the commonly
used parameters.
7 Conclusion
In the proposed work, we have compared the most popular modulation systems. An
FSO link consists of a transmitter and a receiver and sends information over
visible or infrared (IR) light. The fast growth of Internet-based services and
mobile telephony, together with the advent of multimedia services, will lead to
a huge increase in traffic, requiring a massive extension of the transport
capacity of public networks. It has been identified that a modulator that
consumes low power may have greater efficiency; since the OQPSK modulator
consumes more power, DPSK is preferred when power is a significant parameter. It
is also observed that using the laser-beam combination technique in a multibeam
FSO system significantly reduces the effect of atmospheric turbulence.
References
1. Kaur M, Anuranjana SK, Kesarwani A, Vohra PS (2018) Analyzing the internal parameters of
free space optical communication. In: 2018 7th international conference on Reliability, Infocom
Technologies and Optimization (trends and future directions) (ICRITO), Noida, India, 2018,
pp 298–301. https://doi.org/10.1109/ICRITO.2018.8748589
2. Chraplyvy AR (1994) Impact of nonlinearities on lightwave systems. Opt Photon News 5:16–21
3. Bloom S, Korevaar E, Schuster J, Willebrand H (2003) Understanding the performance of
free-space optics [Invited]. J Opt Netw 2:178–200
4. Prakash SA, Banu AT, Raghul EB, Prakash P (2018) Multilevel modulation format conver-
sion using delay-line filter. In: 2018 IEEE world symposium on communication engineering
(WSCE), Singapore, Singapore, pp 1–4. https://doi.org/10.1109/WSCE.2018.8690527
5. Kishikawa H, Seddighian P, Goto N, Yanagiya S, Chen LR (2011) All-optical modulation
format conversion from binary to quadrature phase-shift keying using delay line interferometer.
In: IEEE Photonic Society 24th Annual Meeting, Arlington, VA, pp 513–514. https://doi.org/
10.1109/PHO.2011.6110647
6. Kikuchi N (2005) Amplitude and phase modulated 8-ary and 16-ary multilevel signaling tech-
nologies for high-speed optical fiber communication. In: Proceedings of SPIE 6021, optical
transmission, switching, and subsystems III, 602127 (9 December 2005). https://doi.org/10.
1117/12.636406
7. Charlet G (2006) Progress in optical modulation formats for high-bit rate WDM transmis-
sions. IEEE J Sel Top Quantum Electron 12(4):469–483. https://doi.org/10.1109/JSTQE.2006.
876185
8. Gumaste A, Antony T (2002) DWDM network designs and engineering solutions. Cisco Press,
Indianapolis
340 M. Gautam and S. Sahu
9. Yan C et al (2006) All-optical format conversion from NRZ to BPSK using a single saturated
SOA. IEEE Photon Technol Lett 18(22):2368–2370. https://doi.org/10.1109/LPT.2006.885633
10. Wooten EL et al (2000) A review of lithium niobate modulators for fiber-optic communications
systems. IEEE J Sel Top Quant Electron 6(1):69–82. https://doi.org/10.1109/2944.826874
11. El-Nahal FI, Salha MA (2013) Comparison between OQPSK and DPSK bidirectional radio
over fiber transmission systems. Univers J Electr Electron Eng 1(4):129–133. https://doi.org/
10.13189/ujeee.2013.010405
12. Zheng L, Du J, Xu K, Wu X, Tsang HK, He Z (2017) High speed DPSK modulation up to
30 Gbps for short reach optical communications using a silicon microring modulator. In: 2017
16th international conference on optical communications and networks (ICOCN), Wuzhen, pp
1–3. https://doi.org/10.1109/ICOCN.2017.8121199
High-Level Synthesis of Cellular
Automata–Belousov Zhabotinsky
Reaction in FPGA
Abstract The Belousov Zhabotinsky (BZ) reaction is a chemical reaction that oscillates in space and time. The reaction involves complex mechanisms and steps. However, simple mathematical models based on cellular automata (CA) can replicate the reaction. Cellular automata have long been used to simulate various physical processes, and the simulations can reveal the mathematical details underlying a seemingly complicated process. CA design prototypes are used to model diffusion processes and chemical reactions, and CA models implemented in FPGAs achieve accelerated results compared to CPU-based architectures. In this research, we implement the BZ reaction on a Xilinx FPGA using a high-level synthesis methodology with Vivado HLS.
1 Introduction
The idea of CA was introduced by John von Neumann [1] and Stanislaw Ulam in the 1950s as a discrete computational model used to represent complex and nonlinear dynamic systems. It has a variety of real-world applications [2, 3], from modeling complex biological systems [4] and developing better cryptographic algorithms [5] to providing a framework for modeling the impact of socio-economic practices on environmental models [6]. According to Stephen Wolfram [7], cellular automata are primarily classified into four classes: automata that evolve to a homogeneous state, those that produce stable periodic structures, those that behave chaotically, and those that generate complex localized structures.
The Belousov Zhabotinsky reaction (BZ reaction) [12], shown in Fig. 1, is a common example of a time-based oscillatory chemical reaction and follows nonlinear chemical dynamics. When the reaction happens on a two-dimensional plane, self-organized spirals of the different reagents are formed. The reagents have distinctive colors that help distinguish them from one another. Numerous computer models have been proposed to simulate the spiraling pattern generated by the reaction. In this work, we implement a simplified version of the BZ reaction presented by Ball [13].
Fig. 1 Belousov Zhabotinsky chemical reaction [11]
The simplified form of the BZ reaction from [13] can be represented by a sequence
of reactions.
X + Y → 2X (1)
Y + Z → 2Y (2)
Z + X → 2Z (3)
Equation (1) means that, given a sufficient quantity of Y, the creation of X is autocatalyzed. Similarly, Eqs. (2) and (3) describe the autocatalyzed creation of Y and Z, which consume Z and X, respectively. Summing up all three equations, we can observe that the reaction forms a complete cycle. With these equations, we can now model the reaction's cellular automaton.
344 P. Purushothaman et al.
X_{t+1} = X_t + X_t (Y_t − Z_t) (4)

Y_{t+1} = Y_t + Y_t (Z_t − X_t) (5)

Z_{t+1} = Z_t + Z_t (X_t − Y_t) (6)
At every instant, the future concentration of any reagent depends upon the concen-
tration of the remaining two. We can introduce additional parameters for modifying
individual reaction rates. However, we ignore them here for the sake of simplicity.
The above reactions are valid only for a single spatial location and can be modeled
without a CA. For representing an oscillating reaction on a two-dimensional surface,
a cellular automaton is needed. The concentrations are averaged over a 3 × 3 neigh-
borhood window to introduce diffusion between reagents, and the results are applied
in Eqs. 4–6. Thus, the reaction at a location is influenced by eight of its neighbors. More
sophisticated models can be created by considering a broader neighborhood (Fig. 2).
We implemented the cellular automaton on a toroidal (wrap-around) grid. In a toroidal grid, the tessellations are always continuous, and there are no edges. To implement the toroidal grid in practice, we used a two-dimensional array in which every edge wraps around to the opposite edge.
In the previous section, we explained the reaction mechanism; in this section, we explain how the CA is implemented in C++ and the optimizations introduced for the FPGA implementation.
The concentrations of the reagents are represented as real values and stored in
fixed-point arrays. Two such arrays are created to represent the current and future state
of the concentrations. The initial concentration states are assigned random values. In
a real-life scenario, this process mimics the sites at which the chemical reaction starts.
After this, we compute the average concentrations of the reagents based upon the
neighborhood and proceed with the reaction equations. Two of the commonly used
neighborhood windows are Moore neighborhood and von Neumann neighborhood
as shown in Fig. 3. If the new concentration of a reagent rises above one or falls below zero, it is clipped to 1 or 0, respectively. This ensures that no single reagent saturates and dominates the reaction. The future state then becomes the current state, and the current-state variables are reused to store the next future state.
Fig. 3 Moore and von Neumann grid with extended neighborhood schemes
Fig. 4 Flowchart
The flowchart (Fig. 4) explains the process. We initialize the concentrations randomly in the first step. In the subsequent steps, we update the future concentrations based upon the current concentrations and swap the current and future states.
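The update loop described above can be sketched in plain Python with NumPy, independently of the HLS C++ implementation; the grid size and iteration count here are illustrative:

```python
# Sketch of the CA update (Eqs. 4-6): 3x3 neighborhood averaging for
# diffusion on a toroidal grid, the cyclic reaction update, and clipping
# of concentrations to [0, 1].
import numpy as np

def neighborhood_mean(a):
    """Average over the 3x3 neighborhood with wrap-around (toroidal) edges."""
    total = np.zeros_like(a)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return total / 9.0

def step(x, y, z):
    """One update of the three reagent concentration grids."""
    x, y, z = neighborhood_mean(x), neighborhood_mean(y), neighborhood_mean(z)
    x_new = np.clip(x + x * (y - z), 0.0, 1.0)   # Eq. (4)
    y_new = np.clip(y + y * (z - x), 0.0, 1.0)   # Eq. (5)
    z_new = np.clip(z + z * (x - y), 0.0, 1.0)   # Eq. (6)
    return x_new, y_new, z_new

rng = np.random.default_rng(0)
x, y, z = (rng.random((128, 128)) for _ in range(3))  # random initial sites
for _ in range(10):   # spirals emerge after many more iterations
    x, y, z = step(x, y, z)
```

Swapping the current and future state arrays is implicit here in the tuple reassignment; a hardware implementation keeps two explicit buffers instead.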
Even though the algorithm written in C++ can be directly implemented with Vivado HLS, it can be optimized for FPGA implementation, as stated earlier. In this section, we list the optimizations introduced to the native C++ code. The cellular automaton model is implemented as a 128 × 128 pixel two-dimensional toroidal grid. We chose the Xilinx Zynq Z720 series platform for the purpose. The latency of the CA model architecture can be decreased further by pipelining the execution. Pipelined architectures group independent operations together and execute them in parallel. Pipelining can easily be invoked in Vivado HLS through compiler directives. The tool accommodates pipelining to the extent possible, beyond which the designer is warned to modify the program flow. The concentration variables are stored in dual-port RAM to improve pipeline performance.
4 Results
4.1 Simulation
The simulation output from Vivado HLS was converted into a picture-sequence format using additional Python scripts, which helped us visualize the reaction better. Figures 5, 6 and 7 show the progress of the simulated reaction.
4.2 Synthesis
The HLS code was synthesized after being successfully simulated. The compiler
directives for pipelining and code optimization were included. The synthesized HDL
resource usage is shown in Fig. 8 and given in Table 1.
The results for the executions of the algorithm are summarized in the table below.
The maximum frequency for the algorithm was 119.55 MHz.
5 Conclusions
References
10. Feist T (2012) Vivado design suite. Xilinx Inc., San Jose, White Paper 5, p 30
11. NileRed. Recreating one of the weirdest reactions. YouTube. https://www.youtube.com/watch?v=LL3kVtc-4vY
12. Rogers M, Morris S. BZ reactions: Chemistry
13. Ball P (1996) Designing the molecular world: chemistry at the frontier. Princeton University Press, vol 19
IoT-Based Calling Bell
Abstract The IoT-based calling bell is used to maintain security and to let the owner know immediately who has visited the house. Whenever a person at the door presses the calling bell button, a call is immediately placed to the registered phone number. The owner then receives the call and can tell the visitor, for instance, that they are out of the house and will be back in a few minutes, or ask them to come at some other convenient time. The system also sends a message to the owner's phone number informing them that someone has visited the house. This IoT device is built to improve the efficiency and working of the bell with software.
1 Introduction
The owner first registers details such as name and phone number. The IoT-based calling bell is installed at the owner's house, and two buttons are provided in the system for the visitor to press. If one button is pressed, the system sends an SMS to the owner that someone has visited the house, and if the other button is pressed, the system places a call to the owner's registered number. The owner is also given the option of keeping only one button, which alternately sends an SMS and places a call. This lets the owner know that someone has visited the house and provides an efficient security system. An existing calling bell only rings inside the house when pressed; if nobody is at home, the owner cannot identify who visited, so in that situation such a device is of no use to the owner. The proposed system offers higher security and convenient, efficient communication with the visitor. To alert the owner, the visitor first presses the bell at the door, which sends a message to the owner; a call to the registered phone number then follows automatically. The owner thus receives a message that someone is at the door and, on the call, can talk to the visitor, for example to say that they are away and will return in a few minutes, or to ask the visitor to come back at a more convenient time.
2 Proposed Method
First, the owner registers details such as name and phone number. The IoT-based calling bell is then set up at the owner's house. Two buttons are provided in the system for the visitor to press: if one button is pressed, the system sends an SMS to the owner that someone has visited the house, and if the other button is pressed, the system places a call to the owner's registered number. The owner is also given the option of keeping only one button that alternately sends an SMS and places a call. This lets the owner know that someone has visited the house and provides an efficient security system.
After setting up the Arduino software, go back to the home screen and select the desired board from the list in the right column of the page. For cellular connectivity, AI Thinker, the manufacturer of the ESP8266, has launched the A6 GSM module.
• The module is less expensive than the SIM900 and can be connected easily. The diagrams show how to connect it to the Arduino for making a call and sending an SMS.
• A mobile adapter powers the A6 GSM module; the VCC pin of the GSM module is tied to the PWR_KEY pin, which acts as a chip enable, so power can be applied or removed whenever necessary. To start the module, a HIGH trigger must be applied to the PWR_KEY pin.
• A suitable SIM is then inserted into the module, whose slot is made for a micro SIM. When a nano SIM is used, a converter is required to fit the SIM in the slot.
• The RxD pin of the A6 GSM is connected to the Tx pin of the Arduino.
• The TxD pin of the A6 goes to the Rx pin of the Arduino.
• The GND pin of the A6 connects to the GND of the Arduino (Fig. 1).
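As a rough illustration of the call and SMS steps, the commands below compose the standard GSM modem AT commands (3GPP TS 27.007) that modules like the A6 accept over their UART; this is a sketch under our assumptions, not the paper's code, and the port name and phone number are placeholders:

```python
# Sketch (assumption): composing standard GSM AT commands for a voice call
# and a text-mode SMS. Phone number and serial port are illustrative.

def dial_command(number):
    """ATD<number>; places a voice call to the given number."""
    return ("ATD" + number + ";\r\n").encode("ascii")

def sms_commands(number, text):
    """Command sequence for sending an SMS in text mode."""
    return [
        b"AT+CMGF=1\r\n",                                # select SMS text mode
        ('AT+CMGS="%s"\r\n' % number).encode("ascii"),   # recipient number
        text.encode("ascii") + b"\x1a",                  # body; Ctrl+Z terminates
    ]

if __name__ == "__main__":
    # With pyserial and the wiring above, one would open the UART, e.g.
    #   port = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)
    # and port.write() each command, waiting for the module's "OK" responses.
    for cmd in [dial_command("+911234567890")] + sms_commands(
            "+911234567890", "Someone is at the door"):
        print(cmd)
```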
4 Screens
5 System Test
In system testing, test cases are derived from the specification of the system. The software or module under test is treated as a black box; hence, this is also called black-box testing (Figs. 5, 6, 7, 8, 9, 10, 11 and 12; Tables 1, 2, 3, 4, 5, 6, 7, and 8).
6 Conclusion
In conclusion, the owner of the house registers information such as name and phone number, and the IoT-based calling bell with two buttons is installed so that the visitor can press either button. If one button is pressed, the system sends an SMS to the owner that someone has visited the house; if the other button is pressed, it places a call to the owner's registered number. The owner can also opt for a single button that alternately sends an SMS and places a call, letting the owner know the identity of the visitor. Connected with a microphone, speaker, and camera, the system works as an effective security system, enabling a call through which the owner can communicate with the visitor, even as a video call.
358 S. B. Maddu et al.
Mobile Data Applications
Development of an Ensemble Gradient
Boosting Algorithm for Generating
Alerts About Impending Soil Movements
Abstract Natural disasters such as landslides are the source of immense damage to life and property. However, little is known about how one could generate accurate alerts against landslides sufficiently ahead of time. The major objective of this research is to develop and cross-validate a novel ensemble gradient boosting algorithm for generating specific warnings about impending movements of soil at an actual landslide site. Data about soil movements at 10-min intervals were collected via a landslide monitoring system deployed at a real-world landslide site at the Gharpa Hill, Mandi, India. A new ensemble support vector machine–extreme gradient boosting (SVM-XGBoost) algorithm was developed, where the alert predictions of an SVM algorithm were fed into an XGBoost classifier to predict the alert severity 10-min ahead of time. The performance of the SVM-XGBoost algorithm was compared to other algorithms including Naïve Bayes (NB), decision trees (DTs), random forest (RF), SVMs, XGBoost, and different new XGBoost variants (NB-XGBoost, DT-XGBoost, and RF-XGBoost). Results revealed that the new SVM-XGBoost algorithm significantly outperformed the other algorithms in correctly predicting soil movement alerts 10-min ahead of time. We highlight the utility of developing newer ensemble-based machine learning algorithms for alert generation against impending landslides in the real world.
1 Introduction
Natural disasters are a source of vast damage to property and lives [1]. Landslides are among the most common natural disasters in hilly regions [2]. These landslides cause problems like roadblocks, as well as other forms of damage to lives and property [2]. If people and authorities could be alerted about soil movements on hills sufficiently in advance, then these alerts may help evacuate people from landslide sites in time, as well as divert traffic on roads about to be blocked by a landslide [3]. To generate alerts from landslide sites, one needs to develop and deploy landslide monitoring systems. Recent research has developed and used internet of things (IoT)-based landslide monitoring systems on real-world landslide sites [4]. For example, the developed system is capable of recording soil movements and logging them into a cloud server at 10-min intervals. The recorded data consist of readings of five three-axis accelerometer sensors placed vertically beneath the soil sub-surface, 1-m apart from each other, at a landslide site. The soil displacement values are computed from these accelerations, and the displacement values (soil movements) are then used to monitor soil movements and impending landslides. Thus, data collected by the deployed system could be used to generate timely alerts via SMSes on cellphones with some lead time [4–7]. However, the generation of alerts of different severity ahead of time may need the involvement of machine learning (ML) algorithms.
A study of the literature reveals that many ML algorithms have been used for landslide applications [4–15]. For example, Bui et al. used support vector machine (SVM),
decision tree (DT), and Naïve Bayes (NB) ML algorithms for landslide suscepti-
bility mapping in Vietnam [11]. Similarly, Chen et al. used support vector machine
(SVM), random forest (RF), and a logistic model tree (LMT) for landslide suscep-
tibility mapping in the long county area (China) [12]. Also, Kumar et al. compared
ensemble and non-ensemble ML algorithms to predict the amount of debris flow at
the Tangni landslide in India. Non-ensemble algorithms (sequential minimal opti-
mization (SMO), and autoregression (AR)) and ensemble algorithms (random forest,
bagging, stacking, and voting) involving the non-ensemble algorithms have also been
proposed to predict weekly debris flow at the Tangni landslide (India) [6]. Similarly,
Sahin et al. have proposed gradient boosting machines (GBM), extreme gradient
boosting (XGBoost), and random forest (RF) algorithms for landslide susceptibility
mapping for the Ayancik District of Sinop Province (Turkey) [16].
The different ML applications detailed above have either been for susceptibility mapping or for predicting the amount of debris flow on a landslide. However, less attention has been given to the generation of alerts from landslide sites ahead of time based upon soil displacements occurring currently and in the recent past. The prime objective of this paper is to address this gap in the literature and to develop ML algorithms for generating alerts about the severity of soil movements ahead of time by relying upon recent soil movements. Specifically, in this paper, we propose
a new ML algorithm, support vector machine–extreme gradient boosting (SVM-
XGBoost), where the movement severity prediction of the SVM algorithm is first
obtained. Then it is fed into an XGBoost algorithm to derive the final predictions
about the severity of soil movements. These predictions could then be used for
generating alerts on cellphones of people living in the landslide-prone area. For
benchmarking the performance of the new SVM-XGBoost algorithm, we evaluated
the soil movement severity predictions from single algorithms (e.g., SVM [17], DT
[18], RF [19], NB [20], and XGBoost [21]) and other variants of the new ensemble
gradient algorithm (NB-XGBoost, DT-XGBoost, and RF-XGBoost).
In what follows, first, we discuss the related work and the data used for this study. Then, we briefly describe the different ML algorithms that we used for classification in this study. Finally, we report results from the different ML algorithms and conclude the paper by highlighting the implications of our findings for the prediction of, and alert generation for, impending soil movements.
2 Background
ML algorithms have been used in the past for the prediction of natural phenomena, including soil movements and associated landslides [4–15]. Landslides are a result of excessive soil movements, and their occurrence is a rare event. Hence, the application of ML to landslide prediction is a class imbalance problem [13]. Such problems may thus require measures like precision, recall or true positive (TP) rate, false positive (FP) rate, F1 score, receiver operating characteristic (ROC), area under the ROC curve (AUC), and sensitivity index (d′) rather than traditional measures like accuracy [22].
Some ML approaches have been developed for landslide susceptibility mapping
[6, 7, 11–16]. For example, Ref. [11] used SVM, DT, NB for landslide susceptibility
mapping in Hoa Binh Province (Vietnam). Results showed that the SVM algorithm
performed the best, followed by DT and NB algorithms. Reference [12] used SVM
and RF and logistic model tree (LMT) for landslide susceptibility mapping in the
Long County area (China). Results showed that the RF algorithm outperformed the
other two algorithms. Due to the class imbalance datasets, Refs. [11, 12] used ROC
curves and AUCs to analyze different algorithms.
Reference [13] compared logistic regression, DT, SVM, RF, and multilayer perceptron (MLP) algorithms to classify landslide susceptibility using rainfall and previous instances of landslides between 2011 and 2015 on National Highway NH-21 between Mandi and Manali. They tackled the imbalanced class problem using oversampling techniques to enhance the predictions. These algorithms were validated using tenfold cross-validation, and the sensitivity index (d′) was used to evaluate the scores. The best performing algorithm was RF, followed by the DT and logistic regression algorithms.
Reference [16] used gradient boosting machines (GBM), extreme gradient
boosting (XGBoost), and random forest (RF) algorithms to map landslide suscep-
tibility in the Ayancik District of Sinop Province, situated in the Black Sea region
of Turkey. 105 landslide locations in the area and 15 landslide causative factors
were used for this study [16]. The performance of the ensemble algorithms was
368 A. Pathania et al.
validated using different accuracy metrics, including AUC, overall accuracy (OA),
root mean square error (RMSE), and Kappa coefficient. Results showed that the
XGBoost method produced higher accuracy results and thus performed better than
other ensemble methods [16].
There is also literature on the use of ML for debris flow prediction [6]. For example,
Ref. [6] compared non-ensemble ML algorithms (sequential minimal optimization
(SMO), and autoregression) and ensemble ML algorithms (RF, bagging, stacking,
and voting) involving the non-ensemble algorithms to predict the weekly debris
flow at the Tangni landslide, India between 2013 and 2014. Results revealed that
the ensemble algorithms (RF, bagging, and stacking) performed better compared to
non-ensemble algorithms [6].
As explained above, both non-ensemble and ensemble ML algorithms were
proposed in literature either for landslide susceptibility mapping or for the prediction
of debris flow [6, 7, 11–16]. However, less attention has been given to the generation of alerts ahead of time based upon soil displacement severity at landslide sites. This
research addresses this literature gap by considering the prediction of soil displace-
ment severity and soil movement alerts via a novel ensemble ML algorithm, support
vector machine–extreme gradient boosting (SVM-XGBoost).
Specifically, in this paper, we develop the SVM-XGBoost algorithm, where the
movement severity prediction of the former algorithm is first obtained. Then it is fed
into the latter algorithm to derive the final predictions about the severity of soil movements. We compare the performance of the SVM-XGBoost algorithm with other
single algorithms (SVM, DT, RF, NB, and XGBoost) as well as different novel
ensemble variants (NB-XGBoost, DT-XGBoost, and RF-XGBoost). For bench-
marking the performance of these new algorithms, we evaluated the soil movement
severity prediction from single algorithms like SVM [17], DT [18], RF [19], NB [20],
and XGBoost [21] using the tenfold cross-validation procedure [20]. The choice of
the single algorithms was based on their performance for landslide susceptibility
mapping or debris flow predictions in prior literature.
3 Methodology
3.1 Data
The dataset analyzed in this paper belongs to the Gharpa landslide in Mandi district
of Himachal Pradesh, India. This landslide is located on Mandi-Bajaura road, which
is a connecting route between Mandi and Kullu districts of Himachal Pradesh, India.
This road is of considerable significance as it is used as an alternate route during
monsoon when the national highway between Mandi and Kullu is blocked due to
heavy rains. Data on soil movements were collected from the Gharpa landslide on
a 10-min scale between July 26, 2019, and October 6, 2019, across five different
sensors in a single borehole. The borehole contained five sensors S1, S2, S3, S4, and S5.
S = ut + (1/2)at^2 (1)
Here u is the initial velocity, a is the acceleration, t is the time over which the
acceleration acts on the sensor, and S is the displacement. Here, u is zero, as there
is no initial velocity component. The acceleration values are the values given by
the MPU6050 accelerometer sensor; and, t is 0.001, which is one millisecond (time
taken by the accelerometer to read the values of accelerations) [4].
Equation 1 was used to derive the displacement along the three-axes, and the resul-
tant displacement of these three mutually perpendicular displacements was calculated
as the actual displacement for a particular sensor.
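The per-axis displacement and resultant magnitude described above can be sketched as follows (u = 0, t = 1 ms); the acceleration values in the example are illustrative, not sensor readings:

```python
# Sketch of Eq. 1 applied per axis with u = 0 and t = 1 ms, followed by the
# resultant of the three mutually perpendicular displacements.
import math

def resultant_displacement(ax, ay, az, t=0.001):
    """Displacement magnitude from three-axis accelerations over time t (u = 0)."""
    sx, sy, sz = (0.5 * a * t ** 2 for a in (ax, ay, az))
    return math.sqrt(sx ** 2 + sy ** 2 + sz ** 2)

print(resultant_displacement(0.2, 0.1, 0.05))  # metres, for accelerations in m/s^2
```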
It was observed that sensor S3 possessed 1004 nonzero displacement instances
out of a total of 2344 zero and nonzero displacement instances, which were the
most among all five sensors. Thus, sensor S3 was the one that was closest to the
Fig. 2 Sensor placement in the borehole containing five sensors at regular depths of 1 m
landslide failure plane, and data of this sensor has been used for comparing different
algorithms.
We classified actual displacement for each instance of the sensor S3 into four
classes of displacements based upon their severity. The no displacement class (class
0) represented absolute zero displacements. The low displacement class (class 1),
moderate displacement class (class 2), and high displacement class (class 3) repre-
sented displacements based upon their percentiles (see Table 1). For computing these
percentiles, all S3 sensor displacements were converted into a Z-score (using mean
= 3.248 µm and standard deviation = 2.271 µm). Next, the Z-score ranges (see
Table 1) were used to compute different percentiles. Once data were divided into
different classes, it was split into 10-parts for a tenfold cross-validation procedure.
In the cross-validation procedure, each algorithm was trained on 9-parts and tested
on 1-part. This procedure was repeated 10-times, i.e., once for each tested part. The
training data were used to find the best values of parameters in different algorithms,
whereas the test data were used to test the algorithms with the best parameter values
found during training.
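The class-labeling step can be sketched as below. The mean and standard deviation are the values reported for sensor S3, but the Z-score cutoffs separating classes 1-3 are placeholders, since the actual percentile boundaries are defined in Table 1 and not reproduced here:

```python
# Illustrative labeling of S3 displacements into the four severity classes.
# LOW_CUT and HIGH_CUT are placeholder Z-score boundaries (assumptions).
MEAN_UM, STD_UM = 3.248, 2.271    # micrometres, as reported for sensor S3
LOW_CUT, HIGH_CUT = -0.5, 0.5     # placeholder Z-score boundaries

def severity_class(displacement_um):
    if displacement_um == 0.0:
        return 0                  # class 0: no displacement
    z = (displacement_um - MEAN_UM) / STD_UM
    if z < LOW_CUT:
        return 1                  # class 1: low displacement
    if z < HIGH_CUT:
        return 2                  # class 2: moderate displacement
    return 3                      # class 3: high displacement

print([severity_class(d) for d in (0.0, 1.0, 3.0, 6.0)])  # -> [0, 1, 2, 3]
```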
Table 2 Composition of each class across the data
Class 0 (No displacement class): 1340 instances
Class 1 (Low displacement class): 377 instances
Class 2 (Moderate displacement class): 406 instances
Class 3 (High displacement class): 221 instances
The most basic way of evaluating the performance of an ML algorithm is the error rate or accuracy of the algorithm. Accuracy is the rate at which a classifier classifies correctly. However, evaluating performance by accuracy can be misleading for class-imbalanced data [5]. For example, if 99% of the instances of a dataset belong to a particular class, then a classifier trained on this dataset can easily reach an accuracy of 99% by simply classifying every instance as belonging to that class. A better measure is the sensitivity index,

d′ = Z(TP rate) − Z(FP rate)

where the function Z(p), p ∈ [0, 1], is the inverse of the cumulative distribution function of the Gaussian distribution. The greater (and more positive) the value of the sensitivity index (d′) of an ML algorithm, the better the algorithm's performance compared to other algorithms.
We used the sensitivity index for evaluating the performance of different sets of
hyperparameters across algorithms using the tenfold cross-validation procedure.
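The sensitivity index can be computed directly from the TP and FP rates with the inverse Gaussian CDF available in the Python standard library; the rates in the example are those reported in the results for the SVM-XGBoost algorithm:

```python
# Computing d' = Z(TP rate) - Z(FP rate), where Z is the inverse CDF of the
# standard Gaussian (statistics.NormalDist, Python 3.8+).
from statistics import NormalDist

def d_prime(tp_rate, fp_rate):
    """Sensitivity index from true positive and false positive rates."""
    z = NormalDist().inv_cdf
    return z(tp_rate) - z(fp_rate)

print(round(d_prime(0.837, 0.054), 2))
```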
uses the training data to indirectly map the given input space into a high-dimensional feature space [23]. In this high-dimensional feature space, the optimal hyperplane that separates the classes is calculated by minimizing the classification errors and maximizing the margins of the class boundaries [23]. The objective
function is used to penalize the model for instances that are either misclassified
or lie within the margin [23]. The regularization parameter is a degree of impor-
tance, which is given to misclassifications while finding the optimal hyperplane.
The kernel parameter is used to vary the shape of this hyperplane [23].
2. Decision tree. It is a hierarchical algorithm composed of decision rules that recursively split the independent variables into zones. Each split is chosen so that the maximum homogeneity of the resulting nodes is achieved [18]. The homogeneity of a sample is measured by its entropy (0 for a completely homogeneous sample and 1 for an equally heterogeneous one), and the information gain of an attribute is defined as the decrease in entropy after the dataset is split on that attribute [24]. The tree is built by splitting the dataset into branches on the attribute with the largest information gain and repeating this process until a termination condition is met on every branch [24]. The min samples split parameter is the minimum number of samples required to split an internal node, and the max depth parameter is the maximum number of edges from the root to a leaf of the tree [25].
3. Random forest. It is an ensemble algorithm that exploits many classification trees (a "forest") to stabilize the algorithm's predictions [19]. Each binary tree is built on a randomly generated subset of the data, containing a subset of the total attributes, generated through bootstrapping techniques [17]. Every tree is grown to minimize classification errors, and the ensemble of multiple trees is used to maximize the algorithm's stability [21]. The number of estimators parameter is the number of trees in the forest, the min samples split parameter is the minimum number of samples required to split an internal node of a tree, and the max depth parameter is the maximum number of edges from the root to a leaf of the tree [26].
4. Naive Bayes. It is a probabilistic algorithm based on Bayes’ theorem. Using the
training data, it develops a posterior conditional probability for classification
into a particular class, given the feature instances, likelihood function, and a
prior probability [20]. This algorithm has no such hyperparameters.
5. XGBoost. Extreme gradient boosting (XGBoost) is a supervised learning
method based on decision tree boosting [27]. It is an ensemble technique that
sequentially adds decision trees and optimizes the loss function using the
gradient boosting method [27]. It also uses a variety of other mechanisms to
avoid overfitting and to reduce training time [27]. The number of estimators
parameter is the number of trees in the model, and the max depth parameter
is the maximum number of edges from the root to a leaf for a tree [28].
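The two hyperparameters named above can be sketched as follows. This is an illustrative stand-in using scikit-learn's gradient boosting on synthetic data, not the paper's code; `xgboost.XGBClassifier` exposes the same `n_estimators` and `max_depth` parameters, with the values from Table 4 shown here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the landslide data (illustration only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# n_estimators = number of boosted trees; max_depth = max edges root-to-leaf per tree.
clf = GradientBoostingClassifier(n_estimators=150, max_depth=10, random_state=0)
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```

With this many deep trees, the model fits the training data essentially perfectly, which is why the paper also relies on held-out evaluation metrics.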
6. SVM-XGBoost. Support vector machine–extreme gradient boosting (SVM-
XGBoost) is the novel ensemble ML algorithm that we propose in this paper,
which is based on the ensemble of two algorithms, the former SVM and the latter XGBoost.
374 A. Pathania et al.
4 Results
Table 4 Best hyperparameters for the different new classification algorithms on the landslide dataset

Classifier | Best hyperparameters for the first algorithm | Best hyperparameters for XGBoost in each new algorithm
SVM-XGBoost | SVM: (Kernel: RBF, C: 1000) | XGBoost: (N estimators: 150, Max. depth: 10)
DT-XGBoost | DT: (Min. samples split: 2) | XGBoost: (N estimators: 150, Max. depth: 10)
NB-XGBoost | NB: (no such hyperparameters) | XGBoost: (N estimators: 150, Max. depth: 10)
RF-XGBoost | RF: (N estimators: 150, Min. samples split: 2) | XGBoost: (N estimators: 150, Max. depth: 10)
XGBoost | XGBoost: (N estimators: 150, Max. depth: 10) | –
by the sensitivity index (D-prime) of each algorithm. We have also displayed the TP
rate (recall), FP rate, precision, F1 score, and accuracy of each algorithm for further
reference.
The sensitivity index of the novel ensemble-based ML algorithms was highest
for SVM-XGBoost, followed by DT-XGBoost, NB-XGBoost, and RF-XGBoost.
The sensitivity index of all of these new algorithms was more than twice that of
the standard XGBoost algorithm. The TP rate and FP rate of the SVM-XGBoost
algorithm were 0.837 and 0.054, respectively; compared to the TP rate and FP rate
of standard XGBoost (0.555 and 0.149), this is a 50.82% improvement (increase)
in the TP rate and a 63.76% improvement (decrease) in the FP rate. The accuracy
and F1 score jumped from 74.32% and 0.599 for standard XGBoost to 90.46% and
0.881 for the SVM-XGBoost algorithm. These results highlight the significant
improvement in performance when the novel ensemble-based ML algorithms
proposed in this paper are used.
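The sensitivity index (d') quoted above can be computed from the TP and FP rates via the inverse normal CDF. A short sketch reproducing the reported rates (the paper's exact values may differ slightly due to rounding):

```python
from scipy.stats import norm

def d_prime(tp_rate, fp_rate):
    """Sensitivity index: z-transformed hit rate minus z-transformed false-alarm rate."""
    return norm.ppf(tp_rate) - norm.ppf(fp_rate)

print(round(d_prime(0.837, 0.054), 2))  # SVM-XGBoost, approx. 2.59
print(round(d_prime(0.555, 0.149), 2))  # standard XGBoost, approx. 1.18
```

The ratio of the two values confirms the "more than twice" claim in the text.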
Table 5 displays the performance of the different traditional ML classifiers alongside
the new SVM-XGBoost algorithm. Among the traditional algorithms apart from
XGBoost, the sensitivity index was highest for SVM, followed by decision trees and
random forest. Naive Bayes performed the worst of the conventional algorithms,
with a sensitivity index of 0.306, nearly 3.75-fold lower than that of the SVM
algorithm. The sensitivity index increased from 1.146 for the best traditional
classifier in this table (SVM) to 2.553 for SVM-XGBoost, an improvement of about
123%; compared to Naive Bayes, SVM-XGBoost represents an improvement of
about 734%. Table 6 shows the best hyperparameters of the different traditional
classifiers on the landslide dataset.
Second, it was found that XGBoost performed best among the traditional algorithms
we used, followed by SVM. XGBoost produces a prediction algorithm in the
form of a boosting ensemble of weak classification trees, using gradient descent to
optimize the loss function [29]. The algorithm is highly effective at reducing
processing time and can be used for both regression and classification tasks [29].
Our results have several implications for predicting soil movement in the real
world and alerting people about impending landslides. The novel ensemble-based
ML algorithms that we propose in this paper performed significantly better than the
existing traditional algorithms. Thus, people living in or visiting landslide-prone
areas can benefit from alerts of impending landslides ahead of time using this
research. Policymakers, such as government agencies, could deploy these sensors
at various stations and use the newer ensemble-based ML algorithms to generate
alerts for the affected people and to prepare for damage repair, such as roadblock
repairs, ahead of time.
There are several ways forward for this research program. First, it would be
useful to replicate our algorithm results across many landslide sites in the
Himalayan mountains. Second, in this paper we used univariate data, i.e.,
prediction of soil movements was based only on previous soil movements. These
data could next be combined with multivariate data about weather and rain to make
more precise predictions about soil movements over longer temporal horizons.
Third, motivated by the performance of these new algorithms, it would be
worthwhile to evaluate algorithms that take the predictions of more than one
algorithm and use them as additional attributes for the main algorithm. We plan
to incorporate some of these ideas soon in our research program on soil movement
predictions.
References
1. Pande RK (2006) Landslide problems in Uttaranchal, India: issues and challenges. Disaster
Prevent Manage: Int J
2. Parkash S (2011) Historical records of socio-economically significant landslides in India. J
South Asia Disaster Stud
3. Liu JW, Shih CS, Chu ETH (2012) Cyberphysical elements of disaster-prepared smart
environments. Computer 46(2):69–75
4. Pathania A, Kumar P, Priyanka A, Singh R, Chaturvedi P, Uday KV, Dutt V (2020) Development
of a low cost, sub-surface IoT framework for landslide monitoring, warning, and prediction.
In: International conference on advances in computing, communication, embedded, and secure
systems (ACCESS 2020)
5. Pathania A, Kumar P, Priyanka, Maurya A, Kumar M, Singh R, Chaturvedi P, Uday KV, Dutt V
(in press) Predictions of soil movements using persistence, auto-regression, and neural network
models: a case-study in Mandi, India. In: International conference on paradigms of computing,
communication and data sciences (PCCDS-2020)
6. Kumar P, Sihag P, Pathania A, Agarwal S, Mali N, Chaturvedi P, Singh R, Uday KV, Dutt V
(2019) Landslide debris-flow prediction using ensemble and non-ensemble machine-learning
methods
7. Kumar P, Sihag P, Pathania A, Agarwal S, Mali N, Singh R, Chaturvedi P, Uday KV, Dutt V
(2019) Predictions of weekly soil movements using moving-average and support-vector
methods: a case-study in Chamoli, India. In: International conference on information
technology in geo-engineering. Springer, Cham, pp 393–405
Development of an Ensemble Gradient Boosting Algorithm … 379
1 Introduction
With new cameras, mobile phones, and digital tablets, the amount of digital images
has increased exponentially. Social media platforms have also contributed to
their increased distribution. At the same time, software for manipulating these
digital images has also significantly evolved. These software tools make it trivial
for people to manipulate digital images. The objective of media forensics is to
identify these manipulations and detect doctored images. Over the years, many
techniques have been proposed to identify image manipulations, including digital
artifacts based on camera forensics, resampling characteristics, compression, and
others. A common operation in image tampering is removing certain image regions
in a "content-aware" way. In this regard, seam carving is a popular technique for
"content-aware" image resizing [1, 39]. In seam carving, the "important content" in
an image is left unaffected when the image is resized, under the general assumption
that the "important content" is not characterized by low-energy pixels. Since seam
Fig. 1 Illustration of seam carving detection and localization: a original image, b object marked
in red to be removed and object marked in green to be preserved, c seam carved image with object
removed, d seam carving detection heatmap using proposed approach (red pixels are areas where
seams were likely removed)
2 Related Work
There have been several works proposed to detect digital image manipulations. These
include detection of splicing, morphing, resampling artifacts, copy-move, seam carv-
ing, computer-generated (CG) images, JPEG artifacts, inpainting, compression arti-
facts, to name a few. Many methods have been proposed to detect copy-move [11, 21],
resampling [6, 8, 15, 20, 29, 33, 34, 36], splicing [2, 18, 37], and inpainting-based
object removal [23, 44]. Other approaches exploit JPEG compression artifacts [7, 13,
24, 28] or artifacts arising from artificial intelligence (AI) generated images [3, 16,
30, 48]. In recent years, deep learning-based methods have shown better performance
in detecting image manipulations [4, 5, 8, 35].
Seam Carving Detection and Localization Using Two-Stage Deep Neural Networks 383
Several methods have been proposed over the past decade to detect seam carving-
based manipulations [9, 14, 17, 19, 22, 25–27, 38, 40–43, 46, 47]. These include
methods using steganalysis [38], hashing [14, 27], local binary pattern [46, 47], and
deep learning-based methods [10, 31, 32, 45]. In this paper, our approach to detect
seam carving-based manipulations is also based on deep learning.
Fig. 2 Example of seam carving when a 4 × 5 matrix a is seam carved and a 4 × 4 matrix b results
due to the removal of a single seam
384 L. Nataraj et al.
Fig. 3 Example of seam insertion: (i) a and (ii) b are the 4 × 5 and the 4 × 6 image matrices before
and after seam insertion, respectively. For points along the seam, the values are modified as shown
for the first row: b_{1,1} = a_{1,1}, b_{1,2} = a_{1,2}, b_{1,3} = round((a_{1,2} + a_{1,3})/2),
b_{1,4} = round((a_{1,3} + a_{1,4})/2), b_{1,5} = a_{1,4}, b_{1,6} = a_{1,5}
removal of certain image regions. It should be noted that seam carving can discard
or retain certain regions, depending on the weights we assign to those regions.
For example, in an object removal problem, we may need to ensure that certain
image regions are left unaffected, as distorting them may cause significant perceptual
distortion. We first explain how seam carving is used for object removal and then
discuss the interesting problems involved.
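The seam removal operation itself can be sketched with a small dynamic-programming routine, mirroring the 4 × 5 → 4 × 4 example of Fig. 2. This is a minimal illustration, not the authors' implementation; the gradient-magnitude energy function is an assumption, and for object removal the energy of the masked region would simply be set very low so that seams are forced through it:

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimum-energy vertical seam (energy = |gradient|, summed by DP)."""
    h, w = img.shape
    energy = np.abs(np.gradient(img.astype(float), axis=1))
    cost = energy.copy()
    # Forward pass: cumulative minimum-energy path cost for each pixel.
    for r in range(1, h):
        for c in range(w):
            lo, hi = max(c - 1, 0), min(c + 2, w)
            cost[r, c] += cost[r - 1, lo:hi].min()
    # Backtrack the cheapest seam from the bottom row upwards.
    seam = [int(cost[-1].argmin())]
    for r in range(h - 2, -1, -1):
        c = seam[-1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        seam.append(lo + int(cost[r, lo:hi].argmin()))
    seam.reverse()
    return np.array([np.delete(row, c) for row, c in zip(img, seam)])

out = remove_vertical_seam(np.arange(20).reshape(4, 5))
print(out.shape)  # (4, 4)
```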
In order to detect and localize seam carving in images, we propose a two-stage
detection approach: one stage for detecting seam carved patches and the other for
localizing seam carving in an image by generating a heatmap. First, we train a deep
neural network to identify whether patches in an image have been seam carved or
not. We then divide an image into patches and, for every patch, compute the
detection score, which yields a heatmap for the whole image. This heatmap can be
used for localization of seam carving. Finally, we train another deep neural network
with the heatmaps as input, which gives a score at the image level to determine
whether an image has been seam carved or not. The entire block schematic is shown
in Fig. 4.
Fig. 4 Block schematic of the proposed approach: a Stage 1, b Stage 2
5 Experiments
We first extract 64 × 128 patches from images belonging to RAISE dataset [12].
From these patches, we form two classes of image patches: first class where the
patches are further cropped to 64 × 64, and the second class where the patches are
5.2 Learning
The patches are trained using a multilayer deep convolutional neural network which
consists of convolution layer with 32 3 × 3 convs, followed by ReLu layer, convo-
lution layer with 32 5 × 5 convs followed by max pooling layer, convolution layer
with 64 3 × 3 convs followed by ReLu layer, convolution layer with 64 5 × 5 convs
followed by max pooling layer, convolution layer with 128 3 × 3 convs followed by
ReLu layer, convolution layer with 128 5 × 5 convs followed by max pooling layer,
and finally a 256 dense layer followed by a 256 dense layer and a sigmoid layer. We
train this model till a high training accuracy and validation accuracy are obtained.
Using the trained model on the patches, the probability of a pixel being seam carved or
not is computed over overlapping patches in an image. Figure 6 shows the heatmaps on
non-seam carved and seam carved images. As we can see, the heatmaps of the seam
carved images have more red regions than those of the non-seam carved images.
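The layer recipe above can be sketched as follows. This is a hedged reconstruction, not the authors' code: padding, pooling size, and the placement of the final sigmoid are assumptions not stated in the text (with "same" padding and 2 × 2 pooling, a 64 × 64 patch shrinks to 8 × 8 before the dense layers):

```python
import torch
import torch.nn as nn

# Patch classifier following the described layer sequence (sketch only).
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 5, padding=2), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 5, padding=2), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 5, padding=2), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# Per-patch seam-carving probability for a batch of two 64 x 64 patches.
scores = model(torch.randn(2, 1, 64, 64))
print(scores.shape)  # torch.Size([2, 1])
```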
Fig. 6 Detection heatmaps on images that have a not been seam carved and b seam carved
Even for an image containing blue sky, the heatmaps of the seam carved and
non-seam carved images can be clearly distinguished. This motivated us to train
another CNN that takes the heatmaps as input (Fig. 4b) and outputs the probability
that an image has been seam carved. As we can see from Fig. 7, we obtained high
accuracy when training on the heatmaps.
In this experiment, we varied the percentage of seams removed in the testing set and
evaluated the model that was trained with 50% of seams removed, in order to check
the robustness of the model to different amounts of seam removal. The area under
the curve (AUC) is the evaluation metric for varying percentages of seams removed.
The results are given in Table 1. We observe that the AUC is very high for percentages
around 50% and decreases for lower percentages of seams removed. This shows that
the model generalizes to most percentages of seams removed. In the future, we
will train another model for lower percentages.
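For reference, the AUC metric used throughout these experiments scores how well the model ranks manipulated images above pristine ones; a toy computation with made-up labels and scores (illustrative values only, not the paper's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative only: image-level labels (1 = seam carved) and model scores.
labels = np.array([0, 0, 0, 1, 1, 1])
preds = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90])
auc = roc_auc_score(labels, preds)
print(auc)  # 1.0 here, since every seam carved image outscores every pristine one
```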
Fig. 7 ROC curve of seam carving detection on the model trained on the heatmaps
In this experiment, we evaluated the robustness of our proposed approach against
JPEG compression. We varied the JPEG quality factors (QFs) of test images from
100 down to 50. The model was trained on seam carved and non-seam carved patches
and images, which were also JPEG compressed at quality factors between 70 and
100. The area
under the curve (AUC) is chosen as the evaluation metric. The results are given in
Table 2. We observe that the AUC is high when the QF is high (compression is low)
and the AUC reduces as the QF decreases (compression increases). However, even
at a QF of 50, the AUC is still reasonably high.
Here, we evaluate our approach in a practical scenario where objects are removed
from images using seam carving. We choose an object or a region in an image that
is to be removed. The weights of this region are set to a low value so that the seam
carving algorithm is forced to pass through this region, thus removing the object
from the image. When our approach was evaluated on these images, we observed
that our model is able to localize the region that was removed as well as the paths
taken by the seam carving algorithm, as shown in Fig. 8.
The detection heatmaps also exhibit explainability as shown in Fig. 9 where an
object is marked for removal in red. While this object is removed successfully, a
person’s leg in the foreground also gets removed (top row). To prevent this, another
area is marked in green (bottom row) by giving high weights so that the person’s legs
are not removed. As we can see from the heatmaps computed on the seam carved
images (top and bottom row), the path showing the possible seams also changes near
the person’s legs, thus exhibiting explainability.
Finally, we also extend the seam carving detection approach to detecting seam insertion.
We first extract 64 × 64 patches from images belonging to the RAISE dataset [12].
From these patches, we form two classes of image patches: a first class where the
patches are further cropped to 64 × 64, and a second class where the patches are
seam inserted from 64 × 32 dimensions to 64 × 64 dimensions. In this way, we
Fig. 8 Detection heatmaps on images where objects have been removed using seam carving: a
original image, b heatmap computed on original image, c object marked for removal in red, d
image with object removed using seam carving, e heatmap computed on object removed image
showing the possible seam paths
obtained 16,000 patches from the RAISE dataset (8000 in each class). These were
further randomly divided into 80% training, 10% testing and 10% validation. The
patches are trained using a multilayer convolutional neural network as explained in
Sect. 5.2. We train this model till a high training accuracy and validation accuracy
are obtained. Using the trained model on the patches, the probability of a pixel being
seam inserted or not is computed on overlapping patches in an image to produce a
heatmap. Another model is trained on the heatmaps to determine if an image has
seam insertions or not. As we can see from Fig. 10, we obtained high accuracy when
trained on the heatmaps.
In this paper, we presented an approach to detect seam carved images. Using two
stages of CNNs, we detect and localize areas in an image that have been seam carved.
In the future, we will focus on making our detections more robust, combining seam
carving and insertions, and also extend to other object removal methods such as
inpainting.
Fig. 9 Explainability in the heatmaps: a original image, b heatmap computed on original image,
c object marked for removal in red and area preserved in green (bottom row), d image with object
removed using seam carving (in the top row—person’s leg is removed while preserved in the
bottom row), e heatmap computed on object removed image showing the possible seam paths with
explainability. The seam paths change on the top row and bottom row near the person’s leg
Acknowledgements This research was developed with funding from the Defense Advanced
Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of
the author and should not be interpreted as representing the official views or policies of the Depart-
ment of Defense or the US Government. The paper is approved for public release and distribution
unlimited.
References
1. Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graph
26(3):10
2. Bappy JH, Simons C, Nataraj L, Manjunath B, Roy-Chowdhury AK (2019) Hybrid lstm and
encoder-decoder architecture for detection of image forgeries. IEEE Trans Image Process
28(7):3286–3300
3. Barni M, Kallas K, Nowroozi E, Tondi B (2020) CNN detection of GAN-generated face images
based on cross-band co-occurrences analysis. arXiv:2007.12909
4. Bayar B, Stamm MC. Design principles of convolutional neural networks for multimedia foren-
sics. In: The 2017 IS&T international symposium on electronic imaging: media watermarking,
security, and forensics. IS&T Electronic Imaging
5. Bayar B, Stamm MC (2016) A deep learning approach to universal image manipulation detec-
tion using a new convolutional layer. In: Proceedings of the 4th ACM workshop on information
hiding and multimedia security, pp 5–10
6. Bayar B, Stamm MC (2017) On the robustness of constrained convolutional neural networks
to jpeg post-compression for image resampling detection. In: The 42nd IEEE international
conference on acoustics, speech and signal processing
7. Bianchi T, De Rosa A, Piva A (2011) Improved DCT coefficient analysis for forgery localization
in JPEG images. In: 2011 IEEE international conference on Acoustics, speech and signal
processing (ICASSP). IEEE, pp 2444–2447
8. Bunk J, Bappy JH, Mohammed TM, Nataraj L, Flenner A, Manjunath B, Chandrasekaran S,
Roy-Chowdhury AK, Peterson L (2017) Detection and localization of image forgeries using
resampling features and deep learning. In: 2017 IEEE conference on computer vision and
pattern recognition workshops (CVPRW), pp 1881–1889
9. Chang WL, Shih TK, Hsu HH (2013) Detection of seam carving in JPEG images. In: 2013
International joint conference on awareness science and technology & Ubi-Media computing
(iCAST 2013 & UMEDIA 2013). IEEE, pp 632–638
10. Cieslak LFS, Da Costa KA, PauloPapa J (2018) Seam carving detection using convolutional
neural networks. In: 2018 IEEE 12th international symposium on applied computational intel-
ligence and informatics (SACI). IEEE, pp 000195–000200
11. Cozzolino D, Poggi G, Verdoliva L (2015) Efficient dense-field copy-move forgery detection.
IEEE Trans Inf Forens Secur 10(11):2284–2297
12. Dang-Nguyen DT, Pasquini C, Conotter V, Boato G (2015) Raise: a raw images dataset for
digital image forensics. In: Proceedings of the 6th ACM multimedia systems conference. ACM,
pp 219–224
13. Farid H (2009) Exposing digital forgeries from JPEG ghosts. IEEE Trans Inf Forens Secur
4(1):154–160
14. Fei W, Gaobo Y, Leida L, Ming X, Dengyong Z (2015) Detection of seam carving-based video
retargeting using forensics hash. Secur Commun Netw 8(12):2102–2113
15. Feng X, Cox IJ, Doerr G (2012) Normalized energy density-based forensic detection of resam-
pled images. IEEE Trans Multimedia 14(3):536–545
16. Goebel M, Nataraj L, Nanjundaswamy T, Mohammed TM, Chandrasekaran S, Manjunath B
(2020) Detection, attribution and localization of gan generated images. arXiv:2007.10466
17. Gong Q, Shan Q, Ke Y, Guo J (2018) Detecting the location of seam and recovering image for
seam inserted image. J Comput Methods Sci Eng 18(2):499–509
18. Guillemot C, Le Meur O (2014) Image inpainting: overview and recent advances. Signal Process
Mag 31(1):127–144
19. Han R, Ke Y, Du L, Qin F, Guo J (2018) Exploring the location of object deleted by seam-
carving. Expert Syst Appl 95:162–171
20. Kirchner M (2008) On the detectability of local resampling in digital images. In: Security,
forensics, steganography, and watermarking of multimedia contents X, vol 6819, p 68190F.
http://link.aip.org/link/?PSI/6819/68190F/1
21. Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection
scheme. IEEE Trans Inf Forens Secur 10(3):507–518
22. Li Y, Xia M, Liu X, Yang G (2020) Identification of various image retargeting techniques using
hybrid features. J Inf Secur Appl 51:102459
23. Liang Z, Yang G, Ding X, Li L (2015) An efficient forgery detection algorithm for object
removal by exemplar-based image inpainting. J Visual Commun Image Represent 30:75–85
24. Lin Z, He J, Tang X, Tang CK (2009) Fast, automatic and fine-grained tampered JPEG image
detection via dct coefficient analysis. Pattern Recogn 42(11):2492–2501
25. Liu Q, Chen Z (2014) Improved approaches with calibrated neighboring joint density to ste-
ganalysis and seam-carved forgery detection in JPEG images. ACM Trans Intell Syst Technol
(TIST) 5(4):1–30
26. Liu Q, Cooper PA, Zhou B (2013) An improved approach to detecting content-aware scaling-
based tampering in JPEG images. In: 2013 IEEE China summit and international conference
on signal and information processing. IEEE, pp 432–436
27. Lu W, Wu M (2011) Seam carving estimation using forensic hash. In: Proceedings of the
thirteenth ACM multimedia workshop on multimedia and security, pp. 9–14
28. Luo W, Huang J, Qiu G (2010) JPEG error analysis and its applications to digital image
forensics. IEEE Trans Inf Forens Security 5(3):480–491
29. Mahdian B, Saic S (2008) Blind authentication using periodic properties of interpolation. Inf
Forens IEEE Trans Secur 3(3):529–538
30. Marra F, Gragnaniello D, Cozzolino D, Verdoliva L (2018) Detection of GAN-generated fake
images over social networks. In: 2018 IEEE conference on multimedia information processing
and retrieval (MIPR). IEEE, pp. 384–389
31. Nam SH, Ahn W, Mun SM, Park J, Kim D, Yu IJ, Lee HK (2019) Content-aware image
resizing detection using deep neural network. In: 2019 IEEE international conference on image
processing (ICIP). IEEE, pp 106–110
32. Nam SH, Ahn W, Yu IJ, Kwon MJ, Son M, Lee HK (2020) Deep convolutional neural network
for identifying seam-carving forgery. arXiv:2007.02393
33. Nataraj L, Sarkar A, Manjunath BS (2010) Improving re-sampling detection by adding noise.
In: SPIE, media forensics and security, vol 7541. http://vision.ece.ucsb.edu/publications/
lakshman_spie_2010.pdf
34. Popescu AC, Farid H (2005) Exposing digital forgeries by detecting traces of resampling. IEEE
Trans Signal Process 53(2):758–767
35. Rao Y, Ni J (2016) A deep learning approach to detection of splicing and copy-move forgeries in
images. In: 2016 IEEE international workshop on information forensics and security (WIFS).
IEEE, pp 1–6
36. Ryu SJ, Lee HK (2014) Estimation of linear transformation by analyzing the periodicity of
interpolation. Pattern Recogn Lett 36:89–99
37. Salloum R, Ren Y, Kuo CCJ (2018) Image splicing localization using a multi-task fully con-
volutional network (MFCN). J Visual Communi Image Represent 51:201–209
38. Sarkar A, Nataraj L, Manjunath BS (2009) Detection of seam carving and localization of seam
insertions in digital images. In: Proceedings of the 11th ACM workshop on multimedia and
security. ACM, pp 107–116
39. Shamir A, Avidan S (2009) Seam carving for media retargeting. Commun ACM 52(1):77–85
40. Sheng G, Gao T (2016) Detection of seam-carving image based on benford’s law for forensic
applications. Int J Digital Crime Forens (IJDCF) 8(1):51–61
41. Sheng G, Li T, Su Q, Chen B, Tang Y (2017) Detection of content-aware image resizing based
on benford’s law. Soft Comput 21(19):5693–5701
42. Wattanachote K, Shih TK, Chang WL, Chang HH (2015) Tamper detection of JPEG image
due to seam modifications. IEEE Trans Inf Forens Secur 10(12):2477–2491
43. Wei JD, Lin YJ, Wu YJ, Kang L.W (2013) A patch analysis approach for seam-carved image
detection. In: ACM SIGGRAPH 2013 posters, pp. 1–1
44. Wu Q, Sun SJ, Zhu W, Li GH, Tu D (2008) Detection of digital doctoring in exemplar-based
inpainted images. In: 2008 International conference on machine learning and cybernetics, vol 3.
IEEE, pp 1222–1226
45. Ye J, Shi Y, Xu G, Shi YQ (2018) A convolutional neural network based seam carving detection
scheme for uncompressed digital images. In: International workshop on digital watermarking.
Springer, pp 3–13
46. Zhang D, Li Q, Yang G, Li L, Sun X (2017) Detection of image seam carving by using weber
local descriptor and local binary patterns. J inf Secur Appl 36:135–144
47. Zhang D, Yang G, Li F, Wang J, Sangaiah AK (2020) Detecting seam carved images using
uniform local binary patterns. Multimedia Tools Appl 79(13):8415–8430
48. Zhang X, Karaman S, Chang SF (2019) Detecting and simulating artifacts in GAN fake images.
In: 2019 IEEE international workshop on information forensics and security (WIFS). IEEE,
pp 1–6
A Machine Learning-Based Approach
to Password Authentication Using
Keystroke Biometrics
Abstract Keystroke authentication systems are becoming an increasingly popular
method of securing data/network access. Typing dynamics refers to the automatic
process of recognizing or verifying an individual's identity based on the manner and
rhythm of tapping on a keyboard. It allows individuals to be authenticated through
the way they type their password or free text on a keyboard. In this paper, we
analyze machine learning algorithms on a keystroke dynamics-based data set with
features such as hold time, keyup-keydown time, and keydown-keydown time. This
research is based on our methodology of using support vector machines (polynomial
and radial basis kernels), the random forest algorithm, and artificial neural networks
to recognize users based on their keystroke patterns. The analysis shows strong
results in user identification based on keystroke patterns, with artificial neural
networks performing best among the three algorithms implemented, at 91.8%
accuracy.
1 Introduction
2 Keystroke Dynamics
one defined phrase, such as a username or password. The degree of similarity
between the stored typing pattern of the registered static password and the actual
typing rhythm of the same password is measured, and the acceptance or rejection
of the user is determined.

The model can be rigidly built around the timing of training attempts on the same
phrase. Combining username and password can help to improve the static
authentication process. M. S. Obaidat et al. provide a detailed evaluation of static
keystroke identification on personal data [5]. This method can reject a legitimate
user because of irregular typing speed, requiring several login attempts to get
authenticated. A periodic dynamic authentication system is expected to solve the
problem of multiple login attempts. The methodology also tends to resolve the
constraint of authentication being limited to unique or predefined text. Varied inputs
can be subject to periodic authentication. This system does not rely on fixed text
entry and is capable of authenticating any input (Fig. 1).
Continuous keystroke analysis is an improvement over standard keystroke analysis.
It records keystroke characteristics over the entire authentication session. An
imposter may be detected earlier with this approach than with a periodically tracked
implementation. The additional analysis is the main drawback of this strategy: it
makes the solution more complex and affects device efficiency. Continuous keystroke
analysis is helpful when the user works at the keyboard after signing in for multiple
tasks such as accessing Web sites, typing text, chatting, etc., while intermittent
analysis is preferred when the user's tracking time is quite short, such as the time
span in which the user enters a username and password.

User keystroke dynamics are initially gathered to create a security profile for each
individual. Keystroke dynamics then assess the user's typing habits and match the
pattern against the record of the associated profile.
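The timing features named in the abstract (hold time, keydown-keydown time, keyup-keydown time) can be extracted from raw key events as follows. This is a hypothetical sketch: the event format and feature names are assumptions, not the paper's implementation (times here are in milliseconds):

```python
def keystroke_features(events):
    """Extract hold (H), keydown-keydown (DD) and keyup-keydown (UD) times
    from a chronological list of (key, keydown_t, keyup_t) tuples."""
    feats = {}
    for i, (key, down, up) in enumerate(events):
        feats[f"H.{key}"] = up - down            # how long the key was held
        if i + 1 < len(events):
            nxt_key, nxt_down, _ = events[i + 1]
            feats[f"DD.{key}.{nxt_key}"] = nxt_down - down  # press-to-press latency
            feats[f"UD.{key}.{nxt_key}"] = nxt_down - up    # release-to-press latency
    return feats

f = keystroke_features([("p", 0, 100), ("a", 250, 330)])
print(f["H.p"], f["DD.p.a"], f["UD.p.a"])  # 100 250 150
```

Vectors of such features per typed phrase are what the classifiers below are trained on.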
398 A. Thakare et al.
One of their main assumptions is that each key, when pressed, gives a distinct,
user-dependent acoustic signal. This allowed them to have the machine learn an
alphabet by clustering test keystroke sounds. The digraph latencies were then used
to generate the ranking within pairs of the virtual letters. Studies have shown that
little research has been performed using unusual features such as the Shift key,
the Caps Lock key, number keys, and the left or right Shift key for user
authentication.
Keystroke dynamics methodologies are evaluated using the false acceptance rate
(FAR) and the false rejection rate (FRR). FAR is defined as the percentage of
instances in which the biometric security system falsely accepts an access attempt
by an unauthorized user. The false rejection rate (FRR) is characterized as the
percentage of instances in which the biometric protection system erroneously
denies an authorized user's access attempt.
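The two error rates just defined can be computed directly from scored authentication attempts; a minimal sketch with made-up similarity scores (the threshold and scores are illustrative assumptions):

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor attempts wrongly accepted (score >= threshold).
       FRR: fraction of genuine attempts wrongly rejected (score < threshold)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

far, frr = far_frr([0.9, 0.8, 0.6], [0.3, 0.7, 0.2, 0.1], threshold=0.5)
print(far, frr)  # 0.25 0.0
```

Raising the threshold trades FAR for FRR; the crossing point is the equal error rate often reported for such systems.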
3 Models Used
The comparative analysis of three algorithms has been performed here to evaluate
the authentication system based on the timing features and evaluation parameters
discussed above. We take into consideration support vector machines (SVM), ran-
dom forest algorithm (RF) and artificial neural networks (ANN) for determining the
computation efficiency and accuracy based on the data set provided.
D(x) = sign(w · x + b) = sign( Σ_{i}^{n} a_i y_i (x_i · x) + b )    (1)
where w represents the hyperplane's weight vector, whose direction gives the
predicted class. The data points closest to the hyperplane, called the support
vectors, have the minimum distance to the decision boundary, as shown in Fig. 3.

SVM has the limitation of high processing cost, and it can give unreliable results
when the data set is characterized by a wide variety of features and the training
data set is limited. This can be addressed by substituting a kernel function for the
inner product of two transformed data vectors in the feature space. A kernel function
corresponds, in some extended space, to a dot product of two feature vectors.
There are two extensively used kernel functions in such processes:
• Polynomial Kernel function:
K(x_i, x_j) = (x_i \cdot x_j + 1)^p \quad (2)
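Equation (2) can be evaluated directly; the vectors and the degree p below are arbitrary illustrative values:

```python
import numpy as np

def polynomial_kernel(xi, xj, p=2):
    # K(xi, xj) = (xi . xj + 1)^p, as in Eq. (2)
    return (np.dot(xi, xj) + 1.0) ** p

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 0.5])
print(polynomial_kernel(xi, xj, p=2))  # (1*3 + 2*0.5 + 1)^2 = 25.0
```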
Training: There are many methods for training a neural network by backpropagation.
The most common approaches are the gradient descent technique and the conjugate
gradient method. As a learning algorithm, backpropagation uses first-order gradient
descent: parameters are updated in the direction of the negative gradient of the
error surface. For training a neural network with a gradient-based methodology, the
choice of parameters such as the learning rate and momentum rate is important.
Classic backpropagation is very sensitive to these parameters: if the learning rate
is too low, learning is slow; if it is too high, the algorithm does not stabilize.
Selecting the learning rate is therefore critical. In addition, the initial choice
of neural weights affects convergence.
The conjugate gradient is a second-order minimization method. Other second-order
minimization techniques, such as Newton and quasi-Newton methods, can also be used
to train neural networks; among them, the conjugate gradient is the simplest, and
for this reason it is the most widely used second-order approach to neural network
training. Conjugate gradient descent does not proceed straight down the gradient;
instead, it proceeds in a direction conjugate to the direction of the previous
step. The gradient of the current step, in other words, remains perpendicular to
the directions of all previous steps. Each step from the same point is at least as
good as steepest descent. Successive steps are non-interfering, so the minimization
carried out in one step is not partly undone by the next. The synaptic weight Wk is
updated as follows:
W_{k+1} = W_k + a_k d_k \quad (4)
A Machine Learning-Based Approach to Password Authentication . . . 403
d_k = \begin{cases} -g_k, & k = 0 \\ -g_k + \beta_k d_{k-1}, & \text{otherwise} \end{cases} \quad (6)
where dk is the search direction, gk is the gradient, and βk is the gradient scaling
factor.
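As an illustrative sketch (not the paper's code), the update rules of Eqs. (4) and (6) can be run on a toy quadratic error surface; the matrix, vectors, and the Fletcher-Reeves choice of the scaling factor beta_k are assumptions, since the text does not fix a particular beta:

```python
import numpy as np

# Toy quadratic error surface E(w) = 0.5 w^T A w - b^T w, gradient g = A w - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

w = np.zeros(2)
g_prev = d = None
for k in range(10):
    g = A @ w - b                      # gradient at the current weights
    if np.linalg.norm(g) < 1e-10:      # converged
        break
    if k == 0:
        d = -g                         # d_0 = -g_0
    else:
        beta = (g @ g) / (g_prev @ g_prev)   # Fletcher-Reeves scaling (assumed)
        d = -g + beta * d              # d_k = -g_k + beta_k d_{k-1}, Eq. (6)
    a = -(g @ d) / (d @ A @ d)         # exact line search step for a quadratic
    w = w + a * d                      # W_{k+1} = W_k + a_k d_k, Eq. (4)
    g_prev = g

print(w)  # approaches the minimizer A^{-1} b = [0.2, 0.4]
```

On a 2-dimensional quadratic the method converges in two conjugate steps, illustrating the non-interference property described above.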
Activation functions: Different activation functions used in ANN were Softmax,
ReLU and Sigmoid.
Softmax: The sigmoid function is easy to apply, and ReLUs do not lose their effect
during training; however, neither helps much with classification problems. Like the
sigmoid function, the softmax function squashes the output of each unit to between
0 and 1, but it also divides each output so that the outputs sum to 1. The output
of the softmax function is therefore analogous to a categorical probability
distribution, telling you how likely each of the classes is (Fig. 6).
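A minimal sketch of the softmax computation (the input scores are arbitrary illustrative values):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; outputs lie in (0, 1) and sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.sum())  # 1.0
```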
ReLU: Instead of sigmoid, most recent deep learning networks use rectified linear
units (ReLUs) for the hidden layers. A rectified linear unit outputs 0 if the input
is less than 0 and the raw input otherwise; that is, if the input is greater than 0,
the output equals the input. ReLU's mechanism is closer to that of a real biological
neuron, and it is the simplest non-linear activation function available. When the
input is positive, the derivative is just 1, so there is none of the squashing
effect on back-propagated errors seen with the sigmoid function. Research has shown
that ReLUs yield much faster training for large networks. Most frameworks like
TensorFlow and TFLearn make it simple to use ReLUs on the hidden layers, so you
will not need to implement them yourself.
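The ReLU and its derivative described above can be sketched as (input values are arbitrary):

```python
import numpy as np

def relu(x):
    # output 0 for negative inputs, the raw input otherwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # derivative is 1 for positive inputs, 0 otherwise (no squashing of errors)
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(relu_grad(x))  # [0. 0. 1.]
```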
Regularization: Dropout regularization is one of the methods used. During training,
the crucial idea is to randomly drop units (along with their connections) from the
neural network, which prevents units from co-adapting too much. Dropout samples
from an exponential number of different "thinned" networks during training. By
using a single unthinned network with scaled-down weights, it is possible to
approximate the effect of averaging the predictions of all these thinned networks
at test time. This greatly reduces overfitting and provides substantial advantages
over other forms of regularization.
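As an illustration (not the paper's code), "inverted" dropout can be sketched in a few lines; the drop probability and layer size are arbitrary assumptions. Inverted dropout rescales the kept activations at training time, so the single unthinned network needs no change at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations          # test time: use the full network as-is
    mask = rng.random(activations.shape) >= p   # keep each unit with prob 1-p
    return activations * mask / (1.0 - p)       # rescale the kept activations

h = np.ones(10000)
out = dropout(h, p=0.5)
print(out.mean())  # close to 1.0: the expected activation is preserved
```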
4 Dataset
For this work, the training and test data are the CMU Keystroke Dynamics Benchmark
Data set provided by Killourhy et al. [8]. It contains keystroke information for
51 users, with each user typing the password ".tie5Roanl" 400 times. The data were
gathered over multiple sessions separated by at least one day, so that any regular
variations in a user's typing can be captured. Besides the 51 users in the CMU data
set, we have appended the project members' keystroke records. The three most widely
used features for keystroke dynamics are as follows:
• Hold time: period from the press to the release of a key.
• Keydown-Keydown time: period between successive key presses.
• Keyup-Keydown time: period from the release of one key to the press of the next.
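The three features above can be computed directly from press/release timestamps of two successive keys; the function name and timestamps below are hypothetical:

```python
# Illustrative sketch: the three keystroke-dynamics timing features for
# a pair of successive keys, from press/release timestamps in seconds.

def keystroke_features(press1, release1, press2):
    hold = release1 - press1        # Hold time: press to release of key 1
    dd = press2 - press1            # Keydown-Keydown time
    ud = press2 - release1          # Keyup-Keydown time
    return hold, dd, ud

# hypothetical timings: key 1 pressed at 0.000 s, released at 0.095 s;
# key 2 pressed at 0.210 s
hold, dd, ud = keystroke_features(0.000, 0.095, 0.210)
print(hold, dd, ud)  # approximately 0.095, 0.21, 0.115
```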
5 Results
A Linux Computer was used with Python and an open-source library named PyX-
Hook built on the same to monitor the keystrokes of project group members. Template
was written in Python to log keystrokes. The recordings are first stored in JSON
format after positive processing, and then another script is created for adding the
recordings in the initial data set. After preparing the data set, each user’s 300 records
are used for training and the remaining 100 for testing. The keystroke dynamics
efficiency was evaluated with respect to accuracy of models implemented.
For comparative analysis, we considered SVM, RF and ANN in terms of model
accuracy. The SVM was evaluated with two kernel functions, namely RBF and
polynomial, where the polynomial kernel showed better performance, mainly due to
the data set size. Random forest classification comes close in terms
of accuracy to artificial neural networks, with the latter outperforming all the
algorithms considered, as shown in Table 1.
6 Conclusion
In this work, we have shown that keystroke dynamics can support user authentication
within an application; moreover, using only three features already yields promising
results. In the future, to improve performance, we would like to add more features
for user authentication. One possible extension is not to confine the system to a
single login password; users would then be able to use different passwords.
Keystroke dynamics has a range of benefits, one of which is low cost: the approach
is non-invasive and requires no extra tools, because users do not need to acquire
or operate any new hardware. This work achieves a high accuracy of about 92% using
ANN. Incorporating real-time password protection technology is now future work, and
this research can be extended to compare how keystroke patterns vary across
devices.
References
1. Uludag U et al (2004) Biometric cryptosystems: issues and challenges. Proc IEEE 92:948–960
2. Jain AK et al (2005) Biometric template security: challenges and solutions. In: Signal processing
conference, pp 1–4
3. Moody J (2004) Public perceptions of biometric devices: the effect of misinformation on accep-
tance and use. In: Issues Inform Sci Inf Technol
4. Umphress D, Williams G (1985) Identity verification through keyboard characteristics. Int J
Man-Mach Stud 23:263–273
5. Obaidat MS et al (1999) Estimation of pitch period of speech signal using a new dyadic wavelet
algorithm. Inf Sci 119:21–39
6. Gunetti D et al (2005) Keystroke analysis of free text. ACM Trans Inf Syst Secur (TISSEC)
8:312–347
7. Mondal S, Bours P (2015) A computational approach to the continuous authentication biometric
system. Inf Sci 304:28–53
Abstract Single image super resolution plays a vital role in satellite image
processing as the observed satellite image generally has low resolution due to the
bottleneck in imaging sensor equipment and the communication bandwidth. Deep
learning provides a better solution to improve its resolution compared to many sophis-
ticated algorithms; hence, a deep attention-based SRGAN network is proposed. The
GAN network consists of an attention-based SR generator to hallucinate the missing
fine texture detail and a discriminator to judge how realistic the generated image is.
The SR generator consists of a feature reconstruction network and attention mecha-
nism. Feature reconstruction network consists of residually connected RDB blocks
to reconstruct HR feature. The attention mechanism acts as a feature selector to
enhance high-frequency details and suppress undesirable components in uniform
region. The reconstructed HR feature and enhanced high-frequency information
are fused together for better visual perception. The experiment is conducted on
WorldView-2 satellite data using Google's free cloud GPU service, Google Colab.
The proposed deep network performs better than the other conventional methods.
1 Introduction
Single image super resolution (SISR) has enticed the attention of many researchers
and AI companies. The fundamental principle of SISR is to reconstruct a high-
resolution image from a low-resolution image. This is obviously an ill-posed problem
since a number of HR solutions can be derived for a given set of LR data.
Equivalently, it can be viewed as an underdetermined inverse problem whose solution
is not unique. Over the past decades, enormous effort has been devoted to this
issue. Interpolation-based methods are simple and fast, yet they smooth the data,
producing jaggy and ringing artifacts, while reconstruction-based methods rely on
various smoothing priors and constraints and still remain inept in regions such as
textures and edges. To improve the efficiency of the super resolution (SR) algorithm,
prior information like sparsity, self-similarity, and exemplar priors were learned
from the images. These learning-based methods formulate the coefficients between
the LR and HR image training pair either by learning sparse coefficients, self-similar
structures, or exemplar images.
Deep learning is currently progressing in many computer vision fields. With available
large datasets and computation power, deep learning achieves good accuracy by an
end-to-end learning. With the advent of SR based on the convolutional neural network
(SRCNN), deep learning is dynamically increasing the SR performance. SRCNN is
a three-layer shallow network that directly learns an end-to-end nonlinear mapping
function. The network learns the upscaling filter parameters directly. Subsequently,
the deeply recursive convolutional network architecture has few model parameters yet
permits long-range pixel dependencies. The dilated convolutional neural network
uses dilated convolutions, also known as atrous convolution, an effective method
to increase the receptive field of the network exponentially with only linear
parameter growth. Several studies [1, 2] show that increasing the depth of the
network can efficiently increase the model's accuracy, as deeper networks have the
potential to model highly complex mappings. Such deep networks can be trained
efficiently using batch normalization [3]. The learning ability of CNNs is made
more powerful with skip connections [4–6] and residual blocks [5], where instead of
an identity mapping the network learns the residue. This design choice relieves the
network of the vanishing gradient problem, which had remained a bottleneck in
training deep networks. The performance is fueled by the
right choice of architectural modules that increase the depth, width, and growth rate
of the network. VDSR [4] has increased network depth by piling more convolutional
layers with residual learning. Enhanced deep residual network SR (EDSR) [7] and
multiscale deep SR (MDSR) system [7] use the residual block to build a wide and
deep network with residual scaling [8], respectively. SRResNet [9] also takes the
benefit of residual learning and adopts the efficient sub-pixel convolution layer, while
the advantage of dense connection that is a direct connection from the previous
layers is adopted in SRDenseNet [10]. Residual dense network [11] uses hierarchical
features from the LR image using residual dense blocks. The residual in residual dense
block [11] strategy improves the perceptual quality of the reconstructed image. This
network also shows that a higher growth rate can improve the model’s performance.
Attention-Based SRGAN for Super Resolution of Satellite Images 409
• This work also proposes a generator, i.e., an SR network that hallucinates the
missing fine texture detail. The SR network consists of a feature reconstruction
network and an attention generating network. The feature reconstruction network
consists of residual connections of dense modules, while the attention generating
network comprises dense connections of residual blocks.
• A discriminator is proposed to judge how realistic the generated image is
relative to real images. This network has a residual-in-residual connection of
dense modules.
The single image super resolution is an ill-posed problem that generates a high-
resolution super resolved image I SR from a low-resolution image I LR , where I LR
is a downsampled, blurred, and noisy version of the original high-resolution image
I HR . It is represented as
I^{LR} = D B I^{HR} + \varepsilon \quad (1)
where D is the downsampling factor, B is the blurring operator, and ε is the additive
noise. If the tensor size of I LR is H × W × M, where H, W, M denotes the height,
width, and number of color channels in the I LR image, then the tensor size of I SR
will be of dimension DH × DW × M. Reconstruction does not always recover the
original I HR but rather an image I SR that is close to it. To obtain such an
image, it is not enough to match the content, i.e., the pixel values; the
perceptual quality of the reconstructed image matters. The GAN framework is an
excellent choice for improving the perceptual quality of the image. The success of
a GAN network depends on the architecture of the generator, the discriminator, and
the choice of loss functions.
Its general architecture is given in Fig. 1. While training, the HR data is downsampled
to a LR data. The generator of a GAN network upsamples the LR data into SR data,
which is compared with the available HR database by the discriminator to identify
the truthfulness of the generated SR data's natural texture. The loss is then
calculated and back-propagated to train the generator and the discriminator. The
generator is trained to produce more realistic images that fool the discriminator,
while the discriminator is trained to reliably identify the generated images. Aiming
to enhance the overall visual quality of the reconstructed SR image, this section first
proposes a novel network design for generator and discriminator and then the loss
functions.
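Under the degradation model of Eq. (1), an LR training image can be simulated from an HR image; the 3 × 3 mean blur as B, factor D = 2, and noise level below are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(hr, factor=2, noise_std=0.01):
    # B: 3x3 mean blur via edge padding and neighbourhood averaging
    padded = np.pad(hr, 1, mode="edge")
    blurred = sum(padded[i:i + hr.shape[0], j:j + hr.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    # D: downsampling, keep every `factor`-th pixel
    lr = blurred[::factor, ::factor]
    # eps: additive Gaussian noise
    return lr + rng.normal(0.0, noise_std, lr.shape)

hr = rng.random((256, 256))          # stand-in for a 256 x 256 HR patch
lr = degrade(hr, factor=2)
print(hr.shape, lr.shape)  # (256, 256) (128, 128)
```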
where F_{n-1} and F_n are the input and output features of the nth RDB block.
H_{RDF} represents the fusion of dense features within the RDB block and is given
by the following equation.
F_{RDF}^{n} = H_{RDF}\big(\big[F_{n-1}, F_{n,c}^{1}, F_{n,c}^{2}, \ldots, F_{n,c}^{d}, \ldots, F_{n,c}^{D}\big]\big); \quad (1 \le d \le D) \quad (3)
The features from the D convolution blocks are concatenated and passed through a
1 × 1 convolution layer. F_{n,c}^{d} is the output of the dth convolution block of
the nth RDB block. Each convolution block consists of a convolution layer followed
by batch normalization and ReLU; its functionality is denoted by H_{CBR}.
Here [·, ·] denotes the concatenation of features. The residual learning feature of
the nth RDB block is given by
F_{RL}^{n} = F_{RDF}^{n} + F_{n-1} \quad (5)
Finally, this residual feature is given to a convolution block to yield the RDB
block output.
F_{n} = H_{CBR}\big(F_{RL}^{n}\big) \quad (6)
Dense Residual Block. A DRB block consists of residual blocks (RB), i.e., residual
connections of D convolution layers (F_{m,r,c}^{d}), and dense connections of RBs
(F_{DRB}^{m}). Its structure is shown in Fig. 3, and its functionality is
represented as H_{DRB}.
F_{DRB,out}^{m} = H_{DRB}\big(F_{DRB,in}^{m}\big); \quad (1 \le m \le M) \quad (7)
where F_{DRB,in}^{m} and F_{DRB,out}^{m} represent the input and output features of
the mth DRB block. H_{DF} represents the fusion of dense features within the DRB
block, while H_{RB} and H_{C} denote the functionality of the residual block and
convolution layer, respectively.
F_{DRB,out}^{m} = H_{DF}\big(\big[F_{DRB,in}^{m}, F_{m,RB}^{1}, F_{m,RB}^{2}, \ldots, F_{m,RB}^{r}, \ldots, F_{m,RB}^{R}\big]\big); \quad (1 \le r \le R) \quad (8)
where F_{m,RB,in}^{r} represents the input to the rth residual block in the mth DRB
block, and F_{m,r,c}^{D} denotes the output of the Dth convolution layer in the rth
residual block of the mth DRB block.
F_{m,RB}^{r} = H_{RB}\big(F_{m,RB,in}^{r}, F_{m,r,c}^{D}\big); \quad (1 \le r \le R) \quad (9)

= F_{m,RB,in}^{r} + F_{m,r,c}^{D} \quad (10)
2.2 Generator
The basic idea behind the SRGAN architecture is that the generative model G is
trained to fool the discriminator D, which is trained to distinguish real from
super-resolved images. Thus, the generator learns to produce high-resolution images
that are close to real images and indistinguishable by the discriminator, i.e., the
SR network generates perceptually plausible images.
The generator architecture used to generate I SR is illustrated in Fig. 4. Its aim
is to learn, end-to-end, a generating or mapping function G that hallucinates an HR
image for a given LR image. The generator network comprises two stages: a feature
reconstruction network and an attention generation network. The feature
reconstruction structure reconstructs the HR information, while the attention
structure generates the weightage for the high-frequency information to be restored.
The feature reconstruction network is a fully convolutional structure that
reconstructs high-frequency details to be injected into the interpolated LR image.
Predicting plausible pixels in HR space requires a large receptive field, which
motivates the use of deep cascaded blocks for extracting hierarchical features. It
comprises three modules: an initial section for shallow feature extraction, a
hierarchical feature extraction module (HFM) using the residual RDB module, and
finally the upscaling module (UM).
The shallow features (F SF ) are given by
F_{SF} = H_{SFE}(I^{LR}) \quad (11)
= H_{C}(I^{LR}) \quad (12)
414 D. Synthiya Vinothini and B. Sathya Bama
where H_{SFE}(.) denotes the initial shallow feature extraction process. These
low-level shallow features contain significant information for restoring the HR
image. They are fed to the residual RDB module to extract the hierarchical features
(F_{HF}), given by

F_{HF} = H_{HFE}(F_{SF}) = F_{SF} + F_{DF}
where H HFE (.) denotes the hierarchical feature extraction process, which is the sum of
shallow and dense features (F_{DF}) extracted by the residual RDB, whose
functionality is given by H_{RRDB}(.). The residual RDB module arranges N RDB
blocks sequentially, so that the input to each RDB block is the output of its
preceding block. Thus, the super resolution network benefits from the collective
information at various levels. This structure is then wrapped with a skip
connection to form a residual network. This connection improves the flow of
gradients and information through the network and thus reduces the vanishing
gradient problem in training deeper networks. The functionality of the sequential
RDBs is represented by H_{SRDB}(.) and is given by the following equation
F_{DF} = H_{SRDB}(F_{SF}) = H_{RDB,N}\big(\ldots H_{RDB,n}\big(\ldots H_{RDB,2}\big(H_{RDB,1}(F_{SF})\big)\big)\big) \quad (17)
where H_{MP}, H_{AP}, and H_{UP} denote the functionality of max pooling, average
pooling, and upsampling, respectively.
The structure consists of an encoding and a decoding path, where feature sizes
shrink and grow, respectively. In the encoding path, pooling is applied to reduce
the data dimension and increase the receptive field for predicting the
high-frequency regions. In the decoding path, the encoded features are upsampled
using a deconvolution layer. This path also takes advantage of integrating the
low-level features from the encoding path.
The significant information from the low-level features is reused; thus, the
combined feature can specifically identify the textural regions that need more
weightage, or attention, from the feature reconstruction network. The final
attention feature has the size of the HR image to be generated, with a
single-channel output. It uses a sigmoid activation to limit its values between 0
and 1: texture regions receive more weightage, or attention, and their feature
values are closer to 1.
The functionality of the attention mechanism is represented as H_{AF}, and its
output is the attention feature (F_{AF}).
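As a hedged illustration of how such a sigmoid attention map could act as a feature selector, weighting high-frequency detail before fusion; the array names, sizes, and the additive fusion rule below are assumptions, not the paper's exact design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
reconstructed = rng.random((64, 64))        # stand-in reconstructed HR feature
high_freq = rng.standard_normal((64, 64))   # stand-in high-frequency detail
attention_logits = rng.standard_normal((64, 64))

f_af = sigmoid(attention_logits)            # attention map, values in (0, 1)
fused = reconstructed + f_af * high_freq    # texture regions (f_af near 1)
                                            # keep more high-frequency detail
print(fused.shape)  # (64, 64)
```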
2.3 Discriminator
The discriminator network is trained to distinguish the real HR images from the
generated SR images. The network structure is given in Fig. 5. The network consists
of four stages: a shallow feature extraction, hierarchical feature extraction, flattening,
and discrimination. The feature extraction structure of the discriminator follows the
same architecture as that of the generator. Once the hierarchical feature is extracted,
it is flattened by a dense network. The dense network is then reduced to one
neuron to give a single value, which is finally fed to a sigmoid activation (σ) to
limit the value
between 0 and 1. The discriminator performance is enhanced based on the relativistic
average discriminator (RaD) [17]. A standard discriminator estimates the
probability that an input image is realistic and natural, whereas the RaD estimates
the probability that a real image x_n is relatively more realistic than a generated
image x_g, and vice versa; it is thus expressed as
D_{Ra}(x_n) = \sigma\big(D(x_n) - \mathbb{E}_{x_g}[D(x_g)]\big) \quad (23)

D_{Ra}(x_g) = \sigma\big(D(x_g) - \mathbb{E}_{x_n}[D(x_n)]\big) \quad (24)
The loss function for the discriminator of the relativistic average standard GAN
(RaSGAN) is defined as

L_{D}^{RaSGAN} = -\mathbb{E}_{x_n}\big[\log\big(D_{Ra}(x_n)\big)\big] - \mathbb{E}_{x_g}\big[\log\big(1 - D_{Ra}(x_g)\big)\big] \quad (25)
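The relativistic average discriminator of Eqs. (23) and (24) can be sketched numerically; the raw (pre-sigmoid) discriminator scores below are hypothetical values, and the loss follows the standard RaSGAN discriminator form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rasgan_d_loss(d_real, d_fake):
    # D_Ra(x_n) = sigma(D(x_n) - E[D(x_g)]); D_Ra(x_g) = sigma(D(x_g) - E[D(x_n)])
    d_ra_real = sigmoid(d_real - d_fake.mean())
    d_ra_fake = sigmoid(d_fake - d_real.mean())
    # standard RaSGAN discriminator loss
    return -(np.log(d_ra_real).mean() + np.log(1.0 - d_ra_fake).mean())

d_real = np.array([2.1, 1.8, 2.5])    # hypothetical scores for natural images
d_fake = np.array([-1.0, -0.5, 0.2])  # hypothetical scores for generated images
print(rasgan_d_loss(d_real, d_fake))  # positive; shrinks as the gap widens
```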
To improve the training of the GAN network, we propose to use the following loss
functions in addition to the adversarial loss. The total generator loss (L_G) is
given as

L_{G} = \alpha L_{G}^{RaSGAN} + L_{perceptual} + \beta L_{1} \quad (27)
where α and β are constants to regulate the loss functions. L 1 is the content loss that
computes the 1-norm distance between the generated SR image and its reference HR
image.
where G(x_m), Ref(x_m), and E_{x_m} denote the generated SR image for the input LR
image x_m in the mini-batch, its HR reference image, and the expectation over all
input LR images in the mini-batch, respectively. Content loss may preserve the
information but often fails to retain high-frequency detail, which leads to overly
smooth textures and unpleasant results. To enhance the perceptual quality of the
generated image, a perceptual loss (L_perceptual) is used. Instead of a pixel-wise
loss, this provides a feature-wise measure. It is the VGG loss obtained from the
activation layers of a pre-trained 19-layer VGG network, measured as the distance
between the VGG perceptual features (ϕ) of the generated SR image and its HR
counterpart.
This section analyzes the performance of the proposed method for super resolution
of satellite imagery as well as other recently developed methods in the field. The
experimental simulation is conducted on WorldView-2 images. Super resolution is
a problem of recovering an image from its decimated, blurred, warped, and noisy
version. This work considers reconstruction from a decimated data. The size of all
images considered for experimental simulation is 256 × 256. The original image
was spatially downsampled by a downsampling factor f ds to obtain a low-resolution
image.
The experiment is conducted for WorldView-2 satellite images, and the visual
comparison of its result is exhibited in Fig. 6. WorldView-2 provides a high-resolution
(0.46 m) panchromatic and eight multispectral bands with a spatial resolution of
1.84 m. Of the eight bands, four represent the standard color channels, viz. red,
green, blue, and near-infrared 1, and four are extra bands, viz. coastal, yellow, red edge, and
Fig. 6 Visual comparison of different super resolution methods on WorldView-2 data. a Original HR
satellite data. b Downsampled version of a. c CC. d ICBI. e IWF. f DCC. g Adaptive polynomial
regression (proposed). h Quad gradient method (proposed). i SCN. j SRGAN. k Attention-based
SR and l Attention-based SRGAN (proposed)
near-infrared 2, for improved spectral analysis, mapping, exploration, and
monitoring. Its high altitude gives it the advantage of revisiting any place on
Earth in 1.1 days. For our experiment, we use a WorldView-2 dataset acquired over
the Madurai region, South India, on 04 June 2010.
Training a deep learning-based SR network requires a large dataset. This is
satisfied by splitting the large satellite image tile into non-overlapping patches.
For this experiment, 20,000 HR/LR patch pairs of size 256 × 256 are extracted. This
dataset is split 90/10% for training/validation. Instead of the classical
stochastic gradient descent procedure, this work uses the adaptive moment
estimation (Adam) optimization procedure, as it can efficiently solve practical
deep learning problems involving large models and datasets. The configuration
parameters alpha, beta1, beta2, and epsilon for the Adam optimizer are set as
10^-4, 0.9, 0.999, and 10^-8, respectively. Alpha is the learning rate or step
size, beta1 and beta2 are the exponential decay rates for the first and second
moment estimates, respectively, and epsilon is a very small constant to prevent
division by zero. The network is trained for 100 epochs using a mini-batch size
of 64.
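A minimal sketch of a single-parameter Adam update with the stated configuration (alpha = 1e-4, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the toy loss w^2 and the function name are illustrative assumptions:

```python
import numpy as np

def adam_step(w, g, m, v, t, alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    g = 2.0 * w                              # gradient of the toy loss w^2
    w, m, v = adam_step(w, g, m, v, t)
print(w < 1.0)  # True: w moves toward the minimum at 0
```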
For attention-based SRGAN, the network parameters are as follows: The number
of RDB blocks (N), the number of DRB blocks (M), the number of convolution layers
(D) within each RDB and Res block in DRB network, the number of Res Block
(R) within DRB block, and the growth rate (G), i.e., width of each convolutional
layer. These parameters are set as N = 5, M = 5, D = 8, R = 5, and G = 64.
The model is trained using Google Colab, which provides an Nvidia Tesla K80 GPU.
Figure 7 explores the effect of the network growth rate on PSNR. Based on the
observed results, a growth rate of 64 is optimal; networks with higher growth rates
yield only saturated performance.
For convenience, f_ds is set to 2^n, where n is the number of iterations the
classical algorithm is applied to obtain the super resolved image. Super resolution
is performed on the downsampled image. The methods other than deep learning are
iterated (n − 1) times; the final super resolved image, obtained after n
iterations, is of the same size as the original HR image.
The super resolved image is then compared with the original HR image for quality
assessment of the algorithm. The proposed super resolution method was compared with
several state-of-the-art methods, including cubic convolution (CC), interactive
curvature-based interpolation (ICBI), the inverse Wiener filter (IWF), directional
cubic convolution (DCC), the sparse coding-based network (SCN), and the super
resolution generative adversarial network (SRGAN). Further, the results of the
attention-based mechanism are compared for a general SR network and the SRGAN
network. The results are compared to determine their discrepancies and other
valuable measures.
The original image was downsampled with factors f_ds = 2 and 4 using a bicubic
kernel. The performance of the algorithms is discussed based on the quantitative
measures. To evaluate the fidelity of the proposed algorithm, quality metrics such
as PSNR, RMSE, degree of distortion, correlation coefficient, and structural
similarity are measured and given in Tables 1 and 2.
Comparing the metric values of adaptive polynomial regression with other
interpolation-based methods such as CC and ICBI, it gives better performance, but
it still has a higher distortion rate. The SSIM values also show that it requires
edge enhancement.
4 Conclusion
In this work, a deep learning-based method is proposed for resolution enhancement
of satellite images. Deep learning provides a better solution compared with many
sophisticated algorithms; hence, this work proposes a deep attention-based SRGAN.
The GAN network consists of an SR generator to hallucinate the missing fine texture
detail and a discriminator to judge how realistic the generated image is. An
attention-based SR network is proposed for the SR generator, consisting of a
feature reconstruction network and an attention mechanism. The feature
reconstruction network consists of residually connected RDB blocks to reconstruct
the HR feature.
References
1. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: International conference on learning representations (ICLR), pp 1–14. arXiv
preprint arXiv:1409.1556
2. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: IEEE conference on computer vision and pattern
recognition (CVPR), pp 1–9
3. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing
internal covariate shift. In: Proceedings of the 32nd international conference on machine
learning (ICML), pp 448–456
4. Kim J, Kwon Lee J, Mu Lee K (2015) Accurate image super-resolution using very deep
convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 1646–1654
5. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European
conference on computer vision (ECCV), Springer, pp 630–645
6. Kim J, Kwon Lee J, Mu Lee K (2016) Deeply-recursive convolutional network for image super-
resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition,
pp 1637–1645
7. Lim B, Son S, Kim H, Nah S, Mu Lee K (2017) Enhanced deep residual networks for single
image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern
recognition workshops, pp 136–144
8. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the
impact of residual connections on learning. In: Thirty-first AAAI conference on artificial
intelligence, pp 4278–4284
9. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Shi W (2017) Photo-
realistic single image super-resolution using a generative adversarial network. In: Proceedings
of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
10. Tong T, Li G, Liu X, Gao Q (2017) Image super-resolution using dense skip connections. In:
Proceedings of the IEEE international conference on computer vision, pp 4799–4807
11. Zhang Y, Tian Y, Kong Y, Zhong B, Fu Y (2018) Residual dense network for image super-
resolution. In: International IEEE conference on computer vision and pattern recognition
(CVPR), pp 2472–2481
12. Johnson J, Alahi A, Li F (2016) Perceptual losses for real-time style transfer and super-
resolution. In: European conference on computer vision (ECCV), Springer, pp 694–711
13. Zhang Y, Liu S, Dong C, Zhang X, Yuan Y (2019) Multiple cycle-in-cycle generative adversarial
networks for unsupervised image super-resolution. IEEE Trans Image Process 29:1101–1112
14. Zhang D, Shao J, Hu G, Gao L (2017) Sharp and real image super-resolution using generative
adversarial network. In: International conference on neural information processing, Springer,
Cham, pp 217–226
15. Zong L, Chen L (2019) Single image super-resolution based on self-attention. In: IEEE inter-
national conference on unmanned systems and artificial intelligence (ICUSAI), Xi’an, China,
pp 56–60. https://doi.org/10.1109/ICUSAI47366.2019.9124791
Attention-Based SRGAN for Super Resolution of Satellite Images 423
16. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical
image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A (eds) Medical Image
Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture
Notes in Computer Science, vol 9351. Springer, Cham. https://doi.org/10.1007/978-3-319-
24574-4_28
17. Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from
standard GAN. arXiv preprint arXiv:1807.00734
Detection of Acute Lymphoblastic
Leukemia Using Machine Learning
Techniques
1 Introduction
Leukemia (blood cancer) is caused by the defective functioning of white blood cells
(WBC) [1]. It affects the body's immune system. Acute lymphoblastic leukemia (ALL)
is a blood cancer that causes overproduction of lymphoblasts (immature lymphocytes)
[1–10]. Microscopic blood-cell analysis is an efficient and cost-effective approach
for the early diagnosis of haematological disorders [1–5, 7–11]. Microscopic analysis
is generally coupled with other tests, such as blood smear tests and bone marrow
aspiration, to make the classification more accurate. However, microscopic blood-cell
analysis plays a crucial role in the early screening and diagnosis of the disease
[1–3, 11]. Besides the lengthy process involved, the tests also require the expert
opinion of experienced doctors. Hence, apart from being time-consuming, these methods
are also expensive.
To work around the problems mentioned above, researchers generally apply computer
vision techniques to automate the manual classification task. Mohapatra et al. [9]
have suggested a fuzzy-based segmentation approach for efficient ALL detection. In
[4, 9, 12], texture, shape, and color features are extracted in the feature extraction
stage; then, an SVM [13] is employed to classify ALL efficiently.
In [1], AdaBoost with random forest is used to properly classify ALL. On the
other hand, Narjim et al. [10] have suggested an ensemble classifier-based ALL
classification approach. Rawat et al. have suggested a hybrid classifier-based ALL
detection approach [6].
Currently, transfer learning has gained an important role in medical image analysis
due to its strong performance on small datasets [2, 14]. Vogado et al. [2] have
proposed a transfer learning-based feature extraction and SVM [13]-based ALL
classification method. They have used AlexNet [15], CaffeNet [16], VGG-f [17], and an
ensemble of these three networks to extract efficient features and compare their
performances.
In this work, we present three models obtained by introducing fully connected layers
and/or dropout layers in the ResNet50 architecture [18, 19]. Using the best model of
the lot, we extract efficient features. Then, we employ machine learning techniques
for efficient ALL classification.
2 Datasets
In this work, we use the ALL-IDB2 dataset [20] to validate the proposed ALL
classification approach. It contains 260 images: 130 images of ALL cells and 130
images of healthy cells. Some of these images are displayed in Fig. 1.
3 Proposed Method
The proposed ALL classification method is shown in Fig. 2. The aim of this work is to
present an effective and computationally efficient ALL detection technique.
Here, we have suggested a transfer learning-based feature extraction technique, since
transfer learning is preferred over training a CNN from scratch when only small
datasets such as ALL-IDB2 are available. We have presented three transfer learning
models by introducing fully connected layers and/or dropout layers in the ResNet50
architecture [18, 19]. Finally, we have applied machine learning techniques to
classify ALL effectively. The details of the work are discussed below.
Fig. 1 Sample images: a–c represent healthy cells; d–f are unhealthy (ALL) cells
Transfer learning is an adaptive learning method in which weights and models trained
for one task are reutilized to perform another task without retraining the model from
scratch [2, 14, 21, 22]. The basic idea is that a model is first trained over a large
and diverse dataset; it can then be repurposed and fine-tuned to perform the required
task on a specific dataset without being retrained from scratch. This reduces the
training and execution time, relaxes the requirement for a large dataset, and
428 P. K. Das et al.
eases the hardware requirements without trading off the accuracy and efficiency of
the performance [2, 14, 21, 22].
The key differences between the various deep-net architectures, such as AlexNet [15],
VGG-16 [23], and ResNet50 [18], lie in the number of parameters involved, the depth of
the layers, the architectural design, etc. In this experiment, we have slightly
modified the ResNet50 architecture [18, 19] by introducing fully connected layers
and/or dropout layers and suggest three models as follows.
It is a general understanding that the feature-handling capacity of a deep neural
network increases with its depth, i.e., as the depth increases, accuracy also
increases. However, this is not always true. Accuracy initially improves with depth
because more complex features can be handled, but beyond a certain depth it starts
decreasing again because of the vanishing gradient problem. In the proposed
experiment, the challenge was to use a network deep enough to handle features of
varying complexity without trading off accuracy. The authors in [18] found a solution
to this: a deep-net architecture known as ResNet [18], which stands for residual
network. It introduces the concept of the identity shortcut (skip) connection. The
identity shortcut works on the principle that an identity feature vector from an
earlier layer can be added to a later layer's features. The skip connection provides
an additional path for the gradient flow and hence alleviates the vanishing gradient
problem. It thus allows the depth of the architecture to be increased without
worrying about vanishing gradients. Figure 3 displays the ResNet50 architecture
[18, 19]. The convolutional block and identity block used here are shown in Figs. 4
and 5, respectively.
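The identity shortcut described above can be sketched in a few lines of numpy (an illustrative toy, not the actual ResNet50 block; the weight matrices are chosen square so that the addition with the input is shape-compatible):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def identity_block(x, w1, w2):
    """Minimal residual (identity shortcut) block: the input x is added
    back onto the transformed features before the final activation."""
    f = relu(w1 @ x)      # first transformation
    f = w2 @ f            # second transformation (pre-activation)
    return relu(f + x)    # skip connection: an extra path for the gradient

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w1 = rng.standard_normal((4, 4))
w2 = np.zeros((4, 4))          # even if the learned branch outputs zero...
y = identity_block(x, w1, w2)  # ...the block reduces to relu(x):
                               # the identity path preserves the signal
```

Because the shortcut bypasses the learned branch, the gradient of the output with respect to `x` always contains an identity term, which is exactly why deep stacks of such blocks avoid the vanishing gradient problem.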
For this experiment, we have customized the ResNet50 architecture (restricting
ourselves to changes in the dense-layer section only) to suit the required purpose.
We have proposed three different customized models, as shown in Figs. 6, 7, and 8.
Based on the training results, a comparative analysis among them has been made to
select the best of the lot for further operations.
Fig. 6 Model 1
Fig. 7 Model 2
Fig. 8 Model 3
The algorithm for selecting the best model is shown in Fig. 9. In each epoch, the
model's validation accuracy is checked. If it is greater than the current best
model's validation accuracy, the best validation accuracy is updated and the
corresponding weights of the architecture are saved. The best model at the last
epoch is then used for feature extraction in the testing phase. In the feature
extraction process, features are extracted from the penultimate (fc2) layer, as
shown in Fig. 10.
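The epoch-wise best-model selection rule can be sketched in plain Python (a hypothetical stand-in for the actual training loop; the `weights` entries here are opaque placeholders, not real network weights):

```python
def select_best_model(epoch_results):
    """epoch_results: list of (validation_accuracy, weights) pairs, one per
    epoch. Keeps the weights whose validation accuracy is highest, exactly
    as in the checkpointing rule described above."""
    best_acc, best_weights = -1.0, None
    for val_acc, weights in epoch_results:
        if val_acc > best_acc:            # strictly better than current best
            best_acc, best_weights = val_acc, weights
    return best_acc, best_weights

# toy run: accuracies fluctuate across epochs; epoch 3's weights are kept
history = [(0.71, "w_epoch1"), (0.83, "w_epoch2"),
           (0.91, "w_epoch3"), (0.88, "w_epoch4")]
best_acc, best_weights = select_best_model(history)
print(best_acc, best_weights)   # 0.91 w_epoch3
```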
Finally, we have employed logistic regression [24], SVM [13], and random forest
[25] to classify ALL and compare their performances.
Fig. 9 Flowchart for updating the best weights and selecting the model with the best
training performance
Fig. 10 Features are extracted from the dense layer and fed to different classifiers
4 Results and Discussion
This section presents a comparative analysis of the training performance of the three
models. Then, the classification performances of logistic regression, SVM, and random
forest are highlighted. Figures 11 and 12 present the performance of Model 1 in the
training phase; the performance of Model 2 in the training phase is shown in Figs. 13
and 14; similarly, Figs. 15 and 16 represent the performance of Model 3 in the
training phase. From these figures, we notice that in all three models the training
and validation accuracies improve, and the training and validation losses decrease,
as the number of epochs increases.
Fig. 11 Variation of training and validation accuracies of Model 1 with respect to
the number of epochs
Fig. 12 Variation of training and validation losses of Model 1 with respect to the
number of epochs
Fig. 13 Variation of training and validation accuracies of Model 2 with respect to
the number of epochs
Fig. 14 Variation of training and validation losses of Model 2 with respect to the
number of epochs
Fig. 15 Variation of training and validation accuracies of Model 3 with respect to
the number of epochs
Fig. 16 Variation of training and validation losses of Model 3 with respect to the
number of epochs
The best validation accuracies of Model 2 and Model 3 are 76.3% and 86.5%,
respectively. Hence, Model 1 is selected for feature extraction in the testing phase.
Table 1 presents the classification performance. From the table, we see that logistic
regression and SVM achieve similar performance, with the best sensitivity. All three
methods attain 96.15% accuracy. On the other hand, random forest achieves the best
performance in terms of specificity, precision, and accuracy.
5 Conclusion
References
1. Mishra S, Majhi B, Sa PK (2019) Texture feature based classification on microscopic
blood smear for acute lymphoblastic leukemia detection. Biomed Signal Process Control
47:303–311. https://doi.org/10.1016/j.bspc.2018.08.012
2. Vogado LH, Veras RM, Araujo FH, Silva RR, Aires KR (2018) Leukemia diagnosis in blood
slides using transfer learning in CNNs and SVM for classification. Eng Appl Artif Intell 72:415–
422. https://doi.org/10.1016/j.engappai.2018.04.024
3. Al-Dulaimi K, Banks J, Nguyen K, Al-Sabaawi A, Tomeo-Reyes I, Chandran V (2020) Segmen-
tation of white blood cell, nucleus and cytoplasm in digital haematology microscope images: a
review challenges, current and future potential techniques. IEEE Rev Biomed Eng https://doi.
org/10.1109/RBME.2020.3004639
4. Putzu L, Caocci G, Di Ruberto C (2014) Leucocyte classification for leukaemia detection using
image processing techniques. Artif Intell Med 62:179–191 https://doi.org/10.1016/j.artmed.
2014.09.002
5. Rawat J, Singh A, Bhadauria HS, Virmani J (2015) Computer aided diagnostic system for
detection of leukemia using microscopic images. Procedia Comput Sci 70:748–756. https://
doi.org/10.1016/j.procs.2015.10.113
6. Rawat J, Singh A, Bhadauria H (2017) Classification of acute lymphoblastic leukaemia using
hybrid hierarchical classifiers. Multimed Tools Appl 76:19057–19085
7. El Houby EMF (2018) Framework of computer aided diagnosis systems for cancer
classification based on medical images. J Med Syst 42(8):1–11. https://doi.org/10.
1007/s10916-018-1010-x
8. Rahman A, Hasan MM (2018) Automatic detection of white blood cells from microscopic
images for malignancy classification of acute lymphoblastic leukemia. In: 2018 International
conference on innovation in engineering and technology (ICIET). IEEE, pp 1–6
9. Mohapatra S, Samanta SS, Patra D, Satpathi S (2011) Fuzzy based blood image segmentation
for automated leukemia detection. In: 2011 International conference on devices and commu-
nications ICDeCom 2011—Proceedings. https://doi.org/10.1109/icdecom.2011.5738491
10. Narjim S, Al Mamun A, Kundu D (2020) Diagnosis of acute lymphoblastic leukemia
from microscopic image of peripheral blood smear using image processing technique.
In: International conference on cyber security and computer science. Springer, pp
515–526. https://doi.org/10.1007/978-3-030-52856-0_41
11. Das PK, Meher S, Panda R, Abraham A (2020) A review of automated methods for the detection
of sickle cell disease. IEEE Rev Biomed Eng 13:309–324. https://doi.org/10.1109/RBME.
2019.2917780
12. Agaian S, Madhukar M, Chronopoulos AT (2018) A new acute leukaemia-automated
classification system. Comput Meth Biomech Biomed Eng: Imaging Vis 6(3):303–314
13. Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural
Process Lett 9(3):293–300. https://doi.org/10.1023/A:1018628609742
14. Gong Y, Zhang Y, Zhu H, Lv J, Cheng Q, Zhang H, He Y, Wang S (2020) Fetal congenital heart
disease echocardiogram screening based on DGACNN: adversarial one-class classification
combined with video transfer learning. IEEE Trans Med Imaging 39(4):1206–1222. https://
doi.org/10.1109/TMI.2019.2946059
15. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. In: Proceedings of advances in neural information, pp 1097–1105
16. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014)
Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM
international conference on multimedia, MM’14. ACM, New York, NY, USA, pp 675–678
17. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details:
delving deep into convolutional nets. In: British machine vision conference
18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
19. Ji Q, Huang J, He W, Sun Y (2019) Optimized deep convolutional neural networks for identifi-
cation of macular diseases from optical coherence tomography images. Algorithms 12(3):1–12
20. Labati RD, Piuri V, Scotti F (2011) ALL-IDB: the acute lymphoblastic leukemia
image database for image processing. In: 2011 18th IEEE international conference on
image processing. IEEE, pp 2045–2048
21. Wang S, Zhang L, Zuo W, Zhang B (2020) Class-specific reconstruction transfer learning for
visual recognition across domains. IEEE Trans Image Process 29:2424–2438. https://doi.org/
10.1109/TIP.2019.2948480
22. Han N, Wu J, Fang X, Xie S, Zhan S, Xie K, Li X (2020) Latent elastic-net transfer learning.
IEEE Trans Image Process 29:2820–2833. https://doi.org/10.1109/TIP.2019.2952739
23. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image
recognition. In: ICLR, pp 1–14
24. Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression. Wiley, p 398
25. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Computer-Aided Classifier
for Identification of Renal Cystic
Abnormalities Using Bosniak
Classification
P. R. Mohammed Akhil
NVIDIA, Bangalore, India
P. R. Mohammed Akhil · M. Yadav (B)
NIT, Tiruchirappalli, Tamil Nadu, India
e-mail: [email protected]
M. Yadav
ECE Department, MNIT, Jaipur, Rajasthan, India
1 Introduction
Renal cell carcinoma (RCC) is the ninth most common cancer in men and the 14th most
common cancer in women [1]. RCC accounts for approximately 90% of all renal
malignancies. Traditionally, 30–40% of patients with RCC have died of the disease,
compared with the 20% mortality rates associated with prostate and urinary bladder
cancers [2]. Overall, the lifetime risk of developing kidney cancer is about 1 in 48
for men and 1 in 83 for women. Correct and timely identification of renal tumors is
crucial. Proper follow-up based on the preliminary diagnosis helps reduce
complications for patients, and early-stage detection coupled with proper treatment
can drastically reduce mortality rates related to renal tumors.
Medical imaging plays a crucial role in the detection and diagnosis of tumors.
Prescribing the appropriate imaging technique for a patient is crucial for proper
detection of the tumor. Ultrasound is generally the first step for patients with
suspected renal disease because of its low cost, availability, and lower radiation
effects [3], but its role in identifying renal tumors is limited, as complex cystic
masses and solid lesions can overlap. The most commonly used method to evaluate renal
masses is contrast-enhanced CT [4]. It has also proven to be the first choice for
grading renal tumors, with high accuracy in both early and advanced stages. High
resolution, reproducibility, and moderate cost make CT the primary choice for
imaging: CT has a sensitivity of about 90% for small renal masses, approaching 100%
for larger ones [5]. MRI is also used extensively for the evaluation of intermediate
renal masses and for grading renal cancer [6]. Traditionally, MRI is preferred when
the contrast enhancement in CT is questionable and the radiologist is unable to make
a confirmed preliminary diagnosis. MRI is also preferred in cases of pregnancy or
allergy, or for follow-up to reduce radiation exposure.
Medical images are more susceptible to salt-and-pepper noise than to the other types
of noise generally found in digital images. The median filter is the most efficient
and common filter used to remove salt-and-pepper noise in digital images. T. Huang
et al. propose a fast two-dimensional median filter based on storing the gray-level
histogram values of the pixels in the filtering window, which reduces sorting time
compared to conventional methods [7], but this approach tends to be memory intensive,
as the values have to be stored prior to processing. H. Hwang et al. proposed an
adaptive median filter that uses a variable window size, which provides better
performance while maintaining sharpness [8]. R. H. Chan et al. modify the adaptive
median filter by dividing the filtering into two stages, namely adaptive filtering
and image regularization [9], showing significant improvement in edge preservation
and noise suppression.
Accurate and quick segmentation of the kidney is highly essential in computer-aided
diagnosis. Jun Xie et al. proposed a segmentation of the kidney based on texture and
shape priors in ultrasound images [10]. In ultrasound images, region-growing methods
cannot be used because of the large amount of speckle noise; hence, they use shape
and texture priors. However, using prior models is computationally exhaustive in the
case of CT images, because active contour models give good accuracy in such images
without any prior models and with much less computation and memory usage. D. W. Lin
et al. divide kidney segmentation techniques into threshold-based, knowledge-based,
and region-growing-based approaches [11]. They propose a computer-aided kidney
segmentation that relies on anatomical structure to develop a coarse-to-fine
segmentation scheme. This method is applicable to kidneys of different sizes, as it
makes use of the relative distance of the two kidneys from the spine. Even though
the method seems promising, the entire methodology works on the assumption that the
spine is visible in the m/2-th slice of the CT images, where m is the total number
of slices. This can very well vary from person to person, and hence the method could
achieve only about 88% segmentation accuracy, which calls for other methods with
better accuracy. S. A. Tuncer et al. developed an Android application for mobile
devices for the segmentation of kidneys in abdominal images [12]. In their work, the
vertebral column is first determined by pre-processing the images; connected-component
labeling is then used to obtain the kidney areas. The results generated on a PC were
later transferred to the mobile device. This method, though promising, could attain
an accuracy of only 85%, which again demands an alternative. N. Farzaneh et al.
proposed an automated kidney segmentation for traumatically injured patients using
machine learning and active contour modeling [13]. This method first develops a 3D
initialization mask within the abdominal cavity, then divides that cavity into small
patches and extracts multiple features. The features are used by a random forest
classifier to detect potential initialization voxels, followed by adaptive region
growing on both the left and right kidneys to segment them individually; the two
results are then combined to form the final segmented image. This method showed a
slightly higher accuracy of 88.9% compared to Lin et al. However, since the main aim
of the proposed method is highly accurate classification of tumors rather than
segmentation, a modification of Lin's methodology is preferred, as it is
computationally faster than that of Farzaneh et al., which involves 3D alignment and
machine learning.
Extracting the correct and adequate number of features is crucial for developing an
efficient classifier: the more features available, the better the chance of a robust
classification. R. M. Haralick et al. identified texture as one of the most important
features for image classification [14]. They assumed that the texture content of an
image is contained in the overall or average spatial relationship that the gray tones
in the image have to one another, and developed the gray-tone spatial-dependence
matrix, or gray-level co-occurrence matrix (GLCM), which captures the relationship
between adjacent gray tones in an image. Using the dependence matrix, they developed
a set of features whose accuracy was tested by classifying two different datasets,
one of five different kinds of sandstone and the other of aerial photographs of eight
land-use categories. M. Galloway developed a set of texture features based on
gray-level run lengths [15]. A gray-level run is a set of consecutive, collinear
pixels having the same gray level; the length of the run is the number of pixels in
the run. Galloway suggested five texture features based on the gray-level run-length
matrix (GLRLM).
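Galloway's run-length idea can be illustrated with a short, dependency-free sketch (an illustration, not the paper's implementation; only the horizontal, 0-degree direction is scanned):

```python
from collections import Counter

def run_lengths(row):
    """Return (gray_level, run_length) pairs for one image row: a run is a
    maximal set of consecutive pixels with the same gray value."""
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    runs.append((row[-1], count))   # close the final run
    return runs

def glrlm(image):
    """GLRLM as a Counter keyed by (gray_level, run_length), scanning rows
    only; the other angles are handled analogously."""
    counts = Counter()
    for row in image:
        counts.update(run_lengths(row))
    return counts

img = [[0, 0, 1, 1, 1],
       [2, 2, 2, 2, 0]]
m = glrlm(img)
print(m[(1, 3)], m[(2, 4)])   # one run of three 1s, one run of four 2s
```

From such a matrix, Galloway's five features (e.g., short-run emphasis, long-run emphasis) are weighted sums over the (gray level, run length) entries.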
Galloway observed that in a coarse texture, relatively long gray-level runs occur
more often, whereas a fine texture primarily contains shorter runs. Later, the
research trend shifted toward combining GLRLM and GLCM texture features to develop
more accurate classification methodologies. Recognition of image patterns independent
of size, position, orientation, and reflection is crucial for the accurate analysis
of medical images. Ming-Kuei Hu derived moment invariants that achieve this goal [16].
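As an illustration of such invariants, the first two Hu moments can be computed from normalized central moments with numpy (a sketch, not the authors' code; only φ1 and φ2 of Hu's seven invariants are shown):

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants, computed from normalized central
    moments; both are invariant to translation, scale, and rotation."""
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00   # centroid

    def eta(p, q):
        # normalized central moment: mu_pq / m00^(1 + (p+q)/2)
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# a rectangular blob and its 90-degree rotation give the same invariants
blob = np.zeros((8, 8))
blob[2:5, 1:7] = 1.0
p1, p2 = hu_first_two(blob)
q1, q2 = hu_first_two(np.rot90(blob))
print(np.isclose(p1, q1) and np.isclose(p2, q2))   # True
```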
Accurate classification is the key to an efficient tumor detection system.
M. G. Linguraru et al. proposed a computer-assisted radiology tool to assess renal
tumors in triple-phase contrast-enhanced abdominal CT [17], classifying renal tumors
into normal cysts, Von Hippel–Lindau syndrome (VHL) lesions, and hereditary papillary
renal carcinomas (HPRC). M. G. Linguraru et al. also proposed another computer-aided
renal cancer classification tool based on contrast-enhanced CT for the proper
management and classification of renal tumors [17]. From the segmented lesions, the
different lesion types were classified using histograms of curve-related features
(HCF) with random sampling. Using HCF, the structural differences were quantifiable,
which helped in distinguishing between cysts and cancers. In this method, five types
of lesions were analyzed: benign cysts, Von Hippel–Lindau syndrome (VHL),
Birt–Hogg–Dubé (BHD) syndrome, hereditary papillary renal carcinomas (HPRC), and
hereditary leiomyomatosis and renal cell cancers (HLRCC). T. Mangayarkarasi et al.
proposed a computer-assistive tool for the classification of different renal
pathologies from ultrasound kidney images [18]. Global thresholding is applied for
segmentation of the kidneys. From the segmented kidneys, first-order statistical
features such as the mean, entropy, and standard deviation are extracted and used as
inputs to the classifier; a probabilistic neural network then classifies the kidney
images into normal kidney, kidney stone, normal cyst, or tumorous cyst. M. Koshdeli
et al. developed a model using convolutional neural networks for tumor grading from
hematoxylin and eosin (H&E)-stained sections of kidney [19]. Using the deep learning
model, they were able to classify the sample images into six categories: normal, fat,
blood, stroma, low-grade granular tumor, and high-grade clear cell carcinoma. All
these methods based on the classification of abdominal CT images simply classify the
images as either benign or malignant, and some classify them into different tumor
types. However, there has been no methodology to identify whether a particular kidney
is tumorous and, at the same time, to determine what stage the tumor is currently in,
which would help in suggesting appropriate follow-up. This is the technology gap
addressed in this project. Morton A. Bosniak proposed a robust classification that
helps in differentiating between complex renal cysts and renal tumors [20]. He
proposed four stages of classification based on septa formation, calcification,
contrast enhancement, and thickened irregular walls. His classification of renal
tumors is used worldwide by radiologists as the standard reference tool for the
initial diagnosis of renal cysts. In this paper, a computer-aided diagnosis system
is proposed that can classify the kidneys in abdominal CT images as normal or, if
found abnormal, into one of four grades.
2 Methodology
Figure 1 presents an overview of the proposed approach. There are four main stages:
pre-processing, kidney segmentation, feature extraction, and classification. Figure 2
shows the flow chart of the proposed CAD system. First, the input scan image is read
into the system. Since the majority of image processing operations are performed on
grayscale images, the input scan image is converted into the grayscale pixel range if
it is not already grayscale; the luminosity method is used for this [23]. 2D scan
images can have different frame sizes depending on the patient, the time of
measurement, the presence of contrast, and other factors, so to maintain a uniform
input frame size, we resize the input scan image to 256 × 256 pixels. CT scan images
are highly susceptible to salt-and-pepper noise [24]; to make sure that processing is
not affected by ambient input noise, a median filter is used to remove any noise [2].
This marks the end of the pre-processing in our system. Both kidney locations are
then identified and segmented out using prior anatomical knowledge and adaptive
rectangular-contour region growing. The necessary features are extracted from the
kidneys for classification and fed to a classifier, which identifies any
abnormalities present and then grades them into four different grades, namely
grade 1, grade 2, grade 3, or grade 4.
Fig. 1 Block diagram of the proposed system. The input image, after being
pre-processed, undergoes a series of processing steps; the input scan image is
finally classified as a normal kidney or, if found abnormal, into different grades
Fig. 2 Flow chart explaining the design flow of the system, from reading the input
to the final classification
2.1 Pre-processing
CT images are susceptible to a lot of noise, particularly salt-and-pepper noise. We
use an adaptive median filter, which removes the noise effectively while providing
good edge preservation [5].
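The pre-processing steps can be sketched in numpy (illustrative only, not the authors' code: a plain 3×3 median filter stands in for the adaptive variant, and the luminosity weights shown are one common convention):

```python
import numpy as np

def to_gray(rgb):
    """Luminosity method: a weighted sum of the R, G, B channels
    (0.21/0.72/0.07 is one common choice of weights)."""
    return 0.21 * rgb[..., 0] + 0.72 * rgb[..., 1] + 0.07 * rgb[..., 2]

def median3x3(img):
    """Plain 3x3 median filter (border pixels left unfiltered for brevity):
    each interior pixel is replaced by the median of its neighbourhood,
    which removes isolated salt-and-pepper outliers."""
    out = img.copy()
    h, w = img.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

img = np.full((5, 5), 10.0)
img[2, 2] = 255.0                  # a single salt-noise pixel
print(median3x3(img)[2, 2])        # the noisy pixel is restored to 10.0
```

The adaptive version used in the paper additionally enlarges the window when the median itself looks like an impulse, which is what preserves edges better than the fixed-window filter sketched here.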
Computer-Aided Classifier for Identification of Renal … 445
2.2 Kidney Segmentation
Once the image is de-noised and resized, both kidneys need to be properly segmented
out for further processing. A modification of Lin's kidney segmentation method is
adopted, in which prior anatomical knowledge is leveraged to locate the kidneys and
adaptive region growing is carried out to segment them accurately [11]. Once the
spine is located using prior anatomical knowledge, the approximate location of each
kidney is narrowed down with respect to the spine. As opposed to the adaptive seed
growing adopted in Lin's method, an adaptive rectangular-contour growing is used
here: using the approximate kidney location based on prior knowledge, an adaptive
rectangular contour is formed. The key differentiator compared to existing methods
is that the size of the initial rectangular contour for region growing varies from
image to image. The algorithm is summarized as follows:
Algorithm 1: Kidney Segmentation
1. Locate the spine using prior anatomical knowledge
2. Fix an initial seed point for reference based on the relative distance from the spine
3. Perform fuzzy c-means clustering on the image [25]
4. Perform a rough segmentation of the kidney region
5. Determine the four corner pixel values of the roughly segmented kidney (x1, x2, x3, x4)
6. Determine the rectangular contour corner pixel values (x1 − 5, x2 − 5, x3 − 5, x4 − 5)
7. Perform the region-growing operation until the boundary condition is satisfied [11]
8. Segment out the detected kidney area.
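Steps 2 and 7 of Algorithm 1 can be illustrated with a minimal seeded region-growing sketch in plain Python (a simplified stand-in: a fixed intensity tolerance replaces the adaptive rectangular-contour boundary condition):

```python
from collections import deque

def region_grow(img, seed, tol):
    """Grow a region from `seed`, absorbing 4-connected neighbours whose
    intensity lies within `tol` of the seed intensity."""
    h, w = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:
        i, j = frontier.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w and (ni, nj) not in region \
                    and abs(img[ni][nj] - ref) <= tol:
                region.add((ni, nj))       # pixel joins the region
                frontier.append((ni, nj))  # and its neighbours are tried next
    return region

img = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [0, 0, 0, 9]]
print(len(region_grow(img, (0, 0), tol=1)))   # 4: the connected bright patch
```

The isolated bright pixel at the bottom right is not reached, which is exactly the behaviour that keeps a grown kidney region from leaking into disconnected structures.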
2.3 Feature Extraction
Once the kidneys are segmented out, the features required for accurate classification
of the renal abnormalities need to be extracted. There are two considerations in the
selection of features: (a) the more features used, the more robust the classification
becomes, and (b) the extracted features should support classification irrespective of
size, translation, rotation, and/or reflection.
Our extraction strategy determines three sets of features: (a) the gray-level
co-occurrence matrix, (b) the gray-level run-length matrix, and (c) Hu's moments.
Gray-Level Co-occurrence Matrix (GLCM)
GLCM represents the angular and spatial relationship over an image sub-region of a
specific size. Analysis of the GLCM helps in understanding the textural features of
the image. Once the GLCM is calculated, a total of 20 features are extracted from
the matrix.
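As a hedged illustration (not the authors' 20-feature set), a GLCM for a single offset and two classic Haralick features can be computed with numpy:

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """GLCM for one offset: counts co-occurrences of gray levels (i, j) at
    the given spatial displacement, then normalizes to probabilities."""
    di, dj = offset
    h, w = img.shape
    mat = np.zeros((levels, levels))
    for i in range(h - di):
        for j in range(w - dj):
            mat[img[i, j], img[i + di, j + dj]] += 1
    return mat / mat.sum()

def glcm_features(p):
    """Two of the classic Haralick features, as an example."""
    i, j = np.indices(p.shape)
    contrast = ((i - j) ** 2 * p).sum()   # penalizes distant gray pairs
    energy = (p ** 2).sum()               # high for uniform textures
    return contrast, energy

img = np.array([[0, 0, 1],
                [0, 1, 1]])
p = glcm(img, levels=2)
contrast, energy = glcm_features(p)
print(contrast, energy)   # 0.5 0.375
```

In practice one GLCM is computed per angle (0, 45, 90, 135 degrees) and the features are averaged, which is what makes them approximately rotation-insensitive.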
2.4 Classification
The main task of the proposed system is to accurately identify abnormalities and to
effectively grade them into different stages. For this, the Bosniak classification is
adopted. Bosniak classification helps in differentiating between different types of
cysts using traits such as calcification, septa formation, the presence of
high-density fluid in cysts, and irregularity of the wall or solid elements. Using
these traits, the cysts are classified into four main categories:
Category 1
Simple cysts with an imperceptible wall and a well-rounded shape fall into this
category. These cysts are approximately 0% malignant, and hence no follow-up is
required.
Category 2
Minimally complex cysts with a few thin (<1 mm) septa or thin calcification fall into
this category. The lesions are non-enhancing under contrast and hence approximately
0% malignant. Follow-up for this category is also not necessary.
Category 3
Intermediately complex cysts with thick, nodular, multiple septa and measurable
enhancement under contrast fall into this category. The treatment includes partial
Activation Function
The activation function is responsible for the nonlinear mapping between the inputs
and the response variable. For the input and hidden layers, we use leaky rectified
linear units (ReLU), defined as

f(x) = x for x ≥ 0, f(x) = αx for x < 0

where α is the leakiness parameter. For the output layer, the softmax activation
function is used [40].
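A minimal sketch of these two activations, using the leakiness α = 0.333 reported in Table 1 (the exact formulation used by the authors is an assumption):

```python
import numpy as np

def leaky_relu(x, alpha=0.333):
    """f(x) = x for x >= 0, alpha * x otherwise (alpha from Table 1)."""
    return np.where(x >= 0, x, alpha * x)

def softmax(z):
    """Numerically stable softmax for the output layer."""
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([-3.0, 0.0, 2.0])
print(leaky_relu(x))      # negative input is scaled by alpha, not zeroed
print(softmax(x).sum())   # probabilities sum to one
```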
Regularization
It is used to avoid overfitting. Regularization works by penalizing the coefficients.
We use dropout [41, 42] for regularization of our network. In this method, nodes
are selected at random and are removed along with all their incoming and outgoing
connections at every iteration. This ensures randomness in the output produced. The
probability of dropping a node is given by ρ which is tuned for better performance
by grid search method [43]. Once sufficient randomness is achieved during training,
all the nodes are used while testing.
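The scheme can be sketched as inverted dropout, one common way to realise "all nodes used while testing"; the rescaling by 1/(1 − ρ) during training is an assumption, as the text does not state how activations are scaled:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, rho=0.4, train=True):
    """Drop each node with probability rho during training; the inverted
    scaling 1/(1 - rho) keeps the expected activation unchanged, so at
    test time all nodes are simply used as-is."""
    if not train:
        return a
    keep = rng.random(a.shape) >= rho   # nodes surviving this iteration
    return a * keep / (1.0 - rho)

a = np.ones(10_000)
train_out = dropout(a, rho=0.4, train=True)
print(train_out.mean())                 # close to 1.0 in expectation
print(dropout(a, train=False).mean())   # exactly 1.0 at test time
```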
Loss Function
The main aim during training is to minimize the loss function. Mean squared error
is used, as given in Eq. (3):

L = (1/n) Σ_{i=1}^{n} (y(i) − ỳ(i))^2   (3)

where i varies from 1 to n, (y(i) − ỳ(i)) is called the residual, and the target of
the MSE loss function is to minimize the residual sum of squares.
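A direct sketch of this loss on illustrative targets and predictions:

```python
import numpy as np

def mse_loss(y, y_hat):
    """Eq. (3): mean of the squared residuals (y_i - y_hat_i)."""
    residual = y - y_hat
    return np.mean(residual ** 2)

# Illustrative desired outputs and network predictions.
y = np.array([1.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.8, 1.0])
print(mse_loss(y, y_hat))  # 0.0225
```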
To train ANN, the loss function must be minimized, but it is highly nonlinear.
Levenberg–Marquardt algorithm is used for optimization [44]. Figure 3 represents a
state diagram for training a neural network with the Levenberg–Marquardt algorithm.
The first step is to calculate the loss, the gradient, and Hessian approximation. Then,
the damping factor is adjusted so as to reduce the loss function at each iteration.
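The loop described above (compute the loss and the Hessian approximation, then adjust the damping factor) can be sketched for a generic least-squares problem; the toy line-fitting data below stands in for the actual network training:

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, x, y, lam=1e-3, iters=50):
    """Minimise the sum of squared residuals r(p) = y - f(x, p).
    Each step solves (J^T J + lam*I) dp = J^T r; the damping factor lam
    shrinks after a successful step and grows after a failed one."""
    p = np.asarray(p0, dtype=float)
    loss = np.sum((y - f(x, p)) ** 2)
    for _ in range(iters):
        r = y - f(x, p)
        J = jac(x, p)
        A = J.T @ J + lam * np.eye(len(p))   # damped Hessian approximation
        dp = np.linalg.solve(A, J.T @ r)
        new_loss = np.sum((y - f(x, p + dp)) ** 2)
        if new_loss < loss:                  # accept: act like Gauss-Newton
            p, loss, lam = p + dp, new_loss, lam * 0.5
        else:                                # reject: act like gradient descent
            lam *= 2.0
    return p

# Toy problem: recover the slope and intercept of a line.
f = lambda x, p: p[0] * x + p[1]
jac = lambda x, p: np.stack([x, np.ones_like(x)], axis=1)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0
p = levenberg_marquardt(f, jac, [0.0, 0.0], x, y)
print(p)  # approximately [2. 1.]
```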
3.1 Database
A total of 112 2D abdominal CT scan images are used as the database for developing
and validating this system. The scan images were acquired from The Cancer Imaging
Archive (TCIA). The data set was acquired from patients aged 15 to 82 years, with a
gender composition of 67 male and 45 female patients. This ensures good diversity
and authenticity for the proposed methodology.
The entire setup can be divided into two: training and validation. Training refers to
the development of the proposed classification system and validation refers to testing
the accuracy and authenticity of the developed model.
Training
A set of 82 CT scan images are used for training the feed-forward neural network for
classification. The training data set comprises five normal kidney scans, six grade
1 scans, 34 grade 2 scans, 32 grade 3 scans, and five grade 4 scans. The diagnosis
of each of the training images is obtained after preliminary scans followed by a
biopsy if needed. For training, each image undergoes pre-processing, segmentation,
and feature extraction. The extracted features are stored in a matrix called ‘Data’
which has a dimension of i × j, where i corresponds to the number of features being
extracted and j corresponds to the number of images used for training. The dimension
of the data matrix in our case, hence, is 38 × 82. The diagnosis result corresponding
Table 1 Hyperparameters for the proposed method

Stage            Hyperparameter    Value
Initialization   Bias              0.1
                 Weights           Xavier
                 Leaky ReLU α      0.333
                 Dropout p         0.4
Training         Epochs            26
                 Gradient          0.00098142
                 Mu                1e-08
to each of the training images is stored in another matrix called ‘Target’ which has a
dimension of k × l where k corresponds to the number of classification stages and l
corresponds to the number of images used for training. The target matrix formed is
used as the desired output while training the neural network.
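The construction of the ‘Data’ and ‘Target’ matrices can be sketched as follows; the feature values and grade labels are randomly generated placeholders, with five stages (normal plus grades 1–4) assumed from the training-set description:

```python
import numpy as np

n_features, n_images, n_stages = 38, 82, 5

rng = np.random.default_rng(1)
# 'Data': one column of 38 extracted features per training image.
data = rng.random((n_features, n_images))

# Placeholder grade labels (0 = normal, 1..4 = tumor grades).
labels = rng.integers(0, n_stages, size=n_images)

# 'Target': one-hot desired output, one column per image.
target = np.zeros((n_stages, n_images))
target[labels, np.arange(n_images)] = 1.0

print(data.shape, target.shape)   # (38, 82) and (5, 82)
print(target.sum(axis=0).min())   # each column encodes exactly one stage
```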
Two hidden layers have been used for the feed-forward neural network. Each
hidden layer has 30 neurons in them. The choice of number of hidden layers and
the number of neurons per layer is based on the concept of ensembles of neural
network [45]. Table 1 shows the hyperparameters which are being used in the clas-
sifier network. The values obtained are the result of repeated iterations until the
optimum performance is reached.
Testing and Validation
A set of 30 images were used for testing the performance and accuracy of the devel-
oped system. All the images being tested underwent all processing stages of the
proposed system. The result obtained from the developed system was compared
against the diagnosis result of the tumor samples after biopsy. The average time
taken for pre-processing, kidney segmentation, and feature extraction cumulatively
was 41.3 s. The classifier was observed to converge to the final result in an average
of 11 s after 26 iterations. Figure 7 depicts the box plots of the features extracted
from the segmented kidneys. It gives a representation of the range of each feature
for a particular grade of classification and also the mean value and standard error for
the same. This helps us understand where each grade of tumor stands from a
statistical point of view.
Figure 6 depicts the performance characteristics of the developed classifier. Best
performance is achieved at epoch 26, and the classifier terminates after that.
Figure 4 shows the simulation steps of the designed system for a grade 2 type
tumor. Figure 4a represents the resized scan image ready for processing. The initial
rectangular contour is developed on the filtered image as depicted in Fig. 4b, Fig. 4c
represents the final boundary detected by our region growing algorithm, which finally
results in the segmented kidneys in Fig. 4d. 3D volumetric analysis of the segmented
kidneys helps in better understanding the textural distribution, as shown in Fig. 4e.
Figure 5 shows the simulation results for the same workflow for a grade 3 type tumor.
Fig. 4 Simulation work flow of the proposed method for grade 2 type tumor. a Input CT scan
image after resizing to 256 × 256 dimension. b Adaptive rectangular contour created based on
prior knowledge of kidney. c Boundary of kidneys identified using region growing algorithm.
d Segmented kidneys. e 3D textural distribution of the segmented kidneys
Fig. 5 Simulation work flow of the proposed method for grade 3 type tumor. a Input CT scan
image after resizing to 256 × 256 dimension. b Adaptive rectangular contour created based on
prior knowledge of kidney. c Boundary of kidneys identified using region growing algorithm.
d Segmented kidneys. e 3D textural distribution of the segmented kidneys
Fig. 6 Training performance characteristics of the classifier. The classifier attains least error from
the desired output at epoch 26
Fig. 7 Box plot representing the range of the different features extracted for each grade of tumor.
Each graph depicts the mean, first quartile, and third quartile values of a particular feature extracted
corresponding to each grade of tumor
The results obtained are validated using standard performance measures. True
positive fraction and true negative fraction are used to calculate accuracy, sensitivity,
and specificity of the proposed system.
Based on the available literature [46]:
• True positive fraction (TPF) is the ratio between the number of positive
observations and the number of true positive conditions.
• False positive fraction (FPF) is the ratio between number of positive observations
and the number of true negative conditions.
• True negative fraction (TNF) is the ratio between number of negative observations
and the number of true negative conditions.
• False negative fraction (FNF) is the ratio between number of negative observations
and the number of true positive conditions.
Analysis of the database of kidney images received:

Ntot = number of examination cases = 30
Ntp = number of true positive conditions = 15
Ntn = number of true negative conditions = 17
Notp = number of positive observations from Ntp = 14
Nofn = number of negative observations from Ntp = 1
Notn = number of negative observations from Ntn = 1
Nofp = number of positive observations from Ntn = 16.
Computation of the final performance measures is given as follows:

Accuracy for normal images = Nofp / Ntn   (4)

Specificity = TN / (TN + FP)   (7)
where
• TP = True positive—Predicts abnormal as abnormal.
• FP = False positive—Predicts normal as abnormal.
• TN = True negative—Predicts normal as normal.
• FN = False negative—Predicts abnormal as normal.
Classifier performance (rate of classification) is given by

CR = ((Notp + Nofp) / Ntot) × 100   (8)
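A small helper following the standard conventions listed above can be sketched as follows; the counts are illustrative rather than the paper's tallies, and the classification rate is computed here as the fraction of correct decisions:

```python
def performance(tp, fp, tn, fn):
    """Standard measures from the four observation counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),       # true positive fraction
        "specificity": tn / (tn + fp),       # Eq. (7)
        "classification_rate": 100.0 * (tp + tn) / total,
    }

# Illustrative counts, not the paper's exact tallies.
m = performance(tp=14, fp=1, tn=16, fn=1)
print(m)  # accuracy 0.9375, classification rate 93.75
```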
Table 3 depicts the performance measures and classification rate of the proposed
method.
Table 2 shows the comparison of the malignancy classification rate between our
proposed method and other existing methods. The proposed method outperformed all
existing methods with a classification rate of 93.5%. Of the 30 images tested, once an
abnormality was identified, 24 were correctly identified as the respective grade
whereas 4 were misclassified. Of these 4 images, 3 were graded to the nearest other
grade, while one image was graded as grade 4 although it was reported as grade 2.
The grading accuracy of the developed system is 86.67%.
4 Conclusions
training the neural network classifier using Hu’s moments to ensure more accurate
classification. The ANN used is built over two hidden layers with 30 neurons in
each layer. The performance was tested using different configurations of the neural
network but the current configuration delivered the best performance for the given
data set. It was verified that Bosniak classification can be effectively implemented
on computer-aided diagnosis and that our system could be used as a second opinion
for expert radiologists.
The proposed method was developed using a database from The Cancer Imaging Archive (TCIA).
The method showed 93.75% classification rate in detecting the presence of any
abnormalities in kidney which makes our design superior over other existing method-
ologies. The accuracy for grading the tumors into different stages was found to be
86.67%. The accuracy of any neural network depends on the data set used for
training: the larger the data set, the more accurate the developed system is likely
to be. Our design was developed on 112 images; this relatively small data set explains
the lower accuracy in grading. The grading accuracy can be improved in the future by
acquiring a much bigger data set and training the neural network with it. However,
the architecture of the existing classifier might need some changes, along with the
hyperparameters, to ensure better performance.
Acknowledgements We are thankful to the ECE department of NIT Tiruchirappalli for providing
resources to do this work. We are also thankful to MHRD for providing scholarship during Masters.
References
1. Rini BI, Campbell SC, Escudier B (2009) Renal cell carcinoma. The Lancet 373(9669):1119–
1132
2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics
2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
CA Cancer J Clin 68(6):394–424
3. Brascho DJ, Bryan JM, Wilson EE (1977) Diagnostic ultrasound to determine renal size and
position for renal blocking in radiation therapy. Int J Radiat Oncol Biol Phys 2(11):1217–1220
4. Conti P, Strauss L (1991) The applications of pet in clinical oncology. J Nucl Med 32(4):623–
648
5. Warshauer DM, McCarthy SM, Street L, Bookbinder M, Glickman M, Richter J, Hammers
L, Taylor C, Rosenfield A (1988) Detection of renal masses: sensitivities and specificities of
excretory urography/linear tomography, us, and ct. Radiology 169(2):363–365
6. Reznek RH (2004) CT/MRI in staging renal cell carcinoma. Cancer Imag Off Publ Int Cancer
Imag Soc 4:S25–S32
7. Huang T, Yang G, Tang G (1979) A fast two-dimensional median filtering algorithm. IEEE
Trans Acoust Speech Signal Process 27(1):13–18
8. Hwang H, Haddad RA (1995) Adaptive median filters: new algorithms and results. IEEE Trans
Image Process 4(4):499–502
9. Chan RH, Ho C-W, Nikolova M (2005) Salt-and-pepper noise removal by median-type noise
detectors and detail-preserving regularization. IEEE Trans Image Process 14(10):1479–1485
10. Xie J, Jiang Y, Tsui H-T (2005) Segmentation of kidney from ultrasound images based on
texture and shape priors. IEEE Trans Med Imag 24(1):45–57
11. Lin D-T, Lei C-C, Hung S-W (2006) Computer-aided kidney segmentation on abdominal CT
images. IEEE Trans Inf Technol Biomed 10(1):59–65
12. Tuncer SA, Alkan A (2017) Segmentation of kidneys and abdominal images in mobile devices
with the android operating system by using the connected component labeling method. In:
Proceedings of Electronics and Microelectronics (MIPRO) 2017 40th international convention
information and communication technology, pp 1094–1097
13. Farzaneh N, Soroushmehr SMR, Patel H, Wood A, Gryak J, Fessell D, Najarian K (2018)
Automated kidney segmentation for traumatic injured patients through ensemble learning and
active contour modeling. In: Proceedings of 40th annual international conference of the IEEE
engineering in medicine and biology society (EMBC), pp 3418–3421
14. Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification.
IEEE Trans Syst Man Cybern SMC-3(6):610–621
15. Galloway MM (1974) Texture analysis using grey level run lengths. NASA STI/Recon
Technical Report N, vol 75
16. Hu M-K (1962) Visual pattern recognition by moment invariants. IRE Trans Inform Theor
8(2):179–187
17. Linguraru MG, Gautam R, Peterson J, Yao J, Linehan WM, Summers RM (2009) Renal
tumor quantification and classification in triple-phase contrast-enhanced abdominal CT. In:
Proceedings of IEEE international symposium biomedical imaging: from nano to macro, pp
1310–1313
18. Linguraru MG, Yao J, Gautam R, Peterson J, Li Z, Linehan WM, Summers RM (2009) Renal
tumor quantification and classification in contrast-enhanced abdominal ct. Pattern Recogn
42(6):1149–1161
19. Mangayarkarasi T, Jamal DN (2017) PNN-based analysis system to classify renal pathologies
in kidney ultrasound images. In: Proceedings of 2nd international conference computing and
communications technologies (ICCCT), pp 123–126
20. Khoshdeli M, Borowsky A, Parvin B (2018) Deep learning models differentiate tumor grades
from H&E stained histology sections. In: Proceedings of 40th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society (EMBC), pp 620–623
21. Bosniak MA (1986) The current radiological approach to renal cysts. Radiology 158:1–10
22. Lippmann R (1987) An introduction to computing with neural nets. IEEE ASSP Mag 4(2):4–22
23. Smith K, Landes P-E, Thollot J, Myszkowski K (2008) Apparent greyscale: a simple and fast
conversion to perceptually accurate images and video. In: Computer graphics forum, vol 27,
no 2. Wiley Online Library, pp 193–200
24. Verma R, Ali J (2013) A comparative study of various types of image noise and efficient noise
removal techniques. Int J Adv Res Comput Sci Softw Eng 3(10)
25. Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput
Geosci 10(2–3):191–203
26. Singh K (2016) A comparison of gray-level run length matrix and gray-level co-occurrence
matrix towards cereal grain classification. Int J Comput Eng Technol (IJCET) 7(6):9–17
27. Mohanty AK, Beberta S, Lenka SK (2011) Classifying benign and malignant mass using glcm
and glrlm based texture features from mammogram. Int J Eng Res Appl (IJERA) 1(3):687–693
28. Alegre E, GonzáLez-Castro V, Alaiz-RodríGuez R, GarcíAOrdáS MT (2012) Texture and
moments-based classification of the acrosome integrity of boar spermatozoa images. Comput
Methods Programs Biomed 108(2):873–881
29. Chaieb R, Kalti K (2018) Feature subset selection for classification of malignant and benign
breast masses in digital mammography. Pattern Anal Appl 1–27
30. Li Y, Fan F (2005) Classification of schizophrenia and depression by EEG with anns*. In:
Proceedings of IEEE Engineering in Medicine and Biology 27th Annual Conference, pp 2679–
2682
31. Sadeghkhani I, Ketabi A, Feuillet R (2012) Radial basis function neural network application
to power system restoration studies. Comput Intell Neurosci 3(10)
32. Van Biesen W, Sieben G, Lameire N, Vanholder R (1998) Application of kohonen neural
networks for the non-morphological distinction between glomerular and tubular renal disease.
Nephrol Dial Transplant 13(1):59–66
33. Arik SO, Chrzanowski M, Coates A, Diamos G, Gibiansky A, Kang Y, Li X, Miller J, Ng A,
Raiman J et al (2017) Deep voice: real-time neural text-to-speech. arXiv preprint
arXiv:1702.07825
34. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
35. Jian Z, Wu WX (2011) The application of feed-forward neural network for the x-ray image
fusion. J Phys Conf Se 312(6):062005
36. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feed forward neural
networks. In: Proceedings of the thirteenth international conference on artificial intelligence
and statistics, pp 249–256
37. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. In: Advances in neural information processing systems, pp 1097–1105
38. Jarrett K, Kavukcuoglu K, LeCun Y et al (2009) What is the best multistage architecture for
object recognition? In: 2009 IEEE 12th international conference on in computer vision. IEEE,
pp 2146–2153
39. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic
models. In: Proceedings of ICML, vol 30, no 1, p 3
40. Dunne RA, Campbell NA (1997) On the pairing of the softmax activation and cross-entropy
penalty functions and the derivation of the softmax activation function. In: Proceedings of 8th
Aust. conference on the neural networks, Melbourne, vol 181. Citeseer, p 185s
41. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple
way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
42. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving
neural networks by preventing co-adaptation of feature detectors. arXiv preprint
arXiv:1207.0580
43. Gal Y, Hron J, Kendall A (2017) Concrete dropout. In: Advances in neural information
processing systems, pp 3581–3590
44. Moré JJ (1978) The Levenberg-Marquardt algorithm: implementation and theory. In: Numerical
analysis. Springer, Berlin, pp 105–116
45. Hansen L, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell
12(10):993–1001
46. Diakides M, Bronzino JD, Peterson DR (2012) Medical infrared imaging: principles and
practices. CRC Press, Boca Raton
Recognition of Obscure Objects Using
Super Resolution-Based Generative
Adversarial Networks
B. Sathyabama, A. Arunesh, D. SynthiyaVinothini, S. Anupriyadharsini,
and S. Md. Mansoor Roomi
Abstract Object recognition has achieved good progress in computer vision, but
it remains a difficult task for low-resolution images, because the discriminant
features available at high resolution usually disappear at low resolution. In
surveillance systems, the regions of interest get blurred due to the distance between
the camera and the object and also due to illumination effects. In low-resolution
images, objects appear very small and blurred, making their recognition tedious.
Super resolution of natural images is a classic and difficult problem in
image and video processing. But rapid developments in deep learning have recently
sparked interest in super resolution of images. In this paper, generative adversarial
network (GAN) that has been successfully employed in generating images and real-
istic textures with fine details has been extended to the application of image super
resolution. The paper aims at improving the resolution of the obscure objects to
improve the classification accuracy of the system. This is done by detecting obscure
objects using RCNN, then improving their resolution using a GAN, and finally
classifying the improved images using AlexNet. The experiment is conducted using
the MSCOCO dataset and a collected shoe database in Google COLAB. The super
resolved images increase the classification accuracy by 16%.
1 Introduction
object category. Other factors that complicate the process of object detection are
viewpoint and scale, partial occlusions, illumination and multiple instances. Object
detection is the process of finding the positions of each and every concerned objects
present in an image. It is actually the process of finding the bounding box for each
object. One approach is to use a sliding window to scan the image in scale space
and to classify each window individually. According to Dalal, on the INRIA dataset,
the detection performance is 90% when the image resolution is 640 × 480 and the
sliding window and object size are 64 × 128, but it drops to 40% when the sliding
window is 16 × 32. Since the proper size of a pedestrian in the image is not known
in advance, detection is difficult when the resolution is
low. Objects in low resolution are mostly missed in traditional method of detection
as shown in Fig. 1. It can be noted that traditional detectors detect humans, which
appear larger in the images, but miss obscure objects like cars. The objects bounded
by red bounding boxes are not clear, which makes them hard to recognize. In Fig. 1
[1], the pedestrians marked as 5 and 6 cannot be recognized properly because of their
low resolution; hence, objects 5 and 6 are classified as obscure objects. Similarly,
the cars marked with red boundaries in Fig. 2 are not recognized.
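The sliding-window scan described above can be sketched for the quoted 640 × 480 image and 64 × 128 window; the 16-pixel stride and the single scale are assumptions made for brevity:

```python
import numpy as np

def sliding_windows(img, win_h=64, win_w=128, step=16):
    """Yield every (row, col, patch) a per-window classifier would score.
    Real detectors repeat this over an image pyramid (scale space)."""
    h, w = img.shape[:2]
    for r in range(0, h - win_h + 1, step):
        for c in range(0, w - win_w + 1, step):
            yield r, c, img[r:r + win_h, c:c + win_w]

img = np.zeros((480, 640))
n_windows = sum(1 for _ in sliding_windows(img))
print(n_windows)  # number of windows the classifier must evaluate
```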
Object detection is important at both low and high resolution, but it is more difficult
in low-resolution than in high-resolution images. Image super resolution recovers
a single or a sequence of high-resolution images from a single or sequence of low-
resolution images. It has several practical applications in various real-world problems
in a wide range of fields like satellite and remote sensing imaging, medical imaging,
computer vision, security, biometrics, and forensics, etc.
Deep learning is currently progressing in many computer vision fields. With available
large datasets and computation power, deep learning achieves good accuracy by
an end-to-end learning. With the advent of SR based on the convolutional neural
network (SRCNN), deep learning is dynamically increasing the SR performance
[3]. SRCNN is a three-layer shallow network that learns an end-to-end nonlinear
mapping function. Subsequently, deeply recursive convolutional network (DRCN)
architecture has a small model parameter yet permits pixel dependencies for a long
range. Dilated convolutional neural network (DCNN) uses dilated convolutions also
known as convolution which is a vivid method to increase the receptive field of the
network exponentially with linear parameter accretion. Increasing the depth of the
network can efficiently increase the model’s accuracy as they are potential to model
462 B. Sathyabama et al.
highly complex mappings. Such deep networks can be efficiently trained using batch
normalization. The learning ability of CNN is made powerful with skip connections
and residual blocks, where instead of identity learning the network learns the residue.
This design choice has relieved the network from the vanishing gradient problem, which
remained a bottleneck in training deep networks. VDSR has increased network depth
by piling more convolution layers with residual learning. EDSR and MDSR use the
residual block to build a wide and deep network with residual scaling, respectively.
2 Proposed Methodology
Fig. 3 Proposed methodology
In the proposed method, for the identification of multiple objects using a super
resolution-based deep learning approach, three networks (RCNN, GAN, and AlexNet)
are used in three stages, as shown in Fig. 4.
Stage I Object Detection using RCNN. Convolutional neural networks (ConvNets
or CNNs) have proven very effective in recognition and classification of images.
RCNN helps in localizing the objects. RCNN consists of three simple steps. The
input image is scanned for possible objects to generate region proposals. CNN runs
on top of each of the regions. The output of each CNN is fed into a classifier that
classifies the region, and then a linear regression is done to tighten the bounding
box of the object. For an obscure object such as a license plate image given as input,
the regions of the objects (number plate, numbers, fonts) are extracted, and these regions are
fed as input to the CNN. The network trains and extracts features for the objects. The
result is the trained network which can be used in the testing stage.
Stage II. Super Resolution Using Generative Adversarial Network. Now the
boundary of the detected object is extracted, and the region of interest is cropped and
fed as an input to the generative adversarial network for super resolution [4]. The
GAN usually consists of two networks: generator network and discriminator network.
Different styles of architectures can be used in both generator and discriminator
networks.
To enhance the overall quality of the reconstructed SR image, this section first
proposes a novel network design for generator and discriminator and then the
improved loss function. Training starts with an HR version of the input image and a
down-sampled lower version. To train the generator, the low-resolution image is given
as input, and the generator learns to produce an output close to the high-resolution
version; the obtained output is the super resolved image. The discriminator is then
trained to distinguish between
the images. The generator network uses a set of residual blocks that comprises ReLUs
and BatchNorm and convolution layers. After the low-resolution images pass through
these blocks, there are two deconvolution layers that increase the resolution. The
discriminator has eight convolutional layers that lead to sigmoid activation function
that produces the probabilities of whether the image is of high resolution, the real
image or super resolution, and artificial image. The architecture of GAN is shown in
Fig. 5 [4].
Loss Functions. The overall loss is a weighted sum of individual loss functions.
Content Loss. It is the Euclidean distance loss between the feature maps of the new
reconstructed image (i.e., the output) and the actual high-resolution training image.
Adversarial Loss. It encourages outputs similar to the original data distribution via
the negative log likelihood. With the help of this loss function, the generator outputs
larger-resolution images which look natural and retain almost the same pixel content
as the low-resolution version.
l^SR = l^SR_X + 10^(−3) l^SR_Gen   (1)
Recognition of Obscure Objects Using Super Resolution-Based … 465
Mean Square Error Loss. It is the summed squared difference between the generated
image and the target image; it is minimized when the generated image is close to
the target.
l^SR_MSE = (1 / (r^2 W H)) Σ_{x=1}^{rW} Σ_{y=1}^{rH} (I^HR_{x,y} − G_{θG}(I^LR)_{x,y})^2   (2)
VGG Loss. It is the summed squared difference taken in the feature space of the VGG
network instead of over pixels; features are matched instead. This makes the generator
much more capable of producing natural looking images than pure pixel matching alone.
l^SR_VGG/i,j = (1 / (W_{i,j} H_{i,j})) Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} (φ_{i,j}(I^HR)_{x,y} − φ_{i,j}(G_{θG}(I^LR))_{x,y})^2   (3)
Thus, the ROI is super resolved by using generative adversarial networks and
further provided for classification.
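The weighted combination of Eq. (1) can be sketched numerically; plain pixel MSE stands in for the content loss here (the content loss may instead be the VGG feature distance), and the discriminator outputs are placeholder probabilities:

```python
import numpy as np

def sr_loss(hr, sr, d_sr):
    """Perceptual loss of Eq. (1): content term (pixel MSE, as in Eq. 2)
    plus 1e-3 times the adversarial term, the negative log of the
    discriminator's probability that the SR image is real."""
    l_content = np.mean((hr - sr) ** 2)
    l_adv = np.sum(-np.log(d_sr))       # negative log likelihood
    return l_content + 1e-3 * l_adv

rng = np.random.default_rng(0)
hr = rng.random((4, 100, 100, 3))   # batch of high-resolution targets
sr = hr + 0.01                      # pretend generator output
d_sr = np.full(4, 0.9)              # placeholder discriminator outputs
print(sr_loss(hr, sr, d_sr))
```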
Stage III Classification Using AlexNet. AlexNet is a pre-trained convolutional
neural network model. As a result of learning, this network has rich feature repre-
sentations for a given wide range of images. This network takes an input image and
outputs a label for all the objects along with the probabilities. It supports transfer
learning [5].
This model has 25 layers, comprising five convolutional layers and three fully
connected layers. Normalization is done after each convolutional layer. ReLU is
performed after each convolutional and fully connected layer, as shown in Fig. 6 [5].
Dropout is applied before the first and the second fully connected layer. Dropout
and normalization are the two layers added to the AlexNet model other than the four
common layers of CNN. Dropout is a regularization technique to reduce overfitting
in neural networks. When a function is too closely fit to a limited set of data points,
overfitting occurs. It prevents complex co-adaptations on training data [5]. It is an
effective way to perform model averaging with neural networks. The normalization layer
performs a transformation which maintains the mean activation almost as 0 and the
activation standard deviation close to 1.
So first the obscure object is detected using RCNN, the detected image is super
resolved using the generative adversarial network, and finally the super resolved
image is classified using AlexNet.
There are three stages of training and testing. In stage I, object detection using
RCNN, the MSCOCO dataset is collected from Google and object detection is
implemented in Keras, Python.
In stage II, super resolution using generative adversarial networks, a shoe database
of 50,025 images is collected for training. The shoe is taken as one of the obscure
objects, and the super resolved shoe image is classified into eight classes: sandals,
shoes, slippers, pre-walker boots, over the knee, mid-calf, ankle, and boots, using
AlexNet in stage III of classification. The database images
containing the obscure object shoe are collected and shown in Table 1.
In total, 50,025 shoe images are collected under this category from the UT Zappos
50K dataset, as shown in Fig. 8 [7].
In order to train the AlexNet to recognize the super resolved image, the shoe images
(2100 images) are categorized into eight classes, namely ankle, knee calf, mid-calf,
over the knee, pre-walker boots, sandals, slippers, and shoes and trained for 500
iterations.
For RCNN, the input is the region of the object in the image. The region of the object
in the image is extracted and labeled. This region is stored which is used as an input for
RCNN. Since the region of the object is given as the input, this convolutional neural
network is known as the region-based convolutional neural network. This extraction
of regions helps in the easy recognition of objects even in complex background.
Here a mask RCNN model [6] pre-trained on the MSCOCO dataset is used for object
detection. The result of mask RCNN is shown in Fig. 9. Objects such as humans,
airplanes, and cars can be detected using the MSCOCO dataset. The objects which
cannot be recognized are defined as obscure objects, and GAN-based super resolution
is applied to them.
Thus, this pre-trained model [6] is used to detect obscure objects in an image.
The boundaries of obscure object which is the region of interest (shoe is considered
here) are extracted. By using the bounding box coordinates of ROI, the object is
cropped and given as the input to the generative adversarial network (GAN). Mask
RCNN provides good accuracy compared to other RCNN networks as it involves an
instance segmentation process.
For GAN, two phases of training are carried out with different datasets of images.
Table 2 presents the training stage of GAN.
Table 2 Training stages of GAN

                 Phase I (SRGAN X4)    Phase II (SRGAN X4)
Total images     682                   50,025
Training set     650                   951
Test set         31                    249
Dimensions       200 × 200 × 3         100 × 100 × 3
Epochs           500                   500
Batch size       4                     4
Input to Generator
First, the 100 × 100 image is downsampled to a 25 × 25 image and resized back to
100 × 100 to obtain the low-resolution image.
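The low-resolution input preparation above can be sketched with nearest-neighbor sampling; the interpolation method is an assumption, since the text does not specify one.

```python
def downsample(img, factor):
    """Keep every factor-th pixel in both dimensions (nearest-neighbor)."""
    return [row[::factor] for row in img[::factor]]

def upsample(img, factor):
    """Repeat each pixel factor times along both axes (nearest-neighbor)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

# 100x100 -> 25x25 -> back to 100x100: the size is restored but the
# high-frequency detail is discarded, yielding the low-resolution input.
hr = [[(r + c) % 256 for c in range(100)] for r in range(100)]
lr = upsample(downsample(hr, 4), 4)
assert len(lr) == 100 and len(lr[0]) == 100
assert lr[0][0] == lr[3][3] == hr[0][0]   # each 4x4 block shares one value
```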
AlexNet-based Classification
The first layer, the image input layer, requires input images of size 227-by-227-by-3,
where 3 is the number of color channels. In the implementation of AlexNet, to split
the network across two limited memory GPUs for training, some convolutional layers
use filter groups. In these layers, the filters are split into two groups. The layer splits
the input into two sections along the channel dimension and then applies each filter
group to a different section. The layer then concatenates the two resulting sections
together to produce the output. For example, in the second convolutional layer in
AlexNet, the layer splits the weights into two groups of 128 filters. Each filter has 48
channels. The input to the layer has 96 channels and is split into two sections with 48
channels. The layer applies each group of filters to a different section and produces
two outputs with 128 channels. The layer then concatenates these two outputs to give
a final output with 256 channels. The network has almost 62.3 million parameters,
which are used to define the labels for each class. This AlexNet model is used for
training. In total, 2100 images across the eight classes are trained under different
categories after creating a CSV file, and three different datasets are tested in order
to assess the classification accuracy (Fig. 10).
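The filter-group arithmetic described above can be checked with a short calculation. This is a sketch of the channel bookkeeping only, not AlexNet code; the helper names are illustrative.

```python
def grouped_conv_channels(in_ch, filters_per_group, groups):
    """Each group sees in_ch // groups input channels and emits
    filters_per_group output channels; the outputs are concatenated."""
    section = in_ch // groups
    out_ch = filters_per_group * groups
    return section, out_ch

def grouped_conv_weights(in_ch, filters_per_group, groups, k):
    """Weight count for a grouped k x k convolution (biases ignored)."""
    return groups * filters_per_group * (in_ch // groups) * k * k

# Second convolutional layer of AlexNet: 96 input channels split into
# two groups of 128 filters, each filter with 48 channels (5x5 kernels).
section, out_ch = grouped_conv_channels(96, 128, 2)
assert section == 48 and out_ch == 256
assert grouped_conv_weights(96, 128, 2, 5) == 2 * 128 * 48 * 25
```

Note that grouping halves the weight count relative to an ungrouped 96-to-256 convolution, which is why the original AlexNet could be split across two limited-memory GPUs.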
It can be seen that the low-resolution images are classified with an accuracy
of 68%, whereas the super-resolved images are classified with a higher accuracy of 84%.
Hence, super resolution helps in the recognition and classification of obscure objects.
Performance Metrics
Table 3 shows the performance analysis with the PSNR and SSIM metrics, and Table 4
shows the AlexNet accuracy results.
4 Conclusion
This paper has implemented an obscure-object recognition system in which the
obscure objects are detected using RCNN and then super resolved using generative
adversarial networks. The resolution-improved images are then classified using
a standard AlexNet classifier. The experiment is conducted in two phases:
one using the MSCOCO dataset, and the other using the collected shoe dataset. The
proposed work is simulated in the Google Colab framework with the Keras API (TensorFlow
backend). The experimental results show that the classification accuracy
increases by 16% for the super-resolved obscure objects. This task finds
direct applications in footprint detection and recognition, abnormal event detection
from surveillance videos, license plate recognition, and extracting information from
satellite imagery, as well as obscure-object recognition.

Table 3 Performance metrics — GAN

           Peak signal-to-noise ratio (PSNR)          Structural similarity (SSIM)
Phase I    Maximum value: 43.4 dB (range: 40–43 dB)   Maximum value: 0.356
Phase II   Maximum value: 48.2 dB (range: 43–48 dB)   Maximum value: 0.422

The proposed method helps in
better recognition of obscure objects [1, 8].
References
1. https://www.cis.upenn.edu/%7Ejshi/ped_html/
2. Huang J-B, Singh A, Ahuja N (2015) Single image super resolution from transformed
self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp 5197–5206
3. Fu L-C, Liu CY (2001) Computer vision based object detection and recognition for vehicle
driving. In: Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automa-
tion (Cat. No.01CH37164), Seoul, South Korea, pp 2634–2641, vol 3. https://doi.org/10.1109/
ROBOT.2001.933020
4. Ledig C et al (2017) Photo-realistic single image super-resolution using a generative adversarial
network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Honolulu, HI, pp 105–114. https://doi.org/10.1109/CVPR.2017.19
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional
neural networks. In: Advances in neural information processing systems, pp 1097–1105
6. https://github.com/matterport/Mask_RCNN
7. http://vision.cs.utexas.edu/projects/finegrained/utzap50k/
8. Girshick R, Donahue J, Darrell T, Malik J (2016) Region-based convolutional networks for
accurate object detection and segmentation. IEEE Trans Pattern Anal Machine Intell 38(1):142–
158. https://doi.org/10.1109/TPAMI.2015.2437384
Low-Power U-Net for Semantic Image
Segmentation
This work is supported by TEQIP-III project at NIT Meghalaya and was funded by World Bank,
NPIU, and MHRD, Govt. of India.
1 Introduction
Image segmentation is the task of dividing an image into various regions correspond-
ing to the distinct characteristics of the pixels. Semantic image segmentation is the
task of labeling each pixel of an image to a corresponding class. Let S be the complete
contiguous region occupied by an image. Then, image segmentation can be viewed
as a process that divides S into n sub-regions, S1, S2, …, Sn, such that

1. ∪_{i=1}^{n} Si = S;
2. Si is a connected set, for i = 1, 2, …, n;
3. Si ∩ Sj = ∅ for all i and j with i ≠ j;
4. F(Si) = TRUE for i = 1, 2, …, n;
5. F(Si ∪ Sj) = FALSE for any adjacent regions Si and Sj,

where F(Si) is a logical predicate defined over the points in set Si, and ∅ is the null
set. The basic problem in segmentation is to divide an image into regions that satisfy
the above conditions. Humans bring ample knowledge to segmentation, but
putting that knowledge into effect would require substantial human effort, computation
time, and a database with substantial domain knowledge. Deep neural network
(DNN) segmentation deals with these problems by extracting the domain
understanding from a database of labeled pixels. An image segmentation
neural network can process small areas of an image to extract simple characteris-
tics. A decision-making mechanism or another neural network can then integrate
these characteristics to label the areas of an image accordingly. When compared to
other deep learning techniques, convolutional neural networks (CNNs) have shown
exceptional functioning in various computer vision problems like segmentation [14],
detection of objects [19], etc.
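The partition conditions above can be checked mechanically on a toy label map. The sketch below verifies conditions 1 and 3 (coverage and pairwise disjointness) for an assumed 4 × 4 labeling; the predicate F is not modeled.

```python
# Toy 4x4 label map with three regions (each integer is a region id).
labels = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [3, 3, 2, 2],
          [3, 3, 2, 2]]

# Collect the pixel set of each region Si.
regions = {}
for r, row in enumerate(labels):
    for c, k in enumerate(row):
        regions.setdefault(k, set()).add((r, c))

# Condition 1: the union of all regions covers the whole image S.
S = {(r, c) for r in range(4) for c in range(4)}
assert set().union(*regions.values()) == S

# Condition 3: regions are pairwise disjoint (Si ∩ Sj = ∅ for i ≠ j).
ids = sorted(regions)
assert all(regions[a].isdisjoint(regions[b])
           for i, a in enumerate(ids) for b in ids[i + 1:])
```

A semantic segmentation network effectively produces such a label map, with one label per pixel.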
CNNs play a major role in the development of computer vision with deep learning.
They are almost similar to normal neural networks and specialize in capturing the
temporal and spatial information of the inputs. A CNN/ConvNet makes the explicit
assumption that the inputs are images, which makes it easier to embed
explicit properties into the architecture. This reduces the number of variables
in the network and makes CNNs more efficient than feed-forward neural nets.
CNNs contain filters/kernels and biases that learn various patterns/objects in parallel
specific to the images during the training phase and use that knowledge to identify
the objects during inference. The individual building blocks of a CNN are the
convolutional, pooling, and flattening layers [9].
The motivation behind quantizing neural networks is to make the models compact
with little or no loss in accuracy. With quantized neural networks, bitwise
operators can be used rather than floating point operations to perform forward and
backward passes. Particularly using fixed point operations saves energy [7] and is
much suitable where power consumption is a critical factor. The components that can
be quantized in a neural network are weights, activations and gradients. Gradients are
quantized to reduce the communication cost between processing units when training,
whereas weights and activations are quantized to reduce the computational intensity
and network memory footprint during inference. Quantized neural networks (QNN)
can have independent data representation of inputs, weights and output activations,
as well as different bit widths in different layers of the same networks.
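As a concrete illustration of weight quantization, the following sketch applies uniform affine quantization to a small weight vector. The bit width, zero point handling, and rounding scheme are illustrative choices, not those of any cited scheme.

```python
def quantize(x, bits):
    """Uniform affine quantization of floats to unsigned `bits`-bit codes.
    Returns the integer codes plus the (scale, offset) needed to decode."""
    lo, hi = min(x), max(x)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) for v in x], scale, lo

def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]

w = [-0.51, -0.1, 0.0, 0.23, 0.49]
codes, scale, lo = quantize(w, 8)
assert all(0 <= c <= 255 for c in codes)       # fits in 8 bits
w_hat = dequantize(codes, scale, lo)
# Reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(w, w_hat))
```

Lower bit widths shrink `levels`, enlarging `scale` and hence the worst-case reconstruction error, which is the accuracy/compactness trade-off discussed above.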
2 Related Works
Ronneberger et al. [14] proposed a network that can give accurate segmentations
even when trained on a very small number of images (on the order of tens). The
network consists of a contracting path (downsampling/subsampling path), where
convolutions are applied to the input image and features are extracted, and an expansive
path (upsampling path), which uses the features extracted in the previous step to
construct a segmented image using up-convolutions. This network is more robust
to noise and is especially suitable for medical image segmentation. Moons et al.
[12] presented a methodology to reduce the consumption of energy of embedded
neural networks by presenting quantized neural networks. Also, a hardware energy
model is presented for topology selection. In [11], authors presented wide reduced-
precision network (WRPN) for quantization of weights and activations. It was found
that the activations occupy more memory than weights. Hence, the authors adopted
a strategy of using increased number of filters in each layer to compensate accuracy
degradation due to quantization. DNNs have a large number of variables, but not all
of them are equally significant. DNNs are also fairly immune to noise [18]. Random
noise functions act as regularizers, and adding noise to inputs or weights occasionally
can attain more desirable performance. In a quantized neural network (QNN),
low-precision operations can be considered as random noise. Recent theories like the
ones discussed in [2] suggest that QNNs still preserve many significant properties
of their full-precision equivalents. Implementation of CNNs on FPGAs came into
sight in the 90s when virtual image processor was introduced by Cloutier et al. [3].
Virtual image processor is a single-instruction stream multiple-data stream (SIMD)
multiprocessor architecture. It achieves better performance by using approximate
computing to reduce computational intensity. Umuroglu et al. [20] introduced neu-
ral network library for fast and easy implementation of neural networks on FPGAs.
The authors employed a novel set of optimizations that allowed robust mapping of
binary neural network (BNN) to hardware. They implemented fully connected layer,
convolution layer and pooling layer. They achieved significant performance with
low power on the MNIST, CIFAR-10, and SVHN datasets. Nurvitadhi et
al. [13] recently assessed emerging DNN algorithms on present-day graphics
processing units (GPUs) and FPGAs. The results show that the present trends in
CNNs may support FPGAs. In some cases, FPGAs may offer greater performance
than GPUs. Suda et al. [17] implemented extensible layers on FPGA and proposed a
methodical design space exploration methodology to increase the throughput of an
OpenCL-based FPGA accelerator for a given CNN model, taking into account the
FPGA resource limitations. Farabet et al. [5] presented a coherent execution of CNN
on a low-end digital signal processor (DSP)-aimed FPGAs. The execution utilizes
the intrinsic parallelism of CNN and yields full benefit from multiply and accumulate
(MAC) units on the FPGA. They demonstrated that with proper memory bandwidth
to an external memory, interesting performance can be achieved. Faraone et al. [6]
argue that, instead of concentrating on developing robust designs to speed up the
well-known low-precision CNNs, we should also attempt to alter the network to befit
3 Methodology
Fig. 2 Flowchart representation of the development flow: choose a problem and
prepare a dataset that suits the problem; choose a network and train it; if the expected
mIoU (chosen metric) is not reached, revise and retrain; quantize the model and
evaluate its mIoU; once there is negligible or no reduction in mIoU, export the
compiled model into the SDK and create a Linux application; run the application and
copy the results back into the host PC for postprocessing; stop.
480 V. Venkata Bhargava Narendra et al.
U-Net is a popular CNN. The architecture is built upon the fully convolutional network
(FCN) [15] such that it can work even with a very small number of training images. The
network comprises a downsampling path and an upsampling path. The downsampling
path is made up of repeated implementation of 3 × 3 sliding-window convolutions
where each convolution is followed by a nonlinearity (ReLU), 2 × 2 max pooling unit
with stride 2 for subsampling. At every subsampling step, the number of feature maps
is doubled. The upsampling path is similar to the downsampling
path except that the pooling layers in the downsampling path are replaced with
deconvolutions or up-convolution to improve the resolution of the feature maps.
High-resolution feature maps, i.e., feature maps from the second convolution layer
in each step of the downsampling path are concatenated with the output from the
upsampling path. The network is implemented using the Caffe framework [8] with
some minor modifications in the architecture so that the network can be deployed on
an FPGA. The modifications are as follows:
• Replacing the un-pooling layer with deconvolution layer in the expansive path
• Replacing all PReLU with ReLU
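The halve-the-resolution/double-the-channels pattern described above can be tabulated with a short sketch; the four resolution levels are read off the architecture figure and are otherwise an assumption.

```python
# Feature-map bookkeeping for the downsampling path: each step applies a
# 2x2 max pool with stride 2 (halving the spatial size) and doubles the
# number of feature maps.
size, ch = 512, 64          # input resolution and first-level feature maps
down = [(size, ch)]
for _ in range(3):          # three pooling steps assumed from the figure
    size //= 2
    ch *= 2
    down.append((size, ch))
assert down == [(512, 64), (256, 128), (128, 256), (64, 512)]

# The upsampling path mirrors this, restoring 512x512 at the output,
# with skip connections concatenating the matching downsampling maps.
up = list(reversed(down))
assert up[-1] == (512, 64)
```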
3.5 Dataset
The data used to train the network is obtained from [4]. The dataset concentrates on
semantic understanding of urban street scenes. It consists of 5000 images with
high-quality annotations and 20,000 images with coarse annotations, collected in 50
cities over a span of several months during daytime. The dataset contains 30 classes
that are both instance-wise and densely annotated, so it can be used for instance
segmentation and semantic segmentation, respectively.
Images with high-quality annotations are selected for the training of the network.
3.6 Training
The network is trained on 2975 images with high-quality annotations using Caffe
framework. Adam optimizer [10], a stochastic gradient descent optimization strat-
egy, is used to achieve convergence. The images are resized to 512 × 512 and are
normalized per channel on the fly (Fig. 4).
Training the network greatly depends on the choice of hyperparameters, and a wrong
selection can lead to overfitting, underfitting, or failure to train at all. The most
important hyperparameter is the learning rate (0.0005), which tells the optimizer how far
to move the weights in the direction of the gradient. As described in [14], a high
momentum is used (0.9) such that a great number of samples that are already seen
during the training decide the update in the ongoing optimization step. The large
weights are penalized by a factor of 0.0005. The “softmax with loss layer” of the
Caffe framework is employed to calculate the cost function. The network processes
eight (batch size) RGB training images for each iteration. The network is trained for
2000 iterations on two Nvidia Quadro RTX 5000 GPUs simultaneously (multi-GPU
training). To reduce the training complexity, the network is trained on 19 classes out
of the 30 classes available. The classes and their corresponding legends are shown
in Fig. 5.
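The hyperparameters above could be collected in a Caffe solver definition along these lines. This is a hypothetical sketch: the file name and the exact solver fields chosen by the authors are assumptions.

```
net: "unet_train.prototxt"   # hypothetical network definition file
type: "Adam"                 # Adam optimizer [10]
base_lr: 0.0005              # learning rate
momentum: 0.9                # high momentum, as in [14]
weight_decay: 0.0005         # penalty factor on large weights
max_iter: 2000               # training iterations
solver_mode: GPU
```

The batch size of eight would be set in the data layer of the (hypothetical) `unet_train.prototxt` rather than in the solver.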
[Architecture diagram: the 512 × 512 × 3 input passes through pairs of Conv2d 3 × 3
+ BN + ReLU layers at 512 × 512 × 64, with 2 × 2 pooling steps down to 256 × 256 ×
128, 128 × 128 × 256, 64 × 64 × 512, and 32 × 32 × 512; 2 × 2 upsampling steps with
further Conv2d 3 × 3 + BN + ReLU layers mirror the path back to 512 × 512 × 64, and
a final Conv2d 1 × 1 + Softmax produces the 512 × 512 output. Input shapes of a layer
are shown on the top-left of each box and input channels on the top-right for the
downsampling path, and on the bottom-left and bottom-right, respectively, for the
upsampling path.]
ZedBoard™ is a low-cost development kit that uses the Xilinx Zynq®-7000
APSoC. The board comprises all the necessary connections, ports, and assisting functions
to facilitate a variety of uses. The expandable characteristics of the board make
it best suited for quick prototyping and proof-of-concept development. The PS-PL
connections of the ZedBoard are made as shown in Fig. 6. The hardware utilization
reports are shown in Table 2.
As the training progressed, regular mIOU measurements were taken to score the
models against the Cityscapes validation dataset (500 images). The mIOU plot is
shown in Fig. 7. (The scripts required to develop the PS-PL system can be found at
“LFAR: Porting the ResNet-50 CNN application to a ZedBoard”.) From Fig. 7, it is
observed that after 2000 iterations, the mIOU
achieved is 0.27124. This model is used for quantization. Three different quantized
models are generated each with different precision which are
• INT8 (Both the weights and activations are 8-bit quantized)
• A6W8 (Activations are 6-bit quantized, and weights are 8-bit quantized)
• INT4 (Both the weights and activations are 4-bit quantized).
The mIOU of the floating point(FP) model and the quantized models is shown in
Table 3. The quantized models were scored against the Cityscapes validation dataset.
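The mIOU metric used for scoring the models can be sketched as follows: a minimal per-class intersection-over-union over flattened label arrays. Skipping classes absent from both prediction and ground truth is an assumed convention.

```python
def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for k in range(num_classes):
        p = {i for i, v in enumerate(pred) if v == k}
        g = {i for i, v in enumerate(gt) if v == k}
        union = p | g
        if union:                       # skip classes absent from both
            ious.append(len(p & g) / len(union))
    return sum(ious) / len(ious)

# Six-pixel toy example with three classes.
pred = [0, 0, 1, 1, 2, 2]
gt   = [0, 1, 1, 1, 2, 0]
# per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2
assert abs(mean_iou(pred, gt, 3) - 0.5) < 1e-9
```

For Cityscapes scoring, `pred` and `gt` would be the flattened 512 × 512 label maps restricted to the 19 trained classes.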
From Table 3, it can be observed that there is almost no loss of mIOU for 8-bit
quantization of weights and activations, whereas there is a drastic reduction of mIOU
for 4-bit quantization of the weights and activations. The loss of the A6W8
quantized model lies between those of the INT8 and INT4 quantized models. The
8-bit quantized model is compiled for the development of the application. Finally,
a multithread segmentation application was developed and deployed on FPGA. All
500 images of the validation dataset could not fit on the buffer due to hardware
limitations. So, the network was tested on five images, and out of which, a sample
image and inference results of GPU and FPGA are shown in Fig. 8. The performance
profiling information of the 8-bit neural network on FPGA is shown in Table 4.
The profiler gives the following details for each layer.
• Workload: The computation workload of DPU kernel in the unit of MOPS.
• Memory: The memory space in the unit of MB that is required by the DPU for
feature maps of hidden layers.
• RunTime: Time taken to execute each layer.
• Performance: The computation efficiency or performance in the unit of GOPS.
• Utilization: The effective use of DPU in %.
• MB/S: The average bandwidth of DDR memory access.
The power consumption and execution time of the network on the graphics processing
unit (Nvidia Quadro RTX 5000) and the FPGA (ZedBoard) are shown in Table 5.
The ZedBoard execution time includes reading/preparing the input and writing
the output, whereas the GPU measurement only includes the forward inference time
of the models.
5 Discussion
The resource utilization of the system that is deployed on the programmable logic
part of the ZedBoard is shown in Table 2. As the network is large, we can observe
from Table 2 that almost all the block RAMs and DSPs are utilized, along with a
significant portion of the other resources. U-Net is developed using the Caffe
framework. The network is trained for 2000 iterations on two Nvidia Quadro RTX
GPUs and achieved an mIOU of 0.272214. The average forward pass time when
the network is deployed on GPU is 193.986 ms. The power consumed by GPU
for forward inference is 178 W. The network is 8-bit quantized and deployed on
ZedBoard. The time taken for execution of network (includes reading/preparing &
saving/writing the image) is 2251.94 ms. The power consumed by the ZedBoard
is 55.5 W which is approximately one-third of the power consumed by GPU. The
lack of acceleration of the application can be attributed to the insufficient amount
of resources on the ZedBoard. One possible way of achieving acceleration is to prune
the network and remove redundant variables. In this way the network is
further compressed, acceleration may be achieved, and power consumption can be
further reduced.
6 Conclusion
References
3. Cloutier J, Cosatto E, Pigeon S, Boyer FR, Simard PY (1996) VIP: An FPGA-based processor
for image processing and neural networks. In: Proceedings of fifth international conference
on microelectronics for neural networks. IEEE, pp 330–336. https://doi.org/10.1109/MNNFS.
1996.493811
4. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele
B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of
the IEEE conference on computer vision and pattern recognition, pp 3213–3223. https://doi.
org/10.1109/CVPR.2016.350
5. Farabet C, Poulet C, Han JY, LeCun Y (2009) CNP: An FPGA-based processor for convolu-
tional networks. In: 2009 International conference on field programmable logic and applica-
tions. IEEE, pp 32–37. https://doi.org/10.1109/FPL.2009.5272559
6. Faraone J, Gambardella G, Fraser N, Blott M, Leong P, Boland D (2018) Customizing low-
precision deep neural networks for FPGAs. In: 2018 28th International conference on field
programmable logic and applications (FPL). IEEE, pp 97–973. https://doi.org/10.1109/FPL.
2018.00025
7. Guo Y (2018) A survey on methods and theories of quantized neural networks. arXiv preprint
arXiv:1808.04752
8. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T
(2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd
ACM international conference on Multimedia, pp 675–678. https://doi.org/10.1145/2647868.
2654889
9. Kalade S (2019) Deep learning. In: Exploring Zynq MPSoC: With PYNQ and machine learning
applications, Chap. 20, 1 edn. Strathclyde Academic Media, pp 481–508
10. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint
arXiv:1412.6980
11. Mishra A, Nurvitadhi E, Cook JJ, Marr D (2017) WRPN: wide reduced-precision networks.
arXiv preprint arXiv:1709.01134
12. Moons B, Goetschalckx K, Van Berckelaer N, Verhelst M (2017) Minimum energy quantized
neural networks. In: 2017 51st Asilomar conference on signals, systems, and computers. IEEE,
pp 1921–1925. https://doi.org/10.1109/ACSSC.2017.8335699
13. Nurvitadhi E, Venkatesh G, Sim J, Marr D, Huang R, Ong Gee Hock J, Liew YT, Srivatsan
K, Moss D, Subhaschandra S, Boudoukh G (2017) Can FPGAs beat GPUs in accelerating
next-generation deep neural networks? In: Proceedings of the 2017 ACM/SIGDA interna-
tional symposium on field-programmable gate arrays. FPGA’17. Association for Computing
Machinery, New York, NY, USA, pp 5–14. https://doi.org/10.1145/3020078.3021740
14. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image
segmentation. In: International conference on medical image computing and computer-assisted
intervention. Springer, pp 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
15. Shelhamer E, Long J, Darrell T (2017) Fully convolutional networks for semantic segmentation.
IEEE Trans Pattern Anal Machine Intell 39(4):640–651. https://doi.org/10.1109/TPAMI.2016.2572683
16. Su J, Fraser NJ, Gambardella G, Blott M, Durelli G, Thomas DB, Leong PH, Cheung PY (2018)
Accuracy to throughput trade-offs for reduced precision neural networks on reconfigurable
logic. In: International symposium on applied reconfigurable computing. Springer, pp 29–42.
https://doi.org/10.1007/978-3-319-78890-6_3
17. Suda N, Chandra V, Dasika G, Mohanty A, Ma Y, Vrudhula S, Seo JS, Cao Y (2016)
Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neu-
ral networks. In: Proceedings of the 2016 ACM/SIGDA international symposium on field-
programmable gate arrays. FPGA’16. Association for Computing Machinery, New York, NY,
USA, pp 16–25. https://doi.org/10.1145/2847263.2847276
18. Sung W, Shin S, Hwang K (2015) Resiliency of deep neural networks under quantization. arXiv
preprint arXiv:1511.06488
19. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Low-Power U-Net for Semantic Image Segmentation 491
20. Umuroglu Y, Fraser NJ, Gambardella G, Blott M, Leong P, Jahre M, Vissers K (2017) FINN: a
framework for fast, scalable binarized neural network inference. In: Proceedings of the 2017
ACM/SIGDA international symposium on field-programmable gate arrays, pp 65–74. https://
doi.org/10.1145/3020078.3021744
21. Yazdanbakhsh A, Park J, Sharma H, Lotfi-Kamran P, Esmaeilzadeh H (2015) Neural acceler-
ation for GPU throughput processors. In: Proceedings of the 48th international symposium on
microarchitecture, pp 482–493. https://doi.org/10.1145/2830772.2830810
Electrocardiogram Signal Classification
for the Detection of Abnormalities Using
Discrete Wavelet Transform
and Artificial Neural Network Back
Propagation Algorithm
1 Introduction
The electrical activity of the cardiac muscle, reflecting its periodic contraction and
relaxation, is recorded as the electrocardiogram (ECG) signal or waveform. Analysis
of the ECG waveform is used for the diagnosis of different heart diseases or
abnormalities, and electrocardiography is an essential tool for assessing the condition
of the cardiac muscle. The stages of ECG signal processing are preprocessing
(removal of noise), removal of baseline wander, extraction of features, selection of
extracted features, and finally detection of cardiac arrhythmias, i.e., classifying the
abnormal state of the heart from the normal state. An ECG signal has five basic
peaks, denoted the P, Q, R, S, and T peaks; rarely, a U peak may also be present.
The P peak in the ECG waveform denotes atrial depolarization, the Q, R, and S
peaks (usually represented as the QRS complex) denote the depolarization of the
ventricles, and the T peak denotes ventricular repolarization [1]. One of the most
essential aspects of analyzing the ECG waveform is the shape of the QRS complex.
The measured ECG waveform may differ for the same subject when the heart beat is
recorded over different time durations, and similar waveforms can be acquired by the
ECG device for different subjects [2]. The pacemaker cells within the sinoatrial node
(SA node), located at the top of the right atrium, generate and regulate the heart
rhythm periodically. The normal rhythmic beat of the cardiac muscle is highly
regular, and atrial depolarization is followed by ventricular depolarization. In the
arrhythmic state, by contrast, the cardiac rhythm is irregular and may be either too
fast or too slow.
Over the years, several methods for analyzing the ECG signal to detect cardiac
arrhythmias have been implemented in order to enhance the sensitivity and accuracy
of classification. These methodologies include computational intelligence algorithms
such as autoregressive coefficient modeling [3], the wavelet coefficient technique [4],
neural networks using radial basis functions [5], self-organizing maps (SOM) [5, 6],
and fuzzy C-means clustering techniques [7].
The functional block diagram of the proposed ECG waveform classification system
is shown in Fig. 1. The overall technique is divided into three sections: preprocessing
of the ECG signals, feature extraction from the ECG signals, and classification of
the ECG signals.
The raw electrocardiogram signal is acquired from the MIT-BIH arrhythmia
database. Initially, preprocessing is carried out to remove noise from the
electrocardiogram signals along with removal of the baseline wander by thresholding;
these components are unwanted in ECG signal processing. After preprocessing,
morphological features and features based on the discrete wavelet transform are
extracted and passed to the classification stage in a usable form. In the final stage,
the extracted features are selected and classified using an artificial neural network
with the back propagation algorithm (BPA). The classification output categorizes
the ECG signal as normal or abnormal.
3 Collection of Database
In this proposed study, the ECG waveforms have been acquired from the MIT-BIH
cardiac arrhythmia database, obtained from the PhysioNet resource.
496 M. Ramkumar et al.
The electrocardiogram signals in the MIT-BIH cardiac arrhythmia database were
recorded at the laboratory of Beth Israel Hospital. The database consists of forty-eight
files, each containing a half-hour (thirty-minute) recording, segregated into two
sections: the first contains twenty-three files numbered from 100 to 124 (excluding a
few files in between), and the second contains twenty-five files numbered from 200
to 234 (excluding a few files in between) [8, 9].
This arrhythmia database comprises about 109,000 beat labels in total. The ECG
signals acquired from the MIT-BIH PhysioNet arrhythmia database are read via a
header file in text format, a binary signal file, and a binary annotation file. The
header files provide brief descriptions of the ECG signals, such as the total number
of samples, the sampling frequency, the ECG waveform format, the ECG lead type,
the total number of ECG leads, the history of the patients or subjects from whom the
ECG signals were acquired, and the patients' medical data. The ECG signals are
stored in format 212, wherein each sample is interleaved by lead and stored in 12-bit
form, while the beat annotations are held in the binary annotation file [9].
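The 212 storage format mentioned above packs two 12-bit samples into every three bytes; a minimal reader might look like the sketch below. It is based on the published WFDB format-212 layout, not on the authors' code.

```python
def unpack_212(data):
    """Unpack MIT-BIH format-212 bytes: two 12-bit two's-complement
    samples are packed into every group of three bytes."""
    samples = []
    for i in range(0, len(data) - len(data) % 3, 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        s1 = ((b1 & 0x0F) << 8) | b0     # low nibble of b1 + first byte
        s2 = ((b1 & 0xF0) << 4) | b2     # high nibble of b1 + third byte
        # Sign-extend each 12-bit value (range -2048..2047).
        samples += [s - 4096 if s > 2047 else s for s in (s1, s2)]
    return samples

# Three bytes encoding the samples 1 and -1 (codes 0x001 and 0xFFF).
assert unpack_212(bytes([0x01, 0xF0, 0xFF])) == [1, -1]
```

For a two-lead record, the decoded samples alternate between the leads, matching the interleaved storage described above.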
The initial stage of processing the ECG signal is preprocessing, in which noise must
be eliminated from the acquired input signals using the discrete wavelet transform
(DWT). In ECG signal preprocessing, noise elimination follows different strategies
for the different sources of noise [10]. Preprocessing is carried out so that noiseless
features can be extracted from the denoised ECG component, which enhances the
efficiency of the classification system [11]. ECG signal preprocessing mainly
comprises two processes: ECG signal denoising and removal of the baseline
(threshold) wander, both using the multiresolution wavelet transform technique.
a. ECG Signal Denoising
At this stage, the various noise components are removed using the fourth-order Daubechies wavelet (db4). The denoising procedure consists of three main steps [10]. The wavelet transform produces two kinds of coefficients: the approximation coefficients, which capture the lower-frequency content, and the detail coefficients, which at the initial levels capture the higher-frequency content of the one-dimensional
Electrocardiogram Signal Classification for the Detection … 497
discrete signal. To denoise the ECG waveform with the DWT, the signal is decomposed into components exhibited at various scales. In the first step, a suitable wavelet is selected and the signal is decomposed to level N. In the second step, a threshold is selected; in this study, an automatic thresholding technique is used, and the resulting soft threshold is applied to the detail coefficients at every level. In the final step, the signal is reconstructed from the level-N approximation coefficients and the detail coefficients modified at levels 1 to N [12]. The original ECG signal is shown in Fig. 2, and the denoised (filtered) ECG signal in Fig. 3.
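The three steps above (decompose, soft-threshold the detail bands, reconstruct) can be sketched in Python. For brevity this sketch uses the Haar wavelet instead of db4 and the median-based universal threshold as the "automatic" thresholding rule; both are assumptions, since the excerpt does not spell out its exact filters or rule.

```python
import numpy as np

def haar_dwt(x):
    # one decomposition level: pairwise averages give the approximation
    # coefficients, pairwise differences give the detail coefficients
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_idwt(a, d):
    # exact inverse of haar_dwt
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def wavelet_denoise(x, levels=3):
    """Decompose to `levels`, soft-threshold every detail band, reconstruct."""
    details, a = [], np.asarray(x, float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        # universal threshold with a median-based noise estimate: one common
        # automatic choice, assumed here since the paper does not name its rule
        t = np.median(np.abs(d)) / 0.6745 * np.sqrt(2.0 * np.log(len(x)))
        details.append(np.sign(d) * np.maximum(np.abs(d) - t, 0.0))
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a
```

The signal length must be divisible by 2**levels; a production version would pad the signal and use db4 (e.g. via PyWavelets).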
b. Removal of Baseline Wander
of Y using a moving average filter, and the smoothed result is then obtained from the column vector Y [12]. In this work, the span used to obtain the smoothed data is set to 150; finally, the smoothed electrocardiogram signal is subtracted from the acquired raw ECG waveform. The resulting signal is therefore free of baseline wander.
Once denoising and baseline wander removal have been applied, the next stage of ECG signal classification is extracting features from the ECG waveform for analysis. Features are compact processing parameters that allow the system to represent and compute on the ECG data, and extracting them, here by means of the wavelet transform, is one of the most essential parts of the pipeline. Feature extraction is likewise one of the most important steps in pattern recognition, and ECG features can be extracted in several forms. In this study, two types of features are extracted from the ECG waveform. They are as follows.
(a) ECG signal morphological features.
(b) Features on the basis of wavelet coefficients.
Electrocardiogram Signal Classification for the Detection … 499
(Fig. 4 Approximation coefficient and detail coefficient of the ECG wave component, plotted over samples 0–2500)
Extracting features and selecting among them plays an essential role in pattern recognition. The coefficients obtained from the discrete wavelet transform (DWT) describe the distribution of the signal over the time and frequency domains [13]. Figure 4 depicts the approximation and detail coefficients of the ECG wave component. The selected detail and approximation coefficients of the ECG wavelet decomposition can therefore serve as a feature vector representing the signal. Using the wavelet coefficients directly as inputs to the neural network classifier, however, would increase the number of neurons in the hidden layer, with a harmful effect on the computational behavior of the network. Hence, dimensionality reduction has to be applied at the feature extraction stage; for this, a statistical (probabilistic) analysis of the wavelet coefficients is carried out. The statistical features describing the time–frequency distribution of the ECG signal are as follows.
(a) Mean of the sub-band coefficients of detail and the approximation.
(b) Standard deviation of the sub-band coefficients of detail and the approximation.
(c) Variance of the sub-band coefficients of detail and the approximation in each
level.
Hence, forty-eight wavelet-based features are obtained for each ECG signal. In addition to these statistical features, ECG morphological features are also acquired for processing into the classifier [11].
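The three statistics per sub-band can be computed with a small helper; this sketch assumes the approximation and detail coefficient arrays of the decomposition are already available as a list (e.g. from a wavelet decomposition routine).

```python
import numpy as np

def subband_statistics(coeff_bands):
    """Return mean, standard deviation, and variance of each wavelet
    sub-band, concatenated into one feature vector."""
    feats = []
    for band in coeff_bands:
        band = np.asarray(band, float)
        feats.extend([band.mean(), band.std(), band.var()])
    return np.array(feats)
```

With 16 sub-bands, this yields the 48 wavelet features mentioned above.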
The morphological features are the standard deviation of the R–R interval; the P–R, P–T, S–T, T–T, and Q–T peak intervals; the maximum amplitudes of the P, Q, R, S, and T peaks and of the QRS complex; and the total count of R peaks. In all, therefore, 64 features are obtained and fed as input to the artificial neural network classifier [14]. Since the feature values can vary widely in magnitude, normalization is needed to standardize all features to a common scale. Normalizing to zero mean and unit standard deviation allows the artificial neural network to treat each input as equally important over its value range.
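The mean/standard-deviation normalization mentioned above is the usual z-score standardization, sketched here column-wise over a feature matrix (rows = signals, columns = features):

```python
import numpy as np

def zscore_normalize(features):
    """Standardize each feature column to zero mean and unit standard
    deviation so the network treats every input on a comparable scale."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant features
    return (features - mu) / sigma
```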
The artificial neural network (ANN) is one of the best-known computational intelligence techniques, motivated by the structure of biological neural networks. A neural network consists of an interconnected cluster of artificial neurons. This study uses a neural network for pattern recognition in which the input units carry the feature vector and the output units denote the pattern classes to be distinguished. Each feature vector is presented to the input layer, with each unit's output set to the corresponding element of the vector. Each hidden-layer unit computes a weighted sum of its inputs to form a scalar net activation; the net activation is thus the inner product of the weight vector and the input vector at that hidden unit [15].
a. Back Propagation Algorithm
An artificial neural network trained with the back propagation algorithm can acquire, in real time, the mapping between input and output information within a multilayer network. Back propagation executes a gradient descent search that adjusts the weights so as to reduce the mean square error (MSE) between the desired output and the output actually produced by the network. The algorithm is known to achieve high precision and has been adapted to most classification applications in which the system works from generalized information and designated rules.
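The MSE gradient computation at the heart of back propagation can be sketched for a two-layer sigmoid network; the layer sizes here are illustrative, not the paper's. A training loop would simply subtract `eta` times each returned gradient from the corresponding parameter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)    # hidden units: inner product, then squashing
    out = sigmoid(h @ W2 + b2)  # output units
    return h, out

def mse(out, y):
    return np.mean((out - y) ** 2)

def backprop(X, y, W1, b1, W2, b2):
    """Gradients of the MSE with respect to every parameter (the delta
    rule propagated backwards through the two layers)."""
    h, out = forward(X, W1, b1, W2, b2)
    d_out = 2.0 * (out - y) * out * (1.0 - out) / y.size
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    return X.T @ d_h, d_h.sum(axis=0), h.T @ d_out, d_out.sum(axis=0)
```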
The MIT-BIH cardiac arrhythmia database is partitioned into two classes: normal ECG and abnormal ECG. Each individual file, acquired as data in 60-s recordings, is assigned to one of the two classes on the basis of the total heart beat count and the maximum peak attained. Of the forty-eight half-hour ECG recordings, only forty-five were chosen for processing as mentioned earlier (twenty-five records from the normal class and the remaining twenty from the abnormal class). Of the forty-eight files in the PhysioNet database, the subject files 102, 107, and 217 were not considered in this classification study. Table 1 lists the record numbers used from the MIT-BIH PhysioNet database. In all, sixty-four features are used, split into the two classes of features mentioned above: those based on the DWT and those obtained from the ECG signal morphology. Of the 64 features, forty-eight are based on the discrete wavelet transform and the remaining sixteen are morphological ECG features. These features are fed as input to the artificial neural network trained with the back propagation algorithm. To proceed with the simulation, the neural network is first trained on the data. Combining the extracted features, a 64 * 26 matrix was formed as the training input data, and 19 records were used for testing the neural network.
The simulation results were obtained with the ANN classifier trained by the back propagation algorithm. In total, 20 neurons were used for training and testing on the ECG waveforms. Two neurons are used in the output layer of the neural network, coded (0,1) and (1,0) for the normal and abnormal categories, respectively.
Sensitivity, specificity, positive predictivity, and accuracy are the performance metrics used to assess the classification; their formulas are given in the following equations.
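Since the equations themselves are not reproduced in this excerpt, the following sketch uses the standard definitions of these four measures in terms of true/false positives and negatives (an assumption, though these definitions are conventional):

```python
def performance_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix definitions of the four reported measures
    (assumed here; the paper's own equations are not shown in this excerpt)."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    positive_predictivity = tp / (tp + fp)  # a.k.a. precision / PPV
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, positive_predictivity, accuracy
```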
Table 1 Record distribution from the MIT-BIH cardiac arrhythmia physionet database
S. no. Classification Numbering of records
1 Normal category 100, 101, 103, 105, 106, 112, 113, 114, 115, 116, 117, 121, 122,
123, 201, 202, 205, 209, 213, 215, 219, 220, 222, 234
2 Abnormal category 104, 108, 109, 111, 118, 119, 124, 200, 203, 207, 208, 210, 212,
214, 217, 221, 223, 228, 230, 231, 232
(Figure: bar chart of the classifier's Accuracy %, Sensitivity %, Specificity %, and Positive Predictivity %, on a scale from 84% to 102%)
8 Conclusion
This work has demonstrated the detection of abnormal states in the ECG signal on the basis of the DWT and an ANN trained with the back propagation algorithm (BPA). It attains average sensitivity, specificity, positive predictivity, and accuracy of 95%, 97.65%, 98%, and 97.8%, respectively. The classification was performed on electrocardiogram signals acquired from the MIT-BIH cardiac arrhythmia database, on the basis of the heart beats aligned with them. Considering 45 ECG files from the PhysioNet database and a total of 64 features, including both statistical and morphological features, an overall optimized categorization accuracy of 97.8% is achieved with the computational intelligence system. As future work, the optimization could be improved to classify specific cardiac abnormalities and to detect abnormal conditions from real-time ECG acquisitions.
References
Abstract Transferring the weights from a pre-trained model results in faster and easier training than training a network from scratch. A proper choice of optimizer may improve the performance of deep neural networks on image classification problems. This paper analyzes and compares three standard first-order optimizers, stochastic gradient descent with momentum (SGDM), adaptive moment estimation (Adam), and root mean square propagation (RMSProp), for detecting glaucoma from fundus images using different CNN architectures: AlexNet, VGG-19, and ResNet-101. Experimental results show that network parameters updated with the Adam optimizer yield better results on most of the databases. Among the models, VGG-19 obtained the highest classification accuracies of 91.71%, 87.8%, and 97.12% on the DRISHTI-GS1, RIM-ONE(2), and LAG databases, respectively. ResNet-101 outperformed the other networks on the ORIGA and ACRIMA databases, with highest classification accuracies of 80.5% and 98.5%, respectively.
1 Introduction
The optic nerve is a bundle of nerve fibers located at the back of the human eye; it carries visual information from the retina to the brain. Among the various retinal disorders, glaucoma is the most common disorder affecting the optic nerve. Due to increased intraocular pressure, the optic nerve may become compressed and damaged, resulting in loss of peripheral vision. Glaucoma normally has no symptoms at the initial stages, so diagnosing this retinal disorder at an early stage is challenging. Traditional techniques such as intraocular pressure measurement, optic nerve head evaluation, and visual field testing have certain limitations, which can be overcome by computer-aided diagnosis (CAD) approaches.
2 Related Work
3 Methodology
Automatic detection of glaucoma using deep neural networks has gained popularity in recent years. Training a neural network from scratch is time consuming and requires an effective hyperparameter selection technique. Instead, transferring the weights from a standard pre-trained network is easier and provides better performance metrics for classification problems. Figure 1 gives the block diagram of transfer learning-based glaucoma detection from fundus images. The fundus images are resized to the standard input size of the pre-trained network. As deep neural networks work well with larger numbers of images, data augmentation by rotation is performed in the pre-processing stage. The initial layers and network weights are transferred from the selected model, which extracts the discriminative features from the fundus images; classification is done by modifying the final layers.
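The rotation-based augmentation step can be sketched in numpy. The 90-degree increments and the 224×224 input size are assumptions for illustration; the text specifies only that rotation is used and that images are resized to the pre-trained network's input size.

```python
import numpy as np

def augment_by_rotation(images, size=224):
    """Each resized image yields four samples (0, 90, 180, 270 degrees).
    The angles and the 224x224 input size are illustrative assumptions."""
    out = []
    for img in images:
        assert img.shape[:2] == (size, size), "resize to the network input first"
        for k in range(4):
            out.append(np.rot90(img, k))  # rotate in the H-W plane
    return np.stack(out)
```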
Transfer learning is a typical deep learning approach in which the weights of pre-trained models are transferred to a new classification problem, resulting in faster and easier training. In this work, the AlexNet, VGG-19, and ResNet-101 architectures are used. AlexNet [17] comprises eight learnable layers (five convolutional and three fully connected). VGG-19 [18] is a standard pre-trained model used effectively for image classification owing to its deep architecture; it comprises forty-seven layers, including sixteen convolutional layers, five max pooling layers, and three fully connected layers. ResNet-101 [19], from the residual network family, overcomes the vanishing gradient problem by incorporating residual connections in the network.
(Fig. 1 Glaucoma detection pipeline: image resizing and data augmentation → load pre-trained model (AlexNet/VGG-19/ResNet-101) → replace final layers → re-train the model)
Gradient descent updates the learnable parameters in the direction opposite to the gradient of the loss:

W = W − η · ∂L/∂W, (1)
where W represents the learnable parameter vector, η is the step size, and L is the loss function. Depending on the number of data samples employed for the gradient computation, the gradient descent algorithm has three major variants: batch gradient descent (BGD), stochastic gradient descent (SGD), and mini-batch gradient descent (MBGD). In BGD, the gradient of the loss function is computed over the complete training dataset, whereas SGD performs a parameter update for each training sample. In MBGD, the training dataset is divided into mini-batches and the parameters are updated for every mini-batch. BGD results in slow training and redundant computations; SGD, in contrast, is faster, but frequent high-variance updates cause fluctuations. Mini-batch gradient descent greatly reduces the variance of the parameter updates, which can lead to more stable convergence than the other two variants.
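The mini-batch variant can be sketched on a toy least-squares problem; the data, batch size, and step size here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))       # toy design matrix
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                      # noiseless targets for illustration

w, eta, batch = np.zeros(3), 0.1, 32
for epoch in range(200):
    order = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch):
        b = order[start:start + batch]
        # MSE gradient of Eq. (1) computed on the mini-batch only
        grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= eta * grad
```

With batch equal to the dataset size this reduces to BGD, and with batch = 1 to SGD.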
Stochastic gradient descent with momentum (SGDM) Stochastic gradient descent with momentum [20] is an extension of the SGD algorithm that incorporates the past gradients in each dimension. The momentum term reduces undesired oscillations and helps the algorithm converge faster.
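One SGDM update can be written as a velocity that accumulates past gradients; the step size and momentum coefficient below are common defaults, not values taken from the paper.

```python
import numpy as np

def sgdm_step(w, v, grad, eta=0.05, gamma=0.9):
    """SGDM update: v accumulates a decaying sum of past gradients, and
    gamma is the momentum coefficient (0.9 is a common default)."""
    v = gamma * v + eta * grad
    return w - v, v
```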
M1i is the bias-corrected mean and V1i is the bias-corrected variance, described by

M1i = Mi / (1 − γ1^i), (3)

V1i = Vi / (1 − γ2^i), (4)
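An Adam step consistent with the bias corrections of Eqs. (3) and (4) can be sketched as follows; the hyperparameter values are the common defaults from Kingma and Ba [21], not values reported by the paper.

```python
import numpy as np

def adam_step(w, m, v, grad, t, eta=0.005, gamma1=0.9, gamma2=0.999, eps=1e-8):
    """One Adam update. m and v are running first and second moments of the
    gradient; dividing by (1 - gamma^t) applies the bias corrections of
    Eqs. (3)-(4), with t the 1-based step index."""
    m = gamma1 * m + (1.0 - gamma1) * grad
    v = gamma2 * v + (1.0 - gamma2) * grad ** 2
    m_hat = m / (1.0 - gamma1 ** t)   # bias-corrected mean, M1i
    v_hat = v / (1.0 - gamma2 ** t)   # bias-corrected variance, V1i
    return w - eta * m_hat / (np.sqrt(v_hat) + eps), m, v
```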
2 Sensitivity SN = TP / (TP + FN)
3 Specificity SP = TN / (TN + FP)
4 Precision PR = TP / (TP + FP)
using the three standard first-order optimizers SGDM, Adam, and RMSProp, and the performance metrics are noted. The simulation results show that the Adam optimizer yields better results than the other two optimizers.
Tables 3, 4, 5, 6, and 7 give the results obtained for the different databases using the AlexNet, VGG-19, and ResNet-101 models. Precision is reported for the unbalanced databases (DRISHTI-GS1 and ORIGA) and classification accuracy for the balanced ones (RIM-ONE(2), ACRIMA, and LAG). Learnable parameters updated with the Adam optimizer give the best results on the RIM-ONE(2) and LAG databases, where the VGG-19 model achieves the highest classification accuracies of 87.8% and 97.12%, respectively. On the ACRIMA database, AlexNet and VGG-19 perform best with the Adam optimizer, while ResNet-101 with SGDM attains the highest accuracy of 98.5%. The best precision of 92.91% is obtained on DRISHTI-GS1 with VGG-19 using the Adam optimizer, whereas on the ORIGA database, ResNet-101 with the RMSProp optimizer gives the highest precision of 81.2%. This analysis makes it evident that the performance of a deep neural network depends strongly on the database and the optimizer employed.
Performance Analysis of Optimizers for Glaucoma Diagnosis . . . 515
(Figures: bar charts of ACC, SN, SP, and PR obtained by Chen, Juan, Ragavendra, AlexNet, VGG-19, and ResNet-101 on the DRISHTI-GS1, ORIGA, RIM-ONE(2), and ACRIMA databases)
In addition, the CNN architectures described in the literature (Chen et al. [11], Gomez-Valverde et al. [12], Raghavendra et al. [15]) are also implemented and their results compared with the pre-trained models. Figures 3, 4, and 5 show the performance metrics obtained by the different CNN architectures on the various databases. For the pre-trained models, the optimizer yielding the better performance metrics is used for the comparison. The comparison reveals that, except on the ORIGA database, the pre-trained models classify glaucoma and normal images better than the networks trained from scratch.
5 Conclusion
In this paper, the performance of the Adam, SGDM, and RMSProp optimizers is analyzed and compared for the automatic detection of glaucoma from fundus images using transfer learning. Three standard models, AlexNet, VGG-19, and ResNet-101, are used to extract discriminative features from the original images, and the DRISHTI-GS1, RIM-ONE(2), ORIGA, ACRIMA, and LAG retinal databases are used to evaluate network performance.
(Figure: bar chart of ACC, SN, SP, and PR obtained by Chen, Juan, Ragavendra, AlexNet, VGG-19, and ResNet-101 on the LAG database)
The VGG-19 model with parameters updated by the Adam optimizer obtained the highest classification accuracies of 87.8% and 97.12% and a precision of 92.91% on RIM-ONE(2), LAG, and DRISHTI-GS1, respectively. An overall classification accuracy of 98.5% and a precision of 81.2% are obtained with the ResNet-101 model using the SGDM and RMSProp optimizers, respectively. Compared to networks trained from scratch, the pre-trained models perform better at classifying glaucoma and normal images.
References
8. Zilly J, Buhmann JM, Mahapatra D (2017) Glaucoma detection using entropy sampling and
ensemble learning for automatic optic cup and disc segmentation. Comput Med Imaging Graph
55:28–41
9. Bajwa MN, Malik MI, Siddiqui SA, Dengel A, Shafait F, Neumeier W, Ahmed S (2019) Two-
stage framework for optic disc localization and glaucoma classification in retinal fundus images
using deep learning. BMC Med Inf Decis Mak 19(1):1–16
10. Elangovan P, Nath MK (2020) Glaucoma assessment from color fundus images using convo-
lutional neural network. Int J Imaging Syst Technol 1–17. https://doi.org/10.1002/ima.22494
11. Chen X, Xu Y, Wong D, Wong T-Y, Liu J (2015) Glaucoma detection based on deep convolu-
tional neural network. In: Annual international conference of the IEEE engineering in medicine
and biology society (EMBC), Milan, Italy, pp 715–718
12. Gomez-Valverde J, Anton A, Fatti G, Liefers B, Herranz A, Santos A, Sanchez C, Ledesma-
Carbayo M (2019) Automatic glaucoma classification using color fundus images based on
convolutional neural networks and transfer learning. Br J Ophthalmol 10(2):892–913
13. Diaz-Pinto A, Morales S, Naranjo V, Kohler T, Mossi J, Navea A (2019) CNNs for automatic glaucoma assessment using fundus images: an extensive validation. Bio-Med Eng OnLine 18(29):1–19
14. Zhen Y, Wang L, Liu H, Zhang J, Pu J (2005) Performance assessment of the deep learning technologies in grading glaucoma severity. Med Image Anal 9(4):297–314
15. Raghavendra U, Fujita H, Bhandary SV, Gudigar A, Hong Tan J, Rajendra Acharya U (2018)
Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus
images. Inf Sci 144(29):41–49
16. Li L, Xu M, Wang X, Jiang L, Liu H (2019) Attention based glaucoma detection: a large-scale
database and CNN model. In: The IEEE conference on computer vision and pattern recognition
(CVPR), pp 1–10
17. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Neural Inf Process Syst 25(2):1097–1105
18. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556
19. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE
conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, pp 770–778
20. Qian N (1999) On the momentum term in gradient descent learning algorithms. Neural Netw
12:145–151
21. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International confer-
ence on learning representations, San Diego, CA, pp 1–15
22. Sivaswamy J, Krishnadas SR, Joshi GD, Jain M, Tabish AUS (2014) Drishti-GS: retinal image dataset for optic nerve head (ONH) segmentation. In: 2014 IEEE 11th international symposium on biomedical imaging (ISBI), Beijing, China, pp 53–56
23. Fumero F, Alayon S, Sanchez J, Sigut J, Gonzalez-Hernandez M (2011) RIM-ONE: an open retinal image database for optic nerve evaluation. In: IEEE symposium on computer based medical systems (CBMS), Bristol, UK, pp 1–6
24. Zhang Z, Yin F, Liu J, Wong W, Tan N, Lee B-H, Cheng J, Wong T-Y (2010) ORIGA(-light): an online retinal fundus image database for glaucoma analysis and research. In: Annual international conference of the IEEE engineering in medicine and biology, Buenos Aires, pp 3065–3068
Machine Learning based Early
Prediction of Disease with Risk Factors
Data of the Patient Using Support Vector
Machines
Abstract Early detection of disease plays an important role in improving the quality of healthcare and can help people avoid dangerous health conditions. Early detection of chronic disease is a critical task in health data analysis. This paper proposes a novel fuzzy logic-based SVM machine learning technique for patient-centered healthcare analytics that predicts chronic diseases such as hypertension, hypothyroidism, and obesity early, from the individual patient's risk factor data. The proposed method consists of preprocessing, feature selection, feature extraction, fuzzy SVM classification, and post-processing, and predicts the severity of the disease. The proposed fuzzy-based support vector machine (SVM) classifies the important features using machine learning techniques to ensure accurate prediction of chronic disease. The SVM is a margin-based classifier that maps input data onto a high-dimensional space and classifies them with a linear approximation. The technique combines outputs from different classification models and has shown the highest accuracy compared to previous techniques; none of the previous studies have integrated fuzzy logic with SVM classifiers on chronic disease datasets. The proposed machine learning-based disease prediction model, aimed at early diagnosis and timely treatment of non-communicable (chronic) diseases, investigates the risk factor data in the patient treatment log. It provides early risk detection and helps doctors take the appropriate precautions and measures to keep a patient from reaching critical phases of the disease. Such a system could significantly decrease human mortality rates and strengthen health services.
1 Introduction
The World Health Organization has reported that the global burden and danger of non-communicable diseases constitute a major public health challenge that threatens social and economic development worldwide [1]. Non-communicable diseases (NCDs), also known as chronic diseases, tend to be of long duration and result from a combination of genetic, physiological, environmental, and behavioral factors [2]. The main types of NCD are cardiovascular diseases (such as heart attacks and stroke), cancers, chronic respiratory diseases (such as chronic obstructive pulmonary disease and asthma), and diabetes [3]. Harmful use of tobacco and alcohol, physical inactivity, and unhealthy diets all increase the risk of dying from an NCD [4].
People of all age groups, regions, and countries are affected by NCDs [5, 6].
These conditions are often related to older age groups, but evidence shows that 15
million of all deaths attributed to NCDs occur between the ages of 30 and 69 years.
Of these “premature” deaths, over 85% are estimated to occur in low- and middle-
income countries. Children, adults, and the elderly are all vulnerable to the risk
factors contributing to NCDs, whether from unhealthy diets, physical inactivity, and
exposure to tobacco smoke or the harmful use of alcohol. These diseases are driven
by forces that include the rapid growth of globalization, unplanned urbanization,
unhealthy lifestyles, and population aging. Unhealthy diets and a lack of physical
activity may show up in people as raised blood pressure, increased blood glucose,
elevated blood lipids, and obesity. These are called metabolic risk factors that can
lead to cardiovascular disease, the leading NCD in terms of premature deaths [7, 8].
Approximately 639 million adults in developing countries suffer from hypertension, and this number is estimated to reach nearly 1 billion by 2025. WHO projections show that NCDs will be responsible for a significantly increased total number of deaths in the next decade; NCD deaths are projected to increase by 15% globally between 2010 and 2020 (to 44 million deaths) [9, 10]. With growing awareness of NCD risk, several recent studies have used machine learning models as decision-making techniques for early detection and timely treatment based on an individual's risk factor data, so that appropriate care can be taken at an earlier stage [11, 12].
Machine learning in healthcare is helping to streamline administrative processes in hospitals, to plan for and treat infectious diseases, and to personalize medical treatment [13, 14]. It can help hospitals and health systems improve efficiency while reducing the cost of care.
In many previous studies, the classifier is optimized on the "test set," which may result in an unintended influence of the test data on the classifier and thus in higher sensitivity and specificity than would be experienced under real-world conditions [15, 16]. To provide more realistic estimates of sensitivity and specificity, the test data must not influence the training of the algorithm [17]. Therefore, we employ a double cross-validation method: the data are divided into a training set and a test set, and the training set is further subdivided into a learning set and a validation set. The fuzzy-based SVM classification model is trained on the learning and validation datasets and tested on a dataset that is untouched during training [18, 19]. Because the test dataset is left completely out of training, the results of this experimental design more accurately reflect the prediction rate to be expected under real-world conditions [20].
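The index split underlying the double cross-validation above can be sketched as follows; the split fractions are illustrative assumptions, since the paper does not report them in this excerpt.

```python
import numpy as np

def double_split(n, test_frac=0.2, val_frac=0.2, seed=0):
    """Split sample indices into learning / validation / test sets so that
    the test set never touches training. Fractions are illustrative."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]       # test set held out entirely
    n_val = int(len(train) * val_frac)
    val, learn = train[:n_val], train[n_val:]      # train further subdivided
    return learn, val, test
```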
Our Contribution
The main objective of this work is to develop algorithms and architectures for an implantable device that can reliably predict disease with sufficient lead time to trigger treatment. The specific task in this paper is to investigate the feasibility of a patient-log-specific classification approach using the FSVM to distinguish risk-factor attributes from normal attributes. The algorithm has been tested on a dataset from the hospital NCD database, made available so that the results of different algorithms can be compared on the same data.
The remainder of the paper is organized as follows: Sect. 2 reviews the literature, and Sect. 3 presents the proposed system methodology. Sect. 4 presents the experimental setup and results. Sect. 5 discusses and highlights the proposed system, and conclusions and future work are drawn in Sect. 6.
2 Literature Review
Luis Eduardo et al. [21] constructed a cardiac hybrid imaging model based on complex assumptions; machine learning is able to determine and understand complex data structures in order to solve estimation and classification challenges. Alarsan and Younes [22] suggested a machine learning-based ECG (electrocardiogram) classification method using a variety of ECG features to detect cardiac ECG abnormalities. Golino et al. [23] described a customized health monitoring system for diabetic patients built on BLE-based and G-based sensors. Alfian described forecasting high blood pressure using machine learning approaches, although these methods are not cast as chronic disease models.
In [24, 25], the authors discussed how machine learning can be used in an early disease prediction model to predict the severity of chronic diseases from an individual's current risk factor data. Several studies have applied machine learning models with significant results for predicting severe illness from diabetes, hypertension, hypothyroidism, and obesity risk factors.
In [26–28], the authors described the hybrid machine learning approach, one of the most well-known and widely used machine learning models. In [29, 30], the authors presented an approach whose main idea is to combine two machine learning models to help reduce bias and variance and hence improve the prediction results. Previous studies have also used hybrid approaches, with significant outcomes in improving medical decision making and diagnosis, predicting the severity of heart diseases and NCDs, and identifying their risk factors.
522 U. Chelladurai and S. Pandian
3 Proposed System
This section presents the design view of the fuzzy-based SVM machine learning
technique for disease prediction. Figure 1 illustrates the
framework architecture and entities of the proposed system that are used in this
section. The proposed system consists of (A) data collection (B) data transforma-
tion (C) data storage and security (D) data modeling and (E) data visualization and
knowledge discovery.
This section briefly explains the inputs and outputs of each entity of the proposed
method. The proposed system deals with data collection, dataset creation, prepro-
cessing, feature selection, feature extraction, and applying our proposed algorithm
fuzzy-based SVM classification for prediction of diseases. The input of the proposed
system is a chronic disease dataset, and the output of the proposed system generates
the accurate possibility of diseases and identifies the patients with risky conditions.
In the subsections below, every component of the proposed system is discussed in
detail.
A. Data Collection/Health dataset Creation
The first step of the proposed method includes the processing of data from
various sources in different formats. The proposed system uses a compilation
of medical data obtained from the hospital. The dataset includes 3000 patient
Machine Learning based Early Prediction of Disease … 523
tremendous secure solutions exist, but we suggest that Blockchain technology is
one that provides more security than the others. It increases the safety of
medical records; provides immutable, patient-centric, secure, and verifiable
patient records; and makes the massive amount of medical data available anytime
and anywhere. This technology also addresses challenges such as interoperability,
integration with existing systems, and technological and adoption barriers.
Blockchain is a decentralized peer-to-peer architecture and a distributed ledger
technology. Participants in the distributed network record digital transactions
into a shared ledger. Each participant stores the same copy of the shared ledger,
and changes to the shared ledger are reflected in all copies in the distributed
network. Blockchain technology is itself a data repository; it provides security
to health data and privacy to patients.
D. Data Modeling
Once the data has been collected, transformed, and stored in secure storage
solutions, data processing and analysis are performed to generate useful knowledge.
In the proposed system, a rule-based scheme has been evaluated to select attributes
that are highly affected by, and correlated with, the class attribute. The features
are selected and extracted to form a new dataset of 21 features with 1000 instances.
Table 2 describes the fuzzy values of each attribute for 25 samples. In the proposed
system, among the 21 attributes, the most highly correlated, disease-predictive
attributes are gender, age, bmi, fbs, foh, rbp, chol, ldl, hdl, and try. Figures 3,
4, 5, 6, and 7 clearly show how these correlated attributes strongly influence the
diseases.
\min_{w, b, \xi} \; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_{i} \qquad (1)

\text{s.t.} \quad y_{i}\left(w^{T} x_{i} + b\right) \ge 1 - \xi_{i}, \quad \xi_{i} \ge 0, \; \forall i \in \{1, 2, \ldots, n\} \qquad (2)
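The paper does not spell out how the fuzzy memberships enter the optimization, but a common fuzzy SVM formulation weights each slack penalty C·ξᵢ in Eq. (1) by a per-sample membership sᵢ ∈ (0, 1]. A minimal sketch of that idea using scikit-learn's `SVC`, whose `sample_weight` argument scales the penalty per sample; the toy data and membership values are hypothetical, not the paper's:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical two-class risk-factor vectors and labels.
X = np.array([[0.2, 0.1], [0.3, 0.2], [0.1, 0.3],
              [0.8, 0.9], [0.9, 0.8], [0.7, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fuzzy membership s_i in (0, 1]: lower values down-weight less reliable
# samples, effectively replacing C*xi_i in Eq. (1) with C*s_i*xi_i.
membership = np.array([1.0, 0.9, 0.4, 1.0, 0.8, 0.5])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=membership)

pred = clf.predict([[0.15, 0.2], [0.85, 0.85]])
```

With well-separated toy classes, the weighted classifier still assigns the two probe points to the nearby classes; the membership weights matter mainly for noisy, overlapping samples.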
Owing to unfair (imbalanced) classification, the specificity might be less than the
predicted sensitivity. Sensitivity and specificity can be combined into a single
score that balances both concerns, called the geometric mean or G-mean (Table 3).
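The G-mean referred to here is the square root of the product of sensitivity and specificity; a small helper computes it from confusion-matrix counts (the counts below are illustrative, not from Table 3):

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return math.sqrt(sensitivity * specificity)

# Illustrative counts for an imbalanced test set:
# sensitivity = 0.9, specificity = 0.8.
score = g_mean(tp=90, fn=10, tn=40, fp=10)
```

Because it multiplies the two rates, a single poor rate drags the G-mean down even when the other is high, which is why it is preferred over plain accuracy for imbalanced data.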
To avoid a less fair prediction rate, the proposed system has used rule-based
attribute optimization that screens the test data samples. In the chronic dataset,
if an instance has n diseases, such as hypertension, hypothyroidism, and obesity,
at the same time, then those instances are identified as highest-priority
instances. The system
randomly selects 20% of the data for testing and 80% for training, for repeated
tests of the different algorithms. Table 4 illustrates the accuracy, classification
error rate, detection values, prevalence, sensitivity, and specificity of the
different algorithms compared with the SVM techniques. Once the model has been
well trained, the prediction rate is evaluated on the testing set; this process is
repeated until the average prediction rate is calculated, which determines the
accuracy. For the existing systems, the accuracy and sensitivity are compared with
the proposed system: Naive Bayes accuracy = 97.3% and sensitivity = 96.3%; Random
forest accuracy = 100% and sensitivity = 100%, but its detection and prevalence
values are lower than those of the proposed system; Decision tree accuracy = 100%
and sensitivity = 100%, but its detection and prevalence values are lower than
those of the proposed FSVM system; KNN accuracy = 81.36% and sensitivity = 68.0%;
Logistic regression accuracy = 100% and sensitivity = 100%. Compared with the
existing algorithms on the given dataset for predicting immature (early-stage)
diseases, the FSVM generates higher accuracy than the other techniques.
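The hospital NCD dataset is not publicly available, so the 80/20 comparison protocol can only be sketched on a synthetic stand-in; the classifiers below are some of those named in the comparison, and the resulting scores are illustrative, not the paper's reported numbers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the 21-feature chronic disease dataset.
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)

# 80% training / 20% testing, as in the paper's protocol.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
# Fit each model on the same split and score it on the held-out 20%.
accuracies = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
              for name, m in models.items()}
```

Repeating the split with different random seeds and averaging the scores gives the "average prediction rate" the text describes.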
Rstudio has been used for machine learning and disease prediction analysis. We have
tested a patient-centric fuzzy classification algorithm for disease prediction on the
NCD dataset of 1000 patients with 21 extracted features. To evaluate the algorithm,
we have implemented the proposed method in a mobile application to show the
possibility of our proposed system in live applications. Figure 2 shows the mobile
application interface for our chronic disease dataset. The interface retrieves the
health parameters according to the PatientID; the parameters are then verified
against threshold values, and the rule-based SVM is applied to predict whether the
patient's condition is normal or abnormal. Our proposed model significantly
enhanced the prediction rate and correctly predicted the severity of diseases such
as hypertension, hypothyroidism, diabetes, and obesity in 716 of 1000 patients,
with higher sensitivity than other models. The mobile interface was developed
using Android Studio; it shows whether the patient is in a risky condition, what
types of diseases are possible and, finally, which diseases are most indicated.
The developed mobile application uses the chronic disease dataset for testing the
immature state of disease. Patients can easily monitor their current status and
previous illnesses and, accordingly, seek earlier care and take appropriate drugs.
The developed health checker application provides e-consultation in pandemic
situations and offers the facility of e-medicine with homecare.
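The threshold verification step of the mobile application can be sketched as a simple range check over the retrieved parameters; the attribute names and normal ranges below are hypothetical placeholders, not the paper's actual rules:

```python
# Hypothetical normal ranges for a few risk-factor attributes.
THRESHOLDS = {
    "fbs":  (70, 100),     # fasting blood sugar, mg/dL
    "rbp":  (90, 120),     # resting (systolic) blood pressure, mmHg
    "chol": (125, 200),    # total cholesterol, mg/dL
    "bmi":  (18.5, 24.9),  # body mass index
}

def flag_abnormal(record):
    """Return the attributes of a patient record outside their normal range."""
    return [k for k, (lo, hi) in THRESHOLDS.items()
            if k in record and not (lo <= record[k] <= hi)]

# Record retrieved by PatientID (hypothetical values).
patient = {"fbs": 130, "rbp": 118, "chol": 240, "bmi": 23.0}
risky = flag_abnormal(patient)
```

Only records with flagged attributes would then be passed to the rule-based SVM stage for a full normal/abnormal prediction.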
Data visualization is done through RStudio; after applying the machine learning
algorithms, the results are plotted and presented in Figs. 3, 4, 5, 6 and 7 using the RStudio data
plotting techniques. Figure 3 shows the distribution of the health dataset used in
this research; the distribution of the predicted attributes with high accuracy is
presented in Fig. 4; the dataset's outlier samples are presented in Fig. 5;
measurements of the selected features are presented in Fig. 6; and the separation
of the class variables is presented in Fig. 7.
5 Discussion
The results of the proposed system have been compared with other existing machine
learning algorithms, and the performance measures of the existing algorithms are
compared and incorporated in Table 4. The proposed algorithm has shown the highest
sensitivity compared with the others. When comparing the specificity of KNN and
random forest, the proposed algorithm performs well and achieves higher sensitivity
than the others. It is evident from the experimental results that the proposed
fuzzy-based SVM technique with reduced attributes improves the classification accuracy
and generates fair results. While some studies may present greater sensitivity
than our proposed technique, their algorithms were trained and tested on the same
datasets; therefore, the results are not directly comparable.
6 Conclusion
In this paper, a machine learning-based early prediction of diseases from a
patient's risk-factor data using a support vector machine is proposed. The work has
been carried out by applying a rule-based FSVM algorithm after the removal of
artifacts. Irrelevant attributes in health datasets may harmfully affect the
disease prediction process and generate poor results. After applying the proposed
algorithm, an improved prediction is achieved by selecting the essential features
for each patient. The identified risk-factor attributes in the health database,
such as age, fasting blood sugar, resting blood pressure, cholesterol, BMI, TSH,
and ECG, are separated for disease prediction. The prediction is also improved by
adding more features; including cross-correlation attributes and discriminating
features further improves the classification rate and enhances accuracy. Early
prediction, disease identification, and estimation of the possibility of diseases
are performed by the proposed system. A mobile application has been developed for
chronic disease management and tested with the available chronic dataset for early
care, elderly care, and homecare.
References
12. Harliman R, Uchida K (2018) Data- and algorithm-hybrid approach for imbalanced data
problems in deep neural network. Int J Mach Learn Comput 8(3):208–213
13. Han J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, San Diego, CA,
USA
14. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine
learning techniques. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2923707
15. Alfian G, Syafrudin M, Ijaz M, Syaekhoni M, Fitriyani N, Rhee J (2018) A personalized
healthcare monitoring system for diabetic patients by utilizing BLE-based sensors and real-time
data processing. Sensors 18(7):2183
16. Lemaitre G, Nogueira F, Aridas CK (2017) Imbalanced-learn: A Python toolbox to tackle the
curse of imbalanced datasets in machine learning. J Mach Learn Res 18(1):559–563
17. Naiarun N, Moungmai R (2015) Comparison of classifiers for the risk of diabetes prediction.
Procedia Comput Sci 69:132–142
18. UCI Machine Learning Repository (2015) Chronic_Kidney_Disease Data Set. [Online].
Available: https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
19. Wu H, Yang S, Huang Z, He J, Wang X (2018) Type 2 diabetes mellitus prediction model based
on data mining. Inform Med Unlocked 10:100–107
20. Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2019) Development of DPM based on ensemble
learning approach for diabetes and hypertension. IEEE Access, 7:144777–144787. https://doi.
org/10.1109/ACCESS.2019.2945129
21. Juarez-Orozco LE, Martinez-Manzanera O, Nesterov SV, Kajander S, Knuuti J (2018) The
machine learning horizon in cardiac hybrid imaging. Springer Open Eur J Hybrid Imag. https://
doi.org/10.1186/s41824-018-0033-3
22. Alarsan FI, Younes M (2019) Analysis and classification of heart diseases using heartbeat
features and machine learning algorithms. J Big Data 6:81. https://doi.org/10.1186/s40537-
019-0244-x
23. Golino H (2013) Women's dataset from the 'Predicting increased blood pressure using machine
learning' study. Figshare. [Online]. Available: https://doi.org/10.6084/m9.figshare.845664.v1
24. Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, Laramie JM, Mardekian
J, Piper BA, Willke RJ, Rublee DA (2016) Reverse engineering and evaluation of prediction
models for progression to type 2 diabetes: an application of machine learning using electronic
health records. J Diabetes Sci Technol 10(1):6–18
25. Sakr S, Elshawi R, Ahmed A, Qureshi WT, Brawner C, Keteyian S, Blaha MJ, Al-Mallah MH
(2018) Using machine learning on cardiorespiratory fitness data for predicting hypertension:
The Henry Ford ExercIse Testing (FIT) project. PLoS ONE 13(4) (Art. no. e0195344)
26. Sun J, McNaughton CD, Zhang P, Perer A, Gkoulalas-Divanis A, Denny JC, Kirby J, Lasko T,
Saip A, Malin BA (2014) Predicting changes in hypertension control using electronic health
records from a chronic disease management program. J Amer Med Inform Assoc 21(2):337–344
27. Singh N, Singh P, Bhagat D (2019) A rule extraction approach from support vector machines
for diagnosing hypertension among diabetics. Expert Syst Appl 130:188–205
28. Calheiros RN, Ramamohanarao K, Buyya R, Leckie C, Versteeg S (2017) On the effectiveness
of isolation-based anomaly detection in cloud data centers. Concurrency Comput Pract Expert
29(18):e4169
29. Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for
balancing machine learning training data. ACM SIGKDD Explore Newslett 6(1):20–29
30. Goel G, Maguire L, Li Y, McLoone S (2013) Evaluation of sampling methods for learning from
imbalanced data. In: Huang D-S, Bevilacqua V, Figueroa JC, Premaratne P (eds) Intelligent
computing theories, vol 7995. Springer, Berlin, Germany, pp 392–401
Scene Classification of Remotely Sensed
Images using Ensembled Machine
Learning Models
1 Introduction
There are three types of satellite imagery: panchromatic, multispectral and
hyperspectral images [2]. Over the past three decades, machine learning (ML) has
been a success story of artificial intelligence (AI), involving the study and
development of computational models of the learning process. It has been used in
various applications [3] such as image recognition, speech recognition, image
classification, object detection and web page ranking.
Supervised and unsupervised learning are the main research areas in the field of
machine learning. In unsupervised learning [4], model generation is based on a set
of training data without target (output) values. The aim of an unsupervised
learning algorithm is to organize the data in some way, to group it into clusters,
or to find different ways of looking at complex data so that it appears simpler or
more organized. It is mainly used in knowledge discovery, parameter determination
and preprocessing. k-means clustering and the Apriori algorithm are two kinds
of unsupervised learning methods.
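As a minimal illustration of the unsupervised methods just named, the sketch below clusters a handful of unlabelled 2-D points with scikit-learn's `KMeans`; the points and the choice of two clusters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled 2-D points forming two loose groups (illustrative data).
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.9]])

# k-means organizes the data into k clusters without any target values.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
# Points in the same group receive the same cluster label.
```

No labels were supplied; the structure was discovered from the data alone, which is exactly what distinguishes this from the supervised setting described next.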
In supervised learning [5], model generation is based on a set of training data
with target (output) values. The model is a mapping between the input and target
values. Finally, the model is used to predict the labels of unlabelled instances.
The purpose of supervised learning is to provide a model that has low prediction
error on future data. Support vector machine (SVM), artificial neural network
(ANN), k-nearest neighbour (k-NN), logistic regression, decision tree [6], random
forest, multilayer perceptron, naive Bayes and ensemble learning are the different
kinds of supervised learning methods. In the past few decades, many researchers
have reported the application of multiple combined classifiers (MCC) to produce a
single classification of RSI images. The resulting classifier is referred to as an
ensemble classifier, and it is usually more accurate than any of the individual or
base classifiers. An ensemble classifier combines the decisions of a set of
classifiers by hard voting or soft voting to classify unknown examples.
The main objective of this paper is to propose a highly efficient model for the
classification of remote sensing images using ensembling techniques. Ensemble
models have exhibited great potential in recent decades for improving the accuracy
and reliability of remote sensing image scene classification. We have combined the
decisions of several base classifiers for the classification of remote sensing
images using the techniques AdaBoost, Bagging, Hard Voting, Soft Voting and Stack
Generalization, and the results of each member classifier are evaluated. The rest
of this paper is organized as follows: Sect. 2 provides a review of the literature
related to the proposed work. Section 3 describes the architecture of the proposed
work. Section 4 presents the performance metrics of the proposed approach.
Section 5 discusses the experimental results and analysis. Finally, the conclusion
is given in Sect. 6.
2 Related Works
In the past few decades, researchers have been working on improving classification
accuracy [7]. However, classification accuracy is affected by the quality of the
training data used, and real-world data suffers from many problems that may degrade
the interpretability of remote sensing data [8]. Various traditional machine
learning algorithms have been applied by researchers with the aim of solving
classification problems and improving the existing methods to handle complex
factors. Mountrakis et al. [9] discussed the applications of SVMs in remote
sensing. In many cases, SVM classifiers have better accuracy, stability and
robustness when compared with other classifiers such as neural networks and
k-nearest neighbour. Thanh Noi et al. presented an approach that compared land
cover classification of RSIs with nonparametric classifiers such as the SVM, k-NN
and random forest algorithms; in their work, they chose fourteen classes of data
and compared them with the above-mentioned three classifiers. Jeevitha et al. [10]
discussed spatial information-based image classification using SVM, focusing on
image classification with an active learning approach, and compared the
performance of the methods. Ayhan et al. [11] analysed various image classification
methods for RSI images; in this study, the researchers compared artificial neural
networks, the standard maximum likelihood classifier, and the fuzzy logic method.
Based on the comparison, ANN classification is more robust than the other two
classifiers. Cavallaro et al. [12] developed an image classification model for huge
amounts of data by using the support vector machine; a comparison between k-NN and
random forest was also presented in the study.
McInerney et al. [13] showed that the random forest classifier achieved higher
accuracy than the k-NN classifier for scene classification of remote sensing
images. Hagar et al. [14] used a hybrid algorithm combining k-NN and ANN: ANN was
used for testing and extracting features, while k-NN was used for image
segmentation and classification. They achieved 92% accuracy with k-NN and reported
that it was better than ANN, which achieved only 89.2% accuracy. David et al. [15]
discussed nonparametric regression and classification ML algorithms for
geosciences, with the aim of solving classification problems in that area. Belgiu
et al. [16] reviewed the random forest in remote sensing; the random forest
classifier can successfully handle high-dimensional data. Pal et al. [17] developed
a random forest classifier for scene classification of remote sensing images and
compared its performance with SVM in terms of accuracy, training time and
user-defined parameters. Zhao et al. [18] proposed a multiple bag-of-visual-words
model (multi-BOVW), a two-phase classification method that outperformed the
traditional score-level fusion-based BOVW approach. Zanaty et al. [19] carried out
a comparison study of SVM and the multilayer perceptron for data classification.
Blanzieri et al. [20] proposed a new variant of the k-NN classifier based on the
maximum margin principle. Wang et al. [21] proposed remote sensing image
classification based on SVM and a modified binary-coded ACO algorithm.
Huaifei Shen et al. [7] proposed a multiple classifier system by using different
538 P. Deepan and L. R. Sudha
voting methods for RSI image classification. Yunqi Miao et al. introduced an MCS
for RSI image scene classification; the MCS can successfully classify RSIs with
higher accuracy and reduced computation cost. The traditional classifier techniques
above are not efficient, because they give low performance on training and test
data. Taking these disadvantages into consideration, our contribution is to propose
an ensemble model for scene classification of RSI images using SVM, random forest
and the multilayer perceptron.
3 Proposed Works
This section presents the feature extraction techniques, the traditional
classifiers used for ensembling (also called base classifiers) and the proposed
ensemble classifier. In the first stage, the speeded-up robust features (SURF)
technique is used to extract features from the dataset. The extracted feature
values are then fed into the base classifiers: support vector machine, decision
tree, logistic regression, random forest, multilayer perceptron, naive Bayes and
k-nearest neighbours. Based on the results, the best three base classifiers are
ensembled to improve the accuracy of RSI scene classification.
In computer vision, the SURF technique, developed by Bay et al. [22], is composed
of two parts, namely a local feature detector and a descriptor. The standard
version of SURF is an advancement of SIFT [23]; it is much faster and more robust,
since it uses invariant features of local similarity for image matching. SURF's
initial stage is the generation of key points. The next stage defines the invariant
descriptor of these key points, which are further used for various applications
such as image classification, image registration, camera calibration, and
correspondence determination between two images of the same object. As shown in
Fig. 1, the SURF feature extraction technique consists of four stages.
The first step of the SURF process is forming the integral image, an efficient way
of calculating the sum of values in a region of an input image. It can also be used
to measure the average intensity of the image. Afterwards, points of interest are
searched for [24]. A point at which the direction of the boundary or edge of an
object changes rapidly is called a point of interest. An input image and the
corresponding points of interest are shown in Fig. 2a, b, respectively. The Harris
corner detector is a familiar and widely used corner detector, but it is not scale
invariant; the Hessian matrix with automatic scale selection solves this problem.
For point detection in an image, SURF uses a Hessian matrix approximation that is
both scale and rotation invariant. After finding the feature candidates of the
image, key point candidates are selected by the non-maxima suppression method. The
last step of SURF is
to describe the obtained key points. The process ends by finding the pixel
distribution of the neighbours around each key point, which generates the SURF
feature vector for an input image.
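The integral-image step described above can be sketched in a few lines of NumPy: each entry stores the sum of all pixels above and to the left of it, so the sum over any rectangular box needs only four lookups (the small test image is arbitrary):

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns, padded so ii[0, :] = ii[:, 0] = 0."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four integral-image lookups."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16).reshape(4, 4)   # arbitrary 4x4 "image"
ii = integral_image(img)
total = box_sum(ii, 1, 1, 3, 3)      # sum of img[1:3, 1:3]
```

This constant-time box sum is what lets SURF evaluate its box-filter Hessian approximations quickly at every scale.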
Boosting is an established method that can construct a strong classifier from a
set of base, or weak, classifiers [28]. Boosting is performed by constructing a
model from the training data and then creating a second model that attempts to
rectify the errors of the first. Models are added until the training set is
predicted perfectly. Bagging has been shown to minimize the variance of the
classification, whereas boosting decreases both the variance and the bias. In
general, boosting can achieve higher classification accuracy than the bagging
algorithm. Boosting classifiers can be represented in the form:
F_{N}(X) = \sum_{t=1}^{N} f_{t}(X) \qquad (1)
where f_t(X) is the t-th base classifier, which takes the test sample set X and
returns the corresponding class, and N represents the number of base classifiers.
The computational time of boosting algorithms is greater than that of bagging
algorithms. There are three kinds of boosting ensemble algorithms: AdaBoost,
gradient tree boosting and XGBoost. AdaBoost, also called Adaptive Boosting, is
one of the most famous ensemble algorithms developed for classification. In some
cases, the AdaBoost classifier fails to improve the performance of the base
classifier due to the overfitting problem. The boosting ensemble classifier
operations are shown in Fig. 3.
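The additive model of Eq. (1) corresponds to scikit-learn's `AdaBoostClassifier`, which by default boosts shallow decision trees; a minimal sketch on synthetic data (the dataset and the resulting score are illustrative, not the paper's):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Each round fits a weak learner f_t that concentrates on samples the
# previous rounds misclassified; F_N is the weighted sum of all N learners.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_tr, y_tr)
acc = boost.score(X_te, y_te)
```

Capping `n_estimators` (N in Eq. (1)) is the usual guard against the overfitting failure mode mentioned above.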
The voting classifier is one of the most familiar ensemble classifiers; it
combines, or ensembles, traditional classifiers based on a voting rule [29]. There
are two types of voting classifier, namely soft voting and hard voting. In hard
voting, the final prediction is made by majority voting over the class labels
predicted by the ensemble members [30]; hard voting is therefore also known as
majority voting. In soft voting, the class label is predicted from the average of
the predicted class probabilities. For example, consider three traditional
classifiers, namely SVM, k-NN and MLP. In hard voting, if two of the three
classifiers vote in favour and one against, the ensemble classifies the sample as
positive. Similarly, soft voting takes the average of the predicted probabilities,
here 0.6, and classifies the sample as positive. Figure 4 shows the block diagram
of voting-rule classifiers.
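The hard and soft voting rules described above map directly onto scikit-learn's `VotingClassifier`; a minimal sketch ensembling SVM, k-NN and MLP as in the example (the synthetic data and scores are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

members = [
    ("svm", SVC(probability=True, random_state=0)),  # probabilities for soft voting
    ("knn", KNeighborsClassifier()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# Hard voting: majority of predicted labels.
# Soft voting: argmax of the averaged class probabilities.
hard = VotingClassifier(estimators=members, voting="hard").fit(X, y)
soft = VotingClassifier(estimators=members, voting="soft").fit(X, y)

hard_acc = hard.score(X, y)
soft_acc = soft.score(X, y)
```

Soft voting requires every member to expose `predict_proba`, which is why the SVC is constructed with `probability=True`.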
The concept of stacking was introduced by Wolpert et al. [31], who concluded that
stacking works by deducing the biases of the generalizers with respect to the
learning set. Breiman discussed stacked regression, using cross-validation to find
a good combination. Stacking combines the predictions p_1, p_2, ..., p_m linearly
with weights a_i, i ∈ {1, 2, ..., m}:
P_{\text{stacking}}(x) = \sum_{i=1}^{m} a_{i} \, p_{i}(x) \qquad (2)
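The weighted combination of Eq. (2) is learned from data in scikit-learn's `StackingClassifier`, where a meta-learner (here logistic regression) is fitted on cross-validated base predictions; a minimal sketch on synthetic data (illustrative, not the paper's configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

base = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm", SVC(random_state=0)),
]

# The final_estimator plays the role of the weights a_i in Eq. (2),
# fitted on out-of-fold predictions of the base classifiers.
stack = StackingClassifier(estimators=base, final_estimator=LogisticRegression())
stack.fit(X, y)
acc = stack.score(X, y)
```

Using cross-validated predictions for the meta-learner is what prevents it from simply memorizing the base classifiers' training-set outputs.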
The underlying 2 × 2 confusion matrix (actual class P/N versus prediction Y/N):

                  Actual P           Actual N
  Prediction Y    True Positive      False Positive
  Prediction N    False Negative     True Negative
completeness. These two measures depend on True Positive (TP) in the confusion
matrix.
Let TP, TN, FP and FN denote the true positives, true negatives, false positives
and false negatives, respectively. A TP is a result in which the model estimates
the positive class correctly, and a TN is a result in which the model estimates
the negative class correctly. An FP is a result in which the positive class is
incorrectly predicted by the model, and an FN is a result in which the negative
class is incorrectly predicted by the model.
4.1 Precision
The precision metric measures the proportion of correct positive predictions of
the proposed ensemble classification model: the number of true positive results
divided by the number of positive results predicted by the classifier.
\text{PRE} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} \qquad (3)
4.2 Recall
The recall metric measures the proportion of positives that are correctly
detected: the number of correct positive results divided by the number of all
relevant samples.
\text{REC} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \qquad (4)
4.3 Accuracy
The accuracy measure can be calculated by dividing the number of correct predictions
by the total number of input samples.
\text{Acc} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{FalseNegative} + \text{TrueNegative}} \qquad (5)
4.4 F1-Score
The F1-score (the harmonic mean of precision and recall) is used to balance the
precision and recall measures. It can be calculated as follows:
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (6)
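Eqs. (3)–(6) can be computed directly from the four confusion-matrix counts; the counts below are illustrative, not experimental results:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1-score per Eqs. (3)-(6)."""
    precision = tp / (tp + fp)                 # Eq. (3)
    recall = tp / (tp + fn)                    # Eq. (4)
    accuracy = (tp + tn) / (tp + fp + fn + tn) # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (6)
    return precision, recall, accuracy, f1

# Illustrative confusion-matrix counts.
pre, rec, acc, f1 = metrics(tp=80, fp=20, fn=20, tn=80)
```

With these symmetric counts all four measures coincide at 0.8, which is a handy sanity check when implementing the formulas.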
The proposed ensemble models (AdaBoost, Bagging, Voting and Stack generalization)
have been developed with Python and the Anaconda IDE tools. The models were applied
to the PatternNet dataset [32], which contains 38 classes and 30,400 satellite
images in total, with 800 images per class. Each image has a resolution of
256 × 256 pixels in RGB colour space, and the spatial resolution ranges from
0.062 to 4.69 m. We have randomly selected ten classes for our proposed work,
namely airplane, baseball field, beach, bridge, forest, harbour, overpass, river, storage
tank and tennis court, labelled 0–9, respectively. Some sample images from the
PatternNet dataset for RSI scene classification are shown in Fig. 7; each row
shows sample images from one class. The dataset was independently divided into
training and testing sets: 80% of the dataset is used for training the proposed
ensemble models and 20% for testing.
In this section, we analyse the performance of the base classifiers: SVM, decision
tree, logistic regression, random forest, multilayer perceptron, naive Bayes and
k-NN. The average accuracy, precision, recall and F1-score of each base classifier
were assessed and are summarized in Table 1. The support vector machine and
multilayer perceptron generated the highest accuracy of 92%; random forest took
second place with 91% accuracy; and the decision tree produced the lowest
classification accuracy among all the base classifiers, at 70.75%.
In order to improve on the performance of the individual base classifiers, we
combine them with other classifiers. In our ensemble work, we have developed the
following five ensemble learning models.
1. We have used bagging with the random forest classifier.
2. With stack generalization, we have ensembled random forest, SVM-Linear, MLP,
SVM-kernel and logistic regression.
3. We have applied the AdaBoost method to the random forest classifier.
4. For the weighted voting method, we have ensembled three base classifiers,
namely MLP, SVM and random forest.
5. Finally, for the majority voting method, we have ensembled the base classifiers
MLP, SVM and random forest.
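The bagging ensemble of item 1 can be sketched with scikit-learn's `BaggingClassifier`, which trains each tree on a bootstrap resample and combines predictions by majority vote (a random forest is itself bagged trees with extra feature subsampling); the synthetic data and score are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Each of the 25 trees sees a different bootstrap sample of the data;
# aggregating their votes mainly reduces the variance of the prediction.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0)
bag.fit(X, y)
acc = bag.score(X, y)
```

Swapping the tree for any other base classifier turns the same wrapper into a generic variance-reduction step for that model.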
Table 2 and Fig. 9 show the performance of the different ensembling techniques on
the base models, namely SVM-Linear, random forest, multilayer perceptron,
SVM-kernel and logistic regression. It can be inferred from the table that the
ensemble models show better performance even when the performance of an individual
base classifier is comparatively low. We also found that the majority voting
technique applied to MLP, SVM-Linear and random forest gives the highest accuracy
of 93.5%, while the bagging technique applied to random forest gives the lowest
accuracy of 91.25%, which is still better than all five base classifiers used.
Confusion matrices of the ensemble models are shown in Fig. 8.
6 Conclusion
References
1. Cheng G, Han J, Lu X (2017) Remote sensing image scene classification: benchmark and state
of the art. In: Proceedings of the IEEE, pp 1–19
2. Ghamisi P, Plaza J, Chen Y, Li J (2017) Advanced supervised classifiers for hyperspectral
images: a review. IEEE Geosci Remote Sens 5:1–23
3. Maxwell AE, Warner TA, Fang F (2018) Implementation of machine-learning classification in
remote sensing: an applied review. Int J Remote Sens 2784–2817
4. Cheriyadat AM (2014) Unsupervised feature learning for aerial scene classification. IEEE
Trans Geosci Remote Sens 1–12
5. Deepan P, Sudha LR (2020) Object classification of remote sensing image using deep convo-
lutional neural network. In: The cognitive approach in cloud computing and internet of things
technologies for surveillance tracking systems, pp 107–120. https://doi.org/10.1016/B978-0-
12-816385-6.00008-8
6. Akbulut Y, Sengur A, Guo Y, Smarandache F (2017) NS-k-NN: neutrosophic set-based k-
nearest neighbors classifier. Symmetry 9:1–12
7. Deepan P, Sudha LR (2019) Fusion of deep learning models for improving classification
accuracy of remote sensing images. Mech Continua Math Sci 14:189–201
8. Deepan P, Sudha LR (2020) Remote sensing image scene classification using dilated
convolutional neural networks. Int J Emerg Trends Eng Res 8(7):3622–3630
9. Mountrakis G, Im J, Ogole C (2011) Support vector machines in remote sensing: a review.
ISPRS J Photogram Remote Sens 247–259
10. Thanh Noi P, Kappas M (2018) Comparison of random forest, k-nearest neighbor, and support
vector machine classifiers for land cover classification using sentinel-2 imagery. J Sci Technol
Sens 1–20
11. Jeevitha P, Ganesh Kumar P (2014) Spatial information based image classification using support
vector machine. Int J Innov Res Comput Commun Eng 14–22
12. Ayhan E, Kansu O (2012) Analysis of image classification methods for remote sensing
experimental techniques. Soc Exp Mech 18–25
13. Cavallaro G, Riedel M, Richerzhagen M (2013) On understanding big data impacts in remotely
sensed image classification using support vector machine methods. IEEE J Select Top Appl
Earth Observ Remote Sens 1–13
14. McInerney DO, Nieuwenhuis M (2007) A comparative analysis of kNN and decision tree
methods for the irish national forest inventory. Int J Remote Sens 4937–4955
550 P. Deepan and L. R. Sudha
15. Hagar HME, Mahmoud HA, Mousa FA (2015) Bovines muzzle classification based on machine
learning techniques. Procedia Comput Sci 65:864–871
16. David J, Alavi H, Gandomi H (2015) Machine learning in geosciences and remote sensing.
Geosci Front 1–9
17. Belgiu M, Drăguţ L (2016) Random forest in remote sensing: a review of applications and
future directions. ISPRS J Photogram Remote Sens 24–31
18. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens
217–222
19. Zhao L, Tang P, Hue L (2016) Feature significance-based multi bag-of-visual-words model for
remote sensing image scene classification. J Appl Remote Sens 1–9
20. Zanaty EA (2012) Support vector machines (SVMs) versus multilayer perceptron (MLP) in
data classification. Egypt Inf J 177–183
21. Blanzieri E, Melgani F (2008) Nearest neighbor classification of remote sensing images with
the maximal margin principle. IEEE Trans Geosci Remote Sens 1804–1811
22. Wanga M, Wana Y, Yeb Z (2017) Remote sensing image classification based on the optimal
support vector machine and modified binary coded ant colony optimization algorithm. Inf Sci
1–22
23. Miao Y, Hainan WH, Zhang B (2018) Multiple Classifier System for Remote Sensing Images
Classification, pp 491–501, Springer Nature Switzerland AG
24. Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. In: Computer
vision–ECCV, pp 404–417, Springer, Berlin
25. Krig S (2014) Interest point detector and feature descriptor survey. In: Computer vision metrics,
pp 217–282, Springer, Berlin
26. Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction
techniques in machine learning. In: International conference on science and information, pp
372–378
27. Han M, Zhu X, Yao W (2012) Remote sensing image classification based on neural network
ensemble algorithm. Int J Neuro Comput 33–138
28. Chen Y, Dou P, Yang X (2017) Improving land use/cover classification with a multiple classifier
system using adaboost integration technique. J Remote Sens 1–20
29. Galar M, Fernandez E, Bustince H, Herrera F (2012) A review on ensembles for the class imbal-
ance problem: bagging, boosting and hybrid-based approaches. IEEE Trans Comp Package
Manuf Technol 463–484
30. Dieterich TG (2000) Ensemble methods in machine learning. In: Multiple classifier systems,
pp 1–15, Springer, Berlin
31. Shen H, Lin Y, Tian Q (2018) A comparison of multiple classifier combinations using different
voting-weights for remote sensing image classification. Int J Remote Sens 3705–3722
32. Kuncheva LI, Rodríguez JJ (2012) A weighted voting framework for classifiers ensembles. Int
J Knowl Inf Syst 1–17
33. Wolpert DH (1992) Stack Generalization. Neural Network 241–259
34. PatternNet Dataset is available at https://sites.google.com/view/zhouwx/dataset
35. Mohandes M, Deriche M, Aliyu S (2018) Classifiers combination techniques: a comprehensive
review. IEEE Access 1–14
Fuzziness and Vagueness in Natural
Language Quantifiers: Searching
and Systemizing Few Patterns
in Predicate Logic
Harjit Singh
1 Introduction
The term ‘quantifier’ is derived from the Latin word ‘quantitas,’ which conveys
a sense of quantity. According to Aristotle, quantifiers are basically expressions of
universal and existential quantification. From the beginning, logicians not only defined
quantifiers but also placed them in a grammatical framework by analyzing new
quantification structures. Quantifiers have occupied a significant place in traditional and
modern logic studies. In general, quantifiers are defined as quantity-carrying
words in the linguistics discipline. They occur as nouns, names, and noun
phrases and, on the other hand, appear with verbs or verb phrase structures.
They are generally symbolized with ‘e’ to indicate entities such as nouns and noun
phrases. In English, noun phrases are defined under ‘generalized quantifiers,’ which
have specific properties. Further, the study of noun phrases usually brings up the
concept of types, such as e, <e, t>, and <<e, t>, t>, to analyze English noun phrases [1].

H. Singh
Indira Gandhi National Tribal University, Amarkantak, Madhya Pradesh 484887, India
Under the theoretical assumptions of quantifiers in the context of a natural language, we
have found that there is a model M = (E, [[ ]]), where E is the domain of
discourse and [[ ]] is the assignment function for quantifiers in the model M.
Here, it is important to notice that the denotational property of the nominal
category is represented through quantifiers [2].
This paper has seven sections. Section 1 discusses the introductory
part of quantifiers. Section 2 surveys quantifiers and fuzzy quantifiers in general.
Section 3 presents the aims and objectives of the study. Section 4 deals with a
contrastive study of quantifiers in Punjabi and Hindi. Section 5 describes the
vague nature of Punjabi quantifiers and predicate logic. Section 6 presents the
results of the study. Section 7 concludes that the nature and function of fuzzy
quantifiers are significant not only for language-specific purposes but also for
generalizing the mapping plan and strategies.
2 Related Works
1 Note that the first combination (A and E) shows the universality of quantifiers, and the next one
(I and O) gives the combination of particular quantifiers. Affirmation within quantifiers is
identified with the group of A and I, while the E and O group fuses to create negative
quantifiers. This determines the relations, particularly ‘the binary relation between the sets’ of
quantifiers.
Similarly, the existential quantifier ‘∃’ contains ‘some’ in some sense,
defining the variable x as an object, either an animal or a human, in a given
discourse. It is interesting to check the availability of animal objects with such a
quantifier [4, pp. 57–60].
x is a horse
∃x (x is a horse)
In standard predicate logic, we have found that DETs such as (every, a, the)
are determiners that define binary relations in a natural language. Sometimes,
DETs behave like denotations [5]. Each lexical item in a word class
(noun, verb, adjective, adverb, preposition, etc.) carries a
specific meaning in a natural language. Due to the various contexts of a single lexical
item, it may sometimes create ambiguity in a discourse. Such situations may
appear fuzzy or vague within a logic, and they can be dealt with using
fuzzy sets2 [6].
2 A fuzzy set is generally discussed in terms of a class of objects together with the degree of
membership of each object. We generally assume that a classical set is a combination of objects
and things that carry membership values of either 0 or 1; features like union, intersection, and
complement are also discussed for fuzzy sets.
In this section, we first compare the quantifiers of Punjabi and Hindi.
Secondly, we observe a phonetic matter in the context of Punjabi
only and propose an additional inventory of a few fuzzy quantifiers that are
not similar to Hindi.
3 Note that such quantifiers (ਮਾੜਾ-ਜਾ/[mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ/[ɟəmɑ-i]; ਮਸਾ-ਕੁ/[məsɑ-kʊ]; and ਭੌਰਾ-ਕੁ/
[pòrɑ-kʊ]) may be considered fuzzy quantifiers in Punjabi. Linguists have mostly thought that
Punjabi and Hindi are similar; however, they ignore the language-specific aspect here. Regarding
these Punjabi quantifiers, our view is that Hindi speakers do not have any transliteration for them.
Hindi is a New Indo-Aryan (NIA) language written in the Devanagari script. It has
subject–object–verb word order and is rich in open and closed word classes. It
has singular and plural number, masculine and feminine gender properties,
and types of cases marked on nominal classes. It is sometimes found similar
to Punjabi; however, there are many differences between them. Like Punjabi, it
also has an inventory of a few quantifiers; Table 4 shows the list [7].
Table 4 shows that there are 10 types of quantifiers found in Hindi. Of these,
हर/hər [every] and कुछ/kʊʃə [something] are types of universal and existential
quantifiers. We can compare quantifiers like थोड़ा/ʈhoɽɑ [a little]; लगभग/ləɡəbhəɡə
[approximately]; and सर/ser [approximately two pounds in weight] with Punjabi.
As we have already seen in Table 3, there are four types considered fuzzy
quantifiers. They may or may not be similar to Hindi quantifiers, or the
difference may be due to phonetics only. Here, we must note that
Punjabi already shares the phonetics and usage of these Hindi quantifiers.
However, the fuzzy quantifiers discussed in Table 3 are additional
forms in Punjabi: they cannot be assumed even phonetically and are not
translatable into Hindi. Thus, even if the semantics of fuzzy quantifiers
could be matched in both languages, Hindi
quantifiers like थोड़ा/ʈhoɽɑ [a little]; लगभग/ləɡəbhəɡə [approximately]; and सर/
ser [approximately two pounds in weight] cannot be compared with the Punjabi
fuzzy quantifiers, because it is not a matter of phonetics only. In fact, it is a
language-specific matter that grants such variety only to Punjabi.
We have argued that the list of Punjabi quantifiers in Table 3 is not found anywhere
in the Hindi language; all of them are vague in nature. Before discussing them, we
begin with generalized quantifiers such as ‘ਹਰ’ (every) and ‘ਕੁੱਝ’
(some) in Punjabi. Table 5 shows them in relation to predicate logic.
Table 5 demonstrates that the variable ‘x’ stands for a man and also represents
the property of honesty in the contexts of both the universal quantifier and the
existential quantifier. On the other hand, it is significant to notice that it is bound
by both quantifiers in Punjabi [8].
Punjabi has a few quantifiers (like ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ
[ɟəmɑ-i]; and ਮਸਾ-ਕੁ [məsɑ-kʊ]) that are generally found to be fuzzy.
4 In predicate logic, both definites and indefinites play a significant role, and they are generally
identified with type symbols <e, t/t> so as to consider each and every expression as a type only [10].
We have selected a total of 29 verbs (in their basic, imperfective, and perfective forms) to
structure such fuzzy quantifiers in Punjabi. We may notice the difference between
all four types (ਭੌਰਾ-ਕੁ [pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]; and ਮਸਾ-ਕੁ
[məsɑ-kʊ]) when they come with predicates; see Table 8.
Table 8 presents the data set for Punjabi fuzzy quantifiers in relation to a
number of predicates. We find that only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] quantifier can
successfully appear with all the listed predicates, while ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] shows
no correspondence with a total of eight predicates in the table. Here, we can also
see grammatical changes in the verbal predicates due to the direct influence of
the fuzzy quantifiers.
Table 9 presents the grammatical (present and past perfect) information
and negation contexts of fuzzy quantifiers in Punjabi. As already seen
in Table 8, only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] quantifier qualifies in both perfective and
negative contexts, while the remaining quantifiers do not fit the grammatical
information; we found them unadjustable and out of context here.
Based on the above data set, we may assume a few steps to generalize and map the
quantifiers in a natural language like Punjabi. The following steps may be
thought of as an algorithm.
S1 Input: a simple sentence or group of sentences in a text/discourse
S2 Search: manually find all sorts of quantifiers, with special focus on fuzzy forms
S3 Slots: keep the initially identified quantifiers in separate slots a, b, c, etc.
S4 Replacement: go to step 1 and replace each quantifier with another in the same text/discourse
S5 Check the results: note the results before and after the replacement
S6 Single slot: after getting satisfactory results, keep only one slot for the resulting quantifiers (Fig. 2).
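The steps above can be sketched as a small program. This is purely illustrative: the romanized quantifier spellings and the sample sentence are invented for the example, and real use would operate on Gurmukhi text, with the acceptability judgment at S5 made manually by a speaker.

```python
# Illustrative sketch of steps S1-S6. The romanized spellings below are
# assumed stand-ins for the four Punjabi fuzzy quantifiers discussed above.
FUZZY_QUANTIFIERS = ["mara-ja", "bhaura-ku", "jamha-i", "masa-ku"]

def find_quantifiers(sentence):
    """S1-S2: scan a tokenized sentence for known fuzzy quantifiers."""
    return [tok for tok in sentence.split() if tok in FUZZY_QUANTIFIERS]

def replacement_variants(sentence):
    """S3-S5: slot each found quantifier and generate every swapped variant,
    so before/after acceptability can be compared; S6 keeps the accepted slot."""
    variants = []
    for old in find_quantifiers(sentence):
        for new in FUZZY_QUANTIFIERS:
            if new != old:
                variants.append(sentence.replace(old, new))
    return variants
```

Each variant produced at S4 would then be judged for grammaticality, and only the quantifiers surviving the check remain in the single final slot.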
Vagueness and the fuzzy character of quantifiers are commonly found in natural
language. Sometimes they give monotone effects to spontaneous speech;
however, they are not always much recognized, due to certain contextual linkages. In
the previous sections, we have seen the availability of a few fuzzy quantifiers with
predicates and their occurrences in the grammar. Secondly, we have also intuitively
tried to draw a mapping plan to search for them and finally to keep them in only one
slot. When we look carefully at Table 8, we find that the selected four categories of
fuzzy quantifiers give different results; see Fig. 3.
Figure 3 presents that ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] may occur with all 29 predicates
(29/29 = 100%). Secondly, ਮਸਾ-ਕੁ/ਈ [məsɑ-kʊ/ɪ] appears successfully with
26 predicates (26/29 = 89.65%). ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] receives the
lowest percentage (20/29 = 68.96%) in relation to the predicates.
Further, we have studied these four fuzzy quantifiers in grammatical contexts.
We selected four predicates (ਚੱਲੀਂ/tʃəli:; ਖਾਈਂ/khɑi:; ਦਈਂ/dəi:; and ਆਈਂ/ɑ:i)
and found that ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] does not tolerate (perfect or negative) contexts
and gives a zero result, whereas the other quantifiers appear and adjust with the
predicates. See Figs. 4, 5, 6, and 7.
Figure 4 shows that the selected predicates can be mapped onto ਮਾੜਾ-ਜਾ
[mɑɽɑ-ɟɑ]. It appears with each predicate form (4/4 = 100%) and is
fully adjustable.
Figure 5 shows that ਭੌਰਾ-ਕੁ [pòrɑ-kʊ] appears only with the predicates ਚੱਲੀਂ/tʃəli:,
ਖਾਈਂ/khɑi:, and ਦਈਂ/dəi:, giving a (3/4 = 75%) result. On the
other hand, it does not come with the predicate ਆਈਂ/ɑ:i.
Figure 6 suggests that ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] cannot appear or adjust with the
above-mentioned four predicates, so it gives only a (0/4 = 0%) result.
Figure 7 shows that ਮਸਾ-ਕੁ/ਈ [məsɑ-kʊ/ɪ] can appear with all except ਖਾਈਂ/
khɑi:. It gives the same result (3/4 = 75%) as we got in the case of ਭੌਰਾ-ਕੁ
[pòrɑ-kʊ].
In this way, we have found that only ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ], one type of fuzzy quantifier,
qualifies with (29/29 = 100%) and (4/4 = 100%) results in both the
Table 8 and Table 9 data sets. In brief, Figs. 8 and 9 summarize the same results in a
more concrete way, clearly showing the data entries of each fuzzy quantifier
against each predicate in total numbers.
The study of fuzzy quantifiers in predicate logic was noticed long ago. It may
be related to linguistics, philosophy, mathematics, computer science, and
engineering. While discussing natural languages, we primarily focus on Punjabi,
and contrast it with Hindi, to generalize forms of fuzzy quantifiers like ਭੌਰਾ-ਕੁ
[pòrɑ-kʊ]; ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ]; ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i]; and ਮਸਾ-ਕੁ [məsɑ-kʊ], and we
have prepared the mapping plan. It should be remembered that this is a first,
initial-level observation for Punjabi, so the selection of data sets and the mapping
plan may be revised in future. Based on two types of data sets, (1) a data set with 29
predicates and 4 fuzzy quantifiers and (2) a data set with 4 predicates and 4 fuzzy
quantifiers, we have found that only the ਮਾੜਾ-ਜਾ [mɑɽɑ-ɟɑ] type successfully
achieved 100% results, whereas the ਜਮ੍ਹਾ-ਈ [ɟəmɑ-i] type did not appear a
single time with any predicate in the second data set.
Acknowledgements I would like to thank Vanita Chawadha (a Ph.D. scholar at JNU), who
helped me compare the data sets and compile the results.
References
1. Bach E, Jelinek E, Kratzer A, Partee BB (eds) (2013) Quantification in natural languages, vol
54. Springer Science and Business Media
2. Gutiérrez-Rexach J (ed) (2003) Semantics: generalized quantifiers and scope, vol 2. Taylor
and Francis
Abstract Twitter is one of people’s favorite social media platforms, used for
sharing thoughts about different aspects of life, whether emotional like ‘love’, ‘motivation’,
‘dedication’, business-related like ‘marketing’, ‘startup’, ‘blogging’, or health-related like ‘gym’,
‘fitness’, ‘food’, and similar areas. People follow hashtags for topics of interest to them.
Agreement with a tweet can be measured by likes or retweets. This paper works with
pure linguistic features rather than embeddings in vector space via TFIDF
or Doc2Vec: it collects tweets on such hashtags and classifies, in the form of a
grade, the level of likes a tweet will get, using pure linguistic features.
1 Introduction
With the advent of social media platforms, most of the population is engaged on these
platforms in their day-to-day lives. Twitter has been one of the most preferred platforms
for expressing opinions about primitive spheres of daily life. Twitter is considered
more convenient by people owing to its easy and light user interface, with little to
no complex features to offer. People use Twitter to connect in real time with people who have
similar interests. Hashtags are added to a tweet so that the community members
can take part in the conversation. Twitter has also been a preferred platform for
marketing and is recently also being utilized by the education sector. People have the
choice to retweet, like, or comment on a tweet. Retweets are often used to promote
content on Twitter.
Previous literature has mostly focused on specific areas like politics, celebrities,
and sports, aiming to study the popularity of a tweet with retweets
as the popularity measure. These studies have considered all aspects, including user details
(user followers, whether the user is verified), tweet details (time of the tweet,
comments, likes), and the text of the tweet, either via vector representations or linguistic features,
to train a machine learning model for predicting retweets. This paper tries to use only
linguistic features of the text, without vector space representations, to predict
the possible range of the number of likes a tweet will get. It focuses on tweets related
to primitive aspects encompassing emotional, health, and other mundane areas. We
collect recent tweets from Jan 2019 to Dec 2019 and use classification algorithms to
predict the like_grade (the bracket of the number of likes) for a tweet.
In this paper, we try different combinations of features and classification
algorithms on the dataset.
2 Related Work
The popularity of a tweet can be measured using the number of ‘retweets’, ‘likes’, and
‘comments’. Earlier work [1] predicted popularity using ‘retweets’
as the measure of popularity. Researchers [2] have also tried to analyze user influence
on Twitter using various complex techniques. We try using ‘likes’ as the agreement
measure: ‘retweets’ can be attributed more to the promotion of content, whether in
favor, against, or simply promotion for business purposes. People retweet
posts to introduce their followers to someone new; a retweet is used when the goal
is to amplify a message. People like a tweet when they are in favor
of it, so ‘likes’ appear to be the more appropriate indicator of agreement.
3 Dataset
We limited the data to recent tweets from one year (Jan 2019 to Dec 2019), rather than
older tweets, to keep up with the latest writing style and trends on Twitter. We did
not restrict our domain to any specific area, to keep the study generic and open.
Data have been collected based on common hashtags and are not specific to any sector.
3.1 Collection
3.2 Filtering
As the paper focuses on linguistic features, we filtered out tweets
that had external links in the form of images, videos, or websites. This reduced the
number of tweets to about 12 K records.
3.3 Preprocessing
We processed the data for duplicate rows and removed them, leaving about
11 K records. The reduction can be attributed to duplication due to multiple
hashtags: a tweet could have appeared multiple times under different hashtags.
4 Methodology
Figure 1 shows the flow, starting from the data in the form of tweets through to
the final machine learning model training and predictions.
We studied the available approaches for dealing with textual data in natural language
processing (NLP). One popular strategy is to represent text as a multi-dimensional
vector and use classical machine learning algorithms like support vector
machines (SVM) and K-nearest neighbors (KNN). Although this approach has been
successful to some degree, it is likely to lose linguistic features such as user sentiment.
Therefore, we approach the problem using linguistic features rather than TFIDF
[3] or Doc2Vec [4], which convert text into vectors. Researchers [5] have studied linguistic
features and discussed their importance for text quality.
1 https://github.com/taspinar/twitterscraper.
572 L. Singh and K. Gautam
We use the following features that depict the user’s mood, user writing tone, style of
writing, etc.
F1-Text Sentiment (t-sentiment)
We use VADER sentiment analysis2 [6] to get the sentiment of the text of the tweet. We
use the compound sentiment from the API. It is a number from −1 to +1 exclusive. It
depicts the tone of the writer, whether positive, negative, or neutral, depending
on the number.
F2-Hashtag Sentiment (h-sentiment)
Hashtags are an important part of a tweet, used to share it among people with
similar interests. The sentiment of the hashtags is an important predictor of the kind
of writer and of the followers of the hashtag. We use this along with the text sentiment,
again using VADER sentiment analysis.
2 https://pypi.org/project/vaderSentiment/.
An Attempt on Twitter ‘likes’ Grading Strategy Using Pure … 573
We transform the number of likes into a grade (like_grade). Each grade
corresponds to a bracket of the number of likes, obtained dynamically from the given
dataset. We use the following method to obtain the grade brackets rather than using
hardcoded values.
3 https://pypi.org/project/textstat/.
x = mean(likes)
sd = std(likes)
A1 = x + sd
A2 = x + 0.95 ∗ sd
A3 = x + 0.9 ∗ sd
...
A21 = x
A22 = max(0, x − 0.05 ∗ sd)
A23 = max(0, x − 0.1 ∗ sd)
...
A41 = max(0, x − sd)
where A1, A2, …, A41 are the grades, x is the arithmetic mean of the number of likes
in the given dataset, and sd is the standard deviation of the number of likes in the given
dataset. We keep a large number of grades to keep the bracket sizes small.
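The bracket construction above can be sketched in code. This is a minimal illustration under our own assumptions: the paper defines only the threshold values, so the helper names and the rule "assign the first grade whose threshold the like count reaches" are ours.

```python
# Sketch of the dynamic grade brackets A1..A41 described in the text.
from statistics import mean, stdev

def grade_thresholds(likes):
    """Build the 41 descending thresholds: A1 = x + sd down to A21 = x,
    then A22 = max(0, x - 0.05*sd) down to A41 = max(0, x - sd)."""
    x, sd = mean(likes), stdev(likes)
    ts = [x + (1 - 0.05 * k) * sd for k in range(21)]          # A1..A21
    ts += [max(0.0, x - 0.05 * k * sd) for k in range(1, 21)]  # A22..A41
    return ts

def like_grade(n_likes, thresholds):
    """Assign the first grade whose threshold the like count reaches."""
    for i, t in enumerate(thresholds, start=1):
        if n_likes >= t:
            return f"A{i}"
    return "A41"
```

Note that when the standard deviation is large relative to the mean, max(0, x − 0.05 ∗ sd) already clips to 0, so every tweet with fewer likes than the mean lands in A22; this is consistent with the dominance of A22 reported in the conclusion.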
Classification has been one of the primitive areas of machine learning, and a
number of algorithms have evolved in this area. We use the following most
popular classification models in our study: support vector machine (SVM),
K-nearest neighbors (KNN) with 50 neighbors, decision tree, and random forest with 300
estimators.
Each of these models has already been used for multi-class classification problems
and is not limited to binary classification problems, as logistic regression is. Our train–
test split of the data is 4:1; we use 80% of the data for training and 20% for testing the
models. Since the distribution of grades can be highly random, we want
sufficient data for training but also enough data for testing,
so 4:1 seems a good fit.
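The setup above can be sketched as follows. This is an assumed reconstruction, not the authors' exact code: the function name, random seeds, and any hyperparameters beyond those stated in the text are our own.

```python
# Sketch of the experimental setup: four classifiers and a 4:1 train-test split.
# X and y stand in for the real tweet feature matrix and like_grade labels.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def evaluate_models(X, y, seed=42):
    """Train each model on 80% of the data and report test accuracy on 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)  # the 4:1 split
    models = {
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(n_neighbors=50),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
        "RandomForest": RandomForestClassifier(n_estimators=300, random_state=seed),
    }
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```

Calling `evaluate_models(X, y)` on the feature matrix and like_grade labels returns a model-name to accuracy mapping for the four classifiers.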
5 Feature Engineering
Our supervised machine learning model has to learn a function ‘f ’ such that it is able
to predict ‘y’ (like_grade) based on the ‘x’ (features) (Fig. 2).
Fig. 2 Features F1, F2, F3, F4, F5, an ML model which will train on the dataset and learn f (a
mapping) from X (a combination of these features) to Y (like_grade)
6 Results
The results of the experiments with all four machine learning models are
given in Tables 1, 2, 3 and 4. Each table shows the input features used for
that model and the corresponding accuracy obtained using those features. We trained
each model using individual features and also using all the features together.
7 Conclusion
All the algorithms performed fairly decently in terms of accuracy. SVM and KNN
were the best algorithms in terms of accuracy; the decision tree was not as accurate
as the other three. Although the algorithms performed well in terms of accuracy,
they were not able to predict the ‘like_grade’ correctly across all grade types
in the test dataset. They failed to recognize the like_grades that were fewer in
number and mostly predicted A22 (0, mean(likes)), which had the maximum number
of tweets. The decision tree model did not perform best in terms of accuracy but was
able to assign grades to a few classes other than A22, whereas the others were
dominated by the large number of tweets with like_grade A22.
We could further reduce the bracket size of the like_grade. This is challenging, since
by reducing it further we slowly drift from classification to regression; it is
difficult to strike a balance where most of the tweets do not fall into a single bracket
while the problem still remains in the proximity of classification.
Another solution to the dominance of one grade is to obtain a large enough
dataset with ample tweets falling into each like_grade bracket. This would help the
model learn better and differentiate between the like_grades.
We could also try to build an artificial neural network (ANN), which might eventually
yield better results that are not biased toward a single like_grade.
We omitted tweets with external links during the filtering phase. These
tweets could be used if we incorporated the effect of external links into our models:
we could add simple features such as link_count (count of links) or has_img (whether an
image is present), or complex ones such as applying a convolutional neural network
(CNN) to the images.
The complete code and dataset are available on GitHub,4 and can be used for
further experimentation.
References
1. Huang D, Zhou J, Mu D, Yang F (2014) Retweet behavior prediction in Twitter. In: 2014 seventh
international symposium on computational intelligence and design, Hangzhou, 2014, pp 30–33
2. Riquelme F, González-Cantergiani P (2016) Measuring user influence on Twitter: a survey. J Inf
Process Manag 52. https://doi.org/10.1016/j.ipm.2016.04.003
3. Shahmirzadi O, Lugowski A, Younge K (2018) Text similarity in vector space models: a
comparative study
4. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In:
Proceedings of the 31st international conference on international conference on machine
learning—volume 32 (ICML’14). JMLR.org, II–1188–II–1196
5. McNamara D, Mccarthy P (2010) Linguistic features of writing quality. Written Commun WRIT
Commun 27:57–86. https://doi.org/10.1177/0741088309351547
6. Hutto CJ, Gilbert EE (2014) VADER: a parsimonious rule-based model for sentiment analysis of
social media text. In: Eighth international conference on weblogs and social media (ICWSM-14).
Ann Arbor, MI
4 https://github.com/singh-l/Twitter_Like_Grade.
Groundwater Level Prediction
and Correlative Study with Groundwater
Contamination Under Conditional
Scenarios: Insights from Multivariate
Deep LSTM Neural Network Modeling
Abstract Groundwater is the primary source of drinking water and irrigation in
India, and over the last few years, due to the population burst across the nation, there
has been a sharp decline in the groundwater level (availability). A constant pressure
balance between the groundwater and the seawater level prevents contaminated water
from seeping in, and due to the lowering level there is an alarming risk of water
contamination across India. In this paper, we aim to find the relation between the
groundwater level and the ground contamination condition through LSTM predictive
modeling. The proposed algorithm for groundwater prediction is based on a conditional
approach through deep LSTM modeling, and the ground contamination is calculated
using an aggregated scoring approach modeled on the Euclidean distance concept.
Lastly, a correlative study is provided to analyze the relation between the said
variables. There is a high negative correlation among them, indicating that the loss of
groundwater level is increasing the contamination level across the studied zones. The
experiment has been carried out on data from three eastern Indian states,
viz. West Bengal, Odisha, and Bihar, for the time span from 2004 to 2017.
1 Introduction
examined from the year 2004 to 2017. The major cause of water contamination is the
decreasing water-pressure barrier of the groundwater level and the sinking of seawater
into it. Afterward, a correlation study has been carried out with two factors, viz. groundwater
level (G.W. level) and water contamination level, predicting the upcoming
conditions of these states under the stated circumstantial scenarios.
The novelty of the work lies in the prediction of the upcoming condition of the G.W.
level in the stated states under different scenarios, along with the condition of the water
pressure level with respect to saline water, which will eventually lead to a water
contamination problem across Eastern India.
The paper is structured in the following manner: Sect. 2 contains the literature
review, followed by the LSTM modeling of groundwater level prediction along with
the simulation results of different scenarios in Sect. 3. Section 4 contains the predictive
modeling of water contamination across the states, and Sect. 5 contains the correlation
study between groundwater level availability and groundwater contamination
conditions. Lastly, Sect. 6 contains the concluding remarks and the further scope of the study
[7–12].
2 Literature Review
Sarkar and Pandey [1] estimated and predicted stream water quality using
an ANN. They used a three-layer neural net and predicted the DO concentration
downstream of Mathura city. Behzat et al. [2] used an SVM and an ANN to predict
the groundwater level across a river bed; they showed that the SVM provides better
results than the ANN when the data are scarce or suffer from class imbalance.
Galavi et al. [3] proposed a modified ARIMA model to predict the groundwater
level; the modified version outperformed both the traditional ARIMA model and
an adaptive network-based fuzzy inference system. Wang et al. [4] proposed a hybrid
model coupling ARIMA with ensemble empirical mode decomposition
(EEMD-ARIMA), and the results showed that it easily outperforms the traditional
ARIMA model. Yao et al. [5] optimized an Elman-RNN with a genetic algorithm,
and the results showed a significant improvement in the accuracy metrics. Fan
et al. [6] used multivariable linear regression to model the relationship and
predicted the outcome on the Yangtze River bank. Zang et al. [7] proposed a
PCA-based multivariate autoregressive time series analysis of the data. Yang
et al. [8] presented time series forecasting with a random forest regression
model. Seo et al. [9] presented a wavelet-based ANN model to forecast the
groundwater level. Rashid et al. [10] developed a benchmark modified LSTM model,
optimizing it with several metaheuristics, viz. harmony search (HS), gray wolf
optimization (GWO), and ant lion optimization. Comparative results against a
traditional RNN showed the hybrid model performing better. Nawi et al. [11]
proposed a data classifier model using the cuckoo search
582 A. Chatterjee et al.
algorithm as its optimizer; the hybrid RNN plus cuckoo search showed significant
results.
One of the most promising algorithms for predicting time series data is the
Elman-RNN, or simply RNN. An RNN works by taking the previous state's output as
part of its next input; thus, the state at time t depends on t − 1. Though
highly effective, it has one major drawback: the vanishing gradient problem.
The long short-term memory (LSTM) algorithm fills this gap of the RNN by
introducing gates that are capable of remembering output states.
a. Theoretical Framework
The LSTM architecture mitigates the major problem of the RNN architecture,
i.e., the vanishing gradient problem. It reduces the error in the process
by passing a constant error flow through the hidden cells rather than through the
activation function. In the LSTM architecture, memory allocation occurs through
four gates, namely the forget gate f, the input gate i, the input modulation gate
g, and the output gate [6].
The forget gate processes the output of the previous state h_{t−1} and mainly
removes, or forgets, irrelevant data present in the information passed through
the model. The activation function most commonly used in the forget gate is the
sigmoid function.
The input gate adds information required for further computation. A sigmoid
activation function updates the value in the gate, while a tanh activation
function is used for the candidate value and for scaling. The equations for the
gates are given below.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (1)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (2)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (3)$$
$$C_t = f_t \times C_{t-1} + i_t \times \tilde{C}_t \qquad (4)$$
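The gate updates of Eqs. (1)–(4) can be sketched as a single cell step in plain numpy. This is a minimal illustration, not the trained model from the paper; the function name and the explicit weight arguments are ours, and the output gate (not spelled out above) is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c):
    """One LSTM cell update following Eqs. (1)-(4):
    every gate acts on the concatenated vector [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate, Eq. (1)
    i_t = sigmoid(W_i @ z + b_i)         # input gate, Eq. (2)
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate values, Eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state, Eq. (4)
    return c_t
```

With all weights zero, both gates evaluate to 0.5 and the candidate to 0, so the cell state simply halves, which is a quick sanity check on the gate algebra.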
Other minor recharge and discharge terms are taken as constant, as their values
are negligible in the analysis but are kept for equation balancing. As we have
divided our season into two major parts, we can use biannual modeling for the
analysis. Thus, for a biannual time step t_b, R and W are calculated using the
following equations [15–19] (Fig. 2).
The model predicts one time step ahead, and the proposed algorithm is iterative
in nature. Initially, four time steps are fed in; then successive groundwater
levels are predicted under the given conditional circumstances, and depending on
those, the algorithm flows into its next iteration taking the updated variables.
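The iterative one-step-ahead scheme described above can be sketched as follows. This is a generic sketch under the stated assumptions: `predict_one_step` stands in for the trained LSTM's prediction call (a name we introduce), and the four-step seeding window matches the text.

```python
import numpy as np

def iterative_forecast(history, predict_one_step, horizon, window=4):
    """Roll a one-step-ahead model forward `horizon` steps.
    `predict_one_step` maps the last `window` observations to the next
    value (e.g. a trained LSTM's prediction); each new prediction is
    fed back as input for the following step."""
    series = list(history)
    preds = []
    for _ in range(horizon):
        x = np.asarray(series[-window:])   # last four time steps
        y_next = predict_one_step(x)
        preds.append(y_next)
        series.append(y_next)              # feed the prediction back
    return preds
```

Any callable with the right signature can be plugged in, which makes the feedback loop easy to test independently of the network.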
Fig. 2 Recharge and discharge of water in graphical representation. Source Created by Author
Groundwater Level Prediction and Correlative Study … 585
Conditional Scenarios
Good rainfall and controlled water usage (C.S. 1). In this circumstantial
scenario, we assume that heavy rainfall (better than the mean) has occurred and
that water recharge is more than usual. The other parameter is groundwater usage;
here we assume usage is controlled, i.e., no wastage is recorded for the year.
Good rainfall and uncontrolled water usage (C.S. 2). In this case, heavy rainfall
(10% higher than the mean) is assumed, along with high wastage of groundwater
recorded throughout the year. Thus, there will be a shortage of groundwater in
the next year owing to the high misuse of water.
Poor rainfall and controlled water usage (C.S. 3). In this case, poor rainfall
(20% lower than the mean) is assumed, along with strictly controlled water usage
across the year.
Poor rainfall and uncontrolled water usage (C.S. 4). In this case, poor rainfall
(20% lower than the mean) is assumed, along with high wastage of water recorded
across the year [20].
Proposed Algorithm
The simulation results show a lack of groundwater in the worst-case scenario for
the upcoming period, as the trend heads downward. The best-case scenario shows a
promising water content, rising above the past years' trend. The average-case
scenarios follow the usual trend with minor changes in the curve [22–24].
Evaluation metrics. The deep LSTM model is fitted to time series data, so the
model is regressed along the variables; we therefore use the root-mean-square
error for model evaluation, represented by Eq. 9. Table 1 shows our RMSE
values (Fig. 7).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (9)$$
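Eq. 9 can be implemented in a few lines; this is a minimal sketch with a function name of our choosing.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error of Eq. (9)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```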
The quantities measured come mostly from sensors, so it is safest to preprocess
the data onto a common scale before fitting the model. We have taken data across
various districts of the said states, so the upper and lower values vary.
In various places the data were missing or changed drastically; thus, to increase
the robustness of the predictive modeling, linear interpolation is used. The
relationship between two known data points and one unknown point is treated as
linear and formulated using Eq. 10.
$$x_{k+i} = x_k + i \cdot \frac{x_{k+j} - x_k}{j} \qquad (10)$$
where
x_{k+i} = missing data,
x_k = known data before the gap,
x_{k+j} = known data after the gap.
After interpolating the data, error handling is performed. The computation of
corrected versus erroneous data is shown in Eq. 11.
$$x_k = f(x) = \begin{cases} \dfrac{x_{k-1} + x_{k+1}}{2}, & \text{if } |x_k - x_{k-1}| > \beta_1 \text{ or } |x_k - x_{k+1}| > \beta_2 \\ x_k, & \text{else} \end{cases} \qquad (11)$$
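The gap filling of Eq. 10 and the outlier correction of Eq. 11 can be sketched as two small helpers (names are ours; the thresholds β1, β2 are left as arguments since the paper does not state their values):

```python
def fill_gap(x_k, x_kj, i, j):
    """Linear interpolation of Eq. (10): estimate the missing value
    i steps after the known x_k, with x_kj known j steps after x_k."""
    return x_k + i * (x_kj - x_k) / j

def correct_outlier(x_prev, x_k, x_next, beta1, beta2):
    """Error handling of Eq. (11): replace x_k by the mean of its
    neighbours when it jumps by more than beta1 / beta2."""
    if abs(x_k - x_prev) > beta1 or abs(x_k - x_next) > beta2:
        return (x_prev + x_next) / 2.0
    return x_k
```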
The collected data have various upper and lower limits. We therefore form an
aggregate index comprising eight factors, normalizing them since the composite
unit mixes different units of measurement. Each parameter is normalized by Eq. 12.
$$\hat{X}_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad (12)$$
where
$\hat{X}_i$ = normalized value, lying in [0, 1],
$X_{\min}$ = minimum value observed,
$X_{\max}$ = maximum value observed.
To compute the relationship between groundwater level and water contamination
level, an aggregate scoring method is needed. We give equal weights to all the
parameters taken for analysis. The eight indices may be represented in eight
dimensions, each with minimum value 0 and maximum value 1.
The aggregate water contamination score uses the weighted Euclidean distance
from the ideal point (1, 1, 1, 1, 1, 1, 1, 1). The calculation is given in Eq. 13.
$$A_{gw} = 1 - \sqrt{\frac{(1-T)^2 + (1-pH)^2 + (1-C)^2 + (1-B)^2 + (1-N)^2 + (1-Fc)^2 + (1-Tc)^2 + (1-F)^2}{8}} \qquad (13)$$
where
T = temperature,
pH = pH of the water,
C = conductivity,
B = B.O.D.,
N = nitrate level,
Fc = faecal coliform,
Tc = total coliform,
F = fluoride level.
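The normalization of Eq. 12 and the aggregate score of Eq. 13 can be sketched as follows (a minimal illustration; the function names are ours, and the eight inputs are assumed already normalized to [0, 1] via Eq. 12):

```python
import math

def min_max_normalize(x, x_min, x_max):
    """Eq. (12): rescale an observation into [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def aggregate_contamination_score(params):
    """Eq. (13): equally weighted Euclidean distance of the eight
    normalized parameters (T, pH, C, B, N, Fc, Tc, F) from the ideal
    point (1, ..., 1), folded into a 0-1 score."""
    assert len(params) == 8
    dist_sq = sum((1.0 - p) ** 2 for p in params)
    return 1.0 - math.sqrt(dist_sq / 8.0)
```

At the ideal point the score is 1, and at the all-zero point it is 0, matching the interpretation that 1 denotes the highest contamination index value.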
With the data preprocessed, we now build the empirical model that feeds the data
into our neural network. Let the water quality be taken at a fixed place and time,
with parameter number j; then we can write:
$$S_{i,n} = \left\{(Y_{i,1}, T_1), \ldots, (Y_{i,n}, T_n)\right\} \qquad (14)$$
The fed data are linearly imputed using Eq. 15.
$$L(t) = Y_{i,u} + (t - T_{i,u}) \cdot \frac{Y_{i,u} - Y_{i,v}}{T_{i,u} - T_{i,v}} \qquad (15)$$
The model is simulated on the normalized, processed dataset, and predictions are
made for the coming years to gain insight into how the water contamination level
varies across the years in the said states. As the data are normalized and
aggregated using Formulas 12 and 13, respectively, 1 denotes the highest water
contamination and 0 the lowest (Figs. 8, 9 and Table 2).
Through our modeling in Sect. 3, we built a multivariate LSTM neural net model
under different conditional scenarios; the output curves show that under the
best-case scenario the groundwater level is much better
Fig. 8 Predictive curve for groundwater contamination level. Source Created by Author
than under the worst-case scenario, while the average case follows the usual
trend. In Sect. 4, we predicted the water quality level by taking eight water
contamination parameters; the entire dataset was processed and an aggregated
final score was generated using data normalization and Euclidean distance.
There is a constant water pressure balance between groundwater and seawater that
keeps seawater from seeping in and rendering the water unusable. In this section,
we present a correlative study between these two factors, viz. groundwater level
and groundwater contamination. We analyze how strongly these factors are
correlated across the years and how they influence each other. Along with this,
we examine how much of the change in water contamination is caused by the change
in groundwater level.
At first, we have examined the correlation value between the two variables using
Pearson’s correlation factor (Table 3).
The correlation values show a strong negative correlation between the variables.
This supports our assumption of water pressure balancing: the water is getting
contaminated with the gradual decrease of the groundwater level, leading to an
increase in water contamination.
To solidify our claims, we use an OLS model with the water level as the
independent variable and the contamination level as the dependent variable; from
the result we can conclude how much the contamination level increases per
one-unit change of groundwater level (Table 4).
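Both quantities used here, Pearson's correlation coefficient and the simple OLS slope, reduce to a few numpy lines. This is an illustrative sketch (function names are ours), not the exact statistical package used in the paper:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ols_slope(x, y):
    """Slope of the simple OLS fit y = a + b*x: the change in the
    contamination score per one-unit change in groundwater level."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc = x - x.mean()
    return float((xc @ (y - y.mean())) / (xc @ xc))
```

On a perfectly negatively related pair of series the coefficient is exactly -1 and the slope equals the constant rate of decrease, which mirrors the negative relationship reported in Tables 3 and 4.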
It is clear from the correlation result that these two variables are negatively
related: the constant loss of groundwater is unbalancing the water pressure
equilibrium, and water contamination is increasing. The OLS model suggests that
for every one-unit loss of groundwater there will be a 0.0356-unit increase in
the water contamination level.
Comparing the two graphs, i.e., Figs. 10 and 11, it can be concluded that during
the time when the groundwater level is decreasing, the contamination level is
Fig. 10 Future prediction curve for groundwater level. Source Created by Author
Fig. 11 Future prediction curve for aggregated groundwater contamination. Source Created by
Author
evidently increasing. Even from the correlation factor, it is observed that these
factors have a negative correlation. Hence, with increasing groundwater level,
the contamination decreases over the given period. Also, if the predicted plots
are considered for both factors, it can be observed that in the upcoming period
as well, the contamination level will decrease whenever the groundwater level
increases.
The decreasing groundwater level is a serious situation across India; together
with the rising level of groundwater contamination, it is threatening human
existence. In this paper, we analyzed and predicted the upcoming trend of
groundwater availability using LSTM modeling under different circumstantial
scenarios, assuming scenarios that affect the water level, and the simulation
results were taken up for analysis. We then predicted the trend of the
groundwater contamination level: the data were heavily preprocessed and an
aggregate score was generated using the Euclidean distance method. After
analyzing this trend, a correlative study was presented between the groundwater
level and the groundwater contamination level. We observed a negative correlation
between the said variables, indicating that the lowering groundwater level is
allowing unusable seawater to seep in, contaminating the remaining water. Lastly,
we showed predictions for the coming years using an interpolated moving average
of the time series.
The study could be extended by taking the water level for more states or across
the nation. Metaheuristic optimizers such as the ABC and LA algorithms could be
used to check the trend behavior, and the time span could be extended by 10 more
years for a better curve.
References
1. Sarkar A, Pandey P (2015) River water quality modelling using artificial neural network
technique. Aquatic Procedia 4:1070–1077
2. House PLA, Chang H (2011) Urban water demand modeling: review of concepts, methods,
and organizing principles. Water Resour Res 47(5)
3. Gwaivangmin BI, Jiya JD (2017) Water demand prediction using artificial neural network for
supervisory control. Nigerian J Technol 36(1):148–154
4. Coulibaly P, Anctil F, Aravena R, Bobée B (2001) Artificial neural network modeling of water
table depth fluctuations. Water Resour Res 37(4):885–896
5. Gulati A, Banerjee P (2016) Emerging water crisis in India: key issues and way forward. Indian
J Econ Special Centennial Issue 681–704
6. Barzegar R, Adamowski J, Moghaddam AA (2016) Application of wavelet-artificial intel-
ligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran.
Stochastic Environ Res Risk Assess 30(7):1797–1819
1 Introduction
2 Related Works
HSI technology was initially used in demanding Earth observation and remote
sensing applications such as vegetation monitoring, urbanization research, farm
and field technology, and surveillance. The need for quick and reliable
authentication and object recognition methods has intensified interest in
applying hyperspectral imaging to quality control in the agriculture, medicinal,
and food industries. Indeed, a physical material with rich spectral detail has
its own characteristic reflectance or radiance signature.
Hyperspectral remote sensors have a superior discriminating capability,
particularly for materials that are visually similar. These distinctive features
enable numerous uses in computer vision and remote sensing, e.g., military target
identification, inventory control, and medical diagnosis. There is, however, a
tradeoff between high spectral resolution and spatial accuracy. The benefits of
hyperspectral imaging over conventional approaches include reduced sample
processing, non-destructive design, fast acquisition times, and simultaneous
estimation of the spatial distribution of various chemical compositions [1].
A Novel Deep Hybrid Spectral Network for Hyperspectral … 599
One of the main problems is how HSI features can be effectively extracted.
Spectral-spatial features are now commonly used, and HSI classification
performance has steadily improved by moving from spectral features alone to
joint spectral-spatial features [2]. Deep learning models have been developed for
HSI classification to extract spectral-spatial characteristics; the core idea of
deep learning is to derive conceptual features from the original input using
superimposed multilayer representations.
With SAE-LR, the testing time improves over KNN and SVM, but training takes much
longer [3]. Methods such as WI-DL and QBC require more time for both testing and
training. Traditional image priors need to be integrated into the DHSIS method
to improve its accuracy and performance [4]. The longest training time is
observed on the Salinas scene dataset for the CNN proposed in [5].
Band selection is vital for choosing the salient bands before fusion with the
extracted hashing codes, decreasing training time and saving storage space. The
major drawback is the requirement of a large amount of properly labeled data for
model preparation.
3 Methodology
I—Original input
M—Width of input
N—Height of input
D—No. of spectral bands.
Principal component analysis (PCA): after dimensionality reduction, the number of
spectral bands is reduced from D to B.
(M − S + 1) × (N − S + 1) (3)
The 3D patch at location (α, β), represented by P_{α,β}, covers the width from α − (S − 1)/2 to α + (S − 1)/2 and the height from β − (S − 1)/2 to β + (S − 1)/2.
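The PCA band reduction and the patch extraction counted by Eq. (3) can be sketched in plain numpy. This is an illustrative sketch (function names are ours, and we use a direct SVD-based PCA rather than any particular library):

```python
import numpy as np

def pca_reduce_bands(cube, B):
    """Reduce the D spectral bands of an M x N x D cube to B
    principal components (plain-numpy PCA via SVD)."""
    M, N, D = cube.shape
    X = cube.reshape(-1, D).astype(float)
    X -= X.mean(axis=0)
    # right singular vectors = principal directions over the bands
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:B].T).reshape(M, N, B)

def extract_patches(cube, S):
    """All S x S neighbourhood patches over the full band depth;
    their count matches Eq. (3): (M - S + 1) * (N - S + 1)."""
    M, N, _ = cube.shape
    return [cube[a:a + S, b:b + S, :]
            for a in range(M - S + 1) for b in range(N - S + 1)]
```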
In a 2D CNN, the input data are convolved with 2D kernels. The convolution
computes the sum of the dot products between the input data and the kernel, and
the filter is strided over the input data to cover the full spatial dimension. A
3D convolution is achieved by convolving the 3D data with a 3D kernel. In the
hybrid model, the feature maps of the convolution layer are created for HSI data
using the 3D kernel over multiple contiguous bands within the input layer.
The 2D CNN is applied once before the flatten layer, bearing in mind that it
strongly discriminates the spatial information within the different spectral
bands without significant loss of spectral information, which is necessary for
HSI data. In the hybrid model, the total parameter count depends on the number
of classes in a dataset.
Table 1 represents the calculation of total trainable parameters of the proposed
hybrid CNN. The network consists of four convolutional layers and three dense
layers. Of the four convolutional layers, three are 3D convolutional layers and the
remaining one is the 2D convolutional layer (Table 2).
In the hybrid network, the count of trainable weight parameters for the Indian
Pines dataset is 5,122,176 and the count of patches is 14,641. The Adam optimizer
with the backpropagation algorithm is used to train the weights, which are
randomly initialized.
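Trainable parameter counts such as those in Table 1 follow a simple rule: a convolutional layer has (kernel volume × input channels + 1 bias) × filters weights, and a dense layer has (inputs + 1) × outputs. The helpers below sketch this bookkeeping; the layer shapes in the usage note are illustrative, not the paper's exact configuration:

```python
def conv_params(kernel_dims, in_channels, filters):
    """Trainable weights of a convolutional layer:
    (prod(kernel dims) * in_channels + 1 bias) * filters."""
    k = 1
    for d in kernel_dims:
        k *= d
    return (k * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return (in_units + 1) * out_units
```

For example, a hypothetical 3D layer with a 3 × 3 × 7 kernel, 1 input channel, and 8 filters has (63 + 1) × 8 = 512 parameters; summing such terms over the four convolutional and three dense layers reproduces the network's total.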
4 Dataset
The experiments were conducted on three hyperspectral datasets: Pavia University,
Indian Pines, and Salinas scene. In 1992, the AVIRIS sensor acquired the Indian
Pines dataset over the Indian Pines test site in northwest Indiana. IP images
have a spatial dimension of 145 × 145 pixels and 224 spectral bands with
wavelengths ranging from 400 to 2500 nm, of which 24 bands were omitted. The IP
dataset contains 16 mutually exclusive vegetation classes. Nearly 50% (10,249)
of the total 21,025 pixels carry ground truth information across the 16 classes.
The Pavia University dataset was acquired by the ROSIS sensor over Northern
Italy in 2001. It consists of 610 × 340 pixels, and the spectral information is
recorded at 1.3 m/pixel spatial resolution within 103 bands with wavelengths
ranging from 430 to 860 nm. The ground truth comprises nine classes of urban
land cover. Approximately 20% of the total 207,400 picture elements include
ground-truth information.
The Salinas scene dataset was collected in 1998 over the Salinas Valley, CA,
USA, by the 224-band AVIRIS sensor. The images are 512 × 217 pixels in spatial
dimension, and the spectral information is encoded in 224 bands with wavelengths
ranging from 360 to 2500 nm. For both the Salinas scene and Indian Pines, 20
spectral bands were discarded due to water absorption.
5 Results
The plots of accuracy and loss versus epoch indicate useful properties of model
training, such as the speed of convergence over epochs (slope).
The classified image obtained from the hybrid CNN is shown in Fig. 2.
Table 3 Accuracy comparison

Network          Accuracy (%)
SVM              88.18
2D CNN           89.48
3D CNN           90.40
Hybrid network   99.79
The number of epochs considered for the proposed hybrid CNN is 100. The loss
convergence and accuracy values are obtained for the validation and training
samples.
Table 3 presents the accuracy comparison of the proposed hybrid network against
the 3D CNN, 2D CNN, and a support vector machine. The hybrid spectral network
provides an accuracy of 99.79% (Fig. 3).
6 Conclusion
Hyperspectral image classification is not easy, as the ratio of the number of
spectral bands to the number of training samples is adverse. Three benchmark
hyperspectral datasets, Salinas scene, Pavia University, and Indian Pines, are
used for the classification. A 3D or 2D convolution alone cannot capture the
highly discriminative features, unlike the 3D and 2D hybrid
604 K. Priyadharshini and B. Sathya Bama
Fig. 3 Plot of Accuracy versus Epoch and loss versus Epoch for Indian Pines, Pavia University
and Salinas Scene
convolutions. The proposed model is more beneficial than the 2D CNN and 3D CNN,
and the 25 × 25 spatial dimension used is well suited to the method.
The experiments were carried out on three hyperspectral datasets to analyze and
compare the performance metrics. Classification of hyperspectral data using an
SVM provides an accuracy of 88.18%. The proposed model outperforms the 2D CNN
(89.48%) and 3D CNN (90.40%), providing an accuracy of 99.79%. The hybrid CNN is
computationally efficient compared to the 3D CNN, and it provides enhanced
performance with minimal training data.
References
1. Chang C-I (2003) Hyperspectral imaging: techniques for spectral detection and classification.
Springer Science and Business Media, vol 1
2. Camps-Valls G, Tuia D, Bruzzone L, Benediktsson JA (2014) Advances in hyperspectral
image classification: Earth monitoring with statistical learning methods. IEEE Signal Process
Mag 31(1):45–54
3. Liu P, Zhang H, Eom KB (2017) Active deep learning for classification of hyperspectral images.
IEEE J Select Top Appl Earth Observ Remote Sens 10(2)
4. Dian R, Li S, Guo A, Fang L (2018) Deep hyperspectral image sharpening. In: IEEE transactions
on neural networks and learning systems, vol 29, no 11
5. Yu C, Zhao M, Song M, Wang Y, Li F, Han R, Chang C-I (2019) Hyperspectral image classi-
fication method based on CNN architecture embedding with hashing semantic feature. IEEE J
Select Top Appl Earth Observ Remote Sens 12(6)
Anomaly Prognostication of Retinal
Fundus Images Using EALCLAHE
Enhancement and Classifying
with Support Vector Machine
Abstract Ophthalmic diseases are generally not serious, but they can also be
sight-threatening. Even though genetic eye disorders have their own significant
effect across generations, disorders arising from certain unhealthy practices can
induce serious conditions such as vision loss, retinal damage, and macular
degeneration, caused in young adults by smoking and so on. Against these odds,
detecting diseases well before they become threatening makes it much easier to
avoid major damage. The proposed system focuses on providing a first-level
investigation for detecting ophthalmic diseases, helping subjects identify
anomalous behavior early and initiate remedial measures. The retinal fundus
images used undergo pre- and post-processing stages and are then trained, tested,
and classified based on disorders such as vitelliform macular dystrophy (VMD),
retinal artery and vein occlusion (RAVO), Purtscher's retinopathy (PR), and
diabetic macular edema (ME).
1 Introduction
Sakthi Karthi Durai et al. [1] discuss the various diseases detected from
retinal fundus images: age-related macular degeneration (AMD), cataract,
hypertensive retinopathy, and diabetic retinopathy. Various classifiers and
preprocessing methods were reviewed, of which adaptive histogram equalization
plays the major role in preprocessing, and SVM gave the best output.
Kandpal and Jain [5] and Sarika et al. [6] deal with various methods of
enhancing the color texture features used in the preprocessing of retinal fundus
images; the CLAHE method with edges and dominant orientations proved best of
all. The technique simply suppresses the non-textured pixels and enhances the
textured ones so that a better-quality image is obtained, as estimated with the
help of BRISQUE.
Shailesh et al. [7], in their review, compare the techniques employed so far,
from preprocessing to classification. The initial preprocessing stage mostly used
CLAHE together with color moments for an effective image retrieval process;
CLAHE proved flexible in detecting the non-textured regions. Segmentation
techniques deployed were ROI selection, green channel extraction, quantization,
thresholding, etc. A series of classifiers was incorporated, among which the
support vector machine (SVM), radial basis function neural network (RBFNN),
artificial neuro-fuzzy (ANF) systems, artificial neural network (ANN), random
forest (RF), and decision tree performed well.
Onaran et al. [8] discuss findings on Purtscher's retinopathy (PR), which is
caused mainly by a traumatic state and mainly appears as a small series of
cotton wool spots. Two image types are used: OCT and fundus. Comparing the
results, fundus images provided a good basis for detecting the accumulation of
small bilateral cotton wool spots, mainly on the posterior pole of the retina.
Anomaly Prognostication of Retinal Fundus Images Using EALCLAHE … 607
Xiao et al. [9] discuss in depth a vector quantization technique that is
effective in detecting macular edema (ME). As the color images are 24-bit, the
clustering vectors divide the image into eight levels, with seven threshold
values injected into the 8 × 2 × 2 segmentation process. Most of the information
is thus extracted from this segmentation technique for processing the RGB image.
Anusha et al. [10] explain the feature extraction techniques used to obtain the
color moments, mainly for image retrieval, and the values of the mean, standard
deviation, and skewness of the moment distribution. Textured features such as
entropy and energy are also obtained to understand the features well. The image
retrieval process is thus effectively fast using color moments.
3 Proposed System
The proposed system utilizes a CAD system for the diagnosis of physical
abnormality from medical data. Firstly, the retinal fundus images are
preprocessed with the help of a histogram technique and are also quantized to
obtain the intermediate-level color information. Secondly, segmentation of the
images is done using thresholding and quantization, mainly for post-processing,
to avoid missing any information that could be useful in detecting abnormality
of the retinal surface. Thirdly, features are extracted to obtain the texture
details, feature-differentiation information, energy, mean amplitude, median,
and standard deviation, making it easy for the algorithm to reach a firm decision
in disease detection. Finally, all the above steps are coupled to the training
step for classification using an SVM classifier, which is then tested for
accuracy (Fig. 1).
4.1 Dataset
Retinal fundus images were initially collected from online platforms such as
MESSIDOR, DRIVE, STARE, KAGGLE, CHASE, ARIA, and ADCIS, which aim at providing
datasets for research purposes. The process is initialized with the acquisition
of images through a digital camera or fundus images obtained from the dataset
repository.
608 P. Raja Rajeswari Chandni
Retinal fundus images are prone to periodic noises such as salt-and-pepper and
Gaussian noise; thus, a firm preprocessing technique is required to strengthen
the overall region of the image. The introduced EALCLAHE technique initially
processes the RGB components separately, along with gamma correction and a
G-matrix, to produce the RGB dash components. The clip limit is set with a
Rayleigh distribution, upon setting the edge threshold limit to leave the strong
edges with minimum intensity amplitude intact and setting the amount of
enhancement for smoothing the local contrast. Further, the enhanced images are
denoised with denoising convolutional neural networks (DnCNN), which use
pretrained nets and offer better noise reduction than the alternatives. The
flowchart of the preprocessing step is given in Fig. 2. For testing the image
quality, the PSNR, SSIM, NIQE, and BRISQUE quality metrics are used (Fig. 3).
The testing parameters for the noise removal used in this proposed method are:
• Peak Signal-to-Noise Ratio (PSNR):
Peak signal-to-noise ratio (PSNR) [11] is employed to calculate the variation
between the original and denoised images of size M × N. It is estimated from the
SNR equation and expressed in decibels (dB). The original and denoised images
are represented as r(x, y) and t(x, y), respectively.
$$PSNR = 10 \log_{10}\frac{255^2}{MSE} \qquad (1)$$
where 255 is the highest intensity value in the grayscale image and MSE is the
mean-squared error, given by
$$MSE = \frac{1}{M \times N}\sum_{x=1}^{M}\sum_{y=1}^{N}\left[r(x, y) - t(x, y)\right]^2 \qquad (2)$$
• Structural Similarity Index Measure (SSIM):
The SSIM index is calculated over a selected window pair x and y of size N × N
and may be computed in the following way:
$$SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (3)$$
where $\mu_x, \mu_y$ are the window means, $\sigma_x^2, \sigma_y^2$ the variances,
$\sigma_{xy}$ the covariance, and $c_1, c_2$ small stabilizing constants.
The testing parameters of the proposed system for the quality metrics are
tabulated below (Table 1 and Fig. 4).
Though the previous step improves the image data by suppressing undesirable
distortions, segmentation is also utterly necessary to fine-tune the properties
of the image.
The next step is the conversion from the RGB color space to HSV (hue,
saturation, value) to obtain the luminance values of each color, which provides
better information for the feature extraction process. The individual threshold
values for each color are obtained through range-set segmentation and are listed
in Table 2. The images are then quantized into eight levels, with the aim of
obtaining the complete color information for the hue, saturation, and value of
the RGB color space (Fig. 5).
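The RGB-to-HSV conversion and eight-level quantization described here can be sketched per pixel with the standard library's `colorsys` module. This is an illustrative sketch with a uniform quantizer; the paper's actual thresholds come from Table 2, so the uniform bins below are our assumption:

```python
import colorsys

def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to (h, s, v), each in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

def quantize_channel(v, levels=8):
    """Uniformly quantize a [0, 1] channel into `levels` bins,
    returning the bin index 0 .. levels-1."""
    idx = int(v * levels)
    return min(idx, levels - 1)   # clamp v == 1.0 into the top bin
```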
Feature extraction is the step that forms the basis for predicting diseases
through classification.
The proposed system performs color texture analysis using Gabor filters. Gabor
filters are flexible, offering more degrees of freedom than Gaussian derivatives.
During texture analysis, Gabor features are extracted to analyze a particular
frequency of the image in a specified direction around the region of analytical
interest. A wavelet transform is also used to support the texture analysis in
obtaining a combined set of feature vectors. Further, image retrieval makes the
diagnostic process more effective by differentiating images based on their color
features (Table 3).
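A Gabor filter is a Gaussian envelope modulated by a sinusoid at a chosen wavelength and orientation; filtering the image with a bank of such kernels yields the orientation- and frequency-selective responses mentioned above. The sketch below builds the real part of one kernel in plain numpy (an illustration with our own parameterization, not the paper's exact filter bank):

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real Gabor kernel: a Gaussian envelope times a cosine carrier
    at the given wavelength, oriented along angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate coordinates into the filter's orientation
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + y_t ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_t / wavelength)
    return envelope * carrier
```

Convolving the image with kernels at several orientations and wavelengths produces the Gabor feature maps used for texture description.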
This step helps in differentiating between similar features. The features
obtained above, namely color moments, mean amplitude, standard deviation,
energy, entropy, and color textures, are fed to the classifier so that it
performs at its best.
SVMs are efficient classifiers widely used in machine learning research. Several
kernel approaches exist, including polynomial, radial basis function, and neural
network kernels. The linear SVM classifier maps points into separated categories
such that they are divided by the widest possible gap. A hyperplane is selected
to classify the provided dataset, and the plane must satisfy the condition
Yi [(w · xi ) + b] ≥ 1 − εi , εi ≥ 0 (4)
where
w = the weight vector
b = the bias
εi = the slack variable.

The texture features are computed from the normalized gray-level co-occurrence matrix p(x, y) of size N × N as follows:

Correlation = Σ(x=0..N−1) Σ(y=0..N−1) (x − μx)(y − μy) p(x, y) / (σx σy)
Entropy = −Σ(x=0..N−1) Σ(y=0..N−1) p(x, y) log(p(x, y))
Contrast = Σ(x=0..N−1) Σ(y=0..N−1) |x − y|² p(x, y)
Mean = (1/n) Σ(i=0..L−1) Σ(j=0..L−1) rij p(rij)
Standard deviation = √( Σ(i=1..N) (xij − x̄)² / N )
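The co-occurrence-based texture measures listed above can be computed directly from a normalized gray-level co-occurrence matrix. The following numpy sketch is illustrative; the function name and the assumption that p sums to 1 are ours, not the paper's.

```python
import numpy as np

def glcm_features(p):
    """Contrast, entropy, and correlation from a normalized GLCM p (p.sum() == 1)."""
    n = p.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    mu_x, mu_y = (x * p).sum(), (y * p).sum()
    sigma_x = np.sqrt(((x - mu_x) ** 2 * p).sum())
    sigma_y = np.sqrt(((y - mu_y) ** 2 * p).sum())
    contrast = (np.abs(x - y) ** 2 * p).sum()
    nz = p[p > 0]                              # skip zero entries to avoid log(0)
    entropy = -(nz * np.log2(nz)).sum()
    correlation = ((x - mu_x) * (y - mu_y) * p).sum() / (sigma_x * sigma_y)
    return {'contrast': contrast, 'entropy': entropy, 'correlation': correlation}
```

A perfectly diagonal GLCM gives zero contrast and a correlation of 1, which is a quick sanity check for the implementation.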
The proposed system is classified effectively using a support vector machine (SVM). The system uses a 7:3 ratio for training and testing on the image database. Further, the classified images are inspected with the help of a confusion matrix in order to examine the classifier's performance. The accuracy of the system is represented graphically in the figure below.
The classified images are successfully tested with the help of the confusion matrix; the number of misclassified images is also available in the confusion matrix table, and the accuracies are plotted in the graph. The accuracy is estimated using the TP, TN, FP, and FN parameters. The overall accuracy of the system is found to be 94.3%. Using various other comparison techniques can further improve the quality of the system (Figs. 6, 7 and Table 4).
Accuracy % = (TP + TN) / (TP + FP + TN + FN) × 100 (5)
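Equation (5) translates directly into code. The confusion-matrix counts used below are illustrative, not the paper's actual values.

```python
def accuracy_percent(tp, tn, fp, fn):
    """Overall accuracy (%) from confusion-matrix counts, as in Eq. (5)."""
    return (tp + tn) / (tp + fp + tn + fn) * 100

# Hypothetical example: 33 correct out of 35 test images -> about 94.3%
print(round(accuracy_percent(tp=20, tn=13, fp=1, fn=1), 1))  # -> 94.3
```

The same counts also yield sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP)) if a finer-grained evaluation is needed.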
The ability to detect various diseases at an early stage is of great value to society. This system can be useful in the healthcare domain, especially in routine check-ups, so that diseases can be caught in their initial stages. It helps in recognizing a person's disease and minimizes the cost of diagnosis. The proposed system is based on digital image processing techniques, combining retinal color, shape, and texture features into a feature vector for texture analysis and then predicting the disease using the supervised SVM learning algorithm. Our future enhancement is to implement this project with a hardware setup to overcome some limitations of image processing, and to further develop the model into a product or even an Android application that can be used to conduct the test without human supervision and without undue diagnostic cost.
1 Introduction
Scientists have associated solid earth tides with the occurrence of earthquakes, as
the displacement produced by earth tides affects the motion of the tectonic plates;
the results suggest that the big earthquakes are triggered by the abnormal irregularity
in solid earth tides (SET) and anomalous transient change in outgoing longwave
radiation (OLR). Ide [1] has confirmed that the majority of the higher magnitude
earthquakes are likely to happen when there is high tidal stress which is limited to
specific regions or circumstances. Similarly, transient thermal anomalies occurring before destructive earthquakes were detected by Russian scientists during the late 1980s using satellite technology. With a scientific understanding of atmospheric earthquake signals, obtained through advanced remote sensing instruments, satellite thermal imaging data can serve as an effective tool for detecting OLR anomalies [2].
Artificial neural networks (ANNs) are used when there is a need to process tabular datasets and to solve classification and regression prediction problems.
Moustra et al. [3] developed an artificial neural network for time-series analysis of seismic electric signals, in which the input is the current magnitude and the corresponding output is the next day's magnitude, and evaluated its performance.
Recurrent neural networks (RNNs) are a class of ANN used when the information takes the form of time-series data with temporal dynamic behavior, while a CNN is used to map image data to an output variable; it works well with data that have a spatial relationship. Asim et al. [4] predicted
earthquakes using seismic features classified with a support vector regressor combined with a hybrid neural network prediction system, whose performance can be measured for a particular region. Vardaan et al. [5] discussed forecasting earthquake trends using a series of past earthquakes. Long short-term memory (LSTM), a category of RNN, is used for modeling the series of earthquakes, and the model is trained to predict the future trend. It was contrasted with a feed-forward neural network (FFNN), and LSTM was found to perform better than the FFNN.
The Elman neural network is a type of dynamic RNN with a modified feed-forward topology. Its architecture comprises an input layer, followed by a hidden layer, and finally an output layer. The advantage of the Elman network lies in its context input nodes, which memorize the previous state of the hidden nodes. This makes the Elman NN applicable to dynamic system identification and prediction control, and it is the basis for creating the Elman backpropagation neural network.
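A minimal numpy sketch of the Elman recurrence, where the context (the previous hidden state) is fed back alongside each new input. The layer sizes (8 inputs, 4 outputs, mirroring the input and output variables described later) and the random weights are assumptions, and training is omitted.

```python
import numpy as np

def elman_forward(x_seq, Wx, Wh, Wy, bh, by, h0=None):
    """Forward pass of an Elman network over a sequence of input vectors."""
    h = np.zeros(Wh.shape[0]) if h0 is None else h0
    outputs = []
    for x in x_seq:
        h = np.tanh(Wx @ x + Wh @ h + bh)   # context nodes memorize previous h
        outputs.append(Wy @ h + by)
    return np.array(outputs), h

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 6, 4                # 8 precursor inputs -> 4 outputs
Wx = rng.normal(size=(n_hid, n_in)) * 0.1
Wh = rng.normal(size=(n_hid, n_hid)) * 0.1
Wy = rng.normal(size=(n_out, n_hid)) * 0.1
y, h = elman_forward(rng.normal(size=(5, n_in)), Wx, Wh, Wy,
                     np.zeros(n_hid), np.zeros(n_out))
```

In practice the weights would be fitted by backpropagation (e.g., the Levenberg-Marquardt scheme discussed below) rather than left random.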
The historical dataset of precursory parameters of earthquakes that occurred in the Simeulue, Indonesia region from 2004 to 2018 is considered for training, and subsequent testing is done on the historical dataset of precursory parameters of earthquakes that occurred in the same region during 2004–2014. We have used the United States Geological Survey (USGS) earthquake catalogs. The number of iterations is varied so as to forecast the earthquakes with reasonable accuracy in terms of spatial variables such as latitude, longitude, magnitude, and date/time of occurrence, thereby achieving optimization. These precursors were observed several days to months before the occurrence of big earthquakes; hence, we have used an ANN to forecast the occurrence of earthquakes.
2 Study Area
Indonesia is prone to earthquakes due to its location on the Ring of Fire, an arc of volcanoes and fault lines in the Pacific Ocean basin. This horseshoe-shaped belt extends 40,000 km (25,000 miles) and is where most of the world's earthquakes occur.
Several large earthquakes have struck the Indonesian region, as Indonesia's tectonic setting is highly complex: many tectonic plates, including the Eurasian Plate, the Australian Plate, the Philippine Sea Plate, and the Pacific Plate, meet here [6]. Sumatra sits above the convergent plate boundary, where the Australia Plate is subducted along the Sunda megathrust under the Sunda Plate. The convergence on this section of the boundary is strongly oblique, and the strike-slip portion of the plate movement is accommodated by right-lateral motion along the Great Sumatran Fault. The Sunda megathrust activity has triggered several tremendous earthquakes.
The February 20, 2008 magnitude 7.4 Simeulue, Indonesia earthquake occurred
as a result of a thrust fault on the border between the Australia and Sunda plates.
622 R. Jeyaraman et al.
The Australia plate travels north-northeast toward the Sunda plate at a pace of about
55 mm/year at the location of this earthquake [7] (Table 1).
In this study, 28 earthquakes of the Simeulue, Indonesia region are analyzed in terms of magnitude (Fig. 1).
Table 1 List of earthquakes occurred in Simeulue, Indonesia region since 2004 with magnitude
>6 (data provided by USGS https://earthquake.usgs.gov)
Event date    Origin time    Latitude    Longitude    Magnitude    Depth (km)    Place
25-07-2012 00:27:45.260Z 2.707 96.045 6.4 22 Simeulue, Indonesia
26-01-2011 15:42:29.590Z 2.205 96.829 6.1 23 Simeulue, Indonesia
09-12-2009 21:29:02.890Z 2.759 95.91 6 21 Simeulue, Indonesia
29-03-2008 17:30:50.150Z 2.855 95.296 6.3 20 Simeulue, Indonesia
20-02-2008 08:08:30.520Z 2.768 95.964 7.4 26 Simeulue, Indonesia
22-12-2007 12:26:17.470Z 2.087 96.806 6.1 23 Simeulue, Indonesia
29-09-2007 05:37:07.260Z 2.9 95.523 6 35 Simeulue, Indonesia
07-04-2007 09:51:51.620Z 2.916 95.7 6.1 30 Simeulue, Indonesia
11-08-2006 20:54:14.370Z 2.403 96.348 6.2 22 Simeulue, Indonesia
19-11-2005 14:10:13.030Z 2.164 96.786 6.5 21 Simeulue, Indonesia
08-06-2005 06:28:10.920Z 2.17 96.724 6.1 23.5 Simeulue, Indonesia
28-04-2005 14:07:33.700Z 2.132 96.799 6.2 22 Simeulue, Indonesia
30-03-2005 16:19:41.100Z 2.993 95.414 6.3 22 Simeulue, Indonesia
26-02-2005 12:56:52.620Z 2.908 95.592 6.8 36 Simeulue, Indonesia
27-12-2004 20:10:51.310Z 2.93 95.606 5.8 28.9 Simeulue, Indonesia
28-12-2004 03:52:59.230Z 2.805 95.512 5 24.9 Simeulue, Indonesia
29-12-2004 10:52:52.000Z 2.799 95.566 5.4 23.2 Simeulue, Indonesia
01-01-2005 01:55:28.460Z 2.91 95.623 5.7 24.5 Simeulue, Indonesia
05-02-2005 04:09:53.640Z 2.325 95.065 5.1 30 Simeulue, Indonesia
09-02-2005 01:02:26.190Z 2.278 95.156 5 28.7 Simeulue, Indonesia
24-02-2005 07:35:50.460Z 2.891 95.729 5.6 30 Simeulue, Indonesia
28-03-2005 12:56:52.620Z 2.335 96.596 5.4 28.9 Simeulue, Indonesia
28-03-2005 16:34:40.570Z 2.087 96.503 5.5 30.6 Simeulue, Indonesia
28-03-2005 16:44:29.780Z 2.276 96.183 5.1 30 Simeulue, Indonesia
28-03-2005 17:03:34.430Z 2.751 96.049 5.4 30 Simeulue, Indonesia
28-03-2005 18:48:53.500Z 2.467 96.758 5.1 26.8 Simeulue, Indonesia
28-03-2005 19:54:01.090Z 2.889 96.411 5.6 29.2 Simeulue, Indonesia
28-03-2005 23:37:31.350Z 2.914 96.387 5.4 28.6 Simeulue, Indonesia
Analysis of Pre-earthquake Signals Using ANN: Implication … 623
Fig. 1 A location map for the Simeulue, Indonesia region. The red color indicates the epicenters of the 2004–2018 earthquakes with a magnitude above 6, as listed in Table 1
3 Methodology
The mean OLR flux over the previous years is

Ēlmτ = (1/t) Σ(τ=1..t) Elmτ (1)

where "t" is the number of predefined previous years for which the mean OLR flux is determined for a given location (l, m) and time τ.

Flux index Êlmτ = (Elmτ − Ēlmτ) / σlmτ (2)

where
Êlmτ: flux index value for latitude l, longitude m, and data acquisition time τ
Elmτ: current OLR flux determined for the spatial coordinates (l, m) and time τ
Ēlmτ: mean OLR flux determined for the spatial coordinates (l, m) and time τ
σlmτ: standard deviation of the OLR flux for (l, m) and time τ.

The anomalous flux index of energy [Êlmτ]* is determined by discarding energy flux index values below the +2σ level of the mean OLR flux, which helps in establishing the duration of the observed anomalous flux:

If Elmτ ≥ Ēlmτ + 2σlmτ (equivalently, Êlmτ ≥ 2), then [Êlmτ]* = Êlmτ

where [Êlmτ]* is the anomalous energy flux index observed for the given location and time.
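Reading Eqs. (1)–(2) and the +2σ cutoff together, the flux-index computation can be sketched in numpy as below. The function name and the array layout (rows are previous years, columns are grid points) are our assumptions.

```python
import numpy as np

def flux_index(current, history, sigma_level=2.0):
    """Standardize the current OLR flux against the mean of previous years
    (Eqs. 1-2), then keep only values above the +2-sigma level."""
    mean = history.mean(axis=0)            # Eq. (1): mean over t previous years
    sigma = history.std(axis=0)
    index = (current - mean) / sigma       # Eq. (2): flux index
    # Anomalous index [E]*: retain values at or above the sigma_level cutoff.
    anomalous = np.where(index >= sigma_level, index, np.nan)
    return index, anomalous
```

Grid points whose flux stays within the +2σ band come back as NaN in the anomalous array, so the duration of an anomaly can be read off as the run of non-NaN entries over time.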
Studies on earthquake prediction using recurrent neural networks (RNNs) are carried out on the pre-earthquake scenario: the seismic, solid earth tide, and atmospheric parameters of earthquakes that occurred in the Simeulue, Indonesia region over the past 14 years (2004–2018) are given as input for the machine to learn. The neural network is used in the learning environment in two stages, namely training and testing.
A retrospective analysis has been made on the earthquakes of the Simeulue, Indonesia region. In the present analysis, we investigated the stress during syzygy from the perspective of the seismic activity effect of solid earth tides.
The recurrent dy/dt deceleration causes interlocking at the interface of the tectonic plates, leading to increased tidal stresses. Such rapid deformation shifts the state of stress over the entire seismic zone, leading to the release of maximum energy and thus increasing the risk of earthquakes. Given this triggering effect of solid earth tides, earthquakes of greater magnitude are likely to occur (Fig. 2).
Our current work focuses on the creation of a neural network model using a
recurrent neural network, and the performance of the Elman backpropagation neural
network is investigated with the corresponding input parameters to determine the
accuracy of the network.
A neural network model for earthquake prediction has been developed using the network training function trainlm, a supervised training algorithm that works according to Levenberg–Marquardt optimization by updating the weight and bias values.
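The Levenberg-Marquardt update that trainlm applies can be illustrated on a toy problem. This numpy sketch shows only the weight-update rule, not MATLAB's actual trainlm implementation; the model and data are hypothetical.

```python
import numpy as np

def lm_step(w, x, t, predict, jacobian, mu=0.01):
    """One Levenberg-Marquardt update: dw = (J^T J + mu*I)^(-1) J^T e,
    the rule applied to the network's weight and bias vector w."""
    e = t - predict(w, x)                   # residual vector
    J = jacobian(w, x)                      # d(prediction)/d(weights)
    A = J.T @ J + mu * np.eye(w.size)       # damped Gauss-Newton matrix
    return w + np.linalg.solve(A, J.T @ e)

# Toy model y = w0 + w1*x: LM converges to the exact coefficients (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
t = 1.0 + 2.0 * x
predict = lambda w, x: w[0] + w[1] * x
jacobian = lambda w, x: np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(50):
    w = lm_step(w, x, t, predict, jacobian, mu=1e-3)
```

The damping term mu interpolates between gradient descent (large mu) and Gauss-Newton (small mu); trainlm adapts it per iteration.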
The input layer comprises eight variables: two solid earth tide variables (the date of the solid earth tide anomaly and the weights allocated for continual anomaly days), the date of the atmospheric OLR anomaly, the distance, the day of the OLR anomaly, the latitude, the longitude, and the pre-earthquake anomaly index. There is no fixed approach for choosing the optimum number of hidden nodes.
The seismic information used in this work is derived from all USGS instrumentally reported earthquakes that occurred in Simeulue, Indonesia. After removing aftershocks and foreshocks, earthquakes of magnitude greater than 6 on the Richter scale are considered. The input parameters consist of three variables related to the earthquake's spatial characteristics, a single variable, and two anomaly-value-related variables.
Longitude, latitude, and depth of earthquake are three parameters that are allocated
to each event.
The day and time between the event and the anomaly are considered; the minimum peak date and the maximum peak date are considered in this variable. The peak
[Fig. 3 residue: "Error Calculation" bar charts of prediction-error counts (y-axis 0–16) for latitude and longitude, binned as above −0.30, 0 to −0.3, 0 to 0.3, and above +0.3.]
Fig. 3 Comparison of the expected versus actual predicted graph for spatial coordinates
[Fig. 4 residue: bar charts of prediction-error counts for (a) depth, binned as above −2.5, 0 to −2.5, 0 to +2.5, and above +2.5, and (b) magnitude, binned as above −0.45, 0 to −0.45, 0 to +0.45, and above +0.45.]
Fig. 4 Comparison of the expected versus actual predicted graph for a depth value, b magnitude
good, which is 75% efficient compared with the actual values, with an error percentage of 25.
For magnitude, out of 28 earthquakes, the output holds good for 14, so the forecast is 50% efficient compared with the actual values, with an error percentage of 50. The accuracy can be improved by adding more data to the neural network.
The retrospective research findings follow the general consensus in the field of seismology:
Input: The precursory earthquake parameters for the Simeulue, Indonesia region are taken for analysis: the date of the solid earth tide anomaly and the weights assigned for the continual anomaly days, the OLR anomaly date, the distance, the day of the OLR anomaly, the latitude, the longitude, and the anomaly index, where the input parameters appearing before the earthquake are chosen.
Output: The latitude, longitude, depth, and magnitude are the output parameters of the neural network.
The network is modeled with four hidden layers. During training, a predefined desired target output is compared with the actual output, and the difference is termed the error, expressed as a percentage.
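The percentage-error evaluation described above can be sketched as follows. The tolerance and the example counts are illustrative, not the paper's data.

```python
def error_report(actual, predicted, tolerance):
    """Count predictions within +/- tolerance of the actual value and return
    the hit rate (efficiency) and error rate as percentages."""
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    efficiency = hits / len(actual) * 100
    return efficiency, 100 - efficiency

# Hypothetical: 21 of 28 events predicted within tolerance -> 75% / 25%
actual = list(range(28))
predicted = [a + (0.1 if a < 21 else 1.0) for a in actual]
eff, err = error_report(actual, predicted, tolerance=0.3)
```

The same routine, run separately per output (latitude, longitude, depth, magnitude) with per-output tolerances, reproduces the kind of binned comparison shown in Figs. 3 and 4.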
5 Conclusion
The significance of the anomalies obtained in SET evidences a very high impact of OLR on earthquake triggering. Hence, tidal amplitude irregularities of SET trigger plate tectonics and thereby lead to OLR anomalies, which act as a short-term precursor for detecting the time of occurrence of earthquakes. When the tidal triggering is found to be stronger, a larger-magnitude earthquake will occur. Through this analysis, we have identified, with high reliability, a strong link between the precursors and the location of the devastating earthquake, and between outgoing longwave radiation and the magnitude of the earthquake. The result we obtained strongly suggests that a neural network model using multiparameter earthquake precursors can yield a short-term earthquake-forecasting model. In this paper, the correlation of peculiar anomalies in solid earth tides (SET) and anomalous transient shifts in outgoing longwave radiation (OLR) with major earthquakes is obtained. We use a neural network to predict large earthquakes for the Simeulue, Indonesia region, considering earthquakes with a magnitude greater than 5.0 that occurred during the period from 2004 to 2014. We discuss the issue of anticipating the spatial variables of an earthquake by finding a pattern in the history of earthquakes through a neural network. Preliminary outcomes of this research are discussed. Although the technique is capable of achieving effectiveness, further efforts are being made to reach a stringent conclusion. Also, since the network tends to miss significant aftershocks and pre-shocks, it is expected that the results can be improved by including more data in the neural network to achieve better efficiency.
Acknowledgements We are greatly indebted to the Ministry of Earth Sciences for financial assis-
tance (Project No: MoES/P. O(seismo)/1(343)/2018). We thank National Oceanic and Atmospheric
Administration for providing data for Outgoing Longwave radiation to the user community.
References
1. Ide S, Yabe S, Tanaka Y (2016) Earthquake potential revealed by tidal influence on earthquake
size frequency statistics. Nat Geosci Lett. https://doi.org/10.1038/NGEO2796
Abstract Agricultural productivity is one of the important sectors that influence the
Indian economy. One of the greatest challenges that affect agricultural productivity
is plant disease which is quite prevalent in almost all crops. Hence, plant disease
detection has become a hot research area to enhance agricultural productivity. Auto-
mated detection of plant diseases is hugely beneficial to farmers as it reduces the
manual workload of monitoring and detection of the symptoms of diseases at a very
early stage itself. In this work, an innovative method to categorize the tomato and
maize plant leaf diseases has been presented. The efficiency of the proposed method has been analyzed using the plant village dataset.
1 Introduction
Agriculture is one of the most important sectors, with a great influence on the economy of developing countries. The main occupation of 60% of the rural populace is agriculture, and the livelihood of farmers depends solely on their agricultural productivity. The greatest challenge faced by farmers is the prevention and treatment of plant diseases. Despite the hard and sustained efforts of farmers, productivity is affected by crop diseases, which need to be addressed.
With the remarkable innovations in sensors and communications technologies,
the agricultural sector is becoming digital with automated farm practices like water
management, crop disease monitoring, pest control, and precision farming, etc. Clas-
sification and identification of plant disease are one of the important applications of
machine learning.
Machine learning deals with the development of algorithms that perform tasks
mimicking human intelligence. It learns abstractions from data just like human
beings learn from experience and observations. Machine learning has become a
trendy research area with its growing number of applications in computer vision
in the fields of medicine, agriculture, remote sensing, forensics, law enforcement,
etc. Deep learning is a subset of machine learning which learns data associations
using deep neural networks [1]. Several researchers have explored the utilization of
deep neural networks in plant disease classification and detection. In this paper, a
convolution neural network (CNN) model has been constructed for fast and accurate
categorization of diseases affecting tomato leaves and maize leaves. As tomato is
one of the income-generating crops of farmers of rural South India, this work has
been developed for helping them in early detection of disease.
In this research work, a pre-trained smaller VGG16 CNN model was used to classify the leaf diseases of various plants from the image dataset. The review of
the literature is presented in Sect. 2. Our proposed work of the plant leaf disease
classification is described in Sect. 3. The experimental setup and results discussion
in terms of accuracy are presented in Sect. 4. Conclusion and future enhancement
are discussed in Sect. 5.
2 Literature Review
Kawasaki et al. [2] developed a deep convolutional neural network method to distinguish healthy cucumber leaves from infected ones. The CNN model was used to identify two injurious viral infections: melon yellow spot virus (MYSV) and zucchini yellow mosaic virus (ZYMV). The model's accuracy is 94.9% for cucumber leaf disease.
Mohanty et al. [3] described a CNN model to classify leaf diseases using three variants of a plant leaf dataset: colored images, gray-scaled images, and segmented leaves. Two standard architectures, AlexNet and GoogleNet, were used for classification. With transfer learning, the highest accuracy was 99.27% for AlexNet and 99.34% for GoogleNet.
Sladojevic et al. [4] developed a deep convolutional neural network method to classify plant diseases. Image transformations were used to increase the dataset size. The accuracy of this model is 96.3% with fine-tuning and 95.8% without fine-tuning.
Nachtigall et al. [5] discussed the application of CNNs to detect and categorize images of apple trees, using AlexNet to categorize the diseases. They compared shallow techniques, among which the multi-layer perceptron was chosen, against a deep convolutional neural network. The accuracy of the CNN model is 97.3% for apple tree leaves.
Brahimi et al. [6] described a convolutional neural network for classifying tomato leaf diseases, with the tomato leaves split into nine disease classes. Two standard architectures, AlexNet and GoogleNet, were used, trained either from scratch or by transfer learning. Transfer learning improves the accuracy of GoogleNet from 97.71 to 99.18% and of AlexNet from 97.35 to 98.66%.
Dechant et al. [7] applied a deep learning method to classify maize plants. Three phases were proposed in this model: in the first phase, several models were trained; in the second phase, a heat map was produced to indicate the probability of infection in each image; and in the final phase, the heat map was used to classify the image. The total accuracy for maize leaves was 96.7%.
Lu et al. [8] described applying a CNN model to classify rice leaf diseases. They collected 500 images from the field to build a dataset. AlexNet was used to create a rice disease classifier, with an overall accuracy of 95.48% for rice leaves.
Kulkarni et al. [9] discussed an artificial neural network (ANN) methodology for plant disease detection and classification. A Gabor filter is used for feature extraction, which gives better recognition results. An ANN classifier classifies the various kinds of plant diseases using a combination of color and leaf features.
Konstantinos et al. [10] described CNN models to detect both diseased and non-diseased leaves. Several model architectures were trained, with the best achieving a 99.53% success rate in disease identification.
Fujita et al. [11] applied a CNN classifier model to cucumber diseases. The dataset consists of seven different classes, including a healthy class. The work is based on the AlexNet architecture to classify cucumber diseases. The accuracy of the proposed work was 82.3%.
A pooling layer is used to reduce the spatial dimension of an image. Batch normalization allows every layer of a network to learn somewhat more independently of the other layers.
The proposed work utilizes a smaller deep CNN, VGG16 with thirteen layers, for characterizing various sorts of diseases in tomato and maize leaves.
The rectified linear unit (ReLU) is an activation function that outputs the positive part of its input. It is the most commonly used activation function, as it learns faster than other functions and is computationally less intensive.
Figure 1 displays the workflow of the proposed model (Fig. 2).
The detailed information regarding the no. of classes and the number of images
which are used in the dataset is given in Tables 1 and 2.
Figure 3 shows the visual representation of ten types of diseases by healthy and
unhealthy leaves.
The proposed method involves the following three main stages:
1. Preprocessing
This step involves the selection and fine-tuning of the relevant dataset.
2. Training
This stage is the core of a deep learning process which trains the CNN model
to categorize diseases using the preprocessed dataset.
3. Testing
The trained model is validated with the test dataset, and the accuracy of the
model is calculated in this stage.
3.1 Preprocessing
One of the most vital elements of any deep learning application is training the model on the dataset. In the proposed work, images are taken from the plant village dataset. It consists of 13,257 images of tomato leaves and 3150 images of maize leaves,
636 R. Sangeetha and M. Mary Shanthi Rani
Fig. 3 a Healthy leaf. b Bacteria spot. c Early blight. d Late blight. e Mosaic virus. f Septoria leaf
spot. g Target spot. h Yellow leaf curl virus. i Northern leaf blight. j Brown spot. k Round spot
including both healthy and non-healthy leaves. The dataset is initially divided in the ratio of 80:20 or 70:30 between the training phase and the test phase to improve the results. The accuracy of the network depends on the size and proportion taken for training and testing. Overfitting results in a high test dataset error, while underfitting leads to high errors on both the training and test sets.
In the proposed method, the dataset is divided 80:20. All the images are resized to 256 × 256 as a preprocessing step to reduce the time complexity of the training phase.
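The 80:20 split can be sketched with the standard library. The file names below are placeholders, and the 256 × 256 resizing step (e.g., via PIL) is omitted.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle image paths reproducibly and split them train:test (80:20)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# 13,257 tomato-leaf images -> 10,605 for training and 2,652 for testing
paths = [f"img_{i:05d}.jpg" for i in range(13257)]   # placeholder file names
train, test = train_test_split(paths)
```

Fixing the seed keeps the split reproducible across runs, so that accuracy comparisons between experiments are made on identical test images.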
3.2 Training
In the training phase, the dataset is trained using the smaller VGG16 model with the ReLU activation function. One important feature of ReLU is that it eliminates negative values in the input by replacing them with zero. This model uses binary cross-entropy rather than categorical cross-entropy.
A Novel Method for Plant Leaf Disease Classification … 637
Simonyan and Zisserman introduced the VGG network architecture. The proposed model uses a pretrained smaller VGG16 net. Thirteen convolution layers are present, and each is followed by a ReLU layer. Max pooling follows some convolution layers to trim down the dimension of the image. Batch normalization helps the network learn faster and achieve higher overall accuracy. Both the ReLU activation function and batch normalization are applied in all experiments. Dropout is a technique used to reduce overfitting in the model during training. The softmax function is used in the final layer of the deep learning-based classifier. The training phase using the VGG16 network is shown in Fig. 4.
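The smaller VGG16 is described only at a high level, so the following PyTorch sketch is an assumption: it follows VGG16's 2-2-3-3-3 convolution layout (13 conv layers), each with batch normalization and ReLU, with reduced channel widths and dropout before the classifier. Softmax is left to the loss function rather than placed in the final layer.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs (conv -> batch norm -> ReLU) stages followed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return layers

class SmallerVGG16(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # 2+2+3+3+3 = 13 convolution layers, echoing VGG16's feature extractor
        self.features = nn.Sequential(
            *conv_block(3, 32, 2), *conv_block(32, 64, 2),
            *conv_block(64, 128, 3), *conv_block(128, 256, 3),
            *conv_block(256, 256, 3))
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(256, num_classes))

    def forward(self, x):
        return self.classifier(self.features(x))  # softmax folded into the loss

model = SmallerVGG16(num_classes=10)
logits = model(torch.randn(1, 3, 256, 256))       # 256 x 256 input, as above
```

The channel widths and the global-average-pooling head are our choices; the paper's exact layer widths are not specified.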
3.3 Testing
In this segment, the validation set for prediction of the leaf as healthy/unhealthy with
its disease name is utilized to estimate the performance of the classifier. Fine-tuning:
It helps to improve the accuracy of classification by making the small modification
hyperparameters and increasing the number of layers.
The experimental results of our smaller VGG16 model on the plant village dataset are given in Table 3, which lists the classification accuracy for each of the seven tomato diseases along with the number of images used.
Table 3 Classification accuracy (%) of various tomato leaf diseases using smaller VGG16 net
No. of images  Bacterial spot  Early blight  Late blight  Septoria spot  Target spot  Yellow curl virus  Mosaic virus
400   72.49  68.24  63.29  60.35  40.86  78.97  61
728   84.86  91.59  82.29  75.86  80.75  96.76  91.4
953   90.26  94.86  93.12  85.45  91.5   85.82  78.65
1246  99.94  98.69  98.71  98.46  98.4   99.91  99.74
Table 4 Classification accuracy (%) of various maize leaf diseases using smaller VGG16 net
No. of images  Northern leaf blight  Brown spot  Round spot
200  62.94  68.24  63.29
400  74.52  78.23  76.12
600  83.75  84.91  83.08
850  95.17  96.45  94.65
Table 5 also shows that our model achieves good accuracy, above 97%, with batch size for early blight, yellow curl, and mosaic virus.
The graphical representation of Table 5 is shown in Fig. 7.
Table 6 presents the influence of batch size on the classification accuracy of maize leaves, and clearly demonstrates that accuracy increases with minibatch size. It is also worth noting that there is not much increase in accuracy for batch sizes 16, 32, and 64, while there is a steep rise in accuracy from batch size 2 to 8.
The graphical representation of Table 6 is shown in Fig. 8.
Figure 9 demonstrates the visual presentation of the outputs of the proposed model for test images of tomato and maize leaf diseases using the smaller VGG16 net. It is observable from Fig. 4 that our trained model has achieved 98% accuracy in classifying tomato leaf diseases and maize leaf diseases.
Table 6 Classification accuracy (%) for various batch sizes
Batch size  Northern leaf blight  Brown spot  Round spot
2   69.27  67.87  65.45
8   89.62  70.53  87.76
16  90.46  92.47  90.35
32  93.65  94.01  93.43
64  95.31  94.56  94.86
5 Conclusion
In this paper, a smaller VGG16 net has been used to classify the diseases affecting
tomato and maize leaves using plant village dataset. The model uses thirteen layers
instead of 16 layers in VGG16. The results have demonstrated that the model has
achieved 99.18% for tomato leaves and 94.91% for maize leaves. Classification
accuracy is evaluated with 13,257 images of healthy and unhealthy tomato leaves
and 3150 images for maize leaves. The performance of this model has been analyzed
for different minibatch sizes and number of tomato and maize images. This paper
is focused on classifying the diseases in tomato and maize leaves. In the future, this
could be extended to classify diseases of other leaves as well.
642 R. Sangeetha and M. Mary Shanthi Rani
Fig. 9 Testing accuracy comparison between tomato and maize leaf diseases
Acknowledgements The experiments are carried out at Advanced Image Processing Laboratory,
Department of Computer Science and Application, The Gandhigram Rural Institute (Deemed to be
University), Dindigul, and funded by DST-FIST.