Soft Computing and its Engineering Applications
Kanubhai K. Patel
Deepak Garg
Atul Patel
Pawan Lingras (Eds.)
Editors
Kanubhai K. Patel
Charotar University of Science and Technology, Changa, Anand, Gujarat, India
Deepak Garg
Bennett University, Greater Noida, Uttar Pradesh, India
Atul Patel
Charotar University of Science and Technology, Changa, Anand, Gujarat, India
Pawan Lingras
Saint Mary's University, Halifax, NS, Canada
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
It is a matter of great privilege to have been tasked with the writing of this preface for
the proceedings of The Second International Conference on Soft Computing and its
Engineering Applications (icSoftComp2020). The conference aimed to provide an
excellent international forum to the researchers, academicians, students, and profes-
sionals in the areas of computer science and engineering to present their research,
knowledge, new ideas, and innovations. The theme of the conference was “Soft
Computing Techniques for Sustainable Development”. The conference was held during
11–12 December 2020, at Charotar University of Science & Technology (CHAR-
USAT), Changa, India, and organized by the Faculty of Computer Science and
Applications, CHARUSAT.
There are three pillars of Soft Computing viz., i) Fuzzy computing, ii) Neuro
computing, and iii) Evolutionary computing. Research submissions in these three areas
were received. The Program Committee of icSoftComp2020 is extremely grateful to
the authors from 15 different countries including the USA, United Arab Emirates,
Mauritius, Saudi Arabia, Palestine, Peru, South Africa, Pakistan, Nigeria, Ecuador,
Libya, Taiwan, Bangladesh, Ukraine, and UK who showed an overwhelming response
to the call for papers, submitting 252 papers. The entire review team (Technical
Program Committee members along with 12 additional reviewers) expended tremen-
dous effort to ensure fairness and consistency during the selection process, resulting in
the best-quality papers being selected for presentation and publication. It was ensured
that every paper received at least three, and in most cases four, reviews. Checking of
similarities was also done based on international norms and standards. After a rigorous
peer review 28 papers were accepted with an acceptance ratio of 11.11%. The papers
are organised according to the following topics: Theory & Methods, and Systems &
Applications. The proceedings of the conference are published as one volume in the
Communications in Computer and Information Science (CCIS) series by Springer, and
are also indexed by ISI Proceedings, DBLP, Ulrich’s, EI-Compendex, SCOPUS,
Zentralblatt Math, MetaPress, and Springerlink. We, in our capacity as volume editors,
convey our sincere gratitude to Springer for providing the opportunity to publish the
proceedings of icSoftComp2020 in their CCIS series.
icSoftComp2020 provided an excellent international virtual forum to the conference
delegates to present their research, knowledge, new ideas, and innovations. The con-
ference exhibited an exciting technical program. It also featured high-quality work-
shops, a keynote, and six expert talks from prominent research and industry leaders. The
Keynote speech was given by Dr. Valentina E. Balas (Professor, Aurel Vlaicu
University of Arad, Arad, Romania). Expert talks were given by Dr. Deepak Garg
(Bennett University, Greater Noida, India), Dr. Sanjay Misra (Covenant University,
Ota, Nigeria), Dr. Vishnu Pendyala (San José State University, San José, CA, USA),
Dr. Kiran Trivedi (Vishwakarma Government Engineering College, Ahmedabad,
India), Dr. Pritpal Singh (Jagiellonian University, Poland), and Dr. Korhan Cengiz
(Trakya University, Turkey). We are grateful to them for sharing their insights on their
latest research with us.
The Organizing Committee of icSoftComp2020 is indebted to Dr. Pankaj Joshi,
Provost of Charotar University of Science and Technology and Patron, for the confi-
dence that he invested in us in organizing this international conference. We would also
like to take this opportunity to extend our heartfelt thanks to the Honorary Chairs of
this conference, Dr. Kalyanmoy Deb (Michigan State University, MI, USA) and
Dr. Leszek Rutkowski (IEEE Fellow) (Częstochowa University of Technology,
Częstochowa, Poland) for their active involvement from the very beginning until the
end of the conference. The quality of a refereed volume primarily depends on the
expertise and dedication of the reviewers who volunteer with a smiling face. The
editors are further indebted to the Technical Program Committee members and external
reviewers who not only produced excellent reviews but also did so in a short time
frame, in spite of their very busy schedules. Because of their quality work it was
possible to maintain the high academic standard of the proceedings. Without their
support, this conference could never have assumed such a successful shape. Special
words of appreciation are due for the enthusiasm of all the faculty, staff, and
students of the Faculty of Computer Science and Applications of CHARUSAT, who
organized the conference in a professional manner.
It is needless to mention the role of the contributors. The editors would like to take
this opportunity to thank the authors of all submitted papers not only for their hard
work but also for considering the conference a viable platform to showcase some
of their latest findings, not to mention their adherence to the deadlines and patience
with the tedious review process. Special thanks to the team of OCS, whose paper
submission platform was used to organize reviews and collate the files for these pro-
ceedings. We also wish to express our thanks to Ms. Kamiya Khatter, Associate Editor,
Springer Nature India, New Delhi, for her help and cooperation. We gratefully
acknowledge the partial financial support received from the Department of Science &
Technology, Government of India and the Gujarat Council on Science & Technology
(GUJCOST), Government of Gujarat, Gandhinagar, India for organizing the confer-
ence. Last but not least, the editors profusely thank all who directly or indirectly helped
us in making icSoftComp2020 a grand success and allowed the conference to achieve
its goals, academic or otherwise.
Patron
Pankaj Joshi Charotar University of Science and Technology, India
Honorary Chairs
Kalyanmoy Deb Michigan State University, USA
Leszek Rutkowski Częstochowa University of Technology, Poland
General Chairs
Atul Patel Charotar University of Science and Technology, India
Pawan Lingras Saint Mary’s University, Canada
Advisory Committee
Arup Dasgupta Geospatial Media and Communications, India
Valentina E. Balas Aurel Vlaicu University of Arad, Romania
Bhushan Trivedi GLS University, India
Bhuvan Unhelkar University of South Florida Sarasota-Manatee, USA
J. C. Bansal Soft Computing Research Society, India
Narendra S. Chaudhari Indian Institute of Technology Indore, India
Rajendra Akerkar Vestlandsforsking, Norway
Sudhirkumar Barai BITS Pilani, India
Devang Joshi Charotar University of Science and Technology, India
S. P. Kosta Charotar University of Science and Technology, India
Dharmendra T. Patel Charotar University of Science and Technology, India
Sreesruthi Ramasubramanian and Senthil Arumugam Muthukumaraswamy
Abstract. Fire-fighting robots with the ability to detect and extinguish fires are
extremely useful in saving lives and property. However, most of these fire-
fighting robots have been designed to operate semi-autonomously. Using path-
planning algorithms to guide the robot to move from the present position to the
target position would greatly improve the performance of the robot in the fire
extinguishing process. Two types of sampling-based path-planning algorithms,
namely, Rapidly Exploring Random Tree (RRT) and Rapidly Exploring Ran-
dom Tree Star (RRT*) are investigated in this paper. The performances of these
algorithms are analyzed and compared based on the computational time taken to
generate paths and the length of the paths generated in order to select an
effective path-planning algorithm. After investigation, RRT* is chosen for path-
planning in both static and dynamic obstacle environments.
1 Introduction
Fire-fighting robots are extremely useful in detecting and extinguishing fires in indoor
environments. Most fire-fighting robots designed have a camera attached so that the
user can control the movement of the robot remotely by observing the environment [1,
2]. However, since these robots aren’t completely autonomous, they require human
supervision. Having an autonomous fire-fighting robot that has the ability to avoid
obstacles and reach the target quickly would improve the efficiency of the robot.
Some robots have been designed to work autonomously by using ultrasonic sensors
for obstacle avoidance [3]. However, the robots that solely rely on ultrasonic sensors
will take a long time to avoid obstacles and reach the destination. The sensor will detect
the static obstacles only as the robot moves towards the target. Implementing motion
planning in fire-fighting robots will give the robots the ability to operate autonomously
and navigate various rooms in a floor to reach the destination quickly.
The objective of motion planning is to develop algorithms that would equip robots
with the ability to decide on how to move from the start to the end position. Usually,
multiple paths exist between start and goal positions. It is often essential to find the path
that minimizes the total distance the robot has to travel. This is especially important
in fire-fighting robots, as time is of the essence in saving lives and preventing damage.
However, it is also important to consider the computational time taken for path iden-
tification. These two main factors influence the performance of the path-planning
algorithms.
To implement path-planning, the position of the fire has to be first obtained. Let’s
suppose that the floor of a building has a certain number of rooms. The fire is detected
with respect to the room using a deep learning-based object detection model [4]. It is
beneficial to use deep learning rather than conventional flame sensors for fire detection
since the position of the fire can be obtained using the coordinates of the bounding box
surrounding the detected fire in the image or video [5]. Using this position, the target
position of the fire is obtained. The target position is the position a certain distance
away from the fire [6]. This is so that the robot can extinguish the fire without being
affected. The local target position concerning the room is used to obtain the global
target position with respect to the particular floor. The robot is present at a known
location and the path is planned between the two positions using an appropriate path-
planning algorithm. The robot will follow the path found, move to the target position
and extinguish the fire. The overall procedure is illustrated in Fig. 1.
Fig. 1. Flowchart representing the necessary steps to be performed for efficient fire-
extinguishment
2 Literature Review
Path-planning algorithms have been applied in various situations to find the appropriate
path. The literature review section aims to show some sample situations where
sampling-based algorithms have been applied to find the necessary path and the
comparison made between sampling-based and grid-based path-planning algorithms.
The RRT algorithm was used to perform path-planning to find the path when the
road network was congested due to traffic [8]. The algorithm was simulated based on
data obtained from the road network in Wuhan City. The modification made to the RRT
algorithm was that the road segment closest to the line between the nearest node and
the random point was selected. If the road network found collided with an obstacle
area, resampling was performed to find a new road. Since real-time traffic kept
changing with time, it was essential to re-plan the path according to the data obtained.
Hence, if congested areas were found in the predetermined route, the current road was
considered as the root node and the RRT was initialized again to find the path. In this
manner, RRT was able to find the least congested route based on real-time data.
RRT* was used for planning a minimum dose path in nuclear facilities [10]. RRT*
was proposed for use in nuclear environments since the radioactive environment
changed frequently during decommissioning and it was necessary to frequently gen-
erate a new network. The obstacles were modelled using basic geometric shapes and
detected using virtual reality. The performance of RRT* was compared with Dijkstra’s
algorithm. The path produced by RRT* was smoother and shorter than the path pro-
duced by Dijkstra’s, reaching the target 18.3 s earlier. The cumulative dose of RRT*
was also less by 2.74%. However, the average time taken to generate RRT* was
341 ms more than the time taken by Dijkstra’s. RRT* took more time since it needed to
check for obstacles at every iteration. However, the path generated by RRT* was much
better when considering all aspects. The overall time taken for evacuation following
RRT* would still be less than the overall time taken following Dijkstra’s. Hence, RRT*
was used for path-planning inside nuclear facilities.
Devaurs et al. discussed parallelizing RRT on large-scale distributed-memory
architectures using the message passing interface (MPI) to improve performance [11].
They compared three parallel versions of RRT: Or Parallel RRT,
Distributed RRT and Manager-worker RRT. They compared their performance on
different motion-planning problems such as passage problem, corridor problem and
roundabout problem. In Or Parallel RRT, each process first computed its own RRT.
The first one to reach the termination condition then produced a termination message,
instructing others to stop. If many processes managed to find the solution at the same
time, the program instructed them to coordinate and agree on which process to display.
The time taken for these communications was negligible when considering the overall
runtime.
In Distributed RRT, for each iteration of the tree, the process had to first check
whether it had already received new nodes from other processes. If that was the case,
the node was added to the copy of the tree in that process and it performed the next
step, i.e. it attempted to expand. If this attempt was possible, the node was added and
announced to all the processes. In Manager-worker RRT, only the manager had access
to the tree. The manager initially checked if it had received nodes from any of the
workers. If this was the case, it added them to the tree, then computed the random node
and found the nearest neighbour in the tree. It then looked for an idle worker to whom it
sent the required data in order to expand the tree. These steps continued until the goal
was reached.
From the results they obtained, they concluded that motion problems for which
computational cost of RRT expansion is high can use Distributed RRT and Manager-
worker RRT to lower the cost. Parallelizing can also be applied for other RRT-based
algorithms such as RRT*. RRT* has a greater number of steps such as adding and
removing edges after rewiring which will increase the time taken for communication.
In such a case, they remarked that the Distributed version would be more beneficial
compared to the Manager-worker version.
Some form of cost indication is required to plan paths using RRT on rough terrains
[12]. In RRT, the nearest neighbour in the tree to the random node was found using the
Euclidean distance. Tahirovic et al. used a roughness-based metric called Roughness
based Navigation Function (RbNF) that represented an estimate of the roughness value.
From the simulation results obtained, they concluded that RRT-RT was very effective in
exploring favourable terrain regions. They observed that the algorithm produced paths
that decreased the total roughness. The results also demonstrated that paths produced
by RRT-RT did not significantly differ from optimal paths. This can also be imple-
mented for other path-planning algorithms such as RRT* to improve their performance
on rough terrains.
3 Methodology
3.1 Binary Occupancy Map
A map represents the environment of the robot through which it has to navigate.
A Binary Occupancy map is usually built by obtaining information from range sensors.
The Binary Occupancy Map considered for path-planning is shown in Fig. 2 [13].
In the Binary Occupancy Map, the black regions represent occupied spaces and
white regions represent free spaces. The position of the start and end locations is
described in terms of x and y meters. The walls of the rooms are the obstacles which
the robot should avoid and the free space is the region through which the robot should
navigate to reach the target position.
Fig. 2. Binary Occupancy Map of the floor where the robot operates
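For illustration only, a binary occupancy map of this kind can be represented as a Boolean matrix. The toy sketch below (Python, with an invented layout) is a simplified stand-in for the actual floor map of Fig. 2, which is loaded through MATLAB's binaryOccupancyMap in the experiments.

```python
import numpy as np

# Toy stand-in for a binary occupancy map: True (1) = occupied cell (wall),
# False (0) = free space through which the robot may navigate.
grid = np.zeros((100, 200), dtype=bool)   # 100 x 200 cells, 1 m per cell assumed
grid[0, :] = grid[-1, :] = True           # outer walls
grid[:, 0] = grid[:, -1] = True
grid[50, 20:120] = True                   # an interior wall with a doorway gap

def is_free(x, y, resolution=1.0):
    """Return True when the world coordinate (x, y), in metres, is free space."""
    col, row = int(x / resolution), int(y / resolution)
    return not grid[row, col]

print(is_free(40.0, 20.0), is_free(40.0, 0.0))   # True False
```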
RRT* is an optimized version of RRT which produces paths that are shorter in
length. RRT* works the same way as RRT but has an additional procedure called the
rewiring procedure to produce shorter paths. In rewiring, the neighbourhood of each
recently added node Nnew is searched based on a radius value r. The value of radius r is
determined by the Eq. (1).
$r = c \left( \frac{\log(n)}{n} \right)^{1/d} \qquad (1)$
where d is the dimension of the search space, c is the constant based on the envi-
ronment and n is the number of nodes in the tree.
The neighbouring node which will make the cost to reach Nnew minimum is chosen
from all the neighbouring nodes within the radius area. If rewiring the neighbouring
node to the node Nnew decreases the cost to the nodes in the radius area compared to the
older costs, the neighbour is rewired to the newly added node and the previous edge
connecting the neighbour to its old parent is removed. RRT* can produce shorter paths
in this manner.
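A compact sketch of this rewiring step is given below in Python. It is purely illustrative of Eq. (1) and of the two passes described above, not the authors' implementation (their experiments use MATLAB's planner objects, described next); the helpers dist, collision_free, parent and cost are assumed to be supplied by the surrounding RRT* code.

```python
import math

def near_radius(n, c=50.0, d=2):
    # Eq. (1): neighbourhood radius shrinks as the tree grows
    # (n = number of nodes, d = dimension, c = environment-dependent constant).
    return c * (math.log(n) / n) ** (1.0 / d)

def rewire_step(new_node, neighbours, parent, cost, dist, collision_free):
    """One RRT* rewiring step for a freshly added node N_new."""
    # 1) Among the neighbours inside the radius, choose the parent that
    #    minimises the cost to reach the new node.
    for nb in neighbours:
        via_nb = cost[nb] + dist(nb, new_node)
        if via_nb < cost[new_node] and collision_free(nb, new_node):
            parent[new_node], cost[new_node] = nb, via_nb

    # 2) Rewire: reconnect any neighbour whose cost drops when routed through
    #    the new node; its edge to the old parent is thereby removed.
    for nb in neighbours:
        via_new = cost[new_node] + dist(new_node, nb)
        if via_new < cost[nb] and collision_free(new_node, nb):
            parent[nb], cost[nb] = new_node, via_new
```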
The Navigation Toolbox has been used to implement the RRT and RRT* algorithms
in MATLAB. The steps below outline the procedure followed to simulate RRT and
RRT* in MATLAB [14, 15]. The path-planning algorithms were simulated in MATLAB
2019b on a computer with an Intel i7-5500U processor (2.4 GHz) and 8 GB of RAM.
• The Binary Occupancy Map and the start and final states in terms of x, y and theta
were the input parameters necessary for obtaining the path. For example, the start
state would be input as (40,20,0) and the end state would be input as (180,80,0) to
plan the path between the respective positions. The theta value was always set as 0.
• The stateSpaceSE2 object was used to store the states and the respective parameters
in state-space in the form of x position, y position and theta.
4 Results
The simulation results obtained via RRT and RRT* algorithms are presented in this
section. The results obtained using RRT and RRT* are then compared to choose an
efficient path-planning algorithm.
The path generated by RRT and RRT* between the start position (40,80) and the
target position (180,65) is shown in Fig. 3 and Fig. 4 respectively. From Fig. 3 and
Fig. 4, it can be observed that RRT* tends to produce much shorter paths. RRT*
produces shorter paths due to the rewiring procedure.
The computational time taken to generate paths for the robot to move from one
room (120,80) to another room and extinguish the fires present in that room by trav-
elling to target positions (120,20), (160,10) and (180,40) along with the path length is
presented in the following tables. The statistical comparison using the average and the
standard deviation of computational time taken using RRT and RRT* algorithms are
presented in Table 1 to compare the performances of the two algorithms. Also, the
statistical comparison using the average and standard deviation of lengths of the path to
reach the targeted position using RRT and RRT* algorithms are presented in Table 2.
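The averages and standard deviations reported in Tables 1 and 2 can be obtained by running the (stochastic) planner repeatedly and aggregating, along the lines of the Python sketch below. The authors performed this in MATLAB; here, rrt_star_plan is a hypothetical planner callable returning a list of (x, y) waypoints.

```python
import time
import numpy as np

def benchmark(planner, start, goal, runs=30):
    """Mean/std of computation time and path length over repeated planner runs."""
    times, lengths = [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        path = planner(start, goal)             # list of (x, y) waypoints
        times.append(time.perf_counter() - t0)
        pts = np.asarray(path, dtype=float)
        lengths.append(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))
    return (np.mean(times), np.std(times)), (np.mean(lengths), np.std(lengths))

# Example (hypothetical planner): benchmark(rrt_star_plan, (120, 80), (120, 20))
```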
Fig. 3. The path generated using RRT between (40,80) and (180,65)
Fig. 4. The path generated using RRT* between (40,80) and (180,65)
Table 1. Computational time taken to identify required path using RRT and RRT* algorithms
Starting position   Ending position   Avg. time RRT (s)   Avg. time RRT* (s)   Std. dev. RRT (s)   Std. dev. RRT* (s)
(120,80)            (120,20)          0.5401              0.5537               0.3837              0.3971
(120,20)            (160,10)          0.3683              0.3411               0.1423              0.0532
(160,10)            (180,40)          0.3230              0.3570               0.0538              0.0411
As observed from Table 2, the path produced by RRT* was on average shorter than
the path produced by RRT. However, since RRT* spent a
considerable amount of time in rewiring, the average computational time of RRT* was
greater compared to RRT as observed in Table 1. When the robot is following the
computed path, the time taken by the robot while following RRT* would be less than
the time taken by the robot while following RRT although RRT* tends to take more
time for computation.
However, at high speeds, the difference in the time to travel would be very small,
even smaller than the difference in computational time taken. The difference in com-
putation time will play a significant role in the overall time taken in that case. The robot
following RRT will be able to reach the target position faster than a robot following
RRT* as the robot following RRT will start earlier. Thus, using RRT at such high speeds
will help to save computational resources and extinguish the fire faster. Hence, it would
be better to use RRT for the path-planning of the robot given the robot can be run at
such a high speed. Otherwise, RRT* would be the best option for path-planning.
An example for a situation where the robot should be run only at low speeds is a
dynamic obstacle environment. It would not be beneficial to run the robot at high
speeds when dynamic obstacles are present as it will be difficult to avoid collision when
the robot suddenly encounters a moving obstacle. In such situations, running the robot
at a lower speed using RRT* would be more beneficial. However, it would not be
practical to use RRT even in static obstacle environments, as the path produced by the
algorithms has many turns and the robot would not have the proper leverage to turn at
high speed. Thus, RRT* is chosen for path-planning in all environments.
5 Conclusion
In this paper, the sampling-based path-planning algorithms RRT and RRT* were
compared based on the average computational time taken for path computation and
average path length to find a suitable algorithm for the path-planning of a fire-fighting
robot. Choosing a suitable path-planning algorithm will help the fire-fighting robot to
extinguish the fire in minimum time. Even though RRT produced the path faster on an
average, RRT* produced shorter paths. However, the difference in time taken by the
robot to reach the destination while following both the computed paths would be very
small when the robot is moving at a high speed. Hence a robot following RRT would be
able to reach the destination faster as RRT has a faster computational time on an
average. Therefore, RRT would be the best option for path planning in a static obstacle
environment where the robot can be run at such a high speed. However, it would not be
practical to run the robot at such high speeds since the robot will not have the proper
leverage to turn. Hence RRT* is chosen for path-planning in both static as well as
dynamic obstacle environments.
6 Future Scope
In order to plan the path of the robot in a dynamic obstacle environment, the speed of
the dynamic obstacle must be known. Using laser sensors or image processing
techniques, the speed of the moving obstacle can be estimated, and from it the
position of the obstacle at each second can be computed. Using
this, it can be determined whether the obstacle and the robot will collide and whether it
is necessary to re-plan the path using the path-planning algorithm.
References
1. Mittal, S., et al.: CeaseFire: the fire fighting robot. In: 2018 International Conference on
Advances in Computing, Communication Control and Networking (ICACCCN), pp. 1143–
1146. IEEE (2018)
2. Jia, Y.-Z., et al.: Design and research of small crawler fire fighting robot. In: Chinese
Automation Congress (CAC), pp. 4120–4123. IEEE (2018)
3. Suresh, J.: Fire-fighting robot. In: 2017 International Conference on Computational
Intelligence in Data Science (ICCIDS), pp. 1–4. IEEE (2017)
4. Shen, D., et al.: Flame detection using deep learning. In: 2018 4th International Conference
on Control, Automation and Robotics (ICCAR), pp. 416–420. IEEE (2018)
5. Ramasubramanian, S., Muthukumaraswamy, S.A., Sasikala, A.: Fire detection using
artificial intelligence for fire-fighting robots. In: 2020 4th International Conference on
Intelligent Computing and Control Systems (ICICCS), pp. 180–185. IEEE (2020)
6. Ramasubramanian, S., Muthukumaraswamy, S.A., Sasikala, A.: Enhancements on the
efficacy of firefighting robots through team allocation and path-planning. In: Jeena Jacob, I.,
Shanmugam, S.K., Piramuthu, S., Falkowski-Gilski, P. (eds.) Data Intelligence and
Cognitive Informatics: Proceedings of ICDICI 2020, pp. 139–151. Springer, Singapore
(2021). https://doi.org/10.1007/978-981-15-8530-2_11
7. Hidayatullah, A.A., Handayani, A.N., Fuady, M.J.: Performance analysis of A* algorithm to
determine shortest path of fire fighting robot. In: 2017 International Conference on
Sustainable Information Engineering and Technology (SIET), pp. 53–56. IEEE (2017)
8. Hu, Z., et al.: Application of rapidly exploring random tree algorithm in the scene of road
network with scheduled tracks. In: 2017 2nd International Conference on Advanced
Robotics and Mechatronics (ICARM), pp. 311–315. IEEE (2017)
9. Chen, X., Liu, Y., Hong, X., Wei, X., Huang, Y.: Unmanned ship path planning based on
RRT. In: Huang, D.-S., Bevilacqua, V., Premaratne, P., Gupta, P. (eds.) ICIC 2018. LNCS,
vol. 10954, pp. 102–110. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-95930-
6_11
10. Chao, N., et al.: A sampling-based method with virtual reality technology to provide
minimum dose path navigation for occupational workers in nuclear facilities. Prog. Nucl.
Energy 100, 22–32 (2017)
11. Devaurs, D., Simeon, T., Cortes, J.: Parallelizing RRT on large-scale distributed-memory
architectures. IEEE Trans. Robot. 29(2), 571–579 (2013)
12. Tahirovic, A., Magnani, G.: A roughness-based RRT for mobile robot navigation planning.
IFAC Proc. Vol. 44(1), 5944–5949 (2011)
13. Mathworks.com: Create occupancy grid with binary values. https://www.mathworks.com/
help/nav/ref/binaryoccupancymap.html
14. Mathworks.com: Create an RRT planner for geometric planning. https://www.mathworks.
com/help/nav/ref/plannerRRT.html
15. Mathworks.com: Create an optimal RRT path planner (RRT*). https://www.mathworks.
com/help/nav/ref/plannerRRTstar.html
Automatic Image Colorization Using GANs
1 Introduction
Computer Vision has been around since the 1950s and is one of the most advanced
applications of Artificial Intelligence utilized in today’s world [3]. The success of
Computer Vision is largely due to the massive amount of data that has been generated in
recent years with the advent of Machine Learning. Computer Vision is intertwined with
Deep Learning, resulting in a wide variety of applications built on Convolutional Neural
Networks (CNNs). Deep neural networks and CNNs have already been used extensively
in the domains of image processing and computer vision. Many image colorization
problems and applications have been solved using these technologies, but there is one
major issue with them: if a small amount of noise is added to the input of these networks,
the results go haywire. For instance, when even a small amount of noise is added to an
image of a dog that is to be classified, the image may be identified as an ostrich. This led
us to explore Generative Adversarial Networks as an alternative for automatically
colorizing gray-scale images.
GANs are a fairly novel research subject and are increasingly capturing researchers'
interest due to the advantages they bring, which we discuss throughout the paper. The
distinguishing feature of this work is that we have attempted to solve the age-old
problem of image colorization using GANs with bare-minimum parameters and
requirements. We still obtained promising results, which can be improved considerably
by scaling up our development environment. Although CNN models are extremely
efficient at object detection, classification and a vast number of image processing
techniques, they are hungry for data, which can pose a problem when the amount of
available data is limited. GANs cover this shortcoming gracefully, as they are capable of
generating their own data that is in agreement with the original input data. In our study,
we implement a Conditional GAN model on a small scale that successfully colorizes
images.
The paper also demonstrates the empirical effectiveness of GAN on the CIFAR-10,
MNIST, and TFD image datasets.
Table 1. (continued)
No. 7: Emotional image color transfer via deep learning [19], Liu Da, Jiang Yaxi, Pei Min, Liu Shiguang, 2018
  Methodology: Considers semantic information of images to solve unnatural color problems in images
  Limitations: Instability due to limited train and test sets
No. 8: Optimization based grayscale image colorization [18], Nie Dongdong, Ma Qinyong, Ma Lizhuang, Xiao Shuangjiu, 2007
  Methodology: Optimization approach to colorization that reduces computational time and reduces color diffusions
  Limitations: Spatial-temporal approach to be developed for maintaining temporal coherence
The discriminator outputs a value D(x) indicating the probability that x is a real data
instance. Our goal is to maximize the chance of recognizing real data instances as real
and generated instances as fake, that is to say, the likelihood of the observed data.
Cross-entropy is used as the cost function.
Discriminator mathematics:

$\max_D V(D) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

where the first term rewards the discriminator for recognizing real images and the
second term rewards it for recognizing generated images as fake.
Goal: optimize G so that it can fool the discriminator the most.
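A minimal TensorFlow sketch of these two objectives, written with binary cross-entropy as stated above, is shown below. It is a generic, unconditional formulation using the commonly used non-saturating generator loss; the conditional model implemented in this paper would additionally feed the grayscale input to both networks.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # Maximising E[log D(x)] + E[log(1 - D(G(z)))] == minimising this cross-entropy.
    real_loss = bce(tf.ones_like(real_logits), real_logits)    # real samples -> label 1
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)   # generated samples -> label 0
    return real_loss + fake_loss

def generator_loss(fake_logits):
    # Non-saturating variant: G is rewarded when D classifies its samples as real.
    return bce(tf.ones_like(fake_logits), fake_logits)
```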
5 Results
Training GAN models is complicated and differs significantly from training other
machine learning models: it is difficult to identify whether training is performing up to
the mark, but easier to know when it is going in an unanticipated direction. Our aim is
to keep the discriminator accuracy high, i.e., close to 100%, and simultaneously ensure
that the generator loss does not reach zero. If that happens, it indicates that the
discriminator is not performing its job well. A graph of the generator and discriminator
losses is shown in Fig. 5.
For stable training, the generator and discriminator losses must converge to some
stable values, i.e., a recognizable pattern.
$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[ I(i,j) - K(i,j) \right]^2 \qquad (1)$

$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \qquad (2)$
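Both metrics can be computed, for instance, with NumPy and scikit-image as sketched below; the library choice is an assumption made here for illustration (the paper does not name one), and the channel_axis argument requires scikit-image 0.19 or newer.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mse(img_true, img_pred):
    # Eq. (1): mean squared error between ground-truth and generated images.
    diff = img_true.astype(np.float64) - img_pred.astype(np.float64)
    return np.mean(diff ** 2)

def ssim(img_true, img_pred):
    # Eq. (2): structural similarity; channel_axis=-1 treats the last axis as RGB.
    return structural_similarity(img_true, img_pred, channel_axis=-1)
```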
Post training, once the model acquires good enough accuracy, we discard the
discriminator, as it was only used to keep a check on the generator's performance. As
soon as the model is able to generate images that closely resemble the real data, the
discriminator's job is done and it is removed. The GAN model is then ready to be
deployed and run on new, unseen data as well.
Fig. 6. The above images are the result of GAN training. From top to bottom: single-channeled
image, GAN generated image, Ground truth color image. (Color figure online)
Generative adversarial networks are being used far and wide in medical imagery, self-
driving cars, image translation and many more such areas. We attempted the automatic
colorization of grayscale images using generative adversarial networks. The images of
CIFAR-10 generated by the GAN with synthetic colors looked reasonably good and
similar to the original images, which supports our implementation. Initially there were
some instances where the model mistook sea water for grass during the training process,
but with further training it was successful in properly identifying the color green. We
also observed that the model faced an unusual problem with the color red, which it took
longer to learn than other colors, so this could be explored further. Though we trained
the model on single-channel images, the model works well with black-and-white
images as well.
Though GANs are a relatively new area, they have been working significantly well in
applications such as image processing and analysis, text-to-image translation, generating
pictures of people's faces and so on, which gives directions for further exploration into
different application areas. Our model can be extended to video formats as well, which
would make GANs even more interesting. It can also be used for converting old movies
and videos to colorized versions. Furthermore, the GAN model will prove to be
beneficial for restoring distorted pictures that have tarnished due to the effects of time,
mishandling, etc. Thus, GANs have the potential to fit into a wide range of applications
and make image processing more interesting.
References
1. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural
image synthesis. arXiv preprint arXiv:1809.11096 (2018)
2. Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse
transformers. arXiv preprint arXiv:1904.10509 (2019)
3. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information
Processing Systems, pp. 2672–2680 (2014)
4. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional
adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 1125–1134 (2017)
5. Johari, M.M., Behroozi, H.: Context-aware colorization of gray-scale images utilizing a
cycle-consistent generative adversarial network architecture. Neurocomputing 407, 94–104
(2020). https://doi.org/10.1016/j.neucom.2020.04.042
6. Kuang, X., et al.: Thermal infrared colorization via conditional generative adversarial
network. Infrared Phys. Technol. 107, 103338 (2020)
7. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial
network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 4681–4690 (2017)
8. Ling, W., Dyer, C., Yu, L., Kong, L., Yogatama, D., Young, S.: Relative Pixel Prediction for
Autoregressive Image Generation (2019)
9. Liu, D., Jiang, Y., Pei, M., Liu, S.: Emotional image color transfer via deep learning. Pattern
Recogn. Lett. 110, 16–22 (2018)
10. Liu, S., Zhang, X.: Automatic grayscale image colorization using histogram regression.
Pattern Recogn. Lett. 33(13), 1673–1681 (2012)
11. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.
1784 (2014)
12. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: GauGAN: semantic image synthesis with
spatially adaptive normalization. In: ACM SIGGRAPH 2019 Real-Time Live!, p. 1 (2019)
13. Qin, Z., Liu, Z., Zhu, P., Xue, Y.: A GAN-based image synthesis method for skin lesion
classification. Comput. Methods Programs Biomed. 195, 105568 (2020)
14. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
15. Wiley, V., Lucas, T.: Computer vision and image processing: a paper review. Int. J. Artif.
Intell. Res. 2(1), 29–36 (2018)
16. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial
networks. In: International Conference on Machine Learning, pp. 7354–7363, May 2019
17. Zhang, J., Zhong, F., Cao, G., Qin, X.: ST-GAN: unsupervised facial image semantic
transformation using generative adversarial networks. In: ACML, pp. 248–263, November
2017
18. Nie, D., Ma, Q., Ma, L., Xiao, S.: Optimization based grayscale image colorization, 9 March
2007. https://www.sciencedirect.com/science/article/abs/pii/S0167865507000700. Accessed
29 June 2020
19. Liu, D., Jiang, Y., Pei, M., Liu, S.: Emotional image color transfer via deep learning. https://
www.sciencedirect.com/science/article/abs/pii/S0167865518300941. Accessed 29 June 2020
Gujarati Task Oriented Dialogue Slot Tagging
Using Deep Neural Network Models
Abstract. In this paper, the primary focus is on slot tagging of Gujarati dialogue,
which enables Gujarati-language communication between human and machine,
allowing machines to perform a given task and provide the desired output. The
accuracy of tagging depends entirely on the bifurcation of slots and word
embedding. It is also very challenging for a researcher to do proper slot tagging,
as dialogue and speech differ from human to human, which makes the slot
tagging methodology more complex. Various deep learning models are available
to researchers for slot tagging; this paper mainly
focuses on Long Short-Term Memory (LSTM), Convolutional Neural Network -
Long Short-Term Memory (CNN-LSTM) and Long Short-Term Memory –
Conditional Random Field (LSTM-CRF), Bidirectional Long Short-Term
Memory (BiLSTM), Convolutional Neural Network - Bidirectional Long
Short-Term Memory (CNN-BiLSTM) and Bidirectional Long Short-Term
Memory – Conditional Random Field (BiLSTM-CRF). While comparing the
above models with each other, it is observed that BiLSTM models perform
better than LSTM models by a variation of ~2% in F1-measure, as they contain
an additional layer through which the word sequence is traversed from backward
to forward. Among the BiLSTM models, BiLSTM-CRF has outperformed the other two
Bi-LSTM models. Its F1-measure is better than CNN-BiLSTM by 1.2% and
BiLSTM by 2.4%.
1 Introduction
achieved state-of-the-art performance [1]. The deep convex network (DCN) gave
promising results on semantic utterance classification [2]. Linear SVMs gave better
performance for semantic feature classification [3]. Slot tagging can also be considered
as sequential labelling, through which a proper tag sequence is produced for any given
sentence. Slot tagging converts a user dialogue with N words into a sequence of N tags.
Each user dialogue is labelled to get the desired output. HMM/CFG
composite models [4], conditional random fields (CRFs) [5, 6] and support vector
machines (SVMs) [7] are widely used standard models for resolving the slot tagging
problems. However, researchers in the recent past have explored deep learning
approaches [8, 9], as these provide better accuracy than traditional methods and also
avoid handcrafted features.
For the Gujarati language, annotated task-oriented dialogue data is scarcely available.
A POS tagger developed for Gujarati gave an accuracy of 92% [10]; later, an NER
tagger was developed that identified Person, Organization, Location and other entities
using CRF, giving an accuracy of 83% [11]. In the same manner, researchers have also
worked on Named Entity Recognition in Hindi. Hindi named entity recognition was
done using BiLSTM and compared with RNN and RNN-LSTM, achieving an F1-score
of ~77.5% [12]. Hindi named entity recognition using conditional LSTM and BiLSTM
achieved a 74% F1-score [13]; similarly, Hindi named entity recognition using a deep
neural network achieved an F1-score of 64.5 [14]. A traditional slot tagging method like
CRF has the major disadvantage that researchers must have in-depth knowledge of the
language, since features are engineered manually and language expertise is needed to
ensure their quality. To implement deep neural models, tagged data is required, which is
scarcely available for Gujarati as compared to other languages. Researchers achieved an
~96% F1-score on the ATIS English airlines data set using a Recurrent Neural Network
(RNN) and a combination of RNN and CRF (R-CRF) [15].
Subsequently, many researchers used variations of RNN, including LSTM and
BiLSTM, for sequence labeling, achieving an accuracy of ~97% on the CoNLL-2000
data set [16]. Looking at the accuracy achieved using deep neural network approaches
in other languages [15–17], and because of the disadvantages of traditional methods, in
this paper we have implemented and compared Gujarati slot tagging using deep
learning models, namely LSTM, CNN-LSTM, LSTM-CRF, BiLSTM, CNN-BiLSTM
and BiLSTM-CRF. The results show that BiLSTM models outperform LSTM models,
and that BiLSTM-CRF outperforms the other two BiLSTM models for Gujarati slot
tagging.
The rest of the paper is organized as follows: Sect. 2 describes the slot tagger; Sect. 3
gives an overview of approaches to slot tagging; Sect. 4 describes the deep neural
network models for the slot tagger experiments on Gujarati task-oriented dialogue; and
Sect. 5 concludes the paper.
2 Slot Tagger
Sentence:
Slot:   B-Loc  O  B-Weather-Attribute  I-Weather-Attribute  I-Weather-Attribute  O
Intent: Weather
The aforementioned table shows an example including slot tags and intent. The
example concerns a weather forecast, which is taken as the intent. Slot tagging uses the
in/out/begin (IOB) representation. In this example, (Mumbai) is a location slot
value. The dialogue also has a weather attribute, which is required to check the highest
temperature of that location, (maximum temperature), where the weather
attribute is specified as the slot value.
The basis for slot tagging is the input from the user in the form of a dialogue. This
dialogue is a logical representation of a series of words (W). The slot tagging process
assigns a tag (S) to each of the words in the given user dialogue. Traditional language
understanding systems allot appropriate slots by maximizing the posterior probability
p(S | W). This process yields the desired semantic representation through slot
sequencing.
In slot tagging, a one-hot encoded vector uses a one to represent a particular word,
with the rest of the elements in the vector being zero. This vector indicates the input
word value, which is referred to as the word embedding. These generated word
embedding vectors are then provided to the BiLSTM for training and testing purposes.
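A minimal sketch of this encoding step is given below; the tokens are illustrative English stand-ins for the Gujarati words used in the paper.

```python
import numpy as np

# Toy vocabulary; in the paper the tokens are Gujarati words from the dialogues.
vocab = ["<pad>", "<unk>", "mumbai", "maximum", "temperature", "today"]
word_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """One-hot input vector: a single 1 at the word's index, zeros elsewhere."""
    v = np.zeros(len(vocab), dtype=np.float32)
    v[word_index.get(word, word_index["<unk>"])] = 1.0
    return v

utterance = ["mumbai", "maximum", "temperature", "today"]
inputs = np.stack([one_hot(w) for w in utterance])   # shape (4, 6), fed to the BiLSTM
```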
3.1 LSTM
Long Short-Term Memory is a deep recurrent neural network with the ability to
remember selected data for a longer duration [24]. An LSTM model consists of a
number of cells, and data flows from one cell to the next. Each cell passes on two states
to the next cell: the cell state (c) and the hidden state (h). It is the responsibility of the
cell to remember information; this is achieved using three gates present in each cell:
the output gate (o), the forget gate (f), and the input gate (i). The forget gate mainly
discards unwanted data from the cell state. The input gate adds new information, and
the output gate is responsible for generating relevant information from the given input
and the previous state values. Given the input sequence A1, A2, A3, …, An, the output
vector Yn is generated. At is the input at time t; Wai, Whi, Wsi, Waf, Whf, Wsf, Wao, Who,
Wso are weights and bi, bo, bc, bf are biases used in the computation. The output of the
LSTM hidden layer is computed using the following formulas [25]:
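The standard (peephole-style) LSTM update, written with the weight and bias names listed above, is reproduced here for completeness; the candidate-cell weights $W_{ac}$ and $W_{hc}$ are implied but not named in the text.

$i_t = \sigma(W_{ai} A_t + W_{hi} h_{t-1} + W_{si} c_{t-1} + b_i)$
$f_t = \sigma(W_{af} A_t + W_{hf} h_{t-1} + W_{sf} c_{t-1} + b_f)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{ac} A_t + W_{hc} h_{t-1} + b_c)$
$o_t = \sigma(W_{ao} A_t + W_{ho} h_{t-1} + W_{so} c_t + b_o)$
$h_t = o_t \odot \tanh(c_t)$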
LSTM models were used for neural speech recognition [26] and for entity recog-
nition from clinical text [27]. Figure 1 shows the working of LSTM model for Gujarati
Language Dialogue.
3.2 BiLSTM
The Bidirectional LSTM model uses the word embedding features of the user utterance
from left to right and from right to left, i.e., from both directions, which helps in
achieving better accuracy [28]. Figure 2 shows the working of the Bidirectional LSTM
model for Gujarati dialogue. If the input is of length N, the features are read over N time
steps. At a particular time step t, the model takes the input word At. Two internal states
are maintained by this model to generate the output sequence (Yt): the hidden state (ht)
and the cell state (cgt) at time step t. To train the BiLSTM, back-propagation through
time (BPTT) is used [29]. The advantage of BiLSTM over the LSTM model is that it
unfolds ht at each time step. In our implementation, the entire dialogue is parsed from
left to right and from right to left.
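A hedged Keras sketch of such a BiLSTM tagger is given below. The 100 recurrent units and the 'adam' optimizer mirror settings reported later in the paper; the embedding size and the rest of the architecture are assumptions, so this is only one plausible realization.

```python
import tensorflow as tf

def build_bilstm_tagger(vocab_size, num_tags, max_len, emb_dim=100, units=100):
    """BiLSTM slot tagger: one softmax over slot tags per input token."""
    inputs = tf.keras.Input(shape=(max_len,))
    x = tf.keras.layers.Embedding(vocab_size, emb_dim, mask_zero=True)(inputs)
    # Forward and backward passes over the utterance, combined per time step.
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(units, return_sequences=True))(x)
    outputs = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_tags, activation="softmax"))(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# e.g. model = build_bilstm_tagger(vocab_size=5000, num_tags=20, max_len=50)
```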
3.3 BiLSTM-CRF
The BiLSTM-CRF network model is a composition of a BiLSTM network and a CRF
network, as illustrated in Fig. 3. In this model, the sentence features are first fed to a
forward LSTM layer from the beginning to the end of the sentence, and then to a
backward LSTM layer from the end to the beginning. The BiLSTM layer generates a
vector of scores for each output tag, which is provided as input to the CRF layer. The
CRF layer learns constraints over these vectors at training time and, on the basis of
these learned constraints, performs the final valid predictions. This prominent feature
helps in increasing the accuracy, as shown in [32].
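One possible realization of this BiLSTM-CRF stack is sketched below using the CRF utilities from TensorFlow Addons; the library is an assumption chosen for illustration and is not mentioned in the paper.

```python
import tensorflow as tf
import tensorflow_addons as tfa

class BiLSTMCRF(tf.keras.Model):
    def __init__(self, vocab_size, num_tags, emb_dim=100, units=100):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, emb_dim, mask_zero=True)
        self.bilstm = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))
        self.dense = tf.keras.layers.Dense(num_tags)        # per-token emission scores
        self.transitions = tf.Variable(                     # learned CRF tag constraints
            tf.random.uniform((num_tags, num_tags)), name="transitions")

    def call(self, token_ids):
        return self.dense(self.bilstm(self.embed(token_ids)))

    def crf_loss(self, logits, tags, seq_lens):
        ll, _ = tfa.text.crf_log_likelihood(logits, tags, seq_lens, self.transitions)
        return -tf.reduce_mean(ll)

    def decode(self, logits, seq_lens):
        best_tags, _ = tfa.text.crf_decode(logits, self.transitions, seq_lens)
        return best_tags
```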
Table 2. Sizes of Sentences and tokens for Training, Validation and Testing Data Set
Training     No. of Sentences: 2760    No. of Tokens: 23281
Validation   No. of Sentences: 306     No. of Tokens: 2800
Testing      No. of Sentences: 766     No. of Tokens: 6052
The LSTM model takes into consideration the context of words in the forward
direction. From the above table, it can be seen that the CNN-LSTM model achieves
1.7% better accuracy than LSTM. In the CNN-LSTM model, CNN-based feature
extraction over word embeddings is added in addition to the forward context;
convolution over words and characters helps in obtaining better features. The
LSTM-CRF model achieves 1.3% more accuracy by connecting the output tags with
one another. The CRF layer uses conditional probability to predict the current tag based
on past and future tags. In CRF, the weights of different feature functions determine the
likelihood of the labels in the training data. BiLSTM takes into consideration the word
context in both the forward and backward directions, which gives better accuracy in
tagging a word.
This research also elaborates on the tagging accuracy of the models, as shown in
Table 4.
The results show that only the base LSTM model is affected by a change in the number
of LSTM units; the rest of the models are not affected by the number of units. Figure 5
below shows the increase in F1-score with changes in the number of units in the LSTM
model. It also shows the difference in accuracy achieved using two different optimizers:
'rmsprop' and 'adam'. However, it is observed that increasing the number of units
beyond 500 produces no major change in the performance of the model.
For the rest of the models, the number of recurrent units sampled is 100; if we increase
or decrease this size by 50 units, there is no change in the accuracy of the models.
However, if we increase or decrease the size by more than 100 units, the accuracy of the
models is reduced.
5 Conclusion
Deep neural network models are implemented and compared for slot tagging of
Gujarati dialogue. State-of-the-art performance is successfully achieved using these
deep neural network models. BiLSTM models achieved an ~2.2% better F1-score than
LSTM models. CNN-BiLSTM achieved 1.2% higher accuracy than BiLSTM, whereas
BiLSTM-CRF achieved an even higher accuracy than CNN-BiLSTM, by 1.2%. In view
of the foregoing, it is concluded that BiLSTM-CRF gives the highest accuracy amongst
all models on Gujarati language dialogues. Better accuracy can still be achieved if more
domains are included. The above accuracy is achieved using only limited data; better
accuracy can be obtained if more annotated tagged data is included for training and
testing.
References
1. Haffner, P., Tur, G., Wright, J.: Optimizing SVMs for complex call classification. In:
Proceedings of the ICASSP, pp. 632–635 (2003)
2. Tur, G., Deng, L., Hakkani-Tür, D., He, X.: Towards deeper understanding: deep convex
networks for semantic utterance classification. In: Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, ICASSP, pp. 5045–5048 (2012)
3. Dauphin, Y.N., Tur, G., Hakkani-Tür, D., Heck, L.: Zero-shot learning for semantic
utterance classification. In: 2nd International Conference on Learning Representations, ICLR
2014 - Conference Track Proceedings, pp. 1–9 (2014)
4. Wang, Y., Deng, L., Acero, A.: Spoken language understanding—an introduction to the
statistical framework. Signal Process. Mag. 22(5), 16–31 (2005)
5. Wang, Y., Deng, L., Acero, A.: Semantic frame based spoken language understanding. In:
Tur, G., De Mori, R. (eds.) Spoken Language Understanding: Systems for Extracting
Semantic Information from Speech, Chap. 3, pp. 35–80. Wiley, New York (2011)
6. Liu, J., Cyphers, S., Pasupat, P., McGraw, I., Glass, J.: A conversational movie search
system based on conditional random fields. In: 13th Annual Conference of the International
Speech Communication Association 2012, INTERSPEECH 2012, vol. 3, pp. 2453–2456
(2012)
7. Kudo, T., Matsumoto, Y.: Chunking with support vector machines, vol. 816, pp. 1–8 (2001)
8. Deng, L., Tur, G., He, X., Hakkani-Tur, D.: Use of kernel deep convex networks and end-to-
end learning for spoken language understanding. In: Proceedings of the 2012 IEEE Spoken
Language Technology Workshop, SLT 2012, pp. 210–215 (2012)
9. Yao, K., Peng, B., Zhang, Y., Yu, D., Zweig, G., Shi, Y.: Spoken language understanding
using long short-term memory neural networks, pp. 189–194 (2014)
10. Okazaki, N.: CRFsuite: a fast implementation of Conditional Random Fields, CRFs (2007).
https://www.chokkan.org/software/crfsuite/
11. Garg, V., Saraf, N., Majumder, P.: Named entity recognition for Gujarati: a CRF based
approach. In: Prasath, R., Kathirvalavakumar, T. (eds.) MIKE 2013. LNCS (LNAI), vol.
8284, pp. 761–768. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-03844-5_74
12. Athavale, V., Bharadwaj, S., Pamecha, M., Prabhu, A., Shrivastava, M.: Towards deep
learning in Hindi NER: an approach to tackle the labelled data scarcity. In: Proceedings of
the 13th International Conference on Natural Language Processing, ICON 2016, Varanasi,
India, 17–20 December 2016, pp. 154–160 (2016)
13. Shah, B., Kopparapu, S.K.: A Deep Learning approach for Hindi Named Entity Recognition
(2019)
14. Sharma, R., Morwal, S., Agarwal, B., Chandra, R., Khan, M.S.: A deep neural network-
based model for named entity recognition for Hindi language. Neural Comput. Appl. 32(20),
16191–16203 (2020). https://doi.org/10.1007/s00521-020-04881-z
15. Yang, X., Liu, J.: Using word confusion networks for slot filling in spoken language
understanding. In: Proceedings of the Annual Conference of the International Speech
Communication INTERSPEECH, vol. 2015-January, no. 3, pp. 1353–1357 (2015)
16. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF Models for Sequence Tagging (2015)
17. Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network
architectures and learning methods for spoken language understanding. In: Proceedings of
the Annual Conference of the International Speech Communication Association, INTER-
SPEECH, no. August, pp. 3771–3775 (2013)
18. Deng, L., Tur, G., He, X., Hakkani-Tur, D.: Use of kernel deep convex networks and end-to-
end learning for spoken language understanding. In: 2012 IEEE Spoken Language
Technology Workshop, SLT 2012, pp. 210–215 (2012)
19. Yao, K., Zweig, G., Hwang, M.-Y., Shi, Y., Yu, D.: Recurrent neural networks for language
understanding. In: Proceedings of the Interspeech (2013)
20. Xu, P., Sarikaya, R.: Contextual domain classification in spoken language understanding,
pp. 136–140. Microsoft Corporation, Redmond (2014)
21. Ekbal, A.: Language independent named entity recognition in Indian languages. In:
Proceedings of the Workshop on NER for South and South East Asian Languages, IJCNLP
2008, Hyderabad, India, 12 January 2008, pp. 33–40 (2008)
22. Ekbal, A.: A conditional random field approach for named entity recognition in Bengali and
Hindi. Linguist. Issues Lang. Technol. 2(1), 589–594 (2009)
23. Sasidhar, B.: A survey on named entity recognition in Indian languages with particular
reference to Telugu. IJCSI Int. J. Comput. Sci. 8(2), 438 (2011)
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997). https://doi.org/10.1162/neco.1997.9.8.1735
25. Wang, P., Qian, Y., Soong, F.K., He, L., Zhao, H.: A unified tagging solution: bidirectional
LSTM recurrent neural network with word embedding (2015)
26. Soltau, H., Liao, H., Sak, H.: Neural speech recognizer: acoustic-to-word LSTM model for
largevocabulary speech recognition. In: Proceedings of the Annual Conference of the
International Speech Communication Association, INTERSPEECH, vol. 2017-August,
pp. 3707–3711 (2017). https://doi.org/10.21437/Interspeech.2017-1566
27. Liu, Z., et al.: Entity recognition from clinical texts via recurrent neural network. BMC Med.
Inform. Decis. Mak. 17(Suppl 2), 53–61 (2017)
28. Graves, A., Mohamed, A.R., Hinton, G.: Speech Recognition with Deep Recurrent Neural
Networks. Department of Computer Science, University of Toronto, pp. 6645–6649 (2013)
29. Bod, M.: A guide to recurrent neural networks and backpropagation. Rnn Dan Bpnn 2(2), 1–
10 (2001)
30. Shah, D., Bhadka, H.: A survey on various approach used in named entity recognition for
Indian languages. Int. J. Comput. Appl. (0975 – 8887) 167(1) (2017)
31. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural
language processing (almost) from scratch. J. Mach. Learn. Res. (JMLR) 12, 2493–2537
(2011)
32. Chen, T., Xu, R., He, Y., Wang, X.: Improving sentiment analysis via sentence type
classification using BiLSTM-CRF and CNN. Expert Syst. Appl. 72, 221–230 (2017). https://
doi.org/10.1016/j.eswa.2016.10.065
33. Zhao, J., Mao, X., Chen, L.: Speech emotion recognition using deep 1D & 2D CNN LSTM
networks. Biomed. Signal Process. Control 47, 312–323 (2019). https://doi.org/10.1016/j.
bspc.2018.08.035
34. Wang, J., Yu, L.C., Lai, K.R., Zhang, X.: Dimensional sentiment analysis using a regional
CNN-LSTM model. In: 54th Annual Meeting of the Association for Computational
Linguistics, ACL 2016 - Short Paper, pp. 225–230 (2016). https://doi.org/10.18653/v1/p16-
2037
35. Guimar, V.: NER with Neural Character Embeddings (2014)
36. Reimers, N., Gurevych, I.: Optimal Hyperparameters for Deep LSTM-Networks for
Sequence Labeling Tasks (2017). https://arxiv.org/abs/1707.06799
37. Wu, C., Wu, F., Chen, Y., Wu, S., Yuan, Z., Huang, Y.: Neural metaphor detecting with
CNN-LSTM model, pp. 110–114 (2018). https://doi.org/10.18653/v1/w18-0913
38. Kulkarni, S.: A survey on named entity recognition for south Indian languages, vol. 167, no.
1, pp. 11–18 (2017)
39. Bhatt, S., Patwa, F., Sandhu, R.: An access control framework for cloud-enabled wearable
internet of things. In: Proceedings of the 2017 IEEE 3rd International Conference on
Collaboration and Internet Computing, CIC 2017, vol. 2017-January, pp. 328–338 (2017).
https://doi.org/10.1109/CIC.2017.00050
40. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM
and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005). https://doi.
org/10.1016/j.neunet.2005.06.042
Energy Efficient Aspects of Federated
Learning – Mechanisms
and Opportunities
Shajulin Benedict(B)
1 Introduction
Machine Learning (ML) renders innovative solutions for traditional services by considering large amounts of data. Applications such as predicting the user experience of web data, drug discovery, COVID-19 patient movements, shared mobility, and so forth, which span geographical and organizational boundaries, have marked a sudden surge in the inclusion of ML approaches and have drawn in a wide range of researchers and practitioners.
The rest of the paper is organized as follows: Sect. 2 presents the required basics of FL with a generic architecture; Sect. 3 details the functioning of different FL mechanisms with distinct classifications; Sect. 4 highlights the energy-efficient aspects of FL frameworks and architectures; Sect. 5 centres around the characterization of FL applications; and, finally, Sect. 6 concludes the contributions of the paper with a few future outlooks.
3.2 FL Metrics
The metrics that could improve the performance of FL algorithms are classified into three categories: performance-oriented, security-oriented, and algorithm-specific metrics.
The Encryption Level metric indicates the strength of the encryption applied during FL: the higher the Encryption Level, the higher the security, although a higher Encryption Level also has performance consequences. Homomorphic encryption techniques, which were discussed in a few research works [1, 18], provide a high Encryption Level. The Mean Time to Detect (MTD) metric captures the time taken to detect security breaches of the FL system; the Mean Time to Resolve (MTR) metric captures the time required to resolve security flaws in the federated environment; the Intrusion Level (IL) metric counts the number of successful intrusions in a given period of time. It is a known fact that the security vulnerability of FL environments is high owing to the involvement of several IoT nodes, sometimes unmanaged ones. The Cross-ref Scripting (CS) metric is a Boolean indicator that reflects whether false scripts are running on the FL networking systems; the N-Unauthorized Devices metric counts the number of unauthorized devices that submit sensor data or collect FL models for further processing; and Differential Privacy, a Boolean metric, indicates whether the privacy of data at remote sites is preserved.
3.3 FL Architectures/Infrastructures
5 FL Applications
An exploratory study on seven applications that were implemented by
researchers/ practitioners, which utilized FL mechanisms, is disclosed in this
section as case studies. The details of these applications include:
Energy Demand Prediction: Predicting energy demand for charging electric vehi-
cles from distributed charging stations using the FL approach is carried out in
[27]. Here, a locally available training model updates the globally available train-
ing model using the TensorFlow tool. The application employed Deep Neural
Network (DNN) algorithms for learning the charging demands within a network
of HybridFL nodes.
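None of the case studies below ships code in this paper; purely as an illustration of the local-training/global-aggregation cycle they all share (a locally trained model updating a global model), the following NumPy sketch implements federated averaging over a toy linear model. All names, data and hyperparameters are hypothetical.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.01, epochs=5):
    """One client's local training step: plain linear-model gradient descent on its own data."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_averaging(global_weights, client_data, rounds=10):
    """Each round: every client trains locally, then the server averages the
    returned weights (weighted by each client's number of samples)."""
    w = global_weights
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in client_data:
            updates.append(local_update(w, X, y))
            sizes.append(len(y))
        w = np.average(np.stack(updates), axis=0, weights=np.array(sizes, dtype=float))
    return w

# Toy usage with three hypothetical charging stations as clients
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
w_global = federated_averaging(np.zeros(3), clients)
```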
Industrial IoT: Kong et al. [13] proposed a federated tensor mining framework to collaboratively learn the assembly-line features of an automobile industry. The FL application falls under the category of Industrial IoT, or Smart Factory, where the FL approach is based on transferring secure data among the factory units. A DNN learning algorithm is applied to heterogeneous data sources together with homomorphic encryption approaches.
Image Classification: The image classification problem is addressed for typical automotive vehicles using the FL approach by Ye et al. [4]. The authors proposed DNN-based FL for this application, where the DNN models are trained locally and the inferences of the models are transferred to the global training models. The approach has a selective model aggregation platform that pursues incentive-based training considering the computation cost and other metrics such as utility and latency of the learning algorithms.
Jamming Attack Detection: Jamming attacks are a common problem for Unmanned Aerial Vehicles (UAVs). Detecting jamming attacks needs to be handled in a collaborative fashion for quick responses in the wireless medium. Mowla et al. [17] studied a security-enabled client group prioritization mechanism for carrying out FL in a decentralized fashion using the DNN algorithm. Here, the locally learned models were globally updated to further refine the global learning state of the FL system.
Failure Prediction in Aeronautics: The FL approach, in combination with Active Learning (a semi-supervised learning approach), is applied to predict the failures of aeronautical systems by Aussel et al. [16]. The authors combined FL and Active Learning in order to efficiently handle the computation and communication resources of aeronautical systems during the FL learning phase.
Visual Object Detection: In this application, Liu et al. [26] attempted to apply the FL approach for quickly detecting objects. Video applications can demand hefty network bandwidth, which restricts the utilization of traditional centralized learning mechanisms. In this application, Federated YOLOv3 and Recurrent-Convolutional Neural Network (R-CNN) learning algorithms were studied in a federated environmental setup.
Mobile Packet Classification: Here, mobile packets were classified using the FL approach so that unnecessary handoffs and network latency are avoided. Bakopoulou et al. [6] implemented Federated SVM classifier and DNN algorithms for classifying mobile packets in this application.
6 Conclusions
References
1. Acar, A., Aksu, H., Selcuk Uluagac, A., Conti, M.: A survey on homomorphic
encryption schemes: theory and implementation. ACM Comput. Surv. 51(4), 1–35
(2018). https://doi.org/10.1145/3214303
2. Alvarez, J.M., Salzmann, M.: Compression-aware training of deep networks. In:
Proceedings of the 31st International Conference on Neural Information Processing
Systems, NIPS 2017, pp 856–867 (2017)
3. Ghosh, A., Hong, J., Yin, D., Ramchandran, K.: Robust federated learning in a heterogeneous environment, arXiv https://arxiv.org/abs/1906.06629, pp. 1–26 (2019)
4. Ye, D., Rong, Yu., Pan, M., Han, Z.: Federated learning in vehicular edge comput-
ing: a selective model aggregation approach. IEEE Access 8, 23920–23935 (2020).
https://doi.org/10.1109/ACCESS.2020.2968399
5. Energy Consumption in WIFI. http://repositorio.uchile.cl/bitstream/handle/
2250/132644/Energy-Efficiency-Oriented-Traffic-Offloading-in-wireless-networks.
pdf?sequence=1s. Accessed Mar 2020
6. Bakopoulou, E., Tillman, B., Markopoulou, A.: A federated learning approach for
mobile packet classification, arXiv https://arxiv.org/abs/1907.13113 (2019)
7. Chen, F., Luo, M., Dong, Z., Li, Z., He, X.: Federated meta-learning with fast
convergence and efficient communication, preprint in arXiv https://arxiv.org/abs/
1802.07876. Accessed Mar 2020
8. Fu, J., Liu, Y., Chao, H.C., Bhargava, B., Zhang, Z.: Secure data storage and search-
ing for industrial IoT by integrating fog computing and cloud computing. IEEE
Trans. Ind. Inform. 14(10), 4519–4528 (2018)
9. Zahed, M.I.A., Ahmad, I., Habibi, D., Phung, Q.V.: Content caching in industrial
IoT: security and energy considerations. IEEE Internet of Things J. 7, 491–504
(2019)
10. Ren, J., Wang, H., Hou, T., Zheng, S., Tang, C.: Federated learning-based computa-
tion offloading optimization in edge computing-supported internet of things. IEEE
Access 7, 69194–69201 (2019). https://doi.org/10.1109/ACCESS.2019.2919736
11. Konecny, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D.:
Federated learning: strategies for improving communication efficiency. In: NIPS
Workshop on Private Multi-party Machine Learning (2016)
12. Khan, L.U., et al.: Federated learning for edge networks: resource optimization and
incentive mechanism, arXiv https://arxiv.org/pdf/1911.05642.pdf. Accessed Mar
2020
13. Kong, L., Liu, X.-Y., Sheng, H., Zeng, P., Chen, G.: Federated tensor mining for
secure industrial internet of things. IEEE Trans. Ind. Inf. 16(3), 2144–2153 (2020).
https://doi.org/10.1109/TII.2019.2937876
14. Munir, Md., Abedin, S., Tran, N., Han, Z., Huh, E., Hong, C.S.: Risk-aware
energy scheduling for edge computing with microgrid: a multi-agent deep reinforce-
ment learning approach, arXiv https://arxiv.org/pdf/2003.02157.pdf. Accessed
Mar 2020
15. Wang, M., Zhu, L., Yang, L.T., Lin, M., Deng, X., Yi, L.: Offloading assisted
energy-balanced IoT edge node relocation for confident information coverage. IEEE
Internet of Things 6(3), 4482–4490 (2019)
16. Aussel, N., Chabridon, S., Petetin, Y.: Combining federated and active learn-
ing for communication-efficient distributed failure prediction in aeronautics, arXiv
preprint https://arxiv.org/pdf/2001.07504.pdf. Accessed Mar 2020
17. Mowla, N.I., Tran, N.H., Doh, I., Chae, K.: Federated learning-based cognitive
detection of jamming attack in flying ad-hoc network. IEEE Access 8, 4338–4350
(2019). https://doi.org/10.1109/ACCESS.2019.2962873
18. Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and
applications. ACM Trans. Intell. Syst. Technol. 12 (2019). https://doi.org/10.
1145/3298981
19. Lin, S., Yang, G., Zhang, J.: A collaborative learning framework via federated
meta-learning, preprint arXiv https://arxiv.org/abs/2001.03229 (2020)
20. Benedict, S.: Prediction assisted runtime based energy tuning mechanism for HPC
applications. Sustain. Comput. Inform. Syst. 19, 43–51 (2018)
21. Benedict, S.: Threshold acceptance algorithm based energy tuning of scientific
applications using EnergyAnalyzer. In: ISEC2014. ACM (2014). https://doi.acm.
org/10.1145/2590748.2590759
22. Wang, S.: Adaptive federated learning in resource constrained edge computing
systems, arXiv https://arxiv.org/pdf/1804.05271.pdf (2019)
23. Samarakoon, S., Bennis, M., Saad, W., Debbah, M.: Distributed federated learning
for ultra-reliable low-latency vehicular communications. IEEE Trans. Commun.
(2019). https://doi.org/10.1109/TCOMM.2019.2956472
24. Coughlin, T.: Forbes report on data explosion. https://www.forbes.com/sites/
tomcoughlin/2018/11/27/175-zettabytes-by-2025/#7fe3174d5459. Accessed Mar
2020
25. Hesselbach, X., Sanchez-Aguero, V., Valera, F., Vidal, I., Nogales, B.: An NFV-
based energy scheduling algorithm for a 5G enabled fleet of programmable
unmanned aerial vehicles. Wirel. Commun. Mob. Comput. 2019, 1–20 (2019).
https://doi.org/10.1155/2019/4734821
26. Liu, Y., et al.: FedVision: an online visual object detection platform powered by
federated learning, arXiv https://arxiv.org/abs/2001.06202. Accessed Mar 2020
27. Saputra, Y., Thai, H.D., Nguyen, D., Dutkiewicz, E., Mueck, M., Srikanteswara,
S.: Energy demand prediction with federated learning for electric vehicle networks,
arXiv https://arxiv.org/abs/1909.00907. Accessed Mar 2020
Classification of UrbanSound8k: A Study
Using Convolutional Neural Network
and Multiple Data Augmentation Techniques
1 Introduction
As various regions become urbanized and densely populated, the permeation of noises such as cars honking, buses whirring, sirens wailing and the mechanical clattering from
construction becomes inescapable. It has been shown in various studies that exposure
to noise pollution can have a direct impact on the health and wellbeing of the popu-
lation [1]. As the population continues to grow, noise pollution will continue to expand
as a health risk. Understanding these problems better will lead to the creation of new
potential solutions. Research in the areas of audio datasets with respect to music and
speech is prevalent and continues to grow. Much progress has been made in the last
few years in the area of classification of environmental sounds. Classification of urban
sounds has a plethora of applications as well, such as, systems that assist those with
2 Related Work
Salamon et al. presented a taxonomy for urban sounds to enable a common framework
for research and introduced a new dataset, UrbanSound8k, consisting of 10 classes of
audio that spans 27 h with 18.5 h of annotated event occurrences [2]. The ten classes of
sound include the following: “air conditioner, car horn, children playing, dog bark,
drilling, engine idling, gun shot, jackhammer, siren, and street music”. Piczak imple-
mented an environmental sound classification model with a convolutional neural net-
work (CNN) architecture [3]. The datasets were preprocessed into segmented
spectrograms. The CNN architecture consisted of 2 fully connected layers and 2
convolutional layers with max pooling. The proposed model achieved an accuracy of
73%.
The performance of the CNN’s on the UrbanSound8k dataset was improved by
Salamon et al. using data augmentation [4]. Data augmentation was carried out on raw
audio waveform before being transformed into their log-scaled Mel-spectrogram rep-
resentation. The authors used audio augmentation methods such as time stretching,
pitch shifting, dynamic range compression and background noise insertion. Their
model achieved 73% accuracy without data augmentation and 79% accuracy with data
augmentation. Sang et al. proposed a convolutional recurrent neural network model
trained on raw audio waveforms for environmental sound classification [5]. The con-
volutional layers extracted high level features and the recurrent layers made temporal
aggregations of the extracted features. Long short-term memory networks were used as the
recurrent layers. The proposed architecture achieved an accuracy of 79.06%.
Dai et al. proposed a deep CNN architecture for environmental sound classification
with 34 weight layers with no addition of fully connected layers [6]. Audio samples
were used in their raw waveform as opposed to various feature extraction transfor-
mations. The model achieved decent performance and matched the accuracy of some
models trained on log-scaled Mel-features. This model scored an accuracy of 71.8%.
Zhang et al. used dilated convolutional neural networks and tested the effectiveness of
different activation functions such as rectified linear unit (ReLU), LeakyReLU,
exponential linear unit (ELU), parametric ReLU (PReLU) and Softplus [7]. The models
were tested on the UrbanSound8k, ESC-10 and CICESE datasets. 1D data augmen-
tations were used. The results showed that the LeakyReLU activation function worked
better on the UrbanSound8k and ESC-10 datasets whereas the PReLU activation
function worked better on the CICESE dataset. The reasoning behind the superior performance of the LeakyReLU activation function is that it trades off network sparsity for more network information, leading to better performance by the classifier.
The proposed model with the LeakyReLU activation function achieved an accuracy of
81.9% on the urban sound dataset.
Though there are works in the literature that have aimed at classifying the Urban-
Sound8k dataset, this work is different from the previous research in the following
ways: 1) This work uses a 5-layer deep CNN for learning the features and classifying
the audio samples from the UrbanSound8k dataset. 2) In this work an in-depth study is
conducted to analyze the effect of the SpecAugment augmentation method combined
with other augmentation methods such as pitch shifting, time stretching and back-
ground noise insertion on the classification of the UrbanSound8k dataset. To the best of
our knowledge, there is no previous work in the literature investigating the influence of
the SpecAugment augmentation method on the UrbanSound8k dataset.
3 Dataset Description
This work uses the UrbanSound8k dataset introduced by Salamon et al. [2]. The
UrbanSound8k consists of 8732 audio tracks belonging to 10 different classes like the
air conditioner, car horn, children playing, dog barking, drilling, engine idling, gun
shot, jackhammer, siren, and street music. These 10 classes were chosen due to their high frequency of occurrence in city environments, apart from the gun shot class, which was added for variety. Each audio track spans a duration of up to 4 s. This time length was chosen after testing showed that 4 s is enough for models to classify the dataset with decent accuracy. The sequences which lasted more than 4 s were segmented into slices of 4 s using a sliding window with a hop size of 2 s. The total
duration of the audio recording for the 8732 audio tracks was 8.7 h. Figure 1 shows the
amplitude vs time plot of sample audio tracks from each of the 10 classes of the
UrbanSound8k dataset. The matplotlib and librosa Python libraries are used to obtain the plots [13].
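The paper does not include its plotting code; the snippet below is a minimal sketch of how such amplitude-versus-time plots can be produced with librosa and matplotlib. The file path and class label are placeholders and must be adapted to the local copy of UrbanSound8k.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Placeholder path to one UrbanSound8k clip; adjust to the local dataset layout.
path = "UrbanSound8K/audio/fold1/example_dog_bark.wav"

y, sr = librosa.load(path, sr=22050)   # load the waveform, resampled to 22.05 kHz
plt.figure(figsize=(8, 2))
librosa.display.waveshow(y, sr=sr)     # waveplot() in older librosa releases
plt.title("dog_bark (placeholder example)")
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.tight_layout()
plt.show()
```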
4 Data Preprocessing
This section details the various augmentations and transformations used in this study.
1D Data Augmentation. In order to overcome the deficit of samples in the dataset,
various data augmentation methods are being used that produce new samples from the
existing samples by applying transformations to the samples in the dataset [8]. To
produce augmentations to the raw audio dataset, methods such as pitch shifting, time
stretching and background noise insertion are used in the literature [4]. The values of pitch shifting and time stretching were chosen based on the results obtained in [4].
Fig. 1. Amplitude vs time plot of sample audio tracks from each of the 10 classes of the UrbanSound8k dataset.
Time Stretching. The audio samples are sped up or slowed down by desired factors.
The rates chosen for this study were 0.81 and 1.07.
Pitch Shifting. The pitch of the samples is increased or decreased according to spec-
ified values. Previous studies have shown that this augmentation method can improve
results obtained by a classifier. The values chosen for this study were −1, −2, 1 and 2.
Background Noise. Noise was generated using the NumPy library and inserted into the
audio samples in the dataset.
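As a hedged illustration only, the sketch below applies the three raw-audio (1D) augmentations described above with librosa and NumPy. The stretch rates and pitch steps follow the values stated in the text; the noise amplitude and file name are assumptions.

```python
import numpy as np
import librosa

def time_stretch(y, rate):
    # rate < 1 slows the clip down, rate > 1 speeds it up
    return librosa.effects.time_stretch(y, rate=rate)

def pitch_shift(y, sr, n_steps):
    # shift the pitch by n_steps semitones (negative values lower the pitch)
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def add_background_noise(y, noise_factor=0.005):
    # Gaussian noise generated with NumPy; noise_factor is an assumed scale
    noise = np.random.randn(len(y))
    return y + noise_factor * noise

y, sr = librosa.load("example.wav", sr=22050)   # placeholder file name
augmented = (
    [time_stretch(y, r) for r in (0.81, 1.07)]
    + [pitch_shift(y, sr, s) for s in (-2, -1, 1, 2)]
    + [add_background_noise(y)]
)
```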
Audio Transformations. Classification algorithms, especially CNNs, have been shown to work significantly better on audio signals after the signals have been transformed into two-dimensional time-frequency representations such as the Mel-spectrogram.
SpecAugment (SA). Traditionally, data augmentation has been applied to audio signals before they are transformed into visual representations such as spectrograms. Park et al. developed an augmentation method called SpecAugment [11] that can be applied directly to spectrograms by warping them along the time axis and applying blocks of consecutive horizontal (Mel-frequency channel) and vertical (time-step) masks. It has been shown that SpecAugment makes the classifier more robust against deformations in the time domain for speech recognition tasks.
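SpecAugment as proposed in [11] also includes time warping; the sketch below illustrates only the frequency- and time-masking steps applied directly to a log-scaled Mel-spectrogram. The mask widths and counts are arbitrary illustrative choices, not the values used in the experiments.

```python
import numpy as np
import librosa

def mel_spectrogram(y, sr, n_mels=40):
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(S)              # log-scaled Mel-spectrogram

def spec_augment(S, n_freq_masks=1, n_time_masks=1, max_f=8, max_t=20):
    """Zero out random horizontal (Mel-channel) and vertical (time-step) bands."""
    S = S.copy()
    n_mels, n_frames = S.shape
    for _ in range(n_freq_masks):
        f = np.random.randint(1, max_f + 1)     # mask height in Mel channels
        f0 = np.random.randint(0, n_mels - f)
        S[f0:f0 + f, :] = S.min()
    for _ in range(n_time_masks):
        t = np.random.randint(1, max_t + 1)     # mask width in time steps
        t0 = np.random.randint(0, n_frames - t)
        S[:, t0:t0 + t] = S.min()
    return S

y, sr = librosa.load("example.wav", sr=22050)   # placeholder file name
S_aug = spec_augment(mel_spectrogram(y, sr))
```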
5 Methodology
Deep Learning. Deep learning models are able to learn representations of raw data
through multiple levels of abstractions due to these models being composed of multiple
processing layers. Deep learning models have been able to achieve remarkable levels of
performance and accuracy over the years [14, 15]. Recurrent neural networks and
convolutional neural networks are the most popular deep learning models prevalent in
the literature. Recurrent neural networks work well with sequential data, i.e., text and speech, while convolutional neural networks have had tremendous success in processing
video, images, speech and audio [12].
Convolutional Neural Network. A convolutional neural network is composed of mul-
tiple layers that include convolutional, pooling and usually one or more fully connected
layers. The convolutional layers consist of kernels or filters that are trained to optimally
extract features from an image. The pooling layer reduces the number of parameters by
subsampling the feature map output by the convolutional layers. Activation functions
are used in order to add non-linearity to the model so that it can perform complex tasks.
The ReLU activation function has particularly gained popularity. This is because the
ReLU type activation functions overcome the vanishing gradient problem faced by
other activation functions such as the sigmoid and tanh activation function. In order to
compute and minimize the error of the model, loss functions are used. Cross-entropy
and mean squared error are two commonly used loss functions.
Proposed Model. The CNN model used in this study consists of four 2D convolutional
layers with batch normalization. Batch normalization stabilizes the learning process
and reduces training time by standardizing the inputs of each minibatch to a layer. The
audio samples from the UrbanSound8k dataset will first go through various augmen-
tations before being fed into the CNN model for training. The input to the CNN is the
Mel-spectrogram representation of an audio signal.
The first convolutional layer (Conv2D) has 32 filters with a filter size of 3 × 3. It receives inputs of dimensions (128, 40, 174, 1), where 128 is the batch size, 40 is the number of Mel-bands and 174 is the frame count. Its outputs are of dimensions (128, 38, 172, 32), where 32 represents the feature maps created by the filters in the first Conv2D layer. Feature maps are the outputs produced by the filters. The input batch size to the CNN model is set to 128. There is no zero padding and the stride is kept as one.
2D spatial dropout regularization with rates of 0.07 and 0.14 is added to prevent overfitting by setting randomly selected feature maps to zero. Studies by Zhang
et al. showed that the leakyReLU activation function worked relatively better on the
UrbanSound8k dataset, hence we decided to use the same in this study [7]. The ReLU
activation function returns the same value if the input value is positive and returns zero
if the input value is negative. The leakyReLU activation function returns small negative
values when the input value is less than zero.
The second Conv2D layer has 32 filters with a filter size of 3 × 3 and an output with dimensions of (128, 36, 170, 32). A max pooling layer of size 2 × 2 was added after the batch normalization of the second Conv2D layer. Max pooling layers downsample the feature maps along the height and the width by choosing the highest value from each patch the layer passes over. The feature map is downsampled by the max pooling layer to the dimensions of (128, 18, 85, 32).
The third Conv2D layer has 64 filters with a filter size of 3 × 3 and an output having dimensions of (128, 16, 83, 64). The fourth Conv2D layer has 64 filters with a filter size of 3 × 3, with the output having dimensions of (128, 14, 81, 64).
A global average pooling layer is added after the batch normalization layer of the
fourth 2D convolutional layer, without spatial dropout. The global average pooling layer downsamples each feature map to a single value by taking the average of its entries, giving an output of dimensions (128, 64).
A fully connected output layer with 10 nodes is added for classification of the data.
The final dense layer is followed by the softmax activation function to represent
probability distribution over the 10 different classes.
The network is trained using the Adam optimizer to minimize the categorical cross-
entropy loss function with early stopping. Figure 3 shows the CNN architecture used in
this work. The CNN model is implemented using Keras.
Fig. 3. Architecture of the proposed CNN model: a Mel-spectrogram input of dimensions (128, 40, 174, 1) passes through four Conv2D layers (32, 32, 64 and 64 filters, 3 × 3 kernels, LeakyReLU activation, batch normalization and spatial dropout of 0.07/0.14), with 2 × 2 max pooling after the second Conv2D layer, followed by global average pooling and a dense softmax output layer with 10 units.
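The paper does not list the layer-by-layer code; the Keras sketch below reconstructs the architecture as described above and in Fig. 3 (filter counts, 3 × 3 kernels, LeakyReLU, batch normalization, spatial dropout of 0.07/0.14, 2 × 2 max pooling after the second Conv2D layer, global average pooling and a 10-way softmax). Details not stated in the text, such as the LeakyReLU slope and the early-stopping patience, are assumptions.

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_model(input_shape=(40, 174, 1), n_classes=10):
    m = models.Sequential()
    # Conv block 1: 32 filters, LeakyReLU, batch norm, light spatial dropout
    m.add(layers.Conv2D(32, (3, 3), input_shape=input_shape))
    m.add(layers.LeakyReLU())
    m.add(layers.BatchNormalization())
    m.add(layers.SpatialDropout2D(0.07))
    # Conv block 2: 32 filters, then 2x2 max pooling
    m.add(layers.Conv2D(32, (3, 3)))
    m.add(layers.LeakyReLU())
    m.add(layers.BatchNormalization())
    m.add(layers.MaxPooling2D((2, 2)))
    m.add(layers.SpatialDropout2D(0.07))
    # Conv block 3: 64 filters
    m.add(layers.Conv2D(64, (3, 3)))
    m.add(layers.LeakyReLU())
    m.add(layers.BatchNormalization())
    m.add(layers.SpatialDropout2D(0.14))
    # Conv block 4: 64 filters, global average pooling instead of flattening
    m.add(layers.Conv2D(64, (3, 3)))
    m.add(layers.LeakyReLU())
    m.add(layers.BatchNormalization())
    m.add(layers.GlobalAveragePooling2D())
    # Output: 10-way softmax
    m.add(layers.Dense(n_classes, activation="softmax"))
    m.compile(optimizer=optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
    return m

model = build_model()
early_stop = callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_train, batch_size=128, epochs=200,
#           validation_data=(X_val, y_val), callbacks=[early_stop])
```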
Figure 4 shows the process pipeline by which the different datasets are formed. The number of samples and their distribution in the 10 datasets are presented in the bar graph in Fig. 5. Pitch shifting, time stretching and background noise insertion augmentations are applied to the audio samples before they are transformed into Mel-spectrograms, whereas the SpecAugment augmentation is applied after the samples are transformed into their corresponding Mel-spectrogram representation. The CNN
models trained on the different datasets are then evaluated on the evaluation fold on
which no augmentation has been applied.
The CNN model is trained with the 10 different datasets explained in the previous section. The average accuracy and F1-scores of the different augmentation methods are tabulated in Table 2. It can be observed from Table 2 that applying augmentations to the dataset did
indeed improve the performance of the CNN model. The CNN model when trained and
tested on the NA data achieved an average accuracy and F1-score of 76.9% and 78.3%
respectively while the highest average accuracy and F1-score of 79.2% and 80.3% is
obtained when the CNN model is trained using SAxTS data.
Fig. 4. Process pipeline depicting the formation of the various input data from the
UrbanSound8k dataset using various audio data augmentation techniques.
Fig. 5. The number of samples and their distribution in each of the 10 datasets.
Figure 6 shows the boxplot representation of the accuracy values obtained by 10-
fold cross validation of the CNN model using NA and the SAxTS data. It can be
observed from the figure that the proposed CNN model using SAxTS data is more
robust as the corresponding boxplot covers less area. Also, it can be noted that the
CNN model exhibits better accuracy values when it uses the SAxTS data. It is worth noting that the performance of the CNN model did not simply improve as the number of samples in each dataset increased: the model trained on the ALL dataset (139,712 samples) performed worse than the model trained on the SAxTS dataset (52,392 samples).
Figure 7 (a) shows the confusion matrix obtained when the CNN model is trained
and tested with SAxTS data. Figure 7 (b) shows the difference matrix obtained by
computing the difference between the confusion matrices obtained when the CNN
model is trained using SAxTS and NA data. Positive values along the diagonal of the
difference matrix indicate that the CNN model performs better when trained using
SAxTS data. In other words, it shows that the CNN model has better performance when
trained using SAxTS data when compared with the NA.
The per-class classification accuracy values of the CNN model when trained and
tested with the SAxTS data are 0.98, 0.89, 0.88, 0.87, 0.85, 0.83, 0.78, 0.75, 0.69 and
0.54 for the Gun Shot, Car Horn, Dog bark, Siren, Street Music, Children Playing,
Drilling, Engine Idling, Jackhammer, and Air Conditioner classes respectively. Fig-
ure 8 shows the per-class accuracy difference values obtained when the CNN model is
trained and tested on different data. For computing the accuracy difference, the accu-
racy value obtained while training the CNN model with the NA data is taken as the
base value. It can be seen that for certain classes data augmentation greatly improved
the performance of the CNN model. However, it can also be noted that in spite of data
augmentation, the CNN model did not achieve good results for certain classes, such as
‘air conditioner’, ‘drilling’ and ‘jackhammer’.
Fig. 6. Boxplot showing the accuracy values obtained by 10-fold cross validation of the CNN
model using NA and SAxTS data augmentation methods.
Fig. 7. (a) Confusion matrix of the CNN model when trained and tested with SAxTS data
(b) Difference between confusion matrix values of the CNN when trained with SAxTS and NA.
Finally, we compared the performance of our proposed CNN model with other state-of-the-art deep learning models like PiczakCNN [3] and SB-CNN [4] with data augmentation for classifying the audio samples from the UrbanSound8k dataset. The SB-CNN used time stretching, pitch shifting, background noise insertion and dynamic range compression as the data augmentation techniques. It was observed that PiczakCNN without data augmentation produced an average accuracy of 73% and SB-
CNN with data augmentation achieved an accuracy of 79%. Further comparisons with
other models, as depicted in Table 3, shows that the proposed CNN model along with
Fig. 8. Per-class accuracy difference values when the CNN model is trained and tested on
different data. Note the vertical line corresponding to the value 0 represents the accuracy value of
the CNN when trained and tested using the NA data.
8 Conclusion
This paper investigated the effectiveness of various audio data augmentation techniques
on the proposed CNN model for classifying the environmental sounds belonging to
10 classes from the Urbansound8k dataset. Several augmentation methods and their
combinations were explored in this study. The experiments showed that the proposed
CNN model achieved better results when the data was augmented using SpecAugment
and time stretching data augmentation techniques. Additionally, the results also support
the fact that applying augmentation to the datasets improves the performance of the
CNN models.
References
1. Stansfeld, S., Matheson, M.: Noise pollution: non-auditory effects on health. Br. Med. Bull.
68, 243–257 (2003). https://doi.org/10.1093/bmb/ldg033
2. Salamon, J., Jacoby, C., Bello, J.P.: A dataset and taxonomy for urban sound research. In:
Proceedings of the ACM International Conference on Multimedia, MM 2014 (2014). https://
doi.org/10.1145/2647868.2655045
3. Piczak, K.J.: Environmental sound classification with convolutional neural networks. In:
IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
(2015). https://doi.org/10.1109/mlsp.2015.7324337
4. Salamon, J., Bello, J.P.: Deep convolutional neural networks and data augmentation for
environmental sound classification. IEEE Signal Process. Lett. 24(3), 279–283 (2017).
https://doi.org/10.1109/lsp.2017.2657381
5. Sang, J., Park, S., Lee, J.: Convolutional recurrent neural networks for urban sound
classification using raw waveforms. In: 2018 26th European Signal Processing Conference
(EUSIPCO) (2018). https://doi.org/10.23919/eusipco.2018.8553247
6. Dai, W., Dai, C., Qu, S., Li, J., Das, S.: Very deep convolutional neural networks for raw
waveforms. In: 2017 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP) (2017). https://doi.org/10.1109/icassp.2017.7952190
7. Zhang, X., Zou, Y., Shi, W.: Dilated convolution neural network with LeakyReLU for
environmental sound classification. In: 2017 22nd International Conference on Digital
Signal Processing (DSP) (2017). https://doi.org/10.1109/icdsp.2017.8096153
8. Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning.
J. Big Data 6(1), 1–48 (2019). https://doi.org/10.1186/s40537-019-0197-0
9. Md Shahrin, M.H.: Comparison of Time-Frequency Representations for Environmental
Sound Classification using Convolutional Neural Networks (2017)
10. Rabiner, L.R., Schafer, R.W.: Theory and Applications of Digital Speech Processing.
Pearson (2010)
11. Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech
recognition. In: Interspeech 2019 (2019). https://doi.org/10.21437/interspeech.2019-2680
12. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
13. McFee, B., et al.: librosa: audio and music signal analysis in python. In: Proceedings of the
14th Python in Science Conference, pp. 18–25 (2015)
14. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional
neural networks. In: Neural Information Processing Systems 25 (2012). https://doi.org/10.
1145/3065386
15. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image
Recognition. arXiv:1409.1556 (2014)
Quantile Regression Support Vector Machine
(QRSVM) Model for Time Series Data
Analysis
Dharmendra Patel
1 Introduction
Time series data is very vital for many applications such as economics, medicine,
education, social sciences, epidemiology, weather forecasting, physical sciences etc. to
derive meaningful insights at different points in time. Conventional statistical methods have several limitations when dealing with time series data, so specialized methods, known collectively as time series analysis, are predominantly required in such cases. The simplest and most popular method is the linear least squares method. The least squares method gives the trend line that best fits a time series. It exhibits several advantages:
• It is a very simple method to understand and to derive predictions from.
• It is applicable to almost all applications.
• It gives maximum likelihood solutions when the Gauss–Markov conditions (with normally distributed errors) hold.
However, it suffers from several critical limitations:
• It is sensitive towards outliers.
• The data needs to be normally distributed for good results.
• It exhibits a tendency to overfit the data.
2 Related Work
Statistical methods are predominantly used for time series data analysis. The Autoregressive Integrated Moving Average (ARIMA) model is the most prevalent and commonly used for time series data analysis [6]. Nevertheless, these kinds of models rely on the assumption that the time series is linear and follows a normal distribution. C. Hamzacebi in 2008 [1] proposed a variant of the ARIMA model called Seasonal ARIMA (SARIMA). The model produced good results for seasonal time series data; however, it assumes a linear form of the associated time series. The limitations of the linear models can be overcome by non-linear stochastic models [5, 19]. However, the implementation of these kinds of models is very complex.
Neural network based time series models have grown in recent years and attracted increasing attention [8, 9]. The outstanding feature of ANNs is their inherent capability for non-linear modeling, without any presupposition about the statistical distribution followed by the observations. Another notable property of ANN based models is that they are self-adaptive in nature [28]. A variety of ANN models exist in the literature. The Multi-Layer Perceptron (MLP) is the most popular and basic ANN model [2, 4, 13, 22]. MLPs contain multiple layers of computational elements connected in a feed-forward manner [18]. MLPs utilize a variety of learning techniques, the most prominent being back-propagation [16, 20, 29], in which the output values are compared with the correct response to compute the value of some predetermined error function. The error is then fed back through the network; using this information, the algorithm adjusts the weight of each connection so as to decrease the value of the error function by some small amount. A general strategy for non-linear optimization called gradient descent [21, 23] is applied to adjust the weights. The Time Lagged Neural Network (TLNN) is another variant of the feed-forward network [15, 26]. In a TLNN, the input nodes are the time series values at some specific lags. Likewise, there is a constant input term, which may conveniently be taken as 1, and this is linked to every neuron in the hidden and output layers. The presence of this constant input unit avoids the need to separately introduce a bias term. In 2007, Pang et al. [17] introduced a model based on neural networks and successfully applied it to rainfall simulation. Li et al. [14], in 2008, presented a hybrid model based on AR* and the generalized regression neural network (GRNN), which gave respectable results for time series data. Chen and Chang in 2009 [3] proposed an Evolutionary Artificial Neural Network (EANN) model to automatically build the architecture and the connection weights of the neural network. Khashei and Bijari in 2010 [11] introduced a new hybrid ANN model, utilizing an ARIMA model to obtain predictions more precise than those of plain neural network models. Wu and Shahidehpour [27] proposed a fusion model based on an adaptive Wavelet Neural Network (AWNN) and time series models, such as ARMAX and GARCH, to predict day-ahead electricity prices in the market. In [7], researchers proposed a regression neural network model, a fusion of diverse machine learning algorithms, to forecast a wide range of time series. Artificial Neural Network based algorithms are thus dominant for time series data analysis; however, they exhibit various constraints: an appropriate network structure is attained through trial and error, the behaviour of the network can be hard to interpret, they usually require more data to train the model properly, and they are computationally complex and expensive. The Support Vector Machine [12, 24, 25] is a robust machine learning technique for pattern generation and classification.
The proposed model uses SVM because it is intended not just for good classification but also for improved generalization from the training data. The solution obtained by an SVM is always unique, as it results from a linearly constrained quadratic optimization problem. The model uses a fusion of SVM and Quantile Regression [10]. The quantile regression methodology allows relationships between variables to be understood beyond the average of the data, making it valuable for understanding outcomes that are non-normally distributed and that have nonlinear relationships with the predictor variables.
Y = a + bX + ε  (2)
Here too, the conditional scale of the dependent variable Y does not vary with the independent variable X. In order to capture covariate effects on the whole conditional distribution of the dependent variable, the quantile regression concept is required.
Least squares regression provides only one conditional mean, whereas quantile regression provides numerous conditional quantiles. In nonlinear quantile regression, the quantile of the dependent variable Y for a given independent attribute X is assumed to be nonlinearly related to the input vector X_i ∈ R^d and is represented by a nonlinear mapping function φ(·). The nonlinear version of the quantile function is represented as:

Q_X = W_θ φ(X)  (6)
Quantile regression involves an asymmetric absolute-deviation (pinball) loss, so SVM combined with quantile regression plays a vital role. The quantile regression SVM is obtained by minimizing the regularized pinball loss

min_{W_θ} (1/2)‖W_θ‖² + C Σ_{i=1}^{N} ρ_τ(y_i − W_θ φ(X_i))  (7)

where ρ_τ(u) = τu for u ≥ 0 and (τ − 1)u for u < 0.
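The paper does not provide an implementation of the QRSVM; as a rough, assumption-level illustration of the idea (not the author's code), the sketch below fits a kernelized quantile regressor by numerically minimizing the regularized pinball loss with an RBF kernel. The pinball loss is not differentiable at zero, so L-BFGS is used only as a pragmatic approximation.

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(u, tau):
    # rho_tau(u) = tau*u for u >= 0 and (tau - 1)*u for u < 0
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def rbf_kernel(X1, X2, gamma=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_qrsvm(X, y, tau=0.5, C=1.0, gamma=0.5):
    """Kernel quantile regression: f(x) = sum_j alpha_j k(x_j, x) + b."""
    K = rbf_kernel(X, X, gamma)
    n = len(y)

    def objective(params):
        alpha, b = params[:n], params[n]
        f = K @ alpha + b
        reg = 0.5 * alpha @ K @ alpha              # RKHS norm of f
        return reg + C * pinball_loss(y - f, tau).sum()

    res = minimize(objective, np.zeros(n + 1), method="L-BFGS-B")
    alpha, b = res.x[:n], res.x[n]
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ alpha + b

# Toy usage: estimate the 0.9 quantile of a noisy signal
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
predict_q90 = fit_qrsvm(X, y, tau=0.9, C=10.0)
```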
Experiment simulation is carried out using R programming language (Figs. 1 and 2).
The box plot of the rainfall data indicates that several values are outliers. For accurate prediction of time series data, outlier values also play an important role. For the least squares regression model, the residual values and coefficient statistics are depicted in Table 2 and Table 3, respectively; each of them is summarized by a single value based on central tendency.
The Quantile Regression SVM model generates coefficients per quantile value. For the same data, the coefficients of the QRSVM model are depicted in Table 4 (Table 5, Figs. 3 and 4).
From the results we can see that least squares regression reports the Intercept and X values as single central-tendency estimates, whereas the QRSVM model reports them as multiple values, one per percentile, so the insights can be understood and explored along multiple dimensions. Furthermore, if outliers exist in the data, the central-tendency values might be compromised, whereas this situation does not affect the QRSVM model.
5 Conclusions
In this work, it is concluded that the Least Squares Regression model exhibits several limitations: it attempts to describe the conditional distribution by utilizing only its average, and it assumes that the error term is the same across all values of X, i.e., the conditional variable (Y|X) is assumed to have a constant variance σ². In order to capture covariate effects on the whole conditional distribution of the dependent variable, the quantile regression concept is required. Based on the experiments on weather time series data, the paper concludes that the QRSVM model reports the Intercept and X values as multiple quantile-specific values, which makes them easier to understand and interpret. The paper also concludes that the results of the LSR model might be compromised when dealing with outliers, whereas this situation does not arise in the QRSVM model.
References
1. Hamzacebi, C.: Improving artificial neural networks’ performance in seasonal time series
forecasting. Inf. Sci. 178, 4550–4559 (2008)
2. Calcagno, G., Antonino, S.: A multilayer neural network-based approach for the
identification of responsiveness to interferon therapy in multiple sclerosis patients. Inf.
Sci. 180(21), 4153–4163 (2010)
3. Chen, Y., Chang, F.-J.: Evolutionary artificial neural networks for hydrological systems
forecasting. J. Hydrol. 367, 125–137 (2009)
4. Wang, D., Liu, D., Zhao, D., Huang, Y., Zhang, D.: A neural-network-based iterative GDHP
approach for solving a class of nonlinear optimal control problems with control constraints.
Neural Comput. Appl. 22(2), 219–227 (2013)
5. Zhang, G.P.: A neural network ensemble method with jittered training data for time series
forecasting. Inf. Sci. 177, 5329–5346 (2007)
6. Zhang, G.P.: Time series forecasting using a hybrid ARIMA and neural network model.
Neurocomputing 50, 159–175 (2003)
7. Gheyas, I.A., Smith, L.S.: A novel neural network ensemble architecture for time series
forecasting. Neurocomuting 74, 3855–3864 (2011)
8. Kihoro, J.M., Otieno, R.O., Wafula, C.: Seasonal time series forecasting: a comparative
study of ARIMA and ANN models. Afr. J. Sci. Technol. (AJST) Sci. Eng. Ser. 5(2), 41–49
(2004)
9. Kamruzzaman, J., Begg, R., Sarker, R.: Artificial Neural Networks in Finance and
Manufacturing. Idea Group Publishing, USA (2006)
10. Yu, K., Lu, Z., Stander, J.: Quantile regression: applications and current research areas. J. R.
Stat. Soc. Ser. D (The Stat.) 52, 331–350 (2003)
11. Khashei, M., Bijari, M.: An artificial neural network (p,d,q) model for timeseries forecasting.
Expert Syst. Appl. 37(1), 479–489 (2010). https://doi.org/10.1016/j.eswa.2009.05.044
12. Cao, L.J., Tay, F.E.H.: Support vector machine with adaptive parameters in financial time
series forecasting. IEEE Trans. Neural Netw. 14(6), 1506–1518 (2003). https://doi.org/10.
1109/TNN.2003.820556
13. Tawfiq, L.N.M.: Design and training artificial neural networks for solving differential
equations. Ph.D. thesis, University of Baghdad, College of Education Ibn-Al-Haitham
(2004)
14. Li, W., Luo, Y., Zhu, Q., Liu, J., Le, J.: Applications of AR*-GRNN model for the financial
time series forecasting. Neural Comput. Appl. 17, 441–448 (2008)
15. Moseley, N.: Modeling economic time series using a focused time lagged feed forward
neural network. In: Proceedings of Student Research Day, CSIS, Pace University (2003)
16. Nawi, N.M., Ransing, M.R., Ransing, R.S.: An improved conjugate gradient based learning
algorithm for back propagation neural networks. J. Comput. Intell. 4, 46–55 (2007)
17. Pang, B., Guo, S., Xiong, L., Li, C.: A non linear perturbation model based on artificial
neural network. J. Hydrol. 333, 504–516 (2007)
18. Prochazka, A.P.: Feed-forward and recurrent neural networks in signal prediction. In: 4th
International Conference on Computational Cybernetics. IEEE (2007)
19. Parrelli, R.: Introduction to ARCH & GARCH models. Optional TA Handouts, Econ 472
Department of Economics, University of Illinois (2001)
20. Rehman, M.Z., Nawi, N.M., Ghazali, M.I.: Noise-induced hearing loss (NIHL) prediction in
humans using a modified back propagation neural network. In: 2nd International Conference
on Science Engineering and Technology, pp. 185–189 (2011)
21. Ruder, S.: An overview of gradient descent optimization algorithms. https://arxiv.org/pdf/
1609.04747.pdf. Accessed 23 Jan 2018
22. Hoda I., S.A., Nagla, H.A.: On neural network methods for mixed boundary value problems.
Int. J. Nonlinear Sci. 11(3), 312–316 (2011)
23. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM:
visual explanations from deep networks via gradient-based localization. In: Proceedings of
the 2017 IEEE International Conference on Computer Vision (ICCV) (2017). https://doi.org/
10.1109/ICCV.2017.74
24. Farooq, T., Guergachi, A., Krishnan, S.: Chaotic time series prediction using knowledge
based Green’s Kernel and least-squares support vector machines. In: Systems, Man and
Cybernetics, pp. 373–378 (2007)
25. Raicharoen, T., Lursinsap, C., Sanguanbhoki, P.: Application of critical support vector
machine to time series prediction. In: Proceedings of the 2003 International Symposium on
Circuits and Systems, ISCAS 2003, vol. 5, pp. 741–744 (2003)
26. Yolcu, U., Egrioglu, E., Aladag, C.H.: A new linear & nonlinear artificial neural network
model for time series forecasting. Decis. Support Syst. 54(3), 1340–1347 (2013)
27. Lei, W., Shahidehpour, M.: A hybrid model for day-ahead price forecasting. IEEE Trans.
Power Syst. 25(3), 1519–1530 (2010). https://doi.org/10.1109/TPWRS.2009.2039948
28. Wang, X., Meng, M.: A hybrid neural network and ARIMA model for energy consumption
forecasting. J. Comput. 7(5), 1184–1190 (2012)
29. Zweiri, Y., Seneviratne, L., Althoefer, K.: Stability analysis of a three-term backpropagation
algorithm. Neural Netw. 18(10), 1341–1347 (2005). https://doi.org/10.1016/j.neunet.2005.
04.007
Selection of Characteristics by Hybrid
Method: RFE, Ridge, Lasso,
and Bayesian for the Power Forecast
for a Photovoltaic System
1 Introduction
Due to the increasing use of photovoltaic systems in the generation of alternative
energy, it is difficult to obtain mathematical or physical models that result in
the efficient use of such systems. Likewise, the tools and algorithms provided
by Machine Learning for handling and processing data are useful for modeling photovoltaic generation systems. Within the field of data-based forecasting we have multiparametric linear regression, which produces forecasts taking into account a set of independent variables that affect the target (dependent) variable. These variables affect the prediction to some degree, but some of them do not carry much weight in the final forecast, so it is convenient to eliminate them so that the processing cost and time are reduced. Among the techniques to exclude irrelevant variables or predictors we have: Subset Selection, Shrinkage (regularization), and Dimension Reduction. Within the first group, which identifies and selects, among all the available predictors, those most related to the target variable, we have best subset selection and stepwise selection; the latter comprises forward, backward, and hybrid variants. Within the backward family we have recursive feature elimination (RFE), which is the algorithm used in this paper to model the multiparameter photovoltaic system. RFE has been used in various studies, such as the selection of attributes in classifiers based on artificial neural networks for the detection of cyberbullying [1]; in conjunction with other techniques, such as twin support vector regression for feature selection [2], and with SVM and Bayes for categorical classification [3]. For the modeling of emotions and affective states from EEG, RFE has been combined with Random Forest (RF), Support Vector Regression (SVR), and tree-based bagging [4]. It has been used for identifying features for football game earnings forecasts, combined with Gradient Boosting and Random Forest [5], and in the prediction of boiler system failures, in combination with support vector machine recursive feature elimination (SVM-RFE) [6]. In the phenotyping of high-yield plants [7], SVM-RFE, LASSO logistic regression and random forests are used to eliminate spectral characteristics. To perform short-term electricity price and load forecasting using KNN, [8] uses RFE to eliminate feature redundancy. For heart transplant tests on pigs, [9] uses a combination of RFE-SVM to select the parameters for the estimation of V0. In the present work, we combine RFE with the Shrinkage regularization algorithms Ridge, Lasso, and Bayesian Ridge, establishing a hybrid algorithm for modeling the multiparameter photovoltaic system.
2 Methodology
In regression models, a compromise must be made between the bias and the variance implied by the data to be predicted and the model constructed. For this, the theory provides the following variable (feature) selection methods: Subset Selection, Shrinkage, and Dimension Reduction. The first identifies and selects, among all the available predictors, those most related to the response variable. Shrinkage (regularization) fits the model including all predictors, but uses a method that forces the regression coefficient estimates towards zero. Dimension Reduction creates a small number of new variables from combinations of the original variables. Each of them comprises specific techniques: for Subset Selection, best subset selection and stepwise selection (forward, backward and hybrid); for Shrinkage, Ridge, Lasso and ElasticNet; for Dimension Reduction, principal components, partial least squares and t-SNE.
Subset selection is the task of finding a small subset of the most informative elements in a ground set. In addition to helping reduce computational time and memory, due to working on a much smaller representative set, it has found numerous applications, including image and video summarization, speech and document summarization, clustering, feature and model selection, sensor placement, social media marketing and product recommendation [10]. The recursive feature elimination (RFE) method used here works by recursively removing attributes and building a model on the remaining attributes. It uses accuracy metrics to rank the features by importance. The RFE method takes as input the model to be used and the number of features required. It then gives the ranking of all the variables, 1 being the most important. It also provides the support: True if a feature is relevant and False if it is irrelevant.
The data was pre-processed by eliminating the null values. Next, the absence of multicollinearity between the predictors was checked using a heat map. Three hybrid variable selection methods were evaluated: RFE-Lasso, RFE-Ridge, and RFE-Bayesian Ridge, comparing them with RFE-OLS, which was used as the baseline for our work. Finally, the results were validated under the conditions of linearity, normality, no autocorrelation of the error terms, and homoscedasticity.
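As an assumption-level sketch (the file name, target column and number of retained features are placeholders informed by the results reported in Sect. 5, where the tuned alphas are roughly 1.538 for Ridge and 0.01 for Lasso), the hybrid RFE-Ridge, RFE-Lasso and RFE-Bayesian Ridge pipelines can be assembled with scikit-learn as follows, with RFE-OLS kept as the baseline.

```python
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Ridge, Lasso, BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

df = pd.read_csv("pv_data.csv").dropna()          # placeholder file name
X = df.drop(columns=["Potencia activa"])          # assumed target column (active power)
y = df["Potencia activa"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=8849)      # 80/20 split, seed from the paper

estimators = {
    "RFE-OLS": LinearRegression(),                # baseline
    "RFE-Ridge": Ridge(alpha=1.538),
    "RFE-Lasso": Lasso(alpha=0.01),
    "RFE-BayesianRidge": BayesianRidge(),
}

for name, est in estimators.items():
    selector = RFE(estimator=est, n_features_to_select=11)
    selector.fit(X_train, y_train)
    kept = X.columns[selector.support_].tolist()  # support_: True = retained feature
    y_pred = selector.predict(X_test)
    print(name, kept)
    print("  R2:", r2_score(y_test, y_pred),
          "MAE:", mean_absolute_error(y_test, y_pred),
          "RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
```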
3 Methods
3.1 Recursive Feature Elimination
For RFE we will use the following algorithm:
– 1
Refine/Train the model in the training group using all predictors
– 2
Calculate model performance
– 3
Calculate the importance of variables or classifications
– For (for) each subset size Si , i = 1. . . S do (do
4
• 4.1 Keep the most important variables of Si
• 4.2 Optional: Pre-process the data
• 4.3 Refine/Train the model in the training group using Si predictors
• 4.4 Calculate model performance
• 4.5 Optional: Recalculate rankings for each predictor
• 4.6 End (end)
– 5 Calculate the performance profile on Si
– 6 Determine the appropriate number of predictors
– 7 Use the model corresponding to the optimal Si
The algorithm fits the model to all predictors, and each predictor is ranked by its importance to the model. Let S be a sequence of ordered numbers that are candidate values for the number of predictors to retain (S_1, S_2, ...). At each iteration of the feature selection, the S_i highest-ranked predictors are retained, the model is refit, and its performance is assessed.
3.2 Ridge
For Ridge, the sum of squared errors for linear regression is defined by Eq. 1:

E = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \quad (1)

The data set used to build machine learning models is assumed to follow the Gaussian distribution defined by its mean μ and variance σ², represented by N(μ, σ²), i.e., X ∼ N(μ, σ²), where X is the input matrix. For any point x_i, the probability of x_i is given by Eq. 2:

P(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right) \quad (2)

Since the occurrence of each x_i is independent of the occurrence of the others, their joint probability is given by Eq. 3:

p(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right) \quad (3)

Furthermore, linear regression is the solution that gives the maximum likelihood for the line of best fit, Eq. 4:

P(X \mid \mu) = p(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right) \quad (4)

Linear regression maximizes this function in order to find the line of best fit. For this, we take the natural logarithm of the likelihood L (Eq. 5), then differentiate and set it equal to zero:

\ln \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right) = \sum_{i=1}^{N} \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2}\right)\right) \quad (6)
= \sum_{i=1}^{N} \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) - \sum_{i=1}^{N} \frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2} \quad (7)

\frac{\partial \ln P(X \mid \mu)}{\partial \mu} = \frac{\partial}{\partial \mu}\sum_{i=1}^{N} \ln\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) - \frac{\partial}{\partial \mu}\sum_{i=1}^{N} \frac{1}{2}\frac{(x_i-\mu)^2}{\sigma^2} \quad (8)

= 0 + \sum_{i=1}^{N} \frac{x_i-\mu}{\sigma^2} = \sum_{i=1}^{N} \frac{x_i-\mu}{\sigma^2} \quad (9)

\frac{\partial \ln P(X \mid \mu)}{\partial \mu} = \sum_{i=1}^{N} \frac{x_i-\mu}{\sigma^2} = 0 \implies \mu = \frac{\sum_{i=1}^{N} x_i}{N} \quad (10)

The point to note here is that maximizing the likelihood L is equivalent to minimizing the error function E. Furthermore, y is Gaussian distributed with mean w^T X and variance σ², as shown in Eq. 11:

y \sim N(\omega^T X, \sigma^2), \quad \text{i.e.,} \quad y = \omega^T X + \varepsilon \quad (11)

where ε ∼ N(0, σ²) is Gaussian noise with zero mean and variance σ². This is equivalent to saying that in linear regression the errors are Gaussian and the trend is linear. For new points or outliers, the least squares prediction would be less accurate, so we use the L2 regularization method, i.e., Ridge regression. To do this, we modify the cost function and penalize large weights as in Eq. 12:

J_{RIDGE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \lVert w \rVert^2 \quad (12)

A priori:

P(w) = \sqrt{\frac{\lambda}{2\pi}} \exp\left(-\frac{\lambda}{2} w^T w\right) \quad (14)
3.3 Ridge-Bayesian
So, applying Bayes' rule:

\exp(-J) = \prod_{n=1}^{N} \exp\left(-(y_n - w^T x_n)^2\right) \exp\left(-\lambda w^T w\right) \quad (15)
Taking the negative logarithm and expanding:

J = Y^T Y - 2\, Y^T X w + w^T X^T X w + \lambda\, w^T w \quad (16)

To minimize J, we take \partial J / \partial w and set it to zero. Therefore, -2 X^T Y + 2 X^T X w + 2\lambda w = 0, so (X^T X + \lambda I)\, w = X^T Y, i.e., w = (X^T X + \lambda I)^{-1} X^T Y.
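The closed-form Ridge solution derived above, w = (X^T X + λI)^{-1} X^T Y, translates directly into a few lines of NumPy; the sketch below is only illustrative (for simplicity it also penalizes the added intercept column, which the derivation above leaves implicit).

```python
import numpy as np

def ridge_closed_form(X, y, lam=1.0):
    """w = (X^T X + lam*I)^(-1) X^T y, the Ridge solution derived above."""
    X = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    d = X.shape[1]
    # Solve the linear system instead of explicitly inverting the matrix.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy usage with synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(ridge_closed_form(X, y, lam=1.538))
```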
3.4 Lasso
For Lasso, the prior on the weights is a Laplace distribution:

P(w) = \frac{\lambda}{2} \exp(-\lambda |w|) \quad (19)

so that J = (Y - Xw)^T (Y - Xw) + \lambda |w| and

\frac{\partial J}{\partial w} = -2 X^T Y + 2 X^T X w + \lambda\, \mathrm{sign}(w) = 0,

where sign(w) = 1 if w > 0, −1 if w < 0, and 0 if w = 0.
4 Data Set
The data was collected in the department of Puno, whose coordinates are 15° 29′ 27″ S and 70° 07′ 37″ W. The measurement period ran between April and August 2019. The variables analyzed were: AC Voltage, AC Current, Active Power, Apparent Power, Reactive Power, Frequency, Power Factor, Total Energy, Daily Energy, DC Voltage, DC Current, and DC Power, all obtained through the StecaGrid 3010 inverter. The temperatures of the environment and of the photovoltaic panel were obtained with PT1000 sensors, which are suitable temperature-sensing elements given their sensitivity, precision and reliability. Irradiance was obtained through a calibrated Atersa
Selection of Characteristics by Hybrid Method 81
brand cell, whose output signal depends exclusively on solar irradiance and not
on temperature. The amount of data is reduced from 331157 to 123120 because
many of the values obtained are null, for example, the values obtained at night
time. Characteristics such as mean, standard deviation, minimum value, maxi-
mum value and percentages of the pre-processed data are presented in Table 1
and Table 2. The statistics of the data obtained are shown as median, standard
deviation, values: maximums, minimums, and interquartile ranges.
5 Results
The independent variables (predictors) should not be correlated with each other, as correlation would cause problems in the interpretation of the coefficients as well as of the error contributed by each one. To assess this, a correlation heat map was used. Correlation is the basis for eliminating or reducing some variables; this can be done by a variable selection algorithm or by the researcher's criteria. Advanced methods use an algorithm, as will be done later; however, Fig. 1 displays the correlation matrix to validate the subsequent results.
5.2 Prediction
First, the RFE method was applied for the selection of variables; to the results obtained we then applied the following shrinkage regularization methods: Lasso, Ridge and Bayesian Ridge. The data set is divided into 98496 training records (80%) and 24624 test records (20%); for better performance, seeds are also used, the best seed being 8849. Applying the RFE algorithm, the following result is obtained:
[‘Tension AC’, ‘Corriente AC’, ‘Potencia aparente’, ‘Potencia reactiva’, ‘Fre-
cuencia’, ‘Factor de potencia’, ‘Energia total’, ‘Energia diaria’, ‘Tension DC’,
‘Corriente DC’, ‘Potencia DC’, ‘Irradiancia’, ‘Temp modulo’, ‘Temp ambiente’]
[True, True, True, True, True, True, False, False, True, True, True, False,
True, True]
[‘Tension AC’, ‘Corriente AC’, ‘Potencia aparente’, ‘Potencia reactiva’, ‘Fre-
cuencia’, ‘Factor de potencia’, ‘Tension DC’, ‘Corriente DC’, ‘Potencia DC’,
‘Temp modulo’, ‘Temp ambiente’]
Of the 14 variables evaluated, the optimal number of characteristic variables for RFE was 11, with a score of 0.999768. It is important to mention that RFE discards "Energia total", "Energia diaria" and "Irradiancia". The hyperparameters are then determined: for Ridge an alpha value of 1,538 and for Lasso an alpha value of 0.01. For the models found, we determined R² and adjusted R², the mean absolute error (MAE), the root mean square error (RMSE) and the score.
Table 3 and Table 4 show the values obtained for the proposed groups. The RFE method with OLS is not part of the research proposal; it is used as a benchmark against which the following results are compared. The combinations RFE with Lasso, RFE with Ridge and RFE with Bayesian Ridge form the proposal of this research.
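As a sketch of how these metrics can be computed (the authors' exact code is not given in the paper), with y_test, y_pred and the number of predictors p as placeholders:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_test, y_pred, p):
    n = len(y_test)
    r2 = r2_score(y_test, y_pred)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)      # adjusted for p predictors
    mae = mean_absolute_error(y_test, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # root mean square error
    return {"R2": r2, "adjusted R2": adj_r2, "MAE": mae, "RMSE": rmse}

# Example with dummy arrays and 11 retained predictors
print(regression_report(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2]), p=11))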
6.1 Linearity
There must be a linear relationship between the actual data and the predictions so that the model does not produce inaccurate predictions. This is checked using a scatter diagram in which the points must lie on or around the diagonal line of the diagram; Fig. 2 shows the linear relationship.
Fig. 2. RFE linearity: (a) corresponds to the OLS model, (b) to the Ridge model, (c) to the Lasso model, and (d) to the Bayesian Ridge model
The error terms should be distributed normally. The histogram and the proba-
bility graph are shown in Fig. 3.
Autocorrelation indicates that some information that the model should capture is missing. It would be represented by a systematic bias below or above the prediction. To check for it we use the Durbin-Watson test: values from 0 to 2 indicate positive autocorrelation and values from 2 to 4 indicate negative autocorrelation. For RFE-OLS there is no autocorrelation; the Durbin-Watson statistic is 2.0037021333412754, indicating little to no autocorrelation. For RFE-Bayesian Ridge there is no autocorrelation; the Durbin-Watson statistic is 2.0037008807358965. For RFE-Lasso there is no autocorrelation; the Durbin-Watson statistic is 2.0037472224605053. For RFE-Ridge there is no autocorrelation; the Durbin-Watson statistic is 2.0037017639830537.
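As an illustration (not the authors' code), the Durbin-Watson statistic can be computed from the model residuals with statsmodels; the residual values below are placeholders:

import numpy as np
from statsmodels.stats.stattools import durbin_watson

residuals = np.array([0.02, -0.01, 0.03, -0.02, 0.00, 0.01])  # illustrative residuals
dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.4f}")  # values near 2 mean little to no autocorrelation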
6.4 Homoscedasticity
It must hold that the error made by the model always has the same variance. Heteroscedasticity appears when the model gives too much weight to a subset of the data, particularly where the variance of the error was greatest. To detect it, the residuals are plotted to check whether their variance is uniform (Fig. 4).
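A minimal sketch of the residual-versus-prediction plot used for this check is given below; y_test and y_pred are placeholders for the actual model outputs:

import numpy as np
import matplotlib.pyplot as plt

y_test = np.array([10.0, 12.0, 9.5, 11.0, 13.0])   # illustrative values only
y_pred = np.array([10.2, 11.8, 9.7, 11.3, 12.6])
residuals = y_test - y_pred

plt.scatter(y_pred, residuals)
plt.axhline(0.0, linestyle="--")
plt.xlabel("Predicted active power")
plt.ylabel("Residual")
plt.title("Residuals vs. predictions (uniform spread suggests homoscedasticity)")
plt.show()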
8 Conclusions
In this article we present three hybrid methods for the selection of variables in the multiparameter regression of photovoltaic systems to predict the active power levels of the photovoltaic system from 14 independent variables. These methods are RFE-Lasso, RFE-Ridge and RFE-Bayesian Ridge.
Table 3 and Table 4 show the method comparison. RFE-OLS, which is not part of our proposal, was compared with OLS to provide a benchmark for the following comparisons, which are part of the proposal. RFE-Lasso has an absolute error approximately 0.035% greater than Lasso, which is taken as a disadvantage of the proposal; it has a mean squared error approximately 0.057% lower than Lasso, a significant result considered an advantage; and it has a coefficient of determination approximately 0.0000309% higher than Lasso, which is greater but almost the same, so it is not considered very advantageous.
Parameter Optimization of Reaching Law
Based Sliding Mode Control by Computational
Intelligence Techniques
1 Introduction
The switching function and the control law are the two vital parts of SMC, which decide the magnitude of chattering and the control effort. To address the chattering issue, two approaches have been suggested. In the first approach, the discontinuous relay-type actuator is substituted by a high-gain device with saturation [6], while the second approach is the design of the control signal based on a reaching law [7]. In both cases, the selection of gain parameters is critical because the performance of the SMC relies to a great extent on its gain parameters. Improper tuning of the SMC gain parameters could lead to poor system response, poor accuracy, higher control energy and even system instability. Traditionally, SMC gain parameters are selected by a trial-and-error method, which is a very lengthy and tedious procedure. Thus, to avoid such a scenario, two different heuristic algorithms, GA and PSO, are used for optimum gain selection [8, 9].
GA is a heuristic algorithm based on the biological mechanism of survival of the fittest; its basic principle was introduced by Holland [10]. Applications of GA include constrained and unconstrained optimization, parameter and system identification, control, robotics, pattern recognition, neural network training, fault detection, engineering design, etc. [11]. Krishnakumar et al. introduced the GA technique for aerospace-related control system optimization problems [12]. Samanta et al. assessed the performance of ANN and SVM classifiers with parameter optimization by GA for bearing fault detection [13]. Mehra et al. designed a GA-based optimal FOPID controller to control the speed of a DC motor [14]. Later on, Zhou et al. proposed a GA-based optimal sliding mode control for an active suspension system [15].
Particle swarm optimization (PSO) is a modern computational intelligence technique developed by Russell Eberhart and James Kennedy based on research into the group movement characteristics of birds and fish [16]. PSO gained wide popularity due to its easy implementation and the few parameters required for tuning: the number of birds, the constriction factors c1 and c2, and the inertial weight w [17]. PSO is applicable to a wide range of optimization problems, including multi-objective optimization, pattern recognition, control optimization, power electronics, biological system modeling, signal processing, robotics, automatic target detection and many more [18]. The optimization of PID controller gains using the PSO algorithm for an automatic voltage regulator system was investigated in [19]. Zhang et al. blended PSO with back-propagation algorithms to train an ANN [20]. P. Singh et al. optimized a Type-2 fuzzy model and a hybrid fuzzy model by PSO for time series and neutrosophic time series forecasting, applied to predictions of stock index prices and university enrollments [21-24]. Pugh and Martinoli proposed a multi-search algorithm inspired by PSO to develop efficacious techniques that permit a team of robots to work together to locate their targets [25].
Mass-spring-damper (MSD) systems are widely used as potent conceptual schemes for modeling purposes in dynamical systems [26]. The MSD concept has immense practical applications, such as vibration [27], control systems [28], vehicle suspension systems [15, 29] and position control systems [30]. The position control of an MSD system is a simple yet challenging problem in the control area in the presence of matched disturbances.
The primary aim of this paper is to compute and compare the optimal sliding gains using GA- and PSO-based approaches for reaching law based SMC in the presence of matched disturbances for an MSD system. This paper is organized into five sections. Section 1 contains an introduction to SMC, GA and PSO. In Sect. 2, the preliminaries of SMC using the reaching law approach and the mathematical background of GA and PSO are discussed. The design of the optimal parameters using GA and PSO for SMC is discussed in Sect. 3. In Sect. 4, the efficacy of the computed sliding gains is compared through simulation results for the MSD system. In Sect. 5, concluding remarks are given, followed by possible future directions.
2 Preliminaries
2.1 Sliding Mode Control
SMC is a nonlinear control scheme having the noteworthy characteristics of robustness, insensitivity to parameter variation, accuracy and easy implementation. The design of an SMC structure consists of two phases: in the first phase the sliding surface is designed, and in the second phase the control input is designed.
The sliding surface r is designed as

r(t) = \left(\frac{d}{dt} + C\right)^{n-1} e(t)    (1)

where n is the relative degree between the input and the output, C is the sliding gain, and e(t) is the error between the output signal y(t) and the desired signal y_des(t).
For a second-order dynamical system, the sliding surface r is given as

r(t) = \left(\frac{d}{dt} + C\right)^{1} e(t)    (2)
Once the sliding surface is decided as in Eq. (3), the next step is to compute the control law by substituting r(t) = 0. The control law is selected based on three reaching laws, which are discussed below.
1) Constant rate reaching law:

\dot{r}(t) = -q\,\mathrm{sgn}(r(t))    (4)

2) Constant plus proportional rate reaching law:

\dot{r}(t) = -q\,\mathrm{sgn}(r(t)) - K_c r(t)    (5)

3) Power rate reaching law:

\dot{r}(t) = -K_c |r(t)|^{a}\,\mathrm{sgn}(r(t))    (6)

where q is the switching gain, K_c is the controller gain, and a is a constant having a value between 0 and 1. Once the reaching law is selected, the control algorithm is derived by taking the time derivative of the sliding surface such that the system state variables converge to the designed sliding surface r(t) = 0 in finite time.
where x_i^k, v_i^k, p_i^k, p_g^k, c_1 and c_2, and r_1 and r_2 are the member position, the velocity, the best individual position, the best group position, the cognitive and social parameters, and random numbers between 0 and 1, respectively. After all members have moved, the next iteration takes place, and finally the members as a whole are likely to move closer to the optimal value of the objective function.
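The member movement described above follows the standard PSO update; a minimal Python sketch, illustrative only, with an assumed sphere objective and typical parameter values, is:

import numpy as np

def pso(objective, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))   # member positions
    v = np.zeros_like(x)                              # member velocities
    p_best = x.copy()                                 # best individual positions
    p_best_val = np.array([objective(xi) for xi in x])
    g_best = p_best[np.argmin(p_best_val)].copy()     # best group position

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))           # random numbers in [0, 1]
        r2 = rng.random((n_particles, dim))
        # v = w*v + c1*r1*(p_best - x) + c2*r2*(g_best - x); x = x + v
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = x + v
        vals = np.array([objective(xi) for xi in x])
        improved = vals < p_best_val
        p_best[improved], p_best_val[improved] = x[improved], vals[improved]
        g_best = p_best[np.argmin(p_best_val)].copy()
    return g_best, p_best_val.min()

best, val = pso(lambda z: np.sum(z ** 2))             # sphere test function
print("best position:", best, "objective:", val)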
\dot{x}_2(t) = -\frac{5}{2}\dot{x}_1(t) - \frac{2}{2}x_1(t) + \frac{u(t)}{2} + \frac{d(t)}{2}    (9)

The objective here is to follow a reference trajectory. In our case we take a step signal as the reference signal and the output y(t) = x_1(t), such that \dot{y}(t) = \dot{x}_1(t) = x_2(t) and \ddot{y}(t) = \ddot{x}_1(t) = \dot{x}_2(t).
Referring to Eq. (3), the time derivative of the sliding surface r(t) is

\dot{r}(t) = C\dot{e}(t) + \ddot{e}(t) = -C x_2(t) - \dot{x}_2(t)    (10)

Using the constant-rate reaching law (4), the control algorithm obtained from Eqs. (9) and (10) is defined as

q\,\mathrm{sign}(r(t)) = C x_2(t) - \frac{5}{2}x_2(t) - \frac{2}{2}x_1(t) + \frac{u(t)}{2} + \frac{d(t)}{2}

\frac{u(t)}{2} = -C x_2(t) + \frac{5}{2}x_2(t) + \frac{2}{2}x_1(t) - \frac{d(t)}{2} + q\,\mathrm{sign}(r(t))

u(t) = (5 - 2C)x_2(t) + 2x_1(t) - d(t) + 2q\,\mathrm{sign}(r(t))    (11)
Similarly, the control algorithm derived using the constant plus proportional rate reaching law (5) is given as

q\,\mathrm{sgn}(r(t)) + K_c r(t) = C x_2(t) - \frac{5}{2}x_2(t) - \frac{2}{2}x_1(t) + \frac{u(t)}{2} + \frac{d(t)}{2}

\frac{u(t)}{2} = -C x_2(t) + \frac{5}{2}x_2(t) + \frac{2}{2}x_1(t) - \frac{d(t)}{2} + q\,\mathrm{sign}(r(t)) + K_c r(t)

u(t) = (5 - 2C)x_2(t) + 2x_1(t) - d(t) + 2q\,\mathrm{sign}(r(t)) + 2K_c r(t)    (12)

Further, the control algorithm derived using the power-rate reaching law (6), together with Eqs. (9) and (10), is given as

K_c |r(t)|^{a}\,\mathrm{sgn}(r(t)) = C x_2(t) - \frac{5}{2}x_2(t) - \frac{2}{2}x_1(t) + \frac{u(t)}{2} + \frac{d(t)}{2}

\frac{u(t)}{2} = -C x_2(t) + \frac{5}{2}x_2(t) + \frac{2}{2}x_1(t) - \frac{d(t)}{2} + K_c |r(t)|^{a}\,\mathrm{sgn}(r(t))

u(t) = (5 - 2C)x_2(t) + 2x_1(t) - d(t) + 2K_c |r(t)|^{a}\,\mathrm{sgn}(r(t))    (13)
The control signals derived in Eqs. (11), (12) and (13) using the reaching laws show a satisfactory response under proper selection of the constant parameters C, q, the controller gain K_c and a. Table 1 and Table 3 show the values with which the sliding gains of the SMC are computed using the GA and PSO approaches. The integral of absolute error (IAE) is used as the objective function for the computation of the parameters. Table 2 and Table 4 give the optimal gain parameters computed using GA and PSO, respectively. The ranges for the above unknown parameters are selected as under.
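The GA/PSO objective can be illustrated with a minimal Python sketch (the paper's implementation is in MATLAB-SIMULINK): it simulates the MSD dynamics of Eq. (9) under the constant-rate control law of Eq. (11) and returns the IAE for a candidate gain pair (C, q). The disturbance, step size and horizon below are assumptions, not the paper's settings.

import numpy as np

def iae_constant_rate(C, q, y_des=1.0, dt=1e-3, t_end=10.0):
    x1, x2 = 0.0, 0.0                                  # position and velocity of the MSD system
    iae = 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        d = 0.5 * np.sin(2 * np.pi * t)                # assumed matched disturbance
        e = y_des - x1                                 # tracking error
        r = C * e - x2                                 # sliding surface, Eq. (2), with e = y_des - x1
        u = (5 - 2 * C) * x2 + 2 * x1 - d + 2 * q * np.sign(r)   # control law, Eq. (11)
        x2_dot = -2.5 * x2 - x1 + u / 2 + d / 2        # plant dynamics, Eq. (9)
        x1 += x2 * dt                                  # explicit Euler integration
        x2 += x2_dot * dt
        iae += abs(e) * dt                             # integral of absolute error
    return iae

# GA or PSO would minimize this function over (C, q); a single evaluation:
print(iae_constant_rate(C=5.0, q=1.0))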
In this section, a model of the mass-spring-damper system with the reaching law based SMC control structure was built in MATLAB-SIMULINK to affirm the control performance of the SMC parameters optimized by the GA and PSO algorithms.
Figure 1 shows the comparison of the step response of the MSD system with the constant-rate reaching law, whose sliding gain parameters are calculated using GA and PSO. It can be noticed in the step response that the output signal tracks the reference input signal accurately in both cases. Moreover, the steady-state error with PSO is smaller than with GA. Figure 2 shows the behavior of the control signal; it can be noticed that the amplitude of the chattering is high due to the presence of the sign function in the constant-rate reaching law. Figure 3 represents the sliding surface response. It can be seen that the state variables slide towards the designed sliding surface r(t) = 0 in finite time in the presence of matched disturbances in both cases.
Fig. 1. Comparison of step response of a MSD system for constant rate SMC structure with gain
optimization by GA and PSO.
Fig. 2. Control signal response of a MSD system for constant rate SMC structure with gain
optimization by GA and PSO.
Fig. 3. Sliding Surface response of a MSD system for constant rate SMC structure with gain
optimization by (a) GA and (b) PSO.
Figures 4, 5 and 6 show the step response, the control signal and the sliding surface response for the constant plus proportional rate reaching law. The step responses of the system for both cases show that the output tracks the reference trajectory with negligible steady-state error. Moreover, it can also be noticed that the settling time of the output signal computed using PSO is shorter than that of GA. In the case of the control response, the amplitude of the chattering is lower compared to the constant-rate reaching law in both cases. This indicates that, due to the less oscillatory behavior of the control signal, there would be less wear and tear in the actuator, which increases the efficiency and performance of the system. Figure 6 presents the sliding surface computed for both sliding gain parameter sets and shows that, within a limited interval of time, the state variables slide towards the designed sliding surface r(t) = 0.
Fig. 4. Comparison of step response of a MSD system for proportional rate SMC structure with
gain optimization by GA and PSO.
Fig. 5. Control signal response of a MSD system for Proportional rate SMC structure with gain
optimization by (a) GA and (b) PSO.
Fig. 6. Sliding surface response of a MSD system for Proportional rate SMC structure with gain
optimization by (a) GA and (b) PSO.
Figures 7, 8 and 9 show the step response, the control response and the sliding surface response for the power-rate reaching law. In both cases the overall performance of the system in terms of step response, control signal and sliding surface is satisfactory compared to the previous reaching laws. Moreover, the effect of chattering is also smaller in the sliding surface, which makes the control signal less oscillatory in nature. Thus, from the above results, it can be concluded that the sliding gains calculated using GA and PSO for the power-rate reaching law are more efficient and robust than for the other reaching laws, as they provide faster convergence and less chattering in the control signal, which enhances the overall performance of the system in the presence of matched disturbances. The comparative analysis in terms of rise time, peak overshoot (%), settling time and steady-state error is given in Table 5.
Fig. 7. Comparison of Step response of a MSD system for power- rate SMC structure with gain
optimization by GA and PSO.
Fig. 8. Control signal response of a MSD system for power rate SMC structure with gain
optimization by (a) GA and (b) PSO.
Fig. 9. Sliding surface response of a MSD system for power rate SMC structure with gain
optimization by (a) GA and (b) PSO.
5 Conclusion
This paper describes two computational intelligence algorithms, namely (i) GA and (ii) PSO, for the computation of the sliding gains in an SMC structure. The SMC structure is based on the reaching law approach, using the constant rate, constant plus proportional rate and power-rate reaching laws, respectively. The efficacy of the sliding gains computed using the computational intelligence algorithms was tested on the MSD system for the specified reaching laws. The simulation results showed that the sliding gains computed for all three reaching laws give optimized performance for the MSD system. The results also showed that the sliding gains designed using GA and PSO give optimal performance for the power-rate reaching law compared with the other two reaching laws. Thus, from the above analysis, it can be concluded that the sliding gains computed using computational intelligence algorithms provide better control performance, transient response, steady-state response and minimal chattering compared to the trial-and-error method available in the literature [3]. In this paper two widely used C.I. algorithms were investigated; currently, new C.I. techniques and modified G.A. and P.S.O. techniques are being developed, so in future modified G.A. and P.S.O. as well as new C.I. techniques can be investigated for optimal gain tuning of reaching law based SMC.
References
1. Utkin, V.I.: Variable structure systems with sliding modes. IEEE Trans. Autom. Control 22
(2), 212–222 (1977)
2. Young, K.D., Utkin, V.I., Ozguner, U.: A control engineer’s guide to sliding mode control.
IEEE Trans. Control Syst. Technol. 7(3), 328–342 (1999)
3. Shah, D.H., Mehta, A.: Discrete-Time Sliding Mode Control for Networked Control System.
Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-7536-0
4. Topalov, A.V., Shakev, N.G., Kaynak, O., Kayacan, E.: Neuro-adaptive approach for
controlling a quad-rotor helicopter using sliding mode learning algorithm. In: Adaptation
and Learning in Control and Signal Processing - ALCOSP, IFAC, Turkey, pp. 94–99 (2010)
5. Irfan, S., Mehmood, A., Razzaq, M.T., Iqbal, J.: Advanced sliding mode control techniques
for inverted pendulum: Modeling and simulation. J. Eng. Sci. Technol. 21(4), 753–759
(2018)
6. Gao, W., Hung, J.C.: Variable structure control of nonlinear systems: a new approach. IEEE
Trans. Industr. Electron. 40(1), 45–55 (1993)
7. Hung, J.Y., Gao, W., Hung, J.C.: Variable structure control: a survey. IEEE Trans. Ind.
Electron. 40(1), 2–2 (1993)
8. Kaynak, O., Erbatur, K., Ertugnrl, M.: The fusion of computationally intelligent
methodologies and sliding-mode control-a survey. IEEE Trans. Ind. Electron. 48(1), 4–17
(2001)
9. Yu, X., Kaynak, O.: Sliding mode control made smarter: a computational intelligence
perspective. IEEE Syst. Man Cybern. Mag. 3(2), 31–34 (2017)
10. Holland, J.H.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1975)
11. Man, K.F., Tang, K.S., Kwong, S.: Genetic algorithms: concepts and applications. IEEE
Trans. Ind. Electron. 43(5), 519–534 (1996)
12. Krishnakumar, K., Goldberg, D.E.: Control system optimization using genetic algorithms.
J. Guidan. Control Dyn. 15(3), 735–740 (1992)
13. Samanta, B., Al-Balushi, K.R., Al-Araimi, S.A.: Artificial neural networks and support
vector machines with genetic algorithm for bearing fault detection. Eng. Appl. Artif. Intell.
16(7–8), 657–665 (2003)
14. Mehra, V., Srivastava, S., Varshney, P.: Fractional-order PID controller design for speed
control of DC motor. In: IEEE 3rd International Conference on Emerging Trends in
Engineering and Technology-2010, Nagpur, pp. 422–425 (2010).
15. Zhou, C., Liu, X., Chen, W., Xu, F., Cao, B.: Optimal sliding mode control for an active
suspension system based on a genetic algorithm. Algorithms 11(12), 205–220 (2018)
16. Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. In: IEEE
Proceedings of the Sixth International Symposium on Micro Machine and Human Science,
pp. 39–43 (1995)
17. Bai, Q.: Analysis of particle swarm optimization algorithm. Comput. Inf. Sci. 3(1), 180–184
(2010)
18. Shi, Y.: Particle swarm optimization: developments, applications and resources. In:
Proceedings of the 2001 Congress on Evolutionary Computation, pp. 81–86. IEEE Cat.
No. 01TH8546 (2001)
19. Gaing, Z.L.: A particle swarm optimization approach for optimum design of PID controller
in AVR system. IEEE Trans. Energy Convers. 19(2), 384–391 (2004)
20. Zhang, J.R., Zhang, J., Lok, T.M., Lyu, M.R.: A hybrid particle swarm optimization–back-
propagation algorithm for feed forward neural network training. Appl. Math. Comput. 185
(2), 1026–1037 (2007)
21. Singh, P., Borah, B.: Forecasting stock index price based on M-factors fuzzy time series and
particle swarm optimization. Int. J. Approx. Reason. 55, 812–833 (2014)
22. Singh, P., Dhiman, G.: A hybrid fuzzy time series forecasting model based on granular
computing and bio-inspired optimization approaches. J. Comput. Sci. 27, 370–385 (2018)
23. Singh, P., Huang, Y.P.: A new hybrid time series forecasting model based on the
neutrosophic set and quantum optimization. Comput. Ind. 111, 121–139 (2019)
24. Singh, P.: A novel hybrid time series forecasting model based on neutrosophic-PSO
approach. Int. J. Mach. Learn. Cybern. 11(8), 1643–1658 (2020). https://doi.org/10.1007/
s13042-020-01064-z
25. Pugh, J., Martinoli, A.: Inspiring and modeling multi-robot search with particle swarm
optimization. In: Swarm Intelligence Symposium, pp. 332–339. IEEE (2007)
26. Kim, S.M.: Lumped element modeling of a flexible manipulator system. IEEE/ASME Trans.
Mechatron. 20(2), 967–974 (2014)
27. Den Hartog, J.P.: Mechanical Vibrations. Courier Corporation (1985)
28. Wang, D., Mu, C.: Adaptive-critic-based robust trajectory tracking of uncertain dynamics
and its application to a spring–mass–damper system. IEEE Trans. Ind. Electron. 65(1), 654–
663 (2017)
29. Su, X.: Master–slave control for active suspension systems with hydraulic actuator
dynamics. IEEE Access 5, 3612–3621 (2017)
30. Ayadi, A., Smaoui, M., Aloui, S., Hajji, S., Farza, M.: Adaptive sliding mode control with
moving surface: experimental validation for electro pneumatic system. Mech. Syst. Sig.
Process. 109, 27–44 (2018)
Efficiency of Parallelization Using GPU
in Discrete Dynamic Models Construction
Process
Abstract. The objectives of this paper are: (i) to estimate parallelization effi-
ciency of discrete dynamic models constructing process using GPU; (ii) to
compare the execution time of parallel model depending on the order of model,
the number of discrete values, the number of GPU threads and GPU blocks; and
(iii) to compare the execution time on CPU and GPU. A parallel model for the
prediction of sulfur dioxide emissions into the air of Żywiec city (Poland) based
on historical observations is built and researched in this paper. We have
obtained a parallelization efficiency of 78.1% while executing the constructed
parallel model on GPU Nvidia GTS250. The obtained research results suggest
that the construction of discrete dynamic models nowadays must include the
efficient use of parallel computing resources.
1 Introduction
Construction methods of discrete dynamic models for different systems are sufficiently
developed and widely used. The parametric identification of discrete models was
described in articles by L. Ljung [1], L. Zadeh [2] and Ch. Desoer, V. Strejc [3], V.
Kuntsevych [4]. In terms of computer simulations, the modeling method, which is
based on discrete state equations, is the most promising [5, 6]. In terms of mathematics,
this approach is the most formalized and has practical applications in different areas.
The construction of dynamic models of electrical and electronic circuits was made
using optimization approach. This approach was used by P. Stakhiv and Y. Kozak [7–
9]. This approach makes it possible to build universal models. However, such uni-
versalization leads to the appearance of complex optimization problems that are dif-
ficult to resolve in a reasonable time, even using modern computer technologies.
Let's consider the generalized mathematical model in the form of the state equations (1):

\begin{cases}
\vec{x}(k+1) = F\,\vec{x}(k) + G\,\vec{v}(k) + \vec{U}\left(\vec{x}^{(k)}, \vec{v}^{(k)}\right) \\
\vec{y}(k+1) = C\,\vec{x}(k+1) + D\,\vec{v}(k+1)
\end{cases}    (1)

where \vec{y} denotes the known characteristics of the modeled object and \vec{y}_{\vec{\lambda}} the transient responses that are calculated using the model.
Therefore, the construction of the model can be reduced to calculating the values of the vector \vec{\lambda} for which the objective function is minimal. This problem is solved using an optimization algorithm [13].
The task of finding the minimum point of the nonlinear function Q(\vec{\lambda}) with many variables is difficult. In the construction of discrete dynamic models, the objective function is a "ravine" with many local minima. For the solution of such problems, Rastrigin's method of a directing cone has the best characteristics [14]. A purposeful scan of local minima can be done using this approach, which accelerates finding the global minimum of the objective function. However, the computational complexity of this problem is quite large. Also, a significant amount of input data is used for the construction of a qualitative model. Consequently, the execution time of the optimization procedures is also significant [15].
The time complexity of Rastrigin's optimization algorithm [14] depends linearly on the number of known discrete values of the known function. Accordingly, the computational complexity of the problem is proportional to the amount of input data, not to the order of the created model. For building a quality model it is necessary to use a large amount of data for calculating the value of the objective function. Therefore, the time spent calculating the values of the functions is also significant.
The parallelization of this task using a SIMD architecture was proposed in [16]. This architecture allows the same stream of instructions to be performed on many streams of data. Following this approach, the objective function for different values of the vector \vec{\lambda} is calculated independently; each objective function is calculated on a separate core of the GPU.
The flowchart of the parallelization algorithm is presented in Fig. 1.
Today, the most readily available devices with a SIMD architecture and the best performance/price ratio are GPUs (Graphics Processing Units) [17, 18]. Therefore, the proposed algorithm was adapted for these computing accelerators. In terms of software implementation, the parallelization does not require large expenditures thanks to the programming architecture of GPUs [19, 20].
According to the proposed algorithm, the selection of each point is made in a separate thread, and the threads are executed in parallel. The calculation of the values of the objective function Q_{i+1}^{(1)}, \ldots, Q_{i+1}^{(m)} at m points is also carried out on separate threads.
The same program code is sent to all threads. The input data for each thread are the parameters of the next hypercone, which are the same for all threads. Each thread returns a value of the objective function, calculated at a random point of the hypercone, and this value differs from thread to thread. All output data are stored in the shared memory of the threads for the subsequent calculation of the minimum [21, 22].
CUDA (Compute Unified Device Architecture) technology was used for the programming implementation of the parallelization algorithm [20, 23]. Currently, computing using GPUs and CUDA is an innovative combination of the features of the new generation of NVIDIA graphics processors, which process thousands of threads with a high information load. These features are accessible through the C programming language [17, 24].
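The authors' implementation uses CUDA C; as a rough Python/Numba sketch of the same SIMD idea, one GPU thread can evaluate the objective function at one random point of the current hypercone. The first-order surrogate model and all names below are assumptions, not the paper's code.

import numpy as np
from numba import cuda

THREADS_PER_BLOCK = 128   # matches the 128 GPU cores of the GTS250 used in the experiments

@cuda.jit
def objective_kernel(candidates, y_known, q_out):
    i = cuda.grid(1)                          # global thread index = candidate index
    if i < candidates.shape[0]:
        f = candidates[i, 0]                  # surrogate model parameters (assumed)
        g = candidates[i, 1]
        y = y_known[0]
        q = 0.0
        for k in range(1, y_known.shape[0]):
            y = f * y + g                     # stand-in for simulating the state model (1)
            diff = y_known[k] - y
            q += diff * diff                  # squared deviation from observations
        q_out[i] = q

def evaluate_candidates_on_gpu(candidates, y_known):
    q_out = cuda.device_array(candidates.shape[0], dtype=np.float64)
    blocks = (candidates.shape[0] + THREADS_PER_BLOCK - 1) // THREADS_PER_BLOCK
    objective_kernel[blocks, THREADS_PER_BLOCK](
        cuda.to_device(candidates), cuda.to_device(y_known), q_out)
    return q_out.copy_to_host()               # the host then selects the minimum value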
Today the problem of environmental protection is becoming one of the most important
tasks of science, the development of which is accelerated rapidly by technological
progress in all countries of the world. The rapid development of industry and the
significant growth of traffic flows contribute to the emergence of an acute problem - the
protection of ecological systems. In recent decades, ecological systems have been
influenced significantly by anthropogenic factors. Therefore, prediction of character-
istics changes in such systems is an actual task.
A dangerous environmental situation has already emerged today due to the negative
impact of industry in many regions. In particular, pollution of rivers, the air pool,
pollution of the landscape, the destruction of forests, vegetation, wildlife, the fertile
layer, pollution of groundwater, acoustic, electromagnetic and electrostatic pollution.
For example, the air pool is polluted with gas and aerosol emissions (CO2, polycyclic
aromatic hydrocarbons, CO, NOx, SOx, ash, soot and others). Emissions occur
during the combustion of liquid and solid fuels, which form aerosols when released into
the atmosphere. Aerosols can be non-toxic (ash) and toxic (C20H12 is a potent carcino-
genic compound). Also, gas emissions can be toxic (NO2, SO2, NO, CO and others) and
non-toxic (CO2, H2O). All triatomic gases (H2O, NO2, SO2, and especially CO2) belong
to “greenhouse gases”. When gas emissions are released into the atmosphere, they have a
complex physicochemical and biological effect on living organisms and humans, the
level and character of which depend on their concentration in the air.
The combined effects of gas and aerosol emissions from energy objects can lead to
various adverse environmental effects, including crises in the biosphere, such as
deterioration of atmospheric transparency, rainfall and acid rain, greenhouse effect.
Therefore, it is important to predict the concentrations of harmful emissions into the
atmosphere to prevent environmental problems and respond promptly.
Let's build a model for the prediction of emissions into the atmosphere. The model inputs are the weekly average values of SO2 (sulfur dioxide) emissions into the air in Żywiec (Poland) in 2018 [25]. It is necessary to create a mathematical model based on these data that is able to predict the SO2 emissions into the air in 2019.
Let's build an autonomous discrete dynamic model of the third order for testing the efficiency of the parallel program for calculating the objective function. This model has the following form:
\begin{cases}
\begin{pmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ x_3^{(k+1)} \end{pmatrix} =
\begin{pmatrix} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{pmatrix}
\begin{pmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \end{pmatrix} \\
\begin{pmatrix} y_1^{(k+1)} \\ y_2^{(k+1)} \\ y_3^{(k+1)} \end{pmatrix} =
\begin{pmatrix} C_1 & C_2 & C_3 \end{pmatrix}
\begin{pmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ x_3^{(k+1)} \end{pmatrix}
\end{cases}    (3)

where k = 1, \ldots, 52.
The objective function is:

Q(\vec{\lambda}_i) = \sum_{i} \left( \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} - \begin{pmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \tilde{y}_3 \end{pmatrix} \right)^{2}    (4)
Let’s test the program for calculation of the objective function. The following hardware
and system software were used for these tests:
• GPU NVIDIA GeForce GTS250 (16 multiprocessors with 8 cores each);
• RAM 1024 MB;
• CPU Core2Duo E8400, 3 GHz;
• motherboard ASRock G41M-S3.
Let's consider the execution time of the parallel program for calculating the objective function on 128 GPU cores. The average time (over 100 launches) of the parallel computation of the objective function is 2.74 ms.
Next, let's examine how the execution time of the program changes with the order of the model (see Fig. 2). As expected, the execution time of the parallel calculation of the objective functions gradually increases with the order of the constructed model.
Fig. 2. Dependence of the execution time of the parallel program on the model order
When the number of discrete values increases, the execution time of the parallel calculation of the objective function also gradually increases (see Fig. 3).
Fig. 3. Dependence of the execution time of the parallel program on the number of discrete
values of model
It is also interesting to examine the dependence of the total execution time of all objective function calculations on the number of such calculations, i.e., for different numbers of threads on the GPU. Obviously, this dependence is linear for sequential calculations. The dependence is shown in Fig. 4.

Fig. 4. Dependence of the execution time of the parallel program on the number of threads
As shown in Fig. 4, this dependence is periodic. Since the execution time was measured together with the data transfer, there are constant delays in each cycle of the optimization algorithm. If we discount these delays, we see that the period repeats every 128 threads. Therefore, it is expedient to use parallelization when the number of threads is a multiple of the number of GPU cores, in this case 128; the parallelization is then the most effective.
A 3D graph of the dependence of the execution time of the parallel program on the number of threads and the number of GPU blocks is shown in Fig. 5.
Fig. 5. Dependence of the execution time of the parallel program on the number of threads and
the number of blocks
The proposed approach to parallel computing can be used with different optimization algorithms; therefore, the execution time of the optimization algorithm itself is not taken into account. In practice, the execution time of the algorithm is smaller than the time required for the calculations of the objective function.
The performance of the CPU is much higher than the performance of a single GPU core. However, the parallel realization is more effective when the objective function is calculated for many sets of coefficients [20]. Thus, the more calculations of the objective function we perform, the more efficient the parallelization of the construction of discrete dynamic models becomes.
The execution time of the parallel program on one core of the GPU is shown in Fig. 6. The parallelization efficiency is estimated as the ratio of the obtained speedup to the number of processors (provided that the number of calculations of the objective function is a multiple of one step of the optimization procedure).
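As a small illustration (not the authors' code), the speedup and parallelization efficiency can be computed from measured execution times as follows; the timing values are placeholders:

def parallel_efficiency(t_sequential, t_parallel, n_processors):
    speedup = t_sequential / t_parallel          # how many times faster the parallel run is
    return speedup, speedup / n_processors       # efficiency = speedup / number of processors

# Placeholder timings (seconds) for one optimization step on the CPU vs. 128 GPU cores
speedup, efficiency = parallel_efficiency(t_sequential=90.0, t_parallel=1.2,
                                          n_processors=128)
print(f"speedup = {speedup:.1f}x, efficiency = {efficiency:.1%}")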
5 Conclusions
A parallel model for the prediction of sulfur dioxide emissions into the air of Żywiec
city (Poland) based on historical observations is built and researched in this paper. The
construction of this model using a traditional sequential algorithm without parallelization failed, because the construction of such a model requires tens of thousands of iterations of calculations and the computer's RAM is not sufficient for it. We have obtained a parallelization efficiency of 78.1% while executing the constructed parallel model on the GPU Nvidia GTS250. We also compared the execution time of the parallel model with respect to the model order, the number of discrete values, the number of GPU threads and the number of GPU blocks. The obtained research results suggest that the construction of discrete dynamic models nowadays must include the efficient use of parallel computing resources.
References
1. Ljung, L., Andersson, C., Tiels, K., Schön, T.: Deep learning and system identification. In:
Proceedings of the IFAC World Congress, Berlin (2020)
2. Zadeh, L.A., Desoer, C.A.: Linear System Theory: The State Space Approach Dover Civil
and Mechanical Engineering Series. Dover Publications, New York (2008)
3. Voicu, M.: Advances in Automatic Control. Springer, New York (2004). https://doi.org/10.
1007/978-1-4419-9184-3
4. Kuntsevich, V.M., Gubarev, V.F., Kondratenko, Y.P., Lebedev, D.V., Lysenko, V.P. (eds.):
Control Systems: Theory and Applications. Automation Control and Robotics. River
Publishers, Gistrup (2018)
5. Hinamoto, T., Lu, W.: Digital Filter Design and Realization. River Publishers, Gistrup
(2017)
6. Isidori, A., Marconi, L.: Adaptive regulation for linear systems with multiple zeros at the
origin. Int. J. Robust Nonlinear Control 23(9), 1013–1032 (2012)
7. Stakhiv, P., Kozak, Y.: Discrete dynamical macromodels and their usage in electrical
engineering. Int. J. Comput. 10(3), 278–284 (2011)
8. Hoholyuk, O., Kozak, Y., Nakonechnyy, T., Stakhiv, P.: Macromodeling as an approach to
short-term load forecasting of electric power system objects. Comput. Probl. Electr. Eng. 7
(1), 25–32 (2017)
9. Stakhiv, P., Hoholyuk, O., Byczkowska-Lipinska, L.: Mathematical models and macro-
models of electric power transformers. Przeglad Electrotechniczny 5, 163–165 (2011)
10. Kłusek, A., Topa, P., Wąs, J., Lubaś, R.: An implementation of the Social Distances Model
using multi–GPU systems. Int. J. High-Perform. Comput. Appl. 32, 482–495 (2016)
11. Hamilton, B., Webb, C.J.: Room acoustics modelling using GPU-accelerated finite
difference and finite volume methods on a face-centered cubic grid. In: Proceeding of
Conference on Digital Audio Effects (DAFx-13), Maynooth, Ireland (2013)
12. Schalkwijk, J., Jonker, H.J.J., Siebesma, A.P., Meijgaard, E.V.: Weather forecasting using
GPU-based large-eddy simulations. Bull. Am. Meteorol. Soc. 96, 715–723 (2015)
13. Stakhiv, P., Strubytska, I., Kozak, Y.: Parallelization of calculations using GPU in
optimization approach for macromodels construction. Przegliad Elektrotechniczny 7(9), 7–9
(2012)
14. Kaladze, V.A.: Adaptation of casual search by the method of the directing cone.
Vesnik VGTU 8(1), 31–37 (2012)
15. Strubytska, I.: Construction of discrete dynamic model of prediction of particulate matter
emission into the air. Comput. Probl. Electr. Eng. 5, 55–59 (2015)
16. Kozak, Y., Stakhiv, P., Strubytska, I.: Parallelizing of algorithm for optimization of
parameters of discrete dynamical models on massively-parallel processor. Inf. Extr. Proces.
32(108), 125–130 (2010)
17. Boreskov, A., Kharlamov, A., Markovskyy, N.: Parallel computations on GPU. Architecture
and program model CUDA. Publication of Moscow University (2012)
18. Nickolls, J., Dally, W.: The GPU computing era. IEEE Micro 30(2), 56–69 (2010)
19. Kirk, D., Hwu, W.-M.: Programming Massively Parallel Processors: A Hands-on Approach.
Morgan Kaufmann, Burlington (2010)
20. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU
Programming. Addison-Wesley Professional, Boston (2010)
21. Strubytska, I.: Research of the parallelization of minimum value algorithm on the GPU. In:
Materials of the 1st All-Ukrainian School-Seminar for Young Scientists and Students
“Advanced Computer Information Technologies”, pp. 104–106 (2011)
22. Stakhiv, P., Strubytska, I.: Parallelization method of parameters identification of discrete
dynamic macro models on massively parallel processors. Naukovi notatky 27, 300–305
(2010)
23. Farber, R.: CUDA Application Design and Development. Morgan Kaufmann Publishers,
Burlington (2011)
24. Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs.
Elsevier Inc., Amsterdam (2013)
25. Air Quality Monitoring System. https://powietrze.katowice.wios.gov.pl/stacje. Accessed 15
Aug 2020
On the Performance Analyses of a Modified
Force Field Algorithm for Obstacle Avoidance
in Swarm Robotics
Girish Balasubramanian1(&), Senthil Arumugam Muthukumaraswamy1, and Xianwen Kong2
1 Heriot-Watt University Dubai Campus, Dubai Knowledge Village, Dubai, UAE
{gb29,m.senthilarumugam}@hw.ac.uk
2 Heriot-Watt University, Campus The Avenue, Edinburgh E14 4AS, UK
[email protected]
1 Introduction
real world for obstacle avoidance is the artificial potential field algorithm (APF) [3]. In
this algorithm, T represents the target where the robot moves to and O represents the
obstacles which the robot avoids. This is like how magnets apply attractive and
repulsive forces. A combination of these forces around the object makes up the path of the robot to reach its destination. Other algorithms for obstacle avoidance include the gap
method [4], particle swarm optimization [5], genetic algorithm [6] and fuzzy logic [7].
A modified force field method (also called APF) is evaluated in this paper. In terms
of robot application, each proximity sensor is given a weight based on the trajectory at
which the robot would face the obstacle on forward movement. The obstacles are given
repulsive forces while the target destination is given an attractive force [9]. In the case
of no destination, there are only repulsive forces present. The sum of the repulsive
forces and the sensor readings aims to provide a sense of direction to the robot.
Seyyed et al. proposed two modified versions of an artificial potential field (APF) al-
gorithm [3]. The first algorithm proposed was called Bug1. In this algorithm, the robot
completely bypasses an object and covers it. The weakness of the algorithm is its
inefficiency. An example of the Bug1 algorithm pathing can be seen in Fig. 1.
Seyyed et al. managed to overcome this problem by introducing another modified algorithm called Bug2 [3]. In the Bug2 algorithm, the robot tries to maintain a path
directly from start to end while circling any obstacles. When the robot can get back to
the line path, the robot continues to move straight to the target location. An example of
this pathing can be seen in Fig. 2. One of the major downsides of the APF algorithm
includes the inability to handle scenarios in cases where the robot is unable to get back
on path or if the robot is surrounded by obstacles. The robot is therefore trapped. The
proposed algorithm successfully tackled this problem and they were able to show
simulation results for verification.
However, no practical testing was performed to validate the results. Moreover, both
algorithms shown did not account for cases in which obstacles entirely block the path to a target, and they assume there is a target location to begin with. This algorithm
could be further tested in situations where a robot is set to free roam to look for certain
objects in applications such as search and rescue. The algorithms shown also did not
account for cases in which multiple robots are made to navigate an area at once. These
algorithms could be further tested in a swarm robotics environment.
Fig. 3. Leader Selection of Obstacle Avoidance Algorithm to determine target tracking control
[12].
Anh et al. proposed a force field method that used rotational force fields and
repulsive forces to better avoid more complex shapes [12]. Both clockwise and anti-
clockwise forces were analyzed. Simulations were carried out for multiple robots as
part of a swarm robot capable of taking either V-shaped or circular shaped positions
while avoiding obstacles and trying to maintain on course for the target. They had one
robot in the swarm as a leader robot which was selected using an algorithm as shown in
Fig. 3.
The leader is set to determine the target tracking control algorithm function while
the members of the swarm follow the function. In cases where the leader gets trapped
or cannot maintain shape, the leadership role is passed onto the nearest free member.
The robots follow a V shaped path until the tracked object trajectory is identified. The
robots then assume a circular shape to surround the object while maintaining relative
velocity. This can be seen in Fig. 4 and 5.
Fig. 4. a) Swarm robots in a V-Shape formation. b) Swarm robots in a circular formation [12].
Fig. 5. Path followed by swarm robots using rotational force field algorithm [12].
3 Experimental Methods
In this paper, the modified force field method of obstacle avoidance was evaluated in single-robot and swarm environments. The number of robots was varied between one, four and ten depending on the experiment. The experiments were all carried out in
Simulink using MATLAB and results were recorded in the form of time taken to
achieve a certain goal. Each robot was made to have 8 forward infrared proximity
sensors. The experiment was carried out using the Robotics Toolbox differential drive
algorithm provided by MathWorks [13]. The algorithms were modified to incorporate
the modified force field algorithm.
The modified forcefield method assumed simplified weightings as opposed to
existing methods. Depending on the expected type of navigation required, each sensor
can be assigned a weight. For example, in the case of exploration in an open area with
less narrow spaces, weights are assigned uniformly for all sensors. However, if the
robots are expected to travel in narrow spaces, lower weights can be given to side sensors
to reduce oscillation. An example of a single robot with 8 forward sensors is shown in
Fig. 6.
In the case of the proposed force field algorithm, each sensor is given a weight from −1 to 1 in the x and y directions based on the expected path. For the purposes of both evaluations, the weights were kept the same and can be found in Table 1. The theory behind the selection of these weights is not covered in the scope of this paper.
Table 1. Sensor weights chosen for the proposed modified force field method.
Sensor direction:   0      1      2     3     4      5      6      7
x:                 -1    -0.5    0    0.5    1     0.5     0    -0.5
y:                  0     0.5    1    0.5    0    -0.5    -1    -0.5
The sum of the sensor readings, each multiplied by its contribution (weight) from Table 1, is then obtained for both x and y. Using these sums, the velocity of the robot is calculated. This is given by

Sum_x = \sum_{k=0}^{7} w_x(k)\,|SensorReading(k)|    (1)

Sum_y = \sum_{k=0}^{7} w_y(k)\,|SensorReading(k)|    (2)

where w_x(k) and w_y(k) are the x and y weights of sensor k from Table 1.
A random noise term was also added to reduce oscillation in cases where the pathway around obstacles is too tight. This algorithm was designed in MATLAB and incorporated into the robotics toolbox to carry out the test. A flowchart of the algorithm used can be found in Fig. 7.
Fig. 7. Flowchart of the modified force field algorithm applied to the robot simulation for the purpose of Evaluation 1
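The paper's implementation is in MATLAB; as a rough illustration only, the weighting scheme of Table 1 and the sums of Eqs. (1)-(2), together with the small random noise term, can be sketched in Python as follows. The sensor readings and the noise scale are illustrative values, not the paper's settings.

import numpy as np

W_X = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 0.5, 0.0, -0.5])    # Table 1, x weights
W_Y = np.array([ 0.0,  0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])  # Table 1, y weights

def force_field_velocity(sensor_readings, noise_scale=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    readings = np.abs(np.asarray(sensor_readings, dtype=float))
    sum_x = np.sum(W_X * readings)                     # weighted contribution along x
    sum_y = np.sum(W_Y * readings)                     # weighted contribution along y
    noise = rng.uniform(-noise_scale, noise_scale, size=2)   # small noise to reduce oscillation
    return np.array([sum_x, sum_y]) + noise            # velocity command (vx, vy)

# Example: an obstacle detected mostly by the front sensors (indices 2 and 3)
print(force_field_velocity([0.0, 0.1, 0.8, 0.6, 0.0, 0.0, 0.0, 0.0]))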
For the first evaluation, an open area with multiple target points were selected.
A single robot was made to move through the 2-D space until all checkpoints were
identified and passed through. The path of the robot was tracked as well to draw other
conclusions based on navigation. The purpose of this evaluation was to test the
functionality of the tested algorithm and the divergence distance from the object
(waypoint in this case). An example of the area used can be seen in Fig. 8.
Fig. 8. 2-D environment for robot travel created in MATLAB for evaluation 1.
The map consists of multiple checkpoints and the program records the time taken
for the robot to cover all the different checkpoints. The robot is considered successful
once all checkpoints are reached.
In the second evaluation, 2-D maps were also used. However, this time, moving
objects were introduced into the simulation to act as obstacles along with the robots
themselves. The purpose of the second evaluation is to test how the algorithm performs
in a dynamic environment in cases where one, four or ten robots are used. The goal of this experiment was to have a robot leave the map. The tests were also carried out for 500 iterations to normalize the effect of having a robot spawn too close to the exit and to obtain a more representative average. The average of the results is then recorded.
A 2-D map was created in which these robots were made to maneuver. This test
environment can be seen in Fig. 8.
Fig. 9. 2-D environment for robot travel created in MATLAB for evaluation 2.
In this second evaluation, the robots were placed in random areas and moving
objects were made to try and block the path. The timer stopped when any one robot was
able to cross the exit out of the grid (Fig. 9).
In the first experiment, the modified force field algorithm was simulated in an open area and was tasked with crossing 4 checkpoints. This was performed for 100 iterations to obtain more reliable results. An example of the pathing result for the force field method experiment can be seen in Fig. 12.
The divergence from each waypoint after the first was recorded to evaluate functionality; this can be seen in Fig. 10.
Fig. 11. Average divergence observed from robot to waypoint over 100 iterations
It was observed that the simplified modified force field method had a larger divergence from the waypoint path, following a much looser restriction. However, given the nature of the algorithm, this was expected. Moreover, the robot was able to navigate the test environment successfully, proving the functionality of the algorithm.
In the second experiment, the modified force field algorithm was run in an open area with multiple robots tasked to find the exit. This was done to evaluate how different numbers of robots perform in a dynamic environment. An example of the simulation results of this experiment run with four robots can be seen in Fig. 11.
The performance with respect to time was also recorded to identify the time taken for every robot to exit via the red goal. This result over 500 iterations was averaged and recorded in Fig. 12 for three cases where one, four and ten swarm robots were navigating the space.
Fig. 12. Pathing of four robots using forcefield method in Evaluation 2. (Color figure online)
Fig. 13. Average time taken over 500 iterations for each robot to exit space
The red line crossing through the moving objects was added as a guide for clarity to indicate the moving path. The red crosses serve as guides for the robots to reach the destination. For example, in Fig. 11, robot 3 can be seen successfully leaving the area.
From the analysis of Fig. 12, one swarm robot takes the most time to exit, at 89.6 s, compared to adding multiple robots to the same experiment. This was the expected result, since random placement onto the escape grid and the lack of robot interaction reduce the ability of the robot to navigate the space. In contrast, the experiments with four and ten swarm robots take 26.4 s and 29.4 s, respectively. This was again expected, as adding more robots provides a multi-robot environment in which the robots can interact with each other and exit the environment. However, in the ten-robot scenario, the initial robot took longer than for four robots, which was inconsistent with the expectation. This can be attributed to the size of the experiment space; the observation was made that the correct number of swarm robots needs to be selected depending on the available space for the experiment (Fig. 13).
5 Conclusion
In this paper, the performance and functionality of a modified yet simplified force field algorithm for obstacle avoidance were analyzed. The experiments and simulations used infrared sensors and tested three scenarios where there were one, four or ten robots present at a given time. The algorithm was successful in navigating both an open environment and a dynamic environment in the MATLAB simulation environment.
In open areas, the algorithm with force field methods was shown to have a moderate divergence (0.23 cm–0.44 cm), which could be improved with a different sensor type. In more dynamic areas, the infrared algorithm with force field methods was shown to be slower by about 340% over 500 iterations with only one robot, due to hindrances caused by obstacles and the lack of robots to interact with. However, the swarm robotics environments showed much faster results (26.4–29.4 s) compared to a single robot, which was in line with what was expected.
References
1. Mane, S.B., Vhanale, S.: Genetic algorithm approach for obstacle avoidance and path
optimization of mobile robot. In: Iyer, B., Nalbalwar, S. , Pathak, N. (eds.) Computing,
Communication and Signal Processing. AISC, vol. 810, pp. 649–659. Springer, Singapore
(2019). https://doi.org/10.1007/978-981-13-1513-8_66
2. Chan, S.H., Xu, X., Wu, P.T., Chiang, M.L., Fu, L.C.: Real-time obstacle avoidance using
supervised recurrent neural network with automatic data collection and labeling. In: 2019
IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 472–477.
IEEE (2019)
3. A.K., Wang, J., Liu, X.: Obstacle avoidance of mobile robots using modified artificial potential field algorithm. EURASIP J. Wirel. Commun. Netw. 2019(1), 70 (2019). https://doi.org/10.1186/s13638-019-1396-2
4. Sezer, V., Gokasan, M.: A novel obstacle avoidance algorithm: follow the gap method.
Robot. Auton. Syst. 60(9), 1123–1134 (2012)
5. Chen, X., Li, Y.: Smooth path planning of a mobile robot using stochastic particle swarm
optimization. In: Proceedings of the 2006 IEEE International Conference on Mechatronics
and Automation, pp. 1722–1727. IEEE (2006)
6. Tu, J., Yang, S.X.: Genetic algorithm based path planning for a mobile robot. In: IEEE
International Conference on Robotics and Automation, Proceedings, ICRA 2003, vol. 1,
pp. 1221–1226. IEEE (2003)
7. Reignier, P.: Fuzzy logic techniques for mobile robot obstacle avoidance. Robot. Auton.
Syst. 12(3–4), 143–153 (1994)
8. Jiang, Y., Yang, C., Zhaojie, J., Liu, J.: Obstacle avoidance of a redundant robot using
virtual force field and null space projection. In: Haibin, Y., Liu, J., Liu, L., Zhaojie, J., Liu,
Y., Zhou, D. (eds.) ICIRA 2019. LNCS (LNAI), vol. 11740, pp. 728–739. Springer, Cham
(2019). https://doi.org/10.1007/978-3-030-27526-6_64
9. Borenstein, J., Koren, Y.: Real-time obstacle avoidance for fast mobile robots in cluttered
environments. In: Proceedings of IEEE International Conference on Robotics and
Automation, pp. 572–577. IEEE (1990)
10. Peng, Y., Qu, D., Zhong, Y., Xie, S., Luo, J., Gu, J.: The obstacle detection and obstacle
avoidance algorithm based on 2-D LIDAR. In: 2015 IEEE International Conference on
Information and Automation, pp. 1648–1653. IEEE (2015)
11. Singh, N.H., Thongam, K.: Neural network-based approaches for mobile robot navigation in
static and moving obstacles environments. Intel. Serv. Robot. 12(1), 55–67 (2018). https://
doi.org/10.1007/s11370-018-0260-2
12. Dang, A.D., La, H.M., Nguyen, T., Horn, J.: Formation control for autonomous robots with
collision and obstacle avoidance using a rotational and repulsive force–based approach. Int.
J. Adv. Rob. Syst. 16(3), 1729881419847897 (2019)
13. Ghorpade, D., Thakare, A.D., Doiphode, S.: Obstacle detection and avoidance algorithm for
autonomous mobile robot using 2D LiDAR. In: 2017 International Conference on
Computing, Communication, Control and Automation (ICCUBEA), Pune, pp. 1–6 (2017)
14. Hutabarat, D., Rivai, M., Purwanto, D., Hutomo, H.: Lidar-based obstacle avoidance for the
autonomous mobile robot. In: 2019 12th International Conference on Information &
Communication Technology and System (ICTS), pp. 197–202. IEEE (2019)
15. Mathworks.com.: Mobile robotics simulation toolbox - file exchange - MATLAB Central
(2018). https://www.mathworks.com/matlabcentral/fileexchange/66586-mobile-robotics-sim
ulation-toolbox. Accessed 8 Dec 2018
Rethinking the Limits of Optimization
Economic Order Quantity (EOQ) Using Self
Generating Training Model by Adaptive-
Neuro Fuzzy Inference System
Abstract. This paper gives an alternative approach to the traditional way of solving the optimization of economic order quantity (EOQ), by applying a self-generating training model in ANFIS. Inventory control in a successful organization has to be sustained under varying parameters such as demand, setup cost, and ordering cost. This research work combines the fuzzy inference system and the adaptive neuro-fuzzy inference system to acquire the optimal order quantity with fuzzy logic. The proposed algorithm for self-generating the training dataset yields an efficient model for predicting EOQ with variable demand. The algorithm is tested with datasets of various sizes, and the results are compared with the fuzzified and crisp models. The performance evaluation is carried out, and the results are satisfactory for application to nonlinear problems.
1 Introduction
It was Zadeh [20] who, in 1965, introduced fuzzy sets, which capture vagueness and uncertainty in real life. In 1983, Zimmerman [21] observed fuzziness in the study of operational research. Park [12] fuzzified the ordering cost and holding (storage) cost as trapezoidal fuzzy numbers in the traditional EOQ to solve the non-linear programming problem, which resulted in fuzzy EOQ models. Inventory is one of the major segments of the economy; it is ambiguous in nature and requires proper management, increasing the number of customers while minimizing inventory costs. This led to the EOQ model introduced by Harris [9] in 1913. Chen [5] was the first to propose a probabilistic approach to an inventory with imperfect items. In 2007, Wang [18] studied randomness in the fuzzy EOQ model for items with imperfections. Dutta [7], in 2004, fuzzified the inventory parameters demand, holding cost, and ordering cost. In inventory management, the economic order quantity is the replenishment strategy used to determine the total inventory cost and to minimize it. In earlier models, demand was assumed to be constant, which proved to be a pitfall of the EOQ model. Models with fluctuating demand were therefore introduced to cope with the consistent seasonality in business. To handle these complications, EOQ can be customized using inventory management software that suggests a well-organized ordering solution. To deal with these limitations, it is essential to bring artificial intelligence techniques into solving such real problems.
Many researchers have applied artificial intelligence to inventory control for production systems. The study of artificial intelligence in inventory management was started by Jang [10], who combined the fuzzy inference system with adaptive networks. Gupta and Maranas [8] proposed a two-stage stochastic programming model under demand uncertainty, in which production decisions are considered in the first stage and supply chain decisions in the second stage.
Fuzzy-based approaches are convenient for such uncertain situations and work best when combined with artificial neural networks (ANN) in the real business world. When fuzzy logic is merged with an ANN, the result is the neuro-fuzzy system and the fuzzy neural system. The renowned work of Aliev [2] proposed two different designs: neuro-fuzzy systems, whose key task is to operate on mathematical relations, and fuzzy neural systems, which are applied to discover numerical information and knowledge-based data represented by fuzzy numbers. In 1991, Pedrycz [14] generated models by their behaviour towards unpredictability and characterized the connection between fuzzy theory and neural networks. Pedrycz [15] extended this study to neurons in pattern classifiers in 1992. Further, the neuro-fuzzy network was defined by Jang [11] as a fuzzy structure trained by an algorithm within a smart model. Fuzzy neural networks are mainly distinguished by the connections of their neurons. In 1943, McCulloch and Pitts [19] developed a mathematical model of a single neuron, which established the plausibility of neuron activation in the brain and was widely accepted for its theoretical possibilities. The examination of connected nodes to modify the connection weights was initiated by Hebb [6] in 1949. It was Rosenblatt [16] who introduced the perceptron neural model, which captured the neuron's behaviour.
An adaptive neuro-fuzzy logic inventory control system was introduced in 2012 by Lénárt et al. [4]. Later, Aksoy et al. [1] applied ANFIS to demand prediction in the garment trade. In 2015, Aengchuan and Phruksaphanrat [3] compared three methods, namely the fuzzy inference system (FIS), the adaptive neuro-fuzzy inference system (ANFIS), and artificial neural networks with varying membership functions, for solving inventory problems. Among these, ANFIS with the Gaussian membership function resulted in the minimum total cost. In 2015, yet another study demonstrating the advantage of ANFIS over ANN, for maintaining the optimum inventory level in an inventory management problem, was carried out by Paul, Azeem, and Ghosh [13]. In this work, the economic order quantity equation of Kalaiarasi et al. [22] is taken, and the adaptive neuro-fuzzy inference system is applied to obtain the desired model.
Many researchers have formulated hybrid artificial intelligence methods and obtained better results by applying ANFIS to solve complex EOQ problems. ANFIS is a learning tool that is extensively used to obtain the desired output by applying fuzzy logic over a highly interconnected neural network. In this model, an inventory management approach with the advantages of the adaptive neuro-fuzzy inference system (ANFIS) has been developed, with the fuzzy modelling done using a given set of data.
\[
\frac{\partial T}{\partial Q} = -\frac{R(d - aS)}{Q^{2}} + \frac{gc}{2} \tag{2}
\]
Equating \(\frac{\partial T}{\partial Q} = 0\), we obtain the economic order quantity in crisp values [22]:
\[
Q = \sqrt{\frac{2R(d - aS)}{gp}} \tag{3}
\]
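As a small illustration, the crisp EOQ of Eq. (3) can be computed directly. The sketch below simply evaluates that formula; reading gp as the product of two parameters g and p is an assumption, and the sample parameter values are placeholders rather than data from the paper.

```python
from math import sqrt

def crisp_eoq(R, d, a, S, g, p):
    """Crisp economic order quantity from Eq. (3): Q = sqrt(2R(d - aS) / (g*p))."""
    return sqrt(2 * R * (d - a * S) / (g * p))

# Placeholder parameter values, for illustration only
print(crisp_eoq(R=100.0, d=500.0, a=2.0, S=40.0, g=0.2, p=25.0))
```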
Fig. 1. Flow chart for ANFIS methodology to predict EOQ for varying Demand
Layer 1
Computes the degree of membership; the membership function used here is the Gaussian membership function:
\[
O_{1,i} = \mu_{A}(x), \quad i = 1, 2 \tag{4}
\]
and
\[
O_{1,i} = \mu_{B}(y), \quad i = 1, 2 \tag{5}
\]
where the Gaussian membership function is
\[
f(x; \sigma, c) = e^{-\frac{(x - c)^{2}}{2\sigma^{2}}}
\]
Layer 3
Normalizes the firing strengths:
\[
O_{3,i} = \bar{w}_{i} = \frac{w_{i}}{w_{1} + w_{2}}, \quad i = 1, 2 \tag{7}
\]
Layer 4
Calculates the output based on the parameters of the rule consequent \(\{p_{i}, q_{i}, r_{i}\}\):
\[
O_{4,i} = \bar{w}_{i} f_{i} = \bar{w}_{i}\,(p_{i} x + q_{i} y + r_{i}) \tag{8}
\]
Layer 5
Computes the overall ANFIS output by summing all incoming signals:
\[
O_{5} = \sum_{i} \bar{w}_{i} f_{i} = \frac{\sum_{i} w_{i} f_{i}}{\sum_{i} w_{i}} \tag{9}
\]
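To make the five-layer computation concrete, the sketch below implements a tiny two-rule, two-input ANFIS forward pass with Gaussian membership functions. It is purely illustrative: the premise and consequent parameter values are placeholders, not those of the trained EOQ model, and Layer 2 (the product firing strength) is included even though its equation is not reproduced above.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    # Layer 1: Gaussian membership degree, f(x; sigma, c) = exp(-(x - c)^2 / (2*sigma^2))
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership degrees of the two inputs for each rule
    mu_A = [gaussian_mf(x, c, s) for c, s in premise["A"]]
    mu_B = [gaussian_mf(y, c, s) for c, s in premise["B"]]
    # Layer 2: firing strengths as the product of memberships
    w = [a * b for a, b in zip(mu_A, mu_B)]
    # Layer 3: normalized firing strengths, Eq. (7)
    w_bar = [wi / sum(w) for wi in w]
    # Layer 4: rule outputs f_i = p_i*x + q_i*y + r_i weighted by the normalized strengths, Eq. (8)
    f = [p * x + q * y + r for p, q, r in consequent]
    # Layer 5: overall output as the weighted sum, Eq. (9)
    return sum(wb * fi for wb, fi in zip(w_bar, f))

# Placeholder premise parameters (centre, sigma) and consequent parameters (p, q, r) for two rules
premise = {"A": [(0.2, 0.1), (0.8, 0.1)], "B": [(0.3, 0.15), (0.7, 0.15)]}
consequent = [(1.0, 0.5, 0.1), (0.4, 1.2, -0.2)]
print(anfis_forward(0.5, 0.6, premise, consequent))
```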
ANFIS scales the input data using the xbounds = [min max] setting in MATLAB, where the scaling parameters of the input function vary between the minimum and maximum values of the data points. Each data point is initially normalized as a pre-processing step before training.
Step 3:
This is the key step, in which the neuro-fuzzy algorithm generates synthetic data for each data point, in connection with the adjacent data points, which controls the noise and errors present in the data.
Step 4:
In this step the algorithm calculates the error percentage of the data based on the
coefficient of variation.
Step 5:
In every iteration, the system generates one synthetic data point, and a model is generated based on the number of iterations decided by the user. The larger the number of synthetic data points, the better the performance of the algorithm.
Step 6:
Finally, the EOQ model based on variable demand rate is predicted by the neuro fuzzy
algorithm.
ANFIS will test the data using the synthetic training dataset generated by this algorithm.
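A minimal sketch of how such a self-generating training set might be produced is given below. It assumes a simple scheme in which synthetic points are interpolated between adjacent (demand, EOQ) pairs and accepted only while the coefficient of variation of the output stays within a user-chosen tolerance; the exact acceptance rule and data of the paper's algorithm may differ.

```python
import numpy as np

def generate_synthetic_dataset(demand, eoq, n_iterations=100, cv_tolerance=0.05, seed=0):
    """Augment (demand, EOQ) training pairs with synthetic points placed between
    adjacent original points, so that the overall trend is maintained."""
    rng = np.random.default_rng(seed)
    demand, eoq = list(demand), list(eoq)
    base_cv = np.std(eoq) / np.mean(eoq)           # coefficient of variation of the original output
    for _ in range(n_iterations):
        i = int(rng.integers(0, len(demand) - 1))  # pick a pair of adjacent data points
        t = rng.uniform(0.0, 1.0)                  # interpolation factor between the two points
        new_d = demand[i] + t * (demand[i + 1] - demand[i])
        new_q = eoq[i] + t * (eoq[i + 1] - eoq[i])
        cv = np.std(eoq + [new_q]) / np.mean(eoq + [new_q])
        if abs(cv - base_cv) / base_cv <= cv_tolerance:   # error check via coefficient of variation
            demand.insert(i + 1, new_d)
            eoq.insert(i + 1, new_q)
    return np.array(demand), np.array(eoq)

# Example with placeholder data
d, q = generate_synthetic_dataset([100, 150, 200, 260], [45.0, 55.0, 63.0, 72.0], n_iterations=20)
```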
Figure 2 shows the membership function used to train the data (Figs. 3, 4 and 5).
It is very necessary for a company to understand the strategy of variable demand with economic order quantity. Using this algorithm, we can easily predict the EOQ for variable demand. The algorithm proved successful when compared with the crisp and fuzzified models (Fig. 6). Figure 7 represents the three-dimensional model of EOQ for variable demand. In ANFIS training, there is a virtual connection between the synthetic dataset and the output. ANFIS is one of the potential soft computing algorithms that integrate neural networks and fuzzy logic. In this EOQ model, the Gaussian membership function is applied to estimate the output.
Self-generation of data has certain advantages.
1. Errors or noise in the data can be eliminated.
Fig. 2. Represents the Gaussian membership function used for training the data
2. If there is any missing data between two data points, based on the standard devi-
ation and the trend, it can be interpolated. The trend will be maintained.
3. Synthetic training datasets are elastic in nature, because the datasets can run between the maximum and minimum of the original data and predict accurate results even though the data is nonlinear.
4. Developing synthetic data using this algorithm helps the ANFIS system decide the output very easily. Although fixing the membership functions for a batch of data consumes time, it predicts accurate results after defuzzification.
Fig. 3. Shows the self-generated training dataset compared with the original output
Fig. 5. Represents the tested data using synthetic training dataset by self-generating algorithm
Fig. 6. Comparative analysis of ANFIS output with crisp and fuzzified model.
Fig. 7. Three-dimensional model for price dependent coefficient, purchase cost and demand
4 Conclusion
This intelligent technique is applied for efficient management in acquiring the economic order quantity. The comparative analysis of conventional linear solution methods against artificial intelligence techniques shows the latter to be a valuable and logically expert tool for assessing EOQ. Combining neural networks and fuzzy logic, ANFIS has been applied by training with a large number of synthetic datasets generated by the random permutations of the software algorithm. The linguistic knowledge present in the fuzzy logic approach is used to model the complex nonlinear process of the EOQ model. In this paper, the EOQ model has been solved with different techniques and tested against the actual crisp model. The results reveal that the proposed ANFIS method works well and can be used by logistics experts to analyze EOQ. Neuro-fuzzy training time is quite short, and the expert training dataset provides accurate results. The flow of goods and the procurement of other goods can be easily modeled using ANFIS. A sensitivity analysis was performed, and the results reveal that the ANFIS tool can be applied to a wide range of logistics problems.
References
1. Aksoy, A., Ozturk, N., Sucky, E.: Demand forecasting for apparel manufacturers by using
neuro-fuzzy techniques. J. Model. Manag. 9(1), 18–35 (2014)
2. Aliev, R.A., Guirimov, B., Fazlollahi, B., Aliev, R.: Evolutionary algorithm-based learning of fuzzy neural networks. Fuzzy Sets Syst. 160(17), 2553–2566 (2009)
3. Aengchuan, P., Phruksaphanrat, B.: Comparison of fuzzy inference system (FIS), FIS with
artificial neural networks (FIS +ANN) and FIS with adaptive neuro-fuzzy inference system
(FIS+ANFIS) for inventory control. J. Intell. Manuf. 29, 905–923 (2015)
4. Lénárt, B., Grzybowska, K., Cimer, M.: Adaptive inventory control in production systems.
In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.-B. (eds.) HAIS
2012. LNCS (LNAI), vol. 7209, pp. 222–228. Springer, Heidelberg (2012). https://doi.org/
10.1007/978-3-642-28931-6_21
5. Chen, C.Y.: A probabilistic approach for traditional EOQ model. J. Inf. Optim. Sci. 24, 249–
253 (2003)
6. Hebb, D.O.: The Organization of Behavior. Wiley, London (1949)
7. Dutta, D., Kumar, P.: Optimal policy for an inventory model without shortages considering
fuzziness in demand, holding cost and ordering cost. Int. J. Adv. Innov. Res. 2(3), 320–325
(2004)
8. Gupta, A., Maranas, C.D.: Managing demand uncertainty in supply chain planning. Comput.
Chem. Eng. 27, 1219–1227 (2003)
9. Harris, F.: Operations and Cost. AW Shaw Co., Chicago (1913)
10. Jang, J.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man
Cybern. 23(3), 665–685 (1993)
11. Jang, J.-S.R., Sun, C.-T., Mizutani, E.: Neuro-Fuzzy and Soft Computing-A Computational
Approach to Learning and Machine Intelligence. Prentice Hall, New York (1997)
12. Park, K.S.: Fuzzy set theoretical interpretation of economic order quantity. IEEE Trans. Syst.
Man Cybern. SMC 17, 1082–1084 (1987)
13. Paul, S.K., Azeem, A., Ghosh, A.K.: Application of adaptive neuro-fuzzy inference system
and artificial neural network in inventory level forecasting. Int. J. Bus. Inf. Syst. 18(3), 268–
284 (2015)
14. Pedrycz, W.: Neurocomputations in relational systems. IEEE Trans. Pattern Anal. Mach.
Intell. 13(3), 289–297 (1991)
15. Pedrycz, W.: Fuzzy neural networks with reference neurons as pattern classifiers. IEEE
Trans. Neural Networks 3(5), 770–775 (1992)
16. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and
organization in the brain. Psychol. Rev. 65(6), 386–408 (1958)
17. Garai, T., Chakraborty, D., Roy, T.K.: Fully fuzzy inventory model with price-dependent
demand and time varying holding cost under fuzzy decision variables. J. Intell. Fuzzy Syst.
36, 3725–3738 (2019)
18. Wang, X., Tang, W., Zhao, R.: Random fuzzy EOQ model with imperfect quality items.
Fuzzy Optim. Decis. Mak. 6, 139–153 (2007)
19. McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259
20. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
21. Zimmerman, H.J.: Using fuzzy sets in operational research. Eur. J. Oper. Res. 13, 201–206
(1983)
22. Kalaiarasi, K., Sumathi, M., Mary Henrietta, H., Stanley Raj, A.: Determining the efficiency of fuzzy logic EOQ inventory model with varying demand in comparison with Lagrangian and Kuhn-Tucker methods through sensitivity analysis. J. Model Based Res. 1(3), 1–2 (2020)
23. Stanley Raj, A., Srinivas, Y., Damodharan, R., Hudson Oliver, D., Viswanath, J.:
Presentation of neurofuzzy optimally weighted sampling model for geoelectrical data
inversion. Modell. Earth Syst. Environ. (2020). https://doi.org/10.1007/s40808-020-00935-2
Hybrid POS Tagger for Gujarati Text
Abstract. A POS tagger is one of the most essential components of Natural Language Processing applications for analyzing text. We perform a comparative study of the POS taggers available for the Gujarati language. The comparative study is based on the evaluation criteria Precision, Recall, and F1 score for each tag, along with the overall accuracy of the POS tagger. A hybrid approach for improving the accuracy of POS tagging has been proposed for the Gujarati language, which combines LSTM-based POS tagging and computational linguistic rules. We found that the accuracy of statistical taggers is improved by applying language-specific rules.
1 Introduction
sentence where the tags for different senses are different; e.g., the same word can be used as a common noun ("Goddess"), as a proper noun ("Devi"), as an auxiliary participle noun for the verb "to give", or as the verb "to give".
In the following section, a study of the existing taggers for Indian languages with respect to accuracy, dataset, and the approach used to build the POS tagger is carried out. The objective of this study is to develop a POS tagger for the Gujarati language with good accuracy. State-of-the-art techniques for POS taggers for Gujarati and other Indian languages are studied in Sect. 2. A comparative study to select the statistical POS taggers for the Gujarati language is conducted in Sect. 3. Gujarati language-specific rules for improving the accuracy of the statistical POS tagger are designed in Sect. 4. In Sect. 5, the experimental analysis and conclusion are discussed.
2 Literature Survey
The oldest POS tagger was part of the parser developed by Zellig Harris under the Transformations and Discourse Analysis Project, built using 14 handwritten rules [24]. The current trend has shifted towards machine learning-based approaches [10], for which the feature set, techniques, and dataset play a key role in developing a POS tagger for any language.
A lot of work has been carried out for English and other foreign languages, and state-of-the-art accuracy has been achieved, but much less work has been carried out for Indian languages. POS taggers for the Indian languages have been developed using the rule-based approach, the hybrid approach, and machine learning approaches consisting of Decision Tree, HMM, Maximum Entropy, SVM, and CRF techniques, as stated in the survey [12]; among all of these, the hybrid approach outperforms the other approaches for Indian languages. The comparison is based on accuracy, method, and dataset. According to the survey analysis, the machine learning approach has given good results only when the corpus has more than 25,000 words [12]. In this section, the focus is on studying the state-of-the-art techniques in the field of POS tagging for Indian languages, to find the best POS tagging techniques and features for improving the accuracy of Gujarati statistical POS taggers. The attention of this study is on the approach, technique, feature set, and domain of the corpus used for improving the accuracy of the POS tagger.
POS taggers are available for Indian languages, namely Hindi, Bengali, Punjabi, Dravidian languages, Malayalam, Marathi, Tamil, Sindhi, Telugu, and Gujarati [4, 6, 7, 12, 13, 15, 17]. Approaches used for Indian languages other than Gujarati are CRF [13], HMM [7], Maximum Entropy [7], SVM [7], the trigram model [15], and bidirectional LSTM [17].
From the above study conducted for languages other than Gujarati, the machine learning approach gives good results only when the corpus contains enough annotated quality data and language-specific features are provided. After studying other Indian languages, a study of the approaches used for the Gujarati language has been carried out.
Kapadia & Desai (2017) developed a POS tagger by adopting a rule-based approach for the Gujarati language using a suffix list as well as a list of nouns [11]. They manually created a corpus containing 500 sentences with 2514 words for testing purposes. The accuracy of this system is 87.48%. If the suffix of the current word does not match any of the suffix-based rules, the system fails to assign an appropriate tag.
Hala & Virani (2015) used a hybrid approach to improve the accuracy of POS tagging for the Gujarati language [9]. They tagged the data using an HMM model, and linguistic rules were applied to improve the accuracy of the POS tagger. Two rules are discussed. One of them is about verb identification: if the sentence has a structure in which an unknown word is followed by a noun followed by a pronoun, then it has to be a verb. This rule is not suitable for complex sentence structures. Consider the sentence [Their scooter has been sold out to Jay.] This example contradicts the rule defined for tagging verbs.

Table 1. Frequency of the POS tags in percentage

POS tag      Percentage
NN           27.33
PUNC         12.68
VAUX         8.78
VM           8.16
JJ           8.09
VAUX_VNP     6.27
PSP          4.53
RPD          3.16
DMD          2.96
NNP          2.91
PRP          2.85
CCD          2.54
NST          1.83
NEG          1.62
QTC          1.46
QTF          1.06
Patel & Gali (2008) developed a POS tagger for the Gujarati language using a semi-supervised machine learning approach [14]. They used CRF with 600 manually tagged sentences and 5,000 untagged sentences; the testing corpus contains 5,000 words. A few of the features used in this work are suffixes, prefixes, and number. The accuracy achieved for this POS tagger is 91.47%. Major errors are found in tagging nouns.
Garg et al. (2013) developed a POS tagger for the Gujarati language using CRF to improve the accuracy of named entity recognition [8]. For training, they used the corpus available on the TDIL website, which contains 25 documents, each containing 1000 sentences. The feature set is made up of 18 features, comprising the suffix and prefix of the word with length 3, the word length, the previous and next two words, the current word, and word combinations up to a context window of size five. Similarly, a POS tagger for the Gujarati language by the Knowledge Based Computer Systems Division of CDAC Mumbai [3] was developed using the Stanford library's Maximum Entropy model. They have not published its accuracy or the training and testing dataset details.
From the above study, we infer that language-dependent features are important for developing and improving the accuracy of a machine learning-based POS tagger. The accuracy can be further increased by adding language-specific rules. In addition, the domain and size of the corpus also affect the accuracy of the POS tagger.
As we are working on anaphora resolution for Gujarati text, we have developed a Gujarati sentence tokenizer, a chunker, and context-based gender and number identification of noun phrases for Gujarati text, where the POS tagger is an important component of the anaphora resolution system. It is used as a key feature not only for anaphora resolution but also for developing the other components, namely the chunker and the context-based gender and number identification of noun phrases for Gujarati text. Therefore, in this section, our main focus is to identify the best statistical Gujarati POS tagger in terms of accuracy, precision (P), recall (R), and F1 score (F1).
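As an illustration of how such per-tag scores can be computed from a tagged evaluation set, the short sketch below uses scikit-learn; the gold and predicted tag sequences shown are placeholders, not the actual TDIL evaluation output.

```python
from sklearn.metrics import classification_report, accuracy_score

# Placeholder gold-standard and predicted tag sequences, flattened over all tokens
gold = ["NN", "VM", "NN", "PSP", "JJ", "NN", "PUNC"]
pred = ["NN", "VM", "JJ", "PSP", "JJ", "NN", "PUNC"]

# Per-tag precision, recall, and F1 score, plus the overall accuracy
print(classification_report(gold, pred, zero_division=0))
print("Accuracy:", accuracy_score(gold, pred))
```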
The Gujarati monolingual TDIL corpus [21] is used to measure the accuracy of the statistical POS taggers. This corpus covers multiple domains, namely Art and Culture, Economy, Entertainment, Religion, Sports, and Science and Technology. There are in total 30,000 sentences with 3,50,298 tokens in this corpus. We found that around 1200 words were annotated with wrong tags, which were corrected. The tags that were corrected are: NN, NNP, QTC, QTF, and NEG.
By exploring the Gujarati text of the TDIL corpus, we find that the most frequent tags used in the Gujarati language are those listed in Table 1. It also shows that if nouns are identified accurately, the accuracy of the POS tagger increases, as their contribution in the Gujarati corpus is greater than that of the other tags.
There are five existing POS taggers for the Gujarati language, as discussed in the previous section. Among them, we have studied the POS tagger developed by the Knowledge Based Computer Systems Division of CDAC Mumbai using the Max-Entropy technique and the one developed by Garg et al. (2013) at IR Lab Gandhinagar using Conditional Random Fields (CRF), as these two taggers cover multi-domain datasets and are publicly available. The remaining three of the five taggers are not publicly available for further study. The accuracy of the CRF-based Gujarati tagger is reported, but the accuracy of the Max-Entropy-based Gujarati tagger is not. Both the CRF-based tagger and the Max-Entropy tagger were tested on the TDIL corpus, resulting in accuracies of 67.27% and 77.54%, respectively, which is below the state-of-the-art accuracy found in the literature, as shown in Table 2.
As a bidirectional LSTM POS tagger outperforms the existing taggers for the Tamil language [17], we have developed and added a bidirectional recurrent neural network with LSTM-based POS tagging for the Gujarati language, which supports long-term dependencies among the words in a sentence. At the LSTM layer, the output vector dimension is 256 and the activation function is tanh, which provides non-linearity and performs better than sigmoid [25]. Evaluating the activation functions ReLU and tanh for Gujarati POS tagging, tanh performed 1% more accurately than ReLU. The Adam optimizer is used with the optimized parameter values provided in [17]. The Adam optimizer with sparse categorical cross-entropy as the loss function performs better, as POS tagging is a multi-labelling problem in which the data is not encoded using one-hot encoding [26]. A batch of 128 samples per gradient update and 5 epochs are used for generating the model during training. The corpus is divided into 80% and 20% for the training and testing datasets, and 20% of the training dataset is used for validation. The accuracy of this tagger is listed in Table 2.
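For illustration, a minimal Keras sketch of such a bidirectional LSTM tagger is given below. It assumes the sentences have already been converted to padded index sequences; the vocabulary size, sequence length, tagset size, and embedding dimension are placeholders, while the 256-unit LSTM with tanh activation, the Adam optimizer, the sparse categorical cross-entropy loss, the batch size of 128, and the 5 epochs follow the configuration described above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # placeholder: number of distinct word indices
MAX_LEN = 60         # placeholder: padded sentence length
NUM_TAGS = 40        # placeholder: size of the POS tagset (including a padding tag)

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
    # Bidirectional LSTM with 256-dimensional output and tanh activation
    layers.Bidirectional(layers.LSTM(256, activation="tanh", return_sequences=True)),
    # Per-token probability distribution over the tagset
    layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax")),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X_train: (num_sentences, MAX_LEN) word indices; y_train: (num_sentences, MAX_LEN) tag indices
# model.fit(X_train, y_train, batch_size=128, epochs=5, validation_split=0.2)
```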
Table 2. (continued)

POS tag                                        Max-entropy tagger    CRF based tagger      RNN with bidirectional LSTM
                                               P     R     F1        P     R     F1        P     R     F1
Conjunction Coordinator (CCD)
Conjunction Subordinator (CCS)                 0.65  0.34  0.45      0.71  0.67  0.69      0.97  0.99  0.98
Conjunction Subordinator: Quotative (CCS_UT)   0.00  0.00  0.00      0.00  0.00  0.00      0.92  0.86  0.89
Default Particles (RPD)                        0.58  0.88  0.70      0.82  0.83  0.83      0.94  0.90  0.92
Classifier Particles (CL)                      0.00  0.00  0.00      0.00  0.00  0.00      0.00  0.00  0.00
Interjection Particles (INJ)                   0.93  0.18  0.30      0.00  0.00  0.00      0.82  0.49  0.61
Intensifier particles (INTF)                   0.00  0.00  0.00      0.00  0.00  0.00      0.00  0.00  0.00
Negation particles (NEG)                       0.99  0.67  0.80      1.00  0.66  0.79      0.99  0.99  0.99
Foreign word (RDF)                             0.94  0.35  0.51      0.04  0.04  0.04      0.73  0.21  0.32
Symbol (SYM)                                   0.97  0.94  0.96      1.00  0.92  0.96      1.00  0.99  1.00
Punctuation (PUNC)                             1.00  0.99  1.00      1.00  0.94  0.97      1.00  1.00  1.00
From Table 2, we observe that the accuracy of the Max-Entropy tagger developed by the Knowledge Based Computer Systems Division of CDAC Mumbai for the tags QTC, QTF, JJ, NN, NNP, NST, PSP, PRP, PRL, PRC, PRQ, DMI, VAUX, VAUX_VNP, INJ, NEG, RDF, and PUNC is better compared to the CRF-based tagger developed by Garg et al. (2013) at IR-lab Gandhinagar. Both taggers have equal accuracy for the tags PRF, DMR, DRQ, CCS_UT, and ECH. For the rest of the POS tags, the accuracy of the tagger developed by Garg et al. (2013) at IR-lab Gandhinagar is better compared to the tagger developed at CDAC-Mumbai. However, the RNN with bidirectional LSTM outperforms both taggers on every POS tag except RDF and QTC. The overall accuracies of the taggers developed at IR-lab Gandhinagar, at CDAC-Mumbai, and of the RNN with bidirectional LSTM are 67.27%, 77.54%, and 90.83%, respectively. In the next section, we design Gujarati language-specific rules to improve the accuracy of the Gujarati POS tagger.
of case markers, then the current word is assigned the tag "NN".
The accuracy of the Gujarati hybrid tagger, designed using the Gujarati statistical POS taggers and the designed rules including the lookup table, is measured on the Gujarati corpus [21], and Table 4 contains the tag-wise accuracy of the POS tags after applying the rules. In Table 4, the F1 scores of POS tags in bold font indicate the tags that improved. The accuracy of the POS tag QTF has decreased, as it gets confused with the POS tag JJ for words like [bidʒi] [second/other]. The rest of the tags are not much affected in the case of the Max-Entropy POS tagger developed by CDAC-Mumbai. After applying the rules, the accuracy of the tagger developed at IR-Lab improves to 75.16%, where only the adverb and main verb tags are slightly affected and their accuracy decreases. In the case of the RNN, we have not applied all the rules; we have applied rules only for the POS tags QTC, QTO, PRF, PRC, VAUX_VNP, CCS_UT, INJ, NEG, RDF, and SYM, which increases the accuracy up to 91.10%. Words adopted from other languages, namely Hindi and English, increase the error for the NNP tag. The POS tags NST, PRQ, and PRI are confused with PSP, DMQ, and DMI, respectively. If semantic knowledge is added, the accuracy can be improved. The dimension of the POS tagset also affects the accuracy of the POS tagger, and the domain of the training data as well as the testing data plays an important role in the accuracy of the POS tagger.
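The hybrid step can be pictured as a post-processing pass over the statistical tagger's output, in which a lookup table and suffix/context rules override selected tags. The sketch below is illustrative only: the case-marker list, the closed-class lookup table, and the transliterated tokens are placeholders, not the actual Gujarati resources used in this work.

```python
# Hypothetical lookup resources; the real lists are Gujarati-specific suffixes and words.
CASE_MARKERS = {"ne", "thi", "ma", "nu"}      # placeholder transliterated case markers
CLOSED_CLASS = {"ane": "CCD", "pan": "RPD"}   # placeholder closed-class lookup table

def apply_rules(tokens, tags):
    """Override statistical tags using a lookup table and simple context rules."""
    fixed = list(tags)
    for i, tok in enumerate(tokens):
        if tok in CLOSED_CLASS:                    # closed-class lookup overrides the predicted tag
            fixed[i] = CLOSED_CLASS[tok]
        # Context rule: a word followed by a case marker is tagged as a noun (NN)
        if i + 1 < len(tokens) and tokens[i + 1] in CASE_MARKERS:
            fixed[i] = "NN"
    return fixed

# Example: tags predicted by the statistical tagger, corrected by the rules
tokens = ["ram", "ne", "pan", "joyo"]
print(apply_rules(tokens, ["JJ", "PSP", "NN", "VM"]))
```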
Table 4. (continued)
POS tag       RNN with bidirectional LSTM    Max-entropy tagger    CRF tagger
              P     R     F1                 P     R     F1        P     R     F1
VAUX_VNP 0.96 0.79 0.87 0.96 0.34 0.51 0.96 0.27 0.42
CCD 0.93 0.93 0.93 0.94 0.7 0.8 0.94 0.70 0.80
CCS 0.99 0.96 0.98 0.94 0.99 0.96 0.80 1.00 0.89
CCS_UT 0.95 0.97 0.96 0.99 0.99 0.99 0.99 0.99 0.99
RPD 0.94 0.93 0.94 0.64 0.94 0.76 0.83 0.91 0.87
INJ 0.79 0.58 0.67 0.43 0.59 0.5 0.77 0.60 0.67
NEG 0.99 1.00 0.99 0.99 0.99 0.99 0.99 0.99 0.99
RDF 0.95 0.97 0.96 0.97 0.92 0.94 0.59 0.91 0.72
SYM 0.98 0.97 0.97 1 0.94 0.97 1.00 0.93 0.96
PUNC 1.00 1.00 1.00 1 1 1 1.00 0.96 0.98
Accuracy 91.10% 81.36% 75.16%
5 Conclusion
We have deduced that the corpus size and domain have a major effect on the performance of a POS tagger. For any language, the hybrid approach to developing a POS tagger is promising compared to purely rule-based and machine learning-based approaches. If linguistic rules are added to a machine learning-based tagger, with a lookup table for closed-class and open-class words and a proper suffix list, the performance of the tagger is improved. Even though different taggers are available for the Gujarati language, the hybrid approach using an RNN with bidirectional LSTM combined with language-specific rules outperforms the CRF and Max-Entropy taggers. Out-of-vocabulary words are tagged correctly with the help of the contextual features of the words. The proposed method can also be applied to various language processing applications such as shallow parsing, named entity recognition, etc. We are working to improve the accuracy of the hybrid tagger developed using the RNN with bidirectional LSTM and rules.
References
1. En.wikipedia.org, August 2018. https://en.wikipedia.org/wiki/Bengali_language
2. Gujarati-English Learner’s Dictionary, June 2003. https://ccat.sas.upenn.edu/plc/gujarati/
gujengdictionary
3. Knowledge Based Computer Systems Division of CDAC Mumbai. https://kbcs.in/tools.html
4. Antony, P.J., Soman, K.P.: Parts of speech tagging for Indian languages: a literature survey.
Int. J. Comput. Appl. 34(30), 1–8 (2011)
5. Antony, P.J., Mohan, S.P., Soman, K.P.: SVM based part of speech tagger for Malayalam.
In: International Conference on Recent Trends in Information, Telecommunication and
Computing, pp. 339–341 (2010)
6. Bharati, A., Mannem, P.: Introduction to the Shallow Parsing Contest for South Asian
Languages, Hyderabad, pp. 1–8 (2007)
7. Ekbal, A., Bandyopadhyay, S.: Part of speech tagging in Bengali using support vector
machine. In: Proceedings of the 2008 International Conference on Information Technology,
Bhubaneswar, pp. 106–111 (2008)
8. Garg, V., Saraf, N., Majumder, P.: Named entity recognition for Gujarati: a CRF based
approach. In: Prasath, R., Kathirvalavakumar, T. (eds.) MIKE 2013. LNCS (LNAI), vol.
8284, pp. 761–768. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-03844-5_74
9. Hala, S., Virani, S.: Improve accuracy of part of speech tagger for Gujarati language. Int.
J. Adv. Eng. Res. Dev. 2(5), 187–192 (2015)
10. Horsmann, T., Erbs, N., Zesch, T.: Fast or accurate? – A comparative evaluation of PoS
tagging models. In: International Conference of the German Society for Computational
Linguistics and Language Technology, pp. 22–30 (2015)
11. Kapadia, U., Desai, A.: Rule based Gujarati morphological analyzer. Int. J. Comput. Sci.
Issues 14(2), 30–35 (2017)
12. Mehta, D., Desai, N.: A survey on part-of-speech tagging of Indian languages. In: 1st
International Conference on Computing, Communication, Electrical, Electronics, Devices &
Signal Processing, March 2015
13. Motlani, R., Lalwani, H., Shrivastava, M., Sharma, D.M.: Developing part-of-speech tagger
for a resource poor language: Sindhi. In: Proceedings of the 7th Language and Technology
Conference (2015)
14. Patel, C., Gali, K.: Part-of-speech tagging for Gujarati using conditional random fields. In:
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages,
Hyderabad, pp. 117–122 (2008)
15. Singh, J., Joshi, N., Mathur, I.: Part of speech tagging of Marathi text using trigram method.
Int. J. Adv. Inf. Technol. 35–41 (2013)
16. Tailor, C., Patel, B.: Sentence tokenization using statistical unsupervised machine learning
and rule base approach for running text in Gujarati language. In: International Conference on
Emerging Trends in Expert Applications & Security, Jaipur (2018)
17. Gokul Krishnan, K.S., Pooja, A., Anand Kumar, M., Soman, K.P.: Character based
bidirectional LSTM for disambiguating tamil part of speech categories. IJCTA 10(3), 229–
235 (2017)
18. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceeding ICLR 2015,
pp. 1–15, January 2017
19. Bach, N.X., Linh, N.D., Phuong, T.M.: An empirical study on POS tagging for Vietnamese
social media text. Comput. Speech Lang. 50, 1–15 (2018)
20. Sheth, J., Patel, B.: Saaraansh: Gujarati text summarization system. IRACST 3(7), 46–53
(2017)
21. Gujarati Monolingual Text Corpus ILCI-II, Tdil-dc.in (2017). 1?option=com_download
&task=showresourceDetails&toolid=1882&lang=en. Accessed 05 May 2018
22. Tailor, C., Patel, B.: Context based gender identification of noun phrase for Gujarati text.
Presented at National Conference on Emerging Technologies in IT, Surat (2019)
23. Dakwale, P., Mujadia, V., Sharma, D.M.: A hybrid approach for anaphora resolution in
Hindi. In: International Joint Conference on Natural Language Processing, pp. 977–981
(2013)
24. Harris, Z.S.: String Analysis of Sentence Structure. Mouton, The Hague (1962)
25. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, London (2017)
26. Wei, L.: October 2018. https://cwiki.apache.org/confluence/display/MXNET/Multi-hot
+Sparse+Categorical+Cross-entropy
Optimization of ICT Street Infrastructure
in Smart Cities
Abstract. Technology solutions are now helping the urban local bodies in Indian smart cities to make smart and effective decisions. With these technologies coming up, IoT devices can be seen making their place on urban streets. This paper focuses on the overall view of the ICT (Information and Communication Technology) components and their placement at various locations on the streets of the city. The research on ICT street infrastructure, when applied to upcoming smart cities, would help in the optimization of ICT street infrastructure and bring down the total life cycle cost, in terms of the cost of ownership and the recurring expense for operation and maintenance, for deployments in upcoming cities.
1 Introduction
Supporting infrastructure includes junction boxes, OFC (Optical Fibre Cable), electric meters, poles, etc. All these hardware components form the basic components of street infrastructure. They are installed on streets, which usually occupy just 12% to 18% of the land of the city.
"Smartness" does not come from installing smart devices on traditional infrastructure. It is the use of technology to analyse the data gathered, to facilitate decisions and help improve citizen services so as to enhance quality of life. Quality has many dimensions; it includes everything from the air that citizens breathe to how they feel on the streets. There are dozens of smart sensors/devices installed on the streets to sense and communicate this information to the command centres [2].
This paper focuses on the overall view of the ICT (Information and Communication Technology) components and their placement at various locations in the city. This shows the complexity caused by haphazard and unplanned ICT street infrastructure. It was noticed that even the most advanced cities need to improve their fundamental street infrastructure considerably for all smart devices to function and to achieve the larger goal of city operations.
2 Methodology
The research commenced with background research, which helped derive the problem statement and research questions. This was followed by a literature review and a study of street design guidelines for different cities to understand the placement of ICT street infrastructure in cities. Case studies of various cities having different street infrastructure with IoT and ICT components were also studied (Chicago, New York, Columbia, and Pune). This was followed by primary data collection, which involved site visits to various locations in the city. Interactions with various stakeholders were carried out to understand the complexity in the implementation of ICT street infrastructure. Based on the data available from primary and secondary data collection, an analysis of the ICT street infrastructure in the city was done. An in-depth study and analysis of the current infrastructure was carried out to find the root causes of its high CAPEX and OPEX costs. Based on this research, a modular pole was designed, and a pilot study was conducted at one junction in City A.
3 Literature Review
The concept of smart cities can be looked upon as a framework for implementing a vision to help achieve the benefits of modern urbanization. The inclination to adopt the smart city model is driven by the need to surpass the challenges posed by traditional cities, as well as to overcome them in a systematic manner [3]. It is therefore crucial for cities to explore a shift towards adopting sustainable city development measures amongst all stakeholders, namely citizens, businesses, and the government. City-level interventions to support the development of urban environments that foster economic growth and enhance overall quality of life are more vital than ever, but they need to be tailored to the historical, spatial, political, socio-economic, and technological context of cities at different stages of urbanization [4].
As part of the literature review, the street design guidelines of Pune, New York, and Columbia were referred to. It was found that most Indian cities do not have street design guidelines. Better Streets, Better Cities by ITDP (Institute for Transportation and Development Policy) and EPC (Environmental Planning Collaborative) helped in understanding the principles of street infrastructure [3]. The streetlight policy and design guidelines by the Department of Transportation, District of Columbia, helped in understanding the methodology for change in street infrastructure [4]. The pre-standardization study report, Unified, Secure and Resilient ICT Infrastructure, by the Bureau of Indian Standards helped in understanding the standardization procedure adopted for the services [5].
AoT (Array of Things) in Chicago has essentially served as a "fitness tracker" for the city, measuring factors that impact livability in cities such as climate, air quality, and noise. A node is created which contains all the sensors, and it can be mounted on a traffic pole or a wall, whichever is feasible. The Department of Transportation, Chicago, is currently installing it on light poles, and the data is hosted on an open data portal for private companies and students to do research [6].
For all cities, the task was to fit all the elements onto the street template so that it defines a unique street, based on the requirements, size, location, and demand. The principles for the street design guidelines were public safety, environment, design, and aesthetics. Cities like New York have a street design guideline describing all the different elements that can be placed on a street [7].
“How to design and maintain a city is critical to creating a sustainable ecosystem—
one that provides not only for today’s needs but for the needs of future generations, and
one that takes not only humans into account but all life. To achieve this goal, cities
must end the ‘business as usual’ approach and become caretakers for both the people
they serve and the environment in which they live” [8].
followed later. Another image in Fig. 1 shows a location where a traffic light pole and a street light pole are installed adjacent to each other.
Ideally, this should be optimized so that expenses on the infrastructure can be reduced as all the solutions are implemented in the city, and the savings utilized for other projects. Here, this can be optimized by keeping both on the same pole and utilizing the existing street infrastructure to its optimum level.
Fig. 1. The image on the left shows Junction A, where three different poles are installed for different purposes. The image on the right shows Junction B, where two different poles are installed for different purposes.
All ICT street infrastructure to be placed goes through three phases, namely the pre-implementation phase, the implementation phase, and the post-implementation phase.
have common, unique, durable, and aesthetic infrastructure on the streets of the city. This would allow new ICT infrastructure coming up in the future to be installed on the common infrastructure available. This makes the scope of both vendors clear for the operation and maintenance of the field devices or infrastructure.
and electrical boxes are also different. This causes duplication of lines under the ground. The earthing for each is also different. This results in damage to the road during any maintenance activity and increases the cost of maintenance. The maintenance of different devices at different times means frequent damage to the road. In order to stop this, a simple procedure can be developed in the city.
All the modular poles need to be connected with each other. A common duct for electrical and network lines is to be laid as part of the connectivity for the poles. The current layout and the optimized layout of the common underground infrastructure at road junctions/intersections can be observed in Fig. 5.
Maintenance would involve only simple trenching to open the duct, thus minimizing the damage to the road. This activity need not be repeated for all devices, as they are all connected to the same line at one location. This would remove the duplication of work, decreasing the operation and maintenance activity and reducing the CAPEX and OPEX costs.
large complexity in the civil, supporting, and underground infrastructure, and O&M becomes more difficult with time.
In the proposed scenario there is identical street infrastructure at all locations, so there is no location-related complexity, and there is a single infrastructure at each location that needs maintenance. Having this identical infrastructure at all locations therefore reduces the complexity, and it is also easier for a new agency to maintain the street infrastructure when there is a change in the operations and maintenance contract.
6 Cost Analysis
The analysis was done for the cost incurred at each location in City A and the cost that could be saved if the street infrastructure at that location were optimized. The cost analysis was based on the commercial information available from the current system integrator and the quote received for the new optimized street infrastructure. Based on the site visits, the draft BoQ (Bill of Quantity) was filled with the quantities at each location. It was then analyzed how much the quantities would differ, in percentage terms, if all quantities were changed as per the proposed optimization. Figure 6 shows the analysis of the field infrastructure utilized in City A against the infrastructure that would be saved if the field infrastructure were optimized. Based on this analysis, it was observed that a further decrease of 15%–20% in quantity could be achieved if the field infrastructure were optimized.
Based on this BoQ calculation, as shown in Fig. 6, a cost analysis was done for 10 junctions in City A; this can be seen in Fig. 7. This helped in understanding the cost differences arising from the installation of modular infrastructure at junctions in the city.
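As a simple illustration of the arithmetic behind Figs. 6 and 7, the sketch below compares current and optimized BoQ quantities per item at one junction and reports the percentage cost saving; the item names, unit costs, and quantities are hypothetical placeholders, not the actual City A BoQ.

```python
# Hypothetical BoQ lines at one junction: (item, unit cost, current quantity, optimized quantity)
boq = [
    ("pole", 25000, 3, 1),
    ("junction box", 8000, 3, 1),
    ("electric meter", 3000, 3, 1),
    ("OFC (per m)", 120, 450, 300),
]

current_cost = sum(cost * cur for _, cost, cur, _ in boq)
optimized_cost = sum(cost * opt for _, cost, _, opt in boq)
saving_pct = 100.0 * (current_cost - optimized_cost) / current_cost

print(f"Current cost: {current_cost}, optimized cost: {optimized_cost}")
print(f"Estimated saving: {saving_pct:.1f}%")
```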
7 Conclusion
Building resilience comes from making investment decisions wisely and prioritizing the efficiency and utilization of the allocated resources. The higher the standardization of the infrastructure and processes, the higher the resilience of the city. There is a need to understand an urban project and its process of operation and maintenance in the city in order to contribute to the city's resilience.
With the change in street infrastructure, there are certain direct benefits that the ULB (Urban Local Body) can obtain, and there are some benefits that can be observed over time.
• Sharing of the resources for different ICT systems.
• All devices use the same infrastructure and supporting devices, thus removing duplicate infrastructure at the same location and decreasing the cost incurred for it.
• Improvement in financials by decrease in the lifecycle cost of asset (by decreasing
upfront cost and operations and management cost for the asset installed).
• Providing identical infrastructure throughout the city results in the unique character
of infrastructure in the city and helps to maintain the aesthetic consistency.
• Building city resilience to withstand the changes and adopt future technologies.
• Increasing the sustainability by decreasing the street infrastructure and increasing
the efficiency of the infrastructure procured.
The aim of this research was to study the current ICT infrastructure in smart cities and the ways available to optimize that infrastructure. The simplest way identified was to have modular poles that can house the current and upcoming IoT devices; there can also be separate research on developing a single device that can carry out the work of multiple IoT devices, as was done in developing the Array of Things in Chicago.
Based on the pilot study at a junction in City A, it was found that having a modular pole helped to reduce the components of street infrastructure in the city. It would help to reduce the cost of installation for the additional IoT devices that will be coming to the junction. Also, the aesthetics of the junction looked unique and similar on all road connections at the junction. Over a period of time, it can also help to reduce the complexity of operations and the cost of maintenance.
References
1. Ministry of Housing and Urban Affairs Government of India, “What is Smart City”. https://
smartcities.gov.in/content/innerpage/what-is-smart-city.php. Accessed 2020
2. ITU Academy: Smart Sustainable Cities ICT Infrastructure. https://www.itu.int/en/ITU-D/
Regional-Presence/AsiaPacific/Documents/Module%202%20Smart%20Sustainable%20Citi
es%20Infrastructure%20Draft%20H.pdf. Accessed 2019
3. World Economic Forum & PwC: Circular Economy in Cities Evolving the model for a
sustainable Urban Future (2018). https://www3.weforum.org/docs/White_paper_Circular_
Economy_in_Cities_report_2018.pdf. Accessed 02 2019
4. PricewaterhouseCoopers: Creating the smart cities of the future (2018)
5. ITDP (Institute for Transportation and Development Policy) & EPC (Environmental
Planning Collaborative): Better Streets, Better Cities. https://www.itdp.in/wp-content/
uploads/2014/04/07.-Better-Streets-Better-Cities-EN.pdf. Accessed 2020
6. District of Columbia: Street Light policy and Design Guidelines. https://comp.ddot.dc.gov/
Documents/Streetlight%20Policy%20and%20Design%20Guidelines.pdf. Accessed 2019
7. Bureau of Indian Standards: Pre Standardisation Study Report: Unified, Secure and
Resilient ICT framework for Smart Infrastructure. https://smartnet.niua.org/content/
f85221e9-4f9b-4b9d-a3e5-85ee63478b1d. Accessed 2019
8. Department of Transport, Chicago: Array of Things (AOT). https://arrayofthings.github.io/.
Accessed 2019
9. Department of Transport New York: Street Design Manual. https://www.nycstreetdesign.
info/. Accessed 2019
10. World Economic Forum: Inspiring Future Cities & urban services shaping the future of
Urban development & services initiative, April 2016. https://www3.weforum.org/docs/
WEF_Urban-Services.pdf. Accessed 2019
11. NIUA (National Institute for Urban Affairs): Smart Net. https://smartnet.niua.org/. Accessed
2019
12. World Economic Forum: Reforms to Accelerate the development of India's Smart Cities
Shaping the Future of Urban Development & Services, April 2016. https://www.weforum.
org/reports/reforms-to-accelerate-the-development-of-india-s-smart-cities-shaping-the-future
-of-urban-development-services. Accessed 2019
13. Pune Municipal Corporation: Urban Street Design Guidelines. https://www.itdp.in/wp-
content/uploads/2016/07/Urban-street-design-guidelines.pdf. Accessed 2019
14. Ministry of Housing and Urban Affairs, Government of India: Mission Statement and
Guidelines. https://smartcities.gov.in/upload/uploadfiles/files/SmartCityGuidelines(1).pdf.
Accessed 2020
Transfer Learning-Based Image Tagging
Using Word Embedding Technique
for Image Retrieval Applications
Abstract. In recent times, social media plays a vital role in day-to-day life, which increases the number of images that are shared and uploaded daily on public and private networks; it is therefore essential to find an efficient way to tag images for the purpose of effective image retrieval and maintenance. Existing methods utilize feature extraction techniques such as histograms, SIFT, and Local Binary Patterns (LBP), which are limited by their inability to represent images well. We propose to overcome this problem by leveraging the rich features that can be extracted from a convolutional neural network (CNN) that has been trained on millions of images. The features are then fed into an artificial neural network, which is trained on the image features and multi-label tags. The tags predicted by the neural net for an image are mapped onto a word embedding space, from which the most similar words for each given tag are retrieved. Along with this ANN, we include an object detection neural net, which provides additional tags for the image. The usage of an additional neural net and a word vector model adds the aspect of zero-shot tagging, i.e., the ability of the tagger to assign tags to images outside its own training class examples. The dataset utilized here is the Flickr-25k dataset.
1 Introduction
With the incredible outreach of the internet and the explosive growth of the smartphone market, millions of images are uploaded to the internet daily. Therefore, it is important to have an efficient way of tagging and keeping the images for efficient retrieval. It has already been established that text is used as a way of indexing images, so finding an efficient and effective way of tagging images is important. A tag is a piece of text that describes the content of an image or any digital media. Figure 1 illustrates the image tagging process of assigning tags to images based on the content of the image. The need for automated image tagging
arises due to the repeated spamming of the same tags and the assignment of useless tags to media content. Therefore, for effective retrieval of images, it is important to have tags that reflect the content of the image accurately. The process of image tagging can be divided into identifying the content in the image and assigning the tags properly. The usage of natural language processing in image processing [1–4] promises better results than the existing models.
1.1 Challenges
As mentioned earlier, the existing systems utilize features extracted from images, such as histograms, color histograms, and the Scale-Invariant Feature Transform, which are ineffective at modelling the image contents. This leads to poor performance when trying to predict the tags/contents of the image. Moreover, the time spent training a system to assign tags to images based on these features is very high. For the above reasons, we propose to utilize transfer learning to model the images better as well as to reduce the training time of the system. By using transfer learning, we extract the best features from the images, which ultimately leads to better performance compared to the existing systems.
2 Related Works
The previously proposed models use the concept of transfer learning but are mainly an extension of multi-label, multi-class classification. Jianlong Fu and Yong Rui [5], in their proposal, mentioned two ways of tagging images. One of them was based on neighbor voting to tag images, and the other was to use a CNN which employs multi-label, multi-class classification to tag the images based on the training set. Although both of these methods had a good mean Average Precision (mAP) score, they lacked the ability to assign tags that the neural net has not been trained on.
Zhang et al. [7], in their paper, proposed a method of zero-shot tagging based on associating an image with a tag and a principal direction on a word vector plane, but the word vector plane and the unseen tag set used by them are limited. They assign tags to the images based on the principal direction of the image features and rank the words in that direction. This is illustrated in the following Eq. 2:
\[
f(X_{m}) \approx W_{m} \tag{2}
\]
where \(f(X_{m})\) is the principal direction, \(X_{m}\) is the set of visual features, and \(W_{m}\) is the word along the principal direction, ranked linearly. By approximating this function \(f\), zero-shot tagging is achieved. Zero-shot taggers learn from the training classes but ultimately try to classify test images into unseen classes [8–10]. This is made possible by the usage of word embeddings, which ultimately help in zero-shot tagging. Our proposed method uses a GloVe vector model to provide the word embeddings and, based on the tags assigned by the neural net, assigns unseen tags to the images. So, the problem is reduced to training a neural network to tag the images and a word vector model to assign unseen tags to them.
3 Proposed Methodology
The previously built models lack the capacity to assign totally unseen tag sets to images. To overcome this problem, we propose to utilize a word vector model combined with a pre-trained CNN and a neural network to assign tags that are unseen by the neural network classifier.
Object Detection Net. The object detection net is a neural net that has been trained on the MSCOCO dataset [11] and detects the objects in the given image. We utilize only the top 3 objects detected by the neural net to augment the tags already predicted by TaggerNet. We remove any duplicate tags assigned by the object detection neural net.
GloVe Net. We use the GloVe model to generate word embeddings, which are then used to predict similar words. The GloVe model follows the TaggerNet, i.e., the GloVe model works on the output provided by TaggerNet. The tags returned by TaggerNet are converted into vectors using the GloVe model, after which all of the vectors included in the GloVe model are compared with the vectors of the tags returned by the TaggerNet model to find the most similar tags. We use cosine similarity as the measure to compare similarity. In this way, the two most similar tags are returned for each tag assigned by TaggerNet.
Thus, by combining the output returned by each component we obtain a set of tags that can be assigned to each image based on its content. These tags can be indexed and stored in a database, which can be used in image retrieval and search systems. The proposed Algorithm 1 is used to perform zero-shot tagging and augment the basic tags for the input image.
Initialize: TaggerNet, GloVe model, ObjectDetectionNet
Tags = TaggerNet.predict(Image)
Similar_tags = GloVe.most_similar(Tags)
Top_tags = ObjectDetectionNet.predict(Image)
Total_tags = []
Total_tags.append(Tags)
Total_tags.append(Similar_tags)
Total_tags.append(Top_tags)
Return: Total_tags
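To make the tag-augmentation step concrete, the following is a minimal Python sketch of Algorithm 1. It assumes a trained TaggerNet and object detection net that each expose a predict() method returning a list of tag strings, and a GloVe model loaded as gensim KeyedVectors; these names are illustrative assumptions, not the authors' implementation.

```python
def augment_tags(image, tagger_net, object_detection_net, glove,
                 top_k=3, n_similar=2):
    """Combine TaggerNet tags, GloVe-similar tags and detected objects (Algorithm 1)."""
    base_tags = list(tagger_net.predict(image))                  # tags from TaggerNet
    detected = list(object_detection_net.predict(image))[:top_k] # top-3 detected objects

    similar_tags = []
    for tag in base_tags:
        if tag in glove:                                          # skip out-of-vocabulary tags
            # two nearest words by cosine similarity in the embedding space
            similar_tags += [w for w, _ in glove.most_similar(tag, topn=n_similar)]

    total_tags = []
    for tag in base_tags + similar_tags + detected:
        if tag not in total_tags:                                 # drop duplicates, keep order
            total_tags.append(tag)
    return total_tags
```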
3.2 Training
The learning process of the proposed tagger consists of three different stages that are dependent on each other. The first stage is the training of the TaggerNet, the second stage is the training of the word embedding model, and the last stage is finding the neighbor tags in the word embedding model for the tags assigned by the TaggerNet in the first step.
As mentioned earlier in the architecture of the TaggerNet, we initialize a ResNet-50 neural net with ImageNet weights for feature extraction. After the
features are extracted, a neural net is created to learn the tags from the image
features. The neural net utilizes an Adam optimizer [12] with a learning rate of
0.001.
W_t = W_{t−1} − η · m_t / (√(v_t) + ε)   (4)
The changing of weights by the Adam optimizer is carried out based on the
formula given in Eq. 4, where Wt is the weight of the current step, Wt−1 is the
weight of previous step, η is the step size, mt and vt are bias corrected first and
second moment values. The structure of the neural net is depicted in Fig. 5.
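A minimal Keras sketch of this setup is given below. The frozen ResNet-50 backbone with ImageNet weights and the Adam optimizer with a learning rate of 0.001 follow the text, while the 224 × 224 input size and the 512-unit hidden layer are assumptions for illustration only (the paper's exact head is shown in Fig. 5).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_TAGS = 1386  # number of candidate tags in MIRFLICKR-25000 (Sect. 4)

# Frozen ResNet-50 backbone initialised with ImageNet weights (feature extractor).
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(224, 224, 3))
backbone.trainable = False

# Small fully connected head that learns tags from the extracted features.
tagger_net = models.Sequential([
    backbone,
    layers.Dense(512, activation="relu"),          # hidden size is illustrative
    layers.Dense(NUM_TAGS, activation="sigmoid"),  # multi-label tag scores
])

tagger_net.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"])
```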
The second stage involves training a word embedding model. The model is trained on a large corpus of data (the Brown corpus). The pre-processing steps involved in training the model include removal of stop words and lemmatization of the words. Then, we train the model for a set number of iterations, with a fixed embedding size and context window size.
After the TaggerNet is trained, the tags returned by it are vectorized using the word embedding model to find the two nearest neighbor words to each tag. The nearest word to a given tag is calculated using the cosine distance between the tag vector and a potentially similar word vector. In this way, we extract the two most similar words to each given tag.
D = (T · E) / (||T|| · ||E||)   (5)
The above Eq. 5 specifies the cosine distance D, where T is the vector of a tag returned by TaggerNet and E is the vector of a potentially similar word in the
vector space. We find the vector E that maximizes the value of D, which in turn means that the two vectors are more similar to each other. Given a set of tags Ti, we find the set of tags Wi that consists of the most similar words returned by the model, utilizing the cosine distance specified in Eq. 5.
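The nearest-tag search can be sketched with NumPy as follows; the vocab_vectors dictionary mapping words to embedding vectors is an assumed stand-in for the trained word embedding model.

```python
import numpy as np

def cosine_similarity(t, e):
    """Eq. (5): D = (T . E) / (||T|| ||E||)."""
    return float(np.dot(t, e) / (np.linalg.norm(t) * np.linalg.norm(e)))

def two_most_similar(tag_vector, vocab_vectors):
    """Return the two vocabulary words whose vectors maximise D for the tag."""
    ranked = sorted(vocab_vectors.items(),
                    key=lambda item: cosine_similarity(tag_vector, item[1]),
                    reverse=True)
    return [word for word, _ in ranked[:2]]
```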
4 Experimental Analysis
The dataset used to build the tagger is the MIRFLICKR-25000 dataset [13], which consists of 25000 images with their corresponding tags and ground truth tags. There are 24 ground truth classes and 1386 tags. The tags have semantic overlap, i.e., an image of a tree will be tagged as (tree, plant life). The distribution of each tag is shown in Fig. 6.
The neural net is trained for 15 epochs on 20000 images with a train and validation split of 80% and 20% respectively. After the neural net is trained, it is tested on the remaining 5000 images.
4.1 Results
We compare our neural network model's mean average precision (mAP) score with pre-existing works in Table 1, which shows that our model outperforms them. The Area Under Curve (AUC) of the precision-recall curve of each of the 24 classes of the MIR-FLICKR 25k dataset is specified in Table 2. The micro-averaged precision over all the classes is found to be 0.81. The graph in Fig. 7 represents the precision-recall curve of each of the 24 ground truth classes of the MIR-FLICKR 25k dataset. As we treat tagging as a form of multi-label
classification, we use cross entropy as the loss function coupled with the sigmoid function as the activation function. The variation of the loss and accuracy during each epoch is shown in Fig. 8.
Top-k accuracy is a performance evaluation metric used especially for multi-label classification models. The general idea behind this metric is that it takes into account the number of guesses taken by the model to correctly classify an image into a set of classes. Top-k accuracy means that the correct class must be among the top-k classes predicted with the highest probabilities by the model for the prediction to be counted as correct.
Table 2. Area under curve (AUC) of Precision-recall graph of the classes in FLICKR-
25k dataset.
Fig. 8. Accuracy and loss curve of the 24 classes of MIR-FLICKR 25k dataset during
training and testing
Fig. 9. Results obtained by the proposed image tagger method (Color figure online)
5 Conclusion
References
1. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word repre-
sentations in vector space. arXiv preprint arXiv:1301.3781 (2013)
2. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed represen-
tations of words and phrases and their compositionality. In: Advances in Neural
Information Processing Systems, pp. 3111–3119 (2013)
3. Socher, R., Bauer, J., Manning, C.D., Ng, A.Y.: Parsing with compositional vec-
tor grammars. In: Proceedings of the 51st Annual Meeting of the Association for
Computational Linguistics (vol. 1: Long Papers), pp. 455–465 (2013)
4. Tellex, S., Katz, B., Lin, J., Fernandes, A., Marton, G.: Quantitative evaluation
of passage retrieval algorithms for question answering. In Proceedings of the 26th
Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, pp. 41–47. ACM (2003)
5. Fu, J., Rui, Y.: Advances in deep learning approaches for image tagging. APSIPA
Trans. Signal Inf. Process. 6 (2017)
6. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word repre-
sentation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural
Language Processing (EMNLP), pp. 1532–1543 (2014)
7. Zhang, Y., Gong, B., Shah, M.: Fast zero-shot image tagging. In: 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition (CVPR), pp. 5985–5994.
IEEE (2016)
8. Frome, A., et al.: Devise: a deep visual-semantic embedding model. In: Advances
in Neural Information Processing Systems, pp. 2121–2129 (2013)
9. Akata, Z., Perronnin, F., Harchaoui, Z., Schmid, C.: Label-embedding for attribute-
based classification. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 819–826 (2013)
10. Gong, Y., Jia, Y., Leung, T., Toshev, A., Ioffe, S.: Deep convolutional ranking for
multilabel image annotation. arXiv preprint arXiv:1312.4894 (2013)
11. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D.,
Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp.
740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1 48
12. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014)
13. Huiskes, M.J., Lew, M.S.: The mir flickr retrieval evaluation. In: Proceedings of
the 1st ACM International Conference on Multimedia Information Retrieval, pp.
39–43. ACM (2008)
14. Makadia, A., Pavlovic, V., Kumar, S.: Baselines for image annotation. Int. J. Com-
put. Vis. 90(1), 88–105 (2010). https://doi.org/10.1007/s11263-010-0338-6
15. Li, X., Snoek, C.G., Worring, M.: Learning tag relevance by neighbor voting for
social image retrieval. In: Proceedings of the 1st ACM International Conference on
Multimedia Information Retrieval, pp. 180–187. ACM (2008)
16. Branson, S., Van Horn, G., Belongie, S., Perona, P.: Bird species categorization
using pose normalized deep convolutional nets. arXiv preprint arXiv:1406.2952
(2014)
Comparison of Deep Learning Models
for Cancer Metastases Detection: An
Experimental Study
1 Introduction
Cancer is one of the leading causes of death in the world. Breast Cancer (BC)
is the second most common type of cancer in women worldwide. According to a
report from WHO, 627,000 women died from BC in 2018. Studies have shown
that the incidence of BC is increasing globally [1]. Early detection, diagnosis
and treatment of BC leads to decline in mortality related to the disease [2,4].
Effective treatment of BC is attributed by determining the cancer stage. One of
the indicators to determine the BC stage involves pathological examination of
lymph node samples for the evidence of metastases [17]. However, the study of
the samples by professional pathologists takes a considerable amount of time.
There is also variability in the diagnostic results of the same sample between
different pathologists and by the same pathologist at different point in time.
2 Related Work
Convolutional Neural Networks (CNN) have experimentally been shown to be
effective at classification and segmentation tasks for histopathology image analy-
sis. Common tasks include classification of patches or whole slide images (WSIs)
into cancerous or non-cancerous or sub-types of cancer, segmentation of struc-
tures such as tumor and detection of nuclei [9].
Xu et al. proposed a stacked sparse autoencoder (SSAE) for nuclei detection [22]. The proposed method was able to outperform 9 other strategies for nuclei detection. The authors also proposed a method of using SSAE with softmax (SM) for classification of nuclei patches [21]. The SSAE+SM method performed better than Principal Component Analysis (PCA)+SM, SM and SAE+SM, with an AUC of 0.8992.
Feng et al. proposed a method for classification of nuclei from breast cancer histopathology images using a stacked denoising autoencoder (SDAE) with softmax [5]. The method outperformed other strategies with benign and malignant accuracies of 98.27% and 90.54% respectively. It was also able to outperform other architectures even with randomized pixel errors introduced into the data.
Hou et al. proposed a convolutional autoencoder (CAE) with crosswise sparsity for the tasks of detection, classification and segmentation of nuclei [7]. The crosswise sparsity enables feature representation of the background and foreground. The model performed well against other state-of-the-art methods on different datasets such as MICCAI and TCGA.
The dataset used in the study, PatchCamelyon, is derived from the Came-
lyon16 challenge. The challenge involved the classification of WSIs based on the
presence of metastatic tissue. It has been used as a bench-marking dataset for
many studies in Deep Learning [11,12,15,16,20].
The contribution of this study is the application of unsupervised features
extracted using Autoencoders for the classification of histopathology patches.
3 Dataset
In this work, we use the PatchCamelyon (PCam) dataset [18]. The PCam dataset
contains 327,680 hematoxylin and eosin (H&E) stained patches of sentinel lymph
node sections. The size of each patch is 96×96. Each patch in the PCam dataset
is associated with a binary label (positive or negative) indicating the presence
or absence of metastatic tissue in the patch. Thus, this dataset translates the
task of metastatic detection to a two-class image classification problem. If the
center of a patch contains at least one pixel of tumor tissue then the patch is
assigned a positive label otherwise it is assigned a negative label. The dataset
is split into train, validation and test sets in the ratio of 80:10:10 respectively.
The patches of the PCam dataset are derived from the Camelyon16 challenge
dataset that contains 400 H&E-stained whole slide images (WSIs) of sentinel
lymph node sections of breast cancer patients [3]. Table 1 shows the breakdown
of the PCam dataset into training, validation and testing sets. Figure 1 shows
sample patches from the dataset belonging to the two classes.
Table 1. Breakdown of PCam dataset into training, validation and testing sets.
PatchCamelyon Dataset
Training Set 262,144 patches
Validation Set 32,768 patches
Testing Set 32,768 patches
Total 327,680 patches
Fig. 1. Sample images from the PCam dataset belonging to the two classes.
4 Methodology
Common pooling layers include Max Pooling and Average Pooling, which compute the max and average of each pooling-size cluster (commonly 2 × 2) respectively.
Optionally, a Batch Normalization layer is used after the Convolutional layer.
Batch Normalization is a technique that is used to improve the stability and
performance of the CNN by re-scaling and re-centering the inputs [8].
Typically, a CNN consists of multiple Convolutional blocks that consist of
Convolutional, Pooling and Batch Normalization layers. At the end of the final
Convolutional block, the multi-dimensional output is flattened to a linear set of
neurons. The neurons are then fully-connected (every input neuron is connected
to every output neuron) with another set of neurons. This layer is called the
Fully-Connected layer. Finally, the output of the Fully-Connected layer is con-
nected to a set of neurons that represent the labels of the classification task. If the
classification task is binary-classification, then the sigmoid function is used. Oth-
erwise, multi-classification tasks use the softmax function. Equation (1) shows
the formula for sigmoid activation function where x is the input value. Equa-
tion (2) shows the formula for softmax activation function where xi ∈ X and X
is a set of logit scores.
Sigmoid(x) = σ(x) = 1 / (1 + e^(−x))   (1)

Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)   (2)
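The building blocks described above can be sketched in Keras for a 96 × 96 × 3 PCam patch as follows. The filter counts are illustrative assumptions, while the 576-neuron fully connected layer, the 0.5 dropout and the sigmoid output follow the ECNN details given in Sect. 5; this is a sketch, not the paper's exact ECNN.

```python
from tensorflow.keras import layers, models

# Two convolutional blocks (Conv -> BatchNorm -> MaxPool), then flatten,
# a fully connected layer with dropout and a sigmoid output for the
# binary metastatic vs. non-metastatic decision.
cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  input_shape=(96, 96, 3)),        # PCam patches are 96 x 96 x 3
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(576, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```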
4.2 Autoencoders
Autoencoders [14] are a type of neural network that can learn sparse encod-
ings in an unsupervised manner. Autoencoders consist of two separate networks:
Encoder and Decoder. The input to the Autoencoder is at the Encoder side, and
the output of the Autoencoder is at the Decoder side. The Encoder and Decoder
are connected via the Bottleneck where the sparse representation is located. The
model is generally trained by reconstructing the input data at the output side.
The training, like other feed-forward neural networks, is performed through backpropagation. Thus, the model learns to encode a representation that is close to
the original input data but sparse in shape. There are numerous types of Autoen-
coders including Sparse Autoencoders and Denoising Autoencoders [19] that are
utilized in learned representation applications, and Variational Autoencoders
[10] that are utilized in generative model applications.
Figure 2 depicts a generalized Autoencoder. Let X and X′ be the set of input nodes and output nodes respectively. If xi ∈ X, then h(xi) = σ(We · xi + b), where σ is the sigmoid function, We is the weight matrix and b is the bias. If x′i ∈ X′, then the reconstructed output is given by x′i = σ(Wd · h(xi) + b′). The weights and biases are updated during backpropagation. Ideally, the input data and the reconstructed data are identical: X ≈ X′.
The output feature maps of this block are then passed to a set of Convolution - Batch Normalization layers. The
Convolution layer is the same as the previous block with 64 filters and filter size
3 × 3. The output feature maps are then sent to the Decoder network.
The Decoder network follows the same order of layers as described in the
Encoder network but reversed. After the input of the Decoder passes through
a set of Convolution - Batch Normalization layers, it enters the first Convolu-
tional block. The first Convolutional block of the Decoder network uses the same
layers as described in the second Convolutional block of the Encoder network.
However, the Max Pooling layer is replaced by an Up Sampling layer that will
increase the dimensionality of the input in contrast to the Max Pooling layer.
The output of the first Convolution block then proceeds to the second Convo-
lutional block. The second Convolutional block of the Decoder network uses the
same layers as described in the first Convolutional block of the Encoder network.
Finally, the output of the second Convolutional block is passed through one set
of Convolution - Batch Normalization. The Convolutional layer has 3 filters and
filter size 3 × 3. The output of this set is the reconstructed image of dimensions
96 × 96 × 3.
The FCAE network is trained on the train set of the PCam dataset. After
training the FCAE network, the output encoded feature maps of the Encoder are
flattened and these flattened features are then used to train a Random Forest
classifier for classifying an image into positive (metastatic) or negative (non-
metastatic) class.
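A sketch of this pipeline is given below, assuming the trained Encoder half of the FCAE is available as a Keras model; the 128 estimators match the Random Forest setting mentioned in Sect. 5, and the function name is illustrative.

```python
from sklearn.ensemble import RandomForestClassifier

def fit_rf_on_encoded_features(encoder, x_train, y_train, n_estimators=128):
    """Flatten the encoder's output feature maps and fit a Random Forest."""
    features = encoder.predict(x_train, batch_size=512)   # encoded feature maps
    features = features.reshape(len(features), -1)        # flatten per patch
    rf = RandomForestClassifier(n_estimators=n_estimators, n_jobs=-1)
    rf.fit(features, y_train)                             # positive vs. negative label
    return rf
```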
At the end of the Encoder network, the feature maps are flattened into a linear set of neurons. The linear set of neurons is then connected to a Fully-Connected layer of 100 neurons. These neurons constitute the 'bottleneck'. After passing through the bottleneck, the neurons are connected to a layer of 576 neurons. These neurons are then reshaped back to a 3-dimensional array as input to the Decoder network. Figure 4 shows the network architecture of the CAEB network.
The CAEB network is trained on the train set of the PCam dataset. After
training the CAEB network, the output features of the bottleneck layer are
used to train a Random Forest classifier for classifying an image into positive
(metastatic) or negative (non-metastatic) class.
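The bottleneck portion of the CAEB can be sketched as follows; the (3, 3, 64) reshape target is an assumption chosen only because 3 × 3 × 64 = 576, as the exact decoder input shape is not stated here.

```python
from tensorflow.keras import layers

def bottleneck(encoder_output):
    """Flatten -> 100-neuron bottleneck -> 576 neurons -> reshape for the decoder."""
    x = layers.Flatten()(encoder_output)
    code = layers.Dense(100, activation="relu", name="bottleneck")(x)
    x = layers.Dense(576, activation="relu")(code)
    # 3 x 3 x 64 = 576; the exact 3-D shape is an assumption, not stated in the text
    return layers.Reshape((3, 3, 64))(x)
```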
5 Implementation Details
Experiments are conducted in order to verify whether the features extracted from the Autoencoder networks FCAE and CAEB are better than training the CNN network ECNN from scratch. For this, as mentioned earlier, all three models
FCAE, CAEB and ECNN are initially trained and validated using the train and
validation images of the PCam dataset. After training the FCAE and CAEB
networks, the features are extracted from them and the extracted features are
used to train a Random Forest classifier. Let the trained random forest classifiers be denoted as FCAE+RF and CAEB+RF, where FCAE+RF represents the RF classifier trained using features from the FCAE and CAEB+RF represents the RF classifier trained using features from the CAEB. The Random Forest classifiers use 128 estimators. After this, the FCAE+RF, CAEB+RF classifiers
and the trained ECNN model are tested using the test images from the PCam
dataset.
All the models in this work are written using Python 3.6 with Tensorflow 1.x
deep learning library and Keras as the API. The hardware and software environ-
ment consist of 2 Nvidia Tesla V100 GPUs with 32 GB dedicated memory and
Ubuntu 18.04 Operating System respectively. All the three models are trained
for 500 epochs and a batch size of 512 with early stopping callback. The callback
is used as a precautionary measure to prevent overfitting.
All the models used the Adam optimizer. The Mean Squared Error (MSE) loss function is used to train the FCAE and the CAEB models, whereas the Binary Cross-entropy (BCE) loss function is used to train the ECNN model. Equation (3) shows the formula for the MSE loss function, where y is the actual value and ŷ is the predicted value. Equation (4) shows the formula for the BCE loss function with the same notation. In ECNN, there is a Dropout regularizer layer of rate 0.5 after the fully connected layer of 576 neurons and before the sigmoid classification layer.
MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²   (3)

BCE = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]   (4)
Let positive class represent a patch having metastatic tissue and negative class
represent a patch not having metastatic tissue. The general confusion matrix
showing True Positive (TP), False Positive (FP), False Negative (FN) and True
Negative (TN) is shown in Fig. 6. The performance of the classification models is evaluated using Accuracy, Precision, Sensitivity/Recall, Specificity and F1-Score, defined as follows:
Accuracy = (TP + TN) / (TP + FP + TN + FN)   (5)

Precision = TP / (TP + FP)   (6)

Sensitivity/Recall = TP / (TP + FN)   (7)

Specificity = TN / (TN + FP)   (8)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (9)
The classification results of the various models are presented in this section. Table 2, Table 3, and Table 4 show the classification results in the form of confusion matrices for the three models, namely FCAE+RF, CAEB+RF and ECNN, respectively. Table 5 shows the Accuracy, F1-Score, Precision, Sensitivity and Specificity values of the models FCAE+RF, CAEB+RF and ECNN.
From the experimental results it is evident that the ECNN performed the
best among the three models for all the metrics. The Accuracy, F1-Score, Preci-
sion, Sensitivity, and Specificity values for the ECNN model are 0.7874, 0.8139,
0.7240, 0.9292, and 0.6454 respectively. Also, it is to be noted that the number of
parameters required to train the convolutional neural network (ECNN) is much
less when compared to both the autoencoder architectures (FCAE, CAEB). The
number of parameters that need to be trained (excluding classifier) in FCAE,
CAEB and ECNN are 261,507, 377,583 and 112,897 respectively. So, with less
training time, ECNN is able to generalize the dataset much faster. However, it
must be noted that ECNN extracts features in a supervised manner while the
feature extraction in both FCAE and CAEB networks is unsupervised. Thus,
there is an increase in the number of parameters to be learnt and the training
time for the FCAE and CAEB networks.
Between the autoencoder models, although CAEB encoded the features much more sparsely than FCAE, the experimental results show that this does not improve the classification performance. From Table 5 it can be noted that FCAE performed much better than CAEB. Neither the FCAE nor the CAEB network gave satisfactory results when compared with the ECNN model. Thus, it can be noted that a CNN model is preferable if we have large labelled training data, whereas when labelled data is scarce, autoencoder-based unsupervised feature extraction is a reasonable alternative.
7 Conclusion
In this study, deep neural network models such as autoencoders and convolutional neural networks are explored for the task of automatic metastases detection from histopathology images. Experimental results showed that CNNs are superior to autoencoders for the given task. The results also showed that autoencoders can be used for unsupervised extraction of sparse features that can be utilized for semi-supervised classification of histopathology images. This would allow for training of classifiers even with limited labeled training data. Future work includes optimization of parameters and architecture for better performance.
References
1. Breast cancer. https://www.who.int/cancer/prevention/diagnosis-screening/breas
t-cancer/en/. Accessed 02 Dec 2019
2. Cancer statistics. https://www.cancer.gov/about-cancer/understanding/statisti
cs. Accessed 02 Dec 2019
3. Bejnordi, B.E., et al.: Diagnostic assessment of deep learning algorithms for detec-
tion of lymph node metastases in women with breast cancer. JAMA 318(22),
2199–2210 (2017)
4. DeSantis, C.E., et al.: Breast cancer statistics, 2019. CA: Cancer J. Clin. 69(6),
438–451 (2019)
5. Feng, Y., Zhang, L., Yi, Z.: Breast cancer cell nuclei classification in histopathology
images using deep neural networks. Int. J. Comput. Assist. Radiol. Surg. 13(2),
179–191 (2018). https://doi.org/10.1007/s11548-017-1663-9
6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge
(2016)
7. Hou, L., et al.: Sparse autoencoder for unsupervised nucleus detection and repre-
sentation in histopathology images. Pattern Recogn. 86, 188–200 (2019)
8. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
9. Janowczyk, A., Madabhushi, A.: Deep learning for digital pathology image analysis:
a comprehensive tutorial with selected use cases. J. Pathol. Inform. 7, 29 (2016)
10. Kingma, D.P., Welling, M.: An introduction to variational autoencoders. arXiv
preprint arXiv:1906.02691 (2019)
11. Kong, B., Sun, S., Wang, X., Song, Q., Zhang, S.: Invasive cancer detection utilizing
compressed convolutional neural network and transfer learning. In: Frangi, A.F.,
Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI
2018. LNCS, vol. 11071, pp. 156–164. Springer, Cham (2018). https://doi.org/10.
1007/978-3-030-00934-2 18
12. Koohbanani, N.A., Qaisar, T., Shaban, M., Gamper, J., Rajpoot, N.: Significance
of hyperparameter optimization for metastasis detection in breast histology images.
In: Stoyanov, D., et al. (eds.) OMIA/COMPAY -2018. LNCS, vol. 11039, pp. 139–
147. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00949-6 17
13. Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.:
Machine learning applications in cancer prognosis and prediction. Comput. Struct.
Biotechnol. J. 13, 8–17 (2015)
14. Kramer, M.A.: Nonlinear principal component analysis using autoassociative neu-
ral networks. AIChE J. 37(2), 233–243 (1991)
15. Li, Y., Ping, W.: Cancer metastasis detection with neural conditional random field.
arXiv preprint arXiv:1806.07064 (2018)
16. Liu, Y., et al.: Detecting cancer metastases on gigapixel pathology images. arXiv
preprint arXiv:1703.02442 (2017)
17. Liu, Y., et al.: Detecting cancer metastases on gigapixel pathology images. CoRR
abs/1703.02442 (2017). http://arxiv.org/abs/1703.02442
18. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equiv-
ariant CNNs for digital pathology. In: Frangi, A.F., Schnabel, J.A., Davatzikos,
C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11071, pp.
210–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00934-2 24
19. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A., Bottou, L.:
Stacked denoising autoencoders: Learning useful representations in a deep network
with a local denoising criterion. J. Mach. Learn. Res. 11(12), 3371–3408 (2010)
20. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for
identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718 (2016)
21. Xu, J., Xiang, L., Hang, R., Wu, J.: Stacked sparse autoencoder (SSAE) based
framework for nuclei patch classification on breast cancer histopathology. In: 2014
IEEE 11th International Symposium on Biomedical Imaging (ISBI), pp. 999–1002.
IEEE (2014)
22. Xu, J., et al.: Stacked sparse autoencoder (SSAE) for nuclei detection on breast
cancer histopathology images. IEEE Trans. Med. Imaging 35(1), 119–130 (2015)
A Lightweight Hybrid Majority Vote Classifier
Using Top-k Dataset
1 Introduction
Human activity monitoring aims to capture the state of a subject and his/her environment using different sensors attached to the body and placed within a closed location, which enables context-aware applications. This permits continuous monitoring of an individual's physiological behaviour to provide personalized healthcare and improve individuals' well-being [1]. Human activities are identified automatically using supervised learning and classification algorithms in Human Activity Recognition (HAR) to predict and improve human behavior within their environment [1, 2]. Monitoring is achievable through the Smartphone, which is carried by its owner 24 h a day, notwithstanding its resource constraints, including: (a) computational power, (b) storage space, (c) memory allocation, (d) small screens and (e) battery life span. Most Smartphones are equipped with sensors such as the accelerometer, gyroscope, GPS, Bluetooth and light sensor, which unlocked context-aware computing [3]. Most researchers in HAR use the inexpensive accelerometer sensor to train and classify Human Activities (HAs) such as sitting, standing, laying, walking, running, jumping, cycling, etc. in real time using a Smartphone [1–6].
Supervised algorithms such as KNN, Naïve Bayes, Support Vector Machine and C4.5 with large training datasets can produce reliable classifiers, at the expense of the Smartphone's limited resources.
2 Related Work
human activities. The Smartphone recorded the forces applied along the X, Y and Z axes in m/s² for each activity at an interval of 0.5 s, collecting 120 samples per minute. Similar to the work of [9], time-domain features were extracted and used to train and test AdaBoost, J48, Support Vector Machines and Random Forest in WEKA. They [12] reported accuracy rates of 98%, 96%, 95% and 93% for each model in the order given. Most ensemble techniques in HAR combine tree-oriented algorithms, because these are effective with small personalized datasets and achieve accuracies of up to 98% [15–19].
Contrary to tree-oriented ensemble algorithms, the technique proposed by [20] combines a set of data-hungry Naïve Bayes classifiers as expert algorithms using a weighted majority voting rule similar to [14]. The technique observes both the experts and the true class for each sample of the training set in order to compute a conditional probability; once the conditional probability is learned from the training data, the combiner classifies each unknown sample using a weighted voting strategy. The overall rationale of their [20] technique is to jointly analyse expert responses to capture collective behavior when classifying a sample. Thus, their [20] decision classifier is based on all expert responses without considering the performance of single classifiers. The technique was evaluated using three datasets (MFeat, Optodigit and Pendigit) against bagging and boosting algorithms, and a slight increase in accuracy, averaging 97% across all datasets, was found. Another similar technique using data-hungry algorithms (Naïve Bayes Tree, KNN and C4.5) was proposed by [13] based on the combiner strategy of [14]. Their technique averages the probability voting rules used to combine the three hypotheses (h1, h2 and h3) of the learners. Thus, for each output class, a posteriori probabilities are generated by the individual learners, and the class with the maximum average posteriori probability is selected as the voting hypothesis (h*). They [13] performed experiments using 10-fold cross validation on 28 datasets in WEKA, and found that their proposed voting models perform better than the individual classifiers (Naïve Bayes Tree and KNN), with the exception of C4.5.
In this article we propose a new ensemble algorithm that integrates Gaussian Naïve Bayes (GNB) and KNN based on our split-then-vote strategy. Our technique combines GNB and KNN into a joint marriage of convenience based on integral conditionality to address the minor-features problem presented in [7, 21] and [22], using a small compressed dataset meant for resource-constrained Smartphones, since both algorithms perform far better with higher-dimensional datasets. We note that the GNB algorithm postpones data processing until classification time, similar to KNN, because GNB depends on the computation of the standard deviation of a new instance data point relative to the mean average of the training input predictors per class. Moreover, KNN is prone to the minor-feature misclassification problem that causes two similar attributes to be recognized far apart from each other due to limited training data [21–24]. Lastly, KNN uses a constant K value to control the number of nearest neighbors that determine the majority class used to classify a new instance data point, see Fig. 1.
3 Methodology
In this paper we propose a reliable and effective lightweight algorithm suitable for small personalized datasets and resource-constrained Smartphones. A combination of Gaussian Naïve Bayes (GNB) with KNN as a hybrid algorithm for small datasets is rare, because both algorithms heavily depend on large datasets. We combine both algorithms based on a joint majority vote probability function to include data points in each class as potential neighbors, to resolve the minor-feature misclassification indicated in Fig. 1. Our proposed split-then-vote combiner strategy in Fig. 2 is presented in three steps to reduce, split and combine GNB and KNN into the Lightweight Hybrid Majority Vote (LHMV).
In the next section we present the data collection method followed to collect the personalized dataset.
reducedTrainingTree_RS = (1/S) ∫_S^R Σ_{x=1}^{N} p_x   (1)

That is, all mean averages ordered as {μ1 ≤ μ2 ≤ μ3 ≤ ... ≤ μr} under the area of the CDF are normally distributed within the integral conditional function f(μr | rj) relative to the real-time instances RT = {r1, r2, r3, ..., rj}. Therefore, we compute the standard deviation σ using Eq. (6), for all mean averages μr of each class cm category in relation to the real-time data point rj:

σ = √( (1/(k−1)) Σ_{x=1}^{k} (r_x − μ_x)² )   (6)
All computed σ are accumulated as product probabilities into the vote list D_K using the Bayes theorem in Eq. (7), which is used to classify the real-time instance:

D_K = Π_{x=1}^{K} IGNB(μ_x | r_x)   (7)

E_K = (1/K) Σ_{x=1}^{k} DI(μ_x, r_x)   (8)

where DI is the computed Euclidean distance between all mean averages μ_x and real-time data points r_x within the integral conditional function f(μ_x ∪ r_x). All computed smallest distances are accumulated into the voting list E_K using Eq. (8) and (9):

DI(μ ∪ r_i) = √( Σ_{x=1}^{k} ∫_{r_x}^{μ_x} (μ_x − r_x)² )   (9)
Lastly, the voting lists D_K and E_K of IGNB and KNN are joined into a union of convenience D_K ∪ E_K as the joint majority vote, which increases the number of classification features available as potential neighbors within the K neighborhood. The joint majority vote probability is based on the existing principle of KNN, where D_K and E_K together form the Joint Voting (JV) of our LHMV, defined by Eq. (10):

JV = argmax_v Σ_{(D_k ∪ E_k) ∈ V_K} (D_k ∪ E_k) / K   (10)

where K is the number of potential neighborhood majorities in the D_k and E_k voting lists, corresponding to the predicted class labels in the vote list V_K for each class C = {c1, c2, ..., cm} in reducedTrainingTree_RN. Equation (10) determines the majority class by comparing D_k and E_k for the most frequently appearing class category in both majority vote lists. D_k holds the maximum probabilities of IGNB ranked in descending order, because Gaussian Naïve Bayes is a maximum-probability algorithm, whereas the vector E_k of KNN is sorted in ascending order; the top K neighbours are then matched, and the class category that appears most often in both sorted vectors is taken as the majority class and assigned to the new instance as its class label.
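The joint vote can be illustrated with the simplified Python sketch below. The class_means structure (mapping each activity class to its group mean-average vectors from the reduced training tree) and the per-group Gaussian likelihood are interpretations of the description above, not the authors' exact formulation.

```python
import numpy as np
from collections import Counter

def lhmv_predict(instance, class_means, k=3):
    """Simplified split-then-vote joint majority vote (cf. Eq. 10)."""
    gnb_votes, knn_votes = [], []
    for label, means in class_means.items():     # class -> array of group mean vectors
        for mu in means:
            diff = instance - mu
            sigma = np.std(diff) + 1e-9           # spread of the instance around the group mean
            # Gaussian likelihood per feature combined as a product (cf. IGNB, Eq. 7)
            prob = np.prod(np.exp(-(diff ** 2) / (2 * sigma ** 2))
                           / (np.sqrt(2 * np.pi) * sigma))
            gnb_votes.append((prob, label))
            # Euclidean distance between group mean and instance (cf. Eq. 9)
            knn_votes.append((np.linalg.norm(diff), label))

    top_gnb = [lbl for _, lbl in sorted(gnb_votes, reverse=True)[:k]]  # highest probabilities
    top_knn = [lbl for _, lbl in sorted(knn_votes)[:k]]                # smallest distances
    # Joint vote: the class appearing most often across both top-k lists
    return Counter(top_gnb + top_knn).most_common(1)[0][0]
```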
4 Experimentation
Our experiments aim at answering the question: "Will the integration of two simpler algorithms, KNN and Naïve Bayes, improve the accuracy and precision of human activity recognition using a compressed personalized dataset?". We simulated our split-then-vote LHMV in the R programming language and compared it with existing R algorithms (C4.5, Boosted Tree, Naïve Bayes, SVM, and Random Forest) [25–27]. As input, we partitioned our collected top-k personalized dataset of 2860 rows in a 0.7:0.3 ratio, i.e., 70% training and 30% test datasets respectively. Afterward, the 70% training dataset was normalized and transformed into 60 compressed groups of best mean averages per human activity, similar to [7, 22]. Consequently, we used our transformed and reduced class-tree as input to train our LHMV algorithm.
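A sketch of this data preparation step is given below, assuming a pandas DataFrame with numeric feature columns and an 'activity' label column; the contiguous-grouping strategy used to form the 60 mean-average groups is an assumption, since the exact grouping rule of [7, 22] is not restated here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def prepare_lhmv_input(df, label_col="activity", groups_per_class=60):
    """70/30 split, then compress the training part into groups_per_class
    mean-average rows per activity (all non-label columns assumed numeric)."""
    train, test = train_test_split(df, test_size=0.3, stratify=df[label_col])
    compressed = []
    for activity, block in train.groupby(label_col):
        block = block.reset_index(drop=True)
        # split the activity's rows into contiguous groups and average each group
        group_id = pd.cut(block.index, bins=groups_per_class, labels=False)
        means = block.drop(columns=[label_col]).groupby(group_id).mean()
        means[label_col] = activity
        compressed.append(means)
    return pd.concat(compressed, ignore_index=True), test
```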
The remaining 30% of the dataset is used for testing. We set the neighborhood K to 1, 3 and 5 for majority voting [22, 28]. We repeated the experiments 5 times to determine the consistency of each run. The results are given as confusion matrices, which include True Positives (TP), False Positives (FP), False Negatives (FN), Precision and Accuracy. We also compared our results with the existing R algorithms (C4.5, Boosted Tree, Naïve Bayes, SVM, and Random Forest) using default settings [25, 26, 28]. The results of our implemented LHMV algorithm using different K (1, 3 and 5) neighbors are presented in confusion Table 2, Table 3 and Table 4 respectively.
The results presented in Table 2, Table 3 and Table 4 reveal consistent overlap of
similar postural human activities with highest accuracy and precision irrespective of the
Table 2. LHMV confusion matrix with K nearest equal to 1. Rows are the actual activities; columns give the predicted counts for Walking slowly, Brisk walking, Jogging, Laying, Normal walking pace, Rope jumping, Running, Sitting, Standing, Walking downstairs and Walking upstairs, followed by Total, TP, TN, FP, FN, Accuracy (%) and Precision (%).
Walking slowly 11 3 0 0 13 0 1 0 0 0 8 36 11 24 1 0 97 92
Brisk walking 4 20 7 0 7 0 6 0 0 10 6 60 20 27 13 0 78 61
Jogging 0 5 21 0 0 4 16 8 0 3 2 59 21 16 22 0 63 49
Laying 0 0 0 9 1 0 0 0 48 0 2 60 9 48 3 0 95 75
Normal walking pace 8 6 1 0 12 0 0 0 0 24 7 58 12 45 1 0 98 92
Rope jumping 1 7 4 0 0 39 8 0 0 1 0 60 39 0 21 0 65 65
Running 0 8 13 0 1 2 34 0 0 2 0 60 34 0 1 0 57 97
Sitting 5 0 3 0 3 0 0 16 4 29 0 60 16 0 44 0 27 27
Standing 2 0 0 24 2 0 0 10 20 1 1 60 20 34 6 0 90 77
Walking downstairs 9 6 1 0 10 1 0 0 0 23 7 57 23 32 2 0 96 92
Walking upstairs 6 5 2 0 19 0 0 0 0 10 12 54 12 40 2 0 96 86
Table 3. LHMV Confusion matrix with K nearest equal to 3
(Rows are the actual activities; columns as in Table 2.)
Walking slowly 15 3 0 0 8 0 1 0 0 0 6 33 15 17 1 0 97 94
Brisk walking 4 20 6 0 5 1 7 0 0 11 6 60 20 26 14 0 77 59
Jogging 0 3 22 0 0 4 17 8 0 4 1 59 22 17 20 0 66 52
Laying 0 0 0 9 1 0 0 0 48 0 2 60 9 48 3 0 95 75
Normal walking pace 9 5 1 0 13 0 0 0 0 25 5 58 13 44 1 0 98 93
Rope jumping 0 2 4 0 0 51 2 0 0 1 0 60 51 0 9 0 85 85
Running 0 2 11 0 0 2 43 0 0 2 0 60 43 0 1 0 72 98
Sitting 2 0 3 0 2 0 0 41 1 11 0 60 41 0 19 0 68 68
Standing 2 0 0 7 0 0 0 10 39 1 1 60 39 17 4 0 93 91
Walking downstairs 6 7 2 0 10 1 0 1 0 29 3 59 29 26 4 0 93 88
Walking upstairs 6 5 1 0 17 1 0 0 0 15 10 55 10 43 2 0 96 83
Table 4. LHMV confusion matrix with K nearest equal to 5. (Rows are the actual activities; columns as in Table 2.)
Walking slowly 14 2 0 0 13 0 1 0 0 19 6 55 14 40 1 0 98 93
Brisk walking 3 25 5 0 4 0 6 0 0 13 4 60 25 24 11 0 82 69
Jogging 0 3 20 0 0 4 19 8 0 4 1 59 20 19 20 0 66 50
Laying 0 0 0 12 1 0 0 0 45 0 2 60 12 45 3 0 95 80
Normal walking pace 6 6 2 0 11 0 0 0 0 29 4 58 11 45 2 0 97 85
Rope jumping 0 3 4 0 0 50 2 0 0 1 0 60 50 0 10 0 83 83
Running 0 1 12 0 0 2 44 0 0 1 0 60 44 0 1 0 73 98
Sitting 1 0 3 0 0 0 0 42 3 11 0 60 42 0 18 0 70 70
Standing 2 0 0 3 0 0 0 10 44 1 0 60 44 13 3 0 95 94
Walking downstairs 3 6 1 0 13 1 0 1 0 31 3 59 31 25 3 0 95 91
Walking upstairs 5 4 1 0 21 1 0 0 0 13 10 55 10 43 2 0 96 83
value of the K neighbors. There is a balanced accuracy and precision of 95% and 83% respectively across similar walking activities (walking slowly, normal walking pace, brisk walking, and walking upstairs and downstairs). This accuracy is attributed to a majority of TN over TP due to the existing resemblance among walking activities owing to human leg strides and transitions, which was captured using the harmonic motion discussed in [21]. In all of Table 2, Table 3 and Table 4 there is a constant accuracy of 66% in the presence of overlapping similar human activities such as jogging and running, in comparison to the higher accuracy bracket of 83% to 85% and precision range of 83% to 85% for the rope jumping activity. The results also indicate steady accuracy and precision of 95% on static human activities. The presented results thus show that our LHMV algorithm with a small personalized dataset achieves high classification accuracy and precision, with a TN majority over TP, on complex human activities. The results presented in Table 2, Table 3 and Table 4 are summarized and compared with the existing R implementations of C4.5, Boosted Tree, Naïve Bayes, SVM, and Random Forest in terms of accuracy and precision in Fig. 3 and Fig. 4.
The comparison results show that our LHMV comes third after tree-oriented
classifiers (Random Forest and C4.5) and above Boosted Tree in most complex and
static human activities with accuracy above 80% and 90%. Hence, we can confidently
Fig. 3. Comparison of classification accuracy (%) of LHMV and existing classifiers per human activity.
Fig. 4. Comparison of classification precision (%) of LHMV and existing classifiers per human activity.
indicate that our split-then-vote algorithm is competitive and ranks alongside the tree-oriented algorithms, yet with a smaller compressed training dataset than C4.5 and Random Forest. The comparison results are significant considering the fact that our LHMV demonstrated superiority over similar algorithms such as KNN, Naïve Bayes and Support Vector Machines. Our LHMV algorithm showed a significant improvement over KNN and Naïve Bayes in the presence of minor features due to the reduced and compressed training instances; moreover, for all three different K values (1, 3 and 5) it consistently produced higher accuracy and precision compared to KNN. Furthermore, the results demonstrated that our LHMV algorithm is consistent and capable of classifying a group of similar postural human activities with high accuracy, due to a majority of TN over TP, for different K neighborhoods as compared to [7] and [22].
In this paper, we presented our LHMV algorithm, which is suitable for small personalized datasets on Smartphones. We combined two simple, intuitive and data-hungry algorithms (GNB and KNN) based on our own split-then-vote combining strategy. Moreover, the training dataset is compressed to select the best features, similar to the C4.5 algorithm, based on the best mean averages of the input predictors. The compressed dataset reduces the training time of the lazy classifier, yet improves classification accuracy and precision, with higher TN than TP, compared to its predecessors KNN and Naïve Bayes. We can conclude that our LHMV, as a union of convenience of KNN and Naïve Bayes, improved the classification accuracy and precision of these data-hungry algorithms using a small personalized dataset. In future we intend to implement and evaluate it using different datasets.
References
1. Anguita, D, Ghio, A., Oneto, L., Parra, X., Reyes-Ortiz, J.L.: A public domain dataset for
human activity recognition using smartphones. In: ESANN, pp. 437–442 (2013)
2. Zhang, M., Sawchuk, A.A.: USC-HAD: a daily activity dataset for ubiquitous activity
recognition using wearable sensors. In: Proceedings of the 2012 ACM Conference on
Ubiquitous Computing, 5 September, pp. 1036–1043. ACM (2012)
3. Parkka, J., Ermes, M., Korpipaa, P., Mantyjarvi, J., Peltola, J., Korhonen, I.: Activity
classification using realistic data from wearable sensors. IEEE Trans. Inf Technol. Biomed.
10(1), 119–128 (2006)
4. Reiss, A., Stricker, D.: Introducing a new benchmarked dataset for activity monitoring. In:
2012 16th International Symposium on Wearable Computers (ISWC), June 18, pp. 108–109.
IEEE (2012)
5. Su, X., Tong, H., Ji, P.: Activity recognition with smartphone sensors. Tsinghua Sci.
Technol. 19(3), 235–249 (2014)
6. Mannini, A., Intille, S.S., Rosenberger, M., Sabatini, A.M., Haskell, W.: Activity
recognition using a single accelerometer placed at the wrist or ankle. Med. Sci. Sports
Exerc. 45(11), 2193 (2013)
7. Kaghyan, S., Sarukhanyan, H.: Activity recognition using K-nearest neighbor algorithm on
smartphone with tri-axial accelerometer. Int. J. Inform. Models Anal. (IJIMA) 1, 146–156
(2012)
8. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
9. Kwapisz, J.R., Weiss, G.M., Moore, S.A.: Activity recognition using cell phone
accelerometers. ACM SIGKDD Explor. Newslett. 12(2), 74–82 (2011)
10. Catal, C., Tufekci, S., Pirmit, E., Kocabag, G.: On the use of ensemble of classifiers for
accelerometer-based activity recognition. Appl. Soft Comput. 31(37), 1018–1022 (2015)
11. Daghistani, T., Alshammari, R.: Improving accelerometer-based activity recognition by
using ensemble of classifiers. Int. J. Adv. Comput. Sci. Appl. 7(5), 128–133 (2016)
12. Gupta, S., Kumar, A.: Human activity recognition through smartphone’s tri-axial
accelerometer using time domain wave analysis and machine learning. Int. J. Comput.
Appl. 127(18), 22–26 (2015)
13. Gandhi, I., Pandey, M.: Hybrid ensemble of classifiers using voting. In: 2015 International
Conference on Green Computing and Internet of Things (ICGCIoT), pp. 399–404. IEEE,
October 2015
14. Chan, P.K., Stolfo, S.J.: Experiments on multistrategy learning by meta-learning. In:
Proceedings of the Second International Conference on Information and Knowledge
Management, pp. 314–323, December 1993
15. Reiss, A., Hendeby, G., Stricker, D.: A competitive approach for human activity recognition
on smartphones. In: European Symposium on Artificial Neural Networks, Computational
Intelligence and Machine Learning, ESANN 2013, Bruges, Belgium, pp. 455–460, 24–26
April 2013. ESANN (2013)
16. Reiss, A.: Personalized mobile physical activity monitoring for everyday life. PHD thesis in
Computer Science, Technical University of Kaiserslautern (2014)
17. Bao, L., Intille, S.: Activity recognition from user-annotated acceleration data. Pervasive
Comput. 1–7 (2004)
18. Ravi, N., Dandekar, N., Mysore, P., Littman, M.L.: Activity recognition from accelerometer
data. In: AAAI 2005, vol. 5, no. 2005, pp. 1541–1546, 9 July 2005
19. Lockhart, J.W., Weiss, G.M.: Limitations with activity recognition methodology & data sets.
In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and
Abstract. Every brand created by the company plays a vital role in its
recognition, differentiation and sustaining loyal customer base. These
brands are analysed at different stages of their life cycle to under-
stand their personality, sentiments and emotions that are perceived and
expressed by their customers, thereby helping the companies to achieve a
positive brand image. Real time experience data about brands are avail-
able in abundance in the form of short texts on Twitter. In this paper,
based on Plutchik’s wheel of eight primary emotions-joy, sadness, trust,
fear, surprise, disgust, anger and anticipation, we have proposed a brand
comparison tool which carries out emotion analysis on the tweets with
the help of Synonym based Naive Bayes retry approach, thus quantifying
emotions from tweets about the brands. The tool has achieved an overall
accuracy of about 82.5%. This brand comparison tool helps companies
to understand their strengths and shortcomings among their competi-
tors. The feedback from this tool can be used to realign the marketing
strategies and goals of the companies.
1 Introduction
Consumers provide their opinions about products and services in the form of
videos, photos, or texts. Among them, textual data provide a wider scope to eas-
ily analyse their emotions and sentiments. Twitter has evolved as a giant infor-
mation sharing platform over the years. It provides specific features of retrieving
the tweets with a keyword or hashtag for a specific period of time. As per a BrandWatch report in January 2020, Twitter has around 330 million monthly active users and 145 million daily active users contributing 6,000 tweets every second, amounting to 500 million tweets every day.
Artificial Intelligence is gaining profound importance in Brand Management.
Naive Bayes, Maximum Entropy and Support Vector Machines are some of the Machine Learning approaches used for classification. Of these, the Naive Bayes classifier has proved to be one of the most accurate text classification techniques for sentiment and emotion analysis. In addition, we implement a Synonym-based Naive Bayes retry approach in our tool to minimise the impact of data scarcity and of assuming a generic probability value when classifying tweets.
Maintaining the brand image is significant for any company to survive in this
competitive world. Positive brand image has a direct correlation to the profitabil-
ity of the company [5]. Hence, companies are investing in AI based technologies
for understanding the current positioning of their brands.
Till now, sentiment analysis has been given prime importance in classifying customer views. Sentiment analysis only categorises opinions as positive, neutral or negative feedback. The current scenario demands an in-depth understanding of customers' perceptions about the brand. This is possible by classifying their feelings with the help of emotion analysis, which in turn helps to get in line with the impulses and motives of the end users towards the brand. Hence we attempt to create a brand comparison tool that uses the above to identify the gap between a company's expectations of the brand and how it is recognised by the consumers.
This tool assists the brand managers and marketers to devise a strategic plan
for brand management.
The paper is organized as follows: Sect. 2 gives a brief explanation of related
work in the research area of sentiment and emotion analysis. Section 3 explains
the proposed approach to pre-process tweet data for effective categorization and
the enhanced implementation of a supervised classifier for emotion analysis.
Section 4 gives a brief overview of the performance evaluation metrics carried out
for the tool and the inferences derived from the outcomes of the tool. Section 5
concludes with discussing the potential works that can be carried out in the
future as an extension of this research work.
2 Related Works
In a technology driven world, customers’ opinions posted in digital format have
gained much importance like word of mouth. [1] speaks about the huge value of
Twitter and other social media to gauge public perception. Brand comparison
can be done at various levels such as equity [9], personality [11], emotion [10],
and sentiment [13]. It is found that most of the research has been carried out
Fig. 1. Plutchik’s wheel of eight primary emotions and twenty four secondary feelings
with Sentiment analysis. In [1], a tool called Geo-location specific Public Per-
ception Visualizer was designed where tweets were mined, manually annotated,
and classified into positive, negative or neutral sentiments using the Maximum
Entropy model. [13] focuses on understanding the opinions of people from Twit-
ter to compare top colleges in India using Sentiment Analysis. These tweets of
various colleges are widely classified into positive, negative ad neutral opinions
and are plotted in a graph. This paper focuses on effective data cleaning pro-
cesses including removal of URLs and elimination of duplicate tweets to avoid
spam reviews which we have also incorporated in our work.
Emotions are "preconscious social expressions of feelings and affect influenced by culture" [7]. Sentiments are explained as "partly social constructs of emotions that develop over time and are enduring". Opinions and emotions together form information that is the result of personal interpretation. Since feelings are mostly expressed impulsively as text on social media, at a raw emotional level rather than at a sentiment level, analyzing the emotions provides vital outcomes.
According to [3], data was extracted using the Twitter API and processed with Stanford NLP to rate the tweets using the SentiStrength method. Finally, each tweet was classified under one of five categories based on its overall score: worst, bad, neutral, decent and wonderful. The SentiStrength method focuses on categorising an emotion using a range of values based on its intensity. That research was aimed at finding the sentiment of a product, the iPhone 6. While that paper concentrated on analyzing a specific brand, our project analyzes various emotions across brands.
Text emotion classification is carried out mainly by Naive Bayes, Maximum Entropy and Support Vector Machine (SVM) under supervised learning techniques. Naive Bayes and SVM are very accurate in sentiment analysis [8]. However, the Naive Bayes model is faster and more accurate than the Maximum Entropy model, the K-nearest neighbor classifier and the Decision Tree classifier in text classification [14]. Our tool uses the Naive Bayes
3 Our Approach
Fig. 2. Flow chart of Twitter Emotion Analysis based on Plutchik’s wheel of emotions
for Brand Comparison using Synonym based Naive Bayes retry mechanism
Table 1. Sample training dataset for root words across eight primary emotions
A phrase with a single negative word, for example "not happy", is replaced with "sad", which is almost equal in emotion. This is achieved by identifying the appropriate word following the negative identifier ("not") up to the next punctuation mark, and fetching the antonym of that word from the online thesaurus website by establishing a URL connection to the site. The antonym is retrieved by crawling and parsing the website. These filtering steps help in better classification of data based on emotion and improve the accuracy of the tool.
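A minimal sketch of this negation-handling step is shown below; WordNet is used here as a self-contained stand-in for the online thesaurus lookup described above.

```python
import re
from nltk.corpus import wordnet  # requires nltk.download('wordnet')

def replace_negations(text):
    """Replace "not <word>" with an antonym of <word>, e.g. "not happy" -> "sad"."""
    def antonym(word):
        for syn in wordnet.synsets(word):
            for lemma in syn.lemmas():
                if lemma.antonyms():
                    return lemma.antonyms()[0].name()
        return None

    def repl(match):
        ant = antonym(match.group(1))
        return ant if ant else match.group(0)   # keep the phrase if no antonym is found

    return re.sub(r"\bnot\s+(\w+)", repl, text, flags=re.IGNORECASE)
```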
Only the most important forms of words, such as nouns, verbs, adjectives and adverbs, are retained for tweet emotion analysis. As a result of this process, unnecessary words are eliminated and the effectiveness of the tool is improved to get better results. The brand and its reviews, in the form of a series of root words, are inserted into a multimap with the brand name as the key for further processing.
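The content-word filtering and brand-keyed storage can be sketched as follows; NLTK's tokenizer and POS tagger are used here as a lightweight stand-in for Stanford CoreNLP, and tweets_by_brand is an assumed input structure mapping each brand name to its list of tweets.

```python
from collections import defaultdict
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' resources

CONTENT_TAGS = ("NN", "VB", "JJ", "RB")   # nouns, verbs, adjectives, adverbs

def build_brand_map(tweets_by_brand):
    """Keep only content words per tweet and store them keyed by brand name."""
    brand_map = defaultdict(list)
    for brand, tweets in tweets_by_brand.items():
        for tweet in tweets:
            tagged = nltk.pos_tag(nltk.word_tokenize(tweet))
            roots = [word.lower() for word, tag in tagged
                     if tag.startswith(CONTENT_TAGS)]
            brand_map[brand].append(roots)
    return brand_map
```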
Fig. 5. Stanford CoreNLP output with root words and their corresponding POS and
NER tags
Fig. 6. Comparison of brands with recent customer review tweets across eight primary
emotions
Once the tweets are analyzed and the emotional values for both brands are computed, the outputs are presented in the form of pie charts and bar charts as shown in Fig. 6, 7, 10 and 11. The pie chart of a particular brand quantifies the emotional impact the brand has over its consumers as expressed in the tweet data. The bar chart shows the comparison of the various emotional responses of consumers between brands. Based on the eight primary emotions calculated for a brand, the corresponding twenty four secondary feelings are derived according to Plutchik's wheel of emotions. This level of in-depth classification of emotions and feelings will help marketers to better understand the existing position of the brand in the minds of consumers. For example, optimism (a feeling) is a secondary dyad which is obtained by the combination of the primary emotions anticipation and joy.
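A sketch of deriving dyad feelings from the computed primary-emotion percentages is given below; only the eight adjacent (primary) dyads of Plutchik's wheel are listed, and averaging the two constituent scores is one simple combination choice, not necessarily the tool's exact rule.

```python
# Primary dyads of Plutchik's wheel: each feeling combines two adjacent
# primary emotions (e.g. optimism = anticipation + joy).
PRIMARY_DYADS = {
    "optimism": ("anticipation", "joy"),
    "love": ("joy", "trust"),
    "submission": ("trust", "fear"),
    "awe": ("fear", "surprise"),
    "disapproval": ("surprise", "sadness"),
    "remorse": ("sadness", "disgust"),
    "contempt": ("disgust", "anger"),
    "aggressiveness": ("anger", "anticipation"),
}

def derive_feelings(primary_scores):
    """Combine the two constituent primary-emotion percentages for each dyad."""
    return {feeling: (primary_scores[a] + primary_scores[b]) / 2.0
            for feeling, (a, b) in PRIMARY_DYADS.items()}
```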
Tweets about the brands from the past six months are also retrieved for in-depth analysis and understanding of the change in consumer emotions across time periods. These tweets are pre-processed and go through emotion analysis as discussed in the above steps. Figures 8 and 9 give a sample output of the effect of brand resonance over a period of six months. This helps marketers understand how the marketing campaigns and engagement programmes run by the brands influence consumers. Based on this analysis, they can judge whether the strategies undertaken by the companies over the past months have had a satisfactory influence on customers or not.
Fig. 7. Comparison of brands with recent customer review tweets across twenty four
secondary feelings
Table 2. Confusion matrix of actual and predicted tweets classification into eight
primary emotions for eight hundred samples
4.2 Evaluations
Recall (Eq. 2) and precision (Eq. 3) are computed as micro-averages over the L emotion classes, where TP_l, FP_l and FN_l denote the true positives, false positives and false negatives for emotion class l:

\[ \text{recall} = \frac{\sum_{l=1}^{L} TP_l}{\sum_{l=1}^{L} \left(TP_l + FN_l\right)} \qquad (2) \]

\[ \text{precision} = \frac{\sum_{l=1}^{L} TP_l}{\sum_{l=1}^{L} \left(TP_l + FP_l\right)} \qquad (3) \]
F1-Score. The F1-Score measures the accuracy of a model by combining precision and recall into their harmonic mean, as given by Eq. 4. This metric provides more insight into how accurately our tool performs. The F1 scores for all emotion classes computed by our tool range from 0.78 to 0.89.
\[ F1\text{-}Score = \frac{\sum_{l=1}^{L} 2\,TP_l}{\sum_{l=1}^{L} \left(2\,TP_l + FP_l + FN_l\right)} \qquad (4) \]

\[ \text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (5) \]
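For concreteness, the micro-averaged metrics of Eqs. 2–4 can be computed directly from a confusion matrix like the one in Table 2; the small three-class matrix below is illustrative only, not the paper's eight-emotion data.

```python
import numpy as np

# Illustrative 3-class confusion matrix (rows = actual, columns = predicted);
# the paper's Table 2 is an 8x8 matrix over the primary emotions.
cm = np.array([[50,  3,  2],
               [ 4, 45,  6],
               [ 1,  5, 44]])

tp = np.diag(cm)              # true positives per class l
fp = cm.sum(axis=0) - tp      # predicted as class l but actually another class
fn = cm.sum(axis=1) - tp      # actually class l but predicted as another class

recall    = tp.sum() / (tp + fn).sum()                 # Eq. (2), micro-averaged
precision = tp.sum() / (tp + fp).sum()                 # Eq. (3), micro-averaged
f1_score  = 2 * tp.sum() / (2 * tp + fp + fn).sum()    # Eq. (4)

print(f"recall={recall:.3f} precision={precision:.3f} f1={f1_score:.3f}")
```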
Table 3. Output of performance evaluation metrics for our Brand Comparison tool
4.3 Findings
We chose the hospitality industry for brand comparison, selecting the top brands OYO Rooms and Airbnb. For each brand, 2000 recent tweet reviews are extracted and processed with the brand comparator. From the output it is inferred that OYO Rooms has sadness as its top emotion, at 15.6% among the customers who expressed their views on Twitter, while Airbnb has joy as its top emotion, at 16.7%. Two opposing primary emotions, joy and sadness, are thus obtained as output in this scenario. With respect to feelings, guilt, which is a combination of joy and fear, is the highest feeling received for Airbnb, while remorse, a combination of sadness and disgust, is the most frequent feeling for OYO Rooms. Extended brand comparison is carried out over a period of six months. It can be deduced from Figs. 8 and 9 that the top two emotions, joy and sadness, remain the same for Airbnb and OYO Rooms; a direct relationship between the two emotions can be seen for Airbnb, whereas a large fluctuation in these two emotions is observed for OYO Rooms.
Fig. 10. Emotion Analysis of Twitter reviews for Airbnb
Fig. 11. Emotion Analysis of Twitter reviews for OYO Rooms
5.1 Conclusion
This module can be attached to existing sentiment analysis tools to gain better insights at the emotional level. A multi-brand comparison tool can be developed as an extension of this project, with an approach that further improves accuracy.
References
1. Aggarwal, A., Singh, A.K.: Geo-localized public perception visualization using
GLOPP for social media. In: 8th IEEE Annual Information Technology, Electronics
and Mobile Communication Conference (IEMCON), pp. 439–445 (2017)
2. Kumar, A., Dogra, P., Dabas, V.: Emotion analysis of Twitter using opinion min-
ing. In: 2015 8th International Conference on Contemporary Computing (IC3), pp.
285–290 (2015)
3. Geetha, R., Rekha, P., Karthika, S.: Twitter opinion mining and boosting using
sentiment analysis. In: International Conference on Computer, Communication,
and Signal Processing (ICCCSP), pp. 1–4 (2018)
4. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown,
D.: Text classification algorithms: a survey. Information 10(4), 150 (2019)
5. Thomas, H., Sondoh Jr, S., Mojolou, D., Tanakinjal, G.: Customer relationship
management (CRM) as a predictor to organization’s profitability: empirical study
in telecommunication company in Sabah (2018)
6. Abdul-Mageed, M., Ungar, L.: EmoNet: fine-grained emotion detection with gated
recurrent neural networks. In: Proceedings of the 55th Annual Meeting of the Asso-
ciation for Computational Linguistics (vol. 1: Long papers), pp. 718–728 (2017)
7. Munezero, M., Montero, C.S., Sutinen, E., Pajunen, J.: Are they different? affect,
feeling, emotion, sentiment, and opinion detection in text. IEEE Trans. Affect.
Comput. 5(2), 101–111 (2014)
8. Yu, B.: An evaluation of text classification methods for literary study. Literary
Linguist. Comput. 23(3), 327–343 (2008)
9. Stocchi, L., Fuller, R.: A comparison of brand equity strength across consumer
segments and markets. J. Prod. Brand Manag. (2017)
10. Becheur, I., Bayarassou, O., Ghrib, H.: Beyond brand personality: building
consumer–brand emotional relationship. Glob. Bus. Rev. 18, S128–S144 (2017)
11. Srivastava, K., Sharma, N.K.: Consumer perception of brand personality: an empir-
ical evidence from India. Glob. Bus. Rev. 17(2), 375–388 (2016)
12. Le, B., Nguyen, H.: Twitter sentiment analysis using machine learning techniques.
In: Le Thi, H.A., Nguyen, N.T., Do, T.V. (eds.) Advanced Computational Methods
for Knowledge Engineering. AISC, vol. 358, pp. 279–289. Springer, Cham (2015).
https://doi.org/10.1007/978-3-319-17996-4_25
13. Mamgain, N., Mehta, E., Mittal, A., Bhatt, G.: Sentiment analysis of top colleges in India using Twitter data. In: International Conference on Computational Tech-
niques in Information and Communication Technologies (ICCTICT), pp. 525–530
(2016)
14. Wang, Y., Fu, W., Sui, A., Ding, Y.: Comparison of four text classifiers on movie reviews. In: 3rd International Conference on Applied Computing and Information
Technology/2nd International Conference on Computational Science and Intelli-
gence, pp. 495–498 (2015)
A Smart Card Based Lightweight Multi Server
Encryption Scheme
Pranav Vyas
1 Introduction
These days the world is increasingly becoming app oriented. Apps are used for different purposes, from purchasing groceries and electronics to financial transactions to health and fitness. While using these apps, users share a large amount of personal information that is sensitive in nature, and they receive many services over the internet in return for payment. Currently, in India, one of the preferred modes of payment for online transactions is the debit card. The usage of a debit card for payment serves multiple purposes: 1) from the perspective of the issuing authority it is easier, as giving a single card to the customer reduces the cost of production; 2) it keeps the user's actual account number hidden; 3) a single card can be used on multiple platforms for payment towards a variety of services; 4) since it is just a single card, managing its information is much easier for the user.
A debit card is associated with a single PIN. This feature makes it very suitable for use across multiple services: a user can use the debit card to pay for a number of services provided by various platforms on the internet. However, if a user wishes to subscribe to a new service, he/she must register with the new platform and provide identity details to it. A debit card can effectively be used in instances such as online grocery shopping, reservation or ticket booking, and payment for a variety of utilities. A malicious user can keep a close watch on the activities of the user and gather information that can be used to reveal debit card details or similar confidential information. This can result in misuse of cards or even identity theft. The scenario highlights a major drawback of using a single set of information (debit card details) on multiple platforms. This issue can be addressed with remote authentication.
2 Background
reinforce the encryption. Figure 1 shows the rotation of the matrix used for scrambling the bits, message expansion is shown in Fig. 2, and Fig. 3 shows the overall encryption technique. Assume that the original message is: 11111111 10100001 11111111 10001110 11111111 11001111 11111111 10111100.
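Figures 1–3 are not reproduced here, so the snippet below is only a generic illustration of packing the 64-bit example message into an 8 x 8 bit matrix and rotating it; it is not the paper's actual scrambling or message-expansion procedure.

```python
# Generic illustration only: pack the 64-bit example message into an 8x8 bit
# matrix and rotate it 90 degrees clockwise to scramble bit positions. This is
# NOT the exact scheme of Figs. 1-3, which are not reproduced here.
message = ("11111111 10100001 11111111 10001110 "
           "11111111 11001111 11111111 10111100").split()

matrix = [[int(bit) for bit in byte] for byte in message]   # 8x8 bit matrix
rotated = [list(row) for row in zip(*matrix[::-1])]         # 90-degree clockwise rotation

print(["".join(str(b) for b in row) for row in rotated])
```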
In this section, we present our proposed system for multi-server authentication. Two major issues with multi-server authentication systems are mutual authentication and non-repudiation; the lack of these characteristics can result in impersonation-type attacks by malicious users. Our authentication scheme overcomes these issues.
4 Conclusion
Firmware Attack Detection on Gadgets Using
Ridge Regression (FAD-RR)
1 Introduction
Firmware security is fairly weak, and firmware threats are observed less often than the vulnerabilities that threaten certain industries [22]. Since exploiting firmware by altering it is not easy, targeted programs are more quickly and effectively used for traditional malicious operations such as breaching personally identifiable information (PII). Modified firmware, however, is installed on a specific set of machines and therefore offers a somewhat wider hit rate than a vulnerability attacking operating systems such as Windows, or software that is widely used on machines [23].
Firmware bugs give unauthorized parties keys to systems, often unseen. In this way, manipulated firmware can exploit a computer before it even boots
[17]. This is achieved by inserting malware into the bottom-layer framework that manages the devices during the controller's start-up. Once the unauthorized code has reached the device, it is able to change the hardware, subvert the OS, penetrate other devices, and much more. Multiple modes of attack apply to both basic BIOS and modern UEFI devices. Spam, key packs and kernel sets are all common portals for distribution [18].
As in other technology deployments, there will be bugs. If the firmware stack is ignored over time, exposed vulnerabilities can be used for attacks [21], and deep-rooted vulnerabilities are likely to be discovered. UEFI plays a crucial part in safely loading applications, and it presents hackers with an attractive attack surface [19]. Formerly assumed to be abused only by nation states, recent work and attacks have also changed assumptions regarding the flaws of CPU firmware. For UEFI protection, the main aim is to move beyond a whack-a-mole strategy of removing and fixing individual flaws. The industrial sector can ensure that the entry barriers for hackers stay high only through sustained developments in technology and best practices, the creation of advanced digital innovations, and improved cooperation with partner communities [20].
For hardware, it is the device software, including the BIOS and its more recent substitute UEFI, that is the main component to be considered [25]. This machine firmware is fairly strong, but it is only one of the tens of firmware-dependent modules in consumer machines capable of performing important functions in an attack. Such essential firmware can be the first code to run when the machine starts; changing it by modifying the startup configuration or altering the OS framework jeopardizes hypervisors and the compute nodes as well [29]. Device firmware can also be used to distribute harmful software to other machine modules.
CPU loopholes can be used to access data, including code, that usually stays in restricted storage and would otherwise remain secure. The capacity of CPU bugs to be abused remotely often increases the harm. CPU flaws affect every device and kernel: they can be exploited even when no other program-level vulnerabilities exist and all OS security measures have been activated. In addition, the deployment of firmware fixes, processor microcode fixes, OS security patches, VMM patches, and application bug fixes presents problems when remediating processors [28].
The PCIe bridge links to a vast range of essential devices, including GPUs and communication adapters, among many others [10]. Malicious code can therefore affect both the firmware of PCIe chips and BIOS-connected computers. If infected, destructive breaches allow machine memory to be read and written and unauthorized code to be run inside the victim's kernel [12].
The firmware attack strategy is also close to that of conventional malicious attacks. The first step may be delivery by spoofing, drive-by downloads, or online media [13]. The intruder may then target the weaknesses of the compromised software in order to deploy an exploit, or use further resources, such as vulnerable drivers, for increased privileges and software control if required [9]. Both alternatives are rather simple for an intruder: the first is the easy use of a loophole, and the second reflects the traditional keylogger/dispenser pattern used by hackers for decades.
Firmware threats involve a collection of components and methods that are not commonly used in conventional attacks on devices. In this article we demonstrate some of the key firmware elements in a system, how they can be exploited as a major part of an attack, and which tactics and technologies attackers utilize. The coverage is neither comprehensive nor authoritative, but is meant to address the core firmware attack paradigms [15].
2 Related Work
Roy et al. [1] note that society has developed to a level where it is hard to conceive of life without communication links among computers and individuals. The emergence and development of integrated machines and "things" is the result of the latest wave of industrialization, and the Internet of Things (IoT) is now the basis of connectivity between computers and people. The longer products remain on the market, the more their software needs to be upgraded in order to withstand attacks. Each system must be upgraded to the newest firmware to remain stable and function efficiently; their article proposes a secure firmware upgrade framework over a blockchain-based, distributed-trust smart home platform repository.
Kumar et al. [2] studied embedded systems on which cloud services and new software operate daily. Updates are received on IoT gadgets via the web as Over-The-Air (OTA) updates. OTA upgrade functionality can be exploited in the absence of appropriate protection steps: security challenges such as decoding of the firmware and installation of illegitimate code on compromised devices can result in theft, device cloning and packet-sniffing attacks. Their article proposes a security architecture for a stable OTA update phase for SoC users. The holistic approach promotes JTAG protection, preserves the IP rights of OEMs and provides secure OTA updates. The security mechanism is designed to resolve the safety risks associated with OTA firmware upgrades that were not handled by previous techniques, by means of correct protocols and configuration controls.
Thakur et al. [3] analyze how, conventionally, intermittent disruptions and degradations in system output almost always culminated in an on-site update of the device firmware. Engineers needed to be on the ground to re-flash the system from their machine and update or fix the code to place the product back in the field. For companies, this entire method is very difficult and unscalable. Over-the-air firmware upgrade has therefore become a popular way to upgrade smart devices without interruption. Their document demonstrates the research being done in this field by creating a system for updating the firmware of diverse IoT gadgets from a distance [5].
Dhobi et al. [4] propose a secure firmware update scheme for embedded equipment that uses TrustZone technology to install new software in a reliable way. TrustZone allows the software to be installed on the system while preserving its integrity and protection. It checks the validity, integrity and security of the firmware using a trusted application together with the host program. The authenticity of the firmware, which can be verified safely in the TrustZone environment, is checked using a unique identifier method [15].
Ridge regression is a multiple linear regression technique for least squares estimation. When multicollinearity exists, the least squares estimates are unbiased, but their variances are large, so they may be far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors [16], and the overall result is expected to give more accurate forecasts. Regularization is a means of preventing overfitting by penalizing highly weighted regression coefficients; simply speaking, it reduces (simplifies) and shrinks the model parameters. A simpler, more parsimonious model generally performs well in forecasting. Regularization places constraints on more complicated models, ranking alternative models from less overfit to more overfit; the least overfit model generally offers the better option for predictive value [11].
Fig. 1. Flow diagram of Firmware Attack Detection on Gadgets Using Ridge Regression (F-RR): a pool of executable firmware is passed through firmware-attack prefetch analysis and RR-based API analysis, and each sample is classified as benign or malicious
may yield sparse models with only a few non-zero coefficients; this approach is used in Lasso regression [6].
Ridge regression belongs to the family of L2 regularization methods. The other form of regularization, L1, penalizes the sum of the absolute values of the coefficients, i.e., a penalty equal to the absolute magnitude of each factor; certain factors are often omitted entirely, which can generate sparse models [7]. L2 regularization instead applies a penalty equal to the square of the coefficient magnitude: all coefficients are shrunk by the same factor (so no element is removed), and unlike L1 regularization, L2 does not lead to sparse models.
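The contrast between the two penalties is easy to see with scikit-learn: on the same synthetic data, the L1 (Lasso) fit drives uninformative coefficients exactly to zero while the L2 (Ridge) fit only shrinks them. The data and alpha values below are illustrative and are not the paper's firmware features.

```python
# Illustrative comparison of L1 (Lasso) and L2 (Ridge) penalties on synthetic
# data: Lasso zeroes out weak coefficients, Ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0])   # only two informative features
y = X @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))   # uninformative ones become 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))   # all non-zero, shrunk
```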
For the calculation of the parameters, OLS regression uses the following formula:

\[ \hat{B} = (X'X)^{-1}X'Y \qquad (1) \]
The cross-product matrix (X'X) becomes nearly singular when the X vectors are strongly correlated, once the X matrix has been centered and scaled. Ridge regression adds a multiple of the identity matrix to the cross-product matrix, giving (X'X + kI) [8]. This is called ridge regression because the diagonal of ones in the correlation matrix can be pictured as a ridge. To estimate the coefficients, the following penalized least squares problem is solved:

\[ \hat{B}_{ridge} = \operatorname*{arg\,min}_{b}\ \Big\{ \mathrm{RSS}(b) + \lambda \sum_{j=1}^{p} b_j^2 \Big\} \]
Here argmin denotes the argument of the minimum: it finds the value of b that minimizes the penalized RSS, and b can then be obtained by solving the resulting equations [17].
Ridge regression is thus also a form of constrained least squares: the penalty term corresponds to a bound on the size of the OLS coefficients. We search for the b that minimizes the RSS while satisfying this constraint; the radius of the constraint region is equal to C, and the solution lies on the boundary of that region [24].
\[ \|B\|_2 = \sqrt{b_1^2 + b_2^2 + \cdots + b_p^2} \qquad (5) \]
Taking the matrix form of the derivative leads to the solution described next. Note that in the preceding expression the first term is simply the OLS objective, while the second term is the penalty on the size of the coefficients. The parameter lambda is sometimes referred to as the "penalty", since it increases the RSS. A set of candidate values is assigned to lambda and the model is evaluated with the aid of a Mean Square Error (MSE) calculation [19]; the lambda value that minimizes the MSE is then used as the final choice. In practice, this ridge regression model is stronger than the plain OLS method. As seen in the following figure, the estimated b values vary with lambda and are identical to the OLS b only when lambda is close to zero (no penalty) [20].
By applying the matrix rule used before, lambda ends up in the denominator, which implies that the ridge estimates of b shrink as the lambda value grows. Yet the b's cannot reach exactly zero irrespective of the size of lambda, which means that ridge regression shrinks the weights given to the features but does not discard irrelevant ones [10].
\[ \hat{B}_{ridge} = \left(\frac{X^{T}X}{\lambda} + I\right)^{-1}\frac{X^{T}y}{\lambda} \qquad (6) \]
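Numerically, the ridge estimate can be computed in the equivalent closed form (X'X + λI)^(-1) X'y; the sketch below does this on synthetic data of our own and shows the coefficients shrinking toward zero as λ grows, with λ = 0 recovering the OLS solution.

```python
# Closed-form ridge estimate b = (X'X + lambda*I)^(-1) X'y on synthetic data
# (illustrative example only, not the paper's firmware features).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

def ridge_coefficients(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in (0.0, 1.0, 10.0, 100.0):
    print(f"lambda={lam:6.1f} ->", np.round(ridge_coefficients(X, y, lam), 3))
```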
Fig. 2. The proposed Firmware – LAR Finding with IATRVA component against the malware
image version
Fig. 3. The proposed Firmware – LAR Finding with ExportRVA component against the
malware image version
Fig. 4. The proposed Firmware – LAR Finding with ExportSize component against the malware
image version
Table 1. Comparison of Lasso and Ridge Regression (RR) scores of the proposed Firmware-LAR (IATRVA feature of FirmwareEXE)

Alpha    Lasso best score    Ridge Regressor best score
1e−15    −0.103634           −0.1024663
1e−10    −0.103634           −0.103634
1e−08    −0.103634           −0.103634
1        −0.726114           −0.103234
5        −0.688257           −0.102466
10       −0.661110           −0.102840
20       −0.657820           −0.105992
Table 2. Comparison of the proposed Firmware-LAR with existing malware detection methods

Method                   Number of malware detected   TP ratio (%)   FP detected   FP ratio (%)
Yohan                    1095                         97.50          28            0.02
Xie                      1035                         92.16          88            0.06
Proposed Firmware-LAR    1107                         98.57          16            0.01

Test scale of malware samples: 1123; collection of harmless samples: 1322
Even if only a smart home system is hacked, it can be used for stealing information, breaching data privacy and launching attacks that may lead to loss of profits, considerable expense to resolve the issue, possible corporate governance costs and potential financial damages. This paper is neither exhaustive nor definitive, but aims to lay out the main guidelines for detecting firmware attacks. Ridge regression, which does not presuppose a particular model form, is by far the most commonly employed form of regularization for ill-posed problems; regularization essentially supplies extra information to select the "right" solution for a problem. Ridge and Lasso regression are effective approaches widely used to build parsimonious models with a "large" number of features, where "large" usually means one of two things: large enough to increase the tendency of the model to overfit (as few as 10 variables can trigger overfitting), or, in advanced systems, millions or even billions of features. The outcome is a true positive rate of 98.57% and a false positive rate of 0.01% on the different firmware applications. In the future, this work is to be replicated for other APIs that enable malicious network operations to be performed.
References
1. Roy, G., Britto Ramesh Kumar, S.: An Architecture to Enable Secure Firmware Updates on
a Distributed-Trust IoT Network Using Blockchain. In: Smys, S., Bestak, R., Chen, J.I.-Z. ,
Kotuliak, I. (eds.) International Conference on Computer Networks and Communication
Technologies. LNDECT, vol. 15, pp. 671–679. Springer, Singapore (2019). https://doi.org/
10.1007/978-981-10-8681-6_61
2. Kumar, S.K., Sahoo, S., Kiran, K., Swain, A.K., Mahapatra, K.K.: A novel holistic security
framework for in-field firmware updates. In: 2018 IEEE International Symposium on Smart
Electronic Systems (iSES) (Formerly iNiS), Hyderabad, India, pp. 261–264 (2018)
3. Thakur, P., Bodade, V., Achary, A., Addagatla, M., Kumar, N., Pingle, Y.: Universal
firmware upgrade over-the-air for IoT devices with security. In: 2019 6th International
Conference on Computing for Sustainable Global Development (INDIACom), New Delhi,
India, pp. 27–30 (2019)
4. Dhobi, R., Gajjar, S., Parmar, D., Vaghela, T.: Secure firmware update over the air using
TrustZone. In: 2019 Innovations in Power and Advanced Computing Technologies (i-
PACT), Vellore, India, pp. 1–4 (2019)
5. Islam, M.N., Kundu, S.: IoT security, privacy and trust in home-sharing economy via
blockchain. In: Choo, K.-K., Dehghantanha, A., Parizi, R.M. (eds.) Blockchain Cyberse-
curity, Trust and Privacy. AIS, vol. 79, pp. 33–50. Springer, Cham (2020). https://doi.org/10.
1007/978-3-030-38181-3_3
6. Parizi, R., Dehghantanha, A., Azmoodeh, A., Choo, K.-K.R.: Blockchain in cybersecurity
realm: an overview. In: Choo, K.-K.R., Dehghantanha, A., Parizi, R.M. (eds.) Blockchain
Cybersecurity, Trust and Privacy. AIS, vol. 79, pp. 1–5. Springer, Cham (2020). https://doi.
org/10.1007/978-3-030-38181-3_1
7. Solangi, Z.A., Solangi, Y.A., Chandio, S., bin Hamzah, M.S., Shah, A.: The future of data
privacy and security concerns in Internet of Things. In: 2018 IEEE International Conference
on Innovative Research and Development (ICIRD), Bangkok, pp. 1–4 (2018)
8. Wazid, M., Das, A.K., Rodrigues, J.J.P.C., Shetty, S., Park, Y.: IoMT malware detection
approaches: analysis and research challenges. IEEE Access 7, 182459–182476 (2019)
9. Makhdoom, I., Abolhasan, M., Lipman, J., Liu, R.P., Ni, W.: Anatomy of threats to the
internet of things. IEEE Commun. Surv. Tutor. 21(2), 1636–1675 (2019)
10. Darabian, H., et al.: A multiview learning method for malware threat hunting: windows, IoT
and android as case studies. World Wide Web 23(2), 1241–1260 (2020). https://doi.org/10.
1007/s11280-019-00755-0
11. Du, M., Wang, K., Chen, Y., Wang, X., Sun, Y.: Big data privacy preserving in multi-access
edge computing for heterogeneous internet of things. IEEE Commun. Mag. 56(8), 62–67
(2018)
12. Ahmed, A., Latif, R., Latif, S., Abbas, H., Khan, F.A.: Malicious insiders attack in IoT based
Multi-Cloud e-Healthcare environment: a Systematic Literature Review. Multimed. Tools
Appl. 77(17), 21947–21965 (2018). https://doi.org/10.1007/s11042-017-5540-x
13. Ntantogian, C., Poulios, G., Karopoulos, G., Xenakis, C.: Transforming malicious code to
ROP gadgets for antivirus evasion. IET Inf. Secur. 13(6), 570–578 (2019). https://doi.org/10.
1049/iet-ifs.2018.5386
14. Patel, Z.D.: Malware detection in android operating system. In: 2018 International
Conference on Advances in Computing, Communication Control and Networking
(ICACCCN), Greater Noida (UP), India, pp. 366–370 (2018)
15. Yoon, S., Jeon, Y.: Security threats analysis for android based mobile device. In: 2014
International Conference on Information and Communication Technology Convergence
(ICTC), Busan, pp. 775–776 (2014)
16. Erdődi, L.: Finding dispatcher gadgets for jump oriented programming code reuse attacks.
In: 2013 IEEE 8th International Symposium on Applied Computational Intelligence and
Informatics (SACI), Timisoara, pp. 321–325 (2013)
17. https://securityboulevard.com/2019/12/anatomy-of-a-firmware-attack/
18. https://www.thesslstore.com/blog/firmware-attacks-what-they-are-how-i-can-protect-myself/
19. https://www.refirmlabs.com/centrifuge-platform/
20. https://uefi.org/sites/default/files/resources/Getting%20a%20Handle%20on%20Firmware%
20Security%2011.11.17%20Final.pdf
21. https://www.helpnetsecurity.com/2019/07/17/hardening-firmware-security/
22. https://www.statisticshowto.datasciencecentral.com/regularization/
23. https://towardsdatascience.com/ridge-regression-for-better-usage-2f19b3a202db
24. https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Ridge_
Regression.pdf
25. https://towardsdatascience.com/how-to-perform-lasso-and-ridge-regression-in-python-
3b3b75541ad8
26. https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece
27. https://www.analyticsvidhya.com/blog/2016/01/ridge-lasso-regression-python-complete-
tutorial/
28. https://solidsystemsllc.com/firmware-security/
Automatic Text Extraction from Digital
Brochures: Achieving Competitiveness
for Mauritius Supermarkets
Abstract. In recent years, it has been observed that there is a growing number
of supermarket stores coming into operation in Mauritius. Being a small
country, Mauritius has a limited customer base, and developing marketing
strategies to stay in the market has become a priority for supermarket owners.
A common marketing strategy is for decision makers to frequently offer sales on
items in stock. These sales are advertised using brochures that are distributed to
customers as soft copies (PDF digital brochures online) or hard copies within the
supermarket premises. To ensure the competitiveness of sale prices, decision makers must consult their competitors' brochures. Given that each competitor's
brochure must be manually checked, the process can be costly, time consuming
and labor intensive. To address this problem, we present the components of a
system suitable for automatically collecting and aggregating information on
items on sale in each supermarket by extracting useful information from digital
brochures published online. The proposed system has been implemented on a
pilot scale and tested in the local context. The results obtained indicate that our
proposal can be effectively used to help local supermarkets easily keep track of
market trends in order to remain competitive and attract customers.
1 Introduction
Today, companies rely on data for their day-to-day activities and for the development
of a long-term strategy [1–4]. Decision makers' choices are also vital to the business, given that these choices bring the company either profit or loss. In Mauritius, which is a small island, the number of operating supermarkets has increased over the past years. To survive and gain competitive advantage in the highly competitive marketplace of Mauritius, local supermarkets regularly offer sales on their products. Items on sale are listed in brochures and distributed to customers as
hard copies within the supermarket store or soft copies as online brochures. In parallel
to the current marketing strategy in place, decision makers of supermarkets have
adopted an alternative approach to decide on the next marketing or sales strategy:
Competitors' brochures are often analyzed and future sales strategies to achieve
competitiveness are then formulated.
However, the process of analyzing competitors' brochures to get an overview of which products are on sale remains a costly, tedious and time-consuming task, as the whole process is conducted manually. A physical visit to each competitor's store is required to collect the printed brochures, or the online brochures of each competitor must be individually and manually downloaded. Decision makers must then go through each competitor's catalogue and note the price of each item to obtain enough information before coming up with the sales strategies needed to maintain or boost transactions in their respective supermarkets.
Consequently, automating the process of extracting useful information from bro-
chures is desired. Such an approach will not only help in quickly obtaining useful
information in a cost-effective manner, but it may also serve to build a database of sales
data for later analysis. By creating a repository of sales data from multiple competitors
over a given period of time, data processing techniques such as data mining may be
used to discover marketing trends otherwise hidden from simple analysis [5].
In this paper, therefore, we present the design and implementation of a system
along with the required components that should be able to automatically and regularly
fetch digital brochures from online sources to be then subjected to further processing.
The outcome of the processing would be an aggregate of information extracted from
PDF files such as product name, description and price from each competitor’s bro-
chure. We expect that such a system would allow decision makers to have quick access
to information that can be used to better formulate marketing and sales strategies. It is
to be noted that there are already several approaches to extract data from PDF files.
However, those methods 1) usually employ complex algorithms, which are not nec-
essarily essential to extract the information required from PDF supermarket brochures, and
2) have not been applied to the problem domain addressed in this work. In this work,
we instead adopt a simplified approach involving format conversion to extract only
essential information that would help in identifying product items and their corre-
sponding prices.
The rest of the paper is organized as follows: A description of the common methods
used by supermarket for advertisement is provided followed by an overview of related
work on text extraction. Then the components for the proposed system for data
extraction from digital brochures are presented. Each underlying component of the
proposed tool is then explained along with brief implementation details. Testing results of the implemented prototype are provided, and its effectiveness is discussed. We finally
offer some conclusions based on the results obtained.
The brochure takes the form and size of a newspaper, which is published by the end
of each month or near the period of a festive season. Brochures can be collected as
prints in stores or they can be accessed, viewed and downloaded as PDF file from a
store website. Brochures, thus, not only serve as a guide to customers to know which
products are on sale in a given supermarket, but also as a means for competitors to quickly learn about trending prices in the marketplace.
In this study, the goal is to eliminate the need for manual intervention when
extracting information from online brochures and instead come up with the components
of a system to automatically aggregate information from various brochures for the same
sales period. As noted in Fig. 1, however, brochures do not necessarily follow a standard approach to presenting items and their details (price, quantity, image, etc.). Texts and images are presented as random blocks, and different brochures may follow different approaches to presenting their data, making the extraction and aggregation of information from digital brochures a challenging task.
The Portable Document Format (PDF) is the file format used for digital brochures
online. It is a popular format due to its lightweight characteristics and being portable
across systems. In PDF files, text data is stored as a series of string objects. These
objects can vary in font style, font size, orientation, and line spacing. Hence, when
extracting text data from different PDF documents, special attention must be paid to
any variation in data presentation within the processed document. As summarized in
[32], the main steps involved in the extraction of text from a PDF document are: Layout
analysis, Segmentation, Character/Symbol recognition, Structure recognition and Text
extraction. Brief notes on each step are provided below.
Several statistical algorithms and methods have been used for conducting layout
analysis [10]. Based on the underlying method, layout analysis algorithms are often classified as bottom-up, top-down or hybrid.
3.2 Segmentation
In storage and retrieval systems, segmentation is considered a simple task in which the underlying algorithms handle complex document formats and backgrounds [13]. Segmentation is also a labeling method, in which the same label is assigned to spatially aligned units (such as pixels, connected components or characteristic points), as shown in Fig. 3.
Similar to layout analysis, algorithms used for page segmentation fall under three
categories: bottom-up, top-down and hybrid algorithms. Furthermore, the segmentation technique must be chosen carefully, as it depends on the complexity of the document structure.
4 Related Works
Over the past three decades, systematic studies as evidenced in [17–24] have been
carried out in text extraction for different types of documents. We summarize some of
the relevant studies here. Anjewierden [25], for instance, documented his findings on PDF files, where his work is based on restoring the logical structure of technical manuals. He developed a system called AIDAS that takes a PDF file, extracts the logical
structure, then in this logical structure assigns indexes to each part. Indexes refer to two
things, firstly the content of the element and secondly, how the element can be included
in the instruction. Once extraction has been completed, indexes can be used to query
the database, an instructor can then retrieve suitable training content. The first step
taken by AIDAS is to describe a document’s overall layout. During this stage footers,
columns, headers, and the dominant font are defined. By statistical analysis, all of these
are calculated and can be overridden by the user. AIDAS subsequently divides each
page into segments. Drawings are recognized by several graphical elements combined with few text elements, tables are recognized by text labeled as floating, and the remaining segments are considered the text segments of the document.
Another method for extracting structures from PDF documents called Xed has been
developed by Hadjar et al. [26]. Xed is a software tool whose main feature is the transformation of original PDF files into a canonical XML form. The process consists of two key steps: first, it transforms the PDF document into an internal Java tree, normalizes the original document's primitives, and handles all types of embedded resources; second, Xed analyzes the internal Java tree text to recover the physical structures and their representation in the canonical format.
Chao et al. [27] have also worked on methods for segmenting logical structure
regions in a PDF document. The page is firstly divided into three layers namely text,
image and vector graphic layer. This is done to reduce the possible logical component
overlay and the interference between different logical components. Through this lay-
ering process, each layer is transformed into a PDF of its own, then the PDF document
is transformed into bitmap images, which are used to define the logical structural component blocks in each layer. Document image segmentation is then performed. A polygon outline is obtained for each component, as well as the component's style values and content. Text components and the shapes of the images are extracted for processing. Moreover, for vector graphic components, an SVG file is created for every component defined.
Ramakrishnan et al. [28] worked on a tool, called LA-PDFText, to extract text blocks from research papers, grouping the blocks using rules that characterize sections. The LA-PDFText scheme focuses mainly on the text component of research documents, and the tool can serve as an inspiration for more sophisticated techniques that do not involve text alone. The system goes through three stages: identifying contiguous text blocks, categorizing text blocks into rhetorical categories using a rule-based approach, and linking categorized text blocks together by correctly arranging them, resulting in text extracted from section-wise grouped blocks.
Gao et al. [29] implemented another method, SEB, that uses PDF format book
documents as input data. The extraction process involves the main physical and logical
structures of a book. The authors suggest a series of new methods in the scope of PDF-
based books to enhance traditional image-based methods by using the two character-
istics of PDF files and books: firstly, style consistency of page components (SCC) and, secondly, local reliability of the natural rendering order. A bipartite graph plays an important role in deciding the reading order as well as in extracting metadata from title pages.
Tkaczyk et al. [30] further launched CERMINE, an open source framework for the
extraction of born-digital metadata and parsed bibliographic references using scientific
papers as input. This framework made use of supervised and unsupervised machine
learning techniques. A workflow consisting of three steps is used. First, a TrueViz format [31] document is generated using the PDF file as input. Secondly, the metadata extraction path analyzes the parts of the geometric hierarchical structure that hold metadata and extracts from them a rich collection of record metadata. Thirdly, the reference extraction path analyzes the parts of the structure classified as references.
Singh et al. [32] proposed an open-source system for a variety of scholarly paper
knowledge extraction tasks, including metadata (title, name of author, affiliation and e-
mail), structure (headings of section and body text, headings of table and figure, URLs
and footnotes), and bibliography (citation instances and references). The system
accepts a PDF article and (1) converts the PDF file to an XML format, (2) processes the XML file to extract useful information, and (3) exports the output to TEI-encoded structured documents. Each token in the PDF file is annotated with rich metadata, namely x and y coordinates, font size, font weight, font style, etc. For the extraction task, machine learning models, handwritten rules/heuristics and the rich information found in the XML files are used.
It can be seen from the techniques described above that the underlying process of extracting information from PDF files is based on the extraction of the multiple components making up the PDF document, such as images, text, colors, etc. In the case of supermarket brochures, the main goal is to extract text data containing the product name, price, quantity, etc., so the techniques presented in this section may be excessive for the problem at hand. Alternatively, we propose that, to speed up processing time without compromising the objective of extracting essential text data from PDF brochures, the focus should be on blocks of text alone. We posit that once text blocks are extracted from brochures, the rest of the work simply requires the accurate alignment of appropriate labels (price, name, quantity, etc.) to the corresponding values listed in the brochures. We present our proposed approach to extracting brochure data in the next section.
In this paper, we seek to automatically access and extract useful information from PDF
brochures published online for supermarkets in Mauritius. The goal is to provide
decision makers with product details from competitors for strategic planning. The
architecture for our proposed system is shown in Fig. 4.
The task of the crawler is to regularly visit supermarket websites and download their
respective brochures. The crawler is implemented in Java and we make use of the Jsoup library (https://jsoup.org/) to obtain the HTML home pages of the two supermarkets used in this
study (i.e., Jumbo and Monoprix). Because brochure links are dynamic and cannot be hard-coded in our program, we designed our crawler in such a way that it extracts all links available on the home page of the supermarket and specifically looks for any link that has a PDF extension under the supermarket's domain name. It must be pointed out that, based on our observation at the time of conducting this study, Jumbo and Monoprix offered their latest brochures in PDF format for download on their websites; no other documents with the PDF extension were accessible on either supermarket website. Once the brochure link is found, the crawler downloads the file and saves the document on a server for further processing.
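The crawler itself is written in Java with the Jsoup library; purely as an illustration of the same link-filtering logic, the Python sketch below uses requests and BeautifulSoup (our substitutions) with a placeholder base URL rather than the actual supermarket sites.

```python
# Python sketch of the crawler logic described above (the real crawler is
# Java + Jsoup). The base URL and file names are placeholders, not real sites.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def fetch_brochure_links(home_url):
    """Return absolute links on the home page that end in .pdf and stay on the same domain."""
    html = requests.get(home_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = [urljoin(home_url, a["href"]) for a in soup.find_all("a", href=True)]
    domain = urlparse(home_url).netloc
    return [u for u in links if u.lower().endswith(".pdf") and urlparse(u).netloc == domain]

def download(url, path):
    with open(path, "wb") as f:
        f.write(requests.get(url, timeout=60).content)

for pdf_url in fetch_brochure_links("https://www.example-supermarket.mu/"):
    download(pdf_url, pdf_url.rsplit("/", 1)[-1])   # save locally for further processing
```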
The second step consists in converting the PDF documents into TXT format. The
result of converting the brochure into TXT format turned out to be interesting. Once
converted, details of each product item in the brochure (see Fig. 5) could be seen as a
block of data in the resultant TXT file. The blocks of data for the items are aligned vertically one after the other, making it easy to extract information one line at a time.
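The paper does not name the converter it uses; as one possible realization of this step (our assumption), the pdfminer.six library can dump a downloaded brochure to a plain-text file for line-by-line parsing:

```python
# Possible PDF-to-TXT conversion step (the specific tool is not named in the
# paper; pdfminer.six is assumed here for illustration).
from pdfminer.high_level import extract_text

def pdf_to_txt(pdf_path, txt_path):
    text = extract_text(pdf_path)                    # extract the whole document as text
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text)

pdf_to_txt("brochure.pdf", "brochure.txt")           # placeholder file names
```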
Once conversion from PDF to TXT is completed, the next step consists in reading
each block of data from the TXT file one by one for later storage in a database. The
main challenge for this part lies in accurately identifying details of the item blocks for
later storage and retrieval in the database. For instance, the TXT files provided us with
block of texts that contained information presented in a linear but randomized and
unstructured format. We had to identify which part of a block of data referred to the product description, the price (old and promotional), the details (colour, quantity, dimensions, etc.) or the supermarket information (Jumbo or Winners).
To add to the complexity of the task, blocks of data identified did not always follow
the same format within the same brochure, in brochures from the same store on different
dates or even in brochures from different stores. The diversity in the way information
was presented in a brochure did not allow us to apply techniques already adopted in
previous studies. Instead, we proceeded in devising our own algorithms based on
manual analysis of the brochures downloaded from the supermarkets' websites.
Following document analysis, it was found that although blocks of data from a
sample of brochures could appear randomly, the underlying contents made it relatively
easy to determine the semantics of each part of the text for each block. As an example
of block variation, it was found that given two brochures, one brochure may present
block of items in the order of name, description, old price, and promotion price while in
the other brochure, block of data would present information in the order of name and
promotion price only. To accurately identify and classify blocks of items,
therefore, we had to devise our own algorithm for information extraction as shown in
Figs. 6, 7 and 8.
At the core of the algorithm, once unwanted characters are removed from the TXT
files, we posit that any piece of text that contains numbers followed by a period '.' and two digits, such as '135.00', represents a price value, with the larger value being the
current price and the smaller value being the promotional price. We also created our
corpus of data, which acted as a data dictionary to identify product name and
description. Supermarket product vocabularies are more or less standardized such that
once a line of text is matched with our dictionary, we were thus able to identify
whether the text referred to the product name or product description.
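Put together, the price rule and the dictionary lookup amount to a few lines of parsing; the sketch below uses a regular expression for the price pattern and a stub keyword dictionary standing in for the tool's corpus, which is not reproduced in the paper.

```python
import re

PRICE = re.compile(r"\b\d+\.\d{2}\b")          # numbers followed by '.' and two digits
NAME_WORDS = {"rice", "milk", "oil", "soap"}   # stub dictionary; the tool uses its own corpus

def parse_block(lines, supermarket):
    """Turn one brochure text block into a record of name, description and prices."""
    record = {"supermarket": supermarket, "name": None, "description": [], "prices": []}
    for line in lines:
        prices = [float(p) for p in PRICE.findall(line)]
        if prices:
            record["prices"].extend(prices)
        elif any(word in line.lower().split() for word in NAME_WORDS):
            record["name"] = line.strip()
        else:
            record["description"].append(line.strip())
    if record["prices"]:
        record["current_price"] = max(record["prices"])       # larger value = current price
        record["promotional_price"] = min(record["prices"])   # smaller value = promotion
    return record

print(parse_block(["Basmati Rice 1kg", "Long grain, imported", "135.00  99.99"], "Jumbo"))
```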
Once extraction is completed, all details are stored in a database that may be
accessed either via a web portal or a mobile application. A snapshot of the database
populated from data extracted from tested brochures is shown in Fig. 9. For this study,
the records are organized according to product name, description, previous price,
current price and supermarket.
To evaluate the performance of our algorithm, we used data crawled for a period of
3 months from three different supermarkets' brochures. Our dataset consisted of roughly
3000 items for which relevant texts had to be extracted. The total records extracted
amounted to around 15000 texts structured according to the product name, description,
previous price, current price and supermarket name. Using Eq. (1), we obtained a good
accuracy of 85%.
7 Discussions
Supermarket stores publish their online brochures in PDF documents. Such data is
currently difficult and costly to access given the manual task involved. We propose to
automatically extract useful information from online PDF brochures to ease the task of
decision makers for marketing purposes. Tools identified in previous studies for
extracting information from PDF documents are excessive for this purpose, with the underlying processing being too extensive or simply not required. In the present proposal, we proposed,
implemented and evaluated a simple technique that can automatically extract the
desired information required from supermarket brochures for product price comparison.
At the same time, our proposal takes into consideration the need for storing data over a
period of time for necessary analysis. This may help in Business Intelligence appli-
cations or similar data mining techniques, which can further enhance business deci-
sions. In our research, we could not find any system that addressed this issue of data
storage. The algorithm devised for this study demonstrated that information extraction
could be successfully adapted and applied to brochures, which presented information in
varying blocks of data.
8 Conclusion
In this paper, we present a system that can be used to extract useful information from
online PDF files. The techniques used in our proposal contrast with those used in
previous studies in that complex natural language or image processing techniques are
not used to obtain useful information in the present context. Instead, a simplified
technique making use of file conversion proved to be sufficient in extracting desired
information from source files. Our prototype has been tested using brochures from two
different supermarkets demonstrating a good performance. As future work, it may be
useful to investigate whether machine learning techniques can be applied along with
the technique proposed in this paper to obtain better information extraction perfor-
mance. Additionally, we expect that further investigation in comparing the performance
of the proposed algorithm in this paper to other similar algorithms can provide further
insights on its effectiveness in easily extracting useful information from PDF
documents.
References
1. Provost, F., Fawcett, T.: Data Science for Business: What You need to Know About Data
Mining and Data-Analytic Thinking. O’Reilly Media Inc., Sebastopol (2013)
2. Amankwah-Amoah, J., Adomako, S.: Big data analytics and business failures in data-Rich
environments: an organizing framework. Comput. Ind. 105, 204–212 (2019)
3. Bal, H.Ç., Erkan, Ç.: Industry 4.0 and competitiveness. Procedia Comput. Sci. 158, 625–631
(2019)
4. Nyanga, C., Pansiri, J., Chatibura, D.: Enhancing competitiveness in the tourism industry
through the use of business intelligence: a literature review. J. Tour. Futures 6, 139–151
(2019)
5. Pustulka, E., Hanne, T.: Text mining innovation for business. In: Dornberger, R. (ed.) New
Trends in Business Information Systems and Technology. SSDC, vol. 294, pp. 49–61.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-48332-6_4
6. Cohen, M.C., Perakis, G.: Optimizing promotions for multiple items in supermarkets. In:
Ray, S., Yin, S. (eds.) Channel Strategies and Marketing Mix in a Connected World.
SSSCM, vol. 9, pp. 71–97. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-
31733-1_4
7. MasKhanam, M., Ali, A.: Impact of advertising: end user perspective. J. Soc. Sci. Humanit.
58(1), 179–189 (2019)
8. Wiese, M., Martínez-Climent, C., Botella-Carrubi, D.: A framework for Facebook
advertising effectiveness: a behavioral perspective. J. Bus. Res. 109, 76–87 (2020)
9. Knezevic, B., Davidaviciene, V., Skrobot, P.: Implementation of social networks as a digital
communication tool in social supermarkets. Int. J. Mod. Res. Eng. Manag. (IJMREM) 1(6),
41 (2018)
10. Binmakhashen, G.M., Mahmoud, S.A.: Document layout analysis: a comprehensive survey.
ACM Comput. Surv. (CSUR) 52(6), 1–36 (2019)
11. BinMakhashen, G.M., Mahmoud, S.A.: Historical document layout analysis using
anisotropic diffusion and geometric features. Int. J. Digit. Libr. 21(3), 329–342 (2020).
https://doi.org/10.1007/s00799-020-00280-w
12. Hassan, T.: Object-level document analysis of PDF files. In: Proceedings of the 9th ACM
Symposium on Document Engineering, Munich, Germany. Association for Computing
Machinery (2009)
13. Sesh Kumar, K.S., Namboodiri, A.M., Jawahar, C.V.: Learning segmentation of documents
with complex scripts. In: Kalra, P.K., Peleg, S. (eds.) ICVGIP 2006. LNCS, vol. 4338,
pp. 749–760. Springer, Heidelberg (2006). https://doi.org/10.1007/11949619_67
14. Lovegrove, W.S., Brailsford, D.F.: Document analysis of PDF files: methods, results and
implications. Electron. Publ.-Orig. Dissem. Des. 8(3) (1995)
15. Audithan, S., Chandrasekaran, R.M.: Document text extraction from document images using Haar discrete wavelet transform. Eur. J. Sci. Res. 36(4), 502–512 (2009)
16. Mao, S., Rosenfeld, A., Kanungo, T.: Document structure analysis algorithms: a literature
survey. In: Document Recognition and Retrieval X, vol. 5010, pp. 197–207. International
Society for Optics and Photonics, January 2003
17. Nagy, G., Seth, S., Viswanathan, M.: A prototype document image analysis system for
technical journals. Computer 25(7), 10–22 (1992)
18. Dengel, A.: Initial learning of document structure. In: Proceedings of 2nd International
Conference on Document Analysis and Recognition (ICDAR 1993), pp. 86–90. IEEE (1993)
19. Dengel, A., Dubiel, F.: Clustering and classification of document structure-a machine
learning approach. In: Proceedings of 3rd International Conference on Document Analysis
and Recognition, 14–16 August 1995, vol. 2, pp. 587–591 (1995)
20. Jain, A.K., Bin, Y.: Document representation and its system to page decomposition. IEEE
Trans. Pattern Anal. Mach. Intell. 20, 294–308 (1998)
21. Alpizar-Chacon, I., Sosnovsky, S.: Order out of chaos: construction of knowledge models
from PDF textbooks. In: Proceedings of the ACM Symposium on Document Engineering
2020, p. 10, September 2020
22. Jensen, Z., et al.: A machine learning approach to zeolite synthesis enabled by automatic
literature data extraction. ACS Central Sci. 5(5), 892–899 (2019)
23. Payak, A., Rai, S., Shrivastava, K., Gulwani, R.: Automatic text summarization and keyword
extraction using natural language processing. In: 2020 International Conference on
Electronics and Sustainable Communication Systems (ICESC), pp. 98–103. IEEE, July 2020
24. Khatavkar, V., Kulkarni, P.: Trends in document analysis. In: Balas, V.E., Sharma, N.,
Chakrabarti, A. (eds.) Data Management, Analytics and Innovation. AISC, vol. 808,
pp. 249–262. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-1402-5_19
25. Anjewierden, A.: AIDAS: incremental logical structure discovery in PDF documents. In:
Proceedings of Sixth International Conference on Document Analysis and Recognition, 13–
13 September 2001, pp. 374–378 (2001)
26. Hadjar, K., Rigamonti, M., Lalanne, D., Ingold, R.: Xed: a new tool for extracting hidden
structures from electronic documents. In: 2004 Proceedings of the First International
Workshop on Document Image Analysis for Libraries, 23–24 January 2004, pp. 212–224
(2004)
27. Chao, H., Fan, J.: Layout and content extraction for pdf documents. In: Marinai, S., Dengel,
A.R. (eds.) DAS 2004. LNCS, vol. 3163, pp. 213–224. Springer, Heidelberg (2004). https://
doi.org/10.1007/978-3-540-28640-0_20
28. Ramakrishnan, C., Patnia, A., Hovy, E., Burns, G.A.P.C.: Layout-aware text extraction from
full-text PDF of scientific articles. Source Code Biol. Med. 7, 7 (2012). https://doi.org/10.
1186/1751-0473-7-7
29. Gao, L., Tang, Z., Lin, X., Liu, Y., Qiu, R., Wang, Y.: Structure extraction from PDF-based
book documents. In: Proceedings of the 11th Annual International ACM/IEEE Joint
Conference on Digital Libraries, Ottawa, Ontario, Canada. Association for Computing
Machinery (2011)
30. Tkaczyk, D., Szostek, P., Dendek, P.J., Fedoryszak, M., Bolikowski, L.: Cermine–automatic
extraction of metadata and references from scientific literature. In: 2014 11th IAPR
International Workshop on Document Analysis Systems, pp. 217–221. IEEE, April 2014
31. Lee, C., Kanungo, T.: The architecture of TrueViz: A groundTRUth/metadata editing and
VIsualiZing ToolKit. Pattern Recogn. 36(3), 811–825 (2003)
32. Singh, M., et al.: OCR++: a robust framework for information extraction from scholarly
articles. In: Proceedings of COLING 2016, the 26th International Conference on
Computational Linguistics: Technical Papers, Osaka, Japan, 11–17 December 2016,
pp. 3390–3400 (2016)
33. Sasirekha, D., Chandra, E.: Text extraction from PDF document. In: IJCA Proceedings on
Amrita International Conference of Women in Computing. AICWIC, pp. 17–19 (2013)
Improved Predictive System for Soil Test
Fertility Performance Using Fuzzy Rule
Approach
Abstract. Decision making in agriculture affects both crops in all their phases (planting, management, costs, etc.) and the distribution and organization of crop production systems. Generating rules and connecting them in a fuzzy inference system for decision making is a very delicate task when associated with soil fertility testing. Some of the constraints of previous studies are the problems of uncertainty and the imprecise complexity of experimenting with and interpreting DSS systems in real life. The application of fuzzy logic provides an effective decision-making tool for dealing with the aforementioned challenges. This paper proposes a Mamdani-type fuzzy inference system for testing soil fertility and increasing overall crop yield. The study was conducted on the FUNAAB farm, and seven inputs (nitrogen, phosphorus, potassium, organic matter, boron, zinc, and pH of water) with their nutrient percentages were fed into a fuzzy inference system. Three major inputs were varied based on their major role in soil constituents (nitrogen, phosphorus and potassium (NPK)). Our experimental results showed that, irrespective of the level of the three major inputs (NPK) at any given time, the output values did not change significantly; they remained at 0.5, which represents an average output for the crops considered in this work. This study concludes that the application of fuzzy logic rules (Mamdani-type FIS model) to soil fertility testing for cultivating four crops (yam, cassava, maize and cocoyam) has proven to be an effective measure, thereby ensuring effective decision making.
1 Introduction
This section discusses in detail the different state-of-the-art methods proposed and the contributions of previous literature on testing for soil fertility. The impacts and roles of machine learning algorithms [37, 38] cannot be overstressed, as their applications cut across different areas. Some of the applications of machine learning to soil fertility testing include the study by Moonjun et al. [14], which presented the impact of fuzzy logic on effective soil mapping; the study applied rule-based reasoning for efficient mapping of soil series and texture. Wang et al. [31] proposed Takagi–Sugeno (T-S) fuzzy neural network models for testing soil fertility in rice cultivation. The authors concluded that applying potassium and magnesium macronutrients in fertilizers can improve overall rice productivity. In addition, Rodríguez et al. [20] proposed a fuzzy logic method using the T-S fuzzy inference approach for developing indexes for soil quality assessment. The proposed approach was able to determine the dynamics of soil quality for farming using physical, chemical and biological characteristics.
The application of a novel adaptive neuro-fuzzy inference system (ANFIS) model was presented by Were et al. [34] for predicting and mapping soil organic carbon stocks. Furthermore, the authors in [17] proposed the combination of ANN and fuzzy logic methods for soil sample classification using a Munsell colour approach. Similarly, Dang et al. [4] presented a hybrid neural-fuzzy inference system (HyFIS) for rice cultivation prediction; the study integrated the HyFIS model with GIS, and eight environmental variables were analysed to assess suitable areas for crop cultivation. Davatgar et al. [5] introduced the combination of principal component analysis and a fuzzy clustering algorithm for the delineation of site-specific nutrient management zones. The study was able to give farmers the benefit of site-specific nutrient management. Sirsat et al. [27] developed an automatic prediction system for measuring soil fertility indices. The study considered soil organic carbon and four important soil nutrients (phosphorus pentoxide, iron, manganese and zinc), and showed that the best prediction was achieved using extremely randomized regression trees (extraTrees). Prabakaran et al. [18] developed a fuzzy decision support system to test soil, nutrient, water and climatic parameters, thus improving crop productivity. The authors applied three types of graphical inference approaches for determining the aggregation rule, and their results showed an improvement in the cost ratio. Wang et al. [33] also introduced a posterior probability support vector machine for assessing soil quality with respect to concentrations of dangerous substances. The authors claimed that emissions from industries and vehicles and the use of phosphorus and potassium fertilizers have a strong effect on soil quality. Shekofteh et al. [24] presented a hybrid method combining the evolutionary ant colony organization algorithm with an ANFIS method for predicting soil fertility; the method gave better efficiency and a higher coefficient of determination for predicting soil CEC. Seyedmohammadi et al. [23] demonstrated the application of the fuzzy analytical hierarchy process for determining weight values in crop cultivation prioritisation. The study further implemented geographical information system and multi-criteria decision-making techniques for enhancing soil suitability decisions in crop cultivation. Agarwal et al. [1] proposed a soil fertility test method using the principle of colorimetry and applied a Naïve Bayes algorithm for prediction. The authors claimed that applying NB yielded better accuracy and can reduce the time complexity of soil fertility testing. Sumiharto and Hardiyanto [28] designed a prototype NPK nutrient measurement system using local binary patterns and a back-propagation neural network for testing soil fertility.
Interestingly, Ngunjiri et al. [15] proposed a fuzzy logic soil mapping approach for determining soil types, soil CEC, depth and drainage class; the results showed improved prediction performance against field observations. Sun et al. [29] presented the impact of the soil quality index on the cultivation of A. mongholicus. The authors proposed five soil indicators, namely organic matter, calcium, gallium, boron and magnesium, for assessing soil quality. Sefati et al. [22] compared and evaluated different soil test models for assessing soil quality in urban areas. Li et al. [9] evaluated a soil quality index using twenty soil properties based on physical, chemical and biological features for wheat-maize cultivation and increasing crop yield. The analysis shows that alkali-hydrolyzable nitrogen and organic matter are more crucial elements in soil assessment than the other elements. Leena et al. [8] developed a decision support system using GIS technologies for managing soil fertility. The study was able to enhance fertilizer recommendations for soil nutrients and spatial variability, thereby improving the productivity of a variety of crops. Some studies have introduced the use of sensors and other hardware devices to test and predict soil fertility. Rawankar et al. [19] presented a soil analysis approach for evaluating the characteristics of the main soil nutrients for increasing crop yield. A similar study is that of Masrie et al. [10], who developed an optical transducer to effectively measure the macronutrients nitrogen, phosphorus and potassium in order to test for soil fertility within soil samples.
Based on the findings from related work, the application of fuzzy logic systems in different soil test applications has shown that this method can ease the representation and processing of human knowledge in a computer. In addition, related studies have shown that the inputs, outputs, and rules of fuzzy logic [39, 40] are easy to modify. However, some of the drawbacks of existing studies include failures with inference rules and accumulated decisions, the time complexity of ML methods, and problems with limited and constrained environmental models, leading to ineffective decision-making operations. Therefore, there is a need for simple and effective methods that consider farmers' prior knowledge of the agricultural system and the best soil samples, with emphasis on nutrient levels, for the cultivation of a variety of crops. Thus, this paper focuses on designing an effective fuzzy logic system based on Mamdani fuzzy inference for testing soil fertility and thereby improving crop yield.
The application of fuzzy logic in the environmental field has been on the rise, as this method allows the integration of mathematical tools for dealing with uncertainty and imprecision [20]. The authors in [23] also defined fuzzy sets as a more general kind of structures called L-relations [6], which have been applied to various research domains, such as linguistics, decision-making and clustering [3, 16, 30]. [7] further defines a fuzzy set μ of X ≠ ∅ as a function from the reference set X to the unit interval, i.e. μ : X → [0, 1], where F(X) denotes the set of all fuzzy sets of X, that is, F(X) := {μ | μ : X → [0, 1]}. In addition, fuzzy sets can be described by their characteristic/membership function, which assigns a degree of membership μ(x) to each element x ∈ X.
The basic logical operations are OR, AND and NOT, with operands A and B taking membership values in the interval [0, 1].
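As a brief illustration (assuming the standard min–max formulation commonly used in Mamdani systems; other t-norms and t-conorms are possible), these operations on membership values can be written as:

```latex
\mu_{A \wedge B}(x) = \min\bigl(\mu_A(x), \mu_B(x)\bigr), \qquad
\mu_{A \vee B}(x) = \max\bigl(\mu_A(x), \mu_B(x)\bigr), \qquad
\mu_{\neg A}(x) = 1 - \mu_A(x).
```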
To compute the output of a Mamdani-type FIS given the inputs, the following six steps must be carefully executed:
i. A set of fuzzy rules must be established;
ii. Apply the input membership functions to fuzzify the input values;
iii. Establish a rule strength based on the fuzzy rules and the combination of the fuzzified inputs in step (ii);
iv. Identify the impact of each rule by combining the rule strength in step (iii) with the output membership function;
v. Integrate the impacts in step (iv) to get an output distribution;
vi. To obtain a crisp output, the output distribution obtained in step (v) must be defuzzified (a minimal sketch of these steps follows this list).
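To make these six steps concrete, the following minimal Python sketch runs a Mamdani inference for a single illustrative input (nitrogen) and a single output (crop yield). The membership ranges and the two rules are assumptions made for demonstration only; they are not the rule base of this study, whose FIS was built in MATLAB.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function evaluated over the universe x."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

# Universes of discourse (ranges are illustrative assumptions).
x_n = np.linspace(0.0, 2.5, 251)    # nitrogen input
x_y = np.linspace(0.0, 1.0, 101)    # crop-yield output

# Step i: fuzzy rules are fixed below; the membership sets are defined here.
n_low, n_high = trimf(x_n, 0.0, 0.5, 1.2), trimf(x_n, 1.0, 1.8, 2.5)
y_poor, y_good = trimf(x_y, 0.0, 0.25, 0.5), trimf(x_y, 0.5, 0.75, 1.0)

def mamdani(nitrogen_value):
    # Step ii: fuzzify the crisp input.
    mu_low = np.interp(nitrogen_value, x_n, n_low)
    mu_high = np.interp(nitrogen_value, x_n, n_high)
    # Step iii: rule strengths (single-antecedent rules, so strength = membership).
    #   R1: IF nitrogen is low  THEN yield is poor
    #   R2: IF nitrogen is high THEN yield is good
    # Step iv: implication (min clips each output membership function).
    clipped_poor = np.minimum(mu_low, y_poor)
    clipped_good = np.minimum(mu_high, y_good)
    # Step v: aggregation (max) of the clipped consequents.
    aggregated = np.maximum(clipped_poor, clipped_good)
    # Step vi: defuzzification by the centroid method.
    return np.sum(x_y * aggregated) / (np.sum(aggregated) + 1e-9)

print(round(mamdani(1.6), 3))   # e.g. a medium-high nitrogen level
```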
In this section, we present and discuss the fuzzy inference system architecture and the evaluation rules. The system architecture, which shows how inputs and decision making are handled in a fuzzy inference system to give an output, is depicted in Fig. 2. From Fig. 2, the rule-base phase contains a number of fuzzy if-then rules, and the database phase helps to define the membership functions of the fuzzy sets used in the fuzzy rules. The decision-making phase performs the inference operation on the generated rules. The fuzzification interface transforms the crisp inputs into degrees of match with linguistic values, while the defuzzification interface produces the output by calculating the performance value with the help of a suitable defuzzification method. The proposed fuzzy inference process consists of the following steps:
i. Fuzzification of the input variables
ii. Application of the fuzzy operator (AND or OR) in the antecedent
iii. Implication from the antecedent to the consequent
iv. Aggregation of the consequents across the rules
v. De-fuzzification
Fig. 2. Architecture of the proposed fuzzy inference system: a knowledge base (database and rule base), fuzzification of the crisp inputs, a decision-making unit, and defuzzification producing the output
Table 2. Input variables and their linguistic values

Input 1: pH (water)
  Strongly acid       5.0–5.5
  Moderately acid     5.6–6.0
  Slightly acid       6.1–6.5
  Neutral             6.6–7.2
  Slightly alkaline   7.3–7.8

Input 2: Nitrogen (total N)
  Very low            0.3–0.5
  Low                 0.6–1.0
  Moderately low      1.1–1.5
  Medium              1.6–2.0
  Moderately high     2.1–2.4

Input 3: Phosphorus (Bray-1) mg kg−1
  Very low            <3
  Low                 3–7
  Moderate            7–20
  High                >20

Input 4: Potassium (K)
  Very low            0.12–0.2
  Low                 0.21–0.3
  Moderate            0.31–0.6
  High                0.61–0.73

Input 5: Organic carbon (g kg−1 O.M.)
  Very low            <4.0
  Low                 4.0–10.0
  Moderate            10–14
  High                14–20
  Very high           >20

Input 6: Zinc (DTPA) mg kg−1
  Low                 <1.0
  Medium              1.0–5.0
  High                >5.0

Input 7: Boron (hot H2O sol.) mg kg−1
  Very low            <0.35
  Low                 0.35–0.5
  Medium              0.5–2.0
  High                >2.9
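For illustration, the ranges in Table 2 map naturally onto membership functions. The sketch below (assuming the Python scikit-fuzzy package; the study itself used MATLAB, and the trapezoidal shoulders here are our own assumption) encodes two of the pH classes and the "Medium" nitrogen class and evaluates them for a hypothetical sample.

```python
# Sketch of how Table 2 linguistic values could be encoded as membership
# functions with scikit-fuzzy. Shoulder widths are assumed for demonstration.
import numpy as np
import skfuzzy as fuzz

ph = np.arange(5.0, 7.9, 0.01)          # universe for Input 1: pH (water)
nitrogen = np.arange(0.3, 2.5, 0.01)    # universe for Input 2: total N

ph_neutral = fuzz.trapmf(ph, [6.5, 6.6, 7.2, 7.3])          # "Neutral" 6.6-7.2
ph_slightly_acid = fuzz.trapmf(ph, [6.0, 6.1, 6.5, 6.6])    # "Slightly acid" 6.1-6.5
n_medium = fuzz.trapmf(nitrogen, [1.5, 1.6, 2.0, 2.1])      # "Medium" 1.6-2.0

# Degree to which a measured sample (pH 6.8, N 1.7) matches each linguistic value.
print(fuzz.interp_membership(ph, ph_neutral, 6.8))       # close to 1.0
print(fuzz.interp_membership(nitrogen, n_medium, 1.7))   # close to 1.0
```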
This section presents the implementation of the model and the results obtained (Figs. 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12), with discussion. The hardware and software requirements for effective implementation of the proposed model are: Windows OS (32- or 64-bit), 1 GB RAM, 250 GB HDD, and MATLAB R2008.
Fig. 3. The Mamdani fuzzy inference system showing the inputs (yellow) and outputs (cyan) (Color figure online)
Fig. 4. Membership function editor for input variable "Nitrogen"
Fig. 5. Membership function editor for input variable "Phosphorus"
Fig. 6. Membership function editor for input variable "Potassium"
Fig. 7. Membership function editor for output variable "Yam"
Fig. 8. Membership function editor for output variable "Maize"
Fig. 9. Membership function editor for output variable "Cassava"
Fig. 10. Membership function editor for output variable "Cocoyam"
Fig. 11. Screenshot of rule viewer 1
Fig. 12. Screenshot of rule viewer 2
From rule viewer 1, it was deduced that, irrespective of the levels of the three major inputs that were varied (i.e. nitrogen, phosphorus and potassium) at any given time, the output values did not change significantly; they remained at 0.5, which represents an average output for the crops considered for the purpose of this study. In this paper, all inputs other than nitrogen, phosphorus and potassium were not varied, essentially because these three soil elements are vital for the growth and maximum yield of the selected crops. Furthermore, this paper ultimately concludes that, with the set of rules formulated and the data available, the yield of these crops (the outputs) is a function of the availability of these three key elements (nitrogen, phosphorus and potassium), provided they are not in a critical region. This study adopted the Mamdani Fuzzy Inference System (FIS) because it works well with adaptive and optimization techniques; the seven inputs used were fed into the Mamdani FIS. The design of the Mamdani-type FIS membership functions is linear, and the crisp output (productivity and performance) is generated.
5 Conclusion
This paper applied fuzzy logic rules to test for soil fertility using a Mamdani-type FIS model. The proposed model allows fuzzy outputs to be computed efficiently to aid swift and effective decision making. Our fuzzy inference system was based on seven macronutrient inputs (nitrogen, phosphorus, potassium, organic matter, boron, zinc and pH of water), and the three NPK inputs were varied as soil constituents in cultivating yam, cassava, maize and cocoyam. The drawback of our proposed model is its limited adaptation to new discoveries and strategies. Similar to some existing decision support tools, fuzzy logic systems need to be reviewed and updated considering the dynamics of new discoveries arising from new research methods or a change in the testing environment. A future recommendation is to explore and adapt dynamic deep learning methods for soil fertility testing.
References
1. Agarwal, S., Bhangale, N., Dhanure, K., Gavhane, S., Chakkarwar, V.A., Nagori, M.B.:
Application of colorimetry to determine soil fertility through Naive Bayes classification
algorithm. In: 2018 9th International Conference on Computing, Communication and
Networking Technologies (ICCCNT), pp. 1–6. IEEE, July 2018
2. Cintula, P., Fermüller, C., Noguera, C. (eds.): Handbook of Mathematical Fuzzy Logic, vol.
3, (Mathematical Logic and Foundations, vol. 58). College Publications, London (2015)
3. Coroiu, A.M.: Fuzzy methods in decision making process-a particular approach in
manufacturing systems. In: IOP Conference Series: Materials Science and Engineering,
vol. 95, no. 1, p. 012154. IOP Publishing (2015)
4. Dang, K.B., Burkhard, B., Windhorst, W., Müller, F.: Application of a hybrid neural-fuzzy
inference system for mapping crop suitability areas and predicting rice yields. Environ.
Model. Softw. 114, 166–180 (2019)
5. Davatgar, N., Neishabouri, M.R., Sepaskhah, A.R.: Delineation of site specific nutrient
management zones for a paddy cultivated area based on soil fertility using fuzzy clustering.
Geoderma 173, 111–118 (2012)
6. Hudedagaddi, D.P., Tripathy, B.K.: Clustering approaches in decision making using fuzzy
and rough sets. In: Handbook of Research on Fuzzy and Rough Set Theory in Organizational
Decision Making, pp. 116–136. IGI Global (2017)
7. Kruse, R., Moewes, C.: Fuzzy systems, Fuzzy set theory (2015). https://fuzzy.cs.ovgu.de/ci/
fs/fs_ch02_fst.pdf. Accessed 11 Apr 2020
8. Leena, H.U., Premasudha, B.G., Basavaraja, P.K.: Sensible approach for soil fertility
management using GIS cloud. In: 2016 International Conference on Advances in
Computing, Communications and Informatics (ICACCI), pp. 2776–2781. IEEE, September
2016
9. Li, P., et al.: Soil quality assessment of wheat-maize cropping system with different
productivities in China: establishing a minimum data set. Soil Tillage Res. 190, 31–40
(2019)
10. Masrie, M., Rosman, M.S.A., Sam, R., Janin, Z.: Detection of nitrogen, phosphorus, and
potassium (NPK) nutrients of soil using optical transducer. In: 2017 IEEE 4th International
Conference on Smart Instrumentation, Measurement and Application (ICSIMA), pp. 1–4.
IEEE, November 2017
11. Mazloumzadeh, S.M., Shamsi, M., Nezamabadi-Pour, H.: Evaluation of general-purpose
lifters for the date harvest industry based on a fuzzy inference system. Comput. Electron.
Agric. 60(1), 60–66 (2008)
12. Majumder, M.: Multi criteria decision making. In: Majumder, M. (ed.) Impact of
Urbanization on Water Shortage in Face of Climatic Aberrations, pp. 35–47. Springer,
Singapore (2015). https://doi.org/10.1007/978-981-4560-73-3_2
13. Mokarram, M., Hojati, M.: Using ordered weight averaging (OWA) aggregation for multi-
criteria soil fertility evaluation by GIS (case study: Southeast Iran). Comput. Electron. Agric.
132, 1–3 (2017)
14. Moonjun, R., Shrestha, D.P., Jetten, G.: Fuzzy logic for fine-scale soil mapping: a case study
in Thailand. CATENA 190, 104456 (2020)
15. Ngunjiri, M.W., Libohova, Z., Minai, J.O., Serrem, C., Owens, P.R., Schulze, D.G.:
Predicting soil types and soil properties with limited data in the Uasin Gishu Plateau Kenya.
Geoderma Reg. 16, e00210 (2019)
16. Novák, V., Štěpnička, M., Kupka, J.: Linguistic descriptions: their structure and
applications. In: Larsen, H.L., Martin-Bautista, M.J., Vila, M.A., Andreasen, T., Chris-
tiansen, H. (eds.) FQAS 2013. Lecture Notes in Computer Science, vol. 8132, pp. 209–220.
Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40769-7_19
17. Pegalajar, M.C., Ruiz, L.G.B., Sánchez-Marañón, M., Mansilla, L.: A Munsell colour-based
approach for soil classification using Fuzzy Logic and Artificial Neural Networks. Fuzzy
Sets Syst. 401, 38–54 (2019)
18. Prabakaran, G., Vaithiyanathan, D., Ganesan, M.: Fuzzy decision support system for
improving the crop productivity and efficient use of fertilizers. Comput. Electron. Agric. 150,
88–97 (2018)
19. Rawankar, A., et al.: Detection of N, P, K fertilizers in agricultural soil with NIR laser
absorption technique. In: 2018 3rd International Conference on Microwave and Photonics
(ICMAP), pp. 1–2. IEEE, February 2018
20. Rodríguez, E., et al.: Dynamic quality index for agricultural soils based on fuzzy logic. Ecol.
Ind. 60, 678–692 (2016)
21. Sami, M., Shiekhdavoodi, M.J., Pazhohanniya, M., Pazhohanniya, F.: Environmental
comprehensive assessment of agricultural systems at the farm level using fuzzy logic: a case
study in cane farms in Iran. Environ. Model. Softw. 58, 95–108 (2014)
22. Sefati, Z., Khalilimoghadam, B., Nadian, H.: Assessing urban soil quality by improving the
method for soil environmental quality evaluation in a saline groundwater area of Iran.
CATENA 173, 471–480 (2019)
23. Seyedmohammadi, J., Sarmadian, F., Jafarzadeh, A.A., Ghorbani, M.A., Shahbazi, F.:
Application of SAW, TOPSIS and fuzzy TOPSIS models in cultivation priority planning for
maize, rapeseed and soybean crops. Geoderma 310, 178–190 (2018)
24. Shekofteh, H., Ramazani, F., Shirani, H.: Optimal feature selection for predicting soil CEC:
comparing the hybrid of ant colony organization algorithm and adaptive network-based
fuzzy system with multiple linear regression. Geoderma 298, 27–34 (2017)
25. Silvertooth, J.C.: Soil fertility and soil testing guideline for Arizona cotton (2015). https://
cals.arizona.edu/crops/cotton/soilmgt/soil_fertility_testing.html. Accessed 25 Oct 2020
26. Singh, H., et al.: Real-life applications of fuzzy logic. Adv. Fuzzy Syst. (2013)
27. Sirsat, M.S., Cernadas, E., Fernández-Delgado, M., Barro, S.: Automatic prediction of
village-wise soil fertility for several nutrients in India using a wide range of regression
methods. Comput. Electron. Agric. 154, 120–133 (2018)
28. Sumiharto, R., Hardiyanto, R.: NPK soil nutrient measurement prototype based on local
binary pattern and back-propagation. In: 2018 IEEE International Conference on Internet of
Things and Intelligence System (IOTAIS), pp. 23–28, November 2018
29. Sun, H., et al.: Effects of soil quality on effective ingredients of Astragalus mongholicus
from the main cultivation regions in China. Ecol. Ind. 114, 106296 (2020)
30. Wan, H., Peng, Y.: Fuzzy set based web opinion text clustering algorithm. In: 2015 4th
International Conference on Mechatronics, Materials, Chemistry and Computer Engineering.
Atlantis Press (2015)
31. Wang, H., Yao, L., Huang, B., Hu, W., Qu, M., Zhao, Y.: An integrated approach to
exploring soil fertility from the perspective of rice (Oryza sativa L.) yields. Soil Tillage Res.
194, 104322 (2019)
32. Wang, C.: A study of membership functions on mamdani-type fuzzy inference system for
industrial decision-making. M.Sc. thesis from Department of Mechanical and Mechanics.
Lehigh University (2015). https://preserve.lehigh.edu/cgi/viewcontent.cgi?article=2665&co
ntext=etd. Accessed 11 Apr 2020
33. Wang, H., Zhang, H., Liu, Y.: Using a posterior probability support vector machine model to
assess soil quality in Taiyuan, China. Soil Tillage Res. 185, 146–152 (2019)
34. Were, K., Tien, B.D., Dick, Ø., Singh, B.: Novel evolutionary genetic optimization-based adaptive neuro-fuzzy inference system and GIS predict and map soil organic carbon stocks across an Afromontane landscape. Pedosphere 27, 877–889 (2017)
35. Zadeh, L.A.: Fuzzy sets. Inf. Control 8, 338–353 (1965)
36. Zadeh, L.A.: Generalized theory of uncertainty: principal concepts and ideas. In:
Fundamental Uncertainty, pp. 104–150. Palgrave Macmillan, London (2011)
37. Behera, R.K., Rath, S.K., Misra, S., Leon, M., Adewumi, A.: Machine learning approach for
reliability assessment of open source software. In: Misra, S. (ed.) ICCSA 2019. LNCS, vol.
11622, pp. 472–482. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-24305-0_35
38. Blessing, G., Azeta, A., Misra, S., Chigozie, F., Ahuja, R.: A machine learning prediction of
automatic text based assessment for open and distance learning: a review. In: Abraham, A.,
Panda, M., Pradhan, S., Garcia-Hernandez, L., Ma, K. (eds.) IBICA 2019, vol. 1180,
pp. 369–380. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-49339-4_38
39. Alfa, A.A., Misra, S., Bumojo, A., Ahmed, K.B., Oluranti, J., Ahuja, R.: Comparative
analysis of optimisations of antecedents and consequents of fuzzy inference system rules lists
using genetic algorithm operations. In: Chillarige, R., Distefano, S., Rawat, S. (eds.) ICACII
2019. Lecture Notes in Networks and Systems, vol. 119, pp. 373–379. Springer, Singapore
(2019). https://doi.org/10.1007/978-981-15-3338-9_42
40. Kumari, A., Behera, R.K., Shukla, A.S., Sahoo, S.P., Misra, S., Rath, S.K.: Quantifying
influential communities in granular social networks using fuzzy theory. In: Gervasi, O. (ed.)
ICCSA 2020. Lecture Notes in Computer Science, vol. 12252, pp. 906–917. Springer, Cham
(2020). https://doi.org/10.1007/978-3-030-58811-3_64
Artificial Intelligence Applications to Tackle
COVID-19
Abstract. The emergence of COVID-19 with grey areas about the cure and
spread of the infection is a significant challenge for the 21st century world. The
COVID-19 pandemic has prodigious consequences on human lives and the
healthcare industry is constantly seeking support from modern decisive tech-
nology such as Artificial Intelligence (AI) to tackle this rapidly spreading
pandemic. AI mimicking human intelligence plays a crucial role to predict and
track patients, helps to analyze data for various aspects, including medical image
processing, drug and vaccine development, and prediction and forecast of this
disease. The present study aims to review the potential of Artificial Intelligence to diagnose COVID-19 cases and to analyze available data for the prevention of and fight against COVID-19. New insights in drug discovery can be extracted through deep learning AI algorithms to speed up the process of drug and vaccine development. AI thus acts as a potential tool to quell COVID-19. Besides, this article provides a review of the contribution of AI and the constraints on its impact against COVID-19. The potential of AI must be mobilized to tackle COVID-19, which is crucial for saving human lives and limiting the damage to the world's economy.
1 Introduction
COVID-19 disease was first revealed late in December 2019 in China and has been
proclaimed a pandemic affecting almost all countries, having a significant effect on the
healthcare industry. Apart from the colossal toll of human lives it has caused, the worst
effect is on the global economy dragging many countries into recession. The appli-
cation of new technologies has been sought by the healthcare industry for surveillance,
prediction, drug development, and mitigation of the spread of this viral pandemic.
Artificial Intelligence (AI) is a promising technology to handle the COVID-19 pan-
demic. AI applications such as Computer vision, Machine learning, and Natural
Language Processing and others such as chatbot and facial recognition help the
healthcare industry to show high-risk patients, predict and diagnose the infection, track
the spread, and treat this infection [1, 2]. AI-based facial recognition cameras can track
patients with infection. Robots and drones are used to deliver medication and food and
to sanitize public areas [3]. AI aids in drug development and vaccines for the cure of
the infection. The detection of the infection is done through radiological image pro-
cessing through CT scan and x-ray image analysis, while thermal cameras and
smartphone apps are used to detect fever [4]. To confront the situation, healthcare organizations have invested huge effort in treating infected people and in screening the general population for COVID-19. National governments are making every effort to quell the disease and to meet the requirements of healthcare organizations. Lamentably, as of now, neither medication nor a vaccine is available to counteract this novel coronavirus, and nations have been attempting different treatment strategies. The preventive measures established by the WHO are frequent hand-washing with soap and water or with a sanitizer, social distancing, and respiratory hygiene. Masks are known to safeguard individuals from infection. Aside from this, the best way to prevent the spread of the infection is to stay at home, avoid social gatherings, and isolate from infected people. Thus, a joint effort of the healthcare industry, government, and public is required to help control the pandemic.
The literature survey was conducted using the Google Scholar, Scopus, PubMed, Elsevier, and IEEE databases, and a concise review was carried out by evaluating different facets of AI technologies for fighting the COVID-19 pandemic. This study emphasizes the potential of AI applications to fight the COVID-19 pandemic, along with the restrictions and limitations of the technique. Section 2 discusses the potential role of
Artificial Intelligence to fight COVID-19. Section 3 discusses constraints. The sub-
section showcases the importance of reliable and transparent data and guidelines for the
whole of society approach towards pandemic preparedness. Section 4 proposes insight
into novel approaches, and finally, a discussion of the impact made by these advances,
current challenges, and anticipated future effects along with the conclusion is presented.
mortality. So far, the capabilities of AI to tackle COVID-19 are mainly focused on the
diagnosis of the disease based on radiological imaging.
Early Detection and Diagnostic AI
An early, rapid, and precise diagnosis of COVID-19 curbs further spread of the infection and thus helps save human lives. It also helps to generate prolific data on which AI models can be trained to handle this crisis. Detection and analysis of symptoms with AI algorithms offer a quick and cost-effective diagnosis. AI utilizing radiological technologies such as X-rays and CT scans makes the diagnosis of infected cases faster and cheaper, saving radiologists' time, and can be as accurate as humans [5]. Alibaba's AI-based framework can predict COVID-19 in about 20 s with almost 96% diagnostic accuracy utilizing CT scan image analysis. A technique to scan CT images utilizing mobile phones has been proposed by Maghdid et al. [4]. A deep learning model can aid radiologists in the diagnosis of COVID-19 utilizing radiological image processing via X-ray and CT scan images. COVID-Net is an AI application that uses a deep convolutional neural network; Wang and Wong developed this tool to diagnose the virus from chest X-ray images [11]. An accuracy of 89.5% has been achieved with an AI migration neural network using CT scan images to diagnose COVID-19 disease [6]. This novel idea, achieving an accuracy of 92%, has also been put forward by PaddlePaddle, Baidu's open-source deep learning platform.

The full potential of AI in diagnosis has not yet been put into practice, even though AI-assisted radiology technology has been used in Chinese hospitals. The paucity of data available to train AI models, and the fact that data are available mostly from Chinese hospitals, may cause selection bias. Lastly, not all patients affected by COVID-19 need intensive treatment. A machine-learning-based prognostic prediction algorithm for predicting the mortality risk of patients affected by COVID-19 has been deployed by Yan et al., but the sample used to train the AI models is very small (only 29 patients) [7]. To summarize, much research has been incited on AI applications for diagnosis and prognosis of disease progression, but it is not yet in practice. The prompt diagnosis of COVID-19 is essential for the prevention, control, and treatment of the infection. Even though RT-PCR tests are proven to be the optimal method for the determination of COVID-19, they have certain limitations: low sensitivity, the time required for the examination, and the implementation and performance of the RT-PCR test kit pose restrictions, promoting AI-driven diagnosis from CT scan images [8] (Fig. 1). Table 2 explains various AI applications in the CT scan diagnosis of COVID-19. As coronavirus tests are expensive and in short supply, and because of the easy availability of X-rays and CT scans, radiologists are able to diagnose COVID-19 with the aid of deep learning using X-ray or CT images. COVID-Net, an artificial intelligence application, is based on X-ray analysis of data taken from patients with other lung conditions and with COVID-19.
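The deep convolutional approaches surveyed above generally fine-tune a standard image classifier on labelled chest X-ray or CT images. The sketch below is a generic, hedged illustration of that idea in PyTorch; it is not COVID-Net or any of the cited models, and the dataset path, class names, and hyperparameters are assumptions.

```python
# Illustrative transfer-learning classifier. Assumes a hypothetical folder layout
#   xray_data/train/{covid,normal}/*.png  -- paths and labels are invented.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("xray_data/train", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)

model = models.resnet18(pretrained=True)          # reuse ImageNet features
model.fc = nn.Linear(model.fc.in_features, 2)     # 2 classes: covid / normal

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                            # few epochs, for illustration only
    for images, labels in train_dl:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```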
A significant hindrance to progress with AI is the lack of reliable data, and not being able to understand the state of affairs is a significant risk to organizations. Research to discover a vaccine against the coronavirus is ongoing, so it will take a long time to abate this pandemic. A major issue of the pandemic is the lack of data and information sharing about transmission, clinical symptoms, diagnostic tests and preventive measures. A significant concern is the lack of reliable data to train an AI model. Also, around 80% of the data is unstructured, which makes it hard to train a model to analyze patterns. Transparent and accurate data sharing is the prime need of the present world to understand the impact of the disease and to battle the novel coronavirus. Either insufficient data or excessive data impose limitations on the applications of AI. While AI systems contribute something significant, Petropoulos [35] and Bullock et al. [5] have commented on their drawbacks at the current time in terms of operational maturity. There are inadequate data available to train AI models and insufficient open datasets to research upon, and other potential issues, such as big-data hubris and torrents of exceptional information, need assessment beforehand. Regarding the need for more data on COVID-19 and for multidisciplinary research to train AI models, more openness and sharing of data is needed. Most of the research available until now tends to use very little, biased, and exclusively Chinese data. Thus, more diagnostic testing is needed to improve the tracking and prediction of the pandemic. Lack of data, big-data hubris, and noisy social media prevent AI-based prediction of COVID-19 spread from being precise and authentic. Because of this, forecasters prefer epidemiological SIR models over AI models.
The open-source dataset platforms are created for the collection and instant sharing of important data while preserving privacy. Various government organizations have strategically moved to give transparent and accurate information to avoid panic and distress among the public. Certain promising initiatives for sharing datasets have been taken by various organizations. The WHO Global Research database promises to share datasets on the coronavirus disease of the prevailing pandemic. CORD-19, a collection of research articles, is made accessible for data mining; it is a joint drive by Semantic Scholar, Microsoft, Facebook, and others. Kaggle has released a data competition, the CORD-19 Challenge. Elsevier and ScienceDirect have made peer-reviewed research articles available for data mining. The Lens has made its patent data accessible to reinforce research on new and repurposed drugs. A few universities have permitted open access to their data and literature on COVID-19. Italy has adopted a transparent strategy to keep the public from panic and confusion caused by overflowing social media. Not just insufficient data but even excessive data may restrict AI applications. With the progress of the pandemic, news and social media create huge data noise. With a daily overload of over 100 scientific articles, researchers deal with a flood of data. To overcome this, data analytics tools such as the COVID-19 Evidence Navigator by Gruenwald et al. provide daily updates on computer-generated evidence maps of this pandemic [36].

Lack of awareness and misinformation about the pandemic create panic and distress that are even more difficult to handle, causing psychological effects on public health. Governments should carry out awareness campaigns, offer education about the pandemic, and implement protective measures to control panic. Transparent and reliable guidance for managing the pandemic is of utmost importance to avoid panic and confusion and to encourage public compliance.
4 Future Scope
5 Conclusion
Researchers are exploring the best possible methods to quell the coronavirus pandemic, and Artificial Intelligence presents an alluring technique. It is a very useful upcoming tool for the early detection of coronavirus infection and helps to monitor the health of infected patients. AI applications can offer proper isolation methods to reduce mortality in high-risk infected cases and help to plan treatment strategies for COVID-19 as well as vaccine and drug development through various useful algorithms. AI-based analysis of available data can hasten research on the novel SARS-CoV-2. The application of these technologies speeds up digital transformation, providing telemedicine solutions and limiting unnecessary hospital consultations. It enforces social distancing measures and allows better transportation management for ensuring the safety of patients. This technology provides an online platform for distance education about COVID-19 in remote areas during lockdown. Even though technology advancement is helping people battle COVID-19, there are concerns about using these algorithms in a real-time environment. Despite achieving accuracy in diagnosing COVID-19 using radiological image processing through X-ray and CT scan image analysis, this technology is still unaccountable and lacks lucidity. To summarize, even though there lies a wide scope for utilizing AI applications to curb this pandemic, the development of these technologies is not yet operational enough. Only time will tell whether AI applications will be useful in combating emerging rampant diseases. There lies a threat to data privacy if governments continue to surveil their nationals even after the pandemic is under control. Government authorities should take crucial steps in the analysis of the data to combat the pandemic with proper justification on public health grounds; otherwise, the public may lose trust in government and be less likely to follow health advice. Lastly, although the use of AI to quell the pandemic is limited, it has led to the automation of human labor and has digitalized the economy, and as a result of the current crisis, a proper policy may come into existence for the administration of Artificial Intelligence.
References
1. Javaid, M., Haleem, A., Vaishya, R., Bahl, S., Suman, R., Vaish, A.: Industry 4.0
technologies and their applications in fighting COVID-19 pandemic. Diabetes Metab. Syndr.
Clin. Res. Rev. 14, 419–422 (2020)
2. Vaishya, R., Javaid, M., Khan, I.H., Haleem, A.: Artificial Intelligence (AI) applications for
COVID-19 pandemic. Diabetes Metab. Syndr. Clin. Res. Rev. 14, 337–339 (2020)
3. Kumar, A., Gupta, P.K., Srivastava, A.: A review of modern technologies for tackling
COVID-19 pandemic. Diabetes Metab. Syndr. Clin. Res. Rev. 14, 569–573 (2020)
4. Maghdid, H.S., Ghafoor, K.Z., Sadiq, A.S., Curran, K., Rabie, K.: A novel AI-enabled
framework to diagnose coronavirus covid 19 using smartphone embedded sensors: design
study. arXiv preprint arXiv:2003.07434, 16 March 2020
5. Bullock, J., Pham, K.H., Lam, C.S., Luengo-Oroz, M.: Mapping the landscape of artificial
intelligence applications against COVID-19. arXiv preprint arXiv:2003.11336, 25 March
2020
6. Wang, S., et al.: A deep learning algorithm using CT images to screen for Corona Virus
Disease (COVID-19). MedRxiv, 1 January 2020
7. Yan, L., Zhang, H., Goncalves, J., et al.: A machine learning-based model for survival
prediction in patients with severe COVID-19 infection. medRxiv Prepr 2020. https://doi.org/
10.1101/2020.02.27.20028027
8. Yang, Y., et al.: Laboratory diagnosis and monitoring the viral shedding of 2019-nCoV
infections. MedRxiv, 1 January 2020
9. Jin, C., et al.: Development and evaluation of an AI system for COVID-19 diagnosis.
medRxiv. 1 January 2020
10. Li, L., et al.: Artificial intelligence distinguishes COVID-19 from community acquired
pneumonia on chest CT. Radiology (2020)
11. Wang, L., Wong, A.: COVID-Net: a tailored deep convolutional neural network design for
detection of COVID-19 cases from chest X-ray images. arXiv preprint arXiv:2003.09871, 22
March 2020
12. Xu, X., Jiang, X., Ma, C., et al.: Deep learning system to screen coronavirus disease 2019
pneumonia, p. 1e29 (2020). https://arxiv.org/abs/2002.09334.
13. Huang, P., et al.: Use of chest CT in combination with negative RT-PCR assay for the 2019
novel coronavirus but high clinical suspicion. Radiology 295(1), 22–23 (2020)
14. Lei, J., Li, J., Li, X., Qi, X.: CT imaging of the 2019 novel coronavirus (2019-nCoV)
pneumonia. Radiology 295(1), 18 (2020)
15. Ai, T., et al.: Correlation of chest CT and RT-PCR testing in coronavirus disease 2019
(COVID-19) in China: a report of 1014 cases. Radiology 26, 200642 (2020)
16. Fang, Y., et al.: Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology
19, 200432 (2020)
17. He, J.L., et al.: Diagnostic performance between CT and initial real-time RT-PCR for
clinically suspected 2019 coronavirus disease (COVID-19) patients outside Wuhan. China.
Respiratory Med. 21, 105980 (2020)
18. Long, C., et al.: Diagnosis of the Coronavirus disease (COVID-19): rRT-PCR or CT? Eur.
J. Radiol. 25, 108961 (2020)
19. Akhtar, M., Kraemer, M.U., Gardner, L.M.: A dynamic neural network model for predicting
risk of Zika in real time. BMC Med. 17(1), 171 (2019)
20. Song, P., et al.: An epidemiological forecast model and software assessing interventions on
COVID-19 epidemic in China. medRxiv. Epub, 3 March 2020
21. Naudé, W.: Artificial intelligence vs COVID-19: limitations, constraints and pitfalls. AI Soc.
35(3), 761–765 (2020). https://doi.org/10.1007/s00146-020-00978-0
22. Yan, L., Zhang, H., Goncalves, J., et al.: A machine learning-based model for survival
prediction in patients with severe COVID-19 infection. medRxiv Prepr 2020. https://doi.org/
10.1101/2020.02.27.20028027.
23. Smith, M., Smith, J.C.: Repurposing therapeutics for COVID-19: supercomputer-based docking to the SARS-CoV-2 viral spike protein and viral spike protein-human ACE2 interface
24. Sohrabi, C., Alsafi, Z., et al.: World Health Organization declares global emergency: a
review of the 2019 novel coronavirus (COVID-19). Int. J. Surg. (2020)
25. Beck, B.R., Shin, B., Choi, Y., Park, S., Kang, K.: Predicting commercially available
antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug-target
interaction deep learning model. Comput. Struct. Biotech. J. 18, 784–790 (2020)
26. Regalado A.: A Coronavirus Vaccine will take at least 18 months if it works at all
27. Liu, T., et al.: The potential role of IL-6 in monitoring coronavirus disease 2019. SSRN
3548761, 1 March 2020.
28. Wang, M., et al.: Remdesivir and chloroquine effectively inhibit the recently emerged novel
coronavirus (2019-nCoV) in vitro. Cell Res. 30(3), 269–271 (2020)
29. Zhang, J., et al.: Teicoplanin potently blocks the cell entry of 2019-nCoV. BioRxiv, 1
January 2020
30. Richardson, P., et al.: Baricitinib as potential treatment for 2019-nCoV acute respiratory
disease. Lancet (LondonEngland). 395(10223), e30 (2020)
31. Yang, L., Li, Z., Bai, T., Hou, X.: Discovery of potential drugs for COVID-19 based on the
connectivity map
32. Arya, R., Das, A., Prashar, V., Kumar, M.: Potential inhibitors against papain-like protease
of novel coronavirus (SARS-CoV-2) from FDA approved drugs
33. Chun, A.: In a time of coronavirus, China's investment in AI is paying off in a big way. South China Morning Post, 18 March 2020
34. Nemati, E., Rahman, M.M., Nathan, V., Vatanparvar, K., Kuang, J.: A comprehensive
approach for cough type detection. In: 2019 IEEE/ACM International Conference on
Connected Health: Applications, Systems and Engineering Technologies (CHASE), 25
September 2019, pp. 15–16. IEEE (2019)
35. Petropoulos, G.: Artificial Intelligence in the Fight against COVID-19, Bruegel, 23 March
2020
36. Gruenwald, E., Antons, D., Salge T.: COVID-19 evidence navigator. Institute for
Technology and Innovation Management, RWTH Aachen University (2020)
37. Wang, Y., Hu, M., Li, Q., Zhang, X.P., Zhai, G., Yao, N.: Abnormal respiratory patterns
classifier may contribute to large-scale screening of people infected with COVID-19 in an
accurate and unobtrusive manner. arXiv preprint arXiv:2002.05534, 12 February 2020
38. Qi, X., Jiang, Z., Yu, Q., et al.: Machine learning based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: a multicentre study. Pediatr. Clin. North Am. 13(3) (2020). https://doi.org/10.1016/s0031-3955(16)31867-3
Throat Inflammation Based Mass Screening
of Covid-19 on Embedded Platform
Abstract. The upsurge of the COVID-19 pandemic and its fatal repercussions
make non-invasive and automated mass screening inevitable to ensure that the
suspected cases can be quarantined, and the dissemination of the disease can be
controlled during the incubation period of the coronavirus. Currently, thermal screening is the only non-invasive mass-screening technique, and it has the drawback that feverish symptoms can be concealed by the consumption of paracetamol, so developing an efficient alternative for mass screening becomes essential. Throat inflammation is a vital perceivable symptom of COVID-19 for early detection and mass screening. This paper proposes the identification of tonsillitis and pharyngitis by detecting the region of interest, followed by HSV segmentation of the region around the uvula for pharyngitis detection and by template matching with a tonsil-gland template, after applying the Canny and HOG transforms to the ROI-cropped image, for tonsillitis detection. The proposed method can be administered without direct contact with infectious patients, thereby reducing the burden on the medical and paramedical fraternity.
Quantitative and qualitative evaluation showcase promising results and indicate
the reliability of the proposed method. Moreover, the proposed technique can be
implemented on the cost-effective, power-efficient, and parameter-constrained
embedded platforms, including NVIDIA Jetson Nano.
1 Introduction
The novel coronavirus disease is a highly contagious disease that rapidly spread across
the world and has become a concern for public health and welfare, with WHO
declaring it as a pandemic [1]. Many symptoms such as reduction in WBC (White
Blood Cells) count, fever, pneumonia are known to be the clinical characteristics of
SARS-CoV-2 (Severe Acute Respiratory Syndrome Novel Corona Virus). Recently, it
has also been noted that Tonsillitis and Pharyngitis may also be potential symptoms,
which appear in very early stages.
Coronavirus is known to spread through cough droplets in which it can live for
around 3 h, and its incubation period is known to be from 2 to 14 days. The virus enters
the body from the upper respiratory tract, and it multiplies in the mucosa of
nasopharynx and oropharynx, and hence a person may feel irritation in the throat for
the first few days. Redness in the throat and inflammation of the tonsils gland may be
observed, as shown in Fig. 1. This serves as the motivation for this research because if
this can be diagnosed in early stages, then the segregation of potential coronavirus
patients from the rest of the population can be done, which may help to control the
spread of this virus.
2 Related Work
Thermal screening is the common method identified for mass screening for COVID-19, as fever is a common symptom. Due to the high contagiousness of the disease, a requirement for early identification arises. But the accuracy of thermal screening is affected by human, environmental, and equipment variables. Also, fever is not constant in infected patients; it fluctuates. Hence, techniques better than thermal screening are required [2]. Another process identified for early detection of SARS-CoV-2 is the use of chest radiography or X-rays, along with deep-learning models, for detecting consolidation areas and nodular opacities in infected patients. A major drawback is that detection from X-rays is successful only if the infected person has pneumonia symptoms; otherwise, the method fails [3–5].

Tonsillitis and pharyngitis are identified as major throat infections, which are also symptoms of other viruses of the coronavirus family. Due to the unavailability of a good throat dataset, classification using artificial neural networks may seem irrational. Tonsillitis can be detected from throat images by traditional image processing methods. For the identification of tonsillitis with a fuzzy logic system [6], a 3-channel image is input to the system. After preprocessing, the red-channel boundary is identified for tonsil size extraction. From the extracted image, the tonsil size, area, features, and color are extracted and given as input to a fuzzy-based inference system. This method is reported to give 80% accuracy. Another method uses the green-channel image for tonsil extraction, followed by mean-value calculation and a Sobel edge detector [7]. Here, the RGB image and the extracted tonsil area are the inputs for computing the red colour intensity and the tonsil area in pixels, so that early diagnosis can be performed.

For fast implementation, this research focuses on developing better techniques for early diagnosis that can be easily computed on edge devices without requiring heavy processing power. As image segmentation and localization play an important role in the detection of tonsillitis and pharyngitis from throat images, a multistage algorithm using template matching is found useful for faster detection of tonsillitis [8]. In the cited work, template matching measures the match between two images using correlation, which is found to be beneficial for a faster implementation of the proposed method. For the detection of pharyngitis, which can be identified from redness of the throat, HSV-based segmentation is applied. An auto-crop and unsupervised clustering method [9] that works on segmentation in the HSV colour space is identified; the object is separated from the background, followed by pixel processing and calculation.
3 Proposed Method
Medical diagnosis from the throat is a complex task in the absence of a medical
professional. Using Artificial Neural Networks (ANNs) for image processing requires a
lot of processing power as well as a good dataset, which increases the cost factor
required for a portable device that could aid in mass-screening. So, a method that
involves low computation cost, as well as a better on-device performance with ready to
deploy methodology is needed. Moreover, this research endeavors to diagnose
symptoms of Tonsillitis and Pharyngitis, which may indicate the possibility of an
individual being infected with COVID-19.
Hence, a technique is required, which can help in mass screening to reduce the
computation and processing cost factor with good reliability. Using traditional image
processing methods, which include a series of segmentation, transformation, filtering,
280 P. Dalal et al.
and equalization along with correlation coefficient based template matching [10] has
performed profitably for the faster and cost-effective mass screening (Fig. 2).
3.2 Segmentation
A color image can be represented in different ways, through different color models. The
most common color models are RGB, HSV, YCrCb, and all these models have dif-
ferent applications. RGB is an additive color model, which means that different colors
can be represented by the addition of color pixels of Red, Green, and Blue. Different
proportions of these colors can be used to represent different colors. This model is
essentially used by display devices such as monitors and televisions to display different colors. However, even slight variations in color, in the form of brightness, saturation, or illumination, are strongly dependent on the three colors mentioned above, which makes this model less ideal for image processing applications.
HSV color space is more commonly used for image processing applications, such
as segmentation [11]. HSV color space closely matches the perception of how humans
understand the colors. This model is good at isolating color information such as hue
and saturation from grayscale information such as intensity or brightness. Hence, the
values of hue and saturation are not affected by different levels of brightness and
illumination, which makes this color space ideal for image processing applications such
as image segmentation. Moreover, conversion from HSV color space to RGB can be
easily done using the following equations:
$B = V(1 - S)$   (1)
$R = V\left[1 + \dfrac{S \cos H}{\cos(60^{\circ} - H)}\right]$   (2)
$G = 3V - (R + B)$   (3)
Here H, S, and V stand for Hue, Saturation, and Value respectively, and B, G, and
R stand for Blue, Green, and Red, respectively.
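As an illustration of how such HSV-based redness segmentation can be realised, the following is a minimal OpenCV sketch; the hue and saturation thresholds are illustrative assumptions rather than the values tuned in this work.

```python
import cv2
import numpy as np

def segment_red_regions(bgr_image):
    """Isolate reddish throat regions in HSV space.

    The threshold values below are illustrative placeholders,
    not the settings used by the authors.
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)

    # Red hue wraps around 0 on OpenCV's 0-179 hue scale, so two ranges are combined.
    lower_red_1 = np.array([0, 70, 50])
    upper_red_1 = np.array([10, 255, 255])
    lower_red_2 = np.array([170, 70, 50])
    upper_red_2 = np.array([179, 255, 255])

    mask = cv2.inRange(hsv, lower_red_1, upper_red_1) | cv2.inRange(hsv, lower_red_2, upper_red_2)

    # Keep only the pixels that fall inside the red mask.
    segmented = cv2.bitwise_and(bgr_image, bgr_image, mask=mask)
    return mask, segmented
```

Working in HSV keeps the red-hue range comparatively stable under the brightness and illumination variations typical of throat photographs taken with different devices.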
where
$T'(x', y') = T(x', y') - \dfrac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'')$   (5)
$I'(x + x', y + y') = I(x + x', y + y') - \dfrac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'')$   (6)
(I denotes the image, T the template, and R the result). The summation is done over the
template and the image patch: x′ = 0…w − 1, y′ = 0…h − 1.
For the detection of tonsillitis, a template image of the tonsil transformed using Canny
edge detection and Histogram of Gradients (HOG) is used. Canny edge and HOG are
applied to the image obtained after the ROI cropping step, and the correlation coefficient
is computed using Eq. (4). When the input image is tonsillitis positive, the correlation
coefficient is higher than a threshold, and it is lower when the image is tonsillitis negative.
By comparing the correlation coefficient against this threshold, the presence of swollen
tonsil glands can be detected.
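The following is a minimal sketch, under stated assumptions, of correlation-coefficient template matching on edge-transformed images with OpenCV; the Canny thresholds and the decision threshold are placeholders, not the values tuned in this work, and the additional HOG transform used by the authors is omitted for brevity.

```python
import cv2

def tonsillitis_score(roi_gray, template_gray, threshold=0.5):
    """Correlation-coefficient template matching on edge maps.

    `threshold` is an illustrative value; the paper compares the
    correlation coefficient against an empirically chosen cut-off.
    """
    # Edge transform of both the ROI-cropped image and the tonsil template
    # (the paper additionally applies a HOG transform at this stage).
    roi_edges = cv2.Canny(roi_gray, 100, 200)
    tpl_edges = cv2.Canny(template_gray, 100, 200)

    # Normalised correlation-coefficient matching, cf. Eqs. (4)-(6).
    result = cv2.matchTemplate(roi_edges, tpl_edges, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)

    return max_val, max_val > threshold
```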
This section elaborates on the experiments conducted and the dataset preparation
process. The metrics used to evaluate performance have been explained briefly.
After the segmentation, the foremost step was to select a template image for ROI
detection as well as tonsil gland identification. Testing through various template images
for Region of Interest, a template image was selected that was converted to grayscale
and different image transforms like Canny, Histogram of Gradients (HOG) were
applied. The best results were obtained with a combination of Canny Edge on the
grayscale template, followed by HOG. The same transforms are carried out on the input
image on which the template matching is performed. A key advantage of this approach is
that the proposed method could identify images belonging to both classes, i.e., cases in
which a patient is infected with Pharyngitis as well as Tonsillitis, so multi-class
classification became possible.
Confusion Matrix
The confusion matrix is used to derive all the other metrics described henceforth. It
tabulates all the possible predictions that the model can make for a particular input
(Table 1).
Sensitivity
This metric measures the proportion of the correctly predicted positives to the actual
positives. This metric is especially useful to determine the model's performance on
cases pertaining to the infectious samples. It is denoted as given in Eq. (7).
$\text{Sensitivity} = \dfrac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$   (7)
Specificity
This metric gives the proportion of correctly predicted negatives to the total number of
negatives. This metric is especially useful to determine the model's performance on
cases pertaining to the normal samples. It is denoted as given in Eq. (8).
$\text{Specificity} = \dfrac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}$   (8)
$\text{PPV} = \dfrac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$   (9)
$\text{NPV} = \dfrac{\text{True Negative}}{\text{True Negative} + \text{False Negative}}$   (10)
$\text{FPR} = \dfrac{\text{False Positive}}{\text{False Positive} + \text{True Negative}}$   (11)
$\text{FNR} = \dfrac{\text{False Negatives}}{\text{False Negatives} + \text{True Positives}}$   (12)
Accuracy
It is the overall accuracy of the system, a ratio of samples identified correctly to the
total number of samples. Let TP denote True Positive, TN True Negative, FP False
Positive, and FN False Negative. Accuracy may be denoted as
in Eq. (13).
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN}$   (13)
F1 Score
F1 score is the harmonic mean of precision and recall and is one of the indicators that
give the overall performance of a model. It may be denoted as in Eq. (14).
$\text{F1 Score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$   (14)
$\text{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FN)(TN + FP)}}$   (15)
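For reference, all the metrics of Eqs. (7)–(15) can be derived directly from the four confusion-matrix counts; a small sketch is given below in plain Python, with no assumptions beyond the counts themselves.

```python
def screening_metrics(tp, tn, fp, fn):
    """Derive the evaluation metrics of Eqs. (7)-(15) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                      # Eq. (7)
    specificity = tn / (tn + fp)                      # Eq. (8)
    ppv = tp / (tp + fp)                              # Eq. (9)
    npv = tn / (tn + fn)                              # Eq. (10)
    fpr = fp / (fp + tn)                              # Eq. (11)
    fnr = fn / (fn + tp)                              # Eq. (12)
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # Eq. (13)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # Eq. (14); precision = PPV, recall = sensitivity
    mcc = (tp * tn - fp * fn) / (
        ((tp + fp) * (tp + fn) * (tn + fn) * (tn + fp)) ** 0.5
    )                                                 # Eq. (15)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "PPV": ppv, "NPV": npv, "FPR": fpr, "FNR": fnr,
        "accuracy": accuracy, "F1": f1, "MCC": mcc,
    }
```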
5 Conclusion
This paper proposes a unique method for early detection and mass screening of
potential coronavirus patients based on Tonsillitis symptoms. The proposed technique
aims to identify the symptoms of Pharyngitis and Tonsillitis in the throat that occur in
the early stages of coronavirus infection. Detection of the Region of Interest is the pivotal
step in the identification of Tonsillitis and Pharyngitis, and it is accomplished using a
Template Matching algorithm. HSV segmentation on the region around the uvula is
performed for Pharyngitis detection. Template matching using a tonsil-gland template,
followed by Canny and HOG transforms on the ROI-cropped image, is performed for
Tonsillitis detection. The proposed method can be implemented on the power-efficient,
resource-constrained, and cost-effective NVIDIA Jetson Nano embedded platform.
Quantitative and qualitative evaluations show promising results and indicate the
reliability of the proposed method. Future improvements can be achieved by using a
better dataset and by addressing the uneven distribution of dataset sources.
References
1. https://www.who.int/emergencies/diseases/novel-coronavirus-2019. Accessed 15 June 2020
2. Tan, L., Wang, Q., Zhang, D., et al.: Lymphopenia predicts disease severity of COVID-19: a
descriptive and predictive study. Signal Transduct. Target. Ther. 5, 33 (2020). https://doi.
org/10.1038/s41392-020-0148-4
3. Wang, L., Lin, Z.Q., Wong, A.: COVID-Net: a tailored deep convolutional neural network
design for detection of COVID-19 cases from chest X-ray images. arXiv:2003.09871 (2020)
4. Oh, Y., Park, S., Ye, J.C.: Deep learning COVID-19 features on CXR using limited training
data sets. IEEE Trans. Med. Imaging 39, 2688–2700 (2020). https://doi.org/10.1109/TMI.
2020.2993291
5. Fan, D., et al.: Inf-net: automatic COVID-19 lung infection segmentation from CT images.
IEEE Trans. Med. Imaging 39, 2626–2637 (2020). https://doi.org/10.1109/TMI.2020.
2996645
Abstract. The 2019 coronavirus pandemic which started infecting the people
of Wuhan, China during December 2019 has affected many countries world-
wide within a span of 4–5 months. This has forced the countries to close their
borders resulting in a global lockdown. The World Health Organization declared
the disease as a pandemic during early March this year. As of 15th April 2020,
nearly 3 months since the spread of the disease, no vaccine has been developed
and preventive measures such as social distancing and countrywide lockdown
seem to be the only way to prevent it from spreading further. The rising death
toll indicates the need to carry out extensive research to aid medical practitioners
as well as the governments worldwide to comprehend the rapid spread of the
disease. While many research papers have been published explaining the origin
and theoretical background of the disease, further research is needed to develop
better prediction models. The data for the problem was generated from the
sources available during the course of this study. This paper extensively ana-
lyzes the medical features of 269 patients using various Machine Learning
techniques such as KNN, Random Forest, Ridge classifier, Decision Tree,
Support Vector Classifier and Logistic Regression. The paper aims to predict the
fatality status of an individual diagnosed with COVID-19 by assessing various
factors including age, symptoms, etc. The experimental results from the research
would help medical practitioners identify the patients who are at higher risk and
require extra medical attention, thereby helping them prioritize these patients and
increase their chances of survival.
1 Introduction
market tested positive indicating the origin of the virus. On 31st December 2019, the
Government of China alerted the World Health Organization upon which the Huanan
market was ordered to be closed until further investigation. On 7th January 2020, the
cause of the outbreak was identified by the Chinese Center for Disease Control and
Prevention (CDC) which was later named 2019-nCoV by the World Health Organi-
sation. It emerged as a coronavirus that shared more than 95% homology with the bat
coronavirus and more than 70% homology with SARS-CoV. As of date, 6 coronavirus
species are known to exist which cause human disease, 4 of which i.e. 229E, OC43,
NL63, and HKU1 cause common cold symptoms. During late January this year, people
worldwide were returning to their hometowns after visiting Wuhan to celebrate the
Chinese New Year. This accounts for one of the largest travel activities worldwide.
Soon people worldwide started getting infected and this resulted in the outbreak
becoming a pandemic, as termed by the World Health Organisation during early March
this year. The number of cases has increased exponentially over the past few months
which suggests the occurrence of human-to-human transmission. The hospital-related
transmission was identified when clusters of doctors or patients were found to be
infected at a particular point of time and a potential source of infection could be tracked
[4]. Studying various patients globally, the early symptoms shown by the patients
include fever, difficulty in breathing, dry cough, diarrhea, and fatigue [6]. This is
followed by the gradual development of dyspnea. In severe cases, patients experience
inflammatory shocks, septic shock, MODS, etc. leading to death [6]. The uncontrolled
spread of the disease can be attributed to the respiratory and extra respiratory routes. As
of 15th April 2020, a total number of 1,914,916 people have been infected by COVID-
19 with 123,010 deaths. The mortality rate is 3.4% as estimated by the World Health
Organization and the incubation time for this disease is 1–14 days.
1.2 Prevention
The disease is known to transmit through large droplets released when a person coughs or
sneezes [9]. The person transmitting the infection may be asymptomatic as well, making
the spread of this disease difficult to contain [9]. Studies are indicative of the presence of
higher viral loads in the nasal cavity as compared to the throat, making it imperative for
the person to cover his/her mouth or nose when traveling in public places [12]. Many
prevention techniques have been employed by various nations which include restricted
movement of people, preventing large gatherings, scanning suspected individuals,
quarantining patients with a travel history and in many cases, country-wide lockdown.
The best possible preventive measures that exist as of 15th April 2020 include self-
sanitation with special emphasis on hand sanitation and not touching one’s face.
1.3 Treatment
To date, no effective vaccine or therapy has been developed for COVID-19. According
to the guidelines laid by the National Health Committee, patients are currently
receiving oxygen therapies such as a nasal catheter, mask oxygen and high-flow
oxygen therapy [6]. Treatment methodology is dependent on the severity of the case.
Many factors such as symptomatic treatment, prevention of complications and
secondary infections, play a key role in determining respiratory and circulatory reports
[6]. Lopinavir and ritonavir are reported to have the potential to treat SARS infections
and may be a contributing factor in treating patients infected with COVID-19. In cases
where the patient experienced reduced lymphocyte count and thrombocytopenia,
symptomatic treatment was given [6]. It is imperative that drugs and vaccines are
developed as soon as possible. Currently, the usage of intravenous immunoglobulin has
also been recommended to improve the ability of anti-infection for extreme cases. The
use of steroids (methylprednisolone 1–2 mg/kg per day) has been recommended for
patients suffering from ARDS. Overall, no particular treatment has been recommended
for coronavirus infection except for meticulous supportive care.
To date, there have been no reports of any methodology that distinguishes patients
requiring immediate medical attention or predicts their fatality status. Therefore,
identifying the patients at imminent risk remains difficult.
2 Literature Review
Fong, S. J., Li, G., Dey, N., Crespo, R. G., & Herrera-Viedma, E. (2020). Finding
an Accurate Early Forecasting Model from Small Dataset: A Case of 2019-nCoV
Novel Coronavirus Outbreak. arXiv preprint arXiv:2003.10776. The authors
propose a methodology that encompasses the process of augmenting the existing data,
distinguishing the best forecasting model from other models, and improving the model
to achieve higher accuracy. The newly constructed algorithm namely polynomial
neural network with corrective feedback (PNN + cf) is capable of forecasting the
disease with decreased prediction error.
Zhang, N., Zhang, R., Yao, H., Xu, H., Duan, M., Xie, T.,… & Zhou, F. (2020).
Severity Detection For the Coronavirus Disease 2019 (COVID-19) Patients Using
a Machine Learning Model Based on the Blood and Urine Tests. The paper studied
the application of machine learning algorithms to build the COVID-19 severity
detection model. It showed that the Support Vector Machine (SVM) achieved an
accuracy of 0.8148 on the test dataset. The study included data of 137 patients iden-
tified with COVID-19. This study included the binary classification problem between
75 severely ill and 62 other patients with mild symptoms.
Jia, L., Li, K., Jiang, Y., & Guo, X. (2020). Prediction and Analysis of Coron-
avirus Disease 2019. arXiv preprint arXiv:2003.05447. This paper applied the
Logistic model, Bertalanffy model and Gompertz model. The validity of the models
was proved by fitting and analyzing the trends of SARS. In general, the Logistic model
proved to be the best model in this study. The paper analyzed the number of people
infected in Wuhan, non-Hubei and China.
3 Methodology
Machine Learning Techniques
A number of machine learning algorithms are being used by researchers worldwide to
enhance the accuracy of their findings and results. One of the widely used classification
techniques is the k-Nearest Neighbor (k-NN) classifier where the class is determined by
the nearest neighbors. It is also known as Memory-based Classification based on the
requirement of training examples in memory at run-time. Also, since this technique is
based on training examples, it is also known as Example-based Classification. The
selection criteria for k nearest neighbors are dependent on the distance metric. The widely
used approach to determine the class of a query is to assign it the majority class among its nearest neighbors.
$\text{Distance} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$   (Distance calculation in KNN)
In a tree classifier, unwanted computations are eliminated since it uses only particular
subsets against which the sample is tested. This is in contrast with traditional single-
stage classifiers. The decision tree classifier is based on a multistage decision-making
approach. It works on the approach of breaking a complex decision into a culmination
of various simpler decisions with the goal that the final result would resemble the
desired result. The building of a Decision Tree involves the calculation of entropy
using the frequency of one attribute and two attributes, both.
$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$   (Mathematical formula for entropy)
Another classifier, known as the Random Forest classifier, is widely used in machine
learning applications. The classifier constructs each tree by utilizing a different boot-
strap sample of the dataset. In standard trees, each node is split using the best split among
all variables, whereas in Random Forest, node splitting is carried out using the best among
a subset of predictors chosen randomly at that node. This classifier is known to perform
better than many existing classification techniques. Random Forest classifier also
involves the use of Gini Impurity which is defined as the probability of a chosen sample
being incorrectly labeled if it was labeled by the distribution of samples in the node.
$I_G(n) = 1 - \sum_{i=1}^{J} (p_i)^2$   (Gini impurity of a node $n$)
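A small NumPy sketch of the three quantities above (the k-NN distance, entropy, and Gini impurity) is given below for reference; the variable names are illustrative and not taken from the paper.

```python
import numpy as np

def euclidean_distance(x, y):
    """k-NN distance between a query x and a stored example y."""
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def entropy(class_probs):
    """Shannon entropy E(S) of a node's class distribution."""
    p = np.asarray(class_probs, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is treated as 0
    return -np.sum(p * np.log2(p))

def gini_impurity(class_probs):
    """Gini impurity I_G(n) used when growing Random Forest trees."""
    p = np.asarray(class_probs, dtype=float)
    return 1.0 - np.sum(p ** 2)
```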
Accuracy measures the proportion of samples identified correctly out of the total number
of samples. It is given by:
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$
Precision is used to quantify the number of positive class predictions that belong to
positive cases. It is given by:
$\text{Precision} = \dfrac{TP}{TP + FP}$
Recall quantifies the proportion of actual positive cases that are correctly predicted. It is
given by:
$\text{Recall} = \dfrac{TP}{TP + FN}$
F1 Score takes both false positives and false negatives into account. It is given by:
$\text{F1 Score} = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
Data Source
The data samples used in this study were collected from Kaggle source (link: https://
www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset?select=COVID19
_line_list_data.csv) and they were further processed to generate a new dataset. Initially,
the primary dataset was filtered in MS Excel to obtain the required columns and row
entries. Data entries containing the description of the symptoms in the ‘symptom’
column were chosen. This was followed by the removal of unwanted features, con-
sidering the primary purpose of this study. The features which were irrelevant for our
analysis were removed. All the features were assigned the value ‘1’ in case they were
present in the clinical features of a patient and ‘0’ wherever they were absent. For
example, the value ‘1’ was assigned to the column named ‘cough’ if the particular
patient showed cough as one of the symptoms, otherwise 0. The feature ‘Days taken by
the doctors to confirm’ was obtained by the difference between symptom onset date and
reporting date. The value assigned to ‘male’ and ‘female’ in the ‘Gender’ column was
‘1’ and ‘0’ respectively. The dataset was further split into training and testing data into
two separate CSV files on the basis of the attributes -‘death’ and ‘recovered’. The
training dataset contained the feature named ‘death status’ of the patient. It was given
the value ‘1’ if the patient had died. Another feature named ‘recovery status’ was
created wherein the patients who recovered were given the value ‘1’. These two col-
umns were merged into one column named ‘fatality status’ which was given ‘1’ if the
person had died and ‘0’ if the person had recovered. The previous two attributes were
removed. There were 50 row entries and 24 attributes in our training dataset. The test
dataset contained the entries for which the ‘death’ and ‘recovered’ attributes both were
‘0’ implying that those patients were under-recovery and their death status was not
declared; hence we needed to predict the death status for these entries. There were
219 row entries and 24 attributes in our test dataset. All the attributes contained in the
dataset are explained below in the table (Table 1):
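A hedged pandas sketch of this dataset preparation is given below; the raw column names other than those quoted in the text ('symptom', 'death', 'recovered', 'gender') and the exact symptom list are assumptions about the Kaggle line-list file, not a reproduction of the authors' exact script.

```python
import pandas as pd

# Assumed raw file name and column names from the Kaggle line-list dataset.
raw = pd.read_csv("COVID19_line_list_data.csv")
raw = raw[raw["symptom"].notna()]          # keep rows whose 'symptom' column is filled

# Binary symptom flags: 1 if the symptom appears in the description, else 0.
for s in ["fever", "cough", "malaise", "sputum", "dyspnea", "diarrhea"]:
    raw[s] = raw["symptom"].str.contains(s, case=False).astype(int)

# 'Days taken by the doctors to confirm' = reporting date - symptom onset date
# (column names here are assumptions).
raw["days_to_confirm"] = (
    pd.to_datetime(raw["reporting date"]) - pd.to_datetime(raw["symptom_onset"])
).dt.days

raw["gender"] = raw["gender"].map({"male": 1, "female": 0})

# Training rows: outcome known; fatality status = 1 (died) or 0 (recovered).
train = raw[(raw["death"] == 1) | (raw["recovered"] == 1)].copy()
train["fatality status"] = train["death"].astype(int)

# Test rows: both outcome flags are 0, i.e. the death status is still to be predicted.
test = raw[(raw["death"] == 0) & (raw["recovered"] == 0)].copy()

train.to_csv("Train.csv", index=False)
test.to_csv("Test.csv", index=False)
```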
Method
The prediction model is primarily based on Machine Learning techniques involving
classifiers. Initially, libraries such as Pandas, NumPy and TensorFlow were used.
Train.csv (containing the training data) was read into a Pandas DataFrame named 'train'
and Test.csv (containing the testing data) was read into a Pandas DataFrame named 'test'.
A Pandas Series named ‘Y’ stored the label fatality status for training data. An array
named ‘X’ contained all the training data except the fatality status attribute. An array
named ‘X_test’ contained all the testing data except the death status attribute (which is
to be predicted). There were missing values in X and X_test which were imputed with
the mean along each column using SimpleImputer. From X, a cross-validation set was
prepared using train_test_split, with the cross-validation set comprising 0.3 (30%) of X
and the remaining 0.7 (70%) of X treated as the training set thereafter. After this,
several ML classifiers were fit on the training data. The metrics used for comparison
between the classifiers and choosing the best option for prediction of the fatality status
for the testing data were accuracy, precision, recall and F1 score of the cross-validation
set.
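A minimal sketch of this pipeline is shown below; the column names and the random_state value are assumptions, since they are not stated in the text.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

train = pd.read_csv("Train.csv")
test = pd.read_csv("Test.csv")

Y = train["fatality status"]                       # labels for the training data
X = train.drop(columns=["fatality status"]).values
X_test = test.values                               # attributes whose outcome is to be predicted

# Missing values are imputed with the per-column mean, as described above.
imputer = SimpleImputer(strategy="mean")
X = imputer.fit_transform(X)
X_test = imputer.transform(X_test)

# 70% of X for training, 30% held out as the cross-validation set.
X_train, X_cv, y_train, y_cv = train_test_split(X, Y, test_size=0.3, random_state=42)
```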
Certain classifier parameters were set as follows (see the sketch after this list):
KNN: n_neighbors = 5,
Logistic regression: C = 2,
Support Vector Classifier: kernel = 'linear', C = 1,
Decision Tree: random_state = 0,
Random Forest: random_state = 1,
Ridge Classifier: no change, sticking to the default values.
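Continuing the sketch above, the listed settings could be reproduced in scikit-learn roughly as follows; this is an illustrative analogue, not the authors' exact code.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(C=2),
    "SVC": SVC(kernel="linear", C=1),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Ridge": RidgeClassifier(),                    # default parameters
}

# Fit each classifier on the training split and score it on the cross-validation set.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_cv)
    print(name,
          accuracy_score(y_cv, pred),
          precision_score(y_cv, pred),
          recall_score(y_cv, pred),
          f1_score(y_cv, pred))
```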
The study showed that KNN Machine Learning Algorithm could predict the fatality
status of a patient most accurately with a 100% performance. KNN showed the highest
precision, recall and F1 score thereby showing that the algorithm is the most reliable
one to predict the health of an individual. The model can thus be used to identify
patients who are at a higher risk of dying. This would allow medical practitioners to
optimize their efforts and time so as to save more patients. In such precarious times
when the shortage of medical staff and medical resources has become a contributing
factor in the death of large numbers of people, the application of this method would
help the medical practitioners to identify patients at higher risk so as to save more lives.
The accuracy, precision score, F1 score and Recall score for each Machine Learning
Algorithm are shown below (Table 2 and Fig. 1):
Male deaths accounted for 72.2% of the fatal cases, while female deaths accounted
for 27.77%. Cases where the hospital took more than 7 days (1 week) to confirm
whether the person was infected accounted for 66.67% of the total number of cases
(Table 3).
While conducting this study, the authors came across several findings that can help
medical practitioners adopt practices that would benefit the patients, thereby increasing
their chances of survival. Some of them are as follows:
The time taken by medical practitioners played a key role in the fatality status of
male patients of the age group of >70 years and hence they should be tested on a
priority basis. It is imperative for medical practitioners to take extra care while
checking multiple patients since patients of age < 30 years old can act as carriers of the
virus for patients of age >70 years who are at a higher risk of getting infected and not
recovering. This could lead to mass community transmission even when the patients
are being treated and tested simultaneously. The appointment of separate doctors for
testing of each age group should be practiced to minimize the chances of transmission
between young and old people. Male patients of age >45 years old experienced higher
chances of being infected with Malaise. Hence, medical resources used to cure
pneumonia need to be made available to this age group. Male patients of age >43 years
old had higher chances of being infected by Diarrhea. Hence, medical resources used to
cure pneumonia need to be made available to this age group. Female patients of age
>50 years old had higher chances of being infected by Sputum. Hence, medical
resources used to cure sputum need to be made available to this age group. Male
patients of age above 65 years old had higher chances of being infected by Dyspnea.
Hence, medical resources used to cure pneumonia need to be made available to this age
group. The immune system of a human body becomes weak with age and hence
doctors with age >40 should take extra precaution while treating and testing patients.
With the scarce number of ventilators, medical resources and healthy medical practi-
tioners worldwide, it is imperative for hospitals to strategize methods for the quick
recovery of the patients and the successful protection of the medical staff. This study
would help them to do the same.
Limitation of this Study
While our model achieves high performance, other models with higher performance
can be obtained with the increase in data available. The data available online is scarce
as compared to the information available from hospitals, which as of 15th April 2020,
cannot be accessed due to the rising risk of infection. Therefore, this study can be
further improved in the future if more symptom-centric data is made available by
various hospitals and organizations. We look forward to subsequent large-sample and
multi-centered studies.
6 Conclusion
The study primarily aims to predict the fatality status of a patient using Machine
learning algorithms. This would not only enable early intervention but it would also
play a key role in decreasing the mortality rate in patients who are at a higher risk of
dying due to COVID-19. This paper employed KNN as the technique showing higher
performance as compared to other algorithms in terms of F1 score, precision, recall and
accuracy. The experimental findings further revealed that Malaise, sputum, and dys-
pnea were the major symptoms apart from fever and cough. The model employed by
the paper is believed to prove beneficial in detecting high-risk patients. The key sig-
nificance of this study is the recommendations for medical practitioners which would
not only help them to optimize their efforts and medical resources towards the patients
but also be instrumental in decreasing the cases of community transmission. Although
the study employed a dataset of only 269 patients, this was the largest dataset on
symptoms available at the time of the study. COVID-19 is a worldwide pandemic and requires
further research to enable effective prevention and treatment.
References
1. Huang, C., et al.: Clinical features of patients infected with 2019 novel coronavirus in
Wuhan, China. Lancet 395(10223), 497–506 (2020)
2. Wax, R., Christian, M.: Practical recommendations for critical care and anesthesiology teams
caring for novel coronavirus (2019-nCoV) patients. Can. J. Anesth./J. canadien d’anesthésie
67(5), 568–576 (2020). https://doi.org/10.1007/s12630-020-01591-x
3. Chen, Z.-M., et al.: Diagnosis and treatment recommendations for pediatric respiratory
infection caused by the 2019 novel coronavirus. World J. Pediatrics 16(3), 240–246 (2020).
https://doi.org/10.1007/s12519-020-00345-5
4. Wang, D., et al.: Clinical characteristics of 138 hospitalized patients with 2019 novel
coronavirus–infected pneumonia in Wuhan, China. Jama 329, 1061–1069 (2020)
5. Holshue, M., et al.: First case of 2019 novel coronavirus in the United States. New Engl.
J. Med. 382(10), 929–936 (2020). https://doi.org/10.1056/NEJMoa2001191
6. Pan, Y., Guan, H.: Imaging changes in patients with 2019-nCov (2020)
7. Ruan, Q., Yang, K., Wang, W., Jiang, L., Song, J.: Clinical predictors of mortality due to
COVID-19 based on an analysis of data of 150 patients from Wuhan, China. Intensive Care
Med. 46, 846–848 (2020). https://doi.org/10.1007/s00134-020-05991-x
8. Wang, M., et al.: Remdesivir and chloroquine effectively inhibit the recently emerged novel
coronavirus (2019-nCoV) in vitro. Cell Res. 30(3), 269–271 (2020)
9. Corman, V.M., et al.: Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-
PCR. Eurosurveillance 25(3), 2000045 (2020)
10. Zhao, S., et al.: Preliminary estimation of the basic reproduction number of novel
coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early
phase of the outbreak. Int. J. Infect. Dis. 92, 214–217 (2020)
11. Gong, F., et al.: China’s local governments are combating COVID-19 with unprecedented
responses—from a Wenzhou governance perspective. Front. Med. 14(2), 220–224 (2020)
12. Sabino-Silva, R., Jardim, A.C.G., Siqueira, W.L.: Coronavirus COVID-19 impacts to
dentistry and potential salivary diagnosis. Clin. Oral Invest. 24(4), 1619–1621 (2020). https://
doi.org/10.1007/s00784-020-03248-x
13. Zhou, T., et al.: Preliminary prediction of the basic reproduction number of the Wuhan novel
coronavirus 2019‐nCoV. J. Evid.-Based Med. 13(1), 3–7 (2020)
14. Zhan, C., Tse, C., Fu, Y., Lai, Z., Zhang, H.: Modelling and prediction of the 2019
coronavirus disease spreading in china incorporating human migration data. Available at
SSRN 3546051 (2020)
15. Xie, J., Tong, Z., Guan, X., Du, B., Qiu, H., Slutsky, A.S.: Critical care crisis and some
recommendations during the COVID-19 epidemic in China. Intensive Care Med. 46(5),
837–840 (2020). https://doi.org/10.1007/s00134-020-05979-7
16. Cunningham, A.C., Goh, H.P., Koh, D.: Treatment of COVID-19: old tricks for new
challenges. Crit. Care 24, 91 (2020). https://doi.org/10.1186/s13054-020-2818-6
17. Arabi, Y.M., Murthy, S., Webb, S.: COVID-19: A novel coronavirus and a novel challenge
for critical care. Intensive Care Med. 46, 1–4 (2020). https://doi.org/10.1007/s00134-020-
05955-1
18. Ni, L., Zhou, L., Zhou, M., Zhao, J., Wang, D.: Combination of western medicine and
Chinese traditional patent medicine in treating a family case of COVID-19. Front. Med. 14
(2), 210–214 (2020). https://doi.org/10.1007/s11684-020-0757-x
19. Chung, M., et al.: CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology
295, 202–207 (2020)
20. Lai, C.-C., Shih, T.-P., Ko, W.-C., Tang, H.-J., Hsueh, P.-R.: Severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the
epidemic and the challenges. Int. J. Antimicrob. Agents 55(3), 105924 (2020). https://doi.
org/10.1016/j.ijantimicag.2020.105924
21. Backer, J., Klinkenberg, D., Wallinga, J.: Incubation period of 2019 novel coronavirus
(2019-nCoV) infections among travellers from Wuhan, China, 20–28 January 2020.
Eurosurveillance 25(5), 2000062 (2020). https://doi.org/10.2807/1560-7917.ES.2020.25.5.
2000062
22. Ji, L.-N., et al.: Clinical features of pediatric patients with COVID-19: a report of two family
cluster cases. World J. Pediatrics 16(3), 267–270 (2020). https://doi.org/10.1007/s12519-
020-00356-2
23. Salzberger, B., Glück, T., Ehrenstein, B.: Successful containment of COVID-19: the WHO-
Report on the COVID-19 outbreak in China. Infection 48(2), 151–153 (2020). https://doi.
org/10.1007/s15010-020-01409-4
24. Lu, H.: Drug treatment options for the 2019-new coronavirus (2019-nCoV). Biosci. Trends
14(1), 69–71 (2020)
25. Guo, Y.R., et al.: The origin, transmission and clinical therapies on coronavirus disease 2019
(COVID-19) outbreak–an update on the status. Military Medical Research 7(1), 1–10
(2020). https://doi.org/10.1186/s40779-020-00240-0
26. Wang, H., Wang, S., Kaijiang, Y.: COVID-19 infection epidemic: the medical management
strategies in Heilongjiang Province, China. Crit. Care 24(1), 107 (2020). https://doi.org/10.
1186/s13054-020-2832-8
27. Li, B., et al.: Prevalence and impact of cardiovascular metabolic diseases on COVID-19 in
China. Clin. Res. Cardiol. 109(5), 531–538 (2020). https://doi.org/10.1007/s00392-020-
01626-9
28. Riou, J., Althaus, C.: Pattern of early human-to-human transmission of Wuhan 2019 novel
coronavirus (2019-nCoV), December 2019 to January 2020. Eurosurveillance 25(4),
2000058 (2020)
29. World Health Organization: Novel Coronavirus (2019-nCoV): Situation Report, 3 (2020)
30. Wu, A., et al.: Genome composition and divergence of the novel coronavirus (2019-nCoV)
originating in China. Cell Host & Microbe 27, 325–328 (2020)
Evaluation of Image Filtering Parameters
for Plant Biometrics Improvement Using
Machine Learning
1 Introduction
Machine learning (ML) has continued to enjoy a wide range of patronage as a flexible
data mining tool and a repository of building-block learning algorithms for developing
skillful models focused on different domains of research interest. Its vast application in
the area of content-based image retrieval systems is noteworthy, as research works
continue to discover the leverage offered by this subcategory of artificial intelligence use
cases. Machine learning can be viewed as a way through which algorithm performance is
enhanced over time based on feedback, with two primary approaches: supervised and
unsupervised learning. In supervised learning, there exists a model of both input and
output training data such that the algorithm is executed until it arrives at the desired
output goal, and these algorithms fall into either the classification or the regression
category [1]. Classification algorithms classify input to match the desired output, while
regression analysis is a statistical method for approximating the relationships among
variables; both are used in predictive and forecasting tasks. Unsupervised machine
learning algorithms unravel structures from a dataset by clustering or grouping data
points to spot patterns. The descriptions of machine learning in the literature suggest the
study of mining information from a pool of data, thereby making it possible to learn
information from the data that was hitherto not obvious. ML's wide acceptance and high
applicability mean that the moral burden of explainability lies on the researcher to
enhance the transparency of a proposed ML model [2]. Transparency is believed to
involve efforts to offer stakeholders, especially end users, the necessary information
about the functionalities of a model; ways may include publishing the algorithm's code,
disclosing properties of the training method and, most importantly, the datasets used [3],
which further underscores the paramount relevance of the nature, scope, and composition
of the data deployed for predictive analytics. However, data available for mining could
be textual, numeric, or image-based, and ML algorithms target patterns encapsulated
within the data by training on a subset of it; classification follows based on the initial
pattern-seeking training and the correlation observed between independent data attributes
and the dependent output class category [4].
While several kinds of elements continue to serve as datasets for machine learning tasks,
images, sometimes referred to as pictures, have been objects of data mining in research
tasks in the areas of biometrics [5], identity management [6], image processing [4],
medical data mining [7], handwriting detection [8], and agriculture-related research
works [9]. Notwithstanding the aforementioned, image preprocessing, segmentation,
filtering, etc. come before classification for a seamless ML task; these steps yield image
attributes that form the numeric composition of the feature metadata, and these attributes
are transformed into applicable data types to form the pool of instances deployed for the
predictive analytics of the intended research interest. A single image filter can be used on
the set of images, or a group approach, all targeted at better output performance. The
additional filters applied, if more than one, add new numeric data to each instance of the
dataset, which eventually influences the classification learning that precedes the testing
phase. The collection of image filtering algorithms available for plant biometrics, with
their unique parameters, characteristics, and target features, and the variable nature of
image datasets for machine learning tasks, motivated this work in a bid to establish best
practices for enhanced classification.
The aim of this paper, therefore, is to determine the effect of combining filters for
processing an image dataset, the importance of the number of feature attributes extracted
from images, the size of each image instance, the type of features extracted by filters, and
the size of the dataset itself, through the deployment of the synthetic minority
oversampling technique (SMOTE), for enhanced performance accuracy metrics. The
Waikato Environment for Knowledge Analysis (WEKA) tool was used for the
preprocessing and classification phases of the proposed model. The organization of the
rest of the paper is as follows: Sect. 2 reviews several image filtering and preprocessing
works that offer models related to this study, while Sect. 3 describes the methodology
deployed in this research. Section 4 discusses the results, and Sect. 5 concludes the work
with recommendations and possible future directions.
2 Related Works
The sources of image datasets deployed for filtering processes differ and are a germane
factor in the overall aim of having a reliable classification process of training and testing.
Sources include online image laboratories [10–12], modified copies of field images
targeting spots of interest across images to suit purpose and intent [13, 14], and field
images taken directly from the use case or natural habitat [15, 16]. The online laboratory
images are often already categorized along use cases to cater for unique application
domains or areas of interest [17], while the field images are best suited for image
identification and return better classification results [15].
In an image classification study, [4] applied the random forest ML algorithm to
discover the effect of different filters on four distinct categories of image datasets and to
determine the role played by the category of image dataset on output accuracy. The
study found that filter combinations performed better than a single-filter approach,
while filter combinations on artworks returned a lower accuracy of 83.42% compared
to 99.76% on a natural image dataset. Morphological
features of images are often targets of image preprocessing or segmentation tasks while
experiments could center on greyscale or colored images depending on the purpose of
filtering but the work of [18] targets image contour disparities popularly characterized
by amoebas. Experiments on grayscale and color images demonstrate that amoeba
filters outdo orthodox morphological operations with a fixed structuring element for
noise reduction tasks. Tests on artificial 3D images show the high noise-reduction
capability of amoeba-based filters.
The authors of [19] opine that, of all the factors contributing to good image quality in
single-photon emission computed tomography (SPECT), image filtering is an impor-
tant, but mostly only partially applied, image-processing factor. An overview of currently
available SPECT filtering choices was presented and discussed in that work to offer guid-
ance on choice. It found that Hann and Butterworth filters provide precise estimates for
most filter types, while suggesting the use of a limited set of filter types in an effort to
regulate image-processing methods, which may lead to improved predictability. The work of
[20] describes new findings on the segmentation of textured gray-scale images through
pre-filtering of image inputs and fractal features. The work uses fractal-based features
which rely on textural characteristics rather than intensity elements; to reduce the number
of features deployed for segmentation, the significance of features is assessed using a test
similar to the F-test, while less significant features are excluded from the clustering
process. The authors used a variable window size to extend the K-means algorithm
while ensuring preservation of boundary details.
A preliminary study on four variables (augmentation method, augmentation rate, size of
the basic dataset per label, and method combination) was conducted by [21] to unravel
the influence of these variables on deep learning tasks, noting that the combination of two
geometric methods reduces performance, while combinations with at least one
photometric method improve performance, especially when pairing geometric and
photometric methods. The order of methods in a combination was found to have no
impact on performance. In the first half of their work, [22] propose fast algorithms for
bilateral image filtering (BLF) and nonlocal means (NLM) within the kernel-filtering
framework discussed in their work. They achieved state-of-the-art fast algorithms for
BLF of grayscale images and extended the idea to fast color filtering, which involves the
calculation of a three-dimensional kernel. The second half of the work introduces a scale-
adaptive variant of BLF used for suppressing fine textures in images and develops a swift
execution of a symmetrized variant of NLM that is used for regularization within the
plug-and-play framework for image restoration.
[23] proposes a computerized proactive discovery of rice plant diseases using
image processing and SVM and random forest classification algorithms. An 82.41%
performance accuracy was recorded by SVM which is far better than the random forest
classification algorithm. The classification of rice plants is the research focus of [24],
which converts videos of rice plants into images before preprocessing and subsequent
classification of the required morphological regions. Features extracted from the images
through multi-view 3D reconstruction serve as inputs to their dataset of 250 instances,
from which 18 color and 21 texture features were extracted. Researchers in [25] make use
of a CNN with image processing to determine the maturity level of Banana (Cavendish),
Mango (Carabao), and Calamansi/Calamondin, through categorization into pre-matured,
matured, and over-matured. The model is written in Spyder in Anaconda Navigator,
applying TensorFlow-GPU and Keras coupled with CUDA and cuDNN to process
the data and determine results. RGB and grayscale datasets were used for classification
returning a state-of-the-art accuracy in output.
The work of [26] concerns the use of image processing to assess the reabsorption
rate of implanted HA and b-TCP commercial granules in bone defects after
granuloma removal in oral and maxillofacial surgery. Cone-beam computed tomog-
raphy measurements were gathered at baseline and at 12 months. Results on 2D images
show good reabsorption of the granules. Edge detection in road markings during
image preprocessing and segmentation was deployed in the work of [27] to aid
automatic patrol robots to identify road boundaries without errors. The experimental
results show that the proposed method is efficient in reducing the edge features of road
marking lines, while [28] deployed superpixel segmentation for image preprocessing in
an attempt to diagnose age-related macular degeneration (AMD), a foremost retinal
disease that causes vision loss. Digital image preprocessing and other ML algorithms
were deployed in that work to diagnose AMD from fundus images. The fundus images
went through intensity alteration and bilateral filtering, followed by optic disc removal
and superpixel segmentation using Simple Linear Iterative Clustering. Intensity-based
statistics and Texton-map histogram attributes were extracted from the images for sub-
sequent classification by the chosen ML algorithms. The work of [29] is an image
retrieval system premised on multiple-feature fusion. The multi-attribute approach first
combines the global and local image features by setting weights differently, and then
performs retrieval based on adaptive coefficients of the fused features. Results show that
single-feature retrieval trails the authors' model, which likewise improves query
proficiency with an impressive P-R curve and good application value. In [30], an
overview of machine learning theories and algorithms describes the expertise inherent in
the machine learning subfield of artificial intelligence while establishing the truism of
machine learning being the future of technological and scientific advancement. The paper
describes the 'what', 'how' and 'why' of its applications. There are several other
related works available in the literature [31–34].
Technical Gaps in Literature: Two major technical gaps prominent in most of the litera-
ture consulted are (1) the failure to account for the bias of classification results when an
imbalanced-class dataset is deployed for the training phase of predictive analytics, which
inadvertently affects the predictive result; and (2) the need to unravel the implication of
mixing filters of different classes (intra-filtering) for the preprocessing of image datasets,
as against the adoption of related filters (inter-filtering) deployed in the literature. These
gaps led to the design of the proposed work for subsequent intervention. Intervention:
Oversampling of minority classes is adopted to reduce the level of classification bias,
while both inter-filtering and intra-filtering approaches were conducted to compare
accuracy outcomes, as reported for gap (2).
3 Methodology
Four phases define the scope of this work: plant image capturing and categorization,
image preprocessing and filtering, synthetic minority oversampling technique (SMOTE)
application, and machine learning classification. The output of each phase directly serves
as the input to the immediately subsequent phase, and the pipeline is described by the
activity diagram in Fig. 1.
Fig. 2. Imbalanced class distribution of image instances across the three datasets
each 109 instances across datasets, while noting that the combined filtering abilities and
number of attributes add up for the combined categories. Figure 6 is the arff file of the
dataset after filtering, showing the numeric values of the extracted features. The filter
combinations adopted gave the best results, even though the results of the others are
close enough for adoption, meaning the image filters deployed in WEKA are highly
efficient, without significant inferiority, in the processing of the plant images chosen for
this work. The application order is also insignificant because, with each filter, WEKA
extracts unique numeric data as attributes for each instance before the machine learning
classification phase.
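For readers who prefer to prepare such files outside WEKA, the sketch below shows how filter-extracted numeric attributes and class labels can be written to an ARFF file by hand; the relation and attribute names are illustrative, not the ones generated by WEKA.

```python
def write_arff(path, relation, feature_matrix, labels, class_names):
    """Write filter-extracted numeric attributes and class labels to an ARFF file.

    feature_matrix: list of equal-length numeric rows (one per image instance);
    labels: the plant class of each row. Names here are illustrative.
    """
    n_attrs = len(feature_matrix[0])
    with open(path, "w") as f:
        f.write("@RELATION %s\n\n" % relation)
        # One numeric attribute per value extracted by the image filter(s).
        for i in range(n_attrs):
            f.write("@ATTRIBUTE filter_attr_%d NUMERIC\n" % i)
        f.write("@ATTRIBUTE class {%s}\n\n@DATA\n" % ",".join(class_names))
        for row, label in zip(feature_matrix, labels):
            f.write(",".join(str(v) for v in row) + "," + label + "\n")
```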
neighbors in a highly efficient manner, inheriting the exact characteristic nature of
the parent minority samples. The new synthetic data generated along the line connecting
each minority sample scales up the minority class with the new synthetic versions, as
captured in Fig. 6 below. Upon SMOTE application, the machine learning classification
phase ensues to determine the experimental result.
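A hedged Python analogue of this oversampling step, using the SMOTE implementation in imbalanced-learn rather than WEKA's filter, is sketched below; X and y stand for the filter-extracted attributes and plant class labels, and k_neighbors = 5 is an assumed setting.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# X: numeric attributes extracted by the image filters (one row per image instance)
# y: plant class labels with under-represented minority classes
print("before:", Counter(y))

# Each synthetic minority sample is interpolated on the line between a minority
# instance and one of its k nearest minority neighbours (k = 5 here, an assumption).
X_resampled, y_resampled = SMOTE(k_neighbors=5, random_state=1).fit_resample(X, y)

print("after:", Counter(y_resampled))
```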
Upon successful image filtering sessions on DS_1, DS_2 and DS_3, a corresponding
seven versions of the arff file were generated, including Arf_1, Arf_2, Arf_3, Arf_4 and
Arf_5 from the single filters and CArf_1 and CArf_2 from the combined filtering tasks.
The seven arff files were deployed for the subsequent machine learning classification
using the Sequential Minimal Optimization (SMO) function algorithm, an improved
training procedure for SVM designed to solve SVM's large quadratic programming
problem. SVM is widely used in image classification predictive analytics in the
literature, and with the improved SMO, classification is expected to return improved
accuracy by breaking the optimization problem into a series of smaller sub-problems,
which are then solved methodically.
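As a rough analogue of the WEKA SMO classification step, a linear-kernel SVC from scikit-learn (whose libsvm backend also uses an SMO-style solver) could be evaluated on the balanced data as sketched below; the 10-fold cross-validation is an assumption, not necessarily the exact evaluation protocol of this work.

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# A linear-kernel SVM stands in here for WEKA's SMO function classifier.
smo_like = SVC(kernel="linear")

# 10-fold cross-validated accuracy on the SMOTE-balanced attributes and labels.
scores = cross_val_score(smo_like, X_resampled, y_resampled, cv=10)
print("10-fold accuracy: %.4f" % scores.mean())
```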
With the seven arff files generated prior to classification by SMO, they were presented
for oversampling of the minority classes. The resulting synthetic additions for class
balancing through SMOTE application on DS_1 are presented in Fig. 7, which increases
the representation of the minority classes immediately after the 109th instance, while
Fig. 8 shows the projection plots of the three SMOTE iterations on DS_1, which
improved the minority classes in the process. Table 2 shows the performance accuracy
of SMO machine learning on Arf_1, Arf_2 and CArf_1, which seeks to discover the
combinatorial advantage or otherwise of filters and the influence of attribute size.
Table 3 shows the effect of dataset nature, the effect of the kind of features extracted by
a filter, and the combinatorial effect, and Table 4 shows performance on a dataset of
bigger size to discover the impact of image sizes on classification accuracy. Having
recorded the same performance accuracy figures on a larger dataset (474 instances), the
graphs of the weighted averages of Precision, Recall, F-measure and ROC area for the
coloured (DS_1) and grayscale (DS_2) datasets are presented in Fig. 9 for further
clarification, while the confusion matrix of DS_1 by F_3 is presented in Table 5,
showing the number of correct and incorrect predictions.
Fig. 7. arff_1 file showing added synthetic instances of the minority classes by SMOTE
Fig. 8. Projection plots of three SMOTE iterations on DS_1 showing synthetic additions for
class balancing
Table 2. Classification performance of filters over attribute size and combinatorial effect

S/N  Filter_id  Dataset_id  Arff_id  Size of Attr.   Performance (%) by instance size
                                                     109 (initial)  134       244       314
1.   F_1        DS_1        Arf_1    1024            66.9725        73.1343   88.9344   91.4013
2.   F_2        DS_1        Arf_2    192             50.4587        62.6866   83.1967   87.2611
3.   CF_1       DS_1        CArf_1   1216            68.8073        74.6269   88.9344   92.6752

(Instance sizes 134, 244 and 314 correspond to the successive SMOTE iterations.)
Table 3. Classification performance over dataset nature, filter features, and filter combination

S/N  Filter_id  Dataset_id  Arff_id  Attr. size   Performance (%) by instance size
                                                  109 (initial)  134       244       314       474
1.   F_3        DS_2        Arf_4    756          38.5321        52.9851   76.2295   84.7134   90.0844
2.   F_3        DS_1        Arf_3    756          33.0275        49.2537   77.8689   83.4395   90.0844
3.   F_3        DS_1        Arf_3    756          33.0275        49.2537   77.8689   83.4395   90.0844
3b   F_1        DS_1        Arf_1    1024         66.9725        73.1343   73.1343   88.9344   91.4013
4.   CF_2       DS_1        CArf_2   1780         70.6422        74.6269   88.1148   92.0382   95.1477

(Instance sizes 134, 244, 314 and 474 correspond to the successive SMOTE iterations.)
= Confusion Matrix =
a b c d e f g h i j k l m <-- classified as
4 1 0 0 1 1 0 0 0 0 3 0 0 | a = bitter_leaf
0 66 0 0 0 0 0 0 0 0 0 0 0 | b = akintola
2 0 4 0 2 0 0 1 0 0 0 1 0 | c = sand_paper
0 0 0 30 0 0 0 0 0 0 0 0 0 | d = cassava
2 1 0 0 5 2 0 0 0 0 0 0 0 | e = pawpaw
0 0 0 1 2 1 2 0 2 1 1 0 0 | f = cashew
1 1 0 1 0 1 5 0 1 0 0 0 0 | g = pineapple
0 0 3 0 1 0 0 4 1 1 0 0 0 | h = scent
0 0 0 0 0 0 0 0 99 0 0 0 0 | i = lime
1 0 0 2 1 0 1 1 2 0 2 0 0 | j = plantain
0 0 0 0 0 0 0 0 0 0 77 0 0 | k = lemon
0 0 0 0 0 0 0 0 0 0 0 77 0 | l = orange
0 0 0 0 0 0 0 0 0 0 0 0 55 | m= green_grass
5 Conclusion
The tests intend to discover, through experimental results, the effect of combining filters
for processing an image dataset, the importance of the number of feature attributes
extracted from images, the size of each image instance, the type of features extracted by
filters, and the size of the dataset itself. Experimental results after the machine learning
phase by SMO reveal significant improvements when filters are combined, from
91.4013% accuracy to 95.1477%. Larger sets of extracted image attributes likewise
returned a 4.1402% improvement in one instance and a 1.3169% improvement in another
instance of classification. A 0.4219% improvement was recorded on bigger input image
sizes of 413 × 550 pixels over 338 × 450 pixels. Extracted features that align with the
nature of the dataset also improved classification accuracy compared to attributes with
less similarity to the characteristic nature of the image dataset. Across all classifications
and iterations, a bigger dataset size returned better classification accuracies. As future
work to this study, distinct datasets of different sources and textures are recommended
for further evaluation, as against the dataset of a single nature used in this work.
Acknowledgement. The authors profoundly express gratitude to Covenant University for her
support through CUCRID.
References
1. Mirza, S., Mittal, S., Zaman, M.: Decision support predictive model for prognosis of diabetes
using SMOTE and decision tree. Int. J. Appl. Eng. Res. 13(11), 9277–9292 (2018)
2. Bhatt, U., et al.: Explainable machine learning in deployment. In: Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency, pp. 648–657 (2020)
3. Mitchell, M., et al.: Model cards for model reporting. In: Proceedings of the Conference on
Fairness, Accountability, and Transparency, pp. 220–229. ACM (2019)
4. Abidin, D.: Effects of image filters on various image datasets. In: Proceedings of the 2019
5th International Conference on Computer and Technology Applications, pp. 1–5 (2019)
5. Greitans, M., Pudzs, M., Fuksis, R.: Palm vein biometrics based on infrared imaging and
complex matched filtering. In: Proceedings of the 12th ACM Workshop on Multimedia and
Security, pp. 101–106 (2010)
6. Khan, S., Javed, M.H., Ahmed, E., Shah, S.A., Ali, S.U.: Facial recognition using
convolutional neural networks and implementation on smart glasses. In: 2019 International
Conference on Information Science and Communication Technology (ICISCT), pp. 1–6.
IEEE (2019)
7. Alghamdi, M., Al-Mallah, M., Keteyian, S., Brawner, C., Ehrman, J., Sakr, S.: Predicting
diabetes mellitus using SMOTE and ensemble machine learning approach: the Henry Ford
ExercIse Testing (FIT) project. PLoS ONE 12(7), e0179805 (2017)
8. Kato, N., Suzuki, M., Omachi, S.I., Aso, H., Nemoto, Y.: A handwritten character
recognition system using directional element feature and asymmetric Mahalanobis distance.
IEEE Trans. Pattern Anal. Mach. Intell. 21(3), 258–262 (1999)
9. Zahara, M.: Identification of morphological and stomatal characteristics of Zingiberaceae as
medicinal plants in Banda Aceh, Indonesia. In: IOP Conference Series: Earth and
Environmental Science, vol. 425, no. 1, p. 012046. IOP Publishing (January 2020)
10. Wang, Z., Chi, Z., Feng, D.: Shape based leaf image retrieval. IEE Proc.-Vis. Image Signal
Process. 150(1), 34–43 (2003)
11. Mokhtar, U., Ali, M.A.S., Hassanien, A.E., Hefny, H.: Identifying two of tomatoes leaf
viruses using support vector machine. In: Mandal, J.K., Satapathy, S.C., Sanyal, M.K.,
Sarkar, P.P., Mukhopadhyay, A. (eds.) Information Systems Design and Intelligent
Applications. AISC, vol. 339, pp. 771–782. Springer, New Delhi (2015). https://doi.org/
10.1007/978-81-322-2250-7_77
12. Prasvita, D.S., Herdiyeni, Y.: Medleaf: mobile application for medicinal plant identification
based on leaf image. Int. J. Adv. Sci. Eng. Inf. Technol. 3(2), 5–8 (2013)
13. Kumar, M., Gupta, S., Gao, X.Z., Singh, A.: Plant species recognition using morphological
features and adaptive boosting methodology. IEEE Access 7, 163912–163918 (2019)
14. Priyankara, H.C., Withanage, D.K.: Computer assisted plant identification system for
android. In: 2015 Moratuwa engineering research conference (MERCon), pp. 148–153.
IEEE (April 2015)
15. Thomas, V., Chawla, N.V., Bowyer, K.W., Flynn, P.J.: Learning to predict gender from iris
images. In: 2007 First IEEE International Conference on Biometrics: Theory, Applications,
and Systems, pp. 1–5. IEEE (September 2007)
16. Hossain, J., Amin, M.A.: Leaf shape identification based plant biometrics. In: 2010 13th
International Conference on Computer and Information Technology (ICCIT), pp. 458–463.
IEEE (December 2010)
17. Lee, S.H., Chang, Y.L., Chan, C.S., Remagnino, P.: Plant identification system based on a
convolutional neural network for the LifeClef 2016 plant classification task. In: CLEF
(Working Notes), pp. 502–510 (2016)
18. Lerallut, R., Decencière, É., Meyer, F.: Image filtering using morphological amoebas. Image
Vis. Comput. 25(4), 395–404 (2007)
19. Van Laere, K., Koole, M., Lemahieu, I., Dierckx, R.: Image filtering in single-photon
emission computed tomography: principles and applications. Comput. Med. Imaging Graph.
25(2), 127–133 (2001)
20. Kasparis, T., Charalampidis, D., Georgiopoulos, M., Rolland, J.: Segmentation of textured
images based on fractals and image filtering. Pattern Recognit. 34(10), 1963–1973 (2001)
21. Hu, B., Lei, C., Wang, D., Zhang, S., Chen, Z.: A preliminary study on data augmentation of
deep learning for image classification. arXiv preprint arXiv:1906.11887 (2019)
22. Ghosh, S.: Kernel-based image filtering: fast algorithms and applications. In: SIGGRAPH
Asia 2019 Doctoral Consortium, pp. 1–3 (2019)
23. Pascual, E.J.A.V., Plaza, J.M.J., Tesorero, J.L.L., De Goma, J.C.: Disease detection of Asian
rice (Oryza Sativa) in the Philippines using image processing. In: Proceedings of the 2nd
International Conference on Computing and Big Data, pp. 131–135 (October 2019)
24. Buenaventura, J.R.S., Kobayashi, J.T., Valles, L.M.P., De Goma, J.C., Balan, A.K.D.:
Classification of varietal type of Philippine rice grains using image processing through multi-
view 3D reconstruction. In: Proceedings of the 2nd International Conference on Computing
and Big Data, pp. 140–144 (October 2019)
25. Ayllon, M.A., Cruz, M.J., Mendoza, J.J., Tomas, M.C.: Detection of overall fruit maturity of
local fruits using convolutional neural networks through image processing. In: Proceedings of
the 2nd International Conference on Computing and Big Data, pp. 145–148 (October 2019)
26. El Byad, H., Hakim, S., Ezzahmouly, M., Ed-Dhahraouy, M., El Moutaouakkil, A., Hatim,
Z.: Application of image processing in dentistry: evaluation of bone regeneration. In:
Proceedings of the 4th International Conference on Big Data and Internet of Things, pp. 1–5
(October 2019)
27. Chen, S., Liu, Z.: An image processing method reducing road marking line edge features for
patrol robots identifying road boundaries. In: Proceedings of the 2019 3rd International
Conference on Advances in Image Processing, pp. 29–33 (November 2019)
28. De Goma, J.C., Binsol, O.J.D., Nadado, A.M.T., Casela III, J.P.A.: Age-related macular
degeneration detection through fundus image analysis using image processing techniques.
In: Proceedings of the 2019 3rd International Conference on Software and e-Business,
pp. 146–150 (December 2019)
29. Han, X., Li, Y., Zheng, Q., Huang, Y., Zhang, Z., He, Z.: A multiple feature fusion based
image retrieval algorithm. In: Proceedings of the 2019 8th International Conference on
Networks, Communication and Computing, pp. 103–108 (December 2019)
30. Alzubi, J., Nayyar, A., Kumar, A.: Machine learning from theories to algorithms: an overview.
In: Journal of Physics: Conference Series, vol. 1142, no. 1, p. 012012 (November 2018)
31. Temiatse, O.S., Misra, S., Dhawale, C., Ahuja, R., Matthews, V.: Image enhancement of
lemon grasses using image processing techniques (histogram equalization). In: Panda, B.,
Sharma, S., Roy, N. (eds.) Data Science and Analytics. REDSET 2017. Communications in
Computer and Information Science, vol. 799. Springer, Singapore (2018). https://doi.org/10.
1007/978-981-10-8527-7_24
32. Dhawale, C.A., Misra, S., Thakur, S., Jambhekar, N.D.: Analysis of nutritional deficiency in
citrus species tree leaf using image processing. In: 2016 International Conference on
Advances in Computing, Communications and Informatics (ICACCI), pp. 2248–2252. IEEE
(September 2016)
33. Patil, M.Y., Dhawale, C.A., Misra, S.: Analytical study of combined approaches to content
based image retrieval systems. Int. J. Pharm. Technol. 8(4), 22982–22995 (2016)
34. Singh, S.K., Dhawale, C., Misra, S.: Past, present and future research direction for
camouflage image detection. Int. J. Control Theory Appl. 9(22), 335–340 (2016)
Glaucoma Detection Using Features of Optic
Nerve Head, CDR and ISNT from Fundus
Image of Eye
Abstract. The human eye suffers from many diseases. If patients have peripheral,
partial or blurred vision, there are chances of Glaucoma. Glaucoma causes no pain
and shows no symptoms until noticeable vision loss occurs. Glaucoma is the
second leading cause of blindness. Retinal Fundus images (photographs of the
back of the eye, i.e. the fundus, taken using specialised fundus cameras) contain features
like the Optic Cup, Optic Disc, Blood Vessels, Neuro Retinal Rim (NRR), Macula
and Retinal Nerve Fibre Layer (RNFL). These features change in shape, size and
colour if the eye is Glaucomatous. One approach is to analyse the Optic Nerve Head
(ONH) for any changes that occur due to glaucoma. The Optic Disc and Optic Cup
were analysed in images taken by a normal Ophthalmoscope (Fundus Camera).
The Cup to Disc Ratio (CDR) and the Inferior (I), Superior (S), Nasal (N) and Temporal
(T) distances between the extracted Optic Cup and Optic Disc were calculated. These ISNT
distances are significant signs of Glaucoma. We have proposed an algorithm to
segment the Optic Disc and Optic Cup from the retinal fundus image and analyse the CDR
and ISNT values to classify whether the eye is Glaucomatous or Healthy. The
algorithm was tested on a private dataset of 108 images, achieving a Sensitivity of 90%
and a Specificity of 97%. The algorithm achieved 94.48% accuracy on different
types of images, compared to other algorithms that work only on specific
image types. In this context the proposed algorithm gives satisfactory results in terms
of accuracy.
1 Introduction
The human eye suffers from many disorders; most of them have noticeable symptoms, but
Glaucoma is not one of them. Among the different types of Glaucoma, such as Open-Angle,
Angle-Closure and Normal-Tension, Open-Angle Glaucoma is said to be a "Silent Theft
of Vision" because it causes no pain and has no symptoms. Glaucoma is an
irreversible disease which leads to partial or complete vision loss. According to a study
by A. Issac et al. [1], Glaucoma is the second leading cause of blindness. Ophthalmologists
can detect Glaucoma using different eye tests, which are expensive and require expert
observation [2]. Another approach, used to detect Glaucoma at a very early
stage, is to analyse Retinal Fundus Images. Structural changes that occur in the
ONH area can be analysed with Image Processing Techniques to diagnose Glaucoma.
According to a study by Atheesan S. et al. [3], Retinal Fundus Images contain features
like 1) Optic Cup, 2) Optic Disc, 3) Blood Vessels, 4) Neuro Retinal Rim, 5) Macula
and 6) Retinal Nerve Fibre Layer as shown in Fig. 1. These features change in shape,
size and colour if the eye is Glaucomatous.
Fig. 1. Features of a retinal fundus image: Optic Cup, Optic Disc, Blood Vessels, Macula, Neuro Retinal Rim and Retinal Nerve Fibre Layer
A Healthy Eye and a Glaucomatous Eye are shown in Fig. 2(A) and Fig. 2(B)
respectively. We can detect, segment and analyse these features using different Image
Processing Techniques to check whether the eye is Glaucomatous or Healthy.
Fig. 2. (A) - Healthy eye (source: our sample images) (B) - Glaucomatous eye (source: our
sample images)
Accurate segmentation of the Optic Cup and Optic Disc is a vital step in image pro-
cessing to detect Glaucoma from fundus images. The area of the Optic Cup can be
compared with the area of the Optic Disc; the ratio of the Optic Cup to the Optic Disc is called the
CDR. According to Ophthalmology, if the CDR is more than 0.3 (CDR > 0.3), then
there are high chances of Glaucoma (Al-Bander et al. [2]). But we cannot rely on only one
parameter for an accurate result. The circular area between the Optic Cup and the Optic Disc is
known as the NRR. The thickness of the NRR is measured on all four sides: Inferior
(bottom), Superior (top), Nasal (nose side) and Temporal (other side). The distance
between the Optic Cup and the Optic Disc on each side is known as the ISNT distance. The ISNT
distances decrease from Inferior to Temporal. This is called the
ISNT Rule, according to which Inferior > Superior > Nasal > Temporal. If an eye does
not obey this rule, it is most likely to have Glaucoma, as shown in Fig. 3. In our
proposed algorithm, we have used the detection of features like the OC and OD to calculate the
CDR and ISNT, which are then used to classify eyes as Healthy or Glaucomatous. We
have used image processing techniques like Transformation, Segmentation, Edge
Detection and Image Correction to achieve our goal.
Fig. 3. Optic Cup and Inferior Distance
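To make the two indicators concrete, the following minimal Python sketch shows how the CDR threshold and the ISNT rule described above could be checked once the Optic Disc and Optic Cup have been segmented as binary masks; the function name, the mask inputs, the orientation assumed for the nasal side and the way the two indicators are combined are illustrative assumptions, not the authors' MATLAB implementation.

import numpy as np

def classify_fundus(disc_mask: np.ndarray, cup_mask: np.ndarray) -> str:
    """Classify an eye as Glaucomatous or Healthy from OD/OC binary masks."""
    # Cup-to-Disc Ratio: area of the Optic Cup relative to the Optic Disc.
    cdr = cup_mask.sum() / disc_mask.sum()

    # Bounding extents of disc and cup, used to measure the rim (NRR) width on each side.
    disc_rows = np.where(disc_mask.any(axis=1))[0]
    disc_cols = np.where(disc_mask.any(axis=0))[0]
    cup_rows = np.where(cup_mask.any(axis=1))[0]
    cup_cols = np.where(cup_mask.any(axis=0))[0]

    superior = cup_rows.min() - disc_rows.min()   # rim width at the top
    inferior = disc_rows.max() - cup_rows.max()   # rim width at the bottom
    nasal    = cup_cols.min() - disc_cols.min()   # assumes the nasal side is on the left
    temporal = disc_cols.max() - cup_cols.max()

    isnt_ok = inferior > superior > nasal > temporal   # ISNT rule
    # How the two indicators are combined is an assumption for illustration only.
    if cdr > 0.3 or not isnt_ok:
        return "Glaucomatous"
    return "Healthy"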
Problem Statement. Glaucoma is a silent disease which does not cause any symptoms
or pain. It results in irreversible vision loss; hence it must be detected at a very
early stage to prevent vision loss before it occurs. The OD and OC are the first features to be
affected by glaucoma in its early stage. We can check whether a patient is likely to
develop glaucoma by examining the CDR and ISNT features obtained from the OD and OC.
The goal of this study is to develop a method to detect glaucoma using CDR and ISNT
features obtained using Digital Image Processing techniques.
2 Literature Survey
Some work has been done in the field of Glaucoma detection using computerized
image processing techniques.
Anindita Septiarini et al. 2020 [4] proposed a statistical approach on fundus images
to detect glaucoma by extracting ONH features using a correlation feature selection
method and classifying with the k-nearest neighbour algorithm on 84 fundus images.
The proposed method achieved an accuracy of 95.24%, but the paper only
reports the accuracy of feature extraction.
Juan Carrillo et al. 2019 [5] presented a computational tool for improving Glaucoma
detection in comparison with earlier work using thresholding. Results were obtained on
fundus images with an accuracy of 88.5%. The research does not define the size of the dataset,
and the accuracy is low compared to other earlier research.
Khalil et al. 2018 [6] proposed a novel method to detect Glaucoma from Spectral
Domain Optical Coherence Tomography (SD-OCT) images. The algorithm improves the
precision of Inner-Limiting-Membrane (ILM) layer extraction and refines the
contour of the ILM layer. The system showed a clear improvement over contemporary
systems, achieving 90% accuracy. The proposed method works on
OCT images only, which are expensive and hence less desirable compared to fundus
images.
Cheriguene et al. 2018 [7] proposed a CAD system based on the Twin Support Vector
Machine (TWSVM) and three families of feature extraction, namely the Gray Level Co-
occurrence Matrix, Central moments and Hu moments. The system was evaluated on
169 images for classification into Glaucomatous and healthy. The paper shows that the classi-
fication accuracy obtained by TWSVM is very promising compared to the SVM
results. The reported accuracy of 94.11% is based on a comparison of TWSVM and SVM, but
the paper does not clearly state whether an image is Glaucomatous or not.
Nirmala et al. 2017 [8] diagnosed Glaucoma using a wavelet based contourlet
transform with the help of Gabor filters. They used Naïve Bayes classifiers for image clas-
sification, which gave 89% classification accuracy. This could be improved by
refining the method or by adding more features of the fundus image.
Thangaraj et al. 2017 [9] proposed a supervised learning approach and compared it with other
Glaucoma detection techniques. A nonlinear Support Vector
Machine (SVM) was used for classification, and it was compared with a level-set based
method and a fused method. Classification accuracy was 93% on public databases like
DRIONS, STARE and RIM-ONE. The paper does not state which features were used to
detect Glaucoma or how the images were processed before classification.
Al-Bander et al. 2017 [10] used a Deep Learning approach to detect Glaucoma by
proposing a fully automated system based on a Convolutional Neural Network (CNN).
Features were extracted automatically from the raw image and classified using an SVM
classifier. Accuracy, Specificity and Sensitivity were reported as 88%, 90% and
85% respectively. The results are lower compared to the other research reviewed in this
paper.
Kartik Thakkar et al. 2017 [11] proposed an algorithm to segment the OD and OC to
analyse CDR and ISNT values from the retinal fundus image and classify whether the eye
is Glaucomatous or Healthy. The researchers found that using both the ISNT and CDR features
gives a more accurate result, with an accuracy of 94.48%.
Dr. Kinjan Chauhan et al. 2016 [12] developed data mining techniques using
Decision Tree, Linear Regression and Support Vector Machine for the diagnosis of
glaucoma in retinal images. OCT images were used in each technique to evaluate
performance in terms of accuracy, sensitivity and specificity. The researchers found that
the Decision Tree and Linear Regression models perform much better than SVM, giving
an accuracy of 99.56%. However, OCT images are expensive and less widely available.
We also considered techniques used for calculation of CDR and ISNT in [13–15]
for our research to obtain high accuracy.
From the above literature study, we observe that no method has been proposed to
detect glaucoma from fundus images using features of the OC and OD with good
accuracy. We observe that Glaucoma detection can be improved by using
only fundus images and detecting it at an early stage.
The following Figs. 5 and 6 show images taken during each phase of the algorithm.
Fig. 5. Result images of the proposed algorithm for Glaucomatous right eye (Phase 1: Cropped Image; Phase 2: Optic Disc Detected; Phase 3: Optic Cup Detection; Phase 4: Inferior and Superior Distance)
Fig. 6. Result images of the proposed algorithm for Normal left eye (Phase 1: Cropped Image; Phase 2: Optic Disc Detected; Phase 3: Optic Cup Detection; Phase 4: Inferior and Superior Distance)
To effectively execute all the steps of the proposed algorithm, we have developed a GUI
tool using MATLAB. The GUI includes separate procedural functionalities for each phase
of the algorithm. Figure 7 shows the functionality of the algorithm in our GUI tool.
The algorithm was tested on 108 private dataset images of size 720 × 576 pixels
from patients of Sudhalkar Eye Hospital, Raupura Main Road, Vadodara, Gujarat,
India. The algorithm was applied on these 108 images, of which 3 images were
classified as Abnormal, 60 images were classified as Glaucomatous and 48 as Normal.
The results were compared with the results of OCT and Perimetry, which gave a Sensitivity
of 90%, meaning that 90% of Glaucomatous images were correctly identified; a Specificity
of 97%, meaning that 97% of images without Glaucoma were correctly identified; and an
F-Score of 94.48%, which indicates the overall accuracy of the algorithm. Detailed results
with the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative
(FN) counts are described as follows.
Technically, these can be defined as follows.
True Positive (TP): The image is Glaucomatous, and it is detected as Glaucomatous by the proposed model. Our experiment showed a TP value of 60, meaning that out of 108 images the proposed algorithm detected 60 images as Glaucomatous which were Glaucomatous as diagnosed through the Perimetry Test.
True Negative (TN): The image is Normal, and the system also detects it as Normal. Our experiment showed a TN value of 38, meaning that out of 108 images the proposed algorithm detected 38 images as Normal which were Normal as diagnosed through the Perimetry Test.
False Positive (FP): The image is Normal, but it is detected as Glaucomatous by the proposed model. The experiment detected 1 Normal image as Glaucomatous, which gives an FP value of 1.
False Negative (FN): The image is Glaucomatous, but the system detects it as Normal. The experiment detected 6 Glaucomatous images as Normal due to false detection of the OC, hence the FN value was 6.
Precision: The proportion of images detected as Glaucomatous (TP + FP) that are truly Glaucomatous (TP).
Precision = TP / (TP + FP)
From our experiment this ratio was 0.98, hence the Precision was 98%.
Specificity: The True Negative Rate, i.e. the proportion of correctly identified images without the disease.
Specificity = TN / (TN + FP)
The proportion of correctly identified Normal images was calculated as 0.9743, which gives a Specificity of 97.43% in our experiment.
Sensitivity (Recall): The True Positive Rate, i.e. the proportion of correctly identified images with the disease.
Sensitivity (Recall) = TP / (TP + FN)
91% of Glaucomatous images were correctly identified.
F-Score: It measures the test's accuracy as the harmonic mean of Precision and Recall.
F-Score = 2 × (Precision × Recall) / (Precision + Recall)
The overall accuracy of our experiment was 94.49%.
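As a quick check of the reported numbers, the short Python sketch below recomputes the metrics from the confusion-matrix counts given above (TP = 60, TN = 38, FP = 1, FN = 6); the function is purely illustrative and not part of the authors' MATLAB tool.

def diagnostic_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)          # recall
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,          # 60/61 is approximately 0.983 (98%)
        "specificity": specificity,      # 38/39 is approximately 0.974 (97.43%)
        "sensitivity": sensitivity,      # 60/66 is approximately 0.909 (91%)
        "f_score": f_score,              # approximately 0.945 (94.49%)
    }

print(diagnostic_metrics(60, 38, 1, 6))

Running it reproduces Precision of about 0.98, Specificity of about 0.974, Sensitivity of about 0.91 and an F-Score of about 0.945, matching the values reported above.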
Accuracy comparison chart: Tehmina Khalil et al. 90%, S. Cheriguene et al. 94.11%, K. Nirmala et al. 89%, Vigneswaran Thangaraj et al. 93%, Al-Bander et al. 88%, Proposed Algorithm 94.48%.
Acknowledgements. We would like to express our gratitude and deep regards to Dr. Prashant
Chauhan, Ophthalmologist/Eye Surgeon, Shree Arvind Eye Hospital, Surat for his constant support
and encouragement. We would also like to give our sincere gratitude to Prof. Alpa Shah, Sarvajanik
Institute of Engineering and Technology, Surat for her invaluable suggestions and support.
References
1. Agarwal, A., Gulia, S., Chaudhary, S., Dutta, M.K., Travieso, C.M., Alonso-Hernandez, J.
B.: A novel approach to detect glaucoma in retinal fundus images using cup-disk and rim-
disk ratio. In: Proceedings of the 2015 International Work Conference on Bio-Inspired
Intelligence: Intelligent Systems for Biodiversity Conservation, IWOBI 2015, pp. 139–144
(2015). https://doi.org/10.1109/IWOBI.2015.7160157
2. Schacknow, P.N., Samples, J.R.: The glaucoma book: a practical, evidence-based approach
to patient care (2010)
3. Lowell, J., et al.: Optic nerve head segmentation. IEEE Trans. Med. Imaging 23(2), 256–264
(2004). https://doi.org/10.1109/TMI.2003.823261
4. Septiarini, A., Harjoko, A., Pulungan, R., Ekantini, R.: Automated detection of retinal nerve
fiber layer by texture-based analysis for glaucoma evaluation. Healthcare Inform. Res. 24(4),
335–345 (2018). https://doi.org/10.4258/hir.2018.24.4.335
5. Carrillo, J., Bautista, L., Villamizar, J., Rueda, J., Sanchez, M., Rueda, D.: Glaucoma
detection using fundus images of the eye. In: 2019 22nd Symposium on Image, Signal
Processing and Artificial Vision, STSIVA 2019 - Conference Proceedings, pp. 1–4 (2019).
https://doi.org/10.1109/STSIVA.2019.8730250
6. Khalil, T., Akram, M.U., Raja, H., Jameel, A., Basit, I.: Detection of glaucoma using cup to
disc ratio from spectral domain optical coherence tomography images. IEEE Access 6(c),
4560–4576 (2018). https://doi.org/10.1109/ACCESS.2018.2791427
7. Cheriguene, S., Azizi, N., Djellali, H., Bunakhla, O., Aldwairi, M., Ziani, A.: New computer
aided diagnosis system for glaucoma disease based on twin support vector machine. In:
Proceedings of EDIS 2017 - 1st International Conference on Embedded and Distributed
Systems, vol. 2017-Decem, pp. 1–6 (2018). https://doi.org/10.1109/EDIS.2017.8284039
8. Nirmala, K., Venkateswaran, N., Kumar, C.V., Christobel, J.S.: Glaucoma detection using
wavelet based contourlet transform. In: Proceedings of the International Conference on
Intelligent Computing Control, I2C2 2017, vol. 2018-Janua, pp. 1–5 (2018). https://doi.org/
10.1109/I2C2.2017.8321875
9. Thangaraj, V., Natarajan, V.: Glaucoma diagnosis using support vector machine. In:
Proceedings of the 2017 International Conference on Intelligent Computing and Control
Systems, ICICCS 2017, vol. 2018-Janua, pp. 394–399 (2017). https://doi.org/10.1109/
ICCONS.2017.8250750
10. Al-Bander, B., Al-Nuaimy, W., Al-Taee, M.A., Zheng, Y.: Automated glaucoma diagnosis
using deep learning approach. In: 2017 14th International Multi-Conference System Signals
Devices, SSD 2017, vol. 2017-Janua, pp. 207–210 (2017). https://doi.org/10.1109/SSD.
2017.8166974
11. Thakkar, K., Chauhan, K., Sudhalkar, A., Gulati, R.: A novel approach for glaucoma
detection using features of optic nerve head. Int. J. Trend Sci. Res. Dev. (2017)
12. Chauhan, K., Chauhan, P., Sudhalkar, A., Lad, K., Gulati, R.: Data mining techniques for
diagnostic support of glaucoma using stratus OCT and perimetric data. Int. J. Comput. Appl.
151(8), 34–39 (2016). https://doi.org/10.5120/ijca2016911855
13. Agarwal, A., Gulia, S., Chaudhary, S., Dutta, M.K., Burget, R., Riha, K.: Automatic
glaucoma detection using adaptive threshold based technique in fundus image. In: 2015 38th
International Conference on Telecommunications and Signal Processing, TSP 2015,
pp. 416–420 (2015). https://doi.org/10.1109/TSP.2015.7296295
14. Singh, A., Dutta, M.K., ParthaSarathi, M., Uher, V., Burget, R.: Image processing based
automatic diagnosis of glaucoma using wavelet features of segmented optic disc from fundus
image. Comput. Methods Programs Biomed. 124, 108–120 (2016). https://doi.org/10.1016/j.
cmpb.2015.10.010
15. Kotowski, J., Wollstein, G., Ishikawa, H., Schuman, J.S.: Imaging of the optic nerve and
retinal nerve fiber layer: an essential part of glaucoma diagnosis and monitoring. Surv.
Ophthalmol. 59(4), 458–467 (2014). https://doi.org/10.1016/j.survophthal.2013.04.007
Crop Yield Estimation Using Machine
Learning
Abstract. The recent area of interest for computer scientists and data
analysts working on precision farming has been the use of machine learn-
ing and deep learning algorithms to recommend crops to the farmers and
predict yield. Improving crop yields not only helps farmers but also seeks
to address global problems such as food shortages. Predictions can be
made taking into account forecasts of climate, soil and its mineral con-
tent, moisture, crop historic performance, rainfall and others. Crop Data
from six states of India for five crops from 2009–2016 has been used
for training and validation of different machine learning regression algo-
rithms to produce a comprehensive study of crop yield estimation. The
yields are estimated using Linear models, Support Vector Regressor, K
Neighbors Regressor, Tree-based models, Ensemble models and Shal-
low Neural Networks with R-squared score for evaluation. Test accuracy
showed promising results in ensemble models and neural networks. Extra
Trees Regressor is the best model with Mean Absolute Error of 351.10
and maximum accuracy of 99.95%. This paper aims at providing a state-of-
the-art implementation of machine learning algorithms to help farmers,
governments, economists and banks estimate crop yields in Indian
states based on specific parameters.
1 Introduction
The United Nations estimates that the global population will
reach 9.8 billion by 2050. This will create a large need for food and agriculture
products to meet the demand. The Harvard Business Review reports that the
market for food is projected to rise by between 59% and 98% by 2050. The UN FAO
claims that almost one-third of the total food produced, i.e. 1.3 billion tonnes, goes
missing or is discarded, while 795 million people are starving. Consumers in rich
countries waste almost as much food annually (222 million tonnes) as the whole
of sub-Saharan Africa's net food production (230 million tonnes). Currently,
almost 9 million people die of starvation and hunger-related diseases each year,
as analysed by theworldcounts. This is greater than the cumulative effect
of AIDS, Malaria and TB. Hence it must be a matter of concern for world
governments to overcome this challenge faced by the entire human race. The
actions of governments, farmers, experts and technology together could resolve
this situation.
The fast pace of technology has opened new arenas for a better human life.
Trending technologies like Machine Learning, the Internet of Things, Augmented
and Virtual Reality, Data Analytics, Cryptocurrency, etc. have been in the news for a
while. Machine Learning is an emerging computer science domain that attracts
the attention of millions of computer scientists. It is a fast-growing field which
has subsequently led to Computer Vision, Deep Learning, Natural Language Pro-
cessing, and others. Machine learning means giving computers the ability
to learn by themselves rather than programming them explicitly. This groundbreak-
ing concept has led to the development of smart recommendation and predic-
tion systems which have shown exceptional results in product recommendations used
by multi-national E-Commerce companies, movie and video recommendations
used by online video streaming platforms, stock price prediction, etc. The wide-
spread use of machine learning in various domains has started and
industries are already benefiting from it. Machine learning and Deep learning are
being used heavily in domains like Medical Science and Drug Development, Agri-
culture, Self-Driving Automobiles, Speech-Controlled Home Automation, Edu-
cational Services, Space and Aviation, Recommendation Systems, Archaeology,
Deep Space Exploration, etc. This has benefited mankind to a great extent and
will continue to do so in the future.
This paper aims to discuss a yield-estimation method using state-of-the-art
machine learning regression techniques to approximate crop yields according to
the specified conditions and the available historical evidence. Factors influenc-
ing crop production include rainfall, temperature, altitude, soil and moisture
content, field emissions, crop diseases, fertilizer application, irrigation timing,
etc. The average rainfall for the Indian subcontinent as forecasted by the Indian
Meteorological Department (IMD) is 300–650 mm (11.8–25.6 in.). Because rain-
fall is quite unpredictable and distributed unequally across the subcontinent,
many farmers need to rely on irrigation canals and bore wells, though there
is considerable variance in the irrigation facilities available to farmers in different
states. The soil and its mineral content are another major factor. In India, the nine
major soil types include alluvial soil, red and yellow soil, sandy and alkaline soil,
peaty and marshy soil, and woodland and mountain soil. These soils differ in their
chemical composition and mineral content. Depending on their genetics, crops
require different soils to grow. Weather and climate heavily affect the crop:
some are crops of the Rabi season (October–March), while others are crops of the Kharif
season (July–October).
The soil’s moisture content also plays a major role in influencing crop yield
and nutrient content. Since the Industrial Revolution, pollution has been ever
rising. Pollution influences not only crop yield, but also decreases crops’ nutri-
tional value. The possible consequences of plant pollution could be calculated as:
interference in enzyme processes, alteration in chemistry and physical structure,
growth slowdown and decreased production due to metabolic structural changes,
rapid acute tissue degeneration. Weeds are undesirable grass/plants that grow
together with the main crops that usually use most of the soil nutrients and
therefore the main crop does not grow healthy. Every year, millions of dollars
are wasted in weed prevention and removal. The next most significant factor
impacting crop yield is fertilizer. They generally consist of three major macro-
nutrients, the nitrogen (N), the phosphorus (P) and the potassium (K). Calcium,
magnesium, and sulfur are also present as the secondary macro-nutrients. The
required fertilizer dosage produces an improvement in crop yield. If a random
amount of fertilizers is used, the crop can have serious tissue deterioration result-
ing in less yield and could degrade the quality of the soil hence making the soil
unfit for future plantations. It is therefore necessary to provide a proper dose
of fertilizers at regular intervals. This paper discusses at length how different
parameters affect crop yields and how different algorithms could be used to pre-
dict agricultural produce. This paper uses data from various states as a whole
to predict their agricultural production. Other work were either too specific for
a region or a factor affecting yield. Hence, this paper aims to provide a novel
approach to data of different states considering the affecting factors.
Section 2 contains a literature survey of the existing methods used and
implemented for different factors affecting crop yield. Section 3 contains our
implementations of the different approaches and their assimilated results.
Section 4 contains the possible future work and the conclusion.
Neural Network system which would be taking into account data like Humid-
ity, Temperature, Wind speed and Sunlight for Crop Monitoring. Paper by [21]
addresses an application named RSF that recommends crops to farmers. The
system detects user position and selects top N related upazillas (sub-region of
a district) taking into account agro-climatic and agro-ecological data. Based on
it, the application recommends top C crops to the user. It reported an average
precision of 0.72 and an average recall of 0.653.
A thorough review of crop yield prediction has been done by [14]. The authors
selected 80 papers, of which 50 used machine learning and the rest used
deep learning. The main finding was that temperature, rainfall, and soil type
were the most used features. Work by the authors of [11] suggests the use of Deep
Neural Networks rather than Shallow Neural Networks and Regression Trees for
yield estimation. Their studies have shown that environmental variables had a
greater influence than genotype on crop yields. Extending their work, authors
[12] proposed a CNN-RNN based model, which focused on the time dependency
of environmental variables.
The authors of [19] worked on how to apply machine learning to farm data. A case
study on the culling of dairy herds was examined and data mining was
applied. In [16], the authors discussed forecasting crop yields using machine
learning and deep learning for corn in Illinois using Caffe, a deep learning
framework developed by the University of California, Berkeley. A network model with
two InnerProductLayers gave the best results, achieving a Root Mean Square Error
(RMSE) of 6.298 (standard value), while a single InnerProductLayer and a Support
Vector Regressor gave RMSEs of 7.427 and 8.204 respectively. The authors of [7]
explore the use of machine learning methods in order to derive new information
in the form of generic decision rules for the efficient use of capital and manpower.
In [6], the authors used various algorithms such as SVM, C4.5, CHAID, KNN, K-
means, Decision Tree, Neural Network and Naive Bayes to provide a method for
smart agriculture. Their analysis also included a paper that used the Hadoop
system to carry out rigorous calculations. Work by [20] summarizes the different
Machine Learning algorithms used for crop production estimation.
2.2 Rainfall
Rainfall is a major factor for agriculture. In a country like India with such diverse
lands, some states like Punjab have enough water via irrigation, but the majority
of farmers have to depend on rainfall. Sowing crops relying solely on rainfall
could prove disastrous for farmers. The authors of [28]
have worked on the disparity in rainfall in the Indian subcontinent over a period
of more than 150 years. The trends in rainfall vary measurably because India is a mix
of different terrains. Concentrating on the long-term patterns of extreme mon-
soon rainfall deficiency and abundance in Indian areas, [23] analysed 135 years
(1871–2005) of climate change using the non-parametric Mann-Kendall method-
ology. The analysis covered nearly 90% of the Indian subcontinent. The authors of [9]
calculated the Standardized Precipitation Index (SPI) using the Mann-Kendall trend test.
Soil considerably affects crop production. The various types of soils found in India
are Red, Black, Brown, Alluvial, Laterite and Sandy. Soils rich in mineral
content are the most suitable for crops to grow well and to have proper nutritional
value. Soil is characterized by a number of parameters like pH, moisture content, mineral
content, temperature, humidity, water retention capacity, groundwater level,
etc. The authors of [5] suggest using Artificial Neural Networks to predict crop yield
considering soil parameters like temperature, pH, humidity, etc. The mineral content
of iron, copper, manganese, sulphur, magnesium, calcium, potassium, nitrogen,
phosphate, organic carbon, etc. was taken into consideration. The algorithm
proposed by them uses back propagation to train their Artificial Neural Network
model.
3.1 Methodology
Figure 1 illustrates the training and evaluation of the system. Differ-
ent parameters are provided to various regression models to evaluate the best
performance based on test accuracy and Mean Absolute Error (MAE) for esti-
mating crop yield. The model with maximum accuracy and minimum MAE would be the
ideal model for prediction. Necessary care also has to be taken so that the model does not
overfit the training data. All the hyper-parameters have been fine-tuned for best
results. For training, a variety of models like Linear Models, SVR,
KNR, Tree-based Models, Ensemble Models and Neural Networks have been
studied and implemented.
The most important task in any machine learning problem is to gather accurate
data; the quality of the data determines the performance of the model. The current
problem requires data such as area, rainfall, state/region, climate, soil, tem-
perature, etc. for training. However, it is not possible to obtain consistent data for
all parameters. The current dataset includes data from the Government of India
website. Different features like Rainfall, Year, State, Area and Production are avail-
able for different crops and have been used for preparing this dataset. Table 1
shows the sample data. The dataset consists of five major crops (Rice, Wheat,
Maize, Sugarcane and Cotton) for six states (Odisha, Kerala, Tamil Nadu, Gujarat,
Bihar and Uttar Pradesh) over the years 2009–2016. The states and crops are
encoded as numbers rather than strings for training. The values of Area are in
thousand hectares, Rainfall in millimetres and Production in thousand tonnes.
Some parameters of the data were not available, hence such entries were removed
from the dataset before training. The data was divided into 90% training data and 10%
testing data.
3.3 Implementation
The trained models are evaluated using the R-squared score (reported as accuracy) and the Mean Absolute Error:
R² = 1 − Σi (Yi − Ŷi)² / Σi (Yi − Ȳ)²
Mean Absolute Error = (1/n) Σi |Yi − Ŷi|
where Yi indicates the true value,
Ŷi indicates the predicted value, and
Ȳ is the mean of the true values.
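The following minimal Python sketch, using scikit-learn, illustrates the pipeline described in the methodology and implementation sections for one of the models (the Extra Trees Regressor); the CSV file name, the column names and the hyper-parameters are hypothetical placeholders for the Government of India data described above, and this is not the authors' actual code.

import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("crop_data.csv")                       # hypothetical file name
# Encode State and Crop as numbers rather than strings, as described in the text.
for col in ("State", "Crop"):
    df[col] = df[col].astype("category").cat.codes
df = df.dropna()                                        # drop rows with missing parameters

X = df[["State", "Crop", "Year", "Area", "Rainfall"]]   # assumed feature columns
y = df["Production"]                                    # target, in thousand tonnes

# 90% training / 10% testing split, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

model = ExtraTreesRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("R^2 (accuracy):", r2_score(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))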
3.4 Results
Table 2 summarizes the accuracy and MAE obtained for the different models;
a detailed discussion of the results is covered in the subsequent subsections.
Linear Models, SVR and KNR. Linear models in a regression problem try
to fit a line to the given data points, minimizing the loss. The most basic is
the Linear Regression model. Ridge Regression and Lasso Regression are minor
variations of Linear Regression, while Elastic Net and MultiTask Elastic Net
combine Ridge and Lasso Regression. As the data has four features
and continuous values as output, it was expected that linear models would not be
the perfect choice, and the accuracy confirmed this. The maximum accuracy
among the linear models was 32.89%, achieved by Lasso Regression. Linear models sometimes
estimated the yield to be negative, as shown in Fig. 2a. These models failed to
predict the output accurately, resulting in a large mean absolute error of 11686.79.
The K Neighbors Regressor, which works similarly to K Nearest Neighbors (KNN),
showed a great improvement in accuracy over the linear models. The
MAE was 1592 and the accuracy obtained was significantly higher. The reason it
works better than linear models is that it uses the nearest neighbours to
predict the output. The Support Vector Regressor, like Support Vector Machines, creates
decision boundaries for the parameters and predicts the output based on them. Figure
2b shows how well SVR performed; however, it estimated one negative value
and so its MAE was higher than that of KNR. Nevertheless, SVR proved to be a significant
improvement over KNR, with a highest accuracy of 99.38%.
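A similar loop, reusing the hypothetical X_train, X_test, y_train and y_test from the earlier sketch, could be used to compare several of the model families discussed above on accuracy (R²) and MAE; the models below are left at their scikit-learn defaults, whereas the paper fine-tunes the hyper-parameters.

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "K Neighbors Regressor": KNeighborsRegressor(),
    "SVR": SVR(),
}
for name, m in models.items():
    m.fit(X_train, y_train)
    pred = m.predict(X_test)
    print(f"{name}: R^2={r2_score(y_test, pred):.4f}, MAE={mean_absolute_error(y_test, pred):.2f}")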
Neural Networks. Neural Networks (NN) are inspired by the human brain
and try to mimic it. Many neurons are interconnected with each other so as
to create a complex network. A neural network with 2 hidden layers of 60 and
10 neurons was trained for 20000 epochs with the Adam optimizer and a learn-
ing rate of 0.0253. Mean absolute error was set as the loss criterion. The overall
accuracy obtained was 99.91%, which indicates how well the model performs.
Figure 3b shows the predictions of the final model. After many attempts, this
neural network was finalized with precise hyper-parameters.
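The shallow network described above could be sketched as follows; the framework (PyTorch), the ReLU activations and the tensor preparation are assumptions, since the paper specifies only the layer sizes (60 and 10 neurons), the Adam optimizer, the learning rate of 0.0253, the 20000 epochs and the MAE loss.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 60),    # 4 input features -> first hidden layer of 60 neurons
    nn.ReLU(),
    nn.Linear(60, 10),   # second hidden layer of 10 neurons
    nn.ReLU(),
    nn.Linear(10, 1),    # single output: estimated yield
)

criterion = nn.L1Loss()                                     # mean absolute error loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.0253)

# X_train and y_train are assumed to be float32 tensors of shape (n, 4) and (n, 1).
def train(X_train: torch.Tensor, y_train: torch.Tensor, epochs: int = 20000) -> None:
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()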
References
1. Abdullahi, H.S., Sheriff, R., Mahieddine, F.: Convolution neural network in pre-
cision agriculture for plant image recognition and classification. In: 2017 Seventh
International Conference on Innovative Computing Technology (Intech), pp. 1–3.
IEEE, Londrés (2017)
2. Affrin, K., Reshma, P., Kumar, G.N.: Monitoring effect of air pollution on agricul-
ture using WSNs. In: 2017 IEEE Technological Innovations in ICT for Agriculture
and Rural Development (TIAR), pp. 46–50. IEEE (2017)
3. Chandgude, A., Harpale, N., Jadhav, D., Pawar, P., Patil, S.M.: A review on
machine learning algorithm used for crop monitoring system in agriculture. Int.
Res. J. Eng. Technol. 5(04), 1470 (2018)
4. Chunjing, Y., Yueyao, Z., Yaxuan, Z., Liu, H.: Application of convolutional neural
network in classification of high resolution agricultural remote sensing images. Int.
Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 42, 989 (2017)
5. Dahikar, S.S., Rode, S.V.: Agricultural crop yield prediction using artificial neural
network approach. Int. J. Innov. Res. Electr. Electron. Instrum. Control Eng. 2(1),
683–686 (2014)
6. Dighe, D., Joshi, H., Katkar, A., Patil, S., Kolkate, S.: Survey of crop recommen-
dation systems. IRJET 05, 476–481 (2018)
7. Dimitriadis, S., Goumopoulos, C.: Applying machine learning to extract new
knowledge in precision agriculture applications. In: 2008 Panhellenic Conference
on Informatics. pp. 100–104. IEEE (2008)
8. Jeon, W.S., Rhee, S.Y.: Plant leaf recognition using a convolution neural network.
Int. J. Fuzzy Logic Intell. Syst. 17(1), 26–34 (2017)
9. Jha, S., Sehgal, V.K., Raghava, R., Sinha, M.: Trend of standardized precipitation
index during Indian summer monsoon season in agroclimatic zones of India. Earth
Syst. Dyn. Discuss. 4, 429–449 (2013)
10. Ji, S., Zhang, C., Xu, A., Shi, Y., Duan, Y.: 3D convolutional neural networks
for crop classification with multi-temporal remote sensing images. Remote Sens.
10(1), 75 (2018)
11. Khaki, S., Wang, L.: Crop yield prediction using deep neural networks. Front. Plant
Sci. 10, 621 (2019)
12. Khaki, S., Wang, L., Archontoulis, S.V.: A CNN-RNN framework for crop yield
prediction. Front. Plant Sci. 10, 1750 (2020)
13. Khatri-Chhetri, A., Aggarwal, P.K., Joshi, P.K., Vyas, S.: Farmers’ prioritization
of climate-smart agriculture (CSA) technologies. Agric. Syst. 151, 184–191 (2017)
14. van Klompenburg, T., Kassahun, A., Catal, C.: Crop yield prediction using
machine learning: a systematic literature review. Comput. Electron. Agric. 177,
105709 (2020)
15. Kussul, N., Lavreniuk, M., Skakun, S., Shelestov, A.: Deep learning classification of
land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens.
Lett. 14(5), 778–782 (2017)
16. Kuwata, K., Shibasaki, R.: Estimating crop yields with deep learning and remotely
sensed data. In: 2015 IEEE International Geoscience and Remote Sensing Sympo-
sium (IGARSS), pp. 858–861. IEEE (2015)
17. Lipper, L., et al.: Climate-smart agriculture for food security. Nat. Clim. Change
4(12), 1068–1072 (2014)
18. Mahato, A.: Climate change and its impact on agriculture. Int. J. Sci. Res. Publ.
4(4), 1–6 (2014)
19. McQueen, R.J., Garner, S.R., Nevill-Manning, C.G., Witten, I.H.: Applying
machine learning to agricultural data. Comput. Electron. Agric. 12(4), 275–293
(1995)
20. Mishra, S., Mishra, D., Santra, G.H.: Applications of machine learning techniques
in agricultural crop production: a review paper. Indian J. Sci. Technol. 9(38), 1–14
(2016)
21. Mokarrama, M.J., Arefin, M.S.: RSF: a recommendation system for farmers. In:
2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), pp. 843–
850. IEEE (2017)
22. Nayyar, A., Puri, V.: Smart farming: IoT based smart sensors agriculture stick for
live temperature and moisture monitoring using Arduino, cloud computing & solar
technology. In: Proceedings of the International Conference on Communication and
Computing Systems (ICCCS-2016), pp. 9781315364094–121 (2016)
23. Pal, I., Al-Tabbaa, A.: Regional changes in extreme monsoon rainfall deficit and
excess in India. Dyn. Atmos. Oceans 49(2–3), 206–214 (2010)
24. Pudumalar, S., Ramanujam, E., Rajashree, R.H., Kavya, C., Kiruthika, T., Nisha,
J.: Crop recommendation system for precision agriculture. In: 2016 Eighth Inter-
national Conference on Advanced Computing (ICoAC), pp. 32–36. IEEE (2017)
25. Pujari, J.D., Yakkundimath, R., Byadgi, A.S.: Identification and classification of
fungal disease affected on agriculture/horticulture crops using image processing
techniques. In: 2014 IEEE International Conference on Computational Intelligence
and Computing Research, pp. 1–4. IEEE (2014)
26. Puri, V., Nayyar, A., Raja, L.: Agriculture drones: a modern breakthrough in
precision agriculture. J. Stat. Manag. Syst. 20(4), 507–518 (2017)
27. Rebetez, J., et al.: Augmenting a convolutional neural network with local
histograms-a case study in crop classification from high-resolution UAV imagery.
In: ESANN (2016)
28. Saha, S., Chakraborty, D., Paul, R.K., Samanta, S., Singh, S.: Disparity in rainfall
trend and patterns among different regions: analysis of 158 years’ time series of
rainfall dataset across India. Theor. Appl. Climatol. 134(1–2), 381–395 (2018)
29. Shinde, K., Khadke, P.: The study of influence of rainfall on crop production in
Maharashtra state of India (January 2017)
30. Singh, P., Kaur, A., Nayyar, A.: Role of Internet of Things and image processing
for the development of agriculture robots. In: Swarm Intelligence for Resource
Management in Internet of Things, pp. 147–167. Elsevier (2020)
31. Singh, S., Singh, K.M., Singh, R., Kumar, A., Kumar, U.: Impact of rainfall on agri-
cultural production in Bihar: a zone-wise analysis. Environ. Ecol. 32(4A), 1571–
1576 (2014)
32. Sladojevic, S., Arsenovic, M., Anderla, A., Culibrk, D., Stefanovic, D.: Deep neural
networks based recognition of plant diseases by leaf image classification. Comput.
Intell. Neurosci. 2016, 1–11 (2016)
33. Tripathy, A., et al.: Data mining and wireless sensor network for agriculture
pest/disease predictions. In: 2011 World Congress on Information and Commu-
nication Technologies, pp. 1229–1234. IEEE (2011)
34. Veenadhari, S., Misra, B., Singh, C.: Machine learning approach for forecasting
crop yield based on climatic parameters. In: 2014 International Conference on
Computer Communication and Informatics, pp. 1–5. IEEE (2014)
35. Wang, A.X., Tran, C., Desai, N., Lobell, D., Ermon, S.: Deep transfer learning for
crop yield prediction with remote sensing data. In: Proceedings of the 1st ACM
SIGCAS Conference on Computing and Sustainable Societies, pp. 1–5 (2018)
36. You, J., Li, X., Low, M., Lobell, D., Ermon, S.: Deep Gaussian process for crop
yield prediction based on remote sensing data. In: Thirty-First AAAI Conference
on Artificial Intelligence (2017)
37. Zhang, H., et al.: Design and implementation of crop recommendation fertiliza-
tion decision system based on WEBGIS at village scale. In: Li, Daoliang, Liu,
Yande, Chen, Yingyi (eds.) CCTA 2010. IAICT, vol. 345, pp. 357–364. Springer,
Heidelberg (2011). https://doi.org/10.1007/978-3-642-18336-2_44
38. Zhong, L., Hu, L., Zhou, H.: Deep learning based multi-temporal crop classification.
Remote Sens. Environ. 221, 430–443 (2019)
Embedded Linux Based Smart Secure IoT
Intruder Alarm System Implemented
on BeagleBone Black
1 Introduction
Today Linux has been installed on many different microprocessors and runs on many
platforms that do not contain any hard disk. Simply put, running Linux on Embedded
Systems is called Embedded Linux. Embedded Systems are resource-constrained and
battery-operated; some of them require real-time performance and are based on architectures
other than x86, such as ARM, Alf-Egil Bogen Vegard Wollan RISC (AVR), PowerPC, MIPS
and so on. Using Embedded Linux reduces the cost of hardware by taking advantage of
the benefits a multitasking operating system brings to embedded devices. Owing to these
benefits, the BeagleBone Black (BBB) is used in this work.
The BeagleBone Black has a TI Sitara AM3358BZCZ100 processor with 1 GHz clock speed and
2000 MIPS. This Single Board Computer has 512 MB of DDR3L 800 MHz SDRAM and 4 GB of
8-bit eMMC on-board flash. It is based on
a 1 GHz ARM Cortex-A8 and has a PowerVR SGX530 GPU. It has a total of 92 external
pins in two headers (P8 and P9, with 46 each). It has Ethernet (10/100 RJ45), SD,
MMC, USB and micro HDMI connectors on the board. It supports numerous peripherals
such as 4× UART (Universal Asynchronous Receiver Transmitter),
LCD (Liquid Crystal Display), MMC1 (MultiMedia Card), 2× SPI (Serial Peripheral
Interface), 2× I2C (Inter-Integrated Circuit), ADC (Analog to Digital Converter),
4 Timers, 8 PWMs (Pulse Width Modulation) and 2× CAN (Controller Area Network).
There are three on-board buttons, namely reset, power and boot. Software com-
patible with the BBB includes Linux, Android, and the Cloud9 IDE (Integrated Development
Environment) with BoneScript.
2 Related Work
We have referred to several papers; the relevant work from these papers is summarized below.
R. Dinakar et al. implemented a smart sensing mechanism that notifies the user if an
intruder is detected using a PIR sensor. The Raspberry Pi, being a robust Single-Board
Computer (SBC), is used as the platform for implementing the Home Security System.
The notification is generated based on a time constraint of a certain number of minutes of
detection by the PIR sensor; if motion is detected, a message about the presence of the intruder
is sent [3]. S. Osmin et al. developed an Intrusion Detection System using a Raspberry Pi for
securing the home and office. The ARM-based Raspberry Pi platform is used for
operating and controlling motion detection in the office or home, which is deployed
using the Raspberry Pi Camera Module. The Telegram application is integrated with the
project to alert the user about any intruder noticed by the camera module on a
periodic basis [4].
R. Nadaf et al. deployed a Home Security System for intruder detection using a
Raspberry Pi and the Raspberry Pi Camera module. The Python and Node.js
frameworks are used for the implementation of the system. The OpenCV library in Python is
used to process the image captured by the Raspberry Pi Camera for face detection of an
intruder in the office or home. If an intrusion is detected using the Camera module, an
alert is sent to the owner's smartphone [5]. N. Surantha et al. designed a Smart
Home Security system using a Raspberry Pi and an Arduino as the platform for the system.
A webcam is mounted on the Pi for continuous monitoring of the home, and a PIR
sensor senses movement if any invader enters. SVM is used for the verification
of object or human. An alert is generated in case of the detection of suspicious
movement and it is sent to the owner [6].
B. Ansari et al. implemented a character device driver on a Raspberry Pi, character-
izing the platform as ARM with a customized kernel. Interrupt-based GPIO control of the
Raspberry Pi using a character device driver is implemented. A cross-
compiler is used to convert the code written and compiled on an x86 platform so that it can
run on the ARM-based platform. The interrupt handling mechanism is used for
toggling an LED in the device driver using the GPIO API in kernel space [1]. C. Profentzas
et al. analysed the performance of secure boot in Embedded Systems. The secure booting
time is compared and analysed among single board computers
like the Raspberry Pi and BeagleBone Black. Hardware-cryptography-based and
software-based techniques are verified on both Embedded Linux ARM-based plat-
forms. The choice of a specific Single Board Computer is categorized on the basis of the
required secure-boot speed, the efficiency and the application
to be built for the Industrial Internet of Things (IIoT) [7]. K. Patel et al.
implemented an IoT-based Security System using a Raspberry Pi (ARM platform). A PIR
sensor is used for monitoring motion in the home. The detection of an intruder by the
PIR sensor triggers an email alert to the owner of the house. The email alert over the SMTP
protocol is implemented using the Raspbian OS framework [8] (Fig. 2).
3 Proposed Work
4 Mathematical Model
A mathematical model is an abstract model that uses mathematical language, with
Input, Process and Output elements, to describe the system behaviour.
Input:
• Character Device Driver
• PIR Sensor interfaced with BBB
• Buzzer interfaced with BBB
• Crypto Cape on BBB
Failure Cases:
• The AWS Cloud is not connected properly.
• The Mobile Application is not working.
• Internet connectivity is not available in all places.
Success Case: The alert is successfully received in the mobile application when an intruder
is present in the house, using the AWS Cloud platform as the mediator.
5 Experimental Setup
As shown in Fig. 3, the first stage of the boot process is to execute the code in the
Boot ROM. This code runs automatically after powering up the device; since the manufacturer
has hard-coded it in the ROM, it cannot be changed. The Boot ROM configures the
CPU and initializes the essential peripherals. The Boot ROM passes a SHA-1 hash to the
TPM module on the Crypto Cape for verification. The device is then prepared to execute
the Secondary Program Loader (SPL). The CPU starts executing the SPL program in RAM after
control passes from the Boot ROM to the SPL. The MLO then controls
the second phase of the boot sequence by preparing the device to execute U-Boot.
Again, to securely confirm the completion of this stage and to authenticate it, a hash
is transferred to the TPM module; only after successful verification does the boot move
to the next step. Finally, the SPL finds the U-Boot image, loads it into
RAM and passes CPU control to U-Boot. U-Boot is a robust OS
bootloader that provides the ability to configure booting options from a command
prompt. By setting the environment and configuration variables, U-Boot loads the kernel
and passes the execution flow to it. The final hash of the kernel and boot
parameters is transferred to the TPM. The TPM is then free to decrypt the blob of
data. Since an attacker does not know the contents of the decrypted blob, they cannot
intrude into the OS even if they tamper with the Boot ROM. Hence, this decrypted
blob can be used as a disk encryption key or can be shown to the sysadmin as evidence
that the hardware is free of any intrusion. The kernel image can be a binary
or a compressed file such as uImage or zImage; it specifies the architecture for which the kernel was
compiled. In this way the kernel image is secured.
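The hash-and-verify chain described above can be illustrated with a small conceptual Python sketch of measured boot; the extend operation, the stage names and the initial value are simplified assumptions and do not reproduce the actual Boot ROM/SPL/U-Boot firmware or the TPM command interface.

import hashlib

def extend(measurement: bytes, image: bytes) -> bytes:
    """TPM-style extend: new = SHA-1(old || SHA-1(image))."""
    return hashlib.sha1(measurement + hashlib.sha1(image).digest()).digest()

def measured_boot(stages: dict, expected: bytes) -> bool:
    measurement = b"\x00" * 20                      # initial PCR-like value (assumption)
    for name, image in stages.items():              # e.g. SPL (MLO), U-Boot, kernel
        measurement = extend(measurement, image)
        print(f"measured {name}: {measurement.hex()}")
    # A TPM would only unseal the blob (for example a disk-encryption key)
    # if the final measurement matches the expected value.
    return measurement == expected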
7 Test Results
The AWS IoT service can simply be configured to send a notification alert to the
mobile application whenever a certain trigger occurs. So, when the PIR sensor
senses some unusual motion, the triggered message is sent remotely to the mobile
application using the AWS IoT service (Fig. 7).
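A minimal sketch of this flow on the BeagleBone Black is shown below; it assumes the Adafruit_BBIO library for GPIO access and the AWS IoT Device SDK for Python for the TLS connection, and the pin name, client ID, topic, endpoint and certificate paths are hypothetical placeholders rather than values from the paper.

import json
import time

import Adafruit_BBIO.GPIO as GPIO                      # BeagleBone Black GPIO library
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient   # AWS IoT Device SDK for Python

PIR_PIN = "P8_11"          # hypothetical header pin wired to the PIR output
TOPIC = "home/intruder"    # hypothetical topic; an AWS IoT rule forwards it to the app

GPIO.setup(PIR_PIN, GPIO.IN)

client = AWSIoTMQTTClient("bbb-intruder-alarm")                              # hypothetical client id
client.configureEndpoint("example-ats.iot.us-east-1.amazonaws.com", 8883)   # placeholder endpoint
client.configureCredentials("AmazonRootCA1.pem", "private.pem.key", "device.pem.crt")  # placeholder certs
client.connect()

try:
    while True:
        GPIO.wait_for_edge(PIR_PIN, GPIO.RISING)        # block until the PIR reports motion
        payload = json.dumps({"event": "intruder_detected", "ts": time.time()})
        client.publish(TOPIC, payload, 1)               # QoS 1; a cloud rule pushes a mobile notification
        time.sleep(5)                                   # simple debounce between alerts
finally:
    GPIO.cleanup()
    client.disconnect()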
8 Conclusion
A Smart Secure IoT Intrusion Detection System based on the BeagleBone Black has been presented here,
which includes the TPM module on the Crypto Cape, a PIR sensor, and an interface between the mobile
application and the AWS Cloud. Security is indispensable for embedded
applications, so our main objective was to secure the kernel of the running system and
acquire the data remotely. The TPM module and AWS Cloud services add a fruitful
factor to the smart Intrusion Detection System using the PIR sensor. The BeagleBone
Black is very robust and effective for a customized secure kernel, which an intruder cannot
easily attack. If any intruder is detected by the PIR sensor, an alert is sent
to the mobile application remotely via the Cloud platform and the buzzer is turned on. The
time taken for the secure boot and for the transfer of data to the Cloud and the mobile application
is quite reasonable using this ARM-platform-based BeagleBone Black. The advantage of the deployed
system over the discussed related work and other intrusion detection
systems is that the BeagleBone Black is a robust platform which encapsulates the
customized kernel security with efficient timing. It provides better performance than
the Raspberry Pi. A future extension of the developed system would be to add real-
time camera monitoring for intrusion detection and to provide better efficiency and
authenticity through trusted boot.
References
1. Ansari, B., Kaur, M.: Design and implementation of character device driver for customized
kernel of ARM based platform. In: International Conference on Current Trends in Computer,
Electrical, Electronics and Communication (ICCTCEEC) (2017)
2. Nayyar, A., Puri, V.: A comprehensive review of BeagleBone technology: smart board
powered by ARM. Int. J. Smart Home 10(4), 95–108 (2016)
3. Dinakar, R., Singh, D., Abbas, M., Alex, M., Yadav, A.: IoT based home security system
using Raspberry Pi. Int. J. Innov. Res. Comput. Commun. Eng. 6(4), 3835–3842 (2018)
4. Osmin, S.: Smart intrusion alert system using Raspberry Pi and PIR sensor. Spec.
Issue J. Kejuruteraan 1(1) (2017)
5. Nadaf, R., Hatture, S., Bonal, V., Naik, S.: Home security against human intrusion using
Raspberry Pi. In: International Conference on Computational Intelligence and Data Science
(ICCIDS) (2019)
6. Surantha, N., Wicaksono, W.: Design of smart home security system using object recognition
and PIR sensor. In: 3rd International Conference on Computer Science and Computational
Intelligence (2018)
7. Profentzas, C., Gunes, M., Nikolakopoulos, Y., Landsiedel, O., Almgren, M.: Performance of
secure boot in embedded systems. In: IEEE Conference (2018)
8. Tanwar, S., Patel, P., Patel, K., Tyagi, S., Kumar, N., Obadiat, M.: An advanced internet of
thing based security alert system for smart home. IEEE J. (2017)
9. Agarwal, N., Paul, S., Gujar, P., Gite, V.: Internet of Things (IoT) based switchox using
MQTT protocol. Int. J. Res. Eng. Technol. (IJRET) 5(4) (2016)
Author Index