Machine Learning Algorithms and Applications 2021
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106
The objective of the series is to bring together global research scholars, experts, and scientists in
the research areas of sustainable computing and optimization from all over the world to share their
knowledge and experiences on current research achievements in these fields. The series aims to provide
the global research community with an opportunity to share novel research results, findings, and
innovations with a wide range of readers. Data is everywhere and continues to grow massively,
which has created a huge demand for qualified experts who can uncover valuable insights from it.
The series promotes sustainable computing and optimization methodologies for solving real-life
problems, mainly in the engineering and management systems domains, and focuses on problems
that can suitably be handled through these paradigms.
Publishers at Scrivener
Martin Scrivener ([email protected])
Phillip Carmical ([email protected])
Machine Learning Algorithms
and Applications
Edited by
Mettu Srinivas,
National Institute of Technology, Warangal, India
G. Sucharitha,
Institute of Aeronautical Engineering,
Hyderabad, India
Anjanna Matta
Faculty of Science and Technology, IFHE,
Hyderabad, India
and
Prasenjit Chatterjee
MCKV Institute of Engineering, Howrah, India
This edition first published 2021 by John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
and Scrivener Publishing LLC, 100 Cummings Center, Suite 541J, Beverly, MA 01915, USA
© 2021 Scrivener Publishing LLC
For more information about Scrivener publications please visit www.scrivenerpublishing.com.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or other-
wise, except as permitted by law. Advice on how to obtain permission to reuse material from this title
is available at http://www.wiley.com/go/permissions.
For details of our global editorial offices, customer services, and more information about Wiley prod-
ucts visit us at www.wiley.com.
ISBN 978-1-119-76885-2
Set in size of 11pt and Minion Pro by Manila Typesetting Company, Makati, Philippines
10 9 8 7 6 5 4 3 2 1
Contents
Acknowledgments xv
Preface xvii
Part 1: Machine Learning for Industrial
Applications 1
1 A Learning-Based Visualization Application for Air Quality
Evaluation During COVID-19 Pandemic in Open Data
Centric Services 3
Priyank Jain and Gagandeep Kaur
1.1 Introduction 4
1.1.1 Open Government Data Initiative 4
1.1.2 Air Quality 4
1.1.3 Impact of Lockdown on Air Quality 5
1.2 Literature Survey 5
1.3 Implementation Details 6
1.3.1 Proposed Methodology 7
1.3.2 System Specifications 8
1.3.3 Algorithms 8
1.3.4 Control Flow 10
1.4 Results and Discussions 11
1.5 Conclusion 21
References 21
2 Automatic Counting and Classification of Silkworm
Eggs Using Deep Learning 23
Shreedhar Rangappa, Ajay A. and G. S. Rajanna
2.1 Introduction 23
2.2 Conventional Silkworm Egg Detection Approaches 24
2.3 Proposed Method 25
2.3.1 Model Architecture 26
4.7 Conclusion 73
References 74
5 Hitting the Success Notes of Deep Learning 77
Sakshi Aggarwal, Navjot Singh and K.K. Mishra
5.1 Genesis 78
5.2 The Big Picture: Artificial Neural Network 79
5.3 Delineating the Cornerstones 80
5.3.1 Artificial Neural Network vs. Machine Learning 80
5.3.2 Machine Learning vs. Deep Learning 81
5.3.3 Artificial Neural Network vs. Deep Learning 81
5.4 Deep Learning Architectures 82
5.4.1 Unsupervised Pre-Trained Networks 82
5.4.2 Convolutional Neural Networks 83
5.4.3 Recurrent Neural Networks 84
5.4.4 Recursive Neural Network 85
5.5 Why is CNN Preferred for Computer Vision
Applications? 85
5.5.1 Convolutional Layer 86
5.5.2 Nonlinear Layer 86
5.5.3 Pooling Layer 87
5.5.4 Fully Connected Layer 87
5.6 Unravel Deep Learning in Medical Diagnostic Systems 89
5.7 Challenges and Future Expectations 94
5.8 Conclusion 94
References 95
6 Two-Stage Credit Scoring Model Based on Evolutionary
Feature Selection and Ensemble Neural Networks 99
Diwakar Tripathi, Damodar Reddy Edla, Annushree Bablani
and Venkatanareshbabu Kuppili
6.1 Introduction 100
6.1.1 Motivation 100
6.2 Literature Survey 101
6.3 Proposed Model for Credit Scoring 103
6.3.1 Stage-1: Feature Selection 104
6.3.2 Proposed Criteria Function 105
6.3.3 Stage-2: Ensemble Classifier 106
6.4 Results and Discussion 107
6.4.1 Experimental Datasets and Performance Measures 107
6.4.2 Classification Results With Feature Selection 108
Preface
The Editors
June 2020
Part 1
MACHINE LEARNING
FOR INDUSTRIAL
APPLICATIONS
1
A Learning-Based Visualization
Application for Air Quality Evaluation
During COVID-19 Pandemic in
Open Data Centric Services
Priyank Jain* and Gagandeep Kaur†
Abstract
Air pollution has become a major concern in many developing countries. There
are various factors that affect the quality of air. Some of them are Nitrogen Dioxide
(NO2), Ozone (O3), Particulate Matter 10 (PM10), Particulate Matter 2.5 (PM2.5),
Sulfur Dioxide (SO2), and Carbon Monoxide (CO). The Government of India
under the Open Data Initiative provides data related to air pollution. Interpretation
of this data requires analysis, visualization, and prediction. This study proposes
machine learning and visualization techniques for air pollution. Both supervised
and unsupervised learning techniques have been used for prediction and analysis
of air quality at major places in India. The data used in this research contains the
presence of six major air pollutants in a given area. The work has been extended to
study the impact of lockdown on air pollution in Indian cities as well.
1.1 Introduction
1.1.1 Open Government Data Initiative
These days, Open Government Data (OGD) is gaining momentum as a way of sharing knowledge by
making public data and information of governmental bodies freely available to private citizens in
machine-processable formats so that it can be reused for mutual benefit. OGD is a global movement
whose roots lie in the initiative started in 2009 by the US President as a Memorandum on
Transparency and Open Government, which provided transparency in government projects and
collaborations through the sharing of data by public administration and industry with private
citizens. The Indian Government has also joined this initiative and provides free access to data
for the development of applications and other uses, so that the information can be reused for the
mutual growth of industry and government. Open Data is raw data made available by governments,
industry, NGOs, scientific institutions, and educational organizations, and as such is not an
individual's property.
The growth in the field of Open Data calls for new tools and techniques that can support it.
Digital transformation requires companies to look for new tools and techniques to support the
increasing need for faster delivery of services at a large number of delivery points. Technologies
like SaaS, mobile, and the Internet of Things are gaining ground by increasing the number of
endpoints and thus enabling the success of the Open Data Initiative.
[Figure 1.1: application workflow in which the latest data is fetched through the API, future values are predicted with LSTM, and the results are visualized.]
Figure 1.1 shows a detailed description of the working of the application. The latest data for a
specific location/place is fetched in JSON format from the Open Data Repository using a RESTful
API. Future air quality is also predicted using the past values. The predicted data is displayed
with an adequate message to the user through the web app. Visualization involves displaying the
results in a human-understandable format. For that, heat maps were generated and appropriate
messages were displayed based on the WHO guidelines. The results were also displayed through
graphs to show the relationship between different values. Finally, the outputs were also
displayed on a map of India.
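As a rough sketch of the data-fetching step (not the authors' code: the endpoint URL, query parameters, and JSON field names below are placeholders), the latest readings for a station could be pulled and loaded into a DataFrame as follows.

```python
import requests
import pandas as pd

# Hypothetical endpoint and parameters; the real Open Data Repository URL,
# API key, and field names are not specified in this excerpt.
API_URL = "https://api.example-open-data.in/air-quality"

def fetch_latest(station: str, api_key: str) -> pd.DataFrame:
    """Fetch the latest pollutant readings for one station as a DataFrame."""
    response = requests.get(
        API_URL,
        params={"station": station, "api-key": api_key, "format": "json"},
        timeout=30,
    )
    response.raise_for_status()
    records = response.json()["records"]          # assumed JSON layout
    df = pd.DataFrame(records)
    # Keep only the six pollutants discussed in the chapter, if present.
    pollutants = ["no2", "o3", "pm10", "pm2.5", "so2", "co"]
    return df[[c for c in df.columns if c in pollutants + ["timestamp"]]]

if __name__ == "__main__":
    latest = fetch_latest("Sector 62, Noida", api_key="DEMO_KEY")
    print(latest.tail())
```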
1.3.3 Algorithms
RNNs are different from FFNNs because the output or result received at
stage t − 1 impacts the output or result received at stage t. In RNN, there
are two input values: first one being the present input value and second
one being the recent past value. Both inputs are used to compute the new
output.
Figure 1.2 shows the simple form of RNN. For a hidden state (ht) which
is non-linear transformation in itself, it can be computed using a combi-
nation of linear input value (It) and recent hidden past value (ht − 1). From
the figure, it can be observed that the output result is computable using
the present dependent hidden state ht. The output Ot holds dependence
on probability pt which was computed using a function called softmax.
Softmax was only computed in the last layer of RNN-based classification
before the final result was received.
[Figure 1.2: an unrolled RNN showing inputs It−1, It, It+1, hidden states ht−1, ht, ht+1, shared weights Wt, and outputs Ot−1, Ot, Ot+1.]
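As a minimal sketch of the recurrence just described (not the authors' implementation), the NumPy code below computes one RNN step: the hidden state from the current input and the previous hidden state, and the output probabilities via softmax in the final layer. The dimensions and weight values are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(I_t, h_prev, W_in, W_h, V, b_h, b_o):
    """One RNN step: new hidden state from the current input and the previous
    state, and output probabilities p_t computed from the hidden state."""
    h_t = np.tanh(W_in @ I_t + W_h @ h_prev + b_h)   # non-linear hidden state
    p_t = softmax(V @ h_t + b_o)                     # softmax only at the output
    return h_t, p_t

# Tiny example with random weights: 6 pollutant inputs, 8 hidden units, 3 classes.
rng = np.random.default_rng(0)
W_in, W_h = rng.normal(size=(8, 6)), rng.normal(size=(8, 8))
V, b_h, b_o = rng.normal(size=(3, 8)), np.zeros(8), np.zeros(3)
h = np.zeros(8)
for I_t in rng.normal(size=(4, 6)):      # four time steps of inputs
    h, p = rnn_step(I_t, h, W_in, W_h, V, b_h, b_o)
print(p)                                  # class probabilities at the last step
```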
Visakhapatnam: the LSTM hyperparameters for NO2, O3, PM10, PM2.5, and SO2 are shown in Table 1.4,
and after careful analysis of the LSTM hyperparameter scores, we zeroed in on the batch size with
minimum bias.
Figure 1.8 Heat map for ozone O3 for day and night in December, 2017.
Figure 1.9 Heat map for ozone O3 for day and night in June, 2020.
Figure 1.10 shows the heat map for all the parameters for 11th, 12th, and 13th December, 2017, at
Sector 62, Noida. From the heat maps, it can be observed that PM2.5 is the main pollution-causing
parameter in the air. It can also be observed that it remains at dangerous levels on all days,
both during the day and at night. Figure 1.11 shows the heat map for all the parameters for 6th,
7th, and 8th June, 2020, at Sector 62, Noida. The reduced levels of all pollutants during this
period can clearly be seen from the heat map.
Figure 1.10 Heat map for all parameters for 3 days and nights in December, 2017.
Figure 1.11 Heat map for all parameters for 3 days and nights in June, 2020.
Figure 1.12 Predicted values for O3 for Anand Vihar, New Delhi.
Figure 1.13 Predicted values for PM10 for Sector 62, Noida.
[Figure: AQI category legend: Good (0–50), Satisfactory (51–100), Moderate (101–200), Poor (201–300), Severe (401–500).]
1.5 Conclusion
After applying K-means clustering using the Silhouette coefficient, the data is divided into seven
clusters. The SVM successfully classifies the data into its respective air quality class with an
accuracy of 99%. The LSTM models for different places have been tuned to minimize MAE and RMSE.
The proposed model can be used for various purposes, such as predicting future trends of air
quality, assessing past trends, visualizing data effectively, issuing health advisories, and
indicating health effects (if any) based on the current air quality. Various parameters can be
compared to determine which pollutant affects a particular area the most, so that actions can be
taken beforehand. Anyone can easily draw inferences from data that is otherwise difficult to
analyze numerically and take actions to control air pollution in an area.
2
Automatic Counting and Classification
of Silkworm Eggs Using Deep Learning
Shreedhar Rangappa1*, Ajay A.1 and G. S. Rajanna2
2Maharani Cluster University, Sheshadri Road, Bengaluru, India
Abstract
This chapter presents a method that uses convolutional neural networks to identify and count the
silkworm eggs laid on a sheet of paper by the female silk moth. The method is also capable of
segmenting individual eggs and classifying them into a hatched egg class and an unhatched egg
class, thus outperforming the image processing techniques used earlier. A few limitations of the
earlier techniques are described, and an attempt to increase accuracy using the uniform
illumination of a digital scanner is illustrated. The use of a standard key marker that helps
transform any silkworm egg sheet into a standard image, which can then be used as input to a
trained convolutional neural network model to obtain predictions, is discussed briefly. The deep
learning model is trained on silkworm datasets of over 100K images for each category. The
experimental results on test image sets show that our approach yields an accuracy above 97%
coupled with high repeatability.
2.1 Introduction
In the last decade, machine learning has gained a popularity in various fields of engineering that
no sequential programming approach has reached in a century. Deep learning/convolutional neural
network (CNN) is a machine learning approach that solves a given problem without
techniques such as low contrast image setting [8], contrast enhancement fol-
lowed by image morphological operations [7], image patch centroid analysis
[6], image channel conversion from RGB to HSV to identify the region of
interest (ROI) [9], using Gaussian mixture model [10], and using Hough
transforms (blob analysis) [11, 12]. The accuracy achieved in these tech-
niques completely depends on the consistency of the experimental condi-
tions such as the color of the sheet on which the silkworm eggs were laid, the
size of the eggs, and uniform illumination while capturing a digital image of
eggs. By altering any one of these parameters, the results vary drastically, and
hence designing an image processing algorithm for every possible scenario
becomes laborious. Also, in these techniques, the digital data of the eggs was captured using
digital cameras operated manually without any preset illumination parameters, hence resulting in
poor accuracy.
dy = (Dy − D′y) (2.2)

R″ = h − dx × w − dy (2.3)
diameter of eggs was collected manually using image processing and feature engineering and later
fed into various ML algorithms (KNN, decision trees, and SVM). These ML algorithms provide
accuracy over 90% but fail when the input data belongs to a different class (breed, in sericulture
terms) from the class on which the algorithm was trained, since the color of the eggs is not the
same for different silkworm breeds. To overcome this issue, a supervised CNN technique is used,
which requires the true labels while the features are selected automatically. The primary aim of
our approach is to accurately count the silkworm eggs present in a given digital image and further
classify them into their respective classes, hatched and unhatched. Figure 2.2 shows a sample
digital image of the egg sheet with the different classes of eggs marked manually with specific
colors. The eggs marked in green represent the hatched class (HC), while eggs marked in red
represent the unhatched class (UHC).
To identify the core features of the eggs, for segmenting them from the
background sheet and to classify them into respective categories, a sim-
ple deep learning technique was used with four hidden layers to provide
results that are much more accurate compared to conventional methods.
Deep learning models are trained using TensorFlow framework to pro-
vide three different results such as foreground-background segmentation,
detecting eggs, and classifying detected egg.
The core deep learning model used in our experiments is shown in
Figure 2.3. The model consists of convolution layers, max-pooling layers,
and fully connected layers. These layers are trained to identify the features
of egg, while the last layer is modified to provide categorical or continuous
data. The core CNN model is trained using a stride of (2 × 2) and kernel
of (5 × 5) for convolution, with (2 × 2) max pooling and a fully connected
Figure 2.2 Silkworm egg classes: hatched eggs and unhatched eggs.
Figure 2.4 Foreground-background segmentation of entire silkworm egg sheet (a) (input)
and (b) (output).
silkworm egg sheet is divided into square grids of 128 × 128 with a stride
of 128, resulting in an image set of 2K images that must be processed for
categorical results as foreground or background class. Figure 2.4 represents
the segmentation of the entire silkworm egg sheet, where the foreground
(presence of egg) and background (absence of egg) are represented by a
green color (pixels) and red color, respectively. Further processing is carried out only for
pixels represented in green, which minimizes computational time and increases final accuracy, as
background pixels are dropped out of the data processing cycle.
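The following is a minimal TensorFlow/Keras sketch of how such a patch classifier could be assembled (not the authors' exact network), assuming 128 × 128 RGB grid patches, the (5 × 5) kernels, (2 × 2) strides, and (2 × 2) max pooling mentioned for the core model, and a binary foreground/background output; the filter counts are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_patch_classifier(patch_size=128):
    """Binary foreground/background classifier for square grid patches."""
    return models.Sequential([
        layers.Conv2D(16, (5, 5), strides=(2, 2), activation="relu",
                      input_shape=(patch_size, patch_size, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (5, 5), strides=(2, 2), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),  # foreground vs. background
    ])

model = build_patch_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

During inference, the scanned sheet would be tiled into non-overlapping 128 × 128 patches (stride 128), and only the patches predicted as foreground would be passed on to the egg-detection stage.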
bounding box will not yield any good result. Secondly, these methods have
the limitation of how many similar class objects can be recognized within a
single bounding box, which are two objects for YOLO [14]. Since the eggs
are small, many eggs will be present within a 100 × 100 grid image that may
belong to the same class and hence may not be detected.
The specification of the egg location predictor CNN model is rep-
resented in Table 2.2, where the input to the core CNN model has been
changed to 32 × 32 pixels, three-channel RGB image. Here, the output is
a regression that provides the location of the egg center (x, y) rather than
a bounding box (four corner points). The training dataset consists of both
the class images, i.e., hatched eggs and unhatched eggs. The positive sam-
ples consist of images where an individual egg is visible completely, while
the negative samples consist of eggs that are partially visible and have mul-
tiple egg entries. Figure 2.5 represents the classifier model that is trained to
determine positive and negative samples, and the positive samples are later
trained to predict the egg center location in terms of pixel values. Further,
during the practical application, the center location of the egg predicted is
used to crop a single egg data to be fed into a classifier that determines the
class of the selected egg into HC or UHC. Figure 2.6 represents an overall
result of locating egg centers using egg location predictor CNN model for
one of the test data sheets where all egg centers are marked with a blue dot.
A sliding window of (32 × 32) with a stride of (4, 4) was used to achieve
the results.
Figure 2.5 CNN training model to predict egg location in terms of pixel values.
images are fed into the egg class predictor CNN model, which provides
categorical data to distinguish the input image into class HC or UHC.
The specification of the egg class predictor CNN model is represented in
Table 2.3.
x = xa − 16 (2.4)
y = ya – 16 (2.5)
w = xa + 16 (2.6)
h = ya + 16 (2.7)
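Equations (2.4) to (2.7) define a 32 × 32 crop around a predicted egg center (xa, ya). A brief illustration of that cropping and classification step is sketched below under the assumption that the location predictor returns pixel coordinates of each center; the function names and the trained `class_predictor` model are hypothetical.

```python
import numpy as np

def crop_egg(sheet: np.ndarray, xa: int, ya: int, half: int = 16) -> np.ndarray:
    """Crop a (2*half x 2*half) window centered on the predicted egg center
    (xa, ya), following Equations (2.4)-(2.7): x = xa-16, y = ya-16,
    w = xa+16, h = ya+16."""
    x, y = xa - half, ya - half
    w, h = xa + half, ya + half
    return sheet[y:h, x:w]          # rows are y, columns are x

def classify_eggs(sheet, egg_centers, class_predictor):
    """Classify every detected center into HC or UHC (hypothetical usage);
    `egg_centers` comes from the egg-location predictor."""
    crops = np.stack([crop_egg(sheet, xa, ya) for xa, ya in egg_centers])
    probs = class_predictor.predict(crops / 255.0)
    return probs.argmax(axis=1)     # e.g., 0 = hatched (HC), 1 = unhatched (UHC)
```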
2.5 Results
The trained CNN models were tested with new silkworm egg sheets that
were scanned using a Canon® paper scanner at 600 dpi, to classify and
count the number of eggs. These digital datasets were completely isolated
from the training step; thereby, the trained CNN models had to predict the results rather than
reproduce learned results. Table 2.4 presents the performance of the overall CNN model trained
using our datasets. The performance of only a few datasets is shown due to space restrictions. It
can be observed that CNN models trained with two hidden layers perform better than the
conventional techniques, providing accuracy of over 97%. The accuracy shown in Table 2.4 covers
both the accuracy of the egg count and the accuracy of classifying the eggs. The model
consistently outperforms the conventional computer vision/image processing techniques for
silkworm egg counting and classification, with accuracy over 97% on newer data of the same breed.
The inference times shown in Table 2.4 were measured on an Nvidia GPU (GTX 1060).
The model performance drops for newer egg data that are completely different in color and texture
and that were not available in the training dataset. This happens due to the nature of eggs of
different breeds, which differ from those on which the model was trained. Collecting data for, and
training, a deep learning model on a different breed of silkworm eggs will resolve these issues,
and this work is in progress.
2.6 Conclusion
In this paper, a CNN-based silkworm egg counting and classification model that overcomes many
issues found with conventional image processing techniques is explained. The main contribution of
this paper is fourfold. First, a method is presented to generalize the capture of silkworm egg
sheet data in a digital format using normal paper scanners rather than designing extra hardware,
which eliminates the need for additional light sources to provide uniform illumination while
recording data and maintains high repeatability.
Second, the scanned digital data can be transformed into a standard size by using key markers
stamped onto the egg sheets before scanning. This allows the user to resize the dimensions of the
digital data and later use it in an image processing algorithm or CNN without introducing
dimensionality error.
Third, a dataset has been put together containing over 400K images representing different features
of silkworm eggs. CNNs and other models that need a large amount of training, testing, and
validation data can use this dataset to skip the data generation phase.
Fourth, a CNN model has been trained using the dataset that is designed to predict the egg class
and count the number of eggs per egg sheet. With over 97% accuracy, the model outperforms many
conventional approaches with only four hidden layers and a fully connected layer.
The model performs accurately in quantifying (counting) silkworm eggs of different breeds, but new
datasets become necessary to predict the class labels for a new silkworm breed for which the model
is not trained. This is because, for the egg breed used in our experiments, HC eggs have high
pixel intensity throughout the egg surface while UHC eggs have dark pixels at the center
surrounded by high-value pixels. This color feature may not be the same for other silkworm breeds,
and hence additional data becomes important, which can be fed into the already trained CNN using
transfer learning. Also, since the egg location model performs well with new breed data, the
training dataset for determining the class of eggs can be easily generated with minimal human
effort.
Acknowledgment
The authors would like to thank Smt. R. Latha S-B and Mr. P. B. Vijayakumar
S-C of KSSRDI, KA, IN for providing silkworm egg sheets for this study.
References
1. Xue, Y. and Ray, N., Cell Detection in Microscopy Images with Deep Convo
lutional Neural Network and Compressed Sensing, CoRR, abs/1708.03307,
arXiv preprint arXiv:1708.03307, 2017.
2. Zieliński, B., Plichta, A., Misztal, K., Spurek, P., Brzychczy-Włoch, M.,
Ochońska, D., Deep learning approach to bacterial colony classification.
PLoS One, 12, 9, e0184554, 2017.
3. Abe, M. and Nakayama, H., Deep learning for forecasting stock returns in
the cross-section, in: Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics),
vol. 10937 LNAI, pp. 273–284, 2018.
4. Kumar, D.A., Silkworm Growth Monitoring Smart Sericulture System based
on Internet of Things (IOT) and Image Processing. Int. J. Comput. Appl., 180,
18, 975–8887, 2018.
5. S.P., Rajanna, G.S., Chethan, D., Application of Image Analysis methods for
Quantification of Fecundity in Silkworm Bombyx mori L, in: International
Sericulture Commission, Research Papers, 2015 (http://www.inserco.org/en/
previous_issue).
6. Kiratiratanapruk, K., Watcharapinchai, N., Methasate, I., Sinthupinyo, W.,
Silkworm eggs detection and classification using image analysis, in: 2014
International Computer Science and Engineering Conference, ICSEC 2014,
pp. 340–345, 2014.
7. Pandit, A., Rangole, J., Shastri, R., Deosarkar, S., Vision system for automatic
counting of silkworm eggs, 2014 International Conference on Information
Communication and Embedded Systems, ICICES 2014, no. 978, pp. 1–5, 2015.
8. Kiratiratanapruk, K. and Sinthupinyo, W., Worm egg segmentation based
centroid detection in low contrast image, in: 2012 International Symposium
on Communications and Information Technologies, ISCIT 2012, pp. 1139–
1143, 2012.
9. Pathan, S., Harale, A., Student, P.G., A Method of Automatic Silkworm Eggs
Counting System. Int. J. Innovative Res. Comput. Commun. Eng., 4, 12, 25,
2016.
10. K.P.R., Sanjeev Poojary, L., M.G.V., S.N.K., An Image Processing Algorithm
for Silkworm Egg Counting. Perspect. Commun. Embedded-Syst. Signal-
Process. (PiCES), 1, 4, 2566–932, 2017.
11. Matas, J., Galambos, C., Kittler, J., Robust Detection of Lines Using
the Progressive Probabilistic Hough Transform. Comput. Vision Image
Understanding, 78, 1, 119–137, Apr. 2000.
12. Nikitha, R.N., Srinidhi, R.G., Harshith, R., Amar, T., Raghavendra, C.G.,
Reckoning the hatch rate of multivoltine silkworm eggs by differentiating
yellow grains from white shells using blob analysis technique, in: Advances
in Intelligent Systems and Computing, vol. 709, pp. 497–506, 2018.
13. Ren, S., He, K., Girshick, R., Sun, J., Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks, in: Advances in Neural Information Processing Systems, 2015.
14. Redmon, J., Divvala, S., Girshick, R., Farhadi, A., You Only Look Once: Unified, Real-Time
Object Detection, arXiv preprint arXiv:1506.02640, 2015.
3
A Wind Speed Prediction System
Using Deep Neural Networks
Jaseena K. U.1,2* and Binsu C. Kovoor1
1Division of Information Technology, School of Engineering, Cochin University of Science and
Technology, Kochi, Kerala, India
2Department of Computer Applications, MES College Marampally, Aluva, Kochi, Kerala, India
Abstract
The demand for renewable energy sources has increased significantly due to the depletion of fossil
fuels at a tremendous rate. The usage of conventional energy sources is also making the
environment more polluted, so wind and other renewable energy sources have gained more
significance and demand. Low cost and availability are the key factors that make wind one of the
most dominant renewable sources of energy. Wind speed prediction has applications in various
domains such as wind power stations, agriculture, the navy, and airport operations. Wind speed can
be predicted from various environmental factors such as dew point, humidity, temperature, and
pressure. Deep Neural Networks (DNNs) are particular types of neural networks that can process and
analyze massive datasets by applying a series of trained algorithms and are capable of making
predictions based on past data. This chapter suggests a DNN model to forecast daily average wind
speed using massive datasets. The metrics used to estimate the accuracy of the prediction system
are mean absolute error, root mean squared error, and R2. Performance comparison of the proposed
model with Artificial Neural Network, Support Vector Machine, and Random Forest models is
analyzed, and the experimental outcomes demonstrate that the proposed model is more efficient and
effective.
3.1 Introduction
Energy sources are an inevitable part of every developing country for their
industrial and agricultural activities. Wind energy is observed to be the
most promising and environment-friendly source of renewable energy that
can be used for power generation. Wind speed prediction is an essential
activity as the wind has many applications in various domains such as agri-
culture, industry, marine applications, military, and airport applications.
In wind power generators, effective and precise wind speed prediction is
a necessity for generating electricity. The cultivation of certain crops also
depends on wind speed. It also plays a significant role during the rocket
launching process. Wind speed forecasting can be accomplished with the
help of historical wind speed data. Wind speed is weather dependent, and
the recurrent behavior of wind speed makes the accurate prediction a
challenging task [1]. Hence, the development of accurate wind speed fore-
casting models has got more significance, and several systems are being
implemented for enhancing accuracy. In this big data era, traditional com-
putational intelligence models could take a considerable amount of time
to extract relevant information from big datasets. However, the incredible
potential of deep learning helps to process and analyze extensive datasets
for better feature learning and pattern analysis.
Current prediction systems can be grouped into four categories, namely,
physical, statistical, Artificial Intelligence (AI), and hybrid systems, depend-
ing on the methods employed for forecasting [2]. Physical systems utilize
mathematical models to forecast future states. Statistical models are linear
models and are suited for short-term wind speed forecasting. The most
commonly employed statistical models for wind speed forecasting are
Autoregressive Integrated Moving Average (ARIMA), multiple regression,
and Vector Autoregression (VAR). The development of AI has inspired the
development of intelligent prediction models. These models have proven to
be robust and effective when compared to statistical models. AI models can
efficiently manage non-linear datasets and demonstrate better predictive
performance. AI systems are further divided into machine learning pre-
dictors and deep learning predictors. Artificial Neural Networks (ANNs),
Extreme Learning Machine (ELM), and Support Vector Machine (SVM)
are the popular machine learning models used for forecasting wind speed.
Machine learning techniques are prevalent for predictive systems, in which
ANNs are frequently employed as they can learn nonlinear functions. Deep
learning uses neural networks with deep architectures that cover many lay-
ers of non-linear processing stages. Subsequently, Deep Neural Networks
(DNNs) are successfully implemented to predict wind speed due to their
DNN model. The effectiveness of this model is analyzed with other popu-
lar machine learning models. The main highlights of the chapter are pre-
sented below.
3.2 Methodology
The proposed framework is presented in this section with an introduc-
tion to DNNs. The primary objective is to predict daily average wind speed
using an optimal DNN architecture. In order to find out the optimum con-
figuration for the DNN architecture, initially, a parametric study of various
network parameters is conducted. The parameters considered for the study
are the count of hidden layers, count of neurons in each hidden layer, and
the learning rate. Accuracy of the proposed optimal model is then com-
pared with popular and widely used machine learning models like ANNs,
Random Forests (RF), Decision Trees (DT), SVM, and Linear Regression
(LR).
[Figure: a deep neural network with an input layer, multiple hidden layers, and an output layer.]
Yi = g( Σ_{j=1}^{n} Wij Xj + bi ) (3.1)
where Wij represents the weight of the connection between neurons, bi rep-
resents the bias of the neuron i, and g is the activation function.
Each neuron in the hidden and output layers aggregates its input values
and then applies an activation function to it. The Rectified Linear Unit
(ReLU) is a nonlinear activation function used in the majority of the DNN
models. It is used with the hidden layers of a neural network. The equation
of ReLU activation function is given by Equation (3.2).
f(z) = { 0 for z < 0; z for z ≥ 0 } (3.2)
∇∅t = −( l / √( Σ_{T=1}^{t} gT² ) ) ∗ gt (3.4)
[Figure: the proposed prediction framework consisting of parameter selection, model training, prediction, evaluation, and visualization.]
humidity, sea level pressure, visibility, and wind speed, respectively. The
variables low, avg, and high are synonyms for minimum, average, and
maximum, respectively.
v′ = (v − min) / (max − min) (3.5)
where v is the original value and v’ is the value obtained after normaliza-
tion. The variables min and max are minimum and maximum values of the
samples, respectively.
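As a brief illustration (not the authors' code), Equation (3.5) can be applied column-wise with scikit-learn's MinMaxScaler or directly with NumPy; the column names below are assumptions based on the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Assumed column names for the weather dataset described in the chapter.
cols = ["temperature", "humidity", "dew_point", "pressure", "visibility", "wind_speed"]

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max normalize each column to [0, 1] as in Equation (3.5)."""
    scaler = MinMaxScaler()
    return pd.DataFrame(scaler.fit_transform(df[cols]), columns=cols, index=df.index)

# Equivalent NumPy form for a single column v:
# v_prime = (v - v.min()) / (v.max() - v.min())
```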
Figure 3.3 Distribution of average wind speed: (a) distribution chart and (b) histogram.
[Figure: architecture of the proposed DNN with six input parameters (temperature, humidity, dew point, pressure, visibility, and wind speed), four hidden layers of 25, 20, 10, and 5 neurons, and a single output giving the predicted wind speed.]
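A minimal Keras sketch of such an architecture is given below, assuming six normalized inputs, hidden layers of 25, 20, 10, and 5 ReLU neurons (the configuration selected in the parametric study), and a single linear output; the choice of Adadelta as the adaptive optimizer and the MSE training loss are assumptions, so this is an illustration rather than the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_wind_dnn(n_features=6):
    """DNN for daily average wind speed: 4 hidden ReLU layers (25, 20, 10, 5)."""
    model = models.Sequential([
        layers.Dense(25, activation="relu", input_shape=(n_features,)),
        layers.Dense(20, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(5, activation="relu"),
        layers.Dense(1),                      # predicted wind speed (regression)
    ])
    model.compile(optimizer=optimizers.Adadelta(),  # assumed adaptive optimizer
                  loss="mse",                       # RMSE is the square root of this
                  metrics=["mae"])
    return model

model = build_wind_dnn()
# history = model.fit(X_train, y_train, validation_split=0.2,
#                     epochs=200, batch_size=32)
```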
RMSE = √( (1/n) Σ_{j=1}^{n} (yj − ŷj)² ) (3.6)
MAE is the absolute variation between the predicted (ŷ) and actual data (y), as shown in Equation (3.7).
MAE = (1/n) Σ_{j=1}^{n} |yj − ŷj| (3.7)
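For reference, these metrics (and R²) can be computed directly with NumPy and scikit-learn, as in the short sketch below.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return RMSE (Eq. 3.6), MAE (Eq. 3.7), and R^2 for a set of predictions."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    return rmse, mae, r2

# Example: evaluate(y_test, model.predict(X_test).ravel())
```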
3.2.2.5 Visualization
Visualization is yet another essential step in any predictive system. In
this study, scatter plots and semi-log plots are employed to visualize
the results. Scatter plots are utilized to analyze graphically, the differ-
ence between actual and predicted values, and these plots are further
employed to assess the efficiency of the system. The variation of predicted
values from actual values can be visualized more precisely using semi-log
plots.
Figure 3.5 Prediction accuracy based on (a) RMSE, (b) MAE, and (c) R2.
the recorded (actual) data using scatter plots. DNN, ANN, RF, and
SVM models are considered for visualization because they produced
comparable results. The scatter plots of predicted daily data against the
recorded data to predict wind speed using various models are illustrated
in Figure 3.6.
Figure 3.6 Comparison of predicted against the actual values (a) Average wind speed
using DNN (b) Average wind speed using ANN (c) Average wind speed using RF (d)
Average wind speed using SVM.
81.15% of the points lie within the ±30% boundary of the regression line, respectively. It can
therefore be concluded that the proposed DNN system exhibits better predictive accuracy when
compared with the other models.
Figure 3.7 Comparison of prediction accuracy of DNN and other models (a)Average
wind speed using DNN (b) Average wind speed using ANN (c) Average wind speed using
RF (d) Average wind speed using SVM.
and RF models, the study concludes that DNNs can be employed effec-
tively to forecast daily average wind speed.
3.4 Conclusion
The proposed framework investigates the effects of various model parame-
ters and selects an architecture with low RMSE and MAE value as the most
appropriate architecture for predicting average wind speed. The data col-
lected from Stanford weather station has been utilized for the experiments.
The optimal DNN architecture with hidden neurons (25, 20, 10, and 5) in
the hidden layers from 1 to 4 is employed for the study. Performance com-
parison of the proposed DNN model is performed with the benchmark
models, such as SVM, ANN, and RF models. The developed DNN model
outperforms the other prediction models with a minimum RMSE value of
2.8207, MAE of 2.0781, and a maximum R2 value of 67%. Analysis of the
experimental outcomes illustrates the effectiveness of the proposed DNN
system for predicting wind speed. Optimization algorithms to enhance
further the performance of the predictive model will be deliberated in
future works. Other deep learning–based wind speed forecasting models
using RNNs will also be considered as future directions.
References
1. Wang, X., Guo, P., Huang, X., A review of wind power forecasting models.
Energy Proc., 12, 770–778, 2011, https://doi.org/10.1016/j.egypro.2011.10.103.
2. Mi, X.W., Liu, H., Li, Y.F., Wind speed forecasting method using wave-
let, extreme learning machine and outlier correction algorithm. Energy
Convers. Manage., 151, 709–722, 2017, https://doi.org/10.1016/j.enconman.
2017.09.034.
3. He, J. and Xu, J., Ultra-short-term wind speed forecasting based on support
vector machine with combined kernel function and similar data. EURASIP
J. Wirel. Commun. Netw., 2019, 1, 248, 2019, https://doi.org/10.1186/
s13638-019-1559-1.
4. Yang, M., Liu, L., Cui, Y., Su, X., Ultra-Short-Term Multistep Prediction of
Wind Power Based on Representative Unit Method. Math. Probl. Eng., 2018,
1936565, 11, 2018, https://doi.org/10.1155/2018/1936565.
5. Finamore, A.R., Galdi, V., Calderaro, V., Piccolo, A., Conio, G., Grasso, S.,
Artificial neural network application in wind forecasting: An one-hour-ahead
wind speed prediction, 5th IET International Conference on Renewable Power
Generation (RPG) 2016, 1–6, 2016, https://doi.org/10.1049/cp.2016.0545.
6. Yu, C., Li, Y., Xiang, H., Zhang, M., Data mining-assisted short-term
wind speed forecasting by wavelet packet decomposition and Elman neu-
ral network. J. Wind Eng. Ind. Aerodyn., 175, 136–143, 2018, https://doi.
org/10.1016/j.jweia.2018.01.020.
7. Filik, Ü. B. and Filik, T., Wind speed prediction using artificial neural net-
works based on multiple local measurements in Eskisehir. Energy Proc., 107,
264–269, 2017, https://doi.org/10.1016/j.egypro.2016.12.147.
8. Tarade, R.S. and Katti, P.K., A comparative analysis for wind speed predic-
tion, in: 2011 International Conference on Energy, Automation and Signal,
2011, December, IEEE, pp. 1–6, https://doi.org/10.1109/iceas.2011.6147167.
9. Liu, H., Tian, H.Q., Pan, D.F., Li, Y.F., Forecasting models for wind speed using
wavelet, wavelet packet, time series and Artificial Neural Networks. Appl.
Energy, 107, 191–208, 2013, https://doi.org/10.1016/j.apenergy.2013.02.002.
10. Ramasamy, P., Chandel, S.S., Yadav, A.K., Wind speed prediction in the moun-
tainous region of India using an artificial neural network model. Renewable
Energy, 80, 338–347, 2015, https://doi.org/10.1016/j.renene.2015.02.034.
11. Ghorbani, M.A., Khatibi, R., FazeliFard, M.H., Naghipour, L., Makarynskyy,
O., Short-term wind speed predictions with machine learning techniques.
Meteorol. Atmos. Phys., 128, 1, 57–72, 2016, https://doi.org/10.1007/
s00703-015-0398-9.
12. Khandelwal, I., Adhikari, R., Verma, G., Time series forecasting using hybrid
ARIMA and ANN models based on DWT decomposition. Proc. Comput.
Sci., 48, 1, 173–179, 2015, https://doi.org/10.1016/j.procs.2015.04.167.
13. Yousefi, M., Hooshyar, D., Yousefi, M., Khaksar, W., Sahari, K.S.M., Alnaimi,
F.B., II, An artificial neural network hybrid with wavelet transform for short-
term wind speed forecasting: A preliminary case study, in: 2015 International
Conference on Science in Information Technology (ICSITech), 2015, October,
IEEE, pp. 95–99, https://doi.org/10.1109/icsitech.2015.7407784.
14. Shao, H., Wei, H., Deng, X., Xing, S., Short-term wind speed forecasting
using wavelet transformation and AdaBoosting neural networks in Yunnan
wind farm. IET Renewable Power Gener., 11, 4, 374–381, 2016, https://doi.
org/10.1049/iet-rpg.2016.0118.
15. Cui, Y., Huang, C., Cui, Y., A novel compound wind speed forecasting model
based on the back propagation neural network optimized by bat algorithm.
Environ. Sci. Pollut. Res., 27, 7, 7353–7365, 2020, https://doi.org/10.1007/
s11356-019-07402-1.
16. Senthil, K.P., Improved prediction of wind speed using machine learning.
EAI Endorsed Trans. Energy Web, 6, 23, 1–7, 2019, https://doi.org/10.4108/
eai.13-7-2018.157033.
17. Hui, H., Rong, J., Songkai, W., Ultra-Short-Term Prediction of Wind Power
Based on Fuzzy Clustering and RBF Neural Network. Adv. Fuzzy Syst., 2018,
9805748, 7, 2018, https://doi.org/10.1155/2018/9805748.
18. Bali, V., Kumar, A., Gangwar, S., Deep learning based wind speed forecast-
ing-A review, in: 2019 9th International Conference on Cloud Computing,
Data Science & Engineering (Confluence), 2019, January, IEEE, pp. 426–431,
https://doi.org/10.1109/confluence.2019.8776923.
19. Cao, Q., Ewing, B.T., Thompson, M.A., Forecasting wind speed with recur-
rent neural networks. Eur. J. Oper. Res., 221, 1, 148–154, 2012, https://doi.
org/10.1016/j.ejor.2012.02.042.
20. Khodayar, M. and Teshnehlab, M., Robust deep neural network for wind
speed prediction, in: 2015 4th Iranian Joint Congress on Fuzzy and Intelligent
Systems (CFIS), 2015, September, IEEE, pp. 1–5, https://doi.org/10.1109/
cfis.2015.7391664.
21. Sergio, A.T. and Ludermir, T.B., Deep learning for wind speed forecasting
in northeastern region of Brazil, in: 2015 Brazilian Conference on Intelligent
Systems (BRACIS), 2015, November, IEEE, pp. 322–327, https://doi.
org/10.1109/bracis.2015.40.
22. Liu, H., Mi, X., Li, Y., Smart deep learning based wind speed prediction model
using wavelet packet decomposition, convolutional neural network and con-
volutional long short term memory network. Energy Convers. Manage., 166,
120–131, 2018, https://doi.org/10.1016/j.enconman.2018.04.021.
23. Hu, Q., Zhang, R., Zhou, Y., Transfer learning for short-term wind speed
prediction with deep neural networks. Renewable Energy, 85, 83–95, 2016,
https://doi.org/10.1016/j.renene.2015.06.034.
24. Nielsen, M.A., Neural networks and deep learning, vol. 2018, Determination
press, San Francisco, CA, 2015.
25. Zeiler, M.D., Adadelta: an adaptive learning rate method. arXiv:1212.5701.
https://arxiv.org/abs/1212.5701, 2012.
4
Res-SE-Net: Boosting Performance of
ResNets by Enhancing Bridge Connections
Varshaneya V.*, S. Balasubramanian† and Darshan Gera‡
Abstract
One of the ways to train deep neural networks effectively is to use residual connec-
tions. Residual connections can be classified as being either identity connections
or bridge connections with a reshaping convolution. Empirical observations on
CIFAR-10 and CIFAR-100 datasets using a baseline ResNet model, with bridge
connections removed, have shown a significant reduction in accuracy. This reduc-
tion is due to lack of contribution, in the form of feature maps, by the bridge
connections. Hence, bridge connections are vital for ResNet. However, all feature
maps in the bridge connections are equally important. In this work, an upgraded
architecture “Res-SE-Net” is proposed to further strengthen the contribution from
the bridge connections by quantifying the importance of each feature map and
weighting them accordingly using Squeeze-and-Excitation (SE) block. It is demon-
strated that Res-SE-Net generalizes much better than ResNet and SE-ResNet on
the benchmark CIFAR-10 and CIFAR-100 datasets.
Keywords: Deep residual learning, weighting activations, bridge connections,
ResNet, Squeeze-and-Excitation Net
4.1 Introduction
Deep neural networks are increasingly being used in a number of com-
puter vision tasks. One great disadvantage of training a very deep network
not addition. The authors argue that such connections would help in fea-
ture reuse and thereby an unhindered information flow. In Ref. [6], the
weight layers in the ResNet block are randomly dropped thereby only
keeping skip-connections active in these layers. This gives rise to an ensem-
ble of ResNets similar to dropout [12]. Dropping weight layers depend
on “survival probability”. This idea outperforms the baseline ResNet.
Another important architecture that won the 2017 ILSVRC1 competition is SE-ResNet [5]. This
winning architecture has at its base a ResNet with
an SE block introduced between the layers of ResNet. This block quantifies
the importance of feature maps instead of considering all of them equally
likely. This has resulted in a significant level of improvement in perfor-
mance of the ResNet.
Though ResNet has been studied in detail, to the best of our knowledge
there has not been any work focusing on bridge connections in ResNet.
In this work, the effectiveness of bridge connections is investigated and
a novel architecture namely “Res-SE-Net” is proposed. This architecture
consists of an SE block in the bridge connection to weigh the importance
of feature maps. The proposed architecture demonstrates a superior per-
formance on CIFAR-10 and CIFAR-100 benchmark datasets over baseline
ResNet and SE-ResNet.
4.3 Preliminaries
4.3.1 ResNet
The idea behind ResNets [3] is to make a shallow architecture deeper
by adding identity mapping from a previous layer to the current layer
and then applying a suitable non-linear activation. Addition of skip-
connections facilitates larger gradient flow to earlier layers thereby address-
ing the degradation problem as mentioned in [3]. The building block of a
ResNet is depicted in Figure 4.1. Here, x is identity and F(x) is called the
residual mapping.
ResNet comprises of a stack of these blocks. A part of the 34-layer
ResNet is shown in Figure 4.2. The skip-connections that carry activations
within a block are referred to as identity skip-connections and those that
carry from block to another are called as bridge connections.
1 http://image-net.org/challenges/LSVRC/.
[Figure 4.1: the ResNet building block, in which two weight layers compute the residual mapping F(x), the identity x is added via a skip-connection, and ReLU is applied to F(x) + x.]
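A minimal PyTorch sketch of this building block is shown below (not the authors' implementation; the channel count, 3×3 kernels, padding, and batch normalization placement follow the standard ResNet design and are assumptions here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Minimal ResNet basic block: out = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))     # first weight layer + ReLU
        out = self.bn2(self.conv2(out))           # second weight layer
        return F.relu(out + x)                    # add identity, then ReLU
```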
[Figure 4.2: part of the 34-layer ResNet, a stack of 3×3, 64-filter convolution layers with skip-connections.]
[Figure: the Squeeze-and-Excitation block: an H×W×C input X is globally pooled to 1×1×C, passed through FC (C/r), ReLU, FC (C), and a sigmoid, and the resulting channel weights scale X.]
[Figure: the proposed bridge connection, consisting of a 2D convolution, batch normalization, and an SE block in the downsampling path.]
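A hedged PyTorch sketch of the SE block and of a bridge connection augmented with it (the core idea of Res-SE-Net) is given below; the reduction ratio r, the 1×1 shortcut convolution, and the exact placement of batch normalization are assumptions based on the standard ResNet and SE-Net designs, not the authors' released code.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling -> FC(C/r) -> ReLU -> FC(C) -> sigmoid -> scale."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))           # squeeze: B x C channel descriptors
        return x * w.view(b, c, 1, 1)             # excite: reweight each feature map

class SEBridge(nn.Module):
    """Bridge (downsampling) connection whose feature maps are weighted by an SE block."""
    def __init__(self, in_ch, out_ch, stride=2, r=16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.se = SEBlock(out_ch, r)

    def forward(self, x):
        return self.se(self.bn(self.conv(x)))     # reshaped shortcut, reweighted by SE
```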
4.5 Experiments
4.5.1 Datasets
All experiments are conducted on CIFAR-10 [9] and CIFAR-100 [9] data-
sets. The CIFAR-10 dataset consists of 50,000 training images and 10,000
test images in 10 classes, with 5,000 training images and 1,000 test images
per class. The CIFAR-100 dataset consists of 50,000 training images and
10,000 test images in 100 classes, with 500 training images and 100 test
images per class. There are 20 main classes which contain these classes. The
size of images in both the datasets is 32 × 32 and all of them are RGB images.
2 Image from Andrew Ng's Deep Learning course and modified with an SE block.
3 Adapted from https://github.com/bearpaw/pytorch-classification.
4 Adapted from https://github.com/moskomule/senet.pytorch.
The images are only normalized at the test time. The input to the net-
work is of size 32 × 32. The architecture of the network used, for both
CIFAR-10 and CIFAR-100 datasets, is mentioned in Section 4.4. The
training starts with an initial learning rate of 0.1, and subsequently, it is
divided by 10 at 32,000 and 48,000 iterations. The training is done for a
maximum of 64,000 iterations. Stochastic Gradient Descent (SGD) is used
for updating the weights. The weights in the model are initialized by the
method described in [2], and further, batch normalization [8] is adopted.
The hyperparameters used are enlisted in Table 4.3.
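A minimal sketch of this optimization setup in PyTorch is shown below; the momentum and weight decay values and the stand-in model are assumptions, since Table 4.3 and the full network definition are not reproduced here.

```python
import torch
import torch.nn as nn

# A stand-in model; the actual Res-SE-Net would be used here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)   # assumed values
# Divide the learning rate by 10 at 32k and 48k iterations, stop at 64k.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[32000, 48000], gamma=0.1)

for iteration in range(64000):
    images = torch.randn(128, 3, 32, 32)          # placeholder CIFAR-sized batch
    labels = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()                               # per-iteration LR schedule
```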
4.6 Results
Tables 4.1, 4.4, and 4.5 report the accuracies obtained by baseline ResNets,
baseline SEResNets, and Res-SE-Nets, respectively. As evident from Table
4.1, the best performing ResNet is ResNet-110 with Top-1 accuracies of
93.66% and 73.33% on CIFAR-10 and CIFAR-100, respectively. Similarly, from
Figures 4.5 to 4.7, it can be concluded that training of Res-SE-Net has taken
place smoothly. There is no abrupt increase in training loss of Res-SE-Net
models. This shows that gradient flow has not been hindered by the intro-
duction of an SE block in bridge connections, maintaining the principle of
ResNet (base of our Res-SE-Net) that skip-connections facilitate smooth
training of deep networks.
Figure 4.5 (a) Training losses plotted for depth of 20 layers. (b) Training losses plotted for
depth of 32 layers.
Figure 4.6 (a) Training losses plotted for depth of 44 layers. (b) Training losses plotted for
depth of 56 layers.
[Figure 4.7: training losses for ResNet-110, SE-ResNet-110, and Res-SE-Net-110 on CIFAR-10 and CIFAR-100.]
4.7 Conclusion
In this work, a novel architecture named “Res-SE-Net” is proposed, which
makes bridge connections in ResNets more influential. This is achieved by
incorporating an SE block in every bridge connection. Res-SE-Net sur-
passed the performances of baseline ResNet and SE-ResNets by signifi-
cant margins on CIFAR-10 and CIFAR-100 datasets. Further, it has been
demonstrated that reasonably sized deep networks with positively contrib-
uting bridge connections can outperform very deep networks. It is also
illustrated that addition of an SE block does not affect training. In future,
other ways of making bridge connections in ResNets influential can be
explored to enhance its overall performance.
References
1. Glorot, X. and Bengio, Y., Understanding the difficulty of training deep feed-
forward neural networks, in: Proceedings of the thirteenth international con-
ference on artificial intelligence and statistics, pp. 249–256, 2010.
2. He, K., Zhang, X., Ren, S., Sun, J., Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification, in: Proceedings of the
IEEE international conference on computer vision, pp. 1026–1034, 2015.
3. He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recog-
nition, in: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp. 770–778, 2016.
4. He, K., Zhang, X., Ren, S., Sun, J., Identity mappings in deep residual net-
works, in: European conference on computer vision, Springer, pp. 630–645,
2016.
5. Hu, J., Shen, L., Sun, G., Squeeze-and-excitation networks, in: Proceedings
of the IEEE conference on computer vision and pattern recognition, pp. 7132–
7141, 2018.
6. Huang, G., Sun, Y., Liu, Z., Sedra, D., Weinberger, K.Q., Deep networks
with stochastic depth, in: European conference on computer vision, Springer,
pp. 646–661, 2016.
7. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., Densely connected
convolutional networks, in: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp. 4700–4708, 2017.
8. Ioffe, S. and Szegedy, C., Batch normalization: Accelerating deep network
training by reducing internal covariate shift. CoRR, International conference
on machine learning, 2015.
9. Krizhevsky, A., Hinton, G. et al., Learning multiple layers of features from tiny
images. Technical report, Citeseer, 2009.
10. Nair, V. and Hinton, G.E., Rectified linear units improve restricted boltz-
mann machines, in: ICML, 2010.
11. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen,
T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E.,
DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai,
J., Chintala, S., Pytorch: An imperative style, high-performance deep learn-
ing library, in: Advances in Neural Information Processing Systems, vol. 32, H.
Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, R. Garnett
(Eds.), pp. 8024–8035, Curran Associates, Inc., NeurIPS, 2019, URL http://
papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-perfor-
mance-deep-learning-library.pdf.
12. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.,
Dropout: A simple way to prevent neural networks from overfitting. J. Mach.
Learn. Res., 15, 1, 1929–1958, 2014.
13. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., Going deeper with convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
5
Hitting the Success Notes of Deep Learning
Abstract
Over the past decade, Machine Learning (ML) has advanced at varying pace. Today, much of this progress appears under a newer term that has been creating a buzz in the market: Deep Learning (DL). The concept of DL is essentially derived from human neurological traits. Humans are masters of thinking, analyzing, expressing, acting, and, above all, improvising, and these activities are controlled by a single neurological unit, the brain. Our brain operates through chemical and biological computation carried out by neurons. Motivated by how neurons function, a distinctive concept, formally known as the Artificial Neural Network (ANN), emerged within the research community. It is the art of inducing an artificial brain in machines, making them more reliable, robust, independent, efficient, and self-adaptive. It is a well-established field that merits a comprehensive review from several perspectives. We have therefore worked through the subject, consolidated the findings, and turned them into this chapter. It offers an organized view of ANN through the lens of DL. Further, we try to interpret the fine line between ML and DL and to understand its variants, current trends, emerging challenges, and future scope.
5.1 Genesis
Suppose you have a tremendous amount of data but do not know how to put it to practical use or draw inferences from it. One approach is to deploy conventional statistical models built around averages, variance, regression, or skewness. The problem is that the results then depend heavily on averages and ignore qualitative aspects such as health, intelligence, and so on. Another approach is to use the most influential technology taking hold today, Machine Learning (ML) [1–5]. The term itself carries everything needed to understand the concept behind it: primarily, ML is the science of making machines work with minimal or no human assistance. In a broader and deeper sense, programs are developed and introduced into systems in such a way that they can learn by themselves through access to data and revise the way they interact with end users.
Numerous ML algorithms have been proposed to process data and seek hidden insights without being given explicit instructions. We frequently come across ML without even realizing it; in fact, the extent of its active use is far greater than one might have expected. The wide range of applications, explored in almost all domains and encompassing autonomous vehicles, virtual assistants, infotainment, medical imaging, object tracking, computer vision (CV), and much more, is evidence of its acceptance among practitioners. According to an article in the Gartner 2019 report [6], ML has been a top priority in organizations for the past several years. The adoption by leading technology companies seeking business expansion has tripled over the past decade. From Google's effort to design driverless cars to interacting with Apple's Siri via voice commands to Amazon's on-site product recommendations, ML is touted as a key pillar. The area is evolving so fast that business leaders have to change the way they deal with market challenges; if they do not adapt their business models swiftly, they might fall behind or cease to exist by 2023, as Microsoft notes in its report [7].
The rise of applications over the past decade correlates with the turbulent growth in processing power and with increasingly intricate data sets that are harder to interpret. The objective of such applications is often to learn a model without excessive explicit instruction and to generalize to unlabelled data through acquired intelligence. The intelligence implied here is what we call self-learning, and it truly derives from the Artificial Neural Network (ANN) [8–10], a gamut of ML algorithms in which the model relies heavily on feature engineering. It is analogous to the biological human brain, where neurons are the imperative element. They are
[Figure: an artificial neuron — inputs y1, y2, …, yn (outputs from the previous layer) are multiplied by weights w1j, …, wnj, summed with a bias θj, and passed through an activation function f to produce the target output.]
generic term that refers to making systems learn self-adaptation. ANN, on the other hand, stems from ML and frames algorithms that simulate the human brain. It processes data through multiple layers, as mentioned above. The enormous amount of data normally shared through ubiquitous applications, much of it in unstructured format, is difficult to grasp, but through DL it becomes far less challenging to unravel this wealth of data. In a few words, if ML builds a model around several parameters and produces results, those results can be served further to start building an ANN model.
[Figures: a layered neural network with an input layer, hidden layers, and an output layer; and a CNN pipeline taking input data through feature extraction over input and output feature maps to a classifier that produces the output.]
process is given in Figure 5.5. It consumes less processing time, involving a reduced number of observations along with biased reusability.
[Figure: a recurrent network — an input vector feeds hidden states that share parameters across time steps to produce the output.]
like time series. It applies the same task to each element of the sequence, with the output heavily dependent on past computations. RNNs are capable of constructing language models [21], and text generation may rely on them, as may Natural Language Processing (NLP) and Language Translation Systems (LTSs).
[Figure: an unrolled recurrent network mapping an input sequence x1, …, x6 to outputs y1, …, y6.]
[Figure: a worked example of convolving a binary input matrix with a 3×3 filter and applying max pooling to the resulting feature map.]
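The operations in that worked example can be sketched in a few lines of NumPy; the input matrix below is random and only the 3×3 filter is taken from the figure, so this is an illustration of valid convolution followed by 2×2 max pooling rather than a reproduction of the exact numbers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D convolution (no padding, stride 1): slide the kernel and sum products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.randint(0, 2, (6, 6))                 # illustrative binary input
kernel = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])    # the 3x3 filter shown in the figure
feature_map = conv2d_valid(image, kernel)               # 4x4 feature map
pooled = max_pool(feature_map)                          # 2x2 pooled output
```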
Table 5.2 Some CAD references driven by deep learning and medical imaging.
Reference | Algorithm(s) used | Performance (in %) | Description
[32] | CNN, K-Nearest Neighbor, Support Vector Machine | Accuracy = 98 to 99 | MRI is used to investigate whether the patient is vulnerable to Alzheimer's disease or not.
[33] | CNN | Accuracy = 97.5 | Brain tumor detection by MRI scans
[34] | Deep Neural Network (DNN), Principal Component Analysis, Discrete Wavelet Transform | Accuracy = 96.97, Precision = 97 | Classification of brain tumors using MRI
[35] | CNN | Area Under Curve (AUC) = 87 to 94 | Chest radiographs are exploited for pathology detection
[36] | DNN, Linear Discriminate Analysis | Sensitivity = 96.2, Specificity = 94.2, Accuracy = 94.56 | Lung cancer diagnosis using CT scans
[37] | Heterogeneous Modified ANN | Accuracy = 97.5 | Kidney disease diagnosis by utilizing
[44] | Deep Learning, Bayesian function | Accuracy = 86.7 | Classification of CT images into COVID-19 and other irrelevant groups
[45] | Deep Learning | Category-wise results are mentioned, e.g., Lung cancer: Sensitivity = 62.5, Prostate: Sensitivity = 87, Melanoma = 60 | The Bone Scan Index (BSI) is embedded to evaluate different tumors (e.g., prostate cancer, breast cancer, and lung cancer)
[46] | CNN, Long short-term memory | Accuracy = 98.5 | Exploitation of ECG signals to classify cardiovascular disease
[47] | Deep Learning | Accuracy = 99.5 | 3D stereophotogrammetry and deep learning are combined to study craniosynostosis condition in infants
[48] | Deep Learning | Accuracy = 72.5 | To detect appendicitis, a 3D model
5.8 Conclusion
Since the various cornerstones of DS are deeply engaged with each other, it often takes a long discussion to review them individually at the ground level. Through this article, however, we are able to underline the stark differences among those cornerstones, paying special attention to DL and ANN. While ANN takes advantage of computational units, viz., neurons, to control the flow of information from input units to output units through connections, DL looks after feature capturing and transformation, ultimately building a relationship between a set of stimuli and the associated neural results. Depending upon model requirements and objectives, a number of NN architectures are available. Moreover, the most prevalent class of DL is the ConvNet, which drives CV tasks like image pre-processing, object analysis, object detection, and recognition. We provide a generic view of the CNN multi-layer architecture and the respective functions of its layers, together with the slight variations in layers proposed by different groups and authors. CNN architectures are highly useful because the time consumed in training tends to be largely eliminated, as these architectures offer pre-trained networks.
Thereafter, medical applications are discussed, disclosing how efficacious DL is in solving queries in the medical domain. From Alzheimer's to zoonotic diseases, and from brain tumor classification to chronic kidney disease diagnosis, the DL–image processing duo emerges as a silver lining for medical informatics, bioinformatics, and health data analytics. But we should not celebrate its growth too much, for there are obstacles that could compromise its flourishing. We have therefore quoted the major challenges often faced in NN-based learning, specifically in the medical domain. We should not consider only one perspective in seeking solutions, because there has always been room for innovation. We can, however, keep our eyes out for more research in the future that is nurtured through DL.
References
1. Mohan, S. et al., Effective heart disease prediction using hybrid machine
learning techniques. Special Section on Smart Caching, Communications,
Computing and Cybersecurity for Information-Centric. Internet Things, 7,
81542–81554, 2019.
2. Stephenson, N. et al., Survey of machine learning techniques in drug discov-
ery. Curr. Drug Metab., 9, 20, 185–193, 2019.
3. Hamedan, F. et al., Clinical decision support system to predict chronic kidney
disease: A fuzzy expert system approach. Int. J. Med. Inf., Elsevier, 138, 1–9, 2020.
4. Bhatti, G., Machine learning based localization in large-scale wireless sensor
networks. Sensors, 18, 4179, 2018.
5. Mosavi, A. and Varkonyi-Koczy, A.R., Integration of machine learning and
optimization for robot learning. Adv. Intell. Syst. Comput., 519, 349–355, 2017.
6. Smarter with Gartner, Gartner top 10 strategic technology trends for 2019.
Homepage https://www.gartner.com/smarterwithgartner.
7. Microsoft Annual report 2019. Homepage www.microsoft.com.
8. Arimura, H., Tokunaga, C., Yamashita, Y., Kuwazuru, J., Magnetic resonance
image analysis for brain CAD systems with machine learning, in: Machine
learning. Computer-aided diagnosis: Medical imaging intelligence and analysis, K.
Suzuki (Ed.), pp. 258–296, The University of Chicago, IGI global, USA, 2012.01.
9. El-Dahshan, E.A., Hosny, T., Salem, A.B.M., Hybrid intelligent techniques for
MRI brain images classification. Digital Signal Process., 20, 2, 433–441, 2010.
10. Ortiza, A., Gorriz, J.M., Ramırez, Salas-Gonzalez, D., Improving MRI seg-
mentation with probabilistic GHSOM and multi-objective optimization.
Neurocomputing, 114, 118–131, 2013.
11. Junwei, H. et al., Advanced Deep-Learning Techniques for Salient and Category-
Specific Object Detection. IEEE Signal Process. Mag., 35.1, 84–100, 2018.
12. Ajeet, R.P., Manjusha, P., Siddharth, R., Application of deep learning for
object detection. Proc. Comput. Sci., 132, 1706–1717, 2018.
13. Liu, et al., Deep learning for generic object detection: A survey. Int. J. Comput.
Vision, 128, 261–318, 2019.
14. Mohammad, H.H., Wenjing, J., Xiangjian, H., Paul, K., Deep learning tech-
niques for medical image segmentation: Achievements and Challenges.
J. Digit. Imaging, 32, 582–596, 2019.
15. Hinton, G.E., A practical guide to training restricted boltzmann machines,
in: Neural Networks: Tricks of the Trade, pp. 599–619, Springer, Berlin,
Heidelberg, 2012.
16. Bengio, Y. et al., Learning deep architectures for AI. Trends® Mach. Learn., 2,
1, 1–127, 2009.
17. Liou, C.Y., Cheng, W.C., Liou, J.W., Liou, D.R., Autoencoder for words.
Neurocomputing, 139, 84–96, 2014.
18. Rehman, S.U. et al., Unsupervised pre-trained filter learning approach for
efficient convolution neural network. Neurocomputing, 365, 171–190, 2019.
19. Zhang, et al., A new JPEG image steganalysis technique combining rich
model features and convolutional neural networks. Math. Biosci. Eng., 16, 5,
4069–4081, 2019.
20. Guan, Y. and Ploetz, T., Ensembles of deep LSTM learners for activity rec-
ognition using wearables. Proceedings of the ACM on Interactive, Mobile,
Wearable and Ubiquitous Technologies, 1.2, 1–28.
21. Majumder, N., Poria, S., Gelbukh, A., Cambria, E., Deep learning-based doc-
ument modeling for personality detection from text. IEEE Intell. Syst., 32, 2,
74–79, 2017.
22. Baldi, P. and Pollastri, G., The Principled Design of Large-Scale Recursive
Neural Networks Architectures-DAG-RNNs and the Protein Structure
Prediction Problem. J. Mach. Learn. Res., 4, 575–602, 2003.
23. Ponti, M. A., et al. Everything you wanted to know about deep learning for
computer vision but were afraid to ask. 2017 30th SIBGRAPI conference on
graphics, patterns and images tutorials (SIBGRAPI-T). IEEE, 2017.
24. IEEE Computational Intelligence Society, https://cis.ieee.org.
25. Krizhevsky, A., Sutskever, I., Hinton, G.E., Imagenet classification with deep
convolutional neural networks. Adv. Neural Inf. Process. Syst., 25, 1097–1105,
2012.
26. Dong, Y. and Bryan, A., Evaluations of deep convolutional neural net-
works for automatic identification of malaria infected cells, in: IEEE EMBS
International Conference on Biomedical & health informatics (BHI), pp. 101–
104, 2017.
27. Simonyan, K. and Zisserman, A., Very deep convolutional networks
for large-scale image recognition. International Conference on Learning
Representations, 2015, pp. 1–14, arXiv preprint arXiv:1409.1556, 2014.
28. He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recog-
nition, in: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp. 770–778, 2016.
29. Santosh, K.C., AI-driven tools for coronavirus outbreak: Need of active
learning and cross-population train/test models on multitudinal/multi-
modal data. J. Med. Syst., 44, 1–5, 2020.
30. Leligdowicz, A., et al. Ebola virus disease and critical illness. Critical Care,
20.1, 1–14, 2016.
31. Xu, M. et al., Identification of small-molecule inhibitors of Zika virus infec-
tion and induced neural cell death via a drug repurposing screen. Nat. Med.,
22, 1101–1107, 2016.
32. Khagi, B., Kwon, G.R., Lama, R., Comparative analysis of Alzheimer’s dis-
ease classification by CDR level using CNN, feature selection, and machine-
learning techniques. Int. J. Imaging Syst. Technol., 29, 1–14, 2019.
33. Seetha, J. and Raja, S.S., Brain tumor classification using convolutional neu-
ral networks. Biomed. Pharmacol. J., 11, 3, 1457–1461, 2018.
34. Mohsen, H. et al., Classification using deep learning neural networks for
brain tumors. Future Comput. Inf. J., 3, 68–71, 2018.
35. Bar, Y. et al., Chest pathology detection using deep learning with non-
medical training. 12th international symposium on biomedical imaging (ISBI),
pp. 294–297, IEEE, 2015.
36. L.S.K. et al., Optimal deep learning model for classification of lung cancer on
CT images. Future Gener. Comput. Syst., 92, 1–31, 2018.
37. Ma, et al., Detection and diagnosis of chronic kidney disease using deep
learning-based heterogeneous modified artificial neural network. Future
Gener. Comput. Syst., 111, 17–26, 2020.
38. Chang, J. et al., A mix-pooling CNN architecture with FCRF for brain tumor
segmentation. J. Vis. Commun. Image R, 58, 1–23, 2018.
39. Fong, S.J. et al., Composite Monte Carlo decision making under high uncer-
tainty of novel coronavirus epidemic using hybridized deep learning and
fuzzy rule induction. Appl. Soft Comput. J., 93, 1–27, 2020.
40. Apostolopoulos, I.D. and Mpesiana, T.A., Covid19: automatic detection from
Xray images utilizing transfer learning with convolutional neural networks.
Phys. Eng. Sci. Med., 43, 1–6, 2020.
41. Ucar, F. and Korkmaz, D., COVIDiagnosis-Net: Deep Bayes-SqueezeNet
based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray
images. Med. Hypotheses, 140, 1–12, 2020.
42. Ozturk, T. et al., Automated detection of COVID-19 cases using deep neural
networks with X-ray images. Comput. Biol. Med., 121, 1–11, 2020.
43. Beck, B.R. et al., Predicting commercially available antiviral drugs that may
act on the novel coronavirus (SARS-CoV-2) through a drug-target interac-
tion deep learning model. Comput. Struct. Biotechnol. J., 18, 784–790, 2020.
44. Xu, X. et al., A Deep Learning System to Screen Novel Coronavirus Disease
2019 Pneumonia. Engineering, 6, 1–11, 2020.
45. Wuestemann, J. et al., Analysis of Bone Scans in Various Tumor Entities Using
a Deep-Learning-Based Artificial Neural Network Algorithm—Evaluation
of Diagnostic Performance. Cancers, 12, 1–13, 2020.
46. Lih, O.S. et al., Comprehensive electrocardiographic diagnosis based on
deep learning. Artif. Intell. Med., 108, 1–8, 2020.
47. Jong, G.D. et al., Combining deep learning with 3D stereophotogrammetry
for craniosynostosis diagnosis. Sci. Rep., 10, 1–6, 2020.
48. Rajpurkar, P. et al., AppendiXNet: Deep Learning for Diagnosis of
Appendicitis from A Small Dataset of CT Exams Using Video Pretraining.
Sci. Rep., 10, 1–7, 2020.
49. Al-antari, M.A. and Kim, T.S., Evaluation of Deep Learning Detection and
Classification towards Computer-aided Diagnosis of Breast Lesions in Digital
X-ray Mammograms. Comput. Methods Programs Biomed., 196, 1–38, 2020.
50. Lee, J.H. et al., Application of deep learning to the diagnosis of cervical
lymph node metastasis from thyroid cancer with CT: External validation and
clinical utility for resident training. Eur. Radiol., 29, 1–7, 2020.
6
Two-Stage Credit Scoring Model Based on Evolutionary Feature Selection and Ensemble Neural Networks
Diwakar Tripathi1*, Damodar Reddy Edla2, Annushree Bablani2 and Venkatanareshbabu Kuppili2
1Thapar Institute of Engineering & Technology Patiala, Punjab, India
2National Institute of Technology Goa, Ponda, India
Abstract
Credit scoring is a process for estimating the risk accompanying a credit product. Nowadays, a huge amount of credit application data with different feature sets is available for various credit products, as credit industries issue many such products. The key challenge is therefore to discover, ahead of time and using data mining techniques, how to improve performance, as this directly affects the viability of the industry. The foremost emphasis of this article is to improve the analytical performance of the credit scoring model. Toward that end, this article introduces a two-stage credit scoring model. The first stage focuses on the reduction of irrelevant and noisy features, because they may affect the model's performance. In the second stage, an ensemble of neural networks is utilized for categorizing credit applicants. For feature selection, an evolutionary approach utilizing the "Hybrid Binary Particle Swarm Optimization and Gravitational Search Algorithm (BPSOGSA)" is proposed, and the correlation coefficient is considered as the criterion function to compute the fitness value of each search agent generated by the BPSOGSA algorithm. Further, the proposed credit scoring model is validated on four different real-world credit scoring datasets of credit applicants.
6.1 Introduction
Credit scoring is a risk approximation approach associated with credit products such as credit cards and loans, estimated on the basis of applicants' historical data [1]. As specified by Thomas et al. [2], "Credit scoring is a set of decision models and their underlying techniques that aid credit lenders in the granting of credit" [3]. Credit scoring attempts to isolate the effect of diverse applicant characteristics on unlawful conduct and non-payment. The essential focus of credit scoring is to decide whether an applicant belongs to the financially sound group or not. Credit denotes the amount acquired by an applicant from a monetary institution, where "the credit limit of an applicant is estimated by the system on the basis of the customer's credentials such as annual income, property, etc." Numerous advantages of credit scoring for monetary institutions include "ascertaining and diminishing credit risk", "making managerial decisions", and "cash flow improvement" [4, 5], and its performance is responsible for the profitability of credit industries. It is not a single-step process; more often than not, monetary institutions carry it out in several steps, some of which are as follows [4]:
6.1.1 Motivation
According to statistics [7] from the Reserve Bank of India (RBI), the numbers of credit card users during 2012–2016 are shown in Figure 6.1.
• The number of credit card holders was 24.50, 21.11, and 19.18 million in the financial years 2015–2016, 2014–2015, and 2013–2014, respectively, in India [7].
• The total number of home loan accounts in the financial year 2009–2010 was 57.38 lakhs in India [8].
[Figure 6.1 Credit card holders in India, in millions, during 2012–2016: 17.65, 19.53, 19.18, 21.11, and 24.5.]
Together with credit cards, various credit products such as different types of loans, mortgages, and mini and micro finance are also obtainable from various monetary institutions. Owing to the excessive number of new applicants and existing clients, credit scoring is quite difficult to do manually, or it necessitates a massive number of experts with domain knowledge and insight into customer behavior. At present, "credit scoring is not restricted to banking or credit industries only; many other domains such as telecommunication, real estate, etc., are also applying credit score calculation models for investigation of clients' conduct". Consequently, "artificial intelligence may overcome the problem of manual credit scoring". Improving the predictive performance of the model, especially for applicants in the mistrusted group, will have a great influence for monetary institutions [9, 10]. This article emphasizes boosting the classification performance of the model by dropping extraneous and noisy features.
The remainder of the article is organized as follows: Section 6.2 gives a brief literature analysis. Section 6.3 presents the proposed credit scoring model. Section 6.4 demonstrates the experimental outcomes of the proposed approach, followed by concluding remarks based on the obtained outcomes.
systems help experts to upgrade their insight for credit risk estimation. In this context, various Machine Learning (ML) methods are employed to model risk estimation systems. Classification approaches such as the "Artificial Neural Network (ANN)" and the "Support Vector Machine (SVM)" have been efficiently applied to credit risk prediction models.
SVM [11] has the "superior features of generalization" and "global optimization", so many researchers have employed it as a classification tool, not only in credit scoring but in other domains as well. Li et al. [12] have pre-
sented a credit risk assessment model by considering SVM as classifica-
tion approach with data pre-processing to distinguish probable applicants
for consumer mortgages. Gestel et al. [13] have employed “Least Squares
Support Vector Machine (LS-SVM)” to evaluate the creditworthiness of
probable corporate clients. West [14] has presented a survey of various
classification methods and evaluated the performances of classifiers on
credit scoring datasets. Xiao and Fei [15] have offered a method based on
“SVM with optimal parameters’ values”. Kuppili et al. [16] have applied
Extreme Learning Machine for credit scoring data analysis with aggrega-
tion of spiking neuron model. Similarly, another approach by considering
“weighted SVM” for credit scoring model is presented by Zhou et al. [17].
An amalgamation of various approaches is a dynamic arrangement that combines the steps of several approaches toward the enhancement of a specific outcome. A credit scoring dataset has various features related to applicants' credentials such as societal and financial status, and these features are heterogeneous: categorical, narrow-range numerical, and wide-range numerical. There is therefore a chance that some of the features are redundant or irrelevant, which may reduce the model's performance. Accordingly, several researchers have considered data pre-processing steps such as "feature selection" and "outlier detection" as prominent ways to enhance the predictive performance of credit scoring, and descriptions of those methods are presented as follows.
Hybrid credit scoring models with aggregation of SVM as classification
approach and feature selection approaches based on “Multivariate Adaptive
Regression Splines (MARS)” and “F-score” are assessed in article [18].
Similarly, feature selection approaches “Stepwise Regression (SR)” and
“MARS” aggregated with ANN as classification approaches for credit scor-
ing are applied in article [19] and [20], respectively. A hybrid model based
on “Genetic Algorithm (GA)” for opting the valuable features and further
aggregated with classification approach is presented by Oreski and Oreski
[21]. Similarly, Wang et al. [22] and Huang and Dun [23] have applied
“Rough Set and Tabu search” and “Binary Particle Swarm Optimization
(BPSO)” for optimizing the set valuable features and further aggregated
with SVM for classification, respectively.
From the literature, it is observed that feature selection is a prominent way to improve the predictive performance of classification approaches and is a combinatorial optimization problem. Bio-inspired algorithms are suitable for optimization on continuous search spaces, so for feature optimization a conversion from continuous to binary search space is needed first. An ensemble combines the outputs of base learners to estimate the final conclusion, and many studies have demonstrated that it is more stable and accurate than single models. With the aforementioned advantages of feature selection and ensembling in mind, this article presents a "Two-stage Credit Scoring Model based on Evolutionary Feature Selection and Ensemble Neural Networks", with the objective of extracting knowledge from credit scoring datasets using ML algorithms and improving performance by considering only the valuable features.
[Figure: the proposed two-stage credit scoring framework — the pre-processed dataset passes through feature subset generation, the dataset with selected features is fed to the base classifiers, and an aggregator combines their outputs into the final outcome.]
V1 → S\left(V_{ij}^{k}\right) = \left|\operatorname{erf}\left(\frac{\sqrt{\pi}}{2} V_{ij}^{k}(t)\right)\right| \quad (6.1)

V2 → S\left(V_{ij}^{k}\right) = \left|\tanh\left(V_{ij}^{k}(t)\right)\right| \quad (6.2)

V3 → S\left(V_{ij}^{k}\right) = \left|\frac{2}{\pi}\arctan\left(\frac{\pi}{2} V_{ij}^{k}(t)\right)\right| \quad (6.3)

V4 → S\left(V_{ij}^{k}\right) = \left|\frac{V_{ij}^{k}(t)}{\sqrt{1+\left(V_{ij}^{k}(t)\right)^{2}}}\right| \quad (6.4)
Corr = \frac{n \sum P Q - \sum P \sum Q}{\sqrt{n \sum P^{2} - \left(\sum P\right)^{2}}\ \sqrt{n \sum Q^{2} - \left(\sum Q\right)^{2}}} \quad (6.6)
where P and Q indicate the features, and n designates the number of values
in these features.
Inter_{ccc} = \frac{1}{n} \sum_{i=1}^{n} Corr(F_{i}, Y) \quad (6.7)

Intra_{ccc} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} Corr(F_{i}, F_{j}) \quad (6.8)
Here, F_i and Y denote the ith feature and the target label, respectively.
fitness\text{-}value_{i} = \frac{Inter_{ccc}}{Intra_{ccc}} \quad (6.9)

where fitness-value_i represents the fitness value of the ith search agent.
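The following sketch evaluates Equations (6.6) to (6.9) for a candidate feature subset with NumPy; the function name, the random data, and the guard for single-feature subsets are assumptions made for illustration.

```python
import numpy as np

def subset_fitness(features, target):
    """Fitness of a candidate feature subset (Equations 6.6-6.9): average feature-target
    correlation (Inter_ccc) divided by average pairwise feature correlation (Intra_ccc).
    `features` is a (samples, n_selected) array and `target` a 1-D label vector."""
    n = features.shape[1]
    inter = np.mean([np.corrcoef(features[:, i], target)[0, 1] for i in range(n)])
    pairwise = [np.corrcoef(features[:, i], features[:, j])[0, 1]
                for i in range(n) for j in range(i + 1, n)]
    intra = np.mean(pairwise) if pairwise else 1.0   # guard for a single-feature subset (assumption)
    return inter / intra

# Illustrative call on random data standing in for a credit scoring dataset.
X = np.random.rand(200, 6)
y = np.random.randint(0, 2, 200)
fitness_value = subset_fitness(X, y)
```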
Accuracy\ (ACC) = \frac{TP + TN}{TP + TN + FP + FN} \quad (6.10)
where "TP", "FP", "TN", and "FN" indicate "True Positive", "False Positive", "True Negative", and "False Negative", respectively, and Specificity (Spe) = \frac{TN}{TN + FP}, Sensitivity (Sen) = \frac{TP}{TP + FN}.
processes are applied. For data cleaning, data samples with missing values are eliminated. For data transformation, the distinct nominal values of an attribute are replaced by unique integer numbers, because neural network-based classifiers require data samples as vectors of real numbers. Each numerical feature value is rescaled into a predefined interval. To prevent features with wide-ranging numeric values from overshadowing features with narrow-ranging numeric values, features are rescaled into a range using a discretization procedure based on the Boolean Reasoning Algorithm [39].
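A rough pandas/scikit-learn sketch of this pre-processing is given below; the column names are hypothetical, and KBinsDiscretizer merely stands in for the Boolean Reasoning Algorithm [39], which scikit-learn does not provide.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, KBinsDiscretizer

def preprocess(df, nominal_cols, numeric_cols):
    """Sketch of the pre-processing above: drop samples with missing values, map nominal
    values to integers, rescale numeric features, and discretize them (KBinsDiscretizer
    stands in for the Boolean Reasoning Algorithm [39])."""
    df = df.dropna().copy()                       # data cleaning
    for col in nominal_cols:                      # nominal -> unique integer codes
        df[col] = pd.factorize(df[col])[0]
    df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
    disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
    df[numeric_cols] = disc.fit_transform(df[numeric_cols])
    return df

# Hypothetical credit applicant data for illustration.
data = pd.DataFrame({"purpose": ["car", "house", "car", "education", "house", "car"],
                     "income": [35000, 72000, 41000, 28000, 55000, 61000],
                     "label": [1, 0, 1, 0, 1, 0]})
clean = preprocess(data, nominal_cols=["purpose"], numeric_cols=["income"])
```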
As per the proposed two-stage model, in the first stage the pre-processed datasets are fed to feature selection. As mentioned in the earlier section, a transfer function is required to convert the search space from continuous to binary, because feature selection is a binary optimization problem, where 0 and 1 represent that the corresponding feature is present or absent, respectively, in that search agent. From the literature, it is clear that V-shaped functions are better than S-shaped functions.
In the literature, various V-shaped functions are available, and it is not clear which one performs well for feature selection. So, in this paper, four V-shaped transfer functions, namely, V1, V2, V3, and V4, are applied for converting the continuous search space to a binary one. The features selected by the proposed approach are compared with those of the state-of-the-art feature selection approach "Correlation-based Feature Selection (CFS) [40]". Further, the datasets with selected features are forwarded to the next stage for classification, and the outputs predicted by MLFN, RBFN, and TDNN are aggregated by the WV approach. The WV approach requires weights for the base classifiers. For calculating these weights, classifier accuracy is used as the parameter, and initially equal weights are allocated to all classifiers. The weights are then updated by Equation (6.12) based on each classifier's performance throughout the iterations [41]. The same procedure is continued for n iterations, and the weights obtained at the last iteration are assigned to the respective classifiers.
W_{i}^{u} = W_{i}^{o}\,\frac{1}{2}\log\!\left(\frac{Acc_{i}}{1 - Acc_{i}}\right) \quad (6.12)

where W_i^o and W_i^u indicate the old and updated weights of the ith classifier at the nth iteration, and Acc_i indicates the accuracy of the ith classifier.
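A brief sketch of the weight update of Equation (6.12) is shown below; the three base-classifier accuracies and the number of iterations are placeholder values, not results from this chapter.

```python
import numpy as np

def update_weights(old_weights, accuracies):
    """Weighted-voting weight update of Equation (6.12)."""
    w = np.asarray(old_weights, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    return w * 0.5 * np.log(acc / (1.0 - acc))

# Illustrative run: equal initial weights for MLFN, TDNN, and RBFN, updated over a few
# iterations from placeholder validation accuracies.
weights = np.ones(3) / 3.0
for _ in range(5):
    accuracies = np.array([0.78, 0.74, 0.76])   # assumed accuracies, not the chapter's results
    weights = update_weights(weights, accuracies)
```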
[Figures: classification accuracies (%) of MLFN, TDNN, RBFN, MV, WV, and UV on the four credit scoring datasets, with all features, CFS, and the V1–V4 transfer functions.]
the classification performance of RBFN, MLFN, and TDNN on all the aforementioned credit scoring datasets. From the results in Tables 6.2 to 6.5, it is also observed that the V3 transfer function performs better in most cases and improves classification performance compared with using all features. WV has the best performance compared with MV and UV, and it significantly improves classification performance over the base classifiers used to construct the ensemble framework. The final conclusion is that BPSOGSA-based feature selection with the V3 transfer function and WV yields the best performance on all four credit scoring datasets.
6.5 Conclusion
In this article, a two-stage credit scoring model has been offered, as this is a prominent research challenge for credit industries and directly affects their viability. In the first stage, we have applied the proposed evolutionary approach utilizing BPSOGSA, and in the second stage, an ensemble of neural networks, namely, MLFN, TDNN, and RBFN, is assembled by the WV approach. Further, the proposed credit scoring model is validated on four diverse datasets such as credit card and loan applicants'
References
1. Mester, L.J., What’s the point of credit scoring? Bus. Rev., 3, Sep/Oct, 3–16,
1997.
2. Thomas, L., Crook, J., Edelman, D., Credit scoring and its applications. Soc.
Ind. Appl. Math., 2017. https://catalog.princeton.edu/catalog/10425155
3. Louzada, F., Ara, A., Fernandes, G.B., Classification methods applied to
credit scoring: Systematic review and overall comparison. Surv. Oper. Res.
Manage. Sci., 21, 2, 117–134, 2016.
4. Paleologo, G., Elisseeff, A., Antonini, G., Subagging for credit scoring mod-
els. Eur. J. Oper. Res., 201, 2, 490–499, 2010.
5. Tripathi, D., Edla, D.R., Cheruku, R., Kuppili, V., A novel hybrid credit scor-
ing model based on ensemble feature selection and multilayer ensemble clas-
sification. Comput. Intell., 35, 2, 371–394, 2019.
6. Svozil, D., Kvasnicka, V., Pospichal, J., Introduction to multi-layer feed-
forward neural networks. Chemometr. Intell. Lab. Syst., 39, 1, 43–62, 1997.
7. Saha, M., Credit cards issued. http://www.thehindu.com/business/Industry/
Credit-cards-issued-touch-24.5-million/article14378386.ece (2017 (accessed
October 1, 2019)).
8. Fulwari, A., Issues of housing finance in Urban India A symptomatic study
(Doctoral dissertation), Maharaja Sayajirao University of Baroda, 2013.
http://shodhganga.inflibnet.ac.in/handle/10603/28192
9. Wang, G., Ma, J., Huang, L., Xu, K., Two credit scoring models based on dual
strategy ensemble trees. Knowledge-Based Syst., 26, 61–68, 2012.
10. Tripathi, D., Edla, D.R., Cheruku, R., Hybrid credit scoring model using
neighborhood rough set and multi-layer ensemble classification. J. Intell.
Fuzzy Syst., 34, 3, 1543–1549, 2018.
11. Vapnik, V., The nature of statistical learning theory, Springer science & busi-
ness media, 2013.
12. Li, S.T., Shiue, W., Huang, M.H., The evaluation of consumer loans using
support vector machines. Expert Syst. Appl., 30, 4, 772–782, 2006.
13. Van Gestel, T., Baesens, B., Suykens, J.A., Van den Poel, D., Baestaens, D.E.,
Willekens, M., Bayesian kernel-based classification for financial distress
detection. Eur. J. Oper. Res., 172, 3, 979–1003, 2006.
14. West, D., Neural network credit scoring models. Comput. Oper. Res., 27,
11–12, 1131–1152, 2000.
15. Wen-bing, X., II and Qi, F.E., II, A Study of Personal Credit Scoring Models
on Support Vector Machine with Optimal Choice of Kernel Function
Parameters [J]. Syst. Eng.-Theory Pract., 10, 73–79, 2006.
16. Kuppili, V., Tripathi, D., Reddy Edla, D., Credit score classification using
spiking extreme learning machine. Comput. Intell., 36, 2, 402–426, 2020.
17. Zhou, L., Lai, K.K., Yen, J., Credit scoring models with AUC maximization
based on weighted SVM. Int. J. Inf. Technol. Decis. Mak., 8, 04, 677–696,
2009.
18. Chen, W., Ma, C., Ma, L., Mining the customer credit using hybrid support
vector machine technique. Expert Syst. Appl., 36, 4, 7611–7616, 2009.
19. Wongchinsri, P. and Kuratach, W., SR-based binary classification in credit
scoring, in: 2017 14th International Conference on Electrical Engineering/
Electronics, Computer, Telecommunications and Information Technology
(ECTI-CON), IEEE, pp. 385–388, 2017, June.
20. Lee, T.S. and Chen, I.F., A two-stage hybrid credit scoring model using arti-
ficial neural networks and multivariate adaptive regression splines. Expert
Syst. Appl., 28, 4, 743–752, 2005.
21. Oreski, S. and Oreski, G., Genetic algorithm-based heuristic for feature
selection in credit risk assessment. Expert Syst. Appl., 41, 4, 2052–2064, 2014.
22. Wang, J., Guo, K., Wang, S., Rough set and Tabu search based feature selec-
tion for credit scoring. Proc. Comput. Sci., 1, 1, 2425–2432, 2010.
23. Huang, C.L. and Dun, J.F., A distributed PSO–SVM hybrid system with fea-
ture selection and parameter optimization. Appl. Soft Comput., 8, 4, 1381–
1391, 2008.
24. Mirjalili, S., Wang, G.G., Coelho, L.D.S., Binary optimization using hybrid
particle swarm optimization and gravitational search algorithm. Neural
Comput. Appl., 25, 6, 1423–1435, 2014.
25. Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S., BGSA: binary gravitational
search algorithm. Natural Comput., 9, 3, 727–745, 2010.
26. Mirjalili, S. and Lewis, A., S-shaped versus V-shaped transfer functions for
binary particle swarm optimization. Swarm Evol. Comput., 9, 1–14, 2013.
27. Mirjalili, S., Mirjalili, S.M., Yang, X.S., Binary bat algorithm. Neural Comput.
Appl., 25, 3–4, 663–681, 2014.
28. Cohen, J., Cohen, P., West, S.G., Aiken, L.S., Applied multiple regression/
correlation analysis for the behavioral sciences, Routledge, 2013.
29. Bluman, A.G., Elementary statistics: A step by step approach, McGraw-Hill
Higher Education, New York, NY, 2009.
30. Tripathi, D., Edla, D.R., Kuppili, V., Bablani, A., Dharavath, R., Credit scor-
ing model based on weighted voting and cluster-based feature selection.
Proc. Comput. Sci., 132, 22–31, 2018.
31. Tsai, C.F., Lin, Y.C., Yen, D.C., Chen, Y.M., Predicting stock returns by classi-
fier ensembles. Appl. Soft Comput., 11, 2, 2452–2459, 2011.
7
Feature Agglomeration for Video Summarization
Abstract
Video summarization is the process of automatically extracting relevant frames or
segments from a video that can best represent the contents of the video. In the pro-
posed framework, a modified block-based clustering technique is implemented
for video summarization. The clustering technique employed is feature agglomer-
ation clustering which results in dimensionality reduction and makes the system
an optimized one. The sampled frames from the video are divided into varying
number of blocks and clustering is employed on corresponding block sets of all
frames rather than clustering frames as a whole. Additionally, image compression
based on Discrete Cosine Transform is applied on the individual frames. Results
prove that the proposed framework can produce optimum results by varying the
block sizes in a computationally efficient manner for videos of different duration.
Moreover, the division of frames into blocks before applying clustering ensures
that maximum information is retained in the summary.
7.1 Introduction
Video summarization plays a crucial role in effective information exploita-
tion. The growing amount of video data being created and uploaded on
the Internet has resulted in an exploding amount of digital information.
Availability and ease of access to all sorts of recording devices has led to
an increased pace at which videos are generated. The amount of video
information being shared and processed on a day-to-day basis has substantially increased, making the demand for video processing techniques a growing concern. Video summarization, a highly relevant
branch of video processing techniques, aims at automatically generating an
abstract view of the video so the user can go through the main events of the
video in less time. Video summaries aim at providing a gist of the original
video in a short duration of time.
Video summaries are typically of two types which are static and
dynamic. Static summaries are generated by combining stationary key
frames, whereas dynamic summaries are formed by combining key seg-
ments from the original video. Strategies adopted to identify and extract
the key frames or segments from videos vary among different categories
of videos. Similarly, the type of output summary suitable for different cat-
egories of videos also varies accordingly. The most effective and computationally efficient technique for video summarization is to employ clustering. Clustering is an unsupervised machine learning technique
used for naturally grouping information based on similarity or dissim-
ilarity metrics that differ according to the clustering method. The most
common forms of clustering techniques utilized in video summarization
are k-means, DBSCAN (density-based spatial clustering of applications
with noise), hierarchical, and nearest neighbor algorithm. The distance
metric or similarity computation is the factor that changes with each clus-
tering method. The clustering algorithm adopted for summarizing videos
vary significantly among applications. The importance of clustering algo-
rithms in video summarization is well emphasized in the literature [1–3].
The various types of clustering methods used along with its effectiveness
in different video summarization applications can also be examined. The
effectiveness of various clustering techniques and how it varies according
to each category of videos can also be observed in the existing works which
further substantiates the relevance of conventional clustering approaches
in all categories of video summarization [4].
For video summarization to be successful, it is essential to capture the
details of the individual frames to a minute level so that information loss
color features from individual frames are extracted and k-means cluster-
ing is performed. The cluster centers are chosen as the final key frame
from each cluster. The disadvantage of k-means clustering is that num-
ber of clusters has to be determined beforehand. Kumar et al. [6] implement another framework using k-means clustering, applied to the video in a two-stage process. In the first stage, the video
as a whole is clustered into key frame groups, whereas in the second stage,
frames are divided into equal size partitions and then clustering is applied.
Mundur et al. [7] use Delaunay triangulation clustering for summariza-
tion as opposed to the traditional clustering approaches. Delaunay trian-
gulation leads to clusters of different size based on the content significance
and the advantage is that number of clusters need not be set in advance.
A video summarization framework with spectral clustering is described
by Cirne and Pedrini [8]. Spectral clustering utilizes spectrum (Eigen val-
ues) for clustering. Spectral approaches to clustering have shown better performance across domains. Mishra and Singh [9] perform event sum-
marization in videos by finding an optimal clustering algorithm. Frames
closer to the cluster heads are chosen as key frames. A video summariza-
tion framework where summary is obtained by employing graph theoretic
divisive clustering and by constructing minimum spanning tree to select
key frames rather than segmenting to shots is implemented by Guimaraes
and Gomes [10]. Sachan [11] proposes a framework where framesets are formed by grouping a small number of similar contiguous frames, and then agglomerative clustering is applied to group similar framesets. The importance of a cluster is obtained from the number of frames in it. Gharbi
et al. [12] implement a framework where candidate frames for summa-
rization are selected using a windowing rule. As a next stage, SURF [13]
features from frames in candidate set are selected. A repeatability table
is constructed which is viewed as a video similarity graph and clustering
is performed on graph modularity. The framework implemented by Wu
et al. [14] bases video representation on high density peak search cluster-
ing algorithm [15]. Majumdar et al. [16] uses Expectation-Maximization
clustering for video summarization. Here, shot boundaries are detected
by deploying a modified color layout descriptor [17]. A framework for
video summarization in surveillance domain is designed using Density-
based Spatial Clustering of Applications with Noise (DBSCAN) clustering
technique and background subtraction by Zhao et al. [18]. HSV color fea-
tures are utilized for the same. Background subtraction is performed after
clustering for identifying the key frames. A detailed analysis of various
clustering techniques used in video summarization currently is surveyed
by John et al. [19].
Fei et al. [20] put forth a key frame extraction model combining sparse
frame selection and agglomerative hierarchical clustering. Sparse frame
selection is applied initially to obtain the first set of key frames. The criteria
for clustering are content loss minimization and representativeness rank-
ing. The clustering algorithm incorporated is also known as an improved
mutual information-based agglomerative hierarchical clustering. Sony
et al. [21] builds a video summarization framework based on clustering
similar frames using Euclidean distance. The length of the summary is
governed by user defined criteria. A fraction of frames with a compara-
tively larger Euclidean distance from each cluster is extracted to form
the summary sequence. Iterative node segmentation algorithm is used
for summarizing the video. Lv and Huang [22] extract key frames from
personal videos using nearest neighbor clustering algorithm. Clustering
is performed on frames based on the nearest neighbor algorithm which
has the additional advantage of not setting the number of clusters before-
hand. The clustering is not perfect but compared to other unsupervised
clustering methods it performs better. Majumdar et al. [23] put forth a
novel clustering algorithm named CURE (clustering using representative-
ness) hierarchical clustering. Cure clustering algorithm is applied for video
summarization in healthcare application and big data and the extracted
summaries are used for disease analysis and prediction. Chamasemani
et al. [24] develop a video summarization framework using density-based
clustering algorithm for surveillance videos. Global features such as color, texture, and energy, as well as local SURF features, are extracted for applying the clustering algorithm. The combined approach has resulted in a more
informative summary compared to the state of the art. Kuanar et al. [25]
propose a multi-view video summarization framework preferably for sur-
veillance videos based on graph theory approach. Features are extracted on
the basis of bag of words as well as color, texture, and shape before apply-
ing Gaussian entropy-based filtering. Clustering is performed by optimum
path forest algorithm. The results prove that the summaries demonstrate
the effectiveness of the semantic feature set extracted and the clustering
employed. On a similar note, several works have emphasized the signifi-
cance of hybrid clustering approaches. Srinivas and Mohan [26] put forth
such a model where two clustering models based on incremental and hier-
archical principles are combined. A novel cohesion metric is introduced
for computing the inter-cluster distance. The significance of similarity
measures for content-based image and video analysis is also described by
Srinivas and Mohan [27] where efficient medical image classification is
performed on the basis of a combined feature set with three different types
of similarity measures.
7.4.1 Pre-Processing
Video pre-processing is an essential step in every video processing appli-
cation. The pre-processing stage in the proposed model consists of down
sampling the frames, converting the extracted frames to compressed
domain and identifying the shot boundaries.
Down sampling and frame compression. A video comprises a set of frames which, when put together, give the user a notion of continuity and motion. Evidently, for the majority of the time there is not much difference between consecutive frames. Hence, not all frames need to be processed and analyzed for their possibility of being a key frame. Instead, the video is typically down sampled so as to reduce the chances of processing redundant frames, thereby making the model computationally efficient. The strategy for down sampling varies across applications. In the proposed model, once the input video is divided into individual frames, the frames are down sampled by retaining one frame per frame-rate interval of the video (roughly one frame per second), and compression is performed on the down sampled frames. The compression
technique applied is DCT. The DCT transforms a frame from the spatial
domain to frequency domain as in Equations (7.1) and (7.2).
F(u,v) = \frac{2}{\sqrt{MN}}\, C(u)C(v) \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\cos\!\left[\frac{(2x+1)u\pi}{2M}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right] \quad (7.1)

f(x,y) = \frac{2}{\sqrt{MN}} \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} C(u)C(v)\,F(u,v)\cos\!\left[\frac{(2x+1)u\pi}{2M}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right] \quad (7.2)

C(u), C(v) = \begin{cases}\frac{1}{\sqrt{2}} & \text{for } u, v = 0\\ 1 & \text{for } u, v > 0\end{cases}

x, y and u, v are coordinates in the spatial and frequency domains, respectively. M and N are the dimensions of the frame on which the DCT is performed. Equation (7.1) calculates the (u,v)th entry of the transformed image from the pixel values of the frame matrix.
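A compact sketch of this compression step using OpenCV's DCT routines is shown below; the keep-fraction of low-frequency coefficients and the random stand-in frame are assumptions, as the chapter does not state how many coefficients are retained.

```python
import numpy as np
import cv2

def dct_compress(frame_gray, keep=0.25):
    """Compress a grayscale frame with the 2-D DCT (Equation 7.1), zero out the
    high-frequency coefficients, and reconstruct with the inverse DCT (Equation 7.2)."""
    coeffs = cv2.dct(np.float32(frame_gray))
    h, w = coeffs.shape
    mask = np.zeros_like(coeffs)
    mask[: int(h * keep), : int(w * keep)] = 1.0   # retain only low-frequency coefficients
    return cv2.idct(coeffs * mask)

frame = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in for a down-sampled frame
reconstructed = dct_compress(frame)
```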
Shot boundary detection. Shot boundary detection is a crucial step for
majority of the video summarization frameworks. However, there are cer-
tain categories of videos where it is not an essential requirement. A shot boundary is detected when there is a major change in the background and foreground of consecutive frames. Shot boundaries can be detected using
various techniques like histogram comparisons and clustering. In the pro-
posed model, shot boundaries are detected by comparing the histograms
of the consecutive down sampled frames.
Once, the DCT of the down sampled frames are obtained, color histo-
grams of the individual frames are computed. The distance between con-
secutive histograms is calculated using histogram comparison methods.
The distance measure used is the Bhattacharyya distance from Equation (7.3).

d(H_{1},H_{2}) = \sqrt{1 - \frac{1}{\sqrt{\bar{H}_{1}\bar{H}_{2}N^{2}}}\sum_{I}\sqrt{H_{1}(I)\cdot H_{2}(I)}} \quad (7.3)
[Figure 7.2: histogram differences between consecutive frames plotted against frame number, with the detection threshold marked.]
where N is the number of bins in the histogram and H1 and H2 are the
histograms compared. The frames that have distance values higher than
the threshold are identified as shot boundaries and the number of shot
boundaries is retrieved. Figure 7.2 shows the detection of shot boundaries
using histogram comparisons for a sample video named “Jumps.mp4”. The
red line represents the threshold, and the frame numbers corresponding
to the peak points above the threshold are the frames where the histogram
differences are maximum. This represents a scene or shot change since it
can be deciphered that there is an overall change in the color distribu-
tion between the consecutive frames at that point. These frames can be
identified as the shot boundaries. The threshold value for detecting a shot
boundary is computed as the sum of mean and standard deviation of the
histogram differences. The X axis shows the frame numbers and Y axis
shows the histogram difference values for consecutive frames of the video.
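The shot boundary step can be sketched with OpenCV's histogram utilities as follows; the bin count and the random stand-in frames are assumptions, while the Bhattacharyya comparison and the mean-plus-standard-deviation threshold follow the description above.

```python
import numpy as np
import cv2

def shot_boundaries(frames, bins=8):
    """Detect shot boundaries by comparing colour histograms of consecutive frames with
    the Bhattacharyya distance (Equation 7.3); the threshold is the mean plus the
    standard deviation of the distances."""
    hists = []
    for f in frames:
        h = cv2.calcHist([f], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
        hists.append(cv2.normalize(h, h).flatten())
    dists = [cv2.compareHist(hists[i], hists[i + 1], cv2.HISTCMP_BHATTACHARYYA)
             for i in range(len(hists) - 1)]
    threshold = np.mean(dists) + np.std(dists)
    return [i + 1 for i, d in enumerate(dists) if d > threshold]

# Random frames standing in for the down-sampled video frames.
frames = [np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8) for _ in range(20)]
boundaries = shot_boundaries(frames)
```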
d(C_{i}\cup C_{j}, C_{k}) = \frac{n_{i}+n_{k}}{n_{i}+n_{j}+n_{k}}\, d(C_{i},C_{k}) + \frac{n_{j}+n_{k}}{n_{i}+n_{j}+n_{k}}\, d(C_{j},C_{k}) - \frac{n_{k}}{n_{i}+n_{j}+n_{k}}\, d(C_{i},C_{j}) \quad (7.4)

d(x, y) = \sqrt{\sum_{i=1}^{n}(x_{i}-y_{i})^{2}} \quad (7.5)
where x_{i,j} is the representative frame of the jth cluster in the ith block set, with i = 1 to m*m and j = 1 to k_i. The candidate frame chosen from each cluster is the frame with the maximum index in that cluster. The sets of candidate frames formed from the clusters will have overlapping frames; hence, it is necessary to develop a strategy with which the key frames can be extracted from these sets.
Key frame extraction. The next phase is to extract the key frames
from the representative sets of candidate frames from each cluster. Since
the number of cluster sets is m*m, it can be clearly stated that if a frame
appears in more than half of the count of sets, the corresponding frame
can be identified as a key frame. Equation (7.8) shows the representation
of key frames.
Keyframes = \left\{ x_{i,j} : \text{frequency of } x_{i,j} \geq \frac{m \ast m}{2} \right\} \quad (7.8)
SSIM(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})} \quad (7.9)

where \mu_x and \mu_y are the respective averages of x and y, \sigma_x^2 and \sigma_y^2 are the respective variances of x and y, \sigma_{xy} is the covariance, and c_1 and c_2 are constants.
The SSIM between frames in the key frame set is computed. The SSIM value ranges between −1 and 1, where 1 indicates perfect similarity. If the SSIM value is approximately 1, only one of the two frames is kept as a key frame. The SSIM threshold is set to 0.5; frames with a value greater than this are treated as similar and the redundant one is discarded. The key frames extracted by the proposed framework before and after post-processing for a sample video "v11.flv" from the cartoons category, with duration less than 1 minute, are depicted in Figures 7.3a and b, respectively. It is evident from Figure 7.3b that the redundant frames have been eliminated after the post-processing stage.
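The post-processing step can be sketched with scikit-image's SSIM implementation; comparing each frame only against the last retained one is a simplification, and the random stand-in frames are placeholders.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def remove_redundant(key_frames, threshold=0.5):
    """Drop near-duplicate key frames: a frame is kept only if its SSIM (Equation 7.9)
    with the previously kept frame does not exceed the threshold."""
    kept = [key_frames[0]]
    for frame in key_frames[1:]:
        if ssim(kept[-1], frame, data_range=255) <= threshold:
            kept.append(frame)
    return kept

# Random grayscale frames standing in for the extracted key-frame set.
key_frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(6)]
summary_frames = remove_redundant(key_frames)
```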
Figure 7.3 Key frames extracted by the proposed framework: (a) before post-processing
and (b) after post-processing.
Summary generation. The final key frame set is combined to form the
final static summary. The summary generated by the proposed system and
VSUMM [5] framework for a sample video “v100.avi” from news category
with duration between 1 and 3 minutes is shown in Figures 7.4a and b,
respectively. It is obvious from the comparison that the proposed frame-
work could retrieve majority of the relevant frames, but the total number
of key frames identified by the system is comparatively high. This results in
a drop in precision of the system but it significantly improves the quality of
the summary in terms of enjoyability and continuity. A step-by-step algo-
rithm of the work flow of the proposed model is detailed in Algorithms 7.1
and 7.2.
[Figure 7.4 Summary generated for the sample video "v100.avi" by (a) the proposed framework and (b) VSUMM.]
VSUMM database. The evaluation criteria and the results obtained for the proposed framework are detailed in the following sections.
Recall is the fraction of relevant instances that have been retrieved over
the total amount of relevant instances and is evaluated from Equation
(7.12).
F-score is the harmonic mean of precision and recall; it trades off the drawbacks of both and is evaluated from Equation (7.13).

F\text{-}score = \frac{2 \ast Precision \ast Recall}{Precision + Recall} \quad (7.13)
It is noteworthy that for all the above metrics, a higher value denotes
ideal performance except for CT where a lower value is desirable.
Qualitative parameters. The various qualitative parameters used for per-
formance evaluation are informativeness, enjoyability, continuity, diversity,
coverage, and overall score which are defined below.
Informativeness. Informativeness is the measure of how well the gener-
ated summary provides information about the original video.
Enjoyability. Enjoyability measures the extent to which the generated
summary covers the frames that provide entertainment particularly in
movie genre.
Coverage. Coverage evaluates how well the generated summary encom-
passes the important events in the original video.
Continuity. Continuity is defined as the extent to which consistency, or an unbroken flow between scenes, is maintained while combining the key frames.
Diversity. Diversity is the measure which evaluates the extent to which
the summary includes variety of events or scenes and ensures minimal
redundancy in frames.
Overall score. Overall score is the combined score of all the above sub-
jective parameters.
Similar to quantitative metrics, a higher value in all the qualitative met-
rics is desirable for an ideal video summarization framework.
7.5.3 Evaluation
Quantitative analysis. Quantitative analysis based on varying block size, CT, CR, precision, recall, and f-score is detailed in the following sections.
Varying block size. The clustering performed in the proposed model
is a block-based approach where frames are divided into blocks and cor-
responding blocks from consecutive frames in the shots are clustered.
Experiments are conducted by varying the block size to identify the per-
fect block size in order to achieve the optimum results. Table 7.1 shows the
key frame count (KF#), CT, and CR achieved for a sample set of videos by
varying the block sizes in the range 1 to 6 where TF is the total frame count
of the video. Sample videos represented by S1, S2, S3, S4, and S5 are v11.flv, v100.avi, Jumps.mp4, v103.avi, and greatwebofwater.mpg, respectively.

Table 7.1 Comparison based on varying block size (excerpt).

Sl. no.  Sample video  Duration  TF#    Block size (m)  m*m  KF#  CR   Average CT (s)
                                        5               25   8    0.5  7.2
                                        6               36   8    0.5  7.3
3        S3            38s       950    1               1    10   1.1  2.57
                                        2               4    9    1.0  3.44
                                        3               9    8    0.9  3.86
                                        4               16   6    0.7  4.78
                                        5               25   6    0.5  5.47
                                        6               36   5    0.5  6.47
4        S4            3m55s     5,875  1               1    60   1.0  5.658
                                        2               4    99   1.7  11.04
                                        3               9    87   1.5  14.2
                                        4               16   65   1.2  17
                                        5               25   55   1.1  22
                                        6               36   52   1.0  33
The block sizes for which optimum results are achieved are highlighted in
bold. It can be observed that for videos with duration less than 1 minute,
the optimum block size was reached at 3, after which there was no major change in the summary generated. But for videos with longer duration,
the best summary achieved was for higher block sizes. It can be seen that
for the sample video S5 with duration more than 3 minutes, the optimum
summary was achieved with block sizes 5 and 6. It is evident from these
facts that for videos with even longer duration, increasing the block size will result in better quality summaries, whereas for shorter videos, increasing the block size might only add computational overhead rather than produce better results.
Computation time (CT) and compression ratio (CR). The average CR
achieved and CT taken by the proposed framework for videos of different
duration is plotted in Figures 7.5a and b, respectively. The presence of DCT
compression stage has led to significant reduction in CT. The reduction
in CT can also be attributed to the adoption of a variant of conventional
widely accepted clustering technique for summarization. Even when the duration of the videos is longer, the CT taken has not increased drastically. The results for CR are also promising. The summaries extracted are less than 5 percent of the original video, which is a desirable property of video summaries.
Figure 7.5 Performance of proposed framework based on (a) average compression ratio
and (b) average computation time.
Precision, recall, and f-score. Precision, recall, and f-score are computed
by comparing the generated summaries of the proposed framework with
ground truth summaries for the sample videos in the dataset. The average
results obtained for precision, recall, and f-scores for the proposed system
and existing frameworks are plotted in Figure 7.6. The existing frameworks
with which the performance of the proposed framework is compared are
VSUMM [5], Delaunay Triangulation [7], and VRCVS [14]. The average
f-score obtained for the proposed system is 72% which is relatively high
compared with the previous systems. A higher f-score is always a desirable
factor in the performance of video summarization frameworks. F-score
trades off the drawbacks of using precision and recall alone for information
retrieval systems. However, it can be observed that the average precision
value obtained is not the best. This is due to the fact that number of key
frames retrieved by the proposed framework is comparatively higher than
the compared models. Since precision is the fraction of relevant frames in
the retrieved key frames, when the number of retrieved frames is slightly
higher, it is quite obvious that the precision value tends to be on the lower
side as they are inversely proportional. On the other hand, there is a spike
in the value of recall compared to other models. Since, recall is the fraction
of relevant frames retrieved among total relevant frames, a higher value
indicates that the number of relevant frames retrieved as key frames by the
proposed model is on the higher side. It can be observed that the average
recall value obtained is 88%, which substantiates that the proposed system retrieves most of the relevant frames from the original video.
Figure 7.6 Comparison of proposed system with existing systems based on quantitative
metrics.
7.6 Conclusion
Video summarization aims at providing a gist of the video to the user by
automatically identifying relevant frames or segments from a video. An
ideal video summary should be informative, representative, and at the
same time concise. The proposed system implements a key frame–based
video summarization framework based on feature agglomeration cluster-
ing. The clustering is applied on sets of individual blocks of the sampled
frames. The results prove that the system is computationally efficient due
to the clustering technique used and the compression stage incorporated.
Additionally, it can be seen that when clustering is applied on varying block
sizes, extraction of maximum information from the video is ensured. In
quantitative analysis, the framework achieves a higher f-score than the existing systems. The results can be improved further by adapting the block size to the duration of the video. Qualitative analysis also demonstrates that the system is capable of producing summaries that are diverse, informative, and enjoyable. Extending the system to operate in real time remains to be realized and could make the computation more efficient in terms of space and time. Likewise, employing deep learning techniques, which have shown strong performance across domains, is a prospective direction for future work.
References
1. Ajmal, M., Ashraf, M.H., Shakir, M., Abbas, Y., Shah, F.A., Video summariza-
tion: techniques and classification, in: International Conference on Computer
Vision and Graphics, Springer, Berlin, Heidelberg, pp. 1–13, 2012.
2. Kaur, P. and Kumar, R., Analysis of video summarization techniques. Int. J.
Res. Appl. Sci. Eng. Technol., (IJRASET), 6, 1, 1157–1162, 2018.
3. Moses, T.M. and Balachandran, K., A classified study on semantic anal-
ysis of video summarization, in: 2017 International Conference on
Algorithms, Methodology, Models and Applications in Emerging Technologies
(ICAMMAET), IEEE, pp. 1–6, 2017.
4. Sreeja, M.U. and Kovoor, B.C., Towards genre-specific frameworks for video
summarization: A survey. J. Visual Commun. Image Represent., 62, 340–358,
2019.
5. De Avila, S.E.F., Lopes, A.P.B., da Luz Jr., A., de Albuquerque Araújo, A.,
VSUMM: A mechanism designed to produce static video summaries and a
novel evaluation method. Pattern Recognit. Lett., 32, 1, 56–68, 2011.
6. Kumar, K., Shrimankar, D.D., Singh, N., Equal partition based clustering
approach for event summarization in videos, in: Signal-Image Technology
21. Sony, A., Ajith, K., Thomas, K., Thomas, T., Deepa, P.L., Video summarization
by clustering using Euclidean distance, in: 2011 International Conference on
Signal Processing, Communication, Computing and Networking Technologies,
IEEE, pp. 642–646, 2011.
22. Lv, C. and Huang, Y., Effective Keyframe Extraction from Personal Video by
Using Nearest Neighbor Clustering, in: 2018 11th International Congress on
Image and Signal Processing, BioMedical Engineering and Informatics (CISP-
BMEI), IEEE, pp. 1–4, 2018.
23. Majumdar, J., Udandakar, S., Bai, B.M., Implementation of Cure Clustering
Algorithm for Video Summarization and Healthcare Applications in Big
Data, in: Emerging Research in Computing, Information, Communication and
Applications, pp. 553–564, Springer, Singapore, 2019.
24. Chamasemani, F.F., Affendey, L.S., Mustapha, N., Khalid, F., Video abstrac-
tion using density-based clustering algorithm. Vis. Comput., 34, 10, 1299–
1314, 2018.
25. Kuanar, S.K., Ranga, K.B., Chowdhury, A.S., Multi-view video summariza-
tion using bipartite matching constrained optimum-path forest clustering.
IEEE Trans. Multimedia, 17, 8, 1166–1173, 2015.
26. Srinivas, M. and Mohan, C.K., Efficient clustering approach using incremen-
tal and hierarchical clustering methods. International Joint Conference on
Neural Networks (IJCNN), Barcelona, pp. 1–7, 2010.
27. Srinivas, M. and Mohan, C.K., Medical Image Indexing and Retrieval Using
Multiple Features, in: International Conference on Computational Intelligence
and Information Technology, October 2013.
28. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., Image quality assess-
ment: from error visibility to structural similarity. IEEE Trans. Image Process.,
13, 4, 600–612, 2004.
Part 2
MACHINE LEARNING FOR
HEALTHCARE SYSTEMS
8
Cardiac Arrhythmia Detection and
Classification From ECG Signals
Using XGBoost Classifier
Saroj Kumar Pandey*, Rekh Ram Janghel and Vaibhav Gupta
Abstract
In this paper, we propose an effective electrocardiogram (ECG) signal classifica-
tion method using XGBoost classifier. The ECG signals are passed through four
phases of data acquisition, noise filtering, feature extraction, and classification. In the first phase, the dataset is collected from the MIT-BIH arrhythmia database. In the second phase, noise is removed using a baseline wandering removal filter. In the next phase, 45 descriptors derived from four prominent features that showed good results in previous work, namely, wavelets, higher order statistics (HOS), morphological descriptors, and R-R intervals, are employed. Using these features as input to the XGBoost classifier, the signals are classified into four classes (N, S, V, and F) as per the ANSI-AAMI standards; of all the classifiers, XGBoost shows the best result with an accuracy of 99.43%.
Keywords: Arrhythmia, XGBoost, ECG, wavelet, morphology, R-R interval
8.1 Introduction
Arrhythmia is defined as an abnormality in normal heart rate; it is a gen-
eral indication of cardiovascular diseases which can be severe in most
cases if left unattended. It signifies that the heart is not beating at its normal
rate or it is beating with an irregular pattern [1]. Timely detection of this abnormality is therefore crucial.
Figure 8.1 Workflow diagram describing the approach to ECG classification used in the
paper.
the R-peaks and four different points taken from the beats. Four amplitude values are considered: the maximum from beat [0, 40], the minimum from beat [75, 85], the minimum from beat [95, 105], and the maximum from beat [150, 180].
8.2.4 Classification
Two prominent classifiers, namely, XGBoost and AdaBoost, have been used for the classification of the MIT-BIH Arrhythmia database. The working of the XGBoost and AdaBoost classifiers is described in the following.
\hat{Y}_{out} = \sum_{i=1}^{I} f_i(x_i), \quad f_i \in A    (8.1)
where I denotes the total number of trees, f_i is a function in the functional space A for the ith tree, and A is the set of all possible CARTs. In the training model, each newly trained CART attempts to fit the remaining residual. The objective function D being optimized at the (n+1)th CART is as follows:
D = \sum_{k=1}^{t} l\left(y_k, \hat{y}_k^{(n)}\right) + \sum_{k=1}^{n} \Omega(f_k)    (8.2)
where l(·) is the training loss function, y_k is the actual output value, and \hat{y}_k^{(n)} is the predicted value at time step n. The function Ω(·) is a regularization term described as follows:

\Omega(f) = \gamma L + \frac{1}{2}\lambda \sum_{l=1}^{L} w_l^2    (8.3)
where L denotes the number of leaves and w_l is the score on the lth leaf. When Equation (8.3) is optimized, gradient descent can be applied to any differentiable loss function because a Taylor expansion is used. In addition, the XGBoost model has no explicit need for feature selection, as informative features are selected as split nodes in the trees. In this study, we use Python programming with the scikit-learn library for the XGBoost classifier. The input sample for each heartbeat has a total of 45 features, and the classifier assigns the output to one of four distinct classes. It is configured with a learning rate of one, and the number of boosting stages, i.e., n_estimators, is fixed at 50.
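A minimal sketch of this configuration using the XGBoost scikit-learn wrapper is shown below. The feature and label arrays (and their file names) are assumptions for illustration; only the 45-feature input, the four output classes, the learning rate of one, and the 50 boosting stages follow the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X: (num_beats, 45) feature matrix; y: labels 0..3 for the N, S, V, and F classes.
X = np.load("ecg_features.npy")   # hypothetical file names
y = np.load("ecg_labels.npy")

# Example split; the paper reports results for 50:50 up to 90:10 train-test ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0)

clf = XGBClassifier(n_estimators=50, learning_rate=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```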
F(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \theta_m f_m(x) \right)    (8.4)

where f_m is the mth weak classifier and \theta_m is the corresponding weight.
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (8.5)

Sensitivity = \frac{TP}{TP + FN}    (8.6)

Specificity = \frac{TN}{TN + FP}    (8.7)
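The per-class values reported in Tables 8.2 to 8.6 can be derived one-vs-rest from a multi-class confusion matrix. A small helper of this kind (illustrative, not taken from the original implementation) could look as follows:

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest accuracy, sensitivity, specificity, and precision per class.

    cm: square confusion matrix with true classes as rows and predictions as columns.
    """
    total = cm.sum()
    results = []
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        })
    return results
```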
Table 8.2 XGBoost model performance for heartbeat classification using MIT-
BIH arrhythmia dataset with train-test ratio 60:40.
Evaluation parameters
Heartbeat types Accuracy Sensitivity Precision
Normal (N) 99.66 99.84 99.72
SVEB (S) 99.95 99.61 99.85
VEB (V) 99.93 99.44 99.61
Fusion (F) 99.70 97.40 98.27
Mean 99.81 99.07 99.36
Overall accuracy 99.26
Table 8.3 XGBoost model performance for heartbeat classification using MIT-
BIH arrhythmia dataset with train-test ratio 50:50.
Evaluation parameters
Heartbeat types Accuracy Sensitivity Precision
Normal (N) 99.63 99.83 99.69
SVEB (S) 99.94 99.39 99.85
VEB (V) 99.94 99.44 99.80
Fusion (F) 99.68 97.44 98.04
Mean 99.80 99.025 99.35
Overall accuracy 99.24
classifier, and approximately, 30,208 are used for testing its predictions.
The overall accuracy here is 99.28% and precision is 99.45%.
Table 8.5 shows the results obtained on splitting the database in 80-20
train-test ratio. Approximately, 80,552 records are used for training the clas-
sifier, and approximately, 20,139 are used for testing its predictions. The over-
all accuracy achieved by this technique is 99.33% and precision is 99.55%.
Table 8.6 shows the results obtained on splitting the database in 90-10
ratio. Approximately, 90,261 records are used for training the classifier, and
approximately, 10,070 are used for testing its predictions. The overall accu-
racy achieved here is 99.43%, and precision is 99.60%.
Table 8.4 XGBoost model performance for heartbeat classification using MIT-
BIH arrhythmia dataset with train-test ratio 70:30.
Evaluation parameters
Heartbeat types Accuracy Sensitivity Precision
Normal (N) 99.68 99.86 99.72
SVEB (S) 99.96 99.68 99.80
VEB (V) 99.94 99.44 99.76
Fusion (F) 99.72 97.46 98.52
Mean 99.82 99.11 99.45
Overall accuracy 99.28
Table 8.5 XGBoost model performance for heartbeat classification using MIT-
BIH arrhythmia dataset with train-test ratio 80:20.
Evaluation parameters
Heartbeat types Accuracy Sensitivity Precision
Normal (N) 99.73 99.92 99.74
SVEB (S) 99.97 99.70 99.94
VEB (V) 99.95 99.72 99.58
Fusion (F) 99.72 97.67 98.95
Mean 99.84 99.25 99.55
Overall accuracy 99.33
Table 8.6 XGBoost model performance for heartbeat classification using MIT-
BIH arrhythmia dataset with train-test ratio 90:10.
Evaluation parameters
Heartbeat types Accuracy Sensitivity Precision
Normal (N) 99.77 99.90 99.80
SVEB (S) 99.99 99.88 100
VEB (V) 99.97 99.71 99.86
Fusion (F) 99.79 97.98 98.74
Mean 99.88 99.36 99.60
Overall Accuracy 99.43
accuracy between the XGBoost and AdaBoost classifiers using different train-test ratios on the MIT-BIH arrhythmia database, as shown in Figure 8.2.
Table 8.8 compares the state-of-the-art techniques used in automatic ECG classification with the proposed work. U. Rajendra et al. [19] proposed an 11-layer deep convolutional neural network to detect congestive heart failure, which gave its best results with an accuracy of 98.97%, specificity of 99.01%, and sensitivity of 98.87%. Nahian et al. [20] decomposed the ECG signal through empirical mode decomposition (EMD), combined the higher order intrinsic mode functions (IMFs) to form a modified ECG signal, and then used a one-dimensional convolutional neural network for classification, which gives a maximum accuracy
Figure 8.2 Accuracy comparison of the XGBoost and AdaBoost classifiers for different train-test ratios.
Figure 8.3 Comparison between proposed work and other existing literature.
8.4 Conclusion
This paper proposes the classification of ECG signals to four classes: Normal
(N), Supraventricular Ectopic Beat (SVEB), Ventricular Ectopic Beat
(VEB), and Fusion (F) using XGBoost classifier. The data pre-processing
has been done through baseline wandering removal, and then, features are
extracted using segmentation of the signals.
The experiments are performed on the MIT-BIH database, and the intra-patient scheme is adopted for record selection. The features
taken into consideration for the classification purpose are R-R intervals, wave-
let, HOS, and our own morphological descriptor. The method proposed here
only takes data from a single lead (lead 2) and uses segmentation for feature extraction, which is an advantage compared to other state-of-the-art methods that require more than one lead for the input data and a more sophisticated segmentation step, such as the computation of the P, QRS, and T waves. Introducing
more complexity in the segmentation step generally leads to higher error
probability. The results of the XGBoost classifier are better than those of many state-of-the-art methods; on the 90-10 split of the database, the best results are obtained, with an overall accuracy of 99.43%. For these reasons, the proposed methodology of ECG classification compares favorably with other state-of-the-art methods.
Future work will focus on obtaining better results under the inter-patient database scheme, where the data is split by patient records into two groups: one group of patient records is used for training, and the classifier is tested on the other set of records.
References
1. Luz, E.J.D.S., Nunes, T.M., De Albuquerque, V.H.C., Papa, J.P., Menotti, D.,
ECG arrhythmia classification based on optimum-path forest. Expert Syst.
Appl., 40, 9, 3561–3573, 2013.
2. Singh, S., Pandey, S.K., Pawar, U., Janghel, R.R., Classification of ECG
arrhythmia using recurrent neural networks. Proc. Comput. Sci., 132, 1290–
1297, 2018.
3. Pandey, S.K. and Janghel, R.R., ECG arrhythmia classification using arti-
ficial neural networks, in: Proceedings of 2nd International Conference on
Communication, Computing and Networking, Springer, Singapore, pp. 645–
652, 2019.
4. Thalkar, S. and Upasani, D., Various techniques for removal of power line
interference from ECG signal. Int. J. Sci. Eng. Res., 4, 12, 12–23, 2013.
5. Elhaj, F.A., Salim, N., Harris, A.R., Swee, T.T., Ahmed, T., Arrhythmia rec-
ognition and classification using combined linear and nonlinear features of
ECG signals. Comput. Methods Programs Biomed., 127, 52–63, 2016.
6. Bassiouni, M.M., El-Dahshan, E.S.A., Khalefa, W., Salem, A.M., Intelligent
hybrid approaches for human ECG signals identification. Signal Image Video
Process., 12, 5, 941–949, 2018.
7. Rajagopal, R. and Ranganathan, V., Design of a hybrid model for cardiac
arrhythmia classification based on Daubechies wavelet transform. Adv. Clin.
Exp. Med.: Official Organ Wroclaw Medical University, 27, 6, 727–734, 2018.
8. Mondéjar-Guerra, V., Novo, J., Rouco, J., Penedo, M.G., Ortega, M., Heartbeat
classification fusing temporal and morphological information of ECGs via
ensemble of classifiers. Biomed. Signal Process. Control, 47, 41–48, 2019.
9. Mathews, S.M., Kambhamettu, C., Barner, K.E., A novel application of deep
learning for single-lead ECG classification. Comput. Biol. Med., 99, 53–62,
2018.
10. Sannino, G. and De Pietro, G., A deep learning approach for ECG-based
heartbeat classification for arrhythmia detection. Future Gener. Comput.
Syst., 86, 446–455, 2018.
11. Arlington, V. A. N. S., Testing and reporting performance results of cardiac
rhythm and ST segment measurement algorithms. ANSI-AAMI EC57, 1998.
12. Diker, A., Avci, D., Avci, E., Gedikpinar, M., A new technique for ECG signal
classification genetic algorithm Wavelet Kernel extreme learning machine.
Optik, 180, 46–55, 2019.
13. Moody, G.B. and Mark, R.G., The impact of the MIT-BIH arrhythmia data-
base. IEEE Eng. Med. Biol. Mag., 20, 3, 45–50, 2001.
14. De Chazal, P., Detection of supraventricular and ventricular ectopic beats
using a single lead ECG, in: Conference proceedings:.. Annual International
Conference of the IEEE Engineering in Medicine and Biology Society. IEEE
Engineering in Medicine and Biology Society. Annual Conference, vol. 2013,
p. 45, 2013.
15. Mar, T., Zaunseder, S., Martínez, J.P., Llamedo, M., Poll, R., Optimization of
ECG classification by means of feature selection. IEEE Trans. Biomed. Eng.,
58, 8, 2168–2177, 2011.
16. Luz, E.J.D.S., Schwartz, W.R., Cámara-Chávez, G., Menotti, D., ECG-based
heartbeat classification for arrhythmia detection: A survey. Comput. Methods
Programs Biomed., 127, 144–164, 2016.
17. Osowski, S. and Linh, T.H., ECG beat recognition using fuzzy hybrid neural
network. IEEE Trans. Biomed. Eng., 48, 11, 1265–1271, 2001.
18. Torlay, L., Perrone-Bertolotti, M., Thomas, E., Baciu, M., Machine learning–
XGBoost analysis of language networks to classify patients with epilepsy.
Brain Inform., 4, 3, 159–169, 2017.
19. Li, X., Wang, L., Sung, E., AdaBoost with SVM-based component classifiers.
Eng. Appl. Artif. Intell., 21, 5, 785–795, 2008.
20. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., San
Tan, R., Deep convolutional neural network for the automated diagnosis of
congestive heart failure using ECG signals. Appl. Intell., 49, 1, 16–27, 2019.
21. Hasan, N., II and Bhattacharjee, A., Deep Learning Approach to
Cardiovascular Disease Classification Employing Modified ECG Signal from
Empirical Mode Decomposition. Biomed. Signal Process. Control, 52, 128–
140, 2019.
22. Kachuee, M., Fazeli, S., Sarrafzadeh, M., Ecg heartbeat classification: A
deep transferable representation, in: 2018 IEEE International Conference on
Healthcare Informatics (ICHI), 2018, June, IEEE, pp. 443–444.
23. Yang, Z.M., He, J.Y., Shao, Y.H., Feature selection based on linear twin sup-
port vector machines. Proc. Comput. Sci., 17, 1039–1046, 2013.
24. Shadmand, S. and Mashoufi, B., A new personalized ECG signal classifica-
tion algorithm using block-based neural network and particle swarm opti-
mization. Biomed. Signal Process. Control, 25, 12–23, 2016.
25. Pandey, S.K., Janghel, R.R., Vani, V., Patient Specific Machine Learning
Models for ECG Signal Classification. Proc. Comput. Sci., 167, 2181–2190,
2020.
9
GSA-Based Approach for Gene Selection
from Microarray Gene Expression Data
Pintu Kumar Ram* and Pratyay Kuila†
Abstract
Gene selection plays a vital role in detecting cancers. Owing to the redundancy in microarray data, it is very difficult to extract the optimal features or genes. In this paper, we propose a new model to extract the best feature subset with high accuracy, based on the Gravitational Search Algorithm (GSA) combined with machine learning classifiers. An extensive sim-
ulation is performed to evaluate the performance of the proposed algorithm.
Simulation results are compared with the Particle Swarm Optimization Algorithm
(PSO). The superiority of the proposed algorithm has been observed.
9.1 Introduction
Nowadays, it is estimated that almost 1.8 million new cases of cancer are diagnosed in the USA every year [1]. Normally, cancer is caused by the abnormal growth of cells and damage to the DNA (deoxyribonucleic acid) [2]. It is necessary to detect the abnormal cells at an initial stage to prevent cancer. Microarray technology is a great invention in molecular biology to address this issue. This technology focuses on the identification of cancerous cells from gene expression data and has drawn enormous attention from the
Figure 9.2 Basic information on gene expression: (a) how a gene is expressed, and (b) external effects of gene expression that produce diseases.
Figure 9.3 Foundation of GSA in (a) mass attraction, and (b) flowchart of GSA.
9.4.1 Pre-Processing
In this phase, we explore the redundant data to obtain the non-redundant
feature set. We use the SNR technique to get the best feature set from
SNR = \frac{\mu(c_1) - \mu(c_2)}{\sigma(c_1) + \sigma(c_2)}    (9.1)

Normalize = \frac{g_s^t - \min(g^t)}{\max(g^t) - \min(g^t)}    (9.2)
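A minimal sketch of this preprocessing step for a two-class problem is shown below; the function names and the 0/1 label encoding are illustrative and not taken from the original work.

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio (Equation (9.1)) of each gene for a two-class data set.

    X: (samples, genes) expression matrix; y: binary class labels (0 or 1).
    """
    c1, c2 = X[y == 0], X[y == 1]
    return (c1.mean(axis=0) - c2.mean(axis=0)) / (c1.std(axis=0) + c2.std(axis=0))

def normalize(X):
    """Min-max normalization of each gene to [0, 1] (Equation (9.2))."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def top_n_genes(X, y, n):
    """Indices of the n genes with the largest absolute SNR."""
    return np.argsort(-np.abs(snr_scores(X, y)))[:n]
```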
tion of the ith agent in the dth dimension. An agent with a heavy mass moves slowly in the search space, hence creating an adaptive learning rate in the process of moving toward the optimal solution.
Calculation of Mass: Each agent has its own mass, Mass_i. The mass is calculated based on the fitness value of the agent itself and of the best and worst agents. It is calculated by the following equation:
where fitness_i(t) is the fitness of the ith agent at the tth iteration, best_fitness(t) is the best fitness value, and worst_fitness(t) is the worst fitness value among the population.
Calculation of Gravitational Force: We calculate the force from Ai to Aj
on dth dimension with Equation (9.4)
F_{ij}^d(t) = G(t) \times \frac{mass_i(t) \times mass_j(t)}{R_{ij}(t) + \epsilon} \times \left( a_j^d(t) - a_i^d(t) \right)    (9.4)

The gravitational constant is calculated as G(t) = G_0 \times e^{-\alpha t / T}, where G_0 = 10 and \alpha = 20. The maximum number of iterations is T, and we set \epsilon = 0.5. R_{ij} = \lVert A_i(t), A_j(t) \rVert is the Euclidean distance between two agents at the tth generation. Then, we find the overall force acting on the ith agent in the dth dimension using Equation (9.5):

Force_i^d(t) = \sum_{j=1,\, j \neq i} rand_j \times F_{ij}^d(t)    (9.5)
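A compact sketch of one GSA iteration under the constants stated above (G0 = 10, α = 20, ε = 0.5) is given below. The mass normalization follows the standard GSA formulation of [22], since Equation (9.3) is not reproduced here, and all function and variable names are illustrative.

```python
import numpy as np

def gsa_step(positions, velocities, fitness, t, T, G0=10.0, alpha=20.0, eps=0.5):
    """One iteration of the Gravitational Search Algorithm (fitness is maximized).

    positions, velocities: (num_agents, dims) arrays; fitness: (num_agents,) values.
    """
    best, worst = fitness.max(), fitness.min()
    m = (fitness - worst) / (best - worst + 1e-12)   # individual mass (standard GSA [22])
    M = m / m.sum()                                  # normalized (overall) mass
    G = G0 * np.exp(-alpha * t / T)                  # gravitational constant

    n, d = positions.shape
    force = np.zeros((n, d))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = positions[j] - positions[i]
            R = np.linalg.norm(diff)
            # Randomly weighted pairwise force, Equations (9.4)-(9.5).
            force[i] += np.random.rand() * G * M[i] * M[j] / (R + eps) * diff

    accel = force / (M[:, None] + 1e-12)
    velocities = np.random.rand(n, d) * velocities + accel
    positions = positions + velocities
    return positions, velocities
```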
Figure: Fitness evaluation of an agent, Fitness = (A1 + A2 + A3 + A4 + A5)/5, i.e., the average accuracy over five folds of the data.
Child All: It includes 8,280 genes and 110 samples of two classes (before therapy = 50 and after therapy = 60).
GSE2685 (Gastric Cancer): This data set is a two-class problem with gastric tumor = 22 and normal cell = 8 samples. It contains a total of 4,522 genes.
GSE1577 (Lymphoma and Leukemia): It includes a total of 15,434 genes and 29 samples, where 9 samples of T-LL (T cell lymphoblastic lymphoma) form class 1, 10 samples of T-ALL (T cell acute lymphoblastic leukemia) form class 2, and an additional 10 samples of B-ALL (bone marrow samples of acute lymphoblastic leukemia) form class 3.
After preprocessing the data set, we divide the selected feature set into training and testing data in the ratio of 80:20. Then, we apply the GSA over the training data. The population size is taken as 50, and the proposed algorithm is compared with various classifiers such as SVM and KNN (K-Nearest Neighbor) and with PSO in terms of sensitivity, specificity, accuracy, and f-score. It can be observed from Tables 9.1 to 9.5 that the proposed algorithm (GSA with SVM) provides better sensitivity, specificity, accuracy, and f-score than the PSO with SVM and KNN based approaches.
In Figure 9.6, we depict the fitness value with respect to the number of iterations for all data sets and observe that our method (GSA) performs better than PSO. We also present the bar graph of accuracy with different classifiers in Figure 9.7.
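For concreteness, a sketch of the fitness evaluation of a single agent, using an SVM with 5-fold cross-validation as described in the conclusion, is shown below; the way an agent's continuous position is decoded into a gene subset is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def agent_fitness(agent, X, y, n_genes):
    """Mean 5-fold cross-validated SVM accuracy on the genes selected by an agent.

    agent: continuous position vector; here the n_genes largest entries are
    interpreted as the indices of the selected genes (an illustrative decoding).
    """
    selected = np.argsort(-agent)[:n_genes]
    return cross_val_score(SVC(), X[:, selected], y, cv=5).mean()
```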
Figure 9.6 Fitness vs. iteration for (a) Prostate data, (b) DLBCL data, (c) Gastric cancer
data, (d) Lymphoma and leukemia, and (e) Child all data.
of color. Red indicates a high expression level, i.e., cancerous; green represents a low expression level, i.e., noncancerous; and black represents no expression of the gene. For the Prostate data, we obtained five features (39939_at, 41288_at, 38028_at, 37720_at, and 38634_at), where the percentage of
Figure 9.7 Accuracy vs. Classifier for (a) Prostate data, (b) DLBCL data, (c) Lymphoma
and leukemia data, (d) Gastric cancer data, and (e) Child all data.
Figure 9.8 Heat map for (a) Child all data, (b) DLBCL data, (c) Prostate data, (d) Gastric
cancer data, and (e) Lymphoma and leukemia data.
black color is higher in feature 39939_at, and the remaining four features are greenish black in color. Therefore, it is quite difficult to analyze whether these features are cancerous or non-cancerous. For the DLBCL data, we obtained three features (X62078_at, X02152_at, and X56494_at), where the features X62078_at and X02152_at show a lack of gene expression because they contain a higher percentage of black than of the other colors, and the feature X56494_at contains a higher percentage of green than black or red, so it represents a low expression level of the gene (i.e., a noncancerous or normal feature). Similarly, for the Child All data, we obtained four features (38464_at, 32264_at, 36651_at, and 39994_at), all of which have a higher percentage of green than red or black; they therefore tend toward non-cancerous features. For the Gastric Cancer data and the Lymphoma and Leukemia data sets, we obtained three features (M63154_at, M62628_s_at, and D78132_s_at) and two features (203492_x_at and 203380_x_at), respectively. The percentage of green expression is higher for all the features in the Gastric data, so they represent non-cancerous genes. In the Lymphoma and Leukemia data, the feature 203380_x_at represents a low expression level, but the feature 203492_x_at does not indicate a clear expression level.
9.6 Conclusion
Here, we propose a model based on the GSA for selecting features from large amounts of gene expression data. We use SNR for feature preprocessing and select only a few genes, aiming at minimal feature selection with maximum accuracy. A 5-fold cross-validation technique is used for fitness evaluation. Moreover, we apply machine learning classifiers (SVM and KNN) with 10-fold cross-validation to obtain the reduced features and maximize the accuracy. For comparison, we have implemented PSO.
References
1. Islami, F., Sauer, A.G., Miller, K.D., Siegel, R.L., Fedewa, S.A., Jacobs, E.J.,
McCullough, M.L., Patel, A.V., Ma, J., Soerjomataram, I. et al., Proportion
and number of cancer cases and deaths attributable to potentially modifiable
risk factors in the united states. CA: Cancer J. Clin., 68, 1, 31–54, 2018.
2. Ruskin, H.J., Computational modeling and analysis of microarray data: New
horizons 26, 2016.
3. Ram, P.K. and Kuila, P., Feature selection from microarray data: Genetic
algorithm based approach. J. Inf. Optim. Sci., 40, 8, 1599–1610, 2019.
4. Lee, J.W., Lee, J.B., Park, M., Song, S.H., An extensive comparison of recent
classification tools applied to microarray data. Comput. Stat. Data Anal., 48,
4, 869–885, 2005.
5. Ding, C. and Peng, H., Minimum redundancy feature selection from micro
array gene expression data. J. Bioinf. Comput. Biol., 3, 02, 185–205, 2005.
6. Hua, J., Tembe, W.D., Dougherty, E.R., Performance of feature-selection
methods in the classification of high-dimension data. Pattern Recognit., 42,
3, 409–424, 2009.
7. Yang, C.-S., Chuang, L.-Y., Ke, C.-H., Yang, C.-H., A hybrid feature selection
method for microarray classification. IAENG Int. J. Comput. Sci., 35, 3, 2008.
8. Wang, S.-L., Li, X.-L., Fang, J., Finding minimum gene subsets with heuristic
breadth-first search algorithm for robust tumor classification. BMC Bioinf.,
13, 1, 178, 2012.
9. Nagpal, A. and Singh, V., A feature selection algorithm based on qualita-
tive mutual information for cancer microarray data. Proc. Comput. Sci., 132,
244–252, 2018.
10. Chinnaswamy, A., Srinivasan, R., Poolakkaparambil, S.M., Rough set based
variable tolerance attribute selection on high dimensional microarray imbal-
anced data, in: Data-Enabled Discovery and Applications, vol. 2(1), p. 7, 2018.
11. Prasad, Y., Biswas, K.K., Hanmandlu, M., A recursive pso scheme for gene
selection in microarray data. Appl. Soft Comput., 71, 213–225, 2018.
12. Tang, J. and Zhou, S., A new approach for feature selection from micro
array data based on mutual information. IEEE/ACM Trans. Comput. Biol.
Bioinform., 13, 6, 1004–1015, 2016.
13. Banerjee, M., Mitra, S., Banka, H., Evolutionary rough feature selection in
gene expression data. IEEE Trans. Syst. Man Cybern., Part C (Appl. Rev.), 37,
4, 622–632, 2007.
14. Li, F. and Yang, Y., Analysis of recursive gene selection approaches from
microarray data. Bioinformatics, 21, 19, 3741–3747, 2005.
15. Peng, H., Long, F., Ding, C., Feature selection based on mutual information
criteria of max-dependency, max-relevance, and min-redundancy. IEEE
Trans. Pattern Anal. Mach. Intell., 27, 8, 1226–1238, 2005.
16. Xing, E.P., Jordan, M., II, Karp, R.M. et al., Feature selection for highdimen-
sional genomic microarray data, in: ICML, vol. 1, Citeseer, pp. 601–608,
2001.
17. Guyon, I. and Elissee, A., An introduction to variable and feature selection.
J. Mach. Learn. Res., 3, Mar, 1157–1182, 2003.
18. Yu, L. and Liu, H., Feature selection for high-dimensional data: A fast cor-
relation-based filter solution, in: Proceedings of the 20th international confer-
ence on machine learning (ICML-03), pp. 856–863, 2003.
19. Zhu, Z., Ong, Y.-S., Dash, M., Markov blanket-embedded genetic algorithm
for gene selection. Pattern Recognit., 40, 11, 3236–3248, 2007.
20. Shen, Q., Shi, W.-M., Kong, W., Hybrid particle swarm optimization and
tabu search approach for selecting genes for tumor classification using gene
expression data. Comput. Biol. Chem., 32, 1, 53–60, 2008.
21. Niknam, T. and Amiri, B., An efficient hybrid approach based on PSO, ACO
and k-means for cluster analysis. Appl. Soft Comput., 10, 1, 183–197, 2010.
22. Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S., GSA: a gravitational search
algorithm. Inf. Sci., 179, 13, 2232–2248, 2009.
Part 3
MACHINE LEARNING
FOR SECURITY SYSTEMS
10
On Fusion of NIR and VW Information
for Cross-Spectral Iris Matching
Ritesh Vyas1*, Tirupathiraju Kanumuri2, Gyanendra Sheoran2
and Pawan Dubey2
2NIT Delhi, Delhi, India
Abstract
Iris has been the most promising biometric trait when it comes to personal authen-
tication. But there are certain situations where iris images captured in one spectrum
or wavelength must be matched against the templates of feature vectors created from
iris images from another spectrum or wavelength. Representation of iris texture
using different spectrum is significantly different. Therefore, this paper investigates
the application of different fusion strategies in combining the information from near
infrared and visible wavelength images to enhance the recognition performance. Two
benchmark cross-spectral iris databases are used in the experimentation, namely,
PolyU cross spectral database and Cross-Eyed database. Fusion schemes at two dif-
ferent levels, feature level and score level, are adopted to validate the hypothesis of
performance improvement through combination of information from different spec-
trum. Results show that fusion helps in improvement of cross-spectral iris matching.
Keywords: Iris recognition, cross-spectral, near infrared (NIR), visible
wavelength (VW), information fusion
10.1 Introduction
Since its inception [1, 2], iris recognition has been the choice of research-
ers working in the field of biometric authentication. Reasons behind such
popularity of iris biometric are the uniqueness, randomness, and stabil-
ity of iris features. The core steps of any iris recognition system comprise
average (OWA) was used in [9] to combine the features of NIR and VW
images.
Burge and Monaco [10] implemented pixel-level fusion of different
spectrum images to extract the features. Furthermore, NIR iris images were
predicted based on the features of VW images through multi-stage super-
vised learning. In the work of Zuo et al. [11], the channel information for
NIR was predicted from color image by utilizing a nonlinear multivariate
adaptive mapping and feed forward neural network. Abdullah et al. [12, 13]
proposed a substantial cross-spectral iris recognition framework employ-
ing three separate descriptors and 1D log-Gabor filter. Nalla and Kumar
[6] proposed domain adaptation framework using Markov random fields
model. Recently, Vyas et al. [14] proposed difference of variance (DoV)
utilizing Gabor filtered iris images at multiple scales and orientations.
In this paper, combination of the information furnished by both NIR-
and VW-based iris images is investigated. The purpose of this sort of
fusion is improvement in the performance of cross-spectral iris matching.
The Xor-Sum Code approach, originally developed for NIR-based iris recognition [15], has been extended to address cross-spectral iris recognition. Additionally, a more generalized framework is proposed using two
benchmark cross-spectral iris databases. Fusion of information is investi-
gated at feature level and score level, respectively. Unlike other works on
cross-spectral iris recognition, this work does not require any sort of train-
ing or model to combine the NIR and VW information. Rather, this seems
the first attempt to achieve huge performance improvements in cross-
spectral iris matching through conventional methods of information
fusion.
The rest of the chapter is organized as follows. Section 10.2 provides the
preliminary details about Xor-Sum Code. Section 10.3 presents the exper-
imental results, and Section 10.4 concludes the paper.
imaginary parts of the filtered iris image at one particular orientation, and
2) Sum operation (expressed as eq. (10.4)), which is performed to combine
the Xor operations’ output for all orientations. Thereafter, the output of
Sum operation is encoded into bits to have different planes of binary fea-
ture vectors (see eq. (10.5)), corresponding to the input iris template. The
four core steps of the Xor-Sum Code approach (Filtering, Exclusive OR,
Sum, and Encoding) and their mathematical representations are as follows:
Filtering: I_n = I \ast G    (10.1)
where I and G denote the input iris image and the Gabor filter, respectively, and (x, y, σ, f, θn) are the Gabor filter parameters: spatial coordinates, scale, frequency, and orientation. The formal definition of the Gabor filter G can be expressed as follows:

G(x, y, \sigma, f, \theta_n) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{x^2 + y^2}{2\sigma^2}\right\} \times \exp\{2\pi i f (x\cos\theta_n + y\sin\theta_n)\}    (10.2)

where \theta_n = \frac{n\pi}{N} and n = 0, 1, …, N − 1, with N denoting the number of orientations. The optimal parameters used for the 2D Gabor filter are given in Table 10.1.
Encoding: XSC(b) = \begin{cases} 1 & \text{if } b \le S < b + \frac{N+1}{2} \\ 0 & \text{otherwise} \end{cases}    (10.5)
D_h(P, Q) = \frac{\sum_{b=1}^{B} P(b) \oplus Q(b)}{B}    (10.6)
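A hedged sketch of the four steps is given below. The Gabor parameters (kernel size, σ, f) are placeholders for the values in Table 10.1, and the Xor and Sum steps are written from their textual description, since Equations (10.3) and (10.4) are only summarized above.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, sigma, f, theta):
    """2-D complex Gabor kernel following Equation (10.2)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    carrier = np.exp(2j * np.pi * f * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier

def xor_sum_code(iris, N=4, size=15, sigma=3.0, f=0.1):
    """Filter at N orientations, XOR the signs of the real and imaginary responses,
    sum over orientations, and encode the sum into binary planes (Equation (10.5))."""
    S = np.zeros(iris.shape)
    for n in range(N):
        resp = fftconvolve(iris, gabor_kernel(size, sigma, f, n * np.pi / N), mode="same")
        S += np.logical_xor(resp.real > 0, resp.imag > 0)
    half_width = (N + 1) / 2
    planes = [((S >= b) & (S < b + half_width)).astype(np.uint8)
              for b in range(int(S.max()) + 1)]
    return np.stack(planes)

def hamming_distance(P, Q):
    """Normalized Hamming distance between two binary templates (Equation (10.6))."""
    return np.count_nonzero(P != Q) / P.size
```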
10.2.1 Fusion
Fusion of information is performed at two possible levels, at feature level
and at score level. Since the extracted feature template is binary in nature,
two well-known binary operations (i.e., OR operation and AND opera-
tion) are used to combine the features. In brief, OR fusion gives “1” for
any bit position if any of the VW or NIR feature templates has “1” at that
particular position, while AND fusion produces “1” corresponding to the
bit positions which are “1” in both the VW and NIR features. Regarding
score-level fusion, SUM and PROD (product) rule of fusion is employed
here to show the improvement in performance of cross-spectral iris match-
ing. Mathematically, all four fusion schemes employed in this work can be
written as the following equations:
• Feature-level fusion:

Fusion using OR: F = F_1 \lor F_2
Fusion using AND: F = F_1 \land F_2    (10.7)

where F_1 and F_2 represent the XOR-SUM features for the VW and NIR illuminated iris images, respectively.
• Score-level fusion:
Fusion using SUM: S = \frac{S_1 + S_2}{2}

Fusion using PROD: S = S_1 \times S_2    (10.8)
where S1 and S2 are the Hamming distance matching scores for VW and
NIR features, respectively.
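Both fusion levels reduce to one-line operations; a minimal sketch, assuming binary feature templates and scalar Hamming-distance scores, is shown below.

```python
import numpy as np

def fuse_features(F1, F2, rule="AND"):
    """Feature-level fusion of two binary Xor-Sum templates (Equation (10.7))."""
    return np.logical_and(F1, F2) if rule == "AND" else np.logical_or(F1, F2)

def fuse_scores(S1, S2, rule="SUM"):
    """Score-level fusion of two Hamming-distance matching scores (Equation (10.8))."""
    return (S1 + S2) / 2.0 if rule == "SUM" else S1 * S2
```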
10.3.1 Databases
Two major cross-spectral iris databases are used here to investigate the idea
of information fusion from NIR and VW images. First database is The Hong
Kong PolyU cross-spectral database [17], which has 6,270 images from 209
subjects (or 418 classes) at the rate of 30 images per subject (or 15 images
per class). Each image in this database has a resolution of 640 × 480. Second
cross-spectral database is the Cross-Eyed database [18]. This database con-
tains 1,920 images from 240 classes, i.e., 8 images per class. Each image
has a resolution of 400 × 300. Both the employed databases contain registered iris images, which ensure pixel-to-pixel correspondence between the VW and NIR channel iris images. Sample images from both the databases
are depicted in Figure 10.1. In this paper, 1,050 images from 35 subjects of
PolyU database and 1,000 images from 125 subjects of Cross-Eyed database
are used. For segmentation purposes, the method suggested in [19] is employed.
Figure 10.1 Sample NIR and VW iris images ((a)–(d)) from the PolyU cross-spectral and Cross-Eyed databases.
false acceptance rate (FAR) (on the x-axis) and genuine acceptance rate (GAR) (on the y-axis). GAR is obtained by subtracting the false rejection rate (FRR) from one. FAR and FRR denote the proportions of falsely accepted and falsely rejected users, respectively, at varying decision thresholds. To investigate the use of fusion strategies for enhancing the performance of the cross-spectral iris matching system, the following different matching scenarios are considered.
Figure 10.2 ROC curves for same-spectral matchings of (top) PolyU and (bottom)
Cross-Eyed.
Cross-Eyed databases are depicted in Figures 10.2 (top) and (bottom). Four
different matching cases are exhibited in the ROC curves using red, green,
blue, and NIR channels, respectively. The EER values for all same-spectral
matchings are also displayed for more clarification. Table 10.2 summarizes
all EER values in tabular form.
It can be observed from Figure 10.2 (top) that the best EER of 6.64% is
yielded for red channel of VW images from PolyU database, followed by
7.52%, 10.74%, and 12.38% in case of green, blue, and NIR channel match-
ings, respectively. Consequently, it can be stated that the red channel of the
PolyU dataset images provides the most discriminating information when
compared to the other channels. Considering Cross-Eyed database [Figure
10.2 (bottom)], the Xor-Sum descriptor again provides best EER in case of
red-channel matchings with 3.92%. Unlike PolyU database, NIR images
of Cross-Eyed database have greater discrimination than the blue channel
of VW images. This fact is supported by the EER value (6.11%) obtained
in NIR matching of Cross-Eyed database, which is relatively lower as
compared to that obtained in blue channel matchings (7.15%).
Figure 10.3 ROC curves for cross-spectral matchings of (top) PolyU and (bottom)
Cross-Eyed.
and Red channel features are matched. Similarly, AND fusion also results
in a decrease in the EER values in cross-spectral matchings. This fall in EER
values after feature-level fusion is more clearly demonstrated in Figure
10.4. In this figure, blue colored bars indicate the EER values achieved
in cross-spectral matching without using any feature-level fusion. While
green and pink colored bars indicate EER values after feature-level fusion
using OR and AND operator, respectively.
EER values are shown with respect to each of the channels of VW
images, i.e., red, green, and blue. On an overall basis, for PolyU database,
feature fusion using OR operation brings the EER values in the range of
10%–16%. Similarly, fusing the features using AND operation results in
EER values falling in the range of 9%–16%, which is also smaller when
compared to the EER values of matching without any fusion. Similarly, for
Cross-Eyed database, the EER values fall in the range of 5%–13% for fusion
using OR and 5%–12% for fusion using AND.
Figure 10.4 Bar charts for EER values after feature-level fusion.
10.4 Conclusions
The feasibility of advancement in cross-spectral iris recognition using
fusion of information obtained from NIR and VW images is investigated
in this paper. Two databases, PolyU cross-spectral and Cross-Eyed data-
bases, are employed to validate the said purpose. Binary features, from
iris images captured in both the spectrum, are extracted using Xor-Sum
Code. Feature-level fusion and score-level fusion are implemented to com-
bine the information from different spectrum. Results indicate that AND
fusion between binary features of different wavelengths performs better
than fusion using OR. Regarding score-level fusion, best results, in terms
of EER, are obtained when scores from NIR and red and green channels
of VW images are fused together. Finally, the gist of the paper is that if the
information from different spectra is combined, then the performance
of cross-spectral iris matching can be significantly improved. In future,
sophisticated fusion schemes, like learning-based fusion, can be utilized to
further enhance the recognition accuracy.
References
1. Flom, L. and Safir, A., Iris recognition system. U.S. Patent No. 4, 641, 349,
1987.
2. Daugman, J.G., High Confidence Visual Recognition of Persons by a test
of statistical independence. IEEE Trans. Pattern Anal. Mach. Intell., 15, 11,
1148–1161, 1993.
3. Bowyer, K.W., Hollingsworth, K., Flynn, P.J., Image understanding for iris
biometrics: A survey. Comput. Vis. Image Underst., 110, 2, 281–307, 2008.
4. Ross, A., Pasula, R., Hornak, L., Exploring multispectral iris recognition
beyond 900nm, in: IEEE 3rd International Conference on Biometrics: Theory,
Applications and Systems, BTAS 2009, pp. 1–8, 2009.
5. Ives, R.W., Ngo, H.T., Winchell, S.D., Matey, J.R., Preliminary evaluation
of multispectral iris imagery, in: IET Conference on Image Processing (IPR
2012), pp. 1–5, 2012.
6. Nalla, P.R. and Kumar, A., Towards More Accurate Iris Recognition using
Cross-Spectral Matching. IEEE Trans. Image Process., 26, 1, 208–221, Jan.
2017.
7. Boyce, C., Ross, A., Monaco, M., Hornak, L., Xin, L., Multispectral iris anal-
ysis: A preliminary study, in: 2006 IEEE Conference on Computer Vision and
Pattern Recognition Workshop (CVPRW'06), pp. 51–51, 2006.
8. Trokielewicz, M., Czajka, A., Maciejewicz, P., Cataract influence on iris rec-
ognition performance. Proc. SPIE - Int. Soc. Opt. Eng., vol. 9290, pp. 1–14,
2014.
9. Tajbakhsh, N., Araabi, B.N., Soltanianzadeh, H., Feature Fusion as a Practical
Solution toward Noncooperative Iris Recognition, in: 11th IEEE International
Conference on Information Fusion, pp. 1–7, 2008.
10. Burge, M.J. and Monaco, M.K., Multispectral iris fusion for enhance-
ment, interoperability, and cross wavelength matching, in: Algorithms and
Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery, vol.
7334, pp. 73341D-1-73341D–8, 2009.
11. Zuo, J., Nicolo, F., Schmid, N.A., Cross spectral iris matching based on pre-
dictive image mapping, in: 2010 Fourth IEEE International Conference on
Biometrics: Theory, Applications and Systems (BTAS), pp. 1–5, 2010.
12. Abdullah, M.A.M., Dlay, S.S., Woo, W.L., Chambers, J.A., A novel framework
for cross-spectral iris matching. IPSJ Trans. Comput. Vis. Appl., 8, 1, 9, 2016.
13. Abdullah, M.A.M., Al-Nima, R.R., Dlay, S.S., Woo, W.L., Chambers, J.A.,
Cross-Spectral Iris Matching for Surveillance Applications, in: Surveillance
in Action, Advanced Sciences and Technologies for Security Applications, P.
Karampelas and T. Bourlai (Eds.), pp. 105–125, Springer, Cham, 2018.
14. Vyas, R., Kanumuri, T., Sheoran, G., Cross spectral iris recognition for sur-
veillance based applications. Multimed. Tools Appl., 78, 5, 5681–5699, 2019.
15. Vyas, R., Kanumuri, T., Sheoran, G., Iris recognition using 2-D Gabor filter
and XOR-SUM code, in: 2016 IEEE 1st India International Conference on
Information Processing (IICIP), pp. 1–5, 2016.
16. Lee, T.S., Tai Sing, L., Lee, T.S., Image representation using 2D Gabor wave-
lets. IEEE Trans. Pattern Anal. Mach. Intell., 18, 10, 959–971, 1996.
17. The Hong Kong Polytechnic University Cross-Spectral Iris Images Database.
[Online]. Available: http://www4.comp.polyu.edu.hk/~csajaykr/polyuiris.
htm.
18. Sequeira, A.F. et al., Cross-Eyed - Cross-Spectral Iris/Periocular Recognition
Database and Competition, in: 5th International conference of the Biometrics
Special Interest Group (BIOSIG 2016), pp. 1–5, 2016, [Online]. Available:
https://sites.google.com/site/crossspectrumcompetition/home.
19. Zhao, Z. and Kumar, A., An accurate iris segmentation framework under
relaxed imaging constraints using total variation model, in: IEEE International
Conference on Computer Vision, pp. 3828–3836, 2015.
11
Fake Social Media Profile Detection
Umita Deepak Joshi1, Vanshika2, Ajay Pratap Singh3, Tushar Rajesh Pahuja4,
Smita Naval5 and Gaurav Singal6*
2Maharaja Surajmal Institute of Technology, Delhi, India
3Galgotias University, Greater Noida, India
4Thadomal Shahani Engineering College, Mumbai, India
5Malaviya National Institute of Technology, Jaipur, India
6Bennett University, Greater Noida, India
Abstract
Social media like Twitter, Facebook, Instagram, and LinkedIn are an integral part
of our lives. People all over the world are actively engaged in these social media
platforms. But at the same time, it faces the problem of fake profiles. Fake profiles
are generally human-generated or bot-generated or cyborgs, created for spreading
rumors, phishing, data breaching, and identity theft. Therefore, in this article, we
discuss fake profile detection models. These differentiate between fake profiles and
genuine profiles on Twitter based on visible features like followers count, friends
count, status count, and more. We form the models using various machine learn-
ing methods. We use the MIB dataset of Twitter profiles, TFP, and E13 for genu-
ine and INT, TWT, and FSF for fake accounts. Here, we have tested different ML
approaches, such as Neural Networks, Random Forest, XG Boost, and LSTM. We
select significant features for determining the authenticity of a social media pro-
file. As a result, we get the output as 0 for real profiles and 1 for fake profiles. The
accuracy achieved is 99.46% by XG Boost and 98% by Neural Network. The fake
detected profiles can be blocked or deleted to avoid future cyber-security threats.
11.1 Introduction
Social media has become a vital part of our lives. From sharing attractive photographs to following celebrities and chatting with close and faraway friends, everyone is active on social media. It is a great platform to share information and interact with people. But everything has a downside. As social media takes a firm spot in our lives, there are instances where it has turned out to be a serious problem.
There are 330 million monthly active users and 145 million daily active
users on Twitter. Facebook also adds about 500,000 new users every day
and six new users every second. Loads of information are shared over Twitter every single day. From hot trending topics to the latest hashtags and news of one's most recent trip, you get everything on Twitter. People react, like, comment, share their views, and raise their opinions, all within the 280-character limit. Social media users discuss genuine issues, but sometimes rumors circulate as well. These rumors lead to conflicts between different sections of society. Concerns about privacy, misuse, cyberbullying [1], and false information have come to light in the recent past. Fake profiles are behind much of this activity. Fake accounts can be human-generated or
computer-generated or cyborgs [2]. Cyborgs are accounts initially created
by humans but later operated by computers.
Fake profiles are usually created under false names, and misleading and abusive posts and pictures are circulated by these profiles to manipulate society or to push anti-vaccine conspiracy theories, among other things. Every social media platform faces the problem of fake profiles these days. The goals behind creating fake profiles are mainly spamming [3], phishing, and obtaining more followers. Malicious accounts have the full potential to commit cybercrimes. Bogus accounts pose major threats such as identity theft and data breaching [4]. These fake accounts send various URLs to people which, when visited, send all the user's data to faraway servers, where it could be used against an individual. Also, fake profiles created seemingly on behalf of organizations or people can damage their reputations and decrease their numbers of likes and followers. Along with all this, social media manipulation is also a problem: fake accounts lead to the spread of misleading and inappropriate information, which, in turn, gives rise to conflicts.
These hoax accounts are also created to obtain more followers. Who does not want to be voguish on social media? To achieve a high figure of followers, people tend to find fake followers [5]. Overall, the research findings revealed that fake profiles cause more harm than any other cybercrime. Therefore, it is important to detect fake profiles early. The main contributions of this work are summarized below:
• The dataset of Twitter profiles, with E13 and TFP for genuine accounts and INT, TWT, and FSF for fake accounts, is taken into use.
• As a technical contribution, we designed a multi-layer neural network model [7], a random forest model, an XG Boost model, and an LSTM model. The mentioned models are supervised machine learning models.
• Also, the LSTM deep learning model classifies based on tweets; the result can be combined with a convolutional neural network in the near future [8].
The paper is organized into various sections. The existing work is sum-
marized in Section 11.2. Data pre-processing and methodology have been
shown in Section 11.3 and experimental results in Section 11.4 with the
accuracy of models. In the last section, conclusion and future work are
described.
But as far as human-operated fake accounts are concerned, they tend to adapt and yet somehow avoid the blacklist. Research has also been done to detect fake accounts based on factors like engagement rate and artificial activity [17]. The engagement rate is the percentage of interaction of the audience with a post. It is calculated as (Total number of interactions / Total number of followers) × 100. These interactions could be in the form of likes, shares, or comments.
Artificial activity is based on the number of shares, likes, and comments made by a particular account. Insufficient information and the status of email verification are also considered indicators of artificial activity. Previous methods of detecting fake profiles examine features of this kind; a simple sketch of the engagement-rate calculation is given below.
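As a rough illustration of the engagement-rate formula above, a minimal Python sketch follows; the field names (likes, shares, comments, followers) are hypothetical and not taken from the dataset.

def engagement_rate(likes, shares, comments, followers):
    # Percentage of follower interactions with a post.
    if followers == 0:
        return 0.0
    interactions = likes + shares + comments
    return (interactions / followers) * 100

print(engagement_rate(likes=120, shares=15, comments=30, followers=5000))  # 3.3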
11.3 Methodology
For the detection of fake Twitter profiles, we incorporated various supervised
methods, all with the same goal yet different accuracy. Each model detects
a fake profile based upon visible features only. All these supervised models
are fed the same dataset, and corresponding accuracy and loss graphs are
plotted. Also, a comparison graph of the accuracy of different models is indi-
cated. The models are trained using appropriate optimizers, loss functions,
and activation functions. The used models are mentioned below.
11.3.1 Dataset
We used the dataset available on MIB [18]. The data set consisted of 3,474
genuine profiles and 3,351 fake profiles. The data set selected was TFP and
E13 for genuine and INT, TWT, and FSF for fake accounts. The data is
stored in CSV file format for easy reading by the machine.
All the labels on the x-axis depict the features used for the detection of
the fake profile. These got selected during the pre-processing. The y-axis
depicts the number of entries corresponding to each feature available in
the dataset as shown in Figure 11.1.
Figure 11.1 Number of entries available in the dataset for each selected feature (followers count, friends count, statuses count, listed count, favourites count, geo enabled, lang num, and profile background image).
11.3.2 Pre-Processing
Before proceeding to the models, we add one more step, i.e., pre-processing. The data set is pre-processed before it is fed to a model. For feature extraction, we used the principal component analysis (PCA) method. It is a dimensionality reduction method used to reduce the dimensionality of a large data set. It transforms a dataset of many variables into a smaller one that retains the vital information of the large dataset (Figure 11.2). It is comparatively easy to visualize and analyze small data sets without affecting the accuracy of the original data set.
z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}    (11.1)
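A minimal sketch of the standardization in Equation (11.1) followed by PCA, assuming X is the matrix of selected profile features built as described in the next paragraphs; library calls are from scikit-learn.

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_std = StandardScaler().fit_transform(X)   # z = (value - mean) / standard deviation
pca = PCA(n_components=10)                  # keep the first 10 principal components
X_pca = pca.fit_transform(X_std)
print(pca.explained_variance_ratio_)        # variance captured by each component (cf. Figure 11.2)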
Figure 11.2 Explained variance ratio of the first 10 principal components.
The selected features (friends count, followers count, status count, listed count, favourites count, geo enabled, and lang num) from the genuine and fake datasets are merged into one dataset with an additional label for each profile, i.e., "is Fake", which is a Boolean variable. This label is stored in the Y variable as the response for each profile X. Finally, the blank or NaN entries are substituted with zeros.
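A minimal sketch of this merging step with pandas, assuming the MIB CSV files and the column names shown here (both are illustrative and may differ from the actual files).

import pandas as pd

genuine = pd.concat([pd.read_csv("TFP.csv"), pd.read_csv("E13.csv")])
fake = pd.concat([pd.read_csv("INT.csv"), pd.read_csv("TWT.csv"), pd.read_csv("FSF.csv")])

genuine["is_fake"] = 0   # genuine profiles labelled 0
fake["is_fake"] = 1      # fake profiles labelled 1

features = ["friends_count", "followers_count", "statuses_count",
            "listed_count", "favourites_count", "geo_enabled", "lang_num"]

data = pd.concat([genuine, fake], ignore_index=True)
X = data[features].fillna(0)   # blank/NaN entries substituted with zeros
Y = data["is_fake"]            # response for each profile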
Figure 11.3 Architecture of the neural network: the input layer takes eight features (friends_count, follower_count, status_count, listed_count, fav_count, lang_num, geo_enabled, use_bg_image), followed by three hidden layers and an output layer producing the result.
Each layer has neurons (nodes). We used the Sequential model from Keras. The model, designed with an input layer, three hidden layers, and an output layer, uses the ReLU activation function for all layers except the output layer, where sigmoid is used. The model is compiled with the Adam optimizer and the binary cross-entropy loss function. In our model (Figure 11.3), an ANN with the architecture stated above is used. The sigmoid function finally provides an output between 0 and 1, and, based on this prediction, a particular profile is labelled as fake or genuine.
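A minimal sketch of the described network in Keras, assuming eight input features and pre-split arrays X_train, y_train, X_val, y_val; the hidden-layer widths are illustrative, as the chapter does not state them.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(32, activation="relu", input_shape=(8,)),  # hidden layer 1
    Dense(16, activation="relu"),                    # hidden layer 2
    Dense(8, activation="relu"),                     # hidden layer 3
    Dense(1, activation="sigmoid"),                  # output: 0 = genuine, 1 = fake
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=15)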
Hyperparameters
(Plot of the ReLU activation function, max(0, x), for x from −10 to 10.)
\sigma(z) = \frac{1}{1 + e^{-z}}    (11.3)
(Plot of the sigmoid function σ(z) = 1/(1 + e^{-z}) for z from −10 to 10.)
For large positive z, the sigmoid saturates at 1,

\sigma(z) = \frac{1}{1 + 0} \approx 1    (11.4)

and for large negative z it saturates at 0,

\sigma(z) = \frac{1}{1 + \text{large number}} \approx 0    (11.5)

The random forest prediction averages the outputs of B individual trees:

\hat{f} = \frac{1}{B} \sum_{b=1}^{B} f_b(x')    (11.6)

(Diagram of a random forest ensemble: Tree 1, Tree 2, ..., Tree n.)

Gradient boosting starts from an initial constant model,

F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)    (11.7)

and, at each stage, fits the pseudo-residuals

r_{im} = -\alpha\, \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}    (11.8)
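A minimal sketch of the two ensemble models, assuming the feature matrix X and labels Y from the pre-processing step; hyperparameters are illustrative.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)  # bagged trees, Equation (11.6)
xgb = XGBClassifier(n_estimators=100).fit(X_train, y_train)          # boosted trees, Equations (11.7)-(11.8)

print("Random Forest accuracy:", rf.score(X_test, y_test))
print("XG Boost accuracy:", xgb.score(X_test, y_test))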
The results include accuracy and loss plots for the neural network and the LSTM, a model accuracy comparison, and ROC curves for random forest, XG Boost, and the other methods.
Neural Network: The model's accuracy and loss graphs for the trained neural network are shown in Figures 11.7 and 11.8. These graphs are obtained after running for 15 epochs. Starting from 0.97, the accuracy varies along the way and finally reaches its maximum, i.e., 0.98. Similarly, the loss for the training data begins at about 1 and for the validation data at about 4, and eventually reaches a minimum point of less than 0.5.
Figure 11.7 Model accuracy of the neural network on the training and validation sets over 15 epochs.
Figure 11.8 Model loss of the neural network on the training and validation sets over 15 epochs.
The binary cross-entropy function is used to calculate the loss. Initially, random weights are assigned to each feature, and finally the machine settles on a unique weight for each feature.
Random Forest and Other Methods: In the comparison chart (Figure 11.9), we observe the accuracy of different models, namely, random forest, XG Boost, Ada Boost, and decision tree. The maximum accuracy of 0.996 is achieved by XG Boost. Next, the decision tree and random forest have approximately similar accuracies of 0.99. The Ada Boost model has the least accuracy. The histogram for the accuracy comparison and the ROC curves are presented in Figures 11.9 to 11.11.
Figure 11.9 Accuracy comparison of the Random Forest, XG Boost, Ada Boost, and Decision Tree models.
Figures 11.10 and 11.11 ROC curves (true positive rate versus false positive rate) for the evaluated models.
Acknowledgment
We extend our sincere gratitude to the mentors for their patient guid-
ance and useful critiques for this work. Their enthusiastic encouragement
helped us to keep the progress on schedule.
References
1. Elyusufi, Y., Elyusufi, Z., Kbir, M.A., Social networks fake profiles detection
based on account setting and activity, in: Proceedings of the 4th International
15. Hajdu, G., Minoso, Y., Lopez, R., Acosta, M., Elleithy, A., Use of Artificial
Neural Networks to Identify Fake Profiles, in: 2019 IEEE Long Island Systems,
Applications and Technology Conference (LISAT), IEEE, pp. 1–4, 2019.
16. Kharaji, M.Y. and Rizi, F.S., An iac approach for detecting profile cloning in
online social networks. arXiv preprint arXiv:1403.2006, 2014.
17. Raturi, R., Machine learning implementation for identifying fake accounts in
social network. Int. J. Pure Appl. Math., 118, 20, 4785–4797, 2018.
18. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., Tesconi, M., The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race, in: Proceedings of the 26th International Conference on World Wide Web Companion (WWW '17 Companion), International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, pp. 963–972, 2017.
19. Elyusufi, Y. and Elyusufi, Z., Social Networks Fake Profiles Detection Using
Machine Learning Algorithms. The Proceedings of the Third International
Conference on Smart City Applications, Springer, Cham, 2019.
20. Jyothi, V., Hamsini, K., Reddy, G.S., Vasireddy, B.K., Fake Profile Identification Using Machine Learning. Tathapi (UGC CARE Journal, ISSN 2320-0693), 19, 8, 714–720, 2020.
12
Extraction of the Features of Fingerprints
Using Conventional Methods and
Convolutional Neural Networks
E. M. V. Naga Karthik* and Madan Gopal
Abstract
Fingerprints are one of the most common biometric identifiers. Until recently, the
conventional image processing methods were being used in extracting the fea-
tures of the fingerprints for the fingerprint classification problem. However, with
the rise of artificial intelligence, deep learning models such as the Convolutional
Neural Networks (CNNs) have shown promising results in image classification
problems. In this paper, we explain why CNNs perform better by visualizing the features learned by their convolutional layers and comparing them with
the fingerprints’ features extracted by estimating the local orientation map and
detecting the singular regions. A 17-layer CNN model is proposed, which obtains
a classification accuracy of 92.45% on the NIST-DB4 fingerprint dataset. We con-
clude that the first two convolutional layers are learning features that are similar to
the ones obtained after using the above techniques, while the remaining layers are
learning abstract and more complex features that are class-specific. This explains
why the deep learning models are performing better. The results are promising as
they bring us one step closer in demystifying the inner functioning of the CNNs.
12.1 Introduction
Biometrics plays an important role in securing user data and maintaining
its privacy. Each individual has a unique biometric trait which can be used
for quick recognition and easy identification. Fingerprint identification
technology has seen tremendous success in recent times due to its low cost
of acquisition and the availability of abundant computational resources.
Fingerprints are characterized by their continuous ridge and valley patterns
that are seen as dark and white regions respectively, in fingerprint images.
The Henry system [1] of fingerprint classification consists of five classes
of fingerprints: Arch (A), Tented Arch (T), Whorl (W), Left loop (L), and
Right loop (R). Figure 12.1 shows the examples of fingerprints from the
NIST DB-4 dataset containing those five classes. Each class shows a dif-
ferent visual pattern, and the distribution of these fingerprints, in general,
is unequal. This is shown in Table 12.1. The fingerprint distribution shows that the whorl, left loop, and right loop class patterns make up 93.4%, while the remaining classes comprise only 6.6% of all fingerprints [2].
There are two main levels in fingerprint analysis: the global level and the
minutiae level. By using the location of singular points as features, finger-
print classification is done by analysing the global structure of fingerprints.
On the other hand, the fingerprint minutiae are used for identification and
recognition tasks.
Figure 12.1 Sample images from the NIST DB-4 dataset showing all five classes.
the ridge-valley structures. The gradient-based methods [4–7] are used for
obtaining the orientation maps and the Poincare index method [4, 8, 9] for
singular region detection.
Hong et al. [6] normalized the fingerprint images and estimated the ori-
entation maps using the least squares method. Based on fixed criteria of the
range of index values as defined in [10], Kawagoe and Tojo [8] and Bo et al.
[9] computed the Poincaré index value and located the singular regions in the
image. Jain et al. [4] proposed an algorithm called FingerCode to extract the
features such that the minutiae details and the global ridge-valley structure
are represented effectively. Bay et al. [11] proposed SURF (Speeded-up Robust
Features) to automatically detect local image features based on the Hessian
matrix for finding the features of interest. Srinivas and Mohan [12] proposed
edge and patch-based feature extraction methods for content-based retrieval
of medical images. In the context of fine-grained object detection, Srinivas et al. [13] used online dictionary learning to obtain sparse and efficient feature representations with CNNs. Wang et al. [14] used a stacked sparse autoencoder
for fingerprint classification and obtained a 93.1% accuracy on the NIST
DB-4 dataset. The orientation map was used as a feature input to the model.
Peralta et al. [3] used CaffeNet for fingerprint classification and compared
their results with the accuracies obtained by using SVMs and k-NNs. Both
real (NIST-DB-4) and synthetic (SFinGe) fingerprint datasets were used in
their study. Simonyan et al. [15] proposed a method, where given a trained
CNN and a class of interest, an image that maximizes the corresponding
class score is obtained along with a class saliency map that is unique to the
image and the class. Zeiler et al. [16] proposed a De-Convolutional Neural
Network (DeConvNet) for visualization by mapping the activations back to
the input pixel space. By mapping features to input pixels, parts of the image
causing an activation in the feature map are obtained.
There are two ways in which we can visualize the learnings of a CNN:
(i) by visualizing the layer activations, and (ii) by visualizing the weights of
the network at each layer. In this study, we visualized the layer activations
and the kernels of the initial convolutional layers. This gave us an insight as
to what CNN is learning at each stage from the fingerprint images.
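A minimal sketch of this activation visualization in Keras, assuming a trained model cnn and a pre-processed fingerprint image img of matching input shape; both names are placeholders.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model

# Build a model that returns the outputs of the first two convolutional layers.
conv_layers = [l for l in cnn.layers if "conv" in l.name][:2]
activation_model = Model(inputs=cnn.input, outputs=[l.output for l in conv_layers])

activations = activation_model.predict(np.expand_dims(img, axis=0))
plt.imshow(activations[0][0, :, :, 15], cmap="gray")   # e.g., the 16th activation map of layer 1
plt.show()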
The novelty in our method lies in the fact that there has not been any
study, where the learnings of the CNN have been investigated for the finger-
print classification problem. While there are many results in the literature
which talk about new methods that result in high classification accura-
cies, none of them discuss how these CNNs are able to produce such high
accuracies. Therefore, our work in this paper primarily focuses on com-
paring the global set of features, namely, the singular regions computed
using the Poincare index method [9] and the local orientation maps of the
fingerprints using the least squares method [6], with the features learned
by CNN. We have visualized the features of the initial convolutional lay-
ers of the CNN and have found a correlation between the conventionally
extracted features and what the deep network is learning.
Figure 12.2 Results after applying the SURF algorithm. The green markers show the
detected keypoints. Circles are drawn around each keypoint. The size of the circle depends
on the relative weight of the detected keypoints. Notice that most of the features are biased
toward the corners of the image.
are near the edges of the image, whereas in the bottom-right image, fea-
tures are spread throughout the image—some at the center and a few near
the edges. This non-uniformity does not help in accurately detecting the
location of features across all fingerprint classes. This motivated us to use
the conventional methods to extract the fingerprints’ features.
Figure 12.3 Left: Original, raw fingerprint image. Right: Normalized image.
V_x(i,j) = \sum_{u=i-\frac{w}{2}}^{i+\frac{w}{2}} \sum_{v=j-\frac{w}{2}}^{j+\frac{w}{2}} 2\,\delta_x(u,v)\,\delta_y(u,v)    (12.1)

V_y(i,j) = \sum_{u=i-\frac{w}{2}}^{i+\frac{w}{2}} \sum_{v=j-\frac{w}{2}}^{j+\frac{w}{2}} \left(\delta_x^2(u,v) - \delta_y^2(u,v)\right)    (12.2)

\phi'_x(i,j) = \sum_{u=-\frac{w_\phi}{2}}^{\frac{w_\phi}{2}} \sum_{v=-\frac{w_\phi}{2}}^{\frac{w_\phi}{2}} W(u,v)\,\phi_x(i-uw,\, j-vw)    (12.5)

\phi'_y(i,j) = \sum_{u=-\frac{w_\phi}{2}}^{\frac{w_\phi}{2}} \sum_{v=-\frac{w_\phi}{2}}^{\frac{w_\phi}{2}} W(u,v)\,\phi_y(i-uw,\, j-vw)    (12.6)
Figure 12.4 Left: Original, raw fingerprint image. Right: Estimated local orientation.
O(i,j) = \frac{1}{2}\tan^{-1}\!\left(\frac{\phi'_y(i,j)}{\phi'_x(i,j)}\right)    (12.7)
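A simplified NumPy sketch of the block-wise orientation estimation in Equations (12.1) and (12.2), assuming a normalized grayscale image img; the smoothing of Equations (12.5) and (12.6) is omitted here for brevity.

import numpy as np

def orientation_map(img, w=16):
    gy, gx = np.gradient(img.astype(float))          # pixel gradients (delta_y, delta_x)
    h, wd = img.shape
    theta = np.zeros((h // w, wd // w))
    for i in range(0, h - w + 1, w):
        for j in range(0, wd - w + 1, w):
            dx = gx[i:i + w, j:j + w]
            dy = gy[i:i + w, j:j + w]
            vx = np.sum(2 * dx * dy)                 # Equation (12.1)
            vy = np.sum(dx ** 2 - dy ** 2)           # Equation (12.2)
            theta[i // w, j // w] = 0.5 * np.arctan2(vx, vy)  # local ridge orientation
    return theta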
Figure 12.5 Left: Tented Arch fingerprint. Right: Whorl fingerprint. The loops are shown
in green circles and the deltas in red triangles.
If Poincaré(i, j) = 0.5, the block of the image containing that pixel may contain a core point (or a loop). If Poincaré(i, j) = −0.5, then the corresponding block of the image may contain a delta point. Otherwise,
the block does not contain any singular points. However, there are three
important points regarding the number of singular points in any given fin-
gerprint image [10]: (i) The arch class has no loops or deltas; (ii) The delta
is near the center of the image for tented arch class, near the right for left
loops, and near the left for right loops. All the above classes have one loop
each; and (iii) The whorl is a complex class, which can contain two loops
and two deltas. Figure 12.5 shows the location of the singular points over-
laid on the original fingerprint.
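A rough sketch of the Poincaré index test described above, assuming theta is the block orientation map produced by the earlier orientation_map() sketch; the angle-wrapping rule shown is one common convention.

import numpy as np

def poincare_index(theta, i, j):
    # Sum the orientation changes around the 8 neighbours of block (i, j).
    ring = [theta[i-1, j-1], theta[i-1, j], theta[i-1, j+1], theta[i, j+1],
            theta[i+1, j+1], theta[i+1, j], theta[i+1, j-1], theta[i, j-1]]
    total = 0.0
    for k in range(8):
        d = ring[(k + 1) % 8] - ring[k]
        if d > np.pi / 2:        # wrap differences into (-pi/2, pi/2]
            d -= np.pi
        elif d < -np.pi / 2:
            d += np.pi
        total += d
    return total / (2 * np.pi)   # ~ +0.5 for a core (loop), ~ -0.5 for a delta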
12.3.4 Dataset
The dataset used in this study is NIST Special Database 4 (NIST DB-4)
[18]. It contains 4,000 fingerprint images in PNG format. They are for-
matted files which consist of 8-bit grayscale fingerprint images for the 10
fingers on the left and right hands. There are two impressions “F” and “S”
for each finger. The size of each fingerprint image is 512-by-512 pixels with 32 rows of white space at the bottom. Each class has an equal number of fingerprints (400 per impression), that is, 800 images per class. After removing
corrupted images from the original dataset, a total of 3,816 images were
used. These were split into 3,316 training images and 500 testing images
randomly. Tables 12.3 and 12.4 show the number of images in each of the
fingerprint classes used for training and testing.
with an Intel i7 processor, 8-GB memory, and a NVIDIA GeForce 830M GPU.
(ii) For feature extraction using CNN, Google Colaboratory was used. Keras
with Tensorflow backend was used as the deep learning framework.
12.4 Results
Figure 12.6 shows the confusion matrix obtained after classification. The
Whorl and Left loop classes have the lowest misclassification error of about
2.7% suggesting that our model learned most of the features of fingerprints
from these classes correctly. The Arch class has a misclassification error of
4.9% with a few being classified as Tented arches. The highest misclassifi-
cation error is observed in the case of Right loop where 15.6% of finger-
prints have been wrongly classified as Tented arches. Another important
observation from the confusion matrix is that a high number of images are
wrongly classified as Tented arch fingerprints. From Table 12.1, we know that the Whorl, Left, and Right loop fingerprints are the most commonly observed, while Arch and Tented arch fingerprints are quite rare. Therefore, we believe that the lack of variability in the images is reflected in the high misclassification error for this class of fingerprints.
Table 12.5 shows the performance of our CNN architecture. An accuracy
of 92.45% was obtained. We also compare our results with the classification
Confusion Matrix (rows: true label, columns: predicted label)

True \ Predicted    A     L     R     T     W
A                 116     0     0     6     0
L                   0    72     0     2     0
R                   0     0    81    14     1
T                   6     4     2    84     0
W                   0     0     3     0   109
Figure 12.6 Confusion matrix. The diagonal elements show the number of correctly
predicted fingerprints of each class.
accuracies reported in the literature. The “F” and “S” suffixes denote that
the original dataset was split into two different databases containing two
different impressions of the same finger. This is shown in Table 12.6. Wang
et al. [14] used the tedious conventional method for feature extraction
and classification where the features are extracted manually and fed into a
NN classifier, whereas our model was directly trained on input fingerprint
images and also performs better than that of [3]. It is also worth noting
that [14] reported their model’s performance on a four-class classification
problem by merging the Arch and Tented arch fingerprints into one class.
This essentially reduces the class imbalance in the dataset, hence leading to
a higher accuracy. Our model, on the other hand, achieves similar perfor-
mance with all the five classes.
Figure 12.7 The outputs of the 1st to 5th convolutional layers. Figures a1 and a2: Output
of 1st convolutional layer and 16th activation enlarged. Figures b1 and b2: Output
of 2nd convolutional layer and 16th activation enlarged. Figures c1 and c2: Output of
3rd convolutional layer and 16th activation enlarged. Figures d1 and d2: Output of
4th convolutional layer and 16th activation enlarged. Figures e1 and e2: Output of 5th
convolutional layer and 16th activation enlarged.
Figure 12.8 (a–c) The raw fingerprint image from the tented arch class, its orientation
map, and locations of the singular regions. (d–f) One of the 32 output activations of the
first convolutional layer, its orientation map, and the locations of the singular regions.
(g–i) An activation from the second convolutional layer, its orientation map, and the
locations of the singular regions (best viewed in color).
map of the activation is preserved while at the same time, the location of
the singular regions is also intact. When the same process is repeated for
the activations from the second convolutional layer output, we could again
see some similarity to the features extracted using the conventional meth-
ods in Section 12.3.2. However, there are a few important points to note.
Compared to the original image, we can see that the output of the second
convolutional layer is darker and blurry. In the regions that are blurred, the
orientation map cannot be properly estimated, leading to erroneous ridge
directions. These lead to the incorrect detection of the singular regions and
are removed in the post-processing of the image. However, the centre of the
image is still intact which is responsible for detecting the singular regions.
Figure 12.8 shows the comparison between the results obtained after using
a fingerprint image from the tented arch class of the testing set and the acti-
vation outputs of the first and second convolutional layers.
12.5 Conclusion
In this study, features of fingerprints, namely, the local orientation map and
the singular regions were successfully extracted. A 17-layer CNN architec-
ture was proposed for fingerprint classification and an accuracy of 92.45%
was obtained. The features learned by CNN were interpreted by visualizing
the activations of the convolutional layer outputs. A comparison between
the features learned by CNN and the features extracted using conventional
methods was also performed. From our interpretation of the results, we
could understand that the features which are being used for classification
using SVMs and k-NNs are being learnt by the network in the initial two
layers of the network. The remaining layers are learning more complex rep-
resentations of the input which are incomprehensible. Instead of manually
feeding the encoded feature vectors to the network, the CNNs are learning
the class-specific features by themselves which, therefore, is resulting in
such high accuracies.
Acknowledgements
The NIST-DB4 fingerprint dataset is currently discontinued and is not
available online. However, the dataset will be provided on request. A part
of this work was done with the help of the workstation provided by the
Energy Science Lab of the Mechanical Engineering Department, Shiv
Nadar University.
References
1. Henry, E., Classification and Uses of Fingerprints, 2nd ed., George Routledge
and Sons, London, 1900.
2. Wilson, C.L., Candela, G.T., Watson, C.I., Journal for Artificial Neural Networks, 1, 2, 1994.
3. Peralta, D., Triguero, I., García, S., Saeys, Y., Benítez, J., Herrera, F., On the
use of convolutional neural networks for robust classification of multiple fin-
gerprint captures. Int. J. Intell. Syst., 33, 213–230, 2018.
Abstract
Facial emotion recognition plays an important role in machine learning and artificial intelligence applications. Based on information from human facial expressions, machines can provide personalized services. There are many applications, like virtual reality, personalized recommendations, and customer satisfaction, that depend on a reliable and efficient way to recognize facial expressions. With the help of automated facial emotion recognition, the human-machine interface is improved. Emotions are categorized into seven categories: anger, disgust, fear, happy, neutral, sad, and surprise. Facial emotion recognition is a very challenging problem, because every individual's facial expression varies greatly with slight changes in head pose, environmental conditions, and so on, and it has attracted many researchers for years. In this work, we propose a deep learning as well as handcrafted-feature-based model to detect facial expressions. In this model, we use a CNN, which is a deep learning model, and BOVW and HOG, which are handcrafted methods, to detect the facial features. Finally, we use the Support Vector Machine (SVM) classification method to classify the facial expressions. We have evaluated the testing results on the standard FER 2013 dataset. The proposed method gives promising results compared with other well-known methods.
13.1 Introduction
Many components contribute to the transmission of an individual's emotions; some of them are pose, speech, facial expressions, behaviors, and activities. However, facial expressions are of higher significance than the others. In communicating with others, humans can recognize emotions with a considerable level of accuracy. If we can use computer science to find practical solutions for the automatic recognition of facial emotions, then we would be able to attain accuracy that is virtually comparable to human perception.
Facial expression is the most powerful signal by which human beings express their emotional states and intentions. Emotions can be expressed in three forms: the words of the message, the vocal part, and the facial expression of the speaker. Among these, researchers have found that the speaker's facial expression contributes 55% of the emotional message. This shows that facial expression has a major impact on human communication. Many studies have been carried out on automatic facial expression analysis (AFEA) due to its practical uses in social robotics, medical treatment, driver assistance, and many other human-computer interaction (HCI) applications. Based on cultural studies, seven basic emotions have been identified. The term basic expression refers to a set of expressions that convey universal emotions: neutral, disgust, fear, sadness, happiness, anger, and surprise. Sample images of the seven basic facial expressions are shown in Figure 13.1.
In the course of recent years, numerous FER-related algorithms have been proposed in the literature, including recognizing emotions from frontal and non-frontal facial pictures. Non-frontal FER is more challenging than frontal FER and increasingly relevant in reality. In general, emotion recognition in previous work is done in three steps: detecting a face in the image, extracting facial features from the image, and classifying the expression into the emotion categories. For classification, the most popular classifiers used are the Support Vector Machine (SVM) [1] and Bayes classifiers, together with some unsupervised learning techniques [2].
However, while humans easily recognize facial expressions without any effort or delay, machine expression recognition is still a challenging task. Automatic recognition of facial expressions is complex because faces can vary with age and ethnicity from one individual to another. Culture-specific factors, such as the presence of eyeglasses or facial hair, make this task even more complex.
Another challenge in emotion recognition is the variation in size and facial orientation in input images, which limits a search for a fixed pattern in the image. The facial poses may differ due to the angle of the camera. There could be frontal or non-frontal faces. Faces can be at different angles, hiding some of the facial characteristics. We can apply a good preprocessing technique to the input images to gain insensitivity to the scaling, rotation, and translation of the head. Currently, numerous facial expression recognition (FER) methods use features based on local spatial analysis or geometric information. Therefore, we can use automatic facial point localization to categorize facial expressions firmly.
The performance of facial point extraction algorithms usually depends on environmental conditions, such as lighting, in many practical applications such as robotics. If the illumination is not uniform, then the facial points can be detected inaccurately, and therefore high FER rates can hardly be expected. Typically, this factor makes reliable feature extraction difficult. Preprocessing methods such as Histogram Equalization, DCT normalization, and Rank Normalization can be applied before feature extraction to overcome the variation of illumination in an input image.
In this work, we use deep learning methods to extract the features from facial emotion data and improve the performance of automatic emotion recognition. The main advantage of deep learning is that it removes, completely or largely, the pre-processing that relies on a physical model, by allowing "end-to-end" learning directly from the input image. That is why deep learning has achieved state-of-the-art results in different fields, including face recognition, object detection, scene understanding, and FER.
Given an image dataset, where each entry is an image and its corresponding emotion label, our job is to create a machine learning model that classifies the image into one of the seven discrete emotion categories that represent universal human emotions.
lip, and nose. This is performed with the Sobel operator and the Hough transformation using Shi-Tomasi point detection. A feature vector is formed using the Euclidean distance and trained on a multi-layer perceptron (MLP) for classifying the expression.
S. L. Happy et al. described [8] a novel approach for expression recognition from selected facial patches. Depending on the facial landmark positions, certain facial patches are extracted during emotion elicitation. From these facial patches, salient patches are obtained; the salient patches differ across the expression classes. A multi-classification method is used for classification. This method is tested on the CK+ and JAFFE datasets.
Nazil Praveen et al. proposed an unsupervised framework for the recognition of spontaneous expressions [9]. Initially, the Universal Attribute Model (UAM), which takes the form of a large Gaussian model, is trained in order to learn the attributes of different expressions. The movements of the different facial muscles, which are called attributes, are combined to form a specific facial expression. After that, for each expression clip, a super expression vector (SEV) is built by utilizing maximum a posteriori adaptation of the UAM. This SEV contains a high-dimensional representation of the expression attributes. Then, the SEV is decomposed into a low-dimensional representation to retain only the attributes of a particular clip. On expression datasets such as BP4D and AFEW, the results show that the expression vector achieves higher accuracy than the state-of-the-art techniques.
Tong Zhang et al. presented CNNs for facial emotion recognition [10]. The purpose of this study is to classify each facial image into one of the seven categories of facial emotion considered. Using grayscale images from the Kaggle website, the authors trained CNN models with different depths. To accelerate the training process, the authors developed their models in Torch and used Graphics Processing Unit (GPU) computation. In addition to the networks operating on raw pixel data, the authors used a hybrid feature strategy to train a new CNN model combining raw pixel data and Histogram of Oriented Gradients (HOG) features. In addition to L2 regularization, the authors used various techniques to reduce overfitting of the models, including dropout and batch normalization. To determine the optimal hyper-parameters, the authors applied cross-validation and evaluated the performance of the developed models by looking at their training history. The authors also present visualizations of different layers of a network to show what CNN models can learn from the features of a face.
To detect the peak expression frame from video, Yuanyuan Ding et al. described [11] a Dual Local Binary Pattern (DLBP). This DLBP method can successfully reduce the detection time and is of small size. In another
Most of the previous work is based on frontal face images. S. Moore et al. proposed a method that works on non-frontal images [16]. In this approach, multiple factors affecting different facial expressions (position, resolution, and global and local characteristics) have been investigated. The local binary pattern and its variations are used as texture descriptors. These features have an influence on orientation and multi-view FER. The authors used an appearance-based approach that divides the images into sub-blocks and then uses support vector machines to learn pose-dependent facial expressions.
Carlos Orrite et al. [17] described a method for emotion detection from a still image. HOG is used to capture local appearance and face shape. First, the authors use a Bayesian equation for computing class-specific edge distributions and log-likelihood maps over the entire aligned training set. A hierarchical decision tree is then formed by recursively clustering and merging the classes at each level using a bottom-up technique. A list of potentially discriminative HOG features is made for each part of the tree, utilizing the log-probability maps to favor areas expected to be more discriminative. Finally, an SVM is used to identify the human emotion from still images taken in a semi-controlled environment. The authors use the Cohn-Kanade AU-coded facial expression database.
(Framework diagram: CNN, HOG, and landmark-detector features are extracted from the input database, merged into one feature descriptor, and fed to an SVM that outputs one of the seven emotions — happy, sad, angry, surprise, disgust, neutral, fear.)
Once the CNN, HOG, and landmark features are extracted, they are fused and fed into the SVM classifier to predict the emotions.
\mathrm{Conv}_{W,B}(I^t)_{i,j,k} = \sum_{x=0}^{p-1}\sum_{y=0}^{q-1}\sum_{u=0}^{c-1} I^t_{i-x,\,j-y,\,u}\, W^t_{k,u,x,y} + B^t_k    (13.1)

The pooling layer divides the image into b × b cells and selects the maximum value from each cell. Each channel is treated independently. The mathematics of the pooling operation is represented in Equation (13.2).

s(I^t)_{i,j,k} = g(I^t_{i,j,k})    (13.3)
(Diagram: horizontal kernel [−1, 0, 1] and vertical kernel [−1, 0, 1]ᵀ applied around the target pixel (x, y) to compute the image gradients.)
I_1 = f_0(I_0),\quad I_2 = f_1(I_1),\quad \ldots,\quad I_t = f_{t-1}(I_{t-1})    (13.4)

L = \sum_{i=0}^{N-1} \lVert Y_i - f(X_i) \rVert_2^2    (13.5)
Next, the HOG is created for each cell. For the histogram, Q angle bins are selected (e.g., Q = 9). Usually, unsigned orientation is used, which increases angles below 0° by 180°. Because different images may have different contrasts, contrast normalization can be very useful. Normalization takes place within a block on the histogram vector v. Each detector window is assigned a descriptor. This descriptor is made up of all the cells in the detector window for each block. The detector-window descriptor carries the information used for object recognition. Using this descriptor, training and testing take place. There are many possible methods to classify objects using this descriptor, such as SVMs and neural networks.
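A minimal sketch of HOG extraction with scikit-image, assuming a grayscale face image face (e.g., a 48 × 48 FER 2013 crop); the cell and block sizes are illustrative.

from skimage.feature import hog

hog_features = hog(face,
                   orientations=9,           # Q = 9 angle bins
                   pixels_per_cell=(8, 8),   # cell size
                   cells_per_block=(2, 2),   # block used for contrast normalization
                   block_norm="L2-Hys")
print(hog_features.shape)                    # one descriptor per detector window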
Some landmark detection methods use CNNs or RNNs (Recurrent Neural Networks), while others use basic (but fast) features to estimate the point locations.
Dlib's face landmark detection algorithm is an implementation of the 2014 Kazemi and Sullivan Ensemble of Regression Trees (ERT) [19]. This technique uses simple and quick features (differences in pixel intensity) to estimate the landmark positions directly. Subsequently, these estimated positions are refined by a cascade of regressors in an iterative process. The regressors produce a new estimate from the previous one, attempting at each iteration to reduce the alignment error of the estimated points. The algorithm is blazingly fast.
Basically, a shape predictor can be generated from a set of images, annotations, and training options. A single annotation consists of the face region and the labeled points we want to locate. The face region can easily be obtained by any face detection algorithm (such as OpenCV HaarCascade, the Dlib HOG detector, or CNN detectors), but the points must be labeled manually or detected by already available landmark detectors and models (such as ERT with SP68). Finally, the training options are a set of parameters defining the characteristics of the trained model. These parameters can be adjusted to obtain, more or less, the desired behavior of the generated model.
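A minimal sketch of 68-point landmark extraction with dlib, assuming the standard pre-trained ERT model file (shape_predictor_68_face_landmarks.dat) is available locally and gray_img is a grayscale face image.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_vector(gray_img):
    faces = detector(gray_img, 1)
    if not faces:
        return None
    shape = predictor(gray_img, faces[0])
    pts = np.array([[shape.part(k).x, shape.part(k).y] for k in range(68)])
    return pts.flatten()   # 136-dimensional feature vector for the SVM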
Figure 13.4 (a) Candidate hyperplanes H1, H2, and H3 separating two classes in the X1-X2 feature space. (b) The maximum-margin hyperplane H2.
hyperplanes that are at maximum distance from the feature points of both classes. In Figure 13.4b, H2 is the best hyperplane that we will choose for classification.
There are many possible hyperplanes that could be selected to separate the two classes of data points. Our goal is to find a plane with the maximum margin, i.e., the maximum distance between the data points of both classes. Maximizing the margin distance provides some reinforcement so that future data points can be classified with greater confidence.
In this work, we use different types of features, such as CNN, HoG, and landmark-based descriptors, to recognize facial emotions efficiently. Firstly, we use an automatic FER CNN for facial emotion recognition, with softmax as the last layer of the CNN to classify the emotion labels. The CNN was trained on five emotions and the results are tabulated in Table 13.1.
Table 13.1 shows the performance of the CNN-based classification results. In this work, we use the VGG model to extract the feature representation and a softmax layer to predict facial emotions. Based on the results with CNN-based features, the surprise emotion is recognized best among the expressions. With this method, an average accuracy of 68.57% is achieved.
We then use the automatic FER CNN for facial feature extraction, taking the features from the last dense layer of the CNN. These extracted features are then used to train an SVM for classification. We also use a handcrafted method, the 68-landmark detector implemented with the dlib library, to detect 68 landmark points from the facial image. These 68 landmark points are used to train an SVM, after which we classify the emotion labels of the test data. Next, we use another handcrafted method, the HoG feature descriptor, to extract features from the image. After detecting the HoG features from the image, we train them on the SVM for emotion classification. The results of the CNN features with the SVM classifier and the HoG features with the SVM classifier on the seven emotion labels are tabulated in Table 13.2.
Table 13.2 Performance (%) of the different features with the SVM classification method on the FER 2013 dataset.

Emotion label   CNN + SVM accuracy (%)   Landmarks + SVM accuracy (%)   HoG + SVM accuracy (%)
0               66                       51                             50
1               79                       77                             71
2               57                       43                             44
3               87                       74                             75
4               61                       50                             51
5               65                       61                             64
6               68                       59                             62
Average         69                       59.2                           59.5
Table 13.2 shows the classification accuracy of the different features with the SVM on the emotion dataset. The CNN features with the SVM classification method give higher accuracy than the other features. The average accuracy of the CNN features with the SVM classification method is 69%, the highest, while the landmark descriptors and the HoG features with the SVM achieve average accuracies of 59.2% and 59.5%, respectively.
We then use the combination of the CNN (automatic) and the HoG and landmark detector (handcrafted) features. After collecting the CNN, HoG, and landmark-detector feature vectors corresponding to each image, we merge these features to build one merged feature vector. Finally, we train the SVM on this merged feature vector for expression classification, as sketched below. The result is tabulated in Table 13.3.
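A minimal sketch of this fusion step, assuming per-image feature arrays cnn_feats, hog_feats, and lm_feats (one row per image) and labels y have already been computed; the SVM kernel choice is illustrative.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

merged = np.hstack([cnn_feats, hog_feats, lm_feats])   # one merged vector per image
X_tr, X_te, y_tr, y_te = train_test_split(merged, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf")
clf.fit(X_tr, y_tr)
print("Accuracy:", clf.score(X_te, y_te))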
Table 13.3 shows the classification results of the merged features with the SVM on the emotion dataset. This combination of three features gives higher accuracy than the individual feature results.
Table 13.3 Fusion of CNN, landmark, and HoG features with SVM
classification accuracy results.
Method Accuracy (%)
Proposed Method (CNN + HoG + Landmark) 72
13.5 Conclusion
In this work, we addressed the task of FER, where our aim is to classify face images into one of seven discrete emotion categories that represent universal human emotions. We have used automatic as well as handcrafted methods to extract features from images: a convolutional neural network as the automatic feature extractor, and the histogram of oriented gradients and a landmark detector as handcrafted feature detectors. After that, we merge all three features to obtain a merged feature vector. Then, we train the SVM using the merged feature vector for classification. The proposed method classifies emotions more accurately than the individual facial features.
Acknowledgement
The authors would like to thank the anonymous reviewers for their thor-
ough review and valuable comments. This work was supported, in part, by
grants Government of India, Ministry of Human Resource Development
and NIT Warangal under NITW/CS/CSE-RSM/2018/908/3118 project.
References
1. Srinivas, M., Basil, T., Krishna Mohan, C., Adaptive learning based heartbeat
classification. Bio-med. Mater. Eng., 26, 1–2, 49–55, 2015.
2. Srinivas, M. and Krishna Mohan, C., Efficient clustering approach using
incremental and hierarchical clustering methods. The 2010 International
Joint Conference on Neural Networks (IJCNN), IEEE, 2010.
3. Goodfellow, J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B.,
Cukierski, W., Tang, Y., Thaler, D., Lee, D.-H., Zhou, Y., Ramaiah, C., Feng,
F., Li, R., Wang, X., Athanasakis, D., Shawe-Taylor, J., Milakov, M., Park, J.,
Ionescu, R.T., Popescu, M., Grozea, C., Bergstra, J., Xie, J., Romaszko, L.,
Xu, B., Bengio, Y., Challenges in Representation Learning: A report on three
machine learning contests, in: Proceedings of ICONIP, vol. 8228, LNCS
Springer-Verlag, pp. 117–124, 2013.
4. Srinivas, M., Lin, Y.-Y., Mark Liao, H.-Y., Learning deep and sparse feature
representation for fine-grained object recognition. 2017 IEEE International
Conference on Multimedia and Expo (ICME), IEEE, 2017.
5. Georgescu, M.-I., Ionescu, R.T., Popescu, M., Local learning with deep and handcrafted features for facial expression recognition. IEEE Access, 7, 64827–64836, 2019.
6. He, Z., Kan, M., Zhang, J., Chen, X., Shan, S., A fully end-to-end cascaded
CNN for facial landmark detection. IEEE International Conference on
Automatic Face and Gesture Recognition, 2017.
7. Khan, F., Facial Expression Recognition using Facial Landmark Detection
and Feature Extraction via Neural Networks, arXiv:1812.04510v2, 2018.
8. Happy, S.L. and Routray, A., Automatic facial expression recognition using features of salient facial patches. IEEE Trans. Affect. Comput., 6, 1, 1–12, 2014.
9. Perveen, N., Roy, D., Krishna Mohan, C., Spontaneous Expression Recognition
using Universal Attribute Model. IEEE Trans. Image Process., 27.11, 5575–
5584, 2018.
10. Zhang, T. and Zheng, W., A Deep Neural Network Driven Feature Learning
Method for Multi-view Facial Expression Recognition. IEEE Trans.
Multimed., 18.12, 2525–2536, 2016.
11. Ding, Y. et al., Facial Expression Recognition From Image Sequence Based
on LBP and Taylor Expansion. IEEE Access, 5, 19409–19419, 2017.
12. Sun, B., Li, L., Zhou, G., He, J., Facial expression recognition in the wild based
on multimodal texture features. J. Electron. Imaging, 25, 6, 407–407, 2016.
13. Wang, W., Chang, F., Liu, Y., Wu, X., Expression recognition method based
on evidence theory and local texture. Multimed. Tools Appl., 76.5, 7365–7379,
2016.
14. Lee, C.-Y., Xie, S., Gallagher, P.W., Zhang, Z., Tu, Z., Deeply Supervised Nets.
International Conference on Artificial Intelligence and Statistics, 2015.
15. Jumde, A.S., Sonavane, S.P., Behera, R.K., Face detection using data min-
ing approach. International Conference on Communications and Signal
Processing, 2015.
16. Moore, S. and Bowden, R., Local binary patterns for multi-view facial expres-
sion recognition. Comput. Vision Image Understanding, 115, 4, 541–558, 2011.
17. Orrite, C., Ganan, A., Rogez, G., Hog-based decision tree for facial expres-
sion classification. Pattern Recognition and Image Analysis, Springer,
pp. 176–183, 2009.
18. Srinivas, M., Roy, D., Krishna Mohan, C., Discriminative feature extraction
from X-ray images using deep convolutional neural networks. 2016 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP),
IEEE, 2016.
19. Kazemi, V. and Sullivan, J., One millisecond face alignment with an ensem-
ble of regression trees, in: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1867–1874, 2014.
20. Goodfellow, I.J. et al., Challenges in representation learning: A report on
three machine learning contests. International conference on neural informa-
tion processing, Springer, Berlin, Heidelberg, 2013.
22. Fathima, A. and Vaidehi, K., Review on facial expression recognition system
using machine learning techniques, in: Advances in Decision Sciences, Image
Processing, Security and Computer Vision, pp. 608–618, Springer, Cham, 2020.
Part 4
MACHINE LEARNING
FOR CLASSIFICATION
AND INFORMATION
RETRIEVAL SYSTEMS
14
AnimNet: An Animal Classification
Network using Deep Learning
Kanak Manjari1*, Kriti Singhal2, Madhushi Verma1 and Gaurav Singal1
2 Computer Science Engineering, Galgotias University, Gr. Noida, India
Abstract
Image classification is a combination of technologies: Image Processing (IP), Machine Learning (ML), and Computer Vision (CV). In this work, animals commonly found in the Indian scenario have been classified using two approaches: transfer learning and a custom-built classification network, i.e., AnimNet. For transfer learning, we have used the VGG16, VGG19, and Xception networks, which are existing pre-trained networks, and compared the results of the custom-built AnimNet network with them. The comparison was done on the basis of the accuracy and size of the models, as the size of a network is as important as its accuracy in this era of mobile computing. A lightweight network with good performance is the most optimal choice nowadays. The accuracy was observed to be highest for the Xception network, whereas the AnimNet network is lightweight, i.e., 5X smaller than the Xception model, with the second-highest accuracy.
Keywords: Image classification, animal detection, VGG16, VGG19, computer
vision, deep learning
14.1 Introduction
Classification is a methodical categorization of images into different classes based on their characteristics. Some classifiers are binary, leading to a yes/no conclusion. Others are multi-class, capable of classifying an object into one of many categories. Image classification emerged to reduce
the gap between computer and human vision by training the computer with data. The conventional methods used for image classification are part of the Artificial Intelligence (AI) field called Machine Learning (ML). ML includes two modules: a feature module and a classification module. The feature module is responsible for extracting viable features such as textures and edges. The classification module classifies based on the extracted characteristics. Classification is a very common use case of ML; classification algorithms are used to solve problems such as filtering email spam, categorization of documents, recognition of voice, recognition of images, and recognition of handwriting. The main drawback of ML is the need for a large amount of training data, which should be unbiased and of good quality. This is rectified by Deep Learning (DL).
DL is a sub-segment of ML, able to learn via its own computing method. DL uses a complex, multi-algorithm framework represented as an artificial neural network (ANN). The ANN's architecture replicates the biological neural network of the human brain. This makes DL more capable than the standard models of ML. In DL, we consider neural networks which recognize an image on the basis of its characteristics [1]. This is achieved by constructing a complete feature extraction model which can solve the difficulties faced by conventional methods. In DL, a computer model learns to perform classification tasks directly from images, text, or sound. DL models may achieve state-of-the-art accuracy, often exceeding human-level performance. Models are trained using a large set of labeled data and neural network architectures that involve multiple layers.
(Diagram: a simple neural network with inputs X1-X4 and output y.)
environment and annotated with noisy labels. To divide the training samples based on different characteristics, they used k-means clustering, which was then used to train various networks. The performance of the proposed method was evaluated on two publicly accessible camera-trap image datasets: Snapshot Serengeti [10] and the Panama-Netherlands datasets. The results suggested that the approach selected by the authors outperformed the state-of-the-art methods in the literature and enhanced the precision of the classification of animal species from camera-trap images which contain noise.
In paper [11], the authors developed a classification system for classifying images of real animals. The model was trained using toy photos of animals to account for factors other than just the physical appearance of animals. Segmentation was performed using the k-means clustering technique after pre-processing the image. Following segmentation, hog features were extracted from the segmented image. In the final step, the extracted features were used to classify the image into a suitable class using a supervised multi-SVM classifier. In paper [12], the authors shed light on different methods available for feature extraction and the features these methods extract to perform efficient identification and classification. They presented the results obtained from a dataset containing 111,467 photos used to train a CNN to identify 20 African wildlife species with an overall accuracy of 87.5%. In order to generate a visual similarity dendrogram of known organisms, hierarchical clustering of the feature vectors associated with each image was also used.
Authors in [13] assessed the accuracy of DL in classifying camera trap data, in processing datasets with only a few classified images that are generally difficult to model, and in applying a trained model to a live online citizen science project. CNNs were used to distinguish between images of various animal species, human or vehicle images, and empty images. Accuracies ranged between 91.2% and 98.0% for identifying empty images through programs, while accuracies ranged between 88.7% and 92.7% for identifying individual organisms.
Authors in [14] attempted to resolve the challenges faced in CNN-based fine-grained recognition. Generally, the need for training data is large and the learned feature representations are high-dimensional, leading to less efficiency. The authors suggested an approach where online dictionary learning is incorporated into the CNN to resolve these issues. The dictionary can be learned from a significant amount of weakly labeled information on the Internet through an incremental process. In the context of fine-grained image classification, the authors discussed the problems of elevated inter-class similarity and broad intra-class variations
1) Using Transfer Learning: We have trained our dataset using available pre-trained models, i.e., VGG19, VGG16, and Xception. In this section, we look into the structure of each of these models and perform training on the custom-built dataset. Using transfer learning, we first train a base network on a base dataset and then transfer the learned features to a second target network to be trained on a target dataset. This mechanism tends to work if the features are general, meaning that they are appropriate for both the base and target tasks rather than specific to the base task. Here, we use pre-trained weights and retrain the network using the custom dataset, which increases the accuracy; a small sketch of this setup is given below. The analysis of the obtained results is provided in a later section.
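A minimal sketch of this transfer-learning setup with Keras, assuming a VGG16 base pre-trained on ImageNet; the number of animal classes and the added dense layer are illustrative.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

NUM_CLASSES = 5   # hypothetical number of animal classes

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pre-trained features frozen

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])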
(Layer configurations of the VGG16 and VGG19 networks: stacked 3 × 3 convolution blocks of 64, 128, 256, and 512 filters separated by max pooling, followed by two FC-4096 layers, an FC-1000 layer, and a softmax output.)
for a large image dataset. Its frame size is 224 × 224. A fixed-size RGB image of size 224 × 224 is inputted into this network, so the input matrix has shape (224, 224, 3). The mean RGB value is subtracted from each pixel. Kernels of size 3 × 3 with a stride of 1 pixel are used, which makes it possible to cover the whole notion of the image. Spatial information of the image is retained with the help of spatial padding. A 2 × 2-pixel window with stride 2 is used to perform max pooling. A rectified linear unit (ReLU) is applied after this to add non-linearity; it works better than the tanh or sigmoid functions, which increase the computational time. Three fully connected layers are implemented: the first two layers have 4,096 channels each, and the third layer performs the ILSVRC 1,000-way classification and thus has 1,000 channels, one per class. Softmax is applied in the last layer for classification.
Xception [19] is a 71-layer deep convolutional neural network whose architecture relies completely on depth-wise separable convolution layers. A version of the network pre-trained on the ImageNet database, which contains over a million images, can be loaded; as a result, the network has learned rich feature representations for a wide array of images. The network takes input images of size 299 × 299. The architecture is based on the hypothesis that, in the feature maps of a CNN, the mapping of cross-channel correlations and spatial correlations can be completely decoupled. As this hypothesis is a stronger version of the hypothesis underlying the Inception architecture, the architecture was named "Xception", which stands for "Extreme Inception". The architecture consists of 36 convolutional layers which form the network's base for extracting features. These 36 layers are arranged into fourteen modules, all of which, except the first and last, have linear residual connections around them.
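As a concrete illustration of the transfer learning setup described in point 1, the following is a minimal Keras sketch that loads one of the pre-trained models (VGG16 here) with frozen ImageNet weights and attaches a small classification head for the custom animal dataset. The number of classes, the head layers, the optimizer, and the dataset objects (train_ds, val_ds) are assumptions for illustration, not the exact configuration used by the authors.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16  # VGG19 or Xception can be swapped in

NUM_CLASSES = 6  # assumed: dog, cat, cow, horse, goat, monkey

# Load the convolutional base with pre-trained ImageNet weights and no top classifier
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained features during initial retraining

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=30)  # train_ds/val_ds built from the custom dataset
```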
2) Creating a Custom Model (AnimNet): The main aim of developing a custom model is to have a well-performing model that is lightweight enough to be suitable for mobile devices. AnimNet required no pre-trained weights for training. The model has been trained from scratch by adjusting the weights and adding customized layers, keeping the proposed model as light as possible while retaining enough neurons for efficient feature extraction and good accuracy. The convolution is followed by max pooling, and then a dropout of 0.5 has been added. The input shape is (128 × 128 × 3), the kernel is of size (3 × 3), and an l2 regularizer of 0.01 has been used. After this, flattening is done, followed by dense layers and dropout. RMSprop has been used as the optimizer, ReLU as the activation function, and categorical cross-entropy as the loss for training. The typical classification and localization network architecture is shown in Figure 14.3, with an additional regression head on the top right of the CNN classification network.

[Figure 14.3: a shared feature map feeding both a classification head and a regression head.]
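Based on the description above (128 × 128 × 3 input, 3 × 3 kernels, an l2 regularizer of 0.01, max pooling, dropout of 0.5, dense layers, RMSprop, and categorical cross-entropy), a minimal Keras sketch of an AnimNet-like classifier might look as follows. The filter counts, dense-layer width, and number of classes are assumptions, since the chapter does not state them.

```python
from tensorflow.keras import layers, models, regularizers

NUM_CLASSES = 6  # assumed number of animal classes

model = models.Sequential([
    # Convolution with a 3x3 kernel and l2(0.01) regularization on a 128x128x3 input
    layers.Conv2D(32, (3, 3), activation="relu",
                  kernel_regularizer=regularizers.l2(0.01),
                  input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```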
14.4 Results
The results achieved by applying transfer learning and by AnimNet are presented in this section. The training, validation, and testing performance of the models is shown in graphs and tables. The training loss, training accuracy, validation loss, validation accuracy, and size of the network are the performance evaluation parameters used in this chapter to discuss the performance of the networks. The loss and accuracy achieved during the training and validation process for each of the networks have been compared to analyze their performance. Along with the loss and accuracy, the size of each network has been compared to understand its compatibility with mobile devices. These evaluation parameters are discussed briefly below:
Figure 14.4 Plots of Xception network. (a) Training loss of Xception network (b) accuracy of Xception network.
Figure 14.5 Plots of AnimNet network. (a) Training loss of AnimNet network (b) accuracy of AnimNet network.
classification of each image by both networks is almost the same, i.e., 9–11 ms/step. By looking at the validation accuracy obtained by these two models, it can be assumed that the Xception model would provide better accuracy when predicting images from our test dataset. Although the Xception network achieved the highest accuracy, the accuracy of AnimNet is also acceptable. However, if the need is to have a lightweight as well as accurate model, then the proposed network (AnimNet) may be preferred. The test results of the models on the test images, along with the accuracy obtained for each class, have also been shown. The test results obtained on test images for the Xception and AnimNet (custom-built) networks are shown in Figures 14.6, 14.7, 14.8, 14.9, 14.10, and 14.11, respectively. These results were obtained after training all the models on the self-created dataset. It has been observed from the test results that the accuracy was higher when the Xception model was used, as it is pre-trained on a large dataset. Although
Figure 14.6 Accuracy of class "dog" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
Figure 14.7 Accuracy of class "cat" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
Figure 14.8 Accuracy of class "cow" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
Figure 14.9 Accuracy of class "horse" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
Figure 14.10 Accuracy of class "goat" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
Figure 14.11 Accuracy of class "monkey" (size of image = 200*200 pixels). (a) Using Xception network (b) using AnimNet network.
14.5 Conclusion
A quantitative as well as qualitative performance analysis of the pre-trained networks and AnimNet has been provided in this chapter for clear understanding. As can be observed from the result analysis, the accuracy achieved by the AnimNet network is good even though the images contain many things other than animals, such as humans and vehicles. A demonstration of how a
References
1. Manjari, K., Verma, M., Singal, G., A Survey on Assistive Technology for
Visually Impaired. Internet Things, Elsevier, 11, 100188, 2020.
2. Manjari, K., Verma, M., Singal, G., A Travel Aid for Visually Impaired:
R-Cane, in: International Conference on Smart City and Informatization,
2019, November, Springer, Singapore, pp. 404–417.
3. Jain, A.K., Mao, J., Mohiuddin, K.M., Artificial neural networks: A tutorial.
Computer, 29, 3, 31–44, 1996.
4. Torrey, L. and Shavlik, J., Transfer learning, in: Handbook of research on
machine learning applications and trends: algorithms, methods, and tech-
niques, pp. 242–264, IGI Global, New York, 2010.
5. Manjari, K., Verma, M., Singal, G., CREATION: Computational ConstRained
Travel Aid for Object Detection in Outdoor eNvironment, in: 2019 15th
IEEE International Conference on Signal-Image Technology & Internet-Based
Systems (SITIS), 2019, November, pp. 247–254, 2020.
6. Suryawanshi, M.S., Jogdande, M.V., Mane, M.A., Animal classification using
Deep learning. Int. J. Eng. Appl. Sci. Technol., 6, 305–307, 2020.
7. Khan, R.H., Kang, K.W., Lim, S.J., Youn, S.D., Kwon, O.J., Lee, S.H., Kwon,
K.R., Animal Face Classification using Dual Deep Convolutional Neural
Network. J. Korea Multimed. Soc., 23, 4, Korea multimedia society, 525–538,
2020.
8. Lin, L. and Link, Y., Household Animals Classification Using Deep Learning,
CS230: Deep Learning, Winter 2020, Stanford University, CA, 2020.
9. Ahmed, A., Yousif, H., Kays, R., He, Z., Animal species classification using
deep neural networks with noise labels. Ecol. Inf., 57, 101063, 2020.
10. Swanson, A., Kosmala, M., Lintott, C., Simpson, R., Smith, A., Packer, C.,
Snapshot Serengeti, high-frequency annotated camera trap images of 40
mammalian species in an African savanna. Sci. Data, 2, 1, 1–14, 2015.
11. Nanditha, D. and Manohar, N., Classification of Animals Using Toy Images,
in: 2020 4th International Conference on Intelligent Computing and Control
Systems (ICICCS), IEEE, pp. 680–684, 2020, May.
12. Miao, Z., Gaynor, K.M., Wang, J., Liu, Z., Muellerklein, O., Norouzzadeh,
M.S., Getz, W.M., Insights and approaches using deep learning to classify
wildlife. Sci. Rep., 9, 1, 1–9, 2019.
13. Willi, M., Pitman, R.T., Cardoso, A.W., Locke, C., Swanson, A., Boyer, A.,
Fortson, L., Identifying animal species in camera trap images using deep
learning and citizen science. Methods Ecol. Evol., Wiley, 10, 1, 80–91, 2019.
14. Srinivas, M., Lin, Y.Y., Liao, H.Y.M., Learning deep and sparse feature rep-
resentation for fine-grained object recognition, in: 2017 IEEE International
Conference on Multimedia and Expo (ICME), IEEE, pp. 1458–1463, 2017,
July.
15. Srinivas, M., Lin, Y.Y., Liao, H.Y.M., Deep dictionary learning for fine-
grained image classification, in: 2017 IEEE International Conference on Image
Processing (ICIP), IEEE, pp. 835–839, 2017, September.
16. Alippi, C., Disabato, S., Roveri, M., Moving convolutional neural networks
to embedded systems: the alexnet and VGG-16 case, in: 2018 17th ACM/
IEEE International Conference on Information Processing in Sensor Networks
(IPSN), pp. 212–223, 2018, April.
17. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., Imagenet: A large-
scale hierarchical image database, in: 2009 IEEE conference on computer
vision and pattern recognition, pp. 248–255, 2009, June.
18. Carvalho, T., De Rezende, E.R., Alves, M.T., Balieiro, F.K., Sovat, R.B.,
Exposing computer generated images by eye’s region classification via trans-
fer learning of VGG19 CNN, in: 2017 16th IEEE International Conference on
Machine Learning and Applications (ICMLA), 2017, December, pp. 866–870.
19. Chollet, F., Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
15
A Hybrid Approach for Feature
Extraction From Reviews to
Perform Sentiment Analysis
Alok Kumar* and Renu Jain†
Abstract
In this chapter, a hybrid approach to extract the important attributes called
features of a product or a service or a professional from the textual reviews/
feedbacks has been proposed. The approach makes use of topic modeling
concepts and the linguistic knowledge embedded in the text using Natural
Language Processing tools. A system has been implemented and tested taking
the feedbacks of two different domains: feedback of teachers and feedback of
laptops. The system tries to extract all those features (single word and multi-
ple words) for which users have expressed their opinion in the reviews. The
syntactic category and the frequency contribute to deciding the importance of a feature, and a numerical value between zero and one, called weight, is generated for each identified feature, representing its significance. Results obtained from the proposed system are comparable to those obtained when features are extracted from the text manually.
15.1 Introduction
Sentiment analysis is a process to understand and quantify the feelings or emotions of users from text, called reviews, expressed in a natural language. It tries to quantify the effectiveness and usefulness of a product/service/professional on the basis of experiences shared by different users through written communication. Sentiment analysis can simply classify something as good or bad, but a fine-grained sentiment analysis is performed through feature-based analysis. Feature-based analysis takes as input a list of features, and then, for each feature, sentiments are evaluated according to some pre-defined scale. To get the feedback of users, traditionally, a questionnaire-based method (Kumar et al. [6]) is used, where a form containing a set of questions related to the item is given to users for their evaluation. Every question in the questionnaire represents an important aspect or property of the item and can be referred to as one of the features of the item. On the basis of users' responses, an overall analysis of the item, as well as the required question-wise analysis, is done. However, through a questionnaire, users give their opinion only about those attributes/features which are mentioned in the form and are deprived of expressing their views on other aspects, if there are any. The selection of appropriate features of any item plays a very important role in sentiment analysis of that item.
Though there is no standard method to decide the list of features for inclusion in the questionnaire, most of the time, some authorities decide features on the basis of their knowledge and experience. If we think about how each one of us would write feedback about a person or a product, we tend to write only about those features which are important to us, while other persons may comment on an entirely different set of features depending upon their requirements. Hence, the accurate way of identifying the features would be to extract the features from the feedbacks themselves. In addition to this, the importance of each feature may vary according to the domain/environment. For example, "knowledge" and "dress sense" are two features for teachers, but their weights would be different according to their importance while calculating the effective sentiment score of a teacher. Hence, to correctly evaluate the overall usefulness/performance of any product or any individual, the list of relevant features needs to be identified and categorized into strong and weak features.
In this chapter, we have proposed an unsupervised statistical method to identify relevant features from feedbacks. A weight, estimated statistically and representing the importance of a feature, is associated with each feature. The implementation has been done for two sets of data: one for feedbacks of laptops and another one for feedbacks of teachers. Feedbacks
Liu et al. [7] have modified the double propagation algorithm given by Qiu et al. [16] by including semantic similarity and aspect association–based knowledge. It performs better than the baseline algorithm and is also able to capture multi-word aspect terms with high accuracy. Siqueira et al. [19] proposed a method to extract features from the reviews of services. This method is used in the design of a prototype system, "What Matter". Valid features are identified using linguistic knowledge. It works in four steps, i.e., frequent noun identification, relevant noun identification, feature indicator mapping, and unrelated noun removal.
Hamdan et al. [4] proposed a Conditional Random Field (CRF)–based system to extract multi-word aspect terms in reviews. The authors have used IOB (Inside, Outside, Beginning) notation in review sentences. Each single word is extracted with a set of features, and on the basis of the extracted features, it is decided whether a word is a valid aspect or not. The system is implemented using CRFsuite. Wang et al. [20] proposed a model based on Restricted Boltzmann Machines to extract aspect terms and sentiment words in an unsupervised setting. The model is designed with three types of hidden layers in the neural network, representing aspect, sentiment, and background, respectively.
Poria et al. [13] proposed a method to tag each word in a review as an aspect term or a non-aspect term. The authors used a seven-layer deep convolutional neural network for it, and linguistic patterns were also combined with the neural network. Chen et al. [1] proposed a clustering-based method for feature extraction and feature categorization simultaneously. Features are grouped on the basis of their domain similarities and merging constraints. Domain similarities are captured with the public lexicon WordNet. Hamdan [3] used CRF to learn a system for feature extraction. The main features used by the author were the terms themselves, the part-of-speech tag of each term, word shape (capital letter, small letter, digit, punctuation, and other symbols), word type (uppercase, digit, symbol, and combination), prefixes (all prefixes having length between one and four), suffixes (all suffixes having length between one and four), and stop word (whether the word is a stop word or not). Pranali and Borikar [15] suggested an artificial neural network system to classify movie reviews. The authors trained and tested the system using training and test data sets collected from public IMDB movie reviews. In this system, fuzzy logic was used to handle sentiment negations like no, not, and never. The integration of fuzzy logic and the artificial neural network was helpful in improving system accuracy.
Huey and Renganathan [5] proposed a system to find the sentiment polarity of comments on electronics items written in the Chinese language. The system is implemented using a Maximum Entropy–based approach.
P(w) = Σ_{j=1}^{k} π_{s,j} · P(w|Θ_j)    (15.1)

where π_{s,j} is the coverage of sentence s by topic j, and P(w|Θ_j) is the probability of word w in the word distribution of topic j. Logarithmic probabilities have been used instead of direct probabilities to perform computations to the desired number of decimals, because normal probability values are of the order of 10^−5. The logarithmic probability of a sentence s, Log(P(s)), is defined as follows:

Log(P(s)) = Σ_{w∈V} C(w,s) · log Σ_{j=1}^{k} π_{s,j} · P(w|Θ_j)    (15.2)

where C(w,s) is the frequency count of word w in sentence s, and π_{s,j} and P(w|Θ_j) are as defined in Equation (15.1). Similarly, the logarithmic probability of the whole corpus is the sum over all individual feedbacks, i.e., P(C), given in Equation (15.3).
Log(P(C)) = Σ_{s∈C} Σ_{w∈V} C(w,s) · log Σ_{j=1}^{k} π_{s,j} · P(w|Θ_j)    (15.3)

The estimation is subject to two constraints:
(i) The word probabilities in each topic sum to 1, i.e., Σ_{i=1}^{N} P(w_i|Θ_j) = 1 for every topic j ∈ [1, K].
(ii) The coverage probabilities of all topics sum to 1 in each sentence, i.e., Σ_{j=1}^{k} π_{s,j} = 1 for every sentence s ∈ C.

Step 4: Using the inferred Z values to split the counts and then collecting the right counts, the parameters π_{s,j} and P(w|θ_j) are re-estimated as

π_{s,j} = [ Σ_{w∈V} C(w,s) · P(w|θ_j) ] / [ Σ_{j′∈K} Σ_{w∈V} C(w,s) · P(w|θ_{j′}) ]  and

P(w|θ_j) = [ Σ_{s∈C} C(w,s) · P(w|θ_j) ] / [ Σ_{w′∈V} Σ_{s∈C} C(w′,s) · P(w′|θ_j) ].
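The EM updates sketched above can be expressed compactly in code. The following is a minimal numpy sketch of PLSA over a sentence-by-word count matrix; the variable names, random initialization, and number of iterations are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def plsa(C, K, n_iter=50, seed=0):
    """Minimal PLSA via EM.
    C: (S x V) matrix of word counts C(w, s) per sentence.
    K: number of topics.
    Returns per-sentence topic coverage pi (S x K) and topic-word distributions theta (K x V)."""
    rng = np.random.default_rng(seed)
    S, V = C.shape
    pi = rng.random((S, K)); pi /= pi.sum(axis=1, keepdims=True)
    theta = rng.random((K, V)); theta /= theta.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior of the hidden topic z for every (sentence, word) pair
        post = pi[:, :, None] * theta[None, :, :]          # S x K x V
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: split the counts with the posteriors and re-estimate pi and theta
        weighted = C[:, None, :] * post                    # S x K x V
        pi = weighted.sum(axis=2)
        pi /= pi.sum(axis=1, keepdims=True) + 1e-12
        theta = weighted.sum(axis=0)
        theta /= theta.sum(axis=1, keepdims=True) + 1e-12
    return pi, theta
```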
[Figure: architecture of the proposed system — preprocessed feedbacks and the number of topics (K) are given to the Feature Generator (PLSA), which produces word distributions for Topic 1 through Topic k; these are passed to the Feature Collector, and a Feedback Collector and Feature Concluder complete the pipeline.]
Words that are adjectives or adverbs, like "good", "awesome", "bad", and "worst", are removed, and then all those words whose syntactic categories are not noun, adjective, or adverb are removed from the tentative list of features.
Terms in the seed feature list, and terms of the tentative list having higher context similarities with terms in the seed feature list, are added to the final feature list. Again, context similarities between terms in the final feature list and terms in the tentative feature list are computed, and highly contextually similar words of the tentative list are added to the final feature list. This process is continued as long as new terms keep being added to the final list. Paradigmatically similar terms of the word "Punctual" in teachers' feedbacks (on the x axis), with their contextual similarities (on the y axis), are shown in Figure 15.3.
From Figure 15.3, it is identified that "Regular" and "Disciplined" are more relevant compared to "Sweet" and "Slowly".
Table 15.1 Most strongly and weakly sentiment-associated words in teachers' feedbacks.

Kept in final feature list (strongly associated)      Removed from feature list (weakly associated)
Word            Degree of association                  Word       Degree of association
Teaching        0.96                                   Book       0.487805
Knowledge       0.918221                               Board      0.328694
Presentation    0.859541                               Bit        0.128694
Skills          0.837565                               Minutes    0.028694
Regular         0.812575                               Class      0.114347

Table 15.2 Most strongly and weakly sentiment-associated words in laptops' feedbacks.

Kept in final feature list (strongly associated)      Removed from final feature list (weakly associated)
Word            Degree of association                  Word       Degree of association
Price           0.898234                               Store      0.037645
Touchpad        0.897215                               Power      0.026792
Size            0.783419                               Night      0.024876
Screen          0.718394                               Area       0.018721
Keyboard        0.688890                               Network    0.012543
Step 14: All words in final feature list are the significant features of
the item.
Feature    Weight    Sentiment score (item 1)    Sentiment score (item 2)
F1         0.9       8                           3
F2         0.8       8                           8
F3         0.6       8                           9
F4         0.3       3                           5
F5         0.2       2                           7

Item 1: average sentiment score without considering weights = (8 + 8 + 8 + 3 + 2)/5 = 5.8; weighted average sentiment score considering weights = (8 × 0.9 + 8 × 0.8 + 8 × 0.6 + 3 × 0.3 + 2 × 0.2)/5 = 3.94.
Item 2: average sentiment score without considering weights = (3 + 8 + 9 + 5 + 7)/5 = 6.4; weighted average sentiment score considering weights = (3 × 0.9 + 8 × 0.8 + 9 × 0.6 + 5 × 0.3 + 7 × 0.2)/5 = 3.48.
Figure 15.4 Teachers' features with their importance (weights on the y axis for features such as Knowledge, Presentation, Intelligent, Teaching skill, Punctual, Young, Hair Style, Notes, etc.).
all features of item 2 are better than those of item 1 except feature F1, and the average sentiment score of item 2 is also better than that of item 1; but if the weights of the features are taken into account, then it can be seen that the effective sentiment score of item 1 is better than that of item 2.
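The computation illustrated in the example above can be reproduced with a few lines of Python; the feature weights and per-item scores below are taken directly from the example.

```python
def sentiment_scores(scores, weights):
    """Average sentiment score without weights and weighted average with feature weights."""
    plain = sum(scores) / len(scores)
    weighted = sum(s * w for s, w in zip(scores, weights)) / len(scores)
    return plain, weighted

weights = [0.9, 0.8, 0.6, 0.3, 0.2]                 # weights of features F1..F5
print(sentiment_scores([8, 8, 8, 3, 2], weights))   # item 1 -> (5.8, 3.94)
print(sentiment_scores([3, 8, 9, 5, 7], weights))   # item 2 -> (6.4, 3.48)
```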
It can be easily seen from Figure 15.4 that dominating or most important features like "Knowledge", "Presentation", and "Intelligent" have higher weights in comparison to features such as "Young", "Hair Style", and "Notes".
15.5 Conclusion
With the proposed approach, we are able to identify relevant features of a product/service/professional along with weights representing the significance of the features. The proposed system works in two phases: first, a tentative list of features is identified using the topic modeling–based technique PLSA, and then linguistic knowledge is used to extract the significant features. This system will be helpful in performing a fine-grained sentiment analysis on textual feedbacks. We have tested the proposed system with the feedbacks of laptops and teachers. In the future, we will test the proposed system on feedbacks of other domains, like feedbacks of movies, restaurants, books, employees, etc., to identify the significant features in each domain. Our future plan is to develop a topic model–based system to identify the sentiment orientation of the mentioned features. It can be integrated with our proposed feature extraction system to build a new feature-based sentiment analyzer. We are also planning to incorporate domain-specific linguistic knowledge to enhance the performance of our system.
References
1. Chen, L., Martineau, J., Cheng, D., Sheth, A., Clustering for Simultaneous
Extraction of Aspects and Features from Reviews. Proceedings of NAACL-
HLT, San Diego, California, pp. 789–799, 2016.
2. Collected Textual Feedback: Textual feedbacks collected from 120 engineering
students for 20 teachers of University Institute of Engineering and Technology,
CSJM University, Kanpur, 2018.
3. Hamdan, H., SentiSys at SemEval-2016 Task 5: Opinion Target Extraction
and Sentiment Polarity Detection. Proceedings of SemEval-2016, San Diego,
California, pp. 350–355, 2016.
4. Hamdan, H., Bellot, P., Bechet, F., Lsislif: CRF and Logistic Regression for
Opinion Target Extraction and Sentiment Polarity Analysis. Proceedings
of the 9th International Workshop on Semantic Evaluation (SemEval 2015),
Denver, Colorado, pp. 753–758, 2015.
5. Huey and Renganathan, H., Chinese Sentiment Analysis Using Maximum
Entropy, in: Proceedings of the Workshop on Sentiment Analysis where AI
meets Psychology (SAAIP), IJCNLP 2011, Chiang Mai, Thailand, November
13, 2011, pp. 89–93, 2011.
6. Kumar, A. and Jain, R., Sentiment Analysis and Feedback Evaluation. IEEE
3rd International Conference on MOOCs, Innovation and Technology in
Education (MITE), India, 2015.
7. Liu, Q., Liu, B., Zhang, Y., Kim, D.S., Gao, Z., Improving Opinion Aspect
Extraction Using Semantic Similarity and Aspect Associations. Proceedings
of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016.
8. Minqing, H. and Bing, L., Mining and summarizing customer reviews,
in: Proceedings of ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD-2004), 2004.
9. Muqtar U., Ameen, A., Raziuddin, S., Opinion Mining on Twitter Data using
Unsupervised Learning Technique. Int. J. Comput. Appl. (0975 – 8887), 148,
12, 12–19, 2016.
10. Myfaveteacher : Online Indian platform for teachers’ feedback.
11. NLTK: http://www.nltk.org.
12. Pavlopoulos, J. and Androutsopoulos, I., Aspect Term Extraction for
Sentiment Analysis: New Datasets, New Evaluation Measures and an
Improved Unsupervised Method. Proceedings of the 5th Workshop on
Language Analysis for Social Media (LASM), Gothenburg, Sweden, pp. 44–52,
2014.
13. Poria, S., Cambria, E., Gelbukh, A., Aspect extraction for opinion mining
with a deep convolutional neural network. Knowledge-Based Syst., 108,
42–49, 2016.
14. Poria, S., Cambria, E., Ku, L.W., Gui, C., Gelbukh, A., A Rule-Based Approach
to Aspect Extraction from Product Reviews. Proceedings of the Second
Workshop on Natural Language Processing for Social Media (SocialNLP),
Dublin, Ireland, pp. 28–37, 2014.
15. Pranali, P. and Borikar, D.A., An Approach to Sentiment Analysis using
Artificial Neural Network with Comparative Analysis of Different
Techniques. IOSR J. Comput. Eng. (IOSR-JCE), 18, 2, 64–69, 2016.
16. Qiu, G., Liu, B., Bu, J., Chen, C., Opinion Word Expansion and Target
Extraction through Double Propagation, Assoc. Comput. Linguist., 37, 1–21,
2011.
17. Ratemyprofessor: Online American platform for teachers’ feedback.
18. SemEval2014: http://alt.qcri.org/semeval2014/task4/index.php?id=data-and-tools.
19. Siqueira, H. and Barros, F., A Feature Extraction Process for Sentiment
Analysis of Opinions on Services. Proceeding of International Workshop on
Web and Text Intelligence, 2010.
20. Wang, L., Liu, K., Cao, Z., Zhao, J., Melo, G.D., Sentiment-Aspect Extraction
based on Restricted Boltzmann Machines. Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing, Beijing,
China, pp. 616–625, 2015.
16
Spark-Enhanced Deep Neural Network
Framework for Medical Phrase Embedding
Amol P. Bhopale* and Ashish Tiwari
Abstract
The fundamental problem in information retrieval (IR) techniques is that the limited set of query words, when used, may not be helpful in retrieving documents containing similar context words. Although word embedding has greatly benefited the field of NLP by considering words with similar meanings, it is not capable of dealing with contextually similar phrases. This paper presents a deep neural embedding–based solution which not only considers similar context words but also takes care of semantic phrases. To ensure scalability, a Spark-based map-reduce framework is employed to extract phrases using NLP techniques and prepare a new annotated dataset using these phrases. This paper uses the Word2Vec Continuous Bag-of-Words (CBOW) model to learn embeddings over the annotated dataset and extracts contextually similar phrases. Considering the advancements in the medical field and the requirement of effective IR techniques for clinical decision support using medical artefacts, the proposed methodology is evaluated on the dataset provided by the TREC-2014 CDS track. It consists of 733,125 PubMed articles, which have been used in many IR experiments. In the results, meaningful phrases and their contextually similar forms are observed. These phrases can further be used for the query expansion task.
16.1 Introduction
In numerous NLP applications, such as sentiment analysis, web search, and language translation, phrases are considered to be one of the important language units. It is assumed that words occurring in similar contexts usually have similar meanings [1]. Advancements in deep neural network techniques have enabled word vector representations that capture contextually similar forms with more information in hand. Compared to symbolic linguistic representations, these rich, informative embeddings have helped in various NLP tasks [2]. With the huge success of low-dimensional word embedding, it has been widely used in other domains, such as network [3] and user [4] embedding. Initially, embeddings were derived only for words, but in recent years, they have been extended to phrases as well. In information retrieval (IR) tasks, phrase embedding helps by enriching the meaning of a word with its contextually similar forms. It has many applications in scientific and medical domains, where technical concepts are mainly represented as multi-word terms.
The motivation is to perform medical phrase embedding over the vast and growing digital content about patients, diseases, and medical reports on the Internet. Many times, practitioners find difficulties in retrieving decision-supportive documents with fewer query words or using only acronyms. Phrase embeddings can be used to infer the attributes of the context they are enclosed in, and they can also capture non-compositional semantics. This paper presents a study to build a phrase embedding model trained on a large dataset. It is used for contextually similar phrase extraction based on similarity scores to leverage the query expansion task, which may improve the quality of the retrieved documents.
To achieve scalability, a Spark map-reduce framework is employed, which leverages CPU power using its multi-threading architecture to process a massive amount of data. The map function is used to split the dataset and generate phrases. This process is generalized to include phrases by identifying their boundaries at each special character and stop word; this technique is also known as chunking. The reduce function collects the frequent phrases, and the corpus is explicitly annotated with the derived phrases. For learning phrase embeddings, the adopted approach treats phrases as single units and learns to embed them in the same way as the word embedding technique proposed in paper [5]. The key contributions of this paper are as follows:
[Figure: a feed-forward neural network with an input layer, a hidden layer, and an output layer; phase 1 is forward propagation, phase 2 is error calculation, and phase 3 is backward propagation.]
The model maximizes the average log probability

(1/V) Σ_{t=1}^{V} Σ_{−c ≤ j ≤ c, j ≠ 0} log p(w_t | w_{t+j})    (16.1)

where c is the size of the context window, w_t is the target word, and V is the vocabulary size. The probability at the output is expressed in terms of the softmax function, as shown in Equation (16.2).
[Figure: CBOW architecture — C context words, each a V-dimensional one-hot input vector, are projected through the shared weight matrix W_{V×N} into an N-dimensional hidden layer and then mapped to a V-dimensional output layer.]
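As an illustration of the learning step, the following is a minimal sketch using gensim's Word2Vec in CBOW mode (sg=0) over the annotated corpus, in which extracted phrases are joined with underscores (e.g., "human_genome") so that each phrase is treated as a single token. The corpus path, hyperparameters, and gensim version (≥ 4, which uses vector_size) are assumptions for illustration.

```python
from gensim.models import Word2Vec

# Each line of the annotated corpus is a sentence with phrases joined by underscores,
# e.g. "the human_genome project sequenced ..."
with open("annotated_corpus.txt", encoding="utf-8") as f:   # assumed file name
    sentences = [line.lower().split() for line in f]

model = Word2Vec(sentences,
                 vector_size=200,   # embedding dimensionality (assumed)
                 window=5,          # context window size c
                 min_count=5,       # ignore rare tokens
                 sg=0,              # 0 = CBOW, as used in this chapter
                 workers=4)

# Contextually similar phrases/words for a target phrase, ranked by cosine similarity
print(model.wv.most_similar("human_genome", topn=10))
```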
16.5 Results
The objective of this study is to learn embeddings for phrases in the same way as word embeddings. This section describes the results obtained, i.e., the phrases extracted from the TREC dataset using the chunking technique and the Spark architecture (a minimal sketch of this chunking step is given below). In the latter part of this section, sample phrase and word embeddings generated by the proposed phrase embedding model are shown.
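The following PySpark sketch illustrates the map-reduce chunking and frequency counting described in the introduction; the stop-word list, corpus path, and frequency threshold are illustrative assumptions and not the authors' exact settings.

```python
import re
from pyspark.sql import SparkSession

STOP_WORDS = {"the", "of", "and", "in", "to", "a", "is", "with", "for", "on"}  # illustrative subset

def chunk_phrases(line):
    """Split a line at special characters and stop words and emit multi-word phrases."""
    tokens = re.split(r"[^a-z0-9]+", line.lower())
    phrases, current = [], []
    for tok in tokens:
        if not tok or tok in STOP_WORDS:          # phrase boundary
            if len(current) > 1:
                phrases.append("_".join(current))
            current = []
        else:
            current.append(tok)
    if len(current) > 1:
        phrases.append("_".join(current))
    return phrases

spark = SparkSession.builder.appName("phrase-chunking").getOrCreate()
lines = spark.sparkContext.textFile("pubmed_articles.txt")        # assumed corpus path
frequent_phrases = (lines.flatMap(chunk_phrases)                   # map: candidate phrases
                          .map(lambda p: (p, 1))
                          .reduceByKey(lambda a, b: a + b)         # reduce: phrase frequencies
                          .filter(lambda kv: kv[1] >= 50)          # assumed frequency threshold
                          .collect())
```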
Table 16.3 Sample phrases and their contextually similar terms with similarity scores.

Contextually similar to "chest pain": Palpitations (0.700), Syncope (0.699), Dyspnea (0.693), Epigastric pain (0.681), Haemoptysis (0.670), Abdominal pain (0.662), Dyspnoea (0.658), Palpitation (0.647), Pleuritic (0.646), Exertional (0.645).
Contextually similar to "human_genome": Genome (0.742), Mouse genome (0.710), Human genes (0.706), Entire genome (0.657), Genomes (0.656), Yeast genome (0.643), Human genomes (0.642), Coding regions (0.621), Mouse genomes (0.609), Plant genomes (0.608).
Contextually similar to "measles_virus": Poliovirus (0.661), Virus (0.627), Paramyxovirus (0.600), Edmonston (0.590), Vaccinia_virus (0.590), Influenza_virus (0.587), Viral (0.581), Measles_viruses (0.568), Reovirus (0.566), Ectromelia (0.563).
Contextually similar to "kidney_diseases": Renal_diseases (0.700), Kidney_disease (0.624), Renal_disease (0.613), Nephropathies (0.609), Nephropathy (0.547), Kidney_failure (0.533), Renal (0.528), Glomerulopathy (0.523), Kidney (0.516), Hyperfiltration (0.509).
Contextually similar to "spinal_surgery": Spine_surgery (0.791), Spinal_fusion (0.725), Discectomy (0.674), Laminoplasty (0.661), Surgery (0.642), Microdiscectomy (0.625), Decompression (0.609), Durotomy (0.609), Thyroid_surgery (0.605), Arthrodesis (0.598).
Contextually similar to "cardiac_arrest": Ohca (0.727), Resuscitation (0.718), cpr (0.689), Asystole (0.680), rosc (0.673), Cardiac_arrests (0.672), Pulseless (0.650), Comatose (0.629), Asystolic (0.624), Hypothermia (0.616).
Table 16.4 Sample words and their contextually similar terms with similarity scores.

Contextually similar to "tachycardic": Tachypneic (0.719), Pulse rate (0.678), Tachypnea (0.646), Afebrile (0.642), Diaphoretic (0.640), Vitals (0.606), Tachypnoea (0.605), Drowsy (0.582), Hypotensive (0.571), Diaphoresis (0.562).
Contextually similar to "ribosomal": rrna (0.743), Large subunit (0.659), Small subunit (0.651), ssu (0.626), Ribosome (0.614), lsu (0.598), rrna genes (0.571), rdna (0.571), rrnas (0.570), Processome (0.560).
Contextually similar to "echocardiogram": Angiogram (0.655), Echocardiograms (0.654), Cardiomegaly (0.655), Transesophageal (0.639), Transthoracic (0.619), Murmur (0.612), ekg (0.603), tte (0.600), Hypokinesis (0.579), Physical_exam (0.573).
Contextually similar to "dermatologist": Dermatologists (0.785), Dermatology (0.687), Dermatological (0.600), Rheumatologist (0.574), Acral (0.573), Dermoscopy (0.571), Allergist (0.568), Dermatoses (0.560), Skin_lesions (0.559), Artefacta (0.550).
Contextually similar to "distress": Anxiety (0.708), Psychological (0.664), Depression (0.643), Psychosocial (0.642), Feelings (0.629), Emotional (0.599), Hopelessness (0.593), Physical stress (0.593), Social problems (0.590), Somatization (0.586).
Contextually similar to "diabetes": dm (0.796), Type diabetes (0.767), Hypertension (0.714), Obesity (0.706), cvd (0.704), Dysglycemia (0.699), Hyperglycemia (0.688), Prediabetes (0.688), Diabetic (0.685), Dyslipidemia (0.677).
Meaningful medical concepts and their similarity scores are derived for each target phrase and word.
16.6 Conclusion
This paper has proposed an idea of phrase embedding on a pre-annotated corpus. The Spark-based map-reduce framework is employed to achieve scalability on a large-volume dataset, and the chunking technique is applied to extract frequent word phrases from it. A Word2Vec CBOW model–based phrase embedding technique is used to learn phrase and word representations in the same vector space and capture meaningful concepts. These concepts have many applications in the field of IR and recommender systems. They can be widely used for query expansion. In the future, it is proposed to investigate the effect of medical query expansion using the phrase embedding technique in the field of IR for clinical decision support.
References
1. Harris, Z.S., Distributional structure. Word, 10, 2–3, 146–162, 1954.
2. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.,
Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12,
2493–2537, 2011, Aug.
3. Grover, A. and Leskovec, J., node2vec: Scalable feature learning for networks.
Proceedings of the 22nd ACM SIGKDD international conference on Knowledge
discovery and data mining, 2016.
4. Yu, Y., Wan, X., Zhou, X., User embedding for scholarly microblog recom-
mendation. Proceedings of the 54th Annual Meeting of the Association for
Computational Linguistics (Volume 2: Short Papers), 2016.
5. Mikolov, T., Chen, K., Corrado, G., Dean, J., Efficient estimation of word
representations in vector space, 1st International Conference on Learning
Representations (ICLR) Scottsdale, Arizona, USA, Workshop Track
Proceedings, http://arxiv.org/abs/1301.3781, 2013
6. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman,
R., Indexing by latent semantic analysis. JASIST, 41, 6, 391–407, 1990.
7. Blei, D.M., Ng, A.Y., Jordan, M.I., Latent dirichlet allocation. J. Mach. Learn. Res., 3, 993–1022, 2003, Jan.
8. Baroni, M., Dinu, G., Kruszewski, G., Don’t count, predict! A systematic
comparison of context-counting vs. context-predicting semantic vectors, in:
Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pp. 238–247, 2014.
Abstract
Advancement in deep learning requires a significantly huge amount of data for training purposes, where the protection of individual data plays a key role in data privacy and publication. Recent developments in deep learning pose a huge challenge for traditionally used approaches to image anonymization, such as the model inversion attack, where an adversary repeatedly queries the model in order to reconstruct the original image from the anonymized image. In order to apply more protection in image anonymization, an approach is presented here to convert the input (raw) image into a new synthetic image by applying optimized noise to the latent space representation (LSR) of the original image. The synthetic image is anonymized by adding well-designed noise calculated over the gradient during the learning process, such that the resultant image is both realistic and immune to the model inversion attack. More precisely, we extend the approach proposed by T. Kim and J. Yang (2019) by using a Deep Convolutional Generative Adversarial Network (DCGAN) in order to make the approach more efficient. Our aim is to improve the efficiency of the model by changing the loss function to achieve optimal privacy in less time and computation. Finally, the proposed approach is demonstrated using a benchmark dataset. The experimental study shows that the proposed method can efficiently convert the input image into another synthetic image which is of high quality as well as immune to the model inversion attack.
17.1 Introduction
Image anonymization techniques have been developed to make it more difficult to identify a particular image from a provided altered image. In image anonymization, one is provided with the original image, and the task is to convert that original image into another, anonymized image, by changing pixels or adding random noise, etc., in such a way that an image recognition model, or a human eye, would not be able to label/recognize the original image. Image anonymization plays an important role today, as many tasks require realistic data, and these realistic data may contain confidential information of an individual, who does not want to disclose the information because making the dataset public would affect his or her privacy. Even if the dataset is kept private, if the model is not constructed properly, some attacks are possible over these private datasets that capture the original image by continuously querying the model.
Social media is also one of the domains where confidential data may be captured in an image, which raises privacy issues. Usually, social media platforms receive millions of images from their users, and these images may contain confidential information about a person which he/she does not want to disclose. This leads to privacy issues and may lead to legal actions. Image anonymization is one of the tasks which can be carried out to prevent such privacy issues and legal actions. Machine learning also requires a big amount of data to prepare a model; more precisely, this large amount of data is required to train the weights of the model so that the model also works for some random data. Therefore, a big amount of data is necessary to train these types of machine learning models. This large amount of data can be made public or just be used to train the model while keeping the dataset private. A public dataset is available to every user in the world who has access to the link to download it and should not contain any sensitive information corresponding to a user. A private dataset, however, is considered to contain confidential data; therefore, the privacy of such a dataset is also an important concern while training the model. Recent research in the field of adversarial machine learning reveals that training data can also be fetched from the trained model, creating a privacy issue for the training data. One such attack is the model inversion attack, in which the adversary queries the trained machine learning model over a random image and changes the random image according to the confidence value attained.
Another major motivation for image anonymization is the new European law, GDPR, under which an organization/institute cannot
not be sufficient to remove the risk of privacy leakage from the training data. To achieve the privacy-preserving goal, they have used objective perturbation, given in [2]. Synthetic data generation [3–5] has been a main focus for privacy-preserving data publication in recent years, where sensitive data is fully or partially replaced with synthetic data before it is allowed to be published. Beaulieu-Jones et al. [3] have used ACGAN and applied objective perturbation over it to generate sharable data. However, T. Kim and J. Yang [6] have stated that, in their prior attempts, a Generative Adversarial Network (GAN) [7, 8] with objective perturbation, trained on image datasets [9–12], led to mode collapse. Objective perturbation is complex, costly, and time consuming when used to generate synthetic data with a GAN in a differentially private way. Instead of performing these costly operations, T. Kim and J. Yang [6] have proposed a methodology that adds optimized noise to the latent space representation (LSR) of the image to generate an anonymized image which is also differentially private. In our study, we have improved the performance of this methodology by optimizing the loss function to provide the same privacy gain, or the same level of anonymization, in less time and computation. Recent research by Fredrikson et al. [13] introduced a medical case study by defining the model inversion attack, where the adversary had access to a machine learning–based model and was able to learn sensitive genomic information about some individuals. The authors in [14] stated that, using the model inversion attack, an adversary can recover (up to a certain degree) the original face of a particular person from a blurred face image. Such risks are removed by adding optimized noise generated via a neural network, which takes a random noise as input and optimizes this noise in such a way that, when added to the original image, it produces an anonymized image which is immune to these privacy attacks.
A flowchart for image anonymization using GAN is given in Figure 17.1: the latent space of an image is captured using an encoder network, well-designed noise is added to this latent space, and, using a decoder network, a new anonymized image is constructed. The quality of the constructed image is supervised using a discriminator, which allows the generator network to construct new anonymized images that are more realistic in nature. Figure 17.1 asserts the process used to convert a raw input image into another synthetic image. In this process, firstly, the LSR of an image is extracted, over which optimized noise is added to make a noised latent space representation. Then, the GAN is trained, in a min-max game, so that the model gives highly anonymized but realistic images.
The objective of this study is to present an image anonymization technique using GAN, which will transform an image by adding optimized noise
internals of the model (code and properties), whereas a black box attack can be performed by anyone who can query the model to generate the output. By observing the relationship between the input and output, the adversary tries to construct an inverse model for the given machine learning model. This inverse model will undo the changes made by our model, based on the parameters observed by fetching the relationship between the input and output.
17.2.3.1 Definition
"A randomized mechanism M: D → R with domain D and range R satisfies (ϵ, δ)-differential privacy if for any two adjacent inputs d, d′ ∈ D and for any subset of outputs S ⊆ R, it holds that Pr[M(d) ∈ S] ≤ e^ϵ · Pr[M(d′) ∈ S] + δ."
The real-valued function f: D → R is approximated with the help of the Gaussian and Laplace noise mechanisms (shown in Tables 17.1 and 17.2), which is done by adding noise calibrated to f's sensitivity s_f. This s_f can be calculated as the maximum absolute difference between f(d) and f(d′), where d and d′ are two adjacent inputs. Therefore, one can define s_f as s_f = max |f(d) − f(d′)|.
This paper has used the Gaussian noise mechanism in the noise amplification module. We can change the intensity (strength) of the noise by changing the values of ϵ and δ. One can also magnify (amplify) the noise from
If σ = √(2 log(1.25/δ)) · s_f / ϵ, then the Gaussian mechanism satisfies (ϵ, δ)-differential privacy.
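A minimal numpy sketch of this Gaussian mechanism is given below: the noise scale σ is computed from the sensitivity, ϵ, and δ, and the calibrated noise is added to a latent space representation. The function names and the way the latent vector is obtained are assumptions for illustration.

```python
import numpy as np

def gaussian_noise_scale(sensitivity, epsilon, delta):
    """sigma = sqrt(2 * log(1.25 / delta)) * sensitivity / epsilon, the Gaussian-mechanism scale."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def add_calibrated_noise(z, sensitivity, epsilon=1.0, delta=1e-8, rng=None):
    """Add Gaussian noise calibrated to the sensitivity to a latent vector z."""
    rng = rng or np.random.default_rng()
    sigma = gaussian_noise_scale(sensitivity, epsilon, delta)
    return z + rng.normal(0.0, sigma, size=z.shape)
```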
for certain relationships or rules in the input data to classify whether the provided email is spam or not. Therefore, we can say that it behaves like a two-way classifier. In the same way, the discriminator network in a GAN is trained on real as well as fake images, and its task is to label the real image as real (1) and the fake image as fake (0). Unlike the discriminator, the generator network tries to predict the features of an image that are realistic, or that can be passed through a discriminator network as real image features. The generator network tries to generate features that are like the real data distribution; it tries to predict what the features of the real data distribution should be. For example, in the same email spam task, a generator would try to generate emails which are most likely to be spam. So, it can be stated that the generator tries to answer: if a given email is spam, how likely are its features? Based on this, the generator will generate emails which are most likely to be classified as spam, as they contain the features of spam. GANs basically consist of two neural networks: a generator (or protector) network and a discriminator (or critic) network. The generator or protector network generates random samples, and its task is to generate samples which are most likely to be classified as real data. The generator optimizes the weights of its network in such a way that, when a random sample is given to it, it is able to produce a new instance of data which has features similar to those of real data, or which, when passed through the discriminator, is classified as a real data instance. The discriminator or critic network classifies a data instance as real or fake, depending upon the training it receives from the real dataset, so that it can confidently classify real data instances as real (1). We first train our discriminator network to classify real images as real and fake images as fake, keeping the generator module non-trainable for those instances of real and fake data, i.e., we do not optimize the weights associated with the generator network but do optimize the weights associated with the discriminator network. Then, the generator network is trained to produce images from random noise and is optimized to construct synthetic images which will pass through the discriminator as real images. While training the generator network, we keep the discriminator network non-trainable. In the proposed method, the fake image is generated by adding noise to the real image, making it an anonymized image. Below are the steps a GAN takes to generate new data samples:
[Figure: GAN training loop — noise drawn from the latent space is passed to the generator to produce fake samples; the discriminator receives real samples and generated fake samples and is checked for whether it classifies them correctly.]
[Figure 17.3: two transport plans, γ1 and γ2, for moving blocks from positions 1, 2, and 3 to positions 7, 8, 9, and 10; both plans have a total transportation cost of 42 (6 + 6 + 6 + 6 + 2 × 9 = 42 and 6 + 6 + 6 + 8 + 9 + 7 = 42).]
For example, suppose one needs to move the blocks (Figure 17.3) from positions 1, 2, 3 to positions 7, 8, 9, 10 with a defined shape at the destination positions, as shown above. There are many ways of performing this task of moving the boxes from one location to another so that we have the defined shape at the destination. Each of these ways of performing the task is called a transportation plan, and each plan has a transportation cost calculated using |source position (i) − destination position (j)| × number of blocks moved (the value at i, j). In this way, one can calculate the transportation cost of each transportation plan, and the minimum among them is the Wasserstein distance.
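For one-dimensional discrete distributions like the block example, this minimum transportation cost can be computed directly, for instance with SciPy. The positions and counts below are illustrative and are not the exact values from the figure.

```python
from scipy.stats import wasserstein_distance

# Source: blocks at positions 1, 2, 3; destination: a desired shape at positions 7, 8, 9, 10.
source_positions = [1, 2, 3]
source_counts = [2, 1, 1]          # illustrative block counts
target_positions = [7, 8, 9, 10]
target_counts = [1, 1, 1, 1]

# SciPy returns the minimum average moving cost per unit of mass;
# multiplying by the total number of blocks gives the total transportation cost.
emd = wasserstein_distance(source_positions, target_positions,
                           u_weights=source_counts, v_weights=target_counts)
total_cost = emd * sum(source_counts)
print(emd, total_cost)
```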
g_w ← ∇_w [ (1/m) Σ_{i=1}^{m} f_w(x^{(i)}) − (1/m) Σ_{i=1}^{m} f_w(g_θ(z^{(i)})) ]    (17.3)
large or batch normalization is not used). From this, one can state that the performance of the model also depends upon the hyperparameter "c". To avoid this, we can use a gradient penalty instead of clipping to enforce the 1-Lipschitz constraint.
D_KL(P ∥ Q) = Σ_{x=1}^{N} P(x) · log( P(x) / Q(x) )    (17.6)
where D_KL(P ∥ Q) is the notation for KL divergence, and P and Q are two different distributions. The main intuition behind KL divergence is that the divergence is large when the probability under P is large but the probability under Q is small; in the same way, a large probability under Q but a small probability under P also yields a large divergence. In this way, we have a metric score to generalize the distance between two distributions. KL divergence is not symmetric in nature, i.e., KL(P ∥ Q) != KL(Q ∥ P). JS divergence uses KL divergence to normalize the score and make it symmetric in nature. That is, with JS divergence, the divergence from p to q is the same as that from q to p, i.e., JS(P ∥ Q) == JS(Q ∥ P). The JS divergence can be calculated using the formula given below:
D_JS(p ∥ q) = (1/2) · D_KL( p ∥ (p + q)/2 ) + (1/2) · D_KL( q ∥ (p + q)/2 )    (17.7)
divergence will have a vanishing gradient problem when the two data distributions are not overlapping.
17.2.9 DCGAN
DCGAN [25] is considered an extension of GAN with a few changes to the model, with the main change being the use of convolutional and convolutional-transpose layers in the discriminator and generator networks, respectively. In DCGAN, the discriminator consists of convolution layers, batch normalization layers, and leaky ReLU activations. The input to this discriminator is an image, whereas the output is a scalar probability stating whether this image is from the real data distribution or not. The generator consists of convolutional-transpose layers, batch norm layers, and ReLU activations; it takes a random latent vector as input and, using strided conv-transpose layers, converts the random latent vector into an image. DCGAN has a min-max game between the generator and the discriminator. The generator produces random samples, and we also have real samples; both are given to the discriminator, with real samples labeled 1 and fake samples labeled 0, whereas the generator is trained with fake images labeled 1. Therefore, the discriminator tries to correctly mark the fake images, and the generator tries to generate images which will receive the label 1. Below is the loss function used in DCGAN:
where M gets the original image I and anonymizes it into another image I′, N attempts to rebuild the original image I from the new anonymized image I′, and O checks the quality of I′.
The noise amplifier is added into the generator module in such a way that it will generate an anonymized image as well as prevent a model inversion attack over the trained machine learning model. The noise amplifier is trained in such a way that it will generate optimized noise which will help us attain differential privacy. The traditional image anonymization technique using a GAN and a noise amplifier was introduced in [6], but it used WGAN-GP as the loss function. WGAN-GP produces effective anonymized images but is time consuming and requires more computation power than a traditional GAN. The reason for this large overall training time and higher computation is basically the involvement of the gradient penalty, which has to be calculated to define the discriminator loss. This calculation of the gradient penalty makes the model more time consuming and requires more computation power. The gradient penalty is calculated to uphold the Lipschitz constraint. Another reason for the higher computation time is the fact that the discriminator in WGAN-GP must be trained with the discriminator loss "n" times for one generator update per epoch. This makes the WGAN-GP discriminator more time consuming, and because of this the overall time to train the model increases. Instead of using WGAN-GP, the DCGAN loss function is used to train our model. The DCGAN loss function is given below.
17.3.1 Algorithm
w_c ← Adam(∇_{w_c} (1/m) Σ_{i=1}^{m} L_c^{(i)}, w_c, α, β1, β2)
for t = 1, ..., n_attack:
    for i = 1, ..., m:
        Sample real data y ~ P_r
        ỹ ← P_θ(y)
        L_a^{(i)} ← ||y − A_{w_a}(ỹ)||²
    w_a ← Adam(∇_{w_a} (1/m) Σ_{i=1}^{m} L_a^{(i)}, w_a, α, β1, β2)
for i = 1, ..., m:
    Sample real data y ~ P_r
    ỹ ← P_θ(y)
    L_p^{(i)} ← −C_{w_c}(ỹ)
θ ← Adam(∇_θ (1/m) Σ_{i=1}^{m} L_p^{(i)}, θ, α, β1, β2)
17.3.2 Training
To evaluate the performance of the attacker network, we have defined a new term called privacy gain, where privacy gain is calculated as the L2 loss between the original image and the reconstructed image. The main objective of the adversary network is to minimize this privacy gain (L_priv). Privacy gain can be expressed as L_Priv = ||x − x′||₂, where x is the original image and x′ is the anonymized image. The discriminator tries to categorize the original image as real (label = 1) and the anonymized image as fake (label = 0). The DCGAN loss function is used since we are adding additional noise to the original image, so the data distributions of these two images overlap. The DCGAN loss function for the discriminator is expressed as
∇_{Θ_d} (1/m) Σ_{i=1}^{m} [ log D(x^{(i)}) + log(1 − D(G(z^{(i)}))) ]    (17.10)
where x^{(i)} is the ith original image, D(x^{(i)}) is the probability value of this image being categorized as a real image, z^{(i)} is the noise-added LSR of the original image, G(z^{(i)}) is the anonymized image, and (1 − D(G(z^{(i)}))) is the probability value of this anonymized image not being categorized as a real image. The generator in our experiment has been trained twice for one
The attacker's task is to minimize LPriv, i.e., LA = LPriv, so the attacker network tries to minimize this distance. When training of our model starts, all the weights associated with the protector, critic, and attacker networks are first fine-tuned by adversarial training. We then let the protector's parameters converge; when this happens, the protector is optimized to generate synthetic images that are realistic and immune to model inversion attack.
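As a small illustration of how the privacy gain defined above can be measured on a batch, the following sketch assumes P (protector) and A (attacker) are callables returning arrays; the names and shapes are placeholders, not the chapter's exact implementation.

```python
# Evaluate mean privacy gain L_priv = ||x - x_rec||^2 over a batch, where
# x_rec is the attacker's reconstruction of the anonymized image.
import numpy as np

def mean_privacy_gain(P, A, x_batch):
    x_anon = np.asarray(P(x_batch))   # anonymized images produced by the protector
    x_rec = np.asarray(A(x_anon))     # attacker's attempted reconstructions
    per_image = np.sum((np.asarray(x_batch) - x_rec) ** 2,
                       axis=tuple(range(1, x_rec.ndim)))
    return float(per_image.mean())    # larger values mean more privacy retained
```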
σ(Se; ε, δ) = √(2 log(1.25/δ)) ∗ Se / ε (17.13)
where xi and xj both belong to the sample images St. With the sensitivity value added to the anonymized vector, one obtains zi′ as zi′ = zi + σ(se; ε, δ) ∗ n0,
where zi is the original LSR of the ith image, se is the appropriate sensitivity, σ is the noise amplifier function, and n0 is the initial noise. The value of ε is 1 and δ is 1e−8.
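The noise amplification in Equation (17.13) can be sketched as below; ε = 1 and δ = 1e−8 follow the text, while the function names and the way the sensitivity estimate is supplied are assumptions.

```python
# Gaussian-mechanism noise scale and its application to a latent vector z.
import numpy as np

def noise_scale(se, epsilon=1.0, delta=1e-8):
    # sigma = sqrt(2 * log(1.25 / delta)) * S_e / epsilon   (Equation 17.13)
    return np.sqrt(2.0 * np.log(1.25 / delta)) * se / epsilon

def anonymize_latent(z, se, epsilon=1.0, delta=1e-8, rng=None):
    rng = rng or np.random.default_rng()
    n0 = rng.standard_normal(z.shape)                  # initial noise n0
    return z + noise_scale(se, epsilon, delta) * n0    # z_i' = z_i + sigma * n0
```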
17.3.4 Dataset
This study uses the MNIST dataset, downloaded through the "tensorflow.examples.tutorials.mnist" library with its input_data helper, and divided into three sections: 55,000 datapoints of training data (mnist.train), 10,000 datapoints of test data (mnist.test), and 5,000 datapoints of validation data (mnist.validation).
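A loading sketch is given below, assuming a TensorFlow 1.x environment in which this tutorial helper is still available (it is removed in TensorFlow 2); the download directory is illustrative.

```python
# Load MNIST with the TF 1.x tutorial helper referenced in the text.
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.num_examples)       # 55,000 training datapoints
print(mnist.test.num_examples)        # 10,000 test datapoints
print(mnist.validation.num_examples)  # 5,000 validation datapoints
```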
To keep the final image of the decoder in the range [0, 1], sigmoid is used instead of ReLU as the last activation function of the decoder in the attacker. The discriminator network takes the image x′ of size N*N*K generated by the generator network and determines whether it is fake or real. Discriminator C contains four or five convolutional layers with 5*5 kernels and stride 2, each followed by batch normalization and leaky ReLU. Five convolution layers are used for images of size 64*64*3, i.e., for color images, and four convolution layers for the remaining image sizes. The output of the discriminator network is passed through a sigmoid layer.
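A Keras sketch of such a discriminator is shown below; the filter counts, the default 28*28*1 input, and the block count of four are illustrative choices, with five blocks implied for 64*64*3 inputs.

```python
# Discriminator C: 5x5 convolutions with stride 2, batch normalization,
# LeakyReLU, and a final sigmoid that outputs the real/fake probability.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(28, 28, 1), blocks=4):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in [64, 128, 256, 512, 1024][:blocks]:
        x = layers.Conv2D(filters, kernel_size=5, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="discriminator_C")
```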
17.3.6 Working
The encoder on the generator's side takes the image of size N*N*K and converts it into a vector representation of that image (z). To this vector representation we add noise generated by the noise amplifier, and the modified vector representation is given to the generator's decoder, which converts it into a modified image, our anonymized image. The attacker is an encoder-decoder network that takes this anonymized image as input and tries to regenerate the original image from it. The image generated by the generator network is given to the discriminator to evaluate, i.e., to check whether it is real or not. The discriminator evaluates the performance of the generator network so that the generator produces more realistic images, from which it is hard for the attacker to recover the original image.
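The forward pass of this pipeline can be sketched as follows, assuming encoder, decoder, and attacker are Keras models and sigma comes from the noise-amplifier computation sketched earlier; all names are illustrative.

```python
# Forward pass: image -> latent z -> noise-amplified latent -> anonymized image,
# which the attacker then tries to invert back to the original.
import tensorflow as tf

def anonymize(encoder, decoder, x, sigma):
    z = encoder(x, training=False)                        # latent representation z
    z_noisy = z + sigma * tf.random.normal(tf.shape(z))   # add amplified noise
    return decoder(z_noisy, training=False)               # anonymized image x'

def attempt_inversion(attacker, x_anon):
    return attacker(x_anon, training=False)               # attacker's reconstruction
```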
The proposed model was trained over the MNIST dataset using both the WGAN-GP and the DCGAN loss functions. Both loss functions give nearly the same privacy gain, but DCGAN takes less time and computation compared with WGAN-GP.
(Figure: privacy gain on the y-axis, ranging from 0.1 to 0.5, compared for WGAN-GP and DCGAN.)
the original image, and this noise is also first optimized. Therefore, from Figure 17.7, we can state that DCGAN provides the same privacy gain in much less time and also requires less computation than WGAN-GP. Figure 17.7 comprises the anonymized image (leftmost), the original image (middle), and the data-distribution graph of the anonymized image (red) vs. the original image (blue).
Figure 17.7 MNIST data distribution: the left figure shows the anonymized data, the middle one shows the original data, and the right figure shows the overlap between the two distributions.
These images are taken from the first few images given to our proposed model at the initial stage of training. Therefore, it is concluded that the vanishing-gradient problem will not arise in our proposed model, because the data distributions of the original and anonymized images overlap. So, instead of WGAN-GP, we can use the DCGAN loss function to train our model.
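A matplotlib sketch of how a comparison like Figure 17.7 can be reproduced is given below, assuming x_orig and x_anon are 2-D grayscale arrays; all plotting choices are illustrative.

```python
# Show anonymized and original images side by side and overlay histograms of
# their pixel values to visualize the distribution overlap.
import matplotlib.pyplot as plt

def plot_overlap(x_orig, x_anon):
    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    axes[0].imshow(x_anon, cmap="gray"); axes[0].set_title("anonymized data")
    axes[1].imshow(x_orig, cmap="gray"); axes[1].set_title("original data")
    axes[2].hist(x_anon.ravel(), bins=50, alpha=0.5, color="red", label="anonymized")
    axes[2].hist(x_orig.ravel(), bins=50, alpha=0.5, color="blue", label="original")
    axes[2].legend()
    plt.tight_layout()
    plt.show()
```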
17.5 Conclusion
Image anonymization is important for protecting the privacy of a user in images captured with or without the user's consent, while still allowing those images to be used for development purposes, mainly to train machine learning models. It also allows an organization to work on those images without violating the European Union's data privacy regulation (GDPR). Therefore, it is important to anonymize such images to protect user privacy as well as to provide compliance with GDPR. In this study, we have presented a training model that anonymizes an image by adding noise to its latent feature vector, constructing new anonymized images that are realistic and immune to model inversion attack. More precisely, we have presented an image anonymization methodology for preventing model inversion attacks that requires less time and computation power than the traditional image anonymization approach. In the future, to enhance the efficiency of the network or to increase the privacy gain, one could use an RNN-type structure to add noise at every convolutional layer, optimized with a dense layer. The future scope of this work can also be directed toward face and background anonymization.
In the same way, the approach can be applied to gait, speech, video, and text anonymization, and extended to other domains.
References
1. Chaudhuri, K., Monteleoni, C., Sarwate, A.D., Differentially private empiri-
cal risk minimization. J. Mach. Learn. Res., 12, 1069–1109, 2011.
2. Dwork, C., Differential privacy, in: Automata languages and programming,
pp. 1–12, Springer, Berlin, Germany, 2006.
3. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Greene, C.S., Privacy-preserving
generative deep neural networks support clinical data sharing. BioRxiv, 2017.
4. Li, H., Xiong, L., Zhang, L., Jiang, X., DPSynthesizer: Differentially private
data synthesizer for privacy preserving data sharing. Proc. VLDB Endowment,
vol. 7, pp. 1677–1680, 2014.
5. Zhang, J., Cormode, G., Procopiuc, C.M., Srivastava, D., Xiao, X., PrivBayes:
Private data release via Bayesian networks. ACM Trans. Database Syst., 42, 4,
25:1–25:41, 2017.
6. Kim, T. and Yang, J., Latent-space-level image anonymization with adversar-
ial protector networks. IEEE Access, 7, 84992–84999, 2019.
7. Arjovsky, M., Chintala, S., Bottou, L., Wasserstein generative adversarial net-
works. Proc. 34th Int. Conf. Mach. Learn, vol. 70, pp. 214–223, 2017.
8. Berthelot, D., Schumm, T., Metz, L., BEGAN: Boundary equilibrium genera-
tive adversarial networks. CoRR, 2017. arXiv preprint arXiv:1703.10717.
9. Yu, F., Zhang, Y., Song, S., Seff, A., Xiao, J., LSUN: Construction of a large-
scale image dataset using deep learning with humans in the loop. CoRR,
2015. arXiv:1506.03365.
10. Krizhevsky, A., Nair, V., Hinton, G., CIFAR-10, 2020. http://www.cs.toronto.edu/kriz/cifar.html
11. LeCun, Y. and Cortes, C., MNIST handwritten digit database, 2010. http://yann.lecun.com/exdb/mnist/
12. Liu, Z., Luo, P., Wang, X., Tang, X., Deep learning face attributes in the wild.
Proc. Int. Conf. Comput. Vis. (ICCV), 3730–3738, 2015.
13. Fredrikson, M., Lantz, E., Jha, S., Lin, S., Page, D., Ristenpart, T., Privacy in
pharmacogenetics: An end-to-end case study of personalized warfarin dos-
ing, in: 23rd USENIX Security Symposium (USENIX Security 14), pp. 17–32, 2014.
14. Fredrikson, M., Jha, S., Ristenpart, T., Model inversion attacks that exploit
confidence information and basic countermeasures, in: Proceedings of the
22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
15. Dwork, C., McSherry, F., Nissim, K., Smith, A., Calibrating noise to sensi-
tivity in private data analysis, in: Theory Cryptography, pp. 265–284, Springer,
Berlin, Germany, 2006.
16. Dwork, C., Differential privacy: A survey of results, in: Theory and
Applications of Models of Computation, pp. 1–19, Springer, Berlin, Germany,
2008.
17. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y., Generative Adversarial Networks. Proceedings
of the International Conference on Neural Information Processing Systems,
pp. 2672–2680, 2014.
18. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.,
Improved Techniques for Training GANs. arXiv:1606.03498 [cs.LG], 2016.
19. Isola, P., Zhu, J.-Y., Zhou, T., Efros, A., Image-to-image translation with con-
ditional adversarial nets. Computer vision and pattern recognition, 2017.
20. Ho, J. and Ermon, S., Generative adversarial imitation learning. Adv. Neural
Inf. Process. Syst., 4565–4573, 2016.
21. Zhao, J.J., Mathieu, M., LeCun, Y., Energy-based generative adversarial net-
work. CoRR, 2016. arXiv:1609.03126.
22. Radford, A., Metz, L., Chintala, S., Unsupervised representation learn-
ing with deep convolutional generative adversarial networks. CoRR, 2015.
arXiv:1511.06434.
23. Odena, A., Olah, C., Shlens, J., Conditional image synthesis with auxiliary
classifier GANs, arXiv:1610.09585, 2016.
24. Arjovsky, M., Chintala, S., Bottou, L., Wasserstein GAN, Courant Institute of Mathematical Sciences and Facebook AI Research, 2017.
25. Mehralian, M. and Karasfi, B., RDCGAN: Unsupervised representation learn-
ing with regularized deep convolutional generative adversarial networks. 9th
Conference on artificial intelligence and robotics and 2nd Asia-Pacific interna-
tional symposium, Kish Island, Iran, pp. 31–38, 2018.
Index
Accuracy, 190, 205, 211, 222–223, 226, 258–259
Activation, 220, 224–226
Adagrad, 46
AFEW, 233
AI models, 42
Air quality index (AQI), 5
ANN, 77–79, 81, 82, 87, 90, 95
Arch, 212–213, 219, 221–222
Arrhythmia, 143, 151–155
Artificial activity, 197
Artificial intelligence, 229, 246
Artificial neural networks (ANN), 41–42, 199, 232, 236
Association of aspect terms with sentiment words, 277
Authentication, 177
Automatic facial expression analysis (AFEA), 230, 235
Automatic facial point localization, 231
Batch normalization, 233
Bayes classifiers, 231
Biometrics, 177
Black box attack, 310
BOVW, 229
BP4D, 233
Chi-square distance, 234
Classification, 34, 35, 37, 229, 231–234, 241–246
Classification results with feature selection, 108
Clustering, 118
CNN, 83–88, 90–95, 257
Combination, 177, 179, 186, 189
Computer vision (CV), 35, 78, 81, 83, 85, 86, 89, 95
Confusion matrix, 222
Contextual similarities, 276
Contrast enhancement, 25
Convolution layer, 236
Convolutional neural networks (CNN), 23, 211, 213, 219, 229, 232–236, 238–239, 241–246
Covid-19, 5
Cross-entropy loss, 234
Cross-spectral, 177–179, 182–184, 188, 190
Cross-validation, 233
Data acquisition, 47
Data pre-processing, 48
Database, 177–179, 182–185, 188–190
Datasets, 197, 229, 232–234, 242–243
  CK+, 233
  FEI face detection database, 234
  FER 2013, 229, 242–244
  JAFFE, 233–234
  Kohn-Kanade, 234–235
  Kohn-Kanade AU-coded facial expression database, 235
De-blurring attack, 311