Computer Vision and Recognition Systems
Research Innovations and Trends
Edited by
Chiranji Lal Chowdhary, PhD
G. Thippa Reddy, PhD
B. D. Parameshachari, PhD
First edition published 2022
Apple Academic Press Inc.
1265 Goldenrod Circle, NE, Palm Bay, FL 32905 USA
4164 Lakeshore Road, Burlington, ON, L7L 1A4 Canada
CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 USA
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN UK
Contributors
Abbreviations
Preface
12. Image Synthesis with Generative Adversarial Networks (GAN)
Parvathi R. and Pattabiraman V.
H. Azath
VIT Bhopal, India
Chantana Chantrapornchai
Faculty of Engineering, Kasetsart University, Bangkok, Thailand
Tripti Goel
Department of Electronics and Communication Engineering, National Institute of Technology Silchar,
Assam 788010, India
Kiran
Department of ECE Engineering, Vidyavardhaka Engineering College, Mysuru, India
Pedram Khatamino
Department of Computer Engineering, İstanbul University - Cerrahpaşa, İstanbul, Turkey
Panida Khuphira
Faculty of Engineering, Kasetsart University, Bangkok, Thailand
Pradnya S. Kulkarni
School of Computer Engineering and Technology, MIT World Peace University, Pune, India
Honorary Research Fellow, Federation University, Australia
Vijay Kumar
Department of Computer Science and Engineering, National Institute of Technology, Hamirpur,
Himachal Pradesh, India
R. Maheswari
VIT Chennai, India
N. Jagan Mohan
Department of Electronics and Communication Engineering, National Institute of Technology
Silchar, Assam 788010, India
R. Murugan
Department of Electronics and Communication Engineering, National Institute of Technology
Silchar, Assam 788010, India
Zeynep Orman
Department of Computer Engineering, İstanbul University-Cerrahpaşa, İstanbul, Turkey
H. T. Panduranga
Department of ECE Engineering, Govt. Polytechnic, Turvekere, Tumkur, India
Sweta Panigrahi
Department of Computer Science and Engineering, National Institute of Technology Warangal,
Telangana State, India
B. D. Parameshachari
GSSS Institute of Engineering & Technology for Women, Mysuru, India
R. Parvathi
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Debanjan Pathak
Department of Computer Science and Engineering, National Institute of Technology Warangal,
Telangana State, India
V. Pattabiraman
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
Patchara Pattiyathanee
Kasetsart University, Bangkok, Thailand
U. S. N. Raju
Department of Computer Science and Engineering, National Institute of Technology Warangal,
Telangana State, India
G. Thippa Reddy
School of Information Technology & Engineering, VIT Vellore 632014, Tamil Nadu, India
P. Sharmila
Sri Sai Ram Engineering College, India
Kundjanasith Thonglek
Nara Institute of Science and Technology, Nara, Japan
Norawit Urailertprasert
Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
Pritam Verma
Department of Computer Science and Engineering, National Institute of Technology, Hamirpur,
Himachal Pradesh, India
Abbreviations
ABSTRACT
In winter, haze is the prime challenge during driving. It reduces the visibility of an image, so fog removal techniques are required to improve the visibility level of the image. In this chapter, a hybrid approach is implemented for fog removal. The proposed approach utilizes the basic concepts of the Dark Channel Prior and the Bright Channel Prior. In addition, an order-statistic filter is used to refine the transmission map, and the bright channel prior with boundary constraints is used to restore the edges. The proposed technique has been compared with existing techniques over a set of well-known foggy images. The proposed approach outperforms the existing techniques in terms of average gradient and percentage of saturated pixels.
1.1 INTRODUCTION
the light that comes toward the camera or the viewer is attenuated by scattering through droplets, which distorts the visual quality of the image.4,6,7 To overcome this problem, some sophisticated systems have been developed to maximize visibility while restraining the strong and dazzling light of oncoming vehicles.12 For the recognition of fog, motor vehicle detection systems were developed,8,13,21 but their main limitation was that they could not handle the sky region properly. Automatic fog detection could detect only daytime fog but was not able to detect nighttime fog. To overcome this problem, computer vision techniques have come into use.11,14 These techniques also helped to cut down the operating cost and provided a better visual system.10,25 He et al.16 proposed the Dark Channel Prior (DCP), which exploits image pixels with a low intensity value in at least one of the color channels. Nevertheless, the contrast of these pixels can be lessened due to additive air light. The DCP is commonly used to estimate the transmission map and the atmospheric veil.9,20 Fattal18 described a local color-line prior to restore hazy images. Nandal and Kumar20 proposed a novel image defogging model that uses fractional-order anisotropic diffusion. They used the air light map, evaluated from the hazy model, as the input image in the anisotropic diffusion process. However, it suffered from halo artifacts. To reduce this problem, Kapoor et al.19 implemented a technique that uses an improved DCP and contrast-limited adaptive histogram equalization, which removes the halo artifact with a new median operator in the DCP. They used a guided filter for refinement of the transmission map. Contrast Limited Adaptive Histogram Equalization (CLAHE) was used for further visibility improvement, but the computational complexity was high. To cut down the complexity, Singh and Kumar22 developed an approach that integrates the DCP and the Bright Channel Prior (BCP). They used the BCP to solve the sky-region problem associated with DCP-based dehazing.5 They used a gain intervention filter to increase the computation speed and improve edge preservation. In spite of this, the technique was not able to provide the optimum solution for degraded images. To reduce the above-mentioned problems, a hybrid algorithm is implemented that integrates the DCP and the BCP. The proposed approach uses a 2D order-statistic filter to refine the transmission map, and the BCP with boundary constraints is used to restore the edges. This technique is compared with the existing techniques over a set of well-known foggy images. The rest of this chapter is organized as follows. Section 1.2 briefly describes the degradation model. The proposed defogging technique is presented in Section 1.3. Experimental results and discussion are given in Section 1.4. The concluding observations are given in Section 1.5.
According to the DCP, an RGB image has at least one color channel in which some pixels have very low intensities tending to zero. For example,
where hfI^c denotes the intensity of the color channel c ∈ (R, G, B) of the haze-free RGB image and λ(x) is a local patch centered at pixel x. The minimum value among the three color channels and all pixels in the patch is taken as the dark channel hfI^dark. The dark channel pixel value can be approximated as follows16:
hfI^dark ≈ 0    (1.5)
The dark channel is known as the DCP when this approximation holds, that is, when the pixel values are close to zero. On the other hand, the dark channel of a foggy image contains pixels with values greater than zero. The global atmospheric light tends to be achromatic and bright. The combination of air light and direct attenuation significantly increases the minimum value of the three colors in the local patch. This signifies that the pixel values of the dark channel can play a particular role in estimating the fog density.
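As an illustration of the above, a minimal MATLAB sketch of the dark channel computation is given below; it assumes the Image Processing Toolbox, and the file name and patch size are illustrative assumptions rather than values prescribed by the proposed method.

% Hedged sketch: dark channel of an image via a 2D order-statistic filter
img = im2double(imread('foggy_sample.jpg'));   % hypothetical foggy image
patch = 15;                                    % local patch size (assumed)
minChannel = min(img, [], 3);                  % minimum over the R, G, B channels
% the 1st order statistic over the patch is the patch minimum
darkChannel = ordfilt2(minChannel, 1, true(patch));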
obI(x) = hfI(x) e^(−∂d(x)) + Air (1 − e^(−∂d(x)))    (1.6)
Then, the min operator over the three color channels is applied to eq 1.7 as follows:
min_c ( obI^c(x) / Air^c ) = tra(x) min_c ( hfI^c(x) / Air^c ) + (1 − tra(x))    (1.8)
Tr^bright(x) = 1 − hfI^bright(x)    (1.16)
This imposes lower and upper bound limits on the solution x. With their help, faster and more reliable solutions can be generated by holding the solution within the upper and lower bounds. Consider that the bounds are vectors with the same length as x.
• If there is no lower bound for a component, use -Inf as the bound; similarly, use Inf when there is no upper bound.
• If only upper or only lower bounds exist, the other type need not be written. For example, if there are no upper bounds, there is no need to supply a vector of Infs.
• Out of n components, if only the first m have bounds, then a vector of length m containing the bounds can be supplied.
For example, if the boundaries are x3 ≥ 7 and x2 ≤ 3, the constraint vectors can be lb = lower bound = [-Inf; -Inf; 7] and upper bound = [Inf; 3] (which will give a warning) or upper bound = [Inf; 3; Inf].
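The bound vectors described above can be written directly in MATLAB; the sketch below is illustrative and assumes a three-component solution used with a bounded solver such as fmincon (the objective function and starting point are placeholders).

% Hedged sketch: lower/upper bounds for x with x(3) >= 7 and x(2) <= 3
lb = [-Inf; -Inf; 7];    % no lower bound on x(1) and x(2); x(3) >= 7
ub = [ Inf;   3; Inf];   % no upper bound on x(1) and x(3); x(2) <= 3
% x = fmincon(@objectiveFun, x0, [], [], [], [], lb, ub);  % placeholder call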
This is used to eliminate fog from the foggy image. The DCP uses patch-wise transmission obtained from the boundary constraints. It takes the hazy image and the air light, checks the pixel-wise boundary for each color channel (RGB), and applies a max filter on the concentration of the resulting set of RGB values.
1.3.5 ALGORITHM
Tr^integrated(x) = Tr^dark(x) / Tr^bright(x)    (1.17)
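A minimal MATLAB sketch of eq 1.17 is given below; it assumes the dark-channel and bright-channel transmission maps have already been estimated as 2D arrays of the same size.

% Hedged sketch of eq 1.17: integrated transmission map
Tr_integrated = Tr_dark ./ max(Tr_bright, eps);   % eps guards against division by zero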
FIGURE 1.1 Defogging process: (a–d) Foggy images, (e–h) dark channel prior, (i–l)
double bright channel prior, (m–p) integrated transmission maps, and (q–t) final defogged
image.
1.5 CONCLUSIONS
KEYWORDS
REFERENCES
10. Das, T. K.; Chowdhary, C. L.; Gao, X. Z. Chest X-Ray Investigation: A Convolutional
Neural Network Approach. J. Biomimetics, Biomater. Biomed. Eng. 2020, 45, 57–70.
11. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun. 2020.
12. Anwar, M. I.; Khosla, A. Vision Enhancement through Single Image Fog Removal.
Eng. Sci. Technol. 2017, 20 (3), 1075–1083.
13. Hautière, N.; Tarel, J.; Lavenant, J.; Aubert, D. Automatic Fog Detection and Estimation
of Visibility Distance through Use of an Onboard Camera. Mach. Vision App. 2005, 17,
8–20.
14. Bronte, S.; Bergasa, L. M.; Alcantarilla, P. F. Fog Detection System Based on
Computer Vision Techniques. In 12th International IEEE Conference on Intelligent
Transportation Systems 2009, pp. 1–6.
15. Narasimhan, S. G.; Nayar, S. K. Vision and the Atmosphere. Int. J. Comput. Vision
2002, 48 (3), 233–254.
16. He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior.
IEEE Trans. Pattern Analy. Mach. Intell. 2011, 33 (12), 2341–2353.
17. Kaur, M.; Singh, D.; Kumar, V.; Sun, K. Color Image Dehazing Using Gradient Channel Prior and Guided L0 Filter. Inf. Sci. 2020.
18. Fattal, R. Dehazing Using Color-lines. ACM Trans. Graphics 2014, 34 (1), 13.
19. Kapoor, R.; Gupta, R.; Son, L.; Kumar, R.; Jha, S. Fog Removal in Images Using Improved Dark Channel Prior and Contrast Limited Adaptive Histogram Equalization. 2019.
20. Nandal, S.; Kumar, S. R. Single Image Fog Removal Algorithm in Spatial Domain
Using Fractional Order Anisotropic Diffusion. Multimedia Tools App. 2018, 78,
10717–10732.
21. RM, S. P.; Maddikunta, P. K. P.; Parimala, M.; Koppu, S.; Reddy, T.; Chowdhary, C.
L.; Alazab, M. An Effective Feature Engineering for DNN Using Hybrid PCA-GWO
for Intrusion Detection in IoMT Architecture. Comput. Commun. 2020.
22. Singh, D.; Kumar, V. Single Image Haze Removal Using Integrated Dark and Bright
Channel Prior. Modern Phys. Lett. B 2018, 32 (4), 1–9.
23. Pang, J.; Au, O. C.; Guo, Z. Improved Single Image Dehazing Using Guided Filter.
Asia Pacific Sign. Info. Process. Assoc. 2011.
24. Khandelwal, V.; Mangal, D.; Kumar, N. Elimination of Fog in Single Image Using
Dark Channel Prior. Int. J. Eng. Technol. 2018, 5 (2), 1601–1606.
25. Wu, M.; Zhang, C.; Jiao, Z.; Zhang, G. Improvement of Dehazing Algorithm Based
on Dark Channel Priori Theory. Optik 2020, 206, 164174.
CHAPTER 2
ABSTRACT
2.1 INTRODUCTION
shaking may happen in the hands, arms, legs, and chin. However, the uncontrolled movement of the thumbs is one of the most common symptoms. Of course, not every hand tremor is a sign of PD. In order to make this diagnosis, a general check-up by experts is required. The slowing of movements is a quite common symptom of PD.1 Unfortunately, the patients become unable to perform the necessary daily life movements over time. As they walk, they may see shrinkage in their steps and begin to lean forward. Apart from these most common symptoms, speech changes, handwriting deterioration, posture deterioration, sudden movements while sleeping, and bowel disorders are the other symptoms of PD.2
PD has spread worldwide due to the modern lifestyle and is more common in older people. PD represents the second most common neurodegenerative disorder after Alzheimer's disease,3 affecting approximately 10 million people worldwide.4 This disease leads to the limitation of the person's speaking skills, tremors in hand movements, and movement and muscle problems in general. PD reduces the standard of living of patients and naturally affects their families. Non-invasive methods are more suitable for these patients because most of them are not in good physical condition. The most common non-invasive methods in clinics in the Parkinson area are the handwriting and voice/speech tests.5 Non-invasive techniques are generally defined as disease diagnosis methods that do not require surgical intervention.
The datasets collected by these non-invasive methods are generally suitable for analysis by machine learning techniques. There are many studies in the literature on the diagnosis of PD using different techniques. Since there is no specific rule for choosing machine learning techniques and optimizing their parameters, trial-and-error approaches are often used.6 Therefore, experiments with different machine learning methods will enrich and improve the literature.
There are many different sorts of articles and studies in the Parkinson's literature, and many machine learning methods can be applied to Parkinson datasets. In the recent literature, the accuracy parameter is usually used for evaluating the efficiency of the methods. However, there are many other machine learning performance evaluation measures, such as the F1 score, sensitivity, and the confusion matrix.52–54
In this chapter, the Background section presents some useful information
about voice and handwriting datasets; additionally, this section contains
2.2.1 BACKGROUND
Handwriting tests are one of the most widely used non-invasive methods
in recent years. The idea of collecting data from handwriting tests to detect
Many varieties of machine learning and deep learning methods have been deployed for PD detection from voice, gait, and handwriting datasets. For instance, Bernardo et al.20 introduced a PC app for PD detection. The C#-based interface app is designed for capturing data from patients; furthermore, the authors developed some algorithms for feature extraction. The authors introduced novel samples for the handwriting test, such as a spiral, triangle, and cube. Several preprocessing algorithms, such as color thresholding, RGB-to-grayscale conversion, de-noising of the pattern, and skeletonization, are applied for feature extraction. Euclidean distance, relative distance, circular distance, Manhattan distance, mouse pointer speed, and the similarity between pixels are the features of the dataset. Optimum Path Forest (OPF), SVM, and NB are the classifiers used in the research. In this work, the author team reached 100% accuracy with the SVM classifier.
Pereira et al.11 mainly used preprocessing methods for distinguishing the template and patient drawings from paper-based tests; color thresholding, blur filtering, median filtering, and capturing the pattern of handwritten drawings from the paper-based test are the preprocessing stages of this work. Features like RMS, maximum difference (argmax), minimum difference (argmin), standard deviation, and Mean Relative Tremor (MRT) were extracted from the images. In the comparison of the OPF, NB, and SVM classifiers, the SVM classifier reached 67% accuracy. The authors collected a handwriting dataset consisting of spirals, meanders, and drawings captured from paper-based tests.
In another study, Pereira et al.21 designed a method for extracting feature images from handwriting drawings. The author team extended their dataset to six tests: circle on the paper, circle in the air, diadochokinesis with the right hand, diadochokinesis with the left hand, meander, spiral, and time-series-based images. The main purpose of this work was to produce feature images from the raw data by normalizing, squaring, and sketching the matrices into grayscale images as CNN inputs. The data were collected by a digitized pen on a tablet, and the features were microphone, finger grip, axial pressure of the ink refill, and x, y, z coordinates. Different CNN architectures, such as CIFAR-10- and ImageNet-style networks, were used for feature extraction. Classifiers such as OPF, NB, and SVM were deployed, and a 95% accuracy level was reached.
Features were extracted from EMG signals, such as density ratio, height ratio, execution time, average linear execution speed, acceleration norm, gyroscope components, and RMS. A simple ANN model with an optimal topology and an SVM reached 89% accuracy. A computer-vision-based handwriting analysis tool and surface electromyography (sEMG) signal-processing techniques were the central aspects of this work.
In Graça et al.,30 an online mobile app was designed for data collection from patients. Development of the mobile app for online handwriting tests, together with analysis of gait positions, were the main aspects of this work. The mobile app could detect drawing features like Spiral Average Error, Spiral Cross, Spiral Pressure Ratio, Spiral Side Ratio, Tap Time Ratio, and Tap Pressure Ratio. Decision tree, Ripper k, and Bayesian Network classifiers were used, reaching 85% accuracy.
In Drotar et al.,31 an air-movement-based data collection method was used for the Parkinson dataset. Online in-air and on-surface movement-based features were analyzed by an SVM classifier, and 85% accuracy was obtained. There were some spiral and word-writing tasks in the applied tests.
Shahbaba et al.32 proposed a new mathematical approach, dpMNL (Dirichlet process multinomial logit), for PD classification problems on the voice dataset. The proposed model used Dirichlet process mixtures, which allowed the relationship between the distribution of the response variable and the covariates to be modeled non-parametrically. This model was generative, so it had advantages over the traditional MNL (multinomial logit) models, which are discriminative. The five-fold cross-validation method was used for evaluating the performance of the model, and 87.7 ± 3.3% accuracy was achieved.
In another study, Psorakis et al.33 investigated the classification ability of the proposed improved mRVMs (multiclass multi-kernel relevance vector machines) on real-world datasets such as the Parkinson dataset. The research team achieved improvements such as convergence measures, sample selection strategies, and model refinements for better results using 10-fold cross-validation with 10 repetitions.
In another work on the Parkinson voice dataset, Little et al.34 proposed dysphonia detection for PD detection. The authors also proposed a novel dysphonia measure, Pitch Period Entropy (PPE), in addition to the usual speech features. The primary approach of this work was an exhaustive search of all possible combinations of dysphonia measures to find the
of PSO and FKNN; this model's performance was evaluated through 10-fold cross-validation. The average accuracy was 97.47%. A PD voice dataset from the UCI database was analyzed in this research.
Luukka et al.46 introduced a hybrid model for PD detection. The model is composed of fuzzy entropy-based feature selection combined with a similarity classifier. This combination proved the efficiency of the model by simplifying the dataset and accelerating the classification process. The results revealed that the hybrid model reached high accuracy (mean value = 85.03%).
In another work, Li et al.47 compared optimization approaches by analyzing different medical datasets. The primary purpose of this work was to find the optimum feature set of the datasets for better classification results. A fuzzy-based nonlinear transformation method was designed for selecting the optimum feature subset from the PD dataset. The authors also compared the proposed feature selection method with principal component analysis (PCA) and kernel principal component analysis (KPCA) feature selection methods to illustrate the efficiency of the method. The proposed classification approach was applied to different sorts of datasets, such as the Pima Indians diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer datasets. In conclusion, the performance of the fuzzy-based nonlinear transformation with the SVM classifier was found to be better than the other methods (93.47% accuracy).
Ozcift et al.48 proposed computer-aided diagnosis (CADx) systems to improve accuracy. The authors proposed the combination of rotation forest (RF) ensembles with 30 machine learning algorithms to diagnose disease from heart, diabetes, and Parkinson's disease datasets. The RF classifier achieved an accuracy (ACC), kappa error (KE), and area under the receiver operating characteristic (ROC) curve (AUC) of 74.47%, 80.49%, and 87.13%, respectively.
Khatamino et al.49 proposed an efficient convolutional neural network for PD classification. The generalization ability of the model was illustrated by comparing it with conventional machine learning classifiers such as SVM and NB. One of the main purposes of this work was to show the discriminative power of the novel DST test. Another main aspect was to illustrate the CNN's flexibility and powerful feature learning ability by comparing it with SVM and NB. Two main approaches were selected for evaluating the performance of the proposed model (CV,
This chapter surveys the studies that used machine learning algorithms in PD diagnosis. In order to analyze the factors affecting the success rate of the proposed algorithms, the studies were summarized in terms of the classification methods and classifier types, years, datasets, and accuracy rates, as shown in Table 2.2.
As one of the results of this literature review, it was observed that researchers have tended to collect PD data in collaboration with research hospitals in many studies. It is evident that the ideas and guidance of the doctors and medical experts of neurology departments are essential.
In general, it is observed that high accuracy percentages have been achieved in the literature in the last few years. One of the significant reasons for this performance growth is the improvement of machine learning and deep learning libraries in different programming languages.
The analysis of handwriting data often shifts to the field of image processing. Therefore, in current studies, some researchers are trying new deep learning methods rather than older image processing techniques. Several studies have shown successful results with the CNN structure. The CNN's discriminative detection power is also evident across this literature review. The reason is that, in the newer methods, automatic pattern detection filters are available instead of manually designed filters. It is more practical to use the CNN's self-learned adaptive filters instead of conventional image processing methods for feature extraction. Moreover, user-friendly interfaces of programming IDEs facilitate easy model creation.
TABLE 2.2 Literature Review Summary.
Study | Method & classifier type | Data description | Accuracy (%)
[39] | LASSO, mRMR, RELIEF, LLBFS, Random forest, SVM | Parkinson's disease speech dataset (dysphonia measures) | 99% (overall value on test data)
[42] | RF ensemble of IBk, SVM | Parkinson's disease voice signal dataset | 97% (test data)
[45] | PSO–FKNN | Parkinson's disease voice dataset from UCI database | 97.47% (10-fold CV)
[9] | Fuzzy C-means, ANN | Speech signal dataset | Fuzzy C-means 68.04%; ANN 92%
[44] | Multinomial logistic regression, Haar wavelets | Parkinson's disease voice dataset | 100% (test data)
[41] | PCA–FKNN | Parkinson's disease speech dataset of UCI | 96.07% (average 10-fold CV)
[22] | Gaussian mixture model, PCA, LDA, SFS, SBS, LS-SVM, PNN, GRNN | Voice signals: MDVP, NHR and HNR, RPDE and D2, DFA, Spread1, Spread2, and PPE | 100% (test data)
FIGURE 2.2 Average accuracy percentage of studies (handwriting vs. voice datasets; axis range 80–100%).
Figure 2.4 illustrates the methods used in the PD literature and their use-case percentage among all analyzed studies. The SVM classifier is generally used for classification on both voice and handwriting data. The figure shows that the SVM, NB, and OPF classifiers are often used in this literature.
Innovative approaches are included in this study, as well as attributes and methods that have become standard in PD diagnosis for many years. For instance, the attributes collected in a handwriting dataset are generally x, y, z, pressure, grip angle, and timestamp. However, one study51 calculates attributes such as speed, acceleration, and RMS using innovative formulas. Moreover, another example of creativity in the literature is creating unique
2.4 CONCLUSION
KEYWORDS
• Parkinson’s disease
• machine learning
• deep learning
• literature review
• convolutional neural networks
REFERENCES
50. Isenkul, M.; Sakar, B.; Kursun, O. In Improved Spiral Test Using Digitized Graphics
Tablet for Monitoring Parkinson’s Disease, Proceedings of the International
Conference on e-Health and Telemedicine; 2014; pp 171–175.
51. Illán, I.; Górriz, J.; Ramírez, J.; Segovia, F.; Jiménez-Hoyuela, J.; Ortega Lozano, S. J. Automatic Assistance to Parkinson's Disease Diagnosis in DaTSCAN SPECT Imaging. Med. Phys. 2012, 39 (10), 5971–5980.
52. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon,
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics 2020, 9 (4), 692.
53. Chowdhary, C. L.; Acharjya, D. P. Segmentation and Feature Extraction in Medical
Imaging: A Systematic Review. Procedia Comput. Sci. 2020, 167, 26–36.
54. Das, T. K.; Chowdhary, C. L.; Gao, X. Z. Chest X-Ray Investigation: A Convolutional
Neural Network Approach. J. Biomimetics, Biomater. Biomed. Eng. 2020, 45, 57–70.
CHAPTER 3
ABSTRACT
3.1 INTRODUCTION
its veins, thereby acquiring important information about the inner eye and eye diseases.15
The ophthalmoscope is an instrument used for viewing the inside of the eye. It was invented in 1850 by the German scientist and philosopher Hermann von Helmholtz and became a model for later types of endoscopy. The device consists of a strong light that can be directed into the eye by a small mirror or prism. The light reflects off the retina and back through a small opening in the ophthalmoscope, through which the examiner sees a non-stereoscopic magnified image of the structures at the back of the eye, including the optic disc, retina, retinal blood vessels, and macula, shown in Figure 3.1. The ophthalmoscope is especially helpful as a screening device for various ocular diseases, for example, diabetic retinopathy (DR).15
FIGURE 3.1 (labels): optic disc, fovea, retinal venules, retinal arterioles.
present. There are different parameters by which the severity of retinal diseases can be graded25 (Fig. 3.2).
The center of the fundus lies on the optical axis; this is the fovea, which captures the best-resolved images, and it is typically associated with a small yellow spot, the macula lutea. The anatomic and clinical foveola, fovea, and macula are shown in the diagram. The major vascular supply of the retina branches from the superior and inferior arcades of vessels. The retinal region between the superior and inferior arcades is known as the central area or posterior pole. The center of this posterior pole contains the macula, which is redder (dark gray in the print version) and denser in color than the surrounding retina. This is because more photoreceptors are packed at high densities and more pigment lies behind the photoreceptor cells. The macula lutea refers to the yellow xanthophyll pigment within the retina at the center of the macula. The center of the macula is referred to as the fovea, a region 500 μm in diameter that is avascular and essentially composed of the inner limiting layer and specialized cone photoreceptor cells, known as the bouquet of Rochon-Duvigneaud. The major vessels visible in this color fundus photograph lie in the superior retina.11,31
The retina is an internal and important part of the human eye whose function is to capture images and send them to the brain. It consists of various structures along with two types of blood vessels, veins and arteries. These retinal vessels are affected by a number of eye diseases. HR is caused by consistently high blood pressure in the retinal blood vessels. A great number of people in the world suffer from HR; however, in most cases, HR patients are unaware of it. The presence of HR and its severity can be detected by an ophthalmologic examination of the patient's eye. Most often, HR is diagnosed at the last stage, which leads the patient to blindness or vision loss; therefore, it is important for HR patients to ensure regular examination of their eyes.3 There will probably not be any signs until the condition has advanced considerably. Potential signs and symptoms include reduced vision, eye swelling, bursting of a blood vessel, and double vision accompanied by headaches. Seeking prompt medical assistance is advisable if the blood pressure is high and the vision has suddenly changed.9
Clinical findings of HR include the presence of lesions which can be classified into two groups, namely soft exudates and hard exudates (HE). Soft exudates are otherwise known as Cotton Wool Spots (CWS). CWS are soft white-yellow spots seen in advanced stages of HR, whereas HE are bright yellow lesions. These CWS are either observed in isolation in fundus images or exist together with other lesions like hemorrhages (HEM) and HE of the tissue's blood supply. CWS are also found in the retina of diabetic patients, but they are more closely associated with HR than with DR. DR is characterized by multiple HE and a few CWS, while numerous CWS are associated with HR.14 This disease does not show early signs and, in most cases, HR is diagnosed at later stages when the illness leads to blindness or vision loss. Therefore, it is essential for hypertensive patients to have a regular examination of their eyes.
Prolonged hypertension, or high blood pressure, is the primary cause of HR. Hypertension is a chronic condition in which the force of the blood against the arteries is excessively high. The force is a result of the blood pumping out from the heart into the arteries, as well as the force created as the heart rests between beats. When the blood travels through the blood vessels with more pressure, it eventually causes damage by stretching the arteries. This leads to many problems over time. HR, for the most part, occurs after the blood pressure has been consistently high over a prolonged period. The blood pressure (BP) can be influenced by a lack of daily physical activity, being overweight, too much salt in the daily diet, daily stress, and high BP.6
HR is diagnosed based on its clinical appearance on a dilated funduscopic examination and concurrent hypertension. The primary care physician will use an ophthalmoscope to examine the retina. It shines a light through the pupil to examine the back of the eye for signs of narrowing blood vessels or to check whether any fluid is leaking from the vessels. This procedure is painless and takes under 10 min to complete. In some cases, a special test called fluorescein angiography (FA) is performed to examine the retinal blood flow. In this procedure, the doctor applies special eye drops to dilate the pupils and then takes photographs of the eye. After taking the pictures, the primary care physician will inject a dye called fluorescein into a vein, commonly inside the elbow. As the dye moves into the vessels of the eye, further retinal images are captured. Acute malignant hypertension will make patients complain of eye pain, headaches, or decreased visual acuity. Chronic arteriosclerotic changes from hypertension do not cause any symptoms alone. However, the complications of arteriosclerotic hypertensive changes make patients present with typical signs of vascular occlusions or microaneurysms (MA). For any disease, it is better to know its severity so that the required treatments and precautions against further development of the disease can be taken. For this purpose, the disease must be graded. The following section discusses the grading and classification of HR.21
3.2.1 GRADING HR
As shown in Figure 3.3, if the AVR ratio is between 0.667 and 0.75, the retinal image is graded as normal. If the AVR ratio is 0.5, 0.33, 0.25, or <0.20, the retinal image is graded as Grade-1, Grade-2, Grade-3, or Grade-4, respectively.39
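A hedged MATLAB sketch of this grading rule is shown below; the text lists exact AVR values, so the interval boundaries used here are an interpretation rather than the authors' exact thresholds.

% Hedged sketch of the HR grading in Figure 3.3 (interval limits assumed)
if avr >= 0.667 && avr <= 0.75
    grade = 'Normal';
elseif avr >= 0.5
    grade = 'Grade-1';
elseif avr >= 0.33
    grade = 'Grade-2';
elseif avr >= 0.25
    grade = 'Grade-3';
else                     % covers AVR < 0.20
    grade = 'Grade-4';
end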
The differential diagnosis for HR with diffuse retinal HEM, CWS, and hard EX most notably includes DR. DR can be distinguished from HR by assessment of the respective underlying diseases. Other conditions with diffuse retinal HEM that can resemble HR include radiation retinopathy, anemia and other blood dyscrasias, ocular ischemic syndrome, and retinal vein occlusion.
The degree and duration of hypertension are usually the key determinants of retinopathy in hypertension. The changes mentioned in the sections above, however, are not specific to hypertension. Similar
3.3.1 ACCURACY
Acc is defined as the ratio of correctly identified pixels to the total number of pixels present in the image.
Acc = (TP + TN) / (TP + TN + FP + FN)    (3.1)
The positive predictive value (PPV) measures the likelihood that a pixel identified as a blood vessel (BV) pixel is really positive, and it is expressed in eq 3.4.
PPV = TP / (TP + FP)    (3.4)
The AUC is estimated from the true positive rate and the true negative rate, as indicated in eq 3.5.
AUC = (1/2) ( TP / (TP + FN) + TN / (TN + FP) )    (3.5)
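The three measures can be computed directly from the pixel-level counts; the MATLAB sketch below assumes TP, TN, FP, and FN have already been obtained from a segmentation result.

% Hedged sketch of eqs 3.1, 3.4, and 3.5
Acc = (TP + TN) / (TP + TN + FP + FN);          % eq 3.1, accuracy
PPV = TP / (TP + FP);                           % eq 3.4, positive predictive value
AUC = 0.5 * (TP/(TP + FN) + TN/(TN + FP));      % eq 3.5, AUC estimate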
3.4 METHODS
The next step is optic disc (OD) detection. The optic disc is a central anatomical structure of retinal images, and the ability to detect it plays a major role in automatic screening systems. The step following OD detection is retinal vessel classification: based on thresholding of the pixel values, retinal vessels are classified into arteries or veins. As mentioned in the above sections, the AVR ratio is then calculated, based on which the HR grading is done.
Classification of retinal fundus images has become one of the main pilot applications used to illustrate ML. Convolutional neural networks (CNNs) are a kind of deep neural network (DNN) that generates fairly accurate results when used to classify retinal fundus images.29,35-36,42-45 The general approach for grading HR using ML is shown in Figure 3.7.
3.4.2.2 STRIDE
The number of pixels by which the kernel moves over the given input image matrix is called the stride. For example, if the stride is 1 then the kernel moves by 1 pixel, and if the stride is 2 then the kernel moves by 2 pixels over the given input image matrix.
For an h × w × d input convolved with an fh × fw × d kernel, the output size is (h − fh + 1) × (w − fw + 1).
FIGURE 3.8 Image pixels matrix multiplied with kernel or filter matrix.
Source: Reprinted from Ref. [3]. Open access.
3.4.2.3 PADDING
In certain situations, the kernel may not fit the given image pixel matrix. In such situations, we have two choices:
i. We can add zeros to the input image matrix so that the kernel or filter fits.
ii. We can drop or eliminate the part of the input image where the kernel or filter does not fit.
Rectified Linear Unit for a non-linear operation (ReLU):
Sometimes the given matrices may contain negative values. To provide non-linearity in the ConvNet, the ReLU operation is useful, producing non-negative values.
The output of the ReLU is f(x) = max(0, x).
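The effect of the kernel size, stride, and zero-padding on the output size, together with the ReLU activation, can be summarized in a short MATLAB sketch; all numeric values below are illustrative assumptions.

% Hedged sketch: convolution output size and ReLU
h = 32; w = 32;          % input height and width (assumed)
fh = 3; fw = 3;          % kernel height and width (assumed)
stride = 1; pad = 0;     % stride and zero-padding (assumed)
out_h = floor((h - fh + 2*pad) / stride) + 1;   % reduces to h - fh + 1 for stride = 1, pad = 0
out_w = floor((w - fw + 2*pad) / stride) + 1;
relu = @(x) max(0, x);   % ReLU: f(x) = max(0, x)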
3.4.2.4 POOLING
3.4.2.5 DROPOUTS
deal with these distortions; hence, they would be able to work well in the real world. Another basic method is to subtract the mean image from each image and also divide it by the standard deviation.
FIGURE 3.9 Neural network with more than one convolutional layer.
Source: Reprinted from Ref. [3]. Open access.
3.5 DATABASE
3.5.4 VICAVR
INSPIRE-AVR contains 40 color images of the vessels and optic disc and an arteriovenous ratio reference standard. The reference standard is the average of the assessments of two experts using IVAN (a semi-automated computer program developed by the University of Wisconsin, Madison, WI, USA) on the images.20
The retinal fundus image databases with the number of images available
for HR classification are mentioned in Table 3.4.
The retinal databases VICAVR (Fig. 3.12a–c) and STARE (Fig. 3.12d–f) are used in the method. Figure 3.11 indicates the steps of the proposed method. The method takes the retinal images as input in the first step, represented in Figure 3.12(i). Then the green channel of the fundus image is extracted. The next step is to enhance the retinal image using CLAHE. The next step is to localize the OD using morphological operations. Then the blood vessels are segmented and classified as arteries and veins. In the final step, based on the AVR ratio, the HR classification is done.
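A minimal MATLAB sketch of the first preprocessing steps (green-channel extraction and CLAHE) is given below; it assumes the Image Processing Toolbox, and the file name and clip limit are illustrative assumptions.

% Hedged sketch: green-channel extraction and CLAHE enhancement
rgb = imread('fundus_sample.jpg');                 % hypothetical fundus image
green = rgb(:, :, 2);                              % extract the green channel
enhanced = adapthisteq(green, 'ClipLimit', 0.02);  % CLAHE
imshowpair(green, enhanced, 'montage');            % visual check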
TABLE 3.4 Retinal Database with Number of Fundus Images Available for HR.
SI. No Database Total images available
1 DRIVE 40
2 STARE 400
3 AVRDB 100
4 VICAVR 58
5 INSPIRE AVR 40
The final step is to calculate the AVR ratio, which is described in the above sections. Based on the AVR ratio obtained, the classification of HR is done as indicated in Figure 3.3.
This section gives the Acc obtained by various algorithms on various databases. From the observations, it is found that Abbasi et al. used a conventional approach for HR detection on a locally available database and obtained a low Acc of 81%, whereas Irshad et al. obtained very good results of 98.65% with conventional methods using the VICAVR database. Using an ML approach for grading HR, Syahputra et al. achieved the highest Acc of 100% using a testing sample of 20 images from the STARE database. These authors used only one type of database for HR detection; in the proposed method, however, the VICAVR and STARE databases are both used for HR detection. The Acc for HR grading using various algorithms is listed in Table 3.5 along with the database used.
KEYWORDS
• retina
• blood vessels
• diabetic retinopathy
• hypertensive retinopathy
• machine learning
REFERENCES
26. Ortíz, D.; Cubides, M.; Suárez, A.; Zequera, M.; Quiroga, J.; Gómez, J.; Arroyo, N.
Support System for the Preventive Diagnosis of Hypertensive Retinopathy. In 2010
Annual International Conference of the IEEE Engineering in Medicine and Biology;
IEEE, 2010 Sept; pp 5649–5652.
27. Ortiz, D.; Cubides, M.; Suarez, A.; Zequera, M.; Quiroga, J.; Gómez, J. A.; Arroyo,
N. System for Measuring the Arterious Venous Rate (AVR) for the Diagnosis of
Hypertensive Retinopathy. In 2010 IEEE ANDESCON; IEEE, 2010 Sept; pp 1–4.
28. Ortíz, D.; Cubides, M.; Suarez, A.; Zequera, M.; Quiroga, J.; Gómez, J. A.; Arroyo,
N. System Development for Measuring the Arterious Venous Rate (AVR) for the
Diagnosis of Hypertensive Retinopathy. In 2012 VI Andean Region International
Conference; IEEE, 2010 Sept; pp 53–56.
29. Prabhu, R. Understanding of Convolutional Neural Network (CNN)—Deep Learning, 2018. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
30. Savant, V.; Shenvi, N. Analysis of the Vessel Parameters for the Detection of
Hypertensive Retinopathy. In 2019 3rd International conference on Electronics,
Communication and Aerospace Technology (ICECA); IEEE, 2019 Jun; pp 838–841.
31. Akbar, S.; Hassan, T.; Akram, M. U.; Yasin, U.; Basit, I. AVRDB: Annotated Dataset
for Vessel Segmentation and Calculation of Arteriovenous Ratio, 2017.
32. Akbar, S.; Akram, M. U.; Sharif, M.; Tariq, A.; Yasin, U. Decision Support System
for Detection of Papilledema through Fundus Retinal Images. J. Med. Syst. 2017, 41
(4), 66.
33. Syahputra, M. F.; Aulia, I.; Rahmat, R. F. Hypertensive Retinopathy Identification
from Retinal Fundus Image Using Probabilistic Neural Network. In 2017 International
Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA);
IEEE, 2017 Aug; pp 1–6.
34. Triwijoyo, B. K.; Budiharto, W.; Abdurachman, E. The Classification of Hypertensive
Retinopathy Using Convolutional Neural Network. Procedia Comput. Sci. 2017, 116,
166–173.
35. Carolina Ophthalmology, P. A. Diseases & Surgery of the Eye, Retina Center https://
www.carolinaeyemd.com/retina-center-hendersonville/# [accessed 5 May 2020].
36. Walsh, J. B. Hypertensive Retinopathy: Description, Classification, and Prognosis.
Ophthalmology 1982, 89 (10), 1127–1131.
37. Wong, T. Y.; Mitchell, P. Hypertensive Retinopathy. New Engl. J. Med. 2004, 351
(22), 2310–2317.
38. Zhang, B.; Zhang, L.; Zhang, L.; Karray, F. Retinal Vessel Extraction by Matched Filter
with First-order Derivative of Gaussian. Comput. Biol. Med. 2010, 40 (4), 438–445.
39. Rani, A.; Mittal, D. Measurement of Arterio-venous Ratio for Detection of Hypertensive Retinopathy through Digital Color Fundus Images. J. Biomed. Eng. Med. Imag. 2015, 2 (5), 35–35.
40. Modi, P.; Arsiwalla, T. Hypertensive Retinopathy. In StatPearls [Internet]. StatPearls
Publishing, 2019.
41. Hoover, A. STARE Database, 1975. http://www.ces.clemson.edu/~ahoover/stare
42. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon,
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics 2020, 9 (4), 692.
ABSTRACT
4.1 INTRODUCTION
system gives rise to issues like monitoring vehicle density and traffic
congestion wherein a large quantity of videos needs to be processed
using Big Data mechanisms.16
loads of parameters and variables. This has become one of the challenges
of Big Data.
Variability: Variability differs from variety. A restaurant may have 20
different kinds of food items on the menu. However, if the same item
from the menu tastes different each day, then it is called variability. The
same applies to data, whenever the meaning of a data changes constantly
it affects the homogeneous nature of data. Variability indicates data whose
meaning constantly changes.
Vulnerability: With a huge amount of data, there also arise concerns about security. A data breach of Big Data can cause the exploitation of important information. Hackers have attempted, and in many cases succeeded in, Big Data breaches.
Volatility: Before the advent of Big Data, data were stored indefinitely. But due to the volume and velocity of Big Data, volatility needs to be considered. It needs to be established how long data should be stored and when data should be considered irrelevant or historic.
Validity: Validity refers to how accurate and correct the data is for its
intended use. Benefits from Big Data can be derived if the underlying data
are consistent in quality, metadata, and common definitions.
The general perception of Big Image Data Processing is that it deals with the processing of images that are huge in quantity. However, Big Image implies: (1) images that are large in quantity, as shown in Figure 4.3; (2) an individual image that is big with respect to dimension (M × N), as shown in Table 4.1; and (3) an individual image that is big with respect to size, that is, the amount of storage required to store it, as shown in Table 4.2.
In this chapter, the authors give details about how to handle these three types of "big image" for storage and processing in a distributed environment, along with the different methods, technologies, and implementation issues that they have experienced in making BIDP a success.
The objectives of this chapter are:
• To give different methods of handling Big Image Data.
4.2 BACKGROUND
(Table 4.1, excerpt) Sky: 100000 × 50000 pixels
The main concerns while dealing with the processing of Big Image Data are (1) storage of the given data when they cannot be stored in the existing infrastructure and (2) processing of the given data when they cannot be processed with the existing infrastructure. In some cases, both can be done with the existing infrastructure, but it is a very time-consuming process. To deal with this, the authors discuss different existing technologies in this section.
A. Hadoop: Hadoop is a part of the Apache project.33 It is an open-
source Java-based framework used for storage and processing of
Big Data in a distributed environment.
• Storage: Hadoop mainly contains two parts. One for storage
and another for processing. For storing the data, it uses a file
system known as Hadoop Distributed File System (HDFS).
If the amount of data cannot fit into the memory of a single
computer, a Hadoop cluster can be made with n number of
computers, which gives combined storage. The total storage
that can be contributed by all the computers in the cluster is
termed as HDFS. In this scenario, all the computers which are
a part of the cluster can access the data.
• Processing: As the data are stored in a distributed file system, a different programming paradigm is needed to process these data. So, Hadoop uses the MapReduce programming paradigm. When dealing with large data, the MapReduce paradigm is one of the best solutions to get the results in less time than doing it on a single system. This is a programming paradigm in which the execution takes place where the data reside. The execution takes place in three stages: Map, Shuffle & Sort, and Reduce. The Map stage takes the input as <Key, Value> pairs and produces the output also as <Key, Value> pairs. Then the Shuffle & Sort stage sorts these based on the "key". The Reducer then consolidates the work for each of the keys and produces the final output. For storing the data in intermediate steps, the distributed file system can be used. These data can be in any form: text, images, videos, log data, etc.
B. MATLAB with Matlab Distributed Computing Server (MDCS): The
MATLAB Distributed Computing Server (MDCS) allows users to
submit (from within MATLAB) sequential or parallel MATLAB
work with this, the different options that can be used are 0, cluster, and gcp (get current parallel pool), respectively, in the implementation code as the cluster setup. When "0" is used, the data will be taken from the local system and Matlab's MapReduce will do the entire job. When "cluster" is used as the option, Hadoop's MapReduce will be active and data can be accessed from HDFS. Last, if "gcp" is the option used, Matlab's parallel pool with MapReduce will be activated and the data can be taken from HDFS.
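The three execution modes can be selected with mapreducer; the sketch below is illustrative, and the Hadoop installation folder is an assumption.

% Hedged sketch: choosing the MapReduce execution environment
mapreducer(0);                         % local MATLAB MapReduce on local data
% hcluster = parallel.cluster.Hadoop('HadoopInstallFolder', '/usr/local/hadoop');
% mapreducer(hcluster);                % Hadoop MapReduce, data read from HDFS
% mapreducer(gcp);                     % MATLAB parallel pool MapReduce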
A. Working with Text Data: In this method, the authors created a 116-node Hadoop cluster. One of the nodes is considered the master node, whereas the remaining 115 nodes are considered slave nodes. These 116 nodes are situated in three different labs of their college. Table 4.3 shows the configuration of all the nodes in this cluster. The configured capacity becomes 28.13 TB with the help of this cluster. The authors have uploaded text data of a total size of 1.2 TB with a replication factor of 5 into HDFS to test the cluster performance. Then execution of the standard "word-count" example is carried out on these data. The time taken for completion is 8 min 8 sec. The authors have also uploaded the entire process to YouTube at the link: https://www.youtube.com/watch?v=CSryEIkNGdk. The sample code is given here.
%word_count.m
mapreducer(0);                      % run MapReduce locally within MATLAB
datafolder = '/input';
files = fullfile(datafolder, '*.txt');
ds = datastore(files, 'TextscanFormats', '%s', 'Delimiter', ' ', ...
    'ReadVariableNames', false, 'VariableNames', {'Word'});
output_folder = '/output';
outds = mapreduce(ds, @mapCountWords, @reduceCountWords, ...
    'OutputFolder', output_folder);
readall(outds)

%mapCountWords.m
function mapCountWords(data, info, intermKVStore)
x = table2array(data);
for i = 1:size(x, 1)
    disp([string(x(i, 1)) 1]);      % display the intermediate key-value pair (mapper output)
    add(intermKVStore, string(x(i, 1)), 1);
end
end

%reduceCountWords.m
function reduceCountWords(intermkey, intermValIter, outKVStore)
sum_occurences = 0;
while hasnext(intermValIter)
    sum_occurences = sum_occurences + getnext(intermValIter);
end
add(outKVStore, intermkey, sum_occurences);
end
Algorithm-1:
Begin
Step-1: Store all the images of the dataset into HDFS.
Step-2: Give all the images to MR_Job1, which gives the <FileName,
ImageData> as the output of this Job in the form of sequence
file.
FIGURE 4.9 (a) 1024 sized block (b) 2048 sized block of Figure 4.8.
4.5 CONCLUSION
The authors have shown the process of handling Big Image Data. Three different cases are shown: (1) handling a large number of images, (2) working with a big image of huge dimension, and (3) working with a big image of huge size. The authors have processed different standard image datasets that are large in quantity to achieve image retrieval tasks using the MapReduce paradigm by storing the data in a distributed file system. The different modes of parallel execution are discussed. The advantage of converting the files into sequence files is also discussed.
KEYWORDS
REFERENCES
1. Gonzalez, R. C.; Woods, E. W. Digital Image Processing, 4th ed.; Pearson: New
York, 2018.
2. Bovik, A. C. Handbook of Image and Video Processing; Academic Press, 2010.
3. Papyan, V.; Elad, M. Multi-scale Patch-based Image Restoration. IEEE Trans. Image
Proces. 2015, 25 (1), 249–261.
4. Manzke, R.; Meyer, C.; Ecabert, O.; Peters, J.; Noordhoek, N. J.; Thiagalingam, A.;
Reddy, V. Y.; Chan, R. C.; Weese, J. Automatic Segmentation of Rotational X-ray
Images for Anatomic Intra-procedural Surface Generation in Atrial Fibrillation
Ablation Procedures. IEEE Trans. Med. Imag. 2009, 29 (2), 260–272.
5. Yang, W.; Zhong, L.; Chen, Y.; Lin, L.; Lu, Z.; Liu, S.; Wu, Y.; Feng, Q.; Chen,
W. Predicting CT Image from MRI Data through Feature Matching with Learned
Nonlinear Local Descriptors. IEEE Trans. Med. Imag. 2018, 37 (4), 977–987.
6. Ma, X.; Schonfeld, D.; Khokhar, A. A General Two-dimensional Hidden Markov Model
and Its Application in Image Classification. 2007 IEEE Int. Conf. Image Process. 2007,
6, VI–41.
7. Cao, Z.; Simon, T.; Wei, S. E.; Sheikh, Y. Realtime Multi-person 2D Pose Estimation
Using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition; 2017; pp 7291–7299.
8. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D Object Detection Network
for Autonomous Driving. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition; 2017; pp 1907–1915.
29. Regina, A.; Peter, D.; Louise, K.; Andrew, L.; Jacqui, S. Thinking Creatively about Video Assignment: A Conversation with Penn Faculty. http://wic.library.upenn.edu/wicideas/facvideo.html
30. Raju, U. S. N.; Chaitanya, B.; Kumar, K. P.; Krishna, P. N.; Mishra, P. Video
Copy Detection in Distributed Environment. In 2016 IEEE Second International
Conference on Multimedia Big Data (BigMM); IEEE, 2016 Apr; pp 432–435.
31. BigMM 2020. http://bigmm2020.org/
32. International Trends in Video Surveillance- Public Transport Gets Smarter, 2018.
https://www.uitp.org/sites/default/files/cck-focus-papers-files/1809-Statistics%20
Brief%20-%20Videosurveillance-Final.pdf
33. Welcome to Apache™ Hadoop®!. http://hadoop.apache.org/
34. Getting Started with MapReduce. https://in.mathworks.com/help/matlab/import_
export/getting-started-with-mapreduce.html
35. Zaharia, M.; Chowdhury, M.; Franklin, M. J.; Shenker, S.; Stoica, I. Spark: Cluster
Computing with Working Sets (PDF). In USENIX Workshop on Hot Topics in Cloud
Computing (HotCloud); 2014.
36. Zaharia, M.; Chowdhury, M.; Das, T.; Dave, A.; Ma, J.; McCauly, M.; … Stoica, I.
Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster
Computing. In Presented as Part of the 9th {USENIX} Symposium on Networked
Systems Design and Implementation ({NSDI} 12); 2012; pp 15–28.
37. Xin, R. S.; Rosen, J.; Zaharia, M.; Franklin, M. J.; Shenker, S.; Stoica, I. Shark: SQL
and Rich Analytics at Scale. In Proceedings of the 2013 ACM SIGMOD International
Conference on Management of Data; 2013, Jun; pp 13–24.
38. Harris, D. 4 Reasons Why Spark Could Jolt Hadoop Into Hyperdrive. Gigaom, 2014. https://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive
39. Sarmad, I.; Mohammad-Reza, S. Unstructured Medical Image Query Using Big
Data- An Epilepsy Case Study. J. Biomed. Info. 2016, 59, 218–226.
40. Raju, U. S. N.; Suresh Kumar, K.; Haran, P.; Boppana, R. S.; Kumar, N. Content-
based Image Retrieval Using Local Texture Features in Distributed Environment. Int.
J. Wavelets, Multiresol. Info. Process. 2019, 1941001.
41. Lan, Z.; Taeho, J.; Kebin, L.; Xiang-Yang, L.; Xuan, D.; Jiaxi, G.; Yunhao, Liu. PIC:
Enable Large-scale Privacy Preserving Content-based Image Search on Cloud. IEEE
Trans. Parallel Dist. Syst. 2017, 25 (11), 3258–3271.
42. Le, D.; Zhiyu, L.; Yan, L.; Ling, H.; Ning, Z.; Qi, C.; Xiaochun, C.; Ebroul, I. A
Hierarchical Distributed Processing Framework for Big Image Data. IEEE Trans. Big
Data 2016, 2 (4), 297–309.
43. Das, T. K.; Chowdhary, C. L.; Gao, X. Z. Chest X-Ray Investigation: A Convolutional Neural Network Approach. J. Biomimetics, Biomater. Biomed. Eng. 2020, 45, 57–70.
44. Chowdhary, C. L.; Acharjya, D. P. Segmentation and Feature Extraction in Medical
Imaging: A Systematic Review. Procedia Comput. Sci. 2020, 167, 26–36.
45. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon,
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics 2020, 9 (4), 692.
46. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun. 2020.
47. Jiachen, Y.; Bin, J.; Baihua, L.; Kun, T.; Zhihan, L. A Fast Image Retrieval Method
Designed for Network Big Data. IEEE Trans. Ind. Info. 2017, 13 (5), 2350–2359.
CHAPTER 5
ABSTRACT
5.1 INTRODUCTION
The BoW model was first introduced in the text retrieval and categorization
domain where a document is described by a set of keywords and their
frequency of occurrence in the document. The same idea was applied to
the image domain and has been quite successful.74 Here, the idea is to
represent an image using a dictionary of different visual words. Images
are quite different from text documents in the sense that there is no natural
concept of a word in the case of images.4 Thus, there is a need to break down
the image into a list of visual elements. Moreover, as the number of possible
visual elements in an image could be enormous, these elements should be
discretized to form a visual word dictionary known as a codebook.
Vocabulary construction has been achieved mainly using two approaches: the
local, patch-based approach, or dense sampling,4,48 and the keypoint-based
approach, or sparse sampling.16,62,65 In the patch-based approach, the image
is divided into a number of equal-sized patches using a grid. Local
features are then computed for each patch separately. Keypoints are the
centers of salient patches generally located around the corners and edges.
Keypoints are also known as interest points and can be detected using
various region detectors such as the Harris–Laplace detector (corner-like
structures), Hessian-affine detector,79 Maximally stable extremal regions or
the Salient regions detector.55 Local features are then computed for each
interest point.
Some of the state-of-the-art local feature descriptors used for modeling
texture information include Scale Invariant Feature Transform (SIFT),53
Speeded Up Robust Features (SURF),5 Histogram of Oriented Gradients
(HOG),18 Local Ternary Pattern (LTP),78 and Discrete Cosine Transform
(DCT).15 Color hues and shape features have also been used as local feature
descriptors by some of the researchers. These local feature descriptors are
briefly described below.
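As a quick, hedged illustration of the two sampling strategies (not a reproduction of any cited implementation), the following Python sketch extracts SIFT descriptors with OpenCV both at detected keypoints and over a dense patch grid; the image path is a placeholder.

import cv2

def sparse_sift(gray):
    # Keypoint-based (sparse) sampling: detect interest points, then describe them.
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors                      # shape: (num_keypoints, 128)

def dense_sift(gray, step=16, patch_size=16.0):
    # Patch-based (dense) sampling: describe every cell of a regular grid.
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), patch_size)
                 for y in range(step // 2, gray.shape[0], step)
                 for x in range(step // 2, gray.shape[1], step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder image path
print(sparse_sift(img).shape, dense_sift(img).shape)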
After calculation of local features, the next step in the BoVW or N-gram
representation of an image is the vocabulary construction. Since an image
does not contain discrete visual words, a challenging task is to discover
meaningful visual words. This can be achieved by clustering local features
so that cluster centroids can be treated as visual words. Various clustering
algorithms such as Generalized Lloyd Algorithm (GLA), Pairwise Nearest
Neighbor Algorithm (PNNA) and K-means Algorithm have been widely
used for this purpose.92 However, GLA is computationally complex and
cannot guarantee an optimal codebook generation.92 On the other hand,
PNNA is more efficient than GLA but slightly inferior to GLA in terms
of optimality.91,92 Further, the K-means algorithm performs better than
the hierarchical algorithms in terms of accuracy and computation time. It
differs from the GLA in that the input for the k-means algorithm is a discrete
set of points rather than a continuous geometric region. This algorithm
partitions N local features into K clusters in which each feature
belongs to the cluster with the nearest mean. This is the most commonly
used algorithm for visual codebook generation.4,8,16,46,52,58,65,75,83,90
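A minimal sketch of this codebook step, assuming NumPy and scikit-learn and using random vectors in place of real SIFT descriptors, is given below; it is meant only to make the clustering and histogram-assignment steps concrete.

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=500, seed=0):
    # Cluster N local features into K visual words; the centroids form the codebook.
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(all_descriptors)

def bovw_histogram(image_descriptors, codebook):
    # Assign each local feature to its nearest visual word and count occurrences.
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()                # normalized word-frequency histogram

# Random 128-D vectors stand in for SIFT descriptors in this illustration.
rng = np.random.default_rng(0)
codebook = build_codebook(rng.normal(size=(5000, 128)), k=50)
print(bovw_histogram(rng.normal(size=(300, 128)), codebook)[:5])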
The approaches for vocabulary construction can be mainly grouped
under two main categories: global dictionary and sub-dictionary. If a single
dictionary of visual words is created using all the images in the collection, it
is called a global dictionary.4,18,52,56,62,75,79,90,91 On the contrary, the sub-dictionary
approach considers a subset of visual words that best represent a specific
image class and is also known as region-specific visual words. For example,
in diabetic retinopathy images, two sub-dictionaries related to lesion and
no-lesion classes can be separately created.33 Classification as well as
retrieval performance can be improved over the global dictionary approach
using the sub-dictionary approach.29,64
Creation of a visual N-gram codebook can be more challenging than the
BoVW codebook creation. This is due to the fact that as opposed to text,
an image can be read in many different directions (horizontal, vertical,
at an angle of ϴ degrees). Further, visual N-grams that have the same
order but different orientations may be related to the same pattern. One
such approach for generating rotation-invariant N-gram codebooks can
be seen in the work of López-Monroy et al.52 Moreover, as N increases,
the dictionary size grows tremendously if we consider all possible
combinations of visual words in all possible directions.
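To make this concrete, the sketch below builds visual bigrams from a grid of per-patch visual-word labels and merges each bigram with its reverse so that the same pair read in opposite directions maps to one entry; this is a simple stand-in for the orientation handling discussed above, not the rotation-invariant scheme of López-Monroy et al.

from collections import Counter

def visual_bigrams(word_grid):
    # word_grid: 2-D list of visual-word ids, one per image patch.
    counts = Counter()
    rows, cols = len(word_grid), len(word_grid[0])
    for r in range(rows):
        for c in range(cols):
            w = word_grid[r][c]
            if c + 1 < cols:   # horizontal neighbor
                counts[tuple(sorted((w, word_grid[r][c + 1])))] += 1
            if r + 1 < rows:   # vertical neighbor
                counts[tuple(sorted((w, word_grid[r + 1][c])))] += 1
    return counts

print(visual_bigrams([[3, 7, 3],
                      [7, 3, 7]]).most_common(3))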
[Figure: analogy between character N-grams in the text domain and pixel N-grams in the image domain]
Color features have been used for CBIR because they can be easily extracted
and are powerful descriptors for images. Color histograms representing
relative frequency of color pixels across the image are common for CBIR.
However, they only convey global image properties and do not represent
local color information. In the Color N-grams approach, an image has been
represented with respect to a codebook, which describes every possible
combination of a fixed number of coarsely quantized color hues.71 This
allows comparison of images based on shared adjacent color objects or
boundaries. N-gram samples were taken to be 25% of the total number of
pixels in an image. The dataset included 100 general color images of faces,
flowers, animals, cars, and aeroplanes. The results were compared with
the approach adopted by Faloutsos et al.21 The average rank of all relevant
images was reported to be 2.4 as compared to the 2.5 of the baseline.
Also the number of relevant images missed was 1.9 as compared to 2.1
of the baseline. The limitation of this study is that the quantization of the
hues does not match the sensitivity of the human color perception model.
Another limitation was the very small database used. However, further
work has demonstrated that this approach could also be used for very large
databases.70 Moreover, this approach is less sensitive to small spectral
differences and is not prone to color constancy problems.
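The sketch below illustrates only the general idea: hues are coarsely quantized with OpenCV and N-grams of adjacent quantized hues are counted from randomly sampled row positions, with the sampling fraction set to 25% as described above. It is not Rickman and Rosin's exact scheme.

import cv2
import numpy as np
from collections import Counter

def color_ngrams(bgr_image, bins=8, n=3, sample_frac=0.25, seed=0):
    # Coarsely quantize hues into `bins` levels (OpenCV hue range is 0..179).
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue_q = (hsv[:, :, 0].astype(int) * bins) // 180
    rows, cols = hue_q.shape
    counts = Counter()
    rng = np.random.default_rng(seed)
    num_samples = int(sample_frac * rows * cols)   # e.g., 25% of the pixels, as above
    for _ in range(num_samples):
        r = rng.integers(rows)
        c = rng.integers(0, cols - n + 1)
        counts[tuple(hue_q[r, c:c + n])] += 1      # N-gram of adjacent quantized hues
    return counts

# counts = color_ngrams(cv2.imread("example.jpg"))  # placeholder image path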
The concept of N-gram has been used to group perceptual shape features
to discover higher level semantic representation of an image.56 Here,
low-level shape features are extracted and perceptually grouped using the
Order Preserving Arctangent Bin (OPABS) algorithm advanced by Hu and
Gao. This is based on the perceptual curve partitioning and grouping (PCPG)
model.23 In this PCPG model, each curve is made up of Generic Edge
Tokens (GET) connected at Curve Partitioning Points (CPP). Each GET
is characterized by monotonic characteristics of its Tangent Function (TF)
set. The extracted perceptual shape descriptors are categorized as one of
eight generic edge segments.
Gao and Wang’s model is based on Gestalt’s theory of perceptual
organization which states that humans perceive the objects as a whole.
The authors define a shape N-gram as a continuous subsequence of GETs
connected at CPP points. There are three main cases of how the GETs
are connected at a CPP: the first is a curve segment connected to
another curve segment (CS–CS); the second is a line segment connected
to a line segment (LS–LS); and the third is a curve segment connected to a line
segment (CS–LS). Here, four N-gram-based perceptual feature vectors are
proposed, which encode local and global shape information in an image.
The Caltech256 dataset was used for classification experiments.27 Results
show that the combination of shape N-grams with a conventional SIFT
vocabulary achieves around 8% higher classification accuracy as compared
to the SIFT-based vocabulary alone.
Further, the development of CANDID (Comparison Algorithm for
Navigating Digital Image Database)37 was inspired by the N-gram approach
to document fingerprinting. Here, a global signature is derived from various
image features such as localized texture, shape, or color information. A
distance between probability density functions of feature vectors is used
to compare the image signatures. Global feature vectors represent a single
measurement over the entire image (e.g., dominant color, texture), whereas
the N-gram approach allows for the retention of information about the relative
occurrences of local features such as color, gray-scale intensity, or shape. The use
of probability density functions can reduce the problem of high dimensionality;
however, they are computationally more expensive than histogram-based
features.38 It is observed that subtracting a dominant background from
every signature prior to comparison does not have any effect when a true
distance function is used; whereas, with a similarity measure such as
nSim(I1, I2), dominant background subtraction has a dramatic effect. The
experiments were conducted on satellite data (Landsat TM, 100 images) and
pulmonary CT imagery (220 lung images from 34 patients). Experimental
results show good retrieval precision.
The visual word N-gram approaches are divided into keypoint-based and local
patch-based according to the sampling strategy used, while, based on the
local features used, these approaches are divided into color N-grams and
shape N-grams. Another concept, called character N-grams in the text
retrieval domain, has also recently been applied to image representation.
This is described below.
A new representation of images that goes further in the analogy with textual
data, called visual sentences, has been proposed by Tirilly et al.79 A visual
sentence allows visual words to be read in a certain order.
An axis is chosen for representing an image as a visual sentence, so that
(a) it is at an orientation fitting the orientation of the object in the image, and
(b) it is at a direction fitting the direction of the object. The keypoints are
then projected onto this axis using orthogonal projection. In this work,
SIFT descriptors are used and keypoint detection is achieved using the
Hessian-affine detector. The main problem is to decide the best axis for
projection. Experiments include five different axis configurations: one PCA
axis, two orthogonal PCA axes, 10 axes obtained by successive 10-degree
rotations of the main PCA axis, the X-axis, and finally one random axis. Results
show that the approach with the X-axis outperforms those with the PCA axis
on classification tasks.13,40,69 This is because the PCA axis is biased by
background clutter. However, the PCA axis takes spatial relations into account
and outperforms the random axis and the multiple-axis configurations.70,86
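A minimal sketch of the ordering step is shown below: keypoints carrying visual-word labels are projected onto either the image X-axis or the first PCA axis of the keypoint cloud and read off in order. This illustrates the idea rather than reproducing Tirilly et al.'s implementation.

import numpy as np

def visual_sentence(points, words, axis="x"):
    # points: (N, 2) keypoint coordinates; words: length-N visual-word labels.
    points = np.asarray(points, dtype=float)
    if axis == "pca":
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        order = np.argsort(centered @ vt[0])   # project onto the first PCA axis
    else:
        order = np.argsort(points[:, 0])       # project onto the image X-axis
    return [words[i] for i in order]

pts = [(10, 40), (3, 5), (25, 12)]
print(visual_sentence(pts, ["w7", "w2", "w9"], axis="x"))   # ['w2', 'w7', 'w9']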
TABLE 5.1 (Continued)

Author | Year | Model | Local features | Dataset | Advantages | Application
Kelly et al. | 1994 | CANDID | Five one-dimensional kernels | Pulmonary CT scans | Features invariant to rotation | Retrieval of pulmonary diseased cases
Rickman and Rosin | 1996 | Color N-grams | Color hues (hue, saturation, and intensity) | 100 color images | Robust to noise; rapid fuzzy matching of color images | Retrieval of color images
Soffer | 1997 | N×M grams | N×M grams, absolute count and frequency | Fingerprints, floorplans, comics, animals, etc. | Works well on simple images such as floorplans, music notes, comics | Image categorization
Zhu et al. | 2000 | Key-block | | CDB-500 web color images divided into 41 groups; TDB-2240 gray-scale images | Superior to color histogram, color coherent vector, Haar and Daubechies wavelets | Image retrieval
Battiato et al. | 2013 | N-grams | SIFT | Flickr: 3300 images; UKBench: 10,200 images | Exploits coherence between feature spaces not only in the image representation step but also during codebook creation; outperforms BoVP | Near-duplicate image detection
Monroy et al. | 2013 | N-gram combination | Discrete cosine transform | Histopathological dataset: 1417 images of 7 categories | 1 + 2 grams improve accuracy by 6% over BoVW | Histopathological classification for basal cell carcinoma
Mukanova et al. | 2014 | Shape N-grams | SIFT, perceptual shape features | Wang: 100 images of 10 categories; Caltech 256: 10 classes each with 80 images | Improves accuracy by 8% as compared to traditional BoVW | Classification of images
KEYWORDS
REFERENCES
1. Aggarwal, N.; Agrawal, R. First and Second Order Statistics Features for Classification
of Magnetic Resonance Brain Images. J. Sign. Info. Process. 2012, 3 (2).
2. Amin, J.; Sharif, M.; Gul, N.; Yasmin, M.; Shad, S. A. Brain Tumor Classification
Based on DWT Fusion of MRI Sequences Using Convolutional Neural Network.
Pattern Recogn. Lett. 2020, 129, 115–122.
3. Angelov, P.; Sperduti, A. Challenges in Deep Learning. Paper presented at the
ESANN, 2016.
4. Avni, U.; Goldberger, J.; Sharon, M.; Konen, E.; Greenspan, H. Chest X-ray
Characterization: From Organ Identification to Pathology Categorization. Paper
presented at the Proceedings of the international conference on Multimedia information
retrieval, 2010.
5. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up Robust Features (SURF).
Comput. Vision Image Understand. 2008, 110 (3), 346–359.
6. Bay, H.; Fasel, B.; Gool, L. V. Interactive Museum Guide: Fast and Robust Recognition
of Museum Objects. Paper presented at the Proceedings of the first international
workshop on mobile vision, 2006.
7. Bosch, A.; Zisserman, A.; Muñoz, X. Scene Classification via pLSA. Comput.
Vision–ECCV 2006 2006, 517–530.
8. Samantaray, S.; Deotale, R.; Chowdhary, C. L. Lane Detection Using Sliding Window
for Intelligent Ground Vehicle Challenge. In Innovative Data Communication
Technologies and Application; Springer: Singapore, 2021; pp 871–881.
9. Bouachir, W.; Kardouchi, M.; Belacel, N. Improving Bag of Visual Words Image
Retrieval: A Fuzzy Weighting Scheme for Efficient Indexation. Paper presented
at the Signal-Image Technology & Internet-Based Systems (SITIS), 2009 Fifth
International Conference on, 2009b.
10. Brodatz, P. Textures: A Photographic Album for Artists and Designers; 1966.
Images downloaded in July 2009.
11. Caicedo, J. C.; Cruz, A.; Gonzalez, F. A. Histopathology Image Classification Using
Bag of Features and Kernel Functions. Artif. Intell. Med. 2009, 126–135.
12. Chen, Y.; Wang, J. Z.; Krovetz, R. An Unsupervised Learning Approach to Content-
based Image Retrieval. Paper presented at the Signal Processing and Its Applications,
2003. Proceedings. Seventh International Symposium on, 2003.
13. Chowdhary, C. L. 3D Object Recognition System Based on Local Shape Descriptors
and Depth Data Analysis. Rec. Patents Comput. Sci. 2019, 12 (1), 18–24.
14. Climer, J. Overcoming Pose Limitations of a Skin-Cued Histograms of Oriented
Gradients Dismount Detector Through Contextual Use of Skin Islands and Multiple
Support Vector Machines, 2011.
15. Cruz-Roa, A.; Díaz, G.; Romero, E.; González, F. A. Automatic Annotation of Histopathological
Images Using a Latent Topic Model Based on Non-negative Matrix
Factorization. J. Pathol. Info. 2011, 2.
16. Csurka, G.; Dance, C.; Fan, L.; Willamowski, J.; Bray, C. Visual Categorization with
Bags of Keypoints. Paper presented at the Workshop on statistical learning in computer
vision, ECCV, 2004.
17. Dai, L.; Sun, X.; Wu, F.; Yu, N. Large Scale Image Retrieval with Visual Groups.
Paper presented at the Image Processing (ICIP), 2013 20th IEEE International
Conference on, 2013.
18. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. Paper
presented at the Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE
Computer Society Conference on, 2005.
19. Deselaers, T.; Ferrari, V. Global and Efficient Self-similarity for Object Classification
and Detection. Paper presented at the Computer Vision and Pattern Recognition
(CVPR), 2010 IEEE Conference on, 2010.
20. Ess, A.; Leibe, B.; Schindler, K.; Gool, L. V. A Mobile Vision System for Robust Multi-
person Tracking. Paper presented at the Computer Vision and Pattern Recognition,
2008. CVPR 2008. IEEE Conference on, 2008.
21. Faloutsos, C.; Barber, R.; Flickner, M.; Hafner, J.; Niblack, W.; Petkovic, D.; Equitz,
W. Efficient and Effective Querying by Image Content. J. Intell. Info. Syst. 1994, 3
(3–4), 231–262.
22. Feng, J.; Liu, Y.; Wu, L. Bag of Visual Words Model with Deep Spatial Features for
Geographical Scene Classification. Comput. Intell. Neurosci. 2017, 2017.
23. Gao, Q.-G.; Wong, A. Curve Detection Based on Perceptual Organization. Pattern
Recogn. 1993, 26 (7), 1039–1046.
24. Ghoneim, A.; Muhammad, G.; Hossain, M. S. Cervical Cancer Classification Using
Convolutional Neural Networks and Extreme Learning Machines. Future Gen.
Comput. Syst. 2020, 102, 643–649.
25. Goodrum, A.; Spink, A. Image Searching on the Excite Web Search Engine. Info.
Process. Manage. 2001, 37 (2), 295–311.
26. Greenspan, H.; Van Ginneken, B.; Summers, R. M. Guest Editorial Deep Learning
in Medical Imaging: Overview and Future Promise of an Exciting New Technique.
IEEE Trans. Med. Imag. 2016, 35 (5), 1153–1159.
27. Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset, 2007.
28. Huang, H.; Ji, Z.; Lin, L.; Liao, Z.; Chen, Q.; Hu, H.; . . . Tong, R. Multiphase Focal
Liver Lesions Classification with Combined N-gram and BoVW. In Innovation in
Medicine and Healthcare Systems, and Multimedia; Springer, 2019; pp 81–91.
29. Huang, M.; Yang, W.; Yu, M.; Lu, Z.; Feng, Q.; Chen, W. Retrieval of Brain Tumors
with Region-specific Bag-of-visual-words Representations in Contrast-enhanced MRI
Images. Comput. Math. Methods Med. 2012, 2012, 280538. doi:10.1155/2012/280538
30. Hussain, M.; Khan, S.; Muhammad, G.; Berbar, M.; Bebis, G. Mass Detection
in Digital Mammograms Using Gabor Filter Bank. Paper presented at the Image
Processing (IPR 2012), IET Conference on, 2012.
31. Jansohn, C.; Ulges, A.; Breuel, T. M. Detecting Pornographic Video Content by
Combining Image Features with Motion Information. Paper presented at the
Proceedings of the 17th ACM international conference on Multimedia, 2009.
32. Jégou, H.; Douze, M.; Schmid, C. Improving Bag-of-features for Large Scale Image
Search. Int. J. Comput. Vision 2010, 87 (3), 316–336.
33. Jelinek, H. F.; Pires, R.; Padilha, R.; Goldenstein, S.; Wainer, J.; Rocha, A. Quality
Control and Multi-lesion Detection in Automated Retinopathy Classification Using
a Visual Words Dictionary. Paper presented at the Intl. Conference of the IEEE
Engineering in Medicine and Biology Society, 2013.
34. Juan, L.; Gwun, O. A Comparison of Sift, PCA-sift and Surf. Int. J. Image Process.
(IJIP) 2009, 3 (4), 143–152.
35. Jurie, F.; Triggs, B. Creating Efficient Codebooks for Visual Recognition. Paper
presented at the Computer Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on, 2005.
36. Kanaris, I.; Kanaris, K.; Houvardas, I.; Stamatatos, E. Words Versus Character
N-grams for Anti-spam Filtering. Int. J. Artif. Intell. Tools 2007, 16 (6), 1047–1067.
37. Kelly, P. M.; Cannon, T. M. CANDID: Comparison Algorithm for Navigating
Digital Image Databases. Paper presented at the Scientific and Statistical Database
68. Rahman, M. M.; Antani, S. K.; Thoma, G. R. Biomedical CBIR Using “bag of
keypoints” in a Modified Inverted Index. Paper presented at the Computer-Based
Medical Systems (CBMS), 2011 24th International Symposium on, 2011.
69. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun., 2020.
70. Rickman, R. M.; Stonham, T. J. Content-based Image Retrieval Using Color Tuple
Histograms. Paper presented at the Electronic Imaging: Science & Technology, 1996.
71. Rickman, R.; Rosin, P. Content-based Image Retrieval Using Colour N-grams. Paper
presented at the Intelligent Image Databases, IEE Colloquium on, 1996.
72. Shen, L.; Lin, J.; Wu, S.; Yu, S. HEp-2 Image Classification Using Intensity Order
Pooling Based Features and Bag of Words. Pattern Recogn. 2014, 47 (7), 2419–2427.
73. Sheshadri, H.; Kandaswamy, A. Experimental Investigation on Breast Tissue Classification
Based on Statistical Feature Extraction of Mammograms. Comput. Med. Imag.
Graph. 2007, 31 (1), 46–48.
74. Sivic, J.; Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in
Videos; USA, 2003; pp 1470–1477.
75. Sivic, J.; Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching
in Videos. Paper presented at the Computer Vision, 2003. Proceedings. Ninth IEEE
International Conference on, 2003.
76. Smeulders, A. W.; Worring, M.; Santini, S.; Gupta, A.; Jain, R. Content-based Image
Retrieval at the End of the Early Years. Pattern Analy. Mach. Intell., IEEE Trans.
2000, 22 (12), 1349–1380.
77. Suen, C. Y. N-gram Statistics for Natural Language Understanding and Text Processing.
Pattern Analy. Mach. Intell., IEEE Trans. 1979, 2, 164–172.
78. Tan, X.; Triggs, B. Enhanced Local Texture Feature Sets for Face Recognition Under
Difficult Lighting Conditions. Image Process., IEEE Trans. 2010, 19 (6), 1635–1650.
79. Tirilly, P.; Claveau, V.; Gros, P. Language Modeling for Bag-of-visual Words
Image Categorization. Paper presented at the Proceedings of the 2008 international
conference on Content-based image and video retrieval, 2008.
80. Tsai, C.-F. Bag-of-Words Representation in Image Annotation: A Review. ISRN Artif.
Intell. 2012, 2012, 1–19. doi:10.5402/2012/376804
81. van de Sande, K. E.; Gevers, T.; Snoek, C. G. Empowering Visual Categorization
with the GPU. IEEE Trans. Multimedia 2011, 13 (1), 60–70.
82. Wang, J.; Li, Y.; Zhang, Y.; Xie, H.; Wang, C. Bag-of-features Based Classification of
Breast Parenchymal Tissue in the Mammogram Via Jointly Selecting and Weighting
Visual Words. Paper presented at the Image and Graphics (ICIG), 2011 Sixth
International Conference on, 2011.
83. Wang, S.; McKenna, M.; Wei, Z.; Liu, J.; Liu, P.; Summers, R. M. Visual Phrase
Learning and Its Application in Computed Tomographic Colonography. Medical
Image Computing and Computer-Assisted Intervention–MICCAI 2013; Springer,
2013; 243–250.
84. Wang, W.; Liang, D.; Chen, Q.; Iwamoto, Y.; Han, X.-H.; Zhang, Q.; . . . Chen, Y.-W.
Medical Image Classification Using Deep Learning. Deep Learning in Healthcare;
Springer, 2020; pp 33–51.
85. Xiao, J.; Ehinger, K. A.; Hays, J.; Torralba, A.; Oliva, A. Sun Database: Exploring a
Large Collection of Scene Categories. Int. J. Comput. Vision 2014, 1–20.
86. Yanagihara, R. T.; Lee, C. S.; Ting, D. S. W.; Lee, A. Y. Methodological Challenges
of Deep Learning in Optical Coherence Tomography for Retinal Diseases: A Review.
Transl. Vision Sci. Technol. 2020, 9 (2), 11–11.
87. Yang, W.; Lu, Z.; Yu, M.; Huang, M.; Feng, Q.; Chen, W. Content-based Retrieval
of Focal Liver Lesions Using Bag-of-visual-words Representations of Single- and
Multiphase Contrast-enhanced CT Images. J. Digit Imag. 2012, 25 (6), 708–719.
doi:10.1007/s10278-012-9495-1
88. Zhang, J.; Xie, Y.; Wu, Q.; Xia, Y. Medical Image Classification Using Synergic
Deep Learning. Med. Image Analy. 2019, 54, 10–19.
89. Zhang, Z.; Cao, C.; Zhang, R.; Zou, J. Video Copy Detection Based on Speeded Up
Robust Features and Locality Sensitive Hashing. Paper presented at the Automation
and Logistics (ICAL), 2010 IEEE International Conference on, 2010.
90. Zheng, Q.-F.; Wang, W.-Q.; Gao, W. Effective and Efficient Object-based Image
Retrieval Using Visual Phrases. Paper presented at the Proceedings of the 14th
annual ACM international conference on Multimedia, 2006.
91. Zhu, L.; Rao, A.; Zhang, A. Advanced Feature Extraction for Keyblock-based Image
Retrieval. Info. Syst. 2002, 27 (8), 537–557.
92. Zhu, L.; Zhang, A.; Rao, A.; Srihari, R. Keyblock: An Approach for Content-based
Image Retrieval. Paper presented at the Proceedings of the eighth ACM international
conference on Multimedia, 2000.
CHAPTER 6
ABSTRACT
6.1 INTRODUCTION
Brain MR images are among the most commonly used image types in
the field of biomedical image processing. Today, many diseases, such as
cancer and schizophrenia (SZ), can be diagnosed by scientists from these images.
However, the duration of these manual diagnoses and the accuracy of the
diagnosis may vary depending on the person's experience. Therefore,
computer-aided studies are needed in this field, and there are many papers
in the literature that have studied this subject.19,38,40
Arunachalam and Savarimuthu proposed a computer-aided brain tumor
detection and segmentation method.7 The proposed system has stages of
enhancement, conversion, feature extraction, and classification. Brain
images are enhanced using shift-invariant shearlet transformation (SIST).
Brain tumor detection is a difficult task because the brain images contain
large variations in shape and density. Shanmuga Priya and Valarmathi
focused on edema and tumor segmentation based on skull extraction and
kernel-based fuzzy c-means (FCM) approach.49 The clustering process was
developed by combining spatial information-based multiple kernels. Sajid
et al. presented a deep learning-based method for brain tumor segmentation
using different MRIs.47 The proposed convolutional neural network (CNN)
architecture uses a patch-based approach and takes local and contextual
information into account when estimating the output tag. Patil and Hamde
proposed a computer-aided system based on monogenic signal analysis for
the recognition of brain tumor images.42 Textural identifiers from different
monogenic components were obtained using a completed local binary pattern
and a gray-level co-occurrence matrix. Kebir et al. presented a complete and
fully automated MRI brain tumor detection and segmentation methodology
using the Gaussian mixture model, FCM, active contour, wavelet transform,
and entropy segmentation methods.23 The proposed algorithm consists of
skull extraction, tumor segmentation, and detection sections.
As can be seen from the above studies, brain images are difficult to
analyze due to their anatomy. Therefore, it has often been found more useful to
combine several methods. EAs are also frequently used as alternative methods
in many studies with brain images. In this chapter, studies using EAs on 2-D
brain MR images are presented and various results are discussed.
6.2 BACKGROUND
There are many areas in which the EAs are used in the literature. One
of these areas is the brain MRI processing. In the following section, the
studies between 2014 and 2019 using 2-D brain MR images and EAs are
mentioned.
In this chapter, the studies between 2014 and 2019, in which EAs were
applied to 2-dimensional brain MR images, were examined. In order to
analyze the effects of these studies on brain MR images, the studies were
compared according to the methods used, publication years, datasets and
accuracy rates.
EAs have been used at many different stages in the processing of brain
MR images. Tables 6.1 through 6.7 provide information
about which methods are used at which stages. In Table 6.1, the
studies using PSO are listed by year and stage of use. It can be seen
that PSO is used in most of the image processing stages. In some studies,
the PSO algorithm is used in combination with other EAs; in others,
modified versions of the PSO algorithm were presented.
TABLE 6.1 Stages of Brain MR Image Processing in Studies Using PSO.
Year Reference Method(s) Stage of use
2019 [36] PSOBFO Segmentation
2014 [29] PSO Image denoising
2018 [17] PSO Feature reduction
2015 [32] PSO Segmentation
2018 [37] MPSO Classification
2018 [41] PSO Segmentation
2019 [9] PSO Image filtering
2015 [43] BF QPSO Image registration
2019 [44] PSONN Feature extraction
2017 [28] classical PSO, DPSO, or FODPSO Segmentation
2018 [22] MPSO Thresholding
2018 [31] BPSO Classification
2019 [34] MASCA–PSO Feature extraction
TABLE 6.2 Stages of Brain MR Image Processing in Studies Using DE.
Year Reference Method(s) Stage of use
2017 [48] MOEA/D-DE Segmentation
2018 [37] MDE Classification
TABLE 6.3 Stages of Brain MR Image Processing in Studies Using BFO.
Year Reference Method(s) Stage of use
2017 [56] BFO Segmentation
2014 [1] GA BFO Segmentation
2018 [57] BFO Segmentation
2019 [36] PSO BFO Segmentation
TABLE 6.4 Stages of Brain MR Image Processing in Studies Using GA.
Year Reference Method(s) Stage of use
2016 [11] GA Segmentation
2018 (Hemanth et al., 2018) Three GA combinations Feature selection
2016 [4] GA Image denoising
2016 [15] GA Segmentation
2015 [20] NIFCMGA Segmentation
2019 [6] GA Classification
2014 [30] Real coded genetic algorithm Segmentation
2014 [1] GA BFO Segmentation
2019 [46] MedGA (medical image preprocessing based on GAs) Thresholding
2014 [5] GA Preprocessing
2017 [27] GA Feature selection
2019 [8] GA Feature selection
2019 [33] GA Segmentation
2018 [50] GA Feature selection
2019 [52] GA Feature selection
TABLE 6.5 Stages of Brain MR Image Processing in Studies Using ACO.
Year Reference Method(s) Stage of use
2019 [25] ACO Thresholding
2016 [10] FABC (fuzzy-based artificial bee colony optimization) Segmentation
TABLE 6.6 Stages of Brain MR Image Processing in Studies Using BBO.
Year Reference Method(s) Stage of use
2016 [59] BBO Classification
2016 [60] ARCBBO Classification
TABLE 6.7 Stages of Brain MR Image Processing in Studies Using CSA.
Year Reference Method(s) Stage of use
2017 [39] CSA Thresholding
2017 [18] CSA Image enhancement
Table 6.8 lists the remaining EAs and the stage at which each is used
in the related studies. According to Tables 6.1–6.8, it is obvious that GA and
PSO are the most used algorithms in the mentioned years. The algorithms in
these tables have often been combined with other methods, and in some studies
the methods have been modified rather than used in their original form.
TABLE 6.8 Stages of Brain MR Image Processing in Studies Using Other Evolutionary Algorithms.
Year Reference Method(s) Stage of use
2016 [21] FA Feature selection
2017 [51] Bat optimization (BO) Image enhancement
2019 [3] GWO Classification
2017 [2] ACRO Thresholding
2014 [35] HGOA Segmentation
2018 [55] WOA Classification
2016 [53] SFLA Feature extraction
2016 (Panda et al., 2016) ASSO Thresholding
2019 [16] Social group optimization (SGO) Thresholding
2018 [26] AWDO Segmentation
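Many of the studies in Tables 6.1–6.8 apply an EA at the thresholding or segmentation stage. As a generic illustration of that pattern, and not of any specific surveyed method, the sketch below uses a basic PSO to search for a gray-level threshold that maximizes Otsu's between-class variance on an image histogram.

import numpy as np

def otsu_variance(hist, t):
    # Between-class variance when the 256-bin histogram is split at threshold t.
    p = hist / hist.sum()
    w0, w1 = p[:t].sum(), p[t:].sum()
    if w0 == 0 or w1 == 0:
        return 0.0
    levels = np.arange(len(p))
    mu0 = (levels[:t] * p[:t]).sum() / w0
    mu1 = (levels[t:] * p[t:]).sum() / w1
    return w0 * w1 * (mu0 - mu1) ** 2

def pso_threshold(hist, n_particles=20, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(1, 255, n_particles)     # particle positions (candidate thresholds)
    v = np.zeros(n_particles)                # particle velocities
    pbest, pbest_f = x.copy(), np.array([otsu_variance(hist, int(t)) for t in x])
    gbest = pbest[pbest_f.argmax()]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, 1, 255)
        f = np.array([otsu_variance(hist, int(t)) for t in x])
        better = f > pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmax()]
    return int(gbest)

# hist, _ = np.histogram(mr_slice, bins=256, range=(0, 256))  # mr_slice: an 8-bit image
# threshold = pso_threshold(hist)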
6.6 CONCLUSIONS
In this chapter, the studies that use the EAs in the processing of 2D brain
MR images are examined. As a result of the studies, it has been found that
EAs are utilized in many image processing stages such as segmentation,
tumor detection, feature extraction, and classification. When the details of
these stages are examined, it is seen that EAs are used in a hybrid way
with other methods. In addition to the original versions of these algorithms,
various modified versions have been included in the studies. In the
mentioned studies, these algorithms are generally used for the optimization
and improvement of other methods. This optimization has sometimes
helped to determine the optimum parameters of the method with which it is
combined, sometimes to improve the classification performance, and
sometimes to obtain more accurate results. As can be seen from the studies
examined, EAs have significantly contributed to the studies performed with
brain images in the field of biomedical image processing as in other areas.
KEYWORDS
• bioinspired algorithms
• biomedical image processing
• genetic algorithm
• particle swarm optimization
• bacteria foraging optimization
• brain MR segmentation
• brain MR classification
REFERENCES
1. Agrawal, S.; Panda, R.; Dora, L. A Study on Fuzzy Clustering for Magnetic Resonance
Brain Image Segmentation Using Soft Computing Approaches. Appl. Soft Comput.
2014, 24, 522–533.
2. Agrawal, S.; Panda, R.; Samantaray, L.; Abraham, A. A Novel Automated Absolute
Intensity Difference Based Technique for Optimal MR Brain Image Thresholding. J
King Saud University Comput. Inf. Sci. 2017.
3. Ahmed, H. M.; Youssef, B. A. B.; Elkorany, A. S.; Elsharkawy, Z. F.; Saleeb, A. A.;
Abd El-Samie, F. Hybridized Classification Approach for Magnetic Resonance Brain
Images Using Gray Wolf Optimizer and Support Vector Machine. Multimedia Tools
Appl. 2019, 78, 27983–28002.
4. Akdemir Akar, S. Determination of Optimal Parameters for Bilateral Filter in Brain
MR Image Denoising. Appl. Soft Comput. 2016, 43, 87–96.
5. Akusta Dagdeviren, Z.; Oguz, K.; Cinsdikici, M. G. Three Techniques for Automatic
Extraction of Corpus Callosum in Structural Midsagittal Brain MR Images: Valley
Matching, Evolutionary Corpus Callosum Detection and Hybrid method. Eng. Appl.
Artif. Intell. 2014, 31, 101–115.
6. Anaraki, A. K.; Ayati, M.; Kazemi, F. Magnetic Resonance Imaging-Based Brain
Tumor Grades Classification and Grading via Convolutional Neural Networks and
Genetic Algorithms. Biocybern. Biomed. Eng. 2019, 39(1), 63–74.
7. Arunachalam, M.; Savarimuthu, S. R. An Efficient and Automatic Glioblastoma Brain
Tumor Detection Using Shift-Invariant Shearlet Transform and Neural Networks.
Imaging Syst. Technol. 2017, 27(3), 216–226.
8. Aswathy, S. U.; Devadhas, G. G.; Kumar, S. S. Brain Tumor Detection and
Segmentation Using a Wrapper Based Genetic Algorithm for Optimized Feature Set.
Cluster Comput. 2019, 22, 13369–13380.
9. Bhateja, V.; Nigam, M.; Bhadauria, A. S.; Arya, A.; Zhang, E. Y. Human Visual
System Based Optimized Mathematical Morphology Approach for Enhancement of
Brain MR Images. J Ambient Intell. Humaniz. Comput. 2019.
10. Bose, A.; Mali, K. Fuzzy-Based Artificial Bee Colony Optimization for Gray Image
Segmentation. Signal Image Video Process. 2016, 10(6), 1089–1096.
11. Chandra, G. R.; Rao, K. R. H. Tumor Detection in Brain Using Genetic Algorithm.
Procedia Comput. Sci. 2016, 79, 449–457.
12. Chowdhary, C. L. 3D Object Recognition System Based on Local Shape Descriptors
and Depth Data Analysis. Recent Pat. Comput. Sci. 2019, 12(1), 18–24.
13. Chowdhary, C. L.; Acharjya, D. P. Segmentation and Feature Extraction in Medical
Imaging: A Systematic Review. Procedia Comput. Sci. 2020, 167, 26–36.
14. Das, T. K.; Chowdhary, C. L.; Gao, X. Z. Chest X-Ray Investigation: A Convolutional
Neural Network Approach. In Journal of Biomimetics, Biomaterials and Biomedical
Engineering; Trans Tech Publications Ltd, 2020; Vol. 45, pp 57–70.
15. De, S.; Bhattacharyya, S.; Dutta, P. Automatic Magnetic Resonance Image Segmentation
by Fuzzy Intercluster Hostility Index Based Genetic Algorithm: An Application.
Appl. Soft Comput. 2016, 47, 669–683.
16. Dey, N.; Rajinikanth, V.; Shi, F.; Tavares, J. M. R.S.; Moraru, L.; Karthik, K. A.;
Lin, H.; Kamalanand, K.; Emmanuel, C. Social-Group-Optimization Based Tumor
Evaluation Tool for Clinical Brain MRI of Flair/Diffusion-Weighted Modality.
Biocybernet. Biomed. Eng. 2019, 39(3), 843–856.
17. Ding, W.; Lin, C. T.; Chen, S.; Zhang, X.; Hu, B. Multiagent-Consensus-MapReduce-
Based Attribute Reduction Using Co-Evolutionary Quantum PSO for Big Data
Applications. Neurocomput. 2018, 272, 136–153.
18. Gong, T.; Fan, T.; Pei, L.; Cai, Z. Magnetic Resonance Imaging-Clonal Selection
Algorithm: An Intelligent Adaptive Enhancement of Brain Image with an Improved
Immune Algorithm. Eng. Appl. Artif. Intell. 2017, 62, 405–411.
19. Hemanth, D. J.; Anitha, J. Modified Genetic Algorithm Approaches for Classification
of Abnormal Magnetic Resonance Brain Tumor Images. Appl. Soft Comput. J. 2019,
75, 21–28.
20. Huang, C. W.; Lin, K. P.; Wu, M. C.; Hung, K. C.; Liu, G. S.; Jen, C. H. Intuitionistic
Fuzzy c-Means Clustering Algorithm with Neighborhood Attraction in Segmenting
Medical Image. Soft Comput. 2015, 19(2), 459–470.
21. Jothi, G.; Inbarani, H. H. Hybrid Tolerance Rough Set–Firefly Based Supervised
Feature Selection for MRI Brain Tumor Image Classification. Appl. Soft Comput.
2016, 46, 639–651.
22. Kaur, T.; Saini, B. S.; Gupta, S. A Joint Intensity and Edge Magnitude-Based
Multilevel Thresholding Algorithm for the Automatic Segmentation of Pathological
MR Brain Images. Neural Comput. Appl. 2018, 30(4), 1317–1340.
23. Kebir, S. T.; Mekaoui, S.; Bouhedda, M. A Fully Automatic Methodology for MRI
Brain Tumor Detection and Segmentation. Imaging Sci. J. 2019, 67(1), 42–62.
24. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.;
Yoon, B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics 2020, 9(4), 692.
25. Khorram, B.; Yazdi, M. A New Optimized Thresholding Method Using Ant Colony
Algorithm for MR Brain Image Segmentation. J Digital Imaging. 2019, 32(1), 162–174.
26. Kotte, S.; Pullakura, R. K.; Injeti, S. K. Optimal Multilevel Thresholding Selection
for Brain MRI Image Segmentation Based on Adaptive Wind Driven Optimization.
Measurement 2018, 130, 340–361.
27. Kumar, S.; Dabas, C.; Godara, S. Classification of Brain MRI Tumor Images: A
Hybrid Approach. Procedia Comput. Sci. 2017, 122, 510–517.
28. Lahmiri, S. Glioma Detection Based on Multi-Fractal Features of Segmented Brain
MRI by Particle Swarm Optimization Techniques. Biomed. Signal Process. Control.
2017, 31, 148–155.
29. Mandal, A. D.; Chatterjee, A.; Maitra, M. Robust Medical Image Segmentation
Using Particle Swarm Optimization Aided Level Set Based Global Fitting Energy
Active Contour Approach. Eng. Appl. Artif. Intell. 2014, 35, 199–214.
30. Manikandan, S.; Ramar, K.; Iruthayarajan, M. W.; Srinivasagan, K. G. Multilevel
Thresholding for Segmentation of Medical Brain Images Using Real Coded Genetic
Algorithm. Measurement 2014, 47, 558–568.
31. Manohar, L.; Ganesan, K. Diagnosis of Schizophrenia Disorder in MR Brain Images
Using Multi-objective BPSO Based Feature Selection with Fuzzy SVM. J Med. Biol.
Eng. 2018, 38(6), 917–932.
32. Mekhmoukh, A.; Mokrani, K. Improved Fuzzy C-Means Based Particle Swarm
Optimization (PSO) Initialization and Outlier Rejection with Level Set Methods for MR
Brain Image Segmentation. Comput. Methods Progr. Biomed. 2015, 122(2), 266–281.
33. Méndez, I. A. R.; Ureña, R.; Herrera-Viedma, E. Fuzzy Clustering Approach for
Brain Tumor Tissue Segmentation Inmagnetic Resonance Images. Soft Comput.
2019, 23(20), 10105–10117.
34. Mishra, S.; Sahu, P.; Senapati, M. R. MASCA–PSO Based LLRBFNN Model and
Improved Fast and Robust FCM Algorithm for Detection and Classification of Brain
Tumor from MR Image. Evolut. Intell. 2019, 12(4), 647–663.
35. Nabizadeh, N.; John, N.; Wright, C. Histogram-Based Gravitational Optimization
Algorithm on Single MR Modality for Automatic Brain Lesion Detection and
Segmentation. Expert Syst. Appl. 2014, 41(17), 7820–7836.
36. Narayanan, A.; Rajasekaran, M. P.; Zhang, Y.; Govindaraj, V.; Thiyagarajan, A.
Multi-Channeled MR Brain Image Segmentation: A Novel Double Optimization
Approach Combined with Clustering Technique for Tumor Identification and Tissue
Segmentation. Biocybernet. Biomed. Eng. 2019, 39(2), 350–381.
37. Nayak, D. R.; Dash, R.; Majhi, B. Discrete Ripplet-II Transform and Modified PSO
Based Improved Evolutionary Extreme Learning Machine for Pathological Brain
Detection. Neurocomput. 2018, 282, 232–247.
38. Nayak, D. R.; Dash, R.; Majhi, B. An Improved Pathological Brain Detection System
Basedon Two-Dimensional PCA and Evolutionary Extreme Learning Machine. J
Med. Syst. 2018, 42(19).
39. Oliva, D.; Hinojosa, S.; Cuevas, E.; Pajares, G.; Avalos, O.; Gálvez, J. Cross Entropy
Based Thresholding for Magnetic Resonance Brain Images Using Crow Search
Algorithm. Expert Syst. Appl. 2017, 79, 164–180.
40. Panda, R.; Agrawal, S.; Samantaray, L.; Abraham, A. An Evolutionary Gray Gradient
Algorithm for Multilevel Thresholding of Brain MR Images Using Soft Computing
Techniques. Appl. Soft Comput. 2017, 50, 94–108.
41. Pham, T. X.; Siarry, P.; Oulhadj, H. Integrating Fuzzy Entropy Clustering with an
Improved PSO for MRI Brain Image Segmentation. Appl. Soft Comput. 2018, 65,
230–242.
42. Patil, D. O.; Hamde, S. T. Brain MR Imaging Tumor Detection Using Monogenic
Signal Analysis-Based Invariant Texture Descriptors. Arab. J Sci. Eng. 2019, 44(11),
9143–9158.
43. Pradhan, S.; Patra, D. RMI Based Non-Rigid Image Registration Using BF-QPSO
Optimization and P-Spline. AEU Int. J Electron. Commun. 2015, 69(3), 609–621.
44. Rajesh, T.; Suja Mani Malar, R.; Geetha, M. R. Brain Tumor Detection Using
Optimization Classification Based on Rough Set Theory. Cluster Comput. 2019, 22(6),
13853–13859.
45. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun. 2020.
46. Rundo, L.; Tangherloni, A.; Cazzaniga, P.; Nobile, M. S.; Russo, G.; Gilardi, M.
C.; Vitabile, S.; Mauri, G.; Besozzi, D.; Militello, C. A Novel Framework for MR
Image Segmentation and Quantification by Using MedGA. Comput. Methods Progr.
Biomed. 2019, 176, 159–172.
47. Sajid, S.; Hussain, S.; Sarwar, A. Brain Tumor Detection and Segmentation in MR
Images Using Deep Learning. Arab. J Sci. Eng. 2019, 44(11), 9249–9261.
48. Sarkar, S.; Das, S.; Chaudhuri, S. S. Multi-Level Thresholding with a Decomposition-
Based Multi-Objective Evolutionary Algorithm for Segmenting Natural and Medical
Images. Appl. Soft Comput. 2017, 50, 142–157.
49. Shanmuga Priya, S.; Valarmathi, A. Efficient Fuzzy C-Means Based Multilevel Image
Segmentation for Brain Tumor Detection in MR Images. Design Automation Embed.
Syst. 2018, 22(1–2), 81–93.
50. Sharif, M.; Tanvir, U.; Munir, E. U.; Khan, M. A.; Yasmin, M. Brain Tumor
Segmentation and Classification by Improved Binomial Thresholding and Multi-Features
Selection. J Ambient Intell. Humaniz. Comput. 2018, 1–20.
51. Singh, M.; Verma, A.; Sharma, N. Bat Optimization Based Neuron Model of Stochastic
Resonance for the Enhancement of MR Images. Biocybernet. Biomed. Eng. 2017,
37(1), 124–134.
CHAPTER 7
ABSTRACT
7.1 INTRODUCTION
The scene graph, proposed in Ref. 13, is a type of graph representing the relations
between objects inside an image. In the graph, each node represents
an object in the image. A leaf node can be physical, geometric, or
material depending on the object type. We can use the scene graph to
build image-related applications in the Thai
language. The derived scene graph data set can be an alternative for Thai
developers to create tasks such as Thai image captioning applications. The
method contains the following steps, given an image as an input of the
generator. The input image is given to the caption generator and the output
sentence is fed to the scene graph parser to put the information in the scene
graph format. Finally, the scene graph in English language is translated to
Thai language by a neural machine translator.
7.2 BACKGROUND
The scene graph is a graph structure that describes relations or attributes
between two objects.17 There are various ways to develop a scene graph
generator.26 The first approach is to generate captions using a convolutional
neural network (CNN) and a recurrent neural network (RNN).14,20 Then, the
caption is used as the input of the graph parser to convert it into the scene
graph. The second approach is to use object detection together with attribute
extraction as well as relation extraction to generate the scene graph.
The generator utilizes the object detection scheme and then uses feature
extraction to convert the information into the scene graph.3
For two basic tasks such as object detection and object recognition, the
COCO data set is one of the popular data sets utilized.25 CNN is a popular
model family for these tasks.
7.3 METHODOLOGY
Figure 7.3 describes the overall steps of this research. The scene graph
generator is made up of three elements: caption generator, scene graph
parser, and translation machine.6,4 First, the caption generator model from
"Show and Tell: A Neural Image Caption Generator," a public
research project on Github,28 is used. The structure of the caption
generator includes an image encoder, a deep convolutional neural network
initialized from an Inception_v3 checkpoint, and hidden layers
such as Long Short-Term Memory (LSTM). As an initialization for the caption
generator model, we use the image caption generator27 based on the COCO 2014
data set. We use caption 2014 and image 2014 for the training and testing data
sets and for evaluation (256 records, 4 and 8, respectively).
An Inception_v3 checkpoint is used as the pretrained weights, and the
training is then done with the COCO 2014 dataset. The model is trained for
1,000,000 epochs, which takes around one week on our machine with the
following specifications: 8-core Intel(R) Xeon(R) CPU E5-2680 0 @
2.70 GHz, 126 GB RAM, a 1 TB hard disk, and two Tesla K40c GPUs
(12,206 MiB memory usage, 235 W power).
The scene graph parser17 is used to convert the English sentences into
the scene graph. The scene graph format includes relations and attributes in
JSON format. The Stanford Scene Graph Parser is used; the model is implemented
to support a rule-based parser and a classifier-based parser. At this point,
the machine translator is used. In our case, Py-translate is selected.
Note that there are two alternatives for applying the translator, and the
choice may affect the accuracy.31 In the first approach, the translator
machine takes the scene graph, which is the output of the scene graph parser, as
its input, like a word-for-word translation, shown in the top of Figure 7.4.
In the first step, the sentence is separated into a list of words. Then, each
word is put into the translator machine. Finally, the resulting words are
mapped into a result sentence. The second option is to apply the translator
to the sentence which is the direct output of the caption generator,
like a sentence-for-sentence translation, as in the bottom of Figure 7.4.
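A simplified sketch of the two strategies is given below; translate() is a placeholder for the actual translation call (Py-translate in our setup, whose exact API is not reproduced here), and the scene graph is reduced to a plain list of tokens for illustration.

def translate(text, source="en", target="th"):
    # Stand-in for the machine-translation service (e.g., Py-translate).
    raise NotImplementedError("placeholder for the translation call")

def word_for_word(scene_graph_tokens):
    # First approach: translate each scene-graph token separately, then map the
    # translated words back into one result string.
    return " ".join(translate(token) for token in scene_graph_tokens)

def sentence_for_sentence(caption):
    # Second approach: translate the caption (the caption generator's direct
    # output) as a whole sentence, then parse the translated Thai sentence.
    return translate(caption)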
7.4 EVALUATION
Different evaluation metrics are used for each model. Then, we calculate
the overall system score as a weighted sum. For captioning, we use the
accuracy based on the Microsoft COCO Caption Evaluation module2 to
evaluate the performance of the caption generator. COCO val 2014 is used as the
testing set, which includes 4369 records.
The evaluation module includes the metrics BLEU, METEOR, ROUGE,
and CIDEr, reported in Table 7.2. For the machine translator, we use
NLPMetrics.7 Its sub-modules are SPICE, GLEU, WER, and TER.
Bilingual Evaluation Understudy (BLEU) is a measure that
counts the number of overlapping words between the resulting translation
and the ground-truth translation, applied to N-grams.
GLEU, also called Google-BLEU, is the minimum of BLEU precision and
recall applied to N-grams. Recall is calculated as the number of matching
N-grams divided by the total number of N-grams. Word error rate (WER)
is used in speech recognition, mainly for counting substitutions; it is calculated
from the number of erroneous words in the predicted sentence compared with
the reference sentence. Translation edit rate (TER) counts the number of
edited words, that is, word deletions, additions, and substitutions. The
score is calculated from the minimum number of edits divided by the average
length of the reference text.
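As an illustration of the edit-distance idea behind WER (and, with a different normalization, TER), the following sketch computes a word error rate with dynamic programming; NLPMetrics' own implementation may differ in details.

def word_error_rate(reference, hypothesis):
    # Levenshtein distance over word tokens, divided by the reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("there is a tree in the park", "there is tree in a park"))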
The Microsoft COCO Caption Evaluation includes BLEU, METEOR,
ROUGE-L, CIDEr, and SPICE. Metric for Evaluation of Translation with
Explicit Ordering (METEOR) is the harmonic mean of weighted unigram
precision and recall, which includes stemming and synonym matching.
Recall-Oriented Understudy for Gisting Evaluation (ROUGE) is modeled
after BLEU but gives more attention to recall than precision over N-grams.
Consensus-based Image Description Evaluation (CIDEr)
measures the similarity of resulting sentences against a set of ground-truth
sentences, focusing on sentence similarity through the notions of grammaticality
and saliency. Semantic Propositional Image Caption Evaluation
(SPICE) is the F1-score over scene graph tuples.
Our experiment includes the two options for using the translator machine.
Thus, the evaluation module receives its input in two ways: it gets a
predicted sentence either from a word-for-word translation
or from a sentence-for-sentence translation.
The module takes the predicted Thai sentence produced by the translator
machine and compares it with a reference
sentence. In addition, we use a Thai language parser, the tokenizer
from PyThaiNLP, to separate a predicted sentence into words with a
space in between before feeding it to the evaluation model.
The results presented in Table 7.3 are based on TALPCo.10 The TALPCo
project was developed based on a main language, Japanese, which was then
translated to other Asian languages. The translation of the data set
into English was done by Japanese undergraduate students who had studied at
an international junior school, and it was rechecked by a native British English
speaker. The second version of this project supports the Thai language. The
data set was rechecked by a Thai-major student at Tokyo University.
Only the first 100 records of the evaluation data set are used. The evaluation
data set is preprocessed; our preprocessing step removes characters such as
dots from the sentences. An example from the TALPCo data set is "There is a
tree in the park.", which is translated to Thai as "มีต้นไม้อยู่ในสวนสาธารณะ".
From Table 7.2, the highest value for the caption generator is CIDEr,
which is 0.996; the second is 0.720 from BLEU-4. For the translator model,
from Table 7.3, the highest evaluation value is WER, which is 5.000; the
second is 1.1082 from the TER average with the second approach.
The overall system score is calculated from the two parts with equal
weights. We use the CIDEr score to represent the caption generator
and the GLEU score to represent the translator machine. The overall
system score for our first approach, a word-for-word translation,
is 0.53925; for the second, a sentence-for-sentence translation, it is 0.6028.
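The arithmetic behind these overall scores is simply an equally weighted sum, as sketched below; the GLEU value shown is back-solved from the reported overall score and is used only to illustrate the formula, not taken from Table 7.3.

def overall_score(cider, gleu, w_caption=0.5, w_translator=0.5):
    # Equally weighted combination of the caption score and the translator score.
    return w_caption * cider + w_translator * gleu

print(overall_score(0.996, 0.2096))   # about 0.60, the order of the reported value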
The code and results are available on Github at https://github.com/Bell001/
scene-graph-project.git. The code contains an implementation example divided
into the following folders:
|--CoreNLP
|--application-captures
|--helper
|--measures-model
|--translator-machine
|-- NLPMetrics
|--test
|--TALPCo
|--translate-word
|--process-model
|--python-packages
|--test-more
|--Trans_data_result
7.5 APPLICATION
FIGURE 7.5 Messenger Chatbot user response with scene graph and sentences (Example I).
In Figure 7.5, at (1), the image used to create the scene graph is submitted.
Then the chatbot replies in JSON format. The response contains objects
and relationships, where each word is translated into Thai. In (3), the Thai
sentence in the response means "a close look of a cat drinking from the
cup," or "ภาพระยะใกล้ของแมวที่ดื่มจากถ้วย" in Thai.
FIGURE 7.6 Messenger Chatbot user response with scene graph and sentences (Example II).
In Figure 7.6, the image submitted at (1) receives a response at (2) containing
three objects and two relationships. The whole sentence is translated as "a
group of people sitting around the table," or "กลุ่มคนที่นั่งรอบๆ โต๊ะอาหาร" in
Thai.
7.6 CONCLUSION
We present a method for Thai scene graph generation and its usage in
a chatbot. The scene graph contains objects and relations extracted from
the given image. The steps are (1) image captioning, (2) scene graph
parsing, and (3) machine translation. The performance is measured for each
step and the overall score is computed as a weighted sum of the scores. In our
experiment, there are two approaches to using the translator machine. The
overall score from a sentence-for-sentence translation is higher
than that from a word-for-word translation. The translator evaluation score indicates
how correctly the system can translate; the second approach yields better
performance as a translator machine.
The results, scene graphs in the Thai language, show that our scene graph
generation model has a limitation in the accuracy of the scene
graph in Thai. The caption generator is the sub-model that
has the main impact on the result. Our model uses the best sentence, which
is the output of the caption generator, to convert into a scene graph. In our
experiment, this model still cannot cover general information due to the
limited training data. With this model, the demonstration works on the small
group of detected objects and their relations derived from the
COCO data set. If a larger data set is available, the approach can be used
to generate sentences with larger classes of objects.
KEYWORDS
• chatbot
• scene graph
• deep learning
• caption generation
REFERENCES
18. Li, Y.; Ouyang, W.; Zhou, B.; Wang, K.; Wang, X. Scene Graph Generation from
Objects, Phrases and Region Captions. ICCV 2017, 2017.
19. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S. E.; Fu, C.; Berg, A. C.
SSD: Single Shot Multibox Detector, 2015. CoRR, abs/1512.02325. http://arxiv.org/
abs/1512.02325
20. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun. 2020.
21. Redmon, J.; Divvala, S. K.; Girshick, R. B.; Farhadi, A. You Only Look Once:
Unified, Real-Time Object Detection, 2015. CoRR, abs/1506.02640. http://arxiv.org/
abs/1506.02640
22. Ren, S.; He, K.; Girshick, R. B.; Sun, J. Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks, 2015. CoRR, abs/1506.01497.
http://arxiv.org/abs/1506.01497
23. Schuster, S.; Krishna, R.; Chang, A.; Fei-Fei, L.; Manning, C. D. Generating
Semantically Precise Scene Graphs from Textual Descriptions for Improved Image
Retrieval. Proceedings of the Fourth Workshop on Vision and Language, 2015.
24. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S. E.; Anguelov, D.; Rabinovich,
A. Going Deeper with Convolutions, 2014. CoRR, abs/1409.4842. http://arxiv.org/
abs/1409.4842
25. Tsung-Yi, L.; Michael, M.; Serge, B.; James, H.; Pietro, P.; Deva, R.; Lawrence, Z. C.
Microsoft Coco: Common Objects in Context. In Computer Vision – eccv 2014; Fleet,
D., Tomas, P., Bernt, S., Tinne, T., Eds.; Springer International Publishing: Cham,
2014; pp 740–755.
26. Tsutsui, S.; Kumar, M. Scene Graph generation from Images, 2017. http://vision.soic.
indiana.edu/b657/sp2016/projects/stsutsui/paper.pdf.
27. Vinyals, O.; et al. Show and Tell: Lessons Learned from the 2015 MS-COCO Image
Captioning Challenge. IEEE Transac. Patt. Anal. Machine Intell. 2016, 39(4), 652–663.
28. Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D. Show and Tell: A Neural Image Caption
Generator, 2014. CoRR, abs/1411.4555. http://arxiv.org/abs/1411.4555
29. Wang, Y. S.; Liu, C.; Zeng, X.; Yuille, A. Scene Graph Parsing as Dependency Parsing.
NAACL 2018, 2018.
30. Yang, J.; Lu, J.; Lee, S.; Batra, D.; Parikh, D. Graph R-CNN for Scene Graph
Generation. ECCV 2018, 2018.
31. Yngve, V. H. Sentence-for-Sentence Translation. Mechanical Translation 1955, 2(2),
29–37. http://www.mt-archive.info/MT-1955-Yngve.pdf
32. Zhang, C. Deep Learning for Land Cover and Land Use Classification (Doctoral
Dissertation), 2018. DOI: 10.17635/lancaster/thesis/428
CHAPTER 8
ABSTRACT
Credit rating firms like D&B, A.M. Best Company, etc., usually give scores
to companies based on their bank records and on whether they have failed to repay
loans, since they only look into the financial details of whether
the company defaulted or not in repaying its loans. Text/sentiment
analysis improves the decisions made by banks before lending to
their customers and also enables businesses to grow profitably by providing
information-based intelligence tools. The mission of this work has been
to extract unstructured data from websites (e.g., Glassdoor, Indeed)
housing company reviews. The objective is to automate the extraction of
the aspects and their corresponding sentiments and to cumulate a credit score.
The proposed prototype accepts text input manually or via a text file
stacking reviews. These reviews are tokenized into words and categorized
into nouns and adjectives. The adjectives are assigned respective class
values/polarity (in binary form). The overall goal is to make use of company
information stored on the Internet, since it is otherwise unaccounted for. This kind of
information has been extracted from public websites like kanoon.com,
Glassdoor, Indeed.in, etc. So, the rating is now based not only on the bank
records but also on how the company treats its employees, sanitation
issues, pay problems if any, perks given to the employees,
etc. Reviews given by consumers about the company's performance
in the market are also considered for processing. The sentiments/adjectives
attached to all noun forms are recorded and given binary score values.
The cumulative score of a sentence or a paragraph is then stored in a
database. Then a pivot table is generated, which displays a frequency table
of the noun forms and the respective sentiments used to describe them.
The number of times a noun form has a positive/negative sentiment gets
recorded and a score gets displayed to the user. Accuracy values for both
text analysis algorithms have been analyzed, and the best one, that is, the
TextBlob Analyzer, has been put to use since it had accuracy values above
95% for positive sentiments and 91% for mixed sentiments.
8.1 INTRODUCTION
8.1.1 OBJECTIVE
This work considers freely available, unstructured data and uses it for scoring. Sometimes investors do not trust the credit score provided by the CRAs.
One reason is that companies believe the score might not be up to date, and that it could have been tampered with. Hence, this work looks at getting the most recent data and generating a score that could add value to the general credit score. The proposed system uses Python and its Natural Language Toolkit (NLTK) corpus to perform text/sentiment analysis. It makes use of data frames from the Pandas package, which helps to create tables out of lists and dictionaries and provides better access to the results when they are computed. The PorterStemmer from the NLTK package is used to stem words; for instance, "dread," "dreadful," and "dreadfulness" are all reduced to "dread" when "dread" is computed as a sentiment. Thus, the mission of the work has been to extract unstructured data from websites housing company reviews (i.e., Glassdoor, Indeed.in). The objective is to automate the extraction of the aspects and their corresponding sentiments and to cumulate a credit score. The score produced will help add value to the credit scores generated by the CRAs.
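A minimal sketch of the stemming step described above, assuming NLTK is installed:

```python
# Stemming with NLTK's PorterStemmer: related word forms reduce to one stem.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["dread", "dreadful", "dreadfulness"]:
    print(word, "->", stemmer.stem(word))   # all three reduce to "dread"
```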
Firstly, the system needs to take care of organizing the scraped data. The unstructured data collected need to be properly ordered, since the text sentiments describe particular aspects. Next, the sentiment has to be determined, followed by consideration of the interaction factor. Figure 8.2 shows the data flow for the calculation of accuracies from the training text files. The proposed system obtains accuracies from two training text files, pos_sentiment.txt and mixed_sentiment.txt, which have been trained using sentiment analyzers, namely Naïve Bayes and a rule-based analyzer.
In this system, the user is required to input the number of pages he/she would like to scrape. Once the required pages are set as input, the automation of page scraping begins, so the latest reviews provide valuable insight. Once the scraping is done, the next step is to perform sentiment analysis on the file containing the scraped data. This system was designed to help bring up the value of what the CRAs offer. As noted in several literature surveys, investors do not entirely trust the CRAs.7–8 The data collected and reflected might sometimes be a couple of years old, from when the company indeed had a good/bad credit score.9 Otherwise, the agencies end up giving incomplete information that is not very useful for predicting the company's future.10–11 The user can take advantage of the automated scraping to collect and analyze data for practically any company among the many registered on the website indeed.com.12–20
The system uses Python 3.6 and other necessary packages. This design is intended to add value to the traditional methods of credit score calculation. With considerable thought put into action, the solution makes use of the freely available unstructured data on websites such as Indeed and Glassdoor. The system thus helps users gain insight into company performance not only quantitatively but also qualitatively. This work includes two test files: one contains a fully stacked repository of positive sentiments, and the other is stacked with test data for mixed sentiments. These data were collected by automating the scraping process, arranging the text reviews in descending order of rating for the positive reviews and in ascending order for the mixed reviews. These two datasets are used to obtain accuracies for the two algorithms used, namely TextBlob and the Vader Sentiment Analyzer. Proceeding to the sentiment analysis module, the system imports the packages required for both of the mentioned algorithms.
The Vader Sentiment Analyzer package returns output for any given text, mostly as float values ranging from −1 (fully negative) to +1 (fully positive), with the parameters "pos," "neg," "neu," and "compound." This work, however, uses the compound value for better estimation, because a sentence can have a mix of sentiments on one or more aspects. A sample set of sentences requiring compound classification is given in the examples below.
Examples:
• I do not like the work experience here, but I am pleased about the salary.
• The salary is not that good, but the free food menu does cease to surprise me.
A score for sentences like these deserves neither a highly positive score nor a negative score. Hence, the system makes use of the compound score, which yields a net score value that does justice to the output. Further, a graph is used to represent the polarity scores collected over a period of time (as on the Indeed.in website). In addition, the proposed system ensures that only reviews posted in the current financial year are scraped. This is done to make sure the scoring is based on recent data, since data that are too old may hamper the results.
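A minimal sketch of the compound scoring described above, assuming the vaderSentiment package is installed:

```python
# Vader returns 'pos', 'neg', 'neu', and a normalized 'compound' score in [-1, 1].
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
sentences = [
    "I do not like the work experience here, but I am pleased about the salary.",
    "The salary is not that good, but the free food menu does cease to surprise me.",
]
for sentence in sentences:
    scores = analyzer.polarity_scores(sentence)   # dict with keys neg/neu/pos/compound
    print(round(scores["compound"], 3), sentence)
```

For mixed sentences such as these, the compound value stays close to zero rather than swinging to either extreme, which is why it is used as the net score.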
8.3.2 TEXTBLOB
FIGURE 8.4 Created soup for the positive sentiments text file.
Sample reviews with the best rating appearing first are shown in Figure 8.5, and a screenshot of the reviews in ascending order, with the lowest-rated reviews appearing first, is also captured; this is the code used for scraping the reviews for training in ascending order. Next, the system looks at the reviews scraped in ascending order for the mixed sentiments; a sample screenshot of these reviews is shown in Figure 8.6. Further, the prototype looks at the accuracy obtained by using the Naïve Bayes algorithm with TextBlob. Of the two TextBlob measures, polarity and subjectivity, this system uses the polarity to obtain the accuracies. The subjectivity can be used to determine how many fact-oriented or opinion-oriented sentiments exist in the input file.
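A minimal sketch of TextBlob scoring, assuming textblob and its NLTK corpora are installed; note that the default analyzer returns polarity/subjectivity, while the NaiveBayesAnalyzer returns a pos/neg classification with probabilities:

```python
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

review = "The pay is decent but the management does not listen to employees."

# Default (pattern-based) analyzer: polarity in [-1, 1], subjectivity in [0, 1]
blob = TextBlob(review)
print(blob.sentiment.polarity, blob.sentiment.subjectivity)

# Naive Bayes analyzer (trained on a movie-review corpus)
nb_blob = TextBlob(review, analyzer=NaiveBayesAnalyzer())
print(nb_blob.sentiment.classification, nb_blob.sentiment.p_pos)
```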
As observed, the first pair of results are the accuracies for the two approaches used to calculate the sentiment scores, namely TextBlob and Vader. Using the Vader Sentiment Analyzer, the system obtained the accuracies represented in Figure 8.8, which shows the accuracy achieved.
As can be observed, the polarities for the mixed sentiments lean slightly to the positive side, closer to zero. The reason is that, as reviews were scraped for training, it was observed that many employees, despite giving the company a bad rating, also provide good pointers to compensate and to keep their identity safe.
The output of the scraped file, created by automating the scraping process while ensuring that only reviews from the current year are scraped for data quality, is shown in Figure 8.9. Next, the system measures the accuracies for both approaches. Making a simple comparison, it can be seen that the accuracy of TextBlob is higher than that of Vader. The reason is that Vader uses a rule-based implementation for sentiment analysis, whereas TextBlob uses the Naive Bayes classifier algorithm, which is more efficient. This is the scraped review file on which the sentiment analysis is performed. The scores from these data are used to create graphs and give the consumer a complete insight by showing how often the graph reaches the positive and negative peaks. With "0" marked as the centre of the y-axis, this makes the output even more apparent.
Figure 8.11 represents the graphical outcome produced with the TextBlob package using the Naive Bayes classifier algorithm. It is observable in the graphical representation that the positive and negative peaks are pointed, thus giving more accurate results.
These data are collected with the year of scraping in mind each time. For instance, if the user is scraping in the year 2019, only reviews from that particular year are scraped as the user enters the page numbers in multiples of 20. Figure 8.12 represents the output showing the dates and page numbers of the reviews scraped.
The sentiment module reads lines from the text file indeedreviews.txt. Further, it performs the sentiment analysis using two algorithms: one that is rule-based, the Vader Sentiment Analyzer, and another that works on top of the Naïve Bayes classifier, called the TextBlob classifier.
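A minimal sketch of this scoring-and-plotting step, assuming the scraped reviews are stored one per line in indeedreviews.txt:

```python
import matplotlib.pyplot as plt
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
with open("indeedreviews.txt", encoding="utf-8") as f:
    reviews = [line.strip() for line in f if line.strip()]

compound_scores = [analyzer.polarity_scores(r)["compound"] for r in reviews]

plt.plot(compound_scores)
plt.axhline(0, color="gray")        # "0" marks the centre of the y-axis
plt.xlabel("Review index")
plt.ylabel("Compound polarity")
plt.show()
```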
8.5.6 LIMITATION
The proposed system does not have a login module, since the functionality it provides is limited to scraping and text analysis. The system was made to aid and bring value to the credit scores calculated by the CRAs. As of now, the proposed work only assists in scraping company reviews from the Indeed.in website. A limitation of this system is that it is still not capable of scraping websites such as Glassdoor.com, since access is forbidden and Python packages are not permitted to do so. Secondly, the project does not treat data integrity as a priority: it does not limit the amount of data scraped, and large volumes of freely available unstructured data are collected for analysis. Adding permissions to these files would not be a necessity.
For inference, it is safe to say that the proposed system will bring some value to customers looking for legitimate investments. More apt information can be gained from the outputs. Once the peak values from the
KEYWORDS
• credit rating
• sentiment analysis
• Glassdoor
• TextBlob analysis
• cumulative score
REFERENCES
1. Zhang, M. L.; Peña, J. M.; Robles, V. Feature Selection for Multi-Label Naive Bayes
Classification [J]. Inf. Sci. 2009, 179(19), 3218–3229.
2. Field, B. J. Towards Automatic Indexing: Automatic Assignment of Controlled-
Language Indexing and Classification from Free Indexing. J Doc. 1975, 31, 246–265.
3. Ittner, D. J.; Lewis, D. D.; Ahn, D. D. Text Categorization of Low Quality Images.
Symposium on Document Analysis and Information Retrieval Las Vegas, NV. ISRI;
Univ. of Nevada: Las Vegas, 1995; pp 301–315.
4. Joachims, T. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for
Text Categorization. Machine Learning: Proceedings of the Fourteenth International
Conference, 1997; pp 143–151.
5. Lewis, D. D.; Ringuette, M. A Comparison of Two Learning Algorithms for Text
Categorization. Third Annual Symposium on Document Analysis and Information
Retrieval, 1994; pp 81–93.
6. Ng, H. T.; Goh, W. B.; Low, K. L. Feature Selection, Perceptron Learning, and a
Usability Case Study for Text Categorization. Proceedings of the 20th Annual
ABSTRACT
9.1 INTRODUCTION
trained, the input frame's width and height dimensions are expanded gradually. As a result, the output layer's dimensions map from (1 × 1) to an aspect ratio comparable to that of the new, larger input. This can be perceived as trimming a large input image into squares of the model's initial input size (64 × 64) and identifying the objects in each of those squares.11,17
We review some of the research on on-road vehicle detection and lane detection as follows:
Sun et al. (2006) presented a survey of vision-based on-road vehicle detection systems, an important component of driver-assistance systems, highlighting several prominent prototypes designed over the preceding 15 years. They discussed Hypothesis Generation (HG) methods, namely (1) knowledge-based (including edge-based), (2) stereo-based, and (3) motion-based methods, and Hypothesis Verification (HV) methods, namely (1) template-based and (2) appearance-based methods, along with a critique of each. Moreover, the effectiveness of optical sensors in detecting on-road vehicles is discussed. Furthermore, vision-based vehicle detection methods, with special reference to the monocular and stereo-vision domains in the last decade, have been discussed.12 Later, a concise review was carried out on vehicle detection and vehicle type classification by processing videos from traffic surveillance cameras.13
Song et al. (2019) propose a vision-based vehicle detection system that can be employed for counting vehicles on highways. This research proposes a segmentation approach to separate the road surface from the image, classify it into a remote area and a proximal area, and subsequently identify the dimensions and location of each vehicle. Next, the Oriented FAST and Rotated BRIEF (ORB) algorithm is employed to locate the vehicle trajectories.14
An exhaustive study of vehicle detection in dynamic conditions, in which visual data are processed using a feature representation method known as object proposal methods, has been presented by Sakhare et al. (2020).15 Inspired by the capability and usage of CNNs in analyzing huge image data,16 Leung et al. (2019) experimented with vehicle detection in insufficient-illumination and nighttime environments, where the objects in photographs are blurry and darkened, using deep learning techniques.
9.3.1 DATASET
The data for investigation are gathered from Udacity, which provides labeled data of 9000 images containing vehicles and another 9000 images in which vehicles are not present, with all images of size (64 × 64). The dataset is an instance of the GTI Vehicle Image Database and the KITTI Vision Benchmark Suite,10 and samples are extracted from the project video graphs. A sample of images from the dataset is shown in Figure 9.1.
The data consist of 17,760 samples of colored images with a resolution of (64 × 64) pixels. The dataset has been partitioned into a training set comprising 90% of the data (15,984 samples) and a validation set of 10% (1776 samples) in order to realize a balanced division; an imbalanced split would be a dominant factor later while training and testing the deep learning model and may cause bias toward a particular class.
9.3.2 FLOWCHART
9.3.3 ARCHITECTURE
The system and its underlying components are represented in Figure 9.3. The CNN model makes use of the Rectified Linear Unit (ReLU) activation function in the convolution layers, whereas the sigmoid function is used to compute the output at the output layer. The use of the ReLU function
The generated feature map is employed in the next step in order to trim the input image. This is exhibited in Figure 9.6 as multiple feature maps from the convolutional layer.
The dataset is split into the training set (90%, 15,984 samples) and validation
set (10%, 1776 samples).
A neural network implementing a CNN is designed with the objective of classifying the images into car and non-car classes. The fully convolutional network parameters are presented in Table 9.1, which shows the structure of the CNN and its learning parameters. Here, "Conv" represents a convolution layer, and all pooling operations are performed using max pooling. Features of the images at different levels are extracted in both the convolution and pooling layers, and a total of 1,347,585 parameters are elicited and trained in the training phase.
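A minimal Keras sketch of a CNN of the kind described (ReLU convolutions, max pooling, and a sigmoid output for the car/non-car decision); the layer sizes here are illustrative and do not reproduce Table 9.1 exactly:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # car vs. non-car
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)
model.summary()
```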
After training for 20 epochs, the model can be employed to make a prediction on a random sample (Fig. 9.8). Additionally, the same network trained on our 64 × 64 images can be used to detect cars anywhere in the frame: it scales to whatever the input is, producing a heat map output, and consequently bounding boxes can be drawn on the hot positions.
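The heat-map idea described above can be sketched as follows, assuming `model` is the trained 64 × 64 classifier and `frame` is a BGR video frame; the window, stride, and threshold values are illustrative:

```python
import cv2
import numpy as np

def detect_cars(model, frame, window=64, stride=32, threshold=0.5):
    """Slide the 64x64 classifier over the frame, accumulate a heat map,
    and draw bounding boxes on the hot positions."""
    h, w = frame.shape[:2]
    heatmap = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            crop = frame[y:y + window, x:x + window]
            prob = float(model.predict(crop[np.newaxis] / 255.0, verbose=0)[0, 0])
            if prob > threshold:
                heatmap[y:y + window, x:x + window] += prob
    # Draw boxes around connected hot regions
    mask = (heatmap > threshold).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        bx, by, bw, bh = cv2.boundingRect(c)
        cv2.rectangle(frame, (bx, by), (bx + bw, by + bh), (0, 255, 0), 2)
    return frame, heatmap
```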
In this section, we experiment with the detection of the two lane lines on the road for each frame using computer vision techniques (Fig. 9.9).
It is witnessed from Figure 9.10 that the accuracy of the model increases drastically after the 1st epoch; after the 2nd epoch, however, the accuracy increases only gradually.
In Figure 9.11, the value of the loss decreases drastically after the 1st epoch and then decreases gradually after the 4th epoch.
9.6 CONCLUSION
KEYWORDS
• Vehicle Detection
• Lane Detection
• Autonomous Unmanned Vehicle
• Convolutional Neural Networks (CNN)
• Automatic driving
REFERENCES
1. Hadi, R.; Sulong, G.; George, L. Vehicle Detection and Tracking Techniques: A
Concise Review. Signal Image Process. Int. J 2014, 5(1), 1–12.
2. Bertozzi, M.; Broggi, A.; Fascioli, A. Vision-Based Intelligent Vehicles: State of the
Art and Perspectives. Robot. Auton. Syst. 2000, 32, 1–16.
3. Gill, N. K.; Sharma, A. Vehicle Detection from Satellite Images in Digital Image
Processing. Int. J Comput. Intell. Res. 2017, 13(5), 697–705.
4. Chandrasekhar, U.; Das, T. K. A Survey of Techniques for Background Subtraction and
Traffic Analysis on Surveillance Video. Univers. J Appl. Comput. Sci. Technol. 2011,
1(3), 107–113.
5. Liu, M.; Hua, W.; Wei, Q. Vehicle Detection Using Three-Axis AMR Sensors Deployed
Along Travel Lane Markings. IET Intell. Transp. Syst. 2017, 11(9), 581–587.
6. Alletto, S.; Serra, G.; Cucchiara, R. Video Registration in Egocentric Vision Under Day
and Night Illumination Changes. Comput. Vis. Image Underst. 2017, 157, 274–283.
7. Sivaraman, S.; Trivedi, M. Active Learning for On-Road Vehicle Detection: A
Comparative Study. Mach. Vis. Appl. 2011, 25(3), 599–611.
8. Lu, W.; Wang, H.; Wang, Q. A Synchronous Detection of the Road Boundary and
Lane Marking for Intelligent Vehicles, Eighth ACIS International Conference on
Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed
Computing 2007 IEEE, 2007; pp 741–745.
9. Khalifa, O. O.; Assidiq Abdulhakam, A. M.; Hashim Aisha-Hassan, A. Vision-
Based Lane Detection for Autonomous Artificial Intelligent Vehicles, 2009 IEEE
International Conference on Semantic Computing, 2009; pp 636–641.
10. Geiger, A. Are We Ready for Autonomous Driving? The Kitti Vision Benchmark
Suite, In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012;
pp 3354–3361. https://doi.org/10.1109/cvpr.2012.6248074.
11. Sun, Z.; Bebis, G.; Miller, R. On-Road Vehicle Detection: A Review. IEEE Trans.
Pattern Anal. Mach. Intell. 2006, 28(5), 694–711.
12. Sivaraman, S.; Trivedi, M. M. A Review of Recent Developments in Vision-Based
Vehicle Detection. IEEE Intelligent Vehicles Symposium (IV); Gold Coast, Australia,
2013; pp 310–315.
13. Kul, S.; Eken, S.; Sayar, A. A Concise Review on Vehicle Detection and Classification;
ICET 2017; Antalya, Turkey, 2017.
14. Song, H.; Liang, H.; Li, H.; Dai, Z; Yun, X. Vision-Based Vehicle Detection and
Counting System Using Deep Learning in Highway Scenes. Eur. Trans. Res. Rev.
2019, 11(51), 1–16.
15. Sakhare, K. V.; Tewari, T.; Vyas, V. Review of Vehicle Detection Systems in Advanced
Driver Assistant Systems. Arch. Comput. Methods Eng. 2020, 27, 591–610.
16. Das, T. K.; Chowdhary, C. L.; Gao, X. Z. Chest X-Ray Investigation: A Convolutional
Neural Network Approach. J Biomim. Biomater. Biomed. Eng. 2020, 45, 57–70.
17. Leung, H. K.; Chen, X. Z.; Yu, C. W.; Liang, H. Y.; Wu, J. Y.; Chen, Y. L. A
Deep-Learning-Based Vehicle Detection Approach for Insufficient and Nighttime
Illumination Conditions. Appl. Sci. 2019, 9, 4769.
18. Chowdhary, C. L.; Acharjya, D. P. Segmentation and Feature Extraction in Medical
Imaging: A Systematic Review. Proc. Comput. Sci. 2020, 167, 26–36.
19. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon,
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics, 2020, 9(4), 692.
20. Reddy, T.; Swarna Priya, R. M.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan,
W. Z. A Deep Neural Networks Based Model for Uninterrupted Marine Environment
Monitoring. Comput. Commun. 2020, 157, 64–75.
21. Tripathy, A. K.; Das, T. K.; Chowdhary, C. L. Monitoring Quality of Tap Water in
Cities Using IoT. In Emerging Technologies for Agriculture and Environment,
Springer: Singapore, 2020, pp. 107–113.
22. Samantaray, S.; Deotale, R.; Chowdhary, C. L. Lane Detection Using Sliding Window
for Intelligent Ground Vehicle Challenge. In Innovative Data Communication
Technologies and Application, Springer: Singapore, 2021, pp. 871–881.
CHAPTER 10
ABSTRACT
10.1 INTRODUCTION
demonstrates how the claim processes can be automated, serving all stakeholders (field worker, car owner, and body shop partner) to speed up the service anytime and anywhere.
The outline of the chapter is as follows. The next section presents the background, including literature reviews of car damage evaluation systems and object detection methods. Then, the overall system, the description of each element, and the software architecture are presented. The implementation of each system element is then described. Finally, the evaluation process as well as the concluding remarks are presented.
10.2 BACKGROUND
recognition. Our work is built upon integration with open-source software, supports processing multiple images at a time, and provides a user-friendly interface and price estimation.
Figure 10.2(b) presents the car damage detective software, which is open source on GitHub by Neokt.4 Compared to this, IVAA is used to detect specific vehicle parts in images, supports multiple images of the vehicle, and provides a price estimation in a mobile application.
IBM Watson is a system based on cognitive computing, as shown in Figure 10.3. It contains three elements: Watson Visual Recognition, a web server, and a mobile application.
Table 10.1 compares the three software systems in many aspects. The required features of the software include classification, localization, automatic model training, and cloud support.
In Ref. [3], IBM Watson has its own recognition engine, while ours and the car detective are based on TensorFlow and Keras. Compared to these, we can detect more vehicle parts and more damage levels. To achieve an accurate estimation, the model should be able to infer the type of damage; as this affects the expense, it is necessary for the service to suggest repairing or replacing the damaged part.
The template matching method is a naive approach for finding a similar pattern in an image.5 Its extensions are gray-scale-based matching and edge-based matching.6 Gray-scale-based matching is able to reduce the computation time, resulting in matching up to 400 times faster than the baseline method, while edge-based matching performs the matching only on the edges of an object.7–8 The output is a gray-scale map in which each pixel represents the degree of matching.
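A minimal OpenCV sketch of template matching as described above; the file names are placeholders, and matchTemplate returns a map whose values express the degree of matching at each position:

```python
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

h, w = template.shape
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print("best match score:", max_val, "at", top_left)
```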
Faster R-CNN uses a CNN to generate the object proposals, rather than using selective search, in the first stage.13 This layer is called the region proposal network (RPN). The RPN uses the base network to extract a feature map from the image more precisely. It then divides the feature map into multiple squared tiles and slides a small network across each tile continuously. The small network outputs a set of object confidence scores and bounding box coordinates for each tile location.14 The RPN is designed to be trained in an end-to-end manner. Using Faster R-CNN can reduce the training and detection time.15–16
Recently, the capsule neural network (CapsNet) has shown better accuracy than the typical CNN. A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity, such as an object or an object part.17 CapsNet contains capsules rather than individual neurons. A group of capsules learns to detect an object within a given region of the image and outputs a vector that represents the estimated probability that the object is present and whose orientation encodes the object's pose parameters.18 The capsules are equivariant to the object's pose, orientation, and size.
The architecture contains an encoder and a decoder, as shown in Figure 10.4. The encoder takes the input data and converts them into an n-dimensional vector. The weights of the lower-level capsules (PrimaryCaps) must align with the weights of the higher-level capsules (DigitCaps). At the end of the encoder, an n-dimensional vector is passed to the decoder. The decoder contains several fully connected layers. The main job of the decoder is to take the n-dimensional vector and attempt to reconstruct the input from scratch, which makes the network more robust by generating predictions based on its own weights.
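A minimal NumPy sketch of the "squash" non-linearity commonly used in capsule networks, shown only to illustrate how a capsule's activity vector is normalized; it is not the chapter's implementation:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Scale a capsule output so its length lies in (0, 1) and can be read
    as the probability that the entity it represents is present."""
    squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / np.sqrt(squared_norm + eps)

v = squash(np.array([3.0, 4.0]))
print(v, np.linalg.norm(v))   # length is ||s||^2 / (1 + ||s||^2) = 25/26
```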
There are four user roles in the IVAA system: insurance experts, data
scientists, operators, and field employees as shown in Figure 10.5. The four
tools are developed for these four users: data labeling tool for insurance
experts, deep learning APIs for data scientists, web monitoring application
for operators, and LINE chatbot to interact with the back-end server for
field employees as in product layers in Figure 10.5.
The labeling task is one of the most time-consuming tasks before the model training process can start. Traditional labeling software such as LabelImg and Imglab19 works as a standalone application, which makes it hard to handle a large number of data annotations. Figure 10.6 shows the flow of our tool, which has a web interface where users can work collaboratively on the labeling task. The labeling tool returns a downloadable JSON file to the user for future use. VueJS is used as the frontend framework together with a REST API server. The labeling tool is also useful for adding more labeled damaged images for future retraining.
APIs are gateways designed for data scientists and developers to train and deploy the model. Figure 10.7(a) presents the deep learning API used to input new data and model hyper-parameters for training to create a new deep learning model. The API returns the model identification (model ID) to the user as a link for model deployment. Figure 10.7(b) shows the testing API, which takes the testing data and model ID to deploy the model. It returns the list of damaged parts and levels on the vehicle along with the accuracy.
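A hypothetical sketch of how such training/testing APIs could be called from Python; the host, endpoint paths, and field names are illustrative assumptions, not the actual IVAA API:

```python
import requests

BASE_URL = "http://ivaa.example.com/api"   # hypothetical host

# Train: upload labeled data and hyper-parameters, receive a model ID
train_resp = requests.post(
    f"{BASE_URL}/train",                    # hypothetical endpoint
    files={"dataset": open("labels.json", "rb")},
    data={"epochs": 50, "batch_size": 16},
)
model_id = train_resp.json()["model_id"]    # hypothetical response field

# Test: submit an image against the deployed model, receive parts/levels
test_resp = requests.post(
    f"{BASE_URL}/test",                     # hypothetical endpoint
    files={"image": open("damaged_car.jpg", "rb")},
    data={"model_id": model_id},
)
print(test_resp.json())
```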
The operators monitor the cases using the web monitoring application. It shows the historical data, which contain the number of cases, the number of processed images, and the number of days the system has operated. The visualization is displayed in a heat map style, showing the frequency of accidents by location and calendar day. Figure 10.8 shows an example of the tasks that an office operator monitors, with an overview of the case and its location.
FIGURE 10.7 Deep learning API sequence diagrams: (a) model training API; (b) model testing API.
Field employees use the LINE chatbot service specifically designed for insurance field employees. The chatbot takes the damaged car images via LINE chat and returns the resulting car model and price table images, along with a list of body shop details and locations.
In Figure 10.9, the field employee sends the details of an accident case and the customer ID, shares the accident location, and uploads the damaged car images. Next, the deep learning testing API is executed to recognize the damaged parts and classify the damage level from the submitted photos. The chatbot stores the communication dialogues in the main database.
All the above services are deployed on a private cloud system with the hardware specification listed in Table 10.2. We use the private server to train the model, serve the model, and host the website.
10.4 IMPLEMENTATION
The labels used for building the models come from multiple insurance experts, and the experts may have different subjective opinions on how some of the cases should be labeled. We have studied this scenario by designing a multi-expert learning framework that assumes the information on who labeled each case is available. The framework explicitly models different sources of disagreement and lets us naturally combine labels from different human experts to obtain both a consensus classification model, representing the model the group of experts converges to, and individual expert models.
Deep learning APIs are gateways for the user to deploy our system. They enable adding new data sets and retraining the deep learning model effectively. Incremental retraining allows the model accuracy to be improved incrementally when computing power is limited.
Figure 10.14 presents the web application developed using the VueJS framework with the Bulma CSS framework. The web monitoring application is targeted at office workers, system administrators, and business managers. The application has six main pages for monitoring and interacting with the system.
The login page of our web monitoring application is shown in Figure 10.14(a). The Authentication Required function from a Go programming language library is adopted. Security in the front end is one way to limit user interference; however, some users require more flexibility than others, and there are always trade-offs.
Figure 10.14(b) shows the dashboard page of our system. It contains three elements: (1) the cases, (2) the images processed, and (3) how long the system has operated. The first element is the most important, as it presents the cases reported as well as case management. The second element concerns the images and their processing. The third element is system administrative information. The dashboard is a data visualization tool that allows all users to analyze issues in their system. It provides an objective view of performance metrics and serves as an effective foundation for further dialogue.
Figure 10.14(c) shows the heat map of the reported cases. The primary purpose of the heat map is to visualize the volume of events by location within the data sets and to direct viewers toward relevant areas. The fading color shows the density of accident cases at each location.
An accident case can be inserted via the case insertion page, as shown in Figure 10.14(d). For each accident case, the case identification number,
and the LINE Platform sends a request to the webhook URL. The server then sends a request to the LINE platform to respond to the user.
The requests are sent over HTTPS in JSON format. Users can post the IVAA web page onto their LINE timelines to make it visible to all their friends. The LINE platform allows the user (field employee or customer) to send the damaged car images to the company's LINE official account to get the price and damage results.
After adding IVAA as a friend, the user starts using the service as in Figure 10.15(b). The system requests the customer identification number for authentication; Figure 10.15(c) presents this authentication step. The service then generates a unique case identification number for the user, which is used for tracking the service progress.
The service also requires the user to share the place of the accident, as shown in Figure 10.15(d); sharing the accident location allows the field worker to head to the location.
Figure 10.15(e) shows the upload of the damaged vehicle's photos in the accident process. At the start, the user takes photos of the damaged vehicle, including the front-side view, the back-side view, the left-side view, and the right-side view. After that, the system acknowledges the receipt of the photos. Then, our service returns the analysis result from the deep learning model. The user can visualize the damage level on the vehicle parts using different colors, as in Figure 10.15(f). In addition, our service can estimate the repair price with a breakdown of the damaged parts of the vehicle by damage level.
10.5 EVALUATION
The evaluation of the application is broken down into three parts. The first part evaluates the IVAA deep learning models. Secondly, user satisfaction with the web application and LINE chatbot is assessed. Finally, a comparison of our platform service against public cloud platforms is presented.
The IVAA deep learning model is compared against the template matching approach and other object detection methods on the selected car damage data set. Template matching is a technique in digital image processing for finding small parts of an image that match a template image. A typical object detection algorithm such as R-CNN is also used.
The IVAA deep learning model utilizes CapsNet to enhance our deep learning model. Due to its recent outstanding performance, we applied CapsNet to detect the damaged vehicle object from the photos, and then to recognize the damaged vehicle parts and the levels of severity. However, since the focus of the work is the application of the model to the auto insurance claiming process, alternative object detection models are possible. The architecture of CapsNet is shown in Figure 10.16. From the bounding box of the damaged part, CapsNet classifies the damage into the mentioned five levels. The part of the car is highlighted according to the damage level.
The Toyota Camry image set available at https://gitlab.com/Intelligent-Vehicle-Accident-Analysis is used for evaluation. The data set includes 1624 images, which we divide into 80% training and 20% testing. IVAA utilizing CapsNet yields an accuracy of up to 97.21%, as shown in Figure 10.17, which is greater than that of the template matching approach (93.58%). The object detection approach based on traditional computer vision techniques explores multiple paths; the algorithm is simplified, yet it achieves a competitive accuracy (91.53%) at a lower computation cost.
To deploy the model for LINE ChatBot use, we set the threshold for bounding box detection and severity classification to 97.21%. The Intersection over Union (IoU) for our proposed system is 89.53%. The average inference time per image is 13.12 s on our private cloud. Figure 10.18 shows the inference time when increasing the number of images up to 20 images.
General opinions from 30 users were collected, and the average score for each aspect is shown. The average overall score is 4.69/5 for the application side and 4.66/5 for the intelligence module; 93.3% of users would highly recommend the system to their friends or companies. Moreover, users expect to use our system in real situations.
Tables 10.6 and 10.7 compare our platform against public services and general web development. IVAA targets a specific task, car damage detection, rather than a general vision task. Our service solution using LINE is ready to use, and the development process is not complex compared to using a WebApp or NativeApp.
10.6 CONCLUSION
KEYWORDS
• AI as a service
• object detection
• image classification and localization
• capsule neural networks
• scalable data processing
REFERENCES
25. Reddy, G. T.; Bhattacharya, S.; Ramakrishnan, S. S.; Chowdhary, C. L.; Hakak, S.; Kaluri, R.; Reddy, M. P. K. An Ensemble Based Machine Learning Model for Diabetic Retinopathy Classification. In 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), IEEE, 2020; pp 1–6.
26. Shynu, P. G.; Shayan, H. M.; Chowdhary, C. L. A Fuzzy Based Data Perturbation Technique for Privacy Preserved Data Mining. In 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), IEEE, 2020; pp 1–4.
CHAPTER 11
ABSTRACT
Medical image security is becoming more and more important. Full image encryption is not necessary in the medical field, because a partial amount of encryption is enough to provide security. Proposed here is a partial image encryption scheme for medical images that uses different permutation techniques. The proposed technique mainly consists of a permutation process and a diffusion process. The original medical image is divided into nonoverlapping blocks with the help of a block size table. Then the position of each pixel in every block is shuffled according to a chaotic sequence generated from the chaotic map system and the predefined block size table. In the diffusion process, based on a basic intensity image (BII) and a permutation technique, a mapping operation is applied to obtain the partially encrypted medical image.
11.1 INTRODUCTION
the help of the DWT, which leads to four frequency bands of the original image in the frequency domain. Belazi et al.2 explained the most common shuffling–diffusion-based image encryption systems, in which the diffusion of the image occurs first and is followed by a chaos-based shuffling process. Xiang et al. describe full and selective encryption of medical images. This technique consists of several stages, where every stage consists of a permutation phase and a diffusion phase; a block-based concept is used to permute and encrypt with the help of a chaotic map.16 Parameshachari et al. (2013) proposed partial encryption for medical images that uses DNA encoding and addition techniques: a random image generated from a chaotic map undergoes DNA addition with the original image to obtain different partially encrypted images.10 Bhatnagar and Wu explain the concept of SVD and pixels of interest to selectively encrypt a group of pixels in the input image. The idea of this method is to use a saw-tooth space filling curve and Q curve to shuffle the pixel positions, and diffusion is done with the help of a nonlinear chaotic map.3 Mahmood and Dony explained an algorithm that divides a medical image into two parts based on the amount of significant and nonsignificant information, namely the region of interest (ROI) and the region of background (ROB); to reduce the encryption time, AES is applied to the ROI and a Gold code (GC) to the ROB.6 Parameshachari et al. introduced partial encryption of color RGB images: the input color image is segmented into a number of macroblocks, and based on interest, a few significant blocks are selected and encrypted using a chaotic map.9 Chowdhary et al.19 explained different fuzzy segmentation methods used for dividing and detecting brain tumors in medical MRI images. Chowdhary et al.20 introduced a hybrid scheme for breast cancer detection using an intuitionistic fuzzy rough set technique; the hybrid scheme starts with image segmentation using an intuitionistic fuzzy set to extract the zone of interest and then enhances the edges surrounding it. Chowdhary21 explained how a clustering approach that retains the positive points of possibilistic fuzzy c-means can overcome the coincident cluster problem, reduce noise, and be less sensitive to outliers. Chowdhary et al.22 presented an experimental assessment of the beam search algorithm for improvement in image caption generation.
The entire chapter is divided into various sections: Section 2 explains the various permutation methods used in the proposed system, Section 3 gives a detailed description of the proposed partial encryption system based on various permutation techniques, and performance metric analysis of the proposed system is explained in Section 4. Finally, the conclusion of the chapter is presented in Section 6.
Selecting the map for any image encryption scheme is very important and is a major step. Here, a chaotic map has been used for the permutation process because of its attractive features, such as periodic windows, chaotic intervals, complexity, and sensitivity to the initial condition; the use of a chaotic system makes an encryption system more secure and less complex. The chaotic map fulfills the requirements of an encryption system in terms of privacy and efficiency.10 Mathematically, the chaotic map can be defined using eq 11.1, which includes two parameters, r and x0, that are considered the key for the encryption.
\( X_{n+1} = r \, X_n (1 - X_n) \)  (11.1)
where the range of the initial parameter x0 lies between 0 and 1, and the range of r lies between 3.57 and 4.
Another system used for generating the random sequence is the continuous chaotic system, which can be defined by the Lorenz system30 as shown in eq 11.2.
\( \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} -10 & 10 & 0 \\ 8 & 4 & 0 \\ 0 & 0 & -\frac{8}{3} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} 0 \\ -xz \\ xy \end{bmatrix} \)  (11.2)
To remove the near-predictability of the above continuous chaotic system, the output sequences x, y, and z are adjusted. A long sequence can then be obtained by combining the values of all three sequences. This sequence is arranged in nondecreasing order, and the new index values are stored for the shuffling process.
[Figure: an original matrix permuted using (a) the chaotic map, (b) continuous chaos, (c) Sudoku, and (d) the Arnold cat map.]
and the size of each macroblock is defined in Table 11.1. One of the abovementioned permutation techniques, in particular the chaotic map, is used to change the pixels within every segmented block to obtain the various intermediate permuted images. A basic intensity image (BII), which contains all the pixel values ranging from 0 to 255, is then used in the mapping process along with one of the permutation methods to obtain the various partially encrypted images. A detailed description of the block-wise permutation and mapping process is given below. Permutation process:
FIGURE 11.2 Architecture of partial image encryption system for medical images.
The following steps explain how the input image is permuted using the chaotic map system (a minimal sketch follows these steps).
Step 1: Input a plain medical image of size M × N.
Step 2: Partition the plain medical image into nonoverlapping macroblocks according to the predefined block sizes in Table 11.1.
Step 3: Generate the random sequence with the help of the chaotic system of eq 11.1 and the initial keys x0 and r. The chaotic sequence X can be represented as
X = x1, x2, x3, …, xn−1
Step 4: Arrange the above chaotic sequence X in increasing order and store the newly obtained index values.
Step 5: With respect to the new position values, randomly permute the positions of the gray values in every block.
Step 6: Merge all the blocks in a nonoverlapped fashion to obtain the permuted image.
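A minimal NumPy sketch of the block-wise permutation steps above; the block size, keys, and sample image are illustrative assumptions, not the chapter's exact configuration:

```python
import numpy as np

def logistic_sequence(n, x0=0.37, r=3.99):
    """Generate a chaotic sequence from the logistic map of eq 11.1."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

def permute_blocks(image, block=8, x0=0.37, r=3.99):
    """Shuffle the pixels inside every block using the sorted-index order
    of the chaotic sequence (Steps 3-6)."""
    h, w = image.shape
    out = image.copy()
    perm = np.argsort(logistic_sequence(block * block, x0, r))  # new index order
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blk = out[by:by + block, bx:bx + block]
            out[by:by + block, bx:bx + block] = blk.flatten()[perm].reshape(blk.shape)
    return out

permuted_image = permute_blocks(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
```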
After block-wise permutation, we get different permuted images. In the mapping stage, one of the permutation techniques is selected and applied to the permuted images. The steps involved in the mapping stage are as follows (a minimal sketch follows these steps):
Step 1: Input the BII for the mapping process along with one of the permutation techniques.
Step 2: Convert every pixel of the intermediate permuted image into its 8-bit binary representation.
Step 3: Split the 8-bit binary number into two 4-bit numbers by grouping the most significant 4 bits as the upper nibble and the least significant 4 bits as the lower nibble.
Step 4: Convert the upper and lower nibble 4-bit numbers into their equivalent decimal values.
Step 5: Use the two decimal values obtained in Step 4 to fetch a gray value from the basic intensity mapping image, where the decimal value of the upper nibble is treated as the row index and the decimal value of the lower nibble as the column index of the mapping image.
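A minimal sketch of this nibble-based mapping, assuming the BII is arranged as a 16 × 16 array holding a shuffled copy of the values 0–255 (the shuffle shown here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=1)             # stands in for a chosen permutation
bii = rng.permutation(256).reshape(16, 16).astype(np.uint8)

def map_with_bii(permuted_image, bii):
    upper = permuted_image >> 4                 # most significant 4 bits -> row index
    lower = permuted_image & 0x0F               # least significant 4 bits -> column index
    return bii[upper, lower]                    # fetch the mapped gray value

sample = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # a permuted image
encrypted = map_with_bii(sample, bii)
```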
where M and N are the total numbers of rows and columns of the image, org(i,j) is the original input image, and enc(i,j) is the encrypted image.
\( \mathrm{UACI} = \frac{1}{M \times N} \sum_{i,j} \frac{\left| org(i,j) - enc(i,j) \right|}{255} \times 100\% \)  (11.7)
where M stands for the image's width, N stands for the image's height, and D(i,j) is defined as follows:
\( D(i,j) = \begin{cases} 1, & \text{if } I(i,j) \neq E(i,j); \\ 0, & \text{if } I(i,j) = E(i,j), \end{cases} \)
where I(i,j) and E(i,j) are the original input image and the output cipher image, respectively.
where µx, µy, σx, σy, and σxy are the means of x and y, the variances of x and y, and the covariance of x and y, respectively.
The SSIM is an extended version of the UIQ index. The range of SSIM is [−1, 1], where a value of 1 indicates more similarity and a value of −1 indicates less similarity. SSIM is defined as follows:12
\( \mathrm{SSIM}(x,y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \)  (11.10)
\( \mathrm{MSSIM} = \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}(x_j, y_j) \)  (11.11)
where C1 and C2 are two constants used to stabilize the division when the denominator is weak.
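A minimal NumPy sketch of the NPCR and UACI measures described above, computed between an original image I and a cipher image E:

```python
import numpy as np

def npcr(I, E):
    """Percentage of pixel positions where the two images differ."""
    D = (I != E).astype(np.float64)
    return D.mean() * 100.0

def uaci(I, E):
    """Mean absolute intensity change, normalized by 255 (eq 11.7)."""
    diff = np.abs(I.astype(np.float64) - E.astype(np.float64))
    return (diff / 255.0).mean() * 100.0
```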
TABLE 11.2 Results Obtained from the Proposed Method for the Baby Image: partially encrypted (PIE) images produced by the Gray code, Sudoku, cat map, and chaos permutation techniques.
TABLE 11.3 MSE for Baby Image for Different Permutation Techniques.
MSE for Baby Image
PIE List 1 2 3 4 5 6 7
GC 31.62 35.21 40.24 47.39 56.72 68.85 82.45
SC 33.10 32.72 32.38 32.40 32.72 33.18 39.19
AC 28.43 27.84 26.97 26.16 25.00 22.35 15.67
CC 26.37 26.53 26.68 26.50 25.38 23.45 17.17
TABLE 11.4 PSNR for Baby Image for Different Permutation Techniques.
PSNR for Baby Image
PIE List 1 2 3 4 5 6 7
GC 33.13 32.66 32.08 31.37 30.59 29.75 28.96
SC 32.93 32.98 33.02 33.02 32.98 32.92 32.19
AC 33.59 32.68 33.82 33.95 34.15 34.63 36.17
CC 33.91 33.89 33.86 33.89 34.08 34.42 35.78
TABLE 11.5 NPCR for Baby Image for Different Permutation Techniques.
NPCR for Baby Image
PIE List 1 2 3 4 5 6 7
GC 54.92 58.57 60.46 62.29 66.76 73.31 81.12
SC 99.65 99.63 99.64 99.60 99.54 99.53 99.47
AC 99.80 99.78 99.79 99.81 99.83 99.84 99.83
CC 99.71 99.75 99.77 99.78 99.80 99.79 99.79
TABLE 11.6 UACI for Baby Image for Different Permutation Techniques.
UACI for Baby Image
PIE List 1 2 3 4 5 6 7
GC 4.90 5.73 6.94 8.54 11.13 14.45 19.58
SC 29.54 29.57 29.63 29.72 29.90 30.31 32.01
AC 63.40 63.42 63.39 63.36 63.22 62.77 61.30
CC 57.98 57.98 58.06 58.09 57.99 57.75 56.23
Partial Image Encryption of Medical Images Based 235
TABLE 11.7 SSIM for Baby Image for Different Permutation Techniques.
SSIM for Baby Image
PIE List 1 2 3 4 5 6 7
GC 0.7426 0.6718 0.5356 0.4110 0.2572 0.1394 0.0271
SC 0.0166 0.0184 0.0180 0.0177 0.0156 0.0159 0.0152
AC 0.0122 0.0093 0.0124 0.0117 0.0129 0.0100 0.0095
CC 0.0185 0.0174 0.0138 0.0116 0.0141 0.0115 0.0113
TABLE 11.8 UQI for Baby Image for Different Permutation Techniques.
UQI for Baby Image
PIE List 1 2 3 4 5 6 7
GC 0.9287 0.8612 0.7488 0.6186 0.5769 0.4109 0.3439
SC 0.2376 0.2372 0.2391 0.2458 0.2606 0.2760 0.2795
AC 0.2435 0.2427 0.2399 0.2341 0.2215 0.2093 0.1881
CC 0.2479 0.2475 0.2449 0.2397 0.2307 0.2155 0.2005
TABLE 11.9 Gray Code (GC) Results Obtained from the Proposed Method for the Lena and Pepper Images: partially encrypted (PIE) images of Lena and Pepper using the Gray code permutation.
TABLE 11.10 NPCR and UACI Comparison Between the Proposed GC Code Method and an Existing Method.
Images    GC code (NPCR, UACI)    Ref. [11] (NPCR, UACI)
Lena      99.59, 29.01            98.69, 18.23
Pepper    99.62, 29.97            97.23, 22.21
236 Computer Vision and Recognition Systems
TABLE 11.11 MSE and PSNR Comparison Between the Proposed GC Code Method and an Existing Method.
Images    GC code (MSE, PSNR)    Ref. [11] (MSE, PSNR)
Lena      89.29, 28.62           9.83, 6801
Pepper    94.26, 28.38           9.10, 8051
11.6 CONCLUSION
ACKNOWLEDGMENT
KEYWORDS
• permutation
• encryption
• chaotic map
• intensity image
• Arnold map
• Gray code
REFERENCES
1. Ahmad, J.; Ahmed, F. Efficiency Analysis and Security Evaluation of Image Encryption
Schemes. Computing 2010, 23, 25.
2. Belazi, A.; Abd El-Latif, A. A.; Belghith, S. A Novel Image Encryption Scheme
Based on Substitution-Permutation Network and Chaos. Signal Process. 2016, 128,
155–170.
3. Bhatnagar, G.; Jonathan Wu, Q. M. Selective Image Encryption Based on Pixels of
Interest and Singular Value Decomposition. Digital Signal Process. 2012, 22(4),
648–663.
4. Goel, A.; Chaudhari, K. Median Based Pixel Selection for Partial Image Encryption,
2015.
5. Hua, Z.; Zhou, Y.; Pun, C. M.; Philip Chen, C. L. 2D Sine Logistic Modulation Map
for Image Encryption. Inform. Sci. 2015, 297, 80–94.
6. Mahmood, A. B.; Dony, R. D. Segmentation Based Encryption Method for Medical
Images. In 2011 International Conference for Internet Technology and Secured
Transactions, 2011; pp 596–601.
7. Naveenkumar, S. K.; Panduranga, H. T.; et al. Partial Image Encryption for Smart
Camera. In Recent Trends in Information Technology (ICRTIT), 2013 International
Conference on, 2013; pp 126–132.
8. Panduranga, H. T.; Naveenkumar, S. K.; et al. Partial Image Encryption Using Block
Wise Shuffling and Chaotic Map. In Optical Imaging Sensor and Security (ICOSS),
2013 International Conference on, 2013; pp 1–5.
9. Parameshachari, B. D.; Karappa, R.; Sunjiv Soyjaudah, K. M.; Devi KA, S. Partial
Image Encryption Algorithm Using Pixel Position Manipulation Technique:
The Smart Copyback System. In 2014 4th International Conference on Artificial
Intelligence with Applications in Engineering and Technology, 2014; pp 177–181.
10. Parameshachari, B. D.; Panduranga, H. T.; Naveenkumar, S. K.; et al. Partial
Encryption of Medical Images by Dual DNA Addition Using DNA Encoding. In 2017
International Conference on Recent Innovations in Signal processing and Embedded
Systems (RISE), IEEE, 2017; pp 310–314.
11. Som, S.; Mitra, A.; Kota, A. A Chaos Based Partial Image Encryption Scheme, 2014.
12. Wang, Z.; Bovik, A. C. Modern Image Quality Assessment. Synth. Lect. Image Video
Multimedia Process. 2006, 2(1), 1–156.
13. Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image Quality Assessment:
from Error Visibility to Structural Similarity. IEEE Transac. Image Process. 2004,
13(4), 600–612.
14. Wu, X.; Wang, D.; Kurths, J.; Kan, H. A Novel Lossless Color Image Encryption
Scheme Using 2D DWT and 6D Hyperchaotic System. Inform. Sci. 2016, 349, 137–153.
15. Wu, Y.; Noonan, J. P.; Agaian, S. NPCR and UACI Randomness Tests for Image
Encryption. Cyber Journals: Multidisciplinary Journals in Science And Technology,
Journal of Selected Areas in Telecommunications (JSAT), 2011; pp 31–38.
16. Xiang, T.; Hu, J.; Sun, J. Outsourcing Chaotic Selective Image Encryption to the
Cloud with Steganography. Digital Signal Process. 2015, 43, 28–37.
238 Computer Vision and Recognition Systems
17. Xu, L.; Gou, X.; Li, Z.; Li, J. A Novel Chaotic Image Encryption Algorithm Using
Block Scrambling and Dynamic Index Based Diffusion. Opt. Lasers in Eng. 2017,
91, 41–52.
18. Ye, G.; Huang, X. An Image Encryption Algorithm Based on Autoblocking and
Electrocardiography. IEEE Multimedia 2016, 23(2), 64–71.
19. Chowdhary, C. L.; Goyal, A.; Vasnani, B. K. Experimental Assessment of Beam
Search Algorithm for Improvement in Image Caption Generation. J Appl. Sci. Eng.
2019, 22(4), 691–698.
20. Khare, N.; Devan, P.; Chowdhary, C. L.; Bhattacharya, S.; Singh, G.; Singh, S.; Yoon,
B. SMO-DNN: Spider Monkey Optimization and Deep Neural Network Hybrid
Classifier Model for Intrusion Detection. Electronics 2020, 9(4), 692.
21. Chowdhary, C. L.; Acharjya, D. P. Segmentation and Feature Extraction in Medical
Imaging: A Systematic Review. Procedia Comput. Sci. 2020, 167, 26–36.
22. Reddy, T.; RM, S. P.; Parimala, M.; Chowdhary, C. L.; Hakak, S.; Khan, W. Z. A Deep
Neural Networks Based Model for Uninterrupted Marine Environment Monitoring.
Comput. Commun. 2020a.
23. Chowdhary, C. L. 3D Object Recognition System Based on Local Shape Descriptors
and Depth Data Analysis. Recent Pat. Comput. Sci. 2019, 12(1), 18–24.
24. Chowdhary, C. L.; Acharjya, D. P. Singular Value Decomposition–Principal
Component Analysis-Based Object Recognition Approach. Bio-Inspired Computing
for Image and Video Processing, 2018; p 323.
25. Reddy, G. T.; Bhattacharya, S.; Ramakrishnan, S. S.; Chowdhary, C. L.; Hakak, S.;
Kaluri, R.; Reddy, M. P. K. An Ensemble based Machine Learning model for Diabetic
Retinopathy Classification. In 2020 International Conference on Emerging Trends in
Information Technology and Engineering, (ic-ETITE), IEEE, 2020b; pp 1–6.
26. Chowdhary, C. L. Application of Object Recognition With Shape-Index Identification
and 2D Scale Invariant Feature Transform for Key-Point Detection. In Feature
Dimension Reduction for Content-Based Image Identification, IGI Global, 2018; pp
218–231.
27. Shynu, P. G.; Shayan, H. M.; Chowdhary, C. L. A Fuzzy based Data Perturbation
Technique for Privacy Preserved Data Mining. In 2020 International Conference on
Emerging Trends in Information Technology and Engineering (ic-ETITE), IEEE,
2020; pp 1–4.
28. Benson, R.; et al. A New Transformation of 3D Models Using Chaotic Encryption
Based on Arnold Cat Map. International Conference on Emerging Internetworking,
Data & Web Technologies, Springer, Cham, 2019.
29. Yue, W.; et al. Design of Image Cipher Using Latin Squares. Inform. Sci. 2014, 264,
317–339.
30. Arshad, U.; Batool, S.; Amin, M. A Novel Image Encryption Scheme Based on
Walsh Compressed Quantum Spinning Chaotic Lorenz System. Int. J Theor. Phys.
2019, 58(10), 3565–3588.
31. Jun-xin, C.; et al. An Efficient Image Encryption Scheme Using Gray Code Based
Permutation Approach. Opt. Lasers Eng. 2015, 67, 191–204.
CHAPTER 12
ABSTRACT
The emergence of artificial intelligence has paved the way for numerous developments in the domain of machine vision. Generative Adversarial Networks are among the many frameworks and algorithms that have set a benchmark for the generation of data from learned parameters. In this chapter, Generative Adversarial Networks (GANs) and similar algorithms, such as Variational Auto-Encoders (VAEs), are used to generate handwritten digits from noise. Furthermore, the training data have been visualized to gain a proper understanding of the data our model is trying to learn.
12.1 INTRODUCTION
Generally:
• New data instances can be generated by a generative model.
• Discriminative models discriminate between different kinds of data instances.
A generative model could, for example, produce new photographs of animals that resemble real animals; this is how GANs and generative models work. More formally, given a set of data instances X and a set of labels Y:
• The generative model of the GAN architecture captures the probability p(X, Y), or p(X) alone.
• Discriminative models capture the conditional probability p(Y | X).
The dissimilarity between discriminative and generative models of handwritten4 digits is shown in Figure 12.1. A generative model5 models the distribution of the data. For instance, models that predict the next word in a sequence are similar to generative models and are simpler than GANs, because they assign a probability to a sequence of words.11
data space. If it gets the line right, it can distinguish 0's from 1's without ever having to model exactly where the digits lie on either side of the line. Conversely, the generative model attempts to produce convincing 1's and 0's by generating digits that fall close to their real counterparts in the data space; it has to model the distribution throughout the data space.1,16-20
The generator part of a GAN learns to create fake data by incorporating
feedback from the discriminator: it learns to make the discriminator classify
its output as real. Generator training requires tighter integration between
the generator and the discriminator than discriminator training does. The
portion of the GAN that trains the generator includes (a minimal code sketch
follows this list):
• random input;
• the generator network, which transforms the random input into a data instance;
• the discriminator network, which classifies the generated data;
• the discriminator output;
• the generator loss, which penalizes the generator for failing to trick the discriminator.
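The sketch below shows one possible PyTorch realization of this generator-training path; the layer sizes, noise dimension, and optimizer settings are assumptions for illustration rather than the exact configuration used in this chapter.

import torch
import torch.nn as nn

noise_dim = 100

# Simple fully connected generator and discriminator for flattened 28 x 28 digits.
generator = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                          nn.Linear(256, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)

def generator_step(batch_size=64):
    z = torch.randn(batch_size, noise_dim)      # random input
    fake = generator(z)                         # noise -> data instance
    d_out = discriminator(fake)                 # discriminator judges the fake
    # Generator loss: penalize failing to fool the discriminator.
    g_loss = bce(d_out, torch.ones_like(d_out))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return g_loss.item()

Note that only the generator's parameters are updated in this step; the discriminator is used purely to provide the training signal.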
12.3.3 CONVERGENCE
A GAN can have two loss functions: one for generator training and a second
for discriminator training. In both schemes, however, the generator can only
affect one term in the distance measure: the term that reflects the
distribution of the fake data. So during generator training we drop the other
term, which reflects the distribution of the real data.
The minimax loss8 is shown in eq 12.1. The generator tries to minimize the
following function while the discriminator tries to maximize it:
E_x[log(D(x))] + E_z[log(1 − D(G(z)))]    (12.1)
where D(x) is the discriminator's estimate of the probability that a real
data instance x is real, G(z) is the generator's output for random noise z,
and E_x and E_z denote expectations over the real data and over the
generator's noise input, respectively.
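For reference, eq 12.1 can be written out in PyTorch roughly as follows; D, G, real_batch, and noise are placeholder names for any discriminator module, generator module, batch of real images, and batch of noise vectors, and the small epsilon terms are added only for numerical stability.

import torch

def minimax_losses(D, G, real_batch, noise):
    # Discriminator maximizes E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    # i.e., it minimizes the negative of that sum.
    d_real = D(real_batch)
    d_fake = D(G(noise).detach())
    d_loss = -(torch.log(d_real + 1e-8).mean()
               + torch.log(1.0 - d_fake + 1e-8).mean())

    # Generator minimizes log(1 - D(G(z))); in practice the
    # non-saturating form -log D(G(z)) is often used instead.
    g_loss = torch.log(1.0 - D(G(noise)) + 1e-8).mean()
    return d_loss, g_loss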
[Figure: architecture of the variational auto-encoder — Input → Dense-500 → Dense-120 → parallel Dense-30 layers producing μ and σ → Sample-30 (sampled 30-dimensional latent vector) → Dense-120 → Dense-500 → Output.]
Like GANs, VAEs2 learn a latent-variable representation of the data. The
issue with plain auto-encoders is that their latent space may not be
continuous, which also makes interpolation between points problematic. VAEs,
on the other hand, learn continuous latent spaces, which allow easy random
sampling and interpolation.10 A minimal sketch of such a model is given below.
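A minimal PyTorch sketch of such a VAE, using the layer widths shown in the architecture above (500 → 120 → a 30-dimensional latent space), is given below; it is a sketch under those assumptions, not the exact model trained in this chapter.

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=30):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 500), nn.ReLU(),
                                 nn.Linear(500, 120), nn.ReLU())
        self.mu = nn.Linear(120, latent_dim)        # μ head
        self.log_var = nn.Linear(120, latent_dim)   # σ head (as log-variance)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 120), nn.ReLU(),
                                 nn.Linear(120, 500), nn.ReLU(),
                                 nn.Linear(500, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Reparameterization trick: z = μ + σ·ε with ε ~ N(0, I),
        # which keeps the latent space continuous and easy to sample.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var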
Because of the instability that GANs face, the model takes a long time along
its ideal training path to generate handwritten digit samples that fool the
discriminator. To combat this, a ConvNet pretrained on the MNIST dataset7 was
used as a replacement for the existing discriminator, which reduced the
instability.
VAEs, on the other hand, showed excellent training stability. Although they
required long training periods, they reduced the chance of instability by
following the right training path and ended up with near-perfect generated
results. A similar concept could be applied to 3D objects stored as point
clouds: the point cloud can be compressed into a voxel-based representation
and then fed to 3D convolutional layers instead of 2D layers, as illustrated
in the sketch below.
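As an illustration of that idea, the NumPy sketch below converts a point cloud into a fixed-resolution binary occupancy grid that could be passed to 3D convolutional layers; the 32-voxel resolution and the unit-cube normalization are assumptions made for the example.

import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    pts = np.asarray(points, dtype=np.float64)
    # Normalize the cloud into the unit cube [0, 1).
    pts = pts - pts.min(axis=0)
    pts = pts / (pts.max() + 1e-8)
    # Map each point to a voxel index, clamping to the grid bounds.
    idx = np.minimum((pts * resolution).astype(int), resolution - 1)
    grid = np.zeros((resolution, resolution, resolution), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid  # add a channel dimension before feeding a 3D ConvNet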
12.5.1 DATASET
The MNIST database of handwritten digits contains 70,000 labeled samples.
This dataset is a smaller portion of the larger dataset provided by NIST, as
compiled by Yann LeCun, Corinna Cortes, and Christopher J. C. Burges.4
12.5.2 METHODOLOGY
The MNIST handwritten digits dataset contains 60,000 training and 10,000
testing images of dimension 28 × 28. There are 10 classes, each comprising
7000 images of the respective handwritten digit (0–9). The dataset was loaded
and the constructed model was trained on it for over 80,000 iterations; at
every iteration the outputs were recorded and plotted to check progress.
The results after training are shown in Figure 12.6. Applying t-SNE9 to the
dataset, as demonstrated in Figure 12.6, shows that at around 2000 epochs the
embedding forms clusters clear enough to indicate that a generative
adversarial model would be able to pick up both the high-level and low-level
features possessed by the data. (A short t-SNE sketch is given after the
figure caption.)
FIGURE 12.6 TensorBoard visualization of the dataset as 10 clusters using t-SNE.
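A visualization of this kind can be reproduced with scikit-learn as sketched below; the subsample size, perplexity, and random seed are illustrative choices and not the settings behind Figure 12.6.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE

# Load MNIST (70,000 images of 28 x 28 digits) and take a small subsample.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
idx = np.random.RandomState(0).choice(len(X), 2000, replace=False)

# Embed the subsample in 2D with t-SNE and color points by digit class.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X[idx] / 255.0)

plt.scatter(emb[:, 0], emb[:, 1], c=y[idx].astype(int), cmap="tab10", s=5)
plt.colorbar(label="digit class")
plt.title("t-SNE of MNIST digits (10 clusters)")
plt.show()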
After a certain point in training, the outliers and noise in the data are
reduced and the network starts generating near-perfect visualizations of the
handwritten digits in the dataset. Figure 12.6 shows how well the data were
analyzed by the network and how the images were generated.
Because of the instability that GANs face, the model on its ideal training
path13 took a long time to generate handwritten digit samples that fool the
discriminator. To combat this, a ConvNet was pretrained on the MNIST dataset
and then used as a replacement for the existing discriminator.
KEYWORDS
REFERENCES
1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Warde-Farley, D.; Ozair, S.; Bengio, Y.
Generative Adversarial Nets. In Advances in Neural Information Processing Systems;
2014; pp 2672–2680.
2. Kingma, D. P.; Welling, M. Stochastic Gradient VB and the Variational Auto-Encoder.
In Second International Conference on Learning Representations, ICLR, 2014; Vol. 19.
P
Padding, 55
Pad-size (array A), 7
Pairwise Nearest Neighbor Algorithm (PNNA), 99
Parkinson's disease (PD), 13–15
  background, 15
  future research directions, 32–33
  handwriting tests, 16–18
  feature selection algorithms, 18
  spiral test samples, 17
  literature review, 19–26, 27–29
  solutions and recommendations, 26, 30–32
  treatment, 18
  voice data, 15–16
Pathological brain detection system (PBDS), 130
Peak signal to noise ratio (PSNR), 128, 231
Permutation techniques, 226
  arnold cat map (AC), 228
  chaotic map, 226
  continuous chaos (CC), 226–227
  gray code (GC), 227
  sudoku code (SC), 227
Pooling, 55
Proposed defogging algorithm, 6
  BCP with boundary constraints, 7
  BCP with pad image, 7–8
  boundary constraints, 7
  experimental results, 8
  qualitative analysis, 10
  quantitative analysis, 9
  setup, 8–9
Proposed partial image encryption (PIE) method, 228–230
  architecture, 229
  experimental results, 233–236
  performance metric, 230
  MSE, 231
  NPCR, 231–232
  SSIM, 232–233
  UACI, 231–232
  UIQ index, 232

R
Recall-Oriented Understudy for Gisting Evaluation (ROUGE), 156
Rectified Linear Unit (RELU), 187
  activation function, 190
  use of function, 187–188
Recurrent neural network (RNN), 151
Region of background (ROB), 225
Region of interest (ROI), 225
Region proposal network (RPN), 202
Region-based convolutional neural network (R-CNN), 201
Resilient Distributed Dataset (RDD), 80
REST architectural style, 209

S
Scale Invariant Feature Transform (SIFT), 97
Scene graph, 149
  background, 151–153
  CNN and RNN, 151
  chatbot application, 158–161
  evaluation, 155–158
  example, 151
  generator, 150
  methodology, 153–155
  preprocess data procedure, 154
  translation step, 155
  previous generation approach, 153
Semantic gap, 94
Semantic Propositional Image Caption Evaluation (SPICE), 156
Shift-invariant shearlet transformation (SIST), 122
Signal to noise ratio (SNR), 128
Single image defogging technique
  visual quality improvement, 1
Single shot detector (SSD), 152
Six-dimensional hyper chaos, 224
Speeded Up Robust Features (SURF), 97
Static ICP (SICP), 81
Stride, 54–55
Structural similarity index measure (SSIM), 232–233